We often think of international assessments as being synonymous with PISA, the OECD international assessment that has been the focus of many shows in FreshEd’s mini-series on global learning metrics.
But international assessments have a history far beyond PISA. In fact, it was the International Association for the Evaluation of Educational Achievement, known as the IEA, that first introduced large-scale comparative studies of educational systems in the late 1950s.
This history is important to consider when thinking about global learning metrics today.
My guest today is Dirk Hastedt, Executive Director of the IEA. He’s spent many years working with the IEA, seeing the development of assessments in new subjects, such as citizenship and computer literacies, and the emergence of league tables, which rank education systems and have become popular today. Dirk offers valuable insight for any discussion on the feasibility or desirability of global learning metrics.
Citation: Hastedt, Dirk, interview with Will Brehm, FreshEd, 49, podcast audio, November 7, 2016. https://freshedpodcast.com/dirkhastedt/
Will Brehm 1:54
Dirk Hastedt, welcome to FreshEd.
Dirk Hastedt 1:56
Thank you, Will. It’s a pleasure for me.
Will Brehm 2:00
The history of international assessments reveals many different actors and debates about the role and purpose and presentation of assessments that we sometimes forget today. I mean, today, we often talk about PISA or TIMSS, but we don’t remember the history of those tests and international testing in general. Can you give us a brief kind of background to the history of international assessments?
Dirk Hastedt 2:26
Sure, Will. It’s a pleasure. I think it’s also very important to go back and think about what it’s all about. Well, the IEA’s origin dates back to 1958 already. And it started as a collaborative venture between measurement scholars, educational psychologists, sociologists, and psychometricians from different countries. They first gathered at the UNESCO Institute for Lifelong Learning in Hamburg, and then also in London, they met a couple of times. And the idea was to exchange ideas about educational processes in order to improve educational systems. And at that time, little was known about the process of education across the world. And the founders of the IEA proposed to engage in cross-national comparisons of student achievement, as they believed that a country’s educational system can be better understood when compared with the educational systems of different countries. And their opinion was that surveys must take into account not only inputs but both inputs and outcomes of educational processes. And outcomes were defined quite broadly, as achieved knowledge but also as attitudes and participation in education, which now is also another focus that we have in developing countries. And their interest was to collect data that would enable them to better understand educational systems and to identify relevant factors that had an impact on student learning. So, the target of their analysis was notably the educational system, not the individual student. And to facilitate further cooperation of researchers, finally, the IEA became a legal entity in 1967. And that’s when all of this started.
Will Brehm 4:25
A legal entity of a particular country?
Dirk Hastedt 4:29
Well, that’s an interesting question. At that time, the researchers tried to found an umbrella organization to do international research with international people from around the world. And so, they finally decided that this is only possible under Belgian law. So, it first emerged as a Belgian association with board members from different countries actually.
Will Brehm 4:59
And you said that originally it was focused on improving education systems, not the achievement of individual students. So, has that changed over time?
Dirk Hastedt 5:12
No, not really. The focus is still on the system level. But the focus always was on comparing systems, and not as we sometimes see today, the competition of systems. So, who is first in league tables? That was never the intention, and I think it still should not be. The question was, what can we learn from other countries’ practices? And in that sense, at that time, it was unclear if cross-cultural and cross-language assessment at the beginning were feasible at all. So, they first conducted a study just to prove that this kind of assessment makes sense at all in general.
Will Brehm 6:05
Can you tell us a little bit about that first test? What exactly was uncovered and found?
Dirk Hastedt 6:11
Oh, that’s a good question. The first study was the pilot 12 country study. And that was done at the beginning of the 1960s. And the subjects included mathematics, reading comprehension, geography, science, and also non-verbal abilities. And the first study, however, started with mathematics because, at that time, the researchers also believed that the subject mathematics would be most language and cultural independent, and consequently, easiest to assess in an international study. Later on, the six subject study that was conducted in 1970 and 71 also expanded the scope to science, reading comprehension, literature education, English as a foreign language, and civic education. So, you can see that the target was very broadly defined in the beginning.
Will Brehm 7:18
And is that target still broadly defined today?
Dirk Hastedt 7:23
Well, there are different projections taking place. One is it’s always a question of what’s a focus in a particular time. So, for example, in the 1990s and late 80s, computers in education also became a focus. So, the IEA, at that time, started a ‘Computers in Education Study,’ COMPED, which was conducted in 1989 and 92, and provided data on the educational use of computers. This trend also was followed later on with our computer information study, SITES, and also the current International Computer and Information Literacy Study. And also, political changes in the world of the 1970s gave rise to the subject of citizenship. As a response, the IEA conducted, at that time, the civic education study, CIVED, and it investigated civic knowledge and engagement and policies and practices.
Will Brehm 8:39
So, different subjects or topics become of interest to the IEA depending on what’s going on in the world. So, today, what’s going on in the world? And what are the topics that are kind of trending?
Dirk Hastedt 8:56
Well, actually, there’s a little shift also in the 1990s, which we sometimes called the empirical shift also. And at that time, international large-scale assessment started to be used increasingly by policymakers, also outside the domain of education. And also, at that time, an extension to more developing countries took place. Before, these studies were more academic studies, and support from policy side was not that big. But at that time, this whole environment changed. And on one hand, this also had a change of the focus in skills like reading and numeracy, which are seen as preconditions for further learning of students. So, without reading abilities, textbooks and other areas can’t be read and understood. But also, economists were interested in relating educational outcomes to economic wealth. And this also resulted in a focus and change of interest of subjects to subjects that are useful for employability and regarded as preconditions for economic wealth. And here again, reading, numeracy, and also science became a focus of international assessments.
Will Brehm 10:32
So, it sounds like these assessments, they change for all sorts of reasons over time, and different actors maybe get more influence inside the testing agencies. And one of the moments that you referenced earlier was when league tables came about. And you were part of the IEA when this debate was happening. Can you give us a little insight about what were the different sides of the debate to either include or exclude league tables?
Dirk Hastedt 11:08
Oh, that’s a very good question. League tables were highly debated also in the studies, and that most of these debates took place in the 1980s, actually already, and beginning of 1990s. The IEA was always interested in understanding educational processes, and which education system is first or second or seventh doesn’t matter too much, actually. And it’s also not very informative for most policymakers. So, consequently, in the early studies, the IEA produced reports and related background instruments and background information to achievement but did not produce league tables. But then, the IEA realized that if they don’t do it, other people created league tables. And we found league tables in magazines, newspapers, and also in some academic journals. But these league tables were also then created in a wrong way. So, they were wrong. So, at that time –
Will Brehm 12:21
How were they wrong?
Dirk Hastedt 12:23
Well, we have a very complex system for assessing the achievement. For example, we use rotated booklets. That means that we want to cover different areas, and consequently, not all students are getting all items, because then every student would have to sit down for six or 10 hours to cover all the areas, which we, of course, do not do. So, that’s one thing. And secondly, also, we are taking a random sample of schools. And then, within the schools, we are sampling one or two classrooms with students. And the researchers call this a cluster sample. And a cluster sample is very different psychometrically from having a simple random sample. And we use different mathematical procedures to calculate, for example, standard errors, or if differences are statistically significantly different. And when not considering this, you get wrong results, and you consider results different that are statistically not different. So, there are some parts that really require a good and deep understanding of the projects and their procedures.
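To make the point about cluster samples concrete, here is a minimal Python sketch. It is not the IEA's actual procedure; it uses invented numbers (a between-school spread of 60 points, a within-class spread of 80 points, 50 schools with one class of 25 students each) and hypothetical function names. It compares a standard error that wrongly treats all students as a simple random sample with one based on the variation between classroom means.

```python
import random
import statistics


def simulate_cluster_sample(n_schools=50, class_size=25, seed=0):
    """Simulate one sampled classroom per school (all numbers are invented)."""
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_schools):
        # Students in the same school share a common school effect,
        # so their scores are not independent of each other.
        school_effect = rng.gauss(0, 60)
        clusters.append(
            [500 + school_effect + rng.gauss(0, 80) for _ in range(class_size)]
        )
    return clusters


def naive_se(clusters):
    """Standard error that ignores clustering (simple random sample formula)."""
    scores = [score for cluster in clusters for score in cluster]
    return statistics.stdev(scores) / len(scores) ** 0.5


def cluster_se(clusters):
    """Standard error from the spread of classroom means (equal class sizes)."""
    means = [statistics.mean(cluster) for cluster in clusters]
    return statistics.stdev(means) / len(means) ** 0.5


clusters = simulate_cluster_sample()
se_naive = naive_se(clusters)
se_cluster = cluster_se(clusters)
# Ratio of the two sampling variances, often called the design effect.
design_effect = (se_cluster / se_naive) ** 2

print(f"naive SE:      {se_naive:.2f}")
print(f"cluster SE:    {se_cluster:.2f}")
print(f"design effect: {design_effect:.1f}")
```

Under these assumptions the cluster-aware standard error comes out several times larger than the naive one, which is why a league table built with the simple random sample formula can report differences that are not really there.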
Will Brehm 13:46
So, these league tables that weren’t necessarily considering all these statistical and methodological considerations as the IEA was considering. These league tables were basically being produced with potentially wrong information. And so, the IEA decided that it needed to jump in and actually produce accurate tables?
Dirk Hastedt 14:08
That is right. So, when we saw that other people are producing, obviously, these league tables anyway, because of interest. Then the IEA decided, well, if it’s done, then it should be done correctly. So, we decided to include this also in the report.
Will Brehm 14:25
So, over the last 30 years or so, when these league tables have been included in the report, what sort of effects or outcomes have you seen being used of these league tables? Because this is, for the most part in like the popular press, this is what we read about. We read about which school system or country is ahead of another country. And so, these league tables have really, in a way, have become the defining feature of these international assessments.
Dirk Hastedt 14:56
Yeah, well, I think there are different components and different things that our international large-scale assessments are good for. On one hand, of course, this international large-scale assessment describes the status quo in terms of how educational practice in a country is conducted. And this helps researchers and policymakers to compare a country to other countries. And then there’s a comparative perspective. So, identifying differences in ways in which education is organized and practiced across cultures and societies. Then this gives you an understanding of where you are actually. Secondly, I think international assessments like our Trends in International Mathematics and Science Study (TIMSS), or the Progress in International Reading Literacy Study (PIRLS), or the ICILS study, and the Civic and Citizenship Education study that I mentioned, they are measuring trends. And this helps policymakers and researchers to understand changes in educational outcomes and processes, also in comparison to other countries’ development. And especially when curricula or educational policies are changed, it is important to monitor changes. Imagine a captain of a big ship on the ocean. Which captain would not look at his instruments or look outside the window to see where his ship is heading to? And I think there is also the value of maybe also the league tables if you take them and interpret them correctly, in its context, and also looking at trends over time.
Will Brehm 16:42
So, looking at these trends and monitoring and describing the status quo, have these helped policymakers, do you think?
Dirk Hastedt 16:53
Oh, sure. I think policymakers also have seen different impacts from these studies, actually. And it depends very much on the country itself, actually. Since we are research organizations, we are conducting these studies, and we also help countries understanding and interpreting the data. But we do not give any policy recommendations from our side. But we help the countries to do that. And the focus in different countries is very different. We have some countries which, even before the actual assessment start, when we look at the curriculum, they realized that their curriculum is not in line too much with other countries curricula in terms of curriculum expectations. So, some countries, when they looked at this, changed the curriculum expectations and have now higher expectations for the students to be also international comparable. Other countries looked at different sub-populations. For example, immigrants, or they looked at boy-girl differences. So, there’s a lot of questions around equity in education. And policymakers, after looking at the results, took measures to provide more equal learning opportunities to students. One example that we can see is that there was always a discussion about differences of boys and girls in mathematics and science in particular. And 20 years ago, in most countries, boys were performing much better than girls actually. So, researchers and policymakers asked, is this a natural given that boys are better in these areas? And actually, it is not. But it depends very much on motivation, and what are the textbooks, and what are the examples that they used in textbooks? Are they also engaging for girls? So, in a lot of countries, there’s now a shift also to have more relevant and interesting materials that also engaged student girls more in science and mathematics education with the outcome that now the girls are doing as good as boys in most countries. 
And in a lot of countries, girls are even outperforming the boys in mathematics and science. So, there’s a lot of different conclusions that policymakers have drawn from looking at the results of these international large-scale assessments like the IEA studies.
Will Brehm 20:01
Have there ever been any misuses by policymakers of large-scale international assessments?
Dirk Hastedt 20:10
Misuses. Well, I think it’s not only maybe a question of misusing the data, but maybe also misunderstanding the data. What we do is we do statistical analysis. And there’s always a certain error margin around it. So, coming back to these league tables, again, some of the misunderstandings around these league tables by policymakers, or also the general public, are that if a country is one score point better on a scale with a mean of 500 and a standard deviation of 100, it is better. But we always have a measurement error and sampling error around it. So, there’s some variance around it. So, if one country’s achievement is one score point better than another one, or if one country’s achievement increased by one score point, that doesn’t mean anything, actually. But it might just be a question of the error term around it. So, actually, no change has happened, but it appears as if there would be an increase. But this is statistically not significant, and consequently, there are some misinterpretations of these results if you look only at league tables.
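The one-score-point example can be illustrated with a short Python sketch. It assumes two independent country samples and a hypothetical standard error of 3 score points for each country mean; neither figure comes from an IEA report. The function name and the 1.96 cut-off (a two-sided test at the 5 percent level) are choices made for this illustration.

```python
import math


def difference_is_significant(mean_a, se_a, mean_b, se_b, critical=1.96):
    """Return (z, significant) for the difference between two independent means."""
    # For independent samples, the standard error of the difference
    # combines the two standard errors in quadrature.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = (mean_a - mean_b) / se_diff
    return z, abs(z) > critical


# One score point apart, each mean with an assumed standard error of 3 points.
z_small, sig_small = difference_is_significant(501, 3.0, 500, 3.0)
# Twenty score points apart, same assumed standard errors.
z_large, sig_large = difference_is_significant(520, 3.0, 500, 3.0)

print(f"1-point gap:  z = {z_small:.2f}, significant: {sig_small}")
print(f"20-point gap: z = {z_large:.2f}, significant: {sig_large}")
```

With these assumed standard errors, a one-point gap is far inside the error margin, while a twenty-point gap is not. Two countries that sit next to each other in a league table can therefore be statistically indistinguishable.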
Will Brehm 21:48
So, in your experience, how many policymakers actually understand the statistics behind these studies?
Dirk Hastedt 21:55
Well, probably not many policymakers. And actually, I don’t think it’s something that they need to understand, actually. But I think it’s also something that researchers in the field have to explain to policymakers. And this is probably one of the crucial aspects to communication between researchers and policymakers. Policymakers have their perspective, and they want to have a clear and straight answer. But educational systems are quite complex. And if you look at the outcome of the studies and what researchers think about it, their answer is usually not that simple. And simplifying the results of educational studies in a way that it’s not over-interpreting the results of the studies is very difficult and challenging. So, I think we need more and more to pay attention about how to communicate results and a good communication between researchers who have the technical terms, and policymakers and the general public.
Will Brehm 23:17
But of course, we also have to be realistic that policymakers are navigating domestic politics and may use these international assessments to further their own interests, even if they use a finding in a wrong way or something like that. That will happen, and I think many researchers have shown that that happens.
Dirk Hastedt 23:41
I think you’re right. There’s also, maybe some misuse from policymaker side. But my experience, actually, is that this happens sometimes in a way that policymakers want to look good from the outside. So, they want to have good results for their educational systems, which I think is very natural. Everyone wants to have good results from the work that you do. But I think when it comes to making use of the results, there’s a strong focus from a lot of policymakers that they really want to improve their system. And this can be only done if you really interpret the results correctly. So, maneuvering blind and pretending you’re good might be good for that day, if there’s a day of election, but in a long-term process, you need to understand the educational system. And I think this is also what policymakers understand. That they use the data, and we call it “data-driven policies,” to look what are the strengths and weaknesses of their systems, and then, by that, also try to improve their systems. And if you look at policies today, they really make use of the data with the aim to improve educational systems. And this is, I think, what policymakers are mostly doing.
Will Brehm 25:20
What I find so interesting about the IEA and the history that you’ve recounted is that in the beginning in 1958, doing these cross-national assessments was very much an academic pursuit. You know, is it possible? Can we statistically compare education systems in different countries? And it seems, over time, it’s shifted to being not only an academic pursuit but also very much an issue of policy, and it seems like the role of the IEA, in a way, has slightly changed.
Dirk Hastedt 26:03
That’s a good point, I think. I don’t know if the role of IEA has changed. But it’s clear that we have member institutions, and these member institutions are sometimes research institutes from different countries, and we have member institutions from more than 60 countries. But it’s also people from ministries of education, or institutes that are connected directly to the ministries of education. And surely, they want to learn which policies can help improving achievement in their countries, which is very different from maybe the researchers point who for purely academic intents how … want to understand how educational processes are working and to understand these processes more in-depth.
Will Brehm 27:05
It’s also interesting that originally the test was, mathematics was the subject. Because, as you said, it was, in a sense, “easier” to control for context and local variation, and language variation, for that matter, around the world. But today, you’ve now talked about all these different subject areas that are tested. How has the issue of context and variation been controlled in these various tests?
Dirk Hastedt 27:40
Oh, that’s a good question. What we are using in IEA is, we are looking at what we call the curriculum model. So, we are looking at the intended curriculum in the different countries. So, what they have written as policy documents, what should be taught in schools, and what are the aims. Then we look at the implemented curriculum. So, what teachers are actually doing in the different countries. And finally, we look at the attained curriculum. So, what have students learned. And this is what we measure with our assessments. But there are also these other fields like the intended and implemented curriculum, and we look at this whole process. And of course, when curricula are changing in participating countries, then also, there is a slight shift in what we are measuring in our international assessment. And I think what’s also very important is to understand that in these studies, we are not only having an assessment, but we also have a huge amount of background variables. So, when we are doing one of these assessments, we are looking at system-level information. So, the curriculum and other relevant information. We have a questionnaire to the school principals of the schools where the students are in about how education is organized, what’s the school’s size and background instruments on that. We have questionnaires to the teachers of the actual students so that we can directly relate teachers’ attitudes, perceptions, and background to student achievement. And then, we also have background questionnaires for all the students. And in grade four, we also have questionnaires to the students’ parents. So, we have huge amounts of background information to analyze what are the factors related to student outcome. And outcomes are still not only achievement but also opinions, self-confidence, etc.
Will Brehm 30:14
And this wide range of measures allows IEA, or the tests, to compare across nations and account for the cultural variation?
Dirk Hastedt 30:34
Well, the cross-country cultural comparisons are, of course, always challenging. Education always takes place in a different culture, in different societies, with a history and with a background. But what we can see is that there are a lot of similarities across countries. Sometimes more than you would expect at first glance. Let me cite one of the results from one of our recent studies, which was the Computer Information Literacy study, ICILS, where we looked at what matters most for the usage of computers and computer information literacy of the students. And we had 22 educational systems taking part in that study. And we found that in all these educational systems, the crucial points were the teachers and the teacher education and what they think about computers, and if they want to use computers, and how self-confident they were in using computers in their teaching. And that was the same across all 22 different educational systems. And we had participants from Europe, from Latin America, from Asia. So, for more or less all around the world, and you see that you find the same patterns across different cultures, which was a surprise to us. But also, probably, to a lot of countries because we often hear that, “Well, our educational system is very different. And we have a problem in our country that’s probably very unique.” And when you look at these international assessments, you can see that a lot of countries are facing very similar problems.
Will Brehm 32:45
So, you’ve been involved with the IEA for quite some time, and you’ve mentioned all of these different shifts that have happened. So, in the 1970s, the focus became citizenship, and in the 1980s, it became computers, and in the 1990s, you talked about this empirical shift. I wanted to see, kind of reading the tea leaves, do you see any shifts occurring now? Or do you envision any future shifts about what the focus will be for these international assessments?
Dirk Hastedt 33:20
Well, on one hand, I think that the current process will continue. But on the other hand, I think a huge influence currently is with the UN and the Declaration for the Sustainable Development Goals, where target four of the UN declaration is concerned with education. So, there’s also in all UN member countries, which means almost all the world, a common agreement that education plays a more and more important role. And when we look at the target four of the Sustainable Development Goals, one is universal primary and more and more secondary education. And here I see, for example, also a shift on the UN side, which before looked mostly at enrollment rates and now also targets minimum standards of outcomes in literacy and numeracy for all countries. So, I see that there will be an expansion also to more developing countries, but still with a focus on numeracy and literacy abilities. But under the target 4.7, it also targets, for example, global citizenship and sustainable development. So, it’s also seen that this is an increasingly important aspect, which we see. And I think that’s a very, very important thing to understand that education is not only a means of educating the future workforce, but education is much broader and also helps us to live together in peace and to also understand the needs of future generations. And there I see also a shift on looking also at global citizenship competencies around the world.
Will Brehm 35:54
Well, Dirk Hastedt, thank you so much for joining FreshEd.
Dirk Hastedt 35:58
Thank you very much. It was a pleasure to meet.