Today’s topic: the dreaded subject of tests.
I hated tests growing up. They made me feel physically ill.
But we aren’t going to look at the types of tests I disliked so much, those given by a teacher to her or his students. We aren’t even going to look at standardized tests administered within a single country.
Instead, today’s show focuses on tests that are administered around the world. We call these types of tests international large-scale assessments. One of the most popular today is PISA, the Programme for International Student Assessment. PISA tests 15-year-olds’ scholastic performance in mathematics, science, and reading. The latest test, in 2015, was administered in 72 countries.
Think for a moment of how complex it must be to create, administer, and interpret PISA across 72 countries. The test must be reliable in different contexts; it must successfully recruit national government officials to help collect data; and it must rely on a small army of statisticians to discern what the test results actually mean. For many, the benefit of a test like PISA is that it allows governments to make evidence-based policy. After learning where its students sit globally, education officials from one country can enact new and hopefully better policies to improve student learning.
Sounds good, right?
But that’s not the whole story. Cross-national assessments have produced countless controversies, some within specific countries and others in the academic literature.
With me today is Gustavo Fischman. He’s been studying this subject for some time. In November 2016, he helped organize a symposium at Arizona State University on these so-called “global learning metrics.” You might remember a few FreshEd podcasts on the subject. He has also recently co-written a working paper on the topic for the Open Society Foundations, which will be released later this year.
Gustavo Fischman is a professor of educational policy and director of edXchange, the knowledge mobilization initiative at the Mary Lou Fulton Teachers College at Arizona State University.
Citation: Fischman, Gustavo, interview with Will Brehm, FreshEd, 70, podcast audio, April 24, 2017. https://www.freshedpodcast.com/gustavofischman/
Transcript, Translation, Resources:
Will Brehm: 2:00
Gustavo Fischman, welcome to FreshEd.
Gustavo Fischman: 2:03
Thank you for the invitation. I’m really pleased to be sharing these ideas about global learning metrics with the FreshEd audience.
Will Brehm: 2:16
So, you and a team of people have been working on global learning metrics. You organized the symposium last year, and you’ve now been working on a working paper for the Open Society Foundations on the subject. So, you have quite a lot of knowledge, I would imagine, of these global learning metrics. But before we jump into that topic, I want to, in a sense, take a step back and talk a little bit about international large-scale assessments. What are international large-scale assessments? We hear a lot about them in the comparative education world, but for, you know, people outside of that field, what exactly are these assessments?
Gustavo Fischman: 3:00
Since the mid-1960s, there have been attempts to measure, through cross-national and comparative efforts, the state of access to education. First it was how many children and students were incorporated into educational systems worldwide. Later, it was access plus learning. So those international comparative assessments of access to education, and of what type of learning takes place, are broadly categorized as international large-scale assessments.
Will Brehm: 3:52
Right. So, I mean, I get when we think about access to education, that seems like a fairly straightforward measurement, right? Like if someone is in school, then that counts as access. Learning seems like it would be a lot harder to measure.
Gustavo Fischman: 4:13
Yes, that’s the answer. Education is a local phenomenon. The different ways of learning, the different emphases of national and local curriculums, and the different styles that characterize national systems of education are really difficult to measure. And so for the last four years, we have been dealing with this and trying to improve the measures and the metrics, to understand to what extent it is possible to make comparisons about, you know, the depth of educational systems.
Will Brehm: 5:03
So, in what ways have these large-scale assessments attempted to measure learning?
Gustavo Fischman: 5:10
The first ones, and a large number of these measures, look at language skills and math, and then the area of sciences. But there are also studies that look at citizenship, and there are studies that look at, you know, creativity if you want. But most of them, and particularly the ones that are becoming more and more popular in the media, such as PISA, look at outcomes in language, math, and sciences.
Will Brehm: 6:01
And are these seen as the, you know, most important subjects, or are they simply the ones that are easiest to measure?
Gustavo Fischman: 6:08
Both. On the one hand, language, reading, writing, and math are foundational, so it’s very important to assess the level of mastery of those skills. At the same time, those are the ones that we assume most countries are emphasizing and giving priority. So in that sense, it is easier to make comparisons and to establish the validity of the measures used to determine to what extent the students in Finland or the students in Zambia are learning similar content and acquiring more or less the same skills in those areas.
Will Brehm: 7:08
And this is all done at the national level. So with this data, we can compare how nations do on some sort of learning metric, but can we get further down into, say, states or provinces or even individual schools?
Gustavo Fischman: 7:28
Yes, you can. Depending on the different studies, you are going to have different levels of aggregation. At the individual school level, you probably have issues of confidentiality to consider. But yes, you can disaggregate the data. With PISA, you can look by states and by regions; in many of those studies, you can look at different regions within the country. So different countries will use either the national aggregate data or look at the smaller units.
Will Brehm: 8:19
What are the different tests that exist at the international level? You’ve mentioned PISA, the Programme for International Student Assessment, which is run by the OECD, but are there other tests that are administered cross-nationally?
Gustavo Fischman: 8:33
There is a long list; I’m going to read the names of a few of them. You have the World Education Indicators survey of primary schools. You have the studies of the International Association for the Evaluation of Educational Achievement: the Progress in International Reading Literacy Study (PIRLS); the Trends in International Mathematics and Science Study (TIMSS), another one that is very famous; and the International Civic and Citizenship Education Study. And then you have regional assessments. In Latin America, you have the Latin American Laboratory for Assessment of the Quality of Education. In Africa, you have the Southern and Eastern Africa Consortium for Monitoring Educational Quality. In India, you have the Annual Status of Education Report survey. Then you have the Early Grade Mathematics Assessment and the Early Grade Reading Assessment. There is now a new PISA study, PISA for Development, and there are studies looking at early childhood education. I’m sure I’m not mentioning all of them, or even all of the most important, but these are ones that are popular, well known, and mentioned in the media.
Will Brehm: 10:16
What countries are involved? Are these examinations reaching across the world into most countries? Or is it a subset of countries? And relatedly, does a country take multiple exams, participate in multiple international assessments?
Gustavo Fischman: 10:33
Yes. The countries that participate in the OECD could be involved in three or four of these studies. In the 1960s, there was only one study, with five or six countries. Today, more than 90 countries are participating in several of these studies. Some of the studies take place every four years, others every three years, others every six years. So one of the consequences of these studies, in addition to providing more or less rigorous, interesting, and controversial information about education at the global level, is that they also created an industry, an evaluation industry, and a group of experts that didn’t exist 30 years ago. Today, this is an industry involving offices for research and evaluation that didn’t exist 30 or 20 years ago; in Latin America, for example, most countries started to have these offices looking at this data only in the last 15 years. And countries are trying to participate: the curve showing the number of countries participating in the studies keeps growing. Although now, due to the controversial nature of the studies, some countries are questioning the benefits or trying to opt out; not all countries stay in these measurements forever.
Will Brehm: 12:42
So before we jump into the controversy that some of these tests have created in different national contexts, I just want to ask a little bit about this idea of the industry that these large-scale international assessments have created over the last 40 years. How does the money work? If it’s an industry, where is the money coming from? Who is making money off of these tests?
Gustavo Fischman: 13:08
If a country wants to participate in any of these tests, it generally has to pay for the test. Some of these tests are quite affordable, or they are administered for free; for PISA you have to pay, and for PIRLS I also think you have to pay. So that is part of the debate about how beneficial these examinations are or not. But there is a cost involved in developing them. These are very complicated studies; this is not an easy endeavor. So I’m using the word “industry” to describe the size, not in a pejorative way, and I’m not trying to diminish the expertise and good intentions of the great majority of people working here. But the model of educational accountability at the level and scale that we’re seeing today is a relatively new phenomenon. If there is something important to recognize, it is that today we can make comparisons, appropriate and inappropriate comparisons, because we have these instruments and this data, and 25 years ago we didn’t have that, so we couldn’t make those comparisons. It is very difficult to administer this type of examination, and it is also very difficult to interpret it appropriately; there are issues of validity and reliability in these tests, and this is really complex. That is why the industry needed to develop a large group of experts studying these issues. There was no literature on these topics 30 years ago; today there is a very large literature.
Will Brehm: 15:36
So, what are some of the benefits that people who have participated in this very large industry, as experts or as participants in these tests, see? What are some of the perceived benefits of participating in cross-national assessments and, perhaps, of the ability to compare?
Gustavo Fischman: 16:01
The simple answer is that they allow different governments and different societies to access data that lets them make decisions, based on data, about the well-being of their educational systems. That’s the theory of action. If you know how well or not your system is performing, you can make policy changes and you can allocate resources in a more efficient, more effective, or more equitable way. So, depending on the intentions of the governments and the stakeholders, that’s the basic idea: we need to know, and these international comparisons are a good way of accessing information that is always very difficult to obtain.
Will Brehm: 17:09
So, are there examples that are typically used as success stories of how a country has used cross-national data to make data-informed decisions that improved its system of education?
Gustavo Fischman: 17:23
Well, the one example that is mentioned in the literature quite often is the famous “PISA shock” in Germany. When the results of the first PISA study were released in the early 2000s, Germany was shocked to see its system not performing at the top level, or at least better than it did. So, based on that shock, the society and the government of Germany examined different alternatives and changed the curriculum and changed practices. In the United States, there is constantly discussion in the newspapers, in the media, and in every single policy debate about how to regain the top spot as an educated society. And they use the information from PISA to inform, or to increase, the levels of accountability and the use of testing to make high-stakes decisions. Then there is the industry around Finland as the top-performing country, or people talking about the Singaporean educational model. So yes, we have plenty of countries that are relying on the information derived from these studies to make decisions, sometimes in ways that are surprising.
Will Brehm: 19:20
Such as?
Gustavo Fischman: 19:21
Well, the creators of these assessments are in general very cautious, and they don’t say “use the data of the test to change your policy,” because they know that some of the comparisons are impossible, and the way countries interpret the data is always informed by their national policies. So, one thing is the creators of the assessment and how they use the data, and I think that is very different from how the different countries are using it. I want to be careful there and not make gross accusations. Independently of that, the cycle of government attention is not the same as the cycle on which the test is administered, so we need to put the data in the policy context of each country. What I’m trying to say is this: countries with good performance have very different policies, and countries with very low performance on the test also have very different policies, or the same policies. There is no causality between good results and a set of policies, or bad results and a set of policies; you have a very mixed situation, even by region. Finland outperforms. The one classic comparison is between Finland, Sweden, Cuba, Chile, the US, and Canada. You might say, okay, all these countries should align and start having more or less the same policies. But they have very different policies: Sweden, the US, and Chile, for long periods of time, increased accountability and choice and increased processes of privatization.
And, supposedly based on the results of the assessments, you don’t see an improvement in educational performance in those countries compared to the other three countries, which increased public presence, teacher training, and different types of curriculum, and got better results. But everybody says the same thing: we have to be very cautious about making gross comparisons. The problem is, you open the newspapers of every country after PISA or TIMSS or PIRLS, the results are there, and very inappropriate comparisons always appear on the front page.
Will Brehm: 22:46
And it’s usually in like some very simple table.
Gustavo Fischman: 22:49
Yes, yes. It’s always “my country is doing better or worse than your country,” and blaming. In the paper we worked on, we use the idea that maybe these are very good assessments in search of a good problem to solve, but the problem they should be solving is not there. So, it is a complicated situation. We know how important cultural, social, political, and economic conditions are in relation to the performance or achievement of any student in any school. So, if you look at these assessments without considering all these other contextual elements, we have problems. These international assessments always involve translations, and we know that translations are also very problematic. And there is chance: things may happen in a country in that particular year that will affect performance. If we don’t consider those elements, our understanding of the results could be very simplistic. I think these are conditions affecting education in general, not just the use of these assessments.
Will Brehm: 24:34
Well, yeah, I mean, it seems like there is this hope that education is just some sort of technical issue: if we get the right policy and the right teacher education, then the students are going to, you know, have more knowledge, and that knowledge is going to translate somehow into a better society. It’s often seen as a technical problem, but that view sometimes forgets how messy education can be politically and how many different interests are at play inside education systems. And I would imagine that complexity gets compounded when you are trying to compare countries around the world.
Gustavo Fischman: 25:20
Yeah, our analysis of the data and the literature on these topics shows that, on the one hand, everybody asks for caution: let’s be careful in assessing this. And the interpretations in the more popular, less specialized literature completely ignore those calls to be cautious. If you think about the case of the United States, assessments and models of high-stakes testing for accountability have increased tremendously in the last 35 years. Based on this model of action, you should see better performance, because now we have more data and more assessments. If you did well in the past, you will be motivated to continue to do well; and if you don’t do well, then you know what you’re not doing well, and you should be motivated. Well, we have 35 years of data that say, no, it’s not working, but we keep increasing the demands. One of the elements happening there is a shift. First the question was: were the students in school? Yes. Now that they are in school, what are they learning? That’s one component. And now there’s a movement to shift from what the students are learning to what the teachers are teaching, and to the value added of each individual teacher to the learning of the students. Anybody who has taught something in a school knows that learning depends on many, many factors. Teachers are important, but they don’t account for everything. Some people would say 20%, others 30%, but it is definitely not the dominant factor. It is important, yes, but it’s not the exclusive factor that will explain students’ learning. So, again, I am seeing this development with a lot of concern, not because the assessments are negative in themselves, or because we shouldn’t assess; no, no.
Assessing is good, but the way we are using assessments in the middle of very controversial policy agendas is really complicated. So we need more literacy and more debate about what the appropriate and inappropriate uses of these assessments are. One thing that seems very clear in the data is that we will continue to use international large-scale assessments and global learning metrics for the foreseeable future. It’s not a phenomenon that will disappear. So, we need to use them in the best possible way, preventing excesses and poor or simplistic interpretations that just advance different policy agendas.
Will Brehm: 29:34
Do you have recommendations for how these large-scale international assessments can be used in a more appropriate way than maybe they have been in the past?
Gustavo Fischman: 29:46
My recommendations, and I’m not criticizing the agencies as if they’re not doing enough: every time there is a release of a new report, you see the organizations sending policy briefs, explanations, and long documents with lots of footnotes about how not to say inappropriate things about the results. But the translation of that into newspaper headlines is completely different. So, a lot more education on how to use them, that’s for sure. Better preparation in terms of the different responsibilities, and of who is benefiting from these. In many cases teachers have to administer these assessments, and they are often forced to teach things that they don’t consider relevant. Their curriculum gets displaced by these things, and I think we need a better situation. On the other hand, we know that the farther from instruction these assessments happen, the less useful they are for teachers and for students’ learning. So, do we need all these technologies? Seriously, do we need all of them? They cost money. We need a discussion about how culturally relevant and appropriate these tests are, what level of sophistication we need in measuring mathematics, and why we couldn’t also look at creativity, or citizenship and engagement. If all these tests are forcing, or rather, not forcing but incentivizing or motivating a focus on language and math, then other areas of curriculum and education are going to be neglected.
And we see that movement. Lots of our colleagues in the field of international and comparative education say, well, at the global level those are the priority areas and we should focus on them. I understand that, but I’m also very concerned about using schools in such a utilitarian way and neglecting all the other areas that are very important. And we see this happening in many places.
Will Brehm: 33:10
Yeah, and I always think about the student in school. How many tests are students now taking? They’re taking their national exams, whatever that entails, and each nation probably has a different system, and then there is this battery of international exams. I mean, I hated tests when I was in school. I used to play hooky; I would skip days when I knew I had to take a test because I would get so nervous. I just can’t imagine what it would be like to be a student now and have to take so many examinations.
Gustavo Fischman: 33:48
And on top of that, you have increased pressure from Ministries of Education and Secretaries of Education to show high performance. Nobody wants to show lower performance, and we know that the more emphasis you put on high performance, the more the tendency toward, and the possibilities of, corruption and direct cheating increase. Again, in the United States we have people in jail for cheating on those tests, the case of Atlanta, and lots of incentives to perform well that are creating problems, and then you don’t know what you are assessing. Dr. David Berliner, a colleague at ASU, shared with us his analysis of PISA. He showed how everybody complains that the United States is number 24 or 34 in PISA and how poorly we’re doing, but the picture changes when you disaggregate the data, for example by the percentage of students receiving free lunch, an indicator of poverty. He showed that Asian American students outperformed the students in Asia, so he was saying, well, our Asians beat the other Asians. And if you take Massachusetts alone instead of the whole country, Massachusetts performs as the second- or third-best education system. So this is further evidence that when you take the country as a whole, you mask the differences within it. This is another concern for countries that are multiethnic and plurilingual, with very diverse populations: to what extent are these large-scale assessments forcing a more homogeneous curriculum, and leading us to ignore linguistic, ethnic, and racial diversity? Schools are very complicated organizations, and they have local cultures that are important and relevant; we need to honor them, not ignore them. But when you start looking at these issues at the global level, those seem to be minor details.
Will Brehm: 36:56
It is so fascinating to think how complex it is to manage a school system locally, but also nationally, and now we’re adding in these international players like the OECD and the IEA, and all sorts of other agencies that are administering these large-scale international assessments, and the power relationships are getting so… Well, it’s new in many ways, and these government ministry officials have to balance so many competing interests. And ultimately, the schools experience all sorts of consequences as a result.
Gustavo Fischman: 37:38
Yeah. And add to that: somebody is producing the test and charging money for it, somebody else is going to produce the curriculum that is better aligned to those tests and is also making money, and there are different policy agendas encouraging the use of these tests, which are also producing different effects. So, I’m not saying that because you administer a test, you are immediately aligned with one policy or another. That’s exactly what we couldn’t find: within the same region, you are going to have countries adopting completely different policies with very similar results. So you cannot see causality there. But there is an alignment, and there are some forces pushing for one particular model of reform that increases accountability, particularly blaming teachers as kind of the easy target. If the schools are not doing well, well, it must be the teachers, and I don’t think that’s a fair explanation.
Will Brehm: 39:11
Right. Well, I mean, it is a fascinating topic, and it seems like it’s going to be with us for the foreseeable future. So, we look forward to more of your writings on the subject, because it seems like there are so many different areas to explore. But Gustavo Fischman, it was really great to talk today. Thanks so much for joining FreshEd.
Gustavo Fischman: 39:31
Thank you.
The power and perils of international large scale assessments