Over a hundred billion dollars are spent on international aid each year. Most aid providers undergo periodic evaluations to assess their support. Have their policies worked? What priorities have guided aid? And what practices have been effective?

With such large sums of money circulating in the evaluation process, an aid evaluation industry has emerged. Formal evaluations are undertaken by “experts” who are hired by companies that bid on evaluation contracts. Sometimes universities themselves bid on the same contracts. And professors navigate the tricky terrain of research-for-hire. Many of FreshEd’s listeners have likely participated in an evaluation of an aid project. I know I have.

My guest today, Professor Joel Samoff, thinks it’s long overdue to “re-think evaluations, from conception through method to use.”

Joel Samoff is Adjunct Professor in the School of Humanities and Sciences at the Center for African Studies at Stanford University. He studies and teaches about development and underdevelopment, with a particular interest in education, and with a primary geographic focus on Africa. He has recently co-written a report for The Expert Group for Aid Studies entitled Capturing complexity and context: evaluating aid to education.

CORRECTIONS [January 31, 2017]: In the podcast, I state that there are “hundreds of billions of dollars” spent on aid each year. That number is likely exaggerated. A more accurate figure would be a hundred billion dollars (see here or here). Also, I misstated Joel Samoff’s title. Since Stanford University retired the title “Consulting Professor” in September 2016, his correct title should be “Adjunct Professor.” I’ve corrected the blog post accordingly and apologize for the mistakes in the podcast.

Citation: Samoff, Joel, interview with Will Brehm, FreshEd, 58, podcast audio, January 30, 2017. https://freshedpodcast.com/joelsamoff/

Transcript, Translation, Resources:


Will Brehm 1:54
Joel Samoff, welcome to FreshEd.

Joel Samoff 1:57
Thank you, Will.

Will Brehm 1:59
Billions of dollars every year have been spent on development assistance or aid. So, how is aid evaluated? How do we know how that money is being spent and the effectiveness of that money?

Joel Samoff 2:14
In the current era, every aid program requires an evaluation; that has become standard practice. So, every allocation, whether it is from the United States, or England, or the European Union, or the World Bank, requires an evaluation of some sort. So, in principle, there is a pretty direct way to make that assessment. In practice, it is the aid providers who are also making an independent judgment about how well they are doing at whatever they think they are trying to do. And that likely has more impact than the formal evaluations.

Will Brehm 2:46
And so, what are the evaluations for?

Joel Samoff 2:51
To look at the specification of evaluations, you would think that they serve multiple purposes. Think about the aid relationship as a whole: aid is intended as a transfer of resources from a provider to a recipient. The provider is nearly always a country. There may be an intermediary, so that, for example, Sweden routes part of its aid through UNESCO, but the initial provider is almost always a country. The recipient is formally, generally, a country. And in that relationship, in the development of an evaluation, the claim or the expectation is that the evaluation will serve multiple purposes: it will help the funding agency know where its money has gone; it will help the funding agency have some confidence that it was spent for the purposes that were intended; it will help the funding agency see that the reporting that was required, or the measurements that were required, or the assessments that were required, were actually accomplished. In principle, the evaluation should also be of use to those who are the recipients. And the recipient in education is generally a government, that is, a ministry of education. But then ultimately, there is a set of schools or a set of teachers or a set of textbook publishers, or somebody who is responsible for doing whatever the tasks are that are being funded. And in the stated expectations, evaluations should be of use to all of those people, that is, from one end of the process to the other, from the provider to the recipients. In practice, for the most part, evaluations are done in ways that serve the needs of the providers, not the needs of the recipients, and in particular the need of the providers to justify what they have done. In essentially all countries, the aid agency must report either directly to a ministry or sometimes directly to a parliament about what it has done.
And to the extent to which they’re used, that becomes the main use of evaluations. We should, I think, back up a step to note that the comments I’m offering are drawn heavily from a synthesis of evaluations of aid to education in poor countries around the world, commissioned by a group in Sweden, that I did with two graduate student colleagues at Stanford. So, the comments I am making are based not only on many years of work on aid and evaluations of aid, but on an explicit synthesis of evaluations undertaken over the past decade.

Will Brehm 5:32
So, you were just saying earlier that the evaluations, in practice, are not really that useful for those receiving the money, as opposed to those providing the money. Is that right?

Joel Samoff 5:50
The evaluations in general are done in a way that meets the needs of the providers and doesn’t meet the needs of the recipients. In the simplest form, look at the way the evaluations are written. Think of an evaluation of aid to teacher education, or to assist preschool teachers in developing new pedagogical skills: the evaluations are written by experts in evaluation, generally for the aid providers, and they are not written in a form that is readily comprehensible to those who are the recipients of the aid. So, notwithstanding the stated expectations, in practice, there’s very little way for the recipients to use the evaluations.

Will Brehm 6:33
And, for the providers, you know, presumably a lot of these countries are giving money to multiple countries. Is there a level of learning about the practice of, say, educational development, and then trying to replicate some of these projects in other countries based on the evaluations?

Joel Samoff 6:56
In the era we live in now, there is a great deal of attention to what is called evidence-based policy and evidence-based practice. So the presumption is that if we are successful in getting good evidence about what was done, and therefore what accomplished its objectives and what did not, that can then help to shape whatever is the next decision about policy and about practice. In the way in which it unfolds, however, evaluations are mostly internal documents commissioned by funding agencies; they are reports to the funding agencies. And although sometimes they are kept in a kind of confidential mode, out of concern about the politics of reporting on things that have not gone well, for the most part they are formally accessible, formally public, but not easily accessed. They are the property of a funding agency and the Ministry of Education. And unless you know that the evaluation has been done, and what it is called, it would be very hard for anybody else to find it. So, in that respect, there’s very little mechanism for sharing, not only across agencies but often even within agencies. One of the things we did in the study was to spend a bit of time talking with agencies about how they use evaluations. And we found, and I have found in my work on evaluations over two decades, that there is very little internal learning within agencies from the evaluations that are undertaken, and even less across agencies. And if you think about that, there is a pretty good literature on what is problematic about aid. And what is striking about that is that the identified problems persist. One of the problems that is regularly identified is that the officers who are responsible for a particular aid program are rarely in place long enough to see the end of whatever it is they’ve helped to organize the funding for. And their successors, in order to make their own careers, need their own projects.
So officer A has a project in country X and develops a reputation based in part on the quality, in some sense, of that program. When officer A moves on and is succeeded by officer B, officer B has to develop her own programs; she will not get promoted on the basis of follow-up work on officer A’s programs. And therefore, there is a structured disincentive, or a structured pressure, not to be very attentive to the previous experience, and therefore not to learn from that experience.

Will Brehm 9:59
And so these evaluations that get done basically sit in places that are actually quite hard to find, and so they are hard to locate and read and perhaps learn from.

Joel Samoff 10:11
Well, yes and no. The evaluations are completed; they are submitted to whoever commissioned them, normally the funding agency, occasionally the Ministry of Education in a recipient country. At that point, they are documents that belong to whoever it is who commissioned them. They are often formally public. So certainly, for the Scandinavian countries, for example, the evaluations are all public. And it is certainly possible for you or for me or for anyone else, if you know about it, to go and request a copy of the evaluation, and increasingly they are available online. So, it is possible to get access to them with some effort. But it requires prior knowledge that there was a project, that the project had an evaluation, that the evaluation was completed, that it has gone through its review process, and that it is now available. So, we have gained ground over time: they used to be much more difficult to get at; now, if you work at it, you can get to the evaluations.

Will Brehm 11:11
Right, but you still have to do quite a lot of work to uncover these documents.

Joel Samoff 11:17
Well, yes, they are generally not published. And as evaluators often point out, given the work that they are required to do for the evaluation, they do not have very much time to turn the evaluation into an academic article, even if they are themselves academics. It is uncommon for an evaluation to result in an academic article.

Will Brehm 11:38
So what sort of methods are these evaluations using? You have studied evaluations and aid for many years now, so in general, what is the preferred method?

Joel Samoff 11:54
The funding agencies start with a notion that they want to know what works, and therefore they want to be able to use the knowledge of what works to shape future programs. And so, they are asking that question. What has happened in that realm, in the aid world, is an inclination, a strong preference, to point to evidence, and evidence that comes from systematic inquiry of some sort or other. One way to think about that is that in order to participate in a discussion, to make the case, to support one program or strategy or project over another, you need to be able to begin a sentence by saying, “research shows that…” and then you have to complete the sentence by pointing to some research that you think shows whatever it is you’re trying to support. And so, there is this emphasis on evidence-based work, evidence-based policy, evidence-based evaluation, evidence-based practice. And that then has followed a fairly narrow notion of what constitutes relevant evidence. If you think about what has happened in the social sciences in general, that is reflected in the aid world and in this attention to evaluations. And so there is a disinclination to gather the kind of evidence that would be, for example, participants’ reports on their experiences, or evidence that is generated by participants working as evaluators, that is, those at the bottom of the aid chain, and much more inclination to do what is now regarded as standard social science, which is a large-scale study of some sort or other, usually focused on what’s called impact, and often involving some kind of control or comparison, maybe a Randomized Controlled Trial. And that is regarded by many people as the strongest evidence that could be presented.
So in the statement, “research shows that…” if at that point one can point to an impact assessment, and particularly an impact assessment with a controlled trial, then that is deemed to be more persuasive than other kinds of evidence. In the work that we have done, we are skeptical of those claims; we do not find persuasive the argument that impact assessments with controlled trials are the only useful way of evaluating aid-supported projects. Indeed, we find that there is some role for that sort of evaluation, but probably a modest role in a limited number of evaluations.

Will Brehm 14:49
Why are Randomized Controlled Trials so attractive for this evidence-based policymaking or practice or evaluation?

Joel Samoff 15:00
The starting point of all that is a significant influence from the health sector. In the health sector, as I’m sure you and others are aware, what’s considered to be the absolute standard way to assess whether or not a new drug is effective at doing what it’s doing is to do a double-blind Randomized Controlled Trial in which not only the participants but also the doctors who are overseeing it don’t know who’s getting the new drug and who’s getting the comparison drug. And so, there is an effort to try to reproduce that sort of arrangement in education, which, in our view, does not work very well. That is intended to try to deal with what is common in education, that is, that every outcome that one could measure has multiple causes. Often people use, for example, scores on an examination as an outcome measure. Does reading strategy A work better than reading strategy B? How do we know? Well, we will use reading strategy A in one school and reading strategy B in another school, and then look at the exam scores of the students in the two schools. And if the exam scores of the students in the school where reading strategy A was employed are significantly better than the exam scores of the students where reading strategy B was employed, we can reasonably, in this view, conclude that reading strategy A is better than reading strategy B. The problem with that for educators is that examination results have many sources. In the case of an examination of reading skills, the reading strategy employed in the classroom in the year or years preceding the examination is only one of many influences on the examination scores in the end. And so, the notion is that these controlled trials, and especially Randomized Controlled Trials, can gain ground in eliminating what are thought to be alternative causal explanations, narrowing attention to the one that is being assessed or evaluated. Mostly, we think that does not work very well.
There are several reasons why it doesn’t work very well. One is that Randomized Controlled Trials are very expensive, so you can’t do very many of them. I have heard now several times, among recipients of foreign aid in Africa, the argument: why should we spend three or four or five percent of the aid budget on evaluations? Hire three more teachers instead, and that will have a much better impact on outcomes than more expensive evaluations. A second problem is that randomization in education is generally either not feasible or extraordinarily difficult. There are practical problems, there are political problems, there are ethical problems in trying to randomize the introduction of, in my example, the new strategy for teaching reading or teaching mathematics, and trying to do what the researchers or the evaluators would call holding everything else constant. Basically, that means saying to teachers: do not change anything; we only want to change what we are trying to measure; do not do anything else differently. And in practice, in education, that does not work very well. We want people to continue to experiment and evolve, and all of that. There are also political issues, and that is that the people who are responsible for education may make decisions about where to introduce new strategies, and where not, that do not fit very well with the evaluation strategy. The findings that come out of those controlled trials are generally very inattentive to context. And yet we know well that the context in which a new strategy is introduced may be far more important than the strategy itself. So, for example, if you introduce a new strategy in a school that is fairly well resourced and then try to compare that with the introduction of a strategy in some other school that’s less well resourced, it’s difficult to know whether it is the school itself that leads to the difference in exam scores, or the new reading strategy.
And while the evaluators will try to, as they call it, “hold constant” the factors they think are important, there are always choices being made about which factors are important to hold constant, because it’s never possible to have the equivalent of two experimental labs, or two laboratory-like settings, in which to do the comparison; the real-world settings are never that similar. Therefore, the evaluators always have to make choices about what to try to hold similar and what to ignore. And we also found in our review that there have been some very careful reviews of reviews, that is, reviews of evaluations and reviews of reviews of evaluations, by people who have sought to look at impact assessments with Randomized Controlled Trials. And if you look at three or four efforts to do that, you find that they come to very different conclusions about what works and what doesn’t work. So, what one would expect to lead to consensus, that is, the same approach, has in practice led to discord in findings and discord in conclusions. It has mostly to do with which study is given greater weight and which is given less weight. But the outcome does not help us better understand the initial question: what is more or less effective?

Will Brehm 21:02
I just, you know, I just keep coming back to the question of why Randomized Controlled Trials are seen as the method to use in evaluations of educational development.

Joel Samoff 21:18
Well, in addition to the influence of the health sector, and in addition to the general notion of science that has influenced how one goes about generating evidence, there is a strong notion of detachment. There is an effort to define objectivity as noninvolvement or detachment. And so there is a kind of medical metaphor in which people are looking at education systems in Africa: there is an external diagnostician, generally an external funding agency and its representatives, who are assessing what goes well and what doesn’t go well in an education system in Africa, for example, and then, having made that assessment, prescribing some remedy, often a foul-tasting remedy that the country is required to swallow. But the notion is that the diagnostician should be external to the environment, that being a participant in the environment can be corrupting. And therefore, an impact assessment with a Randomized Controlled Trial is intended to reduce the role of the evaluator, to reduce the possibility that the evaluator will contribute a bias or a tilt in the evaluation in one direction or another. In our assessment, the cost of that is the loss of context-specific information, and the loss of the insights of the participants in whatever the aid-funded project is that is being evaluated. So whatever gain there might be in limiting evaluator bias, there is a much bigger loss. And the loss reduces the quality and utility of the information that is generated.

Will Brehm 23:16
So presumably, evaluations have been conducted on educational projects for as long as educational projects and aid have existed. What was the method of evaluation before Randomized Controlled Trials became so predominant?

Joel Samoff 23:33
Go far enough back, to the early days of foreign aid. I am most familiar, over a longer history, with the Africa data. Now, if you go back to the evaluations of education projects in the early era of foreign aid to education in Africa, an evaluation was an expert who was sent out by the funding agency to look at things and who wrote a report, or it was sometimes the representative of the funding agency in the country in which the project was being implemented who wrote a report. So, it was an expert-observer kind of evaluation. There is a bit of history of evaluations in which there was an effort to pull together some external observers and some participants in the project and to draw evaluative insight from that process. But fairly quickly, the shift has been toward external evaluators. And even those funding agencies that had their own well-developed evaluation departments have now largely shifted to what they term ‘outsourcing’; that is, they hire somebody else to do the evaluation. So Sweden, for example, which had a very thorough and well-respected evaluation department in the Swedish aid agency, has transformed the evaluation department into a unit that commissions evaluations and then monitors the evaluations that are done by the people who are hired to do them. And so, in the current era, there is generally a published specification of what is sought. And the evaluators, which are evaluation firms these days, respond, and then either win or do not win the contract to do the evaluation.

Will Brehm 25:34
So, there is a whole evaluation industry, it sounds like.

Joel Samoff 25:38
There is a very significant evaluation industry, and that itself, we think, is problematic. It is not that the people who are the evaluators are not competent; generally, we think they are. But they are now embedded in a process that requires a continuing flow of commissions in order to fund the company that is doing the evaluations. And that then creates a momentum in which the company must be attentive to whether or not it is doing things that will reduce or damage or undermine its reputation with the funding agency; it has to be worried constantly about whether it will get the next contract. It has also resulted in very little self-reflection among evaluators. So, using the examples I used before, an evaluation that is commissioned to see whether or not a particular strategy of teaching mathematics is better than some other strategy, or whether new textbooks or some way of organizing classrooms or some other pedagogical innovation is worth pursuing, may focus on that, but doesn’t ask many questions about the nature of the evaluation itself, or about the aid relationship, or about the role of aid in whether or not a funded initiative is more or less successful.

Will Brehm 27:15
Are universities tied up into this aid industry that you describe?

Joel Samoff 27:21
Universities are tied in in several ways, some more direct and some indirect. On the African side, to use the African examples again, where funding for education research is very limited, researchers in African universities find themselves pushed to become consultants to funding agencies that will provide support that they can then use to do research. And when researchers become consultants, there are consequences for the research process. So that is a major effect on the African side of the relationship. The universities in the providing countries, that is, the countries that are providing the foreign aid, are often the source of the evaluators, or at least some of the evaluators or some people who participate in evaluation teams. And that then becomes a source of revenue for those individuals, and occasionally for the university as well. And the university may then find itself, rather like the private companies, motivated to continue that revenue stream and to worry about losing that revenue stream through the behavior, the reports, the findings, the presentations of its evaluators.

Will Brehm 28:37
So, thinking about all of these, well, you know, problems that you have pointed to in the aid evaluation, what sort of recommendations would you have going forward? Like how would you like to see the aid industry change?

Joel Samoff 28:55
Well, now you are asking a question about the aid industry or aid in general, and that is a different set of work that I have done, focused on the aid process. But let’s talk for the moment about evaluations. It seems to us, from our work, very clear that what’s needed is a set of evaluation strategies or, in what’s more common terminology now, a portfolio of evaluation strategies, so that when a funding agency has agreed to support a particular initiative, rather than using a kind of standard approach (“this is the standard evaluation, and we use it for all aid-funded projects”), the agency can have four or five different sorts of evaluative approaches, and say that approach B is the one that makes sense in this circumstance, not approach D or E or F or G or R or Q. So, one finding, one observation, one recommendation is that funding agencies need a set of strategies. And second, they need to be willing to draw from that set the particular approach that makes the most sense in the current circumstance. A second observation is that while the evaluations do need to serve the purpose for the funding agency of showing how the money was spent, and whether it was spent in the way intended, the funding agencies, if they’re serious about accomplishing whatever it is that they’re funding, need to reorient much more of the evaluation process toward outcomes that are useful to those who are the recipients of the funding, not just the providers of the funding. Because unless the evaluations serve the recipients of the funding, and unless they are perceived by the recipients of the funding as evaluations that they have some influence in, some control over, some role in, some participation in, then they will be a kind of external event. From the participants’ perspective, it is almost like something we have to endure.
Think of yourself as a teacher in a setting where you are getting some money to introduce new textbooks. Part of the cost of getting the money is that periodically, somebody will show up to ask you how well you are doing and to administer a questionnaire and maybe to administer a test. And that is just the cost of doing business. It is not your evaluation as a teacher, it is not an evaluation that you had much role in, and it is not an evaluation that you can use to do much to improve whatever it is you are doing, be it changing your way of teaching mathematics or introducing a new textbook.

So, one of the strong findings and recommendations from our work is that much more of the evaluation process needs to be attentive to the needs of the recipients. And that means there needs to be a significant shift in the direction of increasing what in the aid world is called ownership. Just as the aid-funded project needs to be owned by local people, the ones who are administering it and implementing it, in order to be successful, so too the evaluation needs to have a strong degree of local ownership to be effective for those who are the recipients of the funding. And that leads then to another recommendation, and that is that there is much more room for participatory evaluations than is currently the case. That is in some ways the opposite end of a continuum from Randomized Controlled Trials, where the role of the participants is to be held constant rather than encouraged. In our reading of what’s happening and what will be useful, it seems clear to us that there needs to be an increased role for participatory evaluations. And while there is a risk that that might have its own bias or its own tilt on the outcome of the evaluations, that, we think, can be managed in pretty straightforward ways. And the benefits of doing that far outweigh whatever the risks of that bias are. The bias can be managed, but even if some bias persists, that is less problematic than not having the participants involved in the evaluation.

Will Brehm 33:33
Are you hopeful that aid evaluation will move in this direction?

Joel Samoff 33:40
Am I hopeful? Well, I think the pressures on the funding agencies not to move in that direction are very powerful. And so, I guess, in the short term, I am not very optimistic about progress in those directions. In the longer term, I think it is likely that there will emerge (it has not emerged very much yet, but it will) stronger pressure from the recipients of the aid funding for a revision of the evaluation process. Every few years there have been major conferences about how to make aid work better. Mostly they do not have much impact on aid, but there’s some. And my expectation is that in some future conference, there will be greater attention to evaluation, driven by concerns of the recipients, which will then push the providers to reorient the evaluations, a bit at least, to be more useful to the recipients.

Will Brehm 34:46
Well, Joel Samoff, thank you so much for joining FreshEd. It was really a pleasure to talk today.

Joel Samoff 34:52
Thank you for inviting me.

Translation sponsored by NORRAG.

Want to help translate this show? Please contact info@freshedpodcast.com

Have any useful resources related to this show? Please send them to info@freshedpodcast.com