‘Assessment’

Samuel Gerstin
Jul 22, 2020

My job title once included ‘Monitoring & Evaluation’; today it includes ‘Monitoring, Evaluation & Learning’.

In practice, I never describe myself as a ‘monitor-er,’ ‘evaluator’ or ‘learner.’ I never draft a ‘Monitoring Trip’ into a scope of work for visits to overseas offices. I haven’t yet claimed to administer a program ‘Evaluation.’ And I have yet to justify ‘Learning’ as the result of any data analysis, report or project-level activity I’ve led (or in which I’ve otherwise been engaged).

Rather, there is another word I always return to: Assessment.

There are several reasons I feel the term ‘Assessment’ more appropriately captures what our sector presently identifies as ‘MEL’ functions.

For better or worse, ‘monitoring’ and ‘evaluation’ carry an accountability stigma. Both words are associated with top-down judgment against predetermined criteria of what is to be valued. In other words, no matter how nuanced the takeaways and analysis, a ‘monitoring’ or ‘evaluation’ effort presents a judgment: the realized value was positive, or the realized value was negative. This is contrary to examining what is to be valued: to exploring more deeply the nature and essence of the program itself, as implementation unfolds and brings ever-new considerations to the fore. The act of exploring just what it is about a given program that makes it what it is — which our development sector seemingly cares most about these days — more closely resembles an ‘assessment.’

‘Learning’ as a sector discipline has, in my opinion, lost some touch with reality. If all development organizations, public and private sector alike, had meaningfully and systematically ‘learned’ from their programming (we each employ Learning Specialists*, after all), then all involved would be able to articulate what this learning is, and how it reflects upon our sector. Yet we cannot. All I can conclude is that ‘Learning’ is at once thrown about too cheaply, and at the same time held to an ideal of advancing sector-level truths…an ideal impractically removed from program-level assessment.

(Sidebar: here is how I would define ‘assessment’ in the context of program implementation. The assessment process directs a team to explore what is driving their program in the direction it is headed; what this exploration reveals should be responsive to what the program values and needs to know.)

Let’s dive further into each term: Monitoring, Evaluation, Learning.

Monitoring. Indeed, I have yet to describe a country-level program visit or its deliverables as a ‘monitoring’ or ‘tracking’ effort. I don’t go to track or document; that work is already handled by the team. I do visit teams to speak with them about their program, gain a sense of what information is being tracked, and help determine whether it is bringing clarity or insight to daily decision-making. Ultimately, I assess whether what is being tracked through routine program functions is of value.

The term ‘monitoring’ further implies we as development professionals stand apart from the data — that data is not to be considered beyond its face value in accounting for a given metric. To merely ‘monitor’ information thus distances us from the crucial step of interpreting** that information, from extrapolating what it is about the datum we truly care about, for our team, at this moment. Take, for example, a datum reflective of a hypothetical team’s case management system: intake percentage (or similar). We can verify stability or a rise in intake numbers and conclude — well, we’re on track. But is this fact revelatory? (Assume the team already has a firm sense of their intake, considering they are constantly busy with case management.) Is this conclusion what we truly care about? (Assume the team figures to sustain a healthy case portfolio…their donor keeps funding its operation.) What about the fact that the full team feels the system’s intake numbers inadequately reflect its effectiveness? (Didn’t they indicate their partners rarely, if ever, mention the presumed influence of their case management?) In short, the intake numbers are not bringing the insight the team is looking for; it may be worth re-envisioning how the team uses the datum, or looking at another datum altogether.

Consequently, I would not monitor whether team data is accurately tracking the program’s case management system rollout (it is); rather, I would assess what it is about this data that is or is not meaningful.

Evaluation. At the outset of this post, I noted the presumed objective of ‘evaluation’ is to render judgement about a program. Did the program achieve X, Y and Z — yes or no? I appreciate the utility of rendering judgement. I also recognize good-practice guidance would encourage an evaluation to go beyond the simple yes/no response, to effectively capture just how and why the program did or did not achieve X, Y and Z.

My concern is we are forced into a situation where, in order to arrive at a judgement, the program must be kept separate from the evaluation. To a degree, our sector figures a program implementer cannot divorce themselves from the evaluation without biasing the result, and must therefore disassociate from the process. Our evaluation guidance rationalizes this separation; we distinguish the level at which a program team should (and should not) involve itself in an evaluation, identify moments at which its synchronization with the external evaluator is (or is not) appropriate to the integrity of results, and generally aim to strike a balance whereby an evaluation can claim to have been inclusive of team input while having preserved its fundamental objectivity. And yet — with the program team largely absent from identifying, contextualizing and interpreting the crucial data that informs the evaluation result, the overall effort cannot help but paint an incomplete picture. There is a gap in knowledge on the part of the evaluator that cannot be filled in the two weeks, or month, or even three months they spend with the program. There are the voices of lower-level staff, not to mention the points of view of external partners and the communities they engage, that most often escape the evaluator’s attention. There are, in effect, countless data points and crucial viewpoints that fall through the cracks wherever a program is to any extent shut out of its own evaluation. Yet this very situation tends to play out, so long as integrating the program team is thought to run counter to the professed need for ‘separation.’

But a process that looks beyond (independent) judgement, and instead toward (interdependent) exploration, can avoid this dilemma. Exploring one’s own program unabashedly involves everybody associated with it; exploring one’s own program glances at the results, but more importantly dives into the catalysts, leverage points and barriers that drive results (the aforementioned hows and whys). It is not that an external evaluator cannot uncover meaning, or more easily reveal long-buried assumptions; it is that the program holds the insight necessary to make sense of this and act upon it. A team that assesses its own efforts — yes, in concert with external voices — is more likely to come away with a fuller understanding of its program than it would find on the receiving end of an evaluation result***.

Learning. ‘Learning’ is such an alluring term. Learning Specialists are in-demand (front-and-center on every ‘Meet our Experts’ webpage), and Learning Organizations are cutting-edge. Today, USG foreign assistance guidance materials request Learning Agendas and Lessons Learned as standard elements of sound program management.

However, the bar we set for our Learning is misaligned with programmatic structures. Our common understanding is ‘learning’ reflects broad, relevant, applicable truths at the level of donor objectives: increased livelihoods, more democratic societies, popular norm changes, etc. Yet implementation-level staffing plans, job descriptions and associated deliverables deny teams the bandwidth, resources and incentive to grapple with this mandate. If sector-level learning does not feature during start-up, baseline research and onboarding, not to mention ongoing reporting and personnel review, learning will not occur.

I cannot claim to have contributed to what would constitute development learning. Like many others in our sector, I operate at too small a scale: at the level of NGOs, with paltry program scopes (in comparison to USG agencies or multilaterals). What we take away from our programs may be relevant for our organization, or for a similar future program in a similar future context…but would those takeaways realistically have bearing on multimillion-dollar, high-capacity programs such as PEPFAR? Moreover, our claimed ‘learning’ most often comes from only one or a few minds, the result of limited knowledge/data capture and interpretation. Truly insightful sector-wide learning**** stems from a critical mass of input; I do not believe I’ve ever reached this benchmark.

‘Assessment’ puts us in a better frame of mind. Assessment naturally places us squarely within the realm of whatever we are assessing (our program). Assessment is meant to focus on the subject being assessed; it does not claim to reflect upon the sector at large. If the takeaways of a program assessment are, ultimately, responsive and relevant to future programs, great. And if enough programs latch onto these same takeaways and continue to validate them, then the initial assessment may reach the level of evidentiary learning.

***

In wrapping up this post, here is a pitch for a daring/foolhardy program or organization: re-title ‘MEL’ staff as ‘Assessment’ staff (or ‘Assessment and Valuation’ staff)…and enjoy how the function evolves over time.

_______________________________________________________________

*Moreover, employing ‘Learning Specialists’ in many cases relieves the program team of the prerogative for ‘learning’ — but that’s another matter.

**There is a prevailing sense that interpreting information (data) is the realm of ‘evaluation,’ not ‘monitoring.’ While evaluative data and monitoring data are often wholly different information sets, they both reflect upon the program — why is one to be interpreted but the other not? Besides, it is plausible that ‘monitoring’ data makes as much if not more sense to interpret, as it is often more proximate to a program’s daily efforts (and thus the team’s help interpreting such data is more likely to uncover what is essential to effective programming). Treating both ‘monitoring’ and ‘evaluation’ datasets through the same assessment lens mitigates the issue.

***If it seems I’m speaking of ‘monitoring’ and ‘evaluation’ indistinguishably — I am. To be of real significance to programs, all efforts, large and small, to explore what makes a program what it is, and how a team can derive meaning from and act upon this, should be considered the same function. Given my way, I would reinforce this by advocating that ‘monitoring’ and ‘evaluation’ be dropped and replaced with a singular term (hint: assessment).

****Here are a few sector-wide ‘learnings’ that come to mind: 1) The insecticide-treated bed nets model demonstrated across malaria-stricken communities; 2) The value of social behavior change communication strategies (e.g., promoting hand-washing behaviors; community leaders disseminating Ebola messages); 3) The market systems strengthening conceptual framework; 4) The Training-of-Trainers model (particularly in Ag extension work, and the CDC); 5) New programming principles: including oft-excluded voices, paying particular attention to youth and women’s priorities, emphasizing sustainability and steering away from ‘donor dependence,’ community ownership of activities, etc. The thing about these models and principles: I doubt they were originally distilled from a monitoring/tracking trip, from an externally managed evaluation team, or in response to a donor Learning Agenda. They do not reflect metrics or research questions. [Take, for instance, the Training-of-Trainers example. I have no clue which exact programs first piloted and then replicated the model, nor the specific objectives or results of those programs. Yet I understand these programs, above all else, valued the influence the ToT model seemed to project over their trajectory and lasting impact, and thus determined it was important for our development sector to take note. And after some time of determined replication and (re)assessment, the model presents as a sector learning.] A final note: the above examples fall under the health and agriculture disciplines. They do not represent the more complex settings of democracy-building, youth development, conflict mitigation/peace-building, economic equity and others. This may be an indication of just how difficult it is to distill sector-wide learnings under less ‘controllable’ program settings.

