SoGive’s analysis methodology
The heart of the SoGive approach to analysing charities is the SoGive two-question method. The idea is that when you're donating, the thing that matters is the amount of good done, or (in the jargon of the charity sector) the amount of impact achieved.
But, other things being equal, bigger charities will achieve more good than smaller charities -- you can do more with a budget of £100m than with a budget of £100k. Clearly, therefore, we have to normalise for this -- we need to consider the amount of good achieved per pound donated.
In other words it's about bang for buck.
And that’s what the SoGive two-question method covers when it poses the two questions:
- How much does it cost for a charity to do something?
- What is the thing? (or what does the beneficiary get for your money?)
At SoGive we sometimes perform a relatively shallow level of analysis, and sometimes the analysis is more in-depth. The core of the analytical method is essentially the same, and this write-up intends to cover both. The more in-depth analysis covers a number of elements that are tailored to the topic; this summary post will not cover every element of that style of analysis.
Question 1 of two-question method
To answer the first of the questions in the SoGive two-question method we need to find out how much it costs the charity to achieve something. In principle this is fairly straightforward, although in practice there are a number of subtleties. These are expanded on in Appendix 3.
Question 2 of two-question method
The second question involves considering what outcome is achieved and how good it is. At first glance this is a difficult question. Below we set out our approach to tackling this.
The first step is that we structure our data according to what is commonly called a theory of change. We start off with the inputs. From the perspective of a donor the relevant input is money. (Other possibilities include volunteer time; staff time may also be included if the true opportunity cost of the staff time were believed to exceed the financial cost of employing them.)
The money invested into the charity leads to the immediate results of the charity’s work (referred to, in the jargon of the charity sector, as “outputs” -- the picture illustrates this with malaria nets). Those outputs lead, in turn, to outcomes -- the knock-on effects, or the things which we typically care about (for example children living rather than being killed by malaria).
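The inputs → outputs → outcomes structure described above can be sketched as a small data structure. This is purely illustrative: the field names and figures are invented for this sketch and are not SoGive's actual data model.

```python
from dataclasses import dataclass, field

# A minimal sketch of how a theory of change might be represented in code.
# Field names and figures are illustrative, not SoGive's actual data model.
@dataclass
class TheoryOfChange:
    inputs_gbp: float  # money donated -- the input relevant to donors
    outputs: dict = field(default_factory=dict)   # direct, countable results, e.g. nets distributed
    outcomes: dict = field(default_factory=dict)  # knock-on effects we care about, e.g. deaths averted

nets_example = TheoryOfChange(
    inputs_gbp=100_000,
    outputs={"malaria nets distributed": 20_000},
    outcomes={"deaths averted": 11},  # purely illustrative figures
)
```

Separating the three levels like this makes explicit which quantities a charity actually reports (usually the outputs) and which have to be estimated or modelled (usually the outcomes).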
Charities often report the amount of outputs achieved. Outputs are relatively easy to measure. When we review charities based on information in the public domain, it is common for us to know that a charity has achieved outputs but not know about the outcomes it may have achieved. Knowing about outputs alone is not sufficient for us to conclude that a charity is a high-impact charity.
However, knowing about the number of outputs may be sufficient for us to confidently conclude that the charity's impact underperforms our Gold Standard benchmark. See next section for an example of this. Where we are confident that a charity is not a high-impact charity -- i.e. that it underperforms the Gold Standard benchmark -- we assign the charity a "Not Recommended" rating.
One area in which we believe the sector needs more data is at the level of outcomes. While this is typically harder to measure, we believe it's important for the charity sector to quantify the amount that it is achieving in meaningful units, such as lives improved, years of suffering averted, or lives saved. Our experience of reviewing the information put in the public domain by a large proportion of the charity sector suggests that it's rare for charities to provide this information.
Where charities do provide quantifications of the amount of outcomes achieved we do not, by default, give these quantifications any credit in our process of assigning a rating. In order to receive any credit under our analysis, the quantification would need to be subjected to careful review, and would need to have sufficient rigour to satisfy our standards. We place these stringent demands both because of the difficulty of assessing outcomes and also because of the difficulty of achieving impact.
Gold Standard benchmarks
Once we know how much it costs the charity to do something and what the thing is, the next step is to compare this with the SoGive Gold Standard benchmarks.
The Gold Standard benchmarks are determined based on a lengthy research-driven process. This process is described here.
In some cases these comparisons are relatively straightforward. For example
- Let’s assume that one of the benchmarks is that a charity which can avert a year of depression for at most £200 qualifies for the Gold Standard.
- And let's compare this with a charity which can provide a veteran with a special activity, typically lasting for a few days for around £3,000.
- Let’s add that the activities could include things such as horseback riding in Arizona or a multi-disciplinary water sports expedition to the Bahamas.
- We believe that a special activity such as horseback riding is likely to provide benefit to the beneficiary, but that it is likely to do less good than £3,000/£200 = 15 years of depression being averted.
- Hence it seems clear that the charity being considered underperforms the benchmark.
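The arithmetic in the example above can be made explicit. The figures below are the hypothetical ones from the example, not real SoGive benchmarks:

```python
# Hypothetical figures from the example above -- not real SoGive benchmarks.
BENCHMARK_COST_PER_YEAR_DEPRESSION_AVERTED = 200  # GBP: Gold Standard threshold (assumed)
cost_per_special_activity = 3_000  # GBP: the charity being assessed

# For the cost of one special activity, a benchmark-level charity could
# instead avert this many years of depression:
years_of_depression_averted = (
    cost_per_special_activity / BENCHMARK_COST_PER_YEAR_DEPRESSION_AVERTED
)
print(years_of_depression_averted)  # 15.0

# Whether one special activity does more good than 15 years of depression
# averted is a judgement call, not a computed fact; here we judge it does less.
underperforms_benchmark = True
```

The division is trivial, but making it explicit shows where the judgement enters: the numbers are objective, while the final comparison of outcomes rests on moral weights.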
Note that in the above example we compared the charity's outputs to the "depression averted" benchmark; however, SoGive has multiple Gold Standard benchmarks, and the analyst could have chosen a different one, for example saving a life for £3,000. The benchmarks are designed to be equivalent under the SoGive standard moral weights -- i.e. if you believe that a charity underperforms one benchmark (e.g. averting a year of depression for £200) then it will likely underperform another benchmark (e.g. saving a life for £3,000).
Note here that the example illustrates that quantification is at the heart of the SoGive approach. If we could, for the same money, avert (say) 3 years of depression or provide 1,000 or 100,000 special activities for veterans, this would change the nature of the comparison. This reflects the fact that empathy is at the heart of the SoGive approach -- every single one of the people being helped matters, and hence we need our models to make sure we’re counting every single one.
Note that the Gold Standard benchmarks are a crutch to aid comparisons; they are not the be-all and end-all of SoGive’s analysis. To be more specific, just because a charity aims to achieve goals that are not the same as one of the benchmarks, it doesn’t mean that we place zero value on those goals. There is more on this in Appendix 4.
Achieving a Gold, Silver, or Bronze rating is very difficult. In order to do so a charity would typically have to have undergone very rigorous analysis by someone in the SoGive analysis team. This reflects the reality that achieving substantial positive impact is hard. Therefore we need robust evidence before we can conclude that outcomes are definitely being achieved. We have expanded on some of the extra considerations that may apply in Appendix 2.
SoGive three-question method for systemic change
We define systemic change here as a change which can intuitively be thought of as permanent. To make this more precise, we mean that its effects will last longer than one person's expected lifetime.
Work that has more systemic impacts typically does not involve direct interactions with beneficiaries. It typically involves work like research or campaigning.
Note that the term “systemic change” is used in many ways; this is a particular interpretation of the term which we use in SoGive, and may not perfectly coincide with the way it’s used elsewhere.
For systemic change charities we employ the SoGive three-question method (which is really a variant of the SoGive two-question method). The three questions are:
- How much does it cost for the charity to perform the project?
- How much positive impact is achieved per year as a result of this change?
- How much sooner does the change occur as a result of the charity's work?
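The three answers combine naturally into a single cost-effectiveness estimate. The formula below is our reading of how they fit together (total impact ≈ impact per year × years the change is brought forward, divided by cost), not a quotation from SoGive's own write-up; the figures are invented:

```python
def systemic_change_cost_effectiveness(
    project_cost_gbp: float,
    impact_per_year: float,        # e.g. years of suffering averted per year the change is in place
    years_brought_forward: float,  # how much sooner the change occurs because of the charity
) -> float:
    """Impact attributable to the charity, per pound donated (a simplified sketch)."""
    # The charity doesn't get credit for the change existing forever -- only
    # for the window during which the change happened earlier than it otherwise would.
    total_impact = impact_per_year * years_brought_forward
    return total_impact / project_cost_gbp

# Invented example: a £1m campaign brings a policy change forward by 5 years,
# and the change averts 2,000 years of suffering per year while in place.
print(systemic_change_cost_effectiveness(1_000_000, 2_000, 5))  # 0.01
```

The key design choice, which mirrors the third question, is that only the years brought forward count: if the change would have happened anyway at the same time, the attributable impact is zero.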
The SoGive three-question method is described in more detail in this article.
Between the SoGive two-question method for direct interventions and the SoGive three-question method for systemic changes we have approaches for tackling most kinds of charitable work.
Appendix 1: Broad-and-shallow versus in-depth analysis
We have two types of analysis.
The broad and shallow analysis is typically based only on information in the public domain, or possibly on information provided by the charity by filling in a form. While it focuses on the two-question (and three-question) methods, it will typically only get as far as considering the outputs (the direct, countable effects of a charity's work). There are many cases where outputs-level considerations are sufficient to form meaningful conclusions.
The in-depth analysis will typically involve models of not just outputs but also outcomes. These models capture the ultimate impacts that people care about, such as lives saved or suffering averted. It will also involve meetings with the organisation being assessed and careful assessments of the evidence.
Appendix 2: Further analytical considerations
- Rigorous evidence review
In order to achieve a higher rating (certainly for a gold rating, and often also for silver), we would need robust evidence of effectiveness. This reflects the fact that achieving significant impact is hard, which means that most attempts to achieve material impact are unsuccessful. In order to believe that a charity is achieving a material positive impact, we need robust evidence.
It is rare for charities to publish evidence of the standard that we seek, which means that we typically cannot assign a higher rating based on information in the public domain alone.
- Should the work be funded by impact investment?
This question may arise where a charity has, in some sense, a business model -- for example, where the charity charges its beneficiaries.
For example, if the charity's revenue derives very heavily from charging for its services, we would be unlikely to rate the charity highly. In order to believe that such a charity warranted a high rating, we would need to believe that the philanthropically provided funds are funding a distinct activity, separate from the work which is already funded by charging for services; we would also need to believe that the activity is something that couldn't be funded by impact investment of some sort.
Wherever we believe it’s possible for something to be funded with investment rather than philanthropy, we would prefer for it to be funded with investment. This is because there are more investors willing to invest than there are philanthropists willing to donate.
- Management and governance effectiveness
A key fact about giving money is that the funds will then be controlled by someone else. Having confidence that the funds will be managed effectively increases the probability that the organisation will achieve a high bang for buck when they put your donations to work in the future.
Our broad and shallow analysis is typically based solely on information in the public domain, so we are unable to fully incorporate this criterion in our analysis. However we would consider an in-depth analysis incomplete without meeting with staff from the charity.
- Additionality / counterfactuals
When assessing impact, we consider what some call “additionality”, or what others call “counterfactuals”. Additionality: we consider whether the impact achieved is additional to what would have happened anyway, even without the charity’s work. Counterfactuals: we consider how the impact achieved compares to the counterfactual (or what-if scenario) of what would have happened if the charity hadn’t done their work.
These are different ways of saying that the impact we count should only reflect the impact which is actually attributable to the thing being funded.
Some of the other considerations are variants or special cases of this point.
- Displacing other donors/funders
In some cases your decision to fund an organisation's work may make the difference between whether they receive that funding or not (possibly because there is literally nobody else who would fund them, or, more likely, because they are constantly seeking funds). Another possibility is that your funding displaces the funding provided by other sources. This could happen, for example, where the organisation has explicitly set a fundraising target, and once they have reached that target they will slow down or stop their fundraising efforts. Where this is the case, the "true" impact of your donation is to liberate another donor's funds so that they can use the funding for something else. In this scenario, it's important to (try to) understand what those alternative uses of funds are. It's normally difficult (or impossible) to know the precise details of this sort of what-if scenario (or counterfactual scenario). However, knowing something about the funders might tell you something about this. For example, you might know that you are liberating the funds of an organisation that is operating under constraints which stop it from doing the most good it can.
- Influencing other organisations
Where a charity works with other organisations, the fact that it is doing intervention X may influence other organisations to do intervention X as well. In this case, the true impact of the charity's work should include the impact of those other organisations doing intervention X, minus the impact of whatever work those organisations would otherwise have done.
For our shallow reviews, we typically have insufficient information to incorporate this consideration fully. Even for in-depth reviews, assessing this involves a number of counterfactuals (i.e. what-if scenarios) which would require some judgement.
- We typically avoid giving credit that the organisation doesn’t claim
When analysing a charity’s work, we may consider various ways in which the charity’s work may lead to impact. Sometimes we might think of impacts or outcomes which the charity themselves do not state.
While it’s tempting to give the charity credit for these things, we normally avoid doing so.
This is because achieving impact is hard, and we don’t believe that impact is likely to be achieved by accident. So if the charity isn’t trying to achieve a specific impact, it probably won’t be achieving it.
Appendix 3: More details on cost per thing achieved (question 1)
There are a number of complications in calculating the cost per thing achieved, and all the more so when gathering data from the public domain.
Charities frequently do not track or publish the relevant data items, such as the number of outputs achieved (how many people did the charity reach, or how many training sessions did they deliver, or how many miles of cycle lane did they maintain?) If we don’t know how many outputs were achieved, we won’t be able to calculate the cost per output.
Even where the relevant data items exist, finding costs that are on a consistent basis with those outputs adds to the challenge.
One particular complication around the consistency of costs and outputs arises when charities work in partnership. For example, a charity may state that they helped x people in the last year. However, the cost of doing this might not simply be the cost of running their own charity; it might be that cost plus the costs incurred by one (or more) partner charities that they collaborated with.
In this scenario, the ideal way to handle this is to find out:
- How much is spent by the charity being assessed?
- How much is spent by the partner charity/organisation?
- If the charity being assessed were not there, what would the partner organisation do with the money instead?
When assessing charities based on information in the public domain, we typically don't have this information. Indeed, it might be that we are unaware of the partnership working altogether; especially with large international development NGOs who will often partner with local development NGOs to get on-the-ground knowledge. Hence we simply count the costs of the charity being assessed, and note the risk that this might be a lower bound of the true cost.
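Under the simplifying assumptions above, the counterfactual-adjusted cost per output might be sketched as follows. The function name and all figures are hypothetical, and netting off the partner's counterfactual value is our reading of the third question above, not a formula SoGive publishes:

```python
def cost_per_output_with_partner(
    own_spend_gbp: float,
    partner_spend_gbp: float,
    partner_counterfactual_value_gbp: float,  # value of what the partner would otherwise do with the money
    outputs_achieved: int,
) -> float:
    """Counterfactual-adjusted cost per output when working in partnership (a sketch)."""
    # Count the partner's spend too, but net off the value the partner
    # would have created anyway with that money.
    true_cost = own_spend_gbp + partner_spend_gbp - partner_counterfactual_value_gbp
    return true_cost / outputs_achieved

# With public-domain data we often cannot see the partner's costs at all,
# so the naive figure below may be a lower bound on the true cost per output.
naive = 500_000 / 10_000  # own spend only: £50 per output
adjusted = cost_per_output_with_partner(500_000, 250_000, 100_000, 10_000)
print(naive, adjusted)  # 50.0 65.0
```

The gap between the naive and adjusted figures is exactly the risk flagged above: counting only the assessed charity's own costs can understate the true cost per output.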
One further complication is that some charities do more than one thing. Indeed, they might do lots of things. How would we effectively analyse a huge charity like, for example, Oxfam, which probably does hundreds if not thousands of different activities? For us to review each of those projects would be unrealistic: the charity would probably not be able to provide detailed information for hundreds or thousands of projects, our users would not be able to meaningfully use such a huge amount of information, and it would be a huge task for us too.

It can also be complicated to define what counts as a project. Continuing with the example of Oxfam, consider a specific piece of work doing a certain type of economic empowerment, for example a Village Savings and Loan Association (VSLA), in a particular village in a particular country. Would that quite specific piece of work count as a project? Or would all of their VSLA work count as a project? Or all of their work in, say, Nigeria? Or all of their economic empowerment work globally?

To answer this question we defer to the charity, who knows their work better than we do. When we perform a review of a charity based on information in the public domain, we pick up the costs according to the project split identified by the charity themselves; this information is frequently provided in the accounts or in the notes to the accounts. We then find information about the amount of outputs on a consistent basis with the projects that the charity has provided costings for.
We mostly gather cost information from the annual report and accounts. One of the nuances is that we want to consider the costs on what some call a full cost recovery basis. In other words, we want to make sure that overheads/indirect costs are included in the costs. Overheads such as governance costs or fundraising costs are spread between projects in proportion to the size of each project.
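The full-cost-recovery allocation described above can be sketched in a few lines. Project names and figures are invented; the pro-rata rule (spreading overheads in proportion to each project's direct costs) is the one described in the paragraph above:

```python
def full_cost_recovery(direct_costs: dict, overheads: float) -> dict:
    """Spread overheads across projects in proportion to each project's direct costs."""
    total_direct = sum(direct_costs.values())
    return {
        project: cost + overheads * cost / total_direct
        for project, cost in direct_costs.items()
    }

# Invented example: £100k of overheads (governance, fundraising) spread
# over two projects with £300k and £100k of direct costs respectively.
full_costs = full_cost_recovery({"nets": 300_000, "advocacy": 100_000}, overheads=100_000)
print(full_costs)  # {'nets': 375000.0, 'advocacy': 125000.0}
```

Note that the allocated totals always sum to direct costs plus overheads, so no spending is dropped when computing cost per output on a full-cost basis.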
Appendix 4: More on the application of moral weights and Gold Standard Benchmarks
Our ultimate goal is to enable pairwise comparisons between any two charities. That is, given any two charities, are we able to say which one achieves more impact?
Let’s illustrate this with a fictional example.
First let’s imagine two charities:
- Let’s say that Charity 1 supports NEETs (people who are Not in Education, Employment or Training) in Manchester
- Charity 2 implements school-wide anti-bullying programmes
At first glance, many people might say that supporting NEETs and preventing bullying are different outcomes, and so it’s hard to compare them.
Under the SoGive method, the first thing to do is to recognise that the cost per output is relevant (even though many donors often ignore this).
- Let’s say that Charity 1 can support NEETs at a cost of £2,000 per NEET helped.
- For Charity 2, high-quality experimental studies show that the cost per year of bullying averted is £50
This is clearly useful additional information; however, it's still the case that we are comparing very different outcomes. To facilitate this comparison, we compare each charity to a benchmark. The benchmarks are very different from each other, but we have put a lot of thought into the comparison between them.
For example, let’s say that two of our benchmarks are:
- A charity reaches the Gold Standard if it can double a beneficiary’s consumption/spending for a year for no more than £30
- A charity reaches the Gold Standard if it can avert a year of depression for no more than £100
Is the outcome of supporting NEETs (those not in education, employment or training) the same as increasing consumption? No, it isn’t.
Is the outcome of averting bullying the same as averting depression? No, it isn’t.
However it is easier to compare Charity 1 (supporting NEETs) to increasing consumption than it is to compare Charity 1 to Charity 2. And similarly it’s easier to compare Charity 2 (averting bullying) to averting depression than it is to compare Charity 2 to Charity 1.
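One way to picture the role of the benchmarks: each charity is compared against its most similar benchmark, and the benchmarks have already been calibrated against each other. The toy sketch below uses the hypothetical figures from this example; treating one unit of a charity's output as roughly one unit of its benchmark's outcome is itself a judgement call, which is exactly the part the analyst must defend:

```python
# Hypothetical Gold Standard thresholds from the example above (GBP).
BENCHMARKS = {
    "double consumption for a year": 30,
    "avert a year of depression": 100,
}

# Ratio of the charity's cost per output to its most similar benchmark's
# threshold. A ratio <= 1 would meet the threshold -- assuming (a judgement
# call) that one unit of the charity's output is roughly comparable to one
# unit of the benchmark outcome.
charity_1_ratio = 2_000 / BENCHMARKS["double consumption for a year"]  # NEET supported
charity_2_ratio = 50 / BENCHMARKS["avert a year of depression"]        # year of bullying averted
print(charity_1_ratio, charity_2_ratio)
```

On these made-up figures Charity 2 comes in well under its benchmark while Charity 1 is far over its own, so the two charities become comparable via the pre-calibrated pair of benchmarks rather than head-to-head.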
Of course, all this does is move one tough comparison (Charity 1 vs Charity 2) to another tough comparison (benchmark 1 aka increasing consumption vs benchmark 2 aka averting depression).
However at least the new tough comparison is one where we have already invested a lot of effort.
One of the main things to convey with this is that the benchmarks that we have chosen are intended to aid comparisons between different things; they are not meant to indicate that the outcomes represented are the only outcomes which matter.
Appendix 5: Our influences
By referring to the SoGive two-question method and the SoGive three-question method, we are not meaning to imply that SoGive is the only organisation thinking in this way. Our main inspiration for the SoGive two-question method was the paper The Moral Imperative toward Cost-Effectiveness in Global Health by Toby Ord.