This website is still in a beta testing phase. Data is not exhaustive. Updates are occurring frequently.

What is the point of this application?

To gather data published at the IATI Registry and analyse it, either in whole or in subsets, to see how well it fits with Benford's law.

What is Benford's law?

Benford's law is a mathematical model describing how often each digit appears as the first digit across sets of numerical values.

IATI publishes a large set of numbers for financial transactions. For a single number picked at random from this set, what is the probability that the first digit is, for example, 8? The intuitive answer is '1 in 9' (or 0.11 or 11%). That is, there are 9 possible leading digits (0 is not counted as a leading digit), so the chance of the first digit being '8' would be 1 out of 9.

However, for many types of data, this assumption is not correct. As Benford demonstrated, the probability of any given first-digit appearing follows a specific, non-uniform distribution. The actual probability of an 8 appearing is just 5.1%. The most probable digit is 1, which is expected to occur just over 30% of the time.
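
The percentages quoted above follow from the standard statement of Benford's law, in which the probability of leading digit d is log10(1 + 1/d). As a minimal sketch (the function name benford_expected is illustrative and not part of this website), the full table can be reproduced like this:

```python
import math


def benford_expected():
    """Return {digit: probability} for leading digits 1-9 under Benford's law."""
    # Standard formula: P(d) = log10(1 + 1/d)
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


# Print each leading digit with its expected share as a percentage.
for digit, probability in benford_expected().items():
    print(f"{digit}: {probability:.1%}")
```

Running it gives 30.1% for the digit 1 and 5.1% for the digit 8, matching the figures in the text, and the nine probabilities sum to 1.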

Why IATI data?

Data published in the IATI registry contains transaction records, each of which contains a 'value' which represents a numeric, monetary amount. Data of this nature would be expected to follow Benford's law.
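
To test a set of transaction values against the law, the first step is to reduce each value to its leading digit. The sketch below shows one way this could be done. It is an assumption, not a description of how this website processes IATI data: it supposes values may arrive as strings or numbers, may be negative, and may be zero or unparseable (in which case they have no leading digit). The name leading_digit is hypothetical.

```python
from decimal import Decimal, InvalidOperation


def leading_digit(value):
    """Return the first non-zero digit of a numeric value, or None if it has none."""
    try:
        # Go through str() so floats such as 0.1 are not expanded to binary noise.
        number = abs(Decimal(str(value)))
    except InvalidOperation:
        return None  # not a parseable number
    if not number.is_finite() or number == 0:
        return None  # NaN, infinity and zero have no leading digit
    # as_tuple().digits holds the coefficient digits, ignoring sign and exponent.
    for digit in number.as_tuple().digits:
        if digit != 0:
            return digit
    return None
```

For example, leading_digit("8250.00") returns 8 and leading_digit(-0.0042) returns 4, since the sign and the position of the decimal point do not affect the leading digit.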

Does IATI data follow Benford's law?

Yes, very closely.

Why create a whole website just to say that?

The fact that IATI data follows Benford's law is only mildly interesting in itself. What is potentially more interesting is where it does not. Given that we have established the data as a whole follows the law, if we split the data into various subsets we would expect each subset to also broadly follow that law. This website allows a user to split the data in a very large number of ways to view how well each subset follows the distribution.
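
One common way to put a number on how well a subset follows the distribution is a chi-square statistic comparing observed leading-digit counts with the counts Benford's law predicts. The sketch below is illustrative only: the measure this website actually uses is not stated here, and the function name benford_chi_square is hypothetical.

```python
import math
from collections import Counter


def benford_chi_square(first_digits):
    """Chi-square statistic of observed leading digits (1-9) against Benford's law."""
    counts = Counter(first_digits)
    total = sum(counts[d] for d in range(1, 10))
    if total == 0:
        raise ValueError("no leading digits to analyse")
    statistic = 0.0
    for d in range(1, 10):
        # Expected count for digit d if the subset followed Benford's law exactly.
        expected = total * math.log10(1 + 1 / d)
        statistic += (counts[d] - expected) ** 2 / expected
    return statistic
```

With nine possible digits there are 8 degrees of freedom, so a statistic above roughly 15.5 would conventionally be read as a significant departure at the 5% level. A larger statistic only indicates a poorer fit; it says nothing about the cause.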

Why is it interesting if a set of data does not follow Benford's law?

Although there are many caveats (see below), at a basic level it can be inferred that data not following this distribution is potentially incorrect.

The data could be 'incorrect' for many reasons, ranging from technical errors to unknown external variables to fraud. As discussed above, the fact that each leading digit does not occur with equal frequency in sets of numbers is counter-intuitive to most human beings.

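Because of this, figures that have been invented rather than recorded are unlikely to reproduce the expected distribution, which helps explain why analysis of financial data against Benford's law has been shown to aid detection of fraud.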

I've found a subset that does not follow Benford's law. Does that mean the data is wrong or fraudulent?

By itself, definitely not. Any of the following reasons could explain it:

It is only after discounting these factors that any kind of inference can be made about the reasons for the data not following the expected distribution. Although this website can help to flag potential errors for further investigation, a deviation from a statistical distribution is not in any way proof of erroneous or fraudulent activity.