What is the point of this application?
To gather data published through the IATI Registry and analyse it, either as a whole or in subsets, to see how well it fits Benford's law.
What is Benford's law?
Benford's law is a mathematical model describing how often each digit appears as the first (leading) digit across sets of numerical values.
IATI data includes a large set of numbers for financial transactions. If you pick a single number at random from this set, what is the probability that its first digit is, for example, 8? The intuitive answer is '1 in 9' (about 0.11, or 11%): there are 9 possible leading digits (0 does not count as a leading digit), so the chance of the first digit being 8 should be 1 out of 9.
However, for many types of data this assumption is wrong. As Benford demonstrated, the probability of each first digit follows a specific, non-uniform (logarithmic) distribution. The actual probability of an 8 appearing is just 5.1%. The most probable digit is 1, which is expected to occur just over 30% of the time.
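The expected distribution comes from the formula P(d) = log10(1 + 1/d) for a leading digit d from 1 to 9. The short Python sketch below computes the nine probabilities and reproduces the figures quoted above (about 30.1% for 1 and 5.1% for 8). The function name `benford_expected` is illustrative only and is not taken from this website's code.

```python
import math


def benford_expected() -> dict[int, float]:
    """Return the Benford's law probability for each leading digit 1-9."""
    # P(d) = log10(1 + 1/d). The nine terms sum to exactly 1 because
    # log10((d + 1) / d) telescopes to log10(10) across d = 1..9.
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


if __name__ == "__main__":
    for digit, p in benford_expected().items():
        print(f"{digit}: {p:.1%}")
```

Running it prints one line per digit, from `1: 30.1%` down to `9: 4.6%`.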
Why IATI data?
Data published in the IATI Registry contains transaction records, each of which has a 'value' representing a monetary amount. Financial amounts like these typically span several orders of magnitude, and data of this nature would be expected to follow Benford's law.
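Before any comparison can be made, each transaction value has to be reduced to its leading digit. The sketch below assumes values arrive as plain decimal strings, possibly negative and possibly with decimal places. The function name `leading_digit` is hypothetical, and skipping zero and malformed values is an assumption, not necessarily how this website handles them.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def leading_digit(value: str) -> Optional[int]:
    """Return the first significant digit of a monetary amount, or None.

    Assumption: the amount is a plain decimal string such as "1500.00",
    "-250" or "0.75". The sign is ignored; zero, infinite, NaN and
    unparseable values are skipped by returning None.
    """
    try:
        # copy_abs() drops the sign without applying context rounding.
        amount = Decimal(value.strip()).copy_abs()
    except InvalidOperation:
        return None  # e.g. "", "1,000" or "n/a"
    if not amount.is_finite() or amount == 0:
        return None
    # Decimal stores the coefficient without leading zeros, so its first
    # digit is the first significant digit ("0.0042" -> digits (4, 2)).
    return amount.as_tuple().digits[0]


if __name__ == "__main__":
    for raw in ["1500.00", "-823.5", "0.0042", "0", "n/a"]:
        print(raw, "->", leading_digit(raw))
```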
Does IATI data follow Benford's law?
Yes, very closely.
Why create a whole website just to say that?
The fact that IATI data follows Benford's law is only mildly interesting in itself. What is potentially more interesting is where it does not. Having established that the data as a whole follows the law, we would expect subsets of it to broadly follow the law too. This website allows a user to split the data in an almost unlimited number of ways and see how well each subset follows the distribution.
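How well a subset "follows the distribution" has to be measured somehow. This page does not say which measure the website uses, so the sketch below is an assumption. It computes two common choices from a list of leading digits: Pearson's chi-squared statistic and the mean absolute deviation (MAD) between observed and expected proportions. The function name `benford_fit` is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable

# Expected Benford proportion for each leading digit 1-9.
EXPECTED = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def benford_fit(first_digits: Iterable[int]) -> tuple[float, float]:
    """Compare observed leading digits with Benford's law.

    Returns (chi_squared, mean_absolute_deviation); larger values mean a
    worse fit. Digits outside 1-9 are ignored.
    """
    counts = Counter(first_digits)
    n = sum(counts[d] for d in range(1, 10))
    if n == 0:
        raise ValueError("no leading digits to test")
    chi_squared = 0.0
    abs_dev = 0.0
    for d, p in EXPECTED.items():
        expected_count = n * p
        observed = counts[d]
        chi_squared += (observed - expected_count) ** 2 / expected_count
        abs_dev += abs(observed / n - p)
    return chi_squared, abs_dev / 9
```

With 9 digit categories the chi-squared statistic has 8 degrees of freedom, and a value above about 15.5 is conventionally significant at the 5% level. On very large sets, however, chi-squared flags even tiny deviations as significant. This is one reason MAD, which is based on proportions and does not grow with sample size, is often preferred for Benford analysis.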
Why is it interesting if a set of data does not follow Benford's law?
Although there are many caveats (see below), at a basic level, data that does not follow this distribution may be incorrect.
The data could be 'incorrect' for many reasons, ranging from technical errors to unknown external variables to fraud. As discussed above, the fact that leading digits do not occur with equal frequency is counter-intuitive to most people, so figures that someone has invented tend not to follow the expected distribution.
This is why analysis of financial data against Benford's law has been shown to aid the detection of fraud.
I've found a subset that does not follow Benford's law, does that mean the data is wrong or fraudulent?
By itself, definitely not. Any of the following reasons could explain it:
- The set is too small. A statistical distribution describes expected frequencies, and any finite sample will deviate from it by random chance. The smaller the set, the more likely it is that random chance alone will cause the distribution not to be followed.
- An error in the way the data is published by the IATI member on their website
- An error in the way the data was originally generated by the IATI member before publication
- An error with the calculations made by this website (though we try our very best to avoid that)
- An external factor that skews the transaction data in a non-obvious way. For example, a member may choose to publish only data that meets some hidden, internal criterion (e.g. transactions over $10,000 in value)
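The first reason in the list above, small sets, can be illustrated with a simulation. The sketch assumes a source that follows Benford's law exactly: numbers of the form 10 ** U with U uniform (log-uniform values), whose leading digits are Benford-distributed by construction. It then measures the average MAD for samples of different sizes. All function names here are hypothetical and unrelated to this website's code.

```python
import math
import random
from collections import Counter

# Expected Benford proportion for each leading digit 1-9.
EXPECTED = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def benford_sample(n: int, rng: random.Random) -> list[int]:
    """Draw n leading digits from an exactly Benford-distributed source.

    Assumption for illustration: the mantissa 10 ** U with U uniform on
    [0, 1) gives Benford-distributed leading digits.
    """
    # min(..., 9) guards against floating-point rounding up to 10.
    return [min(int(10 ** rng.random()), 9) for _ in range(n)]


def mad(first_digits: list[int]) -> float:
    """Mean absolute deviation between observed and expected proportions."""
    counts = Counter(first_digits)
    n = len(first_digits)
    return sum(abs(counts[d] / n - p) for d, p in EXPECTED.items()) / 9


def average_mad(n: int, trials: int = 200, seed: int = 0) -> float:
    """Average MAD over repeated samples of size n (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(mad(benford_sample(n, rng)) for _ in range(trials)) / trials


if __name__ == "__main__":
    for n in (20, 200, 2000):
        print(f"n={n:>5}: average MAD {average_mad(n):.4f}")
```

Even though every sample comes from a source that follows the law exactly, the average deviation shrinks as the sample grows, roughly in proportion to one over the square root of n. A small subset can therefore look like a poor fit purely by chance.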
Only after ruling out these factors can any inference be made about why the data does not follow the expected distribution. Although this website can help flag potential errors for further investigation, a deviation from a statistical distribution is not in any way proof of erroneous or fraudulent activity.