One of my favorite mathematical thingamajigs is Benford’s Law, the weirdly counter-intuitive finding about which digits are most likely to come first in the numbers of many data sets.

Basically, it says that in many naturally occurring collections of numbers, no matter how random they seem, the first digit will be 1 about 30% of the time, 2 about 18% of the time, 3 about 12% of the time, and so on down to 9, which is the first digit only about 5% of the time. The pattern shows up in data ranging from population statistics to building heights to rainfall totals, and in general in naturally occurring data sets that span several orders of magnitude. It was famously noticed back when tables of logarithms were printed in books: in library copies, the early pages, covering numbers that start with 1, were much more worn than the pages at the back.
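Those percentages come from a simple formula: the expected share of numbers whose leading digit is d is log10(1 + 1/d). A minimal Python sketch that computes the expected share for each digit (the function name `benford_expected` is just an illustrative choice):

```python
import math


def benford_expected() -> dict[int, float]:
    """Expected share of each leading digit 1-9 under Benford's Law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


# Print each digit's expected share as a percentage.
for digit, share in benford_expected().items():
    print(f"{digit}: {share:.1%}")
```

This prints 30.1% for 1, 17.6% for 2 and 12.5% for 3, falling to 4.6% for 9, in line with the rounded figures above. The nine shares add up to exactly 1, because the logarithms telescope to log10(10).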

There’s a long Wikipedia article about it if you want to know more. If you’re wondering why this happens, there’s no good quick-and-easy explanation.

I bring this up because Benford’s Law can be used to spot faked data: if the numbers in a data set don’t fit the expected pattern (say, only 15% of them start with a 1), at least some of them have probably been made up. Benford’s Law has become a standard analysis tool in financial fraud cases. And now a Venezuelan researcher has tried it on nationally reported COVID-19 data and found some numbers that look suspicious. This is from the abstract, which reads like it was machine-translated into English:

The results indicated that results from Italy, Portugal, Netherlands, United Kingdom, Denmark, Belgium and Chile are suspicions of data manipulation because the numbers fail the Benford’s Law according to the results obtained until April 30, 2020.
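The post doesn’t say exactly which statistical test the researcher used, so here is one common way such a check can be done, sketched in Python under stated assumptions: take a list of positive integer values (such as daily case counts), count their leading digits, and compare those counts with Benford’s expected shares using a Pearson chi-square goodness-of-fit test. The helper names are hypothetical, and 15.507 is the standard chi-square critical value for 8 degrees of freedom at the 5% significance level.

```python
import math
from collections import Counter

# Chi-square critical value for 8 degrees of freedom (9 digits - 1) at the 5% level.
CHI2_CRITICAL_8DF_5PCT = 15.507


def leading_digit(n: int) -> int:
    """First digit of a positive integer (assumes integer data such as case counts)."""
    return int(str(n)[0])


def benford_chi_square(values: list[int]) -> float:
    """Pearson chi-square statistic comparing observed leading digits with Benford's Law."""
    digits = [leading_digit(v) for v in values if v > 0]  # zeros have no leading digit
    total = len(digits)
    if total == 0:
        raise ValueError("need at least one positive value")
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)  # expected count for digit d
        stat += (counts[d] - expected) ** 2 / expected
    return stat


def looks_suspicious(values: list[int]) -> bool:
    """Flag data whose leading digits deviate significantly from Benford's Law."""
    return benford_chi_square(values) > CHI2_CRITICAL_8DF_5PCT
```

For example, `looks_suspicious(list(range(100, 1000)))` returns `True`, since every digit leads exactly 100 of those 900 numbers (about 11% each, far below the 30% expected for 1), while the leading digits of the powers of two, `[2 ** n for n in range(1, 1001)]`, track Benford’s Law closely and pass. A failed test is a red flag rather than proof: data confined to a narrow range, such as small counts from a short period, can also miss the pattern honestly.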

The temptation to fudge COVID-19 data is huge, given the financial cost of bad results. That’s why so many people were alarmed when the Trump Administration, which has a terrible record in dealing fairly with unpleasant facts, took the COVID-19 reporting task away from the Centers for Disease Control and Prevention. Benford’s Law is one way to spot such fakery (although by now I suspect most data-fakers know enough to build it into their fudging).
