One of my favorite mathematical thingamajigs is Benford’s Law, the weirdly counter-intuitive finding about which digits are most likely to appear in most data sets.
Basically, it says that in many naturally occurring collections of numbers, no matter how random they seem, the first digit is going to be 1 about 30% of the time, 2 about 18% of the time, 3 about 12% of the time and so on down to 9, which is the first digit about 5% of the time. This pattern is found in collections of data ranging from population statistics to heights of buildings to rainfall totals – a wide variety of naturally occurring numbers, particularly those that span several orders of magnitude. The pattern was famously noted back in the days when tables of logarithms were printed in books; in library copies, the front pages, which covered numbers starting with 1, were much more worn than those in the back.
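For anyone who wants the exact figures behind those rounded percentages, here is a minimal sketch. It uses the standard statement of Benford’s Law, that the chance of a leading digit d is log10(1 + 1/d); the function name is just for illustration.

```python
import math

def benford_expected():
    """Return {digit: probability} for leading digits 1-9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Print the expected share of each leading digit as a percentage.
for digit, p in benford_expected().items():
    print(f"{digit}: {p:.1%}")
```

The nine probabilities add up to 1, and they fall steadily from about 30% for a leading 1 to under 5% for a leading 9.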
There’s a long Wikipedia article about it, if you want to know more. If you’re wondering why this happens, there’s no good quick-and-easy explanation.
I bring this up because Benford’s Law can be used to spot faked data: If the numbers in a data set don’t fit the expected pattern – only 15% of them start with a 1, perhaps – at least some have probably been made up. Benford’s Law has become a standard analysis tool in financial fraud cases. And now a Venezuelan researcher has tried it out on nationally reported COVID-19 data and found some numbers which seem suspicious. This is from the abstract, which reads like it was machine-translated into English:
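The basic check is easy to sketch: tally the first digit of every number in a data set and compare the shares with what Benford’s Law predicts. The snippet below is only an illustration; the `reported` list is made up, far too short to prove anything, and has nothing to do with the researcher’s actual data or method.

```python
from collections import Counter
import math

def leading_digit(x):
    """First significant digit of a nonzero number, e.g. 0.0042 -> 4, 1830 -> 1."""
    # Strip any leading zeros and the decimal point, then take the first character.
    return int(str(abs(x)).lstrip("0.")[0])

def first_digit_shares(values):
    """Share of values whose leading digit is 1, 2, ..., 9 (zeros are skipped)."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}

# Hypothetical daily case counts, invented purely for this example.
reported = [12, 19, 31, 48, 77, 120, 190, 305, 480, 770, 1210, 1900]
shares = first_digit_shares(reported)

for d in range(1, 10):
    expected = math.log10(1 + 1 / d)
    print(f"digit {d}: observed {shares[d]:.1%}, Benford {expected:.1%}")
```

A big gap between the two columns is a reason to look closer, not proof of fakery; small samples and numbers confined to a narrow range can miss the pattern for perfectly innocent reasons.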
The results indicated that results from Italy, Portugal, Netherlands, United Kingdom, Denmark, Belgium and Chile are suspicions of data manipulation because the numbers fail the Benford’s Law according to the results obtained until April 30, 2020.
The temptation to fudge COVID-19 data is huge, given the financial cost of bad results. That’s why so many people were alarmed when the Trump Administration, which has a terrible record in dealing fairly with unpleasant facts, took the COVID-19 reporting task away from the Centers for Disease Control. Benford’s Law is one way to spot any fakery (although by now I suspect that most data-fakers know enough to incorporate it in their fudging).
Dave, I love that you brought up Benford’s Law.
Coincidentally just last evening I was watching Netflix’s series called “Connected.” No. 4 is DIGITS and is all about Benford’s Law and the variety of natural (and unnatural lol like politics) things that all follow the same rule.
Amazing, disconcerting, and now my favorite law too. So wonderful you mention it in connection with COVID-19.
I also saw that episode of Connected! Fascinating! The first thing I thought of was “I wonder if anyone has analyzed COVID-19 data with Benford’s Law” so I started googling and here I am!
Literally also here because of the show called Connected lol…
Haha same here! Very good show.
What! This is actually a thing. If it wasn’t for Connected, I would not have known about this.
Yep! I’m here too because of Connected! Watching it right now actually, haha!
As an academic and social liberal, I immediately began compiling COVID data to test my theory that the data being released by the White House was less than accurate. I am still working on numerous data subsets, but what I can tell you is that I am currently shocked by the results I have seen to date. The data being released by the Trump administration seems to follow Benford’s Law on a national level. Every large Democratic-run state that I have looked at so far has NOT. This includes NY, NJ & CA. The 3 Republican-run states that I analyzed follow Benford’s Law: OH, FL, GA. Obviously this has me confused, curious, and even a bit concerned. Are Democratic governors exaggerating this disease? Maybe COVID just doesn’t follow Benford’s Law. At this point I’m unsure but rest assured I am not a Russian bot, Trump supporter OR attempting to troll anyone. I’m only sharing my findings to date.
Hi Kelly – I wonder if this could explain your findings? Seems it would apply since the states you cited (NY, NJ, CA) have dramatically flattened their COVID curves… https://www.sciencedirect.com/science/article/pii/S0378437120305719
Covid numbers only follow Benford’s law when they are increasing, that is, when proper measures are not taken or when those measures are not followed. When the covid numbers stop growing exponentially, or when they start to decline, because of effective measures and people adhering to those rules, the covid numbers no longer follow Benford’s law. You have to check if that’s what you see! … that Republican states are less able to fight the disease
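A toy comparison shows why the shape of the curve matters. The sketch below uses two invented series, one growing 10% a day for 200 days and one stuck on a plateau in the 900s; neither is real COVID data, and the growth rate and plateau level are arbitrary assumptions.

```python
from collections import Counter

def leading_digit(x):
    """First significant digit of a nonzero number."""
    return int(str(abs(x)).lstrip("0.")[0])

def first_digit_shares(values):
    """Share of values whose leading digit is 1, 2, ..., 9."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}

# Toy series 1: unchecked exponential growth, spanning about eight orders of magnitude.
growing = [1.1 ** day for day in range(200)]

# Toy series 2: a flattened curve that hovers between 900 and 989 every day.
flattened = [900 + (day * 7) % 90 for day in range(200)]

growing_shares = first_digit_shares(growing)
flattened_shares = first_digit_shares(flattened)

print("share starting with 1, growing series:  ", round(growing_shares[1], 3))
print("share starting with 1, flattened series:", round(flattened_shares[1], 3))
print("share starting with 9, flattened series:", round(flattened_shares[9], 3))
```

The growing series lands close to the 30% that Benford’s Law predicts for a leading 1, while every value in the flattened series starts with 9. A failed Benford check can therefore reflect a curve that has levelled off rather than anyone cooking the books.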
Hi Kelly, would you be so kind as to share a bit about your findings? Is the data you’re using the deaths by month, or number of cases by month, etc.? I’m extremely interested in this research. Thanks much! My email is dgumie where it’s never cold. thanks
Simple observation – humans, animals, nature, the earth 🌎, the universe, we are all connected.
Why is that so difficult to comprehend?
It’s easy to comprehend – everybody knows it. Doesn’t have much to do with this topic, however.
Where should I send your Nobel to?
I read that the law works better if the numbers in question span several orders of magnitude. Do you mind throwing some light on which numbers in the COVID-19 reporting this test could possibly have been applied to?
I guess statistics like daily deaths or daily reported cases would not be very different (in terms of order) than they were a month ago, let’s say. Also, these numbers would tend to follow an increasing (and eventually a decreasing) pattern.
All the way from ZA and the show Connected also piqued my interest in Benford’s Law. My thoughts are exactly the same: what if data is being manipulated because we all now know about this theory..? Will we ever know the truth? I was trying to analyze China’s data and it also doesn’t seem to ‘fit’ the theory…? Very interesting stuff. I hope the world is as transparent as we need it to be, to keep proving this amazing Theory.
I want to write Benford’s Law on my walls. So many astonishing things in the universe but this (also seen on the Netflix show Connected) is number 1 to me. That would be about 30%, right? What a treasured find.
Also, connected from the Netflix series. It looks like we’re all on the same page. Thanks for writing this out!
I checked coronavirus deaths by US county data as of 8/6/20 for fit to Benford. I found the data does not fit based on a Goodness of Fit chi-square test.
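For anyone wanting to try a similar test, here is a rough sketch of a chi-square goodness-of-fit check against Benford’s Law. It is not the commenter’s actual calculation: the counts below are invented, and the cutoff of 15.507 is the standard 5% critical value for 8 degrees of freedom (nine digits minus one).

```python
import math

# Expected share of leading digits 1..9 under Benford's Law.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Chi-square critical value for 8 degrees of freedom at the 5% level.
CRITICAL_5PCT_DF8 = 15.507

def benford_chi_square(counts):
    """Chi-square statistic for 9 observed leading-digit counts (digits 1..9)."""
    n = sum(counts)
    return sum((obs - n * p) ** 2 / (n * p) for obs, p in zip(counts, BENFORD))

# Hypothetical counts: one set close to Benford, one spread evenly across digits.
close_to_benford = [301, 176, 125, 97, 79, 67, 58, 51, 46]
evenly_spread = [100] * 9

for label, counts in [("close", close_to_benford), ("even", evenly_spread)]:
    stat = benford_chi_square(counts)
    verdict = "does not fit" if stat > CRITICAL_5PCT_DF8 else "consistent with"
    print(f"{label}: chi-square = {stat:.2f}, {verdict} Benford at the 5% level")
```

One caveat: with thousands of data points, such as every US county, a chi-square test will flag even small departures from the expected shares, so a "does not fit" result is worth a second look before drawing conclusions.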
Amazing! Probably the number of people who write on this post because of that Netflix show will follow Benford’s Law. I’m here because of that show as well. Then I googled covid-Benford’s law🙂
Ha!
Ha! I am here because of the series as well. Looks like we are all “Connected” now.
you see what I did there….hee he hee.
Looking at the data set a little differently, I plotted and rationalized the following columns from worldometers.info for US states
total cases(1) deaths(2) active cases(3); then combined deaths + active cases(4)
Plotting (1) vs (4) I find Texas far out of line, especially over time.
Is Texas withholding data? I have seen Texas health scientists say in interviews that deaths are not faithfully recorded.
COVID-19, flattening the curve, and Benford’s law.
Please see the link below:
https://www.researchgate.net/publication/343657736_COVID-19_flattening_the_curve_and_Benford's_law
So everyone here IS here because they saw the Digits episode and wondered how it connected to COVID. So what pattern do we all share to conform to it?
What are you people talking about… Benford’s law totally works. Now the real question is overlapping test results, but yes, it still works
David,
I saw Connected too. I’m also responsible for tracking our COVID cases at my hospital.
I thought I would find it in Medical Record Numbers…but no. Those numbers are computationally assigned. In fact, I was surprised to learn that the pattern didn’t show any time that I included any numbers at all that were auto-assigned by a computer.
The only numbers that worked, that followed the pattern organically, were the “seemingly random” or even “insignificant” data. Like birth month, day of birth, but then…the street number of each patient address.
The pattern was most reliable applied to the “randomness” of the street addresses. When I looked at each of our patients, I found it in the data sets of…
All COVID test records (even when I include duplicated tests); PCR-only COVID test records; In COVID-positive cases; In unique patients tested; then when pulling all patients who walked in to the hospital during that same time frame.
My graphing also has a dip like yours in “7”… but also in “9”. Nevertheless, this is more than enough for me to start a research project. If we can use this for verification, how can we apply Benford’s law to prevention methods and as a starting point for the health task forces that have formed in all of our cities…
*my brain is like a million miles an hour with this*
Randomly assigned digits do not follow Benford’s Law, so it looks like you’re analyzing it correctly!
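A quick way to see this for yourself is to run the first-digit tally on a block of sequentially assigned numbers. The sketch below assumes hypothetical record numbers running from 1 to 9999; real medical record numbering schemes will differ.

```python
from collections import Counter

def first_digit_shares(values):
    """Share of positive integers whose leading digit is 1, 2, ..., 9."""
    digits = [int(str(v)[0]) for v in values]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}

# Hypothetical auto-assigned record numbers: 1, 2, 3, ... 9999.
record_numbers = range(1, 10000)
id_shares = first_digit_shares(record_numbers)

for d in range(1, 10):
    print(f"digit {d}: {id_shares[d]:.1%}")
```

Every digit comes out at the same share, one ninth, instead of the falling curve from 30% down to 5% that Benford’s Law predicts.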
Benford’s law reminds me of another phenomenon in nature called the Fibonacci sequence, where the petals on flowers only come in certain numbers for various plants. In the number pattern, each number is the sum of the previous two: 3, 5, 8, 13, 21, etc. It is also found in other patterns in nature. Kind of strange how so much that seems totally random is not.
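The two ideas are actually linked: the leading digits of the Fibonacci numbers themselves are known to follow Benford’s Law. Here is a small sketch to check that on the first 1,000 Fibonacci numbers; the cutoff of 1,000 is an arbitrary choice for illustration.

```python
from collections import Counter
import math

def fibonacci(n):
    """Return the first n Fibonacci numbers, starting 1, 1, 2, 3, 5, ..."""
    numbers = []
    a, b = 1, 1
    for _ in range(n):
        numbers.append(a)
        a, b = b, a + b
    return numbers

fibs = fibonacci(1000)

# Tally the leading digit of each Fibonacci number.
counts = Counter(int(str(f)[0]) for f in fibs)
fib_shares = {d: counts[d] / len(fibs) for d in range(1, 10)}

for d in range(1, 10):
    expected = math.log10(1 + 1 / d)
    print(f"digit {d}: Fibonacci {fib_shares[d]:.1%}, Benford {expected:.1%}")
```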