We are all about marketing, data, analysis, innovation and technology

Tuesday, March 24, 2020

Understanding and Making Sense of the Coronavirus Pandemic Data in the US, Italy, and Worldwide.

It's nearly impossible to avoid the many articles, social media posts, graphs and newscasts of late that portray the United States as being on the same course as Italy regarding the COVID-19 pandemic. In reviewing these articles, I thought, “Wow, are we really that unprepared? Did we not buckle down early enough? After all, aren’t we the most successful and rich country in the world? Don’t we have the best medical care? How could this be?”

Then I decided to look at the evidence and the data myself in order to make my own decision. After all, I am a statistician and an experienced data analytics professional. So let's begin.

Did you know that the Coronavirus data is available to anyone?

First of all, in case you did not realize, anyone can download all the Coronavirus data from the European Centre for Disease Prevention and Control as an Excel spreadsheet. It is a time series data file beginning with the first occurrence in China in late December. The file gives the daily number of COVID-19 cases and the number of deaths for every country in the world. For this analysis, the downloaded data was augmented with data from other sources, including each country’s population and land mass (more to come on this).

Let’s answer some key questions, shall we?

What follows is a discussion of a few key questions that can be answered with the data. It is important to remember that each country is unique, so things like the availability of tests, regional practices of social distancing, and the use of medicines to mitigate symptoms all affect the data. That complicates making comparisons and drawing definitive conclusions.

Is the United States really on the same path as Italy?

One of the graphs circulating out there that concerned me the most is one that compares the United States to Italy.  This graph gives the impression that citizens in the United States are in for the same fate as the citizens of Italy in terms of the number of positive cases.  See below for the graph that has many worried, especially out on social media. This graph represents a replication of that chart using the same data.

The problem with this bar graph is that it is not scaled to the size of each country. Comparing Italy's cases to US cases without adjusting for the difference in population is very misleading. In fact, the US population is more than five times that of Italy, as seen in the population chart below. The US population is over 330 million while Italy's is just over 60 million.

Once we adjust this chart for differences in population sizes, the graph paints a totally different picture.

Another chart being disseminated online, including on social media, is the one seen below showing our cumulative case count in comparison to those of other countries, including Italy. Looking at this graph, it appears that we are on a doomed course compared to all other nations. Nothing could be further from the truth, as you will soon see.

Similar to the bar chart comparing the United States to Italy, this data is not represented on a scale relative to the population of the country. 

The same data is shown below, expressed as cases per 100,000 population.

When scaled appropriately, the United States compares very favorably with other countries.  Note the data is current as of March 21, 2020.

Both of these charts must be put in perspective. In general, you would expect countries with larger populations to have more cases, all other things being equal. But to present a narrative that we are on the same path as Italy is irresponsible. All that was required was an adjustment of the figures to represent cases on a per 100,000 population basis. I find it very alarming that some media are presenting the data in such an irresponsible manner.
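The adjustment itself is simple arithmetic. Here is a minimal Python sketch of it, assuming the rounded populations quoted above (330 million for the US, 60 million for Italy). The case counts are hypothetical placeholders chosen only to illustrate the scaling; they are not figures from the downloaded data file.

```python
def per_100k(count, population):
    """Scale a raw count to a rate per 100,000 residents."""
    return count / population * 100_000

# Rounded populations quoted in this post.
population = {"US": 330_000_000, "Italy": 60_000_000}

# Hypothetical cumulative case counts, for illustration only.
cases = {"US": 20_000, "Italy": 20_000}

for country, n in cases.items():
    rate = per_100k(n, population[country])
    print(f"{country}: {n:,} cases = {rate:.1f} per 100,000")
```

With identical raw counts, Italy's rate comes out 5.5 times the US rate, purely because of the difference in population. That is the effect the unscaled charts hide.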

Data Limitations—number of confirmed COVID 19 cases

Another major concern I have with the data being shown is how hard it is to accurately report and predict positive COVID-19 cases, and to compare countries on that basis. The key issues in making accurate "case" comparisons across countries are:
  • availability of testing kits for running tests  
  • access by every citizen to get to a testing facility
Because these two factors vary across countries, the number of cases could be understated until the testing kits or access to tests “catch up” with unconfirmed cases. In the United States, for example, testing kits were in short supply at the start, and there are many areas (as in any country) where individuals of lesser means may not have easy access to transportation to get to a testing facility, or the money to be tested.

As such, I have shifted to using COVID-19 deaths, rather than cases, as the metric. The charts below focus on this metric, with the baseline for each country being the day of its first death.
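Putting every country on a common baseline amounts to dropping the days before its first reported death. A minimal sketch of that step, assuming each country's data is a plain list of daily death counts in calendar order (the function name and the sample numbers are hypothetical, for illustration only):

```python
def align_to_first_death(daily_deaths):
    """Drop the leading days before the first reported death.

    Index 0 of the result is the day of the first death.
    """
    for i, deaths in enumerate(daily_deaths):
        if deaths > 0:
            return daily_deaths[i:]
    return []  # no deaths reported yet

# Hypothetical daily death counts by calendar day, for illustration only.
country_a = [0, 0, 0, 1, 0, 2, 3]
country_b = [0, 1, 1, 4, 6, 9, 12]

aligned_a = align_to_first_death(country_a)
aligned_b = align_to_first_death(country_b)
print(aligned_a)
print(aligned_b)
```

After alignment, position N in each list means "N days after that country's first death", so the curves can be compared on the same X axis even though the outbreaks started on different dates.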

So what do the mortality curves look like?

So, what do the incremental and cumulative death figures, on a per 100,000 population basis, look like for the US 21 days after the first occurrence? The mortality charts below show our data.
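For readers who want to reproduce the two kinds of curves, here is a rough sketch of how incremental and cumulative deaths per 100,000 could be computed from a list of daily deaths. The daily counts are hypothetical placeholders, and the 330 million population is the rounded US figure used earlier in this post.

```python
from itertools import accumulate

def mortality_per_100k(daily_deaths, population):
    """Return (incremental, cumulative) deaths per 100,000 residents."""
    incremental = [d / population * 100_000 for d in daily_deaths]
    cumulative = list(accumulate(incremental))  # running total
    return incremental, cumulative

# Hypothetical daily deaths since the first death, for illustration only.
daily_deaths = [1, 0, 2, 3, 4]
incremental, cumulative = mortality_per_100k(daily_deaths, 330_000_000)

for day, (inc, cum) in enumerate(zip(incremental, cumulative)):
    print(f"day {day}: incremental {inc:.5f}, cumulative {cum:.5f}")
```

The incremental series is what peaks and then declines; the cumulative series can only rise or stay level, which is why "flattening" is the signal to look for on that chart.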

Incremental US mortality rates:

Cumulative US mortality rates:

On their own, these charts are not overly meaningful.  So, what do the mortality curves look like for the other countries and in comparison to the US?

As the charts below reveal, at this time we are in a favorable position relative to other countries. These line charts compare the US with Canada, the Netherlands, Japan, France, South Korea, Italy and Iran, all relative to each country's population. But please keep in mind that anything can change at a moment’s notice. Nothing is constant here.

Incremental mortality rates:

Cumulative mortality rates:

What about China's Mortality Curves?

Unlike the US and the other nations, China has run its course and is much later in the life-cycle of the virus. The charts below show its peak at about day 34 on the incremental chart, and where the flattening begins on the cumulative chart.

Incremental China mortality rates:

Cumulative China mortality rates:

The life-cycle of the virus

It would take just one super spreader or a major breach in hygiene to totally change our trajectory. That is why tight controls are in place at the moment in the US, and why the President's team is reluctant to make predictions. We are still too early in the cycle. Anything is possible.

So the question is, how do we compare to the China virus graphs? Do we have another two months, two weeks, or two days to go? Where are we in the life-cycle of this thing?

To answer this question and assess where the United States and other countries are relative to China’s life-cycle, I have decided to overlay the “China incremental and cumulative mortality curves” on top of the prior two charts showing the same for the US and other countries.

NOTE: In the charts below, China's curve is not drawn to scale on the Y axis. Only its position along the X axis is meaningful, to show the time element of this virus.

Incremental mortality rates:

Cumulative mortality rates:

As these graphs show, China’s incremental deaths per day peaked about 34 days after the first reported death. After that point the cumulative curve begins to flatten. If the data maintains its current trends, it appears that Iran and France are also about to peak. As more data is reported we should know if this holds true.
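Locating a peak like China's day 34 can be sketched as finding the day with the highest incremental value. The series below is a hypothetical placeholder, not China's data, and the function name is illustrative.

```python
def peak_day(incremental):
    """Return the 0-based day index at which the incremental series is highest.

    If several days tie for the maximum, the earliest one is returned.
    """
    if not incremental:
        raise ValueError("series is empty")
    return max(range(len(incremental)), key=incremental.__getitem__)

# Hypothetical incremental deaths by day since the first death.
series = [1, 2, 5, 9, 14, 12, 8, 4, 2]
print(peak_day(series))  # 4
```

One caution: for a country whose curve is still rising, the highest value so far is simply the most recent day, so a peak can only be confirmed once several later days have come in lower. That is why the statement about Iran and France has to wait for more data.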

Italy’s virus life-cycle has not fully matured, nor has Spain’s, which is alarming given the steepness of their curves.

Differences by country—how has public policy impacted the depth and length of virus impact?

How did China get a hold on the virus so quickly? Why is Italy’s trajectory so steep? What did Japan and S. Korea do to keep their mortality curves relatively flat?

Below are just a few of many facts that point to these differences: 

  • First of all, we must remember that China has a totalitarian government. As such, it quickly imposed very strict enforcement on its citizens, closely monitoring their every movement and purchase. To fully understand the extent to which the government monitors its citizens, both now and prior to the virus, I advise you to read the article by the American Association for the Advancement of Science. Regardless, this tightening of control certainly helped China get the virus under control, and flatten its mortality curve, quickly.
  • S. Korea was quick to move based on its experience with the MERS virus several years back. This made it ready to scale quickly, as also reported by the American Association for the Advancement of Science. They even send text message reminders regarding hygiene to those who are "positive".
  • Italy and some of the other European nations have been criticized as being slow to respond. For more information on this, see the article by CNBC.

Does our mortality rate to date look favorable compared to other countries?
At this point in time the worldwide death rate of confirmed cases is at 4.4%.  This means of all confirmed cases, 4.4% result in death.  However, we know this number is overstating the rate since not all cases are being reported.  Why is that?
  • Many people do not have symptoms severe enough to cause them to go to the doctor to be tested;  
  • Some lack the means to be tested; and,
  • Some just do not like doctors. 
So, what is the real number?  2%?  3%?  We will never truly know. But we do know it is less than 4.4%.
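The reasoning above can be made concrete with a little arithmetic. This is only a sketch under a stated assumption: deaths are counted fully, while true infections are some multiple of confirmed cases. The undercount factors are hypothetical, since nobody knows the real one.

```python
def case_fatality_rate(deaths, confirmed_cases):
    """Deaths as a percentage of confirmed cases."""
    return deaths / confirmed_cases * 100

def adjusted_rate(observed_rate, undercount_factor):
    """Rate if true infections are `undercount_factor` times the confirmed cases.

    Assumes deaths are fully counted, which may not hold in practice.
    """
    if undercount_factor < 1:
        raise ValueError("undercount_factor must be at least 1")
    return observed_rate / undercount_factor

observed = 4.4  # worldwide rate quoted above, in percent
for factor in (1.0, 1.5, 2.0):  # hypothetical undercount factors
    print(f"{factor}x undercount -> {adjusted_rate(observed, factor):.2f}%")
```

If true infections were 1.5 times the confirmed count, the rate would be about 2.9%; at twice the confirmed count it would be 2.2%. Any undercount of cases pushes the real rate below 4.4%.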

For America, the death rate at day 21 (since our first case) is 1.27%. This is about 70% less than the worldwide average of 4.4% and among the lowest of all nations at the same point in time, as seen below.

But unfortunately, we will not end up this low.  We will end up higher than this when all is said and done. 

How do we know this?  

We know this based on data from other countries. As time progresses, the rate only increases. China, for example, had a death rate of 2.19% at day 21. At day 69 (the end of its cycle) its final death rate was 4.00%. This is an increase of 83% from day 21 to day 69. So, using this figure to index up our rate, we can project that our death rate will go from 1.27% at day 21 to 2.32% (1.27% × 1.83) at day 69. Again, this assumes it makes sense to use China's data to make US projections. But what else do we have to use?
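For anyone who wants to check the indexing arithmetic, here is a short sketch using only the three rates quoted above. The function name is illustrative, and the whole calculation rests on the assumption, already flagged, that China's day 21 to day 69 growth applies to the US.

```python
def project_rate(current_rate, reference_early, reference_final):
    """Scale `current_rate` by the reference country's early-to-final growth."""
    growth = reference_final / reference_early
    return current_rate * growth

china_day21, china_day69 = 2.19, 4.00  # percent, as quoted above
us_day21 = 1.27                        # percent, as quoted above

growth = china_day69 / china_day21
projected = project_rate(us_day21, china_day21, china_day69)

print(f"growth factor: {growth:.2f}")          # 1.83
print(f"projected US rate: {projected:.2f}%")  # 2.32%
```

Swapping in a different reference country, or a different pair of days, changes only the three input numbers.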

And keep in mind that, with this number, we could extrapolate the number of beds and respirators we might need going forward. That is a figure we definitely need in order to quickly get a handle on future demands on the health care system.

Are there other factors impacting our ability to make predictions?

As mentioned before, it is important to remember that each country is unique, so things like the availability of tests, regional practices of social distancing, and the use of medicines to mitigate symptoms all affect the data. That complicates making comparisons and drawing definitive conclusions.

Population density matters

Another major variable that affects the spread of the virus within any given country is the population density of that country or city.  The more dense the population, the more rapidly a virus can spread if tight controls are not imposed.   The table below shows the differences in the land mass for various countries relative to their population size.  Given this, one needs to applaud S. Korea and Japan for maintaining such a low occurrence and death rate. 

And, in case you did not realize, the population density of Manhattan, New York City's densest borough, is roughly 67,000 people per square mile. So now you understand all the concern from NY Governor Andrew Cuomo.
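The density comparison above is just population divided by land area. A minimal sketch follows; the two countries and their figures are entirely hypothetical, chosen only to show the calculation and the ranking, and are not the values from the table.

```python
def density(population, land_area_sq_mi):
    """People per square mile of land."""
    return population / land_area_sq_mi

# Hypothetical countries as (population, land area in square miles).
countries = {
    "Country A": (50_000_000, 40_000),
    "Country B": (300_000_000, 3_500_000),
}

# Rank from most to least dense.
ranked = sorted(countries, key=lambda c: density(*countries[c]), reverse=True)
for name in ranked:
    print(f"{name}: {density(*countries[name]):,.0f} people per sq mi")
```

Note that a national average can hide a great deal: a sparsely populated country can still contain a few extremely dense cities, which is where a virus would be expected to spread fastest.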

What about other factors?

The demographics and overall health of a population will also likely play a role in how quickly a virus can and will spread, and result in different mortality curves. Below is a table showing the smoker penetration, median age and overall health score for the various countries. As we can see, there are vast differences in these data by country. How that affects each country's mortality curve is hard to say at this time.

How does this virus compare to deaths caused by pneumonia and the flu?

To keep things in perspective it is important to remember that almost 60,000 Americans die every year due to the flu and pneumonia combined. That is an astonishing number. The Coronavirus, worst case, will most likely take the lives of around 6,000 Americans (assuming no changes in trends from what we are observing today). 


In summary, I think we can all agree that America is doing a good job at keeping this pandemic under control.  All the measures put in place appear to be working.  Within another week, as more data becomes available, we should be able to determine our fate.  But so far we are looking good.

So, let's keep doing what we are doing. We are almost there. We have almost made it. Let's keep maintaining our social distance, limiting our outside activities, washing our hands, and staying safe and healthy.

When will the next update be?

We plan to update this report on Monday, March 30th. At that point we should have a good sense of where we are headed, what our true needs will be, and whether any other data has shifted.

To your health,

Perry D. Drake, PhD
     and Rhonda Knehans-Drake


