We are all about marketing, data, analysis, innovation and technology

Tuesday, March 31, 2020

Understanding and Making Sense of the Coronavirus Pandemic Data in the US, Italy, and Worldwide. (3/31/20 update)

As a follow-up to the analysis issued on March 24, which compared the United States to other countries and reviewed cases and deaths on a per 100,000 basis, the analysis that follows updates last week’s charts and explores new questions raised in the past week.

The United States is approximately 30 days into the pandemic, and one can scarcely escape news and cautions about the coronavirus. Even as the nation is admonished to practice social distancing, wash hands, and avoid touching faces, it is still possible to find individuals who do not heed these warnings. The entire country is learning about data and the exponential growth of cases, and wondering whether the shutdown of businesses, home schooling, and work-from-home initiatives are effective and how long they will last.

Meanwhile, some wonder if all of these efforts are worth it. Some wonder how the coronavirus compares to the flu or other pandemics such as H1N1, and why the measures taken now were not taken in the past for other flus or outbreaks.

In the midst of all of these questions, and useful as the data is to aid understanding, it is critical for each of us to recognize that the data sanitizes the reality that each death represents a person. Each person has a connection to a family, and each family grieves the loss of a loved one (maybe multiple loved ones) who fell victim to the virus. The data is merely a useful tool to address the problem. It can hopefully aid understanding, bring about solutions, or change public behavior to save lives.

Updated curves—what do they show us?

Most people, whether compliant citizens or those considered essential workers, are keenly interested in knowing where we are on the curve. The question is, what are we looking for on the curves? The answer is simple. We are looking for a sustained slowdown in the number of deaths (the mortality rate). You may have discerned that this is a huge challenge for a country like the United States, since our country is a republic in which each state has the right to set rules within its borders. We have seen Rhode Island request that incoming visitors from New York quarantine, and Florida request that visitors from New York self-quarantine. This ultimately resulted in the Centers for Disease Control and Prevention issuing a travel advisory for individuals from New York.

In the past week, New York City has become recognized as the epicenter for the coronavirus. The question is, where is New York on the curve, as the “canary in the coal mine” for the entire country?

What is happening at a national level?

The updated curves from our initial post last week for the United States versus selected other countries are shown below. As you will see on a national level, the curves have not changed much since last week. Following the updated linear curves, scaled to population, you will see logarithmically scaled curves.

The chart below plots the same curves on a logarithmic scale.

A logarithmic curve will show more clearly where the death rate begins to slow. Note here that the curve for South Korea stops climbing so steeply at day 15. Other countries, such as Italy and Iran, also appear to be slowing. However, that is not the case for the United States, France, or Spain.

What can we see at a more granular level within our country?

The chart below shows the difference in how the states of New York and Washington have been impacted versus the United States as a whole. New York City and Seattle were the first cities to have cases of coronavirus. The very high population density of New York City has fueled the spread of the virus. In Washington, the virus took hold in a nursing home and was spread by workers from one nursing home to another.

Is the curve flattening? Is the mortality rate slowing down?

In the initial post of this series we discussed the use of a logarithmic scale. A logarithmic scale is one that scales on multiples of 10, with each increment 10 times higher than the one before. Such scales are helpful in some respects for epidemiologic functions, since each infected individual can pass along the infection to multiple others. Thus, the function is not linear; it is exponential. That being said, our previous charts were normalized on a per 100,000 basis and presented linearly. This was done because we felt that the logarithmic scale was overstating what was happening relative to the infection rate. The New York Times did a good job of explaining this relationship. As previously mentioned, a logarithmic scale will emphasize the point at which the growth rate slows by showing a flattening of the curve (more horizontal) rather than a steep (nearly vertical) slope.

Much discussion online has focused on where the inflection point is, which would show us where we are turning the corner toward decreasing cases. Viewing the data in this way, it appears that New York is nearly at the point where the curve bends toward horizontal, which would indicate that the death rate is decreasing incrementally. Washington State seems to already be past that point.
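
To make the difference between the two scales concrete, here is a minimal Python sketch. The starting count of 10 and the three-day doubling time are made-up values chosen for illustration; they are not taken from our data.

```python
import math

# Hypothetical series: 10 deaths on day 0, doubling every 3 days (assumed values).
start, doubling_days = 10, 3
days = list(range(0, 16, doubling_days))
counts = [start * 2 ** (d // doubling_days) for d in days]

# On a linear scale, the gap between successive points keeps growing.
linear_steps = [b - a for a, b in zip(counts, counts[1:])]

# On a log10 scale, every step has the same size, so steady exponential
# growth plots as a straight line and any slowdown shows up as a bend.
log_counts = [math.log10(c) for c in counts]
log_steps = [round(b - a, 6) for a, b in zip(log_counts, log_counts[1:])]

print("counts:      ", counts)
print("linear steps:", linear_steps)
print("log10 steps: ", log_steps)
```

Under these assumptions the linear steps double each time while the log steps stay constant, which is why a flattening on the logarithmic chart signals a genuine slowdown in growth.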

The United States is still seeing a climbing death rate as hot spots continue to emerge around the country.

What is it that the United States is trying to achieve? How will we know when we are there?

The United States is striving to see a decrease in the incremental daily mortality rate. The incremental mortality rate is the number of deaths each day, and the goal is to see, for example, fewer deaths today than yesterday. In addition, the United States coronavirus task force would like to see this relationship (fewer deaths than the previous day) hold for several days in succession, so we can be sure it is a legitimate trend and not just a random blip in the data.

The United States took note of the handling of the virus by South Korea. South Korea really ramped up testing and became informed about patterns, who the infectors were, and how to handle localized containment of the virus. Of course, even though there are positive lessons to learn from South Korea’s handling of the virus, it is important to remember that South Korea is a country with a smaller landmass and a smaller population.
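
The check described above can be sketched in a few lines of Python. The function names, the three-day window, and the cumulative totals below are our own illustrative assumptions; the task force has not published a specific rule.

```python
def daily_from_cumulative(cumulative):
    """Convert cumulative death totals into incremental (daily) deaths."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def has_sustained_decline(daily, run_length=3):
    """Return True if each of the last `run_length` days had fewer deaths
    than the day before it. The default of 3 days is an assumption."""
    if len(daily) < run_length + 1:
        return False
    recent = daily[-(run_length + 1):]
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))


# Made-up cumulative totals, for illustration only.
cumulative = [100, 150, 220, 300, 370, 430, 480]
daily = daily_from_cumulative(cumulative)

print("daily deaths:", daily)
print("sustained decline:", has_sustained_decline(daily, run_length=3))
```

A single lower day would not pass this test; only an unbroken run of declines does, which is the point of waiting several days before calling it a trend.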

Below is a side by side comparison of the United States mortality curve compared to South Korea both in linear and logarithmic scales based on cases per 100,000 population.

Linear Scale:

Logarithmic Scale:

Note that the curve for South Korea is flattening, but the United States curve is still increasing.

What are the challenges for the United States to drive the mortality rates down? Can we hope to achieve what South Korea has achieved?

The United States recognized that more testing was a key factor in getting a handle on the infections and being able to predict hot spots. New York has emerged as an epicenter, but other hot spots are also emerging: New Orleans, Florida, and Maryland, to name a few. Why is this happening? Put bluntly, in the United States, stay-at-home directives and social distancing are not being met with a high enough degree of compliance.

Spring Break partiers in Florida were clogging the beaches and many of the young revelers left South Florida only to return to other parts of the country to spread the virus.
In New Orleans, Mardi Gras went on this year as it does every year. In retrospect, many think that perhaps it should have been cancelled.

In Maryland, the timeline appears similar to that of many places across the United States. Given the population density, however, public policy geared to closing restaurants and bars and limiting the size of gatherings was slow-walked and simply did not take effect quickly enough, and now the state is emerging as a hot spot. Maryland’s first cases were reported on February 28, but orders to restrict gathering sizes were not made until March 12. Also, the virus has hit one particular nursing home in Maryland very hard.

In contrast, South Korea has employed government electronic surveillance to predict the movement of the virus, to monitor compliance with quarantine rules, and to enforce those rules when they are violated. Enforcement comes by way of a strict 14-day quarantine for those found to have the virus, as well as for those re-entering the country. There is a zero-tolerance policy for infractions: foreign nationals who do not comply could face deportation, and South Korean citizens could face arrest and financial penalties. It should be noted that, according to the cited article, there were 11 known people who violated the quarantine in South Korea between March 13 and March 24.

The last sentence was emphasized with good reason. Eleven people violating the rules seems like a very modest number, and in our reality it may seem sort of silly that such measures (arrest and penalties) would be enforced. After all, if it was only eleven people, what is the harm? Let’s look at the data.

If each of the 11 original violators (the first generation) interacted with only two people, and each of the newly infected people, who would not show symptoms initially, in turn interacted with only two people, and this pattern carried on through seven generations, ultimately 1,397 people would be infected with the virus. See the snapshot of our Excel spreadsheet below.

The assumptions of this scenario are very conservative, since they are based on each infected individual being in contact with only two others. Think about other scenarios, such as family get-togethers, weddings, concerts, church services, and public places, where the contact rates would be much higher. This is where countries that are willing to do things like surveil their citizens to manage this crisis have a kind of advantage, because they can get much closer to 100% compliance than the United States can. It is debatable whether our culture in the United States would tolerate electronic monitoring and data gathering of its citizenry, even for a purpose as noble as the defeat of a pandemic. Or would we?
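
For readers who would rather check the arithmetic than read a spreadsheet, the same scenario can be sketched in Python. The function name is ours, and the sketch assumes the 11 original violators count as the first of the seven generations, which is the reading that reproduces the 1,397 figure.

```python
def total_infected(initial, contacts_each, generations):
    """Sum the people infected across all generations, where each person
    passes the virus to `contacts_each` new people in the next generation.
    The initial group is counted as generation 1 (an assumption)."""
    total, current = 0, initial
    for _ in range(generations):
        total += current
        current *= contacts_each
    return total


print(total_infected(initial=11, contacts_each=2, generations=7))  # 1397
```

Changing `contacts_each` from 2 to 3 in the same call raises the total to 12,023, which shows how sensitive the outcome is to the contact rate.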

Are we seeing mixed messages from various leaders in the United States coronavirus task force?

Maybe we are getting mixed messages. In confusing and frightening times, it is human nature to hear the message that is most pleasing or resonates the most with you. We have some leaders (such as Dr. Fauci and Dr. Birx) who are very cautious and have predicted 200,000 deaths from the virus. We have others (primarily the president and other politicians) giving us a much more optimistic forecast with respect to the future of the country and when we can get back to “normal”. What is going on here? Who is right? Where will we end up?

In assessing the current set of solutions and resources at the disposal of the United States, we have a fixed and known number of ventilators and ICU beds. We also have a fixed and known number of doctors, personal protective equipment (PPE), emergency medical staff, etc. Epidemiologists have studied the virus, they have reviewed data from all countries that have faced this foe before, and they understand the nature of the spread as well as the nature of the affliction in infected patients. In the United States we have an assembled task force and an entire agency, the CDC, ready to make predictions, and many are asking, whom do I believe? Do I believe there will be up to 200,000 deaths associated with this virus in our country? Is that a reasonable expectation? How did the scientists arrive at this number?

While we are not inside the minds of Doctors Fauci and Birx to know precisely how they arrived at their recently publicized estimate of fatalities, we can assume that, as scientists, they based these numbers on facts and epidemiological models. But as a data scientist, it is clear to me that every model has limitations in the variables it cannot take into account, so no model is perfect. To provide a worst-case scenario, we took the fatality rate seen by Italy (the country that has lost the most souls) and applied it to the US population. Italy has a population of approximately 60.5 million; the United States has a population of about 331 million. This makes the United States population about 5.5 times larger than Italy’s. Our worst-case expectation is that approximately 55,000 citizens would succumb to the virus, not 200,000.

In other words,

  • Italy has a population of 60.46 million. As of March 29, there were 10,023 deaths for a rate of .00016577.
  • Applying this rate to the population of the United States of 331,000,000 we would realize an ultimate number of deaths of 54,871. While this number is wholly unacceptable and shockingly large, it is still only a little over a quarter (about 27%) of what Dr. Birx and Dr. Fauci have been projecting in the last couple of days.
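
The two bullets above amount to a single proportion, sketched here in Python using the figures quoted in this post. The variable names are ours, and the result differs from the spreadsheet figure by a death or two depending on how the rate is rounded.

```python
# Figures quoted in the post (Italy as of March 29).
italy_population = 60_460_000
italy_deaths = 10_023
us_population = 331_000_000
task_force_estimate = 200_000

# Deaths per person in Italy, then scaled to the US population.
italy_rate = italy_deaths / italy_population
us_projection = italy_rate * us_population
share_of_estimate = us_projection / task_force_estimate

print(f"Italy death rate:         {italy_rate:.8f}")
print(f"US worst-case projection: {us_projection:,.0f}")
print(f"Share of 200,000:         {share_of_estimate:.0%}")
```

This is a simple scaling exercise, not an epidemiological model: it assumes the United States ends up with the same per-person death rate that Italy had reached on March 29.
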
As a counterpoint to the view of the scientists, who are dealing with the facts as we know them now and the fixed resources that are counted and in place, we have a president with aspirations. The president is being optimistic and looking ahead to a time when some of the fixed resources have been supplemented. Perhaps our president is overly optimistic. He is looking at the problem from many angles, with an eye to breaking the problem down with more resources. The president hopes to bring down the deaths by:
  • Producing more ventilators, so that a lack of resources in this regard does not contribute to deaths;
  • Reducing regulations to speed vaccines and other medicines, thus curing more people who might die were these medicines not available; and
  • Imposing travel advisories and talking up the virtues of the citizens of the United States, appealing to our sense of right and wrong to stay home and thus limiting the spread of the disease by non-compliant citizens.
We have no idea what kind of reduction in deaths we would see from increased production of ventilators or increased production of medicines. These are precisely the kind of variables not included in current models. The one thing the model can account for, however, is how compliance would impact the spread of the disease. This is a purely mathematical relationship.

The central issue is that we need, as a country, to see that EVERYONE complies with the stay-at-home order, or it just doesn’t work. Some people just don’t want to sacrifice a little to achieve a goal. This is the crux of the problem, and as a free country, we have difficulty taking away another citizen’s freedoms.

When we hear from our leadership that we all have a role to play, it is true. For many, the role is simply to stay home. Don’t connect with others in person, no matter how inconsequential you think it is. Call on the phone, or use Skype or FaceTime. That is OK. But person-to-person contact is NOT OK.

Our next update will be on April 7th.

Perry D. Drake, PhD 

    and Rhonda Knehans-Drake

Tuesday, March 24, 2020

Understanding and Making Sense of the Coronavirus Pandemic Data in the US, Italy, and Worldwide.

It's nearly impossible to avoid the many articles, social media posts, graphs and newscasts as of late that portray the United States as being on the same course as Italy regarding the COVID 19 pandemic.  In reviewing these articles, I thought, “Wow, are we really that unprepared?  Did we not buckle down early enough?  After all, aren’t we the most successful and rich country in the world?  Don’t we have the best medical care?  How could this be?”  

Then I decided to take a look at the evidence and the data myself in order to make my own decision.  After all, I am a statistician and experienced data analytic professional. So let's begin.

Did you know that the Coronavirus data is available to anyone?

First of all, in case you did not realize, anyone can download all the Coronavirus data from the European Centre for Disease Prevention and Control in an Excel spreadsheet.  It is a time series data file beginning with the first occurrence in China in late December.  This data file gives you the daily number of COVID occurrences and the number of deaths by country for every country in the world.   The downloadable data used in this analysis was additionally augmented with data from other sources, including each country’s population and land mass (more to come on this).

Let’s answer some key questions, shall we?

What follows is a discussion of a few key questions that can be answered with the data.  It is important to remember that each country is unique to itself, so things like the availability of tests, regional practices of social distancing, usage of medicines to mitigate symptoms all impact the data, thus complicate making comparisons and drawing definitive conclusions.

Is the United States really on the same path as Italy?

One of the graphs circulating out there that concerned me the most is one that compares the United States to Italy.  This graph gives the impression that citizens in the United States are in for the same fate as the citizens of Italy in terms of the number of positive cases.  See below for the graph that has many worried, especially out on social media. This graph represents a replication of that chart using the same data.

The problem with this bar graph is that it is not scaled appropriately based on the size of each country. To compare Italy’s cases to US cases without adjusting for our population differences is very misleading.  In fact, the US population is more than five times the size of Italy’s, as seen in the population chart below.  The US population is over 330 million while Italy’s is just over 60 million.

Once we adjust this chart for differences in population sizes, the graph paints a totally different picture.

Another chart being disseminated online, including on social media, is the one seen below showing our cumulative case rate in comparison to other countries, including Italy.  As one looks at this graph it appears that we are on a doomed course compared to all other nations.  Nothing could be further from the truth, as you will soon see.

Similar to the bar chart comparing the United States to Italy, this data is not represented on a scale relative to the population of the country. 

The same data is shown below represented on cases per 100,000 population.

When scaled appropriately, the United States compares very favorably with other countries.  Note the data is current as of March 21, 2020.
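
The adjustment itself is a one-line calculation, sketched below in Python. The populations are the approximate figures used in this series; the case counts are placeholders set equal on purpose and are not taken from the ECDC file, so only the mechanics should be read from this example.

```python
def per_100k(count, population):
    """Express a raw count as a rate per 100,000 population."""
    return count / population * 100_000


populations = {"United States": 331_000_000, "Italy": 60_460_000}

# Placeholder case counts, deliberately identical (NOT real data), to show
# how the same raw number means very different things in each country.
cases = {"United States": 20_000, "Italy": 20_000}

rates = {country: per_100k(cases[country], populations[country])
         for country in cases}

for country, rate in rates.items():
    print(f"{country}: {rate:.1f} cases per 100,000")
```

With identical raw counts, the smaller country's rate comes out more than five times higher, which is exactly the distortion the unscaled charts hide.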

Both of these charts must be put in perspective.  In general, you would expect countries with larger population to have more cases, all other things being equal.  But to present a narrative that we are on the same path as Italy, is irresponsible.  All that was required was an adjustment of the figures to represent cases on a per 100,000 population basis.  I find it very alarming that some media are presenting the data in such an irresponsible manner.

Data Limitations—number of confirmed COVID 19 cases

Another major concern I have with respect to the data being shown is the issue of accurately trying to show and predict positive COVID 19 cases and make comparisons between countries based on case data.  The key issues in terms of making accurate "case" comparisons across countries are:
  • availability of testing kits for running tests  
  • access by every citizen to get to a testing facility
Because the above two factors can vary across countries, the number of cases could be understated until the testing kits or access to tests “catch up” with unconfirmed cases.  In the United States, for example, testing kits were in short supply at the start, and there are many areas (as in any country) where individuals of lesser means may not have easy access to transportation to get to a testing facility or the monetary means to be tested.

As such, I have shifted to use the metric related to COVID 19 mortalities or deaths and not cases.  The charts below focus on this metric, with the baseline being at the first death for each country. 
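
Setting the baseline at each country's first death is a simple re-indexing step, sketched here in Python. The function name and the two short series are hypothetical, made up only to show the alignment, and whether the first death is labeled day 0 or day 1 is a charting choice.

```python
def align_to_first_death(daily_deaths):
    """Drop the leading zero-death days so that index 0 is the day of the
    first reported death. Returns an empty list if there are no deaths."""
    for i, deaths in enumerate(daily_deaths):
        if deaths > 0:
            return daily_deaths[i:]
    return []


# Hypothetical daily death counts by calendar day (NOT real data).
series = {
    "Country A": [0, 0, 0, 1, 2, 4],
    "Country B": [0, 1, 1, 3, 5, 8],
}

aligned = {name: align_to_first_death(values)
           for name, values in series.items()}

for name, values in aligned.items():
    print(name, values)
```

Once every series starts at its own first death, the curves can be overlaid to compare countries at the same stage of the outbreak rather than on the same calendar date.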

So what do the mortality curves look like?

So, what do the incremental and cumulative death figures on a per 100,000 population basis look like 21 days out from the first occurrence for the US?  The mortality charts below show our data.

Incremental US mortality rates:

Cumulative US mortality rates:

On their own, these charts are not overly meaningful.  So, what do the mortality curves look like for the other countries and in comparison to the US?

As the charts below reveal, at this time, we are in a favorable position relative to other countries.  These line charts compare the US with Canada, Netherlands, Japan, France, South Korea, Italy and Iran, all relative to each country’s population.  But, please keep in mind, anything can change at a moment’s notice.  Nothing is constant here.

Incremental mortality rates:

Cumulative mortality rates:

What about China's Mortality Curves?

Unlike the US and the other nations, China’s outbreak has run its course, and the country is much later in the life-cycle of the virus.  The charts below show its peak at about day 34 on the incremental chart and where the flattening begins on the cumulative chart.

Incremental China mortality rates:

Cumulative China mortality rates:

The life-cycle of the virus

It would take just one super spreader or a major breach in hygiene to totally change our trajectory.  That is why the tight controls are in place at the moment in the US.  And, why the President's team is reluctant to make predictions.  We are still just too early in the cycle.  Anything is possible.

So the question is, how do we compare to China’s graphs?  Do we have another two months, two weeks, or two days to go?  Where are we in the life-cycle of this thing?

To answer this question and assess where the United States and other countries are relative to China’s life-cycle, I have decided to overlay the “China incremental and cumulative mortality curves” on top of the prior two charts showing the same for the US and other countries.

NOTE:  When examining the charts below, the Y axis is not to scale for China but only the X axis to understand the time element of this virus.

Incremental mortality rates:

Cumulative mortality rates:

As these graphs show, China’s incremental deaths per day peaked at about 34 days following the first reported death.   Following that point the cumulative curve begins to flatten.  If the data maintains its current trends, it appears that Iran and France are also about to peak. As more data is reported we should know if this holds true.

Neither Italy’s nor Spain’s virus life-cycle has fully matured, which is alarming given the steepness of their curves.

Differences by country—how has public policy impacted the depth and length of virus impact?

How did China get a strong hold on the virus so quickly?  Why is Italy’s trajectory so steep?  What did Japan and S. Korea do to keep their mortality curves relatively flat?

Below are just a few of many facts that point to these differences: 

  • First of all, we must remember that China has a totalitarian government.  As such, it quickly imposed very strict enforcement on its citizens, closely monitoring their every movement and purchase.  To fully understand the extent to which the government monitors its citizens, now and prior to the virus, I advise you to read the article by the American Association for the Advancement of Science.  Regardless, this tightening of control certainly assisted in quickly getting the virus under control in China and flattening out the mortality curve.
  • S. Korea was quick to move based on its experience with the MERS virus several years back.  This made the country ready to scale quickly, as also reported by the American Association for the Advancement of Science.  They even send text message reminders regarding hygiene to those who are "positive".
  • Italy and some of the other European nations have been criticized as being slow to respond.  For more information on this, one can read the article by CNBC.

Does our mortality rate to date look favorable compared to other countries?
At this point in time the worldwide death rate of confirmed cases is at 4.4%.  This means of all confirmed cases, 4.4% result in death.  However, we know this number is overstating the rate since not all cases are being reported.  Why is that?
  • Many people do not have symptoms severe enough to cause them to go to the doctor to be tested;  
  • Some lack the means to be tested; and,
  • Some just do not like doctors. 
So, what is the real number?  2%?  3%?  We will never truly know. But we do know it is less than 4.4%.

For America, the death rate at day 21 (since our first case) is at 1.27%.  This is about 70% less than the worldwide average and among the lowest of all nations as seen below at the same point in time.

But unfortunately, we will not end up this low.  We will end up higher than this when all is said and done. 

How do we know this?  

We know this based on data from other countries.  As time progresses, the rate only increases.  China, for example, had a death rate of 2.19% at day 21.  At day 69 (the end of their cycle) their final death rate was 4.00%.  This is an increase of 83% from day 21 to day 69.  So, using this figure to index up our rate, we can project our death rate will go from 1.27% at day 21 to 2.32% (1.27% × 1.83) at day 69.  Again, this is assuming it makes sense to use their data to make US projections.  But, what else do we have to use?

And keep in mind, with this number, we could extrapolate the number of beds and respirators we might need going forward, a figure we definitely need in order to quickly get a handle on future demands on the health care system.
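
The indexing step above can be written out as a short Python sketch using the percentages quoted in this post. The variable names are mine, and the calculation carries the same assumption stated above: that China's day 21 to day 69 growth applies to the US.

```python
# Death rates (deaths as a percentage of confirmed cases) quoted in the post.
china_day21 = 2.19
china_day69 = 4.00
us_day21 = 1.27

# Growth factor observed in China between day 21 and day 69.
growth_factor = china_day69 / china_day21
percent_increase = (growth_factor - 1) * 100

# Apply the same growth factor to the US day-21 rate.
us_day69_projected = us_day21 * growth_factor

print(f"growth factor:        {growth_factor:.2f}")
print(f"percent increase:     {percent_increase:.0f}%")
print(f"projected US day 69:  {us_day69_projected:.2f}%")
```

Using the unrounded growth factor instead of 1.83 changes the projection only in the third decimal place, so the 2.32% figure holds either way.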

Are there other factors impacting our ability to make predictions?

As mentioned before, it is important to remember that each country is unique, so things like the availability of tests, regional practices of social distancing, and usage of medicines to mitigate symptoms all impact the data, and thus complicate making comparisons and drawing definitive conclusions.

Population density matters

Another major variable that affects the spread of the virus within any given country is the population density of that country or city.  The more dense the population, the more rapidly a virus can spread if tight controls are not imposed.   The table below shows the differences in the land mass for various countries relative to their population size.  Given this, one needs to applaud S. Korea and Japan for maintaining such a low occurrence and death rate. 

And, in case you did not realize, the population density in New York City is about 27,000 people per square mile, and far higher in Manhattan.  So now you understand all the concern by NY Governor Andrew Cuomo.

What about other factors?

The demographics and overall health of a population will also likely play a role in how quickly a virus can and will spread, and result in different mortality curves.  Below is a table showing the smoker penetration, median age and overall health score for the various countries.  As we can see there are vast differences in these data by country.  How that impacts each country’s mortality curve is hard to say at this time.

How does this virus compare to deaths caused by pneumonia and the flu?

To keep things in perspective it is important to remember that almost 60,000 Americans die every year due to the flu and pneumonia combined. That is an astonishing number. The Coronavirus, worst case, will most likely take the lives of around 6,000 Americans (assuming no changes in trends from what we are observing today). 


In summary, I think we can all agree that America is doing a good job at keeping this pandemic under control.  All the measures put in place appear to be working.  Within another week, as more data becomes available, we should be able to determine our fate.  But so far we are looking good.

So, let's keep doing what we are doing.  We are almost there. We have almost made it.  Let's keep maintaining our social distance, limiting our outside activities, washing our hands, and staying safe and healthy.

When will the next update be?

We plan to update this report on Monday, March 30th.  At that point in time, we should have a good sense of where we are headed, what our true needs will be, and whether any other data has shifted.

To your health,

Perry D. Drake, PhD
     and Rhonda Knehans-Drake