U.S. Leading Causes of Death Coding Project

By: Thomas O'Brien and William D'Amore

Project ID Number: 25

Project Topic: U.S. Leading Causes of Death

We chose this topic because it provides valuable insights into public health trends, which helps in understanding and addressing the key health issues that affect the population's well-being and in informing evidence-based interventions. This dataset presents the age-adjusted death rates for the 10 leading causes of death in the United States from 1999 through 2017. The data is based on information from all resident deaths and is also broken down at the state level. This makes it possible to follow trends in diseases and causes of death, compare them state by state, and track them across the U.S. over time.

Exploration Questions

Here are the main exploration questions that guided our thought process and our visualization creation:
Question 1: How many deaths were there in each state from 1999-2017?
Question 2: What was the leading cause of death in each state in 2017?
Question 3: What was the leading cause of death in the USA as a whole in 2017 (most recent data)?
Question 4: How many people died from “accidents” from 1999-2017?
Question 5: What trends exist for cancer deaths from 1999-2017?
Question 6: Are certain states more at risk for certain death causes?
Question 7: Is there a certain cause of death in the U.S. that is rising rapidly and needs to be controlled?
Question 8: Which states have the fewest deaths relative to state population?
Question 9: Which states have the highest and lowest number of deaths from Alzheimer’s disease?
Question 10: Do states that are more highly populated have a greater percentage of deaths from Influenza?
Question 11: What is the average number of people killed by heart disease in New York from 2010 to 2017?
Question 12: Which cause of death has the highest age-adjusted death rate in Nevada?
Question 13: When factoring in population, which states have the highest and lowest death rates per capita?
Question 14: What underlying factors, such as age demographics, healthcare spending, and physical inactivity rates, cause certain states to have higher death rates per capita?
Question 15: Is there a correlation between these underlying factors and the death rates across states?
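
As a rough illustration of how questions like 2 and 11 could be answered with pandas, here is a minimal sketch. It assumes a DataFrame with the "Year", "Cause Name", "State", and "Deaths" columns described in this write-up; the rows below are invented purely for demonstration and are not values from the real dataset. If the real file contains an aggregate row covering all causes combined, it would need to be filtered out before looking for a leading cause.

```python
import pandas as pd

# Invented sample rows, for illustration only (not real CDC figures).
death_df = pd.DataFrame({
    "Year": [2009, 2010, 2017, 2017, 2017, 2017],
    "Cause Name": ["Heart disease", "Heart disease", "Heart disease",
                   "Cancer", "Heart disease", "Cancer"],
    "State": ["New York", "New York", "New York",
              "New York", "Utah", "Utah"],
    "Deaths": [100, 200, 400, 300, 50, 60],
})

# Question 11: average yearly heart disease deaths in New York, 2010 to 2017.
ny_heart = death_df[
    (death_df["State"] == "New York")
    & (death_df["Cause Name"] == "Heart disease")
    & (death_df["Year"].between(2010, 2017))  # inclusive on both ends
]
ny_heart_avg = ny_heart["Deaths"].mean()

# Question 2: leading cause of death in each state in 2017.
deaths_2017 = death_df[death_df["Year"] == 2017]
# idxmax returns the row label of the largest "Deaths" value per state.
top_rows = deaths_2017.groupby("State")["Deaths"].idxmax()
leading_2017 = deaths_2017.loc[top_rows, ["State", "Cause Name", "Deaths"]]

print(ny_heart_avg)
print(leading_2017)
```

The same filter-then-aggregate pattern applies to most of the questions above; only the column being filtered and the aggregation change.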

Who Would Use This Project & Why?

The audience for our collection of data visualizations could include a wide range of stakeholders, such as:
Public Health Officials: can use these visualizations to monitor and address health disparities, allocate resources effectively, and make data-driven decisions to improve public health.
Researchers: can gain insights into trends, correlations, and patterns related to causes of death and their geographic and political associations.
Healthcare Providers: can use the data to understand the most prevalent health issues in their regions and tailor their services accordingly.
Legislators: can use the visualizations to inform healthcare policies and prioritize funding for specific health interventions.
General Public: can benefit from increased awareness of health disparities and trends, enabling them to make more informed decisions about their health.

Data Sources

Our main data source is a CSV file published by the Centers for Disease Control and Prevention (CDC), part of the U.S. Department of Health and Human Services. It contains columns for Year, 113 Cause Name, Cause Name, State, Deaths, and Age-adjusted Death Rate, and has 10,689 rows, which gave us plenty of data to work with. We picked it because it is detailed and covers a long period that includes recent years.

To complement it, we brought in United States population data so that we could compare states of very different sizes on an even footing. This source was also loaded from a CSV file, which we trimmed down into a DataFrame containing “State” and “Population”. It let us visualize many of the major causes of death per capita, which makes the raw numbers easier to interpret. The per capita visualizations allow comparisons between states despite their vast differences in population and contribute greatly to the overall story we want to tell.

We also included a small dataset of physical inactivity percentages for each state to see whether there was a correlation between exercise and better health.

Data Journey

Our data journey began with acquiring the mortality dataset, which we accessed online through the CDC's data portal. We used the pandas library in Python to handle and analyze the data. After defining the URL of the CSV file, we loaded it with pd.read_csv() into a pandas DataFrame named df.

To streamline the analysis, we selected the columns of interest ("Year", "Cause Name", "State", "Deaths", and "Age-adjusted Death Rate") and stored them in a new DataFrame called death_df. This refined dataset gave us a focused view for investigating mortality patterns. We printed the first few rows with print(death_df.head()) to get an initial sense of the structure and content, and then printed the whole DataFrame with print(death_df) for a fuller overview. These steps prepared us for the exploration, analysis, and visualization that followed.

We also incorporated 2017 state population data from the US Census Bureau. We created a DataFrame from it with “State” and “Population” as the column names; when we printed this DataFrame, “df”, the states appeared in alphabetical order. This data was then merged with the original CDC data to form “merged_df”.
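
The loading, column selection, and merge steps described in this section could look roughly like the sketch below. It is not our exact project code: to keep it runnable without a network connection, it reads a tiny inline CSV instead of the real URL, and every number in it is made up. The names load_deaths, SAMPLE_CSV, and population_df are hypothetical; death_df and merged_df follow the names used in this write-up.

```python
import io

import pandas as pd

# Tiny inline stand-in for the CDC CSV. All values are invented.
SAMPLE_CSV = """Year,113 Cause Name,Cause Name,State,Deaths,Age-adjusted Death Rate
2017,placeholder,Heart disease,Utah,3000,150.0
2017,placeholder,Cancer,Utah,2500,120.0
2017,placeholder,Heart disease,West Virginia,4000,190.0
"""


def load_deaths(source):
    """Read the mortality CSV and keep only the columns of interest.

    `source` can be a URL, a file path, or a file-like object.
    """
    df = pd.read_csv(source)
    columns = ["Year", "Cause Name", "State", "Deaths",
               "Age-adjusted Death Rate"]
    return df[columns]


death_df = load_deaths(io.StringIO(SAMPLE_CSV))
print(death_df.head())

# Hypothetical population table with round, made-up numbers.
population_df = pd.DataFrame({
    "State": ["Utah", "West Virginia"],
    "Population": [3_100_000, 1_800_000],
})

# A left merge keeps every mortality row and attaches the matching
# state population; states missing from population_df would get NaN.
merged_df = death_df.merge(population_df, on="State", how="left")
print(merged_df.head())
```

With the real data, the io.StringIO(...) argument would be replaced by the URL of the CSV file on the CDC data portal.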

Data Caveats

Overall, we did not run into many caveats with our data. Our main data source was very detailed and, thankfully, did not have any missing or inaccurate cells. One specific issue was that we needed to create a dictionary of state abbreviations in order for our choropleth visualizations to show colors. We also had to rename columns in the population data before we could merge it with our original data, and we initially ran into many problems combining the two datasets when building visualizations that used both. In the inactivity dataset, we had to manually input values for certain states that had blank cells.
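
The two fixes mentioned above, mapping state names to abbreviations and renaming population columns before merging, can be sketched as follows. This is an illustration under stated assumptions rather than our exact code: the dictionary is deliberately partial, the raw population column names (NAME, POPESTIMATE2017) are assumed for the example, and all numbers are made up. Many choropleth tools key US states by two-letter code (for example, Plotly with locationmode="USA-states"), which is why full state names alone may leave a map uncolored.

```python
import pandas as pd

# Partial mapping for illustration; a real one would cover every state.
STATE_ABBREVIATIONS = {
    "Hawaii": "HI",
    "Utah": "UT",
    "West Virginia": "WV",
}

# Invented rows. The "United States" row stands in for a national total,
# which has no state abbreviation.
deaths = pd.DataFrame({
    "State": ["Hawaii", "Utah", "West Virginia", "United States"],
    "Deaths": [10, 20, 30, 60],
})

# Names missing from the dictionary map to NaN, so they can be dropped
# before the data is handed to a state-level map.
deaths["State Code"] = deaths["State"].map(STATE_ABBREVIATIONS)
state_rows = deaths.dropna(subset=["State Code"])

# Assumed raw column names; rename them so both tables share a "State"
# key and a readable "Population" column before merging.
raw_population = pd.DataFrame({
    "NAME": ["Hawaii", "Utah", "West Virginia"],
    "POPESTIMATE2017": [1_400_000, 3_100_000, 1_800_000],
})
population = raw_population.rename(
    columns={"NAME": "State", "POPESTIMATE2017": "Population"}
)

print(state_rows)
print(population)
```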

Learning and Conclusions

The most challenging part of this project was working with the data and manipulating it to correctly fit within the bounds of the project. It was difficult to merge different data sources and then use them in visualizations to properly describe trends. The most rewarding part of this project was building the website and embedding the visualizations. It was very exciting to see all the hard work over the past three weeks come together and finally look like a finished product.

We began by creating visualizations of the causes of death for each state. One obvious conclusion was that the largest states greatly outpaced smaller states in total deaths. We were also able to see the leading cause of death in each state, which was either heart disease or cancer, and to compare each state's age-adjusted death rates with those of the United States as a whole across every leading cause of death. Our original dataset also allowed us to see trends over time, or within a specific year, for all causes of death.

After much analysis of this data, we introduced population data to supplement it. In the original dataset, the states with the largest populations simply had the most deaths, which offered little insight into underlying factors. Combining the two datasets gave us per capita metrics that allowed for a more even comparison across states. We learned that West Virginia has very high deaths per capita for certain causes, such as “unintentional injuries”, and that Utah has very low deaths per capita, which we attribute to its comparatively young population. We added data on health services spending in each state, but it did not seem to have any correlation with the death rate per capita. We also included physical inactivity data, which seemed to follow the per capita death trends for both Chronic Lower Respiratory Disease (CLRD) and heart disease.

Finally, we looked at deaths per capita across every leading cause of death to find further insights. For instance, we discovered that Hawaii has the highest death rate per capita of all states for influenza/pneumonia, which we believe is related to its high volume of tourism and tropical climate.

Overall, these insights were valuable to learn and can be used by anyone who is eager to learn or to promote social good.
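
The per capita comparison and the correlation check described above can be sketched in a few lines of pandas. This is a minimal illustration with four placeholder states and invented numbers, not our actual results; the column names "Deaths per 100k" and "Inactivity %" are hypothetical. Scaling to deaths per 100,000 residents is one common convention, assumed here for readability.

```python
import pandas as pd

# Placeholder states with invented values, for illustration only.
merged = pd.DataFrame({
    "State": ["A", "B", "C", "D"],
    "Deaths": [500, 1200, 300, 900],
    "Population": [1_000_000, 2_000_000, 1_000_000, 1_500_000],
    "Inactivity %": [20.0, 30.0, 15.0, 30.0],
})

# Normalize raw death counts by population so that large and small
# states can be compared on the same scale.
merged["Deaths per 100k"] = merged["Deaths"] / merged["Population"] * 100_000

# Rank states from highest to lowest per capita rate.
ranked = merged.sort_values("Deaths per 100k", ascending=False)

# Pearson correlation between the per capita rate and inactivity.
# Values near +1 or -1 suggest a strong linear relationship.
r = merged["Deaths per 100k"].corr(merged["Inactivity %"])

print(ranked[["State", "Deaths per 100k"]])
print(round(r, 3))
```

A correlation computed this way only shows that two measures move together across states; it does not by itself establish that one causes the other.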

A lot of social good can come out of our project. It can contribute to the development of more effective public health strategies and interventions. It can help policymakers allocate resources more efficiently to address specific health issues in regions with higher mortality rates. It can also support informed decision-making and preventative healthcare strategies. We hope you enjoy exploring our page!

Demo Video

Code Walkthrough Video