Cancer Incidence Analysis
Personal Project
Project Type: Data Analytics and Dashboard
Date: October 2023
Location: Dallas, TX
Skills & Tools: Python | Pandas | NumPy | Excel | Data Cleaning | Data Processing | Tableau
Project Achievements
Data Integration and Cleaning: Used the cancer_reg.csv dataset from Kaggle, which contains demographic, socioeconomic, and health-related metrics. Cleaned and preprocessed more than 3,000 rows of data in Python using Pandas and NumPy.
Visualization Development: Built heatmaps, scatter plots, and bar charts in Tableau to show the relationships and trends in the data.
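Tableau heatmaps are usually easiest to build from long-format data, with one row per pair of variables. The sketch below shows one way a correlation matrix could be reshaped in pandas before export to Tableau; the helper name and output column names are illustrative assumptions, not the project's actual code.

```python
import pandas as pd


def correlation_long_format(df: pd.DataFrame) -> pd.DataFrame:
    """Reshape the Pearson correlation matrix of the numeric columns into long format.

    Hypothetical helper: returns one row per (variable_1, variable_2) pair, a
    convenient shape for a heatmap (rows, columns, and colour by pearson_r).
    Non-numeric columns are ignored.
    """
    corr = df.select_dtypes(include="number").corr(method="pearson")
    return (
        corr.rename_axis("variable_1")  # name the row labels so they become a column
        .reset_index()
        .melt(id_vars="variable_1", var_name="variable_2", value_name="pearson_r")
    )
```

For example, `correlation_long_format(df).to_csv("correlations_long.csv", index=False)` would write a file Tableau can read directly; `df` and the filename are placeholders.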
Data Exploration: Checked the data for completeness and consistency, removed duplicate rows, imputed missing values, and standardized column names to make the dataset easier to work with.
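A minimal pandas sketch of the completeness check, duplicate removal, and column-name standardization. The snake_case convention and the helper names are assumptions; the naming scheme actually used in the project is not specified.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Convert a column name such as 'medIncome' or 'Median Age' to snake_case."""
    name = name.strip()
    # Insert an underscore where a lowercase letter or digit is followed by an uppercase letter.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse any run of spaces, hyphens, or other non-alphanumeric characters into one underscore.
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()


def completeness_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst columns first."""
    report = pd.DataFrame({
        "missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(1),
    })
    return report.sort_values("missing", ascending=False)


def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize column names and drop exact duplicate rows."""
    out = df.rename(columns=to_snake_case)
    return out.drop_duplicates().reset_index(drop=True)
```

Running `completeness_report` before and after `basic_clean` makes it easy to confirm which columns still need imputation.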
Statistical Analysis: Performed exploratory data analysis to examine distributions and relationships among variables. Standardized incidence rates to a per-100,000-people basis and measured correlations with Pearson correlation coefficients.
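The per-100,000 rate and the Pearson correlations can be sketched in pandas as follows. This assumes a crude rate (cases divided by population, scaled to 100,000), not an age-adjusted rate, and the helper names and any column names passed to them are hypothetical.

```python
import pandas as pd


def rate_per_100k(cases: pd.Series, population: pd.Series) -> pd.Series:
    """Crude rate per 100,000 people (not age-adjusted)."""
    return cases * 100_000 / population


def pearson_with_target(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson r between `target` and every other numeric column, strongest first."""
    corr = df.select_dtypes(include="number").corr(method="pearson")[target]
    corr = corr.drop(labels=target)
    # Order by absolute strength while keeping the sign of each coefficient.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)
```

Sorting by absolute value keeps strong negative correlations near the top instead of burying them at the bottom of the list.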
Insightful Observations: Identified a significant positive correlation between median income and cancer incidence rates, highlighting potential disparities in cancer diagnosis linked to economic factors and access to healthcare.
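Calling a correlation significant implies a hypothesis test, and the test used is not stated. As one illustrative option, the sketch below uses a two-sided permutation test that needs only NumPy and pandas. It is a swapped-in technique, and the function name, the default of 10,000 permutations, and the seed are assumptions.

```python
import numpy as np
import pandas as pd


def pearson_permutation_test(x, y, n_permutations: int = 10_000, seed: int = 0) -> tuple[float, float, int]:
    """Pearson r with a two-sided permutation p-value (hypothetical helper).

    Rows where either value is missing are dropped. The p-value is the share of
    random shufflings of `y` whose |r| is at least the observed |r|, with a +1
    correction so it is never exactly zero.
    """
    pair = pd.DataFrame({"x": x, "y": y}).dropna()
    xs = pair["x"].to_numpy(dtype=float)
    ys = pair["y"].to_numpy(dtype=float)
    r_obs = np.corrcoef(xs, ys)[0, 1]

    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_permutations):
        # Shuffling y breaks any real association, giving a sample from the null.
        r_perm = np.corrcoef(xs, rng.permutation(ys))[0, 1]
        if abs(r_perm) >= abs(r_obs):
            hits += 1
    p_value = (hits + 1) / (n_permutations + 1)
    return float(r_obs), p_value, len(pair)
```

For example, `pearson_permutation_test(df["med_income"], df["incidence_rate"])`, where `df` and both column names are placeholders. If SciPy is available, `scipy.stats.pearsonr` returns r together with an analytic p-value and avoids the permutation loop.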
Key Outcomes:
The project examined the relationship between socioeconomic status, healthcare access, and cancer incidence, and uncovered patterns that suggest higher diagnosis rates in wealthier areas.
The analysis serves as a resource for understanding cancer epidemiology and can help inform public health decisions and resource allocation aimed at reducing healthcare inequalities. The visualizations and analysis steps documented in the repository provide a framework for further research and policy development.
Data Cleaning
Imputed missing values with the column mean or median, depending on the shape of each column's distribution.
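One way to make the mean-versus-median choice explicit is to check each column's skewness. The sketch below assumes a threshold of |skewness| > 1 for switching to the median; the threshold and the function name are assumptions, since the exact rule used in the project is not stated.

```python
import pandas as pd


def impute_by_skew(df: pd.DataFrame, skew_threshold: float = 1.0) -> pd.DataFrame:
    """Fill missing numeric values with each column's mean or median (hypothetical helper).

    A column whose absolute sample skewness exceeds `skew_threshold` is treated as
    skewed and filled with its median; otherwise its mean is used. If skewness
    cannot be computed (fewer than three non-missing values), the mean is used.
    Non-numeric columns are left untouched, and the input frame is not modified.
    """
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        if out[col].isna().any():
            skew = out[col].skew()  # pandas skips NaN when computing skewness
            fill = out[col].median() if abs(skew) > skew_threshold else out[col].mean()
            out[col] = out[col].fillna(fill)
    return out
```

The median is the safer default for heavily skewed columns such as income, because a few very large values pull the mean away from typical observations.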