EDA on Cancer Epidemiology
Gather public health data from multiple sources and derive insights on cancer epidemiology. You can start by finding correlations with well-known factors,
e.g., smoking or an unhealthy lifestyle, and then look for any other interesting findings that stand out after analyzing seemingly unrelated aspects. For example, is there any correlation between
the weather of a place and cancer incidence? Between fast food outlets and cancer incidence?
You have to collect data from multiple sources (e.g., fast food sales may come from one source and cancer incidence in a region from another).
Once collected, you have to glean the information you need for analysis. Please refer to the list below for some possible sources. You are not required to use any of them,
but any data you select should include data for the USA. Once you decide on your data, please enter the details on this page.
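Combining sources as described above could look roughly like the following pandas sketch. The two frames stand in for extracts from separate sources; the column names (`cancer_rate_per_100k`, `restaurants_per_100k`) and every number are made up for illustration and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical extracts from two separate sources; all values are invented.
cancer = pd.DataFrame({
    "state": ["Kentucky", "Utah", "New York"],
    "cancer_rate_per_100k": [510.0, 400.0, 480.0],
})
fast_food = pd.DataFrame({
    "State": ["kentucky ", "UTAH", "Texas"],
    "restaurants_per_100k": [80.0, 60.0, 75.0],
})


def normalize_state(names: pd.Series) -> pd.Series:
    """Trim whitespace and title-case state names so the join keys match."""
    return names.str.strip().str.title()


cancer["state"] = normalize_state(cancer["state"])
fast_food = fast_food.rename(columns={"State": "state"})
fast_food["state"] = normalize_state(fast_food["state"])

# An outer join with indicator=True records which source each row came from,
# so states missing from one source can be reviewed instead of silently lost.
merged = cancer.merge(fast_food, on="state", how="outer", indicator=True)
unmatched = merged[merged["_merge"] != "both"]
combined = merged[merged["_merge"] == "both"].drop(columns="_merge")

print(combined)
print("States present in only one source:", sorted(unmatched["state"]))
```

Checking the unmatched rows before dropping them is a design choice: a mismatch is often a spelling or formatting difference between sources, not genuinely missing data.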
Analysis: You will analyze a combination of at least 3 data sets on different topics to answer some of the typical questions like:
Cancer and its correlation to Tobacco usage.
Types of cancer and their trends across states and regions
Cancer and any correlation to any other aspects you find
In addition to the hints given above, you could also look at correlations with income, ethnicity of the population, geography, education levels, weather, exercise or other lifestyle factors.
Although the above topics are given as guidance, you can choose any other cancer-related topic.
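As a rough illustration of the kind of correlation check described above, the sketch below computes Pearson and Spearman correlations on a small state-level table. The column names and all values are invented for the example; real figures would come from the sources you select.

```python
import pandas as pd

# Invented state-level values, purely for illustration.
df = pd.DataFrame({
    "state": ["A", "B", "C", "D", "E"],
    "smoking_pct": [10.0, 14.0, 18.0, 22.0, 26.0],
    "median_income_k": [75.0, 68.0, 70.0, 55.0, 50.0],
    "lung_cancer_per_100k": [40.0, 48.0, 55.0, 70.0, 72.0],
})

numeric_cols = ["smoking_pct", "median_income_k", "lung_cancer_per_100k"]

# Pearson measures linear association between each pair of columns.
pearson = df[numeric_cols].corr(method="pearson")

# Spearman is Pearson applied to ranks, so it captures any monotonic trend
# and is less sensitive to outliers.
spearman = df[numeric_cols].rank().corr(method="pearson")

print(pearson.round(2))
print(spearman.round(2))
```

Reporting both coefficients side by side helps show whether a relationship is roughly linear or only monotonic.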
You should show at least 3 steps of cleaning your dataset and explain why each step is needed. You will need to show at least 5 different types of correlations,
frequencies and/or relationships between various independent and dependent variables with meaningful plots. Explain your assumptions and your findings before and after every plot.
Every plot should have a title, and the x and y axes should be labelled legibly without any overlapping labels.
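The cleaning and plotting requirements above can be sketched as follows, assuming pandas and matplotlib. The raw table, its column names, and its values are made up to show three typical cleaning steps followed by one titled, labelled plot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented raw data with typical problems: inconsistent names,
# numbers stored as text, missing values, and a duplicate row.
raw = pd.DataFrame({
    "state": ["Ohio", "ohio ", "Texas", "Maine", None, "Utah", "Kentucky"],
    "smoking_pct": ["20.5", "20.5", "14.1", "n/a", "18.0", "9.0", "23.6"],
    "cancer_rate_per_100k": [470.0, 470.0, 410.0, 480.0, 450.0, 390.0, 510.0],
})

clean = raw.copy()

# Step 1: normalise state names so the same state is spelled one way.
clean["state"] = clean["state"].str.strip().str.title()

# Step 2: convert text to numbers; unparseable entries such as "n/a" become NaN.
clean["smoking_pct"] = pd.to_numeric(clean["smoking_pct"], errors="coerce")

# Step 3: drop rows missing the key or the value, then remove exact duplicates.
clean = (
    clean.dropna(subset=["state", "smoking_pct"])
    .drop_duplicates()
    .reset_index(drop=True)
)

# One plot with a title and labelled axes.
fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(clean["smoking_pct"], clean["cancer_rate_per_100k"])
ax.set_title("Cancer incidence vs. adult smoking rate (illustrative data)")
ax.set_xlabel("Adults who smoke (%)")
ax.set_ylabel("Cancer incidence per 100,000")
fig.tight_layout()  # adjusts spacing so labels do not overlap or get clipped
fig.savefig("smoking_vs_cancer.png")
```

In a Jupyter notebook the `matplotlib.use("Agg")` line and the `savefig` call can be dropped, since figures render inline.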
Methodology: You can use a combination of Jupyter notebook and/or Spark, or any other platform of your choice, such as Hive or Pig.
I will use different data cleaning and correlation strategies. I am good at ML and AI concepts using Python, and I can do this work quickly and smartly with EDA and insights.
26 freelancers are bidding on average $158 for this job
Hi there, I am good at Python and pandas, and I am sure I can build your script with pandas. Please contact me and tell me about the data source. Thanks, I hope to hear from you.