Analysing a data set, start to finish

For this assignment we’re going to look at the birthweights of babies born in California from 2000 to 2013. The data has the year, birthweight groups, and a count of the number of babies in each birthweight group. There’s also information on county, zip code, lat/long, etc. These may or may not be useful, but they are interesting grouping factors.

1. Warmup: Load up the data! Use skimr to show that you’ve done so properly and that everything is as it should be.
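As a rough sketch of what the warmup looks like, here is the load-and-skim pattern on a tiny made-up file. The file path, column names, and values below are all hypothetical stand-ins, since the real file's name and columns are not given here, and reading with readr is an assumption (base `read.csv()` would work too).

```r
library(readr)
library(skimr)

# Hypothetical stand-in for the real data: a tiny CSV written to a temp file.
# Column names and values are made up for illustration only.
toy_path <- tempfile(fileext = ".csv")
write_csv(
  data.frame(
    year = c(2000, 2000, 2001),
    county = c("Sacramento", "Yolo", "Sacramento"),
    birthweight_group = c("low", "normal", "normal"),
    count = c(12, 340, 1020)
  ),
  toy_path
)

# Load the file, then summarise every column to check types and missing values.
births <- read_csv(toy_path, show_col_types = FALSE)
skim(births)
```

In the skim output, check that each column has the type you expect (numeric vs. character) and that the number of missing values looks reasonable.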

2. What is the number of children in each birthweight category in Sacramento, CA across the entire dataset?
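The general shape of this kind of question is "filter to a place, group by a category, sum the counts". A minimal sketch on made-up data, assuming dplyr and assuming hypothetical column names (`county`, `birthweight_group`, `count`) that may not match the real data:

```r
library(dplyr)

# Made-up data with hypothetical column names; the real data may differ.
births <- tibble(
  year = c(2000, 2001, 2000, 2001, 2000),
  county = c("Sacramento", "Sacramento", "Sacramento", "Sacramento", "Yolo"),
  birthweight_group = c("low", "low", "normal", "normal", "normal"),
  count = c(10, 15, 200, 250, 90)
)

# Keep one place, then add up the counts within each birthweight group
# across all years.
sacramento_totals <- births %>%
  filter(county == "Sacramento") %>%
  group_by(birthweight_group) %>%
  summarise(total_babies = sum(count), .groups = "drop")

sacramento_totals
```

Whether "Sacramento" should be matched on a county column or on some other location column depends on how the real data is coded, so check the unique values first.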

5. Extra credit (variable, depending on awesomeness of data viz)

Note that there is latitude and longitude information in this data. Can you use it to plot anything interesting about the geographic distribution of the data? Note that log(x+1) transformations may be your friend for some things, as may correcting for population size. Have fun with this! Feel free to look into geospatial visualization with ggplot2 or other packages, although we’ll do this more formally in a few weeks. It might not be necessary, but you never know.
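To make the two hints concrete, here is one possible starting point on made-up data, not a required approach. The coordinates, counts, and especially the `population` column are invented for illustration; the assignment data is not stated to include population, so that would have to come from another source.

```r
library(ggplot2)

# Made-up points; all values (including population) are hypothetical.
toy <- data.frame(
  longitude = c(-121.49, -118.24, -122.42, -119.77),
  latitude = c(38.58, 34.05, 37.77, 36.75),
  count = c(0, 5200, 850, 40),
  population = c(1000, 400000, 80000, 5000)
)

# log(x + 1) keeps zero counts defined and compresses very large counts.
toy$log_count <- log1p(toy$count)

# Correcting by population size: a rate per 1,000 people.
toy$births_per_1000 <- 1000 * toy$count / toy$population

# Plot each location, coloured by log count and sized by the rate.
geo_plot <- ggplot(
  toy,
  aes(x = longitude, y = latitude, colour = log_count, size = births_per_1000)
) +
  geom_point() +
  coord_quickmap() +
  labs(colour = "log(count + 1)", size = "Births per 1,000")

geo_plot
```

`coord_quickmap()` only approximates a map projection, which is usually fine for an area the size of California.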