Real and Current Data

In the R community, there’s a weekly event known as Tidy Tuesday where everyone comes together around a single big dataset and attempts to create the most interesting visualizations possible, posting code and data viz on Twitter using #TidyTuesday. I’d like us to try a… Tidy Friday, with data on the ongoing coronavirus pandemic.

With the Covid-19 coronavirus outbreak spreading across the world, people want to know more - and know more now! Fortunately, the R community has begun coming together and making tools to rapidly disseminate data. There are two packages out there currently, but let’s focus on the coronavirus package. To install it, use the following code:

install.packages("coronavirus")

The available data

Let’s see what is there:

library(coronavirus)

head(coronavirus)
##         date province country     lat      long      type cases   uid iso2 iso3
## 1 2020-01-22  Alberta  Canada 53.9333 -116.5765 confirmed     0 12401   CA  CAN
## 2 2020-01-23  Alberta  Canada 53.9333 -116.5765 confirmed     0 12401   CA  CAN
## 3 2020-01-24  Alberta  Canada 53.9333 -116.5765 confirmed     0 12401   CA  CAN
## 4 2020-01-25  Alberta  Canada 53.9333 -116.5765 confirmed     0 12401   CA  CAN
## 5 2020-01-26  Alberta  Canada 53.9333 -116.5765 confirmed     0 12401   CA  CAN
## 6 2020-01-27  Alberta  Canada 53.9333 -116.5765 confirmed     0 12401   CA  CAN
##   code3    combined_key population continent_name continent_code
## 1   124 Alberta, Canada    4413146  North America             NA
## 2   124 Alberta, Canada    4413146  North America             NA
## 3   124 Alberta, Canada    4413146  North America             NA
## 4   124 Alberta, Canada    4413146  North America             NA
## 5   124 Alberta, Canada    4413146  North America             NA
## 6   124 Alberta, Canada    4413146  North America             NA

Each row is a single instance of recorded cases. There is information on the province, country, continent, the latitude, longitude, and date. We also see the number of cases and whether the observation is of a confirmed case, a recovery, or a death. You can take a look at ?coronavirus for more information or at https://ramikrispin.github.io/coronavirus/.
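
As a first look at that structure, here is a minimal sketch in base R that adds up the cases column for every combination of country and type. The helper name total_by_type is made up for this example, and it assumes only the country, type, and cases columns shown in the output above.

```r
# Hypothetical helper: total recorded cases per country and type, largest first.
# Assumes `data` has the columns country, type, and cases shown above.
total_by_type <- function(data) {
  # Sum cases within each country/type combination
  totals <- aggregate(cases ~ country + type, data = data, FUN = sum)
  # Sort so the largest totals come first
  totals[order(-totals$cases), ]
}
```

After library(coronavirus), calling head(total_by_type(coronavirus)) should list the largest country and type totals. Check ?coronavirus for how cases is defined before reading too much into the numbers.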

There is also a second dataset on vaccines.

head(covid19_vaccine)
##   country_region       date doses_admin people_partially_vaccinated
## 1    Afghanistan 2021-02-22           0                           0
## 2    Afghanistan 2021-02-23           0                           0
## 3    Afghanistan 2021-02-24           0                           0
## 4    Afghanistan 2021-02-25           0                           0
## 5    Afghanistan 2021-02-26           0                           0
## 6    Afghanistan 2021-02-27           0                           0
##   people_fully_vaccinated report_date_string uid province_state iso2 iso3 code3
## 1                       0         2021-02-22   4           <NA>   AF  AFG     4
## 2                       0         2021-02-23   4           <NA>   AF  AFG     4
## 3                       0         2021-02-24   4           <NA>   AF  AFG     4
## 4                       0         2021-02-25   4           <NA>   AF  AFG     4
## 5                       0         2021-02-26   4           <NA>   AF  AFG     4
## 6                       0         2021-02-27   4           <NA>   AF  AFG     4
##   fips      lat     long combined_key population continent_name continent_code
## 1 <NA> 33.93911 67.70995  Afghanistan   38928341           Asia             AS
## 2 <NA> 33.93911 67.70995  Afghanistan   38928341           Asia             AS
## 3 <NA> 33.93911 67.70995  Afghanistan   38928341           Asia             AS
## 4 <NA> 33.93911 67.70995  Afghanistan   38928341           Asia             AS
## 5 <NA> 33.93911 67.70995  Afghanistan   38928341           Asia             AS
## 6 <NA> 33.93911 67.70995  Afghanistan   38928341           Asia             AS

This dataset is much richer in spatial data, but it can be used in a similar manner.
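
For instance, the sketch below takes the most recent row for each country and computes the percent of the population that is fully vaccinated. The helper name latest_vaccination is hypothetical. It also assumes that country-level rows are the ones where province_state is NA, as in the Afghanistan rows above, which may not hold for every country.

```r
# Hypothetical helper: percent fully vaccinated on each country's latest date.
# Assumes the columns country_region, date, people_fully_vaccinated,
# population, and province_state shown in the output above.
latest_vaccination <- function(data) {
  # Assumption: country-level rows have a missing province_state
  national <- data[is.na(data$province_state), ]
  # Sort by country and date, then keep the last (latest) row per country
  national <- national[order(national$country_region, national$date), ]
  latest <- national[!duplicated(national$country_region, fromLast = TRUE), ]
  latest$pct_fully_vaccinated <-
    100 * latest$people_fully_vaccinated / latest$population
  latest[, c("country_region", "date", "pct_fully_vaccinated")]
}
```

With the package loaded, head(latest_vaccination(covid19_vaccine)) gives one row per country. Rows with missing counts or populations will come back as NA rather than being dropped.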

Explore the data.

I want you to load the data, look through it, and then make it tell a story! To do this, I want you to:

  1. Really dig into what is there with all of the tools we have at our disposal.

  2. Sit down and write out what kind of story you want to tell. What do you want to learn from this data? Write out a paragraph. Or two!

  3. Sketch out any data visualizations you might want to make. With pencil and paper - just a theoretical example of what it might look like.

  4. Start a fresh .R file and, in comments, sketch out the steps you will take and what you will do.
    4a. Note: for all data viz, don’t just use the defaults. Get creative. Make this look like something that we wouldn’t be surprised to find in a magazine or newspaper. Feel free to use alternate themes - even from ggthemes or other places. Google around.

  5. Once finished, move the code into a .Rmd file to create a nice, clean HTML file that tells a story. Show how you processed the data, and make the visualizations.

  6. If you are a superstar, create a second story! Or dig into a different package - see https://mine-cetinkaya-rundel.github.io/covid19-r/ for a partial list or google around - and see if it can be used to tell a different story.
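
As one possible starting point for the data viz steps above, here is a minimal sketch of a plot that moves away from the ggplot2 defaults. The function name plot_confirmed, the colour, and the labels are illustrative choices, not requirements. It assumes that cases holds per-date counts and that "confirmed" is a value of type, as in the head() output earlier.

```r
library(ggplot2)

# Hypothetical helper: line plot of confirmed cases over time for one country.
# Assumes `data` has the columns date, country, type, and cases.
plot_confirmed <- function(data, country_name) {
  # Step 1: keep confirmed cases for the chosen country
  one_country <- data[data$country == country_name &
                        data$type == "confirmed", ]
  # Step 2: sum over provinces so there is one value per date
  daily <- aggregate(cases ~ date, data = one_country, FUN = sum)
  # Step 3: plot with a non-default theme, title, and axis labels
  ggplot(daily, aes(x = date, y = cases)) +
    geom_line(colour = "firebrick") +
    labs(
      title = paste("Confirmed Covid-19 cases:", country_name),
      x = NULL,
      y = "Confirmed cases",
      caption = "Data: coronavirus R package"
    ) +
    theme_minimal(base_size = 14)
}
```

For example, plot_confirmed(coronavirus, "Canada") draws one line for Canada. Swapping theme_minimal() for a theme from ggthemes, or adding annotations for key dates, are natural next steps toward something magazine-worthy.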

Showing what you found

As you make your first cool data viz, copy it and post it to Slack! Show it off so the class can see what you found!

After you finish your lab report (your .Rmd file), compile it, and submit it at your homework link. I want to make a gallery of interesting reports for you all to look at to see what is possible.

Extra credit if you post it to Twitter with the hashtags #rstats and #coronavirus, making sure to mention that you used https://ramikrispin.github.io/coronavirus/