Make an R Cheat Sheet

A Biol355 Midterm

Jarrett Byrnes

2021-11-02

Introduction

In the wild and woolly world of R, there are many packages out there. Some of them are on the Comprehensive R Archive Network (CRAN). Many can be found on GitHub, a site for sharing code and using git as version control software, and installed with the devtools package: instead of install.packages, you use devtools::install_github('username/pkgname'). If you’re interested in GitHub, see Happy with Git for more and talk to me. Extra credit awaits…

Throughout the semester, we’ve been referring you to R Cheat Sheets for everything from working with strings to ggplot2 and more. These cheat sheets are invaluable as learning tools. Creating a cheat sheet is also an amazing way to familiarize yourself with a new package and really solidify your knowledge of the R ecosystem.

So, for your midterm, I’d like you to create a cheat sheet for the package of your choice. It can be a simple package, or a complex package, or anything in between. Below, I’ll detail the steps of the process, as well as provide a list of some packages that might be enjoyable to use for this assignment - although the choice is ultimately up to you!

Due Date

Nov 5, 2021

Simple Extra Credit

Put your cheatsheet up on GitHub! If you’re interested, see Happy with Git for more and/or talk to me.

Past Example Cheatsheets

vegan - by Bruna Silva in 2020, now on RStudio’s website!
auk - by Mickayla Johnson in 2020
waffle - by Stephen Smiddy in 2020
slackr - from Daniel Villareal

Steps

  1. First, look at existing R Cheat Sheets. Shortly, I will give you a homework question asking you to identify what you do and do not find useful about the examples here.

  2. Second, read up on how to create an R Cheat Sheet. Download the template and read it over CAREFULLY.

  3. Third, select a package. See the section below on possible packages and/or where to find other packages. Once you have selected a package, sign up here so only one person gets one package!

  4. Play with the package. A lot. I will have an upcoming homework question about this, asking what are the most interesting things that the package does, what are the most useful things that it does, what elements of it you think most people would use the most frequently, and how would you organize this information.

  5. Make a sketch of your cheatsheet. Be as detailed as you can. This will also be a future homework question!

  6. Finally, make the cheat sheet. Read the template carefully, as there are many wonderful tips for how to make an effective cheatsheet.

Grading

The breakdown of your grade for this assignment will be as follows:

Things NOT to do

Finding a Package

While we provide a few packages below, you’re also more than welcome to search for ones you might find interesting. There are a number of places to look. Your package can be from CRAN. It might be from a CRAN Task View. You might find it on awesome-r.com. It could be a package featured as a Top 40 new package at Rviews. Or take a look at the New and Updated packages section of rweekly (note - I read this daily!). Many data oriented packages can be found at ROpenGov or for health data, rOpenHealth. I’m also a big fan of ROpenSci for many ecologically focused packages. Also, on our data sets page there are a variety of R packages that provide access to data. Those are great targets as well!

Some package suggestions

By all means, don’t choose something from the list below if you don’t want to! But there are some packages which might be good to think about if your imagination is flagging. I’ll also provide a bit of an example of what they can do.

gghighlight

gghighlight allows you to easily highlight specific lines and points within a ggplot.
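As a rough idea of what this looks like in practice, here is a minimal sketch. The built-in mtcars data set and the choice to highlight six-cylinder cars are my own assumptions for illustration, not part of the assignment.

```r
library(ggplot2)
library(gghighlight)

# Scatterplot of weight vs. fuel efficiency.
# gghighlight() keeps the points matching the condition in colour
# and greys out everything else.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(size = 3, colour = "darkred") +
  gghighlight(cyl == 6)

p
```

The same idea should carry over to lines: swap geom_point() for geom_line() with a grouping variable and highlight the groups that meet your condition.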

An example of a new geom from gghighlight

ggforce

The ggforce package contains a number of additional geoms and stats for ggplot2, giving added functionality and really fun plots.
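For a taste, here is a small sketch using one of the ggforce geoms, geom_mark_ellipse(), to circle groups of points. The iris data set is only an assumption for illustration; any grouped data would do.

```r
library(ggplot2)
library(ggforce)

# Draw an ellipse around each species' cluster of points.
p <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width)) +
  geom_mark_ellipse(aes(fill = Species)) +
  geom_point(aes(colour = Species))

p
```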

An example of a new geom from ggforce

patchwork

patchwork is a truly amazing package that allows you to combine multiple ggplots into multipaneled figures. I’ll admit, this is one of the workhorse packages I use almost every day. But - it doesn’t have a cheatsheet! Can you write one?
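To give a sense of the syntax, here is a minimal sketch, assuming three throwaway plots built from mtcars (the plots themselves are just placeholders). patchwork overloads operators so that `|` puts plots side by side and `/` stacks them.

```r
library(ggplot2)
library(patchwork)

# Three simple plots to combine
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()
p3 <- ggplot(mtcars, aes(hp)) + geom_histogram(bins = 10)

# Two plots on top, one wide plot underneath,
# with an overall title and automatic panel tags (A, B, C)
combined <- (p1 | p2) / p3 +
  plot_annotation(title = "Three views of mtcars", tag_levels = "A")

combined
```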

An example of multipaneled patchwork plots

vroom

Reading comma separated files and the like seems fast to you now - but that’s only because we’ve been reading files with kilobytes of data in them at most. What about when you get to REAL big data - files with gigabytes or more of data in them? That’s where vroom comes in. It can read over 1GB of data per second! But with great power, comes great responsibility in terms of how you structure loading files - and there are ways to load MANY files with vroom as well. So, try writing a cheat sheet on large file(s) handling with vroom!

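# Read a tab-separated file, setting column types by name:
# "i" = integer, "f" = factor, "l" = logical, "_" = skip the column
vroom::vroom("mtcars.tsv",
  col_types = list(cyl = "i", gear = "f", hp = "i", disp = "_",
                   drat = "_", vs = "l", am = "l", carb = "i")
)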
#> # A tibble: 32 x 10
#>   model           mpg   cyl    hp    wt  qsec vs    am    gear   carb
#>   <chr>         <dbl> <int> <int> <dbl> <dbl> <lgl> <lgl> <fct> <int>
#> 1 Mazda RX4      21       6   110  2.62  16.5 FALSE TRUE  4         4
#> 2 Mazda RX4 Wag  21       6   110  2.88  17.0 FALSE TRUE  4         4
#> 3 Datsun 710     22.8     4    93  2.32  18.6 TRUE  TRUE  4         1
#> # … with 29 more rows
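For the "MANY files" case, here is a small sketch of one approach: hand vroom() a vector of paths. The two temporary files and their names below are made up purely for illustration; with real data you would point list.files() at your own folder.

```r
library(vroom)

# Write two small tab-separated files to a temporary folder
# (hypothetical file names, just so the example is self-contained)
tmp <- tempfile("vroom_demo")
dir.create(tmp)
vroom_write(mtcars[1:16, ], file.path(tmp, "cars_a.tsv"), delim = "\t")
vroom_write(mtcars[17:32, ], file.path(tmp, "cars_b.tsv"), delim = "\t")

files <- list.files(tmp, pattern = "\\.tsv$", full.names = TRUE)

# A vector of paths is read and row-bound into one tibble;
# `id` adds a column recording which file each row came from
all_cars <- vroom(files, delim = "\t", id = "source_file",
                  show_col_types = FALSE)

nrow(all_cars)
```

This assumes every file has the same columns, which is the situation where reading them in one call makes sense.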

snakecase

snakecase is an all-in-one converter of complicated strings to a number of more standardized formats - snakecase, camelcase, and others. In addition to specific parsers, there’s also a general parser to come up with your own conversion case, and it handles transliterating other languages on the fly.

library(snakecase)

string <- c("lowerCamelCase") 

to_any_case(string)
## [1] "lower_camel_case"
# from German!
to_any_case("Doppelgänger", 
            transliterations = "german", 
            case = "upper_camel")
## [1] "Doppelgaenger"

scatterD3

Do you love awesome interactive web plots? Then scatterD3 is for you!

Play with this scatterD3 plot!

visdat

We’ve used visdat for basically one function, but there’s so so soooo much more! Build a cheat sheet to help folk explore their data!
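As a starting point for exploring, here is a sketch of three visdat functions side by side. The built-in airquality data set is my assumption for illustration (it has some missing values, which makes the plots more interesting).

```r
library(visdat)

# Column types and missing values in one picture
p_types <- vis_dat(airquality)

# Where, and how much, data is missing
p_miss <- vis_miss(airquality)

# Correlations between the numeric columns
p_cor <- vis_cor(airquality)

p_types
```

Each call returns a ggplot object, so you can tweak the result with the usual ggplot2 tools.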

skimr

As with visdat, we’ve used skimr for one function only, but, again, so much more!

Data summary

Name                     Piped data
Number of rows           272
Number of columns        2
_______________________
Column type frequency:
  numeric                2
________________________
Group variables          None

Variable type: numeric

skim_variable  n_missing  complete_rate   mean     sd    p0    p25  p50    p75  p100  hist
eruptions              0              1   3.49   1.14   1.6   2.16    4   4.45   5.1  ▇▂▂▇▇
waiting                0              1  70.90  13.59  43.0  58.00   76  82.00  96.0  ▃▃▂▇▂
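A summary like the one above comes from piping a data frame into skim(). The sketch below assumes the built-in faithful data set (a guess based on the column names shown) and then goes a little beyond that one function, using grouped data and yank() to pull out a single variable type.

```r
library(skimr)
library(dplyr)

# The basic summary of a whole data frame
skimmed <- faithful %>% skim()

# skim() respects dplyr groups: one set of summaries per species
by_species <- iris %>%
  group_by(Species) %>%
  skim()

# yank() extracts the summaries for one variable type as a tibble
numeric_summary <- skim(iris) %>% yank("numeric")

numeric_summary
```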

Extra Credit: data validation

Combine the skimr and visdat packages along with any others you find that are useful to create a comprehensive data validation cheat sheet for when you first load a new data set. This is a powerful beast, and is worth up to 25% extra credit.

worms

Taxonomy is hard. Names change, and we often want a lot of different information from a taxonomy. worms makes it easy. One place that standardizes this information for all marine species is the World Register of Marine Species.

rnaturalearth

rnaturalearth provides an interface to the Natural Earth project - a collection of vector and raster files to make amazing maps.
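Here is a minimal sketch of pulling one country outline and plotting it. It assumes the sf package is installed, and the commented-out ne_download() line shows how additional layers such as lakes could be fetched (it needs an internet connection, so it is not run here).

```r
library(rnaturalearth)
library(ggplot2)

# Small-scale country polygons ship with the package
usa <- ne_countries(scale = "small",
                    country = "United States of America",
                    returnclass = "sf")

# Other layers are downloaded on demand, for example:
# lakes <- ne_download(scale = 50, type = "lakes",
#                      category = "physical", returnclass = "sf")

p <- ggplot(usa) +
  geom_sf()

p
```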

Lakes and rivers of the US

rayshader

rayshader allows you to make all sorts of hauntingly beautiful photo-realistic 3D plots - maps, ggplots, and more!

rayshader

rnoaa

rnoaa is an astounding package that lets you access many of the different data sets available from NOAA. There is no way you can do a comprehensive cheat sheet here. BUT - if you want to make a cheat sheet to fully explore ONE data set, and show us how to use it to make it useful, that would be perfect!

Sea Ice at the North Pole from 2000-2010

rtweet

Wanna dig into the stew of Twitter and analyze it? What do you find there? What’s awesome? What’s terrible? Check out rtweet.

How often does Prof. Byrnes tweet?

the lost pipes of magrittr

The magrittr package is far more than just %>%. There are a multitude of pipes we don’t talk about, different ways of creating repeatable workflows with . %>% and more. Write a cheat sheet that embraces the full functionality of magrittr!
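To make that concrete, here is a short sketch of a few of the lesser-known pipes. The data and the `rescale` helper name are my own choices for illustration.

```r
library(magrittr)

# Exposition pipe %$%: exposes column names to the next call
cor_wt_mpg <- mtcars %$% cor(wt, mpg)

# Assignment pipe %<>%: pipe and overwrite the left-hand side
x <- c(9, 16, 25)
x %<>% sqrt()
x
# x is now 3 4 5

# Tee pipe %T>%: run a side effect (here, print), then pass
# the ORIGINAL value along to the next step
total <- c(1, 2, 3) %T>% print() %>% sum()

# A pipeline starting with `.` becomes a reusable function
# (`rescale` is a hypothetical name for this example)
rescale <- . %>% subtract(min(.)) %>% divide_by(max(.))
rescale(c(2, 4, 6))
# returns 0.0 0.5 1.0
```

subtract() and divide_by() are magrittr's pipe-friendly aliases for `-` and `/`.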