Introduction

In the wild and woolly world of R, there are many packages out there. Some of them are on the Comprehensive R Archive Network (see CRAN). Many can be found on GitHub, a site for sharing code and using git as version control software, and installed with the devtools package: instead of install.packages, you use devtools::install_github('username/pkgname'). If you’re interested in GitHub, see Happy with Git for more and talk to me. Extra credit awaits…

Throughout the semester, we’ve been referring you to R Cheat Sheets for everything from working with strings to ggplot2 and more. These cheat sheets are invaluable as learning tools. Creating a cheat sheet is also an amazing way to familiarize yourself with a new package and really solidify your knowledge of the R ecosystem.

So, for your midterm, I’d like you to create a cheat sheet for the package of your choice. It can be a simple package, or a complex package, or anything in between. Below, I’ll detail the steps of the process, as well as provide a list of some packages that might be enjoyable to use for this assignment - although the choice is ultimately up to you!

Due Date

Nov 5, 2021

Simple Extra Credit

Put your cheatsheet up on GitHub! If you’re interested, see Happy with Git for more and/or talk to me.

Past Example Cheatsheets

vegan - by Bruna Silva in 2020, now on RStudio’s website!
auk - by Mickayla Johnson in 2020
waffle - by Stephen Smiddy in 2020
slackr - from Daniel Villareal

Steps

First, look at existing R Cheat Sheets. I will give you a homework question asking you to identify what you do and do not find useful about the examples here shortly.
Second, read up on how to create an R Cheat Sheet. Download the template and read it over CAREFULLY.
Third, select a package. See the section below on possible packages and/or where to find other packages. Once you have selected a package, sign up here so only one person gets one package!
Play with the package. A lot. I will have an upcoming homework question about this, asking what are the most interesting things that the package does, what are the most useful things that it does, what elements of it you think most people would use the most frequently, and how would you organize this information.
Make a sketch of your cheatsheet. Be as detailed as you can. This will also be a future homework question!
Finally, make the cheat sheet. Read the template carefully, as there are many wonderful tips for how to make an effective cheatsheet.

Grading

The breakdown of your grade for this assignment will be as follows:

50% Utility. Can I execute things from this package using your cheat sheet?
25% Completeness. How thoroughly did you explore the package? Is it just one or two things, or is this really a guide to the whole thing?
15% Communication. Is it aesthetically pleasing (i.e. did you do a good job communicating the package to a new user)?
10% Accessibility. Spelling, grammar, etc. Yes. They matter. It’s communication.
Extra credit - 10% - going above and beyond and covering a topic area with multiple packages instead of a single package. Talk to Prof. Byrnes for more on this.
Extra credit - 10% - get it on RStudio’s contributed cheat sheet website
Extra credit - 10% - get folk commenting on it on #rstats twitter!

Things NOT to do

Do not write a long Word document explaining the package
Do not make a bunch of slides rather than use the template for a cheat sheet

Finding a Package

While we provide a few packages below, you’re also more than welcome to search for ones you might find interesting. There are a number of places to look. Your package can be from CRAN. It might be from a CRAN Task View. You might find it on awesome-r.com. It could be a package featured as a Top 40 new package at Rviews. Or take a look at the New and Updated packages section of rweekly (note - I read this daily!). Many data oriented packages can be found at ROpenGov or for health data, rOpenHealth. I’m also a big fan of ROpenSci for many ecologically focused packages. Also, on our data sets page there are a variety of R packages that provide access to data. Those are great targets as well!

Some package suggestions

By all means, don’t choose something from the list below if you don’t want to! But there are some packages which might be good to think about if your imagination is flagging. I’ll also provide a bit of an example of what they can do.

gghighlight

gghighlight allows you to easily highlight specific lines and points within a ggplot.

An example of a new geom from gghighlight
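
As a rough illustration of the idea (the original figure is not reproduced here), the sketch below highlights a subset of points in a scatterplot. The choice of the built-in mtcars data and the cyl == 6 condition are assumptions made purely for this example.

```r
library(ggplot2)
library(gghighlight)

# Scatterplot of car weight against fuel efficiency.
# gghighlight() keeps the rows matching the condition in colour
# and greys out everything else.
p_highlight <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  gghighlight(cyl == 6)

p_highlight
```

Changing the condition inside gghighlight() is usually all it takes to highlight a different subset.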

ggforce

The ggforce package contains a number of additional geoms and stats for ggplot2, giving added functionality and really fun plots.

An example of a new geom from ggforce
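
Here is one possible sketch of a ggforce geom in action. It uses geom_mark_ellipse() to draw a labelled ellipse around each group; the iris data and the aesthetics chosen are assumptions for illustration, not necessarily what the original figure showed.

```r
library(ggplot2)
library(ggforce)

# Draw an annotated ellipse around each species, then add the points on top.
p_mark <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width)) +
  geom_mark_ellipse(aes(fill = Species, label = Species)) +
  geom_point()

p_mark
```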

patchwork

patchwork is a truly amazing package that allows you to combine multiple ggplots into multipaneled figures. I’ll admit, this is one of the workhorse packages I use almost every day. But - it doesn’t have a cheatsheet! Can you write one?

An example of multipaneled patchwork plots
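
A minimal sketch of how patchwork composes plots, assuming three simple plots built from the built-in mtcars data (the plots themselves are made up for this example). The | operator places plots side by side, / stacks them, and plot_annotation() adds panel tags.

```r
library(ggplot2)
library(patchwork)

# Three ordinary ggplots to combine.
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()
p3 <- ggplot(mtcars, aes(hp)) + geom_histogram(bins = 10)

# Top row: p1 next to p2. Bottom row: p3 spanning the full width.
# tag_levels = "A" labels the panels A, B, C.
combined <- (p1 | p2) / p3 + plot_annotation(tag_levels = "A")

combined
```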

vroom

Reading comma separated files and the like seems fast to you now - but that’s only because we’ve been reading files with kilobytes of data in them at most. What about when you get to REAL big data - files with gigabytes or more of data in them? That’s where vroom comes in. It can read over 1GB of data per second! But with great power comes great responsibility in terms of how you structure loading files - and there are ways to load MANY files with vroom as well. So, try writing a cheat sheet on large file(s) handling with vroom!

vroom::vroom("mtcars.tsv",
  col_types = list(cyl = "i", gear = "f",hp = "i", disp = "_",
                   drat = "_", vs = "l", am = "l", carb = "i")
)
#> # A tibble: 32 x 10
#>   model           mpg   cyl    hp    wt  qsec vs    am    gear   carb
#>   <chr>         <dbl> <int> <int> <dbl> <dbl> <lgl> <lgl> <fct> <int>
#> 1 Mazda RX4      21       6   110  2.62  16.5 FALSE TRUE  4         4
#> 2 Mazda RX4 Wag  21       6   110  2.88  17.0 FALSE TRUE  4         4
#> 3 Datsun 710     22.8     4    93  2.32  18.6 TRUE  TRUE  4         1
#> # … with 29 more rows
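
To give a flavor of loading MANY files at once, here is a small sketch. The two temporary files, their names, and the source_file column name are all made up for this example; the point is only that vroom() accepts a vector of file paths and can record which file each row came from via its id argument.

```r
library(vroom)

# Write two small tab-separated files into a temporary folder (demo data only).
tmp_dir <- tempfile("vroom_demo_")
dir.create(tmp_dir)
vroom_write(mtcars[1:16, ], file.path(tmp_dir, "cars_a.tsv"), delim = "\t")
vroom_write(mtcars[17:32, ], file.path(tmp_dir, "cars_b.tsv"), delim = "\t")

# Pass a vector of paths; id adds a column holding the originating file path.
files <- list.files(tmp_dir, pattern = "\\.tsv$", full.names = TRUE)
all_cars <- vroom(files, delim = "\t", id = "source_file")

nrow(all_cars)
```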

snakecase

snakecase is an all-in-one converter of complicated strings to a number of more standardized formats - snakecase, camelcase, and others. In addition to specific parsers, there’s also a general parser that lets you come up with your own conversion case, and it handles transliterating other languages on the fly.

library(snakecase)

string <- c("lowerCamelCase") 

to_any_case(string)

## [1] "lower_camel_case"

#from German!
to_any_case("Doppelgänger", 
            transliterations = "german", 
            case = "upper_camel")

## [1] "Doppelgaenger"

scatterD3

Do you love awesome interactive web plots? Then scatterD3 is for you!

Play with this scatterD3 plot!

visdat

We’ve used visdat for basically one function, but there’s so so soooo much more! Build a cheat sheet to help folk explore their data!
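
As a starting point for exploring, here is a short sketch of two visdat functions. The built-in airquality data set is an assumption chosen because it contains missing values; it is not necessarily the data used in class.

```r
library(visdat)

# vis_dat() shows the type of every column and where values are missing.
p_types <- vis_dat(airquality)

# vis_miss() focuses on missingness and reports the percent missing per column.
p_missing <- vis_miss(airquality)

p_types
p_missing
```

Both functions return ggplot objects, so they can be modified further with the usual ggplot2 tools.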

skimr

As with visdat, we’ve used skimr for one function only, but, again, so much more!

Data summary

Name	Piped data
Number of rows	272
Number of columns	2
_______________________
Column type frequency:
numeric	2
________________________
Group variables	None

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
eruptions	0	1	3.49	1.14	1.6	2.16	4	4.45	5.1	▇▂▂▇▇
waiting	0	1	70.90	13.59	43.0	58.00	76	82.00	96.0	▃▃▂▇▂
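
A summary like the one above can be produced along the following lines. That the input was the built-in faithful data set is an assumption, inferred from the 272 rows and the eruptions and waiting columns; the single-column call is an extra added only to hint at what else skim() accepts.

```r
library(skimr)
library(dplyr)

# Piping a data frame into skim() gives one summary row per column.
faithful_summary <- faithful %>% skim()
faithful_summary

# skim() can also be restricted to selected columns.
eruptions_summary <- skim(faithful, eruptions)
```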

Extra Credit: data validation

Combine the skimr and visdat packages along with any others you find that are useful to create a comprehensive data validation cheat sheet for when you first load a new data set. This is a powerful beast, and is worth up to 25% extra credit.

worms

Taxonomy is hard. Names change, and we often want a lot of different information from a taxonomy. One place that standardizes this information for all marine species is the World Register of Marine Species. worms makes it easy to work with.

rnaturalearth

rnaturalearth provides an interface to the Natural Earth project - a collection of vector and raster files to make amazing maps.

## OGR data source with driver: ESRI Shapefile 
## Source: "/private/var/folders/hr/2f_q28hx74382lr9359h3jkc309s_w/T/Rtmp5Uk0PA", layer: "ne_50m_lakes"
## with 275 features
## It has 35 fields
## Integer64 fields read as strings:  scalerank ne_id

## OGR data source with driver: ESRI Shapefile 
## Source: "/private/var/folders/hr/2f_q28hx74382lr9359h3jkc309s_w/T/Rtmp5Uk0PA", layer: "ne_50m_rivers_lake_centerlines"
## with 462 features
## It has 32 fields
## Integer64 fields read as strings:  ne_id

Lakes and rivers of the US

rayshader

rayshader allows you to make all sorts of hauntingly beautiful photo-realistic 3D plots - maps, ggplots, and more!

rayshader

rnoaa

rnoaa is an astounding package that lets you access many of the different data sets available from NOAA. There is no way you can do a comprehensive cheat sheet here. BUT - if you want to make a cheat sheet to fully explore ONE data set, and show us how to use it to make it useful, that would be perfect!

Sea Ice at the North Pole from 2000-2010

rtweet

Wanna dig into the stew of Twitter and analyze it? What do you find there? What’s awesome? What’s terrible? Check out rtweet.

How often does Prof. Byrnes tweet?

the lost pipes of magrittr

The magrittr package is far more than just %>%. There are a multitude of pipes we don’t talk about, different ways of creating repeatable workflows with . %>% and more. Write a cheat sheet that embraces the full functionality of magrittr!
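
To make that concrete, here is a brief sketch of a few of the other pipes and of a reusable . %>% pipeline. The data and variable names are made up for illustration.

```r
library(magrittr)

# %$% exposes the columns of the left-hand side by name to the right-hand side.
mpg_wt_cor <- mtcars %$% cor(mpg, wt)

# %T>% (the "tee" pipe) runs a step for its side effect,
# then passes the ORIGINAL value along to the next step.
col_means <- mtcars[, c("mpg", "wt")] %T>%
  { message("rows: ", nrow(.)) } %>%
  colMeans()

# %<>% pipes a value through a function and assigns the result back to it.
x <- c(9, 16, 25)
x %<>% sqrt()

# A pipeline that starts with . is stored as a function you can reuse.
count_four_cyl <- . %>% subset(cyl == 4) %>% nrow()
count_four_cyl(mtcars)
```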

Make an R Cheat Sheet

A Biol355 Midterm

Jarrett Byrnes

2021-11-02
