In the wild and woolly world of R, there are many packages out there. Some of them are on the Comprehensive R Archive Network (see CRAN). Many can be found on GitHub (a site for sharing code and using git as version control software; if you’re interested, see Happy with Git for more and talk to me. Extra credit awaits…) and installed with the devtools package: instead of install.packages, you use devtools::install_github('username/pkgname').
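As a rough sketch of the two installation routes described above, the helper below installs a package only when it is missing. The function name install_if_missing and its github_repo argument are made up for illustration, and 'username/pkgname' remains a placeholder; only install.packages, requireNamespace, and devtools::install_github are real functions.

```r
# Hypothetical helper (not part of any package): install a package only if
# it is not already available, from CRAN by default or from GitHub when a
# "username/pkgname" repository string is supplied.
install_if_missing <- function(pkg, github_repo = NULL) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))  # already installed, nothing to do
  }
  if (is.null(github_repo)) {
    install.packages(pkg)                   # CRAN route
  } else {
    devtools::install_github(github_repo)   # GitHub route (needs devtools)
  }
  invisible(TRUE)
}

# The stats package ships with R, so this call installs nothing.
already_there <- install_if_missing("stats")
```

For a GitHub package, the call would look like install_if_missing("pkgname", github_repo = "username/pkgname"), assuming devtools itself is already installed.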
Throughout the semester, we’ve been referring you to R Cheat Sheets for everything from working with strings to ggplot2 and more. These cheat sheets are invaluable as learning tools. Creating a cheat sheet is also an amazing way to familiarize yourself with a new package, and really solidify your knowledge of the R ecosystem.
So, for your midterm, I’d like you to create a cheat sheet for the package of your choice. It can be a simple package, or a complex package, or anything in between. Below, I’ll detail the steps of the process, as well as provide a list of some packages that might be enjoyable to use for this assignment - although the choice is ultimately up to you!
Nov 5, 2021
Put your cheatsheet up on GitHub! If you’re interested, see Happy with Git for more and/or talk to me.
vegan - by Bruna Silva in 2020, now on RStudio’s website!
auk - by Mickayla Johnson in 2020
waffle - by Stephen Smiddy in 2020
slackr - from Daniel Villareal
First, look at existing R Cheat Sheets. I will shortly give you a homework question asking you to identify what you do and do not find useful about the examples there.
Second, read up on how to create an R Cheat Sheet. Download the template and read it over CAREFULLY.
Third, select a package. See the section below on possible packages and/or where to find other packages. Once you have selected a package, sign up here so only one person gets one package!
Play with the package. A lot. I will have an upcoming homework question about this, asking what are the most interesting things that the package does, what are the most useful things that it does, what elements of it you think most people would use the most frequently, and how would you organize this information.
Make a sketch of your cheatsheet. Be as detailed as you can. This will also be a future homework question!
Finally, make the cheat sheet. Read the template carefully, as there are many wonderful tips for how to make an effective cheatsheet.
The breakdown of your grade for this assignment will be as follows:
While we provide a few packages below, you’re also more than welcome to search for ones you might find interesting. There are a number of places to look. Your package can be from CRAN. It might be from a CRAN Task View. You might find it on awesome-r.com. It could be a package featured as a Top 40 new package at Rviews. Or take a look at the New and Updated packages section of rweekly (note - I read this daily!). Many data-oriented packages can be found at rOpenGov or, for health data, rOpenHealth. I’m also a big fan of rOpenSci for many ecologically focused packages. Also, on our data sets page there are a variety of R packages that provide access to data. Those are great targets as well!
By all means, don’t choose something from the list below if you don’t want to! But there are some packages which might be good to think about if your imagination is flagging. I’ll also provide a bit of an example of what they can do.
The ggforce package contains a number of additional geoms and stats for ggplot2, giving added functionality and really fun plots.
patchwork is a truly amazing package that allows you to combine multiple ggplots into multipaneled figures. I’ll admit, this is one of the workhorse packages I use almost every day. But - it doesn’t have a cheatsheet! Can you write one?
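To give a flavor of what combining plots looks like, here is a minimal sketch using the built-in mtcars data. The particular plots and layout are arbitrary choices for illustration, not taken from the course materials.

```r
library(ggplot2)
library(patchwork)

# Three ordinary ggplots built from the built-in mtcars data
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()
p3 <- ggplot(mtcars, aes(hp)) + geom_histogram(bins = 10)

# `|` places plots side by side and `/` stacks them;
# plot_annotation() adds panel tags (A, B, C) to the combined figure
combined <- (p1 | p2) / p3 + plot_annotation(tag_levels = "A")
combined
```

A cheat sheet could build outward from these layout operators to things like shared legends and custom layouts.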
Reading comma separated files and the like seems fast to you now - but that’s only because we’ve been reading files with kilobytes of data in them at most. What about when you get to REAL big data - files with gigabytes or more of data in them? That’s where vroom comes in. It can read over 1GB of data per second! But with great power, comes great responsibility in terms of how you structure loading files - and there are ways to load MANY files with vroom as well. So, try writing a cheat sheet on large file(s) handling with vroom!
vroom::vroom("mtcars.tsv",
  col_types = list(cyl = "i", gear = "f", hp = "i", disp = "_",
                   drat = "_", vs = "l", am = "l", carb = "i")
)
#> # A tibble: 32 x 10
#> model mpg cyl hp wt qsec vs am gear carb
#> <chr> <dbl> <int> <int> <dbl> <dbl> <lgl> <lgl> <fct> <int>
#> 1 Mazda RX4 21 6 110 2.62 16.5 FALSE TRUE 4 4
#> 2 Mazda RX4 Wag 21 6 110 2.88 17.0 FALSE TRUE 4 4
#> 3 Datsun 710 22.8 4 93 2.32 18.6 TRUE TRUE 4 1
#> # … with 29 more rows
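For the "MANY files" case mentioned above, one possible approach is to hand vroom a vector of file paths. The sketch below writes two tiny made-up files to a temporary directory first so that it is self-contained; the file names and the source_file column name are assumptions chosen for illustration.

```r
library(vroom)

# Write two small CSV files to a temporary directory (made-up example data)
demo_dir <- tempfile("vroom_demo")
dir.create(demo_dir)
write.csv(head(mtcars, 3), file.path(demo_dir, "part1.csv"), row.names = FALSE)
write.csv(tail(mtcars, 3), file.path(demo_dir, "part2.csv"), row.names = FALSE)

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# vroom() accepts a vector of paths and row-binds the files;
# `id` adds a column recording which file each row came from
all_rows <- vroom(files, delim = ",", id = "source_file")
nrow(all_rows)
```

This relies on every file having the same columns, which is worth stating on a cheat sheet.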
snakecase is an all-in-one converter of complicated strings to a number of more standardized formats - snake case, camel case, and others. In addition to specific parsers, there’s also a general parser that lets you come up with your own conversion case, and it handles transliterating other languages on the fly.
library(snakecase)
string <- c("lowerCamelCase")
to_any_case(string)
## [1] "lower_camel_case"
#from German!
to_any_case("Doppelgänger",
transliterations = "german",
case = "upper_camel")
## [1] "Doppelgaenger"
We’ve used visdat for basically one function, but there’s so so soooo much more! Build a cheat sheet to help folk explore their data!
As with visdat, we’ve used skimr for one function only, but, again, so much more!
Data summary
Name | Piped data |
Number of rows | 272 |
Number of columns | 2 |
_______________________ | |
Column type frequency: | |
numeric | 2 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
eruptions | 0 | 1 | 3.49 | 1.14 | 1.6 | 2.16 | 4 | 4.45 | 5.1 | ▇▂▂▇▇ |
waiting | 0 | 1 | 70.90 | 13.59 | 43.0 | 58.00 | 76 | 82.00 | 96.0 | ▃▃▂▇▂ |
Combine the skimr and visdat packages along with any others you find that are useful to create a comprehensive data validation cheat sheet for when you first load a new data set. This is a powerful beast, and is worth up to 25% extra credit.
Taxonomy is hard. worrms makes it easy. Names change, and we often want a lot of different information from a taxonomy. One place that standardizes this for all marine species is the World Register of Marine Species.
rnaturalearth provides an interface to the Natural Earth project - a collection of vector and raster files to make amazing maps.
## OGR data source with driver: ESRI Shapefile
## Source: "/private/var/folders/hr/2f_q28hx74382lr9359h3jkc309s_w/T/Rtmp5Uk0PA", layer: "ne_50m_lakes"
## with 275 features
## It has 35 fields
## Integer64 fields read as strings: scalerank ne_id
## OGR data source with driver: ESRI Shapefile
## Source: "/private/var/folders/hr/2f_q28hx74382lr9359h3jkc309s_w/T/Rtmp5Uk0PA", layer: "ne_50m_rivers_lake_centerlines"
## with 462 features
## It has 32 fields
## Integer64 fields read as strings: ne_id
rayshader allows you to make all sorts of hauntingly beautiful photo-realistic 3D plots - maps, ggplots, and more!
rnoaa is an astounding package that lets you access many of the different data sets available from NOAA. There is no way you can do a comprehensive cheat sheet here. BUT - if you want to make a cheat sheet to fully explore ONE data set, and show us how to use it to make it useful, that would be perfect!
Wanna dig into the stew of Twitter and analyze it? What do you find there? What’s awesome? What’s terrible? Check out rtweet.
The magrittr package is far more than just %>%. There are a multitude of pipes we don’t talk about, different ways of creating repeatable workflows with . %>% and more. Write a cheat sheet that embraces the full functionality of magrittr!
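As a small taste of those other pipes, here is a sketch of three of them plus a reusable workflow started with a dot. The data and the name root_sum are made up for illustration.

```r
library(magrittr)

# Assignment pipe: pipe x through sqrt() and overwrite x with the result
x <- c(1, 4, 9)
x %<>% sqrt()

# Exposition pipe: expose the columns of a data frame by name
mpg_wt_cor <- mtcars %$% cor(mpg, wt)

# Tee pipe: run a side effect (here, printing), then pass the original value on
total <- x %T>% print() %>% sum()

# Functional sequence: a pipeline that starts with `.` becomes a reusable function
root_sum <- . %>% sqrt() %>% sum()
root_sum(c(1, 4, 9))
```

Each of these could anchor its own panel on a cheat sheet, with a note on when it reads better than a plain %>% chain.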