1. This week, we’ll work with a dataset from sampling
plankton in the Plum Island Estuary by the PIE Long Term Ecological
Research site. This dataset is in an Excel file with both metadata
and data. There’s a lot of information in it, and we’ll come back to
this dataset a few times through the semester.
1A. Load the plankton data using the readxl
library, and generate a scatterplot of the relationship between
Chlorophytes and TotalChlA. Note: this will
require you to read the documentation for the function you use to read
in the data, as you will have to load a sheet other than
the first one and deal with columns that load as characters
but are really numeric, because some character strings are used to specify
NA. How do you deal with this using read_xls()?
1A Continued. Is there more Chlorophyll when there are more Chlorophytes? Note: if one of your axes is unreadable due to the number of values, that column is a character and not a numeric. Go back and fix this in how you load the data.
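As a rough sketch of the loading step, the example below reads a sheet other than the first and declares which strings mean "missing". It uses the example workbook that ships with readxl as a stand-in for the plankton file, so the sheet name "quakes" and the strings passed to na are assumptions for illustration; the real sheet name and NA codes have to come from the plankton file and its metadata.

```r
library(readxl)

# Example workbook bundled with readxl; a stand-in for the plankton file.
path <- readxl_example("datasets.xls")

# List the sheet names to find the one that holds the data.
excel_sheets(path)

# Read a sheet other than the first. The strings in `na` are illustrative;
# any cell matching one of them is read as NA instead of forcing the whole
# column to character.
quakes <- read_xls(path, sheet = "quakes", na = c("", "NA", "ND"))

# Check that the measurement columns came in as numeric.
str(quakes)
```

If a column still shows up as chr in str(), some other string is probably being used for missing values and needs to be added to na.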
1B. Many processes can modify this relationship. They all
tend to covary with distance from the mouth of the estuary, where it
empties into the ocean and is highly saline. Maybe distance from the estuary
mouth (Distance) affects the relationship between
Chlorophytes and total chlorophyll? Can you see any pattern of how
distance alters this relationship by coloring the points by Distance?
Use something other than the default color scale.
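One way to color points by a continuous variable with a non-default scale is sketched here. The data frame toy is simulated purely for illustration (its values are invented and only its column names mirror the assignment), and viridis is just one of several reasonable scale choices.

```r
library(ggplot2)

# Simulated stand-in data; the values are invented for illustration only.
set.seed(607)
toy <- data.frame(
  Chlorophytes = rlnorm(100),
  Distance = runif(100, min = 0, max = 25)
)
toy$TotalChlA <- 3 * toy$Chlorophytes^0.8 * rlnorm(100, sdlog = 0.3)

# Map Distance to color and swap the default gradient for viridis.
p_color <- ggplot(toy, aes(x = Chlorophytes, y = TotalChlA, color = Distance)) +
  geom_point() +
  scale_color_viridis_c()

p_color
```

Other options in ggplot2 include scale_color_gradient() with custom low and high colors, or scale_color_distiller() with a ColorBrewer palette.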
1C. As distance is continuous, any patterns might still be
hard to see. What if we made a discrete variable out of distance using
cut_interval and used facet_wrap to see its
influence? What patterns do you see?
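A minimal sketch of binning a continuous variable and faceting on the bins follows. The data are simulated for illustration, and both the number of bins (n = 4) and the new column name DistanceBin are arbitrary choices, not something the assignment specifies.

```r
library(ggplot2)

# Simulated stand-in data; the values are invented for illustration only.
set.seed(607)
toy <- data.frame(
  Chlorophytes = rlnorm(100),
  Distance = runif(100, min = 0, max = 25)
)
toy$TotalChlA <- 3 * toy$Chlorophytes^0.8 * rlnorm(100, sdlog = 0.3)

# cut_interval() splits the range of Distance into n bins of equal width
# and returns a factor.
toy$DistanceBin <- cut_interval(toy$Distance, n = 4)

# One panel per distance bin.
p_wrap <- ggplot(toy, aes(x = Chlorophytes, y = TotalChlA)) +
  geom_point() +
  facet_wrap(vars(DistanceBin))

p_wrap
```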
1D. As the estuary was sampled at times of year where
temperature varied, and distance from mouth might have a different
effect under cold vs. warm temperatures, let’s look at whether
temperature and distance act in concert using facets. What do you see if
you create a discrete variable from Temp using
cut_interval and then make a facet_grid plot
looking at the effects of both temperature and distance from mouth?
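The two-variable version can be sketched the same way, with one binned variable on the rows and the other on the columns. Again the data are simulated, and the bin counts and the column names TempBin and DistanceBin are assumptions made for the example.

```r
library(ggplot2)

# Simulated stand-in data; the values are invented for illustration only.
set.seed(607)
toy <- data.frame(
  Chlorophytes = rlnorm(200),
  Distance = runif(200, min = 0, max = 25),
  Temp = runif(200, min = 0, max = 25)
)
toy$TotalChlA <- 3 * toy$Chlorophytes^0.8 * rlnorm(200, sdlog = 0.3)

# Bin both continuous variables into equal-width intervals.
toy$DistanceBin <- cut_interval(toy$Distance, n = 3)
toy$TempBin <- cut_interval(toy$Temp, n = 3)

# Rows are temperature bins, columns are distance bins.
p_grid <- ggplot(toy, aes(x = Chlorophytes, y = TotalChlA)) +
  geom_point() +
  facet_grid(rows = vars(TempBin), cols = vars(DistanceBin))

p_grid
```

Reading across a row holds temperature roughly constant while distance changes, and reading down a column does the reverse.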
1E. Last, are your answers from A-D made clearer or not by changing the scale of the x and y axes with log10 or any other transformation of the x or y axes? Why does a transformation help, or why not?
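As an illustration of axis transformations, the sketch below applies log10 scales to both axes of a scatterplot built from simulated, strictly positive data. Whether a log scale suits the real plankton data depends on its distribution, and zeros in the real data would need separate thought, since log10(0) is not finite.

```r
library(ggplot2)

# Simulated, strictly positive stand-in data; values are invented.
set.seed(607)
toy <- data.frame(Chlorophytes = rlnorm(100))
toy$TotalChlA <- 3 * toy$Chlorophytes^0.8 * rlnorm(100, sdlog = 0.3)

p_raw <- ggplot(toy, aes(x = Chlorophytes, y = TotalChlA)) +
  geom_point()

# The same plot with both axes on a log10 scale.
p_log <- p_raw +
  scale_x_log10() +
  scale_y_log10()

p_log
```

scale_x_sqrt() and scale_y_sqrt() are milder alternatives that can handle zeros.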
2. Let’s make this plot look good! Choose one of the
plots that you worked on in part 1.
2A. Give it a title with ggtitle(). Change the x
and y axis names with xlab() and ylab().
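A short sketch of adding a title and axis labels is given here. The data are simulated and the label text, including the units, is made up for the example; the real labels should reflect the units given in the dataset's metadata.

```r
library(ggplot2)

# Simulated stand-in data; the values are invented for illustration only.
set.seed(607)
toy <- data.frame(Chlorophytes = rlnorm(100))
toy$TotalChlA <- 3 * toy$Chlorophytes^0.8 * rlnorm(100, sdlog = 0.3)

# Label text below is illustrative, not taken from the dataset.
p_labeled <- ggplot(toy, aes(x = Chlorophytes, y = TotalChlA)) +
  geom_point() +
  ggtitle("Total chlorophyll a increases with chlorophytes") +
  xlab("Chlorophytes (illustrative units)") +
  ylab("Total chlorophyll a (illustrative units)")

p_labeled
```

labs(title = ..., x = ..., y = ...) sets all three in a single call if that reads more cleanly.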
2B. Now, let’s theme it using the ggthemes
package. Look through the theme options it gives you. Choose one, and
implement it (e.g., add theme_bw(base_size=12)) to your
plot. Why did you choose this theme? What about it aids in your
visualization?
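Applying a complete theme is a one-line addition to a plot. The sketch below uses theme_few() from ggthemes on simulated data; the choice of theme_few() is arbitrary and only meant to show the mechanics. Note that theme_bw() itself comes from ggplot2, while themes such as theme_few(), theme_tufte(), and theme_economist() come from ggthemes.

```r
library(ggplot2)
library(ggthemes)

# Simulated stand-in data; the values are invented for illustration only.
set.seed(607)
toy <- data.frame(Chlorophytes = rlnorm(100))
toy$TotalChlA <- 3 * toy$Chlorophytes^0.8 * rlnorm(100, sdlog = 0.3)

p_base <- ggplot(toy, aes(x = Chlorophytes, y = TotalChlA)) +
  geom_point()

# Swap in a ggthemes theme; base_size controls the base font size.
p_themed <- p_base + theme_few(base_size = 12)

p_themed
```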
2C. Extra credit - look at the theme help file.
Customize your plot even more using theme() and justify
your choices.
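For a sense of what theme() can do, the sketch below adjusts a few individual elements on top of a complete theme. The specific settings (legend position, grid lines, text sizes) are arbitrary examples on simulated data, not recommendations.

```r
library(ggplot2)

# Simulated stand-in data; the values are invented for illustration only.
set.seed(607)
toy <- data.frame(
  Chlorophytes = rlnorm(100),
  Distance = runif(100, min = 0, max = 25)
)
toy$TotalChlA <- 3 * toy$Chlorophytes^0.8 * rlnorm(100, sdlog = 0.3)

p_custom <- ggplot(toy, aes(x = Chlorophytes, y = TotalChlA, color = Distance)) +
  geom_point() +
  theme_bw(base_size = 12) +
  theme(
    legend.position = "bottom",              # move the legend under the plot
    panel.grid.minor = element_blank(),      # drop minor grid lines
    axis.title = element_text(size = 14),    # larger axis titles
    plot.title = element_text(face = "bold") # bold title, if one is added
  )

p_custom
```

Because later theme() calls override earlier ones, the element-level tweaks need to come after the complete theme.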
3. What is your favorite data visualization? Grab a jpg
of it and put it into this RMarkdown document (you’ll need to look at how
to get images into RMarkdown documents and you’ll need to submit it to
us along with the homework so we can compile the document). Bonus point
if you archive (think zip files) the RMD and JPEGs and submit them
together!
Now tell us why this is your favorite example of a data visualization.
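One common way to place an image from inside an R code chunk is knitr::include_graphics(), sketched below. The file name favorite_viz.jpg is hypothetical; the image is assumed to sit in the same folder as the .Rmd file, and the file.exists() guard only keeps the example from failing when the image is absent.

```r
# Hypothetical file name; replace with the actual image file.
img <- "favorite_viz.jpg"

# Include the image in the knitted document if the file is present.
if (file.exists(img)) {
  knitr::include_graphics(img)
}
```

Plain Markdown image syntax in the body of the document is an alternative that needs no code chunk.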
4. It’s time to start thinking about your final project.
Either use your own data or find something in the datasets I’ve assembled for you. Find one
dataset that you think might be interesting. Briefly describe it and
make one plot from the data you can download.