1. This week, we’ll work with a dataset from sampling plankton in the Plum Island Estuary by the PIE Long Term Ecological Research site. This dataset is in an excel file with both metadata and data. There’s a lot of information in it, and we’ll come back to this dataset a few times through the semester.

        1A. Load up the plankton data using the readxl library, and generate a scatterplot of the relationship between Chlorophytes and TotalChlA. Note - this will require you to read the documentation for the function you use to read in the data, as you will have to deal with loading in a sheet other than the first one as well as dealing with columns that load as characters, but are numerics, due to some character strings being used to specify NA. How do you deal with this using read_xls()?

        1A Continued.Is there more Chlorophyll when there are more Chlorophytes? Note, if one of your axes is unreadable due to the number of values, that column is a character and not a numeric. Go back and fix this with how you load the data.

        1B. Many processes can modify this relationship. They all tend to covary with distance from the mouth of the estuary, where it empties into the ocean and is highly saline. Maybe distance from estuary mouth - Distance - affects the relationship between Chlorophytes and total chlorophyll? Can you see any pattern of how distance alters this relationship by coloring the points by Distance? Use something other than the default color scale.

        1C. As distance is continuous, any patterns might still be hard to see. What if we made a discrete variable out of distance using cut_interval and used facet_wrap to see its influence. What patterns do you see?

        1D. As the estuary was sampled at times of year where temperature varied, and distance from mouth might have a different effect under cold v. warm temperatures, let’s look at whether temperature and distance act in concert using facets. What do you see if you create a discrete variable from Temp using cut_interval and then make a facet_grid plot looking at the effects of both temperature and distance from mouth?

        1E. Last, are your answers from A-D made clearer or not by changing the scale of the x and y axes with log10 or any other transformation of x or why axes? Why or why not does a transformation help?




2. Let’s make this plot look good! Choose one of the plots that you worked on in part 1.

        2A. Give it a title with ggtitle(). Change the x and y axis names with xlab() and ylab().

        2B. Now, let’s theme it using the ggthemes package. Look through the theme options it gives you. Choose one, and implement it (e.g., add theme_bw(base_size=12)) to your plot. Why did you choose this theme? What about it aids in your visualization?

        2C. Extra credit - look at the theme help file. Customize your plot even more using theme() and justify your choices.




3. What is your favorite data visualization. Grab a jpg of it and put it into this RMarkdown document (you’ll need look at how to get images into RMarkdown documents and you’ll need to submit it to us along with the homework so we can compile the document). Bonus point if you archive (think zip files) the RMD and JPEGs and submit them together!

Now tell us why this is your favorite example of a data visualization.




4. It’s time to start thinking about your final project. Either use your own data or find something in the datasets I’ve assembled for you. Find one dataset that you think might be interesting. Briefly describe it and make one plot from the data you can download.