While the topics covered are broad, each week will feature different examples from genetics, ecology, and molecular and evolutionary biology, highlighting uses of each set of techniques.

This course will be a mixture of lecture, live-code demonstrations, and opportunities for in-class work. Lecture days will end with small exercises for students. We will hold lectures and labs in a computer lab so that students can follow along and try out new concepts as soon as they are described and demonstrated, enabling rapid feedback between students and faculty.

W&G = Wickham & Grolemund, DC = Data Carpentry Lesson, U/P for linked pdfs = biol355

Week 1. Overview of Course, Data Creation, and Spreadsheets

1/23/2023
Lecture: Welcome!, Data Collection and Metadata
Readings: W&G Introduction Chapter 1, Browman and Woo 2017
Objective(s): Introduce students to the course; understand what data are; discuss how we preserve information about data; view examples of datasets from different disciplines. Compare poor versus good practice in creating data. Differentiate between data recording and data entry.
Lab: DC Spreadsheet Ecology Lesson
Files: Portal Data
Etherpad: https://etherpad.wikimedia.org/p/355-spreadsheets-2023_spring

Week 2. Introduction to R and R Markdown

1/30/2023
Readings: W&G On Workflow Basics and Scripts and Projects, Tibbles & Data Frames, Data Import, Spreadsheets
Lectures: DC Ecology Intro to R and starting with data
Objective(s): Begin to learn the R computing language. Identify the syntax of an R function (name and arguments); create an R project in RStudio; read data into R using read.csv(); use R as a basic calculator; describe and create variables in R; interpret the output of the str() function; install packages in R.
Biological Examples: Human genome size.
Lab: Intro to Quarto, R Markdown and Data Subsetting
Exercises: Warmup Exercises for Friday
Cheat Sheets: Data Import
Files: Excel, CSV
Homework: Data frames!
Etherpad: https://etherpad.wikimedia.org/p/biol355-r-intro-2023_spring
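
As a preview of this week's objectives, here is a minimal sketch of reading a CSV file, inspecting it with str(), and using R as a calculator. The file contents, column names, and variable names are made up for illustration and are not the course files.

```r
# Hypothetical toy data, written to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("species,weight_g",
             "DM,40",
             "DO,48",
             "PF,8"), csv_path)

# Read the file into a data frame
surveys <- read.csv(csv_path)

# Inspect the structure: number of rows, and the name and type of each column
str(surveys)

# Use R as a calculator and store the result in a variable
mean_weight <- sum(surveys$weight_g) / nrow(surveys)
mean_weight  # 32
```

A function call such as read.csv(csv_path) has a name (read.csv) and arguments (csv_path); the `<-` arrow assigns its result to a variable.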

Week 3. Visualization

2/6/2023
Readings: W&G on Data Visualization, Unwin 2008, Exploring Data Visually.
Lectures: Principles of Data Visualization, Intro to ggplot2
Lab: ggplot2 and climate change
Objective(s): Develop understanding of graphical presentation best practices. Create a scatterplot using ggplot(); learn how to add data to a simple map.
Biological Examples: Plum Island LTER plankton distribution. HadCRUT global temperature anomaly over the past century.
Cheat Sheets: Data visualization, Mapping with ggmap
Other References: ggplot2 references, Fundamentals of Data Visualization, colors for data viz
Etherpad: https://etherpad.wikimedia.org/p/355-dataviz-2023_spring
Homework: Plotapalooza
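
For orientation, a scatterplot in ggplot2 can be sketched as below. The data frame and its values are invented placeholders, not real HadCRUT measurements, and the column names are assumptions made for this example.

```r
library(ggplot2)

# Invented placeholder values, NOT real temperature records
temps <- data.frame(
  year      = c(1900, 1950, 2000, 2020),
  anomaly_c = c(-0.2, -0.1, 0.3, 0.9)
)

# Map columns to the x and y aesthetics, then add a layer of points
p <- ggplot(temps, aes(x = year, y = anomaly_c)) +
  geom_point() +
  labs(x = "Year", y = "Temperature anomaly (degrees C)")

p
```

The pattern is always the same: data, an aesthetic mapping inside aes(), and one or more geom layers added with `+`.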

Week 4. Data Reduction and Summarization

2/13/2023
Readings: W&G on Data Transformation and Pipes, DC Ecology Lesson on Data Aggregation
Lecture: Organizing Data to Tell a Story
Lab: Gapminder and Dplyr, Dplyr Faded Example Practice
Cheat Sheets: Data Wrangling Cheat Sheet
Objective(s): Describe the meaning and identify applications of the following summary/descriptive statistics: mean, mode, median, standard deviation; Describe the split-apply-combine strategy of data reduction and summarization; Use group_by() and summarise() to calculate summary statistics for groupings within a dataset; Subset data using filter()
Biological Examples: Human genome size. Sockeye salmon sizes.
In Class Files: gapminder and dplyr
Homework: Birthweights in California
Etherpad: https://etherpad.wikimedia.org/p/355-dplyr-2023_spring
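
The split-apply-combine strategy named in the objectives can be sketched with dplyr roughly as follows. The fish data frame is a hypothetical toy dataset, not the salmon or gapminder data used in class.

```r
library(dplyr)

# Hypothetical toy data: body lengths (cm) for fish from two lakes
fish <- data.frame(
  lake      = c("A", "A", "A", "B", "B", "B"),
  length_cm = c(50, 54, 58, 40, 44, 90)
)

size_summary <- fish %>%
  filter(length_cm < 80) %>%   # subset: drop one implausibly large record
  group_by(lake) %>%           # split: one group per lake
  summarise(                   # apply summaries, combine into one row per lake
    n          = n(),
    mean_len   = mean(length_cm),
    median_len = median(length_cm),
    sd_len     = sd(length_cm)
  )

size_summary
```

Each lake ends up as a single row, so the result is easy to plot or join back to other tables.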

Week 5. Data Cleaning: Strings, Factors, and Dates

2/20/2023
Readings: W&G on Strings, Dates and Times, and Factors
Lecture: Strings and Regular Expressions 1, Strings and Regular Expressions 2
Cheat Sheets: Work with Strings Cheat Sheet
Objective(s): Understand how strings differ from numbers. Learn the basics of string manipulation. Describe the different strategies to clean data full of errors with minimal effort. Process and understand the concept of regular expression matching. Manipulate different date formats and work them into a data map reduce workflow.
Files: Portal Mammal Data with String Problems
Homework: Regular Expressions
Etherpad: https://etherpad.wikimedia.org/p/355-strings-2023_spring
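
To give a flavor of this week's objectives, the sketch below cleans a few messy strings with regular expressions and parses dates written in two formats. It uses base R only (the readings and cheat sheet cover stringr equivalents), and the example values are made up.

```r
# Hypothetical messy entries of the kind found in hand-typed field data
species <- c("Dipodomys merriami", " dipodomys  merriami", "DIPODOMYS MERRIAMI ")

# Trim the ends, collapse runs of whitespace to one space, and lower-case
clean <- tolower(gsub("[[:space:]]+", " ", trimws(species)))
unique(clean)

# Pattern matching: which raw entries start with a capitalized word and a space?
well_formed <- grepl("^[[:upper:]][[:lower:]]+ ", species)

# Parse dates recorded in two different formats into one Date vector
raw_dates <- c("2023-02-20", "02/21/2023")
dates <- as.Date(raw_dates, format = "%Y-%m-%d")   # non-matching entries become NA
still_missing <- is.na(dates)
dates[still_missing] <- as.Date(raw_dates[still_missing], format = "%m/%d/%Y")
dates
```

Once every date is a true Date, it can be grouped and summarized like any other column.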

Week 6. Tidy Data and Data Cleaning

2/27/2023
Readings: W&G Chapters on Tidy Data
Lecture: Tidy Data, Axolotl Data Cleaning
Lab: Tidy Friday: Bob Ross Edition
Objective(s): Understand how to reshape and manipulate data. Describe the difference between the two fundamental forms of data, long versus wide; use the tidyr package in R to convert between long and wide data; use unite() and separate() to create tidy data (where each column is a variable).
Biological Examples: Axolotl limb regeneration. Mammal taxonomic records. Weather data. Sale prices for homework. HadCRUT global temperature anomaly over the past century, wide format.
Etherpad: https://etherpad.wikimedia.org/p/355-tidy-2023_spring
Homework: Weather Data Cleaning
MIDTERM: make a cheatsheet
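
The long-versus-wide distinction from this week's objectives can be illustrated with a small sketch. The data frames, values, and column names here are hypothetical; pivot_longer(), pivot_wider(), separate(), and unite() are tidyr functions.

```r
library(tidyr)

# Hypothetical wide-format data: one row per site, one column per year
wide <- data.frame(
  site   = c("north", "south"),
  `2021` = c(1.2, 0.8),
  `2022` = c(1.5, 0.9),
  check.names = FALSE
)

# Wide -> long: each row becomes one site-year observation
long <- pivot_longer(wide, cols = -site, names_to = "year", values_to = "anomaly")

# Long -> wide again
wide_again <- pivot_wider(long, names_from = year, values_from = anomaly)

# separate() splits one column into several; unite() pastes columns back together
records  <- data.frame(taxon = c("Dipodomys_merriami", "Peromyscus_eremicus"))
split_up <- separate(records, taxon, into = c("genus", "species"), sep = "_")
rejoined <- unite(split_up, "taxon", genus, species, sep = " ")
```

In the long form, year is a variable stored in its own column rather than being spread across column names, which is what makes the data tidy.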

Weeks 7 & 8. Data “Mashups” and Geospatial Data

3/6/2023 and 3/20/2023
Readings: Intro to Geocomputation in R, Geographic Data in R, W&G Chapter on Relational Data
Lectures: Intro to GIS, Rasters and plotting rasters, Joins, Vector Data
Labs: coronavirus spatial mapping, joins and maps
Objective(s): Know when and where to use different types of joins; understand how to merge survey data with geospatial information to get a geographic understanding of epidemiological patterns.
Biological Examples: Hemlock woolly adelgid distribution. CDC records of heart disease across counties of the US. Change in coastal sea surface temperature since 1850. Global TB mortality distribution.
Optional Reading: Making maps in R and other chapters in Geocomputation in R. Spatial Data Science, information about the lab data
Files: geospatial data for lab, Hemlock, hemlock_densities, Arctic Boreal Forest Vegetation, Heart Disease in America, data for lab on joins, US County Borders in 2013, March 18, 2018 SST Anomalies
Etherpad: https://etherpad.wikimedia.org/p/355-gis-2023_spring
Homework: Map making!
Before Class: Install rgdal (you’ll need to install gdal first - see below), sf, sp, raster, leaflet, maptools, mapdata, and rgeos.

To install gdal on a Mac, there are two steps:
1) Install Homebrew from http://brew.sh/ (this is an awesome thing to have anyway)
2) In Terminal, type
brew install gdal

To install on a Windows PC:
1) Install OSGEO4W https://trac.osgeo.org/osgeo4w/wiki
2) Use it to install gdal

Week 9. Functions

3/27/2023
Readings: W&G Chapters on Functions
Lectures: Intro to Functions, Functions and Flexibility
Lab: Functions
Objective(s): Learn the benefits of reusable code; understand the structure of a function; discover debugging and making functions fail usefully; derive principles to make functions that are easy to understand and apply to multiple data sets.
Biological Examples: NOAA buoy data.
Files: NOAA buoy data from Boston Harbor, get_buoy.R
Homework: Functions!
Etherpad: https://etherpad.wikimedia.org/p/355-functions-2023_spring
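
As an illustration of the structure of a function, and of making one "fail usefully", consider the sketch below. The function name f_to_c and its input check are hypothetical and are not taken from get_buoy.R.

```r
# Convert temperatures from Fahrenheit to Celsius.
# The name and the check are hypothetical, chosen only for illustration.
f_to_c <- function(temp_f) {
  # Fail early, with a message that says what went wrong
  if (!is.numeric(temp_f)) {
    stop("temp_f must be numeric, not ", class(temp_f)[1])
  }
  (temp_f - 32) * 5 / 9
}

# Works on a whole vector at once
celsius <- f_to_c(c(32, 212))
celsius  # 0 100

# Bad input produces a clear error; tryCatch() captures the message here
msg <- tryCatch(f_to_c("warm"), error = function(e) conditionMessage(e))
msg  # "temp_f must be numeric, not character"
```

A function has a name, arguments, and a body whose last expression is returned; checking inputs at the top keeps errors close to their cause.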

Week 10. Iteration

4/3/2023
Readings: W&G Chapters on Iteration
Optional Reading: Advanced R on functionals (https://adv-r.hadley.nz/functionals.html)
Lectures: Iteration with purrr
Lab: Importing Hospital Records, Bioinformatics and List Columns
Objective(s): Learn the benefits of iteration in code. Automate multiple tasks. Fit many models in an automated fashion to test generality.
Biological Examples: Gapminder, Climate Change, Covid-19 in the US
Files: Split up Hadley Met Centre Data, get_buoy.R
Homework: Iterations and List Columns
Etherpad: https://etherpad.wikimedia.org/p/355-iteration-2023_spring
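
Fitting many models in an automated fashion might look like the following sketch with purrr. It uses the built-in mtcars dataset purely as a stand-in, since the course datasets are not reproduced here.

```r
library(purrr)

# Split a built-in dataset into one data frame per group (stand-in data)
by_cyl <- split(mtcars, mtcars$cyl)

# Fit the same linear model to every group
models <- map(by_cyl, function(df) lm(mpg ~ wt, data = df))

# Pull one number (the slope for wt) out of each fitted model
slopes <- map_dbl(models, function(m) coef(m)[["wt"]])
slopes
```

map() always returns a list, while map_dbl() returns a numeric vector and fails if any result is not a single number, which catches mistakes early.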

Week 11. Shiny and Web Interfaces

4/10/2023
Readings: Mastering Shiny Ch. 1-3 (Ch. 4 optional, but recommended)
Objective(s): Learn how to communicate data to others using dynamic web based applications.
Lectures: Building Shiny Apps
Lab: Lab for Building Shiny Apps
Data: download, which is originally from here
Additional Resource: https://shiny.rstudio.com/tutorial/
Homework: Shiny homework with coronavirus
Etherpad: https://etherpad.wikimedia.org/p/355-shiny-2023_spring

Week 12. Introduction to Modeling

4/17/2023
Readings: Cortina and Dunlop 1997
Lectures: Introduction to Modeling, Single Predictor Models
Objective(s): Understand the workflow of generating inference from data; describe the basics of probability and p-values; model linear relationships in data; compare groups of data using t-tests and ANOVA.
Biological Examples: Batesian mimicry, Penguin morphometrics, The effects of testosterone on bird behavior.
Lab: the basics of linear models in R.
Etherpad: https://etherpad.wikimedia.org/p/355-modeling-2023_spring
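
The three kinds of analysis named in the objectives can be sketched in base R as follows. The built-in iris dataset is used only as a stand-in for the morphometric data discussed in class.

```r
# Compare two groups with a t-test (iris ships with R; stand-in data)
two_species <- subset(iris, Species %in% c("setosa", "versicolor"))
two_species$Species <- droplevels(two_species$Species)
t_res <- t.test(Sepal.Length ~ Species, data = two_species)
t_res$p.value

# Compare three or more groups with a one-way ANOVA
aov_res <- aov(Sepal.Length ~ Species, data = iris)
summary(aov_res)

# Model a linear relationship between two continuous variables
fit <- lm(Petal.Width ~ Petal.Length, data = iris)
coef(fit)
```

All three use the same formula interface, response ~ predictor, which carries over to the models with many predictors covered next week.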

Week 13. Modeling with Many Predictors

4/24/2023
Readings: W&G on Model Basics
Lectures: Multiple Predictors and Model Comparison, The General Linear Model
Objective(s): Describe when to use nonlinear models/curves; compare and contrast models; visualize model outcomes.
Biological Examples: Seal life history variation. Mouse anti-fungal drug development. Neanderthal brain size. Fire severity in California.
Lab: general linear models
Data for Lab: from last week
Etherpad: https://etherpad.wikimedia.org/p/355-many_predictors-2023_spring

Week 14. Open Lab

5/1/2023

Week 15. Final Presentations

5/8/2023
Final Presentations on the morning of May 12th! (Papers due end of day May 17th)