While the topics covered are broad, each week will feature different examples from genetics, ecology, molecular biology, and evolutionary biology that highlight uses of each set of techniques.
This course will be a mixture of lecture, live-code demonstrations, and opportunities for in-class work. Lecture days will end with small exercises for students. We will hold lectures and labs in a computer lab so that students can follow along and try out new concepts as soon as they are described and demonstrated, enabling rapid feedback between students and faculty.
W&G = Wickham & Grolemund, DC = Data Carpentry Lesson, U/P for linked pdfs = biol355
1/23/2023
Lecture: Welcome!, Data Collection and Metadata
Readings: W&G Introduction Chapter 1, Broman and Woo 2017
Objective(s): Introduce the students to the course; Understand what data is; Discuss how we preserve information about data; View examples of datasets from different disciplines; Compare poor versus good practice in creating data; Differentiate between data recording and data entry.
Lab: DC Spreadsheet Ecology Lesson
Files: Portal Data
Etherpad: https://etherpad.wikimedia.org/p/355-spreadsheets-2023_spring
1/30/2023
Readings: W&G on Workflow Basics and Scripts and Projects, Tibbles & Data Frames, Data Import, Spreadsheets
Lectures: DC Ecology Intro to R and starting with data
Objective(s): Begin to learn the R computing language. Identify the syntax of an R function (name and arguments); Create an R project in RStudio; Read data into R using read.csv(); Use R as a basic calculator; Describe and create variables in R; Interpret the output of the str() function; Install packages in R.
Biological Examples: Human genome size.
Lab: Intro to Quarto, R Markdown and Data Subsetting
Exercises: Warmup Exercises for Friday
Cheat Sheets: Data Import
Files: Excel, CSV
Homework: Data frames!
Etherpad: https://etherpad.wikimedia.org/p/biol355-r-intro-2023_spring
2/6/2023
Readings: W&G on Data Visualization, Unwin 2008, Exploring Data Visually.
Lectures: Principles of Data Visualization, Intro to ggplot2
Lab: ggplot2 and climate change
Objective(s): Develop an understanding of best practices in graphical presentation; Create a scatterplot using ggplot(); Learn how to add data to a simple map.
Biological Examples: Plum Island LTER Plankton Distribution. HAD CRUT Global temperature anomaly over the past century
Cheat Sheets: Data visualization, Mapping with ggmap
Other References: ggplot2 references, Fundamentals of Data Visualization, colors for data viz
Etherpad: https://etherpad.wikimedia.org/p/355-dataviz-2023_spring
Homework: Plotapalooza
2/13/2023
Readings: W&G on Data Transformation and Pipes, DC Ecology Lesson on Data Aggregation
Lecture: Organizing Data to Tell a Story
Lab: Gapminder and Dplyr, Dplyr Faded Example Practice
Cheat Sheets: Data Wrangling Cheat Sheet
Objective(s): Describe the meaning and identify applications of the following summary/descriptive statistics: mean, mode, median, standard deviation; Describe the split-apply-combine strategy of data reduction and summarization; Use group_by() and summarise() to calculate summary statistics for groupings within a dataset; Subset data using filter().
Biological Examples: Human genome size. Sockeye salmon sizes.
In Class Files: gapminder and dplyr
Homework: Birthweights in California
Etherpad: https://etherpad.wikimedia.org/p/355-dplyr-2023_spring
2/20/2023
Readings: W&G on Strings, Dates and Times, and Factors
Lecture: Strings and Regular Expressions 1, Strings and Regular Expressions 2
Cheat Sheets: Work with Strings Cheat Sheet
Objective(s): Understand how strings differ from numbers. Learn the basics of string manipulation. Describe different strategies for cleaning error-filled data with minimal effort. Understand and apply regular expression matching. Manipulate different date formats and work them into a data map-reduce workflow.
Files: Portal Mammal Data with String Problems
Homework: Regular Expressions
Etherpad: https://etherpad.wikimedia.org/p/355-strings-2023_spring
2/27/2023
Readings: W&G Chapters on Tidy Data
Lecture: Tidy Data, Axolotl Data Cleaning
Lab: Tidy Friday: Bob Ross Edition
Objective(s): Understand how to reshape and manipulate data. Describe the difference between the two fundamental forms of data, long versus wide; Use the tidyr package in R to convert between long and wide data; Use unite and separate to create tidy data (where each column is a variable).
Biological Examples: Axolotl limb regeneration. Mammal taxonomic records. Weather data. Sale prices for homework. HAD CRUT Global temperature anomaly over the past century, wide format.
Etherpad: https://etherpad.wikimedia.org/p/355-tidy-2023_spring
Homework: Weather Data Cleaning
MIDTERM: make a cheatsheet
3/6/2023 and 3/20/2023
Readings: Intro to Geocomputation in R, Geographic Data in R, W&G Chapter on Relational Data
Lectures: Intro to GIS, Rasters and plotting rasters, Joins, Vector Data
Labs: coronavirus spatial mapping, joins and maps
Objective(s): Know when and where to use different types of joins; Understand how to merge survey data with geospatial information to get a geographic understanding of epidemiological patterns.
Biological Examples: Hemlock woolly adelgid distribution. CDC records of heart disease across counties of the US. Change in coastal sea surface temperature since 1850. Global TB mortality distribution.
Optional Reading: Making maps in R and other chapters in Geocomputation in R. Spatial Data Science, information about the lab data
Files: geospatial data for lab, Hemlock, hemlock_densities, Arctic Boreal Forest Vegetation, Heart Disease in America, data for lab on joins, US County Borders in 2013, March 18, 2018 SST Anomalies
Etherpad: https://etherpad.wikimedia.org/p/355-gis-2023_spring
Homework: Map making!
Before Class: Install rgdal (you’ll need to install gdal first - see below), sf, sp, raster, leaflet, maptools, mapdata, and rgeos.
To install gdal on a Mac, there are two steps:
1) Install Homebrew from http://brew.sh/ (this is an awesome thing to have anyway)
2) In Terminal, type
brew install gdal
To install on a Windows PC:
1) Install OSGEO4W from https://trac.osgeo.org/osgeo4w/wiki
2) Use it to install gdal
3/27/2023
Readings: W&G Chapters on Functions
Lectures: Intro to Functions, Functions and Flexibility
Lab: Functions
Objective(s): Learn the benefits of reusable code; Understand the structure of a function; Discover debugging and how to make functions fail usefully; Derive principles for making functions that are easy to understand and apply to multiple data sets.
Biological Examples: NOAA buoy data.
Files: NOAA buoy data from Boston Harbor, get_buoy.R
Homework: Functions!
Etherpad: https://etherpad.wikimedia.org/p/355-functions-2023_spring
4/3/2023
Readings: W&G Chapters on Iteration
Optional Reading: Advanced R on functionals (https://adv-r.hadley.nz/functionals.html)
Lectures: Iteration with purrr
Lab: Importing Hospital Records, Bioinformatics and List Columns
Objective(s): Learn the benefits of iteration in code. Automate multiple tasks. Fit many models in an automated fashion to test generality.
Biological Examples: Gapminder, Climate Change, Covid-19 in the US
Files: Split up Hadley Met Centre Data, get_buoy.R
Homework: Iterations and List Columns
Etherpad: https://etherpad.wikimedia.org/p/355-iteration-2023_spring
4/10/2023
Readings: Mastering Shiny Ch. 1-3 (Ch. 4 optional, but recommended)
Objective(s): Learn how to communicate data to others using dynamic web-based applications.
Lectures: Building Shiny Apps
Lab: Lab for Building Shiny Apps
Data: download, which is originally from here
Additional Resource: https://shiny.rstudio.com/tutorial/
Homework: Shiny homework with coronavirus
Etherpad: https://etherpad.wikimedia.org/p/355-shiny-2023_spring
4/17/2023
Readings: Cortina and Dunlap 1997
Lectures: Introduction to Modeling, Single Predictor Models
Objective(s): Understand the workflow of generating inference from data; Describe the basics of probability and p-values; Model linear relationships in data; Compare groups of data using t-tests and ANOVA.
Biological Examples: Batesian mimicry, Penguin morphometrics, The effects of testosterone on bird behavior.
Lab: the basics of linear models in R.
Etherpad: https://etherpad.wikimedia.org/p/355-modeling-2023_spring
4/24/2023
Readings: W&G on Model Basics
Lectures: Multiple Predictors and Model Comparison, The General Linear Model
Objective(s): Describe when to use nonlinear models/curves; Compare and contrast models; Visualize model outcomes.
Biological Examples: Seal life history variation. Mouse anti-fungal drug development. Neanderthal brain size. Fire severity in California.
Lab: general linear models
Data for Lab: from last week
Etherpad: https://etherpad.wikimedia.org/p/355-many_predictors-2023_spring
5/1/2023
5/8/2023
Final Presentations on the morning of May 12th! (Papers due end of day May 17th)