Online resources


Setup

Below is the code to get you setup. You can copy and paste these dataframes into your code to create the dataframes you will need.

library(dplyr)
library(stringr)

# Note that I'm using tibble() from the dplyr package to make my data.frame!
counties <- tibble(counties = c('MA, 25, 001, Barnstable County, H1', 
              'MA, 25, 003, Berkshire County, H4', 
              'MA, 25, 005, Bristol County, H1',
              'MA, 25, 007, Dukes County, H1',
              'MA, 25, 009, Essex County, H4',
              'MA, 25, 011, Franklin County, H4',
              'MA, 25, 013, Hampden County, H4',
              'MA, 25, 015, Hampshire County, H4',
              'MA, 25, 017, Middlesex County, H4',
              'MA, 25, 019, Nantucket County, H4',
              'MA, 25, 021, Norfolk County, H1',
              'MA, 25, 023, Plymouth County, H1',
              'MA, 25, 025, Suffolk County, H4',
              'MA, 25, 027, Worcester County, H4', 
              'VA, 51, 193, Westmoreland County, H1',
              'VA, 51, 195, Wise County, H1',
              'VA, 51, 197, Wythe County, H1',
              'VA, 51, 199, York County, H1',
              'VA, 51, 510, Alexandria city, C7',
              'VA, 51, 515, Bedford city, C7',
              'VA, 51, 520, Bristol city, C7',
              'VA, 51, 530, Buena Vista city, C7',
              'VA, 51, 540, Charlottesville city, C7',
              'VA, 51, 550, Chesapeake city, C7',
              'VA, 51, 570, Colonial Heights city, C7'
))

dna_seqs <- data_frame(seq =c('ATG CAA TGG GGA AAT GGT ACC AGG TCC GAA CTT AAT GAG GTA AGA CAG ATT TAA', 
'A TGC AAT GGG GAA ATG TTA CCA GGT CCG AAC TTA TTG AGG TAA GAC AGA TTT AA', 
'AT GCA ATG GGG AAA TGT TAC CAG GTC CGA ACT TAT TAA GGT AAG ACA GAT TTA A'))

Questions

  1. Using str_detect and filter, create a data frame with the massachusetts counties that contain the letter ‘h’ in their county name.

  2. Using str_detect and filter, create a data frame with the massachusetts counties that contain the number 2 in their three digit county code.

  3. Using str_replace, remove the word ‘County’ or ‘city’ and the trailing white space after the county name.

  4. Open reading frames (ORF) are sections of DNA that have the potential to code for a peptide/protein. They occur between a start codon ATG and a stop codon (TAA, TAG, or TGA) on the DNA. Write the regular expression that selects the ORF of the DNA sequences provided and returns them and ONLY them (nothing before or after) with str_replace. Note, each row has only one ORF.

  5. Take a look at the cheat sheets on R Cheat Sheets. Print some out, even! What are the design elements that you like in these? Reference specific cheat sheets and what works for you in them. Reference specific examples in cheat sheets that DO NOT work for you.

  6. Choose a package you want to look at for your midterm. You don’t need to do anything with it yet. But, sign up here and tell us what package you have chosen and why here. See the midterm handout for more information.