Modeling Data

class: center, middle

# Modeling Data

![model that data](./Images/modeling/modeling_officespace.jpg)

---
class: middle

# Up until now, we've just looked at data

.pull-left-wide[
<img src="modeling_intro_files/figure-html/penguin_viz-1.png" style="display: block; margin: auto;" />
]

.pull-right-narrow[
- Is there really a linear relationship there?  
  
- Does that relationship vary by species?  
  
- Does that relationship vary by sex?  
  
- What else can I learn from this system?
]

---
# How do we move beyond squinting? Is there one line here?

---
# Or three?

---
# How Do Relationships Vary by Response Variable?

---
class: center, middle

# How do we learn about the world with data?

---

# Building an Inferential Machine: Data In, Knowledge Out!
 
![MIT CSAIL](Images/modeling/Data-Machine.gif)

--
 
.center[No.... that's not quite right]

---

# Is Your Model a Golem?
#### (sensu McElreath)

![:scale 45%](Images/modeling/golem.png)
--
 
We build a model of the world, give it instructions, and let it loose - trying not to burn down Prague.

---
background-image: url("Images/modeling/midnight_train_sharon_ann_bodenus.jpg")
color: white

# .white[On a Midnight Train to Inference]

---

# What are we doing when we model data?

1. Start with a question we want to answer 
  
--

2. Design a model of the part of the world we need to know about to answer the question  
  
--

3. Acquire data appropriate to build that model

4. Choose an engine to fit the model

5. Choose an inferential framework

6. Use that framework to ask the fitted model questions about the world

---
class: center, middle

# Start with a Question - What is/are Yours?

---
background-image:url("Images/modeling/train_blueprints_madcom_deviantart.png")
background-size: contain

# Design Your Train

---
# DAG - that's a good Train

--
.large[draw your system!]

---
background-image:url("Images/modeling/coastal_starlight.jpg")
background-size: contain

# Build Your Train with Data

---
# Choose an Engine

![:col_header Ordinary Least Squares, Maximum Likelihood, Bayes]
![:col_row 
 <img src="Images/modeling/steam_train.jpg"> ,
 <img src="Images/modeling/diesel.jpg"> ,
 <img src="Images/modeling/bullet_train.jpg">

]
--
![:col_list 
  Minimizes the squared distance between predicted and observed values,
  
  Models the distribution of the data given the model's parameters,
  
  Same - but incorporates prior information and gets wild
]

---
# Choose Your Inferential Track

![:scale 65%](Images/modeling/train_switch.jpg)  
![:col_header Hypothesis Testing, 
  Model Comparison, 
  Bayesian Model Implications
  ]
--
![:col_list 
  Deductive Inference,
  Predictive Inference,
  Inductive Inference
]
--
![:col_list 
  Uses probabilities of overlap with a point hypothesis,
  Uses tests of model performances on new data,
  Uses probability distributions of parameters and simulation
]

---
# Look Out the Window and Ask Questions!
![](Images/modeling/starlight_obs_car.jpg)

---
# What is the Landscape Your Train is Taking You Through?
![](Images/modeling/train_journey.jpg)

---
# What You Want to Avoid

![from the purple quill](Images/modeling/train-explosion-3.jpg)

---
# Or Worse, Problems That You Might Not Notice

![:scale 50%](Images/modeling/train_derail.jpg)

---
class: center, middle

# What factors influence penguin bill depth?

---

# How do I think the system works?
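
One way to write the drawing down is as a small graph. This is only a sketch: the variable names are placeholders, and the `species -> body_mass` arrow is an assumption added for illustration, not something the model on the next slide requires.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG for the penguin question: each key is a variable and
# its value is the set of variables assumed to cause it directly.
# The edge species -> body_mass is an added assumption for illustration.
parents = {
    "species": set(),
    "body_mass": {"species"},
    "bill_depth": {"species", "body_mass"},
}

# static_order() raises CycleError if the graph contains a cycle,
# so a successful sort doubles as a check that this really is a DAG.
order = list(TopologicalSorter(parents).static_order())
print(order)
```

Causes come out before their effects, which is a quick sanity check on the arrows you drew.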

---

# How do I think the system works?

`$$Depth_{ij} \sim \mathcal{N}(\hat{depth}_{ij}, \sigma^2)\\
\hat{depth}_{ij} = \alpha_j + \beta \cdot mass_{ij}$$`

- Bill depth of individual i from species j ...

- is Normally distributed around a predicted depth, with error variance `$\sigma^2$`

- the predicted depth is a function of a species-specific intercept, `$\alpha_j$`, and a coefficient, `$\beta$`, times the body mass
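
Reading the model forwards, as a recipe for generating data, can make it more concrete. The sketch below does that in Python. Every number in it is made up to show the structure and none is a fitted estimate; `simulate_depth` is a hypothetical helper name.

```python
import random

random.seed(1)

# Made-up parameter values, chosen only to illustrate the model's structure.
alpha = {"Adelie": 13.0, "Chinstrap": 13.2, "Gentoo": 8.0}  # species intercepts (mm)
beta = 0.0014   # change in depth (mm) per gram of body mass
sigma = 0.8     # residual standard deviation (mm)

def simulate_depth(species, mass):
    """Draw one bill depth from a Normal centered on the predicted depth."""
    predicted = alpha[species] + beta * mass
    return random.gauss(predicted, sigma)

# Three hypothetical penguins: (species, body mass in grams)
penguins = [("Adelie", 3700), ("Chinstrap", 3700), ("Gentoo", 5000)]
for species, mass in penguins:
    print(species, mass, round(simulate_depth(species, mass), 2))
```

Fitting the model is this recipe run in reverse: given the depths, find the intercepts, slope, and variance that plausibly produced them.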

---

# Choose an Engine and Fit That Model...

![](Images/modeling/steam_train.jpg)

---

# What OLS is Doing
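
A minimal sketch of the idea, assuming the species-intercept, shared-slope model from earlier: OLS picks the `alpha` and `beta` that make the sum of squared residuals as small as possible. For this particular model that has a closed form, shown here in plain Python. The function name `fit_ols` and the toy rows are illustrative, not the penguin data.

```python
from collections import defaultdict

def fit_ols(rows):
    """Least-squares fit of depth = alpha[species] + beta * mass.

    rows: iterable of (species, mass, depth) tuples.
    Returns (alpha, beta), where alpha maps species to its intercept.
    """
    groups = defaultdict(list)
    for species, mass, depth in rows:
        groups[species].append((mass, depth))

    # Centering within each species removes the intercepts, so the shared
    # slope is the pooled within-species covariance over variance.
    sxy = sxx = 0.0
    means = {}
    for species, pairs in groups.items():
        mass_bar = sum(m for m, _ in pairs) / len(pairs)
        depth_bar = sum(d for _, d in pairs) / len(pairs)
        means[species] = (mass_bar, depth_bar)
        sxy += sum((m - mass_bar) * (d - depth_bar) for m, d in pairs)
        sxx += sum((m - mass_bar) ** 2 for m, _ in pairs)
    beta = sxy / sxx

    # Each fitted line passes through its own species' mean point.
    alpha = {s: d_bar - beta * m_bar for s, (m_bar, d_bar) in means.items()}
    return alpha, beta

# Toy, noise-free rows (not real measurements) built from known values
toy = [("A", 1.0, 12.0), ("A", 3.0, 16.0), ("B", 2.0, 9.0), ("B", 4.0, 13.0)]
alpha, beta = fit_ols(toy)
print(alpha, beta)
```

Because the toy rows have no noise, the fit recovers the slope and intercepts they were built from.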

---

# If we had done MLE...
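
A sketch of what changes, assuming the same Normal model with independent observations and writing `$n$` for the total number of penguins (a symbol introduced here): instead of minimizing squared distance directly, we maximize the log-likelihood

`$$\log L(\alpha, \beta, \sigma) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i,j}\left(Depth_{ij} - \alpha_j - \beta \cdot mass_{ij}\right)^2$$`

For any fixed `$\sigma$`, making this as large as possible means making the sum of squares as small as possible, so for this model the coefficient estimates should match OLS.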

---

# If we had done Bayes...
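
In general form, and without assuming any particular priors (none are specified here), the Bayesian engine combines the same likelihood with prior information about the parameters:

`$$p(\alpha, \beta, \sigma \mid Depth) \propto p(Depth \mid \alpha, \beta, \sigma)\; p(\alpha, \beta, \sigma)$$`

The result is a full probability distribution for each parameter rather than a single best estimate.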

---

# Make sure our train doesn't blow up...

A comparison of the distribution of observed bill depths versus those predicted by our model - one of *many* diagnostics

---

# So many ways our train could have blown up...

---

# Inference

![:scale 65%](Images/modeling/train_switch.jpg)

Oh, let's go with hypothesis testing...

---

# Our first journey: what do things look like out the window?

---

# Coming 'round the bend - do species or body mass matter?

Let's look at the ratio of the variation explained by each predictor to the residual variation (noise), and ask: what is the probability of seeing that ratio, or a more extreme one, if that predictor actually did not affect bill depth? That probability is a *p-value*.
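
To make the logic concrete, here is a deliberately simplified sketch: one grouping predictor, made-up depths, and a permutation test standing in for the F distribution. That is a swapped-in technique, not necessarily how the p-values on this slide were computed. `f_ratio` and `permutation_p` are hypothetical helper names.

```python
import random

def f_ratio(values, labels):
    """Ratio of between-group to within-group mean squares."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def permutation_p(values, labels, n_perm=2000, seed=1):
    """Share of label shuffles with an F ratio at least as large as observed."""
    rng = random.Random(seed)
    observed = f_ratio(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_ratio(values, shuffled) >= observed:
            hits += 1
    # Add 1 to both counts so the observed arrangement is included
    return (hits + 1) / (n_perm + 1)

# Made-up depths (mm) for two hypothetical groups
depths = [18.1, 18.6, 17.9, 18.4, 15.0, 14.6, 15.3, 14.9]
groups = ["a"] * 4 + ["b"] * 4
print(f_ratio(depths, groups), permutation_p(depths, groups))
```

Shuffling the labels builds the "predictor does not matter" world directly: if the grouping were irrelevant, any assignment of labels would be as good as the one we observed.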

---

# Passing the next hill - how precise are our coefficient estimates?

We can see that there is still scatter, but we can estimate that our model explains 80% of the variation in bill depth.
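
Assuming that figure is the usual coefficient of determination, it compares the leftover variation around the model's predictions with the total variation around the overall mean depth:

`$$R^2 = 1 - \frac{\sum_{i,j}\left(Depth_{ij} - \hat{depth}_{ij}\right)^2}{\sum_{i,j}\left(Depth_{ij} - \overline{Depth}\right)^2}$$`

An `$R^2$` of 0.8 means the residual sum of squares is one fifth of the total.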

---
# One last check for the road - are species different from one another if we hold body mass constant?

<img src="modeling_intro_files/figure-html/posthoc-1.png" style="display: block; margin: auto;" />
--

Note, I'm not calculating p-values, but looking at the precision of our estimates of the differences to see whether their intervals overlap 0. This is still hypothesis testing - deductive inference.
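
The overlap check itself is simple arithmetic. The sketch below uses a rough Normal-approximation interval (estimate plus or minus 1.96 standard errors); a real post hoc comparison would typically use a t distribution and adjust for multiple comparisons. The contrasts and their numbers are placeholders, not the values in the figure.

```python
def interval_excludes_zero(estimate, std_error, multiplier=1.96):
    """Return (lower, upper, excludes_zero) for estimate +/- multiplier * SE."""
    lower = estimate - multiplier * std_error
    upper = estimate + multiplier * std_error
    return lower, upper, not (lower <= 0.0 <= upper)

# Placeholder differences as (estimate, standard error); not fitted values
contrasts = {"A - B": (-0.2, 0.15), "A - C": (3.1, 0.2)}
for name, (est, se) in contrasts.items():
    print(name, interval_excludes_zero(est, se))
```

An interval that excludes 0 plays the same role as a small p-value: a difference of exactly zero is not very compatible with the data.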

---

# What did we learn?

- Species and body mass matter for bill depth

- We have been able to estimate this relationship fairly well for the data we have

- Greater body mass leads to deeper bills

- If we lined up penguins of the same body mass, Gentoo penguins would have shallower bills

- At the same mass, we cannot distinguish Adelie and Chinstrap penguins by bill depth