class: center, middle
background-image: url("Images/mlr/plate_spinners.jpg")

# .white[Multiple Predictors in a Model]

---

# So Many Ways to Build a Model

1. Multiple Predictors

2. Many Types of Categories

3. Many Models and Cross-Validation

---

# Where we ended - One Predictor and Go!

.large[
`$$y_i \sim \mathcal{N}(\widehat{y_i}, \sigma^2)\\ \widehat{y_i} = \beta_0 + \beta_1 x_i$$`
]

<img src="many_predictors_files/figure-html/puffershow-1.png" style="display: block; margin: auto;" />

---

# Where we ended - This Model Works for Means!

.large[
`$$y_i \sim \mathcal{N}(\widehat{y_i}, \sigma^2)\\ \widehat{y_i} = \beta_0 + \beta_1 x_i$$`
]

<img src="many_predictors_files/figure-html/two_trts-1.png" style="display: block; margin: auto;" />

---

# Where we ended - Many Means, Many 0/1s! So.... Many Slopes?

.large[
`$$y_{ij} \sim \mathcal{N}(\widehat{y_{ij}}, \sigma^2)\\ \widehat{y_{ij}} = \beta_0 + \sum\beta_j x_{ij}$$`
]

<img src="many_predictors_files/figure-html/aovplot-1.png" style="display: block; margin: auto;" />

---

# One-Way ANOVA Graphically

![:scale 60%](./Images/mlr/anova.png)

.large[
`$$y_{ij} \sim \mathcal{N}(\widehat{y_{ij}}, \sigma^2)\\ \widehat{y_{ij}} = \beta_0 + \sum\beta_j x_{ij}$$`
]

---

# Multiple Linear Regression?

![:scale 60%](./Images/mlr/regression1.png)

--

- Note that there is no connection between predictors, as in ANOVA.

--

- This is ONLY true if we have manipulated variables so that there is no relationship between the two. This is not often the case!

---

# Multiple Linear Regression

![:scale 60%](./Images/mlr/regression2.png)

--

- The curved double-headed arrow indicates COVARIANCE between predictors that we must account for.

--

- MLR controls for the correlation - it estimates the unique contribution of each predictor while controlling for the others.

--

- I like to think of it as holding the other predictor at its mean, or 0, or anything!

---

# Multiple Linear Regression - We've Seen This Before!

.large[
`$$y_{i} \sim \mathcal{N}(\widehat{y_{i}}, \sigma^2)\\ \widehat{y_{i}} = \beta_0 + \sum\beta_j x_{ij}$$`
]

![:scale 60%](./Images/mlr/regression2.png)

---

background-image: url("Images/mlr/fires.jpg")
background-size: cover

<div style="bottom:100%; text-align:left; background:goldenrod"><h4>Five-year study of wildfires & recovery in Southern California shrublands in 1993. 90 plots (20 x 50 m)</h4>
(data from Jon Keeley et al.)</div>

---

# What causes species richness?

- Distance from fire patch
- Elevation
- Abiotic index
- Patch age
- Patch heterogeneity
- Severity of last fire
- Plant cover

---

# Many Things may Influence Species Richness

<img src="many_predictors_files/figure-html/keeley_pairs-1.png" style="display: block; margin: auto;" />

---

# Our Model

`$$Richness_i \sim \mathcal{N}(\widehat{Richness_i}, \sigma^2)\\ \widehat{Richness_i} =\beta_{0} + \beta_{1} cover +\beta_{2} firesev + \beta_{3}hetero$$`

<img src="many_predictors_files/figure-html/dagmlr-1.png" style="display: block; margin: auto;" />

---

# Fit with Our Engine: Ordinary Least Squares

### It's just another linear model!

.large[

```r
library(broom) # for augment()

klm <- lm(rich ~ cover + firesev + hetero, data = keeley)

klm_out <- augment(klm)
```

or.....

```r
library(parsnip)
library(purrr) # for pluck()

mod <- linear_reg() %>%
  set_engine("lm") %>%
  fit(rich ~ cover + firesev + hetero, data = keeley)

mod_out <- mod %>%
  pluck("fit") %>%
  augment()
```
]

---

# Does Your Model Match Your Data?

<img src="many_predictors_files/figure-html/ggplot_dist-1.png" style="display: block; margin: auto;" />
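---

# Making These Checks Yourself

How might plots like these be made? A minimal sketch using the `klm_out` object from the `augment()` call a few slides back - `.fitted` and `.resid` are the columns broom adds to the data:

```r
library(ggplot2)

# observed vs. fitted: points should hug the 1:1 line
ggplot(klm_out, aes(x = .fitted, y = rich)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed")

# residuals vs. fitted: look for trends or funnels
# (this is the check on the next slide)
ggplot(klm_out, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed")
```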
<img src="many_predictors_files/figure-html/mlr_resid-1.png" style="display: block; margin: auto;" /> --- ![](./Images/mlr/gosling_multicollinearity.jpg) If your predictors are correlated at > 0.8.... are they really independent predictors? --- # The Variance Inflation Factor ``` # Check for Multicollinearity Low Correlation Term VIF Increased SE Tolerance cover 1.29 1.14 0.77 firesev 1.26 1.12 0.79 hetero 1.05 1.02 0.95 ``` -- - Basically, you look at how much all predictors explain each other ( `\(R^2\)` of lots of mlr fits) -- - 5-10 is... not good. Greater, you have set fire to your inferential train --- # What To Do If You've Caused a Fire -- - Cry -- - Drop a predictor -- - Transform a predictor, if it is appropriate --- # Did Our PredictorS Matter? F = (MS model with predictor - MS model without)/MS Residual <br> Where MS is a measure of variability (Sums of squares/DF) <table class="table table-striped" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> sumsq </th> <th style="text-align:right;"> df </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> cover </td> <td style="text-align:right;"> 1674.18 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 12.01 </td> <td style="text-align:right;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> firesev </td> <td style="text-align:right;"> 635.65 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 4.56 </td> <td style="text-align:right;"> 0.04 </td> </tr> <tr> <td style="text-align:left;"> hetero </td> <td style="text-align:right;"> 4864.52 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 34.91 </td> <td style="text-align:right;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> Residuals </td> <td style="text-align:right;"> 11984.57 </td> <td style="text-align:right;"> 86 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> </tr> </tbody> </table> --- # What are the Association Between Predictors and Response? <table class="table table-striped" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 1.68 </td> <td style="text-align:right;"> 10.67 </td> </tr> <tr> <td style="text-align:left;"> cover </td> <td style="text-align:right;"> 15.56 </td> <td style="text-align:right;"> 4.49 </td> </tr> <tr> <td style="text-align:left;"> firesev </td> <td style="text-align:right;"> -1.82 </td> <td style="text-align:right;"> 0.85 </td> </tr> <tr> <td style="text-align:left;"> hetero </td> <td style="text-align:right;"> 65.99 </td> <td style="text-align:right;"> 11.17 </td> </tr> </tbody> </table> - **Intercept:** If all predictors are 0, a plot will have 1.68 species approach on average. 
---

# What To Do If You've Caused a Fire

--

- Cry

--

- Drop a predictor

--

- Transform a predictor, if it is appropriate

---

# Did Our Predictors Matter?

F = (MS model with predictor - MS model without) / MS Residual
<br>
Where MS is a measure of variability (Sums of Squares / DF)

<table class="table table-striped" style="margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> sumsq </th> <th style="text-align:right;"> df </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> cover </td> <td style="text-align:right;"> 1674.18 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 12.01 </td> <td style="text-align:right;"> 0.00 </td> </tr>
<tr> <td style="text-align:left;"> firesev </td> <td style="text-align:right;"> 635.65 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 4.56 </td> <td style="text-align:right;"> 0.04 </td> </tr>
<tr> <td style="text-align:left;"> hetero </td> <td style="text-align:right;"> 4864.52 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 34.91 </td> <td style="text-align:right;"> 0.00 </td> </tr>
<tr> <td style="text-align:left;"> Residuals </td> <td style="text-align:right;"> 11984.57 </td> <td style="text-align:right;"> 86 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> </tr>
</tbody>
</table>

---

# What are the Associations Between Predictors and Response?

<table class="table table-striped" style="margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 1.68 </td> <td style="text-align:right;"> 10.67 </td> </tr>
<tr> <td style="text-align:left;"> cover </td> <td style="text-align:right;"> 15.56 </td> <td style="text-align:right;"> 4.49 </td> </tr>
<tr> <td style="text-align:left;"> firesev </td> <td style="text-align:right;"> -1.82 </td> <td style="text-align:right;"> 0.85 </td> </tr>
<tr> <td style="text-align:left;"> hetero </td> <td style="text-align:right;"> 65.99 </td> <td style="text-align:right;"> 11.17 </td> </tr>
</tbody>
</table>

- **Intercept:** If all predictors are 0, a plot will have 1.68 species on average.

--

- **Cover:** Holding fire severity and heterogeneity constant, a 1 unit increase in cover is associated with an increase of 15.6 species.

--

- **Fire Severity:** Holding cover and heterogeneity constant, a 1 unit increase in fire severity is associated with a loss of 1.82 species.

--

- **Heterogeneity:** Holding fire severity and cover constant, a 1 unit increase in heterogeneity is associated with an increase of ~66 species.

---

# Visualize Each Predictor Holding Others at the Median or Mean

<img src="many_predictors_files/figure-html/visreg-1.png" style="display: block; margin: auto;" />

---

# Or Build a Plot That Shows the Implication of Selecting Certain Values - a Counterfactual Plot

<img src="many_predictors_files/figure-html/counterfact-1.png" style="display: block; margin: auto;" />

---

# So Many Ways to Build a Model

1. Multiple Predictors

2. .red[ Many Types of Categories ]

3. Many Models and Cross-Validation

---

class: center, middle

![image](./Images/anova/anova-3.jpg)

---

# Compare These Models

### Multiple Linear Regression:

`$$y_{i} \sim N(\widehat{y_{i}}, \sigma^{2} )$$`
`$$\widehat{y_{i}} = \beta_{0} + \sum \beta_{j}x_{ij}$$`

### Multiway ANOVA:

`$$y_{ijk} \sim N(\widehat{y_{ijk}}, \sigma^{2} )$$`
`$$\widehat{y_{ijk}} = \beta_{0} + \sum \beta_{j}x_{ij} + \sum \beta_{k}x_{ik}$$`
`$$x_{ij} \; \mathrm{and} \; x_{ik} = 0,1$$`

---

# Multiway Model = More Than One Treatment Type

## For example: A Randomized Controlled Blocked Design

![:scale 60%](./Images/anova/blocked_designs/Slide4.jpg)

---

# Effects of Stickleback Density on Zooplankton Replicated in Different Lakes (Blocks)

<br><br>
.pull-left[
![image](./Images/anova/daphnia.jpeg)
]
.pull-right[
![image](./Images/anova/threespine_stickleback.jpeg)
]

---

# Effects of Both Treatment and Block

<img src="many_predictors_files/figure-html/zooplankton_boxplot-1.png" style="display: block; margin: auto;" />
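---

# How R Builds Those 0s and 1s

The dummy-coded table on the next slide is what `model.matrix()` constructs behind the scenes for `lm()`. A sketch, assuming a data frame `zoop_dat` (a placeholder name) with factor columns `treatment` and `block`:

```r
# the design matrix lm() builds internally: an intercept column,
# plus one 0/1 column per non-reference factor level
model.matrix(~ treatment + block, data = zoop_dat)

# dropping the intercept gives the first factor a column for
# every level, which appears to be the form on the next slide
model.matrix(~ treatment + block - 1, data = zoop_dat)
```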
style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> </tr> </tbody> </table> --- # Put it to the Test: F-Test <table class="table table-striped" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> df </th> <th style="text-align:right;"> sumsq </th> <th style="text-align:right;"> meansq </th> <th style="text-align:right;"> F </th> <th style="text-align:right;"> p.value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> treatment </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 6.857 </td> <td style="text-align:right;"> 3.429 </td> <td style="text-align:right;"> 16.366 </td> <td style="text-align:right;"> 0.001 </td> </tr> <tr> <td style="text-align:left;"> block </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 2.340 </td> <td style="text-align:right;"> 0.585 </td> <td style="text-align:right;"> 2.792 </td> <td style="text-align:right;"> 0.101 </td> </tr> <tr> <td style="text-align:left;"> Residuals </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 1.676 </td> <td style="text-align:right;"> 0.209 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> </tr> </tbody> </table> `\(R^2\)` = 0.846 --- # Comparison of Differences at Average of Other Treatment(s) <table class="table table-striped" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> contrast </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> SE </th> <th style="text-align:right;"> df </th> <th style="text-align:right;"> t.ratio </th> <th style="text-align:right;"> p.value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> control - high </td> <td style="text-align:right;"> 1.64 </td> <td style="text-align:right;"> 0.289 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 5.665 </td> <td style="text-align:right;"> 0.001 </td> </tr> <tr> <td style="text-align:left;"> control - low </td> <td style="text-align:right;"> 1.02 </td> <td style="text-align:right;"> 0.289 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 3.524 </td> <td style="text-align:right;"> 0.019 </td> </tr> <tr> <td style="text-align:left;"> high - low </td> <td style="text-align:right;"> -0.62 </td> <td style="text-align:right;"> 0.289 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> -2.142 </td> <td style="text-align:right;"> 0.142 </td> </tr> </tbody> </table> --- # And a Plot! <img src="many_predictors_files/figure-html/plot_zoop_em-1.png" style="display: block; margin: auto;" /> --- # If you really wanted to... <img src="many_predictors_files/figure-html/plot_zoop_em_2-1.png" style="display: block; margin: auto;" /> --- # So Many Ways to Build a Model 1. Multiple Predictors 2. Many Types of Categories 3. 
---

# So Many Ways to Build a Model

1. Multiple Predictors

2. Many Types of Categories

3. .red[Many Models and Cross-Validation]

---

background-image: url("Images/mlr/fires.jpg")
background-size: cover

<div style="bottom:100%; text-align:left; background:goldenrod"><h4>Five-year study of wildfires & recovery in Southern California shrublands in 1993. 90 plots (20 x 50 m)</h4>
(data from Jon Keeley et al.)</div>

---

# What causes species richness?

- Distance from fire patch
- Elevation
- Abiotic index
- Patch age
- Patch heterogeneity
- Severity of last fire
- Plant cover

---

# WHAT PREDICTORS SHOULD I USE?

![](./Images/mlr/regression_depression.jpg)

---

# The Second Inferential Track!

![:scale 65%](Images/modeling/train_switch.jpg)

![:col_header Hypothesis Testing, Model Comparison, Bayesian Model Implications ]

--

![:col_list Deductive Inference, Predictive Inference, Inductive Inference ]

--

![:col_list Uses probabilities of overlap with a point hypothesis, Uses tests of model performance on new data, Uses probability distributions of parameters and simulation ]

---

# Predictive Ability: A Way to Differentiate Between Model Performance

- The fit of a model seems like a good place to start...

--

- BUT - `\(R^2\)` is specific to the data *you* used to fit the model.

--

- We need something more objective!

--

- How about how well our model does at predicting new data?

---

# But We Don't HAVE Any New Data: Training, Testing, and Cross-Validation

1. Fit a model on a **training** data set

2. Evaluate the model on a **TEST** data set

3. Compare the predictive ability of competing models with MSE, Deviance, etc.

---

# Common Out of Sample Metrics

`\(MSE = \frac{1}{n}\sum{(Y_i - \hat{Y})^2}\)`

- In units of sums of squares

`\(RMSE = \sqrt{\frac{1}{n}\sum{(Y_i - \hat{Y})^2}}\)`

- In units of the response variable!
- An estimate of the SD of out of sample predictions

`\(Deviance = -2 \sum \log \left( p(Y_i | \hat{Y}, \theta) \right)\)`

- Probability-based
- Encompasses a wide number of probability distributions
- Just MSE for Gaussian linear models!

---

# But.... what data do I use for training and testing?

## Random Cross-Validation

- Cross-validating on only one point could be biased by the choice of point
- So, choose a random subset of your data!
- Typically use a 60:40 split, or 70:30 for lower sample sizes
- Calculate fit metrics for alternate models and compare

---

# For Example... Leave Out One Row of the Keeley Data

`$$Richness_i \sim \mathcal{N}(\widehat{Richness_i}, \sigma^2)\\ \widehat{Richness_i} =\beta_{0} + \beta_{1} cover +\beta_{2} firesev + \beta_{3}hetero$$`

<img src="many_predictors_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

---

# Or, Compare Two Models Leaving Out 30% of the Data at Random

```r
library(rsample) # for initial_split() and training()

split_dat <- initial_split(keeley, prop = 0.7)

mod_a <- lm(rich ~ cover + firesev + hetero,
            data = training(split_dat))

mod_b <- lm(rich ~ cover, data = training(split_dat))
```

<img src="many_predictors_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />
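---

# Scoring the Comparison

To actually score the two models, predict onto the held-out 30% and compute the MSE. A sketch continuing the split above - the `mse()` helper here is ours, not a package function:

```r
# the 30% of rows initial_split() held out
test_dat <- testing(split_dat)

# out-of-sample mean squared error for a fitted lm
mse <- function(mod, dat) {
  mean((dat$rich - predict(mod, newdata = dat))^2)
}

mse(mod_a, test_dat)  # all three predictors
mse(mod_b, test_dat)  # cover only
```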
---

class: center, middle

# But, wait, what if I choose a bad section of the data by chance?

---

# K-Fold Cross-Validation

- Data is split into K sets of training and testing folds

- Performance is averaged over all folds

![](Images/cv/kfold_performance.png)

---

# Five-Fold CV

<table>
<thead>
<tr> <th style="text-align:left;"> id </th> <th style="text-align:right;"> mse_all </th> <th style="text-align:right;"> mse_cover </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Fold1 </td> <td style="text-align:right;"> 173.8523 </td> <td style="text-align:right;"> 253.6410 </td> </tr>
<tr> <td style="text-align:left;"> Fold2 </td> <td style="text-align:right;"> 128.8726 </td> <td style="text-align:right;"> 256.7461 </td> </tr>
<tr> <td style="text-align:left;"> Fold3 </td> <td style="text-align:right;"> 154.1263 </td> <td style="text-align:right;"> 209.2751 </td> </tr>
<tr> <td style="text-align:left;"> Fold4 </td> <td style="text-align:right;"> 139.0737 </td> <td style="text-align:right;"> 198.8732 </td> </tr>
<tr> <td style="text-align:left;"> Fold5 </td> <td style="text-align:right;"> 170.3209 </td> <td style="text-align:right;"> 132.5258 </td> </tr>
</tbody>
</table>

All Predictors Model Score: 153.25

Cover Only Score: 210.21

---

# Problem: How Many Folds?

- What is a good number?
- More folds = a smaller test data set
- Bias-variance tradeoff if also looking at coefficients

--

- 5 and 10 are fairly standard, depending on data set size

--

- More folds = closer to the average out of sample error

--

- But, more folds can be computationally intensive

--

- Still, Leave One Out Cross-Validation is very powerful

---

# LOOCV (Leave One Out Cross-Validation)

<br><br>
![](./Images/cv/LOOCV.gif)

---

# LOOCV Comparison of Out of Sample Deviance (MSE)

<br><br>

All Predictors Model: 144.78

Cover Only Model: 210.27
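---

# K-Fold (and LOOCV) in Code

For the record, here is roughly how the five-fold scores above might be computed with rsample - `vfold_cv()` makes the folds, and `analysis()` / `assessment()` pull out each training/testing pair:

```r
library(rsample)
library(purrr)

set.seed(607)  # folds are random; the seed here is arbitrary
folds <- vfold_cv(keeley, v = 5)

# fit on each analysis set, score MSE on its assessment set
fold_mse <- map_dbl(folds$splits, function(split) {
  mod  <- lm(rich ~ cover + firesev + hetero, data = analysis(split))
  test <- assessment(split)
  mean((test$rich - predict(mod, newdata = test))^2)
})

mean(fold_mse)

# loo_cv(keeley) makes leave-one-out folds for the same workflow
```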
<img src="many_predictors_files/figure-html/traintest_plot-1.png" style="display: block; margin: auto;" /> --- # Enter the AIC .pull-left[ ![](./Images/mmi/hirotugu_akaike.jpg) ] .pull-right[ - `\(E[D_{test}] = D_{train} + 2K\)` - This is Akaike's Information Criteria (AIC) ] `$$\Large AIC = Deviance + 2K$$` --- # AIC and Prediction - AIC optimized for forecasting (out of sample deviance) - Approximates average out of sample deviance from test data - Assumes large N relative to K - AICc for a correction --- # AIC Tables Give us Weight of Evidence for Different Models ```r klm <- lm(rich ~ cover + firesev + hetero, data=keeley) klm_cover <- lm(rich ~ cover, data=keeley) klm_firesev <- lm(rich ~ firesev, data=keeley) klm_hetero <- lm(rich ~ hetero, data=keeley) ``` <table class="table table-striped" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Modnames </th> <th style="text-align:right;"> K </th> <th style="text-align:right;"> AIC </th> <th style="text-align:right;"> Delta_AIC </th> <th style="text-align:right;"> AICWt </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> All predictors </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 705.65 </td> <td style="text-align:right;"> 0.00 </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:left;"> heterogeneity </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 728.03 </td> <td style="text-align:right;"> 22.38 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> fire </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 736.03 </td> <td style="text-align:right;"> 30.38 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> cover </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 738.80 </td> <td style="text-align:right;"> 33.15 </td> <td style="text-align:right;"> 0 </td> </tr> </tbody> </table> --- # We Have Visisted the Second Track! ![:scale 65%](Images/modeling/train_switch.jpg) ![:col_header Hypothesis Testing, Model Comparison, Bayesian Model Implications ] -- ![:col_list Deductive Inference, Predictive Inference, Inductive Inference ] -- ![:col_list Uses probabilities of overlap with a point hypothesis, Uses tests of model performances on new data, Uses probability distributions of parameters and simulation ] --- # Final Thoughts on Cross-Validation and Model Selection - Make sure your model meets assumptions before including it in a set to be evaluated! Don't set fire to your inferential train!! -- - Prediction is not always your goal! -- - A model can be predictive but not have causal meaning -- - BUT - prediction is highly useful for real-world application -- - In working with users, CV approaches can help them gain insight into which models to use in order to guide their choices