LECTURE 06: Introduction to linear models

class: center, middle, inverse, title-slide

.title[
# LECTURE 06: Introduction to linear models
]
.subtitle[
## ENVS475: Exp. Design and Analysis
]
.author[
### Spring 2023
]

---

class: inverse

# outline

#### 1) What is a model?

--

#### 2) What is a linear model?

#### 3) Linear model assumptions

---
# what is a model?

> "an informative representation of an object, person or system"
--

#### Many types (conceptual, graphical, mathematical)

#### In this class, we will deal with *statistical* models

- Mathematical representation of our hypothesis

- By necessity, models will be simplifications of reality ("all models are wrong...")

- Do not have to be complex

---
# Models

--
- Inference **requires** models

--
- Models link **observations** to **processes**

--
- Models are tools that allow us to understand processes that we **cannot directly observe** based on quantities that we **can** observe

---
# a simple model

`$$\Huge y = a + bx$$`

--
It may not be obvious, but this is essentially the only model we will use this semester<sup>1</sup>

.footnote[[1] With some minor variations, mainly in `$x$`]

---
# a simple model

`$$\Huge y = a + bx$$`

If we want to use this as a statistical model, what's missing?

---
# a simple model

`$$\Huge y = a + bx$$`

If we want to use this as a statistical model, what's missing?

#### **Stochasticity!**

---
# a simple model

`$$\Huge y_i = a + bx_i + \epsilon_i$$`

If we want to use this as a statistical model, what's missing?

#### **Stochasticity!**

---
class:inverse, middle, center

# the linear model

---
# the linear model

`$$\Large response = deterministic\; part+stochastic\; part$$`

--
`$$\underbrace{\LARGE E[y_i] = \beta_0 + \beta_1 \times x_i}_{Deterministic}$$`

--
`$$\underbrace{\LARGE y_i \sim normal(E[y_i], \sigma)}_{Stochastic}$$`  
--
Note that the deterministic portion of the model has the same form as the equation for a line: `$y = a + b \times x$`, which is why we call these linear models
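
The two parts can be separated in a short simulation. This is a minimal sketch, not course code: the function name `simulate_linear_model`, the seed, and the parameter values are illustrative assumptions.

```python
import random

def simulate_linear_model(x, beta0, beta1, sigma, seed=None):
    """Draw one y_i ~ normal(beta0 + beta1 * x_i, sigma) for each x_i."""
    rng = random.Random(seed)
    # Deterministic part: E[y_i] = beta0 + beta1 * x_i
    expected = [beta0 + beta1 * xi for xi in x]
    # Stochastic part: y_i scatters around E[y_i] with standard deviation sigma
    y = [rng.gauss(mu, sigma) for mu in expected]
    return expected, y

x = [0, 1, 2, 3, 4]
expected, y = simulate_linear_model(x, beta0=-2.0, beta1=0.5, sigma=0.25, seed=475)
print(expected)  # [-2.0, -1.5, -1.0, -0.5, 0.0]
print(y)         # noisy values scattered around the line
```

Changing the seed changes `y` but never `expected`: only the stochastic part is random.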

---
# the linear model

#### A "simple" example: X is categorical

`$$\underbrace{\LARGE E[y_i] = -2 + 0.5 \times x_i}_{Deterministic}$$`

--
<img src="lecture_06_intro-models_files/figure-html/unnamed-chunk-5-1.png" width="288" style="display: block; margin: auto;" />
---
# Categorical groups

We are often interested in comparing groups with one another.

* Control vs. Treatment  
* Burned vs. Un-burned  
* High Elevation vs. Low Elevation

We can code these categories as 0s and 1s

`$$\LARGE E[y_i] = -2 + 0.5 \times x_i$$`

`$$\LARGE x_{control} = 0$$`
`$$\LARGE x_{treatment} = 1$$`

---
# Categorical groups

`$$\LARGE E[y_i] = -2 + 0.5 \times x_i$$`

--
`$$\Large x_{control} = 0$$`

--
`$$\LARGE E[y_{control}] = -2 + 0.5 \times 0 = -2$$`

--
`$$\Large x_{treatment} = 1$$`

--
`$$\LARGE E[y_{treatment}] = -2 + 0.5 \times 1 = -1.5$$`
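
The same arithmetic as a sketch in code, assuming a made-up vector of group labels (`groups` is hypothetical):

```python
# Hypothetical group labels for four observations
groups = ["control", "treatment", "treatment", "control"]

# Dummy coding: control -> 0, treatment -> 1
x = [1 if g == "treatment" else 0 for g in groups]

beta0, beta1 = -2.0, 0.5
expected = [beta0 + beta1 * xi for xi in x]

print(x)         # [0, 1, 1, 0]
print(expected)  # [-2.0, -1.5, -1.5, -2.0]
```

With this coding, the intercept is the control mean and the slope is the difference between the treatment and control means.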

---
# the linear model

#### A "simple" example: X is categorical

`$$\underbrace{\LARGE E[y_i] = -2 + 0.5 \times x_i}_{Deterministic}$$`

`$$\underbrace{\LARGE y_i \sim normal(E[y_i], \sigma=0.25)}_{Stochastic}$$`

--
<img src="lecture_06_intro-models_files/figure-html/unnamed-chunk-6-1.png" width="288" style="display: block; margin: auto;" />

---
# the linear model

#### Same model, continuous `$\Large x$`

`$$\underbrace{\LARGE E[y_i] = -2 + 0.5 \times x_i}_{Deterministic}$$`

`$$\underbrace{\LARGE y_i \sim normal(E[y_i], \sigma=0.25)}_{Stochastic}$$`

--
<img src="lecture_06_intro-models_files/figure-html/unnamed-chunk-7-1.png" width="432" style="display: block; margin: auto;" />

---
# the linear model

#### A more complex model

`$$\large y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \dots + \beta_px_{ip} + \epsilon_i$$`

- Each `$\beta$` coefficient is the effect of a specific predictor variable `$x$`

- Predictor variables may be continuous, binary, factors, or a combination

- We will cover more complex models (and interpretation) later
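
As a rough sketch, the deterministic part of a model with several predictors is an intercept plus a sum of coefficient-times-predictor terms. The helper `expected_value` and all coefficient values here are hypothetical.

```python
def expected_value(beta, x_row):
    """Intercept plus the sum of coefficient * predictor for one observation."""
    intercept, slopes = beta[0], beta[1:]
    if len(slopes) != len(x_row):
        raise ValueError("need exactly one coefficient per predictor")
    return intercept + sum(b * x for b, x in zip(slopes, x_row))

beta = [-2.0, 0.5, 1.5]  # hypothetical intercept and two slopes
x_row = [1, 2.0]         # one binary predictor, one continuous predictor

print(expected_value(beta, x_row))  # 1.5
```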

---
# residuals

#### One concept we will talk about a lot is *residuals*

- Residuals are the difference between the observed values `$y_i$` and the predicted values `$E[y_i]$`

---
# residuals

#### One concept we will talk about a lot is *residuals*

- Residuals are the difference between the observed values `$y_i$` and the predicted values `$E[y_i]$`

- How much variation in `$y$` is explained by `$x$`?

- Useful for assessing whether data violate model assumptions
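
A minimal sketch of computing residuals from a least-squares line. The data are made up, `fit_line` is a hypothetical helper, and `$R^2$` is used here as one common summary of "variation explained" (an assumption, not something defined in these slides).

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

x = [0, 1, 2, 3, 4]
y = [-2.1, -1.4, -1.2, -0.3, 0.0]  # made-up observations

b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # observed - predicted

y_bar = sum(y) / len(y)
ss_res = sum(r ** 2 for r in residuals)        # variation left unexplained
ss_tot = sum((yi - y_bar) ** 2 for yi in y)    # total variation in y
r_squared = 1 - ss_res / ss_tot

print(residuals)
print(r_squared)
```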

---
class:inverse, center, middle

# assumptions

---
# assumptions

#### **EVERY** model has assumptions

- Assumptions are necessary to simplify the real world into a workable model

- If your data violate the assumptions of your model, inferences *may* be invalid

- **Always** know (and test) the assumptions of your model<sup>1</sup>

.footnote[[1] You know what happens when you assume...]

---
# linear model assumptions

`$$\Large y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$`

`$$\Large \epsilon_i \sim normal(0, \sigma)$$`

1) **Linearity**: The relationship between `$x$` and `$y$` is linear

2) **Normality**: The residuals are normally distributed<sup>2</sup>

.footnote[[2] Note that these assumptions apply to the residuals, not the data!]

3) **Homoscedasticity**: The residuals have a constant variance at every level of `$x$`

4) **Independence**: The residuals are independent (i.e., uncorrelated with each other)
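
A crude numerical sketch of what two of these assumptions look like when they hold. The residuals are simulated (not real data), the `pearson` helper and the seed are assumptions, and plots are the more usual way to check linearity and normality.

```python
import random
import statistics

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    a_bar, b_bar = statistics.fmean(a), statistics.fmean(b)
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5

rng = random.Random(475)
x = [i / 10 for i in range(200)]             # predictor, sorted low to high
residuals = [rng.gauss(0, 0.25) for _ in x]  # simulated so the assumptions hold

# Residuals should be centered on zero
mean_resid = statistics.fmean(residuals)

# Homoscedasticity: similar spread at low and high values of x
half = len(x) // 2
sd_low = statistics.stdev(residuals[:half])
sd_high = statistics.stdev(residuals[half:])

# Independence: successive residuals should be roughly uncorrelated
lag1 = pearson(residuals[:-1], residuals[1:])

print(mean_resid, sd_low, sd_high, lag1)
```

Here `sd_low` and `sd_high` should be similar, and `mean_resid` and `lag1` should both be close to zero.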

???

Because virtually every model we will use this semester is a linear model, these assumptions apply to everything we will discuss from here out

---
# linear models

#### Very flexible

- Predictor(s) can take different forms (binary, continuous, factor)

- Can contain many predictors

- Can model non-linear relationships
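
For example, a quadratic term gives a curved relationship between `$x$` and `$y$`, yet the model is still linear because it is linear in the `$\beta$` coefficients (the quadratic form is an illustrative choice):

`$$\large y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \epsilon_i$$`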

#### Link different "tests" (e.g., t-tests, ANOVA, ANCOVA, linear regression)

#### Can be used for different statistical goals

- Estimating unknown parameters

- Testing hypotheses

- Describing stochastic systems

- Making predictions that account for uncertainty

---
# looking ahead

**Next time:** Linear Models Lab

**Friday:** HW 08: Linear Models

**Reading:** Hector Ch. 6, 7, 9

**Acknowledgements:** This lecture was largely based on Clark Rushing's [FANR6750](https://rushinglab.github.io/FANR6750/index.html) course materials