class: center, middle, inverse, title-slide

.title[
# LECTURE 06: Introduction to linear models
]
.subtitle[
## ENVS475: Exp. Design and Analysis
]
.author[
###
Spring 2023
]

---
class: inverse

# outline

<br/>

#### 1) What is a model?

<br/>

--

#### 2) What is a linear model?

<br/>

--

#### 3) Linear model assumptions

---

# what is a model?

> "an informative representation of an object, person or system"

--

#### Many types (conceptual, graphical, mathematical)

--

#### In this class, we will deal with *statistical* models

--

- Mathematical representation of our hypothesis

--

- By necessity, models will be simplifications of reality ("all models are wrong...")

--

- Do not have to be complex

---

# Models

--

- Inference **requires** models <br/>

--

- Models link **observations** to **processes** <br/>

--

- Models are tools that allow us to understand processes that we **cannot directly observe** based on quantities that we **can** observe

---

# a simple model

<br/>
<br/>

`$$\Huge y = a + bx$$`

--

<img src="lecture_06_intro-models_files/figure-html/unnamed-chunk-1-1.png" width="360" style="display: block; margin: auto;" />

--

It may not be obvious, but this is essentially the only model we will use this semester<sup>1</sup>

.footnote[[1] With some minor variations, mainly in `\(x\)`]

---

# a simple model

<br/>
<br/>

`$$\Huge y = a + bx$$`

<img src="lecture_06_intro-models_files/figure-html/unnamed-chunk-2-1.png" width="360" style="display: block; margin: auto;" />

If we want to use this as a statistical model, what's missing?

---

# a simple model

<br/>
<br/>

`$$\Huge y = a + bx$$`

<img src="lecture_06_intro-models_files/figure-html/unnamed-chunk-3-1.png" width="360" style="display: block; margin: auto;" />

If we want to use this as a statistical model, what's missing?

#### **Stochasticity!**

---

# a simple model

<br/>
<br/>

`$$\Huge y_i = a + bx_i + \epsilon_i$$`

<img src="lecture_06_intro-models_files/figure-html/unnamed-chunk-4-1.png" width="360" style="display: block; margin: auto;" />

If we want to use this as a statistical model, what's missing?

#### **Stochasticity!**

---
class:inverse, middle, center

# the linear model

---

# the linear model

<br/>
<br/>

`$$\Large response = deterministic\; part+stochastic\; part$$`

<br/>
<br/>

--

`$$\underbrace{\LARGE E[y_i] = \beta_0 + \beta_1 \times x_i}_{Deterministic}$$`

<br/>
<br/>

--

`$$\underbrace{\LARGE y_i \sim normal(E[y_i], \sigma)}_{Stochastic}$$`

--

Note that the deterministic portion of the model has the same form as the equation for a line: `\(y = a + b \times x\)`, which is why we call these linear models.

---

# the linear model

#### A "simple" example: X is categorical

`$$\underbrace{\LARGE E[y_i] = -2 + 0.5 \times x_i}_{Deterministic}$$`

--

<img src="lecture_06_intro-models_files/figure-html/unnamed-chunk-5-1.png" width="288" style="display: block; margin: auto;" />

---

# Categorical groups

We are often interested in comparing different groups.

* Control vs. Treatment
* Burned vs. Un-burned
* High Elevation vs. Low Elevation

We can code these categories as 0's and 1's

`$$\LARGE E[y_i] = -2 + 0.5 \times x_i$$`

`$$\LARGE x_{control} = 0$$`

`$$\LARGE x_{treatment} = 1$$`

---

# Categorical groups

`$$\LARGE E[y_i] = -2 + 0.5 \times x_i$$`

<br/>

--

`$$\Large x_{control} = 0$$`

<br/>

--

`$$\LARGE E[y_{control}] = -2 + 0.5 \times 0 = -2$$`

<br/>
<br/>

--

`$$\Large x_{treatment} = 1$$`

<br/>

--

`$$\LARGE E[y_{treatment}] = -2 + 0.5 \times 1 = -1.5$$`

---

# the linear model

#### A "simple" example: X is categorical

`$$\underbrace{\LARGE E[y_i] = -2 + 0.5 \times x_i}_{Deterministic}$$`

`$$\underbrace{\LARGE y_i \sim normal(E[y_i], \sigma=0.25)}_{Stochastic}$$`

--

<img src="lecture_06_intro-models_files/figure-html/unnamed-chunk-6-1.png" width="288" style="display: block; margin: auto;" />

---

# the linear model

#### Same model, continuous `\(\Large x\)`

`$$\underbrace{\LARGE E[y_i] = -2 + 0.5 \times x_i}_{Deterministic}$$`

`$$\underbrace{\LARGE y_i \sim normal(E[y_i], \sigma=0.25)}_{Stochastic}$$`

--

<img
src="lecture_06_intro-models_files/figure-html/unnamed-chunk-7-1.png" width="432" style="display: block; margin: auto;" />

---

# the linear model

#### A more complex model

`$$\large y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + ... + \beta_px_{ip} + \epsilon_i$$`

--

- Each `\(\beta\)` coefficient is the effect of a specific predictor variable `\(x\)`

- Predictor variables may be continuous, binary, factors, or a combination

- We will cover more complex models (and interpretation) later

---

# residuals

#### One concept we will talk about a lot is *residuals*

--

- Residuals are the difference between the observed values `\(y_i\)` and the predicted values `\(E[y_i]\)`

<img src="lecture_06_intro-models_files/figure-html/unnamed-chunk-8-1.png" width="396" style="display: block; margin: auto;" />

---

# residuals

#### One concept we will talk about a lot is *residuals*

- Residuals are the difference between the observed values `\(y_i\)` and the predicted values `\(E[y_i]\)`

<img src="lecture_06_intro-models_files/figure-html/unnamed-chunk-9-1.png" width="396" style="display: block; margin: auto;" />

--

- How much variation in `\(y\)` is explained by `\(x\)`?

--

- Useful for assessing whether data violate model assumptions

---
class:inverse, center, middle

# assumptions

---

# assumptions

#### **EVERY** model has assumptions

--

- Assumptions are necessary to simplify the real world into a workable model

--

- If your data violate the assumptions of your model, inferences *may* be invalid

--

- **Always** know (and test) the assumptions of your model<sup>1</sup>

.footnote[[1] You know what happens when you assume...]

---

# linear model assumptions

<br/>

`$$\Large y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$`

`$$\Large \epsilon_i \sim normal(0, \sigma)$$`

<br/>

--

1) **Linearity**: The relationship between `\(x\)` and `\(y\)` is linear

--

2) **Normality**: The residuals are normally distributed<sup>2</sup>

.footnote[[2] Note that these assumptions apply to the residuals, not the data!]

--

3) **Homoscedasticity**: The residuals have a constant variance at every level of `\(x\)`

--

4) **Independence**: The residuals are independent (i.e., uncorrelated with each other)

???

Because virtually every model we will use this semester is a linear model, these assumptions apply to everything we will discuss from here on out

---

# linear models

#### Very flexible

--

- Predictor(s) can take different forms (binary, continuous, factor)

--

- Can contain many predictors

--

- Can model non-linear relationships

--

#### Link different "tests" (e.g., t-tests, ANOVA, ANCOVA, linear regression)

--

#### Can be used for different statistical goals

- Estimating unknown parameters
- Testing hypotheses
- Describing stochastic systems
- Making predictions that account for uncertainty

---

# looking ahead

<br/>

**Next time:** Linear Models Lab

**Friday:** HW 08: Linear Models

<br/>

**Reading:** Hector Ch. 6, 7, 9

**Acknowledgements**

This lecture was largely based on Clark Rushing's [NR6750](https://rushinglab.github.io/FANR6750/index.html) course materials
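
---

# simulating the linear model

The example model from the earlier slides, `\(E[y_i] = -2 + 0.5 \times x_i\)` with `\(\sigma = 0.25\)`, can be sketched in a few lines of code. This is a minimal illustration, assuming Python with only the standard library. The function names `simulate` and `fit_line`, the seed, and the sample size of 50 per group are assumptions made for the example, not values from the lecture.

```python
import random
import statistics

# Parameter values taken from the example slides.
BETA_0 = -2.0   # intercept (expected value when x = 0)
BETA_1 = 0.5    # slope (treatment effect when x is coded 0/1)
SIGMA = 0.25    # standard deviation of the stochastic part


def simulate(x_values, rng):
    """Draw y_i ~ normal(E[y_i], sigma), with E[y_i] = beta_0 + beta_1 * x_i."""
    return [rng.gauss(BETA_0 + BETA_1 * x, SIGMA) for x in x_values]


def fit_line(x_values, y_values):
    """Ordinary least-squares estimates of the intercept and slope."""
    x_bar = statistics.fmean(x_values)
    y_bar = statistics.fmean(y_values)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_values, y_values))
    sxx = sum((x - x_bar) ** 2 for x in x_values)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


rng = random.Random(475)  # arbitrary seed so the sketch is repeatable

# Categorical predictor: 0 = control, 1 = treatment (assumed 50 of each).
x_group = [0] * 50 + [1] * 50
y_group = simulate(x_group, rng)
b0_hat, b1_hat = fit_line(x_group, y_group)

# Residuals: observed y_i minus predicted E[y_i].
residuals = [y - (b0_hat + b1_hat * x) for x, y in zip(x_group, y_group)]

print(f"estimated intercept (control mean): {b0_hat:.2f}")
print(f"estimated slope (treatment - control): {b1_hat:.2f}")
# Sample SD of the residuals (n - 1 denominator): a rough estimate of sigma.
print(f"residual SD: {statistics.stdev(residuals):.2f}")
```

The same two functions work unchanged for a continuous predictor: pass a list of continuous `x` values instead of 0's and 1's.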