class: center, middle, inverse, title-slide .title[ # LECTURE 03: Basic concepts in statistics ] .subtitle[ ## ENVS475: Exp. Design and Analysis ] --- class: inverse # Outline<sup>1</sup> .footnote[[1] Lecture based on [FANR6750](https://rushinglab.github.io/FANR6750/index.html)] #### 1) What is statistics, and why do we need it? -- #### 2) Populations and samples -- #### 3) Summary statistics -- #### 4) Inferential statistics -- #### 5) Hypothesis testing --- # What is statistics? <br/> <br/> > The study of the collection, analysis, interpretation, presentation, and organization of data (Dodge 2006) -- <br/> > The science of learning from data (various) -- `\(\text{Statistics} = \text{Information} + \text{Uncertainty}\)` -- * Or: -- * Separating the signal from the noise. --- # Why do we need statistics? - Summarize and describe data -- - Summarize a vector of data with a measure of location and a measure of dispersion -- - Test hypotheses -- - e.g., are the means different between groups? -- - Causal inference -- - Does `\(x\)` influence `\(y\)`? -- - Estimate parameters -- - Make predictions that account for uncertainty --- # Populations vs. Samples #### Population - A collection of subjects of interest - Often, a biologically meaningful unit - Sometimes a process of interest -- #### Sample - A finite subset of the population + i.e.
the data we collect - Samples allow us to draw inferences about the population - Good samples are: + random, representative, and sufficiently large --- # Population: Human Height <u>P</u>opulation <u>P</u>arameters - The "true" mean and standard deviation <img src="lecture_03_basic-stats_2025_files/figure-html/pop-1.png" width="648" style="display: block; margin: auto;" /> --- # Sample: Human Height <img src="lecture_03_basic-stats_2025_files/figure-html/samp1-1.png" width="864" style="display: block; margin: auto;" /> --- # Sample: Human Height <img src="lecture_03_basic-stats_2025_files/figure-html/samp1 annotated-1.png" width="864" style="display: block; margin: auto;" /> --- # Summary Statistics ### Describing the sample .pull-left[ A raw vector of data is hard to interpret <table class="table table-condensed" style="font-size: 18px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Heights (in) </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 67.0 </td> </tr> <tr> <td style="text-align:center;"> 68.2 </td> </tr> <tr> <td style="text-align:center;"> 74.5 </td> </tr> <tr> <td style="text-align:center;"> 69.2 </td> </tr> <tr> <td style="text-align:center;"> 69.5 </td> </tr> </tbody> </table> ] -- .pull-right[ We can describe a vector with two summary statistics * Average (mean) * Standard deviation ] --- # Summary Statistics .pull-left[ * **Mean** is a measure of location or central tendency * Works well for normally distributed data * "Peak" of the bell curve * **Standard deviation (SD)** is a measure of dispersion * Roughly, the typical distance of a data point from the mean * "Width" of the bell curve ] .pull-right[ <img src="lecture_03_basic-stats_2025_files/figure-html/unnamed-chunk-2-1.png" width="576" style="display: block; margin: auto;" /> ] --- # Sample Variation .pull-left[ <img 
src="lecture_03_basic-stats_2025_files/figure-html/samp var-1.png" width="648" style="display: block; margin: auto;" /> <img
src="lecture_03_basic-stats_2025_files/figure-html/samp var2-1.png" width="648" style="display: block; margin: auto;" /> ] -- .pull-right[ <br/> * Each sample produces its own mean and SD * Inferential statistics lets us account for this sample-to-sample variation ] --- # Sampling = Uncertainty Populations (usually) cannot be measured in full * so sampling is essential -- But sampling is inherently *stochastic* - sampling produces uncertainty - unavoidable (but that's ok!) --- # Inferential Statistics Standard Error (SE) of the mean * On average, how far is a sample mean, `\(\bar y\)`, from the population mean, `\(\mu\)`? * Estimated as `\(SE = s/\sqrt{n}\)`, where `\(s\)` is the sample SD and `\(n\)` is the sample size Statistics is what allows us to learn about the **population** using **samples** in the face of **uncertainty** Use samples to make *inferences* about the population --- # Hypothesis Testing Statistics is often concerned with testing hypotheses * Is a population mean different from some specified value? * Do two groups have different population means? We will go through the details in future classes. The following is a brief review of statistical terms and principles. --- # Statistics review What are the null and alternative hypotheses? Can you write out one set of two-tailed hypotheses? Do you remember what a one-tailed hypothesis is? What does `\(\alpha\)` mean? What value does `\(\alpha\)` usually have? What is a `\(\text{p-value}\)`? What do we compare `\(\text{p-values}\)` to? How do we decide whether to reject or fail to reject the null hypothesis? --- # Statistics review Null hypothesis = the baseline we test against * Usually contains the `\(=\)` sign * e.g., no difference -- Alternative hypothesis * Does not contain an `\(=\)` sign * In this class, it will contain the `\(\ne\)` sign -- Two-tailed hypotheses Null: `\(\large H_0 : \mu_1 = \mu_2\)` Alternative: `\(\large H_A : \mu_1 \ne \mu_2\)` * In this class, we will always be working with two-tailed tests --- # Statistics review `\(\alpha\)` = level of significance * In this class, `\(\alpha = 0.05\)`.
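
---
# Statistics review: `\(\alpha\)` in action

What does `\(\alpha = 0.05\)` mean in practice? It is the chance of rejecting a null hypothesis that is actually true. The minimal Python sketch below simulates this under loud assumptions: two groups drawn from the *same* normal population (so `\(H_0 : \mu_1 = \mu_2\)` is true by design), a known SD, and a simple two-sample z-test standing in for the tests we will use later. The mean, SD, and sample size are made-up illustrative values, and the function names are hypothetical. The sketch computes p-values, which are defined in the next part of this review.

```python
import random
from statistics import NormalDist, fmean

# Illustrative, made-up values (not real data): both groups come from the SAME
# normal population, so the null hypothesis H0: mu_1 = mu_2 is true by design.
MU, SIGMA, N = 67.0, 3.0, 30
ALPHA = 0.05


def z_test_p_value(a, b, sigma):
    """Two-tailed p-value for H0: mu_1 = mu_2, assuming a known, shared SD."""
    se_diff = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se_diff
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type_i_error_rate(n_sims=10_000, seed=1):
    """Fraction of simulated studies that (wrongly) reject a true H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        a = [rng.gauss(MU, SIGMA) for _ in range(N)]
        b = [rng.gauss(MU, SIGMA) for _ in range(N)]
        if z_test_p_value(a, b, SIGMA) < ALPHA:
            rejections += 1
    return rejections / n_sims


print(type_i_error_rate())  # close to 0.05, not exactly (simulation noise)
```

Because `\(H_0\)` is true in every simulated study, each rejection is a false positive, and the long-run fraction of rejections lands near `\(\alpha\)`.

---
# Statistics review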
-- `\(\text{p-value}\)` = one of the outputs of most statistical tests * The probability of observing data at least as extreme as ours if the null hypothesis were true * We compare the `\(\text{p-value}\)` against `\(\alpha = 0.05\)` -- If `\(\text{p-value} < \alpha\)`, we **reject the null hypothesis** + i.e., we have evidence of a difference -- If `\(\text{p-value} > \alpha\)`, we **FAIL to reject the null hypothesis** + i.e., we lack evidence of a difference (this does *not* prove there is no difference) --- # Example Is the average height of males and females in the US different? <img src="lecture_03_basic-stats_2025_files/figure-html/unnamed-chunk-3-1.png" width="864" style="display: block; margin: auto;" /> -- * Estimated difference in heights = 5.9 inches -- * `\(\text{p-value} = 0.001\)` -- What do we conclude? --- # Example 2 Is the average height of males and females in the US different? <img src="lecture_03_basic-stats_2025_files/figure-html/unnamed-chunk-4-1.png" width="864" style="display: block; margin: auto;" /> -- * Estimated difference in heights = 1.5 inches -- * `\(\text{p-value} = 0.273\)` -- What do we conclude? --- # Looking ahead #### **Wednesday**: Normal Distributions * Look over the Normal Distribution lecture notes before next class! #### **Friday**: HW02 <br/> #### **Reading**: Hector ch. 5 #### **Video Resources**: on D2L
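
---
# Appendix: summary statistics and the decision rule in code

A minimal Python sketch (Python is an assumption here; the course may use other software) that computes the mean, SD, and standard error for the five heights in the Summary Statistics table, then applies the two-tailed decision rule from the Statistics review to the p-values from the two height examples. The helper names `summarize` and `decide` are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev

# The five heights (inches) from the "Summary Statistics" table
heights = [67.0, 68.2, 74.5, 69.2, 69.5]


def summarize(y):
    """Return the sample mean, sample SD (n - 1 denominator), and SE of the mean."""
    n = len(y)
    mean = fmean(y)
    sd = stdev(y)        # sample standard deviation
    se = sd / sqrt(n)    # standard error of the mean: s / sqrt(n)
    return mean, sd, se


def decide(p_value, alpha=0.05):
    """Decision rule used in this class.

    A p-value exactly equal to alpha is treated as 'fail to reject' here;
    this tie-breaking is an assumption, and conventions vary.
    """
    return "reject H0" if p_value < alpha else "fail to reject H0"


mean, sd, se = summarize(heights)
print(f"mean = {mean:.2f}, SD = {sd:.2f}, SE = {se:.2f}")  # mean = 69.68, SD = 2.87, SE = 1.28
print("Example 1:", decide(0.001))  # Example 1: reject H0
print("Example 2:", decide(0.273))  # Example 2: fail to reject H0
```

The SE is smaller than the SD because it describes how much the *sample mean* varies between samples, not how much individual heights vary.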