
class: center, middle, inverse, title-slide

.title[
# LECTURE 03: Basic concepts in statistics
]
.subtitle[
## ENVS475: Exp. Design and Analysis
]

---

class: inverse

# Outline<sup>1</sup>

.footnote[[1] Lecture based on [FANR6750](https://rushinglab.github.io/FANR6750/index.html)]

#### 1) What is statistics, and why do we need it?

#### 2) Populations and samples

#### 3) Summary statistics

#### 4) Inferential statistics

--
#### 5) Hypothesis testing

---

# What is statistics?

<br/>
<br/>
> The study of the collection, analysis, interpretation, presentation, and organization of data (Dodge 2006)

--
<br/>
> The science of learning from data (various)

--
`\(\text{Statistics} = \text{Information} + \text{Uncertainty}\)`

--
* Or:

* Detecting the signal from the noise.

---
# Why do we need statistics?  
- Summarize and describe data

--
  - Summarize a vector of data with a measure of location and a measure of dispersion

--
- Test hypotheses

--
  - e.g., are the means different between groups?

--
- Causal inference

--
  - Does `\(x\)` influence `\(y\)`?

--
- Estimate parameters

--
- Make predictions that account for uncertainty

---
# Populations vs. Samples  
#### Population  
- A collection of subjects of interest  
- Often, a biologically meaningful unit  
- Sometimes a process of interest

#### Sample  
- A finite subset of the population  
  + i.e., the data we collect  
- Samples allow us to draw inferences about the population  
- Good samples are:  
  + Random, representative, and sufficiently large

---
# Population: Human Height

<u>P</u>opulation <u>P</u>arameters
- The "true" mean and standard deviation

---
# Sample: Human Height  
<img src="lecture_03_basic-stats_2025_files/figure-html/samp1-1.png" width="864" style="display: block; margin: auto;" />

---
# Sample: Human Height  
<img src="lecture_03_basic-stats_2025_files/figure-html/samp1 annotated-1.png" width="864" style="display: block; margin: auto;" />

---
# Summary Statistics  
### Describing the sample  
.pull-left[
Vector of data is hard to interpret  
<table class="table table-condensed" style="font-size: 18px; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:center;"> Heights </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> 67.0 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 68.2 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 74.5 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 69.2 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 69.5 </td>
  </tr>
</tbody>
</table>
]

.pull-right[
Describe any vector with two summary stats  
* Average (mean)  
* Standard Deviation  
]

---
# Summary Statistics  
.pull-left[
* **Mean** is a measure of location or central tendency  
  * Works well with normally distributed data  
  * "Peak" of the bell curve

* **Standard deviation (SD)** is a measure of dispersion  
  * Roughly the typical distance of a data point from the mean  
  * "Width" of the bell curve  
]

.pull-right[
<img src="lecture_03_basic-stats_2025_files/figure-html/unnamed-chunk-2-1.png" width="576" style="display: block; margin: auto;" />
]
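
---
# Summary Statistics  
A minimal Python sketch (not part of the original lecture) that computes both summary statistics for the five heights in the table. It assumes the *sample* SD, which divides by `\(n - 1\)`; the variable names are illustrative only.

```python
import statistics

# Heights copied from the table on the "Describing the sample" slide
heights = [67.0, 68.2, 74.5, 69.2, 69.5]

# Mean: sum of the values divided by the number of values
mean_height = statistics.mean(heights)

# Sample standard deviation: uses n - 1 in the denominator
sd_height = statistics.stdev(heights)

print(f"mean = {mean_height:.2f}, SD = {sd_height:.2f}")
# prints: mean = 69.68, SD = 2.87
```

Five numbers collapse into two: a location (69.68) and a spread (2.87).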
---
# Sample Variation 
.pull-left[
<img src="lecture_03_basic-stats_2025_files/figure-html/samp var-1.png" width="648" style="display: block; margin: auto;" />

<img src="lecture_03_basic-stats_2025_files/figure-html/samp var2-1.png" width="648" style="display: block; margin: auto;" />
]
--
.pull-right[
<br/>

* Each sample produces its own mean and SD

* Inferential statistics allows us to deal with this  
]
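
---
# Sample Variation 
A small simulation sketch of the same idea. The population values below (mean 67, SD 4, normally distributed) and the sample size of 30 are assumptions made up for illustration, not values from the lecture.

```python
import random
import statistics

random.seed(475)  # fixed seed so the run is repeatable

# Assumed population parameters (hypothetical, for illustration only)
POP_MEAN = 67.0
POP_SD = 4.0
SAMPLE_SIZE = 30


def draw_sample(n):
    """Draw n heights from a normal population with the assumed parameters."""
    return [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]


# Three independent samples from the same population
samples = [draw_sample(SAMPLE_SIZE) for _ in range(3)]

for i, sample in enumerate(samples, start=1):
    print(f"sample {i}: mean = {statistics.mean(sample):.2f}, "
          f"SD = {statistics.stdev(sample):.2f}")
```

Every sample comes from the same population, yet each one reports a slightly different mean and SD.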

---
# Sampling = Uncertainty

Populations (usually) cannot be measured  
* sampling is essential

But sampling is inherently *stochastic*

- sampling produces uncertainty

- unavoidable (but that's ok!)

---
# Inferential Statistics  
Standard Error (SE) of the mean  
* On average, how far is the sample mean, `\(\bar y\)`, from the population mean, `\(\mu\)`?

Statistics is what allows us to learn about the **population** using **samples** in the face of **uncertainty**

Use samples to make *inferences* about the populations   
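
---
# Inferential Statistics  
The usual estimate of the standard error of the mean is `\(SE = s / \sqrt{n}\)`, where `\(s\)` is the sample SD and `\(n\)` is the sample size. A minimal Python sketch (not from the original lecture) using the five heights from the summary statistics table:

```python
import math
import statistics

# Heights copied from the table on the "Describing the sample" slide
heights = [67.0, 68.2, 74.5, 69.2, 69.5]

n = len(heights)
sample_mean = statistics.mean(heights)
sample_sd = statistics.stdev(heights)  # sample SD (n - 1 denominator)

# Standard error of the mean: sample SD divided by the square root of n
standard_error = sample_sd / math.sqrt(n)

print(f"n = {n}, mean = {sample_mean:.2f}, SE = {standard_error:.2f}")
# prints: n = 5, mean = 69.68, SE = 1.28
```

Because `\(n\)` is in the denominator, a larger sample gives a smaller SE for the same SD.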
---
# Hypothesis Testing

Statistics is often concerned with testing hypotheses  
* Is the sample mean different from, or the same as, some value?  
* Are the sample means of two groups the same or different?

We will go through the details in future classes

The following is a brief review of statistical terms and principles

---
# Statistics review

What is a Null and Alternative Hypothesis?  
Can you write out one set of two-tailed hypotheses?  
Do you remember what a one-tailed hypothesis is?

What does `\(\alpha\)` mean?  
What value does `\(\alpha\)` usually have?

What is a `\(\text{p-value}\)`?  
What do we compare `\(\text{p-values}\)` to?

How do we decide if we reject or fail to reject a null hypothesis?

---
# Statistics review

Null Hypothesis = the baseline we're testing against.  
  * Usually has the `\(=\)` sign in it.  
  * i.e., not different

Alternative Hypothesis   
  * Does not have an `\(=\)` sign in it  
  * In this class, it will have the `\(\ne\)` sign in it.

Two-tailed hypotheses  
Null: `\(\large H_0 : \mu_1 = \mu_2\)`  
Alternative: `\(\large H_A : \mu_1 \ne \mu_2\)`  
* In this class, we will always be working with 2-tailed tests

---
# Statistics review  
`\(\alpha\)` = level of significance  
  * In this class, `\(\alpha = 0.05\)`.  
  
--

`\(\text{p-value}\)` = one of the outputs of most statistical tests  
* We compare the `\(\text{p-value}\)` against `\(\alpha = 0.05\)`

If `\(\text{p-value} < \alpha\)`, we **reject the null hypothesis**  
  + i.e., this usually means there *is a difference*

If `\(\text{p-value} > \alpha\)`, we **FAIL to reject the null hypothesis**  
  + i.e., we lack evidence of a difference (not proof that there *is NOT a difference*)  
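
---
# Statistics review  
The decision rule can be sketched as a small Python function. The function name and the p-values are hypothetical, chosen only for illustration. The slides do not say what to do when the p-value exactly equals `\(\alpha\)`; this sketch assumes that a tie counts as failing to reject.

```python
ALPHA = 0.05  # significance level used in this class


def decide(p_value, alpha=ALPHA):
    """Return the test decision for a p-value at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    # Assumption: a tie (p_value == alpha) counts as failing to reject
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


# Hypothetical p-values, not taken from any real analysis
for p in (0.03, 0.20):
    print(f"p = {p}: {decide(p)}")
```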
  
---
# Example

Is the average height of males and females in the US different?

* Estimated difference in heights = 5.9 inches

* `\(\text{p-value} = 0.001\)`  
--

What do we conclude?

---
# Example 2

Is the average height of males and females in the US different?

<img src="lecture_03_basic-stats_2025_files/figure-html/unnamed-chunk-4-1.png" width="864" style="display: block; margin: auto;" />
--

* Estimated difference in heights = 1.5 inches

* `\(\text{p-value} = 0.273\)`

What do we conclude?

---
# Looking ahead

#### **Wednesday**: Normal Distributions  
* Look over Normal Distribution Lecture notes before next class!

#### **Friday**: HW02

<br/>

#### **Reading**: Hector chp. 5

#### **Video Resources**: on D2L