Basic for loop
- Loops are the fundamental structure for repetition in programming
- for loops perform the same action for each item in a list of things
for (item in list_of_items) {
  do_something(item)
}
- To see an example of this, let’s calculate masses from volumes using a loop
- We need print() to display values inside a loop or function
volumes = c(1.6, 3, 8)
for (volume in volumes) {
  mass <- 2.65 * volume ^ 0.9
  print(mass)
}
- Code in the loop will run once for each value in volumes
- Everything between the curly brackets is executed each time through the loop
- The code takes the first value from volumes, assigns it to volume, does the calculation, and prints the result
- Then it takes the second value from volumes, assigns it to volume, does the calculation, and prints the result
- And so on
- So this loop does exactly the same thing as writing:
volume <- volumes[1]
mass <- 2.65 * volume ^ 0.9
print(mass)
volume <- volumes[2]
mass <- 2.65 * volume ^ 0.9
print(mass)
volume <- volumes[3]
mass <- 2.65 * volume ^ 0.9
print(mass)
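A quick way to confirm that the loop body runs once per value is to count the iterations. This is a small sketch; n_iterations is just an illustrative name. It also shows that after the loop finishes, volume still holds the last value it was given.

```r
volumes <- c(1.6, 3, 8)

# Add 1 to the counter each time the loop body runs
n_iterations <- 0
for (volume in volumes) {
  n_iterations <- n_iterations + 1
}

n_iterations  # 3: one run per value in volumes
volume        # 8: the loop variable keeps the last value it was assigned
```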
Do Tasks 1 & 2 in Basic For Loops.
Looping with an index & storing results
- R loops iterate over a series of values in a vector or other list-like object
- When we use each value directly, this is called looping by value
- But there is another way to loop, which is called looping by index
- Looping by index loops over a list of integer index values, typically starting at 1
- These integers are then used to access values in one or more vectors at the position indicated by the index
- If we modified our previous loop to use an index, it would look like this
- We often use i to stand for “index” as the variable we update with each step through the loop
volumes = c(1.6, 3, 8)
for (i ...)
- We then create a vector of position values starting at 1 (for the first value) and ending with the length of the object we are looping over
volumes = c(1.6, 3, 8)
for (i in 1:3)
- We don’t want to have to know the length of the vector, and it might change in the future, so we’ll look it up using the length() function
volumes = c(1.6, 3, 8)
for (i in 1:length(volumes)){
}
- Then inside the loop, instead of doing the calculation on the index (which is just a number between 1 and 3 in our case), we use square brackets and the index to get the appropriate value out of our vector
volumes = c(1.6, 3, 8)
for (i in 1:length(volumes)) {
  mass <- 2.65 * volumes[i] ^ 0.9
  print(mass)
}
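To check that looping by value and looping by index really give the same result, one option is to capture what each version prints and compare the text. A sketch using base R's capture.output(); the by_value and by_index names are illustrative:

```r
volumes <- c(1.6, 3, 8)

# Looping by value: volume takes each element in turn
by_value <- capture.output(
  for (volume in volumes) {
    print(2.65 * volume ^ 0.9)
  }
)

# Looping by index: i takes 1, 2, 3 and picks out each element
by_index <- capture.output(
  for (i in 1:length(volumes)) {
    print(2.65 * volumes[i] ^ 0.9)
  }
)

identical(by_value, by_index)  # TRUE: both print the same three lines
```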
- This gives us the same result, but it’s more complicated to understand
- So why would we loop by index?
- The advantage of looping by index is that it lets us do more complicated things
- One of the most common uses is storing the results we calculated in the loop
- To do this, we start by creating an empty object, the same length as the results will be, before the loop starts
- To store results in a vector, we use the vector() function to create an empty vector of the right length
- mode is the type of data we are going to store
- length is the length of the vector
masses <- vector(mode = "numeric", length = length(volumes))
masses
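vector() fills the new vector with a default value that depends on mode, which is why masses starts out as all zeros. A small sketch of the defaults, plus the shortcut functions numeric(), character(), integer() and logical(), which create the same kind of empty vector in one step:

```r
# Default fill values depend on the mode
empty_numeric <- vector(mode = "numeric", length = 3)
empty_character <- vector(mode = "character", length = 3)
empty_logical <- vector(mode = "logical", length = 3)
empty_numeric    # 0 0 0
empty_character  # "" "" ""
empty_logical    # FALSE FALSE FALSE

# The shortcut gives an identical result
identical(empty_numeric, numeric(3))  # TRUE
```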
- Then add each result in the right position in this vector
- For each trip through the loop, put the output into the empty vector at the ith position
for (i in 1:length(volumes)) {
  mass <- 2.65 * volumes[i] ^ 0.9
  masses[i] <- mass
}
masses
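Because this particular calculation uses only arithmetic, which R applies element by element to whole vectors, the stored results can be checked against a one-line vectorized version. A sketch; masses_vectorized is an illustrative name:

```r
volumes <- c(1.6, 3, 8)

# Loop version: fill a preallocated vector one position at a time
masses <- vector(mode = "numeric", length = length(volumes))
for (i in 1:length(volumes)) {
  masses[i] <- 2.65 * volumes[i] ^ 0.9
}

# Vectorized version: ^ and * work on every element at once
masses_vectorized <- 2.65 * volumes ^ 0.9
all.equal(masses, masses_vectorized)  # TRUE
```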
- Walk through iteration in debugger
Do Tasks 3-4 in Basic For Loops.
End of 1 hour class
Looping over multiple values
- Looping with an index also allows us to access values from multiple vectors
as <- c(2.65, 1.28, 3.29)
bs <- c(0.9, 1.1, 1.2)
volumes = c(1.6, 3, 8)
masses <- vector(mode="numeric", length=length(volumes))
for (i in 1:length(volumes)) {
  mass <- as[i] * volumes[i] ^ bs[i]
  masses[i] <- mass
}
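When several vectors share one index, they all need to be the same length. Indexing past the end of a shorter vector silently returns NA rather than raising an error. One defensive option, sketched here, is to check the lengths with stopifnot() before the loop:

```r
as <- c(2.65, 1.28, 3.29)
bs <- c(0.9, 1.1, 1.2)
volumes <- c(1.6, 3, 8)

# Stop with an error if the vectors don't line up
stopifnot(length(as) == length(volumes), length(bs) == length(volumes))

masses <- vector(mode = "numeric", length = length(volumes))
for (i in 1:length(volumes)) {
  # The same i picks matching values from all three vectors
  masses[i] <- as[i] * volumes[i] ^ bs[i]
}
masses
```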
Do Task 5 in Basic For Loops.
Looping with functions
- It is common to combine loops with functions by calling one or more functions as a step in our loop
- For example, let’s take the non-vectorized version of our est_mass function that returns an estimated mass if the volume > 5 and NA if it’s not
est_mass <- function(volume, a, b){
  if (volume > 5) {
    mass <- a * volume ^ b
  } else {
    mass <- NA
  }
  return(mass)
}
- We can’t pass the whole vector to the function and get back a vector of results, because the if statement only checks a single TRUE or FALSE value
- So let’s loop over the values
- First we’ll create an empty vector to store the results
- And then loop by index, calling the function for each value of volumes
masses <- vector(mode = "numeric", length = length(volumes))
for (i in 1:length(volumes)) {
  mass <- est_mass(volumes[i], as[i], bs[i])
  masses[i] <- mass
}
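A common mistake is writing length(volumes) instead of 1:length(volumes) in the loop header. length(volumes) is the single number 3, so that loop runs only once, with i equal to 3. A sketch that records which indices each version visits (visited_wrong and visited_right are illustrative names; growing them with c() is fine for a tiny demonstration like this):

```r
volumes <- c(1.6, 3, 8)

# length(volumes) is just 3, so only the last position is visited
visited_wrong <- c()
for (i in length(volumes)) {
  visited_wrong <- c(visited_wrong, i)
}

# 1:length(volumes) is 1 2 3, so every position is visited
visited_right <- c()
for (i in 1:length(volumes)) {
  visited_right <- c(visited_right, i)
}

visited_wrong  # 3
visited_right  # 1 2 3
```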
- This is the for loop equivalent of an mapply statement
masses_apply <- mapply(est_mass, volumes, as, bs)
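A self-contained sketch putting the loop and mapply() side by side. mapply() calls est_mass() on the first elements of volumes, as and bs, then the second elements, and so on, and simplifies the results into a vector. Volumes of 5 or less give NA, so only the last value gets an estimate:

```r
est_mass <- function(volume, a, b){
  if (volume > 5) {
    mass <- a * volume ^ b
  } else {
    mass <- NA
  }
  return(mass)
}
as <- c(2.65, 1.28, 3.29)
bs <- c(0.9, 1.1, 1.2)
volumes <- c(1.6, 3, 8)

# Loop version
masses <- vector(mode = "numeric", length = length(volumes))
for (i in 1:length(volumes)) {
  masses[i] <- est_mass(volumes[i], as[i], bs[i])
}

# mapply version: arguments are matched to volume, a, b by position
masses_apply <- mapply(est_mass, volumes, as, bs)

masses  # NA NA, then the estimate for a volume of 8
all.equal(masses, masses_apply)  # TRUE
```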
Looping over files
- Repeat same actions on many similar files
- Let’s download some simulated satellite collar data
download.file("http://www.datacarpentry.org/semester-biology/data/locations.zip",
"locations.zip")
unzip("locations.zip")
- Now we need to get the names of each of the files we want to loop over
- We do this using list.files()
- If we run it without arguments, it will give us the names of all files in the working directory
list.files()
- But we just want the data files, so we’ll add the optional pattern argument to only get the files whose names match "locations-"
data_files = list.files(pattern = "locations-")
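The pattern argument is treated as a regular expression, so "locations-" matches anywhere in a file name. Adding ^ restricts the match to names that start with "locations-". A sketch using a temporary folder with made-up file names (not the collar data):

```r
# Create a throwaway folder holding a few empty files
demo_dir <- tempfile("list_files_demo")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c("locations-1.txt", "locations-2.txt",
                                            "old-locations-3.txt", "notes.txt"))))

# Matches "locations-" anywhere in the name
anywhere <- list.files(demo_dir, pattern = "locations-")
# ^ anchors the match to the start of the name
at_start <- list.files(demo_dir, pattern = "^locations-")

anywhere  # "locations-1.txt" "locations-2.txt" "old-locations-3.txt"
at_start  # "locations-1.txt" "locations-2.txt"
```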
- Once we have this list, we can loop over it to count the number of observations in each file
- First create an empty vector to store those counts
n_files = length(data_files)
results <- integer(n_files)
- Then write our loop
for (i in 1:n_files) {
  filename <- data_files[i]
  data <- read.csv(filename)
  count <- nrow(data)
  results[i] <- count
}
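The same loop can be tried without downloading anything by writing a couple of small CSV files to a temporary folder. A sketch; the file contents are made up, and full.names = TRUE makes list.files() return full paths so read.csv() can find the files from any working directory:

```r
# Hypothetical stand-in for the collar files
demo_dir <- tempfile("collar_demo")
dir.create(demo_dir)
write.csv(data.frame(lat = c(26.1, 26.4, 26.2)),
          file.path(demo_dir, "locations-a.csv"), row.names = FALSE)
write.csv(data.frame(lat = c(25.9, 26.0, 26.3, 26.5, 26.7)),
          file.path(demo_dir, "locations-b.csv"), row.names = FALSE)

data_files <- list.files(demo_dir, pattern = "^locations-", full.names = TRUE)
n_files <- length(data_files)
results <- integer(n_files)
for (i in 1:n_files) {
  # Read each file and store its number of rows
  data <- read.csv(data_files[i])
  results[i] <- nrow(data)
}
results  # 3 5
```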
Do Task 1 of Multiple-file Analysis. The exercise uses different collar data.
Storing loop results in a data frame
- We often want to calculate multiple pieces of information in a loop, making it useful to store results in things other than vectors
- We can store them in a data frame instead by creating an empty data frame and storing the results in the ith row of the appropriate column
- Associate the file name with the count
- Also store the minimum latitude
- Start by creating an empty data frame
- Use the data.frame function
- Provide one argument for each column
- “Column Name” = “an empty vector of the correct type”
results <- data.frame(file_name = character(n_files),
                      count = integer(n_files),
                      min_lat = numeric(n_files))
- Now let’s modify our loop from last time
- Instead of storing count in results[i], we need to first specify the count column using $: results$count[i]
- We also want to store the filename, which is data_files[i]
for (i in 1:n_files) {
  filename <- data_files[i]
  data <- read.csv(filename)
  count <- nrow(data)
  min_lat <- min(data$lat)
  results$file_name[i] <- filename
  results$count[i] <- count
  results$min_lat[i] <- min_lat
}
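A sketch of filling the results data frame row by row, using a hypothetical named list of small data frames (collar_data, with made-up latitudes) in place of the files so it runs on its own:

```r
# Hypothetical stand-in for the files: names play the role of file names
collar_data <- list(
  "locations-a.csv" = data.frame(lat = c(26.1, 26.4, 26.2)),
  "locations-b.csv" = data.frame(lat = c(25.9, 26.0, 26.3, 26.5, 26.7))
)
data_files <- names(collar_data)
n_files <- length(data_files)

# One empty column per piece of information we want to keep
results <- data.frame(file_name = character(n_files),
                      count = integer(n_files),
                      min_lat = numeric(n_files))
for (i in 1:n_files) {
  data <- collar_data[[data_files[i]]]
  # Row i of each column gets the value for the ith data set
  results$file_name[i] <- data_files[i]
  results$count[i] <- nrow(data)
  results$min_lat[i] <- min(data$lat)
}
results
```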
Do Task 2 of Multiple-file Analysis. The exercise uses different collar data.
Subsetting Data (optional)
- Loops can subset in ways that are difficult with things like group_by
- Look at some data on trees from the National Ecological Observatory Network
library(ggplot2)
library(dplyr)
neon_trees <- read.csv('data/HARV_034subplt.csv')
ggplot(neon_trees, aes(x = easting, y = northing)) +
  geom_point()
- Look at a north-south gradient in number of trees
- Need to know number of trees in each band of y values
- Start by defining the size of the window we want to use
- Use the grid lines which are 2.5 m
window_size <- 2.5
- Then figure out the edges for each window
south_edges <- seq(4713095, 4713117.5, by = window_size)
north_edges <- south_edges + window_size
- But we don’t want the last south edge to be the far edge itself, because that window would extend past it, so we stop one window size short
south_edges <- seq(4713095, 4713117.5 - window_size, by = window_size)
north_edges <- south_edges + window_size
- Set up an empty vector to store the output
counts <- vector(mode = "numeric", length = length(south_edges))
- Loop over the south edges and subset the data occurring within each window
for (i in 1:length(south_edges)) {
  data_in_window <- filter(neon_trees,
                           northing >= south_edges[i], northing < north_edges[i])
  counts[i] <- nrow(data_in_window)
}
counts
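The same counts can be computed in base R without dplyr: the comparison gives TRUE or FALSE for each tree, and sum() counts the TRUE values, which replaces the filter() and nrow() pair. A sketch with a handful of made-up northing values standing in for neon_trees$northing:

```r
# Hypothetical northings, not the NEON data
northing <- c(4713095.4, 4713096.1, 4713098.0, 4713101.7, 4713101.9, 4713116.0)

window_size <- 2.5
south_edges <- seq(4713095, 4713117.5 - window_size, by = window_size)
north_edges <- south_edges + window_size

counts <- vector(mode = "numeric", length = length(south_edges))
for (i in 1:length(south_edges)) {
  # TRUE for each tree inside the current band
  in_window <- northing >= south_edges[i] & northing < north_edges[i]
  counts[i] <- sum(in_window)
}
counts  # 2 1 2 0 0 0 0 0 1
```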
Nested Loops (optional)
- Sometimes we need to loop over multiple things in a coordinated fashion
- Pass a window over some spatial data
- Look at the full spatial pattern, not just the north-south gradient
- Basic nested loops work by putting one loop inside another one
for (i in 1:10) {
  for (j in 1:5) {
    print(paste("i = ", i, "; j = ", j))
  }
}
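To see the order in which a nested loop visits its values, this sketch records each (i, j) pair. The inner loop runs all the way through for every value of the outer loop, so j changes fastest and the body runs 10 * 5 = 50 times (n_runs and pairs are illustrative names):

```r
n_runs <- 0
pairs <- character(0)
for (i in 1:10) {
  for (j in 1:5) {
    # Count each pass and remember which pair it was
    n_runs <- n_runs + 1
    pairs <- c(pairs, paste0(i, "-", j))
  }
}
n_runs          # 50
head(pairs, 6)  # "1-1" "1-2" "1-3" "1-4" "1-5" "2-1"
```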
- Loop over x and y coordinates to create boxes
- Need west and east edges
west_edges <- seq(731752.5, 731772.5 - window_size, by = window_size)
east_edges <- west_edges + window_size
- Redefine our storage
output <- matrix(nrow = length(south_edges), ncol = length(west_edges))
for (i in 1:length(south_edges)) {
  for (j in 1:length(west_edges)) {
    data_in_window <- filter(neon_trees,
                             northing >= south_edges[i], northing < north_edges[i],
                             easting >= west_edges[j], easting < east_edges[j])
    output[i, j] <- nrow(data_in_window)
  }
}
output
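A base R sketch of the nested window count, using a few made-up tree coordinates in place of neon_trees. Because easting increases toward the east, the lower easting boundary of each box is named west_edges here. Every tree falls in exactly one box, so the counts should add up to the number of trees:

```r
# Hypothetical tree coordinates, not the NEON data
trees <- data.frame(
  easting  = c(731753.0, 731754.9, 731760.2, 731771.0),
  northing = c(4713095.4, 4713096.1, 4713101.7, 4713116.0)
)
window_size <- 2.5
south_edges <- seq(4713095, 4713117.5 - window_size, by = window_size)
north_edges <- south_edges + window_size
west_edges <- seq(731752.5, 731772.5 - window_size, by = window_size)
east_edges <- west_edges + window_size

# Rows are north-south bands, columns are east-west bands
output <- matrix(nrow = length(south_edges), ncol = length(west_edges))
for (i in 1:length(south_edges)) {
  for (j in 1:length(west_edges)) {
    in_box <- trees$northing >= south_edges[i] & trees$northing < north_edges[i] &
      trees$easting >= west_edges[j] & trees$easting < east_edges[j]
    output[i, j] <- sum(in_box)
  }
}
sum(output)  # 4, the same as nrow(trees)
```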
Sequence along (optional)
- seq_along(volumes) generates a vector of numbers from 1 to length(volumes), so for (i in seq_along(volumes)) can be used in place of for (i in 1:length(volumes))
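seq_along() matters most when the vector might be empty: 1:length(x) then becomes 1:0, which counts down and would run the loop twice, while seq_along(x) gives a zero-length sequence. A sketch (no_volumes and n_runs are illustrative names):

```r
volumes <- c(1.6, 3, 8)
seq_along(volumes)      # 1 2 3, the same as 1:length(volumes)

no_volumes <- numeric(0)
1:length(no_volumes)    # 1 0: counts down from 1 to 0
seq_along(no_volumes)   # integer(0): nothing to loop over

# A loop over seq_along() of an empty vector never runs its body
n_runs <- 0
for (i in seq_along(no_volumes)) {
  n_runs <- n_runs + 1
}
n_runs                  # 0
```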