You will be working with an untidy data set. Your goal is to enter the data into a tidy format in spreadsheet software, save it as a .csv file, import it into R and run some simple calculations.
The data you will be working with is a simplified version of the
who2
data set. The who2
data is automatically
loaded with the tidyverse
, and documents Tuberculosis cases
by country through time.
The simplified data set has information on “smear positive” TB tests from Argentina for 2000 to 2010. “Smear positive” is abbreviated as sp and is one of the testing methodologies available for detecting TB cases.
Students in class will receive a print out of the data set, and the full data can be viewed below:
tb_argentina
This week, most of your assignment will take place outside of an R
script. You will need to make a new csv file (i.e., using Microsoft
Excel, or Google Sheets) which has all the tb_argentina
information stored in a tidy format.
The second half of your assignment will be an R script that imports your tidy csv and performs a few basic calculations.
Two files will need to be submitted on D2L this week.
1) A tidy .csv file with the following naming convention:
week_09_tidy_yourlastname.csv
2) An R script completing the second half of the assignment.
A. Create a tidy .csv file with the TB information from Argentina.
Column names have data in them.
sp
indicates the test (same for all
columns)
Patient sex is indicated with an m
or an
f
Patient age range is indicated by numbers.
014
is 0-14; 1524
is 15-24;
65
is \(\ge\) 65.Your tidy data file should end up with 154 rows and 6-7 columns
(depending on if you keep the sp
or not).
Save your .csv file in your data/
folder.
Make sure your lastname is in your filename so I can keep them straight when I grade it.
i.e., week_09_tidy_yourlastname.csv
tidyverse
A. Print the head()
of your tidy data.
B. Print out the dim()
of your tidy data.
C. Calculate the rate per 1000 people (number of cases / population *
1000) of TB infections. D. Calculate the total number of cases per
year.
E. Plot the total number of cases by year.
F. Use lm()
to perform a t-test comparing the average
number of TB-infections based on sex (i.e., count ~ sex
).
Display the summary()
of the model to your screen.
G. Interpret the results of your model in part F above.