Overview

This week, you will be introduced to R Markdown. You will follow some self-guided reading and will submit an R Markdown file and the rendered HTML document for Homework 7 (due at the end of next week).

What is Markdown?

Before we discuss what R Markdown is, we need to discuss what Markdown is. What is Markdown? Let’s start with what it’s not.

Many of you have probably created a report or a paper using a word processor like Microsoft Word or Google Docs. Word processors are referred to as “what you see is what you get” (WYSIWYG) text editors. This means that when you highlight text and click the boldface icon in Word, the text appears bold on your screen. All sorts of other formatting options, including making headers, inserting figures, adding page numbers, etc., are possible by clicking on buttons. There is code behind the scenes that creates these changes, but users don’t see the code, only the formatting output. This makes WYSIWYG editors relatively easy to use for beginners. But for more advanced users, it can actually be problematic. Have you ever had Word act in ways that you don’t fully understand? Of course! We all have. Have you ever tried opening a .docx file using an older version of Word, only to find that it doesn’t look the way you thought it would? Have you ever inserted a figure only to have it jump to another page or get ‘anchored’ to the bottom of a page? These are just a few of the problems that occur when your document has a bunch of hidden formatting code that you cannot see or understand.

Markdown is different. Markdown files are plain text files, meaning that they can be created and edited using text editors (like Notepad on Windows or TextEdit on Mac). The biggest difference between Markdown files and Word documents is that formatting Markdown documents occurs in the document itself rather than behind the scenes. So to make something boldface you have to tell Markdown to do that by putting two **asterisks** on either side of the word or phrase. Italics are created by putting one *asterisk* on either side of the text. Hyperlinks are written like this: [Hyperlinks](https://en.wikipedia.org/wiki/Markdown). These are just a few of the many formatting options you can include in a Markdown document. We’ll learn about options like headers, lists, mathematical symbols and equations, and figures later in this tutorial and throughout the semester.

As you’re writing, the text won’t look bold or italic or whatever (this is not ‘what you see is what you get’, it’s ‘what you see is what you type’). The formatting only shows up when you render the Markdown file to create another type of document (pdf, html, even Word). The nice thing about Markdown is that because it uses standard ways to express specific formatting options, you can convert your documents into different output formats very easily.

What is R Markdown?

In this course, we will use a specific ‘flavor’ of Markdown called ‘R Markdown’. R Markdown gives us all of the formatting options available for Markdown plus the ability to embed, display, and run R code in our documents. By mixing R code with plain text, we can create dynamic reports that replicate the analytical processes, show the code underlying these processes, create the output from our analysis (figures, summary statistics, etc.), and provide all of the necessary text explanations to go along with the code and output.

This will be particularly important for assignments in the remainder of this class. As we fit more advanced statistical models, it will be necessary to interpret the results. R Markdown will allow us to run the code for the models in the document and include text for interpretation. This will be much easier to read than including long comments in our scripts. Likewise, being able to include things like italics, boldface, and sub- and superscripts will make interpretation easier and the presentation much nicer.

You may have figured this out by now, but if you haven’t: everything in this class so far has been produced using R Markdown. You can see the source material for this course at my GitHub Page. GitHub is a place where you can store your R projects (and other coding projects). A detailed description of Git and GitHub is beyond the scope of this course, but if you are going to grad school or plan to get a job that involves lots of coding and collaboration, I highly recommend you spend the time to learn Git on your own.

Why use R Markdown?

R Markdown has many advantages compared to creating reports in Word or Google Docs. These advantages include:

  1. Versatility - Want to convert a Word document into PDF? That’s not too hard. But PDF to Word? That’s a pain. PDF to HTML? Maybe you know how to do that but I don’t. With R Markdown, we can change between these formats (or produce multiple outputs at once) with a few lines of code. You can even convert them into pretty nice slide shows.

  2. Embed code in text - After running an analysis, how do you get your results into Word? Type them by hand? Copy-and-paste? Both are a pain and error prone. Rerun your analysis using new data? Oops, now you have to copy and paste those new results and figures. With R Markdown, we embed code directly into the text so results and figures get added to our reports automatically. That means no copying and pasting, and updating reports as new results come in is pain free. I cannot emphasize enough how convenient this is. This will certainly save you time in the future.

  3. Annotate your code - Using the # is great for adding small annotations to your R scripts and you should definitely get in the habit of doing that. But sometimes you need to add a lot of details to help other users (or your future self) make sense of complex code. For instance, in future assignments in this class, interpreting the results of statistical analyses will become more detailed and complex. Writing this in R Markdown will be much easier, and reading the HTML document will be easier for me to grade! R Markdown allows you to create documents with as much text and formatting as you need, along with the code.

  4. Version control - Tired of saving manuscript_v1.doc, manuscript_v2.doc, manuscript_final.doc, manuscript_final_v2.doc? Then version control is for you. We won’t go into the specifics here, but R Markdown allows you to seamlessly use version control systems like Git and GitHub to document changes to your reports.

  5. Edit as text files - R Markdown files are most easily created and edited within RStudio, but you don’t have to do it that way. They can be opened and edited in base R and even using text editors. This means you can create and edit them on any platform (Windows, Mac, Linux) using free software that is already installed on the computer.

  6. Stability - How many of us have had Word crash while we’re working on a paper? Did you save as you were working? Hope so. Because R Markdown files are smaller and more lightweight, they tend not to cause your computer to crash as you’re working on them.

  7. Focus on text, not formatting - Do you spend a lot of time tweaking the formatting of your Word document rather than writing? R Markdown allows you to separate the writing process from the formatting process, which allows you to focus on the former without worrying about the latter (in theory at least). Plus there are lots of templates you can use to ensure that the formatting is taken care of without you having to do anything special!
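As a rough illustration of the first point, the sketch below renders one source file to two formats from the R console. It assumes the rmarkdown package and pandoc are installed (the render step is skipped otherwise). The file name example_report.Rmd and its contents are made up for this example.

```r
# Write a small, hypothetical R Markdown file to a temporary folder
rmd_file <- file.path(tempdir(), "example_report.Rmd")
writeLines(c(
  "---",
  "title: \"Example report\"",
  "output: html_document",
  "---",
  "",
  "Some **bold** text and some *italic* text."
), rmd_file)

# Output formats to produce from the same source file
formats <- c("html_document", "word_document")

# Only render if the rmarkdown package and pandoc are available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rendered <- vapply(formats, function(fmt) {
    rmarkdown::render(rmd_file, output_format = fmt, quiet = TRUE)
  }, character(1))
  print(basename(rendered))  # names of the files that were created
}
```

The same source file produces both outputs; only the output_format argument changes.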

Why not use R Markdown?

There are a few disadvantages to R Markdown.

  1. Your adviser/boss/professor/colleague doesn’t use it - Try sending a .Rmd file to your adviser to get feedback. I’ll wait… Like it or not, most folks still use word processors, so if you adopt R Markdown you will still have to create and edit Word documents for some collaborators who are stuck in their ways. (Or do what I do: Write it in R Markdown and knit it as a Word document.)

  2. No track changes - Even if you’re lucky to have an adviser who will review a .Rmd file, you won’t get nice track changes like in Word. There are alternatives to this (version control helps) but none are quite as easy as track changes.

  3. Fewer formatting options - For better or worse, you have a more limited set of formatting options with R Markdown. That can be constraining (but often it’s actually freeing!)

  4. There’s a learning curve - You already know how to use Word. R Markdown is new. How do you make something bold? How do you insert equations? How do you get figures to go at the end of your document? At first, you will almost certainly have to Google almost everything you need to do in R Markdown (this is why number 1 is a problem). Most of it is pretty simple but it still means the going can be slow at first. Also, I recommend downloading the two R Markdown cheat sheets available in RStudio (Help -> Cheatsheets) into your other_resources\cheatsheets folder!

Creating a new R Markdown file

  1. Click on File -> New File -> R Markdown...

  2. Enter a title and choose a format (HTML, pdf, or Word) for the document (for this class, I recommend sticking with HTML; other formats require additional package downloads and can be a little finicky to get set up).

  3. Click OK

  4. Save the newly created document

Pretty easy

Basic formatting: The YAML header

At the top of your .Rmd file, you should see several lines in between two rows of three blue dashes:

---
title: "The title I entered"
author: "The author I entered"
output: html_document
---

This is called the “YAML header” and it’s where we can control a lot of the major formatting options for our documents. For example, to change the output to PDF, just switch html_document to pdf_document and then click the Knit button again. (Note that you may need to install a LaTeX distribution to knit to PDF. If you get an error message at this step, see suggestions here.)

There are lots of formatting options you can include in the YAML header, most of which are beyond the scope of this course. For this class we’re going to stick with HTML output, but we will include a table of contents by adding the following code:

---
title: "The title I entered"
author: "The author I entered"
output: 
  html_document:
    toc: true
    toc_depth: 3
---

Notice that the output section has changed and there are now additional arguments included which are indented.

  • toc: true means that we will include a table of contents

  • toc_depth: 3 means that we will include an entry for all headers from level 1 to 3.

Headers

Using headers is a natural way to break up a document or report into smaller sections. You can include headers by putting one or more # signs in front of text. One # is a main header, ## is the secondary header, etc.

# Header 1

## Header 2

### Header 3

#### Header 4
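To see how the table of contents options interact with header levels, one could knit a small test file like the one sketched below. The file name toc_example.Rmd is hypothetical, and the rendering step assumes the rmarkdown package and pandoc are available. With toc_depth: 3, the level 4 header should appear in the document body but not in the table of contents.

```r
# Contents of a small test document: YAML header plus four header levels
rmd_lines <- c(
  "---",
  "title: \"The title I entered\"",
  "author: \"The author I entered\"",
  "output:",
  "  html_document:",
  "    toc: true",
  "    toc_depth: 3",
  "---",
  "",
  "# Header 1",
  "",
  "## Header 2",
  "",
  "### Header 3",
  "",
  "#### Header 4"
)

# Save it to a temporary folder (hypothetical file name)
toc_file <- file.path(tempdir(), "toc_example.Rmd")
writeLines(rmd_lines, toc_file)

# The header level is the number of leading # signs
header_lines <- grep("^#+ ", rmd_lines, value = TRUE)
header_levels <- nchar(sub(" .*$", "", header_lines))

# Headers at levels 1 to 3, i.e. the ones toc_depth: 3 would list
header_lines[header_levels <= 3]

# Knit the file only if rmarkdown and pandoc are available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_file <- rmarkdown::render(toc_file, quiet = TRUE)
}
```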

Paragraph and line breaks

When writing chunks of text in R Markdown (e.g., a report or manuscript), you can create new paragraphs by leaving an empty line between each paragraph:

This is one paragraph.

This is the next paragraph

Font formats

Bold, Italics, sub- and superscripts

As mentioned earlier, create boldface by surrounding text with two asterisks (**bold**) and use single asterisks for italics (*italics*). You can also enter strikethrough text with ~~strikethrough~~.

You can also add subscripts, such as H₂O (H~2~O), and superscripts, such as x¹⁰ (x^10^). Note that the code for sub- and superscripts in equations is slightly different; see below.

Highlight Code in Text

To highlight code, for example mean(), surround the text with backticks: `mean()`. Note that this does not actually insert functioning code; it just formats the text to show that it is code rather than plain text.

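You can include multiple lines of code by putting three backticks (```) on the line before the code and three backticks on the line after the code. Adding {r} after the opening backticks turns the block into an R code chunk that is run when the document is rendered (see Code chunks below):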

Multiple lines of code
look like 
this

Lists

Bulleted lists

  • Bulleted lists can be included by starting a line with an asterisk

  • You can also start the lines with a single dash -

    • for sub-bullets, indent the line (press tab once) and start it with +

      • for sub-sub-bullets, indent twice (press tab two times) and start with -

* Bullet

  + Sub-bullets

Numbered lists

  1. Numbered lists look like this

  2. You can also include sub-levels in numbered lists

    i. these can be lowercase Roman numerals
    a. or lowercase letters
      B. or uppercase letters

1. Numbered list
2. Here is another point
  i) and some sub points

Insert hyperlinks by putting the text you want displayed in square brackets followed by the link in parentheses: [RStudio cheatsheet](https://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf)

Equations

Inserting equations in R Markdown is where knowing some LaTeX really comes in handy because equations are written using LaTeX code. For the most part, this is not too difficult, but if you need to insert complex equations you will probably need to look up the code for some symbols. There are many good resources if you search for “latex equations” or something similar. One of my favorite resources is Math in R Markdown.

Inline vs. block equations

You can include equations either inline (\(e = mc^2\)) or as a stand-alone block:

\[e=mc^2\]

Inline equations are added by putting a single dollar sign $ on either side of the equation ($e=mc^2$). Equation blocks are created by starting and ending a new line with double dollar signs:

$$e=mc^2$$

Fractions

For simple fractions, such as \(SEM = \sigma / \sqrt{n}\), you can just type a /. For example: $SEM = \sigma / \sqrt{n}$.

For more complex fractions, write the numerator and denominator using \frac{ top }{ bottom }.

For instance, to write the equation for Bayes’ theorem:

\[ P(A|B) = \frac{ P(B|A)\times P(A) } { P(B) }\]

you would type the following: $$ P(A|B) = \frac{ P(B|A)\times P(A) }{ P(B) }$$

Greek letters

Statistical models include a lot of Greek letters (\(\alpha, \beta, \gamma\), etc.). You can add Greek letters to an equation by typing a backslash \ followed by the name of the letter (e.g., \alpha). Uppercase and lowercase letters are produced by capitalizing the name (\(\Delta\) = $\Delta$) or not (\(\delta\) = $\delta$).

You can also add Greek letters directly inline with text like so: here is my line of text with the letter $\alpha$.

Equation Subscripts and superscripts

You can add superscripts using the ^ (\(\pi r^2\)=$\pi r^2$) symbol and subscripts using an underscore _ (\(N_t\) = $N_t$). Note that only one ^ or _ is needed at the beginning. Look again at adding sub- and superscripts to normal text above to see the differences.

If the superscript or subscript in an equation includes more than one character, put the entire script within curly brackets {}: \(N_t-1 \neq N_{t-1}\) is $N_t-1 \neq N_{t-1}$
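Putting these pieces together, here is a sketch of how two formulas that appear in this tutorial could be typed as equation blocks. The first is the standard error of the mean written with \frac and \sqrt. The second is the abundance estimate from the code chunk example later in this tutorial. The hat on N (\hat{N}) is an assumed notation for an estimate and is not used elsewhere in this document.

```latex
$$SEM = \frac{\sigma}{\sqrt{n}}$$

$$\hat{N} = \frac{n_1 \times n_2}{m_2}$$
```

Note that the single-character subscripts (n_1, n_2, m_2) do not need curly brackets, while the contents of \frac and \sqrt do.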

Adding functioning code

The ability to format and create PDF and HTML documents is great, but the real strength of R Markdown is the ability to include and run code within your document. Code can be included inline or in chunks.

Inline code

Inline code is useful for including (simple) R output directly in the text. Inline code can be added by enclosing R code between `r and `. For example, typing `r mean(c(3,7,4,7,9))` will compute and print the mean of the given vector. That is, it will print 6 instead of the code itself. This can be very useful for including summary statistics in reports.

For example, if we have a vector indicating the number of individuals captured at each occasion during a mark-recapture study (e.g., n <- c(155, 132, 147, 163)) and we want to include the number of occasions in a report, instead of typing 4, we can type `r length(n)`. Not only does this prevent typos, it is extremely useful if length(n) might change in the future. Instead of manually changing the number of occasions, we just re-render the document and the new number of occasions will be printed automatically.
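The values that these inline expressions insert can be checked in the R console first. The sketch below reuses the vectors from the text; the sentence wording and the fifth capture count (150) are made up purely to show how the number updates.

```r
# What `r mean(c(3,7,4,7,9))` evaluates to
mean(c(3, 7, 4, 7, 9))
# 6

# Number of individuals captured at each occasion (from the text)
n <- c(155, 132, 147, 163)

# What `r length(n)` evaluates to
occasions <- length(n)
occasions
# 4

# Roughly what a sentence with inline code would look like once rendered
report_sentence <- paste("Individuals were captured on", occasions, "occasions.")
report_sentence

# Hypothetical fifth occasion: the inline value changes without editing the text
n <- c(n, 150)
length(n)
# 5
```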

Code chunks

For more complicated code, it is generally more useful to use chunks than inline code. Chunks start on a separate line with ```{r} and end with ``` on its own line (instead of doing this manually, you can click the Insert button at the top right of the script window, then click R). In between these two lines, you can include as many lines of code as you want. For example,

```{r}
n1 <- 44 # Number of individuals captured on first occasion
n2 <- 32 # Number of individuals captured on second occasion
m2 <- 15 # Number of previously marked individuals captured on second occasion
N <- n1 * n2 / m2 # Lincoln-Petersen estimate of abundance
```
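Running the chunk above stores the estimate in N but prints nothing, because every line is an assignment. A small sketch of how the value could be displayed, or rounded for use in a sentence, follows. Rounding to a whole number is an assumption made for illustration, not part of the original example.

```r
n1 <- 44 # Number of individuals captured on first occasion
n2 <- 32 # Number of individuals captured on second occasion
m2 <- 15 # Number of previously marked individuals captured on second occasion
N <- n1 * n2 / m2 # Estimate of abundance

# Typing the object name on its own line prints the value in the output
N
# 93.86667

# Rounded to a whole number of individuals, e.g. for inline use as `r round(N)`
round(N)
# 94
```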

Chunk options

Code chunks can take a lot of options to control how the code is run and what is displayed in the documents. These options go after {r and before the closing } (to see all the options put your cursor after the {r, hit the space bar, then hit tab). For example:

  • echo = FALSE shows the output of the code but not the code itself

  • include = FALSE runs the code but does not display the code or the output (useful for chunks that read or format data)

  • eval = FALSE shows the code but does not run it (useful for showing code)

  • warning = FALSE and message = FALSE can be included to ensure that warnings and messages are not printed, which can be useful for cleaning up the appearance of documents

  • cache = TRUE saves the results of the R code and doesn’t rerun the chunk unless the code is changed (useful for chunks that take a long time to run)
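Rather than repeating the same options in every chunk, they can also be set once for the whole document with knitr::opts_chunk$set(), usually in a chunk near the top of the file. The sketch below assumes the knitr package is installed (it is installed along with rmarkdown). The particular options chosen are only an example.

```r
# Set default options for all chunks that follow in the document
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(
    echo = FALSE,     # hide code, show output
    warning = FALSE,  # do not print warnings
    message = FALSE   # do not print messages
  )

  # Check the current default for one option
  print(knitr::opts_chunk$get("echo"))
}
```

Options written in an individual chunk header still take precedence over these defaults, so a single chunk can use echo = TRUE to show its code.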

Additional resources

All of the information necessary to complete the homework assignment should be available in this document. For further information, you can visit the following:

From the RStudio tool bar, click Help -> Cheatsheets and then select the R Markdown cheat sheet (lots of other good cheat sheets there as well)

The chapter on R Markdown in the R4DS book

RStudio’s R Markdown tutorial

Tom Edward’s R Markdown tutorial

Coding Club’s Getting Started with R Markdown

Cosma Shalizi’s Using R Markdown for Class Reports