Workshop 4. Introduction to R
Reasons why you should learn R
Runtime: ~7 min. Created by Gracia Bonilla
RStudio interface
Here is an introduction to the main features of RStudio, the user-friendly IDE (integrated development environment) preferred by many researchers.
This video introduces the help operator `?`. This is a very useful feature of R because it lets you search the documentation pages for R functions and other objects. The video also shows how to assign values to variables with the `<-` operator, and it introduces the `setwd()` function, which changes the working directory.
Runtime: ~10 min. Created by Katherine Norwood
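The commands from this video can be sketched as follows. This is a minimal example and not the video's exact code. `tempdir()` stands in for a real project folder only so that the snippet runs anywhere, and `"~/my_project"` is a hypothetical path. `?mean` appears as a comment because it opens the help viewer in an interactive session.

```r
# Assign values to variables with the <- operator
x <- 42
greeting <- "hello"

# In an interactive session, ?mean (equivalent to help("mean"))
# opens the documentation page for the mean() function.

# Inspect the current working directory
old_wd <- getwd()

# Change it. tempdir() is used here only because it always exists;
# in practice you would pass your own path, e.g. setwd("~/my_project") (hypothetical).
setwd(tempdir())
new_wd <- getwd()

# Restore the original working directory
setwd(old_wd)
```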
Installing packages
As mentioned earlier, one of R's strengths is the large number of existing packages for analyzing your data. Installing packages is therefore an essential task when working with R, because it lets you bring sophisticated analyses into your work with little effort.
This video introduces the `install.packages()` function and covers the steps required to install a package from the Bioconductor repository.
Runtime: ~7 min. Created by Katherine Norwood
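The sketch below installs packages only when they are missing. `install_cran_if_missing()` and `install_bioc_if_missing()` are hypothetical helper names. The Bioconductor helper uses the `BiocManager` package, which is the installer the Bioconductor project currently recommends. That approach may differ from the one shown in the video.

```r
# Hypothetical helper: install a CRAN package only if it is not already available.
install_cran_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Hypothetical helper: install a Bioconductor package through BiocManager,
# first installing BiocManager itself from CRAN if needed.
install_bioc_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    BiocManager::install(pkg)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Example usage (needs an internet connection the first time):
# install_cran_if_missing("ggplot2")
# install_bioc_if_missing("limma")
```

Checking with `requireNamespace()` first makes the helpers safe to rerun at the top of a script, because nothing is downloaded once a package is present.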
Console and Working Environment Basics
The following video introduces the basics of R scripting using the interactive prompt (the console).
Runtime: ~8 min. Created by Katherine Norwood
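Here is a minimal sketch of the kind of commands you might type at the console. It is not the video's exact session. The object names `a` and `b` are arbitrary.

```r
# Expressions typed at the > prompt are evaluated immediately
2 + 3 * 4          # 14 (multiplication before addition)
10 / 4             # 2.5
sqrt(16)           # 4

# Objects you create are stored in the global environment
a <- 1:5
b <- sum(a)        # 15

ls()               # list the objects in the current environment
rm(a)              # remove an object
exists("a")        # FALSE
```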
Atomic data types in R
This video introduces four of R's atomic data types:
logical
numeric
complex
character
It also introduces the `typeof()` and `mode()` functions.
Runtime: ~7 min. Created by Katherine Norwood
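The following sketch shows how `typeof()` and `mode()` differ. `typeof()` reports the internal storage type, so it splits numbers into `"double"` and `"integer"`. `mode()` reports both as `"numeric"`. The example values are arbitrary.

```r
# One example value for each of several atomic types
values <- list(TRUE, 3.14, 2L, 1 + 2i, "text")

# typeof() reports R's internal storage type
sapply(values, typeof)
# "logical" "double" "integer" "complex" "character"

# mode() groups double and integer together as "numeric"
sapply(values, mode)
# "logical" "numeric" "numeric" "complex" "character"
```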
This video covers missing values (`NA`) in R.
Runtime: ~3 min. Created by Katherine Norwood
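A short sketch of how `NA` behaves. Most calculations propagate a missing value unless you explicitly ask to drop it. The vector `x` is a made-up example.

```r
x <- c(4, NA, 7)

is.na(x)                  # FALSE  TRUE FALSE
sum(x)                    # NA: missing values propagate
sum(x, na.rm = TRUE)      # 11
mean(x, na.rm = TRUE)     # 5.5

# Comparisons with NA are also NA, so test with is.na() rather than x == NA
x == NA                   # NA NA NA
```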
Multidimensional Data Types in R
Introduction to vectors
Runtime: ~7 min. Created by Katherine Norwood
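A minimal sketch of creating and indexing a vector, using arbitrary example values:

```r
v <- c(10, 20, 30, 40)   # c() combines values into a vector

v[2]            # 20 (R indexes from 1)
v[c(1, 3)]      # 10 30
v[-1]           # 20 30 40 (a negative index drops elements)
v[v > 15]       # 20 30 40 (logical indexing)
v * 2           # 20 40 60 80 (arithmetic is vectorized)
length(v)       # 4
```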
Introduction to matrices, arrays, lists, and data frames, and how to extract subsets of them.
Runtime: 9:30 min. Created by Katherine Norwood
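The sketch below builds each of these structures and subsets it. The object names and the small `df` data frame are made-up illustrations, not the video's data.

```r
# Matrix: two-dimensional, one type, filled column by column
m <- matrix(1:6, nrow = 2)
m[2, 3]                  # 6
m[1, ]                   # first row: 1 3 5

# Array: like a matrix, but with any number of dimensions
arr <- array(1:24, dim = c(2, 3, 4))
arr[1, 2, 3]             # 15

# List: elements can have different types and lengths
lst <- list(name = "sample1", counts = c(5, 8, 13))
lst$counts[2]            # 8
lst[["name"]]            # "sample1"

# Data frame: a list of equal-length columns, shown as a table
df <- data.frame(id = 1:4,
                 group = c("a", "b", "a", "b"),
                 score = c(2.5, 3.1, 4.0, 1.8))
df[df$group == "a", ]    # rows whose group is "a"
df$score[df$id > 2]      # 4.0 1.8
```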
For more information on how to manipulate vectors, visit this page.
For examples of how to manipulate multidimensional data objects in R, check out this tutorial.
Basics of Flow Control
This video covers the basics of scripting, including `if` statements and logical operators.
Runtime: ~5:30 min
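Here is a minimal sketch that combines `if`/`else` with logical operators. `classify()` is a hypothetical function written only for this illustration. `&&` works on single values and stops early. `&` and `|` work element by element on vectors.

```r
# Hypothetical example function: classify a single number
classify <- function(x) {
  if (x > 0 && x %% 2 == 0) {
    "positive even"
  } else if (x > 0) {
    "positive odd"
  } else {
    "zero or negative"
  }
}

classify(4)    # "positive even"
classify(3)    # "positive odd"
classify(-2)   # "zero or negative"

# & and | are element-wise on vectors; ! negates
c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)   # TRUE FALSE FALSE
!c(TRUE, FALSE)                              # FALSE TRUE
```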
This video covers the `apply()` function.
Runtime: ~7 min
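A short sketch of `apply()` on a small matrix. The second argument, `MARGIN`, selects rows (`1`) or columns (`2`). The function is then applied to each one.

```r
m <- matrix(1:6, nrow = 2)   # columns: (1, 2), (3, 4), (5, 6)

apply(m, 1, sum)                                 # rows: 9 12
apply(m, 2, sum)                                 # columns: 3 7 11
apply(m, 2, function(col) max(col) - min(col))   # range of each column: 1 1 1
```

For plain row and column sums, the built-in `rowSums()` and `colSums()` give the same results. `apply()` is useful when you need an arbitrary function, such as the anonymous one in the last line.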
For more information on the `apply` family of functions, visit:
For more information on the `apply` versus `for` loop debate, visit:
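To make the `apply` versus `for` loop comparison concrete, this sketch computes the same result three ways: a `for` loop, `sapply()`, and `lapply()`, followed by a fully vectorized version. The value `n <- 5` is arbitrary.

```r
n <- 5

# for loop: pre-allocate the result, then fill it element by element
squares_loop <- numeric(n)
for (i in seq_len(n)) {
  squares_loop[i] <- i^2
}

# sapply(): apply a function to each element and simplify to a vector
squares_sapply <- sapply(seq_len(n), function(i) i^2)

# lapply(): same idea, but always returns a list
squares_list <- lapply(seq_len(n), function(i) i^2)

identical(squares_loop, squares_sapply)   # TRUE

# Fully vectorized version, usually the simplest choice in R
seq_len(n)^2
```

The loop and the `apply`-family versions give identical results here. The choice is mostly about readability, and about pre-allocating the output when you do use a loop.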
Data Wrangling
The tidyverse is a collection of R packages designed to work together because they share common principles for how data is structured. Most tidyverse packages were written by Hadley Wickham, a key figure in R and RStudio development. Here, we briefly introduce some of these packages and their predecessors (reshape and plyr, which Wickham also wrote). We highlight their main functionality on example datasets and show how to tie these concepts together to go from raw, 'messy' data to polished visualizations in R!
The following three videos illustrate data manipulation and plotting. The code used in them can be found here, and the data is available here (cancer_samples.csv).
Introduction to reshape. This video also demonstrates how to read a data table into your R environment.
Runtime: ~6 min. Created by Vinay Kartha
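The sketch below reads a small table and reshapes it from wide to long format. It does not use the workshop data, whose columns are not described here. A tiny made-up table is read from text instead. The example also uses base R's `reshape()` rather than the reshape package from the video, so that it runs without extra packages. The column names (`sample`, `geneA`, `geneB`, `expression`) are illustrative assumptions.

```r
# Read a small table. With the workshop file you would use something like
# samples <- read.csv("cancer_samples.csv")
wide <- read.csv(text = "sample,geneA,geneB
s1,1.2,3.4
s2,0.5,2.1")

# Wide to long with base R: one row per sample-gene combination
long <- reshape(wide,
                direction = "long",
                varying   = c("geneA", "geneB"),  # columns to stack
                v.names   = "expression",         # name of the stacked value column
                timevar   = "gene",               # new column holding the old column names
                times     = c("geneA", "geneB"),
                idvar     = "sample")

# With the reshape package loaded, melt(wide, id.vars = "sample")
# produces a comparable long table.
long
```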
Introduction to the plyr package
Runtime: 8:30 min. Created by Vinay Kartha
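plyr is built around the split-apply-combine idea: split the data into groups, apply a function to each group, and combine the results. The sketch below shows the same pattern in base R so that it runs without plyr installed. The `df` data frame contains made-up values. The commented `ddply()` line shows roughly how plyr expresses it.

```r
df <- data.frame(group = c("a", "a", "b", "b", "b"),
                 value = c(1, 3, 2, 4, 6))

# Split, apply, combine explicitly
pieces <- split(df$value, df$group)   # a list with one vector per group
means  <- sapply(pieces, mean)        # named vector: a = 2, b = 4

# The same summary as a data frame in one call
by_group <- aggregate(value ~ group, data = df, FUN = mean)

# With plyr loaded, a comparable call is:
# ddply(df, .(group), summarise, mean_value = mean(value))
by_group
```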
Plotting and ggplot
A great thing about R is that it makes it easy to create high-quality graphics from complex data. The following video introduces the basics of the `ggplot2` plotting system. Many data scientists prefer it over base R plots because of the flexibility it provides when working with complex data sets.
Runtime: 9 min. Created by Vinay Kartha
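A minimal `ggplot2` sketch using the built-in `mtcars` dataset rather than the workshop data. The plot is wrapped in a check so that the snippet still runs if `ggplot2` is not yet installed. You can install it with `install.packages("ggplot2")`.

```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Map data columns to aesthetics, then add layers with +
  p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
    geom_point() +                              # one point per car
    geom_smooth(method = "lm", se = FALSE) +    # a straight-line fit per cylinder group
    labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

  print(p)   # draw the plot (in RStudio it appears in the Plots pane)
}
```

Each `+` adds a layer or setting to the same plot object. This layered design is where the flexibility mentioned above comes from.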
Here are some good resources for beginners:
Once you have some experience, great resources are:
Bonus: R Markdown for Reproducibility
Composing reproducible manuscripts using R Markdown
For an example, you can view the R Markdown document that generated the R Workshop: Intro to reshape, plyr and ggplot document. You can open it in RStudio and knit it with File -> Knit (Ctrl + Shift + K).
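To see the workflow end to end, the sketch below writes a minimal R Markdown file and renders it from code with `rmarkdown::render()`, which is the function RStudio's Knit command runs. The file name `minimal_report.Rmd` and its contents are hypothetical. Rendering is skipped unless both the rmarkdown package and pandoc are available.

```r
# Build the chunk delimiter without typing it literally
fence <- strrep("`", 3)

# Hypothetical minimal R Markdown document: YAML header, text, one R chunk
rmd_lines <- c(
  "---",
  'title: "Minimal reproducible report"',
  "output: html_document",
  "---",
  "",
  "The summary below is recomputed every time the document is knitted.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)

rmd_file <- file.path(tempdir(), "minimal_report.Rmd")
writeLines(rmd_lines, rmd_file)

# Render to HTML next to the .Rmd file, if the tools are installed
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_file, quiet = TRUE)
}
```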