Module: Introduction to R

CSB1020H/F, Teaching Section LEC 0142

Offered by the Centre for the Analysis of Genome Evolution & Function (CAGEF)

Fall 2021 session

September 16 – October 28 (7 weeks), Thursdays from 2-5 pm

Enrollment: 20 graduate students; up to 40 auditors


Dr. David S. Guttman, CSB, CAGEF 

Dr. Calvin Mok, CAGEF Bioinformatics

Course Objectives

This is a beginner’s introduction to R and the Jupyter Notebook environment for individuals with no prior experience or background. Individuals who complete the course will be able to:

  • Work with the Jupyter Notebook environment and navigate the R programming language.
  • Understand data structures and data types.
  • Import data into R and manipulate data frames.
  • Transform ‘messy’ datasets into ‘tidy’ datasets.
  • Make exploratory plots as well as publication-quality graphics.
  • Use string searching and manipulation to clean data.
  • Perform basic statistical tests and run a regression model.
  • Use flow control and build branching code.

Each class will consist of a short introductory section followed by ‘code-along’ hands-on learning that will gradually build up the lecture’s topic(s). Students are expected to have access to a computer during class and are encouraged to ask questions while coding along with the instructor. A homework assessment will be assigned after each class to reinforce the skills learned, and a final project will test overall knowledge and application. The course will be delivered through Quercus using Bb Collaborate.

Course Availability

This course will be presented online and will be available to all graduate students, postdocs, staff, and faculty, although only registered students will be evaluated. The course will count as a single module (0.25 credits) for CSB graduate students. All graduate students interested in taking the course for credit should enroll through ACORN. Anyone wishing to audit the course should fill out the request form at


Item                  Note                                      % Mark
Homework assignments  6 weekly assignments x 12% each           72%
Term project          Due 2 weeks after the end of the course   28%


Access to a computer. No prior programming experience needed.

Reference Material

R for Data Science  (


Class Topic
1 Introduction to R and Jupyter Notebooks: R and Jupyter Notebook basics, best coding practices, functions and syntax, data types and structures, mathematical operations with R objects, installing R packages, getting help.
2 How to read, write, and manipulate your data: Importing text and Excel files, the dplyr package and functions to manipulate tabular data.
3 Introduction to Tidy Data: Wide versus long data formats, reshaping data with the tidyverse package.
4 Data visualization with ggplot2: The grammar of graphics; scatter, line, box, bar, and density plots, among other types of graphics.
5 Data cleaning with regular expressions (RegEx): Introduction to RegEx; inspecting, cleansing, and data wrangling using RegEx; classes, quantifiers, operators, pattern-matching, and string manipulation.
6 Linear regressions: Simple and multiple linear regressions, ANOVA, ANCOVA, model selection.
7 Flow control: Conditional statements (if); for, while, and repeat loops; next and break; troubleshooting loops.

Subject to change

Last updated on September 14th, 2021