Module: Fundamentals of Genomic Data Science

CSB1021H/F, Teaching Section LEC 0131

Offered by the Centre for the Analysis of Genome Evolution & Function (CAGEF).

Fall 2022 session


Dr. David S. Guttman, CSB, CAGEF 

Dr. Calvin Mok, CAGEF Bioinformatics


October 26 – December 7 (7 weeks), Wednesdays, 1–4 pm


16 graduate students; auditors allowed, space permitting

Course Objectives

The rise of next-generation genomics has changed the way we think about, study, and employ genetic data, enabling applications that were, until recently, merely the stuff of science fiction. These advances have dramatically increased both the size and scope of biological datasets, and consequently, increased the need for basic computational literacy for nearly all biologists.

This course is designed to serve as an introduction to genomic data science for students who do not have a background in bioinformatics. Students in the course will learn to perform several basic genomic data analyses using Galaxy, an open, web-based platform that incorporates multiple bioinformatics tools into a friendly Graphical User Interface (GUI). Students will then learn to scale up these genomic analyses using the Unix command line to tackle larger and more complex datasets. During the course, students will learn how to:

  • Use Galaxy and command line tools to process and manipulate data
  • Use the Integrative Genomics Viewer to visualize genomes
  • Work in a Unix terminal
  • Install bioinformatics software
  • Connect to and work on remote servers
  • Understand common genomics file formats
  • Perform de novo genome assembly, reference-based genome assembly, genome annotation, variant calling, and RNA-seq data analysis

The course will take advantage of online resources for background material, while spending class time analyzing real data sets. Students are expected to have a basic understanding of genomics and molecular biology, but no prior computational knowledge is required.

Each class will consist of a short introductory section followed by hands-on ‘code-along’ learning that gradually builds up the lecture’s topic(s). Students are expected to have access to a computer during class and are encouraged to ask questions while coding along with the instructor. A homework assessment will be assigned after each class to reinforce the skills learned. Course materials will be provided through Quercus, and lectures will be held in person.

Course Availability

This course will be held in person (unless otherwise determined) and will be available to graduate students in CSB and EEB. Postdocs, staff, and faculty may audit the course as space permits, although only registered students will be evaluated. The course will count as a single module (0.25 credits) for CSB graduate students. All graduate students interested in taking the course for credit should enroll through ACORN.

Anyone wishing to audit the course should fill out the request form at


Item                  Note                                      % Mark
Homework Assignments  7 weekly assignments x 10% each           70%
Term project          Due 2 weeks after the end of the course   30%

Prerequisites: Access to a computer. No prior programming experience needed.


Class  Topic
1      Introduction, Exploring Genomic File Formats
2      Galaxy Platform: Navigation, Quality Control, De Novo Assembly, Annotation
3      Galaxy Platform: Reference Alignment, Variant Detection, RNA-Seq
4      Galaxy Platform: RNA-Seq; Command Line: Navigation, File management & manipulation, Accessing remote servers
5      Command Line: Downloading & installing software, $PATH, Testing software
6      Command Line: Quality Control, De Novo Assembly, Annotation, BLAST
7      Command Line: Reference Alignment, Samtools, Variant Detection, RNA-Seq

Subject to change

Last updated on August 11th, 2022