Module: Fundamentals of Genomic Data Science
CSB1021H/F, Teaching Section LEC 0131
Offered by the Centre for the Analysis of Genome Evolution & Function (CAGEF).
Fall 2024 session
Instructors:
Dr. David S. Guttman, CSB, CAGEF david.guttman@utoronto.ca
Dr. Calvin Mok, CAGEF Bioinformatics calvin.mok@mail.utoronto.ca
Dates:
October 30th – December 11th (7 weeks), Wednesdays, 1:00 – 4:00pm
Earth Sciences Centre 3087
Enrollment:
16 graduate students
Audit spaces based on availability
Weight: One module (0.25 FCE)
Course Objectives
The rise of next-generation genomics has changed the way we think about, study, and employ genetic data, enabling applications that were, until recently, merely the stuff of science fiction. These advances have dramatically increased both the size and scope of biological datasets and, consequently, the need for basic computational literacy among nearly all biologists.
This course is designed to serve as an introduction to genomic data science for students who do not have a background in bioinformatics. Students in the course will learn to perform several basic genomic data analyses using Galaxy, an open, web-based platform that integrates multiple bioinformatics tools into a user-friendly graphical user interface (GUI). Students will then learn to scale up these analyses using the Unix command line to tackle larger and more complex datasets. During the course, students will learn how to:
- Use Galaxy and command line tools to process and manipulate data
- Use the Integrative Genomics Viewer to visualize genomes
- Work in a Unix terminal
- Install bioinformatics software
- Connect to and work on remote servers
- Understand common genomics file formats
- Perform de novo genome assembly, reference-based genome assembly, genome annotation, variant calling, and RNA-seq data analysis.
The course will take advantage of online resources for background material, while spending class time analyzing real data sets. Students are expected to have a basic understanding of genomics and molecular biology, but no prior computational knowledge is required.
Each class will consist of a short introductory section followed by hands-on ‘code-along’ learning that gradually builds up the lecture’s topic(s). Students are expected to have access to a computer during class and are encouraged to ask questions while coding along with the instructor. A homework assessment will be assigned after each class to reinforce the skills learned. Course materials will be provided through Quercus, and lectures will be held in person.
Course Availability
This course will be held in person (unless otherwise determined) and will be available to graduate students in CSB and EEB. Audit spaces will be offered to postdocs, staff, and faculty as space permits, although only registered students will be evaluated. The course will count as a single module (0.25 FCE) for CSB and EEB graduate students. All graduate students interested in taking the course for credit should enroll through ACORN.
Anyone wishing to audit the course should fill out the request form at: https://forms.gle/ECKFiqmoQ85YLeZo8
Evaluation
Item | Note | % Mark |
Homework Assignments | 7 weekly assignments x 9% each | 63% |
Term project | Due 2 weeks after the end of the course | 37% |
Prerequisites: Access to a modern laptop (ideally no more than 3 years old). No prior programming experience is needed.
Syllabus:
Class | Topic |
1 | Introduction, Exploring Genomic File Formats |
2 | Galaxy Platform: Navigation, Quality Control, De Novo Assembly, Annotation |
3 | Galaxy Platform: Reference Alignment, Variant Detection, RNA-Seq |
4 | Galaxy Platform: RNA-Seq; Command Line: Navigation, File management & manipulation, Accessing remote servers |
5 | Command Line: Downloading & installing software, $PATH, Testing software |
6 | Command Line: Quality Control, De Novo Assembly, Annotation, BLAST |
7 | Command Line: Reference Alignment, Samtools, Variant Detection, RNA-Seq |
Subject to change
Last updated on August 9th, 2024