Module: Introduction to Python

CSB1021H/S, Teaching Section LEC 0140

Offered by the Centre for the Analysis of Genome Evolution & Function (CAGEF), Winter 2023 session.

Weight: One module (0.25 FCE)

Date: January 10 – February 17 (7 weeks), Tuesdays, 10 am – 1 pm

Enrollment: 16 graduate students; auditors allowed, space permitting

Instructors

Dr. David S. Guttman, CSB, CAGEF           david.guttman@utoronto.ca

Calvin Mok, CAGEF Bioinformatics           calvin.mok@mail.utoronto.ca

Course Objectives

This is a beginner’s introduction to Python for data science applications. The course is intended for students with no computer science background who want to develop the skills needed to analyze their own data. Students who complete this course will be able to:

  • Perform data analysis in Python using the Jupyter Notebook environment.
  • Understand Python data structures and data types.
  • Manipulate Python objects such as lists, data frames, and dictionaries.
  • Import data into Python and transform ‘messy’ datasets into ‘tidy’ datasets.
  • Use flow control to develop branching code.
  • Use regular expressions and string manipulation to explore and clean data.
  • Make exploratory plots.

Throughout the course we’ll work with a single dataset that takes us through the steps of an analysis, from importing to data wrangling to visualization. Each class will consist of a short introductory section followed by hands-on ‘code-along’ learning that gradually builds up the lecture’s topic(s). Students are expected to have access to a computer during class and are encouraged to ask questions while coding along with the instructor. A homework assessment will be assigned after each class to reinforce the skills learned, and a final project will test overall knowledge and application. Course materials will be provided through Quercus, and lectures will be held in person.

Course Availability

This course will be held in person (unless otherwise determined) and will be available to graduate students in CSB and EEB. Postdocs, staff, and faculty may audit the course as space permits, although only registered students will be evaluated. The course will count as a single module (0.25 credits) for CSB graduate students. All graduate students interested in taking the course for credit should enroll through ACORN.

Anyone wishing to audit the course should fill out the request form at https://bityl.co/DdBX

Evaluation

Item                         Note                                      % Mark
Completed Jupyter Notebook   7 lectures x 2% each*                     14%
Homework Assignments         6 weekly assignments x 9% each            54%
Term project                 Due 2 weeks after the end of the course   32%

* A bonus of up to 4.9% (0.7% per lecture) will be awarded for submitting notebooks on the day of the lecture.
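
As a rough illustration of how these weights combine, the short Python sketch below adds up a final mark from the three graded components plus the same-day bonus. It is not an official grade calculator: the function name `final_mark`, the use of fractional scores between 0 and 1, and the assumption that the bonus is simply added on top of the total (with no cap) are all hypothetical choices made for the example.

```python
# Weights taken from the evaluation table (percent of the final mark).
NOTEBOOK_WEIGHT = 2.0      # per completed lecture notebook (7 lectures)
HOMEWORK_WEIGHT = 9.0      # per weekly assignment (6 assignments)
PROJECT_WEIGHT = 32.0      # term project
BONUS_PER_LECTURE = 0.7    # per notebook submitted on the day of lecture


def final_mark(notebook_scores, homework_scores, project_score, same_day_count):
    """Return a final mark in percent.

    Every score is a fraction between 0 and 1 (an assumption for this sketch).
    The bonus is assumed to be added on top of the weighted total.
    """
    notebooks = NOTEBOOK_WEIGHT * sum(notebook_scores)
    homework = HOMEWORK_WEIGHT * sum(homework_scores)
    project = PROJECT_WEIGHT * project_score
    bonus = BONUS_PER_LECTURE * same_day_count
    return notebooks + homework + project + bonus


# Hypothetical student: all 7 notebooks complete, 80% on each assignment,
# 75% on the project, and 5 notebooks submitted on the day of lecture.
mark = final_mark([1.0] * 7, [0.8] * 6, 0.75, same_day_count=5)
print(round(mark, 1))  # 14 + 43.2 + 24 + 3.5 = 84.7
```

Under these assumptions, a student with full marks everywhere and all seven notebooks submitted on the day would reach 104.9%, which is why the sketch flags the uncapped bonus as an assumption rather than course policy.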

Pre-requisites

Access to a computer. No prior programming experience needed.

Reference Material

Severance, Charles. 2016. Python for Everybody: Exploring Data Using Python 3.

http://do1.dr-chuck.com/pythonlearn/EN_us/pythonlearn.pdf

Course tools: University of Toronto Jupyter Hub, DataCamp, Zoom.

Syllabus

Class   Topic
1       Intro to Python and Jupyter Notebooks: Python basics, using Jupyter Notebooks, how to run Python code, and an introduction to Python variables, functions, modules, best coding practices, data types, missing data, code debugging, and getting help.
2       Python data structures, NumPy and Pandas: Lists, dictionaries, tuples, sets, and Series; mathematical operations with Python objects; introduction to NumPy and Pandas.
3       How to Read, Write, and Manipulate Your Data: The wide and long formats, reading in data, data wrangling with Pandas, and writing data.
4       Data visualization with plotnine: The grammar of graphics; scatter, line, box, bar, and density plots, among other types of graphics.
5       Flow control: For loops and conditionals.
6       Regular Expressions: Classes, quantifiers, operators, pattern matching, and string manipulation.
7       User-defined functions: Defining a function, best practices in user-defined functions, and context managers.

Subject to change

Last updated on August 12th, 2022