Class schedule & Lecture slides

Below is a general description of the material to be covered. Note that topics and dates are
subject to change as the course progresses.

| Lec | Date | Topic | Details | Resources |
|---|---|---|---|---|
| 1 | 09/Jan | Introductions and installations | slides (PDF) | Setting up R and friends: Win/Mac/Linux |
| 2 | 11/Jan | Refresher in statistics | Hands-on activity working with distributions, without a computer. slides | |
| - | 16/Jan | No class | Hard freeze day | |
| 3 | 18/Jan | R basics, datatypes | Navigating RStudio, syntax, data types, functions, data frames/tibbles, debugging and looking for help. slides (worksheet, practice) | Swirl: R programming tutorial. base R cheatsheet (PDF); vectors quick ref. |
| 4 | 23/Jan | R basics 2 | List, subsetting, data.frame, if()..else | |
| 5 | 25/Jan | R basics 3 | for() ; function(). announcements; .R scripts: lec5_demo, lec5_practice | functions: fundamentals. practical use. Environments explained with illustrations |
| 6 | 30/Jan | Basics 4 + Data wrangling in R with tidyverse, dplyr | More function(), vectorization; simple manipulation of data.frames/tibbles = data wrangling. slides | Why vectorize code; Merits of tibbles |
| 7 | 01/Feb | Data wrangling workshop | Load data from a .csv or Excel file, commands to arrange rows and columns; re-hash concepts, get students to wrangle data to answer a question. worksheet | dplyr cheatsheet |
| 8 | 06/Feb | ggplot() plotting | ggplot, geom_point() and geom_line(), interactive plots with plotly::ggplotly(). Principles of “What should you show in a plot?”. slides; worksheet / R sheet; histdata, pointsdata | sthda by Alboukadel Kassambara is an excellent resource showing all possible kinds of plots with ggplot. Textbook chapter on ggplot aesthetics/mapping, ggplot cheatsheet. |
| - | 08/Feb | No class | MONDAY SCHEDULE TODAY | |
| 9 | 13/Feb | quarto/Rmarkdown to produce reports | Motivation for reproducible data analysis, benefits of localizing (thoughts, code and outputs) in one document. slides | Quarto: intro, presentations; What is Markdown?, its syntax. quarto introduction. |
| 10 | 15/Feb | Data -> figure pipeline from a research article | Workshop using Rmd/quarto to walk through the steps required to reproduce plots from a published paper or news article. worksheet | reproducing figure from an NYT/COVID dashboard |
| 11 | 20/Feb | Revisit the pipeline; format plots / How to train your ggplots? | Formatting your ggplots to add annotations etc. updated worksheet | Very elaborate resource for breaking down ggplot elements step by step as a tutorial: cedricscherer. Pre-requisite: Please spend ~1-2 h to watch the video and work through the worksheet from last class. We will quickly revisit the initial data wrangling and clarify a few misconceptions |
| 12 | 22/Feb | Normal distributions | Understand the central limit theorem intuitively + using R simulations; why ~normal is a useful default. Learn cases when it doesn’t apply. Student's t-distribution arises from sampling from a normal and standardizing the sample mean by its estimated standard error. slides; worksheet | |
| - | 27/Feb | Review session | Summary of key concepts by TAs + Q & A | Midterm is a take-home exam covering R: tidyverse, plots, Rmarkdown; Stats: t-tests, problems with p-values, linear regressions |
| 13 | 29/Feb | Intro to hypothesis testing (t-tests) | Experimental science usually involves comparisons => hence hypothesis testing to compare distributions / means. Concept: Student's t-distribution arises from sampling from a normal and standardizing the sample mean by its estimated standard error. slides | |
| 14 | 05/Mar | (advanced lecture) Version control using git | Why version control? Set up git with RStudio. Learn some command line basics: cd, ls, git add, git commit, grep "function-name". slides | Intro to git: lesson; philosophy of git. Textbook: happygitwithr |
| 15 | 07/Mar | Hands-on git: continued | Initiate a repo and make commits with the CLI, connecting to RStudio and using the GUI to commit, using git clone and git push with GitHub repositories | short videos on how git works and quick intro to git and branches |
| - | 9-17/Mar | Spring break | no class | |
| 16 | 19/Mar | Best ways to represent graphical information | Plots don’t lie: show your raw data wherever possible/useful. Make every plot with a takeaway / focus point. Label features directly instead of using a legend. Slides; prologue | data-to-viz: the best plot based on the types/dimensions in your data; barbarplots campaign to show raw data/points instead of mean/bars; horizontal bar charts; William Chase's glamour of graphics talk |
| 17 | 21/Mar | T-tests continued 2, two tails → p-values | Relating the sampling distribution under the NULL hypothesis to the P-value from t.test() in R. Understand standard error of the mean, confidence interval & P < 0.05 from a bootstrapped sampling distribution. Slides | Quick video to understand p-values from bootstrapping. What’s bootstrapping again?. A python workflow illustrating t-test on viridis data. Extra reading on Bootstrapping intro |
| 18 | 26/Mar | T-tests continued 3, bootstrapping worksheet | Many flavours of t-tests, bootstrapping to explain 1-sample t-test, one-tailed vs two-tailed. (worksheet) | |
| 19 | 28/Mar | Bootstrapping worksheet | Continue working on the sheet | |
| 20 | 02/Apr | Linear regression (2 dimensional data) | Explain linear regression: lm(y ~ x). Fitting the straight line equation: y = a + bx + noise. Using geom_smooth() and ggpmisc::stat_poly_eq() in R. Understand R^2 as % of variance explained by the fit. Interpret lm results and p-values. Show connection to hypothesis testing when x is a categorical variable. Slides, lecture 20 | fitting linear models in R: basics. More mathematics on ordinary least squares fitting. What’s a good R-squared value |
| 21 | 04/Apr | Non-linear regressions | Examples: fitting dose-response curves and bacterial growth curves. Explain why initial conditions matter for nls, using self-starting functions. | map; safely() workflow to avoid breaking code due to convergence issues |
| 22 | 09/Apr | Working with higher dimensional data | Explain the dimensionality reduction concept. Techniques: PCA, weighted techniques, clustering. Slides-PCA. worksheet | |
| 23 | 11/Apr | Using AI based tools to speed up writing code | Setting up GitHub Copilot with RStudio. Make an individual account for GitHub Copilot (free for students) | Ethics; LLMs overview |
| 24 | 16/Apr | Student presentations 1 | Bring an interesting question on your own data, apply statistical analysis to answer it, present a written report and a presentation with visualizations and explanations | |
| 25 | 18/Apr | Student presentations 2 | | |
| - | ? | Student presentations 3 | This session will happen during non-class hours and attendance is optional | |
| - | 24-30/Apr | Final exam ~ Submit written reports | | |
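
To give a flavour of the lecture 20 material, here is a minimal sketch in R. It is not taken from the course slides: the data are simulated, and the intercept, slope, noise level and group labels are made-up values chosen only for illustration. It fits y = a + bx + noise with lm(y ~ x), checks that R^2 is the fraction of variance explained by the fit, and shows the connection to a t-test when x is a two-level categorical variable.

```r
# Minimal sketch on simulated data (all numbers below are hypothetical).
set.seed(1)                                # make the simulation reproducible

a <- 2                                     # hypothetical true intercept
b <- 0.5                                   # hypothetical true slope
x <- seq(1, 50)
y <- a + b * x + rnorm(length(x), sd = 2)  # y = a + bx + noise

fit <- lm(y ~ x)                           # ordinary least squares fit
coef(fit)                                  # estimated intercept and slope

# R^2 as reported by summary() ...
r2_reported <- summary(fit)$r.squared

# ... is 1 - (residual sum of squares / total sum of squares)
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((y - mean(y))^2)
r2_manual <- 1 - ss_res / ss_tot
all.equal(r2_reported, r2_manual)

# Categorical x: lm() on a two-level factor gives the same p-value as a
# two-sample t-test that assumes equal variances.
g  <- factor(rep(c("control", "treated"), each = 20))
y2 <- rnorm(40, mean = ifelse(g == "treated", 1, 0))

p_lm <- summary(lm(y2 ~ g))$coefficients["gtreated", "Pr(>|t|)"]
p_t  <- t.test(y2 ~ g, var.equal = TRUE)$p.value
all.equal(p_lm, p_t)
```

Note that t.test() defaults to the Welch test (unequal variances), so var.equal = TRUE is needed for the p-values to match the lm() result exactly.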

How do I download the slides as PDF for writing notes?

If you would like to download the slides as PDFs and write notes on them, follow this procedure:

  1. When the slide show is open in the browser, press the e key to switch to PDF export mode
  2. Now press Ctrl + P to print the page and select PDF as the destination, and you are good to go!

If you want your annotations to stay current when the slides are edited, you might want to annotate directly in the web version of the slides. You can use tools such as hypothes.is for this. This link should help you get set up: https://web.hypothes.is/start/

You can also draw on the pages if you use the Edge browser, as shown on this page: Write on the web