Lec | Date | Topic | Details | Resources |
---|---|---|---|---|
1 | 09/Jan | Introductions and installations | slides (PDF) | Setting up R and friends: Win/Mac/Linux |
2 | 11/Jan | Refresher in statistics, hands-on activity | Working with distributions, without a computer. slides | |
- | 16/Jan | No class | Hard freeze day | |
3 | 18/Jan | R basics, data types | Navigating RStudio, syntax, data types, functions, data frames/tibbles, debugging and finding help. slides (worksheet, practice) | Swirl: R programming tutorial. Base R cheatsheet (PDF); vectors quick ref. |
4 | 23/Jan | R basics 2 | Lists, subsetting, data.frame, if() ... else | |
5 | 25/Jan | R basics 3 | for(); function(). Announcements; .R scripts: lec5_demo, lec5_practice | Functions: fundamentals, practical use. Environments explained with illustrations |
6 | 30/Jan | Basics 4 + data wrangling in R with tidyverse, dplyr | More function(), vectorization; simple manipulation of data.frames/tibbles = data wrangling. slides | Why vectorize code; merits of tibbles |
7 | 01/Feb | Data wrangling workshop | Load data from a .csv or Excel file; commands to arrange rows and columns; rehash concepts; students wrangle data to answer a question. worksheet | dplyr cheatsheet |
8 | 06/Feb | ggplot() plotting | ggplot, geom_point() and geom_line(), interactive plots with plotly::ggplotly(). Principles of “What should you show in a plot?”. slides; worksheet / R sheet; histdata, pointsdata | sthda by Alboukadel Kassambara is an excellent resource showing a wide range of plot types made with ggplot. Textbook chapter on ggplot aesthetics/mapping; ggplot cheatsheet. |
- | 08/Feb | No class | MONDAY SCHEDULE TODAY | |
9 | 13/Feb | Quarto/R Markdown to produce reports | Motivation for reproducible data analysis; benefits of keeping thoughts, code and outputs together in one document. slides | Quarto: intro, presentations; What is Markdown? and its syntax. Quarto introduction. |
10 | 15/Feb | Data → figure pipeline from a research article, workshop | Using Rmd/Quarto to walk through the steps required to reproduce plots from a published paper or news article. worksheet | Reproducing a figure from an NYT/COVID dashboard |
11 | 20/Feb | Revisit the pipeline; format plots / How to train your ggplots? | Formatting your ggplots to add annotations, etc. Updated worksheet | Very detailed step-by-step tutorial breaking down ggplot elements: cedricscherer. Prerequisite: please spend ~1-2 h watching the video and working through the worksheet from last class. We will quickly revisit the initial data wrangling and clarify a few misconceptions. |
12 | 22/Feb | Normal distributions | Understand the central limit theorem intuitively and with R simulations; why ~normal is a useful default. Learn cases when it doesn’t apply. Student’s t-distribution arises from sampling from a normal distribution and standardizing the sample mean by its estimated standard error. slides; worksheet | |
- | 27/Feb | Review session | Summary of key concepts by TAs + Q&A | Midterm is a take-home exam covering R: *tidyverse, plots, R Markdown*; Stats: *t-tests, problems with p-values, linear regressions* |
13 | 29/Feb | Intro to hypothesis testing (t-tests) | Experimental science usually involves comparisons, hence hypothesis testing to compare distributions/means. Concept: Student’s t-distribution arises from sampling from a normal distribution and standardizing the sample mean by its estimated standard error. slides | |
14 | 05/Mar | (Advanced lecture) Version control using git | Why version control? Set up git with RStudio. Learn some command-line basics: cd, ls, git add, git commit, grep "function-name". slides | Intro to git: lesson; philosophy of git. Textbook: happygitwithr |
15 | 07/Mar | Hands-on git: continued | Initialize a repo and make commits with the CLI; connect to RStudio and use the GUI to commit; use git clone and git push with GitHub repositories | Short videos on how git works; quick intro to git and branches |
- | 9-17/Mar | Spring break | no class | |
16 | 19/Mar | Best ways to represent graphical information | Plots don’t lie: show your raw data wherever possible/useful. Give every plot a takeaway/focus point. Label features directly instead of using a legend. Slides; prologue | data-to-viz: choosing the best plot based on the types/dimensions in your data; barbarplots campaign to show raw data points instead of means/bars; horizontal bar charts; William Chase’s Glamour of Graphics talk |
17 | 21/Mar | T-tests continued 2: two tails → p-values | Relating the sampling distribution under the null hypothesis to the p-value from t.test() in R. Understand the standard error of the mean, confidence intervals and P < 0.05 from a bootstrapped sampling distribution. Slides | Quick video to understand p-values from bootstrapping. What’s bootstrapping again? A Python workflow illustrating a t-test on viridis data. Extra reading: Bootstrapping intro |
18 | 26/Mar | T-tests continued 3, bootstrapping worksheet | Many flavours of t-tests, bootstrapping to explain the 1-sample t-test, one-tailed vs two-tailed. (worksheet) | |
19 | 28/Mar | Bootstrapping worksheet | Continue working on the sheet | |
20 | 02/Apr | Linear regression (2-dimensional data) | Explain linear regression: lm(y ~ x), fitting the straight-line equation y = a + bx + noise. Using geom_smooth() and ggpmisc::stat_poly_eq() in R. Understand R² as the % of variance explained by the fit. Interpret lm results and p-values. Show the connection to hypothesis testing when x is a categorical variable. Slides, lecture 20 | Fitting linear models in R: basics. More mathematics on ordinary least squares fitting. What’s a good R-squared value? |
21 | 04/Apr | Non-linear regressions | Examples: fitting dose-response curves and bacterial growth curves. Explain why initial conditions (starting values) matter for nls() and how self-starting functions help. A map() + safely() workflow to keep code from breaking on convergence failures. | |
22 | 09/Apr | Working with higher-dimensional data | Explain the concept of dimensionality reduction. Techniques: PCA, weighted techniques, clustering. Slides-PCA. worksheet | |
23 | 11/Apr | Using AI-based tools to speed up writing code | Setting up GitHub Copilot with RStudio. Make an individual account for GitHub Copilot (free for students) | Ethics; LLMs overview |
24 | 16/Apr | Student presentations 1 | Bring an interesting question about your own data, apply statistical analysis to answer it, and present a written report and a presentation with visualizations and explanations | |
25 | 18/Apr | Student presentations 2 | Same as above | |
- | ? | Student presentations 3 | This session will happen outside class hours; attendance is optional | |
- | 24-30/Apr | Final exam | Submit written reports | |
Class schedule & Lecture slides
The schedule above is a general description of the material to be covered. Note that topics and dates are subject to change as the course progresses.
How do I download the slides as PDF for writing notes?
If you would like to download the slides as PDFs and write notes on them, follow this procedure:
- When the slide show is open in the browser, press the E key to switch to PDF export mode.
- Now press Ctrl+P to print the page, select PDF as the destination, and you are good to go!
If you want your annotations to stay current when the slides are edited, you might want to annotate the web version of the slides directly. You can use tools such as hypothes.is for this; this link should help you get set up: https://web.hypothes.is/start/
You can also draw on the pages if you use the Microsoft Edge browser, as shown on this page: Write on the web