This is the website for the Introduction to Data Science (MATH08077) course offered at the University of Edinburgh for the academic year 2024/25.
Learn to explore, visualize, and analyse data to understand natural phenomena, investigate patterns, model outcomes, and make predictions, and do so in a reproducible and shareable manner. Gain experience in data collection, wrangling, and visualization, exploratory data analysis, predictive modelling, and effective communication of results while working on problems and case studies inspired by and based on real-world questions. The course will focus on the R statistical computing language. No statistical or computing background is necessary. Additional official course information can be found here.
Week 1 (16 Sep 2024 - 20 Sep 2024): Get acquainted with the course, the technology, the workflow, and the skills you will acquire throughout the semester.
Week 2 (23 Sep 2024 - 27 Sep 2024): Data wrangling, joining, and tidying.
Week 3 (30 Sep 2024 - 04 Oct 2024): Importing data, data types and classes, recoding.
Week 4 (07 Oct 2024 - 11 Oct 2024): Data visualization and interpretation of graphical information.
Week 5 (14 Oct 2024 - 18 Oct 2024): Tips for effective data visualization, communication of results, and collaboration.
Week 6 (21 Oct 2024 - 25 Oct 2024): Misrepresentation of findings, data privacy, and algorithmic bias.
Week 7 (28 Oct 2024 - 01 Nov 2024): Harvesting data from the web, writing functions, and iteration.
Week 8 (04 Nov 2024 - 08 Nov 2024): Linear models for predicting numerical data from single and multiple variables.
Week 9 (11 Nov 2024 - 15 Nov 2024): Logistic regression for predicting categorical data and model building.
Week 10 (18 Nov 2024 - 22 Nov 2024): Evaluating models with cross-validation and uncertainty quantification with bootstrap confidence intervals.
Week 11 (25 Nov 2024 - 29 Nov 2024): Additional topics beyond IDS.
Information on the various components of the course.
Information on the assessments for the course.
This online work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International licence (visit here for more information). These materials have been adapted from Data Science in a Box by Mine Çetinkaya-Rundel, which is under the same licence.