Data Science Reference Materials
Source: the reference information below is from the UBC Master of Data Science (MDS) program.
Linear algebra review
- There are a bunch of suggestions here.
- We particularly recommend Essence of Linear Algebra (YouTube series) and Immersive Linear Algebra (interactive e-book).
511 Programming Basics for Data Science
- Python documentation
- The Python documentation is a great resource for learning Python, especially the Python tutorial.
- Think Python: How to Think Like a Computer Scientist
- “How to Think Like a Computer Scientist” is a standard textbook for introductory programming courses. It includes case studies and exercises.
- Advanced R, by Hadley Wickham (free version online)
- This is a prominent resource for R as a programming language, allowing the reader to dig deep into R. It assumes readers already have some programming background. Its first part, Foundations, is closely aligned with the objectives of DSCI 511 and is therefore the textbook for the second half of the course. Gaining familiarity with this book will likely be an asset in your data science career.
- R swirl
- For those new to R who want more practice with the basics.
Here are other resources that you might find useful:
- Hands-On Programming with R by RStudio’s Garrett Grolemund
- If you find “Advanced R” to be too technical for your liking, try this book. It’s in the same vein, but less technical.
- McKinney, Wes. Python for Data Analysis. O’Reilly, 2013. Note: you can download chapters of the book for free from the UBC library.
- Sedgewick, Robert; Wayne, Kevin; and Dondero, Robert. Introduction to Programming in Python: An Interdisciplinary Approach. Addison-Wesley, 2015.
- Introduction to Scientific Python
- Scipy Lecture Notes (a lot of good stuff here, for many MDS courses!)
- Python Data Science Handbook
- Python for Computational Science and Engineering
- Data Analysis and Visualization Using R
512 Algorithms and Data Structures
Online reference material
- Visualizing Algorithms
- Algorithms to Live By
- 500 Data Structures and Algorithms practice problems and their solutions
- Recursion practice problems
- P vs. NP and the Computational Complexity Zoo (video)
Books
- Michael Goodrich and Roberto Tamassia, Algorithm Design: Foundations, Analysis, and Internet Examples, John Wiley & Sons, Inc. ISBN: 0-471-38365-1
- Sanjoy Dasgupta, Christos Papadimitriou and Umesh Vazirani, Algorithms, McGraw Hill Book Company, 2008, ISBN 0-07-352340-2, also available here.
- Jon Kleinberg and Éva Tardos, Algorithm Design, Addison-Wesley Publishing Company, 2005, ISBN 0-321-29535-8.
- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein, Introduction to Algorithms, 3rd edition, MIT Press, 2009, ISBN 0-262-03384-4.
- Donald E. Knuth, The Art of Computer Programming, Volume 1, Third edition: Fundamental Algorithms, Addison-Wesley Publishing Company, 1997, ISBN 0-201-89683-4.
- Donald E. Knuth, The Art of Computer Programming, Volume 2, Third edition: Seminumerical Algorithms, Addison-Wesley Publishing Company, 1998, ISBN 0-201-89684-2.
- Donald E. Knuth, The Art of Computer Programming, Volume 3, Second edition: Sorting and Searching, Addison-Wesley Publishing Company, 1998, ISBN 0-201-89685-0.
- Donald E. Knuth, The Art of Computer Programming, Volume 4A: Combinatorial Algorithms, Part 1, Addison-Wesley Publishing Company, 2011, ISBN 0-201-03804-8.
513 Database and Data Retrieval
- Ramakrishnan, Raghu and Gehrke, Johannes. Database Management Systems, 3rd Edition, McGraw-Hill, 2002. (http://pages.cs.wisc.edu/~dbbook/)
- SQL Tutorials on W3Schools
521 Computing Platforms for Data Science
no textbook
522 Data Science Workflows
Textbooks:
- Art of Data Science by Roger Peng & Elizabeth Matsui (very cheap or even free!)
523 Data Wrangling
- STAT 545 lessons on data wrangling with the tidyverse in R
- Basic care and feeding of data in R
- Introduction to dplyr
- dplyr functions for a single dataset
- Cheatsheet for dplyr join functions
- Material to come on tidying
- STAT 545 lessons on specific vector types
- Be the boss of your factors (to be refreshed with the new forcats package)
- Regular expression lessons from 2015 and 2014
- purrr tutorial and worked examples by Jenny Bryan
- Using purrr with dplyr
- Data Wrangling with Python: Tips and Tools to Make Your Life Easier
- Collection of Python Pandas tutorials
- Fun international movie database Pandas data wrangling/exploratory data analysis tutorial
524 Collaborative Software Development
N/A
525 Web and Cloud Computing
TBD
531 Data Visualization I
Here are prominent course resources that we will be referring to.
- R for Data Science (r4ds)
- Overall good book on using R for data science – including data vis, of course!
- ggplot2 book
- Readable, comprehensive resource for learning about ggplot2, by the main author of the ggplot2 package, Hadley Wickham.
- STAT 545 “All the Graph Things” by Jenny Bryan.
- Contains tutorials relevant to our subject matter.
Other resources that you might find useful:
- R Graphics Cookbook
- Good as a reference if you want to learn how to make a specific type of plot in ggplot2.
- Jenny Bryan’s ggplot2 tutorial
- Has a lot of examples and less dialogue.
- ggplot2 cheat sheet
- Great for quick reference if you need something beyond tab-completion.
- “Visualization Analysis and Design” by Tamara Munzner, CRC Press, 2014.
- The go-to book for data vis theory.
532 Data Visualization II
- “Visualization Analysis and Design”, by Tamara Munzner
- Covers vis theory in detail
541 Privacy, Ethics, Security
TBD
542 Communication and Argumentation
Books
- Writing: The Sense of Style, Steven Pinker (If you buy anything, buy this!)
- Tools & Technology: Chapters 26-30 of R for Data Science, Garrett Grolemund & Hadley Wickham
- Writing & Speaking: Houston, We Have a Narrative: Why Science Needs Story, Randy Olson
- Heuristics & Biases: Thinking, Fast and Slow, Daniel Kahneman
- Persuasion: Influence: The Psychology of Persuasion, Robert Cialdini
Websites & short articles
- Technical explanations: Better Explained, Kalid Azad
- Technical explanations: My favourite pedagogical principle: examples first!, Gowers’s Weblog
- Writing: Nonfiction Writing Advice, Scott Alexander
- Speaking: The Cognitive Style of PowerPoint, Edward Tufte
Videos & podcasts
- Speaking: How to Speak: Lecture Tips from Patrick Winston, Patrick Winston
- Writing: Writing (aka rewriting), The Effort Report (Elizabeth Matsui & Roger Peng)
- Biases & Heuristics: Cognitive Biases in Data Science, Drew Conway on This Week in Machine Learning and AI
- Writing & Speaking: Houston, We Have a Narrative: Why Science Needs Story, Randy Olson on New Books Network
551 Descriptive Statistics and Probability for Data Science
- Seeing Theory
- Introduction to Probability, Statistics, and Random Processes
- Harvard STAT 110 course plus YouTube videos
- Probability Cheatsheet and another one
552 Statistical Inference and Computation I
Textbooks
- ModernDive: An Introduction to Statistical and Data Sciences via R by Chester Ismay and Albert Y. Kim
- OpenIntro Statistics by David M Diez, Christopher D Barr and Mine Cetinkaya-Rundel
553 Statistical Inference and Computation II
Reference books
- Probabilistic Programming and Bayesian Methods for Hackers (Davidson-Pilon)
- Doing Bayesian Data Analysis (Kruschke)
- Advanced: the Bayesian Choice (Robert)
- The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy (McGrayne)
Reference Material
- Probabilistic Programming and Bayesian Methods for Hackers (MDS alumni recommended!)
- Introduction to Empirical Bayes: Examples from Baseball Statistics
- Statistical Rethinking: A Bayesian Course with Examples in R and Stan plus exercises converted to PyMC3
- Bayesian Data Analysis book.
- JAGS with R
- A nice blog post summarizing how to write a JAGS model and input it into R (with the rjags package).
- Quora: For a non-expert, what is the difference between Bayesian and frequentist approaches?
554 Experimentation and Causal Inference
TBD
561 Regression I
- Intro to Statistical Learning (ISLR), especially Chapter 3.
- A modern and approachable take on statistics / machine learning.
- R for Data Science (r4ds), especially Part IV.
- Practical and approachable book on the use of R for data science.
- Linear Models with R
- Comprehensive book on linear models.
- OpenIntro Statistics
- Fairly accessible, seems to lean towards a traditional approach. Chapters 7 & 8 are relevant for linear regression.
562 Regression II
- Julian J. Faraway. Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models, Second Edition (Chapman & Hall/CRC Texts in Statistical Science), 2016.
- David G. Kleinbaum, Mitchel Klein (2012), Survival analysis: a self-learning text, 3rd edition
- Non-technical explanation of survival analysis, with a nice succinct summary along the side of each page.
- Recommends epidemiological background, but we will avoid those parts.
563 Unsupervised Learning
- [JWHT13]: James, G., Witten, D., Hastie, T. and Tibshirani, R. An Introduction to Statistical Learning. 2013. Springer-Verlag New York
- [HTF09]: Hastie, T., Tibshirani, R. and Friedman, J. The Elements of Statistical Learning. 2009. Second Edition. Springer-Verlag New York
- Week 1:
- Week 2:
- Mixture models notes from U. Toronto CSC 321
- Understanding mixture models and expectation-maximization (using baseball statistics)
- Model-based clustering: Sections 13.2 and 14.3 from HTF09
- EM algorithm: Section 8.5 from HTF09
- Information Theory, Inference, and Learning Algorithms, Chapters 20-22
- Week 3:
- PCA explained visually
- Principal Components Analysis: Section 14.5 from HTF09
- Factor Analysis: Section 14.7 from HTF09
- Week 4:
- Multidimensional Scaling: Section 14.8 from HTF09
571 Supervised Learning I
Books
- Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig.
- Artificial Intelligence 2E: Foundations of Computational Agents (2017) by David Poole and Alan Mackworth (of UBC!). Freely available online at https://artint.info/2e/html/ArtInt2e.html.
- Introduction to Machine Learning with Python: A Guide for Data Scientists by Andreas C. Mueller and Sarah Guido.
- A Course in Machine Learning by Hal Daumé III (also relevant for DSCI 572, 573, 575, 563)
Online courses
- Machine Learning (Andrew Ng’s famous Coursera course)
- Foundations of Machine Learning online course from Bloomberg.
- Machine Learning Exercises In Python, Part 1 (translation of Andrew Ng’s course to Python, also relevant for DSCI 561, 572, 563)
Short posts/articles
- A Visual Introduction to Machine Learning (Part 1)
- Machine Learning: What’s Inside the Box? (https://medium.com/@randylaosat/machine-learning-whats-inside-the-box-861f5c7e72a3)
Misc
- Metacademy (sort of like a concept map for machine learning, with suggested resources)
- Machine Learning 101 (slides by Jason Mayes, engineer at Google)
572 Supervised Learning II
ML-related textbooks
- James, Gareth; Witten, Daniela; Hastie, Trevor; and Tibshirani, Robert. An Introduction to Statistical Learning: with Applications in R. 2014. Plus Python code and more Python code.
- Russell, Stuart, and Peter Norvig. Artificial intelligence: a modern approach. 1995.
- David Poole and Alan Mackworth. Artificial Intelligence: Foundations of Computational Agents. 2nd edition (2017). Free e-book.
- Kevin Murphy. Machine Learning: A Probabilistic Perspective. 2012.
- Christopher Bishop. Pattern Recognition and Machine Learning. 2007.
- Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Introduction to Data Mining. 2005.
- Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman. 2nd ed, 2014.
Math for ML
- Mathematics for Machine Learning
- The Matrix Calculus You Need For Deep Learning
- Introduction to Optimizers
Other ML resources
Deep learning resources
- Dive into Deep Learning, a book based on STAT 157 at UC Berkeley.
- Deep learning YouTube series by 3Blue1Brown.
- Neural Networks and Deep Learning (free online book).
- Deep Learning. Ian Goodfellow, Yoshua Bengio and Aaron Courville.
- Deep Learning with Python. Jason Brownlee.
- Stanford UFLDL tutorial (or here)
- Geoff Hinton Coursera lectures
- CS231n: Convolutional Neural Networks for Visual Recognition (Stanford)
- Grokking Deep Learning
- Practical Deep Learning For Coders, Part 1 and some more resources on their blog here
- A Guide to Deep Learning
- Awesome Deep Learning, which is a list of other resources
573 Feature and Model Selection
Relevant textbooks
- Artificial Intelligence: A Modern Approach
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
- Machine Learning: a Probabilistic Perspective
- Learning from data
Related resources
- An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, specifically:
- Chapter 2: Statistical Learning
- Chapter 5: Resampling Methods
- Chapter 6: Linear Model Selection and Regularization
- Chapter 7: Moving Beyond Linearity
- Python code associated with An Introduction to Statistical Learning
- R code associated with An Introduction to Statistical Learning
574 Spatial and Temporal Models
- Shaddick, Gavin and Zidek, James V. Spatio-Temporal Methods in Environmental Epidemiology. CRC Press, 2016.
- Chatfield, Chris. The Analysis of Time Series: An Introduction. CRC Press, 2003.
575 Advanced Machine Learning
TBD
Written on December 22, 2018