Data Science Reference Materials

Source: the reference information below is from the UBC MDS program.

Linear algebra review

511 Programming Basics for Data Science

  1. Python documentation
    • The Python documentation is a great resource for learning Python, especially the Python tutorial.
  2. Think Python: How to Think Like a Computer Scientist
    • “How to Think Like a Computer Scientist” is a standard textbook for introductory programming courses. It includes case studies and exercises.
  3. Advanced R, by Hadley Wickham (free version online)
    • This is a prominent resource for R as a programming language, allowing the reader to dig deep into R. It assumes that readers already have some programming background. Its first part, Foundations, is closely aligned with the objectives of DSCI 511 and is therefore the textbook for the second half of the course. Gaining familiarity with this book will likely be an asset in your data science career.
  4. R swirl
    • For those new to R who want more practice with the basics.

Here are other resources that you might find useful:

512 Algorithms and Data Structures

Online reference material

Books

  • Michael Goodrich and Roberto Tamassia, Algorithm Design: Foundations, Analysis, and Internet Examples, John Wiley & Sons, Inc. ISBN: 0-471-38365-1
  • Sanjoy Dasgupta, Christos Papadimitriou and Umesh Vazirani, Algorithms, McGraw Hill Book Company, 2008, ISBN 0-07-352340-2, also available online.
  • Jon Kleinberg and Éva Tardos, Algorithm Design, Addison-Wesley Publishing company, 2005, ISBN 0-321-29535-8.
  • Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein, Introduction to Algorithms, 3rd edition, MIT Press, 2009, ISBN 0-262-03384-4.
  • Donald E. Knuth, The Art of Computer Programming, Volume 1, Third edition: Fundamental Algorithms, Addison-Wesley Publishing company, 1997, ISBN 0-201-89683-4.
  • Donald E. Knuth, The Art of Computer Programming, Volume 2, Third edition: Seminumerical Algorithms, Addison-Wesley Publishing company, 1998, ISBN 0-201-89684-2.
  • Donald E. Knuth, The Art of Computer Programming, Volume 3, Second edition: Sorting and Searching, Addison-Wesley Publishing company, 1998, ISBN 0-201-89685-0.
  • Donald E. Knuth, The Art of Computer Programming, Volume 4a: Combinatorial Algorithms, Part 1, Addison-Wesley Publishing company, 2011, ISBN 0-201-03804-8.

513 Database and Data Retrieval

  • Ramakrishnan, Raghu and Gehrke, Johannes. Database Management Systems, 3rd Edition, McGraw-Hill, 2002. (http://pages.cs.wisc.edu/~dbbook/)
  • SQL Tutorials on W3Schools

521 Computing Platforms for Data Science

No textbook

522 Data Science Workflows

Textbooks:

523 Data Wrangling

524 Collaborative Software Development

N/A

525 Web and Cloud Computing

TBD

531 Data Visualization I

Here are prominent course resources that we will be referring to.

  1. R for Data Science (r4ds)
    • Overall good book on using R for data science – including data vis, of course!
  2. ggplot2 book
    • Readable, comprehensive resource for learning about ggplot2, by the main author of the ggplot2 package, Hadley Wickham.
  3. STAT 545 “All the Graph Things” by Jenny Bryan.
    • Contains tutorials relevant to our subject matter.

Other resources that you might find useful:

  1. R Graphics Cookbook
    • Good as a reference if you want to learn how to make a specific type of plot in ggplot2.
  2. Jenny Bryan’s ggplot2 tutorial
    • Has a lot of examples and less dialogue.
  3. ggplot2 cheat sheet
    • Great for quick reference if you need something beyond tab-completion.
  4. “Visualization Analysis and Design” by Tamara Munzner, CRC Press, 2014.
    • The go-to book for data vis theory.

532 Data Visualization II

  • “Visualization Analysis and Design”, by Tamara Munzner
    • Covers vis theory in detail

541 Privacy, Ethics, Security

TBD

542 Communication and Argumentation

Books

Websites & short articles

Videos & podcasts

551 Descriptive Statistics and Probability for Data Science

552 Statistical Inference and Computation I

Textbooks

  1. ModernDive: An Introduction to Statistical and Data Sciences via R by Chester Ismay and Albert Y. Kim
  2. OpenIntro Statistics by David M Diez, Christopher D Barr and Mine Cetinkaya-Rundel

553 Statistical Inference and Computation II

Reference books

  • Probabilistic Programming and Bayesian Methods for Hackers (Davidson-Pilon)
  • Doing Bayesian Data Analysis (Kruschke)
  • Advanced: The Bayesian Choice (Robert)
  • The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy (McGrayne)

Reference Material

554 Experimentation and Causal Inference

TBD

561 Regression I

  1. Intro to Statistical Learning (ISLR), especially Chapter 3.
    • A modern and approachable take on statistics / machine learning.
  2. R for Data Science (r4ds), especially Part IV.
    • Practical and approachable book on the use of R for data science.
  3. Linear Models with R
    • Comprehensive book on linear models.
  4. OpenIntro Statistics
    • Fairly accessible, seems to lean towards a traditional approach. Chapters 7 & 8 are relevant for linear regression.

562 Regression II

563 Unsupervised Learning

571 Supervised Learning I

Books

  • Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig.
  • Artificial Intelligence 2E: Foundations of Computational Agents (2017) by David Poole and Alan Mackworth (of UBC!). Freely available online at https://artint.info/2e/html/ArtInt2e.html.
  • Introduction to Machine Learning with Python: A Guide for Data Scientists by Andreas C. Mueller and Sarah Guido.
  • A Course in Machine Learning by Hal Daumé III (also relevant for DSCI 572, 573, 575, 563)

Online courses

Short posts/articles

Misc

  • Metacademy (sort of like a concept map for machine learning, with suggested resources)
  • Machine Learning 101 (slides by Jason Mayes, engineer at Google)

572 Supervised Learning II

ML-related textbooks

  • James, Gareth; Witten, Daniela; Hastie, Trevor; and Tibshirani, Robert. An Introduction to Statistical Learning: with Applications in R. 2014. Plus Python code and more Python code.
  • Russell, Stuart, and Peter Norvig. Artificial Intelligence: A Modern Approach. 1995.
  • David Poole and Alan Mackworth. Artificial Intelligence: Foundations of Computational Agents. 2nd edition (2017). Free e-book.
  • Kevin Murphy. Machine Learning: A Probabilistic Perspective. 2012.
  • Christopher Bishop. Pattern Recognition and Machine Learning. 2007.
  • Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Introduction to Data Mining. 2005.
  • Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman. Mining of Massive Datasets. 2nd ed, 2014.

Math for ML

Other ML resources

Deep learning resources

573 Feature and Model Selection

Relevant textbooks

Related resources

574 Spatial and Temporal Models

  • Shaddick, Gavin and Zidek, James V. Spatio-Temporal Methods in Environmental Epidemiology. CRC Press, 2016.
  • Chatfield, Chris. The Analysis of Time Series: An Introduction. CRC Press, 2003.

575 Advanced Machine Learning

TBD

Written on December 22, 2018