Resources

This page collects a variety of references, organizations, data repositories, links, and books that I find useful or motivating. In all cases, the lists are highly partial and reflect my own experience and limitations. If you have a suggestion for an addition, please let me know!

Data Science, Math, Technology, and Social Justice

Feminism, Antiracism, and Anti-Colonialism More Broadly

Network Theory

Machine Learning

Useful Math

Data Sets

  • Tidy Tuesday is an initiative organized by the R For Data Science online learning community. Each week, they pose a different data analysis problem in which people can practice their programming and data science skills. The collection of data sets is particularly nice.
  • The machine learning competition website Kaggle hosts a large variety of data sets suitable for various data science tasks.
  • Data sets for network science:
    • The Colorado Index of Complex Networks (ICON) hosts a large collection of network data sets spanning many research fields. ICON is curated by the group of Aaron Clauset at CU Boulder.
    • The Stanford Large Network Dataset Collection (SNAP) hosts a wide range of network data sets. SNAP is curated by Jure Leskovec and Andrej Krevl at Stanford University.
    • Austin Benson at Cornell hosts a collection of data sets for a range of problems related to graphs and hypergraphs.
    • The Data Science For Good Lab, led by Michael Fire at Ben-Gurion University of the Negev, hosts a number of very interesting data sets. Many of these have network structure.
    • UCLA students Christine Gu, Yu-Hsin Huang, and Shaodian Wang assembled a data set of Reddit submissions and comments related to the COVID-19 vaccine. You are welcome to access the data and use it in projects. Please acknowledge Christine, Yu-Hsin, and Shaodian in any published work that uses this data.
  • Congress In Data collects a wide range of data sets, including many with network structure, on the US Senate and House of Representatives.

Organizations

  • “The society of Women in Network Science (WiNS) connects women, trans and non-binary gender network scientists from different races, socioeconomic backgrounds, and nations. The society aims to recognize the work, perspectives and expertise of its members to create bridges between academia, government, and private industry related to network science.”
  • I am a Partner at QSIDE, the Institute for the Quantitative Study of Inclusion, Diversity, and Equity. QSIDE has a number of ongoing projects and welcomes collaborators.
  • QSIDE recently released their Data4Justice Curriculum, which contains sample lesson plans, code, readings, and data sets.
  • The Just Mathematics Collective is an international collective of mathematicians whose goal is “to shift the global mathematics community towards justice, via genuine anti-racism, anti-militarism, and solidarity with the Global South.”

Programming

Python

  • CS For All, a website and book developed for brand-new programming learners by the Department of Computer Science at Harvey Mudd College.
  • Lecture notes and videos from PIC16A, my course on core skills in Python programming and data science.
  • A Whirlwind Tour of Python by Jake VanderPlas is an excellent, rapid overview of fundamental Python skills. It is suitable for those who have experience in several other programming languages, or for those who previously learned Python and just need a brush-up.
  • Lecture notes from PIC16B, my course on advanced computational and data science in Python.
  • The Python Data Science Handbook by Jake VanderPlas is an excellent and freely available online resource for practical data science in Python.

R

  • R for Data Science by Hadley Wickham and Garrett Grolemund is my favorite “0 to data analysis” text. Great chapters on data wrangling, visualization, modeling, and communication.
  • Folks with a bit of prior programming experience might like reading Jenny Bryan’s STAT 545, which covers many of the same topics but also addresses workflow considerations like version control, automation, and interactivity.
  • Advanced programmers who want to develop their own R packages should consult R Packages by Hadley Wickham and Jenny Bryan.

Julia

Other

  • The Missing Semester is an MIT course that aims to train you in fundamental tools for practical computer science that you may not have encountered in other classes. These include shell scripting, text editing, version control, profiling, and much more. Detailed lecture notes and high-quality lecture videos are available on their website.

Other Data Science Technical Resources

  • Dirk Eddelbuettel (University of Illinois) hosts a website with a wide array of resources for his course Data Science Programming Methods.
  • Sanjay Lall and Stephen Boyd are running an interesting course on machine learning with the Julia programming language.
  • Programming for Data Science is a course in the nuts and bolts of writing code for data analysis using R. One thing I especially like about this course is that it introduces machine learning through the topic of algorithm evaluation and auditing. The course is taught by Dr. Sarah Brown at the University of Rhode Island.

Pedagogy

Humor

  • Many mathematics memes collected by Wyatt Deimel and Sam Willoughby, with contributions from Julia Engholm and Bella Rieder.