A guide for data science self-study

My path to becoming a data scientist has been untraditional.

After receiving a B.S. in chemical engineering, I worked as a process engineer for a wide range of industries, designing manufacturing facilities for products as varied as polyurethanes, pesticides, and Grey Poupon mustard.

Tired of long days on my feet starting up production lines and longing for an intellectual challenge, I discovered data science in 2017 and decided to pivot my career.

I participated in Springboard’s part-time online bootcamp and managed to land a job as a junior data scientist shortly thereafter. But my journey to learning data science was really only just beginning.

Aside from the bootcamp, all my data science skills are self-taught. Fortunately, today’s era of education democratization has made that kind of path possible. For those who are interested in pursuing their own course of self-study, I’m including my recommended classes and resources below.

Python

MIT OCW’s Introduction to Computer Science and Programming

I enjoy lecture-style classes with corresponding problem sets, and I thought this class catapulted my Python skills farther and faster than a lot of the online interactive courses like DataCamp.

Additionally, this course covers more advanced programming topics like recursion that, at the time, I hadn’t thought were necessary for data scientists. Fast-forward six months, and I was asked a question on recursion during the interview for my first data science job! I was so grateful to this course for providing a really solid education in Python and general coding practices.
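
For anyone who hasn’t met recursion yet, here is a minimal sketch of the idea: a function that solves a problem by calling itself on smaller pieces of it. This is only an illustration, not the interview question or anything taken from the course, and the flatten function and its sample input are hypothetical.

```python
def flatten(nested):
    """Recursively flatten arbitrarily nested lists into one flat list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            # Recursive case: the item is itself a list, so flatten it first.
            flat.extend(flatten(item))
        else:
            # Base case: a plain value is added as-is.
            flat.append(item)
    return flat


print(flatten([1, [2, [3, 4]], 5]))
```

Each call handles one level of nesting and hands anything deeper back to itself, which is the pattern interviewers tend to be looking for.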

Algorithms and Data Structures

Coursera’s Data Structures and Algorithms Specialization

I only completed the first two courses of the specialization (Algorithmic Toolbox and Data Structures), but I don’t believe the more advanced topics are necessary for your average data scientist.

I can’t recommend these courses highly enough. I had originally enrolled just hoping to become more conversant with common algorithms like breadth-first search, but I found myself using these concepts and ways of thinking at my job.
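
As a rough illustration of what breadth-first search looks like in Python, here is a short sketch. It is my own example rather than code from the specialization, and the bfs_order function and the tiny sample graph are hypothetical. The idea is to visit a starting node, then all of its neighbors, then their neighbors, using a queue to keep track of what comes next.

```python
from collections import deque


def bfs_order(graph, start):
    """Return the nodes reachable from start in breadth-first order.

    graph is assumed to be a dict mapping each node to a list of neighbors.
    """
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()  # take the oldest node waiting in the queue
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                # Mark on enqueue so no node is queued twice.
                visited.add(neighbor)
                queue.append(neighbor)
    return order


sample_graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs_order(sample_graph, "A"))
```

Swapping the queue for a stack (popping from the same end you push to) would turn this into a depth-first traversal, which is a nice way to see how much the data structure drives the algorithm.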

The professors who designed these online classes have done a fantastic job of incorporating games to improve your intuition about a strategy and designing problem sets that force you to truly understand the material. There’s no fill-in-the-blank here—you’re given a problem and you must code up a solution.

I also recommend starting with the Introduction to Discrete Mathematics for Computer Science specialization, even if you already have a technical background. You’ll want to make sure you have a solid foundation in those concepts before undertaking the DS&A specialization.

Linear Algebra

MIT OCW’s Linear Algebra

I needed a refresher on linear algebra after barely touching a matrix in the ten years since my university days, and MIT’s videotaped lectures from 2010, with their associated homework and quizzes, were a great way to cover the basics.

The quality of this course is entirely thanks to Professor Gilbert Strang, who is passionate about linear algebra and passionate about teaching (a rare combination). He covers this subject at an approachable level that doesn’t require much complicated math.

I did supplement this course with 3Blue1Brown’s YouTube series on Linear Algebra. These short videos can really help visualize some of these concepts and build intuition.

Machine Learning

Andrew Ng’s Machine Learning

Taking this course is almost a rite of passage for anyone choosing to learn data science on their own. Professor Andrew Ng manages to convey the mathematics behind the most common machine learning algorithms without intimidating his audience. It’s a wonderful introduction to the ML toolbox.

My one gripe with this course is that I didn’t feel like the homework really added to my understanding of the algorithms. Most of the assignments required me to fill in small pieces of code, which I was able to do without fully comprehending the big picture. I took the course in 2017, however, so it’s possible this aspect of the class has improved.

Deep Learning

fastai’s Practical Deep Learning for Coders

Andrew Ng’s Deep Learning course is just about as popular as the Machine Learning course I recommended above. But after completing his DL class, I only had a vague understanding of how neural networks are constructed without any idea of how to train one myself.

The folks at fastai take the opposite approach. They give you all the tools to build a neural network in the first few lessons and then spend the remaining chapters opening up the black box and discussing how to improve performance. This is a much more natural way of learning and leads to better retention upon course completion.

There are videotaped lectures discussing these concepts, but I would recommend just reading the book because the lectures don’t add any new material. The book is actually a series of Jupyter notebooks, allowing you to run and edit the code.

Closing

I will warn that this path is not for everyone. There were many times when I wished I could work through a problem with a classmate or dig deeper into a concept with the professor. Online discussion forums for these kinds of classes are not the same as real-time feedback. Perseverance, self-reliance, and a lot of Googling are all necessary to get the most out of a self-study program.

The variety of skills and knowledge data scientists are supposed to have can be overwhelming to newcomers in this field. But just remember—no one knows it all! Simply embrace your identity as a lifetime learner, and enjoy the journey.
