Kaggle, a data science company and subsidiary of Google, offers 12 free micro-courses designed to improve data science skills. Each course takes between 1 and 7 hours and comprises a few lessons. Each lesson explains concepts with examples and is followed by a lab of exercises, with hints and solutions available if needed. The labs are presented in notebooks, and you can run all of the code on the Kaggle website.
I pursued these courses because I’m using both Python and SQL in my current project and also wanted to brush up on some skills and concepts. I’d taken Python and SQL classes in college, and I’ve used the languages professionally, but I was hoping that I’d be reminded of some useful tips and maybe even learn something new. I was not disappointed!
I started with the first two courses: Python and Intro to Machine Learning.
About Kaggle Courses
An overview of all courses, and links to each course, can be found here: https://www.kaggle.com/learn/overview
Kaggle also hosts multiple discussion forums, one of which is dedicated to its courses. I have not yet used this resource, but I expect it will be useful for searching for answers or asking questions about any courses I take in the future.
These courses don’t assume everyone is at the same level. There are links such as “skip straight to the hands-on exercise” at the top of some lessons, and you’re welcome to skip any lessons or exercises you feel you don’t need to complete. I read or skimmed all the lessons and skipped only a few exercises.
Most exercises can be completed quickly, and the harder problems (marked with chili peppers, as in the Python course) are an enjoyable challenge.
Kaggle Python Course Review
For reference, the Python course has 7 lessons and states it takes 7 hours; I spent 3 hours and 15 minutes on it. In the Python course, I was reminded of some valuable techniques that I can use in my programs at work:
- To switch the values of 2 variables, one can use the following code instead of using a temp variable.
>>> a = 1
>>> b = 2
>>> a, b = b, a
>>> a
2
>>> b
1
- Floor division returns the quotient rounded down to the nearest integer, discarding any remainder (for negative results, it rounds toward negative infinity).
>>> 22 // 6
3
- Zero (0) and empty strings ('') evaluate to False in a boolean context, such as an if statement or a call to bool(); this is especially useful in my project because a lot of data are saved as empty strings instead of nulls. Note that this is about truthiness, not equality: '' == False is itself False.
>>> a = 0
>>> b = ''
>>> c = 1
>>> d = 'hi'
>>> bool(a)
False
>>> bool(b)
False
>>> bool(c)
True
>>> bool(d)
True
- The any() function returns True if any of the elements provided are True.
>>> any([a > 7 for a in [1, 2, 5, 8, 2]])
True
>>> any([a > 7 for a in [1, 2, 5, 2]])
False
- The enumerate() function provides both the index and the element for each item in a list (I can definitely improve some programs by using this and the any() function).
>>> for index, element in enumerate(['a', 'b']):
...     print(index)
...     print(element)
...
0
a
1
b
- Since I often present results (e.g. data quality analysis reports), I’ve done a bit of work to format strings printed to the screen or to a file. Any help to make this easier is greatly appreciated. The rjust() string method right-justifies a string using a given width and fill character (default is a space); this is just one example of the available methods that help with text formatting.
>>> a = '123'
>>> a.rjust(10, '0')
'0000000123'
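As a rough illustration of how several of these tips might work together in a data quality report, here is a minimal sketch. The records, field names, helper functions, and report layout are all hypothetical, invented for this example rather than taken from the course or from my project.

```python
# Hypothetical records; empty strings stand in for missing values.
records = [
    {"id": "7", "name": "Ada", "city": "London"},
    {"id": "42", "name": "", "city": "Paris"},
    {"id": "123", "name": "Grace", "city": ""},
]


def rows_with_missing_values(rows):
    """Return (index, row) pairs for rows that contain any empty field."""
    flagged = []
    for index, row in enumerate(rows):
        # An empty string is falsy, so `not value` is True for it.
        if any(not value for value in row.values()):
            flagged.append((index, row))
    return flagged


def format_report(rows):
    """Build one report line per flagged row, with zero-padded ids."""
    lines = []
    for index, row in rows_with_missing_values(rows):
        missing = [field for field, value in row.items() if not value]
        padded_id = row["id"].rjust(6, "0")
        lines.append(f"row {index}  id {padded_id}  missing: {', '.join(missing)}")
    return lines


report_lines = format_report(records)
print("\n".join(report_lines))
```

With the sample records above, this flags the second and third rows (indexes 1 and 2) and prints their ids padded to six characters.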
Kaggle Intro to Machine Learning Course Review
The Intro to Machine Learning course is 7 lessons and states it takes 3 hours; I spent 1 hour and 45 minutes on it.
I have taken numerous machine learning courses and have implemented the concepts in projects, so all of the concepts were a review for me. However, I had never worked with the Python pandas library, which this course introduced and used. It was helpful to practice with pandas by creating and using data structures for analysis. I have already requested this library for use at work. You can load a table into a pandas DataFrame, and it retains its row-and-column structure. For example, data from a CSV file can be loaded into a DataFrame, and the columns property can then be used to list all columns in the data set.
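A minimal sketch of that workflow might look like the following. The CSV content and column names here are made up so the snippet runs on its own; in practice, read_csv would usually be given a path to a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content, used only so the example is self-contained.
csv_text = """Rooms,Landsize,Price
2,202,1480000
3,156,1035000
4,120,1600000
"""

# read_csv accepts a file path or any file-like object.
home_data = pd.read_csv(io.StringIO(csv_text))

print(home_data.columns)     # the column labels
print(home_data.shape)       # (number of rows, number of columns)
print(home_data.describe())  # summary statistics for the numeric columns
```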
I also revisited scikit-learn, a machine learning library for Python (note: when importing, the library is simply called sklearn). Using these libraries, the course covers decision trees, model validation, over- and under-fitting, and random forests.
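To give a flavor of how these topics fit together, here is a hedged sketch rather than the course's own code. It uses synthetic data from make_regression, and the model settings (such as the max_leaf_nodes values) are my own assumptions chosen for illustration. It holds out validation data, compares decision trees of different sizes to show the underfitting and overfitting trade-off, and then fits a random forest.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data so the sketch is self-contained (an assumption, not course data).
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

# Model validation: hold out part of the data to measure error on unseen rows.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)


def validation_mae(model):
    """Fit the model on the training split and return its validation MAE."""
    model.fit(train_X, train_y)
    return mean_absolute_error(val_y, model.predict(val_X))


# Very small trees tend to underfit and very large trees tend to overfit,
# so compare a few tree sizes on the validation split.
tree_mae = {
    leaves: validation_mae(
        DecisionTreeRegressor(max_leaf_nodes=leaves, random_state=0)
    )
    for leaves in (5, 50, 500)
}

# A random forest averages many trees, which often works well with default settings.
forest_mae = validation_mae(RandomForestRegressor(n_estimators=100, random_state=0))

for leaves, mae in tree_mae.items():
    print(f"decision tree, max_leaf_nodes={leaves}: MAE = {mae:.1f}")
print(f"random forest: MAE = {forest_mae:.1f}")
```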
Final Thoughts on Kaggle Courses
Overall, the lessons were succinct and the exercises were fun and sometimes tricky. I was legitimately excited to do the problems and looked forward to the next set! The Kaggle website is easy to navigate, progress is well tracked, and I appreciated all the pleasant colors and modern design. The example below shows my course status with delightfully encouraging sentiments (note: I skipped some of the Python exercises, so that’s what the website is tracking).
I enjoyed these courses and learned (or re-learned) quite a bit that I can immediately apply to current projects. I definitely recommend these courses for anyone who wants to brush up on related skills. The courses are free, and it is easy to focus on what would be most useful or interesting for you.