I took the Data Science Team Lead Foundations course (DS-TLF) as part of Nyla’s Continuous Learning benefit program, which gives employees up to $5000 per year to spend on classes, conferences, certifications, or time off to study at home. The DS-TLF is an online self-paced course offered by the Data Science Process Alliance (DSPA), which strives to “help data science leaders, teams, and organizations apply effective project management to improve data science outcomes.”
DS-TLF costs $795 and includes the following four modules and an exam:
- Module 1: Data Science and Agility
- Module 2: Data Science Project Lifecycles (CRISP-DM, OSEMN, TDSP, etc.)
- Module 3: Collaboration Frameworks (Scrum and Kanban)
- Module 4: Data Driven Scrum
- Final Exam
DS-TLF also includes a real-world case study, curated blogs, white papers, and Q&A via email with an instructor. For information about the Data Science Process Alliance and their courses, visit their website.
Why I took this course
Simply put, I took DS-TLF to do data science better. I needed a way to organize data science projects and teams to increase transparency (know who is doing what and why, at any given time) and increase throughput (deliver more value to the customer faster).
Data science doesn’t happen in a vacuum. Data scientists need to collaborate with a multitude of people within cross-functional teams composed of software developers, data engineers, and systems engineers to complete large-scale projects. Suddenly, concepts like version control, logging, testing, task prioritization, communication, etc. become increasingly important.
‘Data scientist’ has been my official work role for a few years now, and I can’t say I’ve seen a working model for data science project management in the workplace. Teams often take an ad hoc approach to completing data science projects. However, as the team takes on more projects and data from disparate customers, it becomes glaringly apparent that ad hoc doesn’t scale. Tasks take too long to complete and priorities get lost in translation. On the other end of the spectrum, some data science teams experience external pressure to treat data science like software development, which completely ignores the nuances of data science projects. For example, I’ve been asked to code in Java instead of Python by very talented software developers. I have nothing against Java. In fact, I learned to code in Java, but it’s certainly not my programming language of choice for data science. pandas, please!
My biggest takeaways:
- Data science project management is composed of a data science life cycle, which describes the phases or steps in a data science project, and a collaboration framework, which describes how teams should work together to develop, implement, and maintain complex products.
- Data science life cycles (CRISP-DM, OSEMN, etc.) help create a shared mental model for data science teams and their customers. Data scientists can use data science life cycles to explain data science projects and their dependencies to non-data scientists.
- Not every project requires each stage described in a data science life cycle. For example, plenty of data science projects will conclude with a simple report after exploratory data analysis, not a machine learning model.
- Data science life cycles are customizable. For example, if you really like CRISP-DM as a data science life cycle but wish it included a phase for monitoring models after deployment, add it yourself to suit your needs!
- Data Driven Scrum (DDS) combines the elements of Scrum and Kanban into a collaboration framework for data science. However, it’s worth noting that DDS is not Scrum since “the Scrum framework, as outlined [in the Scrum guide], is immutable. While implementing only parts of Scrum is possible, the result is not Scrum.”
- You can integrate DDS with your choice of data science life cycle. One way to combine the two is to color code the cards on your DDS Task Board (Kanban Board) by phases of your chosen data science life cycle.
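To make the last two takeaways concrete, here is a minimal Python sketch of a customized life cycle with color-coded cards. It is only an illustration under my own assumptions: the `Card` class, the color palette, the column names, and the example card titles are all hypothetical and are not part of DDS, Jira, or any other tool. The only thing taken from elsewhere is the list of six standard CRISP-DM phases.

```python
from dataclasses import dataclass

# The six standard CRISP-DM phases, in order.
CRISP_DM = [
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Modeling",
    "Evaluation",
    "Deployment",
]

# Customization: append a monitoring phase after deployment.
LIFE_CYCLE = CRISP_DM + ["Monitoring"]

# Hypothetical palette; any set of distinct colors would do.
PALETTE = ["blue", "teal", "green", "yellow", "orange", "red", "purple"]

# One color per life cycle phase.
PHASE_COLORS = dict(zip(LIFE_CYCLE, PALETTE))


@dataclass
class Card:
    """A hypothetical task board card tagged with a life cycle phase."""
    title: str
    phase: str
    column: str = "To Do"  # assumed column name, not prescribed by DDS

    @property
    def color(self) -> str:
        # Look up the card color from its life cycle phase.
        return PHASE_COLORS[self.phase]


# Made-up example cards.
cards = [
    Card("Profile customer churn data", "Data Understanding", "In Progress"),
    Card("Set up drift alerts for churn model", "Monitoring"),
]

for card in cards:
    print(f"[{card.color}] {card.title} ({card.phase}, {card.column})")
```

In practice you would likely do this with labels or card colors in whatever board tool your team already uses. The point is simply that the phase-to-color mapping lives in one place, so adding or removing a phase is a one-line change.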
What I did with the skills I acquired:
After completing the DS-TLF course and exchanging a few Q&A emails with my instructor, I pitched a data science project management proposal to my team. I delivered the proposal as a PowerPoint presentation that covered the motivation behind it (the benefits of data science project management); an overview of Uber’s data science life cycle, which I adapted to meet our needs; an overview of DDS; and, finally, an overview of the tools that already exist in our ecosystem that we could use to implement the proposal (Jira Software and Microsoft SharePoint, in our case).
A few days after my presentation, I sent out a poll to my team members asking them if we should move forward with the proposal and to share their concerns. The motion passed and we’re now in the process of setting up!
I liked …
- The course was well-organized and the pre-recorded discussions amongst the instructors were helpful.
- The final exam was easy and the certification, as well as course access, lasts forever.
- The course includes Q&A with one of the three course instructors.
I did not like …
- Unfortunately, DS-TLF doesn’t mention how you would implement DDS for a data science team that is responsible to multiple customers and, therefore, multiple products. (The answer is having a Chief Product Owner to rule all Product Owners.) Like Scrum, DDS implicitly assumes that teams are responsible for one product.
- DS-TLF doesn’t cover how you would broach data science project management with your team.
- The course is overpriced at $795. You can get most of the course content by reading DSPA’s blog, the Scrum Guide, the Data Driven Scrum guide, the Agile Manifesto, and articles about the different life cycle frameworks. DS-TLF mainly synthesizes that information.
Overall, I’m glad I took the course. It was great to learn from and connect with people who have data science project management at the forefront of their minds. However, the course price makes it difficult for me to recommend it to anyone else.
UPDATE: It has been a couple of months since my team and I decided to prioritize data science project management. Although we do not follow DDS exactly as it is laid out – a compromise I was willing to make – we do have regular 10-minute stand-ups and retrospectives. We have also implemented controls within our GitLab to enforce our agreed-upon merging strategy and testing standards, which has greatly improved the quality of our code.
That being said, implementing DDS was not trivial and we learned a couple of lessons along the way. First, simplicity is everything. In the beginning, we decided to use Jira to track our issues within a Kanban board. However, Jira does not integrate nicely with GitLab, and it is a chore to move across platforms to track tickets. So, we have moved to using GitLab issues and boards. Second, changing the team culture is difficult. The progress we have made so far is a result of finding like-minded people and banding together to effect change. Change doesn’t happen overnight, unfortunately, but it has been a great experience so far and I look forward to continuously improving our system.