Data Science has come to represent the proactive use of data and advanced analytics to drive better decision making. While there is broad agreement around this, the skillset of a Data Scientist still generates debate (and endless Venn-diagram-filled blog posts).
A common element of this debate is the exclusion criteria people place on the role. Something like, “if someone doesn’t have this skill/qualification then they are not a Data Scientist”, which is typically stated confidently by a self-identified Data Scientist who has (surprise, surprise) exactly the skill/qualification in question. Some recent examples I’ve come across include:
- If you’re not a statistician you’re not a Data Scientist
- If you can’t build a Recommender Engine you’re not a Data Scientist
- If you don’t have a PhD you’re not a Data Scientist
For the record, I know some fantastic Data Scientists who:
- Wouldn’t self-identify as a Statistician (e.g. they come from a machine learning background)
- Have never needed to build a Recommender Engine (maybe because the area they work in has never required that)
- Don’t have a PhD (or an MSc)
Now, I’m not saying “everyone is a Data Scientist” and I do think there’s an inherent danger in not defining some sort of criteria. With the money to be made in the world of Data Science, it’s no wonder that consultants with any sort of data skills are re-badging themselves as Data Scientists and increasing their day rates.
The concern here, of course, is that organisations will invest in non-Data-Sciencey-Data-Scientists (we’re getting pretty technical here) and then not see the value they expected. This could ultimately have a negative impact on the world of Data Science, in the same way that the Big Data world has been tainted by examples of over-investment in Big Data tech (people tend to say “hey, let’s build a data lake” a little more sheepishly than they did a few years ago).
So, what makes a Data Scientist a Data Scientist? Without specifying technologies you must use, algorithms you must know, or qualifications you must have, there appears to be some consensus around ‘minimum skills’ (although please let me know if you disagree):
Advanced Analytics
The word ‘Analytics’ is incredibly broad and encompasses everything from adding up some numbers to fitting advanced mathematical models. I feel a Data Scientist is someone who applies Advanced Analytic techniques (such as Predictive/Prescriptive analysis techniques based on statistics or machine learning).
While Business Intelligence is vital, I think that someone who spends their time building dashboards but not modelling would not be a Data Scientist.
Broad vs Deep Methodology
Many ‘statistical’ roles in the last few decades were largely reactive, in that their remit was narrow and long-established. This meant that the range of analytic techniques would likely also be narrow and statisticians ended up with a deep knowledge in a particular methodology rather than a broad understanding of analytic approaches.
For example, in my first role I almost exclusively used linear models, whereas in my next role it was all about survival models. Because a Data Scientist is asked to proactively solve a wider range of problems, they need at least an appreciation of the broader possibilities and the ability (to some extent) to pick up and apply a new methodology (once the assumptions of that method are understood).
Coding, not scripting
There’s a significant difference between someone who ‘writes scripts’ and someone who can really code. In the early 2000s, I spent a great deal of time as a statistician writing SAS code, where my primary output was ‘insight’ and the code I wrote was more of a by-product of what I did as opposed to a deliverable. For what it’s worth, I wouldn’t class that earlier version of me as a ‘Data Scientist’.
I think a Data Scientist is more of a programmer, where the code they write is part of what they deliver, and therefore needs to be scalable and written with formal development practices in mind. I’m not saying that every Data Scientist needs to be a master of DevOps (although that would be nice!), but some element of coding rigour should be essential.
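To make the scripting/coding distinction a little more concrete, here is a minimal sketch in Python. The function name `conversion_rate` and the numbers are hypothetical and chosen purely for illustration; the point is only that the calculation lives in a named, validated, testable function rather than in a few throwaway lines.

```python
def conversion_rate(conversions: int, visitors: int) -> float:
    """Return the share of visitors who converted, validating inputs first."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    if not 0 <= conversions <= visitors:
        raise ValueError("conversions must be between 0 and visitors")
    return conversions / visitors


def test_conversion_rate() -> None:
    """A small unit test: runnable directly, or discoverable by a test runner."""
    assert conversion_rate(5, 20) == 0.25
    assert conversion_rate(0, 10) == 0.0
    try:
        conversion_rate(3, 0)
    except ValueError:
        pass  # Expected: zero visitors is rejected rather than silently dividing.
    else:
        raise AssertionError("expected a ValueError for zero visitors")


if __name__ == "__main__":
    test_conversion_rate()
    print("all checks passed")
```

The script version of this would be a single division typed into a console. The version above is longer, but someone else can reuse it, and it fails loudly on bad input.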
Doing Science
The last thing that, for me, sets Data Scientists apart is the way they approach a challenge. When hiring, I’m looking for someone who fundamentally sees data as an opportunity and has an inherent curiosity about what insight that data will contain. Beyond that, a Data Scientist’s approach is fundamentally a ‘scientific’ one, where assumptions are created and tested using the data and available analytic methodologies.
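As one illustration of what “creating and testing an assumption” can look like in practice, here is a rough sketch of a permutation test in Python. The data, the variable names and the scenario (order values under two page layouts) are all made up for the example, and a permutation test is just one of many ways to check an assumption like this.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling labels."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the "no difference" assumption the group labels are arbitrary,
        # so reshuffling them shows how large a gap chance alone produces.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # The add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical data: order values under an old and a new page layout.
old_layout = [12.1, 9.8, 11.4, 10.2, 10.9, 9.5, 11.0, 10.4]
new_layout = [12.9, 11.8, 13.1, 12.2, 11.5, 12.7, 13.4, 12.0]

p_value = permutation_p_value(old_layout, new_layout)
print(f"Estimated p-value: {p_value:.4f}")
```

The assumption being tested is “the layout makes no difference”. A small p-value suggests the data are hard to reconcile with that assumption, which is the cue to dig further rather than a final answer.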
Exclusion criteria such as the ones I’ve written above may feel unfair, but I think they are necessary to delineate the career of a Data Scientist and to distinguish it from the variety of other data roles. Ultimately, the criteria will vary because each organisation will need different types of Data Scientists to achieve its goals. However, establishing a base for what would be considered a ‘Minimally Viable Data Scientist’ feels vital to the success of Data Science as a whole.
Where to from here?
If you’re looking to become a Data Scientist, then I hope this helps you to understand the skills needed.
If you’re looking to hire a Data Scientist, then make sure you know what you are trying to achieve and check potential hires have the skills required to deliver the value you’re expecting.
How Mango can help
We’ve been helping companies with their data analysis since 2002. We now also work with organisations to build their data science capability and develop their data science strategies. Talk to us about how we can help you make the most of your data to strengthen your business: [email protected]