When a company wants a new Data Science capability, I am called in to design its Big Data infrastructure. I try to explain to business people, without getting technical, how Data Science, Data Engineering, and Data Analytics fit together and overlap. Sometimes they make the mistake of thinking of Data Scientists as just part of IT instead of a first-level business function. This is not unreasonable, because Data Scientists are usually techies who can program.
Instead, a business should think of its Data Science capability as something like a Spidey-sense superpower: it should let the business detect problems and change direction quickly.
I don’t think any common department name suits Data Scientists yet.
Data Science is not Data Engineering
Data Engineering is different because it usually has clear business tasks to achieve. Data needs to be prepared for reports, analytics, and alerts. Engineers build the systems which calculate those reports, even if business people design the reports in a self-service manner. Engineers figure out how to get the data from where it is generated to where it is needed, in a form in which it can be used efficiently.
These complex systems need to be built, but they are mostly “Business As Usual” systems. They help the business fine-tune its behaviour but do not radically change the business.
Data Science Lifecycle
Data Science is a bit different. Instead of aiming to produce reports every day (or hour, or month), Data Science asks business questions which no one has had good answers to before. It must be managed as part of the business decision-making process to get the most out of it. There is no point in asking a question which the business doesn’t care about. It must be a search for new knowledge that senior management want to have and can act on. In some ways Data Scientists produce actionable stories based on facts.
It is important to realise that Data Science is research, so it cannot guarantee the result you want. Whatever your level in the business, it is worth understanding the Data Science Lifecycle.
- Hypothesis generation
- Gathering data
- Cleaning data
- Exploring data
- Model building and testing
- Interpreting and making use of the results
This lifecycle is quite simplified, of course. At every stage you might loop back to an earlier stage and start again.
Step 1. The Business Question
The hypothesis should be thought of as the Business Question. If you are a senior manager then you should be helping here. Hopefully your Data Scientists understand the business well enough to come up with such questions themselves.
Steps 2, 3, and 4
These all sound like traditional IT but are not. In some ways the data collection and cleaning overlap with the work of traditional Data Analysts. Data Scientists require a deep understanding of the data. They figure out what to collect, how to clean it, and which features are important. Sometimes the data is in unstructured formats like voice recordings or English text. The Data Scientists may require the help of the IT department, but at this stage we don’t know whether the task can be done on a small laptop or whether it requires a full-blown Big Data framework.
Step 5
Model building and testing is the unique skill of the Data Scientist. I believe it requires significant mathematical skill, although Auto-ML tries to automate this stage, with varying success. Data Scientists need to understand the science so that they can judge how confident people should be that the data supports the conclusions.
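For readers who want to see what "testing" a model can look like, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the customer data is randomly invented, and the function names (`make_customers`, `fit_threshold`, `accuracy`) and the simple "low spenders churn" rule are hypothetical, not a method from any real project. The idea it shows is that a model is built on one part of the data and then scored on a part it has never seen, which gives a more honest sense of how far to trust it.

```python
import random


def make_customers(n, rng):
    """Generate hypothetical (monthly_spend, churned) records; the data is invented."""
    rows = []
    for _ in range(n):
        spend = rng.uniform(0, 100)
        # Assumption for the sketch: low spenders churn more often, with some noise.
        churn_probability = 0.8 if spend < 40 else 0.2
        rows.append((spend, rng.random() < churn_probability))
    return rows


def accuracy(threshold, rows):
    """Fraction of rows where the rule 'spend below threshold means churn' is right."""
    return sum((spend < threshold) == churned for spend, churned in rows) / len(rows)


def fit_threshold(train):
    """Model building: pick the spend threshold that scores best on the training data."""
    return max(range(0, 101, 5), key=lambda threshold: accuracy(threshold, train))


rng = random.Random(42)  # fixed seed so the sketch is repeatable
data = make_customers(1000, rng)
train, test = data[:700], data[700:]  # hold back 30% that the model never sees

threshold = fit_threshold(train)
train_accuracy = accuracy(threshold, train)
test_accuracy = accuracy(threshold, test)  # model testing: score on unseen data

print(f"threshold={threshold} train={train_accuracy:.2f} test={test_accuracy:.2f}")
```

If the score on the held-back data is much worse than the score on the training data, that is a warning that the conclusions may not hold up in the real business.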
Step 6. Communicating the Results
This is where the business comes in again. Hopefully the Data Scientist has answered the business question with a Yea or a Nay, but they may have to admit that there just isn’t enough data to answer it. The business may need to change, and senior management must buy in to that possibility and be keen to improve their company. A Data Scientist needs to be a good communicator, unlike most IT engineers.
Hopefully you have learnt how the Data Science capability fits into a business. The number one take-away is not to treat Data Science as just an IT function, but as part of your decision-making process. Future articles from me will talk about the overlapping topics of Data Engineering, Data Analytics, Machine Learning Engineering, and Big Data.
Credits:
- Spiderman meme image supplied by imgflip.com
- The Data Lifecycle steps named above were paraphrased from TowardsDataScience.com https://towardsdatascience.com/5-steps-of-a-data-science-project-lifecycle-26c50372b492
- Header Photo by Mika Baumeister on Unsplash
Author:
Alex McLintock has over 25 years in the IT industry, with the last 8 of those focussed on the Architecture of Big Data Analytics Systems.