There are many different jobs in the space of data and machine learning, which can be confusing to navigate at first. This page is a quick summary intended as an introductory guide.


Data Engineer

Overview:

Key Responsibilities:

Tech Stack:

Typical Education:


Data Analyst

This category also includes business analysts, healthcare analysts, financial analysts and even fashion analysts. A data analyst uses data to figure out and report how the business is doing, focusing primarily on exploration and explanation. They build visualisations and calculate summary statistics to make the data more easily digestible for others. Typical tools are software such as Microsoft Excel, Tableau and Power BI, or the languages SQL, R and Python. The aim is to produce actionable insights from the data, and the final deliverable will be a report, presentation or dashboard. It answers important questions such as “How much of our product did we sell over the last three quarters?” or presents findings such as “Here’s a breakdown of our product demographics”. To be effective in this role, the analyst must have good domain knowledge and understand the business need. Analysts come from a wide range of backgrounds, typically with a BSc in a quantitative or analytical subject: Maths, Statistics, Economics, or even Psychology, Biochemistry, Engineering etc.
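
As a rough illustration of the kind of summary statistic described above, the sketch below answers the "last three quarters" question in Python using only the standard library. The sales records, quarter labels and function names are all made up for the example; in practice an analyst would more likely run the equivalent query in SQL, Excel or a BI tool.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sales records as (quarter, units_sold). Values are invented.
sales = [
    ("2023-Q1", 120), ("2023-Q1", 80),
    ("2023-Q2", 150), ("2023-Q2", 50),
    ("2023-Q3", 90), ("2023-Q3", 110),
    ("2023-Q4", 200), ("2023-Q4", 100),
]


def units_by_quarter(records):
    """Sum the units sold in each quarter."""
    totals = defaultdict(int)
    for quarter, units in records:
        totals[quarter] += units
    return dict(totals)


def last_n_quarters(totals, n):
    """Keep the n most recent quarters.

    Labels such as "2023-Q1" sort chronologically as plain strings.
    """
    recent = sorted(totals)[-n:]
    return {quarter: totals[quarter] for quarter in recent}


totals = units_by_quarter(sales)
recent = last_n_quarters(totals, 3)
total_recent = sum(recent.values())      # units sold over the last three quarters
average_recent = mean(recent.values())   # average units per quarter

print(recent)
print(total_recent, round(average_recent, 1))
```

The resulting totals are the sort of figures that would then go into the report or dashboard.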


Data Scientist

A data scientist uses more sophisticated mathematical and statistical techniques, as well as machine learning methods, as tools to mine information from the data, either for business intelligence or to make predictions about the future. The role sits somewhere in between the previous two. It is less technically demanding than the data engineer role but still requires a strong understanding of programming and computer science. It is not as concerned with data reporting as the analyst role, but still requires a strong understanding of the domain. The workflow is more like a scientific process, hence the name. A data scientist might construct a hypothesis based on observations or theory, then use the data to try to falsify that hypothesis and draw conclusions. Examples include using statistical analysis to model consumer psychology and forecast future demand, or exploring geospatial time series data to see how economic conditions have changed over time in different boroughs of London. The role also involves controlled experiments, such as A/B testing and hypothesis testing. Common languages are Python, R, Scala and Julia. The final deliverable will be a report, a presentation or a predictive model. Often these positions require an advanced degree, such as an MSc or PhD, in a numerical STEM field (maths, stats, CompSci, engineering, physics) or in economics, psychology etc.
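
To make the A/B testing idea above concrete, here is a minimal sketch of a two-sided two-proportion z-test in Python, using only the standard library. The conversion counts and the 0.05 significance threshold are assumptions chosen for illustration; a real experiment would also need a sensible design (sample size, randomisation) that this sketch does not cover.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    Null hypothesis: variants A and B convert at the same rate.
    Returns the z statistic and the p-value.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate, valid under the null hypothesis.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: 1000 visitors per variant, invented conversion counts.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
reject_null = p_value < 0.05  # assumed significance level

print(f"z = {z:.2f}, p = {p_value:.3f}, reject null: {reject_null}")
```

A small p-value means the observed difference would be unlikely if both variants really converted at the same rate, which is the falsification step in the workflow described above.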


Machine Learning Scientist / Research Scientist

Researches different ML models and algorithms to figure out which would be best for solving a given problem. Who's got TikTok? Its recommendation algorithm is an example of reinforcement learning: in academic terminology, the algorithm is a learner that learns what you like based on the actions you take, such as like, share, favourite or scroll. Netflix, YouTube and Spotify recommendations are likewise powered by similar technology. The final output is a well-formulated model that is ready to be productionised by machine learning engineers. Other popular areas of research are computer vision for self-driving cars and natural language processing for Amazon Alexa and Siri. Typically this is another role that requires a specialised advanced degree, such as an MSc or PhD specifically in CompSci, Applied Maths/Stats, Machine Learning, Physics or Engineering.
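
The idea of a learner that adapts to user actions can be sketched with an epsilon-greedy multi-armed bandit, one of the simplest reinforcement learning setups. This is a toy illustration only and makes no claim about how TikTok or any other real recommender works. The video categories, the reward assigned to each action and the simulated user are all hypothetical.

```python
import random

# Hypothetical reward per user action. The numbers are invented.
ACTION_REWARD = {"scroll": 0.0, "like": 1.0, "favourite": 1.5, "share": 2.0}


class EpsilonGreedyRecommender:
    """Toy learner: tracks the average reward seen for each category."""

    def __init__(self, categories, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)  # seeded so runs are repeatable
        self.counts = {c: 0 for c in categories}
        self.values = {c: 0.0 for c in categories}  # running mean reward

    def recommend(self):
        # Explore a random category with probability epsilon,
        # otherwise exploit the best category seen so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def update(self, category, action):
        # Incremental update of the mean reward for this category.
        reward = ACTION_REWARD[action]
        self.counts[category] += 1
        n = self.counts[category]
        self.values[category] += (reward - self.values[category]) / n


def simulated_user(category):
    """Hypothetical user who likes cooking videos and scrolls past the rest."""
    return "like" if category == "cooking" else "scroll"


learner = EpsilonGreedyRecommender(["sport", "music", "cooking"], epsilon=0.2, seed=42)
for _ in range(500):
    shown = learner.recommend()
    learner.update(shown, simulated_user(shown))

best = max(learner.values, key=learner.values.get)
print(best, learner.counts)
```

After enough interactions the learner mostly shows the category the simulated user responds to, while the occasional random pick lets it notice if tastes change.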


Machine Learning Engineer

Another technical role. The machine learning engineer works on deploying and maintaining the models that are theorised, researched and formulated by the ML scientist. They need to figure out how to feed data into the model so that it can operate in real time. The role is like a specialised kind of data engineer. They build the end-to-end machine learning pipeline that takes raw data, extracts it, transforms it, loads it into the machine learning model, and produces useful output.
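
The extract, transform, load and predict steps described above can be sketched as a small Python pipeline. Everything here is an assumption made for illustration: the raw CSV columns, the feature names and especially `placeholder_model`, which is a hand-written stand-in for a trained model handed over by an ML scientist. A production pipeline would also need monitoring, scheduling and error handling that are left out.

```python
import csv
import io

# Hypothetical raw data. Column names and values are invented.
RAW_CSV = """user_id,watch_seconds,liked
1,30,yes
2,5,no
3,,yes
4,45,yes
"""


def extract(raw_text):
    """Extract: parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop incomplete rows and convert fields to numeric features."""
    features = []
    for row in rows:
        if not row["watch_seconds"]:
            continue  # skip records with a missing watch time
        features.append({
            "user_id": int(row["user_id"]),
            "watch_minutes": int(row["watch_seconds"]) / 60,
            "liked": 1 if row["liked"] == "yes" else 0,
        })
    return features


def placeholder_model(feature):
    """Stand-in for a trained model: returns a made-up engagement score."""
    return 0.5 * feature["liked"] + 0.5 * min(feature["watch_minutes"], 1.0)


def run_pipeline(raw_text, model):
    """Load the features into the model and collect one score per user."""
    rows = extract(raw_text)
    features = transform(rows)
    return {f["user_id"]: model(f) for f in features}


predictions = run_pipeline(RAW_CSV, placeholder_model)
print(predictions)
```

Keeping each stage as its own function means the model can be swapped out without touching the extraction or transformation code.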