Data scientists and data engineers: how does that all fit together?
You have to look at the data science process to understand the data engineer's role:
how things are created, how data science is done, and how machine learning is done.
This is a two-part blog post. Below, we look only at the machine learning process.
In the next post we will also look at the data needed to train and apply models. Then the data engineer's role will make total sense.
The Training Phase
The machine learning process starts with a training phase, in which you train the algorithm to produce the right output.
In the training phase you have two inputs: the input parameters, which are basically the configuration of the model, and the input data.
Then you train the algorithm. While training, the algorithm modifies the training parameters. The data used can be modified as well, and at the end you get an output.
Once you get an output, you evaluate it. Is that output okay, or is it not the desired output?
If the output is not what you were looking for, you continue with the training phase.
You retrain the model hundreds, thousands, or hundreds of thousands of times. Of course, all of this is done automatically.
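To make the train, evaluate, repeat loop a bit more concrete, here is a minimal sketch in Python. It is not a real machine learning setup: the "model" is a single weight for y = weight * x, and the names train_step, evaluate, and train, the learning rate, the target error, and the toy data are all assumptions made up for illustration.

```python
def train_step(weight, data, learning_rate):
    """One training pass: nudge the weight so the squared error gets smaller."""
    gradient = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
    return weight - learning_rate * gradient


def evaluate(weight, data):
    """Mean squared error between the model output and the desired output."""
    return sum((weight * x - y) ** 2 for x, y in data) / len(data)


def train(data, learning_rate=0.01, target_error=1e-6, max_rounds=100_000):
    """Keep training until the evaluation says the output is good enough."""
    weight = 0.0  # starting value of the trainable parameter (assumed)
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        weight = train_step(weight, data, learning_rate)
        if evaluate(weight, data) <= target_error:
            break  # output is okay, stop training
    return weight, rounds


# Toy training data (hypothetical): the desired output is 3 times the input.
training_data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]
weight, rounds = train(training_data)
print(f"learned weight {weight:.3f} after {rounds} rounds")
```

The loop stops on its own once the error is small enough, which is the "all of this is done automatically" part.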
Putting the Model into Production
Once you are satisfied with the output, you put the model into production. In production it is no longer fed with training data; it is fed with live data.
It evaluates the input data live and puts out live results.
So, you went from training to production, and then what?
What you do is monitor the output. If the output keeps making sense, all good!
If the output of the model changes and is no longer what you expect, it means the model doesn't work anymore.
You need to trigger a retraining of the model. It basically goes through training again.
Once you are satisfied with the output again, you put it into production again. It replaces the one in production.
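The monitor, retrain, replace cycle can be sketched in a few lines of Python as well. This is only an illustration under some assumptions: the model is again a single weight for y = weight * x, the actual outcomes for the live data are assumed to become available so the output can be checked, and the names fit, mean_error, and monitor, the max_error threshold, and the batches are all hypothetical.

```python
def fit(data):
    """Least-squares fit of y = weight * x (a stand-in for a full training run)."""
    return sum(x * y for x, y in data) / sum(x * x for x, _ in data)


def mean_error(weight, data):
    """Average absolute gap between the model output and the observed outcome."""
    return sum(abs(weight * x - y) for x, y in data) / len(data)


def monitor(weight, live_batches, max_error=0.5):
    """Check every live batch; retrain and swap the model when the output drifts."""
    retrained_on = []
    for index, batch in enumerate(live_batches):
        if mean_error(weight, batch) > max_error:
            weight = fit(batch)  # retrain on recent data, replace the old model
            retrained_on.append(index)
    return weight, retrained_on


# Initial training, then the model goes into production.
production_weight = fit([(1.0, 3.0), (2.0, 6.0)])

# Hypothetical live data with observed outcomes, arriving batch by batch.
live_batches = [
    [(1.0, 3.1), (2.0, 5.9)],   # output still makes sense
    [(1.0, 5.0), (2.0, 10.0)],  # the relationship changed, model no longer fits
    [(1.0, 5.1), (3.0, 14.9)],  # the retrained model fits again
]
production_weight, retrained_on = monitor(production_weight, live_batches)
print(f"retrained on batches {retrained_on}, weight is now {production_weight}")
```

In a real setup the retraining would run through the whole training phase again and be checked before the swap; the sketch skips that to keep the cycle visible.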
This is the overall process of how machine learning works. It's how the learning part of data science is done.