How do you implement an ML model into production?

Answer by Yassine Alouini:

In order for an ML model to work in a production setting, you need to answer some of the following questions:

  1. Is your model trained?
  2. If it isn’t, how do you train it?
  3. How do you serve the predictions to your clients?
  4. How do you update your model?
  5. How do you version different variations of the same model?
  6. How do you (continuously) optimize hyperparameters for your model?
  7. How do you store metadata about different models?
  8. What happens when a model takes too much time to train?

Let me answer some of these by breaking the process of putting a model into production into three different phases. I will mainly focus on high-level concepts rather than the implementation details.

————————————————————————————————————————

Training a model

In order to train an ML model, you first need to choose one after going through an experimentation phase. This phase won’t be detailed here since it isn’t part of the “production” step.

Once you have chosen an ML model (let’s say an ensemble of decision trees), you need to collect data (both features and targets), process it and then apply the model.

The output would be a trained model that you will store somewhere.

You will also need to store metadata about the model: how long it took to train, the features and targets used to train it, the model type, the author(s) of the model, a creation timestamp, a version number, the packages used, where it is stored, and so on.

This metadata is essential to ensure that the model is reproducible and maintainable, and that it can be recovered in case of problems. It also eases the deployment step (more about this in the next section).

The training process is usually done offline (since it takes a lot of time and requires a lot of computation resources).
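As a sketch, the training and storing steps above might look like the following (the model choice follows the ensemble-of-decision-trees example; the file names, feature names, and metadata fields are illustrative, not a fixed schema):

```python
import json
import time

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data standing in for the collected features and targets.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Train the chosen model and record how long it took.
start = time.time()
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)
training_seconds = time.time() - start

# Persist the trained model artifact...
joblib.dump(model, "churn_model_v1.joblib")

# ...together with the metadata that makes it reproducible.
metadata = {
    "model_type": "RandomForestClassifier",
    "version": "1.0.0",
    "features": [f"f{i}" for i in range(10)],
    "target": "churned",
    "author": "data-team",
    "created_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    "training_seconds": round(training_seconds, 2),
    "artifact_path": "churn_model_v1.joblib",
}
with open("churn_model_v1.json", "w") as f:
    json.dump(metadata, f, indent=2)
```

In a real system the metadata would live in a database or model registry rather than a JSON file next to the artifact, but the principle is the same.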

Deploying a model

Now that you have a model (or many), you need to deploy it by creating a deployment service.

A common strategy is to build a REST API with a GET route that takes model and feature identifiers as input and returns the predictions. This layer can, of course, be more or less complex.

This API will act as a bridge between where you need the predictions (for example in a mobile application or website) and where they are produced (for example in your cloud infrastructure).

Concretely, the client sends the ids (model and features), the model server fetches the correct model (using the stored metadata) and features, and then produces the predictions.

Finally, these will be served back to the client.

A cache (preferably in memory) is needed to store the predictions and serve them immediately if they are requested again. These stored predictions need some metadata as well, so that you know when to invalidate the cache (anywhere from a few hours to a few weeks).
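A minimal in-memory cache with a time-to-live along those lines might look like this (the six-hour TTL and the cache keys are illustrative choices):

```python
import time


class PredictionCache:
    """Minimal in-memory cache with a per-entry time-to-live."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (prediction, stored_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        prediction, stored_at = entry
        if time.time() - stored_at > self.ttl:
            # Entry is stale: invalidate it and report a miss.
            del self._store[key]
            return None
        return prediction

    def set(self, key, prediction):
        self._store[key] = (prediction, time.time())


# Cache predictions for six hours, keyed by (model id, feature set id).
cache = PredictionCache(ttl_seconds=6 * 3600)
cache.set(("churn_v1", "user_42"), 0.87)
```

In practice a shared store such as Redis or Memcached would replace the plain dictionary, so that every instance of the prediction service sees the same cache.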

Monitoring a model

A lot of monitoring is needed to ensure that the model is:

  1. Statistically performant
  2. Computationally performant
  3. Reliable

The first point is necessary to ensure that the model creates value for the client.

Imagine that we have trained a very complex deep learning model. Without monitoring its performance on production data, it is impossible to assess whether it is useful for the client.

Besides, this monitoring provides useful feedback for the model design phase. Maybe we should tune hyperparameters better in the next design iteration? Or use more features?
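As a sketch, such a statistical health check could compare a live metric against the offline baseline (the choice of AUC as the metric, the 0.90 baseline, and the 0.05 tolerance below are illustrative assumptions):

```python
from sklearn.metrics import roc_auc_score


def check_model_health(y_true, y_scores, baseline_auc, tolerance=0.05):
    """Compare the live AUC against the offline baseline and flag degradation."""
    live_auc = roc_auc_score(y_true, y_scores)
    degraded = bool(live_auc < baseline_auc - tolerance)
    return live_auc, degraded


# Labels and scores as they would be collected from production traffic
# (toy values; the baseline would come from the offline evaluation).
live_auc, degraded = check_model_health(
    y_true=[0, 1, 1, 0, 1, 0],
    y_scores=[0.2, 0.9, 0.7, 0.4, 0.8, 0.1],
    baseline_auc=0.90,
)
```

A degraded flag would typically trigger an alert or a retraining job rather than just a return value.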

The second point is essential to ensure that the model delivers predictions within a reasonable time.

A statistically performant model that takes forever to return a result is useless (at least from a business perspective). This is why we need caches, precomputed features, the ability to scale computation resources when needed, and so on.
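Computational performance can be tracked by timing each prediction call; a minimal sketch (the 200 ms budget is an arbitrary example, and the model call is a stand-in):

```python
import functools
import time

SLOW_THRESHOLD_MS = 200  # illustrative latency budget


def timed(fn):
    """Record each call's latency and flag calls over the budget."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000
        wrapper.latencies.append(elapsed_ms)
        if elapsed_ms > SLOW_THRESHOLD_MS:
            print(f"slow prediction: {elapsed_ms:.1f} ms")
        return result
    wrapper.latencies = []
    return wrapper


@timed
def predict(features):
    # Stand-in for a real model call.
    return sum(features) > 1.0


predict([0.4, 0.9])
```

In a real setup these timings would be shipped to a metrics system (percentile dashboards, alerting) rather than kept in a Python list.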

The third point is here to guarantee that the model can be relied upon.

Imagine that you have a very performant model (both statistically and computationally) that is only available once a week. Would you use it?

————————————————————————————————————————

Lately, I have come across this article from Zendesk[1] which details their process for putting a deep learning model (a TensorFlow[2] model, to be more specific) into production.

You should check it out if you have some time: How Zendesk Serves TensorFlow Models in Production – Zendesk Engineering

As you can see, this is a tough challenge that requires many engineering teams (web, data engineering, systems, machine learning…) working together to tackle it efficiently.

An interesting and fun one nonetheless.

I hope this helps.

Footnotes

[1] Zendesk | Customer Service Software & Support Ticket System

[2] TensorFlow

Selecting a loss function in machine learning

Which loss function to use when:

| Loss function | When to use | Example |
| --- | --- | --- |
| Squared loss | Minimizing an expectation in regression problems | What is the expected return on a particular stock? |
| Classic loss | Squared loss without weights | |
| Quantile loss | Regression problems targeting a quantile | Predicting house prices |
| Hinge loss | Classification problems framed as yes/no questions | Keyword_tag or not |
| Log loss | Classification problems where a probability is used as a threshold for selecting the output | Probability of clicking on an ad |