TensorFlow Model Serving

Hopsworks supports TensorFlow Serving, a flexible, high-performance serving system for machine learning models, designed for production environments.

Export your model

The first step to serving your model is to export it as a servable model. This is typically done using the SavedModelBuilder after having trained your model. For more information please see: https://www.tensorflow.org/serving/serving_basic

Model Serving in Hopsworks

Step 1.

The first step is to train and export a servable TensorFlow model to your Hopsworks project.

To demonstrate this, we provide an example notebook, which is also included in the TensorFlow tour: https://github.com/logicalclocks/hops-examples/blob/master/notebooks/ml/Serving/tensorflow/train_and_export_model.ipynb

In order to serve a TensorFlow model on Hopsworks, the .pb file and the variables folder should be placed in the Models dataset in your Hopsworks project. Inside the dataset, the folder structure should mirror what is expected by TensorFlow Serving.

└── mnist
    ├── 1
    │   ├── saved_model.pb
    │   └── variables
    │       ├── variables.data-00000-of-00001
    │       └── variables.index
    └── 2
        ├── saved_model.pb
        └── variables
            ├── variables.data-00000-of-00001
            └── variables.index

TensorFlow Serving expects the model directory (in this case mnist) to contain one or more sub-directories. The name of each sub-directory is a number representing the version of the model: the higher the number, the more recent the model. Inside each version directory, TensorFlow Serving expects a file named saved_model.pb, which contains the model graph, and a directory called variables, which contains the weights of the model.
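The layout rules above can be checked before creating a serving. The following is a minimal sketch using only the Python standard library. The function names find_model_versions and latest_version are hypothetical helpers written for this example, not part of Hopsworks or TensorFlow Serving, and the sketch assumes the model directory is reachable as a local path.

```python
from pathlib import Path


def find_model_versions(model_dir):
    """Return the sorted version numbers found in a model directory.

    A sub-directory counts as a version only if its name is a number and it
    contains both a saved_model.pb file and a variables directory.
    """
    versions = []
    for child in Path(model_dir).iterdir():
        name = child.name
        # Version directories are named with plain ASCII digits, e.g. "1", "2".
        if not (child.is_dir() and name.isascii() and name.isdigit()):
            continue
        has_graph = (child / "saved_model.pb").is_file()
        has_weights = (child / "variables").is_dir()
        if has_graph and has_weights:
            versions.append(int(name))
    return sorted(versions)


def latest_version(model_dir):
    """Return the highest valid version number, or None if there is none."""
    versions = find_model_versions(model_dir)
    return versions[-1] if versions else None
```

For a local copy of the mnist directory shown above, find_model_versions would return [1, 2] and latest_version would return 2.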

Step 2.

To start serving your model, create a serving definition, either in the Hopsworks Model Serving service or with the Python API.

To use the Model Serving service, select it in the left panel (1) and then click on Create new serving (2).

New serving definition

Next, click on the model button to select the model you want to serve from your project.

Create serving

This will open a popup window that will allow you to browse your project and select the directory containing the model you want to serve. You should select the model directory, meaning the directory containing the sub-directories with the different versions of your model. In the example below we have exported two versions of the mnist model. In this step we select the mnist directory containing the two versions. The select button will be enabled (it will turn green) when you browse into a valid model directory.

Select model directory

After you click select, the popup window closes and the information in the create serving menu is filled in automatically. By default, Hopsworks picks the latest available version to serve. You can switch to a specific version using the dropdown menu. You can also change the name of the model. Remember that model names must be unique within your project.

Select the version

By clicking on Advanced you can access the advanced configuration for your serving instance. In particular, you can configure the Kafka topic to which the inference requests will be logged (see the Inference documentation for more information). By default, a new Kafka topic is created for each new serving (CREATE). You can disable logging of inference requests by selecting NONE from the dropdown menu. You can also reuse an existing Kafka topic as long as its schema meets the requirements of the inference logger.

At this stage you can also configure the TensorFlow Serving server to process the requests in batches.

Advanced configuration

Finally click on Create Serving to create the serving instance.

To use the Python API, import the serving module from the hops library (API-Docs-Python) and use its helper functions.

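from hops import model
from hops import serving

# Export the model stored under Resources/mnist/ as version 2 of "mnist".
model_path = "Resources/mnist/"
model.export(model_path, "mnist", model_version=2, overwrite=True)

# The exported version is then available under Models/mnist/2/.
model_path = "Models/mnist/2/"
if serving.exists("mnist"):
    print("Serving 'mnist' already exists and will be updated.")
serving.create_or_update_serving(model_path, "mnist", serving_type="TENSORFLOW", model_version=2)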

Step 3.

Once the serving instance has been created, a new entry is added to the list.

Start the serving

Click on the Run button to start the serving instance. After a few seconds the instance will be up and running, ready to start processing incoming inference requests.

You can check the logs of the TensorFlow Serving instance by clicking on the logs button. This will bring you to the Kibana UI, where you can see whether the serving instance managed to load the model correctly.

Log button

Kibana UI

Step 4.

After a while your model will become stale, and you will have to re-train it and export it again. To update your serving instance to serve the newer version of the model, click on the edit button. You don't need to stop your serving instance: you can update the model version while the server is running.

Update the serving instance

From the dropdown menu you can select the newer version (1) and click Update serving (2). After a couple of seconds the model server will be serving the newer version of your model.

Update the version

Where do I go from here?

Take a look at the Inference documentation to see how you can send inference requests to the server hosting your model.
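As a preview, here is a minimal sketch of how the body of a prediction request might be built. It assumes that the serving accepts the row ("instances") format of the TensorFlow Serving REST API. The function name build_predict_request is a hypothetical helper, and the signature name and the 784-value input are assumptions about how the mnist model was exported. The endpoint and authentication are not shown; refer to the Inference documentation for those.

```python
import json


def build_predict_request(instances, signature_name="serving_default"):
    """Serialize a predict request body in the TensorFlow Serving row format."""
    if not instances:
        raise ValueError("at least one instance is required")
    payload = {"signature_name": signature_name, "instances": instances}
    return json.dumps(payload)


# Placeholder input: one flattened 28x28 image with every pixel set to 0.0.
# The input shape is an assumption; use whatever your exported model expects.
blank_image = [0.0] * 784
body = build_predict_request([blank_image])
```

The resulting body is a JSON string that can be sent as the payload of a predict request.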