Serving a simple model

On this page you will learn how to deploy your first model on the Hydrosphere platform. We will start from scratch and create a simple linear regression model that will fit our randomly generated data, with some noise added to it. After the training step, we will pack the model, deploy it to the platform and invoke it locally with a sample client.


You can find other examples of how to deploy various models in our examples repository.

Before you start

We assume you already have a deployed instance of the Hydrosphere platform and the hs CLI installed on your local machine.

To let hs know where the Hydrosphere platform runs, configure a new cluster entity.

$ hs cluster add --name local --server http://localhost
$ hs cluster use local

Training a model

We can now start working with the linear regression model. It is a fairly simple model that fits randomly generated regression data with some noise added to it. For data generation we will use the sklearn.datasets.make_regression method. We will also normalize the data to the [0, 1] range. The model will be built using the Keras library with a TensorFlow backend.

First of all, create a directory for the model and add a training script to it.

$ mkdir linear_regression
$ cd linear_regression
$ touch

The model will consist of 3 fully-connected layers: the first two have a ReLU activation function and 4 units each, and the third is a summing unit with linear activation. Put the following code in your file.

from keras.models import Sequential
from keras.layers import Dense
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# initialize data
n_samples = 1000
X, y = make_regression(n_samples=n_samples, n_features=2, noise=0.5, random_state=112)

scaler_x, scaler_y = MinMaxScaler(), MinMaxScaler(), 1))
X = scaler_x.transform(X)
y = scaler_y.transform(y.reshape(n_samples, 1))

# create a model
model = Sequential()
model.add(Dense(4, input_dim=2, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='linear'))

model.compile(loss='mse', optimizer='adam'), y, epochs=100)

# save the model'model.h5')

We have not yet installed the necessary libraries for our model. In your linear_regression folder, create a requirements.txt file with the following contents (exact versions are up to you; the serving code below relies on TensorFlow 1.x for tf.get_default_graph):

keras
tensorflow<2.0
numpy
scikit-learn
Install all dependencies to your local environment.

$ pip install -r requirements.txt

Train the model.

$ python

As soon as the script finishes, you will get a model saved to a model.h5 file.

Model preparation

Every model in a cluster is deployed as an individual container. After a request is sent from the client application, it is passed to the appropriate Docker container with your model deployed on it. An important detail is that all model files are stored in the /model/files directory inside the container, so we will look there to load the model.

To run this model we will use a Python runtime that can execute any Python code you provide. Preparation of a model is pretty straightforward, though you have to follow some rules:

  1. Stick to the specific folder structure to let hs parse and upload it correctly;
  2. Provide necessary dependencies with the requirements.txt;
  3. Provide a contract file to let the model manager understand the model’s inputs and outputs.

We will begin with the main functional file,

$ mkdir src
$ cd src
$ touch

Hydrosphere communicates with the model using TensorProto messages. If you want to perform a transformation on a received TensorProto message, you have to retrieve its contents, transform them, and pack the result back into a TensorProto message. To do that, define a function that will be invoked every time Hydrosphere handles a request and passes it to the model. Inside that function, call the predict (or similar) method of your model and return your predictions.

import numpy as np
import tensorflow as tf
import hydro_serving_grpc as hs
from keras.models import load_model

# 0. Load model once
model = load_model('/model/files/model.h5')
graph = tf.get_default_graph() 

def infer(x):
    # 1. Retrieve tensor's content and put it to numpy array
    data = np.array(x.double_val)
    data = data.reshape([dim.size for dim in x.tensor_shape.dim])

    # 2. Make a prediction
    with graph.as_default():
        result = model.predict(data)
    # 3. Pack the answer
    y_shape = hs.TensorShapeProto(dim=[hs.TensorShapeProto.Dim(size=-1)])
    y_tensor = hs.TensorProto(
        dtype=hs.DT_DOUBLE,
        double_val=result.flatten(),
        tensor_shape=y_shape)

    # 4. Return the result
    return hs.PredictResponse(outputs={"y": y_tensor})

We initialize the model outside of the serving function so that loading is not triggered every time a new request comes in; this happens at step (0). The serving function infer takes the actual request, unpacks it (1), makes a prediction (2), packs the answer back (3) and returns it (4). There is no strict rule for naming the function, it just has to be a valid Python function name.

If you are wondering how Hydrosphere will understand which function to call from the file, the answer is that we have to provide a contract. A contract is a file that defines the inputs and outputs of the model, a signature function, and some other metadata required for serving. Go to the root directory of the model and create a serving.yaml file.

$ cd ..
$ touch serving.yaml

kind: Model
name: linear_regression
runtime: "hydrosphere/serving-runtime-python-3.6:2.3.1"
install-command: "pip install -r requirements.txt"
  - "src/"
  - "requirements.txt"
  - "model.h5"

  name: infer
      shape: [-1, 2]
      type: double
      profile: numerical
      shape: [-1]
      type: double
      profile: numerical

Here you can see that we have provided requirements.txt and model.h5, along with the src/ directory, as payload files for our model.

The overall structure of our model now should look like this:

├── model.h5
├── requirements.txt
├── serving.yaml
├── src
│   └──

Although we have inside the directory, it will not be uploaded to the cluster, since we did not specify it in the contract’s payload.

Serving the model

Now we can upload the model. Inside the linear_regression directory, execute the following command:

$ hs -v upload

The -v flag stands for verbose output.

You can open the http://localhost/models page to see the uploaded model.

Once you have opened it, you can create an application for it. Basically, an application represents an endpoint to your model, so you can invoke it from anywhere. To learn more about advanced features, go to the Applications page.

Open http://localhost/applications and press the Add New Application button. In the opened window select the linear_regression model, name your application linear_regression and click the creation button.

If you cannot find your newly uploaded model even though it is listed on your models page, it is probably still in the building stage. Wait until the model changes its status to Released, then you can use it.

Invoking an application

Applications can be invoked via several interfaces.

Test request

You can perform a test request to the model from the Hydrosphere UI. Open the desired application and press the Test button. Internally it will generate arbitrary input data from the model’s contract and send an HTTP request to the application’s endpoint.

HTTP request

Send an HTTP POST request.

$ curl --request POST --header 'Content-Type: application/json' --header 'Accept: application/json' \
    --data '{"x": [[1, 1],[1, 1]]}' 'http://localhost/gateway/application/linear_regression'

For more information about invoking applications, refer to this page.

gRPC API call

Define a gRPC client on your side and make a call from it.

import grpc 
import hydro_serving_grpc as hs  # pip install hydro-serving-grpc

# connect to your Hydrosphere instance
channel = grpc.insecure_channel("localhost")
stub = hs.PredictionServiceStub(channel)

# 1. define the model that you'll use
model_spec = hs.ModelSpec(name="linear_regression", signature_name="infer")
# 2. define tensor_shape for Tensor instance
tensor_shape = hs.TensorShapeProto(
    dim=[hs.TensorShapeProto.Dim(size=2), hs.TensorShapeProto.Dim(size=2)])
# 3. define tensor with needed data
tensor = hs.TensorProto(dtype=hs.DT_DOUBLE, tensor_shape=tensor_shape, double_val=[1,1,1,1])
# 4. create PredictRequest instance
request = hs.PredictRequest(model_spec=model_spec, inputs={"x": tensor})

# call Predict method
result = stub.Predict(request)