Getting Started

In this page you’ll learn how to deploy your first model on Hydrosphere Serving. We will start from scratch: create a simple linear regression model that learns to fit randomly generated data with some noise added to it, pack it, deploy it to Serving, and then call it from a separate program.


We assume that you’ve already installed a Serving instance on your working machine and the CLI tool on the local machine from which you’ll upload models. These don’t have to be separate machines, but in practice they usually are.

Setup a cluster

Once you’ve installed the CLI tool, you’ll need to define the cluster you’ll be working with.

$ hs cluster add --name local --server http://localhost

Pass the address of your Serving instance as the --server parameter; in our case it’s localhost. If this is your first cluster, hs will use it by default; otherwise, switch to it manually.

$ hs cluster use local

Training a model

Now we can start working on our linear regression model. It’s a fairly simple model that learns to fit randomly generated regression data with some noise added to it. For data generation we will use sklearn.datasets.make_regression. We will also scale the data to the range [0, 1]. For building the model we will use the Keras library with the TensorFlow backend.
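As a quick refresher, min-max scaling maps each value v to (v - min) / (max - min) within its feature column, which is exactly what MinMaxScaler does; here is a plain-Python sketch of that mapping for illustration:

```python
# Min-max scaling: map each value into [0, 1] relative to the column's range
raw = [2.0, 4.0, 10.0]
lo, hi = min(raw), max(raw)
scaled = [(v - lo) / (hi - lo) for v in raw]
print(scaled)  # → [0.0, 0.25, 1.0]
```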

Create a directory for the model and add a training script (we’ll call it inside it.

$ mkdir linear_regression
$ cd linear_regression
$ touch

The model will consist of 3 fully-connected layers: the first two have a ReLU activation function and 4 units each, and the last one is a single summing unit with a linear activation. Put the following code in

from keras.models import Sequential
from keras.layers import Dense
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# initialize data
n_samples = 1000
X, y = make_regression(n_samples=n_samples, n_features=2, noise=0.5, random_state=112)

scallar_x, scallar_y = MinMaxScaler(), MinMaxScaler()
X = scallar_x.fit_transform(X)
y = scallar_y.fit_transform(y.reshape(n_samples, 1))

# create a model
model = Sequential()
model.add(Dense(4, input_dim=2, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='linear'))

model.compile(loss='mse', optimizer='adam')
model.fit(X, y, epochs=100)

# save the model'model.h5')

Run the script; training takes about a minute and creates a new file with the model’s weights called model.h5.

$ python
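Note that because y was scaled, the model’s predictions are also in the [0, 1] range; to get back to the original units you would invert the min-max mapping (scallar_y.inverse_transform does this for you). A plain-Python sketch of the inversion, with an illustrative range:

```python
# Invert min-max scaling: v_orig = v_scaled * (max - min) + min
y_min, y_max = -50.0, 150.0   # illustrative range remembered by the scaler
scaled_pred = 0.25            # illustrative scaled prediction
original = scaled_pred * (y_max - y_min) + y_min
print(original)  # → 0.0
```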

Preparing the model

Serving runs all models as Docker containers. Every time it handles a request, it passes the request to the appropriate Docker container with your model deployed in it. An important detail is that all of the model’s files are stored in the /model/files directory inside the container, so that’s where we will look in order to load the model.

For running this model we will use a Python runtime, which essentially runs any Python code you pass to it. Preparing the model is straightforward, though you have to follow a few rules:

  1. Stick to the specific folder structure so that hs can parse and upload the model correctly;
  2. Provide the necessary dependencies in requirements.txt;
  3. Provide a contract file so that Serving understands the model’s inputs and outputs.

We will start with the main functional file, src/

$ mkdir src
$ cd src
$ touch

Serving communicates with the model via TensorProto messages. For Python models, if you want to perform a transformation on a received TensorProto message, you have to retrieve its content, act on it, and pack the result back into a TensorProto message. To do that, define a function that will be invoked every time Serving handles a request and passes it to the model. Inside the function, call the predict (or similar) method of your model and return your predictions.

import numpy as np
import hydro_serving_grpc as hs
from keras.models import load_model

# 0. Load model once
model = load_model('/model/files/model.h5')

def infer(x):
    # 1. Retrieve tensor's content and put it to numpy array
    data = np.array(x.double_val)
    data = data.reshape([dim.size for dim in x.tensor_shape.dim])

    # 2. Make a prediction
    result = model.predict(data)
    # 3. Pack the answer
    y_shape = hs.TensorShapeProto(dim=[hs.TensorShapeProto.Dim(size=-1)])
    y_tensor = hs.TensorProto(
        dtype=hs.DT_DOUBLE,
        tensor_shape=y_shape,
        double_val=result.flatten().tolist())

    # 4. Return the result
    return hs.PredictResponse(outputs={"y": y_tensor})

Since we need to initialize the model only once, we do that outside of the signature function (step 0), so it isn’t reloaded every time a request comes in. The signature function infer takes the actual request, unpacks it (1), makes a prediction (2), packs the answer back (3) and returns it (4). There’s no strict rule on how to name your signature function; it just has to be a valid Python function name (since we use a Python runtime).
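The unpacking in step (1) turns the flat double_val list back into the shape described by tensor_shape; for example, four values with shape [-1, 2] become two rows of two. A plain-Python sketch of the same regrouping that numpy’s reshape performs:

```python
# Regroup a flat list of values into rows, as reshape does in step (1)
flat = [1.0, 2.0, 3.0, 4.0]   # contents of double_val
cols = 2                      # fixed second dimension from tensor_shape [-1, 2]
rows = len(flat) // cols
data = [flat[i * cols:(i + 1) * cols] for i in range(rows)]
print(data)  # → [[1.0, 2.0], [3.0, 4.0]]
```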

If you’re wondering how Serving will know which function to call from the file we provided, the answer is simple — we have to provide a contract. A contract is a file that defines the model’s inputs and outputs, its signature functions, and some other metadata required for serving. Go to the root directory of the model and create a serving.yaml file.

$ cd ..
$ touch serving.yaml

kind: Model
name: linear_regression
model-type: python:3.6
payload:
  - "src/"
  - "requirements.txt"
  - "model.h5"

contract:
  infer:                      # Signature function
    inputs:
      x:                      # Input field
        shape: [-1, 2]
        type: double
        profile: numerical
    outputs:
      y:                      # Output field
        shape: [-1]
        type: double
        profile: numerical

Here you can also see that we’ve specified requirements.txt and model.h5 as payload files for our model. We haven’t created requirements.txt yet, so let’s do that too.
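A minimal requirements.txt for this model needs the libraries our code imports: Keras with a TensorFlow backend, plus numpy for unpacking tensors. The version pins below are illustrative — pin the versions you actually trained with:

```
keras==2.2.4
tensorflow==1.13.1
numpy==1.16.2
```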


Overall structure of our model now should look like this:

├── model.h5
├── requirements.txt
├── serving.yaml
├── src
│   └──
└──

Note: although we have in the model directory, it won’t be uploaded to Serving, since we didn’t specify it in the contract’s payload.

Serving the model

Now we can upload the model.

$ hs upload

You can open http://localhost/models page to see the uploaded model.

Once you’ve done that, you can create an application for it. Basically, an application represents a final endpoint to your model, so you can invoke it from the outside. To learn more about advanced features, go to the Applications page.

Open http://localhost/applications and press the Add New button. In the opened window select the linear_regression model, select hydrosphere/serving-runtime-python as the runtime, and create the application.

If you cannot find your newly uploaded model even though it’s listed on your models page, it’s still in the building stage. Wait until the model’s status changes to Released; after that you can use it.

Invoking an application

Applications can be invoked via several different interfaces.

Test request

You can perform a test request to the model from the UI. Open the desired application and press the Test button. Internally this generates arbitrary input data from the model’s contract and sends an HTTP request to the application’s endpoint.

HTTP request

Send a POST request:

$ curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{
"x": [[1, 1],[1, 1]]}' 'http://localhost/gateway/applications/linear_regression/infer'

gRPC API call

You can define a gRPC client on your side and make a call from it. Here we provide a Python example, but this can be done in any supported language.

import grpc 
import hydro_serving_grpc as hs

# connect to your Serving instance
channel = grpc.insecure_channel("localhost")
stub = hs.PredictionServiceStub(channel)

# 1. define a model, that you'll use
model_spec = hs.ModelSpec(name="linear_regression", signature_name="infer")
# 2. define tensor_shape for Tensor instance
tensor_shape = hs.TensorShapeProto(dim=[hs.TensorShapeProto.Dim(size=-1), hs.TensorShapeProto.Dim(size=2)])
# 3. define tensor with needed data
tensor = hs.TensorProto(dtype=hs.DT_DOUBLE, tensor_shape=tensor_shape, double_val=[1,1,1,1])
# 4. create PredictRequest instance
request = hs.PredictRequest(model_spec=model_spec, inputs={"x": tensor})

# call Predict method
result = stub.Predict(request)

Note: for convenience we’ve already compiled all our proto files into a Python library and published it on PyPI. You can install it with pip install hydro-serving-grpc.

What’s next?