Monitor anomalies with a custom Isolation Forest metric

On this page you will learn how to create a custom anomaly detection metric for a specific use case.


For this use case we have chosen a sample regression problem. We will monitor a model that predicts how many taxi pickups will occur in the next hour, based on observations from the past 5 hours. As a data source we will use a dataset from the NYC Taxi & Limousine Commission.

Before you start

We assume you already have a deployed instance of the Hydrosphere platform and the hs CLI installed on your local machine.

To let hs know where the Hydrosphere platform runs, configure a new cluster entity.

$ hs cluster add --name local --server http://localhost
$ hs cluster use local

Also, you should have a target regression model already deployed on the cluster. You can find the source code of the target model here.

Model training

As a monitoring model we will use an autoregressive stateful IsolationForest model. It is trained on windows of 5 consecutive data samples and, when serving, keeps the most recent window as its state.

We will skip most of the data preparation steps, for the sake of simplicity.

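import pandas as pd
from sklearn.ensemble import IsolationForest

df = pd.read_csv("../data/taxi_pickups.csv")
df.drop(["pickup_datetime"], axis=1, inplace=True)

# transform_to_sliding_windows is a data preparation helper not shown here
data, _ = transform_to_sliding_windows(df)
# behaviour="new" is only accepted by older scikit-learn releases
# (the parameter was removed in 0.24); drop it on newer versions.
iforest = IsolationForest(
    n_jobs=-1, random_state=42, behaviour="new", contamination=0.03)
# Find outliers in the training data: fit_predict returns -1 for outliers
is_outlier = iforest.fit_predict(data)
outlier_indices = df.index[6:][is_outlier == -1]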
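
The `transform_to_sliding_windows` helper is not shown on this page. As a rough sketch of what such a helper might do, the hypothetical `make_sliding_windows` function below (an assumption, not the tutorial's actual implementation) turns a one-dimensional series into rows of 5 consecutive values plus the value that follows each row. The real helper may use a different offset, so the `df.index[6:]` alignment above does not necessarily match this sketch.

```python
import numpy as np


def make_sliding_windows(values, window_len=5):
    """Hypothetical stand-in for transform_to_sliding_windows.

    Returns (windows, targets): windows[i] holds window_len consecutive
    values, and targets[i] is the value that immediately follows them.
    """
    values = np.asarray(values)
    n_windows = len(values) - window_len
    if n_windows <= 0:
        raise ValueError("need more than window_len values")
    # Stack overlapping slices into a 2-D array, one window per row
    windows = np.stack([values[i:i + window_len] for i in range(n_windows)])
    targets = values[window_len:]
    return windows, targets


# Made-up pickup counts, for illustration only
windows, targets = make_sliding_windows([10, 20, 30, 40, 50, 60, 70])
print(windows.shape)  # (2, 5)
```

Each row of `windows` has the shape the IsolationForest model expects for a single sample, which is why the serving function later reshapes its window to `(1, 5)`.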

Model evaluation

To check that our model works properly, let's plot the training data and the detected outliers.

import matplotlib.pyplot as plt

plt.plot(df.index, df.pickups, label="Training data")
plt.vlines(outlier_indices, 0, 600, colors="red", alpha=0.2, label="Outliers")
plt.legend()
plt.gcf().set_size_inches(25, 5)
plt.show()

From the plot you can see a large number of anomalies at the end of January 2016. These outliers were caused by a travel ban due to “Snowzilla”.

Model release

To create a monitoring metric, we have to deploy the IsolationForest model as a separate model in Hydrosphere. Let’s save the trained model for serving.

joblib.dump(iforest, '../monitoring_model/iforest.joblib')

Create a new directory where we will declare the serving function and its definitions.

$ mkdir -p monitoring_model/src
$ cd monitoring_model
$ touch src/

Inside the src/ file put the following code:

import numpy as np
import hydro_serving_grpc as hs
from joblib import dump, load
import collections

init_value = 1.0  # Default value, means that the sample is 'inlier'
window_len = 5  # Length of data sequence required for model.

window = collections.deque(maxlen=window_len)
outlier_detection_model = load('/model/files/iforest.joblib')

def infer(pickups_last_hour, pickups_next_hour):
    global window

    # serving.yaml defines that the type of input is int, so we take int_val
    # from the input sample. The pickups_next_hour parameter is the prediction
    # of the monitored target model; it is not used for scoring.
    input_value = int(pickups_last_hour.int_val[0])
    window.append(input_value)

    if len(window) < window_len:
        # Not enough samples in the window yet: report the default 'inlier'
        return pack_predict(init_value)

    prediction_vector = np.array(window)
    # Make a prediction
    result = outlier_detection_model.predict(
        prediction_vector.reshape(1, window_len))
    # Pack the answer
    return pack_predict(result[0])

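def pack_predict(result):
    # Scalar double tensor, matching the "value" output in serving.yaml
    tensor = hs.TensorProto(dtype=hs.DT_DOUBLE, double_val=[float(result)],
                            tensor_shape=hs.TensorShapeProto())
    return hs.PredictResponse(outputs={"value": tensor})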
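
To see how the stateful window behaves without the Hydrosphere runtime, the sketch below simulates the same warm-up logic with the standard library only. `RollingScorer` and `stub_predict` are hypothetical names, the stub rule merely stands in for `IsolationForest.predict`, and the pickup counts are made up for illustration.

```python
import collections

WINDOW_LEN = 5   # same window length as the serving function
INLIER = 1.0     # default returned while the window is still filling


class RollingScorer:
    """Keeps the last WINDOW_LEN values and scores them once the window is full."""

    def __init__(self, predict_fn, window_len=WINDOW_LEN):
        self.predict_fn = predict_fn
        self.window = collections.deque(maxlen=window_len)

    def score(self, value):
        self.window.append(value)  # the oldest value is dropped automatically
        if len(self.window) < self.window.maxlen:
            return INLIER  # not enough history yet
        return float(self.predict_fn(list(self.window)))


def stub_predict(window):
    # Hypothetical rule standing in for IsolationForest.predict:
    # flag the window (-1) if its newest value is under half the window mean.
    mean = sum(window) / len(window)
    return -1 if window[-1] < 0.5 * mean else 1


scorer = RollingScorer(stub_predict)
scores = [scorer.score(v) for v in [300, 310, 305, 298, 302, 20, 301]]
print(scores)  # [1.0, 1.0, 1.0, 1.0, 1.0, -1.0, 1.0]
```

The first four calls return the default because the window is not yet full. Because the state lives in the process, it is lost whenever the serving process restarts.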

This model also has to be packaged with a model definition, serving.yaml.

kind: Model
name: nyc_taxi_monitoring
runtime: "hydrosphere/serving-runtime-python-3.6:2.2.1"
install-command: "pip install -r requirements.txt"
payload:
  - "src/"
  - "requirements.txt"
  - "iforest.joblib"

contract:
  name: infer
  inputs:
    pickups_last_hour:
      shape: scalar
      type: int32
      profile: numerical
    pickups_next_hour:
      shape: scalar
      type: int32
      profile: numerical
  outputs:
    value:
      shape: scalar
      type: double
      profile: numerical

Inputs of this model are the inputs of the monitored model plus the outputs of the monitored model. As an output for the monitoring model we will use the value field.
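
Conceptually, each request to the monitoring model is the monitored model's inputs merged with its outputs. The platform assembles this itself; the sketch below only illustrates the shape of the data, with a hypothetical `build_monitoring_request` helper and made-up numbers.

```python
def build_monitoring_request(model_inputs, model_outputs):
    """Merge the monitored model's inputs and outputs into one payload."""
    overlap = set(model_inputs) & set(model_outputs)
    if overlap:
        # Input and output field names must be distinct to merge safely
        raise ValueError(f"field names clash: {sorted(overlap)}")
    return {**model_inputs, **model_outputs}


request = build_monitoring_request(
    {"pickups_last_hour": 312},  # what the regression model received
    {"pickups_next_hour": 298},  # what the regression model predicted
)
print(sorted(request))  # ['pickups_last_hour', 'pickups_next_hour']
```

The keys of the merged payload correspond to the two parameters of the `infer` function in the monitoring model.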

Pay attention to the model’s payload. It contains the src folder that we have just created, requirements.txt with all dependencies, and iforest.joblib, i.e. our newly trained, serialized IsolationForest model.

requirements.txt looks like this:


The final directory structure should look like this:

├── iforest.joblib
├── requirements.txt
├── serving.yaml
└── src

From that folder, upload the model to Hydrosphere.

$ hs upload


Let’s create a monitoring metric for our pre-deployed regression model.


  1. From the Models section, select the target model you would like to monitor, and select the desired model version.
  2. Open the Monitoring tab.
  3. At the bottom of the page, click the Add Metric button.
  4. In the window that opens, click the Add Metric button, then:
    1. Specify the name of the metric.
    2. Choose the monitoring model.
    3. Choose a version of the monitoring model.
    4. Select the comparison operator GreaterEq. Together with the threshold below, this means that an alarm fires whenever the metric value drops below 0.
    5. Set the threshold value to 0.
    6. Click the Add Metric button.
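
The threshold works because IsolationForest's `predict` returns 1 for inliers and -1 for outliers, so a healthy sample satisfies `value >= 0`. The check can be sketched as follows; `is_healthy` and the `COMPARATORS` mapping are hypothetical names for illustration and do not reflect how Hydrosphere implements the comparison internally.

```python
import operator

# Hypothetical mapping; only GreaterEq is named in the steps above
COMPARATORS = {"GreaterEq": operator.ge}


def is_healthy(metric_value, comparison="GreaterEq", threshold=0.0):
    """Return True when the metric satisfies the configured comparison."""
    return COMPARATORS[comparison](metric_value, threshold)


print(is_healthy(1.0))   # True: inlier, no alarm
print(is_healthy(-1.0))  # False: outlier, alarm fires
```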

That’s it. Now you have a monitored taxi pickups regression model deployed on the Hydrosphere platform.