Python library API


A Mist Function is a function that defines a particular Spark calculation. A Mist Function is the deployable unit for the Mist proxy.

Job - a Spark job triggered by a Mist Function.

The Mist library provides a decorator-based DSL for Mist Functions that can be deployed and executed in Mist.

from mistpy.decorators import *
import random

@on_spark_context
@with_args(
    arg("samples", type_hint=int)
)
def hello_mist(sc, samples):
    def inside(p):
        x, y = random.random(), random.random()
        return x * x + y * y < 1

    # count the sampled points that fall inside the unit quarter-circle
    count = sc.parallelize(range(0, samples)) \
        .filter(inside).count()

    pi = 4.0 * count / samples
    return {'result': pi}
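
The Spark part aside, the estimate above is plain Monte Carlo. As a quick sanity check, the same computation can be sketched in pure Python without a cluster (estimate_pi is a hypothetical helper for illustration, not part of mistpy):

```python
import random

def estimate_pi(samples):
    # Sample points uniformly in the unit square and count how many
    # fall inside the quarter-circle of radius 1; the inside fraction
    # approaches pi/4 as the sample count grows.
    def inside(p):
        x, y = random.random(), random.random()
        return x * x + y * y < 1

    count = sum(1 for p in range(samples) if inside(p))
    return 4.0 * count / samples
```

With 100,000 samples the result lands close to 3.14, which is a useful baseline before running the distributed version.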


A Mist Function is packaged as an ordinary Python package; a minimal setup.py could look like this (the package name and version below are illustrative):

import os
from setuptools import setup

setup(
    name='hello_mist',   # illustrative package name
    version='0.0.1',     # illustrative version
    install_requires=["pyspark==2.3.0", "mistpy==1.1.3"]
)


Generally speaking, to write your own Mist function declaration in Python you need to declare a context type and the input arguments.


Mist provides managed Spark contexts, so the developer does not have to care about the context’s lifecycle and settings. In the Python library, special context decorators are used to inject a Spark context into a function. For example, if a function is marked with on_spark_context, it means the user wants to receive a pyspark.SparkContext instance in it. The context instance is always passed as the first argument:

from mistpy.decorators import *

@on_spark_context
def my_func(sc):
    ...

All context decorators:

  • on_spark_context - pyspark.SparkContext
  • on_spark_session - pyspark.sql.SparkSession
  • on_hive_session - pyspark.sql.SparkSession with enabled Hive support
  • on_streaming_context - pyspark.streaming.StreamingContext
  • on_sql_context - pyspark.sql.SQLContext
  • on_hive_context - pyspark.sql.HiveContext
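
To make the injection mechanism concrete, here is a minimal sketch of how a decorator can prepend a context as the first argument of the wrapped function (inject_context and the fake context string are hypothetical; mistpy's real decorators also manage the Spark context's lifecycle and configuration):

```python
import functools

def inject_context(make_context):
    # Returns a decorator that calls make_context() and passes the
    # result as the first argument of the wrapped function.
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            return fn(make_context(), *args, **kwargs)
        return wrapper
    return decorator

# In real mistpy code this would be a managed SparkContext;
# a plain string stands in for it here.
@inject_context(lambda: "fake-spark-context")
def my_job(ctx, samples):
    return {'context': ctx, 'samples': samples}
```

The caller supplies only the user arguments; the context argument is filled in by the decorator, which is why the examples in this document never pass sc explicitly.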


Input arguments are declared inside the with_args decorator:

@on_spark_context
@with_args(
    arg('first', type_hint=int)
)
def one_arg_fn(sc, first):
    ...

@on_spark_context
@with_args(
    arg('first', type_hint=int),
    arg('second', type_hint=int)
)
def two_args_fn(sc, first, second):
    ...

Arguments can be declared using the following methods:

  • arg(name, type_hint, default = None)
  • opt_arg(name, type_hint)


  • name is the argument key in the input JSON
  • type_hint annotates the argument type. It accepts the primitive types int, str, float and bool. For lists there is the list_type(type) function:
    arg('list_of_ints', type_hint=list_type(int))
  • default provides a default value for the argument, which makes it possible to omit it in the input data:
    arg('list_of_ints', type_hint=list_type(int), default=[1, 2, 3, 4, 5])

with_args is optional; if you don’t need any arguments except the Spark context, you can skip it.
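
Conceptually, the arg/opt_arg declarations boil down to validating a JSON-like input dict against expected types and defaults before the function runs. A hypothetical sketch of that idea (extract_args is not a mistpy API):

```python
def extract_args(data, specs):
    # data:  the input JSON as a dict
    # specs: list of (name, type_hint, default) tuples; default=None
    #        marks a required argument, mirroring arg(name, type_hint, default).
    result = []
    for name, type_hint, default in specs:
        if name not in data:
            if default is None:
                raise ValueError("missing required argument: %s" % name)
            result.append(default)
        elif not isinstance(data[name], type_hint):
            raise TypeError("argument %s must be %s" % (name, type_hint.__name__))
        else:
            result.append(data[name])
    return result
```

This is why bad input is rejected with a readable error before any Spark work starts, and why arguments with a default can simply be left out of the request.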


To log and see what’s going on on the job side from mist-ui, use the log4j logger:

@on_spark_context
def example(sc):
    log4jLogger = sc._jvm.org.apache.log4j
    logger = log4jLogger.LogManager.getLogger(__name__)
    logger.info("Hello!")

Python versions

The Python version can be explicitly specified via Spark configuration in the function’s default Mist context. Mist respects the spark.pyspark.python and spark.pyspark.driver.python configurations. For example, in a mist-cli configuration (context.conf):

model = Context
name = py3
data {
  spark-conf {
    spark.pyspark.python = "python3"
  }
}


model = Function
name = mypy3function
data {
  context = py3
}