AWS EMR Integration
Note: this is feature-preview commit. API might be slightly changed in the future after including it into the release version.
One of the core features of Mist is that it provides a way to abstract from the direct job submission using spark-submit
and manages spark-drivers under the hood. In other words, it lazily starts workers when
a context receives a request to run a function on it.
So the goal of the new feature is to extend this lazy behavior of contexts to start clusters lazily too.
Install
It’s required to configure Mist and set up AWS environment(roles, securiry groups). This step might be skipped by using our Cloudformation template
To set up template parameters you have to create AWS Access Key.
Also, there are MistLogin
and MistPassword
parameters setting up basic authorization for accessing HTTP API.
After stack was successfully launched you can find public dns name of mist-master
instance on the Outputs
tab.
Note - it takes about 5 minutes to prepare an instance and launch Mist. After that, you can open it’s UI and check it.
Example
Examples could be found in the “hello_mist” repository here.
If you skipped Quick start page, please check it first to get familiar with mist’s contexts and mist-cli
tool.
Build:
# install mist-cli
pip install mist-cli
// or
easy_install mist-cli
# clone examples
git clone https://github.com/Hydrospheredata/hello_mist.git
cd scala
sbt package
There are two files: conf/10_emr_ctx.conf
and conf/11_emr_autoscale_ctx.conf
.
You need to select one and explicitly enable it in conf/20func.conf
.
As exposing Mist API to the external environment without an authorization isn’t a good idea, our default template
installs nginx
and setups basic autorization.
To use mist-cli
you have to provide the following credentials in the --host
paremeter: $login:$password@public-dns
and use 80
port.
Apply configuration:
mist-cli --host $MistLogin:$MistPassword@$public-dns-name --port 80 apply -f conf