Quick Start
What is Apache Toree?
Apache Toree has one main goal: to provide the foundation for interactive applications to connect to and use Apache Spark.
The project aims to let applications send both packaged jars and code snippets to Spark. Because it implements the latest Jupyter message protocol, Apache Toree plugs easily into the Jupyter ecosystem for quick, interactive data exploration.
Installing as a kernel in Jupyter
Installation requires a distribution of Apache Spark downloaded to the system where Apache Toree will run. The following commands install Apache Toree; point --spark_home at the directory of your Spark distribution.
pip install --upgrade toree
jupyter toree install --spark_home=/usr/local/bin/apache-spark/
Your Hello World example
One of the most common ways to use Apache Toree is for interactive data exploration in a Jupyter Notebook. First, install the notebook and start the notebook server:
pip install notebook
jupyter notebook
The following clip shows a simple notebook running Scala code to print Hello, World!. Each code cell can be run by pressing Shift-Enter on your keyboard.
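The clip itself is not reproduced in this text, so here is a minimal sketch of a cell along the same lines. It assumes the notebook is using the Apache Toree Scala kernel; the value name greeting is illustrative only, not taken from the clip.

```scala
// Assumption: this cell runs in a notebook backed by the Apache Toree Scala kernel.
// "greeting" is a hypothetical name chosen for this example.
val greeting = "Hello, World!"

// Printed output appears directly below the cell after pressing Shift-Enter.
println(greeting)
```

Values defined in one cell, such as greeting, remain available to later cells in the same notebook session.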
A key feature of Apache Toree is that it automatically creates a SparkContext binding for you, accessible through the variable sc. The following clip shows code accessing the SparkContext and returning a value.
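A cell that uses the automatically created SparkContext could look like the sketch below. It assumes the Toree Scala kernel, where sc is already bound, so it will not run as a standalone Scala program; the names numbers and total are hypothetical.

```scala
// Assumption: running inside a Toree notebook, where `sc` (a SparkContext)
// has already been created and bound by the kernel. No import is required.

// Distribute the integers 1..100 across the cluster as an RDD.
val numbers = sc.parallelize(1 to 100)

// Add the elements together on the Spark executors and return the result
// to the notebook; for 1..100 this is 5050.
val total = numbers.reduce(_ + _)

// Report which Spark version the bound context uses alongside the result.
println(s"Spark ${sc.version}: sum of 1 to 100 = $total")
```

Because Toree creates the SparkContext for you, a cell should reuse sc rather than construct its own; Spark allows only one active SparkContext per JVM.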
Where to try Apache Toree?
- Try Jupyter (Spark With Scala Notebook)
- IBM Bluemix