Apache Spark & Zeppelin
An open-source, web-based "notebook" that enables interactive data analytics and collaborative documents.
Apache Zeppelin is an open-source, web-based “notebook” that enables interactive data analytics and collaborative documents. The notebook integrates with distributed, general-purpose data processing systems such as Apache Spark (large-scale data processing), Apache Flink (a stream processing framework), and many others. With Zeppelin you can build data-driven, interactive documents using SQL, Scala, R, or Python directly in your browser.
In Zeppelin, data can be ingested through Hive, HBase, and the other interpreters that Zeppelin provides.
For data discovery, Zeppelin provides interpreters for PostgreSQL, HAWQ, Spark SQL, and other tools; with Spark SQL, for example, the data can be explored using SQL queries.
Interpreters for Spark, Flink, R, Python, and other tools are already available in Zeppelin, and its functionality can be extended by adding a new interpreter.
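As a minimal sketch of exploring data with Spark SQL from a Zeppelin `%spark.pyspark` paragraph, the code below loads a CSV file into a DataFrame, registers it as a temporary view, and runs a grouped count. It assumes a CSV file with a header row; the helper names `group_count_query` and `explore_column`, the view name, the column name, and the path are hypothetical, and `spark` is the SparkSession that Zeppelin's Spark interpreter predefines.

```python
def group_count_query(view_name, column):
    """Build a Spark SQL query counting rows per distinct value of `column`.

    Identifiers are wrapped in backticks so names containing spaces or dots
    still parse. Assumes trusted, notebook-supplied names (no escaping).
    """
    return (
        f"SELECT `{column}`, COUNT(*) AS n "
        f"FROM `{view_name}` "
        f"GROUP BY `{column}` "
        f"ORDER BY n DESC"
    )


def explore_column(spark, path, view_name, column):
    """Load a headered CSV, expose it to Spark SQL, and return grouped counts.

    `spark` is the SparkSession predefined by Zeppelin's Spark interpreter.
    """
    df = spark.read.csv(path, header=True, inferSchema=True)
    # The temporary view is visible to spark.sql() and to %sql paragraphs
    # that run in the same Spark interpreter session.
    df.createOrReplaceTempView(view_name)
    return spark.sql(group_count_query(view_name, column))
```

In a note, `z.show(explore_column(spark, "/path/to/sample.csv", "sample", "job"))` would render the result with Zeppelin's built-in table and chart views, and a separate `%sql` paragraph can query the same `sample` view.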
Data Visualization & Collaboration
All the basic visualizations, such as bar, pie, area, line, and scatter charts, are available in Zeppelin.
In FileGPS, we use the Spark Streaming component, integrated with Kafka, for data computation.
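The text does not show how FileGPS wires Kafka into Spark. As an illustration only, the sketch below uses Spark's Structured Streaming Kafka source, which may differ from the Spark Streaming API used in FileGPS, to read a topic and collect decoded messages into an in-memory table that a notebook can query. It assumes the `spark-sql-kafka-0-10` connector is on the Spark interpreter's classpath; the broker address, topic, query name, and helper names are hypothetical.

```python
def kafka_source_options(bootstrap_servers, topic, starting_offsets="latest"):
    """Options for Spark's Structured Streaming Kafka source.

    The option keys are the documented Kafka source options; the values
    passed in are placeholders supplied by the caller.
    """
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def start_memory_stream(spark, bootstrap_servers, topic, query_name="kafka_messages"):
    """Stream a Kafka topic into an in-memory table named `query_name`.

    Requires the spark-sql-kafka-0-10 connector on the classpath.
    Returns the StreamingQuery so the caller can stop() it later.
    """
    records = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
        # Kafka keys and values arrive as binary; cast them to strings.
        .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
    )
    return (
        records.writeStream.format("memory")
        .queryName(query_name)  # rows become queryable under this table name
        .outputMode("append")
        .start()
    )
```

For example, `query = start_memory_stream(spark, "localhost:9092", "events")` followed by `spark.sql("SELECT * FROM kafka_messages")` shows the messages received so far, and `query.stop()` ends the stream. The memory sink keeps every row on the driver, so it suits notebook debugging rather than production workloads.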
Apache Spark Streaming
Apache Spark & Zeppelin - FAQs
Let us now take a closer look at using Zeppelin with Spark through an example:
- Create a new note from the Zeppelin home page with “spark” as the default interpreter.
- Before you start with the example, download the sample CSV file.
- Transform the CSV into an RDD.
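The last step can be sketched in a `%spark.pyspark` paragraph. This is written in Python rather than the Scala of the default `%spark` interpreter, and it assumes the sample CSV has a header row, uses commas, and has no quoted fields spanning several lines; `parse_csv_line`, `csv_to_rdd`, and the path below are hypothetical. `sc` is the SparkContext that Zeppelin predefines.

```python
import csv


def parse_csv_line(line, delimiter=","):
    """Parse one CSV line into a list of field strings.

    csv.reader keeps quoted fields such as "Smith, John" intact, which a plain
    str.split would break apart. Records spanning several lines are not supported.
    """
    return next(csv.reader([line], delimiter=delimiter))


def csv_to_rdd(sc, path, delimiter=","):
    """Load a CSV file as an RDD of dicts keyed by the header row.

    `sc` is the SparkContext predefined by Zeppelin's Spark interpreter.
    """
    lines = sc.textFile(path)
    header_line = lines.first()
    columns = parse_csv_line(header_line, delimiter)
    return (
        # Drop blank lines and the header row itself.
        lines.filter(lambda line: bool(line.strip()) and line != header_line)
        .map(lambda line: parse_csv_line(line, delimiter))
        .map(lambda fields: dict(zip(columns, fields)))
    )
```

In a note, `rows = csv_to_rdd(sc, "/path/to/sample.csv")` followed by `rows.take(5)` shows the first parsed records. Because the filter compares each line with the header text, a data row that happens to repeat the header exactly would also be dropped.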
You can restart the interpreter for the notebook from the interpreter bindings (the gear icon in the upper right-hand corner) by clicking the restart icon to the left of the interpreter in question (in this case, the spark interpreter).
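The same restart can also be triggered without the UI through Zeppelin's REST API, which provides `PUT /api/interpreter/setting/restart/{settingId}` for restarting an interpreter setting (setting IDs can be listed with `GET /api/interpreter/setting`). The sketch below uses only Python's standard library; the server URL, the setting ID, and the helper names are placeholders, and authentication, if your Zeppelin instance requires it, is not handled.

```python
import urllib.request


def build_restart_request(base_url, setting_id):
    """Build (but do not send) a request that restarts one interpreter setting."""
    url = f"{base_url.rstrip('/')}/api/interpreter/setting/restart/{setting_id}"
    return urllib.request.Request(url, method="PUT")


def restart_interpreter(base_url, setting_id, timeout=30):
    """Send the restart request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = build_restart_request(base_url, setting_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, `restart_interpreter("http://localhost:8080", "spark")` would restart a setting whose ID is `spark`; check the actual ID on your server first, since it can differ between Zeppelin versions.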