Apache Spark and Zeppelin

Apache Zeppelin is an open-source, web-based "notebook" that enables interactive data analytics and collaborative documents. The notebook is integrated with distributed, general-purpose data processing systems such as Apache Spark (large-scale data processing), Apache Flink (stream processing framework), and many others. Apache Zeppelin allows you to make beautiful, data-driven, interactive documents with SQL, Scala, R, or Python right in your browser.

Apache Spark and Zeppelin - Overview

Data Ingestion

Data ingestion in Zeppelin can be done with Hive, HBase, and the other interpreters that Zeppelin provides.

Data Discovery

Zeppelin provides Postgres, HAWQ, Spark SQL, and other data discovery tools. With Spark SQL, the data can be explored through SQL queries.

Data Analytics

Spark, Flink, R, Python, and other useful tools are already available in Zeppelin, and its functionality can be extended by adding a new interpreter.

Data Visualization and Collaboration

All the basic visualizations, such as bar, pie, area, line, and scatter charts, are available in Zeppelin.
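As a rough illustration of how data reaches these charts, the sketch below formats rows in Zeppelin's `%table` display format: a `%table` prefix followed by tab-separated columns, one row per line. The helper name `to_zeppelin_table` and the sample file counts are hypothetical, and the sketch assumes it runs in a Python paragraph whose printed output is picked up by Zeppelin's display system.

```python
from typing import Iterable, Sequence


def to_zeppelin_table(header: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    """Render rows as a %table string: tab-separated columns, one row per line."""
    lines = ["\t".join(header)]
    for row in rows:
        lines.append("\t".join(str(cell) for cell in row))
    return "%table " + "\n".join(lines)


# Hypothetical sample data, for illustration only.
sample_rows = [("a.csv", 3), ("b.csv", 5)]
print(to_zeppelin_table(["file", "count"], sample_rows))
```

When output in this format is printed from a paragraph, Zeppelin should render it as a table that can be switched to the built-in chart types from the paragraph toolbar.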

Apache Spark

In FileGPS, we use the Spark Streaming component, integrated with Kafka, for data computation.

Apache Spark Streaming

  • Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant processing of live data streams. It can ingest data from sources such as Kafka, Flume, Kinesis, or TCP sockets, and process it with algorithms expressed through high-level functions such as map, reduce, join, and window. The processed data can then be pushed out to file systems, databases, and live dashboards. Spark Streaming uses micro-batching for real-time streaming.
  • Micro-batching is a technique that lets a process or task treat a stream as a sequence of small batches of data. Spark Streaming groups the live data into small batches and delivers them to the batch engine for processing. This approach also provides fault tolerance.
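
The micro-batching idea described above can be sketched in plain Python. This is a conceptual illustration only, not Spark's API or its internal implementation: the `Event` record shape, the `micro_batches` helper, and the one-second interval are all assumptions made for the example.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record shape: (arrival time in seconds, payload).
Event = Tuple[float, str]


def micro_batches(events: Iterable[Event], interval: float) -> Iterator[List[str]]:
    """Group a time-ordered event stream into batches covering `interval` seconds each."""
    batch: List[str] = []
    window_end = None
    for ts, payload in events:
        if window_end is None:
            # Align the first window to a multiple of the batch interval.
            window_end = (ts // interval + 1) * interval
        # Close every window that ended before this event arrived.
        # Windows with no events are emitted as empty batches.
        while ts >= window_end:
            yield batch
            batch = []
            window_end += interval
        batch.append(payload)
    if batch:
        yield batch


# Each batch is handed to ordinary batch logic; here we simply count records.
stream = [(0.2, "a"), (0.7, "b"), (1.1, "c"), (3.4, "d")]
for number, batch in enumerate(micro_batches(stream, interval=1.0)):
    print(number, len(batch), batch)
```

The stream is never processed record by record: each interval's worth of data is collected first and then treated as a small batch job, which is the property the bullets above describe.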

Apache Spark and Zeppelin - Images