Data Ingestion

Data ingestion in Zeppelin can be done with Hive, HBase, and the other interpreters that Zeppelin provides.


Data Discovery

Zeppelin provides Postgres, HAWQ, Spark SQL, and other data discovery tools. With Spark SQL, for example, the data can be explored interactively.

Data Analytics

Spark, Flink, R, Python, and other useful tools are already available in Zeppelin, and its functionality can be extended simply by adding a new interpreter.


Data Visualization and Collaboration

All the basic visualizations, such as bar, pie, area, line, and scatter charts, are available in Zeppelin.


In FileGPS, we use the Spark Streaming component, integrated with Kafka, for data computation.

Apache Spark Streaming

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant processing of live data streams. It can ingest data from sources such as Kafka, Flume, Kinesis, or TCP sockets, and process it using a variety of algorithms. The processed data can then be pushed out to file systems, databases, and live dashboards. Spark Streaming uses micro-batching for real-time streaming.
Micro-batching is a technique that lets a process or task treat a stream as a sequence of small batches of data. Spark Streaming therefore groups the live data into small batches and hands each batch to the batch engine for processing. This approach also provides fault tolerance.
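
The micro-batching idea can be illustrated with a small, framework-free sketch in plain Python. This is not Spark code and not the FileGPS implementation; the names micro_batches, count_words, and batch_interval are hypothetical and chosen only for illustration. It assumes events arrive in time order as (timestamp, payload) pairs, groups them into fixed time windows, and then runs an ordinary batch computation (a word count) on each group.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

# An event is (arrival time in seconds, payload). This shape is an assumption
# made for the sketch, not something prescribed by Spark.
Event = Tuple[float, str]


def micro_batches(
    events: Iterable[Event], batch_interval: float
) -> Iterator[Tuple[int, List[str]]]:
    """Group a time-ordered stream into batches of `batch_interval` seconds.

    Yields (window_index, payloads). Windows that receive no events are
    skipped here to keep the sketch short.
    """
    current_window = None
    batch: List[str] = []
    for timestamp, payload in events:
        window = int(timestamp // batch_interval)
        # A new window has started: hand the finished batch downstream.
        if current_window is not None and window != current_window:
            yield current_window, batch
            batch = []
        current_window = window
        batch.append(payload)
    # Flush the last, possibly partial, batch when the stream ends.
    if batch:
        yield current_window, batch


def count_words(batch: List[str]) -> Dict[str, int]:
    """Ordinary batch computation applied to one micro-batch."""
    counts: Counter = Counter()
    for line in batch:
        counts.update(line.split())
    return dict(counts)


if __name__ == "__main__":
    stream = [(0.2, "a b"), (0.9, "a"), (1.1, "b c"), (2.5, "c"), (2.7, "c a")]
    for window, batch in micro_batches(stream, batch_interval=1.0):
        print(window, count_words(batch))
```

The point of the design is that count_words knows nothing about streaming: it is a normal batch function, and the stream is handled only by the grouping step. A real engine would additionally track time with a clock rather than event timestamps alone, and would be able to recompute a lost batch, which is where the fault tolerance comes from.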

Spark Streaming Integration