Data ingestion in Zeppelin can be done with Hive, HBase, and the other interpreters that Zeppelin provides.
Zeppelin provides Postgres, HAWQ, Spark SQL, and other data discovery tools; with Spark SQL, for example, the data can be explored interactively.
Spark, Flink, R, Python, and other useful tools are already available in Zeppelin, and its functionality can be extended by simply adding a new interpreter.
Data Visualization and Collaboration
All the basic visualizations, such as bar, pie, area, line, and scatter charts, are available in Zeppelin.
In FileGPS, we use the Spark Streaming component, integrated with Kafka, for data computation.
Apache Spark Streaming
Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant processing of live data streams. It can ingest data from sources such as Kafka, Flume, Kinesis, or TCP sockets, and process it with high-level operations such as map, reduce, join, and window. The processed data can then be pushed out to file systems, databases, and live dashboards. Spark Streaming uses micro-batching for real-time processing.
Micro-batching is a technique that lets a process or task treat a stream as a sequence of small batches of data. Spark Streaming groups the incoming live data into small batches and hands each batch to the Spark batch engine for processing. This approach also gives the stream the fault tolerance of the underlying batch system.
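The idea behind micro-batching can be illustrated with a minimal, plain-Python sketch. It does not use Spark itself: the names `micro_batches`, `word_counts`, and `batch_interval` are hypothetical and chosen only for illustration, and the events are assumed to carry an arrival time in seconds. Intervals that receive no events are simply skipped here for brevity.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed event shape for this sketch: (arrival time in seconds, payload).
Event = Tuple[float, str]


def micro_batches(events: Iterable[Event], batch_interval: float) -> List[List[str]]:
    """Group timestamped events into consecutive fixed-length time windows."""
    if batch_interval <= 0:
        raise ValueError("batch_interval must be positive")
    buckets: Dict[int, List[str]] = defaultdict(list)
    for arrival_time, payload in events:
        # Events that arrive within the same interval share a batch index.
        buckets[int(arrival_time // batch_interval)].append(payload)
    # Emit the batches in time order; empty intervals produce no batch here.
    return [buckets[index] for index in sorted(buckets)]


def word_counts(batch: List[str]) -> Dict[str, int]:
    """An ordinary batch computation, applied to one micro-batch at a time."""
    counts: Dict[str, int] = defaultdict(int)
    for line in batch:
        for word in line.split():
            counts[word] += 1
    return dict(counts)


if __name__ == "__main__":
    stream = [(0.2, "a b"), (0.9, "b"), (1.4, "c a"), (3.1, "a")]
    for batch in micro_batches(stream, batch_interval=1.0):
        print(word_counts(batch))
```

With a one-second interval, the four sample events fall into three batches, and the same batch function runs on each of them in turn. A shorter interval lowers latency but produces more, smaller batches to schedule.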