Apache Gobblin Description
A distributed data integration framework which simplifies common Big Data integration tasks such as data ingestion and replication, organization, and lifecycle management. It can be used for both streaming and batch data ecosystems. It can be run as a standalone program on a single computer. Also supports embedded mode. It can be used as a mapreduce application on multiple Hadoop versions. Azkaban is also available for the launch of mapreduce jobs. It can run as a standalone cluster, with primary and worker nodes. This mode supports high availability, and can also run on bare metals. This mode can be used as an elastic cluster in the public cloud. This mode supports high availability. Gobblin, as it exists today, is a framework that can build various data integration applications such as replication, ingest, and so on. Each of these applications are typically set up as a job and executed by Azkaban, a scheduler.