Hadoop Flume

What is Hadoop Flume?
Flume is a distributed, reliable, available service for efficiently moving large amounts of data as it is produced. It is ideally suited to gathering logs from multiple systems and inserting them into HDFS as they are generated.
Developed in-house by Cloudera, and released as open-source software
--Now an Apache Incubator project
Flume's design goals:
Flume is designed to continue delivering events in the face of system component failure.
Flume scales horizontally
--As load increases, more machines can be added to the configuration
Flume provides a central Master controller for manageability
--Administrators can monitor and reconfigure data flows on the fly
Flume can be extended by adding connectors to existing storage layers or data platforms
--General sources already provided include data from files, syslog, and standard output (stdout) from a process
--General endpoints already provided include files on the local filesystem or in HDFS
--Other connectors can be added using Flume’s API
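To make the connector idea concrete, here is a minimal sketch in Python of a data flow with a pluggable source and endpoint. It does not use Flume's actual API: the names `run_flow` and `list_source` are hypothetical and exist only for illustration. The point is that the flow itself needs to know nothing about where events come from or where they go.

```python
from typing import Callable, Iterable, List

# Hypothetical types for illustration only; this is NOT Flume's API.
Event = str
Source = Callable[[], Iterable[Event]]  # produces events when called
Sink = Callable[[Event], None]          # consumes one event at a time


def run_flow(source: Source, sink: Sink) -> int:
    """Move every event from the source to the sink; return the event count."""
    count = 0
    for event in source():
        sink(event)
        count += 1
    return count


def list_source(lines: List[Event]) -> Source:
    """A stand-in source that replays a fixed list of log lines."""
    return lambda: iter(lines)


# Usage: an in-memory list stands in for an endpoint such as a file or HDFS.
collected: List[Event] = []
delivered = run_flow(list_source(["login ok", "disk full"]), collected.append)
```

Swapping in a different source or endpoint means passing a different callable; `run_flow` stays unchanged.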
Flume: General System Architecture
The Master holds configuration information for each Node, plus a version number for that Node
--The version number is associated with the Node’s configuration
Nodes communicate with the Master every five seconds
--Node passes its version number to the Master
--If the Master has a later version number for the Node, it tells the Node to reconfigure itself
--The Node then requests the new configuration information from the Master, and dynamically applies that new configuration
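The version-check exchange described above can be sketched in Python as follows. This is a simplified model and not Flume's implementation: the class and method names (`Master.heartbeat`, `Node.heartbeat`, `set_config`) and the sample configuration keys are assumptions made for illustration, and the five-second timer is represented only by a constant rather than a real scheduling loop.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

HEARTBEAT_SECONDS = 5  # interval stated in the text; no timer is run here


@dataclass
class Master:
    """Holds the latest configuration and its version number for each node."""
    configs: Dict[str, Tuple[int, dict]] = field(default_factory=dict)

    def set_config(self, node_name: str, config: dict) -> None:
        # Each new configuration for a node gets the next version number.
        current_version = self.configs.get(node_name, (0, {}))[0]
        self.configs[node_name] = (current_version + 1, config)

    def heartbeat(self, node_name: str, node_version: int) -> bool:
        """Return True if the master has a later version than the node reports."""
        master_version = self.configs.get(node_name, (0, {}))[0]
        return master_version > node_version

    def get_config(self, node_name: str) -> Tuple[int, dict]:
        return self.configs[node_name]


@dataclass
class Node:
    name: str
    version: int = 0
    config: Optional[dict] = None

    def heartbeat(self, master: Master) -> bool:
        """Report our version; fetch and apply a newer configuration if told to."""
        if master.heartbeat(self.name, self.version):
            self.version, self.config = master.get_config(self.name)
            return True
        return False


# Usage: the node picks up a change on its next heartbeat, then stays put.
master = Master()
node = Node("node1")
master.set_config("node1", {"source": "syslog", "sink": "hdfs"})
first = node.heartbeat(master)   # master is ahead, node reconfigures
second = node.heartbeat(master)  # versions match, nothing to do
```

Because the node sends only a version number on each heartbeat, the full configuration crosses the network only when something has actually changed.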
