hadoop fs -getmerge

Usage: hadoop fs -getmerge <src> <localdst> [addnl]
hadoop fs -getmerge <dir_of_input_files> <mergedsinglefile>
Takes a source directory and a destination file as input and concatenates the files in src into the destination local file. Optionally, addnl can be set to add a newline character at the end of each file. This is useful if you have multiple small files in your input directory that you want to merge into a single file without copying and concatenating them by hand or writing a MapReduce job.

Steps to merge multiple files into one within Hadoop
1) Get all the files from HDFS to the local file system

2) Merge all the files into a single file (using cat or other commands)

3) Push the single file to HDFS again
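
The three manual steps above could be sketched as a shell function along these lines. The function name merge_manually and its two arguments are hypothetical, chosen only for illustration; the sketch also assumes the HDFS directory is flat (no subdirectories) and that the shell's default glob ordering is an acceptable merge order.

```shell
# Sketch only: merge_manually and its arguments are illustrative names.
# Usage: merge_manually <hdfs_src_dir> <hdfs_dst_file>
merge_manually() {
    local hdfs_src_dir="$1"    # HDFS directory holding the small files
    local hdfs_dst_file="$2"   # HDFS path for the merged file
    local workdir
    workdir="$(mktemp -d)" || return 1

    # 1) Get all the files from HDFS to the local file system
    hadoop fs -get "$hdfs_src_dir" "$workdir/parts" || return 1

    # 2) Merge all the files into a single file (in glob order)
    cat "$workdir/parts"/* > "$workdir/merged" || return 1

    # 3) Push the single file to HDFS again
    hadoop fs -put "$workdir/merged" "$hdfs_dst_file" || return 1

    # Remove the temporary local copies
    rm -rf "$workdir"
}
```

Note that this needs enough local disk space for both the individual files and the merged copy.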

Alternatively, we can use the following commands:

1) hadoop fs -getmerge hadoop_input_files_dir mergedfile_on_local_filesystem

This gets all the files from the HDFS directory and merges them into a single file on the local file system.

2) Push mergedfile_on_local_filesystem to HDFS using -copyFromLocal

hadoop fs -copyFromLocal mergedfile_on_local_filesystem hdfs_path_of_single_merged_file
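
The two commands above can be wrapped in a small helper, sketched here under the assumption that you want the second step skipped when the first one fails. The function name merge_with_getmerge and its argument order are hypothetical, not part of Hadoop.

```shell
# Sketch only: merge_with_getmerge is an illustrative name.
# Usage: merge_with_getmerge <hdfs_src_dir> <local_merged_file> <hdfs_dst_file>
merge_with_getmerge() {
    local hdfs_src_dir="$1"    # HDFS directory holding the small files
    local local_merged="$2"    # merged file on the local file system
    local hdfs_dst_file="$3"   # HDFS path for the single merged file

    # 1) Concatenate every file in the HDFS directory into one local file
    hadoop fs -getmerge "$hdfs_src_dir" "$local_merged" || return 1

    # 2) Push the merged local file back to HDFS
    hadoop fs -copyFromLocal "$local_merged" "$hdfs_dst_file" || return 1
}
```

For example: merge_with_getmerge hadoop_input_files_dir mergedfile_on_local_filesystem hdfs_path_of_single_merged_file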
