copyMerge

오리지날초이 2014. 9. 18. 11:24


http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-320/org/apache/hadoop/fs/FileUtil.java#FileUtil.copyMerge%28org.apache.hadoop.fs.FileSystem%2Corg.apache.hadoop.fs.Path%2Corg.apache.hadoop.fs.FileSystem%2Corg.apache.hadoop.fs.Path%2Cboolean%2Corg.apache.hadoop.conf.Configuration%2Cjava.lang.String%29


HDFS file-merge code, found at the link above.

FileUtil also has to be added separately.


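But when I actually ran it, it was painfully slow, so I don't think I can use it.

Simply merging on a local server is faster.

(Merging three 5 GB text files took 1.5 minutes locally and 7 minutes on HDFS; however, the size differed from that of the HDFS-merged file.)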
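The local merge timed above is not shown in the post. As a rough sketch of what such a merge might look like, assuming the part files have already been copied to a local directory, the following concatenates every regular file directly under a directory into one output file, in file-name order. The class name `LocalMerge` and the method `mergeDirectory` are hypothetical names chosen for illustration, not part of Hadoop.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: a local, non-Hadoop stand-in for a directory merge.
public class LocalMerge {

    /**
     * Concatenates the regular files directly under srcDir into dstFile,
     * in file-name order, and returns the number of bytes written.
     * Subdirectories are skipped.
     */
    public static long mergeDirectory(Path srcDir, Path dstFile) throws IOException {
        // Collect the inputs before the output is opened, so a new output
        // file created inside srcDir is not picked up as an input.
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(srcDir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    parts.add(entry);
                }
            }
        }
        Collections.sort(parts);

        long written = 0;
        try (OutputStream out = Files.newOutputStream(dstFile)) {
            for (Path part : parts) {
                // Files.copy streams the file into out and returns the byte count.
                written += Files.copy(part, out);
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: LocalMerge <input directory> <output file name>");
            System.exit(1);
        }
        long bytes = mergeDirectory(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Merged " + bytes + " bytes");
    }
}
```

The output file should not already exist inside the input directory, otherwise it would be listed as one of its own inputs.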



import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class copyMerge {

    // Returns the total size in bytes of a file, or of everything under a
    // directory (recursing into subdirectories).
    public static long fileSize(Path path, Configuration conf) throws IOException {
        long retval = 0;

        FileSystem fs = path.getFileSystem(conf);
        FileStatus stat = fs.getFileStatus(path);
        // isDir() matches the 0.20.x API; newer releases use isDirectory().
        if (stat.isDir()) {
            FileStatus[] dirents = fs.listStatus(path);
            for (int i = 0; i < dirents.length; i++) {
                retval += fileSize(dirents[i].getPath(), conf);
            }
        } else {
            retval += stat.getLen();
        }

        return retval;
    }

    public static void print_usage() {
        System.out.println("Usage: copyMerge <input directory> <output file name>");
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();

        if (args.length != 2) {
            System.err.println("Invalid number of arguments.");
            System.out.println("");
            print_usage();
            System.exit(1);
        }

        String indir = args[0];
        String outfile = args[1];
        Path inpath = new Path(indir);
        Path outpath = new Path(outfile);
        FileSystem infs = null;
        FileSystem outfs = null;
        // String written after each merged file; null means nothing is added.
        String addString = null;

        // Resolve the file systems for the input directory and the output file.
        try {
            infs = FileSystem.get(new URI(indir), conf);
            outfs = FileSystem.get(new URI(outfile), conf);
        } catch (URISyntaxException urise) {
            System.err.println("Could not load URI: " + urise.getInput());
            System.err.println("Reason: " + urise.getReason());
            System.exit(1);
        } catch (IOException ioe) {
            System.err.println("Could not get FileSystem: " + ioe.getMessage());
            System.exit(1);
        }

        System.out.println("Merging " + indir + " into " + outfile);

        // Merge the files under inpath into outpath; false keeps the source files.
        try {
            boolean copySuccess = FileUtil.copyMerge(infs, inpath, outfs,
                    outpath, false, conf, addString);
            if (copySuccess) {
                System.out.println("Copy succeeded.");
            } else {
                // Stop here: there is no merged output to size-check.
                System.err.println("Copy failed.");
                System.exit(1);
            }
        } catch (IOException ioe) {
            System.err.println("File merge failed: " + ioe.getMessage());
            System.exit(1);
        }

        // Sanity check: compare the total input size with the merged output size.
        try {
            long inbytes = fileSize(inpath, conf);
            long outbytes = fileSize(outpath, conf);
            System.out.println("Read in " + inbytes + " bytes");
            System.out.println("Merged " + outbytes + " bytes");
            if (inbytes != outbytes) {
                System.err.println("Input and output file sizes differ!");
                System.exit(1);
            }
        } catch (IOException ioe) {
            System.err.println(ioe.getMessage());
            System.exit(42);
        }

    }

}
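One possible reason for the size mismatch reported by the check above, offered as an assumption and not something the post verifies: as far as I can tell from the linked source, `FileUtil.copyMerge` in this Hadoop line only copies files sitting directly under the input directory and skips subdirectories, while `fileSize` recurses into them. If the input directory contains a subdirectory, the two totals can diverge. The sketch below reproduces that difference on the local file system with `java.nio`; `SizeCheck`, `recursiveSize`, and `topLevelSize` are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical local analog of the two ways of counting input bytes.
public class SizeCheck {

    /** Bytes of all regular files under path, recursing into subdirectories. */
    public static long recursiveSize(Path path) throws IOException {
        if (!Files.isDirectory(path)) {
            return Files.size(path);
        }
        long total = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(path)) {
            for (Path entry : entries) {
                total += recursiveSize(entry);
            }
        }
        return total;
    }

    /** Bytes of the regular files directly under dir; subdirectories are skipped. */
    public static long topLevelSize(Path dir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    total += Files.size(entry);
                }
            }
        }
        return total;
    }
}
```

If the two values differ for a given input directory, comparing the merged output against the top-level total would be the fairer check under this assumption.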

