http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-320/org/apache/hadoop/fs/FileUtil.java#FileUtil.copyMerge%28org.apache.hadoop.fs.FileSystem%2Corg.apache.hadoop.fs.Path%2Corg.apache.hadoop.fs.FileSystem%2Corg.apache.hadoop.fs.Path%2Cboolean%2Corg.apache.hadoop.conf.Configuration%2Cjava.lang.String%29
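This is the HDFS file merge code I found at the link above.
FileUtil also has to be added separately.
But when I actually ran it, it was so slow that I don't think it is usable.
Merging on a local server is simply faster.
(Merging three 5 GB text files took 1.5 minutes locally versus 7 minutes on HDFS, but the local result differed in size from the HDFS-merged file.)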
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import java.net.URISyntaxException;
import java.io.IOException;

public class copyMerge {

    // Recursively sums the length of every file under the given path.
    public static long fileSize(Path path, Configuration conf) throws IOException {
        long retval = 0;
        FileSystem fs = path.getFileSystem(conf);
        FileStatus stat = fs.getFileStatus(path);
        if (stat.isDir()) {
            FileStatus[] dirents = fs.listStatus(path);
            for (int i = 0; i < dirents.length; i++) {
                retval += fileSize(dirents[i].getPath(), conf);
            }
        } else {
            retval += stat.getLen();
        }
        return retval;
    }

    public static void print_usage() {
        System.out.println("Usage: copyMerge <input directory> <output file name>");
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Invalid number of arguments.");
            System.out.println("");
            print_usage();
            System.exit(1);
        }
        String indir = args[0];
        String outfile = args[1];
        Path inpath = new Path(indir);
        Path outpath = new Path(outfile);
        FileSystem infs, outfs;
        infs = outfs = null;
        // String appended after each merged file; null appends nothing.
        String addString = null;

        // Resolve the source and destination file systems from the URIs.
        try {
            infs = FileSystem.get(new URI(indir), conf);
            outfs = FileSystem.get(new URI(outfile), conf);
        } catch (URISyntaxException urise) {
            System.err.println("Could not load URI: " + urise.getInput());
            System.err.println("Reason: " + urise.getReason());
            System.exit(1);
        } catch (IOException ioe) {
            System.err.println("Could not get FileSystem: " + ioe.getMessage());
            System.exit(1);
        }

        System.out.println("Merging " + indir + " into " + outfile);
        try {
            // false: keep the source files after merging.
            boolean copySuccess = FileUtil.copyMerge(infs, inpath, outfs,
                    outpath, false, conf, addString);
            if (copySuccess)
                System.out.println("Copy succeeded.");
            else
                System.out.println("Copy failed.");
        } catch (IOException ioe) {
            System.err.println("File merge failed: " + ioe.getMessage());
            System.exit(1);
        }

        // Verify the merge by comparing total input bytes with output bytes.
        try {
            long inbytes = fileSize(inpath, conf);
            long outbytes = fileSize(outpath, conf);
            System.out.println("Read in " + Long.toString(inbytes) + " bytes");
            System.out.println("Merged " + Long.toString(outbytes) + " bytes");
            if (inbytes != outbytes) {
                System.err.println("Input and output file sizes differ!");
                System.exit(1);
            }
        } catch (IOException ioe) {
            System.err.println(ioe.getMessage());
            System.exit(42);
        }
    }
}
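For comparison, the local merge mentioned above could look roughly like the sketch below. It is not the code that was actually timed in this post. It assumes the part files have already been copied to a local directory, and it uses only the JDK's java.nio.file API. The class name LocalMerge and the method merge are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not from the original post): merges local files with plain JDK calls.
public class LocalMerge {

    // Concatenates the regular files directly under inDir, in name order,
    // into outFile and returns the number of bytes written.
    public static long merge(Path inDir, Path outFile) throws IOException {
        Path outAbs = outFile.toAbsolutePath().normalize();
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(inDir)) {
            for (Path entry : entries) {
                // Skip subdirectories and the output file itself if it lives in inDir.
                boolean isOutput = entry.toAbsolutePath().normalize().equals(outAbs);
                if (Files.isRegularFile(entry) && !isOutput) {
                    parts.add(entry);
                }
            }
        }
        Collections.sort(parts); // directory listing order is unspecified, so sort by path

        long total = 0;
        try (OutputStream out = Files.newOutputStream(outFile)) {
            for (Path part : parts) {
                total += Files.copy(part, out); // streams the whole file into out
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: LocalMerge <input directory> <output file name>");
            System.exit(1);
        }
        long written = merge(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Merged " + written + " bytes");
    }
}
```

The returned byte count can be compared against the HDFS-side total printed by the program above when checking for a size difference between the two merge results.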