Hadoop Job Error
오리지날초이
2013. 10. 23. 11:12
While running Hive, an error like the one below occurs.
In fact, checking the taskTracker path below on a few of the data nodes shows that more than a million files and subdirectories have accumulated under hadoop-root, and more than sixty thousand under hive-root.
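To check the counts yourself on a data node, a couple of find one-liners are enough. A minimal sketch; the local directory below is a placeholder, so substitute the mapred.local.dir value from your own mapred-site.xml:

    # Placeholder path: replace with your mapred.local.dir value
    # (there may be several comma-separated directories).
    LOCAL_DIR=/hdfs/hadoop-root/mapred/local
    # Count every file and subdirectory under the TaskTracker's
    # DistributedCache archive, where the entries below pile up.
    find "$LOCAL_DIR/taskTracker/archive" -mindepth 1 | wc -l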
Killed Tasks
Task | Complete | Start Time | Finish Time | Errors | Counters
task_201209142207_44543_m_000490 | 0.00% | 8-Oct-2013 06:56:02 | 8-Oct-2013 06:56:44 (42sec) | DiskErrorException (below) | 0
task_201209142207_44543_r_000001 | 0.00% | 8-Oct-2013 06:56:08 | 8-Oct-2013 06:56:44 (36sec) | DiskErrorException (below) | 0

Both tasks failed with the same two exceptions (each repeated verbatim in the original Errors column, deduplicated here):

org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/archive/sgam/nds/hdfs/hadoop-root/mapred/system/job_201209142207_44543/libjars/mysql-connector-java-5.1.12-bin.jar
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:343)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:143)

org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/archive/sgam/tmp/hive-root/hive_2013-10-08_06-55-10_962_211759373537171663/-mr-10003/2cffb03e-c056-4938-ac31-66d068b0fce1
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:343)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:169)
The problem is that the TaskTracker directories hold too many files; cleaning up those directories resolves it.
There is no need to delete them manually on every node: restarting mapred on the master node makes the slave nodes delete the directories automatically (with this many files the deletion itself takes a while, so watch the logs).
※ To restart, run bin/stop-mapred.sh followed by bin/start-mapred.sh from the Hadoop install directory on the master node.
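For reference, a minimal sketch of the restart sequence; HADOOP_HOME stands in for your install directory:

    # On the master node: stops the JobTracker and, via SSH,
    # every TaskTracker on the slaves.
    $HADOOP_HOME/bin/stop-mapred.sh
    # Starts them again; TaskTrackers clean out their old local
    # directories as they come back up.
    $HADOOP_HOME/bin/start-mapred.sh
    # On a slave node, follow the TaskTracker log to watch the cleanup:
    tail -f $HADOOP_HOME/logs/hadoop-*-tasktracker-*.log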
To prevent a recurrence, I set the following property in core-site.xml:
<property>
  <name>local.cache.size</name>
  <value>5368709120</value>
</property>
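For context: local.cache.size is the per-TaskTracker limit, in bytes, on the DistributedCache kept under taskTracker/archive, and 5368709120 bytes is 5 GB. Old cache entries are only purged once the limit is reached, and the Hadoop 1.x default is 10 GB, so lowering it makes the cleanup kick in sooner.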
Additionally, check in the job.xml that keep.failed.task.files is set to false.
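This matters because with keep.failed.task.files set to true, each failed task's working files are kept on local disk for debugging instead of being deleted, which fills the local directories even faster. One quick way to check a submitted job (the job.xml path below is illustrative; the same properties are also visible in the JobTracker web UI):

    # Illustrative path; point this at the affected job's job.xml.
    grep -A1 'keep.failed.task.files' /path/to/job.xml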