Revision as of 12:51, 2 June 2011

Using native libraries in cygwin

The problem seems to occur when running in local mode. When running locally, add the jars to the classpath and also to libjars. Watch the delimiters.
First, export HADOOP_CLASSPATH=a.jar:b.jar
- HADOOP_CLASSPATH tends to be used to add to bin/hadoop's classpath. Because of the way the comment is written, administrators who customize hadoop-env.sh often inadvertently disable users' ability to use it, by not including the present value of the variable.
Then, when running hadoop, add the option -libjars a.jar,b.jar
- Specify comma separated jar files to include in the classpath. Applies only to job.
Note that the delimiter is : when adding to the classpath, but , for libjars
- http://hadoop.apache.org/common/docs/r0.20.2/commands_manual.html
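The delimiter difference is easy to get wrong, so here is a minimal sketch that builds both values from one jar list. The class JarListSketch and its two methods are hypothetical helpers written for this note, not part of Hadoop; a.jar and b.jar are the placeholder names used above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a Hadoop class): formats one jar list
// with the delimiter each setting expects.
class JarListSketch {
    // Value for HADOOP_CLASSPATH: entries joined with ':'.
    static String classpathValue(List<String> jars) {
        return String.join(":", jars);
    }

    // Value for the -libjars option: entries joined with ','.
    static String libjarsValue(List<String> jars) {
        return String.join(",", jars);
    }

    public static void main(String[] args) {
        List<String> jars = Arrays.asList("a.jar", "b.jar");
        System.out.println("export HADOOP_CLASSPATH=" + classpathValue(jars));
        System.out.println("-libjars " + libjarsValue(jars));
    }
}
```

Generating both strings from the same list keeps the two settings in sync when a jar is added or removed.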

counter	Map	Reduce
FILE_BYTES_READ	Bytes read from each filesystem by map tasks	Bytes read from each filesystem by reduce tasks. Mostly the data read during the shuffle phase, it seems? Same size as FILE_BYTES_WRITTEN
FILE_BYTES_WRITTEN	Bytes written to each filesystem by each task
HDFS_BYTES_READ	Bytes read from HDFS. Slightly larger than Map input bytes	Mostly 0
HDFS_BYTES_WRITTEN	Mostly 0. When there are no reducer tasks, the map task output is written directly to HDFS and matches Map output bytes?	Reducer output is ultimately stored in HDFS. This is the size of the final reducer output stored in HDFS

The number of map tasks can also be increased manually using the JobConf's conf.setNumMapTasks(int num). This can be used to increase the number of map tasks, but will not set the number below that which Hadoop determines via splitting the input data.
- http://wiki.apache.org/hadoop/HowManyMapsAndReduces
mapred.map.tasks : a value smaller than the number of map tasks Hadoop actually computes from the input splits is ignored
mapred.jobtracker.maxtasks.per.job : the value configured on the jobtracker daemon applies; a value set on the client is ignored
mapred.tasktracker.map.tasks.maximum : likewise, the value configured on the tasktracker applies
However, the FileSystem blocksize of the input files is treated as an upper bound for input splits.
- In practice, however, setting mapred.min.split.size larger than the FileSystem block size works normally, with a message like the following
  - 11/06/02 21:19:41 INFO net.NetworkTopology: Adding a new node: /default-rack/10.25.31.118:50010
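Both observations (a small mapred.map.tasks being ignored, and mapred.min.split.size overriding the block size) are consistent with the split-size rule that, as far as I understand it, the old org.apache.hadoop.mapred FileInputFormat in the 0.20 line applies: max(minSize, min(goalSize, blockSize)). The sketch below reproduces only that arithmetic. The class name SplitSizeSketch and the 64 MB block size and 640 MB input are assumptions chosen for illustration, and the code does not call Hadoop.

```java
// Illustration only: mirrors the split-size arithmetic, not Hadoop itself.
class SplitSizeSketch {
    // goalSize:  total input bytes / requested number of maps (mapred.map.tasks)
    // minSize:   mapred.min.split.size
    // blockSize: FileSystem block size of the input file
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        long blockSize = 64 * MB;   // assumed block size
        long totalSize = 640 * MB;  // assumed total input size

        // Asking for fewer maps than there are blocks: the block size still caps the split.
        System.out.println(computeSplitSize(totalSize / 2, 1, blockSize) / MB + " MB");

        // Asking for more maps than there are blocks: splits shrink below the block size.
        System.out.println(computeSplitSize(totalSize / 20, 1, blockSize) / MB + " MB");

        // A minimum split size above the block size wins over the block-size cap.
        System.out.println(computeSplitSize(totalSize / 2, 128 * MB, blockSize) / MB + " MB");
    }
}
```

Under this rule the block size is an upper bound only while the minimum split size stays below it, which would explain why the job above ran normally.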
