
Balancer

오리지날초이 2013. 12. 5. 10:29


Because the NameNode follows several rules when placing blocks, HDFS data is not always distributed uniformly (a schematic sketch follows the rules):
- The first replica is kept on the node that is writing the block.
- A replica is spread to a node on a different rack from the node holding replica 1 (for fault tolerance).
- A replica is spread to a node on the same rack as the node holding replica 1 (to reduce cross-rack network I/O).
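As a rough Java illustration of these rules (my own schematic; the Node class and selection loops are hypothetical, not the NameNode's actual block placement policy):

    import java.util.*;

    /** Schematic of the placement rules above; NOT the real placement policy. */
    public class PlacementSketch {
        static class Node {
            final String name, rack;
            Node(String name, String rack) { this.name = name; this.rack = rack; }
        }

        static List<Node> chooseReplicaTargets(Node writer, List<Node> cluster) {
            List<Node> targets = new ArrayList<>();
            targets.add(writer); // rule 1: first replica on the node writing the block
            for (Node n : cluster) // rule 2: a node on a different rack (fault tolerance)
                if (!n.rack.equals(writer.rack)) { targets.add(n); break; }
            for (Node n : cluster) // rule 3: a node on the same rack (less cross-rack I/O)
                if (n != writer && n.rack.equals(writer.rack)) { targets.add(n); break; }
            return targets;
        }
    }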

The balancer is an administrator tool for evening out HDFS utilization.
It can be run manually when datanodes fill up or when new datanodes are added, and it can be stopped with stop-balancer.sh.

The balancer accepts a threshold; unsurprisingly, a lower threshold takes more time.
Also, with a very low threshold the cluster may never reach the target at all, because the processes that were already running keep issuing HDFS reads, writes, and deletes while balancing is in progress.
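Concretely, "balanced" means that every datanode's utilization stays within the threshold of the cluster-wide utilization (this is the definition quoted in the test plan below). A minimal sketch of that check, with made-up numbers:

    /** Sketch of the balanced-cluster check implied by the threshold (illustration only). */
    public class ThresholdSketch {
        // used[i] and capacity[i] are bytes on datanode i
        static boolean isBalanced(long[] used, long[] capacity, double thresholdPercent) {
            long totalUsed = 0, totalCap = 0;
            for (int i = 0; i < used.length; i++) { totalUsed += used[i]; totalCap += capacity[i]; }
            double clusterUtil = 100.0 * totalUsed / totalCap;
            for (int i = 0; i < used.length; i++) {
                double nodeUtil = 100.0 * used[i] / capacity[i];
                if (Math.abs(nodeUtil - clusterUtil) > thresholdPercent) return false;
            }
            return true;
        }

        public static void main(String[] args) {
            // Cluster utilization = (60 + 10) / (100 + 100) = 35%. With the default 10%
            // threshold, the 60%-full node (|60 - 35| = 25 > 10) leaves the cluster unbalanced.
            System.out.println(isBalanced(new long[]{60, 10}, new long[]{100, 100}, 10.0)); // false
        }
    }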

Balancing fundamentally moves blocks from highly utilized datanodes to less utilized ones.
The transfer process is managed in units called iterations, and progress is recorded in a log
(hadoop/logs/hadoop-user-balancer.out).

The amount of data processed in each iteration is capped at the lesser of 10 GB or the threshold fraction of the node's capacity,
and even if that amount has not been fully balanced within one iteration, the balancer moves on to the next iteration after 20 minutes.
When an iteration ends, the balancer refreshes its block information from the NameNode.
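Reading the per-iteration cap back as arithmetic (my illustration of the stated rule, not balancer source code):

    /** Per-iteration move cap as described above (illustration only). */
    public class IterationCapSketch {
        static long bytesPerIteration(long nodeCapacityBytes, double thresholdPercent) {
            long tenGB = 10L << 30; // 10 GB in bytes
            long thresholdBytes = (long) (nodeCapacityBytes * thresholdPercent / 100.0);
            return Math.min(tenGB, thresholdBytes); // lesser of 10 GB or threshold * capacity
        }

        public static void main(String[] args) {
            // A 4 TB datanode with the default 10% threshold: min(10 GB, 400 GB) = 10 GB,
            // so one iteration (at most 20 minutes) moves at most 10 GB to or from that node.
            System.out.println(bytesPerIteration(4L << 40, 10.0)); // 10737418240
        }
    }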

The balancer's bandwidth can be configured in hdfs-site.xml (the dfs.balance.bandwidthPerSec property, quoted in full in the test plan below).
The configured value applies to each node individually, and the unit is Bps (bytes per second).
The default is 1 MB/s; keep in mind that even if you raise the bandwidth value substantially, the balancer will generally end up competing with other applications for resources
(so you may not see the full configured throughput).
The important point is that HDFS must be restarted after changing the configuration.

Running multiple instances of the balancer is prohibited by the tool itself
(since block information must not be duplicated across balancers).

The balancer exits automatically if even one of the following applies:
- there is nothing left to do, i.e. balancing is complete;
- no block can be moved even after five iterations;
- an IOException occurs while communicating with the NameNode;
- another balancer is already running.

If you want to test the balancer, the following must be checked:
- After the run, is any datanode over- or under-utilized?
- Did any data loss occur?
- Did any missing blocks appear?
- Did any over-replication or under-replication occur?

The detailed balancing process is as follows (the translation may not be exact; a schematic sketch follows the list):
1. The rebalancing server (the server that started balancing) requests a datanode report from the namenode.
2. The rebalancing server selects the source and destination nodes, then requests the partial block map needed for balancing from the namenode.
3. The rebalancing server selects the source and destination blocks, then instructs each source node to move its block to the destination.
4. The source node instructs the destination node to copy the block.
   -> Does the source node really issue this directly? Isn't that the namenode's role?
5. After the copy completes, the destination node reports copy-done to the namenode.
6. It also reports copy-done to the source node.
7. The source node deletes the block, then reports delete-done to the namenode.
8. It also reports delete-done to the rebalancing server.
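The message sequence above, sketched in Java-ish form (hypothetical interfaces and method names, not real Hadoop APIs; per the description, the rebalancing server picks the pairs and the source node drives the copy itself):

    /** Schematic of the block-move message flow above (hypothetical names). */
    public class MoveFlowSketch {
        interface Namenode { void getDatanodeReport(); void getPartialBlockMap(); }
        interface Datanode {
            void copyBlockTo(String block, Datanode destination);
            void delete(String block);
        }

        static void rebalanceOneBlock(Namenode nn, Datanode source, Datanode destination, String block) {
            nn.getDatanodeReport();                 // 1. rebalancing server asks for a datanode report
            nn.getPartialBlockMap();                // 2. pick source/destination nodes, fetch the block map
            source.copyBlockTo(block, destination); // 3-4. the source is told to move the block and
                                                    //      instructs the destination to copy it
            // 5-6. the destination reports copy-done to the namenode and to the source node
            source.delete(block);                   // 7. the source deletes its replica
            // 7-8. the source reports delete-done to the namenode and to the rebalancing server
        }
    }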

The related documentation is available at http://hadoop.apache.org/docs/r1.0.4/hdfs_user_guide.html#Rebalancer.


Balancer Test Plan

  • Feature(s) Tested: enumerate the feature(s)

    • Which Jira issue(s)?

    • Package and classes tested & corresponding Junit test class name(s)

    • package org.apache.hadoop.hdfs.server.balancer (test package)
    • What is the feature?

      HDFS data might not always be placed uniformly across the DataNodes. One common reason is the addition of new DataNodes to an existing cluster. While placing new blocks (data for a file is stored as a series of blocks), the NameNode considers various parameters before choosing the DataNodes to receive these blocks. Some of the considerations are:

      • Policy to keep one of the replicas of a block on the same node as the node that is writing the block.
      • Need to spread different replicas of a block across the racks so that cluster can survive loss of whole rack.
      • One of the replicas is usually placed on the same rack as the node writing to the file so that cross-rack network I/O is reduced.
      • Spread HDFS data uniformly across the DataNodes in the cluster.
      Due to multiple competing considerations, data might not be uniformly placed across the DataNodes. HDFS provides a tool for administrators that analyzes block placement and rebalances data across the DataNodes. A brief administrator's guide for the rebalancer is available as a PDF. For command usage, see the balancer command.

      The balancer is a tool that balances disk space usage on an HDFS cluster when some datanodes become full or when new empty nodes join the cluster. The tool is deployed as an application program that can be run by the cluster administrator on a live HDFS cluster while applications are adding and deleting files.

      DESCRIPTION

      The threshold parameter is a fraction in the range of (0%, 100%) with a default value of 10%. The threshold sets a target for whether the cluster is balanced. A cluster is balanced if, for each datanode, the utilization of the node (ratio of used space at the node to total capacity of the node) differs from the utilization of the cluster (ratio of used space in the cluster to total capacity of the cluster) by no more than the threshold value. The smaller the threshold, the more balanced a cluster will become. It takes more time to run the balancer for small threshold values. Also, for a very small threshold the cluster may not be able to reach the balanced state when applications write and delete files concurrently.

      The tool moves blocks from highly utilized datanodes to poorly utilized datanodes iteratively. In each iteration a datanode moves or receives no more than the lesser of 10 GB or the threshold fraction of its capacity. Each iteration runs no more than 20 minutes. At the end of each iteration, the balancer obtains updated datanode information from the namenode.

      A system property that limits the balancer's use of bandwidth is defined in the default configuration file:

      <property>
        <name>dfs.balance.bandwidthPerSec</name>
        <value>1048576</value>
        <description>
          Specifies the maximum bandwidth that each datanode can utilize
          for the balancing purpose in terms of the number of bytes per second.
        </description>
      </property>

      This property determines the maximum speed at which a block will be moved from one datanode to another. The default value is 1MB/s. The higher the bandwidth, the faster a cluster can reach the balanced state, but with greater competition with application processes. If an administrator changes the value of this property in the configuration file, the change is observed when HDFS is next restarted.
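      For a sense of scale: at the default 1 MB/s per datanode, draining 100 GB off one node takes on the order of 100,000 seconds, roughly 28 hours; setting the value to 10485760 (10 MB/s) would cut that to under 3 hours, at the cost of more competition with application traffic.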

      MONITORING BALANCER PROGRESS

      After the balancer is started, an output file name where the balancer progress will be recorded is printed on the screen. The administrator can monitor the running of the balancer by reading the output file. The output shows the balancer's status iteration by iteration. In each iteration it prints the starting time, the iteration number, the total number of bytes that have been moved in the previous iterations, the total number of bytes that are left to move in order for the cluster to be balanced, and the number of bytes that are being moved in this iteration. Normally "Bytes Already Moved" is increasing while "Bytes Left To Move" is decreasing.

      Running multiple instances of the balancer in an HDFS cluster is prohibited by the tool.

      The balancer automatically exits when any of the following five conditions is satisfied:

      1. The cluster is balanced;
      2. No block can be moved;
      3. No block has been moved for five consecutive iterations;
      4. An IOException occurs while communicating with the namenode;
      5. Another balancer is running.

      Upon exit, a balancer returns an exit code and prints one of the following messages to the output file, corresponding to the above exit reasons:

      1. The cluster is balanced. Exiting...
      2. No block can be moved. Exiting...
      3. No block has been moved for 5 iterations. Exiting...
      4. Received an IO exception: failure reason. Exiting...
      5. Another balancer is running. Exiting...

      The administrator can interrupt the execution of the balancer at any time by running the command "stop-balancer.sh" on the machine where the balancer is running.

    • What is the externally visible view of the feature?

    • To start:
           bin/start-balancer.sh [-threshold <threshold>]
           Example: bin/start-balancer.sh
                          starts the balancer with a default threshold of 10%
                    bin/start-balancer.sh -threshold 5
                          starts the balancer with a threshold of 5%
      To stop:
           bin/stop-balancer.sh

      Where threshold is a fraction in the range (0%, 100%) with a default value of 10%. The threshold sets a target for whether the cluster is balanced. A cluster is balanced if, for each datanode, the utilization of the node (ratio of used space at the node to total capacity of the node) differs from the utilization of the cluster (ratio of used space in the cluster to total capacity of the cluster) by no more than the threshold value. The smaller the threshold, the more balanced the cluster will become.

  • Risk Scenarios: enumerate the bad things that could happen in the system that either:

    • Could be caused by the feature
    • This feature is a tool that balances disk space usage on an HDFS cluster when some datanodes become full or when new empty nodes join the cluster. A bug in this feature could cause:

      • Datanodes becoming over-utilized or under-utilized
      • Data loss
      • Missing blocks
      • Over-replication or under-replication of blocks while balancing nodes
    • Could have an effect on the feature
  • Test Cases: enumerate all tests in tables


    1. Balancer tests

      All balancer tests are run with a threshold of 10% unless otherwise noted. All nodes have the same capacity unless otherwise noted. Test cases 3, 4, and 5 are automatic while all the other cases are manual. All tests are expected to meet the following requirements unless otherwise noted.

      • REQ1. Rebalancing does not cause the loss of a block;
      • REQ2. Rebalancing does not change the number of replicas that a block had;
      • REQ3. Rebalancing does not decrease the number of racks that a block had.
      • REQ4. The rebalancing process makes the cluster less and less imbalanced. (The algorithm should try its best to satisfy this but provides no guarantee.)

      | Id | Type of Test | Description | Expected Behavior | Is Automated |
      |----|--------------|-------------|-------------------|--------------|
      | Balancer_01 | Positive | Start the balancer and check if the cluster is balanced after the run. | Cluster should be in a balanced state | No |
      | Balancer_02 | Positive | Test a cluster with even distribution, then add a new empty node to the cluster. | Balancer should automatically start balancing the cluster by loading data onto the empty node | No |
      | Balancer_03 | Positive | Bring up a one-node dfs cluster. Set files' replication factor to 1 and fill the node to 30% full. Then add an empty datanode. | The old node is 25% utilized and the new node is 5% utilized. | Yes |
      | Balancer_04 | Positive | The same as 03 except that the empty new datanode is on a different rack. | The same as 03 | Yes |
      | Balancer_05 | Positive | The same as 03 except that the empty new datanode has half the capacity of the old one. | The old one is 25% utilized and the new one is 10% utilized | Yes |
      | Balancer_06 | Positive | Bring up a 2-node cluster and fill one node to 60% and the other to 10% full. All nodes are on different racks. | One node is 40% utilized and the other is 30% utilized | No |
      | Balancer_07 | Positive | Bring up a dfs cluster with nodes A and B. Set files' replication factor to 2 and fill the cluster to 30% full. Then add an empty datanode C. All three nodes are on the same rack. | The old ones are 25% utilized and the new one is 10% | No |
      | Balancer_08 | Positive | The same as test case 7 except that A, B, and C are on different racks. | The same as above | No |
      | Balancer_09 | Positive | The same as test case 7 except that rebalancing is interrupted. | The cluster is less imbalanced | |
      | Balancer_10 | Positive | Restart rebalancing until it is done. | The same as 7 | No |
      | Balancer_11 | Positive | The same as test case 7 except that the namenode is shut down while rebalancing. | Rebalancing is interrupted | No |
      | Balancer_12 | Positive | The same as test case 5 except that writes occur while rebalancing. | The cluster most likely becomes balanced, but may fluctuate | No |
      | Balancer_13 | Positive | The same as test case 5 except that deletes occur while rebalancing. | The same as above | No |
      | Balancer_14 | Positive | The same as test case 5 except that writes & deletes occur while rebalancing. | The same as above | No |
      | Balancer_15 | Positive | Scalability test: populate a 750-node cluster. (1) Run rebalancing after 3 nodes are added. (2) Run rebalancing after 2 racks of nodes (60 nodes) are added. (3) Run rebalancing after 2 racks of nodes are added, with file writing/deleting running at the same time. | The cluster becomes balanced; file I/O performance should not be noticeably slower. | No |
      | Balancer_16 | Positive | Start the balancer with a negative threshold value. | Command execution error output: "Expect a double parameter in the range of [0, 100]: -10", "Usage: java Balancer [-threshold <threshold>] percentage of disk capacity", "Balancing took __ milliseconds" | No |
      | Balancer_17 | Positive | Start the balancer with out-of-range threshold values, e.g. (-123, 0, -324, 100000, -1222222, 1000000000, -10000, 345, 989). | Exit with an error message | No |
      | Balancer_18 | Positive | Start the balancer with alpha-numeric threshold values (e.g. 103dsf, asd234, asfd, ASD, #$asd, 2345&, $35, %34). | Exit with an error message | No |
      | Balancer_19 | Positive | Start 2 instances of the balancer on the same gateway. | Exit with an error message | No |
      | Balancer_20 | Positive | Start 2 instances of the balancer on two different gateways. | Exit with an error message | No |
      | Balancer_21 | Positive | Start the balancer when the cluster is already balanced. | Balancer should print information about all nodes in the cluster and exit with a "cluster is balanced" status. | No |
      | Balancer_22 | Positive | Run the balancer with half the datanodes not running. | | No |
      | Balancer_23 | Positive | Run the balancer while simultaneously simulating load on the cluster, with half the datanodes not running. | | No |


    2. Block Replacement Protocol Test

       First set up a 3-node cluster with nodes NA, NB, and NC, which are on different racks. Then create a file with one block B with a replication factor of 3. Finally, add a new node ND to the cluster on the same rack as NC.

       | Id | Type of Test | Description | Expected Behavior | Is Automated |
       |----|--------------|-------------|-------------------|--------------|
       | ProtocolTest_01 | Positive | Copy block B from ND to NA with del hint NC | Fail, because the proxy source ND does not have the block | No |
       | ProtocolTest_02 | Positive | Copy block B from NA to NB with del hint N | Fail, because the destination NB contains the block | No |
       | ProtocolTest_03 | Positive | Copy block B from NA to ND with del hint NB | Succeed; now block B is on NA, NC, and ND | No |
       | ProtocolTest_04 | Positive | Copy block B from NB to NC with del hint NA | Succeed, but NA is not a valid del hint. So block B is on NA and NB. The third replica is either on NC or ND | No |


    3. Throttling test (a sketch of such a throttler follows the table)

       | Id | Type of Test | Description | Expected Behavior | Is Automated |
       |----|--------------|-------------|-------------------|--------------|
       | ThrottlingTest_01 | Positive | Create a throttler with 1 MB/s bandwidth. Send 6 MB of data, throttling at the 0.5 MB mark, the 0.75 MB mark, and at the end. | The actual bandwidth should be less than or equal to the expected bandwidth of 1 MB/s | No |
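       The test above exercises a bandwidth throttler (Hadoop has a DataTransferThrottler class for this purpose). The sketch below shows the general shape of such a throttler, counting bytes and sleeping whenever the sender gets ahead of the configured rate; it is an illustration only, not the Hadoop class:

           /** Minimal bandwidth-throttler sketch (illustration; not Hadoop's DataTransferThrottler). */
           public class ThrottlerSketch {
               private final long bytesPerSecond;
               private long windowStart = System.currentTimeMillis();
               private long bytesThisWindow = 0;

               ThrottlerSketch(long bytesPerSecond) { this.bytesPerSecond = bytesPerSecond; }

               /** Call after sending numBytes; sleeps if the sender is ahead of the configured rate. */
               synchronized void throttle(long numBytes) throws InterruptedException {
                   bytesThisWindow += numBytes;
                   long elapsedMs = System.currentTimeMillis() - windowStart;
                   long expectedMs = bytesThisWindow * 1000 / bytesPerSecond; // time these bytes should take
                   if (expectedMs > elapsedMs) Thread.sleep(expectedMs - elapsedMs);
                   if (elapsedMs >= 1000) { // start a new accounting window roughly every second
                       windowStart = System.currentTimeMillis();
                       bytesThisWindow = 0;
                   }
               }
           }

       In the scenario above, throttle() would be called at the 0.5 MB and 0.75 MB marks and after the final byte, and the measured rate over the 6 MB transfer should stay at or below 1 MB/s.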


    4. Namenode Protocol Test: getBlocks

       Set up a 2-node cluster and create a file with a length of 2 blocks and a replication factor of 2.

       | Id | Type of Test | Description | Expected Behavior | Is Automated |
       |----|--------------|-------------|-------------------|--------------|
       | NamenodeProtocolTest_01 | Positive | Get blocks from datanode 0 with a size of 2 blocks. | Return 2 blocks | No |
       | NamenodeProtocolTest_02 | Positive | Get blocks from datanode 0 with a size of 1 block. | Return 1 block | No |
       | NamenodeProtocolTest_03 | Positive | Get blocks from datanode 0 with a size of 0. | Receive an IOException | No |
       | NamenodeProtocolTest_04 | Positive | Get blocks from datanode 0 with a size of -1. | Receive an IOException | No |
       | NamenodeProtocolTest_05 | Positive | Get blocks from a non-existent datanode. | Receive an IOException | No |



Hairong Kuang added a comment - 26/Jul/07 06:17 - edited

Here are some of my initial thoughts. Please comment.

1. What's balance?
A cluster is balanced iff there are no under-capacity or over-capacity data nodes in the cluster.
An under-capacity data node is a node whose %used space is less than avg_%used_space - threshold.
An over-capacity data node is a node whose %used space is greater than avg_%used_space + threshold.
The threshold is user configurable. A default value could be 20% of % used space.
(For example, with avg_%used_space = 50% and threshold = 20%, a node below 30% used is under-capacity and a node above 70% used is over-capacity.)

2. When to rebalance?
Rebalancing is performed on demand. An administrator issues a command to trigger rebalancing. Rebalancing automatically shuts off once the cluster is balanced and can also be interrupted by an administrator. The following commands are to be supported:
hadoop dfsadmin balance <start/stop/get>
----- Start/stop data block rebalancing or query its status.

3. How to balance?
(a) Upon receiving a data block rebalancing request, a name node creates a Balancing thread. 
(b) The thread performs rebalancing iteratively (a schematic sketch of the loop follows below):

  1. At each iteration, it scans the whole data node list and schedules block moving tasks. It sleeps for a heartbeat interval between iterations;
  2. When scanning the data node list, if it finds an under-capacity data node, it schedules moving blocks to that data node. The source data node is chosen randomly from over-capacity data nodes, or from non-under-capacity data nodes if no over-capacity data node exists. The source block is randomly chosen from the source data node, as long as the move does not violate requirement (1).
  3. If the thread finds an over-capacity data node, it schedules moving blocks from that data node to other data nodes. It chooses a target data node randomly from under-capacity data nodes, or from non-over-capacity data nodes when there is no under-capacity data node. It then randomly chooses a source block that does not violate requirement (1).
  4. The scheduled tasks are put into a queue on the source data node. The task queue has a limited length of 4 by default and is configurable.
  5. The scheduled tasks are sent to data nodes to execute in response to a heartbeat message. Currently dfs limits this to at most 2 tasks per heartbeat by default.

(c) The thread stops and frees itself when the cluster becomes balanced.
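A rough Java sketch of the loop in (b), mirroring the comment (hypothetical types and names; this was a design proposal, so none of this is shipped code):

    import java.util.*;

    /** Sketch of the proposed rebalancing iteration in (b) above (hypothetical types). */
    public class RebalanceLoopSketch {
        static class DataNode {
            double usedPercent;
            Deque<String> taskQueue = new ArrayDeque<>(); // capped at 4 tasks by default
        }

        static void oneIteration(List<DataNode> nodes, double avgUsedPercent, double threshold) {
            for (DataNode dn : nodes) {
                if (dn.usedPercent < avgUsedPercent - threshold) {
                    // under-capacity: schedule moving blocks TO this node; the source is picked
                    // randomly from over-capacity nodes (or non-under-capacity ones if none exist)
                    scheduleMove(pickSourceFor(dn, nodes), dn);
                } else if (dn.usedPercent > avgUsedPercent + threshold) {
                    // over-capacity: schedule moving blocks FROM this node; the target is picked
                    // randomly from under-capacity nodes (or non-over-capacity ones if none exist)
                    scheduleMove(dn, pickTargetFor(dn, nodes));
                }
            }
            // Scheduled tasks are handed to datanodes in response to heartbeats, at most 2 per
            // heartbeat; the thread then sleeps for a heartbeat interval before the next pass.
        }

        static void scheduleMove(DataNode source, DataNode target) {
            if (source.taskQueue.size() < 4) source.taskQueue.add("move block to " + target);
        }

        // Stubs standing in for the random selection described in steps 2 and 3.
        static DataNode pickSourceFor(DataNode under, List<DataNode> nodes) { return nodes.get(0); }
        static DataNode pickTargetFor(DataNode over, List<DataNode> nodes) { return nodes.get(0); }
    }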





