Cluster Administration Commands – Hadoop Tutorial
In Hadoop, managing and monitoring the cluster is a crucial responsibility for administrators. Hadoop provides a wide range of cluster administration commands that help administrators perform tasks such as monitoring nodes, managing services, checking cluster health, and handling user jobs.
In this tutorial, we will explain the key Hadoop cluster administration commands, their purpose, and examples for better understanding.
Cluster administration commands are typically run by users with administrative privileges; the sections below cover the most commonly used ones, from health checks to troubleshooting.
Checking the Cluster Report
Provides a summary of the cluster's status, including live nodes, dead nodes, and storage usage.
Command:
hdfs dfsadmin -report
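The report can also be scoped to a subset of DataNodes. A quick sketch, assuming a recent Hadoop release where the `-live` and `-dead` options are available:

```shell
# Full cluster summary: configured capacity, remaining space,
# and the list of live and dead DataNodes
hdfs dfsadmin -report

# Restrict the report to live DataNodes only
hdfs dfsadmin -report -live

# Restrict the report to dead DataNodes only
hdfs dfsadmin -report -dead
```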
Managing Safe Mode
Safe mode is a read-only mode in HDFS used during maintenance.
Commands:
hdfs dfsadmin -safemode enter
hdfs dfsadmin -safemode leave
hdfs dfsadmin -safemode get
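The three safe-mode commands above combine into a typical maintenance workflow. One common use is saving a fresh checkpoint of the namespace, since `hdfs dfsadmin -saveNamespace` can only run while the NameNode is in safe mode:

```shell
# Put the NameNode into safe mode so no writes modify the namespace
hdfs dfsadmin -safemode enter

# Confirm the state before proceeding (prints "Safe mode is ON")
hdfs dfsadmin -safemode get

# Save the current namespace to a new fsimage (requires safe mode)
hdfs dfsadmin -saveNamespace

# Return the cluster to normal read-write operation
hdfs dfsadmin -safemode leave
```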
Refreshing Nodes
Tells the NameNode to re-read its list of allowed and excluded hosts when nodes are added or decommissioned.
Command:
hdfs dfsadmin -refreshNodes
Decommissioning a DataNode
Removes a DataNode safely from the cluster.
Steps:
1. Add the node's hostname to the exclude file referenced by the dfs.hosts.exclude property.
2. Run:
hdfs dfsadmin -refreshNodes
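The decommissioning steps above can be sketched end to end. The exclude-file path and hostname below are examples only; use whatever path your `dfs.hosts.exclude` property in `hdfs-site.xml` actually points to:

```shell
# hdfs-site.xml must reference an exclude file, e.g.:
#   <property>
#     <name>dfs.hosts.exclude</name>
#     <value>/etc/hadoop/conf/dfs.exclude</value>   <!-- example path -->
#   </property>

# Add the hostname of the DataNode to be decommissioned (example host)
echo "datanode3.example.com" >> /etc/hadoop/conf/dfs.exclude

# Tell the NameNode to re-read the include/exclude files
hdfs dfsadmin -refreshNodes

# Monitor progress: the node's decommission status in the report moves
# from "in progress" to decommissioned once its blocks are re-replicated
hdfs dfsadmin -report
```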
Listing Open Files
Helps identify files that are currently held open by clients.
Command:
hdfs dfsadmin -listOpenFiles
Checking File System Health (fsck)
Shows detailed file and block reports, including replica locations, for debugging.
Command:
hdfs fsck /path/to/file -files -blocks -locations
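Beyond the per-file report above, fsck is commonly run over the whole namespace to spot problems early:

```shell
# Overall health summary of the entire namespace
hdfs fsck /

# List only the files that currently have corrupt blocks
hdfs fsck / -list-corruptfileblocks

# Detailed per-block report for one file, with replica locations
hdfs fsck /path/to/file -files -blocks -locations
```

Note that fsck only reports problems; unlike a traditional filesystem fsck, it does not repair them.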
Balancing Data Across DataNodes
Ensures data is evenly distributed across DataNodes.
Command:
hdfs balancer
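The balancer accepts a utilization threshold, and its network impact can be capped so it does not compete with production traffic:

```shell
# Run the balancer with a 10% threshold (the default): DataNodes whose
# utilization is within 10 percentage points of the cluster average
# are considered balanced
hdfs balancer -threshold 10

# Limit the bandwidth (in bytes/sec) each DataNode may use for balancing
hdfs dfsadmin -setBalancerBandwidth 104857600   # 100 MB/s
```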
Viewing Cluster Topology
Prints the rack topology of the cluster, showing which DataNodes belong to which rack.
Command:
hdfs dfsadmin -printTopology
Managing YARN Applications
To list and manage running applications:
yarn application -list
yarn application -kill <ApplicationID>
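The application list can be filtered by state, and a single application can be inspected before deciding to kill it. A short sketch (`<ApplicationID>` is a placeholder such as `application_1700000000000_0001`):

```shell
# List only applications that are currently running
yarn application -list -appStates RUNNING

# Show the detailed status of one application before acting on it
yarn application -status <ApplicationID>

# Kill a misbehaving application
yarn application -kill <ApplicationID>
```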
Checking Node Status
Lists all NodeManagers in the cluster along with their state and running container counts.
Command:
yarn node -list -all
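After listing the nodes, a single NodeManager can be inspected in more detail:

```shell
# List every NodeManager, including unhealthy and decommissioned ones
yarn node -list -all

# Detailed report for one node: memory and vCore usage, container count
# (<NodeID> is the host:port identifier shown in the listing above)
yarn node -status <NodeID>
```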
These commands help administrators:
Ensure smooth cluster operations.
Help in troubleshooting and debugging.
Manage resource allocation and job execution.
Maintain data security and availability.
Conclusion
Hadoop cluster administration commands provide administrators with the ability to manage, monitor, and troubleshoot the cluster effectively. From checking cluster reports to balancing data and managing jobs, these commands are vital for ensuring a healthy and optimized Hadoop ecosystem.