Sunday, April 5, 2015

HDFS Commands

1. File System Check (fsck)

Like its disk filesystem cousin, HDFS’s fsck command understands blocks. For example,
running:
% hadoop fsck / -files -blocks
will list the blocks that make up each file in the filesystem.
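fsck can also be pointed at a subtree rather than the whole namespace, and it accepts further flags that add per-block detail. As a rough sketch (the /user/hadoop path is just a placeholder for a directory of your own):

% hadoop fsck /user/hadoop -files -blocks -locations
% hadoop fsck /user/hadoop -files -blocks -locations -racks

The -locations flag lists the datanodes holding each block, and -racks adds the rack topology for those locations.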

Two properties that we set in the pseudo-distributed configuration deserve further
explanation. The first is fs.default.name, set to hdfs://localhost/, which sets the
default filesystem for Hadoop. Filesystems are specified by a URI, and here we
have used an hdfs URI to configure Hadoop to use HDFS by default. The HDFS daemons
will use this property to determine the host and port for the HDFS namenode.
We’ll be running the namenode on localhost, on the default HDFS port, 8020. HDFS clients
also use this property to work out where the namenode is running so they can connect
to it.
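
For reference, this property normally lives in Hadoop's core-site.xml. A minimal sketch (the exact location of the conf directory depends on your installation) would look like:

<?xml version="1.0"?>
<!-- core-site.xml: point Hadoop at the local HDFS namenode by default -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost/</value>
  </property>
</configuration>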
We set the second property, dfs.replication, to 1 so that HDFS doesn’t replicate
filesystem blocks by the default factor of three. When running with a single datanode,
HDFS can’t replicate blocks to three datanodes, so it would perpetually warn about
blocks being under-replicated. This setting solves that problem.
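
Correspondingly, a minimal hdfs-site.xml sketch for this single-node setup might be:

<?xml version="1.0"?>
<!-- hdfs-site.xml: only one datanode, so keep just one copy of each block -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>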
