How to Upload Input File to Hadoop Mac
The Hadoop file system gives you an advantage because it stores the data in multiple copies. It is also a cost-effective solution for any business that wants to store its data efficiently. HDFS operations act as the key to open the vaults in which you store the data, making it available from remote locations.
Following are the topics covered in this HDFS tutorial:
- Starting HDFS
- Read & Write Operations in HDFS
- List Files in HDFS
- Inserting Data into HDFS
- Retrieving Data from HDFS
- Shutting Down the HDFS
- Creating User Account
- Mapping the nodes
- Configuring Key-Based Login
- Installation of Hadoop
- Configuring Hadoop on Master Server
- Configuring Hadoop
- Start the DataNode on New Node
- Removing a DataNode
- Advantages of Learning HDFS Operations
Starting HDFS
Format the configured HDFS file system, then open the NameNode (HDFS server) and execute the following command.
$ hadoop namenode -format
Start the distributed file system with the command listed below; it starts the NameNode as well as the DataNodes in the cluster.
$ start-dfs.sh
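As a quick sanity check (assuming the JDK's jps tool is on your PATH), you can confirm that the HDFS daemons came up:
$ jps
On a single-node setup, the output should list the NameNode, SecondaryNameNode, and DataNode processes; the process IDs will differ on your machine.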
Read & Write Operations in HDFS
You can execute almost all of the operations on the Hadoop Distributed File System that can be executed on a local file system. You can perform various read and write operations such as creating a directory, setting permissions, copying files, updating files, deleting files, etc. You can also add access rights and scan the file system to get cluster information such as the number of dead nodes, live nodes, space used, etc.
HDFS Operations to Read a File
To read any file from HDFS, you have to interact with the NameNode, as it stores the metadata about the DataNodes. The user gets a token from the NameNode that specifies the address where the data is stored.
You send a read request to the NameNode for a particular block location through the distributed file system. The NameNode then checks your privilege to access the DataNode and, if the access is valid, lets you read the data from the block address.
$ hadoop fs -cat <file>
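For instance, assuming a file at /user/input/intellipaat.txt already exists in HDFS (a hypothetical path used only for illustration), the read looks like this:
$ hadoop fs -cat /user/input/intellipaat.txt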
HDFS Operations to Write a File
Similar to the read operation, the HDFS write operation is used to write a file at a particular address through the NameNode. The NameNode provides the slave (DataNode) address where the client can write or append data. After the block is written, the slave replicates that block and copies it to another slave location using the replication factor of 3. An acknowledgment is then sent back to the client for authentication.
The process for accessing the NameNode is pretty similar to that of the read operation. Before writing, you can list the target path in HDFS:
bin/hdfs dfs -ls <path>
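The write itself is usually performed with the put or copyFromLocal command. A minimal sketch, assuming a local file /home/hadoop/sample.txt (hypothetical path) and the /user/input directory created later in this tutorial:
$ hadoop fs -put /home/hadoop/sample.txt /user/input/
$ hadoop fs -copyFromLocal /home/hadoop/sample.txt /user/input/
Either command copies the local file into HDFS, after which the NameNode and DataNodes handle block placement and replication as described above.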
Listing Files in HDFS
You can find the list of files in a directory and the status of a file using the 'ls' command in the terminal. The ls command can be passed a directory or a filename as an argument, as follows:
$ $HADOOP_HOME/bin/hadoop fs -ls <args>
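For example, you can list a directory or a single file; the paths here are illustrative:
$ $HADOOP_HOME/bin/hadoop fs -ls /user/input
$ $HADOOP_HOME/bin/hadoop fs -ls /user/input/intellipaat.txt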
Inserting Data into HDFS
Follow the steps below to insert the required file into the Hadoop file system.
Step 1: Create an input directory.
$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/input
Step 2: Use the put command to transfer and store the data file from the local system to HDFS, using the following command in the terminal.
$ $HADOOP_HOME/bin/hadoop fs -put /home/intellipaat.txt /user/input
Step 3: Verify the file using the ls command.
$ $HADOOP_HOME/bin/hadoop fs -ls /user/input
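The output format (the values shown here are purely illustrative) resembles the following, listing permissions, replication factor, owner, group, size, modification time, and path:
Found 1 items
-rw-r--r--   3 hadoop supergroup       1024 2014-10-11 10:58 /user/input/intellipaat.txt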
Retrieving Data from HDFS
For example, suppose you have a file in HDFS called intellipaat. You can retrieve the required file from the Hadoop file system by carrying out the following steps.
Step 1: View the data from HDFS using the cat command.
$ $HADOOP_HOME/bin/hadoop fs -cat /user/output/intellipaat
Step 2: Get the file from HDFS to the local file system using the get command, as shown below.
$ $HADOOP_HOME/bin/hadoop fs -get /user/output/ /home/hadoop_tp/
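copyToLocal is an equivalent command that is restricted to local file system destinations; either form copies the HDFS directory /user/output into the local /home/hadoop_tp/ directory:
$ $HADOOP_HOME/bin/hadoop fs -copyToLocal /user/output/ /home/hadoop_tp/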
Shutting Down the HDFS
Shut down HDFS using the command below.
$ stop-dfs.sh
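If the MapReduce daemons were started as well (for example via start-all.sh, used later in this tutorial), stop-all.sh brings down the whole stack in one go:
$ stop-all.sh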
Multi-Node Cluster
Installing Java
Verify the Java installation with the java -version command:
$ java -version
The following output is presented:
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b13)
Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)
Creating User Account
A system user account should be created on both the master and slave systems for the Hadoop installation.
# useradd hadoop
# passwd hadoop
Mapping the nodes
The hosts file in the /etc/ folder should be edited on every node, and the IP address of each system followed by its hostname must be specified.
# vi /etc/hosts
Enter the following lines in the /etc/hosts file.
192.168.1.109 hadoop-master
192.168.1.145 hadoop-slave-1
192.168.56.1 hadoop-slave-2
Configuring Key-Based Login
SSH should be set up on each node so that the nodes can communicate with one another without any prompt for a password.
# su hadoop
$ ssh-keygen -t rsa
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-master
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-2
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
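To verify that key-based login works before moving on, run a remote command as the hadoop user; the host names are the ones configured in /etc/hosts above:
$ ssh hadoop-slave-1 hostname
$ ssh hadoop-slave-2 hostname
If the keys were copied correctly, each command prints the remote hostname without prompting for a password.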
Installation of Hadoop
Hadoop should be downloaded on the master server using the following procedure.
# mkdir /opt/hadoop
# cd /opt/hadoop/
# wget http://apache.mesi.com.ar/hadoop/common/hadoop-1.2.1/hadoop-1.2.0.tar.gz
# tar -xzf hadoop-1.2.0.tar.gz
# mv hadoop-1.2.0 hadoop
# chown -R hadoop /opt/hadoop
# cd /opt/hadoop/hadoop/
Configuring Hadoop
The Hadoop server must be configured in core-site.xml, which should be edited as required.
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master:9000/</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
The hdfs-site.xml file should be edited as follows:
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop/hadoop/dfs/name/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/opt/hadoop/hadoop/dfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
The mapred-site.xml file should be edited as per the requirement; an example is shown below.
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop-master:9001</value>
  </property>
</configuration>
JAVA_HOME, HADOOP_CONF_DIR, and HADOOP_OPTS should be edited as follows:
export JAVA_HOME=/opt/jdk1.7.0_17
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export HADOOP_CONF_DIR=/opt/hadoop/hadoop/conf
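In Hadoop 1.x these variables normally go into the hadoop-env.sh file under the configuration directory; a minimal sketch, assuming the layout used in this tutorial:
$ vi /opt/hadoop/hadoop/conf/hadoop-env.sh
Append the three export lines shown above, save the file, and open a new shell so the settings take effect.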
Installing Hadoop on Slave Servers
Hadoop should be installed on all the slave servers.
# su hadoop
$ cd /opt/hadoop
$ scp -r hadoop hadoop-slave-1:/opt/hadoop
$ scp -r hadoop hadoop-slave-2:/opt/hadoop
Configuring Hadoop on Master Server
Master server configuration:
# su hadoop
$ cd /opt/hadoop/hadoop

Master Node Configuration
$ vi etc/hadoop/masters
hadoop-master
Slave Node Configuration
$ vi etc/hadoop/slaves
hadoop-slave-1
hadoop-slave-2
NameNode Format on Hadoop Master
# su hadoop
$ cd /opt/hadoop/hadoop
$ bin/hadoop namenode -format
11/10/14 10:58:07 INFO namenode.NameNode: STARTUP_MSG:
************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = hadoop-master/192.168.1.109
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.0
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1479473; compiled by 'hortonfo' on Mon May 6 06:59:37 UTC 2013
STARTUP_MSG: java = 1.7.0_71
************************************************************
11/10/14 10:58:08 INFO util.GSet: Computing capacity for map BlocksMap editlog=/opt/hadoop/hadoop/dfs/name/current/edits
………………………………………………….
11/10/14 10:58:08 INFO common.Storage: Storage directory /opt/hadoop/hadoop/dfs/name has been successfully formatted.
11/10/14 10:58:08 INFO namenode.NameNode: SHUTDOWN_MSG:
************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop-master/192.168.1.15
************************************************************
Hadoop Services
Start the Hadoop services on hadoop-master with the following commands:
$ cd $HADOOP_HOME/sbin
$ start-all.sh
The procedure for adding a new DataNode to the Hadoop cluster is as follows.
Networking
Add new nodes to an existing Hadoop cluster with a suitable network configuration. Consider the following network configuration for the new node:
IP address : 192.168.1.103
netmask    : 255.255.255.0
hostname   : slave3.in
Adding a User and SSH Access
On the new node, add a "hadoop" user, give it the required access, and set the Hadoop user's password to anything you want.
useradd hadoop
passwd hadoop
To be executed on the master:
mkdir -p $HOME/.ssh
chmod 700 $HOME/.ssh
ssh-keygen -t rsa -P '' -f $HOME/.ssh/id_rsa
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
chmod 644 $HOME/.ssh/authorized_keys
Copy the public key to the new slave node in the hadoop user's $HOME directory:
scp $HOME/.ssh/id_rsa.pub hadoop@192.168.1.103:/home/hadoop/
To be executed on the slave:
su hadoop
ssh -X hadoop@192.168.1.103
The content of the public key must be copied into the file "$HOME/.ssh/authorized_keys", and the permissions must be changed as required.
cd $HOME
mkdir -p $HOME/.ssh
chmod 700 $HOME/.ssh
cat id_rsa.pub >> $HOME/.ssh/authorized_keys
chmod 644 $HOME/.ssh/authorized_keys
Verify the SSH login from the master machine; it should now be possible to ssh to the new node from the master without a password.
ssh hadoop@192.168.1.103 or hadoop@slave3.in
Setting Hostname for New Node
The hostname is set in the file /etc/sysconfig/network. On the new slave3 machine:
NETWORKING=yes
HOSTNAME=slave3.in
The machine must be restarted, or the hostname command should be run on the new machine with the corresponding hostname, to make the change effective.
On slave3 node machine:
hostname slave3.in
/etc/hosts must be updated on all machines of the cluster:
192.168.1.102 slave3.in slave3
Now ping the machine with its hostname to check whether it resolves to an IP address.
ping master.in
Start the DataNode on New Node
The DataNode daemon should be started manually using the $HADOOP_HOME/bin/hadoop-daemon.sh script. It will automatically contact the master (NameNode) and join the cluster. The new node should also be added to the conf/slaves file on the master server so that the script-based commands recognize it.
Log in to the new node:
su hadoop or ssh -X hadoop@192.168.1.103
Start HDFS on the newly added slave node:
./bin/hadoop-daemon.sh start datanode
The jps command output should be checked on the new node:
$ jps
7141 DataNode
10312 Jps
Removing a DataNode
A node can be removed from a cluster while it is running, without any worry of data loss. HDFS provides a decommissioning feature, which ensures that removing a node is performed safely.
Step 1
Log in to the master machine user account where Hadoop is installed.
$ su hadoop
Step 2
Before starting the cluster, an exclude file must be configured: a key named dfs.hosts.exclude should be added to our $HADOOP_HOME/etc/hadoop/hdfs-site.xml file.
The value associated with this key is the full path to a file on the NameNode's local file system that contains the list of machines not permitted to connect to HDFS, as follows:
<property>
  <name>dfs.hosts.exclude</name>
  <value>/home/hadoop/hadoop-1.2.1/hdfs_exclude.txt</value>
  <description>DFS exclude</description>
</property>
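The exclude file referenced by this property should exist on the NameNode before the cluster is started (it can start out empty); a minimal sketch, assuming the path configured above:
$ touch /home/hadoop/hadoop-1.2.1/hdfs_exclude.txt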
Step 3
Determine the hosts to be decommissioned.
Each machine to be decommissioned should be added to the hdfs_exclude.txt file, one domain name per line; this will prevent it from connecting to the NameNode. For example:
slave2.in
Step 4
Force a configuration reload.
Run "$HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes":
$ $HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes
This forces the NameNode to re-read its configuration, including the newly updated 'excludes' file. Nodes are decommissioned over a period of time, allowing each node's blocks to be replicated onto machines that are scheduled to remain active. The jps command output should be checked on slave2.in; once the work is done, the DataNode process will shut down automatically.
Step 5
Shut down the nodes.
After the decommission process has finished, the decommissioned hardware can be safely shut down for maintenance.
$ $HADOOP_HOME/bin/hadoop dfsadmin -report
Step 6
Edit the excludes file again: once the machines have been decommissioned, they can be removed from the 'excludes' file. Running "$HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes" again will read the excludes file back into the NameNode.
This allows the DataNodes to rejoin the cluster after the maintenance has been completed, or if additional capacity is needed in the cluster again.
To stop/start the TaskTracker:
$ $HADOOP_HOME/bin/hadoop-daemon.sh stop tasktracker
$ $HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker
Add a new node with the following steps:
1) Take a new system and create a new username and password on it
2) Install SSH and set up the SSH connection with the master node
3) Add the SSH public rsa key to the authorized_keys file
4) Add the new DataNode's hostname, IP address, and other details to the /etc/hosts and slaves files: 192.168.1.102 slave3.in slave3
5) Start the DataNode on the new node
6) Log in to the new node with a command like su hadoop or ssh -X hadoop@192.168.1.103
7) Start HDFS on the newly added slave node using the following command: ./bin/hadoop-daemon.sh start datanode
8) Check the output of the jps command on the new node.
Advantages of Learning HDFS Operations
Below are the major advantages of learning HDFS operations:
- Highly scalable: you can expand big data programs as usage and demand grow.
- HDFS operations are easy to understand and require less coding.
- Hadoop provides a cost-effective storage solution for organizations of all sizes. Clients only have to pay for the resources for the time period in which they use them.
- The distributed file system is a cluster of servers in different locations working in a synchronized manner. This makes data processing much faster and lets huge datasets be processed efficiently within seconds.
- HDFS is a technology that many companies are adopting, so learning it could provide a big leap in your career.
- When data is sent to a node, it is automatically replicated to other nodes of the cluster. This means you have multiple copies of the data, so you will not lose your data in the event of a failure.
Summary
The Hadoop Distributed File System is a highly scalable, flexible, fault-tolerant, and reliable system that stores data across multiple nodes on different servers. It follows a master-slave architecture, where the NameNode acts as the master and the DataNodes as the slaves. HDFS operations are used to access the NameNode and interact with the data. Files are broken down into blocks in which the client can store data, read, write, and perform various operations after completing the authentication process.
In the next section of this tutorial, we shall be talking about MapReduce in Hadoop.
Source: https://intellipaat.com/blog/tutorial/hadoop-tutorial/hdfs-operations/