Step 1 : Java Installation
1.1 Install the latest or desired version of Java. Hadoop 3.3.x runs on Java 8 or Java 11; the rest of this guide assumes the default-jdk package resolves to OpenJDK 11.
sudo apt install default-jdk default-jre -y
1.2 Check the Java version
java -version
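If the installation succeeded, the first line of the output reports the installed OpenJDK release, for example (exact version and build numbers will differ):
openjdk version "11.0.11" 2021-04-20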
Step 2 : Create Hadoop User (Optional)
If you want to manage the Hadoop files separately from your own account, create a dedicated user (a hadoop user).
2.1 Create a new user called hadoop.
sudo adduser hadoop
2.2 Make the hadoop user a member of the sudo group.
sudo usermod -aG sudo hadoop
In the usermod command above, the -aG option stands for append (-a) to the supplementary Groups (-G).
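To confirm the membership, you can list the groups the new account belongs to (the exact list depends on your system defaults):
groups hadoop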
2.3 Change to the Hadoop user now.
sudo su - hadoop
Step 3 : Configure Password-less SSH
Note : If you completed step 2, then proceed to step 3 after switching to the hadoop user (sudo su - hadoop).
3.1 Install OpenSSH server and client
sudo apt install openssh-server openssh-client -y
3.2 Generate public and private key pairs.
ssh-keygen -t rsa
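Press ENTER at each prompt to accept the default key location and an empty passphrase. If you prefer a non-interactive run, the same result can be obtained with something like the following (assuming the default key path):
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa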
3.3 Add the generated public key from id_rsa.pub to authorized_keys
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
3.4 Change the file permissions for authorized_keys.
chmod 640 ~/.ssh/authorized_keys
3.5 Check to see if the password-less SSH is working.
ssh localhost
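On the first connection you may be asked to confirm the host key; after that, the login should complete without prompting for a password. Close the test session to return to your previous shell.
exit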
Step 4 : Install and Configure Apache Hadoop as the hadoop user
Note : Make sure you are working as the hadoop user; if not, switch to it with the following command.
sudo su - hadoop
4.1 Download the latest stable version of Hadoop
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
If the previous command fails because wget is not installed, install it with the command below and then re-run the download.
sudo apt-get install wget
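Optionally, verify the integrity of the download before extracting it; Apache publishes a .sha512 checksum file alongside each release, and the hash it contains should match the one computed locally:
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz.sha512
sha512sum hadoop-3.3.1.tar.gz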
4.2 Extract the downloaded tar file
tar -xvzf hadoop-3.3.1.tar.gz
4.3 Create Hadoop directory
To ensure that all of your files are organised in one location, move the extracted directory to /usr/local/.
sudo mv hadoop-3.3.1 /usr/local/hadoop
To keep the Hadoop logs in one place, create a separate directory called logs inside /usr/local/hadoop.
sudo mkdir /usr/local/hadoop/logs
Finally, change the ownership of the directory to the hadoop user with the following command.
sudo chown -R hadoop:hadoop /usr/local/hadoop
4.4 Configure Hadoop
nano ~/.bashrc
Once the editor opens, paste the following lines at the end of the file.
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
After pasting the lines above, press CTRL + S to save and CTRL + X to exit the nano editor.
Then run the following command to activate the environment variables in the current shell.
source ~/.bashrc
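To confirm that the variables are active in the current shell, print one of them; it should expand to the Hadoop installation directory:
echo $HADOOP_HOME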
Step 5 : Configure Java Environment Variables
Hadoop relies on many components (YARN, HDFS, MapReduce, and related project settings) to carry out its core functions. To configure these components, you must define the Java environment variables in the hadoop-env.sh configuration file.
5.1 Find the Java path and the OpenJDK directory with the help of the following commands.
which javac
readlink -f /usr/bin/javac
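On a system where default-jdk installed OpenJDK 11, readlink typically prints a path such as /usr/lib/jvm/java-11-openjdk-amd64/bin/javac; the JAVA_HOME value used in the next step is this path without the trailing /bin/javac.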
5.2 Edit the hadoop-env.sh file
This file contains Hadoop's environment variable settings. You can use it to modify the behaviour of the Hadoop daemons, such as where log files are stored, the maximum amount of heap used, and so on. The only variable that must be set here is JAVA_HOME, which specifies the path to the Java installation used by Hadoop (Java 8 or 11 for Hadoop 3.3.x).
Open the hadoop-env.sh file in your preferred text editor first. In this case, I’ll use nano.
sudo nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Now add the following lines to the end of the file.
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_CLASSPATH+=" $HADOOP_HOME/lib/*.jar"
Make sure the JAVA_HOME value matches the OpenJDK directory you found in step 5.1, i.e. the readlink output without the trailing /bin/javac.
5.3 Javax activation
Change to the Hadoop lib directory.
cd /usr/local/hadoop/lib
Now download the javax activation jar file into this directory.
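A commonly used option, assumed here, is the javax.activation-api 1.2.0 jar from Maven Central (the JCenter mirror referenced in many older guides has since been retired):
wget https://repo1.maven.org/maven2/javax/activation/javax.activation-api/1.2.0/javax.activation-api-1.2.0.jar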
Finally, verify the installation by checking the Hadoop version.
hadoop version
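If the environment variables are set correctly, the first line of the output reports the installed release, for example Hadoop 3.3.1; the remaining lines show build and checksum details that vary between downloads.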