Introduction to Big Data and Hadoop:

    Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power, and the ability to handle virtually limitless concurrent tasks or jobs.

    Master the various components of the Hadoop ecosystem, such as Hadoop 2.7, YARN, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark. Get hands-on practice on CloudLabs by implementing real-life projects in the domains of banking, telecommunication, social media, insurance, and e-commerce. The course is aligned with the Cloudera CCA175 certification and is best suited for IT, data management, and analytics professionals looking to gain expertise in Big Data.

Introduction to Big Data and Hadoop
Hadoop Architecture
Installing Ubuntu with Java 1.8 on VMware Workstation 11
Hadoop Versioning and Configuration
Single Node Hadoop 1.2.1 installation on Ubuntu 14.04.1
Single Node Hadoop 2.7.3 installation on Ubuntu 16.04
Multi Node Hadoop 2.7.3 installation on Ubuntu 16.04
Linux commands and Hadoop commands
Cluster architecture and block placement
Modes in Hadoop
    • Local Mode
    • Pseudo-Distributed Mode
    • Fully Distributed Mode
Hadoop Daemons
    • Master Daemons (NameNode, Secondary NameNode, JobTracker)
    • Slave Daemons (DataNode, TaskTracker)
Task Instance
Hadoop HDFS Commands
Accessing HDFS
    • CLI Approach
    • Java Approach
Installing and using Hadoop 2.X
Map-Reduce (Using New API)
Understanding the Map-Reduce Framework
Motivation for the Word-Count Example
Developing Map-Reduce Program using Eclipse Luna
HDFS Read-Write Process
Map-Reduce Life Cycle Methods
Comparator and Comparable (Java)
Custom Output File
Analysing Temperature dataset using Map-Reduce
Custom Partitioner & Combiner
Running Map-Reduce in Local and Pseudo-Distributed Modes
Advanced Map-Reduce
Custom and Dynamic Counters
Running Map-Reduce in Multi-node Hadoop Cluster
Custom Writable
Side Data Distribution
    • Using Configuration
    • Using DistributedCache
    • Using Stringifier
Input Formatters
    • NLine Input Format
    • XML Input Format
    • DB Input Format
    • Sequence File Format
    • Avro File Format
    • Primary Reverse Sorting
    • Secondary Sorting
    • Map-side Joins
    • Reduce-side Joins
Compression Techniques
    • Gzip
    • Snappy
    • bzip2
    • Deflate
Processing Multi-Line Records using Map-Reduce
Processing XML File using Map-Reduce
Testing MapReduce with MRUnit
Working with NYSE Datasets
Running Map-Reduce in Cloudera Box
Hive Introduction & Installation
Data Types in Hive
Commands in Hive
Exploring Internal and External Table
Complex Data Types (Array, Map, Struct)
UDF in Hive
    • Built-in UDF
    • Custom UDF
Thrift Server
Java to Hive Connection
Joins in Hive
Working with HUE
Bucket Map-side Join
More commands
    • View
    • Sort By
    • Distribute By
    • Lateral View

Working with Beeline
Configure MySQL instead of Derby
Performing update and delete in Hive
Running Hive in Cloudera
NYSE dataset Assignment in Hive
Movie Rating Assignment in Hive
Sqoop Installations and Basics
Importing Data from Oracle to HDFS
Advanced Imports
Working with Sqoop and Hive
Exporting Data from HDFS to Oracle
Sqoop Metastore
Real-time Use Case
Running Sqoop in Cloudera
Pig Introduction & Installation
WordCount in Pig
NYSE in Pig
Working With Complex Datatypes
Pig Schema
Miscellaneous Commands
    • Group
    • Filter
    • Order
    • Distinct
    • Join
    • Flatten
    • Cogroup
    • Union
    • Illustrate
    • Explain
UDFs in Pig
Parameter Substitution and DryRun
Processing XML file using Pig
Pig Macros
Testing Pig Scripts using PigUnit
Running Pig in Cloudera
HBase Introduction & Installation
Exploring HBase Shell
HBase Architecture
HBase Storage Technique
Using HBase with Java
CRUD with HBase
Map-Reduce HBase Integration
Filters in HBase
Installing Oozie
Running Map-Reduce Program with Oozie
Running Pig and Sqoop with Oozie
Integrating Map-Reduce, Pig, and Hive with Oozie
Running Coordinator Jobs
    • Based on a Particular Time
    • Based on Data Availability
Working on an Amazon dataset with advanced Map-Reduce concepts, integrated with HBase and scheduled through an Oozie workflow
MySQL Installation on Linux
Oracle Installation on Linux
Some assignments on Elasticsearch
Working with Maven
Using JUnit
Eclipse Debugging
Java Best Practices
Basics of Servlets and the Servlet API
