Big Data & Hadoop

Big Data is emerging as a significant source of business value because technologies such as Hadoop, Spark, and NoSQL accelerate big data processing. Every day, the world generates an estimated 2.99 quintillion bytes of information. Data is growing exponentially: customer data, sales data, stock data, email, social network links, and instant messages spew from a billion personal devices. Still more data, in the form of text, photos, music, and video, divides and multiplies in a constantly digital world. That’s Big Data. Big Data solutions are designed to capture, process, store, and analyze data so that the right person gets the right information at the right time.

Course Overview

This course covers preparing a Hadoop pre-installation environment to industry requirements, so that everyone can work with the technology tools (and analysis techniques) built on these "Big Data" environments.

A deep understanding of the Hadoop Distributed File System (HDFS).

Granting a user the specific privileges needed to administer Ambari.

This course teaches data fundamentals in detail using Office 365 Excel, MySQL, PostgreSQL, and MongoDB with real-time data. Learners work through many practical applications of pivot tables, formulas, functions, queries, data filtering, string operations, constraints, partitioning, and charting.

The course also gives you in-depth knowledge of data ingestion, data transformation, and data analysis, including the important role Sqoop plays in the Hadoop ecosystem.

Understanding and working with a software framework that allows massive amounts of unstructured data to be processed in parallel across a distributed cluster of processors or stand-alone computers.
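
As a rough illustration of the parallel processing idea described above (covered later in the outline under MapReduce), the following Python sketch counts words in three phases: map, shuffle, and reduce. It is not Hadoop code. The function names are hypothetical, and the thread pool merely stands in for cluster nodes running map tasks side by side.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_chunks):
    """Shuffle: group all emitted counts by their key (the word)."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for word, count in pairs:
            grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each word's list of counts into a single total."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(chunks, workers=2):
    # Each chunk stands in for one input split. The thread pool is an
    # assumption for illustration; a real cluster schedules map tasks on nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    sample_chunks = ["big data big cluster", "data node data"]
    print(word_count(sample_chunks))
```

The map step can run independently on every chunk, which is what makes the model suitable for distributing work across many machines. Only the shuffle step needs to bring results for the same key together.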

Implementing advanced concepts of Pig, a boon for programmers who are not comfortable with Java or Python.

Implementing advanced concepts of the Hive data warehouse system, which is used for analyzing structured and semi-structured data.

Understanding the features of the Flume tool for data ingestion into HDFS. The course also covers the fundamentals of using Storm and Kafka.

Implementing advanced concepts of Apache Spark and Scala for parallel processing and data analytics applications across clustered systems.

Enterprises looking to leverage the big data environment require Big Data Architects who can design and build large-scale development and deployment of Hadoop applications.

Who Needs This Training?

Freshers / Experienced / Diploma / Graduate / Post-Graduate in any Stream.

Duration

450 Hrs (2 hrs/day for 12 months, 4 hrs/day for 6 months, or 8 hrs/day for 3 months).

Course Outline

Big Data and Hadoop Fundamentals

Overview of Big Data and Hadoop

Introduction to Big Data and Hadoop
The Five Vs of Big Data
Six Key Hadoop Data Types
About Hadoop
Hadoop Distributions

Hadoop Pre-Installation Environment Setup

Meet Minimum System Requirements
Installing VMware Workstation
Importing the Sandbox into VMware
Accessing HDP Sandbox Welcome Page through Browser
Accessing Sandbox Welcome Page through PuTTY
Single-Node Installation
Multi-Node Installation
Hive Installation
Pig Installation
Sqoop Installation
MySQL Installation
PostgreSQL Installation

Overview of HDFS

Hadoop vs. RDBMS
The HDP Hadoop Ecosystem
HDFS Components

HDFS Commands

HDFS Shell Command Phase-1
HDFS Shell Command Phase-2
HDFS Shell Command Phase-3

Apache Ambari

Overview: Ambari
Ambari Server Architecture
Explore Ambari
Managing Hosts
Managing Users and Groups

Data Fundamentals

Office 365 Excel

Working With Microsoft Excel 2016
Using Basic Formulas
Using Functions
Managing Worksheets
Using Advanced Formulas
Creating Charts

MySQL

Introduction to MySQL
Data Types
Data Manipulation Language (DML)
Data Definition Language (DDL)
Data Control Language (DCL)
Transaction Control Language (TCL)
Joins
UNION
String operations
Backup using mysqldump
LOAD DATA INFILE

PostgreSQL

What is PostgreSQL?
PostgreSQL native data types
Basic SQL Commands
Filtering data
PostgreSQL Constraints
PostgreSQL Modifying Tables
PostgreSQL table partitioning

Data ingestion, Data transformation and Data analysis

Apache Sqoop

Introduction to Sqoop
Key Features of Sqoop
Sqoop Architecture & Working
The Sqoop Import Tool
The Sqoop Export Tool

MapReduce and Apache Tez

Introduction to Hadoop MapReduce
What MapReduce Does
Understanding MapReduce
MapReduce Example Program
Introduction to Apache Tez
YARN Administration Access Controls

Apache Pig

What is Apache Pig?
What is Apache Pig Latin?
Pig Data Types
LOAD
FOREACH
FILTER
JOIN
ORDER BY
CASE
DISTINCT
FLATTEN
STORE
GROUP
GROUP ALL
COGROUP
CROSS
LIMIT
SPLIT

Apache Hive

Introduction to Apache Hive
Hive Architecture
Hive SQL Datatypes
Hive SQL Semantics
Hive Command-Line Functions
Hive Shell Functions
Hive CLI
Loading Data into a Hive Table
Performing Queries
Data Manipulation Language (DML)
Insert Command
Aggregation
Join Operation
Left Outer Join
Right Outer Join
Full Join
Hive Partitions
Hive Buckets
Skewed Tables
Using Distribute By

Apache Spark and Scala

Introduction to Spark
Common Use Cases of Spark
Hadoop vs Spark
Why Learn Spark
Introduction to Scala
Getting started with Scala
Object Oriented Concept
Scala Functions
Advanced Scala Functions
Collections
Introduction to RDD
Creating RDD
RDD Partitions
RDD Environment
RDD Operations
RDD Caching and Persistence
Transformation Programming
Action Programming
WordCount Execution Plan
Types of RDD
Introduction to SparkSQL
SparkSQL Architecture
DataFrame Operations
SQL Schema Inference
Working with Hive Context
Working with JSON
Life cycle of Streaming Application
Transformation on DStream
SparkR Structured Streaming
Spark GraphX API

Contact Us

+91 702-202-0000 +91 080-4044-5566 info@rooman.net

BIG DATA & HADOOP - JOB GUARANTEED COURSES