BIG DATA & HADOOP



Big Data is emerging as a significant source of business value because technologies such as Hadoop, Spark, and NoSQL accelerate big data processing. Every day, the world generates roughly 2.5 quintillion bytes of information. Customer data, sales data, and stock data are growing exponentially, while email, social network links, and instant messages spew from a billion personal devices. Still more data, in the form of text, photos, music, and video, divides and multiplies in a constantly digital world. That’s Big Data. Big Data solutions are designed to capture, process, store, and analyze data so that the right person gets the right information at the right time.

Course Overview


This course helps you prepare a pre-installed Hadoop environment that meets industry requirements, where everyone can work with the set of technology tools (and analysis techniques) built on these "Big Data" environments.


Deep understanding of the Hadoop Distributed File System (HDFS).


Granting a user the specific privileges needed to administer Ambari.


This course teaches data fundamentals in detail using Office 365 Excel, MySQL, PostgreSQL, and MongoDB with real-time data. You will learn many practical applications of pivot tables, formulas, functions, queries, data filtering, string operations, constraints, partitioning, and charting.


The course also gives you in-depth knowledge of data ingestion, data transformation, and data analysis, including the important role Sqoop plays in the Hadoop ecosystem.


Understanding and developing with a software framework that processes massive amounts of unstructured data in parallel across a distributed cluster of processors or stand-alone computers.
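
As a rough illustration of the parallel-processing idea above, the sketch below mimics the map, shuffle, and reduce steps of a word count in plain Python. It is not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and threads stand in for cluster nodes purely for demonstration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by key (the word)."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for word, count in pairs:
            grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(chunks, workers=2):
    # Each chunk is mapped independently of the others, which is what lets a
    # real cluster spread the work across machines. Threads are an assumption
    # made here only to keep the sketch self-contained.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    sample_chunks = ["big data big cluster", "data node data"]
    print(word_count(sample_chunks))
```

In a real Hadoop job the chunks would be HDFS blocks and the framework would handle the shuffle step; the sketch only shows how the three phases fit together.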


Implementing advanced Pig concepts. Pig is a boon for programmers who are not comfortable with Java or Python.


Implementing advanced concepts of the Hive data warehouse system, which is used for analyzing structured and semi-structured data.


Understanding the features of the Flume tool for data ingestion into HDFS. The course also covers the fundamentals of using Storm and Kafka.


Implementing advanced concepts of Apache Spark and Scala for parallel processing and data analytics applications across clustered systems.


Enterprises looking to leverage the big data environment require Big Data Architects who can design and build large-scale Hadoop applications and manage their development and deployment.

Who Needs This Training?


Freshers, experienced professionals, and diploma holders, graduates, or post-graduates in any stream.

Duration


450 hours (2 hrs/day for 12 months, 4 hrs/day for 6 months, or 8 hrs/day for 3 months).

Course Outline

Big Data and Hadoop Fundamentals

Overview of Big Data and Hadoop

  • Introduction to Big Data and Hadoop
  • The Five Vs of Big Data
  • Six Key Hadoop Data Types
  • About Hadoop
  • Hadoop Distributions

Hadoop Pre-Installation Environment Setup

  • Meet Minimum System Requirements
  • Install VMware Workstation
  • Importing Sandbox in VMware
  • Accessing HDP Sandbox Welcome Page through Browser
  • Accessing Sandbox Welcome Page through PuTTY
  • Single-Node Installation
  • Multi-Node Installation
  • Hive Installation
  • Pig Installation
  • Sqoop Installation
  • MySQL Installation
  • PostgreSQL Installation

Overview of HDFS

  • Hadoop vs. RDBMS
  • The HDP Hadoop Ecosystem
  • HDFS Components

HDFS Commands

  • HDFS Shell Command Phase-1
  • HDFS Shell Command Phase-2
  • HDFS Shell Command Phase-3

Apache Ambari

  • Overview: Ambari
  • Ambari Server Architecture
  • Explore Ambari
  • Managing Hosts
  • Managing Users and Groups

Data Fundamentals

Office 365 Excel

  • Working With Microsoft Excel 2016
  • Using Basic Formulas
  • Using Functions
  • Managing Worksheets
  • Using Advanced Formulas
  • Creating Charts

MySQL

  • Introduction to MySQL
  • Data Types
  • Data Manipulation Language (DML)
  • Data Definition Language (DDL)
  • Data Control Language (DCL)
  • Transaction Control Language (TCL)
  • Joins
  • UNION
  • String operations
  • Backup using mysqldump
  • LOAD DATA INFILE

PostgreSQL

  • What is PostgreSQL?
  • PostgreSQL native data types
  • Basic SQL Commands
  • Filtering data
  • PostgreSQL Constraints
  • PostgreSQL Modifying Tables
  • PostgreSQL table partitioning

Data Ingestion, Data Transformation and Data Analysis

Apache Sqoop

  • Introduction to Sqoop
  • Key Features of Sqoop
  • Sqoop Architecture & Working
  • The Sqoop Import Tool
  • The Sqoop Export Tool

MapReduce and Apache Tez

  • Introduction to Hadoop MapReduce
  • What MapReduce Does
  • Understanding MapReduce
  • MapReduce Example Program
  • Introduction to Apache Tez
  • YARN Administration Access Controls

Apache Pig

  • What is Apache Pig?
  • What is Apache Pig Latin?
  • Pig Data Types
  • LOAD
  • FOREACH
  • FILTER
  • JOIN
  • ORDER BY
  • CASE
  • DISTINCT
  • FLATTEN
  • STORE
  • GROUP
  • GROUP ALL
  • COGROUP
  • CROSS
  • LIMIT
  • SPLIT

Apache Hive

  • Introduction to Apache Hive
  • Hive Architecture
  • Hive SQL Datatypes
  • Hive SQL Semantics
  • Hive Command-Line Functions
  • Hive Shell Function
  • Hive CLI
  • Loading Data into a Hive Table
  • Performing Queries
  • Data Manipulation Language (DML)
  • Insert Command
  • Aggregation
  • Join Operation
  • Left Outer Join
  • Right Outer Join
  • Full Join
  • Hive Partitions
  • Hive Buckets
  • Skewed Tables
  • Using Distribute By

Apache Spark and Scala

  • Introduction to Spark
  • Common Use Cases of Spark
  • Hadoop vs Spark
  • Why Learn Spark
  • Introduction to Scala
  • Getting started with Scala
  • Object-Oriented Concepts
  • Scala Functions
  • Advanced Scala Functions
  • Collections
  • Introduction to RDD
  • Creating RDD
  • RDD Partitions
  • RDD Environment
  • RDD Operation
  • RDD Caching and Persistence
  • Transformation Programming
  • Action Programming
  • WordCount Execution Plan
  • Types of RDD
  • Introduction to SparkSQL
  • SparkSQL Architecture
  • DataFrame Operations
  • SQL Schema Inference
  • Working with Hive Context
  • Working with JSON
  • Life Cycle of a Streaming Application
  • Transformation on DStream
  • SparkR Structured Streaming
  • Spark GraphX API