Hadoop Bigdata Tarining Cochin ErnakulamTraining in ernakualm

What we do

Our Services

Description

Hadoop Big Data Training

It is a comprehensive Hadoop Big Data course designed by industry experts considering current industry job requirements to provide in-depth learning on big data and Hadoop Modules. This is an industry recognized training course that is a combination of the training courses in Hadoop developer, Hadoop administrator, Hadoop testing, and analytics. This Cloudera Hadoop training will prepare you to clear big data certification.

1. Introduction to Big Data & Hadoop, Hadoop Ecosystem, Map Reduce and HDFS

Topics – Introduction of Hadoop, Problems with data growth, Solving Data Problems, Hadoop Overview, Understanding Mapreduce, Setting the stage for big data problem solving with MapReduce, Parallel Copying with Hadoop distcp, Hadoop fs, Hadoop Archives

2. Introduction to HDFS

Topics – Introduction to Distributed File System, What is Hadoop Distributed file System (HDFS) , HDFS Design Principle & Failure, HDFS Architecture High Availability Mode and Federated Mode, Overall Architecture of HDFS, HDFS Demons, Basic HDFS Commands, Understanding Map Reduce, Hadoop Architecture, Difference between MR1 and MR2, What is YARN, Yarn jobs, Resource Management.

3. Hadoop Installation & setup

Topics – Hadoop 2.x Cluster Architecture , Federation and High Availability, A Typical Production Hadoop Cluster, Hadoop Cluster Modes, Common Hadoop Shell Commands, Hadoop 2.x Configuration Files, Cloudera Single node cluster

4. Introduction to Map Reduce

Topics – What is Hadoop Map Reduce and examples, Conceptual Understanding between Map and Reduce, Anatomy of a YARN Application Run, YARN MR Application Execution Flow, YARN Workflow,Write a Map Reduce Programme using Hadoop Framework

5. Deep Dive in Map Reduce

Topics – What is Functional Programming, Difference between Functional and Imperative Programming, What is Mapping, What is Reducer, Phase of Map and Reduce,Combiner , Partitioner, Shuffle & Sort Phase, Map reduce job submission flow, Map Reduce Types- Input and Output Formats, Custom Formats, Hadoop APIs, exercise on Input and Output Format, Task Execution, Hadoop commands , Map Reduce Features : Counters, Sorting, Reduce Joins, Side Data Distribution ,Map Reduce Library Classes, Hadoop Streaming, Aggregate Data, Example of calculating time a user has spent on an Activity.

6. Problem Solving using Map Reduce:

Topics – Map Reduce Problem Statement, Hadoop Mapper, Mapper Problem, How to Handle Multiple Mapper, Multiple Inputs,Working with Multiple Input Formats

7. Deep Dive in Pig

A. Introduction to Pig Topics – What Is Pig?, Pig’s Features, Pig Use Cases, Interacting with Pig B. Basic Data Analysis with Pig Topics – Pig Latin Syntax, Loading Data, Simple Data Types, Field Definitions, Data Output, Viewing the Schema, Filtering and Sorting Data, Commonly-Used Functions, Hands-On Exercise: Using Pig for ETL Processing C. Processing Complex Data with Pig Topics – Complex/Nested Data Types, Grouping, Iterating Grouped Data, Hands-On Exercise: Analyzing Data with Pig D. Multi-Data set Operations with Pig Topics – Techniques for Combining Data Sets, Joining Data Sets in Pig, Set Operations, Splitting Data Sets, Hands-On Exercise E. Extending Pig Topics – Macros and Imports, UDFs, Using Other Languages to Process Data with Pig, Hands-On Exercise: Extending Pig with Streaming and UDFs F. Pig Jobs Case studies of Fortune 500 companies which are Electronic Arts and Walmart with real data sets.

8. Deep Dive in Hive

A. Introduction to Hive Topics – What Is Hive?, Hive Schema and Data Storage, Comparing Hive to Traditional Databases, Hive vs. Pig, Hive Use Cases, Interacting with Hive B. Relational Data Analysis with Hive Topics – Hive Databases and Tables, Basic HiveQL Syntax, Data Types, Joining Data Sets, Common Built-in Functions,Hands-on Exercise: Running Hive Queries on the Shell, Scripts, and Hue C. Hive Data Management Topics – Hive Data Formats, Creating Databases, Modeling in Hive and Hive-Managed Tables, Loading Data into Hive, Altering Databases and Tables, Self-Managed Tables, Simplifying Queries with Views, Storing Query Results, Controlling Access to Data, Hands-On Exercise: Data Management with Hive, Thrift server, Meta store in Hive, D. Hive Optimization Topics – Understanding Query Performance, Partitioning, Bucketing, Indexing Data E. Extending Hive Topics – User-Defined Functions in Hive F. Hands on Exercises – Playing with huge data and Querying extensively. G. User defined Functions, Optimizing Queries, Tips and Tricks for performance tuning

9. (AVRO)Data Formats

Topics – Selecting a File Format, Hadoop Tool Support for File Formats, Avro Schemas, Using Avro with Hive and Sqoop, Avro Schema Evolution, Compression

10. Introduction to Hbase architecture

Topics – What is Hbase, Where does it fits, What is NOSQL

11. Apache Spark

A. Why Spark? Explain Spark and Hadoop Distributed File System Topics – What is Spark, Comparison with Hadoop, Components of Spark B. Spark Components, Common Spark Algorithms-Iterative Algorithms, Graph Analysis, Machine Learning Topics – Apache Spark- Introduction, Consistency, Availability, Partition, Unified Stack Spark, Spark Components, Comparison with Hadoop – Scalding example, mahout, storm, graph C. Running Spark on a Cluster, Writing Spark Applications using Python, Java, Scala Topics – Explain python example, Show installing a spark, Explain driver program, Explaining spark context with example, Define weakly typed variable, Combine scala and java seamlessly, Explain concurrency and distribution., Explain what is trait, Explain higher order function with example, Define OFI scheduler, Advantages of Spark, Example of Lamda using spark, Explain Mapreduce with example

12. Major Project – Putting it all together and Connecting Dots

Topics – Putting it all together and Connecting Dots, Working with Large data sets, Steps involved in analyzing large data

13. ETL Connectivity with Hadoop Ecosystem

Topics – How ETL tools work in Big data Industry, Connecting to HDFS from ETL tool and moving data from Local system to HDFS, Moving Data from DBMS to HDFS, Working with Hive with ETL Tool, Creating Map Reduce job in ETL tool End to End ETL PoC showing Hadoop integration with ETL tool.

14. Hadoop Cluster Configuration

Topics – Hadoop configuration overview and important configuration file, Configuration parameters and values, HDFS parameters MapReduce parameters, Hadoop environment setup, ‘Include’ and ‘Exclude’ configuration files,

15. Hadoop Administration and Maintenance

Topics – Namenode/Datanode directory structures and files, File system image and Edit log, The Checkpoint Procedure, Namenode failure and recovery procedure, Safe Mode, Metadata and Data backup, Potential problems and solutions / what to look for, Adding and removing nodes, Lab: MapReduce File system Recovery

16. Hadoop Monitoring and Troubleshooting

Topics – Best practices of monitoring a Hadoop cluster, Using logs and stack traces for monitoring and troubleshooting, Using open-source tools to monitor Hadoop cluster

17. ZOOKEEPER

Topics – ZOOKEEPER Introduction, ZOOKEEPER use cases, ZOOKEEPER Services, ZOOKEEPER data Model, Znodes and its types, Znodes operations, Znodes watches, Znodes reads and writes, Consistency Guarantees, Cluster management, Leader Election, Distributed Exclusive Lock, Important points

18. Advance Oozie

Topics – Why Oozie?, Installing Oozie, Running an example, Oozie- workflow engine, Example M/R action, Word count example, Workflow application, Workflow submission, Workflow state transitions, Oozie job processing, Oozie Hadoop security, Why Oozie security?, Job submission to hadoop, Multi tenancy and scalability, Time line of Oozie job, Coordinator, Bundle, Layers of abstraction, Architecture, Use Case 1: time triggers, Use Case 2: data and time triggers, Use Case 3: rolling window

19. Advance Flume

Topics – Overview of Apache Flume, Flume for Hadoop, Physically distributed Data sources, Changing structure of Data, Closer look, Anatomy of Flume, Core concepts, Event, Clients, Agents, Source, Channels, Sinks, Interceptors, Channel selector, Sink processor, Data ingest, Agent pipeline, Transactional data exchange, Routing and replicating, Why channels?, Use case- Log aggregation, Adding flume agent, Handling a server farm, Data volume per agent, Example describing a single node flume deployment

20. Hadoop Stack Integration Testing

Topics – Why Hadoop testing is important, Unit testing, Integration testing, Performance testing, Diagnostics, Nightly QA test, Benchmark and end to end tests, Functional testing, Release certification testing, Security testing, Scalability Testing, Commissioning and Decommissioning of Data Nodes Testing, Reliability testing, Release testing

21. Roles and Responsibilities of Hadoop Testing

Topics – Understanding the Requirement, preparation of the Testing Estimation, Test Cases, Test Data, Test bed creation, Test Execution, Defect Reporting, Defect Retest, Daily Status report delivery, Test completion, ETL testing at every stage (HDFS, HIVE, HBASE) while loading the input (logs/files/records etc) using sqoop/flume which includes but not limited to data verification, Reconciliation, User Authorization and Authentication testing (Groups, Users, Privileges etc), Report defects to the development team or manager and driving them to closure, Consolidate all the defects and create defect reports, Validating new feature and issues in Core Hadoop.

What we do

Our Services

Hadoop Big Data Training

Phone

Email