RKCP Information Technology

Hadoop

Learn HDFS, Hadoop Administration & Maintenance, Job Scheduling, MapReduce, and an Intro to Hive, HBase, Flume, Sqoop, Oozie & Pig in this Big Data Hadoop Tutorial/Training & Certification, offered as LIVE Online & Classroom sessions.

Overview

Apache Hadoop, the open source data management software that helps organizations analyze massive volumes of structured and unstructured data, is a very hot topic across the tech industry. This course enables you to use this technology and become industry ready. After attending this course, a developer/architect can use Apache Hadoop with full confidence.


Objective

In this course, participants will learn:

  • What is Big Data
  • What is Hadoop and why is it important
  • Hadoop Distributed File System (HDFS)
  • Hadoop Deployment
  • Hadoop Administration and Maintenance
  • Map-Reduce
  • Hive, HBase, Flume, Sqoop, Oozie and Pig

Suggested Audience:

Developers, Architects, System Engineers

Duration - 3 Days


Prerequisites

Some programming and database experience


Syllabus

  • 1. Introduction to Big Data
    • What counts as Big Data
    • Business use cases for Big Data
    • Big Data requirements in the traditional data warehousing and BI space
    • Big Data solutions
  • 2. Introduction to Hadoop
    • The amount of data being processed today
    • What Hadoop is and why it is important
    • Hadoop comparison with traditional systems
    • Hadoop history
    • Hadoop main components and architecture
  • 3. Hadoop Distributed File System (HDFS)
    • HDFS overview and design
    • HDFS architecture
    • HDFS file storage
    • Component failures and recoveries
    • Block placement
    • Balancing the Hadoop cluster
  • 4. Hadoop Deployment
    • Different Hadoop deployment types
    • Hadoop distribution options
    • Hadoop competitors
    • Hadoop installation procedure
    • Distributed cluster architecture
    • Lab: Hadoop Installation
  • 5. Working with HDFS
    • Ways of accessing data in HDFS
    • Common HDFS operations and commands
    • Different HDFS commands
    • Internals of a file read in HDFS
    • Data copying with 'distcp'
    • Lab: Working with HDFS
  • 6. Hadoop Cluster Configuration
    • Hadoop configuration overview and important configuration file
    • Configuration parameters and values
    • HDFS parameters
    • MapReduce parameters
    • Hadoop environment setup
    • 'Include' and 'Exclude' configuration files
    • Lab: MapReduce Performance Tuning
  • 7. Hadoop Administration and Maintenance
    • Namenode/Datanode directory structures and files
    • Filesystem image and Edit log
    • The Checkpoint Procedure
    • Namenode failure and recovery procedure
    • Safe Mode
    • Metadata and Data backup
    • Potential problems and solutions / What to look for
    • Adding and removing nodes
    • Lab: MapReduce Filesystem Recovery
  • 8. Job Scheduling
    • How to schedule Hadoop Jobs on the same cluster
    • Default Hadoop FIFO Scheduler
    • Fair Scheduler and its configuration
  • 9. Map-Reduce Abstraction
    • What MapReduce is and why it is popular
    • The big picture of MapReduce
    • MapReduce process and terminology
    • MapReduce component failures and recoveries
    • Working with MapReduce
    • Lab: Working with MapReduce
  • 10. Programming MapReduce Jobs
    • Java MapReduce implementation
    • Map() and Reduce() methods
    • Java MapReduce calling code
    • Lab: Programming Word Count
  • 11. Input/Output Formats and Conversion Between Different Formats
    • Default Input and Output formats
    • Sequence File structure
    • Sequence File Input and Output formats
    • Sequence File access via Java API and HDFS
    • MapFile
    • Lab: Input Format
    • Lab: Format Conversion
  • 12. MapReduce Features
    • Joining Data Sets in MapReduce Jobs
    • How to write a Map-Side Join
    • How to write a Reduce-Side Join
    • MapReduce Counters
    • Built-in and user-defined counters
    • Retrieving MapReduce counters
    • Lab: Map-Side Join
    • Lab: Reduce-Side Join
  • 13. Introduction to Hive, HBase, Flume, Sqoop, Oozie and Pig
    • Hive as a data warehouse infrastructure
    • HBase as the Hadoop Database
    • Using Pig as a scripting language for Hadoop
  • 14. Hadoop Case studies
    • How different organizations use Hadoop clusters in their infrastructure