RKCP Information Technology




Cassandra

Pig And Hive

Students who attend this course will learn how to intuitively create MapReduce pipelines using Apache Pig, mine data sets using the Apache Hive query language (HQL), an SQL-like language for data warehousing, and run near real-time stream processing with Apache Storm.

Overview

  • Create data processing pipelines with Pig
  • Answer data mining questions using Pig Latin
  • Create and query a Big Data warehouse with Hive
  • Learn how Apache Storm supports near real-time processing

Who Should Attend

This course is intended for anyone wanting to understand how some of the major components of the Apache Hadoop MapReduce ecosystem work, including Apache Pig, Hive, and Storm.


Objective

Apache Cassandra is an open-source, second-generation distributed NoSQL database. It is a strong choice when a database needs high availability and scalability. Cassandra supports replication across multiple data centers and offers tunable consistency, so the consistency level of each read and write can be traded against latency and availability.

This Apache Cassandra training provides an overview of the following:

  • Fundamentals of Big Data and NoSQL databases
  • Cassandra and the features it provides
  • Architecture of Cassandra and its data model
  • Installing, configuring, and monitoring Cassandra
  • Hadoop ecosystem of products around Cassandra


Prerequisites

  • Hadoop for MapReduce Applications
  • Knowledge of Java is preferred (Not mandatory for Developers)

Syllabus

  • 1. Apache Pig
    • Declarative vs. Procedural
    • Role of Pig
    • Setting Up Pig
    • Loading and Working with Data
    • Writing a Pig Script
    • Executing Pig in Local and Hadoop Mode
    • Filtering Results
    • Storing, Loading, Dumping
  • 2. Getting the Most Out of Pig
    • Relations, Tuples, Fields
    • Pig Data Types
    • Tuples, Bags, and Maps
    • Flatten on Bags and Tuples
    • Join and Union
    • Regular Expressions
  • 3. Apache Hive
    • Hive as a Data Warehouse
    • Hive Architecture
    • Understanding the Hive Metastore and HCatalog
    • Interacting with Hive Using the Beeline Interface
    • Creating Hive Tables
    • Loading Text Data Files into Hive
    • Exploring the Hive Query Language
    • Partitions and Buckets
    • Built-in and Aggregation Functions
    • Invoking MapReduce Scripts from Hive
    • Common File Formats for Big Data Processing
    • Creating Avro and Parquet Files with Hive
    • Creating Hive Tables from Pig
    • Accessing Hive Tables with the Spark SQL Shell
  • 4. Apache Storm
    • Processing Real-Time Streaming Data
    • Storm Architecture: Nimbus, Supervisors, and ZooKeeper
    • Application Design: Topologies, Spouts, and Bolts