Apache Sqoop for Data Ingestion into Hadoop HDFS

Learn how to use Sqoop Import, Sqoop Export, and other important Sqoop commands to ingest data into HDFS from databases

As part of this course, you will learn how to use Sqoop to move data between relational databases and Hadoop HDFS using commands such as Sqoop Import and Sqoop Export.

Getting Started with Sqoop

  • Introduction to Sqoop

  • Validate Source Database – MySQL

  • Review JDBC Jar to Connect to MySQL

  • Getting Help using Sqoop CLI

  • Overview of Sqoop User Guide

  • Validate Sqoop and MySQL Integration using Sqoop List Databases

  • Listing Tables in Database using Sqoop

  • Run Queries in MySQL using Sqoop Eval

  • Understanding Logs in Sqoop

  • Redirecting Sqoop Job Logs into Log Files

Importing data from MySQL to HDFS using Sqoop Import

  • Overview of Sqoop Import Command

  • Import Orders using target-dir

  • Import Order Items using warehouse-dir

  • Managing HDFS Directories

  • Sqoop Import Execution Flow

  • Reviewing Logs of Sqoop Import

  • Sqoop Import Specifying Number of Mappers

  • Review the Output Files generated by Sqoop Import

  • Sqoop Import Supported File Formats

  • Validating Avro Files using Avro Tools

  • Sqoop Import Using Compression
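
As a rough illustration of the import topics listed above, the sketch below builds two `sqoop import` commands: one using `--target-dir` with Avro output and an explicit mapper count, and one using `--warehouse-dir` with compression. The connection URL, user name, password file and HDFS paths are placeholder assumptions rather than values from the course, and `run` is a hypothetical helper that only prints each command unless `DRY_RUN=0` is set.

```shell
# Placeholder connection details (assumptions, not taken from the course).
DB_URL="jdbc:mysql://dbhost.example.com:3306/retail_db"
DB_USER="retail_user"
PW_FILE="/user/training/.sqoop-password"

# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Import the orders table into an explicit HDFS directory as Avro, using 2 mappers.
import_orders() {
  run sqoop import \
    --connect "$DB_URL" \
    --username "$DB_USER" \
    --password-file "$PW_FILE" \
    --table orders \
    --target-dir /user/training/sqoop_import/orders \
    --num-mappers 2 \
    --as-avrodatafile
}

# Import order_items under a parent directory (Sqoop appends the table name)
# as compressed text files; gzip is the default codec when --compress is given.
import_order_items() {
  run sqoop import \
    --connect "$DB_URL" \
    --username "$DB_USER" \
    --password-file "$PW_FILE" \
    --table order_items \
    --warehouse-dir /user/training/sqoop_import \
    --compress
}

import_orders
import_order_items
```

Running the script as-is prints both commands for inspection; with `DRY_RUN=0` and a working Sqoop installation, the same functions would submit the imports.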

Apache Sqoop – Importing Data into HDFS – Customizing

  • Introduction to customizing Sqoop Import

  • Sqoop Import by Specifying Columns

  • Sqoop Import Using Boundary Query

  • Sqoop Import while Filtering Unnecessary Data

  • Sqoop Import Using Split By to Distribute Import using Non-Default Column

  • Getting Query Results using Sqoop Eval

  • Dealing with tables with Composite Keys while using Sqoop Import

  • Dealing with tables with Non Numeric Key Fields while using Sqoop Import

  • Dealing with tables with No Key Fields while using Sqoop Import

  • Using autoreset-to-one-mapper to use only one mapper while importing data using Sqoop from tables with no key fields

  • Default Delimiters used by Sqoop Import for Text File Format

  • Specifying Delimiters for Sqoop Import using Text File Format

  • Dealing with Null Values using Sqoop Import

  • Import Multiple Tables from source database using Sqoop Import
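
The customization options listed above can be combined in a single command. The sketch below is one possible combination, assuming hypothetical column names (`order_id`, `order_date`, `order_status`), a hypothetical key-less table `order_items_nopk`, and placeholder connection details; `run` is a hypothetical helper that prints each command unless `DRY_RUN=0` is set.

```shell
# Placeholder connection details (assumptions, not taken from the course).
DB_URL="jdbc:mysql://dbhost.example.com:3306/retail_db"
DB_USER="retail_user"
PW_FILE="/user/training/.sqoop-password"

# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Select columns, filter rows, choose the split column, and set the
# field delimiter and null representation for text output.
# Column names here are assumed for illustration.
import_completed_orders() {
  run sqoop import \
    --connect "$DB_URL" \
    --username "$DB_USER" \
    --password-file "$PW_FILE" \
    --table orders \
    --columns "order_id,order_date,order_status" \
    --where "order_status = 'COMPLETE'" \
    --split-by order_id \
    --fields-terminated-by '|' \
    --null-string '\\N' \
    --null-non-string '\\N' \
    --target-dir /user/training/sqoop_import/orders_complete
}

# For a table without a primary key, fall back to a single mapper.
# The table name is hypothetical.
import_table_without_key() {
  run sqoop import \
    --connect "$DB_URL" \
    --username "$DB_USER" \
    --password-file "$PW_FILE" \
    --table order_items_nopk \
    --autoreset-to-one-mapper \
    --target-dir /user/training/sqoop_import/order_items_nopk
}

import_completed_orders
import_table_without_key
```

The printed form flattens shell quoting, so it is meant for review only; executing through the helper with `DRY_RUN=0` passes each argument to Sqoop exactly as quoted in the function.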

Importing data from MySQL to Hive Tables using Sqoop Import

  • Quick Overview of Hive

  • Create Hive Database for Sqoop Import

  • Create Empty Hive Table for Sqoop Import

  • Import Data into Hive Table from source database table using Sqoop Import

  • Managing Hive Tables while importing data using Sqoop Import using Overwrite

  • Managing Hive Tables while importing data using Sqoop Import – Errors Out If Table Already Exists

  • Understanding Execution Flow of Sqoop Import into Hive tables

  • Review Files generated by Sqoop Import in Hive Tables

  • Sqoop Delimiters vs Hive Delimiters

  • Different File Formats supported by Sqoop Import while importing into Hive Tables

  • Sqoop Import all Tables into Hive from source database

Exporting Data from HDFS/Hive to MySQL using Sqoop Export

  • Introduction to Sqoop Export

  • Prepare Data for Sqoop Export

  • Create Table in MySQL for Sqoop Export

  • Perform Simple Sqoop Export from HDFS to MySQL table

  • Understanding Execution Flow of Sqoop Export

  • Specifying Number of Mappers for Sqoop Export

  • Troubleshooting the Issues related to Sqoop Export

  • Merging or Upserting Data using Sqoop Export – Overview

  • Quick Overview of MySQL – Upsert using Sqoop Export

  • Update Data using Update Key using Sqoop Export

  • Merging Data using allowInsert in Sqoop Export

  • Specifying Columns using Sqoop Export

  • Specifying Delimiters using Sqoop Export

  • Using Stage Table for Sqoop Export
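
To make the export topics above more concrete, the sketch below builds a plain export through a staging table and an upsert-style export using `--update-key` with `--update-mode allowinsert`. The table names (`daily_revenue`, `daily_revenue_stage`), the key column `order_date`, the HDFS path and the connection details are all assumptions for illustration; `run` is a hypothetical helper that prints each command unless `DRY_RUN=0` is set.

```shell
# Placeholder connection details (assumptions, not taken from the course).
DB_URL="jdbc:mysql://dbhost.example.com:3306/retail_export"
DB_USER="retail_user"
PW_FILE="/user/training/.sqoop-password"
EXPORT_DIR="/user/hive/warehouse/daily_revenue"   # assumed Hive table location

# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Plain export: rows land in the staging table first and are moved to the
# target table only if every mapper succeeds. '\001' is Hive's default
# field delimiter (Ctrl-A).
export_with_staging() {
  run sqoop export \
    --connect "$DB_URL" \
    --username "$DB_USER" \
    --password-file "$PW_FILE" \
    --table daily_revenue \
    --staging-table daily_revenue_stage \
    --clear-staging-table \
    --export-dir "$EXPORT_DIR" \
    --input-fields-terminated-by '\001' \
    --num-mappers 2
}

# Upsert: update rows matching the key, insert the rest.
# Staging tables cannot be combined with --update-key, so none is used here.
export_upsert() {
  run sqoop export \
    --connect "$DB_URL" \
    --username "$DB_USER" \
    --password-file "$PW_FILE" \
    --table daily_revenue \
    --export-dir "$EXPORT_DIR" \
    --input-fields-terminated-by '\001' \
    --update-key order_date \
    --update-mode allowinsert
}

export_with_staging
export_upsert
```

The two functions are kept separate because the staging-table safeguard and the update-key path are mutually exclusive in Sqoop Export.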

Submitting Sqoop Jobs and Incremental Sqoop Imports

  • Introduction to Sqoop Jobs

  • Adding Password File for Sqoop Jobs

  • Creating Sqoop Job

  • Run Sqoop Job

  • Overview of Incremental Loads using Sqoop

  • Incremental Sqoop Import – Using Where

  • Incremental Sqoop Import – Using Append Mode

  • Incremental Sqoop Import – Create Table

  • Incremental Sqoop Import – Create Sqoop Job

  • Incremental Sqoop Import – Execute Job

  • Incremental Sqoop Import – Add Additional Data

  • Incremental Sqoop Import – Rerun Job

  • Incremental Sqoop Import – Using Last Modified
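
The job and incremental-import steps above can be sketched as a saved Sqoop job that imports in append mode and is then executed by name. The job name `import_orders_incr`, the check column `order_id`, the starting `--last-value` and the connection details are assumptions for illustration; `run` is a hypothetical helper that prints each command unless `DRY_RUN=0` is set.

```shell
# Placeholder connection details (assumptions, not taken from the course).
DB_URL="jdbc:mysql://dbhost.example.com:3306/retail_db"
DB_USER="retail_user"
PW_FILE="/user/training/.sqoop-password"
JOB_NAME="import_orders_incr"   # hypothetical job name

# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Save an incremental import as a job. A password file is used so the job
# can run without prompting. The bare "--" separates the job options from
# the tool being saved.
create_orders_job() {
  run sqoop job --create "$JOB_NAME" -- import \
    --connect "$DB_URL" \
    --username "$DB_USER" \
    --password-file "$PW_FILE" \
    --table orders \
    --target-dir /user/training/sqoop_import/orders_incr \
    --incremental append \
    --check-column order_id \
    --last-value 0
}

# Execute the saved job; Sqoop records the new last value after each run,
# so rerunning after more rows are added picks up only the new rows.
exec_orders_job() {
  run sqoop job --exec "$JOB_NAME"
}

create_orders_job
exec_orders_job
```

For the last-modified variant, `--incremental lastmodified` with a timestamp check column would take the place of append mode.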

Exercises are provided so that you get enough practice with Sqoop, as well as with writing queries using Hive and Impala.

All the demos are given on our state-of-the-art Big Data cluster. If you do not have a multi-node cluster, you can sign up for our labs and practice Sqoop and Hive on ours.

Course Instructor

Courseis.is
Courseis.is Author
