Master Big Data Ingestion and Analytics with Flume, Sqoop, Hive and Spark

A complete course on Sqoop, Flume, and Hive: Ideal for achieving CCA175 and Hortonworks Spark Certification.

76 on-demand videos & exercises
Level: Beginner
English
5hrs 38mins

What to know about this course

In this course, you will start by learning about the Hadoop Distributed File System (HDFS) and the most common Hadoop commands required to work with HDFS. Next, you'll be introduced to Sqoop Import, which will help you gain insights into the lifecycle of the Sqoop command and how to use the import command to migrate data from MySQL to HDFS, and from MySQL to Hive. In addition to this, you will get up to speed with Sqoop Export for migrating data effectively, along with using Apache Flume to ingest data.

As you progress, you will delve into Apache Hive, external and managed tables, working with different file formats, and Parquet and Avro. Toward the concluding section, you will focus on Spark DataFrames and Spark SQL. By the end of this course, you will have gained comprehensive insights into big data ingestion and analytics with Flume, Sqoop, Hive, and Spark.

All code and supporting files are available at https://github.com/PacktPublishing/Master-Big-Data-Ingestion-and-Analytics-with-Flume-Sqoop-Hive-and-Spark
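To give a flavor of the HDFS commands covered in the opening section, here is a brief sketch of a typical workflow. The paths and file names are illustrative placeholders, and the commands assume a working Hadoop installation with `hdfs` on the PATH:

```shell
# Create a working directory in HDFS (path is an example)
hdfs dfs -mkdir -p /user/demo/input

# Copy a local file into HDFS
hdfs dfs -put ./orders.csv /user/demo/input/

# List the directory and inspect the file
hdfs dfs -ls /user/demo/input
hdfs dfs -cat /user/demo/input/orders.csv

# Copy results back to the local file system
hdfs dfs -get /user/demo/input/orders.csv ./orders_copy.csv
```

These are the same `hdfs dfs` subcommands (mirroring familiar Unix file operations) that the course's early lessons walk through.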

Who's this course for?

This course is for anyone who wants to learn Sqoop and Flume, or for those looking to achieve CCA175 and HDP certifications.

What you'll learn

  • Explore the Hadoop Distributed File System (HDFS) and commands.
  • Get to grips with the lifecycle of the Sqoop command.
  • Use the Sqoop Import command to migrate data from MySQL to HDFS and Hive.
  • Understand split-by and boundary queries.
  • Use the incremental mode to migrate data from MySQL to HDFS.
  • Employ Sqoop Export to migrate data from HDFS to MySQL.
  • Discover Spark DataFrames and gain insights into working with different file formats and compression.
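The Sqoop import and export workflows listed above might look like the following sketch. The JDBC URL, database, table names, and credentials are illustrative placeholders, and the commands assume a reachable MySQL instance and a configured Hadoop cluster:

```shell
# Import a MySQL table into HDFS, splitting work across mappers
# by the 'id' column (example connection details)
sqoop import \
  --connect jdbc:mysql://dbhost:3306/retail_db \
  --username demo --password demo \
  --table orders \
  --target-dir /user/demo/orders \
  --split-by id

# Incremental import: only fetch rows with id greater than the last value seen
sqoop import \
  --connect jdbc:mysql://dbhost:3306/retail_db \
  --username demo --password demo \
  --table orders \
  --target-dir /user/demo/orders \
  --incremental append \
  --check-column id \
  --last-value 1000

# Export processed data from HDFS back into a MySQL table
sqoop export \
  --connect jdbc:mysql://dbhost:3306/retail_db \
  --username demo --password demo \
  --table order_summary \
  --export-dir /user/demo/order_summary
```

The `--split-by` flag controls how Sqoop partitions the source table across parallel mappers, while `--incremental append` with `--check-column`/`--last-value` fetches only new rows, both of which are covered in the Sqoop sections of the course.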

Key Features

  • Learn Sqoop, Flume, and Hive and successfully achieve CCA175 and Hortonworks Spark Certification.
  • Understand the Hadoop Distributed File System (HDFS), along with exploring Hadoop commands to work effectively with HDFS.

About the Author

Navdeep Kaur

Navdeep Kaur is a technical trainer and big data professional with 11 years of industry experience across different technologies and domains. She has a keen interest in providing training in new technologies. She holds the CCA175 Hadoop and Spark Developer certification and the AWS Solutions Architect certification. She loves guiding people and helping them achieve new goals.