The Ultimate Hands-On Hadoop

Grasp the skills needed to design distributed systems to manage big data.

This course will show you why Hadoop is one of the best tools to work with big data. With the help of some real-world data sets, you will learn how to use Hadoop and its distributed technologies, such as Spark, Flink, Pig, and Flume, to store, analyze, and scale big data.

104 lessons and on-demand videos
Level: Intermediate
Language: English
Duration: 14 hrs 39 mins
Access on mobile, web, and TV

What to know about this course

Understanding Hadoop is a highly valuable skill for anyone at a company that works with large amounts of data. Companies such as Amazon, eBay, Facebook, Google, LinkedIn, IBM, Spotify, Twitter, and Yahoo use Hadoop in some way to process large volumes of data. This video course will familiarize you with Hadoop's ecosystem and help you understand how to apply Hadoop skills in the real world.

The course starts by taking you through the process of installing Hadoop on your desktop. Next, you will manage big data on a cluster with the Hadoop Distributed File System (HDFS) and MapReduce, and use Pig and Spark to analyze data on Hadoop. Moving along, you will learn how to store and query your data using tools such as Sqoop, Hive, MySQL, Phoenix, and MongoDB. You will then design real-world systems using the Hadoop ecosystem and learn how to manage clusters with Yet Another Resource Negotiator (YARN), Mesos, ZooKeeper, Oozie, Zeppelin, and Hue. Towards the end, you will learn techniques for handling and streaming data in real time using Kafka, Flume, Spark Streaming, Flink, and Storm.

By the end of this course, you will be well versed in the Hadoop ecosystem and will have the skills required to store, analyze, and scale big data using Hadoop.

Who's this course for?

This video course is designed for people at every level, whether you are a software engineer or programmer who wants to understand the Hadoop ecosystem, a project manager who wants to become familiar with Hadoop's lingo, or a system architect who wants to understand the components available in the Hadoop system.

To get started with this course, a basic understanding of Python or Scala and ground-level knowledge of the Linux command line are recommended.

What you'll learn

  • Become familiar with Hortonworks and the Ambari User Interface (UI).
  • Use Pig and Spark to create scripts to process data on a Hadoop cluster.
  • Analyze non-relational data using HBase, Cassandra, and MongoDB.
  • Query data interactively with Drill, Phoenix, and Presto.
  • Publish data to your Hadoop cluster using Kafka, Sqoop, and Flume.
  • Consume streaming data using Spark Streaming, Flink, and Storm.

Key Features

  • Get to grips with the high-level architecture of Hadoop.
  • Understand the components available in the Hadoop ecosystem, and how they fit together.
  • Get ready to manage big data using Hadoop and related technologies.

About the Author

Frank Kane

Frank Kane spent nine years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to hundreds of millions of customers. He holds 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, Frank left to start his own company, Sundog Software, which focuses on virtual reality environment technology and on teaching others about big data analysis.
