The Ultimate Hands-On Hadoop: Tame your Big Data!
All Levels • Business • Big Data

Data Engineering and Hadoop tutorial with MapReduce, HDFS, Spark, Flink, Hive, HBase, MongoDB, Cassandra, Kafka + more!

Created by Sundog Education by Frank Kane, Frank Kane, Sundog Education Team
14.5 hours of video content • 104 lectures • 187,751 students enrolled • 4.5 rating

What you'll learn

Design distributed systems that manage "big data" using Hadoop and related data engineering technologies.
Use HDFS and MapReduce for storing and analyzing data at scale.
Use Pig and Spark to create scripts to process data on a Hadoop cluster in more complex ways.
Analyze relational data using Hive and MySQL.
Analyze non-relational data using HBase, Cassandra, and MongoDB.
Query data interactively with Drill, Phoenix, and Presto.
Choose an appropriate data storage technology for your application.
Understand how Hadoop clusters are managed by YARN, Tez, Mesos, Zookeeper, Zeppelin, Hue, and Oozie.
Publish data to your Hadoop cluster using Kafka, Sqoop, and Flume.
Consume streaming data using Spark Streaming, Flink, and Storm.

Course Content

12 sections • 104 lectures • 14:33:37 total length

Learn all the buzzwords! And install the Hortonworks Data Platform Sandbox.

9 lectures • 49:54

Udemy 101: Getting the Most From This Course (02:10)
Tips for Using This Course (01:09)
If you have trouble downloading Hortonworks Data Platform... (00:29)
Warning for Apple M1 users (00:26)
Installing Hadoop [Step by Step] (17:44)
+4 more lectures

Using Hadoop's Core: HDFS and MapReduce

12 lectures • 01:28:28

HDFS: What it is, and how it works (13:53)
Alternate MovieLens download location (00:04)
Installing the MovieLens Dataset (06:20)
[Activity] Install the MovieLens dataset into HDFS using the command line (07:50)
MapReduce: What it is, and how it works (10:40)
+7 more lectures
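
As a taste of what the MapReduce lectures cover, the map/shuffle/reduce pattern can be sketched in plain Python with no cluster required. The data below is a made-up stand-in for the MovieLens ratings used in the course, counting how many ratings each movie received:

```python
from collections import defaultdict

# Toy stand-in for MovieLens rows: (user_id, movie_id, rating)
ratings = [(1, 50, 5), (2, 50, 4), (1, 33, 3), (3, 33, 5), (2, 12, 2)]

def mapper(row):
    """Map phase: emit a (movie_id, 1) pair for each rating."""
    user, movie, rating = row
    yield (movie, 1)

def shuffle(pairs):
    """Shuffle phase: group values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    """Reduce phase: sum the grouped counts for one movie."""
    return (key, sum(values))

mapped = (pair for row in ratings for pair in mapper(row))
counts = dict(reducer(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {50: 2, 33: 2, 12: 1}
```

On a real cluster, Hadoop runs many mapper and reducer instances in parallel and handles the shuffle over the network; the three-phase shape is the same.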

Programming Hadoop with Pig

7 lectures • 56:08

Introducing Ambari (09:49)
Introducing Pig (06:25)
Example: Find the oldest movie with a 5-star rating using Pig (15:07)
[Activity] Find old 5-star movies with Pig (09:40)
More Pig Latin (07:34)
+2 more lectures

Programming Hadoop with Spark

8 lectures • 01:14:34

Why Spark? (10:06)
The Resilient Distributed Dataset (RDD) (10:13)
[Activity] Find the movie with the lowest average rating - with RDDs (15:33)
Datasets and Spark 2.0 (06:28)
[Activity] Find the movie with the lowest average rating - with DataFrames (10:00)
+3 more lectures
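
The RDD exercises above boil down to a reduceByKey-style aggregation: pair each rating with a count, sum per movie, then divide. The lectures use Spark's actual RDD API; here is a plain-Python emulation over made-up stand-in data, with the equivalent Spark calls noted in comments:

```python
from collections import defaultdict

# Toy (movie_id, rating) pairs standing in for the MovieLens data.
ratings = [(50, 5.0), (50, 4.0), (33, 1.0), (33, 2.0), (12, 3.0)]

# Spark equivalent: ratings.mapValues(lambda r: (r, 1))
#                          .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
totals = defaultdict(lambda: (0.0, 0))
for movie, rating in ratings:
    s, n = totals[movie]
    totals[movie] = (s + rating, n + 1)

# Spark equivalent: .mapValues(lambda t: t[0] / t[1]), then take the minimum.
averages = {movie: s / n for movie, (s, n) in totals.items()}
worst = min(averages, key=averages.get)
print(worst, averages[worst])  # 33 1.5
```

Keeping (sum, count) pairs rather than raw averages is what makes the reduction associative, which is why the same shape distributes cleanly across a Spark cluster.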

Using relational data stores with Hadoop

10 lectures • 01:02:59

What is Hive? (06:31)
[Activity] Use Hive to find the most popular movie (10:45)
How Hive works (09:10)
[Exercise] Use Hive to find the movie with the highest average rating (01:55)
Compare your solution to mine. (04:10)
+5 more lectures
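
HiveQL is SQL-like, so the "most popular movie" activity has the same shape as a standard GROUP BY query. A sketch using Python's built-in sqlite3 as a stand-in (Hive would run an equivalent query over tables backed by HDFS, and the sample rows here are invented):

```python
import sqlite3

# In-memory SQLite database standing in for a Hive ratings table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (user_id INT, movie_id INT, rating INT)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [(1, 50, 5), (2, 50, 4), (3, 50, 3), (1, 33, 5), (2, 12, 2)],
)

# The same GROUP BY / ORDER BY / LIMIT shape works in HiveQL.
row = conn.execute(
    """SELECT movie_id, COUNT(*) AS cnt
       FROM ratings
       GROUP BY movie_id
       ORDER BY cnt DESC
       LIMIT 1"""
).fetchone()
print(row)  # (50, 3)
```

The point of Hive is exactly this: you write familiar SQL, and Hive translates it into distributed jobs over your cluster data.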

Using non-relational data stores with Hadoop

13 lectures • 02:28:35

Why NoSQL? (13:54)
What is HBase? (12:55)
[Activity] Import movie ratings into HBase (13:28)
[Activity] Use HBase with Pig to import data at scale. (11:19)
Cassandra overview (14:50)
+8 more lectures

Querying your Data Interactively

9 lectures • 01:21:48

Overview of Drill (07:55)
[Activity] Setting up Drill (10:58)
[Activity] Querying across multiple databases with Drill (07:07)
Overview of Phoenix (08:55)
[Activity] Install Phoenix and query HBase with it (07:02)
+4 more lectures

Managing your Cluster

13 lectures • 01:59:29

YARN explained (10:01)
Tez explained (04:56)
[Activity] Use Hive on Tez and measure the performance benefit (08:35)
Mesos explained (07:13)
ZooKeeper explained (13:10)
+8 more lectures

Feeding Data to your Cluster

6 lectures • 54:47

Kafka explained (09:48)
[Activity] Setting up Kafka, and publishing some data. (07:24)
[Activity] Publishing web logs with Kafka (10:21)
Flume explained (10:16)
[Activity] Set up Flume and publish logs with it. (07:46)
+1 more lecture

Analyzing Streams of Data

8 lectures • 01:17:42

Spark Streaming: Introduction (14:27)
[Activity] Analyze web logs published with Flume using Spark Streaming (14:20)
[Exercise] Monitor Flume-published logs for errors in real time (02:02)
Exercise solution: Aggregating HTTP access codes with Spark Streaming (04:24)
Apache Storm: Introduction (09:27)
+3 more lectures
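
The windowed-aggregation idea behind the "aggregating HTTP access codes" exercise can be sketched in plain Python: keep a sliding window of recent events and recount on each arrival. This count-based window is a simplification of Spark Streaming's time-based windows, and the log stream here is invented:

```python
from collections import Counter, deque

WINDOW = 5  # keep the last 5 events, standing in for a time-based window

window = deque(maxlen=WINDOW)  # oldest events fall off automatically

def on_event(status_code):
    """Process one log line's HTTP status code; return current window counts."""
    window.append(status_code)
    return Counter(window)

# Simulated stream of HTTP status codes from web logs.
stream = [200, 200, 404, 200, 500, 500, 404]
for code in stream:
    counts = on_event(code)
print(dict(counts))  # {404: 2, 200: 1, 500: 2}
```

A real streaming job would also partition the stream across workers and checkpoint state, but the core loop - update a window, recompute an aggregate - is the same.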

Designing Real-World Systems

7 lectures • 52:35

The Best of the Rest (09:24)
Review: How the pieces fit together (06:29)
Understanding your requirements (08:02)
Sample application: consume webserver logs and keep track of top-sellers (10:06)
Sample application: serving movie recommendations to a website (11:18)
+2 more lectures

Learning More

2 lectures • 06:32

Books and online resources (05:32)
Bonus Lecture: More courses to explore! (01:00)

Description

The world of Hadoop and "Big Data" can be intimidating - hundreds of different technologies with cryptic names form the Hadoop ecosystem. With this Hadoop tutorial, you'll not only understand what those systems are and how they fit together - but you'll go hands-on and learn how to use them to solve real business problems!

Learn and master the most popular data engineering technologies in this comprehensive course, taught by a former engineer and senior manager from Amazon and IMDb. We'll go way beyond Hadoop itself, and dive into all sorts of distributed systems you may need to integrate with.


  • Install and work with a real Hadoop installation right on your desktop with Hortonworks (now part of Cloudera) and the Ambari UI

  • Manage big data on a cluster with HDFS and MapReduce

  • Write programs to analyze data on Hadoop with Pig and Spark

  • Store and query your data with Sqoop, Hive, MySQL, HBase, Cassandra, MongoDB, Drill, Phoenix, and Presto

  • Design real-world systems using the Hadoop ecosystem

  • Learn how your cluster is managed with YARN, Mesos, Zookeeper, Oozie, Zeppelin, and Hue

  • Handle streaming data in real time with Kafka, Flume, Spark Streaming, Flink, and Storm

Spark and Hadoop developers are hugely valued at companies with large amounts of data; these are very marketable skills to learn.

Almost every large company you might want to work at uses Hadoop in some way, including Amazon, eBay, Facebook, Google, LinkedIn, IBM, Spotify, Twitter, and Yahoo! And it's not just technology companies that need Hadoop; even The New York Times uses Hadoop for processing images.

This course is comprehensive, covering over 25 different technologies in over 14 hours of video lectures. It's filled with hands-on activities and exercises, so you get some real experience in using Hadoop - it's not just theory.

You'll find a range of activities in this course for people at every level. If you're a project manager who just wants to learn the buzzwords, there are web UIs for many of the activities in the course that require no programming knowledge. If you're comfortable with the command line, we'll show you how to work with it too. And if you're a programmer, I'll challenge you with writing real scripts on a Hadoop system using Scala, Pig Latin, and Python.

You'll walk away from this course with a real, deep understanding of Hadoop and its associated distributed systems, and you'll be able to apply Hadoop to real-world problems. Plus, a valuable completion certificate is waiting for you at the end!

Please note that the focus of this course is on application development, not Hadoop administration, although you will pick up some administration skills along the way.

Knowing how to wrangle "big data" is an incredibly valuable skill for today's top tech employers. Don't be left behind - enroll now!


  • "The Ultimate Hands-On Hadoop... was a crucial discovery for me. I supplemented your course with a bunch of literature and conferences until I managed to land an interview. I can proudly say that I landed a job as a Big Data Engineer around a year after I started your course. Thanks so much for all the great content you have generated and the crystal clear explanations." - Aldo Serrano

  • "I honestly wouldn’t be where I am now without this course. Frank makes the complex simple by helping you through the process every step of the way. Highly recommended and worth your time, especially the Spark environment. This course helped me achieve a far greater understanding of the environment and its capabilities." - Tyler Buck

Who this course is for:

  • Software engineers and programmers who want to understand the larger Hadoop ecosystem, and use it to store, analyze, and vend "big data" at scale.
  • Project, program, or product managers who want to understand the lingo and high-level architecture of Hadoop.
  • Data analysts and database administrators who are curious about Hadoop and how it relates to their work.
  • System architects who need to understand the components available in the Hadoop ecosystem, and how they fit together.

This course includes:

  • 14.5 hours on-demand video
  • 9 articles
  • 2 downloadable resources
  • Access on mobile and TV
  • Full lifetime access
  • Certificate of completion

Instructors

Sundog Education by Frank Kane

Frank Kane

Sundog Education Team
