TueAM3: Introduction to Hadoop and Big Data Technologies

This course provides a rapid immersion into Big Data with Hadoop. We start with an introduction to the Hadoop cluster and show how to interact with the Hadoop file system and the cluster. We then introduce Hive and Pig, popular higher-level interfaces for managing data in the Hadoop system. Finally, we discuss YARN. Upon completion, attendees will understand:

- Big Data concepts and technologies
- MapReduce concepts
- The Hadoop file system
- Hive and Pig for productive data management and development

Big Data and Hadoop: A quick dive

Defining Big Data
Problems with conventional systems
MapReduce algorithm
Traditional database applications
Hadoop

MapReduce

What is MapReduce?
Relevance of MapReduce to Big Data
Map operation
Reduce operation
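
As a taste of the map and reduce operations listed above, here is a minimal word-count sketch in plain Python. It is not Hadoop code and is not taken from the course materials: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the in-memory `shuffle` step only imitates the grouping by key that a MapReduce framework performs between the two phases.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, imitating the step between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data with Hadoop", "Hadoop and big clusters"]))
```

On a real cluster the map and reduce functions would run in parallel on different nodes, with the input lines read from HDFS rather than from a Python list.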

Hadoop

What is Hadoop?
The Hadoop architecture
Hadoop Distributed File System

Hadoop Distributed File System (HDFS)

HDFS Architecture
HDFS API
Scalability
Data replication

Hadoop Applications

Typical Hadoop algorithms
Best practices for Hadoop

YARN

What is YARN and why is it significant?
YARN and Tez

Class Reviews:

"Extremely knowledgeable, answered every question."

"I thought Vladimir's presentation was excellent. He covered a lot of information in a straight forward + understandable way."

"Excellent at introducing a novice to Hadoop environment."

"Vladimir was right on target as to what I needed to get out of the tutorial."

Dr. Vladimir Bacvanski has over two decades of engineering experience with mission-critical and distributed enterprise systems and data technologies. Vladimir has helped a number of organizations, including the US Treasury, the Federal Reserve Bank, the US Navy, IBM, Dell, Hewlett Packard, JP Morgan Chase, General Electric, BAE Systems, AMD, and others, to select, transition to, and apply new software and data technologies.

Vladimir is published worldwide and is a keynote speaker, session chair, and workshop organizer at leading industry events. As a founder of SciSpike, Vladimir focuses on Big Data technologies and highly scalable reactive software architectures with Node.js and Scala. Vladimir is the author of the O'Reilly course on Big Data and NoSQL.