Get Started with Hadoop and Big Data

Course Info

Apache Hadoop is an open-source software framework for storing and processing large data sets on clusters of commodity hardware. Hadoop is an Apache top-level project built and used by a global community of contributors and users. It was created by Doug Cutting and Mike Cafarella in 2005, originally to support distribution for the Nutch search engine project. Cutting, who was working at Yahoo! at the time and later became Chief Architect of Cloudera, named the project after his son's toy elephant.

The base Hadoop framework consists of the following modules:

Hadoop Common: contains libraries and utilities needed by other Hadoop modules

Hadoop Distributed File System (HDFS): a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster

Hadoop YARN: a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users' applications

Hadoop MapReduce: an implementation of the MapReduce programming model for large-scale data processing
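To make the MapReduce programming model concrete, here is a minimal in-memory sketch of the classic word-count job. It is not the Hadoop API: real Hadoop jobs are usually written in Java by extending the Mapper and Reducer classes in org.apache.hadoop.mapreduce. The function names below (map_phase, shuffle_and_sort, reduce_phase, word_count) are hypothetical, and all three phases run sequentially in one process instead of being distributed across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper (hypothetical helper): emit (word, 1) for each whitespace-separated word."""
    for word in line.split():
        yield word, 1


def shuffle_and_sort(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    """Group every value by its key and sort by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer (hypothetical helper): sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle/sort -> reduce in a single process."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle_and_sort(mapped))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "the fox"]
    print(word_count(docs))
    # -> {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

The mapper and reducer never see each other's data directly; the shuffle-and-sort step in between is what lets a framework run many mappers and reducers in parallel on different machines.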

All the modules in Hadoop are designed with the fundamental assumption that hardware failures (of individual machines or entire racks of machines) are common and should therefore be handled automatically in software by the framework. Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and Google File System (GFS) papers, respectively.
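One way HDFS handles machine failures in software is by splitting each file into large blocks and storing every block on several machines. The sketch below models that idea; it is an illustration, not HDFS code. It assumes the Hadoop 2.x defaults of a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication), and the helper names are hypothetical. Replicas are placed round-robin here for simplicity, whereas real HDFS uses a rack-aware placement policy.

```python
import math
from typing import Dict, List, Set

BLOCK_SIZE_MB = 128  # assumed: HDFS default block size in Hadoop 2.x and later
REPLICATION = 3      # assumed: HDFS default replication factor


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """Number of blocks a file occupies; the last block may be only partly full."""
    return math.ceil(file_size_mb / block_size_mb)


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes using simple round-robin.

    NOTE: a simplification; real HDFS places replicas with a rack-aware policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement: Dict[int, List[str]], failed: Set[str]) -> bool:
    """The file stays readable if every block still has a replica on a live node."""
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]
    blocks = split_into_blocks(300)  # 300 MB -> 3 blocks (128 + 128 + 44 MB)
    placement = place_replicas(blocks, nodes)
    print(placement)
    # -> {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
    print(readable_after_failure(placement, {"dn1"}))                # -> True
    print(readable_after_failure(placement, {"dn1", "dn2", "dn3"}))  # -> False
```

Because each block lives on three distinct nodes, any two nodes can fail without the file becoming unreadable; losing three of the four nodes in this example leaves block 0 with no live replica.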

Beyond HDFS, YARN and MapReduce, the entire Apache Hadoop "platform" is now commonly considered to consist of a number of related projects as well: Apache Pig, Apache Hive, Apache HBase, and others.


You need a good Internet connection to watch the videos; on a slow connection, you will have to let the videos buffer first. Please log in as a guest on this portal and try the demo to check whether your Internet speed is sufficient. We recommend a connection of at least 1 Mbps to watch the video lectures smoothly.

System Requirements

Operating System: Linux (Ubuntu later than 14.04)

RAM: minimum of 4 GB

Course Syllabus
1. Introduction to Big Data and Hadoop
1.1 Introduction to Big Data
1.2 Challenge with distributed systems
1.3 Introduction to Hadoop
2. HDFS (Hadoop Distributed File System)
2.1 Introduction to HDFS
2.2 Design of HDFS
2.3 Architecture of HDFS
2.4 Anatomy of a file read
2.5 Anatomy of a file write
2.6 NameNode data structures
2.7 Secondary NameNode
2.8 Checkpoint node
2.9 Backup node
2.10 HDFS high availability
2.11 Installation of HDFS
2.12 Web interface overview page
2.13 Introduction to the HDFS command-line interface
2.14 HDFS Permissions
2.15 List of commands
2.16 HDFS federation
2.17 Configuration files overview
2.18 Java API for HDFS
2.19 Working with HDFS
2.20 Assignment-1
2.21 Assignment-2
2.22 Assignment-3
3. MapReduce
3.1 Introduction to MapReduce
3.2 MapReduce architecture
3.3 Data flow of MapReduce
3.4 MapReduce driver
3.5 Combiners
3.6 Partitioners
3.7 Input Formats
3.8 Output Formats
3.9 Shuffle and Sort
3.10 Map-Side Joins
3.11 Reduce-Side Joins
3.12 Counters
3.13 MRUnit
3.14 Distributed Cache
3.15 Configuration files overview
3.16 WordCount Program
3.17 Sample Program Using a Custom Partitioner
3.18 Assignment-1
3.19 Assignment-2
3.20 Assignment-3
3.21 Assignment-4
4. YARN
4.1 ResourceManager
4.2 NodeManager
4.3 Difference between Hadoop 1.x and 2.x
4.4 Assignment-1
4.5 Assignment-2
5. Setting up Hadoop cluster
5.1 Cluster setup and Installation
5.2 Hadoop Configuration
5.3 Security
6. Administering Hadoop
6.1 HDFS persistent data structures
6.2 HDFS Tuning
6.3 Monitoring
7. Hive
7.1 Understanding Hive
7.2 Hive Data types
7.3 Loading and Querying Data in Hive
7.4 Running Hive scripts
7.5 Hive UDF
7.6 Assignment-1
7.7 Assignment-2
8. Pig
8.1 Understanding Pig
8.2 Pig Latin
8.3 User-defined functions
8.4 Data processing operators
8.5 Assignment-1
8.6 Assignment-2
9. Sqoop
9.1 Getting Sqoop
9.2 Sqoop connectors
9.3 Sample Import
10. Flume
10.1 Installing Flume
10.2 Transactions and Reliability
10.3 The HDFS Sink
10.4 Fan out
10.5 Distribution: Agent Tiers
10.6 Sink groups
10.7 Integrating with Applications
11. HBase Basics
11.1 Concepts
11.2 Installation
11.3 Java client
11.4 HBase vs RDBMS
12. ZooKeeper
12.1 Installing and Running ZooKeeper
12.2 ZooKeeper Service
12.3 ZooKeeper in production
13. Course Project

Please note that the videos are not downloadable. Sharing your access, or trying to sell or distribute the videos, is a legally punishable offense. We have previously caught people doing this; they faced legal action and a heavy penalty was imposed on them.