Project 1 – Designing and Implementing a Film Database using Sqoop

In this project we will install Sqoop and get hands-on experience importing and exporting data with it. First we will import the MySQL World database tables into HDFS, both with default delimiters and with non-default file formats. Then we will practice by importing the film database tables into HDFS. Finally, we will export a Parquet file of customer data from HDFS to MySQL.
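For a flavour of the commands involved, the sketch below shows one import of a World table and one export of customer data. The hostnames, credentials, table names and HDFS paths are placeholders to adapt to your own environment.

# Import the `city` table of the MySQL World database into HDFS
# (default text format and delimiters); -P prompts for the password.
sqoop import \
  --connect jdbc:mysql://localhost/world \
  --username student -P \
  --table city \
  --target-dir /user/student/world/city

# The same import stored as Parquet instead of the default text files.
sqoop import \
  --connect jdbc:mysql://localhost/world \
  --username student -P \
  --table city \
  --target-dir /user/student/world/city_parquet \
  --as-parquetfile

# Export a customer dataset from HDFS back into an existing MySQL table.
sqoop export \
  --connect jdbc:mysql://localhost/retail \
  --username student -P \
  --table customers \
  --export-dir /user/student/retail/customers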

Project 2 – Designing and Architecting a Flume Agent to Ingest Unstructured Data from Twitter
In this project we will build a Flume agent to ingest data from a spooling directory source into HDFS, making use of interceptors and channel selectors along the way. We will then follow the well-known Twitter example and build a Flume agent that streams data from Twitter into HDFS.
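A minimal sketch of such an agent is shown below, assuming a local spool directory of /var/spool/incoming and an illustrative HDFS path; the property names are standard Flume configuration keys, everything else is a placeholder.

# Write a minimal spooldir -> memory channel -> HDFS agent configuration.
cat > spool-agent.conf <<'EOF'
agent1.sources  = spool-src
agent1.channels = mem-ch
agent1.sinks    = hdfs-sink

# Watch a local directory and attach a timestamp header via an interceptor.
agent1.sources.spool-src.type = spooldir
agent1.sources.spool-src.spoolDir = /var/spool/incoming
agent1.sources.spool-src.interceptors = ts
agent1.sources.spool-src.interceptors.ts.type = timestamp
agent1.sources.spool-src.channels = mem-ch

agent1.channels.mem-ch.type = memory
agent1.channels.mem-ch.capacity = 10000

# Roll files into date-bucketed HDFS directories.
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.channel = mem-ch
agent1.sinks.hdfs-sink.hdfs.path = /flume/incoming/%Y-%m-%d
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
EOF

# Start the agent.
flume-ng agent --name agent1 --conf ./conf --conf-file spool-agent.conf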

Project 3 – Diagnostics and Performance Tuning of the MovieLens Dataset using Pig
In this project, we will learn about Apache Pig and use it to process the MovieLens dataset. We will get familiar with the various Pig operators used for data processing, and cover how to use UDFs and write our own custom UDFs. Finally, we will take a look at diagnostics and performance tuning.
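As a taste of the Pig Latin we will write, the sketch below computes the average rating per movie from the MovieLens u.data file (tab-separated user, movie, rating, timestamp fields); the HDFS path is a placeholder.

# Find the best-rated movies with at least 100 ratings.
cat > avg_ratings.pig <<'EOF'
ratings  = LOAD '/user/student/movielens/u.data'
           AS (user_id:int, movie_id:int, rating:int, ts:long);
by_movie = GROUP ratings BY movie_id;
stats    = FOREACH by_movie GENERATE group AS movie_id,
                                     AVG(ratings.rating) AS avg_rating,
                                     COUNT(ratings)      AS num_ratings;
popular  = FILTER stats BY num_ratings >= 100;
ranked   = ORDER popular BY avg_rating DESC;
top10    = LIMIT ranked 10;
DUMP top10;
EOF

# Run on the cluster (use pig -x local to test against local files first).
pig avg_ratings.pig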


Project 4 – Analysing and Processing NYSE Trading Transaction Data using Hive
In this project we will learn about Apache Hive, another popular processing framework, and use it to process the NYSE daily trading price data. We will look at the different file types and formats available, see how to create Hive tables and load data into them, and finally show how partitioning works in Hive.
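For instance, a partitioned Hive table over the daily price data might be declared and loaded roughly as follows; the column names, file layout and paths are assumptions about the dataset rather than its exact schema.

# Define a table for daily prices, partitioned by year, and load one file into it.
cat > nyse_daily.hql <<'EOF'
CREATE TABLE IF NOT EXISTS nyse_daily (
  symbol      STRING,
  trade_date  STRING,
  open_price  DOUBLE,
  high_price  DOUBLE,
  low_price   DOUBLE,
  close_price DOUBLE,
  volume      BIGINT
)
PARTITIONED BY (trade_year INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

LOAD DATA INPATH '/user/student/nyse/prices_2010.csv'
INTO TABLE nyse_daily PARTITION (trade_year = 2010);
EOF

hive -f nyse_daily.hql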

Project 5 – Analysing and Querying the Airline On-Time Performance Dataset using Hive
We will dive deeper into Hive in this project, working with the airline on-time performance dataset. As part of processing this data we will learn about joins. We will also learn how to use built-in UDFs and how to create our own custom UDFs and UDAFs.
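The sketch below gives a feel for the kind of join query we will write; the ontime and carriers table names and their columns are placeholders for however the dataset ends up being loaded, and the jar path and class name in the UDF note are likewise illustrative.

# Join the on-time facts with the carriers lookup table and rank carriers by average delay.
hive -e "
SELECT c.description     AS carrier_name,
       AVG(o.arr_delay)  AS avg_arrival_delay,
       COUNT(*)          AS num_flights
FROM   ontime   o
JOIN   carriers c ON o.unique_carrier = c.code
WHERE  o.flight_year = 2008
GROUP BY c.description
ORDER BY avg_arrival_delay DESC
LIMIT 10;
"

# Registering a custom UDF packaged as a jar would then look like this inside a Hive session:
#   ADD JAR /home/student/udfs/airline-udfs.jar;
#   CREATE TEMPORARY FUNCTION parse_delay AS 'com.example.ParseDelayUDF';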


Project 6 – Implementing Workflows and Coordination for MovieLens Data using Oozie
Here we will introduce the workflow manager Oozie. We will learn how to create and run an Oozie workflow for a MovieLens data processing pipeline, using Pig and Hive actions to set up the workflow. We will also build Oozie coordinators driven by time and data triggers.
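A skeleton of such a workflow, chaining a Pig action into a Hive action, might look like the sketch below; the script names, paths and submission details are placeholders, and the real workflow is built up step by step during the project.

# Skeleton workflow.xml chaining a Pig action and a Hive action.
cat > workflow.xml <<'EOF'
<workflow-app name="movielens-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="clean-with-pig"/>

  <action name="clean-with-pig">
    <pig>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>clean_ratings.pig</script>
    </pig>
    <ok to="load-into-hive"/>
    <error to="fail"/>
  </action>

  <action name="load-into-hive">
    <hive xmlns="uri:oozie:hive-action:0.5">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>load_ratings.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>

  <kill name="fail">
    <message>MovieLens workflow failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
EOF

# After copying the workflow directory to HDFS, submit it with the Oozie CLI.
# A coordinator app then wraps this workflow with a frequency (time trigger)
# and optional dataset dependencies (data triggers).
oozie job -oozie http://localhost:11000/oozie -config job.properties -run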

Project 7 – Implementing Media Analytics through Twitter Feed Analysis using Flume
This project brings together several of the tools we have learnt through the course. We will load live Twitter feeds related to job advertisements, set up batch processing of the tweets on HDFS, and then export the processed data to a MySQL database, with Oozie as the workflow manager.

Project 8 – Designing and Building an E-Commerce Data Warehouse using Hadoop Technologies
In this project we are going to design a data warehouse for a retail shop. In both the design and the implementation we will focus on answering some specific questions related to price optimization and inventory allocation. The two questions we will look to answer in this project are:


  1. Were higher-priced items selling better in certain markets?
  2. Should inventory be re-allocated, or prices optimized, based upon geography?


The whole point of answering these questions with data is to boost the overall bottom line for the business while improving the experience for shoppers.
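To make the first question concrete, a query over a hypothetical star schema (a sales fact table joined to product and store dimensions) might look like the sketch below; every table and column name here is illustrative rather than part of the actual warehouse design.

# Compare how premium-priced products sell across markets.
hive -e "
SELECT st.region                      AS market,
       SUM(f.quantity * f.unit_price) AS revenue,
       SUM(f.quantity)                AS units_sold
FROM   sales_fact  f
JOIN   product_dim p  ON f.product_id = p.product_id
JOIN   store_dim   st ON f.store_id   = st.store_id
WHERE  p.price_band = 'premium'
GROUP BY st.region
ORDER BY revenue DESC;
"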

Project 9 – Retail Data Analytics using Sqoop and Flume


The retail industry's core business question is: what products do our customers like to buy? To answer it, the first thought might be to look at the transaction data, which should indicate what customers actually buy, and presumably like to buy, right?

This is probably something you can do in your regular RDBMS environment, but a benefit of Apache Hadoop is that you can do it at greater scale and lower cost, on the same system that you may also use for many other types of analysis. We will therefore also ingest unstructured data from the retail website using Flume, to take the analysis deeper.
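As a starting point, the transactional side of the question can be answered with a simple aggregate such as the sketch below (the transactions table and its columns are placeholders); the clickstream data ingested with Flume then adds the "viewed but not bought" side of the picture.

# Which products do customers buy most often?
hive -e "
SELECT product_id,
       COUNT(*) AS times_purchased
FROM   transactions
GROUP BY product_id
ORDER BY times_purchased DESC
LIMIT 20;
"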

Project 10 – Analysing Geolocation Data on a Big Data Platform using Flume and Pig

In this project we will show how a trucking company can analyse geolocation data to reduce fuel costs and improve driver safety. Geolocation data identifies the location of an object or individual at a moment in time; it may take the form of coordinates or an actual street address.
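As an illustration, the Pig sketch below counts risky driving events per driver from a hypothetical geolocation CSV; the column layout and event values are assumptions about the dataset.

# Count unsafe driving events (speeding, harsh braking, etc.) per driver.
cat > risky_drivers.pig <<'EOF'
geo    = LOAD '/user/student/geolocation/trucks.csv'
         USING PigStorage(',')
         AS (truck_id:chararray, driver_id:chararray, event:chararray,
             latitude:double, longitude:double, city:chararray, ts:chararray);
risky  = FILTER geo BY event != 'normal';
by_drv = GROUP risky BY driver_id;
counts = FOREACH by_drv GENERATE group AS driver_id, COUNT(risky) AS risky_events;
ranked = ORDER counts BY risky_events DESC;
DUMP ranked;
EOF

pig risky_drivers.pig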

Contact details for projects: 7598199611 / 944500 3404