Carl H. Lindner College of Business, University of Cincinnati

Business Analytics Training: Introduction to Big Data

Introduction to Big Data and Fundamentals of Hadoop

July 27 & 28

Course Description: The production of data is expanding at an astounding pace. The explosion of data accessible through social media, the extensive use of web crawling, and the widespread availability of sensor data have provided organizations with unprecedented amounts of data to collect and analyze. This two-day workshop explores the drivers behind Big Data and uses cases from a wide variety of industries to illustrate the power of new technologies to harness Big Data and generate meaningful insights. Participants will be introduced to Hadoop and key-value data storage, the central components of the Big Data movement. These systems allow distributed processing of very large structured and unstructured data sets.

During this course, participants will learn how Hadoop works through hands-on experience with the Hadoop Distributed File System (HDFS) and MapReduce. Participants will also be introduced to several ecosystem components, such as HBase, Hive, Impala, and Spark, that are used in Big Data reporting systems.

This is an introductory course in Big Data and Hadoop, but it goes beyond the basics to introduce some technical components. It is appropriate both for those who simply want to learn more about Hadoop and Big Data and for those looking to begin the path to becoming a Hadoop developer.

Day 1 Material:

  1. What is big data?
    • Volume, variety, velocity, and veracity
    • Comparing big data to conventional reporting systems
  2. Strengths and weaknesses of big data solutions
    • Processing bottlenecks
    • Data integration challenges
    • Data redundancy versus speed
    • Lack of data integrity
  3. Key-value data systems and HDFS
    • Relational, dimensional, and key-value data models
    • Navigating data in the HDFS
  4. Parallel processing
    • Big data hardware setups
    • Retrieving and processing data in big data environments

Day 2 Material:

  1. MapReduce
    • Writing MapReduce algorithms in Java
    • Executing MapReduce code on HDFS data
  2. Common big data algorithms
    • Reusing mappers and reducers
    • Common mappers: case, explode, filter, keyspace, identity
    • Common reducers: sum, average, identity
  3. Big data reporting with Hive
    • What is Hive?
    • Writing Hive statements
  4. The big data ecosystem (Hive, Impala, Spark, etc.)
    • Introduction to other Hadoop ecosystem tools: Pig, Sqoop, Flume, Oozie, Spark, Impala
    • Comparison of in-memory processing options
  5. Squeezing value from big data
    • Best practices for “wringing value” from big data
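
The Day 2 outline pairs MapReduce with reusable mappers and reducers. As a rough illustration only, the Java sketch below simulates the map, shuffle, and reduce phases on a single machine using plain collections. It does not use Hadoop, and its names (`MiniMapReduce`, `run`, `Pair`, `explodeMapper`, `filterMapper`, `sumReducer`, `averageReducer`) are hypothetical. It assumes Java 16 or later for the `record` syntax. A real Hadoop job would instead extend Hadoop's `Mapper` and `Reducer` classes and run against data stored in HDFS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical single-machine sketch of the MapReduce pattern; this is NOT the Hadoop API.
public class MiniMapReduce {

    // A key-value pair emitted by a mapper.
    record Pair<K, V>(K key, V value) {}

    // Map every input record, group emitted values by key (shuffle), then reduce each group.
    static <I, K extends Comparable<K>, V, R> Map<K, R> run(
            List<I> inputs,
            Function<I, List<Pair<K, V>>> mapper,
            BiFunction<K, List<V>, R> reducer) {
        // Shuffle: a TreeMap keeps keys sorted, much as Hadoop sorts keys before the reduce phase.
        Map<K, List<V>> groups = new TreeMap<>();
        for (I input : inputs) {
            for (Pair<K, V> p : mapper.apply(input)) {
                groups.computeIfAbsent(p.key(), k -> new ArrayList<>()).add(p.value());
            }
        }
        // Reduce: one reducer call per distinct key.
        Map<K, R> results = new TreeMap<>();
        groups.forEach((key, values) -> results.put(key, reducer.apply(key, values)));
        return results;
    }

    // "Case" + "explode" mapper: lowercase a line, split it into words, emit (word, 1) per word.
    static List<Pair<String, Integer>> explodeMapper(String line) {
        List<Pair<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new Pair<>(word, 1));
            }
        }
        return out;
    }

    // "Filter" mapper: parse "sensor,reading" lines, dropping lines with a missing reading.
    static List<Pair<String, Double>> filterMapper(String line) {
        String[] parts = line.split(",");
        if (parts.length != 2 || parts[1].isBlank()) {
            return List.of();
        }
        return List.of(new Pair<>(parts[0].trim(), Double.parseDouble(parts[1].trim())));
    }

    // Sum reducer: total of all values for a key.
    static Integer sumReducer(String key, List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Average reducer: arithmetic mean of all values for a key.
    static Double averageReducer(String key, List<Double> values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total / values.size();
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Big data big insights", "Hadoop stores big data");
        System.out.println(run(lines, MiniMapReduce::explodeMapper, MiniMapReduce::sumReducer));
        // prints {big=3, data=2, hadoop=1, insights=1, stores=1}

        List<String> readings = List.of("s1,20.0", "s2,30.0", "s1,22.0", "s2,");
        System.out.println(run(readings, MiniMapReduce::filterMapper, MiniMapReduce::averageReducer));
        // prints {s1=21.0, s2=30.0}
    }
}
```

Because `run` depends only on the mapper and reducer passed in, the same `sumReducer` could be paired with any mapper that emits integer counts, which is the idea behind reusing mappers and reducers.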

2017 CLASSES

Introduction to Python: Sept. 14 & 15
Tableau Training: Sept. 21 & 22
Introduction to R: Oct. 5 & 6
Text Mining with R: Oct. 19 & 20
Tableau Training: Nov. 2 & 3
Analytics in Excel: TBD

Course Fee: $695, which includes breakfasts, lunches, refreshments, and free parking for both days.

Course Location:
U-Square, Room 359
225 Calhoun Street
Cincinnati, OH 45219

Instructor

Andrew Harrison is an Assistant Professor of Information Systems in the Lindner College of Business at the University of Cincinnati. His research interests include consumer fraud, deception, security systems, privacy, media capabilities, and virtual worlds.