Carl H. Lindner College of Business, University of Cincinnati


Big Data - Spark

This class has ended

Analytics Summit Training Session   
8:00AM - 4:30PM

Big Data with Hadoop and Spark

COURSE DESCRIPTION

Hadoop? MapReduce? Spark? Hive? Making sense of the tools used to analyze big data can seem confusing and overwhelming. Dr. Harrison and Dr. Shan will help you understand how these components function and how they form the core of big data analytics systems. The emphasis of this course is on understanding the fundamental principles of big data systems using Hadoop and Spark.

Spark can process huge volumes of data in real time and is a dominant choice for performing analytics at scale. Similarly, the Hadoop Distributed File System (HDFS) forms the backbone of most big data systems. In this course, participants will learn the theory behind how these tools work so they can understand when, and how, to implement them effectively. The relative strengths and weaknesses of various big data systems will be highlighted to explain how Spark has emerged as a popular choice for analyzing dynamic, high-velocity, and high-volume data. Participants will also get hands-on experience with HDFS and Spark that illustrates the power of big data analytics.

AGENDA AND TOPICS

Day 1

Day 2

  • Overview of Big Data
  • Key-Value Data Models
  • Setup/Navigating the HDFS (Lab)
  • How Hadoop Works
  • Submitting a job to a YARN cluster (Lab)
  • Hadoop with MapReduce versus Spark
  • Importing and querying data with Sqoop and Hive (Lab)
  • How Spark Works
  • Making Spark RDDs (Lab)
  • Querying Data using Spark
  • Processing data with Spark (Lab)
  • Data Integration
  • Joining Spark RDDs (Lab)
  • Using Big Data to Solve Real Business Problems
  • Tying it all together (Lab)

INTENDED AUDIENCE

This is an introductory course in Big Data and Spark, but it will go beyond the basics to introduce some technical components. Most big data analytics will be performed using Spark and HiveQL, a query language based on SQL. Participants will also use basic Linux commands to operate Hadoop. This course is appropriate for those who want to learn more about how Spark and HDFS function and for those who are looking to begin a career in big data analytics.

COURSE FEE:

$750 includes breakfast, lunch, snacks and free parking for both days

TESTIMONIALS:

  • It was the most useful content I've ever received on the big data/Hadoop/Spark topic
  • Excellent job by Andrew and Jay. The organization was great
  • Useful mix of lecture and labs to really make the topics stick. I now have a better practical understanding of tools that are part of the Hadoop system.
  • Awesome for my level of knowledge in Hadoop
  • Excellent overview of a rapidly changing topic. Very knowledgeable instructors.
  • Fantastic course. Provided a good knowledge of the big data tools ecosystem

 

INSTRUCTORS

Andrew Harrison

Andrew Harrison is an Assistant Professor of Information Systems in the Lindner College of Business at the University of Cincinnati. His research interests include consumer fraud, deception, security systems, privacy, media capabilities, and virtual worlds. Bio: http://business.uc.edu/academics/departments/obais/faculty/andrew-harrison.html#bio

Zhe (Jay) Shan

Zhe (Jay) Shan is an Assistant Professor in the Department of Operations, Business Analytics, and Information Systems (OBAIS) in the Lindner College of Business at the University of Cincinnati. He earned his Ph.D. in Business Administration and Operations Research from the Penn State University Smeal College of Business in 2011. Before joining UC OBAIS, he worked for two years as an Assistant Professor of Information Systems at the Manhattan College School of Business. Personal website: http://homepages.uc.edu/~shanze/