R for Data Science
R is one of the fastest-growing programming languages and a tool of choice for analysts and data scientists. In part, R owes its popularity to its open-source distribution and massive user community.
In this progression of courses, we will help both new and existing R users master R and expand their data science skills. There will be an emphasis on hands-on exercises and real-world datasets.
Available Courses
The following courses are offered online in multiple sessions or in person over two consecutive days. All courses can be customized and delivered at your location.
- Introduction to R
- Intermediate R
- Applied R
- Text Mining with R
- Machine Learning with R
- Deep Learning with Keras and TensorFlow in R
- Introduction to R for Data Science
- Intermediate R for Data Science
- Advanced R for Data Science or Machine Learning with R or Fundamentals of Machine Learning
To achieve a Certificate of Competency, all classes must be taken through the Center for Business Analytics. In the event that professional experience meets the prerequisite for a class, another approved class may be substituted (limited to 1 class per Certificate of Competency).
Introduction to R
Upon successfully completing this course, students will:
- Be up and running with R
- Understand the different types of data R can work with
- Understand the different structures in which R holds data
- Be able to import data into R
- Perform basic data wrangling activities with R
- Compute basic descriptive statistics with R
- Visualize their data with base R and ggplot graphics
Day One | Day Two |
---|---|
Introduction | Transforming and manipulating data |
Importing/exporting data | Base R plotting |
Data structures | Advanced plotting with ggplot |
Data types | Putting it all together |
Tidy data | |
Intermediate R
This is the second course in the R for Data Science series and builds on the material from Introduction to R. Attendance at the introductory course is not required for those with practical experience in a professional setting. For those with little prior experience, please contact Larry Porter for resources to prepare for this class.
This course will cover the application of R across the entire data science workflow: data acquisition, wrangling, visualization, analytic modeling, and communication. There will be an emphasis on hands-on exercises and real-world datasets.
Upon successfully completing this course, students will:
- Be able to work in a fully reproducible literate statistical environment
- Have mastered the data wrangling process, including handling text data and scraping structured and unstructured online data
- Understand how to minimize code duplication by applying control statements and the apply family of functions and by writing their own functions
- Be fluent with exploratory data analyses
- Understand the analytic modeling process
- Be able to communicate their analysis through a variety of mediums
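To give a flavor of the code-duplication outcome listed above, here is a minimal sketch in base R. It is an illustration only, not course material: the helper name `summarize_column` and the sample data are hypothetical.

```r
# Hypothetical helper: summarise one numeric vector.
summarize_column <- function(x) {
  # A control statement guards against non-numeric input.
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }
  c(mean = mean(x), sd = sd(x))
}

# Illustrative data only (not a course dataset).
scores <- data.frame(
  math    = c(90, 80, 70),
  reading = c(60, 75, 90)
)

# sapply() applies the same function to every column,
# replacing one copy-pasted call per column.
result <- sapply(scores, summarize_column)
print(result)
```

Writing the summary once as a function and letting `sapply()` iterate over the columns means a change to the summary logic is made in a single place.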
Day One | Day Two |
---|---|
Introduction (45 min) | Functions (90 min) |
Working in a Reproducible Environment (50 min) | Exploratory Data Analysis (60 min) |
Wrangling (90 min) | Modeling Basics (90 min) |
Regular Expressions (60 min) | Model Building Process (120 min) |
Scraping (60 min) | Communicating Your Results (120 min) |
Iteration (90 min) | |
Applied Analytics with R
This course makes strong assumptions about your prior knowledge. To ensure your success, be sure that you have reviewed and are comfortable with the material covered in the Introduction to R and Intermediate R courses.
This is the third course in the R for Data Science series and builds on the material from Intermediate R. Attendance at the introductory or intermediate course is not required for those with significant practical experience in a professional setting. This course covers the application of several descriptive, predictive, and prescriptive analytic techniques. The emphasis is on the general purpose of these techniques and on their integration and application in R rather than on their theoretical foundations, which makes the course accessible to a wider audience looking to apply R for analytic purposes across organizational processes. There will be an emphasis on hands-on exercises and real-world datasets.
Upon successfully completing this course, students will be able to use R to:
- Preprocess their data prior to modeling to improve model performance
- Apply a variety of descriptive, predictive, and prescriptive analytic models
- Extract and interpret model results
- Visualize their modeling results to communicate their findings
Descriptive Analytics | Predictive Analytics | Prescriptive Analytics |
---|---|---|
Text Mining with R
If you work in analytics or data science, you know that data are being generated all the time at ever-faster rates. Analysts are often trained to handle tabular or rectangular data that are mostly numeric, but much of the data proliferating today is unstructured and text-heavy. Many of us who work in analytical fields are not trained in even the simplest approaches to analyzing unstructured text data. This short course serves as an introduction to text mining with the R programming language.
Upon successfully completing this course, students will be able to use R to:
- Assess regular expressions within unstructured text.
- Tidy unstructured text data.
- Perform word frequency analysis.
- Quantify the sentiment in text.
- Assess the frequency and importance of terms across documents.
- Understand the relationship between words.
- Perform topic modeling analysis.
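As a rough idea of what the word frequency outcome above involves, the following sketch counts words using only base R. This is an assumption made to keep the example self-contained, since the course's own tooling is not stated here, and the sample sentences are invented for illustration.

```r
# Illustrative documents only (not a course dataset).
docs <- c(
  "R makes text mining approachable.",
  "Text mining turns raw text into data."
)

# Lower-case the text, then use a regular expression to split
# on any run of characters that are not letters.
tokens <- unlist(strsplit(tolower(docs), "[^a-z]+"))
tokens <- tokens[tokens != ""]  # drop any empty strings left by the split

# Count each word and sort from most to least frequent.
word_counts <- sort(table(tokens), decreasing = TRUE)
print(head(word_counts, 3))
```

The same tokenize-then-count pattern underlies the more advanced outcomes in the list, such as comparing term importance across documents.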
Machine Learning with R
Learn the fundamentals and application of modern machine learning tasks. This course will cover unsupervised techniques to discover the hidden structure of datasets along with supervised techniques for predicting categorical and numeric responses via classification and regression.
Learn how to process data for modeling, how to train your models, how to visualize your models and assess their performance, and how to tune their parameters for better performance. The course emphasizes intuitive explanations of the techniques while focusing on problem-solving with real data across a wide variety of applications.
Intended Audience
This course is intended for academics and data science practitioners who wish to learn about machine learning tasks and how to apply them. Students should have knowledge of basic statistical ideas, such as correlation and linear regression analysis, along with intermediate R programming skills.
Deep Learning with Keras and TensorFlow in R
This two-day workshop introduces the essential concepts of building deep learning models with TensorFlow and Keras via R. First, we’ll establish a mental model of where deep learning fits in the spectrum of machine learning, highlight its benefits and limitations, and discuss how the TensorFlow - Keras - R toolchain works together. We'll then build an understanding of deep learning through first principles and practical applications covering a variety of tasks such as computer vision, natural language processing, anomaly detection, and more. Throughout the workshop you will gain an intuitive understanding of the architectures and engines that make up deep learning models, apply a variety of deep learning algorithms (e.g., MLPs, CNNs, RNNs, LSTMs, autoencoders), understand when and how to tune the various hyperparameters, and be able to interpret model results. Leaving this workshop, you should have a firm grasp of deep learning and be able to implement a systematic approach for producing high-quality modeling results.
Intended Audience
Is this workshop for you? If you answer "yes" to these three questions, then this workshop is likely a good fit:
- Are you relatively new to the field of deep learning and neural networks but eager to learn? Or maybe you have applied a basic feedforward neural network but aren't familiar with the other deep learning frameworks?
- Are you an experienced R user comfortable with the tidyverse, creating functions, and applying control (e.g., if, ifelse) and iteration (e.g., for, while) statements?
- Are you familiar with the machine learning process, such as data splitting, feature engineering, resampling procedures (e.g., k-fold cross-validation), hyperparameter tuning, and model validation? This workshop will provide some review of these topics, but coming in with some exposure will help you stay focused on the deep learning details rather than the general modeling procedure details.
Day One | Day Two |
---|---|
Introductions | Word embeddings (The original IMDB; Pre-trained embeddings; Mini-project: Amazon reviews) |
Deep learning ingredients | Collaborative filtering |
Deep learning recipe (Training your model; Mini-project: Predicting Ames, IA home sales prices) | RNNs & LSTMs (IMDB revisited; Mini-project: Non-IMDB reviews) |
Computer vision & CNNs (MNIST revisited; Cats vs. dogs; Transfer learning) | Wrap up (Project: Detecting Duplicate Quora; Final words of wisdom) |
Project: Classifying natural images | |
R for Data Science Instructor
Brad Boehmke, PhD, is the Director of Data Science at 84.51°, Professor at three universities, author of the Data Wrangling with R book, and creator of multiple open-source R packages and data science short courses. He focuses on developing algorithmic processes, solutions, and tools that enable 84.51° and its analysts to efficiently extract insights from data and provide solution alternatives to decision-makers. He has a wide analytic skill set covering descriptive, predictive, and prescriptive analytic capabilities applied across multiple domains including retail, healthcare, cyber intelligence, finance, Department of Defense, and aerospace. A summary of his work is available online at bradleyboehmke.github.io.