R for Data Science

R is one of the fastest-growing programming languages and a tool of choice for analysts and data scientists. R owes its popularity in part to its open-source distribution and massive user community.


This series of courses helps both new and existing R users master R and expand their data science skills, with an emphasis on hands-on exercises and real-world datasets.

Available Courses

The following courses are offered online in multiple sessions or in person over two consecutive days. All courses can be customized and delivered at your location.

  • Introduction to R
  • Intermediate R
  • Applied R
  • Text Mining with R
  • Machine Learning with R
  • Deep Learning with Keras and TensorFlow in R

R FOR DATA SCIENCE CERTIFICATE OF COMPETENCY (3 Classes)

  • Introduction to R for Data Science
  • Intermediate R for Data Science
  • Advanced R for Data Science or Machine Learning with R or Fundamentals of Machine Learning
To earn a Certificate of Competency, all classes must be taken through the Center for Business Analytics. If professional experience satisfies the prerequisite for a class, another approved class may be substituted (limited to one class per Certificate of Competency).

Introduction to R

Upon successfully completing this course, students will:

  • Be up and running with R
  • Understand the different types of data R can work with
  • Understand the different structures in which R holds data
  • Be able to import data into R
  • Perform basic data wrangling activities with R
  • Compute basic descriptive statistics with R
  • Visualize their data with base R and ggplot graphics
Introduction to R Course Outline (two days)

Introduction

  •   Downloading R and RStudio
  •   The R environment
  •   Getting help
  •   Managing your directory
  •   R as a calculator
  •   Simple objects & assignment
  •   Vectors
  •   Working with packages

 

Importing/exporting data

  •   Built-in data
  •   Text files
  •   Excel files
  •   Scraping online tabular files

 

Data structures

  •   Vectors
  •   Matrices
  •   Lists
  •   Data frames

 

Data types

  •   Numbers
  •   Character strings
  •   Factors
  •   Dates
  •   Logical

 

Tidy Data

  •   Managing wide and long data
  •   Splitting and uniting variables

Transforming and manipulating data

  •   Selecting variables
  •   Filtering observations
  •   Summarizing
  •   Ordering
  •   Creating new variables
  •   Merging data sets

 

Base R plotting

  •   Strip charts
  •   Histograms
  •   Density plots
  •   Box plots
  •   Bar charts
  •   Dot plots
  •   Scatter plots
  •   Line charts

 

Advanced plotting with ggplot

  •   Geoms
  •   Overplotting
  •   Color, size and shape aesthetics
  •   Small multiples (faceting)
  •   Scales, axes and legends
  •   Themes

 

Putting it all together

  •   Case study 1
  •   Case study 2

Intermediate R

This is the second course in the R for Data Science series and builds on the material from Introduction to R. Attendance at the introductory course is not required for those with practical experience in a professional setting. Those with little prior experience should contact Larry Porter for resources to prepare for this class.

This course covers the application of R to the entire data science workflow: data acquisition, wrangling, visualization, analytic modeling, and communication. There will be an emphasis on hands-on exercises and real-world datasets.

Upon successfully completing this course, students will:

  • Be able to work in a fully reproducible literate statistical environment
  • Have mastered the data wrangling process, including handling text data and scraping structured and unstructured online data
  • Understand how to minimize code duplication by applying control statements and the apply family of functions and by writing their own functions
  • Be fluent in exploratory data analysis
  • Understand the analytic modeling process
  • Be able to communicate their analysis through a variety of mediums
Intermediate R Course Outline (two days)

Introduction (45 min)

  • Course intro
  • Recap of the basics

Working in a Reproducible Environment (50 min)

  • RStudio projects
  • R Markdown
  • R Notebooks

Wrangling (90 min)

  • Tibbles & Data frames
  • Pipe operator
  • tidyr & dplyr
  • Mastering data types & structures

Regular Expressions (60 min)

  • Regex syntax
  • Regex functions
  • Normalizing text

Scraping (60 min)

  • Scraping tabular & spreadsheet files
  • Scraping HTML text
  • Scraping HTML tables
  • Scraping with APIs

Iteration (90 min)

  • Control statements (if, ifelse, for, while, repeat)
  • Apply family (apply, lapply, sapply, tapply)



Functions (90 min)

  • When to write functions
  • Function components
  • Function arguments
  • Scoping rules
  • Concept of lazy evaluation
  • Returning function outputs
  • Handling invalid parameters
  • Sourcing your own functions

Exploratory Data Analysis (60 min)

  • Describing your data visually
  • Describing your data numerically

Modeling Basics (90 min)

  • Simple models
  • Visualizing models
  • Mastering model formulas
  • Interpolation vs. extrapolation

Model Building Process (120 min)

  • Case study 1
  • Case study 2
  • Case study 3
  • Managing many models

Communicating Your Results (120 min)

  • Interactive graphics
  • R Markdown reporting
  • Intro to flexdashboards
  • Intro to Shiny
 

Applied Analytics with R

This course makes strong assumptions about your prior knowledge. To ensure your success, be sure that you have reviewed and are comfortable with the material covered in the Intro to R and Intermediate R courses.

This is the third course in the R for Data Science series and builds on the material from Intermediate R. Attendance at the introductory or intermediate course is not required for those with significant practical experience in a professional setting. This course covers the application of several descriptive, predictive, and prescriptive analytic techniques. The emphasis is on the general purpose of these techniques and their integration and application in R rather than on their theoretical underpinnings, which makes the course accessible to a wider audience looking to apply R for analytic purposes across organizational processes. There will be an emphasis on hands-on exercises and real-world datasets.

Upon successfully completing this course, students will be able to use R to:

  • Preprocess their data prior to modeling to improve model performance
  • Apply a variety of descriptive, predictive, and prescriptive analytic models
  • Extract and interpret model results
  • Visualize their modeling results to communicate their findings

Although comprehension and progression vary by class, the following illustrates the variety of techniques commonly covered.
Descriptive Analytics

  • Descriptive visualization
  • Numerical data descriptive statistics
  • Categorical data descriptive statistics
  • Assessing basic assumptions
  • Correlations
  • t tests/ANOVA

Predictive Analytics

  • Time series forecasting
  • Linear regression
  • Multilevel modeling
  • Logistic regression
  • Decision trees / random forests
  • Support vector machines
  • K-means clustering
  • Neural networks

Prescriptive Analytics

  • Linear programming
  • Data envelopment analysis
  • Decision models

Text Mining with R

If you work in analytics or data science, you know that data is being generated at ever-faster rates. Analysts are often trained to handle tabular or rectangular data that are mostly numeric, but much of the data proliferating today is unstructured and text-heavy. Many of us who work in analytical fields are not trained in even the simplest approaches to analyzing unstructured text data. This short course serves as an introduction to text mining with the R programming language.

Upon successfully completing this course, students will be able to use R to:

  • Apply regular expressions to unstructured text.
  • Tidy unstructured text data.
  • Perform word frequency analysis.
  • Quantify the sentiment in text.
  • Assess the frequency and importance of terms across documents.
  • Understand the relationships between words.
  • Perform topic modeling analysis.

 



Machine Learning with R

Learn the fundamentals and application of modern machine learning tasks. This course will cover unsupervised techniques to discover the hidden structure of datasets along with supervised techniques for predicting categorical and numeric responses via classification and regression.

Learn how to process data for modeling, how to train your models, how to visualize your models and assess their performance, and how to tune their parameters for better performance. The course emphasizes intuitive explanations of the techniques while focusing on problem-solving with real data across a wide variety of applications.  

Intended Audience

This course is intended for academics and data science practitioners who want to learn about machine learning tasks and how to apply them. Participants should have knowledge of basic statistical ideas, such as correlation and linear regression analysis, along with intermediate R programming skills.

Machine Learning with R Topic Outline

Unsupervised techniques

  • Principal components analysis
  • Clustering

Supervised regression techniques

  • Model validation
  • Linear regression and its cousins
  • Nonlinear regression
  • Regression trees

Supervised classification techniques

  • Model validation
  • Linear classification models
  • Nonlinear classification models
  • Classification trees
 

Deep Learning with Keras and TensorFlow in R

This two-day workshop introduces the essential concepts of building deep learning models with TensorFlow and Keras via R. First, we’ll establish a mental model of where deep learning fits in the spectrum of machine learning, highlight its benefits and limitations, and discuss how the R, Keras, and TensorFlow toolchain works together. We'll then build an understanding of deep learning through first principles and practical applications covering a variety of tasks such as computer vision, natural language processing, anomaly detection, and more. Throughout the workshop, you will gain an intuitive understanding of the architectures and engines that make up deep learning models, apply a variety of deep learning algorithms (e.g., MLPs, CNNs, RNNs, LSTMs, autoencoders), learn when and how to tune the various hyperparameters, and learn to interpret model results. By the end of the workshop, you should have a firm grasp of deep learning and be able to implement a systematic approach for producing high-quality modeling results.

Intended Audience

Is this workshop for you? If you answer "yes" to these three questions, then this workshop is likely a good fit:

  • Are you relatively new to the field of deep learning and neural networks but eager to learn? Or maybe you have applied a basic feedforward neural network but aren't familiar with other deep learning architectures?
  • Are you an experienced R user comfortable with the tidyverse, creating functions, and applying control (e.g., if, ifelse) and iteration (e.g., for, while) statements?
  • Are you familiar with the machine learning process, such as data splitting, feature engineering, resampling procedures (e.g., k-fold cross-validation), hyperparameter tuning, and model validation? This workshop will review some of these topics, but coming in with some exposure will help you stay focused on the deep learning details rather than on general modeling procedures.
Deep Learning with Keras and TensorFlow in R Course Outline (two days)

Introductions

Deep learning ingredients

Deep learning recipe

  • Training your model
  • Mini-project: Predicting Ames, IA home sale prices

Computer vision & CNNs

  • MNIST revisited
  • Cats vs. dogs
  • Transfer learning

Project: Classifying natural images

Word embeddings

  • The original IMDB
  • Pre-trained embeddings
  • Mini-project: Amazon reviews

Collaborative filtering

RNNs & LSTMs

  • IMDB revisited
  • Mini-project: Non-IMDB reviews

Wrap up

  • Project: Detecting duplicate Quora questions
  • Final words of wisdom

 

R for Data Science Instructor

Brad Boehmke, PhD, is the Director of Data Science at 84.51°, a professor at three universities, author of the book Data Wrangling with R, and creator of multiple open-source R packages and data science short courses. He focuses on developing algorithmic processes, solutions, and tools that enable 84.51° and its analysts to efficiently extract insights from data and provide solution alternatives to decision-makers. He has a wide analytic skill set covering descriptive, predictive, and prescriptive analytic capabilities applied across multiple domains, including retail, healthcare, cyber intelligence, finance, the Department of Defense, and aerospace. A summary of his work is available online at bradleyboehmke.github.io.