Past MS Business Analytics Capstone Projects

Fall 2017

Zachary P. Malosh, The Impact of Scheduling on NBA Team Performance, November 2017, (Michael Magazine, Tom Zentmeyer)
Every year, the NBA releases their league schedule for the coming year. The construction of the schedule contains many potential schedule-based factors (such as rest, travel, and home court) that can impact each game. Understanding the impact of these factors is possible by creating a regression model that quantifies the team performance in a particular game in terms of final score and fouls committed. Ultimately, rest, distance, attendance, and time in the season had direct impact on the final score of the game while the attendance at a game led to an advantage in fouls called against the home team. The quantification of the impact of these factors can be used to anticipate variations in performance to improve accuracy in a Monte Carlo simulation.

Oscar Rocabado, Multiclass Classification of the Otto Group Products, November 2017, (Yichen Qin, Amitabh Raturi)
Otto group is a multinational group with thousands of products that need to be classified consistently in nine groups. The better the classification, the more insights they can generate about their product range. However, the data is highly unbalanced among classes so we try to find out if the balancing group Synthetic Minority Oversampling Technique has notable effects in the performance of the accuracy and Area under the Curve of the classifiers. Given the data set is obfuscated so that the interpretability of the dataset is impossible, we will use black box methods like Linear and Gaussian Support Vectors Machines and Multilayer Perceptron and Ensembles that combines classifiers like Random Forests and Majority Voting.

Shixie Li, Credit Card Fraud Detection Analysis: Over sampling and under sampling of imbalanced data, November 2017, (Yichen Qin, Dungang Liu)
Imbalanced credit fraud data is analyzed by over sampling and under sampling methods. A model is built with logistic regression and area under PRROC (Precision-Recall curve) is used to show model performance of each method. The disadvantage of using area under ROC is that due to the imbalance of the data the specificity will be always close to 1. Therefore the area under the curve does not work well on imbalanced data. This disadvantage is shown by comparison in this paper. Instead a precision-recall plot is used to find a reasonable region for the cutoff point based on the result from selected model. The cutoff value should be chosen within the region or around the region and it is all depends on whether precision or recall is more important to the bank.

Cassie Kramer, Leveraging Student Information System Data for Analytics, November 2017, (Michael Fry, Nicholas Frame)
In 2015, The University of Cincinnati began to transition its Student Information System from a homegrown system to a system created by Oracle PeopleSoft called Campus Solutions and branded by UC as Catalyst. In order to perform reporting and analytics on this data, the data must be extracted from the source system, modeled and loaded into a data warehouse. The data can then be exported to perform analytics. In this project, the process of extract, modeling, loading and analyzing will be covered. The goal will be to predict students’ GPA and retention for a particular college.

Parwinderjit Singh, Alternative Methodologies for Forecasting Commercial Automobile Liability Frequency, October 2017, (Yan Yu, Caolan Kovach-Orr)
Insurance Services Office, Inc. publishes quarterly forecast of Commercial Automobile liability frequency (number of commercial automobile insurance claims reported/paid) to help insurers make better pricing and reserving decisions. This paper proposes forecasting models based on time-series forecasting techniques as an alternative to already existing traditional methods and intends to improve the existing forecasting capabilities. ARIMAX forecasting models have been developed with economic indicators as external regressors. These models resulted in a MAPE (Mean Absolute Percentage Error) ranging from 0.5% to ~9% which is a significant improvement from currently used techniques.