
Data Analytics Projects

During my time at the University of Washington, I worked on multiple projects using supervised and unsupervised learning methods to extract valuable business insights. The steep learning curve at UW helped me build proficiency in Python, R, SQL, Advanced Excel, AWS, Tableau and Adobe Analytics.

The source code for my projects is available on my GitHub. Do try it out yourself!

(I will be happy to provide more in-depth details about any of these projects.)

Enchantment Lottery Analysis

Tableau, SQL

Business Objective : To develop a strategy to increase the chances of winning the Enchantment Lottery for the Northwest Treks

Dataset : One dataset of lottery outcomes from previous years, and one dataset of weather observations for the previous year in the trek region.

Approach :

To devise a strategy for selecting the dates with the highest chance of winning the Enchantment Lottery and the best weather conditions for the trek, the following questions need to be addressed:

  • What are their overall chances of winning the lottery based on 2021 results?

  • What should they keep in mind as they submit their application? Are there specific selections that they can make in order to increase their chances of having their application accepted?

  • If they want to be in the Core Zone of the Enchantments on days when the average historical temperature is over 52°F and the amount of rain is less than 0.03 on average, what date would you recommend they pick in order to increase their chances of having their application accepted?
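The date-selection logic behind these questions can be sketched as a SQL query, run here through Python's built-in sqlite3 module. This is a minimal sketch and not the original analysis. The table names, columns, zone label and toy rows are all hypothetical. It assumes the lottery data records applications and permits awarded per entry date and zone, and that the weather data has one row per day.

```python
import sqlite3

# Hypothetical schemas and made-up rows for illustration only;
# the real datasets' column names and values are not given in the write-up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lottery (entry_date TEXT, zone TEXT, applications INTEGER, awarded INTEGER);
CREATE TABLE weather (obs_date TEXT, avg_temp_f REAL, rain REAL);
INSERT INTO lottery VALUES
  ('2021-07-10', 'Core Enchantment Zone', 400, 8),
  ('2021-08-14', 'Core Enchantment Zone', 250, 10),
  ('2021-09-18', 'Core Enchantment Zone', 120, 9);
INSERT INTO weather VALUES
  ('2020-07-10', 60.0, 0.01),
  ('2020-08-14', 65.0, 0.00),
  ('2020-09-18', 50.0, 0.02);
""")

# Overall chance of winning: permits awarded / applications submitted.
overall = conn.execute(
    "SELECT SUM(awarded) * 1.0 / SUM(applications) FROM lottery"
).fetchone()[0]

# Core Zone dates whose historical weather is warm (> 52 F) and dry (< 0.03 rain),
# ranked by the share of applications accepted for that date.
ranked = conn.execute("""
SELECT l.entry_date,
       SUM(l.awarded) * 1.0 / SUM(l.applications) AS win_rate
FROM lottery AS l
JOIN weather AS w
  ON strftime('%m-%d', w.obs_date) = strftime('%m-%d', l.entry_date)
WHERE l.zone = 'Core Enchantment Zone'
GROUP BY l.entry_date
HAVING AVG(w.avg_temp_f) > 52 AND AVG(w.rain) < 0.03
ORDER BY win_rate DESC
""").fetchall()

print(f"Overall win rate: {overall:.1%}")
for day, rate in ranked:
    print(day, f"{rate:.1%}")
```

The query joins on month and day only, so weather from earlier years stands in for the historical average at each candidate date.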


Result : Identified the 3 best weekends to trek, with the highest chance of winning the lottery.

Technology used: SQL, Tableau

Marketing Channel Spend Analysis

Media Mix Modeling (MMM), Machine Learning

Business Objective : To develop a data model to predict revenue and optimize spend for a marketing campaign, based on revenue data and spend across marketing channels such as Radio, TV and Banners.

Dataset : Weekly data for a marketing campaign, consisting of the revenue earned and the amount of money spent on each channel (Radio, TV and Banners).

Approach :

  • Conducted exploratory analysis to find general trends within the sales.

  • Transformed the data to account for Advertising Adstock.

  • Implemented the Saturation Effect on the money spent across the channels. A model without a saturation effect assumes that the more money you spend on advertising, the higher your sales get, without limit. In practice, the increase gets weaker the more you spend. This is called the saturation effect, or diminishing returns.

  • Implemented the Carryover Effect on marketing spend. For example, if you spend $1000 on an advertisement today, part of your audience might buy the product today, while another part might see the advertisement today and decide to buy tomorrow. The length of this window and the strength of the effect vary by use case.

  • Used cross-validation and hyperparameter tuning to choose the window and strength of the carryover effect and the parameters of the saturation effect.
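The two transformations above can be sketched as below. This minimal sketch assumes a geometric decay for the carryover effect and an exponential curve for the saturation effect. Both are common choices, but the write-up does not say which functional forms the project used. The function names and parameters (`decay`, `window`, `alpha`) are illustrative.

```python
import numpy as np

def geometric_adstock(spend, decay, window):
    """Carryover effect: this week's effective spend is this week's spend plus
    decay**lag times the spend from each of the previous window - 1 weeks."""
    spend = np.asarray(spend, dtype=float)
    weights = decay ** np.arange(window)  # 1, decay, decay**2, ...
    out = np.zeros_like(spend)
    # Ignore lags longer than the series itself.
    for lag, w in enumerate(weights[: len(spend)]):
        out[lag:] += w * spend[: len(spend) - lag]
    return out

def exp_saturation(x, alpha):
    """Saturation effect: the response rises with spend but flattens out
    (diminishing returns); alpha controls how quickly it saturates."""
    return 1.0 - np.exp(-alpha * np.asarray(x, dtype=float))

# Example: $100 of TV spend in week 1 only, with half the effect carrying over each week.
tv = geometric_adstock([100, 0, 0, 0], decay=0.5, window=3)  # 100, 50, 25, 0
print(tv)
print(exp_saturation(tv, alpha=0.01))
```

In a full model, each channel's spend would pass through both transforms before the regression. `decay`, `window` and `alpha` would then be picked by cross-validation on time-ordered folds, so the model is never validated on weeks earlier than the ones it was fit on.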


Result : Built a machine learning model that estimates the effect of spend across the marketing channels on the revenue generated, and predicted revenue with an accuracy of 87%.

Technology used: Python

Multichannel Retail Store Analysis

RFM Analysis, Classification Trees, Logistic Regression

Business Objective : To develop a customer relationship management (CRM) strategy for a multichannel retail store to improve sales.

Dataset : 4 datasets with 214 variables and 4,000,000 observations, based on the sales of a real multichannel retail store.

Approach :

  • Conducted exploratory analysis to find general trends within the sales.

  • Created an ETL pipeline on AWS to fetch the data weekly.

  • Explored logistic regression to identify trends in customers' first purchases.

  • Analyzed the relationship between customer purchase behavior and the amount of money spent.

  • Performed RFM analysis to find the most valuable customers to target with specialized marketing campaigns.

  • Used a CART model to classify customers into 4 categories based on their purchase behavior.
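RFM (Recency, Frequency, Monetary) scoring can be sketched in pandas as follows. This is a minimal example. It assumes a transaction table with hypothetical `customer_id`, `order_date` and `amount` columns and made-up rows. It scores each dimension by quartile (4 = best) and sums the three scores, which is one common convention among several.

```python
import pandas as pd

# Hypothetical transaction log (made-up rows; the store's real columns are not given).
tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3, 4],
    "order_date": pd.to_datetime([
        "2023-01-05", "2023-03-01", "2022-11-20",
        "2023-02-10", "2023-02-25", "2023-03-10", "2022-06-01",
    ]),
    "amount": [50.0, 70.0, 20.0, 30.0, 45.0, 60.0, 500.0],
})
# Measure recency from the day after the last recorded order.
snapshot = tx["order_date"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),  # days since last order
    frequency=("order_date", "count"),                            # number of orders
    monetary=("amount", "sum"),                                   # total spend
)

def quartile_score(col, higher_is_better=True):
    """Score a column 1-4 by quartile, where 4 is the most valuable group.
    Ranking first (ties broken by order) avoids duplicate quantile edges."""
    ranks = col.rank(method="first", ascending=higher_is_better)
    return pd.qcut(ranks, 4, labels=[1, 2, 3, 4]).astype(int)

rfm["R"] = quartile_score(rfm["recency"], higher_is_better=False)  # more recent is better
rfm["F"] = quartile_score(rfm["frequency"])
rfm["M"] = quartile_score(rfm["monetary"])
rfm["rfm_score"] = rfm[["R", "F", "M"]].sum(axis=1)
print(rfm.sort_values("rfm_score", ascending=False))
```

Customers with the highest `rfm_score` are the candidates for specialized campaigns. Some teams instead concatenate the three digits (for example "444") to keep the segments distinct.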


Result : Classified customers into 4 categories for different marketing campaigns, then identified the highest-potential customers using RFM analysis. Additionally, built a machine learning model that predicts sales based on selected characteristics, with an accuracy of 35%.


Technology used: R, SQL, Python, AWS

Click here to find the source code

Credit Card Fraud Detection for a large bank

Regression

Business Objective : To develop a credit scoring model that determines whether a new applicant will be a good credit customer.

Dataset : Observations on 31 variables for 284,000 past applicants for credit.

Modelling Approach : First, I performed Exploratory Data Analysis to find the highly correlated variables. Then, I used the Synthetic Minority Over-sampling Technique (SMOTE) to address the class imbalance in the data, and fit regression models with scikit-learn to find the model with the best fit.
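The core idea of the Synthetic Minority Over-sampling Technique can be sketched from scratch in NumPy. Each synthetic row lies on the line segment between a real minority-class row and one of its nearest minority-class neighbors. This is a minimal illustration assuming purely numeric features. The function name `smote` and its parameters are hypothetical, and the data is made up.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class rows. Each one lies on the segment
    between a random minority sample and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances within the minority class only.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    k = min(k, len(X_min) - 1)
    neighbors = np.argsort(dist, axis=1)[:, :k]
    start = rng.integers(0, len(X_min), size=n_new)               # random starting rows
    partner = neighbors[start, rng.integers(0, k, size=n_new)]    # one neighbor for each
    gap = rng.random((n_new, 1))  # random position along each segment, in [0, 1)
    return X_min[start] + gap * (X_min[partner] - X_min[start])

# Example: 4 hypothetical fraud rows with 2 features, plus 6 synthetic rows.
fraud = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
synthetic = smote(fraud, n_new=6, k=2)
print(synthetic.shape)  # (6, 2)
```

Only the training split should be oversampled. That way the test set keeps the real class balance when AUC and precision are measured. The imbalanced-learn library's `SMOTE` class, with its `fit_resample(X, y)` method, is the packaged equivalent.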


Result : The model achieved a high Area Under the Curve (AUC) of 92% and a precision of 98%. The business recommendation was to issue credit cards only to new applicants who had a good credit score and made frequent transactions of higher amounts.

Technology used: Python

Click here to find the source code

Hilton Customer Journey Analytics (Adobe Analytics Challenge)

Adobe Analytics

Business Objective : Develop strategies to enhance customer acquisition for Hilton hotels by identifying and optimizing key performance indicators (KPIs) on digital booking platforms through data visualisation and storytelling.

Dataset : Website traffic clickstream data for Hilton

Approach :

  • Conducted in-depth analysis on the digital platforms that garnered the highest web traffic and bookings.

  • Compared the membership distributions across these different digital streams.

  • Analyzed the conversion rates for each digital platform.
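The analysis itself was done in Adobe Analytics. Outside that tool, the per-platform conversion rate (bookings divided by visits) and the membership share could be reproduced from a clickstream export, as sketched below. This is a minimal pandas sketch. The `platform`, `member` and `booked` columns and the rows are hypothetical.

```python
import pandas as pd

# Hypothetical clickstream extract: one row per visit, with the digital platform
# it came through, whether the visitor was a member, and whether it ended in a booking.
visits = pd.DataFrame({
    "platform": ["web", "web", "web", "app", "app", "partner", "partner", "partner"],
    "member": [True, False, True, True, True, False, True, False],
    "booked": [1, 0, 1, 1, 0, 0, 1, 0],
})

summary = visits.groupby("platform").agg(
    visits=("booked", "size"),       # traffic per platform
    bookings=("booked", "sum"),      # completed bookings
    member_share=("member", "mean"), # fraction of visits from members
)
summary["conversion_rate"] = summary["bookings"] / summary["visits"]
print(summary.sort_values("conversion_rate", ascending=False))
```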

Result: Found several insights. Direct website bookings drive most of the traffic. The membership program contributes most of the bookings, so expanding the membership plan could increase sales. Some corporate booking partners are extremely valuable sources, contributing 12% of bookings and 28% of digital traffic.


Technology used: Adobe Analytics

Click here to find the visualizations

Airbnb Customer Sentiment Analysis

Linear Regression

Business Objective : Analyze the effects of various factors on customer sentiment and recommend ways to improve the overall customer sentiment score.

Dataset : A listings dataset containing 5,000 Airbnb properties with 16 characteristics per listing, and a reviews dataset containing 150k customer reviews, each with a sentiment score for a specific property.

Modelling Approach : I performed Exploratory Data Analysis to uncover critical information. I then ran a linear regression across all variables to see the weight of each property characteristic on customer sentiment. This baseline model was basic and needed refinement, so I eliminated the variables that were not statistically significant and checked the correlation between the remaining variables.
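Dropping insignificant variables can be sketched as backward elimination on an ordinary least squares fit. The procedure refits the model, removes the variable with the smallest |t| statistic, and repeats until every remaining variable clears a threshold. This minimal NumPy sketch uses |t| ≥ 2 as a rough stand-in for p < 0.05. The write-up does not state the exact rule used. The function names, feature names and synthetic data are hypothetical.

```python
import numpy as np

def ols_t_stats(X, y):
    """Fit OLS with an intercept; return coefficients and their t-statistics."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    dof = len(y) - X1.shape[1]
    sigma2 = resid @ resid / dof                     # residual variance
    cov = sigma2 * np.linalg.inv(X1.T @ X1)          # coefficient covariance
    return beta, beta / np.sqrt(np.diag(cov))

def backward_eliminate(X, y, names, t_crit=2.0):
    """Repeatedly drop the least significant feature until every remaining
    feature has |t| >= t_crit (roughly p < 0.05 for large samples)."""
    keep = list(range(X.shape[1]))
    while keep:
        _, t = ols_t_stats(X[:, keep], y)
        t = np.abs(t[1:])  # ignore the intercept
        worst = int(np.argmin(t))
        if t[worst] >= t_crit:
            break
        keep.pop(worst)
    return [names[i] for i in keep]

# Synthetic example (not the Airbnb data): two real effects plus pure noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = 4 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=500)
print(backward_eliminate(X, y, ["feature_a", "feature_b", "pure_noise"]))
```

In practice, `statsmodels` reports exact p-values through `sm.OLS(y, X).fit().pvalues`. Checking correlations between the survivors, for example with `np.corrcoef`, guards against keeping two variables that carry the same signal.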

Result : Discovered that one of the most critical factors for improved customer sentiment is whether the host is a Superhost: with other factors held constant, customer sentiment increases by 0.5% when the host is a Superhost. The model's accuracy improved by 40% over the baseline version. The cancellation policy also has a major effect on customer sentiment, as customers prefer a lenient cancellation policy.


Technology used: Python

Click here to find the source code

Google Play Store

Exploratory Analysis

Business Objective : Analyze the trends across apps on Google Play and gather interesting insights from their downloads.

Dataset : Download information for various apps on Google Play

Approach : Examined different industries to find the star performer in each. Then, I compared these apps' ratings with their number of downloads to see which apps users actually prefer. Finally, I analyzed the effect of in-app purchases on the downloads and ratings of these apps.
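The per-category comparison can be sketched in SQL, run here through Python's sqlite3. This is a minimal sketch, not the project's query. The `apps` table, its columns and the made-up rows are hypothetical, and "star performer" is assumed to mean the app with the most installs in its category.

```python
import sqlite3

# Hypothetical table and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE apps (app TEXT, category TEXT, rating REAL, installs INTEGER, has_iap INTEGER);
INSERT INTO apps VALUES
  ('Alpha Learn', 'EDUCATION',   4.6,  5000000, 1),
  ('Beta Quiz',   'EDUCATION',   4.8,   500000, 0),
  ('Gamma Cam',   'PHOTOGRAPHY', 4.1,  1000000, 0),
  ('Delta Edit',  'PHOTOGRAPHY', 4.5, 10000000, 1);
""")

# Star performer per category: the app with the most installs,
# shown next to its rating to compare popularity with user satisfaction.
top = conn.execute("""
SELECT category, app, rating, installs
FROM apps AS a
WHERE installs = (SELECT MAX(installs) FROM apps AS b WHERE b.category = a.category)
ORDER BY category
""").fetchall()

# Average installs and rating for apps with and without in-app purchases.
iap = conn.execute("""
SELECT has_iap, AVG(installs), AVG(rating)
FROM apps
GROUP BY has_iap
ORDER BY has_iap
""").fetchall()

for row in top:
    print(row)
for row in iap:
    print(row)
```

The correlated subquery is portable SQL. On engines with window functions, `ROW_NUMBER() OVER (PARTITION BY category ORDER BY installs DESC)` is an alternative.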

Result : Apps with ads showed 60% higher customer attraction. Apps with in-app purchases attract a specific customer segment and have higher downloads. The education category is the best performer across all categories and has a 55% presence on the Play Store.


Technology used: SQL, Tableau

Click here to find the source code

Health Analysis

Logit Regression, Classification Trees, K-means Clustering, Hierarchical Clustering

Business Objective : Predict the likelihood of developing diabetes, hypertension or stroke. Additionally, identify groups of patients who could receive similar medication and care.

Dataset : 70,692 responses from the CDC's Behavioral Risk Factor Surveillance System survey in 2015

Modelling Approach : I trained the logistic regression model on the training dataset using only the significant variables, excluding the insignificant ones to reduce noise in the model. I then grouped patients into 4 categories based on their similarities, using classification trees.

Result : The model predicted disease outcomes with an accuracy 25% higher than the baseline model. Four patient groups were created based on their characteristics, so that specialized groups could receive intensive care.


Technology used: R

Click here to find the source code
