Java Web Service »
Full Stack Java Project (Event Search and Ticket Recommendation Web Service)
- Developed an interactive web page for users to search for events and purchase tickets. Designed a content-based recommendation algorithm that matches similar events by category
- Built Java servlets with RESTful APIs and Apache Tomcat to handle HTTP requests and responses
- Parsed business data from the Yelp API (JSON) to implement business recommendations. Created relational and NoSQL databases (MySQL, MongoDB) to store user preferences and event data from the Ticketmaster API
- Deployed the server on AWS, where it handled 150 queries per second in load tests run with Apache JMeter
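A minimal sketch of category-based matching, written in Python for brevity rather than in the project's Java. It is not the project's code: the event dictionaries, the `recommend` function and the scoring rule (count how often each category appears among a user's favorites) are all illustrative assumptions.

```python
from collections import Counter


def recommend(favorites, candidates, top_n=3):
    """Rank candidate events by how often their categories appear in the favorites."""
    # Weight each category by the number of favorited events that carry it.
    weights = Counter(cat for event in favorites for cat in event["categories"])
    favorite_ids = {event["id"] for event in favorites}

    scored = []
    for event in candidates:
        if event["id"] in favorite_ids:
            continue  # skip events the user has already saved
        score = sum(weights[cat] for cat in event["categories"])
        if score > 0:
            scored.append((score, event["id"]))

    # Highest score first; ties are broken by id so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [event_id for _, event_id in scored[:top_n]]


# Hypothetical data, for illustration only.
favorites = [
    {"id": "e1", "categories": ["Music", "Rock"]},
    {"id": "e2", "categories": ["Music", "Jazz"]},
]
candidates = [
    {"id": "e1", "categories": ["Music", "Rock"]},
    {"id": "e3", "categories": ["Sports"]},
    {"id": "e4", "categories": ["Music", "Rock"]},
    {"id": "e5", "categories": ["Jazz"]},
]
print(recommend(favorites, candidates))
```

Events that share no category with the favorites are dropped instead of being ranked last, so the list can be shorter than `top_n`.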
Credit Card Fraud Detection »
- Preprocessed a raw dataset of over 12,000 customers and 15 features through data cleaning, categorical variable encoding and standardization. Performed exploratory data analysis in R (plotly, ggplot2 and dplyr).
- Used Python to conduct feature selection and to rebalance the highly imbalanced transaction-level data using SMOTE.
- Applied machine learning models (SVM, Logistic Regression and Random Forest) with stacking to predict the probability of customer fraud. Evaluated model performance using learning curves.
- Applied feature selection approaches (Permutation Test and Lasso Regularization) to identify important features for different models. Designed an automatic fraud detection system to provide recommendations with over 88% accuracy.
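To illustrate the SMOTE step, here is a simplified standard-library sketch of the core idea: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbors. The function name, the parameters `k` and `seed`, and the toy points are assumptions for illustration; the project itself presumably relied on an existing implementation.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority samples by interpolating toward near neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority neighbors of the base point, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Hypothetical minority-class points (two features each).
fraud = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(fraud, n_synthetic=6)
print(len(new_points))
```

Because every new point is a convex combination of two existing minority points, the synthetic samples stay inside the region the minority class already occupies.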
Bittiger Boot Camp »
E-commerce revenue optimization
- Developed a marketing program targeting dormant one-time buyers on the platform to incentivize them to purchase again,
improving program efficiency with machine learning models by at least 80% over the baseline methodology in
backtesting, with over 2 GB of training data processed on a single machine
- Implemented classification models including Lasso logistic regression and Random Forest, tuned with cross-validation
- Conducted data import, cleaning and preprocessing, and model tuning and optimization in R
iOS store monetization experiment design
- Identified a potential monetization opportunity to improve buyer conversion by 3x. Discovered a strong correlation between
user conversion and form of payment through exploratory analysis of longitudinal user data in Python.
- Built an interactive, scalable Python dashboard to measure the impact of an A/B test in the store purchase flow by drawing jackknife
confidence intervals and calculating statistical significance.
- Recommended running an experiment to incentivize users to purchase gift cards; presented this recommendation and
demoed the Python dashboard to an audience of 30 people, including 6 capstone committee members.
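As an illustration of the jackknife interval mentioned above, the sketch below estimates the difference in conversion rate between two groups. It is a simplified stand-in, not the dashboard's code: the function names, the normal-approximation multiplier `z=1.96` and the toy conversion data are assumptions.

```python
import math


def jackknife_mean(values):
    """Return the sample mean and its jackknife standard error."""
    n = len(values)
    total = sum(values)
    # Leave-one-out estimates: the mean with each observation removed in turn.
    loo = [(total - v) / (n - 1) for v in values]
    loo_mean = sum(loo) / n
    se = math.sqrt((n - 1) / n * sum((est - loo_mean) ** 2 for est in loo))
    return total / n, se


def ab_difference_ci(control, treatment, z=1.96):
    """Jackknife-based confidence interval for treatment mean minus control mean."""
    mean_c, se_c = jackknife_mean(control)
    mean_t, se_t = jackknife_mean(treatment)
    diff = mean_t - mean_c
    se = math.sqrt(se_c ** 2 + se_t ** 2)  # the two groups are independent
    return diff, (diff - z * se, diff + z * se)


# Hypothetical per-user conversions (1 = purchased, 0 = did not).
control = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
treatment = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
diff, (low, high) = ab_difference_ci(control, treatment)
print(diff, low, high)
```

An interval that excludes zero suggests a significant difference at roughly the 5% level. For a plain mean the jackknife standard error equals the usual standard error of the mean, so the method is most useful for statistics without a simple closed form, such as ratios.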
Yelp Business Rating Prediction Using Sentiment Analysis »
- Completed a Yelp business review rating prediction competition project, representing the statistics department. My
team mainly used R and RStudio to build a model that predicts Yelp user ratings using sentiment analysis.
- Explored R packages such as randomForest, mice, missForest and ggplot2 to construct models with parametric imputation.
- The final model achieved 60 percent prediction accuracy.
Financial Billing Prediction Project »
This is the project I completed during my financial analyst internship at China Asset Management Co., Ltd.
Udacity Business Analyst Nanodegree »
Project 1: Problem solving using Alteryx software
- Learn data analytics techniques to create business insights, and apply them in the Alteryx analytics platform.
- Predict diamond prices
Project 0 »
- Predict catalog demand
Project 1 »
Project 2: Generate an analytical dataset using Alteryx
- Learn and apply analytic programming language SQL to input, clean, blend, and format data in preparation for analysis.
- Data cleanup
Project 2.1 »
- Predict catalog demand
Project 2.2 »
Project 3: Data visualization in Tableau: movie trend
Project 3 »
- Apply design principles, human perception, color theory, and effective storytelling with data.
Project 4: Classification models in predicting default risk:
Project 4 »
- Use classification modeling in two scenarios: binary classification models to predict binary outcomes,
and non-binary classification models to predict non-binary outcomes.
Project 5: A/B testing for new menu launch
Project 5 »
- Menu launching project that provides the foundational knowledge to design and analyze A/B tests to create business insights
and support decision making
Project 6: Time series models for forecasting video game sales
Project 6 »
- Build forecasting models for video game sales time series data. Use ETS (Error, Trend, Seasonality) models to make
forecasts, and also apply ARIMA (Autoregressive Integrated Moving Average) models.
Project 7: Segmentation and clustering: the use of K-means clustering technique in segmenting
Project 7 »
- K-means project that focuses on understanding key concepts of segmentation and clustering, such as standardization vs.
localization, distance, and scaling. Apply k-centroid clustering models, and use variable reduction and
principal components analysis (PCA) to prepare data for clustering models.
Udacity Data Analyst Nanodegree »
Project 1 Intro-to-statistics
Statistical techniques using Python to investigate a classic phenomenon from experimental psychology known as the Stroop
Effect. Python code was used to produce descriptive statistics, data visualizations and run a paired samples t-test.
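For reference, a paired-samples t statistic can be computed as in the sketch below. The reaction times are invented for illustration and are not the project's Stroop data; the function name is also an assumption.

```python
import math


def paired_t_statistic(first, second):
    """Return the paired-samples t statistic and its degrees of freedom."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    standard_error = math.sqrt(variance / n)
    return mean_diff / standard_error, n - 1


# Hypothetical reaction times in seconds for the same three participants.
congruent = [10.0, 12.0, 14.0]
incongruent = [13.0, 16.0, 17.0]
t_stat, dof = paired_t_statistic(congruent, incongruent)
print(t_stat, dof)
```

The statistic is then compared against a t distribution with `dof` degrees of freedom. The test works on per-participant differences, which is why it suits a design where each person completes both conditions.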
Project 2 Intro-to-data-analysis
Application of data wrangling, statistical analysis, machine learning, and visualization techniques to a New York City Subway
dataset. Includes a Mann-Whitney U-test to test for significant difference between the number of people who ride the
NYC subway when raining versus not raining, while a regression model was fitted to predict the hourly number of Subway
Project 3 Data-wrangling-with-mongodb
Data munging techniques using Python in order to clean OpenStreetMap data for Perth, Australia, create a .json map file,
and load that file into the MongoDB instance.
Project 4 Explore and Summarize Data with R
Exploratory analysis in R in order to examine the relationship between 11 chemical and physical properties of a sample of
white wines. Includes univariate, bivariate and multivariate analysis using the ggplot function, with a focus on identifying
properties which are correlated with the subjective quality ranking of each wine.
Project 5 Intro-to-machine-learning
A Python based predictive model which is able to identify and label Persons of Interest (POI) i.e. Enron employees who committed
fraud. Makes use of a GridSearchCV Pipeline with a StratifiedShuffleSplit cross-validation loop to select the optimal
estimation algorithm (logistic regression estimator with feature scaling using MinMaxScaling and feature selection with
Project 6 Data-visualisation-and-d3js
An interactive data visualization from a dataset of flight delay statistics for airports based within the US, created using
HTML, CSS, D3.js and dimple charts. Visualization intended to provide users the ability to easily access and interpret
time-series flight delay statistics for various airports spread across the US.
Project 7 - Design A/B Testing Experiment
Results of an A/B test that was run by Udacity in order to recommend whether or not to launch a change to the Udacity course
enrolment webpage. Involved the selection of invariant and evaluation metrics, calculation of the duration and proportion
of traffic diversion, and analysis of whether a statistically significant result was observed between the test and control
Some of My Statistics Class Projects
STAT 333 Linear Regression Class Project »
Body fat percentage Prediction
- Practiced data cleaning with a large web data set. Used R to compute various descriptive plots to analyze the data and
deal with missing values.
- Constructed a model that can efficiently predict people's body fat percentage given specific parameters like weight and
STAT 461 Financial Statistics Class Final Project
Title: Portfolio Investments On Four Stocks
STAT 456 Multivariable Data Analysis Final Project »
- Conducted financial analysis on selected stocks from four tech companies. In particular, I closely analyzed the Google stock
price and performed data transformation, model fitting, and forecasting.
- Then I considered call and put option investments based on the Black-Scholes model and the Cox-Ross-Rubinstein (CRR) model.
- Finally, I analyzed two ways of computing the weights of individual stocks for the optimal portfolio.
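The two option pricing models named above can be compared with a short sketch like the following. It prices a European option on a non-dividend-paying stock. The inputs (spot 100, strike 100, 5% rate, 20% volatility, one year) are illustrative assumptions, not values from the project.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(s, k, r, sigma, t):
    """Black-Scholes price of a European call."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def put_from_parity(call, s, k, r, t):
    """European put price obtained from put-call parity."""
    return call - s + k * math.exp(-r * t)


def crr_call(s, k, r, sigma, t, steps=500):
    """Cox-Ross-Rubinstein binomial price of a European call."""
    dt = t / steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    p = (math.exp(r * dt) - down) / (up - down)  # risk-neutral up probability
    discount = math.exp(-r * dt)
    # Payoffs at maturity, indexed by the number of up moves j.
    values = [max(s * up ** j * down ** (steps - j) - k, 0.0)
              for j in range(steps + 1)]
    # Roll back through the tree one step at a time.
    for _ in range(steps):
        values = [discount * (p * values[j + 1] + (1.0 - p) * values[j])
                  for j in range(len(values) - 1)]
    return values[0]


bs = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
tree = crr_call(100.0, 100.0, 0.05, 0.2, 1.0)
put = put_from_parity(bs, 100.0, 100.0, 0.05, 1.0)
print(round(bs, 4), round(tree, 4), round(put, 4))
```

As the number of steps grows, the CRR price converges to the Black-Scholes price, which makes the tree a useful cross-check on the closed-form value.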
Source Code »
Principal Component Analysis on two given data sets
STAT 424 Statistics Experimental Design »
Experimental design project measuring blood pressure in mice.
Kaggle Playground Competition Project »