Hi, I am Shuang Wu. I received an M.S. in Computer Science and an M.S. in Economics from the University of Wisconsin-Madison. My interests are RDBMS, NoSQL databases, data mining, and web MVC frameworks. I am currently seeking a full-time New Graduate Software Engineer (Backend) position starting July 2016.
Expertise in:
➢ Setting up Hadoop Distributed File System and Apache Spark clusters
➢ Using HBase, MongoDB, Redis and MySQL for different storage purposes
➢ Programming for ETL and applying machine learning analysis
➢ Running Spring + Spring MVC + MyBatis
University of Wisconsin-Madison | Master of Science (M.S.) Computer Science | 2014 - 2016 |
---|---|---|
Spring 2016 | CS 838: Foundations of Data Management | Prof. Paris Koutris |
Fall 2015 | CS 784: Data Models and Languages | Prof. AnHai Doan |
Fall 2015 | CS 764: Advanced Database Mgt Systems | Prof. Jeff Naughton |
Fall 2015 | CS 760: Machine Learning | Prof. Mark Craven |
Spring 2015 | CS 640: Computer Networks | Aaron Gember-Jacobson |
Spring 2015 | CS 564: Database Mgt Systems | Prof. AnHai Doan |
Spring 2015 | CS 537: Operating Systems | Perry Kivolowitz |
Fall 2014 | CS 642: Information Security | Prof. Thomas Ristenpart |
Fall 2014 | CS 545: Natural Lang & Computing | Prof. Ben Snyder |
Fall 2014 | CS 770: Human-Computer Interaction | Prof. Bilge Mutlu |
University of Wisconsin-Madison | Master of Science (M.S.) Economics | 2013 - 2015 |
---|---|---|
Fall 2014 | ECON 311: Econometrics III | Prof. Taber, Christopher R. |
Spring 2014 | ECON 708: Microeconomics II | Prof. Smith, Lones |
Spring 2014 | ECON 705: Econometrics II | Prof. Porter, Jack R. |
Spring 2014 | ECON 312: Intmed Macroecon-Adv Treatment | Prof. Williams, Noah M. |
Fall 2013 | ECON 606: Mathematical Economics II & Computing | Prof. Hansen, David R. |
Fall 2013 | ECON 311: Intmed Microecon-Adv Treatment | Prof. Deneckere, Raymond J. |
Realtime Social Sentiment Analysis App (ongoing)
➢ Collected geo-tagged tweets in real time using the Twitter Streaming APIs
➢ Evaluated tweet sentiment with a spout-bolt topology pipeline in Storm
➢ Stored the results in HBase, MongoDB, Redis and MySQL for different purposes
➢ Visualized real-time statistical results on a website built with Spring + Spring MVC
Personalizing Yelp Star Rating (2016)
➢ Mined quality phrases from a massive dataset using NumPy, Pandas, NLTK and R
➢ Reduced recommendation time by 95% by caching raw RDDs on a Spark cluster
➢ Increased matching F1 score by 5% by implementing a Belief Propagation algorithm
Database System for Madison Foodie (2015)
➢ Implemented a search database system covering 2,000+ restaurants using Oracle MySQL
➢ Developed web pages for queries, reports and administration using Java and JDBC
ETL & Data Matching for eBooks & iTunes (2015)
➢ Built an optimized, scalable and maintainable Python crawler with Scrapy
➢ Cleaned, integrated, reduced and transformed data using TF/DF and the Jaccard Index
➢ Boosted the precision from 75% to 98% while improving the recall from 66% to 92%
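For readers unfamiliar with the Jaccard Index used in the matching step, here is a minimal Python sketch. It is an illustration only, not the project's code: the function name `jaccard`, the whitespace tokenization, and the sample titles are all assumptions.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard index of the lowercase word sets of two strings."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        # Convention chosen for this sketch: two empty strings count as identical.
        return 1.0
    # Size of the intersection divided by size of the union.
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical titles: two candidate records that might refer to the same book.
score = jaccard("The Great Gatsby", "great gatsby the")
```

Two records could then be treated as a match when the score exceeds a chosen threshold; the threshold itself would have to be tuned on labeled pairs.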
Traffic Congestion Classification (2015)
➢ Compared the precision of support vector machines with different kernel functions and neural networks with different hidden units and output functions using 3-fold cross-validation
➢ Applied Bagging (Bootstrap Aggregation) to the SVM and neural network selected in the first step, and evaluated the algorithms with a confusion matrix over three congestion levels
➢ Investigated how predictive accuracy varies as a function of training-set size. Experiments show that ensemble methods outperform single classifiers, and Bagging-SVM with an RBF kernel outperforms all other classifiers on this task
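The Bagging idea can be sketched in a few lines of Python. This is a simplified illustration, not the project's implementation: a toy nearest-centroid learner on one-dimensional features stands in for the SVM and neural network, and the names `train_centroid` and `bagging`, the seed, and the sample data are all hypothetical.

```python
import random
from collections import Counter


def train_centroid(samples):
    """Toy base learner (stand-in for an SVM): per-class mean of 1-D features."""
    sums, counts = {}, Counter()
    for x, label in samples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] += 1
    centroids = {label: sums[label] / counts[label] for label in sums}
    # Predict the class whose centroid is closest to x.
    return lambda x: min(centroids, key=lambda label: abs(x - centroids[label]))


def bagging(samples, n_models=25, seed=0):
    """Train base learners on bootstrap resamples; predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(samples) items with replacement.
        boot = [rng.choice(samples) for _ in samples]
        models.append(train_centroid(boot))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Hypothetical data: (feature value, congestion level).
data = [(1.0, "low"), (1.5, "low"), (2.0, "low"),
        (5.0, "medium"), (5.5, "medium"), (6.0, "medium"),
        (9.0, "high"), (9.5, "high"), (10.0, "high")]
predict = bagging(data)
```

Because each learner sees a different resample, individual errors tend to cancel out in the vote, which is the usual intuition for why an ensemble can outperform a single classifier.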
Gaze Pattern Detection (2014)
➢ Mounted a mobile eye-tracking platform to monitor student attention and restore it when a student is distracted
➢ Used high-order polynomial regression to map pupil coordinates to target coordinates, a running average to obtain a robust estimate of focus, and nonverbal cues to recapture diminishing attention
➢ Findings show the platform significantly improved students' learning performance, by 48% in the experiment
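The running-average smoothing of gaze estimates can be illustrated with a short Python sketch. It is an assumption-laden example rather than the project's code: the function name `running_average`, the window size, and the sample points are hypothetical.

```python
from collections import deque


def running_average(points, window=3):
    """Yield the mean of the last `window` (x, y) gaze points seen so far."""
    recent = deque(maxlen=window)  # old points drop out automatically
    for point in points:
        recent.append(point)
        n = len(recent)
        yield (sum(p[0] for p in recent) / n, sum(p[1] for p in recent) / n)


# Hypothetical gaze samples in screen coordinates.
smoothed = list(running_average([(0, 0), (2, 2), (4, 4), (6, 6)], window=2))
```

A larger window gives a steadier focus estimate at the cost of reacting more slowly when attention actually shifts.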