Discovering Spontaneous Social Events Project
Existing methods for event detection using social media are limited in several important ways:
they handle only a restricted number and range of event types,
they rely mainly on text data, and they lack tools for putting
the results to uses that directly benefit users. To remedy these weaknesses,
the main goals of this project are
to automatically discover
"spontaneous social events" using Bayesian nonparametric modeling of both text and images,
and to use the discoveries to foster new social links.
The central hypothesis of this project is that there exist latent categories
of social events, that these event categories possess some
set of attributes, and that these attributes are at least partially represented
in the text messages and photos taken of an event. The project will develop new
attribute-based recognition techniques in computer vision and
new Bayesian nonparametric models in machine learning.
A generic system capable of discovering arbitrary spontaneous social events will be built
and, at the same time, applied to an example problem: detecting wildlife disease from tweets
containing both text and images.
A user interface will also be developed that lets users conveniently browse spontaneous
events similar to their recent tweets, and that enables a user to find others who have
The project is likely to have a number of important broader impacts.
First, development of novel attribute-based, nonparametric Bayesian models for
discovering spontaneous social events
combined with a user interface to aid users in building their social groups has many potential
Second, interdisciplinary collaboration with researchers in environmental studies to develop a prototype
application for wildlife disease outbreak detection.
Third, involvement of undergraduate students
in the research project under the
auspices of the Undergraduate Research Scholars Program and through independent study courses.
Fourth, topics associated with this project will be added to existing computer science courses taught
by the investigators, including
introduction to artificial intelligence, computer vision, and statistical machine learning.
Fifth, presentation of research results to groups outside computer science
who are interested in new methods for improving the utility of social media for
spontaneous event detection and fostering new social connections.
We have been collecting text and image data from Twitter daily since October 2011 for use in experiments,
at a rate of about 100 GB per month. As a simple domain for developing our methods, we have chosen the task of detecting
"snow events," i.e., times and places in the U.S. where there was significant snowfall, and of using information extracted from the text
and images to infer snow depth. Tweets containing text, images, and GPS data are used, and ground-truth snow depth data is obtained from
nearby weather stations.
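The pairing of a geotagged tweet with ground truth from a nearby weather station can be sketched as follows. This is only an illustration under stated assumptions, not the project's actual pipeline: the `Station` record, the `nearest_station` helper, and the 50 km cutoff are all hypothetical, and great-circle (haversine) distance is one plausible way to define "nearby."

```python
import math
from typing import Iterable, NamedTuple, Optional

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


class Station(NamedTuple):
    """Hypothetical weather-station record (not from the project's data)."""
    station_id: str
    lat: float            # degrees
    lon: float            # degrees
    snow_depth_cm: float  # ground-truth reading


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to guard against rounding slightly above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_station(lat: float, lon: float,
                    stations: Iterable[Station],
                    max_km: float = 50.0) -> Optional[Station]:
    """Return the closest station within max_km of the tweet, or None.

    The max_km cutoff is an assumed parameter for illustration only.
    """
    best, best_dist = None, max_km
    for station in stations:
        dist = haversine_km(lat, lon, station.lat, station.lon)
        if dist <= best_dist:
            best, best_dist = station, dist
    return best
```

A tweet with GPS coordinates would then be labeled with `nearest_station(lat, lon, stations).snow_depth_cm` when a station is found, and dropped from the ground-truth set when the function returns `None`.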
Yimin Tan has been developing novel machine learning methods for using text and image features for this task. Specifically, he is developing a
nonhomogeneous Poisson process as a generative model for inferring snow depth. Chunhui Zhu has been exploring techniques for classifying
images as containing a significant amount of snow or not. Experiments have been conducted on a Twitter dataset of over
10,000 non-snow images and 1,600 snow images collected from January to April 2012. Over 100,000 CPU hours have been used so
far to compute and evaluate a large variety of image features.
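For readers unfamiliar with the nonhomogeneous Poisson process mentioned above, the sketch below simulates event times whose rate varies over time, using the standard thinning method of Lewis and Shedler. It is a generic illustration, not the snow-depth model under development: the function names, the sinusoidal `example_rate`, and all numeric parameters are assumptions made for the example.

```python
import math
import random
from typing import Callable, List


def sample_nhpp(rate: Callable[[float], float], rate_max: float,
                t_end: float, rng: random.Random) -> List[float]:
    """Sample event times on [0, t_end) from a nonhomogeneous Poisson process.

    Thinning: draw candidate times from a homogeneous process with rate
    rate_max, then keep each candidate t with probability rate(t) / rate_max.
    Requires 0 <= rate(t) <= rate_max on the whole interval.
    """
    if rate_max <= 0:
        raise ValueError("rate_max must be positive")
    times: List[float] = []
    t = 0.0
    while True:
        t += rng.expovariate(rate_max)  # gap to the next candidate event
        if t >= t_end:
            return times
        lam = rate(t)
        if lam < 0 or lam > rate_max:
            raise ValueError("rate(t) must lie in [0, rate_max]")
        if rng.random() * rate_max < lam:  # accept with prob. lam / rate_max
            times.append(t)


def example_rate(t: float) -> float:
    """Hypothetical intensity (events per hour) with a 24-hour cycle."""
    return 5.0 + 4.0 * math.sin(2 * math.pi * t / 24.0)


if __name__ == "__main__":
    events = sample_nhpp(example_rate, rate_max=9.0, t_end=24.0,
                         rng=random.Random(0))
    print(len(events), "events in 24 hours")
```

In a generative model of this kind, the rate function would depend on the quantity being inferred (here, presumably snow depth) rather than on a fixed sinusoid; how the project parameterizes that dependence is not stated in this summary.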
S. Mei, H. Li, J. Fan, X. Zhu, and C. R. Dyer, Inferring air pollution by sniffing social media, Proc. IEEE/ACM International Conference on Advances in Social Network Analysis and Mining (ASONAM), 2014.
J. Xu, H. Huang, A. Bellmore, and X. Zhu, School bullying in Twitter and Weibo: A comparative study, Proc. 8th International AAAI Conference on Weblogs and Social Media (ICWSM), 2014.
J. Xu, B. Burchfiel, X. Zhu, and A. Bellmore, An examination of regret in bullying tweets, Proc. North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT), 2013.
J. Xu, A. Bhargava, R. Nowak, and X. Zhu, Socioscope: Spatio-temporal signal recovery from social media, Proc. 23rd International Joint Conference on Artificial Intelligence (IJCAI), 2013.
X. Zhu, Persistent homology: An introduction and a new text representation for natural language processing, Proc. 23rd International Joint Conference on Artificial Intelligence (IJCAI), 2013.
J. Rosin, C. R. Dyer, and X. Zhu,
The multimodal focused attribute model: A nonparametric Bayesian approach to simultaneous object classification and attribute discovery,
Computer Sciences Department Technical Report 1697, University of Wisconsin-Madison, January 2012.
M. Maynord, J. Tiachunpun, X. Zhu, C. R. Dyer, K.-S. Jun, and J. Rosin,
An image-to-speech iPad app,
Computer Sciences Department Technical Report 1774, University of Wisconsin-Madison, July 2012.
This project is based on work partially supported by the National Science
Foundation under Grant No. IIS-1148012.
Any opinions, findings, and conclusions or
recommendations expressed in this material are those of the authors and do not
necessarily reflect the views of the National Science Foundation.