Computer Sciences Dept.

Discovering Spontaneous Social Events Project

Existing methods for detecting events from social media are limited in several important ways: they cover only a narrow range of event types and numbers, they emphasize text data, and they lack tools for putting the results to uses that directly benefit users. To remedy these weaknesses, the main goals of this project are to automatically discover "spontaneous social events" using Bayesian nonparametric modeling of both text and images, and to use the discoveries to foster new social links.

The central hypothesis of this project is that there exist latent categories of social events, that these event categories possess some set of attributes, and that these attributes are at least partially represented in the text messages and photos taken of an event. The project will develop new attribute-based recognition techniques from computer vision and new Bayesian nonparametric models from machine learning. A generic system capable of discovering arbitrary spontaneous social events will be built and, at the same time, applied to an example application: detecting wildlife disease from tweets containing both text and images. A user interface will also be developed that lets users conveniently browse spontaneous events similar to their own recent tweets and find others who have similar interests.

The project is likely to have a number of important broader impacts. First, the development of novel attribute-based, nonparametric Bayesian models for discovering spontaneous social events, combined with a user interface that helps users build their social groups, has many potential high-impact applications. Second, the project involves interdisciplinary collaboration with researchers in environmental studies to develop a prototype application for wildlife disease outbreak detection. Third, undergraduate students will be involved in the research through the Undergraduate Research Scholars Program and independent study courses. Fourth, topics associated with this project will be added to existing computer science courses taught by the investigators, including introduction to artificial intelligence, computer vision, and statistical machine learning. Fifth, research results will be presented to groups outside computer science who are interested in new methods for improving the utility of social media for spontaneous event detection and for fostering new social connections.


We have been collecting text and image data from Twitter daily since October 2011 for use in experiments, at a rate of about 100 GB per month. As a simple domain for developing our methods, we have chosen the task of detecting 'snow events,' i.e., times and places in the U.S. where there was significant snowfall, and of using information extracted from the text and images to infer snow depth. Tweets containing text, images, and GPS data are used, and ground-truth snow depth data is obtained from nearby weather stations.
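As an illustration of how a geotagged tweet might be paired with ground truth from a nearby weather station, the following is a minimal sketch and not the project's actual pipeline. It assumes each station is a dictionary with hypothetical `name`, `lat`, and `lon` keys, uses the great-circle (haversine) distance, and applies an arbitrary 50 km cutoff; none of these choices come from the project description.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(tweet_lat, tweet_lon, stations, max_km=50.0):
    """Return the closest station within max_km of the tweet, or None if there is none.

    `stations` is a list of dicts with hypothetical keys "name", "lat", "lon".
    The 50 km default cutoff is an assumption made for this sketch.
    """
    best, best_dist = None, max_km
    for station in stations:
        dist = haversine_km(tweet_lat, tweet_lon, station["lat"], station["lon"])
        if dist <= best_dist:
            best, best_dist = station, dist
    return best


# Hypothetical station list used only for illustration.
STATIONS = [
    {"name": "Madison", "lat": 43.07, "lon": -89.40},
    {"name": "Milwaukee", "lat": 43.04, "lon": -87.91},
]
```

A tweet's GPS coordinates would be passed to `nearest_station`, and the snow depth reported by the returned station on the same day could then serve as the label for that tweet. A linear scan is adequate for a small station list; a spatial index would be a natural replacement at larger scale.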

Yimin Tan has been developing novel machine learning methods that use text and image features for this task. Specifically, he is developing a nonhomogeneous Poisson process as a generative model for inferring snow depth. Chunhui Zhu has been exploring techniques for classifying whether an image contains a significant amount of snow. Experiments have been conducted using a dataset collected from Twitter with over 10,000 non-snow images and 1,600 snow images gathered from January to April 2012. Over 100,000 CPU hours have been used so far to compute and evaluate a large variety of image features.
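For readers unfamiliar with nonhomogeneous Poisson processes, the sketch below shows the general idea of an event rate that varies over time. It is not the model being developed in this project: the intensity function `snow_tweet_rate`, its parameters, and the 24-hour window are all hypothetical, and sampling uses the standard thinning (Lewis-Shedler) method, which requires an upper bound `lambda_max` on the intensity.

```python
import math
import random


def sample_nhpp(intensity, t_end, lambda_max, rng):
    """Sample event times on (0, t_end] from a nonhomogeneous Poisson process.

    Uses thinning: propose candidates from a homogeneous process with rate
    lambda_max, then keep each candidate at time t with probability
    intensity(t) / lambda_max. Requires intensity(t) <= lambda_max everywhere.
    """
    events = []
    t = 0.0
    while True:
        t += rng.expovariate(lambda_max)  # gap to the next candidate event
        if t > t_end:
            return events
        if rng.random() * lambda_max <= intensity(t):
            events.append(t)


def snow_tweet_rate(t):
    """Hypothetical rate (tweets per hour): a low baseline plus a bump near hour 12."""
    return 0.5 + 4.0 * math.exp(-((t - 12.0) ** 2) / (2.0 * 2.0 ** 2))


# The hypothetical rate peaks at 0.5 + 4.0 = 4.5, so 4.5 is a valid upper bound.
rng = random.Random(0)
events = sample_nhpp(snow_tweet_rate, t_end=24.0, lambda_max=4.5, rng=rng)
```

In a setting like the one described above, the intensity would depend on quantities of interest such as snow depth, so that observed tweet counts carry information about them. Here the rate is fixed by hand purely to show the sampling mechanics.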


  • S. Mei, H. Li, J. Fan, X. Zhu, and C. R. Dyer, Inferring air pollution by sniffing social media, Proc. IEEE/ACM International Conference on Advances in Social Network Analysis and Mining (ASONAM), 2014.
  • J. Xu, H. Huang, A. Bellmore, and X. Zhu, School bullying in Twitter and Weibo: A comparative study, Proc. 8th International AAAI Conference on Weblogs and Social Media (ICWSM), 2014.
  • J. Xu, B. Burchfiel, X. Zhu, and A. Bellmore, An examination of regret in bullying tweets, Proc. North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT), 2013.
  • J. Xu, A. Bhargava, R. Nowak, and X. Zhu, Socioscope: Spatio-temporal signal recovery from social media, Proc. 23rd International Joint Conference on Artificial Intelligence (IJCAI), 2013.
  • X. Zhu, Persistent homology: An introduction and a new text representation for natural language processing, Proc. 23rd International Joint Conference on Artificial Intelligence (IJCAI), 2013.
  • J. Rosin, C. R. Dyer, and X. Zhu, The multimodal focused attribute model: A nonparametric Bayesian approach to simultaneous object classification and attribute discovery, Department of Computer Sciences Technical Report 1697, University of Wisconsin-Madison, January 2012.
  • M. Maynord, J. Tiachunpun, X. Zhu, C. R. Dyer, K.-S. Jun, and J. Rosin, An image-to-speech iPad app, Computer Sciences Department Technical Report 1774, University of Wisconsin-Madison, July 2012.


Graduate Students

Undergraduate Students

This project is based on work partially supported by the National Science Foundation under Grant No. IIS-1148012. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
