Center for Nonlinear Studies
Thursday, December 06, 2012
2:00 PM - 2:45 PM
CNLS Conference Room (TA-3, Bldg 1690)

Postdoc Seminar

Inferring origin locations of tweets with quantitative confidence

Reid Priedhorsky
D-4 and CNLS

Twitter and other social internet systems offer a rich and voluminous stream of data that reflects the observations, mood, and knowledge of people distributed around the world. However, specific location information is missing from nearly all messages (e.g., only roughly 1% of tweets contain a geotag), which makes it very difficult to draw conclusions about specific locales. We are using the content of tweets to infer missing locations, training on the small fraction of tweets that do have a geotag. Specifically, we parse training tweets into (word, geopoint) pairs and then fit a Gaussian mixture model (GMM) to the points associated with each distinct word in the training data. The location estimate for a tweet is then a combination of the GMMs previously learned for the words in that tweet. This goes beyond prior work by offering probabilistic, geographic location estimates (rather than a single best point or suggested locale names), which we expect will be more accurate than current techniques. We also offer more robust metrics for accuracy, precision, and calibration. We expect these techniques to impact a wide variety of social internet analysis research and applications. This talk will present work in progress, so feedback, suggestions, and discussion will be greatly appreciated.
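
The pipeline described above (parse tweets into (word, geopoint) pairs, fit a density per word, combine the per-word densities for a new tweet) can be illustrated with a minimal sketch. It is not the speaker's implementation, and several parts are assumptions made only for illustration: each word gets a single diagonal-covariance Gaussian (the one-component special case of a GMM) so the example needs no external libraries, latitude and longitude are treated as planar coordinates, the per-word densities are combined as an equal-weight mixture (the abstract does not say how the combination is done), and the function names and toy training data are hypothetical.

```python
import math
from collections import defaultdict


def fit_word_gaussians(training, min_points=2, var_floor=1e-4):
    """Fit one diagonal Gaussian per word (a one-component GMM, by assumption).

    training: iterable of (tweet_text, (lat, lon)) for geotagged tweets.
    Returns a dict mapping word -> (mean, variance), each a (lat, lon) tuple.
    """
    # Step 1: parse training tweets into (word, geopoint) pairs.
    points = defaultdict(list)
    for text, geopoint in training:
        for word in text.lower().split():
            points[word].append(geopoint)

    # Step 2: fit a density to the points of each distinct word.
    models = {}
    for word, pts in points.items():
        if len(pts) < min_points:
            continue  # too few points to estimate a spread
        n = len(pts)
        mean = tuple(sum(p[i] for p in pts) / n for i in range(2))
        # Floor the variance so identical points do not give a zero-width Gaussian.
        var = tuple(
            max(sum((p[i] - mean[i]) ** 2 for p in pts) / n, var_floor)
            for i in range(2)
        )
        models[word] = (mean, var)
    return models


def gaussian_pdf(point, mean, var):
    """Density of a 2-D Gaussian with diagonal covariance at `point`."""
    density = 1.0
    for x, m, v in zip(point, mean, var):
        density *= math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
    return density


def tweet_density(text, models, point):
    """Equal-weight mixture (an assumption) of the models for the tweet's known words."""
    known = [models[w] for w in text.lower().split() if w in models]
    if not known:
        return 0.0  # no word in the tweet was seen often enough in training
    return sum(gaussian_pdf(point, m, v) for m, v in known) / len(known)


# Hypothetical toy training data: (tweet text, (lat, lon)).
training = [
    ("snow in denver", (39.7, -105.0)),
    ("denver broncos game", (39.8, -104.9)),
    ("surf in miami", (25.8, -80.2)),
    ("miami beach sun", (25.7, -80.1)),
]
models = fit_word_gaussians(training)
near_denver = tweet_density("denver tonight", models, (39.75, -104.95))
near_miami = tweet_density("denver tonight", models, (25.75, -80.15))
```

Because the result is a density over locations rather than a single point, it can be evaluated anywhere on the map; here the untagged tweet "denver tonight" scores higher near the Denver training points than near the Miami ones. A fuller treatment would fit multi-component mixtures per word and use proper geographic distance.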

Host: Kipton Barros, T-4 and CNLS