Automatic Location Prediction of Depressed Teens using Data Extraction from Twitter Posts

The Problem

The United States has entered the era of social networking: a 2009 study reported that 73% of teens between the ages of 12 and 17 are members of a social networking site [9]. This new outlet for teens has increased the amount of data that is accessible to the public. At the same time, depression and suicide are major problems among teenagers in the United States. A 2011 study from the Pew Research Center found that 28% of adolescents show signs of depression, and that 8% show signs of severe depression. Depression can develop into thoughts of self-harm or progress as far as ending one's life. As recently as 2007, 14.5% of students in grades 9-12 seriously considered suicide, and 47.9% of those teens made at least one suicide attempt [3].

Social media use has become a constant part of most teens' daily lives. The Pew Research Center reports that as of 2011, eight in ten teens use social networking sites. In 2011, 16% of teens used Twitter, double the share found in a late 2009 survey [11]. As social media use expands, a vast amount of data becomes available. Current scholarship has not taken up the responsibility of monitoring and analyzing social media, but I believe that techniques from computer science, combined with linear modeling, can answer the question: how do at-risk individuals interact with Twitter posts, and how can these individuals be identified?

A Twitter post is limited to 140 characters and contains a statement from an individual user. Users can follow and be followed by other Twitter users. Each user sees the posts of the users they follow, and their followers see theirs. Additionally, posts can be tagged with latitude/longitude coordinates.

I have seen the negative effects of depression firsthand. My hometown of Williamsburg, VA, has suffered numerous suicides among adolescents. Additionally, my older sister was diagnosed with clinical depression 10 years ago. I have seen the negative effects depression has had on her life, and I want to be able to identify and send help to at-risk teens at the earliest signs of depressive tendencies. Because of the large population of at-risk people and the increasing number of Twitter users, I want to use natural language processing methods from computer science, together with linear modeling techniques, to identify these individuals.

Plan of Action

I will begin my research by creating a subset of tweets believed to be indicative of depressive and suicidal thoughts, based on studied characteristics of depressed and suicidal teens [7][8]. Semantic role labeling (SRL) [5] and latent Dirichlet allocation (LDA) [1][2], a topic model that extracts groups of trigger words from a document or tweet, have been used effectively for crime prediction by analyzing tweets [12]. With the subset of tweets I establish, I will apply SRL and LDA modeling techniques to predict hotspots for depressed and suicidal teens.

This model will then be applied to a large population, such as that of Chicago. Chicago is an ideal city because of its large population of nearly 2.7 million, its variety of social classes, and its racial diversity [4]. These factors will allow analysis of teens' socioeconomic and demographic backgrounds in relation to their symptoms. I will use the R statistical programming language, S-Plus, and SAS to analyze the collected data [13]. These software packages can fit a generalized additive model, which can help uncover underlying factors of depression and predict future incidents [13]. For geographic data management and visual modeling, I will use a toolkit programmed in Visual C# and PostGIS [10].

This spring semester I will begin studying the symptoms of depression among teens in order to create tweet subsets that are indicative of at-risk teens. I will also collect and analyze the spatial, demographic, and socioeconomic features of specific locations throughout Chicago to build an accurate model to identify depressed and suicidal teens. This summer in Charlottesville, I will have access to many papers published by Professor Gerber, Professor Brown, and Ph.D. candidate Matt Huddleston, available through the Systems Engineering Department and at Brown Library. With the data I accumulate, I will apply a mixture of the modeling techniques from my previous research to build an accurate model of depressed teens within Chicago. After this summer, depending on the progress made, this project can grow into a capstone project within the Systems Engineering Department, where further progress can be made.
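
To make the hotspot idea concrete, here is a minimal sketch in Python of the simplest possible version of the pipeline: flag tweets against a phrase list, then count flagged tweets per geographic grid cell. It is not the SRL/LDA or generalized additive model approach planned above; a plain keyword match stands in for those techniques. The phrase list, the tweet dictionary layout (`text`, `lat`, `lon`), the cell size, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical lexicon: placeholder phrases, not a validated clinical list.
RISK_TERMS = {"hopeless", "worthless", "can't go on", "hate myself"}

# Grid cell size in degrees (assumption; 0.01 degrees is roughly 1.1 km north-south).
CELL_DEG = 0.01


def is_flagged(text, terms=RISK_TERMS):
    """Return True if the tweet text contains any lexicon phrase (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def grid_cell(lat, lon, cell_deg=CELL_DEG):
    """Map a coordinate to the integer index of the grid cell that contains it."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def hotspot_counts(tweets, cell_deg=CELL_DEG):
    """Count flagged tweets per grid cell; tweets without coordinates are skipped."""
    counts = Counter()
    for tweet in tweets:
        if tweet.get("lat") is None or tweet.get("lon") is None:
            continue
        if is_flagged(tweet["text"]):
            counts[grid_cell(tweet["lat"], tweet["lon"], cell_deg)] += 1
    return counts
```

Calling `hotspot_counts(tweets).most_common(10)` would list the ten densest cells. In a fuller model, these per-cell counts could serve as the response variable, with the spatial, demographic, and socioeconomic features of each cell as predictors.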
