Predicting mental health issues from
naturally occurring social and health-tracking data

Download the data extractor!
Report an issue

What if we could predict mood and improve mental health
without disrupting a user's life?

The Sochiatrist Social Data Extractor consists of a series of programs that allow you to get
an individual's private and public social media data, and store it in an easy to analyze CSV format.

Using the data extraction tool

Note: This software only works on Macs running macOS.

Step 1:

Make sure that you've downloaded the linked zip file above, and that you've extracted it to a folder entitled Sochiatrist on your Desktop. Inside this folder, there should be a folder entitled sochiatrist-social-data-extractor.

Step 2:

Open the Terminal app and run the following command:


python ~/Desktop/Sochiatrist/sochiatrist-social-data-extractor/ [participant ID]

Replace [participant ID] with the name or ID of the participant whose data you are extracting. The program automatically installs the requirements it needs; enter your computer password when it prompts you for it. It then asks a series of questions about what kind of data you would like to extract from the participant (e.g., what kind of phone, what kinds of social media data) and generates a separate CSV file for each social media type.

Step 3:

In the Terminal app, run the following command:


python ~/Desktop/Sochiatrist/sochiatrist-social-data-extractor/ [name of output file]

Replace [name of output file] with the name you'd like for the output CSV file (for example, July25th-August10th-Participant20). This program creates a file containing all of the messages from your selected date range, ordered by when they were sent. If you need data from multiple date ranges, repeat this step for each one. The program will also run the anonymizer script if you want to anonymize the messages.
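
To make the merge step more concrete, here is a minimal Python sketch of combining per-platform CSVs, keeping only messages inside a date range, and sorting them by send time. It is an illustration, not the extractor's actual code: the column names (timestamp, source, sender, text), the ISO timestamp format, and the function name merge_messages are all assumptions made for this example.

```python
import csv
import io
from datetime import datetime


def merge_messages(csv_texts, start, end):
    """Collect rows from several CSV texts, keep those sent within
    [start, end], and return them sorted by send time."""
    rows = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            # Assumed column: "timestamp" in ISO 8601 format.
            sent = datetime.fromisoformat(row["timestamp"])
            if start <= sent <= end:
                rows.append(row)
    rows.sort(key=lambda r: datetime.fromisoformat(r["timestamp"]))
    return rows


# Tiny in-memory stand-ins for two per-platform CSV files (made-up data).
sms_csv = (
    "timestamp,source,sender,text\n"
    "2017-07-26T09:15:00,sms,alice,good morning\n"
    "2017-08-12T10:00:00,sms,alice,outside the range\n"
)
facebook_csv = (
    "timestamp,source,sender,text\n"
    "2017-07-25T18:30:00,facebook,bob,see you tomorrow\n"
)

merged = merge_messages(
    [sms_csv, facebook_csv],
    start=datetime(2017, 7, 25),
    end=datetime(2017, 8, 10, 23, 59, 59),
)
for row in merged:
    print(row["timestamp"], row["source"], row["text"])
```

Repeating the call with a different start and end corresponds to repeating this step for another date range.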

If you have extracted Facebook data, the anonymization script will use the Facebook friends list to anonymize names within each message (e.g., "Jeff and I went to the mall" would become "038d017c and I went to the mall"). If you have not extracted Facebook data, the sender and the receiver of the message will be anonymized, but not names within messages.
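
The idea of replacing names from a friends list with hashed tokens can be sketched in Python as follows. This is a hedged illustration only: the hashing scheme the anonymizer actually uses is not described here, so the salted SHA-256 digest truncated to 8 hex characters, the salt value, and the helper names pseudonym and anonymize_names are assumptions chosen to resemble the example token above.

```python
import hashlib
import re


def pseudonym(name, salt="example-salt"):
    """Return a short, stable hex token for a name.
    Assumption: salted SHA-256, truncated to 8 hex characters."""
    digest = hashlib.sha256((salt + name.lower()).encode("utf-8")).hexdigest()
    return digest[:8]


def anonymize_names(message, known_names):
    """Replace each whole-word occurrence of a known name with its token."""
    if not known_names:
        # Without a friends list there is nothing to match against.
        return message
    # Try longer names first so a full name wins over a first name alone.
    ordered = sorted(known_names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: pseudonym(m.group(0)), message)


friends = ["Jeff", "Maria Lopez"]  # hypothetical friends list
print(anonymize_names("Jeff and I went to the mall", friends))
```

Because the same name always maps to the same token, conversations stay linkable across messages without exposing the name. A short truncated hash is a pseudonym rather than a strong guarantee of anonymity, so the salt should be kept private in any real use.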

Step 4:

Delete all participant data. This might include the iPhone backup, the zip file and folder of Facebook data (which will be in your Downloads folder), and the unanonymized CSVs from each individual social media type in your CSVs folder. After you've cleaned up all unneeded participant data, you're done!

If you have any issues, please click here, and fill out the Google Form. We'll get back to you ASAP!

Output Data Format

After completing the data extraction process, you can expect to find all messages in a CSV entitled anonymized_[name of participant].csv. The data will look like this.

[example picture of row of data here]

As you can see above, all numbers are replaced with a pound sign (#), all names are replaced with hashed representations of the name (e.g., 7ac988b056981), and all conversation participants are also anonymized in the fourth column.
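
As a rough illustration of the number replacement, the Python sketch below swaps each run of digits for a single pound sign. Whether the tool replaces whole numbers or individual digits is not specified here, so the one-sign-per-run behavior and the function name scrub_numbers are assumptions.

```python
import re


def scrub_numbers(text):
    """Replace each run of consecutive digits with a single '#'."""
    return re.sub(r"\d+", "#", text)


print(scrub_numbers("Call me at 555-0142 around 7"))
# prints: Call me at #-# around #
```

Hashed name tokens such as 7ac988b056981 contain digits themselves, so a scrubber like this would have to run before names are replaced; otherwise it would damage the tokens.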

Compatible Apps

For Android and iPhone

iMessages and Text Messages

Facebook (Messages and Timeline Posts)

Twitter (Tweets and Direct Messages)

Instagram (Direct Messages and Posts)

WhatsApp Messages

Kik Messages

Development Team


This research is supported by NIH grants R21 HD088739-01, R01 MH108641-01A1, R01 MH110379-01A1, and R01 MH105379-02S1. It is done in collaboration with teams at Rhode Island Hospital, led by Nicole Nugent, Megan Ranney, and Daniel Dickstein.

Would you like to participate in our study?

Sign me up!

Our next Sochiatrist study will start soon!