How to detect Suicidal behaviour from your tweets!

How Machine Learning can detect Suicidal behaviour



Depression is a silent and invisible enemy that has been ignored for so long that it is time we stood together and took notice. With competitive careers and stressful jobs, we often ignore the need to balance our lives and maintain harmony of mind. According to WHO data, depression is a common illness worldwide that affects more than 264 million people. At its worst, it can lead to suicide.

We are all aware that medical science now has preventive methods for depression, but they help only when we seek help and have the necessary facilities to do so. This is a disease that affects us all, irrespective of class, colour, gender, and caste. With the pandemic hitting us hard, we are seeing a steep rise in cases of anxiety and depression. During COVID-19, monitoring mental health is of utmost importance because of the increasing number of job losses, salary cuts, and personal and financial losses. Below, we walk through the steps by which mental health can be tracked through social media platforms.

Problem Statement

How can we help detect depression or suicidal tendencies among people from their posts on social media (Twitter, Instagram)?


The goal of this project is twofold:

  1. To show how depression leads to suicidal tendencies
  2. To predict suicidal tendencies among people


Step 1 – Data Collection


The first step of this process involves collecting data from various suicide and depression forums. For our project, we used data from one such website, which is mainly a forum with separate depression and suicidal threads; it also has advice and instruction. This forum acted as a fair source of:

  • Depression posts
  • Suicidal posts
  • Advice/normal posts

It was easy for us to label the posts manually since the website already separates them into different types of thread. We scraped the website to collect the data. Various scraping tools are available; for this study, we used a Chrome plugin called Web Scraper.

Step 2 – Data Cleaning & Preparation


Once we have the raw data, it is essential to clean it by removing duplicate sentences, URLs, extra whitespace, user names, and stopwords (basically, noise that is not relevant to our study). We also removed brackets, dashes, colons, and any other symbols present in the data. The final dataset has 600 posts: 190 depression posts, 180 suicidal posts, and the remaining 230 posts that are neither suicidal nor depression posts.
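The cleaning steps above can be sketched in a few lines of Python. This is only an illustration, not the exact script we used: the function names, the tiny stopword list, the lowercasing step, and the choice to drop duplicates at the level of whole posts are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real project would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "is", "am", "are", "i", "to", "of", "in", "it", "my", "so"}

def clean_post(text):
    """Lowercase a post and strip URLs, user names, symbols and stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"@\w+", " ", text)                    # user names
    text = re.sub(r"[^a-z\s]", " ", text)                # brackets, dashes, colons, digits, other symbols
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)                              # also collapses extra whitespace

def clean_corpus(posts):
    """Clean every post, dropping empty results and exact duplicates while keeping order."""
    seen = set()
    cleaned = []
    for post in posts:
        c = clean_post(post)
        if c and c not in seen:
            seen.add(c)
            cleaned.append(c)
    return cleaned
```

For example, `clean_post("I am SO tired: see https://example.com (really) @friend")` returns `"tired see really"`.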

Apart from cleaning the data, we also labelled each post with one of the 3 categories (suicidal, depression, and normal) during this stage.

Step 3 – Data Exploration & Discovery


Before training a model, we analyzed the depression and suicidal posts by looking into their words and topics.


Step 3.1 – WordCloud


A word cloud provides a visual understanding of the data. The word cloud we generated was specific to depression posts. It showed clearly that most depression posts speak about depression, and that their authors seek help and need advice.
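A word cloud simply sizes each word by how often it occurs, so the underlying numbers are easy to inspect. The sketch below counts word frequencies over cleaned posts; the function name and the three sample posts in the usage note are hypothetical and are not taken from our dataset.

```python
from collections import Counter

def top_words(posts, n=10):
    """Return the n most frequent words across already-cleaned posts."""
    counts = Counter()
    for post in posts:
        counts.update(post.split())
    return counts.most_common(n)
```

Calling `top_words(["need help depression", "depression advice", "depression help"], 2)` gives `[("depression", 3), ("help", 2)]`. Counts like these can then be handed to a plotting tool to draw the cloud.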

Step 3.2 – Topic Modelling


We performed a machine learning analysis called topic modelling, which analyzes text data to find groups of words and similar expressions. It is a form of text analysis that helps in gaining insights by grouping the data into topics. We extracted 3 topics from the depression and suicidal posts.

Topic modelling gave us a list of words with a score indicating how often each word appeared in each topic. We identified the top 6 words from each topic and used them to label the topic accordingly.

(Kindly note that the labelling of the topics as Care, Thought, and Reach is completely subjective. The number of topics to extract can also be varied.)

It was interesting to find ‘work’ under two topics, Care and Thought. Does that mean work is one of the major causes discussed in connection with depression?

It was evident from the topic distribution graph that Reach was the most common topic discussed across both suicidal and depression posts. Thus, it is all the more important to listen to and assist people with depression before they cross over to suicidal thoughts. These were interesting observations that need more relevant data and a separate study.

Step 3.3 – Does Depression Lead to Suicide?


This is one of the goals of this study: to check whether depression really leads to suicidal tendencies. Topic modelling also gave us an output indicating the extent to which each topic was discussed across all the posts in the threads. When we compared the results for depression posts and suicidal posts, we found that the two were quite similar in terms of the topics discussed.
Using the top words from each topic identified in the previous step as attributes, we ran a similarity analysis across all posts. This helped us see how alike the depression and suicidal posts are.
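One simple way to run such a similarity analysis is to turn every post into a vector of attribute-word counts and compare the vectors. The sketch below assumes cosine similarity as the measure; this is an assumption for illustration, as are the function names and the attribute words in the usage note.

```python
import math

def attribute_vector(post, attributes):
    """Count how often each attribute word occurs in a cleaned post."""
    tokens = post.split()
    return [tokens.count(word) for word in attributes]

def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

With hypothetical attributes `["help", "work", "feel"]`, the posts `"need help work help"` and `"help work"` map to `[2, 1, 0]` and `[1, 1, 0]`, which gives a similarity of about 0.95. Averaging such scores between the depression group and the suicidal group shows how alike the two sets of posts are.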

Based on this similarity, we concluded that depression can lead to suicidal tendencies.

Step 3.4 – Sentiment Analysis


We all know that the words suicidal and depression carry negative emotions. To check what a machine makes of a suicidal post, we ran sentiment analysis, which interprets the emotion in a text as positive, negative, or neutral. It can be done with scripts or tools; in our case, we used a tool called MonkeyLearn.

The suicidal post was classified as having a Negative sentiment with 98.4% confidence.

Step 4 – Machine Learning Algorithm


Machine learning is the idea of programming a computer so that it studies the available data and learns a relationship that can later be used to make predictions. This is the premise of artificial intelligence.

Taking the current study as an example, the model studies existing posts and learns a relationship, i.e., which kinds of social media posts suggest that a person is going through depression or might be suicidal. This is called model training.

For this purpose, we labelled the collected posts according to their respective threads, i.e., depression, suicidal, or normal, and then randomly divided them into a training set and a test set. The training set was used to train a machine learning (ML) model. We started with Logistic Regression for its simplicity and ease of interpretation. We also tried other models, such as SVM (support vector machine), which resulted in better accuracy.
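The split-and-train step can be sketched with scikit-learn as follows. The TF-IDF features, the 75/25 split, the function name, and the toy posts are assumptions made for this example; they illustrate the workflow rather than reproduce our exact setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: 4 short posts per class, only to make the sketch runnable.
toy_posts = [
    "i feel empty and tired all the time", "nothing makes me happy anymore",
    "i feel so low and hopeless lately", "i cannot get out of bed most days",
    "i do not want to be here anymore", "i see no way out of this",
    "everyone would be better off without me", "i have no reason to keep going",
    "try talking to a counsellor this week", "regular exercise and sleep can help",
    "thanks for the advice everyone", "here is a helpline you can call",
]
toy_labels = ["depression"] * 4 + ["suicidal"] * 4 + ["normal"] * 4

def train_classifier(posts, labels, test_size=0.25, seed=0):
    """Randomly split the labelled posts, then fit TF-IDF + logistic regression."""
    X_train, X_test, y_train, y_test = train_test_split(
        posts, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    return model, X_test, y_test
```

After `model, X_test, y_test = train_classifier(toy_posts, toy_labels)`, calling `model.predict(X_test)` returns one label per held-out post. Swapping `LogisticRegression` for a support vector classifier such as `sklearn.svm.LinearSVC` is a one-line change if you want to compare models.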

AUROC (Area Under the ROC Curve) is a popular metric for evaluating such models. A score of 0.5 corresponds to random prediction, so a score above 0.5 indicates that the model already performs better than random guessing, i.e., better than having no machine learning in place.
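To make the 0.5 baseline concrete: AUROC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition for a binary case (for instance, suicidal vs. everything else, which is an assumption about how one might frame our three classes). The function name is hypothetical.

```python
def auroc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

For labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.5, 0.1]`, three of the four positive/negative pairs are ranked correctly, so the AUROC is 0.75. A model that gives every post the same score gets exactly 0.5.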


We have now developed a model that can help in predicting suicidal tendencies. How can we make it useful and help save lives?

One way to find people who need help is through user-generated content. Millions of people are online today on social media platforms like Twitter, Facebook, Instagram, Reddit, etc. These platforms let them not only interact with others but also convey their feelings, thoughts, and emotions. If our model can detect depressive tweets, maybe we can save a few thousand lives.

We chose Twitter as the platform since people tend to convey their feelings in a straightforward way there. We picked 2 significant profiles:

Chester Bennington – American singer-songwriter who died by suicide in 2017

Deepika Padukone – Indian actress who was going through depression


Scraping Twitter Data & Predicting


After selecting the profiles, we scraped the tweets from both and separated them by year. We ran the model on all the tweets in the order of the year they were posted, and what we noticed was insightful. The predicted suicidal tendency of Chester Bennington increased over the years before he died by suicide in 2017, whereas the predicted depression level of Deepika Padukone decreased, and she appears to be leading a happy married life.
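One way to turn per-tweet predictions into a yearly trend is to compute, for each year, the share of tweets that the model assigns to a given label. The sketch below assumes that measure; the function name, the `predict` callable, and the sample data in the usage note are hypothetical stand-ins, not our actual tweets or model.

```python
from collections import defaultdict

def yearly_share(tweets, predict, target_label):
    """Return {year: fraction of that year's tweets predicted as target_label}."""
    by_year = defaultdict(list)
    for year, text in tweets:
        by_year[year].append(text)
    return {
        year: sum(predict(t) == target_label for t in texts) / len(texts)
        for year, texts in sorted(by_year.items())
    }
```

Here `tweets` is a list of `(year, text)` pairs and `predict` is any function that maps a text to a label, such as a wrapper around a trained classifier. A result like `{2015: 0.5, 2016: 1.0}` would mean that half of the 2015 tweets and all of the 2016 tweets were assigned the target label.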

Can these tweets be a signal?


There are a couple of limitations to this study.


The study used basic machine learning algorithms, i.e., Logistic Regression and SVM, whose accuracy is limited. The accuracy could be further improved with deep learning algorithms such as the Long Short-Term Memory (LSTM) network, which can also be combined with a CNN model.

The dataset was very limited and consisted of only 600 posts. We can collect data from various other forums and platforms, which will help further increase the accuracy of the model.

Conclusion & Further Improvements

Our aim is merely to create a prototype to support the premise that newer technologies like machine learning and deep learning can be leveraged to address social issues, in this case depression and suicide. In most countries, people with depression do not get the right attention or avoid reaching out because of the social stigma attached to it. Though it is impossible for us to cover every nook and corner, the intention here was to create a social model based on available user data.


What are we trying to achieve? We merely wanted to convey the idea that these types of applications can be used to predict user behavior based on social media usage, covering not only the posts users create but also the posts they consume. AI and ML models can be used not only by multinational organizations to grow their business but also for a social cause. The findings can be used to identify potential cases and direct them to specific motivational and inspiring groups/pages and self-help forums. Therapists and wellness organizations can leverage this opportunity to offer aid through social media applications by identifying the audience that needs help.

This study was done as a part of an academic project at SP Jain Institute of Management & Research by Antara Datta, Suraj Chakraborty & Sugat Nayak, under the guidance of our professor Dr. Anitesh Barua.