WhatsApp Chat Analysis

Rajesh R.
Jan 3, 2021

Analyze your WhatsApp group chat using Python.

Photo by Lukas Blazek on Unsplash

I have been intrigued by the phenomenal global adoption of the WhatsApp mobile application. WhatsApp has become hugely popular, especially in India and among the Indian diaspora worldwide, where it is now the preferred alternative to SMS. A common practice is to create group chats for families, colleagues, friends, and even people who share an interest in a topic. I have witnessed plenty of discussion in some of these group chats. As a data science enthusiast, I naturally wanted to study these chats’ sensibilities objectively. This article explains how to build and run a simple Python-based app that explores and analyzes a WhatsApp group chat.

To follow this program, you will need a working knowledge of Python and the basics of visualization and data analysis.

Preparing Data:

To begin the analysis, you need group chat data. Export the chat archive using the export option in the group info, and choose the ‘Without Media’ option. The exported chat arrives as a .zip file that contains the complete conversation. You can transfer the file to your development machine via AirDrop or Bluetooth, then unzip it to get the chat data file.
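The unzip step can also be done from Python. The following is a minimal sketch using the standard-library zipfile module; the function name extract_chat_text is hypothetical, and it assumes the archive contains a single relevant .txt file whose name may differ between platforms.

```python
import zipfile
from pathlib import Path


def extract_chat_text(zip_path, dest_dir):
    """Extract the first .txt member of a chat export and return its path.

    Assumption: the archive holds one relevant .txt file. Its name is not
    hard-coded because it can vary between platforms.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".txt"):
                # extract() returns the path of the file it created
                return Path(archive.extract(name, path=dest))
    raise FileNotFoundError(f"no .txt file found in {zip_path}")
```

Calling extract_chat_text("export.zip", "chat_data") would return the path of the extracted text file; both arguments here are placeholder names.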

Python Code

We will discuss the Python code for analysis. In the code snippet below, we import all the libraries needed.

Import Libraries

Apart from the pandas, re, and seaborn libraries, we will also need the NLTK library for sentiment analysis.

Read Chat File

I have defined below a simple function to read all the data from the chat file.
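A reader function of this kind might look like the sketch below. The name read_chat_lines is hypothetical, and the file is assumed to be UTF-8 encoded text.

```python
def read_chat_lines(path):
    """Return the chat file as a list of lines without trailing newlines.

    Assumption: the export is UTF-8 text. The "utf-8-sig" codec also
    strips a byte-order mark if the file starts with one.
    """
    with open(path, encoding="utf-8-sig") as handle:
        return [line.rstrip("\n") for line in handle]
```

Returning a plain list keeps the later cleaning and counting steps independent of how the file was read.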

Clean Chat File Data

In the utility function defined below, we clean the input file data using regular expressions. My goal was to remove noise messages that add little value to the analysis. Typical examples are notices that someone added another user to the chat or that someone left the conversation.

The function returns the cleaned data as a list of sentences to be analyzed.
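As an illustration of this kind of cleaning, here is a minimal sketch. It assumes each message line looks like "1/3/21, 9:15 PM - Alice: Hello", which is only one of several export layouts, so the patterns would need adjusting for other platforms or locales. The names MESSAGE, STAMP, NOISE_TEXT, and clean_chat are hypothetical.

```python
import re

# Assumed line shape: "1/3/21, 9:15 PM - Alice: Hello everyone"
MESSAGE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{2,4}), "
    r"(?P<time>\d{1,2}:\d{2}(?:\s?[APap][Mm])?) - "
    r"(?P<sender>[^:]+): (?P<text>.*)$"
)
# A timestamped line without a "sender: " part is treated as a system
# notice, for example "Alice added Bob" or "Bob left".
STAMP = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}(?:\s?[APap][Mm])? - ")
# Placeholder texts assumed to carry no value for the analysis
NOISE_TEXT = {"<Media omitted>", "This message was deleted"}


def clean_chat(lines):
    """Return the message texts, dropping system notices and placeholders."""
    messages = []
    keep_continuation = False  # True while the last kept message may continue
    for line in lines:
        match = MESSAGE.match(line)
        if match:
            text = match.group("text").strip()
            keep_continuation = bool(text) and text not in NOISE_TEXT
            if keep_continuation:
                messages.append(text)
        elif STAMP.match(line):
            # System notice: skip it and stop any continuation
            keep_continuation = False
        elif keep_continuation and line.strip():
            # Line without a timestamp: continuation of a multi-line message
            messages[-1] += " " + line.strip()
    return messages
```

One known limitation of this sketch: a system notice whose text happens to contain ": " would be mistaken for a user message, so a fuller version would need extra patterns for such notices.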

Determine Top Posters To The Group

It is useful to see who the most active users are. We can measure this by the number of posts each user makes to the chat group.
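Counting posts per user can be sketched with the standard library alone. The example assumes message lines shaped like "1/3/21, 9:15 PM - Alice: Hello"; the names SENDER and count_posts are hypothetical.

```python
import re
from collections import Counter

# Assumed line shape: "<date>, <time> - <sender>: <text>"
SENDER = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, [^-]+ - (?P<sender>[^:]+): ")


def count_posts(lines):
    """Count how many messages each sender posted."""
    counts = Counter()
    for line in lines:
        match = SENDER.match(line)
        if match:  # lines without a "sender: " part are ignored
            counts[match.group("sender")] += 1
    return counts
```

The returned Counter offers most_common(n), which lists the n most active senders in descending order of message count.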

Read Chat File And Clean Data

In the code snippet below, the input chat file is read and then cleaned using the sentence-cleaning function defined earlier.

Read Chat File And Clean Data

The code segment below retrieves the active users and plots a histogram of the user messages. As one might note from the histogram, the group is driven predominantly by a few users.

Get Count Of Users Who Post Mostly

We can view the top message posters. Here, ‘frame’ is the pandas object that holds the messages.

Phone numbers in the picture above are masked for privacy.
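With the messages held in a pandas DataFrame, the top posters can be listed in a few lines. This is a sketch with made-up sample data; the column name "sender" and the helper top_posters are assumptions, not necessarily the names used for the frame in this article.

```python
import pandas as pd


def top_posters(frame, n=5):
    """Return the n senders with the most messages, most active first."""
    # value_counts() sorts by count in descending order
    return frame["sender"].value_counts().head(n)


# Made-up sample data standing in for the real chat frame
frame = pd.DataFrame(
    {
        "sender": ["Alice", "Bob", "Alice", "Carol", "Alice", "Bob"],
        "message": ["Hi", "Hello", "How are you?", "Good", "Great", "Bye"],
    }
)
print(top_posters(frame, n=2))
```

The result is a pandas Series indexed by sender, so it can be passed directly to a plotting function for a bar chart.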

Utility Functions To Perform Hourly Analysis

The functions below perform an hourly analysis to determine when most messages are posted and to give an hourly trend view.
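An hourly count can be sketched as follows. It assumes timestamps such as "1/3/21, 9:15 PM" in month/day/year order; the fmt parameter is there so another layout can be supplied. The name messages_per_hour is hypothetical.

```python
from collections import Counter
from datetime import datetime


def messages_per_hour(stamps, fmt="%m/%d/%y, %I:%M %p"):
    """Return a list of 24 message counts, indexed by hour of day (0-23).

    Assumption: timestamps look like "1/3/21, 9:15 PM" (month/day/year).
    """
    counts = Counter()
    for stamp in stamps:
        # Replace a narrow no-break space, if present, so that strptime
        # sees a plain space before AM/PM.
        normalized = stamp.replace("\u202f", " ")
        counts[datetime.strptime(normalized, fmt).hour] += 1
    return [counts.get(hour, 0) for hour in range(24)]
```

The 24-element list maps directly onto the x-axis of a 24-hour distribution plot.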

The plot above shows the message trend as a 24-hour distribution. We can easily see that there are substantially more postings at 9 am and 11 am. Activity then pauses before peaking again after 9 pm.

The plot below shows the message trend observed over the past two months.
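The data behind such a trend plot can be prepared by counting messages per day. The sketch below approximates "the past two months" as the 60 days up to the latest message, which is an assumption, as are the timestamp layout and the name daily_counts.

```python
from collections import Counter
from datetime import datetime, timedelta


def daily_counts(stamps, days=60, fmt="%m/%d/%y, %I:%M %p"):
    """Count messages per calendar day over the last `days` days.

    Assumptions: timestamps look like "1/3/21, 9:15 PM", and the window
    is measured back from the most recent message, not from today.
    """
    dates = [
        datetime.strptime(stamp.replace("\u202f", " "), fmt).date()
        for stamp in stamps
    ]
    if not dates:
        return {}
    cutoff = max(dates) - timedelta(days=days)
    counts = Counter(day for day in dates if day > cutoff)
    # Sort by date so the result can be plotted left to right
    return dict(sorted(counts.items()))
```

Measuring the window from the latest message keeps the result stable no matter when the exported file is analyzed.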

Sentiment Analysis

The exciting part comes next, where we use the NLTK library to analyze the chats’ sentiments. Sentiment analysis, a natural language processing technique, is used to determine whether a piece of text is positive, negative, or neutral. The code below demonstrates how to obtain the Compound score for all the chats in the group.
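The term "Compound score" matches the output of NLTK's VADER analyzer (SentimentIntensityAnalyzer), so the sketch below assumes that analyzer is the one in use. The helper names compound_scores and label_sentiment are hypothetical, and the 0.05 cut-off is the conventional VADER threshold, not something taken from this article.

```python
def compound_scores(messages):
    """Return one VADER compound score per message.

    NLTK is imported inside the function so that label_sentiment below
    stays usable even where NLTK is not installed.
    """
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # fetches the lexicon if missing
    analyzer = SentimentIntensityAnalyzer()
    return [analyzer.polarity_scores(text)["compound"] for text in messages]


def label_sentiment(compound, threshold=0.05):
    """Map a compound score to 'positive', 'negative', or 'neutral'."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```

A typical use would be to call compound_scores on the list of cleaned messages and then apply label_sentiment to each score.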

The Compound score is a normalized rating for each chat message, ranging from -1 (most extreme negative) to +1 (most extreme positive).
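Assuming the score comes from NLTK's VADER analyzer, the normalization works by summing the valence of the words in a message and squashing that sum x into the range -1 to +1 as x / sqrt(x² + alpha), with alpha = 15. The standalone function below mirrors that formula for illustration only; it is not the library code itself.

```python
import math


def normalize(raw_score, alpha=15):
    """Squash an unbounded valence sum into the range [-1, 1].

    Mirrors the VADER normalization x / sqrt(x^2 + alpha); alpha = 15 is
    the default used by that analyzer.
    """
    score = raw_score / math.sqrt(raw_score * raw_score + alpha)
    # Clamp to guard against floating-point overshoot
    return max(-1.0, min(1.0, score))
```

A valence sum of 0 stays at 0, while larger sums approach +1 or -1 without ever crossing them, which is why very emphatic messages cluster near the ends of the scale.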

The distribution plot shows that most messages fall in the positive sentiment range.

Conclusion

It was enjoyable working on this project and the chat analyzer. The project can easily be extended with further analyses. Also, though explored minimally in this article, sentiment analysis is an extensive area, of which I have just scratched the surface. Sentiment analysis is often performed on textual data to monitor sentiments expressed about a topic, brand, or product. For the creative mind, the potential applications are unlimited.

Rajesh R.

Engineer, Ph.D. Scholar, and writer. I blend technical expertise and storytelling to explore science and creativity. Happy reading!