Home / Blog / Data Science / Case Study: Telecom Customer Complaints Analysis

Case Study: Telecom Customer Complaints Analysis

  • June 26, 2022
  • 3155
  • 55
Author Images

Meet the Author : Mr. Bharani Kumar

Bharani Kumar Depuru is a well known IT personality from Hyderabad. He is the Founder and Director of Innodatatics Pvt Ltd and 360DigiTMG. Bharani Kumar is an IIT and ISB alumni with more than 17 years of experience, he held prominent positions in the IT elites like HSBC, ITC Infotech, Infosys, and Deloitte. He is a prevalent IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG with more than Ten years of experience and has been making the IT transition journey easy for his students. 360DigiTMG is at the forefront of delivering quality education, thereby bridging the gap between academia and industry.

Read More >

A comprehensive investigation of complaints from clients in the telecom industry that examines how to have very satisfied customers.

Problem Introduction

Business problem:

  • Multiple client complaints may be made to a telecom firm. While filing complaints is nothing, it is one method we can give our customers better service.
  • In order to achieve customer pleasure, it is crucial to prioritise client convenience. This can only be done if we speak to these concerns honestly and effectively. Customers will be dissatisfied with our services, lose trust in us, and finally become disconnected from our network if we are unable to resolve their tickets. Therefore, responding to concerns is essential.
  • This use case examines a small dataset where customer complaints and other factors are carefully examined and machine learning modelling is carried out.
  • To determine which important issues are being presented while customers are withdrawing their concerns. The challenge might begin as indicated below.
  • To make a trend chart showing the number of complaints received on a daily and monthly basis.
  • To provide a frequency chart for different sorts of complaints.
  • Which state has the maximum number of complaints?
  • Building a unigram and bigram word cloud.

Learn the core concepts of Data Science Course video on Youtube:

Exploratory Data Analysis

Let's take tour of dataset

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

df = pd.read_csv("SP_telecom_complaints_data.csv")

  • Ticket #: A unique ID generated for each complaint
  • Customer Complaint: Written complaint of the customer
  • Date: date of complaint filing
  • Date_month_year: Date of the complaint filing in different format
  • Time: Time of complaint filing
  • Received Via: Medium of filing complaint which is either via Internet or Customer care
  • City: City where the customer has faced the issue
  • State: State where the customer has faced the issue
  • Zip Code: Zip code of area where the customer has faced the issue
  • Status: Status of the complaint from the side of firm
  • Filing on Behalf of Someone: Whether the customer filed the complaint for himself/herself or on behalf of someone else

Let us first see the ‘Percent wise distribution of complaint status’:

  • Closed: 33.17%
  • Open: 16.27%
  • Solved: 43.52%
  • Pending: 7.17%

Majority of the complaints (43.52% of total complaints) are solved whereas 33.17% are closed.

Now that we have a fair amount of idea about the dataset,

We have to perform some feature engineering on time and Date_month_year. Here we are adding two columns and saving them in a new column as an index. Convert it to DateTime data type along with that we are resetting the index.

df[ "date_index"] = df[ "Date_month_year"] + " " + df[ "Time"]

df["date_index"] = pd.to_datetime(df["date_index"])
df["Date_month_year"] = pd.to_datetime(df["Date_month_year"])
df = df.set_index(df["date_index"])

To know how many complaints are there in a month we will find :

f = df.groupby(pd.Grouper(freq="d")).size()
df.groupby(pd.Grouper(freq= "D")).size().plot()


df.groupby(pd.Grouper(freq= "M")).size().plot()


We can infer that June has more complaints. It seems like the busiest month is June (6) as majority of the complaint filing is being done in this month.

df[ 'Day'] = pd.to_datetime(df[ 'Date']).dt.day
df[ 'Month'] = pd.to_datetime(df[ 'Date']).dt.month
df[ 'Year'] = pd.to_datetime(df[ 'Date']).dt.year
Now which day has more complaints can also be analysed .

fig, ax = plt.subplots(figsize = (15,10))
df.groupby(['Day','Status']).count()['Ticket #'].unstack().plot.bar(ax=ax)


The number of closed complaints is larger in the first few days (1, 2, 3, etc.) of the month, but on the 23rd, 24th, and 25th, there is an abrupt and unanticipated rise in the number of closed complaints. The likelihood that a complaint will be resolved is therefore significantly increased if it is filed on the 23rd, 24th, and 25th of each month.

fig, ax = plt.subplots(figsize = (15,10)) df.groupby(['Month','Status']).count()['Ticket #'].unstack().plot.bar(ax=ax)


In the 6th month we have more complaints solved .

To analyse which state has high resolved complaints :

df["newStatus"] = ["Open" if Status=="Open" or Status=="Pending" else "Closed" for Status in df["Status"]]
df.groupby(["State"]).size().sort_values(ascending=False).to_frame().reset_index().rename({0: "Count"}, axis=1)[:5]
Status_complaints = df.groupby(["State","newStatus"]).size().unstack().fillna(0)
Status_complaints.plot(kind="bar", figsize=(30,50), stacked=True)


Which state has maximum number of complaints:

df.groupby(["State"]).size().sort_values(ascending=False).to_frame().reset_index().rename({0: "Count"}, axis=1) max()

State: West Virginia
Count: 288
dtype: object

Customer Complaint

This variable is text data and hence will start with word counts.


word_count = df['Customer Complaint'].str.split().apply(len).value_counts()
word_dict = dict(word_count)
word_dict = dict(sorted(word_dict.items(), key=lambda x: x[1]))
ind = np.arange(len(word_dict))
p1 = plt.bar(ind, list(word_dict.values()))
plt.ylabel('count of complaints')
plt.title('Words present in each complaint')
plt.xticks(ind, list(word_dict.keys()))


We can build a word cloud :

from wordcloud import WordCloud, STOPWORDS text = df['Customer Complaint'] wordcloud = WordCloud().generate(str(text)) plt.figure( figsize=(10,10) ) plt.imshow(wordcloud) plt.axis("off") plt.show()


The most common concerns with complaints are speed and Internet problems, and among all of them, we can observe that service is the best of all the word tokens. Service for this company should thus be enhanced.

Let's build a bigram word cloud

doc_complete = df[ "Customer Complaint"].tolist()
doc_clean = [clean(doc).split() for doc in doc_complete]
ip_rev_string = " ".join(doc_complete)


import nltk
import re
# word cloud with bigram
#Punkt is designed to learn parameters (a list of abbreviations, etc.) unsupervised from a corpus similar to the target domain.

from wordcloud import WordCloud, STOPWORDS

WNL = nltk.WordNetLemmatizer()

# Lowercase and tokenize
text = ip_rev_string.lower()

# Remove single quotes early since it causes problems with the tokenizer.
#text = text.replace("'", "")

tokens = nltk.word_tokenize(text)
text1 = nltk.Text(tokens)
# Remove extra chars and remove stop words.#Split the string by each occurrence of the pattern
text_content = [''.join(re.split("[ .,;:!?‘’``''@#$%^_&*()<>{}~\n\t\\\-]" , word)) for word in text1]

# Create a set of stopwords
stopwords_wc = set(STOPWORDS)
customised_words = ['price', 'great','doesn','SP','sp']
# If you want to remove any particular word form text which does not contribute much in meaning

new_stopwords = stopwords_wc.union(customised_words)

# Remove stop words
text_content = [word for word in text_content if word not in new_stopwords]

# Take only non-empty entries
text_content = [s for s in text_content if len(s) != 0]

# Best to get the lemmas of each word to reduce the number of similar words
text_content = [WNL.lemmatize(t) for t in text_content]
nltk_tokens = nltk.word_tokenize(text)
bigrams_list = list(nltk.bigrams(text_content))

dictionary2 = [' '.join(tup) for tup in bigrams_list]
print (dictionary2)

# Using count vectoriser to view the frequency of bigrams
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(ngram_range=(2, 2))
bag_of_words = vectorizer.fit_transform(dictionary2)

sum_words = bag_of_words.sum(axis=0)##rowsum in r
words_freq = [(word, sum_words[0, columnname]) for word, columnname in vectorizer.vocabulary_.items()]
words_freq =sorted(words_freq, key = lambda x: x[1], reverse=True)

# Generating wordcloud
words_dict = dict(words_freq)
WC_height = 1000
WC_width = 1500
WC_max_words = 100
wordCloud = WordCloud(max_words=WC_max_words, height=WC_height, width=WC_width, stopwords=new_stopwords)
plt.title('Most frequently occurring bigrams connected by same colour and font size')
plt.imshow(wordCloud, interpolation='bilinear')


This helps us understand the causes of complaints better. The majority of them include speed, internet, and billing difficulties.


Our investigation led us to some conclusions about how to reduce customer turnover by offering better services, including internet services.

Click here to learn Data Science Course, Data Science Course in Hyderabad, Data Science Course in Bangalore

Data Science Placement Success Story

Data Science Training Institutes in Other Locations

Agra, Ahmedabad, Amritsar, Anand, Anantapur, Bangalore, Bhopal, Bhubaneswar, Chengalpattu, Chennai, Cochin, Dehradun, Malaysia, Dombivli, Durgapur, Ernakulam, Erode, Gandhinagar, Ghaziabad, Gorakhpur, Gwalior, Hebbal, Hyderabad, Jabalpur, Jalandhar, Jammu, Jamshedpur, Jodhpur, Khammam, Kolhapur, Kothrud, Ludhiana, Madurai, Meerut, Mohali, Moradabad, Noida, Pimpri, Pondicherry, Pune, Rajkot, Ranchi, Rohtak, Roorkee, Rourkela, Shimla, Shimoga, Siliguri, Srinagar, Thane, Thiruvananthapuram, Tiruchchirappalli, Trichur, Udaipur, Yelahanka, Andhra Pradesh, Anna Nagar, Bhilai, Borivali, Calicut, Chandigarh, Chromepet, Coimbatore, Dilsukhnagar, ECIL, Faridabad, Greater Warangal, Guduvanchery, Guntur, Gurgaon, Guwahati, Hoodi, Indore, Jaipur, Kalaburagi, Kanpur, Kharadi, Kochi, Kolkata, Kompally, Lucknow, Mangalore, Mumbai, Mysore, Nagpur, Nashik, Navi Mumbai, Patna, Porur, Raipur, Salem, Surat, Thoraipakkam, Trichy, Uppal, Vadodara, Varanasi, Vijayawada, Visakhapatnam, Tirunelveli, Aurangabad

Data Analyst Courses in Other Locations

ECIL, Jaipur, Pune, Gurgaon, Salem, Surat, Agra, Ahmedabad, Amritsar, Anand, Anantapur, Andhra Pradesh, Anna Nagar, Aurangabad, Bhilai, Bhopal, Bhubaneswar, Borivali, Calicut, Cochin, Chengalpattu , Dehradun, Dombivli, Durgapur, Ernakulam, Erode, Gandhinagar, Ghaziabad, Gorakhpur, Guduvanchery, Gwalior, Hebbal, Hoodi , Indore, Jabalpur, Jaipur, Jalandhar, Jammu, Jamshedpur, Jodhpur, Kanpur, Khammam, Kochi, Kolhapur, Kolkata, Kothrud, Ludhiana, Madurai, Mangalore, Meerut, Mohali, Moradabad, Pimpri, Pondicherry, Porur, Rajkot, Ranchi, Rohtak, Roorkee, Rourkela, Shimla, Shimoga, Siliguri, Srinagar, Thoraipakkam , Tiruchirappalli, Tirunelveli, Trichur, Trichy, Udaipur, Vijayawada, Vizag, Warangal, Chennai, Coimbatore, Delhi, Dilsukhnagar, Hyderabad, Kalyan, Nagpur, Noida, Thane, Thiruvananthapuram, Uppal, Kompally, Bangalore, Chandigarh, Chromepet, Faridabad, Guntur, Guwahati, Kharadi, Lucknow, Mumbai, Mysore, Nashik, Navi Mumbai, Patna, Pune, Raipur, Vadodara, Varanasi, Yelahanka


Navigate to Address

360DigiTMG - Data Science, IR 4.0, AI, Machine Learning Training in Malaysia

Level 16, 1 Sentral, Jalan Stesen Sentral 5, Kuala Lumpur Sentral, 50470 Kuala Lumpur, Wilayah Persekutuan Kuala Lumpur, Malaysia

+60 19-383 1378

Get Direction: Data Science Course

Make an Enquiry
Call Us