Sent Successfully.
Home / Blog / Data Science / Case Study: Telecom Customer Complaints Analysis
Case Study: Telecom Customer Complaints Analysis
Table of Content
A comprehensive investigation of complaints from clients in the telecom industry that examines how to have very satisfied customers.
Problem Introduction
Business problem:
- Multiple client complaints may be made to a telecom firm. While filing complaints is nothing, it is one method we can give our customers better service.
- In order to achieve customer pleasure, it is crucial to prioritise client convenience. This can only be done if we speak to these concerns honestly and effectively. Customers will be dissatisfied with our services, lose trust in us, and finally become disconnected from our network if we are unable to resolve their tickets. Therefore, responding to concerns is essential.
- This use case examines a small dataset where customer complaints and other factors are carefully examined and machine learning modelling is carried out.
- To determine which important issues are being presented while customers are withdrawing their concerns. The challenge might begin as indicated below.
- To make a trend chart showing the number of complaints received on a daily and monthly basis.
- To provide a frequency chart for different sorts of complaints.
- Which state has the maximum number of complaints?
- Building a unigram and bigram word cloud.
Learn the core concepts of Data Science Course video on Youtube:
Exploratory Data Analysis
Let's take tour of dataset
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
df = pd.read_csv("SP_telecom_complaints_data.csv")
- Ticket #: A unique ID generated for each complaint
- Customer Complaint: Written complaint of the customer
- Date: date of complaint filing
- Date_month_year: Date of the complaint filing in different format
- Time: Time of complaint filing
- Received Via: Medium of filing complaint which is either via Internet or Customer care
- City: City where the customer has faced the issue
- State: State where the customer has faced the issue
- Zip Code: Zip code of area where the customer has faced the issue
- Status: Status of the complaint from the side of firm
- Filing on Behalf of Someone: Whether the customer filed the complaint for himself/herself or on behalf of someone else
Let us first see the ‘Percent wise distribution of complaint status’:
- Closed: 33.17%
- Open: 16.27%
- Solved: 43.52%
- Pending: 7.17%
Majority of the complaints (43.52% of total complaints) are solved whereas 33.17% are closed.
Now that we have a fair amount of idea about the dataset,
We have to perform some feature engineering on time and Date_month_year. Here we are adding two columns and saving them in a new column as an index. Convert it to DateTime data type along with that we are resetting the index.
df[ "date_index"] = df[ "Date_month_year"] + " " + df[ "Time"]
df["date_index"] = pd.to_datetime(df["date_index"])
df["Date_month_year"] = pd.to_datetime(df["Date_month_year"])
df = df.set_index(df["date_index"])
df.head(3)
To know how many complaints are there in a month we will find :
f = df.groupby(pd.Grouper(freq="d")).size()
df.groupby(pd.Grouper(freq= "D")).size().plot()
df.groupby(pd.Grouper(freq= "M")).size().plot()
We can infer that June has more complaints. It seems like the busiest month is June (6) as majority of the complaint filing is being done in this month.
df[ 'Day'] = pd.to_datetime(df[ 'Date']).dt.day
df[ 'Month'] = pd.to_datetime(df[ 'Date']).dt.month
df[ 'Year'] = pd.to_datetime(df[ 'Date']).dt.year
df.head()
Now which day has more complaints can also be analysed .
fig, ax = plt.subplots(figsize = (15,10))
df.groupby(['Day','Status']).count()['Ticket #'].unstack().plot.bar(ax=ax)
The number of closed complaints is larger in the first few days (1, 2, 3, etc.) of the month, but on the 23rd, 24th, and 25th, there is an abrupt and unanticipated rise in the number of closed complaints. The likelihood that a complaint will be resolved is therefore significantly increased if it is filed on the 23rd, 24th, and 25th of each month.
fig, ax = plt.subplots(figsize = (15,10)) df.groupby(['Month','Status']).count()['Ticket #'].unstack().plot.bar(ax=ax)
In the 6th month we have more complaints solved .
To analyse which state has high resolved complaints :
df["newStatus"] = ["Open" if Status=="Open" or Status=="Pending" else "Closed" for Status in df["Status"]]
df.groupby(["State"]).size().sort_values(ascending=False).to_frame().reset_index().rename({0: "Count"}, axis=1)[:5]
Status_complaints = df.groupby(["State","newStatus"]).size().unstack().fillna(0)
Status_complaints
Status_complaints.plot(kind="bar", figsize=(30,50), stacked=True)
Which state has maximum number of complaints:
df.groupby(["State"]).size().sort_values(ascending=False).to_frame().reset_index().rename({0: "Count"}, axis=1) max()
State: West Virginia
Count: 288
dtype: object
Customer Complaint
This variable is text data and hence will start with word counts.
#https://stackoverflow.com/a/37483537/4084039
word_count = df['Customer Complaint'].str.split().apply(len).value_counts()
word_dict = dict(word_count)
word_dict = dict(sorted(word_dict.items(), key=lambda x: x[1]))
ind = np.arange(len(word_dict))
plt.figure(figsize=(10,5))
p1 = plt.bar(ind, list(word_dict.values()))
plt.ylabel('count of complaints')
plt.title('Words present in each complaint')
plt.xticks(ind, list(word_dict.keys()))
plt.show()
We can build a word cloud :
from wordcloud import WordCloud, STOPWORDS text = df['Customer Complaint'] wordcloud = WordCloud().generate(str(text)) plt.figure( figsize=(10,10) ) plt.imshow(wordcloud) plt.axis("off") plt.show()
The most common concerns with complaints are speed and Internet problems, and among all of them, we can observe that service is the best of all the word tokens. Service for this company should thus be enhanced.
Let's build a bigram word cloud
doc_complete = df[ "Customer Complaint"].tolist()
doc_clean = [clean(doc).split() for doc in doc_complete]
doc_clean
ip_rev_string = " ".join(doc_complete)
ip_rev_string
import nltk
import re
# word cloud with bigram
nltk.download('punkt')
#Punkt is designed to learn parameters (a list of abbreviations, etc.) unsupervised from a corpus similar to the target domain.
from wordcloud import WordCloud, STOPWORDS
WNL = nltk.WordNetLemmatizer()
# Lowercase and tokenize
text = ip_rev_string.lower()
# Remove single quotes early since it causes problems with the tokenizer.
#text = text.replace("'", "")
tokens = nltk.word_tokenize(text)
text1 = nltk.Text(tokens)
#help(nltk.Text)
# Remove extra chars and remove stop words.#Split the string by each occurrence of the pattern
text_content = [''.join(re.split("[ .,;:!?‘’``''@#$%^_&*()<>{}~\n\t\\\-]" , word)) for word in text1]
# Create a set of stopwords
stopwords_wc = set(STOPWORDS)
customised_words = ['price', 'great','doesn','SP','sp']
# If you want to remove any particular word form text which does not contribute much in meaning
new_stopwords = stopwords_wc.union(customised_words)
# Remove stop words
text_content = [word for word in text_content if word not in new_stopwords]
# Take only non-empty entries
text_content = [s for s in text_content if len(s) != 0]
# Best to get the lemmas of each word to reduce the number of similar words
text_content = [WNL.lemmatize(t) for t in text_content]
help(nltk.word_tokenize)
nltk_tokens = nltk.word_tokenize(text)
bigrams_list = list(nltk.bigrams(text_content))
print(bigrams_list)
dictionary2 = [' '.join(tup) for tup in bigrams_list]
print (dictionary2)
# Using count vectoriser to view the frequency of bigrams
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(ngram_range=(2, 2))
bag_of_words = vectorizer.fit_transform(dictionary2)
vectorizer.vocabulary_
sum_words = bag_of_words.sum(axis=0)##rowsum in r
words_freq = [(word, sum_words[0, columnname]) for word, columnname in vectorizer.vocabulary_.items()]
words_freq =sorted(words_freq, key = lambda x: x[1], reverse=True)
print(words_freq[:100])
# Generating wordcloud
words_dict = dict(words_freq)
WC_height = 1000
WC_width = 1500
WC_max_words = 100
wordCloud = WordCloud(max_words=WC_max_words, height=WC_height, width=WC_width, stopwords=new_stopwords)
wordCloud.generate_from_frequencies(words_dict)
plt.figure(4)
plt.title('Most frequently occurring bigrams connected by same colour and font size')
plt.imshow(wordCloud, interpolation='bilinear')
plt.axis("off")
plt.show()
This helps us understand the causes of complaints better. The majority of them include speed, internet, and billing difficulties.
Conclusion
Our investigation led us to some conclusions about how to reduce customer turnover by offering better services, including internet services.
Click here to learn Data Science Course, Data Science Course in Hyderabad, Data Science Course in Bangalore
Data Science Placement Success Story
Data Science Training Institutes in Other Locations
Agra, Ahmedabad, Amritsar, Anand, Anantapur, Bangalore, Bhopal, Bhubaneswar, Chengalpattu, Chennai, Cochin, Dehradun, Malaysia, Dombivli, Durgapur, Ernakulam, Erode, Gandhinagar, Ghaziabad, Gorakhpur, Gwalior, Hebbal, Hyderabad, Jabalpur, Jalandhar, Jammu, Jamshedpur, Jodhpur, Khammam, Kolhapur, Kothrud, Ludhiana, Madurai, Meerut, Mohali, Moradabad, Noida, Pimpri, Pondicherry, Pune, Rajkot, Ranchi, Rohtak, Roorkee, Rourkela, Shimla, Shimoga, Siliguri, Srinagar, Thane, Thiruvananthapuram, Tiruchchirappalli, Trichur, Udaipur, Yelahanka, Andhra Pradesh, Anna Nagar, Bhilai, Borivali, Calicut, Chandigarh, Chromepet, Coimbatore, Dilsukhnagar, ECIL, Faridabad, Greater Warangal, Guduvanchery, Guntur, Gurgaon, Guwahati, Hoodi, Indore, Jaipur, Kalaburagi, Kanpur, Kharadi, Kochi, Kolkata, Kompally, Lucknow, Mangalore, Mumbai, Mysore, Nagpur, Nashik, Navi Mumbai, Patna, Porur, Raipur, Salem, Surat, Thoraipakkam, Trichy, Uppal, Vadodara, Varanasi, Vijayawada, Visakhapatnam, Tirunelveli, Aurangabad
Data Analyst Courses in Other Locations
ECIL, Jaipur, Pune, Gurgaon, Salem, Surat, Agra, Ahmedabad, Amritsar, Anand, Anantapur, Andhra Pradesh, Anna Nagar, Aurangabad, Bhilai, Bhopal, Bhubaneswar, Borivali, Calicut, Cochin, Chengalpattu , Dehradun, Dombivli, Durgapur, Ernakulam, Erode, Gandhinagar, Ghaziabad, Gorakhpur, Guduvanchery, Gwalior, Hebbal, Hoodi , Indore, Jabalpur, Jaipur, Jalandhar, Jammu, Jamshedpur, Jodhpur, Kanpur, Khammam, Kochi, Kolhapur, Kolkata, Kothrud, Ludhiana, Madurai, Mangalore, Meerut, Mohali, Moradabad, Pimpri, Pondicherry, Porur, Rajkot, Ranchi, Rohtak, Roorkee, Rourkela, Shimla, Shimoga, Siliguri, Srinagar, Thoraipakkam , Tiruchirappalli, Tirunelveli, Trichur, Trichy, Udaipur, Vijayawada, Vizag, Warangal, Chennai, Coimbatore, Delhi, Dilsukhnagar, Hyderabad, Kalyan, Nagpur, Noida, Thane, Thiruvananthapuram, Uppal, Kompally, Bangalore, Chandigarh, Chromepet, Faridabad, Guntur, Guwahati, Kharadi, Lucknow, Mumbai, Mysore, Nashik, Navi Mumbai, Patna, Pune, Raipur, Vadodara, Varanasi, Yelahanka
Navigate to Address
360DigiTMG - Data Science, IR 4.0, AI, Machine Learning Training in Malaysia
Level 16, 1 Sentral, Jalan Stesen Sentral 5, Kuala Lumpur Sentral, 50470 Kuala Lumpur, Wilayah Persekutuan Kuala Lumpur, Malaysia
+60 19-383 1378