Case Study: Telecom Customer Complaints Analysis
Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of AiSPRY and 360DigiTMG. An alumnus of IIT and ISB with more than 17 years of experience, he has held prominent positions at IT majors such as HSBC, ITC Infotech, Infosys, and Deloitte. He is a prominent IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence, and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, with more than ten years of training experience, and has been making the IT transition journey easy for his students. 360DigiTMG is at the forefront of delivering quality education, bridging the gap between academia and industry.
A comprehensive analysis of customer complaints in the telecom industry, examining how to achieve highly satisfied customers.
Business problem:
Let's take a tour of the dataset.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

df = pd.read_csv("SP_telecom_complaints_data.csv")
The majority of complaints (43.52% of the total) are in the Solved status, while 33.17% are Closed.
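A minimal sketch to reproduce these percentages, assuming the dataset's Status column holds values such as Solved, Closed, Open, and Pending:

# Share of each complaint status, as a percentage of all complaints
status_pct = df["Status"].value_counts(normalize=True) * 100
print(status_pct.round(2))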
Now that we have a fair idea about the dataset, we can perform some feature engineering on the Time and Date_month_year columns. We concatenate the two columns, save the result in a new date_index column, convert it to the datetime data type, and set it as the index.
df[ "date_index"] = df[ "Date_month_year"] + " " + df[ "Time"] df["date_index"] = pd.to_datetime(df["date_index"]) df["Date_month_year"] = pd.to_datetime(df["Date_month_year"]) df = df.set_index(df["date_index"]) df.head(3) To know how many complaints are there in a month we will find : f = df.groupby(pd.Grouper(freq="d")).size() df.groupby(pd.Grouper(freq= "D")).size().plot()
# Monthly complaint counts
df.groupby(pd.Grouper(freq="M")).size().plot()
We can infer that June (month 6) is the busiest month, as the majority of complaints are filed then.
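To confirm the peak numerically rather than reading it off the plot, a quick check using the same monthly grouper:

# Month with the most complaints, and the count
monthly = df.groupby(pd.Grouper(freq="M")).size()
print(monthly.idxmax(), monthly.max())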
df['Day'] = pd.to_datetime(df['Date']).dt.day
df['Month'] = pd.to_datetime(df['Date']).dt.month
df['Year'] = pd.to_datetime(df['Date']).dt.year
df.head()

We can also analyse which day of the month has the most complaints:

fig, ax = plt.subplots(figsize=(15, 10))
df.groupby(['Day', 'Status']).count()['Ticket #'].unstack().plot.bar(ax=ax)
The number of closed complaints is larger in the first few days (1, 2, 3, etc.) of the month, but on the 23rd, 24th, and 25th, there is an abrupt and unanticipated rise in the number of closed complaints. The likelihood that a complaint will be resolved is therefore significantly increased if it is filed on the 23rd, 24th, and 25th of each month.
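One way to check this pattern is to compute, for each day of the month, the share of complaints that ended up closed; a sketch, assuming the status labels seen earlier (Solved, Closed, Open, Pending):

# Fraction of complaints filed on each day that ended up Closed or Solved
closed_share = df["Status"].isin(["Closed", "Solved"]).groupby(df["Day"]).mean()
print(closed_share.sort_values(ascending=False).head())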
fig, ax = plt.subplots(figsize=(15, 10))
df.groupby(['Month', 'Status']).count()['Ticket #'].unstack().plot.bar(ax=ax)
Month 6 (June) also has the most solved complaints.
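A quick numeric check of the same claim (a sketch, assuming the Solved status label):

# Solved complaints per month, highest first
solved_by_month = df[df["Status"] == "Solved"].groupby("Month").size()
print(solved_by_month.sort_values(ascending=False).head(3))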
To analyse which states have the most resolved complaints:
df["newStatus"] = ["Open" if Status=="Open" or Status=="Pending" else "Closed" for Status in df["Status"]] df.groupby(["State"]).size().sort_values(ascending=False).to_frame().reset_index().rename({0: "Count"}, axis=1)[:5] Status_complaints = df.groupby(["State","newStatus"]).size().unstack().fillna(0) Status_complaints Status_complaints.plot(kind="bar", figsize=(30,50), stacked=True)
Which state has the maximum number of complaints:
df.groupby(["State"]).size().sort_values(ascending=False).to_frame().reset_index().rename({0: "Count"}, axis=1) max()
State    West Virginia
Count              288
dtype: object
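Note that DataFrame.max() takes each column's maximum independently, so the state shown here is the alphabetically largest name among those present, not necessarily the state with the highest count. A small sketch that reports the name and count from the same row:

# State with the highest complaint count (name and count taken together)
top_state = df.groupby("State").size().sort_values(ascending=False).head(1)
print(top_state)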
Customer Complaint
This variable contains text data, so we start with word counts.
#https://stackoverflow.com/a/37483537/4084039
word_count = df['Customer Complaint'].str.split().apply(len).value_counts()
word_dict = dict(word_count)
word_dict = dict(sorted(word_dict.items(), key=lambda x: x[1]))
ind = np.arange(len(word_dict))

plt.figure(figsize=(10, 5))
p1 = plt.bar(ind, list(word_dict.values()))
plt.ylabel('count of complaints')
plt.title('Words present in each complaint')
plt.xticks(ind, list(word_dict.keys()))
plt.show()
from wordcloud import WordCloud, STOPWORDS

# Join the complaints into one string (str(Series) would pull in index values and truncation markers)
text = " ".join(df['Customer Complaint'].astype(str))
wordcloud = WordCloud().generate(text)

plt.figure(figsize=(10, 10))
plt.imshow(wordcloud)
plt.axis("off")
plt.show()
The most common concerns in the complaints are speed and internet problems, and "service" is the most frequent word token of all. This company's service should therefore be improved.
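To back this up numerically, a quick frequency count of the word tokens (a sketch, using the same Customer Complaint column):

from collections import Counter

# Most frequent word tokens across all complaints
tokens = " ".join(df["Customer Complaint"].astype(str)).lower().split()
print(Counter(tokens).most_common(10))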
Let's build a bigram word cloud
# The clean() helper is not defined in the original post; a minimal assumption:
# lowercase the text and strip punctuation.
import string

def clean(doc):
    return doc.lower().translate(str.maketrans('', '', string.punctuation))

doc_complete = df["Customer Complaint"].tolist()
doc_clean = [clean(doc).split() for doc in doc_complete]
ip_rev_string = " ".join(doc_complete)
import nltk
import re

# Word cloud with bigrams
nltk.download('punkt')    # Punkt learns parameters (e.g., abbreviation lists) unsupervised from a corpus similar to the target domain.
nltk.download('wordnet')  # Required by the WordNetLemmatizer used below.
from wordcloud import WordCloud, STOPWORDS

WNL = nltk.WordNetLemmatizer()

# Lowercase and tokenize
text = ip_rev_string.lower()
# Removing single quotes early can help, since they cause problems for the tokenizer.
# text = text.replace("'", "")
tokens = nltk.word_tokenize(text)
text1 = nltk.Text(tokens)

# Remove extra characters: split each token on punctuation and whitespace, then rejoin
text_content = [''.join(re.split("[ .,;:!?‘’``''@#$%^_&*()<>{}~\n\t\\\-]", word)) for word in text1]

# Create a set of stopwords, extended with words that do not contribute much meaning
stopwords_wc = set(STOPWORDS)
customised_words = ['price', 'great', 'doesn', 'SP', 'sp']
new_stopwords = stopwords_wc.union(customised_words)

# Remove stop words, drop empty entries, and lemmatize to reduce the number of similar words
text_content = [word for word in text_content if word not in new_stopwords]
text_content = [s for s in text_content if len(s) != 0]
text_content = [WNL.lemmatize(t) for t in text_content]

# Build the list of bigrams and join each pair back into a string
bigrams_list = list(nltk.bigrams(text_content))
dictionary2 = [' '.join(tup) for tup in bigrams_list]

# Use a count vectorizer to view the frequency of bigrams
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(ngram_range=(2, 2))
bag_of_words = vectorizer.fit_transform(dictionary2)
sum_words = bag_of_words.sum(axis=0)  # column sums = frequency of each bigram
words_freq = [(word, sum_words[0, columnname]) for word, columnname in vectorizer.vocabulary_.items()]
words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)
print(words_freq[:100])

# Generate the word cloud from the bigram frequencies
words_dict = dict(words_freq)
WC_height = 1000
WC_width = 1500
WC_max_words = 100
wordCloud = WordCloud(max_words=WC_max_words, height=WC_height, width=WC_width, stopwords=new_stopwords)
wordCloud.generate_from_frequencies(words_dict)
plt.figure(4)
plt.title('Most frequently occurring bigrams')
plt.imshow(wordCloud, interpolation='bilinear')
plt.axis("off")
plt.show()
This helps us understand the causes of complaints better. The majority of them include speed, internet, and billing difficulties.
Our investigation leads to conclusions about how to reduce customer churn by offering better services, internet service in particular.