Text Mining

Text mining analyzes unstructured text data by converting it into structured data in key-value pair form, for example a keyword and its frequency. A WordCloud derives insights from the extracted keywords by arranging them in a plane, with font sizes varying according to their frequency.
Collect the text data / Extract data from sources.
Pre-Process the Data
- Typos
- Case - uppercase / lowercase / proper case
- Punctuations & special symbols (‘%’, ‘!’, ‘&’, etc.)
- Filler words, connectors, pronouns (‘all’, ‘for’, ‘of’, ‘my’, ‘to’, etc.)
- Numbers
- Extra White spaces
- Custom words
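- Stemming - reducing a word to its root form by stripping suffixes (‘running’ becomes ‘run’); the result is not always a dictionary word
- Lemmatization - reducing a word to its dictionary form (lemma) using vocabulary and grammar (‘better’ becomes ‘good’)
- Tokenization - splitting a sentence into its constituent words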
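Several of the clean-up steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a complete pipeline: the function name `preprocess`, the tiny `STOP_WORDS` set and the `custom_words` parameter are assumptions made for this example, and typo correction, stemming and lemmatization are left out because they normally rely on a dedicated NLP library.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word list; real projects use a fuller one.
STOP_WORDS = {"all", "for", "of", "my", "to", "the", "a", "is", "and"}

# Translation table that turns every ASCII punctuation character into a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text, custom_words=frozenset()):
    """Return a list of cleaned tokens from raw text."""
    text = text.lower()                    # normalise case
    text = re.sub(r"\d+", " ", text)       # drop numbers
    text = text.translate(PUNCT_TO_SPACE)  # drop punctuation and special symbols
    tokens = text.split()                  # tokenize; also collapses extra white space
    # Remove filler words and any custom words supplied by the caller.
    return [t for t in tokens if t not in STOP_WORDS and t not in custom_words]


tokens = preprocess("All of my 3 orders   arrived late & damaged!!",
                    custom_words={"orders"})
print(tokens)  # ['arrived', 'late', 'damaged']
```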
Document Term Matrix / Term Document Matrix
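A matrix with documents arranged in rows and terms arranged in columns is called a Document Term Matrix (DTM); each cell holds how often that term occurs in that document. The transpose of the DTM is the Term Document Matrix (TDM).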
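As a rough sketch of the idea, a DTM of raw counts can be built with the standard library alone. The helper names `document_term_matrix` and `transpose` and the two sample documents are made up for this example, and tokenization is simplified to splitting on white space.

```python
from collections import Counter


def document_term_matrix(docs):
    """Build a DTM: one row per document, one column per term (raw counts)."""
    tokenised = [doc.lower().split() for doc in docs]
    terms = sorted({t for tokens in tokenised for t in tokens})  # column order
    rows = []
    for tokens in tokenised:
        counts = Counter(tokens)
        rows.append([counts[t] for t in terms])
    return terms, rows


def transpose(matrix):
    """Swap rows and columns, turning a DTM into a TDM."""
    return [list(col) for col in zip(*matrix)]


docs = ["good service good food", "bad service"]
terms, dtm = document_term_matrix(docs)
tdm = transpose(dtm)

print(terms)  # ['bad', 'food', 'good', 'service']
print(dtm)    # [[0, 1, 2, 1], [1, 0, 0, 1]]
print(tdm)    # [[0, 1], [1, 0], [2, 0], [1, 1]]
```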
Word Cloud
Positive Word Cloud - words present in positive dictionary.
Negative Word Cloud - words present in negative dictionary.
Bigram - two words that occur next to each other; frequently repeated bigrams give better context of the content than single words.
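The inputs to these word clouds can be sketched without a plotting library: word frequencies mapped to font sizes, the same frequencies filtered through a positive and a negative dictionary, and bigram counts. The dictionaries `POSITIVE` and `NEGATIVE`, the helper names and the size range of 10 to 40 are assumptions chosen for illustration; drawing the cloud itself is left to a visualization package.

```python
from collections import Counter

# Hypothetical miniature sentiment dictionaries, for illustration only.
POSITIVE = {"good", "great", "fast"}
NEGATIVE = {"bad", "slow", "late"}


def font_sizes(counts, min_size=10, max_size=40):
    """Map each word's frequency linearly onto a font size range."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {w: min_size + (c - lo) * (max_size - min_size) // span
            for w, c in counts.items()}


def bigrams(tokens):
    """Pairs of adjacent tokens."""
    return list(zip(tokens, tokens[1:]))


tokens = "good food good service slow delivery good food".split()
counts = Counter(tokens)
sizes = font_sizes(counts)
positive_counts = {w: c for w, c in counts.items() if w in POSITIVE}
negative_counts = {w: c for w, c in counts.items() if w in NEGATIVE}
bigram_counts = Counter(bigrams(tokens))

print(sizes["good"], sizes["food"], sizes["service"])  # 40 25 10
print(positive_counts, negative_counts)                # {'good': 3} {'slow': 1}
print(bigram_counts[("good", "food")])                 # 2
```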
Natural Language Processing (NLP)
Text Analytics is the method of extracting meaningful insights and answering questions from text data.
Natural Language Understanding (NLU)
A process by which an inanimate object (not alive - machines, systems, robots) with computing power is able to comprehend human language, whether spoken or written.
Example: A human talks to a robot, and the robot works out what was meant.
Natural Language Generation (NLG)
A process by which an inanimate object (not alive - machines, systems, robots) with computing power is able to manifest its thoughts in a language that humans are able to understand.
Example: A robot responds to human queries.
Topic Modeling Algorithms
LSA/LSI (Latent Semantic Analysis/Latent Semantic Indexing)
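LSA reduces the dimensionality of the document-term matrix, typically with singular value decomposition (SVD), so that documents and terms can be compared or classified in a smaller latent space. It assumes that words with similar meanings occur in similar pieces of text.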
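One common way to carry out this dimension reduction is a truncated singular value decomposition of the document-term matrix. The sketch below assumes NumPy is available and uses a tiny hand-made matrix; the term list, the counts and the helper names `lsa` and `cosine` are illustrative only.

```python
import numpy as np

# Rows = documents, columns = terms (a tiny hand-made DTM, illustrative only).
terms = ["car", "engine", "wheel", "cake", "sugar"]
dtm = np.array([
    [2, 1, 1, 0, 0],   # doc 0: about cars
    [1, 2, 0, 0, 0],   # doc 1: about cars
    [0, 0, 0, 2, 1],   # doc 2: about baking
    [0, 0, 0, 1, 2],   # doc 3: about baking
], dtype=float)


def lsa(matrix, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]  # document coordinates on the top k components


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


doc_vectors = lsa(dtm, k=2)  # 5 term dimensions reduced to 2 latent ones
print(doc_vectors.shape)
print(cosine(doc_vectors[0], doc_vectors[1]))  # two car documents: close to 1
print(cosine(doc_vectors[0], doc_vectors[2]))  # car vs baking: close to 0
```

In the reduced space the two car documents point in the same direction while the car and baking documents are nearly orthogonal, which is the property a downstream classifier or similarity search would rely on.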
LDA (Latent Dirichlet Allocation)
A topic modelling method that infers topics from word frequencies across documents: each topic is a probability distribution over words, and each document is a mixture of topics.
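To make the idea concrete, here is a very small collapsed Gibbs sampler, one of several ways to fit LDA, written with the standard library only. The function name `lda_gibbs`, the hyperparameter values (`alpha`, `beta`, `iterations`, `seed`) and the toy documents are assumptions for this sketch. In practice a tested library implementation would be used instead, and results on such a tiny corpus should not be over-interpreted.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Tiny collapsed Gibbs sampler for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]                       # topic counts per document
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                                     # tokens per topic
    assign = []                                                        # topic of every token

    for d, doc in enumerate(docs):                  # random initial assignment
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]                    # take the token out of the counts
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Weight of each topic given all other assignments.
                weights = [(doc_topic[d][k] + alpha)
                           * (topic_word[k][w] + beta) / (topic_total[k] + v * beta)
                           for k in range(num_topics)]
                z = rng.choices(range(num_topics), weights=weights)[0]  # resample
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed topic proportions for each document (each row sums to 1).
    return [[(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
            for counts, doc in zip(doc_topic, docs)]


docs = [["car", "engine", "wheel", "car"], ["engine", "car", "wheel"],
        ["cake", "sugar", "flour", "cake"], ["sugar", "cake", "flour"]]
theta = lda_gibbs(docs, num_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```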
Text Summarization
The process of producing a concise version of a text while retaining the important information.
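One simple way to approximate this is extractive summarization: score each sentence by how frequent its words are in the whole text and keep the top-scoring sentences. The sketch below is only one of many possible approaches; the function name `summarize`, the scoring rule and the sample text are assumptions for illustration, and no stop-word removal is applied.

```python
import re
from collections import Counter


def summarize(text, max_sentences=1):
    """Extractive summary: keep the sentences whose words are most frequent overall."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore the original sentence order
    return " ".join(sentences[i] for i in keep)


text = ("Text mining turns text into data. "
        "Text mining finds patterns. "
        "The weather was nice.")
print(summarize(text, max_sentences=1))  # Text mining turns text into data.
```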