Text Mining

  • January 13, 2023

Meet the Author: Mr. Bharani Kumar

Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of Innodatatics Pvt Ltd and 360DigiTMG. An alumnus of IIT and ISB with more than 18 years of experience, he has held prominent positions at firms such as HSBC, ITC Infotech, Infosys, and Deloitte. He is an IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, with more than ten years of experience, and has been making the IT transition journey easy for his students. 360DigiTMG is at the forefront of delivering quality education, thereby bridging the gap between academia and industry.

Text mining is the analysis of unstructured text data by converting it into structured data in key-value form (for example, a keyword and its frequency). A word cloud derives insights from the extracted keywords by arranging them in a plane, with each keyword's font size varying according to its frequency.

Collect the text data / Extract data from sources.

Pre-Process the Data

  • Typos
  • Case - uppercase / lowercase / proper case
  • Punctuation & special symbols (‘%’, ‘!’, ‘&’, etc.)
  • Stop words - filler words, connectors, pronouns (‘all’, ‘for’, ‘of’, ‘my’, ‘to’, etc.)
  • Numbers
  • Extra white space
  • Custom words
  • Stemming - reducing a word to its stem by stripping affixes (the stem need not be a dictionary word)
  • Lemmatization - reducing a word to its dictionary base form (lemma)
  • Tokenization - splitting a sentence into its constituent words (tokens)
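
Several of the steps listed above can be sketched with the Python standard library alone. This is a minimal illustration, not a complete pipeline: the stop-word list and the `preprocess` helper are hypothetical, only ASCII punctuation is handled, and typo correction, stemming and lemmatization are left out because they usually rely on a dedicated NLP library.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word list; real projects use a fuller one.
STOP_WORDS = {"all", "for", "of", "my", "to", "the", "a", "an", "is", "and"}

# Translation table that maps every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def preprocess(text, custom_words=()):
    """Return a list of cleaned tokens from one raw document."""
    text = text.lower()                    # normalise case
    text = re.sub(r"\d+", " ", text)       # drop numbers
    text = text.translate(PUNCT_TO_SPACE)  # drop punctuation and special symbols
    tokens = text.split()                  # tokenize; also collapses extra white space
    drop = STOP_WORDS | set(custom_words)  # stop words plus user-supplied custom words
    return [t for t in tokens if t not in drop]

tokens = preprocess(
    "All of my 3 orders arrived late!!  The   packaging was GREAT.",
    custom_words={"orders"},
)
print(tokens)  # ['arrived', 'late', 'packaging', 'was', 'great']
```

The order of the steps matters: lowercasing before the stop-word lookup is what lets ‘All’ and ‘The’ match the lowercase entries in the list.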

Document Term Matrix / Term Document Matrix

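A matrix with documents arranged in rows and terms arranged in columns is called a Document Term Matrix (DTM); each cell typically holds the frequency of the term in the document. The transpose of the DTM, with terms in rows and documents in columns, is the Term Document Matrix (TDM).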
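
As a rough sketch of the idea, the following builds a count-based DTM and its transpose from three toy documents using only the standard library. The documents and variable names are made up for illustration; in practice a library vectorizer would normally do this.

```python
from collections import Counter

# Toy corpus, assumed to be already pre-processed (lowercase, no punctuation).
docs = [
    "good phone good battery",
    "bad battery",
    "good camera",
]

counts = [Counter(doc.split()) for doc in docs]  # per-document term frequencies
vocab = sorted(set().union(*counts))             # column order: sorted vocabulary

# DTM: one row per document, one column per term.
dtm = [[c[term] for term in vocab] for c in counts]

# TDM: transpose of the DTM, one row per term, one column per document.
tdm = [list(col) for col in zip(*dtm)]

print(vocab)  # ['bad', 'battery', 'camera', 'good', 'phone']
for row in dtm:
    print(row)
```

Here the first row of `dtm` is `[0, 1, 0, 2, 1]`, because "good" occurs twice in the first document and "bad" and "camera" do not occur at all.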

Word Cloud

Positive Word Cloud - a word cloud built only from words present in a positive-sentiment dictionary.

Negative Word Cloud - a word cloud built only from words present in a negative-sentiment dictionary.
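
A minimal sketch of the filtering step behind positive and negative word clouds is shown here. The two dictionaries are hypothetical miniatures; real sentiment lexicons contain thousands of entries. The resulting word-to-frequency mappings are the kind of key-value input a word cloud renderer would size its fonts from.

```python
from collections import Counter

# Hypothetical miniature sentiment dictionaries, for illustration only.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "late", "broken"}

tokens = "great phone but bad battery and late delivery great camera".split()
freq = Counter(tokens)  # keyword -> frequency pairs

# Keep only the words that appear in each dictionary.
positive_freq = {w: n for w, n in freq.items() if w in POSITIVE}
negative_freq = {w: n for w, n in freq.items() if w in NEGATIVE}

print(positive_freq)  # {'great': 2}
print(negative_freq)  # {'bad': 1, 'late': 1}
```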

Bigram - a pair of adjacent words. Bigrams that are frequently repeated give better context of the content than single words do.
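
Counting bigrams can be sketched as follows, assuming the text has already been tokenized. The `bigrams` helper is a hypothetical name used only for this example.

```python
from collections import Counter

def bigrams(tokens):
    """Return every pair of adjacent tokens, in order."""
    return list(zip(tokens, tokens[1:]))

tokens = "battery life is poor but battery life improved".split()
bigram_freq = Counter(bigrams(tokens))

print(bigram_freq.most_common(1))  # [(('battery', 'life'), 2)]
```

The pair ("battery", "life") carries more context than the separate counts for "battery" and "life" would.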

Natural Language Processing (NLP)

Text Analytics is the method of extracting meaningful insights and answering questions from text data.

Natural Language Understanding (NLU)

A process by which a machine (a system or robot with computing power) is able to comprehend human language, whether spoken or written.

Example: A human talks to a robot.

Natural Language Generation (NLG)

A process by which a machine (a system or robot with computing power) is able to express its output in a language that humans are able to understand.

Example: A robot responds to human queries.

POS Tags

Part-of-Speech (PoS) Tagging - the process of labelling each word in a sentence with its part of speech (noun, verb, adjective, etc.).

Named Entity Recognition

Named entities, such as people, places, organizations, quantities and percentages, are usually not present in dictionaries, so they need to be identified and treated separately.

Topic Modeling Algorithms

LSA/LSI (Latent Semantic Analysis/Latent Semantic Indexing)

LSA reduces the dimensionality of the document-term data, for example before classification. It assumes that words with similar meanings will occur in similar pieces of text.

LDA (Latent Dirichlet Allocation)

A topic modelling method that generates topics based on the frequency of words/expressions in documents. It treats each document as a mixture of topics and each topic as a distribution over words.

Text Summarization

The process of producing a concise version of a text while retaining its important information.
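
One simple way to do this is extractive summarization: score each sentence by how frequent its words are in the whole text and keep the top-scoring sentences. The sketch below illustrates that single approach under simplifying assumptions (naive sentence splitting, no stop-word removal); the `summarize` function is a hypothetical helper, and other summarization methods exist.

```python
import re
from collections import Counter

def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    # Naive sentence split: break after '.', '!' or '?' followed by white space.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))  # word frequencies over the whole text

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / max(len(words), 1)

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return [s for s in sentences if s in top]

text = (
    "Text mining turns text into data. "
    "The weather was nice. "
    "Text data needs cleaning."
)
print(summarize(text, n_sentences=2))
```

With this input the off-topic middle sentence scores lowest and is dropped, because none of its words recur elsewhere in the text.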
