Text Mining

January 13, 2023

Meet the Author : Mr. Bharani Kumar

Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of AiSPRY and 360DigiTMG. An alumnus of IIT and ISB with more than 18 years of experience, he has held prominent positions at leading IT firms such as HSBC, ITC Infotech, Infosys, and Deloitte. He is an IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, where for more than ten years he has been making the IT transition journey easy for his students. 360DigiTMG is at the forefront of delivering quality education, thereby bridging the gap between academia and industry.

Pre-Process the Data

Typos
Case - uppercase / lowercase / proper case
Punctuation & special symbols (‘%’, ‘!’, ‘&’, etc.)
Filler words (stop words), connectors, pronouns (‘all’, ‘for’, ‘of’, ‘my’, ‘to’, etc.)
Numbers
Extra white space
Custom words
Stemming - trimming a word to its stem by stripping suffixes (e.g. ‘running’ becomes ‘run’)
Lemmatization - reducing a word to its dictionary form (e.g. ‘better’ becomes ‘good’)
Tokenization - the process of splitting a sentence into its constituent words (tokens)
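
Several of the steps above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a production pipeline: the stop-word list, the `preprocess` function and its `custom_words` parameter are assumptions made for this example. Typo correction, stemming and lemmatization are left out because they usually rely on a dedicated library such as NLTK.

```python
import re
import string

# Illustrative stop-word list (an assumption); real projects use a fuller list.
STOP_WORDS = {"all", "for", "of", "my", "to", "the", "a", "is", "and"}

# Translation table that turns every ASCII punctuation character into a space.
PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def preprocess(text, custom_words=()):
    """Return a list of cleaned tokens from raw text."""
    text = text.lower()                      # normalise case
    text = re.sub(r"\d+", " ", text)         # drop numbers
    text = text.translate(PUNCT_TO_SPACE)    # drop punctuation and special symbols
    tokens = text.split()                    # tokenise; also collapses extra white space
    drop = STOP_WORDS | set(custom_words)    # filler words plus custom words
    return [t for t in tokens if t not in drop]


print(preprocess("My  phone is 100% GREAT, thanks to all of you!",
                 custom_words={"phone"}))
# → ['great', 'thanks', 'you']
```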

Document Term Matrix / Term Document Matrix

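A matrix with documents arranged in rows and terms arranged in columns is called a Document Term Matrix (DTM). Each cell typically holds the number of times the term occurs in the document. The transpose of the DTM, with terms in rows and documents in columns, is the Term Document Matrix (TDM).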
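
As a rough sketch, a DTM and its transpose can be built with plain Python lists. The three short documents below are made up for illustration, and the variable names (`docs`, `terms`, `dtm`, `tdm`) are assumptions of this example.

```python
from collections import Counter

# Three made-up, already pre-processed documents.
docs = ["good phone good battery", "bad battery", "good camera"]

# Count term occurrences per document.
counts = [Counter(doc.split()) for doc in docs]

# Vocabulary: every distinct term, sorted so the column order is stable.
terms = sorted(set().union(*counts))

# DTM: one row per document, one column per term.
dtm = [[c[t] for t in terms] for c in counts]

# TDM: the transpose, one row per term, one column per document.
tdm = [list(col) for col in zip(*dtm)]

print(terms)  # → ['bad', 'battery', 'camera', 'good', 'phone']
print(dtm)    # → [[0, 1, 0, 2, 1], [1, 1, 0, 0, 0], [0, 0, 1, 1, 0]]
```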

Word Cloud

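Positive Word Cloud - a word cloud built only from the words that are present in a dictionary of positive words.

Negative Word Cloud - a word cloud built only from the words that are present in a dictionary of negative words.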
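
The filtering behind positive and negative word clouds can be sketched as follows. The two tiny dictionaries and the `sentiment_frequencies` function are assumptions made for this example; real sentiment dictionaries contain thousands of words. The sketch only produces the word frequencies and leaves the drawing of the cloud to a plotting tool.

```python
from collections import Counter

# Tiny illustrative dictionaries (assumptions); real lexicons are much larger.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "poor", "slow"}


def sentiment_frequencies(tokens):
    """Split word counts into positive-dictionary and negative-dictionary words."""
    counts = Counter(tokens)
    positive = {w: n for w, n in counts.items() if w in POSITIVE}
    negative = {w: n for w, n in counts.items() if w in NEGATIVE}
    return positive, negative


tokens = "great phone good camera great screen slow charging bad speaker".split()
pos, neg = sentiment_frequencies(tokens)
print(pos)  # → {'great': 2, 'good': 1}
print(neg)  # → {'slow': 1, 'bad': 1}
```

Words that appear in neither dictionary, such as 'phone' or 'screen', are simply left out of both clouds.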

Bigram - a pair of words that appear next to each other. Counting bigrams instead of single words gives better context of the content.
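
A minimal sketch of bigram counting, assuming the text has already been tokenized. The `bigram_counts` function name and the sample sentence are made up for this example.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count each pair of adjacent tokens."""
    # zip pairs every token with its successor.
    return Counter(zip(tokens, tokens[1:]))


tokens = "battery life is good but battery life could be better".split()
counts = bigram_counts(tokens)
print(counts.most_common(1))  # → [(('battery', 'life'), 2)]
```

Here the single word 'battery' says little, while the bigram 'battery life' shows what the text is actually about.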

Natural Language Processing (NLP)

Text Analytics is the method of extracting meaningful insights and answering questions from text data.

Natural Language Understanding (NLU)

A process by which a machine (a system or robot with computing power) is able to comprehend human language, whether spoken or written.

Example: a human talks to a robot

Natural Language Generation (NLG)

A process by which a machine (a system or robot with computing power) is able to express its output in a language that humans are able to understand.

Example: a robot responds to human queries

POS Tags

Parts of Speech (PoS) Tagging - the process of identifying the part of speech (noun, verb, adjective, etc.) of each word in a sentence and labelling the word accordingly.
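
To make the idea concrete, here is a toy lookup tagger. It is only a sketch: the hand-written `LEXICON`, the tag names and the `tag` function are assumptions made for this example. Real taggers, such as those shipped with NLTK or spaCy, are trained on annotated text and use the surrounding words to decide between tags.

```python
# Hand-written lexicon (an assumption for illustration only).
LEXICON = {
    "the": "DET", "a": "DET",
    "robot": "NOUN", "human": "NOUN", "question": "NOUN",
    "answers": "VERB", "asks": "VERB",
    "quickly": "ADV",
}


def tag(sentence, default="NOUN"):
    """Label each word with a part of speech; unknown words get the default tag."""
    return [(w, LEXICON.get(w.lower(), default)) for w in sentence.split()]


print(tag("The robot answers quickly"))
# → [('The', 'DET'), ('robot', 'NOUN'), ('answers', 'VERB'), ('quickly', 'ADV')]
```

A lookup like this cannot tell that 'answers' is a noun in "the answers were wrong", which is why practical taggers look at context.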

Named Entity Recognition

Named entities are usually not present in dictionaries, so they need to be treated separately. Examples include people, places, organizations, quantities and percentages.
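
One simple way to treat such entities separately is to combine a list of known names with a pattern for numeric entities. The sketch below is a toy: the `GAZETTEER` entries, the example sentence and the `find_entities` function are all made up for illustration. Practical systems rely on trained models rather than fixed lists.

```python
import re

# Small hand-made list of known names (an assumption for illustration).
GAZETTEER = {
    "Asha": "PERSON",
    "Acme Corp": "ORGANIZATION",
    "Hyderabad": "PLACE",
}

# Matches percentages such as 12% or 12.5%.
PERCENT = re.compile(r"\d+(?:\.\d+)?%")


def find_entities(text):
    """Return (entity, label) pairs found by name lookup and by pattern."""
    # Naive substring lookup; good enough for a sketch.
    entities = [(name, label) for name, label in GAZETTEER.items() if name in text]
    entities += [(m.group(), "PERCENTAGE") for m in PERCENT.finditer(text)]
    return entities


print(find_entities("Asha joined Acme Corp in Hyderabad and raised sales by 12.5%."))
# → [('Asha', 'PERSON'), ('Acme Corp', 'ORGANIZATION'), ('Hyderabad', 'PLACE'), ('12.5%', 'PERCENTAGE')]
```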
