Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of AiSPRY and 360DigiTMG. An alumnus of IIT and ISB with more than 17 years of experience, he has held prominent positions at leading IT firms such as HSBC, ITC Infotech, Infosys, and Deloitte. He is an IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence, and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, where he has more than ten years of experience helping students make the transition into IT.
Vivi: Hey, Navi.
Navi: Yes, Vivi?
Vivi: When did your son start speaking complete sentences?
Navi: When he was about 2 years old.
Vivi: Oh, is that so? My daughter started speaking full sentences when she was about 1 year old.
Navi: Interesting! How did that happen, Vivi?
Vivi: My family and I used to talk to our daughter a lot, so she picked up the language faster.
The exchange above suggests that a child picks up new words, word groups, and phrases more quickly when they are used often in speech. In the same way, a natural language processing system learns a language more quickly and accurately the more words, groups of words, and sentences we train it on.
For text analysis and natural language processing, Python has a library called NLTK (Natural Language Toolkit).
What is Natural Language Processing?
Natural language processing (NLP) is a set of techniques that enable a computer to comprehend written or spoken language. Humans talk with one another in a common language so that they can understand one another's viewpoints and respond appropriately. In an NLP system, a machine rather than a human handles the interaction, comprehension, and response.
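To use NLTK in a Python program, first install it with pip install nltk. Then, in a Python session, run import nltk followed by nltk.download() to fetch the data packages (tokenizer models, WordNet, and the stop-word list) that the examples in this article rely on.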
The first step in processing text with NLTK is tokenization. Tokenizing is the process of breaking text into smaller parts, i.e. paragraphs into sentences and sentences into words. There are two types of tokenizers: a sentence tokenizer (sent_tokenize) and a word tokenizer (word_tokenize).
>>> from nltk.tokenize import sent_tokenize
>>> sampletext = "Artificial Intelligence is sometimes called Machine Intelligence. It is intelligence demonstrated by machines"
>>> sent_tokenize(sampletext)
Output: ['Artificial Intelligence is sometimes called Machine Intelligence.', 'It is intelligence demonstrated by machines']
>>> from nltk.tokenize import word_tokenize
>>> sampletext = "Artificial Intelligence is sometimes called Machine Intelligence"
>>> word_tokenize(sampletext)
Output: ['Artificial', 'Intelligence', 'is', 'sometimes', 'called', 'Machine', 'Intelligence']
Stemming is the process of normalizing words by reducing them to a common root form. A single root word can appear in several variants; the root play, for instance, has variants such as plays, playing, and played. Stemming lets us recover the root from any of these variants.
NLTK implements the Porter stemming algorithm in its PorterStemmer class, which reduces each tokenized word to its root.
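A minimal sketch of stemming with PorterStemmer follows. The input words are an assumption chosen for illustration, since the original listing is not shown; the stemmer itself needs no extra data download.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Assumed sample input: four variants of the same root word.
words = ["call", "calls", "calling", "called"]

# Reduce every variant to its stem.
stems = [stemmer.stem(word) for word in words]
print(" ".join(stems))
```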
Output: call call call call
Lemmatization is the computational process of determining a word's lemma (its dictionary form) based on its meaning. Stemming, by contrast, simply strips a suffix or prefix from the word, so the result is not always a real word. Lemmatization is considered the more intelligent of the two because it finds the correct form by consulting a lexicon, and it therefore tends to produce better machine learning features.
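The difference can be sketched by running both tools on the same words. This is an illustrative example rather than the article's original listing; it assumes the WordNet data has already been fetched once with nltk.download('wordnet'), and prints a hint if it has not.

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

words = ["tries", "cries"]
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# Stemming strips suffixes by rule and needs no extra data.
for word in words:
    print(f"Stemming for {word} is {stemmer.stem(word)}")

# Lemmatization looks each word up in WordNet (treated as a noun by default).
try:
    for word in words:
        print(f"Lemma for {word} is {lemmatizer.lemmatize(word)}")
except LookupError:
    print("WordNet data not found; run nltk.download('wordnet') once first.")
```

The Porter stemmer returns the truncated stems tri and cri, which are not dictionary words, while the lemmatizer returns try and cry.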
Output:
Stemming for tries is tri
Stemming for cries is cri
Output:
Lemma for tries is try
Lemma for cries is cry
WordNet is a lexical database of English, available through NLTK, that groups words into sets of synonyms and also provides antonyms and brief definitions.
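As a rough sketch, WordNet can be queried through nltk.corpus.wordnet. The helper name summarize and the sample word "good" are hypothetical, chosen only for illustration, and the example assumes the data was fetched once with nltk.download('wordnet').

```python
from nltk.corpus import wordnet


def summarize(word):
    """Return (first definition, synonyms, antonyms) for a word from WordNet."""
    synsets = wordnet.synsets(word)
    synonyms, antonyms = set(), set()
    for synset in synsets:
        for lemma in synset.lemmas():
            synonyms.add(lemma.name())
            for antonym in lemma.antonyms():
                antonyms.add(antonym.name())
    definition = synsets[0].definition() if synsets else None
    return definition, sorted(synonyms), sorted(antonyms)


try:
    definition, synonyms, antonyms = summarize("good")
    print("Definition:", definition)
    print("Synonyms:", synonyms)
    print("Antonyms:", antonyms)
except LookupError:
    print("WordNet data not found; run nltk.download('wordnet') once first.")
```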
Stop words are very common words that carry little meaning on their own. Removing them is one of the standard pre-processing steps in text processing, because it reduces noise in the data.
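Stop-word removal can be sketched as follows. The sample sentence is an assumption, since the original listing is not shown. The example uses NLTK's English stop-word list and, if that corpus has not been downloaded with nltk.download('stopwords'), falls back to a tiny hand-written set for demonstration only. It tokenizes with wordpunct_tokenize, which is regex-based and needs no data download.

```python
from nltk.corpus import stopwords
from nltk.tokenize import wordpunct_tokenize

# Assumed sample sentence for illustration.
text = "Find the frequency of a word from a text file!"

try:
    stop_words = set(stopwords.words("english"))
except LookupError:
    # Fallback for demonstration only; not NLTK's full list.
    stop_words = {"the", "of", "a", "from"}

tokens = wordpunct_tokenize(text)

# Keep only the tokens that are not stop words (case-insensitive match).
filtered = [token for token in tokens if token.lower() not in stop_words]
print(filtered)
```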
Output: ['Find', 'frequency', 'word', 'text', 'file', '!']
Stop words such as 'of', 'from', and 'a' are removed from the text data.