Learn & Let Learn: Chapter 2 - Introduction to Natural Language Processing

Mobile App Developer with around 8 years of experience in the tech industry.
Hey there!
NLP stands for Natural Language Processing. This is an introductory blog on NLP to make you aware of the basics, how to get started, and some future scope. Let's start with the basics, shall we?
What is Natural Language Processing?
In simple words, it is broadly defined as the automatic manipulation of natural language, such as speech and text, by software.
According to Wikipedia: It is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data.
It has been around for several decades, growing with the rise of computers and more powerful machines. Understanding what natural language means is quite a difficult job, because the meaning depends on the context of the person who is speaking.
It is hard from the standpoint of the child, who must spend many years acquiring a language … it is hard for the adult language learner, it is hard for the scientist who attempts to model the relevant phenomena, and it is hard for the engineer who attempts to build systems that deal with natural language input or output. These tasks are so hard that Turing could rightly make fluent conversation in natural language the centerpiece of his test for intelligence.
- Page 248, Mathematical Linguistics, 2010.
Okay, let's make it simple to understand:
Suppose you are a 10-year-old and you have asked a question (let's say, about the Theory of Relativity).
The answer should be phrased so that a 10-year-old can understand it. Only then will it be useful.
But the same answer will not be as informative to a 20-year-old, a 30-year-old, or a scientist who is actually researching the topic. So, the explanation has to differ based on who is asking, so that it makes sense and is informative at the same time.
This is where Machine Learning and Artificial Intelligence come into the picture. Artificial Intelligence is the larger bubble that comprises both Natural Language Processing and Machine Learning. The three terms are often used interchangeably, although they are not the same thing.
Ways to process Natural Language
- Syntactic Analysis
- Semantic Analysis
Syntactic Analysis
Parsing the content (input) based on basic grammar rules, sentence formation, the way the words are organized, and how the words relate to each other.
It includes the following sub-tasks:
- Tokenization - breaking the content into smaller units such as words, phrases, or sentences
- Part of speech (PoS) tagging - labelling each word as a verb, adverb, noun, adjective, and so on, to get the basic context of the content. It works on the output of tokenization
- Lemmatization & Stemming - reducing inflected words to a base form. It makes analysis easier
- Removal of Stop Words - removal of fillers and frequently used words (e.g., I, they, have, like, suppose)
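To make the idea of stemming a little more concrete, here is a minimal sketch in Python. It is a toy suffix stripper, not a real stemming algorithm, and both the `naive_stem` function and the `SUFFIXES` list are made up for this example.

```python
# Toy suffix-stripping stemmer: only an illustration, not a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical, hand-picked suffix list


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        # Only strip when enough of the word is left to be meaningful.
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


for word in ["walking", "walked", "walks", "Running"]:
    print(word, "->", naive_stem(word))
```

With this sketch, "walking", "walked", and "walks" all reduce to "walk", while "Running" becomes "runn". That last result shows why real stemmers and lemmatizers need far more careful rules than this.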
Semantic Analysis
It focuses on understanding the meaning of the content.
It includes the following sub-tasks:
- Word Sense Disambiguation - understanding which meaning of a particular word is intended in a given sentence or group of sentences
- Relation Extraction - tries to identify relationships between entities in the content (e.g., between places, people, and organizations)
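Word sense disambiguation can be sketched with a very simple idea: look at the words around the ambiguous word and pick the sense that shares the most clue words with them. The snippet below is only a toy illustration of that idea. The `SENSES` dictionary and the `guess_sense` function are hypothetical and hand-written for the word "bank"; a real system would rely on a proper dictionary or a trained model.

```python
# Hypothetical, hand-written sense inventory for the word "bank".
SENSES = {
    "financial institution": {"money", "deposit", "loan", "account", "cash"},
    "river side": {"river", "water", "shore", "fishing", "boat"},
}


def guess_sense(sentence: str) -> str:
    """Pick the sense whose clue words overlap most with the sentence."""
    words = set(sentence.lower().split())
    # On a tie, max() keeps the first sense listed in SENSES.
    return max(SENSES, key=lambda sense: len(SENSES[sense] & words))


print(guess_sense("She went to the bank to deposit money"))
print(guess_sense("We sat on the bank of the river fishing"))
```

The first sentence matches "deposit" and "money", so the guess is "financial institution". The second matches "river" and "fishing", so the guess is "river side".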
Uses of NLP
- Sentiment Analysis
- Language Translation
- Text Extraction / Summarization
- Chat Bots
- Improving Analytical Tools
- Classification
And the list goes on. Please mention in the comment section if you find any interesting application of NLP. I would love to learn and get a new perspective.
How to Kick Start?
- Programming prerequisite: learn the basics and syntax of Python
- Install a suitable IDE for your workspace. I recommend PyCharm
- Let's start with the basics, using Syntactic Analysis (as there are a lot of open-source projects and models that are good for learning).
- Let's try Tokenization: separating all the words in the given content. Logic: since the words in the content are separated by spaces, write logic to split the content on spaces.
- Removal of Stop Words: once you have all the words separated, delete all the stop words from the list (you can find a Python package that contains the stop words for a particular language)
- Add normalization: there is a lot of open-source normalization logic available. You can use any of it. To create your own, stay tuned! (I will cover this part in detail in my upcoming blogs)
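The steps above can be sketched in a few lines of plain Python. Treat this as a minimal starting point under some assumptions: the `STOP_WORDS` set is a tiny hand-picked list made up for this example, "normalization" here only means lowercasing and stripping punctuation, and all function names are my own.

```python
import string

# Hypothetical, hand-picked stop-word list. Real projects usually load a
# larger, language-specific list from a library.
STOP_WORDS = {"i", "they", "have", "a", "an", "the", "is", "are", "to", "of", "and"}


def tokenize(text: str) -> list:
    """Split the text on whitespace."""
    return text.split()


def normalize(tokens: list) -> list:
    """Lowercase each token and strip punctuation from both ends."""
    cleaned = (token.lower().strip(string.punctuation) for token in tokens)
    # Drop tokens that were pure punctuation and are now empty.
    return [token for token in cleaned if token]


def remove_stop_words(tokens: list) -> list:
    """Keep only the tokens that are not in the stop-word list."""
    return [token for token in tokens if token not in STOP_WORDS]


def preprocess(text: str) -> list:
    """Tokenize, normalize, then remove stop words."""
    return remove_stop_words(normalize(tokenize(text)))


print(preprocess("They have a Dog, and the dog is happy!"))
# ['dog', 'dog', 'happy']
```

Note that this sketch normalizes before removing stop words, so that "They" and "the" both match the lowercase entries in the list. If you remove stop words first, you need to handle capital letters and punctuation in the comparison yourself.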
Future Scope
- Learn the basics by practicing, using open source projects already available
- Create your own logic for NLP
- Extracting meaningful insights
- Creating Chat Bots
- DIY: custom models and analyzer tools
Do share your thoughts on this topic in the comments section. I would love to know your perspective on it.
Thank you for reading! If you have read this far, please like the article. It will encourage me to write more such articles. Do share your valuable suggestions; I appreciate your honest feedback!
And with that, it's a wrap! I hope you found the article useful. I write about career, blogging, programming, and productivity. If this is something that interests you, please share the article with your friends and connections.
