Demystifying NLP and how Tweetsense uses it
Natural Language Processing, or NLP for short, is much simpler than it sounds. All it means is that a computer is processing words instead of other kinds of data. For example, if you open up an Excel sheet, you can sum a column of numbers or multiply the entries in one column by those in another, so long as they’re all numbers. But if you want to search a document for a certain word, the application has to work with text instead. Natural language processing is the whole field concerned with processing English or any other human-readable text. This post just goes over what Tweetsense does at a high level.
Say you have the following sentence:
The quick blue democrat jumped over the rambling conservative but the democrat was beaten by Bloomberg.
How would a computer categorize this information? Well, there is no single right answer (the best answer would be to use machine learning or artificial intelligence, but that’s in the next post). Let’s focus on the sentence itself. Remember in grade school when you were learning things like “noun”, “verb”, “subject”, “predicate” and “subordinate clauses”? While we humans have the luxury of not caring anymore about things like modal verbs or the subjunctive mood, this is how natural language programmers make computers understand human languages. A computer can analyze the example sentence by labeling each word with its part of speech and working out how the words group into clauses.
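To make the idea of categorizing words concrete, here is a minimal sketch of a part-of-speech tagger in Python. It is not how Tweetsense works internally; the `LEXICON` table, the tag names, and the `tag` function are all made up for this illustration, and the lexicon only knows the words in our example sentence. Real taggers learn these labels from large amounts of text instead of a hand-written table.

```python
import re

# Hypothetical hand-written lexicon: it covers only the example sentence.
LEXICON = {
    "the": "DET",
    "quick": "ADJ",
    "blue": "ADJ",
    "rambling": "ADJ",
    "democrat": "NOUN",
    "conservative": "NOUN",
    "bloomberg": "PROPN",
    "jumped": "VERB",
    "beaten": "VERB",
    "was": "AUX",
    "over": "ADP",
    "by": "ADP",
    "but": "CCONJ",
}


def tag(sentence):
    """Return (word, tag) pairs; words missing from the lexicon get 'X'."""
    words = re.findall(r"[A-Za-z]+", sentence)
    return [(word, LEXICON.get(word.lower(), "X")) for word in words]


sentence = ("The quick blue democrat jumped over the rambling conservative "
            "but the democrat was beaten by Bloomberg.")
tagged = tag(sentence)

# Print one word per line with its label next to it.
for word, label in tagged:
    print(f"{word:<13}{label}")
```

Every word in the sentence ends up with a label such as determiner, adjective, noun or verb, which is the raw material a program needs before it can reason about who did what to whom.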
By categorizing every bit of the preceding sentence, we can get a computer to rewrite the sentence as:
Bloomberg beat the quick blue democrat who jumped over the rambling conservative.
Because we were able to get the computer to understand how the sentence was structured and what is considered “proper”, we were able to automatically rewrite a sentence from its passive form (something being beaten by Bloomberg) to its stronger and more concise active form (Bloomberg beating something) without losing any important information!
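The passive-to-active step can be sketched in a few lines of Python. This is only an illustration under heavy assumptions, not Tweetsense's actual method: it handles a single clause of the shape "X was VERBed by Y", and the `PAST_TENSE` table, the `PASSIVE` pattern, and the `to_active` function are hypothetical names invented for this example. Merging two clauses into one sentence, as in the rewrite above, needs the fuller grammatical analysis described earlier.

```python
import re

# Hypothetical lookup from past participle to simple past tense.
# A real system would need a full dictionary of verb forms.
PAST_TENSE = {"beaten": "beat", "defeated": "defeated", "written": "wrote"}

# Matches a single clause of the form "<patient> was <participle> by <agent>."
PASSIVE = re.compile(
    r"^(?P<patient>.+?) (?:was|were) (?P<verb>\w+) by (?P<agent>.+?)\.?$"
)


def to_active(clause):
    """Rewrite a simple passive clause in the active voice.

    The clause is returned unchanged if it does not match the pattern
    or if the verb is not in PAST_TENSE.
    """
    match = PASSIVE.match(clause)
    if match is None or match["verb"] not in PAST_TENSE:
        return clause

    patient = match["patient"]
    agent = match["agent"]

    # The patient moves to the end, so a leading article loses its capital.
    if patient.split()[0].lower() in {"the", "a", "an"}:
        patient = patient[0].lower() + patient[1:]
    # The agent now starts the sentence, so capitalize its first letter.
    agent = agent[0].upper() + agent[1:]

    return f"{agent} {PAST_TENSE[match['verb']]} {patient}."


print(to_active("The democrat was beaten by Bloomberg."))
```

Running this on the second half of our example sentence prints "Bloomberg beat the democrat." Anything the pattern does not recognize is passed through untouched, which is a deliberately cautious choice for a sketch like this one.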
This is just one example of the power of natural language processing and a part of what Tweetsense can do. In the next post, we’ll discuss how computers can extract long-term meaning from sentences and even entire documents.