Aspect Extraction! Part of a Series on Aspect Based Sentiment Analysis (ABSA)

This is the second blog in the series on Aspect Based Sentiment Analysis. In this blog, I will give a very brief introduction to ABSA and an overview of the techniques that can be used for Aspect Extraction. The next blog will continue from this one and discuss aspect-category formation and detection, and the last blog will cover sentiment polarity prediction.

Aspect Based Sentiment Analysis:

Aspect Based Sentiment Analysis (ABSA) is a problem in the field of Natural Language Processing (NLP) that has drawn, and continues to draw, the interest of service-based and product-based companies that want to understand their overall performance in the market.

ABSA is a series of analyses performed on the feedback customers give in the form of reviews. From this set of reviews, organizations try to perform Aspect Extraction, Aspect Category Formation and Detection, and Sentiment Polarity Classification. The image below shows a concise flow of the ABSA process.

Note: There is no hard and fast rule about the order of aspect categorization and sentiment polarity classification.

Why is ABSA important? The aspects derived from the reviews give insight into the properties of the product that customers are concerned about. These can be positive or negative. These insights help the organization identify the areas where it needs to improve.

Flow of ABSA

Aspect Extraction:

Aspect Extraction is the process of finding the phrases in which customers raise their issues or describe their experience with the properties of a product or service. These phrases are called Aspect Terms. For instance, in the review ‘while wearing the diaper got torn. It says medium but size is small.’, the aspects are {‘wearing the diaper got torn’, ‘says medium but size is small’}.

This raises another question: how do we find such aspect terms? There are several techniques that can be used:

  • Regex patterns over POS tags
  • Spacy dependency parser
  • NLTK parsers (the genie of the NLP world!)

Extraction using Regex patterns

How is regex going to work on reviews, and what about its power to generalize? It is true that regex works on substrings. For a string like ‘While my baby was playing diaper starts leaking’, no one wants to write separate rules for ‘leak’, ‘leaks’, and so on. So the regex will not be built on the words themselves; it will be built on other information that each word carries!

The text carries other information within it: the Part of Speech of each word. POS tags. Tada! Whether a word is a NOUN, an ADJECTIVE, or an ADVERB, a set of rules can be built on this information. Because the rules are written over POS tags, they generalize well enough to work on any set of reviews (with a few tweaks, of course). The image below shows the POS tags of the words in the sentence ‘This is sentiment analysis’.

POS Tags of the words in the sentence “This is sentiment analysis”
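To make this concrete, here is a minimal sketch of how a tagged sentence can be turned into a single string of tags that a regex can run over. The tags are hand-assigned Universal POS tags of the kind Spacy reports in `token.pos_`; they are assumptions for illustration, not real tagger output, and `tag_string` and the `<NOUN><VERB>` rule are hypothetical.

```python
import re

# Hand-assigned (word, Universal POS tag) pairs. In practice a tagger such as
# Spacy (token.pos_) would produce them; these tags are illustrative assumptions.
SENTENCE = [("This", "PRON"), ("is", "AUX"), ("sentiment", "NOUN"), ("analysis", "NOUN")]

# Two review fragments that differ only in the form of the verb.
LEAKS = [("diaper", "NOUN"), ("leaks", "VERB")]
LEAKING = [("diaper", "NOUN"), ("leaking", "VERB")]


def tag_string(tagged):
    """Join the POS tags into one string such as '<PRON><AUX><NOUN><NOUN>'."""
    return "".join(f"<{tag}>" for _, tag in tagged)


print(tag_string(SENTENCE))  # <PRON><AUX><NOUN><NOUN>

# A hypothetical tag-level rule: a noun followed by a verb.
NOUN_THEN_VERB = r"<NOUN><VERB>"
for fragment in (LEAKS, LEAKING):
    # Different word forms produce the same tag string, so one rule covers both.
    print(bool(re.fullmatch(NOUN_THEN_VERB, tag_string(fragment))))  # True
```

The point of the sketch is the last loop: ‘leaks’ and ‘leaking’ never need separate rules, because the regex only ever sees their tags.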

To get the POS tags, either the Spacy or the NLTK library can be used. Spacy is preferred in the corporate world because it is flexible and easy to use: its pipelines are built on neural network models and provide text normalization, POS tagging, and Named Entity Recognition (NER) in a single place, so there is no need to decide which lemmatizer to use.

Building the first regex POS tag rule:

Rules are essentially patterns over these POS tags. They should be built so that they handle the order of the tags in the sentence while still generalizing, and this requires some analysis of the data. Suppose the requirement is to get phrases in which an adjective describes a property of a noun, such as ‘bad product’, ‘good quality’, or ‘not good <brand name> product’. After analyzing the POS tag patterns, the developer might come up with “((<DET>)?)((<PART>)?)(<ADJ>)((<PROPN>)*)((<NOUN>))*”.
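Below is a minimal sketch of how this rule could be applied, assuming each sentence has already been tagged with Universal POS tags (the tag set Spacy uses for `token.pos_`). The helper `extract_phrases`, the hand-assigned tags, and the brand name ‘Acme’ are hypothetical. The sketch concatenates the tags into one string, runs the pattern with Python's `re` module, and maps each match back to the words it covers.

```python
import re

# The rule from the text: optional determiner, optional particle (e.g. 'not'),
# an adjective, then any proper nouns and nouns that follow it.
RULE = r"((<DET>)?)((<PART>)?)(<ADJ>)((<PROPN>)*)((<NOUN>))*"


def extract_phrases(tagged, pattern=RULE):
    """Return the word spans whose POS tag sequence matches `pattern`."""
    tag_string = ""
    starts = []  # character offset at which each token's tag begins
    for _, tag in tagged:
        starts.append(len(tag_string))
        tag_string += f"<{tag}>"
    starts.append(len(tag_string))  # offset just past the last tag
    index_of = {offset: i for i, offset in enumerate(starts)}

    phrases = []
    for match in re.finditer(pattern, tag_string):
        # Matches begin at '<' and end after '>', so both ends fall on token boundaries.
        first, last = index_of[match.start()], index_of[match.end()]
        phrases.append(" ".join(word for word, _ in tagged[first:last]))
    return phrases


# Hand-tagged example reviews; 'Acme' stands in for a brand name.
REVIEWS = [
    [("It", "PRON"), ("is", "AUX"), ("a", "DET"), ("bad", "ADJ"), ("product", "NOUN")],
    [("good", "ADJ"), ("quality", "NOUN")],
    [("not", "PART"), ("good", "ADJ"), ("Acme", "PROPN"), ("product", "NOUN")],
]
for review in REVIEWS:
    print(extract_phrases(review))
# ['a bad product']
# ['good quality']
# ['not good Acme product']
```

Working on the tag string rather than on the words is what keeps the rule short: the same pattern covers any adjective and any noun.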

The image below shows the results of this pattern on a few sentences. (Remember that this rule targets phrases in which adjectives are followed by nouns. Another rule can be built in which nouns come before adjectives, and so on.)

Results of regex pattern mentioned above on some sentences.
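Following the note about nouns coming before adjectives, a second rule can target phrases such as ‘size is small’. The sketch below is self-contained: it turns the tags into one string, matches each rule with Python's `re` module, and maps the matches back to words. The second pattern, the rule names, and the hand-assigned tags are illustrative assumptions, not rules from the original analysis.

```python
import re

RULES = {
    # The adjective-before-noun rule from the text.
    "adj_before_noun": r"((<DET>)?)((<PART>)?)(<ADJ>)((<PROPN>)*)((<NOUN>))*",
    # Hypothetical rule: one or more nouns, an optional auxiliary ('is'),
    # an optional particle ('not'), then an adjective.
    "noun_before_adj": r"(<NOUN>)+(<AUX>)?(<PART>)?(<ADJ>)",
}


def extract_phrases(tagged, pattern):
    """Return the word spans whose POS tag sequence matches `pattern`."""
    tag_string, starts = "", []
    for _, tag in tagged:
        starts.append(len(tag_string))
        tag_string += f"<{tag}>"
    starts.append(len(tag_string))
    index_of = {offset: i for i, offset in enumerate(starts)}
    return [
        " ".join(word for word, _ in tagged[index_of[m.start()]:index_of[m.end()]])
        for m in re.finditer(pattern, tag_string)
    ]


def apply_rules(tagged, rules=RULES):
    """Run every rule over one tagged sentence and collect the matches per rule."""
    return {name: extract_phrases(tagged, pattern) for name, pattern in rules.items()}


# Hand-tagged examples; the tags are assumptions, not real tagger output.
print(apply_rules([("size", "NOUN"), ("is", "AUX"), ("small", "ADJ")]))
# {'adj_before_noun': ['small'], 'noun_before_adj': ['size is small']}
print(apply_rules([("the", "DET"), ("product", "NOUN"), ("is", "AUX"),
                   ("not", "PART"), ("good", "ADJ")]))
# {'adj_before_noun': ['not good'], 'noun_before_adj': ['product is not good']}
```

Keeping each rule as a separate named pattern makes it easy to see which rule produced which phrase, and to add or tweak rules after looking at new data.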

Other ways of Aspect Extraction

As already mentioned, one way is to use regex patterns over POS tags; another is to use the Spacy dependency parser. An example of the Spacy dependency parser output is shown in the image below:

Spacy dependency parser example
Spacy dep API credits: https://explosion.ai/demos/displacy

Spacy shows how the words are connected to each other. Cool!!! Isn’t it? But it comes with some issues. The Spacy dependency parser labels each connection with the type of relation between the two words. For instance, in the image, ‘this’ is connected to ‘is’ as ‘nsubj’, the nominal subject. So in this sentence, ‘this’ is the subject and ‘sentiment analysis’ is the attribute of that subject. Rules can therefore be formed on these dependencies.
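To show what a rule on dependencies can look like, here is a minimal sketch that extracts (subject, attribute) pairs. The parse of ‘This is sentiment analysis’ is hand-written in the shape Spacy produces (a dependency label per token, as in `token.dep_`, and a head token, as in `token.head`); the labels are assumptions read off the example, and the helper names are hypothetical. With a real Spacy `Doc`, `token.children` and `token.subtree` would take the place of the two helper functions.

```python
# Hand-written parse of "This is sentiment analysis" in the shape Spacy returns:
# (word, dependency label, index of the head token). The ROOT is its own head.
PARSE = [
    ("This", "nsubj", 1),
    ("is", "ROOT", 1),
    ("sentiment", "compound", 3),
    ("analysis", "attr", 1),
]


def children(parse, head):
    """Indices of the tokens attached directly to `head`."""
    return [i for i, (_, _, h) in enumerate(parse) if h == head and i != head]


def subtree(parse, i):
    """Token i plus all of its descendants, in sentence order."""
    indices = [i]
    for child in children(parse, i):
        indices.extend(subtree(parse, child))
    return sorted(indices)


def phrase(parse, i):
    """The words of token i's subtree joined into one string."""
    return " ".join(parse[j][0] for j in subtree(parse, i))


def subject_attribute_pairs(parse):
    """Rule: for each ROOT, pair its 'nsubj' subtree with its 'attr' subtree."""
    pairs = []
    for root, (_, dep, _) in enumerate(parse):
        if dep != "ROOT":
            continue
        kids = children(parse, root)
        subjects = [k for k in kids if parse[k][1] == "nsubj"]
        attributes = [k for k in kids if parse[k][1] == "attr"]
        pairs.extend((phrase(parse, s), phrase(parse, a)) for s in subjects for a in attributes)
    return pairs


print(subject_attribute_pairs(PARSE))  # [('This', 'sentiment analysis')]
```

Even this tiny rule needs a recursive walk over the tree to collect ‘sentiment’ under ‘analysis’, which hints at why dependency rules grow long.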

Working with these dependencies is quite difficult, though. For example, in this image, to build the string ‘this is sentiment analysis’, there would have to be one loop over the words to the left of ‘is’ and another over the words to its right, and then these words would be joined. This process is long and somewhat difficult to manage (the rules become long and complicated). To save time and avoid the complexity of these loops, regex parsers are preferred. They also allow the flexibility to play with the order of POS tags.

NLTK parsers

NLTK is a huge library covering almost every NLP-related function: think of a function related to NLP and it probably already has it. It is a genie in the NLP world!!!! NLTK provides different parsers, such as the Recursive Descent Parser, the Shift-Reduce Parser, the Left-Corner Parser, and dependency parsers. All of them share one issue: the grammar (a Context-Free Grammar, or CFG, for the constituency parsers) has to be written by hand, and it is complex to build and difficult to modify. On top of that, the CFG rules must be written so that recursion stays in check. This is difficulty to the power of infinity!!!
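To see why writing the grammar is the hard part, here is a toy recursive descent recognizer over POS tags, written with the Python standard library only. It is not NLTK's API (NLTK's own entry points include `nltk.CFG.fromstring` and parser classes such as `nltk.RecursiveDescentParser`), and the grammar is a hypothetical example. Every recursive rule is written right-recursively on purpose: a left-recursive rule such as `NOM -> NOM NOUN` can make `parse` call itself at the same position without consuming any tag, so the search would not terminate.

```python
# Hypothetical toy CFG over Universal POS tags. Keys are non-terminals; any
# symbol that is not a key is treated as a terminal (a POS tag).
GRAMMAR = {
    "S": [["NP", "AUX", "NP"]],
    "NP": [["DET", "NOM"], ["NOM"], ["PRON"]],
    # Right-recursive on purpose; NOM -> NOM NOUN would recurse forever here.
    "NOM": [["ADJ", "NOM"], ["NOUN", "NOM"], ["NOUN"]],
}


def parse(symbol, tags, pos):
    """Yield every position where `symbol` can end if it starts at `pos`."""
    if symbol not in GRAMMAR:  # terminal: must match the tag at `pos`
        if pos < len(tags) and tags[pos] == symbol:
            yield pos + 1
        return
    for production in GRAMMAR[symbol]:  # try each alternative, backtracking
        yield from parse_sequence(production, tags, pos)


def parse_sequence(symbols, tags, pos):
    """Yield every end position for a sequence of symbols starting at `pos`."""
    if not symbols:
        yield pos
        return
    for middle in parse(symbols[0], tags, pos):
        yield from parse_sequence(symbols[1:], tags, middle)


def recognizes(tags, start="S"):
    """True if the whole tag sequence can be derived from `start`."""
    return any(end == len(tags) for end in parse(start, tags, 0))


print(recognizes(["PRON", "AUX", "NOUN", "NOUN"]))  # True  ("This is sentiment analysis")
print(recognizes(["ADJ", "NOUN"]))                  # False (no AUX, so S cannot be built)
```

Every new sentence shape needs a new production, and each one has to be checked for left recursion, which is the maintenance burden described above.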

Summary

Aspect Based Sentiment Analysis could be the key for organizations to improve their services and engage more with their customers. There are multiple techniques that can be used for aspect extraction: regex patterns over POS tags, the Spacy dependency parser, and NLTK parsers. Of these methods, regex is preferred for its simplicity, maintainability, and generalization.