Sentiment Polarity Classification! The final part of the Aspect Based Sentiment Analysis series

This is the last blog in this ABSA series. In the previous two blogs we learnt about different techniques for performing aspect extraction, and then we looked at what categories are, why they are needed, and how they are formed and detected. Now the time has come to evaluate each aspect as positive or negative.

Sentiment Polarity Classification

Sentiment polarity classification: every review or aspect carries customer sentiment, whether the customer enjoyed the service or wanted the organization/restaurant to improve on some criteria. Now the question is, how can one detect that sentiment from a given phrase?

Types of polarity classification techniques

Lexical Based

In the NLP space, there are libraries like VADER's SentimentIntensityAnalyzer, TextBlob and other sentiment analyzers provided by their respective researchers/organizations.

VADER works on a dictionary: it holds a bag of words in which each word is marked as positive, negative or neutral, and the library applies a set of rules on top of it to calculate the polarity of sentences/aspects. Such models are called lexical based sentiment analyzers. They are unsupervised models, because they rely only on their dictionary rather than on labelled training data.

VADER is available as a module in the NLTK library and can be used for sentiment analysis. As per its documentation, the module can be used to find the sentiment polarity of a sentence or of individual words. The developers also mention that it can handle emojis and some phrases that are commonly used in the English language.

VADER has a dictionary of words in which each word is given a score between -4 and +4, where -4 is strongly negative and +4 is strongly positive. When an input is given to VADER, it returns a dictionary with four keys: pos, neg, neu and compound. The first three are the normalized positive, negative and neutral scores allotted to the input, and compound is a single overall score between -1 and +1.
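For example, here is a minimal sketch of running VADER through NLTK (assuming NLTK is installed and the vader_lexicon resource has been downloaded); the review text and printed scores are only illustrative:

```python
# Minimal sketch: scoring a review with VADER via NLTK.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")   # one-time download of VADER's dictionary
analyzer = SentimentIntensityAnalyzer()

scores = analyzer.polarity_scores("The pasta was great but the service was painfully slow.")
print(scores)
# e.g. {'neg': 0.21, 'neu': 0.55, 'pos': 0.24, 'compound': 0.05}  (illustrative values)
```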

Thanks, VADER, for this awesome library, but my question is: can we update the dictionary, or are we limited to its built-in corpus?

Yes, we can update the dictionary with new words and assign each of them a score between -4 and +4. Tada! Easy to update and easy to use.
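A sketch of how that update can look, using the analyzer's lexicon dictionary; the words and scores below are made up for illustration and are not part of the shipped lexicon:

```python
# Sketch: extending VADER's dictionary with domain-specific words.
from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
analyzer.lexicon.update({
    "soggy": -2.5,        # negative in a restaurant context (illustrative score)
    "overpriced": -2.0,   # illustrative score
})
print(analyzer.polarity_scores("The fries were soggy and overpriced."))
```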

Issues with lexical based sentiment polarity classifiers

The catch is that these analyzers are only as good as their dictionary and hand-written rules: they score words largely in isolation, so word ordering and context are mostly lost. That is where predictive models come in.

Predictive model

Another way of finding the sentiment of a phrase is to use a predictive model.

The advantage of predictive models is that they learn the word ordering and the semantic connections among words with respect to the 'positive' or 'negative' labels. For instance, the difference between 'cheap quality' and 'cheap buy' can be picked up by the model if it is given appropriate targets and a large enough dataset.

And to achieve high accuracy with predictive models, we can use pre-trained models, for example by fine-tuning the BERT model.

BERT? What kind of sorcery is this? 

BERT (Bidirectional Encoder Representations from Transformers) is a new method of pre-training language representations. According to the documentation Google has provided on their GitHub page, it is a general purpose language model pre-trained on a large text corpus, and it can be reused for downstream NLP tasks.

BERT is pre-trained in an unsupervised manner, meaning it is trained on plain text only, which makes it an all-purpose model. It can be used to solve different NLP tasks, including sentence level classification and question answering. Most importantly, it looks at the data in both a left-to-right and a right-to-left manner, much like learning by filling in the blanks, which widens the learning perspective and improves its grasp of word ordering.

Image credit Devlin et al.

So why has BERT made such a name for itself in NLP? And what is all the fuss about transformers?

“Basically, transformers work on the attention layer.” Well yes! And? “The attention layer helps the model to understand the inter-connectivity, or let's say the semantic connections, between the words in a sentence.” Very well indeed, Google!
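To make the attention idea a bit more concrete, here is a minimal sketch of scaled dot-product attention, the core operation inside a transformer's attention layer; the tensors are random stand-ins for the learned query/key/value projections:

```python
# Minimal sketch of scaled dot-product attention.
# Q, K, V are random stand-ins for learned query/key/value projections.
import math
import torch
import torch.nn.functional as F

seq_len, d_k = 5, 8                   # 5 tokens, 8 dimensions per token
Q = torch.randn(seq_len, d_k)
K = torch.randn(seq_len, d_k)
V = torch.randn(seq_len, d_k)

scores = Q @ K.T / math.sqrt(d_k)     # how strongly each token relates to every other token
weights = F.softmax(scores, dim=-1)   # each row is an attention distribution over the tokens
output = weights @ V                  # each token becomes a weighted mix of all tokens
print(weights.shape, output.shape)    # torch.Size([5, 5]) torch.Size([5, 8])
```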

The research around BERT has opened the gates for transfer learning: the knowledge extracted during the model's prior training can be transferred to other tasks. This is pre-training followed by fine-tuning or adaptation. Wow!

Image credit: Sebastian Ruder, The State of Transfer Learning in NLP.

BERT is based on masked language modelling: it randomly masks a few words in a sentence and tries to predict them, much like a fill-in-the-blank question. Combined with the attention layer, this lets the model use both the preceding and the following words to make the prediction. Super genius, Google!
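You can watch masked language modelling in action with the fill-mask pipeline from the Hugging Face transformers package; a quick sketch (the sentence is just an example, and the model weights are downloaded on first use):

```python
# Sketch: asking BERT to fill in a masked word (requires transformers and a backend like PyTorch).
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for prediction in unmasker("The food was absolutely [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
# Typically prints candidates like "delicious" or "amazing" with their probabilities.
```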

Now the question is, what should ML enthusiasts do to leverage the beauty of BERT? Fine-tune BERT.

As already mentioned, BERT enables transfer learning. There are different ways to perform transfer learning; one of them is fine-tuning.

Fine-tuning BERT:

Firstly, fine-tuning is the process where the parameters of the pre-trained model are adjusted according to the data we are currently working on. It is not the same as hyper-parameter tuning, which is a totally different thing, so don't confuse the two. That said, in fine-tuning we do still choose a few hyper-parameters, such as the number of epochs, the batch size, the learning rate and the maximum input length.
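As a rough sketch, those choices often end up in a small config like the one below; the values are illustrative, in the ranges commonly recommended for fine-tuning BERT:

```python
# Illustrative fine-tuning hyper-parameters (typical values, not prescriptive).
FINE_TUNE_CONFIG = {
    "epochs": 3,            # 2-4 epochs is usually enough for fine-tuning
    "batch_size": 16,       # 16 or 32 is common
    "learning_rate": 2e-5,  # small, since we only nudge the pre-trained weights
    "max_length": 128,      # maximum number of tokens per review
}
```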

Leverage BERT for polarity classification tasks

Let’s start with fine tuning BERT!!!!

We have used the BERT implementation available as a Python module (the Hugging Face transformers package), which can be installed using pip. Along with it we use a few other pieces, such as BertTokenizer from the same package and DataLoader from PyTorch.

To fine-tune BERT, the labelled data is first tokenized using BertTokenizer, and the labels are used as the target to train the model.

BertTokenizer splits the data, i.e. the sentence/phrase, into tokens, and it also assigns each token an integer ID; you can think of it as the tokenizer building an index over the vocabulary. A review text is thus replaced by a sequence of numbers, the IDs of its tokens. To maintain uniformity and have a fixed-length input, BertTokenizer pads the sequence so that its size equals the max length given as a hyper-parameter, and the padding mask (attention mask) records which positions are real tokens and which are padding.

The attention layers then work on these IDs (through their embeddings) to learn the connections between the respective words, how they are used, and their relative positions in the sentence.

BertTokenizer also adds BERT's special tokens and assigns each element of the sequence a token type ID.

These sequences of IDs, the attention masks and the token type IDs are the input data to the BERT model, while the target is simply the binary classification label: 0 means negative, 1 means positive.
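A sketch of what that tokenization step can look like with BertTokenizer from the Hugging Face transformers package; the review text and max length are just examples:

```python
# Sketch: turning a review into BERT inputs with BertTokenizer.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

encoded = tokenizer(
    "The biryani was flavourful but the portions were tiny.",
    padding="max_length",    # pad up to max_length so every input has the same size
    truncation=True,
    max_length=32,
    return_tensors="pt",
)

print(encoded["input_ids"])       # token IDs, padded to length 32
print(encoded["attention_mask"])  # 1 for real tokens, 0 for padding
print(encoded["token_type_ids"])  # segment IDs (all 0 for a single sentence)
```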

Now we have the target labels and the BERT-specific input data (the sequences). This data can be fed to the pre-trained BERT model, and its training and validation accuracy can be monitored at each epoch. One can also build a feedback loop to further strengthen the fine-tuned BERT sentiment classifier. With that, you can obtain a sentiment polarity for each aspect as well as for whole reviews.
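Putting it all together, here is a minimal fine-tuning sketch using BertForSequenceClassification from the transformers package; the texts, labels and hyper-parameters are stand-ins for your own labelled aspect/review data:

```python
# Minimal fine-tuning sketch (illustrative): binary sentiment classification with BERT.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertTokenizer, BertForSequenceClassification

texts = ["The crust was perfect.", "The waiter ignored us all evening."]   # stand-in data
labels = [1, 0]                                        # 1 = positive, 0 = negative

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

enc = tokenizer(texts, padding="max_length", truncation=True, max_length=64, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor(labels))
loader = DataLoader(dataset, batch_size=16, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):                                 # 2-4 epochs is typical for fine-tuning
    for input_ids, attention_mask, target in loader:
        optimizer.zero_grad()
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=target)
        out.loss.backward()                            # the model returns the cross-entropy loss
        optimizer.step()
    print(f"epoch {epoch}: loss {out.loss.item():.4f}")
```

In practice you would train on a much larger labelled set, keep a held-out validation split, and track accuracy at every epoch as described above.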

Summary

So, in this series, we started with simple reviews: using POS tagging and regex parsers we extracted the aspect terms, i.e. performed aspect extraction. From the extracted aspect terms, we then used word embeddings to generate the categories, the broader domains that help organizations make decisions about areas of improvement. Lastly, we went through two strategies, lexical based and predictive models, to predict the sentiment polarity, and discussed how to leverage transfer learning with BERT for the same purpose.