Vader vs. BERT Sentiment Polarizer

A polarizer is a module that predicts the sentiment (positive, negative, or neutral) of a given text.

Vader is a lexicon-based sentiment analysis module, while BERT is a neural network that can be trained on data for sentiment classification. This post analyzes the key differences between the two.

| Vader | BERT |
| --- | --- |
| Lexicon/rule based. | Based on a neural network. |
| Doesn't generalize well; relies on the word polarities given in its reference dictionary file. | Being a neural network, it predicts sentiment based on the semantic usage of a word. |
| The polarity of a similar statement may differ with the size of the statement. | Statement length plays a vital role in the generalization power of the model; the polarity score is calculated on the whole statement. |
| Its generalization depends on whether a word is present in its corpus and on the polarity score allotted to it. | Its generalization is based on the computations performed in the attention layers and on the data BERT is trained on. It also considers neighboring words while performing classification, i.e. context matters. |
| The polarity score allotted to a word must lie between -4 and 4, and there is no concrete method to derive the best score, which may lead to unexpected results. | No word score is allotted. The network learns from the labeled training dataset and generalizes based on the occurrence of a word with respect to other words in a data point; the attention layers handle the word interactions that let the model capture the semantics of a statement. |
| Vader doesn't require training; it is a rule-based Python module for sentiment analysis. | BERT requires training over data to get a complete hold of the context. |

Comparative description of Vader and BERT

During the analysis of Vader, it was found that it doesn't consider neighboring words when computing polarity scores. It simply sums the word scores listed in its dictionary file and, with some predefined negation rules, computes the result. Hence, 'cheap product' and 'cheap price' would receive a similar polarity. Vader doesn't check that 'cheap' next to 'price' refers to cost and should be marked positive; it just adds the score of 'cheap' to that of 'product' or 'price' and reports the result based on that sum.
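This summing behaviour can be sketched in a few lines. The mini-lexicon below is hypothetical (the real Vader lexicon ships roughly 7,500 scored entries plus rules for negation, punctuation, and capitalization), but the final squashing step mirrors Vader's normalization, s / sqrt(s² + 15):

```python
import math

# Hypothetical mini-lexicon for illustration; real VADER ships a file of
# thousands of words with valences in -4..4. 'product' and 'price' are
# unscored, so they contribute nothing.
LEXICON = {"cheap": -0.8, "good": 1.9, "terrible": -2.1}

def compound(text, alpha=15):
    """Sum the lexicon scores of the words, then squash into [-1, 1]
    with VADER-style normalization: s / sqrt(s^2 + alpha)."""
    s = sum(LEXICON.get(w, 0.0) for w in text.lower().split())
    return s / math.sqrt(s * s + alpha)

# Both phrases reduce to the score of 'cheap' alone, so they come out equal:
print(compound("cheap product") == compound("cheap price"))  # True
```

Because the unscored neighbor contributes zero, the lexicon approach cannot tell the two phrases apart.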

BERT, on the other hand, can distinguish between the two based on the training set. If the training dataset contains statements where 'cheap price' is marked positive and 'cheap product' negative, BERT learns that 'cheap' combined with other words can carry a different contextual meaning, and it scores new statements based on this learning.
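Actually fine-tuning BERT needs a labeled corpus and the training machinery, but the core idea, that a context-aware model can assign different labels to 'cheap' depending on its neighbor, can be shown with a deliberately simplified toy classifier that learns polarity for word pairs rather than single words (a stand-in for BERT's attention over neighboring tokens, not the real architecture):

```python
from collections import defaultdict

# Toy labeled training set; +1 = positive, -1 = negative.
TRAIN = [
    ("cheap price", 1),
    ("cheap product", -1),
    ("great product", 1),
]

def bigrams(text):
    words = text.lower().split()
    return list(zip(words, words[1:]))

# "Train": accumulate a polarity vote per word pair. A unigram lexicon
# could never separate the two phrases, because 'cheap' has one score.
votes = defaultdict(int)
for text, label in TRAIN:
    for bg in bigrams(text):
        votes[bg] += label

def predict(text):
    score = sum(votes.get(bg, 0) for bg in bigrams(text))
    return "positive" if score > 0 else "negative"

print(predict("cheap price"))    # positive
print(predict("cheap product"))  # negative
```

Once the label depends on the pair rather than the single word, the two phrases separate cleanly, which is the same reason contextual embeddings outperform a fixed per-word score.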

Another observation: when a statement contains many neutral words, Vader's normalization technique may push an otherwise positive or negative statement toward a neutral sentiment. BERT, in contrast, computes its result from word interactions and the labels in the training dataset.
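The dilution effect can be sketched as follows. This is a rough approximation of how Vader splits a statement's score mass into pos/neu/neg proportions (each unscored word contributes a unit of neutral mass), using a hypothetical one-word lexicon, not the library's exact code:

```python
# Hypothetical mini-lexicon; only 'love' carries a valence here.
LEXICON = {"love": 3.0}

def proportions(text):
    """Split the total 'score mass' of a statement into pos/neu/neg
    shares; every word without a lexicon entry adds neutral mass."""
    pos = neg = neu = 0.0
    for w in text.lower().split():
        v = LEXICON.get(w, 0.0)
        if v > 0:
            pos += v + 1
        elif v < 0:
            neg += abs(v) + 1
        else:
            neu += 1
    total = pos + neg + neu
    return {k: round(s / total, 2)
            for k, s in (("pos", pos), ("neu", neu), ("neg", neg))}

print(proportions("love it"))
print(proportions("i must say that on the whole i love it"))
```

The second statement carries the same single positive word, yet its neutral share dominates simply because of the extra filler words, so a pipeline that thresholds on these proportions can flip it to neutral.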

BERT can be improved simply by training on more data, whereas to extend Vader one needs good domain knowledge and a strong grasp of the language in order to allot a sensible score to each word.