This blog continues the "Aspect Extraction" post in the aspect-based sentiment analysis series. In the last post, we learnt about aspect terms and their extraction techniques. This post covers categories: what they are, why they are needed, and how to generate them from the data provided. After this blog, there will be one more post, on sentiment polarity classification, which will conclude the series.
What are categories and why are they needed?
A few months back, I went to a restaurant and ordered my favorite dishes, 'dal makhni' and 'naan.' The naan was soft and tender. Muah! But the dal makhni had a foul smell. I had a quarrel with the attendant about the serving. He said, "Sir, there is no smell." So I wrote feedback to the restaurant: "I ordered the dal makhni and naan. The naan was tender and soft. But the dal had a foul smell. When I raised it, the attendant, Mr Harry Potter, misbehaved with me." How dare you, Mr H. Potter!!!!
I am discussing this because it relates directly to ABSA. In my case, the manager scolded the attendant on seeing the review. But what if the review had been posted online, among thousands of other reviews? How would a restaurant manager, or any organization, know where to improve? Which functional block of the restaurant is performing well, and where does it need more focus? Should he go through all the reviews one by one? That would be exhausting work for anyone. But we are ML enthusiasts, so let's automate it. We can extract the aspect terms from the reviews. But what do we do with them? They are just phrases!
This is where the aspect category comes into the picture. An aspect category is a label given to a set of related aspect terms. For instance, 'dal makhni' and 'naan' belong to the 'food' category, while 'attendant' belongs to 'service.' Categories widen the scope of analysis over the data and help in decision making, from a broad overview down to the minute details of where the organization or restaurant can improve.
Consider the restaurant above, which needs to make a decision: should it change the chef or the recipe? If the restaurant gets negative reviews against many 'food' items, it might consider changing the chef. But if the negative sentiment is only against 'dal makhni,' then changing the recipe is a more economical option than hiring a new chef. Either way, the restaurant now has proper data to justify the decision: many negative reviews across the food category point to the kitchen, while good ratings for service mean the staff deserve appreciation. Decision making becomes easy and efficient.
But how would an ML model know the categories? This could be framed as a supervised problem, where categories like 'food' and 'service' are given as targets and the data is converted into structured form. Here we discuss another approach, for when you are not provided with labeled data: the unsupervised problem. How exciting!!!!
From the last blog, we know how to extract the aspect terms. But how do we group them? Aspect terms are textual data, which is unstructured, and ML models don't work directly on unstructured data.
We can transform the unstructured data into structured data by generating or using word embeddings.
Count vectors and TF-IDF are limited to syntax: they are built over word frequency and don't capture word semantics. Word2Vec and GloVe can be used to target the semantic meaning of words; GloVe is suggested here because it is trained on a huge corpus. After that, we can cluster the embeddings to group similar aspect terms. Simple!!!
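As a quick sketch of this step, the snippet below looks up word vectors the way one would with a GloVe file. The tiny hand-built dictionary and its values are hypothetical stand-ins for the real pre-trained vectors, which are 50 to 300 dimensional and loaded from a `glove.6B.*.txt` file (one word per line, followed by its numbers).

```python
import numpy as np

# Toy stand-in for pre-trained GloVe embeddings (hypothetical values);
# real GloVe vectors are 50-300 dimensional, loaded from glove.6B.*.txt,
# where each line looks like: word v1 v2 v3 ...
embeddings = {
    "naan":      np.array([0.9, 0.1, 0.0]),
    "makhni":    np.array([0.8, 0.2, 0.1]),
    "attendant": np.array([0.1, 0.9, 0.3]),
}

def get_vector(word, dim=3):
    """Return the embedding for a word, or zeros if out of vocabulary."""
    return embeddings.get(word.lower(), np.zeros(dim))

print(get_vector("naan"))   # known word -> its vector
print(get_vector("pizza"))  # unseen word -> zero vector
```

A zero vector for out-of-vocabulary words is just one simple fallback; dropping unknown words entirely is equally common.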
Now the issue is that aspects are phrases, while we have an embedding for each word. What now? So many problems! No need to worry; there are different ways to overcome this. Two possible ways are shown in the image below.
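One common way to get a single vector for a multi-word aspect term (assumed here; the original image is not reproduced) is to average the vectors of its words. A minimal sketch, again with hypothetical toy vectors in place of GloVe:

```python
import numpy as np

# Toy word vectors standing in for GloVe (hypothetical values).
word_vectors = {
    "dal":    np.array([0.7, 0.3, 0.0]),
    "makhni": np.array([0.9, 0.1, 0.2]),
}

def phrase_vector(phrase, dim=3):
    """Average the vectors of the words in a phrase; skip unknown words."""
    vecs = [word_vectors[w] for w in phrase.lower().split() if w in word_vectors]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

print(phrase_vector("dal makhni"))  # -> [0.8, 0.2, 0.1]
```

Averaging keeps the phrase vector in the same space as the word vectors, so single-word and multi-word aspect terms can be clustered together.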
Aspect terms clustering
ML provides two options for clustering. One is 'Euclidean distance' based clustering, which we normally perform on data. The other is 'cosine similarity' based clustering.
In NLP-related problems, it is found that the semantically closest vectors can be identified using cosine similarity, because it doesn't depend on the magnitude of the vectors but on the angular distance between them, i.e. the cosine distance.
Cosine similarity works on the angle theta between two vectors, while Euclidean distance works on the straight-line distance D between them. It is practically found that cosine-similarity clustering works better when there are outliers. The important thing about cosine similarity is that it normalizes the vectors while calculating the similarity, so the magnitudes of v1 and v2 make no difference. Isn't it cool that outliers can be handled by cosine similarity! Hats off!
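The magnitude-invariance point can be seen in a few lines: two vectors pointing the same way but with very different lengths have cosine similarity 1, yet a large Euclidean distance.

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) between two vectors; depends only on direction."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_distance(a, b):
    """Straight-line distance D; depends on magnitude too."""
    return np.linalg.norm(a - b)

v1 = np.array([1.0, 1.0])
v2 = np.array([10.0, 10.0])  # same direction, 10x the magnitude

print(cosine_similarity(v1, v2))   # 1.0 -> identical direction
print(euclidean_distance(v1, v2))  # ~12.73 -> far apart by magnitude
```

This is why an unusually "long" embedding (an outlier in magnitude) can drag a Euclidean clustering off course while leaving a cosine-based one unaffected.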
Clustering the aspect-term embeddings results in groups of similar terms, and each group can then be given a name. This name is the category name for that group. Tada! You have created categories from the data.
Note: these are the category-formation steps for unlabeled data. Sometimes it's difficult to define the categories before analyzing the data, and there can be cases where you have no data for a category at all. That is why we prefer category formation-detection, though this may change with the problem statement.
From the reviews, you have performed aspect extraction; from the aspect terms, you have performed category formation-detection.
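The clustering step itself can be sketched as follows. The term list and 2-D vectors are toy stand-ins for real averaged GloVe embeddings; L2-normalising the vectors before scikit-learn's Euclidean k-means makes it behave approximately like cosine-based (spherical) k-means, since all points then sit on the unit circle.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

# Toy aspect-term vectors (hypothetical); in practice these would be
# averaged GloVe word vectors for each extracted aspect term.
aspect_terms = ["dal makhni", "naan", "attendant", "waiter"]
vectors = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
])

# L2-normalise so Euclidean k-means approximates cosine-based clustering.
X = normalize(vectors)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

for term, label in zip(aspect_terms, labels):
    print(term, "-> cluster", label)
```

The two food terms land in one cluster and the two staff terms in the other; the number of clusters is a choice you make (or tune) based on the data.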
From this result, we can say that cluster 0 is about quality, cluster 1 is about service, and cluster 2 is about value for money. With this, we now know how to form the categories.
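Naming the clusters is then a simple lookup. The mapping below is something you assign manually after inspecting each cluster's terms; both the names and the term-to-cluster assignments here are illustrative, not output from a real run.

```python
# Manually chosen names after inspecting each cluster's terms.
cluster_names = {0: "quality", 1: "service", 2: "value for money"}

# Hypothetical output of the clustering step: aspect term -> cluster id.
term_clusters = {"tasty": 0, "fresh": 0, "attendant": 1, "pricey": 2}

# Attach a human-readable category to every aspect term.
categories = {term: cluster_names[cid] for term, cid in term_clusters.items()}
print(categories)
```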
In this post, we learnt what categories are and why they are needed; how to convert unstructured data into structured data using word embeddings; and how clustering, using cosine similarity or Euclidean distance, forms categories from the aspect terms. In the next blog, we will discuss the polarity classification model.