Train an aspect-based sentiment analysis (ABSA) model to detect opinions on different categories in laptop reviews

In this tutorial, we use AutoNLU to detect the categories mentioned in laptop reviews and the opinion (sentiment) expressed about each of them. For example, the sentence “The battery life is bad, but they won’t replace it” mentions two categories, namely battery and support, and both have a negative sentiment.
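In code, each sentence is paired with a list of [category, sentiment] pairs. The category names follow the SemEval-2015 ENTITY#ATTRIBUTE scheme of the dataset used below. The two categories here are the ones the trained model predicts for a similar sentence at the end of this tutorial. The split_category helper is hypothetical and only illustrates the naming scheme:

```python
# Labels for one sentence: a list of [category, sentiment] pairs.
# Category names follow the SemEval-2015 ENTITY#ATTRIBUTE convention.
text = "The battery life is bad, but they won't replace it"
labels = [
    ["BATTERY#OPERATION_PERFORMANCE", "negative"],
    ["SUPPORT#QUALITY", "negative"],
]


def split_category(category):
    """Hypothetical helper: split 'ENTITY#ATTRIBUTE' into its two parts."""
    entity, attribute = category.split("#", 1)
    return entity, attribute


for category, sentiment in labels:
    entity, attribute = split_category(category)
    print(f"{entity} / {attribute}: {sentiment}")
```

This prints `BATTERY / OPERATION_PERFORMANCE: negative` and `SUPPORT / QUALITY: negative`.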

%load_ext tensorboard

!pip install xmltodict -q
import autonlu
from autonlu import Model
from autonlu.absa import ABSAMetricCallback

import pandas as pd
import numpy as np

import requests
import xmltodict
autonlu.login()

Download data and prepare a dataset

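We start by downloading the SemEval-2015 laptop training data and converting the XML file into the two lists AutoNLU expects: X with the sentence texts and Y with a list of [category, sentiment] pairs for each sentence.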

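def download_data():
    # SemEval-2015 ABSA laptop training data (XML)
    url = "https://raw.githubusercontent.com/davidsbatista/Aspect-Based-Sentiment-Analysis/master/datasets/ABSA-SemEval2015/ABSA-15_Laptops_Train_Data.xml"
    response = requests.get(url, timeout=60)
    response.raise_for_status()  # fail early instead of parsing an HTML error page
    # Convert the XML document into nested dicts and lists
    data = xmltodict.parse(response.content)
    return data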
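def parse_data(data):
    # X: sentence texts, Y: list of [category, sentiment] pairs per sentence
    X, Y = [], []

    for review in data["Reviews"]["Review"]:
        # xmltodict returns a dict for a single <sentence> and a list for several
        sentences = review["sentences"]["sentence"]
        sentences = [sentences] if "text" in sentences else sentences

        for entry in sentences:
            text = entry["text"]

            # Sentences without annotated opinions keep an empty label list
            aspects = []
            if "Opinions" in entry:
                aspect_list = entry["Opinions"]["Opinion"]
                # Same dict-vs-list normalization for a single <Opinion>
                aspect_list = [aspect_list] if "@category" in aspect_list else aspect_list
                for aspect in aspect_list:
                    category = aspect["@category"]   # e.g. "LAPTOP#GENERAL"
                    sentiment = aspect["@polarity"]  # e.g. "positive" or "negative"
                    aspects.append([category, sentiment])

            X.append(text)
            Y.append(aspects)
    return X, Y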
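The two `[x] if ... else x` lines in parse_data exist because xmltodict represents a single child element as a dict and repeated child elements as a list. A minimal sketch of the same idea as a reusable helper, run on a hand-written dict that mimics xmltodict output so it works without downloading anything. The helper name as_list and the third sample sentence are illustrative; the other two sentences and their labels come from the examples in this tutorial:

```python
def as_list(node):
    """Wrap a single xmltodict node in a list; pass lists through unchanged.

    xmltodict returns a dict for one child element, a list for several,
    and None for an empty element, so this makes all cases iterable.
    """
    if node is None:
        return []
    return node if isinstance(node, list) else [node]


# Hand-written data shaped like xmltodict output for one <Review>
# (illustrative only): one sentence with a single opinion, one with
# two opinions, and one without any opinions.
review = {
    "sentences": {
        "sentence": [
            {"text": "I love this device!",
             "Opinions": {"Opinion": {"@category": "LAPTOP#GENERAL",
                                      "@polarity": "positive"}}},
            {"text": "This computer is really fast and I'm shocked as to how easy it is to get used to...",
             "Opinions": {"Opinion": [
                 {"@category": "LAPTOP#OPERATION_PERFORMANCE", "@polarity": "positive"},
                 {"@category": "LAPTOP#USABILITY", "@polarity": "positive"},
             ]}},
            {"text": "It arrived on Tuesday."},
        ]
    }
}

parsed = []
for entry in as_list(review["sentences"]["sentence"]):
    # `or {}` also covers an empty <Opinions/> element, which xmltodict maps to None
    opinions = as_list((entry.get("Opinions") or {}).get("Opinion"))
    pairs = [[o["@category"], o["@polarity"]] for o in opinions]
    parsed.append((entry["text"], pairs))

for text, pairs in parsed:
    print(text, pairs)
```

Checking the type with isinstance avoids relying on which keys (text, @category) happen to be present. Recent xmltodict versions can also return lists directly via the force_list argument of xmltodict.parse.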
data = download_data()
X, Y = parse_data(data)

print(X[5])
print(Y[5])
This computer is really fast and I'm shocked as to how easy it is to get used to...
[['LAPTOP#OPERATION_PERFORMANCE', 'positive'], ['LAPTOP#USABILITY', 'positive']]
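Before training, it is worth checking how the labels are distributed, for example which categories are rare and how many sentences carry no opinion at all. A minimal sketch with collections.Counter, assuming Y has the format returned by parse_data. The function label_stats is illustrative, and the small Y_sample (built from labels shown in this tutorial) only stands in for the real data so the snippet runs on its own:

```python
from collections import Counter


def label_stats(Y):
    """Count categories, sentiments and opinion-free sentences in Y."""
    categories = Counter(category for labels in Y for category, _ in labels)
    sentiments = Counter(sentiment for labels in Y for _, sentiment in labels)
    no_opinion = sum(1 for labels in Y if not labels)
    return categories, sentiments, no_opinion


# Small stand-in; in the notebook, call label_stats(Y) on the output of parse_data.
Y_sample = [
    [["LAPTOP#OPERATION_PERFORMANCE", "positive"], ["LAPTOP#USABILITY", "positive"]],
    [["LAPTOP#GENERAL", "positive"]],
    [["BATTERY#OPERATION_PERFORMANCE", "negative"], ["SUPPORT#QUALITY", "negative"]],
    [],  # a sentence without annotated opinions
]

categories, sentiments, no_opinion = label_stats(Y_sample)
print(categories.most_common(5))
print(sentiments)
print(f"{no_opinion} of {len(Y_sample)} sentences have no opinion")
```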

Training the model

Next, we train the model. Instead of starting from scratch, we load the DeepOpinion consumer-electronics model, which already has a basic understanding of consumer electronics, and then train it further on our laptop dataset. The AutoNLU engine automatically detects from the format of the labels that this is a class task (each sentence is labeled with a list of [category, sentiment] pairs) and adjusts the hyperparameters accordingly.

model = Model("DeepOpinion/consumer-electronics_base_en", standard_label="neutral")
model.train(X, Y)
model.save("absa_laptop")
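The training call above uses every parsed sentence, so the few example sentences that follow are the only check of the result. If you also want a quantitative estimate, one option (not part of the original tutorial) is to hold out a random part of X and Y before training. A plain-Python sketch; train_test_split_lists and its defaults are illustrative names and values:

```python
import random


def train_test_split_lists(X, Y, test_fraction=0.1, seed=42):
    """Shuffle (text, labels) pairs together and split off a test part.

    Illustrative helper; the fraction and seed are arbitrary defaults.
    """
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same length")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # fixed seed -> reproducible split
    n_test = int(len(X) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    Y_train = [Y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    Y_test = [Y[i] for i in test_idx]
    return X_train, Y_train, X_test, Y_test


# Toy usage; in the notebook, pass the X and Y returned by parse_data.
X_toy = [f"sentence {i}" for i in range(10)]
Y_toy = [[["LAPTOP#GENERAL", "positive"]] if i % 2 else [] for i in range(10)]
X_train, Y_train, X_test, Y_test = train_test_split_lists(X_toy, Y_toy)
print(len(X_train), len(X_test))
```

With the real data you would then call model.train(X_train, Y_train) instead of model.train(X, Y) and compare model.predict(X_test) against Y_test. Shuffling one list of indices, rather than the two lists separately, keeps every text aligned with its labels.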

Now that the model is trained, let's test it on a few sentences and look at the results:

prediction_model = Model("absa_laptop")
questions = [
    "I love this device!",
    "The battery live is bad, but they wont replace it!",
    "Wow, this laptop has an incredible long battery life and blazing speed."
]

tags = prediction_model.predict(questions)

for i, tag in enumerate(tags):
    print(f"\n\n{questions[i]}\n    {str(tag)}")


I love this device!
    [['LAPTOP#GENERAL', 'positive']]


The battery live is bad, but they wont replace it!
    [['BATTERY#OPERATION_PERFORMANCE', 'negative'], ['SUPPORT#QUALITY', 'negative']]


Wow, this laptop has an incredible long battery life and blazing speed.
    [['BATTERY#OPERATION_PERFORMANCE', 'positive'], ['LAPTOP#OPERATION_PERFORMANCE', 'positive']]

This looks good: the model correctly detects, for example, that “they won’t replace it” refers to the support category and expresses a negative sentiment.
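This check is qualitative. To put a number on it for a set of labeled sentences, a common choice for ABSA is micro-averaged precision, recall, and F1 over (category, sentiment) pairs. A minimal pure-Python sketch; pair_f1 and the toy labels are illustrative, and this is not necessarily how AutoNLU's ABSAMetricCallback computes its scores:

```python
def pair_f1(gold, predicted):
    """Micro-averaged precision, recall and F1 over (category, sentiment) pairs.

    gold and predicted hold one list of [category, sentiment] pairs per sentence.
    A pair only counts as correct if category and sentiment both match.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same sentences")
    tp = fp = fn = 0
    for gold_labels, pred_labels in zip(gold, predicted):
        g = {tuple(pair) for pair in gold_labels}
        p = {tuple(pair) for pair in pred_labels}
        tp += len(g & p)  # correctly predicted pairs
        fp += len(p - g)  # predicted pairs not in the gold labels
        fn += len(g - p)  # gold pairs the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels (illustrative): the second prediction gets the SUPPORT sentiment wrong.
gold = [
    [["LAPTOP#GENERAL", "positive"]],
    [["BATTERY#OPERATION_PERFORMANCE", "negative"], ["SUPPORT#QUALITY", "negative"]],
]
predicted = [
    [["LAPTOP#GENERAL", "positive"]],
    [["BATTERY#OPERATION_PERFORMANCE", "negative"], ["SUPPORT#QUALITY", "positive"]],
]
print(pair_f1(gold, predicted))
```

With the toy labels, two of the three predicted pairs are correct and two of the three gold pairs are found, so all three scores are 2/3. Because a pair must match exactly, the right category with the wrong sentiment counts as one false positive and one false negative.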