{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Train a model to detect opinions on different categories (ABSA) for laptop reviews\n", "\n", "In this tutorial, we use AutoNLU to detect the categories mentioned in laptop reviews and the opinion expressed about each of them, a task known as aspect-based sentiment analysis (ABSA). For example, the sentence \"The battery live is bad, but they won't replace it\" mentions two categories, battery and support, and expresses a negative sentiment about both." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%load_ext tensorboard\n", "\n", "!pip install xmltodict -q" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import autonlu\n", "from autonlu import Model\n", "from autonlu.absa import ABSAMetricCallback\n", "\n", "import pandas as pd\n", "import numpy as np\n", "\n", "import requests\n", "import xmltodict" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "autonlu.login()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download data and prepare a dataset\n", "We start by downloading the SemEval 2015 laptop dataset and converting the XML file into the lists that the AutoNLU engine expects."
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def download_data():\n", "    # Fetch the SemEval 2015 laptop training set and parse the XML into nested dicts.\n", "    url = \"https://raw.githubusercontent.com/davidsbatista/Aspect-Based-Sentiment-Analysis/master/datasets/ABSA-SemEval2015/ABSA-15_Laptops_Train_Data.xml\"\n", "    response = requests.get(url)\n", "    response.raise_for_status()  # fail early if the download did not succeed\n", "    data = xmltodict.parse(response.content)\n", "    return data" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def parse_data(data):\n", "    # X holds the sentence texts, Y the matching lists of [category, sentiment] pairs.\n", "    X, Y = [], []\n", "\n", "    for review in data[\"Reviews\"][\"Review\"]:\n", "        sentences = review[\"sentences\"][\"sentence\"]\n", "        # xmltodict returns a dict for a single child element and a list for several.\n", "        sentences = [sentences] if \"text\" in sentences else sentences\n", "\n", "        for entry in sentences:\n", "            text = entry[\"text\"]\n", "\n", "            aspects = []\n", "            if \"Opinions\" in entry:\n", "                aspect_list = entry[\"Opinions\"][\"Opinion\"]\n", "                # Same single-element case as above: wrap a lone opinion in a list.\n", "                aspect_list = [aspect_list] if \"@category\" in aspect_list else aspect_list\n", "                for aspect in aspect_list:\n", "                    category = aspect[\"@category\"]\n", "                    sentiment = aspect[\"@polarity\"]\n", "                    aspects.append([category, sentiment])\n", "\n", "            X.append(text)\n", "            Y.append(aspects)\n", "    return X, Y" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This computer is really fast and I'm shocked as to how easy it is to get used to...\n", "[['LAPTOP#OPERATION_PERFORMANCE', 'positive'], ['LAPTOP#USABILITY', 'positive']]\n" ] } ], "source": [ "data = download_data()\n", "X, Y = parse_data(data)\n", "\n", "print(X[5])\n", "print(Y[5])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Training of the model\n", "\n", "Next, we train the model. We first load the DeepOpinion consumer-electronics model, so that we start from a model that already has a basic understanding of consumer electronics. We then train it on our new laptop dataset.
The AutoNLU engine automatically detects that this is a class/labeling task and adjusts all hyperparameters accordingly." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = Model(\"DeepOpinion/consumer-electronics_base_en\", standard_label=\"neutral\")\n", "model.train(X, Y)\n", "model.save(\"absa_laptop\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that the model is trained, let's test it on a few sentences and look at the results:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "I love this device!\n", " [['LAPTOP#GENERAL', 'positive']]\n", "\n", "\n", "The battery live is bad, but they wont replace it!\n", " [['BATTERY#OPERATION_PERFORMANCE', 'negative'], ['SUPPORT#QUALITY', 'negative']]\n", "\n", "\n", "Wow, this laptop has an incredible long battery life and blazing speed.\n", " [['BATTERY#OPERATION_PERFORMANCE', 'positive'], ['LAPTOP#OPERATION_PERFORMANCE', 'positive']]\n" ] } ], "source": [ "prediction_model = Model(\"absa_laptop\")\n", "questions = [\n", "    \"I love this device!\",\n", "    \"The battery live is bad, but they wont replace it!\",\n", "    \"Wow, this laptop has an incredible long battery life and blazing speed.\"\n", "]\n", "\n", "tags = prediction_model.predict(questions)\n", "\n", "# Print each sentence followed by its predicted [category, sentiment] pairs.\n", "for question, tag in zip(questions, tags):\n", "    print(f\"\\n\\n{question}\\n {tag}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This looks really good: the model correctly detects, for example, that \"they won't replace it\" refers to support and that the statement is negative."
] } ], "metadata": { "@webio": { "lastCommId": null, "lastKernelId": null }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.1" } }, "nbformat": 4, "nbformat_minor": 2 }