{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Train a model to label reviews from the Google Play store\n", "\n", "In this tutorial, we show how to train a model with AutoNLU on a custom dataset.\n", "More precisely, we train a model to classify reviews from the Google Play store. The dataset contains reviews by many different users, each with a rating of up to five stars. Our goal is to predict the sentiment (positive, negative or neutral) of a given review.\n", "\n", "You can also compare training with AutoNLU to training with other frameworks such as Hugging Face, as shown for the same dataset in https://curiousily.com/posts/sentiment-analysis-with-bert-and-hugging-face-using-pytorch-and-python/. With AutoNLU, we achieve the same results with only 20 lines of code. In addition, no expert machine learning knowledge is needed, because hyperparameters are selected automatically by our AutoNLU engine.\n", "\n", "Note: We recommend using a machine with an NVIDIA GPU for this tutorial." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[33mWARNING: You are using pip version 21.1.1; however, version 21.1.2 is available.\r\n", "You should consider upgrading via the '/home/paethon/git/py39env/bin/python3.9 -m pip install --upgrade pip' command.\u001b[0m\r\n" ] } ], "source": [ "%load_ext tensorboard\n", "\n", "!pip install pandas -q" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import autonlu\n", "from autonlu import Model\n", "import pandas as pd\n", "import numpy as np\n", "import gdown" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "User name/Email: admin\n", "Password: ········\n" ] } ], "source": [ "autonlu.login()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download and prepare dataset\n", "First, we automatically download and prepare the Google Play app reviews dataset. Note that this installs gdown in your pip environment." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Downloading...\n", "From: https://drive.google.com/uc?id=1S6qMioqPJjyBLpLVz4gmRTnJHnjitnuV\n", "To: /home/paethon/git/autonlu/tutorials/.cache/data/googleplay/apps.csv\n", "100%|██████████| 134k/134k [00:00<00:00, 2.02MB/s]\n", "Downloading...\n", "From: https://drive.google.com/uc?id=1zdmewp7ayS4js4VtrJEHzAheSW-5NBZv\n", "To: /home/paethon/git/autonlu/tutorials/.cache/data/googleplay/reviews.csv\n", "7.17MB [00:00, 23.6MB/s]\n" ] }, { "data": { "text/html": [ "
\n", " | content | score |\n",
"---|---|---|\n",
"0 | Update: After getting a response from the deve... | 1 |\n",
"1 | Used it for a fair amount of time without any ... | 1 |\n",
"2 | Your app sucks now!!!!! Used to be good but no... | 1 |\n",
"3 | It seems OK, but very basic. Recurring tasks n... | 1 |\n",
"4 | Absolutely worthless. This app runs a prohibit... | 1 |\n", "