{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Train a model to automatically detect tags of StackOverflow questions\n", "\n", "In this tutorial, we use the AutoNLU engine to predict tags for questions asked on StackOverflow. In contrast to the previous example, each question can now carry an arbitrary number of tags, e.g. the question \"what is the difference between java and javascript\" can be tagged with both \"java\" and \"javascript\". We will now demonstrate how simple it is to train an NLP model on this task using the AutoNLU engine. Let's start by loading the required libraries:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%load_ext tensorboard\n", "\n", "!pip install pandas -q" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import autonlu\n", "from autonlu import Model\n", "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "autonlu.login()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download data and prepare dataset\n", "A dataset that contains StackOverflow questions and their corresponding tags already exists. We download this dataset to train our model:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
|   | Text | Tags |
|---|---|---|
| 2 | aspnet site maps has anyone got experience cre... | ['sql', 'asp.net'] |
| 4 | adding scripting functionality to net applicat... | ['c#', '.net'] |
| 5 | should i use nested classes in this case i am ... | ['c++'] |
| 6 | homegrown consumption of web services i have b... | ['.net'] |
| 8 | automatically update version number i would li... | ['c#'] |