===============
DocumentModel
===============

-------------------
The Document Format
-------------------

The :class:`autonlu.DocumentModel` class is built around the concept of documents, which are in essence dictionaries with a specified structure.

Supported Document Format for Prediction
----------------------------------------

During inference/prediction, the document is annotated so that it matches the format expected for training (this format is explained in the following section).
In addition to the required key/value pairs, the dictionary can contain arbitrary additional key/value pairs. These are ignored by the model and preserved during annotation.

.. code-block:: python

    [
        {
            "segments": [
                {
                    "text": "The room was very nice, but the staff was bad.",
                },
                {
                    "text": "The bathroom was dirty.",
                }
            ]
        }
    ]

The following structure, for example, is also allowed. The additional key/value pairs (here ``document_text`` and ``id``) are preserved during the prediction process.

.. code-block:: python

    [
        {
            # Adjacent string literals are concatenated into one string.
            "document_text": "The room was very nice, but the staff was bad. "
                             "The bathroom was dirty.",
            "segments": [
                {
                    "text": "The room was very nice, but the staff was bad.",
                    "id": 12234,
                },
                {
                    "text": "The bathroom was dirty.",
                    "id": 12234,
                }
            ]
        }
    ]

After sending this document through the prediction process, we might get back an annotated document that looks like the following:

.. code-block:: python

    [
        {
            "document_text": "The room was very nice, but the staff was bad. "
                             "The bathroom was dirty.",
            "segments": [
                {
                    "text": "The room was very nice, but the staff was bad.",
                    "id": 12234,
                    "tags": [{"class": "Room", "label": "POS"}, {"class": "Staff", "label": "NEG"}],
                    "classes": ["Cleanliness", "Room", "Staff"],
                    "standard_label": "NONE",
                },
                {
                    "text": "The bathroom was dirty.",
                    "id": 12234,
                    "tags": [{"class": "Cleanliness", "label": "NEG"}],
                    "classes": ["Cleanliness", "Room", "Staff"],
                    "standard_label": "NONE",
                }
            ]
        }
    ]

Document Format Supported for Training and Produced from Prediction
-------------------------------------------------------------------

Depending on the task, the document format supported by :class:`DocumentModel` looks slightly different.

Class Task
~~~~~~~~~~

.. code-block:: python

    [
        {
            "segments": [
                {
                    "text": "The room was very nice, but the staff was bad.",
                    "tags": [{"class": "Room"}, {"class": "Staff"}],
                    "classes": ["Cleanliness", "Room", "Staff"],
                },
                {
                    "text": "The bathroom was dirty.",
                    "tags": [{"class": "Cleanliness"}],
                    "classes": ["Cleanliness", "Room", "Staff"],
                }
            ]
        }
    ]

For training, ``classes`` can be missing, in which case a global list of all classes, specified in :func:`~autonlu.DocumentModel.train`, is used.

Label Task
~~~~~~~~~~

.. code-block:: python

    [
        {
            "segments": [
                {
                    "text": "The room was very nice, but the staff was bad.",
                    "tags": [{"label": "NEU"}],
                },
                {
                    "text": "The bathroom was dirty.",
                    "tags": [{"label": "NEG"}],
                }
            ]
        }
    ]

Class-Label Task
~~~~~~~~~~~~~~~~

.. code-block:: python

    [
        {
            "segments": [
                {
                    "text": "The room was very nice, but the staff was bad.",
                    "tags": [{"class": "Room", "label": "POS"}, {"class": "Staff", "label": "NEG"}],
                    "classes": ["Cleanliness", "Room", "Staff"],
                    "standard_label": "NONE",
                },
                {
                    "text": "The bathroom was dirty.",
                    "tags": [{"class": "Cleanliness", "label": "NEG"}],
                    "classes": ["Cleanliness", "Room", "Staff"],
                    "standard_label": "NONE",
                }
            ]
        }
    ]

For training, ``classes`` can be missing, in which case a global list of all classes, specified in :func:`~autonlu.DocumentModel.train`, is used.

For training, ``standard_label`` may also be missing. It can then be supplied when calling :func:`~autonlu.DocumentModel.train`. If it is left out entirely, ``classes`` is ignored as well, and no class-label pairs are generated other than those explicitly mentioned in ``tags``.

-------------
DocumentModel
-------------

.. toctree::
   :maxdepth: 2
   :caption: Contents:

.. autoclass:: autonlu.DocumentModel
    :members: predict, train, save, evaluate, finetune, select_to_label, modeltype

------------------
Helper Functions
------------------

.. autofunction:: autonlu.get_segment_class_pairs_doc

.. autofunction:: autonlu.documents2simplemodelformat

.. autofunction:: autonlu.get_all_labels_from_documents
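
Because documents are plain dictionaries, their structure can be checked with ordinary Python before they are passed to the model or to the helpers listed here. The following is a minimal sketch of such a check, not part of ``autonlu``: the function name ``check_documents`` is hypothetical, the sample data (including the class ``Service``) is made up for illustration, and the rule that every tagged class should appear in ``classes`` is an assumption rather than a documented requirement.

.. code-block:: python

    from typing import Any, Dict, List


    def check_documents(documents: List[Dict[str, Any]]) -> List[str]:
        """Return a list of problems found; an empty list means none were found.

        Hypothetical helper for illustration only; it is not part of autonlu.
        """
        problems = []
        for d, doc in enumerate(documents):
            segments = doc.get("segments")
            if not isinstance(segments, list):
                problems.append(f"document {d}: 'segments' must be a list")
                continue
            for s, segment in enumerate(segments):
                where = f"document {d}, segment {s}"
                if not isinstance(segment, dict):
                    problems.append(f"{where}: segment must be a dictionary")
                    continue
                if not isinstance(segment.get("text"), str):
                    problems.append(f"{where}: 'text' must be a string")
                classes = segment.get("classes")
                for tag in segment.get("tags", []):
                    # A tag carries a class, a label, or both, depending on the task.
                    if "class" not in tag and "label" not in tag:
                        problems.append(f"{where}: tag {tag!r} has neither 'class' nor 'label'")
                    # Assumption: if a segment lists its classes, tagged classes are among them.
                    if classes is not None and "class" in tag and tag["class"] not in classes:
                        problems.append(f"{where}: class {tag['class']!r} is not in 'classes'")
        return problems


    documents = [
        {
            "segments": [
                {
                    "text": "The bathroom was dirty.",
                    "id": 12234,  # additional keys are allowed and not inspected
                    "tags": [{"class": "Cleanliness", "label": "NEG"}],
                    "classes": ["Cleanliness", "Room", "Staff"],
                },
                {
                    "text": "The staff was bad.",
                    "tags": [{"class": "Service", "label": "NEG"}],  # not in 'classes'
                    "classes": ["Cleanliness", "Room", "Staff"],
                },
            ]
        }
    ]

    for problem in check_documents(documents):
        print(problem)

Run on the sample data, the sketch reports a single problem, for the second segment, whose tag uses a class that is missing from its ``classes`` list. Keys such as ``id`` pass through untouched, mirroring the way additional key/value pairs are ignored during annotation.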