

Return status information about the current system


key (Optional[str]) – A JSON web token used for authentication. If no key is given, it is taken from the environment variable DO_PRODUCT_KEY instead.


A dictionary containing information about the currently used key, the reachability of selected sites (Huggingface, Studio, and Google as a general indicator of internet connectivity), package versions, and general system information. If no key was given or set as an environment variable, the “key” entry will be None.


>>> import autonlu, pprint
>>> pprint.pprint(autonlu.status())
{'key': {'info': {'exp': datetime.datetime(2030, 8, 3, 10, 50, 1),
                  'functionality': ['Analysis/*', 'Training/*', 'Models/*'],
                  'iat': datetime.datetime(2020, 8, 5, 10, 50, 1),
                  'iss': 'deepopinion.ai',
                  'languages': ['*'],
                  'sub': 3},
         'message': 'OK',
         'verified': True},
 'package_versions': {'autonlu': '0.2.0',
                      'numpy': '1.18.5',
                      'pynvml': '8.0.4',
                      'requests': '2.24.0',
                      'security': '0.3.5',
                      'sklearn': '0.23.2',
                      'tensorboard': '2.3.0',
                      'torch': '1.7.0',
                      'torch_optimizer': '0.0.1a16',
                      'tqdm': '4.48.1',
                      'transformers': '3.3.1'},
 'reachable': {'api.deepopinion.ai': True,
               'google.com': True,
               'huggingface.co': True},
 'sysinfo': {'architecture': 'x86_64',
             'hostname': '',
             'ip-address': '',
             'mac-address': '60:45:cb:86:22:51',
             'platform': 'Linux',
             'platform-release': '5.9.10-arch1-1',
             'platform-version': '#1 SMP PREEMPT Sun, 22 Nov 2020 14:16:59 ',
             'processor': '',
             'python_build': ('default', 'Sep 30 2020 04:00:38'),
             'python_version': '3.8.6',
             'ram': '31 GB'}}
>>> print("Key expires at:", autonlu.status()["key"]["info"]["exp"])
Key expires at: 2030-08-03 10:50:01
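
The returned dictionary can be processed like any other Python dictionary. As a minimal sketch, the helper below computes how many days remain until the key expires. The name days_until_expiry is hypothetical and not part of autonlu, the dictionary is hand-written to mimic the shape shown above, and exp is assumed to be a naive datetime that is comparable to the now argument.

```python
import datetime


def days_until_expiry(status, now=None):
    """Return the whole days left until the key expires, or None without a key.

    Hypothetical helper for illustration; it is not part of autonlu.
    """
    key = status.get("key")
    if key is None:  # no key given and DO_PRODUCT_KEY not set
        return None
    now = now or datetime.datetime.now()
    return (key["info"]["exp"] - now).days


# Hand-written dictionary that mimics the relevant part of the output above
example_status = {
    "key": {
        "info": {"exp": datetime.datetime(2030, 8, 3, 10, 50, 1)},
        "message": "OK",
        "verified": True,
    }
}
remaining = days_until_expiry(
    example_status, now=datetime.datetime(2030, 8, 1, 10, 50, 1)
)
```

Passing now explicitly keeps the calculation reproducible, which is convenient for monitoring scripts and tests.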
autonlu.split_dataset(*sets, split_at, seed=42)

Takes lists of elements of a dataset and splits off a random selection. Useful, for example, for train/validation splits.

  • sets – Any number of lists of the same length that should be split together. Elements that share a position before splitting end up in the same split and share a position afterwards.

  • split_at (float) – Value between 0 and 1. The fraction of the data to be used for the validation/test set.

  • seed (int) – Seed to use for the random part of the dataset splitting. If the same seed is provided, it is guaranteed that you will get the same split of the dataset every time. A fixed seed is used by default to prevent errors (a fixed seed is usually what you want). If you would like to not use a fixed seed, set seed to None.

Return type



A tuple containing each of the splits of all lists that were given to the function


>>> X = ["The room was nice", "We did not enjoy our stay", ...]
>>> Y = ["POS", "NEG", ...]
>>> trainX, trainY, testX, testY = split_dataset(X, Y, split_at=0.2)
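
The idea of splitting several lists together can be sketched in plain Python. The function split_together below is a hypothetical stand-in written for illustration; it is not autonlu's implementation, and the exact rounding and shuffling used by split_dataset may differ. It shuffles one shared list of indices, so elements at the same position always end up in the same split.

```python
import random


def split_together(*sets, split_at, seed=42):
    """Split several equally long lists with one shared random selection.

    Hypothetical illustration, not autonlu's implementation.
    Returns all training parts first, then all test parts.
    """
    if not sets:
        raise ValueError("at least one list is required")
    n = len(sets[0])
    if any(len(s) != n for s in sets):
        raise ValueError("all lists must have the same length")

    indices = list(range(n))
    # A fixed seed reproduces the same split; seed=None gives a fresh one
    random.Random(seed).shuffle(indices)

    n_test = int(round(n * split_at))
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])

    train = [[s[i] for i in train_idx] for s in sets]
    test = [[s[i] for i in test_idx] for s in sets]
    return (*train, *test)


X = list(range(10))
Y = [x * 10 for x in X]
trainX, trainY, testX, testY = split_together(X, Y, split_at=0.2)
```

Because both lists are indexed with the same shuffled indices, each element of trainX still lines up with its label in trainY.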

Returns True if the specified model is a valid model that can be loaded by AutoNLU


modelname (str) – Model to be checked. Can be anything that can also be used by SimpleModel to load a model.

Return type



True if the specified model is valid, False otherwise


>>> if autonlu.is_valid("/path/to/uploaded/model"):
...     pass  # Add model to database

Returns the model type of a specified model


modelname (str) – Model to be checked. Can be anything that can also be used by SimpleModel to load a model.

Return type



base, class, label, classlabel, or invalid


Returns best available device to use for classifier.

Can be overridden by the environment variable DO_DEVICE.

Return type



'cuda' if CUDA is available and 'cpu' otherwise. If the environment variable DO_DEVICE is set, its content will be returned instead.


>>> device = get_device()
>>> if device == "cuda":
...     print("CUDA is available")
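
The precedence described above (environment variable first, then CUDA availability) can be sketched as follows. resolve_device is a hypothetical name, not autonlu's code. CUDA availability is passed in as a flag so that the sketch runs without PyTorch; in practice the flag would typically come from torch.cuda.is_available(). Treating an empty DO_DEVICE as unset is an assumption of this sketch.

```python
import os


def resolve_device(cuda_available, environ=None):
    """Pick a device name, letting DO_DEVICE take precedence.

    Hypothetical illustration, not autonlu's implementation.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("DO_DEVICE")
    if override:  # assumption: an empty value counts as "not set"
        return override
    return "cuda" if cuda_available else "cpu"


default_gpu = resolve_device(True, environ={})
default_cpu = resolve_device(False, environ={})
forced_cpu = resolve_device(True, environ={"DO_DEVICE": "cpu"})
```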

Loads config.json given a filepath (file or directory)


config_path (str) – Path to the config.json or a directory containing config.json

Return type



Dictionary with the content of config.json


ValueError – In case the given filename does not exist or the given path does not contain a file named config.json


>>> config = get_config("/path/to/model/config.json")
>>> config = get_config("/path/to/model")
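
The documented behavior (accept a file or a directory, raise ValueError otherwise) can be sketched with the standard library alone. load_config is a hypothetical name and only mirrors the description above; it is not autonlu's implementation. The temporary directory exists purely to make the example runnable.

```python
import json
import os
import tempfile


def load_config(config_path):
    """Load config.json from a file path or from a directory containing it.

    Hypothetical illustration, not autonlu's implementation.
    """
    if os.path.isdir(config_path):
        config_path = os.path.join(config_path, "config.json")
    if not os.path.isfile(config_path):
        raise ValueError(f"No config.json found at {config_path}")
    with open(config_path, encoding="utf-8") as f:
        return json.load(f)


# Create a throwaway model directory to demonstrate both call styles
with tempfile.TemporaryDirectory() as model_dir:
    with open(os.path.join(model_dir, "config.json"), "w", encoding="utf-8") as f:
        json.dump({"num_labels": 2}, f)
    from_dir = load_config(model_dir)
    from_file = load_config(os.path.join(model_dir, "config.json"))
```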
autonlu.get_model_dir(model_name, organization=None, key=None, baseurl=None)

Returns the folder for a given model name or returns the name itself if the model seems to be a huggingface model

This ensures that SimpleModel, Model, and DocumentModel can simply use the returned value to load a model from different sources. In practice this method does not have to be called separately since SimpleModel, Model, and DocumentModel use it internally to resolve model names.

  • model_name (str) –

    Name of the model to be loaded. Can be a directory, the name of a model in Studio, or the name of a model in the Huggingface model repo.

  • organization (Optional[str]) – Name of organization (if any) to use

  • key (Optional[str]) – Token to authenticate a user. If not given, a token will be looked up in the environment variable DO_PRODUCT_KEY.

  • baseurl (Optional[str]) – Base url of the studio instance used for this call. If None, the environment variable DO_BASEURL will be used. If DO_BASEURL is not defined, the standard base-url will be used.

Return type



  • If model_name is a directory, the directory itself.

  • If model_name is the name of a model in Studio, the model is downloaded if necessary and the directory is returned. If the model was loaded previously, the path to the cached model will be returned

  • If model_name is the name of a model in the Huggingface model repo, the name itself is returned.


ValueError – If the model is found in the DO download API but cannot be downloaded


  • Load model from directory:

    >>> modeldir = get_model_dir("path/to/my/private/model")
    >>> model = SimpleModel(modeldir)
  • Load model from Studio:

    Assumes the environment variable DO_PRODUCT_KEY is correctly set

    >>> modeldir = get_model_dir("DeepOpinion/hotels_absa_en")
    >>> model = SimpleModel(modeldir)
  • Load model from Huggingface model repo:

    >>> modeldir = get_model_dir("bert-base-uncased")
    >>> model = SimpleModel(modeldir)
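
The resolution order described above can be summarized in a short sketch. Both resolve_model_name and the studio_download callback are hypothetical names used for illustration only. The real function talks to Studio and manages a cache, which is replaced here by a callable that returns a local directory or None.

```python
import os


def resolve_model_name(model_name, studio_download=None):
    """Resolve a model name to a directory or a Huggingface model name.

    Hypothetical illustration, not autonlu's implementation.
    studio_download stands in for the Studio lookup: it returns the local
    (possibly cached) directory for a known model, or None otherwise.
    """
    # 1. An existing directory is returned unchanged
    if os.path.isdir(model_name):
        return model_name
    # 2. A model known to Studio resolves to its downloaded or cached directory
    if studio_download is not None:
        local_dir = studio_download(model_name)
        if local_dir is not None:
            return local_dir
    # 3. Anything else is assumed to be a Huggingface model name
    return model_name


from_directory = resolve_model_name(os.getcwd())
from_studio = resolve_model_name(
    "DeepOpinion/hotels_absa_en", studio_download=lambda name: "/cache/hotels_absa_en"
)
from_hub = resolve_model_name("bert-base-uncased", studio_download=lambda name: None)
```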

Returns a list of all classes a model was trained on


sourcedir (str) – Path to a model containing a meta.json, a config.json, a classes.json, or an aspects.json file. If multiple files are available, they are used in the given order.

Return type



List of all classes, sorted alphabetically. If the model does not use classes (e.g. for a label-task) None will be returned


ValueError – If no classes could be found in any of the mentioned files


>>> modeldir = get_model_dir("en-base", key=TOKEN)
>>> classes = get_classes(modeldir)

Returns a list of all labels a model was trained on


sourcedir (str) – Path to a model containing either a meta.json or a config.json. If both are available, meta.json will be preferred.

Return type



List of labels. The order is important, since the model itself is internally trained on label numbers and those are the index of the label in the list. Returns None if no labels were found.


KeyError – If the labels given in config.json do not have contiguous indices.


Assumes the environment variable DO_PRODUCT_KEY is correctly set

>>> modeldir = get_model_dir("en-base")
>>> all_labels = get_all_labels(modeldir)
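
Why non-contiguous indices lead to a KeyError can be illustrated with a small sketch. It assumes that the labels are stored as an index-to-label mapping with string keys, in the style of the id2label entry of Huggingface configuration files; whether AutoNLU reads exactly this entry is an assumption, and labels_from_id2label is a hypothetical name.

```python
def labels_from_id2label(id2label):
    """Turn an index-to-label mapping into a list ordered by index.

    Hypothetical illustration, not autonlu's implementation.
    JSON object keys are strings, so indices are looked up as "0", "1", ...
    """
    labels = []
    for index in range(len(id2label)):
        # A missing index (non-contiguous numbering) raises KeyError here
        labels.append(id2label[str(index)])
    return labels


all_labels = labels_from_id2label({"0": "NONE", "1": "NEG", "2": "NEU", "3": "POS"})
```

The position of each label in the resulting list equals the label number the model was trained on, which is why the order must be preserved.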

Returns the standard label a model was trained on

It will be assumed that all classes without a specific label implicitly have the standard label (if the standard label is not None).

A standard label will generally be used in cases where one label occurs at a much higher frequency than others. For example, when solving aspect-based sentiment analysis, the labels will generally be ["NONE", "NEG", "NEU", "POS"], but since most aspects will not occur in most sentences, the NONE label will be predominant (often more than 90%) and it makes sense to make NONE the standard label. That is, if a class/aspect is not mentioned during training or after prediction in the annotated document, it is assumed to have the label NONE (i.e. it did not occur).

The standard label is actually not trained into the model and can be switched after training without any ill effects, but generally it will be fixed for a use case and is therefore associated with the model.


sourcedir (str) – Path to a model containing a meta.json

Return type



The name of the standard label (which has to occur in the list of all labels)


Assumes the environment variable DO_PRODUCT_KEY is correctly set

>>> modeldir = get_model_dir("en-hotels-absa")
>>> all_labels = get_all_labels(modeldir)
autonlu.get_segment_class_pairs(segments, all_classes)

Takes segments and all possible classes and returns a list of tuples containing all segment/class combinations

  • segments (List[str]) – List of all segments

  • all_classes (List[str]) – List of all possible classes

Return type

List[Tuple[str, str]]


List of (segment, class) tuples, containing all possible segment/class combinations


>>> segments = ["Room was clean", "Staff was unfriendly"]
>>> all_classes = ["Room", "Staff"]
>>> segclspair = get_segment_class_pairs(segments, all_classes)
segclspair == [("Room was clean", "Room"), ("Room was clean", "Staff"),
               ("Staff was unfriendly", "Room"), ("Staff was unfriendly", "Staff")]
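
The result is the Cartesian product of segments and classes, with segments varying slowest. A minimal sketch using itertools.product reproduces the example above; segment_class_pairs is a hypothetical name, not autonlu's implementation.

```python
from itertools import product


def segment_class_pairs(segments, all_classes):
    """Return every (segment, class) combination, grouped by segment.

    Hypothetical illustration, not autonlu's implementation.
    """
    return list(product(segments, all_classes))


pairs = segment_class_pairs(
    ["Room was clean", "Staff was unfriendly"], ["Room", "Staff"]
)
```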
autonlu.get_segment_class_pairs_with_labels(segments, classlabels, standard_label=None, all_classes=None, label_probabilities={})

Generates segment class pairs with an associated list of target labels from a list of segments and a list of classlabels

  • segments (List[str]) – Pieces of text for which the segment/class pairs should be generated

  • classlabels (List[List[Tuple[str, str]]]) – For each segment, a list of (class, label) tuples that assigns labels to classes

  • standard_label (Optional[str]) – If given, all classes are assumed to have this label if no specific label was given in classlabels. To work, also needs all_classes.

  • all_classes (Optional[List[str]]) – List of all possible classes that will be used to generate segment/class pairs for the standard label if no specific classlabel was given.

  • label_probabilities (Dict[str, float]) – A dictionary, mapping label names to the probability (number between 0 and 1) of that label occurring in the generated data. All labels not mentioned in label_probabilities are assumed to have a probability of 1. Can be used to subsample certain labels if they are overrepresented.

Return type

Tuple[List[Tuple[str, str]], List[int]]


A tuple (X, Y) where X is a list of segment/class pairs and Y a list of corresponding labels


  • Without standard_label and all_classes:

    >>> segments = ["Hello", "World"]
    >>> get_segment_class_pairs_with_labels(segments,
    ...     classlabels=[[("C1", "L1"), ("C2", "L3")], [("C2", "L4")]])
    X = [("Hello", "C1"), ("Hello", "C2"), ("World", "C2")]
    Y = ["L1", "L3", "L4"]
  • With standard_label and all_classes:

    >>> segments = ["Hello", "World"]
    >>> get_segment_class_pairs_with_labels(segments,
    ...     classlabels=[[("C1", "L1"), ("C2", "L3")], [("C2", "L4")]],
    ...     standard_label="L2",
    ...     all_classes=["C1", "C2", "C3"])
    X = [("Hello", "C1"), ("Hello", "C2"), ("Hello", "C3"), ("World", "C1"), ("World", "C2"), ("World", "C3")]
    Y = ["L1", "L3", "L2", "L2", "L4", "L2"]
  • With standard_label, all_classes and label_probabilities:

    >>> segments = ["Hello", "World"]
    >>> get_segment_class_pairs_with_labels(segments,
    ...     classlabels=[[("C1", "L1"), ("C2", "L3")], [("C2", "L4")]],
    ...     standard_label="L2",
    ...     all_classes=["C1", "C2", "C3"],
    ...     label_probabilities={"L2": 0.2})
    Possible output (samples with a label of "L2" will occur with a probability of approx. 20%):
    X = [("Hello", "C1"), ("Hello", "C2"), ("World", "C2"), ("World", "C3")]
    Y = ["L1", "L3", "L4", "L2"]
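
The interplay of standard_label, all_classes, and label_probabilities in the three cases above can be sketched in plain Python. pairs_with_labels and its rng parameter are hypothetical and only mirror the documented behavior; this is not autonlu's implementation. As a simplification, classes that appear in classlabels but not in all_classes are ignored when a standard label is used.

```python
import random


def pairs_with_labels(segments, classlabels, standard_label=None,
                      all_classes=None, label_probabilities=None, rng=None):
    """Build (segment, class) pairs and their labels.

    Hypothetical illustration, not autonlu's implementation.
    """
    label_probabilities = label_probabilities or {}
    rng = rng or random.Random()
    X, Y = [], []
    for segment, pairs in zip(segments, classlabels):
        explicit = dict(pairs)  # class -> label given for this segment
        if standard_label is not None and all_classes is not None:
            classes = list(all_classes)  # unlabeled classes get the standard label
        else:
            classes = [cls for cls, _ in pairs]
        for cls in classes:
            label = explicit.get(cls, standard_label)
            # Keep the sample with the label's probability (default 1.0)
            if rng.random() < label_probabilities.get(label, 1.0):
                X.append((segment, cls))
                Y.append(label)
    return X, Y


segments = ["Hello", "World"]
classlabels = [[("C1", "L1"), ("C2", "L3")], [("C2", "L4")]]
X_plain, Y_plain = pairs_with_labels(segments, classlabels)
X_std, Y_std = pairs_with_labels(
    segments, classlabels, standard_label="L2", all_classes=["C1", "C2", "C3"]
)
```

Setting a probability below 1 for the standard label subsamples only the pairs that carry that label; explicitly labeled pairs are unaffected unless their label is listed as well.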
autonlu.fix_seed(seed, cudnn_deterministic=True)

Fixes the seed for pytorch, numpy, python, etc.

Follows recommendations from https://pytorch.org/docs/stable/notes/randomness.html

Warning: Even all this does not ensure perfect determinism in all cases, since there is no way to make atomic operations from CUDA deterministic!

  • seed (int) – The seed that should be used for all random number generators (pytorch, numpy, python)

  • cudnn_deterministic (bool) – If set to True, it also makes cuDNN deterministic (if it is used). Warning: Might slow down training/inference.
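
A minimal sketch of such a seeding routine is shown below. seed_everything is a hypothetical name, and the body follows the PyTorch reproducibility notes linked above rather than autonlu's source, so the actual implementation may differ. NumPy and PyTorch are imported lazily so that the sketch also runs where they are not installed.

```python
import random


def seed_everything(seed, cudnn_deterministic=True):
    """Seed Python, NumPy, and PyTorch random number generators.

    Hypothetical illustration, not autonlu's implementation.
    """
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
    except ImportError:
        return  # PyTorch not installed; nothing more to seed
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
    if cudnn_deterministic:
        # Deterministic cuDNN kernels; may slow down training/inference
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


seed_everything(123)
first_draw = random.random()
seed_everything(123)
second_draw = random.random()
```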


Returns the base model of a huggingface transformer model (i.e. the pytorch model without the prediction heads etc.).