SimpleModel

Warning! SimpleModel is only documented for reference purposes and should not be used directly for production anymore!

class autonlu.SimpleModel(model_folder, key=None, baseurl=None, state_callback=None, stop_callback=None, encrypt=True, device=None, log_dir='tensorboard_logs', use_samplehash=True, trial=None, task=None)

A class implementing a versatile model that can be trained on (one of) various tasks. To determine which task the model shall learn, you can set the task argument. If the argument is not set, a "classification" task is assumed (for more information, see list of parameters).

The model is loaded the first time some operation is performed on it (e.g. predict(), train(), finetune(), …)

Parameters

model_folder (str) –
A path or name of the model that should be used. Will be sent through get_model_dir() and can therefore be:
- The path to a model
- The name of a model available in Studio
- The name of a model available in the Huggingface model repo
task (Optional[str]) –
Determines which task the model shall learn. Possible values are:

"classification"
The default, which is used when task is not set otherwise. The "classification" task comprises the subtasks classlabel, class and label, which are automatically derived from the target labels during training.

"token_classification"
Each word in a sample has a label. A typical example is named entity recognition (NER).

"question_answering"
Each sample consists of a (question, context) tuple. The model looks for one or more passages in the context which qualify as an answer to the question.
key (Optional[str]) – A JSON web token which is used for authentication. If no key is given, the key is alternatively taken from the environment variable DO_PRODUCT_KEY.
baseurl (Optional[str]) – Base URL of the Studio instance used for this call. If None, the environment variable DO_BASEURL will be used. If DO_BASEURL is not defined, the standard base URL will be used. In most cases this does not need to be changed unless you are working with an on-premise version of Studio.
state_callback (Optional[Callable]) –
Something callable (function or class with __call__ member function) taking one keyword argument progress, which is called with the current progress in percent after each batch. E.g.:
```
>>> # Print current progress after each batch
>>> def callback(progress):
>>>     print(f"Current progress = {progress}")
```
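The callback can also be an object with a __call__ method, which is convenient when the progress values should be kept for later inspection. A minimal sketch; the class name ProgressRecorder is hypothetical and not part of AutoNLU:

```python
class ProgressRecorder:
    """Hypothetical helper: stores every progress value it receives."""

    def __init__(self):
        self.history = []

    def __call__(self, progress):
        # Called with the keyword argument ``progress`` (percent).
        self.history.append(progress)


recorder = ProgressRecorder()
# Simulate the calls that would happen after three batches.
for pct in (10.0, 20.0, 30.0):
    recorder(progress=pct)
print(recorder.history)  # [10.0, 20.0, 30.0]
```

An instance such as recorder would then be passed as the state_callback argument.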
stop_callback (Optional[Callable]) –
Something callable (function or class with __call__ member function) taking no arguments. Is called after each batch and prediction or training is stopped if True is returned. E.g.:
```
>>> # Stop after 10 batches
>>> i = 0
>>> def callback():
>>>     global i
>>>     i += 1
>>>     if i >= 10:
>>>         return True
>>>     return False
```
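A callable object avoids the module-level counter altogether. The following sketch is only an illustration; the class name StopAfter is hypothetical:

```python
class StopAfter:
    """Hypothetical helper: asks for a stop once it was called max_batches times."""

    def __init__(self, max_batches):
        self.max_batches = max_batches
        self.calls = 0

    def __call__(self):
        self.calls += 1
        # Returning True requests that prediction or training stops.
        return self.calls >= self.max_batches


stop_after_10 = StopAfter(10)
# Simulate ten batches: only the tenth call returns True.
results = [stop_after_10() for _ in range(10)]
print(results[-1])  # True
```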
encrypt (bool) – If True, the model is encrypted on save.
device (Optional[str]) – Which device the model should be used on ("cpu" or "cuda"). If None, a device will be automatically selected: If a CUDA capable GPU is available, it will automatically be used, otherwise the cpu. This behavior can be overwritten by specifically setting the environment variable DO_DEVICE to either "cpu" or "cuda". autonlu.utils.get_best_device() is used to select the device.
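The selection order described above can be restated as a small function. This is a sketch under assumptions, not the code of autonlu.utils.get_best_device(): the name pick_device is hypothetical, CUDA availability is passed in as a flag instead of being detected, and it is assumed that an explicitly given device takes precedence over DO_DEVICE.

```python
import os


def pick_device(device=None, cuda_available=False, environ=None):
    """Hypothetical restatement of the device selection order."""
    environ = os.environ if environ is None else environ
    if device is not None:
        return device  # assumption: an explicit argument wins
    override = environ.get("DO_DEVICE")
    if override in ("cpu", "cuda"):
        return override  # the environment variable overrides auto-selection
    # Automatic selection: prefer a CUDA capable GPU if one is available.
    return "cuda" if cuda_available else "cpu"


print(pick_device(cuda_available=True, environ={}))                    # cuda
print(pick_device(cuda_available=True, environ={"DO_DEVICE": "cpu"}))  # cpu
```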
log_dir (Optional[str]) – Specifies in which directory Tensorboard logs should be written. If None, no logs will be written. The logs for individual runs will be put into subdirectories named after the current timestamp.
use_samplehash (bool) – If True (the default), a hash for all trained samples will be saved. These hashes are used during active learning (select_to_label()) to exclude sentences that have already been trained on. If False, these hashes will not be saved, which speeds up some processes, saves memory, and reduces the size of the saved model. It can be useful to disable the sample hash if huge amounts of training data are being used.
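To illustrate the idea behind sample hashes, the following sketch stores one digest per trained sample and filters candidates against that set. The hash function (SHA-256 over the joined text) and the helper name sample_hash are assumptions for illustration; how AutoNLU computes its hashes is not specified here.

```python
import hashlib


def sample_hash(sample):
    """Hash a sample that is either a string or a tuple of strings."""
    parts = (sample,) if isinstance(sample, str) else tuple(sample)
    # Join with a separator that is unlikely to occur in normal text.
    text = "\x1f".join(parts)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


trained = [("The room was nice", "Room"), "Great breakfast"]
seen = {sample_hash(s) for s in trained}

candidates = ["Great breakfast", "The pool was cold", ("The room was nice", "Room")]
# Keep only candidates that were not trained on before.
unseen = [c for c in candidates if sample_hash(c) not in seen]
print(unseen)  # ['The pool was cold']
```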
trial (Optional[Trial]) – A trial represents a single setup for automatic hyperparameter optimization. See also https://optuna.readthedocs.io/en/stable/reference/trial.html

Variables

device – The device the model is running on ("cpu" or "cuda" (when running on a GPU))

predict(X, batchsize=128, verbose=False, dynamic_quantization=False, markup_dict=None, is_markup=False)

Predicts the correct results for a list of samples X, depending on the task the model was trained for.

Parameters

X (List) –
A list of samples for which we do the prediction. The format of the list elements depends on the task the model was trained on, i.e. on SimpleModel.task (set at initialization). Generally, one should use the same data format that was used for training. In the following, we review the data format as determined by SimpleModel.task:
"classification"
X is a list of samples, where each sample is a string or a tuple of strings.

"token_classification"
Each single word is associated with its own label. To this end, it’s important to know what qualifies as a word (is e.g. “555 2131” one word or two?). To provide clarity, individual samples and the returned results can have two distinct formats:

lists of words

>>> X = [["Tom", "was", "in", "London", "."], ["Lisa", "loves", "Paris", "."]]

markup language texts

>>> X = ["<person>Tom</person> was in <place>London</place>.",
>>>      "<person>Lisa</person> loves <place>Paris</place>."]

You have to choose one method. Mixtures of both methods are not allowed. A few more words on the usage of markup language: If you decide to use markup language, the argument is_markup has to be set to True. However, at prediction time the correct labels are usually not known. Therefore, a plain text (string) without label tags counts as markup text input. If the input contains label tags anyway, they are ignored. The results are returned as markup language texts, this time with label tags. Further, prediction with markup language texts requires a markup_dict. If the model was trained with markup language, the model remembers the markup_dict of the training. Otherwise, you have to provide it explicitly as an argument.
"question_answering"
Each sample is a tuple consisting of a question and a context in which the answer can be found. As for "token_classification", questions and contexts can be given either entirely as lists of words or as markup language texts. If markup language is used, the argument is_markup has to be set to True. Further, if you use markup language here but did not train the model on markup language, a markup_dict has to be provided. Examples:

list of words

>>> X = [(["What", "color", "do", "bananas", "have", "?"],
>>>       ["Tomatoes", "are", "red", "and", "bananas", "are", "yellow", "."])]

markup language texts

>>> X = [("What color do bananas have?",
>>>       "Tomatoes are red and bananas are <answer>yellow</answer>.")]
batchsize (int) – Number of samples to predict in one inference step. A higher value generally means higher throughput, but the system might run out of memory if this is set too high. If the GPU memory runs out, the system will automatically switch to smaller batch sizes without loss of data, but the switching takes time. Values higher than 128 (the default) usually do not increase the performance by much.
dynamic_quantization (bool) – If enabled, forward propagation is executed with lower precision (int8) to speed up predictions. This is only supported on the CPU. Warning: This feature could reduce the accuracy of your model.
is_markup (bool) – For the tasks "token_classification" and "question_answering", results can be returned in the form of markup language texts (see explanation for X). In this case, is_markup has to be set to True. The standard value is False.
markup_dict (Optional[dict]) –
Optional. A markup_dict is only needed when X contains markup language texts. When the model was trained on markup language, the model remembers the markup_dict of the training. You only need to provide it explicitly if you want to use a different markup_dict than the one used for training. A markup_dict is a dictionary whose keys are the label numbers > 0 (i.e. 1, 2, 3, …). The 0 label corresponds to the unmarked label, that is, the label which includes everything with no special label (hence, it needs no translation into markup tags). The values of the dictionary are tuples of the start and end tags of the label. Example:
```
>>> markup_dict = {1: ("<person>", "</person>"), 2: ("<place>", "</place>"), 3: ("<animal>", "</animal>")}
```
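The mapping between label numbers and tags can be illustrated with a small helper that renders a list of words and their labels as markup text. This is only a sketch: the function name to_markup is hypothetical, words are simply joined with spaces (so the spacing around punctuation differs from natural text), and adjacent words with the same label are wrapped in a single tag pair.

```python
from itertools import groupby

markup_dict = {1: ("<person>", "</person>"), 2: ("<place>", "</place>")}


def to_markup(words, labels, markup_dict):
    """Wrap runs of equally labelled words in the tags from markup_dict."""
    pieces = []
    for label, group in groupby(zip(words, labels), key=lambda pair: pair[1]):
        text = " ".join(word for word, _ in group)
        if label == 0:
            pieces.append(text)  # label 0 is the unmarked label, no tags
        else:
            start, end = markup_dict[label]
            pieces.append(f"{start}{text}{end}")
    return " ".join(pieces)


print(to_markup(["Tom", "was", "in", "London", "."], [1, 0, 0, 2, 0], markup_dict))
# <person>Tom</person> was in <place>London</place> .
```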
verbose (bool) – If True, a progress bar is shown during prediction

Return type

Dict

Returns

A dictionary containing probabilities, labels, entropies and logits. If the human correction system was set up (i.e. SimpleModel.calculate_human_correction_data() was called), the dictionary also contains mistake_probabilities, which gives an estimated upper bound on the probability (range [0, 1]) that the prediction might be incorrect. In case of a "token_classification" or "question_answering" task, word_lists and markup_text are included, too. probabilities and logits are 2D numpy arrays containing the probabilities/logits for all possible labels (in the order of self.all_labels), and entropies is a 1D numpy array containing the entropies for all the predicted samples. labels contains the label with the highest probability for each predicted sample. manual_check_recommended indicates whether the prediction should ideally be checked by a human for maximal accuracy. If no threshold was set using the human correction system, AutoNLU will not recommend any samples for manual checkup.

Raises

ValueError – If the product key could not be authorized

Example

Assumes the environment variable DO_PRODUCT_KEY is correctly set

>>> m = SimpleModel(model_folder="DeepOpinion/hotels_absa_en")
>>> segments = [("The room was nice, but the staff was unfriendly", "Room"),
>>>    ("The room was nice, but the staff was unfriendly", "Staff")]
>>> res = m.predict(segments)
res = {'labels': ['POS', 'NEG'],
       'probabilities': array([[0.00974869, 0.00836688, 0.09056191, 0.89132255],
                              [0.01662647, 0.9558573 , 0.01873435, 0.00878185]], dtype=float32),
       'logits': array([[-1.8422663, -1.9951179,  0.3866345,  2.6733072],
                       [-1.0473781,  3.004235 , -0.9280151, -1.6856858]], dtype=float32),
       'entropies': array([0.40521544, 0.227365  ], dtype=float32),
       'manual_check_recommended': [False, False]}
m.all_labels = ['NONE', 'NEG', 'NEU', 'POS']
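
The numbers in this example are consistent with probabilities being the softmax of logits, entropies being the Shannon entropy (natural logarithm) of each probability row, and labels being the entry of all_labels with the highest probability. The following sketch recomputes these values for the first sample using only the standard library; the helper names are illustrative and not part of AutoNLU.

```python
import math

all_labels = ["NONE", "NEG", "NEU", "POS"]
logits = [-1.8422663, -1.9951179, 0.3866345, 2.6733072]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero probabilities contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


probs = softmax(logits)
label = all_labels[probs.index(max(probs))]
print(label, round(entropy(probs), 3))  # POS 0.405
```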

prune(layers_to_prune)

Set the layers of a model which should be pruned (i.e. not used and removed during training).

Only call this function if you want to prune specific layers and know their number. In most cases you will want to use auto_prune().

Parameters: layers_to_prune (List[int]) – A list of integers, containing all layer_ids that should be pruned. Therefore, layer_id ∈ [0, num_hidden_layers].

auto_prune(X, Y, valX=None, valY=None, valsplit=0.1, num_layers_to_prune=6, always_prune=None, max_num_samples=40000, epochs=3, verbose=False)

Only for "classification" tasks (the default if task is not set otherwise during initialization). Automatically selects the best layers to prune from the current model by using a greedy search strategy: each remaining layer is left out in turn, and the layer whose removal gives the highest accuracy is selected for pruning. When more layers are to be pruned, it is assumed that the previous selection is also a good starting point. Note that this method internally remembers the layers to be pruned, so calling train() after auto_prune() is sufficient for the pruned layers to be ignored.

Parameters

X (List[Union[str, Tuple[str, str]]]) – Input samples. Either a list of strings for text classification or a list of pairs of strings for text pair classification.
Y (List[str]) – Training target. List containing the correct labels as strings.
valX (Optional[List[Union[str, Tuple[str, str]]]]) – Input samples used for validation of the model during training. E.g. for stopping training early if there is no progress anymore or to report the current score via the score_callback. Same format as X. If None, a part of X will be split off.
valY (Optional[List[str]]) – Training target used for validation of the model during training. E.g. for stopping training early if there is no progress anymore or to report the current score via the score_callback. Same format as Y. If None, a part of Y will be split off.
valsplit (float) – If valX or valY is not given, specifies how much of the training data should be split off for validation. Default is 10%.
num_layers_to_prune (int) – How many layers should be pruned from the given architecture.
always_prune (Optional[List[int]]) – A list of layer ids that should always be pruned, independent of what the greedy heuristic selects.
max_num_samples (int) – The maximum number of training-samples to use by the greedy heuristic for training different candidates. Decreasing this number increases the speed, but decreases the accuracy of the final pruned model.
epochs (int) – Number of epochs to train a candidate before evaluating the accuracy. Decreasing this value increases the speed to find layers to prune, but also decreases the accuracy of the final pruned model.
verbose (bool) – If True, information about the overall progress of finding layers to prune is shown.
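
The greedy strategy can be sketched independently of any model. In the sketch below, score_without is a hypothetical stand-in for "train a candidate with these layers removed and return its validation accuracy"; here it is replaced by a toy function. It is also an assumption of this sketch that the layers in always_prune count towards num_layers_to_prune.

```python
def greedy_prune(num_layers, num_layers_to_prune, score_without, always_prune=()):
    """Greedily pick layers to prune, one at a time.

    score_without(pruned) must return a score (higher is better) for a
    model with the layer ids in ``pruned`` removed.
    """
    pruned = list(always_prune)
    while len(pruned) < num_layers_to_prune:
        candidates = [i for i in range(num_layers) if i not in pruned]
        # Try leaving out each remaining layer and keep the best choice.
        best = max(candidates, key=lambda i: score_without(pruned + [i]))
        pruned.append(best)
    return pruned


# Toy scoring: each layer has a fixed importance, and removing
# important layers costs more accuracy.
importance = [0.9, 0.1, 0.5, 0.05, 0.7, 0.2]


def toy_score(pruned):
    return 1.0 - sum(importance[i] for i in pruned)


print(greedy_prune(6, 3, toy_score))  # [3, 1, 5]
```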

train(X, Y=None, valX=None, valY=None, valsplit=0.1, do_evaluation=True, label_probabilities={}, epochs=2000, do_early_stopping=None, seed=None, learning_rate=2e-05, batchsize=32, autobatchsize=False, metric_callback=None, score_callback=None, mindatasetsize=70000, maxdatasetsize=200000, val_metric='val_accuracy', val_maximize=True, patience_epochs=2, lr_reduction_patience=1, lr_reduction_factor=0.1, epsilon=0.0001, decay_func_name='exp_sqr', nb_opti_steps=625, total_lr_decay=0.0625, verbose=False, is_markup=False, markup_dict=None, *, calculate_human_correction_data=True)

Trains a model on a specific task determined by SimpleModel.task. If you did not specify SimpleModel.task in the initialization, SimpleModel.task is set to "classification".

SimpleModel.train() offers two different methods of training, which differ in the way the learning rate is adjusted and under which conditions the training is stopped.

The “switch” to choose between the two methods is the argument do_early_stopping. When set to True, the model will be tested on the validation data in regular intervals. Depending on the test results, the learning rate might be reduced or the training might be stopped if the model does not improve anymore. If do_early_stopping is set to False, the training runs nb_opti_steps optimization steps and proceeds independently of the evaluation. After each optimization step, the learning_rate is slightly reduced. If do_early_stopping is not specified by the user, do_early_stopping is set to False for OMI models and True for other models. Both training methods come with specific arguments.

Parameters

X (List) –
A list of training samples. The format of the list elements depends on the specific training task given by SimpleModel.task (set in initialization). SimpleModel.task can have the following values:
"classification"
X is a list of samples, where each sample is a string or a pair of strings.

"token_classification"
Each single word is associated with its own label. To this end, it’s important to know what qualifies as a word (is e.g. “555 2131” one word or two?). To provide clarity, individual training samples can be provided in two distinct ways:

lists of words

>>> X = [["Tom", "was", "in", "London", "."], ["Lisa", "loves", "Paris", "."]]

markup language texts

>>> X = ["<person>Tom</person> was in <place>London</place>.",
>>>      "<person>Lisa</person> loves <place>Paris</place>."]

You have to choose one method. Mixtures of both methods are not allowed. If you decide to use markup language, the argument is_markup has to be set to True.
"question_answering"
List of samples, where each sample is a tuple consisting of a question and a context in which the answer can be found. As for "token_classification", questions and contexts can be given either entirely as lists of words or as markup language texts. If markup language is used, the argument is_markup has to be set to True. Examples:

list of words

>>> X = [(["What", "color", "do", "bananas", "have", "?"],
>>>       ["Tomatoes", "are", "red", "and", "bananas", "are", "yellow", "."])]

markup language texts

>>> X = [("What color do bananas have?",
>>>       "Tomatoes are red and bananas are <answer>yellow</answer>.")]
Y (Optional[List]) –
Training targets. List containing the correct answers. As for X, the format of Y depends on the value of SimpleModel.task. SimpleModel.task can have the following values:
"classification"
Y is a list containing the correct labels as strings.

"token_classification"
If X is given as a list of markup language texts, the label information is already encoded into X. Hence, no Y is needed (i.e. Y = None). If the samples in X are provided as lists of words, Y must be provided as lists of word-labels. A word-label can be a number or a string, e.g.:

Y = [[1, 0, 0, 2, 0], [1, 0, 2, 0]]

Y = [["1", "0", "0", "2", "0"], ["1", "0", "2", "0"]]

Y = [["person", "", "", "place", ""], ["person", "", "place", ""]]
"question_answering"
If X is given as markup language (i.e. a list of tuples of markup language texts), Y has to be None. If the question(s) and context(s) in X are provided as lists of words, the answer(s) must be provided as list(s) of word labels with exactly one label for each word in the context. Each label can take one of two values denoting "word is part of the answer" or "word is not part of the answer". You can name these two labels as you like, but you have to stick to these names for all samples, e.g.:

Y = [[0, 0, 0, 0, 0, 0, 0, 1, 1, 0], [0, 0, 1, 0, 0]]

Y = [["0", "0", "0", "0", "0", "0", "0", "1", "1", "0"], ["0", "0", "1", "0", "0"]]

Y = [["", "", "", "", "", "", "", "answer", "answer", ""], ["", "", "answer", "", ""]]

Remark: In case your dataset contains several answers, you can also provide a list of answers to each question (so Y is a list (over samples) of lists (over answers) of lists of labels). When you provide a list of answers, you need to do it for every sample (a list with one element (list) is okay, too).
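When the data is available as word lists plus the answer words, the label list for Y can be derived by locating the answer inside the context. A minimal sketch, assuming the answer occurs verbatim as a contiguous run of context words; the helper name answer_labels is hypothetical.

```python
def answer_labels(context, answer):
    """Return one 0/1 label per context word (1 = part of the answer)."""
    labels = [0] * len(context)
    n = len(answer)
    for start in range(len(context) - n + 1):
        if context[start:start + n] == answer:
            labels[start:start + n] = [1] * n
            break  # mark only the first occurrence
    return labels


context = ["Tomatoes", "are", "red", "and", "bananas", "are", "yellow", "."]

# One answer per question: Y holds one label list per sample.
y_single = answer_labels(context, ["yellow"])
print(y_single)  # [0, 0, 0, 0, 0, 0, 1, 0]

# Several answers per question (e.g. "Which colors are mentioned?"):
# one label list per answer, collected in a list for this sample.
y_multi = [answer_labels(context, ["red"]), answer_labels(context, ["yellow"])]
```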
valX (Optional[List]) – Input samples used for validation of the model during training. E.g. for stopping training early if there is no progress anymore or to report the current score via the score_callback. Same format as X. If None and do_evaluation is True, a part of X will be split off.
valY (Optional[List]) – Training target used for validation of the model during training. E.g. for stopping training early if there is no progress anymore or to report the current score via the score_callback. Same format as Y. If None and do_evaluation is True, a part of Y will be split off.
valsplit (float) – If do_evaluation is True and valX or valY is not given, specifies how much of the training data should be split off for validation. Default is 10%.
do_evaluation (bool) – If set to False no evaluation is done. This also means that early stopping is automatically deactivated
is_markup (bool) – For the tasks "token_classification" and "question_answering" the training data can be provided as markup language text (see explanation for X). In this case, is_markup has to be set to True. The standard value is False.
markup_dict (Optional[dict]) –
Optional. A markup_dict is only needed when X contains markup language texts. But even in that case, the markup_dict is automatically generated from the input data if not provided explicitly. markup_dict is a dictionary whose keys are the label numbers > 0 (i.e. 1, 2, 3, …). The 0 label corresponds to the unmarked label - that is the label which includes everything with no special label (hence, it needs no translation into markup tag). The values of the dictionary are tuples of the start and end tags of the label. Example:
```
>>> markup_dict = {1: ("<person>", "</person>"), 2: ("<place>", "</place>"), 3: ("<animal>", "</animal>")}
```
When you train a model with markup language and also want to be able to use the model with word lists, you need to know which label number (key) corresponds to which label tags (value). In this case, it's recommended to provide an explicit markup_dict where you control the key-value mapping. Providing an explicit markup_dict also lets you omit certain label types by simply not mentioning them in the markup_dict. This can be useful when your markup text contains more label types than you want to train. Labels which are not mentioned in the markup_dict are not trained.
label_probabilities (Dict[str, float]) – Only for the "classification" task (the default value of SimpleModel.task). A dictionary mapping label names to the probability (number between 0 and 1) of that label being used for training. All labels not mentioned in label_probabilities are assumed to have a probability of 1. Can be used to subsample certain labels if they are overrepresented.
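The effect of label_probabilities can be pictured as an independent keep-or-drop decision per sample. The sketch below is an assumption about the mechanism, written for illustration; the function name subsample and the use of Python's random module are not taken from AutoNLU.

```python
import random


def subsample(X, Y, label_probabilities, seed=None):
    """Keep each sample with the probability assigned to its label."""
    rng = random.Random(seed)
    kept_X, kept_Y = [], []
    for x, y in zip(X, Y):
        # Labels that are not mentioned default to probability 1.
        if rng.random() < label_probabilities.get(y, 1.0):
            kept_X.append(x)
            kept_Y.append(y)
    return kept_X, kept_Y


X = [f"sample {i}" for i in range(1000)]
Y = ["NONE"] * 900 + ["POS"] * 100
# Keep roughly 20% of the overrepresented NONE samples, all POS samples.
sub_X, sub_Y = subsample(X, Y, {"NONE": 0.2}, seed=0)
print(sub_Y.count("POS"), sub_Y.count("NONE"))
```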
seed (Optional[int]) – Fix the random seed to make training deterministic (i.e. with the same seed and the same input data in the same order, the resulting model should be identical). Warning! Setting a seed can slow down training.
learning_rate (float) – The learning rate to be used during training. Higher learning rates will lead to faster convergence, but might lead to worse overall accuracy and if the learning rate is set too high, the system might not learn anything.
batchsize (int) – The number of samples to use in one training step. This also sets the number of samples to accumulate for one weight update if the number is bigger than 32 (at a minimum, 32 samples are always accumulated). A higher value generally means higher throughput, but the system might run out of memory if this is set too high. In cases where the GPU memory is running out, the system will automatically switch to smaller batch sizes without loss of data. The wrong batch size might also inhibit proper training.
autobatchsize (bool) – Deprecated! This option should not be used anymore and will be removed. With the new dynamic batchsize lowering on CUDA memory error, this is not needed anymore! If True the batchsize will be determined automatically. If True, the parameter batchsize gives the maximal batchsize to use.
metric_callback (Optional[Callable]) –
Something callable (function or class with __call__ function) that takes two keyword arguments Y_true (containing true label numbers from the validation dataset) and Y_pred (containing the label numbers predicted by the currently trained model) and returns a metric, which will be passed as an argument to score_callback. Used to define the metric (e.g. accuracy) to use for the reported score. E.g.:
```
>>> # Return accuracy as a metric
>>> import numpy as np
>>> def callback(Y_true, Y_pred):
>>>     return np.sum(Y_true == Y_pred) / len(Y_true)
```
score_callback (Optional[Callable]) –
Something callable (function or class with __call__ function) taking one keyword argument score that is filled with the output of metric_callback and evaluated in regular intervals during training. E.g.:
```
>>> # Print current score
>>> def callback(score):
>>>     print(f"Current score = {score}")
```
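How the two callbacks fit together can be sketched with a stand-in for the validation step. The function report_validation below is hypothetical and only mimics the calling convention described above (keyword arguments Y_true/Y_pred and score); it is not AutoNLU code.

```python
def accuracy(Y_true, Y_pred):
    """metric_callback: fraction of matching label numbers."""
    matches = sum(1 for t, p in zip(Y_true, Y_pred) if t == p)
    return matches / len(Y_true)


scores = []


def record_score(score):
    """score_callback: collect every reported score."""
    scores.append(score)


def report_validation(Y_true, Y_pred, metric_callback, score_callback):
    # Hypothetical stand-in for one validation pass during training.
    score = metric_callback(Y_true=Y_true, Y_pred=Y_pred)
    score_callback(score=score)


report_validation([1, 0, 2, 1], [1, 0, 1, 1], accuracy, record_score)
print(scores)  # [0.75]
```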
verbose (bool) – If True, information about the training progress will be shown on the terminal.
do_early_stopping (Optional[bool]) –
If True, early stopping will be used. I.e. the model will be tested on the validation data in regular intervals and training will be stopped if the model does not improve anymore. If False, a preset schedule of nb_opti_steps optimization steps is used combined with a decaying learning rate.
Arguments used when do_early_stopping is True:
- epochs: The maximum number of epochs used for training.
- mindatasetsize: Early stopping assumes the dataset size to be at least mindatasetsize. A large mindatasetsize in essence means that the patience for early stopping will be increased. Default is 70,000 to train small datasets longer since this works better in practice. A value of 70,000 in essence means that datasets with fewer than 70,000 samples will be trained for as long as a dataset with 70,000 samples.
- maxdatasetsize: Early stopping assumes the dataset size to be at most maxdatasetsize. A small maxdatasetsize in essence means that the patience for early stopping will be decreased. Default is 200,000 to train large datasets for a shorter time. A value of 200,000 in essence means that datasets with more than 200,000 samples will only be trained for as long as a dataset with 200,000 samples. This does NOT mean that only 200,000 of the samples will be used. All the data is still being utilized. This only influences at which point early stopping decides that a model does not improve anymore!
- val_metric: The validation metric to use for the BestModelKeeper (i.e. which metric should be used to determine if a model is better than another one). Generally this should not be changed from val_accuracy.
- val_maximize: If True, a higher value of val_metric is considered better, if False, a smaller value is considered better. Has to fit the specific metric in val_metric
- patience_epochs: Defines how many epochs are waited without the model improving before the training is stopped.
- lr_reduction_patience: Proportion of one epoch to wait without improvement until the learning rate is reduced.
- lr_reduction_factor: The factor with which the learning rate is multiplied if the patience runs out.
- epsilon: Maximal difference in the metric for early stopping that should be considered identical.
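The interplay of mindatasetsize and maxdatasetsize amounts to clamping the dataset size that early stopping works with. A sketch of that clamping with a hypothetical helper name; how AutoNLU turns this size into an actual patience is not described here.

```python
def assumed_dataset_size(num_samples, mindatasetsize=70000, maxdatasetsize=200000):
    """Dataset size that early stopping assumes, clamped to [min, max]."""
    return max(mindatasetsize, min(num_samples, maxdatasetsize))


for n in (5_000, 120_000, 1_000_000):
    print(n, "->", assumed_dataset_size(n))
# 5000 -> 70000
# 120000 -> 120000
# 1000000 -> 200000
```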
Arguments used when do_early_stopping is False:
- decay_func_name: Describes the kind of learning rate decay. Options are "linear", "exp" and "exp_sqr".
- nb_opti_steps: The number of optimization steps after which the training is stopped.
- total_lr_decay: Describes the factor by which the initial learning_rate has been reduced when the training ends. This total learning rate reduction is the result (product) of many small reductions made after each optimization step. These small reductions are calculated from total_lr_decay, nb_opti_steps and decay_func_name.
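To make the relation between total_lr_decay, nb_opti_steps and the per-step reductions concrete, the sketch below computes learning rates for two plausible schedules. The formulas are assumptions, chosen so that the learning rate ends at learning_rate * total_lr_decay after nb_opti_steps steps; AutoNLU's exact formulas are not spelled out here, and "exp_sqr" is left out because its definition is not given.

```python
def lr_at_step(step, learning_rate=2e-05, nb_opti_steps=625,
               total_lr_decay=0.0625, decay_func_name="exp"):
    """Learning rate after ``step`` optimization steps (assumed formulas)."""
    fraction = step / nb_opti_steps
    if decay_func_name == "exp":
        # Constant factor per step; the product over all steps is total_lr_decay.
        return learning_rate * total_lr_decay ** fraction
    if decay_func_name == "linear":
        # Straight line from learning_rate down to learning_rate * total_lr_decay.
        return learning_rate * (1.0 - (1.0 - total_lr_decay) * fraction)
    raise ValueError(f"unsupported decay_func_name: {decay_func_name!r}")


# Both schedules start at 2e-05 and end at 2e-05 * 0.0625 = 1.25e-06.
print(lr_at_step(0), lr_at_step(625), lr_at_step(625, decay_func_name="linear"))
```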
calculate_human_correction_data (bool) – If True, the human correction system is automatically set up using the validation data (if present)

Example

Assumes the environment variable DO_PRODUCT_KEY is correctly set

>>> m = SimpleModel(model_folder="albert-base-v2")
>>> segments = [("The room was nice, but the staff was unfriendly", "Room"),
>>>    ("The room was nice, but the staff was unfriendly", "Staff")]
>>> valsegments = [("The room was beautiful, but the staff was rude", "Room"),
>>>    ("The room was beautiful, but the staff was rude", "Staff")]
>>> m.train(X=segments, Y=["POS", "NEG"], valX=valsegments, valY=["POS", "NEG"])

evaluate(X, Y=None, batchsize=128, dynamic_quantization=False, verbose=False, markup_dict=None, is_markup=False, none_label=None)

Evaluates a model on given data and returns different performance metrics

Parameters

X (List) –
A list of samples to be evaluated. The format of the list elements depends on the specific task the model was trained on and is determined by the value SimpleModel.task (set in initialization). SimpleModel.task can have the following values:
classification
X is either a list of strings for text classification or a list of pairs of strings for text pair classification.

token_classification
Each single word is associated with its own label. To this end, it’s important to know what qualifies as a word (is e.g. “555 2131” one word or two?). To provide clarity, individual samples can be provided in two distinct ways:

lists of words

>>> X = [["Tom", "was", "in", "London", "."], ["Lisa", "loves", "Paris", "."]]

markup language texts

>>> X = ["<person>Tom</person> was in <place>London</place>.",
>>>      "<person>Lisa</person> loves <place>Paris</place>."]

You have to choose one method. Mixtures of both methods are not allowed. If you decide to use markup language, the argument is_markup has to be set to True. Further, evaluation with markup language texts requires a markup_dict. If the model was trained with markup language, the model remembers the markup_dict of the training. Otherwise, you have to provide it explicitly as an argument.
question_answering
Each sample is a tuple consisting of a question and a context in which the answer can be found. As for “token_classification”, questions and contexts can be either given entirely as list of words or as markup language texts. If markup language is used, the argument is_markup has to be set to True. Further, if you use markup language here but did not train the model on markup language, a markup_dict has to be provided. Examples:

list of words

>>> X = [(["What", "color", "do", "bananas", "have", "?"],
>>>       ["Tomatoes", "are", "red", "and", "bananas", "are", "yellow", "."])]

markup language

>>> X = [("What color do bananas have?",
>>>       "Tomatoes are red and bananas are <answer>yellow</answer>.")]
Y (Optional[List]) –
List containing the correct answers. As for X, the format of Y depends on the value of SimpleModel.task. SimpleModel.task can have the following values:
"classification"
Y is a list containing the correct labels as strings.

"token_classification"
If X is given as a list of markup language texts, the label information is already encoded into X. Hence, no Y is needed (i.e. Y = None). If the samples in X are provided as lists of words, Y must be provided as lists of word-labels. A word-label can be a number or a string, e.g.:

Y = [[1, 0, 0, 2, 0], [1, 0, 2, 0]]

Y = [["1", "0", "0", "2", "0"], ["1", "0", "2", "0"]]

Y = [["person", "", "", "place", ""], ["person", "", "place", ""]]
"question_answering"
If X is given as markup language (i.e. a list of tuples of markup language texts), Y has to be None. If the question(s) and context(s) in X are provided as word lists, the answer(s) must be provided as list(s) of word labels with exactly one label for each word in the context. Each label can take one of two values denoting "word is part of the answer" or "word is not part of the answer". You can name these two labels as you like, but you have to stick to these names for all samples, e.g.:

Y = [[0, 0, 0, 0, 0, 0, 0, 1, 1, 0], [0, 0, 1, 0, 0]]

Y = [["0", "0", "0", "0", "0", "0", "0", "1", "1", "0"], ["0", "0", "1", "0", "0"]]

Y = [["", "", "", "", "", "", "", "answer", "answer", ""], ["", "", "answer", "", ""]]

Remark: In case your dataset contains several answers, you can also provide a list of answers to each question (so Y is a list (over samples) of lists (over answers) of lists of labels). When you provide a list of answers, you need to do it for every sample (a list with one element (list) is okay, too).
is_markup (bool) – For the tasks “token_classification” and “question_answering” the data can be provided as markup language text (see explanation for X). In this case, is_markup has to be set to True. The standard value is False.
markup_dict (Optional[dict]) –
Optional. A markup_dict is only needed when X contains markup language texts. When the model was trained on markup language, the model remembers the markup_dict of the training. You only need to provide it explicitly if you want to use a different markup_dict than the one used for training. A markup_dict is a dictionary whose keys are the label numbers > 0 (i.e. 1, 2, 3, …). The “0” label corresponds to the unmarked label, that is, the label which includes everything with no special label (hence, it needs no translation into markup tags). The values of the dictionary are tuples of the start and end tags of the label. Example:
```
>>> markup_dict = {1: ("<person>", "</person>"), 2: ("<place>", "</place>"), 3: ("<animal>", "</animal>")}
```
none_label (Optional[str]) – Optional. Defines which label should be considered as the NONE label. Will be used for f1_binary. If this is None (the default), the f1_binary metric does not make much sense and will not be returned.
batchsize (int) – Number of samples to predict in one inference step. A higher value generally means higher throughput, but the system might run out of memory if this is set too high. In cases where the GPU memory is running out the system will automatically switch to smaller batch sizes without loss of data, but the switching takes time. Higher values than 128 (the default) usually does not increase the performance by much.
dynamic_quantization (bool) – If enabled, forward propagation is executed with lower precision (int8) to speed up predictions. This is only supported on the CPU. Warning: This feature could reduce the accuracy of your model.
verbose (bool) – If True, information about the evaluation progress will be shown on the terminal.
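
For the “question_answering” task with word-list input, the word labels expected in Y can be derived from the positions of the answers in the context. The following is a minimal sketch and not part of autonlu: the helper answer_labels and its (start, end) span convention (word indices, end exclusive) are assumptions made purely for illustration.

```python
from typing import List, Tuple


def answer_labels(context_words: List[str],
                  spans: List[Tuple[int, int]]) -> List[List[int]]:
    """Build one label list per answer span (hypothetical helper).

    Each span is (start, end) in word indices with an exclusive end.
    Words inside the span get label 1, all other words get label 0.
    """
    labels_per_answer = []
    for start, end in spans:
        labels = [0] * len(context_words)
        for i in range(start, end):
            labels[i] = 1
        labels_per_answer.append(labels)
    return labels_per_answer


context = "The meeting takes place in the hotel Grand Plaza today".split()

# One answer per sample: Y is a list (over samples) of label lists.
Y_single = [answer_labels(context, [(7, 9)])[0]]

# Several answers per sample: Y is a list (over samples) of lists
# (over answers) of label lists.
Y_multi = [answer_labels(context, [(7, 9), (6, 9)])]
```

Any two label names work in place of 0 and 1, as long as the same names are used for all samples.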

Return type

Dict

Returns

A dictionary containing accuracy, f1_weighted, precision_weighted, and recall_weighted. If none_label is not None, f1_binary is returned as well.

Example

>>> X = [("The room was very nice, but the staff was bad.", "Room"),
>>>      ("The room was very nice, but the staff was bad.", "Staff")]
>>> Y = ["POS", "NEG"]
>>> model = autonlu.SimpleModel("DeepOpinion/hotels_absa_en")
>>> metrics = model.evaluate(X, Y)
>>> # metrics == {'accuracy': 1.0, 'f1_weighted': 1.0, 'precision_weighted': 1.0, 'recall_weighted': 1.0}
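
To make the returned metric names more concrete, the sketch below computes accuracy and support-weighted precision, recall, and F1 in plain Python. It assumes the conventional definitions (each class is weighted by its share of the true labels, as in scikit-learn's average='weighted'); how autonlu computes these values internally is not stated here, and weighted_metrics is a hypothetical helper. f1_binary is left out because its exact definition is not given above.

```python
from collections import Counter
from typing import Dict, List


def weighted_metrics(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Accuracy plus support-weighted precision, recall, and F1 (illustrative)."""
    n = len(y_true)
    support = Counter(y_true)      # number of true samples per class
    predicted = Counter(y_pred)    # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    result = {
        "accuracy": sum(correct.values()) / n,
        "precision_weighted": 0.0,
        "recall_weighted": 0.0,
        "f1_weighted": 0.0,
    }
    for label, count in support.items():
        tp = correct[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / count
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        weight = count / n         # share of this class among the true labels
        result["precision_weighted"] += weight * precision
        result["recall_weighted"] += weight * recall
        result["f1_weighted"] += weight * f1
    return result


perfect = weighted_metrics(["POS", "NEG"], ["POS", "NEG"])
partial = weighted_metrics(["POS", "POS", "NEG", "NEG"],
                           ["POS", "NEG", "NEG", "NEG"])
```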

select_to_label(X, acquisitionsize=50, modelsamples=3, al_samples=100, preselectionsize=None, verbose=False)¶

Selects the samples that the current model would benefit from most as additional training data, in order to maximally improve performance.

Parameters

X (List[Union[str, Tuple[str, str]]]) – A list of segments or segment pairs the system can select to be added to the training data. Usually this is data that is available, but not yet labelled.
acquisitionsize (int) – The number of samples the system should select. The higher the number, the more data can be labelled in one go. Running more iterations with smaller acquisition sizes, however, lets the model learn more from fewer manually labelled samples. Values from 50 to 100 are generally a good compromise.
modelsamples (int) – How often different variants from the current model should be used to sample the given segments. Higher numbers will lead to more accurate results, but will also take more time.
al_samples (int) – During selection of the requested segments, a probability distribution has to be approximated. al_samples specifies how many samples should be taken from this distribution as an approximation. Higher values lead to more accurate results, but the runtime increases.
preselectionsize (Optional[int]) – Especially when X is getting very big, the selection process can become slow. preselectionsize specifies how many samples should be pre-selected using a much faster method. Higher values lead to a better selection, but increase the runtime. If None, the preselectionsize is 10 * acquisitionsize
verbose (bool) – If True, information about the active learning process, including progress bars, is shown.

Return type

Tuple[List[int], List[float]]

Returns

A tuple (idxs, scores), where idxs are the indices of the selected samples and scores indicate how unsure the model was about those samples. The score is not the only criterion used to select samples, so the scores are not necessarily monotonically decreasing.

Example:

>>> m = SimpleModel(model_folder="DeepOpinion/hotels_absa_en")
>>> X = [("The room was horrible", "Room"), ("The room was horrible", "Staff"), ...]
>>> idxs, scores = m.select_to_label(X=X, acquisitionsize=3)
>>> # idxs == [1234, 453, 112]
>>> # scores == [3.45, 2.67, 2.55]
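
Since idxs are positions in X, the selected samples can be separated from the remaining unlabelled pool with ordinary list operations. The sketch below is illustrative only: split_pool is a hypothetical helper, and the index values are made up rather than produced by a model.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_pool(pool: Sequence[T], idxs: Sequence[int]) -> Tuple[List[T], List[T]]:
    """Split a pool into (samples to label, remaining samples)."""
    chosen = set(idxs)
    to_label = [pool[i] for i in idxs]  # keep the order of idxs
    remaining = [x for i, x in enumerate(pool) if i not in chosen]
    return to_label, remaining


X = [("The room was horrible", "Room"),
     ("The room was horrible", "Staff"),
     ("Great breakfast", "Food"),
     ("Great breakfast", "Staff")]
idxs = [2, 0]  # made-up values standing in for the output of select_to_label

to_label, remaining = split_pool(X, idxs)
```

After labelling to_label manually, remaining can serve as X for the next selection round.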

save(modeldir)¶

Saves the current model.

If only a language model is present (meaning only finetuning was called), it will be saved in the appropriate format so it can be used as a base model for training of an actual task. A base model can also be loaded and finetuning can be continued.

Parameters: modeldir (str) – The path where the model should be saved. If the folder does not exist yet, it will be created
Raises: autonlu.core.ModelSaveException – If saving the model fails

finetune(corpus_filename, batchsize=4, burnin_epochs=0.01, burnin_timelimit=None, burnin_lr=0.002, training_epochs=1, training_timelimit=None, training_lr=2e-05, lm_tasks=['NSP', 'combinedMLM'], loss_weights=[], length=500, teacher=None, verbose=False)¶

Performs language model fine tuning on a given text corpus. Only available for "classification" tasks (the standard value, if task was not set otherwise during initialization).

This command also automatically generates a TensorBoard log that visualizes the different losses over time. The logs are saved in a “runs” directory and can be displayed with tensorboard --logdir=runs

Parameters

corpus_filename (str) – The text file to be used for language model fine tuning. This should be a standard text file where documents are separated by two new-lines.
batchsize (int) – The number of sequences to be used for one pass during fine tuning. The batchsize for the burn in phase is automatically four times higher. If multiple GPUs are being used, the batchsize is multiplied by the number of available GPUs. If the batch size is too big, the system will automatically halve the batch size until the batches fit on the GPUs, without loss of data.
burnin_epochs (float) – Number of epochs to be used for the burn in phase. In the burn in phase, the language model is kept fixed and only the prediction heads are trained. This lets the whole system stabilize without messing up the actual language model. The number of epochs can be given as floating point numbers. When set to 1.0, on average, the whole text of the training corpus will have been seen once by the model. The number of burn in epochs should be selected so this phase takes around 10 minutes. More is usually not necessary.
burnin_timelimit (Optional[float]) – Number of seconds after which the burnin phase will be ended. If the number of epochs is reached before, the burnin phase will end earlier than that. If None, the burnin will proceed until the epochs are finished.
burnin_lr (float) – Learning rate to be used for the burnin phase
training_epochs (float) – Number of epochs to be used for language model finetuning. The number of epochs can be given as floating point numbers. When set to 1.0, on average, the whole text of the training corpus will have been seen once by the model.
training_timelimit (Optional[float]) – Number of seconds after which the training will be ended. If the number of epochs is reached before, the training will end earlier than that. If None, the training will proceed until the epochs are finished.
training_lr (float) – Learning rate to be used for the language model fine tuning
lm_tasks (List[str]) –
Describes the tasks to be learned. Possible list elements are:
- SO: Sentence Ordering
- NSP: Next Sentence Prediction
- SONSP: SO & NSP
- prelabeled: uses a teacher to label sentences (see the teacher parameter)
- prelabeled_words: uses a teacher to label sentences, where “sentences” are just consecutive words (i.e. not sentences in the grammatical sense)
- soloMLM: independent Masked Language Model
- combinedMLM: an MLM task which is trained together with the other tasks on the same data
loss_weights (List[float]) – Gives a particular weight to the losses of the lm_tasks. If empty, each loss has the weight 1
length (Union[int, List[int]]) – Determines the number of tokens per sentence in a batch. If length is a list of two integers, the number of tokens per sentence in a batch takes a random value between the two integers ([low, high]). If length is an integer, this is the number of tokens per sentence. Remark: Currently, for all lm_tasks except prelabeled, a “sentence” is just a sequence of consecutive words/tokens of a given length. For prelabeled, grammatical sentences are used, and the length is defined by the sentence itself.
teacher (Optional[LMTeacher]) – An instance of autonlu.finetuning.LMTeacher. Needed for the tasks prelabeled and prelabeled_words, where labels are provided by a teacher.
verbose (bool) – If True, progress bars with additional information will be shown during training
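
The corpus file described for corpus_filename can be produced from a list of document strings. The following is a minimal sketch, assuming that “separated by two new-lines” means one blank line between documents; write_corpus and read_corpus are hypothetical helpers and not part of autonlu.

```python
import os
import tempfile
from typing import Iterable, List


def write_corpus(documents: Iterable[str], path: str) -> None:
    """Write documents to a text file, separated by two newlines."""
    cleaned = [doc.strip() for doc in documents if doc.strip()]
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n\n".join(cleaned) + "\n")


def read_corpus(path: str) -> List[str]:
    """Read the file back and split it into documents (for checking)."""
    with open(path, encoding="utf-8") as f:
        return [doc.strip() for doc in f.read().split("\n\n") if doc.strip()]


docs = ["The room was clean.\nThe staff was friendly.",
        "Breakfast was served late."]
corpus_path = os.path.join(tempfile.mkdtemp(), "corpus.txt")
write_corpus(docs, corpus_path)
```

The resulting path can then be passed as corpus_filename. Note that a document containing a blank line of its own would be split into two documents under this convention.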

upload(name, display_name=None, short_description='', long_description='', language='en', verbose=False)¶

Uploads this model to Studio

Parameters

name (str) – The internal name that should be used for the model in Studio (e.g. this is the name you can use to later download the model from Studio again). Has to be unique. If a model with the same name already exists on Studio, a ModelNameExists exception will be thrown.
display_name (Optional[str]) – The name which should be displayed for this model in Studio. Does not have to be unique.
short_description (str) – The description that is shown below the model name in the model list. If empty, the content of long_description will be used.
long_description (str) – The description that is shown when the model is opened. If empty, the content of short_description will be used.
language (str) – An ISO 639-1 language identifier (e.g. "en", "de"). See https://en.wikipedia.org/wiki/ISO_639-1
verbose (bool) – If True, some information is printed when compressing the model is finished etc.

Raises

ModelNameExists – If the chosen name is already used in Studio