Model
Model implements the easiest-to-use interface to our framework and uses SimpleModel internally for training, inference, and so on. Currently, three text classification tasks and two sequence labeling tasks are supported.
Text classification
- Label
Exactly one label from a set of possible labels is assigned to each segment of text. For example, this might be used for sentiment detection, where each sentence either has a sentiment (e.g. negative, neutral, or positive) or has none, indicated by the label none. You have this task if the training target is a list of strings, e.g. ["A", "B", "A"].
- Class
An arbitrary number of classes (from none to all possible classes) is assigned to each segment of text. For example, this might be used for topic detection, where each sentence can have no topic or multiple topics. You have this task if the training target is a list of lists of strings, e.g. [["A", "B"], [], ["A"]].
- Class Label
An arbitrary number of classes (from none to all possible classes) is assigned to each segment of text, and each detected class is assigned exactly one label from a set of possible labels. For example, this can be used for aspect-based sentiment detection, where each sentence can mention multiple topics/aspects and each of them has a sentiment. You have this task if the training target is a list of lists of pairs of strings, e.g. [[["A", "1"], ["B", "1"]], [], [["A", "2"]]].
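The three target formats above differ only in how deeply they are nested, which is what makes automatic deduction possible. The following is a minimal sketch of such a check, assuming the targets are plain Python lists. infer_subtask is a hypothetical helper for illustration, not part of the AutoNLU API, and the library's actual deduction (which also considers meta.json and the list of all classes) may differ.

```python
from typing import Any, List


def infer_subtask(Y: List[Any]) -> str:
    """Guess the classification subtask from the shape of the targets (illustrative only)."""
    for target in Y:
        if isinstance(target, str):
            return "label"  # e.g. ["A", "B", "A"]
        if not target:
            continue  # an empty list fits both "class" and "classlabel"
        first = target[0]
        if isinstance(first, str):
            return "class"  # e.g. [["A", "B"], [], ["A"]]
        if isinstance(first, (list, tuple)) and len(first) == 2:
            return "classlabel"  # e.g. [[["A", "1"], ["B", "1"]], [], [["A", "2"]]]
        raise ValueError(f"Unrecognized target format: {target!r}")
    raise ValueError("Cannot infer the subtask: all targets are empty")
```

Empty targets are skipped because an empty list is a valid entry in both the class and the classlabel format; only the first non-empty target decides.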
Sequence labeling
- token_classification
Each word in a given text can have its own label, e.g. person, location, organization, etc. Training samples are provided in the form of texts in a simple markup language.
- question_answering
Samples consist of question-context tuples. The model searches for words in the context that qualify as an answer to the question. For training, the context is provided in a markup language in which the correct words are tagged.
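To make the markup format concrete, the sketch below extracts tagged spans from texts such as "<person>Tom Miller</person> was in <location>London</location>." or from a question-answering context with <answer> tags. It assumes simple, non-nested tags of the form <tag>...</tag>; extract_spans and strip_tags are hypothetical names, and AutoNLU's own parser may accept more than this.

```python
import re
from typing import List, Tuple

# Matches <tag>text</tag>; the backreference \1 requires the closing tag to match.
# Assumption: tags are not nested.
TAG_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>")


def extract_spans(markup: str) -> List[Tuple[str, str]]:
    """Return (text, tag) pairs for every tagged span in a markup string."""
    return [(m.group(2), m.group(1)) for m in TAG_PATTERN.finditer(markup)]


def strip_tags(markup: str) -> str:
    """Remove the tags and keep only the plain text."""
    return TAG_PATTERN.sub(lambda m: m.group(2), markup)
```

The same helpers cover both sequence labeling tasks: for token_classification the tags are entity types, for question_answering the tag is answer.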
Every trained model that can be loaded has (implicitly) one of these tasks associated with it. For text classification, the specific task is automatically deduced from the data. More precisely, it depends on whether a list of all classes is given and on what the list of labels looks like in the meta.json file in the model folder. As a user, you do not have to concern yourself with this.
For the sequence labeling tasks, it suffices to instantiate the model with the argument task="token_classification" or task="question_answering".
This makes for a very easy-to-use interface to our whole system, and many tasks can be solved with fewer than 5 lines of code.
Model types
Currently, AutoNLU supports three different model types:
- Standard Models
Standard models support all tasks and are the main model type to use for label tasks and sequence labeling tasks. A standard model is used if no postfix is attached to a base model name, e.g. Model("bert-base-uncased") creates a standard model. We recommend this model type for label tasks.
- OMI Models
A newly introduced model type that is highly effective for class and classlabel tasks. For these tasks, it generally trains and predicts much faster and achieves slightly higher accuracy than standard models. You are using this model type if you append #omi to any base model name (e.g. Model("bert-base-uncased#omi")). OMI models do not support, and are not useful for, label tasks. They also have some other disadvantages. For example, standard models generalize to unseen classes to a certain degree (i.e. they can be used to predict classes they were not trained on); OMI models cannot do this. We recommend this model type in most cases for class as well as classlabel tasks.
- CNN Models
A highly efficient model type that supports all three text classification tasks. For prediction, it can be more than 10 times faster than even an OMI model. You are using this model type if you append #cnn to any base model name (e.g. Model("albert-base-v2#cnn")). Here, the base model mainly specifies which embeddings are used for the tokens; we recommend albert-base-v2 in most cases. The speed comes at a price: CNN models usually achieve slightly lower accuracy than the two other model types. In addition, some functionality is currently not supported for CNN models (e.g. autonlu.Model.finetune(), autonlu.Model.select_to_label()). We recommend CNN models if prediction speed is very important, and as the student model for distillation.
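The postfix convention and the recommendations above can be summarized in a small helper. This is a sketch only: model_name and recommended_type are hypothetical names, and restricting the CNN suggestion to the three text classification subtasks is an assumption based on the description above.

```python
# Postfixes as described above; an empty postfix selects a standard model.
POSTFIXES = {"standard": "", "omi": "#omi", "cnn": "#cnn"}
CLASSIFICATION_SUBTASKS = ("label", "class", "classlabel")


def model_name(base: str, model_type: str = "standard") -> str:
    """Build the name passed to Model(), e.g. "bert-base-uncased#omi"."""
    return base + POSTFIXES[model_type]


def recommended_type(subtask: str, speed_critical: bool = False) -> str:
    """Model type suggested by the recommendations above (hypothetical helper)."""
    if subtask not in CLASSIFICATION_SUBTASKS:
        return "standard"  # sequence labeling: standard models support all tasks
    if speed_critical:
        return "cnn"       # fastest, at the cost of slightly lower accuracy
    if subtask == "label":
        return "standard"  # OMI models do not support label tasks
    return "omi"           # class and classlabel tasks
```

For example, model_name("albert-base-v2", recommended_type("class", speed_critical=True)) yields the "albert-base-v2#cnn" name used in the CNN description.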
The Model Class
- class autonlu.Model(model_folder, all_labels=None, standard_label=None, task=None, **kwargs)
The class Model implements a versatile model that can be trained on (one of) various tasks. Prediction results are returned in an easy-to-use format that depends on the task the model was trained for.
To determine which task the model should learn, set the task argument. If it is not set, a "classification" task is assumed (for more information, see the task argument).
Most of the arguments are forwarded directly to the constructor of SimpleModel.
- Parameters
model_folder (str) – A path or name of the model that should be used. It is passed through get_model_dir() and can therefore be:
- The path to a model
- The name of a model available in Studio
- The name of a model available in the Hugging Face model repository
task (Optional[str]) – Determines which task the model should learn. This is only relevant when loading a base model to be trained; when an already trained model is loaded, this value is not needed. Possible values are:
- "classification"
The standard value, which is also chosen when task is not set. The "classification" task comprises the subtasks classlabel, class, and label, which are automatically derived from the target labels during training.
- "token_classification"
Each word in a sample has a label. A typical example is named entity recognition (NER).
- "question_answering"
Each sample consists of a question-context tuple. The model seeks one or more passages in the context that qualify as answers to the question.
all_labels (Optional[List[str]]) – A list of all labels the model should use for a "classification" task. If None, the list of labels is determined automatically from the training and evaluation data. Setting this manually is mostly useful if a new model is to be trained and the training data does not contain an example of each possible label.
standard_label (Optional[str]) – The standard label to use for classlabel (sub)tasks. If the standard label is set, each class that is not explicitly mentioned for a sample of a classlabel task is assumed to have this label. During prediction, classes that are assigned this label are also not mentioned explicitly in the results.
key – A JSON web token used for authentication. If no key is given, the key is taken from the environment variable DO_PRODUCT_KEY instead.
baseurl – Base URL of the Studio instance. If None, the environment variable DO_BASEURL is used. If DO_BASEURL is not defined, the standard base URL of the official DeepOpinion Studio server is used. In most cases, this does not need to be changed unless you are working with an on-premise version of Studio.
state_callback –
Something callable (a function or a class with a __call__ member function) taking one keyword argument progress. The callback is called with the current progress in percent after each batch. E.g.:
>>> # Print current progress after each batch
>>> def callback(progress):
...     print(f"Current progress = {progress}")
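stop_callback –
Something callable (a function or a class with a __call__ member function) taking no arguments. It is called after each batch, and prediction or training is stopped if it returns True. E.g.:
>>> # Stop after 10 batches; the counter lives in an enclosing function
>>> # because nonlocal is not allowed at module level
>>> def make_callback(max_batches=10):
...     i = 0
...     def callback():
...         nonlocal i
...         i += 1
...         return i >= max_batches
...     return callback
>>> callback = make_callback(10)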
encrypt – If True, the model is encrypted on save.
device – Which device the model should be used on ("cpu" or "cuda"). If None, a device is selected automatically: if a CUDA-capable GPU is available, it is used; otherwise, the CPU is used. This behavior can be overridden by setting the environment variable DO_DEVICE to either "cpu" or "cuda".
log_dir – Specifies the directory to which TensorBoard logs are written. Default is tensorboard_logs. If None, no logs are written. The logs for individual runs are put into subdirectories named after the current timestamp.
use_samplehash – If True (the default), a hash of every trained sample is saved. These hashes are used during active learning (select_to_label()) to exclude sentences that were already seen during training. If False, the hashes are not saved, which speeds up some processes, saves memory, and reduces the size of the saved model. Disabling the sample hash can be useful when huge amounts of training data are used.
trial – A trial represents a single setup for automatic hyperparameter optimization. See also https://optuna.readthedocs.io/en/stable/reference/trial.html
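The rule described for the device argument can be written out as a small function. This is a sketch under assumptions: select_device is hypothetical, CUDA availability is passed in as a flag (in a PyTorch setup it would typically come from torch.cuda.is_available()), and the precedence of an explicit device argument over DO_DEVICE is assumed rather than documented.

```python
import os
from typing import Mapping, Optional

VALID_DEVICES = ("cpu", "cuda")


def select_device(device: Optional[str], cuda_available: bool,
                  env: Optional[Mapping[str, str]] = None) -> str:
    """Pick "cpu" or "cuda" following the rules described for the device argument."""
    if device is not None:
        return device  # assumption: an explicit argument wins over DO_DEVICE
    env = os.environ if env is None else env
    override = env.get("DO_DEVICE")
    if override in VALID_DEVICES:
        return override  # DO_DEVICE overrides the automatic choice
    return "cuda" if cuda_available else "cpu"
```

Passing env explicitly keeps the function testable without touching the real process environment.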
- modeltype()
Returns which model type is loaded.
- Return type
str
- Returns
One of "base", "label", "class", "classlabel", "token_classification", or "question_answering".
- predict(X, classes_to_analyze=None, return_extras=False, recommend_manual_check=False, **kwargs)
Predicts the results for a list of samples X, depending on the task the model was trained for.
All arguments from SimpleModel.predict() can also be used and are included in the following list of arguments.
- Parameters
X (List) – A list of samples for which the prediction is made. The format of the list elements depends on the task the model was trained on. Generally, one should use the same data format that was used for training. The data format for each value of Model.task is:
- "classification"
X is a list of strings.
- "token_classification"
X is a list of strings. The strings may also contain label information in the form of markup language tags (e.g. to test the prediction of samples whose correct labels are already known). This label information is ignored: the markup tags are removed before the prediction is done.
- "question_answering"
X is a list of pairs of strings, each consisting of a question string and a context string in which the answer might be found. As for "token_classification", the context string may contain markup tags, which are ignored for the prediction.
classes_to_analyze (Optional[List[str]]) – Used for the "classification" task in the case of a class or classlabel subtask. Specifies a list of classes that should be considered during prediction. If None, the full list of classes learned by the model is used.
batchsize – Number of samples to predict in one inference step. A higher value generally means higher throughput, but the system might run out of memory if it is set too high. If GPU memory runs out, the system automatically switches to smaller batch sizes without loss of data, but the switching takes time. Values higher than 128 (the default) usually do not increase performance by much.
dynamic_quantization – If True, forward propagation is done with lower precision to speed up predictions. This is only supported on the CPU. Warning: this feature could reduce the accuracy of your model.
return_extras (bool) – If True, the system returns, in addition to the processed results, extra information about the prediction. Currently, this comprises the raw samples and results of the underlying SimpleModel, the label probabilities and entropies for all predictions, and, if the human correction system is in use, whether a sample should be checked manually and the probability of the prediction being incorrect. For a "token_classification" or "question_answering" task, word lists showing how the text was split into words are returned as well, together with the labels of all words.
recommend_manual_check (bool) – If True, the predictions consist of (label, bool) pairs, where bool indicates whether the sample should be checked manually if the human correction system is in use.
verbose – If True, a progress bar is shown during prediction.
- Returns
- "classification"
- A list of strings for a label subtask, e.g.
["POS", "NEG", "NEG", "POS"]
- A list of lists of strings for a class subtask, e.g.
[["service"], [], ["support", "sales"]]
- A list of lists of lists of two strings (class and label) for a classlabel subtask, e.g.
[[["room", "POS"], ["service", "NEG"]], [["cleanliness", "NEU"]]]
- "token_classification"
- A list of markup language texts, e.g.
["<person>Tom</person> was in <location>London</location>.", "<person>Lisa</person> loves <location>Paris</location>."]
- "question_answering"
A list of markup language texts, e.g.
>>> X = [("What color do bananas have?", "Tomatoes are red and bananas are yellow."),
...      ("What color do tomatoes have?", "Tomatoes are red and bananas are yellow.")]
>>> prediction = model.predict(X=X)
The result (prediction) should look as follows:
["Tomatoes are red and bananas are <answer>yellow</answer>.", "Tomatoes are <answer>red</answer> and bananas are yellow."]
If return_extras is True, a tuple (result, extras) is returned, where result is in the format described above and extras is a dictionary containing additional information under the following keys:
- "raw_samples"
Contains the actual samples that were sent through the SimpleModel.
- "label_probabilities"
Contains the probabilities for all possible labels.
- "entropies"
Contains the entropies of all predictions (a high entropy indicates that the system was less sure of its prediction).
- "mistake_probabilities"
If the human correction system is set up (i.e. Model.calculate_human_correction_data was called), contains a probability for each prediction, giving an estimated upper bound on the probability (range [0, 1]) that the prediction is incorrect. For class and classlabel tasks, the probabilities relate to the individual samples found in "raw_samples".
- "word_lists"
Only for the tasks "token_classification" and "question_answering". A list (over all samples) of lists of words, showing how the model split the text into words (tokens).
- "word_labels"
Only for the tasks "token_classification" and "question_answering". A list of lists: the outer list is over all samples, and the inner list contains the predicted label of each word in word_lists.
- Return type
The output format is determined by the task the current model was trained for.
- Raises
ValueError – If the product key could not be authorized.
Example
Assumes the environment variable DO_PRODUCT_KEY is correctly set.
>>> m = Model(model_folder="DeepOpinion/hotels_absa_en")
>>> segments = ["The room was nice, but the staff was unfriendly"]
>>> res, extras = m.predict(segments, return_extras=True)
res = [[['Room', 'POS'], ['Staff', 'NEG']]]
extras["raw_samples"] = [[('The room was nice, but the staff was unfriendly', 'Activities'), ('The room was nice, but the staff was unfriendly', 'Ambiance'), ('The room was nice, but the staff was unfriendly', 'Amenities'), ... ]
extras["label_probabilities"] = [array([[9.9923420e-01, 2.1814957e-04, 2.0768745e-04, 3.3991490e-04], [9.9462909e-01, 3.1983077e-03, 1.2891486e-03, 8.8348583e-04], [9.9342984e-01, 3.9202035e-03, 1.5014511e-03, 1.1485954e-03], ... ]
extras["entropies"] = [[['Activities', 0.0070808344], ... ['View', 0.010186324], ['WiFi', 0.005671744]]]
extras["manual_check_recommended"] = [[['Activities', False], ['Ambiance', False], ['Amenities', False], ... ['WiFi', False]]]
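The entropies returned in extras can be related to label_probabilities through the Shannon entropy formula. The sketch below assumes the natural logarithm; under that assumption, the first probability row in the example above gives roughly 0.00708, consistent with the value reported for 'Activities'. Both helpers are hypothetical, and ranking by entropy is just one simple way to pick samples for manual review.

```python
import math
from typing import List, Sequence


def entropy(probabilities: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one probability distribution."""
    # Zero probabilities contribute nothing (p * log p -> 0 as p -> 0).
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def least_certain(entropies: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-entropy (least certain) predictions."""
    return sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)[:k]
```

A uniform distribution over n labels reaches the maximum entropy log(n), while a confident prediction such as the first row above stays close to 0.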
- train(X, Y=None, valX=None, valY=None, valsplit=0.1, do_evaluation=True, label_probabilities={}, all_classes=None, val_all_classes=None, all_labels=None, learning_rate=None, mindatasetsize=None, patience_epochs=None, lr_reduction_patience=None, lr_reduction_factor=None, epsilon=None, rawX=None, rawY=None, do_early_stopping=None, decay_func_name=None, nb_opti_steps=None, total_lr_decay=None, *, calculate_human_correction_data=True, **kwargs)
Trains a model on a specific task. If you did not specify a task during initialization, the standard task is "classification". In this case, the model can be trained on one of the three subtasks classlabel, class, or label. The subtask is automatically deduced from the format of the training data; for more details, see the explanations for the parameters X and Y.
Model.train() offers two training methods, which differ in how the learning rate is adjusted and under which conditions training is stopped. The method is selected with the argument do_early_stopping. When it is set to True, the model is tested on the validation data at regular intervals; depending on the results, the learning rate might be reduced, or training might be stopped if the model no longer improves. If do_early_stopping is set to False, training runs for nb_opti_steps optimization steps independently of the evaluation, and the learning_rate is slightly reduced after each optimization step. If do_early_stopping is not specified by the user, it is set to False for all OMI models and for label tasks, and to True for class and classlabel tasks that do not use an OMI model. Both training methods come with specific arguments.
All arguments from SimpleModel.train() can also be used and are included in the following list of arguments.
- Parameters
X (List) – A list of training samples. The format depends on the training task:
- "classification"
X contains the input text samples as a list of strings.
- "token_classification"
X is a list of strings in a simple markup language format, in which words can be associated with a label. Example:
>>> X = ["<person>Tom Miller</person> was in <location>London</location>.",
...      "<person>Lisa</person> loves <location>Paris</location>."]
- "question_answering"
X is a list of pairs of strings, each consisting of a question string and a context markup language string in which the correct answer(s) are marked by start and end tags. Example:
>>> X = [("What color do bananas have?",
...       "Tomatoes are red and bananas are <answer>yellow</answer>."),
...      ("What color do tomatoes have?",
...       "Tomatoes are <answer>red</answer> and bananas are yellow.")]
Y (Optional[List]) – Training targets. Only needed for the "classification" task. For "token_classification" and "question_answering", the training targets are already contained in X, and Y can be set to None (or omitted). The "classification" task has the three subtasks class, label, and classlabel. The correct subtask is automatically derived from the format of Y, which can be:
- label subtask
A list of strings, e.g.
>>> Y = ["POS", "NEG", "NEG", "POS"]
- class subtask
A list of lists of strings, e.g.
>>> Y = [["service"], [], ["support", "sales"]]
- classlabel subtask
A list of lists of lists of two strings (class and label), e.g.
>>> Y = [[["room", "POS"], ["service", "NEG"]], [["cleanliness", "NEU"]]]
valX (Optional[List[Union[str, Tuple[str, str]]]]) – Input samples used for validating the model during training, e.g. for stopping training early if there is no more progress or for reporting the current score via the score_callback. Same format as X. If None, a part of X (10% by default, see valsplit) is split off.
valY (Optional[List[str]]) – Training targets used for validating the model during training, e.g. for stopping training early if there is no more progress or for reporting the current score via the score_callback. Same format as Y. If None and a classification task is being trained, a part of Y (10% by default, see valsplit) is split off.
rawX (Optional[List[str]]) – Input text samples to be used with raw training targets. Currently only supported for classification tasks.
rawY (Optional[List[Union[str, Tuple[str, str], Tuple[str, bool]]]]) – Training targets for rawX, consisting of a single target per sample in rawX, i.e. a string for a label task, a (class, label) tuple for a classlabel task, and a (class, bool) tuple for a class task, where bool indicates whether the class is contained in the text. Note that for the classlabel task the label always has to be given explicitly, since standard_label is ignored for raw data. This raw format is mainly intended for data obtained during active learning, or as a by-product of manually checking samples suggested by the human correction system. Currently only supported for classification tasks.
valsplit (float) – If valX or valY is not given, specifies how much of the training data is split off for validation. Default is 10%.
do_evaluation (bool) – If set to False, no evaluation is done. This also automatically deactivates early stopping.
label_probabilities (Dict[str, float]) – A dictionary mapping label names to the probability (a number between 0.0 and 1.0) of that label being used for training. All labels not mentioned in label_probabilities are assumed to have a probability of 1.0. Can be used to subsample certain labels if they are overrepresented. Currently only supported for classification tasks.
all_classes (Union[List[str], List[List[str]], None]) – Only used for class or classlabel tasks. Either a list of all possible classes or, if the possible classes differ from sample to sample, a list of lists of all possible classes. The latter is useful when using the standard_label and certain classes should not be generated for specific samples, which happens when using active learning via select_to_label(). Alternatively, use the rawX and rawY arguments. If None, the list of possible classes is determined automatically from Y and valY.
val_all_classes (Union[List[str], List[List[str]], None]) – Same as all_classes, but for the validation data.
all_labels (Optional[List[str]]) – A list of all possible labels. If None, the list of possible labels is determined from Y and valY. Can be set explicitly if not all possible labels occur in the training and validation sets (e.g. because some will only be used in a later training session).
seed – Fixes the random seed to make training deterministic (i.e. with the same seed and the same input data in the same order, the resulting model should be identical). Warning: setting a seed can slow down training.
learning_rate (Optional[float]) – The learning rate used at the start of training. Higher learning rates lead to faster convergence but might lead to worse overall accuracy, and if the learning rate is set too high, the system might not learn anything. If None, an appropriate learning rate for the given task is selected: 2e-4 for label tasks and 2e-5 for class and classlabel tasks.
batchsize – The number of samples used in one training step. If this number is larger than 32, it also sets the number of samples accumulated for one weight update (at a minimum, 32 samples are always accumulated). A higher value generally means higher throughput, but the system might run out of memory if it is set too high. If GPU memory runs out, the system automatically switches to smaller batch sizes without loss of data.
autobatchsize – Deprecated! This option should no longer be used and will be removed. With the new dynamic batch size, which is lowered on CUDA memory errors, it is no longer needed. If True, the batch size is determined automatically, and the parameter batchsize gives the maximal batch size to use.
metric_callback –
Something callable (a function or a class with a __call__ function) that takes two keyword arguments, Y_true (containing the true label numbers from the validation dataset) and Y_pred (containing the label numbers predicted by the model currently being trained), and returns a metric, which is passed as an argument to score_callback. The format of Y_true and Y_pred is the one used by SimpleModel(). Used to define the metric (e.g. accuracy) for the reported score. E.g.:
>>> # Return accuracy as a metric
>>> import numpy as np
>>> def callback(Y_true, Y_pred):
...     # Convert to arrays so that == compares element-wise
...     Y_true, Y_pred = np.asarray(Y_true), np.asarray(Y_pred)
...     return np.sum(Y_true == Y_pred) / len(Y_true)
score_callback –
Something callable (a function or a class with a __call__ function) taking one keyword argument score, which is filled with the output of metric_callback and evaluated at regular intervals during training. E.g.:
>>> # Print current score
>>> def callback(score):
...     print(f"Current score = {score}")
verbose – If True, information about the training progress is shown on the terminal.
do_early_stopping (Optional[bool]) – If True, early stopping is used, i.e. the model is tested on the validation data at regular intervals and training is stopped if the model no longer improves. If False, a preset schedule of nb_opti_steps optimization steps with a decaying learning rate is used. do_early_stopping is False by default, with the exception of class and classlabel tasks that are trained without an OMI model (postfix #omi for the base model).
- Arguments used when do_early_stopping is False:
decay_func_name: The kind of learning rate decay to use. Options are "linear", "exp", and "exp_sqr".
nb_opti_steps: The number of optimization steps after which training is stopped.
total_lr_decay: The factor by which the initial learning_rate is reduced by the end of training.
- Arguments used when do_early_stopping is True:
epochs: The maximum number of epochs used for training.
mindatasetsize: Early stopping assumes the dataset size to be at least mindatasetsize. A large mindatasetsize in essence increases the patience for early stopping. The default is 4,000 for label tasks, 0 for class tasks, and 70,000 for classlabel tasks, so that small datasets are trained longer, since this works better in practice. A value of 70,000 in essence means that datasets with fewer than 70,000 samples are trained for as long as a dataset with 70,000 samples.
maxdatasetsize: Early stopping assumes the dataset size to be at most maxdatasetsize. A small maxdatasetsize in essence decreases the patience for early stopping. The default is 200,000, so that large datasets are trained for a shorter time. A value of 200,000 in essence means that datasets with more than 200,000 samples are only trained for as long as a dataset with 200,000 samples. This does NOT mean that only 200,000 of the samples are used; all the data is still utilized. It only influences the point at which early stopping decides that a model no longer improves.
val_metric: The validation metric used to determine whether one model is better than another. Generally, this should not be changed from val_accuracy unless you know exactly what you are doing.
val_maximize: If True, a higher value of val_metric is considered better; if False, a smaller value is considered better. Has to fit the metric used in val_metric.
patience_epochs: How many epochs to wait without the model improving before training is stopped.
lr_reduction_patience: Proportion of one epoch to wait without improvement before the learning rate is reduced.
lr_reduction_factor: The factor by which the learning rate is multiplied when the patience runs out.
epsilon: The maximal difference in the metric for which two values are still considered identical.
calculate_human_correction_data (bool) – If True, the human correction system is automatically set up using the validation data (if present).
Example
Assumes the environment variable DO_PRODUCT_KEY is correctly set.
>>> # The standard value for Model.task is "classification". Hence, instead of
>>> # m = Model("albert-base-v2", standard_label="NONE", task="classification")
>>> # it suffices to do:
>>> m = Model("albert-base-v2", standard_label="NONE")
>>> segments = ["The room was nice, but the staff was unfriendly!",
...             "They served great food and the drinks were ok."]
>>> Y = [[["room", "POS"], ["staff", "NEG"]],
...      [["food", "POS"], ["drinks", "NEU"]]]
>>> m.train(X=segments, Y=Y, valX=segments, valY=Y)
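The rawX/rawY format described for train() stores one target per (text, class) combination. The sketch below converts structured class and classlabel targets into that format. The helper names are hypothetical, and filling unmentioned classes with the standard label is an assumption that mirrors the standard_label behavior; it has to be written out explicitly here because standard_label is ignored for raw data. The resulting lists could then be passed as rawX and rawY.

```python
from typing import Dict, List, Sequence, Tuple


def classlabel_to_raw(X: Sequence[str], Y: Sequence[Sequence[Sequence[str]]],
                      all_classes: Sequence[str],
                      standard_label: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    """One (text, (class, label)) record per sample and class."""
    raw_x: List[str] = []
    raw_y: List[Tuple[str, str]] = []
    for text, pairs in zip(X, Y):
        labels: Dict[str, str] = {cls: label for cls, label in pairs}
        for cls in all_classes:
            raw_x.append(text)
            # Unmentioned classes get the standard label written out explicitly.
            raw_y.append((cls, labels.get(cls, standard_label)))
    return raw_x, raw_y


def class_to_raw(X: Sequence[str], Y: Sequence[Sequence[str]],
                 all_classes: Sequence[str]) -> Tuple[List[str], List[Tuple[str, bool]]]:
    """One (text, (class, present?)) record per sample and class."""
    raw_x: List[str] = []
    raw_y: List[Tuple[str, bool]] = []
    for text, classes in zip(X, Y):
        present = set(classes)
        for cls in all_classes:
            raw_x.append(text)
            raw_y.append((cls, cls in present))
    return raw_x, raw_y
```

Applied to the two segments from the example above with all_classes=["room", "staff", "food", "drinks"] and the standard label "NONE", classlabel_to_raw yields eight raw records, four per segment.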
- evaluate(X, Y=None, all_classes=None, batchsize=128, dynamic_quantization=False, verbose=False, **kwargs)
Evaluates a model on the given data and returns different performance metrics.
- Parameters
X (List) – A list of samples to be evaluated. The format of the list elements depends on the task the model was trained on and is determined by the value of Model.task (set during initialization). Model.task can have the following values:
- "classification"
X is a list of strings, where each string is a sample of text.
- "token_classification"
X is a list of strings in a simple markup language format, in which words can be associated with a label. Example:
>>> X = ["<person>Tom Miller</person> was in <location>London</location>.",
...      "<person>Lisa</person> loves <location>Paris</location>."]
- "question_answering"
X is a list of pairs of strings, each consisting of a question string and a context markup language string in which the correct answer(s) are marked by start and end tags. Example:
>>> X = [("What color do bananas have?",
...       "Tomatoes are red and bananas are <answer>yellow</answer>."),
...      ("What color do tomatoes have?",
...       "Tomatoes are <answer>red</answer> and bananas are yellow.")]
Y (Optional[List]) – Correct answers. Only needed for the "classification" task. For "token_classification" and "question_answering", the correct answers are already contained in X. Hence, Y can be set to None (or omitted, since None is the default value). The "classification" task has the three subtasks class, label, and classlabel, and the data has to be provided in the following format:
- label subtask
A list of strings, e.g.
>>> Y = ["POS", "NEG", "NEG", "POS"]
- class subtask
A list of lists of strings, e.g.
>>> Y = [["service"], [], ["support", "sales"]]
- classlabel subtask
A list of lists of lists of two strings (class and label), e.g.
>>> Y = [[["room", "POS"], ["service", "NEG"]], [["cleanliness", "NEU"]]]
all_classes (Union[List[str], List[List[str]], None]) – Only for the "classification" task. Either a list of all possible classes or, if the possible classes differ from sample to sample, a list of lists of all possible classes. The latter is useful when using the standard_label and certain classes should not be generated for specific samples, which happens when using active learning via select_to_label(). If None, the list of possible classes associated with the trained model is used automatically.
batchsize (int) – Number of samples to predict in one inference step. A higher value generally means higher throughput, but the system might run out of memory if it is set too high. If GPU memory runs out, the system automatically switches to smaller batch sizes without loss of data, but the switching takes time. Values higher than 128 (the default) usually do not increase performance by much.
dynamic_quantization (bool) – If True, forward propagation is done with lower precision to speed up predictions. This is only supported on the CPU. Warning: this feature could reduce the accuracy of your model.
verbose (bool) – If True, information about the evaluation progress is printed to the terminal.
- Returns
A dictionary containing accuracy, f1_weighted, precision_weighted, and recall_weighted. For word labeling, question answering, and class tasks, as well as for classlabel tasks with a standard_label set, f1_binary is returned as well.
Example
>>> import autonlu
>>> X = ["The room was very nice, but the staff was bad."]
>>> Y = [[["Room", "POS"], ["Staff", "NEG"]]]
>>> model = autonlu.Model("DeepOpinion/hotels_absa_en")
>>> metrics = model.evaluate(X, Y)
>>> # metrics == {'accuracy': 1.0, 'f1_weighted': 1.0, 'f1_binary': 1.0, 'precision_weighted': 1.0,
>>> #             'recall_weighted': 1.0}
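For a label subtask, the returned accuracy and weighted scores presumably follow the usual support-weighted definitions (as in scikit-learn's average="weighted"). The pure-Python sketch below computes them under that assumption; weighted_scores is hypothetical, and how AutoNLU aggregates class and classlabel predictions before scoring, as well as f1_binary, is not covered here.

```python
from collections import Counter
from typing import Dict, Sequence


def weighted_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus support-weighted precision, recall and F1 for a label task."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    support = Counter(y_true)    # how often each label really occurs
    predicted = Counter(y_pred)  # how often each label was predicted
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = correct[label]
        prec = tp / predicted[label] if predicted[label] else 0.0
        rec = tp / count
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = count / n       # labels absent from y_true would get weight 0
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f
    return {
        "accuracy": sum(correct.values()) / n,
        "f1_weighted": f1,
        "precision_weighted": precision,
        "recall_weighted": recall,
    }
```

Weighting by support means frequent labels dominate the scores, which is why weighted F1 can look good even when a rare label is predicted poorly.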
- distill(student_model, X, Y=None, unlabelledX=None, unlabelled_epochs=2, chunk_size=5000, valX=None, valY=None, valsplit=0.1, label_probabilities={}, all_classes=None, val_all_classes=None, learning_rate=None, mindatasetsize=None, patience_epochs=None, lr_reduction_patience=None, lr_reduction_factor=None, epsilon=None, verbose=False, **kwargs)¶
Distills the model into a student model. It requires a labeled training dataset and optionally an unlabelled dataset (using a large unlabelled dataset is highly recommended). Distillation currently only works for "classification" tasks, i.e. Model.task == "classification" (the standard value; see autonlu.Model).
All arguments from SimpleModel.distill() can also be used and are included in the following list of arguments.
- Parameters
student_model (Union[str, Model]) – Either a string representing the name or path of a model, or an instance of Model.
X – Input text samples as a list of strings.
Y –
Training target. List containing the correct output. The input format depends on the subtask:
- label subtask
A list of strings, e.g.
>>> Y = ["POS", "NEG", "NEG", "POS"]
- class subtask
A list of lists of strings, e.g.
>>> Y = [["service"], [], ["support", "sales"]]
- classlabel subtask
A list of lists of lists of two strings (class and label), e.g.
>>> Y = [[["room", "POS"], ["service", "NEG"]], [["cleanliness", "NEU"]]]
unlabelledX (Optional[List[str]]) – Unlabelled input text samples as a list of strings. Optional; default: None.
chunk_size (int) – Size of the chunks to use for unlabelled distillation.
unlabelled_epochs (int) – Number of epochs to train on the unlabelled dataset. In unlabelled distillation, do_early_stopping is False and the learning rate scheduler defaults to a linear decay.
valX (Optional[List[Union[str, Tuple[str, str]]]]) – Input samples used for validation of the model during training, e.g. for stopping training early if there is no more progress or for reporting the current score via the score_callback. Same format as X. If None, a part of X will be split off.
valY (Optional[List[str]]) – Training target used for validation of the model during training, e.g. for stopping training early if there is no more progress or for reporting the current score via the score_callback. Same format as Y. If None, a part of Y will be split off.
valsplit (float) – If valX or valY is not given, specifies how much of the training data should be split off for validation. Default is 10%.
label_probabilities (Dict[str, float]) – A dictionary mapping label names to the probability (a number between 0 and 1) of that label being used for training. All labels not mentioned in label_probabilities are assumed to have a probability of 1. Can be used to subsample certain labels if they are overrepresented.
all_classes (Union[List[str], List[List[str]], None]) – Either a list of all possible classes or a list of lists of all possible classes if the possible classes are different for each sample. This is useful when using the standard_label and certain classes should not be generated for specific samples, which happens when using active learning via select_to_label(). If None, the list of possible classes will be determined automatically from Y and valY.
val_all_classes (Union[List[str], List[List[str]], None]) – Same as all_classes, but for the validation data.
all_labels – A list of all possible labels. If None, the list of possible labels will be determined from Y and valY. Can be set explicitly in cases where not all possible labels occur in the training and validation sets (e.g. because they will only be used in a later training session).
temperature – Defines the temperature factor used in the distillation loss calculation (student and teacher logits are divided by the temperature before being passed to the softmax functions). The higher the temperature, the smoother the probability distributions become. Temperatures between 1.0 and 5.0 typically give the best results. Defaults to 1.0.
alpha – A factor determining the relative proportion of cross-entropy in the total distillation loss. Has to be between 0.0 and 1.0.
seed – Fixes the random seed to make training deterministic (i.e. with the same seed and the same input data in the same order, the resulting model should be identical). Warning: setting a seed can slow down training.
learning_rate (Optional[float]) – The learning rate to be used during training. Higher learning rates lead to faster convergence but might lead to worse overall accuracy, and if the learning rate is set too high, the system might not learn anything. If None, an appropriate learning rate for the given task is selected: 2e-4 for label tasks and 2e-5 for class and classlabel tasks.
batchsize – The number of samples to use in one training step. This also sets the number of samples to accumulate for one weight update if the number is bigger than 32 (at a minimum, 32 samples are always accumulated). A higher value generally means higher throughput, but the system might run out of memory if this is set too high. If the GPU runs out of memory, the system will automatically switch to smaller batch sizes without loss of data. An unsuitable batch size might also inhibit proper training.
mindatasetsize (Optional[int]) – Early stopping assumes the dataset size to be at least mindatasetsize. A large mindatasetsize in essence increases the patience for early stopping. Default is 0 for label tasks and 70,000 otherwise, so that small datasets are trained longer, since this works better in practice. A value of 70,000 in essence means that datasets with fewer than 70,000 samples will be trained for as long as a dataset with 70,000 samples.
maxdatasetsize – Early stopping assumes the dataset size to be at most maxdatasetsize. A small maxdatasetsize in essence decreases the patience for early stopping. Default is 200,000, so that large datasets are trained for a shorter time. A value of 200,000 in essence means that datasets with more than 200,000 samples will only be trained for as long as a dataset with 200,000 samples. This does NOT mean that only 200,000 of the samples will be used; all the data is still utilized. This only influences the point at which early stopping decides that a model does not improve anymore.
val_metric – The validation metric to use for the BestModelKeeper (i.e. which metric is used to determine whether one model is better than another). Generally this should not be changed from val_loss.
val_maximize – If True, a higher value of val_metric is considered better; if False, a smaller value is considered better. Has to fit the specific metric in val_metric.
cache_dir – Directory used to cache the teacher logits (if the teacher model is saved on disk, a subdirectory named precomp_logits in the model folder will be used).
verbose – If True, information about the training progress will be shown on the terminal.
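The roles of temperature and alpha can be illustrated with a small, self-contained loss function. This is a sketch under stated assumptions: the parameter descriptions above only say that logits are divided by the temperature before the softmax and that alpha is the proportion of cross-entropy in the total loss. The soft term used here (KL divergence between the softened teacher and student distributions, without the T² scaling some implementations add) is an assumption, and distillation_loss is a hypothetical name.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Softmax of logits divided by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    target: int,
    temperature: float = 1.0,
    alpha: float = 0.5,
) -> float:
    """alpha * cross-entropy + (1 - alpha) * soft loss (assumed KL form)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha has to be between 0.0 and 1.0")
    # Hard part: cross-entropy of the student against the true class index.
    cross_entropy = -math.log(softmax(student_logits)[target])
    # Soft part: KL(teacher || student) on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return alpha * cross_entropy + (1.0 - alpha) * kl
```

Raising the temperature flattens both distributions. For example, max(softmax([2.0, 0.0], 5.0)) is smaller than max(softmax([2.0, 0.0], 1.0)), which is the smoothing effect described for temperature.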
Example
Assumes the environment variable DO_PRODUCT_KEY is correctly set.
>>> from autonlu import Model
>>> m = Model("albert-base-v2", standard_label="NONE")
>>> segments = ["The room was nice, but the staff was unfriendly!",
...             "They served great food and the drinks were ok."]
>>> Y = [[["room", "POS"], ["staff", "NEG"]],
...      [["food", "POS"], ["drinks", "NEU"]]]
>>> student = m.distill("albert-base-v2#cnn", X=segments, Y=Y, valX=segments, valY=Y)
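The mindatasetsize and maxdatasetsize parameters only change the dataset size that early stopping assumes, not the data that is used. One way to read that description is as a clamp. The helper below is hypothetical and shows only this interpretation. How autonlu turns the assumed size into a concrete patience is not documented here, so that step is left out.

```python
def effective_dataset_size(
    num_samples: int,
    mindatasetsize: int,
    maxdatasetsize: int = 200_000,
) -> int:
    """Dataset size assumed by early stopping (hypothetical helper).

    All num_samples are still used for training; only the size that early
    stopping reasons about is clamped to [mindatasetsize, maxdatasetsize].
    """
    return min(max(num_samples, mindatasetsize), maxdatasetsize)
```

With the documented defaults for class tasks (70,000 and 200,000), a dataset of 5,000 samples is treated like one with 70,000 samples, and a dataset of 500,000 samples is treated like one with 200,000 samples.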
- select_to_label(X, classes_to_analyze=None, **kwargs)
Selects sentences that the current model would like as additional training data to maximally improve performance. Currently only supported for “classification” tasks.
All arguments from SimpleModel.select_to_label() can also be used and are included in the following list of arguments.
- Parameters
X – A list of segments or segment pairs from which the system can select samples to add to the training data. Usually this is data that is available but not yet labelled.
classes_to_analyze (Optional[List[str]]) – Used in case of a class or classlabel task. Specifies a list of classes that should be considered when selecting sentences for labeling. If None, the list of all known classes is used automatically. This can be useful if certain classes are underrepresented in the training data and the selection should concentrate on those classes.
acquisitionsize – The number of samples the system should select. The higher the number, the more data can be labelled in one go. However, more iterations with a smaller acquisitionsize will be able to learn more from fewer manually labelled samples. Values from 50 to 100 are generally a good compromise.
modelsamples – How often different variants from the current model should be used to sample the given segments. Higher numbers will lead to more accurate results, but will also take more time.
al_samples – During selection of the requested segments, a probability distribution has to be approximated. al_samples specifies how many samples should be drawn from this distribution for the approximation. Higher values lead to more accurate results, but the runtime increases.
preselectionsize – Especially when X gets very big, the selection process can become slow. preselectionsize specifies how many samples should be preselected using a much faster method. Higher values lead to a better selection but increase the runtime. If None, preselectionsize is 10 * acquisitionsize.
verbose – If True, information about the active learning process is shown, including progress bars.
- Returns
A tuple (samples_to_label, scores), where samples_to_label are the samples that the system would like to see labeled. In case of a class or classlabel task, the samples are (segment, class) tuples. scores indicates how unsure the model was about the given samples. The score is not the only criterion used to select samples, so the scores are not necessarily monotonically decreasing.
Example
>>> from autonlu import Model
>>> m = Model("DeepOpinion/hotels_absa_en")
>>> X = ["The room was horrible", "The food was quite nice", ...]
>>> samples_to_label, scores = m.select_to_label(X=X, acquisitionsize=2)
>>> # samples_to_label == [("The room was horrible", "room"),
>>> #                      ("We really enjoyed the stay", "satisfaction")]
>>> # scores == [1.34, 0.561]
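The two-stage selection (a fast preselection of preselectionsize candidates, then a more careful choice of acquisitionsize samples) can be sketched generically with uncertainty sampling. This is only an illustration of the idea and not the method autonlu uses. The library samples model variants (modelsamples, al_samples) and applies more criteria than the score alone. Here, least confidence for the preselection and predictive entropy for the final ranking are assumed stand-ins, and select_to_label_sketch is a hypothetical name that takes precomputed class probabilities per segment.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Predictive entropy; higher means the model is more unsure."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_to_label_sketch(
    probs: Dict[str, List[float]],
    acquisitionsize: int,
    preselectionsize: Optional[int] = None,
) -> Tuple[List[str], List[float]]:
    """Pick the segments the model is most unsure about (hypothetical helper)."""
    if preselectionsize is None:
        preselectionsize = 10 * acquisitionsize  # documented default
    # Stage 1 (cheap): keep the least confident segments, i.e. 1 - max probability.
    pre = sorted(probs, key=lambda s: 1.0 - max(probs[s]), reverse=True)[:preselectionsize]
    # Stage 2: rank the preselected segments by entropy and keep the top ones.
    chosen = sorted(pre, key=lambda s: entropy(probs[s]), reverse=True)[:acquisitionsize]
    return chosen, [entropy(probs[s]) for s in chosen]
```

A larger preselectionsize lets the second stage consider more candidates. This mirrors the documented trade-off between selection quality and runtime.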
- save(model_dir)
Saves the current model.
If only a language model is present (meaning only finetuning was called), it will be saved in the appropriate format so it can be used as a base model for training of an actual task. A base model can also be loaded and finetuning can be continued.
- Parameters
model_dir – The path where the model should be saved. If the folder does not exist yet, it will be created.
- Raises
autonlu.core.ModelSaveException – If saving the model fails
- finetune(corpus_filename, batchsize=4, burnin_epochs=0.01, burnin_timelimit=None, burnin_lr=0.002, training_epochs=1, training_timelimit=None, training_lr=2e-05, lm_tasks=['NSP', 'combinedMLM'], loss_weights=[], length=500, teacher=None, verbose=False)
Performs language model fine tuning on a given text corpus. Only available for "classification" tasks (the standard value, unless set otherwise during initialization).
This command also automatically generates a TensorBoard log that visualizes the different losses over time. The logs are saved in a "runs" directory and can be displayed with
tensorboard --logdir=runs
- Parameters
corpus_filename (str) – The text file to be used for language model fine tuning. This should be a standard text file in which documents are separated by two newlines.
batchsize (int) – The number of sequences to be used for one pass during fine tuning. The batch size for the burn-in phase is automatically four times higher. If multiple GPUs are being used, the batch size is multiplied by the number of available GPUs. If the batch size is too big, the system will automatically halve it until the batches fit on the GPUs, without loss of data.
burnin_epochs (float) – Number of epochs to be used for the burn-in phase. In the burn-in phase, the language model is kept fixed and only the prediction heads are trained. This lets the whole system stabilize without messing up the actual language model. The number of epochs can be given as a floating point number. When set to 1.0, the model will, on average, have seen the whole text of the training corpus once. The number of burn-in epochs should be selected so that this phase takes around 10 minutes; more is usually not necessary.
burnin_timelimit (Optional[float]) – Number of seconds after which the burn-in phase will be ended. If the number of epochs is reached first, the burn-in phase ends earlier. If None, the burn-in proceeds until the epochs are finished.
burnin_lr (float) – Learning rate to be used for the burn-in phase.
training_epochs (float) – Number of epochs to be used for language model fine tuning. The number of epochs can be given as a floating point number. When set to 1.0, the model will, on average, have seen the whole text of the training corpus once.
training_timelimit (Optional[float]) – Number of seconds after which the training will be ended. If the number of epochs is reached first, the training ends earlier. If None, the training proceeds until the epochs are finished.
training_lr (float) – Learning rate to be used for the language model fine tuning.
lm_tasks (List[str]) – Describes the tasks to be learned. Possible list elements are:
- SO: Sentence Ordering
- NSP: Next Sentence Prediction
- SONSP: SO & NSP
- prelabeled: uses a teacher to label sentences
- prelabeled_words: uses a teacher to label sentences, where "sentences" are just consecutive words (i.e. not sentences in the grammatical sense)
- soloMLM: independent Masked Language Model
- combinedMLM: an MLM task which is trained together with the other tasks on the same data
loss_weights (List[float]) – Gives a particular weight to the losses of the lm_tasks. If empty, each loss has the weight 1.
length (Union[int, List[int]]) – Determines the number of tokens per sentence in a batch. If length is a list of two integers, the number of tokens per sentence in a batch takes a random value between the two integers ([low, high]). If length is an integer, it is the number of tokens per sentence. Remark: currently, for all lm_tasks except prelabeled, a "sentence" is just a sequence of consecutive words/tokens of a given length. For prelabeled, grammatical sentences are used, and the length is defined by the sentence itself.
teacher (Optional[LMTeacher]) – An instance of autonlu.finetuning.LMTeacher. Needed for the tasks prelabeled and prelabeled_words, where labels are provided by a teacher.
verbose (bool) – If True, progress bars with additional information will be shown during training.
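The expected corpus format and the effect of length can be sketched as follows. The sketch makes these assumptions: documents are split on blank lines, as described for corpus_filename, and whitespace-separated words stand in for the model's real tokens. It only shows how an integer or a [low, high] pair turns a document into fixed-length or random-length "sentences". split_documents and chunk_words are hypothetical helpers, not part of autonlu.

```python
import random
from typing import List, Sequence, Union


def split_documents(text: str) -> List[str]:
    """Split a corpus into documents separated by two newlines (an empty line)."""
    return [doc.strip() for doc in text.split("\n\n") if doc.strip()]


def chunk_words(
    document: str,
    length: Union[int, Sequence[int]],
    rng: random.Random,
) -> List[List[str]]:
    """Cut a document into 'sentences' of consecutive words (words stand in for tokens)."""
    words = document.split()
    chunks: List[List[str]] = []
    start = 0
    while start < len(words):
        if isinstance(length, int):
            n = length  # fixed number of words per sentence
        else:
            low, high = length
            n = rng.randint(low, high)  # random size, inclusive on both ends
        if n < 1:
            raise ValueError("length must be at least 1")
        chunks.append(words[start:start + n])
        start += n
    return chunks
```

For example, chunk_words("1 2 3 4 5", 2, random.Random(0)) yields [["1", "2"], ["3", "4"], ["5"]]. Only the last chunk can be shorter than requested.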
- upload(name, display_name=None, short_description='', long_description='', language='en', verbose=False)
Uploads this model to Studio.
- Parameters
name (str) – The internal name that should be used for the model in Studio (e.g. the name you can later use to download the model from Studio again). Has to be unique. If a model with the same name already exists in Studio, a ModelNameExists exception will be thrown.
display_name (Optional[str]) – The name which should be displayed for this model in Studio. Does not have to be unique.
short_description (str) – The description that is shown below the model name in the model list. If empty, the content of long_description will be used.
long_description (str) – The description that is shown when the model is opened. If empty, the content of short_description will be used.
language (str) – An ISO 639-1 language identifier (e.g. "en", "de"); see https://en.wikipedia.org/wiki/ISO_639-1
verbose (bool) – If True, some information is printed, e.g. when compressing the model is finished.
- Raises
ModelNameExists – If the chosen name is already used in Studio.
- prune(layers_to_prune)
Sets the layers of a model which should be pruned (i.e. not used and removed during training).
Only call this function if you want to prune specific layers and know their numbers. In most cases you will want to use auto_prune().
- Parameters
layers_to_prune (List[int]) – A list of integers containing all layer_ids that should be pruned, where layer_id ∈ [0, num_hidden_layers].
- auto_prune(X, Y, valX=None, valY=None, valsplit=0.1, num_layers_to_prune=6, always_prune=None, max_num_samples=40000, epochs=3, verbose=False)
Only for "classification" tasks (the standard value, unless set otherwise during initialization). Automatically selects the best layers to prune from the current model using a greedy search strategy: each layer is left out in turn, and the layer whose removal yields the highest accuracy is selected. When pruning further layers, this previous selection is assumed to be a good starting point. Note that this method internally remembers the layers to be pruned, so calling train() after auto_prune() is sufficient for the pruned layers to be ignored.
- Parameters
X (List[Union[str, Tuple[str, str]]]) – Input samples. Either a list of strings for text classification or a list of pairs of strings for text pair classification.
Y (List[str]) – Training target. List containing the correct labels as strings.
valX (Optional[List[Union[str, Tuple[str, str]]]]) – Input samples used for validation of the model during training, e.g. for stopping training early if there is no more progress or for reporting the current score via the score_callback. Same format as X. If None, a part of X will be split off.
valY (Optional[List[str]]) – Training target used for validation of the model during training, e.g. for stopping training early if there is no more progress or for reporting the current score via the score_callback. Same format as Y. If None, a part of Y will be split off.
valsplit (float) – If valX or valY is not given, specifies how much of the training data should be split off for validation. Default is 10%.
num_layers_to_prune (int) – How many layers should be pruned from the given architecture.
always_prune (Optional[List[int]]) – A list of layer ids that should always be pruned, independent of what the greedy heuristic selects.
max_num_samples (int) – The maximum number of training samples used by the greedy heuristic to train the different candidates. Decreasing this number increases the speed but decreases the accuracy of the final pruned model.
epochs (int) – Number of epochs to train a candidate before evaluating its accuracy. Decreasing this value speeds up the search for layers to prune but also decreases the accuracy of the final pruned model.
verbose (bool) – If True, information about the overall progress of finding layers to prune is shown.
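The greedy search described for auto_prune can be sketched independently of any model. In autonlu, scoring a candidate means training it for epochs epochs on at most max_num_samples samples and measuring its accuracy. Here that step is replaced by a caller-supplied score function. greedy_prune is a hypothetical name, and the sketch assumes that layers in always_prune count towards num_layers_to_prune.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence


def greedy_prune(
    num_layers: int,
    num_layers_to_prune: int,
    score: Callable[[FrozenSet[int]], float],
    always_prune: Optional[Sequence[int]] = None,
) -> List[int]:
    """Greedily grow a set of pruned layers, keeping the best-scoring choice each round."""
    pruned = set(always_prune or [])  # assumed to count towards the total
    while len(pruned) < num_layers_to_prune:
        candidates = [layer for layer in range(num_layers) if layer not in pruned]
        if not candidates:
            break
        # Leave out each remaining layer in turn (on top of the previous
        # selection) and keep the layer whose removal scores best.
        best = max(candidates, key=lambda layer: score(frozenset(pruned | {layer})))
        pruned.add(best)
    return sorted(pruned)
```

Because each round builds on the previous selection, the search needs on the order of num_layers * num_layers_to_prune candidate evaluations instead of trying every combination. This is why reducing epochs or max_num_samples noticeably speeds it up.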