Changelog

To update your installation of AutoNLU, make sure that the environment variable PIP_PULL is set to your Gemfury token:

export PIP_PULL=your_gemfury_token
# Example (not a valid token):
# export PIP_PULL=jGxWW-4qKlz8ARZOhJgG9BIVuxsU9231

Then execute the following command:

pip install autonlu --upgrade --extra-index-url=https://${PIP_PULL}:@pypi.fury.io/deepopinion
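
As a minimal sketch of how the index URL in the command above is assembled, the following Python snippet builds it from the PIP_PULL environment variable and rejects an empty or whitespace-containing token before pip is ever called. The helper name build_extra_index_url is hypothetical and not part of AutoNLU.

```python
import os


def build_extra_index_url(token: str) -> str:
    """Build the extra index URL used by the pip command above.

    Hypothetical helper for illustration only; it is not part of AutoNLU.
    """
    if not token or any(ch.isspace() for ch in token):
        raise ValueError("PIP_PULL must be a non-empty token without whitespace")
    # The token is used as the user name; the password part stays empty.
    return f"https://{token}:@pypi.fury.io/deepopinion"


# Read the token from the environment and report a problem instead of crashing.
try:
    print(build_extra_index_url(os.environ.get("PIP_PULL", "")))
except ValueError as err:
    print(f"Cannot build index URL: {err}")
```

Checking the token up front avoids a confusing authentication error from pip when the variable was never exported.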

1.6.0 (2022-07-07)

Python 3.9 is now officially supported
Topic modelling has been added with the class TopicModel. Have a look at the corresponding tutorial to see how to use this feature.
Label tasks no longer use early stopping by default, and the hyperparameter settings have been optimized for this
Some smaller bugfixes

1.5.0 (2022-03-01)

Data cleaning: The new DataCleaner class selects which samples of a dataset should be checked by a human, in order to detect label errors as well as samples with generally bad data quality (e.g. nonsensical sentences). Have a look at the corresponding tutorial to see how to use it in practice.
Automatic hyperparameter tuning with the class AutoMl has become more flexible. Have a look at the corresponding tutorial to see the new interface in action.

1.4.0 (2021-12-09)

Active learning, via autonlu.Model.select_to_label(), is now also correctly supported for OMI models

1.3.0 (2021-11-30)

AutoNLU now supports two completely new tasks: “word labeling” (which can be used for e.g. named entity recognition, information extraction, …) and question answering. Have a look at the two new tutorials to see how they are used.
The F1 binary metric was added to autonlu.Model.evaluate() if applicable.

1.2.0 (2021-11-16)

If only one epoch is run (e.g. during prediction), no separate progress bar for the epochs is shown.
The default mindatasetsize for label tasks has been changed from 0 to 4000 for better performance on small datasets. This will increase training time for small datasets. Training time of bigger datasets is unaffected.
Fixes a bug where in-memory studies using autonlu.AutoMl started the hyperparameter optimization from scratch when additional experiments were performed
Removes many warnings from internal packages
The all_labels argument from Model.distill() was removed since it was not actually used

1.1.0 (2021-09-28)

A TypeError exception is now thrown if labels or classes provided for training are not strings
Entropies are now returned for autonlu.Model.predict() if return_extras=True
The argument return_probabilities of autonlu.Model.predict() has been renamed to return_extras in preparation for also returning additional information
Fixed a memory leak when using autonlu.AutoMl
The argument return_logits has been removed from autonlu.SimpleModel.predict(). The logits are now always in the returned dictionary
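
The entropies mentioned above are presumably a measure of how uncertain the model is about a prediction. The following is a minimal sketch of the Shannon entropy of a class probability distribution. It is an illustration only: the changelog does not state which logarithm base or normalization AutoNLU uses, so the choice of natural logarithm here is an assumption, and the function is not AutoNLU code.

```python
import math
from typing import Sequence


def entropy(probabilities: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete probability distribution.

    Illustrative only; base and normalization are assumptions.
    """
    # Terms with zero probability contribute nothing to the sum.
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


confident = [0.98, 0.01, 0.01]   # model strongly prefers one class
uncertain = [1 / 3, 1 / 3, 1 / 3]  # model cannot decide between classes

# A peaked distribution has lower entropy than a flat one.
print(entropy(confident) < entropy(uncertain))
```

A higher entropy means the probability mass is spread over more classes, which is why entropy is a common signal for selecting samples in active learning.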

1.0.1 (2021-09-13)

PyTorch 1.9.0 is now supported.

1.0.0 (2021-08-27)

Adds autonlu.AutoMl to automatically search for hyperparameters and to automatically select the best model for a given task.
Adds a new form of training for OMI models and with that some new optional parameters, which can be set when calling autonlu.Model.train().

0.16.0 (2021-06-18)

Adds model distillation via autonlu.Model.distill(). Also see the tutorial “Reduce inference time by distilling your model”
Fixes a bug where the reported learning rate in the TensorBoard logs was not always correct when early stopping was turned off

0.15.2 (2021-05-28)

Internal bugfixes

0.15.1 (2021-05-07)

TensorBoard logs for language model finetuning are now written to the same directory (by default tensorboard_logs) as all the other logs
Some smaller bugfixes

0.15.0 (2021-05-03)

Added support for pruning models (reducing the size of models with a minimal impact on model performance) with the functions autonlu.SimpleModel.prune(), autonlu.SimpleModel.auto_prune(), autonlu.Model.prune(), and autonlu.Model.auto_prune(). There is also a new tutorial, demonstrating this feature: “Increase the speed and reduce the memory consumption by pruning layers of models.”
Added the option to use dynamic quantization for autonlu.SimpleModel.predict() and autonlu.Model.predict() with the parameter dynamic_quantization to speed up prediction times when using the CPU.

0.14.0 (2021-04-23)

Added support for macOS
Fixes a bug where models temporarily saved by the BestModelKeeper were not always deleted when the program crashed or was interrupted during training
Fixes CUDA out of memory errors which could occur when training bigger (e.g. RoBERTa large) models

0.13.1 (2021-04-19)

Saving a model in autonlu.SimpleModel and autonlu.Model is more robust now. Non-existing parent directories are automatically generated and an autonlu.core.ModelSaveException is thrown if saving fails.
Fixed a bug where autonlu.DocumentModel.evaluate() would not work for certain tasks

0.13.0 (2021-04-16)

Added a new argument use_samplehash for the constructor of autonlu.SimpleModel and autonlu.Model, which can be used to deactivate the sample hash to reduce memory consumption and processing time when a lot of training data is being used
Fixed a problem where some Studio interactions failed silently when the given product key was not correct (e.g. for autonlu.list_models()). Now, a LoginException is thrown.
Fixed a bug that could crash training when writing metrics in an unexpected format to TensorBoard

0.12.0 (2021-04-15)

autonlu.DocumentModel.evaluate() now supports special metrics for all three tasks (class, label, and classlabel)

0.11.1 (2021-04-01)

Fixes a bug in autonlu.split_dataset() where xVal and yVal both contained y values of the Y dataset.
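
To illustrate the invariant behind this fix, here is a minimal sketch of a paired train/validation split in which the validation inputs come only from X and the validation targets only from Y. The function split_pairs, its parameters, and its return order are hypothetical and do not reflect the actual signature of autonlu.split_dataset().

```python
import random
from typing import List, Sequence, Tuple


def split_pairs(
    xs: Sequence[str],
    ys: Sequence[str],
    val_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[str], List[str], List[str], List[str]]:
    """Split paired samples into training and validation parts.

    Hypothetical helper for illustration; not the AutoNLU implementation.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    # Shuffle indices once so that each x stays aligned with its y.
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    n_val = int(len(indices) * val_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    x_train = [xs[i] for i in train_idx]
    y_train = [ys[i] for i in train_idx]
    x_val = [xs[i] for i in val_idx]  # inputs only
    y_val = [ys[i] for i in val_idx]  # targets only
    return x_train, y_train, x_val, y_val
```

Splitting by a shared list of indices, rather than slicing X and Y separately, keeps every input aligned with its own target in both parts.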

0.11.0 (2021-04-01)

Adds autonlu.data_dependence.DataDependence to visualize the effect of the training set size on the model accuracy
Adds translation capabilities to AutoNLU in the form of the autonlu.Translator class
Fixes a bug where the all_labels argument of the constructors of autonlu.Model and autonlu.SimpleModel was not used

0.10.0 (2021-03-22)

A bug in the metric calculation of autonlu.DocumentModel was fixed
A bug in autonlu.DocumentModel was fixed that led to a crash in certain situations

0.9.0 (2021-03-18)

Additional arguments to influence the training (especially early stopping) have been added to autonlu.SimpleModel.train() and autonlu.Model.train().
Better default settings for class and label tasks have been introduced which should provide a good balance between model accuracy and training time.

0.8.0 (2021-03-17)

Makes finetuning of language models with autonlu.SimpleModel.finetune() and autonlu.Model.finetune() more flexible by adding additional losses

0.7.1 (2021-03-12)

Added Windows support

0.7.0 (2021-03-12)

Fixes a bug with tokenization in finetuning
The training loss is now also logged

0.6.1 (2021-02-25)

The standard logging mode was changed from DEBUG to WARN
The default learning rate for label tasks was increased to 2e-4. This should improve training time as well as final accuracy in most cases.
The default mindatasetsize for label tasks was decreased to 0. This should improve training time.

0.6.0 (2021-02-22)

A new function autonlu.get_model_by_id() was added that allows the download of models using the model ID. This method even works for models that are not returned with autonlu.list_models() (e.g. custom trained models inside projects, etc.)
The function autonlu.list_models() now returns a list of tuples containing (model name, display name, model id). The argument get_ids was removed.
autonlu.SimpleModel and autonlu.Model now have an argument baseurl for specifying a different address of a Studio instance. This is mainly interesting for on-premise Studio instances.
Added functionality to upload models to Studio with autonlu.SimpleModel.upload(), autonlu.Model.upload(), and autonlu.studio.upload_model()
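
As a minimal sketch of working with the new return format of autonlu.list_models(), the snippet below turns a list of (model name, display name, model id) tuples into a lookup from model name to model ID. The sample entries are invented and the ID type (string) is an assumption; the helper index_by_name is hypothetical and not part of AutoNLU.

```python
from typing import Dict, List, Tuple

# (model name, display name, model id); the str type of the ID is an assumption.
ModelEntry = Tuple[str, str, str]


def index_by_name(models: List[ModelEntry]) -> Dict[str, str]:
    """Map each model name to its model ID (hypothetical helper)."""
    return {name: model_id for name, _display_name, model_id in models}


# Invented sample data in the documented tuple shape.
models: List[ModelEntry] = [
    ("sentiment-en", "Sentiment (English)", "id-001"),
    ("topics-de", "Topics (German)", "id-002"),
]

print(index_by_name(models)["topics-de"])
```

An ID looked up this way could then be passed to autonlu.get_model_by_id(), whose exact signature is not given in this changelog.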

0.5.0 (2021-02-15)

The functions autonlu.Model.evaluate() and autonlu.SimpleModel.evaluate() were added, which offer a way to easily determine some evaluation metrics of models (accuracy, F1 score, precision, and recall)
A bug in the function autonlu.list_models() has been fixed which led to a crash of the function call if specific models were present on Studio
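
For reference, the four metrics named above can be sketched in plain Python for a single positive class as follows. This shows the standard definitions only; how AutoNLU averages over classes or handles edge cases is not stated in this changelog, and the function binary_metrics is hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str
) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for one positive class.

    Standard definitions for illustration; not the AutoNLU implementation.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    # Undefined ratios are reported as 0.0 (a convention chosen here).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


print(binary_metrics(["pos", "pos", "neg", "neg"], ["pos", "neg", "pos", "neg"], "pos"))
```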

0.4.0 (2021-02-11)

autonlu.DocumentModel now also supports class and label tasks
The function autonlu.get_product_key() has been added to retrieve the current valid product key from Studio using a user's login information
The function autonlu.login() has been added as a convenient way to use AutoNLU by simply providing the Studio login information (user name and password)
autonlu.Model and autonlu.DocumentModel now correctly support per segment class lists (i.e. for class and classlabel tasks it is possible to specify different all_classes for individual samples which is useful for e.g. active learning)
Removed the need to set the DO_PUBLIC_KEY manually. This is now only needed for on-premise solutions with separate user management.