Changelog

To update your installation of AutoNLU, please make sure that you have set the environment variable PIP_PULL to your Gemfury token:

export PIP_PULL=<your Gemfury token>
# Example (not a valid token):
# export PIP_PULL=jGxWW-4qKlz8ARZOhJgG9BIVuxsU9231

Then execute the following command:

pip install autonlu --upgrade --extra-index-url=https://${PIP_PULL}:@pypi.fury.io/deepopinion

1.6.0 (2022-07-07)

  • Python 3.9 is now officially supported

  • Topic modelling has been added with the class TopicModel. Have a look at the corresponding tutorial to see how to use this feature (an illustrative sketch also follows this list).

  • Label tasks no longer use early stopping by default, and the hyperparameter settings have been optimized accordingly

  • Some smaller bugfixes
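
The following is a minimal sketch of how the new TopicModel might be used. Apart from the class name, which the entry above confirms, the constructor and method names are assumptions; the topic modelling tutorial is the authoritative reference.

from autonlu import TopicModel

# Hypothetical usage sketch: apart from the class name TopicModel,
# every call below is an assumption; the topic modelling tutorial
# shows the actual API.
documents = [
    "The battery of this phone lasts two full days.",
    "Shipping took three weeks, very disappointing.",
    "Great camera, but the screen scratches easily.",
]
model = TopicModel()       # assumed: default construction is possible
model.fit(documents)       # assumed: discovers topics in the raw texts
print(model.get_topics())  # assumed: returns the discovered topics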

1.5.0 (2022-03-01)

  • Data cleaning: The new DataCleaner class provides functionality to select which samples of a dataset should be checked by a human, both to detect label errors and to find samples of generally bad data quality (e.g. nonsensical sentences). Have a look at the corresponding tutorial to see how to use it in practice (an illustrative sketch also follows this list).

  • Automatic hyperparameter tuning with the class AutoMl has become more flexible. Have a look at the corresponding tutorial to see the new interface in action.
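
A sketch of the data cleaning workflow described above. Only the class name DataCleaner is confirmed by this changelog; the method and argument names below are assumptions, so consult the tutorial for the real interface.

from autonlu import DataCleaner

# Hypothetical sketch: method and argument names are assumptions.
samples = ["Great product!", "xkfd qpwoe zmxn", "Arrived quickly."]
labels = ["positive", "positive", "negative"]

cleaner = DataCleaner()
# Assumed: returns the samples most likely to contain label errors
# or nonsensical text, so a human can review those first.
to_review = cleaner.select_samples(X=samples, Y=labels)
print(to_review)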

1.4.0 (2021-12-09)

1.3.0 (2021-11-30)

  • AutoNLU now supports two completely new tasks: “word labeling” (usable for e.g. named entity recognition, information extraction, …) and question answering. Have a look at the two new tutorials to see how they are used (an illustrative word-labeling sketch follows this list).

  • The F1 binary metric was added to autonlu.Model.evaluate() if applicable.
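
To illustrate the shape of the new word-labeling task, here is a purely hypothetical sketch; the checkpoint name and the exact input/output format are assumptions, and the two new tutorials are the authoritative reference.

from autonlu import Model

# Hypothetical sketch of a word-labeling (e.g. NER) prediction;
# the checkpoint name and data format are assumptions.
model = Model("a-word-labeling-checkpoint")
tokens = [["Angela", "Merkel", "visited", "Paris"]]
predictions = model.predict(tokens)  # assumed: one label per token
print(predictions)                   # e.g. [["PER", "PER", "O", "LOC"]]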

1.2.0 (2021-11-16)

  • If only one epoch is run (e.g. during prediction), no separate progress bar for epochs is shown.

  • The default mindatasetsize for label tasks has been changed from 0 to 4000 for better performance on small datasets. This increases training time for small datasets; training time of bigger datasets is unaffected (a sketch of overriding this setting follows this list).

  • Fixes a bug where in-memory studies using autonlu.AutoMl started the hyperparameter optimization from scratch when additional experiments were performed

  • Removes many warnings from internal packages

  • The all_labels argument from Model.distill() was removed since it was not actually used
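
If the old small-dataset behaviour is preferred, something like the following might restore it. That mindatasetsize can be passed to train() directly, as well as the checkpoint name and data format, are assumptions here.

from autonlu import Model

# Hypothetical sketch: that mindatasetsize can be passed to train()
# is an assumption; 0 restores the pre-1.2.0 default for label tasks.
model = Model("some-base-model")  # placeholder checkpoint name
X = ["The service was excellent.", "The food was cold."]
Y = [["service"], ["food"]]       # assumed label-task format
model.train(X=X, Y=Y, mindatasetsize=0)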

1.1.0 (2021-09-28)

  • A TypeError is now raised if labels or classes provided for training are not strings

  • Entropies are now returned by autonlu.Model.predict() if return_extras=True (see the illustrative sketch after this list)

  • The argument return_probabilities of autonlu.Model.predict() has been renamed to return_extras in preparation for also returning additional information

  • Fixed a memory leak when using autonlu.AutoMl

  • The argument return_logits has been removed from autonlu.SimpleModel.predict(). The logits are now always in the returned dictionary
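
Putting the prediction changes of this release together, a hedged sketch; that predict() returns a (predictions, extras) pair and that the extras dictionary has an "entropies" key are assumptions, as is the checkpoint name.

from autonlu import Model

# Hypothetical sketch: the (predictions, extras) return structure and
# the "entropies" key are assumptions.
model = Model("some-classification-checkpoint")  # placeholder name
predictions, extras = model.predict(
    ["I love this!", "Not sure how I feel about it."],
    return_extras=True,  # renamed from return_probabilities in 1.1.0
)
print(extras["entropies"])  # assumed key for per-sample entropies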

1.0.1 (2021-09-13)

  • PyTorch 1.9.0 is now supported.

1.0.0 (2021-08-27)

  • Adds autonlu.AutoMl to automatically search for hyperparameters and to automatically select the best model for a given task (an illustrative sketch follows this list).

  • Adds a new form of training for OMI models, along with some new optional parameters that can be set when calling autonlu.Model.train().
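
A hedged sketch of what an AutoMl run might look like. Only the class name is confirmed here; the constructor argument and method names are assumptions, and the AutoMl tutorial (reworked in 1.5.0) shows the real interface.

from autonlu import AutoMl

# Hypothetical sketch: every name besides AutoMl is an assumption.
X = ["Great value for the price.", "Broke after one week."]
Y = ["positive", "negative"]

automl = AutoMl(study_name="sentiment")  # assumed argument
automl.search(X=X, Y=Y)                  # assumed search entry point
best_model = automl.get_best_model()     # assumed accessor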

0.16.0 (2021-06-18)

  • Adds model distillation via autonlu.Model.distill(). Also see the tutorial “Reduce inference time by distilling your model” and the illustrative sketch after this list

  • Fixes a bug where the reported learning rate in the TensorBoard logs was not always correct when early stopping was turned off
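
A sketch of distillation: Model.distill() itself is confirmed above, but the argument names and the use of unlabeled data are assumptions, so the distillation tutorial remains authoritative.

from autonlu import Model

# Hypothetical sketch: argument names are assumptions; distill()
# itself is confirmed by the entry above.
teacher = Model("a-large-finetuned-checkpoint")  # placeholder name
unlabeled = ["Text for the student to learn from.", "More raw text."]
student = teacher.distill(
    X=unlabeled,                         # assumed: distillation data
    student_model="a-small-base-model",  # assumed parameter name
)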

0.15.2 (2021-05-28)

  • Internal bugfixes

0.15.1 (2021-05-07)

  • TensorBoard logs for language model finetuning are now written to the same directory (by default tensorboard_logs) as all the other logs

  • Some smaller bugfixes

0.15.0 (2021-05-03)

0.14.0 (2021-04-23)

  • Added support for macOS

  • Fixes a bug where models temporarily saved by the BestModelKeeper were not always deleted when the program crashed or was interrupted during training

  • Fixes CUDA out-of-memory errors which can occur when training bigger models (e.g. RoBERTa large)

0.13.1 (2021-04-19)

0.13.0 (2021-04-16)

  • Added a new argument use_samplehash to the constructors of autonlu.SimpleModel and autonlu.Model, which can be used to deactivate the sample hash and thereby reduce memory consumption and processing time when a lot of training data is used (see the illustrative sketch after this list)

  • Fixed a problem where some Studio interactions failed silently when the given product key was not correct (e.g. for autonlu.list_models()). Now a LoginException is raised.

  • Fixed a bug that could crash training when writing metrics in an unexpected format to TensorBoard
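
The new constructor argument in a one-line sketch; use_samplehash is confirmed above, while the checkpoint name is a placeholder.

from autonlu import Model

# use_samplehash is confirmed by the entry above; the checkpoint name
# is a placeholder. False disables the sample hash to save memory and
# time on large training sets.
model = Model("some-base-model", use_samplehash=False)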

0.12.0 (2021-04-15)

0.11.1 (2021-04-01)

0.11.0 (2021-04-01)

0.10.0 (2021-03-22)

0.9.0 (2021-03-18)

  • Additional arguments to influence the training (especially early stopping) have been added to autonlu.SimpleModel.train() and autonlu.Model.train() (an illustrative sketch follows this list).

  • Better default settings for class and label tasks have been introduced which should provide a good balance between model accuracy and training time.
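
The changelog does not name the new arguments, so the following is purely illustrative; the do_early_stopping flag, the checkpoint name, and the data format are all assumptions used only to show where such options would go.

from autonlu import SimpleModel

# Purely illustrative: the argument name do_early_stopping and the
# data format are assumptions, since the entry above does not name
# the new arguments.
model = SimpleModel("some-base-model")  # placeholder checkpoint name
X = ["Works as advertised.", "Total waste of money."]
Y = ["positive", "negative"]
model.train(X=X, Y=Y, do_early_stopping=False)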

0.8.0 (2021-03-17)

0.7.1 (2021-03-12)

  • Added Windows support

0.7.0 (2021-03-12)

  • Fixes a bug with tokenization in finetuning

  • The training loss is now also logged

0.6.1 (2021-02-25)

  • The default logging level was changed from DEBUG to WARN

  • The default learning rate for label tasks was increased to 2e-4. This should improve training time as well as final accuracy in most cases.

  • The default mindatasetsize for label tasks was decreased to 0. This should improve training time.

0.6.0 (2021-02-22)

0.5.0 (2021-02-15)

0.4.0 (2021-02-11)

  • autonlu.DocumentModel now also supports class and label tasks

  • The function autonlu.get_product_key() has been added to retrieve the currently valid product key from Studio using a user's login information

  • The function autonlu.login() has been added as a convenient way to use AutoNLU by simply providing the Studio login information, i.e. user name and password (see the illustrative sketch at the end of this list)

  • autonlu.Model and autonlu.DocumentModel now correctly support per-segment class lists (i.e. for class and classlabel tasks it is possible to specify a different all_classes for individual samples, which is useful for e.g. active learning)

  • Removed the need to set the DO_PUBLIC_KEY manually. This is now only needed for on-premise solutions with separate user management.
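
A sketch of the two new login helpers: both function names are confirmed above, while the parameter names and the credentials shown are assumptions.

import autonlu

# Hypothetical sketch: parameter names are assumptions; both function
# names are confirmed by the entries above.
autonlu.login(username="jane@example.com", password="...")

# Alternatively, fetch the product key explicitly:
key = autonlu.get_product_key("jane@example.com", "...")
print(key)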