Translation

AutoNLU performs translations on-premises, on your own CPU/GPU, and not on a cloud service.

Note that the translation capability of AutoNLU is mainly intended for translating data for training and prediction. The translations generally do not reach the quality of dedicated services such as Google Translate or DeepL.

Translator

class autonlu.Translator(source_language, target_language='en', batchsize=128, num_beams=1, device=None, verbose=False)

A class to translate datasets from one language to another

Parameters
  • source_language (str) – International code for the source language (e.g. "en", "fr", "de" …). For a list of codes, see e.g. https://developers.google.com/admin-sdk/directory/v1/languages

  • target_language (str) – International code for the target language into which the source language is translated. Default is "en"

  • batchsize (int) – Batch size used during translation. It is reduced automatically if it is too large. Default is 128.

  • num_beams (int) – Number of beams used in beam search. The minimum value of 1 is considerably faster, at the cost of (possibly) lower translation quality. Default is 1; for higher-quality translations, a value of 4 can be used.

  • device (Optional[str]) – Device on which the model should run ("cpu" or "cuda"). If None, a device is selected automatically.

  • verbose (bool) – If True, progress information is displayed
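
The automatic reduction of batchsize is not described further here. As an illustration only, the following minimal sketch shows one common way such a fallback can work: retry with half the batch size whenever a batch does not fit into memory. The names run_in_batches and toy_model are hypothetical, MemoryError stands in for whatever out-of-memory error the real backend raises, and none of this is taken from AutoNLU's implementation.

```python
from typing import Callable, List, Tuple


def run_in_batches(
    items: List[str],
    process_batch: Callable[[List[str]], List[str]],
    batchsize: int = 128,
) -> Tuple[List[str], int]:
    """Process items in batches, halving the batch size after a MemoryError."""
    results: List[str] = []
    start = 0
    while start < len(items):
        batch = items[start:start + batchsize]
        try:
            results.extend(process_batch(batch))
        except MemoryError:
            if batchsize == 1:
                raise  # cannot shrink any further
            batchsize //= 2  # retry the same position with a smaller batch
            continue
        start += len(batch)
    return results, batchsize


def toy_model(batch: List[str]) -> List[str]:
    # Hypothetical stand-in for a model that can hold at most 32 inputs at once.
    if len(batch) > 32:
        raise MemoryError("batch too large")
    return [text.upper() for text in batch]


outputs, final_batchsize = run_in_batches([f"s{i}" for i in range(100)], toy_model)
```

In this sketch the batch size shrinks only when a batch actually fails, so a generous starting value such as the default of 128 costs little on hardware that can handle it.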

translate(sentence_list)

Translates a list of strings

Parameters

sentence_list (List[str]) – List of strings that should be translated

Returns

A list of strings with translated sentences

Example

>>> from autonlu import Translator
>>> translator = Translator("de")
>>> X = ["Das Auto ist schön", "Das Armaturenbrett ist kompliziert"]
>>> X_trans = translator.translate(X)
>>> assert X_trans == ["The car is nice", "The dashboard is complicated"]

create_dictionaries(X)

Returns a translation dictionary for the given data X

Parameters

X (List[Union[str, Tuple[str, str]]]) – Dataset. Either a list of strings for text classification or a list of pairs of strings (interpreted as segment and class) for text pair classification.

Returns

A Tuple (sentence_dict, class_dict).

  • If X is a list of strings, sentence_dict is a dictionary, with X[i] as keys and the translations as values, while class_dict is None.

  • If X is a list of tuples (pairs), the keys of sentence_dict are X[i][0], while the keys of class_dict are X[i][1]. The values are the translated texts in both cases.
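
To make the two return shapes concrete, here is a minimal sketch that mimics them without loading a translation model. build_dictionaries and fake_translate are hypothetical helpers written for this illustration; they are not part of autonlu, and the "translation" is just a prefix added to each string.

```python
from typing import Callable, Dict, List, Optional, Tuple, Union

Dataset = List[Union[str, Tuple[str, str]]]


def build_dictionaries(
    X: Dataset, translate: Callable[[List[str]], List[str]]
) -> Tuple[Dict[str, str], Optional[Dict[str, str]]]:
    """Hypothetical stand-in that mirrors the documented return shapes."""
    if X and isinstance(X[0], tuple):
        # Pairs: one dictionary for the first elements, one for the second.
        sentences = list(dict.fromkeys(pair[0] for pair in X))
        classes = list(dict.fromkeys(pair[1] for pair in X))
        sentence_dict = dict(zip(sentences, translate(sentences)))
        class_dict = dict(zip(classes, translate(classes)))
        return sentence_dict, class_dict
    # Plain strings: only a sentence dictionary, no class dictionary.
    sentences = list(dict.fromkeys(X))
    return dict(zip(sentences, translate(sentences))), None


def fake_translate(texts: List[str]) -> List[str]:
    # Placeholder "translation" so the sketch runs without a model.
    return [f"<en> {t}" for t in texts]


pairs = [("Das Auto ist schön", "Auto"), ("Das Auto ist laut", "Auto")]
sentence_dict, class_dict = build_dictionaries(pairs, fake_translate)
plain_dict, no_class_dict = build_dictionaries(["Das Auto ist schön"], fake_translate)
```

Because the keys are unique, a class that occurs many times in X (here "Auto") only needs to be translated once.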

Example:

>>> # Assume we have Xtrain, Xvalid and Xtest, each a list of sentence-class pairs (List[Tuple[str, str]]),
>>> # which we would like to translate from German to English
>>> from autonlu import Translator, translate_dataset
>>> if slow_but_higher_quality:  # hypothetical boolean; we'll use 4 beams
...     translator = Translator("de", target_language="en", num_beams=4, verbose=True)
... else:  # fast but possibly lower quality; we'll use only one beam
...     translator = Translator("de", target_language="en", num_beams=1, verbose=True)
>>> sentence_dict, class_dict = translator.create_dictionaries(Xtrain + Xvalid + Xtest)
>>> del translator
>>> # Optionally, use "save_csv_dictionary" and/or "load_csv_dictionary" for sentence_dict and/or class_dict
>>> # Get datasets translated with the help of the dictionaries
>>> Xtrain = translate_dataset(Xtrain, sentence_dict, class_dict)
>>> Xvalid = translate_dataset(Xvalid, sentence_dict, class_dict)
>>> Xtest = translate_dataset(Xtest, sentence_dict, class_dict)

Helper Functions

autonlu.load_csv_dictionary(filename)

Loads a dictionary from a CSV file. The file must contain the two columns `original` and `translation`.

Parameters

filename (str) – path plus name of the csv-file from which the dictionary is loaded.

Return type

Dict

Returns

A dictionary containing the original sentences as keys and the translations as values

autonlu.save_csv_dictionary(filename, dictionary)

Saves a language dictionary as a CSV file with the two columns `original` and `translation`.

Parameters
  • filename (str) – path plus name of the csv-file into which the dictionary is saved

  • dictionary (Dict) – A dictionary containing the original sentences as keys and the translation as values
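
For orientation, the sketch below writes and reads a file with the two documented columns using only Python's standard csv module. It is an illustration of the file layout, not the implementation of save_csv_dictionary or load_csv_dictionary: the helper names write_dictionary and read_dictionary are hypothetical, and the comma delimiter and UTF-8 encoding are assumptions that this page does not confirm.

```python
import csv
import os
import tempfile
from typing import Dict


def write_dictionary(filename: str, dictionary: Dict[str, str]) -> None:
    """Write a header row, then one row per (original, translation) pair."""
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["original", "translation"])
        writer.writerows(dictionary.items())


def read_dictionary(filename: str) -> Dict[str, str]:
    """Read the two columns back into a dictionary."""
    with open(filename, newline="", encoding="utf-8") as f:
        return {row["original"]: row["translation"] for row in csv.DictReader(f)}


sentence_dict = {
    "Das Auto ist schön": "The car is nice",
    "Hallo, Welt": "Hello, world",  # the comma is quoted by the csv module
}
path = os.path.join(tempfile.mkdtemp(), "sentence_dict.csv")
write_dictionary(path, sentence_dict)
restored = read_dictionary(path)
```

Storing the dictionaries this way means an expensive translation run only has to happen once; later sessions can reload the file instead of creating a Translator again.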

autonlu.translate_dataset(X, sentence_dict, class_dict=None)

Translates a dataset with the help of a mandatory sentence dictionary and an optional class dictionary (the latter is only needed when the dataset is a list of sentence-class pairs).

Parameters
  • X (List[Union[str, Tuple[str, str]]]) – Dataset. Either a list of strings for text classification or a list of pairs of strings (interpreted as sentence and class) for text pair classification.

  • sentence_dict (Dict) – A dictionary containing the original sentences as keys and the translations as values (each key appears only once, while in the variable “X”, the same sentence might appear several times)

  • class_dict (Optional[Dict]) – Similar to sentence_dict. This dictionary is used for the second tuple element (index [1]) in case X is a list of tuples

Returns

A translated dataset of the same structure as the input dataset
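
Conceptually, this amounts to a dictionary lookup per element. The following minimal sketch shows that idea in plain Python; apply_dictionaries is a hypothetical helper, not the library function, and its behaviour for missing keys (a KeyError) or a missing class dictionary (a ValueError) is an assumption made for this illustration.

```python
from typing import Dict, List, Optional, Tuple, Union

Dataset = List[Union[str, Tuple[str, str]]]


def apply_dictionaries(
    X: Dataset,
    sentence_dict: Dict[str, str],
    class_dict: Optional[Dict[str, str]] = None,
) -> Dataset:
    """Replace every sentence (and class, for pairs) by its dictionary entry."""
    translated: Dataset = []
    for item in X:
        if isinstance(item, tuple):
            if class_dict is None:
                raise ValueError("class_dict is required for sentence-class pairs")
            sentence, label = item
            translated.append((sentence_dict[sentence], class_dict[label]))
        else:
            translated.append(sentence_dict[item])
    return translated


sentence_dict = {"Das Auto ist schön": "The car is nice"}
class_dict = {"Auto": "car"}

# The same sentence may appear several times in X but only once in the dictionary.
X_pairs = [("Das Auto ist schön", "Auto"), ("Das Auto ist schön", "Auto")]
X_pairs_translated = apply_dictionaries(X_pairs, sentence_dict, class_dict)
X_plain_translated = apply_dictionaries(["Das Auto ist schön"], sentence_dict)
```

The output keeps the length, order, and element type of the input, which matches the "same structure" guarantee stated above.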