Translation¶

AutoNLU performs translations on-premises, on your own CPU/GPU, and not on a cloud service.

Note that the translation capability of AutoNLU is mainly intended for translating training and prediction data. Overall, the translations do not reach the quality of dedicated services such as Google Translate or DeepL.

Supported Languages for Translation

Translator¶

class autonlu.Translator(source_language, target_language='en', batchsize=128, num_beams=1, device=None, verbose=False)¶

A class to translate datasets from one language to another

Parameters

source_language (str) – International code for the source language (e.g. "en", "fr", "de" …). For a list of the international codes, see e.g. https://developers.google.com/admin-sdk/directory/v1/languages
target_language (str) – International code for the target language into which the source language is translated. Default is "en"
batchsize (int) – Batch size used during translation. It is reduced automatically if it is too large. Default is 128.
num_beams (int) – Number of beams used in beam search. The minimum value of 1 increases speed considerably, at the cost of (possibly) lower translation quality. Default is 1; for higher-quality translations, a value of 4 can be used.
device (Optional[str]) – Device on which the model is run ("cpu" or "cuda"). If None, a device is selected automatically.
verbose (bool) – If True, progress information is displayed
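
The automatic reduction of batchsize can be pictured with the following minimal sketch. It is not AutoNLU's implementation: translate_with_backoff and fake_translate_batch are hypothetical names, and halving the batch size on a MemoryError is an assumed strategy (a GPU backend would typically raise a different out-of-memory error).

```python
from typing import Callable, List


def translate_with_backoff(
    sentences: List[str],
    translate_batch: Callable[[List[str]], List[str]],
    batchsize: int = 128,
) -> List[str]:
    """Hypothetical helper: translate in batches, halving the batch size on MemoryError."""
    results: List[str] = []
    start = 0
    while start < len(sentences):
        batch = sentences[start:start + batchsize]
        try:
            results.extend(translate_batch(batch))
        except MemoryError:
            if batchsize == 1:
                raise  # even a single sentence does not fit
            batchsize = max(1, batchsize // 2)
            continue  # retry from the same position with a smaller batch
        start += len(batch)
    return results


def fake_translate_batch(batch: List[str]) -> List[str]:
    """Stand-in for a real model: fails for batches larger than 2, otherwise upper-cases."""
    if len(batch) > 2:
        raise MemoryError("batch too large")
    return [s.upper() for s in batch]


translated = translate_with_backoff(
    ["a", "b", "c", "d", "e"], fake_translate_batch, batchsize=8
)
```

In this sketch the batch size shrinks from 8 to 4 to 2 before the first batch succeeds, and the reduced size is kept for the remaining sentences.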

translate(sentence_list)¶

Translates a list of strings

Parameters: sentence_list (List[str]) – List of strings that should be translated
Returns: A list of strings with translated sentences

Example

>>> from autonlu import Translator
>>> translator = Translator("de")
>>> X = ["Das Auto ist schön", "Das Armaturenbrett ist kompliziert"]
>>> X_trans = translator.translate(X)
>>> assert X_trans == ["The car is nice", "The dashboard is complicated"]

create_dictionaries(X)¶

Returns the translation dictionaries for the given dataset X

Parameters

X (List[Union[str, Tuple[str, str]]]) – Dataset. Either a list of strings for text classification or a list of pairs of strings (interpreted as segment and class) for text pair classification.

Returns

A Tuple (sentence_dict, class_dict).

If X is a list of strings, sentence_dict is a dictionary, with X[i] as keys and the translations as values, while class_dict is None.
If X is a list of tuples (pairs), the keys of sentence_dict are X[i][0], while the keys of class_dict are X[i][1]. The values are the translated texts in both cases.
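
To make the shape of the returned tuple concrete, here is a minimal sketch that builds dictionaries of the same structure in plain Python. It only illustrates the return value described above and is not how AutoNLU works internally: sketch_create_dictionaries, toy_translate and the lookup table TOY_TRANSLATIONS are hypothetical stand-ins for the real translation model.

```python
from typing import Callable, Dict, List, Optional, Tuple, Union

Dataset = List[Union[str, Tuple[str, str]]]
TranslateFn = Callable[[List[str]], List[str]]


def sketch_create_dictionaries(
    X: Dataset, translate_fn: TranslateFn
) -> Tuple[Dict[str, str], Optional[Dict[str, str]]]:
    """Hypothetical illustration of the (sentence_dict, class_dict) structure."""
    if not X:
        return {}, None
    if isinstance(X[0], str):
        # List of strings: translate each distinct sentence once, no class_dict.
        sentences = list(dict.fromkeys(X))
        return dict(zip(sentences, translate_fn(sentences))), None
    # List of pairs: element 0 feeds sentence_dict, element 1 feeds class_dict.
    sentences = list(dict.fromkeys(pair[0] for pair in X))
    classes = list(dict.fromkeys(pair[1] for pair in X))
    sentence_dict = dict(zip(sentences, translate_fn(sentences)))
    class_dict = dict(zip(classes, translate_fn(classes)))
    return sentence_dict, class_dict


# Toy lookup table standing in for a real translation model (assumption).
TOY_TRANSLATIONS = {
    "Das Auto ist schön": "The car is nice",
    "Das Armaturenbrett ist kompliziert": "The dashboard is complicated",
    "positiv": "positive",
    "negativ": "negative",
}


def toy_translate(batch: List[str]) -> List[str]:
    return [TOY_TRANSLATIONS[text] for text in batch]


X_pairs = [
    ("Das Auto ist schön", "positiv"),
    ("Das Armaturenbrett ist kompliziert", "negativ"),
    ("Das Auto ist schön", "positiv"),  # duplicate: appears only once as a key
]
sentence_dict, class_dict = sketch_create_dictionaries(X_pairs, toy_translate)
only_sentences, no_classes = sketch_create_dictionaries(
    ["Das Auto ist schön"], toy_translate
)
```

Because every distinct text becomes a single key, repeated sentences or class names only need to be translated once.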

Example:

>>> # Assume we have Xtrain, Xvalid and Xtest, each a list of sentence-class pairs (List[Tuple[str, str]]),
>>> # which we would like to translate from German to English
>>> from autonlu import Translator, translate_dataset
>>> slow_but_higher_quality = True  # hypothetical flag; set it as needed
>>> if slow_but_higher_quality:  # slower but higher quality; we'll use 4 beams
...     translator = Translator("de", target_language="en", num_beams=4, verbose=True)
... else:  # fast but possibly lower quality; we'll use only one beam
...     translator = Translator("de", target_language="en", num_beams=1, verbose=True)
>>> sentence_dict, class_dict = translator.create_dictionaries(Xtrain + Xvalid + Xtest)
>>> del translator
>>> # Optionally, use "save_csv_dictionary" and/or "load_csv_dictionary" for sentence_dict and/or class_dict
>>> # Get datasets translated with the help of the dictionaries
>>> Xtrain = translate_dataset(Xtrain, sentence_dict, class_dict)
>>> Xvalid = translate_dataset(Xvalid, sentence_dict, class_dict)
>>> Xtest = translate_dataset(Xtest, sentence_dict, class_dict)

Helper Functions¶

autonlu.load_csv_dictionary(filename)¶

Loads a dictionary from a csv-file. The csv-file must contain the two columns `original` and `translation`.

Parameters: filename (str) – path plus name of the csv-file from which the dictionary is loaded.
Return type: Dict
Returns: A dictionary, containing the original sentences as keys and the translation as values

autonlu.save_csv_dictionary(filename, dictionary)¶

Saves a language dictionary as a csv-file with the two columns `original` and `translation`.

Parameters

filename (str) – path plus name of the csv-file into which the dictionary is saved
dictionary (Dict) – A dictionary containing the original sentences as keys and the translation as values
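
The two-column layout used by load_csv_dictionary and save_csv_dictionary can be sketched with Python's standard csv module. This is only an approximation under stated assumptions: the header row, comma delimiter, default quoting and UTF-8 encoding are guesses, so files written by this sketch are not guaranteed to be interchangeable with AutoNLU's own. The sketch_ function names are hypothetical.

```python
import csv
import os
import tempfile
from typing import Dict


def sketch_save_csv_dictionary(filename: str, dictionary: Dict[str, str]) -> None:
    """Write the dictionary with the columns `original` and `translation`."""
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["original", "translation"])
        writer.writeheader()
        for original, translation in dictionary.items():
            writer.writerow({"original": original, "translation": translation})


def sketch_load_csv_dictionary(filename: str) -> Dict[str, str]:
    """Read the two columns back into a dictionary."""
    with open(filename, newline="", encoding="utf-8") as f:
        return {row["original"]: row["translation"] for row in csv.DictReader(f)}


sentence_dict = {
    "Das Auto ist schön": "The car is nice",
    # Commas and quotes inside a sentence are handled by the csv module's quoting.
    'Er sagte "ja", dann "nein"': 'He said "yes", then "no"',
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sentence_dict.csv")
    sketch_save_csv_dictionary(path, sentence_dict)
    restored = sketch_load_csv_dictionary(path)
```

Saving the dictionaries this way means an expensive translation run does not have to be repeated when the same data is processed again.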

autonlu.translate_dataset(X, sentence_dict, class_dict=None)¶

Translates a dataset with the help of a mandatory sentence dictionary and a class dictionary (the latter is only needed when the dataset is a list of sentence-class pairs).

Parameters

X (List[Union[str, Tuple[str, str]]]) – Dataset. Either a list of strings for text classification or a list of pairs of strings (interpreted as sentence and class) for text pair classification.
sentence_dict (Dict) – A dictionary containing the original sentences as keys and the translations as values (each key appears only once, whereas the same sentence might appear several times in X)
class_dict (Optional[Dict]) – Similar to sentence_dict. This dictionary is used for the second tuple element (index [1]) in case X is a list of tuples

Returns

A translated dataset of the same structure as the input dataset
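
Conceptually, translating a dataset with these dictionaries is a lookup per element. The sketch below shows that idea in plain Python. It is not the library's code: sketch_translate_dataset is a hypothetical name, and raising a ValueError when class_dict is missing for a list of pairs is an assumption about sensible behavior.

```python
from typing import Dict, List, Optional, Tuple, Union

Dataset = List[Union[str, Tuple[str, str]]]


def sketch_translate_dataset(
    X: Dataset,
    sentence_dict: Dict[str, str],
    class_dict: Optional[Dict[str, str]] = None,
) -> Dataset:
    """Hypothetical illustration: replace every text by its dictionary entry."""
    translated: Dataset = []
    for item in X:
        if isinstance(item, str):
            translated.append(sentence_dict[item])
        else:
            if class_dict is None:
                raise ValueError("class_dict is required for a list of pairs")
            sentence, label = item
            translated.append((sentence_dict[sentence], class_dict[label]))
    return translated


sentence_dict = {
    "Das Auto ist schön": "The car is nice",
    "Das Armaturenbrett ist kompliziert": "The dashboard is complicated",
}
class_dict = {"positiv": "positive", "negativ": "negative"}

X_pairs = [
    ("Das Auto ist schön", "positiv"),
    ("Das Armaturenbrett ist kompliziert", "negativ"),
    ("Das Auto ist schön", "positiv"),
]
X_pairs_en = sketch_translate_dataset(X_pairs, sentence_dict, class_dict)
X_plain_en = sketch_translate_dataset(["Das Auto ist schön"], sentence_dict)
```

The output keeps the order, length and element type of the input, which matches the "same structure" guarantee stated above.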