AutoNLU performs translations on-premise on your own CPU/GPU, not on a cloud service!
Note that the translation capability of AutoNLU is mainly intended for translating data for training and prediction. Overall, the translations do not reach the quality of services such as Google Translate or DeepL.
- class autonlu.Translator(source_language, target_language='en', batchsize=128, num_beams=1, device=None, verbose=False)¶
A class to translate datasets from one language to another
source_language (str) – International code for the source language (e.g. "de"…). For the international codes see e.g. here: https://developers.google.com/admin-sdk/directory/v1/languages
target_language (str) – International code for the target language into which the source language is translated. Default is "en".
batchsize (int) – Determines the batch size during translation. Will be automatically reduced if too big. Default is 128.
device (Optional[str]) – Which device the model should be used on. If None, a device will be automatically selected.
num_beams (int) – Number of beams in beam search. Setting this value to its minimum of 1 increases speed considerably at the price of (possibly) lower translation quality. Default is 1; for higher quality translations a value of 4 can be used.
verbose (bool) – If True, progress information is displayed.
translate(X)¶
Translates a list of strings.
X (List[str]) – List of strings that should be translated
Returns a list of strings with the translated sentences.
>>> from autonlu import Translator
>>> translator = Translator("de")
>>> X = ["Das Auto ist schön", "Das Armaturenbrett ist kompliziert"]
>>> X_trans = translator.translate(X)
>>> assert X_trans == ["The car is nice", "The dashboard is complicated"]
create_dictionaries(X)¶
Returns translation dictionaries for the given data.
X (Union[List[str], List[Tuple[str, str]]]) – Dataset. Either a list of strings for text classification or a list of pairs of strings (interpreted as segment and class) for text pair classification.
Returns a tuple (sentence_dict, class_dict). If X is a list of strings, sentence_dict is a dictionary with X[i] as keys and the translations as values. If X is a list of tuples (pairs), the keys of sentence_dict are X[i][0], while the keys of class_dict are X[i][1]. The values are the translated texts in both cases.
>>> # Assume we have Xtrain, Xvalid and Xtest, each a list of sentence-class pairs (List[Tuple[str, str]]),
>>> # which we would like to translate from German to English
>>> from autonlu import Translator, translate_dataset
>>> if slow_but_higher_quality:  # fictive boolean; we'll use 4 beams
...     translator = Translator("de", target_language="en", num_beams=4, verbose=True)
... else:  # fast but possibly lower quality; we'll use only one beam
...     translator = Translator("de", target_language="en", num_beams=1, verbose=True)
>>> sentence_dict, class_dict = translator.create_dictionaries(Xtrain + Xvalid + Xtest)
>>> del translator
>>> # Optionally, use "save_csv_dictionary" and/or "load_csv_dictionary" for sentence_dict and/or class_dict
>>> # Get datasets translated with the help of the dictionaries
>>> Xtrain = translate_dataset(Xtrain, sentence_dict, class_dict)
>>> Xvalid = translate_dataset(Xvalid, sentence_dict, class_dict)
>>> Xtest = translate_dataset(Xtest, sentence_dict, class_dict)
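To make the structure of the two returned dictionaries concrete, the following is a minimal sketch in plain Python that does not use AutoNLU. The function `build_dictionaries` and the toy `fake_translate` lookup are hypothetical stand-ins for `create_dictionaries` and the translation model. Returning an empty class dictionary for a list of plain strings is an assumption, not documented behaviour.

```python
from typing import Callable, Dict, List, Tuple, Union

Dataset = Union[List[str], List[Tuple[str, str]]]


def build_dictionaries(
    X: Dataset, translate: Callable[[List[str]], List[str]]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Hypothetical stand-in: map every unique text to its translation."""
    if X and isinstance(X[0], tuple):
        sentences = [pair[0] for pair in X]
        classes = [pair[1] for pair in X]
    else:
        sentences, classes = list(X), []
    # dict.fromkeys drops duplicates while keeping first-seen order,
    # so each distinct text is translated only once.
    unique_sentences = list(dict.fromkeys(sentences))
    unique_classes = list(dict.fromkeys(classes))
    sentence_dict = dict(zip(unique_sentences, translate(unique_sentences)))
    # Assumption: an empty class dictionary when X has no class column.
    class_dict = (
        dict(zip(unique_classes, translate(unique_classes))) if unique_classes else {}
    )
    return sentence_dict, class_dict


def fake_translate(texts: List[str]) -> List[str]:
    """Toy lookup table used in place of a real translation model."""
    lookup = {"Das Auto": "The car", "Preis": "price", "gut": "good", "schlecht": "bad"}
    return [lookup.get(text, text) for text in texts]


X = [("Das Auto", "gut"), ("Preis", "schlecht"), ("Das Auto", "gut")]
sentence_dict, class_dict = build_dictionaries(X, fake_translate)
```

Because the dictionaries hold each distinct text only once, repeated sentences across training, validation and test sets need to be translated only a single time.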
- autonlu.load_csv_dictionary(filename)¶
Loads a dictionary from a csv-file. The csv-file must consist of two columns.
filename (str) – path plus name of the csv-file from which the dictionary is loaded.
- Return type
A dictionary, containing the original sentences as keys and the translation as values
- autonlu.save_csv_dictionary(filename, dictionary)¶
Saves a language dictionary as a csv-file with two columns.
filename (str) – path plus name of the csv-file into which the dictionary is saved
dictionary (Dict) – A dictionary containing the original sentences as keys and the translation as values
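As an illustration of a two-column csv round trip for such a dictionary, here is a sketch using only Python's standard `csv` module. The helper names `save_dict_as_csv` and `load_dict_from_csv` and the column headers `original` and `translation` are assumptions made for this example. The actual file layout written by `save_csv_dictionary` may differ.

```python
import csv
import os
import tempfile
from typing import Dict

# Hypothetical column headers; the real AutoNLU column names are not assumed here.
COLUMNS = ("original", "translation")


def save_dict_as_csv(filename: str, dictionary: Dict[str, str]) -> None:
    """Write one row per (original, translation) pair below a header row."""
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerows(dictionary.items())


def load_dict_from_csv(filename: str) -> Dict[str, str]:
    """Read the two columns back into a dictionary, skipping the header row."""
    with open(filename, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row if present
        return {row[0]: row[1] for row in reader}


sentence_dict = {
    "Das Auto ist schön": "The car is nice",
    "Hallo, Welt": "Hello, world",  # the comma is quoted automatically by csv
}
path = os.path.join(tempfile.mkdtemp(), "sentence_dict.csv")
save_dict_as_csv(path, sentence_dict)
restored = load_dict_from_csv(path)
```

Storing the dictionaries on disk this way means an expensive translation run does not have to be repeated when the same data is used again.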
- autonlu.translate_dataset(X, sentence_dict, class_dict=None)¶
Translates a dataset with the help of a mandatory sentence dictionary and a class dictionary (the latter is only needed when the dataset is a list of sentence-class pairs).
X (Union[List[str], List[Tuple[str, str]]]) – Dataset. Either a list of strings for text classification or a list of pairs of strings (interpreted as sentence and class) for text pair classification.
sentence_dict (Dict) – A dictionary containing the original sentences as keys and the translations as values (each key appears only once, while in the variable "X" the same sentence might appear several times)
class_dict (Optional[Dict]) – Similar to sentence_dict. This dictionary is used for the second tuple element (index 1) in case X is a list of tuples
Returns a translated dataset of the same structure as the input dataset.
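The lookup described above can be sketched in a few lines of plain Python. The function `apply_dictionaries` is a hypothetical illustration, not AutoNLU's implementation. In particular, raising an error for a missing `class_dict` or an unknown key is an assumption of this sketch.

```python
from typing import Dict, List, Optional, Tuple, Union

Dataset = Union[List[str], List[Tuple[str, str]]]


def apply_dictionaries(
    X: Dataset,
    sentence_dict: Dict[str, str],
    class_dict: Optional[Dict[str, str]] = None,
) -> Dataset:
    """Hypothetical sketch: replace each text by its dictionary entry."""
    translated = []
    for item in X:
        if isinstance(item, tuple):
            if class_dict is None:
                # Assumption: pairs cannot be translated without a class dictionary.
                raise ValueError("class_dict is required when X contains pairs")
            # Index 0 is looked up in sentence_dict, index 1 in class_dict.
            translated.append((sentence_dict[item[0]], class_dict[item[1]]))
        else:
            translated.append(sentence_dict[item])
    return translated


sentence_dict = {"Das Auto ist schön": "The car is nice"}
class_dict = {"positiv": "positive"}

pairs = [("Das Auto ist schön", "positiv"), ("Das Auto ist schön", "positiv")]
translated_pairs = apply_dictionaries(pairs, sentence_dict, class_dict)
translated_texts = apply_dictionaries(["Das Auto ist schön"], sentence_dict)
```

The output keeps the length and the element structure of the input, so repeated sentences in X are simply looked up again rather than translated again.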