Translation
AutoNLU performs translations on premises, on your own CPU or GPU, not via a cloud service!
Note that AutoNLU's translation capability is mainly intended for translating data for training and prediction. Overall, the translations do not reach the quality of services such as Google Translate or DeepL.
Translator
- class autonlu.Translator(source_language, target_language='en', batchsize=128, num_beams=1, device=None, verbose=False)
A class to translate datasets from one language to another.
- Parameters
source_language (str) – International code for the source language (e.g. "en", "fr", "de", …). For the international codes, see e.g. https://developers.google.com/admin-sdk/directory/v1/languages
target_language (str) – International code for the target language into which the source language is translated. Default is "en".
batchsize (int) – Batch size used during translation. It is reduced automatically if it is too big. Default is 128.
device (Optional[str]) – Device on which the model runs ("cpu" or "cuda"). If None, a device is selected automatically.
num_beams (int) – Number of beams in beam search. Setting this value to its minimum of 1 increases speed considerably at the price of (possibly) lower translation quality. Default is 1; for higher-quality translations, a value of 4 can be used.
verbose (bool) – If True, progress information is displayed.
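The description of `batchsize` says the value is reduced automatically if it is too big, but not how. One common strategy is to halve the batch size whenever a batch fails and retry from the same position. The sketch below illustrates that strategy under two assumptions: that a batch which is too large raises `MemoryError` (a GPU library would typically raise its own out-of-memory error instead), and that the model is wrapped in a per-batch function. `translate_in_batches` and `fake_translate_batch` are hypothetical names, not part of AutoNLU.

```python
from typing import Callable, List


def translate_in_batches(
    sentences: List[str],
    translate_batch: Callable[[List[str]], List[str]],  # hypothetical per-batch model call
    batchsize: int = 128,
) -> List[str]:
    """Translate sentences batch by batch, halving the batch size on MemoryError.

    Hypothetical sketch of one possible strategy; not AutoNLU's implementation.
    """
    if batchsize < 1:
        raise ValueError("batchsize must be at least 1")
    results: List[str] = []
    start = 0
    while start < len(sentences):
        batch = sentences[start:start + batchsize]
        try:
            results.extend(translate_batch(batch))
        except MemoryError:
            if batchsize == 1:
                raise  # even a single sentence does not fit; give up
            batchsize = max(1, batchsize // 2)  # retry the same position with a smaller batch
            continue
        start += len(batch)  # advance only after a successful batch
    return results


def fake_translate_batch(batch: List[str]) -> List[str]:
    """Stand-in model that 'runs out of memory' for batches larger than 3."""
    if len(batch) > 3:
        raise MemoryError("batch too large")
    return [s.upper() for s in batch]


print(translate_in_batches(["a", "b", "c", "d", "e"], fake_translate_batch, batchsize=8))
# ['A', 'B', 'C', 'D', 'E']
```

In this sketch the reduced batch size is kept for all remaining sentences rather than being increased again, which avoids repeated failures at the cost of some throughput.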
- translate(sentence_list)
Translates a list of strings.
- Parameters
sentence_list (List[str]) – List of strings to translate.
- Returns
A list of strings containing the translated sentences.
Example
>>> from autonlu import Translator
>>> translator = Translator("de")
>>> X = ["Das Auto ist schön", "Das Armaturenbrett ist kompliziert"]
>>> X_trans = translator.translate(X)
>>> assert X_trans == ["The car is nice", "The dashboard is complicated"]
- create_dictionaries(X)
Returns translation dictionaries for the given data X.
- Parameters
X (List[Union[str, Tuple[str, str]]]) – Dataset. Either a list of strings for text classification or a list of pairs of strings (interpreted as segment and class) for text pair classification.
- Returns
A tuple (sentence_dict, class_dict). If X is a list of strings, sentence_dict is a dictionary with X[i] as keys and the translations as values, while class_dict is None. If X is a list of tuples (pairs), the keys of sentence_dict are X[i][0], while the keys of class_dict are X[i][1]. In both cases, the values are the translated texts.
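Returning dictionaries instead of a translated list means that every distinct sentence and class label is translated only once, however often it occurs in `X`. The following minimal sketch shows that idea in plain Python. It follows the documented return value of `create_dictionaries`, but it is not AutoNLU's implementation: `build_dictionaries` is a hypothetical name, and `fake_translate` is a lookup table standing in for the translation model.

```python
from typing import Callable, Dict, List, Optional, Tuple, Union

Sample = Union[str, Tuple[str, str]]


def build_dictionaries(
    X: List[Sample],
    translate: Callable[[List[str]], List[str]],  # hypothetical stand-in for the model
) -> Tuple[Dict[str, str], Optional[Dict[str, str]]]:
    """Translate every distinct text once and return (sentence_dict, class_dict)."""
    if all(isinstance(x, str) for x in X):
        unique = list(dict.fromkeys(X))  # deduplicate while keeping first-seen order
        return dict(zip(unique, translate(unique))), None
    # (segment, class) pairs: deduplicate and translate each column separately
    sentences = list(dict.fromkeys(x[0] for x in X))
    classes = list(dict.fromkeys(x[1] for x in X))
    sentence_dict = dict(zip(sentences, translate(sentences)))
    class_dict = dict(zip(classes, translate(classes)))
    return sentence_dict, class_dict


# Hypothetical lookup table in place of a real translation model
GERMAN_TO_ENGLISH = {
    "Das Essen ist gut": "The food is good",
    "Der Service ist langsam": "The service is slow",
    "positiv": "positive",
    "negativ": "negative",
}


def fake_translate(texts: List[str]) -> List[str]:
    return [GERMAN_TO_ENGLISH[t] for t in texts]


X = [
    ("Das Essen ist gut", "positiv"),
    ("Der Service ist langsam", "negativ"),
    ("Das Essen ist gut", "positiv"),  # duplicate: translated only once
]
sentence_dict, class_dict = build_dictionaries(X, fake_translate)
print(len(sentence_dict), len(class_dict))  # 2 2
```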
Example:
>>> # Assume we have Xtrain, Xvalid and Xtest, each a list of sentence-class pairs (List[Tuple[str, str]])
>>> # which we would like to translate from German to English
>>> from autonlu import Translator, translate_dataset
>>> if slow_but_higher_quality:  # fictive boolean; use 4 beams
...     translator = Translator("de", target_language="en", num_beams=4, verbose=True)
... else:  # fast but possibly lower quality; use only one beam
...     translator = Translator("de", target_language="en", num_beams=1, verbose=True)
>>> sentence_dict, class_dict = translator.create_dictionaries(Xtrain + Xvalid + Xtest)
>>> del translator  # the translation model is no longer needed
>>> # Optionally, use "save_csv_dictionary" and/or "load_csv_dictionary" for sentence_dict and/or class_dict
>>> # Get the datasets translated with the help of the dictionaries
>>> Xtrain = translate_dataset(Xtrain, sentence_dict, class_dict)
>>> Xvalid = translate_dataset(Xvalid, sentence_dict, class_dict)
>>> Xtest = translate_dataset(Xtest, sentence_dict, class_dict)
Helper Functions
- autonlu.load_csv_dictionary(filename)
Loads a dictionary from a CSV file. The CSV data must consist of the two columns `original` and `translation`.
- Parameters
filename (str) – Path and name of the CSV file from which the dictionary is loaded.
- Return type
Dict
- Returns
A dictionary containing the original sentences as keys and the translations as values.
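To illustrate the expected file layout, the sketch below reads such a CSV file with Python's standard `csv` module. It is a hypothetical equivalent of `load_csv_dictionary`, not its implementation, and it assumes a comma-separated file with a header row naming the two columns; the documentation only specifies the column names. `read_translation_csv` is a hypothetical name, and `io.StringIO` stands in for an opened file.

```python
import csv
import io
from typing import Dict, TextIO


def read_translation_csv(handle: TextIO) -> Dict[str, str]:
    """Read a CSV with the columns 'original' and 'translation' into a dict (hypothetical helper)."""
    reader = csv.DictReader(handle)  # uses the header row as field names
    return {row["original"]: row["translation"] for row in reader}


# Assumed file content; in practice: open(filename, newline="", encoding="utf-8")
CSV_TEXT = (
    "original,translation\n"
    "Das Auto ist schön,The car is nice\n"
    '"Schnell, aber teuer","Fast, but expensive"\n'  # fields with commas must be quoted
)
dictionary = read_translation_csv(io.StringIO(CSV_TEXT))
print(dictionary["Schnell, aber teuer"])  # Fast, but expensive
```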
- autonlu.save_csv_dictionary(filename, dictionary)
Saves a language dictionary as a CSV file with the two columns `original` and `translation`.
- Parameters
filename (str) – Path and name of the CSV file into which the dictionary is saved.
dictionary (Dict) – A dictionary containing the original sentences as keys and the translations as values.
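A file with the two columns `original` and `translation` can be produced with Python's standard `csv` module. The sketch below assumes a comma delimiter and a header row; `write_translation_csv` is a hypothetical helper, not AutoNLU's `save_csv_dictionary`. The round trip at the end checks that values containing commas survive, because `csv.writer` quotes them automatically.

```python
import csv
import os
import tempfile
from typing import Dict


def write_translation_csv(filename: str, dictionary: Dict[str, str]) -> None:
    """Write a dict as a CSV with the columns 'original' and 'translation' (hypothetical helper)."""
    with open(filename, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["original", "translation"])  # header row (assumed layout)
        writer.writerows(dictionary.items())  # one (original, translation) row per entry


sentence_dict = {"Das Auto ist schön": "The car is nice", "Schnell, aber teuer": "Fast, but expensive"}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sentence_dict.csv")
    write_translation_csv(path, sentence_dict)
    # Read the file back to verify the round trip
    with open(path, newline="", encoding="utf-8") as handle:
        restored = {row["original"]: row["translation"] for row in csv.DictReader(handle)}
print(restored == sentence_dict)  # True
```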
- autonlu.translate_dataset(X, sentence_dict, class_dict=None)
Translates a dataset with the help of a mandatory sentence dictionary and an optional class dictionary (the latter is only needed when the dataset is a list of sentence-class pairs).
- Parameters
X (List[Union[str, Tuple[str, str]]]) – Dataset. Either a list of strings for text classification or a list of pairs of strings (interpreted as sentence and class) for text pair classification.
sentence_dict (Dict) – A `dict` containing the original sentences as keys and the translations as values (each key appears only once, while in X the same sentence might appear several times).
class_dict (Optional[Dict]) – Similar to sentence_dict. This dictionary is used for the second tuple element (index [1]) in case X is a list of tuples.
- Returns
A translated dataset with the same structure as the input dataset.
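The role of the two dictionaries can be illustrated with a short plain-Python sketch that maps each entry of a dataset through them while keeping its structure. `apply_dictionaries` is a hypothetical stand-in for `translate_dataset`. In particular, raising `ValueError` when pairs are given without `class_dict`, and raising `KeyError` for a text missing from a dictionary, are choices of this sketch, not documented AutoNLU behavior.

```python
from typing import Dict, List, Optional, Tuple, Union

Sample = Union[str, Tuple[str, str]]


def apply_dictionaries(
    X: List[Sample],
    sentence_dict: Dict[str, str],
    class_dict: Optional[Dict[str, str]] = None,
) -> List[Sample]:
    """Map every entry of X through the dictionaries, keeping X's structure (hypothetical helper)."""
    translated: List[Sample] = []
    for x in X:
        if isinstance(x, str):
            translated.append(sentence_dict[x])  # plain sentence
        else:
            if class_dict is None:
                raise ValueError("class_dict is required when X contains (sentence, class) pairs")
            translated.append((sentence_dict[x[0]], class_dict[x[1]]))  # (sentence, class) pair
    return translated


sentence_dict = {"Das Essen ist gut": "The food is good", "Der Service ist langsam": "The service is slow"}
class_dict = {"positiv": "positive", "negativ": "negative"}
X = [
    ("Das Essen ist gut", "positiv"),
    ("Der Service ist langsam", "negativ"),
    ("Das Essen ist gut", "positiv"),  # repeated sentence reuses the same dictionary entry
]
print(apply_dictionaries(X, sentence_dict, class_dict))
# [('The food is good', 'positive'), ('The service is slow', 'negative'), ('The food is good', 'positive')]
```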