SampleHash

class autonlu.SampleHash(hashfile=None)

Produces reproducible hashes of strings and string iterators, saves those hashes, and checks whether a given string or string iterator has been seen before.

In AutoNLU, this is used during active learning to automatically exclude samples that have already been seen, with no action required from the user.

Parameters

hashfile (Optional[str]) – Filename where the hash values are stored. As the example shows, hashes previously saved to this file are available after construction.

Examples:

>>> from autonlu import SampleHash
>>> sh = SampleHash()
>>> sh.add("This is a sample")
>>> sh.save("samples.hash")
>>> sh2 = SampleHash("samples.hash")
>>> assert sh2.was_added("This is a sample") == True
>>> assert sh2.was_added("This is another sample") == False

add(sample)

Adds a sample to the hashed values

Parameters

sample – Sample to add. Can be an iterator or anything that can be converted to a string. When an iterator is given, all elements are converted to strings and concatenated. Multiple occurrences of the same sample are saved only once.
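The conversion rule above can be sketched as follows. This is a minimal illustration, not AutoNLU's implementation: the helper name sample_key and the choice of SHA-256 are assumptions, since this page does not name the hash function. hashlib is used instead of the built-in hash() because hash() on strings is salted per process by default and is therefore not reproducible across runs.

```python
import hashlib
from collections.abc import Iterable


def sample_key(sample):
    """Hypothetical helper: map a sample to a reproducible hex digest."""
    if isinstance(sample, Iterable) and not isinstance(sample, (str, bytes)):
        # Iterables: convert every element to a string and concatenate.
        text = "".join(str(element) for element in sample)
    else:
        # Anything else: convert directly to a string.
        text = str(sample)
    # SHA-256 is an assumption; any stable digest would serve the same purpose.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


seen = set()
seen.add(sample_key("This is a sample"))
seen.add(sample_key("This is a sample"))        # duplicate: the set keeps one entry
seen.add(sample_key(["This is ", "a sample"]))  # same key after concatenation
```

In this sketch, concatenation uses no separator, so ["ab", "c"] and ["a", "bc"] map to the same key. Whether AutoNLU inserts a separator between elements is not stated on this page.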

was_added(sample)

Checks if a sample has been added previously

Parameters

sample – Sample to check. Can be an iterator or anything that can be converted to a string. When an iterator is given, all elements are converted to strings and concatenated.

Returns

True if the sample has been added before, False otherwise
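The active-learning use described at the top of this page amounts to a filter over candidate samples. The following is a minimal sketch that relies only on the add and was_added methods documented here. The names filter_unseen and SeenSet are hypothetical: SeenSet is a set-backed stand-in so the example runs without AutoNLU installed, and it stores plain strings rather than hashes.

```python
class SeenSet:
    """Hypothetical stand-in exposing the same add/was_added interface."""

    def __init__(self):
        self._keys = set()

    def add(self, sample):
        self._keys.add(str(sample))

    def was_added(self, sample):
        return str(sample) in self._keys


def filter_unseen(candidates, seen):
    """Return the candidates not seen before and record them as seen."""
    fresh = []
    for sample in candidates:
        if not seen.was_added(sample):
            seen.add(sample)
            fresh.append(sample)
    return fresh


seen = SeenSet()
seen.add("This is a sample")
batch = ["This is a sample", "This is another sample", "This is another sample"]
fresh = filter_unseen(batch, seen)
```

Because each fresh sample is recorded as soon as it is accepted, duplicates within the same batch are dropped as well, and running the filter a second time on the same batch yields nothing new.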

save(hashfile)

Saves the collected hash values in a pickle file

Parameters

hashfile – Filename where the hash values should be stored
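A save and reload round trip through a pickle file might look like the sketch below. The helper names save_hashes and load_hashes, the use of a set as the stored object, and the placeholder hash strings are all assumptions; this page only states that the values are written to a pickle file, not how they are laid out.

```python
import os
import pickle
import tempfile


def save_hashes(hashes, hashfile):
    """Hypothetical helper: write a set of hash values to a pickle file."""
    with open(hashfile, "wb") as f:
        pickle.dump(set(hashes), f)


def load_hashes(hashfile):
    """Hypothetical helper: read the set back, or return an empty set if the file is missing."""
    if not os.path.exists(hashfile):
        return set()
    with open(hashfile, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "samples.hash")
    save_hashes({"3a7bd3", "9f86d0"}, path)  # placeholder hash strings
    restored = load_hashes(path)
    missing = load_hashes(os.path.join(tmpdir, "does-not-exist.hash"))
```

Unpickling can execute arbitrary code, so a hash file should only be loaded if it comes from a trusted source.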