SampleHash
- class autonlu.SampleHash(hashfile=None)
Produces reproducible hashes of strings and string iterators, saves those hashes, and checks whether a given string or string iterator has been seen before.
In AutoNLU, this is used during active learning to automatically exclude samples that have already been seen, without the user having to do anything.
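- Parameters
hashfile (Optional[str]) – Filename where the hash values are stored. As the example below shows, passing a file written by save() makes the samples saved in it known to was_added().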
Examples:
>>> from autonlu import SampleHash
>>> sh = SampleHash()
>>> sh.add("This is a sample")
>>> sh.save("samples.hash")
>>> sh2 = SampleHash("samples.hash")
>>> assert sh2.was_added("This is a sample")
>>> assert not sh2.was_added("This is another sample")
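The class description stresses that the hashes are reproducible: the hash computed for a string in one Python process must match the hash computed for the same string in a later process, otherwise a saved hash file could not be reused. Python's built-in hash() does not guarantee this for str, because string hashing is randomized per process unless PYTHONHASHSEED is set. The following is a minimal sketch of a stable alternative using the standard hashlib module; the helper name stable_hash and the choice of SHA-256 over UTF-8 bytes are assumptions for illustration, not a description of AutoNLU's internals.

```python
import hashlib


def stable_hash(text: str) -> str:
    """Return a hex digest that is identical in every Python process.

    Hypothetical helper: SHA-256 over the UTF-8 bytes is an assumed choice,
    not necessarily the algorithm AutoNLU uses.
    """
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Unlike hash("This is a sample"), this value is the same on every run,
# so it can be written to disk and compared later.
print(stable_hash("This is a sample"))
```

Any deterministic digest would serve the same purpose; a cryptographic one mainly makes accidental collisions between different samples negligible.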
- add(sample)
Adds a sample to the set of hashed values.
- Parameters
sample – Sample to add. Can be an iterator or any object that can be converted to a string. When an iterator is given, each element is converted to a string and the results are concatenated. Adding the same sample more than once stores it only once.
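The parameter description says that an iterator is turned into a single string by converting each element to a string and concatenating the results. A minimal sketch of that normalization step, assuming plain concatenation without a separator (the helper name sample_to_text is hypothetical and not part of AutoNLU):

```python
from collections.abc import Iterable


def sample_to_text(sample) -> str:
    """Normalize a sample to one string (hypothetical helper).

    Strings are used as-is; other iterables have each element converted
    with str() and joined with no separator (an assumption); anything
    else is converted with str().
    """
    if isinstance(sample, str):
        return sample
    if isinstance(sample, Iterable):
        return "".join(str(element) for element in sample)
    return str(sample)


print(sample_to_text(["This is ", "a sample"]))  # This is a sample
print(sample_to_text(42))  # 42
```

Under this assumption, ["ab", "c"] and ["a", "bc"] normalize to the same text and would be treated as the same sample; inserting a separator between elements would keep them apart.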
- was_added(sample)
Checks whether a sample has been added previously.
- Parameters
sample – Sample to check. Can be an iterator or any object that can be converted to a string. When an iterator is given, each element is converted to a string and the results are concatenated.
- Returns
True if the sample has been added before, otherwise False.
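The class description mentions that AutoNLU uses this check during active learning to skip samples that were already seen. The sketch below illustrates that pattern with a small in-memory stand-in exposing add() and was_added(); InMemorySampleHash and select_unseen are hypothetical names, the stand-in only handles plain strings, and none of this is AutoNLU's actual active-learning code.

```python
import hashlib


class InMemorySampleHash:
    """Hypothetical stand-in with add()/was_added(); strings only, for brevity."""

    def __init__(self):
        self._hashes = set()

    @staticmethod
    def _digest(sample) -> str:
        return hashlib.sha256(str(sample).encode("utf-8")).hexdigest()

    def add(self, sample):
        self._hashes.add(self._digest(sample))

    def was_added(self, sample) -> bool:
        return self._digest(sample) in self._hashes


def select_unseen(candidates, hasher, limit):
    """Return up to `limit` candidates not seen before, recording each one."""
    selected = []
    for sample in candidates:
        if len(selected) >= limit:
            break
        if not hasher.was_added(sample):
            hasher.add(sample)  # also skips repeats within `candidates`
            selected.append(sample)
    return selected


seen = InMemorySampleHash()
seen.add("This is a sample")
print(select_unseen(["This is a sample", "New text", "New text", "Other"], seen, limit=2))
# ['New text', 'Other']
```

Recording each selected sample immediately means a later selection round, or a duplicate later in the same batch, will not pick it again.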
- save(hashfile)
Saves the collected hash values to a pickle file.
- Parameters
hashfile – Filename where the hash values should be stored
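save() writes the hash values to a pickle file, and the class example shows that a new SampleHash created with that filename recognizes the saved samples. A round trip of this kind can be sketched with the standard library, assuming the hash values are held as a Python set of hex digests; the on-disk layout of AutoNLU's own files is not documented here, and save_hashes and load_hashes are hypothetical names.

```python
import hashlib
import os
import pickle
import tempfile


def save_hashes(hashes: set, hashfile: str) -> None:
    """Write a set of digests to `hashfile` with pickle (hypothetical helper)."""
    with open(hashfile, "wb") as f:
        pickle.dump(hashes, f)


def load_hashes(hashfile: str) -> set:
    """Read back a set written by save_hashes().

    Only load trusted files: unpickling can execute arbitrary code.
    """
    with open(hashfile, "rb") as f:
        return pickle.load(f)


digest = hashlib.sha256("This is a sample".encode("utf-8")).hexdigest()
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "samples.hash")
    save_hashes({digest}, path)
    restored = load_hashes(path)

print(digest in restored)  # True
```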