Include duplicate text in NER

Hi,
I think (could be wrong) that NER Manual automatically skips duplicate lines of text during annotation (as in song lyrics, where a line or chorus is repeated). If, for some reason, we wanted these to be annotated, is there a way to override this default behavior? Apologies if this is in the documentation; I haven't been able to find it.
Thank you!

Hi! Song lyrics are definitely an interesting edge case :sweat_smile: To answer your question: yes, you could easily achieve that by setting your own hashes. Prodigy uses a hashing system to decide whether two examples are identical, or whether they are different "questions" about the same input (for example, different label suggestions on the same text). You can read more about it here: https://prodi.gy/docs/api-loaders#hashing

Typically, those hashes are set automatically, but you can also pre-define them in the data and Prodigy will respect them. So in your use case, you would want the duplicate examples to have different input hashes (and as a result, different task hashes) so they're considered different examples.
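
To make that concrete, here's a rough sketch of how you could pre-compute the hashes before loading the data. It assumes your examples are plain lines of text and writes the hashes under the `_input_hash` and `_task_hash` keys. The helper names (`stable_hash`, `add_distinct_hashes`) are made up for this example, and mixing the line position into a CRC32 is just one convenient way to get distinct integers, not how Prodigy computes its own hashes.

```python
import json
import zlib


def stable_hash(value: str) -> int:
    """Deterministic integer hash (hypothetical helper, not Prodigy's own hashing)."""
    # Mask to 31 bits so the value also fits in a signed 32-bit integer.
    return zlib.crc32(value.encode("utf-8")) & 0x7FFFFFFF


def add_distinct_hashes(lines):
    """Give every line its own hashes, even when the text repeats."""
    examples = []
    for position, text in enumerate(lines):
        examples.append({
            "text": text,
            # Including the position is what makes repeated lines differ.
            "_input_hash": stable_hash(f"input:{position}:{text}"),
            "_task_hash": stable_hash(f"task:{position}:{text}"),
        })
    return examples


lyrics = ["La la la", "Hello world", "La la la"]
examples = add_distinct_hashes(lyrics)

# One JSON object per line, ready to save as a .jsonl file.
jsonl = "\n".join(json.dumps(eg) for eg in examples)
print(jsonl)
```

If you save that output as a `.jsonl` file and use it as your source, the repeated lines should be treated as separate examples, since their pre-defined hashes no longer match.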
