Same text appearing twice (with matches and without)


I'm using textcat.teach to annotate texts for a binary classification problem. I have defined the patterns to highlight via "--patterns ./patterns.jsonl". However, while annotating, I noticed that some texts appeared twice in the Prodigy interface: the first time with the defined patterns NOT highlighted, and the second time with them highlighted. This led me to annotate the same text twice.

Out of my 81 texts this happened 3 times. Texts with multiple matched patterns are all alright, but 3 texts with only one matched pattern have this issue.

Can you help me with this? Thank you in advance!

Kind regards,

Hi! Which version of Prodigy are you using? And if you look at the _input_hash and _task_hash generated for the two tasks that are identical (except for the match), are they the same or different?

It sounds like what might be happening is that for some reason, those tasks receive different hashes and Prodigy thinks they're different – when they should receive the same hashes, because their content should be treated as identical.
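One way to check this is sketched below. It assumes the annotations were first exported to a JSONL file with "prodigy db-out your_dataset > annotations.jsonl", where the dataset name and file name are placeholders, and that each exported task carries "text", "_input_hash" and "_task_hash" keys. The helper names are made up for illustration and are not part of Prodigy. The script uses only the standard library and lists every text that occurs more than once, together with the hashes of each copy:

```python
import json
import sys
from collections import defaultdict


def load_jsonl(path):
    """Read one JSON object per non-empty line of an exported file."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def find_repeated_texts(examples):
    """Group tasks by their text and keep only texts seen more than once.

    Returns a dict mapping each repeated text to a list of
    (_input_hash, _task_hash) pairs, one pair per occurrence.
    """
    by_text = defaultdict(list)
    for eg in examples:
        by_text[eg.get("text")].append((eg.get("_input_hash"), eg.get("_task_hash")))
    return {text: hashes for text, hashes in by_text.items() if len(hashes) > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (file name is a placeholder): python check_hashes.py annotations.jsonl
    for text, hashes in find_repeated_texts(load_jsonl(sys.argv[1])).items():
        print(repr(text[:60]))
        for input_hash, task_hash in hashes:
            print("  _input_hash:", input_hash, " _task_hash:", task_hash)
```

If the two copies of a text show different _input_hash values, the inputs themselves are being treated as different. If only the _task_hash differs, the text is seen as the same input but as two different questions about it.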