Seeds for text classification appearing multiple times

I’m trying to classify long newspaper articles according to whether or not they relate to an event. I’ve fed in a list of bootstrapped seed terms to point Prodigy in the right direction (using --patterns). However, the same article reappears multiple times, each time with a different seed term highlighted. What am I doing wrong here?

Thanks

Hi! This is currently expected behaviour because the pattern matcher just yields out every result – but you’re right that it’s not very practical and we probably want to change this and make “one match per example” the default behaviour.

In the meantime, check out this thread for more details and custom recipe scripts that let you modify the behaviour so that you only ever see an example once, even if it contains multiple matches:
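
To give a rough idea of the approach (this is only a sketch, not the script from that thread): a custom recipe can wrap the stream in a small filter that lets each text through once and skips later matches of the same text. The function name `one_task_per_text` and the assumption that every task is a dict with a `"text"` key are mine for illustration; depending on how your recipe builds the stream, you may need to adapt it.

```python
def one_task_per_text(stream, key="text"):
    """Yield a task only the first time its text appears in the stream.

    Hypothetical helper: assumes each task is a dict with a "text" field.
    """
    seen = set()
    for task in stream:
        # Store a hash of the text rather than the full article.
        text_id = hash(task[key])
        if text_id in seen:
            # Same article, different seed term matched: skip it.
            continue
        seen.add(text_id)
        yield task


# Toy stream: the first article matched two different seed terms.
tasks = [
    {"text": "Flood hits the city centre", "spans": [{"start": 0, "end": 5}]},
    {"text": "Flood hits the city centre", "spans": [{"start": 15, "end": 19}]},
    {"text": "Local team wins the cup", "spans": [{"start": 11, "end": 15}]},
]
unique_tasks = list(one_task_per_text(tasks))
```

Note that this keeps only the first match, so the other seed terms in the same article won't be highlighted. If you want all of them shown at once, you'd have to merge the spans instead of skipping the duplicates.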
