Hi! This is currently expected behaviour because the pattern matcher just yields out every result – but you’re right that it’s not very practical and we probably want to change this and make “one match per example” the default behaviour.
If you’re annotating for text classification, you’re giving feedback on the text plus label. The patterns are mostly a means to an end, so if you do see an example with the correct label, you should accept it.
That said, if you do get a lot of duplicate matches, you could also write a function that keeps track of the original example texts you’ve already seen and only yields out an example once:
def filter_stream(stream):
seen = set()
for eg in stream:
# Get the hash idenfitying the original input, e.g. the text
input_hash = eg["_input_hash"]
if input_hash not in seen:
yield eg
seen.add(input_hash)
stream = filter_stream(stream)