Matching and teaching an NER keeps telling me "No tasks available"

Hi!

I’ve got ~10,000 sentences in sentences.txt and the same number of rules in rules.jsonl. Each rule will match in at least one sentence, so there will be plenty of matches. I just want to mark each match as good or bad. But when I try to start a new NER with ner.match, it will give me a few matches to annotate and then say “No tasks available”. If I save, refresh, and mark all the same examples again, I may get a couple more sentences to mark before “No tasks available” appears. Something similar happens with ner.teach.

How do I stop this happening, please?

ner.teach will score both model suggestions and matches and focus on only showing you the most relevant examples for annotation. So it's expected that you won't see every single example or every single match and that some will be skipped in favour of others.

ner.match should just show you the examples and matches as they come in, though. However, exact examples that are already in your current dataset will be skipped. Did you double-check that all of your matches definitely match (e.g. by calling into spaCy's Matcher or PhraseMatcher directly)?
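For reference, a direct check against spaCy's Matcher could look roughly like the sketch below. It assumes spaCy v3 (where `Matcher.add` takes a list of patterns) and a blank English pipeline, which only tokenizes but is enough for `lower` patterns. The two rules and sentence fragments are the ones quoted in this thread; the key names such as `JOB_0` are made up for illustration.

```python
import spacy
from spacy.matcher import Matcher

# Blank pipeline: tokenizer only, no trained model needed for "lower" patterns.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

rules = [
    {"label": "JOB", "pattern": [{"lower": "network"}, {"lower": "administrator"}]},
    {"label": "JOB", "pattern": [{"lower": "radio"}, {"lower": "announcer"}]},
]

# Give every rule its own key so we can tell which ones never fire.
# spaCy documents token attributes in upper case, so normalise the keys.
keys = []
for i, rule in enumerate(rules):
    key = f"{rule['label']}_{i}"  # illustrative key name, not a Prodigy convention
    pattern = [{attr.upper(): value for attr, value in token.items()}
               for token in rule["pattern"]]
    matcher.add(key, [pattern])
    keys.append(key)

sentences = [
    "the San Francisco network administrator who refused to hand over",
    "and later became a popular radio announcer for the team",
]

matched = set()
for text in sentences:
    doc = nlp(text)
    for match_id, start, end in matcher(doc):
        matched.add(nlp.vocab.strings[match_id])
        print(nlp.vocab.strings[match_id], "->", doc[start:end].text)

# Rules that never matched any sentence.
unmatched = [key for key in keys if key not in matched]
print("Unmatched rules:", unmatched)
```

If `unmatched` comes back empty on your real files, the patterns themselves are fine and the problem is somewhere else.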

Thanks for the reply.

This is an entirely new dataset. I didn't check the rules match, but they are mostly two-word phrases. Some taken at random:

{"label": "JOB", "pattern": [{"lower": "network"}, {"lower": "administrator"}]}
{"label": "JOB", "pattern": [{"lower": "radio"}, {"lower": "announcer"}]}

And there are sentences like this:

...the San Francisco network administrator who refused to hand over...
...and later became a popular radio announcer for the team...

So I'm assuming the problem isn't failure to match?

So I think I’ve fixed it (30+ clicks so far without the error, when it’s normally 1 or 2). Turns out there were a ton of dupes in rules.jsonl (cos more than one sentence might contain “network administrator”, for example). Once I deduped the file, it looks to work (though I don’t really understand why).
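In case it helps anyone else, the deduping can be sketched like this. It's only a rough version: it treats two rules as duplicates when their JSON is identical once the keys are sorted, and the example lines stand in for the contents of rules.jsonl. The output filename in the comments, rules_dedup.jsonl, is just a placeholder.

```python
import json


def dedupe_patterns(lines):
    """Return the unique rules from JSONL lines, keeping first-seen order."""
    seen = set()
    unique = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rule = json.loads(line)
        # Sorting keys makes the comparison independent of key order.
        key = json.dumps(rule, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(rule)
    return unique


# Stand-in for the lines of rules.jsonl, with one repeated rule.
raw_lines = [
    '{"label": "JOB", "pattern": [{"lower": "network"}, {"lower": "administrator"}]}',
    '{"label": "JOB", "pattern": [{"lower": "radio"}, {"lower": "announcer"}]}',
    '{"pattern": [{"lower": "network"}, {"lower": "administrator"}], "label": "JOB"}',
]

unique_rules = dedupe_patterns(raw_lines)
print(len(raw_lines), "rules in,", len(unique_rules), "rules out")

# With real files (placeholder output name):
# with open("rules.jsonl", encoding="utf8") as f:
#     unique_rules = dedupe_patterns(f)
# with open("rules_dedup.jsonl", "w", encoding="utf8") as f:
#     for rule in unique_rules:
#         f.write(json.dumps(rule) + "\n")
```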

Thanks for getting to the bottom of this. If that’s the case, this is pretty interesting indeed :thinking: I wonder if this is something in the underlying spaCy matchers or in Prodigy’s pattern matcher wrapper. But in any case, it gives us something to investigate!