Is it possible to automatically update patterns.jsonl, or Prodigy itself, so that any token I highlight is added as a pattern for that entity and then highlighted in the rest of the data? This would make NER labelling much faster. Maybe the patterns could be stored in the database rather than in a separate file?
I'm not sure whether this is already possible. If not, is it planned for a future update?
The key is to use the update callback to add newly annotated spans as patterns:
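def update(answers):
    # Dicts are not hashable, so collect patterns in a list and
    # de-duplicate on a (text, label) tuple instead of using a set of dicts
    patterns = []
    seen = set()
    for eg in answers:
        for span in eg.get("spans", []):
            # Get the text of each annotated span given its offsets
            span_text = eg["text"][span["start"]:span["end"]]
            key = (span_text, span["label"])
            if key not in seen:
                seen.add(key)
                patterns.append({"pattern": span_text, "label": span["label"]})
    # `matcher` is assumed to be the pattern matcher created earlier in the recipe
    matcher.add_patterns(patterns)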
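To also keep patterns.jsonl in sync, as asked in the question, the same callback could append each new pattern to the file. The following is only a rough sketch under some assumptions: `make_update` and `StubMatcher` are hypothetical names made up for illustration (the stub just stands in for whatever matcher your recipe creates, so the snippet runs without Prodigy), and skipping answers that are not "accept" is a design choice of the sketch, not something the original snippet does.

```python
import json
from pathlib import Path


class StubMatcher:
    """Hypothetical stand-in for the recipe's matcher; it only records patterns."""

    def __init__(self):
        self.patterns = []

    def add_patterns(self, patterns):
        self.patterns.extend(patterns)


def make_update(matcher, patterns_path):
    """Build an update callback that sends new spans to the matcher and the file."""
    seen = set()  # (text, label) pairs already added during this session

    def update(answers):
        new_patterns = []
        for eg in answers:
            # Assumption: only accepted examples should produce patterns
            if eg.get("answer", "accept") != "accept":
                continue
            for span in eg.get("spans", []):
                span_text = eg["text"][span["start"]:span["end"]]
                key = (span_text, span["label"])
                if key not in seen:
                    seen.add(key)
                    new_patterns.append({"pattern": span_text, "label": span["label"]})
        if new_patterns:
            matcher.add_patterns(new_patterns)
            # Append one JSON object per line so the file stays valid JSONL
            with Path(patterns_path).open("a", encoding="utf8") as f:
                for pattern in new_patterns:
                    f.write(json.dumps(pattern) + "\n")

    return update
```

In a custom recipe you would presumably call `make_update(matcher, "patterns.jsonl")` with your real matcher and return the result as the recipe's update callback. Patterns seen in earlier sessions are not tracked here, so the file can accumulate duplicates across restarts unless you load it into `seen` first.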
There are a few more recent posts that build off of this idea: