ner.match to jsonl without getting into the interface

ines · October 13, 2018, 10:36am

The main idea of ner.match is to give you an interface so that you or your annotators can accept and reject matches. This lets you bootstrap training sets with both positive and negative examples, create training data from patterns that produce false positives, and explore patterns interactively.

If you only want to create matches based on patterns, you could just use spaCy's Matcher directly and save the matches as JSONL. If you do want to annotate with Prodigy but with custom match logic (or any other rules), you could also write your own custom recipe that implements your logic and only yields the examples you want. Here's an example of how the stream could be generated:

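# assumes nlp (a loaded spaCy pipeline), matcher (a spaCy Matcher with
# your patterns added) and texts (an iterable of strings) already exist
def get_stream():
    for doc in nlp.pipe(texts):  # pipe your texts through spaCy
        matches = matcher(doc)
        for match_id, start, end in matches:
            span = doc[start:end]  # token indices -> Span
            # your custom logic here to decide if you want the match
            yield {
                'text': doc.text,
                'spans': [{
                    # Prodigy spans use character offsets, not token indices
                    'start': span.start_char,
                    'end': span.end_char,
                    # use the pattern name as the match label
                    'label': doc.vocab.strings[match_id]
                }]
            }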
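
If you'd rather skip the interface entirely and just dump the matches to a file, a minimal sketch could look like the following. It assumes spaCy v3 (where `matcher.add` takes a list of patterns; in v2 the call was `matcher.add("FRUIT", None, pattern)`). The `FRUIT` pattern, the example texts, the `matches.jsonl` filename and the helper names (`make_task`, `iter_match_tasks`, `write_jsonl`) are all made up for illustration, so swap in your own.

```python
import json


def make_task(text, start_char, end_char, label):
    """Build one Prodigy-style task dict with a single highlighted span."""
    return {
        "text": text,
        "spans": [{"start": start_char, "end": end_char, "label": label}],
    }


def iter_match_tasks(nlp, matcher, texts):
    """Yield one task per match, using the pattern name as the label."""
    for doc in nlp.pipe(texts):
        for match_id, start, end in matcher(doc):
            span = doc[start:end]
            label = doc.vocab.strings[match_id]
            yield make_task(doc.text, span.start_char, span.end_char, label)


def write_jsonl(tasks, path):
    """Write tasks to a file, one JSON object per line."""
    with open(path, "w", encoding="utf8") as f:
        for task in tasks:
            f.write(json.dumps(task) + "\n")


def main():
    # Imported here so the helpers above can be used without spaCy installed.
    import spacy
    from spacy.matcher import Matcher

    nlp = spacy.blank("en")  # tokenizer only, enough for token-text patterns
    matcher = Matcher(nlp.vocab)
    # Hypothetical pattern; spaCy v3 signature (list of patterns).
    matcher.add("FRUIT", [[{"LOWER": "apple"}]])
    texts = ["I ate an apple.", "No fruit here."]  # placeholder texts
    write_jsonl(iter_match_tasks(nlp, matcher, texts), "matches.jsonl")
```

Calling `main()` writes one line per match, so a text with no matches produces no output and a text with several matches produces several lines. If you'd prefer one line per text with all of its spans, you could collect the spans per doc before writing.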

Do you have an example of the patterns you use? Unless you have patterns for both spans, or use operators (via the "OP" key), you should only see the actual matches, not partial ones.
