Store the annotations obtained from ner.manual with --patterns all at once

I am using the ner.manual recipe with patterns to annotate a given text

prodigy ner.manual dataset spacy_model source --label --patterns

However, I don't want to go through all annotations and confirm them one by one. Instead, I want to store all entities/labels matched by the patterns file in the database at once. I will then use this initial annotation to build the model in an active learning scenario.

Hi! In that case, you could just load the patterns with spaCy directly to label all matches automatically and then use that data to pretrain your model. My comment here explains how to do this:

Using the EntityRuler has the advantage that it takes patterns in the same format as Prodigy and takes care of filtering out overlaps (which can theoretically occur with multiple patterns).
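For reference, here's a minimal sketch of that approach using the current spaCy v3 API (in spaCy v2, which the code further down in this thread uses, you'd construct `EntityRuler(nlp)` and pass the component object to `nlp.add_pipe()`). The inline pattern and text are just placeholders:

```python
import spacy

# Blank English pipeline with only an entity ruler
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
# Patterns use the same format as Prodigy's --patterns file
ruler.add_patterns([{"label": "ORG", "pattern": [{"lower": "acme"}]}])

doc = nlp("Acme released a new product.")
print([(ent.text, ent.label_) for ent in doc.ents])
```

Instead of `add_patterns`, you can also load a Prodigy-style JSONL patterns file from disk with `ruler.from_disk(...)`.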

@ines Thanks a lot!
It works now.

Here I share my experience:
My source data is in jsonl format and looks like:

{"text":"abcd","meta":{"source":"doc1"}}
...

I wrote a script (compatible with spaCy 2.x) based on your explanation to read a set of documents and annotate them based on the patterns file:

import json
from spacy.lang.en import English
from spacy.pipeline import EntityRuler

# path of the jsonl file that will contain the annotations to be loaded into the db
db_jsonl_path = 'db_jsonl.jsonl'
nlp = English()
ruler = EntityRuler(nlp)
# the patterns file
ruler.from_disk('patterns.jsonl')
nlp.add_pipe(ruler)

# source data in jsonl format
source_path = 'soure_data.jsonl'

with open(source_path, 'r') as source_file, open(db_jsonl_path, 'w') as out_file:
    for line in source_file:
        data = json.loads(line.strip())
        doc = nlp(data['text'])
        spans = [{"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
                 for ent in doc.ents]
        example = {"text": doc.text, "spans": spans}
        out_file.write(json.dumps(example) + '\n')

When done, load the performed annotation, stored in db_jsonl_path, into a Prodigy db:

prodigy db-in db_name path/db_jsonl.jsonl

I still have a simple question: how can I add the metadata ("meta": {"source": "doc1"}) to each example so it is stored in the db later along with the other information, like entities, positions, labels, etc.?

You can add all of that to the dict that you create as example in your code :slightly_smiling_face: The "text" and "spans" are what's required to annotate named entities, but you can also include a key "meta" with custom properties – for example, the index of the current line (you can just increment a counter variable or use Python's enumerate()).

Everything in "meta" will be displayed in the bottom right corner of the annotation card. You can also include any other custom properties in the example that will be saved with the annotations in the database (e.g. for meta info that you don't want to display in the UI).
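To illustrate, here's a minimal stdlib-only sketch of building such an example dict inside the loop from the script above. The input line and the `line` counter key are just assumptions for the example:

```python
import json

# One source line, as read from the jsonl file
lines = ['{"text": "abcd", "meta": {"source": "doc1"}}']

for i, line in enumerate(lines):
    data = json.loads(line)
    example = {
        "text": data["text"],
        "spans": [],  # spans produced by the EntityRuler would go here
        # carry over the source meta and add a line counter;
        # everything in "meta" shows up in the UI's bottom right corner
        "meta": dict(data.get("meta", {}), line=i),
    }
    print(json.dumps(example))
```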

Thanks a lot, I added the metadata and it is displayed in the bottom right corner of the annotation card:

import json
from spacy.lang.en import English
from spacy.pipeline import EntityRuler

# path of the jsonl file that will contain the annotations to be loaded into the db
db_jsonl_path = 'db_jsonl.jsonl'
nlp = English()
ruler = EntityRuler(nlp)
# the patterns file
ruler.from_disk('patterns.jsonl')
nlp.add_pipe(ruler)

# source data in jsonl format
source_path = 'soure_data.jsonl'

with open(source_path, 'r') as source_file, open(db_jsonl_path, 'w') as out_file:
    for line in source_file:
        data = json.loads(line.strip())
        meta = data['meta']
        doc = nlp(data['text'])
        spans = [{"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
                 for ent in doc.ents]
        example = {"text": doc.text, "spans": spans, "meta": meta}
        out_file.write(json.dumps(example) + '\n')