Store the annotation obtained by ner.manual and --patterns at once

fsa · June 25, 2021, 9:20am

@ines Thanks a lot! It works now.

Here is what I did, in case it helps others.
My source data is in JSONL format and looks like this:

{"text":"abcd","meta":{"source":"doc1"}}
.
.

I wrote some code (compatible with spaCy 2.5), based on your explanation, that reads a set of documents and annotates them using a patterns file:

import json

from spacy.lang.en import English
from spacy.pipeline import EntityRuler

# path of the JSONL file that will hold the annotations to be loaded into the db
db_jsonl_path = 'db_jsonl.jsonl'
nlp = English()
ruler = EntityRuler(nlp)
# the patterns file
ruler.from_disk('patterns.jsonl')
# spaCy v2-style API: the component object is added directly
nlp.add_pipe(ruler)

# source data in JSONL format
source_path = 'soure_data.jsonl'

# open the output file once, so each example is written on its own line
# instead of overwriting the previous one
with open(source_path, 'r') as source_file, open(db_jsonl_path, 'w') as f:
    for line in source_file:
        if not line.strip():
            continue  # skip blank lines
        data = json.loads(line)
        doc = nlp(data['text'])
        spans = [{"start": ent.start_char, "end": ent.end_char, "label": ent.label_} for ent in doc.ents]
        example = {"text": doc.text, "spans": spans}
        f.write(json.dumps(example) + '\n')

When done, load the annotations stored in db_jsonl_path into a Prodigy db:

prodigy db-in db_name path/db_jsonl.jsonl
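
Before running db-in, it may be worth sanity-checking the file. Below is a minimal sketch using only the standard library, assuming each line is a JSON object with a "text" string and an optional "spans" list of {"start", "end", "label"} dicts, as produced by the script above. The name validate_tasks is hypothetical and not part of Prodigy or spaCy.

```python
import json


def validate_tasks(lines):
    """Return a list of (line_number, problem) tuples; an empty list means no problems were found."""
    problems = []
    for i, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            task = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((i, "invalid JSON: " + err.msg))
            continue
        if not isinstance(task, dict) or not isinstance(task.get("text"), str):
            problems.append((i, 'missing or non-string "text"'))
            continue
        text = task["text"]
        for span in task.get("spans", []):
            start, end = span.get("start"), span.get("end")
            in_range = (
                isinstance(start, int)
                and isinstance(end, int)
                and 0 <= start < end <= len(text)
            )
            if not in_range:
                problems.append((i, "span offsets out of range: " + json.dumps(span)))
            elif "label" not in span:
                problems.append((i, "span without label: " + json.dumps(span)))
    return problems
```

It could be called as validate_tasks(open('db_jsonl.jsonl')) and the result printed; character offsets are checked against the "text" of the same line only.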

I still have one simple question: how can I add the metadata ("meta":{"source":"doc1"}) to the spans so that it is stored in the db later, along with the other information such as entities, positions, labels, etc.?
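
One idea, which I have not confirmed: as far as I understand the task format, an example can carry a top-level "meta" key, so the metadata could perhaps be copied onto each example next to "text" and "spans" rather than into the individual spans. Here is a minimal sketch of that assumption, using only the standard library. The build_example name and the EXAMPLE label are hypothetical and only there for illustration.

```python
import json


def build_example(data, spans):
    """Combine the source text and spans, carrying over "meta" if the source line has it."""
    example = {"text": data["text"], "spans": spans}
    if "meta" in data:
        # copy so the source record is not modified later by accident
        example["meta"] = dict(data["meta"])
    return example


# one line of the source file, as shown at the top of this post
data = json.loads('{"text":"abcd","meta":{"source":"doc1"}}')
# made-up span for illustration; in the real script this comes from doc.ents
spans = [{"start": 0, "end": 2, "label": "EXAMPLE"}]
print(json.dumps(build_example(data, spans)))
```

In the loop above, this would replace the line that builds example, with data being the parsed source line.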
