convert prodigy annotation file to iob format

codingnoobneedshelp · April 16, 2020, 10:08am

Hello,

I want to run BERT-NER with PyTorch, which requires IOB tags instead of Prodigy's span annotations. Is there any code for converting the Prodigy annotation format to IOB?

ines · April 16, 2020, 12:40pm

Yes, check out the section here for examples of how to create IOB and BILUO tags from character offsets: https://prodi.gy/docs/named-entity-recognition#tip-offsets-biluo

The easiest way is to use spaCy, which lets you add entities as character offsets and then gives you easy access to each token's ent_iob_ and ent_type_ attributes. Just make sure that the tokenization matches the one you used during annotation and use nlp.pipe for faster conversion. So the whole end-to-end script that reads your Prodigy dataset and outputs IOB could look like this:

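import spacy
from prodigy.components.db import connect

db = connect()
prodigy_annotations = db.get_dataset("your_ner_dataset")
examples = ((eg["text"], eg) for eg in prodigy_annotations)
nlp = spacy.blank("en")  # tokenization must match the one used during annotation
for doc, eg in nlp.pipe(examples, as_tuples=True):
    # char_span returns None if the offsets don't align with token boundaries
    spans = [doc.char_span(s["start"], s["end"], s["label"]) for s in eg.get("spans", [])]
    doc.ents = [span for span in spans if span is not None]
    # ent_iob_ can be "O" (not empty) outside entities, so check ent_type_ instead
    iob_tags = [f"{t.ent_iob_}-{t.ent_type_}" if t.ent_type_ else "O" for t in doc]
    print(doc.text, iob_tags)  # do something here...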
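
To see what the conversion does without a Prodigy database or spaCy installed, here is a minimal pure-Python sketch of the same idea. It is only an illustration: the regex tokenizer is a stand-in and will not match spaCy's tokenization, and the helper name char_spans_to_iob and the sample record are made up for this example. Tokens that only partly overlap a span are left as "O", similar to how misaligned spans are skipped above.

```python
import re


def char_spans_to_iob(text, spans):
    """Return (tokens, tags) for `text`, tagging tokens covered by `spans`.

    Tokenization is a plain regex (word runs and single punctuation marks),
    a stand-in for whatever tokenizer was used during annotation.
    """
    tokens, tags = [], []
    started = set()  # indexes of spans that already received their "B-" tag
    for match in re.finditer(r"\w+|[^\w\s]", text):
        tag = "O"
        for i, span in enumerate(spans):
            # A token belongs to a span only if it lies fully inside it.
            if span["start"] <= match.start() and match.end() <= span["end"]:
                prefix = "I" if i in started else "B"
                started.add(i)
                tag = f"{prefix}-{span['label']}"
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


# Hypothetical record in the shape of a Prodigy NER example.
example = {
    "text": "Apple hired Tim Cook in California.",
    "spans": [
        {"start": 0, "end": 5, "label": "ORG"},
        {"start": 12, "end": 20, "label": "PERSON"},
        {"start": 24, "end": 34, "label": "GPE"},
    ],
}

tokens, tags = char_spans_to_iob(example["text"], example["spans"])
# One token and tag per line, the layout many BERT-NER scripts read.
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Check which exact file layout your BERT-NER code expects (separator, blank line between sentences) before writing the output to disk.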

codingnoobneedshelp · April 16, 2020, 1:23pm

Thanks a lot.
