NER format to CoNLL

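Hi! Yes, I'd recommend writing your own converter. spaCy ships with a biluo_tags_from_offsets helper that takes a Doc and a list of (start, end, label) character offsets and returns one BILUO entity tag per token, so that should get you most of the way there. (In spaCy v3, the same helper lives at spacy.training.offsets_to_biluo_tags.)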
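To make the BILUO idea concrete, here is a simplified pure-Python sketch of what the offsets-to-tags conversion does. It is not spaCy's implementation: it assumes a naive whitespace tokenizer (the hypothetical tokenize function below), whereas spaCy uses the tokenizer of your nlp object. A single-token entity gets U-, a multi-token entity gets B-, I-... and L-, everything else is O. Like spaCy, it marks tokens with "-" when an entity's offsets don't fall on token boundaries.

```python
import re
from typing import List, Tuple


def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Hypothetical whitespace tokenizer: returns (token, start_char, end_char)."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def offsets_to_biluo(text: str, entities: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end, label) character offsets into one BILUO tag per token."""
    tokens = tokenize(text)
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        # Tokens that overlap the entity span at all
        covered = [i for i, (_, s, e) in enumerate(tokens) if s < end and e > start]
        # The span must begin and end exactly on token boundaries
        aligned = (covered
                   and tokens[covered[0]][1] == start
                   and tokens[covered[-1]][2] == end)
        if not aligned:
            for i in covered:
                tags[i] = "-"  # misaligned entity: can't be expressed with these tokens
            continue
        if len(covered) == 1:
            tags[covered[0]] = f"U-{label}"
        else:
            tags[covered[0]] = f"B-{label}"
            for i in covered[1:-1]:
                tags[i] = f"I-{label}"
            tags[covered[-1]] = f"L-{label}"
    return tags


text = "I like London and New York City"
print(offsets_to_biluo(text, [(7, 13, "LOC"), (18, 31, "LOC")]))
# ['O', 'O', 'U-LOC', 'O', 'B-LOC', 'I-LOC', 'L-LOC']
```

The "-" case matters in practice: if the tokenizer you use for conversion splits text differently from the one used while annotating, some spans won't line up, and you'll probably want to inspect or skip those examples.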

You can also interact with Prodigy’s database directly from Python, so you’ll be able to skip the whole exporting/importing/exporting part.

Here’s an example (untested, but something along those lines should work):

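from prodigy.components.db import connect
from spacy.gold import biluo_tags_from_offsets  # spaCy v2; in v3: from spacy.training import offsets_to_biluo_tags
from spacy.lang.en import English   # or whichever language tokenizer you need

nlp = English()

db = connect()  # uses settings from your prodigy.json
examples = db.get_dataset('your_dataset')  # load the annotations

for eg in examples:
    if eg.get('answer') != 'accept':  # skip rejected and ignored annotations
        continue
    doc = nlp(eg['text'])
    entities = [(span['start'], span['end'], span['label'])
                for span in eg.get('spans', [])]  # examples without entities have no 'spans'
    tags = biluo_tags_from_offsets(doc, entities)  # '-' marks tokens misaligned with a span
    # do something with the tags here, e.g. one "token TAG" line per token
    for token, tag in zip(doc, tags):
        print(token.text, tag)
    print()  # blank line between examples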
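If the tool you're feeding expects CoNLL-style IOB tags rather than BILUO, the "do something with the tags" step could look roughly like this. It's a sketch under a few assumptions: it targets IOB2 (what most tools call BIO, where every entity starts with B-), and it writes a minimal two-column "token TAG" layout, whereas the original CoNLL-2003 files have four columns (token, POS, chunk, NER). The helper names biluo_to_iob and to_conll are hypothetical, not part of spaCy or Prodigy.

```python
from typing import Iterable, List, Tuple


def biluo_to_iob(tags: List[str]) -> List[str]:
    """Map BILUO tags to IOB2: L- becomes I-, U- becomes B-."""
    out = []
    for tag in tags:
        if tag.startswith("L-"):
            out.append("I-" + tag[2:])
        elif tag.startswith("U-"):
            out.append("B-" + tag[2:])
        else:
            out.append(tag)  # B-, I-, O and misaligned "-" pass through unchanged
    return out


def to_conll(docs: Iterable[Tuple[List[str], List[str]]]) -> str:
    """Render (tokens, tags) pairs as 'token TAG' lines, one blank line between documents."""
    blocks = []
    for tokens, tags in docs:
        blocks.append("\n".join(f"{tok} {tag}" for tok, tag in zip(tokens, tags)))
    return "\n\n".join(blocks) + "\n"


print(to_conll([(["New", "York", "is", "big"],
                 biluo_to_iob(["B-LOC", "L-LOC", "O", "O"]))]))
```

Inside the loop above you'd collect ([t.text for t in doc], biluo_to_iob(tags)) for each example and write to_conll(...) to a file at the end. Examples whose tags contain "-" are probably worth skipping or fixing first, since "-" isn't a valid IOB tag.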