Hi! I’d recommend writing your own converter, yes. spaCy actually ships with a biluo_tags_from_offsets
helper that takes a Doc and a list of (start, end, label) character offsets and returns the BILUO entity tags, one per token. So this might be helpful?
You can also interact with Prodigy’s database directly from Python, so you’ll be able to skip the whole exporting/importing/exporting part.
Here’s an example (untested, but something along those lines should work):
from prodigy.components.db import connect
from spacy.gold import biluo_tags_from_offsets
from spacy.lang.en import English  # or whichever language tokenizer you need

nlp = English()
db = connect()  # uses settings from your prodigy.json
examples = db.get_dataset('your_dataset')  # load the annotations

for eg in examples:
    doc = nlp(eg['text'])  # tokenize the text
    # character offsets and labels of the annotated spans
    # (.get, since an example may have no 'spans' key at all)
    entities = [(span['start'], span['end'], span['label'])
                for span in eg.get('spans', [])]
    tags = biluo_tags_from_offsets(doc, entities)  # one tag per token
    # do something with the tags here
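
In case it helps to see what the offsets-to-BILUO conversion produces, here's a minimal pure-Python sketch of the idea. To be clear, this is not spaCy's implementation: the tokenize_with_offsets and offsets_to_biluo names are made up for illustration, it assumes a plain whitespace tokenizer, and it simply leaves a span as "O" if its offsets don't line up with token boundaries.

```python
import re


def tokenize_with_offsets(text):
    """Hypothetical whitespace tokenizer: returns (token, start, end) triples."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def offsets_to_biluo(text, entities):
    """Illustrative conversion of (start, end, label) character offsets to BILUO tags."""
    tokens = tokenize_with_offsets(text)
    tags = ["O"] * len(tokens)
    # Map character positions to the index of the token starting/ending there.
    starts = {start: i for i, (_, start, _) in enumerate(tokens)}
    ends = {end: i for i, (_, _, end) in enumerate(tokens)}
    for start, end, label in entities:
        first = starts.get(start)
        last = ends.get(end)
        if first is None or last is None or last < first:
            continue  # span doesn't match token boundaries; skipped in this sketch
        if first == last:
            tags[first] = f"U-{label}"  # Unit: single-token entity
        else:
            tags[first] = f"B-{label}"  # Begin
            for i in range(first + 1, last):
                tags[i] = f"I-{label}"  # In
            tags[last] = f"L-{label}"  # Last
    return tags


text = "Apple opened an office in New York City"
entities = [(0, 5, "ORG"), (26, 39, "GPE")]
tags = offsets_to_biluo(text, entities)
print(tags)
# ['U-ORG', 'O', 'O', 'O', 'O', 'B-GPE', 'I-GPE', 'L-GPE']
```

The real helper works on spaCy's own tokenization, so the tags you get back depend on how the tokenizer splits your text.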