Ner format to CONLL

ines · January 24, 2019, 4:07pm

Hi! I’d recommend writing your own converter, yes. spaCy actually ships with a biluo_tags_from_offsets helper that takes a tokenized Doc and a list of (start, end, label) character offsets and returns one BILUO entity tag per token. So this might be helpful?
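To make the idea concrete, here’s a rough, dependency-free sketch of that conversion. It is not spaCy’s implementation: it assumes plain whitespace tokenization (spaCy’s tokenizer also splits off punctuation, for example), and the function name offsets_to_biluo is made up for illustration. Like spaCy’s helper, it tags tokens of spans that don’t line up with token boundaries as "-".

```python
import re


def offsets_to_biluo(text, entities):
    """Tag whitespace tokens of `text` with BILUO labels from (start, end, label) spans.

    Returns (tokens, tags). Tokens outside every span get "O"; tokens of a span
    whose boundaries don't match token boundaries get "-".
    """
    # (token_text, start_char, end_char) for each whitespace-separated token
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        # indices of every token that overlaps the span at all
        hit = [i for i, (_, s, e) in enumerate(tokens) if s < end and e > start]
        if not hit:
            continue
        first, last = tokens[hit[0]], tokens[hit[-1]]
        if first[1] != start or last[2] != end:
            # the span starts or ends in the middle of a token: misaligned
            for i in hit:
                tags[i] = "-"
        elif len(hit) == 1:
            tags[hit[0]] = f"U-{label}"  # single-token entity
        else:
            tags[hit[0]] = f"B-{label}"  # first token
            for i in hit[1:-1]:
                tags[i] = f"I-{label}"  # tokens in the middle
            tags[hit[-1]] = f"L-{label}"  # last token
    return [tok for tok, _, _ in tokens], tags


if __name__ == "__main__":
    text = "Apple is based in San Francisco"
    print(offsets_to_biluo(text, [(0, 5, "ORG"), (18, 31, "GPE")]))
```

The point is just that tags are assigned per token, so every span has to start and end on a token boundary of whatever tokenizer you use for the export.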

You can also interact with Prodigy’s database directly from Python, so you’ll be able to skip the whole exporting/importing/exporting part.
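The examples you get back from the database are plain dicts, the same records a `prodigy db-out` export would write as JSONL. Not all of them belong in a CoNLL file: rejected and ignored answers aren’t gold annotations, and an accepted example without entities may lack a "spans" key. Here’s a minimal sketch of that filtering step, assuming the usual NER task fields text, spans and answer; the function name accepted_ner_examples is hypothetical.

```python
def accepted_ner_examples(examples):
    """Yield (text, [(start, end, label), ...]) for each accepted example.

    Rejected and ignored answers are skipped; an accepted example without
    a "spans" key is treated as having no entities.
    """
    for eg in examples:
        if eg.get("answer") != "accept":
            continue
        entities = [(span["start"], span["end"], span["label"])
                    for span in eg.get("spans", [])]
        yield eg["text"], entities


if __name__ == "__main__":
    sample = [
        {"text": "Apple is based in San Francisco", "answer": "accept",
         "spans": [{"start": 0, "end": 5, "label": "ORG"}]},
        {"text": "No entities here", "answer": "accept"},
        {"text": "Wrong annotation", "answer": "reject", "spans": []},
    ]
    for text, entities in accepted_ner_examples(sample):
        print(text, entities)
```

Because it only needs dicts, the same function works on the output of db.get_dataset or on the lines of an exported file parsed with json.loads.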

Here’s an example (untested, but something along those lines should work):

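from prodigy.components.db import connect
from spacy.gold import biluo_tags_from_offsets  # spaCy v3: from spacy.training import offsets_to_biluo_tags
from spacy.lang.en import English   # or whichever language tokenizer you need

nlp = English()

db = connect()  # uses settings from your prodigy.json
examples = db.get_dataset('your_dataset')  # load the annotations

for eg in examples:
    # only accepted answers are gold annotations; skip rejected/ignored ones
    if eg.get('answer') != 'accept':
        continue
    doc = nlp(eg['text'])
    # an example without entities may not have a "spans" key at all
    entities = [(span['start'], span['end'], span['label'])
                for span in eg.get('spans', [])]
    tags = biluo_tags_from_offsets(doc, entities)
    # spans that don't line up with token boundaries come back as '-'
    # do something with the tags here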

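For the "do something with the tags" part: CoNLL-style NER files usually use IOB tags rather than BILUO, with one token and its tag per line and a blank line between sentences. Here’s a small sketch of both steps, assuming the simple two-column "token TAG" layout (the original CoNLL-2003 files have four columns and use the IOB1 variant; this produces IOB2, with B- on every entity start). The helper names are hypothetical; spaCy v3 also ships a biluo_to_iob function in spacy.training.

```python
def biluo_to_iob(tags):
    """Convert BILUO tags to IOB2.

    U- (single-token entity) becomes B-, L- (last token) becomes I-;
    B-, I-, O and misaligned "-" tags pass through unchanged.
    """
    out = []
    for tag in tags:
        if tag.startswith("U-"):
            out.append("B-" + tag[2:])
        elif tag.startswith("L-"):
            out.append("I-" + tag[2:])
        else:
            out.append(tag)
    return out


def to_conll(tokens, tags):
    """Format one example as 'token TAG' lines followed by a blank separator line."""
    return "\n".join(f"{tok} {tag}" for tok, tag in zip(tokens, tags)) + "\n\n"


if __name__ == "__main__":
    tokens = ["Apple", "is", "based", "in", "San", "Francisco"]
    tags = ["U-ORG", "O", "O", "O", "B-GPE", "L-GPE"]
    print(to_conll(tokens, biluo_to_iob(tags)), end="")
```

Inside the loop above, that would be something like to_conll([t.text for t in doc], biluo_to_iob(tags)), written to a file. It’s worth checking for "-" tags first, since those mark spans that need fixing rather than exporting.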