Different metadata in ner.correct output - 'spans'.

EM22 · February 22, 2022, 10:44am

Hi,

I noticed that the ner.correct output contains two different types of span dictionaries.
Most of the spans have five keys: 'start', 'end', 'token_start', 'token_end' and 'label',
but some of them have three additional keys: 'text', 'source' and 'input_hash'.
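To see the two span shapes side by side, you can count how many spans share each set of keys. This is a minimal sketch that assumes the tasks are plain Python dicts with a "spans" list. The example task below is hypothetical: the text, labels, model name and input_hash value are placeholders, not real Prodigy output.

```python
from collections import Counter

# Hypothetical task in the shape described above. The text, labels, model
# name and input_hash value are placeholders, not real Prodigy output.
task = {
    "text": "Apple is opening a store in Berlin.",
    "spans": [
        # Span with the three extra keys (text, source, input_hash)
        {"start": 0, "end": 5, "token_start": 0, "token_end": 0, "label": "ORG",
         "text": "Apple", "source": "en_core_web_sm", "input_hash": 123456},
        # Span with only the five core keys
        {"start": 28, "end": 34, "token_start": 6, "token_end": 6, "label": "GPE"},
    ],
}


def key_signatures(tasks):
    """Count how many spans share each distinct set of keys."""
    counts = Counter()
    for eg in tasks:
        for span in eg.get("spans", []):
            # sorted() over a dict yields its keys in alphabetical order
            counts[tuple(sorted(span))] += 1
    return counts


for keys, n in key_signatures([task]).items():
    print(n, keys)
```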

Why is there a difference? What spans get the additional meta data?

ines · February 22, 2022, 5:12pm

Hi! Prodigy's JSON format allows attaching arbitrary metadata to the task objects. The extra keys aren't required, but the recipe adds them to the pre-labelled predictions by default. The most relevant one is probably 'source': it tells you where a prediction came from, e.g. the spaCy model used to pre-annotate the data. Annotations you add manually in the UI won't have this key. If you don't care about this information, it's also safe to just ignore it.
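Since only the pre-labelled spans carry a 'source' key, you can use it to tell model predictions apart from manual edits, or drop the extra keys if you want to ignore them. The following is a minimal sketch, assuming the annotations have been exported as JSONL (one task per line) and that, as described above, manually added spans never have a 'source' key. The helper names and the example task are hypothetical, not part of Prodigy's API.

```python
import json

# Keys that every span in the output has, as listed in the question above
CORE_KEYS = ("start", "end", "token_start", "token_end", "label")


def load_jsonl(path):
    """Read exported annotations, one JSON task per line (hypothetical helper)."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def split_spans(eg):
    """Separate pre-labelled spans (with 'source') from manually added ones."""
    spans = eg.get("spans", [])
    predicted = [s for s in spans if "source" in s]
    manual = [s for s in spans if "source" not in s]
    return predicted, manual


def strip_extra_keys(eg):
    """Return a copy of the task whose spans keep only the core keys."""
    new_eg = dict(eg)  # shallow copy so the original task is left unchanged
    new_eg["spans"] = [
        {k: v for k, v in s.items() if k in CORE_KEYS} for s in eg.get("spans", [])
    ]
    return new_eg


# Hypothetical example task; model name and input_hash are placeholders
example = {
    "text": "Apple is opening a store in Berlin.",
    "spans": [
        {"start": 0, "end": 5, "token_start": 0, "token_end": 0, "label": "ORG",
         "text": "Apple", "source": "en_core_web_sm", "input_hash": 123456},
        {"start": 28, "end": 34, "token_start": 6, "token_end": 6, "label": "GPE"},
    ],
}

predicted, manual = split_spans(example)
print(len(predicted), "pre-labelled,", len(manual), "manual")
```

For a real dataset you could export it with prodigy db-out your_dataset > annotations.jsonl (dataset and file names are placeholders) and then call load_jsonl("annotations.jsonl") before applying the helpers to each task.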
