Annotation JSON

hi @nnn!

Thanks for your follow-up!

Good point! What recipe did you use to create those annotations, specifically the "COUNTRY" and "JOB_TITLE" spans?

I suspect you used a correct recipe (either ner.correct or span.correct). The highlighted examples (with the extra keys for text, source, and input_hash) show the normal behavior for model-suggested annotations from a correct recipe.

Perhaps the other spans (the ones without the extra keys) were created with the same recipe, but are "manual" annotations (i.e., you only highlighted them) and weren't model suggested, since en_core_web_lg doesn't have the custom entities ("COUNTRY" and "JOB_TITLE").

Said differently, the extra keys (text, source, and input_hash) are added when a span is annotated via model-assisted correction.
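To make that concrete, here's a rough sketch of what a task with both kinds of spans could look like in the exported JSON (the text, offsets, and hash value are made up for illustration; your actual output may include other keys such as token offsets):

```json
{
  "text": "Jane Doe is a software engineer in Canada.",
  "spans": [
    {
      "start": 0,
      "end": 8,
      "label": "PERSON",
      "text": "Jane Doe",
      "source": "en_core_web_lg",
      "input_hash": 1234567890
    },
    {
      "start": 14,
      "end": 31,
      "label": "JOB_TITLE"
    },
    {
      "start": 35,
      "end": 41,
      "label": "COUNTRY"
    }
  ]
}
```

In this sketch, the PERSON span would be a model suggestion from en_core_web_lg (so it keeps text, source, and input_hash), while the JOB_TITLE and COUNTRY spans were highlighted manually and only carry the offsets and label.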

If you had a trained NER model that covered all of these entity types, and the spans were all model suggestions, then every span would have the extra keys/data.

One caveat: it's also possible for a span to be missing these extra fields even for an entity type that was in your model (e.g., an ORG), because you could have created that entity manually rather than accepting a model suggestion.

These extra keys aren't required for training, so for training purposes their presence is arbitrary.

However, the data does identify the source of the suggestion (e.g., the model). Having this info also distinguishes a span as model suggested ("gold" annotations), because it reflects added confidence that the model would select this label (i.e., the data and model are consistent). So in that way, this data tells you those annotations carry more weight than purely manual ones.

Just curious: have you trained a model by updating the original (e.g., --base-model en_core_web_lg) with both your custom entities and the existing entities from en_core_web_lg?

If so, since you're mixing old and new entity types, make sure to account for potential catastrophic forgetting.
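For reference, this is roughly the kind of update run I mean, using the --base-model flag mentioned above (the output path and dataset name are just placeholders):

```
# Sketch only: fine-tune en_core_web_lg on your NER annotations
prodigy train ./updated_model --ner your_ner_dataset --base-model en_core_web_lg
```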

I hope this answers your question. Let us know if you have any others!
