How to train custom NER with preannotations?

Javier-Jimenez99 · July 26, 2021, 10:43am

I want to create a custom NER annotation recipe with pre-annotated values, similar to ner.manual with patterns. Which values should the stream return to allow this functionality?

ines · July 27, 2021, 1:38am

Hi! You can find an example of the expected JSON format that Prodigy creates for NER here: https://prodi.gy/docs/api-interfaces#ner_manual

The most important parts are the "text" and "spans", describing the character offsets of the entities. So a pre-annotated example you can create could look like this:

{
  "text": "First look at the new MacBook Pro",
  "spans": [
    {"start": 22, "end": 33, "label": "PRODUCT", "token_start": 5, "token_end": 6}
  ]
}

You can also export data in this format as JSONL and load it into ner.manual, and Prodigy will respect the existing annotations.
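As a rough illustration of how such a record could be produced, here is a minimal sketch that computes the character offsets for a known phrase and serializes the result as one JSONL line. The make_example helper is hypothetical (it is not part of Prodigy), and the sketch leaves out "token_start" and "token_end" on the assumption that they get filled in later, once the text is tokenized.

```python
import json


def make_example(text, phrase, label):
    """Build a task dict with one span per occurrence of `phrase` in `text`.

    Hypothetical helper for illustration only, not a Prodigy function.
    """
    spans = []
    start = text.find(phrase)
    while start != -1:
        end = start + len(phrase)  # "end" is exclusive, like a Python slice
        spans.append({"start": start, "end": end, "label": label})
        start = text.find(phrase, end)
    return {"text": text, "spans": spans}


example = make_example("First look at the new MacBook Pro", "MacBook Pro", "PRODUCT")
line = json.dumps(example)  # JSONL = one JSON object per line
```

Slicing the text with the stored offsets, e.g. example["text"][22:33], is a quick way to check that a span really covers the intended entity.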

If you're using a custom recipe, you can call Prodigy's add_tokens helper to automatically add the "tokens" and the span token indices, so you don't have to do this manually and can be sure that they match the model's tokenization. So your logic could look like this:

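from prodigy.components.loaders import JSONL
from prodigy.components.preprocess import add_tokens

stream = JSONL(source)  # or however you load the raw data
stream = add_your_preannotations(stream)  # your own function that adds "spans"
stream = add_tokens(nlp, stream)  # nlp is a loaded spaCy pipeline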
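The add_your_preannotations step is a placeholder for your own code. One possible shape for it, sketched here under assumptions, is a generator that runs regular expressions over each example's "text" and attaches the matches as "spans". The PATTERNS mapping and its regex are made up for illustration, and the stream is assumed to be any iterable of dicts with a "text" key.

```python
import re

# Hypothetical label -> compiled regex mapping; replace with your own rules.
PATTERNS = {"PRODUCT": re.compile(r"MacBook(?: Pro| Air)?")}


def add_your_preannotations(stream):
    """Yield copies of the incoming examples with regex matches added as spans."""
    for eg in stream:
        spans = list(eg.get("spans", []))  # keep any spans already present
        for label, pattern in PATTERNS.items():
            for match in pattern.finditer(eg["text"]):
                spans.append(
                    {"start": match.start(), "end": match.end(), "label": label}
                )
        yield {**eg, "spans": spans}


# Stand-in for the loaded raw data
stream = [{"text": "First look at the new MacBook Pro"}]
annotated = list(add_your_preannotations(stream))
```

This sketch does not check for overlapping matches or for spans that fall in the middle of a token, so if your rules can produce either, it is probably worth filtering those out before the examples reach add_tokens.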

Javier-Jimenez99 · July 28, 2021, 1:45pm

Thanks a lot! We finally solved this problem with a similar solution.
