Pre-annotation does not work

aisha_harbi · November 16, 2021, 2:50pm

I have implemented a custom recipe for my model. The text and the labels appear in the UI, but the pre-annotations are missing: the text is not highlighted.

I used the pseudocode here: https://prodi.gy/docs/named-entity-recognition#custom-model
I followed the same format.

ljvmiranda921 · November 17, 2021, 2:11am

Hi @aisha_harbi ,

If the text appears but the labels on the UI aren't there, then the usual problem is that the tokens and spans are not being aligned correctly. To clarify:

For a Token, start and end are character offsets.
For a Span, token_start and token_end are token indices, while start and end are, again, character offsets.

What you can do is first check the samples of your JSONL file, then work your way backwards in case there's some missing piece of logic (e.g. alignment, missing keys, etc.)
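A quick check along these lines could look like the sketch below. It is not part of Prodigy: check_task is a hypothetical helper name, and the checks only cover the keys discussed in this thread (tokens, token_start, token_end, and whether span offsets land on token boundaries).

```python
import json


def check_task(task):
    """Return a list of human-readable problems found in one task dict.

    Hypothetical helper, not a Prodigy API.
    """
    problems = []
    text = task.get("text", "")
    tokens = task.get("tokens") or []
    if not tokens:
        problems.append("missing 'tokens'")
    # Each token's character offsets should slice out exactly its text.
    for tok in tokens:
        if text[tok["start"]:tok["end"]] != tok["text"]:
            problems.append(f"token {tok['id']} offsets do not match its text")
    starts = {t["start"] for t in tokens}
    ends = {t["end"] for t in tokens}
    for i, span in enumerate(task.get("spans", [])):
        for key in ("token_start", "token_end"):
            if key not in span:
                problems.append(f"span {i} missing '{key}'")
        # Span character offsets should coincide with token boundaries.
        if tokens and (span["start"] not in starts or span["end"] not in ends):
            problems.append(f"span {i} does not line up with token boundaries")
    return problems


# One line as it might appear in a JSONL file (no tokens, no token indices).
line = '{"text": "Apple updates its analytics service with new metrics", "spans": [{"start": 0, "end": 5, "label": "ORG"}]}'
problems = check_task(json.loads(line))
for problem in problems:
    print(problem)
```

To scan a whole file, the same function can be called on json.loads(line) for each line.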

aisha_harbi · November 17, 2021, 7:50am

Isn't this the expected format?

{
  "text": "Apple updates its analytics service with new metrics",
  "spans": [{"start": 0, "end": 5, "label": "ORG"}]
}

ljvmiranda921 · November 17, 2021, 7:58am

You're missing token_start and token_end in the spans. You also need another key, tokens, which should contain a list of dictionaries with this structure:

... "tokens": [{"text": str, "id": int, "start": int, "end": int},...]

A minimal structure looks like this:

{
   "text":"Welcome to Prodigy!",
   "tokens":[
      {
         "text":"str",
         "start":"int",
         "end":"int",
         "id":"int"
      },
      {
         "text":"str",
         "start":"int",
         "end":"int",
         "id":"int"
      }
   ],
   "spans":[
      {
         "token_end":"int",
         "token_start":"int",
         "label":"str",
         "start":"int",
         "end":"int"
      }
   ],
   "meta":{
      "ids":[
         "str",
         "str"
      ],
      "start_indices":[
         "int",
         "int"
      ]
   }
}
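As a rough illustration, here is one way the "Apple" example from earlier in the thread could be filled in. This is only a sketch under two assumptions: it uses a plain whitespace tokenizer (Prodigy recipes normally rely on spaCy's tokenizer, which splits punctuation differently), and it treats token_end as the index of the last token in the span, inclusive. The helper names tokenize and align_spans are made up for this example.

```python
import re


def tokenize(text):
    """Whitespace tokenizer (an assumption; a real recipe would likely use spaCy)."""
    return [
        {"text": m.group(), "start": m.start(), "end": m.end(), "id": i}
        for i, m in enumerate(re.finditer(r"\S+", text))
    ]


def align_spans(tokens, spans):
    """Add token_start/token_end to spans whose offsets match token boundaries."""
    starts = {t["start"]: t["id"] for t in tokens}
    ends = {t["end"]: t["id"] for t in tokens}
    aligned = []
    for span in spans:
        if span["start"] not in starts or span["end"] not in ends:
            raise ValueError(f"span {span} does not line up with token boundaries")
        aligned.append(
            {
                **span,
                "token_start": starts[span["start"]],
                # Assumed to be inclusive: the index of the span's last token.
                "token_end": ends[span["end"]],
            }
        )
    return aligned


text = "Apple updates its analytics service with new metrics"
tokens = tokenize(text)
spans = align_spans(tokens, [{"start": 0, "end": 5, "label": "ORG"}])
task = {"text": text, "tokens": tokens, "spans": spans}
print(task["spans"])
```

A span whose character offsets fall in the middle of a token raises an error here instead of being silently dropped, which makes misalignment easier to spot.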

aisha_harbi · November 17, 2021, 11:27am

Do the keys within each dictionary in the list have to be in order? And how do I create the meta and start_indices lists? I already have all the keys within the spans, tokens and text. Manual annotation works fine; it's just the pre-annotation that's not working. I think what I'm trying to ask is: what does the pre-annotation depend on? The spans, right?

Thanks so much,

aisha_harbi · November 17, 2021, 12:08pm

Never mind, it works. Thanks again!
