I have implemented a custom recipe for my model. The text and the labels appear in the UI, but the pre-annotations are missing: none of the text is highlighted.
I used the pseudocode here: https://prodi.gy/docs/named-entity-recognition#custom-model
and followed the same format.
Hi @aisha_harbi,
If the text appears but the pre-annotated spans aren't shown in the UI, the usual cause is that the tokens and spans are not aligned correctly. To clarify:
- For a token, `start` and `end` are character offsets.
- For a span, `token_start` and `token_end` are token indices, while `start` and `end` are, again, character offsets.
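To make the two kinds of position concrete, here is the example sentence from this thread with its tokens written out by hand. It is only an illustration: the whitespace-based token boundaries are an assumption, and it assumes, as in Prodigy's documented examples, that `token_end` is the index of the span's last token (inclusive) rather than one past it.

```python
text = "Apple updates its analytics service with new metrics"

# Token start/end are character offsets into `text`; "id" is the token index.
# (Splitting on whitespace is an assumption made for this illustration.)
tokens = [
    {"text": "Apple", "start": 0, "end": 5, "id": 0},
    {"text": "updates", "start": 6, "end": 13, "id": 1},
    {"text": "its", "start": 14, "end": 17, "id": 2},
    {"text": "analytics", "start": 18, "end": 27, "id": 3},
    {"text": "service", "start": 28, "end": 35, "id": 4},
    {"text": "with", "start": 36, "end": 40, "id": 5},
    {"text": "new", "start": 41, "end": 44, "id": 6},
    {"text": "metrics", "start": 45, "end": 52, "id": 7},
]

# A span carries both: character offsets (start/end) and token indices
# (token_start/token_end, where token_end is the index of the LAST token).
span = {"start": 0, "end": 5, "token_start": 0, "token_end": 0, "label": "ORG"}

# Every token's offsets must slice out its own text...
assert all(text[t["start"]:t["end"]] == t["text"] for t in tokens)
# ...and the span's token indices must point at the tokens where its characters start and end.
assert tokens[span["token_start"]]["start"] == span["start"]
assert tokens[span["token_end"]]["end"] == span["end"]
print(text[span["start"]:span["end"]])  # Apple
```

By the same logic, a span covering "analytics service" would have `start` 18, `end` 35, `token_start` 3 and `token_end` 4.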
A good first step is to inspect a few samples from your JSONL file, then work your way backwards in case some piece of logic is missing (e.g. alignment, missing keys).
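As a starting point for that check, here is a minimal sketch that flags the problems described above for each record. `check_example` and `check_jsonl` are hypothetical helpers, not part of Prodigy, and the offset comparison assumes `token_end` is inclusive.

```python
import json

SPAN_KEYS = ("start", "end", "token_start", "token_end", "label")


def check_example(eg):
    """Return a list of problems that would keep spans from being highlighted."""
    missing = [k for k in ("text", "tokens", "spans") if k not in eg]
    if missing:
        return [f"missing key: {k}" for k in missing]
    text, tokens, problems = eg["text"], eg["tokens"], []
    for tok in tokens:
        # Token start/end are character offsets, so they must slice out the token text.
        if text[tok["start"]:tok["end"]] != tok["text"]:
            problems.append(f"token {tok.get('id')} offsets don't match its text")
    for j, span in enumerate(eg["spans"]):
        absent = [k for k in SPAN_KEYS if k not in span]
        if absent:
            problems.append(f"span {j} missing: {', '.join(absent)}")
            continue
        try:
            first, last = tokens[span["token_start"]], tokens[span["token_end"]]
        except (IndexError, TypeError):
            problems.append(f"span {j} has invalid token indices")
            continue
        # The token indices must point at the tokens where the character span begins and ends.
        if (first["start"], last["end"]) != (span["start"], span["end"]):
            problems.append(f"span {j} character offsets don't match its tokens")
    return problems


def check_jsonl(path):
    """Print every problem found in a JSONL file, with its line number."""
    with open(path, encoding="utf8") as f:
        for line_no, line in enumerate(f, 1):
            if line.strip():
                for problem in check_example(json.loads(line)):
                    print(f"line {line_no}: {problem}")
```

Run on a record that only has `text` and character-offset `spans`, `check_example` returns `["missing key: tokens"]`.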
Isn't this the expected format?
{
"text": "Apple updates its analytics service with new metrics",
"spans": [{"start": 0, "end": 5, "label": "ORG"}]
}
You're missing `token_start` and `token_end` in the spans. You also need another key, `tokens`, which should contain a list of this data structure:
... "tokens": [{"text": str, "id": int, "start": int, "end": int},...]
A minimal structure looks like this:
{
"text":"Welcome to Prodigy!",
"tokens":[
{
"text":"str",
"start":"int",
"end":"int",
"id":"int"
},
{
"text":"str",
"start":"int",
"end":"int",
"id":"int"
}
],
"spans":[
{
"token_end":"int",
"token_start":"int",
"label":"str",
"start":"int",
"end":"int"
}
],
"meta":{
"ids":[
"str",
"str"
],
"start_indices":[
"int",
"int"
]
}
}
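One way to produce this structure from records that only have `text` and character-offset `spans` (like the example above) is sketched below. The regex tokenizer, `to_full_example` and `write_jsonl` are illustrative assumptions, not Prodigy code; in a real recipe the tokens should come from the same tokenizer your model uses (Prodigy's `add_tokens` helper in `prodigy.components.preprocess` does this with a spaCy pipeline), otherwise predicted spans may not land on token boundaries. The `meta` block from the template is omitted.

```python
import json
import re

# Hypothetical stand-in tokenizer: runs of word characters or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def to_full_example(eg):
    """Turn {"text", "spans": [{"start", "end", "label"}]} into the full format."""
    tokens = [
        {"text": m.group(), "start": m.start(), "end": m.end(), "id": i}
        for i, m in enumerate(TOKEN_RE.finditer(eg["text"]))
    ]
    start_to_id = {t["start"]: t["id"] for t in tokens}
    end_to_id = {t["end"]: t["id"] for t in tokens}
    spans = []
    for span in eg.get("spans", []):
        if span["start"] not in start_to_id or span["end"] not in end_to_id:
            # The span doesn't fall on token boundaries: report it rather than guess.
            raise ValueError(f"misaligned span {span} in {eg['text']!r}")
        spans.append({
            **span,
            "token_start": start_to_id[span["start"]],
            "token_end": end_to_id[span["end"]],  # index of the span's last token
        })
    return {**eg, "tokens": tokens, "spans": spans}


def write_jsonl(examples, path):
    """Write one complete example per line."""
    with open(path, "w", encoding="utf8") as f:
        for eg in examples:
            f.write(json.dumps(to_full_example(eg)) + "\n")


example = {
    "text": "Apple updates its analytics service with new metrics",
    "spans": [{"start": 0, "end": 5, "label": "ORG"}],
}
full = to_full_example(example)
print(full["spans"])
# [{'start': 0, 'end': 5, 'label': 'ORG', 'token_start': 0, 'token_end': 0}]
```

Raising on misaligned spans is a deliberate choice here: silently dropping them would reproduce the original symptom, a record that loads but shows no highlighting.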
Do the keys within each dictionary in the list have to be in a particular order? And how do I create the meta and start_indices lists? I already have all the keys in the spans, the tokens and the text. Manual annotation works fine; it's only the pre-annotation that isn't working. I guess what I'm really asking is: what does the pre-annotation depend on? The spans, right?
Thanks so much!
Never mind, it works. Thanks again!