Span Cat Annotations and Incorrect Predictions

I'm having issues with the spancat model not predicting, or even attempting to predict, all of the annotated labels in a text.

Backstory: I am currently training a spancat model to identify medical phrases in clinical narratives. We have annotated around 800 clinical narratives from real patient data, labelling phrases as disorders, findings, procedures, etc. using Prodigy's span annotation interface. We are using spancat rather than NER because many of the medical phrases contain sub-phrases that carry separate labels altogether. For instance, "family history of heart disease" would be a situation, but the nested text "heart disease" is a disorder.

After training the model with Prodigy, we found that the results had very low sensitivity: very few terms were ever identified, even when predicting on the same data used to train the model. We assumed we needed more data, so we kept adding examples and training iteratively, but sensitivity stayed low, so I looked into how the annotations were being saved and found something peculiar: only the spans that contained a "text" attribute were ever predicted by the model. In other words, the spans in each example in the JSONL file were saved differently, and only the spans with the "text" attribute were being predicted; the spans without it never were. Let me give an example. Here is the annotated text:

A middle-aged female presents to the emergency room with the concern of active gastrointestinal (GI) bleeding. 99-Tc labeled RBC scan is performed, and imaged were acquired for 4 hours. No abnormal area of increased radiotracer uptake is identified. The patient had a bowel movement during the scan, and the stool was imaged and was positive for radioactivity.

The JSONL output for that annotation appeared as follows:

{"text":"A middle-aged female presents to the emergency room with the concern of active gastrointestinal (GI) bleeding. 99-Tc labeled RBC scan is performed, and imaged were acquired for 4 hours. No abnormal area of increased radiotracer uptake is identified. The patient had a bowel movement during the scan, and the stool was imaged and was positive for radioactivity. ","_input_hash":-1194054636,"_task_hash":-470800897,"tokens":[{"text":"A","start":0,"end":1,"id":0,"ws":true},{"text":"middle","start":2,"end":8,"id":1,"ws":false},{"text":"-","start":8,"end":9,"id":2,"ws":false},{"text":"aged","start":9,"end":13,"id":3,"ws":true},{"text":"female","start":14,"end":20,"id":4,"ws":true},{"text":"presents","start":21,"end":29,"id":5,"ws":true},{"text":"to","start":30,"end":32,"id":6,"ws":true},{"text":"the","start":33,"end":36,"id":7,"ws":true},{"text":"emergency","start":37,"end":46,"id":8,"ws":true},{"text":"room","start":47,"end":51,"id":9,"ws":true},{"text":"with","start":52,"end":56,"id":10,"ws":true},{"text":"the","start":57,"end":60,"id":11,"ws":true},{"text":"concern","start":61,"end":68,"id":12,"ws":true},{"text":"of","start":69,"end":71,"id":13,"ws":true},{"text":"active","start":72,"end":78,"id":14,"ws":true},{"text":"gastrointestinal","start":79,"end":95,"id":15,"ws":true},{"text":"(","start":96,"end":97,"id":16,"ws":false},{"text":"GI","start":97,"end":99,"id":17,"ws":false},{"text":")","start":99,"end":100,"id":18,"ws":true},{"text":"bleeding","start":101,"end":109,"id":19,"ws":false},{"text":".","start":109,"end":110,"id":20,"ws":true},{"text":"99","start":111,"end":113,"id":21,"ws":false},{"text":"-","start":113,"end":114,"id":22,"ws":false},{"text":"Tc","start":114,"end":116,"id":23,"ws":true},{"text":"labeled","start":117,"end":124,"id":24,"ws":true},{"text":"RBC","start":125,"end":128,"id":25,"ws":true},{"text":"scan","start":129,"end":133,"id":26,"ws":true},{"text":"is","start":134,"end":136,"id":27,"ws":true},{"text":"performed","start":137,"end":146,"id":28,"ws":false},{"text":",","start":146,"end":147,"id":29,"ws":true},{"text":"and","start":148,"end":151,"id":30,"ws":true},{"text":"imaged","start":152,"end":158,"id":31,"ws":true},{"text":"were","start":159,"end":163,"id":32,"ws":true},{"text":"acquired","start":164,"end":172,"id":33,"ws":true},{"text":"for","start":173,"end":176,"id":34,"ws":true},{"text":"4","start":177,"end":178,"id":35,"ws":true},{"text":"hours","start":179,"end":184,"id":36,"ws":false},{"text":".","start":184,"end":185,"id":37,"ws":true},{"text":"No","start":186,"end":188,"id":38,"ws":true},{"text":"abnormal","start":189,"end":197,"id":39,"ws":true},{"text":"area","start":198,"end":202,"id":40,"ws":true},{"text":"of","start":203,"end":205,"id":41,"ws":true},{"text":"increased","start":206,"end":215,"id":42,"ws":true},{"text":"radiotracer","start":216,"end":227,"id":43,"ws":true},{"text":"uptake","start":228,"end":234,"id":44,"ws":true},{"text":"is","start":235,"end":237,"id":45,"ws":true},{"text":"identified","start":238,"end":248,"id":46,"ws":false},{"text":".","start":248,"end":249,"id":47,"ws":true},{"text":"The","start":250,"end":253,"id":48,"ws":true},{"text":"patient","start":254,"end":261,"id":49,"ws":true},{"text":"had","start":262,"end":265,"id":50,"ws":true},{"text":"a","start":266,"end":267,"id":51,"ws":true},{"text":"bowel","start":268,"end":273,"id":52,"ws":true},{"text":"movement","start":274,"end":282,"id":53,"ws":true},{"text":"during","start":283,"end":289,"id":54,"ws":true},{"text":"the","start":290,"end":293,"id":55,"ws":true},{"text"
:"scan","start":294,"end":298,"id":56,"ws":false},{"text":",","start":298,"end":299,"id":57,"ws":true},{"text":"and","start":300,"end":303,"id":58,"ws":true},{"text":"the","start":304,"end":307,"id":59,"ws":true},{"text":"stool","start":308,"end":313,"id":60,"ws":true},{"text":"was","start":314,"end":317,"id":61,"ws":true},{"text":"imaged","start":318,"end":324,"id":62,"ws":true},{"text":"and","start":325,"end":328,"id":63,"ws":true},{"text":"was","start":329,"end":332,"id":64,"ws":true},{"text":"positive","start":333,"end":341,"id":65,"ws":true},{"text":"for","start":342,"end":345,"id":66,"ws":true},{"text":"radioactivity","start":346,"end":359,"id":67,"ws":false},{"text":".","start":359,"end":360,"id":68,"ws":true}],
"spans":[
{"start":14,"end":20,"text":"female","source":"./output/model-best","input_hash":-1194054636,"token_start":4,"token_end":4,"label":"FINDING"},
{"start":79,"end":109,"token_start":15,"token_end":19,"label":"DISORDER"},
{"start":111,"end":133,"token_start":21,"token_end":26,"label":"TEST"}],"_view_id":"spans_manual","answer":"accept","_timestamp":1642257676}

The model only predicts one span: "female". However, two other spans were labelled: "active gastrointestinal (GI) bleeding" (tokens 15-19, labelled DISORDER) and "99-Tc labeled RBC scan" (tokens 21-26, labelled TEST). At the bottom of the output, you can see that only the "female" span has the "text" and "source" attributes; the missed spans do not.

This does not appear to be a coincidence limited to this one example. I have tested multiple examples of varying complexity and found, over and over again, that labelled spans lacking the "text" or "source" attributes were never predicted by the model.

I am training the model with the default settings. Is there something I am doing wrong that leads to this incorrect behavior, or is there possibly a bug somewhere? Thank you for any help you can provide!

Edit: A potentially related issue: when I add a seed file for manual annotations, seed terms that contain special characters such as hyphens or parentheses are not identified.

Hi.

Were you able to find a solution to this problem? I am facing the same sort of issue while training on my data.

Hi! Sorry for the extreme delay in responding. It definitely seems unusual for the span attributes to be saved without the "text" attribute. @PrithaSarkar, can you give more information about your problem? Do you also have spans that are missing the "text" attribute?

A couple of hunches:

When I add a seed file for manual annotations, seed terms that contain special characters such as hyphens or parentheses are not identified.

If you're using an additional file to supply the annotations, I think you should ensure that all span attributes are present and correct (e.g., each span has a "text" attribute and its start and end offsets actually point to that text). If you can elaborate on how the seed file was generated, that information would be helpful for debugging the problem.
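To make this concrete, here is a minimal sketch of the kind of check I mean, assuming the annotations are in a JSONL export (e.g. from prodigy db-out); annotations.jsonl is just a placeholder filename:

```python
import json

# Flag spans that are missing a "text" attribute or whose character
# offsets don't slice out the same string from the example text.
with open("annotations.jsonl", encoding="utf8") as f:
    for i, line in enumerate(f):
        eg = json.loads(line)
        for span in eg.get("spans", []):
            snippet = eg["text"][span["start"]:span["end"]]
            if "text" not in span:
                print(f"example {i}: span {span['start']}-{span['end']} ({snippet!r}) has no 'text' attribute")
            elif span["text"] != snippet:
                print(f"example {i}: span 'text' {span['text']!r} does not match its offsets ({snippet!r})")
```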

One "hack" that you can do is to create a spaCy DocBin file that contains those annotations, and do the training via spacy train instead. It seems that the span offsets are correct (based from your single example), so you can probably do something like:

Hi,

So, basically my problem is the same as the OP's. While training, the scores are stuck at 0. There was another thread about this problem where some kind-hearted people recommended setting the config file values to the defaults and playing with the n-gram sizes, but nothing seems to work for me.
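For reference, the n-gram settings I've been playing with are in the suggester block of the config, which I believe looks like this by default (sizes controls which span lengths the suggester proposes as candidates):

```ini
[components.spancat.suggester]
@misc = "spacy.ngram_suggester.v1"
sizes = [1,2,3]
```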

To give you an example of what my training dataset looks like:

{"id”:"xxxxx","meta":{"id”:"xxxxx”},"source_url":"https://blahblahblah.com","text":"Competing interest statement The authors declare no conflict of interest.","_input_hash":0,"_task_hash":0,"tokens":[{"token":"Competing","start":0,"end":9,"id":0,"ws":"true"},{"token":"interest","start":10,"end":18,"id":1,"ws":"true"},{"token":"statement","start":19,"end":28,"id":2,"ws":"true"},{"token":"The","start":29,"end":32,"id":3,"ws":"true"},{"token":"authors","start":33,"end":40,"id":4,"ws":"true"},{"token":"declare","start":41,"end":48,"id":5,"ws":"true"},{"token":"no","start":49,"end":51,"id":6,"ws":"true"},{"token":"conflict","start":52,"end":60,"id":7,"ws":"true"},{"token":"of","start":61,"end":63,"id":8,"ws":"true"},{"token":"interest","start":64,"end":72,"id":9,"ws":"false"},{"token":".","start":72,"end":73,"id":10,"ws":"false"}],"_view_id":"blocks","spans":[{"start":29,"end":64,"token_start":3,"token_end":9,"label":"none_declared"}],"answer":"accept","_timestamp":0,"_annotator_id":"spancat-exclude","_session_id":"spancat-exclude"}

I encountered another problem. While debugging the dataset, Prodigy told me that there were not enough examples of a certain label. However, upon manual inspection I saw that there were thousands of examples of that label. Does Prodigy only consider unique examples in this regard? My dataset has, and will always have, a lot of duplicate entries in the "text" field.
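For what it's worth, this is roughly how I counted the labels and unique texts myself (a quick sketch; dataset.jsonl is just the exported dataset file):

```python
import json
from collections import Counter

label_counts = Counter()
unique_texts = set()

with open("dataset.jsonl", encoding="utf8") as f:
    for line in f:
        eg = json.loads(line)
        unique_texts.add(eg["text"])
        for span in eg.get("spans", []):
            label_counts[span["label"]] += 1

print("unique texts:", len(unique_texts))
for label, count in label_counts.most_common():
    print(label, count)
```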

Looking forward to your advice.

Could you link the particular thread?

Could you share the error message as well as the stats that you've inspected? How many examples do you have in your dataset?

Another hint: it seems like your issue with the score is more related to spacy train than to Prodigy. Did you check the spaCy discussion forum for a similar issue? Maybe this one?