Welcome to the forum @awagner-mainz 
You're right - the default parsing function of the SpanCat task does not handle the responses you're receiving correctly. If you look at the main parsing logic, you'll notice that upon splitting on `|` it expects 4 items, while in your case the first `|` is being substituted with a comma. That leads to a `ValueError`, which, in turn, leads to the response being skipped and no spans being created that Prodigy could use to highlight.
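To illustrate the failure mode (the field names and line layout below are only illustrative assumptions; the actual format is in the parsing code linked further down):

```python
# One response line in the shape the parser expects vs. what the model returned.
expected = "insult | True | INSULT | it is derogatory"
received = "*insult*, True | INSULT | it is derogatory"  # '*' added, first '|' became a comma

for line in (expected, received):
    try:
        answer, is_entity, label, reason = line.split("|")  # needs exactly 4 fields
        print("parsed:", answer.strip(), label.strip())
    except ValueError:
        print("skipped:", line)  # the whole response is dropped, so no span reaches Prodigy
```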
Now, how to fix it:
Option 1
Try to modify the prompt by explicitly stating that the model should not add any extra characters such as `*`, or that the entity should be an exact span from the paragraph, etc.
You can easily modify the prompt by submitting a custom prompt template via config:
```
[components.llm.task]
@llm_tasks = "spacy.SpanCat.v3"
labels = ["COMPLIMENT", "INSULT"]
template = "path/to/the/template.jinja2"
```
You can use the original SpanCat template as a starting point.
Option 2
Since the offending characters seem to be consistently `*`, you could also write your own task.
spaCy docs on how to define custom tasks can be found here.
This blog post shows step by step how to define a custom task class, using a simpler use case: Elegant prompt versioning and LLM model configuration with spacy-llm | by Déborah Mesquita | Towards Data Science
This is just to give an overview. Your case will be a bit more complex because you'll need to make sure the answers are valid spans with respect to the text. You can find the current logic here: spacy-llm/spacy_llm/tasks/span/parser.py at 117f68963870fd2a4af4c706c40cf223c6ae6fde · explosion/spacy-llm · GitHub.
You'll need to recreate most of it, adding the extra logic for cleaning up `*` - do let us know if you run into problems implementing that! (Sharding is optional - you don't have to worry about it for the first version.)
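As a minimal sketch of what that clean-up step could look like, assuming you pre-process the raw LLM response before building spans (the function name, the two-field line layout and the exact-match alignment are assumptions for illustration, not the actual spacy-llm implementation):

```python
from typing import List, Tuple


def scrub_and_align(response: str, text: str) -> List[Tuple[int, int, str]]:
    """Hypothetical clean-up step for a raw LLM response.

    Assumes each answer line looks roughly like "some span text | LABEL"
    (the real SpanCat format has more fields - adapt the unpacking to the
    parser you are replicating). Strips the stray '*' characters and keeps
    only answers that occur verbatim in the paragraph, returning
    (start_char, end_char, label) triples.
    """
    results = []
    for line in response.strip().splitlines():
        cleaned = line.replace("*", "").strip()    # drop the markdown-style emphasis
        parts = [p.strip() for p in cleaned.split("|")]
        if len(parts) != 2:                        # malformed line -> skip, don't crash
            continue
        span_text, label = parts
        start = text.find(span_text)               # naive exact-match alignment
        if start == -1:                            # answer is not a valid span of the text
            continue
        results.append((start, start + len(span_text), label))
    return results
```

In a real custom task you'd plug something like this into your `parse_responses` logic and turn the character offsets into `Span` objects on the doc (e.g. with `doc.char_span`).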
As to the spancat vs NER question:
The very high-level difference between spancat and NER is that spancat generates candidate spans to which a classifier assigns probabilities. NER, on the other hand, predicts where an entity starts and ends.
This is why, in general, NER tends to yield better results than spancat for spans with well-defined boundaries. Also, it is usually easier to model atomic entities, so if your overlapping case (the REFERENCE) can be composed of sub-entities, I would recommend annotating the simpler entities and inferring the compound ones in post-processing (e.g. via rules that specify that a given combination of entities is a REFERENCE - you could use spaCy's rule-based matching for this).
It might be that NER is a better fit for some of your entities and spancat for others. Yet another solution could be plain rule-based matching. It always takes some experimentation.
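Here is a rough sketch of the rule-based post-processing idea: building compound REFERENCE spans from already-predicted sub-entities with spaCy's `Matcher`. The sub-entity labels `AUTHOR` and `WORK` and the pattern itself are made-up placeholders, and I'm using the `Matcher` rather than the `EntityRuler` so the compound spans can go into `doc.spans` and overlap with the atomic entities:

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")  # stand-in for your own pipeline with a trained NER

# Hypothetical rule: one or more AUTHOR tokens, an optional comma,
# then one or more WORK tokens make up a compound REFERENCE.
matcher = Matcher(nlp.vocab)
matcher.add(
    "REFERENCE",
    [[
        {"ENT_TYPE": "AUTHOR", "OP": "+"},
        {"TEXT": ",", "OP": "?"},
        {"ENT_TYPE": "WORK", "OP": "+"},
    ]],
)

def add_reference_spans(doc):
    # Store the compound matches in doc.spans so they may overlap with
    # the atomic entities that stay in doc.ents.
    doc.spans["sc"] = matcher(doc, as_spans=True)
    return doc
```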
The good news is that, if we eliminate the need for overlapping spans, NER and spancat annotations are essentially just spans, and it's very easy to transform NER annotations into spancat annotations to experiment and compare the performance of spaCy NER and spancat on the same data. Assuming you have your NER annotations in spaCy DocBin format (e.g. produced with the data-to-spacy recipe), a function that moves a given subset of categories from entities to spans could look something like this:
```python
from spacy.tokens import DocBin
from wasabi import msg


def rewrite_as_spans(nlp, dataset, split, dataset_dir, target_ner_file, cats):
    """Move entities whose label is in `cats` from doc.ents to doc.spans["sc"]."""
    docbin = DocBin().from_disk(dataset)
    docs = list(docbin.get_docs(nlp.vocab))
    for doc in docs:
        old_ents = doc.ents
        new_ents = []
        new_spans = []
        for ent in doc.ents:
            if ent.label_ in cats:
                new_spans.append(ent)   # becomes a spancat annotation
            else:
                new_ents.append(ent)    # stays an NER annotation
        assert len(list(old_ents)) == len(new_spans) + len(new_ents)
        doc.set_ents(new_ents)
        doc.spans["sc"] = new_spans
    output_path = f"{dataset_dir}/{str(target_ner_file)}_{split}.spacy"
    DocBin(docs=docs, store_user_data=True).to_disk(output_path)
    msg.good(f"Saved rewritten {split} dataset to {output_path}")
```