I want to train a model that does both SpanCat and NER. I plan to use SpanCat as a sort of parser and then run NER on the extracted spans to get my desired entities. Is there an annotation interface where I can do both, similar to the one for joint span and relation extraction? I would then train a spaCy model with a [transformer, spancat, ner] pipeline.
Apologies in advance if what I said above doesn't make sense, I'm new to this.
Thanks for your question and welcome to the Prodigy community!
You're asking a great question. I saw you also posted a similar question on the spaCy GitHub Discussions forum. That's great, thank you! The spaCy core team can better help you with the right way to train the model.
There is currently no annotation interface that can toggle between ner and spancat the way the existing one does for ner/relations. Building one is a bit harder than writing a custom recipe, as we'd need to create a custom React component. It's an interesting use case that I don't think we anticipated, so I'll make a note in case there's interest in creating one in the future.
I've been trying to think of a hack you could use instead, but I'm a bit stuck. Initially, I was going to propose training the spancat model first from spans.manual annotations (which you likely did), and then using that trained model with ner.correct to create your ner annotations. Adding --component "spancat" should make it use the existing spancat model but present its predictions in the ner format (e.g., no overlapping spans). The problem is that I wasn't getting any predictions this way, even though I did when I ran spans.correct. If you wanted, you could try to modify ner.correct to handle this case.
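If you do try modifying ner.correct, one piece you would probably need is a step that reduces overlapping spancat predictions to a non-overlapping set, since the ner format can't represent overlaps. Below is a minimal sketch of that step on plain (start, end, label) tuples. The function name and the labels are hypothetical and nothing here is Prodigy's actual implementation. It keeps the longest span and breaks ties by earliest start, which as far as I know is also the rule spacy.util.filter_spans applies to Span objects.

```python
def drop_overlaps(spans):
    """Keep a non-overlapping subset of (start, end, label) spans.

    Offsets are half-open (end is exclusive). Longer spans win;
    ties go to the span that starts first.
    """
    # Sort by length (longest first), then by start offset.
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept = []
    for start, end, label in ordered:
        # Keep the span only if it is disjoint from every span kept so far.
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    # Return the survivors in document order.
    return sorted(kept)


# Hypothetical spancat output: the first two spans overlap.
spancat_predictions = [(0, 5, "SECTION"), (3, 7, "CLAUSE"), (8, 10, "CLAUSE")]
print(drop_overlaps(spancat_predictions))
```

Preferring the longest span is only one possible policy. If your spancat model scores its predictions, keeping the highest-scoring span may suit your data better.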
So in summary, annotating spancat and ner separately may be your best bet for now.
Thanks again for your question and let us know if you have any further questions!
Thanks for the quick reply, I really appreciate it! I was initially thinking of separating the two annotation workflows and was just wondering if I could do them together. What I want to do is use the spancat to extract key sentences from a document and then run NER on only those key sentences, because I don't want the NER to run on other parts of the doc. Given that, I think I will have to annotate them separately.
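For reference, the two-stage idea at inference time could look roughly like the sketch below. It assumes two separately trained pipelines, one with a spancat component and one with an ner component, and that the spancat writes to the default spans key "sc". The function name and the output dict layout are made up for illustration.

```python
def entities_in_key_spans(text, span_nlp, ner_nlp, spans_key="sc"):
    """Run NER only on the spans predicted by a span categorizer.

    span_nlp and ner_nlp are callables that take a string and return a
    spaCy Doc (for example, two pipelines loaded with spacy.load).
    """
    doc = span_nlp(text)
    results = []
    # doc.spans is dict-like; fall back to no spans if the key is missing.
    for span in doc.spans.get(spans_key, []):
        # Only the text of the key span is passed to the NER pipeline.
        span_doc = ner_nlp(span.text)
        for ent in span_doc.ents:
            results.append({
                "text": ent.text,
                "label": ent.label_,
                # Shift offsets so they refer to the original document.
                "start": span.start_char + ent.start_char,
                "end": span.start_char + ent.end_char,
                "span_label": span.label_,
            })
    return results


# Usage (the model paths are placeholders, not real packages):
# import spacy
# span_nlp = spacy.load("./spancat_model")
# ner_nlp = spacy.load("./ner_model")
# print(entities_in_key_spans("Some long document ...", span_nlp, ner_nlp))
```

Running NER on span.text means the NER model never sees the surrounding context, which matches the goal of ignoring the rest of the doc but may cost some accuracy. If spancat predicts overlapping spans, the same entity can also show up more than once.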
It would be really nice if, similar to how we can customize pipelines with spaCy, we could do the same with Prodigy annotations: merge different interfaces and export one annotation file to train the entire pipeline in one go. But I do understand that it would be a very difficult thing to code, haha.