Span vs NER, compatibility with transformers models

Hi @Alvaro8gb!

Thanks for your message. That sounds like a fascinating project.

Check out this post from spaCy GitHub Discussions:

The example REL component doesn't work out of the box with spancat, but it should be possible to make it work. You'd need to modify the code to use the spangroups assigned by your spancat instead of entities on the Doc.

It looks like the only place you'd need to modify to get it working is the instance generator. That's designed so that you can register your own alternative generator instead, too, so you can copy it, give it a new name, and modify your config accordingly. The evaluation script would also need modification, and technically the component could be changed to not rely on doc.ents, but that's more of a bookkeeping detail and shouldn't affect functionality.
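To make that a bit more concrete, here's a rough sketch of what an alternative instance generator could look like if it reads from doc.spans instead of doc.ents. The function name create_span_instances and the spans_key argument are hypothetical names for illustration, "sc" is spancat's default spans key, and I'm assuming the generator takes a max_length and returns pairs of candidate spans, so check it against the tutorial code before relying on it.

```python
from typing import Any, Callable, List, Tuple


def create_span_instances(
    max_length: int, spans_key: str = "sc"
) -> Callable[[Any], List[Tuple[Any, Any]]]:
    """Build a function that pairs up spans stored in doc.spans[spans_key].

    Hypothetical sketch: the names and the exact signature are assumptions,
    not the tutorial's own code.
    """

    def get_instances(doc: Any) -> List[Tuple[Any, Any]]:
        # Read the spans assigned by spancat rather than doc.ents.
        spans = list(doc.spans[spans_key]) if spans_key in doc.spans else []
        instances = []
        for i, span1 in enumerate(spans):
            for j, span2 in enumerate(spans):
                if i == j:
                    continue  # never pair a span with itself
                # Only keep pairs whose start tokens are close enough together.
                if abs(span2.start - span1.start) <= max_length:
                    instances.append((span1, span2))
        return instances

    return get_instances
```

To use something like this from your config, you'd register it under a new name of your choosing with the @spacy.registry.misc decorator and point the config at that name.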

For ner and spancat, there are a lot of relevant posts about transformers on spaCy GitHub Discussions.

If you're training spancat, be aware that memory can be an issue if you're not careful. This is especially true when you have long spans. Typically, modifying the suggester function or reducing the batch size can help (see this post).
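For some intuition on why long spans hurt, here's a back-of-the-envelope sketch that counts how many candidate spans you'd get per doc, assuming an n-gram style suggester that proposes every span of the configured sizes. The function name count_ngram_candidates is just a hypothetical helper, and actual memory use also depends on your model and batch size.

```python
from typing import Iterable


def count_ngram_candidates(n_tokens: int, sizes: Iterable[int]) -> int:
    """Count the spans proposed for one doc if every n-gram of each size is suggested."""
    # A doc with n tokens contains n - size + 1 n-grams of a given size.
    return sum(max(0, n_tokens - size + 1) for size in sizes)


short_sizes = range(1, 4)   # spans of 1 to 3 tokens
long_sizes = range(1, 11)   # spans of 1 to 10 tokens

print(count_ngram_candidates(200, short_sizes))  # 597
print(count_ngram_candidates(200, long_sizes))   # 1955
```

So going from a maximum span length of 3 to 10 roughly triples the candidates for a 200-token doc, and every doc in the batch pays that cost.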

If you're interested in using transformers for the rel_component, Sofie recently released an accompanying blog post (see the transformer section):

If you run into issues, I'd suggest posting on the spaCy GitHub Discussions forum. The spaCy core team supports that forum (this forum is mainly for Prodigy-specific questions), and they can help more if you have a config.cfg file you're debugging.

Not specific to transformers, but since you mentioned considering NER vs. spancat, have you seen the spaCy team's ner_spancat_compare template project?

It provides an interesting experiment comparing ner and spancat performance on biomedical literature. They also do an excellent job of exploring span characteristic metrics to give some intuition for how well spancat will identify the correct spans.

If you have spaCy installed, you can clone this project by running spacy project clone experimental/ner_spancat_compare. From inside the cloned directory, you can then fetch the assets with spacy project assets and run everything with spacy project run all.

Also, have you considered rules as well? Were you aware of spanruler, which enables overlapping spans like spancat?

My colleagues @victorialslocum and @ljvmiranda921 have created a template project and an accompanying blog post for using spanruler with ner, which may add another option with rules.

Hope this helps!