Hi @Alvaro8gb!
Thanks for your message. That sounds like a fascinating project.
Check out this post from spaCy GitHub Discussions:
The example REL component doesn't work out of the box with spancat, but it should be possible to make it work. You'd need to modify the code to use the span groups assigned by your spancat instead of the entities on the Doc.
It looks like the only place you'd need to modify to get it working is the instance generator. That's designed so that you can register your own alternative generator instead, too, so you can copy it, give it a new name, and modify your config accordingly. The evaluation script would also need modification, and technically the component could be changed to not rely on doc.ents, but that's more of a bookkeeping detail, and shouldn't affect functionality.
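A minimal sketch of such a replacement generator, assuming spancat writes its predictions to its default span group key "sc". The registry name rel_span_instance_generator.v1 and the function names are hypothetical. As far as I can tell, the example project's generator pairs up entities from doc.ents whose start tokens are at most max_length apart; this version does the same but reads from doc.spans instead.

```python
from typing import Callable, List, Tuple

import spacy
from spacy.tokens import Doc, Span


# Hypothetical registry name: pick any unused name and reference it in your config.
@spacy.registry.misc("rel_span_instance_generator.v1")
def create_span_instances(
    max_length: int, spans_key: str = "sc"
) -> Callable[[Doc], List[Tuple[Span, Span]]]:
    """Build candidate (head, child) pairs from a span group instead of doc.ents."""

    def get_instances(doc: Doc) -> List[Tuple[Span, Span]]:
        instances = []
        # spancat stores its spans in doc.spans[spans_key] ("sc" unless configured otherwise)
        spans = doc.spans[spans_key] if spans_key in doc.spans else []
        for span1 in spans:
            for span2 in spans:
                # Skip self-pairs and pairs whose start tokens are too far apart
                if span1 != span2 and abs(span2.start - span1.start) <= max_length:
                    instances.append((span1, span2))
        return instances

    return get_instances


nlp = spacy.blank("en")
doc = nlp("Alice works at Acme in Madrid")
doc.spans["sc"] = [
    Span(doc, 0, 1, label="PER"),  # "Alice"
    Span(doc, 3, 4, label="ORG"),  # "Acme"
    Span(doc, 5, 6, label="LOC"),  # "Madrid"
]
print(len(create_span_instances(max_length=100)(doc)))  # 6 ordered pairs
print(len(create_span_instances(max_length=3)(doc)))    # 4: Alice/Madrid are 5 tokens apart
```

You would then point the @misc entry of the instance generator block in your config.cfg at the new name instead of the original generator.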
For ner and spancat, there are a lot of relevant posts about transformers on spaCy GitHub Discussions.
If you're training spancat, be aware that memory can become an issue if you're not careful, especially when you have long spans. Typically, modifying the suggester function or the batch size can help (see this post).
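A minimal sketch, assuming the built-in n-gram suggester (spacy.ngram_suggester.v1, whose default sizes are [1, 2, 3]): the number of candidate spans it proposes per document grows with the largest n-gram size, which is one reason long spans drive up memory. The sizes and text below are illustrative only; the batch size is configured separately, in the [training.batcher] block of config.cfg.

```python
import spacy

nlp = spacy.blank("en")
# Configure spancat's suggester explicitly (these sizes match the default; tune to your data)
nlp.add_pipe(
    "spancat",
    config={"suggester": {"@misc": "spacy.ngram_suggester.v1", "sizes": [1, 2, 3]}},
)


def count_candidates(text: str, sizes: list) -> int:
    """Count the candidate spans the n-gram suggester proposes for one text."""
    doc = nlp.make_doc(text)
    suggester = spacy.registry.misc.get("spacy.ngram_suggester.v1")(sizes=sizes)
    # The suggester returns a Ragged array; lengths holds the candidate count per doc
    return int(suggester([doc]).lengths.sum())


text = "one two three four five six seven eight nine ten"
print(count_candidates(text, [1, 2, 3]))           # 10 + 9 + 8 = 27
print(count_candidates(text, list(range(1, 11))))  # 10 + 9 + ... + 1 = 55
```

Allowing spans up to the full 10-token length roughly doubles the candidates here, and the gap widens quickly for longer documents.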
If you're interested in using transformers for the rel_component, Sofie recently released an accompanying blog post (see the transformer section):
If you run into issues, I'd suggest posting on the spaCy GitHub Discussions forum. The spaCy core team supports that forum (this forum is mainly for Prodigy-specific questions), and they can help more if you have a config.cfg file you're debugging.
Not specific to transformers, but since you mentioned considering NER vs. spancat, have you seen the spaCy team's ner_spancat_compare template project?
It provides an interesting experiment comparing ner and spancat performance on biomedical literature. They also do an excellent job of exploring span characteristics metrics to build intuition about how well spancat will identify the correct spans.
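One of the simplest span characteristics to check on your own data is span length. As a rough illustration (my own sketch, not code from the ner_spancat_compare project), you could count span lengths like this, assuming your annotated spans live in spancat's default span group key "sc":

```python
from collections import Counter
from typing import Iterable

import spacy
from spacy.tokens import Doc, Span


def span_length_counts(docs: Iterable[Doc], spans_key: str = "sc") -> Counter:
    """Count how many annotated spans have each length (in tokens)."""
    counts: Counter = Counter()
    for doc in docs:
        if spans_key in doc.spans:
            for span in doc.spans[spans_key]:
                counts[len(span)] += 1
    return counts


nlp = spacy.blank("en")
doc = nlp("Patients with type 2 diabetes and chronic kidney disease")
doc.spans["sc"] = [
    Span(doc, 2, 5, label="DISEASE"),  # "type 2 diabetes"
    Span(doc, 6, 9, label="DISEASE"),  # "chronic kidney disease"
    Span(doc, 7, 8, label="ORGAN"),    # "kidney" (overlaps the span above)
]
counts = span_length_counts([doc])
print(sorted(counts.items()))  # [(1, 1), (3, 2)]
print(max(counts))             # 3
```

The longest span length is also the smallest maximum n-gram size that lets an n-gram suggester propose every annotated span.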
If you have spaCy installed, you can clone this project by running:
spacy project clone experimental/ner_spancat_compare
Then, from inside the cloned ner_spancat_compare directory, fetch the assets with:
spacy project assets
and run the whole workflow with:
spacy project run all
Also, have you considered using rules? Are you aware of spaCy's span_ruler component, which supports overlapping spans like spancat does?
My colleagues @victorialslocum and @ljvmiranda921 have created a template project and an accompanying blog post on using span_ruler with ner, which may give you another, rule-based option.
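A minimal sketch of both modes, assuming spaCy v3.3 or later (where span_ruler was added); the patterns and labels are made up for illustration. By default the ruler keeps overlapping matches in doc.spans["ruler"], while annotate_ents=True writes non-overlapping matches to doc.ents, where they can sit alongside ner.

```python
import spacy

# Illustrative patterns that deliberately overlap
patterns = [
    {"label": "DISEASE", "pattern": "lung cancer"},
    {"label": "ORGAN", "pattern": "lung"},
]
text = "She was diagnosed with lung cancer."

# 1) Span mode: overlapping matches are kept in doc.spans["ruler"] (the default key)
span_nlp = spacy.blank("en")
span_nlp.add_pipe("span_ruler").add_patterns(patterns)
span_doc = span_nlp(text)
print(sorted((s.text, s.label_) for s in span_doc.spans["ruler"]))
# [('lung', 'ORGAN'), ('lung cancer', 'DISEASE')]

# 2) NER mode: annotate_ents=True writes non-overlapping matches to doc.ents,
#    keeping the longest match where spans overlap
ent_nlp = spacy.blank("en")
ent_nlp.add_pipe("span_ruler", config={"annotate_ents": True}).add_patterns(patterns)
ent_doc = ent_nlp(text)
print([(e.text, e.label_) for e in ent_doc.ents])
# [('lung cancer', 'DISEASE')]
```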
Hope this helps!