Unable to use train and run data-to-spacy recipes for spancat on prodigy 1.11.10

ryanwesslen · February 10, 2023, 7:41pm

Thanks for your question, and welcome to the Prodigy community!

First off, thank you so much for such a detailed report. It helps us a lot, and we can respond much faster when users provide good details about their issue.

Do you have the same problem if you remove the --base-model argument, both when training and when converting the data with data-to-spacy?

We've recently found some potential issues with --base-model in prodigy train, and it may affect data-to-spacy too.
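To make that comparison concrete, here is a minimal sketch that builds the same data-to-spacy call with and without --base-model. The output directory ./corpus and the helper name build_command are hypothetical, and I'm assuming the dataset is my_project, as in the spans.manual command; adjust these to match what you actually ran.

```python
import shlex


def build_command(recipe, output_dir, dataset, base_model=None):
    """Build a `prodigy <recipe>` argument list for a spancat dataset."""
    # Hypothetical helper: only the recipe names and the --spancat /
    # --base-model arguments come from this thread.
    cmd = ["python", "-m", "prodigy", recipe, output_dir, "--spancat", dataset]
    if base_model is not None:
        cmd += ["--base-model", base_model]
    return cmd


# "./corpus" is an assumed output directory; "my_project" is the dataset
# name used in the spans.manual command.
with_base = build_command("data-to-spacy", "./corpus", "my_project", "en_core_sci_sm")
without_base = build_command("data-to-spacy", "./corpus", "my_project")

# Print both variants so they can be copied into a terminal and compared.
print(shlex.join(with_base))
print(shlex.join(without_base))
```

If the variant without --base-model runs cleanly and the one with it fails, that would point at the base model rather than at your annotations. Passing "train" as the recipe gives the matching pair for prodigy train.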

Just curious, can you explain your thinking behind using the en_core_sci_sm model (SciSpaCy)?

Typically, base models are used when you want to use their vectors in a future pipeline, so I could see using SciSpaCy in data-to-spacy if you wanted your pipeline to have SciSpaCy's vectors during training. (I guess in theory you could also use one of the vector-only models like en_core_sci_lg instead.)

I could also see SciSpaCy helping if you wanted to use a correct or teach recipe with one of its components (say, a custom ner) and correct or teach that component in Prodigy. However, for spans, you're likely just as well off with a blank tokenizer.

prodigy spans.manual my_project en_core_sci_sm C:\Prodigy\Data\my_project.csv --loader csv --label RESPIRATORY,NEGATIVE

Also, for manual annotation recipes, you could use essentially any English tokenizer (e.g., blank:en). But I don't think the annotations are the problem; it's training or running data-to-spacy.

I'll admit I haven't used SciSpaCy before so I'll need to look more into it.

One last thing: I see you're running spaCy 3.4.4. Do you know if SciSpaCy 0.5.0 works with spaCy 3.4.4? I just know it's sometimes hard to keep up with newer versions of spaCy.
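One way to check this locally, as a rough sketch using only the Python standard library: print the installed versions and the spaCy requirement that the scispacy package itself declares. The helper names are hypothetical, and I'm assuming the packages are installed under the distribution names scispacy and en_core_sci_sm.

```python
import re
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def declared_requirements(package, dependency):
    """Return the requirement strings that `package` declares for `dependency`."""
    try:
        requirements = metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        return []
    wanted = dependency.lower()
    matches = []
    for req in requirements:
        # Requirement strings have the form "name>=1.0,<2.0" or "name (>=1.0)";
        # keep only the leading distribution name for the comparison.
        name = re.split(r"[\s<>=!~;(\[]", req, maxsplit=1)[0]
        if name.lower() == wanted:
            matches.append(req)
    return matches


if __name__ == "__main__":
    # Distribution names are assumptions; None means "not installed".
    for pkg in ("spacy", "scispacy", "en_core_sci_sm"):
        print(pkg, installed_version(pkg))
    # Shows which spaCy range the installed scispacy says it supports.
    print(declared_requirements("scispacy", "spacy"))
```

If the installed spaCy version falls outside the range printed on the last line, that mismatch is worth ruling out first. Running python -m spacy validate is another quick check for whether installed pipeline packages match your spaCy version.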

Regardless, let us know if you can at least overcome this bottleneck.
