Train spancat bug

Hello :slight_smile: !

I'm doing a spancat test.
I have annotated a first batch of examples and now I would like to get a first idea of the scores, so I tried to launch a training run.
My command line seems to be correct, since everything runs as it should. However, the process stops by itself without giving any error code.

The command line I use is the following:

prodigy train ./testmodelespan/ --spancat SpanClauseClassification -es 0.3 -m fr_core_news_sm --lang "fr" --label-stats

Here is a screenshot of the error; it just says "process stopped".

Thanks for your help !

Hi! How much memory do you have, and could you run profiling to check whether you're running out of memory? If your process just dies without an error, running out of memory is one of the most likely explanations.
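One rough way to watch this (just a sketch, assuming a Unix-like machine and the third-party psutil package, which Prodigy doesn't ship; keeping an eye on your system monitor works just as well):

import sys
import time
import psutil  # pip install psutil

# Poll the resident memory of the training process once per second.
# Pass the PID of your prodigy train process as the first argument.
proc = psutil.Process(int(sys.argv[1]))
while proc.is_running():
    try:
        rss_gb = proc.memory_info().rss / 1024 ** 3
    except psutil.NoSuchProcess:
        break
    print(f"RSS used: {rss_gb:.2f} GB")
    time.sleep(1)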

Hi !

We had 8 GB configured; we went up to 32 GB and the problem is still the same.
What could the other causes be?

Thanks :slight_smile: !

Is there anything else you can see in the logs that looks relevant? It really sounds like this is more of a system-level problem than a problem with the training logic itself. Also, could you try exporting your data with data-to-spacy and training with spaCy directly? If the problem remains, could you open an issue on the spaCy tracker?
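Roughly, reusing the arguments from your original command (the ./corpus output directory here is just an example name), the two steps would look like this:

prodigy data-to-spacy ./corpus --spancat SpanClauseClassification --eval-split 0.3 --base-model fr_core_news_sm --lang fr
python -m spacy train ./corpus/config.cfg --output ./testmodelespan/ --paths.train ./corpus/train.spacy --paths.dev ./corpus/dev.spacy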

Hi Inès,

Is there a log file where we can see the type of errors?

I tried to export my data with data-to-spacy, but the problem is still the same: "process stopped".

So I was wondering if I could just export my annotations with db-out and train a new model with spaCy directly?
I use db-out to export my data as .json.

Thanks :slight_smile:

Ah, I was referring to your server logs here! For example, if you're using a cloud provider, you can typically view the logs and see everything that's happening.

This does indicate that there's either a problem at the spaCy level, or that you're running out of memory while preprocessing your data. 8 GB would probably be too little, so maybe double-check that you definitely have 32 GB available? Also, maybe try re-installing spaCy, just in case you ended up with a broken installation somehow.
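For example, on a Linux machine you could confirm the memory that's actually available and do a clean reinstall with something like this (adjust for your own environment):

free -h
pip install --force-reinstall spacy
python -m spacy download fr_core_news_sm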

Is it possible that the spans I annotated are too long and that this is what kills the process?

It's definitely possible that some of the logic used to preprocess the spans and calculate the best-matching suggester range is inefficient. If you are running out of memory, that would explain why, so it'd be important to confirm whether that's actually what's happening.
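If you want to sanity-check the span lengths yourself, a small script along these lines could read your db-out export and report the longest annotated spans in tokens (just a sketch: it assumes a JSONL export where each span has token_start and token_end, and annotations.jsonl is a placeholder file name):

import json
from collections import Counter

lengths = Counter()
with open("annotations.jsonl", encoding="utf8") as f:
    for line in f:
        eg = json.loads(line)
        for span in eg.get("spans", []):
            # token_end is inclusive in Prodigy's span format
            lengths[span["token_end"] - span["token_start"] + 1] += 1

print("Longest annotated span (tokens):", max(lengths))
print("Length distribution:", sorted(lengths.items()))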

Also, could you run data-to-spacy again with the PRODIGY_LOGGING=basic environment variable to show more logs? This should give you an idea of where it happens.
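For example (again, ./corpus is just an example output directory):

PRODIGY_LOGGING=basic prodigy data-to-spacy ./corpus --spancat SpanClauseClassification --eval-split 0.3 --base-model fr_core_news_sm --lang fr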