SpanCat Training Error on Custom Preprocessed Dataset

Hi @daffahilmyf!

Thanks for your question and welcome to the Prodigy community :wave:

Thanks as well for providing your examples. This helped a lot!

In your updated file, it looks like you stemmed the text and removed stop words. Is this correct?

  • Original (worked): "text": "In order to fulfil the requirements of some railways, it should be possible to provide an alternative means of link assurance indication."
  • New (didn't work): "text": "in order fulfil requir some railway possibl provid altern mean link assur indic"
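One thing that may be worth checking (this is an assumption on my side, since I can't tell from the examples alone whether it is what triggers your error): if the span annotations were made on the original text, their character offsets probably no longer line up with the stemmed text. Here is a minimal sketch of such a check. The helper name `misaligned_spans` and the label `REQUIREMENT` are made up for illustration, and the word-boundary test is only a rough heuristic, not spaCy's tokenizer.

```python
# Rough sanity check: do the span offsets still fit the (possibly modified) text?
# Assumes Prodigy-style records with "text" and "spans" (each span has "start"/"end").

def misaligned_spans(example):
    """Return the spans whose character offsets do not fit example["text"]."""
    text = example["text"]
    bad = []
    for span in example.get("spans", []):
        start, end = span["start"], span["end"]
        in_bounds = 0 <= start < end <= len(text)
        # Heuristic only: a span should not begin or end in the middle of a word.
        clean_start = in_bounds and (start == 0 or not text[start - 1].isalnum())
        clean_end = in_bounds and (end == len(text) or not text[end].isalnum())
        if not (in_bounds and clean_start and clean_end):
            bad.append(span)
    return bad


original = (
    "In order to fulfil the requirements of some railways, it should be "
    "possible to provide an alternative means of link assurance indication."
)
stemmed = (
    "in order fulfil requir some railway possibl provid altern mean "
    "link assur indic"
)

# Hypothetical annotation created on the ORIGINAL text.
phrase = "link assurance indication"
start = original.find(phrase)
span = {"start": start, "end": start + len(phrase), "label": "REQUIREMENT"}

print(misaligned_spans({"text": original, "spans": [span]}))  # offsets fit
print(misaligned_spans({"text": stemmed, "spans": [span]}))   # offsets no longer fit
```

If the second call reports spans for your data too, the annotations and the text are out of sync, and training on the original, unmodified text should avoid that.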

Can you explain more about why the pre-processing is needed?

There's no need for pre-processing like this, and we generally recommend against it. For example, this is a good post on the background on stop words:
