training from scratch

Hello,

I want to train a model for a language that is neither among the listed languages
nor available as a UD treebank. Would that be possible?

Hi @vivian,

Yes, you can. However, it takes some work, because the data you need depends on the task at hand:

  • For NER / textcat: it's a bit easier because you can use xx (the multi-language code) as the spaCy base language and start from there.
  • If it's a dependency parser, that will be much more involved as you need to create your own treebank.
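For the NER case, here is a minimal sketch of what "start from xx" could look like, assuming spaCy v3. The two sentences and the PERSON / LOC labels are made-up placeholders, not real data for your language, and for anything beyond a quick experiment you would normally train through the config-based `spacy train` workflow instead of a hand-written loop.

```python
import random

import spacy
from spacy.training import Example

# Blank multi-language pipeline: tokenizer only, no trained components.
nlp = spacy.blank("xx")

# Add an untrained NER component and register the (hypothetical) labels.
ner = nlp.add_pipe("ner")
for label in ("PERSON", "LOC"):
    ner.add_label(label)

# Toy annotations as character offsets; replace with your own annotated data.
train_data = [
    ("Amara lives in Lagos", {"entities": [(0, 5, "PERSON"), (15, 20, "LOC")]}),
    ("Kofi moved to Accra", {"entities": [(0, 4, "PERSON"), (14, 19, "LOC")]}),
]
examples = [
    Example.from_dict(nlp.make_doc(text), annotations)
    for text, annotations in train_data
]

# Initialize the weights, then run a few update passes over the toy data.
optimizer = nlp.initialize(lambda: examples)
losses = {}
for _ in range(10):
    random.shuffle(examples)
    losses = {}
    nlp.update(examples, sgd=optimizer, losses=losses)

# The pipeline can now be applied to new text in the same way as any other.
doc = nlp("Amara lives in Lagos")
print([(ent.text, ent.label_) for ent in doc.ents], losses)
```

With only two sentences the predictions are not meaningful; the point is just that no language-specific model or treebank is needed to get an NER pipeline running.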

For more general guidance and advice, I suggest looking at this Discussion thread.

Thanks a lot!! :) I hadn't found this discussion thread until now.