text classification - is prodigy a good fit for the project?

So first of all, I wish you'd seen the academic page! For PhD students we often grant a trial academic license, which we're pretty generous about extending: https://prodi.gy/academic. We approve these manually, so there's often some delay, and we're not always able to provide the trial. But I think you'll probably be able to get started with Prodigy this way if you apply.

To answer your questions though:

  • If you need certain keywords highlighted, you can add a spans object to your stream examples, so you won't have to use HTML. You can use an HTML view, but it makes things a lot harder: you'll have to map to and from the text in your recipe in order to update the model, since the models all expect text, not HTML.

  • For textcat, you can use active learning from a cold start: the model's suggestions won't be very helpful at first, but they gradually improve as it learns from your annotations. The same approach doesn't really work for NER, though, since ner.teach relies on the model already predicting specific entities.

  • You can present the two labels as checkboxes, which lets you annotate both at the same time without making them mutually exclusive.
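
To make the first and last points a bit more concrete, here's a rough sketch of how a stream example could carry both keyword highlights and two checkbox labels. It's plain Python with no Prodigy import. The helper names (find_keyword_spans, make_task), the KEYWORD label, the sample text and the two category labels are all made up for illustration. The "spans" entries use character offsets into the text; the "options" key is how I'd expect the choice interface to receive its labels, so please check the key names against the docs for your Prodigy version.

```python
import re


def find_keyword_spans(text, keywords, label="KEYWORD"):
    """Return character-offset spans for every whole-word keyword match.

    Hypothetical helper: the label name is arbitrary and only used for display.
    """
    spans = []
    for keyword in keywords:
        # Match whole words only, ignoring case.
        pattern = r"\b" + re.escape(keyword) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append(
                {"start": match.start(), "end": match.end(), "label": label}
            )
    # Keep the spans in reading order.
    return sorted(spans, key=lambda span: span["start"])


def make_task(text, keywords, labels):
    """Build one stream example with highlighted keywords and label options.

    Assumption: "options" is a list of {"id", "text"} dicts, one per checkbox.
    """
    return {
        "text": text,
        "spans": find_keyword_spans(text, keywords),
        "options": [{"id": label, "text": label} for label in labels],
    }


task = make_task(
    "The refund was slow but support was friendly.",
    keywords=["refund", "support"],
    labels=["COMPLAINT", "PRAISE"],
)
```

In a custom recipe you'd yield tasks like this from your stream. As far as I know, setting "choice_style" to "multiple" in the recipe config is what turns the options into checkboxes rather than a single-choice list, but treat that setting name as something to verify. Because the text itself is untouched, there's nothing to map back before updating the model.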