Annotation without model

I'm new to Prodigy and want to use it for binary or multiclass labeling of datasets without a model in the annotation loop (i.e. I want to manually label everything I pass in as the dataset). How do I use textcat.teach or textcat.manual without passing a model parameter? I get errors if I leave the model out or if I type in the name of a non-existing (blank) model.

Hi! textcat.teach uses the model to suggest potential candidates and annotations, so for that workflow, you do need a base model that it can update. textcat.manual doesn't actually use the model you pass in at the moment (we added that argument because we wanted to support auto-loading the labels that are already present in a model, but that turned out to be a bad idea). So you can just pass in any model name, like en_core_web_sm. Sorry about the confusion here – we'll fix this in the next release.

Alternatively, you could also write a simple custom recipe that streams in your data, adds options for the labels and lets you annotate with a multiple choice interface. For binary classification, you could use the classification interface and a top-level label ({"text": "...", "label": "..."}) and then accept/reject the examples.
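To make that concrete, here is a minimal sketch of the stream-building part of such a recipe, written in plain Python so it runs without Prodigy installed. The function names (read_jsonl, add_options, add_label, textcat_choice) and the recipe name "textcat-choice" are made up for illustration; the returned keys ("dataset", "stream", "view_id") and the "choice" and "classification" interface names are what I'd expect a custom recipe to use, so check them against the docs for your version.

```python
import json


def read_jsonl(path):
    """Yield one task dict per non-empty line of a JSONL file."""
    with open(path, encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def add_options(stream, labels):
    """Attach one multiple-choice option per label to every task."""
    for task in stream:
        task = dict(task)  # copy so the input task is not modified
        task["options"] = [{"id": label, "text": label} for label in labels]
        yield task


def add_label(stream, label):
    """Binary case: attach a single top-level label to accept or reject."""
    for task in stream:
        task = dict(task)
        task["label"] = label
        yield task


def textcat_choice(dataset, source, labels):
    """Return the components of a manual multiple-choice recipe.

    Assumption: inside Prodigy this function would be registered with the
    recipe decorator, e.g. @prodigy.recipe("textcat-choice"), and the
    labels would arrive from the command line as a comma-separated string.
    """
    if isinstance(labels, str):
        labels = [label.strip() for label in labels.split(",") if label.strip()]
    return {
        "dataset": dataset,  # dataset the annotations are saved to
        "stream": add_options(read_jsonl(source), labels),
        "view_id": "choice",  # multiple-choice interface
    }
```

For the binary case you would swap add_options for add_label and set "view_id" to "classification". No model is involved anywhere: every example in the source file is shown for annotation.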

Ok, that helps. Thanks for your reply. Is there any documentation that explains how the textcat classifier works (what features from the loaded model it uses, what ML classifier it is using, etc.)?

By default, Prodigy will use spaCy's TextCategorizer with the default architecture. The component is fully independent and doesn't require any of the other statistical model components (like the tagger, parser or NER). It only uses the tokens and word vectors, if they are available in the model. See the spaCy source for the implementation of the simple CNN classifier architecture.

Of course, you can always export the data and train any other text classifier. Our prodigy-recipes repo also has an example script for using active learning with a custom model (illustrated with a dummy model that "predicts" random numbers).
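As a rough sketch of the export route: after dumping a dataset to JSONL (for example with prodigy db-out), the annotations can be flattened into (text, label, is_positive) triples that any other classifier can consume. The function names here are hypothetical, and the field names ("answer", "accept", "label", "options") are assumptions based on how Prodigy tasks usually look, so check them against your own exported file.

```python
import json


def load_export(path):
    """Read exported annotations, one JSON object per line."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def to_training_triples(examples):
    """Flatten annotations into (text, label, is_positive) triples.

    Multiple-choice tasks contribute one positive triple per selected
    option. Binary tasks contribute one triple whose flag reflects the
    accept/reject decision. Ignored examples are skipped.
    """
    triples = []
    for eg in examples:
        answer = eg.get("answer")
        text = eg.get("text")
        if text is None or answer not in ("accept", "reject"):
            continue
        if "options" in eg:
            # Multiple choice: "accept" holds the ids of the selected options.
            if answer == "accept":
                for label in eg.get("accept", []):
                    triples.append((text, label, True))
        elif "label" in eg:
            # Binary: the top-level label was accepted or rejected.
            triples.append((text, eg["label"], answer == "accept"))
    return triples
```

From there the triples can be fed to whatever text classifier you prefer.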