Hi! I understand that
`textcat.teach` can use pattern matching to bootstrap the labeling of (rare) classes in text classification tasks, but I would like to know your thoughts about using zero-shot classifiers (e.g. Hugging Face transformers pipelines) instead.
In an ideal workflow, a zero-shot classifier would replace the pattern matcher, letting me quickly accept/reject the labels it proposes. As I enter more and more annotations, I would 1) measure the accuracy of my zero-shot baseline and 2) start training another model with the active-learning logic (hopefully outperforming the zero-shot baseline).
Would this make sense as an enhancement for Prodigy?
What is the best approximation of the workflow above with the current tools? I'm currently thinking about:
- running inference with a zero-shot classifier
- reviewing the labels manually, one by one, for a subset of examples to create a gold-standard set
- training a spaCy model on that set
- running `textcat.teach` with that model to refine it further
but was wondering if there are more elegant/less cumbersome ways. Thanks!
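For the first two steps, here's a minimal sketch of what I had in mind: score each text with a zero-shot pipeline and turn the top prediction into a Prodigy-style task dict for manual review. The label set, the `to_prodigy_task` helper, and the 0.5 confidence threshold are all my own illustrative assumptions, not anything Prodigy prescribes; the hard-coded `result` below just mimics the shape that the transformers zero-shot pipeline returns.

```python
# Assumed candidate classes for illustration only.
LABELS = ["billing issue", "bug report", "feature request"]

def to_prodigy_task(text, result, threshold=0.5):
    """Convert one zero-shot result (a dict with 'labels' and 'scores',
    sorted by score, as the transformers zero-shot-classification
    pipeline returns) into a Prodigy-style task dict. Hypothetical
    helper, not part of Prodigy."""
    top_label, top_score = result["labels"][0], result["scores"][0]
    return {
        "text": text,
        "label": top_label,
        "meta": {"score": round(top_score, 3)},
        # Flag low-confidence predictions so review can focus on them.
        "meta_confident": top_score >= threshold,
    }

# With transformers installed, the result would come from:
#   from transformers import pipeline
#   clf = pipeline("zero-shot-classification")
#   result = clf(text, candidate_labels=LABELS)
# Here we use a hard-coded result of the same shape instead:
result = {
    "labels": ["bug report", "billing issue", "feature request"],
    "scores": [0.91, 0.06, 0.03],
}
task = to_prodigy_task("The app crashes every time I open settings.", result)
print(task["label"], task["meta"]["score"])  # → bug report 0.91
```

The resulting dicts could then be written out as JSONL and reviewed with a manual recipe before training on the accepted examples.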