Thanks! We’re in the process of publishing the recipes on GitHub. We’re just figuring out the best way to build them into the Prodigy wheels once they’re in a separate repo. Once they’re published, pull requests will be very welcome!
About the accuracy vs F-score: This is a topic that makes me feel dumb every now and again, because it seems like it should be quite obvious, but then I find myself scratching my head.
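If the model is constrained to output exactly one class prediction per instance (and each instance has exactly one gold class), I think accuracy should be the same as micro-averaged F1, right? Every wrong prediction counts as one false positive for the predicted class and one false negative for the true class, so micro-averaged precision and recall both come out equal to accuracy. However, this no longer holds if we let the model predict multiple classes per instance, which the default spaCy text classification model is allowed to do.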
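
To make that concrete, here's a small pure-Python sketch (not spaCy's or Prodigy's actual scorer). The helper names `micro_f1` and `accuracy` and the toy labels are made up for illustration, and I'm assuming "accuracy" in the multi-label case means an exact match of the whole label set, which is only one possible definition.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over lists of label sets (one set per instance)."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # labels predicted and correct
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # labels predicted but wrong
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # gold labels that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy(gold, pred):
    """Assumption: an instance counts as correct only if the label sets match exactly."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


gold = [{"A"}, {"B"}, {"C"}, {"A"}]

# Exactly one predicted class per instance: the two metrics agree.
single_pred = [{"A"}, {"C"}, {"C"}, {"B"}]
single_acc = accuracy(gold, single_pred)
single_f1 = micro_f1(gold, single_pred)
print(single_acc, single_f1)

# Zero or several predicted classes per instance: the metrics can diverge.
multi_pred = [{"A", "B"}, {"B"}, {"C"}, set()]
multi_acc = accuracy(gold, multi_pred)
multi_f1 = micro_f1(gold, multi_pred)
print(multi_acc, multi_f1)
```

In the first case every mistake adds exactly one false positive and one false negative, so the two numbers match. In the second case the extra `"B"` and the empty prediction break that one-to-one relationship, and the numbers drift apart.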