Prodigy functionality for entity annotation, image classification, and model training


Sorry in advance for the long list of questions below. I am trying to confirm what functionality Prodigy provides.

  1. First, if we want to create multiple named entity types for labeling product concepts, such as ‘size’, ‘color’, ‘durability’, and ‘service’, could we use Prodigy’s UI to easily annotate them and build an appropriate training dataset?

  2. Regarding image classification, could we also assign images to multiple classes via the Prodigy UI? Rather than object detection, I am more interested in image classification: whether an image is a good shot, has good composition, etc. That way we could train a model to tell good images from bad ones. We need a more efficient way to build the training dataset.

  3. Could we export the annotated samples with their corresponding labels for use elsewhere?

  4. I know that during annotation, Prodigy already starts updating its model with the new data points. Could we plug in our own model, instead of the default, for this annotation learning process?

  5. Regarding the model that is constantly refined during annotation, could we export it and use it elsewhere? I am also wondering what framework the default model is built on. I am more familiar with PyTorch, so I’m not sure whether we can specify the framework used for Prodigy’s default model.

Thanks in advance to anyone who can help clarify these questions. Much appreciated.


Hi! Answers below:

Sure – you might want to check out the demo of the manual NER interface and the ner.manual recipe. Once you have a pre-trained model, you can also use that to pre-label entities and then correct them – this can often save you time, because even if your model is only correct 50% of the time, that's still 50% less work for you.
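To make the pre-labeling idea concrete, here is a minimal sketch of what pre-labeled tasks could look like: a text plus a list of character-offset spans, each with a label. It uses simple keyword matching rather than a trained model, and the keyword lists and the `prelabel` helper are made up for illustration, so treat it as one possible starting point rather than how Prodigy does it internally.

```python
import re

# Hypothetical keyword lists per product concept (illustration only).
CONCEPT_KEYWORDS = {
    "SIZE": ["small", "large"],
    "COLOR": ["red", "blue"],
}


def prelabel(text, keywords=CONCEPT_KEYWORDS):
    """Return a task dict with character-offset spans for keyword matches."""
    spans = []
    for label, words in keywords.items():
        for word in words:
            # Match whole words only, ignoring case.
            pattern = r"\b" + re.escape(word) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                spans.append(
                    {"start": match.start(), "end": match.end(), "label": label}
                )
    # Keep spans in reading order so they are easy to review.
    spans.sort(key=lambda span: span["start"])
    return {"text": text, "spans": spans}


task = prelabel("The red bag is too small")
print(task)
```

Suggestions like these can then be corrected by hand instead of annotated from scratch.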

One approach could be to use the choice interface, but with an "image" field instead of a "text" field. This will let you select one (or multiple) categories for a given image. Also see this page for an example of how a custom recipe for this could look.
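As a rough sketch of the task format for this: each task carries an "image" and a list of "options", where every option has an "id" and a display "text". The image paths, label names, and the `make_image_tasks` helper below are placeholders I made up for illustration.

```python
def make_image_tasks(image_urls, labels):
    """Yield one multiple-choice task per image.

    `image_urls` and `labels` are supplied by the caller; the values
    used in the example below are hypothetical.
    """
    options = [
        {"id": label, "text": label.replace("_", " ").title()}
        for label in labels
    ]
    for url in image_urls:
        # Copy the options so each task owns its own list of dicts.
        yield {"image": url, "options": [dict(option) for option in options]}


tasks = list(
    make_image_tasks(
        ["images/photo_001.jpg", "images/photo_002.jpg"],
        ["GOOD_SHOT", "GOOD_COMPOSITION"],
    )
)
print(tasks[0])
```

In a custom recipe, a generator like this would serve as the stream. To allow more than one selection per image, I believe the recipe config option is `"choice_style": "multiple"`, but please double-check that against the docs for your version.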

Sure – the db-out command lets you export your annotations in a handy JSONL file (newline-delimited JSON). For instance:

prodigy db-out classification_dataset > data.jsonl
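Once you have the export, reading it back is plain JSON handling. Here is a small sketch that counts the selected labels, assuming the records come from the choice interface, where the selected option IDs are stored under "accept" and the overall decision under "answer". The `count_labels` helper is my own name, not part of Prodigy.

```python
import json
from collections import Counter


def count_labels(lines):
    """Count selected option IDs across accepted annotations.

    `lines` is any iterable of JSONL strings, e.g. an open file handle.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("answer") != "accept":
            continue  # skip rejected or ignored tasks
        counts.update(record.get("accept", []))
    return counts


# Usage with the exported file (path taken from the command above):
# with open("data.jsonl", encoding="utf8") as f:
#     print(count_labels(f))
```

From there you can convert the records into whatever format your own training code expects.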

The built-in recipes for NER and text classification use spaCy, but you can always write a custom recipe to plug in your own PyTorch models. You can see an example of this here:

The example uses a super simple "dummy model" that outputs random numbers to illustrate the idea – I hope that makes it easy to follow. When implementing your own model to be updated in the loop, just make sure that it is sensitive enough to single batches of updates.
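For reference, here is a minimal sketch of what such a dummy model could look like. This is my own illustration of the idea, not the linked example: it scores incoming tasks with random numbers and only counts the answers it receives. A real PyTorch model would run a forward pass in `__call__` and an optimizer step in `update`.

```python
import random


class DummyModel:
    """Stand-in model: random scores, and updates are only counted."""

    def __init__(self, labels, seed=0):
        self.labels = list(labels)
        self.rng = random.Random(seed)
        self.n_updates = 0

    def __call__(self, stream):
        # Score each incoming task; yields (score, task) tuples.
        for task in stream:
            task = dict(task)  # avoid mutating the caller's dict
            task["label"] = self.rng.choice(self.labels)
            yield self.rng.random(), task

    def update(self, answers):
        # A real model would train on this batch of annotations here.
        self.n_updates += len(answers)
        return self.n_updates


model = DummyModel(["GOOD", "BAD"])
for score, task in model([{"text": "first"}, {"text": "second"}]):
    print(round(score, 3), task)
print(model.update([{"answer": "accept"}]))
```

In a custom recipe, the scored stream and the `update` method would be wired in as the recipe's "stream" and "update" components. That wiring is an assumption on my part here, so check the custom recipes documentation for the exact signature.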