Thanks for sharing more details!
It sounds like for your use case, having a solid experiment workflow will be very valuable: you want to monitor how your model is improving, intervene if it’s not, and have incremental, reproducible steps that build up your dataset.
One idea for a workflow could be: every day, week or whatever interval works for you, you automatically export the new additions from your CMS and save them in an easy-to-read format, e.g. Prodigy’s JSONL. For each annotator, you then (automatically or manually) start up an instance of Prodigy on a separate port with the data and set it up to save to a separate dataset, like `week32_annotator1`.
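If you want to script that part, here's a minimal sketch of a launcher you'd run once per annotator. The file names, labels and recipe are placeholders for whatever fits your pipeline; in recent Prodigy versions, `prodigy.serve` takes the recipe command as a string, plus config overrides as keyword arguments:

```python
# launch_annotation.py -- minimal sketch, run one process per annotator.
# File names, labels and the recipe here are placeholders for your setup.
import sys
import prodigy

annotator = sys.argv[1]          # e.g. "annotator1"
port = int(sys.argv[2])          # e.g. 8081 -- a separate port per annotator
week = sys.argv[3]               # e.g. "week32"

dataset = f"{week}_{annotator}"  # separate dataset per annotator and batch

# Start the Prodigy server for this annotator. In recent versions,
# prodigy.serve accepts the recipe command as a string, and keyword
# arguments override the prodigy.json config (here: the port).
prodigy.serve(
    f"ner.manual {dataset} blank:en ./{week}.jsonl --label PERSON,ORG",
    port=port,
)
```

So `python launch_annotation.py annotator1 8081 week32` would serve that week's export to the first annotator and save their work to `week32_annotator1`.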
In the beginning, you may want to ask the annotators to label the data by hand and then use that to train the model. But later on, you could slowly transition to a workflow that has the annotators review the model’s predictions. In that case, you’d run your model over the data, add the label predicted by the model and then use a binary recipe that lets your annotators accept or reject (which should be super fast, too).
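For the pre-labelling step, one way to do it is to run the model over the exported JSONL and attach the top prediction to each task before serving it. Here's a rough sketch, assuming a spaCy text classification model (the model path and file names are placeholders):

```python
# make_binary_stream.py -- rough sketch: pre-label incoming texts with the
# model's top prediction so annotators only have to accept or reject.
import spacy
import srsly

nlp = spacy.load("./textcat_model")  # placeholder path to your trained model

def add_predictions(examples):
    for eg in examples:
        doc = nlp(eg["text"])
        # Attach the highest-scoring label so the UI can display it
        eg["label"] = max(doc.cats, key=doc.cats.get)
        yield eg

stream = srsly.read_jsonl("./week32.jsonl")
srsly.write_jsonl("./week32_prelabelled.jsonl", add_predictions(stream))
```

You could then serve the pre-labelled file with a binary interface, e.g. the `mark` recipe with `--view-id classification`, so each task becomes a single accept/reject decision.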
Once the annotation is done (e.g. once every incoming example is present in every annotator’s dataset), you can get the data and run some metrics over it. Prodigy assigns hashes to each example, which makes it easy to find identical examples across datasets. For example, you might want to check whether annotators agree and which examples are “controversial”.
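Since the hashes are stored right on the examples (`_input_hash` describes the input, `_task_hash` the input plus what's being annotated), checking agreement can be as simple as grouping answers by hash. A minimal sketch for binary annotations, using the dataset naming scheme from above:

```python
# agreement.py -- minimal sketch: find examples annotators disagree on,
# using Prodigy's hashes. Dataset names follow the scheme from above.
from collections import defaultdict
from prodigy.components.db import connect

db = connect()  # connects using your prodigy.json settings

answers = defaultdict(dict)  # _input_hash -> {dataset: answer}
for name in ("week32_annotator1", "week32_annotator2"):
    for eg in db.get_dataset(name):
        # For binary annotations the decision lives in eg["answer"];
        # for manual labelling you'd compare the spans or labels instead.
        answers[eg["_input_hash"]][name] = eg.get("answer")

controversial = [
    h for h, by_dataset in answers.items()
    if len(set(by_dataset.values())) > 1
]
print(f"{len(controversial)} examples with disagreement")
```

That list of hashes gives you the “controversial” examples to look at first when you review your label scheme.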
This is especially important in the beginning while you’re still figuring out your process and label scheme. Maybe it turns out that one label is particularly difficult to assign, so you might decide to revise the label scheme, or provide better annotation guidelines. That kinda stuff always sounds simple and trivial, but it’s actually one of the biggest bottlenecks we’ve seen, along with making reasonable choices of what to train.