For our use case, we want to start from scratch, streaming a raw dataset from a Kafka queue. We're set up to label custom entities in text without a model in the loop.
We have a single, shared Postgres instance as the database.
We'd like to be able to go back into a dataset and add labels to it; for example, one labeler might be annotating phone numbers, and another labeler might come along 6 months later and want to add an address label to the same data.
Is there currently support for updating an existing dataset that's built from a streaming queue?
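For context, this is roughly how we feed the queue into the annotation stream. The sketch below is illustrative, not our exact code: the message format, the `source` meta value, and the `stream_tasks` helper name are all assumptions, and the Kafka connection itself is only shown in a comment (via the third-party kafka-python client) so the example stays self-contained:

```python
import json

def stream_tasks(messages):
    """Convert raw queue messages (JSON strings) into annotation task dicts
    with a "text" key, the shape the annotation tool expects."""
    for raw in messages:
        record = json.loads(raw)
        yield {"text": record["text"], "meta": {"source": "kafka"}}

# In a real loader, `messages` would come from the queue, e.g. with the
# kafka-python client:
#   from kafka import KafkaConsumer
#   consumer = KafkaConsumer("my-topic", bootstrap_servers="localhost:9092")
#   messages = (m.value for m in consumer)
# Here we stand in a fake message so the sketch runs on its own.
fake_messages = ['{"text": "Call 555-0199 for details."}']
tasks = list(stream_tasks(fake_messages))
```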
The ner.manual recipe respects pre-defined entities, so the annotator will see everything that was labelled before, can correct the spans, and can also add new entities.
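To illustrate, a task with pre-defined entities carries them in a "spans" list of character offsets alongside the text; when it's loaded into ner.manual, those spans show up as highlighted entities. The text, offsets, and the PHONE/ADDRESS labels below are made up for this sketch:

```python
# A task carrying an earlier annotator's phone-number span.
task = {
    "text": "Call 555-0199 to reach our office.",
    "spans": [
        # "555-0199" occupies characters 5-13 of the text
        {"start": 5, "end": 13, "label": "PHONE"},
    ],
}

# A later annotator can add a new span to the same task without
# touching the existing one.
task["spans"].append({"start": 23, "end": 33, "label": "ADDRESS"})
```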
While you can technically add the new annotations to the old dataset, I'd still recommend creating a new one when you re-annotate an existing set later. This gives you a cleaner separation, and if something goes wrong, you'll always have a record of the previous set.