We are manually tagging entities in the paragraphs extracted from several documents.
The JSON generated by extracting all the paragraphs has around 10,000 paras to be labeled.
We are halfway through the labeling activity and have realized there was a mistake in how the paras were generated. We know how to fix it.
Problem:
- We first need to identify paras which are not labeled yet.
- Apply the fix to these paras.
- Update the input JSON for Prodigy so that it contains only these fixed, unlabeled paras.
What is the best way to identify the unlabeled paras, other than comparing the input JSON against the SQLite database dump?
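For reference, the comparison-based approach we would like to avoid looks roughly like the sketch below. It is only an illustration under assumptions: both the input paras and the exported annotations (for instance from a `prodigy db-out` export) are taken to be lists of dicts with a `text` key, and `fix_para`, `unlabeled_paras` and `build_new_input` are hypothetical names, with `fix_para` standing in for our actual fix.

```python
import json


def unlabeled_paras(input_paras, annotated, key="text"):
    """Return the input paras whose text is not in the annotated export.

    Assumption: annotated records keep the original text under the same key.
    """
    done = {eg[key] for eg in annotated}
    return [p for p in input_paras if p[key] not in done]


def fix_para(para):
    """Hypothetical placeholder for the real fix.

    Here it only strips surrounding whitespace from the text.
    """
    return {**para, "text": para["text"].strip()}


def build_new_input(input_paras, annotated):
    """Identify unlabeled paras, apply the fix, and return the new input."""
    return [fix_para(p) for p in unlabeled_paras(input_paras, annotated)]


if __name__ == "__main__":
    # Tiny made-up records, only to show the shape of the data.
    input_paras = [{"text": " first para "}, {"text": "second para"}]
    annotated = [{"text": "second para", "answer": "accept"}]
    for para in build_new_input(input_paras, annotated):
        # One JSON object per line, i.e. JSONL.
        print(json.dumps(para))
```

Matching on the raw text is the weak point here: it assumes the text stored with each annotation is identical to the text in the input JSON, which is part of why we are hoping for a more direct way.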
Thank You