Datasets and using pre-annotated data

If you look at the “Annotation task formats” section in your PRODIGY_README.html, you’ll find the exact JSON format that Prodigy expects for pre-annotated data for each annotation type (NER, text classification, etc.). The format is pretty straightforward: each example usually has a "text" plus either a "label" or "spans", depending on what you’re annotating, so you can convert your pre-annotated data accordingly. For named entity recognition, for example, you’ll need the text along with the start and end character offsets and the label of each entity in that text.
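
To make the conversion concrete, here's a minimal sketch of what it could look like in Python. The helper names (`make_ner_task`, `make_textcat_task`, `to_jsonl`) and the example sentence are made up for illustration and aren't part of Prodigy. The "start", "end" and "label" keys inside each span follow the description above, and writing one JSON object per line is an assumption, so double-check both against the README for your version.

```python
import json


def make_ner_task(text, entities):
    """Build one NER example from (start, end, label) character offsets."""
    spans = []
    for start, end, label in entities:
        # Offsets are character positions into the text, with end exclusive.
        if not 0 <= start < end <= len(text):
            raise ValueError(f"bad offsets {start}-{end} for text of length {len(text)}")
        spans.append({"start": start, "end": end, "label": label})
    return {"text": text, "spans": spans}


def make_textcat_task(text, label):
    """Build one text classification example with a single label."""
    return {"text": text, "label": label}


def to_jsonl(tasks):
    """Serialize examples as one JSON object per line (assumed input format)."""
    return "\n".join(json.dumps(task) for task in tasks)


text = "Apple is opening an office in Berlin."
ner_task = make_ner_task(text, [(0, 5, "ORG"), (30, 36, "GPE")])
textcat_task = make_textcat_task("The match went to extra time.", "SPORTS")
jsonl = to_jsonl([ner_task])
```

The offset check is there because off-by-one errors in character offsets are the easiest thing to get wrong when converting existing annotations. A quick sanity check is to slice the text with each span and confirm you get the entity back, e.g. `text[30:36]` should give `"Berlin"`.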
