Hi,
Our group has a project that requires us to import manually annotated free text notes with custom entities and relationships from brat into Prodigy. At this step, we just want to visually inspect the annotated file in Prodigy.
I believe db-in would be the recipe to use, but I am unsure how to attach both the unlabeled free text note and the annotations that were created manually in brat. I have both the note and the annotations converted to JSON, with the annotations containing the proper spans and labels for the relationships. I don’t think this is a unique use case, but I was unable to find any guidance on how to achieve this.
A related use case is to take a UIMA CAS XMI file (machine-annotated output, again converted to JSON) and compare it side by side with the manually annotated note. Is there a way to do side-by-side annotation exploration of a document in Prodigy?
The db-in command is mostly useful if you have existing annotations and want to add them to a dataset, so you can use Prodigy to train a model from them later on. If you only want to load the annotations into Prodigy and inspect them, you could also load in your JSON as the source data, and then run the mark recipe, which will show you whatever comes in and render it with a given interface. For example:
prodigy mark your_dataset your_converted_data.json --view-id ner
The "Annotation task formats" section in your PRODIGY_README.html has more info on how exactly the data should look for different types of annotations. For NER, that would be something like this:
{
  "text": "Apple updates its analytics service with new metrics",
  "spans": [
    {"start": 0, "end": 5, "label": "ORG"}
  ]
}
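If your brat export is still in standoff format (.txt/.ann pairs), a quick conversion script could look roughly like this – the paths are just placeholders, and it only pulls out the entity spans (the T lines), skipping relations and assuming no discontinuous spans, so adjust as needed:

import json
from pathlib import Path

def brat_to_prodigy(txt_path, ann_path):
    # One Prodigy task per note: the raw text plus its entity spans
    text = Path(txt_path).read_text(encoding="utf8")
    spans = []
    for line in Path(ann_path).read_text(encoding="utf8").splitlines():
        if not line.startswith("T"):  # entity annotations only – skip relations etc.
            continue
        parts = line.split("\t")      # e.g. "T1\tORG 0 5\tApple"
        label, start, end = parts[1].split(" ")[:3]
        spans.append({"start": int(start), "end": int(end), "label": label})
    return {"text": text, "spans": spans}

# Write one task per line (newline-delimited JSON), which mark can load directly
with open("converted_notes.jsonl", "w", encoding="utf8") as out_file:
    for txt_file in Path("brat_export").glob("*.txt"):
        task = brat_to_prodigy(txt_file, txt_file.with_suffix(".ann"))
        out_file.write(json.dumps(task) + "\n")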
The annotation interface best suited for this would probably be compare – see here for a demo. It's mostly designed for quick and efficient A/B evaluation and also supports an additional "input" field at the top, so you could render the original raw text plus two different annotated versions.
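The exact fields are listed under "Annotation interfaces" in your PRODIGY_README.html, but from memory a compare task looks roughly like this – the "input" block is rendered at the top, and "accept"/"reject" hold the two versions you're comparing (the example values here are just placeholders):

{
  "id": 1,
  "input": {"text": "Apple updates its analytics service with new metrics"},
  "accept": {"text": "version A, e.g. the manual brat annotations"},
  "reject": {"text": "version B, e.g. the UIMA CAS output"}
}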
That’s strange – Prodigy doesn’t do anything special here and just reads in the file (choosing the loader based on the --loader argument, or otherwise guessing it from the file extension). So I suspect there’s something else going on here. It really shouldn’t matter how your JSON is formatted, as long as it’s valid.
How did you save the file and how did you name it? If you accidentally name a regular JSON file .jsonl (newline-delimited JSON), this can lead to no data being loaded, because the loader reads the data line by line instead of parsing the whole file as JSON.
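If that's what happened, the fix is easy – either rename the file to .json, or convert it to actual newline-delimited JSON. For example, assuming the file contains one top-level list of tasks:

import json

# Read a regular JSON file (one top-level list of tasks) and write it
# back out as newline-delimited JSON, one task per line
with open("your_converted_data.json", encoding="utf8") as f:
    tasks = json.load(f)

with open("your_converted_data.jsonl", "w", encoding="utf8") as f:
    for task in tasks:
        f.write(json.dumps(task) + "\n")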