Importing existing custom annotated data from brat

Hi,
Our group has a project that requires us to import manually annotated free text notes with custom entities and relationships from brat into Prodigy. At this step, we just want to visually inspect the annotated file in Prodigy.

I believe db-in would be the recipe to use, but I am unsure how to attach both the unlabeled free text note and the annotations associated with what had been manually done in brat. I have both the note and the annotations converted to JSON, with the annotations having the proper span and labels for the relationships. I don’t think this is a unique use case, but was unable to find any guidance on how to achieve this.

A related use case is to take a UIMA CAS xmi file (from machine annotated output, again converted to JSON) and with it, do a side-by-side comparison with the manually annotated note. Is there a way to do side-by-side annotation exploration of a document in Prodigy?

Thanks very much in advance!

The db-in command is mostly useful if you have existing annotations and want to add them to a dataset, so you can use Prodigy to train a model from them later on. If you only want to load the annotations into Prodigy and inspect them, you could also load in your JSON as the source data, and then run the mark recipe, which will show you whatever comes in and render it with a given interface. For example:

prodigy mark your_dataset your_converted_data.json --view-id ner

The "Annotation task formats" section in your PRODIGY_README.html has more info on how exactly the data should look for different types of annotations. For NER, that would be something like this:

{
    "text": "Apple updates its analytics service with new metrics",
    "spans": [
        {"start": 0, "end": 5, "label": "ORG"}
    ]
}
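If it helps, here is a minimal sketch of how a note and its brat standoff annotations could be merged into that shape. It assumes the standard brat `.ann` layout for text-bound annotations (`T1<TAB>LABEL START END<TAB>covered text`); the function name `brat_to_task` is made up for this example, and relations and discontinuous spans are simply skipped, so treat it as a starting point rather than a complete converter.

```python
import json

def brat_to_task(text, ann):
    """Build one task dict from a note's text and the content of its brat .ann file."""
    spans = []
    for line in ann.splitlines():
        # Only text-bound annotations ("T" lines) are handled in this sketch;
        # relations ("R" lines), events, attributes and notes are skipped.
        if not line.startswith("T"):
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            continue
        label, _, offsets = parts[1].partition(" ")
        if ";" in offsets:
            # Discontinuous span (several offset pairs); skipped here.
            continue
        start, end = (int(n) for n in offsets.split())
        spans.append({"start": start, "end": end, "label": label})
    spans.sort(key=lambda s: (s["start"], s["end"]))
    return {"text": text, "spans": spans}

text = "Apple updates its analytics service with new metrics"
ann = "T1\tORG 0 5\tApple\nT2\tPRODUCT 18 35\tanalytics service\n"
task = brat_to_task(text, ann)

# Wrap the task in a list to get a regular .json file like the ones in this thread.
print(json.dumps([task], indent=4))
```

The `PRODUCT` label and the second annotation are invented for the example; your own custom labels would pass through unchanged.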

For the side-by-side comparison, the annotation interface best suited for this would probably be compare (a demo is available online). It's mostly designed for quick and efficient A/B evaluation and also supports an additional "input" field at the top. So you could render the original raw text and two different annotated versions.
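As a rough sketch of what such a task could look like: the key names `id`, `input`, `accept` and `reject` below are my recollection of the compare format, so please check them against the "Annotation task formats" section of your PRODIGY_README.html before relying on them. Whether the spans are highlighted inside each panel is also an assumption to verify. The helper name `make_compare_task` and the example spans are made up for illustration.

```python
import json

def make_compare_task(task_id, raw_text, manual_spans, machine_spans):
    """Pair the manual and machine annotations of one note in a single task.

    Assumed layout: "input" holds the raw text shown at the top, and
    "accept" / "reject" hold the two versions to compare.
    """
    return {
        "id": task_id,
        "input": {"text": raw_text},
        "accept": {"text": raw_text, "spans": manual_spans},
        "reject": {"text": raw_text, "spans": machine_spans},
    }

text = "Apple updates its analytics service with new metrics"
manual = [{"start": 0, "end": 5, "label": "ORG"}]
machine = [{"start": 0, "end": 5, "label": "ORG"},
           {"start": 18, "end": 35, "label": "PRODUCT"}]

compare_task = make_compare_task(1, text, manual, machine)
print(json.dumps([compare_task], indent=4))
```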

Thanks! Very much appreciated.


Hi,
A few questions. I tried the following JSON from your example for importing manual annotations:

Representation 1. 
[
{
    "text": "Apple updates its analytics service with new metrics",
    "spans": [{"start": 0, "end": 5, "label": "ORG"}
    ]
}
]

Representation 2.
[{
	"text": "Apple updates its analytics service with new metrics",
	"spans": [{
		"start": 0,
		"end": 5,
		"label": "ORG"
	}]
}]

While both are valid JSON, only representation 2 rendered in Prodigy (representation 1 gave "No tasks available.").

My other question is how to deal with multiple labels on the same text. I tried

[{
	"text": "Apple updates its analytics service with new metrics",
	"spans": [{
		"start": 0,
		"end": 5,
		"label": "ORG"
	}, {
		"start": 13,
		"end": 15,
		"label": "TEST"
	}]

}] 

but it renders as

No tasks available.

in Prodigy. We need to input a text item with multiple labels and cannot see any other way to do it. Am I missing something?

Thanks very much in advance!

Given the format of the text files we’re using, we need to use JSON rather than JSONL, for practical reasons.

Okay, I answered my 2nd question. Apparently, the space between the closing two }] was the culprit. So, this renders fine:

[{
	"text": "Apple updates its analytics service with new metrics",
	"spans": [{
		"start": 0,
		"end": 5,
		"label": "ORG"
	}, {
		"start": 13,
		"end": 15,
		"label": "ORG"
	}]
}]

So, while both are valid JSON, only the 2nd form worked. This seems rather strange.

That’s strange. Prodigy doesn’t do anything special here and just reads in the file (choosing the loader based on the --loader argument or otherwise guessing it from the file extension). So I suspect there’s something else going on here. It really shouldn’t matter how your JSON is formatted, as long as it’s valid.
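One way to rule out the file itself is to parse it with plain Python before loading it. The sketch below is not part of Prodigy; `check_tasks` is a hypothetical helper that only uses the standard `json` module and checks that each task has a text and that every span has a label and offsets inside that text.

```python
import json

def check_tasks(raw):
    """Return a list of problems found in a JSON array of tasks (empty if none)."""
    try:
        tasks = json.loads(raw)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(tasks, list):
        return ["top level is not a list of tasks"]
    problems = []
    for i, task in enumerate(tasks):
        if not isinstance(task, dict) or not isinstance(task.get("text"), str):
            problems.append(f"task {i}: missing 'text' string")
            continue
        length = len(task["text"])
        for j, span in enumerate(task.get("spans", [])):
            start, end = span.get("start"), span.get("end")
            in_range = (
                isinstance(start, int) and isinstance(end, int)
                and 0 <= start < end <= length
            )
            if not in_range:
                problems.append(f"task {i}, span {j}: bad offsets {start}-{end}")
            if "label" not in span:
                problems.append(f"task {i}, span {j}: missing 'label'")
    return problems

compact = '[{"text": "Apple updates", "spans": [{"start": 0, "end": 5, "label": "ORG"}]}]'
# Same data with different line breaks, a blank line and a trailing space.
spaced = """[
{
    "text": "Apple updates",
    "spans": [{"start": 0, "end": 5, "label": "ORG"}
    ]
}

] """

print(check_tasks(compact), check_tasks(spaced))
```

Both layouts parse to the same data here, which is why whitespace alone shouldn't make a difference to a JSON parser.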

How did you save the file and how did you name it? If you accidentally name a regular JSON file .jsonl (newline-delimited JSON), this can lead to no data being loaded, since the loader is reading in the data line-by-line instead of parsing the whole file as JSON.
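To illustrate the point about line-by-line reading, here is a small sketch in plain Python. It is not Prodigy's actual loader, and how the real loader reacts to unparseable lines is an assumption; `read_jsonl` is a hypothetical stand-in that skips lines it cannot parse and counts them.

```python
import json

def read_jsonl(raw):
    """Parse newline-delimited JSON: every non-empty line must be a complete JSON value."""
    tasks, errors = [], 0
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            tasks.append(json.loads(line))
        except json.JSONDecodeError:
            errors += 1  # a fragment such as "[{" is not valid JSON on its own
    return tasks, errors

# A regular, pretty-printed JSON file: valid as a whole, but not line by line.
pretty = """[{
    "text": "Apple updates its analytics service with new metrics",
    "spans": [{"start": 0, "end": 5, "label": "ORG"}]
}]"""

# The same task written as one line, which is what a JSONL reader expects.
one_per_line = '{"text": "Apple updates its analytics service with new metrics", "spans": [{"start": 0, "end": 5, "label": "ORG"}]}'

print(len(json.loads(pretty)))        # whole-file parse finds the task
print(read_jsonl(pretty))             # line-by-line parse finds no tasks
print(len(read_jsonl(one_per_line)[0]))
```

Parsed as a whole, the pretty-printed file yields one task; read line by line, none of its lines is a complete JSON value, so nothing is loaded.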

Nothing special going on, and the file extension was .json.

I am in the process of converting our large files with tons of annotations over to JSON, so we’ll let you know of any oddities that may pop up.

Just completed my first successful test with one of our fully annotated text files. We’ll chalk the initial formatting issue up to a fluke.

Thanks!