Importing existing custom annotated data from brat

GregSilverman · September 19, 2018, 1:01am

Hi,
Our group has a project that requires us to import manually annotated free text notes with custom entities and relationships from brat into Prodigy. At this step, we just want to visually inspect the annotated file in Prodigy.

I believe db-in would be the recipe to use, but I am unsure how to attach both the unlabeled free text note and the annotations that were made manually in brat. I have both the note and the annotations converted to JSON, with the annotations having the proper spans and labels for the relationships. I don’t think this is a unique use case, but I was unable to find any guidance on how to achieve this.

A related use case is to take a UIMA CAS xmi file (from machine annotated output, again converted to JSON) and with it, do a side-by-side comparison with the manually annotated note. Is there a way to do side-by-side annotation exploration of a document in Prodigy?

Thanks very much in advance!

ines · September 19, 2018, 9:58am

The db-in command is mostly useful if you have existing annotations and want to add them to a dataset, so you can use Prodigy to train a model from them later on. If you only want to load the annotations into Prodigy and inspect them, you could also load in your JSON as the source data, and then run the mark recipe, which will show you whatever comes in and render it with a given interface. For example:

prodigy mark your_dataset your_converted_data.json --view-id ner

The "Annotation task formats" section in your PRODIGY_README.html has more info on how exactly the data should look for different types of annotations. For NER, that would be something like this:

{
    "text": "Apple updates its analytics service with new metrics",
    "spans": [
        {"start": 0, "end": 5, "label": "ORG"}
    ]
}
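For readers starting from brat's standoff files, here is a minimal sketch of how text-bound annotations could be mapped onto the "text"/"spans" format above. The function name brat_to_prodigy is hypothetical, and the sketch assumes the usual brat .ann layout (ID, tab, "LABEL start end", tab, covered text). Relations, events, attributes and discontinuous spans are skipped rather than converted.

```python
import json


def brat_to_prodigy(text, ann_source):
    """Convert brat text-bound annotations (T lines) into one task dict.

    Hypothetical helper: only handles contiguous text-bound annotations.
    """
    spans = []
    for line in ann_source.splitlines():
        # Text-bound annotations have IDs starting with "T". Relations (R),
        # events (E), attributes (A) and notes (#) are ignored in this sketch.
        if not line.startswith("T"):
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            continue
        label, _, offsets = parts[1].partition(" ")
        if ";" in offsets:
            # Discontinuous span ("0 5;10 15"): not handled here.
            continue
        start, end = (int(value) for value in offsets.split())
        spans.append({"start": start, "end": end, "label": label})
    spans.sort(key=lambda span: span["start"])
    return {"text": text, "spans": spans}


text = "Apple updates its analytics service with new metrics"
# Made-up .ann content for illustration only.
ann = "T1\tORG 0 5\tApple\nR1\tSomeRelation Arg1:T1 Arg2:T1\n"

task = brat_to_prodigy(text, ann)
print(json.dumps([task], indent=4))
```

The resulting list can be written to a .json file and used as the source for the mark command shown earlier.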

For the side-by-side comparison, the annotation interface best suited would probably be compare. It's mostly designed for quick and efficient A/B evaluation and also supports an additional "input" field at the top. So you could render the original raw text and two different annotated versions.
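As a rough sketch of what one such comparison task might look like: the "id", "input", "accept" and "reject" keys below follow my understanding of the compare task format and should be checked against the "Annotation task formats" section of the README. The helper names and the machine-generated spans are made up for illustration. Because it isn't established here whether highlighted spans render inside the two compared versions, the sketch marks entities inline as plain text.

```python
import json


def mark_spans(text, spans):
    """Render non-overlapping spans inline as [covered text|LABEL]."""
    pieces, position = [], 0
    for span in sorted(spans, key=lambda s: s["start"]):
        pieces.append(text[position:span["start"]])
        covered = text[span["start"]:span["end"]]
        pieces.append("[" + covered + "|" + span["label"] + "]")
        position = span["end"]
    pieces.append(text[position:])
    return "".join(pieces)


def make_compare_task(task_id, text, manual_spans, machine_spans):
    """Hypothetical helper: build one task with raw text plus two versions."""
    return {
        "id": task_id,
        "input": {"text": text},
        "accept": {"text": mark_spans(text, manual_spans)},
        "reject": {"text": mark_spans(text, machine_spans)},
    }


text = "Apple updates its analytics service with new metrics"
manual = [{"start": 0, "end": 5, "label": "ORG"}]
# Invented machine output, for illustration only.
machine = [
    {"start": 0, "end": 5, "label": "ORG"},
    {"start": 18, "end": 27, "label": "PRODUCT"},
]

compare_task = make_compare_task(1, text, manual, machine)
print(json.dumps(compare_task, indent=4))
```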

GregSilverman · September 19, 2018, 2:02pm

Thanks! Very much appreciated.

GregSilverman · September 28, 2018, 7:46pm

Hi,
A few questions. I tried the following JSON from your example for importing manual annotations:

Representation 1. 
[
{
    "text": "Apple updates its analytics service with new metrics",
    "spans": [{"start": 0, "end": 5, "label": "ORG"}
    ]
}
]

Representation 2.
[{
	"text": "Apple updates its analytics service with new metrics",
	"spans": [{
		"start": 0,
		"end": 5,
		"label": "ORG"
	}]
}]

While both are valid JSON, only representation 2 rendered in Prodigy (representation 1 gave "No tasks available.").

My other question is how to deal with multiple labels on the same text. I tried

[{
	"text": "Apple updates its analytics service with new metrics",
	"spans": [{
		"start": 0,
		"end": 5,
		"label": "ORG"
	}, {
		"start": 13,
		"end": 15,
		"label": "TEST"
	}]

}]

but, it renders as,

No tasks available.

in Prodigy. We need to input a text item with multiple labels and cannot see any other way to do it. Am I missing something?

Thanks very much in advance!

GregSilverman · September 28, 2018, 9:12pm

Given the format of the text files we’re using, we need to use json, versus jsonl, for practical reasons.

GregSilverman · September 28, 2018, 9:40pm

Okay, I answered my 2nd question. Apparently, the space between the closing two }] was the culprit. So, this renders fine:

[{
	"text": "Apple updates its analytics service with new metrics",
	"spans": [{
		"start": 0,
		"end": 5,
		"label": "ORG"
	}, {
		"start": 13,
		"end": 15,
		"label": "ORG"
	}]
}]

So, while both are valid JSON, only the 2nd form worked. This seems rather strange.
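When converting many files, it can help to check which characters each span actually covers before loading the data. A minimal sketch (covered_text is a hypothetical helper name) applied to the example above:

```python
text = "Apple updates its analytics service with new metrics"
spans = [
    {"start": 0, "end": 5, "label": "ORG"},
    {"start": 13, "end": 15, "label": "ORG"},
]


def covered_text(text, spans):
    """Return the substring each span covers, in the order given."""
    return [text[span["start"]:span["end"]] for span in spans]


span_texts = covered_text(text, spans)
for span, piece in zip(spans, span_texts):
    # repr() makes leading or trailing whitespace visible.
    print(span["start"], span["end"], span["label"], repr(piece))
```

Here the second span covers " i" (a space plus the first letter of "its"), so its offsets do not line up with word boundaries. Whether that affects rendering is not established in this thread, but a check like this makes such offsets easy to spot.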

ines · September 29, 2018, 7:49am

That’s strange – Prodigy doesn’t do anything special here and just reads in the file (choosing the loader based on the --loader argument or otherwise guessing it from the file extension). So I suspect there’s something else going on here. It really shouldn’t matter how your JSON is formatted, as long as it’s valid.

How did you save the file and how did you name it? If you accidentally name a regular JSON file .jsonl (newline-delimited JSON), this can lead to no data being loaded, since the loader is reading in the data line-by-line instead of parsing the whole file as JSON.
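To illustrate the difference, here is a small sketch that parses the same pretty-printed content once as a whole JSON document and once line by line. The two functions are illustrative stand-ins, not Prodigy's actual loaders, and silently skipping unparseable lines is an assumption made for the demonstration.

```python
import json

# Pretty-printed JSON, formatted like "Representation 1" in this thread.
pretty = """[
{
    "text": "Apple updates its analytics service with new metrics",
    "spans": [{"start": 0, "end": 5, "label": "ORG"}
    ]
}
]"""


def parse_as_json(source):
    """Parse the whole string as a single JSON document."""
    return json.loads(source)


def parse_as_jsonl(source):
    """Parse line by line, keeping only lines that are complete JSON objects."""
    tasks = []
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: lines that aren't valid JSON are skipped
        if isinstance(record, dict):
            tasks.append(record)
    return tasks


whole = parse_as_json(pretty)
by_line = parse_as_jsonl(pretty)
print(len(whole), "task(s) parsed as JSON;", len(by_line), "parsed line by line")

# Writing one compact object per line gives content a line-based reader can use.
jsonl = "\n".join(json.dumps(task) for task in whole)
round_trip = parse_as_jsonl(jsonl)
```

No single line of the pretty-printed file is a complete JSON object, so the line-by-line pass finds nothing to load, which matches the symptom of an empty task queue.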

GregSilverman · September 29, 2018, 2:54pm

Nothing special going on, and the file extension was .json.

I am in the process of converting our large files with tons of annotations over to JSON, so we’ll let you know of any oddities that may pop up.

…

Just completed my first successful test with one of our fully annotated text files. We’ll chalk the initial formatting issue up to a fluke.

Thanks!
