✨ Demo: fully manual NER annotation interface

imranarshad · January 15, 2018, 4:23pm

I've been following your example below

{
    "text": "Hello Apple",
    "tokens": [
        {"text": "Hello", "start": 0, "end": 5, "id": 0},
        {"text": "Apple", "start": 6, "end": 11, "id": 1}
    ],
    "spans": [
        {"start": 6, "end": 11, "startIdx": 1, "endIdx": 2, "label": "ORG"}
    ]
}

This is what my task output looks like:

{
    "text": "Unspecified malignant neoplasm of skin of unspecified part of face",
    "_input_hash": 1363891672,
    "_task_hash": 35627368,
    "tokens": [
        {
            "text": "Unspecified",
            "start": 0,
            "end": 11,
            "id": 0
        },
        {
            "text": "malignant",
            "start": 12,
            "end": 21,
            "id": 1
        },
        {
            "text": "neoplasm",
            "start": 22,
            "end": 30,
            "id": 2
        },
        {
            "text": "of",
            "start": 31,
            "end": 33,
            "id": 3
        },
        {
            "text": "skin",
            "start": 34,
            "end": 38,
            "id": 4
        },
        {
            "text": "of",
            "start": 39,
            "end": 41,
            "id": 5
        },
        {
            "text": "unspecified",
            "start": 42,
            "end": 53,
            "id": 6
        },
        {
            "text": "part",
            "start": 54,
            "end": 58,
            "id": 7
        },
        {
            "text": "of",
            "start": 59,
            "end": 61,
            "id": 8
        },
        {
            "text": "face",
            "start": 62,
            "end": 66,
            "id": 9
        }
    ],
    "spans": [
        {
            "text": "skin",
            "start": 34,
            "end": 38,
            "startIdx": 4,
            "endIdx": 5,
            "label": "BODY_PART"
        },
        {
            "text": "face",
            "start": 62,
            "end": 66,
            "startIdx": 9,
            "endIdx": 10,
            "label": "BODY_PART"
        }
    ]
}
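One way to sanity-check a task like this before loading it is to verify that every span lines up with the token list. The following is only a sketch, not part of Prodigy: `check_spans` is a hypothetical helper, and it assumes that `endIdx` is exclusive (one past the last token), as the "Hello Apple" example suggests.

```python
def check_spans(task):
    """Return a list of problems; an empty list means spans and tokens agree.

    Assumption: "endIdx" is exclusive, as in the "Hello Apple" example.
    """
    text = task["text"]
    tokens = task["tokens"]
    problems = []
    # Each token's character offsets should slice its own text out of the input.
    for tok in tokens:
        if text[tok["start"]:tok["end"]] != tok["text"]:
            problems.append("token %d: offsets do not match text" % tok["id"])
    # Each span should start and end exactly on the tokens it claims to cover.
    for span in task.get("spans", []):
        covered = tokens[span["startIdx"]:span["endIdx"]]
        if not covered:
            problems.append("span %r covers no tokens" % span["label"])
            continue
        if covered[0]["start"] != span["start"]:
            problems.append("span %r: start is not on a token boundary" % span["label"])
        if covered[-1]["end"] != span["end"]:
            problems.append("span %r: end is not on a token boundary" % span["label"])
    return problems


task = {
    "text": "Hello Apple",
    "tokens": [
        {"text": "Hello", "start": 0, "end": 5, "id": 0},
        {"text": "Apple", "start": 6, "end": 11, "id": 1},
    ],
    "spans": [
        {"start": 6, "end": 11, "startIdx": 1, "endIdx": 2, "label": "ORG"}
    ],
}
problems = check_spans(task)
print(problems)
```

An empty list only says the offsets are internally consistent; it does not say whether the interface will accept the task.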

Not sure what I'm doing wrong. I'm using the latest versions, spacy==2.0.5 and prodigy==1.2.0.

Thanks

ines · January 15, 2018, 9:12pm

Thanks for sharing the example – I'll try it out and have a look.

As I mentioned above, annotating pre-defined spans in the manual interface is not "officially" supported in Prodigy v1.2.0 – so everything you're doing here is pretty experimental and may not work perfectly. The updated interface and new ner.make-gold workflow coming in v1.3.0 will implement all of the required changes in the core library, so you'll be able to use this workflow out-of-the-box.

imranarshad · January 15, 2018, 11:03pm

Any estimate of how long we may have to wait?

ines · January 16, 2018, 12:38pm

@imranarshad There’ll hopefully be another update this week. We do want to get a few other changes and fixes in as well.

imranarshad · January 16, 2018, 9:34pm

Awesome @ines

Thanks

andy · January 18, 2018, 7:28pm

Two questions on workflow and training for the manual interface:

1. Should the manual annotations be exhaustive for the text, like a “GoldParse”, or can they be incomplete like ner.teach?
2. How does --exclude work for the manual interface? Will it just hash the text since there’s no proposed label? It would be nice if it knew which tags were available in the manual interface and incorporated that, so another session with different active tags could see the sentence again.

honnibal · January 19, 2018, 12:46am

With no model to update, you can use your own target strategy. I think doing one entity at a time exhaustively will be a good approach.

It would currently exclude by input hash. I think you'd be best off writing your own filter to give you fine-grained control.
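A custom filter along these lines could key each task on the text plus the labels that were active in the session, so the same sentence comes back when a different label set is being annotated. This is a rough sketch in plain Python rather than a Prodigy API: `task_key`, `filter_stream` and the `"session_labels"` field are hypothetical names, and it assumes you store the active labels on each annotated example yourself.

```python
import hashlib


def task_key(text, labels):
    """Hash the text together with the sorted label set."""
    payload = text + "\x00" + ",".join(sorted(labels))
    return hashlib.sha1(payload.encode("utf8")).hexdigest()


def filter_stream(stream, annotated, active_labels):
    """Yield only tasks not yet annotated with the current label set.

    `annotated` is an iterable of previously saved examples, each assumed
    to carry a "session_labels" list (a made-up field for this sketch).
    """
    seen = {task_key(eg["text"], eg.get("session_labels", [])) for eg in annotated}
    for eg in stream:
        if task_key(eg["text"], active_labels) not in seen:
            yield eg


annotated = [{"text": "Hello Apple", "session_labels": ["ORG"]}]
stream = [{"text": "Hello Apple"}, {"text": "Hello Google"}]

# Same label set as before: the already-annotated text is skipped.
same_labels = list(filter_stream(stream, annotated, ["ORG"]))
# Different label set: the text is shown again.
new_labels = list(filter_stream(stream, annotated, ["PERSON"]))
```

Sorting the labels before hashing keeps the key stable regardless of the order in which the labels were passed to the session.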

imranarshad · January 25, 2018, 8:13pm

Hey @ines, how is progress on the newer version?

ines · February 2, 2018, 1:49am

@imranarshad Sorry for the delay – we had to push an update to spaCy first, and then ended up implementing some more features.

Just released Prodigy v1.3.0! See here for the new ner.make-gold workflow. There’s now also a pos.make-gold recipe for annotating part-of-speech tags in a similar way. See the new changelog for an overview of all updates in the new version.
