✨ Demo: fully manual NER annotation interface

I've been following your example below

```json
{
    "text": "Hello Apple",
    "tokens": [
        {"text": "Hello", "start": 0, "end": 5, "id": 0},
        {"text": "Apple", "start": 6, "end": 11, "id": 1}
    ],
    "spans": [
        {"start": 6, "end": 11, "startIdx": 1, "endIdx": 2, "label": "ORG"}
    ]
}
```

This is what the output for my task looks like:

```json
{
    "text": "Unspecified malignant neoplasm of skin of unspecified part of face",
    "_input_hash": 1363891672,
    "_task_hash": 35627368,
    "tokens": [
        {
            "text": "Unspecified",
            "start": 0,
            "end": 11,
            "id": 0
        },
        {
            "text": "malignant",
            "start": 12,
            "end": 21,
            "id": 1
        },
        {
            "text": "neoplasm",
            "start": 22,
            "end": 30,
            "id": 2
        },
        {
            "text": "of",
            "start": 31,
            "end": 33,
            "id": 3
        },
        {
            "text": "skin",
            "start": 34,
            "end": 38,
            "id": 4
        },
        {
            "text": "of",
            "start": 39,
            "end": 41,
            "id": 5
        },
        {
            "text": "unspecified",
            "start": 42,
            "end": 53,
            "id": 6
        },
        {
            "text": "part",
            "start": 54,
            "end": 58,
            "id": 7
        },
        {
            "text": "of",
            "start": 59,
            "end": 61,
            "id": 8
        },
        {
            "text": "face",
            "start": 62,
            "end": 66,
            "id": 9
        }
    ],
    "spans": [
        {
            "text": "skin",
            "start": 34,
            "end": 38,
            "startIdx": 4,
            "endIdx": 5,
            "label": "BODY_PART"
        },
        {
            "text": "face",
            "start": 62,
            "end": 66,
            "startIdx": 9,
            "endIdx": 10,
            "label": "BODY_PART"
        }
    ]
}
```

I'm not sure what I'm doing wrong. I'm using the latest versions, spacy==2.0.5 and prodigy==1.2.0.

Thanks
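For anyone building tasks in this experimental format by hand, it can help to generate the `tokens` and `spans` programmatically so the character offsets and token indices can't drift apart. Below is a minimal sketch, assuming a plain whitespace tokenizer (no punctuation splitting, unlike spaCy) and the field names from the demo example above, where `endIdx` is exclusive (a single-token span on token 1 has `startIdx: 1, endIdx: 2`). `tokenize_with_offsets` and `make_task` are hypothetical helpers, not part of Prodigy.

```python
import re


def tokenize_with_offsets(text):
    """Whitespace tokenizer returning token dicts in the demo's format.

    Assumption: tokens are runs of non-whitespace characters; punctuation
    is NOT split off, so this won't match spaCy's tokenization in general.
    """
    return [
        {"text": m.group(), "start": m.start(), "end": m.end(), "id": i}
        for i, m in enumerate(re.finditer(r"\S+", text))
    ]


def make_task(text, char_spans):
    """Build a task dict from (start, end, label) character spans."""
    tokens = tokenize_with_offsets(text)
    # Map character offsets at token boundaries to token ids
    starts = {t["start"]: t["id"] for t in tokens}
    ends = {t["end"]: t["id"] for t in tokens}
    spans = []
    for start, end, label in char_spans:
        if start not in starts or end not in ends:
            raise ValueError(f"Span ({start}, {end}) doesn't align with token boundaries")
        spans.append({
            "text": text[start:end],
            "start": start,
            "end": end,
            "startIdx": starts[start],
            "endIdx": ends[end] + 1,  # exclusive end index, as in the demo example
            "label": label,
        })
    return {"text": text, "tokens": tokens, "spans": spans}
```

For the example text above, `make_task(text, [(34, 38, "BODY_PART"), (62, 66, "BODY_PART")])` should reproduce the same token list and the `startIdx`/`endIdx` values shown in the JSON.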

Thanks for sharing the example – I'll try it out and have a look.

As I mentioned above, annotating pre-defined spans in the manual interface is not "officially" supported in Prodigy v1.2.0 – so everything you're doing here is pretty experimental and may not work perfectly. The updated interface and new ner.make-gold workflow coming in v1.3.0 will implement all of the required changes in the core library, so you'll be able to use this workflow out-of-the-box.

Any estimate of how long we may have to wait?

@imranarshad There’ll hopefully be another update this week. We do want to get a few other changes and fixes in as well.

Awesome @ines

Thanks

Two questions on workflow and training for the manual interface:

  1. should the manual annotations be exhaustive for the text, like a “GoldParse”, or can they be incomplete like ner.teach?
  2. how does --exclude work for the manual interface? Will it just hash the text since there’s no proposed label? It would be nice if it knew which tags were available in the manual interface and incorporated that, so another session with different active tags could see the sentence again.

Since there's no model being updated in the loop, you're free to choose your own annotation strategy. I think annotating one entity type at a time, exhaustively, is a good approach.
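If each session covers a single label exhaustively, you could later merge the sessions for the same text into one complete, "GoldParse"-style annotation once every label has been done. A minimal sketch, assuming previous annotations are available as a list of dicts with `_input_hash`, `text`, `spans` and `answer`, plus a hypothetical `active_labels` field that you'd add to each task yourself to record which label(s) the session covered. `merge_label_sessions` is a hypothetical helper, not a Prodigy function.

```python
from collections import defaultdict


def merge_label_sessions(examples, all_labels):
    """Combine per-label annotations of the same text into one example.

    Assumes each example carries a hypothetical "active_labels" list
    recording which labels were annotated in that session.
    """
    grouped = defaultdict(lambda: {"text": None, "spans": [], "labels": set()})
    for eg in examples:
        if eg.get("answer", "accept") != "accept":
            continue  # ignore rejected or ignored examples
        entry = grouped[eg["_input_hash"]]
        entry["text"] = eg["text"]
        entry["spans"].extend(eg.get("spans", []))
        entry["labels"].update(eg.get("active_labels", []))

    merged = []
    for input_hash, entry in grouped.items():
        # Only texts annotated for every label can be treated as complete
        if not set(all_labels) <= entry["labels"]:
            continue
        spans = sorted(entry["spans"], key=lambda s: s["start"])
        # Entity spans from different sessions must not overlap
        for prev, cur in zip(spans, spans[1:]):
            if cur["start"] < prev["end"]:
                raise ValueError(f"Overlapping spans in example {input_hash}")
        merged.append({"text": entry["text"], "_input_hash": input_hash, "spans": spans})
    return merged
```

Texts that haven't been seen for every label yet are simply left out, so they can't be mistaken for complete annotations when training.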

It would currently exclude by input hash. I think you'd be best off writing your own filter to give you fine-grained control.
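A custom filter along those lines could key on the combination of input hash and label instead of the input hash alone, so a text is only skipped if it was already annotated for all of the labels active in the current session. A minimal sketch, assuming you store the session's labels on each task under a hypothetical `active_labels` key and have the previous annotations (e.g. loaded from your existing dataset) as a list of dicts. `add_active_labels` and `filter_seen_labels` are hypothetical helpers, not Prodigy built-ins.

```python
def add_active_labels(stream, labels):
    """Attach the session's labels to each task under a hypothetical
    "active_labels" key, so they're saved alongside the annotation."""
    for eg in stream:
        eg["active_labels"] = sorted(labels)
        yield eg


def filter_seen_labels(stream, previous, labels):
    """Skip a text only if every currently active label was already
    annotated for it in a previous session."""
    seen = {
        (eg["_input_hash"], label)
        for eg in previous
        for label in eg.get("active_labels", [])
    }
    for eg in stream:
        if all((eg["_input_hash"], label) in seen for label in labels):
            continue
        yield eg
```

In a custom recipe you could chain them as `stream = add_active_labels(filter_seen_labels(stream, previous, labels), labels)`. Because both are generators, the stream stays lazy and nothing is loaded up front beyond the set of seen (hash, label) pairs.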


Hey @ines, how is progress on the new version?

@imranarshad Sorry for the delay – we had to push an update to spaCy first, and then ended up implementing some more features.

Just released Prodigy v1.3.0! 🎉 See here for the new ner.make-gold workflow. There’s now also a pos.make-gold recipe for annotating part-of-speech tags in a similar way. See the new changelog for an overview of all updates in the new version.
