Export Annotated Data from ner.manual to get list of words per label

ryanwesslen · September 7, 2022, 2:32pm

To help make things easier, can you avoid pasting in images and use the code feature to paste in your code? This makes it much easier for us to replicate.

Can you explain what you're trying to accomplish? I saw you changed for span in eg.get("span", []): to for span in eg.get("REGEX", []):. What are you trying to do here?

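The problem is that your examples (I assume koText is a set of annotations from ner.manual) don't have a "REGEX" key. The get method looks up the value stored under the key name you pass in and, if that key doesn't exist, returns the default you gave it (here, an empty list), so your loop never has anything to iterate over.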
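To make that concrete, here is a small sketch of what I think is happening. The dictionary below is a trimmed-down, hand-written stand-in for one of your annotations (not your actual data), and the variable names are just for illustration:

```python
# A hand-written stand-in for a single ner.manual annotation (assumed shape).
eg = {
    "text": "First look at the new MacBook Pro.",
    "spans": [{"start": 22, "end": 33, "label": "PRODUCT"}],
}

# There is no "REGEX" key, so get() falls back to the default: an empty list.
regex_spans = eg.get("REGEX", [])

# The annotated spans are stored under the "spans" key.
spans = eg.get("spans", [])

print(regex_spans)  # []
print(len(spans))   # 1
```

Note that the label name (for example "REGEX" or "PRODUCT") is a value inside each span, not a key on the example itself.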

This is what an example annotation looks like for the ner.manual recipe:

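import pprint
from prodigy.components.db import connect

# Connect to the database using your Prodigy settings
db = connect()
# Load all annotations saved in the "ner_manual" dataset
examples = db.get_dataset("ner_manual")
# Pretty-print the first annotated example
pprint.pprint(examples[0])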
{'_input_hash': -136499144,
 '_is_binary': False,
 '_task_hash': -986839541,
 '_timestamp': 1662560235,
 '_view_id': 'ner_manual',
 'answer': 'accept',
 'spans': [{'end': 33,
            'label': 'PRODUCT',
            'start': 22,
            'token_end': 6,
            'token_start': 5}],
 'text': 'First look at the new MacBook Pro.',
 'tokens': [{'end': 5, 'id': 0, 'start': 0, 'text': 'First', 'ws': True},
            {'end': 10, 'id': 1, 'start': 6, 'text': 'look', 'ws': True},
            {'end': 13, 'id': 2, 'start': 11, 'text': 'at', 'ws': True},
            {'end': 17, 'id': 3, 'start': 14, 'text': 'the', 'ws': True},
            {'end': 21, 'id': 4, 'start': 18, 'text': 'new', 'ws': True},
            {'end': 29, 'id': 5, 'start': 22, 'text': 'MacBook', 'ws': True},
            {'end': 33, 'id': 6, 'start': 30, 'text': 'Pro', 'ws': False},
            {'end': 34, 'id': 7, 'start': 33, 'text': '.', 'ws': False}]}
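If the goal is a list of words per label, something along these lines might work. This is only a sketch: words_per_label is a hypothetical helper name (it is not part of Prodigy), I'm assuming you only want examples you accepted, and I've used a few hand-written examples in the same format as above so it runs without a database. In practice you would pass in the result of db.get_dataset(...) instead.

```python
from collections import defaultdict


def words_per_label(examples):
    """Group the annotated span texts by their label (hypothetical helper)."""
    grouped = defaultdict(list)
    for eg in examples:
        # Assumption: only accepted annotations should be counted.
        if eg.get("answer") != "accept":
            continue
        text = eg["text"]
        # Examples without annotations may have no "spans" key at all.
        for span in eg.get("spans", []):
            # start/end are character offsets into the example text.
            grouped[span["label"]].append(text[span["start"]:span["end"]])
    return dict(grouped)


# Hand-written examples in the ner.manual format, for illustration only.
examples = [
    {"text": "First look at the new MacBook Pro.", "answer": "accept",
     "spans": [{"start": 22, "end": 33, "label": "PRODUCT"}]},
    {"text": "Nothing to label here.", "answer": "accept"},
    {"text": "Skipped example with an iPhone.", "answer": "reject",
     "spans": [{"start": 24, "end": 30, "label": "PRODUCT"}]},
]

print(words_per_label(examples))  # {'PRODUCT': ['MacBook Pro']}
```

Slicing the text with the character offsets keeps multi-token spans such as "MacBook Pro" together. If you want individual tokens instead, you could use token_start and token_end to index into the tokens list.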
