Export Annotated Data from ner.manual to get list of words per label

hi @do12siwu!

To help make things easier, can you avoid pasting in images and use the code feature to paste in your code? This makes it much easier for us to replicate.

Can you explain what you're trying to accomplish? I saw you changed for span in eg.get("span", []): to for span in eg.get("REGEX", []):. What are you trying to do here?

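The problem is that your examples (I assume koText is a set of annotations from ner.manual) don't have a "REGEX" key. The get method returns the value stored under the key name you pass it, or the default you supply (here, an empty list) when that key is missing, so your loop has nothing to iterate over.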
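To illustrate, here is a minimal sketch of how get behaves on a plain Python dict. The eg dictionary below is a hand-written stand-in for one annotation, not data from your dataset:

```python
# Hypothetical, trimmed-down annotation used only for illustration.
eg = {
    "text": "First look at the new MacBook Pro.",
    "spans": [{"start": 22, "end": 33, "label": "PRODUCT"}],
}

# "REGEX" is not a key in the dict, so get() returns the default: an empty list.
missing = eg.get("REGEX", [])
print(missing)

# "spans" is a key in the dict, so get() returns the list of annotated spans.
found = eg.get("spans", [])
print(found)
```

Because the first call falls back to the empty list, a for loop over it simply never executes, and no error is raised.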

This is what an example annotation looks like for the ner.manual recipe:

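import pprint
from prodigy.components.db import connect

# Connect to the Prodigy database using your Prodigy settings
db = connect()
# Load every annotation saved in the dataset named "ner_manual"
examples = db.get_dataset("ner_manual")
# Pretty-print the first annotated example
pprint.pprint(examples[0])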
{'_input_hash': -136499144,
 '_is_binary': False,
 '_task_hash': -986839541,
 '_timestamp': 1662560235,
 '_view_id': 'ner_manual',
 'answer': 'accept',
 'spans': [{'end': 33,
            'label': 'PRODUCT',
            'start': 22,
            'token_end': 6,
            'token_start': 5}],
 'text': 'First look at the new MacBook Pro.',
 'tokens': [{'end': 5, 'id': 0, 'start': 0, 'text': 'First', 'ws': True},
            {'end': 10, 'id': 1, 'start': 6, 'text': 'look', 'ws': True},
            {'end': 13, 'id': 2, 'start': 11, 'text': 'at', 'ws': True},
            {'end': 17, 'id': 3, 'start': 14, 'text': 'the', 'ws': True},
            {'end': 21, 'id': 4, 'start': 18, 'text': 'new', 'ws': True},
            {'end': 29, 'id': 5, 'start': 22, 'text': 'MacBook', 'ws': True},
            {'end': 33, 'id': 6, 'start': 30, 'text': 'Pro', 'ws': False},
            {'end': 34, 'id': 7, 'start': 33, 'text': '.', 'ws': False}]}
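
Given that structure, one way to get a list of words per label is to loop over each example's "spans" and slice the text with the character offsets. This is only a rough sketch: words_per_label is a hypothetical helper name, and the example dict is a trimmed copy of the annotation above so the snippet runs without a database. With real data you would pass in the list returned by db.get_dataset instead.

```python
from collections import defaultdict

# Trimmed copy of the annotation shown above, so this runs on its own.
example = {
    "text": "First look at the new MacBook Pro.",
    "answer": "accept",
    "spans": [{"start": 22, "end": 33, "label": "PRODUCT"}],
}


def words_per_label(examples):
    """Collect the annotated text of every span, grouped by label.

    Hypothetical helper: assumes each example has "text" and, optionally,
    "spans" with "start", "end" and "label" keys, as in ner.manual output.
    """
    result = defaultdict(list)
    for eg in examples:
        # Skip examples that were rejected or ignored during annotation.
        if eg.get("answer") != "accept":
            continue
        for span in eg.get("spans", []):
            # Character offsets index directly into the example text.
            result[span["label"]].append(eg["text"][span["start"]:span["end"]])
    return dict(result)


labels = words_per_label([example])
print(labels)
```

Note that a span can cover several tokens, so each entry is the full annotated phrase rather than a single word. If you need individual tokens, you could instead use "token_start" and "token_end" to index into "tokens".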