How do I use the file annotated_news_headlines-ORG-PERSON-LOCATION-ner.jsonl?

My environment

Microsoft Windows [Version 10.0.19042.1237]
(c) Microsoft Corporation. All rights reserved.

C:\Users\donhuvy>python -m prodigy stats

============================== ✨  Prodigy Stats ==============================

Version          1.11.5
Location         C:\Users\donhuvy\AppData\Roaming\Python\Python39\site-packages\prodigy
Prodigy Home     C:\Users\donhuvy\.prodigy
Platform         Windows-10-10.0.19042-SP0
Python Version   3.9.7
Database Name    SQLite
Database Id      sqlite
Total Datasets   2
Total Sessions   9

C:\Users\donhuvy>python --version
Python 3.9.7


I read
Then I downloaded the file


I ran

prodigy train --ner ner_news_headlines


C:\Users\donhuvy>python -m prodigy train --ner ner_news_headlines
ℹ Using CPU

========================= Generating Prodigy config =========================
ℹ Auto-generating config with spaCy
✔ Generated training config

=========================== Initializing pipeline ===========================
[2021-10-14 20:45:21,000] [INFO] Set up nlp object from config
Components: ner
Merging training and evaluation data for 1 components
  - [ner] Training: 0 | Evaluation: 0 (20% split)
Training: 0 | Evaluation: 0
Labels: ner (0)
[2021-10-14 20:45:21,010] [INFO] Pipeline: ['tok2vec', 'ner']
[2021-10-14 20:45:21,012] [INFO] Created vocabulary
[2021-10-14 20:45:21,012] [INFO] Finished initializing nlp object
Traceback (most recent call last):
  File "C:\Program Files\Python39\lib\", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python39\lib\", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\donhuvy\AppData\Roaming\Python\Python39\site-packages\prodigy\", line 61, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src\prodigy\core.pyx", line 329, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "C:\Users\donhuvy\AppData\Roaming\Python\Python39\site-packages\", line 367, in call
    cmd, result = parser.consume(arglist)
  File "C:\Users\donhuvy\AppData\Roaming\Python\Python39\site-packages\", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "C:\Users\donhuvy\AppData\Roaming\Python\Python39\site-packages\prodigy\recipes\", line 277, in train
    return _train(
  File "C:\Users\donhuvy\AppData\Roaming\Python\Python39\site-packages\prodigy\recipes\", line 189, in _train
    nlp = spacy_init_nlp(config, use_gpu=gpu_id)
  File "C:\Users\donhuvy\AppData\Roaming\Python\Python39\site-packages\spacy\training\", line 84, in init_nlp
    nlp.initialize(lambda: train_corpus(nlp), sgd=optimizer)
  File "C:\Users\donhuvy\AppData\Roaming\Python\Python39\site-packages\spacy\", line 1272, in initialize
    proc.initialize(get_examples, nlp=self, **p_settings)
  File "C:\Users\donhuvy\AppData\Roaming\Python\Python39\site-packages\spacy\pipeline\", line 211, in initialize
    validate_get_examples(get_examples, "Tok2Vec.initialize")
  File "spacy\training\example.pyx", line 64, in
TypeError: [E930] Received invalid get_examples callback in `Tok2Vec.initialize`. Expected function that returns an iterable of Example objects but got: []


How do I fix this? How do I use the annotated file?

Hi! After downloading the file, did you import it into a dataset named ner_news_headlines? The log shows "Training: 0 | Evaluation: 0", so it looks like there's no data to train from, and the training doesn't fail very gracefully in that case.

So you want to do the following after downloading the annotated data:

prodigy db-in ner_news_headlines annotated_news_headlines-ORG-PERSON-LOCATION-ner.jsonl

I just tried it locally and if I import the data, I can train from it with no problem 🙂
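
If you want to sanity-check the downloaded file before importing it, here is a minimal sketch using only the Python standard library. It assumes the file follows the usual Prodigy NER task layout (one JSON object per line with "text", "spans" and "answer" keys); the function name summarize_annotations is just a hypothetical helper for illustration, not part of Prodigy.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_annotations(path):
    """Count examples, answers and span labels in a Prodigy-style JSONL file.

    Assumes each non-blank line is a JSON object that may contain
    an "answer" string and a "spans" list of dicts with a "label" key.
    """
    n_examples = 0
    answers = Counter()
    labels = Counter()
    with Path(path).open(encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            n_examples += 1
            answers[record.get("answer", "missing")] += 1
            # "spans" may be absent or null on examples without entities
            for span in record.get("spans") or []:
                labels[span.get("label", "missing")] += 1
    return n_examples, answers, labels


if __name__ == "__main__":
    path = Path("annotated_news_headlines-ORG-PERSON-LOCATION-ner.jsonl")
    if path.exists():
        total, answers, labels = summarize_annotations(path)
        print(f"{total} examples")
        print("answers:", dict(answers))
        print("labels:", dict(labels))
    else:
        print(f"{path} not found; run this from the folder where you saved it")
```

If this reports a non-zero number of examples, the file itself is fine and the only missing step is the db-in import. After importing, running prodigy stats ner_news_headlines should also show a non-zero annotation count for the dataset.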

Thank you! Your information is helpful. Now I know how to import annotated JSONL into the database.
