Prodigy annotations from older to newer version

Recently I resubscribed to Prodigy; I used to have Prodigy 1.7.1.
I exported the annotations from my older Prodigy version and imported (db-in) that JSONL file into the newer Prodigy version (1.9.5).
The problem is that I am unable to batch train the NER model in this newer version, and I am getting the error below:

Prodigy now comes with a new general-purpose train command that supports all
components and can be used with binary accept/reject annotations by setting the
--binary flag. It also features an improved training loop and more detailed
per-entity-type results. Give it a try!
ℹ Loaded model en_core_web_sm
Using 50% of examples (12948) for evaluation
Using 100% of remaining examples (15666) for training
Dropout: 0.2
Batch size: 16
Iterations: 10

BEFORE 0.009
Correct 2071
Incorrect 217049
Entities 33920
Unknown 31848

LOSS RIGHT WRONG ENTS SKIP ACCURACY

Traceback (most recent call last):
File "C:\Users\BNV\AppData\Local\Programs\Python\Python36\Lib\runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "C:\Users\BNV\AppData\Local\Programs\Python\Python36\Lib\runpy.py", line 85, in run_code
exec(code, run_globals)
File "C:\Users\BNV\Envs\spacy221\lib\site-packages\prodigy_main
.py", line 60, in
controller = recipe(*args, use_plac=True)
File "cython_src\prodigy\core.pyx", line 213, in prodigy.core.recipe.recipe_decorator.recipe_proxy
File "C:\Users\BNV\Envs\spacy221\lib\site-packages\plac_core.py", line 328, in call
cmd, result = parser.consume(arglist)
File "C:\Users\BNV\Envs\spacy221\lib\site-packages\plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "C:\Users\BNV\Envs\spacy221\lib\site-packages\prodigy\deprecated\train.py", line 143, in ner_batch_train
examples, batch_size=batch_size, drop=dropout, beam_width=beam_width
File "cython_src\prodigy\models\ner.pyx", line 351, in prodigy.models.ner.EntityRecognizer.batch_train
File "cython_src\prodigy\models\ner.pyx", line 443, in prodigy.models.ner.EntityRecognizer._update
File "cython_src\prodigy\models\ner.pyx", line 436, in prodigy.models.ner.EntityRecognizer._update
File "cython_src\prodigy\models\ner.pyx", line 437, in prodigy.models.ner.EntityRecognizer._update
File "C:\Users\BNV\Envs\spacy221\lib\site-packages\spacy\language.py", line 515, in update
proc.update(docs, golds, sgd=get_grads, losses=losses, **kwargs)
File "nn_parser.pyx", line 445, in spacy.syntax.nn_parser.Parser.update
File "nn_parser.pyx", line 550, in spacy.syntax.nn_parser.Parser._init_gold_batch
File "transition_system.pyx", line 95, in spacy.syntax.transition_system.TransitionSystem.get_oracle_sequence
File "transition_system.pyx", line 156, in spacy.syntax.transition_system.TransitionSystem.set_costs
ValueError: [E024] Could not find an optimal move to supervise the parser. Usually, this means that the model can't be updated in a way that's valid and satisfies the correct annotations specified in the GoldParse. For example, are all labels added to the model? If you're training a named entity recognizer, also make sure that none of your annotated entity spans have leading or trailing whitespace. You can also use the experimental debug-data command to validate your JSON-formatted training data. For details, run:
python -m spacy debug-data --help

Hi! The main difference between spaCy v2.0 and v2.1 is the handling of whitespace in entities. This used to be an issue and could cause significantly worse results, because the entity recognizer was allowed to predict newlines and whitespace tokens as entities.

So if your data contains spans starting with, or consisting only of, whitespace, that's the one thing you need to fix and filter out. See the related threads for details.
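For example, something like this minimal sketch will report the affected spans in an exported file. This isn't a built-in Prodigy command, and it assumes the standard db-out format with a "text" field and a "spans" list of character offsets ("n1.jsonl" stands in for your exported file):

import json

with open("n1.jsonl", encoding="utf8") as f:
    for i, line in enumerate(f):
        task = json.loads(line)
        text = task.get("text", "")
        for span in task.get("spans", []):
            span_text = text[span["start"]:span["end"]]
            # flag spans that are empty, whitespace-only, or have leading/trailing whitespace
            if not span_text.strip() or span_text != span_text.strip():
                print(f"example {i}: problematic span {span}")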

I think the problem is different:
I first ran db-in on "n1.jsonl" into a new dataset, and training worked with no error.
Then I ran db-in on "n2.jsonl" into another new dataset, and training again worked with no error.

But when I db-in "n2.jsonl" into the first dataset, I get the same ValueError: [E024]. So whenever I db-in those two JSONL files into a single dataset, I hit the error.

Note: both JSONL files were produced with Prodigy 1.7, but I am facing this issue while retraining on Prodigy 1.9.5. Please help me with this issue.

How are you evaluating? If you're not using a dedicated evaluation set (e.g. via --eval-id), Prodigy will hold back a random sample from the dataset. So if there's one problematic example in one of your datasets, it may end up in the evaluation set and the model won't be updated with it (and you won't get an error). But when you combine the examples, the held-back evaluation data will be different and the problematic example might end up in the training set. You can test this by using the same dataset (or any other dataset, really) for evaluation and training on the full data. This should always give you an error.
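You can also inspect the datasets directly instead of relying on where the split falls. Here's a rough sketch using the database API, assuming the standard task format (the dataset names are placeholders):

from prodigy.components.db import connect

db = connect()  # connects using your prodigy.json settings
for name in ("dataset_one", "dataset_two"):  # replace with your dataset names
    bad = 0
    for task in db.get_dataset(name):
        text = task.get("text", "")
        for span in task.get("spans", []):
            span_text = text[span["start"]:span["end"]]
            if not span_text.strip() or span_text != span_text.strip():
                bad += 1
    print(f"{name}: {bad} problematic spans")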

Okay, if that's the issue, then I have a lot of examples (almost 1,000 out of 41,000) like the one below in my JSONL file. Is there a script that can handle this? I tried the one you mentioned in the reply above, but it didn't work.
Example:
{"text":" ","start":83,"end":84,"id":16}

I solved this issue with a script that removes the problematic entities; running that script on each JSONL file one by one resolved the issue.
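For anyone who runs into the same thing, here's a minimal sketch of that kind of cleanup. The file names are the ones from above, and simply dropping the offending spans is an assumption; you could also adjust the offsets to strip the whitespace instead:

import json

for path in ("n1.jsonl", "n2.jsonl"):
    cleaned = []
    with open(path, encoding="utf8") as f:
        for line in f:
            task = json.loads(line)
            text = task.get("text", "")
            # keep only spans that are non-empty and not padded with whitespace
            task["spans"] = [
                s for s in task.get("spans", [])
                if text[s["start"]:s["end"]].strip()
                and text[s["start"]:s["end"]] == text[s["start"]:s["end"]].strip()
            ]
            cleaned.append(task)
    with open(path.replace(".jsonl", "_clean.jsonl"), "w", encoding="utf8") as f:
        for task in cleaned:
            f.write(json.dumps(task) + "\n")

You can then db-in the *_clean.jsonl files into a fresh dataset and train on that.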
