I used this command to annotate directional relations and dependencies between tokens:
prodigy rel.manual my_dataset my_pre_annotated_model ./file_to_data.jsonl --label label1,label2 --span-label ner_label1,ner_label2,ner_label3 --wrap
After that, how can I convert the resulting dataset into a model so I can correct it and increase its accuracy?
I tried this:
prodigy dep.batch-train my_dataset blank:en --output ./model_Dependency --label label1,label2 --eval-split 0.2 --n-iter 10
But I got this error:
ValueError: [E021] Could not find a gold-standard action to supervise the dependency parser. The GoldParse was projective. The transition system has 3 actions. State at failure: __0 __0 is_0 | under testing
What should I do in this case?
Are the relations you've annotated syntactic dependencies? If not, training a dependency parser doesn't really make sense and it's expected that it won't be able to learn from your annotations.
spaCy doesn't have a built-in component for generic relation prediction, and the type of model you choose will depend on the type of relations you want to predict. So you'll have to bring your own model implementation that fits your specific task. Using db-out, you can export your Prodigy annotations and the data should include everything you need: the tokens, their offsets and the labelled relations with references to the tokens they connect.
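As a rough illustration of what reading that export could look like, here is a minimal Python sketch. It assumes each exported line is a JSON object with a "text", an "answer" and a "relations" list whose entries carry a "label" plus "head_span" and "child_span" dicts with character offsets "start" and "end". The helper name iter_relations, the HAS_FLOW label and the sample record are made up for the example, so check them against your own export before relying on this.

```python
import json


def iter_relations(jsonl_lines):
    """Yield (head_text, label, child_text) for every accepted example.

    Assumes the record layout described above; adjust the keys if your
    export looks different.
    """
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        eg = json.loads(line)
        # Skip examples that were rejected or ignored during annotation.
        if eg.get("answer", "accept") != "accept":
            continue
        text = eg["text"]
        for rel in eg.get("relations", []):
            head, child = rel["head_span"], rel["child_span"]
            yield (
                text[head["start"]:head["end"]],
                rel["label"],
                text[child["start"]:child["end"]],
            )


# Hypothetical record, built by hand to mimic one exported line.
sample_text = "Oxygen flow was set at 85 mL/min"
flow_start = sample_text.index("85")
sample_record = {
    "text": sample_text,
    "answer": "accept",
    "relations": [
        {
            "label": "HAS_FLOW",
            "head_span": {"start": 0, "end": len("Oxygen")},
            "child_span": {"start": flow_start, "end": len(sample_text)},
        }
    ],
}

triples = list(iter_relations([json.dumps(sample_record)]))
print(triples)
```

With a real export you would pass the open file object instead of the hand-built list, for example `iter_relations(open("annotations.jsonl", encoding="utf8"))`, where the file name is whatever you redirected the db-out output to.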
Also see this thread for some pointers re: custom relation prediction:
Thank you @ines for your reply.
In fact, my work is something like this:
If I have a text like this, for example: "Oxygen, nitrogen and hydrogen detector flows were set at 85, 7, and 4 mL/min, respectively"
After manual annotation:
And now I have to connect each material with its corresponding flow. How can I do that?
If you have annotated data in this format, you can load it into the rel.manual recipe to connect the annotated spans. To load data from an existing dataset, you can use the dataset: prefix, as shown in the example here: https://prodi.gy/docs/dependencies-relations#ner
After you've created the relation annotations, you can use prodigy db-out to export your JSON-formatted data, so you can use it to train a model (or for any other process).
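One possible way to turn such an export into training data for a custom relation model is to pair up every material span with every flow span and mark which pairs were actually connected. The sketch below is only an illustration under assumptions: the span labels MATERIAL and FLOW, the relation label HAS_FLOW, the helper names and the hand-built record are hypothetical, and the "spans" / "relations" keys are assumed to match what your export contains.

```python
from itertools import product


def make_span(text, substring, label):
    """Build a span dict with character offsets for the first match."""
    start = text.index(substring)
    return {"start": start, "end": start + len(substring), "label": label}


def make_pair_examples(eg, head_label, child_label):
    """Return (head_text, child_text, is_related) for every candidate pair."""
    # Annotated relations, identified by the start offsets of both spans.
    gold = {
        (rel["head_span"]["start"], rel["child_span"]["start"])
        for rel in eg.get("relations", [])
    }
    spans = eg.get("spans", [])
    heads = [s for s in spans if s["label"] == head_label]
    children = [s for s in spans if s["label"] == child_label]
    text = eg["text"]
    return [
        (
            text[h["start"]:h["end"]],
            text[c["start"]:c["end"]],
            (h["start"], c["start"]) in gold,
        )
        for h, c in product(heads, children)
    ]


# Hand-built record for the sentence from this thread (hypothetical labels).
text = (
    "Oxygen, nitrogen and hydrogen detector flows were set at "
    "85, 7, and 4 mL/min, respectively"
)
materials = [make_span(text, m, "MATERIAL") for m in ("Oxygen", "nitrogen", "hydrogen")]
flows = [make_span(text, f, "FLOW") for f in ("85", "7", "4")]
record = {
    "text": text,
    "spans": materials + flows,
    "relations": [
        {"label": "HAS_FLOW", "head_span": m, "child_span": f}
        for m, f in zip(materials, flows)
    ],
}

pairs = make_pair_examples(record, "MATERIAL", "FLOW")
for head_text, child_text, related in pairs:
    print(head_text, child_text, related)
```

The unconnected pairs serve as negative examples, which most pairwise classifiers need. Whether a pairwise setup is the right model for your relations is a separate design decision.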
Thank you @ines again for your reply.
I have another question, please. In my case, is it possible to use an existing model like "en_core_web_sm" and then correct and update it based on my work, or do you think it is better if I start from scratch?
Based on the examples you've posted, it's probably best to start from scratch, at least for NER. The general-purpose English models have been trained on a corpus with various labels that overlap with your definitions, especially for numbers / units. Trying to teach the model that all these entities it previously predicted are suddenly something completely different will be much harder and potentially a lot less stable, and the results will be much harder to reason about.
Hi @ines, I have another question, please.
To generate a model from a pre-annotated dataset I used:
prodigy ner.batch-train mannual_annotation_set blank:en --output .\model_2_1
So can I use dep.batch-train if I want to build a model after the relation annotations?
Something like this:
prodigy dep.batch-train mannual_annotation_set blank:en --output .\model_2_2
You should use the new general-purpose train recipe: https://prodi.gy/docs/recipes#train Alternatively, you can also run data-to-spacy to export your annotations for training with spaCy.
However, as I said in my previous post, it doesn't sound like your annotations are actually for syntactic dependencies? If your annotations aren't actually dependency parsing annotations, you won't be able to train a dependency parser with them.
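A rough way to sanity-check this yourself: syntactic dependency annotation conventionally gives every token exactly one head, with a single root, so the arcs form a tree over the whole sentence. The Python sketch below tests that property for a list of (head, child) token-index pairs. The function name is made up for illustration, and it assumes you have already pulled the token indices out of your exported relations. It is not a reproduction of the parser's own validation.

```python
def looks_like_dependency_tree(n_tokens, arcs):
    """Return True if the arcs form a single-rooted tree over all tokens.

    arcs is an iterable of (head_index, child_index) token pairs.
    """
    head_of = {}
    for head, child in arcs:
        # A token with two heads, or attached to itself, rules out a tree.
        if child in head_of or head == child:
            return False
        head_of[child] = head

    # Exactly one token may be left without a head: the root.
    roots = [i for i in range(n_tokens) if i not in head_of]
    if len(roots) != 1:
        return False

    # Following heads upwards from any token must never loop.
    for start in range(n_tokens):
        seen = set()
        node = start
        while node in head_of:
            if node in seen:
                return False
            seen.add(node)
            node = head_of[node]
    return True


# "She eats apples": both other tokens attach to the verb at index 1.
syntactic = looks_like_dependency_tree(3, [(1, 0), (1, 2)])

# Three material-to-flow links in a 16-token sentence: most tokens
# have no head at all, so this is not a dependency tree.
sparse = looks_like_dependency_tree(16, [(0, 10), (2, 12), (4, 14)])

print(syntactic, sparse)
```

If your annotations only connect a handful of spans per sentence, as in the material and flow example, they will fail this check, which is consistent with them being relation annotations rather than a parse.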
Instead, you probably want to use a different implementation for general-purpose relation annotation:
Thank you for your reply.