Prerequisites for the dep.teach recipe

snakonechny · January 17, 2019, 8:58pm

I would like to test the dep.teach recipe available in Prodigy (the use case here is training a model that infers relationships between entities). I’m starting with a custom-trained NER model, in this case for Spanish.

When trying to start an annotation session, I get the following error:

ValueError: [E030] Sentence boundaries unset. You can add the 'sentencizer' component to the pipeline with: nlp.add_pipe(nlp.create_pipe('sentencizer')) Alternatively, add the dependency parser, or set sentence boundaries by setting doc[i].is_sent_start.

Running my_ner_model.pipe_names produced the following: ['ner'].

I revised my model by adding both the sentencizer and the parser pipeline components, but I’m now getting a cryptic Segmentation fault (core dumped) error. Adding memory to my VM did not solve the issue.
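For reference, the E030 message asks for only one source of sentence boundaries: the sentencizer or the dependency parser, not both. Below is a minimal sketch of the sentencizer route, using the spaCy v2-style call quoted in the error. The helper name ensure_sentence_boundaries and the model path in the usage comment are hypothetical, and this is not a diagnosis of the segmentation fault.

```python
def ensure_sentence_boundaries(nlp):
    """Add a rule-based sentencizer if nothing in the pipeline sets sentence boundaries.

    `nlp` is assumed to be a loaded spaCy Language object (v2-style API).
    """
    # Either component is enough to set sentence boundaries.
    if "parser" in nlp.pipe_names or "sentencizer" in nlp.pipe_names:
        return nlp
    # Same call as in the E030 message; first=True puts it ahead of 'ner'.
    # In spaCy v3 the equivalent is nlp.add_pipe("sentencizer", first=True).
    nlp.add_pipe(nlp.create_pipe("sentencizer"), first=True)
    return nlp


# Hypothetical usage (the path is a placeholder):
# import spacy
# nlp = ensure_sentence_boundaries(spacy.load("/path/to/my_ner_model"))
# print(nlp.pipe_names)
```

Saving the modified pipeline with nlp.to_disk(...) would let the recipe load it by path.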

Is there a step/requirement here that I’m missing? Could you point me to any relevant logs that might help with sorting this out?

Cheers,
SN

honnibal · January 18, 2019, 1:45pm

Hey,

The dep.teach recipe is still a bit experimental, but it should work quite well for fine-tuning the accuracy of a pre-trained parser on a new dataset, for domain adaptation.

If you’re starting from scratch, things are a fair bit harder. Annotating trees from scratch is still a lot of work, and we don’t really have a better approach than the free solutions, which you can find here: https://universaldependencies.org/tools.html

We’re going to be doing dependency annotations ourselves as well, so we’ll be working on better solutions. But for now you should annotate at least 500-1000 sentences manually, train an initial model, and then try out the dep.teach recipe to progressively improve different labels.
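To make the manual step a little more concrete, here is a rough sketch of a sanity check for one hand-annotated sentence. It assumes the spaCy v2-style parser training format, where each token has a head index and a dependency label and the root token points to itself. The function name check_annotation and the example sentence are made up for illustration.

```python
def check_annotation(tokens, heads, deps):
    """Return a list of problems found in one sentence's dependency annotation."""
    problems = []
    n = len(tokens)
    # Every token needs exactly one head index and one label.
    if len(heads) != n or len(deps) != n:
        problems.append("heads and deps need one entry per token")
        return problems
    # Heads are absolute token indices within the sentence.
    for i, head in enumerate(heads):
        if not 0 <= head < n:
            problems.append(f"token {i} has out-of-range head {head}")
    # Assumption: the root is the token whose head is its own index.
    roots = [i for i, head in enumerate(heads) if head == i]
    if len(roots) != 1:
        problems.append(f"expected exactly one root, found {len(roots)}")
    return problems


# Illustrative example with Universal Dependencies-style labels.
tokens = ["Ana", "trabaja", "en", "Madrid"]
heads = [1, 1, 3, 1]
deps = ["nsubj", "ROOT", "case", "obl"]
print(check_annotation(tokens, heads, deps))
```

A check like this only catches structural slips; it says nothing about whether the labels themselves are right.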

snakonechny · January 18, 2019, 3:13pm

Thanks for your quick reply, @honnibal! A follow-up question:
The Spanish language model currently available (es_core_news_md) has a pre-trained parser. Is this enough to get started with domain-specific training?

honnibal · January 18, 2019, 3:15pm

It should be — give it a try, and see how you go. This will only work if you’re using the same annotation scheme as the pretrained model, though. If you’re trying to learn custom relations, you’ll need to start from scratch.
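One quick way to see whether your labels fit the pretrained scheme is to compare them against the labels the parser already knows. A minimal sketch: the helper labels_outside_scheme and the WORKS_AT label are hypothetical, and the usage comment assumes es_core_news_md is installed.

```python
def labels_outside_scheme(parser_labels, wanted_labels):
    """Return the wanted labels that the pretrained parser does not have."""
    # Exact string comparison; label case matters here.
    return sorted(set(wanted_labels) - set(parser_labels))


# Hypothetical usage with a loaded spaCy pipeline:
# import spacy
# nlp = spacy.load("es_core_news_md")
# parser_labels = nlp.get_pipe("parser").labels
# print(labels_outside_scheme(parser_labels, ["nsubj", "WORKS_AT"]))
```

If this returns anything, those labels are outside the pretrained scheme, which is the start-from-scratch case.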

snakonechny · January 18, 2019, 4:03pm

Understood, makes sense!
