After solving this problem, I have started to label data via dep.teach. However, the way the arcs are displayed seems either unintuitive or inconsistent. I fully understand which direction the arcs should point, since I've manually labelled a lot of data using a custom program built on displaCy. But more than half of the label candidates that Prodigy pulls have the correct label with the opposite arc direction.
>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')
loading data for custok
>>> doc = nlp('no free fluid in the pelvis')
>>> print([(t.text, t.head.text, t.dep_) for t in doc])
[('no', 'free fluid', 'negate'), ('free fluid', 'free fluid', 'ROOT'), ('in', 'free fluid', 'prep'), ('the', 'pelvis', '-'), ('pelvis', 'in', 'refer')]
In the above example, the head of 'the' should be 'pelvis' with dep '-', as the terminal output suggests, and that is how I've pretrained this model. In Prodigy, it suggests the correct label, but the arc points in the opposite direction. However, in the example below, it does provide the correct label and the correct arc for 'free fluid' and 'in', which is what I've trained the model to do.
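To make the direction I expect explicit, the terminal output can be rewritten as head-to-child arcs. This is only an illustrative sketch: `describe_arcs` is a hypothetical helper of my own, not part of spaCy or Prodigy, and the triples are copied by hand from the terminal output above rather than produced by the model here.

```python
# (token, head, dep) triples copied by hand from the terminal output above.
triples = [
    ("no", "free fluid", "negate"),
    ("free fluid", "free fluid", "ROOT"),
    ("in", "free fluid", "prep"),
    ("the", "pelvis", "-"),
    ("pelvis", "in", "refer"),
]


def describe_arcs(triples):
    """Return one 'head -[label]-> child' string per non-root token.

    Hypothetical helper for illustration only: each arc is written
    starting at the head and pointing at the dependent (child) token.
    """
    arcs = []
    for child, head, dep in triples:
        if dep == "ROOT":
            # The root token is its own head, so it has no incoming arc.
            continue
        arcs.append(f"{head} -[{dep}]-> {child}")
    return arcs


for arc in describe_arcs(triples):
    print(arc)
```

Read this way, the arc for 'the' starts at 'pelvis' and points at 'the', which is the direction I would expect to see in the Prodigy suggestion as well.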
After going through ~500 labels in Prodigy, I had to reject more than 50% of the label candidates because they had the correct label but the opposite arc direction. I spot-checked here and there to make sure that my model would have predicted the correct arc. I tried a quick dep.batch-train to see whether accepting/rejecting the examples this way would increase my accuracy, but the model accuracy decreased on every iteration for 10 iterations.
My question then is: am I supposed to ignore the arc direction in Prodigy? Or is it expected behaviour for Prodigy to intentionally suggest incorrect arcs?