Hello again,
After solving this problem, I have started to label data via dep.teach. However, the way the arcs are displayed seems either unintuitive or inconsistent. I fully understand which direction the arcs should point, since I've manually labelled a lot of data using a custom program built on displaCy. But more than half of the label candidates that Prodigy pulls have the correct label with the opposite arc direction.
Example (Prodigy):
(Terminal)
>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')
loading data for custok
>>> doc = nlp('no free fluid in the pelvis')
>>> print([(t.text,t.head.text,t.dep_) for t in doc])
[('no', 'free fluid', 'negate'), ('free fluid', 'free fluid', 'ROOT'), ('in', 'free fluid', 'prep'), ('the', 'pelvis', '-'), ('pelvis', 'in', 'refer')]
In the above example, the head of "the" should be "pelvis" with the dep "-", as the terminal output shows, and that is how I've pretrained this model. In Prodigy, it suggests the correct label, but with the arc pointing in the opposite direction. However, in the example below, it does suggest the correct label and the correct arc for "free fluid" and "in", which is what I've trained the model to do.
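To make the direction convention explicit, here is a minimal Python sketch (no spaCy required) that turns the terminal output above into explicit head -> child arcs and reports which way each arrow points. The names Arc, arcs_from_parse and describe are hypothetical helpers, the head texts were converted to token indices by hand, and the "left"/"right" rule (arrowhead on the dependent, "left" when the dependent comes before its head) is my understanding of how displaCy orients arcs, not something confirmed for Prodigy's display.

```python
from typing import List, NamedTuple


class Arc(NamedTuple):
    head: int   # index of the head token
    child: int  # index of the dependent token
    label: str  # dependency label


def arcs_from_parse(heads: List[int], deps: List[str]) -> List[Arc]:
    """Turn per-token head indices and labels into explicit head -> child arcs."""
    arcs = []
    for i, (head, dep) in enumerate(zip(heads, deps)):
        if head == i:  # the ROOT token is its own head in spaCy, so it gets no arc
            continue
        arcs.append(Arc(head=head, child=i, label=dep))
    return arcs


def describe(words: List[str], arc: Arc) -> str:
    """Render an arc as 'head -[label]-> child' plus which way the arrow points."""
    # Assumption: "left" means the dependent sits to the left of its head.
    direction = "left" if arc.child < arc.head else "right"
    return f"{words[arc.head]} -[{arc.label}]-> {words[arc.child]} ({direction})"


# Parse copied from the terminal output above (head texts converted to indices).
WORDS = ["no", "free fluid", "in", "the", "pelvis"]
HEADS = [1, 1, 1, 4, 2]
DEPS = ["negate", "ROOT", "prep", "-", "refer"]

if __name__ == "__main__":
    for arc in arcs_from_parse(HEADS, DEPS):
        print(describe(WORDS, arc))
    # prints:
    # free fluid -[negate]-> no (left)
    # free fluid -[prep]-> in (right)
    # pelvis -[-]-> the (left)
    # in -[refer]-> pelvis (right)
```

Under this convention, the arc for "the" should have its arrowhead on "the" and start at "pelvis"; a suggestion drawn the other way round would be the flipped case described above.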
After going through ~500 examples in Prodigy, I had to reject more than 50% of the label candidates because they had the correct label but the opposite direction, and I spot-checked some of them to make sure my model should have predicted the correct arc. I tried a quick dep.batch-train to see whether it would improve accuracy given how I think I should be accepting/rejecting these examples, but my model's accuracy decreased on every one of the 10 iterations.
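One way to put a number on the "correct label, opposite direction" rejections is to compare each suggested arc against the arcs I expect. The sketch below assumes each suggestion has already been reduced to a (head_index, child_index, label) triple; I have not checked the exact field names Prodigy uses for its dep tasks, and classify, tally and the sample suggestions are hypothetical, made up for illustration. The gold arcs come from the terminal output above.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Hypothetical representation of one arc: (head_index, child_index, label)
Triple = Tuple[int, int, str]


def classify(candidate: Triple, gold: Set[Triple]) -> str:
    """Compare one suggested arc against the gold arcs for the same sentence."""
    head, child, label = candidate
    if candidate in gold:
        return "correct"
    if (child, head, label) in gold:
        return "flipped"  # right label, but head and child are swapped
    if any(h == head and c == child for h, c, _ in gold):
        return "wrong label"  # right direction, different label
    return "not in gold"


def tally(candidates: Iterable[Triple], gold: Set[Triple]) -> Counter:
    """Count how many suggestions fall into each category."""
    return Counter(classify(c, gold) for c in candidates)


# Gold arcs for "no free fluid in the pelvis", taken from the terminal output.
GOLD = {(1, 0, "negate"), (1, 2, "prep"), (4, 3, "-"), (2, 4, "refer")}

if __name__ == "__main__":
    # Made-up suggestions, only to show the categories.
    suggestions = [(4, 3, "-"), (3, 4, "-"), (4, 2, "refer")]
    print(tally(suggestions, GOLD))
```

If "flipped" dominates the tally, the rejections are about direction rather than labels, which is the pattern I'm seeing.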
My question, then: am I supposed to ignore the arc direction in Prodigy? Or is it expected behaviour for Prodigy to suggest incorrect arcs intentionally?