Dependency Parsing - ROOT problem

Hey Prodigy Team,

I am working on a project that requires annotators to correct dependency parses for Danish texts from most genres of the Danish Gigaword Corpus, and most of the texts are several sentences long. When I run Prodigy with the dep.correct recipe, it does one of two things:

  1. With the unsegmented option, it loads all sentences in the text, but I can't continue if there are multiple roots (which is understandable).
  2. Without the unsegmented option, it only loads the first sentence of each text and skips the rest of the text.

My current workaround would be to split every text into individual sentences, one sentence per line in the JSONL input file, since the recipe only seems to take in one sentence per text.
Is there a solution that doesn't require this split?
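If splitting does turn out to be necessary, it is easy to script. The sketch below assumes the input is a list of dicts with a "text" key (the usual JSONL shape) and uses a naive regex splitter purely as a placeholder: it will, for instance, wrongly split after abbreviations such as "f.eks.". For real Danish text you would want a proper sentence segmenter, ideally the same one the parser's pipeline uses, so the split sentences match what the model sees. The "doc_id" and "sent_id" meta keys are hypothetical, added only so each sentence can be traced back to its source text.

```python
import json
import re
from typing import Callable, Iterable, Iterator, List

# Naive placeholder: split after ".", "!" or "?" followed by whitespace.
# ASSUMPTION: swap this for a real segmenter for actual Danish data.
_SENT_END = re.compile(r"(?<=[.!?])\s+")


def naive_split(text: str) -> List[str]:
    return [s for s in _SENT_END.split(text.strip()) if s]


def split_examples(
    examples: Iterable[dict],
    splitter: Callable[[str], List[str]] = naive_split,
) -> Iterator[dict]:
    """Yield one task per sentence, remembering which text it came from."""
    for doc_id, eg in enumerate(examples):
        for sent_id, sent in enumerate(splitter(eg["text"])):
            meta = dict(eg.get("meta", {}))
            # "doc_id" / "sent_id" are hypothetical keys for re-joining later.
            meta.update({"doc_id": doc_id, "sent_id": sent_id})
            yield {"text": sent, "meta": meta}


def to_jsonl_lines(tasks: Iterable[dict]) -> List[str]:
    # ensure_ascii=False keeps æ, ø and å readable in the output file.
    return [json.dumps(task, ensure_ascii=False) for task in tasks]
```

For example, {"text": "Det regner. Vi går hjem."} becomes two tasks, "Det regner." and "Vi går hjem.", with sent_id 0 and 1; writing the returned lines to a file gives one sentence per JSONL line.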

Regards,
Stephan

Ah, maybe we should disable the ROOT validation in these cases, or at least provide an option to disable it manually so you can annotate examples with multiple roots. You can easily make this adjustment yourself in the meantime: run prodigy stats to find the location of your Prodigy installation, then edit the recipe in recipes/dep.py and comment out this line:

"update": make_update if update else None,

A more elegant solution could be to add a key like "n_sents" to all outgoing examples that holds the number of sentences in the given example. In validate_answer, we could then ensure that the text doesn't contain more ROOTs than sentences. This would still allow one sentence with two roots and another with none, but it's better than no check at all. Alternatively, we could implement a more thorough check that actually compares the ROOT indices to the sentence boundaries, but I'm not sure that's worth the potential runtime cost.
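A minimal sketch of both checks, under stated assumptions: each annotated arc sits in eg["relations"] as a dict with integer "head" and "child" token indices and a "label"; a ROOT is a token attached to itself (as discussed further down this thread); and validate_answer rejects an answer by raising an exception whose message is shown to the annotator. "n_sents" is the key proposed above; "sent_starts" (the token index where each sentence begins) is a hypothetical extra key that only the strict check needs.

```python
from bisect import bisect_right
from typing import Dict, List


def root_tokens(eg: Dict) -> List[int]:
    # ASSUMPTION: a ROOT is an arc labelled "ROOT" whose head == child.
    return [
        rel["child"]
        for rel in eg.get("relations", [])
        if rel.get("label") == "ROOT" and rel["head"] == rel["child"]
    ]


def validate_answer(eg: Dict) -> None:
    """Cheap check: no more ROOTs than sentences.

    "n_sents" is the proposed key; examples without it are treated
    as a single sentence.
    """
    n_sents = eg.get("n_sents", 1)
    n_roots = len(root_tokens(eg))
    if n_roots > n_sents:
        raise ValueError(f"Found {n_roots} ROOTs but only {n_sents} sentence(s).")


def validate_answer_strict(eg: Dict) -> None:
    """Stricter check: exactly one ROOT inside each sentence.

    "sent_starts" is hypothetical: the sorted token index at which each
    sentence begins, starting with 0.
    """
    starts = eg["sent_starts"]
    counts = [0] * len(starts)
    for tok in root_tokens(eg):
        # Find the last sentence start that is <= the ROOT token's index.
        counts[bisect_right(starts, tok) - 1] += 1
    bad = [i for i, n in enumerate(counts) if n != 1]
    if bad:
        raise ValueError(f"Sentences without exactly one ROOT: {bad}")
```

The strict version costs one binary search per ROOT arc, so its runtime overhead should be small; the real work is populating "sent_starts", which with spaCy could come from [sent.start for sent in doc.sents] when the examples are created.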

As for only the first sentence of each text being loaded: that's definitely strange! I don't immediately understand why this would be happening :thinking: We'll look into it. In the meantime, you could also add some print statements to the recipe to see what's going on.

Hi Ines,

Thank you for the reply. I tried to comment out

"update": make_update if update else None,

However, that didn't work. I noticed that the next line seemed more relevant and thought you might have copied the wrong line. Commenting out the following line solves the problem, so is that the correct line?

"validate_answer": validate_answer if "ROOT" in labels else None,

Regards,
Stephan

Sorry, looks like I must have copy-pasted this wrong! I definitely meant the validate_answer line!
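The opt-out option mentioned at the start of the thread could be a simple switch around the validate_answer line quoted above, rather than a commented-out line. A minimal sketch: pick_validator and check_root are hypothetical names (not part of Prodigy), and check_root would be wired to a recipe option of your choosing.

```python
from typing import Callable, Collection, Dict, Optional

Validator = Callable[[Dict], None]


def pick_validator(
    labels: Collection[str],
    validate_answer: Validator,
    check_root: bool = True,
) -> Optional[Validator]:
    """Return the validator to register, or None to turn the check off.

    Mirrors the recipe line discussed in this thread,
        "validate_answer": validate_answer if "ROOT" in labels else None,
    with a hypothetical check_root switch added on top.
    """
    if check_root and "ROOT" in labels:
        return validate_answer
    return None
```

In the recipe's returned components, "validate_answer": pick_validator(labels, validate_answer, check_root) would keep the check on by default while letting annotators of multi-sentence texts turn it off.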

No problem, I just copied the correct line :slight_smile:
I found another issue, though. If I want to reassign the ROOT, that doesn't seem possible: the ROOT arc connects a token to itself, but Prodigy won't let me attach a token to itself, so I can't correct wrong ROOT labels (not even in the examples on the Prodigy website). Is there a way around this?

You should be able to double-click (or double-tap) on a token to attach it to itself!

Hi Ines,

I got it working, thanks! I think because I'm using a laptop, my double-click wasn't fast enough, so when I was trying it last night, it wasn't attaching to ROOT.
Anyway, thanks for all the help. You've made a great product, so keep up the good work :slight_smile:

Regards,
Stephan
