Possible bug - relations view - cannot select certain tokens

Hi!

I am experiencing weird behaviour in the relations view: I cannot select certain tokens / spans to annotate relations, only a few seemingly random ones.

Related gif too big to upload

The command I am running is:

prodigy rel.manual sn_ref_rel blank:it dataset:sn_ref --label PARENT,SIBLING,SAME_AS --span-label J-REF,L-REF

With the following software/system specs:

Ubuntu 20.04
Prodigy 1.11.7
Python 3.9.5

However, when I run exactly the same command, on exactly the same dataset, on exactly the same database, but with the following specs:

macOS Monterey 12.1
Prodigy 1.11.7
Python 3.9.2

Everything works as expected.

Related gif too big to upload

Any ideas?

Thanks in advance

Could you send me the one example that shows this behaviour? There could be something going awry with the Tokenizer but I agree it's a bit strange.

Could you also share the spaCy version that's installed with Prodigy?

Sure, here's the same example shown in the gifs: example.jsonl

In both cases I'm using spacy==3.2.2

When I run your example.jsonl through my PopOS-hosted Prodigy server, I can confirm that I see the same issue appear. But part of that makes sense, because the "tokens" key in your .jsonl file predefines what the tokens ought to be.

When I make a new example.jsonl that just has the text provided, just this:

{"text": "CONSIDERATO IN DIRITTO\n3. Il ricorso deve dichiarato inammissibile, per essere manifestamente infondati tutti i motivi dedotti. Difatti rileva la Corte che nel ricorso viene prospettata una valutazione delle prove diversa e pi\u00f9 favorevole al ricorrente rispetto a quella accolta nella sentenza di primo grado e confermata dalla sentenza di appello. In sostanza si ripropongono questioni di mero fatto che implicano una valutazione di merito preclusa in sede di legittimit\u00e0, a fronte di una motivazione esaustiva, immune da vizi logici; specificamente dalla lettura della sentenza della Corte territoriale non emergono, nella valutazione delle prove, evidenti illogicit\u00e0, risultando, invece, l'esistenza di un logico apparato argomentativo sulla base del quale si \u00e8 pervenuti alla conferma della sentenza di primo grado con riferimento alla responsabilit\u00e0 dell'imputata in ordine ai fatti a lei ascritti; in tal senso viene significativamente evidenziato, quanto al primo motivo di ricorso, che l'imputata si era chiaramente opposta all'azione dei pubblici ufficiali che si erano recati presso la sua abitazione per eseguire una perquisizione. Ed anche con riferimento al secondo motivo di ricorso, la sentenza impugnata evidenzia, senza contraddizioni, come la responsabilit\u00e0 dell'imputata in ordine al delitto di furto sia emersa sulla base dell'individuazione fotografica effettuata dalla persona offesa e delle dichiarazioni rese da quest'ultima; e quanto al delitto di ricettazione, la corte territoriale d\u00e0 atto con adeguata motivazione dei criteri utilizzati per attribuire il fatto contestato alla persona dell'attuale imputato. Tutto ci\u00f2 preclude qualsiasi ulteriore esame da parte della Corte di legittimit\u00e0 (Sez. U n. 12 del 31/5/2000, Jakani, Rv. 216260; Sez.. U. n. 47289 del 24.9.2003, Petrella, Rv. 226074).\n4. Alla dichiarazione di inammissibilit\u00e0 del ricorso consegue, per il disposto dell'art. 
616 c.p.p., la condanna della ricorrente al pagamento delle spese processuali nonch\u00e9 al versamento, in favore della Cassa delle ammende, di una somma che, considerati i profili di colpa emergenti dal ricorso, si determina equitativamente in Euro 1000,00.\nP.Q.M.\nDichiara inammissibile il ricorso e condanna la ricorrente al pagamento delle spese processuali e della somma di Euro 1000,00 in favore della Cassa delle ammende."}

Then it seems to render "fine" when I run:

python -m prodigy rel.manual sn_ref_rel blank:it example.jsonl --label PARENT,SIBLING,SAME_AS --span-label J-REF,L-REF

This suggests that it's possibly a mishap that happened earlier, when the dataset was created. Could you confirm that, if you render an example that only contains the "text" key, it splits up the tokens as you'd expect?

Unfortunately the problem does not go away even if I remove the "tokens" key from my example.jsonl. Besides, removing it would be pretty inconvenient, as I use a dataset where I performed NER annotations as a source; I would need to db-out it, strip the tokens from the .jsonl, then use that as an input...

The tokenizer seems to be consistent with 394

While I was exploring this further, I noticed something interesting about the way I've been calling rel.manual locally.

This line seems to cause the rendering error on my machine.

python -m prodigy rel.manual sm-ref-rel blank:it their-example.jsonl --label PARENT,SIBLING,SAME_AS --span-label J-REF,L-REF

But this one does not.

python -m prodigy rel.manual sm-ref-rel blank:it their-example.jsonl --label FOO --span-label BAR

The difference is in the label names. This led me to dive a bit deeper into your json file.

Before, my assumption was that this issue was related to the tokenizer, but now I see that you have some spans corresponding with the large token in your GIF. Note that these spans carry the J-REF or the L-REF label.

Here are the spans that I found:

"spans": [
    {
      "start": 1718,
      "end": 1764,
      "text": "Sez. U n. 12 del 31/5/2000, Jakani, Rv. 216260",
      "source": "./models/les_model_ref_tmp_3/model-best",
      "input_hash": 1945267493,
      "token_start": 275,
      "token_end": 287,
      "label": "J-REF"
    },
    {
      "start": 1766,
      "end": 1819,
      "token_start": 289,
      "token_end": 301,
      "label": "J-REF"
    },
    {
      "start": 1906,
      "end": 1914,
      "text": "art. 616",
      "source": "./models/les_model_ref_tmp_3/model-best",
      "input_hash": 1945267493,
      "token_start": 319,
      "token_end": 320,
      "label": "L-REF"
    },
    {
      "start": 1915,
      "end": 1921,
      "text": "c.p.p.",
      "source": "./models/les_model_ref_tmp_3/model-best",
      "input_hash": 1945267493,
      "token_start": 321,
      "token_end": 322,
      "label": "L-REF"
    }
  ],

This would explain why you see some of the tokens "clump" together. It does not explain why you see a difference between macOS and Ubuntu. So that's certainly still worth investigating further.
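To double-check that the spans in a record really line up with the text, something like the following rough sketch could help. It only uses the "text", "spans", "start", "end" and "label" keys visible in the example above; the function name check_spans and the toy record are made up for illustration.

```python
def check_spans(example):
    """Report, for each span, the text it covers and whether it matches
    the span's own "text" value (if the span has one)."""
    text = example["text"]
    report = []
    for span in example.get("spans", []):
        snippet = text[span["start"]:span["end"]]
        expected = span.get("text")  # some spans may not carry a "text" key
        matches = expected is None or expected == snippet
        report.append((span["label"], snippet, matches))
    return report


# Toy record (hypothetical, not taken from the real dataset)
toy = {
    "text": "See art. 616 c.p.p. here",
    "spans": [
        {"start": 4, "end": 12, "text": "art. 616", "label": "L-REF"},
        {"start": 13, "end": 19, "label": "L-REF"},
    ],
}

for label, snippet, matches in check_spans(toy):
    print(label, repr(snippet), "ok" if matches else "MISMATCH")
```

Any span reported as a mismatch would point at character offsets that no longer agree with the text.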

A temporary solution for you might be to run the command without referring to the spans currently in your json file, via something like:

python -m prodigy rel.manual sm-ref-rel blank:it their-example.jsonl --label PARENT,SIBLING,SAME_AS --span-label FOOBAR

I want to admit that this feels a bit like a "hack". Effectively, we'd be pretending like there's a span with the label FOOBAR such that Prodigy ignores all the other spans in the dictionary.

A better long-term solution might be to filter out the spans beforehand for this labelling task. This would, however, require some manual work via the db-out command, and I may be missing some context on the bigger picture to know for sure.
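A minimal sketch of that filtering step, assuming the dataset has already been exported to a JSONL file with db-out. The function names and file paths are placeholders, and the script only touches the "spans" and "label" keys shown in the example above.

```python
import json


def drop_spans(example, labels_to_drop):
    """Return a copy of the record without spans carrying the given labels."""
    cleaned = dict(example)
    cleaned["spans"] = [
        span for span in example.get("spans", [])
        if span.get("label") not in labels_to_drop
    ]
    return cleaned


def filter_jsonl(in_path, out_path, labels_to_drop):
    """Read a JSONL file line by line and write the filtered records."""
    with open(in_path, encoding="utf8") as fin, \
            open(out_path, "w", encoding="utf8") as fout:
        for line in fin:
            if not line.strip():
                continue  # skip blank lines
            record = drop_spans(json.loads(line), labels_to_drop)
            fout.write(json.dumps(record, ensure_ascii=False) + "\n")


# Hypothetical usage; both file names are placeholders:
# filter_jsonl("exported.jsonl", "filtered.jsonl", {"J-REF", "L-REF"})
```

The filtered file could then be passed to rel.manual in place of the original source.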

Could you confirm if the issue persists when you run the command with the --span-label FOOBAR setting? I'd love to confirm if the spans in the json file are the culprit of the behavior.

Thank you for diving deep into this.

I tried changing the label names, and I still encounter the same problem, though with one difference.

If I run the command with --span-label FOO, some of the selectable tokens are (starting from the beginning): CONSIDERATO, Il, deve, inammissibile, ,

While if I run the command with --span-label J-REF,L-REF, the selectable tokens change: CONSIDERATO, ., ricorso, deve

In any case, the problem is, I cannot change the --span-label, because it is exactly on those labels that I have to annotate the relations.

I've done some digging around and I learned that if there are "spans" present in the data, they'll be rendered as a merged span so you can annotate relationships between spans. If you want relationships between tokens, you'd have to remove the span.
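To make the merging idea concrete, here is a purely illustrative sketch. It is not Prodigy's actual rendering code; it just shows how tokens covered by a span could collapse into a single selectable unit. It assumes "token_end" is inclusive and that spans do not overlap, and the function name selectable_units is made up.

```python
def selectable_units(n_tokens, spans):
    """Group token indices into units: every token covered by a span
    is merged into one unit, all other tokens stay on their own."""
    span_by_start = {span["token_start"]: span for span in spans}
    units = []
    i = 0
    while i < n_tokens:
        span = span_by_start.get(i)
        if span is not None:
            # Assumption: token_end is inclusive
            units.append((span["label"], list(range(i, span["token_end"] + 1))))
            i = span["token_end"] + 1
        else:
            units.append((None, [i]))
            i += 1
    return units


# Toy example with 5 tokens and one span over tokens 1-2
print(selectable_units(5, [{"token_start": 1, "token_end": 2, "label": "L-REF"}]))
```

In this picture, removing the span from the list is what makes tokens 1 and 2 individually selectable again.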

There is, however, one setting that might be able to help you out. If you add a prodigy.json file in the root of your folder, you can pass extra settings to the recipe. In particular, you might want to add this one:

{
    "relations_span_labels": ["J-REF", "L-REF"]
}

This setting is described in more detail here. Effectively this allows us to label both the relationship and the span from the same interface. That also means that we're able to correct a wrong span when we see one.

Here's what I see render when I do that.

If I were now to scroll down to the span of interest, I can select it and follow up by pressing the garbage bin button.

When you click it, the span splits up which means that you can move on to setting relationships between tokens.

This setting may not fix everything you're interested in, but it might offer a middle ground. This way you'll be able to have spans that catch the attention of an annotator, but you will have to remove the span if you want to set relationships between tokens that reside inside of spans.

Hello and thank you again!

I already have that option enabled: if you look closely at my gifs, you can see that I have the possibility to annotate spans as well.

The problem that remains is that, for some reason, I cannot create relations between those spans (nor between some seemingly random tokens, but I'm not interested in that; I just care about the spans).

Ah, I fear that I misinterpreted the GIF then. It's hard to show a click event in a GIF, but this helps the interpretation.

Just to check, what browsers are you using on both machines?

Sorry for not being clearer!

If it might help, in the gifs you might notice that the cursor does not change to show when it's clickable.

I'm using Brave on both machines. And I can't believe it, because it doesn't make sense: I am trying with Safari now and it works.

However, the Ubuntu machine is a remote server, so I access the annotation page from my mac, with the exact same browser that I use to annotate locally. Except the bug is experienced only on the remote one! That is why I assumed it was not a browser problem. I have now tried with Safari, Firefox and Chrome and it seems to work fine.

I'm sorry this kept you busy, I should have tried other browsers earlier... but again, the same browser behaved fine on localhost; I have no idea why the problem appeared only on the remote host. I really cannot explain how this is possible. Let me know if you find an explanation :sweat_smile:

No worries about the GIFs! They totally help paint the picture; it's just that I looked at them too hastily. :sweat_smile:

I'm happy to hear that it seems to be a browser-related issue. I'm also running Prodigy on a server that I access via SSH locally. I tried Firefox, Safari and Chrome, and these all seemed to work out of the box.

I'll report back to the Prodigy devs that we may want to check if we can find a cause in the Brave browser, but for now, I'm happy that we figured out what was happening. :smiley:
