Does Prodigy support n-ary relations?

It does not seem that Prodigy supports annotating n-ary relations, e.g. in the simple case of SVOs (Subject-Verb-Object, as in, for instance, Jurafsky ed. 3, 13.1.1 Word Order Typology).

Hi Yakov,

just to double-check, you're referring to this?

[Screenshot: CleanShot 2023-06-15 at 15.36.37]

It's not supported natively, no. I am wondering if you might be able to hack something together using a custom recipe though.

Demo

I have this examples.jsonl file locally, based on the image from the book.

{"text": "The green witch is at home this week.\nDiese Woche ist die grüne Hexe zu Hause."}
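
If you have many sentence pairs, you could generate this file with a small script instead of writing it by hand. This is only a minimal sketch: the helper names `make_task` and `write_jsonl` are made up for illustration, and it assumes each pair should end up in a single "text" field joined by a newline, as in the example above.

```python
import json


def make_task(first, second, sep="\n"):
    """Join two parallel sentences into one task dict, separated by `sep`."""
    # Refuse inputs that already contain the separator, otherwise the two
    # halves cannot be told apart again after annotation.
    if sep in first or sep in second:
        raise ValueError("input sentences must not contain the separator")
    return {"text": f"{first}{sep}{second}"}


def write_jsonl(path, pairs):
    """Write one JSON object per line, keeping non-ASCII characters readable."""
    with open(path, "w", encoding="utf-8") as f:
        for first, second in pairs:
            f.write(json.dumps(make_task(first, second), ensure_ascii=False) + "\n")


if __name__ == "__main__":
    write_jsonl(
        "examples.jsonl",
        [("The green witch is at home this week.",
          "Diese Woche ist die grüne Hexe zu Hause.")],
    )
```

Rejecting sentences that already contain a newline up front keeps the later split unambiguous.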

Notice how I put a newline in there; that can be useful during rendering. Next, I use this dataset via the rel.manual recipe, which uses the relations interface.

python -m prodigy rel.manual subject-verb-object en_core_web_md examples.jsonl --label relation --span-label segment --wrap

This allows me to select "segments", which behave basically like text spans.

[Screenshot: CleanShot 2023-06-15 at 15.48.01]

Right now I'm using a general "segment" label, but you could try to be more specific. Next, I can make an association between the two "languages".

[Screenshot: CleanShot 2023-06-15 at 15.49.27]

This approach is a bit of a "hack" for a few reasons.

  1. You're using a single interface for two languages by joining them with a newline character. So when you call db-out you will need to do some post-processing, and depending on your input data you might also need to be careful with newlines in your original text.
  2. This interface works, but will likely get annoying for longer sentences.
  3. You're still using a single tokeniser here, even if you're using two languages. This might be fine depending on your language, but it might also cause a whole bunch of subtle issues later down the line.
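
To give an idea of the post-processing from point 1, here is a rough sketch that splits an exported task back into its two halves and lists the annotated relations as text. It assumes the db-out output was saved to a file such as `annotations.jsonl` (a hypothetical name), and that each record carries "text", "spans" with character offsets "start"/"end", and "relations" with "head_span"/"child_span" entries. Those field names are my assumption about what rel.manual exports, so please check them against your own data first. The function names are made up for illustration.

```python
import json


def load_accepted(path):
    """Read a JSONL export and keep only the accepted examples."""
    with open(path, encoding="utf-8") as f:
        tasks = [json.loads(line) for line in f if line.strip()]
    return [t for t in tasks if t.get("answer") == "accept"]


def split_bilingual_task(task, sep="\n"):
    """Split one task into two per-language halves at the first separator."""
    text = task["text"]
    cut = text.index(sep)      # raises ValueError if the separator is missing
    offset = cut + len(sep)    # character index where the second language starts
    halves = [
        {"text": text[:cut], "spans": []},
        {"text": text[offset:], "spans": []},
    ]
    for span in task.get("spans", []):
        if span["end"] <= cut:
            # Span lies entirely in the first language: offsets stay as they are.
            shift, target = 0, halves[0]
        elif span["start"] >= offset:
            # Span lies entirely in the second language: shift offsets back.
            shift, target = offset, halves[1]
        else:
            raise ValueError("span crosses the language boundary")
        target["spans"].append(
            {"start": span["start"] - shift,
             "end": span["end"] - shift,
             "label": span["label"]}
        )
    return halves


def relation_pairs(task):
    """Return (head text, label, child text) for every annotated relation."""
    text = task["text"]
    pairs = []
    for rel in task.get("relations", []):
        head, child = rel["head_span"], rel["child_span"]
        pairs.append((text[head["start"]:head["end"]],
                      rel["label"],
                      text[child["start"]:child["end"]]))
    return pairs
```

Splitting at the first newline only works if the original sentences contain no newlines themselves, which is why it pays to be careful with them in the input.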

Does this help?
