It seems that Prodigy does not support annotating n-ary relations, e.g. in the simple case of SVOs (Subject-Verb-Object, as in Jurafsky ed3 13.1.1 Word Order Typology).
Hi Yakov,
just to double-check, you're referring to this?
It's not supported natively, no. I am wondering if you might be able to hack something together using a custom recipe though.
Demo
I have this examples.jsonl file locally, based on the image from the book.
{"text": "The green witch is at home this week.\nDiese Woche ist die grüne Hexe zu Hause."}
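If you have many sentence pairs, you could generate a file like this instead of writing it by hand. The sketch below is only an illustration: the PAIRS list and the helper names make_task and write_jsonl are hypothetical. It joins each pair with a newline and refuses sentences that already contain one, because the newline is what tells the two languages apart later on.

```python
import json
from pathlib import Path

# Hypothetical parallel sentence pairs (source, target); replace with your own data.
PAIRS = [
    ("The green witch is at home this week.",
     "Diese Woche ist die grüne Hexe zu Hause."),
]


def make_task(source: str, target: str) -> dict:
    """Join a sentence pair into a single task, separated by a newline."""
    # The newline marks the language boundary, so it must not already
    # appear inside either sentence.
    for sentence in (source, target):
        if "\n" in sentence:
            raise ValueError(f"sentence contains a newline: {sentence!r}")
    return {"text": f"{source}\n{target}"}


def write_jsonl(pairs, path: Path) -> None:
    """Write one JSON task per line; json.dumps escapes the newline as \\n."""
    with path.open("w", encoding="utf8") as f:
        for source, target in pairs:
            task = make_task(source, target)
            # ensure_ascii=False keeps characters like "ü" readable in the file.
            f.write(json.dumps(task, ensure_ascii=False) + "\n")
```

Calling write_jsonl(PAIRS, Path("examples.jsonl")) would produce a file with the same shape as the line above.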
Notice how I put a newline between the two sentences; that can be useful during rendering. Next, I load this file with the rel.manual recipe, which uses the relations interface.
python -m prodigy rel.manual subject-verb-object en_core_web_md examples.jsonl --label relation --span-label segment --wrap
This allows me to select "segments", which are basically text spans.
Right now I'm using a general "segment" label, but you could try to be more specific. Next, I can draw a relation between segments in the two "languages".
This approach is a bit of a "hack" for a few reasons.
- You're putting two languages into a single interface by separating them with a newline character. So when you call db-out you will need to do some post-processing, and depending on your input data you might also need to be careful with newlines in your original text.
- This interface works, but will likely get annoying for longer sentences.
- You're still using a single tokeniser here, even though you're working with two languages. This might be fine depending on the languages involved, but it might also cause a whole bunch of subtle issues down the line.
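The db-out post-processing could look roughly like the sketch below. It assumes each exported record has a "text" field and a "relations" list whose entries carry "head_span" and "child_span" dicts with character offsets "start" and "end"; please check that against your own export before relying on it. The function names are hypothetical. It uses the newline offset to decide which language each end of a relation belongs to.

```python
import json
from typing import Iterable, Iterator


def split_point(text: str) -> int:
    """Character offset of the newline that separates the two languages."""
    idx = text.find("\n")
    if idx == -1:
        raise ValueError("expected a newline separating the two sentences")
    return idx


def side(span: dict, boundary: int) -> str:
    """Return 'source' if the span ends before the newline, 'target' if it starts after it."""
    if span["end"] <= boundary:
        return "source"
    if span["start"] > boundary:
        return "target"
    raise ValueError(f"span crosses the newline: {span}")


def aligned_pairs(record: dict) -> Iterator[dict]:
    """Yield one source/target pair per relation that links the two languages."""
    text = record["text"]
    boundary = split_point(text)
    for rel in record.get("relations", []):
        a, b = rel["head_span"], rel["child_span"]
        # Key each span by its side; the relation direction doesn't matter.
        sides = {side(a, boundary): a, side(b, boundary): b}
        if set(sides) != {"source", "target"}:
            continue  # both ends in the same language: not an alignment link
        src, tgt = sides["source"], sides["target"]
        yield {
            "label": rel.get("label"),
            "source": text[src["start"]:src["end"]],
            "target": text[tgt["start"]:tgt["end"]],
            "source_offsets": (src["start"], src["end"]),
            # Re-base target offsets so they index into the target sentence alone.
            "target_offsets": (tgt["start"] - boundary - 1, tgt["end"] - boundary - 1),
        }


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSONL lines, skipping blank ones."""
    for line in lines:
        if line.strip():
            yield json.loads(line)
```

For example, after python -m prodigy db-out subject-verb-object > annotations.jsonl you could open that file and loop over aligned_pairs(record) for each record from read_jsonl(f). Spans that cross the newline raise an error rather than being silently dropped, since they usually point to an annotation mistake.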
Does this help?