Advanced Relation Labeling Recipe

Hi Folks,

Your tool is pretty nice, and we are currently negotiating internally to buy a license. Before we do that, we are evaluating possible requirements. One of them is whether it is also possible to label German texts, especially when it comes to labeling relations.

As far as I could see in the demo, you can link entities via spans directly from the text. Is there a possibility for relations, too?

Take the sentence:
Ute ist in Deutschland geboren.
We would like to label the relation ist_geboren_in (not by drawing a spanning arrow, but by clicking on the entities, since they are spread across the sentence).

Furthermore, the question is whether you can also nest entities on the fly (sub-entities). Example:

There are many <fruits (like <bananas>, <kiwis> and <cherries>)> to buy in the supermarket. The outer span would be the entity food, and each individual fruit inside it would be the entity fruit.

Thank you very much for your help.

Hi! In general, Prodigy itself doesn't care about the language of the texts you use, so you can definitely label German texts. If you're working with pretrained models, their capabilities will of course vary by language, resources and library you use (spaCy or some other custom implementation).

I'm not 100% sure I understand the exact goal and data format you're trying to extract from the first example. But one approach would be to use the root ("ist") as the anchor, have one relation type that connects "ist" + "geboren", and two relation types that attach "Ute" and "Deutschland" to the root (e.g. something along the lines of "subject" and "object").
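To make the root-anchored idea more concrete, here is a minimal sketch in plain Python of what such an annotation could look like and how a triple could be read back out of it. The label names (SUBJECT, OBJECT, VERB_PART) and the helper extract_triple are made up for illustration, and the head/child/label fields only loosely mirror how relation annotations are typically stored, so treat the exact format as an assumption and check it against your own exported data.

```python
# Illustrative sketch only: labels and helper names are hypothetical.
text = "Ute ist in Deutschland geboren."

# Naive tokenization for this one sentence: split on spaces, keep "." separate.
words = text.rstrip(".").split() + ["."]
tokens = []
offset = 0
for i, word in enumerate(words):
    start = text.index(word, offset)
    end = start + len(word)
    tokens.append({"text": word, "start": start, "end": end, "id": i})
    offset = end

# Look up token ids by their text (fine here, since no token repeats).
index = {t["text"]: t["id"] for t in tokens}
root = index["ist"]

# Every relation is anchored on the root token "ist".
relations = [
    {"head": root, "child": index["geboren"], "label": "VERB_PART"},
    {"head": root, "child": index["Ute"], "label": "SUBJECT"},
    {"head": root, "child": index["Deutschland"], "label": "OBJECT"},
]


def extract_triple(tokens, relations, root):
    """Collect the children of the root by label and build one triple."""
    by_label = {
        rel["label"]: tokens[rel["child"]]["text"]
        for rel in relations
        if rel["head"] == root
    }
    predicate = tokens[root]["text"] + "_" + by_label["VERB_PART"]
    return (by_label["SUBJECT"], predicate, by_label["OBJECT"])


triple = extract_triple(tokens, relations, root)
print(triple)
```

Anchoring everything on one token means the annotator only ever clicks on tokens and never has to draw a relation between "Ute" and "Deutschland" directly. The final relation name (for example ist_geboren_in) can then be derived in a post-processing step like the one above.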

The NER interfaces (ner_manual and span highlighting mode in the relations UI) are designed with creating data to train named entity recognition models in mind, so they produce token-based tags, which can't overlap (typically the requirement for NER). However, you can always make multiple passes over the data and start by annotating the outer layer like sentence fragments and full phrases, and then annotate the actual named entities and proper nouns in the second step.
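As a rough illustration of the two-pass idea, the sketch below keeps the outer and inner annotations as separate lists of character-offset spans and nests them afterwards by containment. The helper names make_span and nest_spans, the "children" key and the FOOD/FRUIT labels are assumptions for this example, not part of any library.

```python
# Illustrative sketch only: helper names and the "children" key are hypothetical.
text = (
    "There are many fruits (like bananas, kiwis and cherries) "
    "to buy in the supermarket."
)


def make_span(text, phrase, label):
    """Build a character-offset span for the first occurrence of phrase."""
    start = text.index(phrase)
    return {"start": start, "end": start + len(phrase), "label": label}


# Pass 1: annotate the outer layer (the full phrase).
outer_pass = [
    make_span(text, "fruits (like bananas, kiwis and cherries)", "FOOD")
]

# Pass 2: annotate the individual entities inside it.
inner_pass = [
    make_span(text, word, "FRUIT") for word in ("bananas", "kiwis", "cherries")
]


def nest_spans(outer_spans, inner_spans):
    """Attach each inner span to the outer span that fully contains it."""
    nested = []
    for outer in outer_spans:
        children = [
            inner
            for inner in inner_spans
            if outer["start"] <= inner["start"] and inner["end"] <= outer["end"]
        ]
        nested.append({**outer, "children": children})
    return nested


nested = nest_spans(outer_pass, inner_pass)
for outer in nested:
    print(outer["label"], "->", text[outer["start"]:outer["end"]])
    for child in outer["children"]:
        print("  ", child["label"], "->", text[child["start"]:child["end"]])
```

Because each pass on its own contains no overlapping spans, both layers stay compatible with token-based tagging, and the nesting only exists in the merged result.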