I was wondering if you could help me with some questions about annotating nested labels in Prodigy. I want to annotate with rel.manual (I will be annotating both relations and labels). For the labels, what I would like to do is the following (using a sentence from a Portuguese law):
< DEF > < SCOPE > Para os efeitos da supervisão complementar de empresas de seguros e de resseguros que fazem parte de um grupo segurador < /SCOPE > considera-se: a) < DEF0 >Empresa de seguros < /DEF0 > < DEF1 >a empresa prevista na < LREF > alínea b) do n.o 1 do artigo 2.o < /LREF > < /DEF1 > < /DEF >
Here, < LABEL > marks where a label starts and < /LABEL > marks where it ends.
I looked in the support forum but couldn't find anything like this. Does Prodigy support this, or is it not possible with the tool?
Hi! Prodigy's interfaces for NER (and the span annotation features within the relations UI) are set up to work best with the way named entity recognition models generally work: contiguous, non-overlapping spans, with clear beginnings and endings. If the relation annotation has to represent any number of nested spans with each span and token being potentially connected to each other, this easily gets very messy and inefficient, and there's usually a better way to structure the task that's both easier to annotate and easier to visualise.
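To make the "contiguous, non-overlapping" constraint concrete, here's a minimal sketch in plain Python (no Prodigy imports) that checks a list of spans for overlaps. The span dicts mimic the `start`/`end`/`label` keys of Prodigy's span format. The function name `find_overlaps` and the character offsets are made up for illustration and are not computed from your sentence.

```python
from itertools import combinations


def find_overlaps(spans):
    """Return label pairs whose character ranges overlap (nesting included)."""
    clashes = []
    for a, b in combinations(spans, 2):
        # Two half-open ranges [start, end) overlap when each one
        # starts before the other one ends.
        if a["start"] < b["end"] and b["start"] < a["end"]:
            clashes.append((a["label"], b["label"]))
    return clashes


# Hypothetical offsets that only imitate the nesting in the example above.
spans = [
    {"start": 0, "end": 120, "label": "DEF"},
    {"start": 0, "end": 60, "label": "SCOPE"},
    {"start": 80, "end": 120, "label": "DEF1"},
    {"start": 100, "end": 120, "label": "LREF"},
]

for pair in find_overlaps(spans):
    print(pair)
```

Every pair this prints is a combination that a flat, NER-style span layer can't hold at the same time, which is why the nested scheme needs to be split up in some way.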
In your case, it looks like you're mostly looking to highlight and connect text fragments and sections, right? For cases like this, we usually recommend making multiple passes over the data and focusing on a different objective in each pass: for example, extract the <DEF> blocks first, then collect more fine-grained annotations. When using the relations interface, it also makes sense to focus only on the spans that you actually want to connect.
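One possible way to decide which labels go into which pass is to group them by nesting depth, so that each pass only contains spans that don't sit inside each other. This is just a sketch in plain Python under that assumption, not a Prodigy feature: `nesting_depth` and `labels_per_pass` are hypothetical helper names, and the offsets are invented to imitate the structure of the example sentence.

```python
def nesting_depth(span, spans):
    """Count how many other spans fully contain this span."""
    depth = 0
    for other in spans:
        if other is span:
            continue
        same_range = (other["start"], other["end"]) == (span["start"], span["end"])
        contains = other["start"] <= span["start"] and span["end"] <= other["end"]
        # Spans with identical ranges are treated as being on the same level.
        if contains and not same_range:
            depth += 1
    return depth


def labels_per_pass(spans):
    """Group labels by nesting depth: index 0 is the outermost pass."""
    passes = {}
    for span in spans:
        passes.setdefault(nesting_depth(span, spans), set()).add(span["label"])
    return [sorted(passes[depth]) for depth in sorted(passes)]


# Hypothetical offsets, chosen only to reproduce the nesting structure.
spans = [
    {"start": 0, "end": 120, "label": "DEF"},
    {"start": 0, "end": 60, "label": "SCOPE"},
    {"start": 70, "end": 78, "label": "DEF0"},
    {"start": 80, "end": 120, "label": "DEF1"},
    {"start": 100, "end": 120, "label": "LREF"},
]

for number, labels in enumerate(labels_per_pass(spans), start=1):
    print(f"pass {number}: {labels}")
```

With a grouping like this, the outermost block gets its own pass, and the labels inside it can be annotated together in a later pass because they no longer overlap with each other.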
How you structure the task also depends on what your model needs to predict later on. For example, it makes sense to predict boundary-sensitive spans that are more like named entities, e.g. person names, as token-based tags, whereas other labels may be predicted as labels applying to a whole sentence or paragraph. This also influences how you structure the annotation.
Btw, not sure if this is relevant for your use case, but here's a thread on annotating relations between longer fragments in legal texts and possible and efficient annotation approaches: Using relations interface for large texts
My annotation is for a single sentence only, so I don't have the problem of annotating large texts.
Making multiple passes is something I think could work. I just have another question. I am not the one doing the annotations, I just launch the Prodigy web servers for 3 different annotators. So if multiple passes are needed, would I have to wait for each annotator to finish the first pass (for example extracting the < def >), save those first-level annotations, and then launch another server for each annotator so they can do the next level of annotation, and so on?
Yes, that's typically what you would do. You don't necessarily have to wait for all examples to be annotated in the first pass, though – you could also do it in batches (e.g. once batch 1 is annotated with <def>, start the server to add the next annotations). In theory, you could also have a custom recipe that fetches the latest annotations from pass 1 from the database automatically.
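As a rough sketch of what the step between two passes could look like: the function below takes pass 1 examples and turns every accepted <def> span into a new, shorter task for pass 2. It's plain Python and doesn't touch the database itself. It assumes the examples are dicts in Prodigy's task format (`text`, `spans` with `start`/`end`/`label`, and `answer`), for instance loaded from a JSONL export created with `prodigy db-out`. The function name `make_pass2_tasks` and the `meta` keys are made up for this example.

```python
def make_pass2_tasks(pass1_examples, block_label="DEF"):
    """Yield one new task per accepted block span from the first pass."""
    for eg in pass1_examples:
        if eg.get("answer") != "accept":
            continue  # skip rejected or ignored examples
        for span in eg.get("spans", []):
            if span["label"] != block_label:
                continue
            yield {
                "text": eg["text"][span["start"]:span["end"]],
                # Hypothetical meta keys: a pointer back to the original
                # offsets so the layers can be merged again later.
                "meta": {
                    "source_start": span["start"],
                    "source_end": span["end"],
                },
            }


# Tiny made-up example; the offsets are computed instead of hard-coded.
text = "Preamble. Empresa de seguros: a empresa prevista na alinea b)."
start = text.find("Empresa")
pass1 = [
    {
        "text": text,
        "spans": [{"start": start, "end": len(text), "label": "DEF"}],
        "answer": "accept",
    }
]

for task in make_pass2_tasks(pass1):
    print(task["text"])
```

The resulting tasks could be written to a JSONL file and used as the source for the next server, or be produced on the fly by a custom recipe as described above.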
Doing it in batches can be helpful, because it lets you check and validate the annotations early – for example, if one of the annotators is confused about the annotation scheme and annotates <def> inconsistently compared to everyone else, you can catch this early and fix it, before they spend more time adding fine-grained annotations that will likely also be wrong etc.
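A simple check like this could be run after each batch. The sketch below groups examples by input and reports the inputs where annotators produced different sets of spans. It assumes each example carries an `_input_hash` and an `_annotator_id`, which is how annotations are typically stored when annotators use named sessions. If your data looks different, adjust the keys. `find_disagreements` is a hypothetical helper name, and the annotator names and offsets are invented.

```python
from collections import defaultdict


def span_set(eg):
    """Represent an example's spans as a set of (start, end, label) tuples."""
    return {(s["start"], s["end"], s["label"]) for s in eg.get("spans", [])}


def find_disagreements(examples):
    """Map input hash -> per-annotator span sets, for inputs that differ."""
    by_input = defaultdict(dict)
    for eg in examples:
        by_input[eg["_input_hash"]][eg["_annotator_id"]] = span_set(eg)
    disagreements = {}
    for input_hash, per_annotator in by_input.items():
        distinct = {frozenset(spans) for spans in per_annotator.values()}
        if len(distinct) > 1:
            disagreements[input_hash] = per_annotator
    return disagreements


# Made-up batch: two annotators, two inputs, one disagreement.
batch = [
    {"_input_hash": 1, "_annotator_id": "ana",
     "spans": [{"start": 0, "end": 5, "label": "DEF"}]},
    {"_input_hash": 1, "_annotator_id": "rui",
     "spans": [{"start": 0, "end": 9, "label": "DEF"}]},
    {"_input_hash": 2, "_annotator_id": "ana",
     "spans": [{"start": 0, "end": 5, "label": "DEF"}]},
    {"_input_hash": 2, "_annotator_id": "rui",
     "spans": [{"start": 0, "end": 5, "label": "DEF"}]},
]

for input_hash, per_annotator in find_disagreements(batch).items():
    print(input_hash, per_annotator)
```

Exact span matching is a strict criterion, so this will also flag small boundary differences. That's usually what you want for an early sanity check, but it isn't a full agreement metric.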