Ability to apply multiple labels to the same section of text?


Hi, apologies if this is covered off elsewhere, but I’m looking for an NLP labelling interface that allows a labeller to apply “labels within labels” to the same section of text, i.e. the possibility of a many to one relationship between labels and a piece of text.

e.g. “…The owner of the stables is not allowed to graze cattle outside…”

I would want to have all of this text labelled as the “restriction”, but within this label I would also want to label “the owner” as the individual being restricted, and “graze cattle outside” as the thing being restricted.

Is this possible with Prodigy?

(Ines Montani) #2

Hi! At the moment, this isn’t directly possible – at least not within the same session. The interface currently assumes a more classic named entity logic, where highlighted spans by definition cannot overlap.

Since you’re trying to label entire sentences (or at least more or less independent expressions) and then phrases within them, one approach could be to do this in two steps: first, focus on the RESTRICTION label and highlight the sentences it applies to. In the next session, automatically extract all highlighted sentences and ask about the fine-grained spans like the individual/subject/however you’re defining that.
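To make the two-step idea a bit more concrete, here is a minimal sketch of the "extract the highlighted sentences" part in plain Python. It assumes the first-pass annotations are available as a list of dicts with "text", "spans" (each with "start", "end" and "label") and "answer" keys. The function name `second_pass_tasks` and the "meta" fields are made up for illustration, so adjust them to whatever your exported data actually looks like.

```python
def second_pass_tasks(annotations, label="RESTRICTION"):
    """Turn accepted first-pass spans into standalone tasks for a second pass.

    Assumes each annotation is a dict with "text", "spans" and "answer" keys
    (an assumption about the export format, not a guaranteed schema).
    """
    tasks = []
    for eg in annotations:
        # Only use examples the annotator accepted
        if eg.get("answer") != "accept":
            continue
        for span in eg.get("spans", []):
            if span["label"] != label:
                continue
            tasks.append({
                # The highlighted sentence becomes the text of the new task
                "text": eg["text"][span["start"]:span["end"]],
                # Keep the original offsets so results can be mapped back later
                "meta": {"source_start": span["start"], "source_end": span["end"]},
            })
    return tasks


# Toy first-pass data; offsets are computed rather than typed by hand
sentence = "The owner of the stables is not allowed to graze cattle outside."
doc_text = "Intro. " + sentence + " Some unrelated text."
start = doc_text.index(sentence)
first_pass = [
    {
        "text": doc_text,
        "spans": [{"start": start, "end": start + len(sentence), "label": "RESTRICTION"}],
        "answer": "accept",
    },
    {"text": "Nothing to see here.", "spans": [], "answer": "reject"},
]

tasks = second_pass_tasks(first_pass)
for task in tasks:
    print(task["text"])
```

The resulting tasks only contain the restriction sentences, so in the second session the labels for the individual and the restricted action no longer overlap with the RESTRICTION span.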

One big challenge for use cases like this is often consistency, so the more you can automate and restrict, the better. If you’re labelling longer phrases and are asking several people to annotate the same thing, you can easily get several different interpretations. For instance, you could highlight “owner”, “the owner”, “the owner of the stables” etc. Even if it’s just yourself doing the labelling, it can still be quite difficult to follow the same consistent scheme.

Depending on your other labels, you could also experiment with an approach where you use linguistic features like the dependency parse to infer the labels and relevant spans. A lot of the work you’re doing labelling “the owner” as the individual being restricted is already done by the dependency parser when it predicts the syntax. And the “restriction” aspect is often already covered by the verb. For instance, “is not allowed” is a pretty strong trigger phrase for a restriction. Once you have that, the dependency parse can easily tell you who isn’t allowed (“owner”, which has an attached determiner “the”), and so on.
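As a rough illustration of the logic, here is a small self-contained sketch. To keep it runnable without any dependencies, the parse is written out by hand with spaCy-style dependency labels, which is an assumption: a real parser may attach tokens differently, and in practice you would read the heads and labels from the parser's output instead. The `Token` class and `find_restriction` function are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    head: int  # index of the syntactic head (the root points to itself)
    dep: str   # dependency label


# Hand-written parse for illustration only (assumed, not parser output)
SENT = [
    Token("The", 1, "det"),
    Token("owner", 7, "nsubjpass"),
    Token("of", 1, "prep"),
    Token("the", 4, "det"),
    Token("stables", 2, "pobj"),
    Token("is", 7, "auxpass"),
    Token("not", 7, "neg"),
    Token("allowed", 7, "ROOT"),
    Token("to", 9, "aux"),
    Token("graze", 7, "xcomp"),
    Token("cattle", 9, "dobj"),
    Token("outside", 9, "advmod"),
]


def children(tokens, i):
    """Indices of the tokens directly attached to token i."""
    return [j for j, t in enumerate(tokens) if t.head == i and j != i]


def subtree(tokens, i):
    """Indices of token i and everything below it, in sentence order."""
    idx = [i]
    for child in children(tokens, i):
        idx.extend(subtree(tokens, child))
    return sorted(idx)


def find_restriction(tokens, trigger="allowed"):
    """Find a negated trigger verb and return who is restricted and from what."""
    for i, tok in enumerate(tokens):
        if tok.text.lower() != trigger:
            continue
        kids = children(tokens, i)
        # "allowed" only signals a restriction if it is negated
        if not any(tokens[k].dep == "neg" for k in kids):
            continue
        subj = next((k for k in kids if tokens[k].dep in ("nsubj", "nsubjpass")), None)
        action = next((k for k in kids if tokens[k].dep == "xcomp"), None)
        if subj is None or action is None:
            continue
        return {
            "restricted": " ".join(tokens[k].text for k in subtree(tokens, subj)),
            # Drop the infinitive marker "to" from the action phrase
            "action": " ".join(
                tokens[k].text for k in subtree(tokens, action) if tokens[k].dep != "aux"
            ),
        }
    return None


result = find_restriction(SENT)
print(result)
```

Taking the whole subtree of the subject is one way to settle the “owner” vs. “the owner” vs. “the owner of the stables” question consistently, since the span boundaries follow from the parse rather than from each annotator's judgement.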