Customizing Prodigy for one relationship with multiple instances

I read through this topic, which is a great approach.

Is there some way you can think of that I could customize it for multiple instances within one document?

For example, if I have a wiki page with an undetermined number of date references, I might want to label “month” and “year” with a relationship between only the month and year that are related; the next instance of month and year referenced in the document won’t be related to the first, only to each other.

I couldn’t come up with a structure to do that, but maybe I’m missing something?

Hi! This is an interesting problem. How long are the dependencies that you need to be tracking? Is it mostly within the same sentence or paragraph, or do you need to capture relationships across the whole document? And do you expect much ambiguity?

I wonder if you could do something like this: first, label all months and years, so you have them unambiguously. You can probably set this up as a semi-automated manual task with a set of rules that pre-highlights the spans, so you only have to do minimal manual labelling.
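As a rough sketch of that rule-based first pass, assuming English month names and four-digit years only: the function name `pre_highlight` and the regular expressions are illustrative, not part of Prodigy. The output uses the usual task shape of a `"text"` plus a list of `"spans"` with character offsets and a label.

```python
import re

# Assumption: full English month names; abbreviations like "Jan." are not covered.
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
MONTH_RE = re.compile(rf"\b(?:{MONTHS})\b")
# Assumption: only four-digit years from 1900 to 2099 count as a YEAR.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def pre_highlight(text):
    """Return a task dict with rule-based MONTH and YEAR spans."""
    spans = []
    for label, pattern in (("MONTH", MONTH_RE), ("YEAR", YEAR_RE)):
        for match in pattern.finditer(text):
            spans.append({"start": match.start(), "end": match.end(), "label": label})
    # Keep spans in reading order so the annotation UI shows them left to right.
    spans.sort(key=lambda span: span["start"])
    return {"text": text, "spans": spans}
```

Rules like these will over-match (for example "May" at the start of a sentence), which is why the manual correction pass is still needed.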

Next, you could set up a choice task that steps through all year spans, gets the closest month spans as options and lets you select the one(s) that the year is attached to. If you have the tokens and maybe the sentence boundaries, you can use that to display just enough of a context window around the years and months referenced in the question. For example, one task could look like this:

John was born in July 1980, Peter in December. Ada was born in March 1995.

July
December
March

You could experiment with different highlighting styles for the question – maybe different colours for YEAR and MONTH, and then only include a span for the year that the question is about.
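A minimal sketch of how those choice tasks could be generated, assuming the MONTH and YEAR spans already exist as dicts with `start`, `end` and `label`. The helper name `make_choice_tasks` and the `max_options` parameter are made up for illustration; the `"options"` list with `"id"` and `"text"` follows the format the choice interface expects.

```python
def make_choice_tasks(text, spans, max_options=3):
    """Create one multiple-choice task per YEAR span.

    Each task highlights only the year in question and offers the
    nearest MONTH spans (by character distance) as options.
    """
    months = [s for s in spans if s["label"] == "MONTH"]
    years = [s for s in spans if s["label"] == "YEAR"]
    tasks = []
    for year in years:
        # Pick the closest months, then restore reading order for display.
        nearest = sorted(months, key=lambda m: abs(m["start"] - year["start"]))
        nearest = sorted(nearest[:max_options], key=lambda m: m["start"])
        options = [
            {"id": m["start"], "text": text[m["start"]:m["end"]]}
            for m in nearest
        ]
        tasks.append({"text": text, "spans": [year], "options": options})
    return tasks
```

Using the month's start offset as the option id is just one choice; it makes it easy to map the selected answer back to the exact span afterwards. Trimming `text` to a window of sentences around the year would be a separate step.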

Hmm. Gotcha. I could maybe do that; it also occurred to me that I could do a rough approximation with “month_left” and “month_right” that identify direction of relationship… not ideal.

Yeah, I have tons and tons of use cases for this in basic NER and frame semantics.
Quantities, for example, consist of Operand, Value, Unit, Material (at least in some models) - “at least five cups of flour”. You could imagine references to quantities being very frequent within a single document.
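To make the target structure concrete, here is a hypothetical sketch in which every span carries an extra `group` key saying which quantity instance it belongs to. The `group` key and the `group_frames` helper are assumptions for illustration only, not an existing Prodigy feature.

```python
from collections import defaultdict


def group_frames(text, spans):
    """Collect spans that share a 'group' id into one frame per instance."""
    frames = defaultdict(dict)
    for span in spans:
        # Map the role label (e.g. VALUE, UNIT) to the surface text it covers.
        frames[span["group"]][span["label"]] = text[span["start"]:span["end"]]
    return dict(frames)
```

With two quantities in one text, this would yield two separate frames, so the Unit of the second quantity never gets attached to the Value of the first.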

We would love to have the ability to label complex relationships outside of sentences for sure - job descriptions seem to have rather extended relations that would be helpful to capture. Technically, any range ("$14-16/hr job") would require ambiguity handling.

I think the best example of distance is for dates though, where someone in a text message might say “Hi, I’d like to stay at your vacation rental for a family reunion. My family will be coming out on May 16th, but I won’t arrive until May 23rd. We’ll be leaving June 1.” Capturing the date ranges here (start and end date) would require both multi-label and ambiguous relation annotation.

I know, pipe dreams :laughing:

Time and date ranges are super hard. You can have ranges that are open or closed at either end, that sit at some point in the past, that recur... And then, even the endpoints of the range are ranges! “May 23rd” isn’t a point in time; it has a start and an end itself.
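One way to picture the "endpoints are ranges" point is to store every date mention as a half-open interval. This is only a sketch: the `Interval` class and both helpers are hypothetical names, and the year 2024 in the usage comment is an arbitrary assumption, since the example message doesn't give one.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Half-open interval [start, end); None means open at that end."""
    start: Optional[datetime]
    end: Optional[datetime]


def day_interval(d: date) -> Interval:
    """A calendar day runs from midnight to the following midnight."""
    start = datetime(d.year, d.month, d.day)
    return Interval(start, start + timedelta(days=1))


def span_range(first: Interval, last: Interval) -> Interval:
    """A range between two mentions starts where the first starts
    and ends where the last ends."""
    return Interval(first.start, last.end)


# Usage, assuming 2024: "May 16th" to "June 1" covers
# span_range(day_interval(date(2024, 5, 16)), day_interval(date(2024, 6, 1)))
```

Recurring ranges and vague past references would need more than this, which is part of why a narrow, application-specific representation tends to be easier to get right.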

I would definitely recommend doing something ad hoc and application-specific for these numeric entities, just to support the specific semantic requirements of what you’re doing. Every really general-purpose approach I’ve seen has failed, because the fully general case is so much harder than what any specific application needs to do.
