Allowing implicit words in relation extraction

KennethEnevoldsen · June 29, 2023, 12:52am

We have a project where we are working on annotating relations for open-relation extraction.

Imagine a sentence like this:

"This forum the official place to discuss Prodigy"

You would like to annotate it:

Triplet: {"subject": "This forum", "verb": "is", "object": "the official place to discuss Prodigy"}

(Note that "is" is missing in the original text.)

My idea was to hijack the existing relation extraction interface by adding a text field that extends the text.

So the visual would show something like:

Here the ||| is just a dummy token that separates the original text from the added tokens.
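To make the idea concrete, here is a minimal sketch in plain Python of how a task could be extended. It assumes the task carries a "text" string and a "tokens" list with "text", "start", "end", "id" and "ws" keys and token ids running from 0; the helper names `whitespace_tokens` and `extend_task` and the `n_original_tokens` key are made up for illustration and are not part of Prodigy.

```python
SEPARATOR = "|||"  # dummy token; any string that never occurs in the data works


def whitespace_tokens(text):
    """Toy tokenizer for the demo only: split on whitespace, keep offsets."""
    tokens, pos = [], 0
    for i, word in enumerate(text.split()):
        start = text.index(word, pos)
        end = start + len(word)
        ws = end < len(text) and text[end].isspace()
        tokens.append({"text": word, "start": start, "end": end, "id": i, "ws": ws})
        pos = end
    return tokens


def extend_task(task, extra_words, separator=SEPARATOR):
    """Return a copy of `task` with the separator and extra words appended.

    Text and offsets of the original tokens are left as they are; only the
    trailing-whitespace flag of the last one may change.
    """
    tokens = [dict(tok) for tok in task["tokens"]]  # copy, do not mutate input
    n_original = len(tokens)
    new_text = task["text"]
    for word in [separator, *extra_words]:
        if new_text and not new_text[-1].isspace():
            new_text += " "
            if tokens:
                tokens[-1]["ws"] = True  # previous token is now followed by a space
        start = len(new_text)
        new_text += word
        tokens.append({
            "text": word,
            "start": start,
            "end": start + len(word),
            "id": len(tokens),  # assumes ids run from 0 to n-1
            "ws": False,
        })
    # hypothetical extra key so the appended tokens can be told apart later
    return {**task, "text": new_text, "tokens": tokens, "n_original_tokens": n_original}


text = "This forum the official place to discuss Prodigy"
task = {"text": text, "tokens": whitespace_tokens(text)}
extended = extend_task(task, ["is"])
print(extended["text"])
```

Since the added tokens sit after the separator, they can be selected as a relation head or child like any other token and stripped again afterwards using the stored token count.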

I would imagine that I could use a text-input field (similar to here) to add these extra tokens to the text, but can't seem to find a way to have two blocks interact. Is that even possible?

As far as I can see, other open information extraction datasets have solved this problem using a fixed set (e.g. CaRB).

Which would naturally be possible with some parsing, but I would like to keep it completely open.

Edit: An alternative option is to use the text-input field and just add the word "is", and then in a second round of annotations add the word "is" to the text using the above approach.

ryanwesslen · June 30, 2023, 7:53pm

Hi @KennethEnevoldsen!

Thanks for your post and welcome to the Prodigy community.

I'm a big fan of augmenty and DaCy (although I don't speak Danish)

That's an interesting question. Right now, there's not an easy way as the components keep their own state. But we've been thinking about developing card callbacks, kind of a "re-render" button or functionality. @koaning started to think about this and we'll restart this work after we release v1.12 (likely next week) and begin planning for v2.0. So thanks for the feedback!

Just thinking out loud - I wonder if this could introduce some tricky edge cases. For example, problems with mismatched tokenization/alignment, especially when making callbacks to the back end. We welcome any additional thoughts you may have.

I'll see next week if other teammates have thoughts.

KennethEnevoldsen · June 30, 2023, 9:43pm

I'm a big fan of augmenty and DaCy (although I don't speak Danish)

Thanks @ryanwesslen, glad you enjoy them!

Right now, there's not an easy way as the components keep their own state. But we've been thinking about developing card callbacks, kind of a "re-render" button or functionality.

Ah, that would be exactly what I was looking for. For now we will use a two-stage approach where we add missing tokens in the first stage and then update the text for the second stage. However, I will be looking forward to trying out the feature once it is out.
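A rough sketch of how the hand-over between the two stages could look, in plain Python. It assumes that the first-stage annotations are available as a list of dicts, that accepted examples have "answer" set to "accept", and that the free-text value is stored under a key named "user_input"; that key depends on how the text-input block is configured, so treat it as an assumption. The function name `stage_two_tasks` and the `orig_text` and `extra_words` keys are made up for illustration.

```python
def stage_two_tasks(stage_one_examples, field_id="user_input", separator="|||"):
    """Turn first-stage annotations into inputs for the second stage.

    The words typed in the first stage are appended after the separator;
    the untouched original text is kept alongside for later recovery.
    """
    for eg in stage_one_examples:
        if eg.get("answer") != "accept":
            continue  # skip rejected or ignored examples
        extra_words = (eg.get(field_id) or "").split()
        text = eg["text"]
        if extra_words:
            text = f"{text} {separator} {' '.join(extra_words)}"
        yield {"text": text, "orig_text": eg["text"], "extra_words": extra_words}


stage_one = [
    {"text": "This forum the official place to discuss Prodigy",
     "user_input": "is", "answer": "accept"},
    {"text": "Prodigy is a tool", "user_input": "", "answer": "accept"},
    {"text": "Some noisy text", "user_input": "was", "answer": "reject"},
]
for task in stage_two_tasks(stage_one):
    print(task["text"])
```

Examples with nothing typed in the first stage pass through unchanged, so the second stage sees the same text with or without added words.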

I wonder if this could introduce some tricky edge cases. For example, problems with mismatched tokenization/alignment, especially when making callbacks to the back end.

I currently haven't found major issues with the approach, however I might be missing some notable cases. I do see a potential issue with the tokenization if you retokenize the text, but I think I will just manually add the new tokens (so as not to influence the original tokenization).
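One way to guard against the retokenization issue is a small sanity check before the extended tasks go into the second stage. The sketch below is plain Python and assumes tokens are dicts with "text", "start", "end" and "id" keys; `alignment_problems` is a hypothetical helper name, not a Prodigy function.

```python
def alignment_problems(original_tokens, extended_task):
    """Return a list of human-readable problems; empty if everything lines up."""
    text = extended_task["text"]
    tokens = extended_task["tokens"]
    problems = []
    # every token must match the characters its offsets point to
    for tok in tokens:
        if text[tok["start"]:tok["end"]] != tok["text"]:
            problems.append(f"token {tok['id']} does not match its character span")
    # the original tokens must still be there, unchanged, at the front
    if len(tokens) < len(original_tokens):
        problems.append("extended task has fewer tokens than the original")
    for old, new in zip(original_tokens, tokens):
        if any(old[key] != new[key] for key in ("text", "start", "end")):
            problems.append(f"original token {old['id']} was changed")
    return problems


original = [
    {"text": "Prodigy", "start": 0, "end": 7, "id": 0},
    {"text": "rocks", "start": 8, "end": 13, "id": 1},
]
good = {
    "text": "Prodigy rocks ||| is",
    "tokens": original + [
        {"text": "|||", "start": 14, "end": 17, "id": 2},
        {"text": "is", "start": 18, "end": 20, "id": 3},
    ],
}
print(alignment_problems(original, good))
```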

I'll see next week if other teammates have thoughts.

Sounds lovely, I would be happy to hear additional thoughts.
