✨🔗 Beta testers wanted: new manual dependencies & relations UI (v1.10)

ines · May 11, 2020, 10:06am

I already mentioned this briefly in the call for image UI beta testers: the upcoming Prodigy v1.10 will feature a completely new interface and a suite of recipes for efficient relation and dependency annotation. We're especially excited about this one, since it's been one of the missing features in Prodigy and it took a while to develop efficient solutions for the various challenges involved.

Features include

manually annotate labelled dependencies and relationships between words and phrases
joint entity and relation annotation: merge spans and assign directional relations to them in one interface
built-in recipes for dependency parsing, coreference resolution and fully custom relations
define custom workflows and automate as much as possible: use match patterns or a pretrained model to highlight spans or put a model in the loop to suggest relations
focus on what matters and ensure data quality and consistency: use match patterns to disable tokens you want to exclude
keyboard shortcuts and support for touch devices

Examples & screenshots

Beta testing and requirements

If you want to help beta test the new features and try them on your data, feel free to send me an email at ines@explosion.ai or a DM on Twitter. Requirements are:

Current Prodigy user on v1.9 (just include your order ID starting with #EX... when you get in touch)
Should have an immediate project you can test it on and time to test it this week.

If you have any questions, I'm also happy to answer them in this thread! If you have a biomedical use case, @SofieVL is also happy to help you get set up with the new relations workflows.

adamkgoldfarb · May 13, 2020, 4:10pm

Excited to give it a look!

Reading up before a test run, and just a quick note that the black box in the doc for coref.manual is the same as for rel.manual.

ines · May 13, 2020, 5:47pm

Ah, thanks, looks like a copy-paste mistake. Will fix

nix411 · May 14, 2020, 9:47am

This is so cool and the documentation is great as well.

I do think one thing is missing in the documentation though: training the models. It's probably just a matter of adding a link and a small note on each of the recipes. E.g. if you label both entities and their relations in one go, then you produce training data for both train ner and train parser, right? I assume that the relations are actually independent of the named entities as well, right? It is still just its own DependencyParser?

I hope it makes sense, but a few words in the documentation to clarify those things would be great, I think. It would also be great to have some ballpark numbers on how much data is needed for the custom relations to start performing. I do realise that this varies a lot, but there might be some rule of thumb for simple tasks etc!

adamkgoldfarb · May 14, 2020, 3:14pm

I second @nix411 on the point about "what next" for training with the annotations you produce.

I'm particularly interested in coreference resolution, and had been sniffing around the Hugging Face implementation on top of standard SpaCy en models. Training seems intensive and requires more data than we might feasibly produce through rapid annotation loops.

I'm probably missing something as I'm more a consumer than a deep learning/NLP expert, but it seems like we'll have to roll our own training loops as opposed to leaning on built-in Prodigy recipes. Completely fine, but maybe something for the documentation?

I'm going to try out the interface but also see if I can extract some predictions from the Hugging Face implementation and then "correct" them in Prodigy-- maybe that's the loop that's most useful.

ines · May 14, 2020, 6:09pm

Thanks and good point! The train recipe has actually been updated to support dependency parsing annotations in the "relations" format, e.g. created with dep.correct. So you can easily export dependency annotations in spaCy's format and/or run a parser training experiment. (Haven't tested the dependency parsing training extensively yet, so that could be a useful one to beta test as well.)

We're actually working on integrating the neuralcoref module into the core library, so you'll be able to train a coref model within spaCy.

I do think creating your own coref corpus is possible, but it's definitely more work. One strategy could be to create additional annotations on your specific domain data that you can then mix in with existing coref annotations to get better accuracy on your data.

Prodigy's training recipes only cover spaCy, which is shipped with Prodigy by default. So you can use the relation annotations to train a dependency parser. If you want to train some other component or model that predicts relationships (or similar), you'd have to set up the training for that yourself.

I like the "what's next" section idea for the docs

nix411 · May 15, 2020, 9:43am

I love that you've implemented this new feature

Support dataset:{name} syntax as source argument in recipes to allow loading from existing datasets. For example, dataset:my_set will use the examples from dataset my_set as the input data.

It would be pretty cool to extend this feature to allow dataset:{name}:{answer}. A pretty common workflow is to run textcat on documents as a spam filter and then go through all the accepted ones for a downstream task. You could argue that you are better off letting your model decide whether a document is accepted through your spam filter first, since that's the actual data you'll get when "live". Anyway, just a thought.

robertto · May 15, 2020, 9:51am

OMG! It looks great! Can we also annotate other kinds of relations, between words or even between sentences?

nix411 · May 15, 2020, 11:49am

Uhhh between sentences is a big thing for me

ines · May 15, 2020, 1:36pm

That's a very interesting idea and I definitely see the point. Sometimes you may want to load all examples, sometimes just the accepted ones and sometimes maybe only the ignored ones. (I'm not normally a fan of overly complex syntax that replaces relatively straightforward code, but in this case, I do think it's justified. I'll try it out.)

You can annotate relations between any tokens or spans, and you can load in data with existing "spans". So you could make every sentence a span and then only annotate relations between sentences:

eg["spans"] = [{"start": sent.start_char, "end": sent.end_char} for sent in doc.sents]

It's possible that this makes it harder to distinguish the sentences visually, so an alternative approach would be to use the first token of each sentence as the "anchor" and disable all other tokens (by adding "disabled": true to the entry in the "tokens").
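Both approaches could be sketched roughly like this in plain Python. It is only an illustration, not Prodigy's own code: `tokenize` and `sentence_bounds` are hypothetical regex stand-ins for a real tokenizer and for `doc.sents`, the token dicts are kept minimal, and the `token_start`/`token_end` keys on the spans are an assumption about what the interface expects next to the character offsets.

```python
import re


def tokenize(text):
    """Hypothetical stand-in tokenizer: words and single punctuation marks."""
    return [
        {"text": m.group(), "start": m.start(), "end": m.end(), "id": i}
        for i, m in enumerate(re.finditer(r"\w+|[^\w\s]", text))
    ]


def sentence_bounds(text):
    """Hypothetical stand-in for doc.sents: split after '.', '!' or '?'."""
    return [(m.start(), m.end()) for m in re.finditer(r"[^.!?\s][^.!?]*[.!?]?", text)]


def tokens_in(tokens, start, end):
    """IDs of the tokens that fall inside the character range [start, end)."""
    return [t["id"] for t in tokens if t["start"] >= start and t["end"] <= end]


def sentence_span_task(text):
    """Option 1: make every sentence one span, so relations connect sentences."""
    tokens = tokenize(text)
    spans = []
    for start, end in sentence_bounds(text):
        ids = tokens_in(tokens, start, end)
        spans.append(
            {"start": start, "end": end, "token_start": ids[0], "token_end": ids[-1]}
        )
    return {"text": text, "tokens": tokens, "spans": spans}


def anchor_token_task(text):
    """Option 2: keep only the first token of each sentence selectable."""
    tokens = tokenize(text)
    anchors = {tokens_in(tokens, start, end)[0] for start, end in sentence_bounds(text)}
    for token in tokens:
        if token["id"] not in anchors:
            token["disabled"] = True  # same key as "disabled": true in the JSON
    return {"text": text, "tokens": tokens}
```

With a real pipeline you would swap the two regex helpers for the tokens and sentence boundaries of a spaCy `doc` and keep the rest as is.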

nix411 · May 15, 2020, 5:02pm

Ah yes that’s a smart way to do cross sentence relations. I wonder how a model will be able to point to the last sub-header for instance. I assume there is only one way to find out

For the dataset:: syntax you could just include all by default if no additional scope is set, e.g. dataset:my-dataset
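To make the proposal concrete, here is a minimal sketch of how such a source string could be interpreted. Everything in it is hypothetical: `parse_source` and `filter_examples` are made-up helper names, not Prodigy functions, and the only things taken as given are the `dataset:{name}` prefix and the `"answer"` values `accept`, `reject` and `ignore` stored on annotated examples.

```python
VALID_ANSWERS = {"accept", "reject", "ignore"}


def parse_source(source):
    """Split 'dataset:{name}' or 'dataset:{name}:{answer}' into (name, answer).

    The answer is None when no scope is given, meaning "load everything".
    """
    parts = source.split(":")
    if parts[0] != "dataset" or len(parts) not in (2, 3) or not parts[1]:
        raise ValueError(f"Not a dataset source: {source!r}")
    name = parts[1]
    answer = parts[2] if len(parts) == 3 else None
    if answer is not None and answer not in VALID_ANSWERS:
        raise ValueError(f"Unknown answer {answer!r} in {source!r}")
    return name, answer


def filter_examples(examples, answer=None):
    """Keep only the examples with the given answer, or all of them if None."""
    if answer is None:
        return list(examples)
    return [eg for eg in examples if eg.get("answer") == answer]
```

So `parse_source("dataset:my-dataset")` gives `("my-dataset", None)` and loads all examples, while `parse_source("dataset:my_set:accept")` would restrict the stream to the accepted ones.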

jcbmyrstn · May 16, 2020, 2:51pm

Hi Ines,

I tried the relations recipe and it looks very promising. Now I would like to test my annotations by training a model, but I'm not sure how to proceed.

I assume that the relations recipe is attached to the dependency parser pipe and that any training has to be done with prodigy dep.x, correct?

My other assumption is that the annotations produced by the relations recipe will be accessible as custom labels in the same pipeline, meaning they will show up as something like SUBJECT and LOCATION in the dependency tree. How lost am I here?

J.

ines · May 17, 2020, 10:59am

The data format created specifies a key "relations" containing the relations. For dependency parsing annotations, the train and data-to-spacy recipes have been updated to process this format, so you can use them to train a spaCy dependency parser.

If you want to train a model for custom relationships, you'd have to use your own model implementation. You can create the data with Prodigy, but you have to decide what it "means" and how you want to use it to train a model, or what else you want to do with it. spaCy currently doesn't have a built-in component to predict arbitrary labels for relationships between tokens and/or spans.
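As a rough illustration of what "deciding what it means" could look like, the sketch below turns the "relations" of one annotated example into plain (head text, label, child text) triples that a custom model could consume. The field names `head_span`, `child_span`, `label`, `start` and `end` are assumptions about the exported format, and `relations_to_triples` is a made-up helper, so check them against your own data before relying on this.

```python
def span_text(example, span):
    """Text covered by a span, using its character offsets."""
    return example["text"][span["start"]:span["end"]]


def relations_to_triples(example):
    """Convert annotated relations into (head_text, label, child_text) triples."""
    return [
        (
            span_text(example, rel["head_span"]),
            rel["label"],
            span_text(example, rel["child_span"]),
        )
        for rel in example.get("relations", [])
    ]


# Hand-written example in the assumed format, not real Prodigy output.
example = {
    "text": "Anna moved to Berlin",
    "relations": [
        {
            "head": 1,
            "child": 0,
            "label": "SUBJECT",
            "head_span": {"start": 5, "end": 10},
            "child_span": {"start": 0, "end": 4},
        },
        {
            "head": 1,
            "child": 3,
            "label": "LOCATION",
            "head_span": {"start": 5, "end": 10},
            "child_span": {"start": 14, "end": 20},
        },
    ],
}

triples = relations_to_triples(example)
# triples == [("moved", "SUBJECT", "Anna"), ("moved", "LOCATION", "Berlin")]
```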

adamkgoldfarb · May 17, 2020, 2:31pm

Hello!

A note that the documentation for coref seems to be a bit wonky in FF.

It works as intended in Edge (Chromium) and excellently on mobile devices.

ines · May 17, 2020, 2:39pm

Ah, that's strange! What exactly do you mean by "wonky"? Do you have an example? I use FF as my main development browser and I haven't noticed anything yet.

adamkgoldfarb · May 17, 2020, 3:09pm

So sorry, Ines-- poor testing documentation on my part. Here we go:

It appears that the interactive window is not registering click events. Not sure if I just need to wait a bit for the js to load.

I cleared the FF cache and checked the console on refresh-- no errors are being thrown. The demo for relation annotation is also not registering click events.

Strangely, for both, the checkboxes and shortcut box above in the title bar register the click events.

If it helps, I'm using this version of the docs.

EDIT: tracked it down. I'm using Firefox Developer, which is on the Aurora channel. Safe to assume that a niche of a niche is not a showstopping bug! I'm trying the interface in FF Dev now and I'll let you know if it's also not working. The interface works in vanilla FF.

DOUBLE EDIT: Confirmed that the doc interface is working in other browsers, and Prodigy app interface for coref is working in other browsers but not in FF Dev as well.

ines · May 17, 2020, 3:20pm

Thanks for the detective work here! And wow, I can also reproduce this locally now, across the docs, Prodigy app etc. Works fine in all other browsers. This must have been caused by a very recent change to the FF developer edition, possibly within the last couple of days.

I'll see if I can find the release notes and what could have changed here. Anything related to <canvas> is potentially relevant.

nix411 · May 18, 2020, 6:34am

I can see a number of use cases for custom relations. It would be great to have an example of one way to do it. I know it is kind of out of Prodigy and spaCy scope, but on the other hand it isn’t. Not sure how you feel about it.

jcbmyrstn · May 19, 2020, 3:44am

I am very new both to spaCy and Prodigy, so please correct me if I'm wrong. I think there are existing examples in the spaCy documentation that point to the next steps after annotating relations with Prodigy.

I think this example connects well with the relations recipe:

Training a custom parser for chat intent semantics

As I understand, a workflow starting with the annotation of relations would be something like this:

annotate relations
create a pipe with a custom dependency parser
train the model with the annotations
find relations using the dependency-matcher.
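For the step between annotating and training, one possible sketch is a conversion from the annotated "relations" into the per-token "heads" and "deps" lists that the chat intent example uses, where each head is an absolute token index. This is an assumption-heavy illustration: `relations_to_heads_deps` is a made-up helper, the `head`, `child` and `label` keys are assumed to hold token indices and the relation label, and tokens without an annotation are left as `None`, so you would still need to decide how to attach them (for example to a root, or with a placeholder label like "-" as in that example).

```python
def relations_to_heads_deps(example):
    """Build per-token heads/deps lists from annotated relations.

    Tokens that are not the child of any relation keep None in both lists.
    """
    n_tokens = len(example["tokens"])
    heads = [None] * n_tokens
    deps = [None] * n_tokens
    for rel in example.get("relations", []):
        child = rel["child"]
        if heads[child] is not None:
            # A dependency tree allows only one head per token.
            raise ValueError(f"Token {child} has more than one head")
        heads[child] = rel["head"]
        deps[child] = rel["label"]
    return heads, deps


# Hand-written example in the assumed format, not real Prodigy output.
example = {
    "text": "Anna moved to Berlin",
    "tokens": [
        {"text": "Anna", "id": 0},
        {"text": "moved", "id": 1},
        {"text": "to", "id": 2},
        {"text": "Berlin", "id": 3},
    ],
    "relations": [
        {"head": 1, "child": 0, "label": "SUBJECT"},
        {"head": 1, "child": 3, "label": "LOCATION"},
    ],
}

heads, deps = relations_to_heads_deps(example)
# heads == [1, None, None, 1]
# deps == ["SUBJECT", None, None, "LOCATION"]
```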

J.

andy · May 19, 2020, 5:28pm

Thanks for a really great addition to Prodigy! This is exactly what my colleagues and I were looking for. We're creating an evaluation dataset for event extraction/semantic role labeling, so our use case might be a bit different from the usual active learning approach.

A few comments and ideas on the manual relation interface:

Is there a way to highlight entities that cross lines when wrap is enabled? The highlighting box doesn't always work for this (see screenshot). Could we use the usual command-click option for highlighting multiple words rather than just the click and drag?
What's the best way to set up the task when the relations and "entities" have the same labels? In the example below, I'd like to indicate that "special police force" and "rapid action force" are both patients of the predicate "setting". One approach is to duplicate the labels across label and span-label, but it seems like another approach is to just have a span-label that's something generic like MERGE or SPAN.
If only the relations have different labels, it would be really nice to add the relation and span in one step. For instance, you could select the label from the top bar, click the anchor word, and then drag to highlight all the words that belong in the span.

Screen Shot 2020-05-19 at 12.51.00 PM
