✨🔗 Beta testers wanted: new manual dependencies & relations UI (v1.10)

I already mentioned this briefly in the call for image UI beta testers: the upcoming Prodigy v1.10 will feature a completely new interface and a suite of recipes for efficient relation and dependency annotation :tada: We're especially excited about this one, since it's been one of the missing features in Prodigy and it took a while to develop efficient solutions for the various challenges involved.

Features include

  • manually annotate labelled dependencies and relationships between words and phrases
  • joint entity and relation annotation: merge spans and assign directional relations to them in one interface
  • built-in recipes for dependency parsing, coreference resolution and fully custom relations
  • define custom workflows and automate as much as possible: use match patterns or a pretrained model to highlight spans or put a model in the loop to suggest relations
  • focus on what matters and ensure data quality and consistency: use match patterns to disable the tokens you want to exclude
  • keyboard shortcuts and support for touch devices

Examples & screenshots

Beta testing and requirements

If you want to help beta test the new features and try them on your data, feel free to send me an email at ines@explosion.ai or a DM on Twitter. Requirements are:

  • Current Prodigy user on v1.9 (just include your order ID starting with #EX... when you get in touch)
  • Should have an immediate project you can test it on and time to test it this week.

If you have any questions, I'm also happy to answer them in this thread! If you have a biomedical use case, @SofieVL is also happy to help you get set up with the new relations workflows :raised_hands:

Excited to give it a look!

Reading up before a test run, and just a quick note that the black box in the doc for coref.manual is the same as for rel.manual.

Ah, thanks, looks like a copy-paste mistake. Will fix :+1:

This is so cool and the documentation is great as well.

I do think one thing is missing from the documentation though: training the models. It's probably just a matter of adding a link and a small note to each of the recipes. E.g. if you label both entities and their relations in one go, then you produce training data for both train ner and train parser, right? I assume that the relations are actually independent of the named entities as well, right? Is it still just its own DependencyParser?

I hope that makes sense, but a few words in the documentation to clarify those things would be great, I think. It would also be great to have some ballpark numbers on how much data is needed for the custom relations to start performing. I do realise that this varies a lot, but there might be some rule of thumb for simple tasks etc!

I second @nix411 on the point about "what next" for training with the annotations you produce.

I'm particularly interested in coreference resolution, and had been sniffing around the Hugging Face implementation on top of the standard spaCy en models. Training seems intensive and requires more data than we might feasibly produce through rapid annotation loops.

I'm probably missing something as I'm more a consumer than a deep learning/NLP expert, but it seems like we'll have to roll our own training loops as opposed to leaning on built-in Prodigy recipes. Completely fine, but maybe something for the documentation?

I'm going to try out the interface but also see if I can extract some predictions from the Hugging Face implementation and then "correct" them in Prodigy-- maybe that's the loop that's most useful.

Thanks and good point! The train recipe has actually been updated to support dependency parsing annotations in the "relations" format, e.g. created with dep.correct. So you can easily export dependency annotations in spaCy's format and/or run a parser training experiment. (Haven't tested the dependency parsing training extensively yet, so that could be a useful one to beta test as well.)

We're actually working on integrating the neuralcoref module into the core library, so you'll be able to train a coref model within spaCy :slightly_smiling_face:

I do think creating your own coref corpus is possible, but it's definitely more work. One strategy could be to create additional annotations on your specific domain data that you can then mix in with existing coref annotations to get better accuracy on your data.

Prodigy's training recipes only cover spaCy, which is shipped with Prodigy by default. So you can use the relation annotations to train a dependency parser. If you want to train some other component or model that predicts relationships (or similar), you'd have to set up the training for that yourself.

I like the "what's next" section idea for the docs :+1:

I love that you've implemented this new feature

Support dataset:{name} syntax as source argument in recipes to allow loading from existing datasets. For example, dataset:my_set will use the examples in dataset my_set as the input data.

It would be pretty cool to extend this feature to allow dataset:{name}:{answer}. A pretty common workflow is to use textcat to classify documents for a spam filter, and then go through all the accepted ones for a downstream task. You could argue that you are better off letting your model decide whether a document is accepted by your spam filter first, since that's the actual data you'll get when "live". Anyway, just a thought.

OMG! It looks great! Can we also annotate other kinds of relations? Between words, or even between sentences?

Uhhh between sentences is a big thing for me

That's a very interesting idea and I definitely see the point. Sometimes you may want to load all examples, sometimes just the accepted ones and sometimes maybe only the ignored ones. (I'm not normally a fan of overly complex syntax that replaces relatively straightforward code, but in this case, I do think it's justified. I'll try it out :slightly_smiling_face:)

You can annotate relations between any tokens or spans, and you can load in data with existing "spans". So you could make every sentence a span and then only annotate relations between sentences:

eg["spans"] = [{"start": sent.start_char, "end": sent.end_char} for sent in doc.sents]

It's possible that this makes it harder to distinguish the sentences visually, so an alternative approach would be to use the first token of each sentence as the "anchor" and disable all other tokens (by adding "disabled": true to the entry in the "tokens").
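Both options can be sketched without spaCy to show what the resulting task could look like. This is only an illustration: the helper name build_sentence_task is made up, and the whitespace tokenizer and punctuation-based sentence splitter are naive stand-ins for doc and doc.sents. The token fields other than "disabled" follow my understanding of the task format and should be treated as assumptions.

```python
import re


def build_sentence_task(text, anchors_only=False):
    """Hypothetical helper: one span per sentence, or sentence-initial anchor tokens."""
    # Naive whitespace tokenizer, a stand-in for spaCy's tokenizer.
    tokens = []
    for i, match in enumerate(re.finditer(r"\S+", text)):
        tokens.append({
            "text": match.group(),
            "start": match.start(),
            "end": match.end(),
            "id": i,
            "ws": match.end() < len(text),  # followed by whitespace?
        })

    # Naive sentence splitter: a sentence ends at a token ending in . ! or ?
    sentences = []  # list of (first_token_id, last_token_id)
    first = 0
    for tok in tokens:
        is_last = tok["id"] == len(tokens) - 1
        if tok["text"][-1] in ".!?" or is_last:
            sentences.append((first, tok["id"]))
            first = tok["id"] + 1

    task = {"text": text, "tokens": tokens}
    if anchors_only:
        # Option 2: keep only the first token of each sentence selectable.
        anchors = {start for start, _ in sentences}
        for tok in tokens:
            tok["disabled"] = tok["id"] not in anchors
    else:
        # Option 1: make every sentence a span.
        task["spans"] = [
            {
                "start": tokens[start]["start"],
                "end": tokens[end]["end"],
                "token_start": start,
                "token_end": end,
            }
            for start, end in sentences
        ]
    return task
```

Calling build_sentence_task(text) gives the span-per-sentence variant, and build_sentence_task(text, anchors_only=True) gives the variant where only the first token of each sentence can be connected.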

Ah yes that’s a smart way to do cross sentence relations. I wonder how a model will be able to point to the last sub-header for instance. I assume there is only one way to find out :mechanical_arm:

For the dataset:: syntax you could just include all by default if no additional scope is set, e.g. dataset:my-dataset
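Roughly, the parsing side of that could look like the sketch below. It's purely hypothetical: parse_dataset_source and filter_by_answer are made-up names, not Prodigy functions, and the sketch assumes each saved example carries an "answer" of "accept", "reject" or "ignore".

```python
VALID_ANSWERS = ("accept", "reject", "ignore")


def parse_dataset_source(source):
    """Split 'dataset:name' or 'dataset:name:answer' into (name, answer).

    answer is None when no scope is given, meaning "include everything".
    """
    parts = source.split(":")
    if parts[0] != "dataset" or len(parts) not in (2, 3) or not parts[1]:
        raise ValueError(f"Not a dataset source: {source!r}")
    answer = parts[2] if len(parts) == 3 else None
    if answer is not None and answer not in VALID_ANSWERS:
        raise ValueError(f"Unknown answer {answer!r} in {source!r}")
    return parts[1], answer


def filter_by_answer(examples, answer=None):
    """Keep all examples if answer is None, else only those with that answer."""
    return [eg for eg in examples if answer is None or eg.get("answer") == answer]
```

With this, dataset:my-dataset would load everything, while dataset:my-dataset:accept would keep only the accepted examples.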

Hi Ines,

I tried the relations recipe and it looks very promising. Now I would like to test my annotations by training a model, but I'm not sure how to proceed.

I assume that the relations recipe is attached to the dependency parser pipe and that any training has to be done with prodigy dep.x, correct?

My other assumption is that the annotations produced by the relations recipe will be accessible as custom labels in the same pipeline, meaning they will show up as something like SUBJECT and LOCATION in the dependency tree. How lost am I here?

J.

The data format created specifies a key "relations" containing the relations. For dependency parsing annotations, the train and data-to-spacy recipes have been updated to process this format, so you can use them to train a spaCy dependency parser.
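To give a rough idea of what "processing this format" can mean for dependency annotations, here's a minimal sketch that turns the "relations" of one example into per-token head and label lists. The helper name is made up, and the sketch assumes each relation has "head" and "child" token indices plus a "label", and that tokens without an incoming relation point to themselves. It's not the code the recipes use.

```python
def relations_to_heads_deps(example):
    """Hypothetical helper: convert "relations" into per-token heads and labels."""
    n_tokens = len(example["tokens"])
    # Assumption: a token with no annotated head points to itself.
    heads = list(range(n_tokens))
    deps = [""] * n_tokens
    for rel in example.get("relations", []):
        # Assumption: "head" and "child" are token indices.
        heads[rel["child"]] = rel["head"]
        deps[rel["child"]] = rel["label"]
    return heads, deps


example = {
    "text": "She sleeps",
    "tokens": [
        {"text": "She", "start": 0, "end": 3, "id": 0},
        {"text": "sleeps", "start": 4, "end": 10, "id": 1},
    ],
    "relations": [{"head": 1, "child": 0, "label": "nsubj"}],
}
heads, deps = relations_to_heads_deps(example)
```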

If you want to train a model for custom relationships, you'd have to use your own model implementation. You can create the data with Prodigy, but you have to decide what it "means" and how you want to use it to train a model, or what else you want to do with it. spaCy currently doesn't have a built-in component to predict arbitrary labels for relationships between tokens and/or spans.
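As a starting point for your own implementation, you could flatten the annotations into (head text, label, child text) triples and decide from there how to feed them to a model. A minimal sketch, assuming each relation stores character offsets under "head_span" and "child_span" (these field names and the helper name are assumptions for illustration):

```python
def relation_triples(example):
    """Hypothetical helper: extract (head text, label, child text) triples."""
    text = example["text"]
    triples = []
    for rel in example.get("relations", []):
        # Assumption: both spans carry "start" and "end" character offsets.
        head = rel["head_span"]
        child = rel["child_span"]
        triples.append((
            text[head["start"]:head["end"]],
            rel["label"],
            text[child["start"]:child["end"]],
        ))
    return triples


example = {
    "text": "Apple acquired Beats.",
    "relations": [
        {
            "label": "ACQUIRED",
            "head_span": {"start": 0, "end": 5},
            "child_span": {"start": 15, "end": 20},
        }
    ],
}
triples = relation_triples(example)
```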

Hello!

A note that the documentation for coref seems to be a bit wonky in FF.

It works as intended in Edge (Chromium) and excellently on mobile devices.

Ah, that's strange! What exactly do you mean by "wonky"? Do you have an example? I use FF as my main development browser and I haven't noticed anything yet.

So sorry, Ines-- poor testing documentation on my part. Here we go:

It appears that the interactive window is not registering click events. Not sure if I just need to wait a bit for the js to load.

I cleared the FF cache and checked the console on refresh-- no errors are being thrown. The demo for relation annotation is also not registering click events.

Strangely, for both, the checkboxes and shortcut box above in the title bar register the click events.

If it helps, I'm using this version of the docs.

EDIT: tracked it down. I'm using Firefox Developer, which is on the Aurora channel. Safe to assume that a niche of a niche is not a showstopping bug! I'm trying the interface in FF Dev now and I'll let you know if it's also not working. The interface works in vanilla FF.

DOUBLE EDIT: Confirmed that the doc interface is working in other browsers, and Prodigy app interface for coref is working in other browsers but not in FF Dev as well.

Thanks for the detective work here! And wow, I can also reproduce this locally now, across the docs, Prodigy app etc. Works fine in all other browsers. This must have been caused by a very recent change to the FF developer edition, possibly within the last couple of days.

I'll see if I can find the release notes and what could have changed here. Anything related to <canvas> is potentially relevant.

I can see a number of use cases for custom relations. It would be great to have an example of one way to do it. I know it is kind of outside the scope of Prodigy and spaCy, but on the other hand it isn’t. Not sure how you feel about it.

I am very new both to spaCy and Prodigy, so please correct me if I'm wrong. I think there are existing examples in the spaCy documentation that point to the next steps after annotating relations with Prodigy.

I think this example connects well with the relations recipe:

Training a custom parser for chat intent semantics

As I understand, a workflow starting with the annotation of relations would be something like this:

  1. annotate relations
  2. create a pipe with a custom dependency parser
  3. train the model with the annotations
  4. find relations using the dependency-matcher.

J.

Thanks for a really great addition to Prodigy! This is exactly what my colleagues and I were looking for. We're creating an evaluation dataset for event extraction/semantic role labeling, so our use case might be a bit different from the usual active learning approach.

A few comments and ideas on the manual relation interface:

  • Is there a way to highlight entities that cross lines when wrap is enabled? The highlighting box doesn't always work for this (see screenshot). Could we use the usual command-click option for highlighting multiple words rather than just the click and drag?
  • What's the best way to set up the task when the relations and "entities" have the same labels? In the example below, I'd like to indicate that "special police force" and "rapid action force" are both patients of the predicate "setting". One approach is to duplicate the labels across label and span-label, but it seems like another approach is to just have a span-label that's something generic like MERGE or SPAN.
  • If only the relations have different labels, it would be really nice to add the relation and span in one step. For instance, you could select the label from the top bar, click the anchor word, and then drag to highlight all the words that belong in the span.

Screen Shot 2020-05-19 at 12.51.00 PM