Hello, I am planning to use Prodigy to label a legal text dataset with the intention of doing semantic role labeling. My question is: can I use Prodigy to do all these kinds of labeling (constituency parsing, dependency parsing, and semantic role labeling) out of the box, or is it possible to create and use my own recipe?
Thanks in advance.
Hi @bayethiernodiop,
Unfortunately support for tree annotation in Prodigy is currently very limited. You can accept or reject labelled relations, but there's currently no way to create new trees. We have some ideas for how to provide this, but it would be a new front-end with different assumptions than the current one, as the task is pretty different. In the meantime, you might find it helpful to use Prodigy's sequence annotation to prepare bracketed spans, which could be a useful preprocessing step.
Thanks a lot @honnibal! What about semantic role labelling?
We don't really have an end-to-end solution for that either yet. We have plans that we're excited to try out, but for now our best advice is to think about custom workflows, or perhaps tools from academia.
Thanks for your feedback. I don't want an end-to-end solution here, just to do the labeling part for semantic role labeling, and then use some deep learning library on those annotations.
Also, do you know of any tool that can be used to annotate data for semantic role labeling?
What type of data do you need? If you just want to assign labels to spans of text, you could use the `ner.manual` workflow / `ner_manual` interface?
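For example, you could start a manual span annotation session with your own SRL-style labels. Everything in the command below is a placeholder (dataset name, input file and label set), and the spaCy model is only used for tokenization:

```
prodigy ner.manual srl_spans en_core_web_sm ./legal_texts.jsonl --label PRED,ARG0,ARG1,ARGM
```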
Very good idea, Ines! However, after giving the label I need to add BIO notation to the tokens in the span. Any idea on a good way to handle this, other than creating three labels for the same entity with suffixes B, I and O?
Thanks!
You should be able to do this automatically – no need to do it all by hand!
Prodigy already pre-tokenizes the text and stores all that information with the data. So for each annotated span, you have its position in the text and the IDs of its start and end tokens, which means you know which tokens need to be B, I and O.
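A minimal sketch of that conversion, assuming the usual Prodigy task format where each task dict has `"tokens"` (each with a `"text"`) and `"spans"` (each with inclusive `"token_start"`/`"token_end"` indices and a `"label"`) – double-check against your exported data:

```python
def task_to_bio(task):
    """Convert one Prodigy-style task dict to per-token BIO tags."""
    tags = ["O"] * len(task["tokens"])
    for span in task.get("spans", []):
        start, end = span["token_start"], span["token_end"]
        tags[start] = "B-" + span["label"]
        for i in range(start + 1, end + 1):  # token_end is inclusive
            tags[i] = "I-" + span["label"]
    # Pair each token text with its tag
    return list(zip([t["text"] for t in task["tokens"]], tags))
```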
You could also use spaCy to do this for you: load in the data, process each text, use `doc.char_span` to get a span object for each annotated span in the data, and then look at the `token.ent_iob_` tag for each token in the span.
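For instance, something along these lines – assuming your annotations are `(start_char, end_char, label)` offsets into the raw text, and using a blank English pipeline just for tokenization:

```python
import spacy

nlp = spacy.blank("en")  # tokenizer-only pipeline; swap in your own model if needed

def offsets_to_bio(text, annotations):
    """annotations: list of (start_char, end_char, label) tuples (assumed format)."""
    doc = nlp(text)
    spans = [doc.char_span(start, end, label=label) for start, end, label in annotations]
    # char_span returns None if the offsets don't align with token boundaries
    doc.ents = [span for span in spans if span is not None]
    return [(token.text, token.ent_iob_, token.ent_type_) for token in doc]

# Example:
# offsets_to_bio("The court granted the motion.", [(4, 9, "ARG0"), (10, 17, "PRED")])
```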
Thanks a lot!