I have a question about a recipe, but I wanted to introduce myself first.
I am currently exploring both the library and Prodigy for use in a university research project about social networks in the ancient world.
I’m working on a corpus of biographies of ancient philosophers and am trying to extract from that corpus who was the teacher of whom and the migration routes of those individuals: who moved from where to where to study with whom. (For now, I am using English translations, but I am planning to conduct the research in ancient Greek if my students and I can build an ancient Greek model for spaCy.)
We were using the matcher with a language model I retrained with prodigy, but we couldn’t capture much due to the complexity of the literary texts in the corpus. We are looking now at the train_intent_parser.py example in spacy, which is used to predict trees over whole documents. We are considering a similar approach to extract relations between people and places in our corpus.
Is there an existing Prodigy recipe to train such a parser? If not, which one could be adapted for our purposes?
How you set up an annotation workflow kinda depends on the specifics of your dependency scheme. If you can narrow down the possible candidates that are part of a relationship, you could stream in pairs and then use the choice interface to select the relationship between them. For instance, if you have an entity recognizer or rule-based process that extracts names, and two people are mentioned together, you can show both mentions and select a label for the relation.
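Here's a rough sketch of what streaming in pairs could look like. It assumes the entities have already been extracted (by an entity recognizer or rules) and are available as character-offset spans. The function name `make_pair_tasks` and the relation labels are made up for illustration, so swap in whatever fits your scheme. Each task is a plain dictionary with `text`, `spans` and `options`, which is the shape the choice interface expects.

```python
from itertools import combinations

# Hypothetical relation labels, purely for illustration
RELATION_OPTIONS = [
    {"id": "TEACHER_OF", "text": "First person taught the second"},
    {"id": "STUDENT_OF", "text": "First person studied with the second"},
    {"id": "NO_RELATION", "text": "No teacher/student relation"},
]


def make_pair_tasks(text, entities, label="PERSON"):
    """Yield one choice task per pair of people mentioned in the same text.

    `entities` is a list of dicts with "start", "end" and "label" keys
    (character offsets), e.g. produced by an entity recognizer.
    """
    people = [ent for ent in entities if ent["label"] == label]
    for first, second in combinations(people, 2):
        first_name = text[first["start"]:first["end"]]
        second_name = text[second["start"]:second["end"]]
        yield {
            "text": text,
            # Highlight only the two mentions under consideration
            "spans": [dict(first), dict(second)],
            # Copy the options so each task can be modified independently
            "options": [dict(option) for option in RELATION_OPTIONS],
            "meta": {"pair": f"{first_name} / {second_name}"},
        }


example_text = "Plato studied with Socrates in Athens."
example_entities = [
    {"start": 0, "end": 5, "label": "PERSON"},
    {"start": 19, "end": 27, "label": "PERSON"},
    {"start": 31, "end": 37, "label": "GPE"},
]
tasks = list(make_pair_tasks(example_text, example_entities))
```

In a custom recipe, a generator like this would be returned as the `"stream"`, with `"view_id"` set to `"choice"`. Sentences with many names produce a lot of pairs, so you may want to filter candidates (for example, only people mentioned within the same sentence) before streaming them in.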
A similar(ish) use case came up in the following thread the other day and it inspired me to dig a bit deeper and try out some ideas we've had for a manual interface for labelling complex relationships, and even relationships and spans at the same time. You can see some early screenshots here:
It turned out to be a really powerful interface and we'll be introducing it in Prodigy v1.10, along with some recipes for fully manual relation annotation, model-assisted dependency parsing annotation and coreference resolution. So this is probably also going to be a good fit for what you're trying to do.