It's always difficult to give general advice on specific use-cases, but I'll try to brainstorm a little out loud.
As far as my German takes me, I wonder a little about your annotation scheme, looking at the screenshot. I understand "ARG0" points to ingredients, usually connected to a verb. "ARG" points to additional information on how to carry out the instruction, like "fine" (for "chopping"), but it also seems to be used to label adjectives like "big" (for "carrots"). And then "ARG1" seems to point to "tools" of some sort - a bowl etc.
I can understand trying to link ingredients to verbs, and having modifying information with those verbs (like "in olive oil"). But the annotation goes much further and highlights prepositions as single "entities" (e.g. "in") or words like "dann" ("then"). The granularity at which you've annotated these almost starts to look like a dependency parse with part-of-speech tags annotated, such as prepositions and adjectives.
In fact, I'm starting to wonder whether it wouldn't be more beneficial to you to train a tagger & parser on this type of data, and then use that information to deduce the relations you're looking for. For instance, if you can identify a clear cooking verb like "chopping", the nouns that are the objects connected to that verb are probably your ingredients. You could also try running a pretrained parser, but I would guess that you'd need at least some kind of fine-tuning on this specific data, as cooking instructions are generally a little different than sentences from news articles or so.
If you do still want to go the "REL route", I think you'd definitely benefit from incorporating part-of-speech tags / dependency parsing information into your classification model. I realise that's not entirely straightforward to implement, and we don't have a current example. But in most relation extraction challenges, you want the classifier to pick up on the grammar in the sentence, and word embeddings alone might not be sufficient.
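As a rough illustration of what "incorporating grammatical information" could mean in practice: you can concatenate each token's word vector with one-hot encodings of its POS tag and dependency label before feeding it to the classifier. Everything here is a placeholder - the tag sets are illustrative and the embedding would really come from your trained pipeline:

```python
# Sketch: enriching token representations with grammatical features
# before relation classification. Tag/label inventories and the
# random embedding are stand-ins, not from any real model.

import numpy as np

POS_TAGS = ["NOUN", "VERB", "ADJ", "ADP", "ADV"]   # illustrative
DEP_LABELS = ["oa", "mo", "nk", "sb", "root"]      # illustrative

def one_hot(value, vocab):
    """One-hot encode `value` against a fixed vocabulary (zeros if unknown)."""
    vec = np.zeros(len(vocab))
    if value in vocab:
        vec[vocab.index(value)] = 1.0
    return vec

def token_features(embedding, pos, dep):
    """Concatenate the word vector with one-hot POS and dep features."""
    return np.concatenate(
        [embedding, one_hot(pos, POS_TAGS), one_hot(dep, DEP_LABELS)]
    )

emb = np.random.rand(32)                 # stand-in 32-dim word vector
feats = token_features(emb, "NOUN", "oa")
print(feats.shape)                       # (42,) = 32 + 5 + 5
```

In a Thinc model you'd express the same idea with a `concatenate` layer over the embedding and the tag features, but the principle is the same: give the classifier explicit access to the syntax instead of hoping it recovers it from the embeddings.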
FYI - we're currently preparing a tutorial video on the REL example from the nightly docs, which will explain the data structures & flow in the Thinc model better. We hope this will help people dive into the specifics of the models and tune them according to their use-case.
But for this specific use-case, I think my first advice would be to try and see whether you can cast this as a dependency parsing challenge.