Relating entities + resolving coreferences in Russian texts

Hi! I need to create a set of entities and interrelate them. The documentation on ner.manual and rel.manual is pretty clear, but I'd like to clarify a few details related to coreference resolution, if possible.

Say we have a corpus on different types of vehicles. We want to recognize a few of those types, as well as some of their attributes (e.g. speed, size), and then recognize which attribute relates to which entity. The corpus is in Russian, but I'll use English examples.

I think my general steps will be:

  • Annotating separate entities for the vehicle types (say, a car, a ship, or a plane) and for the attributes;

  • Training a model and (if it performs well) plugging it, along with all the pre-annotated data I have, into rel.manual and then annotating the relations.
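For reference, here is a minimal sketch of how I'd expect to prepare the pre-annotated data for the second step. It assumes the JSONL task format with a "text" field and character-offset "spans", which is my understanding of what Prodigy reads; the helper `make_task` and the labels VEHICLE and SPEED are just illustrative names of mine.

```python
import json


def make_task(text, entities):
    """Build one annotation task: the text plus character-offset spans.

    `entities` is a list of (phrase, label) pairs; each phrase is located
    by its first occurrence in the text (a simplification for this sketch).
    """
    spans = []
    for phrase, label in entities:
        start = text.find(phrase)
        if start == -1:
            raise ValueError(f"phrase not found in text: {phrase!r}")
        spans.append({"start": start, "end": start + len(phrase), "label": label})
    # Keep spans in reading order.
    spans.sort(key=lambda span: span["start"])
    return {"text": text, "spans": spans}


task = make_task(
    "The car reaches 200 km/h.",
    [("car", "VEHICLE"), ("200 km/h", "SPEED")],
)

# One JSON object per line; ensure_ascii=False keeps Cyrillic readable.
line = json.dumps(task, ensure_ascii=False)
print(line)
```

In the real corpus the offsets would come from the trained model or the existing annotations rather than from a string search.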

Now to what I'm least certain about. First, the entities and their attributes may appear a sentence or two apart. I guess we could potentially improve the relation extraction by also annotating with a COREF label and then applying neuralcoref. So e.g. instead of:

I could add a COREF label to rel.manual and annotate like this:

So I'd like to ask:

  • First and most important: since neuralcoref has no support for Russian, we could try to train a model on our own corpus. But would that be feasible at all, given our very limited bandwidth and the modest size of the corpus (a few thousand records)?

  • If so, could this custom Russian model come in handy, as it has a tagger and a parser, and these seem to be prerequisites?

  • Is it OK to annotate coreference not with a separate recipe but as one of the labels in rel.manual? And if it is, should I annotate the linking nouns/pronouns/phrases in any specific way?
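To make the last question more concrete, here is a minimal sketch of how I imagine using COREF links downstream: mentions joined by COREF are merged into one cluster, and every attribute relation is re-attached to the cluster's representative mention. The (head, label, child) tuples, the HAS_SPEED label and the `resolve_attributes` helper are my own assumptions for illustration, not Prodigy's output format.

```python
from collections import defaultdict


def resolve_attributes(relations):
    """Attach attributes to the representative mention of each COREF cluster.

    `relations` is a list of (head, label, child) tuples. For COREF, head is
    the antecedent and child is the referring mention (an assumed convention).
    """
    parent = {}

    def find(mention):
        # Union-find lookup with path halving.
        parent.setdefault(mention, mention)
        while parent[mention] != mention:
            parent[mention] = parent[parent[mention]]
            mention = parent[mention]
        return mention

    # First pass: merge mentions linked by COREF, keeping the antecedent's root.
    for head, label, child in relations:
        if label == "COREF":
            root_head, root_child = find(head), find(child)
            if root_head != root_child:
                parent[root_child] = root_head

    # Second pass: collect the other relations under the cluster representative.
    attributes = defaultdict(list)
    for head, label, child in relations:
        if label != "COREF":
            attributes[find(head)].append((label, child))
    return dict(attributes)


relations = [
    ("the ship", "COREF", "It"),       # "It" refers back to "the ship"
    ("It", "HAS_SPEED", "30 knots"),   # attribute attached to the pronoun
]
resolved = resolve_attributes(relations)
print(resolved)
```

Here mentions are identified by their text only to keep the example short; real annotations would need token or character offsets so that repeated words stay distinct.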

Pardon the long post; I did my best to word everything as concisely as I could 🙂