Hi @ines,
In this other thread, you told me the following:
When you’re done with that, you could export the data and run one experiment where you link up highlighted spans close to each other and collect binary feedback on whether they are connected. You could either create data in the dep format (with a head and child), or use the choice interface with the options “For”, “Against”, “Support” and “Attack”.
So I was testing the two approaches, but I'm having a hard time making them work.
I first tried mapping relations between spans of text using dep.teach, but got nowhere. Here's what I've done so far:
- First I tried loading a sample jsonl into dep.teach (with head and child, and a custom 'supports' relation) using the following command:
prodigy dep.teach essays_relations en_core_web_sm example.jsonl --unsegmented
The jsonl content is:
{"text": "Should students be taught to compete or to cooperate?\n\nIt is always said that competition can effectively promote the development of economy. In order to survive in the competition, companies continue to improve their products and service, and as a result, the whole society prospers. However, when we discuss the issue of competition or cooperation, what we are concerned about is not the whole society, but the development of an individual's whole life. From this point of view, I firmly believe that we should attach more importance to cooperation during primary education.\nFirst of all, through cooperation, children can learn about interpersonal skills which are significant in the future life of all students. What we acquired from team work is not only how to achieve the same goal with others but more importantly, how to get along with others. During the process of cooperation, children can learn about how to listen to opinions of others, how to communicate with others, how to think comprehensively, and even how to compromise with other team members when conflicts occurred. All of these skills help them to get on well with other people and will benefit them for the whole life.\nOn the other hand, the significance of competition is that how to become more excellence to gain the victory. Hence it is always said that competition makes the society more effective. However, when we consider about the question that how to win the game, we always find that we need the cooperation. The greater our goal is, the more competition we need. Take Olympic games which is a form of competition for instance, it is hard to imagine how an athlete could win the game without the training of his or her coach, and the help of other professional staffs such as the people who take care of his diet, and those who are in charge of the medical care. The winner is the athlete but the success belongs to the whole team. 
Therefore without the cooperation, there would be no victory of competition.\nConsequently, no matter from the view of individual development or the relationship between competition and cooperation we can receive the same conclusion that a more cooperative attitudes towards life is more profitable in one's success.", "data": {"head": "the significance of competition is that how to become more excellence to gain the victory", "dep": "supports", "child": "competition makes the society more effective"}}
The tool loads the text fine and presents it in full, but it still runs the regular dependency annotation task, highlighting seemingly random single-word tokens and using the regular dependency parsing labels. The example I got from the documentation used token ids, but since I'm predicting dependencies between spans rather than tokens, I changed the sample to contain the full span text instead of a token id.
The same jsonl, which I created in ner_manual format, is here (28.8 KB). It contains all the spans that could be marked with some sort of relation between them.
- As for using the choice interface, I'm not sure what annotation format I should present to Prodigy either. Since I'm trying to annotate pairs of spans, I'm thinking that I should present annotated pairs from my dataset, but I don't see how a trained model would handle this later on. Also, annotating pairs of spans would only make sense if they were highlighted in the original raw text, so the annotator could see them in context. How could I get this working in Prodigy: the full raw text, with two highlighted spans and my labels presented as options?
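To make the question more concrete, here is a rough sketch of the kind of task I imagine feeding to the choice interface. It is only a guess on my side: I'm assuming a task can carry "text", "spans" (with character offsets "start" and "end" plus a "label") and "options" (each with an "id" and "text"). The helper names and the HEAD/CHILD labels are made up for illustration, and the text is shortened to two sentences from the essay above.

```python
import json

def find_span(text, span_text, label):
    """Locate span_text in text and return character offsets (first match only)."""
    start = text.find(span_text)
    if start == -1:
        raise ValueError(f"span not found in text: {span_text!r}")
    return {"start": start, "end": start + len(span_text), "label": label}

def make_choice_task(text, head_text, child_text, options):
    """Build one task: full text, two highlighted spans, and the labels as options.

    The key names are an assumption about the task format, and the
    HEAD/CHILD labels are hypothetical.
    """
    spans = [
        find_span(text, head_text, "HEAD"),
        find_span(text, child_text, "CHILD"),
    ]
    spans.sort(key=lambda span: span["start"])  # keep spans in reading order
    return {
        "text": text,
        "spans": spans,
        "options": [{"id": option, "text": option} for option in options],
    }

# Two sentences taken from the essay in the jsonl sample above.
text = (
    "On the other hand, the significance of competition is that how to "
    "become more excellence to gain the victory. Hence it is always said "
    "that competition makes the society more effective."
)
head = ("the significance of competition is that how to become more "
        "excellence to gain the victory")
child = "competition makes the society more effective"

task = make_choice_task(text, head, child, ["For", "Against", "Support", "Attack"])
print(json.dumps(task))  # one line of jsonl per span pair
```

Each candidate pair of spans would become one line like this, so the annotator always sees both spans highlighted in context and picks one of the four options.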
Thanks!