Hi,
What is the format of the annotated coreference dataset? Is it CoNLL format?
I am planning to train neuralcoref on it.
Thanks
Hi! You can see an example of the JSON format produced by the relations UI here: https://prodi.gy/docs/api-interfaces#relations
It includes the individual tokens and the annotated relations, each with its label and references to its head and child (the tokens it connects), so you should be able to easily convert that to any format you need.
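To make the structure a bit more concrete, here is a minimal sketch of reading the relations out of one annotated task. The helper name `relation_pairs` is made up for illustration and is not part of Prodigy. The `task` dict is a trimmed copy of the example in the docs, keeping only the fields the helper uses.

```python
def relation_pairs(task, label=None):
    """Return (label, head_text, child_text) triples for one annotated task.

    Hypothetical helper, not a Prodigy API. It relies only on the "text",
    "relations", "head_span" and "child_span" fields of the JSON.
    """
    text = task["text"]
    pairs = []
    for rel in task.get("relations", []):
        if label is not None and rel["label"] != label:
            continue  # keep only the requested relation type
        head, child = rel["head_span"], rel["child_span"]
        pairs.append(
            (
                rel["label"],
                text[head["start"]:head["end"]],    # character offsets into the text
                text[child["start"]:child["end"]],
            )
        )
    return pairs


# Trimmed copy of the documented example: one COREF relation only.
task = {
    "text": "My mother’s name is Sasha Smith. She likes dogs and pedigree cats.",
    "relations": [
        {
            "head": 1,
            "child": 8,
            "label": "COREF",
            "head_span": {"start": 3, "end": 9, "token_start": 1, "token_end": 1, "label": None},
            "child_span": {"start": 33, "end": 36, "token_start": 8, "token_end": 8, "label": None},
        }
    ],
}

print(relation_pairs(task, label="COREF"))
```

The character offsets in `head_span` and `child_span` index into `text`, while `token_start` and `token_end` index into `tokens`, so either view can be used depending on the target format.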
Has anyone been able to solve this, please?
hi @DSexplorer!
Can you clarify what your question is?
The original question was about the format of the annotated coreference dataset.
It was answered with the link to this example (which is what the coreference data looks like when using Prodigy's relations UI):
{
"text": "My mother’s name is Sasha Smith. She likes dogs and pedigree cats.",
"tokens": [
{"text": "My", "start": 0, "end": 2, "id": 0, "ws": true},
{"text": "mother", "start": 3, "end": 9, "id": 1, "ws": false},
{"text": "’s", "start": 9, "end": 11, "id": 2, "ws": true},
{"text": "name", "start": 12, "end": 16, "id": 3, "ws": true},
{"text": "is", "start": 17, "end": 19, "id": 4, "ws": true},
{"text": "Sasha", "start": 20, "end": 25, "id": 5, "ws": true},
{"text": "Smith", "start": 26, "end": 31, "id": 6, "ws": true},
{"text": ".", "start": 31, "end": 32, "id": 7, "ws": true, "disabled": true},
{"text": "She", "start": 33, "end": 36, "id": 8, "ws": true},
{"text": "likes", "start": 37, "end": 42, "id": 9, "ws": true},
{"text": "dogs", "start": 43, "end": 47, "id": 10, "ws": true},
{"text": "and", "start": 48, "end": 51, "id": 11, "ws": true, "disabled": true},
{"text": "pedigree", "start": 52, "end": 60, "id": 12, "ws": true},
{"text": "cats", "start": 61, "end": 65, "id": 13, "ws": true},
{"text": ".", "start": 65, "end": 66, "id": 14, "ws": false, "disabled": true}
],
"spans": [
{"start": 20, "end": 31, "token_start": 5, "token_end": 6, "label": "PERSON"},
{"start": 43, "end": 47, "token_start": 10, "token_end": 10, "label": "NP"},
{"start": 52, "end": 65, "token_start": 12, "token_end": 13, "label": "NP"}
],
"relations": [
{
"head": 0,
"child": 1,
"label": "POSS",
"head_span": {"start": 0, "end": 2, "token_start": 0, "token_end": 0, "label": null},
"child_span": {"start": 3, "end": 9, "token_start": 1, "token_end": 1, "label": null}
},
{
"head": 1,
"child": 8,
"label": "COREF",
"head_span": {"start": 3, "end": 9, "token_start": 1, "token_end": 1, "label": null},
"child_span": {"start": 33, "end": 36, "token_start": 8, "token_end": 8, "label": null}
},
{
"head": 9,
"child": 13,
"label": "OBJECT",
"head_span": {"start": 37, "end": 42, "token_start": 9, "token_end": 9, "label": null},
"child_span": {"start": 52, "end": 65, "token_start": 12, "token_end": 13, "label": "NP"}
}
]
}
My question is how we can convert this data to the CoNLL format, which is required for training the neuralcoref model.
We don't have an off-the-shelf converter script, but this post gives a suggestion of how you could do it:
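Separately from the linked post (which is not reproduced here), a rough sketch of one possible conversion might look like the following. It is not an official converter. The function names `coref_clusters` and `to_conll_lines` are hypothetical, and the output is a reduced layout with only document id, part, token index, word and the CoNLL-2012-style coreference column (`(0` opens a mention, `0)` closes it, `(0)` is a single-token mention, `-` means none). The full CoNLL-2012 layout has further columns (part of speech, parse bit, speaker and so on) and blank lines between sentences. The relations JSON carries neither, so you would need to add them yourself and check what neuralcoref's training scripts actually expect.

```python
from collections import defaultdict


def coref_clusters(task, label="COREF"):
    """Group mentions connected by `label` relations into clusters.

    Mentions are (token_start, token_end) tuples; linking uses union-find.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for rel in task.get("relations", []):
        if rel["label"] != label:
            continue
        head = (rel["head_span"]["token_start"], rel["head_span"]["token_end"])
        child = (rel["child_span"]["token_start"], rel["child_span"]["token_end"])
        parent[find(head)] = find(child)  # merge the two mentions' clusters

    clusters = defaultdict(list)
    for mention in list(parent):
        clusters[find(mention)].append(mention)
    return [sorted(mentions) for mentions in clusters.values()]


def to_conll_lines(task, doc_id="doc", part=0):
    """Emit reduced CoNLL-style lines: doc id, part, token index, word, coref."""
    tags = defaultdict(list)
    # Sort clusters by their first mention so cluster ids are deterministic.
    for cluster_id, mentions in enumerate(sorted(coref_clusters(task))):
        for start, end in mentions:
            if start == end:
                tags[start].append(f"({cluster_id})")
            else:
                tags[start].append(f"({cluster_id}")
                tags[end].append(f"{cluster_id})")

    lines = [f"#begin document ({doc_id}); part {part:03d}"]
    for tok in task["tokens"]:
        coref = "|".join(tags[tok["id"]]) or "-"
        lines.append(f"{doc_id}\t{part}\t{tok['id']}\t{tok['text']}\t{coref}")
    lines.append("")
    lines.append("#end document")
    return lines
```

For the example above, the only COREF relation links token 1 ("mother") and token 8 ("She"), so those two rows would get `(0)` in the last column and every other row would get `-`. Calling `"\n".join(to_conll_lines(task))` on each record of your exported JSONL gives text you can write to a file.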