Hi @jonnyfoka!
Thanks for your question and welcome to the Prodigy community!
We don't have an off-the-shelf converter, but there are several posts that may help:
Hello everyone,
I want to convert from either a Prodigy JSONL sample (or, better yet, a spaCy Doc) to a CoNLL 2003 sample. In the CoNLL 2003 format documentation, I see there are 4 columns or items: "The first item on each line is a word, the second a part-of-speech (POS) tag, the third a syntactic chunk tag and the fourth the named entity tag". If my understanding is correct, I could obtain these items as follows:
1st item: The original word.
2nd item: Could be obtained from "tag_" Token attri…
Also this one is a little older but goes step-by-step:
I’ve just acquired Prodigy to work on a manual NER tagging task. After manually tagging a document, the tool exports the results to a JSON format, the standard spaCy format. We would like to have this data tagged in the CoNLL format, the column format like so:
John PERSON
works O
for O
Microsoft ORGANIZATION
Is there an option to do this or should we opt to post-process the json file in order to do so?
Thanks in advance
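In case it helps, here's a minimal sketch of the post-processing route for the two-column layout shown above. It assumes each exported record has a "tokens" list and a "spans" list whose entries carry "token_start", "token_end" (inclusive) and "label", which is what the manual NER interface produces. The function names, the `bio` flag and the `EXAMPLE` record are made up for illustration and aren't part of Prodigy.

```python
import json


def record_to_conll(record, bio=False, outside="O"):
    """Return one 'token label' line per token for a single annotated record."""
    tokens = record["tokens"]
    labels = [outside] * len(tokens)
    for span in record.get("spans", []):
        # Assumption: token_start and token_end are inclusive token indices.
        for i in range(span["token_start"], span["token_end"] + 1):
            if bio:
                prefix = "B-" if i == span["token_start"] else "I-"
                labels[i] = prefix + span["label"]
            else:
                labels[i] = span["label"]
    return "\n".join(f"{tok['text']} {lab}" for tok, lab in zip(tokens, labels))


def jsonl_to_conll(lines, bio=False):
    """Convert JSONL lines, keeping accepted answers only, one blank line between examples."""
    blocks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Records without an "answer" key are treated as accepted.
        if record.get("answer", "accept") != "accept":
            continue
        blocks.append(record_to_conll(record, bio=bio))
    return "\n\n".join(blocks) + "\n"


# Hypothetical record mirroring the example in the question.
EXAMPLE = {
    "text": "John works for Microsoft",
    "tokens": [
        {"text": "John", "start": 0, "end": 4, "id": 0},
        {"text": "works", "start": 5, "end": 10, "id": 1},
        {"text": "for", "start": 11, "end": 14, "id": 2},
        {"text": "Microsoft", "start": 15, "end": 24, "id": 3},
    ],
    "spans": [
        {"start": 0, "end": 4, "token_start": 0, "token_end": 0, "label": "PERSON"},
        {"start": 15, "end": 24, "token_start": 3, "token_end": 3, "label": "ORGANIZATION"},
    ],
    "answer": "accept",
}

print(record_to_conll(EXAMPLE))
```

Passing `bio=True` adds `B-`/`I-` prefixes, which is closer to what CoNLL 2003 style tools tend to expect. To convert a whole export, you could open the JSONL file and pass the file object to `jsonl_to_conll`.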
Then this one covers relations info:
Hi! We don't have an existing converter script, although it'd be cool to have if you end up writing one and want to share it.
Assuming you've used the (semi-)manual relations UI to annotate dependencies, here's the format that Prodigy will produce: https://prodi.gy/docs/api-interfaces#relations It should give you all the information you need, so it should mostly come down to formatting it as columns, depending on the CoNLL-U specification you're using. The easiest is probably …
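As a rough illustration of the "formatting it as columns" step, here's a sketch that writes the ten CoNLL-U columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC) for one record. It assumes each entry in "relations" has "head" and "child" as 0-based token indices plus a "label", and that the head is the governor of the child; if you drew the arrows the other way round, swap the two. Columns the annotations don't cover are left as `_`, and tokens without an incoming relation get `_` for HEAD and DEPREL rather than a guessed root. The function name and `EXAMPLE_REL` are hypothetical.

```python
def relations_to_conllu(record):
    """Return tab-separated CoNLL-U style rows for one relations record."""
    tokens = record["tokens"]
    incoming = {}  # child token index -> (head token index, relation label)
    for rel in record.get("relations", []):
        incoming[rel["child"]] = (rel["head"], rel["label"])

    rows = []
    for i, tok in enumerate(tokens):
        if i in incoming:
            head, deprel = incoming[i]
            head_id = str(head + 1)  # CoNLL-U token IDs are 1-based
        else:
            head_id, deprel = "_", "_"  # nothing annotated for this token
        # ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
        rows.append("\t".join(
            [str(i + 1), tok["text"], "_", "_", "_", "_", head_id, deprel, "_", "_"]
        ))
    return "\n".join(rows)


# Hypothetical record; only the keys used above are included.
EXAMPLE_REL = {
    "text": "She likes dogs",
    "tokens": [
        {"text": "She", "start": 0, "end": 3, "id": 0},
        {"text": "likes", "start": 4, "end": 9, "id": 1},
        {"text": "dogs", "start": 10, "end": 14, "id": 2},
    ],
    "relations": [
        {"head": 1, "child": 0, "label": "nsubj"},
        {"head": 1, "child": 2, "label": "obj"},
    ],
}

print(relations_to_conllu(EXAMPLE_REL))
```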
I haven't used CoNLL myself, but I've been told there are several different formats, or at least different things people call CoNLL. So if those posts don't work out, let us know which variant you need along with any other details, and we can try to help.