Is there any way to train on the annotated data I get from Prodigy in JSONL format using spaCy directly? My Prodigy version is quite old, and I don't want to renew the subscription because the old version is enough for me. However, some functionality is now only compatible with newer spaCy versions. So I'd like to know: is there a way to train on my Prodigy JSONL file through spaCy instead of using ner.batch-train?
Hi, there's a built-in spaCy converter that's intended for use with Prodigy NER data:
spacy convert --lang en data.jsonl .
This should create data.json in spaCy's training format. You need to specify the language so that the converter can tokenize the texts.
See a more detailed example here: Unable to use Prodigy annotations with SpaCy CLI train
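If you would rather load the annotations in Python than go through the converter, a minimal sketch could look like the one below. It assumes each line of the JSONL file has a "text" key, a "spans" list with "start", "end" and "label", and an "answer" key, which is how Prodigy NER exports usually look; check your own file first. The function name load_prodigy_ner is made up for this example, and the output is the (text, {"entities": [...]}) tuple format used by spaCy v2-style training scripts.

```python
import json


def load_prodigy_ner(path):
    """Read a Prodigy JSONL export and return (text, annotations) tuples.

    Hypothetical helper: the key names ("text", "spans", "answer") are
    assumptions about the export format, not something spaCy enforces.
    """
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Keep only accepted examples; a missing "answer" is treated as accepted.
            if record.get("answer", "accept") != "accept":
                continue
            # Character offsets and label for each annotated span.
            entities = [
                (span["start"], span["end"], span["label"])
                for span in record.get("spans", [])
            ]
            examples.append((record["text"], {"entities": entities}))
    return examples
```

Calling load_prodigy_ner("data.jsonl") would then give you a list you can shuffle and split into training and evaluation sets yourself.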
I have converted the JSONL file from Prodigy to JSON format, and now I want to train on that JSON file in spaCy for NER. A sample of the JSON file is below:
[
{
"id":0,
"paragraphs":[
{
"raw":"Really very sad. Allah rehem kere Ameen",
"cats":[
],
"sentences":[
{
"brackets":[
],
"tokens":[
{
"ner":"U-IGNORE",
"id":0,
"orth":"Really"
},
{
"ner":"U-IGNORE",
"id":1,
"orth":"very"
},
{
"ner":"U-IGNORE",
"id":2,
"orth":"sad"
},
{
"ner":"O",
"id":3,
"orth":"."
}
]
},
{
"brackets":[
],
"tokens":[
{
"ner":"U-IGNORE",
"id":4,
"orth":"Allah"
},
{
"ner":"U-IGNORE",
"id":5,
"orth":"rehem"
},
{
"ner":"U-IGNORE",
"id":6,
"orth":"kere"
},
{
"ner":"U-IGNORE",
"id":7,
"orth":"Ameen"
}
]
}
]
}
]
How can I train on this type of data in spaCy for NER?
You can find documentation about training on the command line here:
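Before training, it may be worth checking which NER tags the converted file actually contains. In the sample above almost every token is tagged U-IGNORE, which in the BILUO scheme reads as a single-token entity with the label IGNORE, so the model would be trained to predict that label. The sketch below only relies on the structure visible in the sample (documents, then "paragraphs", "sentences", "tokens", and a "ner" value per token); the function names are made up for this example.

```python
import json
from collections import Counter


def count_ner_tags(docs):
    """Count the "ner" values of all tokens in spaCy's JSON training format."""
    counts = Counter()
    for doc in docs:
        for paragraph in doc.get("paragraphs", []):
            for sentence in paragraph.get("sentences", []):
                for token in sentence.get("tokens", []):
                    # Tokens without a "ner" key are counted under "-".
                    counts[token.get("ner", "-")] += 1
    return counts


def count_ner_tags_in_file(path):
    """Load a converted JSON file and count its NER tags."""
    with open(path, encoding="utf-8") as f:
        return count_ner_tags(json.load(f))
```

If the counts look right, and assuming spaCy v2.x, training from the command line takes roughly the form `python -m spacy train en ./output data.json dev.json --pipeline ner`, where ./output and dev.json are placeholder paths. Check `python -m spacy train --help` for the exact arguments in your version.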