How do I load the output of ner.gold-to-spacy into spacy?

theoldhat · October 8, 2018, 7:27pm

Hi,

I looked at this link in the spacy documentation about updating the named entity recognizer with custom training data, but the format of the training data referenced in the spacy documentation

TRAIN_DATA = [
    ('Who is Shaka Khan?', {
        'entities': [(7, 17, 'PERSON')]
    }),
    ('I like London and Berlin.', {
        'entities': [(7, 13, 'LOC'), (18, 24, 'LOC')]
    })
]

is different from the jsonl format exported from ner.gold-to-spacy. Is there an easy way to load jsonl data for use with spacy?

Thanks!

Hat

ines · October 9, 2018, 8:18am

Hi! The ner.gold-to-spacy format should give you data that looks like this:

 ["I like London", {"entities": [[7, 13, "LOC"]]}]

That’s pretty much the same format as the examples above (only with a list instead of tuples, since JSON doesn’t know tuples – but that shouldn’t matter). So you should be able to just read in the JSONL file and pass the result in as the training data.
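To make the equivalence concrete, here is a minimal sketch that parses one exported line and, optionally, converts the inner lists to tuples so the result matches the documented TRAIN_DATA shape exactly. The sample line is copied from the example above; the variable names are only illustrative.

```python
import json

# One line as it would appear in the exported JSONL file (taken from the example above).
line = '["I like London", {"entities": [[7, 13, "LOC"]]}]'

# Each line is a two-element list: the text and a dict of annotations.
text, annotations = json.loads(line)

# Optional: turn each [start, end, label] list into a tuple to mirror the docs.
example = (text, {"entities": [tuple(span) for span in annotations["entities"]]})

print(example)
print(text[7:13])  # the character span the entity offsets point at
```

The conversion to tuples is cosmetic here; as noted above, the list form should work just as well.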

theoldhat · October 9, 2018, 5:47pm

Hi Ines,

Thanks very much for the reply! Should I be able to use the JSONL loader that comes with Prodigy to read the file? When I try to do so, I get an error saying the JSON is invalid.

Apologies if these are novice questions.

ines · October 9, 2018, 5:53pm

No worries!

Ultimately, all you need to do is open the file, iterate over the lines and call json.loads(line) on each one (or, even better, json.loads(line.strip()) to trim whitespace first). You can also use the jsonlines Python library if that’s easier.
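A minimal sketch of that loop, using only the standard library. The helper name read_jsonl and the file name annotations.jsonl are placeholders, and the demo writes a throwaway file so the snippet runs on its own; in practice you would point it at your exported file.

```python
import json
import os
import tempfile

def read_jsonl(path):
    """Yield one parsed record per non-empty line of a JSONL file."""
    with open(path, encoding="utf8") as f:
        for line in f:
            line = line.strip()  # trim the trailing newline and stray whitespace
            if line:             # skip blank lines instead of failing on them
                yield json.loads(line)

# Demo only: write a tiny sample file, then load it back.
sample = '["I like London", {"entities": [[7, 13, "LOC"]]}]\n\n'
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "annotations.jsonl")  # placeholder file name
    with open(path, "w", encoding="utf8") as f:
        f.write(sample)
    TRAIN_DATA = list(read_jsonl(path))

print(TRAIN_DATA)
```

Note that each line is parsed separately. Passing the whole file to json.loads at once will usually fail, because a JSONL file as a whole is not a single JSON document.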

Alternatively, you could also just copy-paste the data into a Python list – for example:

TRAIN_DATA = [
    ["I like London", {"entities": [[7, 13, "LOC"]]}]
]

theoldhat · October 10, 2018, 3:29pm

Hooray Ines! I was totally backwards on this and you helped me out. I really appreciate it! Got things working now.
