Hello Matthew, I have a problem similar to the one you talked about at PyData Berlin 2018, and I’m trying to replicate the example you showed on the slide.
The first command doesn’t work for me:
prodigy textcat.teach crime_dataset /data.jsonl --label CRIME
It expects a spaCy model as the second argument, so I’m wondering how it worked in your example. The same question applies to the second command:
prodigy ner.teach ner_dataset /data.json --label PERSON, LOCATION
Could you please clarify how to replicate the example from your talk?
I made a typo when I was putting together the slide; you’re right that the command is wrong there. It should be fixed in the slideshare, but it’s hard to fix the video :p. It should work if you specify the spaCy model as the second argument. Something like en_core_web_md should be fine.
Thank you for the quick reply. I have a follow-up question. I see that in prodigy textcat.teach crime_dataset en_core_web_md /data.jsonl --label CRIME you don’t specify any initial training data, I mean --seed or --patterns. Is it fine to just start annotating without this initial information? To give a little background on what I’m trying to solve: I want a model to tell me whether there is an address present in the text.
If you’re labelling an entity that the model already predicts, you can use the current state of the model as a starting point. But if you’re annotating a new entity, you do need to do something else to add the initial entities.
I would suggest starting with a round of ner.manual annotation, to train an initial model. After that you can use ner.teach to improve its predictions.
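If you also want to try the --patterns route you mentioned, here is a minimal sketch of how a patterns file could be generated. The ADDRESS label, the example terms and the output file name are all hypothetical placeholders, not something from the talk, so treat this as an illustration of the JSONL layout (one object with a "label" and a "pattern" per line) rather than a recommended set of patterns.

```python
import json

# Hypothetical label and example terms; replace with ones that fit your data.
LABEL = "ADDRESS"
PATTERNS = [
    # Token-based pattern: one dict per token, matched on the lowercase form.
    {"label": LABEL, "pattern": [{"lower": "main"}, {"lower": "street"}]},
    {"label": LABEL, "pattern": [{"lower": "park"}, {"lower": "avenue"}]},
    # String pattern: matched against the exact text.
    {"label": LABEL, "pattern": "Baker Street"},
]


def patterns_to_jsonl(patterns):
    """Serialize patterns as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(p) for p in patterns) + "\n"


if __name__ == "__main__":
    # Hypothetical output path; pass this file to the recipe via --patterns.
    with open("address_patterns.jsonl", "w", encoding="utf8") as f:
        f.write(patterns_to_jsonl(PATTERNS))
```

A handful of patterns like these will only surface some candidate spans to accept or reject, so a round of manual annotation is probably still the more reliable way to get an initial model for a new entity type.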