Understanding textcat.teach from PyData Berlin 2018 talk

Ghorsey · October 5, 2018, 10:07pm

Hello Matthew, I have a problem similar to the one you talked about at PyData Berlin 2018, and I’m trying to replicate the example you showed on the slide.

The first line, prodigy textcat.teach crime_dataset /data.jsonl --label CRIME, doesn’t work for me because it wants me to specify a spaCy model as the second parameter, so I’m wondering how it worked in your example. The same question applies to the second command: prodigy ner.teach ner_dataset /data.json --label PERSON, LOCATION

Could you please provide some clarity on how to replicate the problem you talked about there?

Thank you

honnibal · October 5, 2018, 10:15pm

Hi @geoff,

I made a typo when I was putting together the slide; you’re right that the command is wrong there. It should be fixed in the slideshare, but it’s hard to fix the video :p. It should work if you specify the spaCy model as the second argument. Something like en_core_web_md should be fine.
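As a rough illustration of the corrected argument order, here is a minimal Python sketch that assembles both commands from the slide with the spaCy model in second position. The helper build_teach_command is hypothetical (it is not part of Prodigy), and joining the labels with a comma and no space is an assumption about how --label expects several values.

```python
import shlex


def build_teach_command(recipe, dataset, spacy_model, source, labels):
    """Assemble the argument list for a teach recipe.

    Hypothetical helper for illustration only: it places the spaCy model
    right after the dataset name, which is what the slide left out.
    """
    return [
        "prodigy", recipe, dataset, spacy_model, source,
        # Assumption: several labels are passed as one comma-separated value.
        "--label", ",".join(labels),
    ]


textcat_cmd = build_teach_command(
    "textcat.teach", "crime_dataset", "en_core_web_md", "/data.jsonl", ["CRIME"]
)
ner_cmd = build_teach_command(
    "ner.teach", "ner_dataset", "en_core_web_md", "/data.json", ["PERSON", "LOCATION"]
)

# Print the commands as they would be typed in a shell.
print(shlex.join(textcat_cmd))
print(shlex.join(ner_cmd))
```

The printed lines can be pasted into a terminal, or the lists can be handed to subprocess.run if Prodigy is installed.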

Ghorsey · October 5, 2018, 10:32pm

Thank you for the quick reply. I have a follow-up question. I see that here, prodigy textcat.teach crime_dataset en_core_web_md /data.jsonl --label CRIME, you don’t specify any initial training data, I mean --seed or --patterns. Is it fine to just start annotating without this initial information? To give a little background on what I’m trying to solve: I want a model to tell me whether there is an address present in the text.

honnibal · October 11, 2018, 12:21pm

If you’re labelling an entity that the model already predicts, you can use the current state of the model as a starting point. But if you’re annotating a new entity, you do need to do something else to add the initial entities.

I would suggest starting with a round of ner.manual annotation, to train an initial model. After that you can use ner.teach to improve its predictions.
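Since the question mentions --patterns as one possible source of initial information, here is a minimal sketch of how such a file might be generated. It assumes the usual JSONL patterns layout (one object per line with a "label" and a token-based "pattern"); the ADDRESS label, the trigger words, and the file name are all made up for illustration and are not from the talk.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical label and trigger words for the address use case.
LABEL = "ADDRESS"
TRIGGER_WORDS = ["street", "avenue", "road"]


def build_patterns(label, words):
    """Create one case-insensitive single-token pattern per trigger word."""
    return [{"label": label, "pattern": [{"lower": word}]} for word in words]


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


patterns = build_patterns(LABEL, TRIGGER_WORDS)
patterns_path = Path(tempfile.gettempdir()) / "address_patterns.jsonl"
write_jsonl(patterns_path, patterns)
print(f"Wrote {len(patterns)} patterns to {patterns_path}")
```

The resulting file could then be passed to the recipe with --patterns alongside --label ADDRESS. Whether a handful of keywords is a good enough starting point depends on the data, so treat the word list as a placeholder.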
