📺 Video: NER with Prodigy & Transfer Learning

ines · March 16, 2020, 12:05pm

I recorded a new video! In it, I'm training a named entity recognition model from scratch, using semi-automatic annotation with sense2vec vectors and improving a model in the loop, plus some cool transfer learning stuff! The goal is to analyze 2m+ comments posted to Reddit's r/Cooking subreddit to find out how mentions of ingredients change over time, and to create a cool bar chart race animation.

Annotation took about 2.5 hours and the results are pretty interesting. I've open-sourced all code, raw data, annotations and results so you can check it out and play with it. I'm sure there's a lot more to explore in the data.

frenet · March 16, 2020, 7:45pm

https://github.com/explosion/projects/ner-food-ingredients is invalid.
Can you post it again? Thank you!

ines · March 16, 2020, 7:54pm

Ah, sorry, I mistyped. It's the ner-food-ingredients directory in that repo, so: https://github.com/explosion/projects/tree/master/ner-food-ingredients Also updated the link above.

frenet · March 16, 2020, 8:01pm

Thank you, this video is very useful for me, for it is an example of NER from scratch.

adamkgoldfarb · March 28, 2020, 12:15am

Great video, thanks for sharing! I was hoping to hear a bit more about the use case for --init-tok2vec and the process you took to create that component. Is that something you might add to the video, or add a bit more color to here?

Thanks again!

ines · March 29, 2020, 9:43am

The pretraining process itself is not that exciting, to be honest. It's really just running spacy pretrain on lots of raw text from Reddit for a while (~8 hours on GPU). The tok2vec_cd8_model289.bin artifact was trained with a depth of 8 and for 289 iterations. You can read more about spacy pretrain here: Command Line Interface · spaCy API Documentation

adamkgoldfarb · March 30, 2020, 2:13am

Thanks! I’ll read the docs to better understand the benefits offered by pretraining.

baxtersapp · June 9, 2020, 3:07pm

I know this might be too much of a basic question, but I still can't figure out how to pretrain using the GPU. There doesn't appear to be any way to do this via CLI. Please let me know what I'm missing, @ines! Hopefully I am not the only one confused on this.

ines · June 10, 2020, 8:11am

Ah, so what's the problem, does it not detect the GPU? If you have a GPU available, spacy pretrain should "just work", detect the GPU and train on GPU.
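One quick way to check whether spaCy can see a GPU at all is to call `spacy.prefer_gpu()` from Python, which returns `True` only if a GPU could be activated (this needs a matching `cupy` install). The snippet below is a minimal sketch: the `gpu_available_to_spacy` helper is a hypothetical name for illustration, and it returns `None` when spaCy itself isn't installed.

```python
def gpu_available_to_spacy():
    """Return True/False from spacy.prefer_gpu(), or None if spaCy is missing.

    Hypothetical helper for illustration only; it is not part of spaCy.
    """
    try:
        import spacy
    except ImportError:
        # spaCy is not installed in this environment.
        return None
    # prefer_gpu() activates the GPU if one is usable, otherwise stays on CPU.
    return bool(spacy.prefer_gpu())


result = gpu_available_to_spacy()
if result is None:
    print("spaCy is not installed")
elif result:
    print("spaCy can use the GPU")
else:
    print("spaCy will fall back to CPU")
```

If this reports a CPU fallback, the usual suspects are a missing or mismatched `cupy` / CUDA install rather than anything specific to pretraining.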

ysz · January 7, 2023, 3:23pm

@ines thank you for sharing this. Quick one: you use prodigy sense2vec.teach and then prodigy terms.to-patterns to compile a list of patterns, which is in fact just a plain-text JSONL file.

Does the downstream NER model that you then train in the video make any use of the sense2vec model under the hood? Or is it just the verbatim word patterns that are actually used?

koaning · January 9, 2023, 9:46am

The video is a bit dated and the commands are slightly different now, but the terms are just used verbatim. Their main use case is to help pre-fill annotations, which makes annotating much easier, which in turn makes it much easier to get your first model ready via prodigy train. If you're interested in a more recent tutorial with sense2vec, you might enjoy this video where I use it to detect video games.
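To make "verbatim" concrete, here is a rough sketch of what turning a term list into a patterns file amounts to. It is not Prodigy's actual implementation: the `terms_to_patterns` helper, the naive whitespace tokenization and the `INGRED` label are assumptions for illustration, and the exact output of terms.to-patterns may differ (for example in case handling).

```python
import json


def terms_to_patterns(terms, label):
    """Turn plain terms into token-based match patterns, one dict per term.

    Hypothetical helper for illustration; not part of Prodigy or spaCy.
    """
    patterns = []
    for term in terms:
        # Naive whitespace split; a real tokenizer may split differently.
        tokens = [{"lower": tok.lower()} for tok in term.split()]
        patterns.append({"label": label, "pattern": tokens})
    return patterns


def to_jsonl(patterns):
    """Serialize patterns as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(p) for p in patterns)


example = terms_to_patterns(["garlic", "olive oil"], "INGRED")
jsonl_text = to_jsonl(example)
print(jsonl_text)
```

Nothing from the sense2vec vectors ends up in that file. The vectors only help you find candidate terms, and the patterns then only pre-highlight matches during annotation.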
