Starting from scratch

matt.whitby · March 2, 2022, 1:14pm

I'm watching this: Training a NAMED ENTITY RECOGNITION MODEL with Prodigy and Transfer Learning - YouTube

It makes reference to the Reddit file "s2v_reddit_2015_md". How do I create a file from the data I actually want to annotate? The video just says "use this file" and doesn't explain what it is, or how to use your own.

ljvmiranda921 · March 2, 2022, 1:38pm

Hi @matt.whitby, you can download the pretrained model from the sense2vec repository. You can also check the tutorial's repo for more information.

matt.whitby · March 2, 2022, 2:25pm

I did. There are things that are not clear to me.
I fear this is going to be a company that just offers support in the form of links to things I've already read, rather than answers to questions.

ines · March 2, 2022, 2:33pm

Hi! If you look at the forum and our other responses, I hope you'll see that we spend a lot of time answering people's specific questions and providing guidance. But there are also a lot of resources already, in the form of documentation, tutorials etc., and it's never clear what someone has seen or what they might have missed. So we always try to provide as many links and resources as possible.

Your question was about s2v_reddit_2015_md, which is a pretrained vectors package that you can download from sense2vec. It isn't a file you create from your own data. In the first step of the tutorial, I used the vectors package to bootstrap a terminology list for quicker annotation, by finding more examples of the entities so we can pre-highlight them later and annotate faster. You'd typically want to use existing pretrained vectors here. It's not a required step, but it's a nice way to automate the annotation, which is why I showed it in the video. But you can also skip this and go straight to the annotation process.
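
If you skip the sense2vec step but still want entities pre-highlighted, one option is to write the terminology list by hand and turn it into a match patterns file. Here's a minimal sketch of that, assuming a hypothetical INGRED label and made-up terms. It splits terms on whitespace, which only approximates how a real tokenizer would split them, so treat it as an illustration rather than what the tutorial's recipes do internally.

```python
import json


def terms_to_patterns(terms, label):
    """Turn a list of terms into token-based match patterns.

    Each pattern is a dict with a "label" and a "pattern" list holding
    one {"lower": ...} entry per token, so matching is case-insensitive.
    Assumption: whitespace splitting is good enough for these terms.
    """
    patterns = []
    for term in terms:
        tokens = term.split()
        if not tokens:
            continue  # skip empty or whitespace-only terms
        patterns.append(
            {"label": label, "pattern": [{"lower": tok.lower()} for tok in tokens]}
        )
    return patterns


# Hypothetical terminology list and label, for illustration only
terms = ["garlic", "olive oil", "Parmesan"]
patterns = terms_to_patterns(terms, "INGRED")

# One JSON object per line is the JSONL layout a patterns file uses
pattern_lines = [json.dumps(p) for p in patterns]
print("\n".join(pattern_lines))
```

Saved to a .jsonl file, these lines could then be passed to the manual NER recipe with its --patterns option so that matches show up pre-highlighted.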

Here's an example of using Prodigy to annotate entities in your text, with an example of a text source you can load in: https://prodi.gy/docs/named-entity-recognition#manual

You can also load in your text in other formats, including JSON, plain text or CSV: https://prodi.gy/docs/api-loaders#input
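
To make the "file from your own data" part concrete: the simplest input is a JSONL file with one JSON object per line, each holding a "text" key. Here's a rough sketch of producing one from a list of strings. The example texts, the function names and the output filename are all made up for illustration, and where your raw texts come from (a database export, a folder of documents and so on) is up to you.

```python
import json
from pathlib import Path


def texts_to_jsonl_lines(texts):
    """Convert raw strings into JSONL lines of the form {"text": ...}.

    Empty or whitespace-only strings are dropped, since there is
    nothing to annotate in them.
    """
    lines = []
    for text in texts:
        text = text.strip()
        if text:
            lines.append(json.dumps({"text": text}, ensure_ascii=False))
    return lines


def write_jsonl(lines, out_path):
    """Write the prepared lines to disk, one JSON object per line."""
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Hypothetical texts standing in for the data you want to annotate
my_texts = [
    "Add two cloves of garlic and a splash of olive oil.",
    "   ",
    "Top with grated Parmesan before serving.",
]
jsonl_lines = texts_to_jsonl_lines(my_texts)
print("\n".join(jsonl_lines))

# To save the file for annotation (the filename is an assumption):
# write_jsonl(jsonl_lines, "my_data.jsonl")
```

With the file saved, a command along the lines of `prodigy ner.manual my_dataset blank:en ./my_data.jsonl --label INGRED` would start annotation on it, where the dataset name and label are placeholders for your own.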
