Starting from scratch

I'm watching this: Training a NAMED ENTITY RECOGNITION MODEL with Prodigy and Transfer Learning - YouTube

It makes reference to the reddit file "s2v_reddit_2015_md". How do I create a file from the data I actually want to annotate? The video just says "use this file" and doesn't explain what it is, or how to use your own.

Hi @matt.whitby, you can download the pretrained model from the sense2vec repository. You can also check the tutorial's repo for more information :slight_smile:

I did. There are things that are not clear to me.
I fear this is going to be a company that just offers support in the form of links to things I've already read, rather than answers to questions.

Hi! If you look at the forum and our other responses, I hope you'll see that we spend a lot of time answering people's specific questions and providing guidance :slightly_smiling_face: But there are also a lot of resources already, in the form of documentation, tutorials, etc., and it's never clear what someone has seen or what they might have missed. So we always try to provide as many links and resources as possible.

Your question was about the s2v_reddit_2015_md vectors, which is a vectors package that you can download from sense2vec. In the first step of the tutorial, I used the vectors package to bootstrap a terminology list for quicker annotation, by finding more examples of the entities so we can pre-highlight them later and annotate faster. You'd typically want to use existing pretrained vectors here. It's not a required step, but it's a nice way to automate the annotation, which is why I showed it in the video. But you can also skip this and go straight to the annotation process.

Here's an example of using Prodigy to annotate entities in your text, with an example of a text source you can load in: https://prodi.gy/docs/named-entity-recognition#manual

You can also load in your text in other formats, including JSON, plain text or CSV: https://prodi.gy/docs/api-loaders#input
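To make the "use your own data" part concrete, here's a minimal sketch of turning a list of raw texts into a JSONL file, with one JSON object per line and the text under a "text" key. The helper name `write_jsonl`, the file name `my_data.jsonl` and the example texts are all placeholders I made up for illustration, so swap in however you actually read your data.

```python
import json
from pathlib import Path


def write_jsonl(texts, path):
    """Write one JSON object per line, each with a "text" key.

    Hypothetical helper for illustration; returns the number of lines written.
    """
    path = Path(path)
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for text in texts:
            text = text.strip()
            if not text:
                continue  # skip empty entries so every line has something to annotate
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
            count += 1
    return count


# Placeholder texts: replace these with the documents you want to annotate.
my_texts = [
    "First document I want to annotate.",
    "",
    "Second document, with some entities in it.",
]
n_written = write_jsonl(my_texts, "my_data.jsonl")
```

You should then be able to pass that file as the source argument of the manual NER recipe, roughly like `prodigy ner.manual my_dataset blank:en ./my_data.jsonl --label PERSON,ORG`, where the dataset name and labels are again just placeholders. Check the docs linked above for the exact arguments in your version.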
