Book

I have today to get ready for spending next week with Prodigy. Is there a good book on it?

I don't find the YouTube videos any help, and I think a book would be better.

I have a thousand text files. I want to train a model from scratch, output a model, then load that model into Python to test it, the same way you would load a pretrained one, e.g. nlp = spacy.load("en_core_web_sm")

Hi! There's no book on Prodigy specifically. The most comprehensive and up-to-date overview is the documentation, in particular the usage pages about the various annotation tasks. If what you're looking for is more along the lines of general NLP and data best practices, there are books about spaCy, e.g. this one, as well as extensive documentation and tutorials.

From your other posts, it sounds like what you're trying to do is named entity recognition, right? The process here should be relatively straightforward: you can load your examples into a workflow like ner.manual, annotate them with the given label scheme and then run prodigy train with the annotated dataset to output a spaCy model.
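As a rough sketch of the two ends of that process: Prodigy can read JSONL input with one {"text": ...} object per line, so a folder of text files first needs converting, and the trained pipeline can be loaded by passing its directory to spacy.load. The function names, the "meta" source field and the one-example-per-file choice below are assumptions for illustration, not something Prodigy requires.

```python
import json
from pathlib import Path


def texts_to_jsonl(input_dir, output_path):
    """Write one {"text": ...} JSON object per non-empty .txt file.

    Returns the number of records written. Treating each file as one
    example is an assumption; long files may be better split up first.
    """
    count = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for path in sorted(Path(input_dir).glob("*.txt")):
            text = path.read_text(encoding="utf-8").strip()
            if not text:
                continue  # skip empty files
            # "meta" is optional; here it records which file the text came from
            record = {"text": text, "meta": {"source": path.name}}
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def print_entities(model_dir, texts):
    """Load a trained pipeline from a directory and print its entities.

    model_dir is whatever directory training wrote the pipeline to
    (hypothetical here); spaCy is imported lazily so the conversion
    step above works without it installed.
    """
    import spacy

    nlp = spacy.load(model_dir)
    for doc in nlp.pipe(texts):
        print([(ent.text, ent.label_) for ent in doc.ents])
```

The resulting JSONL file is what you'd pass as the source to ner.manual, and the directory written by prodigy train is what you'd pass to print_entities. Exact command-line arguments and output directory names depend on your Prodigy version, so check the documentation for the one you have installed.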

What typically takes more time and experimentation is the general NLP development work around it, which isn't directly related to Prodigy: iterating on your data, trying out different label schemes to find one that works well in the context of machine learning, improving your data for the respective components, collecting more data if needed, and so on. This will be very specific to the data and use case you're working on and often just takes time, experience and getting a good feel for your data, which Prodigy should be able to help with.
