Hello spaCy and Prodigy team,
At the moment I am working on a tool for analysing information in the context of disinformation and the recontextualisation of information.
For that I am using spaCy in the backend and want to train the knowledge base on my own. As a developer, is it possible to integrate Prodigy? Or is it better to use Label Studio?
Thanks in advance,
Maxim R. Garrtner
Hi @hdaipteam, and welcome to the forum!
If you're already using spaCy for your NLP pipeline, there is a clear advantage to integrating with Prodigy: there are a number of utilities in place that ensure smooth interchange of data and configuration for training and data pre-annotation.
If, for example, you are using spaCy's KnowledgeBase class to implement your knowledge base, you can easily integrate the spaCy pipeline that contains the KB component into a custom Prodigy recipe, e.g. to find the closest existing nodes for the candidate entities and curate their classification manually via the UI.
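To make that a bit more concrete, here is a minimal sketch of the data-preparation half of such a recipe: it turns entity mentions and their KB candidates into multiple-choice annotation tasks. The `text`, `spans` and `options` keys follow the task format used by Prodigy's `choice` interface. Everything else is an assumption for illustration: the function name `make_choice_tasks`, the toy KB, the entity IDs and the `MENTION` label are hypothetical, and the candidate lookup is a plain callable so the sketch runs without spaCy installed. In a real recipe you would back it with your own KB's candidate lookup.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical candidate shape for this sketch: (entity_id, prior_probability)
Candidate = Tuple[str, float]
# A mention is (full_text, start_char, end_char)
Mention = Tuple[str, int, int]


def make_choice_tasks(
    mentions: Iterable[Mention],
    get_candidates: Callable[[str], List[Candidate]],
    top_n: int = 3,
) -> Iterator[Dict]:
    """Yield one choice-style task per mention, listing the top KB candidates."""
    for text, start, end in mentions:
        alias = text[start:end]
        # Rank candidates by prior probability, highest first
        ranked = sorted(get_candidates(alias), key=lambda c: c[1], reverse=True)
        options = [
            {"id": ent_id, "text": f"{ent_id} (prior {prob:.2f})"}
            for ent_id, prob in ranked[:top_n]
        ]
        # Always let the annotator reject every candidate
        options.append({"id": "NIL", "text": "None of the above"})
        yield {
            "text": text,
            "spans": [{"start": start, "end": end, "label": "MENTION"}],
            "options": options,
        }


# Toy stand-in for a knowledge base lookup (hypothetical IDs and priors)
TOY_KB = {"Paris": [("ENT_PARIS_PERSON", 0.1), ("ENT_PARIS_CITY", 0.9)]}

tasks = list(
    make_choice_tasks(
        [("Paris is lovely in spring.", 0, 5)],
        lambda alias: TOY_KB.get(alias, []),
    )
)
```

Each yielded dict can be streamed to the annotator, who picks the correct node or "None of the above"; the accepted option IDs can then be fed back into KB construction or entity linker training.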
To give you an idea of what such an integration could look like, you might want to look at this demo, where the KB (in this case DBpedia Spotlight) is used to curate LLM annotations. This is not KB creation, of course, but it shows how different spaCy components can be easily integrated to create a custom annotation flow.
Prodigy is also highly scriptable on the front end, so you should be able to adjust the UI (using custom CSS & JS code) to display the annotation tasks as required.
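As a rough sketch of where that customization lives: a recipe returns a dictionary of components, and the `config` entry can carry custom CSS and JavaScript via the `global_css` and `javascript` settings. The helper name `build_recipe_components`, the dataset name and the specific CSS rule are made up for illustration, and the function is kept free of the `prodigy` import so it can be run on its own. In an actual recipe this dictionary would be returned from a function decorated with `@prodigy.recipe`.

```python
from typing import Dict, Iterable


def build_recipe_components(dataset: str, stream: Iterable[Dict]) -> Dict:
    """Assemble recipe components with front-end overrides (illustrative only)."""
    # Custom CSS applied to the whole annotation app; the rule itself is an example
    css = ".prodigy-content { font-size: 18px; text-align: left; }"
    # Custom JavaScript; here it only logs each submitted answer to the console
    js = (
        "document.addEventListener('prodigyanswer', (event) => {"
        " console.log('answered', event.detail); });"
    )
    return {
        "dataset": dataset,   # where annotations are saved
        "stream": stream,     # iterable of task dicts
        "view_id": "choice",  # multiple-choice interface
        "config": {
            "choice_style": "single",  # one option per task
            "global_css": css,
            "javascript": js,
        },
    }


components = build_recipe_components("kb_curation", [])
```

Keeping the CSS and JS as strings in the recipe is fine for small tweaks; for anything larger, reading them from separate files before passing them into `config` tends to be easier to maintain.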
Finally, you can count on dev support via this forum in case you want to brainstorm ideas or run into technical issues while developing your solution.
I realize I have not made an explicit comparison with Label Studio, but I have not used it extensively, and even less so for highly customized solutions like the one you likely need. If you share some details on your workflow or what your desired interface should look like, it would be easier to give a more specific answer on whether Prodigy is a good fit or not.