Topic modeling with Prodigy

Hi Support,
Sorry for the newbie question: I’m wondering whether Prodigy could also be used for topic modeling. I’ve used LDA and I get so many topics that I’m a bit confused! The interpretability is also sometimes very low. Maybe annotations could help, and if so, in which ways?
Any suggestions would be really appreciated.

My best


You can definitely use Prodigy to help you label your clusters, or to evaluate topic modelling systems. We don’t have a built-in workflow for this, but you could easily write a custom recipe.

To add some more specifics: If you really just want to label your clusters manually and you have absolutely no idea what they could be, it might actually be best to start off in a text editor and just write down labels. Prodigy is most useful if you already have a (rough) label set or a more explicit, quantitative goal.

For evaluating your clusters, intrusion detection is something you can do pretty easily using Prodigy: You take a few words from each cluster, mix in a random one and use the choice interface to display them as options. A single task could look like this:

    {
        "text": "Pick the odd one out",
        "options": [
            {"id": 0, "text": "Nirvana"},
            {"id": 1, "text": "Depeche Mode"},
            {"id": 2, "text": "bread"},
            {"id": 3, "text": "The Beatles"}
        ]
    }

The options can have arbitrary custom properties, so you could add something like "correct": true and later check which answer was the right one. When you export the data, the ID of the selected option will be added as the "accept" property, e.g. "accept": [2].
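
If it helps, here is a rough sketch of how tasks like this could be generated in plain Python. The function names, the `clusters` dict and the sample words are made up for illustration, and drawing the intruder from another cluster is just one way to "mix in a random one". It only uses the standard library and writes one JSON object per line:

```python
import json
import random


def make_intrusion_task(cluster_words, intruder, rng):
    """Build one choice task: the cluster words plus one intruder, shuffled."""
    words = [(w, False) for w in cluster_words] + [(intruder, True)]
    rng.shuffle(words)
    return {
        "text": "Pick the odd one out",
        "options": [
            # "correct" is a custom property marking the intruder
            {"id": i, "text": word, "correct": is_intruder}
            for i, (word, is_intruder) in enumerate(words)
        ],
    }


def make_tasks(clusters, n_words=3, seed=0):
    """Create one task per cluster (hypothetical helper, not part of Prodigy)."""
    rng = random.Random(seed)
    tasks = []
    for name, words in clusters.items():
        # Assumption: intruders are drawn from the other clusters
        others = [
            w
            for other, other_words in clusters.items()
            if other != name
            for w in other_words
            if w not in words
        ]
        sample = rng.sample(words, min(n_words, len(words)))
        tasks.append(make_intrusion_task(sample, rng.choice(others), rng))
    return tasks


# Made-up clusters, standing in for the top words of each topic
clusters = {
    "bands": ["Nirvana", "Depeche Mode", "The Beatles", "Radiohead"],
    "food": ["bread", "cheese", "butter", "jam"],
}
tasks = make_tasks(clusters)

# One JSON object per line (JSONL)
with open("intrusion_tasks.jsonl", "w", encoding="utf8") as f:
    for task in tasks:
        f.write(json.dumps(task) + "\n")
```

Fixing the seed keeps the tasks reproducible, which is handy if you want several people to annotate the same set.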

Ideally, you could even ask a friend or colleague who has no idea about your clusters or objective to do a few hundred examples for you (should be pretty quick!). If the task is easy and the answers come back mostly correct, it’s usually an indicator that your clusters are internally consistent.
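
To put a number on "mostly correct", something like the sketch below could work. It assumes each exported record still carries its "options" (with the custom "correct" flag), the "accept" list of selected IDs, and an "answer" field; `intrusion_accuracy` and the sample records are hypothetical, so adjust the field names to whatever your export actually contains:

```python
def intrusion_accuracy(annotations):
    """Share of answered tasks where the selected option was the intruder."""
    hits = 0
    total = 0
    for eg in annotations:
        # Skip rejected or ignored tasks and tasks without a selection
        if eg.get("answer", "accept") != "accept" or not eg.get("accept"):
            continue
        correct_ids = {opt["id"] for opt in eg["options"] if opt.get("correct")}
        total += 1
        if set(eg["accept"]) == correct_ids:
            hits += 1
    return hits / total if total else 0.0


# Two made-up records: the first picks the intruder, the second does not
annotations = [
    {
        "options": [
            {"id": 0, "text": "Nirvana", "correct": False},
            {"id": 1, "text": "bread", "correct": True},
        ],
        "accept": [1],
        "answer": "accept",
    },
    {
        "options": [
            {"id": 0, "text": "cheese", "correct": False},
            {"id": 1, "text": "Radiohead", "correct": True},
        ],
        "accept": [0],
        "answer": "accept",
    },
]
print(intrusion_accuracy(annotations))  # 0.5
```

You could also group the results by cluster to see which topics are the hard ones, since a low score on one cluster is more informative than a single overall number.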

Ultimately, it really all comes down to the type of data you’re working with, and you might need to experiment a little. But hopefully, Prodigy can make this easier.