Classification of text into several topics

Hi
We have a database of several hundred thousand learning videos, tutorials, etc. Each item in this database has a brief title and description, which says what topic the learning video/course/tutorial is about.
We are trying to build an algorithm that uses NLP to interpret the title and description text and classify it into one or more of several topics.
For example, a learning item description could be something like “3 day course provides basic understanding of agricultural loans and farm credits”. This should be classified under a topic such as “Rural Banking”.

There is no predetermined set of topics. We built a set of around 500 topics manually.
These 500 topics are not final. We keep modifying and improving them so that they are comprehensive and neither too coarse nor too granular.

The task is to classify hundreds of thousands of such learning items into one of those 500 topics.

Do you think the Prodigy annotation tool can help here? Is there a pipeline you can think of that we could build by leveraging Prodigy?

I do think Prodigy will be able to help you. I think you should focus on running experiments to settle on your topic list, because once you change the topics, you’ll have to perform at least some reannotation. Prodigy can help with this if you’re able to reason about the schema change. For instance, if you know that you’re splitting one topic into two, you can queue up a new annotation task with just the instances you previously assigned the old label to, and give it the two new labels as the possible options. This will make the reannotation work fairly fast.
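
To make the topic-split case concrete, here is a minimal sketch in plain Python. It assumes the earlier annotations are available as dicts with a "text" field, an "accept" list of chosen labels, and an "answer" field, which is how I'd expect exported multiple-choice annotations to look. The label names and the helper `make_reannotation_tasks` are hypothetical, so adapt them to your own data.

```python
import json

# Hypothetical labels: one old topic that is being split into two new ones.
OLD_LABEL = "Banking"
NEW_LABELS = ["Rural Banking", "Retail Banking"]


def make_reannotation_tasks(annotations, old_label, new_labels):
    """Yield one multiple-choice task per example that was previously
    accepted with `old_label`, offering only `new_labels` as options."""
    seen = set()
    for eg in annotations:
        # Skip examples the annotator rejected or ignored.
        if eg.get("answer") != "accept":
            continue
        # Skip examples that never carried the old label.
        if old_label not in eg.get("accept", []):
            continue
        # Avoid queueing the same text twice.
        if eg["text"] in seen:
            continue
        seen.add(eg["text"])
        yield {
            "text": eg["text"],
            "options": [{"id": label, "text": label} for label in new_labels],
            "meta": {"old_label": old_label},
        }


# Made-up sample of earlier annotations (assumed format, see above).
annotations = [
    {"text": "3 day course provides basic understanding of agricultural loans and farm credits",
     "accept": ["Banking"], "answer": "accept"},
    {"text": "Introduction to opening and managing savings accounts",
     "accept": ["Banking"], "answer": "accept"},
    {"text": "Beginner tutorial on spreadsheet formulas",
     "accept": ["Office Software"], "answer": "accept"},
    {"text": "Overview of credit card fraud",
     "accept": ["Banking"], "answer": "reject"},
]

tasks = list(make_reannotation_tasks(annotations, OLD_LABEL, NEW_LABELS))

# One JSON object per line, ready to be saved as a .jsonl source file.
jsonl = "\n".join(json.dumps(task) for task in tasks)
```

In practice the earlier annotations would come from your exported dataset (for example via `prodigy db-out`) rather than an inline list, and the resulting JSONL could be used as the source for a recipe that shows a multiple-choice interface. Only the examples that carried the old label get queued, which is what keeps the reannotation pass small.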

In general you’ll want to take advantage of the fact that Prodigy is scriptable, and write custom recipes that build in what you know about your task structure: https://github.com/explosion/prodigy-recipes