Multiple labels from a txt file for textcat.manual

Hello, there are many discussions on similar topics, but I did not find an answer for my situation.

I want to create an NER model to recognize the names of country leaders, countries, and cities. I really liked the approach used in the video, but it does not work correctly for several labels.

I have a txt file with many country leaders, countries, and cities, and I want to use the "textcat.teach" command to assign them to patterns that I can then use with ner.

But after I save them, they do not belong to separate patterns but to a single common one.

Later, when I use ner.manual with my pretrained patterns, the program treats them as one, although they are different.

So my question is: is it possible to create pretrained patterns with different labels using "textcat"? If not, what are the options for annotating a large number of patterns taken from a txt file?

The docs say that the --label parameter sets a single label to assign to the patterns. That means that when you run the command with --label "PRES,PLC", it's assumed that there's a single label called "PRES,PLC".

I think in your case it might just be easiest to manually grab the items from Prodigy and format them in the right way. You can fetch the data via:

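from prodigy.components.db import connect

# Connect to the database configured for Prodigy
db = connect()
# Load all annotated examples stored in the "title_patern" dataset
dataset = db.get_dataset("title_patern")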

In this example, dataset is a list of dictionaries that you can manually turn into patterns. Let me know if that works.
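
To illustrate the manual conversion, here is a minimal sketch of how such a list of dictionaries could be turned into patterns with one label each. It rests on several assumptions: that each example has a "text" and an "answer" key, that the label sits either under "label" (binary tasks) or in an "accept" list (multiple-choice tasks), and that splitting on whitespace is a good enough stand-in for real tokenization. The helper names `examples_to_patterns` and `write_jsonl` and the sample data are made up for this example and are not part of Prodigy.

```python
import json


def examples_to_patterns(examples, default_label=None):
    """Turn accepted annotation dicts into token-based match patterns.

    Hypothetical helper: the keys it reads ("text", "answer", "label",
    "accept") are assumptions about how the examples were saved.
    """
    patterns = []
    seen = set()
    for eg in examples:
        # Only keep examples that were accepted during annotation.
        if eg.get("answer") != "accept":
            continue
        # Multiple-choice tasks are assumed to list selected labels under
        # "accept"; binary tasks are assumed to carry a single "label".
        labels = eg.get("accept") or [eg.get("label", default_label)]
        for label in labels:
            if label is None:
                continue
            key = (label, eg["text"].lower())
            if key in seen:
                continue  # skip duplicate label/term pairs
            seen.add(key)
            # Whitespace splitting only approximates real tokenization.
            tokens = [{"lower": tok.lower()} for tok in eg["text"].split()]
            patterns.append({"label": label, "pattern": tokens})
    return patterns


def write_jsonl(path, rows):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


# Made-up stand-in for the list returned by the database call above.
examples = [
    {"text": "Angela Merkel", "label": "PRES", "answer": "accept"},
    {"text": "Berlin", "accept": ["PLC"], "answer": "accept"},
    {"text": "banana", "label": "PRES", "answer": "reject"},
]
patterns = examples_to_patterns(examples)
```

Calling `write_jsonl("patterns.jsonl", patterns)` would then produce a file with one pattern per line, each with its own label, which is the shape expected when passing patterns to ner.manual. If your saved examples look different, print one or two of them first and adjust the keys accordingly.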