Text classification with 100 labels - Multi label

I am trying to build a multi-label, multi-class classifier (around 100 classes). I have read many of the answers and your documentation, and I am going with the approach of labelling one class at a time. I also want to add a feature that takes suggestions from the annotators for other classes the text might belong to.

For this, I wrote a custom recipe that also provides a free-text input. To make it easier to merge all this data later, I want to give the user a drop-down with my label list. Can this be done in the same text input block?

Any suggestions/help on this is highly appreciated.

Thanks,
Vaishnavi

Hi! Just to make sure I understand the question correctly: You want to have a free-form input that also has a dropdown/typeahead with options? For example, like this: W3Schools Tryit Editor

It's not currently built in – although it's a nice idea, especially adding it to the input as a datalist – I'll test this as a new feature :nerd_face: In the meantime, you could add this as a custom HTML block as an <input> with a <datalist> and add custom JavaScript that calls window.prodigy.update({ user_input: input.value }) to update the current task with whatever the user has typed in.
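A minimal sketch of that workaround, written as the config a custom recipe could return. It assumes the recipe uses the "blocks" interface with a classification block followed by an "html" block. The element ids, the label list and the helper names are hypothetical; only window.prodigy.update({ user_input: ... }) comes from the suggestion above.

```python
import html

# Hypothetical label list; replace with your own ~100 classes.
LABELS = ["class1", "class2", "class3"]

# Custom JavaScript: copy whatever the annotator types into the current task.
# Event delegation on `document` is used so the listener survives re-renders.
JAVASCRIPT = """
document.addEventListener('input', function (event) {
  if (event.target && event.target.id === 'label-suggestion') {
    window.prodigy.update({ user_input: event.target.value });
  }
});
"""


def make_datalist_block(labels):
    """Build an HTML block: a free-text <input> backed by a <datalist>."""
    options = "".join(
        f'<option value="{html.escape(label, quote=True)}">' for label in labels
    )
    markup = (
        '<input id="label-suggestion" list="label-options" '
        'placeholder="Other label (optional)">'
        f'<datalist id="label-options">{options}</datalist>'
    )
    return {"view_id": "html", "html_template": markup}


def build_config(labels):
    """Config dict for a recipe that returns "view_id": "blocks"."""
    return {
        "blocks": [{"view_id": "classification"}, make_datalist_block(labels)],
        "javascript": JAVASCRIPT,
    }
```

The recipe would then return this dict under its "config" key next to "view_id": "blocks". Depending on how the HTML block re-renders, the input may need to be cleared manually when a new task is shown.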

Thanks Ines. Yes, this is precisely what I wanted :slight_smile: I will write a custom HTML Block for now!

Okay, perfect :slightly_smiling_face:

Btw, I played around with the feature and it was very easy to implement, so Prodigy v1.10 will support a "field_suggestions" property that lets you provide a list of auto-suggestions that are shown when the user types or presses in the field.
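For reference, a block using the new property might look roughly like the sketch below. It assumes Prodigy v1.10 or later; the label list, the field label text and the helper name are hypothetical, and "field_id" is set to "user_input" only to match the key used in the HTML workaround above.

```python
# Hypothetical label list; replace with your own ~100 classes.
LABELS = ["class1", "class2", "class3"]


def text_input_block(labels):
    """A text_input block whose field offers the label list as suggestions."""
    return {
        "view_id": "text_input",
        "field_id": "user_input",  # key the typed value is stored under
        "field_label": "Other labels this text might belong to",
        "field_suggestions": list(labels),
    }


# Combined with a classification block in the "blocks" interface.
CONFIG = {"blocks": [{"view_id": "classification"}, text_input_block(LABELS)]}
```

Because annotators can still type anything, the saved values are free text and may need normalising before they are merged with the label list.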


Hi Ines,

I am now trying to run multiple annotation sessions on different ports. These are manual annotations for a gold set, with no training happening in the background. My commands look something like this:

PRODIGY_PORT=5678 prodigy classification mechanics file_name l -F /nlp/prodigy_pipeline/custom_recipe/interface_receipe.py

I am able to run it normally but when I try to use "nohup" mode it's not working. Why do you think so?

What exactly do you mean by "not working"? Is there an error?

Yes, it says the command cannot be run: file or directory missing.

Hmm, this sounds like it might not be a problem with Prodigy directly and more with the way nohup runs the command? It should definitely be no problem to run Prodigy this way and people have been doing this a lot. What command did you run? Just PRODIGY_PORT=5678 nohup prodigy [...]?

Thanks Ines.

I was able to get this to work. There was a problem with the EC2 instance, which I have now fixed.

However, I have one last pressing issue. As I mentioned earlier, I am annotating for a multi-label, multi-class classification task. I have around 100 classes, and I want to start three parallel annotation sessions:

PRODIGY_PORT=5678 nohup prodigy [....] --label class1
PRODIGY_PORT=1234 nohup prodigy [....] --label class2
and so on.

I am hosting the Prodigy application on an EC2 instance and am able to run multiple sessions successfully. But when I use proxy_pass, the external link says "Oops, something went wrong" - Project couldn't be fetched properly.

I tried proxy_pass with other generic websites and it works fine, so I am assuming it has something to do with my Prodigy project.

Hi @ines
It works fine for one proxy_pass, but from the next one onwards it gives that error. Also, I checked the browser console and it says "502 bad gateway". I am using Chrome.

Hey @VaishKandala, what are you using with proxy_pass? Maybe Nginx? Apache? Or something like Traefik?

You might want to check if there's any error in the console, in developer tools.

Then also check the logs for the system in front (Nginx, Apache, Traefik, etc.).

And then also check the logs for Prodigy.

If you are running with nohup, you can probably redirect the output to a file, so that you can see the logs later, for example, something like:

nohup prodigy ...some-params-here > logs-prodigy1.log 2>&1 

The > logs-prodigy1.log makes the logs go to a file logs-prodigy1.log. You would use a different file for each Prodigy instance you start. And then 2>&1 makes stderr (the "error" logs) go to the same file as the normal logs.
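If you end up starting many sessions, the same redirection can be scripted. Here is a minimal sketch in Python, assuming a POSIX system; the helper name launch_logged is made up for illustration, and start_new_session is only roughly comparable to nohup.

```python
import os
import subprocess


def launch_logged(command, log_path, extra_env=None):
    """Start `command` in the background with stdout and stderr in one log file."""
    env = dict(os.environ, **(extra_env or {}))
    with open(log_path, "ab") as log_file:
        process = subprocess.Popen(
            command,
            stdout=log_file,            # like "> logs-prodigy1.log" (but appending)
            stderr=subprocess.STDOUT,   # like "2>&1"
            stdin=subprocess.DEVNULL,
            env=env,
            start_new_session=True,     # detach from the terminal session (POSIX only)
        )
    # The child keeps its own copy of the log file descriptor after we close ours.
    return process


# Example call (not run here; fill in your own recipe arguments):
# launch_logged(["prodigy", *recipe_args], "logs-prodigy1.log", {"PRODIGY_PORT": "5678"})
```

Each session gets its own port through PRODIGY_PORT and its own log file, so a failing instance is easy to tell apart from the others.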

Then, when you get the issue again, you can check if there's any error in those files too.

But also make sure you check the proxy (Nginx, Apache, Traefik, etc.), as the error might be in its configuration.

Thanks @tiangolo.

I am using Nginx and finally figured out the issue. This is my first attempt with Nginx, and I wasn't passing the ports in "upstream", so it was picking the default port, or giving an error if nothing was running on the default. The problem is now solved on the Nginx side.
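For anyone hitting the same 502, here is a rough sketch of the idea: one "upstream" per Prodigy port and one server block pointing at it. The actual config was not posted in this thread, so the hostnames, upstream names and the hostname-based routing are all assumptions; the small Python helper just renders the text so it can be generated for many sessions.

```python
def render_nginx(sessions):
    """Render Nginx config text. `sessions` maps a hostname to a local Prodigy port."""
    parts = []
    for index, (host, port) in enumerate(sorted(sessions.items()), start=1):
        name = f"prodigy_{index}"  # hypothetical upstream name
        parts.append(
            f"upstream {name} {{\n"
            f"    server 127.0.0.1:{port};\n"  # the port must be given explicitly
            f"}}\n"
            f"server {{\n"
            f"    listen 80;\n"
            f"    server_name {host};\n"
            f"    location / {{\n"
            f"        proxy_pass http://{name};\n"
            f"    }}\n"
            f"}}\n"
        )
    return "\n".join(parts)


# Hypothetical hostnames; the ports match the PRODIGY_PORT values used above.
print(render_nginx({"class1.example.com": 5678, "class2.example.com": 1234}))
```

Each upstream names its port explicitly, which is the part that was missing. A 502 from Nginx usually means nothing answered on the address the upstream points to.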

Thanks,
Vaishnavi

Awesome @VaishKandala! :tada:

Thanks for reporting back.

Just released Prodigy v1.10, which adds the field_suggestions option to text_input blocks!

Hi @ines

I want to train a multi-label text classification model on the news_headlines.jsonl data provided on the Prodigy docs page. For this I need the annotated data set. Is it available? If it is, please share the download link.
Thanks

Yes, there's a "Download annotated data" link here, but that includes the data annotated with some named entities: Prodigy 101 – everything you need to know · Prodigy · An annotation tool for AI, Machine Learning & NLP. You can also download the raw data from there. It's just some news headlines, nothing special.

Also, note that this is just an example dataset, created for the tutorial to make it easier for people to try things out. It's not particularly useful outside of that, so if you actually want to train your own model, you should also create your own dataset.