I am building a multi-label, multi-class classifier (around 100 classes). I read through many of the answers here and your documentation, and I am going with the approach of labelling one class at a time, but I want to add a feature that can also take suggestions from the annotators for the other classes this text might belong to.
For this, I wrote a custom recipe that also shows a free-text input. To make it easier for me to merge all this data later, I want to provide the user with a drop-down of my label list. I wanted to know if we can do this in the same text-input block.
Any suggestions/help on this is highly appreciated.
Hi! Just to make sure I understand the question correctly: You want to have a free-form input that also has a dropdown/typeahead with options? For example, like this: W3Schools Tryit Editor
It's not currently built in – although it's a nice idea, especially adding it to the input as a datalist – and I'll test this as a new feature. In the meantime, you could add this as a custom HTML block with an <input> and a <datalist>, plus custom JavaScript that calls window.prodigy.update({ user_input: input.value }) to update the current task with whatever the user has typed in.
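A minimal sketch of that workaround as a custom HTML block (the element IDs, the label values, and the "user_input" key are placeholders I've made up for the example, not Prodigy conventions):

```html
<!-- Free-text input backed by a datalist of suggestions.
     Replace the option values with your own label list. -->
<input list="label-suggestions" id="label-input"
       placeholder="Suggest other labels..." />
<datalist id="label-suggestions">
  <option value="mechanics"></option>
  <option value="electronics"></option>
  <option value="optics"></option>
</datalist>

<script>
  // Copy whatever the user typed into the current task under a key
  // of our choosing ("user_input" here).
  const input = document.querySelector("#label-input");
  input.addEventListener("input", () => {
    window.prodigy.update({ user_input: input.value });
  });
</script>
```

The datalist gives you the dropdown/typeahead for free while still allowing arbitrary free text.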
Btw, I played around with the feature and it was very easy to implement, so Prodigy v1.10 will support a "field_suggestions" property that lets you provide a list of auto-suggestions that are shown when the user types or presses ↓ in the field.
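With that in place, a custom recipe could wire the new property into its text_input block. Here's a sketch of just the interface config, assuming a "blocks" UI with a choice block plus the text input; the label list and "user_input" field ID stand in for your own values:

```python
# Sketch of the "blocks" interface config for a custom recipe.
# LABELS stands in for the ~100-class label list you already have.
LABELS = ["mechanics", "electronics", "optics"]

blocks_config = {
    "blocks": [
        # The regular classification UI goes first, e.g. a choice block.
        {"view_id": "choice"},
        # Free-text input whose typeahead suggestions come from the label list.
        {
            "view_id": "text_input",
            "field_id": "user_input",      # key the typed text is saved under
            "field_label": "Other labels this text might belong to",
            "field_suggestions": LABELS,   # auto-suggestions, new in v1.10
        },
    ]
}
```

Because the suggestions come straight from your label list, the free-text answers should line up with your class names and be easy to merge later.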
I am now trying to run multiple annotations on different ports. These are manual annotations for a gold set, with no training happening in the background. My commands look something like this:
```shell
PRODIGY_PORT=5678 prodigy classification mechanics file_name l -F /nlp/prodigy_pipeline/custom_recipe/interface_receipe.py
```
I am able to run it normally, but when I try to use "nohup" it doesn't work. Why do you think that is?
Hmm, this sounds like it might not be a problem with Prodigy directly and more with the way nohup runs the command? It should definitely be no problem to run Prodigy this way and people have been doing this a lot. What command did you run? Just PRODIGY_PORT=5678 nohup prodigy [...]?
I was able to get this to work. There was a problem with the EC2 instance, and I have fixed it now.
However, I have one last pressing issue. As I mentioned earlier, I am annotating for a multi-label, multi-class classification task. I have around 100 classes and want to start three parallel annotation sessions.
I am hosting the Prodigy application on an EC2 instance and am able to run multiple sessions successfully. But when I use proxy_pass, the external link says "Oops, something went wrong" – the project couldn't be fetched properly.
I tried proxy_pass with other generic websites and it works fine, so I am assuming it has something to do with my Prodigy project.
Hi @ines
It works fine for one proxy_pass, but from the next one onwards it gives that error. I also checked the browser console – it says "502 Bad Gateway". I am using Chrome.
The > logs-prodigy1.log part sends the logs to a file called logs-prodigy1.log; you would use a different file for each Prodigy instance you start. Then 2>&1 makes stderr (the "error" logs) go to the same file as the normal logs.
Then, when you get the issue again, you can check if there's any error in those files too.
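Putting that together, each instance would be started along these lines. To keep the sketch runnable anywhere, `sleep 1` stands in for the actual `PRODIGY_PORT=5678 prodigy ...` invocation:

```shell
# Start a long-running command detached from the terminal, sending both
# stdout and stderr to a per-instance log file. Swap `sleep 1` for your
# real prodigy command, and use a different log file per instance.
nohup sleep 1 > logs-prodigy1.log 2>&1 &
PID=$!

# Later, inspect the log file for errors:
wait "$PID"
tail logs-prodigy1.log
```

The trailing `&` backgrounds the process, and nohup keeps it alive after you log out of the SSH session.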
But also make sure you check the proxy (Nginx, Apache, Traefik, etc.), as the error might be in its configuration.
I am using Nginx and finally figured out the issue. This is my first attempt with Nginx, and I wasn't specifying the ports in the "upstream" blocks, so it was picking the default port – or, if nothing was running on the default, giving an error. The problem is now solved.
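For reference, the shape of Nginx config that avoids this problem looks roughly like the following; the ports, upstream names, and location paths are illustrative, not taken from the actual setup:

```nginx
# One upstream per Prodigy instance, each with its port spelled out
# explicitly so Nginx never falls back to a default.
upstream prodigy_mechanics {
    server 127.0.0.1:5678;
}
upstream prodigy_electronics {
    server 127.0.0.1:5679;
}

server {
    listen 80;

    location /mechanics/ {
        proxy_pass http://prodigy_mechanics/;
    }
    location /electronics/ {
        proxy_pass http://prodigy_electronics/;
    }
}
```

Each annotation session then gets its own path on the public host, proxied to its own port.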
I want to train a multi-label text classification model on the news_headlines.jsonl data provided on the Prodigy docs page. For this I need the annotated dataset. Is it available? If it is, please share the download link with me.
Thanks
Also, note that this is just an example dataset, created for the tutorial to make it easier for people to try things out. It's not particularly useful outside of that, so if you actually want to train your own model, you should create your own dataset.