Goal: I have a file of tweets in JSONL format. I want to annotate each tweet as exactly one of the categories a, b, or c (mutually exclusive) and save the annotations to the database. No model in the loop.
Tried: I have played around with textcat.manual, mark and custom recipes for many hours now! Could you please tell me how to achieve this step by step? To get you started, there are some conceptual parts I don't get from the docs:
1). You read from the JSONL-formatted file into the database, right?
2). Where do you specify the view ID and the label(s)? In the JSONL-formatted dataset? Or somewhere else? Exactly how should I specify this?
Hi! From reading your description, it sounds like textcat.manual should be exactly what you're looking for? If you set the --exclusive flag, you'll only be able to select one option per text.
No, Prodigy will only save the collected annotations to the database, not the raw unannotated examples. Those will be streamed in directly from the input file.
The view_id is part of the dictionary of components returned by the recipe. The top-level label, multiple choice options etc. can all be part of the incoming data. Streams in Prodigy are generators that yield dictionaries – e.g. {"text": "foo", "label": "LABEL"}. That's what your recipe returns as the "stream".
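To make the stream and components ideas concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the built-in recipe: the function names tweet_stream and build_components are hypothetical, the labels a, b and c are the placeholders from the question, and each line of the input file is assumed to be a JSON object with a "text" key. The "choice" interface with "choice_style" set to "single" is used here as one way to get mutually exclusive options.

```python
import json

# Placeholder category names taken from the question.
LABELS = ["a", "b", "c"]


def tweet_stream(path, labels=LABELS):
    """Yield one task dictionary per non-empty line of a JSONL file.

    Assumes every line is a JSON object such as {"text": "some tweet"}.
    """
    with open(path, encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            task = json.loads(line)
            # Attach the multiple-choice options to the incoming example.
            task["options"] = [{"id": label, "text": label} for label in labels]
            yield task


def build_components(dataset, path):
    """Return the kind of dictionary a custom recipe function returns.

    In a real recipe this function would be registered with the
    @prodigy.recipe decorator; that is left out so the sketch runs
    without Prodigy installed.
    """
    return {
        "dataset": dataset,                      # where annotations are saved
        "stream": tweet_stream(path),            # generator of task dicts
        "view_id": "choice",                     # annotation interface to use
        "config": {"choice_style": "single"},    # only one option selectable
    }
```

Note that only the stream is read from the file; nothing is written to the dataset until annotations are submitted.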
You can find more details and examples of recipes on the "Custom Recipes" page of the Prodigy documentation. Your PRODIGY_README.html also has the more detailed API documentation and an "Annotation task formats" section that shows what formats Prodigy expects for the different interfaces.
Thanks again Ines! It worked with textcat.manual. It turned out that it was some stupid port problem. When I switched the port to 8080 (which did not work yesterday; back then 9999 worked on Anaconda), everything worked as expected. I don't get what's happening. If you have heard something similar from others, please give me a tip! But for now, just thanks!