How do I annotate a .jsonl-formatted file as either category a, b, or c (mutually exclusive) and save the results to the database?


Goal: I have a .jsonl file of tweets. I want to annotate each tweet as either category a, b, or c (mutually exclusive) and save the annotations to the database. No model in the loop.

Tried: I have played around with textcat.manual, mark and custom recipes for many hours now! Could you please tell me how to achieve this step by step? To get you started, here are some conceptual parts I don't understand from the docs:

1) Do you first read the .jsonl-formatted file into the database?
2) Where do you specify the view ID and the label(s)? In the .jsonl-formatted dataset, or somewhere else? Exactly how should I specify them?

Hi! From reading your description, it sounds like textcat.manual is exactly what you're looking for. If you set the --exclusive flag, you'll only be able to select one option per text.

prodigy textcat.manual your_dataset en_core_web_sm your_data.jsonl --label LABEL_A,LABEL_B,LABEL_C --exclusive
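The command reads your_data.jsonl, where each line is one JSON object with a "text" key (Prodigy's standard text input format). If your tweets aren't in that shape yet, here is a minimal sketch for producing the file, assuming you already have the tweets as a list of strings; the function names `to_jsonl` and `write_jsonl` are hypothetical helpers, not part of Prodigy.

```python
import json
from pathlib import Path
from typing import Iterable


def to_jsonl(tweets: Iterable[str]) -> str:
    """Serialize each tweet as one {"text": ...} JSON object per line."""
    # ensure_ascii=False keeps emoji and accented characters readable in the file
    return "".join(json.dumps({"text": t}, ensure_ascii=False) + "\n" for t in tweets)


def write_jsonl(tweets: Iterable[str], path: Path) -> None:
    """Write the tweets to `path` (e.g. your_data.jsonl) as UTF-8 JSONL."""
    path.write_text(to_jsonl(tweets), encoding="utf-8")
```

For example, `write_jsonl(["First tweet", "Second tweet"], Path("your_data.jsonl"))` produces a file you can pass straight to the command. Any extra keys you add (e.g. a tweet ID under "meta") should be passed through with the task, but only "text" is needed here.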

No, Prodigy will only save the collected annotations to the database, not the raw unannotated examples. Those will be streamed in directly from the input file.
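To illustrate what "streamed in directly from the input file" means: the file is read lazily, one line at a time, and nothing is written to the database until you annotate a task. A minimal sketch of such a generator follows; `stream_jsonl` is a hypothetical name, and in practice Prodigy uses its own JSONL loader, so this only shows the idea.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def stream_jsonl(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Lazily yield one task dict per non-empty line (works with an open file)."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at the end of the file
            yield json.loads(line)
```

Because it's a generator, `with open("your_data.jsonl", encoding="utf-8") as f: ...` only parses the next line when the next task is requested, so even very large files don't need to fit in memory.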

The view_id is part of the dictionary of components returned by the recipe. The top-level label, multiple-choice options, etc. can all be part of the incoming data. Streams in Prodigy are generators that yield dictionaries, e.g. {"text": "foo", "label": "LABEL"}. That's what your recipe returns as the "stream".
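To make the split between "stream" and "view_id" concrete, here is a hedged sketch of the dictionary a custom recipe might return for a single-choice A/B/C task. It's written without importing Prodigy so it runs standalone: in a real recipe you'd decorate the function with `@prodigy.recipe(...)`, and `choice_recipe` / `add_options` are hypothetical names. The "choice" interface, "options" with "id"/"text", and `"choice_style": "single"` follow Prodigy's choice task format as described in the "Annotation task formats" section; treat the exact details as something to check against your version's README.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def add_options(stream: Iterable[Dict[str, Any]], labels: List[str]) -> Iterator[Dict[str, Any]]:
    """Attach the same multiple-choice options to every incoming task."""
    for task in stream:
        # build a fresh options list per task and copy the task so inputs aren't mutated
        yield {**task, "options": [{"id": label, "text": label} for label in labels]}


def choice_recipe(dataset: str, lines: Iterable[str], labels: List[str]) -> Dict[str, Any]:
    """Hypothetical recipe body; in Prodigy this would be wrapped with @prodigy.recipe."""
    stream = (json.loads(line) for line in lines if line.strip())
    return {
        "dataset": dataset,                    # where the annotations are saved
        "stream": add_options(stream, labels),  # generator of task dicts
        "view_id": "choice",                   # which annotation interface to render
        "config": {"choice_style": "single"},  # only one option selectable per text
    }
```

The options could equally be written into the .jsonl file itself, since anything in the task dict is part of the incoming data. When you annotate, the selected option ID is stored in the task's "accept" list. For this use case, though, textcat.manual with --exclusive already does all of this for you.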

You can find more details and examples of recipes in the custom recipes docs (https://prodi.gy/docs/workflow-custom-recipes). Your PRODIGY_README.html also has the more detailed API documentation and an "Annotation task formats" section that shows what formats Prodigy expects for the different interfaces.

Thanks again Ines! :heart_eyes: It worked with textcat.manual. It turned out to be some stupid port problem: yesterday port 8080 didn't work and 9999 did (on Anaconda), but now that I've switched the port back to 8080, everything works as expected. I don't understand what's happening. If you've heard something similar from others, please give me a tip! But for now, just thanks!
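One possible explanation (an assumption, not confirmed in this thread) is that another process, such as an earlier Prodigy server that was still running, was holding port 8080 at the time. A quick way to check before starting the server is to see whether anything on localhost accepts connections on that port. The helper name `port_is_free` is hypothetical and uses only Python's standard `socket` module.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if no server on `host` accepts TCP connections on `port`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 when a server accepted the connection, i.e. the port is taken
        return s.connect_ex((host, port)) != 0
```

If `port_is_free(8080)` returns False, something else is already listening there; either stop that process or point Prodigy at a different port.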
