.jsonl-formatted file: mark as either category a, b, or c (mutually exclusive) and save to database. How?

samoz · August 27, 2019, 12:56pm

Hi!

Goal: Take a .jsonl file of tweets, annotate each tweet as category a, b, or c (mutually exclusive), and save the annotations to the database. No model in the loop.

Tried: I have played around with textcat.manual, mark and custom recipes for many hours now! Could you please tell me how to achieve this step by step? To get you started, here are some conceptual parts I don't get from the docs:

1) You read from the .jsonl-formatted file into the database, right?
2) Where do you specify the view ID and label(s)? In the .jsonl-formatted dataset? Or somewhere else? Exactly how should I specify this?

ines · August 27, 2019, 2:41pm

Hi! From reading your description, it sounds like textcat.manual should be exactly what you're looking for? If you set the --exclusive flag, you'll only be able to select one option per text.

prodigy textcat.manual your_dataset en_core_web_sm your_data.jsonl --label LABEL_A,LABEL_B,LABEL_C --exclusive

No, Prodigy will only save the collected annotations to the database, not the raw unannotated examples. Those will be streamed in directly from the input file.
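As a minimal sketch of preparing that input file, assuming your tweets are already available as Python strings (the example tweets below are made up, and your_data.jsonl is just the placeholder filename from the command above), each line only needs one JSON object with a "text" key:

```python
import json

# Hypothetical example tweets; in practice these come from your own data.
tweets = [
    "Just landed in Berlin!",
    "Can't believe the game last night...",
    "New blog post is up",
]

# Write one JSON object per line with a "text" key. This file is only
# read by Prodigy as the input stream; it is not stored in the database.
with open("your_data.jsonl", "w", encoding="utf-8") as f:
    for tweet in tweets:
        f.write(json.dumps({"text": tweet}, ensure_ascii=False) + "\n")
```

You can add extra keys (for example a tweet ID under "meta") if you want them carried along into the saved annotations.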

The view_id is part of the dictionary of components returned by the recipe. The top-level label, multiple choice options etc. can all be part of the incoming data. Streams in Prodigy are generators that yield dictionaries – e.g. {"text": "foo", "label": "LABEL"}. That's what your recipe returns as the "stream".
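To make the stream idea concrete, here is a minimal sketch in plain Python so it runs without Prodigy installed. The function names make_stream and classify_tweets, the label names and the dataset name are hypothetical. The task shape ("options" as a list of {"id", "text"} dicts for the choice interface) and the "choice_style" config setting are assumptions based on Prodigy's choice format; verify them against the "Annotation task formats" section of your PRODIGY_README.html.

```python
import json
from typing import Dict, Iterator, List

# Hypothetical label set matching the question's categories a, b and c.
LABELS = ["LABEL_A", "LABEL_B", "LABEL_C"]


def make_stream(lines: List[str], labels: List[str]) -> Iterator[Dict]:
    """Generator that turns raw JSONL lines into choice tasks, one per tweet."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the input file
        task = json.loads(line)
        # Attach a fresh list of multiple-choice options to every task.
        task["options"] = [{"id": label, "text": label} for label in labels]
        yield task


def classify_tweets(dataset: str, lines: List[str]) -> Dict:
    """Hypothetical recipe body: returns the dictionary of components."""
    return {
        "dataset": dataset,  # name of the dataset the annotations are saved to
        "stream": make_stream(lines, LABELS),  # generator of task dicts
        "view_id": "choice",  # interface that renders the options
        # Assumption: "single" restricts the choice UI to one selected option.
        "config": {"choice_style": "single"},
    }
```

In an actual custom recipe you would register the function with Prodigy's recipe decorator and load the file with its JSONL loader instead of passing in a list of lines. For this particular task, textcat.manual with --exclusive already sets up roughly this stream and interface for you.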

You can find more details and examples of recipes on the Custom Recipes page of the Prodigy docs. Your PRODIGY_README.html also has the more detailed API documentation and an "Annotation task formats" section that shows what formats Prodigy expects for the different interfaces.

samoz · August 27, 2019, 9:17pm

Thanks again Ines! It worked with textcat.manual. It turned out to be a silly port problem: when I switched the port to 8080 (which didn't work yesterday, when 9999 worked on Anaconda instead), everything worked as expected. I don't understand what's happening. If you've heard about this from others, please give me a tip! But for now, thanks!

Related topics:
- textcat.manual seems to be exclusive by default (usage, textcat, solved; 2 replies, 509 views; March 26, 2020)
- textcat-multilabel annotations format (textcat; 2 replies, 209 views; January 26, 2024)
- error while loading pre-annotated jsonl file (usage, textcat, solved; 9 replies, 540 views; March 29, 2023)
- Bulk import textcat examples (2 replies, 24 views; April 29, 2025)
- text classification (usage, textcat; 7 replies, 1126 views; October 7, 2019)