How to duplicate lines in a jsonl file?

Tiziri · May 5, 2023, 9:05am

"Hello Prodigy, I would like to ask how I can duplicate the following recipes: [29, 36, 43, 44, 47, 50, 54, 58, 62, 65, 72, 79, 120, 133, 139, 141, 146, 149, 150, 151, 180, 219]. I want to make a copy of each in my JSONL file, which is already annotated with ner.manual. For example, for recipe 29, I would like a copy 29.1 so that I can annotate it differently from recipe 29 (it contains two versions of the recipe). Some information about my text: it is an Arabic text containing 292 recipes, numbered from 1 to 292."

ryanwesslen · May 5, 2023, 11:51am

hi @Tiziri,

I'm sorry, but I don't understand your question.

Are you using "recipes" in a different way than Prodigy does?

In Prodigy, a recipe is defined as "A Python function that can be executed from the command line and starts the Prodigy server for a specific task". So I'm a bit confused when you refer to a list of integers as a "list of recipes". Can you provide some clarification?

Tiziri · May 5, 2023, 1:04pm

Hello, thank you for your response, and I apologize for not explaining my question clearly. I am working with a text of Arabic pharmacopeias that contains 292 treatment preparation recipes. I have already annotated them in Prodigy with ner.manual to extract ingredients, symptoms, etc. However, some of these recipes describe multiple ways of preparation. I would therefore like to duplicate those recipes so that I can annotate each copy differently later with rel.manual, in order to establish relationships between entities. My question was: how can I duplicate these recipes (not Prodigy recipes, but actual medication preparation recipes) in the JSONL file?

ryanwesslen · May 5, 2023, 2:50pm

Ah -- when you said "recipes", you meant your records, which just so happen to be treatment recipes.

When you say you would "like to duplicate the recipes" -- are you saying you would like to re-annotate the same records?

You can do that fairly easily w/o needing to export to .jsonl by loading from existing datasets using dataset:[name of dataset]. For example, let's say your entities are saved into a Prodigy dataset called ner_recipes. Now you want to reannotate those records a 2nd time, saving these new annotations into a 2nd Prodigy dataset called ner_recipes2.

python -m prodigy ner.manual ner_recipes2 blank:ar dataset:ner_recipes --label ...

The one problem is I don't understand what "multiple ways of preparation" means -- does that mean annotating new entities using the same entity types? different entity types?

What I recommend above will show whatever entities you annotated (assuming you put those entity types in --label my_label,my_label2,...).

One thing to be aware of: ner does not allow overlapping entities. This is different from spancat, which does permit them. I suspect it may be out of scope for you, but since it sounds like you have "multiple ways of preparation", spancat may be a possible alternative, if I understand what you mean. There's even a spancat tutorial video, blog, and template project that uses a recipe/ingredient example.

Tiziri · May 5, 2023, 6:48pm

When I talk about "recipe", I'm referring to this:
[7]
prescription of the caper pastille
which is useful against sclerosis and enlargement
of the spleen
Caper barks four parts; the seeds of agnus castus, black pepper, asarabacca,
‘long’ birthwort, irisa which is the root of the sky-coloured iris,
and Indian spikenard two parts of each; saron half a part. (is) is
brought together, pounded, kneaded with wine boiled down to one
quarter, and formed into pastilles of one dirham.
[8]
poppy pastille
for (the treatment of) hepatic fever
Dark-coloured and light-coloured poppy four dirham of each; the seeds
of serpent melon, cucumber, gourd, and purslane, and starch and gumarabic
one dirham of each. All (this) is pounded, strained, kneaded with
water, formed into pastilles [of] one mitq̠al, dried, and drunk with the
water of purslane seeds and pomegranate oxymel.

What I want to do is copy and paste a recipe into my jsonl file. For example, for recipe 7, I want a copy 7.1.

I have a set of recipes in a dataset that I want to annotate differently. I'm not looking to access the annotations I've already made, but rather to learn how to copy a line of JSONL data automatically. For example, given this line:
{"text":"[7]","_input_hash":-98412754,"_task_hash":506670144....
I want to have a copy:
{"text":"[7.1]","_input_hash":-98412754,"_task_hash":506670144

ryanwesslen · May 5, 2023, 6:54pm

I don't understand what 7.1 is. Can you explain?

Is "recipe 7" the full text?

prescription of the caper pastille
which is useful against sclerosis and enlargement
of the spleen
Caper barks four parts; the seeds of agnus castus, black pepper, asarabacca,
‘long’ birthwort, irisa which is the root of the sky-coloured iris,
and Indian spikenard two parts of each; saron half a part. (is) is
brought together, pounded, kneaded with wine boiled down to one
quarter, and formed into pastilles of one dirham.

Is 7.1 the first line?
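
A minimal sketch of the duplication asked about above, using only the Python standard library. It rests on assumptions that are not confirmed in the thread: that each recipe starts with a header record whose text is exactly a bracketed number such as "[7]" (as in the example JSONL line), and that the recipe runs until the next header record. The function names, the ".1" suffix, and the file names are hypothetical. The copies drop `_input_hash` and `_task_hash` on the assumption that Prodigy assigns fresh hashes to records that arrive without them.

```python
import copy
import json
import re

# Assumed header format: a record whose whole text is a bracketed number, e.g. "[7]".
HEADER = re.compile(r"^\[(\d+)\]$")


def read_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def write_jsonl(path, records):
    """Write one JSON object per line; ensure_ascii=False keeps Arabic text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def duplicate_recipes(records, numbers, suffix=".1"):
    """Insert a copy of each selected recipe block right after the original block."""
    wanted = set(numbers)

    # Group records into blocks; a new block starts at every header record.
    blocks = []
    for record in records:
        match = HEADER.match(record.get("text", "").strip())
        if match or not blocks:
            blocks.append((match.group(1) if match else None, []))
        blocks[-1][1].append(record)

    output = []
    for raw_number, block in blocks:
        output.extend(block)
        if raw_number is None or int(raw_number) not in wanted:
            continue
        for record in block:
            clone = copy.deepcopy(record)
            # Drop the old hashes so the copy is not treated as the same task.
            clone.pop("_input_hash", None)
            clone.pop("_task_hash", None)
            if HEADER.match(clone.get("text", "").strip()):
                # Rename the header, e.g. "[7]" becomes "[7.1]"; its old
                # spans/tokens would no longer match the new text.
                clone["text"] = f"[{raw_number}{suffix}]"
                clone.pop("spans", None)
                clone.pop("tokens", None)
            output.append(clone)
    return output


# Hypothetical usage (file names are placeholders):
# records = read_jsonl("annotations.jsonl")
# write_jsonl("annotations_with_copies.jsonl", duplicate_recipes(records, [29, 36, 43]))
```

The non-header records in each copy keep their existing `spans`, so the earlier entity annotations would still be present when the copies are loaded for relation annotation. If the copies should start unannotated, those keys could be removed from every clone instead.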
