Hi Ines,
Thank you for the advice! ner.batch-train works now! You are correct, I used a blank model. I didn't understand that from the user manual. My suggestion is to highlight that on the first run you must use the --output setting with a model path.
Regarding the JSONL file:
text: is a string containing a product description from a drug store. It includes the DRUG name, volume, percentage of active component, product form, manufacturer, etc. It looks like:
{"text":"эднит 20мг №28 таблетки гедеон рихтер"}
{"text":"зитазониум 20мг №30 таблетки"}
{"text":"aspirin №30 таблетки"}
{"text":"aspirin c 20мг №10 таблетки"}
{"text":"aspirin c forte 500мг №1 таблетки"}
{"text":"aspirin c double effect 2mg №15 таблетки"}
I also have a list of DRUG names: ['эднит', 'зитазониум', 'aspirin', 'aspirin c forte', 'aspirin c forte', etc.]. I can put them into the JSONL as labels, according to the link.
But what is the correct next step? Something like this?
{"text":"эднит 20мг №28 таблетки гедеон рихтер","spans":[{"start":0,"end":1,"label":"DRUG"}]}
{"text":"зитазониум 20мг №30 таблетки","spans":[{"start":0,"end":1,"label":"DRUG"}]}
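To make the question concrete, here is a minimal sketch of how I could generate the spans from my list. It assumes that "start" and "end" are character offsets into "text" (not token indices), which is exactly the part I am unsure about. DRUG_NAMES, find_drug_span and annotate are hypothetical names for illustration only, not part of Prodigy.

```python
import json
import re

# Hypothetical sample of my DRUG list (illustration only).
DRUG_NAMES = ["эднит", "зитазониум", "aspirin", "aspirin c", "aspirin c forte"]


def find_drug_span(text, names, label="DRUG"):
    """Return a span dict for the longest known name found in text, or None.

    ASSUMPTION: "start"/"end" are character offsets, with "end" exclusive.
    """
    # Try longer names first so "aspirin c forte" wins over "aspirin".
    for name in sorted(names, key=len, reverse=True):
        # Match only whole words: no word character directly before or after.
        match = re.search(r"(?<!\w)" + re.escape(name) + r"(?!\w)", text)
        if match:
            return {"start": match.start(), "end": match.end(), "label": label}
    return None


def annotate(text, names):
    """Build one JSONL record with zero or one DRUG span."""
    span = find_drug_span(text, names)
    return {"text": text, "spans": [span] if span else []}


if __name__ == "__main__":
    texts = [
        "эднит 20мг №28 таблетки гедеон рихтер",
        "aspirin c forte 500мг №1 таблетки",
    ]
    for text in texts:
        # ensure_ascii=False keeps the Cyrillic text readable in the file.
        print(json.dumps(annotate(text, DRUG_NAMES), ensure_ascii=False))
```

With this approach the first record would get "end": 5 rather than "end": 1, so please correct me if the offsets are meant to be token-based instead.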
Also, during ner.batch-train after manual annotation, I am confused about how to proceed. Some DRUG names are a single token and some are multi-token:
aspirin c 20мг №10 таблетки - predicted [aspirin c] - my answer [YES]
aspirin c forte 500мг №1 таблетки - predicted [aspirin c] - my answer [which is better to use, NO or SKIP?]