Dependency parsing with multiple keywords

I want to find the possible relationships between three groups of keywords in a text. These groups are ingredient _keywords, food_keywords, and adverse_event_keywords. Let's say the text is "Too much salt in cheese may cause hypertension .". I want to train a model that predicts possible relationships between ingredient _keywords and/or food_keywords on one side and adverse_event_keywords on the other. I have prepared a jsonl file to import into prodigy (in the following format), but it does not seem to work.

{"text": "Too much salt in cheese may cause hypertension .", "spans": [{"start": 0, "end": 1, "label": " ingredient _keywords", "token": "salt"}, {"start": 9, "end": 10, "label": "food_keywords", "token": "cheese"}, {"start": 11, "end": 12, "label": "adverse_event_keywords", "token": " hypertension "}, , "relations": [], "_input_hash": 242237, "_task_hash": 242237}

This is the format of a line in my jsonl file, and I have already imported it into the prodigy db as "dependency_parsing_abstracts". I am using the following command and am not sure exactly where I went wrong:

python -m prodigy rel.manual dependency_parsing_abstracts en_core_web_lg --labels " ingredient _keywords, food_keywords,adverse_event_keywords"

Thanks!

Any help or advice is appreciated!

hi @AmirNickkar,

Thanks for your question.

So it looks like there are a few minor issues.

First, your .jsonl file isn't formatted correctly.

For example, notice that the "spans" list is never closed with a "]" and is followed by a stray ", ," (the "token": " hypertension " value also carries extra white space):

{"start": 11, "end": 12, "label": "adverse_event_keywords", "token": " hypertension "}, , "relations": [], "_input_hash": 242237, "_task_hash": 242237}
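A quick way to catch this kind of problem before importing is to try parsing every line with Python's standard json module. This is only a minimal sketch: the helper name find_invalid_jsonl_lines is made up for illustration, and the two sample strings are shortened stand-ins for the record above, not your real data.

```python
import json


def find_invalid_jsonl_lines(lines):
    """Return (line_number, error_message) for every line that is not valid JSON."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are skipped, not reported
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, f"{err.msg} at column {err.colno}"))
    return problems


# Shortened stand-ins: the first keeps the stray ", ," and the unclosed "spans" list.
broken = '{"text": "x", "spans": [{"start": 11, "end": 12}, , "relations": []}'
fixed = '{"text": "x", "spans": [{"start": 11, "end": 12}], "relations": []}'

problems = find_invalid_jsonl_lines([broken, fixed])
print(problems)
```

To check a real file, pass the open file object instead of the list, e.g. find_invalid_jsonl_lines(open("issue-6448.jsonl", encoding="utf8")).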

Second, is there a reason there's white space in the label " ingredient _keywords"? That makes CLI commands a lot harder, so I went ahead and changed it to "ingredient_keywords".

With these changes, I now have this .jsonl file:

# issue-6448.jsonl
{"text":"Too much salt in cheese may cause hypertension .","spans":[{"start":0,"end":1,"label":"ingredient_keywords","token":"salt"},{"start":9,"end":10,"label":"food_keywords","token":"cheese"},{"start":11,"end":12,"label":"adverse_event_keywords"}],"relations":[],"_input_hash":242237,"_task_hash":242237}

I also noticed that you didn't have the "tokens" included. Did you manually remove these? Usually, these would automatically be included. See the format example for rel.manual.
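If you are building the records yourself, here is a rough sketch of how "tokens" and character-offset spans could be generated. It rests on several assumptions: the field names ("start"/"end" as character offsets, "token_start"/"token_end" as inclusive token indices, "ws" for trailing white space) follow the documented task format as I understand it, the make_task helper is hypothetical, every keyword is a single token, and tokens are split on white space only. That happens to work for this pre-tokenized sentence but will not generally match the tokenization of en_core_web_lg.

```python
import json
import re


def make_task(text, keyword_labels):
    """Build a task dict with whitespace tokens and one span per known keyword."""
    tokens = []
    for i, match in enumerate(re.finditer(r"\S+", text)):
        end = match.end()
        tokens.append({
            "text": match.group(),
            "start": match.start(),  # character offset where the token begins
            "end": end,              # character offset just past the token
            "id": i,
            # True when the token is followed by white space in the text
            "ws": end < len(text) and text[end].isspace(),
        })
    spans = []
    for token in tokens:
        label = keyword_labels.get(token["text"].lower())
        if label is not None:
            spans.append({
                "start": token["start"],
                "end": token["end"],
                "token_start": token["id"],  # single-token keywords only
                "token_end": token["id"],
                "label": label,
            })
    return {"text": text, "tokens": tokens, "spans": spans, "relations": []}


text = "Too much salt in cheese may cause hypertension ."
labels = {
    "salt": "ingredient_keywords",
    "cheese": "food_keywords",
    "hypertension": "adverse_event_keywords",
}
task = make_task(text, labels)
print(json.dumps(task))  # one JSONL line
```

Slicing the text with each span's "start" and "end" should give back the keyword, which is an easy sanity check on the offsets.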

So are you saying that you got that .jsonl from running this recipe? Or did you have these annotations first and then want to pass the .jsonl file to:

python -m prodigy rel.manual dependency_parsing_abstracts en_core_web_lg --labels " ingredient _keywords, food_keywords,adverse_event_keywords"

I noticed this command is missing the source argument, i.e., the input for the recipe, so it wouldn't even run. The dependency_parsing_abstracts argument is the dataset where the annotations will be saved, not the input.
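As a sketch of what a complete call might look like, the snippet below assembles the command with the source file in third position. Treat the details as assumptions to verify with prodigy rel.manual --help: I'm using --label for relation labels and --span-label for the keyword groups, the file path is the example file from above, and CAUSES is a made-up relation label.

```python
import shlex

dataset = "dependency_parsing_abstracts"  # where the annotations are saved
model = "en_core_web_lg"                  # spaCy pipeline used for tokenization
source = "./issue-6448.jsonl"             # the input that was missing
relation_labels = ["CAUSES"]              # hypothetical relation label
span_labels = [
    "ingredient_keywords",
    "food_keywords",
    "adverse_event_keywords",
]

command = [
    "python", "-m", "prodigy", "rel.manual", dataset, model, source,
    "--label", ",".join(relation_labels),
    "--span-label", ",".join(span_labels),
]

# Print a copy-pasteable shell command (shlex.join needs Python 3.8+).
print(shlex.join(command))
```

Keeping the labels free of white space means the comma-separated values need no quoting on the command line.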