Adding patterns removes "score" from the meta data when a pattern matches

I'm running into an issue where the SCORE field is left out of the meta data whenever a pattern is found. Instead of SCORE, PATTERN is shown. If no pattern is found, the SCORE field is shown.

I am running the following command:

PRODIGY_LOGGING=basic prodigy textcat.teach name-of-project en_core_web_lg data.jsonl --label RELEVANT --patterns patterns.jsonl

First line of data.jsonl:

{"text": "To perform the biopanning of a mimotope peptide with reduced affinity to anti-ochratoxin A (OTA) monoclonal antibodies (mAbs), we executed two improved biopanning approaches with a commercial 7-mer peptide library. In the first approach, anti-mouse IgG antibodies were used to erect the anti-OTA mAbs; in the second approach, an ultralow OTA concentration (0.1 ng/mL) was used to perform the competitive elution of phage particles. After the fourth round of biopanning was completed, 30 identified clones were positive phage particles; of these phage particles, 16 exhibited strong competitive inhibition with a low OTA concentration of 0.1 ng/mL. DNA sequencing results revealed that the 16 phage particles represented six different peptide sequences. Among these particles, the phage particle with a peptide sequence of \"GMVQTIF\" showed the highest sensitivity to OTA detection. The biotinylated 12-mer peptide \"GMVQTIF-GGGSK-biotin\" was designed as a competing antigen to develop a competitive peptide ELISA. Under the optimal parameters, the proposed peptide ELISA with the biotinylated 12-mar peptide as a competing antigen exhibited good dynamic linear detection for OTA in the range of 0.005 ng/mL–0.2 ng/mL with a detection limit of 0.001 ng/mL. The median inhibition concentration of OTA was 0.024 ng/mL (n=6), which is approximately fivefold more efficient as a competing antigen than the OTA–HRP conjugates. Reaction kinetics revealed that the biotinylated 12-mer peptide exhibited lower affinity to anti-OTA mAbs than the conventional chemical OTA antigen. The practicality of the proposed peptide ELISA was compared with a conventional ELISA method. In summary, this study demonstrated a novel concept of the development of phage-free peptide ELISA for the detection of OTA by using a biotinylated mimotope peptide as a competing antigen. This novel strategy can be applied to sensitively detect other toxic small molecules during food safety monitoring. (C) 2015 Elsevier B.V. All rights reserved.", "meta": {"relevant": "0", "id": "1", "authors": "Zou, X. Q.; Chen, C. C.; Huang, X. L.; Chen, X. L.; Wang, L.; Xiong, Y. H.", "year": "2016", "journal": "Talanta", "volume": "146", "pages": "394-400", "title": "Phage-free peptide ELISA for ochratoxin A detection based on biotinylated mimotope as a competing antigen"}}

First line of patterns.jsonl:

{"label": "RELEVANT", "pattern": [{"lower": "fumonisins"}]}
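To check which texts a pattern like this can match at all, here is a rough pure-Python approximation of a token pattern that only uses "lower": a token matches when its lowercase form equals the given string. This is a sketch under stated assumptions, not spaCy's Matcher: the regex tokenizer below is a simplification of spaCy's tokenizer, and the helper name matches_lower_pattern is hypothetical. By this logic, the first line of data.jsonl contains no "fumonisins" token, so that example could only reach the annotator as a scored model suggestion.

```python
import re


def matches_lower_pattern(text, pattern):
    """Approximate a spaCy-style token pattern that uses only "lower" keys.

    Assumption: a simple regex split stands in for spaCy's tokenizer, so
    results can differ from the real Matcher on tricky punctuation.
    """
    if not pattern:
        return False
    # Split into word tokens and single punctuation characters, lowercased.
    tokens = [t.lower() for t in re.findall(r"\w+|[^\w\s]", text)]
    n = len(pattern)
    # Slide a window of len(pattern) tokens over the text.
    for start in range(len(tokens) - n + 1):
        if all(tokens[start + i] == spec["lower"] for i, spec in enumerate(pattern)):
            return True
    return False


pattern = [{"lower": "fumonisins"}]
print(matches_lower_pattern("Fumonisins are mycotoxins.", pattern))    # True
print(matches_lower_pattern("ochratoxin A (OTA) detection", pattern))  # False
```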

Furthermore, I just realized that some observations are shown twice: once as a pattern match with PATTERN in the meta data, and once without the pattern highlighted and without PATTERN, but with SCORE in the meta data.

I just double-checked the data source, and the observation occurs only once.
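One way to confirm the duplicates on the annotation side is to group the saved tasks by text and list, for each repeated text, whether each occurrence came from a pattern or from the model. A minimal sketch, assuming the annotations were exported as JSONL (for example with Prodigy's db-out command) and that each task keeps its meta dict, with a "pattern" key for pattern matches and a "score" key for model suggestions; the function name find_repeated_texts and the sample data are hypothetical.

```python
import json
from collections import defaultdict


def find_repeated_texts(lines):
    """Return texts that occur more than once, with the source of each occurrence.

    Assumption: pattern matches carry meta["pattern"] and model suggestions
    carry meta["score"]; anything else is reported as "none".
    """
    seen = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the JSONL file
        task = json.loads(line)
        meta = task.get("meta", {})
        if "pattern" in meta:
            kind = "pattern"
        elif "score" in meta:
            kind = "score"
        else:
            kind = "none"
        seen[task["text"]].append(kind)
    return {text: kinds for text, kinds in seen.items() if len(kinds) > 1}


# Hypothetical sample lines standing in for an exported annotations file.
sample = [
    '{"text": "Fumonisins in maize", "meta": {"pattern": 0}}',
    '{"text": "Fumonisins in maize", "meta": {"score": 0.42}}',
    '{"text": "OTA detection", "meta": {"score": 0.13}}',
]
print(find_repeated_texts(sample))  # {'Fumonisins in maize': ['pattern', 'score']}
```

Since the function accepts any iterable of lines, an open file handle can be passed directly, e.g. find_repeated_texts(open("annotations.jsonl", encoding="utf8")), where the file name is again only an example.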

Hi! By "left out", do you mean scores that you assigned manually and want to pass through? Or just that the pattern matches don't have a score, while the model suggestions do?

Whether a score is shown depends on whether the example has a score assigned. When you run textcat.teach, you see two types of suggestions: scored examples from the model, and exact pattern matches. The examples suggested by the model come with a score; the examples produced by exact pattern matches don't. They're based on the exact match in the text, which is highlighted and also indicated by the pattern number in the meta, so you always know which pattern produced the match.
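To make the distinction concrete, here is a small sketch that separates the two kinds of suggestions by looking at their meta. It assumes that model suggestions carry a "score" key and pattern matches carry a "pattern" key holding the pattern number instead; the function name split_suggestions and the example tasks are hypothetical, not part of Prodigy.

```python
def split_suggestions(tasks):
    """Separate scored model suggestions from exact pattern matches.

    Assumption: model suggestions have meta["score"]; pattern matches have
    meta["pattern"] (the pattern number) and no score. If a task had both,
    it is treated as a pattern match here.
    """
    model, patterns = [], []
    for task in tasks:
        meta = task.get("meta", {})
        if "pattern" in meta:
            patterns.append(task)
        elif "score" in meta:
            model.append(task)
    return model, patterns


tasks = [
    {"text": "OTA detection in wine", "meta": {"score": 0.31}},
    {"text": "Fumonisins in maize", "meta": {"pattern": 0}},
]
model, patterns = split_suggestions(tasks)
print(len(model), len(patterns))  # 1 1
```

Any statistics over scores (averages, thresholds) would then only make sense on the first list, since the pattern matches have no score to begin with.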
