Hi,
This question is very closely related to "How to score incompletely highlighted entities?".
I am trying to teach the model a new entity type. My dataset looks like this:
{"text": " The notes will mature on August 15, 2018, and will be paid in U.S. dollars against presentation and surrender thereof at the corporate trust office of the Trustee. However, we may redeem the notes at our option prior to that date. See \"—Optional Redemption.\" The notes will not be entitled to the benefit of, and are not subject to, any sinking fund. ", "extra_info": {"start": 282609, "end": 283072, "filename": "/path/to/file", "sha512": "my_hash"}}
{"text": " the initial redemption date on or after which we may redeem the notes or the repayment date or dates on which the holders may elect repayment of the notes; ", "extra_info": {"start": 170838, "end": 171264, "filename": "/path/to/file", "sha512": "my_hash"}}
{"text": " The notes are redeemable at Citigroup\"s option, in whole, but not in part, on or after September 27, 2022, at a redemption price equal to 100% of the principal amount of the notes plus accrued and unpaid interest thereon to, but excluding, the date of redemption. In addition, Citigroup may redeem the notes prior to maturity if changes involving United States taxation occur which could require Citigroup to pay additional amounts, as described under \"Description of Debt Securities — Payment of Additional Amounts\" and \"— Redemption for Tax Purposes\" in the accompanying prospectus. ", "extra_info": {"start": 66141, "end": 66884, "filename": "/path/to/file", "sha512": "my_hash"}}
{"text": " If specified in the applicable prospectus supplement, TIFSA may redeem the debt securities of any series, as a whole or in part, at TIFSA\"s option on and after the dates and in accordance with the terms established for such series, if any, in the applicable prospectus supplement. If TIFSA redeems the debt securities of any series, TIFSA also must pay accrued and unpaid interest, if any, to the date of redemption on such debt securities. ", "extra_info": {"start": 373531, "end": 374087, "filename": "/path/to/file", "sha512": "my_hash"}}
{"text": "We will pay contingent interest on the convertible senior notes after they have been outstanding at least ten years, under certain conditions. We may redeem the convertible senior notes once they have been outstanding for ten years at a redemption price of 100% of the principal amount of the notes, payable in cash. The optional repurchase dates, the common stock price conversion threshold amounts and the ending date of the first six-month period contingent interest may be payable for the contingent convertible senior notes are as follows: ", "extra_info": {"start": 454678, "end": 456968, "filename": "/path/to/file", "sha512": "my_hash"}}
My patterns.jsonl looks like this:
{"label": "aliases", "pattern": [{"lower": "the notes"}]}
{"label": "aliases", "pattern": [{"lower": "the existing 2021 notes"}]}
{"label": "aliases", "pattern": [{"lower": "the exchange notes"}]}
{"label": "aliases", "pattern": [{"lower": "the series 2012c senior notes"}]}
{"label": "aliases", "pattern": [{"lower": "the 2024 first mortgage bonds"}]}
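A note on how token patterns like these are read: in spaCy's token-based patterns, each dict in the "pattern" list describes exactly one token, and a "lower" value is compared with a single token's lowercase text. The tokenizer splits "the notes" into the two tokens "the" and "notes", so a one-dict pattern such as [{"lower": "the notes"}] cannot match any single token. The sketch below is a simplified, hypothetical stand-in for spaCy's Matcher, not its real API. It is plain Python with a regex tokenizer that only approximates spaCy's rules (for example, it would split "U.S." differently), and it illustrates the difference between the two pattern shapes:

```python
import re

# Rough stand-in for spaCy's tokenizer: runs of word characters, and single
# punctuation marks. Assumption: close enough for these sample sentences only.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    return TOKEN_RE.findall(text)


def find_matches(text, pattern):
    """Return (start, end) token spans where each dict in `pattern` matches
    exactly one consecutive token by its lowercase text, mimicking how a
    {"lower": ...} token pattern is interpreted."""
    if not pattern:
        return []
    tokens = [t.lower() for t in tokenize(text)]
    n = len(pattern)
    spans = []
    for start in range(len(tokens) - n + 1):
        if all(tokens[start + i] == spec["lower"] for i, spec in enumerate(pattern)):
            spans.append((start, start + n))
    return spans


text = "However, we may redeem the notes at our option prior to that date."
# One dict holding two words: no single token equals "the notes".
print(find_matches(text, [{"lower": "the notes"}]))  # []
# One dict per token: matches tokens 5 and 6 ("the", "notes").
print(find_matches(text, [{"lower": "the"}, {"lower": "notes"}]))  # [(5, 7)]
```

With spaCy installed, the same check can be made against the real tokenizer of en_core_web_md before relying on a pattern.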
I start training with the following command:
prodigy ner.teach Aliases en_core_web_md paragraph_content.jsonl --patterns patterns.jsonl --label aliases
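Since each dict in a token pattern stands for a single token, multi-word entries in patterns.jsonl would need one dict per word before being passed to --patterns. Below is a minimal sketch of such a conversion, assuming that splitting on whitespace reproduces spaCy's tokenization for these phrases. Phrases with punctuation (such as "U.S.") or number/letter mixes (such as "2012c") are worth checking against the model's tokenizer. The function names and the output file name patterns_tokens.jsonl are hypothetical:

```python
import json


def split_phrase_pattern(entry):
    """Return a copy of one pattern entry in which every token dict of the form
    {"lower": "<several words>"} becomes one {"lower": word} dict per word.
    Assumption: whitespace splitting matches spaCy's tokenization."""
    if not isinstance(entry["pattern"], list):
        return entry  # string (phrase) patterns are left untouched
    new_tokens = []
    for token in entry["pattern"]:
        value = token.get("lower")
        words = value.split() if isinstance(value, str) else []
        if set(token) == {"lower"} and len(words) > 1:
            new_tokens.extend({"lower": word} for word in words)
        else:
            new_tokens.append(token)  # single-word or other dicts stay as they are
    return {**entry, "pattern": new_tokens}


def convert_jsonl(in_path, out_path):
    """Hypothetical helper: rewrite a patterns JSONL file line by line."""
    with open(in_path, encoding="utf8") as src, open(out_path, "w", encoding="utf8") as dst:
        for line in src:
            if line.strip():
                dst.write(json.dumps(split_phrase_pattern(json.loads(line))) + "\n")


if __name__ == "__main__":
    example = {"label": "aliases", "pattern": [{"lower": "the notes"}]}
    print(json.dumps(split_phrase_pattern(example)))
    # {"label": "aliases", "pattern": [{"lower": "the"}, {"lower": "notes"}]}
```

Running convert_jsonl("patterns.jsonl", "patterns_tokens.jsonl") would produce a file that could then be passed as --patterns patterns_tokens.jsonl in the command above. Entries that are already single words are copied unchanged, so the conversion is safe to run more than once.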
During annotation, I never saw a suggestion that spanned multiple tokens. At the very beginning, some suggestions covered only part of an entity, but towards the end I stopped paying attention to this.
What am I doing wrong here? How can I set up the training so that I get useful suggestions?