spacy-llm error when auto-annotating and getting labels

I am getting the error below when using spacy-llm to get entity labels for a text.

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.

My Python file:

```python
from dotenv import load_dotenv
from spacy_llm.util import assemble

# Load environment variables (e.g. API keys) from a local .env file
load_dotenv()

# Build the spaCy pipeline from config.cfg (shown below)
nlp = assemble("config.cfg")

doc = nlp("""
          In the depths of the ocean, an intrepid explorer named Jane Smith,
          a dedicated member of the Marine Conservation Society,
          embarked on a remarkable journey to study the diverse marine life that inhabits the underwater world.
          Her expedition led her to a hidden reef teeming with exotic fish species and surrounded by a stunning coral garden.
          Jane's findings contributed to our understanding of the delicate ecosystems in these remote aquatic environments,
          shedding light on the importance of protecting these fragile habitats.
          As she continued her exploration, Jane also encountered a local fishing community that depended on
          the ocean for their livelihoods, highlighting the intricate relationship between humans and the underwater world.
          """)
# print(doc)

# Print each entity predicted by the LLM together with its label
for ent in doc.ents:
    print(ent.text, ent.label_)
```

My config.cfg:

```ini
[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"

[components.llm.task]
@llm_tasks = "spacy.NER.v3"
labels = ["UNDERWATER","COORDINATES","DEPTH","METHODS"]
description = "Entities are the names of physical water features such as oceans and ll, coordinates of places, depths in metres or any other scale, and methods used to collect data and samples"

[components.llm.task.label_definitions]
UNDERWATER = "Extract names of known underwater places and features, e.g. Seamount, Seamounts"
COORDINATES = "Extract geographic coordinates for latitude and longitude"
DEPTH = "Extract references to depth in feet or metres"
METHODS = "Extract references to collection methods, e.g. trawling, dredging, sampling, collecting"

[components.llm.model]
@llm_models = "spacy.Falcon.v1"
# For better performance, a larger model could be used instead,
# e.g. dolly-v2-12b (which needs @llm_models = "spacy.Dolly.v1")
name = "falcon-rw-1b"
# Alternative: name = "Mistral-7B-v0.1" (needs @llm_models = "spacy.Mistral.v1")
```

Using:

- spacy==3.7.2
- prodigy==1.14.10
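For a bug report on the spacy-llm side, the versions of spacy-llm and transformers (which the Hugging Face models such as `spacy.Falcon.v1` run on) are not in the list above but may be relevant. A minimal sketch using only the standard library's `importlib.metadata`; the `PACKAGES` list and the `package_versions` helper are hypothetical names chosen for illustration, adjust them to your environment:

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical list of packages worth reporting; adjust as needed
PACKAGES = ["spacy", "spacy-llm", "prodigy", "transformers", "torch"]


def package_versions(names):
    """Return a dict mapping each package name to its installed version,
    or None if the package is not installed."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in package_versions(PACKAGES).items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

The output lines use the same `name==version` form as the list above, so they can be pasted straight into the issue.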

Hi Nyaribari,

Could you share the entire traceback of the error? Without that information it's a lot harder for us to figure out where this error is taking place. My gut feeling is that this is an issue on the Falcon end, and perhaps not so much on the spaCy LLM side, but I'm not 100% sure.

That said, it might be better to post this issue on the spaCy LLM repository.

This forum is for Prodigy-related issues, and this issue seems to be unrelated to Prodigy. Also, the maintainers of spaCy LLM are not that active on this forum, so you'll get more specific advice on the GitHub repo. When you post the issue there, be sure to add the full traceback.
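If the full traceback is not shown in your terminal, one way to capture it as text is to wrap the failing call. This is a minimal sketch using the standard library's `traceback` module; `capture_traceback` is a hypothetical helper, and the demo uses a stand-in function rather than spaCy so it runs anywhere:

```python
import traceback


def capture_traceback(fn, *args, **kwargs):
    """Call fn(*args, **kwargs).

    Returns (result, None) on success, or (None, traceback_text) if it raises.
    """
    try:
        return fn(*args, **kwargs), None
    except Exception:
        # format_exc() returns every frame of the current exception,
        # not just the final error line
        return None, traceback.format_exc()


if __name__ == "__main__":
    # Stand-in for the real call; in the script above you would use
    # capture_traceback(nlp, text) instead (hypothetical usage).
    def failing_step(x):
        return 1 / x

    _, tb = capture_traceback(failing_step, 0)
    print(tb)
```

If `nlp(...)` does not raise at all, the second value is `None`, which is also worth mentioning in the issue.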

Thank you @koaning, I will do that.
