Custom Recipe - Database structure

Hello, I'm new to Prodigy and I'm trying to follow and adapt the "custom recipes" tutorial for use with Spanish expressions, specifically the section "Example: Wrapping built-in recipes" (https://prodi.gy/docs/workflow-custom-recipes). I don't fully understand the schema of the database, and I'm getting this error:

Traceback (most recent call last):
  File "/home/user/.local/lib/python3.6/site-packages/waitress/channel.py", line 338, in service
    task.service()
  File "/home/user/.local/lib/python3.6/site-packages/waitress/task.py", line 169, in service
    self.execute()
  File "/home/user/.local/lib/python3.6/site-packages/waitress/task.py", line 399, in execute
    app_iter = self.channel.server.application(env, start_response)
  File "/home/user/.local/lib/python3.6/site-packages/hug/api.py", line 423, in api_auto_instantiate
    return module.hug_wsgi(*args, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/falcon/api.py", line 244, in __call__
    responder(req, resp, **params)
  File "/home/user/.local/lib/python3.6/site-packages/hug/interface.py", line 793, in __call__
    raise exception
  File "/home/user/.local/lib/python3.6/site-packages/hug/interface.py", line 766, in __call__
    self.render_content(self.call_function(input_parameters), context, request, response, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/hug/interface.py", line 703, in call_function
    return self.interface(**parameters)
  File "/home/user/.local/lib/python3.6/site-packages/hug/interface.py", line 100, in __call__
    return __hug_internal_self._function(*args, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/prodigy/app.py", line 105, in get_questions
    tasks = controller.get_questions()
  File "cython_src/prodigy/core.pyx", line 109, in prodigy.core.Controller.get_questions
  File "cython_src/prodigy/components/feeds.pyx", line 56, in prodigy.components.feeds.SharedFeed.get_questions
  File "cython_src/prodigy/components/feeds.pyx", line 61, in prodigy.components.feeds.SharedFeed.get_next_batch
  File "cython_src/prodigy/components/feeds.pyx", line 137, in prodigy.components.feeds.SessionFeed.get_session_stream
ValueError: Error while validating stream: no first example. This likely means that your stream is empty.

Hi! Could you share a bit more about what you're doing – for example, the custom recipe code or especially the code that puts together the stream?

ValueError: Error while validating stream: no first example. This likely means that your stream is empty.

This error usually means that there are no examples in the stream – for example, if you're loading in a file and the file is empty, if all examples have already been annotated or if there are no examples that meet a specified condition (e.g. if you're trying to find pattern matches but there are no matches in the data, or if no entities for a label are predicted).
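One way to check for an empty stream before starting the server is to peek at its first item. The following is only a minimal sketch using the standard library; `peek_stream` is a hypothetical helper name and not part of Prodigy.

```python
from itertools import chain


def peek_stream(stream):
    """Return (first_example, stream) without losing the first item.

    Returns (None, empty iterator) if the stream yields nothing.
    peek_stream is a hypothetical helper, not a Prodigy function.
    """
    iterator = iter(stream)
    try:
        first = next(iterator)
    except StopIteration:
        return None, iter(())
    # Put the first item back in front of the remaining items.
    return first, chain([first], iterator)


# Example: a generator whose filter rejects every item ends up empty.
texts = ["", "  "]
stream = ({"text": t} for t in texts if t.strip())
first, stream = peek_stream(stream)
if first is None:
    print("Stream is empty: check the source data and any filters.")
```

Because generators can only be consumed once, the helper returns a new iterator that still includes the first example, so the stream can be passed on unchanged afterwards.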

I hope the example here wasn't confusing! It assumes a use case where you have some other external database or source of loading in data and you want to replace the stream in the ner.teach recipe (usually expected to be a file) with your own list of examples. To do this, you can call the teach function and pass in a list of examples as the source argument. How you create this list is completely up to you and depends on where the data is coming from.

Depending on what you're trying to do, it might make sense to pre-process your data instead and create a JSONL file, which you can then load into the regular ner.teach recipe. For example, if the texts you want to annotate live in a MongoDB database, you could fetch them all, write them to a file as {'text': 'The text...'} etc. and then load that file into ner.teach.
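As a rough illustration of that pre-processing step, here is a sketch that exports texts to JSONL. It assumes a SQLite database rather than MongoDB so that it only needs the standard library; the function name `export_texts_to_jsonl` and its parameters are hypothetical.

```python
import json
import sqlite3


def export_texts_to_jsonl(db_path, out_path, table, column):
    """Write one {"text": ...} JSON object per line for each non-empty row.

    Hypothetical helper. The table and column names are inserted into
    the SQL string directly, so only pass trusted identifiers.
    Returns the number of examples written.
    """
    conn = sqlite3.connect(db_path)
    count = 0
    try:
        rows = conn.execute(f'SELECT "{column}" FROM "{table}"')
        with open(out_path, "w", encoding="utf8") as f:
            for (text,) in rows:
                if text is None or not str(text).strip():
                    continue  # skip NULL or blank rows
                # ensure_ascii=False keeps accented characters readable.
                f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
                count += 1
    finally:
        conn.close()
    return count
```

If the function returns 0, the file is empty and loading it would lead to the same "no first example" error, so the return value doubles as a quick sanity check.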

The code I'm using is this:

def custom_ner_teach(dataset, spacy_model, database, label=None):
    conn = sqlite3.connect(database)
    table = pd.read_sql_query("SELECT * from productos", conn)
    stream = ({'text': table['plain_text'][i]} for i in range(len(table)))
    components = teach(dataset=dataset, spacy_model=spacy_model,
                       source=stream, label=label)
    return components

Thanks! What does the table dataframe look like? I think this part here is potentially problematic:

Does table['plain_text'][i] actually contain the texts in your database? If it doesn't, or if the table doesn't have a length, you can easily end up with an empty stream.

Maybe you can just print the first records to see what the structure looks like, so you can adjust your code accordingly?
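A quick inspection along those lines could look like the sketch below. It queries SQLite directly instead of going through pandas, and `describe_table` is a hypothetical helper name, so treat it as an illustration rather than the exact check to run.

```python
import sqlite3


def describe_table(db_path, table, limit=3):
    """Return (column_names, row_count, first_rows) for a SQLite table.

    Hypothetical helper. The table name is inserted into the SQL
    string directly, so only pass a trusted table name.
    """
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (limit,))
        # cursor.description holds one tuple per column; item 0 is the name.
        columns = [d[0] for d in cursor.description]
        first_rows = cursor.fetchall()
        (row_count,) = conn.execute(
            f'SELECT COUNT(*) FROM "{table}"'
        ).fetchone()
    finally:
        conn.close()
    return columns, row_count, first_rows
```

Calling it with the path to the database file and the table name, then printing the three returned values, shows whether the expected column exists and whether the table has any rows at all.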

When I print the first three rows, the output is this:

  • Hierbas chinas tradicionales, raíces, insectos y cortezas se infusionan en…
  • Este curso te ayudará a triunfar con tu libro. Dile adiós a tu bloqueo como autor…
  • Se trata de una loción corporal utilizada para preparar los músculos para el ejercicio…

I created a table with only one column, named "plain_text".