Hi,
we are using a custom recipe that continuously feeds database data into our labeling tasks:
```json
{
  "db": "postgresql",
  "db_settings": {
    "postgresql": {
      "dbname": "postgres"
    }
  },
  "feed_overlap": false,
  "force_stream_order": true,
  "auto_exclude_current": false
}
```
```python
import time

import prodigy
import psycopg2


def get_unlabeled_items(query):
    """Continuously poll the database and yield unlabeled tasks."""
    conn = psycopg2.connect("")  # connection settings are picked up from the environment
    assert query
    try:
        while True:
            with conn:
                with conn.cursor() as curs:
                    curs.execute(query)
                    items = curs.fetchall()
                    prodigy.util.msg.text(f"Queried {len(items)} items...")
                    if len(items) == 0:
                        time.sleep(10)  # back off before polling again
                    for item in items:
                        yield item[0]  # first column holds the task dict
    except Exception:
        conn.close()
        raise  # re-raise with the original traceback
```
```python
from typing import Any, Dict

import prodigy
import spacy


@prodigy.recipe("custom.ner")
def unbind_ner_label() -> Dict[str, Any]:
    stream = get_unlabeled_items("select * from api.get_ner_tasks();")
    nlp = spacy.blank("en")
    stream = prodigy.components.preprocess.add_tokens(nlp, stream, use_chars=False)
    return {
        "view_id": "ner_manual",
        "dataset": "ner",
        "stream": stream,
        "config": {"labels": ["brand", "quantity"]},
    }
```
Question 1: Even though `auto_exclude_current` is set to `false`, recurring items are still deduplicated and therefore "hang" in the stream, even though we want to label them twice.
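One workaround we have been considering (a sketch, not an official Prodigy mechanism): since deduplication keys on the task's `_task_hash` / `_input_hash`, a wrapper could fold an occurrence counter into the hash so the second appearance of the same item gets a fresh identity. The `rehash_stream` helper below is hypothetical; only the hash-field names come from Prodigy's task format.

```python
import hashlib
from collections import Counter


def rehash_stream(stream):
    """Stamp each task with a hash that includes its occurrence count,
    so a repeated item is not treated as a duplicate.

    Hypothetical sketch: Prodigy dedupes on _task_hash / _input_hash;
    here the Nth occurrence of the same text produces the Nth distinct hash.
    """
    seen = Counter()
    for task in stream:
        key = task["text"]
        seen[key] += 1
        raw = f"{key}::{seen[key]}".encode("utf-8")
        h = int(hashlib.sha256(raw).hexdigest()[:8], 16)
        task["_task_hash"] = h
        task["_input_hash"] = h
        yield task
```

In the recipe this would wrap the stream before `add_tokens`, e.g. `stream = rehash_stream(stream)`.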
Question 2: If the stream has 0 items, Prodigy shows a "Loading..." text instead of the "No new task available" message that is shown when the generator is exhausted. It's more of a cosmetic issue, but is it possible to recreate that behavior while keeping our unending generator implementation?
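One compromise we could imagine (a sketch under the assumption that the exhausted-generator message is what we want to trigger): let the generator give up after a few consecutive empty polls instead of looping forever, so the stream is actually exhausted. The `fetch` callable and the function name below are hypothetical stand-ins for our database query.

```python
import time


def get_unlabeled_items_bounded(fetch, max_empty_polls=3, poll_interval=10):
    """Like the unending generator, but stop after `max_empty_polls`
    consecutive empty polls so the front end sees an exhausted stream.

    `fetch` is a hypothetical callable returning the current batch of rows.
    """
    empty = 0
    while empty < max_empty_polls:
        items = fetch()
        if not items:
            empty += 1
            time.sleep(poll_interval)
            continue
        empty = 0  # reset the counter whenever new work arrives
        for item in items:
            yield item[0]  # first column holds the task dict
```

The trade-off is that the server has to be refreshed (or restarted) once new data arrives after the stream has ended.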
By the way: the documentation states that `feed_overlap` defaults to `false`, while in your code the actual default is `true`. Is that intended?
Best,
Roman