Hello,
I built a custom recipe for text classification according to a query. The dataset is a large CSV file. Whenever I start a new session, the stream starts again from the first row instead of resuming where I left off. I tried using filter_inputs to skip the inputs already saved in the last session, but I still have the same problem.
import prodigy
from prodigy.components.db import connect
from prodigy.components.filters import filter_inputs
from prodigy.components.loaders import CSV


@prodigy.recipe(
    "semanticsearch",
    dataset=("The dataset to use", "positional", None, str),
    source=("The source data as a CSV file", "positional", None, str),
)
def semanticsearch(dataset: str, source: str):
    # Input hashes of the examples already saved in this dataset
    db = connect()
    input_hashes = db.get_input_hashes(dataset)

    # Load the CSV and drop every example whose input hash is already saved
    stream = CSV(source)
    stream = filter_inputs(stream, input_hashes)

    blocks = [
        {
            "view_id": "html",
            "html_template": "<div style='background-color:SlateBlue;'><h1 style='color:White;'>{{label}}</h1></div>",
        },
        {"view_id": "html", "html_template": "<div>{{text}}</div>"},
    ]
    return {
        "view_id": "blocks",  # Annotation interface to use
        "dataset": dataset,  # Name of dataset to save annotations
        "stream": stream,  # Incoming stream of examples
        "config": {"blocks": blocks},
    }
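To make clear what I expect the filtering step to do, here is a minimal sketch in plain Python. It does not use Prodigy at all: `input_hash` and `skip_seen` are hypothetical helpers I wrote for illustration, and the SHA-1 over the `text` field is only an assumption standing in for however Prodigy actually computes its input hashes.

```python
import hashlib
import json


def input_hash(example, keys=("text",)):
    """Hypothetical stand-in for an input hash: hashes only the input fields."""
    payload = json.dumps({k: example.get(k) for k in keys}, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def skip_seen(stream, seen_hashes):
    """Yield only the examples whose input hash is not in seen_hashes."""
    seen = set(seen_hashes)
    for example in stream:
        if input_hash(example) not in seen:
            yield example


rows = [
    {"text": "first row", "label": "A"},
    {"text": "second row", "label": "A"},
    {"text": "third row", "label": "A"},
]

# Pretend the first two rows were annotated in the previous session.
already_annotated = {input_hash(row) for row in rows[:2]}

remaining = list(skip_seen(rows, already_annotated))
print([row["text"] for row in remaining])
```

In this sketch only the third row survives, which is the behaviour I was hoping to get from filter_inputs when restarting the server on the same dataset.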
Any help?
Best regards