Let me make my question more specific, as I have learned a few things since I posted the last one.
I have this code, which defines a custom recipe to add the new dependency labels I need:
import prodigy
import spacy
from prodigy.components.loaders import JSONL


@prodigy.recipe("dep.manual.custom")
def dep_manual_custom(dataset, spacy_model, source):
    nlp = spacy.load(spacy_model)
    # Retrieve the original dependency labels from the loaded model
    original_dep_labels = nlp.get_pipe("parser").labels
    # Combine original labels with custom ones
    custom_dep_labels = ["pl_obl", "tm_obl"]
    combined_labels = list(original_dep_labels) + custom_dep_labels

    def add_tokens(stream):
        for eg in stream:
            doc = nlp(eg["text"])
            eg["tokens"] = [
                {"text": t.text, "start": t.idx, "end": t.idx + len(t.text), "id": i}
                for i, t in enumerate(doc)
            ]
            eg["arcs"] = []  # Initialize empty arcs for manual annotation
            yield eg

    stream = add_tokens(JSONL(source))  # Load examples from JSONL source
    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "dep",  # Use the dependency view
        "config": {
            "labels": combined_labels,  # Include both original and custom labels
            "span_labels": combined_labels,
            "optimize_typeahead": True,
            "show_flag": False,
            "exclude_by_input_hash": True,
        },
    }
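To make clear what I expect from the label-merging step in the recipe, here is a minimal standalone sketch. The original_dep_labels tuple is only a made-up stand-in for whatever nlp.get_pipe("parser").labels returns for es_dep_news_trf, and merge_labels is a hypothetical helper of my own, not something from Prodigy or spaCy.

```python
def merge_labels(original, custom):
    """Append custom labels to the original ones, skipping duplicates."""
    merged = list(original)
    for label in custom:
        if label not in merged:
            merged.append(label)
    return merged


# Assumption: a small stand-in for nlp.get_pipe("parser").labels,
# not the real label set of es_dep_news_trf
original_dep_labels = ("ROOT", "nsubj", "obj", "obl")
custom_dep_labels = ["pl_obl", "tm_obl"]

combined_labels = merge_labels(original_dep_labels, custom_dep_labels)
print(combined_labels)
# → ['ROOT', 'nsubj', 'obj', 'obl', 'pl_obl', 'tm_obl']
```

In other words, I would expect the annotation interface to offer the model's own labels and pl_obl / tm_obl together.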
I used the following command, which is meant to add my custom dependencies so that they are trained alongside the dependencies the es_dep_news_trf model already knows:
python -m prodigy dep.correct my_dataset es_dep_news_trf my_data_source.jsonl --label "pl_obl,tm_obl" -F custom_deps.py
However, this only let me train my custom dependencies from scratch, without the underlying knowledge of the es_dep_news_trf model. I tried removing --label "pl_obl,tm_obl" from the command, since it might have made the recipe consider only my custom dependencies, but then only the original es_dep_news_trf dependencies appeared.
I'd appreciate any guidance on this, and I'm sorry if the previous question was difficult to understand. I'm still figuring this out.