Hi,
Does Prodigy support annotating text in Dari, which is an RTL (right-to-left) language? If it does, I'll go ahead with installation and usage. I'm asking first to avoid wasting time.
Thank you
Hi! Rendering and labelling RTL text should be no problem – we actually have several users annotating Arabic text with Prodigy. You can set "writing_dir": "rtl" in your config and the interface will be adjusted accordingly.
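If it helps, here is a minimal sketch of adding that setting to a JSON config file without overwriting other keys. The helper name set_writing_dir is made up for illustration, and the sketch assumes your config lives in a prodigy.json file; check where your installation actually reads its config from.

```python
import json
from pathlib import Path


def set_writing_dir(config_path, direction="rtl"):
    """Merge the writing_dir setting into a JSON config file.

    Hypothetical helper: existing keys in the file are kept as they are.
    """
    path = Path(config_path)
    # Start from the existing config if the file is already there
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config["writing_dir"] = direction
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Calling set_writing_dir("prodigy.json") would create or update that file in the current directory.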
spaCy currently doesn’t have any pre-trained models for Dari, but we do have alpha tokenization support for Farsi, which you could use to bootstrap a model and as a tokenizer. Here’s how to export a blank model from spaCy:
import spacy
nlp = spacy.blank("fa") # create blank language class
nlp.to_disk("/path/to/blank-fa-model")
This will give you a model in the directory /path/to/blank-fa-model, which you can then load into Prodigy – for instance, to add a text classifier or to manually label text (which pre-tokenizes the text for fast highlighting). Here’s an example:
prodigy ner.manual your_dataset /path/to/blank-fa-model /path/to/data.jsonl --label PERSON
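The command above expects a data.jsonl file with one JSON object per line. As a rough sketch, assuming each record only needs a "text" key, you could create it like this. The write_jsonl helper and the sample sentence are placeholders, not part of Prodigy.

```python
import json


def write_jsonl(path, texts):
    """Write one {"text": ...} JSON object per line (hypothetical helper)."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for text in texts:
            # ensure_ascii=False keeps the Dari script readable in the file
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
            count += 1
    return count


# Placeholder sentence; replace with your own Dari texts
sample_texts = ["احمد در کابل زندگی می‌کند"]
```

Calling write_jsonl("/path/to/data.jsonl", sample_texts) would then produce the input file used in the ner.manual command.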