Built-in Recipes
A Prodigy recipe is a Python function that can be run via the command line. Prodigy comes with lots of useful recipes, and it’s very easy to write your own. Recipes don’t have to start the web server – you can also use the recipe decorator as a quick way to make your Python function into a command-line utility. To view the recipe arguments and documentation on the command line, run the command with --help, for example prodigy ner.manual --help.
Named Entity Recognition | Tag names and concepts as spans in text. |
Text Classification | Assign one or more categories to whole texts. |
Part-of-speech Tagging | Assign part-of-speech tags to tokens. |
Dependency Parsing | Assign and correct syntactic dependency attachments in text. |
Coreference Resolution | Resolve mentions and references to the same words in text. |
Relations | Annotate any relations between words and phrases. |
Computer Vision | Annotate images and image segments. |
Audio & Video | Annotate and segment audio and video files. |
Training | Train models and export training corpora. |
Vectors & Terminology | Create patterns and terminology lists from word vectors. |
Review & Evaluate | Review annotations and outputs and resolve conflicts. |
Utilities & Commands | Manage datasets, view data and streams, and more. |
Deprecated Recipes | Recipes that have already been replaced by better alternatives. |
Named Entity Recognition
ner.manual
manual
- Interface: ner_manual
- Saves: annotations to the database
- Use case: highlight spans of text manually or semi-manually
Mark entity spans in a text by highlighting them and selecting the respective labels. The model is used to tokenize the text to allow less sensitive highlighting, since the token boundaries are used to set the entity spans. The label set can be defined as a comma-separated list on the command line or as a path to a text file with one label per line. If no labels are specified, Prodigy will check if labels are present in the model. This recipe does not require an entity recognizer, and doesn’t do any active learning.
prodigy ner.manual dataset spacy_model source --loader --label --patterns --exclude --highlight-chars
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Prodigy dataset to save annotations to. | |
spacy_model | str | Loadable spaCy model for tokenization or blank:lang for a blank model (e.g. blank:en for English). | |
source | str | Path to text source or - to read from standard input. | |
--loader , -lo | str | Optional ID of text source loader. If not set, source file extension is used to determine loader. | None |
--label , -l | str | One or more labels to annotate. Supports a comma-separated list or a path to a file with one label per line. If no labels are set, Prodigy will check the model for available labels. | |
--patterns , -pt | str | New: 1.9 Optional path to match patterns file to pre-highlight entity spans. | None |
--exclude , -e | str | Comma-separated list of dataset IDs containing annotations to exclude. | None |
--highlight-chars , -C | bool | New: 1.10 Allow highlighting individual characters instead of snapping to token boundaries. If set, no "tokens" information will be saved with the example. | False |
Example
prodigy ner.manual ner_news en_core_web_sm ./news_headlines.jsonl --label PERSON,ORG,PRODUCT
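The --patterns argument expects a JSONL file with one pattern per line, using spaCy’s match-pattern format: a label plus either a token-attribute list or a plain string. A minimal sketch that writes such a file; the labels, patterns and file name are made up for illustration:

```python
import json
from pathlib import Path

# One pattern per line: a label plus either a token-attribute list
# (Matcher style) or a plain string (PhraseMatcher style).
patterns = [
    {"label": "ORG", "pattern": [{"lower": "apple"}]},
    {"label": "PRODUCT", "pattern": "iPhone 11"},
]

path = Path("patterns.jsonl")  # hypothetical file name
path.write_text("\n".join(json.dumps(p) for p in patterns) + "\n", encoding="utf8")
```

The resulting file can then be passed via --patterns to pre-highlight matching spans.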
ner.correct
manual
- Interface: ner_manual
- Saves: annotations to the database
- Use case: correct a spaCy model's predictions manually
Create gold-standard data for NER by correcting the model’s suggestions. The spaCy model will be used to predict entities contained in the text, which the annotator can remove and correct if necessary.
prodigy ner.correct dataset spacy_model source --loader --label --exclude --unsegmented
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Prodigy dataset to save annotations to. | |
spacy_model | str | Loadable spaCy model. | |
source | str | Path to text source or - to read from standard input. | |
--loader , -lo | str | Optional ID of text source loader. If not set, source file extension is used to determine loader. | None |
--label , -l | str | One or more labels to annotate. Supports a comma-separated list or a path to a file with one label per line. If no labels are set, Prodigy will check the model for available labels. | |
--exclude , -e | str | Comma-separated list of dataset IDs containing annotations to exclude. | None |
--unsegmented , -U | bool | Don’t split sentences. | False |
Example
prodigy ner.correct gold_ner en_core_web_sm ./news_headlines.jsonl --label PERSON,ORG
ner.teach
binary
- Interface: ner
- Saves: Annotations to the database
- Updates: spaCy model in the loop
- Active learning: prefers most uncertain scores
- Use case: updating and improving NER models
Collect the best possible training data for a named entity recognition model with the model in the loop. Based on your annotations, Prodigy will decide which questions to ask next.
prodigy ner.teach dataset spacy_model source --loader --label --patterns --exclude --unsegmented
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Prodigy dataset to save annotations to. | |
spacy_model | str | Loadable spaCy model. | |
source | str | Path to text source or - to read from standard input. | |
--loader , -lo | str | Optional ID of text source loader. If not set, source file extension is used to determine loader. | None |
--label , -l | str | Label(s) to annotate. Accepts single label or comma-separated list. If not set, all available labels will be returned. | None |
--patterns , -pt | str | Optional path to match patterns file to pre-highlight entity spans in addition to those suggested by the model. | None |
--exclude , -e | str | Comma-separated list of dataset IDs containing annotations to exclude. | None |
--unsegmented , -U | bool | Don’t split sentences. | False |
Example
prodigy ner.teach ner_news en_core_web_sm ./news_headlines.jsonl --label PERSON,EVENT
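Binary annotations collected with ner.teach store the decision under an "answer" key on each example. A quick way to inspect the accept rate of an exported dataset; the sample records are illustrative:

```python
# Binary annotations from ner.teach carry an "answer" key per example:
# "accept", "reject" or "ignore". Sample records (illustrative):
examples = [
    {"text": "Apple hired Tim", "spans": [{"start": 0, "end": 5, "label": "ORG"}], "answer": "accept"},
    {"text": "I like apples", "spans": [{"start": 7, "end": 13, "label": "ORG"}], "answer": "reject"},
]
accepted = [eg for eg in examples if eg["answer"] == "accept"]
accept_rate = len(accepted) / len(examples)
```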
ner.silver-to-gold
manual
- Interface: ner_manual
- Saves: annotations to the database
- Use case: converting binary datasets to gold-standard data with no missing values
Take existing “silver” datasets with binary accept/reject annotations, merge the annotations to find the best possible analysis given the constraints defined in the annotations, and manually edit it to create a perfect and complete “gold” dataset.
prodigy ner.silver-to-gold dataset silver_sets spacy_model --label
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Prodigy dataset ID to save annotations to. | |
silver_sets | str | Comma-separated names of existing binary datasets to convert. | |
spacy_model | str | Loadable spaCy model. | |
--label , -l | str | One or more labels to annotate. Supports a comma-separated list or a path to a file with one label per line. If no labels are set, Prodigy will check the model for available labels. | |
ner.eval-ab
binary
- Interface: choice
- Saves: evaluation results to the database
- Use case: comparing and evaluating two models (e.g. before and after training)
Load two models and a stream of text, compare their predictions and select which result you prefer. The outputs will be randomized, so you won’t know which model is which. When you stop the server, the results are calculated. This recipe is especially helpful if you’re updating an existing model or if you’re trying out a new strategy on the same problem. Even if two models achieve similar accuracy, one of them can still be subjectively “better”, so this recipe lets you analyze that.
prodigy ner.eval-ab dataset model_a model_b source --loader --label --exclude --unsegmented
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Prodigy dataset to save annotations to. | |
model_a | str | First loadable spaCy model to compare. | |
model_b | str | Second loadable spaCy model to compare. | |
source | str | Path to text source or - to read from standard input. | |
--loader , -lo | str | Optional ID of text source loader. If not set, source file extension is used to determine loader. | None |
--label , -l | str | One or more labels to annotate. Supports a comma-separated list or a path to a file with one label per line. If no labels are set, Prodigy will check the model for available labels. | |
--exclude , -e | str | Comma-separated list of dataset IDs containing annotations to exclude. | None |
--unsegmented , -U | bool | Don’t split sentences. | False |
Example
prodigy ner.eval-ab eval_dataset en_core_web_sm ./improved_ner_model ./news_headlines.jsonl
Text Classification
textcat.manual
manual
- Interface: choice / classification
- Saves: annotations to the database
- Use case: select one or more categories to apply to the text
Manually annotate categories that apply to a text. If only one label is set, the classification interface is used. If more than one label is specified, the choice interface is used and categories are added as multiple choice options. If the --exclusive flag is set, categories become mutually exclusive, meaning that only one can be selected during annotation.
prodigy textcat.manual dataset source --loader --label --exclusive --exclude
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Prodigy dataset to save annotations to. | |
source | str | Path to text source or - to read from standard input. | |
--loader , -lo | str | Optional ID of text source loader. If not set, source file extension is used to determine loader. | None |
--label , -l | str | Category label to apply. | '' |
--exclusive, -E | bool | Treat labels as mutually exclusive. If not set, an example may have multiple correct classes. | False |
--exclude , -e | str | Comma-separated list of dataset IDs containing annotations to exclude. | None |
Example
prodigy textcat.manual news_topics ./news_headlines.jsonl --label Technology,Politics,Economy,Entertainment
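With more than one label, the choice interface records the selected options in an "accept" list on each saved example. A sketch that converts one such annotation into a per-category mapping; the sample example is illustrative:

```python
# The choice interface stores selected options in an "accept" list.
# Converting one saved example (illustrative) into a category mapping:
LABELS = ["Technology", "Politics", "Economy", "Entertainment"]
example = {"text": "New phone released", "accept": ["Technology"], "answer": "accept"}
cats = {label: label in example["accept"] for label in LABELS}
```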
textcat.teach
binary
- Interface: classification
- Saves: annotations to the database
- Updates: spaCy model in the loop
- Active learning: prefers most uncertain scores
- Use case: updating and improving text classification models
Collect the best possible training data for a text classification model with the model in the loop. Based on your annotations, Prodigy will decide which questions to ask next. All annotations will be stored in the database. If a patterns file is supplied via the --patterns argument, the matches will be included in the stream and the matched spans are highlighted, so you’re able to tell which words or phrases the selection was based on. Note that the exact pattern matches have no influence when updating the model – they’re only used to help pre-select examples for annotation.
prodigy textcat.teach dataset spacy_model source --loader --label --patterns --long-text --init-tok2vec --exclude --unsegmented
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Prodigy dataset to save annotations to. | |
spacy_model | str | Loadable spaCy model or blank:lang for a blank model (e.g. blank:en for English). | |
source | str | Path to text source or - to read from standard input. | |
--loader , -lo | str | Optional ID of text source loader. If not set, source file extension is used to determine loader. | None |
--label , -l | str | Category label to apply. | '' |
--patterns , -pt | str | Optional path to match patterns file to filter out examples containing terms and phrases. | None |
--init-tok2vec , -t2v | str | Path to pretrained weights for the token-to-vector parts of the model. Experimental. | None |
--exclude , -e | str | Comma-separated list of dataset IDs containing annotations to exclude. | None |
Example
prodigy textcat.teach news_topics en_core_web_sm ./news_headlines.jsonl --label Technology,Politics,Economy,Entertainment
Part-of-speech Tagging
pos.correct
manual
- Interface: pos_manual
- Saves: annotations to the database
- Use case: correct a spaCy model's predictions manually
Create gold-standard data for part-of-speech tagging by correcting the model’s suggestions. The spaCy model will be used to predict part-of-speech tags, which the annotator can remove and correct if necessary. It’s often more efficient to focus on a few labels at a time, instead of annotating all labels jointly. The --fine-grained flag enables annotation of the fine-grained tags, i.e. Token.tag_ instead of Token.pos_. Note that this can lead to unexpected results and very long tags for some language models that use fine-grained tags composed of morphological features, like spaCy’s default Italian or Dutch models.
prodigy pos.correct dataset spacy_model source --loader --label --exclude --unsegmented --fine-grained
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Prodigy dataset to save annotations to. | |
spacy_model | str | Loadable spaCy model. | |
source | str | Path to text source or - to read from standard input. | |
--loader , -lo | str | Optional ID of text source loader. If not set, source file extension is used to determine loader. | None |
--label , -l | str | One or more tags to annotate. Supports a comma-separated list or a path to a file with one label per line. If not set, all tags are shown. | |
--exclude , -e | str | Comma-separated list of dataset IDs containing annotations to exclude. | None |
--unsegmented , -U | bool | Don’t split sentences. | False |
--fine-grained , -FG | bool | Use fine-grained part-of-speech tags, i.e. Token.tag_ instead of Token.pos_ . | False |
Example
prodigy pos.correct news_pos en_core_web_sm ./news_headlines.jsonl --label NOUN,VERB,PROPN
pos.teach
binary
- Interface: pos
- Saves: Annotations to the database
- Updates: spaCy model in the loop
- Active learning: prefers most uncertain scores
- Use case: updating and improving part-of-speech tagging models
Collect the best possible training data for a part-of-speech tagging model with the model in the loop. Based on your annotations, Prodigy will decide which questions to ask next. It’s often more efficient to focus on a few labels at a time, instead of annotating all labels jointly.
prodigy pos.teach dataset spacy_model source --loader --label --tag-map --exclude --unsegmented
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Prodigy dataset to save annotations to. | |
spacy_model | str | Loadable spaCy model. | |
source | str | Path to text source or - to read from standard input. | |
--loader , -lo | str | Optional ID of text source loader. If not set, source file extension is used to determine loader. | None |
--label , -l | str | Label(s) to annotate. Accepts single label or comma-separated list. If not set, all available labels will be returned. | None |
--tag-map , -tm | str / Path | Path to JSON mapping table for POS tags. Read from the spaCy tagger model if not provided. | None |
--patterns , -pt | str | Optional path to match patterns file to filter out entities. | None |
--exclude , -e | str | Comma-separated list of dataset IDs containing annotations to exclude. | None |
--unsegmented , -U | bool | Don’t split sentences. | False |
Example
prodigy pos.teach pos_news en_core_web_sm ./news_headlines.jsonl --label VERB,NOUN,PROPN
Dependency Parsing
dep.correct
manual (New: 1.10)
- Interface: relations
- Saves: annotations to the database
- Updates: spaCy model in the loop (if enabled)
- Active learning: no example selection
- Use case: correct a spaCy model's predictions manually
Create gold-standard data for dependency parsing by correcting the model’s suggestions. The spaCy model will be used to predict dependencies for the given labels, which the annotator can remove and correct if necessary. If --update is set, the model in the loop will be updated with the annotations and its updated predictions will be reflected in future batches. The recipe performs no example selection and all texts will be shown as they come in.
prodigy dep.correct dataset spacy_model source --loader --label --update --wrap --unsegmented --exclude
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Prodigy dataset to save annotations to. | |
spacy_model | str | Loadable spaCy model with a dependency parser. | |
source | str | Path to text source, - to read from standard input. | |
--loader , -lo | str | Optional ID of text source loader. If not set, source file extension is used to determine loader. | None |
--label , -l | str | Label(s) to annotate. Accepts single label or comma-separated list. If not set, all available labels will be used. | None |
--update , -U | bool | Whether to update the model in the loop during annotation. | False |
--wrap , -W | bool | Wrap lines in the UI by default (instead of showing tokens in one row). | False |
--unsegmented , -U | bool | Don’t split sentences. | False |
--exclude , -e | str | Comma-separated list of dataset IDs containing annotations to exclude. | None |
Example
prodigy dep.correct deps_news en_core_web_sm ./news_headlines.jsonl --label ROOT,csubj,nsubj,dobj,pobj --update
dep.teach
binary
- Interface: dep
- Saves: annotations to the database
- Updates: spaCy model in the loop
- Active learning: prefers most uncertain scores
- Use case: updating and improving dependency parsing models
Collect the best possible training data for a dependency parsing model with the model in the loop. Based on your annotations, Prodigy will decide which questions to ask next. It’s often more efficient to focus on a few most relevant labels at a time, instead of annotating all labels jointly.
prodigy dep.teach dataset spacy_model source --loader --label --exclude --unsegmented
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Prodigy dataset to save annotations to. | |
spacy_model | str | Loadable spaCy model. | |
source | str | Path to text source or - to read from standard input. | |
--loader , -lo | str | Optional ID of text source loader. If not set, source file extension is used to determine loader. | None |
--label , -l | str | Label(s) to annotate. Accepts single label or comma-separated list. If not set, all available labels will be returned. | None |
--exclude , -e | str | Comma-separated list of dataset IDs containing annotations to exclude. | None |
--unsegmented , -U | bool | Don’t split sentences. | False |
Example
prodigy dep.teach deps_news en_core_web_sm ./news_headlines.jsonl --label csubj,nsubj,dobj,pobj
Coreference Resolution
coref.manual
manual (New: 1.10)
- Interface: relations
- Saves: annotations to the database
- Use case: create annotations for coreference resolution
Create training data for coreference resolution. Coreference resolution is the challenge of linking ambiguous mentions such as “her” or “that woman” back to an antecedent providing more context about the entity in question. This recipe allows you to focus on nouns, proper nouns and pronouns specifically, by disabling all other tokens. You can customize the labels used to extract those using the recipe arguments. Also see the usage guide on coreference annotation.
prodigy coref.manual dataset spacy_model source --loader --label --pos-tags --poss-pron-tags --ner-labels --exclude
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Prodigy dataset to save annotations to. | |
spacy_model | str | Loadable spaCy model with the required capabilities (entity recognizer and part-of-speech tagger) or blank:lang for a blank model (e.g. blank:en for English). | |
source | str | Path to text source, - to read from standard input or dataset:name to load from existing annotations. | |
--loader , -lo | str | Optional ID of text source loader. If not set, source file extension is used to determine loader. | None |
--label , -l | str | Label(s) to use for coreference annotation. Accepts single label or comma-separated list. | "COREF" |
--pos-tags , -ps | str | List of coarse-grained POS tags to enable for annotation. | "NOUN,PROPN,PRON,DET" |
--poss-pron-tags , -pp | str | List of fine-grained tag values for possessive pronoun to use. | "PRP$" |
--ner-labels , -nl | str | List of NER labels to use if model has a named entity recognizer. | "PERSON,ORG" |
--exclude , -e | str | Comma-separated list of dataset IDs containing annotations to exclude. | None |
Example
prodigy coref.manual coref_movies en_core_web_sm ./plot_summaries.jsonl --label COREF
Relations
rel.manual
manual (New: 1.10)
- Interface: relations
- Saves: annotations to the database
- Use case: annotate relations between words and expressions
Annotate directional relations and dependencies between tokens and expressions by selecting the head, child and dependency label and optionally assign labelled spans for named entities or other expressions. This workflow is extremely powerful and can be used for basic dependency annotation, as well as joint named entity and entity relation annotation. If --span-label defines additional span labels, a second mode for span highlighting is added. The recipe lets you take advantage of several efficiency tricks: spans can be pre-defined using an existing NER dataset, entities or noun phrases from a model, or fully custom match patterns. You can also disable certain tokens to make them unselectable. This lets you focus on what matters and prevents annotators from introducing mistakes. For more details and examples, check out the usage guide on custom relation annotation and see the task-specific recipes dep.correct and coref.manual that include pre-defined configurations.
prodigy rel.manual dataset spacy_model source --loader --label --span-label --patterns --disable-patterns --add-ents --add-nps --wrap --exclude
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Prodigy dataset to save annotations to. | |
spacy_model | str | Loadable spaCy model with the required capabilities (if entities or noun phrases should be merged) or blank:lang for a blank model (e.g. blank:en for English). | |
source | str | Path to text source, - to read from standard input or dataset:name to load from existing annotations. | |
--loader , -lo | str | Optional ID of text source loader. If not set, source file extension is used to determine loader. | None |
--label , -l | str | Label(s) to annotate. Accepts single label or comma-separated list. | None |
--span-label , -sl | str | Optional span label(s) to annotate. If set, an additional span highlighting mode is added. | None |
--patterns , -pt | str | Path to patterns file defining spans to be added and merged. | None |
--disable-patterns , -dpt | str | Path to patterns file defining tokens to disable (make unselectable). | None |
--add-ents , -AE | bool | Add entities predicted by the model. | False |
--add-nps , -AN | bool | Add noun phrases (if noun chunks rules are available), based on tagger and parser. | False |
--wrap , -W | bool | Wrap lines in the UI by default (instead of showing tokens in one row). | False |
--exclude , -e | str | Comma-separated list of dataset IDs containing annotations to exclude. | None |
Example
prodigy rel.manual relation_data en_core_web_sm ./data.jsonl --label COREF,OBJECT --span-label PERSON,PRODUCT,NP --disable-patterns ./disable_patterns.jsonl --add-ents --wrap
disable_patterns.jsonl
{"pattern": [{"is_punct": true}]}
{"pattern": [{"pos": "VERB"}]}
{"pattern": [{"lower": {"in": ["'s", "’s"]}}]}
Computer Vision
image.manual
manual
- Interface: image_manual
- Saves: annotations to the database
- Use case: Add bounding boxes and segments to images
Annotate images by drawing rectangular bounding boxes and polygon shapes. Each shape will be added to the task’s "spans" with its label and a "points" property containing the [x, y] pixel coordinate tuples. See the JSONL format documentation for more details. You can click and drag or click and release to draw boxes. Polygon shapes can also be closed by double-clicking when adding the last point, similar to closing a shape in Photoshop or Illustrator. Clicking on the label will select a shape so you can change the label or delete it.
prodigy image.manual dataset source --loader --exclude --width --darken --no-fetch --remove-base64
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Prodigy dataset to save annotations to. | |
source | str | Path to a directory containing image files or pre-formatted JSONL file if --loader jsonl is set. | |
--loader , -lo | str | Optional ID of source loader. | images |
--label , -l | str / Path | One or more labels to annotate. Supports a comma-separated list or a path to a file with one label per line. | |
--exclude , -e | str | Comma-separated list of dataset IDs containing annotations to exclude. | None |
--width , -w | int | New: 1.10 Width of card and maximum image width in pixels. | 675 |
--darken , -D | bool | Darken image to make boxes stand out more. | False |
--no-fetch , -NF | bool | New: 1.9 Don’t fetch images as base64. Ideally requires a JSONL file as input, with --loader jsonl set and all images available as URLs. | False |
--remove-base64 , -R | bool | New: 1.10 Remove base64-encoded image data before storing example in the database and only keep the reference to the local file path. Caution: If enabled, make sure to keep original files! | False |
Example
prodigy image.manual photo_objects ./stock-photos --label LAPTOP,CUP,PLANT
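Each accepted shape is stored in the task’s "spans" with a "points" list of [x, y] pixel coordinates. Deriving an axis-aligned bounding box from a polygon span; the sample span is illustrative:

```python
# Each image shape is saved to the task's "spans" with a "points"
# list of [x, y] pixel coordinates. Sample polygon span (illustrative):
span = {"label": "LAPTOP", "points": [[150, 80], [270, 80], [270, 184], [150, 184]]}
xs = [x for x, y in span["points"]]
ys = [y for x, y in span["points"]]
bbox = (min(xs), min(ys), max(xs), max(ys))  # (x_min, y_min, x_max, y_max)
```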
Audio and Video
audio.manual
manual (New: 1.10)
- Interface: audio_manual
- Saves: annotations to the database
- Use case: Manually annotate audio regions in audio and video files
Manually label regions for the given labels in the audio or video file. The recipe expects a directory of audio files as the source argument and will use the audio loader (default) to load the data. To load video files instead, you can set --loader video. Each added region will be added to the "audio_spans" with a start and end timestamp and the selected label.
prodigy audio.manual dataset source --loader --label --autoplay --keep-base64 --fetch-media --exclude
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Prodigy dataset to save annotations to. | |
source | str | Path to a directory containing audio files or pre-formatted JSONL file if --loader jsonl is set. | |
--loader , -lo | str | Optional ID of source loader, e.g. audio or video . | audio |
--label , -l | str / Path | One or more labels to annotate. Supports a comma-separated list or a path to a file with one label per line. | |
--autoplay , -A | bool | Autoplay the audio when a new task loads. | False |
--keep-base64 , -B | bool | If audio loader is used: don’t remove the base64-encoded audio data from the task before it’s saved to the database. | False |
--fetch-media , -FM | bool | Convert local paths and URLs to base64. Can be enabled if you’re annotating a JSONL file with paths or for re-annotating an existing dataset. | False |
--exclude , -e | str | Comma-separated list of dataset IDs containing annotations to exclude. | None |
Example
prodigy audio.manual speaker_data ./recordings --label SPEAKER_1,SPEAKER_2,NOISE
Example (video loader)
prodigy audio.manual speaker_data ./recordings --loader video --label SPEAKER_1,SPEAKER_2
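Each annotated region ends up in "audio_spans" with start and end timestamps and the selected label. A sketch that sums the labelled time per speaker; the sample spans are illustrative:

```python
from collections import defaultdict

# Regions are stored in "audio_spans" with "start"/"end" timestamps
# (in seconds) and a "label". Sample spans (illustrative):
audio_spans = [
    {"start": 0.0, "end": 3.5, "label": "SPEAKER_1"},
    {"start": 3.5, "end": 5.0, "label": "SPEAKER_2"},
    {"start": 5.0, "end": 7.0, "label": "SPEAKER_1"},
]
totals = defaultdict(float)
for span in audio_spans:
    totals[span["label"]] += span["end"] - span["start"]
```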
audio.transcribe
manual (New: 1.10)
- Interface: blocks / audio / text_input
- Saves: annotations to the database
- Use case: Manually create transcriptions for audio and video files
Manually transcribe audio and video files by typing the transcript into a text field. The recipe expects a directory of audio files as the source argument and will use the audio loader (default) to load the data. To load video files instead, you can set --loader video. The transcript will be stored as the key "transcript". To make it easier to toggle play and pause as you transcribe and to prevent clashes with the text input field (like with the default enter), this recipe lets you customize the keyboard shortcuts. To toggle play/pause, you can press command/option/alt/ctrl+enter or provide your own overrides via --playpause-key.
prodigy audio.transcribe dataset source --loader --label --autoplay --keep-base64 --fetch-media --playpause-key --text-rows --exclude
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Prodigy dataset to save annotations to. | |
source | str | Path to a directory containing audio files or pre-formatted JSONL file if --loader jsonl is set. | |
--loader , -lo | str | Optional ID of source loader, e.g. audio or video . | audio |
--autoplay , -A | bool | Autoplay the audio when a new task loads. | False |
--keep-base64 , -B | bool | If audio loader is used: don’t remove the base64-encoded audio data from the task before it’s saved to the database. | False |
--fetch-media , -FM | bool | Convert local paths and URLs to base64. Can be enabled if you’re annotating a JSONL file with paths or for re-annotating an existing dataset. | False |
--playpause-key , -pk | str | Alternative keyboard shortcuts to toggle play/pause so it doesn’t conflict with text input field. | "command+enter, option+enter, ctrl+enter" |
--text-rows , -tr | int | Height of the text input field, in rows. | 4 |
--exclude , -e | str | Comma-separated list of dataset IDs containing annotations to exclude. | None |
Example
prodigy audio.transcribe speaker_transcripts ./recordings --text-rows 3
Training models
train
command (New: 1.9)
- Interface: terminal only
- Saves: trained model to a directory (optional)
- Use case: run training experiments
Train a model component (NER, text classification, tagger or parser) using one or more Prodigy datasets with annotations. The recipe calls into spaCy directly and can update an existing model or train a new model from scratch. Datasets will be merged and conflicts will be filtered out. If your data contains potentially conflicting annotations, it’s recommended to first use review to resolve them. If you specify an --output directory, the best model will be saved at the end. You can then load it into spaCy by pointing spacy.load at the directory.
prodigy train component datasets spacy_model --init-tok2vec --output --eval-id --eval-split --n-iter --batch-size --dropout --factor --textcat-exclusive --ner-missing --binary --silent
Argument | Type | Description | Default |
---|---|---|---|
component | str | The component to train: ner , textcat , tagger or parser . | |
datasets | str | One or more (comma-separated) dataset names to train from. | |
spacy_model | str | Loadable spaCy model or blank:lang to start with a blank model (e.g. blank:en for English). | |
--init-tok2vec , -t2v | str | Optional path to pretrained weights for the token-to-vector parts of the model to use transfer learning. | None |
--output , -o | str | Optional path to output directory. | None |
--eval-id , -e | str | Optional ID of a dataset containing evaluation examples. | None |
--eval-split , -es | float | If no evaluation ID is provided, split off a portion of the training examples for evaluation. Defaults to 0.2 for over 1000 examples and 0.5 for under 100 examples. | None |
--n-iter , -n | int | Number of training iterations. | 10 |
--batch-size , -b | int | Training batch size or -1 for compounding batch size. | -1 |
--dropout , -d | float | Dropout rate. | 0.2 |
--factor , -f | float | Portion of the examples to train on, e.g 0.5 for 50%. Mostly used for train-curve . | 1.0 |
--textcat-exclusive , -TE | bool | For text classification: treat classes as mutually exclusive. If False , an example may have multiple correct classes. | False |
--ner-missing , -NM | bool | For NER: assume unannotated spans are missing values, not outside an entity. | False |
--binary , -B | bool | For NER, tagging and parsing: update from binary accept/reject annotations collected with ner.teach , pos.teach or dep.teach . | False |
--silent , -S | bool | Don’t print any updates. | False |
Example
prodigy train ner fashion_brands_training en_vectors_web_lg --eval-id fashion_brands_eval
✔ Loaded model 'en_vectors_web_lg'
Created and merged data for 1235 total examples
Created and merged data for 500 total examples
Using 1235 train / 500 eval (from 'fashion_brands_eval')
Component: ner | Batch size: compounding | Dropout: 0.2 | Iterations: 10
ℹ Baseline accuracy: 0.000
=========================== ✨ Training the model ===========================
#    Loss      Precision  Recall   F-Score
--   --------  ---------  -------  -------
1    1254.70   72.381     63.866   67.857
2    575.05    70.079     74.790   72.358
3    349.95    67.293     75.210   71.032
4    255.48    65.455     75.630   70.175
5    293.42    64.643     76.050   69.884
6    260.93    64.643     76.050   69.884
7    203.03    64.311     76.471   69.866
8    139.05    64.085     76.471   69.732
9    122.61    63.986     76.891   69.847
10   73.69     63.763     76.891   69.714
============================ ✨ Results summary ============================
Label          Precision  Recall  F-Score
-------------  ---------  ------  -------
FASHION_BRAND  70.079     74.790  72.358
Best F-Score   72.358
Baseline       0.000
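When no --eval-id dataset is given, a portion of the merged examples is held out for evaluation according to --eval-split. A simplified sketch of that idea (not Prodigy’s actual implementation), assuming a shuffled 80/20 split for datasets with over 1000 examples:

```python
import random

# Hold out part of the training examples for evaluation, mirroring
# the --eval-split behaviour in spirit (a sketch, not Prodigy's code).
examples = list(range(1200))  # stand-ins for annotation dicts
random.Random(0).shuffle(examples)  # fixed seed for reproducibility
eval_split = 0.2  # default for datasets with over 1000 examples
n_eval = int(round(len(examples) * eval_split))
eval_set, train_set = examples[:n_eval], examples[n_eval:]
```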
train-curve
command
New: 1.9
- Interface: terminal only
- Use case: test how accuracy improves with more data
Train a model component (NER, text classification, tagger or parser) with
different portions of the training examples and print the accuracy figures and
accuracy improvements with more data. This recipe takes pretty much the same
arguments as train
. --n-samples
sets the number of sample models to
train at different stages. For instance, 10
will train models for 10% of the
examples, 20%, 30% and so on. This recipe is useful to determine the quality of
the collected annotations, and whether more training examples will improve the
accuracy. As a rule of thumb, if accuracy improves within the last 25%, training
with more examples will likely result in better accuracy.
prodigy train-curve component datasets spacy_model --init-tok2vec --eval-id --eval-split --n-iter --batch-size --dropout --n-samples --textcat-exclusive --ner-missing --binary
Argument | Type | Description | Default |
---|---|---|---|
component | str | The component to train: ner , textcat , tagger or parser . | |
datasets | str | One or more (comma-separated) dataset names to train from. | |
spacy_model | str | Loadable spaCy model or blank:lang to start with a blank model (e.g. blank:en for English). | |
--init-tok2vec , -t2v | str | Optional path to pretrained weights for the token-to-vector parts of the model to use transfer learning. | None |
--eval-id , -e | str | Optional ID of a dataset containing evaluation examples. | None |
--eval-split , -es | float | If no evaluation ID is provided, split off a portion of the training examples for evaluation. Defaults to 0.2 for over 1000 examples and 0.5 for under 100 examples. | None |
--n-iter , -n | int | Number of training iterations. | 10 |
--batch-size , -b | int | Training batch size or -1 for compounding batch size. | -1 |
--dropout , -d | float | Dropout rate. | 0.2 |
--n-samples , -ns | int | Number of samples to train, e.g. 10 for results at 10%, 20% and so on. | 4 |
--textcat-exclusive , -TE | bool | For text classification: treat classes as mutually exclusive. If False , an example may have multiple correct classes. | False |
--ner-missing , -NM | bool | For NER: assume unannotated spans are missing values, not outside an entity. | False |
--binary , -B | bool | For NER, tagging and parsing: update from binary accept/reject annotations collected with ner.teach , pos.teach or dep.teach . | False |
Example
prodigy train-curve ner news_headlines en_vectors_web_lg --n-iter 10

✔ Starting with model 'en_vectors_web_lg'
Training 4 times with 25%, 50%, 75%, 100% of the data

============================== ✨ Train curve ==============================

%      Accuracy   Difference
----   --------   ----------
0%     0.33       baseline
25%    0.73       +0.40
50%    0.67       -0.07
75%    0.80       +0.13
100%   0.91       +0.11

✔ Accuracy improved in the last sample
As a rule of thumb, if accuracy increases in the last segment, this could indicate that collecting more annotations of the same type might improve the model further.
data-to-spacy
command
New: 1.9
- Interface: terminal only
- Saves: training and evaluation data in spaCy's JSON format
- Use case: merge annotations and export a training corpus
Combine multiple datasets, merge annotations on the same examples and output a
file in spaCy’s JSON format that you
can use with spacy train
. If an
eval_output
is provided, the --eval-split
will be used to split the examples
into training and evaluation data. This recipe merges annotations for the
different model components and outputs a combined training corpus. If an example
is only present in one dataset type, its annotations for the other components
will be missing values. It’s recommended to use the review
recipe on the
different annotation types first to resolve conflicts properly.
prodigy data-to-spacy output eval_output --lang --ner --textcat --tagger --parser --textcat-exclusive --ner-missing --eval-split --base-model
Argument | Type | Description | Default |
---|---|---|---|
output | str | Path to output JSON file. | |
eval_output | str | Optional path to evaluation data JSON file. If not set, no evaluation data will be created. | |
--lang , -l | str | Two-letter language code to use for tokenization. | "en" |
--ner , -n | str | Comma-separated names of datasets to use for named entity recognition. | |
--textcat , -tc | str | Comma-separated names of datasets to use for text classification. | |
--tagger , -t | str | Comma-separated names of datasets to use for part-of-speech tagging annotations. | |
--parser , -p | str | Comma-separated names of datasets to use for dependency parsing annotations. | |
--textcat-exclusive , -TE | bool | For text classification: treat classes as mutually exclusive. If False , an example may have multiple correct classes. | False |
--ner-missing , -NM | bool | For NER: assume unannotated spans are missing values, not outside an entity. | False |
--eval-split , -es | float | If an eval_output is provided, split off a portion of the training examples for evaluation. Defaults to 0.2 for over 1000 examples and 0.5 for under 100 examples. | None |
--base-model , -m | str | New: 1.10 Optional spaCy model for tokenization and custom sentencizer . | None |
Example
prodigy data-to-spacy ./train-data.json ./eval-data.json --lang en --ner news_ner_person,news_ner_org,news_ner_product --textcat news_cats2018,news_cats2019 --eval-split 0.3

✔ Saved 1050 examples to ./eval-data.json
✔ Saved 2450 examples to ./train-data.json
Training in spaCy
spacy train en ./model ./train-data.json ./eval-data.json --pipeline ner,textcat --n-iter 20 --textcat-multilabel
Vectors and Terminology
terms.teach
binary
- Interface:
text
- Saves: accepted and rejected terms to the database
- Updates: target vector used for similarity comparison
- Use case: building terminology lists and pre-processing candidates for NER training
Build a terminology list interactively using a model’s word vectors and seed terms, either a comma-separated list or a text file containing one term per line. Based on the seed terms, a target vector is created and only terms similar to that target vector are shown. As you annotate, the recipe iterates over the vector model’s vocab and updates the target vector with the words you accept.
prodigy terms.teach dataset vectors --seeds --resume
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Prodigy dataset to save annotations to. | |
vectors | str | Loadable spaCy model with word vectors and a vocab, e.g. en_core_web_lg or en_vectors_web_lg , or custom vectors trained on domain-specific text. | |
--seeds , -s | str / Path | Comma-separated list or path to file with seed terms (one term per line). | '' |
--resume , -R | bool | Resume from existing terms dataset and update target vector accordingly. | False |
Example
prodigy terms.teach prog_lang_terms en_vectors_web_lg --seeds Python,C++,Ruby
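The core loop can be sketched in plain Python: keep a target vector that is the mean of the accepted terms’ vectors, and rank the remaining vocabulary by cosine similarity to it. This is a minimal illustration with toy 2-dimensional vectors, not Prodigy’s actual implementation:

```python
from math import sqrt

# Toy 2-dimensional "word vectors" — a real model has hundreds of dimensions.
vocab = {
    "python": (0.9, 0.1),
    "ruby": (0.8, 0.2),
    "banana": (0.1, 0.9),
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norms

def target_vector(accepted):
    # The target vector is the mean of all accepted terms' vectors.
    vecs = [vocab[t] for t in accepted]
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))

target = target_vector(["python"])
# Rank the remaining vocabulary by similarity to the target vector.
candidates = sorted(
    (t for t in vocab if t != "python"),
    key=lambda t: cosine(vocab[t], target),
    reverse=True,
)
print(candidates)  # ['ruby', 'banana']
```

In the real recipe, the vocabulary comes from the loaded spaCy model and the target vector is updated after every accepted answer.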
terms.to-patterns
command
- Interface: terminal only
- Saves: JSONL-formatted patterns file
- Use case: Convert terms dataset to match patterns to bootstrap annotation or for spaCy's entity ruler
Convert a dataset collected with terms.teach
or
sense2vec.teach
to a JSONL-formatted patterns file. You can
optionally provide a spaCy model for tokenization to create token-based patterns
and make them case-insensitive. If no model is provided, the patterns will be
generated as exact string matches. Pattern files can be used in Prodigy to
bootstrap annotation and pre-highlight suggestions, for example in
ner.manual
. You can also use them with
spaCy’s EntityRuler
for rule-based named entity recognition.
prodigy terms.to-patterns dataset output_file --label --spacy-model --case-sensitive
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Prodigy dataset ID to convert. | |
output_file | str | Optional path to an output file. | sys.stdout |
--label , -l | str | Label to assign to the patterns. | None |
--spacy-model , -m | str | New: 1.9 Optional spaCy model for tokenization to create token-based patterns, or blank:lang to start with a blank model (e.g. blank:en for English). | None |
--case-sensitive , -CS | bool | New: 1.9 Make patterns case-sensitive. | False |
Example
prodigy terms.to-patterns prog_lang_terms ./prog_lang_patterns.jsonl --label PROGRAMMING_LANGUAGE --spacy-model blank:en

✨ Exported 59 patterns
./prog_lang_patterns.jsonl
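The exported file contains one JSON object per line, in the pattern shape also used by spaCy’s Matcher and EntityRuler. A minimal sketch of the two output styles — the exact token attributes Prodigy emits may differ, and multi-token terms would produce one token entry per token:

```python
import json

terms = ["Python", "C++", "Ruby"]
label = "PROGRAMMING_LANGUAGE"

# Token-based, case-insensitive patterns (one JSON object per line).
token_patterns = [
    {"label": label, "pattern": [{"lower": term.lower()}]} for term in terms
]
# Exact string-match patterns, as generated without a tokenization model.
string_patterns = [{"label": label, "pattern": term} for term in terms]

lines = [json.dumps(p) for p in token_patterns + string_patterns]
print("\n".join(lines))
```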
Review and Evaluate
review
New: 1.8
- Interface:
review
- Saves: reviewed master annotations to the database
- Use case: review annotations by multiple annotators and resolve conflicts
Review existing annotations created by multiple annotators and resolve potential
conflicts by creating one final “master annotation”. Can be used for both binary
and manual annotations and supports all interfaces except image_manual
and compare
. If the annotations were created with a manual interface,
the “most popular” version, e.g. the version most sessions agreed on, will be
pre-selected automatically.
prodigy review dataset in_sets --label --view-id --fetch-media
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Prodigy dataset ID to save reviewed annotations. | |
in_sets | str | Comma-separated names of datasets to review. | |
--label , -l | str | Optional comma-separated labels to display in manual annotation mode. | None |
--view-id , -v | str | Interface to use if none present in the task, e.g. ner or ner_manual . | None |
--fetch-media , -FM | bool | New: 1.10 Temporarily replace paths and URLs with base64 strings so they can be reannotated. Will be removed again before examples are placed in the database. | False |
Example
prodigy review food_reviews_final food_reviews2019,food_reviews2018
compare
Compare the output of your model and the output of a baseline on the same
inputs. To prevent bias during annotation, Prodigy will randomly decide which
output to suggest as the correct answer. When you exit the application, you’ll
see detailed stats, including the preferred output. Expects two JSONL files
where each entry has an "id"
(to match up the outputs on the same input), and
an "input"
and "output"
object with the content to render, e.g. the
"text"
.
prodigy compare dataset a_file b_file --no-random --diff
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Prodigy dataset to save annotations to. | |
a_file | str | First file to compare, e.g. system responses. | |
b_file | str | Second file to compare, e.g. baseline responses. | |
--no-random , -nr | bool | Don’t randomize which annotation is shown as the “correct” suggestion (always use the first option). | False |
--diff , -D | bool | Show examples as visual diff. | False |
prodigy compare eval_translation ./model_a.jsonl ./model_b.jsonl
model_a.jsonl
{"id": 1, "input": {"text": "FedEx von weltweiter Cyberattacke getroffen"}, "output": {"text": "FedEx hit by worldwide cyberattack"}}

model_b.jsonl
{"id": 1, "input": {"text": "FedEx von weltweiter Cyberattacke getroffen"}, "output": {"text": "FedEx from worldwide Cyberattacke hit"}}
Other Utilities and Commands
mark
binary
- Interface: n/a
- Saves: annotations to the database
- Use case: show data and accept or reject examples
Start the annotation server, display whatever comes in with a given interface
and collect binary annotations. At the end of the annotation session, a
breakdown of the answer counts is printed. The --view-id
lets you specify one
of the existing annotation interfaces – just make sure
your input data includes everything the interface needs, since this recipe does
no preprocessing and will just show you whatever is in the data. The recipe is
also very useful if you want to re-annotate data exported with db-out
.
prodigy mark dataset source --loader --label --view-id --exclude
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Prodigy dataset to save annotations to. | |
source | str | Path to text source or - to read from standard input. | |
--loader , -lo | str | Optional ID of text source loader. If not set, source file extension is used to determine loader. | None |
--label , -l | str | Label to apply in classification mode or comma-separated labels to show for manual annotation. | '' |
--view-id , -v | str | Annotation interface to use. | None |
--exclude , -e | str | Comma-separated list of dataset IDs containing annotations to exclude. | None |
prodigy mark news_marked ./news_headlines.jsonl --label INTERESTING --view-id classification
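Because mark does no preprocessing, each line of the source file must already contain everything the chosen interface needs. A hypothetical minimal input for the classification view used above could be generated like this (the headlines and file name are made up for illustration):

```python
import json

# Each task carries the text plus the label the classification
# interface should display.
tasks = [
    {"text": "Uber's Lawsuit Could Reshape the Gig Economy", "label": "INTERESTING"},
    {"text": "Local Man Wins Annual Pie Contest", "label": "INTERESTING"},
]
with open("news_headlines.jsonl", "w", encoding="utf8") as f:
    for task in tasks:
        f.write(json.dumps(task) + "\n")
```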
match
binary
New: 1.9.8
- Interface: n/a
- Saves: annotations to the database
- Use case: select examples based on match patterns
Select examples based on match patterns and
accept or reject the result. Unlike ner.manual
with patterns, this recipe
will only show examples if they contain pattern matches. It can be used for NER
and text classification annotations – for instance, to bootstrap a text category
if the classes are very imbalanced and not enough positive examples are
presented during manual annotation or textcat.teach
. The --label-task
and --label-span
flags can be used to specify where the label should be added.
This will also be reflected via the "label"
property (on the top-level task or
the spans) in the data you create with the recipe. If --combine-matches
is
set, all matches will be presented together. Otherwise, each match will be
presented as a separate task.
prodigy match dataset spacy_model source --loader --label --patterns --label-task --label-span --combine-matches --exclude
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Prodigy dataset to save annotations to. | |
spacy_model | str | Loadable spaCy model for tokenization to initialize the matcher, or blank:lang for a blank model (e.g. blank:en for English). | |
source | str | Path to text source or - to read from standard input. | |
--loader , -lo | str | Optional ID of text source loader. If not set, source file extension is used to determine loader. | None |
--label , -l | str | Comma-separated label(s) to annotate or text file with one label per line. Only pattern matches for those labels will be shown. | |
--patterns , -pt | str | Path to match patterns file. | |
--label-task , -LT | bool | Whether to add a label to the top-level task if a match for that label was found. For example, if you use this recipe for text classification, you typically want to add a label to the whole task. | False |
--label-span , -LS | bool | Whether to add a label to the matched span that’s highlighted. For example, if you use this recipe for NER, you typically want to add a label to the span but not the whole task. | False |
--combine-matches , -C | bool | Whether to show all matches in one task. If False , the matcher will output one task for each match and duplicate tasks if necessary. | False |
--exclude , -e | str | Comma-separated list of dataset IDs containing annotations to exclude. | None |
prodigy match news_matched blank:en ./news_headlines.jsonl --patterns ./news_patterns.jsonl --label ORG,PRODUCT --label-span
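The selection logic can be illustrated with a simplified sketch: only texts that contain at least one pattern match become tasks. The real recipe runs spaCy’s Matcher over properly tokenized text rather than a whitespace split, and the pattern entries below are hypothetical:

```python
# Hypothetical patterns in the same shape as a patterns file entry.
patterns = [
    {"label": "ORG", "pattern": [{"lower": "apple"}]},
    {"label": "PRODUCT", "pattern": [{"lower": "iphone"}]},
]

def find_matches(text):
    # Naive single-token matching on a whitespace split.
    matches = []
    for i, token in enumerate(text.split()):
        for p in patterns:
            if token.lower() == p["pattern"][0]["lower"]:
                matches.append({"token": i, "label": p["label"]})
    return matches

stream = ["Apple unveils a new iPhone", "Nothing to see here"]
# Only examples with at least one match are turned into tasks.
tasks = [
    {"text": text, "spans": matches}
    for text in stream
    if (matches := find_matches(text))
]
print(len(tasks))  # 1
```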
print-stream
command
New: 1.9
- Interface: terminal only
- Use case: quickly view a spaCy model's predictions
Pretty-print the model’s predictions on the command line. Supports named
entities and text categories and will display the annotations if the model
components are available. For textcat annotations, only the category with the
highest score is shown if the score is greater than 0.5
.
prodigy print-stream spacy_model source --loader
Argument | Type | Description | Default |
---|---|---|---|
spacy_model | str | Loadable spaCy model. | |
source | str | Path to text source or - to read from standard input. | |
--loader , -lo | str | Optional ID of text source loader. If not set, source file extension is used to determine loader. | None |
print-dataset
command
New: 1.9
- Interface: terminal only
- Use case: quickly inspect collected annotations
Pretty-print annotations from a given dataset on the command line. Supports
plain text, text classification and NER annotations. If no --style
is
specified, Prodigy will try to infer it from the data via the "_view_id"
that’s automatically added since v1.8.
prodigy print-dataset dataset --style
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Prodigy dataset ID. | |
--style , -s | str | Dataset type: auto (try to infer from the data, default), text , spans or textcat . | auto |
db-out
command
- Interface: terminal only
- Saves: JSONL file to disk
- Use case: export annotated data
Export annotations in Prodigy’s JSONL format. If the output directory doesn’t exist, it will be created. If no output directory is specified, the data will be printed so it can be redirected to a file.
prodigy db-out dataset out_dir --dry
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Dataset ID to import or export. | |
out_dir | str | Optional path to output directory to export annotation file to. | None |
--dry , -D | bool | Perform a dry run and don’t save any files. | False |
Example
prodigy db-out news_headlines > ./news_headlines.jsonl
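Each exported line is a JSON task that includes the original input plus the "answer". A downstream script might, for instance, filter for accepted examples. The lines below are hypothetical; real db-out output contains additional metadata such as hashes, timestamps and session IDs:

```python
import json

# Hypothetical exported JSONL lines.
exported = [
    '{"text": "Headline A", "label": "INTERESTING", "answer": "accept"}',
    '{"text": "Headline B", "label": "INTERESTING", "answer": "reject"}',
]
examples = [json.loads(line) for line in exported]
# Keep only examples the annotator accepted.
accepted = [eg for eg in examples if eg["answer"] == "accept"]
print(len(accepted))  # 1
```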
db-merge
command
- Interface: terminal only
- Saves: merged examples to the database
- Use case: merging multiple datasets with annotations into one
Merge two or more existing datasets into a new set, e.g. to create a final dataset that can be reviewed or used to train a model. Keeps a copy of the original datasets and creates a new set for the merged examples.
prodigy db-merge in_sets out_set --rehash --dry
Argument | Type | Description | Default |
---|---|---|---|
in_sets | str | Comma-separated names of datasets to merge. | |
out_set | str | Name of dataset to save the merged examples to. | |
--rehash , -R | bool | New: 1.10 Force-update all hashes assigned to examples. | False |
--dry , -D | bool | Perform a dry run and don’t save anything. | False |
prodigy db-merge news_person,news_org,news_product news_training
✔ Merged 2893 examples from 3 datasets
Created merged dataset 'news_training'
db-in
command
- Interface: terminal only
- Saves: imported examples to the database
- Use case: importing existing annotated data
Import existing annotations to the database. Can load all file types supported by Prodigy. To import NER annotations, the files should be converted into Prodigy’s JSONL annotation format.
prodigy db-in dataset in_file --rehash --dry
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Dataset ID to import or export. | |
in_file | str | Path to input annotation file. | |
--rehash , -rh | bool | Update and overwrite all hashes. | False |
--dry , -D | bool | Perform a dry run and don’t save any files. | False |
drop
command
- Interface: terminal only
- Saves: updated database
- Use case: remove datasets and sessions
Remove a dataset or annotation session from a project. Can’t be undone. To see
all dataset and session IDs in the database, use prodigy stats -ls
.
prodigy drop dataset --batch-size
Argument | Type | Description |
---|---|---|
dataset | str | Dataset or session ID. |
--batch-size , -n | int | Delete examples in batches of the given size. Prevents possible database error for large datasets. |
stats
command
- Interface: terminal only
- Use case: view installation details and database statistics
Print Prodigy and database statistics. Specifying a dataset ID will show detailed stats for the dataset, like annotation counts and meta data. You can also choose to list all available dataset or session IDs.
prodigy stats dataset -l -ls --no-format
Argument | Type | Description | Default |
---|---|---|---|
dataset | str | Optional Prodigy dataset ID. | |
--list-datasets , -l | bool | List IDs of all datasets in the database. | False |
--list-sessions , -ls | bool | List IDs of all datasets and sessions in the database. | False |
--no-format , -nf | bool | Don’t pretty-print the stats and print a simple dict instead. | False |
Example
prodigy stats news_headlines -l

============================= ✨ Prodigy Stats =============================

Version          1.9.0
Database Name    SQLite
Database Id      sqlite
Total Datasets   4
Total Sessions   23

================================ ✨ Datasets ================================

news_headlines, news_headlines_eval, github_docs, test

============================= ✨ Dataset Stats =============================

Dataset       news_headlines
Created       2017-07-29 15:29:28
Description   Annotate news headlines
Author        Ines
Annotations   1550
Accept        671
Reject        435
Ignore        444
prodigy
command
- Interface: terminal only
- Use case: Run recipe scripts
Run a built-in or custom Prodigy recipe. The -F
option lets you load a recipe
from a simple Python file, containing one or more recipe functions. All recipe
arguments will be available from the command line. To print usage info and a
list of available arguments, use the --help
flag.
prodigy recipe_name ...recipe arguments -F
Argument | Type | Description |
---|---|---|
recipe_name | str | Recipe name. |
*recipe_arguments | any | Recipe arguments. |
-F | str | Path to recipe file to load custom recipe. |
--help , -h | bool | Show help message and available arguments. |
Example
prodigy custom-recipe my_dataset ./data.jsonl --custom-opt 123 -F recipe.py
recipe.py (pseudocode)

import prodigy
from prodigy.components.loaders import JSONL

@prodigy.recipe(
    "custom-recipe",
    dataset=("The dataset", "positional", None, str),
    source_file=("A positional argument", "positional", None, str),
    custom_opt=("An option", "option", "co", int)
)
def custom_recipe_function(dataset, source_file, custom_opt=10):
    stream = JSONL(source_file)
    print("Custom option passed in via command line:", custom_opt)
    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "text"
    }
Deprecated recipes
The following recipes have been deprecated in favor of newer workflows and
best practices. See the table for details and replacements. You can still use
the deprecated recipes in Prodigy, but we won’t keep updating them. To view the
recipe details and documentation, run the recipe command with the --help
flag.
For example prodigy ner.match --help
.
ner.match | This recipe has been deprecated in favor of ner.manual with --patterns , which lets you match patterns and allows editing the results at the same time, and the general purpose match , which lets you match patterns and accept or reject the result. |
ner.eval | This recipe has been deprecated in favor of creating regular gold-standard evaluation sets with ner.manual (fully manual) or ner.correct (semi-automatic). |
ner.print-stream | This recipe has been deprecated in favor of the general-purpose print-stream command that can print streams of all supported types. |
ner.print-dataset | This recipe has been deprecated in favor of the general-purpose print-dataset command that can print datasets of all supported types. |
ner.gold-to-spacy | This recipe has been deprecated in favor of data-to-spacy , which can take multiple datasets of different types (e.g. NER and text classification) and outputs a JSON file in spaCy’s training format that can be used with spacy train . |
ner.iob-to-gold | This recipe has been deprecated because it only served a very limited purpose. To convert IOB annotations, you can either use spacy convert or write a custom script. |
ner.batch-train | This recipe will be deprecated in favor of the general-purpose train recipe that supports all components and can be used with binary accept/reject annotations by setting the --binary flag. |
ner.train-curve | This recipe will be deprecated in favor of the general-purpose train-curve recipe that supports all components. |
textcat.eval | This recipe has been deprecated in favor of creating regular gold-standard evaluation sets with textcat.manual . |
textcat.print-stream | This recipe has been deprecated in favor of the general-purpose print-stream command that can print streams of all supported types. |
textcat.print-dataset | This recipe has been deprecated in favor of the general-purpose print-dataset command that can print datasets of all supported types. |
textcat.batch-train | This recipe will be deprecated in favor of the general-purpose train recipe that supports all components and works with both binary accept/reject annotations and multiple choice annotations out-of-the-box. |
textcat.train-curve | This recipe will be deprecated in favor of the general-purpose train-curve recipe that supports all components. |
pos.gold-to-spacy | This recipe has been deprecated in favor of data-to-spacy , which can take multiple datasets of different types (e.g. POS tags and NER) and outputs a JSON file in spaCy’s training format that can be used with spacy train . |
pos.batch-train | This recipe will be deprecated in favor of the general-purpose train recipe that supports all components and can be used with binary accept/reject annotations by setting the --binary flag. |
pos.train-curve | This recipe will be deprecated in favor of the general-purpose train-curve recipe that supports all components. |
dep.batch-train | This recipe will be deprecated in favor of the general-purpose train recipe that supports all components and can be used with binary accept/reject annotations by setting the --binary flag. |
dep.train-curve | This recipe will be deprecated in favor of the general-purpose train-curve recipe that supports all components. |
terms.train-vectors | This recipe has been deprecated since wrapping word vector training in a recipe only introduces a layer of unnecessary abstraction. If you want to train your own vectors, use GloVe, fastText or Gensim directly and then add the vectors to a spaCy model. |
image.test | This recipe has been deprecated since it was mostly intended to demonstrate the new image capabilities on launch. For a real-world example of using Prodigy for object detection with a model in the loop, see this TensorFlow tutorial. |
pipe | This command has been deprecated since it didn’t provide any Prodigy-specific functionality. To pipe data forward, you can convert the data to JSONL and run cat data.jsonl | prodigy ... or write a custom loader. |
dataset | This command has been deprecated since it’s mostly redundant. If a dataset doesn’t exist in the database, it’s added automatically. |