Get started

Installation & Setup

Installation

The download link lets you choose between Prodigy wheels for all supported platforms: macOS, Linux and Windows on Python 3.6+. You can download them all and keep them somewhere – they’re all yours! Whenever a new version of Prodigy is available, you’ll be able to download it using the same download link.

Screenshot of download links

Wheel installers are basically pre-compiled Python package installers. You can install them like any other Python package by pointing pip install at the local path of the .whl file you downloaded. It’s recommended to use a new virtual environment for installing Prodigy.

pip install ./prodigy*.whl

The above command will install the Prodigy package in your current environment, create your Prodigy home directory and register the commands prodigy and pgy (a shortcut).

Double-check that you’ve downloaded the correct wheel and make sure your version of pip is up to date. If it still doesn’t work, check if the file name matches your platform (distutils.util.get_platform()) and rename the file if necessary. For more details, see this article on Python wheels. Essentially, Python wheels are only archives containing the source files – so you can also just unpack the wheel and place the contained prodigy package in your site-packages directory.

On installation, Prodigy will set up the alias commands prodigy and pgy. This should work fine and out-of-the-box on most macOS and Linux setups. If not, you can always run the commands by prefixing them with python -m, for example: python -m prodigy stats. Alternatively, you can create your own alias, and add it to your .bashrc to make it permanent:

alias prodigy="python -m prodigy"

On Windows you can also create a Windows activation script as a .ps1 file and run it in your environment, or add it to your PowerShell profile. See this thread for more details.

Function global:pdgy { python -m prodigy $args}
Function global:prodigy { python -m prodigy $args}
Function global:spacy { python -m spacy $args}

By default, Prodigy starts the web server on localhost and port 8080. If you’re running Prodigy via a Docker container or a similar containerized environment, you’ll have to set the host to 0.0.0.0. Simply edit your prodigy.json and add the following:

{
  "host": "0.0.0.0"
}

See this thread for more details and background. The above approach should also work in other environments if you come across the following error on startup:

OSError: [Errno 99] Cannot assign requested address

Using and installing spaCy models

To use Prodigy’s built-in recipes for NER or text classification, you’ll also need to install a spaCy model – for example, the small English model, en_core_web_sm (around 34 MB). Note that Prodigy currently requires Python 3.6+ and the latest spaCy v2.2.

python -m spacy download en_core_web_sm

If you have trained your own spaCy model, you can load them into Prodigy using the path to the model directory. You can also use the spacy package command to turn it into a Python package, and install it in your current environment. All Prodigy recipes that allow a spacy_model argument can either take the name of an installed model package, or the path to a valid model package directory. Keep in mind that a new minor version of spaCy also means that you need to retrain your models. For example, models trained with spaCy v2.1 are not going to be compatible with v2.2.

Using Prodigy with JupyterLab

If you’re using JupyterLab, you can install our jupyterlab-prodigy extension. It lets you execute recipe commands in notebook cells and opens the annotation UI in a JupyterLab tab so you don’t need to leave your notebook to annotate data.

jupyter labextension install jupyterlab-prodigy

Configuration

When you first run Prodigy, it will create a folder .prodigy in your home directory. By default, this will be the location where Prodigy looks for its configuration file, prodigy.json. You can change this directory via the environment variable PRODIGY_HOME:

PRODIGY_HOME=/custom/prodigy/home

When you run Prodigy, it will first check if a global configuration file exists. It will also check the current working directory for a prodigy.json or .prodigy.json. This allows you to overwrite specific settings on a project-by-project basis. The following settings can be defined in your config file, or in the "config" returned by a recipe.

prodigy.json{
  "theme": "basic",
  "custom_theme": {},
  "buttons": ["accept", "reject", "ignore", "undo"],
  "batch_size": 10,
  "history_size": 10,
  "port": 8080,
  "host": "localhost",
  "cors": true,
  "db": "sqlite",
  "db_settings": {},
  "api_keys": {},
  "validate": true,
  "auto_exclude_current": true,
  "instant_submit": false,
  "feed_overlap": true,
  "ui_lang": "en",
  "project_info": ["dataset", "session", "lang", "recipe_name", "view_id", "label"],
  "show_stats": false,
  "hide_meta": false,
  "show_flag": false,
  "instructions": false,
  "swipe": false,
  "split_sents_threshold": false,
  "html_template": false,
  "global_css": null,
  "javascript": null,
  "writing_dir": "ltr",
  "show_whitespace": false
}
SettingDescriptionDefault
themeName of UI theme to use."basic"
custom_themeCustom UI theme overrides, keyed by name.{}
buttonsNew: 1.10 Buttons to show at the bottom of the screen in order. If an answer button is disabled, the user will be unable to submit this answer and the keyboard shortcut will be disabled as well. Only the "undo" action will stay available via keyboard shortcut or click on the sidebar history.["accept", "reject", "ignore", "undo"]
batch_sizeNumber of tasks to return to and receive back from the web app at once. A low batch size means more frequent updates.10
history_sizeNew: 1.10 Maximum number of examples to show in the sidebar history. Defaults to the value of batch_size. Note that the history size can’t be larger than the batch size.10
portPort to use for serving the web application. Can be overwritten by the PRODIGY_PORT environment variable.8080
hostHost to use for serving the web application. Can be overwritten by the PRODIGY_HOST environment variable."localhost"
corsEnable or disable cross-origin resource sharing (CORS) to allow the REST API to receive requests from other domains.true
dbName of database to use."sqlite"
db_settingsAdditional settings for the respective databases.{}
api_keysDeprecated: Live API keys, keyed by API loader ID.{}
validateValidate incoming tasks and raise an error with more details if the format is incorrect.true
force_stream_orderNew: 1.9 Always send out tasks in the same order and re-send them until they’re answered, even if the app is refreshed in the browser.false
auto_exclude_currentAutomatically exclude examples already present in current dataset.true
instant_submitInstantly submit a task after it’s answered in the app, skipping the history and immediately triggering the update callback if available.false
feed_overlapWhether to send out each example once so it’s annotated by someone (false) or whether to send out each example to every session (true, default). Should be used with custom user sessions set via the app (via /?session=user_name).true
ui_langNew: 1.10 Language of descriptions, messages and tooltips in the UI. See here for available translations."en"
project_infoNew: 1.10 Project info shown in sidebar, if available. The list of string IDs can be modified to hide or re-order the items.["dataset", "session", "lang", "recipe_name", "view_id", "label"]
show_statsShow additional stats, like annotation decision counts, in the sidebar.true
hide_metaHide the meta information displayed on annotation cards.false
show_flagShow a flag icon in the top right corner that lets you bookmark a task for later. Will add "flagged": true to the task.false
instructionsPath to a text file with instructions for the annotator (HTML allowed). Will be displayed as a help modal in the UI.false
swipeEnable swipe gestures on touch devices (left for accept, right for reject).false
split_sents_thresholdMinimum character length of a text to be split by the split_sentences preprocessor, mostly used in NER recipes. If false, all multi-sentence texts will be split.false
html_templateOptional Mustache template for content in the html annotation interface. All task properties are available as variables.false
global_cssCSS overrides added to the global scope. Takes a string value of the CSS code.null
javascriptCustom JavaScript added to the global scope. Takes a string value of the JavaScript code.null
writing_dirWriting direction. Mostly important for manual text annotation interfaces.ltr
show_whitespaceAlways render whitespace characters as symbols in non-manual interfaces.false
exclude_byNew: 1.9 Which hash to use ("task" or "input") to determine whether two examples are the same and an incoming example should be excluded. Typically used in recipes."task

Database setup

By default, Prodigy uses SQLite to store annotations in a simple database file in your Prodigy home directory. If you want to use the default database with its default settings, no further configuration is required and you can start using Prodigy straight away. Alternatively, you can choose to use Prodigy with a MySQL or PostgreSQL database, or write your own custom recipe to plug in any other storage solution. For more details, see the database API documentation.

prodigy.json{
    "db": "sqlite",
    "db_settings": {
        "sqlite": {},
        "mysql": {},
        "postgresql": {}
    }
}
DatabaseIDSettings
SQLitesqlitename for database file name (defaults to "prodigy.db"), path for custom path (defaults to Prodigy home directory), plus sqlite3 connection parameters. To only store the database in memory, use ":memory:" as the database name.
MySQLmysqlMySQLdb or PyMySQL connection parameters.
PostgreSQLpostgresqlpsycopg2 connection parameters.

Database setup details


Environment variables

Prodigy lets you set the following environment variables:

VariableDescription
PRODIGY_HOMEUse a custom path for the Prodigy home directory. Defaults to the equivalent of ~/.prodigy.
PRODIGY_LOGGINGEnable logging. Values can be either basic (simple logging) or verbose (more details per entry).
PRODIGY_PORTOverwrite the port used to serve the Prodigy app and REST API. Supersedes the settings in the global, local and recipe-specific config.
PRODIGY_HOSTOverwrite the host used to serve the Prodigy app and REST API. Supersedes the settings in the global, local and recipe-specific config.
PRODIGY_ALLOWED_SESSIONSDefine comma-separated string names of multi-user session names that are allowed in the app.
PRODIGY_BASIC_AUTH_USERAdd super basic authentication to the app. String user name to accept.
PRODIGY_BASIC_AUTH_PASSAdd super basic authentication to the app. String password to accept.

Data validation

Wherever possible, Prodigy will try to validate the data passed around the application to make sure it has the correct format. This prevents confusing errors and problems later on – for example, if you use a string instead of a number for a config argument, or try to train a text classifier on an NER dataset by mistake. To disable validation you can set "validate": False in your prodigy.json or recipe config.

Example

Invalid data for component 'ner' spans field required {'text': ' Rats and Their Alarming Bugs', 'meta': {'source': 'The New York Times', 'score': 0.5056365132}, 'label': 'ENTERTAINMENT', '_input_hash': 1817790242, '_task_hash': -2039660589, 'answer': 'accept'}
StreamValidate each incoming task in the stream for the given annotation interface.
Prodigy configNew: 1.9 Validate the global, local and recipe config settings on startup.
RecipeNew: 1.9 Validate the dictionary of components returned by a recipe.
Training examplesNew: 1.9 Validate training examples for the given component in train, train-curve and data-to-spacy.

Debugging and logging

If Prodigy raises an error, or you come across unexpected results, it’s often helpful to run the debugging mode to keep track of what’s happening, and how your data flows through the application. Prodigy uses the logging module to provide logging information across the different components. The logging mode can be set as the environment variable PRODIGY_LOGGING. Both logging modes log the same events, but differ in the verbosity of the output.

Logging modeDescription
basicOnly log timestamp and event description with most important data.
verboseAlso log additional information, like annotation data and function arguments, if available.

PRODIGY_LOGGING=basic
prodigy
ner.manual
my_dataset
en_core_web_sm
./my_data.jsonl
--label PERSON,ORG

$ PRODIGY_LOGGING=basic prodigy ner.teach test en_core_web_sm "rats" --api guardian

13:11:39 - DB: Connecting to database 'sqlite'
13:11:40 - RECIPE: Calling recipe 'ner.teach'
13:11:40 - RECIPE: Starting recipe ner.teach
13:11:40 - LOADER: Loading stream from API 'guardian'
13:11:40 - LOADER: Using API 'guardian' with query 'rats'
13:11:42 - MODEL: Added sentence boundary detector to model pipeline
13:11:43 - RECIPE: Initialised EntityRecognizer with model en_core_web_sm
13:11:43 - SORTER: Resort stream to prefer uncertain examples (bias 0.8)
13:11:43 - PREPROCESS: Splitting sentences
13:11:44 - MODEL: Predicting spans for batch (batch size 64)
13:11:44 - Model: Sorting batch by entity type (batch size 32)
13:11:44 - CONTROLLER: Initialising from recipe
13:11:44 - DB: Loading dataset 'test' (168 examples)
13:11:44 - DB: Creating dataset '2017-11-20_13-07-44'

  ✨  Starting the web server at http://localhost:8080 ...
  Open the app in your browser and start annotating!

13:11:59 - GET: /project
13:11:59 - GET: /get_questions
13:11:59 - CONTROLLER: Iterating over stream
13:11:59 - Model: Sorting batch by entity type (batch size 32)
13:11:59 - CONTROLLER: Returning a batch of tasks from the queue
13:11:59 - RESPONSE: /get_questions (10 examples)
13:12:07 - GET: /get_questions
13:12:07 - CONTROLLER: Returning a batch of tasks from the queue
13:12:07 - RESPONSE: /get_questions (10 examples)
13:12:07 - POST: /give_answers (received 7)
13:12:07 - CONTROLLER: Receiving 7 answers
13:12:07 - MODEL: Updating with 7 examples
13:12:08 - MODEL: Updated model (loss 0.0172)
13:12:08 - PROGRESS: Estimating progress of 0.0909
13:12:08 - DB: Getting dataset 'test'
13:12:08 - DB: Getting dataset '2017-11-20_13-07-44'
13:12:08 - DB: Added 7 examples to 2 datasets
13:12:08 - CONTROLLER: Added 7 answers to dataset 'test' in database SQLite
13:12:08 - RESPONSE: /give_answers
13:12:11 - DB: Saving database

Saved 7 annotations to database SQLite
Dataset: test
Session ID: 2017-11-20_13-07-44

Custom logging

If you’re developing custom recipes, you can use Prodigy’s log helper function to add your own entries to the log. The log function takes the following arguments:

ArgumentTypeDescription
messagestrThe basic message to display, e.g. “RECIPE: Starting recipe ner.teach”.
details-Optional details to log only in verbose mode.
recipe.pypseudocode 
import prodigy @prodigy.recipe("custom-recipe") def my_custom_recipe(dataset, source): progigy.log("RECIPE: Starting recipe custom-recipe", locals()) stream = load_my_stream(source) prodigy.log(f"RECIPE: Loaded stream with argument {source}") return { "dataset": dataset, "stream": stream }

Customizing Prodigy with entry points

Entry points let you expose parts of a Python package you write to other Python packages. This lets one application easily customize the behavior of another, by exposing an entry point in its setup.py or setup.cfg. For a quick and fun intro to entry points in Python, check out this excellent blog post. Prodigy can load custom function from several different entry points, for example custom recipe functions. To see this in action, check out the sense2vec package, which provides several custom Prodigy recipes. The recipes are registered automatically if you install the package in the same environment as Prodigy. The following entry point groups are supported:

prodigy_recipesEntry points for recipe functions.
prodigy_dbEntry points for custom Database classes.
prodigy_loadersEntry points for custom loader functions.