Installation & Setup

Installation

The download link lets you choose between Prodigy wheels for all supported platforms: macOS, Linux and Windows on Python 3.6+. You can download them all and keep them somewhere – they’re all yours! Whenever a new version of Prodigy is available, you’ll be able to download it using the same download link.

Wheel installers are basically pre-compiled Python package installers. You can install them like any other Python package by pointing pip install at the local path of the .whl file you downloaded. It’s recommended to use a new virtual environment for installing Prodigy.

pip install ./prodigy*.whl

The above command will install the Prodigy package in your current environment, create your Prodigy home directory and register the commands prodigy and pgy (a shortcut).

Double-check that you’ve downloaded the correct wheel and make sure your version of pip is up to date. If it still doesn’t work, check if the file name matches your platform (distutils.util.get_platform()) and rename the file if necessary. For more details, see this article on Python wheels. Essentially, Python wheels are only archives containing the source files – so you can also just unpack the wheel and place the contained prodigy package in your site-packages directory.

On installation, Prodigy will set up the alias commands prodigy and pgy. This should work fine and out-of-the-box on most macOS and Linux setups. If not, you can always run the commands by prefixing them with python -m, for example: python -m prodigy stats. Alternatively, you can create your own alias, and add it to your .bashrc to make it permanent:

alias prodigy="python -m prodigy"

On Windows you can also create a Windows activation script as a .ps1 file and run it in your environment, or add it to your PowerShell profile. See this thread for more details.

Function global:pdgy { python -m prodigy $args}
Function global:prodigy { python -m prodigy $args}
Function global:spacy { python -m spacy $args}

By default, Prodigy starts the web server on localhost and port 8080. If you’re running Prodigy via a Docker container or a similar containerized environment, you’ll have to set the host to 0.0.0.0. Simply edit your prodigy.json and add the following:

{
  "host": "0.0.0.0"
}

See this thread for more details and background. The above approach should also work in other environments if you come across the following error on startup:

OSError: [Errno 99] Cannot assign requested address

Using and installing spaCy models

To use Prodigy’s built-in recipes for NER or text classification, you’ll also need to install a spaCy model – for example, the small English model, en_core_web_sm (around 34 MB). Note that Prodigy currently requires Python 3.6+ and the latest spaCy v2.2.

python -m spacy download en_core_web_sm

If you have trained your own spaCy model, you can load them into Prodigy using the path to the model directory. You can also use the spacy package command to turn it into a Python package, and install it in your current environment. All Prodigy recipes that allow a spacy_model argument can either take the name of an installed model package, or the path to a valid model package directory. Keep in mind that a new minor version of spaCy also means that you need to retrain your models. For example, models trained with spaCy v2.1 are not going to be compatible with v2.2.

Using Prodigy with JupyterLab

If you’re using JupyterLab, you can install our jupyterlab-prodigy extension. It lets you execute recipe commands in notebook cells and opens the annotation UI in a JupyterLab tab so you don’t need to leave your notebook to annotate data.

jupyter labextension install jupyterlab-prodigy

Configuration

When you first run Prodigy, it will create a folder .prodigy in your home directory. By default, this will be the location where Prodigy looks for its configuration file, prodigy.json. You can change this directory via the environment variable PRODIGY_HOME:

PRODIGY_HOME=/custom/prodigy/home

When you run Prodigy, it will first check if a global configuration file exists. It will also check the current working directory for a prodigy.json or .prodigy.json. This allows you to overwrite specific settings on a project-by-project basis. The following settings can be defined in your config file, or in the "config" returned by a recipe.

prodigy.json{
  "theme": "basic",
  "custom_theme": {},
  "buttons": ["accept", "reject", "ignore", "undo"],
  "batch_size": 10,
  "history_size": 10,
  "port": 8080,
  "host": "localhost",
  "cors": true,
  "db": "sqlite",
  "db_settings": {},
  "api_keys": {},
  "validate": true,
  "auto_exclude_current": true,
  "instant_submit": false,
  "feed_overlap": true,
  "ui_lang": "en",
  "project_info": ["dataset", "session", "lang", "recipe_name", "view_id", "label"],
  "show_stats": false,
  "hide_meta": false,
  "show_flag": false,
  "instructions": false,
  "swipe": false,
  "split_sents_threshold": false,
  "html_template": false,
  "global_css": null,
  "javascript": null,
  "writing_dir": "ltr",
  "show_whitespace": false
}

Setting	Description	Default
`theme`	Name of UI theme to use.	`"basic"`
`custom_theme`	Custom UI theme overrides, keyed by name.	`{}`
`buttons`	New: 1.10 Buttons to show at the bottom of the screen in order. If an answer button is disabled, the user will be unable to submit this answer and the keyboard shortcut will be disabled as well. Only the `"undo"` action will stay available via keyboard shortcut or click on the sidebar history.	`["accept", "reject", "ignore", "undo"]`
`batch_size`	Number of tasks to return to and receive back from the web app at once. A low batch size means more frequent updates.	`10`
`history_size`	New: 1.10 Maximum number of examples to show in the sidebar history. Defaults to the value of `batch_size`. Note that the history size can’t be larger than the batch size.	`10`
`port`	Port to use for serving the web application. Can be overwritten by the `PRODIGY_PORT` environment variable.	`8080`
`host`	Host to use for serving the web application. Can be overwritten by the `PRODIGY_HOST` environment variable.	`"localhost"`
`cors`	Enable or disable cross-origin resource sharing (CORS) to allow the REST API to receive requests from other domains.	`true`
`db`	Name of database to use.	`"sqlite"`
`db_settings`	Additional settings for the respective databases.	`{}`
`api_keys`	Deprecated: Live API keys, keyed by API loader ID.	`{}`
`validate`	Validate incoming tasks and raise an error with more details if the format is incorrect.	`true`
`force_stream_order`	New: 1.9 Always send out tasks in the same order and re-send them until they’re answered, even if the app is refreshed in the browser.	`false`
`auto_exclude_current`	Automatically exclude examples already present in current dataset.	`true`
`instant_submit`	Instantly submit a task after it’s answered in the app, skipping the history and immediately triggering the update callback if available.	`false`
`feed_overlap`	Whether to send out each example once so it’s annotated by someone (`false`) or whether to send out each example to every session (`true`, default). Should be used with custom user sessions set via the app (via `/?session=user_name`).	`true`
`ui_lang`	New: 1.10 Language of descriptions, messages and tooltips in the UI. See here for available translations.	`"en"`
`project_info`	New: 1.10 Project info shown in sidebar, if available. The list of string IDs can be modified to hide or re-order the items.	`["dataset", "session", "lang", "recipe_name", "view_id", "label"]`
`show_stats`	Show additional stats, like annotation decision counts, in the sidebar.	`true`
`hide_meta`	Hide the meta information displayed on annotation cards.	`false`
`show_flag`	Show a flag icon in the top right corner that lets you bookmark a task for later. Will add `"flagged": true` to the task.	`false`
`instructions`	Path to a text file with instructions for the annotator (HTML allowed). Will be displayed as a help modal in the UI.	`false`
`swipe`	Enable swipe gestures on touch devices (left for `accept`, right for `reject`).	`false`
`split_sents_threshold`	Minimum character length of a text to be split by the `split_sentences` preprocessor, mostly used in NER recipes. If `false`, all multi-sentence texts will be split.	`false`
`html_template`	Optional Mustache template for content in the `html` annotation interface. All task properties are available as variables.	`false`
`global_css`	CSS overrides added to the global scope. Takes a string value of the CSS code.	`null`
`javascript`	Custom JavaScript added to the global scope. Takes a string value of the JavaScript code.	`null`
`writing_dir`	Writing direction. Mostly important for manual text annotation interfaces.	`ltr`
`show_whitespace`	Always render whitespace characters as symbols in non-manual interfaces.	`false`
`exclude_by`	New: 1.9 Which hash to use (`"task"` or `"input"`) to determine whether two examples are the same and an incoming example should be excluded. Typically used in recipes.	`"task`

Database setup

By default, Prodigy uses SQLite to store annotations in a simple database file in your Prodigy home directory. If you want to use the default database with its default settings, no further configuration is required and you can start using Prodigy straight away. Alternatively, you can choose to use Prodigy with a MySQL or PostgreSQL database, or write your own custom recipe to plug in any other storage solution. For more details, see the database API documentation.

prodigy.json{
    "db": "sqlite",
    "db_settings": {
        "sqlite": {},
        "mysql": {},
        "postgresql": {}
    }
}

Database	ID	Settings
SQLite	`sqlite`	`name` for database file name (defaults to `"prodigy.db"`), `path` for custom path (defaults to Prodigy home directory), plus sqlite3 connection parameters. To only store the database in memory, use `":memory:"` as the database name.
MySQL	`mysql`	MySQLdb or PyMySQL connection parameters.
PostgreSQL	`postgresql`	psycopg2 connection parameters.

Database setup details

Environment variables

Prodigy lets you set the following environment variables:

Variable	Description
`PRODIGY_HOME`	Use a custom path for the Prodigy home directory. Defaults to the equivalent of `~/.prodigy`.
`PRODIGY_LOGGING`	Enable logging. Values can be either `basic` (simple logging) or `verbose` (more details per entry).
`PRODIGY_PORT`	Overwrite the port used to serve the Prodigy app and REST API. Supersedes the settings in the global, local and recipe-specific config.
`PRODIGY_HOST`	Overwrite the host used to serve the Prodigy app and REST API. Supersedes the settings in the global, local and recipe-specific config.
`PRODIGY_ALLOWED_SESSIONS`	Define comma-separated string names of multi-user session names that are allowed in the app.
`PRODIGY_BASIC_AUTH_USER`	Add super basic authentication to the app. String user name to accept.
`PRODIGY_BASIC_AUTH_PASS`	Add super basic authentication to the app. String password to accept.

Data validation

Wherever possible, Prodigy will try to validate the data passed around the application to make sure it has the correct format. This prevents confusing errors and problems later on – for example, if you use a string instead of a number for a config argument, or try to train a text classifier on an NER dataset by mistake. To disable validation you can set "validate": False in your prodigy.json or recipe config.

Example

✘ Invalid data for component 'ner'

spans   field required


{'text': ' Rats and Their Alarming Bugs', 'meta': {'source': 'The New York Times', 'score': 0.5056365132}, 'label': 'ENTERTAINMENT', '_input_hash': 1817790242, '_task_hash': -2039660589, 'answer': 'accept'}


Stream	Validate each incoming task in the stream for the given annotation interface.
Prodigy config	New: 1.9 Validate the global, local and recipe config settings on startup.
Recipe	New: 1.9 Validate the dictionary of components returned by a recipe.
Training examples	New: 1.9 Validate training examples for the given component in `train`, `train-curve` and `data-to-spacy`.

Debugging and logging

If Prodigy raises an error, or you come across unexpected results, it’s often helpful to run the debugging mode to keep track of what’s happening, and how your data flows through the application. Prodigy uses the logging module to provide logging information across the different components. The logging mode can be set as the environment variable PRODIGY_LOGGING. Both logging modes log the same events, but differ in the verbosity of the output.

Logging mode	Description
`basic`	Only log timestamp and event description with most important data.
`verbose`	Also log additional information, like annotation data and function arguments, if available.


PRODIGY_LOGGING=basic
prodigy
ner.manual
my_dataset
en_core_web_sm
./my_data.jsonl
--label PERSON,ORG

$ PRODIGY_LOGGING=basic prodigy ner.teach test en_core_web_sm "rats" --api guardian

13:11:39 - DB: Connecting to database 'sqlite'
13:11:40 - RECIPE: Calling recipe 'ner.teach'
13:11:40 - RECIPE: Starting recipe ner.teach
13:11:40 - LOADER: Loading stream from API 'guardian'
13:11:40 - LOADER: Using API 'guardian' with query 'rats'
13:11:42 - MODEL: Added sentence boundary detector to model pipeline
13:11:43 - RECIPE: Initialised EntityRecognizer with model en_core_web_sm
13:11:43 - SORTER: Resort stream to prefer uncertain examples (bias 0.8)
13:11:43 - PREPROCESS: Splitting sentences
13:11:44 - MODEL: Predicting spans for batch (batch size 64)
13:11:44 - Model: Sorting batch by entity type (batch size 32)
13:11:44 - CONTROLLER: Initialising from recipe
13:11:44 - DB: Loading dataset 'test' (168 examples)
13:11:44 - DB: Creating dataset '2017-11-20_13-07-44'

  ✨  Starting the web server at http://localhost:8080 ...
  Open the app in your browser and start annotating!

13:11:59 - GET: /project
13:11:59 - GET: /get_questions
13:11:59 - CONTROLLER: Iterating over stream
13:11:59 - Model: Sorting batch by entity type (batch size 32)
13:11:59 - CONTROLLER: Returning a batch of tasks from the queue
13:11:59 - RESPONSE: /get_questions (10 examples)
13:12:07 - GET: /get_questions
13:12:07 - CONTROLLER: Returning a batch of tasks from the queue
13:12:07 - RESPONSE: /get_questions (10 examples)
13:12:07 - POST: /give_answers (received 7)
13:12:07 - CONTROLLER: Receiving 7 answers
13:12:07 - MODEL: Updating with 7 examples
13:12:08 - MODEL: Updated model (loss 0.0172)
13:12:08 - PROGRESS: Estimating progress of 0.0909
13:12:08 - DB: Getting dataset 'test'
13:12:08 - DB: Getting dataset '2017-11-20_13-07-44'
13:12:08 - DB: Added 7 examples to 2 datasets
13:12:08 - CONTROLLER: Added 7 answers to dataset 'test' in database SQLite
13:12:08 - RESPONSE: /give_answers
13:12:11 - DB: Saving database

Saved 7 annotations to database SQLite
Dataset: test
Session ID: 2017-11-20_13-07-44

Custom logging

If you’re developing custom recipes, you can use Prodigy’s log helper function to add your own entries to the log. The log function takes the following arguments:

Argument	Type	Description
`message`	str	The basic message to display, e.g. “RECIPE: Starting recipe ner.teach”.
`details`	-	Optional details to log only in `verbose` mode.

recipe.pypseudocode import prodigy

@prodigy.recipe("custom-recipe")
def my_custom_recipe(dataset, source):
    progigy.log("RECIPE: Starting recipe custom-recipe", locals())
    stream = load_my_stream(source)
    prodigy.log(f"RECIPE: Loaded stream with argument {source}")
    return {
        "dataset": dataset,
        "stream": stream
    }

Customizing Prodigy with entry points

Entry points let you expose parts of a Python package you write to other Python packages. This lets one application easily customize the behavior of another, by exposing an entry point in its setup.py or setup.cfg. For a quick and fun intro to entry points in Python, check out this excellent blog post. Prodigy can load custom function from several different entry points, for example custom recipe functions. To see this in action, check out the sense2vec package, which provides several custom Prodigy recipes. The recipes are registered automatically if you install the package in the same environment as Prodigy. The following entry point groups are supported:


`prodigy_recipes`	Entry points for recipe functions.
`prodigy_db`	Entry points for custom `Database` classes.
`prodigy_loaders`	Entry points for custom loader functions.

Get started

Installation & Setup

Installation

About the 'is not a supported wheel on this platform' error¶

Making the 'prodigy' CLI command work¶

Using Prodigy with Docker¶

Using Prodigy with JupyterLab

Example

Example output of PRODIGY_LOGGING=basic