Changelog
This page lists the history of changes to Prodigy. Whenever a new update is available, you’ll receive an email notification sent to the address specified at checkout. You can then download the new version via your personal download link. If your free upgrades expired, you can now add 12 months of updates to your license via our online shop. Please allow up to 24 hours for your download link to be reactivated.
v1.10.0 2020-XX-XX
Our biggest release yet includes a bunch of new features, interfaces and recipes for dependency and relation annotation, audio and video annotation, as well as a new and improved manual image annotation interface with support for editing shapes and bounding boxes. We’ve also added new recipe callbacks for modifying examples placed in the database and validating answers at runtime, added more settings for whitespace-handling in manual NER annotation, including a mode for character-based highlighting, and introduced various new config settings to customize the web app and annotation interfaces. Thanks to everyone who’s helped us beta test the new features – your feedback has helped a lot! See the changelog below for a full list of new features.
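As a rough illustration of the two recipe callbacks mentioned above, the sketch below shows what a validation function and a pre-database hook might look like. It assumes that the validation callback receives one answered task as a dict and signals a problem by raising an error, and that the database callback receives and returns a list of task dicts. The option limit, the "image" key and the function bodies are illustrative choices, not Prodigy's own implementation.

```python
from typing import Dict, List


def validate_answer(eg: Dict) -> None:
    """Reject answers that select no option or more than three options."""
    selected = eg.get("accept", [])
    if not 1 <= len(selected) <= 3:
        # Assumption: raising an error blocks the answer from being submitted.
        raise ValueError("Select at least 1 but not more than 3 options.")


def before_db(examples: List[Dict]) -> List[Dict]:
    """Swap inline base64 image data for the original file path, if known."""
    for eg in examples:
        image = eg.get("image", "")
        # Assumption: base64 images are stored as data URIs under "image".
        if isinstance(image, str) and image.startswith("data:") and "path" in eg:
            eg["image"] = eg["path"]
    return examples
```

In a custom recipe, functions like these would be returned alongside the other recipe components, under the names validate_answer and before_db used in the list below.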
new | Flexible relations interface for fully manual dependency and relationship annotation and joint span and dependency relation annotation. |
new | New recipes rel.manual , coref.manual and dep.correct for efficient manual and model-assisted dependency annotation. |
new | audio and audio_manual interfaces for binary and fully manual audio and video annotation. Add and modify segments for different labels and collect feedback about pre-highlighted regions. |
new | audio.manual and audio.transcribe recipes for audio and video annotation and transcription, as well as community recipes for using Prodigy with pretrained pyannote.audio models for speaker diarization in the loop. |
new | New and improved image_manual interface with support for moving and resizing shapes, adjusting polygons, freehand annotation, more detailed data format and more settings. |
new | Support dataset:{name} and dataset:{name}:{answer} syntax as source argument in recipes to allow loading from existing datasets. For example, dataset:my_set will use the examples in dataset my_set as the input data and dataset:my_set:accept will only load in accepted answers. |
new | Add validate_answer recipe component to perform custom validation of annotations created in the UI and prevent invalid answers from being submitted. |
new | Allow recipes to return a before_db callback for modifying examples before they’re placed in the database, e.g. to strip out base64 data. |
new | Update Prodigy for the latest spaCy v2.3 and new models. |
new | Support multi-arc dependency annotations (e.g. created with dep.correct ) in train . |
new | Set information about trailing whitespace in add_tokens and reflect whitespace (or lack of whitespace) between tokens in ner_manual (can be changed using the "honor_token_whitespace" setting). |
new | Add --highlight-chars flag to ner.manual and use_chars argument to add_tokens to allow highlighting individual characters instead of full tokens. |
new | Add "field_suggestions" property to text_input UI to allow specifying a list of auto-suggestions to show when the user types or presses ↓. |
new | Allow disabling and reordering of the accept, reject, ignore and undo buttons at the bottom of the screen via the "buttons" config setting. |
new | Add options --width (card width and maximum image width) and --remove-base64 (remove base64-encoded image data) to image.manual . |
new | Add file_ext argument to Images and ImageServer loaders, always preserve original local file path as "path" and add Audio , AudioServer , Video and VideoServer loaders. |
new | Expose generic Base64 and Server helpers to load any data as a base64-string or via a web server and add generic fetch_media preprocessor. |
new | Add --rehash flag to db-merge to force-overwrite hashes. |
new | Add --base-model argument to data-to-spacy to customize tokenizer and sentencizer. |
new | Allow individual tasks to override global or UI config via a key "config" . |
new | Add "ui_lang" config and translations of descriptions, messages and tooltips in the annotation UI to German, Spanish, Dutch and Chinese. |
new | Make sidebar history length default to batch_size and allow customizing it via the history_size config setting. Note that the history size can’t be larger than the batch size. |
new | Show recipe name in project info in sidebar and allow customizing info via "project_info" config. |
new | Add Controller methods and attributes for retrieving total counts and progress by session ID. |
new | Warn if global or local prodigy.json settings override potentially critical recipe components. |
new | Support custom label colors in manual interfaces and automatically pick contrasting text color. |
new | Show keyboard shortcuts for toolbar buttons on hover. |
new | Show friendlier error if prodigy.json contains invalid JSON. |
fix | Make progress function returned by recipes consistent and always pass it the controller and the return value of the update callback, if available. |
fix | Correctly report per-session progress for streams with a length and multi-user sessions and take feed_overlap into account. |
fix | Improve support for using "force_stream_order": True (repeating feed) with "feed_overlap": False (no overlap between sessions). |
fix | Make all manual recipes default to "force_stream_order": True for more intuitive stream behavior: batches of tasks are now always re-sent until they’re answered and refreshing the page will show the same batch again. |
fix | Fix issue that could cause the review interface to not display changes when the user hits undo. |
fix | Preserve "choice_style" config setting on tasks so it can be re-applied when running review . |
fix | Support simpler data format in diff interface to make it work combined with choice in a blocks interface and prevent clash of "accept" property used by both UIs. |
fix | Adjust display of spans with RTL text when "writing_mode": "rtl" is enabled. |
fyi | The deprecated --api recipe argument has been removed and merged with --loader . |
fyi | textcat.manual no longer performs additional checks for the pre-v1.9 syntax with an (unused) spaCy model argument. |
fyi | Loading from standard input now requires the source argument to be set to - explicitly. |
fyi | The "show_stats" setting to display detailed stats in the sidebar is now set to true by default. |
fyi | The "spans" data created with image_manual now also include a "type" (either "rect" , "polygon" or "freehand" ), as well as "width" , "height" , "x" , "y" and "center" values for rects. |
fyi | Forced stream order and repeating batches by default means that you should use named sessions or set "force_stream_order": False if you want multiple users connecting to the same instance. Otherwise, you may get duplicate questions. |
doc | Add documentation and feature page for dependency relation annotation. |
doc | Add documentation and feature page for audio and speech annotation. |
doc | Update feature pages for named entity recognition and computer vision. |
doc | Add docs section on efficient NER annotation for fine-tuning transformers like BERT. |
doc | Add docs section on recipe callback functions in detail. |
doc | Document b64_to_bytes , file_to_b64 and bytes_to_b64 utilities for converting base64. |
doc | Tidy up global config docs and move settings specific to a single interface to interface docs. |
doc | Fix various typos and inconsistencies. |
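To make the dataset:{name} and dataset:{name}:{answer} source syntax from this release more concrete, here is a minimal sketch of how such a source string could be split into its parts. The helper parse_dataset_source and the DatasetSource type are hypothetical names introduced for illustration; they are not part of Prodigy, and the real loader may handle edge cases differently.

```python
from typing import NamedTuple, Optional


class DatasetSource(NamedTuple):
    name: str
    answer: Optional[str]  # e.g. "accept"; None means no answer filter


def parse_dataset_source(source: str) -> Optional[DatasetSource]:
    """Parse "dataset:{name}" or "dataset:{name}:{answer}".

    Returns None if the source does not use the dataset: prefix, e.g. when
    it is a file path. Hypothetical helper, not part of Prodigy.
    """
    prefix = "dataset:"
    if not source.startswith(prefix):
        return None
    name, _, answer = source[len(prefix):].partition(":")
    if not name:
        raise ValueError(f"Missing dataset name in source: {source!r}")
    return DatasetSource(name=name, answer=answer or None)
```

With this helper, "dataset:my_set:accept" yields the name my_set with the answer filter accept, while a plain file path is passed through as "not a dataset source".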
v1.9.10 2020-06-05
This patch release includes small fixes to the force_stream_order setting to prevent a race condition and duplicate examples. Stay tuned for v1.10, which is coming soon and will include lots of cool new features!
new | Add Controller.all_session_ids to expose all named sessions that have connected to the current instance. |
fix | Fix race condition that could cause force_stream_order to produce duplicate tasks. |
fix | Correctly exclude currently shown task when requesting new questions with force_stream_order . |
v1.9.9 2020-03-17
This release includes an important fix for a training regression introduced in the previous version, as well as small improvements.
fix | Fix issue that’d cause rejected binary text classification annotations to be filtered out in train . |
fix | Improve handling of rejected and ignored examples across different annotation types in train . |
fix | Relax unnecessarily strict validation for diff tasks. |
v1.9.8 2020-03-14
This release includes a new built-in recipe match for selecting examples based on pattern matches, as well as various bug fixes and improvements.
new | General-purpose match recipe to only match patterns in text with various configurations. |
fix | Use custom --view-id set in review to determine how to merge examples. |
fix | Improve default configuration in train for NER models with --init-tok2vec . |
fix | Fix filtering that could cause incorrect totals to be reported before training. |
fix | Fix async handling of built-in and user-provided databases. |
fix | Fix hashing of patterns that’d cause incorrect line numbers to be displayed. |
fix | Fix compiler setting that’d cause print-stream and print-dataset to not output colored results. |
fix | Check for correct view ID when printing text classification datasets with print-dataset . |
fix | Correctly pass --eval-split to data-to-spacy . |
fix | Show correct path to Prodigy installation root (not recipe root) in stats command. |
fix | Fix issue that could cause span rendering problems in ner if text contains emoji. |
fix | Fix UI issue that’d cause card headings to overlay expanded sidebar on small screens. |
doc | Fix various typos and links. |
v1.9.7 2020-02-21
This release includes small fixes and improvements to the built-in recipes and interfaces.
new | Add overwrite flag to add_tokens preprocessor to overwrite existing "tokens" . |
new | Allow review recipe to overwrite view ID (e.g. to render blocks annotations differently). |
fix | Accept pre-set tokens correctly in add_tokens to make it easier to provide custom tokenization. |
fix | Improve backwards-compatibility checks of arguments in textcat.manual . |
fix | Correctly report numbers of textcat examples in train and filter out ignored answers instead of just ignoring the examples during training and evaluation. |
fix | Fix handling of integer option "id" values in print-dataset . |
fix | Fix issue that’d cause text_input value to not reset and auto-focus correctly between tasks. |
fix | Fix incorrect validation errors for dep UI and "card_css" setting. |
fix | Set more explicit MIME types for JS bundle for server configs that prevent MIME type sniffing. |
fix | Adjust eighties theme to prevent dark text on dark background in choice options. |
v1.9.6 2020-01-27
This release includes small fixes related to async database usage in Python 3.7+ and text classification training with the new train recipe.
fix | Fix issue with async database usage in Python 3.7+ that could cause MySQL connection errors. |
fix | Ensure --textcat-exclusive setting is passed down correctly in train . |
v1.9.5 2020-01-10
This release includes small fixes related to multiprocessing and new features introduced in v1.9.0.
fix | Make ner.manual with --patterns correctly return all examples instead of only the matches. |
fix | Fix error when loading evaluation examples in new train recipe with --binary enabled. |
fix | Fix issue in train with --binary when restoring pipeline component before saving the model. |
fix | Fix Foreign Key constraint error that could occur in Database.drop_examples . |
fix | Add thread locking to database reconnect methods in controller. |
fix | Fix accuracy output for tagger and parser in train . |
v1.9.4 2019-12-28
This release includes small fixes, a new option for changing keyboard shortcuts for labels and multiple choice options, and a new loader for serving images.
new | Add "keymap_by_label" config to change keyboard shortcuts for labels and choice options. |
new | Add image-server loader for serving images from a directory (and bypassing base64 encoding). |
fix | Fix too strict validation for review content. |
v1.9.3 2019-12-23
This release includes small fixes to the new interfaces introduced in v1.9.0.
fix | Fix too strict validation for blocks content. |
fix | Prevent input field in text_input from losing focus on update. |
v1.9.2 2019-12-20
This release includes small fixes to bugs introduced in v1.9.0.
fix | Fix error in PatternMatcher when assigning combined matches to tasks with no "meta" . |
fix | Fix too strict validation for html tasks with no "html" key but "html_template" . |
v1.9.1 2019-12-19
This release includes small fixes to bugs introduced in v1.9.0.
fix | Fix issue with loading recipes from entry points. |
fix | Fix too strict validation for "db" recipe component. |
v1.9.0 2019-12-18
This release introduces tons of new features and improvements, including new recipes, interfaces and workflows. We also redesigned the website, rewrote the documentation from scratch and added lots of new pages, usage guides, demos and examples. We hope you like it! Some highlights in Prodigy v1.9 include new unified training recipes, two new annotation interfaces for free-form text input and combining different UIs, config settings for making streams repeat questions until they’re answered, and changing keyboard shortcuts, official support for spaCy v2.2 and a new recipe for converting Prodigy annotations of different types to a single training corpus in spaCy’s JSON format. See the changelog below for a full list of new features.
new | Add new general-purpose train and train-curve recipes to replace the task-specific training recipes and make overall training process more consistent. |
new | Show accuracy per entity type, tag or text category in training results. |
new | Add data-to-spacy recipe that takes Prodigy datasets for NER, text classification, tagging and parsing and outputs a merged corpus (optionally split into training and evaluation data) in spaCy’s JSON format that you can use with spacy train . |
new | Add --patterns argument to ner.manual to pre-highlight suggestions from patterns. This workflow is going to replace the binary ner.match . |
new | Add general-purpose print-stream and print-dataset recipes that can output different data types. Those recipes are going to replace the more specific print utilities like ner.print-stream . |
new | Add blocks interface to freely combine annotation interfaces. |
new | Add text_input interface to collect free-form text input from annotators. |
new | New "force_stream_order" config setting. If True , tasks will always be sent out in the same order and re-sent until they’re answered – even if you refresh the app in your browser. |
new | Support customizing keyboard shortcuts. |
new | Support tokenizing terms in terms.to-patterns to create patterns for multi-token terms. |
new | Add "exclude_by" config setting to allow recipes to specify whether to filter by input hash or task hash so that manual recipes don’t repeat the same content with different suggestions. |
new | Support blank:{lang} , e.g. blank:en as an alternative spaCy model in ner.manual , textcat.teach and train to start off with a blank model. |
new | Pass --label values added to mark to the "labels" config so the recipe can be used with manual interfaces like ner_manual and image_manual . |
new | Add --no-fetch flag to image.manual to disable base64 conversion of images. |
new | Add --fetch-media flag to review recipe to temporarily replace paths with base64 data. |
new | Also support - as the value of source arguments to read from standard input and make this the recommended best practice (instead of omitting the argument). |
new | Always auto-create datasets and deprecate dataset command. |
new | Make compare and ner.eval-ab recipes use the more flexible choice interface and deprecate the compare UI. |
new | Add more human-readable class names to use in custom CSS and JS. |
new | Support new syntax in prodigy.serve that lets you pass in the full command-line command to start Prodigy from within Python. |
new | Add data validation for prodigy.json / recipe config, recipe components and training examples. |
new | Make the printed output and messages prettier and more consistent. |
new | Make FastAPI the default REST API library and include interactive API docs. |
new | Drop support for Python 3.5 and make wheel installers support Python 3.8. |
new | Update Prodigy for spaCy v2.2. |
fix | Show an example suggested by patterns in textcat.teach only once with all matches instead of once per match. |
fix | Fix error that’d occur when passing in long label sets on the command line (due to Prodigy checking if it’s a valid file path). |
fix | Remove unused spacy_model argument from textcat.manual . |
fix | Exclude by input hash instead of task hash in ner.manual , ner.correct , pos.correct , textcat.manual and image.manual , using the new "exclude_by" setting. Examples will only be shown again if their content is identical, not if they include different highlighted suggestions. |
fix | Fix handling of newline tokens in ner.manual for multiple newline characters and adjust style of ↵ symbols. Newline-only tokens are now unselectable by default to prevent creating newline token entities. You can set "allow_newline_highlight": true to change this. |
fix | Show error if MySQL database is used and JSON blob saved to the database is longer than 65535 characters, to prevent MySQL DB from truncating example. |
fyi | Rename ner.make-gold and pos.make-gold to ner.correct and pos.correct . The old names are still supported so your code won’t break. |
fyi | Deprecate various outdated recipes, the built-in live APIs and the recipe_args dict. You can still use all of these features and your code shouldn’t break but they’ll be removed in v2. |
fyi | Refactor the whole code base, module organization and various other internals, and add simple type annotations to recipe functions. |
doc | New documentation and website redesigned and rewritten completely from scratch, with tons of new content, demos and usage examples. The new site also replaces the PRODIGY_README.html that used to be available for download with Prodigy. |
doc | Update prodigy-recipes repo. |
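The difference between filtering by input hash and by task hash, which the "exclude_by" setting controls, can be sketched as follows. The hashing functions here are stand-ins written for this example: which keys go into each hash is an assumption, and the code does not reproduce Prodigy's actual hashing.

```python
import hashlib
import json
from typing import Dict, Iterable


def _hash_keys(task: Dict, keys: Iterable[str]) -> str:
    """Hash the selected keys of a task in a stable, order-independent way."""
    subset = {key: task[key] for key in keys if key in task}
    payload = json.dumps(subset, sort_keys=True).encode("utf8")
    return hashlib.sha1(payload).hexdigest()


def input_hash(task: Dict) -> str:
    # Assumption: only the raw input (here: the text) identifies the input.
    return _hash_keys(task, ["text"])


def task_hash(task: Dict) -> str:
    # Assumption: the task hash also covers the suggested spans and label.
    return _hash_keys(task, ["text", "spans", "label"])


same_text = "Apple is opening an office in Berlin"
task_a = {"text": same_text, "spans": [{"start": 0, "end": 5, "label": "ORG"}]}
task_b = {"text": same_text, "spans": [{"start": 30, "end": 36, "label": "GPE"}]}

# Same input, different suggestions: equal input hashes, unequal task hashes.
print(input_hash(task_a) == input_hash(task_b))
print(task_hash(task_a) == task_hash(task_b))
```

Excluding by input hash treats both tasks as the same question, which is why manual recipes no longer repeat a text just because its pre-highlighted suggestions differ.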
v1.8.5 2019-10-19
This update includes a fix for a regression introduced in v1.8.4, as well as small improvements to the dataset creation and stream handling.
new | Warn after exhausting streams with many duplicates. |
fix | Fix issue introduced in v1.8.4 that could cause the client to send back empty answers if users annotated very quickly. |
fix | Remove default session from client and correctly populate session datasets. |
v1.8.4 2019-10-07
This update includes various small fixes to the interfaces and recipes.
new | Experimental: Allow moving selected bounding boxes in image_manual interface via keyboard shortcuts ← → ↑ ↓. |
new | Add prodigyundo event for custom JavaScript. |
fix | Fix issue that’d cause label change in image_manual to not be reflected correctly. |
fix | Disable unselecting of radio button if choice_auto_accept is enabled. |
fix | Always prefer rendering "html" in classification interface, if available. |
fix | Improve handling of choice tasks in review recipe. |
fix | Re-add default spacing for most common HTML elements in html interface. |
fix | Ensure bin/prodigy and bin/pgy are interpreted as shell scripts. |
fix | Make textcat.manual correctly support single-label use cases. |
fix | Fix handling of pre-defined spans in EntityRecognizer . |
fix | Fix detection of user databases via entry points. |
fix | Fix race condition that’d fire prodigyanswer event incorrectly. |
fix | Prevent card labels from being displayed on top of modals. |
fix | Improve fallback if labels are provided to the app in incorrect format. |
fix | Fix handling of related sessions in feeds if "feed_overlap" is enabled. |
v1.8.3 2019-06-07
This update includes fixes to textcat.batch-train , the NER preprocessing logic and Prodigy’s dependencies.
fix | Fix issue in textcat.batch-train that wouldn’t pass exclusive setting to the model and converter functions correctly. |
fix | Fix handling of multiple choice data in textcat.batch-train . |
fix | Fix segmentation bug that caused spans ending on text boundaries to be dropped. |
fix | Make sure span is fully excluded if skip=True is set in add_tokens preprocessor. |
fix | Add srsly to direct dependencies and pin to latest version. |
doc | Fix typos and inconsistencies. |
v1.8.2 2019-05-28
This update includes small fixes to the terms.teach and review recipes, as well as improvements to the pretraining support.
fix | Fix issue in review recipe that’d raise error if no versions were generated. |
fix | Make terms.teach skip vocab entries with no vectors to prevent unnecessary warnings. |
fix | Fix serialization issue of sentencizer in textcat.batch-train . |
fix | Ensure hyperparameters from pretraining are passed to textcat.batch-train . |
v1.8.1 2019-05-21
This update includes small fixes to the text classification workflows.
fix | Fix handling of rejected example scores in textcat.manual . |
fix | Ensure handling of --eval-id in textcat.batch-train remains backwards-compatible. |
v1.8.0 2019-05-20
This release updates Prodigy for the brand new spaCy v2.1, which features BERT-style language model pretraining, an extended match pattern API and faster tokenization. We’ve also added support for basic authentication and several completely new built-in recipes and workflows for reviewing annotations from multiple sessions and resolving conflicts, manual multiple-choice text classification, and merging two or more existing datasets.
new | Update Prodigy for spaCy v2.1. |
new | Add language model pretraining support via --init-tok2vec in training recipes. |
new | New interface and review recipe for reviewing and reconciling annotations from multiple sessions on the same data. View conflicting annotations, resolve them in the UI and create a final training set. |
new | Add textcat.manual recipe to annotate text categories using the choice UI. |
new | Make textcat.batch-train accept annotations in choice format. |
new | Add --exclusive flag to textcat.batch-train to train mutually exclusive categories. |
new | Add ner.silver-to-gold recipe to convert binary accept/reject annotations to gold-standard data with no missing values. |
new | Add db-merge recipe to merge two or more datasets into a new set. |
new | Add basic authentication to the app with PRODIGY_BASIC_AUTH_USER and PRODIGY_BASIC_AUTH_PASS env variables. |
new | Add PRODIGY_ALLOWED_SESSIONS env variable to specify allowed named sessions. |
new | Store "_session_id" and "_view_id" with annotations. |
new | New REST API powered by FastAPI. Set the PRODIGY_FASTAPI environment variable and install fastapi (Python 3.6+) to try it out. |
fix | Fix issue in image_manual UI that’d cause boxes to not be deleted correctly. |
fix | Make sure flag button isn’t covered by title in annotation UI. |
fix | Use named logger "prodigy" to allow customizing logging behavior. |
fix | Allow textcat.eval recipe to read from stdin as expected. |
fix | Prevent incorrectly raised KeyError in split_sentences preprocessor. |
fix | Raise error if Database.add_examples doesn’t receive list/tuple of dataset names. |
fix | Make sure choice interface adds "accept": [] if no selection is made. |
fix | If instant_submit is enabled, send answer before requesting new questions. |
fix | Prevent keyboard shortcuts in custom <input> and <textarea> elements. |
fix | Preserve docstrings of compiled Cython classes, methods and functions. |
doc | Fix various typos and inconsistencies and add new sections for new features. |
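Because annotations now carry a "_session_id", disagreements between sessions can be found with a few lines of plain Python. The sketch below is a simplified illustration of the idea behind reviewing annotations from multiple sessions, not how the review recipe works internally. Grouping by "text" is an assumption made to keep the example short, and the sample annotations are made up.

```python
from collections import defaultdict
from typing import Dict, List


def find_conflicts(examples: List[Dict]) -> Dict[str, Dict[str, str]]:
    """Map each text to its per-session answers, keeping only disagreements."""
    by_text: Dict[str, Dict[str, str]] = defaultdict(dict)
    for eg in examples:
        by_text[eg["text"]][eg["_session_id"]] = eg["answer"]
    return {
        text: answers
        for text, answers in by_text.items()
        if len(set(answers.values())) > 1
    }


# Made-up sample annotations, for illustration only.
annotations = [
    {"text": "Great product", "answer": "accept", "_session_id": "reviews-alex"},
    {"text": "Great product", "answer": "reject", "_session_id": "reviews-sam"},
    {"text": "Arrived late", "answer": "accept", "_session_id": "reviews-alex"},
    {"text": "Arrived late", "answer": "accept", "_session_id": "reviews-sam"},
]
conflicts = find_conflicts(annotations)
```

Only "Great product" ends up in conflicts, since both sessions agree on the other text.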
v1.7.1 2019-02-23
This update includes a small fix to the "instant_submit" feature introduced in the previous release.
fix | Fix issue that could cause tasks to not receive an "answer" when "instant_submit" was enabled. |
v1.7.0 2019-02-18
This update makes it easy to set up named multi-user sessions in a single instance and use some of the multi-user features we’ve developed for the upcoming Prodigy Scale. It also introduces a new setting for instant submissions and support for custom CSS and JavaScript across all interfaces. Of course, we also fixed various bugs and inconsistencies to make sure Prodigy runs as smoothly as possible.
By the way, if you want to add 12 months of updates to your license, you can now do so via our online shop!
new | Add "instant_submit" option to send back a task instantly after it’s answered in the app, skipping the history and immediately triggering the update callback if available. |
new | Support custom named sessions via query parameters in the app to enable multi-user workflows in single instances. For example, accessing the app with /?session=alex will add all annotations to a session dataset dataset-alex . The boolean "feed_overlap" setting lets you control whether to have each example sent out once so it’s annotated by someone or whether to allow overlaps and send out each example to everyone (default). |
new | Add "global_css" option across all interfaces, more human-readable class names and expose data-prodigy-view-id and data-prodigy-recipe for custom interface or recipe-specific styling. |
new | Add "javascript" option across all interfaces and fire custom events on mount, update and answer. |
new | Add --batch-size option to drop command to prevent database errors when deleting large datasets. |
fix | Make labels in pos.teach and pos.make-gold correctly default to built-in label scheme and raise error if no fine-grained labels are provided. |
fix | Make sure PatternMatcher only shows matches for recipe labels. |
fix | Fix bug that would cause add_tokens preprocessor to raise an error. |
fix | Correctly handle min_length in split_sentences preprocessor. |
fix | Fix bug that’d cause text classification tasks to not be deep copied correctly. |
fix | Raise error if terms.to-patterns is used without label to prevent null value. |
fix | Fix problem that’d cause dependency arcs to be rendered incorrectly. |
fix | Improve relative sizing of bounding boxes and labels for large images. |
fix | Ensure task can only be flagged via keyboard shortcut if "show_flag" is enabled. |
fix | Drop third-party dependency mmh3 that was causing problems for some users. |
fix | Make manual NER interface more touch-friendly. |
doc | New video: FAQ #1: Tips & tricks for NLP, annotation and training. |
doc | Fix various typos and inconsistencies and add new sections for new features. |
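The named-session mechanism described in this release boils down to a query parameter and a derived dataset name. A minimal sketch of both follows. The helper names and the base URL are assumptions for illustration, while the /?session=alex form and the dataset-alex naming pattern come from the entry above.

```python
from urllib.parse import urlencode


def session_url(base_url: str, session: str) -> str:
    """Build the app URL for a named session, e.g. /?session=alex."""
    return f"{base_url.rstrip('/')}/?{urlencode({'session': session})}"


def session_dataset(dataset: str, session: str) -> str:
    """Name of the session dataset, following the dataset-alex pattern."""
    return f"{dataset}-{session}"


base = "http://localhost:8080"  # assumption: host and port of the instance
for name in ["alex", "sam"]:
    print(session_url(base, name), "->", session_dataset("dataset", name))
```

Each annotator gets their own link, and their answers can later be told apart by session dataset.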
v1.6.1 2018-10-17
fix | Fix split_sentences pre-processor for untokenized examples. |
v1.6.0 2018-10-16
This update takes advantage of pre-built binary wheels for our dependencies and speeds up the installation by up to 10 times! We’ve also added official support for Python 3.7, made excluding the current dataset the default behavior, fixed issues related to patterns, text classification and NER training and improved some internals to get Prodigy ready for multi-user workflows and the upcoming Prodigy Scale.
new | Add official support and wheels for Python 3.7. |
new | Use spaCy v2.0.16 to take advantage of pre-built wheels and allow up to 10 times faster installation. |
new | Automatically exclude examples already present in the current dataset (i.e. make --exclude dataset the default behavior). To disable this feature, you can set "auto_exclude_current": false in your prodigy.json or recipe config. |
new | Add --loader argument to image.manual . |
new | Make annotation card header sticky for long content. |
new | Improve internal handling of sessions and streams to get Prodigy ready for better multi-user workflows. |
fix | Fix prior in PatternMatcher to prevent matches from being excluded by sorter. |
fix | Ensure spans and tokens are correctly updated in split_sentences preprocessor. |
fix | Improve textcat.eval recipe and make sure labels are added automatically. |
fix | Fix issue that would require refreshing the app when using the manual interface with a low batch size. |
fix | Make sure dataset links are removed when dropping a dataset via drop . |
fix | Fix memory leak in NER training that could cause segmentation fault for large datasets. |
fix | Fix issue in TextClassifier , where active learning didn’t resume from weights. |
fix | Exclude "model" key from hashes so that identical predictions by different models receive the same hash. |
fix | Add server middleware to prevent response caching in IE 11. |
fix | Improve NER model loading with large vectors. |
doc | Fix various typos and inconsistencies. |
v1.5.1 2018-06-13
This update includes several bug fixes and stability improvements related to the new part-of-speech tagging recipes and the built-in pattern matcher model, as well as a better identification system for match patterns.
new | Add --resume argument to ner.match to update matcher from dataset. |
new | Use hashes as pattern IDs to allow updating existing matchers even if pattern files change across sessions. |
fix | Make pos.teach and pos.batch-train work as expected with both fine-grained and coarse-grained part-of-speech tags. |
fix | Fix bug in ner.iob-to-gold that’d cause export to fail. |
fix | Small improvements to UI and web app stability. |
v1.5.0 2018-06-07
This update includes new recipes for part-of-speech tagging, an experimental release of the new manual image labeling interface and a new mechanism for adding custom loaders, database connectors and recipes via Python entry points. We’ve also added validation for incoming streams and detailed error messages for incorrect task formats, enhanced the training options for sparse and gold-standard named entity data, and improved handling of newlines and formatting tokens in the manual NER interface.
new | New recipes for part-of-speech tagging: pos.teach , pos.batch-train and pos.train-curve . |
new | Experimental: Manual image annotation interface and image.manual recipe. |
new | Add annotation task validation. Before the Prodigy server starts, your stream is checked against a schema to make sure it has the correct format. If not, Prodigy tells you what the problem is. |
new | Allow adding custom recipes, databases and loaders via entry points. |
new | Add --no-missing flag to ner.batch-train to assume all correct spans are in the gold annotation, and any spans not in the gold annotation are incorrect. This is especially useful when training from annotations collected with ner.manual or ner.make-gold . |
new | Add --resume argument to terms.teach to update target vector from dataset. |
new | Add “true” newlines to newline tokens ↵ in manual interfaces. The behavior can be turned off by setting "hide_true_newline_tokens": true . |
new | Allow marking tokens as "disabled": true in manual interfaces. Disabled tokens can’t be highlighted and can be used to assist annotators with formatting. |
new | Converter recipe ner.iob-to-gold to convert IOB tags to Prodigy’s JSONL. |
fix | Disable and restore other pipeline components in batch-train recipes. |
fix | Ensure seed terms are added to the dataset correctly. |
fix | Fix bug that would cause web app to fail with annotation instructions. |
fix | Make keyboard shortcuts in choice interface work as expected again. |
fix | Add missing import and make image.test work out-of-the-box again. |
doc | Add sections on Python entry points and document new recipes and interfaces. |
doc | Fix various typos and inconsistencies. |
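As an illustration of the "disabled": true token setting added in this release, the sketch below marks formatting tokens as disabled before a task is sent out. The token layout with "text", "start", "end" and "id" keys is an assumption about the task format, and disable_tokens and FORMATTING_TOKENS are hypothetical names, not Prodigy helpers.

```python
from typing import Dict, Iterable, List

FORMATTING_TOKENS = {"<b>", "</b>", "<br>"}  # illustrative choice


def disable_tokens(tokens: List[Dict], to_disable: Iterable[str]) -> List[Dict]:
    """Return copies of the tokens, marking formatting tokens as disabled."""
    blocked = set(to_disable)
    result = []
    for token in tokens:
        token = dict(token)  # copy so the input stays untouched
        if token["text"] in blocked:
            token["disabled"] = True
        result.append(token)
    return result


task = {
    "text": "<b> Hello </b>",
    "tokens": [
        {"text": "<b>", "start": 0, "end": 3, "id": 0},
        {"text": "Hello", "start": 4, "end": 9, "id": 1},
        {"text": "</b>", "start": 10, "end": 14, "id": 2},
    ],
}
task["tokens"] = disable_tokens(task["tokens"], FORMATTING_TOKENS)
```

After this step only the "Hello" token remains selectable, so annotators cannot accidentally highlight markup.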
v1.4.2 2018-04-10
This update includes various bug fixes and efficiency improvements.
new | Allow custom HTML in classification interface. |
new | Allow pre-defined selections in choice interface, e.g. "accept": [1, 3]. |
fix | Improve memory usage of terms.teach . |
fix | Fix data integrity error when dropping datasets using MySQL. |
fix | Fix bug in error message of custom recipe validation introduced in v1.4.1. |
fix | Resolve problem with image preloading in image interfaces. |
fix | Make keyboard shortcuts work as expected in choice interface. |
doc | Fix various typos and inconsistencies. |
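To make the pre-defined selections entry above more concrete, here is a minimal sketch of a choice task with two options selected in advance. It assumes that options are dicts with "id" and "text" keys and that "accept" holds the IDs of the pre-selected options; the example text and option names are made up.

```python
import json

# Assumption: each option is a dict with "id" and "text", and "accept"
# lists the IDs of the options that start out selected.
task = {
    "text": "Pick the topics that apply",
    "options": [
        {"id": 1, "text": "Sports"},
        {"id": 2, "text": "Politics"},
        {"id": 3, "text": "Technology"},
    ],
    "accept": [1, 3],
}

# Sanity check: every pre-selected ID should refer to an existing option.
option_ids = {option["id"] for option in task["options"]}
unknown = [i for i in task["accept"] if i not in option_ids]
if unknown:
    raise ValueError(f"Pre-selected IDs without a matching option: {unknown}")

line = json.dumps(task)  # one JSONL line
print(line)
```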
v1.4.1 2018-03-26
This update improves efficiency of the ner.batch-train recipe and fixes the handling of task and input hashes in the database methods and --exclude option. It also comes with various improvements to error messages and web app stability.
new | Improve efficiency of ner.batch-train – up to 10× faster for some workloads! |
fix | Fix problem that would cause text classification tasks created from pattern matches to not have a label assigned to the task. |
fix | Ensure that --exclude logic is always applied after the stream is (re)hashed. |
fix | Fix bug that would cause hashes to not be returned correctly by the database. |
fix | Allow the "instructions" setting to be false or null. |
fix | Improve error messages if recipe file is not valid and if dataset doesn’t exist in terms.to-patterns. |
fix | Various improvements to UI and web app stability. |
v1.4.0 2018-03-11
This update includes a new annotation interface for relations and dependencies, as well as an experimental dep.teach recipe. textcat.teach now takes a file of match patterns instead of seed terms, and manual interfaces now support lists of up to 30 labels with keyboard shortcuts. We’ve also improved the customization of various components.
new | Dependency and relation annotation interface and dep.teach, dep.batch-train and dep.train-curve recipes for training a dependency parsing model. Still experimental! |
new | Allow using textcat.teach with a patterns file instead of seed terms. |
new | Support list view and keyboard shortcuts for larger label sets in manual interfaces. |
new | Add option to display modal with annotation instructions. |
new | Allow skipping examples with mismatched tokenization in add_tokens. |
new | Make swipe gestures optional via "swipe": true. |
new | Allow overwriting the host and port via PRODIGY_HOST and PRODIGY_PORT environment variables. |
new | Add split_sents_threshold config setting and --unsegmented command-line option to disable sentence segmentation. |
new | Update NewsAPI loader to use v2. |
fix | Prevent MySQL server from timing out between requests. |
fix | Correctly port over spans in split_sentences preprocessor. |
fix | Always add labels from examples and --labels in ner.batch-train and consistently allow loading label sets from a string or a text file. |
fix | Fix issue that caused print recipes to not display colors when piped to less. |
fix | Ensure that pre-set task meta isn’t overwritten in the PatternMatcher. |
fix | Show error message in the web app if view_id is invalid. |
doc | Add live demo for new dep interface. |
doc | Add Prodigy Cookbook with quick solutions to various tasks. |
doc | Add glossary to “First Steps” workflow. |
doc | Order recipes in PRODIGY_README.html table of contents by type. |
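Since textcat.teach now takes a patterns file rather than seed terms, the following sketch shows one way such a file could be generated from a list of terms. It assumes a JSONL layout with one {"label", "pattern"} object per line, where the pattern is a list of token descriptions matched on the lowercase form; the label and terms are hypothetical.

```python
import json

# Hypothetical label and terms, used for illustration only.
label = "FRUIT"
terms = ["apple", "blood orange"]


def term_to_pattern(label, term):
    """Turn a seed term into a token-based match pattern.

    Assumption: one token description per whitespace-separated word,
    matched case-insensitively via the "lower" attribute.
    """
    return {"label": label, "pattern": [{"lower": w.lower()} for w in term.split()]}


patterns = [term_to_pattern(label, term) for term in terms]
lines = [json.dumps(p) for p in patterns]
print("\n".join(lines))
```

The printed lines could be saved to a .jsonl file and passed to the recipe in place of the old seed terms.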
v1.3.0 2018-02-01
This update introduces a new ner.make-gold recipe that lets you create gold-standard data faster by manually correcting a model’s predictions. We’ve also added a new pos.make-gold recipe for annotating part-of-speech tags, as well as converters to create spaCy training data from Prodigy datasets.
new | Improved ner.make-gold workflow: run a model over your text and manually correct the entities to create gold-standard data. |
new | Add "ner_manual_label_style" option to display label set as list or dropdown (always uses dropdown for more than 10 labels) and add number keyboard shortcuts to list of labels. |
new | Experimental pos.make-gold recipe for manual POS annotation. |
new | Experimental ner.gold-to-spacy and pos.gold-to-spacy converters. |
new | Add option for custom label color schemes for NER and POS tagging. |
new | Add UI option to “flag” tasks to bookmark them for later via "show_flag" setting and a flag icon and f keyboard shortcut. Add --flagged-only setting to db-out command. |
new | Rename split_tokens pre-processor to add_tokens. |
fix | Fix rendering and use icons for whitespace tokens in ner_manual . |
fix | Fix rendering of RTL languages in manual interfaces via "writing_dir" setting. |
fix | Overwrite database settings correctly when using connect(). |
fix | Fix bug in logging timestamp and log minutes correctly. |
fix | Only use colored CLI output if supported by user’s terminal. |
fix | Don’t disable entity recognizer in textcat.batch-train. |
doc | Document preprocessor components and models’ batch_train methods. |
doc | Fix various typos and add more examples. |
doc | Add docstrings to internals so they can be inspected using help(). |
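The settings introduced in this release could be combined roughly as in the sketch below, which renders them as a JSON config fragment. The setting names come from the entries above; the values shown, and the idea of keeping them together in a single JSON config file, are assumptions for illustration.

```python
import json

# Setting names are taken from the changelog entries above; the values
# and the single-file layout are assumptions.
settings = {
    "ner_manual_label_style": "dropdown",  # "list" or "dropdown"
    "show_flag": True,                     # flag icon and f keyboard shortcut
    "writing_dir": "rtl",                  # assumed value for right-to-left text
}

rendered = json.dumps(settings, indent=2)
print(rendered)
```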
v1.2.0 2018-01-09
This update introduces ner.manual, a new recipe and interface for manual NER annotation. You can now highlight one or more text spans per task and select the entity label from a dropdown menu. To allow faster annotation and less fiddly clicking, token boundaries are used to determine the entity spans when highlighting them. Note that this workflow replaces ner.mark and boundaries.
new | ner.manual recipe and interface for manual NER annotation. |
new | "card_css" option to inject custom CSS into annotation card. |
new | Experimental "show_whitespace" option for basic ner interface. |
fix | Make --exclude argument and recipe option work as expected. |
fix | Don’t merge and modify NER spans before adding example to the database. |
doc | Document API of PatternMatcher model. |
doc | Improve formatting of available recipes in prodigy --help. |
doc | Fix various typos and inconsistencies. |
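The idea of using token boundaries to determine entity spans can be sketched as follows: a sloppy character selection is expanded to cover every token it overlaps. This is only an illustration of the general technique, not the web app's actual implementation; the function name snap_to_tokens and the whitespace tokenization are assumptions.

```python
def snap_to_tokens(tokens, sel_start, sel_end):
    """Expand a character selection to cover every token it overlaps.

    `tokens` is a list of dicts with "start" and "end" character offsets,
    sorted by position. Returns (start, end) of the snapped span, or None
    if the selection does not overlap any token.
    """
    touched = [t for t in tokens if t["start"] < sel_end and t["end"] > sel_start]
    if not touched:
        return None
    return touched[0]["start"], touched[-1]["end"]


text = "Apple opens office in San Francisco"

# Naive whitespace tokenization, for illustration only.
tokens = []
offset = 0
for word in text.split(" "):
    tokens.append({"text": word, "start": offset, "end": offset + len(word)})
    offset += len(word) + 1

# A selection from the middle of "San" to the middle of "Francisco".
span = snap_to_tokens(tokens, 23, 30)
print(span, text[span[0]:span[1]])
```

Because the span always starts and ends on a token boundary, the annotator does not need to hit the exact first and last character of an entity.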
v1.1.0 2017-12-18
new | Automatically add new entity labels in ner.batch-train. |
new | Improve speed during NER training and allow setting the beam width via CLI. |
new | Filter out ignored examples before creating training and evaluation sets. |
new | Re-add improved version of ner.eval recipe. |
new | Handle broken JSONL in Reddit loader. |
new | Use spaCy model to assign labels in ner.print-stream. |
doc | Small improvements to documentation. |
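Filtering out ignored examples before creating the training and evaluation sets might look roughly like the sketch below. It assumes each annotated example stores the annotator's decision under an "answer" key with the value "ignore" for skipped tasks; the function name split_examples, the split ratio and the sample data are hypothetical.

```python
import random


def split_examples(examples, eval_split=0.25, seed=0):
    """Drop ignored answers, then shuffle and split into train and eval sets."""
    kept = [eg for eg in examples if eg.get("answer") != "ignore"]
    random.Random(seed).shuffle(kept)
    n_eval = int(len(kept) * eval_split)
    return kept[n_eval:], kept[:n_eval]


# Ten made-up examples, two of which were ignored by the annotator.
answers = ["accept", "reject", "ignore", "accept", "accept",
           "reject", "ignore", "accept", "reject", "accept"]
examples = [{"text": f"example {i}", "answer": a} for i, a in enumerate(answers)]

train, evals = split_examples(examples)
print(len(train), len(evals))
```

Filtering before the split keeps the evaluation set at the intended share of the usable examples, rather than of the raw dataset.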