Radically efficient machine teaching.
An annotation tool powered by active learning.
pip install ./prodigy.whl
Successfully installed prodigy
prodigy ner.manual reviews_ner en_core_web_sm ./data.jsonl --label PRODUCT,PERSON,ORG
✨ Starting the web server on port 8080...
Open the app in your browser and start annotating!
Train a new AI model in hours
Prodigy is a scriptable annotation tool so efficient that data scientists can do the annotation themselves, enabling a new level of rapid iteration.
Today’s transfer learning technologies mean you can train production-quality models with very few examples. With Prodigy you can take full advantage of modern machine learning by adopting a more agile approach to data collection. You'll move faster, be more independent and ship far more successful projects.
The missing piece in your data science workflow
Prodigy brings together state-of-the-art insights from machine learning and user experience. With its continuous active learning system, you're only asked to annotate examples the model does not already know the answer to. The web application is powerful, extensible and follows modern UX principles. The secret is very simple: it's designed to help you focus on one decision at a time and keep you clicking – like Tinder for data.
Everyone knows data scientists should spend more time looking at their data. When good habits are hard to form, the trick is to remove the friction. Prodigy makes the right thing easy, encouraging you to spend more time understanding your problem and interpreting your results.
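The idea behind asking only about examples the model is unsure of can be sketched in a few lines. This is a minimal illustration of uncertainty sampling in general, not Prodigy's actual implementation: the function name `select_uncertain`, the `margin` parameter, and the assumption that the model gives each example a score between 0 and 1 are all hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def select_uncertain(
    scored: Iterable[Tuple[float, dict]], margin: float = 0.2
) -> Iterator[dict]:
    """Yield only the examples whose score lies within `margin` of 0.5.

    Hypothetical helper: `scored` is assumed to be (score, example) pairs,
    where score is the model's probability for the label in question.
    """
    for score, example in scored:
        # A score near 0.5 means the model cannot decide, so a human
        # answer on this example is the most informative.
        if abs(score - 0.5) <= margin:
            yield example


scored_examples = [
    (0.95, {"text": "clearly positive"}),
    (0.55, {"text": "borderline"}),
    (0.10, {"text": "clearly negative"}),
    (0.40, {"text": "also borderline"}),
]
to_annotate = list(select_uncertain(scored_examples))
```

Here only the two borderline examples are queued for annotation; the confident predictions are skipped.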
Try out new ideas quickly
Annotation is usually the part where projects stall. Instead of having an idea and trying it out, you start scheduling meetings, writing specifications and dealing with quality control. With Prodigy, you can have an idea over breakfast and get your first results by lunch. Once the model is trained, you can export it as a versioned Python package, giving you a smooth path from prototype to production.
What others say
Andy Halterman (@ahalterman)
Mordecai would not have been possible without @explosion_ai's Prodigy. A lack of labeled data held geoparsing back for years. It took a week to fix that with Prodigy.

Raphael Cohen (@cohenrap)
Prodi.gy is by far the best ROI we had on any tool!

FullFact (@FullFact)
We've collected 25,000+ annotations from 90 fantastic volunteers, to support our automated factchecking work thanks to Prodigy, an annotation tool created by @explosion_ai.

Andrew Trask (@iamtrask)
I'm a huge fan of everything @explosion_ai does... they're brilliant... and their new annotation tool is the best I've ever seen.

Oliver Beavers (@oliverbeavers)
just finishing up first major #nlp project with @explosion_ai's prodigy active learning platform. in 3 hours, did what took > 10 volunteers, painful google sheets nonsense, and weeks worth of time. game changer. #yesimshilling

David Campion (@Orbis_21)
“Text Classification: Be lazy, use Prodi.gy (a tool by @explosion_ai) !”. This tool (prodi.gy) is fantastic and really help us to speed-up and build our models.

Ajinkya Kale (@ajinkyakale)
Its amazing, every time i try to build something in house these guys beat me at it providing an awesome solution out of the box!
Fully scriptable and extensible
Prodigy is fully scriptable, and slots neatly into the rest of your Python-based data science workflow. As the makers of spaCy, a popular library for Natural Language Processing, we understand how to make tools programmers love. The simple secret is this: programmers want to be able to program. Good developer tools need to let you in, not lock you out. That's why Prodigy comes with a rich Python API, elegant command-line integration, and a super productive Jupyter extension. Using custom recipe scripts, you can adapt Prodigy to read and write data however you like, and plug in custom models using any of your favourite frameworks.
recipe.py

import prodigy
from prodigy.components.loaders import JSONL

@prodigy.recipe("custom")
def custom_recipe(dataset, source):
    return {
        "dataset": dataset,
        "stream": JSONL(source),
        "view_id": "classification"
    }
Command-line usage
prodigy custom my_dataset ./data.jsonl -F recipe.py
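Because the stream in a recipe is an iterable of task dictionaries, reading data "however you like" can be as simple as writing a generator. The sketch below is one possible stand-in for `JSONL(source)` in the recipe above, written in plain Python so it runs without Prodigy installed. The function name `labelled_stream`, the fixed label, and the assumption that every input line is a JSON object with a "text" field are illustrative, not part of Prodigy's API.

```python
import json
from typing import Iterator


def labelled_stream(path: str, label: str) -> Iterator[dict]:
    """Read a JSONL file and yield one task dict per non-empty line.

    Hypothetical helper: assumes each line is a JSON object with a
    "text" key, and attaches the same candidate label to every task.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines instead of failing on them
            record = json.loads(line)
            yield {"text": record["text"], "label": label}
```

In a recipe, the returned dictionary could then use something like "stream": labelled_stream(source, "POSITIVE") in place of the built-in loader, assuming the chosen interface accepts tasks with "text" and "label" keys.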