Deliberate Training Data Poisoning Project Highlights AI Data Quality Challenges
Published 2026-04-21 | Ingested 2026-04-22 | AI Regulation and Governance | Low
Summary
Simon Willison highlighted a GitHub project by Steve Cosman called 'pelicans_riding_bicycles,' which deliberately publishes mislabeled image-text pairs (e.g., labeling a bear on a snowboard as a 'pelican riding a bicycle') with the explicit goal of polluting AI training datasets. Willison noted with approval that this effort joins a broader set of data poisoning examples, including some he has published himself. The project underscores ongoing concerns about the integrity of web-scraped trainin
Alignment: Neutral
Related Positions: ai-governance-and-risk.md
training-data-poisoning, data-quality, foundation-models, ai-governance, data-provenance, adversarial-data, web-scraping, open-source, ai-safety, model-evaluation