Appen includes new speech and text datasets for AI training

Appen Limited, a provider of high-quality training data for organizations that build effective AI systems at scale, is offering new off-the-shelf (OTS) datasets, designed to make it easier and faster for businesses to acquire the high-quality training data needed to accelerate their artificial intelligence (AI) and machine learning (ML) projects.

The new OTS datasets include human body movement and innovative baby crying sounds, as well as scripted speech and images with text suitable for optical character recognition (OCR) for high-demand but hard-to-acquire languages, such as Arabic, Croatian, Greek, Hungarian, Thai, and more.

With the expanded datasets, Appen’s total OTS offering includes over 250 datasets, comprising over 11,000 hours of audio, over 25,000 images, and over 8.7 million words across 80 languages and multiple dialects.

Appen’s OTS datasets are a fast, cost-effective tool to jumpstart an AI or ML project with consistent high-quality training data. Teams expanding their AI capabilities can also leverage OTS datasets to effectively improve accuracy, develop new model skills and incorporate other improvements into their AI models.

All Appen datasets are developed using a fully transparent, opt-in methodology, so AI specialists can be assured their data is clean and compliant, eliminating the potential risk of backlash and reputation damage.

“AI teams around the world working on projects with tight deadlines and flexible data requirements can benefit from using off-the-shelf datasets,” said Wilson Pang, CTO of Appen. “OTS datasets shorten time to value and provide access to high-quality data at a lower total cost than using traditional methods. We at Appen take the necessary steps to ensure that all our datasets are ethically sourced and demographically balanced, enabling companies to maintain responsible AI practices by minimizing bias in their models and ensuring fair treatment of data annotators.”

The most experienced AI experts combine OTS datasets with on-demand data collection and annotation projects to meet their complex AI model training data needs.

Appen is the leader in offering continued support through a range of specific data collection services, such as ongoing data annotation and smart labeling, delivered with AI-powered tools and automated workflows to maximize efficiency.

For more information about this news, visit https://appen.com.
