Business Challenge
A client’s development of an auto-caption feature faced significant challenges due to global events such as COVID-19 and rising geopolitical tensions. The client needed to rapidly scale data annotation, transcription, and translation for its language detection models across multiple languages. However, internal recruitment could not keep pace with the project’s demands because of several key obstacles:
- The sheer volume of content requiring annotation, transcription, and translation.
- Tight deadlines to complete the required work.
- The sensitive and graphic nature of some content.
- Geopolitical and logistical complexities in sourcing native speakers from remote regions.
- Limited access to technology and internet in underserved communities where the required languages and dialects are spoken.
A clear example of these challenges arose during a 2022 project requiring native Urdu speakers. Geopolitical tensions along the India-Pakistan border disrupted recruitment efforts, forcing the client to pivot quickly. UNIO stepped in to source and train transcribers and deliver the required transcriptions, successfully completing the project within the necessary timeframe.
Project Requirements
Using UNIO’s global network, we sourced, verified, and trained Urdu transcribers for the client’s speech-to-text software, while collaborating with their data engineers to develop a tailored workflow addressing key requirements. This entailed:
- Security and compliance improvements:
  - Updated SOPs for audits and implemented technological enhancements.
  - Regular spot checks.
  - Immediate feedback loops.
  - Weekly syncs with the client to monitor progress and optimise outcomes.
- Comprehensive verification process:
  - Confirmed worker identities, locations, and transcription proficiency.
  - Conducted background checks and testing, and executed NDAs.
- Training sessions and on-the-job shadowing, ensuring that the team:
  - Consistently met quality standards.
  - Adhered to tight deadlines.
- Scaling operations to meet increasing demand for Urdu transcription projects, with a strategic focus on capacity for larger initiatives while maintaining agility for smaller, time-sensitive tasks.
Key Success Metrics
- Initiated 2 Urdu data acquisition projects, sourcing diverse datasets to meet the client’s AI model objectives and establishing the ground truth for subsequent categorisation and annotation efforts.
- Completed 4 Urdu categorisation projects, with our teams accurately labeling 2,036 QIDs (approximately 60,000 minutes) of audio content to train the client’s model.
- Delivered 38,310 minutes of fully transcribed and annotated data across 5 Urdu long-form public audio projects.
- Between 2019 and 2021, Urdu projects were evenly distributed between small teams (1–10 workers) and medium teams (11–50 workers), with only one large-scale project (51+ workers) completed. In 2022, we met the client’s increased demands by:
  - Delivering 3 large-scale projects, representing 200% growth in capacity for complex initiatives.
  - Reallocating resources to smaller, high-priority projects, resulting in a 40% increase in small, agile, short-timeframe projects (from 5 to 7).
- UNIO Global’s team completed 100% of the client’s 2022 requirements for Urdu, eliminating the need for input from their in-house teams.
- Processed a total of 40,990 minutes of Urdu content in 2022, maintaining high accuracy and full compliance with client standards.
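The growth percentages quoted above can be reproduced with a short calculation. The sketch below is illustrative only: the function name `percent_change` is our own, and the baseline of 1 large-scale project is an assumption taken from the 2019–2021 figure stated in this section.

```python
def percent_change(before: float, after: float) -> float:
    """Return the percentage change from `before` to `after`."""
    if before == 0:
        raise ValueError("percentage change is undefined for a zero baseline")
    # Multiply before dividing so whole-number inputs stay exact.
    return (after - before) * 100.0 / before


# Large-scale projects: assumed baseline of 1 (2019-2021), 3 delivered in 2022.
large_scale_growth = percent_change(1, 3)

# Smaller agile projects: 5 before, 7 in 2022 (figures quoted in the text).
small_project_growth = percent_change(5, 7)

print(f"Large-scale project growth: {large_scale_growth:.0f}%")
print(f"Small project growth: {small_project_growth:.0f}%")
```

Both results match the figures reported in the metrics: 200% for large-scale projects and 40% for smaller projects.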




