
Open
Ends in 1 hour
Paid on delivery
Project Description

We are looking for experienced freelancers or teams to provide or build a high-quality call center conversation dataset for AI training. We are open to both:
* Custom data collection and annotation
* Ready-made (pre-built) datasets that meet requirements

Languages Required
* English – 500 hours
* Hindi – 500 hours
* Spanish – 500 hours

Total: 1500 hours

Data Requirements
* Call center style agent–customer conversations only
* Clean audio (no background noise)
* Minimal silence and natural flow
* WAV format
* 16 kHz sampling rate (16-bit or higher)
* Single channel preferred

Transcription Requirements
* Full transcription required
* Speaker labels (Agent / Customer)
* Timestamped alignment required

Metadata Requirements (Must Confirm)
* Source type (recorded or pre-built dataset)
* Transcription type (AI or human-annotated)
* Ability to refine AI transcripts if applicable
* Speaker diarization accuracy
* Timestamp alignment accuracy
* Estimated WER (Word Error Rate)
* Audio sampling rate details

Privacy & Compliance
* All personal data must be anonymized or removed
* Method of de-identification must be clearly explained
* Data must be legally collected with proper consent
* Only for internal AI model training use

Deliverables
* WAV audio files
* Transcript files (TXT or JSON)
* Metadata (speaker labels, timestamps, language info)

Requirements from Freelancer

Please include:
* Experience with speech datasets or ASR projects
* Whether you can provide ready-made datasets
* Tools and workflow used
* Production capacity
* Estimated cost per hour
* Delivery timeline

Important Note

We are open to **ready-made datasets as long as they fully meet the above requirements and are legally authorized for AI training use**.
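For bidders who want to spot-check a sample before submitting it, the audio requirements and the WER estimate listed above could be verified roughly as follows. This is a minimal sketch using only the Python standard library, not an official acceptance test: the function names `check_wav` and `word_error_rate` are hypothetical, "16-bit or higher" is interpreted here as a sample width of at least 2 bytes, the single-channel preference is reported like any other mismatch, and WER is computed on whitespace-separated words with no text normalization.

```python
import wave


def check_wav(path, expected_rate=16000, min_sample_width_bytes=2, expected_channels=1):
    """Return a list of mismatches against the listed audio requirements.

    An empty list means the file is PCM WAV at the expected rate, bit depth,
    and channel count. Thresholds are assumptions taken from the posting.
    """
    problems = []
    # wave.open raises wave.Error for files that are not PCM WAV.
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != expected_rate:
            problems.append(f"sample rate is {wf.getframerate()} Hz, expected {expected_rate} Hz")
        if wf.getsampwidth() < min_sample_width_bytes:
            problems.append(f"bit depth is {wf.getsampwidth() * 8}-bit, expected 16-bit or higher")
        if wf.getnchannels() != expected_channels:
            problems.append(f"{wf.getnchannels()} channels, single channel preferred")
    return problems


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)
```

A reported WER only makes sense alongside the normalization used (casing, punctuation, number formatting), so a bid that quotes a WER figure would ideally state those choices as well.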
Project ID: 40470137
9 proposals
Open for bidding
Remote project
Active 5 days ago
9 freelancers are bidding on average $8,163 USD for this job

Hello, I trust you're doing well. I have nearly a decade of hands-on experience with machine learning algorithms. My expertise lies in developing artificial intelligence algorithms, including the kind you require, using Matlab, Python, and similar tools. I hold a doctorate from Tohoku University and have a number of publications on the subject. Your project piqued my interest, and I would be delighted to be part of it. Let's connect to discuss the details. Warm regards. My portfolio, which showcases my past work, is available for your review: https://www.freelancer.com/u/sajjadtaghvaeifr
$7,505 USD in 7 days
7.2

Hi,

Your project is a strong fit for large-scale ASR and conversational AI training workflows, especially with the requirement for multilingual call center datasets and structured transcription quality.

I can help with:
* Custom call-center style dataset collection
* Ready-made dataset sourcing (where legally permitted)
* Audio cleaning and normalization
* Speaker diarization and timestamp alignment
* Human-reviewed transcription refinement
* Metadata generation and QA validation
* Data anonymization and compliance workflows

Supported languages:
* English
* Hindi
* Spanish

The delivered datasets can include:
* WAV audio files (16kHz, clean single-channel preferred)
* Timestamped transcripts
* Agent/Customer speaker labels
* JSON/TXT structured outputs
* WER and diarization quality reporting
* Consent and de-identification documentation

Workflow/tools can involve:
* Whisper / Deepgram / AWS Transcribe / Google STT
* Human QA pipelines
* Diarization tools
* Audio preprocessing and validation systems

The focus will be on:
* Legally compliant AI-training datasets
* High transcript accuracy
* Consistent metadata structure
* Scalable production and delivery

I’d be happy to discuss production capacity, delivery phases, dataset sourcing options, and QA expectations in more detail.

Best regards,
Somender Singh
$7,500 USD in 7 days
0.0

I am a perfect fit for your project to deliver a high-quality, multi-language call center conversation dataset that meets your strict audio, transcription, and metadata requirements for AI training. I will ensure clean audio, precise speaker labeling, accurate timestamp alignment, and full compliance with privacy regulations for seamless integration. While I am new to Freelancer, I have strong experience and have delivered similar speech dataset projects outside the platform, including custom annotations and quality control. I offer a free consultation to better understand your needs and propose the best technical approach for data collection or ready-made sourcing. I would love to chat more about your project! Regards, Sonny Dube
$6,750 USD in 14 days
0.0

⭐ I handled a similar project ⭐ and am happy to show you what works before you commit. In that project, a comprehensive multilingual call center conversation dataset with clean audio and detailed transcriptions was developed. Your need for high-quality, annotated call center speech data matches my expertise. I am experienced in speech data collection, annotation, and strict privacy compliance, and I specialize in audio dataset creation with a focus on precise speaker labeling, timestamp alignment, and high audio quality for AI training. Let's chat about your project. Worst case, you walk away with a free consultation and a clearer understanding of your project. Kind regards, Curtley
$10,500 USD in 30 days
0.0

Zhob, Pakistan
Payment method verified
Member since Apr 4, 2022