Recent work in speech recognition focuses on improving performance across diverse languages and contexts, particularly for low-resource languages. Vividh-ASR and Ethio-ASR address challenges in multilingual models, while Whisper-CD improves long-form transcription accuracy. These developments matter to builders of inclusive, robust speech applications that serve underrepresented languages and dialects. Newly curated datasets and optimized training strategies are making speech recognition systems more reliable and efficient across linguistic variation and real-world conditions, which improves the user experience and broadens accessibility.
Topic-specific paper and score movement from the daily diff ledger.
Fine-tuning multilingual ASR models like Whisper for low-resource languages often improves read speech but degrades spontaneous audio performance, a phenomenon we term studio-bias. To diagnose this mi...
Long-form speech recognition with large encoder-decoder models such as Whisper often exhibits hallucinations, repetition loops, and content omissions. These errors can accumulate and be further amplifi...
Nepal Bhasha (Newari), an endangered language of the Kathmandu Valley, remains digitally marginalized due to the severe scarcity of annotated speech resources. In this work, we introduce Nwāchā Munā, ...
Despite significant advances in speech processing, Portuguese remains under-resourced due to the scarcity of public, large-scale, and high-quality datasets. To address this gap, we present a new datas...
We present Ethio-ASR, a suite of multilingual CTC-based automatic speech recognition (ASR) models jointly trained on five Ethiopian languages: Amharic, Tigrinya, Oromo, Sidaama, and Wolaytta. These la...
As pretrained large language models replace task-specific decoders in speech recognition, a critical question arises: do their text-derived priors make recognition fairer or more biased across demogra...
Model merging is a scalable alternative to multi-task training that combines the capabilities of multiple specialised models into a single model. This is particularly attractive for large speech found...
Automatic Speech Recognition (ASR) performance is heavily dependent on the availability of large-scale, high-quality datasets. For low-resource languages, existing open-source ASR datasets often suffe...
Chinese Mandarin visual speech recognition (VSR) is a task that has advanced in recent years, yet still lags behind the performance on non-tonal languages such as English. One primary challenge arises...
Taiwanese Hakka is a low-resource, endangered language that poses significant challenges for automatic speech recognition (ASR), including high dialectal variability and the presence of two distinct w...
Canonical route: /topics
Agent Handoff
Canonical ID speech-recognition | Route /topic/speech-recognition
REST example
curl https://sciencetostartup.com/api/v1/agent-handoff/topic/speech-recognition
MCP example
{
  "tool": "search_papers",
  "arguments": {
    "query": "Speech Recognition",
    "cluster": "Speech Recognition"
  }
}
source_context
{
  "surface": "topic",
  "mode": "topic",
  "query": "Speech Recognition",
  "normalized_query": "speech-recognition",
  "route": "/topic/speech-recognition",
  "paper_ref": null,
  "topic_slug": "speech-recognition",
  "benchmark_ref": null,
  "dataset_ref": null
}
Use This Via API or MCP
Topic pages bundle paper counts, viability trends, author concentration, and top questions into one canonical surface your agents can reference before they open Signal Canvas or create a workspace.
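For agents that assemble these requests programmatically, the REST URL and the MCP tool call shown on this page can be built from a topic query. The following is a minimal Python sketch, not an official client. The helper names (`slugify`, `handoff_url`, `search_papers_call`, `fetch_handoff`) are hypothetical, the slug rule is an assumption inferred from the single `normalized_query` example above, and `fetch_handoff` assumes the endpoint returns JSON, which this page does not confirm.

```python
import json
import re
import urllib.request

# Base of the REST route shown in the curl example on this page.
BASE_URL = "https://sciencetostartup.com/api/v1/agent-handoff/topic/"


def slugify(query: str) -> str:
    """Assumed normalization: lowercase, non-alphanumeric runs become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", query.lower()).strip("-")


def handoff_url(topic_slug: str) -> str:
    """Build the agent-handoff URL for a topic slug."""
    return BASE_URL + topic_slug


def search_papers_call(query: str) -> dict:
    """Mirror the MCP example: the same string is used for query and cluster."""
    return {
        "tool": "search_papers",
        "arguments": {"query": query, "cluster": query},
    }


def fetch_handoff(topic_slug: str, timeout: float = 10.0) -> dict:
    """Fetch the handoff payload. Assumes a JSON response body (unverified)."""
    with urllib.request.urlopen(handoff_url(topic_slug), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    slug = slugify("Speech Recognition")
    print(handoff_url(slug))
    print(json.dumps(search_papers_call("Speech Recognition"), indent=2))
```

Running the script prints the URL and the tool-call payload without touching the network; `fetch_handoff` is the only function that issues a request.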