Dr. Wang Shih-Chang, Director of the Advanced Technology Research Institute at Chunghwa Telecom Laboratories, stated:
“Chunghwa Telecom's self-developed Mandarin-English bilingual speech synthesis technology has been applied across various sectors over the years—not only in 24/7 customer service hotlines, but also in services like the 166 and 167 weather forecast lines, accessibility tools for the visually impaired, the iBaby smart speaker, AI Semantic Cloud, and Smart Broadcasting Assistant. It's also used in voice systems for the National Health Insurance Administration and the National Fire Agency.”
Before integrating AI technologies, Chunghwa Telecom had already invested in developing realistic digital voice synthesis. The company progressed from early concatenative and parametric synthesis to deep learning-based AI models. These models are trained on large-scale audio datasets paired with the corresponding text to produce highly natural speech. However, the training process remained time-consuming.
To address this, Chunghwa Telecom adopted NVIDIA’s Triton Inference Server for large-scale model inference, together with TensorRT (a deep learning inference optimizer) and the cuDNN library for GPU acceleration. With support from NVIDIA Elite Partner FONGCON, the company also deployed powerful hardware, including the NVIDIA DGX-1 Supercomputer, RTX A6000 GPUs, and multiple NVIDIA T4 Tensor Core GPUs.
These upgrades reduced model training time from several days to just one day, while also significantly lowering overall costs. The result is more efficient development of natural, human-like bilingual voice synthesis models that support a wide range of smart services and improve user interaction.
Source (in Chinese): https://reurl.cc/aaVxO3