Speech synthesis is the process of converting written text into machine-generated speech. Concatenative speech synthesis systems form utterances by concatenating pre-recorded speech units, and corpus-based methods select those units from a large inventory. In this paper, we design and develop an intelligible and natural-sounding corpus-based concatenative speech synthesis system for the Turkish language. The implemented system contains a front-end comprising text analysis, phonetic analysis, and optional use of transplanted prosody. The unit selection algorithm is based on the commonly used Viterbi decoding algorithm, which finds the best path through the network of speech units using spectral discontinuity and prosodic mismatch as objective cost measures. The back-end generates the speech waveform using harmonic coding of speech and an overlap-and-add mechanism. Harmonic coding enabled us to compress the unit inventory by a factor of three. In this study, a Turkish phoneme set has been designed and a pronunciation lexicon for root words has been constructed. The importance of prosody in unit selection has been investigated by using transplanted prosody. A Turkish Diagnostic Rhyme Test (DRT) word list that can be used to evaluate the intelligibility of Turkish Text-to-Speech (TTS) systems has been compiled. Several experiments have been performed to evaluate the quality of the synthesized speech; our system, which is the first unit-selection-based system published for Turkish, obtained a Mean Opinion Score (MOS) of 4.2 in the listening tests.
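The best-path search described above can be illustrated with a minimal sketch. It is not the system's implementation: the function name `select_units`, the representation of a unit as a `(pitch, spectrum)` pair of scalars, and both toy cost functions are assumptions made for illustration. The sketch only shows how a Viterbi search combines a per-unit prosodic mismatch cost with a pairwise spectral discontinuity cost.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical unit representation: (pitch in Hz, one scalar spectral feature).
Unit = Tuple[float, float]


def select_units(
    candidates: Sequence[Sequence[Unit]],
    target_cost: Callable[[int, Unit], float],
    join_cost: Callable[[Unit, Unit], float],
) -> List[Unit]:
    """Return the lowest-cost path with one unit per target position."""
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every target position needs at least one candidate")

    # best[t][j]: cheapest cumulative cost of any path ending in candidates[t][j].
    best = [[target_cost(0, u) for u in candidates[0]]]
    # back[t][j]: index of the predecessor chosen for candidates[t][j].
    back = [[-1] * len(candidates[0])]

    for t in range(1, len(candidates)):
        row_cost, row_back = [], []
        for unit in candidates[t]:
            # Cost of reaching this unit from each candidate at position t - 1.
            costs = [
                best[t - 1][i] + join_cost(prev, unit)
                for i, prev in enumerate(candidates[t - 1])
            ]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            row_cost.append(costs[i_min] + target_cost(t, unit))
            row_back.append(i_min)
        best.append(row_cost)
        back.append(row_back)

    # Trace back from the cheapest final unit.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path: List[Unit] = []
    for t in range(len(candidates) - 1, -1, -1):
        path.append(candidates[t][j])
        j = back[t][j]
    return path[::-1]


# Toy costs (assumptions): pitch distance to a target contour stands in for
# prosodic mismatch; the spectral jump at the join stands in for discontinuity.
target_pitch = [100.0, 100.0]


def prosodic_mismatch(t: int, unit: Unit) -> float:
    return abs(unit[0] - target_pitch[t])


def spectral_discontinuity(prev: Unit, unit: Unit) -> float:
    return abs(prev[1] - unit[1])


candidates = [
    [(100.0, 0.0), (101.0, 5.0)],
    [(100.0, 5.0), (103.0, 0.0)],
]
chosen = select_units(candidates, prosodic_mismatch, spectral_discontinuity)
print(chosen)
```

In this toy inventory, picking the best prosodic match at each position independently would join two units with a large spectral jump. The path search instead accepts a slightly worse pitch match at the first position to obtain a smooth join, which is the trade-off the two cost measures are meant to capture.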
SAK, HAŞİM; GÜNGÖR, TUNGA; and SAFKAN, YAŞAR (2006) "A Corpus-Based Concatenative Speech Synthesis System for Turkish," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 14: No. 2, Article 1. Available at: https://journals.tubitak.gov.tr/elektrik/vol14/iss2/1