Evaluation

== Towards freely usable software and datasets for assessment purposes ==
Assessing speech synthesis is not as easy as assessing speech recognition, for various reasons:
* Various criteria can be used (do we assess speech intelligibility, or speech naturalness, or the efficiency of the speech component in a given application, etc.).
* It systematically requires subjective tests by human listeners, which makes assessment a heavy task.
* Assessing the overall quality of a TTS system often gives little information on how to improve it, since the output is the result of several complex and intermixed processes.
== Freely usable software ==


It is generally agreed that the development of free software can boost the assessment and improvement of technologies. As far as speech synthesis is concerned, the community has made several important contributions over the past 10 years. See the [[Software]] page of this web site.
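For instance, a freely available synthesizer can be scripted to generate the stimuli for a listening test. The sketch below assumes Festival's text2wave utility is installed; this is only one possible choice among the systems listed on the [[Software]] page, and any other freely available synthesizer could be substituted.

<pre>
# Minimal sketch: batch-synthesise test sentences with a free TTS system.
# Assumes Festival's "text2wave" command is installed; any other freely
# available synthesizer could be used instead.
import os
import subprocess
import tempfile

def synthesise(text, wav_path):
    """Write `text` to a temporary file and render it to `wav_path`."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(text)
        txt_path = f.name
    subprocess.run(["text2wave", txt_path, "-o", wav_path], check=True)
    os.remove(txt_path)

for i, sentence in enumerate(["The quick brown fox jumps over the lazy dog."]):
    synthesise(sentence, f"stimulus_{i:03d}.wav")
</pre>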


== Available datasets for assessment purposes ==
Developing widely available common datasets is also of primary importance for encouraging informative comparative tests of synthesis techniques. For American English, the CMU ARCTIC dataset available in the framework of the [http://www.festvox.org/blizzard/blizzard2005.html Blizzard Challenge] is an example to follow (and adapt to other languages).
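As an illustration of what such a dataset contains, the sketch below reads an ARCTIC-style prompt list. It assumes the Festival convention of a file (conventionally etc/txt.done.data) holding one ( utterance_id "sentence" ) entry per line; the file name and format are assumptions about the standard distribution, so check the actual release.

<pre>
# Minimal sketch: read CMU ARCTIC-style prompts, assuming the Festival
# "txt.done.data" convention of one ( utt_id "sentence" ) entry per line.
import re

def load_prompts(path="etc/txt.done.data"):
    """Return a dict mapping utterance ids to their text prompts."""
    pattern = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)\s*$')
    prompts = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            match = pattern.match(line.strip())
            if match:
                utt_id, text = match.groups()
                prompts[utt_id] = text
    return prompts

# Example: prompts["arctic_a0001"] would hold the first sentence of the set.
</pre>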


== Assessment protocols ==
There is no universally accepted assessment technique for TTS. In the [http://www.festvox.org/blizzard/bc2005/IS052023.PDF Blizzard Challenge], the naturalness of speech synthesizers was judged on the basis of MOS (Mean Opinion Score) tests, while their intelligibility was measured by the WER (word error rate) obtained in two test conditions: an MRT (modified rhyme test) and a SUS test (using semantically unpredictable sentences).
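As a concrete illustration, the sketch below shows how MOS and WER might be computed from raw test data. The 1-to-5 rating scale and the simple whitespace tokenisation are assumptions made for the example; this is not an official Blizzard scoring script.

<pre>
# Minimal sketch of MOS and WER scoring, assuming 1-5 listener ratings
# and whitespace-tokenised transcriptions; not an official Blizzard script.

def mean_opinion_score(ratings):
    """Average of listener ratings (typically 1 = bad ... 5 = excellent)."""
    return sum(ratings) / len(ratings)

def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with the standard edit-distance recurrence."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(mean_opinion_score([4, 3, 5, 4]))                    # 4.0
print(word_error_rate("the cat sat", "the cat sat down"))  # ~0.33
</pre>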


==HLT-evaluation.org==
Another source of information on Speech Synthesis evaluation is the [http://www.hlt-evaluation.org/article.php3?id_article=16 TTS page] in the [http://www.hlt-evaluation.org Human Language Technologies Evaluation] web site.
