Blizzard Challenge 2009 Rules

DATABASE ACCESS

  • You will receive a separate message about how to download the database.

REGISTRATION FEE

  • A registration fee of 750USD (500GBP) is due to offset the costs of running the challenge, including paying local assistants and undergraduate listeners. The fee is fixed, regardless of how many hub or spoke tasks you participate in. The fee must be paid by Friday 10th April 2009. You can pay this fee using Edinburgh University's online payments system (https://www.epay.ed.ac.uk/events/eventdetails.asp?eventid=88) where you should register for the event called 'Blizzard Challenge 2009'. After doing this, please also email blizzard@festvox.org to notify us that you have paid. If you are unable to use the online payments system, please contact the organisers for assistance with other methods of payment.

EXPERT LISTENERS

  • Each participant is expected to provide ten speech experts as listeners for the evaluation tests. Native speakers of English or Mandarin (as appropriate) are preferred, where possible. The organisers would also appreciate assistance in advertising the Challenge as widely as possible (e.g., to your students or colleagues).

BUILDING VOICES

  • Participants may enter the Challenge for one or both languages; you are encouraged to attempt both.
  • For each language in which you are participating, you must complete the hub task.
  • If you complete the hub task for a language, you may then submit entries for any number of the spoke tasks for that language.
  • It is not permissible for a single participant to submit multiple entries, because the listening test will become unmanageable.
  • Participants involved in joint projects or consortia who wish to submit both an individual entry and a joint system should contact the organisers in advance to agree this. We will try to accommodate all reasonable requests, provided the listening test remains manageable.

Hub task for English

You must build two voices:

  • Task EH1: build a voice from the full UK English database (about 15 hours)
  • Task EH2: build a voice from the specified 'ARCTIC' subset of the UK English database (about 1 hour)

Spoke tasks for English

  • Task ES1: build voices from the specified 'SMALL10', 'SMALL50' and 'SMALL100' subsets of the UK English database, which consist of the first 10, 50 and 100 sentences respectively of the 'ARCTIC' subset. You may use voice conversion, speaker adaptation techniques or any other technique you like.
  • Task ES2: build a voice suitable for synthesising speech to be transmitted via a telephone channel. A telephone channel simulation tool is available to assist in system development; an illustrative sketch of this kind of processing is given after this list.
  • Task ES3: build a voice suitable for synthesising the computer role in a human-computer dialogue. A set of development dialogues is provided. The test dialogues will be from the same domain.
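
The telephone channel simulation tool mentioned in task ES2 (and in task MS2 below) is distributed separately by the organisers. Purely for orientation, here is a minimal Python sketch of the kind of processing such a simulation typically applies: band-limiting to roughly 300-3400 Hz and resampling to the 8 kHz telephone rate. The file names and filter settings are illustrative assumptions, not the official tool.

    # Minimal sketch of a telephone-channel style degradation (NOT the official tool).
    # Assumes a 16 kHz mono 16-bit WAV as input; band-limits to ~300-3400 Hz and
    # resamples to the 8 kHz telephone rate.
    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import butter, sosfiltfilt, resample_poly

    def simulate_telephone_channel(in_wav, out_wav, low_hz=300, high_hz=3400):
        rate, audio = wavfile.read(in_wav)                # e.g. 16000 Hz, int16 samples
        audio = audio.astype(np.float64) / 32768.0        # scale to [-1, 1)
        # 4th-order Butterworth band-pass, typical narrowband telephony bandwidth
        sos = butter(4, [low_hz, high_hz], btype="bandpass", fs=rate, output="sos")
        narrowband = sosfiltfilt(sos, audio)
        telephone = resample_poly(narrowband, 8000, rate)  # downsample to 8 kHz
        wavfile.write(out_wav, 8000,
                      (np.clip(telephone, -1.0, 1.0) * 32767.0).astype(np.int16))

    if __name__ == "__main__":
        simulate_telephone_channel("synthesised.wav", "synthesised_telephone.wav")

Applying this kind of degradation to development output gives a rough impression of narrowband intelligibility, but it is not a substitute for the simulation tool provided by the organisers.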

Hub task for Mandarin

  • Task MH: build a voice from the full Mandarin database (about 6.5 hours)

Spoke tasks for Mandarin

  • Task MS1: build voices from the specified 'SMALL10', 'SMALL50' and 'SMALL100' subsets of the Mandarin database, which consist of the first 10, 50 and 100 sentences respectively of the full Mandarin database.
  • Task MS2: build a voice suitable for synthesising speech to be transmitted via a telephone channel. A telephone channel simulation tool is available to assist in system development.

USE OF EXTERNAL DATA

  • "External data" is defined as data, of any type, that is not part of the provided database.
  • You are allowed to use external data. For each hub or spoke task independently, you must choose and follow one of these two sets of rules:
    • RULES VERSION 1 (standard rules): You may use external data to construct these parts of your system:
      • text normalisation
      • lexicon & letter-to-sound
      • duration model
      • F0 model
      • aligner (i.e., any component used only to label the database, such as a set of HMMs used for forced alignment)
    • RULES VERSION 2 (voice conversion rules): You may use external data in any way you wish.
  • When building a voice from a subset of the data, you must not use the remainder of the data at all, with the sole exception of training the aligner (an illustrative sketch follows this list).
  • If you are in any doubt about how to apply these rules, please contact the organisers immediately.
  • Your submission of the synthesised test sentences must include a declaration of which rule set you followed for each task.
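
To make the subset rule above concrete, here is a small runnable Python toy (with hypothetical stand-in classes, not part of any provided toolkit) that separates the two roles of the data: the aligner used to label the database may be trained on the full database, while everything downstream of labelling sees only the chosen subset.

    # Illustrative toy showing the subset rule: only the aligner may see data
    # outside the subset; the voice itself is built exclusively from the subset.
    # All classes and functions here are hypothetical stand-ins.
    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class Sentence:
        text: str
        wav_path: str

    class Aligner:
        """Stand-in for e.g. a set of HMMs used only for forced alignment."""
        def __init__(self, training_data: List[Sentence]):
            self.training_data = training_data            # may be the full database

        def label(self, sentences: List[Sentence]) -> List[str]:
            # A real aligner would emit phone/state segmentations here.
            return [f"labels for {s.wav_path}" for s in sentences]

    def build_small_voice(full_db: List[Sentence], n: int) -> Dict[str, list]:
        subset = full_db[:n]                              # SMALL10 / SMALL50 / SMALL100
        aligner = Aligner(full_db)                        # allowed: aligner training only
        labels = aligner.label(subset)                    # labels for the subset only
        # Everything below this point must use `subset` (and its labels),
        # never the remaining sentences of the database.
        return {"speech": subset, "labels": labels}

    if __name__ == "__main__":
        database = [Sentence(f"sentence {i}", f"utt_{i:04d}.wav") for i in range(1, 1001)]
        voice_inputs = build_small_voice(database, 100)
        print(len(voice_inputs["speech"]), "sentences available for voice building")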

SYNTHESISING THE TEST EXAMPLES

  • No manual intervention is allowed during synthesis. This includes, but is not limited to:
    • "Prompt sculpting"
    • Altering existing entries in your lexicon (however, you are allowed to add new words)
    • Using different subsets of the database for different test sentences or sentence types, unless this is a fully automatic part of your system (see the illustrative sketch after this list)
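
To illustrate the "fully automatic" exception in the last item above: a system may route each test sentence to a different sub-voice or database subset, provided the choice is made by code rather than by hand. The Python sketch below is hypothetical; the routing rule and voice names are assumptions, not anything prescribed by the rules.

    # Hypothetical sketch of a fully automatic routing step: the per-sentence
    # choice of sub-voice (or database subset) is made entirely by code.
    def choose_voice(sentence: str) -> str:
        # Toy rule: treat questions as dialogue-style, everything else as news-style.
        return "dialogue_voice" if sentence.strip().endswith("?") else "news_voice"

    if __name__ == "__main__":
        test_sentences = [
            "The economy grew by two percent last year.",
            "Would you like to hear that again?",
        ]
        for sentence in test_sentences:
            print(choose_voice(sentence), "<-", sentence)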

LISTENING TEST

  • We are not releasing details of the listening test design at this time, because you should not be tailoring your voice building to it. It will contain similar sections to previous challenges along with new ones, and you will need to synthesise several hundred sentences from text.
  • For voice conversion-type systems, there will be an additional component of the test, to judge how close the system sounds to the database speaker. If the listening test design allows, we will perform this test for all standard systems too.
  • Any examples that you submit for evaluation may be retained by the Blizzard organisers for future use. We hope to be able to distribute them in anonymised form to all participants, or publicly, subject to participants' consent.

PAPER

  • Each participant will be expected to submit a four-page paper describing their entry for review.
  • One of the authors of each accepted paper should present it at the Blizzard Challenge 2009 Workshop, which we hope will be a satellite of Interspeech 2009 in the UK.
  • In addition, each participant will be expected to complete a form giving the general technical specification of their system, to facilitate easy cross-system comparisons (e.g., is it unit selection? does it predict prosody? and so on).

HOW ARE THESE RULES ENFORCED?

  • This is a challenge, designed to answer scientific questions, not a competition. We therefore rely on your honesty in preparing your entry.