Blizzard Challenge 2009 Rules
From SynSIG
DATABASE ACCESS
- You will receive a separate message explaining how to download the database.
REGISTRATION FEE
- A registration fee of 750USD (500GBP) is due to offset the costs of running the challenge, including paying local assistants and undergraduate listeners. The fee is fixed, regardless of how many hub or spoke tasks you participate in. The fee must be paid by Friday 10th April 2009. You can pay this fee using Edinburgh University's online payments system (https://www.epay.ed.ac.uk/events/eventdetails.asp?eventid=88) where you should register for the event called 'Blizzard Challenge 2009'. After doing this, please also email blizzard@festvox.org to notify us that you have paid. If you are unable to use the online payments system, please contact the organisers for assistance with other methods of payment.
EXPERT LISTENERS
- Each participant is expected to provide ten speech experts as listeners for the evaluation tests. Native speakers of English or Mandarin (as appropriate) are preferred where possible. The organisers would also appreciate assistance in advertising the Challenge as widely as possible (e.g., to your students or colleagues).
BUILDING VOICES
- Participants may enter the Challenge for one or both languages. You are encouraged to attempt both languages.
- For each language in which you are participating, you must complete the hub task.
- If you complete the hub task for a language, you may then submit entries for any number of the spoke tasks for that language.
- A single participant may not submit multiple entries, because the listening test would otherwise become unmanageable.
- Participants involved in joint projects or consortia who wish to submit multiple systems (e.g., an individual entry and a joint system) should contact the organisers in advance to agree this. We will try to accommodate all reasonable requests, provided the listening test remains manageable.
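The participation rules above can be checked mechanically before you submit. The following is a minimal sketch only: the task identifiers (EH1, EH2, ES1 to ES3, MH, MS1, MS2) are those defined later in these rules, but the `TASKS` table and the `check_entry` function are hypothetical names invented for illustration and are not part of any official tool.

```python
# Task identifiers as named in these rules. Grouping them in a dict is an
# illustrative assumption, not an official data format.
TASKS = {
    "english": {"hub": {"EH1", "EH2"}, "spoke": {"ES1", "ES2", "ES3"}},
    "mandarin": {"hub": {"MH"}, "spoke": {"MS1", "MS2"}},
}


def check_entry(selected):
    """Return a list of problems found in a set of selected task IDs."""
    selected = set(selected)
    problems = []
    known = set()
    for language, groups in TASKS.items():
        hub, spoke = groups["hub"], groups["spoke"]
        known |= hub | spoke
        if not selected & (hub | spoke):
            continue  # not participating in this language
        # Every hub voice for a language is required once any task
        # for that language is attempted.
        missing = hub - selected
        if missing:
            problems.append(
                f"{language}: hub task(s) {sorted(missing)} must be completed"
            )
    unknown = selected - known
    if unknown:
        problems.append(f"unknown task(s): {sorted(unknown)}")
    if not selected:
        problems.append("no tasks selected")
    return problems
```

For example, `check_entry({"EH1", "EH2", "ES1"})` returns an empty list, whereas `check_entry({"ES1"})` reports that the English hub tasks are missing.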
Hub task for English
You must build two voices:
- Task EH1: build a voice from the full UK English database (about 15 hours)
- Task EH2: build a voice from the specified 'ARCTIC' subset of the UK English database (about 1 hour)
Spoke tasks for English
- Task ES1: build voices from the specified 'SMALL10', 'SMALL50' and 'SMALL100' datasets, which consist of the first 10, 50 and 100 sentences respectively of the 'ARCTIC' subset. You may use voice conversion, speaker adaptation techniques or any other technique you like.
- Task ES2: build a voice from the full UK English database suitable for synthesising speech to be transmitted via a telephone channel. A telephone channel simulation tool is available to assist in system development.
- Task ES3: build a voice from the full UK English database suitable for synthesising the computer role in a human-computer dialogue. A set of development dialogues is provided. The test dialogues will be from the same domain.
Hub task for Mandarin
- Task MH: build a voice from the full Mandarin database (about 6000 utterances / 170000 Chinese characters)
Spoke tasks for Mandarin
- Task MS1: build voices from the specified 'SMALL10', 'SMALL50' and 'SMALL100' datasets, which consist of the first 10, 50 and 100 sentences respectively of the full Mandarin database.
- Task MS2: build a voice from the full Mandarin database suitable for synthesising speech to be transmitted via a telephone channel. A telephone channel simulation tool is available to assist in system development.
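For tasks ES1 and MS1, the 'SMALL10', 'SMALL50' and 'SMALL100' datasets are nested prefixes of an ordered sentence list. The sketch below shows that relationship under stated assumptions: the function name `small_subsets` and the utterance identifiers are hypothetical, and the datasets as specified by the organisers remain authoritative.

```python
def small_subsets(sentence_ids, sizes=(10, 50, 100)):
    """Map 'SMALLn' to the first n items of an ordered sentence list."""
    if len(sentence_ids) < max(sizes):
        raise ValueError(f"need at least {max(sizes)} sentences")
    return {f"SMALL{n}": list(sentence_ids[:n]) for n in sizes}


# Hypothetical identifiers; the real databases define their own names
# and ordering.
ids = [f"utt_{i:04d}" for i in range(1, 201)]
subsets = small_subsets(ids)
```

Because each dataset is a prefix of the next, every sentence in 'SMALL10' also appears in 'SMALL50' and 'SMALL100'.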
USE OF EXTERNAL DATA
- "External data" is defined as data, of any type, that is not part of the provided database.
- For the UK English tasks, you should consider the Mandarin database to be external data.
- For the Mandarin tasks, you should consider the UK English database to be external data.
- You are allowed to use external data. For each hub or spoke task independently, you must choose and follow one of these two sets of rules:
- RULE SET 1 (standard rules): You may use external data to construct these parts of your system:
  - text normalisation
  - lexicon & letter-to-sound
  - duration model
  - F0 model
  - aligner (i.e., any component used only to label the database, such as a set of HMMs used for forced alignment)
- RULE SET 2 (voice conversion / speaker adaptation rules): You may use external data in any way you wish.
- When building a voice from a subset of the data, you must not use the remainder of the data at all, with the sole exception of training the aligner
- You may exclude any parts of the provided databases if you wish.
- If you are in any doubt about how to apply these rules, please contact the organisers immediately.
- You must include in your submission of the test sentences a declaration of which rule set you followed for each task.
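As an aid to preparing the per-task declaration, the two rule sets can be written down as a simple check. This is a sketch only: the component names are copied from Rule Set 1 above, while the function `check_external_data_use`, the label "acoustic model" and the example declaration are hypothetical and do not represent an official submission format.

```python
# Components that may be built with external data under Rule Set 1,
# as listed in these rules.
RULE_SET_1_COMPONENTS = {
    "text normalisation",
    "lexicon & letter-to-sound",
    "duration model",
    "F0 model",
    "aligner",
}


def check_external_data_use(rule_set, components_using_external_data):
    """Return the components whose external-data use breaks the rule set."""
    if rule_set == 2:
        return set()  # Rule Set 2 places no restriction on external data
    if rule_set != 1:
        raise ValueError("rule_set must be 1 or 2")
    return set(components_using_external_data) - RULE_SET_1_COMPONENTS


# Hypothetical declaration: one rule set chosen independently per task.
declaration = {"EH1": 1, "EH2": 1, "ES1": 2}
```

For example, `check_external_data_use(1, {"aligner", "acoustic model"})` returns `{"acoustic model"}`, because a component with that (hypothetical) label is not on the Rule Set 1 list; the same call with rule set 2 returns an empty set.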
SYNTHESISING THE TEST EXAMPLES
- No manual intervention is allowed during synthesis. This includes, but is not limited to:
- "Prompt sculpting"
- Altering existing entries in your lexicon (however, you are allowed to add new words)
- Using different subsets of the database for different test sentences or sentence types, unless this is a fully automatic part of your system
RETENTION OF SUBMITTED SYNTHETIC SPEECH SAMPLES
- Any examples that you submit for evaluation will be retained by the Blizzard organisers for future use.
- You must include in your submission of the test sentences a statement of whether you give the organisers permission to publicly distribute your waveforms and the corresponding listening test results in anonymised form. We strongly encourage you to give this consent.
LISTENING TEST
- The listening test design is likely to be similar to that used in the 2008 Challenge. Depending on the number of entries for each task, the organisers may only be able to evaluate certain subsets of the synthesised sentences or certain system configurations. For example, if a large number of entries are received for spoke task ES1, then we may choose only to evaluate the 100 sentence condition.
PAPER
- Each participant will be expected to submit a six-page paper describing their entry for review.
- One of the authors of each accepted paper should present it at the Blizzard 2009 Workshop, which will be a satellite of Interspeech 2009 in the UK. The workshop will probably be in Edinburgh, but this will be confirmed later.
- In addition, each participant will be expected to complete a form giving the general technical specification of their system, to facilitate cross-system comparisons (e.g., is it unit selection? does it predict prosody?).
HOW ARE THESE RULES ENFORCED?
- This is a challenge, which is designed to answer scientific questions, and not a competition. Therefore, we rely on your honesty in preparing your entry.