Blizzard Challenge 2010 Rules

From SynSIG
Revision as of 15:28, 31 January 2010 by Simon.King

THESE RULES ARE CURRENTLY ONLY A DRAFT VERSION

DATABASE ACCESS

  • You will receive a separate message about how to download the databases.

REGISTRATION FEE

  • A registration fee of 750USD (500GBP) is due to offset the costs of running the challenge, including paying local assistants and undergraduate listeners. The fee is fixed, regardless of how many hub or spoke tasks you participate in. The fee must be paid by Friday 26th March 2010. You can pay this fee using Edinburgh University's online payments system (URL WILL BE PROVIDED SHORTLY) where you should register for the event called 'Blizzard Challenge 2010'. After doing this, please also email blizzard@festvox.org to notify us that you have paid. If you are unable to use the online payments system, please contact blizzard@festvox.org for assistance with other methods of payment.

EXPERT LISTENERS

  • Each participant is expected to provide at least ten speech experts as listeners for the evaluation tests. Native speakers of English and/or Mandarin (as appropriate) are preferable, where possible. The organisers would also appreciate assistance in advertising the Challenge as widely as possible (e.g., to your students or colleagues).

BUILDING VOICES

  • Participants may submit entries for any combination of tasks, subject to the following restrictions:
  • For each language in which you are participating, you must complete at least one of the hub tasks.
  • If you complete a hub task for a language, you may then submit entries for any number of the spoke tasks for that language.
  • You are encouraged to attempt both languages (remembering that, in past Challenges, the best-performing systems were not generally from native-speaker teams!).
  • It is not permissible for a single participant to submit multiple entries for any task, because the listening test will become unmanageable.
  • Participants involved in joint projects or consortia who wish to submit multiple systems (e.g., an individual entry and a joint system) should contact the organisers in advance to agree this. We will try to accommodate all reasonable requests, provided the listening test remains manageable.

Hub tasks for English

  • Task EH1: build a voice from the UK English 'rjs' database. You may use either the 16kHz or 48kHz versions, but the submitted wav files must be at 16kHz sampling rate.
  • Task EH2: build a voice from the specified 'ARCTIC' subset of the UK English 'roger' database, optionally using the provided hand-corrected labels. You may use either the 16kHz or 48kHz versions, but the submitted wav files must be at 16kHz sampling rate.

Spoke tasks for English

  • Task ES1: build voices from the specified 'E_SMALL10', 'E_SMALL50' and 'E_SMALL100' datasets, which consist of the first 10, 50 and 100 sentences respectively of the 'rjs' database. You may use voice conversion, speaker adaptation techniques or any other technique you like. You may use either the 16kHz or 48kHz versions, but the submitted wav files must be at 16kHz sampling rate.
  • Task ES2: build a voice from the 'rjs' database suitable for synthesising speech to be heard in the presence of additive noise. You may enter the same voice as task EH1 or EH2 if you wish, although specially-designed voices are strongly encouraged. You may use either the 16kHz or 48kHz versions, but the submitted wav files must be at 16kHz sampling rate.
  • Task ES3: the same as EH1, but you must submit 48kHz sampling rate wav files.
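
A submitted wav file can be checked against the sampling rate its task requires before upload. The following is a minimal sketch using Python's standard `wave` module. The names `wav_sample_rate`, `REQUIRED_RATE_HZ` and `has_required_rate` are hypothetical and are not part of any official Blizzard tooling; the rates in the table are taken from the English task descriptions above.

```python
import wave

# Required submission sampling rates (Hz), as stated for the English tasks.
REQUIRED_RATE_HZ = {
    "EH1": 16000,
    "EH2": 16000,
    "ES1": 16000,
    "ES2": 16000,
    "ES3": 48000,
}


def wav_sample_rate(path):
    """Return the sampling rate in Hz stored in the header of a PCM wav file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def has_required_rate(path, task):
    """Return True if the file at `path` has the rate required for `task`."""
    return wav_sample_rate(path) == REQUIRED_RATE_HZ[task]
```

This only reads the header; it does not resample. A voice built from the 48kHz data still has to be downsampled to 16kHz by your own tools for every task except ES3.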

Hub tasks for Mandarin

  • Task MH1: build a voice from the full Mandarin database (approx. 6000 utterances).
  • Task MH2: build a voice from utterances XXXX to XXXX of the full Mandarin database (XXX utterances).

Spoke tasks for Mandarin

  • Task MS1: build voices from the specified 'M_SMALL10', 'M_SMALL50' and 'M_SMALL100' datasets, which consist of the first 10, 50 and 100 sentences respectively of the utterances used in MH2.
  • Task MS2: build a voice from the full Mandarin database suitable for synthesising speech to be heard in the presence of additive noise. You may enter the same voice as task MH1 or MH2 if you wish, although specially-designed voices are strongly encouraged.
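
The restrictions on task combinations given under BUILDING VOICES can be expressed as a small check. This is an illustrative sketch only: the function `check_entries` and the language keys "E" and "M" are hypothetical, and the check does not model multiple entries agreed in advance with the organisers for joint projects or consortia.

```python
# Task names as defined in the hub and spoke lists above.
HUB_TASKS = {"E": {"EH1", "EH2"}, "M": {"MH1", "MH2"}}
SPOKE_TASKS = {"E": {"ES1", "ES2", "ES3"}, "M": {"MS1", "MS2"}}


def check_entries(tasks):
    """Return a list of problems found in a participant's list of task entries."""
    problems = []
    known = set()
    for lang in HUB_TASKS:
        known |= HUB_TASKS[lang] | SPOKE_TASKS[lang]

    for task in sorted(set(tasks)):
        if task not in known:
            problems.append(f"{task}: not a task defined in the rules")
        elif tasks.count(task) > 1:
            problems.append(f"{task}: multiple entries for one task")

    # Entering any task for a language requires at least one hub task for it.
    for lang, hubs in HUB_TASKS.items():
        entered = set(tasks) & (hubs | SPOKE_TASKS[lang])
        if entered and not entered & hubs:
            problems.append(f"{lang}: spoke entries without a hub task")
    return problems
```

For example, `check_entries(["EH1", "ES2", "MH2", "MS1"])` returns an empty list, whereas `check_entries(["MS1"])` reports that a Mandarin spoke task was entered without a Mandarin hub task.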

USE OF EXTERNAL DATA

  • "External data" is defined as data, of any type, that is not part of the provided database.
  • You are allowed to use external data in any way you wish.
  • For the UK English tasks, you should consider the Mandarin database to be external data.
  • For the Mandarin tasks, you should consider the UK English database to be external data.
  • For tasks ES1, MH2 and MS1 in which you build a voice from a subset of a database, you must not use the remainder of that database at all, for any purpose - you must pretend it does not exist.
  • You may exclude any parts of the provided databases if you wish.
  • Use of the provided labels is optional.
  • If you are in any doubt about how to apply these rules, please contact the organisers immediately.
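
For the subset tasks, one way to guard against accidentally using the remainder of a database is to derive the small datasets from the ordered utterance list and check the training list against them. The sketch below assumes utterance IDs are available as an ordered Python list. The helper names `small_subsets` and `uses_only_subset` and the example IDs are hypothetical; the official dataset definitions are the ones distributed with the databases.

```python
SUBSET_SIZES = (10, 50, 100)


def small_subsets(utterance_ids, prefix):
    """Build the SMALL10/50/100 datasets as the first N utterances of a list.

    `prefix` is "E" or "M". For MS1, pass the utterances used in MH2,
    not the full Mandarin database.
    """
    return {f"{prefix}_SMALL{n}": list(utterance_ids[:n]) for n in SUBSET_SIZES}


def uses_only_subset(training_ids, subset_ids):
    """Return True if every training utterance belongs to the permitted subset."""
    return set(training_ids) <= set(subset_ids)


# Usage with made-up IDs standing in for a real database listing.
all_ids = [f"utt_{i:04d}" for i in range(1, 201)]
subsets = small_subsets(all_ids, "E")
assert uses_only_subset(all_ids[:10], subsets["E_SMALL10"])
assert not uses_only_subset(all_ids[:11], subsets["E_SMALL10"])
```

Excluding utterances from the permitted subset passes this check, which matches the rule that you may exclude any parts of the provided databases.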

SYNTHESISING THE TEST EXAMPLES

  • No manual intervention is allowed during synthesis. This includes, but is not limited to:
    • "Prompt sculpting"
    • Altering existing entries in your lexicon (however, you are allowed to add new words)
    • Using different subsets of the database to generate different test sentences or sentence types within a single task, unless this is a fully automatic part of your system.

RETENTION OF SUBMITTED SYNTHETIC SPEECH SAMPLES

  • Any examples that you submit for evaluation will be retained by the Blizzard organisers for future use.
  • With your submission of the test sentences, you must include a statement of whether you give the organisers permission to publicly distribute your waveforms and the corresponding listening test results in anonymised form. In the past, all participants have agreed to this and we strongly encourage you to give this consent.

LISTENING TEST

  • The listening test design is likely to be similar to that used in the 2009 Challenge. Depending on the number of entries for each task, the organisers may only be able to evaluate certain subsets of the synthesised sentences or certain system configurations.

PAPER

  • Each participant will be expected to submit a six-page paper describing their entry for review.
  • One of the authors of each accepted paper should present it at the Blizzard 2010 Workshop, which will be a satellite of Interspeech 2010 in Japan. The workshop will be in the Kyoto area.
  • In addition, each participant will be expected to complete a form giving the general technical specification of their system, to facilitate easy cross-system comparisons (e.g., is it unit selection? does it predict prosody? etc.).

HOW ARE THESE RULES ENFORCED?

  • This is a challenge, which is designed to answer scientific questions, and not a competition. Therefore, we rely on your honesty in preparing your entry.