Blizzard Challenge 2013 Rules


THESE RULES ARE CURRENTLY UNDER CONSTRUCTION AND ARE SUBJECT TO CHANGE


DATABASE ACCESS

REGISTRATION FEE

  • A registration fee of 500GBP (approx 800USD) is payable by all participants to offset the costs of running the challenge, including paying local assistants and listeners. The fee must be paid by the end of May 2013. You can pay this fee using Edinburgh University's online payments (epay) system: LINK TO BE PROVIDED LATER. After paying, please also email blizzard@festvox.org to let us know. If you are genuinely unable to use the online payments system, please contact blizzard@festvox.org for help with other methods of payment. However, we strongly prefer the epay system because it reduces our costs and administrative work. If you must pay by any other method, please contact us in plenty of time (several weeks before the payment deadline); an additional charge of 100GBP will be made for any payment not made through the epay system.

EXPERT LISTENERS

  • Each participant should try to recruit at least ten volunteer listeners for each of the evaluation tests. Native speakers are preferred where possible. The organisers would also appreciate help in advertising the Challenge as widely as possible (e.g., to your students or colleagues).

MATERIALS PROVIDED

All participants will have access to the following materials (subject to signing appropriate licenses):

Voice building data

  • English: Audiobook data from a single female speaker, kindly provided by The Voice Factory, comprising approximately 300 hours of chapter-sized mp3 files plus approximately 19 hours of uncompressed wav files. The wav files have been segmented into sentences and aligned with the text by Lessac Technologies, Inc.
  • Indian languages: About 1 hour of speech data in each of four Indian languages (Hindi, Bengali, Kannada and Tamil) from the IIT-H INDIC corpus, recorded by non-professional native speakers in quiet office environments. Text is provided in UTF-8 format. No other information, such as segment labels, is provided.

THE CHALLENGES

This year there are two parts to the Blizzard Challenge: the main English audiobook tasks, and pilot tasks on Indian language data. The Indian language data will then be used for the main task in 2014 and 2015 (possibly with additional data in further languages being made available).

  • A single participant may not submit multiple entries to any task, because this would make the listening test unmanageable. This rule may be relaxed if the number of participants is small.
  • Participants involved in joint projects or consortia who wish to submit multiple systems (e.g., an individual entry and a joint system) should contact the organisers in advance to agree this. We will try to accommodate all reasonable requests, provided the listening test remains manageable.
  • Participants are strongly encouraged to enter all tasks rather than "cherry pick".
  • English tasks
    • Task 2013-EH1 -- build a voice from the provided unsegmented audio; text is not provided, so it must be obtained by participants from the web (e.g., Project Gutenberg) and aligned with the audio
    • Task 2013-EH2 -- build a voice from the provided segmented audio; the accompanying aligned text may be used, or text may be obtained from the web
  • Indian language task (pilot phase for the 2014 challenge)
    • Task 2013-IH1 -- build one voice in each language from the provided data


USE OF EXTERNAL DATA

  • "External data" is defined as data, of any type, that is not part of the provided database.
  • You are allowed to use external data in any way you wish, subject to any exclusions given in these rules.
  • Use of external data is entirely optional.
  • You must use the provided audio files.
  • You must not use any additional speech data from the same speakers.
  • You may exclude any parts of the provided databases if you wish.
  • Use of any provided segmentations, transcriptions or labels is optional.
  • If you are in any doubt about how to apply these rules, please contact the organisers immediately.

SYNTHESISING THE TEST EXAMPLES

  • The exact nature of the test set will not be revealed in advance, but it is likely to include sentences, paragraphs and possibly longer texts from a domain similar to that of the provided corpus, as well as texts from other domains. Formal listening tests will be conducted to evaluate the submitted synthetic speech.

RETENTION OF SUBMITTED SYNTHETIC SPEECH SAMPLES

  • Any examples that you submit for evaluation will be retained by the Blizzard organisers for future use.
  • You must include with your submission of the test sentences a statement of whether you give the organisers permission to publicly distribute your waveforms and the corresponding listening test results in anonymised form. In the past, all participants have agreed to this, and we strongly encourage you to give this consent.

LISTENING TEST

  • The Blizzard organisers will design and conduct a listening test, which will probably include the standard elements used in previous years (naturalness, speaker similarity, intelligibility) and may be extended with additional tests specific to the audiobook reading task.

PAPER

  • Each participant will be expected to submit a six-page paper describing their entry for review.
  • One of the authors of each accepted paper should present it at the Blizzard 2013 Workshop.
  • In addition, each participant will be expected to complete a form giving the general technical specification of their system, to facilitate cross-system comparisons (e.g., is it unit selection? does it predict prosody?).

HOW ARE THESE RULES ENFORCED?

  • This is a challenge designed to answer scientific questions, not a competition. We therefore rely on your honesty in preparing your entry.