Blizzard Challenge 2007 Rules: Difference between revisions

Revision as of 10:10, 17 April 2007

DATABASE ACCESS

You will receive a separate message about how to download the database.

REGISTRATION FEE

A registration fee of 500 USD is charged to offset the costs of running the challenge, including paying undergraduate listeners. The fee must be paid by the time you submit your test examples. You will receive separate instructions on how to pay.

EXPERT LISTENERS

Each participant is expected to provide ten speech experts as listeners for the evaluation tests. Native English speakers are preferred where possible.

BUILDING VOICES

Each participant should build three synthetic voices from the database. It is permissible to submit fewer than three voices, but we strongly encourage you to complete the full challenge because this will be more informative.
It is not permissible for a single participant to submit multiple entries for any of the voices (because the listening test will become unmanageable).
All three voices should be built using the same method, software, external data, etc. For example, you are not allowed to use unit selection for voice A but a voice conversion method for voices B and C.

Voices to be built

Voice A: from the full dataset (about 8 hours)
Voice B: from the ARCTIC subset (about 1 hour)
Voice C: from a subset of the data chosen by you, under the following conditions:
- you may only base your selection on the text (and not the speech, or any information such as labelling which has been derived with reference to the speech signal)
- if your selection method requires phonetic, prosodic, or any other type of labelling, this must have been derived from the text only
- you must select entire utterances
- the total duration of the utterances you select must be no more than 2914 seconds (which is equal to the duration of the ARCTIC subset); to make this calculation, you should use the officially provided durations file, which will be emailed to you.
- If you use the provided database to train any parts of your system (e.g., a prosodic model or HMM parameters), then for voices B and C, you must not use the whole database to train those parts, but only the appropriate subset. See below for rules on using external data.
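
The duration limit for Voice C can be checked mechanically once a selection has been made. The following is a minimal sketch only: the layout of the official durations file is not specified here, so the code assumes one "<utterance_id> <seconds>" pair per line, and the function names and utterance IDs are hypothetical.

```python
from typing import Dict, Iterable

# Limit stated in the rules: the duration of the ARCTIC subset, in seconds.
ARCTIC_BUDGET_SECONDS = 2914.0


def parse_durations(lines: Iterable[str]) -> Dict[str, float]:
    """Parse lines of the ASSUMED form '<utterance_id> <seconds>'."""
    durations: Dict[str, float] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        utt_id, seconds = line.split()[:2]
        durations[utt_id] = float(seconds)
    return durations


def total_duration(selected: Iterable[str], durations: Dict[str, float]) -> float:
    """Sum the durations of the selected utterances, counting each one once.

    A KeyError means a selected utterance is missing from the durations data.
    """
    return sum(durations[utt_id] for utt_id in set(selected))


def within_budget(selected: Iterable[str], durations: Dict[str, float],
                  budget: float = ARCTIC_BUDGET_SECONDS) -> bool:
    """Return True if the selection is no longer than the budget."""
    return total_duration(selected, durations) <= budget


# Hypothetical data, for illustration only.
example_lines = ["utt_0001 3.20", "utt_0002 4.75", "utt_0003 2.05"]
example_durations = parse_durations(example_lines)
print(within_budget(["utt_0001", "utt_0003"], example_durations))
```

Because whole utterances must be selected, the check works on utterance IDs rather than on segments of audio. It says nothing about whether the selection itself was based on the text only, which remains the participant's responsibility.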

USE OF EXTERNAL DATA

"External data" is defined as data, of any type, that is not part of the provided database.
You are allowed to use external data. You must follow one of these two sets of rules (and the same one for all three voices):
- Standard rules: You may use external data to construct these parts of your system:
  - text normalisation
  - lexicon & letter-to-sound
  - duration model
  - F0 model
  - aligner (i.e., any component used only to label the database, such as a set of HMMs used for forced alignment)
- Voice conversion rules: You may use external data in any way you wish
In essence, if there is any possibility that your system could sound like a different speaker than the database speaker, then your system should be classified as a voice conversion type of system.
If you are in any doubt about how to apply these rules, please contact the organizers immediately.
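
The choice between the two rule sets can be expressed as a simple set comparison. The sketch below is illustrative, not an official checker: the component labels are copied from the list above, any other label (such as "spectral model") is a hypothetical name, and the code cannot judge whether a system might sound like a different speaker, so that question still has to be answered separately.

```python
from typing import Iterable

# Components for which the standard rules permit external data (from the rules above).
STANDARD_ALLOWED = frozenset({
    "text normalisation",
    "lexicon & letter-to-sound",
    "duration model",
    "F0 model",
    "aligner",
})


def applicable_rule_set(components_using_external_data: Iterable[str]) -> str:
    """Return the rule set under which a system would have to be entered.

    'standard' if external data is used only for permitted components,
    otherwise 'voice conversion'.
    """
    used = set(components_using_external_data)
    if used <= STANDARD_ALLOWED:
        return "standard"
    return "voice conversion"


# Hypothetical systems, for illustration only.
print(applicable_rule_set(["aligner", "F0 model"]))
print(applicable_rule_set(["aligner", "spectral model"]))
```

A system that uses no external data at all falls under the standard rules in this sketch, since the empty set is a subset of the permitted components.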

SYNTHESISING THE TEST EXAMPLES

No manual intervention is allowed during synthesis. This includes, but is not limited to:
- "Prompt sculpting"
- Altering existing entries in your lexicon (however, you are allowed to add new words)
- Using different subsets of the database for different test sentences or sentence types, unless this is a fully automatic part of your system

LISTENING TEST

We are not releasing details of the listening test design at this time, because you should not be tailoring your voice building to it. It will be largely similar to previous challenges, and you will need to synthesise several hundred sentences from text.
For voice conversion-type systems, there will be an additional component of the test, to judge how close the system sounds to the database speaker. If the listening test design allows, we will perform this test for all standard systems too.
Any examples that you submit for evaluation may be retained for future use. We hope to be able to distribute them in anonymised form to all participants, or publicly.

PAPER

Each participant will be expected to submit a four-page paper describing their entry for review.
One of the authors of each accepted paper should present it at a satellite workshop of SSW6, on August 25, 2007 in Bonn, Germany.
In addition, each participant will be expected to complete a form giving the general technical specification of their system, to facilitate easy cross-system comparisons (e.g., is it unit selection? does it predict prosody? etc.).

HOW ARE THESE RULES ENFORCED?

This is a challenge, which is designed to answer scientific questions, and not a competition. Therefore, we rely on your honesty in preparing your entry.

@@ Line 1: / Line 1: @@
-DATABASE ACCESS
+==DATABASE ACCESS==
+* You will receive a separate message about how to download this
-  * You will receive a separate message about how to download this
+==REGISTRATION FEE==
+* A registration fee of 500USD is due to offset the costs of running the challenge, including paying undergraduate listeners. This must be paid by the time you submit your test examples. You will receive separate instructions on how to pay this.
-REGISTRATION FEE
+==EXPERT LISTENERS==
+* Each participant is expected to provide ten speech experts as listeners of the evaluation tests. English native speakers are preferable, where possible.
-  * A registration fee of 500USD is due to offset the costs of running
+==BUILDING VOICES==
-    the challenge, including paying undergraduate listeners. This must
+* Each participant should build three synthetic voices from the database. It is permissible to submit fewer than three voices, but we strongly encourage you to complete the full challenge because this will be more informative.
-    be paid by the time you submit your test examples. You will
+* It is not permissible for a single participant to submit multiple entries for any of the voices (because the listening test will become unmanageable).
-    receive separate instructions on how to pay this.
+* All three voices should be built using the same method, software, external data, etc. For example, you are not allowed to use unit selection for voice A but a voice conversion method for voices B and C.
-EXPERT LISTENERS
+===Voices to be built===
+* Voice A: from the full dataset (about 8 hours)
+* Voice B: from the ARCTIC subset (about 1 hour)
+* Voice C: from a subset of the data chosen by you, under the following conditions:
+** you may only base your selection on the text (and not the speech, or any information such as labelling which has been derived with reference to the speech signal)
+** if your selection method requires phonetic, prosodic, or any other type of labelling, this must have been derived from the text only
+** you must select entire utterances
+** the total duration of the utterances you select must be no more than 2914 seconds (which is equal to the duration of the ARCTIC subset); you should use the officially provided durations file to make this calculation, which will be emailed to you.
+** If you use the provided database to train any parts of your system (e.g., a prosodic model or HMM parameters), then for voices B and C, you must not use the whole database to train those parts, but only the appropriate subset. See below for rules on using external data.
-  * Each participant is expected to provide ten speech experts as
+==USE OF EXTERNAL DATA==
-    listeners of the evaluation tests. English native speakers are
+* "External data" is defined as data, of any type, that is not part of the provided database.
-    preferable, where possible.
+* You are allowed to use external data. You must follow one of these two sets of rules (and the same one for all three voices):
+** Standard rules: You may use external data to construct these parts of your system:
+*** text normalisation
+*** lexicon & letter-to-sound
+*** duration model
+*** F0 model
+*** aligner (i.e., any component used only to label the database, such as a set of HMMs used for forced alignment)
+** Voice conversion rules: You may use external data in any way you wish
+* In essence, if there is any possibility that your system could sound like a different speaker than the database speaker, then your system should be classified as a voice conversion type of system.
+* If you are in any doubt about how to apply these rules, please contact the organizers immediately.
-BUILDING VOICES
+==SYNTHESISING THE TEST EXAMPLES==
+* No manual intervention is allowed during synthesis. This includes, but is not limited to:
+** "Prompt sculpting"
+** Altering existing entries in your lexicon (however, you are allowed to add new words)
+** Using different subsets of the database for different test sentences or sentence types, unless this is a fully automatic part of your system
-  * Each participant should build three synthetic voices from the
+==LISTENING TEST==
-    database. It is permissible to submit fewer than three voices, but
+* We are not releasing details of the listening test design at this time, because you should not be tailoring your voice building to it. It will be largely similar to previous challenges, and you will need to synthesise several hundred sentences from text.
-    we strongly encourage you to complete the full challenge because
+* For voice conversion-type systems, there will be an additional component of the test, to judge how close the system sounds to the database speaker. If the listening test design allows, we will perform this test for all standard systems too.
-    this will be more informative.
+* Any examples that you submit for evaluation may be retained for future use. We hope to be able to distribute them in anonymised form to all participants, or publically.
-  * It is not permissible for a single participant to submit multiple
-    entries for any of the voices (because the listening test will
-    become unmanageable).
-  * All three voices should be built using the same method, software,
-    external data, etc. For example, you are not allowed to use unit
-    selection for voice A but a voice conversion method for voices B
-    and C.
-  * Voices to be built:
-      Voice A: from the full dataset (about 8 hours)
-      Voice B: from the ARCTIC subset (about 1 hour)
-      Voice C: from a subset of the data chosen by you, under the
-      following conditions:
-      - you may only base your selection on the text (and not the
-        speech, or any information such as labelling which has been
-        derived with reference to the speech signal)
-      - if your selection method requires phonetic, prosodic, or any
-        other type of labelling, this must have been derived from the
-        text only
-      - you must select entire utterances
-      - the total duration of the utterances you select must be no
-        more than 2914 seconds (which is equal to the duration of the
-        ARCTIC subset); you should use the officially provided
-        durations file to make this calculation, which will be emailed
-        to you.
-  * If you use the provided database to train any parts of your system
-    (e.g., a prosodic model or HMM parameters), then for voices B and
-    C, you must not use the whole database to train those parts, but
-    only the appropriate subset. See below for rules on using external
-    data.
+==PAPER==
+* Each participant will be expected to submit a four-page paper describing their entry for review.
+* One of the authors of each accepted paper should present it at a satellite workshop of SSW6, on August 25, 2007 in Bonn, Germany
+* In addition, each participant will be expected to complete a form giving the general technical specification of their system, to facilitate easy cross-system comparisons (e.g. is it unit selection? does it predict prosody? etc. etc)
-USE OF EXTERNAL DATA
+==HOW ARE THESE RULES ENFORCED?==
+* This is a challenge, which is designed to answer scientific questions, and not a competition. Therefore, we rely on your honesty in preparing your entry.
-  * "External data" is defined as data, of any type, that is not part
-    of the provided database.
-  * You are allowed to use external data. You must follow one of these
-    two sets of rules (and the same one for all three voices):
-      * Standard rules: You may use external data to construct these
-          parts of your system:
-            - text normalisation
-            - lexicon & letter-to-sound
-            - duration model
-            - F0 model
-            - aligner (i.e., any component used only to label the
-              database, such as a set of HMMs used for forced alignment)
-      * Voice conversion rules: You may use external data in any way
-        you wish
- * In essence, if there is any possibility that your system could sound
-   like a different speaker than the database speaker, then your system
-   should be classified as a voice conversion type of system.
- * If you are in any doubt about how to apply these rules, please contact
-   the organizers immediately.
-SYNTHESISING THE TEST EXAMPLES
-  * No manual intervention is allowed during synthesis. This includes,
-    but is not limited to:
-       * "Prompt sculpting"
-       * Altering existing entries in your lexicon (however, you are
-         allowed to add new words)
-       * Using different subsets of the database for different test
-         sentences or sentence types, unless this is a fully automatic
-         part of your system
-LISTENING TEST
-  * We are not releasing details of the listening test design at this
-    time, because you should not be tailoring your voice building to
-    it. It will be largely similar to previous challenges, and you will
-    need to synthesise several hundred sentences from text.
-  * For voice conversion-type systems, there will be an additional
-    component of the test, to judge how close the system sounds to the
-    database speaker. If the listening test design allows, we will
-    perform this test for all standard systems too.
-  * Any examples that you submit for evaluation may be retained for
-    future use. We hope to be able to distribute them in anonymised
-    form to all participants, or publically.
-PAPER
-  * Each participant will be expected to submit a four-page paper
-    describing their entry for review.
-  * One of the authors of each accepted paper should present it at a
-    satellite workshop of SSW6, on August 25, 2007 in Bonn, Germany
-  * In addition, each participant will be expected to complete a form
-    giving the general technical specification of their system, to
-    facilitate easy cross-system comparisons (e.g. is it unit
-    selection? does it predict prosody? etc. etc)
-HOW ARE THESE RULES ENFORCED?
-  * This is a challenge, which is designed to answer scientific
-    questions, and not a competition. Therefore, we rely on your
-    honesty in preparing your entry.
