Blizzard Challenge 2017: Difference between revisions

From SynSIG
(20 intermediate revisions by the same user not shown)
Line 1: Line 1:
[[image:Google_logo.JPG||frame|right| Apple and Google have generously provided financial support to the Blizzard Challenge 2017]]
== This Blizzard Challenge has finished ==
* Please do not try to register for this challenge - you are too late!
* However, the data are still available. We recommend using the 2018 data, which is a superset of the 2017 data. Do not email us, but simply go to https://synsig.org/index.php/Blizzard_Challenge#Tools_and_data
* The remainder of this page is left as a record of the Challenge
== Read these first ==


* First, please read the [[media: Blizzard2017-CallforParticipation.pdf | Call for Participation in the Blizzard Challenge 2017]] in PDF format (coming soon)
This year, there are two distinct parts to the Blizzard Challenge. Teams may enter either one, or both. The first part of the challenge follows the standard approach of previous years, and comprises the single hub task (2017-EH1), which requires teams to build an end-to-end text-to-speech system. The second part of the challenge is novel and is designed to be accessible to the wider machine learning community; it comprises two spoke tasks (2017-ES1 and 2017-ES2).
* Then, read and agree to the [[Blizzard Challenge 2017 Rules]] before participating
 
* First, please read the calls for participation:
** [[media: Blizzard2017-CallforParticipation.pdf | Call for Participation in the Blizzard Challenge 2017]]
** [[media: BlizzardMachineLearningChallenge2017-CallforParticipation.pdf‎ | Call for Participation in the Blizzard Machine Learning Challenge 2017]]
* Before participating, please read and agree to the rules for whichever part(s) of the challenge you are interested in
**  [[Blizzard Challenge 2017 Rules]]
**  [[Blizzard Machine Learning Challenge 2017 Rules]]
* You should only register for the challenge if you actually plan to submit an entry to the challenge
== New:  the Blizzard Machine Learning Challenge ==
Speech synthesis as a machine learning problem: exploring new types of acoustic models
In the HMM era, by taking a unified view of both Automatic Speech Recognition (ASR) and Text-to-Speech (TTS), it was possible to develop various types of new ASR and TTS techniques, e.g., cross-lingual speaker adaptation, adaptive training for TTS, use of prosody in ASR, etc. We expect that by once again taking a unified view in the current DNN era, it will be possible to develop new types of acoustic modeling techniques that are useful for both ASR and TTS.
The series of Blizzard Challenges has helped us measure progress in TTS. But, to get competitive performance, a lot of time has to be spent on skilled tasks such as updating the lexicon, removing inappropriate audio files, segmenting and aligning audio files, detecting alignment errors, etc. This may make the Blizzard Challenge unattractive to Machine Learning (ML) researchers from other fields.
We therefore propose a spin-off challenge that does not involve these speech-specific tasks, and allows participants to concentrate on the acoustic modeling task, framed as a straightforward ML problem, with a fixed data set.
The data that the organizers will provide is in the form of corresponding sequences of linguistic features, speech features and speech waveforms. Participants must train a model to predict speech features from linguistic features (or, to directly predict speech waveforms from linguistic features, as done in WaveNet), and then use that model to make predictions for a test set of previously-unseen linguistic features.
Evaluation will be done by the organisers, using a listening test, as in the main challenge.


== Registration ==


Register by emailing blizzard@festvox.org. We need to know your team name, the name of the main contact person, your affiliation, and contact details including email address, postal address and phone number.
Register by emailing blizzard@festvox.org. We need to know your team name, the name of the main contact person, your affiliation, and contact details including email address, postal address and phone number. Please specify which task(s) you plan to submit entries for.


== Data download ==
Line 13: Line 42:
The speech + text data comes from professional audiobooks produced by [http://www.usborne.com Usborne Publishing].


* About 6.5 hours of British English speech data from a single female talker, which comprises 5 hours of speech already released for the 2016 challenge plus the audio from 6 additional books that were used for test material in 2016.
* 2017-EH1
* Processed versions, such as alignments, are shared via the [[Blizzard Challenge 2016-7 Git Repository]]  
** About 6.5 hours of British English speech data from a single female talker, which comprises 5 hours of speech already released for the 2016 challenge plus the audio from 6 additional books that were used for test material in 2016.
* Download links can be found via http://www.cstr.ed.ac.uk/projects/blizzard/2017/usborne_blizzard2017
** Processed versions, such as alignments, are shared via the [[Blizzard Challenge 2016-7 Git Repository]]  
* 2017-ES1 and 2017-ES2
** About 4 hours of British English speech data (waveforms) from a single female talker, which is a cleaned-up version of the data used in the 2016 challenge, along with linguistic features and speech features.
 
Download links (including the online license form) can be found via http://www.cstr.ed.ac.uk/projects/blizzard/2017/usborne_blizzard2017
 
MD5 checksums:
* blizzard_release_2017_v2.zip = 21c3f4ddcd724417632b96ef99deec20
* blizzard_machine_learning_challenge_2017-ES1.zip = d59998653f450d0bd9cd4084334f130e
* blizzard_machine_learning_challenge_2017-ES2.zip = 1e88ba7edb8af1f88710318ceee69075


=== Development tools ===
Line 23: Line 61:
=== Questionnaire ===


* Download the questionnaire, complete it, and return it by the deadline given in the timeline. http://data.cstr.ed.ac.uk/blizzard2017/system_questionnaire.txt
* Download the questionnaire, complete it, and return it at the same time as your synthetic speech: http://data.cstr.ed.ac.uk/blizzard2017/system_questionnaire.txt


== Mailing list ==
Line 41: Line 79:
The timeline shown on this web page is the official one and supersedes those shown in announcements. It is subject to change, but we will try to follow it as closely as possible. Note that we will not consider any requests from participants to change the synthetic speech submission date or the paper submission date!


   Dec ??    2016  -  database released
   Dec 8    2016  -  2017-EH1 database released
   Mar ??   2017  -  test sentences released to participants
   Jan  ?    2017  -  2017-ES1 and 2017-ES2 database released
   Apr ??   2017  -  participants submit synthetic speech and questionnaire (by midnight UTC)
   Mar 29   2017  -  test sentences released to participants
   Apr 17   2017  -  participants submit their output, plus questionnaire (by midnight UTC)
   Apr      2017  -  evaluation systems go live
   Jun      2017  -  end of evaluation period
   Jul 1   2017  -  release of results  
   Jun 15    2017  -  expected release of results for 2017-ES1 and 2017-ES2
   Jul ??   2017  -  deadline to submit workshop papers
   Jun 29    2017  -  deadline to submit ASRU papers for 2017-ES1 and 2017-ES2
   Aug  1   2017  -  notification of acceptance
  Jun 29   2017  -  expected release of results for 2017-EH1
   Jul 3   2017  -  deadline to submit Blizzard Workshop papers for 2017-EH1
   Jul 31   2017  -  notification of paper acceptance for 2017-EH1
   Aug 20-24 2017  -  [http://www.interspeech2017.org Interspeech 2017, Stockholm, Sweden]
   Aug 25    2017  -  Blizzard Challenge workshop, Stockholm (date and location provisional)
   Aug 25    2017  -  Blizzard Challenge workshop, Stockholm - for task 2017-EH1
   Aug 28-  2017  -  [http://www.eusipco2017.org EUSIPCO 2017, Kos, Greece]
  Aug 31    2017  -  notification of paper acceptance for 2017-ES1 and 2017-ES2
  Dec 16-20 2017  -  [http://www.kecl.ntt.co.jp/icl/signal/asru2017/index.html ASRU 2017]
                      will include the workshop for tasks 2017-ES1 and 2017-ES2


== Workshop ==


Information on the workshop can be found here: [[Blizzard Challenge 2017 Workshop]]
Information on the two workshops can be found here:
* [[Blizzard Challenge 2017 Workshop]]
* [[Blizzard Machine Learning Challenge 2017 Workshop]]


== Any questions? ==

Latest revision as of 14:10, 5 December 2018

Apple and Google have generously provided financial support to the Blizzard Challenge 2017

This Blizzard Challenge has finished

  • Please do not try to register for this challenge - you are too late!
  • However, the data are still available. We recommend using the 2018 data, which is a superset of the 2017 data. Do not email us, but simply go to https://synsig.org/index.php/Blizzard_Challenge#Tools_and_data
  • The remainder of this page is left as a record of the Challenge


Read these first

This year, there are two distinct parts to the Blizzard Challenge. Teams may enter either one, or both. The first part of the challenge follows the standard approach of previous years, and comprises the single hub task (2017-EH1), which requires teams to build an end-to-end text-to-speech system. The second part of the challenge is novel and is designed to be accessible to the wider machine learning community; it comprises two spoke tasks (2017-ES1 and 2017-ES2).

New: the Blizzard Machine Learning Challenge

Speech synthesis as a machine learning problem: exploring new types of acoustic models

In the HMM era, by taking a unified view of both Automatic Speech Recognition (ASR) and Text-to-Speech (TTS), it was possible to develop various types of new ASR and TTS techniques, e.g., cross-lingual speaker adaptation, adaptive training for TTS, use of prosody in ASR, etc. We expect that by once again taking a unified view in the current DNN era, it will be possible to develop new types of acoustic modeling techniques that are useful for both ASR and TTS.

The series of Blizzard Challenges has helped us measure progress in TTS. But, to get competitive performance, a lot of time has to be spent on skilled tasks such as updating the lexicon, removing inappropriate audio files, segmenting and aligning audio files, detecting alignment errors, etc. This may make the Blizzard Challenge unattractive to Machine Learning (ML) researchers from other fields.

We therefore propose a spin-off challenge that does not involve these speech-specific tasks, and allows participants to concentrate on the acoustic modeling task, framed as a straightforward ML problem, with a fixed data set.

The data that the organizers will provide is in the form of corresponding sequences of linguistic features, speech features and speech waveforms. Participants must train a model to predict speech features from linguistic features (or, to directly predict speech waveforms from linguistic features, as done in WaveNet), and then use that model to make predictions for a test set of previously-unseen linguistic features.
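
As a rough illustration of the task framing described above, the sketch below fits a frame-level linear regressor that maps linguistic features to speech features and then predicts speech features for unseen linguistic features. It is only a minimal baseline sketch: the feature dimensions, the synthetic data, and the function names (fit_linear_model, predict) are assumptions made for this example and do not reflect the actual challenge data format or any participant's system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: NOT the real feature dimensions of the challenge data.
N_TRAIN, N_TEST = 500, 50
LING_DIM, SPEECH_DIM = 20, 5


def add_bias(ling):
    """Append a constant column so the model can learn an offset."""
    return np.hstack([ling, np.ones((ling.shape[0], 1))])


def fit_linear_model(ling, speech):
    """Least-squares fit of speech features from linguistic features."""
    weights, *_ = np.linalg.lstsq(add_bias(ling), speech, rcond=None)
    return weights


def predict(weights, ling):
    """Predict speech features for each frame of linguistic features."""
    return add_bias(ling) @ weights


# Synthetic stand-in for the released training sequences (assumption).
true_map = rng.normal(size=(LING_DIM, SPEECH_DIM))
train_ling = rng.normal(size=(N_TRAIN, LING_DIM))
train_speech = train_ling @ true_map + 0.01 * rng.normal(size=(N_TRAIN, SPEECH_DIM))

# Synthetic stand-in for the previously-unseen test linguistic features.
test_ling = rng.normal(size=(N_TEST, LING_DIM))

weights = fit_linear_model(train_ling, train_speech)
test_pred = predict(weights, test_ling)
print(test_pred.shape)
```

A real entry would replace the linear map with a stronger acoustic model (for example a neural network) and would pass the predicted speech features through a vocoder, or predict waveforms directly, before the listening test.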

Evaluation will be done by the organisers, using a listening test, as in the main challenge.

Registration

Register by emailing blizzard@festvox.org. We need to know your team name, the name of the main contact person, your affiliation, and contact details including email address, postal address and phone number. Please specify which task(s) you plan to submit entries for.

Data download

The speech + text data comes from professional audiobooks produced by Usborne Publishing.

  • 2017-EH1
    • About 6.5 hours of British English speech data from a single female talker, which comprises 5 hours of speech already released for the 2016 challenge plus the audio from 6 additional books that were used for test material in 2016.
    • Processed versions, such as alignments, are shared via the Blizzard Challenge 2016-7 Git Repository
  • 2017-ES1 and 2017-ES2
    • About 4 hours of British English speech data (waveforms) from a single female talker, which is a cleaned-up version of the data used in the 2016 challenge, along with linguistic features and speech features.

Download links (including the online license form) can be found via http://www.cstr.ed.ac.uk/projects/blizzard/2017/usborne_blizzard2017

MD5 checksums:

  • blizzard_release_2017_v2.zip = 21c3f4ddcd724417632b96ef99deec20
  • blizzard_machine_learning_challenge_2017-ES1.zip = d59998653f450d0bd9cd4084334f130e
  • blizzard_machine_learning_challenge_2017-ES2.zip = 1e88ba7edb8af1f88710318ceee69075
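
One way to check downloaded archives against the checksums listed above is sketched here in Python. The checksum values are copied from this page; the helper names (md5_of_file, verify_downloads) and the assumption that the archives sit in the current directory are illustrative only.

```python
import hashlib
from pathlib import Path

# Checksums copied from the list above.
EXPECTED_MD5 = {
    "blizzard_release_2017_v2.zip": "21c3f4ddcd724417632b96ef99deec20",
    "blizzard_machine_learning_challenge_2017-ES1.zip": "d59998653f450d0bd9cd4084334f130e",
    "blizzard_machine_learning_challenge_2017-ES2.zip": "1e88ba7edb8af1f88710318ceee69075",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Return {filename: status}; status is 'ok', 'mismatch' or 'missing'."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumes the archives were saved to the current directory.
    for name, status in verify_downloads(".").items():
        print(f"{status:8s} {name}")
```

A "mismatch" usually indicates an incomplete or corrupted download, in which case the archive should be fetched again.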

Development tools

Questionnaire

  • Download the questionnaire, complete it, and return it at the same time as your synthetic speech: http://data.cstr.ed.ac.uk/blizzard2017/system_questionnaire.txt

Mailing list

There is a mailing list for discussion and announcements for the challenge:

 blizzard-discuss@festvox.org

Participants must join the list by sending a message to majordomo@festvox.org with the following line in the body of the message

 subscribe blizzard-discuss

Once you are a member you will be able to mail messages to blizzard-discuss@festvox.org
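
The subscription request described above is an ordinary email whose body contains the single command line. As a sketch, it could be composed with Python's standard email library as follows; the sender address is a placeholder and the function name build_subscribe_message is hypothetical. The message is only built here, not sent.

```python
from email.message import EmailMessage


def build_subscribe_message(sender):
    """Build the subscription request for the blizzard-discuss list."""
    msg = EmailMessage()
    msg["From"] = sender  # placeholder: use your own address
    msg["To"] = "majordomo@festvox.org"
    # The command goes in the body of the message, as described above.
    msg.set_content("subscribe blizzard-discuss")
    return msg


msg = build_subscribe_message("you@example.org")
print(msg)
```

Sending the message is left to your usual mail client or to an SMTP server of your choice (for instance via smtplib); the server details depend on your own mail setup.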

Timeline

The timeline shown on this web page is the official one and supersedes those shown in announcements. It is subject to change, but we will try to follow it as closely as possible. Note that we will not consider any requests from participants to change the synthetic speech submission date or the paper submission date!

  Dec  8    2016  -  2017-EH1 database released
  Jan  ?    2017  -  2017-ES1 and 2017-ES2 database released
  Mar 29    2017  -  test sentences released to participants
  Apr 17    2017  -  participants submit their output, plus questionnaire (by midnight UTC)
  Apr       2017  -  evaluation systems go live
  Jun       2017  -  end of evaluation period
  Jun 15    2017  -  expected release of results for 2017-ES1 and 2017-ES2
  Jun 29    2017  -  deadline to submit ASRU papers for 2017-ES1 and 2017-ES2
  Jun 29    2017  -  expected release of results for 2017-EH1
  Jul  3    2017  -  deadline to submit Blizzard Workshop papers for 2017-EH1
  Jul 31    2017  -  notification of paper acceptance for 2017-EH1
  Aug 20-24 2017  -  Interspeech 2017, Stockholm, Sweden
  Aug 25    2017  -  Blizzard Challenge workshop, Stockholm - for task 2017-EH1
  Aug 28-   2017  -  EUSIPCO 2017, Kos, Greece
  Aug 31    2017  -  notification of paper acceptance for 2017-ES1 and 2017-ES2
  Dec 16-20 2017  -  ASRU 2017
                      will include the workshop for tasks 2017-ES1 and 2017-ES2

Workshop

Information on the two workshops can be found here:

  • Blizzard Challenge 2017 Workshop
  • Blizzard Machine Learning Challenge 2017 Workshop

Any questions?

  • Please contact blizzard@festvox.org if you have any questions

Previous challenges