Blizzard Challenge organisation

From SynSIG

Revision as of 15:06, 3 January 2019

Who organises the Blizzard Challenge?

The challenge was conceived by Alan Black and Keiichi Tokuda, as described in their Interspeech 2005 paper. In 2007, they were joined by Simon King. Together these three people form the steering committee with overall control of the challenge.


Each annual challenge has an organiser who is responsible for the main tasks of data preparation and release, planning and executing the evaluation, and arranging the workshop. In the first years of the challenge, this organiser was one of the steering committee.

Steering Committee

Annual organisers

Who provides the data?

The data are donated by external organisations. Where possible, we ask for the data to be made available for longer-term research use, either to the participating teams only or more generally.

Data providers for each challenge

How to get involved in the organisation of the challenge

Providing data

We are always looking for new and interesting data. The amount of data used in the Challenge has varied widely over the years, from 1 hour to hundreds of hours. If external data is allowed, or a multi-speaker corpus is made available, then the amount of data from the target speaker(s) can be small. However, for a conventional single-speaker corpus, most participants would probably like a minimum of 5 hours (e.g., Blizzard Challenges 2017 and 2018).

The data can be raw and unprocessed (e.g., complete audiobooks) or partially segmented by the provider. There must normally be a transcript, but this may not be aligned with the data, and small mismatches can be handled by most participants. We have a system for sharing the effort of preparing data across the participating teams, with a Git repository of cleaned transcripts, alignments, etc. This system works best for data that is used for two or three consecutive challenges.

Specifying the task and designing the evaluation

There is a default design of task and evaluation, and we have not deviated from this much over the years. But we are always looking for constructive criticism of all aspects, including the task for participants, the materials to be synthesised, the listening test, and the statistical analysis.

Organisers' checklist

Here are the key tasks and timing for organising a Blizzard Challenge. Please can organisers update this after each Challenge to make it as useful as possible to future organisers.

Preliminaries

  • Form an organising committee for the current Challenge. You should include at least one member of the permanent steering committee.
  • Check that you and all your committee are on the blizzard-discuss mailing list, so you can post to it.
  • Get yourself added to the blizzard@festvox.org alias, managed by Alan Black.
  • Get an account on the SynSIG website, which is hosted by Thierry Dutoit's group at U Mons. Simon King, or any member of the SynSIG steering committee, can request this for you.

Data

  • Prepare the data for distribution
  • Obtain a clear written statement from the owner of the data about usage rights, and store this in a safe place.
  • Make an initial guess at materials for the listening test, and hold out sufficient natural data from the distribution. In general, plan for 3 consecutive years using the same data, so hold out at least enough data for three listening tests.
  • Choose a license (a good default starting point is a recent license). Make sure the data owner is happy with the license. If you are able to release the data under liberal terms (e.g., allowing commercial use) get this agreed now, and don't leave it for later.
  • We restrict access to the data to registered teams. But you still need to determine now whether you will be able to make the data more widely available after completion of the Challenge. Make sure the license is consistent with what you decide.
  • Decide how to distribute the data. By default, it will be hosted at the University of Edinburgh, and Simon King will take care of the online license form and the issuing of passwords to registered teams. If you are able to make the data more generally available after the Challenge is finished, then be sure to host it in a suitable location.

Website

  • Create pages for the Challenge, the rules, and the workshop, on the SynSIG website. It's easiest to copy pages from previous years, then edit.
  • Decide on a timeline, and publish this on the website

Participating teams

  • Write the call for participation, probably based on one from a previous year. Create a PDF version.
  • Write the rules for the challenge and publish on the SynSIG website.
  • Decide on the entry fee and method of payment. The University of Edinburgh is able to accept online credit card payments, so this should be your default choice. In this case, agree with Simon King how the received funds will be used (see also Listening test).
  • Publish the call on the SynSIG website and send to appropriate mailing lists: blizzard-discuss, synsig, Festival, HTS.
  • Receive team registrations via the blizzard email alias.
  • Create a shared Google spreadsheet to store team information (you could ask Simon King to create this from a previous year's version, as a template).
  • Issue data download passwords to teams who have both registered and completed the data license (see also Data).
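
The rule in the last item (a password goes only to teams that have both registered and completed the data license) can be checked mechanically against the team records. The following is a minimal Python sketch, not part of any existing Blizzard tooling: the `Team` class, its field names, and the example team names are all hypothetical, and in practice the records would come from the shared spreadsheet.

```python
from dataclasses import dataclass


@dataclass
class Team:
    # All field names are hypothetical; adapt them to the spreadsheet columns.
    name: str
    registered: bool           # registration received via the email alias
    license_completed: bool    # data license form completed
    password_issued: bool = False


def teams_needing_password(teams):
    """Return teams that are registered AND licensed but have no password yet."""
    return [
        t for t in teams
        if t.registered and t.license_completed and not t.password_issued
    ]


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    teams = [
        Team("Team A", registered=True, license_completed=True),
        Team("Team B", registered=True, license_completed=False),
        Team("Team C", registered=True, license_completed=True, password_issued=True),
    ]
    for team in teams_needing_password(teams):
        print("Issue password to:", team.name)
```

Tracking `password_issued` separately makes it safe to re-run the check each time a new registration or license form arrives.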

Listening test

  • Depending on the language, the listening test facility at the University of Edinburgh is a good choice for running around 100-150 "gold standard" listeners under ideal conditions. This will cost around GBP 1200 for listener payments, plus around GBP 1500 for an assistant to recruit and run the listeners.
  • Decide whether you also want to use paid crowd-sourcing to obtain further listeners.
  • Ask Simon King to approach sponsors to cover some of the costs of the listening test, including payments to listeners and to assistants running the test. Agree with him how funding will be used (e.g., do you need money to pay your own assistants or to pay for crowd-sourcing?). Normally, sponsors make a donation to the University of Edinburgh, and funds are then distributed as needed.
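
The approximate costs quoted above can be combined into a rough budget when discussing funding with sponsors. The following Python sketch is illustrative only: the function and parameter names are assumptions, as is the sponsorship amount in the usage example. Only the GBP 1200 and GBP 1500 figures and the 100-150 listener range come from this checklist.

```python
def listening_test_budget(listener_payments_gbp=1200.0,
                          assistant_gbp=1500.0,
                          crowdsourcing_gbp=0.0,
                          sponsorship_gbp=0.0):
    """Return (total_cost, shortfall) in GBP for a listening test.

    The defaults are the approximate figures from the checklist; the
    crowd-sourcing and sponsorship amounts are hypothetical inputs.
    """
    total = listener_payments_gbp + assistant_gbp + crowdsourcing_gbp
    # The shortfall is what entry fees or other funds must still cover.
    shortfall = max(0.0, total - sponsorship_gbp)
    return total, shortfall


def payment_per_listener(listener_payments_gbp, n_listeners):
    """Average payment per listener implied by a total and a head count."""
    if n_listeners <= 0:
        raise ValueError("n_listeners must be positive")
    return listener_payments_gbp / n_listeners


if __name__ == "__main__":
    # Hypothetical sponsorship of GBP 2000, for illustration only.
    total, shortfall = listening_test_budget(sponsorship_gbp=2000.0)
    print(f"Total: GBP {total:.0f}, still to fund: GBP {shortfall:.0f}")
    for n in (100, 150):
        print(f"{n} listeners: GBP {payment_per_listener(1200.0, n):.2f} each")
```

With the default figures the total is about GBP 2700, and 100-150 listeners imply an average payment of roughly GBP 8-12 per listener.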

Workshop

Simon King is willing to do most of this part of the organisation, so try to delegate this task to him!

  • Choose a date and location for the workshop. The default is a satellite of Interspeech or SSW.
  • Find a sponsor for the workshop to cover: venue cost, event catering and evening drinks.