Education: Difference between revisions

From SynSIG
(31 intermediate revisions by 4 users not shown)
"'''During the initial meetings of the ESCA (now [[ISCA]]) speech synthesis SIG ([[SynSIG]]) at the ICSLP conference in Sydney in 1998, many of us felt that we should devote some of our efforts to improving our teaching activities at universities and other academic institutions. Although everybody has their own way of teaching, we can improve our courses by sharing experience and already prepared course material. This web page is devoted to this task.'''"
{{webmaster}}
(text by ''Gregor Möhler'')


== SPCC ==
The Speech Processing Courses in Crete (SPCC) aim to teach graduate students and researchers the latest advances in speech processing, covering both theory and hands-on practice, and to establish contacts between academia and industry. The school gives students and professionals the chance to meet world leaders in speech technology and to exchange ideas, experiences and visions.
The Summer School is organized by the University of Crete, Greece.

=== 2020 ===
* webpage: http://spcc.csd.uoc.gr/
* For 2020, the school topic is '''Neural approaches for speech enhancement, synthesis, and coding''', which includes:
** Basic components of neural vocoders: WaveNet, Parallel WaveNet, and WaveRNN
** Deep generative models for speech compression
** Neural auto-regressive, source-filter and glottal vocoders for speech and music signals
** Neural vocoders for coding, synthesis, and enhancement

= Organizations involved in teaching speech synthesis =
{| style="background: transparent; border:none;" width=100%
| width=50% | <span style="font-size: 135%; border: ">Organization </span>
| width=50% | <span style="font-size: 135%; border: ">Teaching staff (in speech synthesis)</span>
|}

{|style="background-color:#CCFFCC; border:1px dashed #a0a0a0;" width=100%
|width=50%| ETH Zürich, Switzerland, [http://www.tik.ee.ethz.ch/~spr/ TIK]
| Beat Pfister
|-
| KTH Stockholm, Sweden, [http://www.speech.kth.se/ Department of Speech Music and Hearing]
| Inger Karlsson
|-
| Oregon Graduate Institute, USA, [http://cslu.cse.ogi.edu/tts/ CSLU (Speech Synthesis Research Group)]
| Jan Van Santen, Richard Sproat
|-
| University of Bonn, Germany, [http://www.ikp.uni-bonn.de/ IKP]
| Wolfgang Hess
|-
| University of Cottbus, Germany, [http://www.kt.tu-cottbus.de/ Lehrstuhl Kommunikationstechnik]
| Klaus Fellbaum
|-
| University of Dresden, Germany, [http://www.ias.et.tu-dresden.de/ IAS]
| Rüdiger Hoffmann
|-
| University of Edinburgh, U.K., [http://www.cstr.ed.ac.uk/ CSTR]
| Paul Taylor
|-
| University of Grenoble, France, [http://www.icp.grenet.fr/ ICP]
| Gerard Bailly
|-
| Universidade Estadual de Campinas (Unicamp), Brazil, [http://www.iel.unicamp.br/portal Instituto de Estudos da Linguagem]
| Plinio Almeida Barbosa
|-
| Faculté Polytechnique de Mons, Belgium, [http://tcts.fpms.ac.be/voqual/index.php/ TTS research group]
| Thierry Dutoit
|-
| University Paris XI, France, [http://www.limsi.fr/ LIMSI]
| Christophe d'Alessandro
|-
| University of Stuttgart, Germany, [http://www.ims.uni-stuttgart.de/phonetik/ IMS] (Chair of Experimental Phonetics)
| Gregor Möhler, Bernd Möbius
|}


=== 2019 ===
* webpage: http://spcc.csd.uoc.gr/2019/
* For 2019, the school topic is '''Conversational Speech Synthesis: from design to evaluation''', which includes:
** Introduction to modern statistical dialogue systems and requirements for a conversational speech synthesis system
** Modern Acoustic Modelling Approaches (WaveNet, Tacotron)
** Advanced flexibility for effective conversational TTS: Style token and Voice Conversion
** Evaluation of conversational and multimodal TTS


= Courses in speech synthesis =
== Introductory courses ==
* [http://www.ims.uni-stuttgart.de/~moehler/ISCA-SynSIG/courses/ims_synthese1.html Speech synthesis I].
** Authors: Gregor Möhler, Bernd Möbius.
** Language: German.
** Material: slides provided
* [http://tcts.fpms.ac.be/cours/1005-07-08/speech/ An introductory course on speech processing].
** Author: Thierry Dutoit.
** Languages: French and English.
** Material: slides provided


== Specific topics ==
* [http://www.ims.uni-stuttgart.de/~moehler/ISCA-SynSIG/courses/lafape_speechsci.html Speech Science and Technology].
** Author: Dr. Plínio A. Barbosa.
** Language: Portuguese (Brazil).
* [http://www.ims.uni-stuttgart.de/~moehler/ISCA-SynSIG/courses/ims_synthese2.html Speech synthesis II].
** Authors: Bernd Möbius, Gregor Möhler.
** Language: German.
** Material: slides provided

=== 2018 ===
* webpage: http://spcc.csd.uoc.gr/SPCC2018/
* For 2018, the school topic is '''Towards Flexible and Intelligible End-to-End Speech synthesis systems''', which includes:
** Modern Acoustic Modelling Approaches (WaveNet, Tacotron)
** Contemporary Unit Selection: The art of creating thousands of voices in real products
** Advanced Voice Conversion using Deep Learning (WaveNet, GAN, etc.)
** Intelligibility and Cognitive Effort in Speech Synthesis


=== 2017 ===
* webpage: http://spcc.csd.uoc.gr/SPCC2017/
* The school topic is '''Towards Intelligible and Conversational Speech Synthesis Engines''', which includes:
** Modern Acoustic Modelling Approaches (DNN/LSTM, WaveNet)
** Text Normalization and Linguistic Analysis
** Prosody
** Advanced Vocoders and Modifications (Voice Conversion)
** Intelligibility and Cognitive Effort in Speech Synthesis

= Tutorials =
* [http://en.wikipedia.org/wiki/Speech_synthesis Speech Synthesis on Wikipedia]
* [http://www.ias.et.tu-dresden.de/sprache/lehre/multimedia/tutorial/rahmen.htm Demonstration of the TTS-System, Selection of the Speech Units], University of Dresden.
* [http://www.kt.tu-cottbus.de/speech-analysis/ Human Speech Production Based on a Linear Predictive Vocoder], University of Cottbus.


= Historical images =
Take a look at our gallery of [[historical images]].

=== 2016 ===
* webpage: http://spcc.csd.uoc.gr/SPCC2016/
* The school topic is '''Advancements in Modern Speech Synthesis Engines''', which includes:
** Advanced Speech Signal Modelling and Modifications
** Current Acoustic Modelling Approaches
** Challenges in Front-End Processing
** Listening Context Aware Speech Synthesis Systems
** Text Normalization and Linguistic Analysis
 
Lecture slides used for SPCC 2016 are available online:
*[http://spcc.csd.uoc.gr/_docs/Lectures2016/SPCC16_Agiomyrgiannakis.pdf Dr. Yannis Agiomyrgiannakis, Google UK : Vocaine the Vocoder]
*[http://spcc.csd.uoc.gr/_docs/Lectures2016/SPCC16_StylianouI.pdf Prof. Yannis Stylianou, University of Crete : Adaptive Sinusoidal Models]
*[http://spcc.csd.uoc.gr/_docs/Lectures2016/SPCC16_StylianouII.pdf Prof. Yannis Stylianou, University of Crete : Sinusoidal Modeling]
*[http://spcc.csd.uoc.gr/_docs/Lectures2016/SPCC16_KingI.pdf Prof. Simon King, University of Edinburgh : Text Processing for Speech Synthesis]
*[http://spcc.csd.uoc.gr/_docs/Lectures2016/SPCC16_Raptis.pdf Dr. Spyros Raptis, ILSP Athena : Unit-Selection-based Text-To-Speech Synthesis]
*[http://spcc.csd.uoc.gr/_docs/Lectures2016/SPCC16_KingII.pdf Prof. Simon King, University of Edinburgh : Speech Synthesis with Hidden Markov Models]
*[http://spcc.csd.uoc.gr/_docs/Lectures2016/SPCC16_Tsiaras.pdf Dr. Vassilis Tsiaras, University of Crete : Linear Dynamical Models in Speech Synthesis]
*[http://spcc.csd.uoc.gr/_docs/Lectures2016/SPCC16_AgiomyrgiannakisII.pdf Dr. Yannis Agiomyrgiannakis, Google UK : Vocoder-side Voice Morphing for TTS]
*[http://spcc.csd.uoc.gr/_docs/Lectures2016/SPCC16_Zen.pdf Dr. Heiga Zen, Google UK : Artificial Neural Network based Speech Synthesis]
*[http://spcc.csd.uoc.gr/_docs/Lectures2016/SPCC16_Akamine.pdf Dr. Masami Akamine, Toshiba Japan : Closed Loop Diphone-based Text-To-Speech Synthesis]
*[http://spcc.csd.uoc.gr/_docs/Lectures2016/SPCC16_StylianouIII.pdf Prof. Yannis Stylianou, University of Crete : Speech Intelligibility]
*[http://spcc.csd.uoc.gr/_docs/Lectures2016/SPCC16_KingIII.pdf Prof. Simon King, University of Edinburgh : Evaluating Speech Synthesis]
*[http://spcc.csd.uoc.gr/_docs/Lectures2016/SPCC16_KingIV.pdf Prof. Simon King, University of Edinburgh : Hybrid Speech Synthesis]
 
=== 2015 ===
* webpage: http://spcc.csd.uoc.gr/SPCC2015/
* The school topic is '''From Diphones to Modern Speech Synthesis Engines''', which includes:
** Speech Signal Modelling and Modifications
** Acoustic Modelling: HMM, LDM, DNN
** Approaches: Diphones, Unit Selection, Statistical, Hybrid
** Listening Context Aware speech synthesis systems
 
=== 2014 ===
* webpage: http://spcc.csd.uoc.gr/SPCC2014/
 
Lecture slides used for SPCC 2014 are available online:
*[http://spcc.csd.uoc.gr/SPCC2014/material/SPCC14-Introduction_Stylianou.pdf Prof. Yannis Stylianou, University of Crete, Greece : Welcome and Introduction]
*[http://spcc.csd.uoc.gr/SPCC2014/material/SPCC14-Modeling_Stylianou.pdf Prof. Yannis Stylianou, University of Crete, Greece : Speech Production and Modeling]
*[http://spcc.csd.uoc.gr/SPCC2014/material/SPCC14-Pathology_Stylianou.pdf Prof. Yannis Stylianou, University of Crete, Greece : Voice Pathology]
*[http://spcc.csd.uoc.gr/SPCC2014/material/SPCC14-HearingAids_Martin.pdf Prof. Rainer Martin, Institute of Communication Acoustics, Germany : Signal Processing for Hearing Aids]
*[http://spcc.csd.uoc.gr/SPCC2014/material/SPCC14-ASR_Potamianos.pdf Prof. Gerasimos Potamianos, University of Thessaly, Greece : Automatic Speech Recognition]
*[http://spcc.csd.uoc.gr/SPCC2014/material/SPCC14-SpokenDialogueSystems_Gasic_I.pptx Dr Milica Gasic, University of Cambridge, U.K. : Spoken Dialogue Systems I]
*[http://spcc.csd.uoc.gr/SPCC2014/material/SPCC14-SpokenDialogueSystems_Gasic_II.pptx Dr Milica Gasic, University of Cambridge, U.K. : Spoken Dialogue Systems II]
*[http://spcc.csd.uoc.gr/SPCC2014/material/SPCC14-DialogueModeling_Pietquin.pdf Dr Olivier Pietquin, University of Lille 1, France : Statistical Dialogue Modeling]
*[http://spcc.csd.uoc.gr/SPCC2014/material/SPCC14-DSR_Katsamanis.pdf Dr Nassos Katsamanis, NTUA, Athens : Distant Speech Recognition]
 
 
== Keynotes & tutorials at International conferences and workshops ==
 
* [http://www.eusipco2017.org/wp-content/uploads/2017/09/SimonKing_Keynote-talk_EUSIPCO_2017.pdf  Simon King, Speech synthesis: where did the signal processing go? @ EUSIPCO2017]
* [http://www.speech.zone/courses/one-off/merlin-interspeech2017/ Simon King, Oliver Watts, Srikanth Ronanki, Zhizheng Wu, Felipe Espic, Deep Learning for Text-to-Speech Synthesis, using the Merlin toolkit @ Interspeech 2017]
* [https://www.superlectures.com/interspeech2016/isca-medalist-for-leadership-and-extensive-contributions-to-speech-and-language-processing John Makhoul: A 50-year retrospective on speech and language processing @ Interspeech 2016]
* [https://www.superlectures.com/odyssey2016/voice-conversion-and-spoofing-countermeasures-for-speaker-verification Haizhou Li, Voice conversion and spoofing countermeasures for speaker verification @ Odyssey 2016]
* [https://www.superlectures.com/odyssey2016/understanding-individual-level-speech-variability-from-novel-speech-production-data-to-robust-speaker-recognition Shri Narayanan, Understanding individual-level speech variability: From novel speech production data to robust speaker recognition @ Odyssey 2016]
* [https://www.superlectures.com/iscslp2014/tutorial-4-deep-learning-for-speech-generation-and-synthesis Yao Qian and Frank K. Soong, Deep Learning for Speech Generation and Synthesis @ ISCSLP 2014]
* [https://www.superlectures.com/odyssey2014/speaking-in-adverse-conditions-from-behavioural-observations-to-intelligibility-enhancing-speech-modifications Martin Cooke, Speaking in adverse conditions: from behavioural observations to intelligibility-enhancing speech modifications @ Odyssey 2014]
* [https://www.superlectures.com/asru2011/speech-synthesis-as-a-statistical-machine-learning-problem Keiichi Tokuda, Speech Synthesis as a Statistical Machine Learning Problem @ ASRU 2011]
* [https://www.sp.nitech.ac.jp/~tokuda/tokuda_interspeech09_tutorial.pdf Keiichi Tokuda, Heiga Zen, Fundamentals and recent advances in HMM-based speech synthesis @ Interspeech 2009]
 
== Podcasts ==
Simon King, Using speech synthesis to give everyone their own voice, Inaugural lecture, University of Edinburgh
https://itunes.apple.com/jp/podcast/prof-simon-king-using-speech-synthesis-to-give-everyone/id738501766?i=1000170300147&mt=2
 
Keiichi Tokuda, Human-like singing and talking machines, Human Language Technology Lecture Series, MIT
https://itunes.apple.com/jp/podcast/human-like-singing-and-talking-machines/id787393959?i=1000344067611&mt=2
 
== YouTube videos ==
=== SynSIG ===
 
SynSIG has a dedicated channel: https://www.youtube.com/channel/UCiNEMZxIjvlsBKlBdAqT-VQ
 
=== Others ===


* Kim Silverman - Speech Synthesis https://www.youtube.com/watch?v=7mjh0PSUv0M
* Prof. Simon King - Using Speech Synthesis to give Everyone their own Voice https://www.youtube.com/watch?v=xzL-pxcpo-E
* Zhen-Hua Ling - HMM-based Speech Synthesis: Fundamentals and Its Recent Advances https://www.youtube.com/watch?v=MPdOp72bOCA

= Educational Software =
== KPE ==
* The KPE80 program provides a graphical interface for the implementation of the Klatt 1980 [[formant synthesiser]]. The interface allows users to display and edit Klatt parameters using a graphical display which includes the time-amplitude waveform of both the original speech and its synthetic copy, and some signal analysis facilities.
* [http://www.enhance.phon.ucl.ac.uk/public/examples/copysyn/kpe/kpe.htm KPE] and much other [http://www.enhance.phon.ucl.ac.uk/ University College London software]


== MBROLA ==
* The aim of the MBROLA project, initiated by the TCTS Lab of the Faculté Polytechnique de Mons (Belgium), is to obtain a set of speech synthesizers for as many languages as possible and to provide them free of charge for non-commercial applications. The ultimate goal is to boost academic research on speech synthesis, and particularly on prosody generation, known as one of the biggest challenges taken up by [[Text-To-Speech synthesizers]] for the years to come.
* [http://tcts.fpms.ac.be/synthesis/mbrola.html MBROLA]


== Praat ==
* A system for doing phonetics by computer. The computer program Praat is a research, publication, and productivity tool for phoneticians. With it, you can analyse, synthesize, and manipulate speech, and create high-quality pictures for your articles and theses.
* [http://fonsg3.let.uva.nl/praat/manual/Praat_program.html Praat]


== CSLU Toolkit ==
* The CSLU Toolkit was created to provide the basic framework and tools for people to build, investigate and use interactive language systems. These systems incorporate leading-edge speech recognition, natural language understanding, speech synthesis and facial animation.
* [http://cslu.cse.ogi.edu/toolkit/ CSLU Toolkit]


== TrackDraw ==
* TrackDraw is a graphical interface for controlling the parameters of a speech synthesizer.
* [http://www.utdallas.edu/~assmann/TRACKDRAW/trackdraw.html TrackDraw]

== Wavesurfer ==
* Wavesurfer is a tool for speech analysis. Its analysis features include formant and pitch extraction and real-time spectrograms. Built on top of the [http://www.speech.kth.se/snack/ Snack] speech visualization module, Wavesurfer is highly modular and extensible at several levels.
* [http://www.speech.kth.se/wavesurfer/ WaveSurfer]


== Other kinds of teaching materials ==
* [http://speech.zone Simon King's course website]
* [http://tcts.fpms.ac.be/cours/1005-07-08/speech/ An introductory course on speech processing] (in French and English) by Thierry Dutoit, Faculté Polytechnique de Mons, Belgium.
* [http://tcts.fpms.ac.be/projects/ttsbox/ TTSBOX, A Matlab tutorial toolbox on corpus-based Text-to-Speech synthesis], by Thierry Dutoit, Faculté Polytechnique de Mons, Belgium.


== Historical images ==
Take a look at our gallery of [[historical images]].


== External Links ==
* See our [[Pointers|external references page]]
* [http://www.cs.indiana.edu/rhythmsp/ASA/Contents.html Dennis Klatt's History of Speech Synthesis],
* [http://www.ims.uni-stuttgart.de/~moehler/synthspeech/examples.html Examples of Synthesized Speech],
* [http://mitpress.mit.edu/e-books/Hal/chap6/six1.html "The Talking Computer": Text to Speech Synthesis] (J.P. Olive in Hal's Legacy, MIT Press),
* [http://ttssamples.syntheticspeech.de/ German-TTS] and emotional synthesis ([http://emosyn.syntheticspeech.de/] and [http://emosamples.syntheticspeech.de/]) demo by [http://felix.syntheticspeech.de/ Felix Burkhardt].

Latest revision as of 14:23, 14 January 2021
