Software: Difference between revisions
No edit summary |
No edit summary |
||
(8 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
== Full system == | == Full system == | ||
=== Festival === | === Multilingual === | ||
==== Festival ==== | |||
Festival offers a general framework for building speech synthesis systems as well as including examples of various modules. As a whole it offers full text to speech through a number APIs: from shell level, though a Scheme command interpreter, as a C++ library, from Java, and an Emacs interface. Festival is multi-lingual (currently English (British and American), and Spanish) though English is the most advanced. Tools and documentation for build new voices are available through Carnegie Mellon's FestVox project | Festival offers a general framework for building speech synthesis systems as well as including examples of various modules. As a whole it offers full text to speech through a number APIs: from shell level, though a Scheme command interpreter, as a C++ library, from Java, and an Emacs interface. Festival is multi-lingual (currently English (British and American), and Spanish) though English is the most advanced. Tools and documentation for build new voices are available through Carnegie Mellon's FestVox project | ||
* Last update: 2015/01/06 | * Last update: 2015/01/06 | ||
* | * Link: http://www.cstr.ed.ac.uk/downloads/festival/2.4/ | ||
* Reference: | * Reference: | ||
@article{black2001festival, | @article{black2001festival, | ||
title = {The festival speech synthesis system, version 1.4.2}, | title = {The festival speech synthesis system, version 1.4.2}, | ||
author = {Black, Alan and Taylor, Paul and Caley, Richard and | author = {Black, Alan and Taylor, Paul and Caley, Richard and | ||
Clark, Rob and Richmond, Korin and King, Simon and | Clark, Rob and Richmond, Korin and King, Simon and | ||
Strom, Volker and Zen, Heiga}, | Strom, Volker and Zen, Heiga}, | ||
journal = {Unpublished document available via | journal = {Unpublished document available via http://www.cstr.ed.ac.uk/projects/festival.html}, | ||
year = {2001} | |||
year | |||
} | } | ||
=== FreeTTS === | ==== FreeTTS ==== | ||
FreeTTS is a speech synthesis system written entirely in the JavaTM programming language. It is | FreeTTS is a speech synthesis system written entirely in the JavaTM programming language. It is | ||
based upon Flite: a small run-time speech synthesis engine developed at Carnegie Mellon | based upon Flite: a small run-time speech synthesis engine developed at Carnegie Mellon | ||
Line 27: | Line 27: | ||
* Last update: 2009-03-09 | * Last update: 2009-03-09 | ||
* | * Link: http://freetts.sourceforge.net/docs/index.php | ||
* Reference: | * Reference: | ||
@misc{walker2010freetts, | @misc{walker2010freetts, | ||
title = {Freetts 1.2: A speech synthesizer written entirely | title = {Freetts 1.2: A speech synthesizer written entirely in the Java programming language}, | ||
author = {Walker, Willie and Lamere, Paul and Kwok, Philip}, | author = {Walker, Willie and Lamere, Paul and Kwok, Philip}, | ||
year | year = {2010} | ||
} | } | ||
=== MBROLA === | ==== MBROLA ==== | ||
The aim of the MBROLA project, initiated by the TCTS Lab of the Faculté Polytechnique de Mons | The aim of the MBROLA project, initiated by the TCTS Lab of the Faculté Polytechnique de Mons | ||
(Belgium), is to obtain a set of diphone-based speech synthesizers for as many languages as | (Belgium), is to obtain a set of diphone-based speech synthesizers for as many languages as | ||
Line 45: | Line 44: | ||
* Last update: | * Last update: | ||
* | * Link: http://tcts.fpms.ac.be/synthesis/mbrola.html | ||
* Reference: | * Reference: | ||
@inproceedings{dutoit1996mbrola, | @inproceedings{dutoit1996mbrola, | ||
title = {The MBROLA project: Towards a set of high quality | title = {The MBROLA project: Towards a set of high quality speech synthesizers free of use for non commercial purposes}, | ||
author = {Dutoit, Thierry and Pagel, Vincent and Pierret, | author = {Dutoit, Thierry and Pagel, Vincent and Pierret, | ||
Nicolas and Bataille, Fran{\c{c}}ois and Van der | Nicolas and Bataille, Fran{\c{c}}ois and Van der | ||
Vrecken, Olivier}, | Vrecken, Olivier}, | ||
booktitle = {Spoken Language | booktitle = {Proceedings of the Internal Conference of Spoken Language Processing} | ||
volume = {3}, | volume = {3}, | ||
pages = {1393--1396}, | pages = {1393--1396}, | ||
year | year = {1996}, | ||
organization = {IEEE} | organization = {IEEE} | ||
} | } | ||
=== MARY === | ==== MARY ==== | ||
MARY is a multi-lingual (German, English, Tibetan) and multi-platform (Windows, Linux, MacOs X and | MARY is a multi-lingual (German, English, Tibetan) and multi-platform (Windows, Linux, MacOs X and | ||
Solaris) speech synthesis system. It comes with an easy-to-use installer - no technical expertise | Solaris) speech synthesis system. It comes with an easy-to-use installer - no technical expertise | ||
Line 72: | Line 68: | ||
* Last update: 2017/09/26 | * Last update: 2017/09/26 | ||
* | * Link: http://mary.dfki.de/ | ||
* Reference: | * Reference: | ||
@article{schroder2003german, | @article{schroder2003german, | ||
title = {The German text-to-speech synthesis system MARY: A | title = {The German text-to-speech synthesis system MARY: A tool for research, development and teaching}, | ||
author = {Schr{"o}der, Marc and Trouvain, J{"u}rgen}, | author = {Schr{"o}der, Marc and Trouvain, J{"u}rgen}, | ||
journal = {International Journal of Speech Technology}, | journal = {International Journal of Speech Technology}, | ||
volume = {6}, | volume = {6}, | ||
number = {4}, | number = {4}, | ||
pages = {365--377}, | pages = {365--377}, | ||
year | year = {2003}, | ||
publisher = {Springer} | publisher = {Springer} | ||
} | } | ||
==== AhoTTS ==== | |||
Text-to-Speech conversor for Basque, Spanish, Catalan, Galician and English. | |||
It includes linguistic processing and built voices for all the languages aforementioned. Its acoustic engine is based on hts<sub>engine</sub> and it uses a high quality vocoder called AhoCoder. | |||
* Last update: 2015/07/15 | |||
* Link: https://sourceforge.net/projects/ahottsmultiling/ | |||
=== Language specific === | |||
==== AHOTTS (Basque & spanish) ==== | |||
Text-to-Speech conversor for Basque and Spanish. It includes | |||
linguistic processing and built voices for the languages | |||
aforementioned. Its acoustic engine is based on hts<sub>engine</sub> and it uses | |||
a high quality vocoder called AhoCoder. | |||
* Last update: 2016/04/07 | |||
* Link: https://sourceforge.net/projects/ahotts | |||
* Link2: https://sourceforge.net/projects/ahottsiparrahotsa/ (for Lapurdian dialect of Basque.) | |||
* Reference: | |||
@inproceedings{hernaez2001description, | |||
title = {Description of the ahotts system for the basque language}, | |||
author = {Hernaez, Inma and Navas, Eva and Murugarren, Juan Luis and Etxebarria, Borja}, | |||
booktitle = {Proceedings of the ISCA Tutorial and Research Workshop (ITRW) on Speech Synthesis}, | |||
year = {2001} | |||
} | |||
==== RHVoice (Russian) ==== | |||
RHVoice is a free and open source speech synthesizer. | |||
* Last update: 2017/09/24 | |||
* Link: https://github.com/Olga-Yakovleva/RHVoice | |||
== Front end (NLP part) == | == Front end (NLP part) == | ||
Line 97: | Line 129: | ||
* Last update: 2016/10/11 | * Last update: 2016/10/11 | ||
* | * Link: https://github.com/RasmusD/SiRe | ||
==== Phonetisaurus ==== | ==== Phonetisaurus ==== | ||
Line 173: | Line 205: | ||
# Sentences extraction from pdf files | # Sentences extraction from pdf files | ||
# Sentences classification by langues | # Sentences classification by langues | ||
# Sentences filtering and cleaning | # Sentences filtering and cleaning | ||
Line 187: | Line 217: | ||
* Last update: 2017/09/20 | * Last update: 2017/09/20 | ||
* Link: https://github.com/idiap/asrt | |||
* Link: https://github.com/ | |||
==== IRISA text normalizer ==== | |||
Text normalisation tools from IRISA lab. | |||
The tools provided here are split into 3 steps: | |||
# Tokenisation (adding blanks around punctation marks, dealing with special cases like URLs, etc.) | |||
# Generic normalization (leading to homogeneous texts where (almost) information have been lost and where tags have been added for some entities) | |||
# Specific normalisation (projection of the generic texts into specific forms) | |||
* Last update: 2018/01/09 | |||
* Link: https://github.com/glecorve/irisa-text-normalizer | |||
=== Dictionary related tools === | === Dictionary related tools === | ||
Line 223: | Line 265: | ||
* Last update: 2016/12/25 | * Last update: 2016/12/25 | ||
* | * Link: http://hts.sp.nitech.ac.jp/ | ||
==== HTS Engine ==== | ==== HTS Engine ==== | ||
Line 232: | Line 274: | ||
* Last update: 2015/12/25 | * Last update: 2015/12/25 | ||
* | * Link: http://hts-engine.sourceforge.net/ | ||
=== DNN based === | === DNN based === | ||
Line 248: | Line 290: | ||
* Reference: | * Reference: | ||
@ | @inproceedings{wu2016merlin, | ||
title | title = {Merlin: An open source neural network speech synthesis system}, | ||
author | author = {Wu, Zhizheng and Watts, Oliver and King, Simon}, | ||
booktitle = {Proceedings of the Speech Synthesis Workshop (SSW)}, | |||
year | year = {2016} | ||
} | } | ||
Line 273: | Line 315: | ||
@inproceedings{potard2016idlak, | @inproceedings{potard2016idlak, | ||
title = {Idlak Tangle: An Open Source Kaldi Based Parametric Speech Synthesiser Based on DNN.}, | title = {Idlak Tangle: An Open Source Kaldi Based Parametric Speech Synthesiser Based on DNN.}, | ||
author = {Potard, Blaise and Aylett, Matthew P and Baude, David A and Motlicek, Petr}, | author = {Potard, Blaise and Aylett, Matthew P and Baude, David A and Motlicek, Petr}, | ||
booktitle = { | booktitle = {Proceedings of Interspeech}, | ||
pages = {2293--2297}, | pages = {2293--2297}, | ||
year | year = {2016} | ||
} | } | ||
==== CURRENNT scripts ==== | |||
The scripts and examples on the modified CURRENNT toolkit | |||
* Last update: 2017/08/27 | |||
* Link: https://github.com/TonyWangX/CURRENNT_SCRIPTS | |||
=== Wavenet based === | === Wavenet based === | ||
==== tensorflow-wavenet ==== | |||
A TensorFlow implementation of DeepMind's WaveNet paper | |||
* Last update: 2017/05/23 | |||
* Link: https://github.com/ibab/tensorflow-wavenet | |||
=== Other === | === Other === | ||
== End-to-end (text to audio) == | |||
=== barronalex/Tacotron === | |||
Implementation of Google's Tacotron in TensorFlow | |||
* Last update: 2017/08/08 | |||
* Link: https://github.com/barronalex/Tacotron | |||
=== keithito/tacotron === | |||
A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model | |||
* Last update: 2017/11/06 | |||
* Link: https://github.com/keithito/tacotron | |||
=== Char2Wav: End-to-End Speech Synthesis === | |||
This repo has the code for our ICLR submission: | |||
Jose Sotelo, Soroush Mehri, Kundan Kumar, João Felipe Santos, Kyle Kastner, Aaron Courville, Yoshua Bengio. Char2Wav: End-to-End Speech Synthesis. | |||
The website is [http://www.josesotelo.com/speechsynthesis/ here]. | |||
* Last update: 2017/02/28 | |||
* Link: https://github.com/sotelo/parrot | |||
* Reference: | |||
@inproceedings{sotelo2017char2wav, | |||
title = {Char2Wav: End-to-end speech synthesis}, | |||
author = {Sotelo, Jose and Mehri, Soroush and Kumar, Kundan and Santos, Joao Felipe and Kastner, Kyle and Courville, Aaron and Bengio, Yoshua}, | |||
year = {2017}, | |||
booktitle = {Proceedings of International Conference on Learning Representations (ICLR)} | |||
} | |||
== Signal processing == | == Signal processing == | ||
Line 299: | Line 389: | ||
* Last update: | * Last update: | ||
* | * Link: http://www.wakayama-u.ac.jp/~kawahara/STRAIGHTadv/index_e.html | ||
* Reference: | |||
@article{Kawahara1999, | |||
author = {Kawahara, Hideki and Masuda-katsuse, Ikuyo and {De Cheveigné}, Alain}, | |||
year = {1999}, | |||
journal = {Speech Communication}, | |||
pages = {187--207}, | |||
title = {Restructuring speech representations using a pitch-adaptive time frequency smoothing and an instantaneous-frequency}, | |||
volume = {27}, | |||
} | |||
==== World ==== | ==== World ==== | ||
Line 309: | Line 410: | ||
* Link: https://github.com/mmorise/World | * Link: https://github.com/mmorise/World | ||
* Reference: | |||
@article{morise2016world, | |||
title = {WORLD: A vocoder-based high-quality speech synthesis system for real-time applications}, | |||
author = {Morise, Masanori and Yokomori, Fumiya and Ozawa, Kenji}, | |||
journal = {IEICE TRANSACTIONS on Information and Systems}, | |||
volume = {99}, | |||
number = {7}, | |||
pages = {1877--1884}, | |||
year = {2016}, | |||
publisher = {The Institute of Electronics, Information and Communication Engineers} | |||
} | |||
==== Covarep - A Cooperative Voice Analysis Repository for Speech Technologies ==== | ==== Covarep - A Cooperative Voice Analysis Repository for Speech Technologies ==== | ||
Line 337: | Line 451: | ||
@misc{degottex2014covarep, | @misc{degottex2014covarep, | ||
title = {COVAREP: A Cooperative Voice Analysis Repository for Speech Technologies}, | title = {COVAREP: A Cooperative Voice Analysis Repository for Speech Technologies}, | ||
author = {Degottex, Gilles}, | author = {Degottex, Gilles}, | ||
year | year = {2014} | ||
} | } | ||
Line 355: | Line 469: | ||
# Reference: | # Reference: | ||
@ | @inproceedings{espic2017direct, | ||
title = {Direct Modelling of Magnitude and Phase Spectra for | title = {Direct Modelling of Magnitude and Phase Spectra for Statistical Parametric Speech Synthesis}, | ||
author = {Espic, Felipe and Valentini-Botinhao, Cassia and King, Simon}, | author = {Espic, Felipe and Valentini-Botinhao, Cassia and King, Simon}, | ||
booktitle = {Proceedings of Interspeech}, | |||
year | year = {2017} | ||
} | } | ||
Line 373: | Line 486: | ||
@inproceedings{espic2016waveform, | @inproceedings{espic2016waveform, | ||
title = {Waveform Generation Based on Signal Reshaping for Statistical Parametric Speech Synthesis.}, | title = {Waveform Generation Based on Signal Reshaping for Statistical Parametric Speech Synthesis.}, | ||
author = {Espic, Felipe and Valentini-Botinhao, Cassia and Wu, Zhizheng and King, Simon}, | author = {Espic, Felipe and Valentini-Botinhao, Cassia and Wu, Zhizheng and King, Simon}, | ||
booktitle = { | booktitle = {Proceedings of Interspeech}, | ||
pages = {2263--2267}, | pages = {2263--2267}, | ||
year | year = {2016} | ||
} | } | ||
Line 391: | Line 504: | ||
# Reference: | # Reference: | ||
@ | @inproceedings{degottex2016pulse, | ||
title = {A pulse model in log-domain for a uniform synthesizer}, | title = {A pulse model in log-domain for a uniform synthesizer}, | ||
author = {Degottex, Gilles and Lanchantin, Pierre and Gales, Mark}, | author = {Degottex, Gilles and Lanchantin, Pierre and Gales, Mark}, | ||
year | year = {2016}, | ||
booktitle = {Proceedings of the Speech Synthesis Workshop (SSW)} | |||
} | } | ||
Line 418: | Line 531: | ||
* Link: http://aholab.ehu.es/ahocoder/ | * Link: http://aholab.ehu.es/ahocoder/ | ||
@article{erro2014harmonics, | |||
title = {Harmonics plus noise model based vocoder for statistical parametric speech synthesis}, | |||
author = {Erro, Daniel and Sainz, Inaki and Navas, Eva and Hernaez, Inma}, | |||
journal = {IEEE Journal of Selected Topics in Signal Processing}, | |||
volume = {8}, | |||
number = {2}, | |||
pages = {184--194}, | |||
year = {2014}, | |||
publisher = {IEEE} | |||
} | |||
==== PhonVoc: Phonetic and Phonological vocoding ==== | ==== PhonVoc: Phonetic and Phonological vocoding ==== | ||
Line 429: | Line 553: | ||
* Link: https://github.com/idiap/phonvoc | * Link: https://github.com/idiap/phonvoc | ||
==== GlottGAN ==== | |||
Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis | |||
* Last update: 2017/05/30 | |||
* Link: https://github.com/bajibabu/GlottGAN | |||
* Reference: | |||
@inproceedings{bollepalli2017generative, | |||
title = {Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis}, | |||
author = {Bollepalli, Bajibabu and Juvela, Lauri and Alku, Paavo}, | |||
booktitle = {Proceedings of Interspeech}, | |||
pages = {3394--3398}, | |||
year = {2017} | |||
} | |||
==== Postfilt gan ==== | |||
This is an implementation of "Generative adversarial network-based postfilter for statistical parametric speech synthesis" | |||
Please check the run.sh file to train the system. Currently, testing part is not yet implemented. | |||
* Last update: 2017/07/06 | |||
* Link: https://github.com/bajibabu/postfilt_gna | |||
* Reference: | |||
@INPROCEEDINGS{Kaneko2017, | |||
author = {T. Kaneko and H. Kameoka and N. Hojo and Y. Ijima and K. Hiramatsu and K. Kashino}, | |||
booktitle = {Proceedings of the IEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, | |||
title = {Generative adversarial network-based postfilter for statistical parametric speech synthesis}, | |||
year = {2017}, | |||
volume = {}, | |||
number = {}, | |||
pages = {4910-4914}, | |||
doi = {10.1109/ICASSP.2017.7953090}, | |||
ISSN = {}, | |||
month = {March}, | |||
} | |||
=== Pitch extractor === | === Pitch extractor === | ||
Line 451: | Line 616: | ||
* Link: https://github.com/idiap/ssp | * Link: https://github.com/idiap/ssp | ||
=== | === Sample modelling === | ||
==== SampleRNN ==== | |||
SampleRNN: An Unconditional End-to-End Neural Audio Generation Mode | |||
* Last update: | |||
* Link: https://github.com/soroushmehr/sampleRNN_ICLR2017 | |||
@article{mehri2016samplernn, | |||
title = {SampleRNN: An unconditional end-to-end neural audio generation model}, | |||
author = {Mehri, Soroush and Kumar, Kundan and Gulrajani, Ishaan and Kumar, Rithesh and Jain, Shubham and Sotelo, Jose and Courville, Aaron and Bengio, Yoshua}, | |||
journal = {arXiv preprint arXiv:1612.07837}, | |||
year = {2016} | |||
} | |||
=== Toolkits === | |||
==== SPTK - Speech Signal Processing Toolkit ==== | ==== SPTK - Speech Signal Processing Toolkit ==== | ||
The main feature of the Speech Signal Processing Toolkit, available from NITECH, is that not only | The main feature of the Speech Signal Processing Toolkit, available from NITECH, is that not only | ||
Line 460: | Line 640: | ||
* Last update: 2016/12/25 | * Last update: 2016/12/25 | ||
* | * Link: http://sp-tk.sourceforge.net/ | ||
== Singing synthesizer == | == Singing synthesizer == | ||
Line 497: | Line 677: | ||
@inproceedings{huckvale2009klair, | @inproceedings{huckvale2009klair, | ||
title = {KLAIR: a virtual infant for spoken language acquisition research.}, | title = {KLAIR: a virtual infant for spoken language acquisition research.}, | ||
author = {Huckvale, Mark and Howard, Ian S and Fagel, Sascha}, | author = {Huckvale, Mark and Howard, Ian S and Fagel, Sascha}, | ||
booktitle = { | booktitle = {Proceedings of Interspeech}, | ||
pages = {696--699}, | pages = {696--699}, | ||
year | year = {2009} | ||
} | } | ||
Line 512: | Line 692: | ||
* Link: http://www.vocaltractlab.de/ | * Link: http://www.vocaltractlab.de/ | ||
== API/Library == | |||
=== Speech Tools === | |||
The Edinburgh Speech Tools Library is a collection of C++ class, | |||
functions and related programs for manipulating the sorts of objects | |||
used in speech processing. It includes support for reading and writing | |||
waveforms, parameter files (LPC, Ceptra, F0) in various formats and | |||
converting between them. It also includes support for linguistic type | |||
objects and support for various label files and ngrams (with | |||
smoothing). | |||
In addition to the library a number of programs are included. An | |||
intonation library which includes a pitch tracker, smoother and | |||
labelling system (using the Tilt Labelling system), a classification | |||
and regression tree (CART) building program called wagon. Also there | |||
is growing support for various speech recognition classes such as | |||
decoders and HMMs. | |||
The Edinburgh Speech Tools Library is not an end in itself but | |||
designed to make the construction of other speech systems easy. It is | |||
for example to provided the underlying classes in the Festival Speech | |||
Synthesis System | |||
The speech tools are currently distributed in full source form free | |||
for unrestricted use. | |||
* Last update: 2015/01/06 | |||
* Link: http://www.cstr.ed.ac.uk/projects/speech_tools/ | |||
=== ROOTS === | |||
Roots is an open source toolkit dedicated to annotated sequential data generation, management and | |||
processing. It is made of a core library and of a collection of utility scripts. A rich API is | |||
available in C++ and in Perl. | |||
* Last update: 2015/07/01 | |||
* Link: http://roots-toolkit.gforge.inria.fr/ | |||
* Reference: | |||
@inproceedings{chevelu:hal-00974628, | |||
AUTHOR = {Chevelu, Jonathan and Lecorv{'e}, Gw{'e}nol{'e} and Lolive, Damien}, | |||
TITLE = {ROOTS: a toolkit for easy, fast and consistent processing of large sequential annotated data collections}, | |||
BOOKTITLE = {Proceedings of Language Resources and Evaluation Conference (LREC)}, | |||
YEAR = {2014}, | |||
ADDRESS = {Reykjavik, Iceland}, | |||
URL = {http://hal.inria.fr/hal-00974628} | |||
} | |||
== Visualization & annotation tools == | == Visualization & annotation tools == | ||
Line 521: | Line 750: | ||
* Last update: | * Last update: | ||
* | * Link: http://www.fon.hum.uva.nl/praat/ | ||
* Reference: | * Reference: | ||
@article{boersma2006praat, | @article{boersma2006praat, | ||
title = {Praat: doing phonetics by computer}, | title = {Praat: doing phonetics by computer}, | ||
author = {Boersma, Paul}, | author = {Boersma, Paul}, | ||
journal = {http://www.praat.org/}, | journal = {http://www.praat.org/}, | ||
year | year = {2006} | ||
} | } | ||
Line 540: | Line 769: | ||
* Last update: | * Last update: | ||
* | * Link: http://www.speech.cs.cmu.edu/comp.speech/Section5/Synth/klatt.kpe80.html | ||
=== Wavesurfer === | === Wavesurfer === | ||
Line 551: | Line 778: | ||
* Last update: | * Last update: | ||
* | * Link: https://sourceforge.net/projects/wavesurfer/ | ||
== Resources == | |||
=== Dictionary === | |||
==== Unisyn lexicon ==== | |||
The Unisyn lexicon is a master lexicon transcribed in keysymbols, a kind of metaphoneme which allows the encoding of multiple accents of English. | |||
The lexicon is accompanied by a number of perl scripts which transform the base lexicon via phonological and allophonic rules, and other symbol changes, to produce output transcriptions in different accents. The rules can be applied to the whole lexicon, to produce an accent-specific lexicon, or to running text. Output can be displayed in keysymbols, SAMPA, or IPA. | |||
The system uses a geographically-based accent hierarchy, with a tree structure describing countries, regions, towns and speakers; this hierarchy is used to specify the application of rules and other pronunciation features. | |||
The lexicon system is customisable, and the documentation explains how to modify output by swtiching rules on and off, adding new rules or editing existing ones. The user can also add new nodes in the accent hierarchy (new accents or new speakers within an accent), or add new symbols. | |||
A number of UK, US, Australian and New Zealand accents are included in the release. | |||
The scripts run under unix, or Windows 98 (DOS), and use perl 5.6.0. | |||
* Last update: | |||
* Link: http://www.cstr.ed.ac.uk/projects/unisyn/ | |||
==== Combilex ==== | |||
Combilex GA is a keyword-based lexicon for the General American pronunciation. | |||
The combilex contains c.145,000 entries, including the 20,000 most frequent words and contains a variety of linguistic information alongside detailed pronunciations, including many useful proper names. | |||
Combilex GA is an ASCII text file, one entry-per-line, which is easily adaptable for use in text-to-speech synthesis (voice-building or run-time synthesis) and in speech recognition systems. | |||
Full manually notated orthographic-phonemic correspondences are included, allowing derivation of accurate grapheme-to-phoneme rules. | |||
* Last update: | |||
* Link: https://licensing.edinburgh-innovations.ed.ac.uk/item.php?item=combilex-ga | |||
* Reference: | * Reference: | ||
@inproceedings{richmond2009robust, | |||
title = {Robust LTS rules with the Combilex speech technology lexicon}, | |||
author = {Richmond, Korin and Clark, Robert AJ and Fitt, Susan}, | |||
year = {2009}, | |||
booktitle = {Proceedings of Interspeech} | |||
} |
Latest revision as of 14:52, 30 June 2020
Full system
Multilingual
Festival
Festival offers a general framework for building speech synthesis systems as well as including examples of various modules. As a whole it offers full text to speech through a number APIs: from shell level, though a Scheme command interpreter, as a C++ library, from Java, and an Emacs interface. Festival is multi-lingual (currently English (British and American), and Spanish) though English is the most advanced. Tools and documentation for build new voices are available through Carnegie Mellon's FestVox project
- Last update: 2015/01/06
- Reference:
@article{black2001festival, title = {The festival speech synthesis system, version 1.4.2}, author = {Black, Alan and Taylor, Paul and Caley, Richard and Clark, Rob and Richmond, Korin and King, Simon and Strom, Volker and Zen, Heiga}, journal = {Unpublished document available via http://www.cstr.ed.ac.uk/projects/festival.html}, year = {2001} }
FreeTTS
FreeTTS is a speech synthesis system written entirely in the JavaTM programming language. It is based upon Flite: a small run-time speech synthesis engine developed at Carnegie Mellon University. Flite is derived from the Festival Speech Synthesis System from the University of Edinburgh and the FestVox project from Carnegie Mellon University.
- Last update: 2009-03-09
- Reference:
@misc{walker2010freetts, title = {Freetts 1.2: A speech synthesizer written entirely in the Java programming language}, author = {Walker, Willie and Lamere, Paul and Kwok, Philip}, year = {2010} }
MBROLA
The aim of the MBROLA project, initiated by the TCTS Lab of the Faculté Polytechnique de Mons (Belgium), is to obtain a set of diphone-based speech synthesizers for as many languages as possible, and provide them free for non-commercial applications.
- Last update:
- Reference:
@inproceedings{dutoit1996mbrola, title = {The MBROLA project: Towards a set of high quality speech synthesizers free of use for non commercial purposes}, author = {Dutoit, Thierry and Pagel, Vincent and Pierret, Nicolas and Bataille, Fran{\c{c}}ois and Van der Vrecken, Olivier}, booktitle = {Proceedings of the Internal Conference of Spoken Language Processing} volume = {3}, pages = {1393--1396}, year = {1996}, organization = {IEEE} }
MARY
MARY is a multi-lingual (German, English, Tibetan) and multi-platform (Windows, Linux, MacOs X and Solaris) speech synthesis system. It comes with an easy-to-use installer - no technical expertise should be required for installation. It enables expressive speech synthesis, using both diphone and unit-selection synthesis.
- Last update: 2017/09/26
- Link: http://mary.dfki.de/
- Reference:
@article{schroder2003german, title = {The German text-to-speech synthesis system MARY: A tool for research, development and teaching}, author = {Schr{"o}der, Marc and Trouvain, J{"u}rgen}, journal = {International Journal of Speech Technology}, volume = {6}, number = {4}, pages = {365--377}, year = {2003}, publisher = {Springer} }
AhoTTS
Text-to-Speech conversor for Basque, Spanish, Catalan, Galician and English. It includes linguistic processing and built voices for all the languages aforementioned. Its acoustic engine is based on htsengine and it uses a high quality vocoder called AhoCoder.
- Last update: 2015/07/15
Language specific
AHOTTS (Basque & spanish)
Text-to-Speech conversor for Basque and Spanish. It includes linguistic processing and built voices for the languages aforementioned. Its acoustic engine is based on htsengine and it uses a high quality vocoder called AhoCoder.
- Last update: 2016/04/07
- Link2: https://sourceforge.net/projects/ahottsiparrahotsa/ (for Lapurdian dialect of Basque.)
- Reference:
@inproceedings{hernaez2001description, title = {Description of the ahotts system for the basque language}, author = {Hernaez, Inma and Navas, Eva and Murugarren, Juan Luis and Etxebarria, Borja}, booktitle = {Proceedings of the ISCA Tutorial and Research Workshop (ITRW) on Speech Synthesis}, year = {2001} }
RHVoice (Russian)
RHVoice is a free and open source speech synthesizer.
- Last update: 2017/09/24
Front end (NLP part)
Front end inc G2P
SiRE
(Si)mply a (Re)search front-end for Text-To-Speech Synthesis. This is a research front-end for TTS. It is incomplete, inconsistent, badly coded and slow. But it is useful for me and should slowly develop into something useful to others.
- Last update: 2016/10/11
Phonetisaurus
This repository contains scripts suitable for training, evaluating and using grapheme-to-phoneme models for speech recognition using the OpenFst framework. The current build requires OpenFst version 1.6.0 or later, and the examples below use version 1.6.2.
The repository includes C++ binaries suitable for training, compiling, and evaluating G2P models. It also some simple python bindings which may be used to extract individual multigram scores, alignments, and to dump the raw lattices in .fst format for each word.
- Last update: 2017/09/17
Ossian
Ossian is a collection of Python code for building text-to-speech (TTS) systems, with an emphasis on easing research into building TTS systems with minimal expert supervision. Work on it started with funding from the EU FP7 Project Simple4All, and this repository contains a version which is considerable more up-to-date than that previously available. In particular, the original version of the toolkit relied on HTS to perform acoustic modelling. Although it is still possible to use HTS, it now supports the use of neural nets trained with the Merlin toolkit as duration and acoustic models. All comments and feedback about ways to improve it are very welcome.
- Last update: 2017/09/15
SALB
The SALB system is a software framework for speech synthesis using HMM based voice models built by HTS (http://hts.sp.nitech.ac.jp/). See a more generic description on http://m-toman.github.io/SALB/.
The package currently includes:
A C++ framework that abstracts the backend functionality and provides a SAPI5 interface, a command line interface and a C++ API.
Backend functionality is provided by
- an internal text analysis module for (Austrian) German,
- flite as text analysis module for English and
- htsengine for parameter generation/synthesis. (see COPYING for information on 3rd party libraries)
Also included is an Austrian German male voice model.
- Last update: 2016/11/14
Sequence-to-Sequence G2P toolkit
The tool does Grapheme-to-Phoneme (G2P) conversion using recurrent neural network (RNN) with long short-term memory units (LSTM). LSTM sequence-to-sequence models were successfully applied in various tasks, including machine translation [1] and grapheme-to-phoneme [2].
This implementation is based on python TensorFlow, which allows an efficient training on both CPU and GPU.
- Last update: 2017/03/28
Text normalization
Sparrowhawk
Sparrowhawk is an open-source implementation of Google's Kestrel text-to-speech text normalization system. It follows the discussion of the Kestrel system as described in:
Ebden, Peter and Sproat, Richard. 2015. The Kestrel TTS text normalization system. Natural Language Engineering, Issue 03, pp 333-353.
After sentence segmentation (sentenceboundary.h), the individual sentences are first tokenized with each token being classified, and then passed to the normalizer. The system can output as an unannotated string of words, and richer annotation with links between input tokens, their input string positions, and the output words is also available.
- Last update: 2017/07/25
ASRT
This is the README for the Automatic Speech Recognition Tools.
This project contains various scripts in order to facilitate the preparation of ASR related tasks.
Current tasks ares:
- Sentences extraction from pdf files
- Sentences classification by langues
- Sentences filtering and cleaning
Document sentences can be extracted into single document or batch mode.
For an example on how to extract sentences in batch mode, please have a look at the rundatapreparationtask.sh script located in examples/bash directory.
For an example on how to extract sentences in single document mode, please have a look at the rundatapreparation.sh script located in examples/bash directory.
The is also an API to be used in python code. It is located into the common package and is called DataPreparationAPI.py
- Last update: 2017/09/20
- Link: https://github.com/idiap/asrt
IRISA text normalizer
Text normalisation tools from IRISA lab.
The tools provided here are split into 3 steps:
- Tokenisation (adding blanks around punctation marks, dealing with special cases like URLs, etc.)
- Generic normalization (leading to homogeneous texts where (almost) information have been lost and where tags have been added for some entities)
- Specific normalisation (projection of the generic texts into specific forms)
- Last update: 2018/01/09
- Link: https://github.com/glecorve/irisa-text-normalizer
CMU Pronunciation Dictionary Tools
Tools for working with the CMU Pronunciation Dictionary
- Last update: 2015/02/23
ISS scripts for dictionary maintenance
These scripts are sufficient to convert the distributed forms of dictionaries into forms useful for our tools (notably HTK and ISS). Once a dictionary is in a standard form, the generic tools in ISS can be used to manipulate it further.
- Last update: 2017/07/04
Backend (Acoustic part)
Unit selection
HMM based
MAGE
MAGE is a C/C++ software toolkit for reactive implementation of HMM-based speech and singing synthesis.
- Last update: 2014/07/18
HMM-Based Speech Synthesis System (HTS)
The basic core system of HTS, available from NITECH, was implemented as a modified version of HTK together with SPTK (see below), and is released as HMM-Based Speech Synthesis System (HTS) in a form of patch code to HTK.
- Last update: 2016/12/25
HTS Engine
htsengine is a small run-time synthesis engine (less than 1 MB including acoustic models), which can run without the HTK library. The current version does not include any text analyzer but the Festival Speech Synthesis System can be used as a text analyzer.
- Last update: 2015/12/25
DNN based
MERLIN
Merlin is a toolkit for building Deep Neural Network models for statistical parametric speech synthesis. It must be used in combination with a front-end text processor (e.g., Festival) and a vocoder (e.g., STRAIGHT or WORLD).
The system is written in Python and relies on the Theano numerical computation library.
Merlin comes with recipes (in the spirit of the Kaldi automatic speech recognition toolkit) to show you how to build state-of-the art systems.
- Last update: 2017/09/29
- Reference:
@inproceedings{wu2016merlin, title = {Merlin: An open source neural network speech synthesis system}, author = {Wu, Zhizheng and Watts, Oliver and King, Simon}, booktitle = {Proceedings of the Speech Synthesis Workshop (SSW)}, year = {2016} }
IDLAK
Idlak is a project to build an end-to-end parametric TTS system within Kaldi, to be distributed with the same licence.
It contains a robust front-end, voice building tools, speech analysis utilities, and DNN tools suitable for parametric synthesis. It also contains an example of using Idlak as an end-to-end TTS system, in egs/ttsdnnarctic/s1
Note that the kaldi structure has been maintained and the tool building procedure is identical.
- Last update: 2017/07/03
- Reference:
@inproceedings{potard2016idlak, title = {Idlak Tangle: An Open Source Kaldi Based Parametric Speech Synthesiser Based on DNN.}, author = {Potard, Blaise and Aylett, Matthew P and Baude, David A and Motlicek, Petr}, booktitle = {Proceedings of Interspeech}, pages = {2293--2297}, year = {2016} }
CURRENNT scripts
The scripts and examples on the modified CURRENNT toolkit
- Last update: 2017/08/27
Wavenet based
tensorflow-wavenet
A TensorFlow implementation of DeepMind's WaveNet paper
- Last update: 2017/05/23
Other
End-to-end (text to audio)
barronalex/Tacotron
Implementation of Google's Tacotron in TensorFlow
- Last update: 2017/08/08
keithito/tacotron
A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model
- Last update: 2017/11/06
Char2Wav: End-to-End Speech Synthesis
This repo has the code for our ICLR submission:
Jose Sotelo, Soroush Mehri, Kundan Kumar, João Felipe Santos, Kyle Kastner, Aaron Courville, Yoshua Bengio. Char2Wav: End-to-End Speech Synthesis.
The website is here.
- Last update: 2017/02/28
- Reference:
@inproceedings{sotelo2017char2wav, title = {Char2Wav: End-to-end speech synthesis}, author = {Sotelo, Jose and Mehri, Soroush and Kumar, Kundan and Santos, Joao Felipe and Kastner, Kyle and Courville, Aaron and Bengio, Yoshua}, year = {2017}, booktitle = {Proceedings of International Conference on Learning Representations (ICLR)} }
Signal processing
Vocoder, Glottal modelling
STRAIGHT
STRAIGHT is a tool for manipulating voice quality, timbre, pitch, speed and other attributes flexibly. It is an always evolving system for attaining better sound quality, that is close to the original natural speech, by introducing advanced signal processing algorithms and findings in computational aspects of auditory processing.
STRAIGHT decomposes sounds into source information and resonator (filter) information. This conceptually simple decomposition makes it easy to conduct experiments on speech perception using STRAIGHT, the initial design objective of this tool, and to interpret experimental results in terms of huge body of classical studies.
- Last update:
- Reference:
@article{Kawahara1999, author = {Kawahara, Hideki and Masuda-katsuse, Ikuyo and {De Cheveigné}, Alain}, year = {1999}, journal = {Speech Communication}, pages = {187--207}, title = {Restructuring speech representations using a pitch-adaptive time frequency smoothing and an instantaneous-frequency}, volume = {27}, }
World
WORLD is free software for high-quality speech analysis, manipulation and synthesis. It can estimate Fundamental frequency (F0), aperiodicity and spectral envelope and also generate the speech like input speech with only estimated parameters.
This source code is released under the modified-BSD license. There is no patent in all algorithms in WORLD.
- Last update: 2017/08/23
- Reference:
@article{morise2016world, title = {WORLD: A vocoder-based high-quality speech synthesis system for real-time applications}, author = {Morise, Masanori and Yokomori, Fumiya and Ozawa, Kenji}, journal = {IEICE TRANSACTIONS on Information and Systems}, volume = {99}, number = {7}, pages = {1877--1884}, year = {2016}, publisher = {The Institute of Electronics, Information and Communication Engineers} }
Covarep - A Cooperative Voice Analysis Repository for Speech Technologies
Covarep is an open-source repository of advanced speech processing algorithms and is stored as a GitHub project (https://github.com/covarep/covarep) where researchers in speech processing can store original implementations of published algorithms.
Over the past few decades a vast array of advanced speech processing algorithms have been developed, often offering significant improvements over the existing state-of-the-art. Such algorithms can have a reasonably high degree of complexity and, hence, can be difficult to accurately re-implement based on article descriptions. Another issue is the so-called 'bug magnet effect' with re-implementations frequently having significant differences from the original ones. The consequence of all this has been that many promising developments have been under-exploited or discarded, with researchers tending to stick to conventional analysis methods.
By developing Covarep we are hoping to address this by encouraging authors to include original implementations of their algorithms, thus resulting in a single de facto version for the speech community to refer to.
- Last update: 2016/10/16
- Reference:
@misc{degottex2014covarep, title = {COVAREP: A Cooperative Voice Analysis Repository for Speech Technologies}, author = {Degottex, Gilles}, year = {2014} }
MagPhase Vocoder
Speech analysis/synthesis system for TTS and related applications.
This software is based on the method described in the paper:
- Espic, C. Valentini-Botinhao, and S. King, “Direct Modelling of Magnitude and Phase Spectra for Statistical Parametric Speech Synthesis,” in Proc. Interspeech, Stockholm, Sweden, August, 2017.
- Last update: 2017/08/30
- Reference:
@inproceedings{espic2017direct, title = {Direct Modelling of Magnitude and Phase Spectra for Statistical Parametric Speech Synthesis}, author = {Espic, Felipe and Valentini-Botinhao, Cassia and King, Simon}, booktitle = {Proceedings of Interspeech}, year = {2017} }
WavGenSR
Waveform generator based on signal reshaping for statistical parametric speech synthesis.
- Last update: 2017/08/30
- Reference:
@inproceedings{espic2016waveform, title = {Waveform Generation Based on Signal Reshaping for Statistical Parametric Speech Synthesis.}, author = {Espic, Felipe and Valentini-Botinhao, Cassia and Wu, Zhizheng and King, Simon}, booktitle = {Proceedings of Interspeech}, pages = {2263--2267}, year = {2016} }
Pulse model analysis and synthesis
It is basically the vocoder described in:
- Degottex, P. Lanchantin, and M. Gales, "A Pulse Model in Log-domain for a Uniform Synthesizer," in Proc. 9th Speech Synthesis Workshop (SSW9), 2016.
- Last update: 2017/09/7
- Reference:
@inproceedings{degottex2016pulse, title = {A pulse model in log-domain for a uniform synthesizer}, author = {Degottex, Gilles and Lanchantin, Pierre and Gales, Mark}, year = {2016}, booktitle = {Proceedings of the Speech Synthesis Workshop (SSW)} }
YANG VOCODER: Yet-ANother-Generalized VOCODER
Yet another vocoder that is not STRAIGHT.
This project is a state-of-the-art vocoder that parameterizes the speech signal into a parameterization that is amenable to statistical manipulation.
The VOCODER was developed by Hideki Kawahara during his internship at Google.
- Last update: 2017/01/02
Ahocoder
Ahocoder parameterizes speech waveforms into three different streams: log-f0, cepstral representation of the spectral envelope, and maximum voiced frequency. It provides high accuracy during analysis and high quality during reconstruction. It is adequate for statistical parametric speech synthesis and voice conversion. Furthermore, it can be used just for basic speech manipulation and transformation (pitch level and variance, speaking rate, vocal tract length…).
Ahocoder is reported to be a very good complement for HTS. The output files generated by Ahocoder contain float numbers without header, so they are fully compatible with the HTS demo scripts in the HTS website. You can use the same configuration as in the STRAIGHT-based demo, using the "bap" stream to handle maximum voiced frequency (set its dimension to 1 both in data/Makefile and in scripts/Config.pm).
- Last update: 2014
@article{erro2014harmonics, title = {Harmonics plus noise model based vocoder for statistical parametric speech synthesis}, author = {Erro, Daniel and Sainz, Inaki and Navas, Eva and Hernaez, Inma}, journal = {IEEE Journal of Selected Topics in Signal Processing}, volume = {8}, number = {2}, pages = {184--194}, year = {2014}, publisher = {IEEE} }
PhonVoc: Phonetic and Phonological vocoding
This is a computational platform for Phonetic and Phonological vocoding, released under the BSD licence. See file COPYING for details. The software is based on Kaldi (v. 489a1f5) and Idiap SSP. For training of the analysis and synthesis models, follow please train/README.txt.
- Last update: 2016/11/23
GlottGAN
Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis
- Last update: 2017/05/30
- Reference:
@inproceedings{bollepalli2017generative, title = {Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis}, author = {Bollepalli, Bajibabu and Juvela, Lauri and Alku, Paavo}, booktitle = {Proceedings of Interspeech}, pages = {3394--3398}, year = {2017} }
Postfilt gan
This is an implementation of "Generative adversarial network-based postfilter for statistical parametric speech synthesis"
Please check the run.sh file to train the system. Currently, testing part is not yet implemented.
- Last update: 2017/07/06
- Reference:
@INPROCEEDINGS{Kaneko2017, author = {T. Kaneko and H. Kameoka and N. Hojo and Y. Ijima and K. Hiramatsu and K. Kashino}, booktitle = {Proceedings of the IEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, title = {Generative adversarial network-based postfilter for statistical parametric speech synthesis}, year = {2017}, volume = {}, number = {}, pages = {4910-4914}, doi = {10.1109/ICASSP.2017.7953090}, ISSN = {}, month = {March}, }
Pitch extractor
REAPER: Robust Epoch And Pitch EstimatoR
This is a speech processing system. The reaper program uses the EpochTracker class to simultaneously estimate the location of voiced-speech "epochs" or glottal closure instants (GCI), voicing state (voiced or unvoiced) and fundamental frequency (F0 or "pitch"). We define the local (instantaneous) F0 as the inverse of the time between successive GCI.
This code was developed by David Talkin at Google. This is not an official Google product (experimental or otherwise), it is just code that happens to be owned by Google.
- Last update: 2015/03/04
SSP - Speech Signal Processing module
SSP is a package for doing signal processing in python; the functionality is biassed towards speech signals. Top level programs include a feature extracter for speech recognition, and a vocoder for both coding and speech synthesis. The vocoder is based on linear prediction, but with several experimental excitation models. A continuous pitch extraction algorithm is also provided, built around standard components and a Kalman filter.
There is a "sister" package, libssp, that includes translations of some algorithms in C++. Libssp is built around libube that makes this translation easier.
SSP is released under a BSD licence. See the file COPYING for details.
- Last update: 2017/04/16
Sample modelling
SampleRNN
SampleRNN: An Unconditional End-to-End Neural Audio Generation Mode
- Last update:
@article{mehri2016samplernn, title = {SampleRNN: An unconditional end-to-end neural audio generation model}, author = {Mehri, Soroush and Kumar, Kundan and Gulrajani, Ishaan and Kumar, Rithesh and Jain, Shubham and Sotelo, Jose and Courville, Aaron and Bengio, Yoshua}, journal = {arXiv preprint arXiv:1612.07837}, year = {2016} }
Toolkits
SPTK - Speech Signal Processing Toolkit
The main feature of the Speech Signal Processing Toolkit, available from NITECH, is that not only standard speech analysis and synthesis techniques (e.g., LPC analysis, PARCOR analysis, LSP analysis, PARCOR synthesis filter, LSP synthesis filter, and vector quantization techniques) but also speech analysis and synthesis techniques developed at the research group can easily be used.
- Last update: 2016/12/25
Singing synthesizer
Sinsy
Sinsy is a HMM-based singing voice synthesis system.
- Last update: 2015/12/25
Ebook reader
Bard Storyteller ebook reader
Bard Storyteller is a text reader. Bard not only allows a user to read books, but can also read books to the user using text-to-speech. It supports txt, epub and (x)html files.
- Last update: 2014/07
- Link: http://festvox.org/bard/
Various tools
SparkNG
Matlab realtime speech tools and voice production tools
- Last update: 2017/06/29
Articulatory synthesizer
KLAIR - A virtual infant for spoken language acquisition research
The KLAIR project aims to build and develop a computational platform to assist research into the acquisition of spoken language. The main part of KLAIR is a sensori-motor server that displays a virtual infant on screen that can see, hear and speak. Behind the scenes, the server can talk to one or more client applications. Each client can monitor the audio visual input to the server and can send articulatory gestures to the head for it to speak through an articulatory synthesizer. Clients can also control the position of the head and the eyes as well as setting facial expressions. By encapsulating the real-time complexities of audio and video processing within a server that will run on a modern PC, we hope that KLAIR will encourage and facilitate more experimental research into spoken language acquisition through interaction.
- Last update:
- Reference:
@inproceedings{huckvale2009klair, title = {KLAIR: a virtual infant for spoken language acquisition research.}, author = {Huckvale, Mark and Howard, Ian S and Fagel, Sascha}, booktitle = {Proceedings of Interspeech}, pages = {696--699}, year = {2009} }
Vocaltractlab
VocalTractLab stands for "Vocal Tract Laboratory" and is an interactive multimedial software tool to demonstrate the mechanism of speech production. It is meant to facilitate an intuitive understanding of speech production for students of phonetics and related disciplines.
The current versions of VocalTractLab are free of charge. Only a registration code, which you can request by email, will be necessary to activate the software. VocalTractLab is written for Windows operating systems (XP or higher), but a porting to Linux/Unix is conceivable for the future.
- Last update: 2016
API/Library
Speech Tools
The Edinburgh Speech Tools Library is a collection of C++ class, functions and related programs for manipulating the sorts of objects used in speech processing. It includes support for reading and writing waveforms, parameter files (LPC, Ceptra, F0) in various formats and converting between them. It also includes support for linguistic type objects and support for various label files and ngrams (with smoothing).
In addition to the library a number of programs are included. An intonation library which includes a pitch tracker, smoother and labelling system (using the Tilt Labelling system), a classification and regression tree (CART) building program called wagon. Also there is growing support for various speech recognition classes such as decoders and HMMs.
The Edinburgh Speech Tools Library is not an end in itself but designed to make the construction of other speech systems easy. It is for example to provided the underlying classes in the Festival Speech Synthesis System
The speech tools are currently distributed in full source form free for unrestricted use.
- Last update: 2015/01/06
ROOTS
Roots is an open source toolkit dedicated to annotated sequential data generation, management and processing. It is made of a core library and of a collection of utility scripts. A rich API is available in C++ and in Perl.
- Last update: 2015/07/01
- Reference:
@inproceedings{chevelu:hal-00974628, AUTHOR = {Chevelu, Jonathan and Lecorv{'e}, Gw{'e}nol{'e} and Lolive, Damien}, TITLE = {ROOTS: a toolkit for easy, fast and consistent processing of large sequential annotated data collections}, BOOKTITLE = {Proceedings of Language Resources and Evaluation Conference (LREC)}, YEAR = {2014}, ADDRESS = {Reykjavik, Iceland}, URL = {http://hal.inria.fr/hal-00974628} }
Visualization & annotation tools
Praat
Praat is a system for doing phonetics by computer. The computer program Praat is a research, publication, and productivity tool for phoneticians. With it, you can analyse, synthesize, and manipulate speech, and create high-quality pictures for your articles and thesis.
- Last update:
- Reference:
@article{boersma2006praat, title = {Praat: doing phonetics by computer}, author = {Boersma, Paul}, journal = {http://www.praat.org/}, year = {2006} }
KPE
KPE provides a graphical interface for the implementation of the Klatt 1980 formant synthesiser. The interface allows users to display and edit Klatt parameters using a graphical display which includes the time-amplitude waveform of both the original speech and its synthetic copy, and some signal analysis facilities.
- Last update:
Wavesurfer
WaveSurfer is a tool for doing speech analysis. The analysis features include formants and pitch extraction and real time spectrograms. The Wavesurfer tool built on top of the Snack speech visualization module, is highly modular and extensible at several levels.
- Last update:
Resources
Dictionary
Unisyn lexicon
The Unisyn lexicon is a master lexicon transcribed in keysymbols, a kind of metaphoneme which allows the encoding of multiple accents of English.
The lexicon is accompanied by a number of perl scripts which transform the base lexicon via phonological and allophonic rules, and other symbol changes, to produce output transcriptions in different accents. The rules can be applied to the whole lexicon, to produce an accent-specific lexicon, or to running text. Output can be displayed in keysymbols, SAMPA, or IPA.
The system uses a geographically-based accent hierarchy, with a tree structure describing countries, regions, towns and speakers; this hierarchy is used to specify the application of rules and other pronunciation features.
The lexicon system is customisable, and the documentation explains how to modify output by swtiching rules on and off, adding new rules or editing existing ones. The user can also add new nodes in the accent hierarchy (new accents or new speakers within an accent), or add new symbols.
A number of UK, US, Australian and New Zealand accents are included in the release.
The scripts run under unix, or Windows 98 (DOS), and use perl 5.6.0.
- Last update:
Combilex
Combilex GA is a keyword-based lexicon for the General American pronunciation.
The combilex contains c.145,000 entries, including the 20,000 most frequent words and contains a variety of linguistic information alongside detailed pronunciations, including many useful proper names.
Combilex GA is an ASCII text file, one entry-per-line, which is easily adaptable for use in text-to-speech synthesis (voice-building or run-time synthesis) and in speech recognition systems.
Full manually notated orthographic-phonemic correspondences are included, allowing derivation of accurate grapheme-to-phoneme rules.
- Last update:
- Reference:
@inproceedings{richmond2009robust, title = {Robust LTS rules with the Combilex speech technology lexicon}, author = {Richmond, Korin and Clark, Robert AJ and Fitt, Susan}, year = {2009}, booktitle = {Proceedings of Interspeech} }