FastSpeech 2: Fast and High-Quality End-to-End Text-to-Speech

Audio Samples

All of the audio samples use Parallel WaveGAN (PWG) as the vocoder. For all audio samples, the background noise of LJSpeech is reduced using spectral subtraction.
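The page does not say which variant of spectral subtraction was applied or with what settings, so the following is only a generic magnitude-subtraction sketch to illustrate the idea. It assumes a noise-only clip is available for estimating the noise spectrum. The function names, frame length, hop size, and spectral floor are illustrative choices, not values taken from the authors.

```python
import numpy as np


def stft_frames(x, frame_len, hop, window):
    """Windowed FFT of overlapping frames (hypothetical helper)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def spectral_subtraction(noisy, noise_clip, frame_len=512, hop=256, floor=0.05):
    """Subtract an average noise magnitude spectrum from every frame.

    All parameter defaults are assumptions made for this sketch.
    """
    # Periodic Hann window: at 50% overlap the shifted windows sum to 1,
    # so plain overlap-add reconstructs the interior of the signal.
    window = np.hanning(frame_len + 1)[:-1]

    # Noise estimate: mean magnitude per frequency bin over the noise clip.
    noise_mag = np.abs(stft_frames(noise_clip, frame_len, hop, window)).mean(axis=0)

    spec = stft_frames(noisy, frame_len, hop, window)
    mag = np.abs(spec)

    # Per-bin gain equivalent to max(mag - noise_mag, floor * mag) / mag.
    # The floor keeps a little of each bin to limit "musical noise".
    gain = np.maximum(1.0 - noise_mag / np.maximum(mag, 1e-12), floor)

    # Reuse the noisy phase and return to the time domain.
    frames = np.fft.irfft(spec * gain, n=frame_len, axis=1)

    # Overlap-add. The first half frame stays tapered, and any samples
    # past the last full frame are left at zero.
    out = np.zeros(len(noisy))
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + frame_len] += frame
    return out
```

Calling `spectral_subtraction(noisy, noise_clip)` with two one-dimensional float arrays returns an array of the same length as `noisy`. In practice the noise clip might be taken from a silent stretch of the recording, which is again an assumption rather than something stated here.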

Comparison with Other Models

Were the leaders in this luckless change, though our own Baskerville, who was at work some years before them, went much on the same lines.

GT | GT (PWG) | Transformer TTS
FastSpeech | FastSpeech 2 | FastSpeech 2s


And were occupied as a rule by ten to fifteen people when the prison was not crowded, but double the number was occasionally placed in them.

GT | GT (PWG) | Transformer TTS
FastSpeech | FastSpeech 2 | FastSpeech 2s


Was used for debtors arrested for the lowest sums within twelve miles of the palace of Whitehall.

GT | GT (PWG) | Transformer TTS
FastSpeech | FastSpeech 2 | FastSpeech 2s


Notably as when numbers filled Newgate in anticipation of Lord Redesdale's bill for insolvent debtors.

GT | GT (PWG) | Transformer TTS
FastSpeech | FastSpeech 2 | FastSpeech 2s


Others who arrived just after the time of distribution were often forty-eight hours without food. The latter might also be six days without meat.

GT | GT (PWG) | Transformer TTS
FastSpeech | FastSpeech 2 | FastSpeech 2s

Ablation Study

Most of Caxton's own types are of an earlier character, though they also much resemble Flemish or Cologne letter.

FastSpeech 2 | - CWT | - Pitch
- Energy | - Energy Pitch
FastSpeech 2s | - CWT | - Pitch
- Energy | - Energy Pitch | - Mel Decoder


And were occupied as a rule by ten to fifteen people when the prison was not crowded, but double the number was occasionally placed in them.

FastSpeech 2 | - CWT | - Pitch
- Energy | - Energy Pitch
FastSpeech 2s | - CWT | - Pitch
- Energy | - Energy Pitch | - Mel Decoder