CHEAP ONLINE COURSE

Wednesday, 30 June 2010

Digital Signal Processor and Text-to-Speech

Posted on 06:34 by Unknown

This is the second post in a series on Text-to-Speech for eLearning written by Dr. Joel Harband and edited by me (which turns out to be a great way to learn). The first post, Text-to-Speech Overview and NLP Quality, introduced the text-to-speech voice and discussed quality issues related to its first component, the natural language processor (NLP). In this post we’ll look at the second component of a text-to-speech voice, the digital signal processor (DSP), and its measures of quality.

Digital Signal Processor (DSP)

The digital signal processor translates the phonetic specification of the text produced by the NLP into audible speech. The main challenge for the DSP is to produce a voice that is both intelligible and natural. Two methods are used:

  • Formant Synthesis. Formant synthesis models the human voice with computer-generated sounds, using an acoustic model. Typically, this method produces intelligible, but not very natural, speech. These are the robotic voices, like MS Mike, that people often associate with text-to-speech. Although not acceptable for eLearning, these voices are small, fast programs, so they find use in embedded systems and in applications where naturalness is not required, such as toys and assistive technology.
  • Concatenative Synthesis. Concatenative synthesis is what achieves the remarkable naturalness of voices like Paul (NeoSpeech) and Heather (Acapela). A recording of a real human voice is broken down into acoustic units (phonemes, syllables, words, phrases, and sentences), which are stored in a database. The processor retrieves acoustic units from the database in real time and connects (concatenates) them to best match the input text.

Concatenative Synthesis and Quality

Concatenative synthesis joins many smaller recorded sounds to form the voice, which suggests where glitches can arise. A glitch occurs either because the database has no recording of exactly the sound that is needed, or because two segments do not fit together quite right where they are joined. The main strategy is to choose database segments that are as long as possible (phrases and even whole sentences) to minimize the number of joins and therefore the number of connection glitches.
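The "longest segment first" strategy can be sketched in a few lines of Python. This is a toy illustration only, not how any vendor's engine works: real systems choose units by acoustic fit rather than plain text matching, and the names `select_units` and `count_joins`, the two tiny databases, and the sample phrase are all made up for the example.

```python
# Toy sketch: cover a text with the longest recorded phrases available.
# Hypothetical names and data; real engines select units by acoustic cost.

def select_units(words, database):
    """Greedily cover `words` with the longest phrases found in `database`."""
    units = []
    i = 0
    while i < len(words):
        # Try the longest remaining span first, then shrink it word by word.
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in database:
                units.append(phrase)
                i = j
                break
        else:
            # No recording covers this word at all: the first kind of glitch.
            raise KeyError(f"no recorded unit for {words[i]!r}")
    return units


def count_joins(units):
    """Each boundary between adjacent units is a place a glitch can occur."""
    return max(len(units) - 1, 0)


# A small database holds single words only; a larger one also holds phrases.
small_db = {"thine", "own", "bright", "eyes"}
large_db = small_db | {"thine own", "bright eyes", "thine own bright eyes"}

text = "thine own bright eyes".split()
small_units = select_units(text, small_db)
large_units = select_units(text, large_db)

print(small_units, "joins:", count_joins(small_units))
print(large_units, "joins:", count_joins(large_units))
```

With only single words stored, the phrase needs three joins; once the whole phrase is in the database it needs none, which is why longer stored segments mean fewer places for a glitch.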

Here is an example of a glitch in Paul when joining the two words “bright” and “eyes”. (It wasn’t easy to find a glitch in Paul. I finally found one in a Shakespeare sonnet!)

  • Mike - bright eyes
  • Heather - bright eyes
  • Paul - bright eyes

The output from the best concatenative systems is often indistinguishable from a real human voice. Maximum naturalness typically requires a very large speech database: the more recorded units available, the higher the quality. TTS voice databases acceptable for eLearning are typically on the order of 100-200 MB. For lower-fidelity applications like telephony, the acoustic unit files can be made smaller by using a lower sampling rate, without sacrificing intelligibility or naturalness, which yields a smaller database (a smaller footprint).

By the way, the database is only used to generate the audio, which is then stored as .wav, .mp3, etc. It is not shipped with the eLearning piece itself. So a large database is generally a good thing.

Here is a list of the TTS voices offered by NeoSpeech, Acapela and Nuance with their file sizes and sampling rates.

#   Voice     Vendor     Sampling rate (kHz)  File size (MB)  Applications
1   Paul      NeoSpeech  8                    270 (Max DB)    Telephone
2   Paul      NeoSpeech  16                   64              Multi-media
3   Paul      NeoSpeech  16                   490 (Max DB)    Multi-media
4   Kate      NeoSpeech  8                    340 (Max DB)    Telephone
5   Kate      NeoSpeech  16                   64              Multi-media
6   Kate      NeoSpeech  16                   610 (Max DB)    Multi-media
7   Heather   Acapela    22                   110             Multi-media
8   Ryan      Acapela    22                   132             Multi-media
9   Samantha  Nuance     22                   48              Multi-media
10  Jill      Nuance     22                   39              Multi-media

The file size depends on both the sampling rate and the database size, where the database size reflects the number of acoustic units stored. For example, voices 2 and 3 in the list (both Paul at 16 kHz) have the same sampling rate, but voice 3 has a much bigger file size because of its larger database. In general, the higher sampling rates are used for multimedia applications and the lower sampling rates for telecommunications. Larger sizes often also indicate a higher price point.
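To get a feel for how sampling rate alone affects size, here is a rough back-of-the-envelope sketch in Python. It assumes uncompressed 16-bit mono audio, which is an assumption for illustration only: the post does not say how the vendors actually store their recordings, so these numbers will not match the file sizes in the list. The function name `raw_audio_megabytes` is made up for the example.

```python
# Rough arithmetic only. Assumes uncompressed 16-bit mono PCM audio,
# which is NOT stated in the post; vendor databases may be stored differently.
BYTES_PER_SAMPLE = 2  # 16-bit samples (assumption)


def raw_audio_megabytes(sampling_rate_khz, minutes):
    """Size of uncompressed mono audio in MB (1 MB = 1,000,000 bytes)."""
    samples = sampling_rate_khz * 1000 * minutes * 60
    return samples * BYTES_PER_SAMPLE / 1_000_000


# The same hour of recorded speech at the three sampling rates in the list.
for rate in (8, 16, 22):
    print(f"{rate} kHz, 60 min: {raw_audio_megabytes(rate, 60):.1f} MB")
```

Under these assumptions, size scales in direct proportion to the sampling rate: the same amount of recorded speech takes twice the space at 16 kHz as at 8 kHz. Any difference beyond that ratio comes from the number of acoustic units stored.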

DSP voice quality is therefore a combination of two factors: the sampling rate, which determines the voice fidelity, and the database size, which determines the quality of concatenation and the frequency of glitches. The more acoustic units stored in the database, the better the chances of achieving a perfect concatenation without glitches.

And don’t forget to factor in NLP quality, covered in Text-to-Speech Overview and NLP Quality. Together, NLP quality and DSP quality determine the overall quality of a text-to-speech solution.
