Transaction Description:
NEXT-GENERATION EXPRESSIVE PERSONALIZED VOICES FOR SPEECH-GENERATING DEVICES - PROJECT SUMMARY/ABSTRACT THE CREATION OF PERSONALIZED SYNTHETIC VOICES HAS WIDE APPLICATION IN MEDICAL/REHABILITATION SETTINGS FOR PATIENTS WHO RELY ON A SPEECH-GENERATING DEVICE (SGD) FOR COMMUNICATION. ONE COMMON APPLICATION IS VOICE BANKING, WHEREIN A PERSON WHO RISKS LOSING THEIR VOICE, SUCH AS SOMEBODY WITH A NEURODEGENERATIVE DISEASE LIKE AMYOTROPHIC LATERAL SCLEROSIS (ALS), RECORDS THEIR OWN SPEECH BEFORE THE ONSET OF DISEASE-RELATED DYSARTHRIA FOR LATER USE IN AN SGD THAT MIMICS THEIR NATURAL SPEECH CHARACTERISTICS. WHILE THE TECHNOLOGY UNDERLYING THE CREATION OF SUCH PERSONALIZED SYNTHETIC VOICES IS GROWING IN MATURITY AND ADOPTION BY SGD USERS, IT STILL SUFFERS FROM TWO PRIMARY LIMITATIONS: A LACK OF EXPRESSIVENESS AND A BURDENSOME AMOUNT OF RECORDING NEEDED TO CREATE HIGHLY NATURAL-SOUNDING VOICES. THE PROPOSED PROJECT AIMS TO REMEDY THIS SITUATION BY MARRYING THE MACHINE-LEARNING TECHNOLOGY BEHIND MODELTALKER, A PIONEERING VOICE-BANKING TEXT-TO-SPEECH SERVICE DEVELOPED AT NEMOURS CHILDREN’S HEALTH, WITH THE KNOWLEDGE-BASED TECHNOLOGY UNDERLYING SYNFONY, A RULE-BASED TEXT-TO-SPEECH SYSTEM DEVELOPED BY SYNFONICA LLC, WHICH IS CAPABLE OF GENERATING A VARIETY OF SPEECH STYLES AND EXPRESSIVE MODES. THE EXPERT KNOWLEDGE BUILT INTO SYNFONICA WILL BE USED TO DESIGN AN OPTIMAL SET OF SENTENCES FOR VOICE BANKERS TO RECORD, AND ITS ALGORITHMS FOR THE GENERATION OF NATURAL-SOUNDING PROSODY IN DIFFERENT MODES AND STYLES WILL BE INTEGRATED INTO MODELTALKER’S MACHINE-LEARNING ALGORITHMS, CREATING A HYBRID SYSTEM THAT EMBRACES THE BEST QUALITIES OF BOTH APPROACHES. THE NEW TEXT-TO-SPEECH (TTS) SYSTEM RESULTING FROM THIS PROJECT WILL (A) REQUIRE A MINIMAL AMOUNT OF RECORDED SPEECH FROM THE VOICE BANKER, (B) ACCURATELY CAPTURE THEIR VOCAL IDENTITY, AND (C) BE STRUCTURED SUCH THAT NEW EXPRESSIVE MODES AND SPEECH STYLES CAN BE ADDED EASILY WITHOUT ADDITIONAL RECORDING.
THE FEASIBILITY OF THE PROJECT WILL BE DEMONSTRATED BY RECORDING THE VOICES OF AN ADULT MALE, AN ADULT FEMALE, AND A CHILD, AND GENERATING TTS VOICES THAT CAN SPEAK IN THREE EXPRESSIVE MODES (NEUTRAL, HAPPY, AND SAD). PERCEPTUAL EXPERIMENTS WILL BE RUN TO EVALUATE THEIR INTELLIGIBILITY, NATURALNESS, SUCCESS IN CAPTURING THE VOCAL IDENTITY OF THE SPEAKER, AND THE APPROPRIATENESS OF THEIR EXPRESSIVE MODES. IN GENERAL, THE PROJECT WILL BE A MAJOR STEP FORWARD IN ENABLING THE USERS OF PERSONALIZED SYNTHETIC VOICES TO EXPRESS THEIR EMOTIONS AND INTENTIONS.