voice_kal_diphone and voice_ral_diphone work correctly in singing mode: there is vocal output, and the pitches match the specified notes.
voice_cmu_us_ahw_cg and the other CMU voices do not work correctly: there is vocal output, but the pitch does not change to follow the specified notes.
Is it possible to get correct results with the higher-quality CMU voices?
Command line for the working output (pitch follows the notes):
text2wave -mode singing -eval "(voice_kal_diphone)" -o song.wav song.xml
Command line for the non-working output (pitch is ignored):
text2wave -mode singing -eval "(voice_cmu_us_ahw_cg)" -o song.wav song.xml
Here is song.xml:
<?xml version="1.0"?>
<!DOCTYPE SINGING PUBLIC "-//SINGING//DTD SINGING mark up//EN" "Singing.v0_1.dtd" []>
<SINGING BPM="60">
<PITCH NOTE="A4,C4,C4"><DURATION BEATS="0.3,0.3,0.3">nationwide</DURATION></PITCH>
<PITCH NOTE="C4"><DURATION BEATS="0.3">is</DURATION></PITCH>
<PITCH NOTE="D4"><DURATION BEATS="0.3">on</DURATION></PITCH>
<PITCH NOTE="F4"><DURATION BEATS="0.3">your</DURATION></PITCH>
<PITCH NOTE="F4"><DURATION BEATS="0.3">side</DURATION></PITCH>
</SINGING>
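To check the rendered audio against something objective, it helps to know what fundamental frequency each note in song.xml should have. The following is only a rough sketch: it assumes scientific pitch notation in twelve-tone equal temperament with A4 = 440 Hz, and I have not verified that Festival's singing mode uses exactly the same table internally. The helper name note_to_hz is my own and is not part of Festival.

```python
import re

# Semitone offsets from C within one octave.
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_hz(note, a4=440.0):
    """Convert a note name such as 'A4' or 'C#4' to Hz.

    Assumes equal temperament and scientific pitch notation
    (hypothetical helper, not a Festival function).
    """
    m = re.fullmatch(r"([A-G])([#b]?)(-?\d+)", note)
    if m is None:
        raise ValueError(f"unrecognised note name: {note!r}")
    letter, accidental, octave = m.groups()
    semitone = SEMITONES[letter] + {"#": 1, "b": -1, "": 0}[accidental]
    # MIDI note number: C-1 is 0, so A4 is 69.
    midi = 12 * (int(octave) + 1) + semitone
    return a4 * 2.0 ** ((midi - 69) / 12)


# Notes used in song.xml above.
for name in ["A4", "C4", "D4", "F4"]:
    print(name, round(note_to_hz(name), 2))
# A4 440.0
# C4 261.63
# D4 293.66
# F4 349.23
```

With BPM="60", each 0.3-beat note should last about 0.3 seconds, so a pitch tracker run on the working output should show roughly these frequencies in sequence, while the CMU output stays flat.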
You may also need this patch for singing-mode.scm:
@@ -339,7 +339,9 @@
 (defvar singing-max-short-vowel-length 0.11)
 
 (define (singing_do_initial utt token)
-  (if (equal? (item.name token) "")
+  (if (and
+       (not (equal? nil token))
+       (equal? (item.name token) ""))
       (let ((restlen (car (item.feat token 'rest))))
         (if singing-debug
             (format t "restlen %l\n" restlen))
To set up the environment, I used the festvox fest_build script. You can also download voice_cmu_us_ahw_cg separately.