If your pitch results are not as expected, there are several possible reasons. First, check that the voice activity detection (VAD) is working properly. If it constantly opens and closes during periods of voice activity, the pitch cannot be estimated correctly because the memory assumptions do not hold. The log_features.csv file shows how many values are extracted over time. You should find new values every 5 to 20 ms, depending on the chosen frameshift.
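The spacing check described above can be sketched in Python. This is only an illustration: it assumes the timestamps (in seconds) of the extracted values have already been read from log_features.csv, whose column layout is not described here, and the function name find_gaps is hypothetical.

```python
def find_gaps(timestamps_sec, max_gap_ms=20.0):
    """Return (previous, current) timestamp pairs that lie further apart
    than max_gap_ms. Long gaps during voiced passages suggest that the
    VAD closed while the person was still speaking."""
    gaps = []
    for prev, cur in zip(timestamps_sec, timestamps_sec[1:]):
        # Small tolerance so that a gap of exactly max_gap_ms is not reported
        if (cur - prev) * 1000.0 > max_gap_ms + 1e-6:
            gaps.append((prev, cur))
    return gaps
```

An empty result means that no two consecutive values are more than 20 ms apart. Set max_gap_ms to your own frameshift if it differs.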
If the VAD works well, try different parameters in logo_objspeechana_pitchest.ini. Default values are given for the three task categories (Vowel, Speech and Training). If you have a lot of zeros in your pitch results, try lowering the "f...MeanCalcTime" value of the corresponding category. Changing the maximum allowed pitch "fMaxPitch" or the initialization time "fInitTimeSec" may also help. If a person's pitch varies a lot between tasks, setting "bUseOldPitchforNextTestEnabled" to false might also help. Change these parameters one at a time and observe the results. You can also use the Autocorrelation Life Plotter in the GUI under Medical Signal Processing -> Logopedic Signal Processing -> Objective Speech Analysis -> Plotter. The first maximum is related to pitch estimation. If you see a clear maximum but the estimated and marked peak is somewhere else, try changing more parameters, or if you cannot improve the results, contact
You can specify the number of tests for each category. The total number of trials is the sum of all tests across categories.
ObjSpeechAna bFirstTestEnabled = true
ObjSpeechAna iFirstPreparingTestNum = 0
ObjSpeechAna iFirstTestNum = 10
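As a rough illustration, the total can be computed from the settings with a few lines of Python. The sketch assumes that the other categories use keys following the pattern of "iFirstTestNum" (for example "iSecondTestNum"), that disabled categories contribute nothing, and that preparing tests are not counted; none of this is confirmed here. The helper names are hypothetical.

```python
import re

def parse_settings(text):
    """Parse lines of the form '<Module> <key> = <value>' into a dict.
    Lines starting with '%' are treated as comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue
        match = re.match(r"(\w+)\s+(\w+)\s*=\s*(.+)", line)
        if match:
            _module, key, value = match.groups()
            settings[key] = value.strip()
    return settings

def total_trials(settings, categories=("First", "Second", "Third", "Fourth")):
    """Sum the test counts of all enabled categories (assumed key pattern)."""
    total = 0
    for name in categories:
        enabled = settings.get(f"b{name}TestEnabled", "false").lower() == "true"
        if enabled:
            total += int(settings.get(f"i{name}TestNum", 0))
    return total
```

With only the three lines shown above, total_trials(parse_settings(text)) gives 10.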
There are four categories of tests.
ObjSpeechAna bFirstTestEnabled = true
ObjSpeechAna bSecondTestEnabled = true
ObjSpeechAna bThirdTestEnabled = true
ObjSpeechAna bFourthTestEnabled = false
Each category can be enabled or disabled in its respective section by setting the value to "true" or "false".
For each test, you can specify whether a fixed time should be used.
ObjSpeechAna Sustain_Vowel_Fixed_Time_Sec27 = true
ObjSpeechAna Fixed_Time27 = 30
The parameter fMaxTimeExpectedForAnalysis defines the maximum time for one speech task. This prevents tests from running indefinitely if the VAD malfunctions or another error occurs. If your speech task is longer than this time, you have to increase this parameter accordingly.
ObjSpeechAna fMaxTimeExpectedForAnalysis = 40
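A quick consistency check can catch tasks that would be cut off by this limit. The Python sketch below assumes that fixed-time tasks are also subject to fMaxTimeExpectedForAnalysis, which is not stated explicitly here. The function name and the key "Fixed_Time28" in the usage example are hypothetical.

```python
def tasks_exceeding_limit(fixed_times_sec, max_time_sec):
    """Return the names of all tasks whose fixed time (in seconds)
    is longer than the maximum expected analysis time."""
    return [name for name, t in fixed_times_sec.items() if t > max_time_sec]
```

For example, tasks_exceeding_limit({"Fixed_Time27": 30, "Fixed_Time28": 45}, 40) reports only the second entry, because 30 s fits within the 40 s limit.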
The countdown time is set with the parameter "fCountdownTimeinSec" in the file "logo_objspeechana.ini". The wav file of the recording contains the countdown time minus half a second (for clicking the button) at the beginning, followed by your voice recording and a pause at the end, whose length depends on the chosen maximum speech pause time in seconds. This allows you to reprocess your audio files offline directly after your real-time recording.
ObjSpeechAna fCountdownTimeinSec = 5
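When reprocessing a recording offline, it can help to know where the voice part is expected to begin. The following Python sketch computes that position from the countdown time. The sample rate is an assumption that depends on your audio setup, and the function name is hypothetical.

```python
def speech_start_sample(countdown_sec, sample_rate_hz, click_allowance_sec=0.5):
    """Estimate the sample index at which the countdown part of the
    recording ends (countdown time minus half a second for the click)."""
    lead_in_sec = max(countdown_sec - click_allowance_sec, 0.0)
    return int(round(lead_in_sec * sample_rate_hz))
```

With a countdown of 5 s and an assumed sample rate of 16000 Hz, the lead-in is 4.5 s, which corresponds to sample 72000.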
The maximum allowed length of speech pauses in seconds is defined in "logo_objspeechana_vad.ini". The parameter "bNoVoiceTaskSpecific" defines whether different values are used for the three voice tasks (Vowels, Speech and Training). If it is set to true, there are three options, one per voice task, each giving a maximum pause time in seconds. It is useful to allow less pause time for vowel tasks than for training tasks. If "bNoVoiceTaskSpecific" is set to false, a single value is used regardless of the voice task. If a test ends in the middle of a sentence, for example, the allowed length of speech pauses must be increased.
VAD bNoVoiceTaskSpecific = true
VAD fNoVoiceMaxinSecVowel = 1.8
VAD fNoVoiceMaxinSecSpeech = 2.8
VAD fNoVoiceMaxinSecTraining = 3.8
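The selection logic can be pictured as follows. This Python sketch only illustrates the behavior described above and is not the tool's implementation. In particular, the key "fNoVoiceMaxinSec" for the task-independent value is a hypothetical name, since the actual key is not given here.

```python
def max_pause_sec(vad, task):
    """Return the maximum allowed speech pause for a task.
    vad:  dict of VAD settings as strings
    task: "Vowel", "Speech" or "Training" """
    if vad.get("bNoVoiceTaskSpecific", "false").lower() == "true":
        # Task-specific value, e.g. fNoVoiceMaxinSecVowel
        return float(vad[f"fNoVoiceMaxinSec{task}"])
    # Hypothetical key for the single, task-independent value
    return float(vad["fNoVoiceMaxinSec"])
```

With the values listed above, a vowel task allows pauses of up to 1.8 s and a training task up to 3.8 s.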
The VAD thresholds depend on your specific setup. The parameters "fSnrThreshSmall" and "fSnrThreshBig" from "logo_objspeechana_vad.ini" mainly determine the decision of the VAD. Default values are 4 and 6. Try to set the small value to 2/3 of the big one; decrease it if no voice is detected even when you speak, and increase it if the VAD is too sensitive. Important: Do not speak during the countdown time, because at some point during the countdown the VAD needs to initialize and measure the noise floor. If you speak at this point, the VAD will not function properly.
VAD fOverEstimation = 1.2
VAD fAdaptionFactor = 0.9
VAD fInitAdaptionFactor = 0.5
VAD fIncMaxTimeinSec = 4
VAD fIncindBperSec = 5
VAD fDecindBperSec = -5
VAD fSnrThreshSmall = 2
VAD fVADSmoothingPercentage = 0.3
VAD fVADSmoothedThreshSec = 0.3
VAD fSnrThreshBig = 6
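The 2/3 rule of thumb for the two thresholds can be written as a one-line helper in Python. It only gives a starting value for "fSnrThreshSmall", which may still need tuning for your setup. The function name is hypothetical.

```python
def suggest_small_threshold(big_thresh):
    """Starting value for fSnrThreshSmall: two thirds of fSnrThreshBig."""
    return big_thresh * 2.0 / 3.0
```

For a big threshold of 6 this gives 4, which matches the default pair mentioned above.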
Voice activity detection (VAD) is a technique used in speech processing to determine whether an audio signal contains human speech or not. It is a binary classifier that outputs a decision of "speech" or "non-speech" for each frame of the audio signal.
Prof. Dr.-Ing. Gerhard Schmidt
E-Mail: gus@tf.uni-kiel.de
Christian-Albrechts-Universität zu Kiel
Faculty of Engineering
Institute for Electrical Engineering and Information Engineering
Digital Signal Processing and System Theory
Kaiserstr. 2
24143 Kiel, Germany
% First test:
%---------------------------------------------------------
% --- Module activation