If your pitch results are not as expected, there are several possible reasons. First, check that the voice activity detection (VAD) is working properly: if it constantly opens and closes during periods of voice activity, the pitch cannot be estimated correctly because the memory assumptions do not hold. By inspecting the log_features.csv file, you can see how many values are extracted over time; you should find new values every 5 to 20 ms, depending on the chosen frameshift.
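One way to verify the extraction rate is to load the timestamps from log_features.csv and look at the gaps between consecutive entries. This is a minimal sketch; the column name "time_s" is an assumption and must be adapted to the actual header of your log file.

```python
# Sketch: check how often new feature values arrive in log_features.csv.
# The column name "time_s" is an assumption -- adapt it to your log header.
import csv

def frame_intervals_ms(timestamps_s):
    """Return the gaps between consecutive timestamps in milliseconds."""
    return [(b - a) * 1000.0 for a, b in zip(timestamps_s, timestamps_s[1:])]

def check_frameshift(path, lo_ms=5.0, hi_ms=20.0, time_col="time_s"):
    """True if all gaps in the log fall inside the expected 5-20 ms range."""
    with open(path, newline="") as f:
        times = [float(row[time_col]) for row in csv.DictReader(f)]
    return all(lo_ms <= d <= hi_ms for d in frame_intervals_ms(times))

# Synthetic example: a 10 ms frameshift passes the check.
gaps = frame_intervals_ms([0.00, 0.01, 0.02, 0.03])
```

If the gaps are much larger than your frameshift, or irregular, the VAD is most likely dropping frames during voice activity.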
If the VAD works well, the parameters in logo_objspeechana_pitchest.ini can be adjusted. Default values are given for the three task categories (Vowel, Speech and Training). If you have many zeros in your pitch results, try lowering the "f...MeanCalcTime" value of the corresponding category. Changing the maximum allowed pitch "fMaxPitch" or the initialization time "fInitTimeSec" may also help. If a person's pitch varies strongly across tasks, setting "bUseOldPitchforNextTestEnabled" to false might also help. Change these parameters step by step and observe the results. You can also use the Autocorrelation Live Plotter in the GUI under Medical Signal Processing -> Logopedic Signal Processing -> Objective Speech Analysis -> Plotter. The first maximum is related to the pitch estimate. If you see a clear maximum but the estimated and marked peak is somewhere else, try adjusting further parameters; if you cannot improve the results, contact us (see the contact details below).
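The relation between the first autocorrelation maximum and the pitch can be sketched as follows. This is an illustration of the principle only, not the toolbox's actual estimator; the frequency search range is an example.

```python
# Sketch: pitch from the first autocorrelation maximum (illustration only,
# not the actual estimator of the toolbox).
import numpy as np

def estimate_pitch_acf(frame, fs, f_min=60.0, f_max=500.0):
    """Estimate pitch as the ACF peak between the lags for f_max and f_min."""
    frame = frame - np.mean(frame)
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / f_max)          # shortest admissible pitch period
    lag_max = int(fs / f_min)          # longest admissible pitch period
    peak_lag = lag_min + np.argmax(acf[lag_min:lag_max])
    return fs / peak_lag

# A 200 Hz sine in a 40 ms frame should yield an estimate close to 200 Hz.
fs = 16000
t = np.arange(int(0.04 * fs)) / fs
f0 = estimate_pitch_acf(np.sin(2 * np.pi * 200.0 * t), fs)
```

If the marked peak in the plotter sits far away from the visually obvious maximum, the search range (here f_min/f_max, in the toolbox "fMaxPitch" and related parameters) is the first thing to check.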
You can specify the number of tests for each category. The total number of trials is calculated by the sum of all tests across categories.
There are four categories of tests: the first includes tests that involve sustaining a vowel sound, the second comprises articulation tests, the third is for free speech, and the fourth includes tests that measure tremor. The configuration determines the number of tests within each of the four categories; this flexibility allows specific categories to be excluded.
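The total-trial rule above can be sketched as a simple sum over the four categories. The category names and counts here are illustrative, not actual configuration keys.

```python
# Sketch: total trials are the sum over the four test categories.
# The dictionary keys and counts are illustrative, not actual config keys.
tests_per_category = {
    "vowel": 3,         # sustained-vowel tests
    "articulation": 2,  # articulation tests
    "free_speech": 1,   # free-speech tests
    "tremor": 0,        # a zero excludes the category entirely
}

total_trials = sum(tests_per_category.values())
```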
For each test, you can specify whether a fixed time should be used.
If a fixed time is entered for the test, a progress bar appears in the exo showing how far the test has already progressed.
The parameter fMaxTimeExpectedForAnalysis defines the maximum time for one speech task. This prevents tests from never ending because of a malfunctioning VAD or other errors. If your speech task is longer than this defined time, you have to increase this parameter.
The countdown time is set in the file "logo_objspeechana.ini" via the parameter "fCountdownTimeinSec". The recorded wav file will contain the countdown time minus half a second (accounting for clicking the button) at the beginning, then your voice recording, and some pause at the end, depending on the chosen maximum speech-pause time in seconds. This allows you to reprocess your audio files offline directly after your real-time recording.
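The composition of the recorded wav file described above is simple arithmetic. The half-second button offset is taken from the text; the countdown, voice, and pause durations below are example values.

```python
# Sketch: length of the recorded wav file as described above.
# The 0.5 s button-click offset is from the text; all other values
# are examples, not defaults of the toolbox.
def expected_recording_s(countdown_s, voice_s, max_pause_s):
    """Countdown minus 0.5 s (button click) + voice + trailing pause."""
    return (countdown_s - 0.5) + voice_s + max_pause_s

# e.g. 3 s countdown, 4 s of voice, 1 s allowed trailing pause:
length_s = expected_recording_s(3.0, 4.0, 1.0)   # 2.5 + 4.0 + 1.0
```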
The maximum allowed length of speech pauses in seconds is also defined in "logo_objspeechana_vad.ini". The parameter "bNoVoiceTaskSpecific" defines whether you want different parameters for the three voice tasks (Vowels, Speech and Training). If true, a separate pause time in seconds can be set for each voice task; it is useful to allow less pause time for vowel tasks than for training tasks. If "bNoVoiceTaskSpecific" is set to false, a single parameter is used regardless of the voice task. For example, if the test ends in the middle of a sentence, the allowed length of speech pauses must be increased.
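The end-of-task rule can be sketched frame by frame: the task ends once the accumulated non-speech time exceeds the allowed pause. This is hypothetical logic for illustration, not the toolbox implementation.

```python
# Sketch: a task ends once the accumulated speech pause exceeds the
# configured maximum (hypothetical logic, not the toolbox implementation).
def task_ended(vad_flags, frameshift_s, max_pause_s):
    """vad_flags: per-frame VAD decisions (True = speech)."""
    pause_s = 0.0
    for is_speech in vad_flags:
        pause_s = 0.0 if is_speech else pause_s + frameshift_s
        if pause_s > max_pause_s:
            return True
    return False

# With a 10 ms frameshift, 30 silent frames accumulate a 0.3 s pause:
flags = [True] * 10 + [False] * 30
ended = task_ended(flags, 0.01, 0.2)   # 0.3 s pause > 0.2 s allowed
```

If a test ends mid-sentence, the pauses between your words are longer than max_pause_s, which is exactly the situation where the configured value must be increased.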
The thresholds of the VAD depend on your specific setup. The parameters "fSnrThreshSmall" and "fSnrThreshBig" from "logo_objspeechana_vad.ini" mainly determine the decision of the VAD; the default values are 4 and 6. Try setting the small value to about 2/3 of the big one; decrease it if no voice is detected even when you speak, and increase it if the VAD is too sensitive. Important: do not speak during the countdown time, because at some point the VAD needs to initialize and measure the noise floor. If you speak at this point, the VAD will not function properly.
Voice activity detection (VAD) is a technique used in speech processing to determine whether an audio signal contains human speech. It is a binary classifier that outputs a decision of "speech" or "non-speech" for each frame of the audio signal.
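A minimal sketch of such a frame-wise classifier with two SNR thresholds, in the spirit of "fSnrThreshSmall" and "fSnrThreshBig": a frame must exceed the big threshold to open the VAD and fall below the small one to close it. The hysteresis logic and the power-ratio SNR are assumptions for illustration, not the actual toolbox implementation; the noise floor would be measured during the silent countdown.

```python
# Sketch: frame-wise SNR-based VAD with two thresholds and hysteresis,
# in the spirit of fSnrThreshSmall / fSnrThreshBig. The hysteresis logic
# is an assumption, not the actual toolbox implementation.
def vad_decisions(frame_powers, noise_floor, snr_small=4.0, snr_big=6.0):
    """Open on SNR > snr_big, close on SNR < snr_small (power ratios)."""
    speech = False
    decisions = []
    for p in frame_powers:
        snr = p / noise_floor
        if snr > snr_big:
            speech = True       # clearly voice: open the VAD
        elif snr < snr_small:
            speech = False      # clearly noise: close the VAD
        # in between: keep the previous state (hysteresis)
        decisions.append(speech)
    return decisions

# Noise floor 1.0 (measured during the silent countdown):
flags = vad_decisions([1.0, 7.0, 5.0, 3.0], noise_floor=1.0)
# -> [False, True, True, False]
```

This also shows why speaking during the countdown breaks the VAD: an inflated noise_floor lowers every SNR value, so later voice frames may never reach the opening threshold.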
Prof. Dr.-Ing. Gerhard Schmidt
E-Mail: gus@tf.uni-kiel.de
Christian-Albrechts-Universität zu Kiel
Faculty of Engineering
Institute for Electrical Engineering and Information Engineering
Digital Signal Processing and System Theory
Kaiserstr. 2
24143 Kiel, Germany