If your pitch results are not as expected, there are several possible reasons. First, check that the voice activity detection (VAD) is working properly: if it constantly opens and closes during periods of voice activity, the pitch cannot be estimated correctly because the memory assumptions do not hold. By inspecting the log_features.csv file, you can see how many values are extracted over time; you should find new values every 5 to 20 ms, depending on the chosen frameshift.
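One way to verify the extraction rate is to load the timestamps from log_features.csv and look at the gaps between consecutive entries. This is a minimal sketch; the column name "time_s" is an assumption and must be adapted to the actual header of your log file.

```python
# Sketch: check how often new feature values arrive in log_features.csv.
# The column name "time_s" is an assumption -- adapt it to your log header.
import csv

def frame_intervals_ms(timestamps_s):
    """Return the gaps between consecutive timestamps in milliseconds."""
    return [(b - a) * 1000.0 for a, b in zip(timestamps_s, timestamps_s[1:])]

def check_frameshift(path, lo_ms=5.0, hi_ms=20.0, time_col="time_s"):
    """True if all gaps in the log fall inside the expected 5-20 ms range."""
    with open(path, newline="") as f:
        times = [float(row[time_col]) for row in csv.DictReader(f)]
    return all(lo_ms <= d <= hi_ms for d in frame_intervals_ms(times))

# Synthetic example: a 10 ms frameshift passes the check.
gaps = frame_intervals_ms([0.00, 0.01, 0.02, 0.03])
```

If the gaps are much larger than your frameshift, or irregular, the VAD is most likely dropping frames during voice activity.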
If the VAD works well, the parameters in logo_objspeechana_pitchest.ini can be adjusted. Default values are given for the three task categories (Vowel, Speech and Training). If you have many zeros in your pitch results, try lowering the "f...MeanCalcTime" value of the corresponding category. Changing the maximum allowed pitch "fMaxPitch" or the initialization time "fInitTimeSec" may also help. If a person's pitch varies strongly across tasks, setting "bUseOldPitchforNextTestEnabled" to false might also help. Change these parameters step by step and observe the results. You can also use the Autocorrelation Live Plotter in the GUI under Medical Signal Processing -> Logopedic Signal Processing -> Objective Speech Analysis -> Plotter. The first maximum is related to the pitch estimate. If you see a clear maximum but the estimated and marked peak is somewhere else, try adjusting further parameters; if you cannot improve the results, contact us (see the contact details below).
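The relation between the first autocorrelation maximum and the pitch can be sketched as follows. This is an illustration of the principle only, not the toolbox's actual estimator; the frequency search range is an example.

```python
# Sketch: pitch from the first autocorrelation maximum (illustration only,
# not the actual estimator of the toolbox).
import numpy as np

def estimate_pitch_acf(frame, fs, f_min=60.0, f_max=500.0):
    """Estimate pitch as the ACF peak between the lags for f_max and f_min."""
    frame = frame - np.mean(frame)
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / f_max)          # shortest admissible pitch period
    lag_max = int(fs / f_min)          # longest admissible pitch period
    peak_lag = lag_min + np.argmax(acf[lag_min:lag_max])
    return fs / peak_lag

# A 200 Hz sine in a 40 ms frame should yield an estimate close to 200 Hz.
fs = 16000
t = np.arange(int(0.04 * fs)) / fs
f0 = estimate_pitch_acf(np.sin(2 * np.pi * 200.0 * t), fs)
```

If the marked peak in the plotter sits far away from the visually obvious maximum, the search range (here f_min/f_max, in the toolbox "fMaxPitch" and related parameters) is the first thing to check.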
You can specify the number of tests for each category. The total number of trials is calculated by the sum of all tests across categories.
There are four categories of tests: the first includes tests that involve sustaining a vowel sound, the second comprises articulation tests, the third is for free speech, and the fourth includes tests that measure tremor. The configuration determines the number of tests within each of the four categories; this flexibility allows specific categories to be excluded.
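The total-trial rule above can be sketched as a simple sum over the four categories. The category names and counts here are illustrative, not actual configuration keys.

```python
# Sketch: total trials are the sum over the four test categories.
# The dictionary keys and counts are illustrative, not actual config keys.
tests_per_category = {
    "vowel": 3,         # sustained-vowel tests
    "articulation": 2,  # articulation tests
    "free_speech": 1,   # free-speech tests
    "tremor": 0,        # a zero excludes the category entirely
}

total_trials = sum(tests_per_category.values())
```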
For each test, you can specify whether a fixed time should be used.
If a fixed time is entered for the test, a progress bar appears in the exo showing how far the test has already progressed.
The parameter fMaxTimeExpectedForAnalysis defines the maximum time for one speech task. This prevents tests from never ending because of a malfunctioning VAD or other errors. If your speech task is longer than this defined time, you have to increase this parameter.
The countdown time is set in the file "logo_objspeechana.ini" via the parameter "fCountdownTimeinSec". The recorded wav file will contain the countdown time minus half a second (accounting for clicking the button) at the beginning, then your voice recording, and some pause at the end, depending on the chosen maximum speech-pause time in seconds. This allows you to reprocess your audio files offline directly after your real-time recording.
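The composition of the recorded wav file described above is simple arithmetic. The half-second button offset is taken from the text; the countdown, voice, and pause durations below are example values.

```python
# Sketch: length of the recorded wav file as described above.
# The 0.5 s button-click offset is from the text; all other values
# are examples, not defaults of the toolbox.
def expected_recording_s(countdown_s, voice_s, max_pause_s):
    """Countdown minus 0.5 s (button click) + voice + trailing pause."""
    return (countdown_s - 0.5) + voice_s + max_pause_s

# e.g. 3 s countdown, 4 s of voice, 1 s allowed trailing pause:
length_s = expected_recording_s(3.0, 4.0, 1.0)   # 2.5 + 4.0 + 1.0
```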
The maximum allowed length of speech pauses in seconds is also defined in "logo_objspeechana_vad.ini". The parameter "bNoVoiceTaskSpecific" defines whether you want different parameters for the three voice tasks (Vowels, Speech and Training). If true, a separate pause time in seconds can be set for each voice task; it is useful to allow less pause time for vowel tasks than for training tasks. If "bNoVoiceTaskSpecific" is set to false, a single parameter is used regardless of the voice task. For example, if the test ends in the middle of a sentence, the allowed length of speech pauses must be increased.
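The end-of-task rule can be sketched frame by frame: the task ends once the accumulated non-speech time exceeds the allowed pause. This is hypothetical logic for illustration, not the toolbox implementation.

```python
# Sketch: a task ends once the accumulated speech pause exceeds the
# configured maximum (hypothetical logic, not the toolbox implementation).
def task_ended(vad_flags, frameshift_s, max_pause_s):
    """vad_flags: per-frame VAD decisions (True = speech)."""
    pause_s = 0.0
    for is_speech in vad_flags:
        pause_s = 0.0 if is_speech else pause_s + frameshift_s
        if pause_s > max_pause_s:
            return True
    return False

# With a 10 ms frameshift, 30 silent frames accumulate a 0.3 s pause:
flags = [True] * 10 + [False] * 30
ended = task_ended(flags, 0.01, 0.2)   # 0.3 s pause > 0.2 s allowed
```

If a test ends mid-sentence, the pauses between your words are longer than max_pause_s, which is exactly the situation where the configured value must be increased.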
The thresholds of the VAD depend on your specific setup. The parameters "fSnrThreshSmall" and "fSnrThreshBig" from "logo_objspeechana_vad.ini" mainly determine the decision of the VAD; the default values are 4 and 6. Try setting the small value to about 2/3 of the big one; decrease it if no voice is detected even when you speak, and increase it if the VAD is too sensitive. Important: do not speak during the countdown time, because at some point the VAD needs to initialize and measure the noise floor. If you speak at this point, the VAD will not function properly.
Voice activity detection (VAD) is a technique used in speech processing to determine whether an audio signal contains human speech. It is a binary classifier that outputs a decision of "speech" or "non-speech" for each frame of the audio signal.
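A minimal sketch of such a frame-wise classifier with two SNR thresholds, in the spirit of "fSnrThreshSmall" and "fSnrThreshBig": a frame must exceed the big threshold to open the VAD and fall below the small one to close it. The hysteresis logic and the power-ratio SNR are assumptions for illustration, not the actual toolbox implementation; the noise floor would be measured during the silent countdown.

```python
# Sketch: frame-wise SNR-based VAD with two thresholds and hysteresis,
# in the spirit of fSnrThreshSmall / fSnrThreshBig. The hysteresis logic
# is an assumption, not the actual toolbox implementation.
def vad_decisions(frame_powers, noise_floor, snr_small=4.0, snr_big=6.0):
    """Open on SNR > snr_big, close on SNR < snr_small (power ratios)."""
    speech = False
    decisions = []
    for p in frame_powers:
        snr = p / noise_floor
        if snr > snr_big:
            speech = True       # clearly voice: open the VAD
        elif snr < snr_small:
            speech = False      # clearly noise: close the VAD
        # in between: keep the previous state (hysteresis)
        decisions.append(speech)
    return decisions

# Noise floor 1.0 (measured during the silent countdown):
flags = vad_decisions([1.0, 7.0, 5.0, 3.0], noise_floor=1.0)
# -> [False, True, True, False]
```

This also shows why speaking during the countdown breaks the VAD: an inflated noise_floor lowers every SNR value, so later voice frames may never reach the opening threshold.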
Prof. Dr.-Ing. Gerhard Schmidt
E-Mail: gus@tf.uni-kiel.de
Christian-Albrechts-Universität zu Kiel
Faculty of Engineering
Institute for Electrical Engineering and Information Engineering
Digital Signal Processing and System Theory
Kaiserstr. 2
24143 Kiel, Germany