KiRAT - Kiel Real-time Application Toolkit

Objective Speech Analysis

What to do if the pitch is poorly estimated ?

If your pitch results are not as expected, there are several possible reasons. First, check that the voice activity detection is working properly. If it is constantly opening and closing during periods of voice activity, the pitch cannot be estimated correctly because the memory assumptions do not hold. By looking at the log_features.csv file, you can see how many values are extracted over time; you should find new values every 5 to 20 ms, depending on the chosen frameshift.

If the VAD works well, the parameters in logo_objspeechana_pitchest.ini can be adjusted. Default values are given for the three task categories (Vowel, Speech and Training). If you have a lot of zeros in your pitch results, try lowering the value of "f...MeanCalcTime" for the corresponding category. Changing the maximum allowed pitch "fMaxPitch" or the initialization time "fInitTimeSec" may also help. If a person's pitch varies a lot across different tasks, setting "bUseOldPitchforNextTestEnabled" to false might also help. Change these parameters step by step and observe the results.

You can also use the Autocorrelation Life Plotter in the GUI under Medical Signal Processing -> Logopedic Signal Processing -> Objective Speech Analysis -> Plotter. The first maximum of the autocorrelation corresponds to the estimated pitch period. If you see a clear maximum but the estimated and marked peak is somewhere else, try changing further parameters; if you cannot improve the results, contact the address given in the Contact section below.
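For illustration, here is a minimal numpy sketch of the idea behind the autocorrelation plot: the lag of the strongest maximum within the allowed pitch range corresponds to the pitch period. This is not KiRAT's implementation; the pitch limits below only mirror the role of "fMaxPitch" and are example values.

    import numpy as np

    def estimate_pitch(frame, fs, f_min=60.0, f_max=500.0):
        """Return a pitch estimate in Hz for one voiced frame (0.0 if no peak is found)."""
        frame = frame - np.mean(frame)
        ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0 .. N-1
        lag_min = int(fs / f_max)          # smallest lag = highest allowed pitch
        lag_max = int(fs / f_min)          # largest lag = lowest allowed pitch
        if lag_max >= len(ac):
            return 0.0
        lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))  # strongest peak in the allowed range
        return fs / lag if ac[lag] > 0 else 0.0

    # A 200 Hz synthetic tone sampled at 16 kHz should give roughly 200 Hz.
    fs = 16000
    t = np.arange(int(0.032 * fs)) / fs
    print(estimate_pitch(np.sin(2 * np.pi * 200 * t), fs))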

Posted 1 week ago by mmat

How to adjust the number of trials ?

You can specify the number of tests for each category. The total number of trials is calculated as the sum of the tests across all categories.
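As a small illustration, the category counts below are example values only, not actual configuration keys:

    # Hypothetical per-category test counts; the total number of trials is their sum.
    tests_per_category = {"Vowel": 3, "Articulation": 2, "Speech": 2, "Tremor": 1}
    total_trials = sum(tests_per_category.values())
    print(total_trials)   # 8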

Posted 1 week ago by mmat

What is the meaning of the individual test categories ?

There are four categories of tests. The first category includes tests that involve sustaining a vowel sound. The second category comprises articulation tests. The third category is for free speech, and the fourth category includes tests that measure tremor. The configuration determines the number of tests within each of the four categories, which also makes it possible to exclude specific categories.

Posted 1 week ago by mmat

How to adjust the recording bars ?

For each test, you can specify whether a fixed time should be used. If a fixed time is entered for the test, a progress bar appears in the exo showing how far the test has already progressed.

Posted 1 week ago by mmat

Why does my test end even though speech activity was present ?

The parameter fMaxTimeExpectedForAnalysis defines the maximum time for one speech task. This prevents tests from running indefinitely if the VAD malfunctions or other errors occur. If your speech task is longer than this defined time, you have to increase this parameter.
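A minimal sketch of changing the parameter with Python's configparser, assuming the ini file uses standard section/key syntax and that the parameter lives in logo_objspeechana.ini; the section name "Analysis" is hypothetical, configparser drops comments when writing the file back, and editing the file in a text editor works just as well:

    import configparser

    cfg = configparser.ConfigParser()
    cfg.optionxform = str                 # preserve the case of names like fMaxTimeExpectedForAnalysis
    cfg.read("logo_objspeechana.ini")     # assumption: the parameter is set in this file

    if not cfg.has_section("Analysis"):   # "Analysis" is a hypothetical section name
        cfg.add_section("Analysis")
    cfg.set("Analysis", "fMaxTimeExpectedForAnalysis", "60.0")   # seconds; example value for long tasks

    with open("logo_objspeechana.ini", "w") as f:
        cfg.write(f)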

Posted 1 week ago by mmat

When does the recording time start ?

The countdown time can be set via the parameter "fCountdownTimeinSec" in the file "logo_objspeechana.ini". The wav file of the recording will contain the countdown time minus half a second (to account for clicking the button) at the beginning, followed by your voice recording, and some pause at the end, whose length depends on your chosen maximum speech pause time in seconds. This allows you to reprocess your audio files offline directly after your real-time recording.
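If you want to cut off the lead-in before reprocessing offline, a minimal sketch with Python's wave module could look like the following; the countdown value and the file names are examples, and 16-bit PCM wav files are assumed:

    import wave

    COUNTDOWN_SEC = 3.0                    # whatever fCountdownTimeinSec is set to
    LEAD_IN_SEC = COUNTDOWN_SEC - 0.5      # the wav starts with the countdown minus half a second

    with wave.open("recording.wav", "rb") as src:
        params = src.getparams()
        skip = int(LEAD_IN_SEC * src.getframerate())      # frames to drop at the beginning
        src.readframes(skip)                              # discard the lead-in
        voiced = src.readframes(src.getnframes() - skip)  # the remaining recording and pause

    with wave.open("recording_trimmed.wav", "wb") as dst:
        dst.setparams(params)              # keep sample rate, channels and sample width
        dst.writeframes(voiced)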

Posted 1 week ago by mmat

How to set the maximum pause time for the VAD ?

The maximum allowed length of speech pauses in seconds is also defined in "logo_objspeechana_vad.ini". The parameter "bNoVoiceTaskSpecific" defines whether you want different parameters for the three voice tasks (Vowels, Speech and Training). If it is true, there is one pause-time option per voice task, each given in seconds; it is useful to select a shorter pause time for vowel tasks than for training tasks. If "bNoVoiceTaskSpecific" is set to false, a single parameter is used regardless of the voice task. If, for example, a test ends in the middle of a sentence, the allowed length of speech pauses must be increased.
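To illustrate what this limit controls, here is a minimal sketch of a pause counter that ends a task once one contiguous non-speech stretch exceeds the configured maximum; the frame shift and the limit below are example values, not KiRAT's internal ones:

    FRAME_SHIFT_SEC = 0.02        # assumed frame shift of the VAD decisions
    MAX_PAUSE_SEC = 1.5           # configured maximum allowed speech pause

    def task_ended(vad_decisions):
        """Return True as soon as one contiguous pause exceeds MAX_PAUSE_SEC."""
        pause = 0.0
        for is_speech in vad_decisions:               # one boolean per frame
            pause = 0.0 if is_speech else pause + FRAME_SHIFT_SEC
            if pause > MAX_PAUSE_SEC:
                return True
        return False

    # 100 frames of speech followed by 100 frames (2 s) of silence -> the task ends.
    print(task_ended([True] * 100 + [False] * 100))   # True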

Posted 1 week ago by mmat

How to set the threshold of the VAD correctly ?

The thresholds of the VAD depend on your specific setup. The parameters "fSnrThreshSmall" and "fSnrThreshBig" from "logo_objspeechana_vad.ini" mainly define the decision of the VAD; the default values are 4 and 6. Keep the small value at about 2/3 of the big one; decrease the thresholds if no voice is detected even when you speak, and increase them if the VAD is too sensitive. Important: Do not speak during the countdown time, because the VAD needs this time to initialize and measure the noise floor. If you speak at this point, the VAD will not function properly.
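The FAQ does not spell out how the two thresholds interact, but a common use of a small and a big SNR threshold is a hysteresis decision, sketched below with the default values; treat this as an illustration of the adjustment directions rather than KiRAT's exact rule:

    F_SNR_THRESH_SMALL = 4.0   # default; lower the thresholds if speech is missed
    F_SNR_THRESH_BIG = 6.0     # default; raise them (keeping small ~ 2/3 of big) if the VAD is too sensitive

    def vad_decisions(snr_per_frame):
        """Hysteresis sketch: speech starts above the big threshold and is held
        until the SNR falls below the small threshold."""
        speech = False
        decisions = []
        for snr in snr_per_frame:
            if snr > F_SNR_THRESH_BIG:
                speech = True
            elif snr < F_SNR_THRESH_SMALL:
                speech = False
            decisions.append(speech)
        return decisions

    print(vad_decisions([1.0, 7.0, 5.0, 3.0, 5.0]))   # [False, True, True, False, False]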

Posted 1 week ago by mmat

What is VAD ?

Voice activity detection (VAD) is a technique used in speech processing to determine whether an audio signal contains human speech or not. It is a binary classifier that outputs a decision of "speech" or "non-speech" for each frame of the audio signal.
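A minimal, generic illustration of such a frame-wise classifier (not KiRAT's algorithm): each frame is compared to a noise-floor estimate taken from the first frames, which is also why you should stay silent while the VAD initializes:

    import numpy as np

    def simple_vad(signal, fs, frame_len_sec=0.02, snr_thresh=6.0, init_frames=10):
        """Return one True/False decision per frame (True = speech)."""
        n = int(frame_len_sec * fs)
        frames = [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]
        energies = np.array([np.mean(f ** 2) for f in frames])
        noise_floor = np.mean(energies[:init_frames]) + 1e-12   # assume the first frames contain only noise
        return list(energies / noise_floor > snr_thresh)

    # 1 s of noise followed by 1 s of noise plus a tone: the second half is flagged as speech.
    fs = 16000
    noise = 0.01 * np.random.randn(2 * fs)
    tone = np.concatenate([np.zeros(fs), np.sin(2 * np.pi * 150 * np.arange(fs) / fs)])
    decisions = simple_vad(noise + tone, fs)
    print(sum(decisions), "of", len(decisions), "frames classified as speech")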

Posted 1 week ago by mmat

Contact

Prof. Dr.-Ing. Gerhard Schmidt

E-Mail: gus@tf.uni-kiel.de

Christian-Albrechts-Universität zu Kiel
Faculty of Engineering
Institute for Electrical Engineering and Information Engineering
Digital Signal Processing and System Theory

Kaiserstr. 2
24143 Kiel, Germany