module WebrtcAudio::Vad
Defined in:
vad/core.cr
vad/filterbank.cr
vad/gmm.cr
vad/sp.cr
Class Method Summary
-
.all_pass_filter(data_in : Slice(Int16), data_length : Int32, filter_coefficient : Int16, filter_state : Pointer(Int16), data_out : Slice(Int16)) : Void
All pass filtering of |data_in|, used before splitting the signal into two frequency bands (low pass vs high pass).
- .calc_vad_16khz(inst : VadInstance, speech_frame : Slice(Int16), frame_length : Int32) : Int32
- .calc_vad_32khz(inst : VadInstance, speech_frame : Slice(Int16), frame_length : Int32) : Int32
- .calc_vad_48khz(inst : VadInstance, speech_frame : Slice(Int16), frame_length : Int32) : Int32
- .calc_vad_8khz(inst : VadInstance, speech_frame : Slice(Int16), frame_length : Int32) : Int32
- .calculate_features(inst : VadInstance, data_in : Slice(Int16), data_length : Int32, features : Array(Int16)) : Int16
- .downsampling(signal_in : Slice(Int16), signal_out : Slice(Int16), filter_state : Array(Int32), length : Int32) : Void
- .find_minimum(inst : VadInstance, feature_value : Int16, channel : Int32) : Int16
- .gaussian_probability(input : Int16, mean : Int16, std : Int16, delta : Slice(Int16)) : Int32
-
.gmm_probability(inst : VadInstance, features : Array(Int16), total_power : Int16, frame_length : Int32) : Int16
Calculates the probabilities for both speech and background noise using Gaussian Mixture Models (GMM).
-
.high_pass_filter(data_in : Slice(Int16), data_length : Int32, filter_state : Array(Int16), data_out : Slice(Int16)) : Void
High pass filtering, with a cut-off frequency at 80 Hz, if the |data_in| is sampled at 500 Hz.
-
.init_core(inst : VadInstance) : Int16
Initialize the VAD.
- .kAllPassCoefsQ13 : Array(Int16)
-
.kAllPassCoefsQ15 : Array(Int16)
Allpass filter coefficients, upper and lower, in Q15.
- .kBackEta : Int16
- .kCompVar : Int32
- .kDefaultMode : Int16
- .kGlobalThresholdAGG : Array(Int16)
- .kGlobalThresholdLBR : Array(Int16)
- .kGlobalThresholdQ : Array(Int16)
- .kGlobalThresholdVAG : Array(Int16)
- .kHpPoleCoefs : Array(Int16)
-
.kHpZeroCoefs : Array(Int16)
Coefficients used by HighPassFilter, Q14.
- .kInitCheck : Int32
- .kLocalThresholdAGG : Array(Int16)
- .kLocalThresholdLBR : Array(Int16)
- .kLocalThresholdQ : Array(Int16)
- .kLocalThresholdVAG : Array(Int16)
- .kLog2Exp : Int16
-
.kLogConst : Int16
Constants used in LogOfEnergy().
- .kLogEnergyIntPart : Int16
- .kMaximumNoise : Array(Int16)
- .kMaximumSpeech : Array(Int16)
- .kMaxSpeechFrames : Int16
- .kMinimumDifference : Array(Int16)
- .kMinimumMean : Array(Int16)
- .kMinStd : Int16
- .kNoiseDataMeans : Array(Int16)
- .kNoiseDataStds : Array(Int16)
- .kNoiseDataWeights : Array(Int16)
- .kNoiseUpdateConst : Int16
-
.kOffsetVector : Array(Int16)
Adjustment for division with two in SplitFilter.
-
.kOverHangMax1AGG : Array(Int16)
Mode 2, Aggressive.
-
.kOverHangMax1LBR : Array(Int16)
Mode 1, Low bitrate.
- .kOverHangMax1Q : Array(Int16)
-
.kOverHangMax1VAG : Array(Int16)
Mode 3, Very aggressive.
- .kOverHangMax2AGG : Array(Int16)
- .kOverHangMax2LBR : Array(Int16)
- .kOverHangMax2Q : Array(Int16)
- .kOverHangMax2VAG : Array(Int16)
- .kSmoothingDown : Int16
- .kSmoothingUp : Int16
- .kSpectrumWeight : Array(Int16)
- .kSpeechDataMeans : Array(Int16)
- .kSpeechDataStds : Array(Int16)
- .kSpeechDataWeights : Array(Int16)
- .kSpeechUpdateConst : Int16
-
.log_of_energy(data_in : Slice(Int16), data_length : Int32, offset : Int16, total_energy : Pointer(Int16), log_energy : Array(Int16), log_energy_index : Int32) : Void
Calculates the energy of |data_in| in dB, and also updates an overall |total_energy| if necessary.
-
.overflowing_muls16_by_s32_to_s32(a : Int16, b : Int32) : Int32
An s16 x s32 -> s32 multiplication that's allowed to overflow.
-
.set_mode_core(inst : VadInstance, mode : Int32) : Int32
Sets the aggressiveness mode.
- .split_filter(data_in : Slice(Int16), data_length : Int32, upper_state : Pointer(Int16), lower_state : Pointer(Int16), hp_data_out : Array(Int16), lp_data_out : Array(Int16)) : Void
-
.weighted_average(data : Slice(Int16), offset : Int16, weights : Array(Int16)) : Int32
Calculates the weighted average w.r.t. number of Gaussians.
Class Method Detail
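.all_pass_filter(data_in : Slice(Int16), data_length : Int32, filter_coefficient : Int16, filter_state : Pointer(Int16), data_out : Slice(Int16)) : Void
All pass filtering of |data_in|, used before splitting the signal into two frequency bands (low pass vs high pass). Note that |data_in| and |data_out| must NOT refer to the same memory.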
- data_in [i] : Input audio signal given in Q0.
- data_length [i] : Length of input and output data.
- filter_coefficient [i] : Given in Q15.
- filter_state [i/o] : State of the filter given in Q(-1).
- data_out [o] : Output audio signal given in Q(-1).
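As a rough illustration of the recursion behind an all-pass stage, the following floating-point sketch may help. It ignores the Q-format scaling listed above, and the name `all_pass_sketch` and its `Float64` types are assumptions chosen for readability, not part of this module.

```crystal
# Hypothetical floating-point sketch of a first-order all-pass section,
# H(z) = (c + z^-1) / (1 + c * z^-1). Not the module's fixed-point code.
def all_pass_sketch(data_in : Array(Float64), coefficient : Float64,
                    state : Float64, data_out : Array(Float64)) : Float64
  data_in.each_with_index do |x, i|
    y = coefficient * x + state # output uses the stored state
    data_out[i] = y
    state = x - coefficient * y # state carried to the next sample
  end
  state # the caller keeps this for the next frame
end

input = [1.0, 0.0, 0.0]
output = Array(Float64).new(input.size, 0.0) # separate buffer from the input
final_state = all_pass_sketch(input, 0.5, 0.0, output)
puts output # impulse response: [0.5, 0.75, -0.375]
```

The sketch writes to a separate output buffer and returns the updated state, mirroring the [i/o] role of |filter_state|.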
.gmm_probability(inst : VadInstance, features : Array(Int16), total_power : Int16, frame_length : Int32) : Int16
Calculates the probabilities for both speech and background noise using Gaussian Mixture Models (GMM). A hypothesis test is performed to decide which type of signal is most probable.
- inst [i/o] : VAD instance.
- features [i] : Feature vector of length |kNumChannels| = log10(energy in frequency band).
- total_power [i] : Total power in audio frame.
- frame_length [i] : Number of input samples.
- returns : The VAD decision (0 - noise, 1 - speech).
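The hypothesis test described above can be pictured as a likelihood-ratio comparison between two mixtures. The sketch below works on a single feature in floating point; every name, mixture parameter and the threshold are illustrative assumptions and do not reflect the fixed-point tables this module uses.

```crystal
# Hypothetical floating-point sketch of a two-hypothesis GMM test.
# All names, parameters and the threshold are illustrative assumptions.
def gaussian_pdf(x : Float64, mean : Float64, std : Float64) : Float64
  z = (x - mean) / std
  Math.exp(-0.5 * z * z) / (std * Math.sqrt(2.0 * Math::PI))
end

def mixture_likelihood(x : Float64, weights : Array(Float64),
                       means : Array(Float64), stds : Array(Float64)) : Float64
  total = 0.0
  weights.each_with_index do |w, i|
    total += w * gaussian_pdf(x, means[i], stds[i])
  end
  total
end

# Returns 1 (speech) if the log-likelihood ratio exceeds the threshold, else 0.
def vad_decision_sketch(x : Float64, threshold : Float64) : Int32
  noise = mixture_likelihood(x, [0.5, 0.5], [0.0, 1.0], [1.0, 1.0])
  speech = mixture_likelihood(x, [0.5, 0.5], [5.0, 6.0], [1.0, 1.0])
  Math.log2(speech / noise) > threshold ? 1 : 0
end

puts vad_decision_sketch(5.5, 0.0) # 1: feature lies near the speech means
puts vad_decision_sketch(0.5, 0.0) # 0: feature lies near the noise means
```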
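.high_pass_filter(data_in : Slice(Int16), data_length : Int32, filter_state : Array(Int16), data_out : Slice(Int16)) : Void
High pass filtering, with a cut-off frequency at 80 Hz, if the |data_in| is sampled at 500 Hz.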
- data_in [i] : Input audio data sampled at 500 Hz.
- data_length [i] : Length of input and output data.
- filter_state [i/o] : State of the filter.
- data_out [o] : Output audio data in the frequency interval 80 - 250 Hz.
.init_core(inst : VadInstance) : Int16
Initializes the VAD and sets the aggressiveness mode to its default value.
.kAllPassCoefsQ15 : Array(Int16)
Allpass filter coefficients, upper and lower, in Q15. Upper: 0.64, Lower: 0.17
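In Q15, a real coefficient c is stored as the integer nearest to c * 2^15. The helper `to_q15` below is a hypothetical name used only to show that arithmetic for the two values quoted above; the authoritative entries are the ones in the source file.

```crystal
# Hypothetical helper: convert a real coefficient to Q15
# by scaling with 2^15 = 32768 and rounding to the nearest integer.
def to_q15(x : Float64) : Int16
  (x * 32768).round.to_i16
end

puts to_q15(0.64) # 20972
puts to_q15(0.17) # 5571
```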
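.log_of_energy(data_in : Slice(Int16), data_length : Int32, offset : Int16, total_energy : Pointer(Int16), log_energy : Array(Int16), log_energy_index : Int32) : Void
Calculates the energy of |data_in| in dB, and also updates an overall |total_energy| if necessary.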
- data_in [i] : Input audio data for energy calculation.
- data_length [i] : Length of input data.
- offset [i] : Offset value added to |log_energy|.
- total_energy [i/o] : An external energy updated with the energy of |data_in|. NOTE: |total_energy| is only updated if |total_energy| <= |kMinEnergy|.
- log_energy [o] : 10 * log10("energy of |data_in|") given in Q4.
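To make the Q4 output format concrete, here is a floating-point reference sketch of 10 * log10(energy) scaled by 2^4. The name `log_energy_q4_sketch` and the handling of an all-zero frame are assumptions; the module's routine works in fixed point and also maintains |total_energy|, which this sketch omits.

```crystal
# Hypothetical floating-point reference for a Q4 log-energy value.
def log_energy_q4_sketch(data_in : Slice(Int16), offset : Int16) : Int32
  energy = 0_i64
  data_in.each { |s| energy += s.to_i64 * s.to_i64 } # sum of squares
  return offset.to_i32 if energy == 0                # assumption: avoid log10(0)
  db = 10.0 * Math.log10(energy.to_f64)
  (db * 16).round.to_i + offset                      # Q4 means scaled by 2^4
end

frame = Slice(Int16).new(100, 1000_i16) # 100 samples of value 1000, energy 10^8
puts log_energy_q4_sketch(frame, 0_i16) # 80 dB * 16 = 1280
```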
.overflowing_muls16_by_s32_to_s32(a : Int16, b : Int32) : Int32
An s16 x s32 -> s32 multiplication that's allowed to overflow. (It's still undefined behavior, so not a good idea; this just makes UBSan ignore the violation, so that our old code can continue to do what it's always been doing.)
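In Crystal, the ordinary `*` operator raises `OverflowError` on overflow, while the wrapping operator `&*` discards the high bits. The sketch below shows one way such a multiplication could be written; `wrapping_mul_sketch` is a hypothetical name and may differ from what this module does internally.

```crystal
# Hypothetical sketch: widen the 16-bit operand first, then multiply
# with the wrapping operator so the result wraps instead of raising.
def wrapping_mul_sketch(a : Int16, b : Int32) : Int32
  a.to_i32 &* b
end

puts wrapping_mul_sketch(2_i16, 3)          # 6
puts wrapping_mul_sketch(2_i16, Int32::MAX) # -2, since 2^32 - 2 wraps around
```

Widening with `to_i32` matters here: applying `&*` directly to an `Int16` receiver would yield an `Int16` result.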
.weighted_average(data : Slice(Int16), offset : Int16, weights : Array(Int16)) : Int32
Calculates the weighted average w.r.t. number of Gaussians. The |data| are updated with an |offset| before averaging.
- data [i/o] : Data to average.
- offset [i] : An offset added to |data|.
- weights [i] : Weights used for averaging.
- returns : The weighted average.
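The update-then-accumulate behaviour described above might look roughly like the sketch below. It assumes |data| and |weights| are contiguous and of equal length, which may not match how the module indexes its per-channel tables, and `weighted_average_sketch` is a hypothetical name.

```crystal
# Hypothetical sketch: add the offset to each element in place,
# then accumulate element * weight in 32 bits.
def weighted_average_sketch(data : Slice(Int16), offset : Int16,
                            weights : Array(Int16)) : Int32
  sum = 0
  data.size.times do |k|
    data[k] += offset                  # |data| is updated, hence [i/o]
    sum += data[k].to_i32 * weights[k] # widen before multiplying
  end
  sum
end

data = Slice[10_i16, 20_i16]
puts weighted_average_sketch(data, 1_i16, [2_i16, 3_i16]) # 11 * 2 + 21 * 3 = 85
puts data[0] # 11, the offset has been applied in place
```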