module WebrtcAudio::Vad

Defined in:

vad/core.cr
vad/filterbank.cr
vad/gmm.cr
vad/sp.cr

Class Method Detail

def self.all_pass_filter(data_in : Slice(Int16), data_length : Int32, filter_coefficient : Int16, filter_state : Pointer(Int16), data_out : Slice(Int16)) : Void

All-pass filtering of |data_in|, used before splitting the signal into two frequency bands (low-pass and high-pass). Note that |data_in| and |data_out| must not refer to the same memory.

  • data_in [i] : Input audio signal given in Q0.
  • data_length [i] : Length of input and output data.
  • filter_coefficient [i] : Given in Q15.
  • filter_state [i/o] : State of the filter given in Q(-1).
  • data_out [o] : Output audio signal given in Q(-1).
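
As a rough floating-point illustration of the recursion, the sketch below implements a first-order all-pass section with a single state value: the output is c * x[n] plus the stored state, and the new state is x[n] - c * y[n]. The name `all_pass_sketch` is hypothetical; it omits the Q15/Q(-1) fixed-point scaling and any sample decimation that the real `all_pass_filter` may perform.

```crystal
# Hypothetical floating-point sketch of a first-order all-pass section:
#   y[n] = c * x[n] + s,  then  s = x[n] - c * y[n]
# which gives H(z) = (c + z^-1) / (1 + c * z^-1).
# Fixed-point scaling (Q15 coefficient, Q(-1) state/output) is omitted.
def all_pass_sketch(data_in : Array(Float64), coefficient : Float64,
                    state : Float64 = 0.0) : {Array(Float64), Float64}
  data_out = Array(Float64).new(data_in.size) # preallocate capacity
  data_in.each do |x|
    y = coefficient * x + state # output uses the previous state
    data_out << y
    state = x - coefficient * y # state carried to the next sample
  end
  {data_out, state} # output samples and final filter state
end

# Impulse response for the documented upper coefficient (0.64).
response = all_pass_sketch([1.0, 0.0, 0.0, 0.0], 0.64)[0]
puts response
```

With c = 0.64 the impulse response starts 0.64, 0.5904, -0.377856, so energy is redistributed in time rather than removed, which is the defining property of an all-pass filter.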

def self.calc_vad_16khz(inst : VadInstance, speech_frame : Slice(Int16), frame_length : Int32) : Int32

def self.calc_vad_32khz(inst : VadInstance, speech_frame : Slice(Int16), frame_length : Int32) : Int32

def self.calc_vad_48khz(inst : VadInstance, speech_frame : Slice(Int16), frame_length : Int32) : Int32

def self.calc_vad_8khz(inst : VadInstance, speech_frame : Slice(Int16), frame_length : Int32) : Int32

def self.calculate_features(inst : VadInstance, data_in : Slice(Int16), data_length : Int32, features : Array(Int16)) : Int16

def self.downsampling(signal_in : Slice(Int16), signal_out : Slice(Int16), filter_state : Array(Int32), length : Int32) : Void

def self.find_minimum(inst : VadInstance, feature_value : Int16, channel : Int32) : Int16

def self.gaussian_probability(input : Int16, mean : Int16, std : Int16, delta : Slice(Int16)) : Int32

def self.gmm_probability(inst : VadInstance, features : Array(Int16), total_power : Int16, frame_length : Int32) : Int16

Calculates the probabilities of both speech and background noise using Gaussian Mixture Models (GMMs). A hypothesis test is then performed to decide which type of signal is more probable.

  • inst [i/o] : VAD instance.
  • features [i] : Feature vector of length |kNumChannels|, where each entry is log10(energy in a frequency band).
  • total_power [i] : Total power in the audio frame.
  • frame_length [i] : Number of input samples.
  • returns : The VAD decision (0 - noise, 1 - speech).
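
The hypothesis test can be illustrated in floating point: evaluate a feature's likelihood under a speech GMM and under a noise GMM, then compare the log-likelihood ratio with a threshold. Everything in this sketch (`normal_pdf`, `Gmm`, `gmm_decision`, the model parameters and the threshold) is hypothetical; the real `gmm_probability` works in fixed point across |kNumChannels| bands with its own tables (|kSpeechDataMeans|, |kNoiseDataMeans|, ...) and thresholds.

```crystal
# Hypothetical floating-point illustration of a GMM hypothesis test.
# It is NOT the fixed-point, multi-channel logic of gmm_probability.

# Density of the normal distribution N(mean, std^2) evaluated at x.
def normal_pdf(x : Float64, mean : Float64, std : Float64) : Float64
  z = (x - mean) / std
  Math.exp(-0.5 * z * z) / (std * Math.sqrt(2.0 * Math::PI))
end

# A one-dimensional Gaussian mixture model (weights should sum to 1).
record Gmm, weights : Array(Float64), means : Array(Float64), stds : Array(Float64) do
  def likelihood(x : Float64) : Float64
    total = 0.0
    weights.each_with_index { |w, k| total += w * normal_pdf(x, means[k], stds[k]) }
    total
  end
end

# Returns 1 (speech) if the log-likelihood ratio exceeds the threshold, else 0 (noise).
def gmm_decision(feature : Float64, speech : Gmm, noise : Gmm, threshold : Float64) : Int32
  llr = Math.log(speech.likelihood(feature)) - Math.log(noise.likelihood(feature))
  llr > threshold ? 1 : 0
end

speech = Gmm.new([0.5, 0.5], [6.0, 8.0], [1.0, 1.0])
noise = Gmm.new([1.0], [2.0], [1.0])
puts gmm_decision(7.0, speech, noise, 0.0) # => 1
puts gmm_decision(2.0, speech, noise, 0.0) # => 0
```

A feature near the speech means yields a positive log-likelihood ratio and a speech decision; a feature near the noise mean yields a negative ratio and a noise decision.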


def self.high_pass_filter(data_in : Slice(Int16), data_length : Int32, filter_state : Array(Int16), data_out : Slice(Int16)) : Void

High-pass filtering with a cut-off frequency of 80 Hz, assuming |data_in| is sampled at 500 Hz.

  • data_in [i] : Input audio data sampled at 500 Hz.
  • data_length [i] : Length of input and output data.
  • filter_state [i/o] : State of the filter.
  • data_out [o] : Output audio data in the frequency interval 80 - 250 Hz.
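
A minimal floating-point sketch of a second-order (biquad) IIR filter in direct form I, assuming the zero and pole coefficient arrays each hold three taps with the leading pole coefficient equal to 1. The name `biquad_sketch` is hypothetical; the real `high_pass_filter` works in fixed point (Q14 coefficients from |kHpZeroCoefs| and |kHpPoleCoefs|), and its exact state layout is not documented in this section.

```crystal
# Hypothetical direct-form I biquad (poles[0] is assumed to be 1):
#   y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
def biquad_sketch(data_in : Array(Float64), zeros : Array(Float64), poles : Array(Float64)) : Array(Float64)
  x1 = x2 = 0.0 # previous two inputs
  y1 = y2 = 0.0 # previous two outputs
  data_in.map do |x|
    y = zeros[0] * x + zeros[1] * x1 + zeros[2] * x2 - poles[1] * y1 - poles[2] * y2
    x2, x1 = x1, x # shift the input history
    y2, y1 = y1, y # shift the output history
    y
  end
end

# A double zero at z = 1 (b = [1, -2, 1]) blocks DC: a constant input settles to 0.
puts biquad_sketch([1.0] * 5, [1.0, -2.0, 1.0], [1.0, 0.0, 0.0]) # => [1.0, -1.0, 0.0, 0.0, 0.0]
```

The example uses no poles so the DC-blocking effect of the zeros is easy to verify by hand; a real high-pass design would also place poles to shape the transition around the cut-off.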

def self.init_core(inst : VadInstance) : Int16

Initializes the VAD and sets the aggressiveness mode to its default value (|kDefaultMode|).


def self.kAllPassCoefsQ13 : Array(Int16)

def self.kAllPassCoefsQ15 : Array(Int16)

All-pass filter coefficients, upper and lower, in Q15. Upper: 0.64, lower: 0.17.
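
A Qn fixed-point number stores round(value * 2^n) in an integer, so the Q15 values can be checked by hand. The helpers `to_q` and `from_q` below are hypothetical; the integers they produce for 0.64 and 0.17 are what one would expect |kAllPassCoefsQ15| to hold, but this section does not list the array contents.

```crystal
# Hypothetical helpers for Qn fixed-point conversion.
# A Qn value stores round(x * 2^n) in an integer.
def to_q(x : Float64, n : Int32) : Int16
  (x * (1 << n)).round.to_i16 # raises OverflowError if outside the Int16 range
end

def from_q(q : Int16, n : Int32) : Float64
  q.to_f64 / (1 << n)
end

upper = to_q(0.64, 15) # 0.64 * 32768 = 20971.52, rounds to 20972
lower = to_q(0.17, 15) # 0.17 * 32768 = 5570.56, rounds to 5571
puts upper, lower
puts from_q(upper, 15) # close to, but not exactly, 0.64
```

Note that 1.0 itself is not representable in Q15 with a 16-bit integer (32768 exceeds Int16::MAX), which is one reason coefficients near 1 are often stored in a lower Q format such as Q14.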


def self.kBackEta : Int16

def self.kCompVar : Int32

def self.kDefaultMode : Int16

def self.kGlobalThresholdAGG : Array(Int16)

def self.kGlobalThresholdLBR : Array(Int16)

def self.kGlobalThresholdQ : Array(Int16)

def self.kGlobalThresholdVAG : Array(Int16)

def self.kHpPoleCoefs : Array(Int16)

def self.kHpZeroCoefs : Array(Int16)

Coefficients used by |high_pass_filter|, in Q14.


def self.kInitCheck : Int32

def self.kLocalThresholdAGG : Array(Int16)

def self.kLocalThresholdLBR : Array(Int16)

def self.kLocalThresholdQ : Array(Int16)

def self.kLocalThresholdVAG : Array(Int16)

def self.kLog2Exp : Int16

def self.kLogConst : Int16

Constants used in |log_of_energy|.


def self.kLogEnergyIntPart : Int16

def self.kMaximumNoise : Array(Int16)

def self.kMaximumSpeech : Array(Int16)

def self.kMaxSpeechFrames : Int16

def self.kMinimumDifference : Array(Int16)

def self.kMinimumMean : Array(Int16)

def self.kMinStd : Int16

def self.kNoiseDataMeans : Array(Int16)

def self.kNoiseDataStds : Array(Int16)

def self.kNoiseDataWeights : Array(Int16)

def self.kNoiseUpdateConst : Int16

def self.kOffsetVector : Array(Int16)

Adjustment for the division by two in |split_filter|.


def self.kOverHangMax1AGG : Array(Int16)

Mode 2, Aggressive.


def self.kOverHangMax1LBR : Array(Int16)

Mode 1, Low bitrate.


def self.kOverHangMax1Q : Array(Int16)

def self.kOverHangMax1VAG : Array(Int16)

Mode 3, Very aggressive.


def self.kOverHangMax2AGG : Array(Int16)

def self.kOverHangMax2LBR : Array(Int16)

def self.kOverHangMax2Q : Array(Int16)

def self.kOverHangMax2VAG : Array(Int16)

def self.kSmoothingDown : Int16

def self.kSmoothingUp : Int16

def self.kSpectrumWeight : Array(Int16)

def self.kSpeechDataMeans : Array(Int16)

def self.kSpeechDataStds : Array(Int16)

def self.kSpeechDataWeights : Array(Int16)

def self.kSpeechUpdateConst : Int16

def self.log_of_energy(data_in : Slice(Int16), data_length : Int32, offset : Int16, total_energy : Pointer(Int16), log_energy : Array(Int16), log_energy_index : Int32) : Void

Calculates the energy of |data_in| in dB and, if necessary, updates an overall |total_energy|.

  • data_in [i] : Input audio data for energy calculation.
  • data_length [i] : Length of input data.
  • offset [i] : Offset value added to |log_energy|.
  • total_energy [i/o] : An external energy updated with the energy of |data_in|. NOTE: |total_energy| is only updated if |total_energy| <= |kMinEnergy|.
  • log_energy [o] : 10 * log10("energy of |data_in|") given in Q4.
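
The quantity written to |log_energy| can be approximated in floating point: sum the squared samples, take 10 * log10 of the sum, multiply by 16 for Q4, and add |offset|. The helper `log_energy_q4_sketch` is hypothetical; the real `log_of_energy` uses a fixed-point approximation (see constants such as |kLogConst|), so its output may differ slightly, and this sketch models neither the |total_energy| update nor |log_energy_index|. Returning |offset| for an all-zero frame is an assumption of the sketch.

```crystal
# Hypothetical floating-point approximation of 10 * log10(energy) in Q4.
def log_energy_q4_sketch(data_in : Slice(Int16), offset : Int16) : Int32
  # Energy = sum of squared samples; Int64 avoids overflow for long frames.
  energy = data_in.sum(0_i64) { |s| s.to_i64 * s }
  return offset.to_i32 if energy == 0 # assumption: a silent frame yields only the offset
  db = 10.0 * Math.log10(energy.to_f64)
  (db * 16).round.to_i32 + offset # Q4: 1 dB corresponds to 16
end

frame = Slice(Int16).new(4, 100_i16)    # four samples of amplitude 100, energy 40000
puts log_energy_q4_sketch(frame, 0_i16) # => 736 (about 46.02 dB in Q4)
```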

def self.overflowing_muls16_by_s32_to_s32(a : Int16, b : Int32) : Int32

An s16 × s32 → s32 multiplication that is allowed to overflow. (In the original C implementation this is still undefined behavior, so it is not a good idea; the helper only makes UBSan ignore the violation so that legacy code keeps doing what it has always done.)
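
In Crystal, ordinary integer arithmetic raises `OverflowError` on overflow, while the `&*` operator wraps around in two's complement. A minimal sketch of the same s16 × s32 → s32 multiplication with wrapping semantics follows; `wrapping_mul_s16_s32` is a hypothetical name, and this section does not show how the port itself implements `overflowing_muls16_by_s32_to_s32`.

```crystal
# Hypothetical sketch: s16 x s32 -> s32 multiplication that wraps on overflow.
def wrapping_mul_s16_s32(a : Int16, b : Int32) : Int32
  a.to_i32 &* b # widen first so the result type is Int32, then wrap
end

puts wrapping_mul_s16_s32(3_i16, 4)             # => 12
puts wrapping_mul_s16_s32(2_i16, 1_073_741_824) # => -2147483648 (2^31 wrapped)

a = 2_i16
b = 1_073_741_824
begin
  a.to_i32 * b # ordinary (checked) multiplication
rescue OverflowError
  puts "checked * raised OverflowError"
end
```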


def self.set_mode_core(inst : VadInstance, mode : Int32) : Int32

Sets the aggressiveness mode.
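
Judging from the constant names and their comments (LBR = mode 1, AGG = mode 2, VAG = mode 3), the mode presumably selects which family of overhang and threshold tables is used. The sketch below only shows that mapping; `mode_table_suffix` is hypothetical, treating the Q tables as mode 0 is an assumption (their comment is not shown in this section), and so is returning `nil` for an out-of-range mode.

```crystal
# Hypothetical mapping from an aggressiveness mode to the suffix of the
# constant tables (kOverHangMax1*, kLocalThreshold*, kGlobalThreshold*, ...).
def mode_table_suffix(mode : Int32) : String?
  case mode
  when 0 then "Q"   # assumption: quality mode
  when 1 then "LBR" # low bitrate
  when 2 then "AGG" # aggressive
  when 3 then "VAG" # very aggressive
  else nil          # not a valid mode in this sketch
  end
end

(0..4).each { |m| puts "mode #{m}: #{mode_table_suffix(m).inspect}" }
```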


def self.split_filter(data_in : Slice(Int16), data_length : Int32, upper_state : Pointer(Int16), lower_state : Pointer(Int16), hp_data_out : Array(Int16), lp_data_out : Array(Int16)) : Void

def self.weighted_average(data : Slice(Int16), offset : Int16, weights : Array(Int16)) : Int32

Calculates the weighted average over the Gaussian components. |offset| is added to each element of |data| (in place) before averaging.

  • data [i/o] : Data to average.
  • offset [i] : An offset added to |data|.
  • weights [i] : Weights used for averaging.
  • returns : The weighted average.
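
A minimal sketch of the operation as described: add |offset| to each entry of |data| in place, then accumulate the products with |weights|. `weighted_average_sketch` is hypothetical; it assumes one entry per Gaussian stored contiguously and no normalization by the total weight, either of which may differ from the real implementation (for example, if the per-Gaussian entries are interleaved with a channel stride).

```crystal
# Hypothetical sketch: offsets |data| in place, then returns sum(data[k] * weights[k]).
def weighted_average_sketch(data : Slice(Int16), offset : Int16, weights : Array(Int16)) : Int32
  total = 0
  data.size.times do |k|
    data[k] = data[k] &+ offset                 # update the data in place (wraps on overflow)
    total += data[k].to_i32 * weights[k].to_i32 # accumulate in 32 bits
  end
  total
end

data = Slice(Int16).new(2) { |i| (i + 1).to_i16 }         # [1, 2]
puts weighted_average_sketch(data, 1_i16, [3_i16, 4_i16]) # => 18
puts data.to_a                                            # => [2, 3]
```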

