module WebrtcAudio::Vad

Defined in:

vad/core.cr
vad/filterbank.cr
vad/gmm.cr
vad/sp.cr

Class Method Summary

.all_pass_filter(data_in : Slice(Int16), data_length : Int32, filter_coefficient : Int16, filter_state : Pointer(Int16), data_out : Slice(Int16)) : Void
All pass filtering of |data_in|, used before splitting the signal into two frequency bands (low pass vs high pass).
.calc_vad_16khz(inst : VadInstance, speech_frame : Slice(Int16), frame_length : Int32) : Int32
.calc_vad_32khz(inst : VadInstance, speech_frame : Slice(Int16), frame_length : Int32) : Int32
.calc_vad_48khz(inst : VadInstance, speech_frame : Slice(Int16), frame_length : Int32) : Int32
.calc_vad_8khz(inst : VadInstance, speech_frame : Slice(Int16), frame_length : Int32) : Int32
.calculate_features(inst : VadInstance, data_in : Slice(Int16), data_length : Int32, features : Array(Int16)) : Int16
.downsampling(signal_in : Slice(Int16), signal_out : Slice(Int16), filter_state : Array(Int32), length : Int32) : Void
.find_minimum(inst : VadInstance, feature_value : Int16, channel : Int32) : Int16
.gaussian_probability(input : Int16, mean : Int16, std : Int16, delta : Slice(Int16)) : Int32
.gmm_probability(inst : VadInstance, features : Array(Int16), total_power : Int16, frame_length : Int32) : Int16
Calculates the probabilities for both speech and background noise using Gaussian Mixture Models (GMM).
.high_pass_filter(data_in : Slice(Int16), data_length : Int32, filter_state : Array(Int16), data_out : Slice(Int16)) : Void
High pass filtering, with a cut-off frequency at 80 Hz, if the |data_in| is sampled at 500 Hz.
.init_core(inst : VadInstance) : Int16
Initialize the VAD.
.kAllPassCoefsQ13 : Array(Int16)
.kAllPassCoefsQ15 : Array(Int16)
Allpass filter coefficients, upper and lower, in Q15.
.kBackEta : Int16
.kCompVar : Int32
.kDefaultMode : Int16
.kGlobalThresholdAGG : Array(Int16)
.kGlobalThresholdLBR : Array(Int16)
.kGlobalThresholdQ : Array(Int16)
.kGlobalThresholdVAG : Array(Int16)
.kHpPoleCoefs : Array(Int16)
.kHpZeroCoefs : Array(Int16)
Coefficients used by HighPassFilter, Q14.
.kInitCheck : Int32
.kLocalThresholdAGG : Array(Int16)
.kLocalThresholdLBR : Array(Int16)
.kLocalThresholdQ : Array(Int16)
.kLocalThresholdVAG : Array(Int16)
.kLog2Exp : Int16
.kLogConst : Int16
Constants used in LogOfEnergy().
.kLogEnergyIntPart : Int16
.kMaximumNoise : Array(Int16)
.kMaximumSpeech : Array(Int16)
.kMaxSpeechFrames : Int16
.kMinimumDifference : Array(Int16)
.kMinimumMean : Array(Int16)
.kMinStd : Int16
.kNoiseDataMeans : Array(Int16)
.kNoiseDataStds : Array(Int16)
.kNoiseDataWeights : Array(Int16)
.kNoiseUpdateConst : Int16
.kOffsetVector : Array(Int16)
Adjustment for division with two in SplitFilter.
.kOverHangMax1AGG : Array(Int16)
Mode 2, Aggressive.
.kOverHangMax1LBR : Array(Int16)
Mode 1, Low bitrate.
.kOverHangMax1Q : Array(Int16)
.kOverHangMax1VAG : Array(Int16)
Mode 3, Very aggressive.
.kOverHangMax2AGG : Array(Int16)
.kOverHangMax2LBR : Array(Int16)
.kOverHangMax2Q : Array(Int16)
.kOverHangMax2VAG : Array(Int16)
.kSmoothingDown : Int16
.kSmoothingUp : Int16
.kSpectrumWeight : Array(Int16)
.kSpeechDataMeans : Array(Int16)
.kSpeechDataStds : Array(Int16)
.kSpeechDataWeights : Array(Int16)
.kSpeechUpdateConst : Int16
.log_of_energy(data_in : Slice(Int16), data_length : Int32, offset : Int16, total_energy : Pointer(Int16), log_energy : Array(Int16), log_energy_index : Int32) : Void
Calculates the energy of |data_in| in dB, and also updates an overall |total_energy| if necessary.
.overflowing_muls16_by_s32_to_s32(a : Int16, b : Int32) : Int32
An s16 x s32 -> s32 multiplication that's allowed to overflow.
.set_mode_core(inst : VadInstance, mode : Int32) : Int32
Set aggressiveness mode.
.split_filter(data_in : Slice(Int16), data_length : Int32, upper_state : Pointer(Int16), lower_state : Pointer(Int16), hp_data_out : Array(Int16), lp_data_out : Array(Int16)) : Void
.weighted_average(data : Slice(Int16), offset : Int16, weights : Array(Int16)) : Int32
Calculates the weighted average w.r.t. number of Gaussians.

Class Method Detail

def self.all_pass_filter(data_in : Slice(Int16), data_length : Int32, filter_coefficient : Int16, filter_state : Pointer(Int16), data_out : Slice(Int16)) : Void #

data_in [i] : Input audio signal given in Q0.
data_length [i] : Length of input and output data.
filter_coefficient [i] : Given in Q15.
filter_state [i/o] : State of the filter given in Q(-1).
data_out [o] : Output audio signal given in Q(-1).

[View source]

def self.calc_vad_16khz(inst : VadInstance, speech_frame : Slice(Int16), frame_length : Int32) : Int32 #

[View source]

def self.calc_vad_32khz(inst : VadInstance, speech_frame : Slice(Int16), frame_length : Int32) : Int32 #

[View source]

def self.calc_vad_48khz(inst : VadInstance, speech_frame : Slice(Int16), frame_length : Int32) : Int32 #

[View source]

def self.calc_vad_8khz(inst : VadInstance, speech_frame : Slice(Int16), frame_length : Int32) : Int32 #

[View source]

def self.calculate_features(inst : VadInstance, data_in : Slice(Int16), data_length : Int32, features : Array(Int16)) : Int16 #

[View source]

def self.downsampling(signal_in : Slice(Int16), signal_out : Slice(Int16), filter_state : Array(Int32), length : Int32) : Void #

[View source]

def self.find_minimum(inst : VadInstance, feature_value : Int16, channel : Int32) : Int16 #

[View source]

def self.gaussian_probability(input : Int16, mean : Int16, std : Int16, delta : Slice(Int16)) : Int32 #

[View source]

def self.gmm_probability(inst : VadInstance, features : Array(Int16), total_power : Int16, frame_length : Int32) : Int16 #

Calculates the probabilities for both speech and background noise using Gaussian Mixture Models (GMM). A hypothesis-test is performed to decide which type of signal is most probable.

inst [i/o] : VAD instance.
features [i] : Feature vector of length |kNumChannels| = log10(energy in frequency band)
total_power [i] : Total power in audio frame.
frame_length [i] : Number of input samples
returns : the VAD decision (0 - noise, 1 - speech).
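The hypothesis test described above can be illustrated with a much simplified sketch. It uses floating point, a single Gaussian per class and one feature, whereas the module works in fixed point with mixtures over several channels. The names `gaussian_pdf` and `decide` are hypothetical and not part of WebrtcAudio::Vad.

```crystal
# Illustrative only: a floating-point, single-Gaussian likelihood-ratio test.
# All names below are hypothetical; they are not part of WebrtcAudio::Vad.

# Probability density of a normal distribution evaluated at x.
def gaussian_pdf(x : Float64, mean : Float64, std : Float64) : Float64
  z = (x - mean) / std
  Math.exp(-0.5 * z * z) / (std * Math.sqrt(2.0 * Math::PI))
end

# Returns 1 (speech) when the log-likelihood ratio exceeds the threshold,
# otherwise 0 (noise). Each model is a {mean, std} tuple.
def decide(x : Float64,
           noise : Tuple(Float64, Float64),
           speech : Tuple(Float64, Float64),
           threshold : Float64 = 0.0) : Int32
  log_speech = Math.log(gaussian_pdf(x, speech[0], speech[1]))
  log_noise = Math.log(gaussian_pdf(x, noise[0], noise[1]))
  (log_speech - log_noise) > threshold ? 1 : 0
end

puts decide(9.5, {0.0, 1.0}, {10.0, 1.0}) # 1, closer to the speech model
puts decide(0.5, {0.0, 1.0}, {10.0, 1.0}) # 0, closer to the noise model
```

Raising `threshold` makes the sketch less willing to report speech, which is roughly the role the per-mode threshold constants appear to play.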

[View source]

def self.high_pass_filter(data_in : Slice(Int16), data_length : Int32, filter_state : Array(Int16), data_out : Slice(Int16)) : Void #

High pass filtering, with a cut-off frequency at 80 Hz, if the |data_in| is sampled at 500 Hz.

data_in [i] : Input audio data sampled at 500 Hz.
data_length [i] : Length of input and output data.
filter_state [i/o] : State of the filter.
data_out [o] : Output audio data in the frequency interval 80 - 250 Hz.

[View source]

def self.init_core(inst : VadInstance) : Int16 #

Initialize the VAD. Set aggressiveness mode to default value.

[View source]

def self.kAllPassCoefsQ13 : Array(Int16) #

[View source]

def self.kAllPassCoefsQ15 : Array(Int16) #

Allpass filter coefficients, upper and lower, in Q15. Upper: 0.64, Lower: 0.17
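As a minimal sketch of what "in Q15" means for these coefficients: a real value is stored as an Int16 after scaling by 2^15, and a product with a Q0 sample is shifted right by 15 bits to return to Q0. The helpers `to_q15` and `mul_q15` are hypothetical and only illustrate the number format; they are not part of this module.

```crystal
# Hypothetical helpers for illustration; not part of WebrtcAudio::Vad.

# Converts a real value in [-1, 1) to Q15 by scaling with 2^15 and rounding.
def to_q15(value : Float64) : Int16
  (value * 32768).round.to_i16
end

# Multiplies a Q0 sample by a Q15 coefficient and returns a Q0 result.
# The product is widened to Int32 first so it cannot overflow.
def mul_q15(coef : Int16, sample : Int16) : Int32
  (coef.to_i32 * sample.to_i32) >> 15
end

upper = to_q15(0.64)
lower = to_q15(0.17)
puts upper                    # 20972
puts lower                    # 5571
puts mul_q15(upper, 1000_i16) # 640, i.e. about 0.64 * 1000
```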

[View source]

def self.kBackEta : Int16 #

[View source]

def self.kCompVar : Int32 #

[View source]

def self.kDefaultMode : Int16 #

[View source]

def self.kGlobalThresholdAGG : Array(Int16) #

[View source]

def self.kGlobalThresholdLBR : Array(Int16) #

[View source]

def self.kGlobalThresholdQ : Array(Int16) #

[View source]

def self.kGlobalThresholdVAG : Array(Int16) #

[View source]

def self.kHpPoleCoefs : Array(Int16) #

[View source]

def self.kHpZeroCoefs : Array(Int16) #

Coefficients used by HighPassFilter, Q14.

[View source]

def self.kInitCheck : Int32 #

[View source]

def self.kLocalThresholdAGG : Array(Int16) #

[View source]

def self.kLocalThresholdLBR : Array(Int16) #

[View source]

def self.kLocalThresholdQ : Array(Int16) #

[View source]

def self.kLocalThresholdVAG : Array(Int16) #

[View source]

def self.kLog2Exp : Int16 #

[View source]

def self.kLogConst : Int16 #

Constants used in LogOfEnergy().

[View source]

def self.kLogEnergyIntPart : Int16 #

[View source]

def self.kMaximumNoise : Array(Int16) #

[View source]

def self.kMaximumSpeech : Array(Int16) #

[View source]

def self.kMaxSpeechFrames : Int16 #

[View source]

def self.kMinimumDifference : Array(Int16) #

[View source]

def self.kMinimumMean : Array(Int16) #

[View source]

def self.kMinStd : Int16 #

[View source]

def self.kNoiseDataMeans : Array(Int16) #

[View source]

def self.kNoiseDataStds : Array(Int16) #

[View source]

def self.kNoiseDataWeights : Array(Int16) #

[View source]

def self.kNoiseUpdateConst : Int16 #

[View source]

def self.kOffsetVector : Array(Int16) #

Adjustment for division with two in SplitFilter.

[View source]

def self.kOverHangMax1AGG : Array(Int16) #

Mode 2, Aggressive.

[View source]

def self.kOverHangMax1LBR : Array(Int16) #

Mode 1, Low bitrate.

[View source]

def self.kOverHangMax1Q : Array(Int16) #

[View source]

def self.kOverHangMax1VAG : Array(Int16) #

Mode 3, Very aggressive.

[View source]

def self.kOverHangMax2AGG : Array(Int16) #

[View source]

def self.kOverHangMax2LBR : Array(Int16) #

[View source]

def self.kOverHangMax2Q : Array(Int16) #

[View source]

def self.kOverHangMax2VAG : Array(Int16) #

[View source]

def self.kSmoothingDown : Int16 #

[View source]

def self.kSmoothingUp : Int16 #

[View source]

def self.kSpectrumWeight : Array(Int16) #

[View source]

def self.kSpeechDataMeans : Array(Int16) #

[View source]

def self.kSpeechDataStds : Array(Int16) #

[View source]

def self.kSpeechDataWeights : Array(Int16) #

[View source]

def self.kSpeechUpdateConst : Int16 #

[View source]

def self.log_of_energy(data_in : Slice(Int16), data_length : Int32, offset : Int16, total_energy : Pointer(Int16), log_energy : Array(Int16), log_energy_index : Int32) : Void #

Calculates the energy of |data_in| in dB, and also updates an overall |total_energy| if necessary.

data_in [i] : Input audio data for energy calculation.
data_length [i] : Length of input data.
offset [i] : Offset value added to |log_energy|.
total_energy [i/o] : An external energy updated with the energy of |data_in|. NOTE: |total_energy| is only updated if |total_energy| <= |kMinEnergy|.
log_energy [o] : 10 * log10("energy of |data_in|") given in Q4.
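For orientation, the Q4 output format can be reproduced with a floating-point reference. This sketch is an assumption-laden stand-in: it ignores |offset| and |total_energy|, returns 0 for silent input by choice, and does not reproduce the fixed-point approximation the module presumably uses. The name `log_energy_q4_reference` is hypothetical.

```crystal
# Hypothetical floating-point reference; not the module's fixed-point routine.
# Computes 10 * log10(sum of squared samples) and scales it to Q4 (x16).
def log_energy_q4_reference(samples : Array(Int16)) : Int32
  energy = 0_i64
  samples.each { |s| energy += s.to_i64 * s.to_i64 }
  return 0 if energy == 0 # assumption: treat silence as 0 in this sketch
  (10.0 * Math.log10(energy.to_f64) * 16).round.to_i32
end

# A single sample of 100 has energy 10_000, i.e. 40 dB, which is 640 in Q4.
puts log_energy_q4_reference([100_i16]) # 640
```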

[View source]

def self.overflowing_muls16_by_s32_to_s32(a : Int16, b : Int32) : Int32 #

An s16 x s32 -> s32 multiplication that's allowed to overflow. (It's still undefined behavior, so not a good idea; this just makes UBSan ignore the violation, so that our old code can continue to do what it's always been doing.)
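In Crystal, the ordinary `*` operator raises `OverflowError` on integer overflow, while the wrapping operator `&*` keeps only the low bits. A minimal sketch of a wrapping s16 x s32 product follows; whether this module implements its method the same way is an assumption, and `wrapping_mul_s16_by_s32` is a hypothetical name.

```crystal
# Hypothetical illustration of wrapping multiplication; not the module's code.
def wrapping_mul_s16_by_s32(a : Int16, b : Int32) : Int32
  # Widen the Int16 operand, then multiply with the wrapping operator.
  a.to_i32 &* b
end

puts wrapping_mul_s16_by_s32(2_i16, Int32::MAX) # -2 (wrapped around)

# The checked operator raises instead of wrapping.
begin
  big = Int32::MAX
  puts 2 * big
rescue OverflowError
  puts "checked * raised OverflowError"
end
```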

[View source]

def self.set_mode_core(inst : VadInstance, mode : Int32) : Int32 #

Set aggressiveness mode.
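The constant names on this page suggest four modes: Q, LBR (mode 1, low bitrate), AGG (mode 2, aggressive) and VAG (mode 3, very aggressive). A caller might model them with an enum, as in the sketch below. `VadMode` and `parse_mode` are hypothetical, and mapping `Quality` to 0 is an assumption inferred from the `Q` suffix.

```crystal
# Hypothetical enum for illustration; not defined by WebrtcAudio::Vad.
enum VadMode
  Quality        = 0 # assumption: inferred from the "Q" constant suffix
  LowBitrate     = 1
  Aggressive     = 2
  VeryAggressive = 3
end

# Returns the matching mode, or nil when the integer is out of range.
def parse_mode(mode : Int32) : VadMode?
  VadMode.from_value?(mode)
end

puts parse_mode(2)      # Aggressive
puts parse_mode(7).nil? # true
```

The integer value of a parsed mode (`mode.value`) could then be passed as the `mode` argument.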

[View source]

def self.split_filter(data_in : Slice(Int16), data_length : Int32, upper_state : Pointer(Int16), lower_state : Pointer(Int16), hp_data_out : Array(Int16), lp_data_out : Array(Int16)) : Void #

[View source]

def self.weighted_average(data : Slice(Int16), offset : Int16, weights : Array(Int16)) : Int32 #

Calculates the weighted average w.r.t. number of Gaussians. The |data| are updated with an |offset| before averaging.

data [i/o] : Data to average.
offset [i] : An offset added to |data|.
weights [i] : Weights used for averaging.

returns : The weighted average.
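The behaviour described above (offset applied to |data| in place, then a sum of products with |weights|) can be sketched as follows. This version assumes contiguous arrays of equal length and performs no normalization; the module's actual memory layout and scaling may differ. `weighted_average_sketch` is a hypothetical name.

```crystal
# Hypothetical sketch; not the module's implementation.
# Adds `offset` to every element of `data` in place, then returns the sum of
# data[k] * weights[k], accumulated in Int32.
def weighted_average_sketch(data : Array(Int16), offset : Int16, weights : Array(Int16)) : Int32
  sum = 0_i32
  data.each_index do |k|
    data[k] += offset # in-place update, as the [i/o] marker indicates
    sum += data[k].to_i32 * weights[k].to_i32
  end
  sum
end

data = [10_i16, 20_i16]
puts weighted_average_sketch(data, 5_i16, [1_i16, 3_i16]) # 90 = 15*1 + 25*3
puts data                                                 # [15, 25]
```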

[View source]
