2. Introduction

2.1. Overview

The VQE (Voice Quality Enhancement) module includes three sub-functions: AGC (Auto Gain Control), ANR (Audio Noise Reduction), and AEC (Acoustic Echo Cancelling).

It is mainly used on the recording path to improve sound quality for the end user across different product forms and usage scenarios, using a single microphone.

VQE is designed primarily for speech signals.

It therefore supports speech audio sampled at 8 kHz or 16 kHz.

This guide focuses on debugging AEC.

2.1.1. The Function of AEC

As the figure above shows, if the Audio Input data is sent directly to Xiao Wang without being processed by the red VQE module, Xiao Wang in Shenzhen will hear two sounds: “123456789” and “abcdefg”.

“123456789” is Xiao Wang’s own voice returning as an echo, which results in a poor experience for Xiao Wang.

AEC, one of the functions of the red VQE module, filters out the “123456789” sound.

After filtering by VQE, Xiao Wang will only hear “abcdefg” said by Xiao Ming.

The sound content of Audio Input is the content of the “ain_record.pcm” file.
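To make the idea of echo filtering concrete, the sketch below uses a textbook normalized LMS (NLMS) adaptive filter, one common way to cancel echo. It is not the algorithm used inside VQE, whose internals are not described in this guide. The function name `nlms_echo_cancel`, the tap count, the step size, and the 3-tap synthetic echo path are all illustrative assumptions.

```python
import math
import random


def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the echo of `ref` from `mic`.

    mic: samples captured by the microphone (near-end speech plus echo)
    ref: far-end reference samples played through the speaker
    Returns the residual signal after the estimated echo is removed.
    """
    weights = [0.0] * taps   # running estimate of the echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    residual = []
    for m, r in zip(mic, ref):
        history = [r] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        err = m - echo_estimate
        # Normalize the update by the reference power (the "N" in NLMS).
        power = sum(x * x for x in history) + eps
        step = mu * err / power
        weights = [w + step * x for w, x in zip(weights, history)]
        residual.append(err)
    return residual


def rms(samples):
    return math.sqrt(sum(v * v for v in samples) / len(samples))


# Synthetic demo: the mic hears only an echo of the reference (no near-end talker).
random.seed(0)
ref = [random.uniform(-1.0, 1.0) for _ in range(4000)]
echo_path = [0.6, 0.3, 0.1]  # hypothetical speaker-to-mic response
echo = [
    sum(echo_path[k] * ref[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(ref))
]
residual = nlms_echo_cancel(echo, ref)

echo_rms = rms(echo[-500:])
residual_rms = rms(residual[-500:])
```

In this noise-free demo the residual shrinks as the filter converges. A real device also has to cope with near-end speech, noise, and non-linear distortion, which is why the recording requirements in the following section matter.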

2.1.2. Basic Requirements for the Algorithm

Recording requirements:

Only 8 kHz and 16 kHz sampling rates are supported, and playback and recording must use the same parameters.

AGC/ANR only support mono, not stereo.

AEC requires dual-channel recording (the left channel is the near-end sound recorded by the mic, and the right channel is the sound sent from the far end).

The sampling bit depth is 16 bits (enBitwidth = AUDIO_BIT_WIDTH_16).

The recorded left and right channels must not be distorted (for example, by an excessively large waveform, a poor-quality mic or speaker, or interference in the PCB analog circuitry).

The amplitude of the near-end sound recorded by the left channel mic should be larger than that of the sound it picks up from the speaker (far-end sound); otherwise, the algorithm's processing quality will suffer.

The amplitude of the reference signal in the right channel should be larger than that of the far-end sound picked up by the left channel mic; otherwise, the algorithm's processing quality will suffer.
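The format-related requirements above can be checked before any tuning starts. The following is a minimal sketch, not part of the SDK: the function `check_capture_config` and its parameters are hypothetical, it compares only the sampling rate between playback and recording, and it assumes that enabling AEC implies a 2-channel capture while AGC/ANR alone use a mono capture.

```python
def check_capture_config(sample_rate_hz, playback_rate_hz, bit_width, channels, use_aec):
    """Return a list of requirement violations; an empty list means none were found."""
    problems = []
    if sample_rate_hz not in (8000, 16000):
        problems.append("sampling rate must be 8000 or 16000 Hz")
    if playback_rate_hz != sample_rate_hz:
        problems.append("playback and recording sampling rates must match")
    if bit_width != 16:
        problems.append("bit depth must be 16 bits")
    # Assumption: AEC needs left = mic and right = reference, so 2 channels;
    # AGC/ANR without AEC run on a mono capture.
    required_channels = 2 if use_aec else 1
    if channels != required_channels:
        problems.append(f"expected {required_channels} channel(s), got {channels}")
    return problems


ok = check_capture_config(16000, 16000, 16, 2, use_aec=True)
bad = check_capture_config(44100, 16000, 16, 1, use_aec=True)
```

Here `ok` is empty, while `bad` reports the unsupported rate, the rate mismatch, and the missing reference channel.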

Normal waveform (both the mic channel and the reference channel have moderate amplitude, with no distortion and no background noise interference):

The waveform below is unacceptable:

Adjustment method:

Reduce the gain of the ADCR channel. For QFN packages, a setting of 1 is recommended. For BGA packages, adjust it as needed.

If distortion persists after the first step, reduce the gain of the Audio Output channel.

Note: The amplitude of the reference signal is affected by the amplitude of the original data sent by the other party, the gain of the Audio Output channel, and the gain of the ADCR channel.
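Before and after changing gains, it can help to measure each channel instead of judging the waveform by eye. The sketch below is one way to do that, assuming the capture is raw, interleaved, little-endian 16-bit stereo PCM with the mic on the left and the reference on the right. The function name `channel_stats` and the `clip_level` threshold are illustrative choices, not SDK values.

```python
import array
import struct
import sys


def channel_stats(pcm_bytes, clip_level=32700):
    """Report the peak and the number of near-full-scale samples per channel."""
    samples = array.array("h")       # signed 16-bit samples
    samples.frombytes(pcm_bytes)
    if sys.byteorder == "big":
        samples.byteswap()           # the data is assumed to be little-endian
    stats = {}
    for name, offset in (("left_mic", 0), ("right_ref", 1)):
        channel = samples[offset::2]  # de-interleave: every second sample
        peak = max((abs(s) for s in channel), default=0)
        clipped = sum(1 for s in channel if abs(s) >= clip_level)
        stats[name] = {"peak": peak, "clipped": clipped}
    return stats


# Three stereo frames: the right (reference) channel hits full scale twice.
frames = struct.pack("<6h", 1000, 32767, -2000, -32768, 500, 100)
stats = channel_stats(frames)
```

A non-zero `clipped` count on the reference channel suggests lowering the ADCR gain first and then the Audio Output gain, following the adjustment method above.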

Hardware requirements:

The board hardware should have a mic component.

The board hardware should have a speaker for audio output.

The board should have an AEC circuit: the speaker signal should be captured into the right recording channel (ADCR) without interference.

For more details, see 《CViTEK Audio Hardware, Structural Design, and Device Selection Guide.docx》.

Mechanical structure requirements:

The MIC should have its own sealed sound chamber and an external rubber sleeve that provides good shock absorption.

The MIC pickup should face away from the speaker.

The speaker should have its own sound chamber with effective rubber shock absorption.

The MIC and the speaker should be as far apart as possible, and the angle between them should minimize sound coupling.

For more details, see 《CViTEK Audio Hardware, Structural Design, and Device Selection Guide.docx》.

2.1.3. Ideal Debug Environment Requirements

Use the customer’s complete prototype, and the prototype should be structurally sealed as much as possible.

The mic and speaker used for debugging must be the ones installed inside the customer’s complete prototype.

Set the ADC/DAC gain to an appropriate level so that the mic input (capture) and the reference input (playback circuit) are stable.

Confirm that there is no pop noise, circuit noise, or intermittent signal interference before capturing the correct speech pattern.