API Reference
live_audio_capture
AudioNoiseReduction
A utility class for audio processing tasks such as noise reduction, filtering, and resampling.
Source code in live_audio_capture\audio_noise_reduction.py
apply_low_pass_filter(audio_chunk, sampling_rate, cutoff_freq=7900.0)
staticmethod
Apply a low-pass filter to the audio chunk.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
audio_chunk | ndarray | The audio chunk to process. | required |
sampling_rate | int | The sample rate of the audio. | required |
cutoff_freq | float | The cutoff frequency for the low-pass filter, in Hz. | 7900.0 |
Returns:
Type | Description |
---|---|
ndarray | The filtered audio chunk. |
Source code in live_audio_capture\audio_noise_reduction.py
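The filter design used by `apply_low_pass_filter` is not shown on this page, so the following is only a minimal sketch of what a low-pass step does. It assumes a first-order (single-pole) filter and plain Python lists rather than NumPy arrays; the name `one_pole_low_pass` is hypothetical and not part of `live_audio_capture`.

```python
import math


def one_pole_low_pass(samples, sampling_rate, cutoff_freq=7900.0):
    """First-order IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    # Smoothing factor derived from the RC time constant of the cutoff.
    dt = 1.0 / sampling_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_freq)
    alpha = dt / (rc + dt)

    filtered = []
    previous = 0.0
    for x in samples:
        previous = previous + alpha * (x - previous)
        filtered.append(previous)
    return filtered


# A signal that flips sign every sample sits at the Nyquist frequency,
# so a low cutoff should attenuate it strongly.
nyquist_tone = [1.0 if i % 2 == 0 else -1.0 for i in range(200)]
smoothed = one_pole_low_pass(nyquist_tone, sampling_rate=16000, cutoff_freq=500.0)
```

A production filter would more likely be a higher-order design with a steeper roll-off; the sketch only illustrates how the cutoff and sample rate interact.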
apply_noise_reduction(audio_chunk, sampling_rate, stationary=False, prop_decrease=1.0, n_std_thresh_stationary=1.5, n_fft=1024, win_length=None, hop_length=None, n_jobs=1, use_torch=False, device='cuda')
staticmethod
Apply noise reduction using the noisereduce package.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
audio_chunk | ndarray | The audio chunk to process. | required |
sampling_rate | int | The sample rate of the audio. | required |
stationary | bool | Whether to perform stationary noise reduction. | False |
prop_decrease | float | Proportion to reduce noise by (1.0 = 100%). | 1.0 |
n_std_thresh_stationary | float | Number of standard deviations above mean for thresholding. | 1.5 |
n_fft | int | FFT window size. | 1024 |
win_length | int | Window length for STFT. | None |
hop_length | int | Hop length for STFT. | None |
n_jobs | int | Number of parallel jobs to run. Set to -1 to use all CPU cores. | 1 |
use_torch | bool | Whether to use the PyTorch version of spectral gating. | False |
device | str | Device to run the PyTorch spectral gating on (e.g., "cuda" or "cpu"). | 'cuda' |
Returns:
Type | Description |
---|---|
ndarray | The processed audio chunk with reduced noise. |
Source code in live_audio_capture\audio_noise_reduction.py
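To make `n_std_thresh_stationary` and `prop_decrease` more concrete, here is a heavily simplified sketch of stationary gating for a single frequency band. It is an assumption-laden illustration, not the noisereduce implementation: it works on a list of per-frame levels in dB, skips the STFT and mask smoothing entirely, and the function name `stationary_gate_gains` is hypothetical.

```python
import statistics


def stationary_gate_gains(noise_levels_db, frame_levels_db,
                          n_std_thresh_stationary=1.5, prop_decrease=1.0):
    """Per-frame gains for one frequency band under a stationary noise estimate."""
    # Threshold = mean noise level plus a number of standard deviations.
    threshold = (statistics.mean(noise_levels_db)
                 + n_std_thresh_stationary * statistics.pstdev(noise_levels_db))
    gains = []
    for level in frame_levels_db:
        keep = 1.0 if level > threshold else 0.0
        # prop_decrease = 1.0 removes gated frames entirely; smaller values
        # leave a fraction (1 - prop_decrease) of the gated frames in place.
        gains.append(keep * prop_decrease + (1.0 - prop_decrease))
    return gains


noise = [-60.0, -58.0, -62.0, -60.0]
frames = [-59.0, -30.0, -57.0]
full = stationary_gate_gains(noise, frames)
partial = stationary_gate_gains(noise, frames, prop_decrease=0.75)
```

Frames above the threshold keep a gain of 1.0, while frames at or below it are scaled to `1 - prop_decrease`.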
resample_audio(audio_chunk, original_rate, target_rate)
staticmethod
Resample the audio chunk to a target sample rate.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
audio_chunk | ndarray | The audio chunk to resample. | required |
original_rate | int | The original sample rate. | required |
target_rate | int | The target sample rate. | required |
Returns:
Type | Description |
---|---|
ndarray | The resampled audio chunk. |
Source code in live_audio_capture\audio_noise_reduction.py
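The resampling algorithm behind `resample_audio` is not documented here. As a rough sketch of the idea, the following resamples a list of samples by linear interpolation. The helper `resample_linear` is hypothetical, and it applies no anti-aliasing filter, which a real resampler would normally include when downsampling.

```python
def resample_linear(samples, original_rate, target_rate):
    """Resample by linear interpolation between neighbouring samples."""
    if not samples or original_rate == target_rate:
        return list(samples)
    # Number of output samples that covers the same duration.
    n_out = max(1, round(len(samples) * target_rate / original_rate))
    step = original_rate / target_rate  # input samples per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


upsampled = resample_linear([0.0, 1.0, 2.0, 3.0], 8000, 16000)
downsampled = resample_linear([0.0, 1.0, 2.0, 3.0], 16000, 8000)
```

Doubling the rate doubles the number of samples, and halving it keeps every second sample.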
AudioPlayback
Utilities for playing audio files and sounds.
Source code in live_audio_capture\audio_utils\audio_playback.py
play_audio_file(file_path)
staticmethod
Play an audio file using the simpleaudio library.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str | Path to the audio file to play. | required |
Source code in live_audio_capture\audio_utils\audio_playback.py
play_beep(frequency, duration)
staticmethod
Play a beep sound asynchronously using the simpleaudio library.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frequency | int | Frequency of the beep sound in Hz. | required |
duration | int | Duration of the beep sound in milliseconds. | required |
Source code in live_audio_capture\audio_utils\audio_playback.py
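How `play_beep` builds its tone is not shown on this page. A minimal sketch of the synthesis step, assuming a plain sine wave encoded as 16-bit mono PCM, could look like the following. The function `synthesize_beep` and its `sampling_rate` and `volume` parameters are hypothetical; playback itself is left out so the example runs without an audio device.

```python
import math
import struct


def synthesize_beep(frequency, duration_ms, sampling_rate=44100, volume=0.5):
    """Return a sine tone as little-endian signed 16-bit mono PCM bytes."""
    n_samples = int(sampling_rate * duration_ms / 1000)
    peak = int(32767 * volume)
    frames = bytearray()
    for n in range(n_samples):
        value = int(peak * math.sin(2.0 * math.pi * frequency * n / sampling_rate))
        frames += struct.pack("<h", value)
    return bytes(frames)


beep = synthesize_beep(frequency=440, duration_ms=100)
```

A buffer like this is the kind of raw PCM data that a playback library can consume directly.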
AudioProcessing
Utilities for processing audio data.
Source code in live_audio_capture\audio_utils\audio_processing.py
apply_noise_reduction_to_file(input_file, output_file, stationary=False, prop_decrease=1.0, n_std_thresh_stationary=1.5, n_jobs=1, use_torch=False, device='cuda')
staticmethod
Apply noise reduction to an audio file and save the result.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_file | str | Path to the input audio file. | required |
output_file | str | Path to save the processed audio file. | required |
stationary | bool | Whether to perform stationary noise reduction. | False |
prop_decrease | float | Proportion to reduce noise by (1.0 = 100%). | 1.0 |
n_std_thresh_stationary | float | Threshold for stationary noise reduction. | 1.5 |
n_jobs | int | Number of parallel jobs to run. Set to -1 to use all CPU cores. | 1 |
use_torch | bool | Whether to use the PyTorch version of spectral gating. | False |
device | str | Device to run the PyTorch spectral gating on (e.g., "cuda" or "cpu"). | 'cuda' |
Source code in live_audio_capture\audio_utils\audio_processing.py
calculate_energy(audio_chunk)
staticmethod
Calculate the energy of an audio chunk.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
audio_chunk | ndarray | The audio chunk to process. | required |
Returns:
Type | Description |
---|---|
float | The energy of the audio chunk. |
Source code in live_audio_capture\audio_utils\audio_processing.py
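The exact energy definition used by `calculate_energy` is not stated here. One common choice, assumed in this sketch, is the mean of the squared samples; the helper name `mean_square_energy` is hypothetical.

```python
def mean_square_energy(samples):
    """Mean of squared sample values; 0.0 for an empty chunk."""
    if not samples:
        return 0.0
    return sum(x * x for x in samples) / len(samples)


energy = mean_square_energy([0.5, -0.5, 0.5, -0.5])
```

With this definition the result does not depend on the sign of the samples or on the chunk length, which makes it convenient for comparing chunks against a threshold.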
process_audio_chunk(raw_data, audio_format='f32le')
staticmethod
Convert raw audio data to a NumPy array based on the audio format.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
raw_data | bytes | Raw audio data from the microphone. | required |
audio_format | str | Audio format (e.g., "f32le" or "s16le"). | 'f32le' |
Returns:
Type | Description |
---|---|
ndarray | The processed audio chunk. |
Source code in live_audio_capture\audio_utils\audio_processing.py
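As an illustration of what decoding "f32le" and "s16le" involves, the sketch below unpacks raw bytes with the standard `struct` module and returns a list of floats. The helper `decode_chunk` is hypothetical, and scaling 16-bit samples into the range [-1.0, 1.0) is an assumption rather than documented behaviour of `process_audio_chunk`.

```python
import struct

# struct type code, bytes per sample, and scale factor for each format.
_FORMATS = {"f32le": ("f", 4, 1.0), "s16le": ("h", 2, 32768.0)}


def decode_chunk(raw_data, audio_format="f32le"):
    """Decode little-endian PCM bytes into a list of float samples."""
    try:
        code, width, scale = _FORMATS[audio_format]
    except KeyError:
        raise ValueError(f"Unsupported audio format: {audio_format}") from None
    count = len(raw_data) // width  # ignore a trailing partial sample
    values = struct.unpack(f"<{count}{code}", raw_data[: count * width])
    return [v / scale for v in values]


floats = decode_chunk(struct.pack("<3f", 0.0, 0.5, -1.0))
ints = decode_chunk(struct.pack("<2h", 16384, -32768), audio_format="s16le")
```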
AudioVisualizer
A standalone real-time audio visualizer using PyQtGraph with a refined color scheme.
Source code in live_audio_capture\visualization.py
__init__(sampling_rate, chunk_duration)
Initialize the AudioVisualizer instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sampling_rate | int | The sample rate of the audio. | required |
chunk_duration | float | The duration of each audio chunk in seconds. | required |
Source code in live_audio_capture\visualization.py
add_audio_chunk(audio_chunk)
Add a new audio chunk to the visualization queue.
Source code in live_audio_capture\visualization.py
compute_spectrogram(audio_chunk)
Compute the spectrogram for a given audio chunk.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
audio_chunk | ndarray | The audio chunk to process. | required |
Returns:
Type | Description |
---|---|
Optional[ndarray] | The spectrogram data. |
Source code in live_audio_capture\visualization.py
compute_spectrum(audio_chunk)
Compute the frequency spectrum for a given audio chunk.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
audio_chunk | ndarray | The audio chunk to process. | required |
Returns:
Type | Description |
---|---|
Optional[ndarray] | The frequency spectrum in dB. |
Source code in live_audio_capture\visualization.py
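To show what a "frequency spectrum in dB" means in practice, here is a small sketch that computes it with a naive discrete Fourier transform. It is far slower than an FFT and is not the visualizer's implementation. The name `spectrum_db`, the `floor` parameter, and returning `None` for an empty chunk are all assumptions made for illustration.

```python
import cmath
import math


def spectrum_db(samples, floor=1e-10):
    """Magnitude spectrum in dB for the non-negative frequency bins."""
    n = len(samples)
    if n == 0:
        return None  # assumed to mirror the Optional return type
    levels = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        magnitude = abs(acc) / n
        # Clamp to a small floor so silent bins do not produce log10(0).
        levels.append(20.0 * math.log10(max(magnitude, floor)))
    return levels


# 64 samples holding exactly 8 cycles of a sine wave: the peak lands in bin 8.
tone = [math.sin(2.0 * math.pi * 8 * i / 64) for i in range(64)]
levels = spectrum_db(tone)
```

Bin `k` corresponds to the frequency `k * sampling_rate / n`, so the bin index can be converted to Hz once the sample rate is known.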
stop()
Stop the visualization.
Source code in live_audio_capture\visualization.py
LiveAudioCapture
A cross-platform utility for capturing live audio from a microphone using FFmpeg.
Features:
- Continuous listening mode.
- Dynamic recording based on voice activity.
- Silence duration threshold for stopping recording.
- Optional beep sounds for start/stop feedback.
- Save recordings in multiple formats (WAV, MP3, OGG).
Source code in live_audio_capture\audio_capture.py
__init__(sampling_rate=16000, chunk_duration=0.1, audio_format='f32le', channels=1, aggressiveness=1, enable_beep=True, enable_noise_canceling=False, low_pass_cutoff=7500.0, stationary_noise_reduction=False, prop_decrease=1.0, n_std_thresh_stationary=1.5, n_jobs=1, use_torch=False, device='cuda', calibration_duration=2.0, use_adaptive_threshold=True)
Initialize the LiveAudioCapture instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sampling_rate | int | Sample rate in Hz (e.g., 16000). | 16000 |
chunk_duration | float | Duration of each audio chunk in seconds (e.g., 0.1). | 0.1 |
audio_format | str | Audio format for FFmpeg output (e.g., "f32le"). | 'f32le' |
channels | int | Number of audio channels (1 for mono, 2 for stereo). | 1 |
aggressiveness | int | Aggressiveness level for VAD (0 = least aggressive, 3 = most aggressive). | 1 |
enable_beep | bool | Whether to play beep sounds when recording starts/stops. | True |
enable_noise_canceling | bool | Whether to apply noise cancellation. | False |
low_pass_cutoff | float | Cutoff frequency for the low-pass filter, in Hz. | 7500.0 |
stationary_noise_reduction | bool | Whether to use stationary noise reduction. | False |
prop_decrease | float | Proportion to reduce noise by (1.0 = 100%). | 1.0 |
n_std_thresh_stationary | float | Threshold for stationary noise reduction. | 1.5 |
n_jobs | int | Number of parallel jobs to run. Set to -1 to use all CPU cores. | 1 |
use_torch | bool | Whether to use the PyTorch version of spectral gating. | False |
device | str | Device to run the PyTorch spectral gating on (e.g., "cuda" or "cpu"). | 'cuda' |
calibration_duration | float | Duration of the calibration phase in seconds. | 2.0 |
use_adaptive_threshold | bool | Whether to use adaptive thresholding for VAD. | True |
Source code in live_audio_capture\audio_capture.py
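The `sampling_rate`, `chunk_duration`, `audio_format`, and `channels` parameters together determine how many bytes one chunk of raw audio occupies. The arithmetic can be sketched as follows; the helper `chunk_size_bytes` and the `BYTES_PER_SAMPLE` table are hypothetical and only cover the two formats named on this page.

```python
# Bytes per sample for the raw PCM formats mentioned in this reference.
BYTES_PER_SAMPLE = {"f32le": 4, "s16le": 2}


def chunk_size_bytes(sampling_rate=16000, chunk_duration=0.1,
                     audio_format="f32le", channels=1):
    """Size in bytes of one chunk of interleaved raw audio."""
    frames = round(sampling_rate * chunk_duration)  # frames per chunk
    return frames * channels * BYTES_PER_SAMPLE[audio_format]


default_size = chunk_size_bytes()
stereo_size = chunk_size_bytes(44100, 0.5, "s16le", 2)
```

With the documented defaults, one chunk holds 1600 frames of 4-byte samples.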
apply_noise_reduction_to_file(input_file, output_file, stationary=False, prop_decrease=1.0, n_std_thresh_stationary=1.5, n_jobs=1, use_torch=False, device='cuda')
Apply noise reduction to an audio file and save the result.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_file | str | Path to the input audio file. | required |
output_file | str | Path to save the processed audio file. | required |
stationary | bool | Whether to perform stationary noise reduction. | False |
prop_decrease | float | Proportion to reduce noise by (1.0 = 100%). | 1.0 |
n_std_thresh_stationary | float | Threshold for stationary noise reduction. | 1.5 |
n_jobs | int | Number of parallel jobs to run. Set to -1 to use all CPU cores. | 1 |
use_torch | bool | Whether to use the PyTorch version of spectral gating. | False |
device | str | Device to run the PyTorch spectral gating on (e.g., "cuda" or "cpu"). | 'cuda' |
Source code in live_audio_capture\audio_capture.py
change_input_device(mic_name)
Change the input device to the specified microphone by name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mic_name | str | The name of the microphone to use. | required |
Source code in live_audio_capture\audio_capture.py
list_available_mics()
List all available microphones on the system.
Returns:
Type | Description |
---|---|
Dict[str, str] | A dictionary mapping microphone names to their device IDs. |
Source code in live_audio_capture\audio_capture.py
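Because `list_available_mics()` returns a name-to-ID dictionary and `change_input_device()` takes a microphone name, selecting a device reduces to a dictionary lookup. The sketch below shows one way to do that lookup with a clear error for unknown names. The helper `resolve_mic` and the device ID `"hw:1,0"` are hypothetical, and how the library itself reacts to an unknown name is not documented here.

```python
def resolve_mic(mic_name, available_mics):
    """Return the device ID for mic_name, or raise ValueError listing the options."""
    try:
        return available_mics[mic_name]
    except KeyError:
        known = ", ".join(sorted(available_mics)) or "none"
        raise ValueError(
            f"Microphone {mic_name!r} not found. Available: {known}"
        ) from None


# Stand-in for the dictionary a real system would report.
mics = {"USB Mic": "hw:1,0"}
device_id = resolve_mic("USB Mic", mics)
```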
listen_and_record_with_vad(output_file='output.wav', silence_duration=2.0, format='wav')
Continuously listen to the microphone and record speech segments.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
output_file | str | Path to save the recorded audio file. | 'output.wav' |
silence_duration | float | Duration of silence (in seconds) to stop recording. | 2.0 |
format | str | Output format (e.g., "wav", "mp3", "ogg"). | 'wav' |
Source code in live_audio_capture\audio_capture.py
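The interaction between voice activity and `silence_duration` can be pictured as a small state machine: recording starts on the first speech chunk and stops once enough consecutive silent chunks have passed. The sketch below assumes that behaviour and works on a list of per-chunk speech flags; `find_segment_end` is a hypothetical helper, not the library's loop.

```python
def find_segment_end(speech_flags, chunk_duration=0.1, silence_duration=2.0):
    """Return the index of the chunk at which recording would stop, or None."""
    recording = False
    silent_chunks = 0
    for index, is_speech in enumerate(speech_flags):
        if is_speech:
            recording = True   # speech starts (or continues) a segment
            silent_chunks = 0  # any speech resets the silence counter
        elif recording:
            silent_chunks += 1
            if silent_chunks * chunk_duration >= silence_duration:
                return index
    return None


flags = [False, True, True, False, True, False, False, False]
stop_index = find_segment_end(flags, chunk_duration=0.5, silence_duration=1.0)
```

Silence before the first speech chunk is ignored, and a short pause in the middle of speech does not end the segment.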
play_audio_file(file_path)
Play an audio file using the simpleaudio library.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str | Path to the audio file to play. | required |
Source code in live_audio_capture\audio_capture.py
process_audio_chunk(audio_chunk, enable_noise_canceling=True)
Process an audio chunk with optional noise cancellation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
audio_chunk | ndarray | The audio chunk to process. | required |
enable_noise_canceling | bool | Whether to apply noise cancellation. | True |
Returns:
Type | Description |
---|---|
ndarray | The processed audio chunk. |
Source code in live_audio_capture\audio_capture.py
save_recording(audio_data, output_file, format='wav')
Save the recorded audio to a file in the specified format.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
audio_data | ndarray | The recorded audio data. | required |
output_file | str | Path to save the recorded audio file. | required |
format | str | Output format (e.g., "wav", "mp3", "ogg"). | 'wav' |
Source code in live_audio_capture\audio_capture.py
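For the WAV case, writing float samples to disk comes down to converting them to integer PCM and adding a header. The following sketch does this with the standard `wave` module and writes to an in-memory buffer so it leaves no file behind. The helper `write_wav` is hypothetical, 16-bit output is an assumption, and MP3 or OGG output would need an encoder that the standard library does not provide.

```python
import io
import struct
import wave


def write_wav(samples, destination, sampling_rate=16000, channels=1):
    """Write float samples in [-1.0, 1.0] as 16-bit PCM WAV."""
    pcm = bytearray()
    for x in samples:
        clipped = max(-1.0, min(1.0, x))  # guard against out-of-range values
        pcm += struct.pack("<h", int(clipped * 32767))
    with wave.open(destination, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(bytes(pcm))


buffer = io.BytesIO()
write_wav([0.0, 0.5, -0.5, 1.0], buffer)
```

Passing a file path instead of the buffer writes the same data to disk.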
stop()
Stop both streaming and recording.
Source code in live_audio_capture\audio_capture.py
stop_streaming()
Stop the audio stream and terminate the FFmpeg process.
Source code in live_audio_capture\audio_capture.py
stream_audio()
Stream live audio from the microphone.
Source code in live_audio_capture\audio_capture.py
MicUtils
Utilities for managing and interacting with microphones.
Source code in live_audio_capture\audio_utils\mic_utils.py
get_default_mic()
staticmethod
Get the default microphone device based on the platform.
Source code in live_audio_capture\audio_utils\mic_utils.py
list_mics()
staticmethod
List all available microphones on the system.
Returns:
Type | Description |
---|---|
Dict[str, str] | A dictionary mapping microphone names to their OS-specific device IDs. |
Source code in live_audio_capture\audio_utils\mic_utils.py
VoiceActivityDetector
A simplified voice activity detector (VAD) similar to WebRTC VAD.
Features:
- Energy-based speech detection.
- Aggressiveness level to control detection strictness.
- Hysteresis for stable speech detection.
Source code in live_audio_capture\vad.py
__init__(sample_rate=16000, frame_duration=0.03, aggressiveness=1, hysteresis_high=1.5, hysteresis_low=0.5, enable_noise_canceling=False, calibration_duration=2.0, use_adaptive_threshold=True, audio_format='f32le', channels=1)
Initialize the VoiceActivityDetector.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sample_rate | int | Sample rate of the audio (default: 16000 Hz). | 16000 |
frame_duration | float | Duration of each frame in seconds (default: 0.03 seconds). | 0.03 |
aggressiveness | int | Aggressiveness level (0 = least aggressive, 3 = most aggressive). | 1 |
hysteresis_high | float | Multiplier for the threshold when speech is detected. | 1.5 |
hysteresis_low | float | Multiplier for the threshold when speech is not detected. | 0.5 |
enable_noise_canceling | bool | Whether to apply noise cancellation. | False |
calibration_duration | float | Duration of the calibration phase in seconds. | 2.0 |
use_adaptive_threshold | bool | Whether to use adaptive thresholding. | True |
audio_format | str | Audio format for calibration (e.g., "f32le" or "s16le"). | 'f32le' |
channels | int | Number of audio channels (1 for mono, 2 for stereo). | 1 |
Source code in live_audio_capture\vad.py
process_audio(audio_chunk)
Process an audio chunk and determine if speech is detected.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
audio_chunk | ndarray | Audio chunk to process. | required |
Returns:
Type | Description |
---|---|
bool | True if speech is detected, False otherwise. |
Source code in live_audio_capture\vad.py
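Hysteresis in an energy-based detector usually means using a higher threshold to enter the speech state than to leave it, which prevents the decision from flickering when the energy hovers near a single threshold. The sketch below assumes that conventional arrangement, with `hysteresis_high` scaling the entry level and `hysteresis_low` scaling the exit level. The parameter descriptions above could be read differently, so treat the mapping as an assumption; the class `HysteresisDetector` and its `base_threshold` argument are hypothetical.

```python
class HysteresisDetector:
    """Energy gate with separate thresholds for entering and leaving speech."""

    def __init__(self, base_threshold, hysteresis_high=1.5, hysteresis_low=0.5):
        self.enter_level = base_threshold * hysteresis_high
        self.exit_level = base_threshold * hysteresis_low
        self.in_speech = False

    def update(self, energy):
        if self.in_speech:
            # Stay in speech until the energy falls below the lower level.
            if energy < self.exit_level:
                self.in_speech = False
        elif energy > self.enter_level:
            # Require the higher level to switch from silence to speech.
            self.in_speech = True
        return self.in_speech


detector = HysteresisDetector(base_threshold=1.0)
decisions = [detector.update(e) for e in [0.2, 1.2, 1.6, 1.0, 0.6, 0.4, 1.2]]
```

In this run the energy value 1.2 is classified as silence both times because it never exceeds the entry level of 1.5, while 1.0 and 0.6 still count as speech because the detector is already in the speech state.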