auditok.util
Class summary
Class | Description |
DataSource | Base class for objects passed to auditok.core.StreamTokenizer.tokenize(). |
StringDataSource(data) | A class that represents a DataSource as a string buffer. |
ADSFactory | Factory class that makes it easy to create an ADSFactory.AudioDataSource object that implements DataSource and can therefore be passed to auditok.core.StreamTokenizer.tokenize(). |
ADSFactory.AudioDataSource | Base class for AudioDataSource objects; it inherits from DataSource and encapsulates an AudioSource object. |
ADSFactory.ADSDecorator | |
ADSFactory.OverlapADS | An AudioDataSource that reads and returns overlapping audio frames. |
ADSFactory.LimiterADS | An AudioDataSource that reads a fixed amount of data. |
ADSFactory.RecorderADS | An AudioDataSource that records all the audio data it reads, with a rewind facility. |
DataValidator | Base class for a validator object used by core.StreamTokenizer to check if read data is valid. |
AudioEnergyValidator(sample_width[, ...]) | The most basic auditok audio frame validator. |
- class auditok.util.DataSource
Base class for objects passed to auditok.core.StreamTokenizer.tokenize(). Subclasses should implement a DataSource.read() method.
- class auditok.util.DataValidator
Base class for a validator object used by core.StreamTokenizer to check if read data is valid. Subclasses should implement an is_valid() method.
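The validator contract can be illustrated with a minimal sketch. To keep it runnable without auditok installed, DataValidator below is a stand-in for auditok.util.DataValidator, and UpperCaseValidator with its "upper-case letter" rule is purely hypothetical.

```python
class DataValidator:
    """Stand-in for auditok.util.DataValidator: subclasses override is_valid()."""

    def is_valid(self, data):
        raise NotImplementedError


class UpperCaseValidator(DataValidator):
    """Hypothetical validator: a frame is valid if it is an upper-case string."""

    def is_valid(self, data):
        return data.isupper()


validator = UpperCaseValidator()
print(validator.is_valid("A"))  # True
print(validator.is_valid("a"))  # False
```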
- class auditok.util.StringDataSource(data)
A class that represents a DataSource as a string buffer. Each call to DataSource.read() returns one character and moves one step forward. If the end of the buffer is reached, read() returns None.
Parameters: - data :
a basestring object.
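The read-until-None behavior described for StringDataSource can be sketched as follows. This is a hypothetical re-implementation for illustration, not auditok's code: DataSource is a stand-in base class and SimpleStringDataSource is an invented name.

```python
class DataSource:
    """Stand-in for auditok.util.DataSource: subclasses override read()."""

    def read(self):
        raise NotImplementedError


class SimpleStringDataSource(DataSource):
    """Hypothetical sketch: read() returns one character at a time."""

    def __init__(self, data):
        self._data = data
        self._position = 0

    def read(self):
        # Signal the end of the buffer with None.
        if self._position >= len(self._data):
            return None
        char = self._data[self._position]
        self._position += 1
        return char


source = SimpleStringDataSource("abc")
chars = []
while True:
    c = source.read()
    if c is None:
        break
    chars.append(c)
print(chars)  # ['a', 'b', 'c']
```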
- class auditok.util.ADSFactory
Factory class that makes it easy to create an ADSFactory.AudioDataSource object that implements DataSource and can therefore be passed to auditok.core.StreamTokenizer.tokenize().
Whether you read audio data from a file, the microphone or a memory buffer, this factory instantiates and returns the right ADSFactory.AudioDataSource object.
There are many other features you may want your ADSFactory.AudioDataSource object to have, such as: memorizing all read audio data so that you can rewind and reuse it (especially useful when reading data from the microphone), reading a fixed amount of data (also useful when reading from the microphone), or reading overlapping audio frames (often needed when doing a spectral analysis of data).
ADSFactory.ads() automatically creates and returns an object with the desired behavior according to the supplied keyword arguments.
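How the factory picks an underlying source from its keyword arguments can be pictured with the sketch below. choose_source_kind is a hypothetical helper, not part of auditok; the keyword names and short aliases are the ones documented for ADSFactory.ads(), but the order in which they are checked here is an assumption.

```python
def choose_source_kind(**kwargs):
    """Hypothetical helper: name the audio source ADSFactory.ads() would wrap.

    The precedence among audio_source, filename and data_buffer is assumed.
    """
    if "audio_source" in kwargs or "asrc" in kwargs:
        return "given AudioSource"
    if "filename" in kwargs or "fn" in kwargs:
        return "file-based io.AudioSource"
    if "data_buffer" in kwargs or "db" in kwargs:
        return "io.BufferAudioSource"
    # No source keyword at all: fall back to the built-in microphone.
    return "io.PyAudioSource"


print(choose_source_kind())                   # io.PyAudioSource
print(choose_source_kind(fn="speech.wav"))    # file-based io.AudioSource
print(choose_source_kind(db=b"abcd", sr=16))  # io.BufferAudioSource
```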
- class ADSFactory.AudioDataSource(audio_source, block_size)
Base class for AudioDataSource objects. It inherits from DataSource and encapsulates an AudioSource object.
- class ADSFactory.LimiterADS(ads, max_time)
A class for AudioDataSource objects that can read a fixed amount of data. This can be useful when reading data from the microphone or from large audio files.
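The idea behind a limiter can be sketched as a wrapper that stops returning data once max_time seconds' worth of bytes (max_time * sampling_rate * sample_width * channels) have been read. LimitedReader is a hypothetical class; truncating the last block exactly at the limit is an assumption made to match the length check in the max_time example of ADSFactory.ads().

```python
class LimitedReader:
    """Hypothetical limiter: stop returning data after max_time seconds of audio."""

    def __init__(self, read_block, max_time, sampling_rate, sample_width, channels=1):
        self._read_block = read_block  # callable returning bytes, or None at the end
        self._max_bytes = int(max_time * sampling_rate * sample_width * channels)
        self._bytes_read = 0

    def read(self):
        remaining = self._max_bytes - self._bytes_read
        if remaining <= 0:
            return None
        block = self._read_block()
        if block is None:
            return None
        block = block[:remaining]  # truncate the last block at the limit
        self._bytes_read += len(block)
        return block


def endless_source():
    return b"x" * 10  # always returns a 10-byte block, like a live microphone


limited = LimitedReader(endless_source, max_time=2.25, sampling_rate=16, sample_width=1)
chunks = []
while True:
    block = limited.read()
    if block is None:
        break
    chunks.append(block)
total = sum(len(c) for c in chunks)
print(total)  # 36, i.e. 2.25 s * 16 samples/s * 1 byte
```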
- class ADSFactory.OverlapADS(ads, hop_size)
A class for AudioDataSource objects that can read and return overlapping audio frames.
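Overlapping frames come from advancing the read window by hop_size while returning block_size items each time, so adjacent frames share block_size - hop_size items. The generator below is a hypothetical sketch of that windowing on an in-memory string; dropping a trailing partial block is an assumption and may differ from OverlapADS.

```python
def overlapping_blocks(data, block_size, hop_size):
    """Yield blocks of block_size items, advancing by hop_size each time.

    Trailing data shorter than a full block is dropped in this sketch.
    """
    start = 0
    while start + block_size <= len(data):
        yield data[start:start + block_size]
        start += hop_size


blocks = list(overlapping_blocks("abcdefghijklmnop", block_size=4, hop_size=2))
print(blocks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ijkl', 'klmn', 'mnop']
```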
- class ADSFactory.RecorderADS(ads)
A class for AudioDataSource objects that can record all audio data they read, with a rewind facility.
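Recording with a rewind facility amounts to caching every block read from a non-rewindable source and replaying the cache after rewind(). RecordingReader and its get_recorded() method are hypothetical names used only for this sketch; a list of byte strings stands in for a live source such as a microphone.

```python
class RecordingReader:
    """Hypothetical recorder: cache every block read and allow rewinding."""

    def __init__(self, blocks):
        self._live = iter(blocks)  # stands in for a non-rewindable source
        self._cache = []           # every block obtained from the live source
        self._pos = 0              # index of the next block to return

    def read(self):
        if self._pos < len(self._cache):
            # Replaying recorded data after a rewind.
            block = self._cache[self._pos]
        else:
            block = next(self._live, None)
            if block is None:
                return None
            self._cache.append(block)
        self._pos += 1
        return block

    def rewind(self):
        self._pos = 0

    def get_recorded(self):
        return b"".join(self._cache)


rec = RecordingReader([b"ab", b"cd"])
first_pass = [rec.read(), rec.read()]
rec.rewind()
second_pass = [rec.read(), rec.read()]
print(first_pass == second_pass)  # True
print(rec.get_recorded())         # b'abcd'
```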
- static ADSFactory.ads(**kwargs)
Create and return an ADSFactory.AudioDataSource. The type and behavior of the object are determined by the supplied keyword arguments.
Parameters: - No parameters :
- read audio data from the available built-in microphone with the default parameters. The returned ADSFactory.AudioDataSource encapsulates an io.PyAudioSource object, so the next four parameters, if passed, are used instead of their default values.
- sampling_rate, sr : (int)
- number of samples per second. Default = 16000.
- sample_width, sw : (int)
- number of bytes per sample (must be in (1, 2, 4)). Default = 2.
- channels, ch : (int)
- number of audio channels. Default = 1 (only this value is currently accepted)
- frames_per_buffer, fpb : (int)
- number of samples per PyAudio buffer. Default = 1024.
- audio_source, asrc : an AudioSource object
- read data from this audio source
- filename, fn : (string)
- build an io.AudioSource object using this file (currently only wave format is supported)
- data_buffer, db : (string)
- build an io.BufferAudioSource using data in data_buffer. If this keyword is used, sampling_rate, sample_width and channels are passed to the io.BufferAudioSource constructor and used instead of their default values.
- max_time, mt : (float)
- maximum time (in seconds) to read. Default behavior: read until there is no more data available.
- record, rec : (bool)
- save all read data in a cache, providing a navigable object with a rewind() method. Default = False.
- block_dur, bd : (float)
- processing block duration in seconds. This is the quantity of audio data returned each time the read() method is invoked. If block_dur is 0.025 (i.e. 25 ms), the sampling rate is 8000 and the sample width is 2 bytes, read() returns a buffer of at most 0.025 * 8000 * 2 = 400 bytes. This parameter is looked for (and used if available) before block_size. If neither parameter is given, block_dur is set to 0.01 second (i.e. 10 ms).
- hop_dur, hd : (float)
- quantity of data to skip from the current processing window. If hop_dur is supplied, two adjacent blocks overlap by block_dur - hop_dur. This parameter is looked for (and used if available) before hop_size. If neither parameter is given, hop_dur is set to block_dur, which means that two consecutively read blocks do not overlap.
- block_size, bs : (int)
- number of samples to read each time the read() method is called. Default: a block size that represents a 10 ms window, so for a sampling rate of 16000 the default block_size is 160 samples, for a rate of 44100 it is 441 samples, etc.
- hop_size, hs : (int)
- number of samples to advance between two adjacent read windows. For a hop_size of value N, the overlap is block_size - N samples. Default: hop_size = block_size, which means that there is no overlap.
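The precedence and default rules listed above (block_dur before block_size, hop_dur before hop_size, 10 ms blocks and no overlap by default) can be summarized in a small function. resolve_block_params is a hypothetical helper, not auditok's code, and converting durations with int() truncation is an assumption consistent with the int(0.25 * 44100) check in the examples below.

```python
def resolve_block_params(sampling_rate=16000, **kwargs):
    """Hypothetical helper: return (block_size, hop_size) in samples."""
    # Durations (long or short keyword) take precedence over sizes.
    block_dur = kwargs.get("block_dur", kwargs.get("bd"))
    block_size = kwargs.get("block_size", kwargs.get("bs"))
    if block_dur is not None:
        block_size = int(block_dur * sampling_rate)
    elif block_size is None:
        block_size = int(0.01 * sampling_rate)  # default: 10 ms blocks

    hop_dur = kwargs.get("hop_dur", kwargs.get("hd"))
    hop_size = kwargs.get("hop_size", kwargs.get("hs"))
    if hop_dur is not None:
        hop_size = int(hop_dur * sampling_rate)
    elif hop_size is None:
        hop_size = block_size  # default: no overlap
    return block_size, hop_size


print(resolve_block_params(16000))                  # (160, 160)
print(resolve_block_params(16, bd=0.25, hd=0.125))  # (4, 2)
block_size, _ = resolve_block_params(8000, bd=0.025)
print(block_size * 2)  # bytes per read for a 2-byte sample width: 400
```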
Returns: An AudioDataSource object that has the desired features.
Examples: - Create an AudioDataSource that reads data from the microphone (requires PyAudio) with default audio parameters:
    >>> from auditok import ADSFactory
    >>> ads = ADSFactory.ads()
    >>> ads.get_sampling_rate()
    16000
    >>> ads.get_sample_width()
    2
    >>> ads.get_channels()
    1
- Create an AudioDataSource that reads data from the microphone with a sampling rate of 48 kHz:
    >>> from auditok import ADSFactory
    >>> ads = ADSFactory.ads(sr=48000)
    >>> ads.get_sampling_rate()
    48000
- Create an AudioDataSource that reads data from a wave file:
    >>> import auditok
    >>> from auditok import ADSFactory
    >>> ads = ADSFactory.ads(fn=auditok.dataset.was_der_mensch_saet_mono_44100_lead_trail_silence)
    >>> ads.get_sampling_rate()
    44100
    >>> ads.get_sample_width()
    2
    >>> ads.get_channels()
    1
- Define the size of read blocks as 20 ms (use block_size or bs):
    >>> import auditok
    >>> from auditok import ADSFactory
    >>> # the sampling rate of this file is 44100 samples/second,
    >>> # so 10 ms is 441 samples and 20 ms is 882 samples
    >>> block_size = 882
    >>> ads = ADSFactory.ads(bs=block_size, fn=auditok.dataset.was_der_mensch_saet_mono_44100_lead_trail_silence)
    >>> ads.open()
    >>> # read one block
    >>> data = ads.read()
    >>> ads.close()
    >>> len(data)
    1764
    >>> assert len(data) == ads.get_sample_width() * block_size
- Define block size as a duration (use block_dur or bd):
    >>> import auditok
    >>> from auditok import ADSFactory
    >>> dur = 0.25  # seconds
    >>> ads = ADSFactory.ads(bd=dur, fn=auditok.dataset.was_der_mensch_saet_mono_44100_lead_trail_silence)
    >>> # the sampling rate of this file is 44100 samples/second, so for a
    >>> # block duration of 250 ms the block size is 0.25 * 44100 = 11025 samples
    >>> ads.get_block_size()
    11025
    >>> assert ads.get_block_size() == int(0.25 * 44100)
    >>> ads.open()
    >>> # read one block
    >>> data = ads.read()
    >>> ads.close()
    >>> len(data)
    22050
    >>> assert len(data) == ads.get_sample_width() * ads.get_block_size()
- Read overlapping blocks (set one of hop_size, hs, hop_dur or hd to a value smaller than the block size or duration):
For readability, this example reads from a string buffer (wrapped in an auditok.io.BufferAudioSource) rather than from a file:
    >>> import auditok
    >>> from auditok import ADSFactory
    >>> # supply a data buffer instead of a file (keyword 'data_buffer' or 'db')
    >>> # sr: sampling rate = 16 samples/sec, sw: sample width = 1 byte, ch: channels = 1
    >>> buffer = "abcdefghijklmnop"  # 16 bytes = 1 second of data
    >>> bd = 0.250  # block duration = 250 ms = 4 bytes
    >>> hd = 0.125  # hop duration = 125 ms = 2 bytes
    >>> ads = ADSFactory.ads(db=buffer, bd=bd, hd=hd, sr=16, sw=1, ch=1)
    >>> ads.open()
    >>> ads.read()
    'abcd'
    >>> ads.read()
    'cdef'
    >>> ads.read()
    'efgh'
    >>> ads.read()
    'ghij'
    >>> data = ads.read()
    >>> assert data == 'ijkl'
    >>> ads.close()
- Limit amount of read data (use max_time or mt):
    >>> import auditok
    >>> from auditok import ADSFactory
    >>> # the audio file is longer than 2.25 seconds; read at most 2.25 seconds of it
    >>> ads = ADSFactory.ads(mt=2.25, fn=auditok.dataset.was_der_mensch_saet_mono_44100_lead_trail_silence)
    >>> ads.open()
    >>> data = []
    >>> while True:
    ...     d = ads.read()
    ...     if d is None:
    ...         break
    ...     data.append(d)
    ...
    >>> ads.close()
    >>> data = b''.join(data)
    >>> assert len(data) == int(ads.get_sampling_rate() * 2.25 * ads.get_sample_width() * ads.get_channels())
- class auditok.util.AudioEnergyValidator(sample_width, energy_threshold=45)
The most basic auditok audio frame validator. This validator computes the log energy of an input audio frame and returns True if the result is greater than or equal to a given threshold, False otherwise.
Parameters: - sample_width : int
- Number of bytes of one audio sample. This is used to convert data from a basestring or bytes buffer to a numeric array.
- energy_threshold : float
- A threshold used to check whether an input data buffer is valid.
- is_valid(data)
Check whether data is valid. The audio data is converted into an array of signed values, whose log energy is computed as follows:
    arr = AudioEnergyValidator._convert(data, sample_width)
    energy = float(numpy.dot(arr, arr)) / len(arr)
    log_energy = 10. * numpy.log10(energy)
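The same computation written as a formula, where x_1, ..., x_N denote the N signed samples of the converted frame (notation introduced here for illustration): the energy is the mean of the squared samples, and the log energy is that value expressed in decibels.

```latex
E = \frac{1}{N} \sum_{i=1}^{N} x_i^{2},
\qquad
E_{\log} = 10 \log_{10} E
```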
Parameters: - data : either a string or a bytes buffer
data is converted into a numerical array using the sample_width given in the constructor.
Returns: True if log_energy >= energy_threshold, False otherwise.
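A self-contained NumPy sketch of the energy check follows. log_energy and is_valid here are hypothetical stand-ins, not auditok's functions; the mapping from sample width to a little-endian signed integer dtype and the handling of an all-zero frame (returning -inf instead of triggering a log10(0) warning) are assumptions. Samples are cast to float64 before the dot product because an integer dot product on int16 data can overflow.

```python
import numpy as np

# Assumed mapping from sample width (bytes) to a signed little-endian dtype.
_DTYPES = {1: "i1", 2: "<i2", 4: "<i4"}


def log_energy(data, sample_width):
    """Return 10 * log10(mean(x ** 2)) for a raw PCM byte buffer (sketch)."""
    arr = np.frombuffer(data, dtype=_DTYPES[sample_width]).astype(np.float64)
    energy = float(np.dot(arr, arr)) / len(arr)
    if energy == 0.0:
        return float("-inf")  # silent frame
    return 10.0 * np.log10(energy)


def is_valid(data, sample_width, energy_threshold=45):
    """Hypothetical stand-in for AudioEnergyValidator.is_valid()."""
    return log_energy(data, sample_width) >= energy_threshold


loud = np.full(160, 1000, dtype="<i2").tobytes()  # mean square 1e6, about 60 dB
quiet = np.full(160, 10, dtype="<i2").tobytes()   # mean square 100, about 20 dB
print(round(log_energy(loud, 2), 6))
print(is_valid(loud, 2), is_valid(quiet, 2))  # True False
```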