auditok.util

Class summary

DataSource Base class for objects passed to auditok.core.StreamTokenizer.tokenize().
StringDataSource(data) A class that represents a DataSource as a string buffer.
ADSFactory Factory class that makes it easy to create an ADSFactory.AudioDataSource object that implements DataSource and can therefore be passed to auditok.core.StreamTokenizer.tokenize().
ADSFactory.AudioDataSource
ADSFactory.ADSDecorator
ADSFactory.OverlapADS
ADSFactory.LimiterADS
ADSFactory.RecorderADS
DataValidator Base class for a validator object used by core.StreamTokenizer to check if read data is valid.
AudioEnergyValidator(sample_width[, ...]) The most basic auditok audio frame validator.
class auditok.util.DataSource[source]

Base class for objects passed to auditok.core.StreamTokenizer.tokenize(). Subclasses should implement a DataSource.read() method.

read()[source]

Read a piece of data from this source. If no more data is available, return None.
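
A minimal sketch of a custom data source, assuming only the contract stated above: read() returns one piece of data per call and None once the source is exhausted. The DataSource class below is a local stand-in so the snippet runs without auditok installed, and ListDataSource is a hypothetical name. With auditok available, one would presumably subclass auditok.util.DataSource instead.

```python
class DataSource:
    """Local stand-in for auditok.util.DataSource (assumption: only read() is required)."""

    def read(self):
        raise NotImplementedError


class ListDataSource(DataSource):
    """Hypothetical source that returns the items of a list one at a time."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def read(self):
        # Return None once every item has been consumed.
        if self._pos >= len(self._items):
            return None
        item = self._items[self._pos]
        self._pos += 1
        return item


source = ListDataSource(["a", "b"])
first, second, third = source.read(), source.read(), source.read()
```

Here first and second hold the two items, and third is None because the source is exhausted.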

class auditok.util.DataValidator[source]

Base class for a validator object used by core.StreamTokenizer to check if read data is valid. Subclasses should implement the is_valid() method.

is_valid(data)[source]

Check whether data is valid
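
A minimal sketch of a validator, assuming the only requirement is an is_valid(data) method that returns a boolean. DataValidator below is a local stand-in for auditok.util.DataValidator, and UpperCaseValidator is a hypothetical name chosen for illustration.

```python
class DataValidator:
    """Local stand-in for auditok.util.DataValidator (assumption: only is_valid() is required)."""

    def is_valid(self, data):
        raise NotImplementedError


class UpperCaseValidator(DataValidator):
    """Hypothetical validator that accepts upper-case characters only."""

    def is_valid(self, data):
        # None marks the end of a source, so it is never considered valid here.
        return data is not None and data.isupper()


validator = UpperCaseValidator()
flags = [validator.is_valid(c) for c in "aaBBc"]
```

A validator like this pairs naturally with a character-based source such as StringDataSource, since each read() then yields one character to check.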

class auditok.util.StringDataSource(data)[source]

A class that represents a DataSource as a string buffer. Each call to DataSource.read() returns one character and moves one step forward. If the end of the buffer is reached, read() returns None.

Parameters:
data :

a basestring object.

read()[source]

Read one character from buffer.

Returns: Current character, or None if the end of the buffer is reached.
set_data(data)[source]

Set a new data buffer.

Parameters:
data : a basestring object

New data buffer.

class auditok.util.ADSFactory[source]

Factory class that makes it easy to create an ADSFactory.AudioDataSource object that implements DataSource and can therefore be passed to auditok.core.StreamTokenizer.tokenize().

Whether you read audio data from a file, the microphone or a memory buffer, this factory instantiates and returns the right ADSFactory.AudioDataSource object.

There are many other features you may want your ADSFactory.AudioDataSource object to have, such as: memorizing all read audio data so that you can rewind and reuse it (especially useful when reading data from the microphone), reading a fixed amount of data (also useful when reading from the microphone), and reading overlapping audio frames (often needed when doing a spectral analysis of data).

ADSFactory.ads() automatically creates and returns an object with the desired behavior according to the supplied keyword arguments.

class ADSFactory.ADSDecorator(ads)[source]

Base decorator class for AudioDataSource objects.

class ADSFactory.AudioDataSource(audio_source, block_size)[source]

Base class for AudioDataSource objects. It inherits from DataSource and encapsulates an AudioSource object.

class ADSFactory.LimiterADS(ads, max_time)[source]

A class for AudioDataSource objects that can read a fixed amount of data. This can be useful when reading data from the microphone or from large audio files.

class ADSFactory.OverlapADS(ads, hop_size)[source]

A class for AudioDataSource objects that can read and return overlapping audio frames.

class ADSFactory.RecorderADS(ads)[source]

A class for AudioDataSource objects that can record all audio data they read, with a rewind facility.
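
The decorator classes above each wrap an existing data source and add behavior to its read() method. The following sketch illustrates that pattern with a recording wrapper. It assumes nothing about the library's actual implementation: CharSource and RecordingSource are hypothetical names, and the replay logic is a simplification.

```python
class CharSource:
    """Hypothetical source returning one character per read(), then None."""

    def __init__(self, text):
        self._text = text
        self._pos = 0

    def read(self):
        if self._pos >= len(self._text):
            return None
        char = self._text[self._pos]
        self._pos += 1
        return char


class RecordingSource:
    """Hypothetical decorator: caches everything it reads so it can be replayed."""

    def __init__(self, source):
        self._source = source
        self._cache = []
        self._replay = None  # index into the cache while replaying, else None

    def read(self):
        if self._replay is not None:
            # Simplification: after rewind(), only cached data is served.
            if self._replay < len(self._cache):
                item = self._cache[self._replay]
                self._replay += 1
                return item
            return None
        item = self._source.read()
        if item is not None:
            self._cache.append(item)
        return item

    def rewind(self):
        # Subsequent reads start again from the beginning of the cache.
        self._replay = 0


rec = RecordingSource(CharSource("abc"))
first_pass = [rec.read() for _ in range(4)]
rec.rewind()
second_pass = [rec.read() for _ in range(4)]
```

Both passes return the same sequence, which is the point of recording: data from a one-shot source such as a microphone can be processed again.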

static ADSFactory.ads(**kwargs)[source]

Create and return an ADSFactory.AudioDataSource. The type and behavior of the object are determined by the supplied parameters.

Parameters:
No parameters :
read audio data from the available built-in microphone with the default parameters. The returned ADSFactory.AudioDataSource encapsulates an io.PyAudioSource object, so the next four parameters, if supplied, are used instead of their default values.
sampling_rate, sr : (int)
number of samples per second. Default = 16000.
sample_width, sw : (int)
number of bytes per sample (must be in (1, 2, 4)). Default = 2
channels, ch : (int)
number of audio channels. Default = 1 (only this value is currently accepted)
frames_per_buffer, fpb : (int)
number of samples in the PyAudio buffer. Default = 1024.
audio_source, asrc : an AudioSource object
read data from this audio source
filename, fn : (string)
build an io.AudioSource object using this file (currently only wave format is supported)
data_buffer, db : (string)
build an io.BufferAudioSource using data in data_buffer. If this keyword is used, sampling_rate, sample_width and channels are passed to io.BufferAudioSource constructor and used instead of default values.
max_time, mt : (float)
maximum time (in seconds) to read. Default behavior: read until there is no more data available.
record, rec : (bool)
save all read data in a cache. The returned object is navigable and provides a rewind method. Default = False.
block_dur, bd : (float)
processing block duration in seconds. This represents the quantity of audio data to return each time the read() method is invoked. If block_dur is 0.025 (i.e. 25 ms) and the sampling rate is 8000 and the sample width is 2 bytes, read() returns a buffer of 0.025 * 8000 * 2 = 400 bytes at most. This parameter will be looked for (and used if available) before block_size. If neither parameter is given, block_dur will be set to 0.01 second (i.e. 10 ms)
hop_dur, hd : (float)
quantity of data to skip from the current processing window. If hop_dur is supplied, there will be an overlap of block_dur - hop_dur between two adjacent blocks. This parameter will be looked for (and used if available) before hop_size. If neither parameter is given, hop_dur will be set to block_dur, which means that there will be no overlap between two consecutively read blocks.
block_size, bs : (int)
number of samples to read each time the read method is called. Default: a block size that represents a window of 10ms, so for a sampling rate of 16000, the default block_size is 160 samples, for a rate of 44100, block_size = 441 samples, etc.
hop_size, hs : (int)
determines the number of overlapping samples between two adjacent read windows. For a hop_size of value N, the overlap is block_size - N. Default: hop_size = block_size, which means that there is no overlap.
Returns:

An AudioDataSource object that has the desired features.
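
The relationship between the duration-based and size-based parameters above can be checked with a little arithmetic. The sketch below uses a hypothetical helper, block_layout, that is not part of auditok. Rounding to the nearest sample is an assumption, since the text does not say how fractional sample counts are handled.

```python
def block_layout(sampling_rate, sample_width, block_dur, hop_dur=None, channels=1):
    """Hypothetical helper: convert durations in seconds to sizes in samples and bytes."""
    if hop_dur is None:
        hop_dur = block_dur  # default described above: no overlap
    # Assumption: fractional sample counts are rounded to the nearest integer.
    block_size = int(round(block_dur * sampling_rate))
    hop_size = int(round(hop_dur * sampling_rate))
    return {
        "block_size": block_size,             # samples returned per read()
        "hop_size": hop_size,                 # samples skipped between reads
        "overlap": block_size - hop_size,     # samples shared by adjacent blocks
        "block_bytes": block_size * sample_width * channels,
    }


# 25 ms at 8000 Hz with 2-byte samples, as in the block_dur description.
layout = block_layout(8000, 2, 0.025)
# 250 ms blocks with a 125 ms hop at 16 samples/second and 1-byte samples.
overlapped = block_layout(16, 1, 0.25, hop_dur=0.125)
```

The first call gives a block of 200 samples (400 bytes) with no overlap. The second gives 4-sample blocks that advance by 2 samples, so adjacent blocks share 2 samples.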

Examples:
  1. Create an AudioDataSource that reads data from the microphone (requires PyAudio) with default audio parameters:
from auditok import ADSFactory
ads = ADSFactory.ads()
ads.get_sampling_rate()
16000
ads.get_sample_width()
2
ads.get_channels()
1
  2. Create an AudioDataSource that reads data from the microphone with a sampling rate of 48 kHz:
from auditok import ADSFactory
ads = ADSFactory.ads(sr=48000)
ads.get_sampling_rate()
48000
  3. Create an AudioDataSource that reads data from a wave file:
import auditok
from auditok import ADSFactory
ads = ADSFactory.ads(fn=auditok.dataset.was_der_mensch_saet_mono_44100_lead_trail_silence)
ads.get_sampling_rate()
44100
ads.get_sample_width()
2
ads.get_channels()
1
  4. Define the size of read blocks as 20 ms:
import auditok
from auditok import ADSFactory
'''
we know the sampling rate of the previous file is 44100 samples/second
so 10 ms are equivalent to 441 samples and 20 ms to 882
'''
block_size = 882
ads = ADSFactory.ads(bs = block_size, fn=auditok.dataset.was_der_mensch_saet_mono_44100_lead_trail_silence)
ads.open()
# read one block
data = ads.read()
ads.close()
len(data)
1764
assert len(data) ==  ads.get_sample_width() * block_size
  5. Define block size as a duration (use block_dur or bd):
import auditok
from auditok import ADSFactory
dur = 0.25 # second
ads = ADSFactory.ads(bd = dur, fn=auditok.dataset.was_der_mensch_saet_mono_44100_lead_trail_silence)
'''
we know the sampling rate of the previous file is 44100 samples/second
for a block duration of 250 ms, block size should be 0.25 * 44100 = 11025
'''
ads.get_block_size()
11025
assert ads.get_block_size() ==  int(0.25 * 44100)
ads.open()
# read one block
data = ads.read()
ads.close()
len(data)
22050
assert len(data) ==  ads.get_sample_width() * ads.get_block_size()
  6. Read overlapping blocks (one of hop_size, hs, hop_dur or hd > 0):

For readability, this example uses auditok.io.BufferAudioSource with a string buffer:

import auditok
from auditok import ADSFactory
'''
we supply a data buffer instead of a file (keyword 'data_buffer' or 'db')
sr : sampling rate = 16 samples/sec
sw : sample width = 1 byte
ch : channels = 1
'''
buffer = "abcdefghijklmnop" # 16 bytes = 1 second of data
bd = 0.250 # block duration = 250 ms = 4 bytes
hd = 0.125 # hop duration = 125 ms = 2 bytes
ads = ADSFactory.ads(db = buffer, bd = bd, hd = hd, sr = 16, sw = 1, ch = 1)
ads.open()
ads.read()
'abcd'
ads.read()
'cdef'
ads.read()
'efgh'
ads.read()
'ghij'
data = ads.read()
assert data == 'ijkl'
  7. Limit the amount of read data (use max_time or mt):
'''
We know audio file is larger than 2.25 seconds
We want to read up to 2.25 seconds of audio data
'''
ads = ADSFactory.ads(mt = 2.25, fn=auditok.dataset.was_der_mensch_saet_mono_44100_lead_trail_silence)
ads.open()
data = []
while True:
    d = ads.read()
    if d is None:
        break
    data.append(d)
    
ads.close()
data = b''.join(data)
assert len(data) == int(ads.get_sampling_rate() * 2.25 * ads.get_sample_width() * ads.get_channels())
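
The overlapping-read example above (4-byte blocks with a 2-byte hop) can be reproduced without auditok to see where each block starts. This is an illustrative sketch only: overlapping_blocks is a hypothetical function, and dropping any trailing partial block is an assumption, since the text does not say how the library handles the end of the buffer.

```python
def overlapping_blocks(data, block_size, hop_size):
    """Yield successive blocks of block_size items, advancing by hop_size each time."""
    if not 0 < hop_size <= block_size:
        raise ValueError("hop_size must be in the range 1..block_size")
    start = 0
    # Assumption: stop as soon as a full block no longer fits.
    while start + block_size <= len(data):
        yield data[start:start + block_size]
        start += hop_size


blocks = list(overlapping_blocks("abcdefghijklmnop", block_size=4, hop_size=2))
```

The first five blocks match the reads shown in the example: 'abcd', 'cdef', 'efgh', 'ghij' and 'ijkl'. Each block repeats the last block_size - hop_size items of the previous one.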
class auditok.util.AudioEnergyValidator(sample_width, energy_threshold=45)[source]

The most basic auditok audio frame validator. This validator computes the log energy of an input audio frame and returns True if the result is >= a given threshold, False otherwise.

Parameters:
sample_width : int
Number of bytes of one audio sample. This is used to convert data from basestring or Bytes to an array of floats.
energy_threshold : float
A threshold used to check whether an input data buffer is valid.
is_valid(data)[source]

Check if data is valid. Audio data will be converted into an array (of signed values) of which the log energy is computed. Log energy is computed as follows:

arr = AudioEnergyValidator._convert(signal, sample_width)
energy = float(numpy.dot(arr, arr)) / len(arr)
log_energy = 10. * numpy.log10(energy)
Parameters:
data : either a string or a Bytes buffer

data is converted into a numerical array using the sample_width given in the constructor.

Returns:

True if log_energy >= energy_threshold, False otherwise.
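
For a sense of scale, the log-energy formula above can be evaluated with the standard library alone. This is a sketch, not the library's code: it swaps numpy for struct and math, assumes little-endian signed samples, and the floor value returned for silent or empty frames is an arbitrary stand-in. The comparison uses >=, as in the class description.

```python
import math
import struct

# struct codes for signed integers; with "<" they have standard sizes 1, 2 and 4.
_FORMATS = {1: "b", 2: "h", 4: "i"}


def log_energy(data, sample_width, floor=-200.0):
    """Return 10 * log10(mean(x ** 2)) for little-endian signed samples in data."""
    count = len(data) // sample_width
    if count == 0:
        return floor
    fmt = "<%d%s" % (count, _FORMATS[sample_width])
    samples = struct.unpack(fmt, data[:count * sample_width])
    energy = sum(s * s for s in samples) / float(count)
    # log10(0) is undefined, so silent frames get the (assumed) floor value.
    return 10.0 * math.log10(energy) if energy > 0 else floor


def is_valid(data, sample_width, energy_threshold=45):
    """Hypothetical free-function counterpart of the validator's is_valid()."""
    return log_energy(data, sample_width) >= energy_threshold


loud = struct.pack("<4h", 1000, -1000, 1000, -1000)   # mean square = 1e6
quiet = struct.pack("<4h", 10, -10, 10, -10)          # mean square = 100
```

With 16-bit samples of amplitude 1000 the log energy is about 60, above the default threshold of 45. With amplitude 10 it is about 20, so the frame is rejected.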