from nbdev.config import get_config
from tsfast.data.core import CreateDict, ValidClmContains,DfHDFCreateWindows
Core functions
Core functionality for data preparation of sequential data for pytorch and fastai models
5. Dataloaders Creation
A Datasets object combines all implemented components at the item level.
pad_sequence
pad_sequence (batch, sorting=False)
collate_fn for padding sequences of different lengths; use it in before_batch of a databunch. Still quite slow.
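To illustrate the idea behind this collate step, here is a minimal, purely illustrative sketch (not the library implementation, which operates on tensors): every sequence in a batch is right-padded to the length of the longest one so the batch can be stacked.

```python
# Illustrative sketch of batch padding; the hypothetical pad_batch
# works on plain lists, while tsfast's pad_sequence pads tensors.
def pad_batch(batch, pad_value=0.0):
    """Right-pad each sequence in `batch` to the longest length."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_value] * (max_len - len(seq)) for seq in batch]

batch = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
padded = pad_batch(batch)  # every sequence now has length 3
```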
5.1 Low-Level with Transforms
project_root = get_config().config_file.parent
f_path = project_root / 'test_data/WienerHammerstein'
hdf_files = get_files(f_path, extensions='.hdf5', recurse=True)
tfm_src = CreateDict([ValidClmContains(['valid']), DfHDFCreateWindows(win_sz=100+1, stp_sz=10, clm='u')])
src_dicts = tfm_src(hdf_files)
src_dicts
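The window creation above can be sketched in plain Python. The following is an illustrative stand-in (not the DfHDFCreateWindows implementation) showing how a series of length n yields overlapping windows for win_sz=101 and stp_sz=10:

```python
# Illustrative sketch of sliding-window extraction: start indices
# advance by stp_sz, each window spans win_sz samples.
def window_bounds(n, win_sz=101, stp_sz=10):
    """Return (start, stop) index pairs of every full window."""
    return [(s, s + win_sz) for s in range(0, n - win_sz + 1, stp_sz)]

bounds = window_bounds(1000)  # 90 windows for a 1000-sample signal
```

The window count follows (n - win_sz) // stp_sz + 1; only full windows are kept.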
tfms = [[HDF2Sequence(['u','y']), SeqSlice(l_slc=1), toTensorSequencesInput],
        [HDF2Sequence(['y']), SeqSlice(r_slc=-1), toTensorSequencesOutput]]
splits = PercentageSplitter()([x['path'] for x in src_dicts])
dsrc = Datasets(src_dicts, tfms=tfms, splits=splits)
# %%timeit
# dsrc[0]
db = dsrc.dataloaders(bs=128, after_batch=[SeqNoiseInjection(std=[1.1,0.01]), Normalize(axes=[0,1])], before_batch=pad_sequence)
db.one_batch()[0].shape
torch.Size([128, 100, 2])
The input batch has shape (batch size, sequence length, channels): 128 windows of 100 time steps (the 101-sample windows lose one step to SeqSlice) with the two signals u and y.
5.2 Mid-Level with Datablock API
SequenceBlock
SequenceBlock (seq_extract, padding=False)
A basic wrapper that links default transforms for the data block API
seq = DataBlock(blocks=(SequenceBlock.from_hdf(['u','y'], TensorSequencesInput, padding=True, cached=None),
                        SequenceBlock.from_hdf(['y'], TensorSequencesOutput, cached=None)),
                get_items=tfm_src,
                splitter=ApplyToDict(ParentSplitter()))
dls = seq.dataloaders(hdf_files)
ScalarBlock
ScalarBlock (scl_extract)
A basic wrapper that links default transforms for the data block API
ScalarNormalize
ScalarNormalize (mean=None, std=None, axes=(0,))
A transform with a __repr__ that shows its attrs
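As a hedged sketch of the normalization ScalarNormalize presumably performs (subtracting the mean and dividing by the standard deviation computed over the given axes; the function name and exact behavior here are assumptions, not the library code):

```python
import numpy as np

# Illustrative sketch: normalize over axes=(0,), i.e. per column,
# so each feature ends up with zero mean and unit std.
def scalar_normalize(x, axes=(0,)):
    mean = x.mean(axis=axes, keepdims=True)
    std = x.std(axis=axes, keepdims=True)
    return (x - mean) / std

x = np.array([[1.0, 10.0], [3.0, 30.0]])
z = scalar_normalize(x)  # each column has zero mean, unit std
```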