builtins.object
    DLStudio

class DLStudio(builtins.object)

    DLStudio(*args, **kwargs)
Methods defined here:
- __init__(self, *args, **kwargs)
- Initialize self. See help(type(self)) for accurate signature.
- build_convo_layers(self, configs_for_all_convo_layers)
- build_fc_layers(self)
- check_a_sampling_of_images(self)
- Displays the first batch_size number of images in your dataset.
- display_tensor_as_image(self, tensor, title='')
- This method converts the argument tensor into a photo image that you can display
in your terminal screen. It can convert tensors of three different shapes
into images: (3,H,W), (1,H,W), and (H,W), where H, for height, stands for the
number of pixels in the vertical direction and W, for width, for the same
along the horizontal direction. When the first element of the shape is 3,
that means that the tensor represents a color image in which each pixel in
the (H,W) plane has three values for the three color channels. On the other
hand, when the first element is 1, that stands for a tensor that will be
shown as a grayscale image. And when the shape is just (H,W), that is
automatically taken to be for a grayscale image.
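For instance, here is a minimal usage sketch (the constructor options shown are illustrative assumptions, and the tensors are synthetic stand-ins for real images):

    import torch
    from DLStudio import DLStudio

    dls = DLStudio(dataroot="./data/", batch_size=4)   # constructor options assumed
    color_img = torch.rand(3, 32, 32)       # (3,H,W) -> displayed as a color image
    gray_img  = torch.rand(1, 32, 32)       # (1,H,W) -> displayed as grayscale
    plain_img = torch.rand(32, 32)          # (H,W)   -> also treated as grayscale
    dls.display_tensor_as_image(color_img, title="random color image")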
- imshow(self, img)
- called by display_tensor_as_image() for displaying the image
- load_cifar_10_dataset(self)
We make sure that the transformations applied to the images end with the images being normalized.
Consider this call to normalize: "Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))". The three
numbers in the first tuple affect the means in the three color channels and the three
numbers in the second tuple affect the standard deviations. In this case, we want the
image value in each channel to be changed to:
image_channel_val = (image_channel_val - mean) / std
So with mean and std both set to 0.5 for all three channels, if the image tensor originally
was between 0 and 1.0, after this normalization, the tensor will be between -1.0 and +1.0.
If needed we can do inverse normalization by
image_channel_val = (image_channel_val * std) + mean
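Here is a minimal sketch of this normalization and its inverse, using only standard torchvision calls:

    import torch
    from torchvision import transforms

    transform = transforms.Compose([
        transforms.ToTensor(),                                   # scales pixel values to [0.0, 1.0]
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # maps [0,1] to [-1,+1]
    ])

    # inverse normalization for display:  val = val * std + mean
    def denormalize(img, mean=0.5, std=0.5):
        return img * std + mean

    x = torch.rand(3, 32, 32)                        # stand-in for a [0,1] image tensor
    y = transforms.Normalize((0.5,)*3, (0.5,)*3)(x)  # now in [-1,+1]
    assert torch.allclose(denormalize(y), x, atol=1e-6)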
- load_cifar_10_dataset_with_augmentation(self)
In general, we want to apply data augmentation to the training images but not to the test images; see the sketch below.
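A sketch of what such augmentation might look like for CIFAR-10 style 32x32 images (the specific transforms below are illustrative choices, not necessarily the ones used in this method):

    from torchvision import transforms

    train_transform = transforms.Compose([
        transforms.RandomCrop(32, padding=4),     # random shifts via padded cropping
        transforms.RandomHorizontalFlip(),        # left-right flips with probability 0.5
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])
    # no augmentation for testing --- only the normalization:
    test_transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])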
- parse_config_string_for_convo_layers(self)
- Each collection of 'n' otherwise identical layers in a convolutional network is
specified by a string that looks like:
"nx[a,b,c,d]-MaxPool(k)"
where
n = num of this type of convo layer
a = number of out_channels [in_channels determined by prev layer]
b,c = kernel for this layer is of size (b,c) [b along height, c along width]
d = stride for convolutions
k = maxpooling over kxk patches with stride of k
Example:
"n1x[a1,b1,c1,d1]-MaxPool(k1) n2x[a2,b2,c2,d2]-MaxPool(k2)"
- run_code_for_testing(self, net, display_images=False)
- run_code_for_training(self, net, display_images=False)
- save_model(self, model)
- Save the trained model to a disk file
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Data and other attributes defined here:
- CustomDataLoading = <class 'DLStudio.DLStudio.CustomDataLoading'>
- This is a testbed for experimenting with a completely grounds-up attempt at
designing a custom data loader. Ordinarily, if the basic format of how the
dataset is stored is similar to one of the datasets that the Torchvision
module knows about, you can go ahead and use that for your own dataset. At
worst, you may need to carry out some light customizations depending on the
number of classes involved, etc.
However, if the underlying dataset is stored in a manner that does not look
like anything in Torchvision, you have no choice but to supply yourself all
of the data loading infrastructure. That is what this inner class of the
DLStudio module is all about.
The custom data loading exercise here is related to a dataset called
PurdueShapes5 that contains 32x32 images of binary shapes belonging to the
following five classes:
1. rectangle
2. triangle
3. disk
4. oval
5. star
The dataset was generated by randomizing the sizes and the orientations
of these five patterns. Since the patterns are rotated with a very simple
non-interpolating transform, just the act of random rotations can introduce
boundary and even interior noise in the patterns.
Each 32x32 image is stored in the dataset as the following list:
[R, G, B, Bbox, Label]
where
R : is a 1024 element list of the values for the red component
of the color at all the pixels
G : the same as above but for the green component of the color
B : the same as above but for the blue component of the color
Bbox : a list like [x1,y1,x2,y2] that defines the bounding box
for the object in the image
Label : the shape of the object
I serialize the dataset with Python's pickle module and then compress it with
the gzip module.
You will find the following dataset directories in the "data" subdirectory
of Examples in the DLStudio distro:
PurdueShapes5-10000-train.gz
PurdueShapes5-1000-test.gz
PurdueShapes5-20-train.gz
PurdueShapes5-20-test.gz
The number that follows the main name string "PurdueShapes5-" is for the
number of images in the dataset.
You will find the last two datasets, with 20 images each, useful for debugging
your logic for object detection and bounding-box regression.
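Given the storage format described above, a bare-bones custom dataset class for one of these archives might look like the following sketch. The class name is made up, and the assumption that each archive unpickles to a plain list of records is mine; only the [R, G, B, Bbox, Label] record format comes from the description above:

    import gzip, pickle, torch
    from torch.utils.data import Dataset

    class PurdueShapes5Sketch(Dataset):
        def __init__(self, archive_path):
            with gzip.open(archive_path, "rb") as f:
                self.records = pickle.load(f)    # assumed: a list of [R,G,B,Bbox,Label] records
        def __len__(self):
            return len(self.records)
        def __getitem__(self, idx):
            R, G, B, bbox, label = self.records[idx]
            # each channel is a 1024-element list for a 32x32 image:
            img = torch.tensor([R, G, B], dtype=torch.float32).view(3, 32, 32)
            return img, torch.tensor(bbox, dtype=torch.float32), label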
- DetectAndLocalize = <class 'DLStudio.DLStudio.DetectAndLocalize'>
- The purpose of this inner class is to focus on object detection in images --- as
opposed to image classification. Most people would say that object detection
is a more challenging problem than image classification because, in general,
the former also requires localization. The simplest interpretation of what
is meant by localization is that the code that carries out object detection
must also output a bounding-box rectangle for the object that was detected.
You will find in this inner class some examples of LOADnet classes meant
for solving the object detection and localization problem. The acronym
"LOAD" in "LOADnet" stands for
"LOcalization And Detection"
The different network examples included here are LOADnet1, LOADnet2, and
LOADnet3. For now, only pay attention to LOADnet2 since that's the class I
have worked with the most for the 1.0.7 distribution.
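To convey the two-headed design that localization requires, here is a bare-bones sketch of a network with both a classification output and a bounding-box regression output (this is not the LOADnet2 architecture, just an illustration of the pattern):

    import torch
    import torch.nn as nn

    class TwoHeadedSketch(nn.Module):
        def __init__(self, num_classes=5):
            super().__init__()
            self.backbone = nn.Sequential(               # assumes 3x32x32 input images
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Flatten(),
            )
            self.class_head = nn.Linear(32 * 16 * 16, num_classes)  # object label
            self.bbox_head  = nn.Linear(32 * 16 * 16, 4)            # [x1,y1,x2,y2]
        def forward(self, x):
            feats = self.backbone(x)
            return self.class_head(feats), self.bbox_head(feats)

Training such a network would use two losses, e.g. nn.CrossEntropyLoss for the label and nn.MSELoss for the bounding-box coordinates.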
- ExperimentsWithCIFAR = <class 'DLStudio.DLStudio.ExperimentsWithCIFAR'>
- ExperimentsWithSequential = <class 'DLStudio.DLStudio.ExperimentsWithSequential'>
- Demonstrates how to use the torch.nn.Sequential container class
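For reference, here is a minimal example of the container class being demonstrated; the output of each module in the chain becomes the input to the next:

    import torch.nn as nn

    model = nn.Sequential(                         # assumes 3x32x32 input images
        nn.Conv2d(3, 16, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(16 * 16 * 16, 10),
    )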
- Net = <class 'DLStudio.DLStudio.Net'>
- SemanticSegmentation = <class 'DLStudio.DLStudio.SemanticSegmentation'>
- The purpose of this inner class is to be able to use the DLStudio module for
experiments with semantic segmentation. At its simplest level, the
purpose of semantic segmentation is to assign correct labels to the
different objects in a scene, while localizing them at the same time. At
a more sophisticated level, a system that carries out semantic
segmentation should also output a symbolic expression based on the objects
found in the image and their spatial relationships with one another.
The workhorse of this inner class is the mUnet network that is based
on the UNET network that was first proposed by Ronneberger, Fischer and
Brox in the paper "U-Net: Convolutional Networks for Biomedical Image
Segmentation". Their Unet extracts binary masks for the cell pixel blobs
of interest in biomedical images. The output of their Unet can
therefore be treated as a pixel-wise binary classifier at each pixel
position. The mUnet class, on the other hand, is intended for
segmenting out multiple objects simultaneously from an image. [A weaker
reason for "Multi" in the name of the class is that it uses skip
connections not only across the two arms of the "U", but also along
the arms. The skip connections in the original Unet are only between the
two arms of the U.] In mUnet, each object type is assigned a separate
channel in the output of the network.
This version of DLStudio also comes with a new dataset,
PurdueShapes5MultiObject, for experimenting with mUnet. Each image in
this dataset contains a random number of selections from five different
shapes, with the shapes being randomly scaled, oriented, and located in
each image. The five different shapes are: rectangle, triangle, disk,
oval, and star.
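Since each object type gets its own channel in the network's output, converting that output into a per-pixel label map is just an argmax over the channel dimension, as in this sketch (the shapes and the channel-to-shape assignment are illustrative):

    import torch

    batch_size, num_shapes, H, W = 4, 5, 64, 64
    logits = torch.randn(batch_size, num_shapes, H, W)   # one channel per object type
    label_map = logits.argmax(dim=1)                     # (B,H,W) per-pixel class labels
    disk_mask = (label_map == 2)                         # boolean mask, assuming channel 2 is "disk"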
- Seq2SeqWithLearnableEmbeddings = <class 'DLStudio.DLStudio.Seq2SeqWithLearnableEmbeddings'>
- As the name implies, sequence-to-sequence (Seq2Seq) learning is about predicting an outcome
sequence from a causation sequence, or, said another way, a target sequence from a source
sequence. Automatic machine translation is probably one of the most popular applications
of Seq2Seq learning. Since deep learning algorithms can only deal with numerical data,
an important issue related to Seq2Seq for machine translation is how to represent the
purely symbolic entities (such as words) with numbers. This is the same issue that was
addressed in the TextClassification class of DLStudio. As mentioned there, we have the
following choices:
1. use one-hot vectors for the words
2. learn the embeddings directly from the training data
3. use pre-trained embedding vectors for the words (as provided by word2vec
   and fasttext)
As previously mentioned in the context of text classification, using one-hot vectors
directly is out of the question. So that leaves us with just two options: learning the
embeddings directly from the training data and using pre-trained embeddings.
The goal of this class, Seq2SeqWithLearnableEmbeddings, is to illustrate the basic
notions of Seq2Seq learning with learnable embeddings for the words in a vocabulary.
I'll use the problem of English-to-Spanish translation as a case study for the code
shown in this class.
Basic to any modern implementation of Seq2Seq learning is the notion of attention.
In general, the different grammatical units in a source-language sentence will not
align with the corresponding units in a translation of the same sentence in the
target language. Consider the following English-Spanish phrasal pair:
the cabin roof
el techo de la cabaña
The word "techo" in Spanish means "roof". A word-for-word translation of the English
phrase would lead to "la cabaña techo", which is unlikely to be understood by a native
speaker of the Spanish language. The goal of attention is for a seq2seq framework
to learn how to align the different parts of a pair of sentences in two different
languages. The attention models I will use here are explained in the slides for the
seq2seq lecture at the deep-learning course website at Purdue.
Regarding the dataset I'll be using to demonstrate seq2seq: version 2.0.9 of DLStudio
comes with a data archive named en_es_corpus that contains a large number of
English-Spanish sentence pairs. This archive is a lightly curated version of the
main dataset provided at
http://www.manythings.org/anki/
The data at the above website is from the sentences_detailed.csv file at tatoeba.org:
http://tatoeba.org/files/downloads/sentences_detailed.csv
The curated data archive that you can download from the DLStudio website includes the
copyright notice from tatoeba.org.
My alteration to the original dataset consists mainly of expanding the contractions
like "it's", "I'm", "don't", "didn't", "you'll", etc., into their expansions "it is",
"i am", "do not", "did not", "you will", etc. The English/Spanish dataset as provided
at the above URL contains 417 such unique contractions. Another alteration I made to
the original data archive is to surround each sentence in both English and Spanish
by the "SOS" and "EOS" tokens, with the former standing for "Start of Sentence" and
the latter for "End of Sentence".
I have used the following convention for naming data archives at the DLStudio website:
en_es_N_M.tar.gz
where N specifies the maximum number of words in the sentences in the archive and M is
the total number of sentence pairs available. For example, the name of one of the archives
is:
en_es_8_98988.tar.gz
This archive contains a total of 98988 sentence pairs, with no sentence exceeding 8 words
in length.
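As a concrete picture of what "learnable embeddings" means here, an nn.Embedding layer maps integer word indexes to vectors whose values are trained along with the rest of the network (the sizes and index values below are made up):

    import torch
    import torch.nn as nn

    vocab_size, embed_size = 12000, 256            # illustrative values
    embed = nn.Embedding(vocab_size, embed_size)   # weights learned during training

    # "SOS tom is here EOS" as integer word indexes (hypothetical index values):
    sentence = torch.tensor([[0, 57, 21, 103, 1]])
    vectors = embed(sentence)                      # shape: (1, 5, 256)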
Class Path: DLStudio -> Seq2SeqWithLearnableEmbeddings
- Seq2SeqWithPretrainedEmbeddings = <class 'DLStudio.DLStudio.Seq2SeqWithPretrainedEmbeddings'>
- Please read the doc section of the previous DLStudio class, Seq2SeqWithLearnableEmbeddings,
for the basic documentation that also applies to the class being presented here.
While the previous class shows how to carry out Seq2Seq learning when you allow the
framework to learn its own numeric embeddings for the words, in the class shown in
this section of DLStudio we use the pre-trained word2vec embeddings from Google for
the source language sentences.
At the moment, I am using the pre-trained embeddings for only the source language
sentence because of the constraints on the fast memory that come into existence
when you use pre-trained embeddings for multiple languages simultaneously. My
original plan was to use word2vec embeddings for the source language English and
the Fasttext embeddings for the target language Spanish. The pre-trained word2vec
embeddings for English occupy nearly 4GB of RAM and the pre-trained Fasttext
embeddings another 8GB. The two objects co-residing in fast memory bring
a 32GB machine to its knees.
Another interesting thing to keep in mind is the two different ways in which the
target language is used in seq2seq learning. In addition to the word embeddings
needed for the decoder GRU, you also use the integer word indexes directly for the
following reason: You see, one would like to use nn.LogSoftmax for the final
activation in the overall network and nn.NLLLoss for the loss. These choices
allow you to use the classifier-network principles for training. That is, you
ask the decoder to correctly label the next output word by giving it a class
label that is an integer index spanning the size of the target vocabulary.
With nn.NLLLoss, for the target needed by the loss function, all you need to do is
to supply it with the integer index of the ground-truth target word.
For the classifier based logic mentioned above to work, you need to ensure that
the output layer of the decoder network has the same number of nodes as the size
of the target vocabulary. As mentioned above, during training, for calculating the
loss, the nn.NLLLoss is supplied with the integer index of the target word at that
step of the decoder RNN. The loss function returns the negative of the value
stored in the corresponding output node of the network. Recall, the values in
the output nodes would be produced by the application of nn.LogSoftmax to the
values calculated there by forward propagation.
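Here is a minimal sketch of that classifier-style loss computation for a single decoder step (the vocabulary size and the ground-truth index are made-up numbers):

    import torch
    import torch.nn as nn

    target_vocab_size = 11000                        # one output node per target-vocabulary word
    decoder_out = torch.randn(1, target_vocab_size)  # stand-in for the decoder's final linear layer
    log_probs = nn.LogSoftmax(dim=1)(decoder_out)    # final activation
    criterion = nn.NLLLoss()
    ground_truth_index = torch.tensor([4235])        # integer index of the correct next word
    loss = criterion(log_probs, ground_truth_index)  # equals -log_probs[0, 4235]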
An alternative to using classifier-network based principles for guiding the design
of the decoder would be to cast the problem of predicting the output word as an
exercise in regression (when using pre-trained embeddings for both the source and
the target languages). I have played with that approach. Eventually, I gave up
on it because it yielded poor results even on short sequences.
I should also mention that the attention mechanism used in this class is exactly
the same as for the case of learnable embeddings, and the need for attention is the same.
I have used the same dataset for the demonstrations that follow as in the previous
class with learnable embeddings. Please see the doc section of
Seq2SeqWithLearnableEmbeddings for the dataset related information.
Class Path: DLStudio -> Seq2SeqWithPretrainedEmbeddings
- SkipConnections = <class 'DLStudio.DLStudio.SkipConnections'>
- This educational class is meant for illustrating the concepts related to the
use of skip connections in neural networks. It is now well known that deep
networks are difficult to train because of the vanishing gradients problem.
What that means is that as the depth of a network increases, the loss gradients
calculated for the early layers become more and more muted, which suppresses
the learning of the parameters in those layers. An important mitigation
strategy for addressing this problem consists of creating a CNN using blocks
with skip connections.
With the code shown in this inner class of the module, you can now experiment
with skip connections in a CNN to see how a deep network with this feature
might improve the classification results. As you will see in the code shown
below, the network that allows you to construct a CNN with skip connections
is named BMEnet. As shown in the script playing_with_skip_connections.py in
the Examples directory of the distribution, you can easily create a CNN with
arbitrary depth just by using the "depth" constructor option for the BMEnet
class. The basic block of the network constructed by BMEnet is called
SkipBlock which, very much like the BasicBlock in ResNet-18, has a couple of
convolutional layers whose output is combined with the input to the block.
Note that the value given to the "depth" constructor option for the
BMEnet class does NOT translate directly into the actual depth of the
CNN. [Again, see the script playing_with_skip_connections.py in the Examples
directory for how to use this option.] The value of "depth" is translated
into how many instances of SkipBlock to use for constructing the CNN.
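To illustrate the basic pattern of a block whose convolutional output is combined with its input, here is a minimal residual-style sketch (this is not the actual SkipBlock code, just the underlying idea):

    import torch
    import torch.nn as nn

    class MiniSkipBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
            self.bn1 = nn.BatchNorm2d(channels)
            self.bn2 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU()
        def forward(self, x):
            out = self.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            return self.relu(out + x)    # skip connection: add the block's input back in

    y = MiniSkipBlock(16)(torch.rand(1, 16, 32, 32))   # output shape matches the input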
- TextClassification = <class 'DLStudio.DLStudio.TextClassification'>
- The purpose of this inner class is to be able to use the DLStudio module for simple
experiments in text classification. Consider, for example, the problem of automatic
classification of variable-length user feedback: you want to create a neural network
that can label an uploaded product review of arbitrary length as positive or negative.
One way to solve this problem is with a recurrent neural network in which you use a
hidden state for characterizing a variable-length product review with a fixed-length
state vector. This inner class allows you to carry out such experiments.
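A minimal sketch of that idea, in which the final hidden state of a GRU serves as the fixed-length summary of a variable-length review (the sizes and the class name are made up; this is not the network defined inside this class):

    import torch
    import torch.nn as nn

    class ReviewClassifierSketch(nn.Module):
        def __init__(self, input_size, hidden_size=512):
            super().__init__()
            self.gru = nn.GRU(input_size, hidden_size, batch_first=True)
            self.fc = nn.Linear(hidden_size, 2)    # two labels: negative, positive
        def forward(self, review):                 # review: (batch, seq_len, input_size)
            _, hidden = self.gru(review)           # hidden: the fixed-length state vector
            return self.fc(hidden[-1])             # classify from the final hidden state

    net = ReviewClassifierSketch(input_size=300)
    scores = net(torch.rand(4, 17, 300))           # batch of 4 reviews, each 17 words long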
Class Path: DLStudio -> TextClassification
- TextClassificationWithEmbeddings = <class 'DLStudio.DLStudio.TextClassificationWithEmbeddings'>
- The text processing class described previously, TextClassification, was based on
using one-hot vectors for representing the words. The main challenge we faced
with one-hot vectors was that the larger the size of the training dataset, the
larger the size of the vocabulary, and, therefore, the larger the size of the
one-hot vectors. The increase in the size of the one-hot vectors led to a
model with a significantly larger number of learnable parameters --- and, that,
in turn, created a need for a still larger training dataset. Sounds like a classic
example of a vicious circle. In this section, I use the idea of word embeddings
to break out of this vicious circle.
Word embeddings are fixed-sized numerical representations for words that are
learned on the basis of the similarity of word contexts. The original and still
the most famous of these representations are known as the word2vec
embeddings. The embeddings that I use in this section consist of pre-trained
300-element word vectors for 3 million words and phrases as learned from Google
News reports. I access these embeddings through the popular Gensim library.
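As a minimal sketch of accessing those vectors through Gensim (using the standard gensim-data name for the Google News vectors; the first call downloads a large archive):

    import gensim.downloader

    # 300-dimensional vectors for 3 million words and phrases from Google News:
    word_vectors = gensim.downloader.load("word2vec-google-news-300")
    vec = word_vectors["excellent"]      # a 300-element numpy vector
    print(vec.shape)                     # (300,)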
Class Path: DLStudio -> TextClassificationWithEmbeddings