---
title: Computer Vision Introduction (Beginner)
keywords: fastai
sidebar: home_sidebar
summary: "A quick overview of the main features in the vision API"
description: "A quick overview of the main features in the vision API"
nb_path: "nbs/02_vision.intro.ipynb"
---
In this notebook we'll look at the main bits and pieces that revolve around the Computer Vision sublibrary in fastai. We won't train a model; instead we'll show a few vision-specific functions, briefly explain at a high level what they do, and show examples.
fastai utilizes the `Pillow` library to open images and apply transforms. To open up any image using `Pillow` inside the fastai library, we have `PILImage.create`:
```python
from fastai.vision.all import *
```
We'll quickly grab the `PETS` dataset to examine color images:
```python
path = untar_data(URLs.PETS)
fnames = get_image_files(path/'images')
```
And open one of them by its filename:

```python
im = PILImage.create(fnames[0])
```
We can show the image with `im.show()`:

```python
im.show()
```
We can also call the usual functions you may find in `Pillow`, such as `.shape` and `.size`:

```python
im.shape, im.size
```
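One thing worth noticing here is the ordering convention. Plain `Pillow` reports `.size` as `(width, height)`, while fastai's `.shape` follows the tensor convention of `(height, width)`. A minimal sketch using plain `Pillow` (which fastai's `PILImage` builds on) makes the distinction concrete:

```python
from PIL import Image

# Pillow constructors and .size use (width, height)
im = Image.new("RGB", (320, 240))
print(im.size)  # (320, 240)

# fastai's PILImage additionally exposes .shape as (height, width),
# matching tensor conventions; with plain Pillow the equivalent is:
h, w = im.height, im.width
print((h, w))   # (240, 320)
```

Keeping this straight avoids a common source of transposed-dimension bugs when mixing image libraries and tensors.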
`PILImage` can accept a variety of inputs to cover the most common types you will see in the wild:

- `TensorImage` (fastai specific)
- `TensorMask` (fastai specific)
- `Tensor`
- `ndarray`
- `bytes` (it uses `io.BytesIO` to open it)
- file paths, through `Pillow`'s `Image.open` function

We can also have black and white images, which have their own class, `PILImageBW`. We'll see an example with `MNIST` below:
```python
path_m = untar_data(URLs.MNIST_SAMPLE)
imgs = get_image_files(path_m/'train')
im_bw = PILImageBW.create(imgs[0])
```
And we can show it just like the previous one:

```python
im_bw.show()
```
Along with `PILImage` and `PILImageBW`, we have `PILMask`, designed to open masks. Let's see a quick example:
```python
path_c = untar_data(URLs.CAMVID_TINY)
msks = get_image_files(path_c/'labels')
msk = PILMask.create(msks[0])
msk.show()
```
Each of these classes inherits from `PILBase`, a simple class that expands the usage of `Image.Image`.
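To give a feel for how a single `create` method can accept so many different input types, here is a simplified, illustrative sketch. This is *not* fastai's actual implementation — just a toy class showing the type-based dispatch idea, with hypothetical return strings:

```python
from io import BytesIO
from pathlib import Path

class ImageOpener:
    """Toy sketch (not fastai's real PILBase/PILImage code) of a create
    classmethod that branches on the type of its input."""
    @classmethod
    def create(cls, x):
        if isinstance(x, (str, Path)):
            # file paths go straight to an image-opening call
            return f"open file {Path(x).name}"
        if isinstance(x, (bytes, bytearray)):
            # raw bytes are wrapped in a file-like object first
            return f"open {len(x)} bytes via {type(BytesIO(x)).__name__}"
        # arrays/tensors would be converted to an image here
        return f"convert {type(x).__name__}"

print(ImageOpener.create("cat.jpg"))   # open file cat.jpg
print(ImageOpener.create(b"\x89PNG"))  # open 4 bytes via BytesIO
```

The real fastai classes do the same kind of branching, but return actual `Pillow` image objects.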
## DataBlock
When using vision in the DataBlock API, two blocks are generally used:
When specifying that we want a black and white image, we can pass a `cls` to `ImageBlock` like so:

```python
ImageBlock(cls=PILImageBW)
```
There are more tasks than just Semantic Segmentation, so the entire list of vision-related blocks is below:
Each subsection of the library tends to have its own special `Learner` wrappers that apply a bit of magic. For Computer Vision, these come in the form of `cnn_learner` and `unet_learner`.
A quick high-level explanation of `cnn_learner`: we pass in a callable backbone model, fastai freezes its weights, and applies its own custom head on top with two pooling layers. You will see this referenced the most in the vision section of this website.
`unet_learner` is a method for generating a `Learner` paired with the Dynamic U-Net architecture, and is designed specifically for segmentation (though this model can be used for other tasks).
`GANLearner`, as the name suggests, is a `Learner` to be used when working with GANs. Its API differs altogether from the previous two, given that GANs operate on a generator/discriminator dynamic.
```python
item_tfms = [Resize(224)]
batch_tfms = [*aug_transforms(size=256)]
```
`aug_transforms` will generate a few random transforms that are applied efficiently on your batch.