Manual#

Coordinates#

Our choice of Cartesian coordinate system matches the DICOM standard, where \(x\) points from patient right to patient left, \(y\) points from anterior to posterior, and \(z\) points from inferior to superior.

SPECT Imaging#

A positive scanner angle \(\beta\) in the DICOM standard is defined as a counterclockwise rotation of angle \(\beta\) from 12 o'clock when looking into the scanner. Compared to the standard azimuthal angle of our Cartesian coordinate system, it follows that \(\phi = 3 \pi / 2 - \beta\). Since \(\beta\) is defined in the range \([0, 2\pi]\), it follows that \(\phi\) is defined in the range \([-\pi/2, 3\pi/2]\).
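As a quick check of this conversion, a minimal helper (hypothetical, not part of PyTomography):

```python
import numpy as np

def dicom_to_azimuthal(beta: float) -> float:
    """Convert a DICOM scanner angle beta (radians, in [0, 2*pi]) to the
    azimuthal angle phi of the Cartesian system: phi = 3*pi/2 - beta."""
    return 3 * np.pi / 2 - beta

print(dicom_to_azimuthal(0.0))        # 4.712... (phi = 3*pi/2 at 12 o'clock)
print(dicom_to_azimuthal(2 * np.pi))  # -1.570... (phi = -pi/2)
```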

There are two primary coordinate systems considered here:

  1. Cartesian: Specified by the \(x\), \(y\), and \(z\) coordinates above. Any item in this coordinate system is referred to as an Object.

  2. Sinogram: Specified by \(r\), \(\beta\), and \(z\). Sinogram space is used to represent a series of 2D scans (in the \(r\)-\(z\) plane) at different angles \(\beta\). Any item in this coordinate system is referred to as an Image.

As a convention, \(r\) is aligned with the \(x\)-axis at \(\beta=0\). (Note this implies that \(r\) is aligned with the negative \(y\)-axis at \(\beta=90^{\circ}\), which can be counterintuitive when viewing images.)


Datatypes#

Objects/Images#

Objects and images are stored using PyTorch's torch.Tensor class. The dimensions of objects are \([1, L_x, L_y, L_z]\) and the dimensions of images are \([1,...]\), where ... depends on the imaging modality.

  • SPECT: Image has dimensions \([1,L_{\theta}, L_r, L_z]\) where \(L_{\theta}\) is the number of projections and \(L_r\) and \(L_z\) give the dimensions of the scanner.

  • 2D PET: Image has dimensions \([1,L_{\theta}, L_r, L_z]\) where \(L_{\theta}\) is the number of discrete angles considered and \(L_r\) and \(L_z\) give the number of radial and axial bins.

Indices are arranged such that smaller indices correspond to smaller coordinate values. For example, object_tensor[0,-1,0,0] gives the voxel at the largest value of \(x\), and the smallest values of \(y\) and \(z\). As another example, image_tensor[0,10,0,-1] for SPECT imaging returns the number of counts detected by the pixel at the 10th detector angle corresponding to the smallest value of \(r\) and the largest value of \(z\).
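Both examples can be expressed directly in code; the tensor shapes here are chosen purely for illustration:

```python
import torch

# Toy object with L_x = L_y = L_z = 128 (shapes are illustrative)
object_tensor = torch.rand(1, 128, 128, 128)
voxel = object_tensor[0, -1, 0, 0]   # largest x, smallest y and z

# Toy SPECT image with 64 projection angles and a 128x128 detector
image_tensor = torch.rand(1, 64, 128, 128)
counts = image_tensor[0, 10, 0, -1]  # 10th angle, smallest r, largest z
```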

Mathematical Foundations#

Throughout tutorials and documentation, mathematical notation is often used to represent different operations. Unless otherwise specified, the symbols refer to the following:

  • \(f\) refers to an object, and \(f_j\) refers to the value of the object at voxel \(j\)

  • \(g\) refers to an image, and \(g_i\) refers to the value of the image at detector element \(i\)

  • \(H\) refers to the system matrix with components \(H_{ij}\): the contribution voxel \(j\) in object space makes to detector element \(i\) in image space

This section establishes a mathematical paradigm for tomography in medical imaging, and is thus mostly intended for those who wish to use PyTomography to implement novel reconstruction algorithms. It is nonetheless useful background for all users.

Projections#

PyTomography is built around two fundamental operations used in image reconstruction: Forward Projection and Back Projection.

  • Forward Projection: Takes an object \(a\) (in vector space \(\mathbb{U}\)) and maps it to an image \(b\) (in vector space \(\mathbb{V}\)) using the system matrix: \(b_i = \sum_{j} H_{ij} a_j\) (or \(b = Ha\)). This operation is implemented by the forward method of a SystemMatrix class corresponding to a particular imaging modality.

  • Back Projection: Takes an image \(b\) (in vector space \(\mathbb{V}\)) and maps it to an object \(\hat{a}\) (in vector space \(\mathbb{U}\)) using the transpose of the system matrix: \(\hat{a}_j = \sum_{i} H_{ij} b_i\) (or \(\hat{a}=H^T b\)). This operation is implemented by the backward method of a SystemMatrix class. A toy sketch of both operations is given below.

Note that projections map between distinct vector spaces: \(H:\mathbb{U} \to \mathbb{V}\). This makes them distinct from transforms, which are discussed next.
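As a toy illustration with an explicit dense system matrix (the shapes, names, and dense representation here are for this sketch only, not the PyTomography API):

```python
import torch

n_voxels, n_detectors = 16, 8           # tiny toy spaces U and V
H = torch.rand(n_detectors, n_voxels)   # dense toy system matrix

a = torch.rand(n_voxels)                # object in U

b = H @ a          # forward projection: b = H a (object -> image)
a_hat = H.T @ b    # back projection: a_hat = H^T b (image -> object)
```

Note that \(H^T\) is not the inverse of \(H\): back projection does not undo forward projection, but it is the adjoint operation required by the reconstruction algorithms discussed later.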

Transforms#

Consider the case of a 128x128x128 object being scanned at 64 different angles, each with resolution 128x128: in this situation, the object is a vector of length 2,097,152 and the image is a vector of length 1,048,576. If each component \(H_{ij}\) were stored using an 8-byte float, the system matrix would require 17.6 TB of storage. Fortunately, \(H\) is a sparse matrix containing mostly zeros, and can be implemented using efficient techniques. One of the building blocks for these techniques is the transform.
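The storage estimate can be verified with a quick computation:

```python
# Dense storage requirement for H (sizes from the example above)
n_object = 128 ** 3          # 2,097,152 voxels
n_image = 64 * 128 * 128     # 1,048,576 detector elements
print(n_object * n_image * 8 / 1e12)  # ~17.6 (TB)
```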

In practice, it’s useful to decompose \(H\) into a combination of operations consisting of square matrices \(A_i:\mathbb{U}\to\mathbb{U}\) (object to object) and \(B_i:\mathbb{V}\to \mathbb{V}\) (image to image) and a single projection operator \(P:\mathbb{U} \to \mathbb{V}\) (object to image). The \(A_i\)’s and \(B_i\)’s are known as transforms and can be used to model phenomena such as attenuation/PSF in SPECT/PET.

A convenient feature of most imaging modalities is that image space \(\mathbb{V}\) consists of a sequence of projections. In SPECT, a projection corresponds to a particular rotation of the scanner, while in 3D PET it corresponds to a particular rotation and difference between the axial detection coordinates of a photon pair. In either case, we can express the image as \(g = \sum_{\theta} g_{\theta} \otimes \hat{\theta}\), where \(\theta\) labels a particular projection and \(\hat{\theta}\) is a unit vector that represents it. Note that \(g\) and \(g_{\theta}\) therefore do not lie in the same vector space. In this paradigm, we can represent \(H\) as

\[H = \sum_{\theta} \left(\prod_i B_i(\theta) \right) P(\theta) \left(\prod_i A_i(\theta) \right) \otimes \hat{\theta}\]

To implement back projection, we also need \(H^T\), which can be written as

\[H^T = \sum_{\theta} \left(\prod_{i,\text{reverse}} A_i^T(\theta) \right) P^T(\theta) \left(\prod_{i,\text{reverse}} B_i^T(\theta) \right) \otimes \hat{\theta}^T\]

Example: Modeling of a SPECT scanner can be written as \(H_{\text{SPECT}} = \sum_{\theta} P(\theta) A_1(\theta) A_2(\theta) \otimes \hat{\theta}\). Consider a particular projection, say \(\theta = 10^{\circ}\). The operator \(A_2(10^{\circ})\) implements attenuation modeling when the object is projected at a scanner angle of \(10^{\circ}\): it adjusts the object based on the amount of attenuating material photons must travel through to reach the scanner at that particular angle. The matrix \(A_1\) (which is independent of scanner angle for a circular orbit) implements PSF blurring for that particular projection by blurring planes parallel to the \(10^{\circ}\) detector based on the distance between each plane and the scanner. The matrix \(P(\theta)\) sums all the voxels together in the direction of the scanner, turning a 3D object into a 2D projection. The projection at that particular angle becomes \(g_{10^{\circ}} = P(10^{\circ}) A_1(10^{\circ}) A_2(10^{\circ}) f\) and the corresponding image (containing only one projection) would be \(g = g_{10^{\circ}} \otimes \hat{\theta}_{10^{\circ}}\). The net image (consisting of all projections) requires summing over all the different projections: \(g = \sum_{\theta} g_{\theta} \otimes \hat{\theta}\).

Example: Modeling of a PET scanner (2D mode, no scatter) can be written as \(H_{\text{PET}} = \sum_{\theta} B_1(\theta) B_2(\theta) P(\theta) \otimes \hat{\theta}\). The operator \(B_2(\theta)\) implements attenuation modeling in PET. Unlike SPECT, where attenuation modeling is done in object space, in PET it is implemented in image space because the probability of detection is adjusted by the same value along each line of response (LOR). The matrix \(B_1\) implements PSF blurring: unlike in SPECT, the blurring is assumed constant as a function of distance from the scanner, so the operation can also be implemented in image space. The matrix \(P(\theta)\) sums all the voxels together in the direction of the scanner, turning a 3D object into a 2D projection.

Operations \(A_i\) and \(B_i\) are referred to as transforms: many predefined transforms are located in the transforms folder.
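To make the decomposition concrete, here is a toy sketch for a single projection angle; all matrices are random placeholders rather than physical models, and the names are chosen for this sketch only:

```python
import torch

n_obj, n_img = 16, 4                   # toy dimensions for a single projection
A = torch.rand(n_obj, n_obj)           # object-to-object transform (e.g. attenuation)
P = torch.rand(n_img, n_obj)           # projection operator: object -> image
B = torch.rand(n_img, n_img)           # image-to-image transform (e.g. PSF)

f = torch.rand(n_obj)                  # object

g_theta = B @ (P @ (A @ f))            # forward: g_theta = B P A f
f_hat = A.T @ (P.T @ (B.T @ g_theta))  # backward: transposes in reverse order
```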

Reconstruction Algorithms#

In reality, the object \(f\) and the image \(g\) are random vectors, while the system matrix \(H\) is deterministic. In addition, only the vector \(g\) is measured. For notational simplicity, we let \(\tilde{f}\) represent the random object vector and \(f=E[\tilde{f}]\) its mean value. This notation is used as a convention throughout the manual and API. As such, we can write \(g=H\tilde{f}\).

The standard reconstruction algorithm for PET and SPECT is the ordered-subset expectation maximization (OSEM) algorithm. It assumes that \(\tilde{f}\) (and hence \(g\)) is a Poisson random vector, which holds when \(\tilde{f}\) represents the number of emissions from a radionuclide in a spatial location and in a given time interval. Before we begin the derivation, we define a new matrix \(\tilde{F}\) such that \(\tilde{F}_{ij} = H_{ji} \tilde{f}_i\). The component \(\tilde{F}_{ij}\) represents the number of counts from voxel \(i\) in object space contributing to detector element \(j\) in image space. Since \(\tilde{F}\) counts numbers of emissions, it is also Poisson distributed: \(\tilde{F} \sim \text{Poisson}(F)\), where \(F = E[\tilde{F}]\). We now seek a maximum likelihood solution for \(f\), and write the likelihood function as

\[\begin{split}\begin{align*} L(\tilde{f},f) &= \prod_i \prod_j \frac{F_{ij}^{\tilde{F}_{ij}}e^{-F_{ij}}}{\tilde{F}_{ij}!}\\ \implies \ln L(\tilde{f},f) &= \sum_i \sum_j -F_{ij} + \tilde{F}_{ij} \ln(F_{ij}) - \ln(\tilde{F}_{ij}!)\\ &= \sum_i \sum_j -H_{ji}f_i + H_{ji}\tilde{f}_i \ln(H_{ji}f_i) - \ln(H_{ji}\tilde{f}_i!) \end{align*}\end{split}\]

Setting \(\nabla_{f} \ln L(\tilde{f},f) = 0\) simply yields \(f = \tilde{f}\). In reality, however, we measure \(g\), not \(\tilde{f}\), so we must estimate \(f\) using only \(g\); the standard maximum likelihood technique will not work directly. What we can do, however, is consider the quantity:

\[E_{\tilde{f}}[\ln L(\tilde{f},f) | g, f^{(n)}] = \sum_i \sum_j -H_{ji}f_i + E[H_{ji}\tilde{f}_i|g, f^{(n)}] \ln(H_{ji}f_i) + ...\]

where \(E_{\tilde{f}}\) represents an operator that yields the expectation value over \(\tilde{f}\). It is important to understand the interpretation of this expression: it yields the expected value of the log-likelihood, given the measured projection data \(g\) and a “guess” \(f^{(n)}\) of what the distribution looks like. There’s just one question: what does \(E_{\tilde{f}}[H_{ji}\tilde{f}_i|g, f^{(n)}]\) (i.e. the expected number of emissions from voxel \(i\) contributing to detector element \(j\)) look like? I claim

\[E_{\tilde{f}}[H_{ji}\tilde{f}_i|g, f^{(n)}] = \frac{g_j}{(Hf^{(n)})_j} H_{ji}f_i^{(n)}\]

Why? Because we’re also given information about \(g\): we know the sums of counts along each projection line, and we can adjust \(H_{ji}f_i^{(n)}\) by the ratio \(\frac{g}{Hf^{(n)}}\) to ensure the counts add up along projection lines. Substituting this in yields

\[E_{\tilde{f}}[\ln L(\tilde{f},f) | g, f^{(n)}] = \sum_i \sum_j -H_{ji}f_i + \frac{g_j}{(Hf^{(n)})_j} H_{ji}f_i^{(n)} \ln(H_{ji}f_i) + ...\]

Setting \(\nabla_{f} E[\ln L(\tilde{f},f) | g, f^{(n)}]= 0\), i.e. requiring \(\sum_j \left(-H_{ji} + \frac{g_j}{(Hf^{(n)})_j} H_{ji} \frac{f_i^{(n)}}{f_i}\right) = 0\) for each voxel \(i\), now yields

\[f_i = \frac{1}{\sum_j H_{ji}} \sum_j \frac{g_j}{(Hf^{(n)})_j} H_{ji} f_i^{(n)}\]

We can rewrite this in vector notation as

\[f = \left[\frac{1}{H^T \vec{1}} H^T \left( \frac{g}{Hf^{(n)}}\right) \right]f^{(n)}\]

The \(f\) on the LHS becomes the “next guess” for the distribution \(f\), so it’s better to rewrite the equation as

\[\boxed{f^{(n+1)} = \left[\frac{1}{H^T \vec{1}} H^T \left( \frac{g}{Hf^{(n)}}\right) \right]f^{(n)}}\]

This is the basic form of the maximum likelihood expectation maximization (MLEM) algorithm. It requires an initial guess \(f^{(0)}\), which is typically set to all 1’s. The ordered-subset expectation maximization (OSEM) algorithm is a variant that uses only a subset of the projection angles during each update. While it requires more iterations to converge to a solution, it often saves time due to the smaller computational cost of projecting only a subset of angles. Using the same notation as the previous section, we can express \(g = \sum_{\theta} g_{\theta} \otimes \hat{\theta}\) and \(H = \sum_{\theta} H_{\theta} \otimes \hat{\theta}\). If we separate all the angles \(\theta\) into \(M\) distinct subsets \(\Theta_0, ..., \Theta_{M-1}\), we can write \(g_m = \sum_{\theta \in \Theta_m} g_{\theta} \otimes \hat{\theta}\) and \(H_m = \sum_{\theta \in \Theta_m} H_{\theta} \otimes \hat{\theta}\). We can then write the OSEM algorithm as

\[\boxed{f^{(n,m+1)} = \left[\frac{1}{H_m^T \vec{1}} H_m^T \left( \frac{g_m}{H_mf^{(n,m)}}\right) \right]f^{(n,m)}}\]

where \(f^{(n,M)} \equiv f^{(n+1,0)}\) (so we cycle through all the subsets, then move to the next iteration).
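The boxed updates translate directly into code. Here is a minimal dense-matrix sketch; the function name, the dense matrices, and the eps guard against division by zero are all illustrative, since PyTomography’s actual algorithms operate through SystemMatrix objects:

```python
import torch

def osem(H_subsets, g_subsets, n_iters=10, eps=1e-9):
    """OSEM with dense toy system-matrix subsets H_m and data subsets g_m.
    With a single subset (M = 1) this reduces to MLEM."""
    f = torch.ones(H_subsets[0].shape[1])            # initial guess f^(0): all 1's
    for _ in range(n_iters):
        for H_m, g_m in zip(H_subsets, g_subsets):   # one subiteration per subset
            ratio = g_m / (H_m @ f + eps)            # g_m / (H_m f^(n,m))
            f = f * (H_m.T @ ratio) / (H_m.T @ torch.ones_like(g_m) + eps)
    return f
```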

Scatter#

Scatter for PET is not currently implemented in PyTomography, but it is planned for the near future. Scatter in SPECT involves modifying the denominator of the MLEM/OSEM algorithm to include scatter projections:

\[f^{(n,m+1)} = \left[\frac{1}{H_m^T \vec{1}} H_m^T \left( \frac{g_m}{H_mf^{(n,m)} + s_m}\right) \right]f^{(n,m)}\]

where \(s_m\) represents a scatter image (which is often obtained in SPECT through the triple energy window technique).
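In the toy OSEM sketch above, this amounts to adding the scatter estimate (here a hypothetical tensor s_m with the same shape as g_m) to the forward projection in the denominator:

```python
ratio = g_m / (H_m @ f + s_m + eps)  # scatter estimate s_m enters the denominator only
```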

Priors#

Prior functions are used to encapsulate beliefs about what the reconstructed object should look like before reconstruction. For example, it may be a reasonable prior belief that adjacent voxels should have similar radiopharmaceutical concentration. Prior information can be included by modifying the likelihood function:

\[L(\tilde{f},f) \to L(\tilde{f},f)e^{-\beta V(f)}\]

where \(\beta\) is a factor that scales the strength of the prior (note the similarity to the inverse temperature \(\beta\) used in statistical mechanics). Using the log-likelihood method:

\[f^{(n,m+1)} = \left[\frac{1}{H_m^T \vec{1} + \beta \nabla_{f} V(f)} H_m^T \left( \frac{g_m}{H_mf^{(n,m)} + s_m}\right) \right]f^{(n,m)}\]

We run into a problem: what value of \(f\) do we use when computing the gradient of \(V\)? There are a few approaches to this issue. The first is the one-step-late (OSL) formalism, which uses the value of \(f\) from the previous iteration:

\[f^{(n,m+1)} = \left[\frac{1}{H_m^T \vec{1} + \beta \nabla_{f} V(f)|_{f=f^{(n,m)}}} H_m^T \left( \frac{g_m}{H_mf^{(n,m)} + s_m}\right) \right]f^{(n,m)}\]

The second is the block sequential regularizer (BSR) technique, which separates each iteration into two steps:

  1. \[f^{(n,m+1)}_{1/2} = \left[\frac{1}{H_m^T \vec{1}} H_m^T \left( \frac{g_m}{H_mf^{(n,m)}}\right) \right]f^{(n,m)}\]
  2. \[f^{(n,m+1)} = f^{(n,m+1)}_{1/2}\left(1-\beta \frac{\alpha_n}{H_m^T \vec{1}} \nabla_{f} V(f)|_{f=f^{(n,m+1)}_{1/2}}\right)\] where \(\alpha_n\) is a relaxation factor controlling the strength of the regularization step.
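As an illustration of the first approach, here is a hedged sketch of one OSL subiteration with a simple 1D quadratic smoothness prior; the prior choice, function names, and dense matrices are illustrative only, not PyTomography’s API:

```python
import torch

def quadratic_prior_gradient(f):
    """Gradient of a simple 1D quadratic smoothness prior; proportional to the
    negative discrete Laplacian of f (boundary voxels left untouched)."""
    grad = torch.zeros_like(f)
    grad[1:-1] = 2 * f[1:-1] - f[:-2] - f[2:]
    return grad

def osl_update(H_m, g_m, s_m, f, beta, eps=1e-9):
    """One OSL subiteration: the prior gradient is evaluated at the previous f."""
    ratio = g_m / (H_m @ f + s_m + eps)
    denom = H_m.T @ torch.ones_like(g_m) + beta * quadratic_prior_gradient(f)
    return f * (H_m.T @ ratio) / (denom + eps)
```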