Introduction
T cells are part of the adaptive immune system and can detect foreign molecules present in cells of the body. They exert their function via their T cell receptor (TCR), a surface receptor generated by rearranging DNA fragments in a semi-stochastic manner. Each cell expresses exactly one type of TCR generated in this process. Within an organism, this specificity at the cellular level creates large diversity at the population level, thereby allowing for a strong immune response against pathogens.
On a molecular level, the immune response of a T cell is driven by the interaction of its TCRs with peptides (short protein fragments) that are bound to major histocompatibility complex (MHC) molecules on the interacting cell. In healthy cells, this interaction will not elicit an immune response, as T cells have been ‘trained’ to ignore self-peptides. In an infected cell, or in a cell specifically licensed to sample and present peptides to T cells (antigen-presenting cells, APCs), the presented peptide may be foreign and trigger an immune response. A major task in studying T cell responses is identifying the peptides that a given TCR can recognize.
Task
Experimentally, identifying these peptides requires mixing T cells carrying the TCR of interest with APCs that present the peptide(s) of interest on their MHC. An ideal experiment allows high-throughput screening of a large number of peptides against a given TCR, with a simple read-out of function, mechanisms for detecting experimental errors (false positives and false negatives), and a minimal number of experimental setups.

Figure 1. Experimental layout of a TCR peptide screen. There are three common approaches to designing a TCR peptide screen. They differ in experimental complexity (high to low, left to right) and in the complexity of the computational design (low to high, left to right). Details are described in the text.
The simplest experimental design is to test each peptide individually against a given T cell (see Figure 1, A). However, this approach is time- and reagent-consuming, especially if a large number of peptides needs to be tested.
In such cases, an alternative approach called matrix pooling can be applied. In matrix pooling (Figure 1, B), peptides are organized into a matrix, with each peptide occupying one cell. The peptides in each row and in each column are then mixed together to create peptide pools. The cognate peptide is identified as the one at the intersection of the row pool and the column pool that lead to T cell activation. Matrix pooling is more efficient than testing each peptide individually, as it requires less time and fewer reagents.
Combinatorial pooling reduces the number of tested peptide pools further. This setup is similar to matrix pooling, but instead of a matrix it relies on a table of addresses that determines which peptides are mixed together (see Figure 1, C). Designing this table can be challenging, particularly when the region recognized by a T cell is present in multiple peptides from the tested peptide list.
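The row and column logic of matrix pooling can be illustrated with a minimal Python sketch. The function names, the near-square grid layout, and the peptide labels are assumptions made for illustration; they are not taken from the package.

```python
import math

def matrix_pools(peptides):
    """Place peptides on a near-square grid and return row pools and column pools."""
    n_cols = math.ceil(math.sqrt(len(peptides)))
    rows, cols = {}, {}
    for idx, peptide in enumerate(peptides):
        r, c = divmod(idx, n_cols)  # grid position of this peptide
        rows.setdefault(r, []).append(peptide)
        cols.setdefault(c, []).append(peptide)
    return rows, cols

def decode(rows, cols, active_row, active_col):
    """Peptides shared by the activated row pool and the activated column pool."""
    return set(rows[active_row]) & set(cols[active_col])

# Hypothetical library of 16 peptides: 4 row pools + 4 column pools
peptides = [f"pep{i}" for i in range(1, 17)]
rows, cols = matrix_pools(peptides)
n_pools = len(rows) + len(cols)
hit = decode(rows, cols, active_row=1, active_col=2)
print(n_pools, hit)
```

In this toy case, 16 peptides are screened with 8 pools instead of 16 individual tests, and a single activated row and column point to exactly one peptide.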
Here, we develop a computational algorithm that aids in the design of a combinatorial pooling scheme, specifically the table of addresses. The constraints imposed on the algorithm are biological and experimental; they are outlined below.
Let’s consider a single protein \(E\). We create a library of peptides by a sliding window approach across its sequence, i.e., each peptide overlaps with its predecessor and successor by \(Y\) amino acids. The only exceptions are the first and last peptides of the protein, which overlap only with their successor and predecessor, respectively. In our experiment, we have a total number of \(M\) overlapping peptides \(E_j\) derived from \(E\):
\[E_j, \quad j = 1, \dots, M\]
for which we want to find a mixing scheme into pools \(p_i\):
\[p_i, \quad i = 1, \dots, n\]
with \(n\) the total number of pools.
Throughout the document, uppercase letters represent constants provided by the experimental design/constraints, while lowercase letters represent variables that need to be determined.
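The sliding window construction of the peptide library can be sketched as follows. The peptide length is not specified above, so the `length` parameter, the function name, and the toy protein sequence are assumptions for illustration only.

```python
def sliding_window_peptides(protein, length, overlap):
    """Cut a protein into peptides of a fixed length (assumed parameter),
    where consecutive peptides share `overlap` amino acids (Y in the text)."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must satisfy 0 <= overlap < length")
    step = length - overlap
    # Trailing residues that do not fill a whole window are dropped in this sketch
    return [protein[start:start + length]
            for start in range(0, len(protein) - length + 1, step)]

# Toy 20-residue sequence, peptides of 8 residues, overlap Y = 4
library = sliding_window_peptides("MKTAYIAKQRQISFVKSHFS", length=8, overlap=4)
M = len(library)
print(M, library)
```

Here every peptide shares its last four residues with the first four residues of its successor, and \(M = 4\).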
Objective
Find the optimal distribution of peptides into the pools, such that the number of pools \(n\) is minimized, the total occurrence of each peptide across pools equals \(x\), where \(x\) is minimized, and the total number of peptides per pool is approximately constant and not higher than \(R\).
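As a rough worked example of the objective, the sketch below looks for the smallest \(n\) that passes two necessary checks for given \(M\), \(x\), and \(R\). It assumes that every peptide is placed in \(x\) distinct pools, so that at most \(\binom{n}{x}\) different addresses exist, and that the average pool size is \(M \cdot x / n\). These checks are necessary but not sufficient: the ordering constraints on addresses may require more pools. The function name is hypothetical.

```python
from math import comb

def smallest_candidate_n(M, x, R):
    """Smallest n with enough distinct x-pool addresses for M peptides
    and an average pool size M*x/n that does not exceed R."""
    n = x
    while comb(n, x) < M or M * x / n > R:
        n += 1
    return n, M * x / n

n, w = smallest_candidate_n(M=100, x=3, R=30)
print(n, w)
```

For 100 peptides with \(x = 3\) and \(R = 30\), nine pools give only 84 possible addresses, so the smallest candidate is \(n = 10\) with an average of 30 peptides per pool.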
Design and Constraints
We consider the distribution of peptides \(E_j\) into pools \(p_i\) as assigning addresses \(a_j\) to each peptide \(E_j\):
with
The construction of the pools is limited by the following constraints:
The number of peptides per pool should be approximately the same for each pool, with the upper limit of peptides per pool \(R\):
\[\overline{\overline{p_i}} \approx w \le R \text{ where } w = \frac{M \cdot x}{n}\]
Each address \(a_j\) consists of a combination of unique pools:
\[\forall p_b, p_c \in a_j : b \ne c\]
Each address \(a_j\) differs in exactly one pool from its successor \(a_{j+1}\):
\[D_H \, (a_j, \: a_{j+1}) = 1\]
where \(D_H\) is the Hamming distance.
For all other pairs of addresses, the Hamming distance is at least 1, i.e., no two addresses are identical:
\[D_H \, (a_j, \: a_k) \geq 1 \text{ where } |j-k| \geq 2\]
The Hamming distance between the union of two adjacent addresses and any other union of adjacent addresses is at least 1, i.e., every union is unique:
\[\forall j \ne k: D_H\,(a_j \cup a_{j+1}, \: a_k \cup a_{k+1}) \geq 1\]
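The constraints above can be checked for a candidate arrangement with a short Python sketch. It assumes that addresses are represented as sets of pool indices and that the distance between two equal-sized addresses is the number of pools present in one but not in the other; both are illustrative choices, as are the function names.

```python
from collections import Counter

def pool_distance(a, b):
    """Pools of address a that are absent from address b (assumed reading of D_H)."""
    return len(a - b)

def check_arrangement(addresses, max_per_pool):
    """Return a list of violated constraints; an empty list means all checks passed."""
    addresses = [frozenset(a) for a in addresses]  # sets make pools unique per address
    problems = []
    if len({len(a) for a in addresses}) > 1:
        problems.append("addresses differ in size")
    for j, (a, b) in enumerate(zip(addresses, addresses[1:])):
        if pool_distance(a, b) != 1:
            problems.append(f"addresses {j} and {j + 1} do not differ in exactly one pool")
    if len(set(addresses)) != len(addresses):
        problems.append("duplicate address")
    unions = [a | b for a, b in zip(addresses, addresses[1:])]
    if len(set(unions)) != len(unions):
        problems.append("duplicate union")
    load = Counter(pool for a in addresses for pool in a)  # peptides per pool
    if load and max(load.values()) > max_per_pool:
        problems.append("pool exceeds R")
    return problems

good = check_arrangement([{0, 1}, {1, 2}, {2, 3}, {0, 3}], max_per_pool=2)
bad = check_arrangement([{0, 1}, {1, 2}, {0, 2}, {0, 1}], max_per_pool=3)
print(good, bad)
```

The first arrangement passes every check, while the second repeats an address and produces the same union for several adjacent pairs.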
Algorithm
We designed an algorithm that navigates the peptide space by seeking a Hamiltonian path in its corresponding graph to meet the given constraints. The package offers two versions of this algorithm:
A basic search for a Hamiltonian path of a given length, simultaneously checking for union and address uniqueness.
A faster version based on the same principle that condenses the path by considering both vertices and edges.
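The idea behind the basic search can be sketched as a depth-first search with backtracking. This is an illustration under stated assumptions, not the package's implementation: vertices are taken to be all sets of \(x\) pools out of \(n\), edges connect addresses that differ in one pool, and the balance constraint is ignored. All names are hypothetical.

```python
from itertools import combinations

def find_address_path(n_pools, x, length):
    """Search for `length` distinct addresses in which neighbours differ in
    one pool and all unions of neighbouring addresses are unique."""
    vertices = [frozenset(c) for c in combinations(range(n_pools), x)]

    def neighbours(address):
        # Swap exactly one pool of the address for a pool it does not contain
        for removed in address:
            for added in range(n_pools):
                if added not in address:
                    yield (address - {removed}) | {added}

    def extend(path, used, unions):
        if len(path) == length:
            return list(path)
        for nxt in neighbours(path[-1]):
            union = path[-1] | nxt
            if nxt in used or union in unions:
                continue  # would break address or union uniqueness
            path.append(nxt); used.add(nxt); unions.add(union)
            result = extend(path, used, unions)
            if result is not None:
                return result
            path.pop(); used.remove(nxt); unions.remove(union)  # backtrack
        return None

    for start in vertices:
        result = extend([start], {start}, set())
        if result is not None:
            return result
    return None

path = find_address_path(n_pools=5, x=2, length=8)
print([sorted(a) for a in path])
```

Because the search is exhaustive, it returns `None` only when no arrangement of the requested length exists for the given \(n\) and \(x\); its running time grows quickly with the number of peptides, which is one motivation for a faster variant.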
Our initial inspiration came from the reflected binary code by Frank Gray. Thus, we have incorporated functions in the package for producing a balanced Gray code and its flexible-length variant. However, we currently advise against using these for address arrangement due to potential imbalances and non-unique unions.
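For reference, the standard reflected binary Gray code can be generated with a single expression. The sketch below is not the package's balanced variant; it only shows why a plain Gray code fits the pooling problem poorly when each bit is read as membership in one pool, which is an assumed interpretation.

```python
def reflected_gray_code(n_bits):
    """Standard reflected binary Gray code as a list of integers."""
    return [i ^ (i >> 1) for i in range(2 ** n_bits)]

codes = reflected_gray_code(3)

# Number of set bits per code word, i.e. pools per address under this reading
weights = [bin(code).count("1") for code in codes]

# How often each bit position changes between consecutive code words
flips = [0, 0, 0]
for a, b in zip(codes, codes[1:]):
    flips[(a ^ b).bit_length() - 1] += 1

print([format(code, "03b") for code in codes])
print(weights, flips)
```

Consecutive code words differ in exactly one bit, but the number of set bits varies from 0 to 3 and the lowest bit changes four times while the highest changes once, so peptides would occur in different numbers of pools.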
[Figure placeholder: illustration of the three properties the address arrangement should have (balance, uniqueness of unions, and a Hamming distance of 1 between adjacent addresses).]