Multivariate Poisson Abundance Models for Analyzing Antigen Receptor Data
Date
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Antigen receptor data is an important source of information for immunologists that is highly statistically challenging to analyze due to the presence of a huge number of T-cell receptors in mammalian immune systems and the severe undersampling bias associated with the commonly used data collection procedures. Many important immunological questions can be stated in terms of richness and diversity of T-cell subsets under various experimental conditions. This dissertation presents a class of parametric models and uses a special case of them to compare the richness and diversity of antigen receptor populations in mammalian T-cells. The parametric models are based on a representation of the observed receptor counts as a multivariate Poisson abundance model (mPAM). A Bayesian model tting procedure is developed which allows tting of the mPAM parameters with the help of the complete likelihood as opposed to its conditional version which was used previously. The new procedure is shown to be often considerably more e cient (as measured by the amount of Fisher information) in the regions of the mPAM parameter space relevant to modeling T-cell data. A richness estimator based on the special case of the mPAM is shown to be superior to several existing richness estimators from the statistical ecology literature under the severe undersampling conditions encountered in antigen receptor data collection. The comparative diversity analyses based on the mPAM special case yield biologically meaningful results when applied to the T-cell receptor repertoires in mice. It is also shown that the amount of time to implement the Bayesian model tting procedure for the mPAM special case scales well as the dimension increases and that the amount of computational resources required to conduct complete statistical analyses for the mPAM special case can be drastically lower for our Bayesian model tting procedure than for code based on the conditional likelihood approach.