ch expression value from the median instead of the squared difference of each expression value from the average. In this study, we adapted an existing method from economics with a comparable goal: a formula developed by Amartya Sen for socioeconomic studies, which addresses the question of how many people live below the poverty line in any given country. To adapt this algorithm for gene expression analysis, we inverted the original question by asking, "in how many samples from the same body part is a gene X expressed above a fixed cut-off threshold?" Since the index is determined as a robust proportion of outlying samples, we assume that every gene is represented by an adequate number of samples.

Our aim was to establish an index that determines whether there is significantly increased gene expression in a sub-group of disease samples compared to the normal control group, without making distributional assumptions about the various group populations. In preliminary studies, we observed that poverty indices derived from economics are well suited to measuring the proportion of outlying samples within the disease sub-group relative to the reference group. Motivated by these observations, we modified the original poverty index formula, and in this paper we introduce the gene tissue index (GTI). The GTI is then systematically compared with the existing methods, i.e. the t-statistic, COPA, OS and ORT. Furthermore, we compare the outlier detection capability of the existing methods with that of the GTI using a simulated dataset and a real, large-scale integrated clinical dataset. No comparative studies are currently available to support the suitability of the existing methods for the analysis of real, large-scale integrated meta-datasets such as those collected in the GeneSapiens database.

February 2011 | Volume 6 | Issue 2 | e17259 | Gene Tissue Index Outlier Algorithm
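The inverted head-count question posed above ("in how many samples is gene X expressed above a fixed cut-off?") can be illustrated with a minimal sketch. This is not the GTI formula, which is not given in this passage; it only computes the plain proportion of samples above a threshold. The function name `fraction_above_cutoff`, the expression values and the cut-off of 8.0 are all hypothetical and chosen purely for illustration.

```python
from typing import Sequence


def fraction_above_cutoff(expression: Sequence[float], cutoff: float) -> float:
    """Return the share of samples whose expression exceeds `cutoff`.

    Illustrative head-count ratio only; NOT the GTI itself.
    """
    if len(expression) == 0:
        raise ValueError("at least one sample is required")
    above = sum(1 for value in expression if value > cutoff)
    return above / len(expression)


# Hypothetical expression values for one gene in one tissue (made up).
disease = [5.1, 5.3, 9.8, 5.0, 10.2, 5.2]
normal = [5.0, 5.2, 4.9, 5.1, 5.3, 5.0]
cutoff = 8.0  # assumed fixed threshold, not taken from the paper

disease_share = fraction_above_cutoff(disease, cutoff)
normal_share = fraction_above_cutoff(normal, cutoff)
print(f"disease: {disease_share:.3f}, normal: {normal_share:.3f}")
```

With these made-up values, two of the six disease samples lie above the threshold while no normal sample does, which is the kind of sub-group signal the index is meant to capture.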
Materials and Methods

Existing Statistical Methods

Let $x_{ij}$ be the expression values for genes $j = 1, 2, \ldots, p$ and samples $i = 1, 2, \ldots, n$. We assume that the gene expression samples are obtained from two different groups, where $n = n_1 + n_2$. In our case, $n_1$ represents the number of samples from the normal group and $n_2$ represents the number of samples from the cancer group. Let $C_k$ be the set of indices of the observations in group $k$, for $k = 1$ and $2$.

t-statistic

The formula for the standard unpaired t-statistic is:

$$T_j = \frac{\bar{x}_{2j} - \bar{x}_{1j}}{s_j} \sqrt{\frac{n_1 n_2}{n}},$$

where $\bar{x}_{kj}$ is the mean expression of gene $j$ over the samples in $C_k$ and $s_j$ is the pooled within-group standard deviation estimate for gene $j$.

The approach used in the COPA statistic addresses the problem of identifying genes with an outlier population more accurately than the t-statistic. The COPA statistic is described as

$$\mathrm{COPA}_j = \frac{q_r(x_{ij} : i \in C_2) - \mathrm{med}_j}{\mathrm{mad}_j}, \qquad (4)$$

where $q_r$ is the $r$th percentile of the disease samples, $\mathrm{med}_j$ is the median of gene $j$ over all samples, and $\mathrm{mad}_j$ is the median absolute deviation of gene $j$. The product of $\mathrm{mad}_j$ and the constant 1.4826 is approximately equal to the standard deviation for normally distributed random variables. Compared to the t-statistic, COPA intuitively replaces the normal sample mean by the all-sample median $\mathrm{med}_j$, the sample standard deviation $s_j$ by the median absolute deviation $\mathrm{mad}_j$, and the disease sample mean by the $r$th percentile $q_r$. It is evident that the COPA statistic may not be very robust, since using a fixed $r$th sample percentile is almost equivalent to using information from a single sample.

Outlier Sums

The outlier sums (OS) statistic was introduced as an improvement over the COPA statistic. Here, the OS statistic was proposed to replace the $r$th percentile with a sum over the outlier samples from the disease group above a given cut-off