That by the inverse of the $d \times d$ whitening matrix $W_i$.
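The back-projection referred to here uses two ingredients: the inverse of a whitening matrix $W_i$, and the generalized inverse $V_d^{+} = (V_d^T V_d)^{-1} V_d^T$ of the projection $V_d$. The NumPy sketch below covers only these two ingredients, not the full drCCA procedure. It makes two assumptions: rows of a data matrix are samples, and $W_i$ is the symmetric whitening matrix $C_i^{-1/2}$ of the sample covariance $C_i$. The helper names are hypothetical and are not functions of the drCCA package.

```python
import numpy as np

def generalized_inverse(V):
    """Left generalized inverse V^+ = (V^T V)^{-1} V^T of a full-column-rank V."""
    # Solve (V^T V) X = V^T rather than forming the explicit inverse.
    return np.linalg.solve(V.T @ V, V.T)

def whitening_matrix(X, eps=1e-12):
    """Symmetric whitening matrix C^{-1/2} for data X (rows = samples).

    Assumption: W = C^{-1/2}, with C the sample covariance of the centred data.
    """
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / (Xc.shape[0] - 1)
    evals, evecs = np.linalg.eigh(C)
    # Clip tiny eigenvalues so the inverse square root stays finite.
    return evecs @ np.diag(1.0 / np.sqrt(np.maximum(evals, eps))) @ evecs.T

rng = np.random.default_rng(0)

V = rng.standard_normal((6, 3))   # hypothetical N x d projection matrix
V_plus = generalized_inverse(V)
print(np.allclose(V_plus @ V, np.eye(3)))   # V^+ is a left inverse of V

X = rng.standard_normal((200, 4)) @ rng.standard_normal((4, 4))  # correlated data
W = whitening_matrix(X)
Z = (X - X.mean(axis=0)) @ W      # whitened data
W_inv = np.linalg.inv(W)          # multiplying by W^{-1} undoes the whitening
print(np.allclose(Z @ W_inv, X - X.mean(axis=0)))
print(np.allclose(np.cov(Z, rowvar=False), np.eye(4)))  # identity covariance
```

Solving the linear system gives the same matrix as forming $(V_d^T V_d)^{-1}$ explicitly, with better numerical behavior. `np.linalg.pinv` is an alternative that also handles a rank-deficient $V_d$.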
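This section defines two evaluation criteria: the total variation of the reconstructions, (6), and the summed pairwise variation, (7). Each is scaled so that its value at full dimensionality $d = N$ is one. The sketch below computes both under two assumptions. First, the reconstructions are NumPy arrays of identical shape, so every $\hat{X}_i^T \hat{X}_j$ is square. Second, the `truncate` helper, which zeroes all but the first $d$ columns, is a hypothetical stand-in for an actual $d$-dimensional reconstruction.

```python
import numpy as np

def total_variation(recons):
    """Var_S as in eq. (6): Trace of X_i^T X_i summed over all data sets."""
    # Trace(X^T X) is the sum of squared entries, i.e. the (unscaled)
    # variance of a column-centred reconstruction.
    return float(sum(np.trace(X.T @ X) for X in recons))

def pairwise_variation(recons):
    """Var_{D-S} as in eq. (7): Trace of X_i^T X_j summed over pairs i < j."""
    p = len(recons)
    return float(sum(np.trace(recons[i].T @ recons[j])
                     for i in range(p - 1) for j in range(i + 1, p)))

def normalized(measure, recons_d, recons_full):
    """Divide by the value at full dimensionality so that d = N gives one."""
    return measure(recons_d) / measure(recons_full)

def truncate(X, d):
    """Hypothetical stand-in for a d-dimensional reconstruction: zero columns >= d."""
    Y = X.copy()
    Y[:, d:] = 0.0
    return Y

rng = np.random.default_rng(1)
shared = rng.standard_normal((100, 5))   # toy signal common to all data sets
full = [shared + 0.5 * rng.standard_normal((100, 5)) for _ in range(3)]
full = [X - X.mean(axis=0) for X in full]   # column-centre each reconstruction

for d in range(6):
    recons_d = [truncate(X, d) for X in full]
    print(d,
          round(normalized(total_variation, recons_d, full), 3),
          round(normalized(pairwise_variation, recons_d, full), 3))
```

Because $\mathrm{Trace}(\hat{X}_i^T \hat{X}_i)$ is a sum of squared entries, dropping columns can only lower the total variation. In this toy setting the normalized total-variation curve therefore rises monotonically to one. The pairwise term carries no such guarantee.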
The matrix $V_d^{+}$ is the generalized inverse of $V_d$, defined as $V_d^{+} = (V_d^T V_d)^{-1} V_d^T$.

The first criterion, which measures the data-specific variation after the dimensionality reduction to $d$ dimensions, is defined as

$$\mathrm{Var}_S = \mathrm{Trace}\left(\sum_{i=1}^{p} \hat{X}_i^T \hat{X}_i\right). \qquad (6)$$

Each term in the sum is simply the variance of a single reconstruction $\hat{X}_i$, and the sum matches the total variation in the collection of $p$ data sets. The measure is further normalized so that its value for $d = N$, the full dimensionality, is one.

For the shared variation we measure the pairwise variation between all pairs of data sets. This measure uses the same reconstructed data sets and is defined as

$$\mathrm{Var}_{D\text{-}S} = \mathrm{Trace}\left(\sum_{i=1}^{p-1} \sum_{j=i+1}^{p} \hat{X}_i^T \hat{X}_j\right). \qquad (7)$$

It is again normalized so that the full dimensionality gives the value one. The sum of pairwise variations is not a perfect measure of the shared variation for collections of more than two data sets, but it is computationally simple and intuitive.

We preprocessed the data by imputing missing values with the K-nearest neighbor method, using K = 10. The data were then Fourier-transformed, and the power spectrum was used for the analysis. In the end we had 5,670 genes, including 724 of the 800 cell-cycle regulated genes defined in [11]. The total number of features in the 5 data sets was 38.

Yeast stress data

We used the yeast gene expression data under various stress conditions from [12,13]. We picked 15 different conditions, 9 from [12] and 6 from [13], resulting in 97 dimensions in total. We normalized all time series with their respective zero points and imputed missing values by gene-wise averages. We then combined the data sets in order to study genes related to the general environmental stress response (ESR).

BMC Bioinformatics 2008, 9: http://www.biomedcentral.com/1471-2105/9/

Figure 6: KNN classification for stress data. [Plot: classification accuracy against the dimensionality of the projection (1 to 20) for CCA, PCA, and Baseline.] The classification accuracy obtained using the combined representation, as a function of dimensionality. The CCA-based combination (solid line) is consistently worse than the PCA-based approach (dashed line), implying that the class labels might not correlate well with the true shared response. As a baseline, the classification accuracy obtained by concatenating all original data sets (dotted line) is also included.

rithm, designing of the experiments, and writing of the manuscript. All authors read and approved the final manuscript.

Additional material

Additional file: A software package in R. An R implementation of the method, including the source code and documentation of the software. [http://www.biomedcentral.com/content/supplementary/14712105-9-111-S1.GZ]

Availability and requirements

Project name: drCCA; Project home page: http://www.cis.hut.fi/projects/mi/software/drCCA/; Operating system(s): Platform independent; Programming language: R; License: GNU LGPL; Any restrictions to use by non-academics: Read GNU LGPL conditions

Acknowledgements

The authors are with the Adaptive Informatics Research Centre. This work was supported in part by the Academy of Finland, decision number 207467, in part by the IST Programme of the European Community, under the PASCAL Network of Excellence, IST-2002-506778, and in part by a grant from the University of Helsinki’s Research Funds. This publication only reflec.