├── .gitattributes
├── .gitignore
├── README.md
├── data
    └── multiclass.zip
├── datasets
└── matlab code
    ├── AMLALL_nature.mat
    ├── Lymphoma.mat
    ├── README.txt
    ├── binarize.m
    ├── calVIP.m
    ├── ctransform.m
    ├── kernel.m
    ├── kernelPLS.m
    ├── kernelmi.m
    ├── main.m
    ├── normalizemeanstd.m
    └── ssqd.m

/.gitattributes:
--------------------------------------------------------------------------------
# Auto detect text files and perform LF normalization
* text=auto

# Custom for Visual Studio
*.cs diff=csharp
*.sln merge=union
*.csproj merge=union
*.vbproj merge=union
*.fsproj merge=union
*.dbproj merge=union

# Standard to msysgit
*.doc diff=astextplain
*.DOC diff=astextplain
*.docx diff=astextplain
*.DOCX diff=astextplain
*.dot diff=astextplain
*.DOT diff=astextplain
*.pdf diff=astextplain
*.PDF diff=astextplain
*.rtf diff=astextplain
*.RTF diff=astextplain

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
# Windows image file caches
Thumbs.db
ehthumbs.db

# Folder config file
Desktop.ini

# Recycle Bin used on file shares
$RECYCLE.BIN/

# Windows Installer files
*.cab
*.msi
*.msm
*.msp

# =========================
# Operating System Files
# =========================

# OSX
# =========================

.DS_Store
.AppleDouble
.LSOverride

# Icon must end with two \r.
Icon

# Thumbnails
._*

# Files that might appear on external disk
.Spotlight-V100
.Trashes

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# kernelPLS
kernel partial least squares for gene selection

We developed a non-linear gene selection method from microarray data.

--------------------------------------------------------------------------------
/data/multiclass.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sqsun/kernelPLS/41b2efcfe1063ebc8ce09ce190f97235b0395319/data/multiclass.zip
--------------------------------------------------------------------------------
/datasets:
--------------------------------------------------------------------------------


--------------------------------------------------------------------------------
/matlab code/AMLALL_nature.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sqsun/kernelPLS/41b2efcfe1063ebc8ce09ce190f97235b0395319/matlab code/AMLALL_nature.mat
--------------------------------------------------------------------------------
/matlab code/Lymphoma.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sqsun/kernelPLS/41b2efcfe1063ebc8ce09ce190f97235b0395319/matlab code/Lymphoma.mat
--------------------------------------------------------------------------------
/matlab code/README.txt:
--------------------------------------------------------------------------------

This is the Matlab source code of the kernelPLS algorithm for the feature (gene) selection problem.

=====================================================

Information
-------------------------
Author: S.Q. Sun and Q.K.
Peng
Affiliation: Xi'an Jiaotong University
Contact: ssqsxf@stu.xjtu.edu.cn
Release date: April 20, 2014
Version: 1.0
======================================================
main function: main.m
Two test datasets, AMLALL_nature.mat and Lymphoma.mat, were used in our work.
Others can be downloaded from here: https://github.com/sqsun/kernelPLS-datasets


For example:
>> main
How many genes you want to select?
Please input here: 5 (Enter)
selectedgenes =

    'M23197_at' 'Y12670_at' 'U50136_rna1_at' 'X95735_at' 'D49950_at'
>>



The kernelPLS program was developed by Shiquan Sun (ssqsxf@stu.xjtu.edu.cn)

Contact: ssqsxf@stu.xjtu.edu.cn; qkpeng@xjtu.edu.cn;
released April 24, 2014.

Acknowledgement
In the paper, the correlation between two vectors was estimated by kernel mutual information (kernelmi.m), which was provided by Mikhail.
Download free here: http://www.mathworks.com/matlabcentral/fileexchange/30998-kernel-estimate-for--conditional--mutual-information

--------------------------------------------------------------------------------
/matlab code/binarize.m:
--------------------------------------------------------------------------------
% Encodes class membership in binary form

function YY = binarize(Y)

unics = unique(Y);
YY = zeros(length(Y),length(unics));
for i=1:length(unics)
    YY(find(Y==unics(i)),i) = 1;
end;


--------------------------------------------------------------------------------
/matlab code/calVIP.m:
--------------------------------------------------------------------------------
function VIP = calVIP( Y, t, w )

p = 3;
[~,m]=size(t);
[~,q]=size(Y);
[~,~]=size(w);

Rd = zeros(1,m);
for i = 1:m
    for j = 1:q
        r = kernelmi( Y(:,j)',t(:,i)' );
        Rd( i ) = Rd( i ) + r^2;
%         r = corrcoef(Y(:,j),t(:,i));
%
%         Rd(i)=Rd(i) + r(1,2)^2;
    end
end
% Rd = Rd./sum(Rd);
RdY = sum( Rd./q );

w2 = w.^2;
dor = p*sum( ( ( ones(size(w2,1),1 )*Rd ).*w2 ), 2 );
% VIP = sqrt(dor);
VIP = sqrt( dor./RdY );
end
--------------------------------------------------------------------------------
/matlab code/ctransform.m:
--------------------------------------------------------------------------------
function [ ar ] = ctransform(a)
% Copula-transform array - rank and scale to [0, 1]
[as ai] = sort(a, 2);
[aa ar] = sort(ai, 2);
ar = (ar - 1) / (size(ar, 2) - 1);
end

--------------------------------------------------------------------------------
/matlab code/kernel.m:
--------------------------------------------------------------------------------
function K = kernel(X,Y,type,par1,coef)

% K=kernel(X,Y,type,par1,coef)
% type: 'polynomial' or 'gaussian'
% par1: parameter for kernel (exponent of the polynomial kernel)
% coef: additive constant of the polynomial kernel
X = X';
Y = Y';
% if nargin < 4
%     alpha = 0.25;
%     par1 = (size(X,1) + 1) / sqrt(12) / size(X,1) ^ (1 + alpha);
% end

if strcmp(type,'polynomial')

    K=((X'*Y)+coef).^par1;

elseif strcmp(type,'gaussian')
    sigma = 250;
    [X2,Y2]=meshgrid(sum(Y.^2),sum(X.^2));
    K=exp(-(X2+Y2-2*X'*Y)/(2*sigma^2));
    % K=exp(-(X2+Y2-2*X'*Y)/(2*mpower(sig,2)));

end

--------------------------------------------------------------------------------
/matlab code/kernelPLS.m:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sqsun/kernelPLS/41b2efcfe1063ebc8ce09ce190f97235b0395319/matlab code/kernelPLS.m
--------------------------------------------------------------------------------
/matlab code/kernelmi.m:
--------------------------------------------------------------------------------
function [ I ] = kernelmi( x, y, h, ind )
% Kernel-based estimate for mutual information I(X, Y)
% h - kernel
% width; ind - subset of data on which to estimate MI

[Nx, Mx]=size(x);
[Ny, My]=size(y);

if any([Nx Ny My] ~= [1 1 Mx])
    error('Bad sizes of arguments');
end

if nargin < 3
    % Yields unbiased estimate when Mx->inf
    % and low MSE for two joint gaussian variables
    alpha = 0.25;
    h = (Mx + 1) / sqrt(12) / Mx ^ (1 + alpha);
end

if nargin < 4
    ind = 1:Mx;
end

% Copula-transform variables
x = ctransform(x);
y = ctransform(y);

h2 = 2*h^2;

% Pointwise values for kernels
Kx = squareform(exp(-ssqd([x;x])/h2))+eye(Mx);
Ky = squareform(exp(-ssqd([y;y])/h2))+eye(Mx);

% Kernel sums for marginal probabilities
Cx = sum(Kx);
Cy = sum(Ky);

% Kernel product for joint probabilities
Kxy = Kx.*Ky;

f = sum(Cx.*Cy)*sum(Kxy)./(Cx*Ky)./(Cy*Kx);
I = mean(log(f(ind)));

end

--------------------------------------------------------------------------------
/matlab code/main.m:
--------------------------------------------------------------------------------
% Algorithm to (try to) select the features(genes) using kernelPLS method.
% Function name: main.m
% Authors: S.Q. Sun and Q.K. Peng
% Affiliation: Xi'an Jiaotong University
% Contact: ssqsxf@stu.xjtu.edu.cn
% Release date: April 20, 2014
% Version: 1.0
%% ======================================
clear all
clc
                     %-------------------%
load AMLALL_nature   % loading dataset   %
% load Lymphoma      %-------------------%

%%
num_SelectedGenes = input('How many genes you want to select? \nPlease input here:');


% mean:0,std:1                 %-------------------%
X = normalizemeanstd( xapp );  % normalizing       %
                               %-------------------%

%============================================================%
%                KERNEL PARTIAL LEAST SQUARES                %
%============================================================%
Y = binarize( yapp );

% number of components
num_Component = 10;
             %---------------------%
alpha = 1;   % parameter setting   %
coef = 0.1;  %---------------------%
             %---------------------%
             % polynomial kernel   %
Kxx = kernel( X, X, 'polynomial', alpha, coef );  %---------------------%
Kxy = kernel( X, X([1:2:size(X,1)], : ), 'polynomial', alpha, coef );
             %---------------------%
             % gaussian kernel     %
% Kxx = kernel( X, X, 'gaussian' );  %---------------------%
% Kxy = kernel( X, X([1:2:size(X,1)],:), 'gaussian' );


[ kplsXS ] = kernelPLS( Kxx, Kxy, Y, num_Component );

kX0 = X - ones( size(X,1), 1 )*mean( X );
kWeight = pinv( kX0 )*kplsXS;

kVIP = calVIP( Y, kplsXS( :, 1:num_Component ), kWeight( :, 1:num_Component ) );

[ ~, FeatureRank ] = sort( kVIP, 'descend' );


for i = 1:num_SelectedGenes
    SelectedGenes{ i } = GeneNames{ FeatureRank( i ) };
end
SelectedGenes

--------------------------------------------------------------------------------
/matlab code/normalizemeanstd.m:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sqsun/kernelPLS/41b2efcfe1063ebc8ce09ce190f97235b0395319/matlab code/normalizemeanstd.m
--------------------------------------------------------------------------------
/matlab code/ssqd.m:
--------------------------------------------------------------------------------
function d = ssqd(X)
% Taken from Matlab pdist.m
% Computes pairwise sum of squared differences between columns of X
% Use squareform(.)
% to convert to square symmetric distance matrix
[p,n] = size(X);
d = zeros(1,n*(n-1)./2);
k = 1;
for i = 1:n-1
    ssq = zeros(1, n-i);
    for q = 1:p
        ssq = ssq + (X(q, i) - X(q,(i+1):n)).^2;
    end
    d(k:(k+n-i-1)) = ssq;
    k = k + (n-i);
end
end
--------------------------------------------------------------------------------
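
Note on the kernel settings in main.m (a small sketch, not part of the original repository): main.m calls kernel.m with alpha = 1 and coef = 0.1. After the transposes inside kernel.m, the polynomial branch evaluates (X*Y' + coef).^par1 on the original row-wise data, so with an exponent of 1 it appears to reduce to a linear kernel plus a constant offset. The toy matrix X below is made up for illustration; only alpha, coef, and the kernel formula are taken from the source. The formula is repeated inline so the snippet runs on its own in MATLAB or GNU Octave.

```matlab
% Sketch only: repeats the polynomial branch of kernel.m inline,
% using the parameter values set in main.m (alpha = 1, coef = 0.1).
X = [1 2; 3 4; 5 6];       % hypothetical toy data: 3 samples (rows) x 2 features
alpha = 1;                 % exponent of the polynomial kernel, as in main.m
coef  = 0.1;               % additive constant, as in main.m

% What kernel( X, X, 'polynomial', alpha, coef ) computes on row-wise data
K_poly = (X*X' + coef).^alpha;

% Linear kernel (Gram matrix) shifted by the same constant
K_linear = X*X' + coef;

% With alpha == 1 the two matrices agree up to rounding
assert(max(abs(K_poly(:) - K_linear(:))) < 1e-12);

% For comparison, an exponent of 2 gives a genuinely non-linear kernel
K_quad = (X*X' + coef).^2;
assert(any(abs(K_quad(:) - K_linear(:)) > 1e-12));
```

To try a non-linear polynomial kernel in main.m, raising alpha above 1 would be the relevant change; the commented-out gaussian calls are the other option the source provides.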