├── README.md
├── ATS
│   ├── LICENSE
│   ├── README.md
│   ├── ATS_SVM_FS.py
│   ├── gen_testing_set.py
│   └── prepare_ABC_sets.py
├── PPD
│   └── ppd_cose.c
└── MA_PPD
    └── MA_PPD.py

/README.md:
--------------------------------------------------------------------------------
1 | 
2 | 
3 | ## Code from my research papers
4 | 
5 | * [ATS](ATS/): Implementation of the classification method for steganalysis
6 | proposed in the paper
7 | [*Unsupervised steganalysis based on Artificial Training Sets* (2016)](http://www.sciencedirect.com/science/article/pii/S0952197616000026),
8 | [[arxiv](https://arxiv.org/abs/1703.00796)].
9 | 
10 | * [MA_PPD](MA_PPD/): Implementation of the manifold alignment techniques proposed in the paper
11 | [*Manifold alignment approach to cover source mismatch in steganalysis* (2016)](http://daniellerch.me/doc/dlerch2016ma.pdf).
12 | 
13 | * [PPD](PPD/): Implementation of the feature extractor proposed in the paper
14 | [*LSB Matching Steganalysis Based on Patterns of Pixel Differences and
15 | Random Embedding* (2013)](http://www.sciencedirect.com/science/article/pii/S0167404812001745),
16 | [[arxiv](https://arxiv.org/abs/1703.00817)].
17 | 
18 | 
19 | 
--------------------------------------------------------------------------------
/ATS/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2016 Daniel Lerch
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/ATS/README.md:
--------------------------------------------------------------------------------
1 | ## ATS Steganalysis
2 | 
3 | Implementation of the method proposed in the paper [Unsupervised steganalysis based on Artificial Training Sets](https://www.sciencedirect.com/science/article/abs/pii/S0952197616000026).
4 | 
5 | ### Example:
6 | 
7 | #### The testing set:
8 | 
9 | First, we need to prepare a testing set.
10 | 
11 | ```bash
12 | $ ./gen_testing_set.py
13 | ./gen_testing_set.py <cover dir> <stego percentage> <output dir> <algorithm> <bitrate>
14 | ```
15 | 
16 | We need cover images, the steganographic algorithm and bitrate that we want to use, and the percentage of stego images that we want in the testing set. In this example we use 500 images from BossBase 1.01.
17 | 18 | http://dde.binghamton.edu/download/ImageDB/BOSSbase_1.01.zip 19 | 20 | And we use as a steganographic algorithm HUGO with a 0.4 bitrate and s 50% of stego images: 21 | 22 | http://dde.binghamton.edu/download/stego_algorithms/ 23 | 24 | Remember to download and compile the steganographic tools that you need. You can change the path of the tools in the config section inside the scripts. 25 | 26 | This is the command to generate the testing set: 27 | 28 | ```bash 29 | ./gen_testing_set.py pgm_cover_images 50 out HUGO 0.4 30 | ``` 31 | 32 | 33 | #### A, B anc C sets: 34 | 35 | The second step is to generate A, B and C sets and extract features using Rich Models: 36 | 37 | http://dde.binghamton.edu/download/feature_extractors/ 38 | 39 | Again, we need to download and compile the tools. 40 | 41 | This is the command to prepare the A, B and C sets: 42 | 43 | ```bash 44 | ./prepare_ABC_sets.py out/HUGO_0.4_boss500_50/ out/ HUGO 0.4 45 | ``` 46 | 47 | #### Classification: 48 | 49 | The last step is to classify into cover and stego. 50 | 51 | ```bash 52 | $ ./ATS_SVM_FS.py 53 | ./ATS_SVM_FS.py 54 | ``` 55 | If we do not give the labels to the script it performs a prediction. But in our case, as far as we know the labels of the testing set, the script can calculate the accuracy of the prediction: 56 | 57 | ```bash 58 | $ ./ATS_SVM_FS.py 59 | ./ATS_SVM_FS.py out/ATS_RM_HUGO_0.4_boss500_50/A_COMMON/ out/ATS_RM_HUGO_0.4_boss500_50/B_HUGO_040 out/ATS_RM_HUGO_0.4_boss500_50/C_HUGO_040 out/HUGO_0.4_boss500_50/labels.txt 60 | Accuracy: 0.828 61 | ``` 62 | 63 | 64 | -------------------------------------------------------------------------------- /ATS/ATS_SVM_FS.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python -W ignore 2 | 3 | from __future__ import print_function 4 | import os 5 | import sys 6 | import glob 7 | import tarfile 8 | from random import randint 9 | import numpy 10 | from sklearn import svm 11 | from sklearn.grid_search import GridSearchCV 12 | from sklearn.metrics import roc_curve, auc, roc_auc_score 13 | from sklearn.feature_selection import SelectPercentile, f_classif, chi2, SelectKBest 14 | 15 | 16 | # {{{ untar_to_tmpdir() 17 | def untar_to_tmpdir(tar_file): 18 | 19 | dirname=os.path.basename(os.path.splitext(tar_file)[0]) 20 | tmpdir1="/tmp/"+str(random.randint(10000000,99999999)) 21 | tmpdir2="/tmp/"+str(random.randint(10000000,99999999)) 22 | tar = tarfile.open(tar_file) 23 | tar.extractall(tmpdir1) 24 | tar.close() 25 | 26 | # remove one level 27 | shutil.move(tmpdir1+'/'+dirname, tmpdir2) 28 | shutil.rmtree(tmpdir1) 29 | 30 | return tmpdir2 31 | # }}} 32 | 33 | # {{{ read_SRM() 34 | def read_SRM(path): 35 | 36 | remove_path=False 37 | if not os.path.isdir(path): 38 | path=untar_to_tmpdir(path) 39 | remove_path=True 40 | 41 | directory = glob.glob(path+"/*") 42 | 43 | submodel_X={} 44 | submodel_names={} 45 | 46 | i=0 47 | for d in directory: 48 | i+=1 49 | files = glob.glob(d+"/*.fea") 50 | for f in files: 51 | model_name=os.path.splitext(os.path.basename(f))[0] 52 | features=open(f, 'r').readlines()[0].split(' '); 53 | 54 | if model_name not in submodel_X.keys(): 55 | submodel_X[model_name]=[] 56 | submodel_names[model_name]=[] 57 | 58 | 59 | submodel_names[model_name].append(os.path.basename(d)) 60 | 61 | fea_line=[] 62 | features.pop() 63 | for field in features: 64 | try: 65 | fea_line.append(float(field)) 66 | except: 67 | pass 68 | 69 | if len(submodel_X[model_name]) == 0: 70 | submodel_X[model_name]=[] 71 | 
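            # collect this image's parsed feature row under its submodel name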
submodel_X[model_name].append(fea_line) 72 | 73 | for k in submodel_X.keys(): 74 | submodel_X[k]=numpy.array(submodel_X[k]) 75 | 76 | if remove_path: 77 | shutil.rmtree(path) 78 | 79 | return submodel_X, submodel_names 80 | # }}} 81 | 82 | # {{{ read_SRM_ABC() 83 | def read_SRM_ABC( pathA, pathB, pathC): 84 | 85 | A, A_names=read_SRM(pathA) 86 | B, B_names=read_SRM(pathB) 87 | C, C_names=read_SRM(pathC) 88 | 89 | full_A=[] 90 | full_B=[] 91 | full_C=[] 92 | names=[] 93 | for k in A.keys(): 94 | if len(full_A)==0: 95 | full_A=A[k] 96 | full_B=B[k] 97 | full_C=C[k] 98 | names=A_names[k] 99 | continue 100 | 101 | full_A=numpy.append(full_A, A[k], axis=1) 102 | full_B=numpy.append(full_B, B[k], axis=1) 103 | full_C=numpy.append(full_C, C[k], axis=1) 104 | 105 | return full_A, full_B, full_C, names 106 | # }}} 107 | 108 | # {{{ grid_search() 109 | def grid_search(X, y): 110 | 111 | # Set the parameters by cross-validation 112 | tuned_parameters = [{'kernel': ['rbf'], 113 | 'gamma': [1e+3,1e-2,1e-1,1e-0,1e-1,1e-2,1e-3,1e-4], 114 | 'C': [0.25,0.5,1,10,100,10000]}] 115 | 116 | clf = GridSearchCV(svm.SVC(C=1), tuned_parameters) 117 | clf.fit(X, y) 118 | 119 | best_score=0 120 | best_params={} 121 | for params, mean_score, scores in clf.grid_scores_: 122 | if mean_score>best_score: 123 | best_score=mean_score 124 | best_params=params 125 | 126 | #print "best_score: %r" % best_score 127 | #print "best_params: %r" % best_params 128 | 129 | return best_params 130 | # }}} 131 | 132 | if len(sys.argv) < 4: 133 | print("%s \n" % sys.argv[0]) 134 | sys.exit(0) 135 | 136 | directory_A=sys.argv[1] 137 | directory_B=sys.argv[2] 138 | directory_C=sys.argv[3] 139 | 140 | A, B, C, names = read_SRM_ABC(directory_A, directory_B, directory_C) 141 | 142 | X=numpy.vstack((A, C)) 143 | Xt=numpy.hstack(([0]*len(A), [1]*len(C))) 144 | 145 | selector = SelectKBest(f_classif, k=500) 146 | selector.fit(X, Xt) 147 | X=selector.transform(X) 148 | B=selector.transform(B) 149 | 150 | 151 | pm = grid_search(X, Xt) 152 | clf = svm.SVC(kernel=pm['kernel'], C=pm['C'], gamma=pm['gamma']) 153 | clf.fit(X, Xt) 154 | 155 | Z = clf.predict(B) 156 | 157 | # Calculate accuracy 158 | if len(sys.argv)==5 and os.path.exists(sys.argv[4]): 159 | with open(sys.argv[4], 'r') as f: 160 | lines = f.read().splitlines() 161 | d={} 162 | for l in lines: 163 | pair=l.split(":") 164 | d[pair[0]]=pair[1] 165 | 166 | ok=0 167 | for i in range(len(Z)): 168 | if int(d[names[i]]) == Z[i]: ok+=1 169 | 170 | print("Accuracy: ", float(ok)/len(Z)) 171 | 172 | 173 | # Make a prediction 174 | else: 175 | for i in range(len(Z)): 176 | r='cover' 177 | if Z[i]==1: r='stego' 178 | print(names[i], r) 179 | 180 | 181 | 182 | 183 | -------------------------------------------------------------------------------- /PPD/ppd_cose.c: -------------------------------------------------------------------------------- 1 | 2 | /* Compilation: 3 | $ gcc ppd_cose.c -ltiff 4 | */ 5 | 6 | 7 | 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | 14 | #define S 4 15 | 16 | 17 | // {{{ matrix_alloc() 18 | int **matrix_alloc(size_t cols, size_t rows) 19 | { 20 | int i, j; 21 | int **m = (int**)malloc(sizeof(m) * cols); 22 | if(!m) 23 | { 24 | perror("out of memory (cols)"); 25 | return NULL; 26 | } 27 | 28 | for(i=0; i0)) 82 | { 83 | // 0 or 1? 84 | int bit = rand()%2; 85 | 86 | // + or -? 
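        // LSB matching: when the pixel's LSB has to change, it is moved up or down by 1 at random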
87 | int s = -1; 88 | if(rand()%2==0) 89 | s = 1; 90 | 91 | // Bit insertion 92 | if(rand()%br==0) 93 | { 94 | if(bit==0 && matrix[i][j]%2==1) 95 | matrix[i][j] += s; 96 | 97 | if(bit==1 && matrix[i][j]%2==0) 98 | matrix[i][j] += s; 99 | } 100 | } 101 | } 102 | } 103 | } 104 | // }}} 105 | 106 | // {{{ count_patterns() 107 | void count_patterns(int shapes[S][S][S][S], int **I, int cols, int rows) 108 | { 109 | int i, j, k, l, x, y; 110 | 111 | // initialize shapes 112 | for(i=0; i=S?S-1:l-mn); 146 | int i2 = (c1-mn>=S?S-1:c1-mn); 147 | int i3 = (c2-mn>=S?S-1:c2-mn); 148 | int i4 = (r-mn>=S?S-1:r-mn); 149 | 150 | shapes[i1][i2][i3][i4]++; 151 | 152 | if(_a>mx) { mx = _a; l=_b; r=_c; c1=_d; c2=_e; } 153 | if(_b>mx) { mx = _b; l=_d; r=_a; c1=_c; c2=_e; } 154 | if(_c>mx) { mx = _c; l=_a; r=_e; c1=_b; c2=_d; } 155 | if(_d>mx) { mx = _d; l=_e; r=_b; c1=_c; c2=_a; } 156 | if(_e>mx) { mx = _e; l=_c; r=_d; c1=_a; c2=_b; } 157 | 158 | i1 = (mx-l>=S?S-1:mx-l); 159 | i2 = (mx-c1>=S?S-1:mx-c1); 160 | i3 = (mx-c2>=S?S-1:mx-c2); 161 | i4 = (mx-r>=S?S-1:mx-r); 162 | 163 | shapes[i1][i2][i3][i4]++; 164 | } 165 | } 166 | 167 | } 168 | // }}} 169 | 170 | 171 | int main(int argc, char* argv[]) 172 | { 173 | uint32 rows; 174 | tsize_t cols; 175 | tdata_t buf; 176 | uint32 x; 177 | uint32 y; 178 | 179 | 180 | if(argc!=2) 181 | { 182 | printf("Usage: %s \n", argv[0]); 183 | return -1; 184 | } 185 | 186 | srand(time(NULL)); 187 | //srand(0); 188 | 189 | TIFF* tif_i = TIFFOpen(argv[1], "r"); 190 | if (!tif_i) 191 | { 192 | printf("Error reading Tiff image\n"); 193 | return -1; 194 | } 195 | 196 | TIFFGetField(tif_i, TIFFTAG_IMAGELENGTH, &rows); 197 | cols = TIFFScanlineSize(tif_i); 198 | buf = _TIFFmalloc(cols); 199 | 200 | 201 | // Matrix I 202 | int **I = matrix_alloc(cols, rows); 203 | 204 | for (y = 0; y < rows; y++) 205 | { 206 | TIFFReadScanline(tif_i, buf, y, 0); 207 | for (x = 0; x < cols; x++) 208 | { 209 | unsigned char *bytes = buf; 210 | I[x][y]=bytes[x]; 211 | } 212 | } 213 | 214 | _TIFFfree(buf); 215 | TIFFClose(tif_i); 216 | 217 | 218 | int i, j, k, l; 219 | 220 | 221 | int shapes[S][S][S][S]; 222 | int shapes_s[S][S][S][S]; 223 | 224 | 225 | // Create Is, inserting a random message 226 | int **Is = matrix_alloc(cols, rows); 227 | matrix_copy(Is, I, cols, rows); 228 | message_hide_random_br(Is, cols, rows, 1); 229 | 230 | 231 | count_patterns(shapes, I, cols, rows); 232 | count_patterns(shapes_s, Is, cols, rows); 233 | 234 | 235 | float R[S*S*S*S]; 236 | float mx=0; 237 | float mn=10; 238 | int idx=0; 239 | 240 | for(i=0; i0) 252 | f = stego / cover; 253 | 254 | if(f>mx) mx=f; 255 | if(f> CONFIGURATION >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 15 | 16 | # PATHS USED BY STEGANOGRAPHY TOOLS 17 | # Available at http://dde.binghamton.edu/download/stego_algorithms/ 18 | HUGO_BIN="bin/HUGO_like" 19 | WOW_BIN="bin/WOW" 20 | UNIW_BIN="bin/S-UNIWARD" 21 | 22 | # Number of concurrent processes 23 | NUMBER_OF_PROCESSES=cpu_count() 24 | #NUMBER_OF_PROCESSES=4 25 | 26 | 27 | # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 28 | 29 | # {{{ process_embedding() 30 | def process_embedding(f_pgm, algo, br, f_dst, output_dir): 31 | hide_message(f_pgm, algo, br, output_dir) 32 | os.rename(f_pgm, f_dst) 33 | 34 | # }}} 35 | 36 | # {{{ to_tmp_pgm() 37 | def to_tmp_pgm(f, output_dir): 38 | fn=extract_name_from_file(f) 39 | tmp=output_dir+"/tmp/"+fn+"_"+str(random.randint(10000000,99999999))+".pgm" 40 | I=misc.imread(f); 41 | misc.imsave(tmp, I); 
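    # work on a temporary PGM copy; the random suffix avoids name collisions between worker processes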
42 | return tmp 43 | # }}} 44 | 45 | # {{{ read_image_filenames() 46 | def read_image_filenames(base_dir): 47 | files = glob.glob(base_dir+"/*.pgm") # +glob.glob(base_dir+"/*.tif") 48 | return files 49 | # }}} 50 | 51 | # {{{ extract_name_from_file() 52 | def extract_name_from_file(f): 53 | fn=os.path.basename(f) 54 | fn=os.path.splitext(fn)[0] 55 | return fn 56 | # }}} 57 | 58 | # {{{ hide_message() 59 | def hide_message(f, algo, br, output): 60 | 61 | global HUGO_BIN 62 | global WOW_BIN 63 | global UNIW_BIN 64 | br_str=str(br) 65 | 66 | seed=str( random.randint(-(2**31-1), 2**31-1) ) 67 | 68 | if algo=="HUGO": 69 | if not os.path.exists(HUGO_BIN): 70 | print("FATAL ERROR: command not found:", HUGO_BIN) 71 | return -1 72 | 73 | filename=extract_name_from_file(f) 74 | rdir=output+"/tmp/out_"+str(random.randint(10000000,99999999)) 75 | os.makedirs(rdir) 76 | os.system(HUGO_BIN+" -r "+seed+" -i "+f+" -O "+rdir+" -a "+br_str) 77 | os.rename(rdir+"/"+os.path.basename(f), f) 78 | shutil.rmtree(rdir) 79 | 80 | elif algo=="UNIW": 81 | if not os.path.exists(UNIW_BIN): 82 | print("FATAL ERROR: command not found:", UNIW_BIN) 83 | return -1 84 | 85 | filename=extract_name_from_file(f) 86 | rdir=output+"/tmp/out_"+str(random.randint(10000000,99999999)) 87 | os.makedirs(rdir) 88 | os.system(UNIW_BIN+" -r "+seed+" -i "+f+" -O "+rdir+" -a "+br_str) 89 | os.rename(rdir+"/"+os.path.basename(f), f) 90 | shutil.rmtree(rdir) 91 | 92 | elif algo=="WOW": 93 | if not os.path.exists(WOW_BIN): 94 | print("FATAL ERROR: command not found:", WOW_BIN) 95 | return -1 96 | 97 | filename=extract_name_from_file(f) 98 | rdir=output+"/tmp/out_"+str(random.randint(10000000,99999999)) 99 | os.makedirs(rdir) 100 | os.system(WOW_BIN+" -r "+seed+" -i "+f+" -O "+rdir+" -a "+br_str) 101 | os.rename(rdir+"/"+os.path.basename(f), f) 102 | shutil.rmtree(rdir) 103 | 104 | else: 105 | print("FATAL ERROR: Unknown algorithm") 106 | return -1 107 | # }}} 108 | 109 | # {{{ gen_testing_set() 110 | def gen_testing_set(cover_dir, perc_stego, algo, bitrate, output_dir): 111 | 112 | global NUMBER_OF_PROCESSES 113 | pool=Pool(processes=NUMBER_OF_PROCESSES) 114 | 115 | if not os.path.isdir(output_dir): 116 | os.mkdir(output_dir) 117 | 118 | if not os.path.isdir(output_dir+'/tmp'): 119 | os.mkdir(output_dir+'/tmp') 120 | 121 | if not os.path.isdir(cover_dir): 122 | print("FATAL ERROR: cover dir does not exists:", cover_dir) 123 | sys.exit(0) 124 | 125 | if not os.path.isdir(output_dir): 126 | os.mkdir(output_dir) 127 | 128 | 129 | if cover_dir[-1]=='/': 130 | cover_dir=cover_dir[:-1] 131 | image_dir=output_dir+'/'+algo+'_'+str(bitrate)+'_'+\ 132 | os.path.basename(cover_dir)+'_'+str(perc_stego) 133 | 134 | if os.path.isdir(image_dir): 135 | print("FATAL ERROR: image dir already exists:", image_dir) 136 | sys.exit(0) 137 | 138 | os.mkdir(image_dir) 139 | out_cover_dir=image_dir+'/cover' 140 | os.mkdir(out_cover_dir) 141 | out_stego_dir=image_dir+'/stego' 142 | os.mkdir(out_stego_dir) 143 | 144 | files = read_image_filenames(cover_dir) 145 | n=1 146 | for f in files: 147 | sys.stdout.flush() 148 | fname=extract_name_from_file(f) 149 | f_pgm=to_tmp_pgm(f, output_dir) 150 | 151 | try: 152 | # Use as stego images the percentage requested and move to output dir 153 | if n<=len(files)*float(perc_stego)/100: 154 | f_dst=out_stego_dir+'/'+fname+'.pgm' 155 | with open(image_dir+"/labels.txt", "a+") as myfile: 156 | myfile.write(fname+":1\n") 157 | 158 | r=pool.apply_async(process_embedding, 159 | args=( f_pgm, algo, bitrate, f_dst, output_dir)) 160 | 161 | 
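            # (embedding runs asynchronously in the worker pool; the stego copy ends up in out_stego_dir)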
# Move cover images to output dir 162 | else: 163 | with open(image_dir+"/labels.txt", "a+") as myfile: 164 | myfile.write(fname+":0\n") 165 | os.rename(f_pgm, out_cover_dir+'/'+fname+'.pgm') 166 | 167 | except Exception, e: 168 | print("Error: "+str(e)) 169 | pass 170 | 171 | n+=1 172 | 173 | pool.close() 174 | pool.join() 175 | 176 | # }}} 177 | 178 | # {{{ main() 179 | def main(): 180 | if len(sys.argv) < 6: 181 | print("%s \n" % sys.argv[0]) 182 | sys.exit(0) 183 | 184 | cover_dir=sys.argv[1] 185 | perc_stego=sys.argv[2] 186 | output_dir=sys.argv[3] 187 | algo=sys.argv[4] 188 | bitrate=float(sys.argv[5]) 189 | 190 | gen_testing_set(cover_dir, perc_stego, algo, bitrate, output_dir) 191 | # }}} 192 | 193 | 194 | if __name__ == "__main__": 195 | main() 196 | 197 | 198 | -------------------------------------------------------------------------------- /MA_PPD/MA_PPD.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | 3 | from __future__ import print_function 4 | 5 | import sys 6 | import multiprocessing 7 | 8 | from numpy import * 9 | 10 | import scipy 11 | import scipy.spatial.distance as sd 12 | 13 | from sklearn import neighbors 14 | from sklearn import svm 15 | from sklearn.model_selection import GridSearchCV 16 | from sklearn import preprocessing 17 | 18 | MAX_PROC=8 19 | 20 | # {{{ read_features() 21 | def read_features(fea_file): 22 | 23 | fin=open(fea_file, 'r') 24 | lines = fin.readlines() 25 | 26 | fea=[] 27 | label=[] 28 | 29 | i=0 30 | for l in lines: 31 | features=l.split(','); 32 | if "stego" in features[len(features)-1]: 33 | label.append(1) 34 | else: 35 | label.append(0) 36 | 37 | fea_line=[] 38 | features.pop() 39 | for field in features: 40 | if len(field)>0: 41 | fea_line.append(float(field)) 42 | 43 | fea.append(fea_line) 44 | 45 | X = array(fea) 46 | Xt = array(label) 47 | 48 | return X, Xt 49 | # }}}} 50 | 51 | # {{{ Metric() 52 | class Metric(object): 53 | def __init__(self,dist,name): 54 | self.dist = dist # dist(x,y): distance between two points 55 | self.name = name 56 | 57 | def within(self,A): 58 | '''pairwise distances between each pair of rows in A''' 59 | return sd.squareform(sd.pdist(A,self.name),force='tomatrix') 60 | 61 | def between(self,A,B): 62 | '''cartesian product distances between pairs of rows in A and B''' 63 | return sd.cdist(A,B,self.name) 64 | 65 | def pairwise(self,A,B): 66 | '''distances between pairs of rows in A and B''' 67 | return np.array([self.dist(a,b) for a,b in izip(A,B)]) 68 | 69 | 70 | SquaredL2 = Metric(sd.sqeuclidean,'sqeuclidean') 71 | 72 | # }}} 73 | 74 | 75 | # {{{ adjacency_matrix() 76 | # - W[i,j]=1 when the ith and jth points are neighbors. 77 | # - Otherwise Wij=0. 78 | def adjacency_matrix(X, k): 79 | # Distances 80 | metric=SquaredL2 81 | dist = metric.within(X) 82 | 83 | adj = zeros(dist.shape) 84 | 85 | # k-nearest neighbors 86 | nn = argsort(dist)[:,:min(k+1,len(X))] 87 | 88 | # nn's first column is the point idx, rest are neighbor idxs 89 | for idx in nn: 90 | adj[idx[0],idx[1:]] = 1 91 | adj[idx[1:],idx[0]] = 1 92 | 93 | n = X.shape[0] 94 | for i in range(n): 95 | for j in range(n): 96 | # geodesic distance inside the manifold 97 | hk = exp(-(linalg.norm(X[i]-X[j])**2)) 98 | if adj[i,j]==1: 99 | adj[i,j]*=hk 100 | 101 | return adj 102 | # }}} 103 | 104 | # {{{ adjacency_matrix_similarity() 105 | # - W[i,j]=1 when the ith and jth points are neighbors. 106 | # - Otherwise Wij=0. 
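# - Neighbouring edges are additionally weighted by the heat kernel exp(-||X[i]-X[j]||^2),
# - scaled by ms when both samples share a label and by md otherwise,
# - so same-class neighbours end up strongly connected in the graph.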
107 | def adjacency_matrix_similarity(X, Xt, k, ms, md): 108 | # Distances 109 | metric=SquaredL2 110 | dist = metric.within(X) 111 | 112 | adj = zeros(dist.shape) 113 | 114 | # k-nearest neighbors 115 | nn = argsort(dist)[:,:min(k+1,len(X))] 116 | 117 | # nn's first column is the point idx, rest are neighbor idxs 118 | for idx in nn: 119 | adj[idx[0],idx[1:]] = 1 120 | adj[idx[1:],idx[0]] = 1 121 | 122 | n = X.shape[0] 123 | for i in range(n): 124 | for j in range(n): 125 | # geodesic distance inside the manifold 126 | hk = exp(-(linalg.norm(X[i]-X[j])**2)) 127 | if Xt[i]==Xt[j]: 128 | hk *= ms; 129 | else: 130 | hk *= md; 131 | 132 | if adj[i,j]==1: 133 | adj[i,j]*=hk 134 | 135 | return adj 136 | # }}} 137 | 138 | # {{{ laplacian() 139 | # L=D-W 140 | def laplacian(W): 141 | 142 | n_nodes = W.shape[0] 143 | lap = -asarray(W) # minus sign leads to a copy 144 | # set diagonal to zero, in case it isn't already 145 | lap.flat[::n_nodes + 1] = 0 146 | d = -lap.sum(axis=0) # re-negate to get positive degrees 147 | 148 | # put the degrees on the diagonal 149 | lap.flat[::n_nodes + 1] = d 150 | return lap 151 | # }}} 152 | 153 | # {{{ svm_grid_search() 154 | def svm_grid_search(X, Xt): 155 | 156 | # Set the parameters by cross-validation 157 | tuned_parameters = [{'kernel': ['rbf'], 158 | 'gamma': [1e+3,1e-2,1e-1,1e-0,1e-1,1e-2,1e-3,1e-4], 159 | 'C': [0.25,0.5,1,10,100,10000]}] 160 | 161 | clf = GridSearchCV(svm.SVC(C=1), tuned_parameters, n_jobs=-1) 162 | clf.fit(X, Xt) 163 | return clf.best_params_ 164 | # }}} 165 | 166 | # {{{ svm_accuracy() 167 | def svm_accuracy(X, Xt, Y, Yt): 168 | 169 | n = X.shape[0] 170 | 171 | pm=svm_grid_search(X, Xt) 172 | clf=svm.SVC(kernel=pm['kernel'],C=pm['C'],gamma=pm['gamma'],probability=True) 173 | clf.fit(X, Xt) 174 | Yt2 = clf.predict(Y) 175 | 176 | cnt=0 177 | for i in range(n): 178 | if Yt[i]==Yt2[i]: 179 | cnt=cnt+1 180 | 181 | return 100*float(cnt)/n 182 | # }}} 183 | 184 | # {{{ domain_adaptation() 185 | def domain_adaptation(X, Xt, Y, d, k1, k2, eps=1e-8): 186 | 187 | n = X.shape[0] 188 | 189 | 190 | pm=svm_grid_search(X, Xt) 191 | clf=svm.SVC(kernel=pm['kernel'],C=pm['C'],gamma=pm['gamma'],probability=True) 192 | clf.fit(X, Xt) 193 | 194 | Ty = clf.predict(Y) 195 | Py = clf.predict_proba(Y) 196 | Py = array([a for (a, b) in Py]) 197 | Iy = array([i for i in range(n)]) 198 | zipped=zip(Py, Ty, Iy) 199 | zipped.sort() 200 | Py = array([a for (a, b, c) in zipped]) 201 | Ty = array([b for (a, b, c) in zipped]) 202 | Iy = array([c for (a, b, c) in zipped]) 203 | 204 | Tx = clf.predict(X) 205 | Px = clf.predict_proba(X) 206 | Px = array([a for (a, b) in Px]) 207 | Ix = array([i for i in range(n)]) 208 | zipped=zip(Px, Tx, Ix) 209 | zipped.sort() 210 | Px = array([a for (a, b, c) in zipped]) 211 | Tx = array([b for (a, b, c) in zipped]) 212 | Ix = array([c for (a, b, c) in zipped]) 213 | 214 | # Local geometry (min cost) 215 | Wx = adjacency_matrix_similarity(X, Xt, k1, 1000, 0.0010) 216 | Wy = adjacency_matrix(Y, k1) 217 | Wxy = zeros(shape=(n,n)) 218 | for i in range(n): 219 | # We can not use geodesic distance because they are in different manifolds 220 | Wxy[Ix[i], Iy[i]]=1 221 | 222 | W=asarray(bmat(((Wx, Wxy),(Wxy.T, Wy)))) 223 | Ll = laplacian(W) 224 | 225 | # Linear algebra 226 | #vals,vecs = scipy.linalg.eig(A, B) 227 | vals,vecs = scipy.linalg.eig(Ll) 228 | idx = argsort(vals) 229 | for i in xrange(len(idx)): 230 | if vals[idx[i]] >= eps: 231 | break 232 | vecs = vecs.real[:,idx[i:]] 233 | 234 | # Normalization 235 | for i in xrange(vecs.shape[1]): 
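        # rescale each kept eigenvector to unit length before using it as embedding coordinates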
236 | vecs[:,i] /= linalg.norm(vecs[:,i]) 237 | 238 | # New Coordinates 239 | n1=X.shape[0] 240 | n2=Y.shape[0] 241 | map1 = vecs[ : n1, : d] 242 | map2 = vecs[n1 : n1+n2, : d] 243 | 244 | return map1,map2 245 | 246 | # }}} 247 | 248 | 249 | 250 | if __name__ == '__main__': 251 | 252 | if len(sys.argv)!=3: 253 | print("Usage: ") 254 | sys.exit(0) 255 | 256 | d=2 # dimensions 257 | X, Xt = read_features(sys.argv[1]) 258 | Y, Yt = read_features(sys.argv[2]) 259 | k1=int(round(sqrt(X.shape[0]))) 260 | k2=k1 # number of neighbors 261 | noDA_acc = svm_accuracy(X, Xt, Y, Yt) 262 | Xnew, Ynew = domain_adaptation(X, Xt, Y, d, k1, k2) 263 | DA_acc = svm_accuracy(Xnew, Xt, Ynew, Yt) 264 | 265 | print("no Da:",noDA_acc, " DA:",DA_acc) 266 | 267 | 268 | 269 | 270 | 271 | 272 | -------------------------------------------------------------------------------- /ATS/prepare_ABC_sets.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python -W ignore 2 | # -*- coding: utf-8 -*- 3 | 4 | from __future__ import print_function 5 | import sys 6 | import os 7 | import shutil 8 | import glob 9 | import random 10 | from multiprocessing import Pool, cpu_count 11 | from scipy import misc 12 | 13 | 14 | # >> CONFIGURATION >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 15 | 16 | # PATHS USED BY STEGANOGRAPHY TOOLS 17 | # Available at http://dde.binghamton.edu/download/stego_algorithms/ 18 | HUGO_BIN="bin/HUGO_like" 19 | WOW_BIN="bin/WOW" 20 | UNIW_BIN="bin/S-UNIWARD" 21 | 22 | # PATHS USED BY FEATURE EXTRACTORS 23 | # Available at http://dde.binghamton.edu/download/feature_extractors/ 24 | RM_BIN="bin/SRM" 25 | 26 | # Number of concurrent processes 27 | NUMBER_OF_PROCESSES=cpu_count() 28 | #NUMBER_OF_PROCESSES=4 29 | 30 | 31 | # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 32 | 33 | 34 | 35 | # {{{ process_extractor() 36 | def process_extractor(fea_ext, f, output_dir, name, remove=False): 37 | extract_features(fea_ext, f, output_dir, name, remove) 38 | # }}} 39 | 40 | # {{{ process_embed_and_extract() 41 | def process_embed_and_extract(fea_ext, f_pgm, 42 | name, algo, br, dirB, dirC, output_dir, remove): 43 | 44 | hide_and_extract(fea_ext, f_pgm, name, algo, br, dirB, dirC, output_dir, remove) 45 | # }}} 46 | 47 | # {{{ hide_and_extract() 48 | # hide info and extract features 49 | def hide_and_extract(fea_ext, f_pgm, name, algo, 50 | br, dirB, dirC, output_dir, remove=True): 51 | fn = extract_name_from_file(f_pgm) 52 | 53 | f_dst=output_dir+"/tmp/"+fn+"_"+str(random.randint(10000000,99999999))+".pgm" 54 | hide_message(f_pgm, algo, br, output_dir) 55 | os.rename(f_pgm, f_dst) 56 | extract_features(fea_ext, f_dst, dirB, name, False) 57 | 58 | hide_message(f_dst, algo, br, output_dir) 59 | extract_features(fea_ext, f_dst, dirC, name, remove) 60 | # }}} 61 | 62 | # {{{ extract_features() 63 | def extract_features(fea_ext, f, odir, name, remove=False): 64 | if fea_ext=="RM": 65 | os.makedirs(odir+"/"+name) 66 | os.system(RM_BIN+" -i "+f+" -O "+odir+"/"+name) 67 | 68 | else: 69 | print("FATAL ERROR: Unknown feature extractor:", fea_ext) 70 | return 0 71 | 72 | if remove: 73 | os.remove(f) 74 | # }}} 75 | 76 | # {{{ to_tmp_pgm() 77 | def to_tmp_pgm(f, output_dir): 78 | fn=extract_name_from_file(f) 79 | tmp=output_dir+"/tmp/"+fn+"_"+str(random.randint(10000000,99999999))+".pgm" 80 | I=misc.imread(f); 81 | misc.imsave(tmp, I); 82 | return tmp 83 | # }}} 84 | 85 | # {{{ read_image_filenames() 86 | def 
read_image_filenames(base_dir): 87 | files = glob.glob(base_dir+"/*.pgm")+glob.glob(base_dir+"/*/*.pgm") 88 | return files 89 | # }}} 90 | 91 | # {{{ extract_name_from_file() 92 | def extract_name_from_file(f): 93 | fn=os.path.basename(f) 94 | fn=os.path.splitext(fn)[0] 95 | return fn 96 | # }}} 97 | 98 | # {{{ hide_message() 99 | def hide_message(f, algo, br, output): 100 | 101 | global HUGO_BIN 102 | global WOW_BIN 103 | global UNIW_BIN 104 | br_str=str(br) 105 | 106 | seed=str( random.randint(-(2**31-1), 2**31-1) ) 107 | 108 | if algo=="HUGO": 109 | if not os.path.exists(HUGO_BIN): 110 | print("FATAL ERROR: command not found:", HUGO_BIN) 111 | return -1 112 | 113 | filename=extract_name_from_file(f) 114 | rdir=output+"/tmp/out_"+str(random.randint(10000000,99999999)) 115 | os.makedirs(rdir) 116 | os.system(HUGO_BIN+" -r "+seed+" -i "+f+" -O "+rdir+" -a "+br_str) 117 | os.rename(rdir+"/"+os.path.basename(f), f) 118 | shutil.rmtree(rdir) 119 | 120 | elif algo=="UNIW": 121 | if not os.path.exists(UNIW_BIN): 122 | print("FATAL ERROR: command not found:", UNIW_BIN) 123 | return -1 124 | 125 | filename=extract_name_from_file(f) 126 | rdir=output+"/tmp/out_"+str(random.randint(10000000,99999999)) 127 | os.makedirs(rdir) 128 | os.system(UNIW_BIN+" -r "+seed+" -i "+f+" -O "+rdir+" -a "+br_str) 129 | os.rename(rdir+"/"+os.path.basename(f), f) 130 | shutil.rmtree(rdir) 131 | 132 | elif algo=="WOW": 133 | if not os.path.exists(WOW_BIN): 134 | print("FATAL ERROR: command not found:", WOW_BIN) 135 | return -1 136 | 137 | filename=extract_name_from_file(f) 138 | rdir=output+"/tmp/out_"+str(random.randint(10000000,99999999)) 139 | os.makedirs(rdir) 140 | os.system(WOW_BIN+" -r "+seed+" -i "+f+" -O "+rdir+" -a "+br_str) 141 | os.rename(rdir+"/"+os.path.basename(f), f) 142 | shutil.rmtree(rdir) 143 | 144 | else: 145 | print("FATAL ERROR: Unknown algorithm") 146 | return -1 147 | # }}} 148 | 149 | # {{{ prepare_ABC_sets() 150 | def prepare_ABC_sets(input_dir, algo, br, output_dir, fea_ext='RM'): 151 | 152 | if input_dir[-1]=='/': 153 | input_dir=input_dir[:-1] 154 | label='ATS_'+fea_ext+'_'+os.path.basename(input_dir) 155 | 156 | if not os.path.isdir(input_dir): 157 | print ("FATAL ERROR: input dir does not exists:", input_dir) 158 | sys.exit(0) 159 | 160 | if not os.path.isdir(output_dir+'/'+label): 161 | os.mkdir(output_dir+'/'+label) 162 | 163 | pool=Pool(processes=NUMBER_OF_PROCESSES) 164 | 165 | # Al the A sets are the same. We only need one 166 | dirA=output_dir+'/'+label+'/A_COMMON' 167 | if os.path.isdir(dirA): 168 | print("Using data from cache:", dirA) 169 | 170 | else: 171 | os.mkdir(dirA) 172 | files = read_image_filenames(input_dir); 173 | n=1 174 | for f in files: 175 | print("A: Extracting", f, "image", n) 176 | sys.stdout.flush() 177 | fname=extract_name_from_file(f) 178 | 179 | # The set A is the original testing set. 
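            # (no embedding here: each image is only converted to a temporary PGM and passed to the feature extractor)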
180 | f_pgm=to_tmp_pgm(f, output_dir) 181 | r=pool.apply_async( 182 | process_extractor, 183 | args=( fea_ext, f_pgm, dirA, fname, True)) 184 | 185 | n+=1 186 | 187 | # Prepare sets B and C 188 | br_str=str(int(float(br)*100)).zfill(3) 189 | dirB=output_dir+'/'+label+'/B_'+algo+'_'+br_str 190 | dirC=output_dir+'/'+label+'/C_'+algo+'_'+br_str 191 | 192 | use_BC_cache=False 193 | if os.path.isdir(dirB) and os.path.isdir(dirC): 194 | print("Using data from cache:", dirB) 195 | print("Using data from cache:", dirC) 196 | use_BC_cache=True 197 | else: 198 | if os.path.isdir(dirB): 199 | shutil.rmtree(dirB) 200 | if os.path.isdir(dirC): 201 | shutil.rmtree(dirC) 202 | os.mkdir(dirB) 203 | os.mkdir(dirC) 204 | 205 | if not use_BC_cache: 206 | files = read_image_filenames(input_dir); 207 | n=1 208 | for f in files: 209 | print("BC: Embedding into", f, "image", n) 210 | sys.stdout.flush() 211 | fname=extract_name_from_file(f) 212 | 213 | # The set B is the set A with one embedding 214 | # The set C is the set A with two embedding 215 | try: 216 | f_pgm=to_tmp_pgm(f, output_dir) 217 | r=pool.apply_async(process_embed_and_extract, args=( 218 | fea_ext,f_pgm,fname,algo,br,dirB,dirC,output_dir,True)) 219 | 220 | except Exception, e: 221 | print("Exception hiding data into:", f, ",",str(e)) 222 | pass 223 | 224 | n+=1 225 | 226 | pool.close() 227 | pool.join() 228 | 229 | # }}} 230 | 231 | # {{{ main() 232 | def main(): 233 | if len(sys.argv) < 5: 234 | print("%s \n" % sys.argv[0]) 235 | sys.exit(0) 236 | 237 | input_dir=sys.argv[1] 238 | output_dir=sys.argv[2] 239 | algo=sys.argv[3] 240 | bitrate=float(sys.argv[4]) 241 | 242 | prepare_ABC_sets(input_dir, algo, bitrate, output_dir) 243 | # }}} 244 | 245 | 246 | if __name__ == "__main__": 247 | main() 248 | 249 | 250 | --------------------------------------------------------------------------------
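For quick reference, the three steps documented in ATS/README.md chain together as follows. This is a minimal sketch reusing the example arguments from that README (HUGO at 0.4 bpp, 50% stego images); it assumes the stego simulators and the SRM extractor are already compiled and their paths set in the CONFIGURATION sections of the scripts. Note that the output directory names are built from the algorithm, bitrate, cover-directory basename and stego percentage, so they will differ if your cover directory is not the one used in the README example.

```bash
# 1. Build a 50% stego testing set (creates cover/, stego/ and labels.txt)
./gen_testing_set.py pgm_cover_images 50 out HUGO 0.4

# 2. Generate the A, B and C sets and extract Rich Model features
./prepare_ABC_sets.py out/HUGO_0.4_boss500_50/ out/ HUGO 0.4

# 3. Train on A vs C, classify B; with labels.txt the script also reports accuracy
./ATS_SVM_FS.py out/ATS_RM_HUGO_0.4_boss500_50/A_COMMON/ \
                out/ATS_RM_HUGO_0.4_boss500_50/B_HUGO_040 \
                out/ATS_RM_HUGO_0.4_boss500_50/C_HUGO_040 \
                out/HUGO_0.4_boss500_50/labels.txt
```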