├── .DS_Store
├── LICENSE
├── README.md
├── data
│   ├── GM12878_replicate_down16_chr17_17.npy.gz
│   ├── GM12878_replicate_down16_chr19_22.npy.gz
│   └── GM12878_replicate_original_chr19_22.npy.gz
├── model
│   └── test_model.net
└── src
    ├── Gaussian_kernel_smoothing.py
    ├── predict.py
    └── trainConvNet.py

/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangyan32/HiCPlus/5b8d56e44408b3eab6df8944f75f649c75d2d88d/.DS_Store
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Copyright (c) 2017 Yan Zhang

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE
OR OTHER DEALINGS IN THE SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# HiCPlus
Resolution enhancement of Hi-C interaction heatmaps.

## Deprecation warning
Since Theano is no longer developed, and to avoid carrying too many dependencies, we have implemented a PyTorch version:
https://github.com/wangjuan001/hicplus
Currently, this repo is no longer maintained.


## Dependencies

* [Python](https://www.python.org) (2.7) with NumPy and SciPy. We recommend using the [Anaconda](https://www.continuum.io) distribution to install Python.
* [Theano](https://github.com/Theano/Theano) (0.8.0). GPU acceleration is not required but strongly recommended. cuDNN is also recommended to maximize GPU performance.
* [Lasagne](https://github.com/Lasagne/Lasagne) (0.2.dev1)
* [Nolearn](https://github.com/dnouri/nolearn) (0.6a0.dev0)



## Installation
Just clone the repo to your local folder:

```
$ git clone https://github.com/zhangyan32/HiCPlus.git
```

## Usage

### Training
In the training stage, both high-resolution and low-resolution Hi-C samples are needed. The two sample sets should have the same shape, (N, 1, n, n), where N is the number of samples and n is the size of each sample. Samples with the same index must come from the same genomic location in the two input data sets. One way to cut such samples from a chromosome-wide contact matrix is sketched below.
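This is a minimal sketch, not part of the original repo: the function name `divide_matrix`, the non-overlapping tiling, and the default `n = 40` are illustrative assumptions.

```
import numpy as np

def divide_matrix(matrix, n=40):
    # Cut an (L, L) contact matrix into non-overlapping n x n samples and
    # record the upper-left corner of each one, so that the enhanced output
    # can later be stitched back together.
    samples, locations = [], []
    for i in range(0, matrix.shape[0] - n + 1, n):
        for j in range(0, matrix.shape[1] - n + 1, n):
            samples.append(matrix[i:i + n, j:j + n])
            locations.append((i, j))
    samples = np.array(samples, dtype=np.float32).reshape((-1, 1, n, n))
    return samples, locations
```

Applying the same tiling to the high-resolution and the low-resolution matrix keeps the two sample sets aligned index by index, as required above.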
### Prediction
Only low-resolution Hi-C samples are needed, and their shape should be the same as in the training stage. The prediction generates the enhanced Hi-C samples, and the user should recombine the output to obtain the entire Hi-C matrix.

### Suggested way to generate samples
We suggest writing out a file that records the genomic location of each sample at the time the n x n samples are generated. After the enhanced samples are obtained, it is then straightforward to recombine them into the full high-resolution Hi-C matrix, as in the sketch below.
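A minimal recombination sketch under the same assumptions as `divide_matrix` above (all names are illustrative; a flattened network output of shape (N, m*m) should first be reshaped to (N, m, m)):

```
import numpy as np

def recombine(patches, locations, L, n=40):
    # Stitch enhanced patches of shape (N, m, m) back into an (L, L) matrix.
    # The model's three 'valid' convolutions trim 12 rows/columns from each
    # n x n input (see src/trainConvNet.py), so each patch of size m = n - 12
    # is written at an offset of 6 from its recorded upper-left corner.
    m = patches.shape[-1]
    offset = (n - m) // 2
    matrix = np.zeros((L, L), dtype=np.float32)
    for patch, (i, j) in zip(patches, locations):
        matrix[i + offset:i + offset + m, j + offset:j + offset + m] = patch
    return matrix
```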
### Normalization and experimental condition
Hi-C experiments use several different restriction enzymes as well as different normalization methods. Our model can handle all of these conditions, as long as training and testing are performed under the same condition. For example, if KR-normalized samples are used in the training stage, the trained model only works on KR-normalized low-resolution samples.

## Citation

http://biorxiv.org/content/early/2017/03/01/112631
--------------------------------------------------------------------------------
/data/GM12878_replicate_down16_chr17_17.npy.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangyan32/HiCPlus/5b8d56e44408b3eab6df8944f75f649c75d2d88d/data/GM12878_replicate_down16_chr17_17.npy.gz
--------------------------------------------------------------------------------
/data/GM12878_replicate_down16_chr19_22.npy.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangyan32/HiCPlus/5b8d56e44408b3eab6df8944f75f649c75d2d88d/data/GM12878_replicate_down16_chr19_22.npy.gz
--------------------------------------------------------------------------------
/data/GM12878_replicate_original_chr19_22.npy.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangyan32/HiCPlus/5b8d56e44408b3eab6df8944f75f649c75d2d88d/data/GM12878_replicate_original_chr19_22.npy.gz
--------------------------------------------------------------------------------
/model/test_model.net:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangyan32/HiCPlus/5b8d56e44408b3eab6df8944f75f649c75d2d88d/model/test_model.net
--------------------------------------------------------------------------------
/src/Gaussian_kernel_smoothing.py:
--------------------------------------------------------------------------------
import numpy as np
import scipy.stats as st

# Generate a 2D Gaussian kernel.
# kernlen: size of the kernel.
# nsig: sigma of the Gaussian distribution.
# The code is partially from code.google.com/p/iterative-fusion.
def gkern(kernlen, nsig):
    """Returns a 2D Gaussian kernel array."""
    interval = (2 * nsig + 1.) / kernlen
    x = np.linspace(-nsig - interval / 2., nsig + interval / 2., kernlen + 1)
    kern1d = np.diff(st.norm.cdf(x))
    kernel_raw = np.sqrt(np.outer(kern1d, kern1d))
    kernel = kernel_raw / kernel_raw.sum()
    return kernel

# Run Gaussian smoothing on a Hi-C matrix.
# matrix is a numpy array holding the Hi-C interaction heatmap.
def Gaussian_filter(matrix, sigma=4, size=13):
    result = np.zeros(matrix.shape)
    padding = size // 2
    kernel = gkern(size, nsig=sigma)
    for i in range(padding, matrix.shape[0] - padding):
        for j in range(padding, matrix.shape[1] - padding):
            result[i, j] = np.sum(matrix[i - padding:i + padding + 1, j - padding:j + padding + 1] * kernel)
    return result
--------------------------------------------------------------------------------
/src/predict.py:
--------------------------------------------------------------------------------
# Author: Yan Zhang
# Email: zhangyan.cse (@) gmail.com

import sys
import os
import pickle
import gzip
import numpy as np

sys.setrecursionlimit(10000)

input_model_name = '../model/test_model.net'

# The model was pickled with a binary protocol, so open the file in 'rb' mode.
f = open(input_model_name, 'rb')
net1 = pickle.load(f)
f.close()

down_sample_ratio = 16
low_resolution_samples = np.load(gzip.GzipFile('../data/GM12878_replicate_down16_chr17_17.npy.gz', "r")) * down_sample_ratio

enhanced_low_resolution_samples = net1.predict(low_resolution_samples)
np.save('../data/enhanced_GM12878_replicate_down16_chr17_17.npy', enhanced_low_resolution_samples)

# The output sample shape is (N, 1, n, n), where N is the number of samples and
# n is the sample size. The user should recombine the enhanced Hi-C samples into
# the entire Hi-C matrix according to the rule that was used to divide the samples.
--------------------------------------------------------------------------------
/src/trainConvNet.py:
--------------------------------------------------------------------------------
# Author: Yan Zhang
# Email: zhangyan.cse (@) gmail.com

import sys
import numpy as np
import pickle
import os
import gzip

import lasagne
from lasagne import layers
from nolearn.lasagne import NeuralNet
from lasagne.updates import sgd

sys.setrecursionlimit(10000)

# Three convolutional layers: 8 filters of 9x9, then 8 filters of 1x1, then 1 filter of 5x5.
conv2d1_filters_numbers = 8
conv2d1_filters_size = 9
conv2d2_filters_numbers = 8
conv2d2_filters_size = 1
conv2d3_filters_numbers = 1
conv2d3_filters_size = 5

down_sample_ratio = 16
learning_rate = 0.00001
epochs = 10
HiC_max_value = 100

# Read the input sample pairs.
# The shape of the samples should be (N, 1, n, n), where N is the number of samples
# and n is the size of each sample. Both sample sets must have exactly the same shape.
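# The "down16" matrix was presumably produced by down-sampling reads to 1/16 of
# the original coverage, so multiplying by down_sample_ratio brings the
# low-resolution values back to roughly the scale of the high-resolution
# targets; the HiC_max_value clipping below then bounds extreme contact counts.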
low_resolution_samples = np.load(gzip.GzipFile('../data/GM12878_replicate_down16_chr19_22.npy.gz', "r")) * down_sample_ratio
high_resolution_samples = np.load(gzip.GzipFile('../data/GM12878_replicate_original_chr19_22.npy.gz', "r"))

low_resolution_samples = np.minimum(HiC_max_value, low_resolution_samples)
high_resolution_samples = np.minimum(HiC_max_value, high_resolution_samples)

# Reshape the high-quality Hi-C samples as the target values of the training.
# Each 'valid' convolution trims (filter_size - 1) rows/columns, so the three
# layers together remove 9 + 1 + 5 - 3 = 12 from each sample.
sample_size = low_resolution_samples.shape[-1]
padding = conv2d1_filters_size + conv2d2_filters_size + conv2d3_filters_size - 3
half_padding = padding // 2
output_length = sample_size - padding
Y = []
for i in range(high_resolution_samples.shape[0]):
    no_padding_sample = high_resolution_samples[i][0][half_padding:(sample_size - half_padding), half_padding:(sample_size - half_padding)]
    Y.append(no_padding_sample)
Y = np.array(Y).astype(np.float32)
Y = Y.reshape((Y.shape[0], -1))  # flatten each target to match the network's FlattenLayer output

X = low_resolution_samples

net1 = NeuralNet(
    layers=[
        ('input', layers.InputLayer),
        ('conv2d1', layers.Conv2DLayer),
        ('conv2d2', layers.Conv2DLayer),
        ('conv2d3', layers.Conv2DLayer),
        ('output_layer', layers.FlattenLayer),
    ],
    input_shape=(None, 1, sample_size, sample_size),
    conv2d1_num_filters=conv2d1_filters_numbers,
    conv2d1_filter_size=(conv2d1_filters_size, conv2d1_filters_size),
    conv2d1_nonlinearity=lasagne.nonlinearities.rectify,
    conv2d1_W=lasagne.init.GlorotUniform(),
    conv2d2_num_filters=conv2d2_filters_numbers,
    conv2d2_filter_size=(conv2d2_filters_size, conv2d2_filters_size),
    conv2d2_nonlinearity=lasagne.nonlinearities.rectify,
    conv2d3_num_filters=conv2d3_filters_numbers,
    conv2d3_filter_size=(conv2d3_filters_size, conv2d3_filters_size),
    conv2d3_nonlinearity=lasagne.nonlinearities.rectify,
    update=sgd,
    update_learning_rate=learning_rate,
    regression=True,
    max_epochs=epochs,
    verbose=1,
)

net1.fit(X, Y)

output_model_name = '../model/test_model'

f = open(output_model_name + '.net', 'wb')
pickle.dump(net1, f, protocol=pickle.HIGHEST_PROTOCOL)
f.close()
--------------------------------------------------------------------------------