├── .DS_Store
├── LICENSE
├── README.md
├── data
│   ├── GM12878_replicate_down16_chr17_17.npy.gz
│   ├── GM12878_replicate_down16_chr19_22.npy.gz
│   └── GM12878_replicate_original_chr19_22.npy.gz
├── model
│   └── test_model.net
└── src
    ├── Gaussian_kernel_smoothing.py
    ├── predict.py
    └── trainConvNet.py

/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangyan32/HiCPlus/5b8d56e44408b3eab6df8944f75f649c75d2d88d/.DS_Store
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Copyright (c) 2017 Yan Zhang

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE
OR OTHER DEALINGS IN THE SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# HiCPlus
Resolution enhancement of Hi-C interaction heatmaps.

## Deprecation warning
Since Theano is no longer developed, and to avoid carrying too many dependencies, we have implemented a PyTorch version:
https://github.com/wangjuan001/hicplus
Currently, this repo is no longer maintained.


## Dependencies

* [Python](https://www.python.org) (2.7) with NumPy and SciPy. We recommend using the [Anaconda](https://www.continuum.io) distribution to install Python.
* [Theano](https://github.com/Theano/Theano) (0.8.0). GPU acceleration is not required but strongly recommended. cuDNN is also recommended to maximize GPU performance.
* [Lasagne](https://github.com/Lasagne/Lasagne) (0.2.dev1)
* [Nolearn](https://github.com/dnouri/nolearn) (0.6a0.dev0)



## Installation
Just clone the repo to your local folder:

```
$ git clone https://github.com/zhangyan32/HiCPlus.git
```

## Usage

### Training
In the training stage, both high-resolution and low-resolution Hi-C samples are needed. The two sample sets should have the same shape, (N, 1, n, n), where N is the number of samples and n is the size of each sample. Samples with the same index must come from the same genomic location in the two input data sets. One way to cut such samples from a chromosome-wide contact matrix is sketched below.
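This is a minimal sketch, not part of the original repo: the function name `divide_matrix`, the non-overlapping tiling, and the default `n = 40` are illustrative assumptions.

```
import numpy as np

def divide_matrix(matrix, n=40):
    # Cut an (L, L) contact matrix into non-overlapping n x n samples and
    # record the upper-left corner of each one, so that the enhanced output
    # can later be stitched back together.
    samples, locations = [], []
    for i in range(0, matrix.shape[0] - n + 1, n):
        for j in range(0, matrix.shape[1] - n + 1, n):
            samples.append(matrix[i:i + n, j:j + n])
            locations.append((i, j))
    samples = np.array(samples, dtype=np.float32).reshape((-1, 1, n, n))
    return samples, locations
```

Applying the same tiling to the high-resolution and the low-resolution matrix keeps the two sample sets aligned index by index, as required above.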
### Prediction
Only low-resolution Hi-C samples are needed, and their shape should be the same as in the training stage. The prediction generates the enhanced Hi-C samples, and the user should recombine the output to obtain the entire Hi-C matrix.

### Suggested way to generate samples
We suggest writing out a file that records the genomic location of each sample at the time the n x n samples are generated. After the enhanced samples are obtained, it is then straightforward to recombine them into the full high-resolution Hi-C matrix, as in the sketch below.
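A minimal recombination sketch under the same assumptions as `divide_matrix` above (all names are illustrative; a flattened network output of shape (N, m*m) should first be reshaped to (N, m, m)):

```
import numpy as np

def recombine(patches, locations, L, n=40):
    # Stitch enhanced patches of shape (N, m, m) back into an (L, L) matrix.
    # The model's three 'valid' convolutions trim 12 rows/columns from each
    # n x n input (see src/trainConvNet.py), so each patch of size m = n - 12
    # is written at an offset of 6 from its recorded upper-left corner.
    m = patches.shape[-1]
    offset = (n - m) // 2
    matrix = np.zeros((L, L), dtype=np.float32)
    for patch, (i, j) in zip(patches, locations):
        matrix[i + offset:i + offset + m, j + offset:j + offset + m] = patch
    return matrix
```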
### Normalization and experimental condition
Hi-C experiments use several different restriction enzymes as well as different normalization methods. Our model can handle all of these conditions, as long as training and testing are performed under the same condition. For example, if KR-normalized samples are used in the training stage, the trained model only works on KR-normalized low-resolution samples.

## Citation

http://biorxiv.org/content/early/2017/03/01/112631
--------------------------------------------------------------------------------
/data/GM12878_replicate_down16_chr17_17.npy.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangyan32/HiCPlus/5b8d56e44408b3eab6df8944f75f649c75d2d88d/data/GM12878_replicate_down16_chr17_17.npy.gz
--------------------------------------------------------------------------------
/data/GM12878_replicate_down16_chr19_22.npy.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangyan32/HiCPlus/5b8d56e44408b3eab6df8944f75f649c75d2d88d/data/GM12878_replicate_down16_chr19_22.npy.gz
--------------------------------------------------------------------------------
/data/GM12878_replicate_original_chr19_22.npy.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangyan32/HiCPlus/5b8d56e44408b3eab6df8944f75f649c75d2d88d/data/GM12878_replicate_original_chr19_22.npy.gz
--------------------------------------------------------------------------------
/model/test_model.net:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangyan32/HiCPlus/5b8d56e44408b3eab6df8944f75f649c75d2d88d/model/test_model.net
--------------------------------------------------------------------------------
/src/Gaussian_kernel_smoothing.py:
--------------------------------------------------------------------------------
import numpy as np
import scipy.stats as st

# Generate a 2D Gaussian kernel.
# kernlen: size of the kernel.
# nsig: sigma of the Gaussian distribution.
# The code is partially from code.google.com/p/iterative-fusion.
def gkern(kernlen, nsig):
    """Returns a 2D Gaussian kernel array."""
    interval = (2 * nsig + 1.) / kernlen
    x = np.linspace(-nsig - interval / 2., nsig + interval / 2., kernlen + 1)
    kern1d = np.diff(st.norm.cdf(x))
    kernel_raw = np.sqrt(np.outer(kern1d, kern1d))
    kernel = kernel_raw / kernel_raw.sum()
    return kernel

# Run Gaussian smoothing on a Hi-C matrix.
# matrix is a numpy array holding the Hi-C interaction heatmap.
def Gaussian_filter(matrix, sigma=4, size=13):
    result = np.zeros(matrix.shape)
    padding = size // 2
    kernel = gkern(size, nsig=sigma)
    for i in range(padding, matrix.shape[0] - padding):
        for j in range(padding, matrix.shape[1] - padding):
            result[i, j] = np.sum(matrix[i - padding:i + padding + 1, j - padding:j + padding + 1] * kernel)
    return result
--------------------------------------------------------------------------------
/src/predict.py:
--------------------------------------------------------------------------------
# Author: Yan Zhang
# Email: zhangyan.cse (@) gmail.com

import sys
import os
import pickle
import gzip
import numpy as np

sys.setrecursionlimit(10000)

input_model_name = '../model/test_model.net'

# The model was pickled with a binary protocol, so open the file in 'rb' mode.
f = open(input_model_name, 'rb')
net1 = pickle.load(f)
f.close()

down_sample_ratio = 16
low_resolution_samples = np.load(gzip.GzipFile('../data/GM12878_replicate_down16_chr17_17.npy.gz', "r")) * down_sample_ratio

enhanced_low_resolution_samples = net1.predict(low_resolution_samples)
np.save('../data/enhanced_GM12878_replicate_down16_chr17_17.npy', enhanced_low_resolution_samples)

# The output sample shape is (N, 1, n, n), where N is the number of samples and
# n is the sample size. The user should recombine the enhanced Hi-C samples into
# the entire Hi-C matrix according to the rule that was used to divide the samples.
--------------------------------------------------------------------------------
/src/trainConvNet.py:
--------------------------------------------------------------------------------
# Author: Yan Zhang
# Email: zhangyan.cse (@) gmail.com

import sys
import numpy as np
import pickle
import os
import gzip

import lasagne
from lasagne import layers
from nolearn.lasagne import NeuralNet
from lasagne.updates import sgd

sys.setrecursionlimit(10000)

# Three convolutional layers: 8 filters of 9x9, then 8 filters of 1x1, then 1 filter of 5x5.
conv2d1_filters_numbers = 8
conv2d1_filters_size = 9
conv2d2_filters_numbers = 8
conv2d2_filters_size = 1
conv2d3_filters_numbers = 1
conv2d3_filters_size = 5

down_sample_ratio = 16
learning_rate = 0.00001
epochs = 10
HiC_max_value = 100

# Read the input sample pairs.
# The shape of the samples should be (N, 1, n, n), where N is the number of samples
# and n is the size of each sample. Both sample sets must have exactly the same shape.
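# The "down16" matrix was presumably produced by down-sampling reads to 1/16 of
# the original coverage, so multiplying by down_sample_ratio brings the
# low-resolution values back to roughly the scale of the high-resolution
# targets; the HiC_max_value clipping below then bounds extreme contact counts.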
low_resolution_samples = np.load(gzip.GzipFile('../data/GM12878_replicate_down16_chr19_22.npy.gz', "r")) * down_sample_ratio
high_resolution_samples = np.load(gzip.GzipFile('../data/GM12878_replicate_original_chr19_22.npy.gz', "r"))

low_resolution_samples = np.minimum(HiC_max_value, low_resolution_samples)
high_resolution_samples = np.minimum(HiC_max_value, high_resolution_samples)

# Reshape the high-quality Hi-C samples as the target values of the training.
# Each 'valid' convolution trims (filter_size - 1) rows/columns, so the three
# layers together remove 9 + 1 + 5 - 3 = 12 from each sample.
sample_size = low_resolution_samples.shape[-1]
padding = conv2d1_filters_size + conv2d2_filters_size + conv2d3_filters_size - 3
half_padding = padding // 2
output_length = sample_size - padding
Y = []
for i in range(high_resolution_samples.shape[0]):
    no_padding_sample = high_resolution_samples[i][0][half_padding:(sample_size - half_padding), half_padding:(sample_size - half_padding)]
    Y.append(no_padding_sample)
Y = np.array(Y).astype(np.float32)
Y = Y.reshape((Y.shape[0], -1))  # flatten each target to match the network's FlattenLayer output

X = low_resolution_samples

net1 = NeuralNet(
    layers=[
        ('input', layers.InputLayer),
        ('conv2d1', layers.Conv2DLayer),
        ('conv2d2', layers.Conv2DLayer),
        ('conv2d3', layers.Conv2DLayer),
        ('output_layer', layers.FlattenLayer),
    ],
    input_shape=(None, 1, sample_size, sample_size),
    conv2d1_num_filters=conv2d1_filters_numbers,
    conv2d1_filter_size=(conv2d1_filters_size, conv2d1_filters_size),
    conv2d1_nonlinearity=lasagne.nonlinearities.rectify,
    conv2d1_W=lasagne.init.GlorotUniform(),
    conv2d2_num_filters=conv2d2_filters_numbers,
    conv2d2_filter_size=(conv2d2_filters_size, conv2d2_filters_size),
    conv2d2_nonlinearity=lasagne.nonlinearities.rectify,
    conv2d3_num_filters=conv2d3_filters_numbers,
    conv2d3_filter_size=(conv2d3_filters_size, conv2d3_filters_size),
    conv2d3_nonlinearity=lasagne.nonlinearities.rectify,
    update=sgd,
    update_learning_rate=learning_rate,
    regression=True,
    max_epochs=epochs,
    verbose=1,
)

net1.fit(X, Y)

output_model_name = '../model/test_model'

f = open(output_model_name + '.net', 'wb')
pickle.dump(net1, f, protocol=pickle.HIGHEST_PROTOCOL)
f.close()
--------------------------------------------------------------------------------