├── .DS_Store
├── LICENSE
├── README.md
├── model
│   ├── pytorch_HindIII_model_40000
│   └── pytorch_model_12000
└── src
    ├── HindIII_train.txt
    ├── model.py
    ├── runHiCPlus.py
    ├── testConvNet.py
    ├── trainConvNet.py
    └── utils.py

/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangyan32/HiCPlus_pytorch/62c3cd674a32d8f2f8ecd296da7ca7bdd3ba087d/.DS_Store
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Copyright (c) 2017 Yan Zhang

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE
OR OTHER DEALINGS IN THE SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
HiCPlus
Implemented in PyTorch

## Dependency

* [Python](https://www.python.org) (2.7) with NumPy and SciPy. We recommend using the [Anaconda](https://www.continuum.io) distribution to install Python.

* [PyTorch](http://pytorch.org/) (0.1.12_2). GPU acceleration is not required but strongly recommended.

## Installation
Clone the repo to your local folder.

```
$ git clone https://github.com/zhangyan32/HiCPlus_pytorch.git
```

## Usage

### Prediction
If the user does not want to train a model, just use [runHiCPlus.py](https://github.com/zhangyan32/HiCPlus_pytorch/blob/master/src/runHiCPlus.py) with one of the pre-trained models in `model/` to generate the enhanced Hi-C interaction matrix.


### Training
In the training stage, both high-resolution and low-resolution Hi-C samples are needed. The two sample sets should have the same shape (N, 1, n, n), where N is the number of samples and n is the sample size. Samples with the same index should come from the same genomic location in the two input data sets.

### Prediction
Only low-resolution Hi-C samples are needed. The shape of the samples should be the same as in the training stage. The prediction generates the enhanced Hi-C data, and the user should recombine the output to obtain the entire Hi-C matrix.

### Suggested way to generate samples
We suggest writing out a file that records the genomic location of each sample at the time the n x n samples are generated. Then, after obtaining the enhanced high-resolution samples, it is easy to recombine all of them into the full high-resolution Hi-C matrix.
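
For example, a minimal sketch of this bookkeeping (the 40 x 40 window, step of 25 bins, and diagonal cutoff mirror `src/utils.py`; the file names are placeholders):

```
import numpy as np

matrix = np.load('chr21_10kb.npy')  # placeholder: a dense intra-chromosomal contact matrix
window, step, chrN = 40, 25, 21

samples, index = [], []
for i in range(0, matrix.shape[0] - window, step):
    for j in range(0, matrix.shape[1] - window, step):
        if abs(i - j) > 201:  # keep only samples near the diagonal, as utils.divide does
            continue
        samples.append([matrix[i:i + window, j:j + window]])
        index.append((chrN, i, j))  # remember where each sample came from

np.save('chr21_samples.npy', np.array(samples))  # shape (N, 1, 40, 40)
np.save('chr21_index.npy', np.array(index))
```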

### Normalization and experimental condition
Hi-C experiments use several different cutting enzymes as well as different normalization methods. Our model can handle all of these conditions, as long as training and testing are performed under the same condition. For example, if KR-normalized samples are used in the training stage, the trained model only works on KR-normalized low-resolution samples.

--------------------------------------------------------------------------------
/model/pytorch_HindIII_model_40000:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangyan32/HiCPlus_pytorch/62c3cd674a32d8f2f8ecd296da7ca7bdd3ba087d/model/pytorch_HindIII_model_40000
--------------------------------------------------------------------------------
/model/pytorch_model_12000:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangyan32/HiCPlus_pytorch/62c3cd674a32d8f2f8ecd296da7ca7bdd3ba087d/model/pytorch_model_12000
--------------------------------------------------------------------------------
/src/model.py:
--------------------------------------------------------------------------------
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from torch.utils import data
import gzip
import sys
import torch.optim as optim

conv2d1_filters_numbers = 8
conv2d1_filters_size = 9
conv2d2_filters_numbers = 8
conv2d2_filters_size = 1
conv2d3_filters_numbers = 1
conv2d3_filters_size = 5


class Net(nn.Module):
    def __init__(self, D_in, D_out):
        # D_in and D_out are unused; they are kept so that existing callers
        # such as Net(40, 28) continue to work.
        super(Net, self).__init__()
        # Three valid convolutions: 1 input channel -> 8 feature maps (9x9),
        # 8 -> 8 (1x1), and 8 -> 1 (5x5).
        self.conv1 = nn.Conv2d(1, conv2d1_filters_numbers, conv2d1_filters_size)
        self.conv2 = nn.Conv2d(conv2d1_filters_numbers, conv2d2_filters_numbers, conv2d2_filters_size)
        self.conv3 = nn.Conv2d(conv2d2_filters_numbers, 1, conv2d3_filters_size)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = self.conv3(x)
        x = F.relu(x)
        return x
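
# A quick shape check (a sketch, assuming the default filter sizes above): each
# valid convolution trims kernel_size - 1 bins, so a 40x40 input becomes
# 40 - 8 = 32 after conv1, stays 32 after the 1x1 conv2, and 32 - 4 = 28 after
# conv3. Net therefore maps a (N, 1, 40, 40) batch to (N, 1, 28, 28), e.g.:
#
#   net = Net(40, 28)
#   out = net(Variable(torch.zeros(1, 1, 40, 40)))  # out.size() == (1, 1, 28, 28)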

'''
def num_flat_features(self, x):
    size = x.size()[1:]  # all dimensions except the batch dimension
    num_features = 1
    for s in size:
        num_features *= s
    return num_features
'''

'''
net = Net(40, 24)


#sys.exit()
#low_resolution_samples = low_resolution_samples.reshape((low_resolution_samples.shape[0], 40, 40))
#print low_resolution_samples[0:1, :,: ,: ].shape
#low_resolution_samples = torch.from_numpy(low_resolution_samples[0:1, :,: ,: ])
#X = Variable(low_resolution_samples)
#print X
#Y = Variable(torch.from_numpy(Y[0]))
#X = Variable(torch.randn(1, 1, 40, 40))
#print X
optimizer = optim.SGD(net.parameters(), lr=0.0001, momentum=0.9)
criterion = nn.MSELoss()
for epoch in range(2):  # loop over the dataset multiple times
    print "epoch", epoch

    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        # get the inputs
        inputs, labels = data
        #print(inputs.size())
        #print(labels.size())
        #print type(inputs)

        # wrap them in Variable
        inputs, labels = Variable(inputs), Variable(labels)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        #print outputs
        loss = criterion(outputs, labels)

        loss.backward()
        optimizer.step()
        print i
        # print statistics
        #print type(loss)
        #print loss.data[0]


print('Finished Training')


output = net(X)
print(output)

loss = criterion(output, Y)


net.zero_grad()  # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.weight.grad)
'''
--------------------------------------------------------------------------------
/src/runHiCPlus.py:
--------------------------------------------------------------------------------
# Author: Yan Zhang
# Email: zhangyan.cse (@) gmail.com

import sys
import numpy as np
import matplotlib.pyplot as plt
import pickle
import os
import gzip
import model
from torch.utils import data
import torch
import torch.optim as optim
from torch.autograd import Variable
from time import gmtime, strftime
import torch.nn as nn
import utils
import math

use_gpu = 1

conv2d1_filters_numbers = 8
conv2d1_filters_size = 9
conv2d2_filters_numbers = 8
conv2d2_filters_size = 1
conv2d3_filters_numbers = 1
conv2d3_filters_size = 5


down_sample_ratio = 16
epochs = 10
HiC_max_value = 100


# Input Hi-C matrix to enhance. Replace the path with your own contact matrix.
input_file = '/home/zhangyan/Desktop/chr21.10kb.matrix'
low_resolution_samples, index = utils.divide(input_file)

low_resolution_samples = np.minimum(HiC_max_value, low_resolution_samples)

batch_size = low_resolution_samples.shape[0]

# Compute the padding lost to the valid convolutions, so that the predictions
# can later be placed back into the full matrix.
sample_size = low_resolution_samples.shape[-1]
padding = conv2d1_filters_size + conv2d2_filters_size + conv2d3_filters_size - 3
half_padding = padding / 2
output_length = sample_size - padding
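
# Sanity check on the arithmetic above: the three valid convolutions in
# model.Net trim (9 - 1) + (1 - 1) + (5 - 1) = 12 bins in total, so
# padding = 12, half_padding = 6 and output_length = 40 - 12 = 28; every
# 40x40 input sample yields a 28x28 prediction.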


print low_resolution_samples.shape

lowres_set = data.TensorDataset(torch.from_numpy(low_resolution_samples), torch.from_numpy(np.zeros(low_resolution_samples.shape[0])))
lowres_loader = torch.utils.data.DataLoader(lowres_set, batch_size=batch_size, shuffle=False)

hires_loader = lowres_loader

net = model.Net(40, 28)  # use a name other than `model` so the imported module is not shadowed
net.load_state_dict(torch.load('../model/pytorch_model_12000'))
if use_gpu:
    net = net.cuda()

_loss = nn.MSELoss()


running_loss = 0.0
running_loss_validate = 0.0
reg_loss = 0.0


for i, (v1, v2) in enumerate(zip(lowres_loader, hires_loader)):
    _lowRes, _ = v1
    _highRes, _ = v2

    _lowRes = Variable(_lowRes).float()
    _highRes = Variable(_highRes).float()

    if use_gpu:
        _lowRes = _lowRes.cuda()
        _highRes = _highRes.cuda()
    y_prediction = net(_lowRes)

    print '-------', i, running_loss, strftime("%Y-%m-%d %H:%M:%S", gmtime())


y_predict = y_prediction.data.cpu().numpy()

print y_predict.shape

# recombine samples

length = int(y_predict.shape[2])
y_predict = np.reshape(y_predict, (y_predict.shape[0], length, length))


chrs_length = [249250621, 243199373, 198022430, 191154276, 180915260, 171115067, 159138663, 146364022, 141213431, 135534747, 135006516, 133851895, 115169878, 107349540, 102531392, 90354753, 81195210, 78077248, 59128983, 63025520, 48129895, 51304566]

chrN = 21

length = chrs_length[chrN - 1] / 10000

prediction_1 = np.zeros((length, length))


print 'predicted sample: ', y_predict.shape, '; index shape is: ', index.shape
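
# Each row of index holds (tag, chromosome, row offset, column offset), as
# written by utils.divide. Every 28x28 prediction is pasted back half_padding
# (6) bins inside its original 40x40 window, i.e. into rows x+6:x+34 and
# columns y+6:y+34, so the enhanced values line up with the input matrix.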
for i in range(0, y_predict.shape[0]):
    if int(index[i][1]) != chrN:
        continue
    x = int(index[i][2])
    y = int(index[i][3])
    prediction_1[x + 6:x + 34, y + 6:y + 34] = y_predict[i]

np.save(input_file + 'enhanced.npy', prediction_1)
--------------------------------------------------------------------------------
/src/testConvNet.py:
--------------------------------------------------------------------------------
# Author: Yan Zhang
# Email: zhangyan.cse (@) gmail.com

import sys
import numpy as np
import matplotlib.pyplot as plt
import pickle
import os
import gzip
import model
from torch.utils import data
import torch
import torch.optim as optim
from torch.autograd import Variable
from time import gmtime, strftime
import torch.nn as nn

use_gpu = 1

conv2d1_filters_numbers = 8
conv2d1_filters_size = 9
conv2d2_filters_numbers = 8
conv2d2_filters_size = 1
conv2d3_filters_numbers = 1
conv2d3_filters_size = 5


down_sample_ratio = 16
epochs = 10
HiC_max_value = 100


# This block is the actual data used in the paper. It is too large to put on
# GitHub, so only toy data is used below.
# cell = "GM12878_replicate"
# chrN_range1 = '1_8'
# chrN_range = '1_8'

# low_resolution_samples = np.load(gzip.GzipFile('/home/zhangyan/SRHiC_samples/'+cell+'down16_chr'+chrN_range+'.npy.gz', "r")).astype(np.float32) * down_sample_ratio
# high_resolution_samples = np.load(gzip.GzipFile('/home/zhangyan/SRHiC_samples/original10k/'+cell+'_original_chr'+chrN_range+'.npy.gz', "r")).astype(np.float32)

# low_resolution_samples = np.minimum(HiC_max_value, low_resolution_samples)
# high_resolution_samples = np.minimum(HiC_max_value, high_resolution_samples)


low_resolution_samples = np.load(gzip.GzipFile('../../data/GM12878_replicate_down16_chr19_22.npy.gz', "r")).astype(np.float32) * down_sample_ratio

low_resolution_samples = np.minimum(HiC_max_value, low_resolution_samples)

batch_size = low_resolution_samples.shape[0]

# Compute the crop and padding sizes used to build the target values.
sample_size = low_resolution_samples.shape[-1]
padding = conv2d1_filters_size + conv2d2_filters_size + conv2d3_filters_size - 3
half_padding = padding / 2
output_length = sample_size - padding


print low_resolution_samples.shape

lowres_set = data.TensorDataset(torch.from_numpy(low_resolution_samples), torch.from_numpy(np.zeros(low_resolution_samples.shape[0])))
lowres_loader = torch.utils.data.DataLoader(lowres_set, batch_size=batch_size, shuffle=False)

production = False
try:
    high_resolution_samples = np.load(gzip.GzipFile('../../data/GM12878_replicate_original_chr19_22.npy.gz', "r")).astype(np.float32)
    high_resolution_samples = np.minimum(HiC_max_value, high_resolution_samples)
    Y = []
    for i in range(high_resolution_samples.shape[0]):
        no_padding_sample = high_resolution_samples[i][0][half_padding:(sample_size - half_padding), half_padding:(sample_size - half_padding)]
        Y.append(no_padding_sample)
    Y = np.array(Y).astype(np.float32)
    hires_set = data.TensorDataset(torch.from_numpy(Y), torch.from_numpy(np.zeros(Y.shape[0])))
    hires_loader = torch.utils.data.DataLoader(hires_set, batch_size=batch_size, shuffle=False)
except IOError:  # the ground-truth samples are not available
    production = True
    hires_loader = lowres_loader
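
# In production mode there is no ground truth: hires_loader is just an alias
# for lowres_loader, and the MSE below is computed only when production is False.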

Net = model.Net(40, 28)
Net.load_state_dict(torch.load('../model/pytorch_model_12000'))
if use_gpu:
    Net = Net.cuda()

_loss = nn.MSELoss()


running_loss = 0.0
running_loss_validate = 0.0
reg_loss = 0.0


for i, (v1, v2) in enumerate(zip(lowres_loader, hires_loader)):
    _lowRes, _ = v1
    _highRes, _ = v2

    _lowRes = Variable(_lowRes)
    _highRes = Variable(_highRes)

    if use_gpu:
        _lowRes = _lowRes.cuda()
        _highRes = _highRes.cuda()
    y_prediction = Net(_lowRes)
    if not production:
        loss = _loss(y_prediction, _highRes)
        running_loss += loss.data[0]

    print '-------', i, running_loss, strftime("%Y-%m-%d %H:%M:%S", gmtime())


y_prediction = y_prediction.data.cpu().numpy()

print y_prediction.shape
--------------------------------------------------------------------------------
/src/trainConvNet.py:
--------------------------------------------------------------------------------
# Author: Yan Zhang
# Email: zhangyan.cse (@) gmail.com

import sys
import numpy as np
import matplotlib.pyplot as plt
import pickle
import os
import gzip
import model
from torch.utils import data
import torch
import torch.optim as optim
from torch.autograd import Variable
from time import gmtime, strftime
import torch.nn as nn

use_gpu = 1

conv2d1_filters_numbers = 8
conv2d1_filters_size = 9
conv2d2_filters_numbers = 8
conv2d2_filters_size = 1
conv2d3_filters_numbers = 1
conv2d3_filters_size = 5


down_sample_ratio = 16
epochs = 10
HiC_max_value = 100
batch_size = 256


# This block is the actual training data used in the paper. The training data
# is too large to put on GitHub, so only toy data is used here.
# cell = "GM12878_replicate"
# chrN_range1 = '1_8'
# chrN_range = '1_8'

# low_resolution_samples = np.load(gzip.GzipFile('/home/zhangyan/SRHiC_samples/'+cell+'down16_chr'+chrN_range+'.npy.gz', "r")).astype(np.float32) * down_sample_ratio
# high_resolution_samples = np.load(gzip.GzipFile('/home/zhangyan/SRHiC_samples/original10k/'+cell+'_original_chr'+chrN_range+'.npy.gz', "r")).astype(np.float32)

# low_resolution_samples = np.minimum(HiC_max_value, low_resolution_samples)
# high_resolution_samples = np.minimum(HiC_max_value, high_resolution_samples)


#low_resolution_samples = np.load(gzip.GzipFile('../../data/GM12878_replicate_down16_chr19_22.npy.gz', "r")).astype(np.float32) * down_sample_ratio
#high_resolution_samples = np.load(gzip.GzipFile('../../data/GM12878_replicate_original_chr19_22.npy.gz', "r")).astype(np.float32)

low_resolution_samples = np.load(gzip.GzipFile('/home/zhangyan/SRHiC_samples/IMR90_down_HINDIII16_chr1_8.npy.gz', "r")).astype(np.float32) * down_sample_ratio
high_resolution_samples = np.load(gzip.GzipFile('/home/zhangyan/SRHiC_samples/original10k/_IMR90_HindIII_original_chr1_8.npy.gz', "r")).astype(np.float32)


low_resolution_samples = np.minimum(HiC_max_value, low_resolution_samples)
high_resolution_samples = np.minimum(HiC_max_value, high_resolution_samples)


# Crop the high-quality Hi-C samples for use as the target values of the training.
sample_size = low_resolution_samples.shape[-1]
padding = conv2d1_filters_size + conv2d2_filters_size + conv2d3_filters_size - 3
half_padding = padding / 2
output_length = sample_size - padding
Y = []
for i in range(high_resolution_samples.shape[0]):
    no_padding_sample = high_resolution_samples[i][0][half_padding:(sample_size - half_padding), half_padding:(sample_size - half_padding)]
    Y.append(no_padding_sample)
Y = np.array(Y).astype(np.float32)
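
# The 40x40 high-resolution samples are center-cropped by half_padding (6)
# bins on each side, giving 28x28 targets that match the output size of the
# network's valid convolutions.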

print low_resolution_samples.shape, Y.shape

lowres_set = data.TensorDataset(torch.from_numpy(low_resolution_samples), torch.from_numpy(np.zeros(low_resolution_samples.shape[0])))
lowres_loader = torch.utils.data.DataLoader(lowres_set, batch_size=batch_size, shuffle=False)

hires_set = data.TensorDataset(torch.from_numpy(Y), torch.from_numpy(np.zeros(Y.shape[0])))
hires_loader = torch.utils.data.DataLoader(hires_set, batch_size=batch_size, shuffle=False)


Net = model.Net(40, 28)

if use_gpu:
    Net = Net.cuda()

optimizer = optim.SGD(Net.parameters(), lr=0.00001)
_loss = nn.MSELoss()
Net.train()

running_loss = 0.0
running_loss_validate = 0.0
reg_loss = 0.0

# write the log file to record the training process
log = open('HindIII_train.txt', 'w')
for epoch in range(0, 100000):
    for i, (v1, v2) in enumerate(zip(lowres_loader, hires_loader)):
        if i == len(lowres_loader) - 1:
            continue  # skip the last, possibly smaller, batch
        _lowRes, _ = v1
        _highRes, _ = v2

        _lowRes = Variable(_lowRes)
        _highRes = Variable(_highRes)

        if use_gpu:
            _lowRes = _lowRes.cuda()
            _highRes = _highRes.cuda()
        optimizer.zero_grad()
        y_prediction = Net(_lowRes)

        loss = _loss(y_prediction, _highRes)

        loss.backward()
        optimizer.step()

        running_loss += loss.data[0]

    # i batches (indices 0 .. i-1) were accumulated, so running_loss/i is the epoch mean
    print '-------', i, epoch, running_loss / i, strftime("%Y-%m-%d %H:%M:%S", gmtime())

    log.write(str(epoch) + ', ' + str(running_loss / i) + '\n')
    running_loss = 0.0
    running_loss_validate = 0.0
    # save the model every 100 epochs
    if epoch % 100 == 0:
        torch.save(Net.state_dict(), '/home/zhangyan/pytorch_models/pytorch_HindIII_model_' + str(epoch))
--------------------------------------------------------------------------------
/src/utils.py:
--------------------------------------------------------------------------------
import numpy as np
import matplotlib.pyplot as plt
import os


def readSparseMatrix(filename, total_length):
    # Read a sparse (bin, bin, count) contact list into a symmetric dense matrix.
    print "reading Rao's HiC "
    infile = open(filename).readlines()
    print len(infile)
    HiC = np.zeros((total_length, total_length)).astype(np.int16)
    percentage_finish = 0
    for i in range(0, len(infile)):
        if i % (len(infile) / 10) == 0:
            print 'finish ', percentage_finish, '%'
            percentage_finish += 10
        nums = infile[i].split('\t')
        x = int(nums[0])
        y = int(nums[1])
        val = int(float(nums[2]))

        HiC[x][y] = val
        HiC[y][x] = val
    return HiC


def readSquareMatrix(filename, total_length):
    print "reading Rao's HiC "
    infile = open(filename).readlines()
    print('size of matrix is ' + str(len(infile)))
    print('number of bins based on the length of the chromosome is ' + str(total_length))
    result = []
    for line in infile:
        tokens = line.split('\t')
        line_int = list(map(int, tokens))
        result.append(line_int)
    result = np.array(result)
    print(result.shape)
    return result
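
# divide() slides a 40x40 window over the intra-chromosomal matrix with a step
# of 25 bins, keeps only windows within 201 bins (about 2 Mb at 10 kb
# resolution) of the diagonal, and records (tag, chrN, i, j) for each sample
# so that the predictions can later be stitched back into a full matrix.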

def divide(HiCfile):
    subImage_size = 40
    step = 25
    chrs_length = [249250621, 243199373, 198022430, 191154276, 180915260, 171115067, 159138663, 146364022, 141213431, 135534747, 135006516, 133851895, 115169878, 107349540, 102531392, 90354753, 81195210, 78077248, 59128983, 63025520, 48129895, 51304566]
    input_resolution = 10000
    result = []
    index = []
    chrN = 21
    matrix_name = HiCfile + '_npy_form_tmp.npy'
    if os.path.exists(matrix_name):
        print 'loading ', matrix_name
        HiCsample = np.load(matrix_name)
    else:
        print matrix_name, 'does not exist, creating'
        print HiCfile
        HiCsample = readSquareMatrix(HiCfile, (chrs_length[chrN - 1] / input_resolution + 1))
        #HiCsample = np.loadtxt('/home/zhangyan/private_data/IMR90.nodup.bam.chr'+str(chrN)+'.10000.matrix', dtype=np.int16)
        print HiCsample.shape
        np.save(matrix_name, HiCsample)
    print HiCsample.shape
    path = '/home/zhangyan/HiCPlus_pytorch_production/'
    if not os.path.exists(path):
        os.makedirs(path)
    total_loci = HiCsample.shape[0]
    for i in range(0, total_loci, step):
        for j in range(0, total_loci, step):
            if abs(i - j) > 201 or i + subImage_size >= total_loci or j + subImage_size >= total_loci:
                continue
            subImage = HiCsample[i:i + subImage_size, j:j + subImage_size]

            result.append([subImage, ])
            tag = 'test'
            index.append((tag, chrN, i, j))
    result = np.array(result)
    print result.shape
    result = result.astype(np.double)
    index = np.array(index)
    return result, index
--------------------------------------------------------------------------------