├── README.md
├── boxscaler.py
├── coords_from_data.sh
├── determine_relative_pixel_size.py
├── remove_edge_particles.sh
├── rescale_particles.py
├── scale_ctf.sh
└── star_apply_matrix.py


/README.md:
--------------------------------------------------------------------------------
 1 | # CryoEM-scripts
 2 | 
 3 | Some Python and bash scripts to make cryoEM easier.
 4 | 
 5 | 
 6 | **Scripts to help scaling and merging data sets**
 7 | 
 8 | These four scripts are featured in our Acta D paper "Methods for Merging Data Sets in Electron Cryo-Microscopy" and aim to make merging data sets a bit more tolerable.
 9 | 
10 | A couple of scripts are written by me, apologies for the poor python/bash styling:
11 | 
12 | `boxscaler.py` will find combinations of even box sizes that give a desired scaling factor.
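
For a rough idea of what the search does, here is a minimal pure-Python sketch (not the script itself, which uses NumPy and interactive prompts). The function name `find_box_pairs` and its arguments are made up for illustration; the idea is simply to score every pair of even box sizes by how closely `end_box / start_box` matches `start_apix / end_apix`.

```python
from itertools import product


def find_box_pairs(min_box, max_box, start_apix, end_apix, n=5):
    """Return up to n (start_box, end_box, percent_error) tuples, best first."""
    target = start_apix / end_apix
    first = min_box + (min_box % 2)  # bump an odd lower limit to the next even size
    boxes = range(first, max_box + 1, 2)
    scored = []
    for start_box, end_box in product(boxes, repeat=2):
        factor = end_box / start_box
        percent_error = (factor - target) / target * 100
        scored.append((abs(percent_error), start_box, end_box, percent_error))
    scored.sort()  # smallest absolute error first
    return [(s, e, err) for _, s, e, err in scored[:n]]


# Example: three best box pairs for going from 1.43 to 1.38 A/pixel
for start_box, end_box, err in find_box_pairs(300, 400, 1.43, 1.38, n=3):
    print(start_box, end_box, round(err, 4))
```

Brute force is fine here because the number of even box sizes in a sensible range is small.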
13 | 
14 | `scale_ctf.sh` will do a pretty good* approximate job of rescaling your defocus values to a different pixel size. This bypasses having to re-run CTFFIND or Gctf, which is useful e.g. if micrographs are no longer on disk.
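
The arithmetic in `scale_ctf.sh` can be sketched in Python as below. This is only a restatement of the formula the script applies, under the same assumptions: a fixed "average" spatial resolution `avg_s` of 5.68 A and a 300 kV wavelength of 0.0197 A. The function name `rescale_defocus` and the default Cs of 2.7 mm are illustrative, and the bash script also rounds intermediate values to six decimals, which this sketch does not.

```python
def rescale_defocus(defocus, apix_old, apix_new,
                    cs_mm=2.7, wavelength=0.0197, avg_s=5.68):
    """Approximate defocus (in A) after changing the pixel size.

    defocus and wavelength are in Angstrom, cs_mm is in mm.
    """
    cs = cs_mm * 1e7                       # spherical aberration, mm -> A
    alpha = apix_old / apix_new            # linear scale factor
    var_c = -0.5 * cs * wavelength ** 2
    const = var_c * alpha ** 2 - var_c / alpha ** 2
    correction = const / avg_s ** 2        # small constant term
    return defocus / alpha ** 2 - correction


# Example: a 2 um defocus estimated at 1.43 A/pixel, rescaled to 1.38 A/pixel
print(rescale_defocus(20000.0, 1.43, 1.38))
```

Most of the change is the `1 / alpha**2` scaling; the constant term only shifts the result by a few tens of A.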
15 | 
16 | `star_apply_matrix.py` will apply a Chimera transformation matrix to the Euler angles and offsets of particles belonging to a specified class in a STAR file. This can be used to align particles based on the orientation of domains in their 3D classes.
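
As a hedged illustration of the rotation part only (offsets are not handled here), the sketch below builds the orientation matrix from rot/tilt/psi with the same formulae as `Euler_angles2matrix` in the script, then multiplies it by the inverse of the transformation's rotation. It assumes the 3x3 part of the Chimera matrix is a pure rotation, so its inverse is its transpose; the script itself uses `np.linalg.inv`. The helper names are made up for this example.

```python
import math


def euler_to_matrix(rot, tilt, psi):
    """3x3 orientation matrix (nested lists) from Euler angles in degrees."""
    a, b, g = (math.radians(v) for v in (rot, tilt, psi))
    ca, cb, cg = math.cos(a), math.cos(b), math.cos(g)
    sa, sb, sg = math.sin(a), math.sin(b), math.sin(g)
    return [
        [cg * cb * ca - sg * sa, cg * cb * sa + sg * ca, -cg * sb],
        [-sg * cb * ca - cg * sa, -sg * cb * sa + cg * ca, sg * sb],
        [sb * ca, sb * sa, cb],
    ]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def reorient(rot, tilt, psi, rotation):
    """Orientation matrix after undoing `rotation` (assumed a pure rotation)."""
    return matmul(euler_to_matrix(rot, tilt, psi), transpose(rotation))
```

The new Euler angles would then be read back from the resulting matrix, which is what `Euler_matrix2angles` does in the script.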
17 | 
18 | And a couple of much more nicely written scripts by Thomas Martin, Ana Casanal, and Takanori Nakane:
19 | 
20 | `determine_relative_pixel_size.py` will find the pixel size that maximises the correlation (measured with FSC) of one map to a reference map - useful for finding relative pixel sizes when merging data sets.
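
The score the script minimises is the average resolution at which the FSC curve crosses a few thresholds (0.5, 0.4, 0.3 and 0.2), found by linear interpolation between neighbouring shells. A simplified sketch of that scoring step is below; it does not call RELION, the function names are illustrative, and its handling of edge cases differs slightly from the script.

```python
def fsc_crossing(resolutions, fsc_values, threshold):
    """Resolution (A) at which the FSC first drops to `threshold`."""
    for i in range(1, len(fsc_values)):
        y1, y2 = fsc_values[i - 1], fsc_values[i]
        if y1 > threshold >= y2:
            x1, x2 = resolutions[i - 1], resolutions[i]
            # linear interpolation between the two neighbouring shells
            return x1 + (x2 - x1) * (y1 - threshold) / (y1 - y2)
    return resolutions[-1]  # never dropped below the threshold


def mean_crossing(resolutions, fsc_values, thresholds=(0.5, 0.4, 0.3, 0.2)):
    """Average crossing point over several thresholds; lower is better."""
    total = sum(fsc_crossing(resolutions, fsc_values, t) for t in thresholds)
    return total / len(thresholds)
```

The pixel size that gives the lowest (best) averaged crossing is taken as the estimate, and the search is repeated with progressively finer steps around it.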
21 | 
22 | `rescale_particles.py` will rescale particle coordinates, for if you've decided to scale your datasets by rescaling your micrographs. Also does some STAR file wizardry to fix up various paths.
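
The rescaling itself is simple arithmetic. The sketch below mirrors what the script does per particle, assuming the same three pixel sizes as its `--pix_nominal`, `--pix_relative` and `--pix_target` options; the function name is illustrative and the path rewriting is left out.

```python
def rescale_particle(coord_x, coord_y, magnification,
                     pix_nominal, pix_relative, pix_target):
    """Return (new_x, new_y, new_magnification) for one particle."""
    scale = pix_relative / pix_target          # applied to coordinates
    new_x = round(coord_x * scale, 6)
    new_y = round(coord_y * scale, 6)
    # magnification is scaled with the nominal pixel size instead
    new_mag = int(round(magnification * pix_nominal / pix_target))
    return new_x, new_y, new_mag
```

In the script the same `pix_relative / pix_target` factor is also applied to `rlnOriginX` and `rlnOriginY`.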
23 | 
24 | \*"pretty good" means +/- 40 Å compared to actually re-running Gctf. Considering this is probably very close to, or better than, the precision of Gctf itself, especially at normal resolution regimes, I'd say "pretty good" really means "entirely good enough, especially if you plan on running per-particle CTF refinement anyway." At least in my experience, YMMV.
25 | 


--------------------------------------------------------------------------------
/boxscaler.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | # BoxScaler: finds optimal box size ratios for scaling data
 3 | # Max Wilkinson, MRC LMB, October 2018
 4 | 
 5 | 
 6 | 
 7 | import numpy as np
 8 | 
 9 | ################
10 | 
11 | startbox = int(input("What is the smallest box size allowed? "))
12 | finishbox = int(input("What is the largest box size allowed? "))
13 | startApix = float(input("What is your starting pixel size? "))
14 | endApix = float(input("What is the desired final pixel size? "))
15 | ntimes = int(input("and how many answers do you want? "))
16 | 
17 | # if provided box size is odd
18 | if startbox % 2 != 0:
19 |     startbox += 1
20 | 
21 | # get desired ratio
22 | apixRatio=startApix/endApix
23 | 
24 | # produce range of possible even box sizes (including the largest allowed) and compute a division matrix between them all
25 | boxArray = np.arange(startbox,finishbox+1,2,dtype=float)
26 | ratioArray = np.divide.outer(boxArray,boxArray)
27 | i = 1
28 | while i <= ntimes:
29 |     # find index of closest box size ratio to the desired angpix ratio
30 |     idxMin = np.abs(ratioArray - apixRatio).argmin()
31 |     indxMin2D = np.unravel_index(idxMin,ratioArray.shape)  #unravel_index converts a 1D index into a 2D index based on the dimensions given by .shape
32 | 
33 |     # use x and y of this index to find the starting and ending box sizes
34 |     startscale=boxArray[indxMin2D[1]]
35 |     endscale=boxArray[indxMin2D[0]]
36 | 
37 |     # print output
38 |     print('Starting with a {:.0f} pixel box, scale to a {:.0f} pixel box'.format(startscale, endscale))
39 |     print('This will give a scaling factor of {:.5f}, compared to a desired pixel size ratio of {:.5f}, giving a {:.3f} percent error.\n'.format(endscale/startscale, apixRatio, (endscale/startscale-apixRatio)/apixRatio*100))
40 |     # remove this answer from ratioarray by setting it to a very large number
41 |     ratioArray.flat[idxMin] = 999.
42 |     i += 1
43 | 
44 | 


--------------------------------------------------------------------------------
/coords_from_data.sh:
--------------------------------------------------------------------------------
 1 | #!/bin/bash
 2 | 
 3 | ###############
 4 | # Make coordinate files from _data.star
 5 | # Max Wilkinson
 6 | # #############
 7 | 
 8 | if [ $# -eq 0 ]
 9 |     then
10 |     echo "Usage:        $0 run_data.star <2dmatch>"
11 |     exit 1
12 | fi
13 | 
14 | if [ -z "$2" ]
15 |     then
16 |     match=2dmatch
17 |     else
18 |     match=$2
19 | fi
20 | 
21 | 
22 | Xidx=`gawk 'NF < 3 && /_rlnCoordinateX/{print $2}' $1 | cut -c 2-`
23 | Yidx=`gawk 'NF < 3 && /_rlnCoordinateY/{print $2}' $1 | cut -c 2-`
24 | Micidx=`gawk 'NF < 3 && /_rlnMicrographName/{print $2}' $1 | cut -c 2-`
25 | Classidx=`gawk 'NF < 3 && /_rlnClassNumber/{print $2}' $1 | cut -c 2-`
26 | FOMidx=`gawk 'NF < 3 && /_rlnAutopickFigureOfMerit/{print $2}' $1 | cut -c 2-`
27 | 
28 | Mics=$(gawk "NF > 3 {print \$$Micidx}" $1 | sort | uniq)
29 | 
30 | for mic in $Mics; do 
31 | out=$(echo "$mic" | sed "s/\.mrc/_${match}.star/")
32 | 
33 | echo '
34 | data_
35 | 
36 | loop_
37 | _rlnCoordinateX #1
38 | _rlnCoordinateY #2
39 | _rlnClassNumber #3
40 | _rlnAutopickFigureOfMerit #4
41 | ' > $out
42 | 
43 | gawk -v m="$mic" "\$$Micidx == m {print \$$Xidx, \$$Yidx, \$$Classidx, \$$FOMidx}" $1 >> $out
44 | 
45 | echo "Wrote $out"
46 | done
47 | 
48 | 


--------------------------------------------------------------------------------
/determine_relative_pixel_size.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python
  2 | # coding: utf-8
  3 | 
  4 | """
  5 | determine_pixel_size
  6 | ---------
  7 | 
  8 | Script to determine relative pixel size of map to map by FSC in RELION. 
  9 | 
 10 | Authors: Ana Casañal, Thomas G. Martin & Takanori Nakane
 11 | License:GPLv3
 12 | 
 13 | """
 14 | 
 15 | import argparse
 16 | import numpy as np
 17 | import subprocess
 18 | from collections import OrderedDict
 19 | 
 20 | parser = argparse.ArgumentParser(description='')
 21 | 
 22 | parser.add_argument('--ref_map',nargs='?', help='filename of the reference map of which the pixel size is known',type=str)
 23 | parser.add_argument('--angpix_ref_map',nargs='?', help='pixel size in Å of the reference map. Will be used as fixed reference',type=float)
 24 | parser.add_argument('--map',nargs='?', help='filename of map of which the pixel size is not known',type=str)
 25 | parser.add_argument('--angpix_map_nominal',nargs='?', help='Starting pixel size in Å of the map that needs the pixel size to be determined. Will be used as a starting point to search for the relative pixel size.',type=float)
 26 | args = parser.parse_args()
 27 | 
 28 | 
 29 | def load_star(filename):
 30 |     from collections import OrderedDict
 31 |     
 32 |     datasets = OrderedDict()
 33 |     current_data = None
 34 |     current_colnames = None
 35 |     
 36 |     in_loop = 0 # 0: outside 1: reading colnames 2: reading data
 37 | 
 38 |     for line in open(filename):
 39 |         line = line.strip()
 40 |         
 41 |         # remove comments
 42 |         comment_pos = line.find('#')
 43 |         if comment_pos > 0:
 44 |             line = line[:comment_pos]
 45 | 
 46 |         if line == "":
 47 |             continue
 48 | 
 49 |         if line.startswith("data_"):
 50 |             in_loop = 0
 51 | 
 52 |             data_name = line[5:]
 53 |             current_data = OrderedDict()
 54 |             datasets[data_name] = current_data
 55 | 
 56 |         elif line.startswith("loop_"):
 57 |             current_colnames = []
 58 |             in_loop = 1
 59 | 
 60 |         elif line.startswith("_"):
 61 |             if in_loop == 2:
 62 |                 in_loop = 0
 63 | 
 64 |             elems = line[1:].split()
 65 |             if in_loop == 1:
 66 |                 current_colnames.append(elems[0])
 67 |                 current_data[elems[0]] = []
 68 |             else:
 69 |                 current_data[elems[0]] = elems[1]
 70 | 
 71 |         elif in_loop > 0:
 72 |             in_loop = 2
 73 |             elems = line.split()
 74 |             assert len(elems) == len(current_colnames)
 75 |             for idx, e in enumerate(elems):
 76 |                 current_data[current_colnames[idx]].append(e)        
 77 |         
 78 |     return datasets
 79 | 
 80 | def load_mrc(filename, maxz=9999): 
 81 | 	inmrc = open(filename, "rb") 
 82 | 	header_int = np.fromfile(inmrc, dtype=np.uint32, count=256) 
 83 | 	inmrc.seek(0, 0) 
 84 | 	header_float = np.fromfile(inmrc, dtype=np.float32, count=256) 
 85 | 
 86 | 	nx, ny, nz = header_int[0:3] 
 87 | 	eheader = header_int[23] 
 88 | 	mrc_type = None 
 89 | 	if header_int[3] == 2: 
 90 | 		mrc_type = np.float32 
 91 | 	elif header_int[3] == 6: 
 92 | 		mrc_type = np.uint16 
 93 | 	nz = np.min([maxz, nz]) 
 94 | 
 95 | 	inmrc.seek(1024 + eheader, 0) 
 96 | 	map_slice = np.fromfile(inmrc, mrc_type, nx * ny * nz).reshape(nz, ny, 
 97 | 	nx).astype(np.float32) 
 98 | 
 99 | 	return nx, ny, nz, map_slice 
100 | 
101 | 
102 | def determine_fsc_dropoff_point (ref_map, map2, angpix_ref_map, angpix_map_2, box_map):
103 | 
104 | 	fsc_aims = [0.5,0.4,0.3,0.2]
105 | 
106 | 	if (map2.find('.mrc') >= 0):
107 | 		tmp_output_name = map2.replace('.mrc','_tmp.mrc')
108 | 		tmp_fsc_output_name = map2.replace('.mrc','_tmp_fsc.star')
109 | 	else:
110 | 		tmp_output_name = map2 + '_tmp.mrc'
111 | 		tmp_fsc_output_name = map2  + '_tmp_fsc.star'
112 | 	subprocess.check_output(['relion_image_handler','--i', map2, '--o', tmp_output_name, '--angpix', str(angpix_map_2), '--rescale_angpix', str(angpix_ref_map), '--new_box', str(box_map), '--shift_com'])
113 | 	star = subprocess.check_output(['relion_image_handler', '--i', ref_map, '--angpix', str(angpix_ref_map), '--fsc', tmp_output_name], universal_newlines=True)
114 | 	f = open(tmp_fsc_output_name, "w") 
115 | 	f.write(star) 
116 | 	f.close() 
117 | 	fsc_star = load_star(tmp_fsc_output_name)
118 | 	fsc_sum = 0
119 | 	for fsc_aim in fsc_aims:
120 | 		fsc_sum = fsc_sum + get_fsc_dropoff_point_in_star(fsc_star, fsc_aim)
121 | 	return fsc_sum/len(fsc_aims)
122 | 
123 | 
124 | def get_fsc_dropoff_point_in_star (fsc_star, fsc_aim):
125 | 	fsc_list = fsc_star['fsc']['rlnFourierShellCorrelation']
126 | 	res_list = fsc_star['fsc']['rlnAngstromResolution']
127 | 	i_threshold = 0
128 | 	i = 0
129 | 	fsc_above_threshold = True
130 | 	for fsc in fsc_list:
131 | 		if fsc_above_threshold and (float(fsc) > fsc_aim):
132 | 			i_threshold = i
133 | 		elif (fsc_above_threshold) and (float(fsc) > -0.5):
134 | 			return interpolate(float(res_list[i_threshold]),float(res_list[i]),float(fsc_list[i_threshold]),float(fsc_list[i]),fsc_aim)
135 | 			fsc_above_threshold = False
136 | 		i = i + 1
137 | 	return float(res_list[-1])
138 | 
139 | def interpolate (x1, x2, y1, y2, y_goal):
140 | 	try:
141 | 		return x1+((x2-x1)*(y1-y_goal)/(y1-y2))
142 | 	except ZeroDivisionError:
143 | 		return 999
144 | 
145 | print("determine_relative_pixel_size.py GPL 2019")
146 | 
147 | ref_map = args.ref_map
148 | angpix_ref_map = args.angpix_ref_map
149 | map2 = args.map
150 | angpix_map_nominal = args.angpix_map_nominal
151 | 
152 | 
153 | nx, ny, nz, _ = load_mrc(ref_map) 
154 | box_map = nx
155 | 
156 | 
157 | if (ref_map.find('.mrc') >= 0):
158 | 	tmp_output_name = ref_map.replace('.mrc','_tmp.mrc')
159 | else:
160 | 	tmp_output_name = ref_map + '_tmp.mrc'
161 | 
162 | subprocess.check_output(['relion_image_handler','--i', ref_map, '--o', tmp_output_name, '--shift_com'])
163 | ref_map = tmp_output_name
164 | 
165 | initial_step_range = 3
166 | step_sizes = [0.1, 0.05, 0.02, 0.01, 0.005, 0.002]
167 | 
168 | angpix_start = angpix_map_nominal - (initial_step_range * step_sizes[0]) - step_sizes[0]
169 | angpix_end = angpix_map_nominal + (initial_step_range * step_sizes[0]) 
170 | 
171 | 
172 | angpix_list = []
173 | res_list = []
174 | for step_size in step_sizes:
175 | 	print ("------------------")
176 | 	print("step: " + str(step_size) + " range: " + str(angpix_start) + " - " + str(angpix_end))
177 | 	print ("------------------")
178 | 	
179 | 	angpix = angpix_start
180 | 	while angpix <= angpix_end:
181 | 		res = determine_fsc_dropoff_point(ref_map, map2, angpix_ref_map, angpix, box_map)
182 | 		print("angpix: " + str(angpix) + " fsc: " + str(res))
183 | 		angpix_list.append(angpix)
184 | 		res_list.append(res)
185 | 
186 | 		angpix = round(angpix + step_size,5)
187 | 	
188 | 	min_res = res_list[0]
189 | 	index_start = 0
190 | 	index_end = 0
191 | 	best_index_start = 0
192 | 	best_index_end = 0
193 | 	for k in range(len(res_list)-1):
194 | 		i = k + 1
195 | 		if (res_list[i] < min_res):
196 | 			min_res = res_list[i]
197 | 			index_start = i -1
198 | 			index_end = i + 1
199 | 			best_index_start = i
200 | 			best_index_end = i
201 | 		elif (res_list[i] == min_res):
202 | 			index_end = i + 1
203 | 			best_index_end = i
204 | 
205 | 	estimate_pixel_size = angpix_list[best_index_start] + (angpix_list[best_index_end] - angpix_list[best_index_start])/2
206 | 	if (best_index_end == best_index_start):
207 | 		print ("------------------")
208 | 		print ("BEST:" + str(estimate_pixel_size))
209 | 	else:
210 | 		print ("------------------")
211 | 		print ("BEST:" + str(estimate_pixel_size) + " range: " + str(angpix_list[best_index_start]) + " - " + str(angpix_list[best_index_end]))
212 | 
213 | 	if (index_end > len(res_list)-1):
214 | 		index_end = len(res_list)-1
215 | 	angpix_start = angpix_list[index_start]
216 | 	angpix_end = angpix_list[index_end]
217 | 	res_start = res_list[index_start]
218 | 
219 | 	angpix_list = []
220 | 	res_list = []
221 | 	angpix_list.append(angpix_start)
222 | 	res_list.append(res_start)
223 | 
224 | 
225 | 
226 | 
227 | 
228 | 
229 | 
230 | 
231 | 
232 | 
233 | 
234 | 
235 | 
236 | 
237 | 
238 | 


--------------------------------------------------------------------------------
/remove_edge_particles.sh:
--------------------------------------------------------------------------------
 1 | #!/bin/bash
 2 | ############################################################
 3 | # A script to remove particles from edges of micrographs   #
 4 | #         by Max Wilkinson, MRC-LMB Oct 2018               #
 5 | ############################################################
 6 | 
 7 | CLEAR='\033[0m'
 8 | RED='\033[0;31m'
 9 | 
10 | function usage() {
11 |     if [ -n "$1" ]; then
12 |     echo -e "${RED}👉 $1${CLEAR}\n";
13 |     fi
14 |     echo "Usage: $0 [-x mic_x] [-y mic_y] [-box boxsize] [-tolerance tolerance level] [-i starfile] [-o starfile]"
15 |     echo "  -x           Width of micrograph"
16 |     echo "  -y           Height of micrograph"
17 |     echo "  -box         Box size particles were extracted with"
18 |     echo "  -tolerance   How much overlap in pixels is allowed between the box and edge of micrograph"
19 |     echo "  -i           The star file to operate on"
20 |     echo "  -o           Star file to write out"
21 |     echo ""
22 |     echo "Example: $0 -x 3838 -y 3710 -box 512 -tolerance 20 -i Extract/job413/particles.star -o particles_noedge.star"
23 |     exit 1
24 | } 
25 | 
26 | 
27 | #defaults
28 | mic_x=3838
29 | mic_y=3710
30 | box=512
31 | tolerance=20
32 | starfile=Extract/job413/particles.star
33 | outstar=particles_test.star
34 | 
35 | # parse params
36 | if [[ "$#" -eq 0 ]]; then
37 |     usage
38 | fi
39 | while [[ "$#" -gt 1 ]]; do case $1 in
40 |     -x|--x) mic_x="$2"; shift;shift;;
41 |     -y|--y) mic_y="$2"; shift;shift;;
42 |     -box|--box) box="$2"; shift;shift;;
43 |     -tolerance|--tolerance) tolerance="$2"; shift;shift;;
44 |     -i|--i) starfile="$2"; shift;shift;;
45 |     -o|--o) outstar="$2"; shift;shift;;
46 |     *) usage "Unknown parameter passed: $1"; shift;shift;;
47 | esac; done
48 | 
49 | 
50 | coord_x_idx=`awk '/_rlnCoordinateX/{split($2, a, "#"); print a[2]}' $starfile`
51 | coord_y_idx=`awk '/_rlnCoordinateY/{split($2, a, "#"); print a[2]}' $starfile`
52 | 
53 | 
54 | awk '{if (NF<= 2) {print} else \
55 | {if ($'$coord_x_idx'>'$box'/2-'$tolerance' && \
56 |      $'$coord_x_idx'<'$mic_x'-'$box'/2+'$tolerance' && \
57 |      $'$coord_y_idx'>'$box'/2-'$tolerance' && \
58 |      $'$coord_y_idx'<'$mic_y'-'$box'/2+'$tolerance' \
59 |     ) {print}}}' $starfile > $outstar
60 | 
61 | 


--------------------------------------------------------------------------------
/rescale_particles.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python
  2 | # coding: utf-8
  3 | 
  4 | """
  5 | rescale_particles
  6 | ---------
  7 | 
  8 | Script to rescale particles
  9 | 
 10 | Authors: Thomas G. Martin & Takanori Nakane
 11 | License: GPLv3
 12 | 
 13 | """
 14 | 
 15 | import argparse
 16 | import numpy as np
 17 | import subprocess
 18 | from collections import OrderedDict
 19 | 
 20 | parser = argparse.ArgumentParser(description='')
 21 | 
 22 | parser.add_argument('--i',nargs='?', help='input filename',type=str)
 23 | parser.add_argument('--o',nargs='?', help='output filename',type=str)
 24 | parser.add_argument('--pix_nominal',nargs='?', help='Nominal pixel size. This is the pixel size that was used for the relion run of the original star file with the particle coordinates.',type=float)
 25 | parser.add_argument('--pix_relative',nargs='?', help='The relative pixel size in comparison to the dataset that this data should be merged with. This can be determined e.g. by cross correlating 2 maps.',type=float)
 26 | parser.add_argument('--pix_target',nargs='?', help='Target pixel size. Pixel size you want it rescaled to.',type=float)
 27 | parser.add_argument('--mrc_name_path',nargs='?', help='Path to mrc files (including last /)',type=str)
 28 | parser.add_argument('--mrc_name_prefix',nargs='?', help='mrc prefix that is different to the original (including _).',type=str)
 29 | parser.add_argument('--mrc_name_suffix',nargs='?', help='mrc suffix that is different to the original (including _). If you need to use a - in the beginning use = and the string in \' instead of space (e.g. --mrc_name_suffix=\'-example\').',type=str)
 30 | parser.add_argument('--mrc_name_replacement_in',nargs='?', help='Part of the mrc name that is changed. If you need to use a - in the beginning use = and the string in \' instead of space (e.g. --mrc_name_replacement_in=\'-example\').',type=str)
 31 | parser.add_argument('--mrc_name_replacement_out',nargs='?', help='Replacement part for the changed part. If you need to use a - in the beginning use = and the string in \' instead of space (e.g. --mrc_name_replacement_out=\'-example\').',type=str)
 32 | args = parser.parse_args()
 33 | 
 34 | def load_star(filename):
 35 |     from collections import OrderedDict
 36 |     
 37 |     datasets = OrderedDict()
 38 |     current_data = None
 39 |     current_colnames = None
 40 |     
 41 |     in_loop = 0 # 0: outside 1: reading colnames 2: reading data
 42 | 
 43 |     for line in open(filename):
 44 |         line = line.strip()
 45 |         
 46 |         # remove comments
 47 |         comment_pos = line.find('#')
 48 |         if comment_pos > 0:
 49 |             line = line[:comment_pos]
 50 | 
 51 |         if line == "":
 52 |             continue
 53 | 
 54 |         if line.startswith("data_"):
 55 |             in_loop = 0
 56 | 
 57 |             data_name = line[5:]
 58 |             current_data = OrderedDict()
 59 |             datasets[data_name] = current_data
 60 | 
 61 |         elif line.startswith("loop_"):
 62 |             current_colnames = []
 63 |             in_loop = 1
 64 | 
 65 |         elif line.startswith("_"):
 66 |             if in_loop == 2:
 67 |                 in_loop = 0
 68 | 
 69 |             elems = line[1:].split()
 70 |             if in_loop == 1:
 71 |                 current_colnames.append(elems[0])
 72 |                 current_data[elems[0]] = []
 73 |             else:
 74 |                 current_data[elems[0]] = elems[1]
 75 | 
 76 |         elif in_loop > 0:
 77 |             in_loop = 2
 78 |             elems = line.split()
 79 |             assert len(elems) == len(current_colnames)
 80 |             for idx, e in enumerate(elems):
 81 |                 current_data[current_colnames[idx]].append(e)        
 82 |         
 83 |     return datasets
 84 | 
 85 | def write_star(filename, datasets): 
 86 | 	f = open(filename, "w") 
 87 | 
 88 | 	for data_name, data in datasets.items(): 
 89 | 		f.write( "\ndata_" + data_name + "\n\n") 
 90 | 
 91 | 		col_names = list(data.keys())
 92 | 		need_loop = isinstance(data[col_names[0]], list) 
 93 | 		if need_loop: 
 94 | 			f.write("loop_\n") 
 95 | 			for idx, col_name in enumerate(col_names): 
 96 | 				f.write("_%s #%d\n" % (col_name, idx + 1)) 
 97 | 
 98 | 			nrow = len(data[col_names[0]]) 
 99 | 			for row in range(nrow): 
100 | 				f.write("\t".join([data[x][row] for x in col_names])) 
101 | 				f.write("\n") 
102 | 		else: 
103 | 			for col_name, value in data.items(): 
104 | 				f.write("_%s\t%s\n" % (col_name, value)) 
105 | 
106 | 		f.write("\n") 
107 | 	f.close() 
108 | 
109 | 
110 | print("rescale_particles.py GPL 2019")
111 | 
112 | input_name = args.i
113 | output_name = args.o
114 | pix_a = args.pix_nominal
115 | pix_o = args.pix_relative
116 | pix_n = args.pix_target
117 | starFile = load_star(input_name)
118 | 
119 | mrc_name_path = args.mrc_name_path
120 | mrc_name_prefix = args.mrc_name_prefix
121 | mrc_name_suffix = args.mrc_name_suffix
122 | mrc_name_replacement_in = args.mrc_name_replacement_in
123 | mrc_name_replacement_out = args.mrc_name_replacement_out
124 | 
125 | 
126 | rlnMicrographName = starFile['']['rlnMicrographName']
127 | corrected_rlnMicrographName = []
128 | for name_with_path in rlnMicrographName:
129 | 	name = name_with_path
130 | 	while (name.find("/") >= 0):
131 | 		name = name[name.find("/")+1:]
132 | 	name = name.replace(".mrc","")
133 | 	if (mrc_name_replacement_in):
134 | 		if (mrc_name_replacement_out):
135 | 			name = name.replace(mrc_name_replacement_in,mrc_name_replacement_out)
136 | 		else :
137 | 			name = name.replace(mrc_name_replacement_in,"")
138 | 	new_name = ""
139 | 	if (mrc_name_path):
140 | 		new_name = new_name + mrc_name_path
141 | 	if (mrc_name_prefix):
142 | 		new_name = new_name + mrc_name_prefix
143 | 	new_name = new_name + name
144 | 	if (mrc_name_suffix):
145 | 		new_name = new_name + mrc_name_suffix
146 | 	new_name = new_name + ".mrc"
147 | 
148 | 	corrected_rlnMicrographName.append(new_name)
149 | rlnCoordinateX = starFile['']['rlnCoordinateX']
150 | corrected_rlnCoordinateX = []
151 | for x in rlnCoordinateX:
152 | 	corrected_rlnCoordinateX.append(str(round(float(x)*pix_o/pix_n,6)))
153 | rlnCoordinateY = starFile['']['rlnCoordinateY']
154 | corrected_rlnCoordinateY = []
155 | for x in rlnCoordinateY:
156 | 	corrected_rlnCoordinateY.append(str(round(float(x)*pix_o/pix_n,6)))
157 | rlnOriginX = starFile['']['rlnOriginX']
158 | corrected_rlnOriginX = []
159 | for x in rlnOriginX:
160 | 	corrected_rlnOriginX.append(str(round(float(x)*pix_o/pix_n,6)))
161 | rlnOriginY = starFile['']['rlnOriginY']
162 | corrected_rlnOriginY = []
163 | for x in rlnOriginY:
164 | 	corrected_rlnOriginY.append(str(round(float(x)*pix_o/pix_n,6)))
165 | rlnMagnification = starFile['']['rlnMagnification']
166 | corrected_rlnMagnification = []
167 | for x in rlnMagnification:
168 | 	corrected_rlnMagnification.append(str(int(round(float(x)*pix_a/pix_n,0))))
169 | rlnDetectorPixelSize = starFile['']['rlnDetectorPixelSize']
170 | 
171 | 
172 | outFile = OrderedDict()
173 | outFile['rlnMicrographName'] = corrected_rlnMicrographName
174 | outFile['rlnCoordinateX'] = corrected_rlnCoordinateX
175 | outFile['rlnCoordinateY'] = corrected_rlnCoordinateY
176 | outFile['rlnOriginX'] = corrected_rlnOriginX
177 | outFile['rlnOriginY'] = corrected_rlnOriginY
178 | outFile['rlnMagnification'] = corrected_rlnMagnification
179 | outFile['rlnDetectorPixelSize'] = rlnDetectorPixelSize
180 | outputFile = OrderedDict()
181 | outputFile[''] = outFile
182 | write_star(output_name,outputFile)
183 | 


--------------------------------------------------------------------------------
/scale_ctf.sh:
--------------------------------------------------------------------------------
 1 | #!/bin/bash
 2 | 
 3 | #########################################################################################
 4 | #
 5 | # Description: Script to (approximately*) correct defocus values for different angpix
 6 | # Max Wilkinson, MRC LMB
 7 | #
 8 | ########################################################################################
 9 | # *Approximately = +/- 40A or so ###
10 | ###################################
11 | 
12 | proc_name=$(echo $0 | gawk '{n=split($1,scr,"/");print scr[n];}')
13 | 
14 | if [ $# -eq 0 ]
15 |     then
16 |     echo "This program will scale defocus U, defocus V and magnification based on starting and new apix values."
17 |     echo "Usage:           $proc_name <data file>"
18 |     echo "User will then be prompted for apix values"
19 |     exit 1
20 | fi
21 | 
22 | 
23 | 
24 | echo -n "Starting apix: "
25 | read apix1
26 | echo -n "New apix: "
27 | read apix2
28 | #apix1=1.43
29 | #apix2=1.38
30 | 
31 | rlnDefocusUIndex=$(gawk 'NF < 3 && /_rlnDefocusU/{print $2}' $1 | cut -c 2-)
32 | rlnDefocusVIndex=$(gawk 'NF < 3 && /_rlnDefocusV/{print $2}' $1 | cut -c 2-)
33 | rlnMagnificationIndex=$(gawk 'NF < 3 && /_rlnMagnification/{print $2}' $1 | cut -c 2-)
34 | rlnSphericalAberrationIndex=$(gawk 'NF < 3 && /_rlnSphericalAberration/{print $2}' $1 | cut -c 2-)
35 | rlnVoltageIndex=$(gawk 'NF < 3 && /_rlnVoltage/{print $2}' $1 | cut -c 2-)
36 | 
37 | output_file=$(echo $1 | sed 's/\.star/_newapix.star/')
38 | 
39 | 
40 | 
41 | cs=$(head -200 $1 | gawk "/mrc/ {print \$$rlnSphericalAberrationIndex}" | sort | uniq | gawk '{printf "%.f",10000000 * $1}')
42 | #echo $cs
43 | #cs=27000000
44 | echo "Read spherical aberration as" $cs "A from star-file header"
45 | 
46 | kv=$(head -200 $1 | gawk "/mrc/ {print \$$rlnVoltageIndex}" | sort | uniq | xargs printf %.f)
47 | #kv=300
48 | if [ "$kv" = "300" ]; then
49 |     lambda=0.0197
50 | elif [ "$kv" = "200" ]; then
51 |     lambda=0.0251
52 | else
53 |     echo "Acceleration voltage not 200 or 300 kV, please modify the script to use the correct electron wavelength"
54 |     exit 1
55 | fi
56 | echo "Acceleration voltage read as" $kv", will use electron wavelength of" $lambda "A"
57 | 
58 | 
59 | #Fudge factor: average-ish spatial resolution (A) for applying constant correction. This value is a bit arbitrary, something like 5-7A works ok. The correction involved is only about 20A, i.e. about 0.1 percent of the defocus
60 | avgS=5.68
61 | 
62 | varC=$(gawk "BEGIN {printf \"%.6f\",-0.5*${cs}*${lambda}**2}")
63 | alpha=$(gawk "BEGIN {printf \"%.6f\",${apix1}/${apix2}}")
64 | alpha2=$(gawk "BEGIN {printf \"%.6f\",${alpha}**2}")
65 | 
66 | invalpha2=$(gawk "BEGIN {printf \"%.6f\",1/${alpha2}}")
67 | 
68 | const=$(gawk "BEGIN {printf \"%.6f\",${varC}*${alpha}**2 - ${varC}/${alpha}**2}")
69 | correction=$(gawk "BEGIN {printf \"%.6f\",${const}/${avgS}**2}")
70 | 
71 | echo "Defocus scaled  by" $invalpha2 "plus a constant correction of" $correction 
72 | 
73 | 
74 | gawk "BEGIN{OFS = \"\t\"} NF>3 {
75 |        \$$rlnDefocusUIndex=sprintf(\"%.6f\",\$${rlnDefocusUIndex}/${alpha2} - ${correction})
76 |        \$$rlnDefocusVIndex=sprintf(\"%.6f\",\$${rlnDefocusVIndex}/${alpha2} - ${correction})
77 |        \$$rlnMagnificationIndex=sprintf(\"%.6f\",\$${rlnMagnificationIndex} * ${alpha})
78 | }1" $1 > $output_file
79 | echo "Wrote out" $output_file
80 | 


--------------------------------------------------------------------------------
/star_apply_matrix.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | import math
  3 | import sys
  4 | import argparse
  5 | import os
  6 | from collections import OrderedDict
  7 | 
  8 | 
  9 | def load_star(filename):
 10 |     # thanks Takanori
 11 |     datasets = OrderedDict()
 12 |     current_data = None
 13 |     current_colnames = None
 14 |     
 15 |     in_loop=0 # where 0:outside 1: reading colnames 2: reading data
 16 |     for line in open(filename):
 17 |         line=line.strip()
 18 |         
 19 |         comment_pos = line.find('#')
 20 |         if comment_pos >= 0:
 21 |             line = line[:comment_pos]
 22 |         if line == "":
 23 |             continue
 24 |         if line.startswith("data_"):
 25 |             in_loop=0
 26 |             
 27 |             data_name = line[5:]
 28 |             print(data_name)
 29 |             current_data = OrderedDict()
 30 |             datasets[data_name] = current_data
 31 |         
 32 |         elif line.startswith("loop_"):
 33 |             current_colnames = []
 34 |             in_loop=1
 35 |         
 36 |         elif line.startswith("_"):
 37 |             if in_loop ==2:
 38 |                 in_loop = 0
 39 |             
 40 |             elems = line[1:].split()
 41 |             if in_loop == 1:
 42 |                 current_colnames.append(elems[0])
 43 |                 current_data[elems[0]] = []
 44 |             else:
 45 |                 current_data[elems[0]] = elems[1]
 46 |         
 47 |         elif in_loop > 0:
 48 |             in_loop = 2
 49 |             elems = line.split()
 50 |             #print(elems)
 51 |             assert len(elems) == len(current_colnames)
 52 |             for idx, e in enumerate(elems):
 53 |                 current_data[current_colnames[idx]].append(e)
 54 |         
 55 |     return datasets
 56 | 
 57 | 
 58 | def Euler_angles2matrix(rot,tilt,psi): #rot = alpha, tilt = beta, psi = gamma
 59 |     A=np.zeros((3,3))
 60 |     rot=math.radians(rot)
 61 |     tilt=math.radians(tilt)
 62 |     psi=math.radians(psi)
 63 |     
 64 |     ca=np.cos(rot)
 65 |     cb=np.cos(tilt)
 66 |     cg=np.cos(psi)
 67 |     sa=np.sin(rot)
 68 |     sb=np.sin(tilt)
 69 |     sg=np.sin(psi)
 70 |     cc = cb*ca
 71 |     cs = cb*sa
 72 |     sc = sb*ca
 73 |     ss = sb*sa
 74 |     
 75 |     A[0,0] =  cg * cc - sg * sa
 76 |     A[0,1] =  cg * cs + sg * ca
 77 |     A[0,2] = -cg * sb
 78 |     A[1,0] = -sg * cc - cg * sa
 79 |     A[1,1] = -sg * cs + cg * ca
 80 |     A[1,2] = sg * sb
 81 |     A[2,0] =  sc
 82 |     A[2,1] =  ss
 83 |     A[2,2] = cb
 84 |     
 85 |     return(A)
 86 |     
 87 | 
 88 | def Euler_matrix2angles(A):
 89 |     if A.shape != (3,3):
 90 |         print("Matrix should be 3x3")
 91 |     FLT_EPSILON=sys.float_info.epsilon
 92 |     if np.abs(A[1,1]) > FLT_EPSILON:
 93 |         abs_sb = np.sqrt((-A[2,2]*A[1,2]*A[2,1]-A[0,2]*A[2,0])/A[1,1])
 94 |     elif np.abs(A[0,1]) > FLT_EPSILON:
 95 |         abs_sb = np.sqrt((-A[2,1]*A[2,2]*A[0,2]+A[2,0]*A[1,2])/A[0,1])
 96 |     elif np.abs(A[0,0]) > FLT_EPSILON:
 97 |         abs_sb = np.sqrt((-A[2,0]*A[2,2]*A[0,2]-A[2,1]*A[1,2])/A[0,0])
 98 |     else:
 99 |         raise ValueError("Don't know how to extract angles from this matrix")
100 |     if abs_sb > FLT_EPSILON:
101 |         beta = np.arctan2(abs_sb, A[2,2])
102 |         alpha = np.arctan2(A[2,1]/abs_sb, A[2,0] / abs_sb)
103 |         gamma = np.arctan2(A[1,2] / abs_sb, -A[0,2] / abs_sb)
104 |     else:
105 |         alpha=0
106 |         beta=0
107 |         gamma = np.arctan2(A[1,0],A[0,0])
108 |     gamma = math.degrees(gamma)
109 |     beta = math.degrees(beta)
110 |     alpha = math.degrees(alpha)
111 |     return(alpha,beta,gamma)  
112 | 
113 |     
114 | def write_star(mystar,filename):
115 |     # mystar maps block name -> {field name -> list of column values}
116 |     with open(filename,"w") as f:
117 |         for data in mystar:
118 |             f.write("data_{}\n".format(data))
119 |             f.write("\n")
120 |             f.write("loop_\n")
121 |             fields=list(mystar[data].keys())
122 |             # STAR column labels are numbered from 1
123 |             for i,field in enumerate(fields,start=1):
124 |                 f.write("_{} #{}\n".format(field,i))
125 |             # every column has the same length, so any field gives the row count
126 |             totalN=len(mystar[data][fields[0]]) if fields else 0
127 |             for n in range(totalN):
128 |                 for field in fields:
129 |                     f.write("{}\t".format(mystar[data][field][n]))
130 |                 f.write("\n")
131 |             f.write("\n")
132 | 
133 | 
134 | 
135 | 
136 | def star_apply_matrix(starpath,matrix,classToMove,outstarpath,boxsize,angpix):
137 |     star=load_star(starpath)
138 |     print(classToMove)
139 |     shift=matrix[:,3]
140 |     rotmat=matrix[:,:3]
141 |     #rotmat=Euler_angles2matrix(-90,0,0)
142 |     #shift=np.array([336,0,0])
143 |     # np.float/np.int were removed in NumPy 1.24; use the builtin types
144 |     Xs=np.array(star['particles']['rlnOriginXAngst']).astype(float)
145 |     Ys=np.array(star['particles']['rlnOriginYAngst']).astype(float)
146 |     Psis=np.array(star['particles']['rlnAnglePsi']).astype(float)
147 |     Rots=np.array(star['particles']['rlnAngleRot']).astype(float)
148 |     Tilts=np.array(star['particles']['rlnAngleTilt']).astype(float)
149 |     Classes=np.array(star['particles']['rlnClassNumber']).astype(int)
150 | 
151 |     newXs=[]
152 |     newYs=[]
153 |     
154 |     newPsis=[]
155 |     newRots=[]
156 |     newTilts=[]
157 |     
158 |     for n in range(len(Xs)):
159 |         if n % 50000 ==0:
160 |             print(n)
161 |         if Classes[n]==classToMove:
162 |             A3D=Euler_angles2matrix(Rots[n],Tilts[n],Psis[n])
163 |             A3D=A3D.dot(np.linalg.inv(rotmat))
164 |             mapcenter=np.array([boxsize,boxsize,boxsize])*angpix/2
165 |             #mapcenter=np.array([200,200,200])*1.12
166 |             trueshift=shift-(mapcenter-rotmat.dot(mapcenter))
167 |             new_center=A3D.dot(trueshift)
168 |             newX=Xs[n]+new_center[0]
169 |             newY=Ys[n]+new_center[1]
170 |             newrot,newtilt,newpsi=Euler_matrix2angles(A3D)
171 |             newXs.append(str(newX))
172 |             newYs.append(str(newY))
173 |             newPsis.append(str(newpsi))
174 |             newTilts.append(str(newtilt))
175 |             newRots.append(str(newrot))
176 |         
177 |         else:
178 |             newXs.append(str(Xs[n]))
179 |             newYs.append(str(Ys[n]))
180 |             newPsis.append(str(Psis[n]))
181 |             newTilts.append(str(Tilts[n]))
182 |             newRots.append(str(Rots[n]))
183 | 
184 |     star['particles']['rlnOriginXAngst']=newXs
185 |     star['particles']['rlnOriginYAngst']=newYs
186 |     star['particles']['rlnAnglePsi']=newPsis
187 |     star['particles']['rlnAngleTilt']=newTilts
188 |     star['particles']['rlnAngleRot']=newRots
189 | 
190 |     write_star(star,outstarpath)
191 | 
192 | 
193 | def main():
194 | 	usage = """Example: python star_apply_matrix.py --i Class3D/job049/run_it025_data.star --matrix chimeramatrix.txt --classN 1 --boxsize 300 --angpix 1.12 --o class1shifted.star
195 | 	
196 | 	Applies a rotation/translation transformation matrix to the Euler angles and offsets of particles of a particular class in a STAR file. 
197 |     Can be used to align particles based on class orientations.
198 |     """
199 | 
200 | 	#parser = OptionParser(usage=usage)
201 | 	parser = argparse.ArgumentParser(description='')
202 |     
203 | 	parser.add_argument("--classN", type=int, required=True, help="Number of the class to move")
204 | 	parser.add_argument("--matrix", required=True, help="Text file with 3x4 transformation matrix from Chimera")
205 | 	parser.add_argument("--boxsize", type=int, required=True, help="Box size of your reference volume, in pixels")
206 | 	parser.add_argument("--angpix", type=float, required=True, help="Pixel size of your reference volume, in Angstrom per pixel")
207 | 	parser.add_argument("--i", required=True, help="Input STAR file")
208 | 	parser.add_argument("--o", required=True, help="Output STAR file")
209 | 	if len(sys.argv)==1:
210 | 		print(usage)
211 | 		parser.print_help(sys.stderr)
212 | 
213 | 		sys.exit(1)
214 | 	args = parser.parse_args()
215 |     
216 | 	inputstar=str(args.i)
217 | 	outputstar=str(args.o)
218 | 	matrixpath=str(args.matrix)
219 | 	classN=int(args.classN)
220 | 	angpix=float(args.angpix)
221 | 	boxsize=int(args.boxsize)
222 |     
223 | 	matrix=np.loadtxt(matrixpath)
224 | 	star_apply_matrix(inputstar,matrix,classN,outputstar,boxsize,angpix)
225 | 
226 | 
227 | if __name__ == "__main__":
228 |     main()
229 | 


--------------------------------------------------------------------------------
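
A quick way to sanity-check the Euler conversions used in `star_apply_matrix.py` is a round trip: angles to matrix, back to angles, back to matrix, then compare the two matrices. The sketch below is not part of the repository. It restates the same ZYZ convention (rot, tilt, psi in degrees) with only the standard library, and the names `angles_to_matrix`, `matrix_to_angles`, and `round_trip_error` are hypothetical helpers chosen for illustration. It compares matrices rather than angles because several angle triplets describe the same rotation, for example at tilt = 0 or tilt = 180.

```python
import math


def angles_to_matrix(rot, tilt, psi):
    """ZYZ Euler angles in degrees -> 3x3 rotation matrix as nested lists."""
    a, b, g = (math.radians(x) for x in (rot, tilt, psi))
    ca, cb, cg = math.cos(a), math.cos(b), math.cos(g)
    sa, sb, sg = math.sin(a), math.sin(b), math.sin(g)
    return [
        [cg * cb * ca - sg * sa, cg * cb * sa + sg * ca, -cg * sb],
        [-sg * cb * ca - cg * sa, -sg * cb * sa + cg * ca, sg * sb],
        [sb * ca, sb * sa, cb],
    ]


def matrix_to_angles(A, eps=1e-12):
    """3x3 rotation matrix -> (rot, tilt, psi) in degrees, tilt in [0, 180]."""
    # |sin(tilt)| is the norm of the first two entries of the last column.
    abs_sb = math.hypot(A[0][2], A[1][2])
    if abs_sb > eps:
        rot = math.atan2(A[2][1], A[2][0])
        tilt = math.atan2(abs_sb, A[2][2])
        psi = math.atan2(A[1][2], -A[0][2])
    elif A[2][2] > 0:
        # tilt = 0: only rot + psi is defined, so put it all into psi.
        rot, tilt, psi = 0.0, 0.0, math.atan2(-A[1][0], A[0][0])
    else:
        # tilt = 180: only psi - rot is defined, so put it all into psi.
        rot, tilt, psi = 0.0, math.pi, math.atan2(A[1][0], -A[0][0])
    return tuple(math.degrees(x) for x in (rot, tilt, psi))


def round_trip_error(rot, tilt, psi):
    """Largest element-wise difference after angles -> matrix -> angles -> matrix."""
    A = angles_to_matrix(rot, tilt, psi)
    B = angles_to_matrix(*matrix_to_angles(A))
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))


if __name__ == "__main__":
    for angles in [(25.0, 60.0, -130.0), (30.0, 0.0, 40.0), (30.0, 180.0, 40.0)]:
        print(angles, round_trip_error(*angles))
```

The two degenerate cases (tilt = 0 and tilt = 180) are worth including in any such check, since that is where the sign of the in-plane angle is easiest to get wrong.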