├── README.md ├── boxscaler.py ├── coords_from_data.sh ├── determine_relative_pixel_size.py ├── remove_edge_particles.sh ├── rescale_particles.py ├── scale_ctf.sh └── star_apply_matrix.py /README.md: -------------------------------------------------------------------------------- 1 | # CryoEM-scripts 2 | 3 | Some Python and bash scripts to make cryoEM easier. 4 | 5 | 6 | **Scripts to help scaling and merging data sets** 7 | 8 | These four scripts are featured in our Acta D paper "Methods for Merging Data Sets in Electron Cryo-Microscopy" and aim to make merging data sets a bit more tolerable. 9 | 10 | A couple of scripts are written by me, apologies for the poor python/bash styling: 11 | 12 | `boxscaler.py` will find combinations of even box sizes that give a desired scaling factor. 13 | 14 | `scale_ctf.sh` will do a pretty good* approximate job of rescaling your defocus values to a different pixel size. This bypasses having to re-run CTFFIND or Gctf, which is useful e.g. if micrographs are no longer on disk. 15 | 16 | `star_apply_matrix.py` will apply a Chimera transformation matrix to the Euler angles and offsets of particles belonging to a specified class in a STAR file. This can be used to align particles based on the orientation of domains in their 3D classes. 17 | 18 | And a couple of much more nicely written scripts by Thomas Martin, Ana Casañal, and Takanori Nakane: 19 | 20 | `determine_relative_pixel_size.py` will find the pixel size that maximises the correlation (measured with FSC) of one map to a reference map - useful for finding relative pixel sizes when merging data sets. 21 | 22 | `rescale_particles.py` will rescale particle coordinates, for when you've decided to scale your datasets by rescaling your micrographs. Also does some STAR file wizardry to fix up various paths. 23 | 24 | \*"pretty good" means +/- 40A compared to actually re-running GCTF. 
Considering this is probably very close to or better than the precision of GCTF itself, especially at normal resolution regimes, I'd say "pretty good" really means "entirely good enough, especially if you plan on running per-particle CTF refinement anyway." At least in my experience, YMMV. 25 | -------------------------------------------------------------------------------- /boxscaler.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # BoxScaler: finds optimal box size ratios for scaling data 3 | # Max Wilkinson, MRC LMB, October 2018 4 | 5 | 6 | 7 | import numpy as np 8 | 9 | ################ 10 | # input() returns a string in Python 3, so convert each answer explicitly 11 | startbox = int(input("What is the smallest box size allowed? ")) 12 | finishbox = int(input("What is the largest box size allowed? ")) 13 | startApix = float(input("What is your starting pixel size? ")) 14 | endApix = float(input("What is the desired final pixel size? ")) 15 | ntimes = int(input("and how many answers do you want? ")) 16 | 17 | # if provided box size is odd 18 | if startbox % 2 != 0: 19 | startbox += 1 20 | 21 | # get desired ratio 22 | apixRatio=startApix/endApix 23 | 24 | # produce range of possible box sizes and compute a division matrix between them all 25 | boxArray = np.arange(startbox,finishbox,2,dtype=float) 26 | ratioArray = np.divide.outer(boxArray,boxArray) 27 | i = 1 28 | while i <= ntimes: 29 | # find index of closest box size ratio to the desired angpix ratio 30 | idxMin = np.abs(ratioArray - apixRatio).argmin() 31 | indxMin2D = np.unravel_index(idxMin,ratioArray.shape) #unravel_index converts a 1D index into a 2D index based on the dimensions given by .shape 32 | 33 | # use x and y of this index to find the starting and ending box sizes 34 | startscale=boxArray[indxMin2D[1]] 35 | endscale=boxArray[indxMin2D[0]] 36 | 37 | # print output 38 | print('Starting with a {:.0f} pixel box, scale to a {:.0f} pixel box'.format(startscale, endscale)) 39 | print('This will give a scaling factor of {:.5f}, compared to a 
desired pixel size ratio of {:.5f}, giving a {:.3f} percent error.\n'.format(endscale/startscale, apixRatio, (endscale/startscale-apixRatio)/apixRatio*100)) 40 | # remove this answer from ratioarray by setting it to a very large number 41 | ratioArray.flat[idxMin] = 999. 42 | i += 1 43 | 44 | -------------------------------------------------------------------------------- /coords_from_data.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | ############### 4 | # Make coordinate files from _data.star 5 | # Max Wilkinson 6 | # ############# 7 | 8 | if [ $# -eq 0 ] 9 | then 10 | echo "Usage: $0 run_data.star <2dmatch>" 11 | exit 1 12 | fi 13 | 14 | if [ -z "$2" ] 15 | then 16 | match=2dmatch 17 | else 18 | match=$2 19 | fi 20 | 21 | 22 | Xidx=`gawk 'NF < 3 && /_rlnCoordinateX/{print $2}' $1 | cut -c 2-` 23 | Yidx=`gawk 'NF < 3 && /_rlnCoordinateY/{print $2}' $1 | cut -c 2-` 24 | Micidx=`gawk 'NF < 3 && /_rlnMicrographName/{print $2}' $1 | cut -c 2-` 25 | Classidx=`gawk 'NF < 3 && /_rlnClassNumber/{print $2}' $1 | cut -c 2-` 26 | FOMidx=`gawk 'NF < 3 && /_rlnAutopickFigureOfMerit/{print $2}' $1 | cut -c 2-` 27 | 28 | Mics=$(gawk "NF > 3 {print \$$Micidx}" $1 | sort | uniq) 29 | 30 | for mic in $Mics; do 31 | out=`echo $mic | sed "s/.mrc/_"$match".star/"` 32 | 33 | echo ' 34 | data_ 35 | 36 | loop_ 37 | _rlnCoordinateX #1 38 | _rlnCoordinateY #2 39 | _rlnClassNumber #3 40 | _rlnAutopickFigureOfMerit #4 41 | ' > $out 42 | 43 | gawk -v m="$mic" "\$$Micidx == m {print \$$Xidx, \$$Yidx, \$$Classidx, \$$FOMidx}" $1 >> $out 44 | 45 | echo "Wrote $out" 46 | done 47 | 48 | -------------------------------------------------------------------------------- /determine_relative_pixel_size.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | """ 5 | determine_pixel_size 6 | --------- 7 | 8 | Script to determine relative pixel size of map to map 
by FSC in RELION. 9 | 10 | Authors: Ana Casañal, Thomas G. Martin & Takanori Nakane 11 | License:GPLv3 12 | 13 | """ 14 | 15 | import argparse 16 | import numpy as np 17 | import subprocess 18 | from collections import OrderedDict 19 | 20 | parser = argparse.ArgumentParser(description='') 21 | 22 | parser.add_argument('--ref_map',nargs='?', help='filename of the reference map of which the pixel size is known',type=str) 23 | parser.add_argument('--angpix_ref_map',nargs='?', help='pixel size in Å of the reference map. Will be used as fixed reference',type=float) 24 | parser.add_argument('--map',nargs='?', help='filename of map of which the pixel size is not known',type=str) 25 | parser.add_argument('--angpix_map_nominal',nargs='?', help='Starting pixel size in Å of the map that needs the pixel size to be determined. Will be used as a starting point to search for the relative pixel size.',type=float) 26 | args = parser.parse_args() 27 | 28 | 29 | def load_star(filename): 30 | from collections import OrderedDict 31 | 32 | datasets = OrderedDict() 33 | current_data = None 34 | current_colnames = None 35 | 36 | in_loop = 0 # 0: outside 1: reading colnames 2: reading data 37 | 38 | for line in open(filename): 39 | line = line.strip() 40 | 41 | # remove comments 42 | comment_pos = line.find('#') 43 | if comment_pos > 0: 44 | line = line[:comment_pos] 45 | 46 | if line == "": 47 | continue 48 | 49 | if line.startswith("data_"): 50 | in_loop = 0 51 | 52 | data_name = line[5:] 53 | current_data = OrderedDict() 54 | datasets[data_name] = current_data 55 | 56 | elif line.startswith("loop_"): 57 | current_colnames = [] 58 | in_loop = 1 59 | 60 | elif line.startswith("_"): 61 | if in_loop == 2: 62 | in_loop = 0 63 | 64 | elems = line[1:].split() 65 | if in_loop == 1: 66 | current_colnames.append(elems[0]) 67 | current_data[elems[0]] = [] 68 | else: 69 | current_data[elems[0]] = elems[1] 70 | 71 | elif in_loop > 0: 72 | in_loop = 2 73 | elems = line.split() 74 | assert len(elems) 
== len(current_colnames) 75 | for idx, e in enumerate(elems): 76 | current_data[current_colnames[idx]].append(e) 77 | 78 | return datasets 79 | 80 | def load_mrc(filename, maxz=9999): 81 | inmrc = open(filename, "rb") 82 | header_int = np.fromfile(inmrc, dtype=np.uint32, count=256) 83 | inmrc.seek(0, 0) 84 | header_float = np.fromfile(inmrc, dtype=np.float32, count=256) 85 | 86 | nx, ny, nz = header_int[0:3] 87 | eheader = header_int[23] 88 | mrc_type = None 89 | if header_int[3] == 2: 90 | mrc_type = np.float32 91 | elif header_int[3] == 6: 92 | mrc_type = np.uint16 93 | nz = np.min([maxz, nz]) 94 | 95 | inmrc.seek(1024 + eheader, 0) 96 | map_slice = np.fromfile(inmrc, mrc_type, nx * ny * nz).reshape(nz, ny, 97 | nx).astype(np.float32) 98 | 99 | return nx, ny, nz, map_slice 100 | 101 | 102 | def determine_fsc_dropoff_point (ref_map, map2, angpix_ref_map, angpix_map_2, box_map): 103 | 104 | fsc_aims = [0.5,0.4,0.3,0.2] 105 | 106 | if (map2.find('.mrc') >= 0): 107 | tmp_output_name = map2.replace('.mrc','_tmp.mrc') 108 | tmp_fsc_output_name = map2.replace('.mrc','_tmp_fsc.star') 109 | else: 110 | tmp_output_name = map2 + '_tmp.mrc' 111 | tmp_fsc_output_name = map2 + '_tmp_fsc.star' 112 | subprocess.check_output(['relion_image_handler','--i', map2, '--o', tmp_output_name, '--angpix', str(angpix_map_2), '--rescale_angpix', str(angpix_ref_map), '--new_box', str(box_map), '--shift_com']) 113 | star = subprocess.check_output(['relion_image_handler', '--i', ref_map, '--angpix', str(angpix_ref_map), '--fsc', tmp_output_name], universal_newlines=True) # text mode: returns str rather than bytes on Python 3 114 | f = open(tmp_fsc_output_name, "w") 115 | f.write(star) 116 | f.close() 117 | fsc_star = load_star(tmp_fsc_output_name) 118 | fsc_sum = 0 119 | for fsc_aim in fsc_aims: 120 | fsc_sum = fsc_sum + get_fsc_dropoff_point_in_star(fsc_star, fsc_aim) 121 | return fsc_sum/len(fsc_aims) 122 | 123 | 124 | def get_fsc_dropoff_point_in_star (fsc_star, fsc_aim): 125 | fsc_list = fsc_star['fsc']['rlnFourierShellCorrelation'] 126 | res_list = 
fsc_star['fsc']['rlnAngstromResolution'] 127 | i_threshold = 0 128 | i = 0 129 | fsc_above_threshold = True 130 | for fsc in fsc_list: 131 | if fsc_above_threshold and (float(fsc) > fsc_aim): 132 | i_threshold = i 133 | elif (fsc_above_threshold) and (float(fsc) > -0.5): 134 | return interpolate(float(res_list[i_threshold]),float(res_list[i]),float(fsc_list[i_threshold]),float(fsc_list[i]),fsc_aim) 135 | fsc_above_threshold = False 136 | i = i + 1 137 | return float(res_list[-1]) 138 | 139 | def interpolate (x1, x2, y1, y2, y_goal): 140 | try: 141 | return x1+((x2-x1)*(y1-y_goal)/(y1-y2)) 142 | except ZeroDivisionError: 143 | return 999 144 | 145 | print("determine_relative_pixel_size.py GPL 2019") 146 | 147 | ref_map = args.ref_map 148 | angpix_ref_map = args.angpix_ref_map 149 | map2 = args.map 150 | angpix_map_nominal = args.angpix_map_nominal 151 | 152 | 153 | nx, ny, nz, _ = load_mrc(ref_map) 154 | box_map = nx 155 | 156 | # the temporary name is derived from ref_map, so test ref_map (not map2) for the .mrc extension 157 | if (ref_map.find('.mrc') >= 0): 158 | tmp_output_name = ref_map.replace('.mrc','_tmp.mrc') 159 | else: 160 | tmp_output_name = ref_map + '_tmp.mrc' 161 | 162 | subprocess.check_output(['relion_image_handler','--i', ref_map, '--o', tmp_output_name, '--shift_com']) 163 | ref_map = tmp_output_name 164 | 165 | initial_step_range = 3 166 | step_sizes = [0.1, 0.05, 0.02, 0.01, 0.005, 0.002] 167 | 168 | angpix_start = angpix_map_nominal - (initial_step_range * step_sizes[0]) - step_sizes[0] 169 | angpix_end = angpix_map_nominal + (initial_step_range * step_sizes[0]) 170 | 171 | 172 | angpix_list = [] 173 | res_list = [] 174 | for step_size in step_sizes: 175 | print ("------------------") 176 | print("step: " + str(step_size) + " range: " + str(angpix_start) + " - " + str(angpix_end)) 177 | print ("------------------") 178 | 179 | angpix = angpix_start 180 | while angpix <= angpix_end: 181 | res = determine_fsc_dropoff_point(ref_map, map2, angpix_ref_map, angpix, box_map) 182 | print("angpix: " + str(angpix) + " fsc: " + str(res)) 183 | 
angpix_list.append(angpix) 184 | res_list.append(res) 185 | 186 | angpix = round(angpix + step_size,5) 187 | 188 | min_res = res_list[0] 189 | index_start = 0 190 | index_end = 0 191 | best_index_start = 0 192 | best_index_end = 0 193 | for k in range(len(res_list)-1): 194 | i = k + 1 195 | if (res_list[i] < min_res): 196 | min_res = res_list[i] 197 | index_start = i -1 198 | index_end = i + 1 199 | best_index_start = i 200 | best_index_end = i 201 | elif (res_list[i] == min_res): 202 | index_end = i + 1 203 | best_index_end = i 204 | 205 | estimate_pixel_size = angpix_list[best_index_start] + (angpix_list[best_index_end] - angpix_list[best_index_start])/2 206 | if (best_index_end == best_index_start): 207 | print ("------------------") 208 | print ("BEST:" + str(estimate_pixel_size)) 209 | else: 210 | print ("------------------") 211 | print ("BEST:" + str(estimate_pixel_size) + " range: " + str(angpix_list[best_index_start]) + " - " + str(angpix_list[best_index_end])) 212 | 213 | if (index_end > len(res_list)-1): 214 | index_end = len(res_list)-1 215 | angpix_start = angpix_list[index_start] 216 | angpix_end = angpix_list[index_end] 217 | res_start = res_list[index_start] 218 | 219 | angpix_list = [] 220 | res_list = [] 221 | angpix_list.append(angpix_start) 222 | res_list.append(res_start) 223 | 224 | 225 | 226 | 227 | 228 | 229 | 230 | 231 | 232 | 233 | 234 | 235 | 236 | 237 | 238 | -------------------------------------------------------------------------------- /remove_edge_particles.sh: -------------------------------------------------------------------------------- 1 | ############################################################ 2 | # A script to remove particles from edges of micrographs # 3 | # by Max Wilkinson, MRC-LMB Oct 2018 # 4 | ############################################################ 5 | #!/bin/bash 6 | 7 | CLEAR='\033[0m' 8 | RED='\033[0;31m' 9 | 10 | function usage() { 11 | if [ -n "$1" ]; then 12 | echo -e "${RED}👉 $1${CLEAR}\n"; 13 | fi 14 | 
echo "Usage: $0 [-x mic_x] [-y mic_y] [-box boxsize] [-tolerance tolerance level] [-i starfile] [-o starfile]" 15 | echo " -x Width of micrograph" 16 | echo " -y Height of micrograph" 17 | echo " -box Box size particles were extracted with" 18 | echo " -tolerance How much overlap in pixels is allowed between the box and edge of micrograph" 19 | echo " -i The star file to operate on" 20 | echo " -o Star file to write out" 21 | echo "" 22 | echo "Example: $0 -x 3838 -y 3710 -box 512 -tolerance 20 -i Extract/job413/particles.star -o particles_noedge.star" 23 | exit 1 24 | } 25 | 26 | 27 | #defaults 28 | mic_x=3838 29 | mic_y=3710 30 | box=512 31 | tolerance=20 32 | starfile=Extract/job413/particles.star 33 | outstar=particles_test.star 34 | 35 | # parse params 36 | if [[ "$#" -eq 0 ]]; then 37 | usage 38 | fi 39 | while [[ "$#" > 1 ]]; do case $1 in 40 | -x|--x) mic_x="$2"; shift;shift;; 41 | -y|--y) mic_y="$2"; shift;shift;; 42 | -box|--box) box="$2"; shift;shift;; 43 | -tolerance|--tolerance) tolerance="$2"; shift;shift;; 44 | -i|--i) starfile="$2"; shift;shift;; 45 | -o|--o) outstar="$2"; shift;shift;; 46 | *) usage "Unknown parameter passed: $1"; shift;shift;; 47 | esac; done 48 | 49 | 50 | coord_x_idx=`awk '/_rlnCoordinateX/{split($2, a, "#"); print a[2]}' $starfile` 51 | coord_y_idx=`awk '/_rlnCoordinateY/{split($2, a, "#"); print a[2]}' $starfile` 52 | 53 | 54 | awk '{if (NF<= 2) {print} else \ 55 | {if ($'$coord_x_idx'>'$box'/2-'$tolerance' && \ 56 | $'$coord_x_idx'<'$mic_x'-'$box'/2+'$tolerance' && \ 57 | $'$coord_y_idx'>'$box'/2-'$tolerance' && \ 58 | $'$coord_y_idx'<'$mic_y'-'$box'/2+'$tolerance' \ 59 | ) {print}}}' $starfile > $outstar 60 | 61 | -------------------------------------------------------------------------------- /rescale_particles.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | """ 5 | rescale_particles 6 | --------- 7 | 8 | Script to rescale particles 9 
| 10 | Authors: Thomas G. Martin & Takanori Nakane 11 | License: GPLv3 12 | 13 | """ 14 | 15 | import argparse 16 | import numpy as np 17 | import subprocess 18 | from collections import OrderedDict 19 | 20 | parser = argparse.ArgumentParser(description='') 21 | 22 | parser.add_argument('--i',nargs='?', help='input filename',type=str) 23 | parser.add_argument('--o',nargs='?', help='output filename',type=str) 24 | parser.add_argument('--pix_nominal',nargs='?', help='Nominal pixel size. This is the pixel size that was used for the relion run of the original star file with the particle coordinates.',type=float) 25 | parser.add_argument('--pix_relative',nargs='?', help='The relative pixel size in comparison to the dataset that this data should be merged with. This can be determined e.g. by cross correlating 2 maps.',type=float) 26 | parser.add_argument('--pix_target',nargs='?', help='Target pixel size. Pixel size you want it rescaled to.',type=float) 27 | parser.add_argument('--mrc_name_path',nargs='?', help='Path to mrc files (including last /)',type=str) 28 | parser.add_argument('--mrc_name_prefix',nargs='?', help='mrc prefix that is different to the original (including _).',type=str) 29 | parser.add_argument('--mrc_name_suffix',nargs='?', help='mrc suffix that is different to the original (including _). If you need to use a - in the beginning use = and the string in \' instead of space (e.g. --mrc_name_suffix=\'-example\').',type=str) 30 | parser.add_argument('--mrc_name_replacement_in',nargs='?', help='Part of the mrc name that is changed. If you need to use a - in the beginning use = and the string in \' instead of space (e.g. --mrc_name_replacement_in=\'-example\').',type=str) 31 | parser.add_argument('--mrc_name_replacement_out',nargs='?', help='Replacement part for the changed part. If you need to use a - in the beginning use = and the string in \' instead of space (e.g. 
--mrc_name_replacement_out=\'-example\').',type=str) 32 | args = parser.parse_args() 33 | 34 | def load_star(filename): 35 | from collections import OrderedDict 36 | 37 | datasets = OrderedDict() 38 | current_data = None 39 | current_colnames = None 40 | 41 | in_loop = 0 # 0: outside 1: reading colnames 2: reading data 42 | 43 | for line in open(filename): 44 | line = line.strip() 45 | 46 | # remove comments 47 | comment_pos = line.find('#') 48 | if comment_pos > 0: 49 | line = line[:comment_pos] 50 | 51 | if line == "": 52 | continue 53 | 54 | if line.startswith("data_"): 55 | in_loop = 0 56 | 57 | data_name = line[5:] 58 | current_data = OrderedDict() 59 | datasets[data_name] = current_data 60 | 61 | elif line.startswith("loop_"): 62 | current_colnames = [] 63 | in_loop = 1 64 | 65 | elif line.startswith("_"): 66 | if in_loop == 2: 67 | in_loop = 0 68 | 69 | elems = line[1:].split() 70 | if in_loop == 1: 71 | current_colnames.append(elems[0]) 72 | current_data[elems[0]] = [] 73 | else: 74 | current_data[elems[0]] = elems[1] 75 | 76 | elif in_loop > 0: 77 | in_loop = 2 78 | elems = line.split() 79 | assert len(elems) == len(current_colnames) 80 | for idx, e in enumerate(elems): 81 | current_data[current_colnames[idx]].append(e) 82 | 83 | return datasets 84 | 85 | def write_star(filename, datasets): 86 | f = open(filename, "w") 87 | 88 | for data_name, data in datasets.items(): 89 | f.write( "\ndata_" + data_name + "\n\n") 90 | 91 | col_names = list(data.keys()) 92 | need_loop = isinstance(data[col_names[0]], list) 93 | if need_loop: 94 | f.write("loop_\n") 95 | for idx, col_name in enumerate(col_names): 96 | f.write("_%s #%d\n" % (col_name, idx + 1)) 97 | 98 | nrow = len(data[col_names[0]]) 99 | for row in range(nrow): 100 | f.write("\t".join([data[x][row] for x in col_names])) 101 | f.write("\n") 102 | else: 103 | for col_name, value in data.items(): 104 | f.write("_%s\t%s\n" % (col_name, value)) 105 | 106 | f.write("\n") 107 | f.close() 108 | 109 | 110 | 
print("rescale_particles.py GPL 2019") 111 | 112 | input_name = args.i 113 | output_name = args.o 114 | pix_a = args.pix_nominal 115 | pix_o = args.pix_relative 116 | pix_n = args.pix_target 117 | starFile = load_star(input_name) 118 | 119 | mrc_name_path = args.mrc_name_path 120 | mrc_name_prefix = args.mrc_name_prefix 121 | mrc_name_suffix = args.mrc_name_suffix 122 | mrc_name_replacement_in = args.mrc_name_replacement_in 123 | mrc_name_replacement_out = args.mrc_name_replacement_out 124 | 125 | 126 | rlnMicrographName = starFile['']['rlnMicrographName'] 127 | corrected_rlnMicrographName = [] 128 | for name_with_path in rlnMicrographName: 129 | name = name_with_path 130 | while (name.find("/") >= 0): 131 | name = name[name.find("/")+1:] 132 | name = name.replace(".mrc","") 133 | if (mrc_name_replacement_in): 134 | if (mrc_name_replacement_out): 135 | name = name.replace(mrc_name_replacement_in,mrc_name_replacement_out) 136 | else : 137 | name = name.replace(mrc_name_replacement_in,"") 138 | new_name = "" 139 | if (mrc_name_path): 140 | new_name = new_name + mrc_name_path 141 | if (mrc_name_prefix): 142 | new_name = new_name + mrc_name_prefix 143 | new_name = new_name + name 144 | if (mrc_name_suffix): 145 | new_name = new_name + mrc_name_suffix 146 | new_name = new_name + ".mrc" 147 | 148 | corrected_rlnMicrographName.append(new_name) 149 | rlnCoordinateX = starFile['']['rlnCoordinateX'] 150 | corrected_rlnCoordinateX = [] 151 | for x in rlnCoordinateX: 152 | corrected_rlnCoordinateX.append(str(round(float(x)*pix_o/pix_n,6))) 153 | rlnCoordinateY = starFile['']['rlnCoordinateY'] 154 | corrected_rlnCoordinateY = [] 155 | for x in rlnCoordinateY: 156 | corrected_rlnCoordinateY.append(str(round(float(x)*pix_o/pix_n,6))) 157 | rlnOriginX = starFile['']['rlnOriginX'] 158 | corrected_rlnOriginX = [] 159 | for x in rlnOriginX: 160 | corrected_rlnOriginX.append(str(round(float(x)*pix_o/pix_n,6))) 161 | rlnOriginY = starFile['']['rlnOriginY'] 162 | corrected_rlnOriginY = 
[] 163 | for x in rlnOriginY: 164 | corrected_rlnOriginY.append(str(round(float(x)*pix_o/pix_n,6))) 165 | rlnMagnification = starFile['']['rlnMagnification'] 166 | corrected_rlnMagnification = [] 167 | for x in rlnMagnification: 168 | corrected_rlnMagnification.append(str(int(round(float(x)*pix_a/pix_n,0)))) 169 | rlnDetectorPixelSize = starFile['']['rlnDetectorPixelSize'] 170 | 171 | 172 | outFile = OrderedDict() 173 | outFile['rlnMicrographName'] = corrected_rlnMicrographName 174 | outFile['rlnCoordinateX'] = corrected_rlnCoordinateX 175 | outFile['rlnCoordinateY'] = corrected_rlnCoordinateY 176 | outFile['rlnOriginX'] = corrected_rlnOriginX 177 | outFile['rlnOriginY'] = corrected_rlnOriginY 178 | outFile['rlnMagnification'] = corrected_rlnMagnification 179 | outFile['rlnDetectorPixelSize'] = rlnDetectorPixelSize 180 | outputFile = OrderedDict() 181 | outputFile[''] = outFile 182 | write_star(output_name,outputFile) 183 | -------------------------------------------------------------------------------- /scale_ctf.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | ######################################################################################### 4 | # 5 | # Description: Script to (approximately*) correct defocus values for different angpix 6 | # Max Wilkinson, MRC LMB 7 | # 8 | ######################################################################################## 9 | # *Approximately = +/- 40A or so ### 10 | ################################### 11 | 12 | proc_name=$(echo $0 | gawk '{n=split($1,scr,"/");print scr[n];}') 13 | 14 | if [ $# -eq 0 ] 15 | then 16 | echo "This program will scale defocus U, defocus V and magnification based on starting and new apix values." 
17 | echo "Usage: $proc_name " 18 | echo "User will then be prompted for apix values" 19 | exit 1 20 | fi 21 | 22 | 23 | 24 | echo -n "Starting apix: " 25 | read apix1 26 | echo -n "New apix: " 27 | read apix2 28 | #apix1=1.43 29 | #apix2=1.38 30 | 31 | rlnDefocusUIndex=$(gawk 'NF < 3 && /_rlnDefocusU/{print $2}' $1 | cut -c 2-) 32 | rlnDefocusVIndex=$(gawk 'NF < 3 && /_rlnDefocusV/{print $2}' $1 | cut -c 2-) 33 | rlnMagnificationIndex=$(gawk 'NF < 3 && /_rlnMagnification/{print $2}' $1 | cut -c 2-) 34 | rlnSphericalAberrationIndex=$(gawk 'NF < 3 && /_rlnSphericalAberration/{print $2}' $1 | cut -c 2-) 35 | rlnVoltageIndex=$(gawk 'NF < 3 && /_rlnVoltage/{print $2}' $1 | cut -c 2-) 36 | 37 | output_file=$(echo $1 | sed 's/.star/_newapix.star/') 38 | 39 | 40 | 41 | cs=$(head -200 $1 | gawk "/mrc/ {print \$$rlnSphericalAberrationIndex}" | sort | uniq | gawk '{printf "%.f",10000000 * $1}') 42 | #echo $cs 43 | #cs=27000000 44 | echo "Read spherical aberration as" $cs "A from star-file header" 45 | 46 | kv=$(head -200 $1 | gawk "/mrc/ {print \$$rlnVoltageIndex}" | sort | uniq | xargs printf %.f) 47 | #kv=300 48 | if [ $kv == "300" ]; then 49 | lambda=0.0197 50 | elif [ $kv == "200" ]; then 51 | lambda=0.0251 52 | else 53 | echo "Acceleration voltage not 200 or 300 keV, please modify script to allow use of correct electron wavelength" 54 | exit 1 55 | fi 56 | echo "Acceleration voltage read as" $kv", will use electron wavelength of" $lambda "A" 57 | 58 | 59 | #Fudge factor: average-ish spatial resolution (A) for applying constant correction. This value is a bit arbitrary, something like 5-7A works ok. The correction involved is only about 20A, i.e. 
about 0.1 percent of the defocus 60 | avgS=5.68 61 | 62 | varC=$(gawk "BEGIN {printf \"%.6f\",-0.5*${cs}*${lambda}**2}") 63 | alpha=$(gawk "BEGIN {printf \"%.6f\",${apix1}/${apix2}}") 64 | alpha2=$(gawk "BEGIN {printf \"%.6f\",${alpha}**2}") 65 | 66 | invalpha2=$(gawk "BEGIN {printf \"%.6f\",1/${alpha2}}") 67 | 68 | const=$(gawk "BEGIN {printf \"%.6f\",${varC}*${alpha}**2 - ${varC}/${alpha}**2}") 69 | correction=$(gawk "BEGIN {printf \"%.6f\",${const}/${avgS}**2}") 70 | 71 | echo "Defocus scaled by" $invalpha2 "plus a constant correction of" $correction 72 | 73 | 74 | gawk "BEGIN{OFS = \"\t\"} NF>3 { 75 | \$$rlnDefocusUIndex=sprintf(\"%.6f\",\$${rlnDefocusUIndex}/${alpha2} - ${correction}) 76 | \$$rlnDefocusVIndex=sprintf(\"%.6f\",\$${rlnDefocusVIndex}/${alpha2} - ${correction}) 77 | \$$rlnMagnificationIndex=sprintf(\"%.6f\",\$${rlnMagnificationIndex} * ${alpha}) 78 | }1" $1 > $output_file 79 | echo "Wrote out" $output_file 80 | -------------------------------------------------------------------------------- /star_apply_matrix.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import math 3 | import sys 4 | import argparse 5 | import os 6 | from collections import OrderedDict 7 | 8 | 9 | def load_star(filename): 10 | # thanks Takanori 11 | datasets = OrderedDict() 12 | current_data = None 13 | current_colnames = None 14 | 15 | in_loop=0 # where 0:outside 1: reading colnames 2: reading data 16 | for line in open(filename): 17 | line=line.strip() 18 | 19 | comment_pos = line.find('#') 20 | if comment_pos >= 0: 21 | line = line[:comment_pos] 22 | if line == "": 23 | continue 24 | if line.startswith("data_"): 25 | in_loop=0 26 | 27 | data_name = line[5:] 28 | print(data_name) 29 | current_data = OrderedDict() 30 | datasets[data_name] = current_data 31 | 32 | elif line.startswith("loop_"): 33 | current_colnames = [] 34 | in_loop=1 35 | 36 | elif line.startswith("_"): 37 | if in_loop ==2: 38 | in_loop = 0 39 | 
40 | elems = line[1:].split() 41 | if in_loop == 1: 42 | current_colnames.append(elems[0]) 43 | current_data[elems[0]] = [] 44 | else: 45 | current_data[elems[0]] = elems[1] 46 | 47 | elif in_loop > 0: 48 | in_loop = 2 49 | elems = line.split() 50 | #print(elems) 51 | assert len(elems) == len(current_colnames) 52 | for idx, e in enumerate(elems): 53 | current_data[current_colnames[idx]].append(e) 54 | 55 | return datasets 56 | 57 | 58 | def Euler_angles2matrix(rot,tilt,psi): #rot = alpha, tilt = beta, psi = gamma 59 | A=np.zeros((3,3)) 60 | rot=math.radians(rot) 61 | tilt=math.radians(tilt) 62 | psi=math.radians(psi) 63 | 64 | ca=np.cos(rot) 65 | cb=np.cos(tilt) 66 | cg=np.cos(psi) 67 | sa=np.sin(rot) 68 | sb=np.sin(tilt) 69 | sg=np.sin(psi) 70 | cc = cb*ca 71 | cs = cb*sa 72 | sc = sb*ca 73 | ss = sb*sa 74 | 75 | A[0,0] = cg * cc - sg * sa 76 | A[0,1] = cg * cs + sg * ca 77 | A[0,2] = -cg * sb 78 | A[1,0] = -sg * cc - cg * sa 79 | A[1,1] = -sg * cs + cg * ca 80 | A[1,2] = sg * sb 81 | A[2,0] = sc 82 | A[2,1] = ss 83 | A[2,2] = cb 84 | 85 | return(A) 86 | 87 | 88 | def Euler_matrix2angles(A): 89 | if A.shape != (3,3): 90 | print("Matrix should be 3x3") 91 | FLT_EPSILON=sys.float_info.epsilon 92 | if np.abs(A[1,1]) > FLT_EPSILON: 93 | abs_sb = np.sqrt((-A[2,2]*A[1,2]*A[2,1]-A[0,2]*A[2,0])/A[1,1]) 94 | elif np.abs(A[0,1]) > FLT_EPSILON: 95 | abs_sb = np.sqrt((-A[2,1]*A[2,2]*A[0,2]+A[2,0]*A[1,2])/A[0,1]) 96 | elif np.abs(A[0,0]) > FLT_EPSILON: 97 | abs_sb = np.sqrt((-A[2,0]*A[2,2]*A[0,2]-A[2,1]*A[1,2])/A[0,0]) 98 | else: 99 | print("NOPE") 100 | if abs_sb > FLT_EPSILON: 101 | beta = np.arctan2(abs_sb, A[2,2]) 102 | alpha = np.arctan2(A[2,1]/abs_sb, A[2,0] / abs_sb) 103 | gamma = np.arctan2(A[1,2] / abs_sb, -A[0,2] / abs_sb) 104 | else: 105 | alpha=0 106 | beta=0 107 | gamma = np.arctan2(A[1,0],A[0,0]) 108 | gamma = math.degrees(gamma) 109 | beta = math.degrees(beta) 110 | alpha = math.degrees(alpha) 111 | return(alpha,beta,gamma) 112 | 113 | 114 | def 
write_star(mystar,filename): 115 | f = open(filename,"w") 116 | datasets=[data for data in mystar.keys()] 117 | for data in datasets: 118 | f.write("data_{}\n".format(data)) 119 | f.write("\n") 120 | f.write("loop_\n") 121 | fields=[field for field in mystar[data].keys()] 122 | i=1 123 | for field in fields: 124 | f.write("_{} #{}\n".format(field,i)) 125 | i+=1 126 | totalN=len(mystar[data][fields[1]]) 127 | for n in range(totalN): 128 | for field in fields: 129 | f.write("{}\t".format(mystar[data][field][n])) 130 | f.write("\n") 131 | f.write("\n") 132 | f.close() 133 | 134 | 135 | 136 | def star_apply_matrix(starpath,matrix,classToMove,outstarpath,boxsize,angpix): 137 | star=load_star(starpath) 138 | print(classToMove) 139 | shift=matrix[:,3] 140 | rotmat=matrix[:,:3] 141 | #rotmat=Euler_angles2matrix(-90,0,0) 142 | #shift=np.array([336,0,0]) 143 | # np.float and np.int were removed in NumPy 1.24; the builtin float and int are equivalent here 144 | Xs=np.array(star['particles']['rlnOriginXAngst']).astype(float) 145 | Ys=np.array(star['particles']['rlnOriginYAngst']).astype(float) 146 | Psis=np.array(star['particles']['rlnAnglePsi']).astype(float) 147 | Rots=np.array(star['particles']['rlnAngleRot']).astype(float) 148 | Tilts=np.array(star['particles']['rlnAngleTilt']).astype(float) 149 | Classes=np.array(star['particles']['rlnClassNumber']).astype(int) 150 | 151 | newXs=[] 152 | newYs=[] 153 | 154 | newPsis=[] 155 | newRots=[] 156 | newTilts=[] 157 | 158 | for n in range(len(Xs)): 159 | if n % 50000 ==0: 160 | print(n) 161 | if Classes[n]==classToMove: 162 | A3D=Euler_angles2matrix(Rots[n],Tilts[n],Psis[n]) 163 | A3D=A3D.dot(np.linalg.inv(rotmat)) 164 | mapcenter=np.array([boxsize,boxsize,boxsize])*angpix/2 165 | #mapcenter=np.array([200,200,200])*1.12 166 | trueshift=shift-(mapcenter-rotmat.dot(mapcenter)) 167 | new_center=A3D.dot(trueshift) 168 | newX=Xs[n]+new_center[0] 169 | newY=Ys[n]+new_center[1] 170 | newrot,newtilt,newpsi=Euler_matrix2angles(A3D) 171 | newXs.append(str(newX)) 172 | newYs.append(str(newY)) 173 | 
newPsis.append(str(newpsi)) 174 | newTilts.append(str(newtilt)) 175 | newRots.append(str(newrot)) 176 | 177 | else: 178 | newXs.append(str(Xs[n])) 179 | newYs.append(str(Ys[n])) 180 | newPsis.append(str(Psis[n])) 181 | newTilts.append(str(Tilts[n])) 182 | newRots.append(str(Rots[n])) 183 | 184 | star['particles']['rlnOriginXAngst']=newXs 185 | star['particles']['rlnOriginYAngst']=newYs 186 | star['particles']['rlnAnglePsi']=newPsis 187 | star['particles']['rlnAngleTilt']=newTilts 188 | star['particles']['rlnAngleRot']=newRots 189 | 190 | write_star(star,outstarpath) 191 | 192 | 193 | def main(): 194 | usage = """Example: python star_apply_matrix.py --i Class3D/job049/run_it025_data.star --matrix chimeramatrix.txt --classN 1 --boxsize 300 --angpix 1.12 --o class1shifted.star 195 | 196 | Applies a rotation/translation transformation matrix to the Euler angles and offsets of particles of a particular class in a STAR file. 197 | Can be used to align particles based on class orientations. 198 | """ 199 | 200 | #parser = OptionParser(usage=usage) 201 | parser = argparse.ArgumentParser(description='') 202 | 203 | parser.add_argument("--classN", help="Number of the class to move") 204 | parser.add_argument("--matrix", help="Text file with 3x4 transformation matrix from Chimera") 205 | parser.add_argument("--boxsize", help="The box size of your reference volume") 206 | parser.add_argument("--angpix", help="The pixel size of your reference volume") 207 | parser.add_argument("--i", help="Input STAR file") 208 | parser.add_argument("--o", help="Output STAR file") 209 | if len(sys.argv)==1: 210 | print(usage) 211 | parser.print_help(sys.stderr) 212 | 213 | sys.exit(1) 214 | args = parser.parse_args() 215 | 216 | inputstar=str(args.i) 217 | outputstar=str(args.o) 218 | matrixpath=str(args.matrix) 219 | classN=int(args.classN) 220 | angpix=float(args.angpix) 221 | boxsize=int(args.boxsize) 222 | 223 | matrix=np.loadtxt(matrixpath) 224 | 
star_apply_matrix(inputstar,matrix,classN,outputstar,boxsize,angpix) 225 | 226 | 227 | if __name__ == "__main__": 228 | main() 229 | --------------------------------------------------------------------------------
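
As a rough illustration of what `scale_ctf.sh` computes, its arithmetic can be sketched in Python. This is a minimal sketch and not part of the repository: the names `rescale_defocus`, `rescale_magnification` and `WAVELENGTH_A` are hypothetical, while the electron wavelengths and the 5.68 Å `avgS` fudge factor are copied from the script. Unlike the script, no intermediate value is rounded to six decimals, so results may differ slightly in the last digits.

```python
# Electron wavelengths (in Angstrom) as hard-coded in scale_ctf.sh
WAVELENGTH_A = {300: 0.0197, 200: 0.0251}


def rescale_defocus(defocus, apix_old, apix_new, cs_mm, kv, avg_res=5.68):
    """Approximate the defocus (Angstrom) at a new pixel size.

    Follows the steps of scale_ctf.sh: scale the defocus by 1/alpha^2,
    then subtract a constant correction evaluated at a single
    "average-ish" resolution avg_res (Angstrom).
    """
    wavelength = WAVELENGTH_A[kv]            # KeyError for other voltages
    cs = cs_mm * 1e7                         # spherical aberration, mm -> Angstrom
    var_c = -0.5 * cs * wavelength ** 2      # "varC" in the script
    alpha = apix_old / apix_new              # pixel size ratio
    const = var_c * alpha ** 2 - var_c / alpha ** 2
    correction = const / avg_res ** 2
    return defocus / alpha ** 2 - correction


def rescale_magnification(magnification, apix_old, apix_new):
    """The script multiplies rlnMagnification by alpha = apix_old / apix_new."""
    return magnification * apix_old / apix_new


if __name__ == "__main__":
    # Values taken from the commented-out defaults in scale_ctf.sh
    # (apix 1.43 -> 1.38, Cs 2.7 mm, 300 kV); the 20000 A defocus is made up.
    print(rescale_defocus(20000.0, 1.43, 1.38, cs_mm=2.7, kv=300))
```

With those example values the constant correction has a magnitude of roughly 23 Å, in line with the "about 20A" noted in the script's comments. If the two pixel sizes are equal, the defocus is returned unchanged.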