├── .gitignore ├── CITATION.bib ├── LICENSE ├── README.md ├── cellscape ├── __init__.py ├── cartoon.py ├── cli.py ├── interface.py ├── parse_alignment.py ├── parse_uniprot_xml.py ├── scene.py ├── structure.py └── util.py ├── examples ├── cartoon.ipynb ├── ceacam5 │ ├── P06731.xml │ └── ceacam5.pdb └── ig │ ├── 1igt.pdb │ └── view ├── ig.png ├── pyproject.toml ├── setup.cfg └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | __pycache__ 2 | *.egg-info 3 | *.DS_Store 4 | */.ipynb_checkpoints 5 | -------------------------------------------------------------------------------- /CITATION.bib: -------------------------------------------------------------------------------- 1 | @article {Silvestre-Ryan2022.06.14.495869, 2 | author = {Silvestre-Ryan, Jordi and Fletcher, Daniel A. and Holmes, Ian}, 3 | title = {CellScape: Protein structure visualization with vector graphics cartoons}, 4 | elocation-id = {2022.06.14.495869}, 5 | year = {2022}, 6 | doi = {10.1101/2022.06.14.495869}, 7 | publisher = {Cold Spring Harbor Laboratory}, 8 | abstract = {Motivation: Illustrative renderings of proteins are useful aids for scientific communication and education. Nevertheless, few software packages exist to automate the generation of these visualizations. Results: We introduce CellScape, a tool designed to generate 2D molecular cartoons from atomic coordinates and combine them into larger cellular scenes. These illustrations can outline protein regions in different levels of detail. Unlike most molecular visualization tools which use raster image formats, these illustrations are represented as vector graphics, making them easily editable and composable with other graphics. Availability and Implementation: CellScape is implemented in Python 3 and freely available at https://github.com/jordisr/cellscape. 
It can be run as a command-line tool or interactively in a Jupyter notebook.}, 9 | URL = {https://www.biorxiv.org/content/early/2022/06/16/2022.06.14.495869}, 10 | eprint = {https://www.biorxiv.org/content/early/2022/06/16/2022.06.14.495869.full.pdf}, 11 | journal = {bioRxiv} 12 | } 13 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2022 Jordi Silvestre-Ryan 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE.
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # CellScape: Protein structure visualization with vector graphics cartoons 2 | logo 3 | 4 | ## Installation 5 | To run CellScape you will need: 6 | * Python 3 7 | * [PyMOL](https://pymol.org/2/) or [Chimera](https://www.cgl.ucsf.edu/chimera/) (optional, needed to orient the protein if not using the Jupyter notebook interface) 8 | 9 | CellScape and its dependencies can be installed with: 10 | 11 | ``` 12 | git clone https://github.com/jordisr/cellscape 13 | cd cellscape 14 | pip install -e . 15 | ``` 16 | 17 | ## Making a cartoon from a PDB structure 18 | 19 | ### Jupyter notebook interface 20 | The most interactive way of building cartoons is through the Python package interface. An example notebook is provided [here](examples/cartoon.ipynb). 21 | 22 | ### Command-line interface 23 | 24 | Cartoons can also be built in one go from the command line, as illustrated below. 25 | 26 | #### Generating molecular outlines 27 | The following examples should yield images similar to the top figure (from right to left): 28 | 29 | The simplest visualization is a space-filling outline of the entire structure. 30 | The `--view` option specifies the camera rotation matrix (see [below](#exporting-the-camera-view)). 31 | 32 | ``` 33 | cellscape cartoon --pdb examples/ig/1igt.pdb --view examples/ig/view --outline all --save outline_all.svg 34 | ``` 35 | 36 | The `--outline` option specifies which regions of the protein to outline (each residue, each chain, the entire molecule, etc.). 37 | In the following example we outline each chain separately. 38 | The `--depth flat` option ensures that where chains overlap, only the visible portion (i.e. the part closer to the camera) is incorporated into each outline.
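To make the `--depth flat` behaviour concrete, here is a minimal, hypothetical z-buffer sketch. It is not how CellScape builds its outlines (CellScape works with Shapely polygons); it only illustrates the rule that, where chains overlap, each point of the image belongs to whichever chain is closest to the camera. The `(x, y, z, chain)` atom tuples, the grid cell size, and the convention that larger `z` means closer to the camera are all assumptions made for this example.

```python
from collections import Counter

def flat_visibility(atoms, cell=1.0):
    """Map each 2D grid cell to the chain of the atom nearest the camera.

    atoms: iterable of (x, y, z, chain) tuples (hypothetical input format).
    Assumes larger z means closer to the camera.
    """
    zbuffer = {}  # (i, j) grid cell -> (z, chain) of the frontmost atom so far
    for x, y, z, chain in atoms:
        key = (int(x // cell), int(y // cell))
        if key not in zbuffer or z > zbuffer[key][0]:
            zbuffer[key] = (z, chain)
    return {key: chain for key, (_, chain) in zbuffer.items()}

def visible_cells(atoms, cell=1.0):
    """Count the grid cells each chain still owns after occlusion."""
    return Counter(flat_visibility(atoms, cell).values())

# Chain B lies in front of chain A where they overlap (x = 1 and x = 2).
atoms = [(0, 0, 0.0, "A"), (1, 0, 0.0, "A"), (2, 0, 0.0, "A"),
         (1, 0, 5.0, "B"), (2, 0, 5.0, "B"), (3, 0, 5.0, "B")]
print(visible_cells(atoms))  # Counter({'B': 3, 'A': 1})
```

Chain A keeps only the one cell that chain B does not cover: the same occlusion rule, applied on a grid instead of to polygon outlines.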
39 | 40 | ``` 41 | cellscape cartoon --pdb examples/ig/1igt.pdb --view examples/ig/view --outline chain --depth flat --save outline_chain.svg 42 | ``` 43 | 44 | The most realistic visualization outlines each residue separately. 45 | Shading by residue depth is used to simulate 3D lighting in a style inspired by [David Goodsell](https://pdb101.rcsb.org/motm/21). 46 | 47 | ``` 48 | cellscape cartoon --pdb examples/ig/1igt.pdb --view examples/ig/view --outline residue --color_by chain --depth_shading --depth_lines --save outline_residue.svg 49 | ``` 50 | 51 | A full description of all options is available by running `cellscape cartoon -h`. 52 | 53 | ### Exporting the camera view 54 | The camera orientation can be set interactively through the Jupyter notebook interface; however, to use the command-line interface you will need a separate file containing the rotation matrix. 55 | One option is to export it from another molecular visualization tool (currently PyMOL and Chimera formats are supported). 56 | 57 | #### PyMOL 58 | Open the protein structure in PyMOL and choose the desired rotation (zoom is irrelevant). Next, enter `get_view` in the PyMOL console. The output should look something like this: 59 | ``` 60 | ### cut below here and paste into script ### 61 | set_view (\ 62 | -0.273240060, -0.516133010, 0.811750829,\ 63 | 0.870557129, 0.226309016, 0.436930388,\ 64 | -0.409222305, 0.826064587, 0.387488008,\ 65 | 0.000000000, 0.000000000, -544.673034668,\ 66 | -0.071666718, -17.390396118, 8.293336868,\ 67 | 455.182373047, 634.163574219, -20.000000000 ) 68 | ### cut above here and paste into script ### 69 | ``` 70 | Copy the indicated region (between the ### lines) into a new text file, which can then be passed to CellScape with `--view`. 71 | 72 | #### Chimera 73 | Open the protein structure in Chimera and choose the desired rotation (zoom is irrelevant). 74 | Enter the command `matrixget` (if no output filename is given, Chimera will prompt for one).
75 | This writes the rotation matrix to a file that CellScape can read. 76 | It should look something like this: 77 | ``` 78 | Model 0.0 79 | -0.607365 0.792409 0.0565265 9.04218 80 | -0.309318 -0.301425 0.901923 -30.7393 81 | 0.731731 0.530312 0.428181 15.789 82 | ``` 83 | 84 | ## Composing cartoons into a cellular scene 85 | 86 | Re-running the above `cellscape cartoon` examples with the `--export` flag writes each cartoon's data to a Python pickle file (named after the `--save` path, with a `.pickle` extension), which can then be read by `cellscape scene`. 87 | 88 | In its simplest usage, `cellscape scene` takes a list of pickled cartoons and lays them out in a row, preserving the relative sizes of the proteins. 89 | The `--padding` option sets the horizontal gap between neighbouring proteins (in angstroms). 90 | 91 | ``` 92 | cellscape scene --files outline_residue.pickle outline_chain.pickle outline_all.pickle --padding 10 --save scene.png 93 | ``` 94 | 95 | A full description of all options is available by running `cellscape scene -h`.
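The row layout described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the code behind `cellscape scene`: it assumes each exported pickle is the dict written by `Cartoon.export` (which includes a `width` entry, assumed here to be in angstroms like `--padding`), and that proteins are placed left to right with the padding added between neighbours. Unpickling real exports requires the packages used to create them (the polygons are Shapely objects), so the example runs on plain widths.

```python
import pickle

def load_cartoons(paths):
    """Load the dicts written by Cartoon.export (cellscape cartoon --export)."""
    cartoons = []
    for path in paths:
        with open(path, "rb") as f:
            cartoons.append(pickle.load(f))
    return cartoons

def row_offsets(widths, padding=0.0):
    """Left x coordinate of each cartoon when laid out left to right."""
    offsets, x = [], 0.0
    for w in widths:
        offsets.append(x)
        x += w + padding  # next cartoon starts after this one plus the gap
    return offsets

def row_width(widths, padding=0.0):
    """Total row width: all cartoons plus one gap between each neighbouring pair."""
    return sum(widths) + padding * max(len(widths) - 1, 0)

widths = [100.0, 50.0, 30.0]
print(row_offsets(widths, padding=10))  # [0.0, 110.0, 170.0]
print(row_width(widths, padding=10))    # 200.0
```

With real exports, the widths could come from something like `[c["width"] for c in load_cartoons(["outline_residue.pickle", "outline_chain.pickle", "outline_all.pickle"])]`.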
96 | -------------------------------------------------------------------------------- /cellscape/__init__.py: -------------------------------------------------------------------------------- 1 | from .cartoon import Cartoon 2 | from .structure import Structure 3 | from .interface import plot_pairs 4 | -------------------------------------------------------------------------------- /cellscape/cartoon.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import matplotlib 3 | import matplotlib.pyplot as plt 4 | from matplotlib.path import Path 5 | from matplotlib.patches import PathPatch 6 | import matplotlib.colors as mcolors 7 | from matplotlib.colors import LinearSegmentedColormap 8 | import shapely.geometry as sg 9 | import shapely.ops as so 10 | import pickle 11 | import os 12 | import sys 13 | import colorsys 14 | from Bio.PDB import * 15 | 16 | import cellscape 17 | 18 | def scale_line_width(x, lw_min, lw_max): 19 | return lw_max*(1-x) + lw_min*x 20 | 21 | def shade_from_color(color, x, range): 22 | (r, g, b, a) = mcolors.to_rgba(color) 23 | h, l, s = colorsys.rgb_to_hls(r,g,b) 24 | l_dark = max(l-range/2, 0) 25 | l_light = min(l+range/2, 1) 26 | l_new = l_dark*(1-x) + l_light*x 27 | return colorsys.hls_to_rgb(h, l_new, s) 28 | 29 | def get_sequential_colors(colors='Set1', n=1): 30 | """ 31 | Sample n colors sequentially from a named matplotlib ColorMap. 
32 | """ 33 | # uses matplotlib.colors.Colormap.N to distinguish continuous/discrete 34 | cmap = plt.get_cmap(colors) # matplotlib.cm.get_cmap was removed in Matplotlib 3.9 35 | if cmap.N == 256: 36 | # continuous color map 37 | sequential_colors = [cmap(x) for x in np.linspace(0.0,1.0, n)] 38 | else: 39 | # discrete color map 40 | sequential_colors = [cmap(x) for x in range(n)] 41 | return sequential_colors 42 | 43 | def smooth_polygon(p, level=0): 44 | # somewhat arbitrary but a lot easier than interpolation 45 | if level == 0: 46 | return p.simplify(0.3).buffer(-2, join_style=1).buffer(3, join_style=1) 47 | elif level == 1: 48 | return p.simplify(1).buffer(3, join_style=1).buffer(-5, join_style=1).buffer(4, join_style=1) 49 | elif level == 2: 50 | return p.simplify(3).buffer(5, join_style=1).buffer(-9, join_style=1).buffer(5, join_style=1) 51 | elif level == 3: 52 | return p.simplify(0.1).buffer(2, join_style=1) 53 | else: 54 | return p 55 | 56 | def ring_coding(ob): 57 | # https://sgillies.net/2010/04/06/painting-punctured-polygons-with-matplotlib.html 58 | # The codes will be all "LINETO" commands, except for "MOVETO"s at the 59 | # beginning of each subpath 60 | # count vertices via .coords (NumPy array conversion of geometries was removed in Shapely 2.0) 61 | n = len(ob.coords) 62 | codes = np.ones(n, dtype=Path.code_type) * Path.LINETO 63 | codes[0] = Path.MOVETO 64 | return codes 65 | 66 | def placeholder_polygon(height, buffer_width=25, origin=[0,0]): 67 | return sg.LineString([(buffer_width+origin[0],0+origin[1]),(buffer_width+origin[0],height+origin[1])]).buffer(buffer_width) 68 | 69 | def composite_polygon(cartoon, height_before, height_after, buffer_width=25): 70 | # placeholder + structure cartoon + placeholder 71 | if height_before > 0: 72 | before_poly = placeholder_polygon(height_before, origin=cartoon.bottom_coord[:2]-[buffer_width,height_before], buffer_width=buffer_width) 73 | cartoon._styled_polygons.append({"polygon":before_poly, "facecolor":"#eeeeee", 
"shade":0.5, "edgecolor":'black', "linewidth":1, "zorder":-1}) 74 | 75 | if height_after > 0: 76 |
after_poly = placeholder_polygon(height_after, origin=cartoon.top_coord[:2]-[buffer_width, 0], buffer_width=buffer_width) 77 | cartoon._styled_polygons.append({"polygon":after_poly, "facecolor":"#eeeeee", "shade":0.5, "edgecolor":'black', "linewidth":1, "zorder":-1}) 78 | 79 | cartoon.image_height = cartoon.image_height + buffer_width + height_before + height_after 80 | cartoon.bottom_coord = cartoon.bottom_coord - np.array([0,height_before,0]) 81 | cartoon.top_coord = cartoon.top_coord + np.array([0,height_after,0]) 82 | 83 | def export_placeholder(height, name, fname, buffer_width=25): 84 | # placeholder by itself 85 | poly = placeholder_polygon(height, origin=[buffer_width, 0], buffer_width=buffer_width) 86 | styled_polygons = [{"polygon":poly, "facecolor":"#eeeeee", "shade":0.5, "edgecolor":'black', "linewidth":1, "zorder":-1}] 87 | 88 | data = {'polygons':styled_polygons, 'name':name, 'width':buffer_width*2, 'height':height+buffer_width, 'start':np.array([buffer_width,0]), 'end':np.array([height+2*buffer_width,0]), 'bottom':np.array([buffer_width,0]), 'top':np.array([height+2*buffer_width,0])} 89 | 90 | with open('{}.pickle'.format(fname),'wb') as f: 91 | pickle.dump(data, f) 92 | 93 | def transform_coord(xy, translate_post=np.array([0,0]), translate_pre=np.array([0,0]), scale=1.0, flip=False): 94 | # 2d coordinates 95 | xy_ = np.array(xy, dtype=float) # copy, so the caller's array is not modified in place 96 | if translate_pre is not None: 97 | # optionally shift coordinates before the optional 180-degree flip 98 | xy_ += translate_pre 99 | if flip: 100 | xy_ = np.dot(xy_, np.array([[-1,0],[0,-1]]).T) 101 | #xy_ = np.dot(xy_, np.array([[-1,0],[0,-1]])) 102 | #offset_x = np.min(xy_[:,0]) 103 | #offset_y = np.min(xy_[:,1]) 104 | #xy_ -= np.array([offset_x, offset_y]) 105 | return (xy_+translate_post)*scale 106 | 107 | def polygon_to_path(polygon, min_interior_length=40, translate_pre=np.array([0,0]), translate_post=np.array([0,0]), scale=1.0, flip=False): 
108 | # generate matplotlib Path object from Shapely polygon 109 | # filter out small interior holes
and apply a scaling factor if desired 110 | # 111 | # https://sgillies.net/2010/04/06/painting-punctured-polygons-with-matplotlib.html 112 | # Convert coordinates to path vertices. Objects produced by Shapely's 113 | # analytic methods have the proper coordinate order, no need to sort. 114 | interiors = list(filter(lambda x: x.length > min_interior_length, polygon.interiors)) 115 | vertices = np.concatenate( 116 | [np.asarray(polygon.exterior.coords)] 117 | + [np.asarray(r.coords) for r in interiors]) 118 | codes = np.concatenate( 119 | [ring_coding(polygon.exterior)] 120 | + [ring_coding(r) for r in interiors]) 121 | transformed_vertices = transform_coord(vertices, translate_pre=translate_pre, translate_post=translate_post, scale=scale, flip=flip) 122 | return Path(transformed_vertices, codes) 123 | 124 | def plot_polygon(poly, facecolor='orange', edgecolor='k', linewidth=0.7, axes=None, zorder_mod=0, translate_pre=np.array([0,0]), translate_post=np.array([0,0]), scale=1.0, flip=False, min_area=7, linestyle='solid'): 125 | """Draw a Shapely polygon using matplotlib Patches.""" 126 | if axes is None: 127 | axs = plt.gca() 128 | axs.set_aspect('equal') 129 | else: 130 | axs = axes 131 | if isinstance(poly, sg.polygon.Polygon): 132 | if poly.area > min_area: 133 | path = polygon_to_path(poly, translate_pre=translate_pre, translate_post=translate_post, scale=scale, flip=flip) 134 | patch = PathPatch(path, facecolor=facecolor, edgecolor=edgecolor, linewidth=linewidth, zorder=3+zorder_mod, linestyle=linestyle) 135 | axs.add_patch(patch) 136 | elif isinstance(poly, sg.multipolygon.MultiPolygon): 137 | for p in poly.geoms: 138 | plot_polygon(p, axes=axs, facecolor=facecolor, edgecolor=edgecolor, linewidth=linewidth, scale=scale, zorder_mod=zorder_mod, translate_pre=translate_pre, translate_post=translate_post, flip=flip, min_area=min_area, linestyle=linestyle) 139 | 140 | class Cartoon: 141 | """A class for molecular outlines generated by Structure class""" 142 | def 
__init__(self, name, polygons, residues, outline_by, back_outline,
group_outlines, num_groups, dimensions, groups): 143 | # TODO currently just copying over all variables needed, should condense a little 144 | self.name = name 145 | self._polygons = polygons 146 | self.residues_flat = residues 147 | self.outline_by = outline_by 148 | self.num_groups = num_groups 149 | self.groups = groups 150 | self._back_outline = back_outline 151 | self._group_outlines = group_outlines 152 | self.dimensions = dimensions 153 | 154 | def plot(self, colors=None, axes_labels=False, color_residues_by=None, edge_color="black", line_width=0.7, 155 | depth_shading=False, depth_lines=False, shading_range=0.4, smoothing=False, do_show=True, axes=None, save=None, dpi=300, placeholder=None): 156 | """Plot styled protein cartoon 157 | 158 | Color schemes for plotting can be specified in multiple ways 159 | - named matplotlib-compatible color e.g. "red" (string) 160 | - hexadecimal color e.g. "#F8F8FF" (string) 161 | - list/tuple of colors e.g. ["red", "#F8F8FF"] (list/tuple) 162 | - dict of names to colors e.g. {"domain A": "red", "domain B":"blue"} (dict) 163 | - named discrete or continuous color scheme e.g. "Set1" (string) 164 | 165 | By default, plot() creates a new matplotlib Axes instance, though one can be passed explicitly. 166 | This mirrors Biopython's phylogeny drawing https://biopython.org/DIST/docs/api/Bio.Phylo._utils-module.html. 167 | 168 | Args: 169 | colors (optional): Explicitly pass color scheme (see description). Defaults to None. 170 | axes_labels (bool, optional): Include axes labels on plot. Defaults to False. 171 | color_residues_by (str, optional): If outlining all residues, color based on attribute (e.g. "chain"). Defaults to None. 172 | edge_color (str, optional): Color for outline edges. Defaults to "black". 173 | line_width (float, optional): Width of outlines. Defaults to 0.7. 174 | depth_shading (bool, optional): Use lighter shades for outlines closer to the front. Defaults to False. 
175 | depth_lines (bool, optional): Use lighter lines for outlines closer to the front. Defaults to False. 176 | shading_range (float, optional): Dynamic range for depth_shading effect. Defaults to 0.4. 177 | smoothing (bool, optional): Apply smoothing to polygons. Defaults to False. 178 | do_show (bool, optional): Whether to show figure (otherwise just returns Axes object). Defaults to True. 179 | axes (Axes, optional): Explicitly pass matplotlib Axes object. Defaults to None. 180 | save (str, optional): Path to save cartoon image. Defaults to None. 181 | dpi (int, optional): DPI of rasterized images. Defaults to 300. 182 | placeholder (float, optional): Specify expected protein height (in angstroms). Will add a placeholder shape to add up to total height. Defaults to None. 183 | 184 | Returns: 185 | Axes: Returns matplotlib Axes if do_show=False, otherwise return None 186 | """ 187 | self._styled_polygons = [] 188 | 189 | if axes is None: 190 | # create a new matplotlib figure if none provided 191 | fig, axs = plt.subplots() 192 | else: 193 | assert(isinstance(axes, matplotlib.axes.Axes)) 194 | axs = axes 195 | 196 | if axes_labels: 197 | axs.axis('on') 198 | axs.set_axis_on() 199 | axs.xaxis.grid(False) 200 | axs.yaxis.grid(True) 201 | axs.axes.xaxis.set_ticklabels([]) 202 | else: 203 | axs.axis('off') 204 | axs.set_axis_off() 205 | #plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, hspace = 0, wspace = 0) 206 | #axs.xaxis.set_major_locator(plt.NullLocator()) 207 | #axs.yaxis.set_major_locator(plt.NullLocator()) 208 | 209 | # color schemes 210 | default_color = 'tab:blue' 211 | default_cmap = 'Set1' 212 | named_colors = [*mcolors.BASE_COLORS.keys(), *mcolors.TABLEAU_COLORS.keys(), *mcolors.CSS4_COLORS.keys(), *mcolors.XKCD_COLORS.keys()] 213 | 214 | # if outlining residues don't know number of color groups until plot is called 215 | if self.outline_by == "residue": 216 | if color_residues_by is None: 217 | num_colors_needed = 1 218 | 
residue_color_groups = {"all":self.residues_flat} 219 | else: 220 | residue_color_groups = cellscape.util.group_by(self.residues_flat, lambda x: x.get(color_residues_by)) 221 | num_colors_needed = len(residue_color_groups) 222 | self.num_groups = num_colors_needed 223 | 224 | # parse options and get list of base colors needed for plotting 225 | if colors is None: 226 | # choose default sequential color scheme based on number of colors needed 227 | if self.num_groups == 1: 228 | sequential_colors = [default_color] 229 | elif self.num_groups <= 9: 230 | sequential_colors = get_sequential_colors(colors="Set1", n=self.num_groups) 231 | elif self.num_groups <= 10: 232 | sequential_colors = get_sequential_colors(colors="tab10", n=self.num_groups) 233 | else: 234 | sequential_colors = get_sequential_colors(colors="tab20", n=self.num_groups) 235 | else: 236 | if isinstance(colors, dict): 237 | sequential_colors = [] 238 | else: 239 | if isinstance(colors, str): 240 | if self.num_groups == 1: 241 | sequential_colors = [colors] 242 | else: 243 | sequential_colors = get_sequential_colors(colors=colors, n=self.num_groups) 244 | elif isinstance(colors, (list, tuple)): 245 | if self.num_groups == 1: 246 | if (len(colors) == 4) or (len(colors) == 3): 247 | # assume single RGBA or RGB color 248 | sequential_colors = [colors] 249 | else: 250 | sequential_colors = [colors[0]] 251 | elif self.num_groups == len(colors): 252 | sequential_colors = colors 253 | else: 254 | sys.exit("Insufficient colors provided") 255 | assert(len(sequential_colors) == self.num_groups) 256 | 257 | # color scheme represented as dict that maps group names to colors 258 | if self.outline_by == "residue": 259 | if len(sequential_colors) > 0: 260 | color_map = {k:sequential_colors[i] for i,k in enumerate(residue_color_groups.keys())} 261 | else: 262 | color_map = colors 263 | elif self.outline_by == "all": 264 | color_map = {None:sequential_colors[0]} 265 | else: 266 | if len(sequential_colors) > 0: 267 | 
color_map = {k:sequential_colors[i] for i,k in enumerate(self.groups)} 268 | else: 269 | color_map = colors 270 | assert(isinstance(color_map, dict)) 271 | 272 | if self._back_outline is not None: 273 | if smoothing: 274 | smoothed_poly = smooth_polygon(self._back_outline, level=3) 275 | plot_polygon(smoothed_poly, facecolor="None", scale=1.0, axes=axs, edgecolor=edge_color, linewidth=line_width, zorder_mod=-1) 276 | self._styled_polygons.append({"polygon":smoothed_poly, "facecolor":"None", "edgecolor":edge_color, "linewidth":line_width}) 277 | else: 278 | plot_polygon(self._back_outline, facecolor="None", scale=1.0, axes=axs, edgecolor=edge_color, linewidth=line_width, zorder_mod=-1) 279 | self._styled_polygons.append({"polygon":self._back_outline, "facecolor":"None", "edgecolor":edge_color, "linewidth":line_width}) 280 | 281 | if len(self._group_outlines) > 0: 282 | for p in self._group_outlines: 283 | plot_polygon(p, facecolor="None", scale=1.0, axes=axs, edgecolor=edge_color, linewidth=line_width, zorder_mod=2) 284 | self._styled_polygons.append({"polygon":p, "facecolor":"None", "edgecolor":edge_color, "linewidth":line_width, "zorder":2}) 285 | 286 | # TODO optionally show placeholder for unstructured regions 287 | if placeholder is not None: 288 | placeholder_poly = placeholder_polygon(placeholder-self.image_height, origin=[self.image_width/2-25, self.image_height+25]) 289 | self._styled_polygons.append({"polygon":placeholder_poly, "facecolor":"None", "shade":0.5, "edgecolor":'black', "linewidth":1}) 290 | plot_polygon(placeholder_poly, facecolor="#eeeeee", scale=1.0, axes=axs, edgecolor='black', linewidth=1, zorder_mod=-1) 291 | self.image_height = 25 + placeholder 292 | 293 | # main plotting loop 294 | for i, p in enumerate(self._polygons): 295 | if smoothing: 296 | poly_to_draw = smooth_polygon(p[1], level=3) 297 | else: 298 | poly_to_draw = p[1] 299 | 300 | # look up color for polygon 301 | if self.outline_by == "residue": 302 | key_for_color = 
p[0].get(color_residues_by) 303 | else: 304 | key_for_color = p[0].get(self.outline_by) 305 | fc = color_map.get(key_for_color, sequential_colors[0] if len(sequential_colors) > 0 else default_color) 306 | base_fc = fc # store original color as well as shading 307 | 308 | shade_value = None 309 | if depth_shading: 310 | #fc = shade_from_color(fc, i/len(self._polygons), range=shading_range) 311 | shade_value = p[0].get("depth", 0.5) 312 | fc = shade_from_color(fc, shade_value, range=shading_range) 313 | if depth_lines: 314 | shade_value = p[0].get("depth", 0.5) 315 | lw = scale_line_width(shade_value, 0, 0.5) 316 | else: 317 | lw = line_width 318 | plot_polygon(poly_to_draw, facecolor=fc, axes=axs, edgecolor=edge_color, linewidth=lw) 319 | self._styled_polygons.append({"polygon":poly_to_draw, "facecolor":fc, "edgecolor":edge_color, "linewidth":lw, "shade":shade_value, "base_fc":base_fc}) 320 | 321 | axs.set_aspect('equal') 322 | axs.margins(0,0) 323 | self._axes = axs 324 | 325 | if save is not None: 326 | file_ext = os.path.splitext(save)[1].lower() 327 | assert file_ext in ['.png','.pdf','.svg','.ps'], "Image file extension not supported" 328 | #plt.gcf().savefig(save, dpi=dpi, transparent=True, pad_inches=0, bbox_inches='tight') 329 | axs.figure.savefig(save, dpi=dpi, transparent=True, pad_inches=0, bbox_inches='tight') # use the Axes' figure, which also works when axes were passed in 330 | 331 | if do_show: 332 | plt.show() 333 | else: 334 | return axs 335 | 336 | def export(self, fname): 337 | """Export a pickle object containing styled polygons that can be combined using ``cellscape scene``""" 338 | assert len(self._styled_polygons) > 0, "No styled polygons to export; call plot() first" 339 | 340 | data = {'polygons':self._styled_polygons, 'name':self.name} 341 | for k in ['width', 'height', 'start', 'end', 'top', 'bottom']: 342 | data[k] = self.dimensions[k] 343 | 344 | with open('{}.pickle'.format(fname),'wb') as f: 345 | pickle.dump(data, f) 346 | 347 
| print("Exported polygon data to {}.pickle".format(fname), file=sys.stderr) 348 | 349 | def make_cartoon(args): 350 | """Build a cartoon in one go.
Called when running ``cellscape cartoon``.""" 351 | 352 | # accept list of chains for backwards-compatibility 353 | # convert to string e.g. ABCD for current interface 354 | # can be an issue if chains have more than one letter 355 | if len(args.chain) == 1: 356 | chain = args.chain[0] 357 | else: 358 | chain = ''.join(args.chain) 359 | 360 | molecule = cellscape.Structure(args.pdb, chain=chain, model=args.model, uniprot=args.uniprot, view=False) 361 | 362 | # open first line to identify view file 363 | if args.view is not None: 364 | with open(args.view) as view_f: 365 | first_line = view_f.readline() 366 | if first_line[:8] == 'set_view': 367 | molecule.load_pymol_view(args.view) 368 | elif first_line[:5] == 'Model': 369 | molecule.load_chimera_view(args.view) 370 | else: 371 | molecule.load_view_matrix(args.view) 372 | else: 373 | # if no view matrix provided just use default PDB orientation for now 374 | molecule.view_matrix = np.identity(3) 375 | 376 | cartoon = molecule.outline( 377 | args.outline_by, 378 | depth=args.depth, 379 | radius=args.radius, 380 | only_annotated=args.only_annotated, 381 | only_ca=args.only_ca, 382 | depth_contour_interval=args.depth_contour_interval, 383 | back_outline=args.back_outline 384 | ) 385 | 386 | if args.outline_by == "residue" and args.color_by != "same": 387 | color_residues_by = args.color_by 388 | else: 389 | color_residues_by = None 390 | 391 | if len(args.colors) > 0: 392 | colors = args.colors 393 | else: 394 | colors = None 395 | 396 | cartoon.plot( 397 | do_show=False, 398 | axes_labels=args.axes, 399 | colors=colors, 400 | color_residues_by=color_residues_by, 401 | dpi=args.dpi, 402 | save=args.save, 403 | depth_shading=args.depth_shading, 404 | depth_lines=args.depth_lines, 405 | edge_color=args.edge_color, 406 | line_width=args.line_width 407 | ) 408 | 409 | if args.export: 410 | cartoon.export(os.path.splitext(args.save)[0]) 411 | -------------------------------------------------------------------------------- 
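`make_cartoon` picks a parser from the first line of the `--view` file: `set_view` means PyMOL output, `Model` means Chimera output, and anything else is treated as a plain matrix. The `Structure` methods that do the actual parsing (`load_pymol_view`, `load_chimera_view`) are not shown here, so the following is only a hypothetical sketch of reading the two README formats. It assumes the convention described on the PyMOL wiki, where the first 9 of the 18 `get_view` values form a column-major 3x3 rotation matrix, and it reads single-model Chimera `matrixget` output as one 3x4 matrix `[R | t]`. It does not claim that CellScape applies the same conventions.

```python
import re

# Matches decimal numbers such as -0.273240060, 15.789 or 1e-3.
NUMBER = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")

def detect_view_format(text):
    """Same first-line test as make_cartoon: PyMOL, Chimera, or plain matrix."""
    first_line = (text.splitlines() or [""])[0]
    if first_line.startswith("set_view"):
        return "pymol"
    if first_line.startswith("Model"):
        return "chimera"
    return "matrix"

def parse_pymol_view(text):
    """Return the 3x3 rotation (list of rows) from PyMOL get_view output.

    The first 9 of the 18 values are read column-major:
    value k goes to row k % 3, column k // 3.
    """
    values = [float(v) for v in NUMBER.findall(text.split("set_view", 1)[1])]
    if len(values) != 18:
        raise ValueError(f"expected 18 numbers after set_view, got {len(values)}")
    return [[values[col * 3 + row] for col in range(3)] for row in range(3)]

def parse_chimera_view(text):
    """Return (rotation rows, translation) from single-model matrixget output."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    rows = [[float(v) for v in NUMBER.findall(ln)] for ln in lines[1:4]]
    if len(rows) != 3 or any(len(r) != 4 for r in rows):
        raise ValueError("expected three rows of four numbers after the Model line")
    return [r[:3] for r in rows], [r[3] for r in rows]

# Example files taken from the README.
PYMOL_VIEW = r"""set_view (\
    -0.273240060,   -0.516133010,    0.811750829,\
     0.870557129,    0.226309016,    0.436930388,\
    -0.409222305,    0.826064587,    0.387488008,\
     0.000000000,    0.000000000, -544.673034668,\
    -0.071666718,  -17.390396118,    8.293336868,\
   455.182373047,  634.163574219,  -20.000000000 )"""

CHIMERA_VIEW = """Model 0.0
-0.607365 0.792409 0.0565265 9.04218
-0.309318 -0.301425 0.901923 -30.7393
0.731731 0.530312 0.428181 15.789"""

print(detect_view_format(PYMOL_VIEW), parse_pymol_view(PYMOL_VIEW)[0])
print(detect_view_format(CHIMERA_VIEW), parse_chimera_view(CHIMERA_VIEW)[1])
```

Only the rotation part matters for a 2D cartoon, which is consistent with the README's note that zoom is irrelevant; the remaining PyMOL values and the Chimera translation column are parsed but not used.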
/cellscape/cli.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | from cellscape.cartoon import make_cartoon 3 | from cellscape.scene import make_scene 4 | 5 | def main(): 6 | # set up argument parser 7 | parser = argparse.ArgumentParser(description='CellScape: Protein structure visualization with vector graphics cartoons') 8 | subparsers = parser.add_subparsers(dest="command") 9 | subparsers.required=True 10 | 11 | # cartoon 12 | parser_cartoon = subparsers.add_parser('cartoon', help="Make a cartoon from a protein structure", formatter_class=argparse.ArgumentDefaultsHelpFormatter, description="Make a cartoon from a protein structure") 13 | parser_cartoon.set_defaults(func=make_cartoon) 14 | # input/output options 15 | parser_cartoon_io = parser_cartoon.add_argument_group('input/output options') 16 | parser_cartoon_io.add_argument('--pdb', help='Protein coordinates file (must be .pdb/.ent/.cif/.mcif)', required=True) 17 | parser_cartoon_io.add_argument('--model', type=int, default=0, help='Model number in PDB to load') 18 | parser_cartoon_io.add_argument('--chain', default=['all'], help='Chain(s) in structure to outline', nargs='+') 19 | parser_cartoon_io.add_argument('--view', help='Camera rotation matrix (saved from cellscape, PyMOL get_view, or Chimera matrixget)') 20 | parser_cartoon_io.add_argument('--uniprot', help='UniProt XML file to parse for sequence/domain/topology information') 21 | parser_cartoon_io.add_argument('--save', default='out.svg', help='Image output file (valid formats are png/pdf/svg/ps)') 22 | parser_cartoon_io.add_argument('--export', default=False, action="store_true", help='Export Python object with structural information') 23 | # outline building options 24 | parser_cartoon_outline = parser_cartoon.add_argument_group('outline-building options') 25 | parser_cartoon_outline.add_argument('--only_annotated', action='store_true', default=False, help='Ignore regions without UniProt 
annotations') 26 | parser_cartoon_outline.add_argument('--only_ca', action='store_true', default=False, help='Only use alpha carbons for outline') 27 | parser_cartoon_outline.add_argument('--outline_by', '--outline', default='all', choices=['all', 'chain', 'domain', 'topology', 'residue'], help='Which protein regions to outline') 28 | parser_cartoon_outline.add_argument('--depth', default=None, choices=['flat', 'contours', None], help='Represent depth with flat occluded outlines or contour slices') 29 | parser_cartoon_outline.add_argument('--depth_contour_interval', type=float, default=3, help='Width of depth contour bins in angstroms (if --depth contours)') 30 | parser_cartoon_outline.add_argument('--radius', default=1.5, help='Atomic radius, in angstroms', type=float) 31 | parser_cartoon_outline.add_argument('--back_outline', action='store_true', help='Outline entire molecule separately from group outlines') 32 | 33 | # visual style options 34 | parser_cartoon_style = parser_cartoon.add_argument_group('styling options') 35 | parser_cartoon_style.add_argument('--axes', action='store_true', default=False, help='Draw x and y axes around molecule') 36 | parser_cartoon_style.add_argument('--colors', default=[], nargs='+', help='Specify color scheme for protein (list of colors or matplotlib named color map)') 37 | parser_cartoon_style.add_argument('--edge_color', default='black', help='Edge color') 38 | parser_cartoon_style.add_argument('--line_width', default=0.7, type=float, help='Line width') 39 | parser_cartoon_style.add_argument('--color_by', default='same', choices=['same', 'chain', 'domain', 'topology'], help='Color residues by attribute (if --outline_by residue is selected)') 40 | parser_cartoon_style.add_argument('--depth_shading', action='store_true', default=False, help='Shade regions darker in the back to simulate depth') 41 | parser_cartoon_style.add_argument('--depth_lines', action='store_true', default=False, help='Use thicker lines in the back to simulate depth')
42 | parser_cartoon_style.add_argument('--dpi', type=int, default=300, help='DPI to use if exporting to a raster format like PNG') 43 | 44 | # scene 45 | parser_scene = subparsers.add_parser('scene', help="Compose multiple cartoons together", description="Compose multiple cartoons together", formatter_class=argparse.ArgumentDefaultsHelpFormatter) 46 | parser_scene.set_defaults(func=make_scene) 47 | # input/output options 48 | parser_scene_io = parser_scene.add_argument_group('input/output options') 49 | parser_scene_io.add_argument('--files', nargs='+', help='Pickled objects to load') 50 | parser_scene_io.add_argument('--save', default='out.svg', help='Image output path (valid formats are png/pdf/svg/ps)') 51 | # visual style options 52 | parser_scene_style = parser_scene.add_argument_group('styling options') 53 | parser_scene_style.add_argument('--offsets', nargs='+', default=[], help='Vertical offsets for each molecule specified manually') 54 | parser_scene_style.add_argument('--padding', type=int, default=0, help='Horizontal padding to add between each molecule (in angstroms)') 55 | parser_scene_style.add_argument('--axes', action='store_true', default=False, help='Draw x and y axes') 56 | parser_scene_style.add_argument('--membrane', default=None, choices=[None, 'arc', 'flat', 'wave'], help='Draw membrane on X axis') 57 | parser_scene_style.add_argument('--membrane_thickness', default=40, type=float, help='Thickness of the membrane (in angstroms)') 58 | parser_scene_style.add_argument('--membrane_lipids', action='store_true', help='Draw lipid head groups') 59 | parser_scene_style.add_argument('--no_membrane_offset', action='store_true', help=argparse.SUPPRESS) # don't adjust y-axis to position bottom of structure in membrane 60 | parser_scene_style.add_argument('--order_by', default='input', choices=['input', 'random', 'height','top', 'membrane'], help='How to order proteins in scene') 61 | parser_scene_style.add_argument('--recolor', action='store_true', 
default=False, help='Recolor proteins in scene') 62 | parser_scene_style.add_argument('--recolor_cmap', default=['hsv'], nargs='+', help='Named cmap or color scheme for re-coloring') 63 | parser_scene_style.add_argument('--dpi', type=int, default=300, help='DPI to use if exporting to a raster format like PNG') 64 | parser_scene_style.add_argument('--use_placeholders', action='store_true', help=argparse.SUPPRESS) 65 | parser_scene_style.add_argument('--labels', action='store_true', default=False, help=argparse.SUPPRESS) # still testing 66 | parser_scene_style.add_argument('--label_size', type=float, default=0.5, help=argparse.SUPPRESS) # fraction of the screen to use for labels 67 | parser_scene_style.add_argument('--label_orientation', choices=["vertical", "horizontal", "diagonal"], default="vertical", help=argparse.SUPPRESS) 68 | parser_scene_style.add_argument('--label_position', choices=["above", "below"], default="below", help=argparse.SUPPRESS) 69 | parser_scene_style.add_argument('--fig_height', type=float, default=11, help=argparse.SUPPRESS) # passed to figsize 70 | parser_scene_style.add_argument('--fig_width', type=float, default=8.5, help=argparse.SUPPRESS) # passed to figsize 71 | # for simulating according to stoichiometry 72 | parser_scene_sim = parser_scene.add_argument_group('random scene options') 73 | parser_scene_sim.add_argument('--csv', help='Table of protein information') 74 | parser_scene_sim.add_argument('--seed', type=int, help='Random seed for scene generation') 75 | parser_scene_sim.add_argument('--sample_from', help='Column to use for sampling (with --csv)', default='stoichiometry') 76 | parser_scene_sim.add_argument('--num_mol', type=int, help='Number of molecules to sample for scene', default=0) 77 | parser_scene_sim.add_argument('--background', action='store_true', default=False, help='Add background plane using same frequencies') 78 | 79 | # parse arguments and call corresponding command 80 | args = parser.parse_args() 81 | 
args.func(args) 82 | -------------------------------------------------------------------------------- /cellscape/interface.py: -------------------------------------------------------------------------------- 1 | """ 2 | Testing code for visualizing protein interactions across membrane interfaces 3 | """ 4 | 5 | import numpy as np 6 | import matplotlib 7 | import matplotlib.pyplot as plt 8 | import matplotlib.patches as mpatches 9 | import matplotlib.lines as mlines 10 | from matplotlib import lines, text, cm 11 | from matplotlib.colors import LinearSegmentedColormap, ListedColormap 12 | from scipy import interpolate 13 | import shapely.geometry as sg 14 | import shapely.ops as so 15 | import os, sys, argparse, pickle 16 | import glob 17 | import csv 18 | 19 | from cellscape.cartoon import plot_polygon, shade_from_color 20 | 21 | class MembraneInterface: 22 | """ 23 | Two opposing membranes drawn as flat segments (one per protein pair) joined by straight connectors; lipids are not drawn. 24 | """ 25 | def __init__(self, axes, lengths, bottom_y, top_y, thickness=40, padding=10, base_y=0): 26 | 27 | # axes 28 | self.axes = axes 29 | 30 | # membrane thickness (angstroms) 31 | self.thickness = thickness 32 | 33 | # padding between each segment, scalar 34 | self.padding = padding 35 | 36 | # length of each segment, array 37 | self.lengths = lengths 38 | 39 | # y coordinate of each bottom membrane segment, array 40 | self.bottom_y = bottom_y 41 | 42 | # y coordinate of each top membrane segment, array 43 | self.top_y = top_y 44 | 45 | assert(len(lengths) == len(top_y)) 46 | assert(len(top_y) == len(bottom_y)) 47 | 48 | def draw(self, color='#C4E7EF'): 49 | if isinstance(color, (list,tuple)): 50 | top_color = color[0] 51 | bot_color = color[1] 52 | else: 53 | top_color = color 54 | bot_color = color 55 | 56 | membrane_x = [] 57 | membrane_bot_y = [] 58 | membrane_top_y = [] 59 | x_cum = 0 60 | for i, w in enumerate(self.lengths): 61 | membrane_x.append(x_cum) 62 | x_cum 
+= w 63 | membrane_x.append(x_cum) 64 | x_cum += self.padding 65 | 66 |
membrane_bot_y.append(self.bottom_y[i]) 67 | membrane_bot_y.append(self.bottom_y[i]) 68 | 69 | membrane_top_y.append(self.top_y[i]) 70 | membrane_top_y.append(self.top_y[i]) 71 | 72 | membrane_x = np.array(membrane_x) 73 | membrane_bot_y = np.array(membrane_bot_y) 74 | membrane_top_y = np.array(membrane_top_y) 75 | 76 | # plot bottom membrane 77 | self.axes.fill_between(membrane_x, membrane_bot_y, membrane_bot_y-self.thickness, color=bot_color, zorder=1.6, capstyle='round', joinstyle='miter') 78 | 79 | # plot top membrane 80 | self.axes.fill_between(membrane_x, membrane_top_y, membrane_top_y+self.thickness, color=top_color, zorder=1.6, capstyle='round', joinstyle='round') 81 | 82 | def plot_pairs(pairs, labels=None, thickness=40, padding=50, align="bottom", membrane_color="#E8E8E8", colors=None, axes=True, linewidth=None, sort=False): 83 | 84 | assert align in ["bottom", "middle", "top"] 85 | assert sort in [False, "height", "horseshoe"] 86 | labels_, colors_ = labels, colors # defaults, reordered below when sorting is requested 87 | if labels is not None: 88 | assert len(labels) == len(pairs) 89 | 90 | # optionally sort proteins by height 91 | if sort == "height": 92 | pair_heights = np.array(list(map(lambda x: x[0]['height']+x[1]['height'], pairs))) 93 | sorted_order = np.argsort(pair_heights)[::-1] 94 | pairs_ = [pairs[i] for i in sorted_order] 95 | if labels is not None: 96 | labels_ = [labels[i] for i in sorted_order] 97 | if colors is not None: colors_ = [colors[i] for i in sorted_order] 98 | elif sort == "horseshoe": 99 | pair_heights = np.array(list(map(lambda x: x[0]['height']+x[1]['height'], pairs))) 100 | sorted_order = np.argsort(pair_heights)[::-1] 101 | new_order = np.zeros_like(sorted_order) 102 | first_half = sorted_order[::2] 103 | if len(sorted_order) % 2: 104 | second_half = sorted_order[-2::-2] 105 | else: 106 | second_half = sorted_order[::-2] 107 | new_order[:len(first_half)] = first_half 108 | new_order[len(first_half):] = second_half 109 | 
110 | pairs_ = [pairs[i] for i in new_order] 111 | if labels is not None: 112 | labels_ = [labels[i] for i in new_order] 113 | if colors is not None: 114 |
colors_ = [colors[i] for i in new_order] 115 | 116 | else: 117 | pairs_ = pairs[:] 118 | if labels is not None: 119 | labels_ = labels[:] 120 | 121 | fig, axs = plt.subplots(figsize=(11,8.5)) 122 | axs.set_aspect('equal') 123 | 124 | if axes: 125 | axs.xaxis.grid(False) 126 | axs.yaxis.grid(False) 127 | axs.axes.xaxis.set_ticklabels([]) 128 | else: 129 | plt.axis('off') 130 | plt.gca().set_axis_off() 131 | plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, hspace = 0, wspace = 0) 132 | plt.margins(0,0) 133 | plt.gca().xaxis.set_major_locator(plt.NullLocator()) 134 | plt.gca().yaxis.set_major_locator(plt.NullLocator()) 135 | 136 | # set font options 137 | font_options = {'family':'Arial', 'weight':'normal', 'size':10} 138 | matplotlib.rc('font', **font_options) 139 | 140 | assert(align in ["top","bottom","middle"]) 141 | 142 | # get all the interface heights 143 | all_heights = np.array([p[0]['height']+p[1]['height'] for p in pairs_]) 144 | max_height = np.max(all_heights) 145 | 146 | # calculate membrane geometry 147 | bot_y = [] 148 | top_y = [] 149 | lengths = [] 150 | for p in pairs_: 151 | o1, o2 = p 152 | if align == "bottom": 153 | top_y.append(o1['height']+o2['height']) 154 | bot_y.append(0) 155 | elif align == "top": 156 | top_y.append(max_height) 157 | bot_y.append(max_height-(o1['height']+o2['height'])) 158 | elif align == "middle": 159 | top_y.append(max_height-(max_height-o1['height']-o2['height'])/2) 160 | bot_y.append((max_height-o1['height']-o2['height'])/2) 161 | lengths.append(max(o1['width'], o2['width'])) 162 | top_y = np.array(top_y) 163 | bot_y = np.array(bot_y) 164 | lengths = np.array(lengths) 165 | 166 | total_width = np.sum(lengths)+len(pairs_)*padding 167 | 168 | # draw membrane 169 | mem = MembraneInterface(axes=axs, lengths=lengths, bottom_y=bot_y, top_y=top_y, padding=padding, thickness=thickness) 170 | mem.draw(color=membrane_color) 171 | 172 | # draw proteins 173 | w=0 174 | for i, o in enumerate(pairs_): 175 | o1, o2 = o 
176 | this_width = max(o1['width'], o2['width']) 177 | this_height = o1['height']+o2['height'] 178 | y_offset = bot_y[i] 179 | if colors is not None: 180 | color_top = colors_[i][1] 181 | color_bot = colors_[i][0] 182 | else: 183 | color_top = None 184 | color_bot = None 185 | 186 | # TODO rotation needs to be a little cleaner, making some assumptions here 187 | for p in o1["polygons"]: 188 | xy = np.array(o1['polygons'][0]['polygon'].exterior.xy) # assuming first polygon is outline, TODO fix 189 | recenter = np.array([np.min(xy[:,0]), np.min(xy[:,1])]) 190 | facecolor = shade_from_color(color_bot, p.get("shade", 0.5), range=p.get("shading_range", 0.4)) 191 | plot_polygon(p['polygon'], axes=axs, translate_pre=[-recenter[0]+w+(this_width-o1['width'])/2, -recenter[1]+y_offset], flip=False, facecolor=facecolor, linewidth=p['linewidth']) 192 | 193 | for p in o2["polygons"]: 194 | xy = np.array(o2['polygons'][0]['polygon'].exterior.xy) # assuming first polygon is outline, TODO fix 195 | recenter = np.array([np.min(xy[:,0]), np.min(xy[:,1])]) 196 | facecolor = shade_from_color(color_top, p.get("shade", 0.5), range=p.get("shading_range", 0.4)) 197 | plot_polygon(p['polygon'], axes=axs, translate_pre=-1*recenter, translate_post=[w+(this_width+o2['width'])/2, y_offset+10+o2['height']+o1['height']], flip=True, facecolor=facecolor, linewidth=p['linewidth']) 198 | #plot_polygon(o2["polygons"][0]['polygon'], axes=axs, offset=[w+(this_width-o2['width'])/2, y_offset+o1['height']], flip=True, facecolor=o2["polygons"][0]['facecolor'], linewidth=linewidth) 199 | 200 | if labels_ is not None: 201 | angstroms_per_inch = total_width/11 202 | fontsize = total_width*0.5/len(pairs_)/angstroms_per_inch*72 203 | font_inches = fontsize/72 204 | plt.text(w+this_width/2, y_offset+this_height+50, labels_[i][1], rotation=90, fontsize=fontsize, va='bottom', ha='center') 205 | plt.text(w+this_width/2, y_offset-1.1*angstroms_per_inch*font_inches, labels_[i][0], rotation=90, fontsize=fontsize, 
va='top', ha='center') 206 | 207 | w += this_width+padding 208 | 209 | fig.set_size_inches(18.5, 10.5) 210 | return fig 211 | -------------------------------------------------------------------------------- /cellscape/parse_alignment.py: -------------------------------------------------------------------------------- 1 | from Bio import pairwise2 2 | from Bio.PDB import * 3 | from Bio.Align import substitution_matrices, PairwiseAligner 4 | import numpy as np 5 | 6 | def identity_from_alignment(a): 7 | s1 = np.array(list(a[0])) 8 | s2 = np.array(list(a[1])) 9 | return np.sum(s1 == s2) / len(np.where( s1 != '-')[0]) 10 | 11 | def overlap_from_alignment(a): 12 | s1 = np.array(list(a[0])) 13 | s2 = np.array(list(a[1])) 14 | s1_nogap = np.where( s1 != '-') 15 | s2_nogap = np.where( s2 != '-') 16 | s1_start_align = np.min(s1_nogap) 17 | s1_end_align = np.max(s1_nogap) 18 | s2_start_align = np.min(s2_nogap) 19 | s2_end_align = np.max(s2_nogap) 20 | overlap_align = (max(s1_start_align, s2_start_align), min(s1_end_align, s2_end_align)) 21 | return( 22 | np.where(s1_nogap == overlap_align[0])[1][0], 23 | np.where(s1_nogap == overlap_align[1])[1][0], 24 | np.where(s2_nogap == overlap_align[0])[1][0], 25 | np.where(s2_nogap == overlap_align[1])[1][0]) + np.array([1,1,1,1]) 26 | 27 | def align_pair(s1, s2): 28 | # wrapper for biopython pairwise alignment 29 | blosum62 = substitution_matrices.load("BLOSUM62") 30 | return pairwise2.align.localds(s1, s2, blosum62, -3, -3, one_alignment_only=True)[0] 31 | 32 | def align_all_pairs(s): 33 | for i in range(len(s)): 34 | for j in range(i+1, len(s)): 35 | s1 = s[i][1] 36 | s2 = s[j][1] 37 | alignment = align_pair(s1,s2) # align_pair already returns a single alignment 38 | print(s[i][0], len(s1), s[j][0], len(s2), *overlap_from_alignment(alignment), identity_from_alignment(alignment)) 39 | 40 | def sequence_overlap(s1, s2): 41 | aligner = PairwiseAligner() 42 | aligner.mode = "global" 43 | aligner.substitution_matrix = 
substitution_matrices.load("BLOSUM62") 44 | alignments =
aligner.align(s1, s2) 45 | alignment = list(alignments)[0] 46 | alignment_bounds = alignment.aligned 47 | return np.array([alignment_bounds[0][0][0], alignment_bounds[0][-1][1], alignment_bounds[-1][0][0], alignment_bounds[-1][-1][1]]) + np.array([1,0,1,0]) 48 | 49 | if __name__ == '__main__': 50 | a1 = ( 51 | '---------AAAAAAAABBBBBBB', 52 | 'BBBBBBBBBAAAAAAAA-------' 53 | ) 54 | print(overlap_from_alignment(a1)) 55 | print(sequence_overlap("AAAAAAAABBBBBBB", "BBBBBBBBBAAAAAAAA")) 56 | 57 | a2 = ( 58 | 'BBBBBBBBBAAAAABBBBBBB', 59 | '---------AAAAA-------' 60 | ) 61 | print(overlap_from_alignment(a2)) 62 | print(sequence_overlap("BBBBBBBBBAAAAABBBBBBB", "AAAAA")) 63 | 64 | a3 = ( 65 | 'BBBBBBBBBAAAAABBBBAAAABBB', 66 | '---------AAAAA----AAAA----' 67 | ) 68 | print(overlap_from_alignment(a3)) 69 | print(sequence_overlap("BBBBBBBBBAAAAABBBBAAAABBB", "AAAAAAAAA")) 70 | 71 | s1 = "AACDAEECDAECDEADAEEAEADADCADEAEAECDDAEACDAECDA" 72 | s2 = "ACDAEECDADEADWAEEAEADAWDCADEAEAECGDDAEAGCDACDA" 73 | a = align_pair(s1,s2) 74 | print(a[0],a[1]) 75 | -------------------------------------------------------------------------------- /cellscape/parse_uniprot_xml.py: -------------------------------------------------------------------------------- 1 | import xml.etree.ElementTree as ET 2 | import os 3 | import urllib.request 4 | import sys 5 | import argparse 6 | import json 7 | 8 | class UniprotRecord: 9 | """Data structure to hold UniProt annotations for a single sequence.""" 10 | def __init__(self, id, name=None): 11 | self.id = id 12 | self.name = name 13 | self.domains = [] 14 | self.topology = [] 15 | self.ptm = {} 16 | self.sequence = "" 17 | 18 | def add_domain(self,name, start, end): 19 | self.domains.append((name, int(start), int(end))) 20 | def add_topology(self, name, start, end): 21 | self.topology.append((name, int(start), int(end))) 22 | def add_ptm(self, name, start, end): 23 | self.ptm[name] = (int(start), int(end)) 24 | def process_segments(self): 25 | if 'chain' in self.ptm: 26 |
| (self.chain_start, self.chain_end) = self.ptm['chain'] 27 | else: 28 | (self.chain_start, self.chain_end) = (1, 99999) 29 | 30 | last = self.chain_start 31 | self.domain_segments = [] 32 | 33 | for domain in self.domains: 34 | if (domain[1] - last) > 1: 35 | self.domain_segments.append(('None',last, domain[1]-1)) 36 | 37 | self.domain_segments.append(domain) 38 | last = domain[2] 39 | 40 | if (self.chain_end - last) > 1: 41 | self.domain_segments.append(('None',last, self.chain_end)) 42 | 43 | def parse_xml(xmlpath): 44 | """ 45 | Parse Uniprot XML file to return list of UniprotRecord objects. 46 | """ 47 | tree = ET.parse(xmlpath) 48 | root = tree.getroot() 49 | ns = '{http://uniprot.org/uniprot}' 50 | sequences = [] 51 | 52 | for entry in tree.iter(tag=ns+'entry'): 53 | accession = entry.find(ns+'accession').text 54 | gene = entry.find(ns+'name').text 55 | sequence = UniprotRecord(accession, gene) 56 | 57 | for feature in entry.iter(tag=ns+'feature'): 58 | 59 | # look for transmembrane regions 60 | if feature.get('type') in ('topological domain','transmembrane region'): 61 | try: 62 | begin = feature.find(ns+'location').find(ns+'begin').get('position') 63 | end = feature.find(ns+'location').find(ns+'end').get('position') 64 | feature_description = feature.get('description').split(';')[0] 65 | sequence.add_topology(feature_description, begin, end) 66 | except: 67 | pass 68 | 69 | # look for protein domains 70 | elif feature.get('type') == 'domain': 71 | try: 72 | begin = feature.find(ns+'location').find(ns+'begin').get('position') 73 | end = feature.find(ns+'location').find(ns+'end').get('position') 74 | sequence.add_domain(feature.get('description'),begin,end) 75 | except: 76 | pass 77 | 78 | # look for signal peptide and mature chain 79 | elif feature.get('type') in ('chain', 'propeptide','signal peptide'): 80 | try: 81 | begin = feature.find(ns+'location').find(ns+'begin').get('position') 82 | end = feature.find(ns+'location').find(ns+'end').get('position') 
83 | sequence.add_ptm(feature.get('type'), begin, end) 84 | except: 85 | pass 86 | 87 | sequence.process_segments() 88 | sequences.append(sequence) 89 | 90 | for seq in entry.iter(tag=ns+'sequence'): 91 | if seq.text is not None: 92 | sequence.sequence = seq.text.replace('\n','') 93 | 94 | return(sequences) 95 | 96 | def split_uniprot_xml(xmlpath, outpath='.'): 97 | """Take a multi-record XML file and split to one XML file per entry.""" 98 | tree = ET.parse(xmlpath) 99 | root = tree.getroot() 100 | ns = '{http://uniprot.org/uniprot}' 101 | for entry in tree.iter(tag=ns+'entry'): 102 | accession = entry.find(ns+'accession') 103 | with open("{}/{}.xml".format(outpath, accession.text), "w") as xml_out: 104 | xml_out.write(ET.tostring(entry).decode('utf-8')) 105 | 106 | def download_uniprot_record(record, fileformat, outdir): 107 | """Download record from Uniprot server.""" 108 | file_path = "{}.{}".format(record, fileformat) 109 | out_path = os.path.join(outdir, file_path) 110 | if not os.path.exists(out_path): 111 | print("Requesting {}".format(out_path)) 112 | urllib.request.urlretrieve("https://www.uniprot.org/uniprot/{}".format(file_path), out_path) 113 | else: 114 | pass 115 | #print("UniProt file already there", file=sys.stderr) 116 | return out_path 117 | 118 | if __name__ == "__main__": 119 | 120 | parser = argparse.ArgumentParser(description='Parse UniProt XML file', formatter_class=argparse.ArgumentDefaultsHelpFormatter) 121 | parser.add_argument('--xml', help='Input XML file', required=True) 122 | parser.add_argument('--json', action='store_true', default=False, help='Output relevant information in JSON') 123 | args = parser.parse_args() 124 | 125 | uniprot = parse_xml(args.xml) 126 | 127 | for entry in uniprot: 128 | if args.json: 129 | data = { 130 | 'name': entry.name, 131 | 'sequence': entry.sequence, 132 | 'domains': entry.domain_segments, 133 | 'topology': entry.topology 134 | } 135 | print(json.dumps(data, indent=2)) 136 | else: 137 | with 
open(entry.name+'.domains.csv','w') as f: 138 | f.write(','.join(['res_start','res_end','description'])+'\n') 139 | for domain in entry.domain_segments: 140 | f.write(','.join(map(str,[domain[1],domain[2],domain[0]]))+'\n') 141 | 142 | with open(entry.name+'.topology.csv','w') as f: 143 | f.write(','.join(['res_start','res_end','description'])+'\n') 144 | for domain in entry.topology: 145 | f.write(','.join(map(str,[domain[1],domain[2],domain[0]]))+'\n') 146 | -------------------------------------------------------------------------------- /cellscape/scene.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import matplotlib 3 | import matplotlib.pyplot as plt 4 | import matplotlib.patches as mpatches 5 | import matplotlib.lines as mlines 6 | from matplotlib import lines, text, cm 7 | from matplotlib.colors import LinearSegmentedColormap, ListedColormap 8 | from scipy import interpolate 9 | import os 10 | import sys 11 | import pickle 12 | import csv 13 | 14 | from cellscape.cartoon import plot_polygon, shade_from_color, placeholder_polygon 15 | 16 | def rotation_matrix_2d(theta): 17 | """Return matrix to rotate 2D coordinates by angle theta.""" 18 | return np.array([[np.cos(theta), -1*np.sin(theta)],[np.sin(theta), np.cos(theta)]]) 19 | 20 | class Membrane: 21 | def __init__(self, width, thickness, axes, base_y=0): 22 | self.width = width 23 | self.thickness = thickness 24 | self.y = base_y 25 | self.axes = axes 26 | # other constants 27 | self.head_radius = 4 28 | 29 | def flat(self): 30 | self.height_at = lambda x: self.y + self.thickness/2 31 | 32 | def sinusoidal(self, frequency=1, amplitude=1): 33 | self.height_at = lambda x: self.y + self.thickness/2*amplitude*np.sin(x*frequency*2*np.pi/self.width) 34 | 35 | def interpolate(self, x, y, kind='linear'): 36 | #self.height_at = interpolate.interp1d(x, y, kind=kind) 37 | self.height_fn = interpolate.PchipInterpolator(x, y) 38 | self.height_at = lambda x: 
self.height_fn(x) + self.y 39 | 40 | def draw(self, lipids=False): 41 | 42 | membrane_x = np.linspace(0,self.width,200) 43 | membrane_y_top = np.array([self.height_at(x) for x in membrane_x]) 44 | membrane_y_bot = membrane_y_top-self.thickness 45 | 46 | if lipids: 47 | membrane_box_fc='#C4E7EF' 48 | lipid_head_fc='#D6D1EF' 49 | lipid_tail_fc='#A3DCEF' 50 | plt.fill_between(membrane_x, membrane_y_top-self.head_radius, membrane_y_bot+self.head_radius, color=membrane_box_fc, zorder=1.6) 51 | num_lipids = int(self.width/(2*self.head_radius)) 52 | for i in range(num_lipids): 53 | membrane_y = self.height_at(i/num_lipids*self.width) 54 | self.axes.add_line(mlines.Line2D([i*self.head_radius*2, i*self.head_radius*2], [-4+membrane_y, -18+membrane_y], zorder=1.7, c=lipid_tail_fc, linewidth=self.head_radius*.7, alpha=1, solid_capstyle='round')) 55 | self.axes.add_line(mlines.Line2D([i*self.head_radius*2, i*self.head_radius*2], [-38+membrane_y, -24+membrane_y], zorder=1.7, c=lipid_tail_fc, linewidth=self.head_radius*.7, alpha=1, solid_capstyle='round')) 56 | self.axes.add_patch(mpatches.Circle((i*self.head_radius*2, -1*self.head_radius+membrane_y), self.head_radius, facecolor=lipid_head_fc, ec='k', linewidth=0.3, alpha=1, zorder=2)) 57 | self.axes.add_patch(mpatches.Circle((i*self.head_radius*2, -1*self.thickness+membrane_y), self.head_radius, facecolor=lipid_head_fc, ec='k', linewidth=0.3, alpha=1, zorder=2)) 58 | 59 | else: 60 | membrane_box_fc='#C8C8C8' 61 | plt.fill_between(membrane_x, membrane_y_top, membrane_y_bot, color=membrane_box_fc, zorder=1.6) 62 | 63 | def make_scene(args): 64 | """Build a scene in one-go. 
Called when running ``cellscape scene``.""" 65 | 66 | assert args.save.split('.')[-1] in ['png','pdf','svg','ps'], "image format not recognized" 67 | 68 | # list of protein polygons to draw 69 | object_list = [] 70 | num_files = 0 71 | 72 | # set random seed for reproducibility (a seed of 0 is valid) 73 | if args.seed is not None: 74 | np.random.seed(args.seed) 75 | 76 | if args.files: 77 | for path in args.files: 78 | with open(path,'rb') as f: 79 | data = pickle.load(f) 80 | object_list.append(data) 81 | 82 | # allow random scene generation even if manually specifying files 83 | if args.num_mol > 0: 84 | object_list = np.random.choice(object_list, size=args.num_mol) 85 | num_files = len(object_list) 86 | 87 | elif args.csv: 88 | protein_data = dict() 89 | with open(args.csv) as csvfile: 90 | reader = csv.DictReader(csvfile) 91 | for row in reader: 92 | (name, stoich, path) = (row['name'], float(row[args.sample_from]), row.get('file')) 93 | if path: # skip rows with a missing or empty 'file' column 94 | with open(path,'rb') as f: 95 | data = pickle.load(f) 96 | data['name'] = name 97 | data['stoichiometry'] = stoich 98 | # TEST specifying color in CSV file 99 | if 'color' in row: 100 | data['color'] = row['color'] 101 | protein_data[name] = (stoich, data) 102 | elif args.use_placeholders: 103 | height = float(row.get('height'))*10 # height assumed to be in nanometers; convert to angstroms 104 | data = {'name':name, 'stoichiometry':stoich, 'height':height, 'bottom':np.array([25,0]), 'width':50, 'polygons':[{'polygon':placeholder_polygon(height), 'edgecolor':'k', 'linewidth':1, 'facecolor':"#eeeeee"}]} 105 | protein_data[name] = (stoich, data) 106 | 107 | num_files = len(protein_data) 108 | 109 | else: 110 | sys.exit("No input files specified, see options with --help") 111 | 112 | if len(args.offsets) > 0: 113 | assert(len(args.files) == len(args.offsets)) 114 | y_offsets = list(map(float, args.offsets)) 115 | else: 116 | y_offsets = np.zeros(len(object_list)) 117 | 118 | if 
args.csv: 119 | # total sum of protein counts 120 | protein_names = np.array(list(protein_data.keys()))
121 | protein_stoich = np.array([protein_data[p][0] for p in protein_names]) 122 | sum_stoich = np.sum(protein_stoich) 123 | stoich_weights = protein_stoich / sum_stoich 124 | 125 | if args.num_mol > 0: 126 | # protein copy number 127 | sampled_protein = np.random.choice(protein_names, size=args.num_mol, p=stoich_weights) 128 | object_list = [protein_data[p][1] for p in sampled_protein] 129 | else: 130 | object_list = [protein_data[p][1] for p in protein_names] 131 | 132 | # assemble objects for background 133 | if args.background and args.num_mol > 0: 134 | scaling_factor = 0.7 135 | sampled_protein = np.random.choice(protein_names, int(args.num_mol*1/scaling_factor), p=stoich_weights) 136 | background_object_list = [protein_data[p][1] for p in sampled_protein] 137 | elif 'name' in object_list[0]: 138 | protein_names = [o['name'] for o in object_list] 139 | else: 140 | for i,o in enumerate(object_list): 141 | o['name'] = i 142 | protein_names = range(len(object_list)) 143 | 144 | # sort proteins 145 | if args.order_by == "random": 146 | np.random.shuffle(object_list) 147 | elif args.order_by == "height": 148 | object_list = sorted(object_list, key=lambda x: x['height'], reverse=True) 149 | elif args.order_by == "top": 150 | # TODO should be renamed, maybe length for overall size and height for above membrane? 
151 | object_list = sorted(object_list, key=lambda x: x['top'][1], reverse=True) 152 | elif args.order_by == "membrane": 153 | # sorted by maximum height above or below the membrane 154 | def max_abs(l1, l2): 155 | if abs(l1) > abs(l2): 156 | return l1 157 | else: 158 | return l2 159 | object_list = sorted(object_list, key=lambda x: max_abs(x['top'][1], x['bottom'][1]), reverse=True) 160 | 161 | # set font options 162 | font_options = {'family':'Arial', 'weight':'normal', 'size':10} 163 | matplotlib.rc('font', **font_options) 164 | 165 | # set up plot 166 | # POSSIBLE BUG: while coordinates and scale are preserved in the pickle files, 167 | # this doesn't necessarily apply to the images. Hence if someone tries to 168 | # manually add a protein to a scene that has been generated there could be 169 | # sizing issues. Is the solution to use a constant angstrom/inch scaling? 170 | scene_height_in = args.fig_height 171 | scene_width_in = args.fig_width 172 | fig, axs = plt.subplots(figsize=(scene_width_in, scene_height_in)) 173 | axs.set_aspect('equal') 174 | 175 | if args.axes: 176 | plt.axis('on') 177 | axs.xaxis.grid(False) 178 | axs.yaxis.grid(True) 179 | axs.axes.xaxis.set_ticklabels([]) 180 | axs.autoscale() 181 | plt.margins(0.01,0.01) # is this needed? 182 | 183 | else: 184 | plt.axis('off') 185 | plt.gca().set_axis_off() 186 | plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, hspace = 0, wspace = 0) 187 | axs.autoscale() 188 | plt.margins(0.01,0.01) 189 | plt.gca().xaxis.set_major_locator(plt.NullLocator()) 190 | plt.gca().yaxis.set_major_locator(plt.NullLocator()) 191 | 192 | if args.recolor: 193 | # default cmap is hsv.
for discrete could try Set1 or Pastel1 194 | if len(args.recolor_cmap) == 1: 195 | cmap = plt.get_cmap(args.recolor_cmap[0]) # matplotlib.cm.get_cmap was removed in matplotlib 3.9 196 | else: 197 | # TESTING interpret as continuous color scheme 198 | # cmap = LinearSegmentedColormap.from_list("cmap", args.recolor_cmap) 199 | cmap = ListedColormap(args.recolor_cmap) 200 | color_scheme = dict() 201 | for i,c in enumerate(sorted(object_list, key=lambda x: x['height'])): 202 | name = c['name'] 203 | if isinstance(cmap, ListedColormap): 204 | color_scheme[name] = cmap(i) 205 | else: 206 | color_scheme[name] = cmap(i/len(object_list)) 207 | 208 | # TESTING 209 | # so colors are by height (what about duplicated molecules) 210 | # np.random.shuffle(object_list) 211 | 212 | total_width = np.sum([o['width'] for o in object_list])+len(object_list)*args.padding 213 | if args.membrane is not None: 214 | membrane = Membrane(width=total_width, axes=axs, thickness=args.membrane_thickness) 215 | 216 | if args.membrane == "flat": 217 | membrane.flat() 218 | elif args.membrane == "arc": 219 | membrane.sinusoidal(frequency=0.5, amplitude=2) 220 | elif args.membrane == "wave": 221 | membrane.sinusoidal(frequency=2, amplitude=2) 222 | membrane.draw(lipids=args.membrane_lipids) 223 | 224 | # draw molecules 225 | w=0 226 | for i, o in enumerate(object_list): 227 | if args.membrane is not None and not args.no_membrane_offset: 228 | #y_offset = membrane.height_at(w+o['bottom'][0])-10 229 | #y_offset = o['bottom'][1] 230 | if o["bottom"][1] < 0: 231 | y_offset = -1*o["bottom"][1] 232 | else: 233 | y_offset = 0 234 | else: 235 | y_offset = 0 236 | for p in o["polygons"]: 237 | # TODO change to dict.get() call to have default 238 | if args.recolor: 239 | if 'color' in o: 240 | # check if color specified in CSV file 241 | facecolor = o['color'] 242 | edgecolor = p["edgecolor"] 243 | else: 244 | # use color scheme from recolor_cmap 245 | facecolor = color_scheme[o['name']] 246 | edgecolor = 'black' 247 | if "shade" 
in p: 248 | # TODO export
shading_range from polygons as well 249 | facecolor = shade_from_color(facecolor, p["shade"], range=p.get("shading_range", 0.4)) # using default from cartoon.py, could change 250 | else: 251 | # use color already specified 252 | facecolor = p["facecolor"] 253 | edgecolor = p["edgecolor"] 254 | 255 | plot_polygon(p["polygon"], translate_pre=[w, y_offset], facecolor=facecolor, edgecolor=edgecolor, linewidth=p["linewidth"], zorder_mod=p.get("zorder", 0)) 256 | if args.labels: 257 | # option is experimental, text needs to be properly sized and placed 258 | # testing use of figure width in inches (specified above) and total width in angstroms to infer appropriate font size 259 | #plt.text(w+o['width']/2,-100, o.get("name", ""), rotation=90, fontsize=fontsize) 260 | # 1.1 and 0.6 numbers chosen through experimentation, best way would be to look at length of labels in characters 261 | angstroms_per_inch = total_width/scene_width_in 262 | fontsize = total_width*args.label_size/len(object_list)/angstroms_per_inch*72 263 | font_inches = fontsize/72 264 | # TODO better text positioning, allow for top/bottom selection 265 | if args.label_orientation == "vertical": 266 | #plt.text(w+o['width']/2,o['bottom'][1]-1.1*angstroms_per_inch*font_inches, o.get("name", ""), rotation=90, fontsize=fontsize, va='top', ha='center') # vertical text (below) 267 | plt.text(w+o['width']/2,0-1.1*angstroms_per_inch*font_inches, o.get("name", ""), rotation=90, fontsize=fontsize, va='top', ha='center') # vertical text (below) 268 | elif args.label_orientation == "horizontal": 269 | plt.text(w+o['width']/2,o['top'][1]+2*angstroms_per_inch*font_inches, o.get("name", ""), rotation=0, fontsize=fontsize, va='top', ha='center') # horizontal text (above) 270 | elif args.label_orientation == "diagonal": 271 | plt.text(w+o['width']/5,o['top'][1]+angstroms_per_inch*font_inches, o.get("name", ""), rotation=45, fontsize=fontsize) # diagonal text (above) 272 | w += o['width']+args.padding 273 | 274 | if 
args.background: 275 | background_w=0 276 | for i, o in enumerate(background_object_list): 277 | for p in o["polygons"]: 278 | plot_polygon(p["polygon"], offset=[background_w, 0], scale=scaling_factor, zorder_mod=p.get("zorder", -2), facecolor=p["facecolor"], edgecolor=p["edgecolor"], linewidth=p["linewidth"]*scaling_factor) 279 | background_w += (o['width']+args.padding) 280 | 281 | plt.savefig(args.save, transparent=True, pad_inches=0, bbox_inches='tight', dpi=args.dpi) 282 | -------------------------------------------------------------------------------- /cellscape/structure.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import shapely.geometry as sg 3 | import shapely.ops as so 4 | import re 5 | import os 6 | import sys 7 | import operator 8 | import warnings 9 | from Bio.PDB import rotmat, vectors, MMCIFParser, PDBParser 10 | from scipy.spatial.distance import pdist, squareform 11 | import time 12 | 13 | import cellscape 14 | from cellscape.util import amino_acid_3letter, group_by 15 | from cellscape.parse_uniprot_xml import parse_xml, download_uniprot_record 16 | from cellscape.parse_alignment import align_pair, overlap_from_alignment, sequence_overlap 17 | 18 | # silence warnings from Biopython that might pop up when loading the PDB 19 | from Bio import BiopythonWarning 20 | warnings.simplefilter('ignore', BiopythonWarning) 21 | 22 | def matrix_from_nglview(m): 23 | """Take flattened 4x4 view matrix from NGLView and return its normalized 3x3 rotation matrix and its translation vector.""" 24 | camera_matrix = np.array(m).reshape(4,4) 25 | return camera_matrix[:3,:3]/np.linalg.norm(camera_matrix[:3,:3], axis=1), camera_matrix[3,:3] 26 | 27 | def matrix_to_nglview(m): 28 | """Take 3x3 rotation matrix and convert to flattened 4x4 view matrix for NGLView.""" 29 | nglv_matrix = np.identity(4) 30 | nglv_matrix[:3,:3] = np.dot(m, np.array([[-1,0,0],[0,1,0],[0,0,-1]])) 31 | return 
list(nglv_matrix.flatten()) 32 | 33 | def
orientation_from_topology(topologies): 34 | """Infer protein vertical orientation (N->C or C->N) from UniProt topology annotation.""" 35 | first_ex_flag = True 36 | first_ex = None 37 | first_cy_flag = True 38 | first_cy = None 39 | first_he_flag = True 40 | first_he = None 41 | 42 | for row in topologies: 43 | (description, start, end) = row 44 | 45 | if description == 'Extracellular' and first_ex_flag: 46 | first_ex = (start, end) 47 | first_ex_flag = False 48 | elif description == 'Helical' and first_he_flag: 49 | first_he = (start, end) 50 | first_he_flag = False 51 | elif description == 'Cytoplasmic' and first_cy_flag: 52 | first_cy = (start, end) 53 | first_cy_flag = False 54 | 55 | # rough heuristic for now, works for single pass transmembrane proteins 56 | nc_orient = True 57 | if first_ex is not None and first_cy is not None: 58 | if first_ex[0] < first_cy[0]: 59 | nc_orient = True # N->C (top to bottom) 60 | elif first_ex[0] > first_cy[0]: 61 | nc_orient = False # C->N (top to bottom) 62 | 63 | return(nc_orient) 64 | 65 | def orientation_from_ptm(ptm): 66 | """Infer vertical orientation (N->C or C->N) from UniProt PTM annotation, assuming the signal peptide is on the cytoplasmic/membrane side and the mature chain is extracellular.""" 67 | 68 | nc_orient = True 69 | if ('chain' in ptm) and ('signal peptide' in ptm): 70 | if ptm['signal peptide'][0] < ptm['chain'][0]: 71 | nc_orient = True 72 | else: 73 | nc_orient = False 74 | 75 | return(nc_orient) 76 | 77 | def depth_slices_from_coord(xyz, width): 78 | """Split a single Nx3 xyz matrix into a list of Nx3 matrices, one per Z slice of the given width.""" 79 | binned = np.floor(xyz[:,-1]/width).astype(int) 80 | binned_shifted = binned - np.min(binned) 81 | num_bins = np.max(binned_shifted)+1 82 | 83 | total_coords = 0 84 | slice_coords = [] 85 | 86 | for i in range(num_bins): 87 | bin_coords = xyz[binned_shifted == i] 88 | slice_coords.append(bin_coords) 89 | total_coords += len(bin_coords) 90 | 91 | assert(len(xyz) == total_coords) 92 | return slice_coords 93 | 94 | def get_z_slice_labels(xyz, width): 95 | """Take an Nx3 coordinate matrix
and return the Z-slice (depth bin) label of each row, shifted to start at 0""" 96 | binned = np.floor(xyz[:,-1]/width).astype(int) 97 | return binned - np.min(binned) 98 | 99 | def split_on_labels(m, labels): 100 | num_bins = np.max(labels)+1 101 | total_coords = 0 102 | coords = [] 103 | for i in range(num_bins): 104 | group_coords = m[labels == i] 105 | coords.append(group_coords) 106 | total_coords += len(group_coords) 107 | assert(len(m) == total_coords) 108 | return coords 109 | 110 | def get_dimensions(xy, end_window=50): 111 | dimensions = {} 112 | dimensions['width'] = np.max(xy[:,0]) - np.min(xy[:,0]) 113 | dimensions['height'] = np.max(xy[:,1]) - np.min(xy[:,1]) 114 | dimensions['start'] = np.mean(xy[:end_window]) 115 | dimensions['end'] = np.mean(xy[-end_window:]) 116 | dimensions['bottom'] = min(xy, key=operator.itemgetter(1)) 117 | dimensions['top'] = max(xy, key=operator.itemgetter(1)) 118 | return dimensions 119 | 120 | class Structure: 121 | """A class to load coordinates, handle an NGLView instance, and generate cartoons.""" 122 | # 123 | def __init__(self, file, name=None, model=0, chain="all", uniprot=None, view=True, is_opm=False, res_start=None, res_end=None): 124 | """ 125 | Args: 126 | file (str): Path to PDB/mmCIF coordinate file 127 | name (str, optional): Descriptive name for the structure. Defaults to None (use the file name). 128 | model (int, optional): Model number from structure. Defaults to 0. 129 | chain (str, optional): Either "all" or a string of chain IDs to include, e.g. "ABC". Defaults to "all". 130 | uniprot (str, optional): UniProt identifier (to download the record) or the path to a UniProt XML file. Defaults to None. 131 | view (bool, optional): Whether to use interactive NGLView widget. Defaults to True. 132 | is_opm (bool, optional): Structure is from the Orientations of Proteins in Membranes (OPM) database. Defaults to False. 133 | res_start (int, optional): First residue number to keep (applied to the first chain only). Defaults to None. 134 | res_end (int, optional): Last residue number to keep (applied to the first chain only). Defaults to None.
135 | """ 136 | 137 | # descriptive name for the protein, otherwise use file 138 | if name is None: 139 | self.name = os.path.basename(file) 140 | else: 141 | self.name = name 142 | 143 | # load structure with biopython (format inferred from the file extension) 144 | if file.lower().endswith((".cif", ".mcif", ".mmcif")): 145 | parser = MMCIFParser() 146 | elif file.lower().endswith((".pdb", ".ent")): 147 | parser = PDBParser() 148 | else: 149 | sys.exit("File format not recognized!") 150 | self.structure = parser.get_structure(file, file)[model] 151 | _all_chains = [c.id for c in self.structure.get_chains()] 152 | 153 | # eliminate undesired chains from the biopython object 154 | if chain.lower() == "all": 155 | self.chains = _all_chains 156 | else: 157 | self.chains = list(chain) 158 | for c in _all_chains: 159 | if c not in self.chains: 160 | self.structure.detach_child(c) 161 | 162 | # keep only residues res_start..res_end (inclusive) of the first selected chain 163 | if res_start is not None and res_end is not None: 164 | assert(res_end > res_start) 165 | for res in list(self.structure[self.chains[0]]): 166 | res_id = res.get_full_id()[3][1] 167 | if (res_id < res_start) or (res_id > res_end): 168 | self.structure[self.chains[0]].detach_child(res.get_id()) 169 | 170 | # BUG with some biopython structures not loading in nglview 171 | # can be fixed by resetting disordered flags 172 | # could this cause problems later on?
173 | for chain in self.structure: 174 | for residue in chain: 175 | for atom in residue.get_unpacked_list(): 176 | atom.disordered_flag = 0 177 | 178 | # if is_opm, assumes the PDB is oriented as described here: 179 | # https://opm.phar.umich.edu/about#features 180 | self.is_opm = is_opm 181 | 182 | # view matrix and NGLView options 183 | self.use_nglview = view 184 | self.view_matrix = [] 185 | if self.use_nglview: 186 | # import lazily so nglview is only required when the widget is used 187 | import nglview as nv 188 | self._structure_to_view = self.structure 189 | initial_repr = [ 190 | {"type": "spacefill", "params": { 191 | "sele": "protein", "color": "element" 192 | }} 193 | ] 194 | self.view = nv.show_biopython(self._structure_to_view, sync_camera=True, representations=initial_repr) 195 | self.view.camera = 'orthographic' 196 | self.view._set_sync_camera([self.view]) 197 | self._reflect_y = np.array([[-1,0,0],[0,1,0],[0,0,-1]]) 198 | 199 | # data structure holding residue information 200 | self.residues = dict() 201 | self.sequence = dict() 202 | self.coord = [] 203 | self.ca_atoms = [] 204 | self.backbone_atoms = [] 205 | all_atoms = 0 206 | for chain in self.chains: 207 | self.sequence[chain] = "" 208 | self.residues[chain] = dict() 209 | for res in self.structure[chain]: 210 | res_id = res.get_full_id()[3][1] 211 | if res.get_full_id()[3][0][0] == "H": # skip hetatm records 212 | continue 213 | if res.get_resname() not in amino_acid_3letter: 214 | continue 215 | res_aa = amino_acid_3letter[res.get_resname()] 216 | self.sequence[chain] += res_aa 217 | residue_atoms = 0 218 | these_atoms = [] 219 | backbone_atoms = [] 220 | for a in res: 221 | self.coord.append(list(a.get_vector())) 222 | these_atoms.append(a.id) # tracking atom identities for now 223 | if a.id == "CA": 224 | this_ca_atom = all_atoms 225 | self.ca_atoms.append(this_ca_atom) 226 | if a.id in ["CA", "N", "C", "O"]: 227 | backbone_atoms.append(all_atoms) 228 | self.backbone_atoms.append(all_atoms) 229 | all_atoms
+= 1 230 | residue_atoms += 1 231 | self.residues[chain][res_id] = { 232 | 'chain':chain, 233 | 'id':res_id, 234 | 'amino_acid':res_aa, 235 | 'object':res, 236 | 'coord':(all_atoms-residue_atoms, all_atoms), 237 | 'coord_ca':(this_ca_atom, this_ca_atom+1), 238 | 'coord_backbone':np.array(backbone_atoms, dtype=int), 239 | 'atoms':np.array(these_atoms) 240 | } 241 | self.coord = np.array(self.coord) 242 | self.ca_atoms = np.array(self.ca_atoms).astype(int) 243 | 244 | # uniprot information 245 | if uniprot is not None: 246 | if os.path.exists(uniprot): 247 | self._uniprot_xml = uniprot 248 | elif re.fullmatch(r'[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}', uniprot): 249 | # if file doesn't exist, check it is a valid UniProt ID and download from server 250 | # using regex from https://www.uniprot.org/help/accession_numbers 251 | try: 252 | self._uniprot_xml = download_uniprot_record(uniprot, "xml", os.getcwd()) 253 | except Exception: 254 | sys.exit("Couldn't download UniProt file") 255 | else: 256 | self._uniprot_xml = None 257 | sys.exit("Must specify either a UniProt XML file or a valid UniProt ID") 258 | else: 259 | self._uniprot_xml = None 260 | 261 | if self._uniprot_xml is not None: 262 | self._preprocess_uniprot(self._uniprot_xml) 263 | 264 | def _preprocess_uniprot(self, xml): 265 | # TODO support more than one XML file (e.g. for different chains?)
266 | self._uniprot = parse_xml(xml)[0] 267 | 268 | # align PDB and UniProt sequences to find offset 269 | uniprot_chain = self.chains[0] 270 | pdb_seq = self.sequence[uniprot_chain] 271 | uniprot_seq = self._uniprot.sequence 272 | first_residue_id = sorted(self.residues[uniprot_chain])[0] 273 | # alignment coordinates are 0-indexed (but PDB numbering and UniProt ranges are 1-indexed) 274 | #self._uniprot_overlap = np.array(overlap_from_alignment(align_pair(uniprot_seq, pdb_seq))) 275 | self._uniprot_overlap = np.array(sequence_overlap(uniprot_seq, pdb_seq)) 276 | self._uniprot_offset = self._uniprot_overlap[0] - first_residue_id 277 | 278 | if len(self._uniprot.domains) > 0: 279 | self._annotate_residues_from_uniprot(self._uniprot.domains, name_key="domain", residues=self.residues[uniprot_chain], offset=self._uniprot_offset) 280 | 281 | if len(self._uniprot.topology) > 0: 282 | self._annotate_residues_from_uniprot(self._uniprot.topology, name_key="topology", residues=self.residues[uniprot_chain], offset=self._uniprot_offset) 283 | 284 | def _annotate_residues_from_uniprot(self, ranges, name_key, residues, offset=0): 285 | # UniProt position r maps to PDB residue number r - offset 286 | for row in ranges: 287 | (name, start, end) = row 288 | for r in range(start, end+1): 289 | if (r-offset) in residues: 290 | residues[r-offset][name_key] = name 291 | 292 | def _update_view_matrix(self): 293 | # check if camera orientation has been specified from nglview 294 | if self.use_nglview and len(self.view._camera_orientation) == 16: 295 | m, t = matrix_from_nglview(self.view._camera_orientation) 296 | self.view_matrix = np.dot(m, self._reflect_y) 297 | elif len(self.view_matrix) == 0: 298 | self.view_matrix = np.identity(3) 299 | 300 | def align_view(self, v1, v2): 301 | """Rotate structure so v1 is aligned with v2. 302 | 303 | Args: 304 | v1 (ndarray): first vector 305 | v2 (ndarray): second vector 306 | """ 307 | # rotate structure so v1 is aligned with v2 308 | r = rotmat(vectors.Vector(v1), vectors.Vector(v2)) 309 |
view_matrix = r.T 310 | self.set_view_matrix(view_matrix) 311 | 312 | def align_view_nc(self, n_atoms=10, c_atoms=10, flip=False): 313 | """Rotate structure so the N-C vector is aligned with the vertical axis. 314 | 315 | Args: 316 | n_atoms (int, optional): Number of leading atoms averaged to locate the N terminus. Defaults to 10. 317 | c_atoms (int, optional): Number of trailing atoms averaged to locate the C terminus. Defaults to 10. 318 | flip (bool, optional): Orient C-to-N instead of N-to-C. Defaults to False. 319 | """ 320 | com = np.mean(self.coord, axis=0) 321 | atoms_ = self.coord - com 322 | v1 = np.mean(atoms_[:n_atoms], axis=0) - np.mean(atoms_[-c_atoms:], axis=0) 323 | if not flip: 324 | self.align_view(v1, np.array([0,1,0])) 325 | else: 326 | self.align_view(v1, np.array([0,-1,0])) 327 | 328 | def auto_view(self, n_atoms=100, c_atoms=100, flip=None): 329 | """Orient the structure automatically: align the N-C axis vertically (direction inferred from UniProt data when available) and the widest horizontal extent with the X axis. 330 | 331 | Args: 332 | n_atoms (int, optional): Number of leading atoms averaged to locate the N terminus. Defaults to 100. 333 | c_atoms (int, optional): Number of trailing atoms averaged to locate the C terminus. Defaults to 100. 334 | flip (bool, optional): Explicitly set the orientation (True for N-to-C top to bottom, False for C-to-N) instead of inferring it. Defaults to None. 335 | """ 336 | # TODO should be same as align_view_nc if no UniProt data? 337 | # TODO abstract with align_view? 338 | # TODO abstract rotmat to separate function e.g. get_rotation_matrix() 339 | if flip is None: 340 | if self._uniprot_xml and len(self._uniprot.topology) > 0: 341 | print("orienting based on topology...") 342 | nc_orient = orientation_from_topology(self._uniprot.topology) 343 | elif self._uniprot_xml and len(self._uniprot.ptm) > 0: 344 | print("orienting based on ptm...") 345 | nc_orient = orientation_from_ptm(self._uniprot.ptm) 346 | else: 347 | nc_orient = True 348 | else: 349 | nc_orient = bool(flip) 350 | print("guessed N>C orientation?
{}".format(nc_orient)) 351 | self.nc_orient = nc_orient 352 | 353 | # rotate structure so N-C vector is aligned with the vertical axis 354 | com = np.mean(self.coord, axis=0) 355 | atoms_ = self.coord - com 356 | v1 = np.mean(atoms_[:n_atoms], axis=0) - np.mean(atoms_[-c_atoms:], axis=0) 357 | if nc_orient: 358 | first_rotation = rotmat(vectors.Vector(v1), vectors.Vector(np.array([0,1,0]))).T 359 | else: 360 | first_rotation = rotmat(vectors.Vector(v1), vectors.Vector(np.array([0,-1,0]))).T 361 | 362 | # rotate around Y axis so X axis aligns with longest distance in XZ plane 363 | rot_coord = np.dot(self.coord, first_rotation) 364 | com = np.mean(rot_coord, axis=0) 365 | atoms_ = rot_coord - com 366 | xz = atoms_[self.ca_atoms][:,[0,2]] 367 | dist = squareform(pdist(xz)) 368 | max_dist = np.unravel_index(np.argmax(dist, axis=None), dist.shape) 369 | #print(max_dist, np.max(dist), dist[max_dist[0]][max_dist[1]]) 370 | v2 = atoms_[self.ca_atoms[max_dist[0]]]-atoms_[self.ca_atoms[max_dist[1]]] 371 | v2[1] = 0 372 | second_rotation = rotmat(vectors.Vector(v2), vectors.Vector(np.array([1,0,0]))).T 373 | 374 | view_matrix = np.dot(first_rotation, second_rotation) 375 | self.set_view_matrix(view_matrix) 376 | 377 | def _set_nglview_orientation(self, m): 378 | # m is 3x3 rotation matrix 379 | if self.use_nglview: 380 | nglv_matrix = matrix_to_nglview(m) 381 | #print("Before", self.view._camera_orientation) 382 | self.view._set_camera_orientation(nglv_matrix) 383 | # having a bug where setting camera orientation does nothing 384 | # waiting a little bit seems to fix it (maybe an issue with sync/refresh rate) 385 | #self.view.control.orient(nglv_matrix) 386 | #self.view._camera_orientation = nglv_matrix 387 | time.sleep(0.5) 388 | self.view.center() 389 | #print("After", self.view._camera_orientation) 390 | 391 | def _apply_view_matrix(self): 392 | # transform atomic coordinates using view matrix 393 | self.rotated_coord = np.dot(self.coord, self.view_matrix) 394 | 395 | def 
load_pymol_view(self, file): 396 | """Read rotation matrix from output of PyMol ``get_view`` command 397 | 398 | Args: 399 | file (str): Path to file 400 | """ 401 | matrix = [] 402 | with open(file,'r') as view: 403 | for line in view: 404 | fields = line.split(',') 405 | if len(fields) == 4: 406 | matrix.append(list(map(float,fields[:3]))) 407 | view_matrix = np.array(matrix)[:3] 408 | self.set_view_matrix(view_matrix) 409 | 410 | def load_chimera_view(self, file): 411 | """Read rotation matrix from output of Chimera ``matrixset`` command 412 | 413 | Args: 414 | file (str): Path to file 415 | """ 416 | matrix = [] 417 | with open(file,'r') as view: 418 | for line in view.readlines()[1:4]: 419 | matrix.append(line.split()) 420 | 421 | # transpose and remove translation vector 422 | view_matrix = np.array(matrix).astype(float).T[:3] 423 | self.set_view_matrix(view_matrix) 424 | 425 | def save_view_matrix(self, p): 426 | """Save rotation matrix to a NumPy text file 427 | 428 | Args: 429 | p (str): Path to file 430 | """ 431 | self._update_view_matrix() 432 | np.savetxt(p, self.view_matrix) 433 | 434 | def load_view_matrix(self, p): 435 | """Load rotation matrix from a NumPy text file 436 | 437 | Args: 438 | p (str): Path to file 439 | """ 440 | view_matrix = np.loadtxt(p) 441 | self.set_view_matrix(view_matrix) 442 | 443 | def set_view_matrix(self, m): 444 | """Manually set view matrix 445 | 446 | Args: 447 | m (ndarray): 3x3 matrix 448 | """ 449 | assert m.shape == (3,3) 450 | self.view_matrix = m 451 | self._set_nglview_orientation(self.view_matrix) 452 | 453 | def outline(self, by="all", depth=None, depth_contour_interval=3, only_backbone=False, only_ca=False, only_annotated=False, radius=None, back_outline=False, align_transmembrane=False): 454 | """Create 2D projection from coordinates and outline atoms 455 | 456 | Args: 457 | by (str, optional): Grouping to use for cartoon. Options are ["all", "residue", "chain", "domain", "topology"]. Defaults to "all". 
458 | depth (str, optional): How to handle depth/occlusions. Options are [None, "flat", "contours"]. Defaults to None. 459 | depth_contour_interval (float, optional): Thickness in angstroms of each contour slice along the Z axis. Defaults to 3. 460 | only_backbone (bool, optional): Only use backbone atoms for visualization. Defaults to False. 461 | only_ca (bool, optional): Only use alpha-carbon atoms for visualization. Defaults to False. 462 | only_annotated (bool, optional): Only include residues that have an annotation in UniProt (e.g. domain or topology). Defaults to False. 463 | radius (float, optional): Atomic radius in angstroms; if None, defaults to 5 with only_ca, 4 with only_backbone, and 1.5 otherwise. Defaults to None. 464 | back_outline (bool, optional): Draw an additional outline of the entire structure at the back. Defaults to False. 465 | align_transmembrane (bool, optional): Shift the structure vertically so the mean height of the alpha-carbons of annotated transmembrane (helical) residues lies at y=0 (requires UniProt data). Defaults to False. 466 | 467 | Returns: 468 | Cartoon: Object containing residue information and outlined polygons 469 | """ 470 | 471 | # check options 472 | assert by in ["all", "residue", "chain", "domain", "topology"], "Option for 'by' not recognized" 473 | assert depth in [None, "flat", "contours"], "Option for 'depth' not recognized" 474 | # depth option doesn't affect by="residue" 475 | 476 | # collapse chain hierarchy into flat list 477 | self.residues_flat = [self.residues[c][i] for c in self.residues for i in self.residues[c]] 478 | 479 | if self.is_opm: 480 | self.set_view_matrix(np.array([[1,0,0],[0,0,1],[0,1,0]])) 481 | elif self.use_nglview: 482 | self._update_view_matrix() 483 | 484 | # transform atomic coordinates using view matrix 485 | self._apply_view_matrix() 486 | 487 | # recenter coordinates on lower left edge of bounding box 488 | offset_x = np.min(self.rotated_coord[:,0]) 489 | if self.is_opm: 490 | offset_y = 0 # since OPM already aligned to membrane 491 | else: 492 | offset_y = np.min(self.rotated_coord[:,1]) 493 | self.rotated_coord -=
np.array([offset_x, offset_y, 0]) 494 | 495 | # calculate vertical offset for transmembrane proteins 496 | if self._uniprot_xml and align_transmembrane: 497 | tm_coordinates = [] 498 | for res in self.residues_flat: 499 | if res.get("topology","") == "Helical": 500 | tm_coordinates.append(np.array(self.rotated_coord[range(*res['coord_ca'])])) 501 | if len(tm_coordinates) > 0: 502 | tm_coordinates = np.concatenate(np.array(tm_coordinates)) 503 | tm_com_y = np.mean(tm_coordinates[:,1]) 504 | print("shifted for transmembrane region by {} angstroms".format(tm_com_y)) 505 | self.rotated_coord -= np.array([0, tm_com_y, 0]) 506 | 507 | self._rescale_z = lambda z: (z-np.min(self.rotated_coord[:,-1]))/(np.max(self.rotated_coord[:,-1])-np.min(self.rotated_coord[:,-1])) 508 | polygons = [] 509 | groups = {} 510 | self._group_outlines = [] 511 | 512 | # default radius for rendering atoms 513 | if only_ca and radius is None: 514 | radius_ = 5 515 | elif only_backbone and radius is None: 516 | radius_ = 4 517 | elif radius is None: 518 | radius_ = 1.5 519 | else: 520 | radius_ = radius 521 | 522 | if by == 'all': 523 | # space-filling outline of entire molecule 524 | self.num_groups = 1 525 | if only_ca: 526 | coord_to_outline = self.rotated_coord[self.ca_atoms] 527 | elif only_backbone: 528 | coord_to_outline = self.rotated_coord[self.backbone_atoms] 529 | else: 530 | coord_to_outline = self.rotated_coord 531 | if depth == "contours": 532 | slice_coords = split_on_labels(coord_to_outline, get_z_slice_labels(coord_to_outline, width=depth_contour_interval)) 533 | for slice in slice_coords: 534 | slice_depth = self._rescale_z(np.mean(slice[:,-1])) 535 | polygons.append(({"depth":slice_depth}, so.unary_union([sg.Point(i).buffer(radius_) for i in slice]))) 536 | else: 537 | # depth=None and depth=flat are equivalent for by="all" 538 | polygons.append(({}, so.unary_union([sg.Point(i).buffer(radius_) for i in coord_to_outline]))) 539 | else: 540 | for res in self.residues_flat: 541 | 
# pick out each residue's atomic coordinates from the main data structure 542 | if only_ca: 543 | res_coords = np.array(self.rotated_coord[range(*res['coord_ca'])]) 544 | elif only_backbone: 545 | res_coords = np.array(self.rotated_coord[res['coord_backbone']]) 546 | else: 547 | res_coords = np.array(self.rotated_coord[range(*res['coord'])]) 548 | res["xyz"] = res_coords 549 | 550 | if by == 'residue': 551 | for res in sorted(self.residues_flat, key=lambda res: np.mean(res["xyz"][:,-1])): 552 | group_outline = so.unary_union([sg.Point(i).buffer(radius_) for i in res["xyz"] ]) 553 | res["polygon"] = group_outline 554 | res["depth"] = self._rescale_z(np.mean(res["xyz"][:,-1])) 555 | polygons.append((res, group_outline)) 556 | self.num_groups = 1 557 | 558 | elif by in ['domain', 'topology', 'chain']: 559 | 560 | if by in ['domain', 'topology']: 561 | assert self._uniprot_xml is not None, "Grouping by domain or topology requires UniProt data" 562 | 563 | # TODO comment code and be consistent with variable names group vs region 564 | residue_groups = group_by(self.residues_flat, key=lambda x: x.get(by)) 565 | groups = sorted(residue_groups.keys(), key=lambda x: (x is None, x)) 566 | 567 | self.num_groups = len(residue_groups) 568 | region_atoms = dict() # residue group to atomic indices 569 | total_atoms = 0 570 | for k,v in residue_groups.items(): 571 | region_atoms[k] = [] 572 | for res in v: 573 | if only_ca: 574 | region_atoms[k].extend(range(*res['coord_ca'])) 575 | elif only_backbone: 576 | region_atoms[k].extend(res['coord_backbone']) 577 | else: 578 | region_atoms[k].extend(range(*res['coord'])) 579 | region_atoms[k] = np.array(region_atoms[k], dtype=int) 580 | total_atoms += len(region_atoms[k]) 581 | 582 | if depth is not None: 583 | 584 | slice_labels = get_z_slice_labels(self.rotated_coord, width=depth_contour_interval) 585 | num_slices = np.max(slice_labels)+1 586 | 587 | if depth == "contours": 588 | for s in range(num_slices): 589 | for group_i, (group_name, group_res) in
enumerate(sorted(residue_groups.items(), key=lambda x: (x[0] is None, x))): 590 | if not only_annotated or group_name is not None: 591 | atom_indices = region_atoms[group_name] 592 | slice_coords = self.rotated_coord[atom_indices][slice_labels[atom_indices] == s] 593 | if len(slice_coords) > 0: 594 | slice_depth = self._rescale_z(np.mean(slice_coords[:,-1])) 595 | slice_outline = so.unary_union([sg.Point(c).buffer(radius_) for c in slice_coords]) 596 | polygons.append(({by:group_name, "depth":slice_depth}, slice_outline)) 597 | 598 | # back outline to highlight each group's contours (duplicates the depth == "flat" code below) 599 | if back_outline: 600 | empty_polygon = sg.Point((0,0)).buffer(0) 601 | view_object = empty_polygon 602 | region_polygons = dict() 603 | for slice in range(num_slices-1, -1, -1): # highest to lowest Z slice, including slice 0 604 | for group_i, (group_name, group_res) in enumerate(sorted(residue_groups.items(), key=lambda x: (x[0] is None, x))): 605 | if not only_annotated or group_name is not None: 606 | atom_indices = region_atoms[group_name] 607 | slice_coords = self.rotated_coord[atom_indices][slice_labels[atom_indices] == slice] 608 | poly = so.unary_union([sg.Point(c).buffer(radius_) for c in slice_coords]) 609 | this_difference = poly.difference(view_object) 610 | region_polygons[group_name] = region_polygons.get(group_name, empty_polygon).union(this_difference.buffer(0.01)) 611 | view_object = view_object.union(this_difference.buffer(0.01)) 612 | 613 | for v in region_polygons.values(): 614 | self._group_outlines.append(v) 615 | 616 | elif depth == "flat": 617 | empty_polygon = sg.Point((0,0)).buffer(0) 618 | view_object = empty_polygon 619 | region_polygons = dict() 620 | for slice in range(num_slices-1, -1, -1): # highest to lowest Z slice, including slice 0 621 | for group_i, (group_name, group_res) in enumerate(sorted(residue_groups.items(), key=lambda x: (x[0] is None, x))): 622 | if not only_annotated or group_name is not None: 623 | atom_indices = region_atoms[group_name] 624 | slice_coords =
self.rotated_coord[atom_indices][slice_labels[atom_indices] == slice] 625 | poly = so.unary_union([sg.Point(c).buffer(radius_) for c in slice_coords]) 626 | this_difference = poly.difference(view_object) 627 | region_polygons[group_name] = region_polygons.get(group_name, empty_polygon).union(this_difference.buffer(0.01)) 628 | view_object = view_object.union(this_difference.buffer(0.01)) 629 | 630 | for k,v in region_polygons.items(): 631 | polygons.append(({by:k}, v)) 632 | 633 | else: 634 | for group_i, (group_name, group_res) in enumerate(residue_groups.items()): 635 | if not only_annotated or group_name is not None: 636 | group_coords = self.rotated_coord[region_atoms[group_name]] 637 | polygons.append(({by:group_name}, so.unary_union([sg.Point(i).buffer(radius_) for i in group_coords]))) 638 | 639 | if back_outline: 640 | self._back_outline = so.unary_union([p[1].buffer(0.01) for p in polygons]) 641 | else: 642 | self._back_outline = None 643 | 644 | print("Outlined {} polygons!".format(len(polygons)), file=sys.stderr) 645 | 646 | return cellscape.Cartoon(self.name, polygons, self.residues_flat, by, self._back_outline, self._group_outlines, self.num_groups, get_dimensions(self.rotated_coord), groups) 647 | -------------------------------------------------------------------------------- /cellscape/util.py: -------------------------------------------------------------------------------- 1 | amino_acid_3letter = {'ALA':'A', 2 | 'ASX':'B', 3 | 'CYS':'C', 4 | 'ASP':'D', 5 | 'GLU':'E', 6 | 'PHE':'F', 7 | 'GLY':'G', 8 | 'HIS':'H', 9 | 'ILE':'I', 10 | 'LYS':'K', 11 | 'LEU':'L', 12 | 'MET':'M', 13 | 'MSE':'M', 14 | 'ASN':'N', 15 | 'PRO':'P', 16 | 'GLN':'Q', 17 | 'ARG':'R', 18 | 'SER':'S', 19 | 'THR':'T', 20 | 'VAL':'V', 21 | 'TRP':'W', 22 | 'XAA':'X', 23 | 'TYR':'Y', 24 | 'GLX':'Z'} 25 | 26 | def group_by(l, key): 27 | """Take a list of dictionaries and group them according to a key.""" 28 | d = dict() 29 | for i in l: 30 | k = key(i) 31 | if k in d: 32 | 
d[k].append(i) 33 | else: 34 | d[k] = [i] 35 | return d -------------------------------------------------------------------------------- /examples/ceacam5/P06731.xml: -------------------------------------------------------------------------------- 4 | P06731 5 | H9KVA7 6 | CEAM5_HUMAN 9 | Carcinoembryonic antigen-related cell adhesion molecule 5 12 | Carcinoembryonic antigen 13 | CEA 16 | Meconium antigen 100 18 | CD66e 21 | CEACAM5 22 | CEA 25 | Homo sapiens 26 | Human 29 | Eukaryota 30 | Metazoa 31 | Chordata 32 | Craniata 33 | Vertebrata 34 | Euteleostomi 35 | Mammalia 36 | Eutheria 37 | Euarchontoglires 38 | Primates 39 | Haplorrhini 40 | Catarrhini 41 | Hominidae 42 | Homo 47 | Isolation and characterization of full-length functional cDNA clones for human carcinoembryonic antigen. 58 | NUCLEOTIDE SEQUENCE [GENOMIC DNA] 59 | VARIANT GLU-398 63 | Carcinoembryonic antigen family: characterization of cDNAs coding for NCA and CEA and suggestion of nonrandom sequence variation in their conserved loop-domains. 73 | NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM 1) 74 | VARIANT GLU-398 78 | Cloning of the complete gene for carcinoembryonic antigen: analysis of its promoter indicates a region conveying cell type-specific expression. 93 | NUCLEOTIDE SEQUENCE [GENOMIC DNA] 94 | VARIANT GLU-398 98 | The DNA sequence and biology of human chromosome 19.
202 | NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA] 206 | Primary structure of human carcinoembryonic antigen (CEA) deduced from cDNA sequence. 215 | NUCLEOTIDE SEQUENCE [MRNA] OF 5-702 (ISOFORM 2) 216 | VARIANT GLU-398 220 | Isolation and characterization of cDNA clones encoding the human carcinoembryonic antigen reveal a highly conserved repeating structure. 230 | NUCLEOTIDE SEQUENCE [MRNA] OF 331-702 (ISOFORM 1) 231 | VARIANT GLU-398 235 | Cell adhesion activity of non-specific cross-reacting antigen (NCA) and carcinoembryonic antigen (CEA) expressed on CHO cell surface: homophilic and heterophilic adhesion. 247 | FUNCTION 248 | SUBCELLULAR LOCATION 252 | Expression of complementary DNA and genomic clones for carcinoembryonic antigen and nonspecific cross-reacting antigen in Chinese hamster ovary and mouse fibroblast cells and characterization of the membrane-expressed products.
263 | SUBCELLULAR LOCATION 264 | GPI-ANCHOR AT ALA-685 268 | Four carcinoembryonic antigen subfamily members, CEA, NCA, BGP and CGM2, selectively expressed in the normal human colonic epithelium, are integral components of the fuzzy coat. 277 | SUBCELLULAR LOCATION 278 | TISSUE SPECIFICITY 282 | Human carcinoembryonic antigen functions as a general inhibitor of anoikis. 291 | FUNCTION 295 | Self recognition in the Ig superfamily. Identification of precise subdomains in carcinoembryonic antigen required for intercellular adhesion. 307 | FUNCTION 308 | MUTAGENESIS OF SER-66; TYR-68; LYS-69 AND GLN-78 309 | SUBCELLULAR LOCATION 313 | Identification of N-linked glycoproteins in human saliva by glycoprotein capture and mass spectrometry. 325 | GLYCOSYLATION [LARGE SCALE ANALYSIS] AT ASN-560 327 | Saliva 332 | Glycoproteomics analysis of human liver tissue by combination of multiple enzyme digestion and hydrazide chemistry. 346 | GLYCOSYLATION [LARGE SCALE ANALYSIS] AT ASN-246 348 | Liver 353 | Diverse oligomeric states of CEACAM IgV domains. 364 | SUBUNIT 368 | Structural models for carcinoembryonic antigen and its complex with the single-chain Fv antibody molecule MFE23. 376 | 3D-STRUCTURE MODELING OF 35-676 380 | Binding of Dr adhesins of Escherichia coli to carcinoembryonic antigen triggers receptor dissociation.
396 | X-RAY CRYSTALLOGRAPHY (1.95 ANGSTROMS) OF 34-144 IN COMPLEX WITH E.COLI DR ADHESIN 397 | FUNCTION (MICROBIAL INFECTION) 398 | SUBUNIT 399 | SUBCELLULAR LOCATION 400 | MUTAGENESIS OF PHE-63; SER-66; VAL-73; ASP-74; GLN-78; ILE-125; LEU-129 AND GLU-133 403 | Cell surface glycoprotein that plays a role in cell adhesion, intracellular signaling and tumor progression (PubMed:2803308, PubMed:10910050, PubMed:10864933). Mediates homophilic and heterophilic cell adhesion with other carcinoembryonic antigen-related cell adhesion molecules, such as CEACAM6 (PubMed:2803308). Plays a role as an oncogene by promoting tumor progression; induces resistance to anoikis of colorectal carcinoma cells (PubMed:10910050). 406 | (Microbial infection) Receptor for E.coli Dr adhesins. Binding of E.coli Dr adhesins leads to dissociation of the homodimer. 409 | Homodimer. 413 | P06731 416 | P06731 419 | false 420 | 6 424 | P06731 427 | K0BRG7 430 | true 431 | 4 435 | Cell membrane 436 | Lipid-anchor 437 | GPI-anchor 440 | Apical cell membrane 443 | Cell surface 445 | Localized to the apical glycocalyx surface. 450 | P06731-1 451 | 1 455 | P06731-2 456 | 2 461 | Expressed in columnar epithelial and goblet cells of the colon (at protein level) (PubMed:10436421). Found in adenocarcinomas of endodermally derived digestive system epithelium and fetal colon. 464 | Complex immunoreactive glycoprotein with a MW of 180 kDa comprising 60% carbohydrate. 467 | Belongs to the immunoglobulin superfamily. CEA family.
851 | 3D-structure
852 | Alternative splicing
853 | Apoptosis
854 | Cell adhesion
855 | Cell membrane
856 | Disulfide bond
857 | Glycoprotein
858 | GPI-anchor
859 | Immunoglobulin domain
860 | Lipoprotein
861 | Membrane
862 | Oncogene
863 | Polymorphism
864 | Reference proteome
865 | Repeat
866 | Signal
1114 | I
1115 | V
1121 | V
1122 | A
1128 | Q
1129 | P
1135 | A
1136 | D
1142 | K
1143 | E
1149 | R
1150 | S
1156 | G
1157 | R
1163 | F
1164 | I
1170 | F
1171 | R
1177 | S
1178 | N
1184 | Y
1185 | A
1191 | Y
1192 | F
1198 | K
1199 | A
1205 | V
1206 | A
1212 | D
1213 | A
1219 | D
1220 | L
1221 | R
1227 | Q
1228 | L
1229 | R
1235 | I
1236 | A
1242 | L
1243 | A
1244 | C
1250 | L
1251 | S
1257 | E
1258 | A
1264 | F
1265 | L
1271 | T
1272 | Q
1278 | V
1279 | A
1453 | MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSISSNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTLTLFNVTRNDTASYKCETQNPVSARRSDSVILNVLYGPDAPTISPLNTSYRSGENLNLSCHAASNPPAQYSWFVNGTFQQSTQELFIPNITVNNSGSYTCQAHNSDTGLNRTTVTTITVYAEPPKPFITSNNSNPVEDEDAVALTCEPEIQNTTYLWWVNNQSLPVSPRLQLSNDNRTLTLLSVTRNDVGPYECGIQNKLSVDHSDPVILNVLYGPDDPTISPSYTYYRPGVNLSLSCHAASNPPAQYSWLIDGNIQQHTQELFISNITEKNSGLYTCQANNSASGHSRTTVKTITVSAELPKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLSNGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISPPDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNNGTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVALI
1456 | Copyrighted by the UniProt Consortium, see https://www.uniprot.org/terms
1457 | Distributed under the Creative Commons Attribution (CC BY 4.0) License
--------------------------------------------------------------------------------
/examples/ig/view:
--------------------------------------------------------------------------------
1 | -1.910422069964390068e-01 -5.326551199259592639e-01 8.244863191311111450e-01
2 | 9.099226597463622168e-01 2.189434646111362570e-01 3.522841670538991998e-01
3 | -3.681638186629090370e-01 8.175240438430914081e-01 4.428475206470018910e-01
4 |
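The `view` file holds a 3×3 matrix of floats in the space-delimited `%.18e` layout that `numpy.savetxt` writes by default. Its rows are unit length and mutually orthogonal with determinant +1 (to within about 1e-5), so it appears to be a rotation matrix, presumably a saved viewing orientation for the `1igt` example. How cellscape itself reads and applies this file is not shown here; the sketch below is a minimal illustration that assumes a row-vector convention, and `load_view` and `rotate` are hypothetical helpers, not cellscape functions.

```python
import io

import numpy as np

# Contents of examples/ig/view, copied verbatim.
VIEW_TXT = """\
-1.910422069964390068e-01 -5.326551199259592639e-01 8.244863191311111450e-01
9.099226597463622168e-01 2.189434646111362570e-01 3.522841670538991998e-01
-3.681638186629090370e-01 8.175240438430914081e-01 4.428475206470018910e-01
"""


def load_view(source, tol: float = 1e-4) -> np.ndarray:
    """Hypothetical helper: load a 3x3 matrix and check it is a proper rotation."""
    view = np.loadtxt(source)
    if view.shape != (3, 3):
        raise ValueError(f"expected a 3x3 matrix, got shape {view.shape}")
    # Orthonormal rows: R @ R.T == I (tolerance allows for rounding in the file).
    if not np.allclose(view @ view.T, np.eye(3), atol=tol):
        raise ValueError("matrix is not orthonormal")
    # Determinant +1 distinguishes a rotation from a reflection.
    if not np.isclose(np.linalg.det(view), 1.0, atol=tol):
        raise ValueError("matrix is not a proper rotation")
    return view


def rotate(coords: np.ndarray, view: np.ndarray) -> np.ndarray:
    """Hypothetical helper: rotate (N, 3) coordinates about their centroid.

    Assumed convention: coordinates are row vectors, multiplied by view.T.
    """
    center = coords.mean(axis=0)
    return (coords - center) @ view.T + center


view = load_view(io.StringIO(VIEW_TXT))
atoms = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
rotated = rotate(atoms, view)
# A rotation preserves interatomic distances, so this prints a value close to 1.0.
print(np.linalg.norm(rotated[1] - rotated[0]))
```

Rotating about the centroid keeps the structure in place while changing only its orientation, which is what a saved view is typically for.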
--------------------------------------------------------------------------------
/ig.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jordisr/cellscape/f1ee7b480440825cea2ddffc4db029bf0d240ea2/ig.png
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
1 | [build-system]
2 | requires = ["setuptools>=42"]
3 | build-backend = "setuptools.build_meta"
4 |
--------------------------------------------------------------------------------
/setup.cfg:
--------------------------------------------------------------------------------
1 | [metadata]
2 | name = cellscape
3 | version = 0.0.0
4 | author = Jordi Silvestre-Ryan
5 | description = Protein structure visualization with vector graphics cartoons
6 | long_description = file: README.md
7 | long_description_content_type = text/markdown
8 | url = https://github.com/jordisr/cellscape
9 | project_urls =
10 |     Bug Tracker = https://github.com/jordisr/cellscape/issues
11 | classifiers =
12 |     Programming Language :: Python :: 3
13 |     Operating System :: OS Independent
14 |
15 | [options]
16 | packages = find:
17 | python_requires = >=3.6
18 | install_requires =
19 |     numpy
20 |     scipy
21 |     matplotlib
22 |     shapely<1.8
23 |     biopython>=1.75
24 |     nglview
25 |
26 | [options.entry_points]
27 | console_scripts =
28 |     cellscape = cellscape.cli:main
29 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from setuptools import setup
2 |
3 | if __name__ == '__main__':
4 |     setup()
5 |
--------------------------------------------------------------------------------
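`setup.cfg` is an INI-style file, and its `console_scripts` entry maps the `cellscape` command to the `main` function in `cellscape.cli`. The sketch below reads an abbreviated copy of the `[options]` and `[options.entry_points]` sections with the standard-library `configparser` to show that mapping and the dependency list. It is an illustration only: `read_setup_cfg` is a hypothetical helper, and setuptools layers its own handling on top (for example for the `find:` and `file:` directives).

```python
import configparser

# Abbreviated copy of two sections of setup.cfg (continuation lines indented).
SETUP_CFG = """\
[options]
packages = find:
python_requires = >=3.6
install_requires =
    numpy
    scipy
    matplotlib
    shapely<1.8
    biopython>=1.75
    nglview

[options.entry_points]
console_scripts =
    cellscape = cellscape.cli:main
"""


def read_setup_cfg(text: str):
    """Hypothetical helper: return (install_requires, console_scripts) from setup.cfg text."""
    # interpolation=None so any '%' in a value is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    # Multi-line values start with an empty line; skip blank entries.
    requires = [
        line.strip()
        for line in parser["options"]["install_requires"].splitlines()
        if line.strip()
    ]
    scripts = {}
    for line in parser["options.entry_points"]["console_scripts"].splitlines():
        if not line.strip():
            continue
        # Each entry has the form "command = package.module:function".
        name, target = (part.strip() for part in line.split("=", 1))
        module, func = target.split(":", 1)
        scripts[name] = (module, func)
    return requires, scripts


requires, scripts = read_setup_cfg(SETUP_CFG)
print(requires)
print(scripts)
```

After installation (for example with `pip install .`), setuptools generates a `cellscape` executable that imports `cellscape.cli` and calls `main()`.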