├── .gitignore
├── CITATION.bib
├── LICENSE
├── README.md
├── cellscape
│   ├── __init__.py
│   ├── cartoon.py
│   ├── cli.py
│   ├── interface.py
│   ├── parse_alignment.py
│   ├── parse_uniprot_xml.py
│   ├── scene.py
│   ├── structure.py
│   └── util.py
├── examples
│   ├── cartoon.ipynb
│   ├── ceacam5
│   │   ├── P06731.xml
│   │   └── ceacam5.pdb
│   └── ig
│       ├── 1igt.pdb
│       └── view
├── ig.png
├── pyproject.toml
├── setup.cfg
└── setup.py
/.gitignore:
--------------------------------------------------------------------------------
1 | __pycache__
2 | *.egg-info
3 | *.DS_Store
4 | */.ipynb_checkpoints
5 |
--------------------------------------------------------------------------------
/CITATION.bib:
--------------------------------------------------------------------------------
1 | @article {Silvestre-Ryan2022.06.14.495869,
2 | author = {Silvestre-Ryan, Jordi and Fletcher, Daniel A. and Holmes, Ian},
3 | title = {CellScape: Protein structure visualization with vector graphics cartoons},
4 | elocation-id = {2022.06.14.495869},
5 | year = {2022},
6 | doi = {10.1101/2022.06.14.495869},
7 | publisher = {Cold Spring Harbor Laboratory},
8 | abstract = {Motivation: Illustrative renderings of proteins are useful aids for scientific communication and education. Nevertheless, few software packages exist to automate the generation of these visualizations. Results: We introduce CellScape, a tool designed to generate 2D molecular cartoons from atomic coordinates and combine them into larger cellular scenes. These illustrations can outline protein regions in different levels of detail. Unlike most molecular visualization tools which use raster image formats, these illustrations are represented as vector graphics, making them easily editable and composable with other graphics. Availability and Implementation: CellScape is implemented in Python 3 and freely available at https://github.com/jordisr/cellscape. It can be run as a command-line tool or interactively in a Jupyter notebook. Competing Interest Statement: The authors have declared no competing interest.},
9 | URL = {https://www.biorxiv.org/content/early/2022/06/16/2022.06.14.495869},
10 | eprint = {https://www.biorxiv.org/content/early/2022/06/16/2022.06.14.495869.full.pdf},
11 | journal = {bioRxiv}
12 | }
13 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2022 Jordi Silvestre-Ryan
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # CellScape: Protein structure visualization with vector graphics cartoons
2 |
3 |
4 | ## Installation
5 | To run CellScape you will need:
6 | * Python 3
7 | * [PyMOL](https://pymol.org/2/) or [Chimera](https://www.cgl.ucsf.edu/chimera/) (optional, needed to orient the protein if not using the Jupyter notebook interface)
8 |
9 | CellScape and its dependencies can be installed with:
10 |
11 | ```
12 | git clone https://github.com/jordisr/cellscape
13 | cd cellscape
14 | pip install -e .
15 | ```
16 |
17 | ## Making a cartoon from a PDB structure
18 |
19 | ### Jupyter notebook interface
20 | The most interactive way of building cartoons is through the Python package interface. An example notebook is provided [here](examples/cartoon.ipynb).
21 |
22 | ### Command-line interface
23 |
24 | Cartoons can also be built in one go from the command line, as illustrated below.
25 |
26 | #### Generating molecular outlines
27 | The following examples should yield images similar to the top figure (from right to left):
28 |
29 | The simplest visualization is a space-filling outline of the entire structure.
30 | The `--view` option specifies the camera rotation matrix (see [below](#exporting-the-camera-view)).
31 |
32 | ```
33 | cellscape cartoon --pdb examples/ig/1igt.pdb --view examples/ig/view --outline all --save outline_all.svg
34 | ```
35 |
36 | The `--outline` option specifies which regions of the protein to outline (each residue, each chain, the entire molecule, etc.).
37 | In the following example we outline each chain separately.
38 | The `--depth flat` option ensures that if the chains overlap, only the portion that is visible (i.e. closer to the camera) is incorporated into the outline.
39 |
40 | ```
41 | cellscape cartoon --pdb examples/ig/1igt.pdb --view examples/ig/view --outline chain --depth flat --save outline_chain.svg
42 | ```
43 |
44 | The most realistic visualization outlines each residue separately.
45 | Shading by residue depth is used to simulate 3D lighting in a style inspired by [David Goodsell](https://pdb101.rcsb.org/motm/21).
46 |
47 | ```
48 | cellscape cartoon --pdb examples/ig/1igt.pdb --view examples/ig/view --outline residue --color_by chain --depth_shading --depth_lines --save outline_residue.svg
49 | ```
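The depth shading described above can be sketched with the standard library alone. This is a minimal illustration that follows the lightness interpolation of `shade_from_color` in `cellscape/cartoon.py`, with two assumptions: the color is given as a plain RGB tuple rather than a matplotlib color, and `depth` is normalized to [0, 1] with larger values nearer the camera. The name `shade_by_depth` is hypothetical and not part of the package.

```python
import colorsys

def shade_by_depth(rgb, depth, shading_range=0.4):
    """Shift the lightness of an RGB color according to depth in [0, 1].

    depth=0 gives the darkest shade and depth=1 the lightest; the total
    lightness spread is shading_range, clipped to the valid [0, 1] interval.
    """
    r, g, b = rgb
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    l_dark = max(l - shading_range / 2, 0.0)
    l_light = min(l + shading_range / 2, 1.0)
    # linear interpolation between the dark and light variants
    l_new = l_dark * (1 - depth) + l_light * depth
    return colorsys.hls_to_rgb(h, l_new, s)

# Print three shades of one base color, from darkest to lightest.
base = (0.12, 0.47, 0.71)
for depth in (0.0, 0.5, 1.0):
    print(depth, tuple(round(c, 3) for c in shade_by_depth(base, depth)))
```

Working in HLS space keeps hue and saturation fixed, so every residue of a chain stays recognizably the same color while its lightness changes with depth.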
50 |
51 | A full description of all options is available by running `cellscape cartoon -h`.
52 |
53 | ### Exporting the camera view
54 | The camera orientation can be set interactively through the Jupyter notebook interface. To use the command-line interface, however, you will need a separate file containing the rotation matrix.
55 | One option is to export it from another molecular visualization tool (PyMOL and Chimera formats are currently supported).
56 |
57 | #### PyMOL
58 | Open the protein structure in PyMOL, and choose the desired rotation (zoom is irrelevant). Next, enter `get_view` in the PyMOL console. The output should look something like this:
59 | ```
60 | ### cut below here and paste into script ###
61 | set_view (\
62 | -0.273240060, -0.516133010, 0.811750829,\
63 | 0.870557129, 0.226309016, 0.436930388,\
64 | -0.409222305, 0.826064587, 0.387488008,\
65 | 0.000000000, 0.000000000, -544.673034668,\
66 | -0.071666718, -17.390396118, 8.293336868,\
67 | 455.182373047, 634.163574219, -20.000000000 )
68 | ### cut above here and paste into script ###
69 | ```
70 | Copy and paste the indicated region (between the ### lines) into a new text file, which can be passed to CellScape.
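For readers curious about what such a view file contains, here is a minimal sketch of pulling the rotation matrix out of `get_view` output. It relies on the fact that PyMOL prints 18 values, of which the first nine form the 3x3 rotation matrix. The function name `parse_pymol_view` is hypothetical, and this is not necessarily how CellScape's own `load_pymol_view` handles the file (for example, it may transpose the matrix).

```python
import re

def parse_pymol_view(text):
    """Return the 3x3 rotation matrix (list of rows) from PyMOL get_view output."""
    # get_view prints every value with a decimal point, so match those only
    numbers = [float(tok) for tok in re.findall(r"-?\d+\.\d+", text)]
    if len(numbers) != 18:
        raise ValueError(f"expected 18 values in a PyMOL view, found {len(numbers)}")
    # the first nine values are the rotation matrix, in the order printed
    return [numbers[0:3], numbers[3:6], numbers[6:9]]

SAMPLE_VIEW = r"""set_view (\
    -0.273240060,   -0.516133010,    0.811750829,\
     0.870557129,    0.226309016,    0.436930388,\
    -0.409222305,    0.826064587,    0.387488008,\
     0.000000000,    0.000000000, -544.673034668,\
    -0.071666718,  -17.390396118,    8.293336868,\
   455.182373047,  634.163574219,  -20.000000000 )"""

for row in parse_pymol_view(SAMPLE_VIEW):
    print(row)
```

The remaining nine values describe camera position and clipping, which is why the zoom level does not matter here.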
71 |
72 | #### Chimera
73 | Open the protein structure in Chimera, and choose the desired rotation (zoom is irrelevant).
74 | Enter the command `matrixget` (if no output filename is given it will prompt you for one).
75 | This will write the rotation matrix to a file that can be understood by CellScape.
76 | It should look something like this:
77 | ```
78 | Model 0.0
79 | -0.607365 0.792409 0.0565265 9.04218
80 | -0.309318 -0.301425 0.901923 -30.7393
81 | 0.731731 0.530312 0.428181 15.789
82 | ```
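The two export formats can be told apart by their first line, which is how `make_cartoon` in `cellscape/cartoon.py` chooses a loader (`set_view` for PyMOL, `Model` for Chimera, anything else treated as a plain matrix file). The sketch below reproduces that check and adds a simple reader for the Chimera layout, assuming the three rows hold a 3x3 rotation followed by a translation column. Both function names are hypothetical and are not the package's API.

```python
def detect_view_format(first_line):
    """Classify a view file by its first line."""
    if first_line.startswith("set_view"):
        return "pymol"
    if first_line.startswith("Model"):
        return "chimera"
    return "matrix"

def parse_chimera_matrix(text):
    """Split Chimera matrixget output into (rotation rows, translation)."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    if not lines or not lines[0] or lines[0][0] != "Model":
        raise ValueError("expected a 'Model' header line")
    rows = [[float(v) for v in fields] for fields in lines[1:4]]
    if len(rows) != 3 or any(len(row) != 4 for row in rows):
        raise ValueError("expected three rows of four values")
    rotation = [row[:3] for row in rows]    # 3x3 rotation part
    translation = [row[3] for row in rows]  # assumed translation column
    return rotation, translation

SAMPLE_MATRIX = """Model 0.0
    -0.607365 0.792409 0.0565265 9.04218
    -0.309318 -0.301425 0.901923 -30.7393
    0.731731 0.530312 0.428181 15.789
"""

rotation, translation = parse_chimera_matrix(SAMPLE_MATRIX)
print(detect_view_format(SAMPLE_MATRIX.splitlines()[0]), rotation, translation)
```

Only the rotation part is relevant for orienting a cartoon; the translation is shown for completeness.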
83 |
84 | ## Composing cartoons into a cellular scene
85 |
86 | Re-running the above `cellscape cartoon` examples with the `--export` flag will write each cartoon's data to a Python pickle file, which can then be read by `cellscape scene`.
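The exported file name is derived from `--save` by swapping the extension for `.pickle` (so `--save outline_all.svg --export` writes `outline_all.pickle`). According to `Cartoon.export`, the pickled object is a dict with the keys `polygons`, `name`, `width`, `height`, `start`, `end`, `top` and `bottom`. The following sketch inspects such a file. The helper name `summarize_cartoon_pickle` is hypothetical, and loading a real export assumes the libraries used for the stored objects (Shapely, NumPy) are importable. As with any pickle, only load files you trust.

```python
import pickle

# Keys written by Cartoon.export in cellscape/cartoon.py
EXPECTED_KEYS = ("polygons", "name", "width", "height", "start", "end", "top", "bottom")

def summarize_cartoon_pickle(path):
    """Load an exported cartoon and return a short summary dict."""
    with open(path, "rb") as handle:
        data = pickle.load(handle)
    missing = [key for key in EXPECTED_KEYS if key not in data]
    if missing:
        raise KeyError(f"not a cartoon export, missing keys: {missing}")
    return {
        "name": data["name"],
        "n_polygons": len(data["polygons"]),
        "width": data["width"],
        "height": data["height"],
    }
```

For example, `summarize_cartoon_pickle("outline_all.pickle")` would report how many styled polygons the cartoon contains and its dimensions.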
87 |
88 | The simplest usage of `cellscape scene` takes a list of pickled cartoons as input and lays them out in a row, preserving the relative sizes of each protein.
89 | The `--padding` option specifies how far apart each protein should be (in angstroms).
90 |
91 | ```
92 | cellscape scene --files outline_residue.pickle outline_chain.pickle outline_all.pickle --padding 10 --save scene.png
93 | ```
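The row layout can be pictured as a running sum of cartoon widths plus padding, all in angstroms so that relative sizes are preserved. The sketch below is an illustration under that assumption only: `scene.py` is not reproduced here, and `layout_row` is a hypothetical helper rather than the package's implementation.

```python
def layout_row(widths, padding=0.0):
    """Return (x offsets, total width) for cartoons placed left to right.

    widths  -- cartoon widths in angstroms, in drawing order
    padding -- horizontal gap inserted between neighbouring cartoons
    """
    offsets = []
    x = 0.0
    for width in widths:
        offsets.append(x)  # left edge of this cartoon
        x += width + padding
    # no padding after the last cartoon
    total = x - padding if widths else 0.0
    return offsets, total

offsets, total = layout_row([100.0, 50.0, 80.0], padding=10)
print(offsets, total)
```

Because every cartoon keeps its own width in the same unit, a small protein stays small next to a large one instead of being rescaled to fill the frame.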
94 |
95 | A full description of all options is available by running `cellscape scene -h`.
96 |
--------------------------------------------------------------------------------
/cellscape/__init__.py:
--------------------------------------------------------------------------------
1 | from .cartoon import Cartoon
2 | from .structure import Structure
3 | from .interface import plot_pairs
4 |
--------------------------------------------------------------------------------
/cellscape/cartoon.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import matplotlib
3 | import matplotlib.pyplot as plt
4 | from matplotlib.path import Path
5 | from matplotlib.patches import PathPatch
6 | import matplotlib.colors as mcolors
7 | from matplotlib.colors import LinearSegmentedColormap
8 | import shapely.geometry as sg
9 | import shapely.ops as so
10 | import pickle
11 | import os
12 | import sys
13 | import colorsys
14 | from Bio.PDB import *
15 |
16 | import cellscape
17 |
18 | def scale_line_width(x, lw_min, lw_max):
19 |     return lw_max*(1-x) + lw_min*x
20 |
21 | def shade_from_color(color, x, range):
22 |     (r, g, b, a) = mcolors.to_rgba(color)
23 |     h, l, s = colorsys.rgb_to_hls(r,g,b)
24 |     l_dark = max(l-range/2, 0)
25 |     l_light = min(l+range/2, 1)
26 |     l_new = l_dark*(1-x) + l_light*x
27 |     return colorsys.hls_to_rgb(h, l_new, s)
28 |
29 | def get_sequential_colors(colors='Set1', n=1):
30 |     """
31 |     Sample n colors sequentially from a named matplotlib ColorMap.
32 |     """
33 |     # uses matplotlib.colors.ColorMap.N to distinguish continuous/discrete
34 |     cmap = matplotlib.cm.get_cmap(colors)
35 |     if cmap.N == 256:
36 |         # continuous color map
37 |         sequential_colors = [cmap(x) for x in np.linspace(0.0,1.0, n)]
38 |     else:
39 |         # discrete color map
40 |         sequential_colors = [cmap(x) for x in range(n)]
41 |     return sequential_colors
42 |
43 | def smooth_polygon(p, level=0):
44 |     # somewhat arbitrary but a lot easier than interpolation
45 |     if level == 0:
46 |         return p.simplify(0.3).buffer(-2, join_style=1).buffer(3, join_style=1)
47 |     elif level == 1:
48 |         return p.simplify(1).buffer(3, join_style=1).buffer(-5, join_style=1).buffer(4, join_style=1)
49 |     elif level == 2:
50 |         return p.simplify(3).buffer(5, join_style=1).buffer(-9, join_style=1).buffer(5, join_style=1)
51 |     elif level == 3:
52 |         return p.simplify(0.1).buffer(2, join_style=1)
53 |     else:
54 |         return p
55 |
56 | def ring_coding(ob):
57 |     # https://sgillies.net/2010/04/06/painting-punctured-polygons-with-matplotlib.html
58 |     # The codes will be all "LINETO" commands, except for "MOVETO"s at the
59 |     # beginning of each subpath
60 |     #n = len(ob.coords)
61 |     n = len(np.asarray(ob))
62 |     codes = np.ones(n, dtype=Path.code_type) * Path.LINETO
63 |     codes[0] = Path.MOVETO
64 |     return codes
65 |
66 | def placeholder_polygon(height, buffer_width=25, origin=[0,0]):
67 | return sg.LineString([(buffer_width+origin[0],0+origin[1]),(buffer_width+origin[0],height+origin[1])]).buffer(buffer_width)
68 |
69 | def composite_polygon(cartoon, height_before, height_after, buffer_width=25):
70 | # placeholder + structure cartoon + placeholder
71 | if height_before > 0:
72 | before_poly = placeholder_polygon(height_before, origin=cartoon.bottom_coord[:2]-[buffer_width,height_before], buffer_width=buffer_width)
73 | cartoon._styled_polygons.append({"polygon":before_poly, "facecolor":"#eeeeee", "shade":0.5, "edgecolor":'black', "linewidth":1, "zorder":-1})
74 |
75 | if height_after > 0:
76 | after_poly = placeholder_polygon(height_after, origin=cartoon.top_coord[:2]-[buffer_width, 0], buffer_width=buffer_width)
77 | cartoon._styled_polygons.append({"polygon":after_poly, "facecolor":"#eeeeee", "shade":0.5, "edgecolor":'black', "linewidth":1, "zorder":-1})
78 |
79 | cartoon.image_height = cartoon.image_height + buffer_width + height_before + height_after
80 | cartoon.bottom_coord = cartoon.bottom_coord - np.array([0,height_before,0])
81 | cartoon.top_coord = cartoon.top_coord + np.array([0,height_after,0])
82 |
83 | def export_placeholder(height, name, fname, buffer_width=25):
84 | # placeholder by itself
85 | poly = placeholder_polygon(height, origin=[buffer_width, 0], buffer_width=buffer_width)
86 | styled_polygons = [{"polygon":poly, "facecolor":"#eeeeee", "shade":0.5, "edgecolor":'black', "linewidth":1, "zorder":-1}]
87 |
88 | data = {'polygons':styled_polygons, 'name':name, 'width':buffer_width*2, 'height':height+buffer_width, 'start':np.array([buffer_width,0]), 'end':np.array([height+2*buffer_width,0]), 'bottom':np.array([buffer_width,0]), 'top':np.array([height+2*buffer_width,0])}
89 |
90 | with open('{}.pickle'.format(fname),'wb') as f:
91 | pickle.dump(data, f)
92 |
93 | def transform_coord(xy, translate_post=np.array([0,0]), translate_pre=np.array([0,0]), scale=1.0, flip=False):
94 |     # 2d coordinates
95 |     xy_ = xy
96 |     if translate_pre is not None:
97 |         # optionally shift coordinates before rotation (new array, so the caller's input is not modified in place)
98 |         xy_ = xy_ + translate_pre
99 |     if flip:
100 |         xy_ = np.dot(xy_, np.array([[-1,0],[0,-1]]).T)
101 |         #xy_ = np.dot(xy_, np.array([[-1,0],[0,-1]]))
102 |     #offset_x = np.min(xy_[:,0])
103 |     #offset_y = np.min(xy_[:,1])
104 |     #xy_ -= np.array([offset_x, offset_y])
105 |     return (xy_+translate_post)*scale
106 |
107 | def polygon_to_path(polygon, min_interior_length=40, translate_pre=np.array([0,0]), translate_post=np.array([0,0]), scale=1.0, flip=False):
108 | # generate matplotlib Path object from Shapely polygon
109 | # filter out small interior holes and apply a scaling factor if desired
110 | #
111 | # https://sgillies.net/2010/04/06/painting-punctured-polygons-with-matplotlib.html
112 | # Convert coordinates to path vertices. Objects produced by Shapely's
113 | # analytic methods have the proper coordinate order, no need to sort.
114 | interiors = list(filter(lambda x: x.length > min_interior_length, polygon.interiors))
115 | vertices = np.concatenate(
116 | [np.asarray(polygon.exterior)]
117 | + [np.asarray(r) for r in interiors])
118 | codes = np.concatenate(
119 | [ring_coding(polygon.exterior)]
120 | + [ring_coding(r) for r in interiors])
121 | transformed_vertices = transform_coord(vertices, translate_pre=translate_pre, translate_post=translate_post, scale=scale, flip=flip)
122 | return Path(transformed_vertices, codes)
123 |
124 | def plot_polygon(poly, facecolor='orange', edgecolor='k', linewidth=0.7, axes=None, zorder_mod=0, translate_pre=np.array([0,0]), translate_post=np.array([0,0]), scale=1.0, flip=False, min_area=7, linestyle='solid'):
125 |     """Draw a Shapely polygon using matplotlib Patches."""
126 |     if axes is None:
127 |         axs = plt.gca()
128 |         axs.set_aspect('equal')
129 |     else:
130 |         axs = axes
131 |     if isinstance(poly, sg.polygon.Polygon):
132 |         if poly.area > min_area:
133 |             path = polygon_to_path(poly, translate_pre=translate_pre, translate_post=translate_post, scale=scale, flip=flip)
134 |             patch = PathPatch(path, facecolor=facecolor, edgecolor=edgecolor, linewidth=linewidth, zorder=3+zorder_mod, linestyle=linestyle)  # honor the edgecolor argument
135 |             axs.add_patch(patch)
136 |     elif isinstance(poly, sg.multipolygon.MultiPolygon):
137 |         for p in poly.geoms:  # .geoms is required for iteration in Shapely 2.x
138 |             plot_polygon(p, axes=axs, facecolor=facecolor, edgecolor=edgecolor, linewidth=linewidth, scale=scale, zorder_mod=zorder_mod, translate_pre=translate_pre, translate_post=translate_post, flip=flip, min_area=min_area, linestyle=linestyle)
139 |
140 | class Cartoon:
141 | """A class for molecular outlines generated by Structure class"""
142 | def __init__(self, name, polygons, residues, outline_by, back_outline, group_outlines, num_groups, dimensions, groups):
143 | # TODO currently just copying over all variables needed, should condense a little
144 | self.name = name
145 | self._polygons = polygons
146 | self.residues_flat = residues
147 | self.outline_by = outline_by
148 | self.num_groups = num_groups
149 | self.groups = groups
150 | self._back_outline = back_outline
151 | self._group_outlines = group_outlines
152 | self.dimensions = dimensions
153 |
154 | def plot(self, colors=None, axes_labels=False, color_residues_by=None, edge_color="black", line_width=0.7,
155 | depth_shading=False, depth_lines=False, shading_range=0.4, smoothing=False, do_show=True, axes=None, save=None, dpi=300, placeholder=None):
156 | """Plot styled protein cartoon
157 |
158 | Color schemes for plotting can be specified in multiple ways
159 | - named matplotlib-compatible color e.g. "red" (string)
160 | - hexadecimal color e.g. "#F8F8FF" (string)
161 | - list/tuple of colors e.g. ["red", "#F8F8FF"] (list/tuple)
162 | - dict of names to colors e.g. {"domain A": "red", "domain B":"blue"} (dict)
163 | - named discrete or continuous color scheme e.g. "Set1" (string)
164 |
165 | By default, plot() creates a new matplotlib Axes instance, though one can be passed explicitly.
166 | This mirrors Biopython's phylogeny drawing https://biopython.org/DIST/docs/api/Bio.Phylo._utils-module.html.
167 |
168 | Args:
169 | colors (optional): Explicitly pass color scheme (see description). Defaults to None.
170 | axes_labels (bool, optional): Include axes labels on plot. Defaults to False.
171 | color_residues_by (str, optional): If outlining all residues, color based on attribute (e.g. "chain"). Defaults to None.
172 | edge_color (str, optional): Color for outline edges. Defaults to "black".
173 | line_width (float, optional): Width of outlines. Defaults to 0.7.
174 | depth_shading (bool, optional): Use lighter shades for outlines closer to the front. Defaults to False.
175 | depth_lines (bool, optional): Use thinner lines for outlines closer to the front. Defaults to False.
176 | shading_range (float, optional): Dynamic range for depth_shading effect. Defaults to 0.4.
177 | smoothing (bool, optional): Apply smoothing to polygons. Defaults to False.
178 | do_show (bool, optional): Whether to show figure (otherwise just returns Axes object). Defaults to True.
179 | axes (Axes, optional): Explicitly pass matplotlib Axes object. Defaults to None.
180 | save (str, optional): Path to save cartoon image. Defaults to None.
181 | dpi (int, optional): DPI of rasterized images. Defaults to 300.
182 | placeholder (float, optional): Specify expected protein height (in angstroms). Will add a placeholder shape to add up to total height. Defaults to None.
183 |
184 | Returns:
185 | Axes: Returns matplotlib Axes if do_show=False, otherwise return None
186 | """
187 | self._styled_polygons = []
188 |
189 | if axes is None:
190 | # create a new matplotlib figure if none provided
191 | fig, axs = plt.subplots()
192 | else:
193 | assert(isinstance(axes, matplotlib.axes.Axes))
194 | axs = axes
195 |
196 | if axes_labels:
197 | axs.axis('on')
198 | axs.set_axis_on()
199 | axs.xaxis.grid(False)
200 | axs.yaxis.grid(True)
201 | axs.axes.xaxis.set_ticklabels([])
202 | else:
203 | axs.axis('off')
204 | axs.set_axis_off()
205 | #plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, hspace = 0, wspace = 0)
206 | #axs.xaxis.set_major_locator(plt.NullLocator())
207 | #axs.yaxis.set_major_locator(plt.NullLocator())
208 |
209 | # color schemes
210 | default_color = 'tab:blue'
211 | default_cmap = 'Set1'
212 | named_colors = [*mcolors.BASE_COLORS.keys(), *mcolors.TABLEAU_COLORS.keys(), *mcolors.CSS4_COLORS.keys(), *mcolors.XKCD_COLORS.keys()]
213 |
214 | # if outlining residues don't know number of color groups until plot is called
215 | if self.outline_by == "residue":
216 | if color_residues_by is None:
217 | num_colors_needed = 1
218 | residue_color_groups = {"all":self.residues_flat}
219 | else:
220 | residue_color_groups = cellscape.util.group_by(self.residues_flat, lambda x: x.get(color_residues_by))
221 | num_colors_needed = len(residue_color_groups)
222 | self.num_groups = num_colors_needed
223 |
224 | # parse options and get list of base colors needed for plotting
225 | if colors is None:
226 | # choose default sequential color scheme based on number of colors needed
227 | if self.num_groups == 1:
228 | sequential_colors = [default_color]
229 | elif self.num_groups <= 9:
230 | sequential_colors = get_sequential_colors(colors="Set1", n=self.num_groups)
231 | elif self.num_groups <= 10:
232 | sequential_colors = get_sequential_colors(colors="tab10", n=self.num_groups)
233 | else:
234 | sequential_colors = get_sequential_colors(colors="tab20", n=self.num_groups)
235 | else:
236 | if isinstance(colors, dict):
237 | sequential_colors = []
238 | else:
239 | if isinstance(colors, str):
240 | if self.num_groups == 1:
241 | sequential_colors = [colors]
242 | else:
243 | sequential_colors = get_sequential_colors(colors=colors, n=self.num_groups)
244 | elif isinstance(colors, (list, tuple)):
245 | if self.num_groups == 1:
246 | if (len(colors) == 4) or (len(colors) == 3):
247 | # assume single RGBA or RGB color
248 | sequential_colors = [colors]
249 | else:
250 | sequential_colors = [colors[0]]
251 | elif self.num_groups == len(colors):
252 | sequential_colors = colors
253 | else:
254 | sys.exit("Insufficient colors provided")
255 | assert(len(sequential_colors) == self.num_groups)
256 |
257 | # color scheme represented as dict that maps group names to colors
258 | if self.outline_by == "residue":
259 | if len(sequential_colors) > 0:
260 | color_map = {k:sequential_colors[i] for i,k in enumerate(residue_color_groups.keys())}
261 | else:
262 | color_map = colors
263 | elif self.outline_by == "all":
264 | color_map = {None:sequential_colors[0]}
265 | else:
266 | if len(sequential_colors) > 0:
267 | color_map = {k:sequential_colors[i] for i,k in enumerate(self.groups)}
268 | else:
269 | color_map = colors
270 | assert(isinstance(color_map, dict))
271 |
272 | if self._back_outline is not None:
273 | if smoothing:
274 | smoothed_poly = smooth_polygon(self._back_outline, level=3)
275 | plot_polygon(smoothed_poly, facecolor="None", scale=1.0, axes=axs, edgecolor=edge_color, linewidth=line_width, zorder_mod=-1)
276 | self._styled_polygons.append({"polygon":smoothed_poly, "facecolor":"None", "edgecolor":edge_color, "linewidth":line_width})
277 | else:
278 | plot_polygon(self._back_outline, facecolor="None", scale=1.0, axes=axs, edgecolor=edge_color, linewidth=line_width, zorder_mod=-1)
279 | self._styled_polygons.append({"polygon":self._back_outline, "facecolor":"None", "edgecolor":edge_color, "linewidth":line_width})
280 |
281 | if len(self._group_outlines) > 0:
282 | for p in self._group_outlines:
283 | plot_polygon(p, facecolor="None", scale=1.0, axes=axs, edgecolor=edge_color, linewidth=line_width, zorder_mod=2)
284 | self._styled_polygons.append({"polygon":p, "facecolor":"None", "edgecolor":edge_color, "linewidth":line_width, "zorder":2})
285 |
286 | # TODO optionally show placeholder for unstructured regions
287 | if placeholder is not None:
288 | placeholder_poly = placeholder_polygon(placeholder-self.image_height, origin=[self.image_width/2-25, self.image_height+25])
289 | self._styled_polygons.append({"polygon":placeholder_poly, "facecolor":"None", "shade":0.5, "edgecolor":'black', "linewidth":1})
290 | plot_polygon(placeholder_poly, facecolor="#eeeeee", scale=1.0, axes=axs, edgecolor='black', linewidth=1, zorder_mod=-1)
291 | self.image_height = 25 + placeholder
292 |
293 | # main plotting loop
294 | for i, p in enumerate(self._polygons):
295 | if smoothing:
296 | poly_to_draw = smooth_polygon(p[1], level=3)
297 | else:
298 | poly_to_draw = p[1]
299 |
300 | # look up color for polygon
301 | if self.outline_by == "residue":
302 | key_for_color = p[0].get(color_residues_by)
303 | else:
304 | key_for_color = p[0].get(self.outline_by)
305 | fc = color_map.get(key_for_color, sequential_colors[0])
306 | base_fc = fc # store original color as well as shading
307 |
308 | shade_value = None
309 | if depth_shading:
310 | #fc = shade_from_color(fc, i/len(self._polygons), range=shading_range)
311 | shade_value = p[0].get("depth", 0.5)
312 | fc = shade_from_color(fc, shade_value, range=shading_range)
313 | if depth_lines:
314 | shade_value = p[0].get("depth", 0.5)
315 | lw = scale_line_width(shade_value, 0, 0.5)
316 | else:
317 | lw = line_width
318 | plot_polygon(poly_to_draw, facecolor=fc, axes=axs, edgecolor=edge_color, linewidth=lw)
319 | self._styled_polygons.append({"polygon":poly_to_draw, "facecolor":fc, "edgecolor":edge_color, "linewidth":lw, "shade":shade_value, "base_fc":base_fc})
320 |
321 | axs.set_aspect('equal')
322 | axs.margins(0,0)
323 | self._axes= axs
324 |
325 | if save is not None:
326 | file_ext = os.path.splitext(save)[1].lower()
327 | assert file_ext in ['.png','.pdf','.svg','.ps'], "Image file extension not supported"
328 | #plt.gcf().savefig(save, dpi=dpi, transparent=True, pad_inches=0, bbox_inches='tight')
329 | axs.figure.savefig(save, dpi=dpi, transparent=True, pad_inches=0, bbox_inches='tight')  # also works when an existing Axes was passed in
330 |
331 | if do_show:
332 | plt.show()
333 | else:
334 | return axs
335 |
336 | def export(self, fname):
337 | """Export a pickle object containing styled polygons that can be combined using ``cellscape scene``"""
338 | assert(len(self._styled_polygons) > 0)
339 |
340 | data = {'polygons':self._styled_polygons, 'name':self.name}
341 | for k in ['width', 'height', 'start', 'end', 'top', 'bottom']:
342 | data[k] = self.dimensions[k]
343 |
344 | with open('{}.pickle'.format(fname),'wb') as f:
345 | pickle.dump(data, f)
346 |
347 | print("Exported polygon data to {}.pickle".format(fname), file=sys.stderr)
348 |
349 | def make_cartoon(args):
350 | """Build a cartoon in one go. Called when running ``cellscape cartoon``."""
351 |
352 | # accept list of chains for backwards-compatibility
353 | # convert to string e.g. ABCD for current interface
354 | # can be an issue if chains have more than one letter
355 | if len(args.chain) == 1:
356 | chain = args.chain[0]
357 | else:
358 | chain = ''.join(args.chain)
359 |
360 | molecule = cellscape.Structure(args.pdb, chain=chain, model=args.model, uniprot=args.uniprot, view=False)
361 |
362 | # open first line to identify view file
363 | if args.view is not None:
364 | with open(args.view) as view_f:
365 | first_line = view_f.readline()
366 | if first_line[:8] == 'set_view':
367 | molecule.load_pymol_view(args.view)
368 | elif first_line[:5] == 'Model':
369 | molecule.load_chimera_view(args.view)
370 | else:
371 | molecule.load_view_matrix(args.view)
372 | else:
373 | # if no view matrix provided just use default PDB orientation for now
374 | molecule.view_matrix = np.identity(3)
375 |
376 | cartoon = molecule.outline(
377 | args.outline_by,
378 | depth=args.depth,
379 | radius=args.radius,
380 | only_annotated=args.only_annotated,
381 | only_ca=args.only_ca,
382 | depth_contour_interval=args.depth_contour_interval,
383 | back_outline=args.back_outline
384 | )
385 |
386 | if args.outline_by == "residue" and args.color_by != "same":
387 | color_residues_by = args.color_by
388 | else:
389 | color_residues_by = None
390 |
391 | if len(args.colors) > 0:
392 | colors = args.colors
393 | else:
394 | colors = None
395 |
396 | cartoon.plot(
397 | do_show=False,
398 | axes_labels=args.axes,
399 | colors=colors,
400 | color_residues_by=color_residues_by,
401 | dpi=args.dpi,
402 | save=args.save,
403 | depth_shading=args.depth_shading,
404 | depth_lines=args.depth_lines,
405 | edge_color=args.edge_color,
406 | line_width=args.line_width
407 | )
408 |
409 | if args.export:
410 | cartoon.export(os.path.splitext(args.save)[0])
411 |
--------------------------------------------------------------------------------
/cellscape/cli.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | from cellscape.cartoon import make_cartoon
3 | from cellscape.scene import make_scene
4 |
5 | def main():
6 | # set up argument parser
7 | parser = argparse.ArgumentParser(description='CellScape: Protein structure visualization with vector graphics cartoons')
8 | subparsers = parser.add_subparsers(dest="command")
9 | subparsers.required=True
10 |
11 | # cartoon
12 | parser_cartoon = subparsers.add_parser('cartoon', help="Make a cartoon from a protein structure", formatter_class=argparse.ArgumentDefaultsHelpFormatter, description="Make a cartoon from a protein structure")
13 | parser_cartoon.set_defaults(func=make_cartoon)
14 | # input/output options
15 | parser_cartoon_io = parser_cartoon.add_argument_group('input/output options')
16 | parser_cartoon_io.add_argument('--pdb', help='Protein coordinates file (must be .pdb/.ent/.cif/.mcif)', required=True)
17 | parser_cartoon_io.add_argument('--model', type=int, default=0, help='Model number in PDB to load')
18 | parser_cartoon_io.add_argument('--chain', default=['all'], help='Chain(s) in structure to outline', nargs='+')
19 | parser_cartoon_io.add_argument('--view', help='Camera rotation matrix (saved from cellscape, PyMOL get_view, or Chimera matrixget)')
20 | parser_cartoon_io.add_argument('--uniprot', help='UniProt XML file to parse for sequence/domain/topology information')
21 | parser_cartoon_io.add_argument('--save', default='out.svg', help='Image output file (valid formats are png/pdf/svg/ps)')
22 | parser_cartoon_io.add_argument('--export', default=False, action="store_true", help='Export Python object with structural information')
23 | # outline building options
24 | parser_cartoon_outline = parser_cartoon.add_argument_group('outline-building options')
25 | parser_cartoon_outline.add_argument('--only_annotated', action='store_true', default=False, help='Ignore regions without UniProt annotations')
26 | parser_cartoon_outline.add_argument('--only_ca', action='store_true', default=False, help='Only use alpha carbons for outline')
27 | parser_cartoon_outline.add_argument('--outline_by', '--outline', default='all', choices=['all', 'chain', 'domain', 'topology', 'residue'], help='Outline protein regions')
28 | parser_cartoon_outline.add_argument('--depth', default=None, choices=['flat', 'contours', None], help='Represent depth with flat occluded outlines or contour slices')
29 | parser_cartoon_outline.add_argument('--depth_contour_interval', type=float, default=3, help='Width of depth contour bins in angstroms (if --depth contours)')
30 | parser_cartoon_outline.add_argument('--radius', default=1.5, help='Atomic radius, in angstroms', type=float)
31 | parser_cartoon_outline.add_argument('--back_outline', action='store_true', help='Outline entire molecule separately from group outlines')
32 |
33 | # visual style options
34 | parser_cartoon_style = parser_cartoon.add_argument_group('styling options')
35 | parser_cartoon_style.add_argument('--axes', action='store_true', default=False, help='Draw x and y axes around molecule')
36 | parser_cartoon_style.add_argument('--colors', default=[], nargs='+', help='Specify color scheme for protein (list of colors or matplotlib named color map)')
37 | parser_cartoon_style.add_argument('--edge_color', default='black', help='Edge color')
38 | parser_cartoon_style.add_argument('--line_width', default=0.7, type=float, help='Line width')
39 | parser_cartoon_style.add_argument('--color_by', default='same', choices=['same', 'chain', 'domain', 'topology'], help='Color residues by attribute (if --outline_by residue is selected)')
40 | parser_cartoon_style.add_argument('--depth_shading', action='store_true', default=False, help='Shade regions darker in the back to simulate depth')
41 | parser_cartoon_style.add_argument('--depth_lines', action='store_true', default=False, help='Use thicker lines in the back to simulate depth')
42 | parser_cartoon_style.add_argument('--dpi', type=int, default=300, help='DPI to use if exporting to a raster format like PNG')
43 |
44 | # scene
45 | parser_scene = subparsers.add_parser('scene', help="Compose multiple cartoons together", description="Compose multiple cartoons together", formatter_class=argparse.ArgumentDefaultsHelpFormatter)
46 | parser_scene.set_defaults(func=make_scene)
47 | # input/output options
48 | parser_scene_io = parser_scene.add_argument_group('input/output options')
49 | parser_scene_io.add_argument('--files', nargs='+', help='Pickled objects to load')
50 | parser_scene_io.add_argument('--save', default='out.svg', help='Image output path (valid formats are png/pdf/svg/ps)')
51 | # visual style options
52 | parser_scene_style = parser_scene.add_argument_group('styling options')
53 | parser_scene_style.add_argument('--offsets', nargs='+', default=[], help='Vertical offsets for each molecule specified manually')
54 | parser_scene_style.add_argument('--padding', type=int, default=0, help='Horizontal padding to add between each molecule (in angstroms)')
55 | parser_scene_style.add_argument('--axes', action='store_true', default=False, help='Draw x and y axes')
56 | parser_scene_style.add_argument('--membrane', default=None, choices=[None, 'arc', 'flat', 'wave'], help='Draw membrane on X axis')
57 | parser_scene_style.add_argument('--membrane_thickness', default=40, type=float, help='Thickness of the membrane (in angstroms)')
58 | parser_scene_style.add_argument('--membrane_lipids', action='store_true', help='Draw lipid head groups')
59 | parser_scene_style.add_argument('--no_membrane_offset', action='store_true', help=argparse.SUPPRESS) # don't adjust y-axis to position bottom of structure in membrane
60 | parser_scene_style.add_argument('--order_by', default='input', choices=['input', 'random', 'height','top', 'membrane'], help='How to order proteins in scene')
61 | parser_scene_style.add_argument('--recolor', action='store_true', default=False, help='Recolor proteins in scene')
62 | parser_scene_style.add_argument('--recolor_cmap', default=['hsv'], nargs='+', help='Named cmap or color scheme for re-coloring')
63 | parser_scene_style.add_argument('--dpi', type=int, default=300, help='DPI to use if exporting to a raster format like PNG')
64 | parser_scene_style.add_argument('--use_placeholders', action='store_true', help=argparse.SUPPRESS)
65 | parser_scene_style.add_argument('--labels', action='store_true', default=False, help=argparse.SUPPRESS) # still testing
66 | parser_scene_style.add_argument('--label_size', type=float, default=0.5, help=argparse.SUPPRESS) # fraction of the screen to use for labels
67 | parser_scene_style.add_argument('--label_orientation', choices=["vertical", "horizontal", "diagonal"], default="vertical", help=argparse.SUPPRESS)
68 | parser_scene_style.add_argument('--label_position', choices=["above", "below"], default="below", help=argparse.SUPPRESS)
69 | parser_scene_style.add_argument('--fig_height', type=float, default=11, help=argparse.SUPPRESS) # passed to figsize
70 | parser_scene_style.add_argument('--fig_width', type=float, default=8.5, help=argparse.SUPPRESS) # passed to figsize
71 | # for simulating according to stoichiometry
72 | parser_scene_sim = parser_scene.add_argument_group('random scene options')
73 | parser_scene_sim.add_argument('--csv', help='Table of protein information')
74 | parser_scene_sim.add_argument('--seed', type=int, help='Random seed for scene generation')
75 | parser_scene_sim.add_argument('--sample_from', help='Column to use for sampling (with --csv)', default='stoichiometry')
76 | parser_scene_sim.add_argument('--num_mol', type=int, help='Number of molecules to sample for scene', default=0)
77 | parser_scene_sim.add_argument('--background', action='store_true', default=False, help='Add background plane using same frequencies')
78 |
79 | # parse arguments and call corresponding command
80 | args = parser.parse_args()
81 | args.func(args)
82 |
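The dispatch pattern used above (each subparser registers its handler with `set_defaults(func=...)`, and the entry point calls `args.func(args)`) can be sketched in isolation as follows. This is a minimal illustration, not the CellScape CLI itself: the `toy` program name, the `--pdb` option, and the handler bodies are hypothetical stand-ins.

```python
import argparse


def make_cartoon(args):
    # Hypothetical handler: the real one builds and saves a cartoon.
    return f"cartoon: {args.pdb} -> {args.save}"


def make_scene(args):
    # Hypothetical handler: the real one composes pickled cartoons.
    return f"scene: {len(args.files)} file(s) -> {args.save}"


def build_parser():
    parser = argparse.ArgumentParser(prog="toy")
    subparsers = parser.add_subparsers(dest="command", required=True)

    p_cartoon = subparsers.add_parser("cartoon", help="Make a cartoon")
    p_cartoon.add_argument("--pdb", required=True)  # assumed option name
    p_cartoon.add_argument("--save", default="out.svg")
    p_cartoon.set_defaults(func=make_cartoon)

    p_scene = subparsers.add_parser("scene", help="Compose cartoons")
    p_scene.add_argument("--files", nargs="+", default=[])
    p_scene.add_argument("--save", default="out.svg")
    p_scene.set_defaults(func=make_scene)
    return parser


def run(argv):
    # Parse the arguments, then call whichever handler the subcommand set.
    args = build_parser().parse_args(argv)
    return args.func(args)


if __name__ == "__main__":
    print(run(["scene", "--files", "a.pickle", "b.pickle"]))
```

Keeping the handler on the parsed namespace means the entry point needs no `if/elif` chain over subcommand names; adding a subcommand only requires a new subparser and its `set_defaults` call.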
--------------------------------------------------------------------------------
/cellscape/interface.py:
--------------------------------------------------------------------------------
1 | """
2 | Testing code for visualizing protein interactions across membrane interfaces
3 | """
4 |
5 | import numpy as np
6 | import matplotlib
7 | import matplotlib.pyplot as plt
8 | import matplotlib.patches as mpatches
9 | import matplotlib.lines as mlines
10 | from matplotlib import lines, text, cm
11 | from matplotlib.colors import LinearSegmentedColormap, ListedColormap
12 | from scipy import interpolate
13 | import shapely.geometry as sg
14 | import shapely.ops as so
15 | import os, sys, argparse, pickle
16 | import glob
17 | import csv
18 |
19 | from cellscape.cartoon import plot_polygon, shade_from_color
20 |
21 | class MembraneInterface:
22 | """
23 | Draw a pair of membranes as piecewise-flat segments joined by connectors (no lipids).
24 | """
25 | def __init__(self, axes, lengths, bottom_y, top_y, thickness=40, padding=10, base_y=0):
26 |
27 | # axes
28 | self.axes = axes
29 |
30 | # membrane thickness (angstroms)
31 | self.thickness = thickness
32 |
33 | # padding between each segment, scalar
34 | self.padding = padding
35 |
36 | # length of each segment, array
37 | self.lengths = lengths
38 |
39 | # y coordinate of each bottom membrane segment, array
40 | self.bottom_y = bottom_y
41 |
42 | # y coordinate of each top membrane segment, array
43 | self.top_y = top_y
44 |
45 | assert(len(lengths) == len(top_y))
46 | assert(len(top_y) == len(bottom_y))
47 |
48 | def draw(self, color='#C4E7EF'):
49 | if isinstance(color, (list,tuple)):
50 | top_color = color[0]
51 | bot_color = color[1]
52 | else:
53 | top_color = color
54 | bot_color = color
55 |
56 | membrane_x = []
57 | membrane_bot_y = []
58 | membrane_top_y = []
59 | x_cum = 0
60 | for i, w in enumerate(self.lengths):
61 | membrane_x.append(x_cum)
62 | x_cum += w
63 | membrane_x.append(x_cum)
64 | x_cum += self.padding
65 |
66 | membrane_bot_y.append(self.bottom_y[i])
67 | membrane_bot_y.append(self.bottom_y[i])
68 |
69 | membrane_top_y.append(self.top_y[i])
70 | membrane_top_y.append(self.top_y[i])
71 |
72 | membrane_x = np.array(membrane_x)
73 | membrane_bot_y = np.array(membrane_bot_y)
74 | membrane_top_y = np.array(membrane_top_y)
75 |
76 | # plot bottom membrane
77 | self.axes.fill_between(membrane_x, membrane_bot_y, membrane_bot_y-self.thickness, color=bot_color, zorder=1.6, capstyle='round', joinstyle='miter')
78 |
79 | # plot top membrane
80 | self.axes.fill_between(membrane_x, membrane_top_y, membrane_top_y+self.thickness, color=top_color, zorder=1.6, capstyle='round', joinstyle='round')
81 |
82 | def plot_pairs(pairs, labels=None, thickness=40, padding=50, align="bottom", membrane_color="#E8E8E8", colors=None, axes=True, linewidth=None, sort=False):
83 |
84 | assert align in ["bottom", "middle", "top"]
85 | assert sort in [False, "height", "horseshoe"]
86 |
87 | if labels is not None:
88 | assert len(labels) == len(pairs)
89 |
90 | # optionally sort proteins by height
91 | if sort == "height":
92 | pair_heights = np.array(list(map(lambda x: x[0]['height']+x[1]['height'], pairs)))
93 | sorted_order = np.argsort(pair_heights)[::-1]
94 | pairs_ = [pairs[i] for i in sorted_order]
95 | labels_ = [labels[i] for i in sorted_order] if labels is not None else None
96 | colors_ = [colors[i] for i in sorted_order] if colors is not None else None
97 |
98 | elif sort == "horseshoe":
99 | pair_heights = np.array(list(map(lambda x: x[0]['height']+x[1]['height'], pairs)))
100 | sorted_order = np.argsort(pair_heights)[::-1]
101 | new_order = np.zeros_like(sorted_order)
102 | first_half = sorted_order[::2]
103 | if len(sorted_order) % 2:
104 | second_half = sorted_order[-2::-2]
105 | else:
106 | second_half = sorted_order[::-2]
107 | new_order[:len(first_half)] = first_half
108 | new_order[len(first_half):] = second_half
109 |
110 | pairs_ = [pairs[i] for i in new_order]
111 | if labels is not None:
112 | labels_ = [labels[i] for i in new_order]
113 | if colors is not None:
114 | colors_ = [colors[i] for i in new_order]
115 |
116 | else:
117 | pairs_ = pairs[:]
118 | labels_ = labels[:] if labels is not None else None
119 | colors_ = colors[:] if colors is not None else None
120 |
121 | fig, axs = plt.subplots(figsize=(11,8.5))
122 | axs.set_aspect('equal')
123 |
124 | if axes:
125 | axs.xaxis.grid(False)
126 | axs.yaxis.grid(False)
127 | axs.axes.xaxis.set_ticklabels([])
128 | else:
129 | plt.axis('off')
130 | plt.gca().set_axis_off()
131 | plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, hspace = 0, wspace = 0)
132 | plt.margins(0,0)
133 | plt.gca().xaxis.set_major_locator(plt.NullLocator())
134 | plt.gca().yaxis.set_major_locator(plt.NullLocator())
135 |
136 | # set font options
137 | font_options = {'family':'Arial', 'weight':'normal', 'size':10}
138 | matplotlib.rc('font', **font_options)
139 |
140 | assert(align in ["top","bottom","middle"])
141 |
142 | # get all the interface heights
143 | all_heights = np.array([p[0]['height']+p[1]['height'] for p in pairs_])
144 | max_height = np.max(all_heights)
145 |
146 | # calculate membrane geometry
147 | bot_y = []
148 | top_y = []
149 | lengths = []
150 | for p in pairs_:
151 | o1, o2 = p
152 | if align == "bottom":
153 | top_y.append(o1['height']+o2['height'])
154 | bot_y.append(0)
155 | elif align == "top":
156 | top_y.append(max_height)
157 | bot_y.append(max_height-(o1['height']+o2['height']))
158 | elif align == "middle":
159 | top_y.append(max_height-(max_height-o1['height']-o2['height'])/2)
160 | bot_y.append((max_height-o1['height']-o2['height'])/2)
161 | lengths.append(max(o1['width'], o2['width']))
162 | top_y = np.array(top_y)
163 | bot_y = np.array(bot_y)
164 | lengths = np.array(lengths)
165 |
166 | total_width = np.sum(lengths)+len(pairs_)*padding
167 |
168 | # draw membrane
169 | mem = MembraneInterface(axes=axs, lengths=lengths, bottom_y=bot_y, top_y=top_y, padding=padding, thickness=thickness)
170 | mem.draw(color=membrane_color)
171 |
172 | # draw proteins
173 | w=0
174 | for i, o in enumerate(pairs_):
175 | o1, o2 = o
176 | this_width = max(o1['width'], o2['width'])
177 | this_height = o1['height']+o2['height']
178 | y_offset = bot_y[i]
179 | if colors is not None:
180 | color_top = colors_[i][1]
181 | color_bot = colors_[i][0]
182 | else:
183 | color_top = None
184 | color_bot = None
185 |
186 | # TODO rotation needs to be a little cleaner, making some assumptions here
187 | for p in o1["polygons"]:
188 | xy = np.array(o1['polygons'][0]['polygon'].exterior.xy) # assuming first polygon is outline, TODO fix
189 | recenter = np.min(xy, axis=1) # xy has shape (2, N): row 0 is x, row 1 is y
190 | facecolor = shade_from_color(color_bot, p.get("shade", 0.5), range=p.get("shading_range", 0.4))
191 | plot_polygon(p['polygon'], axes=axs, translate_pre=[-recenter[0]+w+(this_width-o1['width'])/2, -recenter[1]+y_offset], flip=False, facecolor=facecolor, linewidth=p['linewidth'])
192 |
193 | for p in o2["polygons"]:
194 | xy = np.array(o2['polygons'][0]['polygon'].exterior.xy) # assuming first polygon is outline, TODO fix
195 | recenter = np.min(xy, axis=1) # xy has shape (2, N): row 0 is x, row 1 is y
196 | facecolor = shade_from_color(color_top, p.get("shade", 0.5), range=p.get("shading_range", 0.4))
197 | plot_polygon(p['polygon'], axes=axs, translate_pre=-1*recenter, translate_post=[w+(this_width+o2['width'])/2, y_offset+10+o2['height']+o1['height']], flip=True, facecolor=facecolor, linewidth=p['linewidth'])
198 | #plot_polygon(o2["polygons"][0]['polygon'], axes=axs, offset=[w+(this_width-o2['width'])/2, y_offset+o1['height']], flip=True, facecolor=o2["polygons"][0]['facecolor'], linewidth=linewidth)
199 |
200 | if labels is not None:
201 | angstroms_per_inch = total_width/11
202 | fontsize = total_width*0.5/len(pairs_)/angstroms_per_inch*72
203 | font_inches = fontsize/72
204 | plt.text(w+this_width/2, y_offset+this_height+50, labels_[i][1], rotation=90, fontsize=fontsize, va='bottom', ha='center')
205 | plt.text(w+this_width/2, y_offset-1.1*angstroms_per_inch*font_inches, labels_[i][0], rotation=90, fontsize=fontsize, va='top', ha='center')
206 |
207 | w += this_width+padding
208 |
209 | fig.set_size_inches(18.5, 10.5)
210 | return fig
211 |
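The "horseshoe" ordering in `plot_pairs` places the tallest pairs at the two ends and the shortest in the middle. The following standalone sketch reproduces that index arithmetic with plain lists; the function name `horseshoe_order` is hypothetical, and tie-breaking between equal heights may differ from the `np.argsort` version above.

```python
def horseshoe_order(heights):
    """Return indices so that heights fall toward the middle and rise again."""
    # Indices ranked from tallest to shortest.
    ranked = sorted(range(len(heights)), key=lambda i: heights[i], reverse=True)
    # Every other item (1st, 3rd, 5th tallest, ...) descends from the left edge.
    left = ranked[::2]
    # The remaining items (2nd, 4th tallest, ...) ascend toward the right edge.
    right = ranked[1::2][::-1]
    return left + right


if __name__ == "__main__":
    heights = [10, 50, 30, 20, 40]
    order = horseshoe_order(heights)
    print([heights[i] for i in order])  # [50, 30, 10, 20, 40]
```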
--------------------------------------------------------------------------------
/cellscape/parse_alignment.py:
--------------------------------------------------------------------------------
1 | from Bio import pairwise2
2 | from Bio.PDB import *
3 | from Bio.Align import substitution_matrices, PairwiseAligner
4 | import numpy as np
5 |
6 | def identity_from_alignment(a):
7 | s1 = np.array(list(a[0]))
8 | s2 = np.array(list(a[1]))
9 | return np.sum(s1 == s2) / len(np.where( s1 != '-')[0])
10 |
11 | def overlap_from_alignment(a):
12 | s1 = np.array(list(a[0]))
13 | s2 = np.array(list(a[1]))
14 | s1_nogap = np.where( s1 != '-')
15 | s2_nogap = np.where( s2 != '-')
16 | s1_start_align = np.min(s1_nogap)
17 | s1_end_align = np.max(s1_nogap)
18 | s2_start_align = np.min(s2_nogap)
19 | s2_end_align = np.max(s2_nogap)
20 | overlap_align = (max(s1_start_align, s2_start_align), min(s1_end_align, s2_end_align))
21 | return(
22 | np.where(s1_nogap == overlap_align[0])[1][0],
23 | np.where(s1_nogap == overlap_align[1])[1][0],
24 | np.where(s2_nogap == overlap_align[0])[1][0],
25 | np.where(s2_nogap == overlap_align[1])[1][0]) + np.array([1,1,1,1])
26 |
27 | def align_pair(s1, s2):
28 | # wrapper for biopython pairwise alignment
29 | blosum62 = substitution_matrices.load("BLOSUM62")
30 | return pairwise2.align.localds(s1, s2, blosum62, -3, -3, one_alignment_only=True)[0]
31 |
32 | def align_all_pairs(s):
33 | for i in range(len(s)):
34 | for j in range(i+1, len(s)):
35 | s1 = s[i][1]
36 | s2 = s[j][1]
37 | alignment = align_pair(s1,s2) # align_pair already returns the single best alignment
38 | print(s[i][0], len(s1), s[j][0], len(s2), *overlap_from_alignment(alignment), identity_from_alignment(alignment))
39 |
40 | def sequence_overlap(s1, s2):
41 | aligner = PairwiseAligner()
42 | aligner.mode = "global"
43 | aligner.substitution_matrix = substitution_matrices.load("BLOSUM62")
44 | alignments = aligner.align(s1, s2)
45 | alignment = list(alignments)[0]
46 | alignment_bounds = alignment.aligned
47 | return np.array([alignment_bounds[0][0][0], alignment_bounds[0][-1][1], alignment_bounds[-1][0][0], alignment_bounds[-1][-1][1]]) + np.array([1,0,1,0])
48 |
49 | if __name__ == '__main__':
50 | a1 = (
51 | '---------AAAAAAAABBBBBBB',
52 | 'BBBBBBBBBAAAAAAAA-------'
53 | )
54 | print(overlap_from_alignment(a1))
55 | print(sequence_overlap("AAAAAAAABBBBBBB", "BBBBBBBBBAAAAAAAA"))
56 |
57 | a2 = (
58 | 'BBBBBBBBBAAAAABBBBBBB',
59 | '---------AAAAA-------'
60 | )
61 | print(overlap_from_alignment(a2))
62 | print(sequence_overlap("BBBBBBBBBAAAAABBBBBBB", "AAAAA"))
63 |
64 | a3 = (
65 | 'BBBBBBBBBAAAAABBBBAAAABBB',
66 | '---------AAAAA----AAAA---'
67 | )
68 | print(overlap_from_alignment(a3))
69 | print(sequence_overlap("BBBBBBBBBAAAAABBBBAAAABBB", "AAAAAAAAA"))
70 |
71 | s1 = "AACDAEECDAECDEADAEEAEADADCADEAEAECDDAEACDAECDA"
72 | s2 = "ACDAEECDADEADWAEEAEADAWDCADEAEAECGDDAEAGCDACDA"
73 | a = align_pair(s1,s2)
74 | print(a[0],a[1])
75 |
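As a cross-check on `overlap_from_alignment`, the same idea can be written without NumPy: find the alignment columns where both sequences have started and neither has ended, then convert those columns to 1-based residue numbers in each sequence. This is an illustrative sketch with a hypothetical name (`overlap_residues`). It assumes two gapped rows of equal length, and unlike the function above it does not raise when a boundary column is a gap in one row; it counts the residues seen so far instead.

```python
def overlap_residues(row1, row2, gap="-"):
    """Return (start1, end1, start2, end2) as 1-based residue numbers."""

    def span(row):
        # First and last alignment columns holding a residue.
        cols = [i for i, c in enumerate(row) if c != gap]
        return cols[0], cols[-1]

    def residue_number(row, col):
        # Number of residues up to and including this column.
        return sum(1 for c in row[: col + 1] if c != gap)

    (start1, end1), (start2, end2) = span(row1), span(row2)
    start, end = max(start1, start2), min(end1, end2)
    return (
        residue_number(row1, start),
        residue_number(row1, end),
        residue_number(row2, start),
        residue_number(row2, end),
    )


if __name__ == "__main__":
    print(overlap_residues("---------AAAAAAAABBBBBBB", "BBBBBBBBBAAAAAAAA-------"))
    print(overlap_residues("BBBBBBBBBAAAAABBBBBBB", "---------AAAAA-------"))
```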
--------------------------------------------------------------------------------
/cellscape/parse_uniprot_xml.py:
--------------------------------------------------------------------------------
1 | import xml.etree.ElementTree as ET
2 | import os
3 | import urllib.request
4 | import sys
5 | import argparse
6 | import json
7 |
8 | class UniprotRecord:
9 | """Data structure to hold Uniprot annotations for single sequence."""
10 | def __init__(self, id, name=None):
11 | self.id = id
12 | self.name = name
13 | self.domains = []
14 | self.topology = []
15 | self.ptm = {}
16 | self.sequence = ""
17 |
18 | def add_domain(self,name, start, end):
19 | self.domains.append((name, int(start), int(end)))
20 | def add_topology(self, name, start, end):
21 | self.topology.append((name, int(start), int(end)))
22 | def add_ptm(self, name, start, end):
23 | self.ptm[name] = (int(start), int(end))
24 | def process_segments(self):
25 | if 'chain' in self.ptm:
26 | (self.chain_start, self.chain_end) = self.ptm['chain']
27 | else:
28 | (self.chain_start, self.chain_end) = (1, 99999)
29 |
30 | last = self.chain_start
31 | self.domain_segments = []
32 |
33 | for domain in self.domains:
34 | if (domain[1] - last) > 1:
35 | self.domain_segments.append(('None',last, domain[1]-1))
36 |
37 | self.domain_segments.append(domain)
38 | last = domain[2]
39 |
40 | if (self.chain_end - last) > 1:
41 | self.domain_segments.append(('None',last, self.chain_end))
42 |
43 | def parse_xml(xmlpath):
44 | """
45 | Parse Uniprot XML file to return list of UniprotRecord objects.
46 | """
47 | tree = ET.parse(xmlpath)
48 | root = tree.getroot()
49 | ns = '{http://uniprot.org/uniprot}'
50 | sequences = []
51 |
52 | for entry in tree.iter(tag=ns+'entry'):
53 | accession = entry.find(ns+'accession').text
54 | gene = entry.find(ns+'name').text
55 | sequence = UniprotRecord(accession, gene)
56 |
57 | for feature in entry.iter(tag=ns+'feature'):
58 |
59 | # look for transmembrane regions
60 | if feature.get('type') in ('topological domain','transmembrane region'):
61 | try:
62 | begin = feature.find(ns+'location').find(ns+'begin').get('position')
63 | end = feature.find(ns+'location').find(ns+'end').get('position')
64 | feature_description = feature.get('description').split(';')[0]
65 | sequence.add_topology(feature_description, begin, end)
66 | except:
67 | pass
68 |
69 | # look for protein domains
70 | elif feature.get('type') == 'domain':
71 | try:
72 | begin = feature.find(ns+'location').find(ns+'begin').get('position')
73 | end = feature.find(ns+'location').find(ns+'end').get('position')
74 | sequence.add_domain(feature.get('description'),begin,end)
75 | except:
76 | pass
77 |
78 | # look for signal peptide and mature chain
79 | elif feature.get('type') in ('chain', 'propeptide','signal peptide'):
80 | try:
81 | begin = feature.find(ns+'location').find(ns+'begin').get('position')
82 | end = feature.find(ns+'location').find(ns+'end').get('position')
83 | sequence.add_ptm(feature.get('type'), begin, end)
84 | except:
85 | pass
86 |
87 | sequence.process_segments()
88 | sequences.append(sequence)
89 |
90 | for seq in entry.iter(tag=ns+'sequence'):
91 | if seq.text is not None:
92 | sequence.sequence = seq.text.replace('\n','')
93 |
94 | return(sequences)
95 |
96 | def split_uniprot_xml(xmlpath, outpath='.'):
97 | """Take a multi-record XML file and split to one XML file per entry."""
98 | tree = ET.parse(xmlpath)
99 | root = tree.getroot()
100 | ns = '{http://uniprot.org/uniprot}'
101 | for entry in tree.iter(tag=ns+'entry'):
102 | accession = entry.find(ns+'accession')
103 | with open("{}/{}.xml".format(outpath, accession.text), "w") as xml_out:
104 | xml_out.write(ET.tostring(entry).decode('utf-8'))
105 |
106 | def download_uniprot_record(record, fileformat, outdir):
107 | """Download record from Uniprot server."""
108 | file_path = "{}.{}".format(record, fileformat)
109 | out_path = os.path.join(outdir, file_path)
110 | if not os.path.exists(out_path):
111 | print("Requesting {}".format(out_path))
112 | urllib.request.urlretrieve("https://www.uniprot.org/uniprot/{}".format(file_path), out_path)
113 | else:
114 | pass
115 | #print("UniProt file already there", file=sys.stderr)
116 | return out_path
117 |
118 | if __name__ == "__main__":
119 |
120 | parser = argparse.ArgumentParser(description='Parse UniProt XML file', formatter_class=argparse.ArgumentDefaultsHelpFormatter)
121 | parser.add_argument('--xml', help='Input XML file', required=True)
122 | parser.add_argument('--json', action='store_true', default=False, help='Output relevant information in JSON')
123 | args = parser.parse_args()
124 |
125 | uniprot = parse_xml(args.xml)
126 |
127 | for entry in uniprot:
128 | if args.json:
129 | data = {
130 | 'name': entry.name,
131 | 'sequence': entry.sequence,
132 | 'domains': entry.domain_segments,
133 | 'topology': entry.topology
134 | }
135 | print(json.dumps(data, indent=2))
136 | else:
137 | with open(entry.name+'.domains.csv','w') as f:
138 | f.write(','.join(['res_start','res_end','description'])+'\n')
139 | for domain in entry.domain_segments:
140 | f.write(','.join(map(str,[domain[1],domain[2],domain[0]]))+'\n')
141 |
142 | with open(entry.name+'.topology.csv','w') as f:
143 | f.write(','.join(['res_start','res_end','description'])+'\n')
144 | for domain in entry.topology:
145 | f.write(','.join(map(str,[domain[1],domain[2],domain[0]]))+'\n')
146 |
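The namespace-qualified lookups in `parse_xml` can be tried on a small in-memory document. The sketch below is an assumption-laden illustration: the XML snippet, accession, and feature values are made up for the example, and `feature_ranges` is a hypothetical helper that checks for missing elements explicitly instead of relying on a bare `except`.

```python
import xml.etree.ElementTree as ET

NS = "{http://uniprot.org/uniprot}"

# Made-up record for illustration only; not real UniProt data.
EXAMPLE_XML = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>X00000</accession>
    <name>EXAMPLE_HUMAN</name>
    <feature type="domain" description="Ig-like V-type">
      <location><begin position="35"/><end position="144"/></location>
    </feature>
    <feature type="transmembrane region" description="Helical">
      <location><begin position="200"/><end position="220"/></location>
    </feature>
    <feature type="domain" description="Single position, no range">
      <location><position position="5"/></location>
    </feature>
  </entry>
</uniprot>"""


def feature_ranges(xml_text, feature_type):
    """Return (description, begin, end) for features that have a full range."""
    root = ET.fromstring(xml_text)
    ranges = []
    for feature in root.iter(NS + "feature"):
        if feature.get("type") != feature_type:
            continue
        location = feature.find(NS + "location")
        if location is None:
            continue
        begin = location.find(NS + "begin")
        end = location.find(NS + "end")
        if begin is None or end is None:
            continue  # skip features without begin/end coordinates
        ranges.append(
            (feature.get("description"), int(begin.get("position")), int(end.get("position")))
        )
    return ranges


if __name__ == "__main__":
    print(feature_ranges(EXAMPLE_XML, "domain"))
```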
--------------------------------------------------------------------------------
/cellscape/scene.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import matplotlib
3 | import matplotlib.pyplot as plt
4 | import matplotlib.patches as mpatches
5 | import matplotlib.lines as mlines
6 | from matplotlib import lines, text, cm
7 | from matplotlib.colors import LinearSegmentedColormap, ListedColormap
8 | from scipy import interpolate
9 | import os
10 | import sys
11 | import pickle
12 | import csv
13 |
14 | from cellscape.cartoon import plot_polygon, shade_from_color, placeholder_polygon
15 |
16 | def rotation_matrix_2d(theta):
17 | """Return matrix to rotate 2D coordinates by angle theta."""
18 | return np.array([[np.cos(theta), -1*np.sin(theta)],[np.sin(theta), np.cos(theta)]])
19 |
20 | class Membrane:
21 | def __init__(self, width, thickness, axes, base_y=0):
22 | self.width = width
23 | self.thickness = thickness
24 | self.y = base_y
25 | self.axes = axes
26 | # other constants
27 | self.head_radius = 4
28 |
29 | def flat(self):
30 | self.height_at = lambda x: self.y + self.thickness/2
31 |
32 | def sinusoidal(self, frequency=1, amplitude=1):
33 | self.height_at = lambda x: self.y + self.thickness/2*amplitude*np.sin(x*frequency*2*np.pi/self.width)
34 |
35 | def interpolate(self, x, y, kind='linear'):
36 | #self.height_at = interpolate.interp1d(x, y, kind=kind)
37 | self.height_fn = interpolate.PchipInterpolator(x, y)
38 | self.height_at = lambda x: self.height_fn(x) + self.y
39 |
40 | def draw(self, lipids=False):
41 |
42 | membrane_x = np.linspace(0,self.width,200)
43 | membrane_y_top = np.array([self.height_at(x) for x in membrane_x])
44 | membrane_y_bot = membrane_y_top-self.thickness
45 |
46 | if lipids:
47 | membrane_box_fc='#C4E7EF'
48 | lipid_head_fc='#D6D1EF'
49 | lipid_tail_fc='#A3DCEF'
50 | plt.fill_between(membrane_x, membrane_y_top-self.head_radius, membrane_y_bot+self.head_radius, color=membrane_box_fc, zorder=1.6)
51 | num_lipids = int(self.width/(2*self.head_radius))
52 | for i in range(num_lipids):
53 | membrane_y = self.height_at(i/num_lipids*self.width)
54 | self.axes.add_line(mlines.Line2D([i*self.head_radius*2, i*self.head_radius*2], [-4+membrane_y, -18+membrane_y], zorder=1.7, c=lipid_tail_fc, linewidth=self.head_radius*.7, alpha=1, solid_capstyle='round'))
55 | self.axes.add_line(mlines.Line2D([i*self.head_radius*2, i*self.head_radius*2], [-38+membrane_y, -24+membrane_y], zorder=1.7, c=lipid_tail_fc, linewidth=self.head_radius*.7, alpha=1, solid_capstyle='round'))
56 | self.axes.add_patch(mpatches.Circle((i*self.head_radius*2, -1*self.head_radius+membrane_y), self.head_radius, facecolor=lipid_head_fc, ec='k', linewidth=0.3, alpha=1, zorder=2))
57 | self.axes.add_patch(mpatches.Circle((i*self.head_radius*2, -1*self.thickness+membrane_y), self.head_radius, facecolor=lipid_head_fc, ec='k', linewidth=0.3, alpha=1, zorder=2))
58 |
59 | else:
60 | membrane_box_fc='#C8C8C8'
61 | plt.fill_between(membrane_x, membrane_y_top, membrane_y_bot, color=membrane_box_fc, zorder=1.6)
62 |
63 | def make_scene(args):
64 | """Build a scene in one go. Called when running ``cellscape scene``."""
65 |
66 | assert args.save.split('.')[-1] in ['png','pdf','svg','ps'], "image format not recognized"
67 |
68 | # list of protein polygons to draw
69 | object_list = []
70 | num_files = 0
71 |
72 | # set random seed for reproducibility
73 | if args.seed is not None:
74 | np.random.seed(args.seed)
75 |
76 | if args.files:
77 | for path in args.files:
78 | with open(path,'rb') as f:
79 | data = pickle.load(f)
80 | object_list.append(data)
81 |
82 | # allow random scene generation even if manually specifying files
83 | if args.num_mol > 0:
84 | object_list = np.random.choice(object_list, size=args.num_mol)
85 | num_files = len(object_list)
86 |
87 | elif args.csv:
88 | protein_data = dict()
89 | with open(args.csv) as csvfile:
90 | reader = csv.DictReader(csvfile)
91 | for row in reader:
92 | (name, stoich, path) = (row['name'], float(row[args.sample_from]), row.get('file'))
93 | if path:
94 | with open(path,'rb') as f:
95 | data = pickle.load(f)
96 | data['name'] = name
97 | data['stoichiometry'] = stoich
98 | # TEST specifying color in CSV file
99 | if 'color' in row:
100 | data['color'] = row['color']
101 | protein_data[name] = (stoich, data)
102 | elif args.use_placeholders:
103 | height = float(row.get('height'))*10 # assuming in nanometers
104 | data = {'name':name, 'stoichiometry':stoich, 'height':height, 'bottom':np.array([25,0]), 'width':50, 'polygons':[{'polygon':placeholder_polygon(height), 'edgecolor':'k', 'linewidth':1, 'facecolor':"#eeeeee"}]}
105 | protein_data[name] = (stoich, data)
106 |
107 | num_files = len(protein_data)
108 |
109 | else:
110 | sys.exit("No input files specified, see options with --help")
111 |
112 | if len(args.offsets) > 0:
113 | assert(len(args.files) == len(args.offsets))
114 | y_offsets = list(map(float, args.offsets))
115 | else:
116 | y_offsets = np.zeros(len(object_list))
117 |
118 | if args.csv:
119 | # total sum of protein counts
120 | protein_names = np.array(list(protein_data.keys()))
121 | protein_stoich = np.array([protein_data[p][0] for p in protein_names])
122 | sum_stoich = np.sum(protein_stoich)
123 | stoich_weights = protein_stoich / sum_stoich
124 |
125 | if args.num_mol > 0:
126 | # protein copy number
127 | sampled_protein = np.random.choice(protein_names, size=args.num_mol, p=stoich_weights)
128 | object_list = [protein_data[p][1] for p in sampled_protein]
129 | else:
130 | object_list = [protein_data[p][1] for p in protein_names]
131 |
132 | # assemble objects for background
133 | if args.background and args.num_mol > 0:
134 | scaling_factor = 0.7
135 | sampled_protein = np.random.choice(protein_names, int(args.num_mol*1/scaling_factor), p=stoich_weights)
136 | background_object_list = [protein_data[p][1] for p in sampled_protein]
137 | elif 'name' in object_list[0]:
138 | protein_names = [o['name'] for o in object_list]
139 | else:
140 | for i,o in enumerate(object_list):
141 | o['name'] = i
142 | protein_names = range(len(object_list))
143 |
144 | # sort proteins
145 | if args.order_by == "random":
146 | np.random.shuffle(object_list)
147 | elif args.order_by == "height":
148 | object_list = sorted(object_list, key=lambda x: x['height'], reverse=True)
149 | elif args.order_by == "top":
150 | # TODO should be renamed, maybe length for overall size and height for above membrane?
151 | object_list = sorted(object_list, key=lambda x: x['top'][1], reverse=True)
152 | elif args.order_by == "membrane":
153 | # sorted by maximum height above or below the membrane
154 | def max_abs(l1, l2):
155 | if abs(l1) > abs(l2):
156 | return l1
157 | else:
158 | return l2
159 | object_list = sorted(object_list, key=lambda x: max_abs(x['top'][1], x['bottom'][1]), reverse=True)
160 |
161 | # set font options
162 | font_options = {'family':'Arial', 'weight':'normal', 'size':10}
163 | matplotlib.rc('font', **font_options)
164 |
165 | # set up plot
166 | # POSSIBLE BUG: while coordinates and scale are preserved in the pickle files,
167 | # this doesn't necessarily apply to the images. Hence if someone tries to
168 | # manually add a protein to a scene that has been generated there could be
169 | # sizing issues. Is the solution to use a constant angstrom/inch scaling?
170 | scene_height_in = args.fig_height
171 | scene_width_in = args.fig_width
172 | fig, axs = plt.subplots(figsize=(scene_width_in, scene_height_in))
173 | axs.set_aspect('equal')
174 |
175 | if args.axes:
176 | plt.axis('on')
177 | axs.xaxis.grid(False)
178 | axs.yaxis.grid(True)
179 | axs.axes.xaxis.set_ticklabels([])
180 | axs.autoscale()
181 | plt.margins(0.01,0.01) # is this needed?
182 |
183 | else:
184 | plt.axis('off')
185 | plt.gca().set_axis_off()
186 | plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, hspace = 0, wspace = 0)
187 | axs.autoscale()
188 | plt.margins(0.01,0.01)
189 | plt.gca().xaxis.set_major_locator(plt.NullLocator())
190 | plt.gca().yaxis.set_major_locator(plt.NullLocator())
191 |
192 | if args.recolor:
193 | # default cmap is hsv. for discrete could try Set1 or Pastel1
194 | if len(args.recolor_cmap) == 1:
195 | cmap = plt.get_cmap(args.recolor_cmap[0]) # cm.get_cmap was removed in newer matplotlib releases
196 | else:
197 | # TESTING interpret as continuous color scheme
198 | # cmap = LinearSegmentedColormap.from_list("cmap", args.recolor_cmap)
199 | cmap = ListedColormap(args.recolor_cmap)
200 | color_scheme = dict()
201 | for i,c in enumerate(sorted(object_list, key=lambda x: x['height'])):
202 | name = c['name']
203 | if isinstance(cmap, ListedColormap):
204 | color_scheme[name] = cmap(i)
205 | else:
206 | color_scheme[name] = cmap(i/len(object_list))
207 |
208 | # TESTING
209 | # so colors are by height (what about duplicated molecules)
210 | # np.random.shuffle(object_list)
211 |
212 | total_width = np.sum([o['width'] for o in object_list])+len(object_list)*args.padding
213 | if args.membrane is not None:
214 | membrane = Membrane(width=total_width, axes=axs, thickness=args.membrane_thickness)
215 |
216 | if args.membrane == "flat":
217 | membrane.flat()
218 | elif args.membrane == "arc":
219 | membrane.sinusoidal(frequency=0.5, amplitude=2)
220 | elif args.membrane == "wave":
221 | membrane.sinusoidal(frequency=2, amplitude=2)
222 | membrane.draw(lipids=args.membrane_lipids)
223 |
224 | # draw molecules
225 | w=0
226 | for i, o in enumerate(object_list):
227 | if args.membrane is not None and not args.no_membrane_offset:
228 | #y_offset = membrane.height_at(w+o['bottom'][0])-10
229 | #y_offset = o['bottom'][1]
230 | if o["bottom"][1] < 0:
231 | y_offset = -1*o["bottom"][1]
232 | else:
233 | y_offset = 0
234 | else:
235 | y_offset = 0
236 | for p in o["polygons"]:
237 | # TODO change to dict.get() call to have default
238 | if args.recolor:
239 | if 'color' in o:
240 | # check if color specified in CSV file
241 | facecolor = o['color']
242 | edgecolor = p["edgecolor"]
243 | else:
244 | # use color scheme from recolor_cmap
245 | facecolor = color_scheme[o['name']]
246 | edgecolor = 'black'
247 | if "shade" in p:
248 | # TODO export shading_range from polygons as well
249 | facecolor = shade_from_color(facecolor, p["shade"], range=p.get("shading_range", 0.4)) # using default from cartoon.py, could change
250 | else:
251 | # use color already specified
252 | facecolor = p["facecolor"]
253 | edgecolor = p["edgecolor"]
254 |
255 | plot_polygon(p["polygon"], translate_pre=[w, y_offset], facecolor=facecolor, edgecolor=edgecolor, linewidth=p["linewidth"], zorder_mod=p.get("zorder", 0))
256 | if args.labels:
257 | # option is experimental, text needs to be properly sized and placed
258 | # testing use of figure width in inches (specified above) and total width in angstroms to infer appropriate font size
259 | #plt.text(w+o['width']/2,-100, o.get("name", ""), rotation=90, fontsize=fontsize)
260 | # 1.1 and 0.6 numbers chosen through experimentation, best way would be to look at length of labels in characters
261 | angstroms_per_inch = total_width/scene_width_in
262 | fontsize = total_width*args.label_size/len(object_list)/angstroms_per_inch*72
263 | font_inches = fontsize/72
264 | # TODO better text positioning, allow for top/bottom selection
265 | if args.label_orientation == "vertical":
266 | #plt.text(w+o['width']/2,o['bottom'][1]-1.1*angstroms_per_inch*font_inches, o.get("name", ""), rotation=90, fontsize=fontsize, va='top', ha='center') # vertical text (below)
267 | plt.text(w+o['width']/2,0-1.1*angstroms_per_inch*font_inches, o.get("name", ""), rotation=90, fontsize=fontsize, va='top', ha='center') # vertical text (below)
268 | elif args.label_orientation == "horizontal":
269 | plt.text(w+o['width']/2,o['top'][1]+2*angstroms_per_inch*font_inches, o.get("name", ""), rotation=0, fontsize=fontsize, va='top', ha='center') # horizontal text (above)
270 | elif args.label_orientation == "diagonal":
271 | plt.text(w+o['width']/5,o['top'][1]+angstroms_per_inch*font_inches, o.get("name", ""), rotation=45, fontsize=fontsize) # diagonal text (above)
272 | w += o['width']+args.padding
273 |
274 | if args.background:
275 | background_w=0
276 | for i, o in enumerate(background_object_list):
277 | for p in o["polygons"]:
278 | plot_polygon(p["polygon"], offset=[background_w, 0], scale=scaling_factor, zorder_mod=p.get("zorder", -2), facecolor=p["facecolor"], edgecolor=p["edgecolor"], linewidth=p["linewidth"]*scaling_factor)
279 | background_w += (o['width']+args.padding)
280 |
281 | plt.savefig(args.save, transparent=True, pad_inches=0, bbox_inches='tight', dpi=args.dpi)
282 |
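The experimental label sizing above reduces to plain arithmetic. The following is a minimal sketch only: the helper name `label_fontsize` and the example numbers are hypothetical and not part of cellscape. It shows that the scene width in angstroms cancels out, so the font size depends only on the figure width in inches, the number of objects, and the label size factor.

```python
def label_fontsize(total_width, scene_width_in, n_objects, label_size):
    """Font size in points, following the formula used for args.labels (hypothetical helper)."""
    # angstroms of scene per inch of figure
    angstroms_per_inch = total_width / scene_width_in
    # width available per object, converted to inches, then to points (72 pt per inch)
    return total_width * label_size / n_objects / angstroms_per_inch * 72


# made-up example: a 1000 angstrom scene drawn 10 inches wide with 5 objects
size_pt = label_fontsize(total_width=1000, scene_width_in=10, n_objects=5, label_size=0.5)
print(size_pt)
```

Because `total_width` appears in both the numerator and in `angstroms_per_inch`, the result is equivalent to `scene_width_in * label_size / n_objects * 72`.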
--------------------------------------------------------------------------------
/cellscape/structure.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import shapely.geometry as sg
3 | import shapely.ops as so
4 | import re
5 | import os
6 | import sys
7 | import operator
8 | import warnings
9 | from Bio.PDB import rotmat, vectors, MMCIFParser, PDBParser
10 | from scipy.spatial.distance import pdist, squareform
11 | import time
12 |
13 | import cellscape
14 | from cellscape.util import amino_acid_3letter, group_by
15 | from cellscape.parse_uniprot_xml import parse_xml, download_uniprot_record
16 | from cellscape.parse_alignment import align_pair, overlap_from_alignment, sequence_overlap
17 |
18 | # silence warnings from Biopython that might pop up when loading the PDB
19 | from Bio import BiopythonWarning
20 | warnings.simplefilter('ignore', BiopythonWarning)
21 |
22 | def matrix_from_nglview(m):
23 |     """Take flattened 4x4 view matrix from NGLView and return a tuple of (3x3 rotation matrix, translation vector)."""
24 | camera_matrix = np.array(m).reshape(4,4)
25 | return camera_matrix[:3,:3]/np.linalg.norm(camera_matrix[:3,:3], axis=1), camera_matrix[3,:3]
26 |
27 | def matrix_to_nglview(m):
28 | """Take 3x3 rotation matrix and convert to flattened 4x4 view matrix for NGLView."""
29 | nglv_matrix = np.identity(4)
30 | nglv_matrix[:3,:3] = np.dot(m, np.array([[-1,0,0],[0,1,0],[0,0,-1]]))
31 | return list(nglv_matrix.flatten())
32 |
33 | def orientation_from_topology(topologies):
34 | """Infer protein vertical orientation (N->C or C->N) from UniProt topology annotation."""
35 | first_ex_flag = True
36 | first_ex = None
37 | first_cy_flag = True
38 | first_cy = None
39 | first_he_flag = True
40 | first_he = None
41 |
42 | for row in topologies:
43 | (description, start, end) = row
44 |
45 | if description == 'Extracellular' and first_ex_flag:
46 | first_ex = (start, end)
47 | first_ex_flag = False
48 | elif description == 'Helical' and first_he_flag:
49 | first_he = (start, end)
50 | first_he_flag = False
51 | elif description == 'Cytoplasmic' and first_cy_flag:
52 | first_cy = (start, end)
53 | first_cy_flag = False
54 |
55 | # rough heuristic for now, works for single pass transmembrane proteins
56 | nc_orient = True
57 | if first_ex is not None and first_cy is not None:
58 | if first_ex[0] < first_cy[0]:
59 | nc_orient = True # N->C (top to bottom)
60 | elif first_ex[0] > first_cy[0]:
61 | nc_orient = False # C->N (top to bottom)
62 |
63 | return(nc_orient)
64 |
65 | def orientation_from_ptm(ptm):
66 |     """Infer protein vertical orientation from UniProt processing annotation. Assumes the signal peptide is on the cytoplasmic/membrane side and the chain is extracellular."""
67 |
68 | nc_orient = True
69 | if ('chain' in ptm) and ('signal peptide' in ptm):
70 | if ptm['signal peptide'][0] < ptm['chain'][0]:
71 | nc_orient = True
72 | else:
73 | nc_orient = False
74 |
75 | return(nc_orient)
76 |
77 | def depth_slices_from_coord(xyz, width):
78 |     """Split a single Nx3 xyz matrix into a list of Mx3 matrices, one per Z slice of the given width"""
79 | binned = (xyz[:,-1]/width).astype(int)
80 | binned_shifted = binned - np.min(binned)
81 | num_bins = np.max(binned_shifted)+1
82 |
83 | total_coords = 0
84 | slice_coords = []
85 |
86 | for i in range(num_bins):
87 | bin_coords = xyz[binned_shifted == i]
88 | slice_coords.append(bin_coords)
89 | total_coords += len(bin_coords)
90 |
91 | assert(len(xyz) == total_coords)
92 | return slice_coords
93 |
94 | def get_z_slice_labels(xyz, width):
95 | """Take an Nx3 coordinate matrix and return Z bin"""
96 | binned = (xyz[:,-1]/width).astype(int)
97 | return binned - np.min(binned)
98 |
99 | def split_on_labels(m, labels):
100 | num_bins = np.max(labels)+1
101 | total_coords = 0
102 | coords = []
103 | for i in range(num_bins):
104 | group_coords = m[labels == i]
105 | coords.append(group_coords)
106 | total_coords += len(group_coords)
107 | assert(len(m) == total_coords)
108 | return coords
109 |
110 | def get_dimensions(xy, end_window=50):
111 | dimensions = {}
112 | dimensions['width'] = np.max(xy[:,0]) - np.min(xy[:,0])
113 | dimensions['height'] = np.max(xy[:,1]) - np.min(xy[:,1])
114 | dimensions['start'] = np.mean(xy[:end_window])
115 |     dimensions['end'] = np.mean(xy[-end_window:])
116 | dimensions['bottom'] = min(xy, key=operator.itemgetter(1))
117 | dimensions['top'] = max(xy, key=operator.itemgetter(1))
118 | return dimensions
119 |
120 | class Structure:
121 | """ A class to load coordinates, handle an NGLView instance, and generate cartoons"""
122 | #
123 | def __init__(self, file, name=None, model=0, chain="all", uniprot=None, view=True, is_opm=False, res_start=None, res_end=None):
124 | """
125 | Args:
126 | file (str): Path to PDB/mmCIF coordinates
127 | name (str, optional): Descriptive name for structure. Defaults to None.
128 | model (int, optional): Model number from structure. Defaults to 0.
129 | chain (str, optional): Either "all" or list of chains to include e.g. "ABC". Defaults to "all".
130 | uniprot (str, optional): UniProt identifier (to download the record) or the path to a UniProt XML file. Defaults to None.
131 | view (bool, optional): Whether to use interactive NGLView widget. Defaults to True.
132 | is_opm (bool, optional): Structure is from Orientation of Proteins in Membranes database. Defaults to False.
133 | res_start (int, optional): Select subset of protein. Defaults to None.
134 | res_end (int, optional): Select subset of protein. Defaults to None.
135 | """
136 |
137 | # descriptive name for the protein, otherwise use file
138 | if name is None:
139 | self.name = os.path.basename(file)
140 | else:
141 | self.name = name
142 |
143 | # load structure with biopython
144 | if file[-3:] in ["cif", "mcif"]:
145 | parser = MMCIFParser()
146 | elif file[-3:] in ["pdb", "ent"]:
147 | parser = PDBParser()
148 | else:
149 | sys.exit("File format not recognized!")
150 | self.structure = parser.get_structure(file, file)[model]
151 | _all_chains = [c.id for c in self.structure.get_chains()]
152 |
153 | # eliminate undesired chains from the biopython object
154 | if chain.lower() == "all":
155 | self.chains = _all_chains
156 | else:
157 | self.chains = list(chain)
158 | for c in _all_chains:
159 | if c not in self.chains:
160 | self.structure.detach_child(c)
161 |
162 | # take chain start and end for first chain
163 | if res_start is not None and res_end is not None:
164 | assert(res_end > res_start)
165 |             for res in list(self.structure[self.chains[0]]):
166 |                 res_id = res.get_full_id()[3][1]
167 |                 if (res_id < res_start) or (res_id > res_end):
168 |                     self.structure[self.chains[0]].detach_child(res.get_id())
169 |
170 | # BUG with some biopython structures not loading in nglview
171 | # can be fixed by resetting disordered flags
172 | # could this cause problems later on?
173 | for chain in self.structure:
174 | for residue in chain:
175 | for atom in residue.get_unpacked_list():
176 | atom.disordered_flag = 0
177 |
178 | # assumes PDB is oriented as described here:
179 | # https://opm.phar.umich.edu/about#features
180 | self.is_opm = is_opm
181 |
182 | # view matrix and NGLView options
183 | self.use_nglview = view
184 | self.view_matrix = []
185 | if self.use_nglview:
186 | if 'nglview' not in sys.modules or 'nv' not in sys.modules:
187 | import nglview as nv
188 | self._structure_to_view = self.structure
189 | initial_repr = [
190 | {"type": "spacefill", "params": {
191 | "sele": "protein", "color": "element"
192 | }}
193 | ]
194 | self.view = nv.show_biopython(self._structure_to_view, sync_camera=True, representations=initial_repr)
195 | self.view.camera = 'orthographic'
196 | self.view._set_sync_camera([self.view])
197 | self._reflect_y = np.array([[-1,0,0],[0,1,0],[0,0,-1]])
198 |
199 | # data structure holding residue information
200 | self.residues = dict()
201 | self.sequence = dict()
202 | self.coord = []
203 | self.ca_atoms = []
204 | self.backbone_atoms = []
205 | all_atoms = 0
206 | for chain in self.chains:
207 | self.sequence[chain] = ""
208 | self.residues[chain] = dict()
209 | for res in self.structure[chain]:
210 | res_id = res.get_full_id()[3][1]
211 | if res.get_full_id()[3][0][0] == "H": # skip hetatm records
212 | continue
213 | if res.get_resname() not in amino_acid_3letter:
214 | continue
215 | res_aa = amino_acid_3letter[res.get_resname()]
216 | self.sequence[chain] += res_aa
217 | residue_atoms = 0
218 | these_atoms = []
219 | backbone_atoms = []
220 | for a in res:
221 | self.coord.append(list(a.get_vector()))
222 | these_atoms.append(a.id) # tracking atom identities for now
223 | if a.id == "CA":
224 | this_ca_atom = all_atoms
225 | self.ca_atoms.append(this_ca_atom)
226 | if a.id in ["CA", "N", "C", "O"]:
227 | backbone_atoms.append(all_atoms)
228 | self.backbone_atoms.append(all_atoms)
229 | all_atoms += 1
230 | residue_atoms += 1
231 | self.residues[chain][res_id] = {
232 | 'chain':chain,
233 | 'id':res_id,
234 | 'amino_acid':res_aa,
235 | 'object':res,
236 | 'coord':(all_atoms-residue_atoms, all_atoms),
237 | 'coord_ca':(this_ca_atom, this_ca_atom+1),
238 | 'coord_backbone':np.array(backbone_atoms),
239 | 'atoms':np.array(these_atoms)
240 | }
241 | self.coord = np.array(self.coord)
242 | self.ca_atoms = np.array(self.ca_atoms).astype(int)
243 |
244 | # uniprot information
245 | if uniprot is not None:
246 | if os.path.exists(uniprot):
247 | self._uniprot_xml = uniprot
248 | elif re.fullmatch(r'[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}', uniprot):
249 | # if file doesn't exist, check it is a valid UniProt ID and download from server
250 | # using regex from https://www.uniprot.org/help/accession_numbers
251 | try:
252 | self._uniprot_xml = download_uniprot_record(uniprot, "xml", os.getcwd())
253 |                 except Exception:
254 | sys.exit("Couldn't download UniProt file")
255 | else:
256 | self._uniprot_xml = None
257 | sys.exit("Must specify either a UniProt XML file or a valid UniProt ID")
258 | else:
259 | self._uniprot_xml = None
260 |
261 | if self._uniprot_xml is not None:
262 | self._preprocess_uniprot(self._uniprot_xml)
263 |
264 | def _preprocess_uniprot(self, xml):
265 | # TODO support more than one XML file (e.g. for different chains?)
266 | self._uniprot = parse_xml(xml)[0]
267 |
268 | # align PDB and UniProt sequences to find offset
269 | uniprot_chain = self.chains[0]
270 | pdb_seq = self.sequence[uniprot_chain]
271 | uniprot_seq = self._uniprot.sequence
272 | first_residue_id = sorted(self.residues[uniprot_chain])[0]
273 | # alignment coordinates are 0-indexed (but PDB numbering and Uniprot ranges are 1-indexed)
274 | #self._uniprot_overlap = np.array(overlap_from_alignment(align_pair(uniprot_seq, pdb_seq)))
275 | self._uniprot_overlap = np.array(sequence_overlap(uniprot_seq, pdb_seq))
276 | self._uniprot_offset = self._uniprot_overlap[0] - first_residue_id
277 |
278 | if len(self._uniprot.domains) > 0:
279 | self._annotate_residues_from_uniprot(self._uniprot.domains, name_key="domain", residues=self.residues[uniprot_chain], offset=self._uniprot_offset)
280 |
281 | if len(self._uniprot.topology) > 0:
282 | self._annotate_residues_from_uniprot(self._uniprot.topology, name_key="topology", residues=self.residues[uniprot_chain], offset=self._uniprot_offset)
283 |
284 | def _annotate_residues_from_uniprot(self, ranges, name_key, residues, offset=0):
285 | # pdb_number - offset = up_number
286 | for row in ranges:
287 | (name, start, end) = row
288 | for r in range(start, end+1):
289 | if (r-offset) in residues:
290 | residues[r-offset][name_key] = name
291 |
292 | def _update_view_matrix(self):
293 | # check if camera orientation has been specified from nglview
294 |         if self.use_nglview and len(self.view._camera_orientation) == 16:
295 | m, t = matrix_from_nglview(self.view._camera_orientation)
296 | self.view_matrix = np.dot(m, self._reflect_y)
297 | elif len(self.view_matrix) == 0:
298 | self.view_matrix = np.identity(3)
299 |
300 | def align_view(self, v1, v2):
301 | """Rotate structure so v1 is aligned with v2
302 |
303 | Args:
304 | v1 (ndarray): first vector
305 | v2 (ndarray): second vector
306 | """
307 | # rotate structure so v1 is aligned with v2
308 | r = rotmat(vectors.Vector(v1), vectors.Vector(v2))
309 | view_matrix = r.T
310 | self.set_view_matrix(view_matrix)
311 |
312 | def align_view_nc(self, n_atoms=10, c_atoms=10, flip=False):
313 | """Rotate structure so N-C vector is aligned with the vertical axis
314 |
315 | Args:
316 | n_atoms (int, optional): N terminus CoM calculated from first x atoms. Defaults to 10.
317 |             c_atoms (int, optional): C terminus CoM calculated from last x atoms. Defaults to 10.
318 | flip (bool, optional): Orient C-to-N instead of N-to-C. Defaults to False.
319 | """
320 | com = np.mean(self.coord, axis=0)
321 | atoms_ = self.coord - com
322 | v1 = np.mean(atoms_[:n_atoms], axis=0) - np.mean(atoms_[-c_atoms:], axis=0)
323 | if not flip:
324 | self.align_view(v1, np.array([0,1,0]))
325 | else:
326 | self.align_view(v1, np.array([0,-1,0]))
327 |
328 | def auto_view(self, n_atoms=100, c_atoms=100, flip=None):
329 | """Infer protein orientation from UniProt data
330 |
331 | Args:
332 |             n_atoms (int, optional): N terminus CoM calculated from first x atoms. Defaults to 100.
333 |             c_atoms (int, optional): C terminus CoM calculated from last x atoms. Defaults to 100.
334 |             flip (bool, optional): Explicitly pass orientation: True for N->C (top to bottom), False for C->N. Defaults to None (infer from UniProt data).
335 | """
336 | # TODO should be same as align_view_nc if no UniProt data?
337 | # TODO abstract with align_view?
338 | # TODO abstract rotmat to separate function e.g. get_rotation_matrix()
339 | if flip is None:
340 | if self._uniprot_xml and len(self._uniprot.topology) > 0:
341 | print("orienting based on topology...")
342 | nc_orient = orientation_from_topology(self._uniprot.topology)
343 | elif self._uniprot_xml and len(self._uniprot.ptm) > 0:
344 | print("orienting based on ptm...")
345 | nc_orient = orientation_from_ptm(self._uniprot.ptm)
346 | else:
347 | nc_orient = True
348 |         else:
349 |             nc_orient = bool(flip)
350 | print("guessed N>C orientation? {}".format(nc_orient))
351 | self.nc_orient = nc_orient
352 |
353 | # rotate structure so N-C vector is aligned with the vertical axis
354 | com = np.mean(self.coord, axis=0)
355 | atoms_ = self.coord - com
356 | v1 = np.mean(atoms_[:n_atoms], axis=0) - np.mean(atoms_[-c_atoms:], axis=0)
357 | if nc_orient:
358 | first_rotation = rotmat(vectors.Vector(v1), vectors.Vector(np.array([0,1,0]))).T
359 | else:
360 | first_rotation = rotmat(vectors.Vector(v1), vectors.Vector(np.array([0,-1,0]))).T
361 |
362 | # rotate around Y axis so X axis aligns with longest distance in XZ plane
363 | rot_coord = np.dot(self.coord, first_rotation)
364 | com = np.mean(rot_coord, axis=0)
365 | atoms_ = rot_coord - com
366 | xz = atoms_[self.ca_atoms][:,[0,2]]
367 | dist = squareform(pdist(xz))
368 | max_dist = np.unravel_index(np.argmax(dist, axis=None), dist.shape)
369 | #print(max_dist, np.max(dist), dist[max_dist[0]][max_dist[1]])
370 | v2 = atoms_[self.ca_atoms[max_dist[0]]]-atoms_[self.ca_atoms[max_dist[1]]]
371 | v2[1] = 0
372 | second_rotation = rotmat(vectors.Vector(v2), vectors.Vector(np.array([1,0,0]))).T
373 |
374 | view_matrix = np.dot(first_rotation, second_rotation)
375 | self.set_view_matrix(view_matrix)
376 |
377 | def _set_nglview_orientation(self, m):
378 | # m is 3x3 rotation matrix
379 | if self.use_nglview:
380 | nglv_matrix = matrix_to_nglview(m)
381 | #print("Before", self.view._camera_orientation)
382 | self.view._set_camera_orientation(nglv_matrix)
383 | # having a bug where setting camera orientation does nothing
384 | # waiting a little bit seems to fix it (maybe an issue with sync/refresh rate)
385 | #self.view.control.orient(nglv_matrix)
386 | #self.view._camera_orientation = nglv_matrix
387 | time.sleep(0.5)
388 | self.view.center()
389 | #print("After", self.view._camera_orientation)
390 |
391 | def _apply_view_matrix(self):
392 | # transform atomic coordinates using view matrix
393 | self.rotated_coord = np.dot(self.coord, self.view_matrix)
394 |
395 | def load_pymol_view(self, file):
396 | """Read rotation matrix from output of PyMol ``get_view`` command
397 |
398 | Args:
399 | file (str): Path to file
400 | """
401 | matrix = []
402 | with open(file,'r') as view:
403 | for line in view:
404 | fields = line.split(',')
405 | if len(fields) == 4:
406 | matrix.append(list(map(float,fields[:3])))
407 | view_matrix = np.array(matrix)[:3]
408 | self.set_view_matrix(view_matrix)
409 |
410 | def load_chimera_view(self, file):
411 | """Read rotation matrix from output of Chimera ``matrixset`` command
412 |
413 | Args:
414 | file (str): Path to file
415 | """
416 | matrix = []
417 | with open(file,'r') as view:
418 | for line in view.readlines()[1:4]:
419 | matrix.append(line.split())
420 |
421 | # transpose and remove translation vector
422 | view_matrix = np.array(matrix).astype(float).T[:3]
423 | self.set_view_matrix(view_matrix)
424 |
425 | def save_view_matrix(self, p):
426 | """Save rotation matrix to a NumPy text file
427 |
428 | Args:
429 | p (str): Path to file
430 | """
431 | self._update_view_matrix()
432 | np.savetxt(p, self.view_matrix)
433 |
434 | def load_view_matrix(self, p):
435 | """Load rotation matrix from a NumPy text file
436 |
437 | Args:
438 | p (str): Path to file
439 | """
440 | view_matrix = np.loadtxt(p)
441 | self.set_view_matrix(view_matrix)
442 |
443 | def set_view_matrix(self, m):
444 | """Manually set view matrix
445 |
446 | Args:
447 | m (ndarray): 3x3 matrix
448 | """
449 | assert m.shape == (3,3)
450 | self.view_matrix = m
451 | self._set_nglview_orientation(self.view_matrix)
452 |
453 | def outline(self, by="all", depth=None, depth_contour_interval=3, only_backbone=False, only_ca=False, only_annotated=False, radius=None, back_outline=False, align_transmembrane=False):
454 | """Create 2D projection from coordinates and outline atoms
455 |
456 | Args:
457 | by (str, optional): Grouping to use for cartoon. Options are ["all", "residue", "chain", "domain", "topology"]. Defaults to "all".
458 |             depth (str, optional): How to deal with depth/occlusions. Options are [None, "flat", "contours"]. Defaults to None.
459 | depth_contour_interval (float, optional): Size in angstroms of contour slices into the Z-axis. Defaults to 3.
460 | only_backbone (bool, optional): Only use backbone atoms for visualization. Defaults to False.
461 | only_ca (bool, optional): Only use alpha-carbon atoms for visualization. Defaults to False.
462 | only_annotated (bool, optional): Only include residues that have an annotation in UniProt (e.g. domain or topology). Defaults to False.
463 | radius (float, optional): Explicitly pass atomic radius, otherwise infer from settings. Defaults to None.
464 | back_outline (bool, optional): Draw additional outline of entire structure at the back. Defaults to False.
465 | align_transmembrane (bool, optional): Align CoM of annotated transmembrane regions with membrane (requires UniProt data). Defaults to False.
466 |
467 | Returns:
468 |             Cartoon: Object containing residue information and outlined polygons
469 | """
470 |
471 | # check options
472 | assert by in ["all", "residue", "chain", "domain", "topology"], "Option not recognized"
473 | assert depth in [None, "flat", "contours"], "Option not recognized"
474 | # depth option doesn't affect by="residue"
475 |
476 | # collapse chain hierarchy into flat list
477 | self.residues_flat = [self.residues[c][i] for c in self.residues for i in self.residues[c]]
478 |
479 | if self.is_opm:
480 | self.set_view_matrix(np.array([[1,0,0],[0,0,1],[0,1,0]]))
481 | elif self.use_nglview:
482 | self._update_view_matrix()
483 |
484 | # transform atomic coordinates using view matrix
485 | self._apply_view_matrix()
486 |
487 | # recenter coordinates on lower left edge of bounding box
488 | offset_x = np.min(self.rotated_coord[:,0])
489 | if self.is_opm:
490 | offset_y = 0 # since OPM already aligned to membrane
491 | else:
492 | offset_y = np.min(self.rotated_coord[:,1])
493 | self.rotated_coord -= np.array([offset_x, offset_y, 0])
494 |
495 | # calculate vertical offset for transmembrane proteins
496 | if self._uniprot_xml and align_transmembrane:
497 | tm_coordinates = []
498 | for res in self.residues_flat:
499 | if res.get("topology","") == "Helical":
500 | tm_coordinates.append(np.array(self.rotated_coord[range(*res['coord_ca'])]))
501 | if len(tm_coordinates) > 0:
502 | tm_coordinates = np.concatenate(np.array(tm_coordinates))
503 | tm_com_y = np.mean(tm_coordinates[:,1])
504 | print("shifted for transmembrane region by {} angstroms".format(tm_com_y))
505 | self.rotated_coord -= np.array([0, tm_com_y, 0])
506 |
507 | self._rescale_z = lambda z: (z-np.min(self.rotated_coord[:,-1]))/(np.max(self.rotated_coord[:,-1])-np.min(self.rotated_coord[:,-1]))
508 | polygons = []
509 | groups = {}
510 | self._group_outlines = []
511 |
512 | # default radius for rendering atoms
513 | if only_ca and radius is None:
514 | radius_ = 5
515 | elif only_backbone and radius is None:
516 | radius_ = 4
517 | elif radius is None:
518 | radius_ = 1.5
519 | else:
520 | radius_ = radius
521 |
522 | if by == 'all':
523 | # space-filling outline of entire molecule
524 | self.num_groups = 1
525 | if only_ca:
526 | coord_to_outline = self.rotated_coord[self.ca_atoms]
527 | elif only_backbone:
528 | coord_to_outline = self.rotated_coord[self.backbone_atoms]
529 | else:
530 | coord_to_outline = self.rotated_coord
531 | if depth == "contours":
532 | slice_coords = split_on_labels(coord_to_outline, get_z_slice_labels(coord_to_outline, width=depth_contour_interval))
533 | for slice in slice_coords:
534 | slice_depth = self._rescale_z(np.mean(slice[:,-1]))
535 | polygons.append(({"depth":slice_depth}, so.unary_union([sg.Point(i).buffer(radius_) for i in slice])))
536 | else:
537 | # depth=None and depth=flat are equivalent for by="all"
538 | polygons.append(({}, so.unary_union([sg.Point(i).buffer(radius_) for i in coord_to_outline])))
539 | else:
540 | for res in self.residues_flat:
541 | # pick range of atomic coordinates out of main data structure
542 | if only_ca:
543 | res_coords = np.array(self.rotated_coord[range(*res['coord_ca'])])
544 | elif only_backbone:
545 |                     res_coords = np.array(self.rotated_coord[res['coord_backbone']])  # coord_backbone is an array of atom indices, not a (start, end) range
546 | else:
547 | res_coords = np.array(self.rotated_coord[range(*res['coord'])])
548 | res["xyz"] = res_coords
549 |
550 | if by == 'residue':
551 | for res in sorted(self.residues_flat, key=lambda res: np.mean(res["xyz"][:,-1])):
552 |                     group_outline = so.unary_union([sg.Point(i).buffer(radius_) for i in res["xyz"]])
553 | res["polygon"] = group_outline
554 | res["depth"] = self._rescale_z(np.mean(res["xyz"][:,-1]))
555 | polygons.append((res, group_outline))
556 | self.num_groups = 1
557 |
558 | elif by in ['domain', 'topology', 'chain']:
559 |
560 | if by in ['domain', 'topology']:
561 | assert(self._uniprot_xml is not None)
562 |
563 | # TODO comment code and be consistent with variable names group vs region
564 | residue_groups = group_by(self.residues_flat, key=lambda x: x.get(by))
565 | groups = sorted(residue_groups.keys(), key=lambda x: (x is None, x))
566 |
567 | self.num_groups = len(residue_groups)
568 | region_atoms = dict() # residue group to atomic indices
569 | total_atoms = 0
570 | for k,v in residue_groups.items():
571 | region_atoms[k] = []
572 | for res in v:
573 | if only_ca:
574 | region_atoms[k].extend(range(*res['coord_ca']))
575 | elif only_backbone:
576 |                             region_atoms[k].extend(res['coord_backbone'])  # coord_backbone is an array of atom indices, not a (start, end) range
577 | else:
578 | region_atoms[k].extend(range(*res['coord']))
579 | region_atoms[k] = np.array(region_atoms[k], dtype=int)
580 | total_atoms += len(region_atoms[k])
581 |
582 | if depth is not None:
583 |
584 | slice_labels = get_z_slice_labels(self.rotated_coord, width=depth_contour_interval)
585 | num_slices = np.max(slice_labels)+1
586 |
587 | if depth == "contours":
588 | for s in range(num_slices):
589 | for group_i, (group_name, group_res) in enumerate(sorted(residue_groups.items(), key=lambda x: (x[0] is None, x))):
590 | if not only_annotated or group_name is not None:
591 | atom_indices = region_atoms[group_name]
592 | slice_coords = self.rotated_coord[atom_indices][slice_labels[atom_indices] == s]
593 | if len(slice_coords) > 0:
594 | slice_depth = self._rescale_z(np.mean(slice_coords[:,-1]))
595 | slice_outline = so.unary_union([sg.Point(c).buffer(radius_) for c in slice_coords])
596 | polygons.append(({by:group_name, "depth":slice_depth}, slice_outline))
597 |
598 | # back outline to highlight each group's contours... just duplicating depth==flat code here
599 | if back_outline:
600 | empty_polygon = sg.Point((0,0)).buffer(0)
601 | view_object = empty_polygon
602 | region_polygons = dict()
603 |                             for slice in range(num_slices-1, -1, -1):  # slice labels run from 0 to num_slices-1
604 | for group_i, (group_name, group_res) in enumerate(sorted(residue_groups.items(), key=lambda x: (x[0] is None, x))):
605 | if not only_annotated or group_name is not None:
606 | atom_indices = region_atoms[group_name]
607 | slice_coords = self.rotated_coord[atom_indices][slice_labels[atom_indices] == slice]
608 | poly = so.unary_union([sg.Point(c).buffer(radius_) for c in slice_coords])
609 | this_difference = poly.difference(view_object)
610 | region_polygons[group_name] = region_polygons.get(group_name, empty_polygon).union(this_difference.buffer(0.01))
611 | view_object = view_object.union(this_difference.buffer(0.01))
612 |
613 | for v in region_polygons.values():
614 | self._group_outlines.append(v)
615 |
616 | elif depth == "flat":
617 | empty_polygon = sg.Point((0,0)).buffer(0)
618 | view_object = empty_polygon
619 | region_polygons = dict()
620 |                         for slice in range(num_slices-1, -1, -1):  # slice labels run from 0 to num_slices-1
621 | for group_i, (group_name, group_res) in enumerate(sorted(residue_groups.items(), key=lambda x: (x[0] is None, x))):
622 | if not only_annotated or group_name is not None:
623 | atom_indices = region_atoms[group_name]
624 | slice_coords = self.rotated_coord[atom_indices][slice_labels[atom_indices] == slice]
625 | poly = so.unary_union([sg.Point(c).buffer(radius_) for c in slice_coords])
626 | this_difference = poly.difference(view_object)
627 | region_polygons[group_name] = region_polygons.get(group_name, empty_polygon).union(this_difference.buffer(0.01))
628 | view_object = view_object.union(this_difference.buffer(0.01))
629 |
630 | for k,v in region_polygons.items():
631 | polygons.append(({by:k}, v))
632 |
633 | else:
634 | for group_i, (group_name, group_res) in enumerate(residue_groups.items()):
635 | if not only_annotated or group_name is not None:
636 | group_coords = self.rotated_coord[region_atoms[group_name]]
637 | polygons.append(({by:group_name}, so.unary_union([sg.Point(i).buffer(radius_) for i in group_coords])))
638 |
639 | if back_outline:
640 | self._back_outline = so.unary_union([p[1].buffer(0.01) for p in polygons])
641 | else:
642 | self._back_outline = None
643 |
644 | print("Outlined {} polygons!".format(len(polygons)), file=sys.stderr)
645 |
646 | return cellscape.Cartoon(self.name, polygons, self.residues_flat, by, self._back_outline, self._group_outlines, self.num_groups, get_dimensions(self.rotated_coord), groups)
647 |
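The outlining strategy used by `outline()` with `depth="contours"` can be illustrated in isolation: atoms are binned along Z, and each bin is drawn as the union of discs around the projected atom positions. The following is a simplified sketch, not the method itself; the function names `z_slice_labels` and `outline_by_slice` and the toy coordinates are assumptions made for illustration, and grouping, rescaled depth values and view matrices are left out.

```python
import numpy as np
import shapely.geometry as sg
import shapely.ops as so


def z_slice_labels(xyz, width):
    """Assign each atom to a Z bin of the given width, numbered from 0."""
    binned = (xyz[:, -1] / width).astype(int)
    return binned - np.min(binned)


def outline_by_slice(xyz, radius=1.5, width=3.0):
    """Return one 2D outline (union of discs) per non-empty Z slice."""
    labels = z_slice_labels(xyz, width)
    outlines = []
    for s in range(np.max(labels) + 1):
        atoms = xyz[labels == s]
        if len(atoms) == 0:
            continue  # skip bins that contain no atoms
        # project onto the XY plane and draw a disc around each atom
        discs = [sg.Point(x, y).buffer(radius) for x, y, _ in atoms]
        outlines.append(so.unary_union(discs))
    return outlines


# toy coordinates (hypothetical): two distant atoms in one slice, one atom in the next
toy_xyz = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 0.0, 4.0]])
outlines = outline_by_slice(toy_xyz)
print([o.geom_type for o in outlines])
```

Each returned geometry can be a `Polygon` or a `MultiPolygon`, depending on whether the discs in a slice overlap, so downstream plotting code has to handle both.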
--------------------------------------------------------------------------------
/cellscape/util.py:
--------------------------------------------------------------------------------
1 | amino_acid_3letter = {'ALA':'A',
2 | 'ASX':'B',
3 | 'CYS':'C',
4 | 'ASP':'D',
5 | 'GLU':'E',
6 | 'PHE':'F',
7 | 'GLY':'G',
8 | 'HIS':'H',
9 | 'ILE':'I',
10 | 'LYS':'K',
11 | 'LEU':'L',
12 | 'MET':'M',
13 | 'MSE':'M',
14 | 'ASN':'N',
15 | 'PRO':'P',
16 | 'GLN':'Q',
17 | 'ARG':'R',
18 | 'SER':'S',
19 | 'THR':'T',
20 | 'VAL':'V',
21 | 'TRP':'W',
22 | 'XAA':'X',
23 | 'TYR':'Y',
24 | 'GLX':'Z'}
25 |
26 | def group_by(l, key):
27 | """Take a list of dictionaries and group them according to a key."""
28 | d = dict()
29 | for i in l:
30 | k = key(i)
31 | if k in d:
32 | d[k].append(i)
33 | else:
34 | d[k] = [i]
35 | return d
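A short usage sketch for `group_by`, mirroring how residues are grouped by annotation in structure.py. The function is re-declared here in condensed form so the example runs on its own, and the residue dictionaries are made-up sample data. Residues without the annotation end up under the key `None`, and the `(name is None, name)` sort key places that group last without ever comparing `None` to a string.

```python
def group_by(items, key):
    """Group a list of dictionaries according to a key function."""
    groups = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return groups


# made-up residues; the third one has no "domain" annotation
residues = [
    {"id": 1, "domain": "Ig-like V-type"},
    {"id": 2, "domain": "Ig-like V-type"},
    {"id": 3},
    {"id": 4, "domain": "Ig-like C2-type"},
]

residue_groups = group_by(residues, key=lambda r: r.get("domain"))

# annotated groups in alphabetical order, unannotated (None) group last
ordered = sorted(residue_groups, key=lambda name: (name is None, name))
for name in ordered:
    print(name, [r["id"] for r in residue_groups[name]])
```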
--------------------------------------------------------------------------------
/examples/ceacam5/P06731.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | P06731
5 | H9KVA7
6 | CEAM5_HUMAN
7 |
8 |
9 | Carcinoembryonic antigen-related cell adhesion molecule 5
10 |
11 |
12 | Carcinoembryonic antigen
13 | CEA
14 |
15 |
16 | Meconium antigen 100
17 |
18 | CD66e
19 |
20 |
21 | CEACAM5
22 | CEA
23 |
24 |
25 | Homo sapiens
26 | Human
27 |
28 |
29 | Eukaryota
30 | Metazoa
31 | Chordata
32 | Craniata
33 | Vertebrata
34 | Euteleostomi
35 | Mammalia
36 | Eutheria
37 | Euarchontoglires
38 | Primates
39 | Haplorrhini
40 | Catarrhini
41 | Hominidae
42 | Homo
43 |
44 |
45 |
46 |
47 | Isolation and characterization of full-length functional cDNA clones for human carcinoembryonic antigen.
48 |
49 |
50 |
51 |
52 |
53 |
54 |
55 |
56 |
57 |
58 | NUCLEOTIDE SEQUENCE [GENOMIC DNA]
59 | VARIANT GLU-398
60 |
61 |
62 |
63 | Carcinoembryonic antigen family: characterization of cDNAs coding for NCA and CEA and suggestion of nonrandom sequence variation in their conserved loop-domains.
64 |
65 |
66 |
67 |
68 |
69 |
70 |
71 |
72 |
73 | NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM 1)
74 | VARIANT GLU-398
75 |
76 |
77 |
78 | Cloning of the complete gene for carcinoembryonic antigen: analysis of its promoter indicates a region conveying cell type-specific expression.
79 |
80 |
81 |
82 |
83 |
84 |
85 |
86 |
87 |
88 |
89 |
90 |
91 |
92 |
93 | NUCLEOTIDE SEQUENCE [GENOMIC DNA]
94 | VARIANT GLU-398
95 |
96 |
97 |
98 | The DNA sequence and biology of human chromosome 19.
202 | NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA]
203 |
204 |
205 |
206 | Primary structure of human carcinoembryonic antigen (CEA) deduced from cDNA sequence.
207 |
208 |
209 |
210 |
211 |
212 |
213 |
214 |
215 | NUCLEOTIDE SEQUENCE [MRNA] OF 5-702 (ISOFORM 2)
216 | VARIANT GLU-398
217 |
218 |
219 |
220 | Isolation and characterization of cDNA clones encoding the human carcinoembryonic antigen reveal a highly conserved repeating structure.
221 |
222 |
223 |
224 |
225 |
226 |
227 |
228 |
229 |
230 | NUCLEOTIDE SEQUENCE [MRNA] OF 331-702 (ISOFORM 1)
231 | VARIANT GLU-398
232 |
233 |
234 |
235 | Cell adhesion activity of non-specific cross-reacting antigen (NCA) and carcinoembryonic antigen (CEA) expressed on CHO cell surface: homophilic and heterophilic adhesion.
236 |
237 |
238 |
239 |
240 |
241 |
242 |
243 |
244 |
245 |
246 |
247 | FUNCTION
248 | SUBCELLULAR LOCATION
249 |
250 |
251 |
252 | Expression of complementary DNA and genomic clones for carcinoembryonic antigen and nonspecific cross-reacting antigen in Chinese hamster ovary and mouse fibroblast cells and characterization of the membrane-expressed products.
253 |
254 |
255 |
256 |
257 |
258 |
259 |
260 |
261 |
262 |
263 | SUBCELLULAR LOCATION
264 | GPI-ANCHOR AT ALA-685
265 |
266 |
267 |
268 | Four carcinoembryonic antigen subfamily members, CEA, NCA, BGP and CGM2, selectively expressed in the normal human colonic epithelium, are integral components of the fuzzy coat.
269 |
270 |
271 |
272 |
273 |
274 |
275 |
276 |
277 | SUBCELLULAR LOCATION
278 | TISSUE SPECIFICITY
279 |
280 |
281 |
282 | Human carcinoembryonic antigen functions as a general inhibitor of anoikis.
283 |
284 |
285 |
286 |
287 |
288 |
289 |
290 |
291 | FUNCTION
292 |
293 |
294 |
295 | Self recognition in the Ig superfamily. Identification of precise subdomains in carcinoembryonic antigen required for intercellular adhesion.
296 |
297 |
298 |
299 |
300 |
301 |
302 |
303 |
304 |
305 |
306 |
307 | FUNCTION
308 | MUTAGENESIS OF SER-66; TYR-68; LYS-69 AND GLN-78
309 | SUBCELLULAR LOCATION
310 |
311 |
312 |
313 | Identification of N-linked glycoproteins in human saliva by glycoprotein capture and mass spectrometry.
314 |
315 |
316 |
317 |
318 |
319 |
320 |
321 |
322 |
323 |
324 |
325 | GLYCOSYLATION [LARGE SCALE ANALYSIS] AT ASN-560
326 |
327 | Saliva
328 |
329 |
330 |
331 |
332 | Glycoproteomics analysis of human liver tissue by combination of multiple enzyme digestion and hydrazide chemistry.
333 |
334 |
335 |
336 |
337 |
338 |
339 |
340 |
341 |
342 |
343 |
344 |
345 |
346 | GLYCOSYLATION [LARGE SCALE ANALYSIS] AT ASN-246
347 |
348 | Liver
349 |
350 |
351 |
352 |
353 | Diverse oligomeric states of CEACAM IgV domains.
354 |
355 |
356 |
357 |
358 |
359 |
360 |
361 |
362 |
363 |
364 | SUBUNIT
365 |
366 |
367 |
368 | Structural models for carcinoembryonic antigen and its complex with the single-chain Fv antibody molecule MFE23.
369 |
370 |
371 |
372 |
373 |
374 |
375 |
376 | 3D-STRUCTURE MODELING OF 35-676
377 |
378 |
379 |
380 | Binding of Dr adhesins of Escherichia coli to carcinoembryonic antigen triggers receptor dissociation.
381 |
382 |
383 |
384 |
385 |
386 |
387 |
388 |
389 |
390 |
391 |
392 |
393 |
394 |
395 |
396 | X-RAY CRYSTALLOGRAPHY (1.95 ANGSTROMS) OF 34-144 IN COMPLEX WITH E.COLI DR ADHESIN
397 | FUNCTION (MICROBIAL INFECTION)
398 | SUBUNIT
399 | SUBCELLULAR LOCATION
400 | MUTAGENESIS OF PHE-63; SER-66; VAL-73; ASP-74; GLN-78; ILE-125; LEU-129 AND GLU-133
401 |
402 |
403 | Cell surface glycoprotein that plays a role in cell adhesion, intracellular signaling and tumor progression (PubMed:2803308, PubMed:10910050, PubMed:10864933). Mediates homophilic and heterophilic cell adhesion with other carcinoembryonic antigen-related cell adhesion molecules, such as CEACAM6 (PubMed:2803308). Plays a role as an oncogene by promoting tumor progression; induces resistance to anoikis of colorectal carcinoma cells (PubMed:10910050).
404 |
405 |
406 | (Microbial infection) Receptor for E.coli Dr adhesins. Binding of E.coli Dr adhesins leads to dissociation of the homodimer.
407 |
408 |
409 | Homodimer.
410 |
411 |
412 |
413 | P06731
414 |
415 |
416 | P06731
417 |
418 |
419 | false
420 | 6
421 |
422 |
423 |
424 | P06731
425 |
426 |
427 | K0BRG7
428 |
429 |
430 | true
431 | 4
432 |
433 |
434 |
435 | Cell membrane
436 | Lipid-anchor
437 | GPI-anchor
438 |
439 |
440 | Apical cell membrane
441 |
442 |
443 | Cell surface
444 |
445 | Localized to the apical glycocalyx surface.
446 |
447 |
448 |
449 |
450 | P06731-1
451 | 1
452 |
453 |
454 |
455 | P06731-2
456 | 2
457 |
458 |
459 |
460 |
461 | Expressed in columnar epithelial and goblet cells of the colon (at protein level) (PubMed:10436421). Found in adenocarcinomas of endodermally derived digestive system epithelium and fetal colon.
462 |
463 |
464 | Complex immunoreactive glycoprotein with a MW of 180 kDa comprising 60% carbohydrate.
465 |
466 |
467 | Belongs to the immunoglobulin superfamily. CEA family.
468 |
469 |
470 |
471 |
472 |
473 |
474 |
475 |
476 |
477 |
478 |
479 |
480 |
481 |
482 |
483 |
484 |
485 |
486 |
487 |
488 |
489 |
490 |
491 |
492 |
493 |
494 |
495 |
496 |
497 |
498 |
499 |
500 |
501 |
502 |
503 |
504 |
505 |
506 |
507 |
508 |
509 |
510 |
511 |
512 |
513 |
514 |
515 |
516 |
517 |
518 |
519 |
520 |
521 |
522 |
523 |
524 |
525 |
526 |
527 |
528 |
529 |
530 |
531 |
532 |
533 |
534 |
535 |
536 |
537 |
538 |
539 |
540 |
541 |
542 |
543 |
544 |
545 |
546 |
547 |
548 |
549 |
550 |
551 |
552 |
553 |
554 |
555 |
556 |
557 |
558 |
559 |
560 |
561 |
562 |
563 |
564 |
565 |
566 |
567 |
568 |
569 |
570 |
571 |
572 |
573 |
574 |
575 |
576 |
577 |
578 |
579 |
580 |
581 |
582 |
583 |
584 |
585 |
586 |
587 |
588 |
589 |
590 |
591 |
592 |
593 |
594 |
595 |
596 |
597 |
598 |
599 |
600 |
601 |
602 |
603 |
604 |
605 |
606 |
607 |
608 |
609 |
610 |
611 |
612 |
613 |
614 |
615 |
616 |
617 |
618 |
619 |
620 |
621 |
622 |
623 |
624 |
625 |
626 |
627 |
628 |
629 |
630 |
631 |
632 |
633 |
634 |
635 |
636 |
637 |
638 |
639 |
640 |
641 |
642 |
643 |
644 |
645 |
646 |
647 |
648 |
649 |
650 |
651 |
652 |
653 |
654 |
655 |
656 |
657 |
658 |
659 |
660 |
661 |
662 |
663 |
664 |
665 |
666 |
667 |
668 |
669 |
670 |
671 |
672 |
673 |
674 |
675 |
676 |
677 |
678 |
679 |
680 |
681 |
682 |
683 |
684 |
685 |
686 |
687 |
688 |
689 |
690 |
691 |
692 |
693 |
694 |
695 |
696 |
697 |
698 |
699 |
700 |
701 |
702 |
703 |
704 |
705 |
706 |
707 |
708 |
709 |
710 |
711 |
712 |
713 |
714 |
715 |
716 |
717 |
718 |
719 |
720 |
721 |
722 |
723 |
724 |
725 |
726 |
727 |
728 |
729 |
730 |
731 |
732 |
733 |
734 |
735 |
736 |
737 |
738 |
739 |
740 |
741 |
742 |
743 |
744 |
745 |
746 |
747 |
748 |
749 |
750 |
751 |
752 |
753 |
754 |
755 |
756 |
757 |
758 |
759 |
760 |
761 |
762 |
763 |
764 |
765 |
766 |
767 |
768 |
769 |
770 |
771 |
772 |
773 |
774 |
775 |
776 |
777 |
778 |
779 |
780 |
781 |
782 |
783 |
784 |
785 |
786 |
787 |
788 |
789 |
790 |
791 |
792 |
793 |
794 |
795 |
796 |
797 |
798 |
799 |
800 |
801 |
802 |
803 |
804 |
805 |
806 |
807 |
808 |
809 |
810 |
811 |
812 |
813 |
814 |
815 |
816 |
817 |
818 |
819 |
820 |
821 |
822 |
823 |
824 |
825 |
826 |
827 |
828 |
829 |
830 |
831 |
832 |
833 |
834 |
835 |
836 |
837 |
838 |
839 |
840 |
841 |
842 |
843 |
844 |
845 |
846 |
847 |
848 |
849 |
850 |
851 | 3D-structure
852 | Alternative splicing
853 | Apoptosis
854 | Cell adhesion
855 | Cell membrane
856 | Disulfide bond
857 | Glycoprotein
858 | GPI-anchor
859 | Immunoglobulin domain
860 | Lipoprotein
861 | Membrane
862 | Oncogene
863 | Polymorphism
864 | Reference proteome
865 | Repeat
866 | Signal
867 |
868 |
869 |
870 |
871 |
872 |
873 |
874 |
875 |
876 |
877 |
878 |
879 |
880 |
881 |
882 |
883 |
884 |
885 |
886 |
887 |
888 |
889 |
890 |
891 |
892 |
893 |
894 |
895 |
896 |
897 |
898 |
899 |
900 |
901 |
902 |
903 |
904 |
905 |
906 |
907 |
908 |
909 |
910 |
911 |
912 |
913 |
914 |
915 |
916 |
917 |
918 |
919 |
920 |
921 |
922 |
923 |
924 |
925 |
926 |
927 |
928 |
929 |
930 |
931 |
932 |
933 |
934 |
935 |
936 |
937 |
938 |
939 |
940 |
941 |
942 |
943 |
944 |
945 |
946 |
947 |
948 |
949 |
950 |
951 |
952 |
953 |
954 |
955 |
956 |
957 |
958 |
959 |
960 |
961 |
962 |
963 |
964 |
965 |
966 |
967 |
968 |
969 |
970 |
971 |
972 |
973 |
974 |
975 |
976 |
977 |
978 |
979 |
980 |
981 |
982 |
983 |
984 |
985 |
986 |
987 |
988 |
989 |
990 |
991 |
992 |
993 |
994 |
995 |
996 |
997 |
998 |
999 |
1000 |
1001 |
1002 |
1003 |
1004 |
1005 |
1006 |
1007 |
1008 |
1009 |
1010 |
1011 |
1012 |
1013 |
1014 |
1015 |
1016 |
1017 |
1018 |
1019 |
1020 |
1021 |
1022 |
1023 |
1024 |
1025 |
1026 |
1027 |
1028 |
1029 |
1030 |
1031 |
1032 |
1033 |
1034 |
1035 |
1036 |
1037 |
1038 |
1039 |
1040 |
1041 |
1042 |
1043 |
1044 |
1045 |
1046 |
1047 |
1048 |
1049 |
1050 |
1051 |
1052 |
1053 |
1054 |
1055 |
1056 |
1057 |
1058 |
1059 |
1060 |
1061 |
1062 |
1063 |
1064 |
1065 |
1066 |
1067 |
1068 |
1069 |
1070 |
1071 |
1072 |
1073 |
1074 |
1075 |
1076 |
1077 |
1078 |
1079 |
1080 |
1081 |
1082 |
1083 |
1084 |
1085 |
1086 |
1087 |
1088 |
1089 |
1090 |
1091 |
1092 |
1093 |
1094 |
1095 |
1096 |
1097 |
1098 |
1099 |
1100 |
1101 |
1102 |
1103 |
1104 |
1105 |
1106 |
1107 |
1108 |
1109 |
1110 |
1111 |
1112 |
1113 |
1114 | I
1115 | V
1116 |
1117 |
1118 |
1119 |
1120 |
1121 | V
1122 | A
1123 |
1124 |
1125 |
1126 |
1127 |
1128 | Q
1129 | P
1130 |
1131 |
1132 |
1133 |
1134 |
1135 | A
1136 | D
1137 |
1138 |
1139 |
1140 |
1141 |
1142 | K
1143 | E
1144 |
1145 |
1146 |
1147 |
1148 |
1149 | R
1150 | S
1151 |
1152 |
1153 |
1154 |
1155 |
1156 | G
1157 | R
1158 |
1159 |
1160 |
1161 |
1162 |
1163 | F
1164 | I
1165 |
1166 |
1167 |
1168 |
1169 |
1170 | F
1171 | R
1172 |
1173 |
1174 |
1175 |
1176 |
1177 | S
1178 | N
1179 |
1180 |
1181 |
1182 |
1183 |
1184 | Y
1185 | A
1186 |
1187 |
1188 |
1189 |
1190 |
1191 | Y
1192 | F
1193 |
1194 |
1195 |
1196 |
1197 |
1198 | K
1199 | A
1200 |
1201 |
1202 |
1203 |
1204 |
1205 | V
1206 | A
1207 |
1208 |
1209 |
1210 |
1211 |
1212 | D
1213 | A
1214 |
1215 |
1216 |
1217 |
1218 |
1219 | D
1220 | L
1221 | R
1222 |
1223 |
1224 |
1225 |
1226 |
1227 | Q
1228 | L
1229 | R
1230 |
1231 |
1232 |
1233 |
1234 |
1235 | I
1236 | A
1237 |
1238 |
1239 |
1240 |
1241 |
1242 | L
1243 | A
1244 | C
1245 |
1246 |
1247 |
1248 |
1249 |
1250 | L
1251 | S
1252 |
1253 |
1254 |
1255 |
1256 |
1257 | E
1258 | A
1259 |
1260 |
1261 |
1262 |
1263 |
1264 | F
1265 | L
1266 |
1267 |
1268 |
1269 |
1270 |
1271 | T
1272 | Q
1273 |
1274 |
1275 |
1276 |
1277 |
1278 | V
1279 | A
1280 |
1281 |
1282 |
1283 |
1284 |
1285 |
1286 |
1287 |
1288 |
1289 |
1290 |
1291 |
1292 |
1293 |
1294 |
1295 |
1296 |
1297 |
1298 |
1299 |
1300 |
1301 |
1302 |
1303 |
1304 |
1305 |
1306 |
1307 |
1308 |
1309 |
1310 |
1311 |
1312 |
1313 |
1314 |
1315 |
1316 |
1317 |
1318 |
1319 |
1320 |
1321 |
1322 |
1323 |
1324 |
1325 |
1326 |
1327 |
1328 |
1329 |
1330 |
1331 |
1332 |
1333 |
1334 |
1335 |
1336 |
1337 |
1338 |
1339 |
1340 |
1341 |
1342 |
1343 |
1344 |
1345 |
1346 |
1347 |
1348 |
1349 |
1350 |
1351 |
1352 |
1353 |
1354 |
1355 |
1356 |
1357 |
1358 |
1359 |
1360 |
1361 |
1362 |
1363 |
1364 |
1365 |
1366 |
1367 |
1368 |
1369 |
1370 |
1371 |
1372 |
1373 |
1374 |
1375 |
1376 |
1377 |
1378 |
1379 |
1380 |
1381 |
1382 |
1383 |
1384 |
1385 |
1386 |
1387 |
1388 |
1389 |
1390 |
1391 |
1392 |
1393 |
1394 |
1395 |
1396 |
1397 |
1398 |
1399 |
1400 |
1401 |
1402 |
1403 |
1404 |
1405 |
1406 |
1407 |
1408 |
1409 |
1410 |
1411 |
1412 |
1413 |
1414 |
1415 |
1416 |
1417 |
1418 |
1419 |
1420 |
1421 |
1422 |
1423 |
1424 |
1425 |
1426 |
1427 |
1428 |
1429 |
1430 |
1431 |
1432 |
1433 |
1434 |
1435 |
1436 |
1437 |
1438 |
1439 |
1440 |
1441 |
1442 |
1443 |
1444 |
1445 |
1446 |
1447 |
1448 |
1449 |
1450 |
1451 |
1452 |
1453 | MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSISSNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTLTLFNVTRNDTASYKCETQNPVSARRSDSVILNVLYGPDAPTISPLNTSYRSGENLNLSCHAASNPPAQYSWFVNGTFQQSTQELFIPNITVNNSGSYTCQAHNSDTGLNRTTVTTITVYAEPPKPFITSNNSNPVEDEDAVALTCEPEIQNTTYLWWVNNQSLPVSPRLQLSNDNRTLTLLSVTRNDVGPYECGIQNKLSVDHSDPVILNVLYGPDDPTISPSYTYYRPGVNLSLSCHAASNPPAQYSWLIDGNIQQHTQELFISNITEKNSGLYTCQANNSASGHSRTTVKTITVSAELPKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLSNGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISPPDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNNGTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVALI
1454 |
1455 |
1456 | Copyrighted by the UniProt Consortium, see https://www.uniprot.org/terms
1457 | Distributed under the Creative Commons Attribution (CC BY 4.0) License
1458 |
1459 |
--------------------------------------------------------------------------------
/examples/ig/view:
--------------------------------------------------------------------------------
1 | -1.910422069964390068e-01 -5.326551199259592639e-01 8.244863191311111450e-01
2 | 9.099226597463622168e-01 2.189434646111362570e-01 3.522841670538991998e-01
3 | -3.681638186629090370e-01 8.175240438430914081e-01 4.428475206470018910e-01
4 |
--------------------------------------------------------------------------------
/ig.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jordisr/cellscape/f1ee7b480440825cea2ddffc4db029bf0d240ea2/ig.png
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
1 | [build-system]
2 | requires = ["setuptools>=42"]
3 | build-backend = "setuptools.build_meta"
4 |
--------------------------------------------------------------------------------
/setup.cfg:
--------------------------------------------------------------------------------
1 | [metadata]
2 | name = cellscape
3 | version = 0.0.0
4 | author = Jordi Silvestre-Ryan
5 | description = Protein structure visualization with vector graphics cartoons
6 | long_description = file: README.md
7 | long_description_content_type = text/markdown
8 | url = https://github.com/jordisr/cellscape
9 | project_urls =
10 |     Bug Tracker = https://github.com/jordisr/cellscape/issues
11 | classifiers =
12 |     Programming Language :: Python :: 3
13 |     Operating System :: OS Independent
14 |
15 | [options]
16 | packages = find:
17 | python_requires = >=3.6
18 | install_requires =
19 |     numpy
20 |     scipy
21 |     matplotlib
22 |     shapely<1.8
23 |     biopython>=1.75
24 |     nglview
25 |
26 | [options.entry_points]
27 | console_scripts =
28 |     cellscape = cellscape.cli:main
29 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from setuptools import setup
2 |
3 | if __name__ == '__main__':
4 |     setup()
5 |
--------------------------------------------------------------------------------