├── .gitignore
├── CITATION.bib
├── LICENSE
├── README.md
├── cellscape
    ├── __init__.py
    ├── cartoon.py
    ├── cli.py
    ├── interface.py
    ├── parse_alignment.py
    ├── parse_uniprot_xml.py
    ├── scene.py
    ├── structure.py
    └── util.py
├── examples
    ├── cartoon.ipynb
    ├── ceacam5
    │   ├── P06731.xml
    │   └── ceacam5.pdb
    └── ig
        ├── 1igt.pdb
        └── view
├── ig.png
├── pyproject.toml
├── setup.cfg
└── setup.py


/.gitignore:
--------------------------------------------------------------------------------
1 | __pycache__
2 | *.egg-info
3 | *.DS_Store
4 | */.ipynb_checkpoints
5 | 


--------------------------------------------------------------------------------
/CITATION.bib:
--------------------------------------------------------------------------------
 1 | @article {Silvestre-Ryan2022.06.14.495869,
 2 | 	author = {Silvestre-Ryan, Jordi and Fletcher, Daniel A. and Holmes, Ian},
 3 | 	title = {CellScape: Protein structure visualization with vector graphics cartoons},
 4 | 	elocation-id = {2022.06.14.495869},
 5 | 	year = {2022},
 6 | 	doi = {10.1101/2022.06.14.495869},
 7 | 	publisher = {Cold Spring Harbor Laboratory},
 8 | 	abstract = {Motivation: Illustrative renderings of proteins are useful aids for scientific communication and education. Nevertheless, few software packages exist to automate the generation of these visualizations. Results: We introduce CellScape, a tool designed to generate 2D molecular cartoons from atomic coordinates and combine them into larger cellular scenes. These illustrations can outline protein regions in different levels of detail. Unlike most molecular visualization tools which use raster image formats, these illustrations are represented as vector graphics, making them easily editable and composable with other graphics. Availability and Implementation: CellScape is implemented in Python 3 and freely available at https://github.com/jordisr/cellscape. It can be run as a command-line tool or interactively in a Jupyter notebook. Competing Interest Statement: The authors have declared no competing interest.},
 9 | 	URL = {https://www.biorxiv.org/content/early/2022/06/16/2022.06.14.495869},
10 | 	eprint = {https://www.biorxiv.org/content/early/2022/06/16/2022.06.14.495869.full.pdf},
11 | 	journal = {bioRxiv}
12 | }
13 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2022 Jordi Silvestre-Ryan
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # CellScape: Protein structure visualization with vector graphics cartoons
 2 | <img src="ig.png" alt="logo" width=700/>
 3 | 
 4 | ## Installation
 5 | To run CellScape you will need:
 6 | * Python 3
 7 | * [PyMOL](https://pymol.org/2/) or [Chimera](https://www.cgl.ucsf.edu/chimera/) (optional, needed to orient the protein if not using the Jupyter notebook interface)
 8 | 
 9 | CellScape and its dependencies can be installed with:
10 | 
11 | ```
12 | git clone https://github.com/jordisr/cellscape
13 | cd cellscape
14 | pip install -e .
15 | ```
16 | 
17 | ## Making a cartoon from a PDB structure
18 | 
19 | ### Jupyter notebook interface
20 | The most interactive way of building cartoons is through the Python package interface. An example notebook is provided [here](examples/cartoon.ipynb).
21 | 
22 | ### Command-line interface
23 | 
24 | Cartoons can also be built in one go from the command line, as illustrated below.
25 | 
26 | #### Generating molecular outlines
27 | The following examples should yield images similar to the top figure (from right to left):
28 | 
29 | The simplest visualization is a space-filling outline of the entire structure.
30 | The `--view` option specifies the camera rotation matrix (see [below](#exporting-the-camera-view)).
31 | 
32 | ```
33 | cellscape cartoon --pdb examples/ig/1igt.pdb --view examples/ig/view --outline all --save outline_all.svg
34 | ```
35 | 
36 | The `--outline` option specifies which regions of the protein to outline (each residue, each chain, the entire molecule, etc.).
37 | In the following example we outline each chain separately.
38 | The `--depth flat` option ensures that if the chains overlap, only the portion that is visible (i.e. closer to the camera) is incorporated into the outline.
39 | 
40 | ```
41 | cellscape cartoon --pdb examples/ig/1igt.pdb --view examples/ig/view --outline chain --depth flat --save outline_chain.svg
42 | ```
43 | 
44 | The most realistic visualization outlines each residue separately.
45 | Shading by residue depth is used to simulate 3D lighting in a style inspired by [David Goodsell](https://pdb101.rcsb.org/motm/21).
46 | 
47 | ```
48 | cellscape cartoon --pdb examples/ig/1igt.pdb --view examples/ig/view --outline residue --color_by chain --depth_shading --depth_lines --save outline_residue.svg
49 | ```
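
The depth shading described above can be sketched with the standard library alone. This is a simplified stand-in for `shade_from_color` in `cellscape/cartoon.py`: it takes a plain RGB tuple instead of a matplotlib color, and the names `shade_rgb` and `lightness` are illustrative only. It assumes depth is normalized to the range 0 to 1, with larger values closer to the camera.

```python
import colorsys

def shade_rgb(rgb, x, shading_range=0.4):
    """Interpolate lightness around the base color according to depth x in [0, 1]."""
    r, g, b = rgb
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    # Darkest and lightest allowed values, clamped to the valid lightness range
    l_dark = max(l - shading_range / 2, 0.0)
    l_light = min(l + shading_range / 2, 1.0)
    # x = 0 gives the darkest shade, x = 1 the lightest
    l_new = l_dark * (1 - x) + l_light * x
    return colorsys.hls_to_rgb(h, l_new, s)

def lightness(rgb):
    """Return the HLS lightness of an RGB tuple."""
    return colorsys.rgb_to_hls(*rgb)[1]

base = (0.2, 0.4, 0.8)       # example base color for one chain
back = shade_rgb(base, 0.0)  # residue far from the camera
front = shade_rgb(base, 1.0) # residue close to the camera
```

Only the lightness channel changes, so residues of one chain keep the same hue while appearing darker toward the back.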
50 | 
51 | A full description of all options is available by running `cellscape cartoon -h`.
52 | 
53 | ### Exporting the camera view
54 | The camera orientation can be set interactively through the Jupyter notebook interface. To use the command-line interface, however, you will need a separate file containing the rotation matrix.
55 | One option is to export it from another molecular visualization tool (currently PyMOL and Chimera formats are supported).
56 | 
57 | #### PyMOL
58 | Open the protein structure in PyMOL, and choose the desired rotation (zoom is irrelevant). Next, enter `get_view` in the PyMOL console. The output should look something like this:
59 | ```
60 | ### cut below here and paste into script ###
61 | set_view (\
62 |     -0.273240060,   -0.516133010,    0.811750829,\
63 |      0.870557129,    0.226309016,    0.436930388,\
64 |     -0.409222305,    0.826064587,    0.387488008,\
65 |      0.000000000,    0.000000000, -544.673034668,\
66 |     -0.071666718,  -17.390396118,    8.293336868,\
67 |    455.182373047,  634.163574219,  -20.000000000 )
68 | ### cut above here and paste into script ###
69 | ```
70 | Copy and paste the indicated region (between the ### lines) into a new text file, which can be passed to CellScape.
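
As a rough illustration of what such a file contains, the sketch below pulls the rotation out of a `get_view` block. It is not CellScape's own parser: `rotation_from_pymol_view` is a hypothetical helper, and the rows are returned in the order PyMOL prints them, without assuming whether CellScape transposes the matrix.

```python
import re

def rotation_from_pymol_view(text):
    """Return the first nine numbers of a PyMOL get_view block as three rows."""
    # Match signed decimal numbers such as -0.273240060 or 455.182373047
    numbers = [float(tok) for tok in re.findall(r"-?\d+\.\d+", text)]
    if len(numbers) != 18:
        raise ValueError("expected 18 numbers in a get_view block, found %d" % len(numbers))
    # The first nine values hold the 3x3 rotation; the remaining nine
    # describe translation and clipping settings and are ignored here
    return [numbers[0:3], numbers[3:6], numbers[6:9]]

view_text = r"""
set_view (\
    -0.273240060,   -0.516133010,    0.811750829,\
     0.870557129,    0.226309016,    0.436930388,\
    -0.409222305,    0.826064587,    0.387488008,\
     0.000000000,    0.000000000, -544.673034668,\
    -0.071666718,  -17.390396118,    8.293336868,\
   455.182373047,  634.163574219,  -20.000000000 )
"""

rotation = rotation_from_pymol_view(view_text)
```

Because only the rotation is used, changing the zoom in PyMOL before calling `get_view` does not affect the result.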
71 | 
72 | #### Chimera
73 | Open the protein structure in Chimera, and choose the desired rotation (zoom is irrelevant).
74 | Enter the command `matrixget` (if no output filename is given it will prompt you for one).
75 | This will write the rotation matrix to a file that can be understood by CellScape.
76 | It should look something like this:
77 | ```
78 | Model 0.0
79 |         -0.607365 0.792409 0.0565265 9.04218
80 |         -0.309318 -0.301425 0.901923 -30.7393
81 |         0.731731 0.530312 0.428181 15.789
82 | ```
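
A minimal sketch of reading this format, assuming a file with a single model: each of the three rows holds three rotation entries followed by one translation entry. `parse_chimera_matrix` is a hypothetical helper written for illustration, not the function CellScape uses internally.

```python
def parse_chimera_matrix(text):
    """Split a single-model matrixget dump into a 3x3 rotation and a translation."""
    rotation, translation = [], []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Model 0.0" header
        if not fields or fields[0] == "Model":
            continue
        values = [float(f) for f in fields]
        if len(values) != 4:
            raise ValueError("expected 4 numbers per row, found %d" % len(values))
        rotation.append(values[:3])    # first three columns: rotation
        translation.append(values[3])  # last column: translation
    if len(rotation) != 3:
        raise ValueError("expected 3 matrix rows, found %d" % len(rotation))
    return rotation, translation

matrix_text = """Model 0.0
        -0.607365 0.792409 0.0565265 9.04218
        -0.309318 -0.301425 0.901923 -30.7393
        0.731731 0.530312 0.428181 15.789
"""

rotation, translation = parse_chimera_matrix(matrix_text)
```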
83 | 
84 | ## Composing cartoons into a cellular scene
85 | 
86 | Re-running the above `cellscape cartoon` examples with the `--export` flag will write each cartoon's data to a Python pickle file, which can then be read by `cellscape scene`.
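
To show the shape of an exported file, the sketch below writes and reloads a stand-in dictionary with the same keys that `Cartoon.export` stores (`polygons`, `name`, `width`, `height`, `start`, `end`, `top`, `bottom`). The values are invented placeholders; a real export holds Shapely polygons, so Shapely must be importable when loading it. `load_cartoon` is a hypothetical helper. Since pickle can execute code on load, only open files you created yourself.

```python
import os
import pickle
import tempfile

# Stand-in for an exported cartoon; every value here is a made-up placeholder
stand_in = {
    "polygons": [],          # a real export stores a list of styled polygon dicts
    "name": "example",
    "width": 50.0,
    "height": 120.0,
    "start": (0.0, 0.0),
    "end": (0.0, 120.0),
    "top": (0.0, 120.0),
    "bottom": (0.0, 0.0),
}

def load_cartoon(path):
    """Load a pickled cartoon dictionary from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.pickle")
    with open(path, "wb") as f:
        pickle.dump(stand_in, f)
    loaded = load_cartoon(path)
```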
87 | 
88 | The simplest usage of `cellscape scene` takes a list of pickled cartoons as input and lays them out in a row, preserving the relative sizes of each protein.
89 | The `--padding` option specifies how far apart each protein should be (in angstroms).
90 | 
91 | ```
92 | cellscape scene --files outline_residue.pickle outline_chain.pickle outline_all.pickle --padding 10 --save scene.png
93 | ```
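
The row layout can be pictured with a few lines of arithmetic. This is only a sketch of the idea, not the code in `cellscape/scene.py`: `row_offsets` is a hypothetical helper, and the widths are invented values in angstroms.

```python
def row_offsets(widths, padding=0.0):
    """Return the left x-coordinate of each item when placed side by side."""
    offsets = []
    x = 0.0
    for w in widths:
        offsets.append(x)
        # The next item starts after this item's width plus the padding
        x += w + padding
    return offsets

# Three cartoons of different widths, separated by 10 angstroms
offsets = row_offsets([100.0, 50.0, 80.0], padding=10.0)
```

Because every cartoon keeps its own width in angstroms, the relative sizes of the proteins are preserved across the scene.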
94 | 
95 | A full description of all options is available by running `cellscape scene -h`.
96 | 


--------------------------------------------------------------------------------
/cellscape/__init__.py:
--------------------------------------------------------------------------------
1 | from .cartoon import Cartoon
2 | from .structure import Structure
3 | from .interface import plot_pairs
4 | 


--------------------------------------------------------------------------------
/cellscape/cartoon.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | import matplotlib
  3 | import matplotlib.pyplot as plt
  4 | from matplotlib.path import Path
  5 | from matplotlib.patches import PathPatch
  6 | import matplotlib.colors as mcolors
  7 | from matplotlib.colors import LinearSegmentedColormap
  8 | import shapely.geometry as sg
  9 | import shapely.ops as so
 10 | import pickle
 11 | import os
 12 | import sys
 13 | import colorsys
 14 | from Bio.PDB import *
 15 | 
 16 | import cellscape
 17 | 
 18 | def scale_line_width(x, lw_min, lw_max):
 19 |     return lw_max*(1-x) + lw_min*x
 20 | 
 21 | def shade_from_color(color, x, range):
 22 |     (r, g, b, a) = mcolors.to_rgba(color)
 23 |     h, l, s = colorsys.rgb_to_hls(r,g,b)
 24 |     l_dark = max(l-range/2, 0)
 25 |     l_light = min(l+range/2, 1)
 26 |     l_new = l_dark*(1-x) + l_light*x
 27 |     return colorsys.hls_to_rgb(h, l_new, s)
 28 | 
 29 | def get_sequential_colors(colors='Set1', n=1):
 30 |     """
 31 |     Sample n colors sequentially from a named matplotlib Colormap.
 32 |     """
 33 |     # uses matplotlib.colors.Colormap.N to distinguish continuous/discrete
 34 |     cmap = plt.get_cmap(colors)  # matplotlib.cm.get_cmap was removed in matplotlib 3.9
 35 |     if cmap.N == 256:
 36 |         # continuous color map
 37 |         sequential_colors = [cmap(x) for x in np.linspace(0.0,1.0, n)]
 38 |     else:
 39 |         # discrete color map
 40 |         sequential_colors = [cmap(x) for x in range(n)]
 41 |     return sequential_colors
 42 | 
 43 | def smooth_polygon(p, level=0):
 44 |     # somewhat arbitrary but a lot easier than interpolation
 45 |     if level == 0:
 46 |         return p.simplify(0.3).buffer(-2, join_style=1).buffer(3, join_style=1)
 47 |     elif level == 1:
 48 |         return p.simplify(1).buffer(3, join_style=1).buffer(-5, join_style=1).buffer(4, join_style=1)
 49 |     elif level == 2:
 50 |         return p.simplify(3).buffer(5, join_style=1).buffer(-9, join_style=1).buffer(5, join_style=1)
 51 |     elif level == 3:
 52 |         return p.simplify(0.1).buffer(2, join_style=1)
 53 |     else:
 54 |         return p
 55 | 
 56 | def ring_coding(ob):
 57 |     # https://sgillies.net/2010/04/06/painting-punctured-polygons-with-matplotlib.html
 58 |     # The codes will be all "LINETO" commands, except for "MOVETO"s at the
 59 |     # beginning of each subpath
 60 |     # use .coords so this also works with Shapely 2.x, where rings are not array-like
 61 |     n = len(ob.coords)
 62 |     codes = np.ones(n, dtype=Path.code_type) * Path.LINETO
 63 |     codes[0] = Path.MOVETO
 64 |     return codes
 65 | 
 66 | def placeholder_polygon(height, buffer_width=25, origin=[0,0]):
 67 |     return sg.LineString([(buffer_width+origin[0],0+origin[1]),(buffer_width+origin[0],height+origin[1])]).buffer(buffer_width)
 68 | 
 69 | def composite_polygon(cartoon, height_before, height_after, buffer_width=25):
 70 |     # placeholder + structure cartoon + placeholder
 71 |     if height_before > 0:
 72 |         before_poly =  placeholder_polygon(height_before, origin=cartoon.bottom_coord[:2]-[buffer_width,height_before], buffer_width=buffer_width)
 73 |         cartoon._styled_polygons.append({"polygon":before_poly, "facecolor":"#eeeeee", "shade":0.5, "edgecolor":'black', "linewidth":1, "zorder":-1})
 74 | 
 75 |     if height_after > 0:
 76 |         after_poly =  placeholder_polygon(height_after, origin=cartoon.top_coord[:2]-[buffer_width, 0], buffer_width=buffer_width)
 77 |         cartoon._styled_polygons.append({"polygon":after_poly, "facecolor":"#eeeeee", "shade":0.5, "edgecolor":'black', "linewidth":1, "zorder":-1})
 78 | 
 79 |     cartoon.image_height = cartoon.image_height + buffer_width + height_before + height_after
 80 |     cartoon.bottom_coord = cartoon.bottom_coord - np.array([0,height_before,0])
 81 |     cartoon.top_coord = cartoon.top_coord + np.array([0,height_after,0])
 82 | 
 83 | def export_placeholder(height, name, fname, buffer_width=25):
 84 |     # placeholder by itself
 85 |     poly =  placeholder_polygon(height, origin=[buffer_width, 0], buffer_width=buffer_width)
 86 |     styled_polygons = [{"polygon":poly, "facecolor":"#eeeeee", "shade":0.5, "edgecolor":'black', "linewidth":1, "zorder":-1}]
 87 | 
 88 |     data = {'polygons':styled_polygons, 'name':name, 'width':buffer_width*2, 'height':height+buffer_width, 'start':np.array([buffer_width,0]), 'end':np.array([height+2*buffer_width,0]), 'bottom':np.array([buffer_width,0]), 'top':np.array([height+2*buffer_width,0])}
 89 | 
 90 |     with open('{}.pickle'.format(fname),'wb') as f:
 91 |         pickle.dump(data, f)
 92 | 
 93 | def transform_coord(xy, translate_post=np.array([0,0]), translate_pre=np.array([0,0]), scale=1.0, flip=False):
 94 |     # 2d coordinates
 95 |     xy_ = xy
 96 |     if translate_pre is not None:
 97 |         # optionally shift coordinates before rotation
 98 |         xy_ = xy_ + translate_pre  # build a new array rather than modifying the caller's in place
 99 |     if flip:
100 |         xy_ = np.dot(xy_, np.array([[-1,0],[0,-1]]).T)
101 |         #xy_ = np.dot(xy_, np.array([[-1,0],[0,-1]]))
102 |         #offset_x = np.min(xy_[:,0])
103 |         #offset_y = np.min(xy_[:,1])
104 |         #xy_ -= np.array([offset_x, offset_y])
105 |     return (xy_+translate_post)*scale
106 | 
107 | def polygon_to_path(polygon, min_interior_length=40, translate_pre=np.array([0,0]), translate_post=np.array([0,0]), scale=1.0, flip=False):
108 |     # generate matplotlib Path object from Shapely polygon
109 |     # filter out small interior holes and apply a scaling factor if desired
110 |     #
111 |     # https://sgillies.net/2010/04/06/painting-punctured-polygons-with-matplotlib.html
112 |     # Convert coordinates to path vertices. Objects produced by Shapely's
113 |     # analytic methods have the proper coordinate order, no need to sort.
114 |     interiors = list(filter(lambda x: x.length > min_interior_length, polygon.interiors))
115 |     vertices = np.concatenate(
116 |                     [np.asarray(polygon.exterior.coords)]
117 |                     + [np.asarray(r.coords) for r in interiors])
118 |     codes = np.concatenate(
119 |                 [ring_coding(polygon.exterior)]
120 |                 + [ring_coding(r) for r in interiors])
121 |     transformed_vertices = transform_coord(vertices, translate_pre=translate_pre, translate_post=translate_post, scale=scale, flip=flip)
122 |     return Path(transformed_vertices, codes)
123 | 
124 | def plot_polygon(poly, facecolor='orange', edgecolor='k', linewidth=0.7, axes=None, zorder_mod=0, translate_pre=np.array([0,0]), translate_post=np.array([0,0]), scale=1.0, flip=False, min_area=7, linestyle='solid'):
125 |     """Draw a Shapely polygon using matplotlib Patches."""
126 |     if axes is None:
127 |         axs = plt.gca()
128 |         axs.set_aspect('equal')
129 |     else:
130 |         axs = axes
131 |     if isinstance(poly, sg.polygon.Polygon):
132 |         if poly.area > min_area:
133 |             path = polygon_to_path(poly, translate_pre=translate_pre, translate_post=translate_post, scale=scale, flip=flip)
134 |             patch = PathPatch(path, facecolor=facecolor, edgecolor=edgecolor, linewidth=linewidth, zorder=3+zorder_mod, linestyle=linestyle)
135 |             axs.add_patch(patch)
136 |     elif isinstance(poly, sg.multipolygon.MultiPolygon):
137 |         for p in poly.geoms:
138 |             plot_polygon(p, axes=axs, facecolor=facecolor, edgecolor=edgecolor, linewidth=linewidth, scale=scale, zorder_mod=zorder_mod, translate_pre=translate_pre, translate_post=translate_post, flip=flip, min_area=min_area, linestyle=linestyle)
139 | 
140 | class Cartoon:
141 |     """A class for molecular outlines generated by Structure class"""
142 |     def __init__(self, name, polygons, residues, outline_by, back_outline, group_outlines, num_groups, dimensions, groups):
143 |         # TODO currently just copying over all variables needed, should condense a little
144 |         self.name = name
145 |         self._polygons = polygons
146 |         self.residues_flat = residues
147 |         self.outline_by = outline_by
148 |         self.num_groups = num_groups
149 |         self.groups = groups
150 |         self._back_outline = back_outline
151 |         self._group_outlines = group_outlines
152 |         self.dimensions = dimensions
153 | 
154 |     def plot(self, colors=None, axes_labels=False, color_residues_by=None, edge_color="black", line_width=0.7,
155 |         depth_shading=False, depth_lines=False, shading_range=0.4, smoothing=False, do_show=True, axes=None, save=None, dpi=300, placeholder=None):
156 |         """Plot styled protein cartoon
157 | 
158 |         Color schemes for plotting can be specified in multiple ways
159 |             - named matplotlib-compatible color e.g. "red" (string)
160 |             - hexadecimal color e.g. "#F8F8FF" (string)
161 |             - list/tuple of colors e.g. ["red", "#F8F8FF"] (list/tuple)
162 |             - dict of names to colors e.g. {"domain A": "red", "domain B":"blue"} (dict)
163 |             - named discrete or continuous color scheme e.g. "Set1" (string)
164 | 
165 |         By default, plot() creates a new matplotlib Axes instance, though one can be passed explicitly.
166 |         This mirrors Biopython's phylogeny drawing https://biopython.org/DIST/docs/api/Bio.Phylo._utils-module.html.
167 | 
168 |         Args:
169 |             colors (optional): Explicitly pass color scheme (see description). Defaults to None.
170 |             axes_labels (bool, optional): Include axes labels on plot. Defaults to False.
171 |             color_residues_by (str, optional): If outlining all residues, color based on attribute (e.g. "chain"). Defaults to None.
172 |             edge_color (str, optional): Color for outline edges. Defaults to "black".
173 |             line_width (float, optional): Width of outlines. Defaults to 0.7.
174 |             depth_shading (bool, optional): Use lighter shades for outlines closer to the front. Defaults to False.
175 |             depth_lines (bool, optional): Use thinner lines for outlines closer to the front. Defaults to False.
176 |             shading_range (float, optional): Dynamic range for depth_shading effect. Defaults to 0.4.
177 |             smoothing (bool, optional): Apply smoothing to polygons. Defaults to False.
178 |             do_show (bool, optional): Whether to show figure (otherwise just returns Axes object). Defaults to True.
179 |             axes (Axes, optional): Explicitly pass matplotlib Axes object. Defaults to None.
180 |             save (str, optional): Path to save cartoon image. Defaults to None.
181 |             dpi (int, optional): DPI of rasterized images. Defaults to 300.
182 |             placeholder (float, optional): Specify expected protein height (in angstroms). Will add a placeholder shape to add up to total height. Defaults to None.
183 | 
184 |         Returns:
185 |             Axes: Returns matplotlib Axes if do_show=False, otherwise return None
186 |         """
187 |         self._styled_polygons = []
188 | 
189 |         if axes is None:
190 |             # create a new matplotlib figure if none provided
191 |             fig, axs = plt.subplots()
192 |         else:
193 |             assert(isinstance(axes, matplotlib.axes.Axes))
194 |             axs = axes
195 |     
196 |         if axes_labels:
197 |             axs.axis('on')
198 |             axs.set_axis_on()
199 |             axs.xaxis.grid(False)
200 |             axs.yaxis.grid(True)
201 |             axs.axes.xaxis.set_ticklabels([])
202 |         else:
203 |             axs.axis('off')
204 |             axs.set_axis_off()
205 |             #plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, hspace = 0, wspace = 0)
206 |             #axs.xaxis.set_major_locator(plt.NullLocator())
207 |             #axs.yaxis.set_major_locator(plt.NullLocator())
208 | 
209 |         # color schemes
210 |         default_color = 'tab:blue'
211 |         default_cmap = 'Set1'
212 |         named_colors = [*mcolors.BASE_COLORS.keys(), *mcolors.TABLEAU_COLORS.keys(), *mcolors.CSS4_COLORS.keys(), *mcolors.XKCD_COLORS.keys()]
213 | 
214 |         # if outlining residues don't know number of color groups until plot is called
215 |         if self.outline_by == "residue":
216 |             if color_residues_by is None:
217 |                 num_colors_needed = 1
218 |                 residue_color_groups = {"all":self.residues_flat}
219 |             else:
220 |                 residue_color_groups = cellscape.util.group_by(self.residues_flat, lambda x: x.get(color_residues_by))
221 |                 num_colors_needed = len(residue_color_groups)
222 |             self.num_groups = num_colors_needed
223 | 
224 |         # parse options and get list of base colors needed for plotting
225 |         if colors is None:
226 |             # choose default sequential color scheme based on number of colors needed
227 |             if self.num_groups == 1:
228 |                 sequential_colors = [default_color]
229 |             elif self.num_groups <= 9:
230 |                 sequential_colors = get_sequential_colors(colors="Set1", n=self.num_groups)
231 |             elif self.num_groups <= 10:
232 |                 sequential_colors = get_sequential_colors(colors="tab10", n=self.num_groups)
233 |             else:
234 |                 sequential_colors = get_sequential_colors(colors="tab20", n=self.num_groups)
235 |         else:
236 |             if isinstance(colors, dict):
237 |                 sequential_colors = []
238 |             else:
239 |                 if isinstance(colors, str):
240 |                     if self.num_groups == 1:
241 |                         sequential_colors = [colors]
242 |                     else:
243 |                         sequential_colors = get_sequential_colors(colors=colors, n=self.num_groups)
244 |                 elif isinstance(colors, (list, tuple)):
245 |                     if self.num_groups == 1:
246 |                         if (len(colors) == 4) or (len(colors) == 3):
247 |                             # assume single RGBA or RGB color
248 |                             sequential_colors = [colors]
249 |                         else:
250 |                             sequential_colors = [colors[0]]
251 |                     elif self.num_groups == len(colors):
252 |                         sequential_colors = colors
253 |                     else:
254 |                         sys.exit("Number of colors provided does not match number of groups")
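255 |         # a dict of colors is used directly as the color map, so no sequential colors are expected
255 |         assert isinstance(colors, dict) or len(sequential_colors) == self.num_groups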
256 | 
257 |         # color scheme represented as dict that maps group names to colors
258 |         if self.outline_by == "residue":
259 |             if len(sequential_colors) > 0:
260 |                 color_map = {k:sequential_colors[i] for i,k in enumerate(residue_color_groups.keys())}
261 |             else:
262 |                 color_map = colors
263 |         elif self.outline_by == "all":
264 |             color_map = {None:sequential_colors[0]}
265 |         else:
266 |             if len(sequential_colors) > 0:
267 |                 color_map = {k:sequential_colors[i] for i,k in enumerate(self.groups)}
268 |             else:
269 |                 color_map = colors
270 |         assert(isinstance(color_map, dict))
271 | 
272 |         if self._back_outline is not None:
273 |             if smoothing:
274 |                 smoothed_poly = smooth_polygon(self._back_outline, level=3)
275 |                 plot_polygon(smoothed_poly, facecolor="None", scale=1.0, axes=axs, edgecolor=edge_color, linewidth=line_width, zorder_mod=-1)
276 |                 self._styled_polygons.append({"polygon":smoothed_poly, "facecolor":"None", "edgecolor":edge_color, "linewidth":line_width})
277 |             else:
278 |                 plot_polygon(self._back_outline, facecolor="None", scale=1.0, axes=axs, edgecolor=edge_color, linewidth=line_width, zorder_mod=-1)
279 |                 self._styled_polygons.append({"polygon":self._back_outline, "facecolor":"None", "edgecolor":edge_color, "linewidth":line_width})
280 | 
281 |         if len(self._group_outlines) > 0:
282 |             for p in self._group_outlines:
283 |                 plot_polygon(p, facecolor="None", scale=1.0, axes=axs, edgecolor=edge_color, linewidth=line_width, zorder_mod=2)
284 |                 self._styled_polygons.append({"polygon":p, "facecolor":"None", "edgecolor":edge_color, "linewidth":line_width, "zorder":2})
285 | 
286 |         # TODO optionally show placeholder for unstructured regions
287 |         if placeholder is not None:
288 |             placeholder_poly = placeholder_polygon(placeholder-self.image_height, origin=[self.image_width/2-25, self.image_height+25])
289 |             self._styled_polygons.append({"polygon":placeholder_poly, "facecolor":"None", "shade":0.5, "edgecolor":'black', "linewidth":1})
290 |             plot_polygon(placeholder_poly, facecolor="#eeeeee", scale=1.0, axes=axs, edgecolor='black', linewidth=1, zorder_mod=-1)
291 |             self.image_height = 25 + placeholder
292 | 
293 |         # main plotting loop
294 |         for i, p in enumerate(self._polygons):
295 |             if smoothing:
296 |                 poly_to_draw = smooth_polygon(p[1], level=3)
297 |             else:
298 |                 poly_to_draw = p[1]
299 | 
300 |             # look up color for polygon
301 |             if self.outline_by == "residue":
302 |                 key_for_color = p[0].get(color_residues_by)
303 |             else:
304 |                 key_for_color = p[0].get(self.outline_by)
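305 |             # fall back to the default color when no sequential colors exist (e.g. colors passed as a dict)
305 |             fc = color_map.get(key_for_color, sequential_colors[0] if len(sequential_colors) > 0 else default_color)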
306 |             base_fc = fc # store original color as well as shading
307 | 
308 |             shade_value = None
309 |             if depth_shading:
310 |                 #fc = shade_from_color(fc, i/len(self._polygons), range=shading_range)
311 |                 shade_value = p[0].get("depth", 0.5)
312 |                 fc = shade_from_color(fc, shade_value, range=shading_range)
313 |             if depth_lines:
314 |                 shade_value = p[0].get("depth", 0.5)
315 |                 lw = scale_line_width(shade_value, 0, 0.5)
316 |             else:
317 |                 lw = line_width
318 |             plot_polygon(poly_to_draw, facecolor=fc, axes=axs, edgecolor=edge_color, linewidth=lw)
319 |             self._styled_polygons.append({"polygon":poly_to_draw, "facecolor":fc, "edgecolor":edge_color, "linewidth":lw, "shade":shade_value, "base_fc":base_fc})
320 | 
321 |         axs.set_aspect('equal')
322 |         axs.margins(0,0)
323 |         self._axes= axs
324 | 
325 |         if save is not None:
326 |             file_ext = os.path.splitext(save)[1].lower()
327 |             assert file_ext in ['.png','.pdf','.svg','.ps'], "Image file extension not supported"
328 |             #plt.gcf().savefig(save, dpi=dpi, transparent=True, pad_inches=0, bbox_inches='tight')
329 |             axs.figure.savefig(save, dpi=dpi, transparent=True, pad_inches=0, bbox_inches='tight')
330 | 
331 |         if do_show:
332 |             plt.show()
333 |         else:
334 |             return axs
335 | 
336 |     def export(self, fname):
337 |         """Export a pickle object containing styled polygons that can be combined using ``cellscape scene``"""
338 |         assert(len(self._styled_polygons) > 0)
339 | 
340 |         data = {'polygons':self._styled_polygons, 'name':self.name}
341 |         for k in ['width', 'height', 'start', 'end', 'top', 'bottom']:
342 |             data[k] = self.dimensions[k]
343 | 
344 |         with open('{}.pickle'.format(fname),'wb') as f:
345 |             pickle.dump(data, f)
346 | 
347 |         print("Exported polygon data to {}.pickle".format(fname), file=sys.stderr)
348 | 
349 | def make_cartoon(args):
350 |     """Build a cartoon in one go. Called when running ``cellscape cartoon``."""
351 | 
352 |     # accept list of chains for backwards-compatibility
353 |     # convert to string e.g. ABCD for current interface
354 |     # can be an issue if chains have more than one letter
355 |     if len(args.chain) == 1:
356 |         chain = args.chain[0]
357 |     else:
358 |         chain = ''.join(args.chain)
359 | 
360 |     molecule = cellscape.Structure(args.pdb, chain=chain, model=args.model, uniprot=args.uniprot, view=False)
361 | 
362 |     # open first line to identify view file
363 |     if args.view is not None:
364 |         with open(args.view) as view_f:
365 |             first_line = view_f.readline()
366 |         if first_line[:8] == 'set_view':
367 |             molecule.load_pymol_view(args.view)
368 |         elif first_line[:5] == 'Model':
369 |             molecule.load_chimera_view(args.view)
370 |         else:
371 |             molecule.load_view_matrix(args.view)
372 |     else:
373 |         # if no view matrix provided just use default PDB orientation for now
374 |         molecule.view_matrix = np.identity(3)
375 | 
376 |     cartoon = molecule.outline(
377 |                             args.outline_by, 
378 |                             depth=args.depth, 
379 |                             radius=args.radius,
380 |                             only_annotated=args.only_annotated,
381 |                             only_ca=args.only_ca,
382 |                             depth_contour_interval=args.depth_contour_interval,
383 |                             back_outline=args.back_outline
384 |                             )
385 | 
386 |     if args.outline_by == "residue" and args.color_by != "same":
387 |         color_residues_by = args.color_by
388 |     else:
389 |         color_residues_by = None
390 | 
391 |     if len(args.colors) > 0:
392 |         colors = args.colors
393 |     else:
394 |         colors = None
395 |     
396 |     cartoon.plot(
397 |                 do_show=False, 
398 |                 axes_labels=args.axes,
399 |                 colors=colors,
400 |                 color_residues_by=color_residues_by,
401 |                 dpi=args.dpi,
402 |                 save=args.save,
403 |                 depth_shading=args.depth_shading,
404 |                 depth_lines=args.depth_lines,
405 |                 edge_color=args.edge_color,
406 |                 line_width=args.line_width
407 |                 )
408 | 
409 |     if args.export:
410 |         cartoon.export(os.path.splitext(args.save)[0])
411 | 


--------------------------------------------------------------------------------
/cellscape/cli.py:
--------------------------------------------------------------------------------
 1 | import argparse
 2 | from cellscape.cartoon import make_cartoon
 3 | from cellscape.scene import make_scene
 4 | 
 5 | def main():
 6 |     # set up argument parser
 7 |     parser = argparse.ArgumentParser(description='CellScape: Protein structure visualization with vector graphics cartoons')
 8 |     subparsers = parser.add_subparsers(dest="command")
 9 |     subparsers.required=True
10 | 
11 |     # cartoon
12 |     parser_cartoon = subparsers.add_parser('cartoon', help="Make a cartoon from a protein structure", formatter_class=argparse.ArgumentDefaultsHelpFormatter, description="Make a cartoon from a protein structure")
13 |     parser_cartoon.set_defaults(func=make_cartoon)
14 |     # input/output options
15 |     parser_cartoon_io = parser_cartoon.add_argument_group('input/output options')
16 |     parser_cartoon_io.add_argument('--pdb', help='Protein coordinates file (must be .pdb/.ent/.cif/.mcif)', required=True)
17 |     parser_cartoon_io.add_argument('--model', type=int, default=0, help='Model number in PDB to load')
18 |     parser_cartoon_io.add_argument('--chain', default=['all'], help='Chain(s) in structure to outline', nargs='+')
19 |     parser_cartoon_io.add_argument('--view', help='Camera rotation matrix (saved from cellscape, PyMOL get_view, or Chimera matrixget)')
20 |     parser_cartoon_io.add_argument('--uniprot', help='UniProt XML file to parse for sequence/domain/topology information')
21 |     parser_cartoon_io.add_argument('--save', default='out.svg', help='Image output file (valid formats are png/pdf/svg/ps)')
22 |     parser_cartoon_io.add_argument('--export', default=False, action="store_true", help='Export Python object with structural information')
23 |     # outline building options
24 |     parser_cartoon_outline = parser_cartoon.add_argument_group('outline-building options')
25 |     parser_cartoon_outline.add_argument('--only_annotated', action='store_true', default=False, help='Ignore regions without UniProt annotations')
26 |     parser_cartoon_outline.add_argument('--only_ca', action='store_true', default=False, help='Only use alpha carbons for outline')
27 |     parser_cartoon_outline.add_argument('--outline_by', '--outline',  default='all',  choices=['all', 'chain', 'domain', 'topology', 'residue'], help='Outline protein regions')
28 |     parser_cartoon_outline.add_argument('--depth',  default=None,  choices=['flat', 'contours', None], help='Represent depth with flat occluded outlines or contour slices')
29 |     parser_cartoon_outline.add_argument('--depth_contour_interval', type=float, default=3, help='Width of depth contour bins in angstroms (if --depth contours)')
30 |     parser_cartoon_outline.add_argument('--radius', default=1.5, help='Atomic radius, in angstroms', type=float)
31 |     parser_cartoon_outline.add_argument('--back_outline', action='store_true', help='Outline entire molecule separately from group outlines')
32 | 
33 |     # visual style options
34 |     parser_cartoon_style = parser_cartoon.add_argument_group('styling options')
35 |     parser_cartoon_style.add_argument('--axes', action='store_true', default=False, help='Draw x and y axes around molecule')
36 |     parser_cartoon_style.add_argument('--colors', default=[], nargs='+', help='Specify color scheme for protein (list of colors or matplotlib named color map)')
37 |     parser_cartoon_style.add_argument('--edge_color', default='black', help='Edge color')
38 |     parser_cartoon_style.add_argument('--line_width', default=0.7, type=float, help='Line width')
39 |     parser_cartoon_style.add_argument('--color_by', default='same',  choices=['same', 'chain', 'domain', 'topology'], help='Color residues by attribute (if --outline_by residue is selected)')
40 |     parser_cartoon_style.add_argument('--depth_shading', action='store_true', default=False, help='Shade regions darker in the back to simulate depth')
41 |     parser_cartoon_style.add_argument('--depth_lines', action='store_true', default=False, help='Use thicker lines in the back to simulate depth')
42 |     parser_cartoon_style.add_argument('--dpi', type=int, default=300, help='DPI to use if exporting to a raster format like PNG')
43 | 
44 |     # scene
45 |     parser_scene = subparsers.add_parser('scene', help="Compose multiple cartoons together", description="Compose multiple cartoons together", formatter_class=argparse.ArgumentDefaultsHelpFormatter)
46 |     parser_scene.set_defaults(func=make_scene)
47 |     # input/output options
48 |     parser_scene_io = parser_scene.add_argument_group('input/output options')
49 |     parser_scene_io.add_argument('--files', nargs='+', help='Pickled objects to load')
50 |     parser_scene_io.add_argument('--save', default='out.svg', help='Image output path (valid formats are png/pdf/svg/ps)')
51 |     # visual style options
52 |     parser_scene_style = parser_scene.add_argument_group('styling options')
53 |     parser_scene_style.add_argument('--offsets', nargs='+', default=[], help='Vertical offsets for each molecule specified manually')
54 |     parser_scene_style.add_argument('--padding', type=int, default=0, help='Horizontal padding to add between each molecule (in angstroms)')
55 |     parser_scene_style.add_argument('--axes', action='store_true', default=False, help='Draw x and y axes')
56 |     parser_scene_style.add_argument('--membrane', default=None, choices=[None, 'arc', 'flat', 'wave'], help='Draw membrane on X axis')
57 |     parser_scene_style.add_argument('--membrane_thickness', default=40, type=float, help='Thickness of the membrane (in angstroms)')
58 |     parser_scene_style.add_argument('--membrane_lipids', action='store_true', help='Draw lipid head groups')
59 |     parser_scene_style.add_argument('--no_membrane_offset', action='store_true', help=argparse.SUPPRESS) # don't adjust y-axis to position bottom of structure in membrane
60 |     parser_scene_style.add_argument('--order_by', default='input', choices=['input', 'random', 'height','top', 'membrane'], help='How to order proteins in scene')
61 |     parser_scene_style.add_argument('--recolor', action='store_true', default=False, help='Recolor proteins in scene')
62 |     parser_scene_style.add_argument('--recolor_cmap', default=['hsv'], nargs='+', help='Named cmap or color scheme for re-coloring')
63 |     parser_scene_style.add_argument('--dpi', type=int, default=300, help='DPI to use if exporting to a raster format like PNG')
64 |     parser_scene_style.add_argument('--use_placeholders', action='store_true', help=argparse.SUPPRESS)
65 |     parser_scene_style.add_argument('--labels', action='store_true', default=False, help=argparse.SUPPRESS) # still testing
66 |     parser_scene_style.add_argument('--label_size', type=float, default=0.5, help=argparse.SUPPRESS) # fraction of the screen to use for labels
67 |     parser_scene_style.add_argument('--label_orientation', choices=["vertical", "horizontal", "diagonal"], default="vertical", help=argparse.SUPPRESS)
68 |     parser_scene_style.add_argument('--label_position', choices=["above", "below"], default="below", help=argparse.SUPPRESS)
69 |     parser_scene_style.add_argument('--fig_height', type=float, default=11, help=argparse.SUPPRESS) # passed to figsize
70 |     parser_scene_style.add_argument('--fig_width', type=float, default=8.5, help=argparse.SUPPRESS) # passed to figsize
71 |     # for simulating according to stoichiometry
72 |     parser_scene_sim = parser_scene.add_argument_group('random scene options')
73 |     parser_scene_sim.add_argument('--csv', help='Table of protein information')
74 |     parser_scene_sim.add_argument('--seed', type=int, help='Random seed for scene generation')
75 |     parser_scene_sim.add_argument('--sample_from', help='Column to use for sampling (with --csv)', default='stoichiometry')
76 |     parser_scene_sim.add_argument('--num_mol', type=int, help='Number of molecules to sample for scene', default=0)
77 |     parser_scene_sim.add_argument('--background', action='store_true', default=False, help='Add background plane using same frequencies')
78 | 
79 |     # parse arguments and call corresponding command
80 |     args = parser.parse_args()
81 |     args.func(args)
82 | 
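
The dispatch pattern used in `main()` above (one subparser per command, with `set_defaults(func=...)` choosing the handler) can be sketched in isolation as follows. This is a minimal illustration, not the package's code: the two handler functions here are stand-ins that return a string instead of drawing anything, only a few of the real flags are reproduced, and `build_parser` and `run` are hypothetical helper names.

```python
import argparse

def make_cartoon(args):
    # Stand-in for cellscape.cartoon.make_cartoon (assumption: returns a summary string)
    return f"cartoon from {args.pdb} -> {args.save}"

def make_scene(args):
    # Stand-in for cellscape.scene.make_scene (assumption: returns a summary string)
    return f"scene from {len(args.files)} file(s) -> {args.save}"

def build_parser():
    parser = argparse.ArgumentParser(description="subcommand dispatch sketch")
    subparsers = parser.add_subparsers(dest="command")
    subparsers.required = True  # a subcommand must be given

    cartoon = subparsers.add_parser("cartoon")
    cartoon.add_argument("--pdb", required=True)
    cartoon.add_argument("--save", default="out.svg")
    cartoon.set_defaults(func=make_cartoon)  # handler stored on the namespace

    scene = subparsers.add_parser("scene")
    scene.add_argument("--files", nargs="+")
    scene.add_argument("--save", default="out.svg")
    scene.set_defaults(func=make_scene)
    return parser

def run(argv):
    # Parse an explicit argument list and call whichever handler was selected
    args = build_parser().parse_args(argv)
    return args.func(args)

if __name__ == "__main__":
    print(run(["cartoon", "--pdb", "examples/ig/1igt.pdb"]))
    print(run(["scene", "--files", "a.pickle", "b.pickle", "--save", "scene.png"]))
```

Passing an explicit `argv` list to `parse_args` (instead of reading `sys.argv`) keeps the dispatch easy to exercise from a notebook or a test.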


--------------------------------------------------------------------------------
/cellscape/interface.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Testing code for visualizing protein interactions across membrane interfaces
  3 | """
  4 | 
  5 | import numpy as np
  6 | import matplotlib
  7 | import matplotlib.pyplot as plt
  8 | import matplotlib.patches as mpatches
  9 | import matplotlib.lines as mlines
 10 | from matplotlib import lines, text, cm
 11 | from matplotlib.colors import LinearSegmentedColormap, ListedColormap
 12 | from scipy import interpolate
 13 | import shapely.geometry as sg
 14 | import shapely.ops as so
 15 | import os, sys, argparse, pickle
 16 | import glob
 17 | import csv
 18 | 
 19 | from cellscape.cartoon import plot_polygon, shade_from_color
 20 | 
 21 | class MembraneInterface:
 22 |     """
 23 |     just use piecemeal flat + connector, no lipids
 24 |     """
 25 |     def __init__(self, axes, lengths, bottom_y, top_y, thickness=40, padding=10, base_y=0):
 26 | 
 27 |         # axes
 28 |         self.axes = axes
 29 | 
 30 |         # membrane thickness (angstroms)
 31 |         self.thickness = thickness
 32 | 
 33 |         # padding between each segment, scalar
 34 |         self.padding = padding
 35 | 
 36 |         # length of each segment, array
 37 |         self.lengths = lengths
 38 | 
 39 |         # y coordinate of each bottom membrane segment, array
 40 |         self.bottom_y = bottom_y
 41 | 
 42 |         # y coordinate of each top membrane segment, array
 43 |         self.top_y = top_y
 44 | 
 45 |         assert(len(lengths) == len(top_y))
 46 |         assert(len(top_y) == len(bottom_y))
 47 | 
 48 |     def draw(self, color='#C4E7EF'):
 49 |         if isinstance(color, (list,tuple)):
 50 |             top_color = color[0]
 51 |             bot_color = color[1]
 52 |         else:
 53 |             top_color = color
 54 |             bot_color = color
 55 | 
 56 |         membrane_x = []
 57 |         membrane_bot_y = []
 58 |         membrane_top_y = []
 59 |         x_cum = 0
 60 |         for i, w in enumerate(self.lengths):
 61 |             membrane_x.append(x_cum)
 62 |             x_cum += w
 63 |             membrane_x.append(x_cum)
 64 |             x_cum += self.padding
 65 | 
 66 |             membrane_bot_y.append(self.bottom_y[i])
 67 |             membrane_bot_y.append(self.bottom_y[i])
 68 | 
 69 |             membrane_top_y.append(self.top_y[i])
 70 |             membrane_top_y.append(self.top_y[i])
 71 | 
 72 |         membrane_x = np.array(membrane_x)
 73 |         membrane_bot_y = np.array(membrane_bot_y)
 74 |         membrane_top_y = np.array(membrane_top_y)
 75 | 
 76 |         # plot bottom membrane
 77 |         self.axes.fill_between(membrane_x, membrane_bot_y, membrane_bot_y-self.thickness, color=bot_color, zorder=1.6, capstyle='round', joinstyle='miter')
 78 | 
 79 |         # plot top membrane
 80 |         self.axes.fill_between(membrane_x, membrane_top_y, membrane_top_y+self.thickness, color=top_color, zorder=1.6, capstyle='round', joinstyle='round')
 81 | 
 82 | def plot_pairs(pairs, labels=None, thickness=40, padding=50, align="bottom", membrane_color="#E8E8E8", colors=None, axes=True, linewidth=None, sort=False):
 83 | 
 84 |     assert align in ["bottom", "middle", "top"]
 85 |     assert sort in [False, "height", "horseshoe"]
 86 | 
 87 |     if labels is not None:
 88 |         assert len(labels) == len(pairs)
 89 | 
 90 |     # determine display order: indices into pairs, tallest pair first when sorting
 91 |     pair_heights = np.array([p[0]['height']+p[1]['height'] for p in pairs])
 92 |     sorted_order = np.argsort(pair_heights)[::-1]
 93 | 
 94 |     if sort == "height":
 95 |         # tallest to shortest, left to right
 96 |         new_order = sorted_order
 97 | 
 98 |     elif sort == "horseshoe":
 99 |         # tallest pairs at the edges, shortest in the middle
100 |         first_half = sorted_order[::2]
101 |         if len(sorted_order) % 2:
102 |             second_half = sorted_order[-2::-2]
103 |         else:
104 |             second_half = sorted_order[::-2]
105 |         new_order = np.concatenate([first_half, second_half])
106 | 
107 |     else:
108 |         # keep input order
109 |         new_order = np.arange(len(pairs))
110 | 
111 |     # apply the same order to pairs, labels, and colors
112 |     # labels_ and colors_ stay None when not provided, so later checks are safe
113 |     pairs_ = [pairs[i] for i in new_order]
114 |     labels_ = None
115 |     colors_ = None
116 |     if labels is not None:
117 |         labels_ = [labels[i] for i in new_order]
118 |     if colors is not None:
119 |         colors_ = [colors[i] for i in new_order]
120 | 
121 |     fig, axs = plt.subplots(figsize=(11,8.5))
122 |     axs.set_aspect('equal')
123 | 
124 |     if axes:
125 |         axs.xaxis.grid(False)
126 |         axs.yaxis.grid(False)
127 |         axs.axes.xaxis.set_ticklabels([])
128 |     else:
129 |         plt.axis('off')
130 |         plt.gca().set_axis_off()
131 |         plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, hspace = 0, wspace = 0)
132 |         plt.margins(0,0)
133 |         plt.gca().xaxis.set_major_locator(plt.NullLocator())
134 |         plt.gca().yaxis.set_major_locator(plt.NullLocator())
135 | 
136 |     # set font options
137 |     font_options = {'family':'Arial', 'weight':'normal', 'size':10}
138 |     matplotlib.rc('font', **font_options)
139 | 
140 |     assert(align in ["top","bottom","middle"])
141 | 
142 |     # get all the interface heights
143 |     all_heights = np.array([p[0]['height']+p[1]['height'] for p in pairs_])
144 |     max_height = np.max(all_heights)
145 | 
146 |     # calculate membrane geometry
147 |     bot_y = []
148 |     top_y = []
149 |     lengths = []
150 |     for p in pairs_:
151 |         o1, o2 = p
152 |         if align == "bottom":
153 |             top_y.append(o1['height']+o2['height'])
154 |             bot_y.append(0)
155 |         elif align == "top":
156 |             top_y.append(max_height)
157 |             bot_y.append(max_height-(o1['height']+o2['height']))
158 |         elif align == "middle":
159 |             top_y.append(max_height-(max_height-o1['height']-o2['height'])/2)
160 |             bot_y.append((max_height-o1['height']-o2['height'])/2)
161 |         lengths.append(max(o1['width'], o2['width']))
162 |     top_y = np.array(top_y)
163 |     bot_y = np.array(bot_y)
164 |     lengths = np.array(lengths)
165 | 
166 |     total_width = np.sum(lengths)+len(pairs_)*padding
167 | 
168 |     # draw membrane
169 |     mem = MembraneInterface(axes=axs, lengths=lengths, bottom_y=bot_y, top_y=top_y, padding=padding, thickness=thickness)
170 |     mem.draw(color=membrane_color)
171 | 
172 |     # draw proteins
173 |     w=0
174 |     for i, o in enumerate(pairs_):
175 |         o1, o2 = o
176 |         this_width = max(o1['width'], o2['width'])
177 |         this_height = o1['height']+o2['height']
178 |         y_offset = bot_y[i]
179 |         if colors is not None:
180 |             color_top = colors_[i][1]
181 |             color_bot = colors_[i][0]
182 |         else:
183 |             color_top = None
184 |             color_bot = None
185 | 
186 |         # TODO rotation needs to be a little cleaner, making some assumptions here
187 |         for p in o1["polygons"]:
188 |             xy = np.array(o1['polygons'][0]['polygon'].exterior.xy) # shape (2, N); assuming first polygon is outline, TODO fix
189 |             recenter = np.array([np.min(xy[0]), np.min(xy[1])]) # minimum x and minimum y of the outline
190 |             facecolor = shade_from_color(color_bot, p.get("shade", 0.5), range=p.get("shading_range", 0.4))
191 |             plot_polygon(p['polygon'], axes=axs, translate_pre=[-recenter[0]+w+(this_width-o1['width'])/2, -recenter[1]+y_offset], flip=False, facecolor=facecolor, linewidth=p['linewidth'])
192 | 
193 |         for p in o2["polygons"]:
194 |             xy = np.array(o2['polygons'][0]['polygon'].exterior.xy) # shape (2, N); assuming first polygon is outline, TODO fix
195 |             recenter = np.array([np.min(xy[0]), np.min(xy[1])]) # minimum x and minimum y of the outline
196 |             facecolor = shade_from_color(color_top, p.get("shade", 0.5), range=p.get("shading_range", 0.4))
197 |             plot_polygon(p['polygon'], axes=axs, translate_pre=-1*recenter, translate_post=[w+(this_width+o2['width'])/2, y_offset+10+o2['height']+o1['height']], flip=True, facecolor=facecolor, linewidth=p['linewidth'])
198 |         #plot_polygon(o2["polygons"][0]['polygon'], axes=axs, offset=[w+(this_width-o2['width'])/2, y_offset+o1['height']], flip=True, facecolor=o2["polygons"][0]['facecolor'], linewidth=linewidth)
199 | 
200 |         if labels_ is not None:
201 |             angstroms_per_inch = total_width/11
202 |             fontsize = total_width*0.5/len(pairs_)/angstroms_per_inch*72
203 |             font_inches = fontsize/72
204 |             plt.text(w+this_width/2,  y_offset+this_height+50, labels_[i][1], rotation=90, fontsize=fontsize, va='bottom', ha='center')
205 |             plt.text(w+this_width/2, y_offset-1.1*angstroms_per_inch*font_inches, labels_[i][0], rotation=90, fontsize=fontsize, va='top', ha='center')
206 | 
207 |         w += this_width+padding
208 | 
209 |     fig.set_size_inches(18.5, 10.5)
210 |     return fig
211 | 
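
The "horseshoe" ordering used by `plot_pairs` can be looked at on its own. The sketch below repeats the same slicing on a plain list of heights; `horseshoe_order` is a hypothetical helper name, not a function in the package, and the example assumes distinct heights (ties would be ordered however `np.argsort` happens to break them).

```python
import numpy as np

def horseshoe_order(heights):
    """Return indices placing the tallest items at both ends and the shortest in the middle."""
    sorted_order = np.argsort(heights)[::-1]   # indices from tallest to shortest
    first_half = sorted_order[::2]             # ranks 0, 2, 4, ...: descending, left side
    if len(sorted_order) % 2:
        second_half = sorted_order[-2::-2]     # odd count: odd ranks in reverse, ascending
    else:
        second_half = sorted_order[::-2]       # even count: odd ranks in reverse, ascending
    return np.concatenate([first_half, second_half])

if __name__ == "__main__":
    heights = [10, 50, 30, 20, 40]
    order = horseshoe_order(heights)
    print([heights[i] for i in order])
```

Reading the reordered heights left to right, they fall from the tallest to the shortest and then rise again to the second tallest, which is what gives the layout its U shape.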


--------------------------------------------------------------------------------
/cellscape/parse_alignment.py:
--------------------------------------------------------------------------------
 1 | from Bio import pairwise2
 2 | from Bio.PDB import *
 3 | from Bio.Align import substitution_matrices, PairwiseAligner
 4 | import numpy as np
 5 | 
 6 | def identity_from_alignment(a):
 7 |     s1 = np.array(list(a[0]))
 8 |     s2 = np.array(list(a[1]))
 9 |     return np.sum(s1 == s2) / len(np.where( s1 != '-')[0])
10 | 
11 | def overlap_from_alignment(a):
12 |     s1 = np.array(list(a[0]))
13 |     s2 = np.array(list(a[1]))
14 |     s1_nogap = np.where( s1 != '-')
15 |     s2_nogap = np.where( s2 != '-')
16 |     s1_start_align = np.min(s1_nogap)
17 |     s1_end_align = np.max(s1_nogap)
18 |     s2_start_align = np.min(s2_nogap)
19 |     s2_end_align = np.max(s2_nogap)
20 |     overlap_align = (max(s1_start_align, s2_start_align), min(s1_end_align, s2_end_align))
21 |     return(
22 |     np.where(s1_nogap == overlap_align[0])[1][0],
23 |     np.where(s1_nogap == overlap_align[1])[1][0],
24 |     np.where(s2_nogap == overlap_align[0])[1][0],
25 |     np.where(s2_nogap == overlap_align[1])[1][0]) + np.array([1,1,1,1])
26 | 
27 | def align_pair(s1, s2):
28 |     # wrapper for biopython pairwise alignment
29 |     blosum62 = substitution_matrices.load("BLOSUM62")
30 |     return pairwise2.align.localds(s1, s2, blosum62, -3, -3, one_alignment_only=True)[0]
31 | 
32 | def align_all_pairs(s):
33 |     for i in range(len(s)):
34 |         for j in range(i+1, len(s)):
35 |             s1 = s[i][1]
36 |             s2 = s[j][1]
37 |             alignment = align_pair(s1,s2) # single alignment; [0] and [1] are the gapped sequences
38 |             print(s[i][0], len(s1), s[j][0], len(s2), *overlap_from_alignment(alignment), identity_from_alignment(alignment))
39 | 
40 | def sequence_overlap(s1, s2):
41 |     aligner = PairwiseAligner()
42 |     aligner.mode = "global"
43 |     aligner.substitution_matrix = substitution_matrices.load("BLOSUM62")
44 |     alignments = aligner.align(s1, s2)
45 |     alignment = next(iter(alignments)) # take the first alignment without materializing all of them
46 |     alignment_bounds = alignment.aligned
47 |     return np.array([alignment_bounds[0][0][0], alignment_bounds[0][-1][1], alignment_bounds[-1][0][0], alignment_bounds[-1][-1][1]]) + np.array([1,0,1,0])
48 | 
49 | if __name__ == '__main__':
50 |     a1 = (
51 |     '---------AAAAAAAABBBBBBB',
52 |     'BBBBBBBBBAAAAAAAA-------'
53 |     )
54 |     print(overlap_from_alignment(a1))
55 |     print(sequence_overlap("AAAAAAAABBBBBBB", "BBBBBBBBBAAAAAAAA"))
56 | 
57 |     a2 = (
58 |     'BBBBBBBBBAAAAABBBBBBB',
59 |     '---------AAAAA-------'
60 |     )
61 |     print(overlap_from_alignment(a2))
62 |     print(sequence_overlap("BBBBBBBBBAAAAABBBBBBB", "AAAAA"))
63 | 
64 |     a3 = (
65 |     'BBBBBBBBBAAAAABBBBAAAABBB',
66 |     '---------AAAAA----AAAA---'
67 |     )
68 |     print(overlap_from_alignment(a3))
69 |     print(sequence_overlap("BBBBBBBBBAAAAABBBBAAAABBB", "AAAAAAAAA"))
70 | 
71 |     s1 = "AACDAEECDAECDEADAEEAEADADCADEAEAECDDAEACDAECDA"
72 |     s2 = "ACDAEECDADEADWAEEAEADAWDCADEAEAECGDDAEAGCDACDA"
73 |     a = align_pair(s1,s2)
74 |     print(a[0],a[1])
75 | 
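
As a cross-check on `overlap_from_alignment`, the same quantity (the 1-based residue ranges of each sequence inside the columns that both sequences span) can be computed with cumulative residue counts. This is an alternative sketch, not the module's implementation: `overlap_bounds` is a hypothetical name, and unlike the function above it also handles an overlap boundary that falls on an internal gap by taking the nearest residue inside the overlap.

```python
import numpy as np

def overlap_bounds(row1, row2, gap="-"):
    """Return 1-based (start1, end1, start2, end2) for the columns spanned by both rows."""
    a = np.array(list(row1)) != gap   # True where row1 has a residue
    b = np.array(list(row2)) != gap   # True where row2 has a residue
    if len(a) != len(b):
        raise ValueError("alignment rows must have equal length")
    cols1 = np.flatnonzero(a)
    cols2 = np.flatnonzero(b)
    start = max(cols1[0], cols2[0])   # first column where both rows have begun
    end = min(cols1[-1], cols2[-1])   # last column before either row has ended
    if start > end:
        raise ValueError("sequences do not overlap")
    # cumulative residue counts: the 1-based number of the last residue at or before a column
    num1 = np.cumsum(a)
    num2 = np.cumsum(b)
    # at the start column, a gap means the first residue inside the overlap is the next one
    start1 = int(num1[start]) + (0 if a[start] else 1)
    start2 = int(num2[start]) + (0 if b[start] else 1)
    return (start1, int(num1[end]), start2, int(num2[end]))

if __name__ == "__main__":
    print(overlap_bounds('---------AAAAAAAABBBBBBB', 'BBBBBBBBBAAAAAAAA-------'))
    print(overlap_bounds('BBBBBBBBBAAAAABBBBBBB', '---------AAAAA-------'))
```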


--------------------------------------------------------------------------------
/cellscape/parse_uniprot_xml.py:
--------------------------------------------------------------------------------
  1 | import xml.etree.ElementTree as ET
  2 | import os
  3 | import urllib.request
  4 | import sys
  5 | import argparse
  6 | import json
  7 | 
  8 | class UniprotRecord:
  9 |     """Data structure to hold Uniprot annotations for single sequence."""
 10 |     def __init__(self, id, name=None):
 11 |         self.id = id
 12 |         self.name = name
 13 |         self.domains = []
 14 |         self.topology = []
 15 |         self.ptm = {}
 16 |         self.sequence = ""
 17 | 
 18 |     def add_domain(self,name, start, end):
 19 |         self.domains.append((name, int(start), int(end)))
 20 |     def add_topology(self, name, start, end):
 21 |         self.topology.append((name, int(start), int(end)))
 22 |     def add_ptm(self, name, start, end):
 23 |         self.ptm[name] = (int(start), int(end))
 24 |     def process_segments(self):
 25 |         if 'chain' in self.ptm:
 26 |             (self.chain_start, self.chain_end) = self.ptm['chain']
 27 |         else:
 28 |             (self.chain_start, self.chain_end) = (1, 99999)
 29 | 
 30 |         last = self.chain_start
 31 |         self.domain_segments = []
 32 | 
 33 |         for domain in self.domains:
 34 |             if (domain[1] - last) > 1:
 35 |                 self.domain_segments.append(('None',last, domain[1]-1))
 36 | 
 37 |             self.domain_segments.append(domain)
 38 |             last = domain[2]
 39 | 
 40 |         if (self.chain_end - last) > 1:
 41 |             self.domain_segments.append(('None',last, self.chain_end))
 42 | 
 43 | def parse_xml(xmlpath):
 44 |     """
 45 |     Parse a UniProt XML file and return a list of UniprotRecord objects.
 46 |     """
 47 |     tree = ET.parse(xmlpath)
 48 |     root = tree.getroot()
 49 |     ns = '{http://uniprot.org/uniprot}'
 50 |     sequences = []
 51 | 
 52 |     for entry in tree.iter(tag=ns+'entry'):
 53 |         accession = entry.find(ns+'accession').text
 54 |         gene = entry.find(ns+'name').text
 55 |         sequence = UniprotRecord(accession, gene)
 56 | 
 57 |         for feature in entry.iter(tag=ns+'feature'):
 58 | 
 59 |             # look for transmembrane regions
 60 |             if feature.get('type') in ('topological domain','transmembrane region'):
 61 |                 try:
 62 |                     begin = feature.find(ns+'location').find(ns+'begin').get('position')
 63 |                     end = feature.find(ns+'location').find(ns+'end').get('position')
 64 |                     feature_description = feature.get('description').split(';')[0]
 65 |                     sequence.add_topology(feature_description, begin, end)
 66 |                 except (AttributeError, TypeError, ValueError): # skip features with missing or uncertain positions
 67 |                     pass
 68 | 
 69 |             # look for protein domains
 70 |             elif feature.get('type') == 'domain':
 71 |                 try:
 72 |                     begin = feature.find(ns+'location').find(ns+'begin').get('position')
 73 |                     end = feature.find(ns+'location').find(ns+'end').get('position')
 74 |                     sequence.add_domain(feature.get('description'),begin,end)
 75 |                 except (AttributeError, TypeError, ValueError): # skip features with missing or uncertain positions
 76 |                     pass
 77 | 
 78 |             # look for signal peptide and mature chain
 79 |             elif feature.get('type') in ('chain', 'propeptide','signal peptide'):
 80 |                 try:
 81 |                     begin = feature.find(ns+'location').find(ns+'begin').get('position')
 82 |                     end = feature.find(ns+'location').find(ns+'end').get('position')
 83 |                     sequence.add_ptm(feature.get('type'), begin, end)
 84 |                 except (AttributeError, TypeError, ValueError): # skip features with missing or uncertain positions
 85 |                     pass
 86 | 
 87 |         sequence.process_segments()
 88 |         sequences.append(sequence)
 89 | 
 90 |         for seq in entry.iter(tag=ns+'sequence'):
 91 |             if seq.text is not None:
 92 |                 sequence.sequence = seq.text.replace('\n','')
 93 | 
 94 |     return(sequences)
 95 | 
 96 | def split_uniprot_xml(xmlpath, outpath='.'):
 97 |     """Take a multi-record XML file and split to one XML file per entry."""
 98 |     tree = ET.parse(xmlpath)
 99 |     root = tree.getroot()
100 |     ns = '{http://uniprot.org/uniprot}'
101 |     for entry in tree.iter(tag=ns+'entry'):
102 |         accession = entry.find(ns+'accession')
103 |         with open("{}/{}.xml".format(outpath, accession.text), "w") as xml_out:
104 |             xml_out.write(ET.tostring(entry).decode('utf-8'))
105 | 
106 | def download_uniprot_record(record, fileformat, outdir):
107 |     """Download record from Uniprot server."""
108 |     file_path = "{}.{}".format(record, fileformat)
109 |     out_path = os.path.join(outdir, file_path)
110 |     if not os.path.exists(out_path):
111 |         print("Requesting {}".format(out_path))
112 |         urllib.request.urlretrieve("https://www.uniprot.org/uniprot/{}".format(file_path), out_path)
113 |     else:
114 |         pass
115 |         #print("UniProt file already there", file=sys.stderr)
116 |     return out_path
117 | 
118 | if __name__ == "__main__":
119 | 
120 |     parser = argparse.ArgumentParser(description='Parse UniProt XML file',  formatter_class=argparse.ArgumentDefaultsHelpFormatter)
121 |     parser.add_argument('--xml', help='Input XML file', required=True)
122 |     parser.add_argument('--json', action='store_true', default=False, help='Output relevant information in JSON')
123 |     args = parser.parse_args()
124 | 
125 |     uniprot = parse_xml(args.xml)
126 | 
127 |     for entry in uniprot:
128 |         if args.json:
129 |             data = {
130 |             'name': entry.name,
131 |             'sequence': entry.sequence,
132 |             'domains': entry.domain_segments,
133 |             'topology': entry.topology
134 |             }
135 |             print(json.dumps(data, indent=2))
136 |         else:
137 |             with open(entry.name+'.domains.csv','w') as f:
138 |                 f.write(','.join(['res_start','res_end','description'])+'\n')
139 |                 for domain in entry.domain_segments:
140 |                     f.write(','.join(map(str,[domain[1],domain[2],domain[0]]))+'\n')
141 | 
142 |             with open(entry.name+'.topology.csv','w') as f:
143 |                 f.write(','.join(['res_start','res_end','description'])+'\n')
144 |                 for domain in entry.topology:
145 |                     f.write(','.join(map(str,[domain[1],domain[2],domain[0]]))+'\n')
146 | 
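
The namespace-qualified lookups in `parse_xml` can be tried without a downloaded record. The sketch below runs the same kind of `feature`/`location`/`begin`/`end` traversal over a small inline document. The XML is a hypothetical, heavily trimmed stand-in (the accession, name, and features are invented for illustration), and `feature_ranges` is not a function in the package. It uses explicit `None` checks where the module relies on `try`/`except`.

```python
import xml.etree.ElementTree as ET

NS = "{http://uniprot.org/uniprot}"

# Hypothetical, trimmed record for illustration only; not real UniProt data
EXAMPLE_XML = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>X00001</accession>
    <name>EXAMPLE_HUMAN</name>
    <feature type="domain" description="Ig-like 1">
      <location><begin position="10"/><end position="100"/></location>
    </feature>
    <feature type="transmembrane region" description="Helical">
      <location><begin position="120"/><end position="140"/></location>
    </feature>
    <feature type="domain" description="Unknown span">
      <location><begin status="unknown"/><end position="200"/></location>
    </feature>
  </entry>
</uniprot>"""

def feature_ranges(xml_text, feature_type):
    """Return (description, begin, end) for each feature of the given type with known positions."""
    root = ET.fromstring(xml_text)
    found = []
    for feature in root.iter(NS + "feature"):
        if feature.get("type") != feature_type:
            continue
        location = feature.find(NS + "location")
        begin = location.find(NS + "begin") if location is not None else None
        end = location.find(NS + "end") if location is not None else None
        if begin is None or end is None:
            continue  # no explicit range (e.g. a single-position feature)
        if begin.get("position") is None or end.get("position") is None:
            continue  # boundary marked as unknown
        found.append((feature.get("description"), int(begin.get("position")), int(end.get("position"))))
    return found

if __name__ == "__main__":
    print(feature_ranges(EXAMPLE_XML, "domain"))
    print(feature_ranges(EXAMPLE_XML, "transmembrane region"))
```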


--------------------------------------------------------------------------------
/cellscape/scene.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | import matplotlib
  3 | import matplotlib.pyplot as plt
  4 | import matplotlib.patches as mpatches
  5 | import matplotlib.lines as mlines
  6 | from matplotlib import lines, text, cm
  7 | from matplotlib.colors import LinearSegmentedColormap, ListedColormap
  8 | from scipy import interpolate
  9 | import os
 10 | import sys
 11 | import pickle
 12 | import csv
 13 | 
 14 | from cellscape.cartoon import plot_polygon, shade_from_color, placeholder_polygon
 15 | 
 16 | def rotation_matrix_2d(theta):
 17 |     """Return matrix to rotate 2D coordinates by angle theta."""
 18 |     return np.array([[np.cos(theta), -1*np.sin(theta)],[np.sin(theta), np.cos(theta)]])
 19 | 
 20 | class Membrane:
 21 |     def __init__(self, width, thickness, axes, base_y=0):
 22 |         self.width = width
 23 |         self.thickness = thickness
 24 |         self.y = base_y
 25 |         self.axes = axes
 26 |         # other constants
 27 |         self.head_radius = 4
 28 | 
 29 |     def flat(self):
 30 |         self.height_at = lambda x: self.y + self.thickness/2
 31 | 
 32 |     def sinusoidal(self, frequency=1, amplitude=1):
 33 |         self.height_at = lambda x: self.y + self.thickness/2*amplitude*np.sin(x*frequency*2*np.pi/self.width)
 34 | 
 35 |     def interpolate(self, x, y, kind='linear'):
 36 |         #self.height_at = interpolate.interp1d(x, y, kind=kind)
 37 |         self.height_fn = interpolate.PchipInterpolator(x, y)
 38 |         self.height_at = lambda x: self.height_fn(x) + self.y
 39 | 
 40 |     def draw(self, lipids=False):
 41 | 
 42 |         membrane_x = np.linspace(0,self.width,200)
 43 |         membrane_y_top = np.array([self.height_at(x) for x in membrane_x])
 44 |         membrane_y_bot = membrane_y_top-self.thickness
 45 | 
 46 |         if lipids:
 47 |             membrane_box_fc='#C4E7EF'
 48 |             lipid_head_fc='#D6D1EF'
 49 |             lipid_tail_fc='#A3DCEF'
 50 |             self.axes.fill_between(membrane_x, membrane_y_top-self.head_radius, membrane_y_bot+self.head_radius, color=membrane_box_fc, zorder=1.6)
 51 |             num_lipids = int(self.width/(2*self.head_radius))
 52 |             for i in range(num_lipids):
 53 |                 membrane_y = self.height_at(i/num_lipids*self.width)
 54 |                 self.axes.add_line(mlines.Line2D([i*self.head_radius*2, i*self.head_radius*2], [-4+membrane_y, -18+membrane_y], zorder=1.7, c=lipid_tail_fc, linewidth=self.head_radius*.7, alpha=1, solid_capstyle='round'))
 55 |                 self.axes.add_line(mlines.Line2D([i*self.head_radius*2, i*self.head_radius*2], [-38+membrane_y, -24+membrane_y], zorder=1.7, c=lipid_tail_fc, linewidth=self.head_radius*.7, alpha=1, solid_capstyle='round'))
 56 |                 self.axes.add_patch(mpatches.Circle((i*self.head_radius*2, -1*self.head_radius+membrane_y), self.head_radius, facecolor=lipid_head_fc, ec='k', linewidth=0.3, alpha=1, zorder=2))
 57 |                 self.axes.add_patch(mpatches.Circle((i*self.head_radius*2, -1*self.thickness+membrane_y), self.head_radius, facecolor=lipid_head_fc, ec='k', linewidth=0.3, alpha=1, zorder=2))
 58 | 
 59 |         else:
 60 |             membrane_box_fc='#C8C8C8'
 61 |             self.axes.fill_between(membrane_x, membrane_y_top, membrane_y_bot, color=membrane_box_fc, zorder=1.6)
 62 | 
 63 | def make_scene(args):
 64 |     """Build a scene in one go. Called when running ``cellscape scene``."""
 65 | 
 66 |     assert args.save.split('.')[-1] in ['png','pdf','svg','ps'], "image format not recognized"
 67 | 
 68 |     # list of protein polygons to draw
 69 |     object_list = []
 70 |     num_files = 0
 71 | 
 72 |     # set random seed for reproducibility
 73 |     if args.seed is not None:
 74 |         np.random.seed(args.seed)
 75 | 
 76 |     if args.files:
 77 |         for path in args.files:
 78 |             with open(path,'rb') as f:
 79 |                 data = pickle.load(f)
 80 |                 object_list.append(data)
 81 | 
 82 |         # allow random scene generation even if manually specifying files
 83 |         if args.num_mol > 0:
 84 |             object_list = np.random.choice(object_list, size=args.num_mol)
 85 |         num_files = len(object_list)
 86 | 
 87 |     elif args.csv:
 88 |         protein_data = dict()
 89 |         with open(args.csv) as csvfile:
 90 |             reader = csv.DictReader(csvfile)
 91 |             for row in reader:
 92 |                 (name, stoich, path) = (row['name'], float(row[args.sample_from]), row.get('file'))
 93 |                 if path: # skip rows with an empty or missing 'file' column
 94 |                     with open(path,'rb') as f:
 95 |                         data = pickle.load(f)
 96 |                         data['name'] = name
 97 |                         data['stoichiometry'] = stoich
 98 |                         # TEST specifying color in CSV file
 99 |                         if 'color' in row:
100 |                             data['color'] = row['color']
101 |                         protein_data[name] = (stoich, data)
102 |                 elif args.use_placeholders:
103 |                     height = float(row.get('height'))*10 # CSV height assumed to be in nanometers; convert to angstroms
104 |                     data = {'name':name, 'stoichiometry':stoich, 'height':height, 'bottom':np.array([25,0]), 'width':50, 'polygons':[{'polygon':placeholder_polygon(height), 'edgecolor':'k', 'linewidth':1, 'facecolor':"#eeeeee"}]}
105 |                     protein_data[name] = (stoich, data)
106 | 
107 |         num_files = len(protein_data)
108 | 
109 |     else:
110 |         sys.exit("No input files specified, see options with --help")
111 | 
112 |     if len(args.offsets) > 0:
113 |         assert(len(args.files) == len(args.offsets))
114 |         y_offsets = list(map(float, args.offsets))
115 |     else:
116 |         y_offsets = np.zeros(len(object_list))
117 | 
118 |     if args.csv:
119 |         # total sum of protein counts
120 |         protein_names = np.array(list(protein_data.keys()))
121 |         protein_stoich = np.array([protein_data[p][0] for p in protein_names])
122 |         sum_stoich = np.sum(protein_stoich)
123 |         stoich_weights = protein_stoich / sum_stoich
124 | 
125 |         if args.num_mol > 0:
126 |             # protein copy number
127 |             sampled_protein = np.random.choice(protein_names, size=args.num_mol, p=stoich_weights)
128 |             object_list = [protein_data[p][1] for p in sampled_protein]
129 |         else:
130 |             object_list = [protein_data[p][1] for p in protein_names]
131 | 
132 |         # assemble objects for background
133 |         if args.background and args.num_mol > 0:
134 |             scaling_factor = 0.7
135 |             sampled_protein = np.random.choice(protein_names, int(args.num_mol*1/scaling_factor), p=stoich_weights)
136 |             background_object_list = [protein_data[p][1] for p in sampled_protein]
137 |         elif 'name' in object_list[0]:
138 |             protein_names = [o['name'] for o in object_list]
139 |         else:
140 |             for i,o in enumerate(object_list):
141 |                 o['name'] = i
142 |             protein_names = range(len(object_list))
143 | 
144 |     # sort proteins
145 |     if args.order_by == "random":
146 |         np.random.shuffle(object_list)
147 |     elif args.order_by == "height":
148 |         object_list = sorted(object_list, key=lambda x: x['height'], reverse=True)
149 |     elif args.order_by == "top":
150 |         # TODO should be renamed, maybe length for overall size and height for above membrane?
151 |         object_list = sorted(object_list, key=lambda x: x['top'][1], reverse=True)
152 |     elif args.order_by == "membrane":
153 |         # sorted by maximum height above or below the membrane
154 |         def max_abs(l1, l2):
155 |             if abs(l1) > abs(l2):
156 |                 return l1
157 |             else:
158 |                 return l2
159 |         object_list = sorted(object_list, key=lambda x: max_abs(x['top'][1], x['bottom'][1]), reverse=True)
160 | 
161 |     # set font options
162 |     font_options = {'family':'Arial', 'weight':'normal', 'size':10}
163 |     matplotlib.rc('font', **font_options)
164 | 
165 |     # set up plot
166 |     # POSSIBLE BUG: while coordinates and scale are preserved in the pickle files,
167 |     # this doesn't necessarily apply to the images. Hence if someone tries to
168 |     # manually add a protein to a scene that has been generated there could be
169 |     # sizing issues. Is the solution to use a constant angstrom/inch scaling?
170 |     scene_height_in = args.fig_height
171 |     scene_width_in = args.fig_width
172 |     fig, axs = plt.subplots(figsize=(scene_width_in, scene_height_in))
173 |     axs.set_aspect('equal')
174 | 
175 |     if args.axes:
176 |         plt.axis('on')
177 |         axs.xaxis.grid(False)
178 |         axs.yaxis.grid(True)
179 |         axs.axes.xaxis.set_ticklabels([])
180 |         axs.autoscale()
181 |         plt.margins(0.01,0.01) # is this needed?
182 | 
183 |     else:
184 |         plt.axis('off')
185 |         plt.gca().set_axis_off()
186 |         plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, hspace = 0, wspace = 0)
187 |         axs.autoscale()
188 |         plt.margins(0.01,0.01)
189 |         plt.gca().xaxis.set_major_locator(plt.NullLocator())
190 |         plt.gca().yaxis.set_major_locator(plt.NullLocator())
191 | 
192 |     if args.recolor:
193 |         # default cmap is hsv. for discrete could try Set1 or Pastel1
194 |         if len(args.recolor_cmap) == 1:
195 |             cmap = cm.get_cmap(args.recolor_cmap[0])
196 |         else:
197 |             # TESTING interpret as continuous color scheme
198 |             # cmap = LinearSegmentedColormap.from_list("cmap", args.recolor_cmap)
199 |             cmap = ListedColormap(args.recolor_cmap)
200 |         color_scheme = dict()
201 |         for i,c in enumerate(sorted(object_list, key=lambda x: x['height'])):
202 |             name = c['name']
203 |             if isinstance(cmap, ListedColormap):
204 |                 color_scheme[name] = cmap(i)
205 |             else:
206 |                 color_scheme[name] = cmap(i/len(object_list))
207 | 
208 |     # TESTING
209 |     # so colors are by height (what about duplicated molecules)
210 |     # np.random.shuffle(object_list)
211 | 
212 |     total_width = np.sum([o['width'] for o in object_list])+len(object_list)*args.padding
213 |     if args.membrane is not None:
214 |         membrane = Membrane(width=total_width, axes=axs, thickness=args.membrane_thickness)
215 | 
216 |         if args.membrane == "flat":
217 |             membrane.flat()
218 |         elif args.membrane == "arc":
219 |             membrane.sinusoidal(frequency=0.5, amplitude=2)
220 |         elif args.membrane == "wave":
221 |             membrane.sinusoidal(frequency=2, amplitude=2)
222 |         membrane.draw(lipids=args.membrane_lipids)
223 | 
224 |     # draw molecules
225 |     w=0
226 |     for i, o in enumerate(object_list):
227 |         if args.membrane is not None and not args.no_membrane_offset:
228 |             #y_offset = membrane.height_at(w+o['bottom'][0])-10
229 |             #y_offset = o['bottom'][1]
230 |             if o["bottom"][1] < 0:
231 |                 y_offset = -1*o["bottom"][1]
232 |             else:
233 |                 y_offset = 0
234 |         else:
235 |             y_offset = 0
236 |         for p in o["polygons"]:
237 |             # TODO change to dict.get() call to have default
238 |             if args.recolor:
239 |                 if 'color' in o:
240 |                     # check if color specified in CSV file
241 |                     facecolor = o['color']
242 |                     edgecolor = p["edgecolor"]
243 |                 else:
244 |                     # use color scheme from recolor_cmap
245 |                     facecolor = color_scheme[o['name']]
246 |                     edgecolor = 'black'
247 |                 if "shade" in p:
248 |                     # TODO export shading_range from polygons as well
249 |                     facecolor = shade_from_color(facecolor, p["shade"], range=p.get("shading_range", 0.4)) # using default from cartoon.py, could change
250 |             else:
251 |                 # use color already specified
252 |                 facecolor = p["facecolor"]
253 |                 edgecolor = p["edgecolor"]
254 | 
255 |             plot_polygon(p["polygon"], translate_pre=[w, y_offset], facecolor=facecolor, edgecolor=edgecolor, linewidth=p["linewidth"], zorder_mod=p.get("zorder", 0))
256 |             if args.labels:
257 |                 # option is experimental, text needs to be properly sized and placed
258 |                 # testing use of figure width in inches (specified above) and total width in angstroms to infer appropriate font size
259 |                 #plt.text(w+o['width']/2,-100, o.get("name", ""), rotation=90, fontsize=fontsize)
260 |                 # 1.1 and 0.6 numbers chosen through experimentation, best way would be to look at length of labels in characters
261 |                 angstroms_per_inch = total_width/scene_width_in
262 |                 fontsize = total_width*args.label_size/len(object_list)/angstroms_per_inch*72
263 |                 font_inches = fontsize/72
264 |                 # TODO better text positioning, allow for top/bottom selection
265 |                 if args.label_orientation == "vertical":
266 |                     #plt.text(w+o['width']/2,o['bottom'][1]-1.1*angstroms_per_inch*font_inches, o.get("name", ""), rotation=90, fontsize=fontsize, va='top', ha='center') # vertical text (below)
267 |                     plt.text(w+o['width']/2,0-1.1*angstroms_per_inch*font_inches, o.get("name", ""), rotation=90, fontsize=fontsize, va='top', ha='center') # vertical text (below)
268 |                 elif args.label_orientation == "horizontal":
269 |                     plt.text(w+o['width']/2,o['top'][1]+2*angstroms_per_inch*font_inches, o.get("name", ""), rotation=0, fontsize=fontsize, va='top', ha='center') # horizontal text (above)
270 |                 elif args.label_orientation == "diagonal":
271 |                     plt.text(w+o['width']/5,o['top'][1]+angstroms_per_inch*font_inches, o.get("name", ""), rotation=45, fontsize=fontsize) # diagonal text (above)
272 |         w += o['width']+args.padding
273 |  
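274 |     if args.background and args.csv and args.num_mol > 0: # background objects are only sampled in CSV mode with num_mol > 0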
275 |         background_w=0
276 |         for i, o in enumerate(background_object_list):
277 |             for p in o["polygons"]:
278 |                 plot_polygon(p["polygon"], offset=[background_w, 0], scale=scaling_factor, zorder_mod=p.get("zorder", -2), facecolor=p["facecolor"], edgecolor=p["edgecolor"], linewidth=p["linewidth"]*scaling_factor)
279 |             background_w += (o['width']+args.padding)
280 | 
281 |     plt.savefig(args.save, transparent=True, pad_inches=0, bbox_inches='tight', dpi=args.dpi)
282 | 


--------------------------------------------------------------------------------
/cellscape/structure.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | import shapely.geometry as sg
  3 | import shapely.ops as so
  4 | import re
  5 | import os
  6 | import sys
  7 | import operator
  8 | import warnings
  9 | from Bio.PDB import rotmat, vectors, MMCIFParser, PDBParser
 10 | from scipy.spatial.distance import pdist, squareform
 11 | import time
 12 | 
 13 | import cellscape
 14 | from cellscape.util import amino_acid_3letter, group_by
 15 | from cellscape.parse_uniprot_xml import parse_xml, download_uniprot_record
 16 | from cellscape.parse_alignment import align_pair, overlap_from_alignment, sequence_overlap
 17 | 
 18 | # silence warnings from Biopython that might pop up when loading the PDB
 19 | from Bio import BiopythonWarning
 20 | warnings.simplefilter('ignore', BiopythonWarning)
 21 | 
 22 | def matrix_from_nglview(m):
 23 |     """Take flattened 4x4 view matrix from NGLView and return the normalized 3x3 rotation matrix and the translation vector."""
 24 |     camera_matrix = np.array(m).reshape(4,4)
 25 |     return camera_matrix[:3,:3]/np.linalg.norm(camera_matrix[:3,:3], axis=1), camera_matrix[3,:3]
 26 | 
 27 | def matrix_to_nglview(m):
 28 |     """Take 3x3 rotation matrix and convert to flattened 4x4 view matrix for NGLView."""
 29 |     nglv_matrix = np.identity(4)
 30 |     nglv_matrix[:3,:3] = np.dot(m, np.array([[-1,0,0],[0,1,0],[0,0,-1]]))
 31 |     return list(nglv_matrix.flatten())
 32 | 
 33 | def orientation_from_topology(topologies):
 34 |     """Infer protein vertical orientation (N->C or C->N) from UniProt topology annotation."""
 35 |     first_ex_flag = True
 36 |     first_ex = None
 37 |     first_cy_flag = True
 38 |     first_cy = None
 39 |     first_he_flag = True
 40 |     first_he = None
 41 | 
 42 |     for row in topologies:
 43 |         (description, start, end) = row
 44 | 
 45 |         if description == 'Extracellular' and first_ex_flag:
 46 |             first_ex = (start, end)
 47 |             first_ex_flag = False
 48 |         elif description == 'Helical' and first_he_flag:
 49 |             first_he = (start, end)
 50 |             first_he_flag = False
 51 |         elif description == 'Cytoplasmic' and first_cy_flag:
 52 |             first_cy = (start, end)
 53 |             first_cy_flag = False
 54 | 
 55 |     # rough heuristic for now, works for single pass transmembrane proteins
 56 |     nc_orient = True
 57 |     if first_ex is not None and first_cy is not None:
 58 |         if first_ex[0] < first_cy[0]:
 59 |             nc_orient = True # N->C (top to bottom)
 60 |         elif first_ex[0] > first_cy[0]:
 61 |             nc_orient = False # C->N (top to bottom)
 62 | 
 63 |     return(nc_orient)
 64 | 
 65 | def orientation_from_ptm(ptm):
 66 |     """Assumes signal peptide is on the cytoplasmic/membrane side with the chain extracellular"""
 67 | 
 68 |     nc_orient = True
 69 |     if ('chain' in ptm) and ('signal peptide' in ptm):
 70 |         if ptm['signal peptide'][0] < ptm['chain'][0]:
 71 |             nc_orient = True
 72 |         else:
 73 |             nc_orient = False
 74 | 
 75 |     return(nc_orient)
 76 | 
 77 | def depth_slices_from_coord(xyz, width):
 78 |     """Split single xyz Nx3 matrix into a list of Mx3 matrices, one per depth (Z) bin of the given width"""
 79 |     binned = (xyz[:,-1]/width).astype(int)
 80 |     binned_shifted = binned - np.min(binned)
 81 |     num_bins = np.max(binned_shifted)+1
 82 | 
 83 |     total_coords = 0
 84 |     slice_coords = []
 85 | 
 86 |     for i in range(num_bins):
 87 |         bin_coords = xyz[binned_shifted == i]
 88 |         slice_coords.append(bin_coords)
 89 |         total_coords += len(bin_coords)
 90 | 
 91 |     assert(len(xyz) == total_coords)
 92 |     return slice_coords
 93 | 
 94 | def get_z_slice_labels(xyz, width):
 95 |     """Take an Nx3 coordinate matrix and return the Z bin index of each row"""
 96 |     binned = (xyz[:,-1]/width).astype(int)
 97 |     return binned - np.min(binned)
 98 | 
 99 | def split_on_labels(m, labels):
100 |     num_bins = np.max(labels)+1
101 |     total_coords = 0
102 |     coords = []
103 |     for i in range(num_bins):
104 |         group_coords = m[labels == i]
105 |         coords.append(group_coords)
106 |         total_coords += len(group_coords)
107 |     assert(len(m) == total_coords)
108 |     return coords
109 | 
110 | def get_dimensions(xy, end_window=50):
111 |     dimensions = {}
112 |     dimensions['width'] = np.max(xy[:,0]) - np.min(xy[:,0])
113 |     dimensions['height'] = np.max(xy[:,1]) - np.min(xy[:,1])
114 |     dimensions['start'] = np.mean(xy[:end_window])
115 |     dimensions['end'] = np.mean(xy[-end_window:]) # last end_window points, mirroring 'start'
116 |     dimensions['bottom'] = min(xy, key=operator.itemgetter(1))
117 |     dimensions['top'] = max(xy, key=operator.itemgetter(1))
118 |     return dimensions
119 | 
120 | class Structure:
121 |     """ A class to load coordinates, handle an NGLView instance, and generate cartoons"""
122 |     #
123 |     def __init__(self, file, name=None, model=0, chain="all", uniprot=None, view=True, is_opm=False, res_start=None, res_end=None):
124 |         """
125 |         Args:
126 |             file (str): Path to PDB/mmCIF coordinates
127 |             name (str, optional): Descriptive name for structure. Defaults to None.
128 |             model (int, optional): Model number from structure. Defaults to 0.
129 |             chain (str, optional): Either "all" or list of chains to include e.g. "ABC". Defaults to "all".
130 |             uniprot (str, optional): UniProt identifier (to download the record) or the path to a UniProt XML file. Defaults to None.
131 |             view (bool, optional): Whether to use interactive NGLView widget. Defaults to True.
132 |             is_opm (bool, optional): Structure is from Orientation of Proteins in Membranes database. Defaults to False.
133 |             res_start (int, optional): Select subset of protein. Defaults to None.
134 |             res_end (int, optional): Select subset of protein. Defaults to None.
135 |         """
136 | 
137 |         # descriptive name for the protein, otherwise use file
138 |         if name is None:
139 |             self.name = os.path.basename(file)
140 |         else:
141 |             self.name = name
142 | 
143 |         # load structure with biopython
144 |         if file.lower().endswith((".cif", ".mcif", ".mmcif")):
145 |             parser = MMCIFParser()
146 |         elif file.lower().endswith((".pdb", ".ent")):
147 |             parser = PDBParser()
148 |         else:
149 |             sys.exit("File format not recognized!")
150 |         self.structure = parser.get_structure(file, file)[model]
151 |         _all_chains = [c.id for c in self.structure.get_chains()]
152 | 
153 |         # eliminate undesired chains from the biopython object
154 |         if chain.lower() == "all":
155 |             self.chains = _all_chains
156 |         else:
157 |             self.chains = list(chain)
158 |             for c in _all_chains:
159 |                 if c not in self.chains:
160 |                     self.structure.detach_child(c)
161 | 
162 |         # take residue start and end for the first selected chain
163 |         if res_start is not None and res_end is not None:
164 |             assert(res_end > res_start)
165 |             for res in list(self.structure[self.chains[0]]):
166 |                 res_id = res.get_full_id()[3][1]
167 |                 if (res_id < res_start) or (res_id > res_end):
168 |                     self.structure[self.chains[0]].detach_child(res.get_id())
169 | 
170 |         # BUG with some biopython structures not loading in nglview
171 |         # can be fixed by resetting disordered flags
172 |         # could this cause problems later on?
173 |         for chain in self.structure:
174 |             for residue in chain:
175 |                 for atom in residue.get_unpacked_list():
176 |                     atom.disordered_flag = 0
177 | 
178 |         # assumes PDB is oriented as described here:
179 |         # https://opm.phar.umich.edu/about#features
180 |         self.is_opm = is_opm
181 | 
182 |         # view matrix and NGLView options
183 |         self.use_nglview = view
184 |         self.view_matrix = []
185 |         if self.use_nglview:
186 |             # import lazily so nglview is only required for interactive use
187 |             import nglview as nv
188 |             self._structure_to_view = self.structure
189 |             initial_repr = [
190 |                 {"type": "spacefill", "params": {
191 |                     "sele": "protein", "color": "element"
192 |                 }}
193 |             ]
194 |             self.view = nv.show_biopython(self._structure_to_view, sync_camera=True, representations=initial_repr)
195 |             self.view.camera = 'orthographic'
196 |             self.view._set_sync_camera([self.view])
197 |             self._reflect_y = np.array([[-1,0,0],[0,1,0],[0,0,-1]])
198 | 
199 |         # data structure holding residue information
200 |         self.residues = dict()
201 |         self.sequence = dict()
202 |         self.coord = []
203 |         self.ca_atoms = []
204 |         self.backbone_atoms = []
205 |         all_atoms = 0
206 |         for chain in self.chains:
207 |             self.sequence[chain] = ""
208 |             self.residues[chain] = dict()
209 |             for res in self.structure[chain]:
210 |                 res_id = res.get_full_id()[3][1]
211 |                 if res.get_full_id()[3][0][0] == "H": # skip hetatm records
212 |                     continue
213 |                 if res.get_resname() not in amino_acid_3letter:
214 |                     continue
215 |                 res_aa = amino_acid_3letter[res.get_resname()]
216 |                 self.sequence[chain] += res_aa
217 |                 residue_atoms = 0
218 |                 these_atoms = []
219 |                 backbone_atoms = []
220 |                 for a in res:
221 |                     self.coord.append(list(a.get_vector()))
222 |                     these_atoms.append(a.id) # tracking atom identities for now
223 |                     if a.id == "CA":
224 |                         this_ca_atom = all_atoms
225 |                         self.ca_atoms.append(this_ca_atom)
226 |                     if a.id in ["CA", "N", "C", "O"]:
227 |                         backbone_atoms.append(all_atoms)
228 |                         self.backbone_atoms.append(all_atoms)
229 |                     all_atoms += 1
230 |                     residue_atoms += 1
231 |                 self.residues[chain][res_id] = {
232 |                 'chain':chain,
233 |                 'id':res_id,
234 |                 'amino_acid':res_aa,
235 |                 'object':res,
236 |                 'coord':(all_atoms-residue_atoms, all_atoms),
237 |                 'coord_ca':(this_ca_atom, this_ca_atom+1),
238 |                 'coord_backbone':np.array(backbone_atoms),
239 |                 'atoms':np.array(these_atoms)
240 |                 }
241 |         self.coord = np.array(self.coord)
242 |         self.ca_atoms = np.array(self.ca_atoms).astype(int)
243 | 
244 |         # uniprot information
245 |         if uniprot is not None:
246 |             if os.path.exists(uniprot):
247 |                 self._uniprot_xml = uniprot
248 |             elif re.fullmatch(r'[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}', uniprot):
249 |                 # if file doesn't exist, check it is a valid UniProt ID and download from server
250 |                 # using regex from https://www.uniprot.org/help/accession_numbers
251 |                 try:
252 |                     self._uniprot_xml = download_uniprot_record(uniprot, "xml", os.getcwd())
253 |                 except Exception:
254 |                     sys.exit("Couldn't download UniProt file")
255 |             else:
256 |                 self._uniprot_xml = None
257 |                 sys.exit("Must specify either a UniProt XML file or a valid UniProt ID")
258 |         else:
259 |             self._uniprot_xml = None
260 | 
261 |         if self._uniprot_xml is not None:
262 |             self._preprocess_uniprot(self._uniprot_xml)
263 | 
264 |     def _preprocess_uniprot(self, xml):
265 |         # TODO support more than one XML file (e.g. for different chains?)
266 |         self._uniprot = parse_xml(xml)[0]
267 | 
268 |         # align PDB and UniProt sequences to find offset
269 |         uniprot_chain = self.chains[0]
270 |         pdb_seq = self.sequence[uniprot_chain]
271 |         uniprot_seq = self._uniprot.sequence
272 |         first_residue_id = sorted(self.residues[uniprot_chain])[0]
273 |         # alignment coordinates are 0-indexed (but PDB numbering and Uniprot ranges are 1-indexed)
274 |         #self._uniprot_overlap = np.array(overlap_from_alignment(align_pair(uniprot_seq, pdb_seq)))
275 |         self._uniprot_overlap = np.array(sequence_overlap(uniprot_seq, pdb_seq))
276 |         self._uniprot_offset = self._uniprot_overlap[0] - first_residue_id
277 | 
278 |         if len(self._uniprot.domains) > 0:
279 |             self._annotate_residues_from_uniprot(self._uniprot.domains, name_key="domain", residues=self.residues[uniprot_chain], offset=self._uniprot_offset)
280 | 
281 |         if len(self._uniprot.topology) > 0:
282 |             self._annotate_residues_from_uniprot(self._uniprot.topology, name_key="topology", residues=self.residues[uniprot_chain], offset=self._uniprot_offset)
283 | 
284 |     def _annotate_residues_from_uniprot(self, ranges, name_key, residues, offset=0):
285 |         # uniprot_number - offset = pdb_number
286 |         for row in ranges:
287 |             (name, start, end) = row
288 |             for r in range(start, end+1):
289 |                 if (r-offset) in residues:
290 |                     residues[r-offset][name_key] = name
291 | 
292 |     def _update_view_matrix(self):
293 |         # check if camera orientation has been specified from nglview
294 |         if len(self.view._camera_orientation) == 16:
295 |             m, t = matrix_from_nglview(self.view._camera_orientation)
296 |             self.view_matrix = np.dot(m, self._reflect_y)
297 |         elif len(self.view_matrix) == 0:
298 |             self.view_matrix = np.identity(3)
299 | 
300 |     def align_view(self, v1, v2):
301 |         """Rotate structure so v1 is aligned with v2
302 | 
303 |         Args:
304 |             v1 (ndarray): first vector
305 |             v2 (ndarray): second vector
306 |         """
307 |         # rotate structure so v1 is aligned with v2
308 |         r = rotmat(vectors.Vector(v1), vectors.Vector(v2))
309 |         view_matrix = r.T
310 |         self.set_view_matrix(view_matrix)
311 | 
312 |     def align_view_nc(self, n_atoms=10, c_atoms=10, flip=False):
313 |         """Rotate structure so N-C vector is aligned with the vertical axis
314 | 
315 |         Args:
316 |             n_atoms (int, optional): N terminus CoM calculated from first x atoms. Defaults to 10.
317 |             c_atoms (int, optional): C terminus CoM calculated from last x atoms. Defaults to 10.
318 |             flip (bool, optional): Orient C-to-N instead of N-to-C. Defaults to False.
319 |         """
320 |         com = np.mean(self.coord, axis=0)
321 |         atoms_ = self.coord - com
322 |         v1 = np.mean(atoms_[:n_atoms], axis=0) - np.mean(atoms_[-c_atoms:], axis=0)
323 |         if not flip:
324 |             self.align_view(v1, np.array([0,1,0]))
325 |         else:
326 |             self.align_view(v1, np.array([0,-1,0]))
327 | 
328 |     def auto_view(self, n_atoms=100, c_atoms=100, flip=None):
329 |         """Infer protein orientation from UniProt data
330 | 
331 |         Args:
332 |             n_atoms (int, optional): N terminus CoM calculated from first x atoms. Defaults to 100.
333 |             c_atoms (int, optional): C terminus CoM calculated from last x atoms. Defaults to 100.
334 |             flip (bool, optional): Explicitly pass orientation (True for N-to-C, False for C-to-N). Defaults to None.
335 |         """
336 |         # TODO should be same as align_view_nc if no UniProt data?
337 |         # TODO abstract with align_view?
338 |         # TODO abstract rotmat to separate function e.g. get_rotation_matrix()
339 |         if flip is None:
340 |             if self._uniprot_xml and len(self._uniprot.topology) > 0:
341 |                 print("orienting based on topology...")
342 |                 nc_orient = orientation_from_topology(self._uniprot.topology)
343 |             elif self._uniprot_xml and len(self._uniprot.ptm) > 0:
344 |                 print("orienting based on ptm...")
345 |                 nc_orient = orientation_from_ptm(self._uniprot.ptm)
346 |             else:
347 |                 nc_orient = True
348 |         elif isinstance(flip, bool):
349 |             nc_orient = flip
350 |         print("guessed N>C orientation? {}".format(nc_orient))
351 |         self.nc_orient = nc_orient
352 | 
353 |         # rotate structure so N-C vector is aligned with the vertical axis
354 |         com = np.mean(self.coord, axis=0)
355 |         atoms_ = self.coord - com
356 |         v1 = np.mean(atoms_[:n_atoms], axis=0) - np.mean(atoms_[-c_atoms:], axis=0)
357 |         if nc_orient:
358 |             first_rotation = rotmat(vectors.Vector(v1), vectors.Vector(np.array([0,1,0]))).T
359 |         else:
360 |             first_rotation = rotmat(vectors.Vector(v1), vectors.Vector(np.array([0,-1,0]))).T
361 | 
362 |         # rotate around Y axis so X axis aligns with longest distance in XZ plane
363 |         rot_coord = np.dot(self.coord, first_rotation)
364 |         com = np.mean(rot_coord, axis=0)
365 |         atoms_ = rot_coord - com
366 |         xz = atoms_[self.ca_atoms][:,[0,2]]
367 |         dist = squareform(pdist(xz))
368 |         max_dist = np.unravel_index(np.argmax(dist, axis=None), dist.shape)
369 |         #print(max_dist, np.max(dist), dist[max_dist[0]][max_dist[1]])
370 |         v2 = atoms_[self.ca_atoms[max_dist[0]]]-atoms_[self.ca_atoms[max_dist[1]]]
371 |         v2[1] = 0
372 |         second_rotation = rotmat(vectors.Vector(v2), vectors.Vector(np.array([1,0,0]))).T
373 | 
374 |         view_matrix = np.dot(first_rotation, second_rotation)
375 |         self.set_view_matrix(view_matrix)
376 | 
377 |     def _set_nglview_orientation(self, m):
378 |         # m is 3x3 rotation matrix
379 |         if self.use_nglview:
380 |             nglv_matrix = matrix_to_nglview(m)
381 |             #print("Before", self.view._camera_orientation)
382 |             self.view._set_camera_orientation(nglv_matrix)
383 |             # having a bug where setting camera orientation does nothing
384 |             # waiting a little bit seems to fix it (maybe an issue with sync/refresh rate)
385 |             #self.view.control.orient(nglv_matrix)
386 |             #self.view._camera_orientation = nglv_matrix
387 |             time.sleep(0.5)
388 |             self.view.center()
389 |             #print("After", self.view._camera_orientation)
390 | 
391 |     def _apply_view_matrix(self):
392 |         # transform atomic coordinates using view matrix
393 |         self.rotated_coord = np.dot(self.coord, self.view_matrix)
394 | 
395 |     def load_pymol_view(self, file):
396 |         """Read rotation matrix from output of PyMOL ``get_view`` command
397 | 
398 |         Args:
399 |             file (str): Path to file
400 |         """
401 |         matrix = []
402 |         with open(file,'r') as view:
403 |             for line in view:
404 |                 fields = line.split(',')
405 |                 if len(fields) == 4:
406 |                     matrix.append(list(map(float,fields[:3])))
407 |         view_matrix = np.array(matrix)[:3]
408 |         self.set_view_matrix(view_matrix)
409 | 
410 |     def load_chimera_view(self, file):
411 |         """Read rotation matrix from output of Chimera ``matrixset`` command
412 | 
413 |         Args:
414 |             file (str): Path to file
415 |         """
416 |         matrix = []
417 |         with open(file,'r') as view:
418 |             for line in view.readlines()[1:4]:
419 |                 matrix.append(line.split())
420 | 
421 |         # transpose and remove translation vector
422 |         view_matrix = np.array(matrix).astype(float).T[:3]
423 |         self.set_view_matrix(view_matrix)
424 | 
425 |     def save_view_matrix(self, p):
426 |         """Save rotation matrix to a NumPy text file
427 | 
428 |         Args:
429 |             p (str): Path to file
430 |         """
431 |         self._update_view_matrix()
432 |         np.savetxt(p, self.view_matrix)
433 | 
434 |     def load_view_matrix(self, p):
435 |         """Load rotation matrix from a NumPy text file
436 | 
437 |         Args:
438 |             p (str): Path to file
439 |         """
440 |         view_matrix = np.loadtxt(p)
441 |         self.set_view_matrix(view_matrix)
442 | 
443 |     def set_view_matrix(self, m):
444 |         """Manually set view matrix
445 | 
446 |         Args:
447 |             m (ndarray): 3x3 matrix
448 |         """
449 |         assert m.shape == (3,3)
450 |         self.view_matrix = m
451 |         self._set_nglview_orientation(self.view_matrix)
452 | 
453 |     def outline(self, by="all", depth=None, depth_contour_interval=3, only_backbone=False, only_ca=False, only_annotated=False, radius=None, back_outline=False, align_transmembrane=False):
454 |         """Create 2D projection from coordinates and outline atoms
455 | 
456 |         Args:
457 |             by (str, optional): Grouping to use for cartoon. Options are ["all", "residue", "chain", "domain", "topology"]. Defaults to "all".
458 |             depth (str, optional): How to deal with depth/occlusions. Options are ["flat", "contours"]. Defaults to None.
459 |             depth_contour_interval (float, optional): Size in angstroms of contour slices into the Z-axis. Defaults to 3.
460 |             only_backbone (bool, optional): Only use backbone atoms for visualization. Defaults to False.
461 |             only_ca (bool, optional): Only use alpha-carbon atoms for visualization. Defaults to False.
462 |             only_annotated (bool, optional): Only include residues that have an annotation in UniProt (e.g. domain or topology). Defaults to False.
463 |             radius (float, optional): Explicitly pass atomic radius, otherwise infer from settings. Defaults to None.
464 |             back_outline (bool, optional): Draw additional outline of entire structure at the back. Defaults to False.
465 |             align_transmembrane (bool, optional): Align CoM of annotated transmembrane regions with membrane (requires UniProt data). Defaults to False.
466 | 
467 |         Returns:
468 |             Cartoon: Object containing residue information and outlined polygons
469 |         """
470 | 
471 |         # check options
472 |         assert by in ["all", "residue", "chain", "domain", "topology"], "Option not recognized"
473 |         assert depth in [None, "flat", "contours"], "Option not recognized"
474 |         # depth option doesn't affect by="residue"
475 | 
476 |         # collapse chain hierarchy into flat list
477 |         self.residues_flat = [self.residues[c][i] for c in self.residues for i in self.residues[c]]
478 | 
479 |         if self.is_opm:
480 |             self.set_view_matrix(np.array([[1,0,0],[0,0,1],[0,1,0]]))
481 |         elif self.use_nglview:
482 |             self._update_view_matrix()
483 | 
484 |         # transform atomic coordinates using view matrix
485 |         self._apply_view_matrix()
486 | 
487 |         # recenter coordinates on lower left edge of bounding box
488 |         offset_x = np.min(self.rotated_coord[:,0])
489 |         if self.is_opm:
490 |             offset_y = 0 # since OPM already aligned to membrane
491 |         else:
492 |             offset_y = np.min(self.rotated_coord[:,1])
493 |         self.rotated_coord -= np.array([offset_x, offset_y, 0])
494 | 
495 |         # calculate vertical offset for transmembrane proteins
496 |         if self._uniprot_xml and align_transmembrane:
497 |             tm_coordinates = []
498 |             for res in self.residues_flat:
499 |                 if res.get("topology","") == "Helical":
500 |                     tm_coordinates.append(np.array(self.rotated_coord[range(*res['coord_ca'])]))
501 |             if len(tm_coordinates) > 0:
502 |                 tm_coordinates = np.concatenate(tm_coordinates)
503 |                 tm_com_y = np.mean(tm_coordinates[:,1])
504 |                 print("shifted for transmembrane region by {} angstroms".format(tm_com_y), file=sys.stderr)
505 |                 self.rotated_coord -= np.array([0, tm_com_y, 0])
506 | 
507 |         self._rescale_z = lambda z: (z-np.min(self.rotated_coord[:,-1]))/(np.max(self.rotated_coord[:,-1])-np.min(self.rotated_coord[:,-1]))
508 |         polygons = []
509 |         groups = {}
510 |         self._group_outlines = []
511 | 
512 |         # default radius for rendering atoms
513 |         if only_ca and radius is None:
514 |             radius_ = 5
515 |         elif only_backbone and radius is None:
516 |             radius_ = 4
517 |         elif radius is None:
518 |             radius_ = 1.5
519 |         else:
520 |             radius_ = radius
521 | 
522 |         if by == 'all':
523 |             # space-filling outline of entire molecule
524 |             self.num_groups = 1
525 |             if only_ca:
526 |                 coord_to_outline = self.rotated_coord[self.ca_atoms]
527 |             elif only_backbone:
528 |                 coord_to_outline = self.rotated_coord[self.backbone_atoms]
529 |             else:
530 |                 coord_to_outline = self.rotated_coord
531 |             if depth == "contours":
532 |                 slice_coords = split_on_labels(coord_to_outline, get_z_slice_labels(coord_to_outline, width=depth_contour_interval))
533 |                 for slice in slice_coords:
534 |                     slice_depth = self._rescale_z(np.mean(slice[:,-1]))
535 |                     polygons.append(({"depth":slice_depth}, so.unary_union([sg.Point(i).buffer(radius_) for i in slice])))
536 |             else:
537 |                 # depth=None and depth=flat are equivalent for by="all"
538 |                 polygons.append(({}, so.unary_union([sg.Point(i).buffer(radius_) for i in coord_to_outline])))
539 |         else:
540 |             for res in self.residues_flat:
541 |                 # pick range of atomic coordinates out of main data structure
542 |                 if only_ca:
543 |                     res_coords = np.array(self.rotated_coord[range(*res['coord_ca'])])
544 |                 elif only_backbone:
545 |                     res_coords = np.array(self.rotated_coord[range(*res['coord_backbone'])])
546 |                 else:
547 |                     res_coords = np.array(self.rotated_coord[range(*res['coord'])])
548 |                 res["xyz"] = res_coords
549 | 
550 |         if by == 'residue':
551 |             for res in sorted(self.residues_flat, key=lambda res: np.mean(res["xyz"][:,-1])):
552 |                 group_outline = so.unary_union([sg.Point(i).buffer(radius_) for i in res["xyz"]])
553 |                 res["polygon"] = group_outline
554 |                 res["depth"] = self._rescale_z(np.mean(res["xyz"][:,-1]))
555 |                 polygons.append((res, group_outline))
556 |             self.num_groups = 1
557 | 
558 |         elif by in ['domain', 'topology', 'chain']:
559 | 
560 |             if by in ['domain', 'topology']:
561 |                 assert(self._uniprot_xml is not None)
562 | 
563 |             # TODO comment code and be consistent with variable names group vs region
564 |             residue_groups = group_by(self.residues_flat, key=lambda x: x.get(by))
565 |             groups = sorted(residue_groups.keys(), key=lambda x: (x is None, x))
566 | 
567 |             self.num_groups = len(residue_groups)
568 |             region_atoms = dict() # residue group to atomic indices
569 |             total_atoms = 0
570 |             for k,v in residue_groups.items():
571 |                 region_atoms[k] = []
572 |                 for res in v:
573 |                     if only_ca:
574 |                         region_atoms[k].extend(range(*res['coord_ca']))
575 |                     elif only_backbone:
576 |                         region_atoms[k].extend(range(*res['coord_backbone']))
577 |                     else:
578 |                         region_atoms[k].extend(range(*res['coord']))
579 |                 region_atoms[k] = np.array(region_atoms[k], dtype=int)
580 |                 total_atoms += len(region_atoms[k])
581 | 
582 |             if depth is not None:
583 | 
584 |                 slice_labels = get_z_slice_labels(self.rotated_coord, width=depth_contour_interval)
585 |                 num_slices = np.max(slice_labels)+1
586 | 
587 |                 if depth == "contours":
588 |                     for s in range(num_slices):
589 |                         for group_i, (group_name, group_res) in enumerate(sorted(residue_groups.items(), key=lambda x: (x[0] is None, x))):
590 |                             if not only_annotated or group_name is not None:
591 |                                 atom_indices = region_atoms[group_name]
592 |                                 slice_coords = self.rotated_coord[atom_indices][slice_labels[atom_indices] == s]
593 |                                 if len(slice_coords) > 0:
594 |                                     slice_depth = self._rescale_z(np.mean(slice_coords[:,-1]))
595 |                                     slice_outline = so.unary_union([sg.Point(c).buffer(radius_) for c in slice_coords])
596 |                                     polygons.append(({by:group_name, "depth":slice_depth}, slice_outline))
597 | 
598 |                     # back outline to highlight each group's contours... just duplicating depth==flat code here
599 |                     if back_outline:
600 |                         empty_polygon = sg.Point((0,0)).buffer(0)
601 |                         view_object = empty_polygon
602 |                         region_polygons = dict()
603 |                         for slice in range(num_slices, 0, -1):
604 |                             for group_i, (group_name, group_res) in enumerate(sorted(residue_groups.items(), key=lambda x: (x[0] is None, x))):
605 |                                 if not only_annotated or group_name is not None:
606 |                                     atom_indices = region_atoms[group_name]
607 |                                     slice_coords = self.rotated_coord[atom_indices][slice_labels[atom_indices] == slice]
608 |                                     poly = so.unary_union([sg.Point(c).buffer(radius_) for c in slice_coords])
609 |                                     this_difference = poly.difference(view_object)
610 |                                     region_polygons[group_name] = region_polygons.get(group_name, empty_polygon).union(this_difference.buffer(0.01))
611 |                                     view_object = view_object.union(this_difference.buffer(0.01))
612 | 
613 |                         for v in region_polygons.values():
614 |                             self._group_outlines.append(v)
615 | 
616 |                 elif depth == "flat":
617 |                     empty_polygon = sg.Point((0,0)).buffer(0)
618 |                     view_object = empty_polygon
619 |                     region_polygons = dict()
620 |                     for slice in range(num_slices, 0, -1):
621 |                         for group_i, (group_name, group_res) in enumerate(sorted(residue_groups.items(), key=lambda x: (x[0] is None, x))):
622 |                             if not only_annotated or group_name is not None:
623 |                                 atom_indices = region_atoms[group_name]
624 |                                 slice_coords = self.rotated_coord[atom_indices][slice_labels[atom_indices] == slice]
625 |                                 poly = so.unary_union([sg.Point(c).buffer(radius_) for c in slice_coords])
626 |                                 this_difference = poly.difference(view_object)
627 |                                 region_polygons[group_name] = region_polygons.get(group_name, empty_polygon).union(this_difference.buffer(0.01))
628 |                                 view_object = view_object.union(this_difference.buffer(0.01))
629 | 
630 |                     for k,v in region_polygons.items():
631 |                         polygons.append(({by:k}, v))
632 | 
633 |             else:
634 |                 for group_i, (group_name, group_res) in enumerate(residue_groups.items()):
635 |                     if not only_annotated or group_name is not None:
636 |                         group_coords = self.rotated_coord[region_atoms[group_name]]
637 |                         polygons.append(({by:group_name}, so.unary_union([sg.Point(i).buffer(radius_) for i in group_coords])))
638 | 
639 |         if back_outline:
640 |             self._back_outline =  so.unary_union([p[1].buffer(0.01) for p in polygons])
641 |         else:
642 |             self._back_outline = None
643 | 
644 |         print("Outlined {} polygons!".format(len(polygons)), file=sys.stderr)
645 | 
646 |         return cellscape.Cartoon(self.name, polygons, self.residues_flat, by, self._back_outline, self._group_outlines, self.num_groups, get_dimensions(self.rotated_coord), groups)
647 | 
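
The recentering and depth-rescaling steps inside `outline()` can be hard to picture from the code alone. The following is a minimal sketch of just those two steps, assuming a small hypothetical array of already-rotated coordinates (the values and the variable names `rotated`, `recentered`, and `depth` are illustrative and not part of the library):

```python
import numpy as np

# Hypothetical rotated coordinates (x, y, z) in angstroms.
rotated = np.array([
    [2.0, -1.0, 10.0],
    [5.0,  3.0, 16.0],
    [3.5,  0.5, 13.0],
])

# Shift x and y so the lower-left corner of the bounding box is the origin.
# The z column is left untouched, as in outline().
offset = np.array([rotated[:, 0].min(), rotated[:, 1].min(), 0.0])
recentered = rotated - offset

# Map z linearly onto [0, 1], mirroring the _rescale_z lambda.
z = recentered[:, 2]
depth = (z - z.min()) / (z.max() - z.min())

print(recentered)
print(depth)
```

Note that the rescaling divides by the z-range, so a perfectly planar set of coordinates (all z equal) would divide by zero; callers may want to guard against that case.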


--------------------------------------------------------------------------------
/cellscape/util.py:
--------------------------------------------------------------------------------
 1 | amino_acid_3letter = {'ALA':'A',
 2 | 'ASX':'B',
 3 | 'CYS':'C',
 4 | 'ASP':'D',
 5 | 'GLU':'E',
 6 | 'PHE':'F',
 7 | 'GLY':'G',
 8 | 'HIS':'H',
 9 | 'ILE':'I',
10 | 'LYS':'K',
11 | 'LEU':'L',
12 | 'MET':'M',
13 | 'MSE':'M',
14 | 'ASN':'N',
15 | 'PRO':'P',
16 | 'GLN':'Q',
17 | 'ARG':'R',
18 | 'SER':'S',
19 | 'THR':'T',
20 | 'VAL':'V',
21 | 'TRP':'W',
22 | 'XAA':'X',
23 | 'TYR':'Y',
24 | 'GLX':'Z'}
25 | 
26 | def group_by(l, key):
 27 |     """Group the items of a list into a dict, keyed by the result of calling ``key`` on each item."""
28 |     d = dict()
29 |     for i in l:
30 |         k = key(i)
31 |         if k in d:
32 |             d[k].append(i)
33 |         else:
34 |             d[k] = [i]
35 |     return d
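
As a rough illustration of how `group_by` is used when outlining by domain, the sketch below groups a few hypothetical residue records and orders the group names with unannotated residues (`None`) last, the same ordering trick used in `structure.py`. The residue dictionaries and their values are made up for the example, and `group_by` is repeated here only so the snippet runs on its own:

```python
def group_by(l, key):
    """Group the items of ``l`` into a dict keyed by ``key(item)``."""
    d = dict()
    for i in l:
        k = key(i)
        if k in d:
            d[k].append(i)
        else:
            d[k] = [i]
    return d

# Hypothetical residue records; only the "domain" field matters here.
residues = [
    {"id": 1, "domain": "Ig-like V-type"},
    {"id": 2, "domain": None},
    {"id": 3, "domain": "Ig-like V-type"},
    {"id": 4, "domain": "Ig-like C2-type"},
]

groups = group_by(residues, key=lambda r: r.get("domain"))

# Sort names alphabetically, pushing None (no annotation) to the end.
# The (x is None, x) tuple avoids comparing None with a string.
ordered = sorted(groups.keys(), key=lambda x: (x is None, x))

for name in ordered:
    print(name, [r["id"] for r in groups[name]])
```

Because `key` is an arbitrary callable, the same helper works for grouping by chain or topology by swapping the lambda.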


--------------------------------------------------------------------------------
/examples/ceacam5/P06731.xml:
--------------------------------------------------------------------------------
   1 | <?xml version='1.0' encoding='UTF-8'?>
   2 | <uniprot xmlns="http://uniprot.org/uniprot" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://uniprot.org/uniprot http://www.uniprot.org/support/docs/uniprot.xsd">
   3 | <entry dataset="Swiss-Prot" created="1988-01-01" modified="2020-06-17" version="196">
   4 | <accession>P06731</accession>
   5 | <accession>H9KVA7</accession>
   6 | <name>CEAM5_HUMAN</name>
   7 | <protein>
   8 | <recommendedName>
   9 | <fullName>Carcinoembryonic antigen-related cell adhesion molecule 5</fullName>
  10 | </recommendedName>
  11 | <alternativeName>
  12 | <fullName>Carcinoembryonic antigen</fullName>
  13 | <shortName>CEA</shortName>
  14 | </alternativeName>
  15 | <alternativeName>
  16 | <fullName>Meconium antigen 100</fullName>
  17 | </alternativeName>
  18 | <cdAntigenName>CD66e</cdAntigenName>
  19 | </protein>
  20 | <gene>
  21 | <name type="primary">CEACAM5</name>
  22 | <name type="synonym">CEA</name>
  23 | </gene>
  24 | <organism>
  25 | <name type="scientific">Homo sapiens</name>
  26 | <name type="common">Human</name>
  27 | <dbReference type="NCBI Taxonomy" id="9606"/>
  28 | <lineage>
  29 | <taxon>Eukaryota</taxon>
  30 | <taxon>Metazoa</taxon>
  31 | <taxon>Chordata</taxon>
  32 | <taxon>Craniata</taxon>
  33 | <taxon>Vertebrata</taxon>
  34 | <taxon>Euteleostomi</taxon>
  35 | <taxon>Mammalia</taxon>
  36 | <taxon>Eutheria</taxon>
  37 | <taxon>Euarchontoglires</taxon>
  38 | <taxon>Primates</taxon>
  39 | <taxon>Haplorrhini</taxon>
  40 | <taxon>Catarrhini</taxon>
  41 | <taxon>Hominidae</taxon>
  42 | <taxon>Homo</taxon>
  43 | </lineage>
  44 | </organism>
  45 | <reference key="1">
  46 | <citation type="journal article" date="1987" name="Mol. Cell. Biol." volume="7" first="3221" last="3230">
  47 | <title>Isolation and characterization of full-length functional cDNA clones for human carcinoembryonic antigen.</title>
  48 | <authorList>
  49 | <person name="Beauchemin N."/>
  50 | <person name="Benchimol S."/>
  51 | <person name="Cournoyer D."/>
  52 | <person name="Fuks A."/>
  53 | <person name="Stanners C.P."/>
  54 | </authorList>
  55 | <dbReference type="PubMed" id="3670312"/>
  56 | <dbReference type="DOI" id="10.1128/mcb.7.9.3221"/>
  57 | </citation>
  58 | <scope>NUCLEOTIDE SEQUENCE [GENOMIC DNA]</scope>
  59 | <scope>VARIANT GLU-398</scope>
  60 | </reference>
  61 | <reference key="2">
  62 | <citation type="journal article" date="1988" name="Genomics" volume="3" first="59" last="66">
  63 | <title>Carcinoembryonic antigen family: characterization of cDNAs coding for NCA and CEA and suggestion of nonrandom sequence variation in their conserved loop-domains.</title>
  64 | <authorList>
  65 | <person name="Barnett T."/>
  66 | <person name="Goebel S.J."/>
  67 | <person name="Nothdurft M.A."/>
  68 | <person name="Elting J.J."/>
  69 | </authorList>
  70 | <dbReference type="PubMed" id="3220478"/>
  71 | <dbReference type="DOI" id="10.1016/0888-7543(88)90160-7"/>
  72 | </citation>
  73 | <scope>NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM 1)</scope>
  74 | <scope>VARIANT GLU-398</scope>
  75 | </reference>
  76 | <reference key="3">
  77 | <citation type="journal article" date="1990" name="Mol. Cell. Biol." volume="10" first="2738" last="2748">
  78 | <title>Cloning of the complete gene for carcinoembryonic antigen: analysis of its promoter indicates a region conveying cell type-specific expression.</title>
  79 | <authorList>
  80 | <person name="Schrewe H."/>
  81 | <person name="Thompson J."/>
  82 | <person name="Bona M."/>
  83 | <person name="Hefta L.J.F."/>
  84 | <person name="Maruya A."/>
  85 | <person name="Hassauer M."/>
  86 | <person name="Shively J.E."/>
  87 | <person name="von Kleist S."/>
  88 | <person name="Zimmermann W."/>
  89 | </authorList>
  90 | <dbReference type="PubMed" id="2342461"/>
  91 | <dbReference type="DOI" id="10.1128/mcb.10.6.2738"/>
  92 | </citation>
  93 | <scope>NUCLEOTIDE SEQUENCE [GENOMIC DNA]</scope>
  94 | <scope>VARIANT GLU-398</scope>
  95 | </reference>
  96 | <reference key="4">
  97 | <citation type="journal article" date="2004" name="Nature" volume="428" first="529" last="535">
  98 | <title>The DNA sequence and biology of human chromosome 19.</title>
  99 | <authorList>
 100 | <person name="Grimwood J."/>
 101 | <person name="Gordon L.A."/>
 102 | <person name="Olsen A.S."/>
 103 | <person name="Terry A."/>
 104 | <person name="Schmutz J."/>
 105 | <person name="Lamerdin J.E."/>
 106 | <person name="Hellsten U."/>
 107 | <person name="Goodstein D."/>
 108 | <person name="Couronne O."/>
 109 | <person name="Tran-Gyamfi M."/>
 110 | <person name="Aerts A."/>
 111 | <person name="Altherr M."/>
 112 | <person name="Ashworth L."/>
 113 | <person name="Bajorek E."/>
 114 | <person name="Black S."/>
 115 | <person name="Branscomb E."/>
 116 | <person name="Caenepeel S."/>
 117 | <person name="Carrano A.V."/>
 118 | <person name="Caoile C."/>
 119 | <person name="Chan Y.M."/>
 120 | <person name="Christensen M."/>
 121 | <person name="Cleland C.A."/>
 122 | <person name="Copeland A."/>
 123 | <person name="Dalin E."/>
 124 | <person name="Dehal P."/>
 125 | <person name="Denys M."/>
 126 | <person name="Detter J.C."/>
 127 | <person name="Escobar J."/>
 128 | <person name="Flowers D."/>
 129 | <person name="Fotopulos D."/>
 130 | <person name="Garcia C."/>
 131 | <person name="Georgescu A.M."/>
 132 | <person name="Glavina T."/>
 133 | <person name="Gomez M."/>
 134 | <person name="Gonzales E."/>
 135 | <person name="Groza M."/>
 136 | <person name="Hammon N."/>
 137 | <person name="Hawkins T."/>
 138 | <person name="Haydu L."/>
 139 | <person name="Ho I."/>
 140 | <person name="Huang W."/>
 141 | <person name="Israni S."/>
 142 | <person name="Jett J."/>
 143 | <person name="Kadner K."/>
 144 | <person name="Kimball H."/>
 145 | <person name="Kobayashi A."/>
 146 | <person name="Larionov V."/>
 147 | <person name="Leem S.-H."/>
 148 | <person name="Lopez F."/>
 149 | <person name="Lou Y."/>
 150 | <person name="Lowry S."/>
 151 | <person name="Malfatti S."/>
 152 | <person name="Martinez D."/>
 153 | <person name="McCready P.M."/>
 154 | <person name="Medina C."/>
 155 | <person name="Morgan J."/>
 156 | <person name="Nelson K."/>
 157 | <person name="Nolan M."/>
 158 | <person name="Ovcharenko I."/>
 159 | <person name="Pitluck S."/>
 160 | <person name="Pollard M."/>
 161 | <person name="Popkie A.P."/>
 162 | <person name="Predki P."/>
 163 | <person name="Quan G."/>
 164 | <person name="Ramirez L."/>
 165 | <person name="Rash S."/>
 166 | <person name="Retterer J."/>
 167 | <person name="Rodriguez A."/>
 168 | <person name="Rogers S."/>
 169 | <person name="Salamov A."/>
 170 | <person name="Salazar A."/>
 171 | <person name="She X."/>
 172 | <person name="Smith D."/>
 173 | <person name="Slezak T."/>
 174 | <person name="Solovyev V."/>
 175 | <person name="Thayer N."/>
 176 | <person name="Tice H."/>
 177 | <person name="Tsai M."/>
 178 | <person name="Ustaszewska A."/>
 179 | <person name="Vo N."/>
 180 | <person name="Wagner M."/>
 181 | <person name="Wheeler J."/>
 182 | <person name="Wu K."/>
 183 | <person name="Xie G."/>
 184 | <person name="Yang J."/>
 185 | <person name="Dubchak I."/>
 186 | <person name="Furey T.S."/>
 187 | <person name="DeJong P."/>
 188 | <person name="Dickson M."/>
 189 | <person name="Gordon D."/>
 190 | <person name="Eichler E.E."/>
 191 | <person name="Pennacchio L.A."/>
 192 | <person name="Richardson P."/>
 193 | <person name="Stubbs L."/>
 194 | <person name="Rokhsar D.S."/>
 195 | <person name="Myers R.M."/>
 196 | <person name="Rubin E.M."/>
 197 | <person name="Lucas S.M."/>
 198 | </authorList>
 199 | <dbReference type="PubMed" id="15057824"/>
 200 | <dbReference type="DOI" id="10.1038/nature02399"/>
 201 | </citation>
 202 | <scope>NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA]</scope>
 203 | </reference>
 204 | <reference key="5">
 205 | <citation type="journal article" date="1987" name="Biochem. Biophys. Res. Commun." volume="142" first="511" last="518">
 206 | <title>Primary structure of human carcinoembryonic antigen (CEA) deduced from cDNA sequence.</title>
 207 | <authorList>
 208 | <person name="Oikawa S."/>
 209 | <person name="Nakazato H."/>
 210 | <person name="Kosaki G."/>
 211 | </authorList>
 212 | <dbReference type="PubMed" id="3814146"/>
 213 | <dbReference type="DOI" id="10.1016/0006-291x(87)90304-4"/>
 214 | </citation>
 215 | <scope>NUCLEOTIDE SEQUENCE [MRNA] OF 5-702 (ISOFORM 2)</scope>
 216 | <scope>VARIANT GLU-398</scope>
 217 | </reference>
 218 | <reference key="6">
 219 | <citation type="journal article" date="1987" name="Proc. Natl. Acad. Sci. U.S.A." volume="84" first="2960" last="2964">
 220 | <title>Isolation and characterization of cDNA clones encoding the human carcinoembryonic antigen reveal a highly conserved repeating structure.</title>
 221 | <authorList>
 222 | <person name="Zimmermann W."/>
 223 | <person name="Ortlieb B."/>
 224 | <person name="Friedrich R."/>
 225 | <person name="von Kleist S."/>
 226 | </authorList>
 227 | <dbReference type="PubMed" id="3033671"/>
 228 | <dbReference type="DOI" id="10.1073/pnas.84.9.2960"/>
 229 | </citation>
 230 | <scope>NUCLEOTIDE SEQUENCE [MRNA] OF 331-702 (ISOFORM 1)</scope>
 231 | <scope>VARIANT GLU-398</scope>
 232 | </reference>
 233 | <reference key="7">
 234 | <citation type="journal article" date="1989" name="Biochem. Biophys. Res. Commun." volume="164" first="39" last="45">
 235 | <title>Cell adhesion activity of non-specific cross-reacting antigen (NCA) and carcinoembryonic antigen (CEA) expressed on CHO cell surface: homophilic and heterophilic adhesion.</title>
 236 | <authorList>
 237 | <person name="Oikawa S."/>
 238 | <person name="Inuzuka C."/>
 239 | <person name="Kuroki M."/>
 240 | <person name="Matsuoka Y."/>
 241 | <person name="Kosaki G."/>
 242 | <person name="Nakazato H."/>
 243 | </authorList>
 244 | <dbReference type="PubMed" id="2803308"/>
 245 | <dbReference type="DOI" id="10.1016/0006-291x(89)91679-3"/>
 246 | </citation>
 247 | <scope>FUNCTION</scope>
 248 | <scope>SUBCELLULAR LOCATION</scope>
 249 | </reference>
 250 | <reference key="8">
 251 | <citation type="journal article" date="1990" name="Cancer Res." volume="50" first="2397" last="2403">
 252 | <title>Expression of complementary DNA and genomic clones for carcinoembryonic antigen and nonspecific cross-reacting antigen in Chinese hamster ovary and mouse fibroblast cells and characterization of the membrane-expressed products.</title>
 253 | <authorList>
 254 | <person name="Hefta L.J."/>
 255 | <person name="Schrewe H."/>
 256 | <person name="Thompson J.A."/>
 257 | <person name="Oikawa S."/>
 258 | <person name="Nakazato H."/>
 259 | <person name="Shively J.E."/>
 260 | </authorList>
 261 | <dbReference type="PubMed" id="2317824"/>
 262 | </citation>
 263 | <scope>SUBCELLULAR LOCATION</scope>
 264 | <scope>GPI-ANCHOR AT ALA-685</scope>
 265 | </reference>
 266 | <reference key="9">
 267 | <citation type="journal article" date="1999" name="Tumor Biol." volume="20" first="277" last="292">
 268 | <title>Four carcinoembryonic antigen subfamily members, CEA, NCA, BGP and CGM2, selectively expressed in the normal human colonic epithelium, are integral components of the fuzzy coat.</title>
 269 | <authorList>
 270 | <person name="Fraengsmyr L."/>
 271 | <person name="Baranov V."/>
 272 | <person name="Hammarstroem S."/>
 273 | </authorList>
 274 | <dbReference type="PubMed" id="10436421"/>
 275 | <dbReference type="DOI" id="10.1159/000030075"/>
 276 | </citation>
 277 | <scope>SUBCELLULAR LOCATION</scope>
 278 | <scope>TISSUE SPECIFICITY</scope>
 279 | </reference>
 280 | <reference key="10">
 281 | <citation type="journal article" date="2000" name="Cancer Res." volume="60" first="3419" last="3424">
 282 | <title>Human carcinoembryonic antigen functions as a general inhibitor of anoikis.</title>
 283 | <authorList>
 284 | <person name="Ordonez C."/>
 285 | <person name="Screaton R.A."/>
 286 | <person name="Ilantzis C."/>
 287 | <person name="Stanners C.P."/>
 288 | </authorList>
 289 | <dbReference type="PubMed" id="10910050"/>
 290 | </citation>
 291 | <scope>FUNCTION</scope>
 292 | </reference>
 293 | <reference key="11">
 294 | <citation type="journal article" date="2000" name="J. Biol. Chem." volume="275" first="26935" last="26943">
 295 | <title>Self recognition in the Ig superfamily. Identification of precise subdomains in carcinoembryonic antigen required for intercellular adhesion.</title>
 296 | <authorList>
 297 | <person name="Taheri M."/>
 298 | <person name="Saragovi U."/>
 299 | <person name="Fuks A."/>
 300 | <person name="Makkerh J."/>
 301 | <person name="Mort J."/>
 302 | <person name="Stanners C.P."/>
 303 | </authorList>
 304 | <dbReference type="PubMed" id="10864933"/>
 305 | <dbReference type="DOI" id="10.1074/jbc.m909242199"/>
 306 | </citation>
 307 | <scope>FUNCTION</scope>
 308 | <scope>MUTAGENESIS OF SER-66; TYR-68; LYS-69 AND GLN-78</scope>
 309 | <scope>SUBCELLULAR LOCATION</scope>
 310 | </reference>
 311 | <reference key="12">
 312 | <citation type="journal article" date="2006" name="J. Proteome Res." volume="5" first="1493" last="1503">
 313 | <title>Identification of N-linked glycoproteins in human saliva by glycoprotein capture and mass spectrometry.</title>
 314 | <authorList>
 315 | <person name="Ramachandran P."/>
 316 | <person name="Boontheung P."/>
 317 | <person name="Xie Y."/>
 318 | <person name="Sondej M."/>
 319 | <person name="Wong D.T."/>
 320 | <person name="Loo J.A."/>
 321 | </authorList>
 322 | <dbReference type="PubMed" id="16740002"/>
 323 | <dbReference type="DOI" id="10.1021/pr050492k"/>
 324 | </citation>
 325 | <scope>GLYCOSYLATION [LARGE SCALE ANALYSIS] AT ASN-560</scope>
 326 | <source>
 327 | <tissue>Saliva</tissue>
 328 | </source>
 329 | </reference>
 330 | <reference key="13">
 331 | <citation type="journal article" date="2009" name="J. Proteome Res." volume="8" first="651" last="661">
 332 | <title>Glycoproteomics analysis of human liver tissue by combination of multiple enzyme digestion and hydrazide chemistry.</title>
 333 | <authorList>
 334 | <person name="Chen R."/>
 335 | <person name="Jiang X."/>
 336 | <person name="Sun D."/>
 337 | <person name="Han G."/>
 338 | <person name="Wang F."/>
 339 | <person name="Ye M."/>
 340 | <person name="Wang L."/>
 341 | <person name="Zou H."/>
 342 | </authorList>
 343 | <dbReference type="PubMed" id="19159218"/>
 344 | <dbReference type="DOI" id="10.1021/pr8008012"/>
 345 | </citation>
 346 | <scope>GLYCOSYLATION [LARGE SCALE ANALYSIS] AT ASN-246</scope>
 347 | <source>
 348 | <tissue>Liver</tissue>
 349 | </source>
 350 | </reference>
 351 | <reference key="14">
 352 | <citation type="journal article" date="2015" name="Proc. Natl. Acad. Sci. U.S.A." volume="112" first="13561" last="13566">
 353 | <title>Diverse oligomeric states of CEACAM IgV domains.</title>
 354 | <authorList>
 355 | <person name="Bonsor D.A."/>
 356 | <person name="Gunther S."/>
 357 | <person name="Beadenkopf R."/>
 358 | <person name="Beckett D."/>
 359 | <person name="Sundberg E.J."/>
 360 | </authorList>
 361 | <dbReference type="PubMed" id="26483485"/>
 362 | <dbReference type="DOI" id="10.1073/pnas.1509511112"/>
 363 | </citation>
 364 | <scope>SUBUNIT</scope>
 365 | </reference>
 366 | <reference key="15">
 367 | <citation type="journal article" date="2000" name="FEBS Lett." volume="475" first="11" last="16">
 368 | <title>Structural models for carcinoembryonic antigen and its complex with the single-chain Fv antibody molecule MFE23.</title>
 369 | <authorList>
 370 | <person name="Boehm M.K."/>
 371 | <person name="Perkins S.J."/>
 372 | </authorList>
 373 | <dbReference type="PubMed" id="10854848"/>
 374 | <dbReference type="DOI" id="10.1016/s0014-5793(00)01612-4"/>
 375 | </citation>
 376 | <scope>3D-STRUCTURE MODELING OF 35-676</scope>
 377 | </reference>
 378 | <reference key="16">
 379 | <citation type="journal article" date="2008" name="Mol. Microbiol." volume="67" first="420" last="434">
 380 | <title>Binding of Dr adhesins of Escherichia coli to carcinoembryonic antigen triggers receptor dissociation.</title>
 381 | <authorList>
 382 | <person name="Korotkova N."/>
 383 | <person name="Yang Y."/>
 384 | <person name="Le Trong I."/>
 385 | <person name="Cota E."/>
 386 | <person name="Demeler B."/>
 387 | <person name="Marchant J."/>
 388 | <person name="Thomas W.E."/>
 389 | <person name="Stenkamp R.E."/>
 390 | <person name="Moseley S.L."/>
 391 | <person name="Matthews S."/>
 392 | </authorList>
 393 | <dbReference type="PubMed" id="18086185"/>
 394 | <dbReference type="DOI" id="10.1111/j.1365-2958.2007.06054.x"/>
 395 | </citation>
 396 | <scope>X-RAY CRYSTALLOGRAPHY (1.95 ANGSTROMS) OF 34-144 IN COMPLEX WITH E.COLI DR ADHESIN</scope>
 397 | <scope>FUNCTION (MICROBIAL INFECTION)</scope>
 398 | <scope>SUBUNIT</scope>
 399 | <scope>SUBCELLULAR LOCATION</scope>
 400 | <scope>MUTAGENESIS OF PHE-63; SER-66; VAL-73; ASP-74; GLN-78; ILE-125; LEU-129 AND GLU-133</scope>
 401 | </reference>
 402 | <comment type="function">
 403 | <text evidence="7 8 15">Cell surface glycoprotein that plays a role in cell adhesion, intracellular signaling and tumor progression (PubMed:2803308, PubMed:10910050, PubMed:10864933). Mediates homophilic and heterophilic cell adhesion with other carcinoembryonic antigen-related cell adhesion molecules, such as CEACAM6 (PubMed:2803308). Plays a role as an oncogene by promoting tumor progression; induces resistance to anoikis of colorectal carcinoma cells (PubMed:10910050).</text>
 404 | </comment>
 405 | <comment type="function">
 406 | <text evidence="10">(Microbial infection) Receptor for E.coli Dr adhesins. Binding of E.coli Dr adhesins leads to dissociation of the homodimer.</text>
 407 | </comment>
 408 | <comment type="subunit">
 409 | <text evidence="10 14">Homodimer.</text>
 410 | </comment>
 411 | <comment type="interaction">
 412 | <interactant intactId="EBI-3914938">
 413 | <id>P06731</id>
 414 | </interactant>
 415 | <interactant intactId="EBI-3914938">
 416 | <id>P06731</id>
 417 | <label>CEACAM5</label>
 418 | </interactant>
 419 | <organismsDiffer>false</organismsDiffer>
 420 | <experiments>6</experiments>
 421 | </comment>
 422 | <comment type="interaction">
 423 | <interactant intactId="EBI-3914938">
 424 | <id>P06731</id>
 425 | </interactant>
 426 | <interactant intactId="EBI-16040613">
 427 | <id>K0BRG7</id>
 428 | <label>S</label>
 429 | </interactant>
 430 | <organismsDiffer>true</organismsDiffer>
 431 | <experiments>4</experiments>
 432 | </comment>
 433 | <comment type="subcellular location">
 434 | <subcellularLocation>
 435 | <location evidence="7 10 12">Cell membrane</location>
 436 | <topology evidence="7 10 12">Lipid-anchor</topology>
 437 | <topology evidence="7 10 12">GPI-anchor</topology>
 438 | </subcellularLocation>
 439 | <subcellularLocation>
 440 | <location evidence="6">Apical cell membrane</location>
 441 | </subcellularLocation>
 442 | <subcellularLocation>
 443 | <location evidence="12 15">Cell surface</location>
 444 | </subcellularLocation>
 445 | <text evidence="6">Localized to the apical glycocalyx surface.</text>
 446 | </comment>
 447 | <comment type="alternative products">
 448 | <event type="alternative splicing"/>
 449 | <isoform>
 450 | <id>P06731-1</id>
 451 | <name>1</name>
 452 | <sequence type="displayed"/>
 453 | </isoform>
 454 | <isoform>
 455 | <id>P06731-2</id>
 456 | <name>2</name>
 457 | <sequence type="described" ref="VSP_053414"/>
 458 | </isoform>
 459 | </comment>
 460 | <comment type="tissue specificity">
 461 | <text evidence="6">Expressed in columnar epithelial and goblet cells of the colon (at protein level) (PubMed:10436421). Found in adenocarcinomas of endodermally derived digestive system epithelium and fetal colon.</text>
 462 | </comment>
 463 | <comment type="PTM">
 464 | <text>Complex immunoreactive glycoprotein with a MW of 180 kDa comprising 60% carbohydrate.</text>
 465 | </comment>
 466 | <comment type="similarity">
 467 | <text evidence="21">Belongs to the immunoglobulin superfamily. CEA family.</text>
 468 | </comment>
 469 | <comment type="sequence caution" evidence="21">
 470 | <conflict type="frameshift">
 471 | <sequence resource="EMBL-CDS" id="AAA62835" version="1"/>
 472 | </conflict>
 473 | </comment>
 474 | <dbReference type="EMBL" id="M17303">
 475 | <property type="protein sequence ID" value="AAB59513.1"/>
 476 | <property type="molecule type" value="Genomic_DNA"/>
 477 | </dbReference>
 478 | <dbReference type="EMBL" id="M29540">
 479 | <property type="protein sequence ID" value="AAA51967.1"/>
 480 | <property type="molecule type" value="mRNA"/>
 481 | </dbReference>
 482 | <dbReference type="EMBL" id="M59262">
 483 | <property type="protein sequence ID" value="AAA62835.1"/>
 484 | <property type="status" value="ALT_FRAME"/>
 485 | <property type="molecule type" value="Genomic_DNA"/>
 486 | </dbReference>
 487 | <dbReference type="EMBL" id="M59255">
 488 | <property type="protein sequence ID" value="AAA62835.1"/>
 489 | <property type="status" value="JOINED"/>
 490 | <property type="molecule type" value="Genomic_DNA"/>
 491 | </dbReference>
 492 | <dbReference type="EMBL" id="M59256">
 493 | <property type="protein sequence ID" value="AAA62835.1"/>
 494 | <property type="status" value="JOINED"/>
 495 | <property type="molecule type" value="Genomic_DNA"/>
 496 | </dbReference>
 497 | <dbReference type="EMBL" id="M59257">
 498 | <property type="protein sequence ID" value="AAA62835.1"/>
 499 | <property type="status" value="JOINED"/>
 500 | <property type="molecule type" value="Genomic_DNA"/>
 501 | </dbReference>
 502 | <dbReference type="EMBL" id="M59258">
 503 | <property type="protein sequence ID" value="AAA62835.1"/>
 504 | <property type="status" value="JOINED"/>
 505 | <property type="molecule type" value="Genomic_DNA"/>
 506 | </dbReference>
 507 | <dbReference type="EMBL" id="M59259">
 508 | <property type="protein sequence ID" value="AAA62835.1"/>
 509 | <property type="status" value="JOINED"/>
 510 | <property type="molecule type" value="Genomic_DNA"/>
 511 | </dbReference>
 512 | <dbReference type="EMBL" id="M59260">
 513 | <property type="protein sequence ID" value="AAA62835.1"/>
 514 | <property type="status" value="JOINED"/>
 515 | <property type="molecule type" value="Genomic_DNA"/>
 516 | </dbReference>
 517 | <dbReference type="EMBL" id="M59261">
 518 | <property type="protein sequence ID" value="AAA62835.1"/>
 519 | <property type="status" value="JOINED"/>
 520 | <property type="molecule type" value="Genomic_DNA"/>
 521 | </dbReference>
 522 | <dbReference type="EMBL" id="M59709">
 523 | <property type="status" value="NOT_ANNOTATED_CDS"/>
 524 | <property type="molecule type" value="Genomic_DNA"/>
 525 | </dbReference>
 526 | <dbReference type="EMBL" id="M59710">
 527 | <property type="status" value="NOT_ANNOTATED_CDS"/>
 528 | <property type="molecule type" value="Genomic_DNA"/>
 529 | </dbReference>
 530 | <dbReference type="EMBL" id="AC008999">
 531 | <property type="status" value="NOT_ANNOTATED_CDS"/>
 532 | <property type="molecule type" value="Genomic_DNA"/>
 533 | </dbReference>
 534 | <dbReference type="EMBL" id="X16455">
 535 | <property type="protein sequence ID" value="CAA34474.1"/>
 536 | <property type="molecule type" value="mRNA"/>
 537 | </dbReference>
 538 | <dbReference type="EMBL" id="M15042">
 539 | <property type="protein sequence ID" value="AAA51963.1"/>
 540 | <property type="molecule type" value="mRNA"/>
 541 | </dbReference>
 542 | <dbReference type="EMBL" id="M16234">
 543 | <property type="protein sequence ID" value="AAA51972.1"/>
 544 | <property type="molecule type" value="mRNA"/>
 545 | </dbReference>
 546 | <dbReference type="CCDS" id="CCDS12584.1">
 547 | <molecule id="P06731-1"/>
 548 | </dbReference>
 549 | <dbReference type="CCDS" id="CCDS77302.1">
 550 | <molecule id="P06731-2"/>
 551 | </dbReference>
 552 | <dbReference type="PIR" id="A36319">
 553 | <property type="entry name" value="A36319"/>
 554 | </dbReference>
 555 | <dbReference type="RefSeq" id="NP_001278413.1">
 556 | <property type="nucleotide sequence ID" value="NM_001291484.2"/>
 557 | </dbReference>
 558 | <dbReference type="RefSeq" id="NP_001295327.1">
 559 | <property type="nucleotide sequence ID" value="NM_001308398.1"/>
 560 | </dbReference>
 561 | <dbReference type="RefSeq" id="NP_004354.3">
 562 | <property type="nucleotide sequence ID" value="NM_004363.5"/>
 563 | </dbReference>
 564 | <dbReference type="PDB" id="1E07">
 565 | <property type="method" value="X-ray"/>
 566 | <property type="chains" value="A=35-676"/>
 567 | </dbReference>
 568 | <dbReference type="PDB" id="2QSQ">
 569 | <property type="method" value="X-ray"/>
 570 | <property type="resolution" value="1.95"/>
 571 | <property type="chains" value="A/B=34-144"/>
 572 | </dbReference>
 573 | <dbReference type="PDB" id="2QST">
 574 | <property type="method" value="X-ray"/>
 575 | <property type="resolution" value="2.90"/>
 576 | <property type="chains" value="A/B=34-144"/>
 577 | </dbReference>
 578 | <dbReference type="PDB" id="2VER">
 579 | <property type="method" value="NMR"/>
 580 | <property type="chains" value="N=35-144"/>
 581 | </dbReference>
 582 | <dbReference type="PDBsum" id="1E07"/>
 583 | <dbReference type="PDBsum" id="2QSQ"/>
 584 | <dbReference type="PDBsum" id="2QST"/>
 585 | <dbReference type="PDBsum" id="2VER"/>
 586 | <dbReference type="SMR" id="P06731"/>
 587 | <dbReference type="BioGRID" id="107478">
 588 | <property type="interactions" value="7"/>
 589 | </dbReference>
 590 | <dbReference type="DIP" id="DIP-57769N"/>
 591 | <dbReference type="IntAct" id="P06731">
 592 | <property type="interactions" value="8"/>
 593 | </dbReference>
 594 | <dbReference type="MINT" id="P06731"/>
 595 | <dbReference type="STRING" id="9606.ENSP00000221992"/>
 596 | <dbReference type="ChEMBL" id="CHEMBL3712881"/>
 597 | <dbReference type="DrugBank" id="DB05097">
 598 | <property type="generic name" value="Labetuzumab"/>
 599 | </dbReference>
 600 | <dbReference type="DrugBank" id="DB08217">
 601 | <property type="generic name" value="S-[(1-Hydroxy-2,2,5,5-tetramethyl-2,5-dihydro-1H-pyrrol-3-yl)methyl] methanesulfonothioate"/>
 602 | </dbReference>
 603 | <dbReference type="GlyConnect" id="1070"/>
 604 | <dbReference type="GlyConnect" id="1071"/>
 605 | <dbReference type="iPTMnet" id="P06731"/>
 606 | <dbReference type="PhosphoSitePlus" id="P06731"/>
 607 | <dbReference type="BioMuta" id="CEACAM5"/>
 608 | <dbReference type="DMDM" id="317373456"/>
 609 | <dbReference type="jPOST" id="P06731"/>
 610 | <dbReference type="MassIVE" id="P06731"/>
 611 | <dbReference type="MaxQB" id="P06731"/>
 612 | <dbReference type="PaxDb" id="P06731"/>
 613 | <dbReference type="PeptideAtlas" id="P06731"/>
 614 | <dbReference type="PRIDE" id="P06731"/>
 615 | <dbReference type="ProteomicsDB" id="46254"/>
 616 | <dbReference type="ProteomicsDB" id="51917">
 617 | <molecule id="P06731-1"/>
 618 | </dbReference>
 619 | <dbReference type="ABCD" id="P06731">
 620 | <property type="antibodies" value="39 sequenced antibodies"/>
 621 | </dbReference>
 622 | <dbReference type="Antibodypedia" id="3500">
 623 | <property type="antibodies" value="3158 antibodies"/>
 624 | </dbReference>
 625 | <dbReference type="DNASU" id="1048"/>
 626 | <dbReference type="Ensembl" id="ENST00000221992">
 627 | <property type="protein sequence ID" value="ENSP00000221992"/>
 628 | <property type="gene ID" value="ENSG00000105388"/>
 629 | </dbReference>
 630 | <dbReference type="Ensembl" id="ENST00000398599">
 631 | <property type="protein sequence ID" value="ENSP00000381600"/>
 632 | <property type="gene ID" value="ENSG00000105388"/>
 633 | </dbReference>
 634 | <dbReference type="Ensembl" id="ENST00000405816">
 635 | <property type="protein sequence ID" value="ENSP00000385072"/>
 636 | <property type="gene ID" value="ENSG00000105388"/>
 637 | </dbReference>
 638 | <dbReference type="GeneID" id="1048"/>
 639 | <dbReference type="KEGG" id="hsa:1048"/>
 640 | <dbReference type="UCSC" id="uc002orj.2">
 641 | <molecule id="P06731-1"/>
 642 | <property type="organism name" value="human"/>
 643 | </dbReference>
 644 | <dbReference type="CTD" id="1048"/>
 645 | <dbReference type="DisGeNET" id="1048"/>
 646 | <dbReference type="EuPathDB" id="HostDB:ENSG00000105388.15"/>
 647 | <dbReference type="GeneCards" id="CEACAM5"/>
 648 | <dbReference type="HGNC" id="HGNC:1817">
 649 | <property type="gene designation" value="CEACAM5"/>
 650 | </dbReference>
 651 | <dbReference type="HPA" id="ENSG00000105388">
 652 | <property type="expression patterns" value="Tissue enhanced (esophagus, intestine, lymphoid tissue)"/>
 653 | </dbReference>
 654 | <dbReference type="MIM" id="114890">
 655 | <property type="type" value="gene"/>
 656 | </dbReference>
 657 | <dbReference type="neXtProt" id="NX_P06731"/>
 658 | <dbReference type="PharmGKB" id="PA26361"/>
 659 | <dbReference type="eggNOG" id="ENOG410IFE1">
 660 | <property type="taxonomic scope" value="Eukaryota"/>
 661 | </dbReference>
 662 | <dbReference type="eggNOG" id="ENOG410YR1P">
 663 | <property type="taxonomic scope" value="LUCA"/>
 664 | </dbReference>
 665 | <dbReference type="HOGENOM" id="CLU_024555_7_0_1"/>
 666 | <dbReference type="InParanoid" id="P06731"/>
 667 | <dbReference type="KO" id="K06499"/>
 668 | <dbReference type="OrthoDB" id="998214at2759"/>
 669 | <dbReference type="PhylomeDB" id="P06731"/>
 670 | <dbReference type="TreeFam" id="TF336859"/>
 671 | <dbReference type="Reactome" id="R-HSA-163125">
 672 | <property type="pathway name" value="Post-translational modification: synthesis of GPI-anchored proteins"/>
 673 | </dbReference>
 674 | <dbReference type="Reactome" id="R-HSA-202733">
 675 | <property type="pathway name" value="Cell surface interactions at the vascular wall"/>
 676 | </dbReference>
 677 | <dbReference type="BioGRID-ORCS" id="1048">
 678 | <property type="hits" value="13 hits in 786 CRISPR screens"/>
 679 | </dbReference>
 680 | <dbReference type="ChiTaRS" id="CEACAM5">
 681 | <property type="organism name" value="human"/>
 682 | </dbReference>
 683 | <dbReference type="EvolutionaryTrace" id="P06731"/>
 684 | <dbReference type="GeneWiki" id="CEACAM5"/>
 685 | <dbReference type="GenomeRNAi" id="1048"/>
 686 | <dbReference type="Pharos" id="P06731">
 687 | <property type="development level" value="Tbio"/>
 688 | </dbReference>
 689 | <dbReference type="PRO" id="PR:P06731"/>
 690 | <dbReference type="Proteomes" id="UP000005640">
 691 | <property type="component" value="Chromosome 19"/>
 692 | </dbReference>
 693 | <dbReference type="RNAct" id="P06731">
 694 | <property type="molecule type" value="protein"/>
 695 | </dbReference>
 696 | <dbReference type="Bgee" id="ENSG00000105388">
 697 | <property type="expression patterns" value="Expressed in colonic mucosa and 130 other tissues"/>
 698 | </dbReference>
 699 | <dbReference type="ExpressionAtlas" id="P06731">
 700 | <property type="expression patterns" value="baseline and differential"/>
 701 | </dbReference>
 702 | <dbReference type="Genevisible" id="P06731">
 703 | <property type="organism ID" value="HS"/>
 704 | </dbReference>
 705 | <dbReference type="GO" id="GO:0031225">
 706 | <property type="term" value="C:anchored component of membrane"/>
 707 | <property type="evidence" value="ECO:0000314"/>
 708 | <property type="project" value="UniProtKB"/>
 709 | </dbReference>
 710 | <dbReference type="GO" id="GO:0016324">
 711 | <property type="term" value="C:apical plasma membrane"/>
 712 | <property type="evidence" value="ECO:0000314"/>
 713 | <property type="project" value="UniProtKB"/>
 714 | </dbReference>
 715 | <dbReference type="GO" id="GO:0016323">
 716 | <property type="term" value="C:basolateral plasma membrane"/>
 717 | <property type="evidence" value="ECO:0000314"/>
 718 | <property type="project" value="UniProtKB"/>
 719 | </dbReference>
 720 | <dbReference type="GO" id="GO:0009986">
 721 | <property type="term" value="C:cell surface"/>
 722 | <property type="evidence" value="ECO:0000314"/>
 723 | <property type="project" value="UniProtKB"/>
 724 | </dbReference>
 725 | <dbReference type="GO" id="GO:0070062">
 726 | <property type="term" value="C:extracellular exosome"/>
 727 | <property type="evidence" value="ECO:0007005"/>
 728 | <property type="project" value="UniProtKB"/>
 729 | </dbReference>
 730 | <dbReference type="GO" id="GO:0005576">
 731 | <property type="term" value="C:extracellular region"/>
 732 | <property type="evidence" value="ECO:0000304"/>
 733 | <property type="project" value="Reactome"/>
 734 | </dbReference>
 735 | <dbReference type="GO" id="GO:0071575">
 736 | <property type="term" value="C:integral component of external side of plasma membrane"/>
 737 | <property type="evidence" value="ECO:0000314"/>
 738 | <property type="project" value="UniProtKB"/>
 739 | </dbReference>
 740 | <dbReference type="GO" id="GO:0005887">
 741 | <property type="term" value="C:integral component of plasma membrane"/>
 742 | <property type="evidence" value="ECO:0000314"/>
 743 | <property type="project" value="UniProtKB"/>
 744 | </dbReference>
 745 | <dbReference type="GO" id="GO:0005886">
 746 | <property type="term" value="C:plasma membrane"/>
 747 | <property type="evidence" value="ECO:0000314"/>
 748 | <property type="project" value="HPA"/>
 749 | </dbReference>
 750 | <dbReference type="GO" id="GO:0034235">
 751 | <property type="term" value="F:GPI anchor binding"/>
 752 | <property type="evidence" value="ECO:0000315"/>
 753 | <property type="project" value="UniProtKB"/>
 754 | </dbReference>
 755 | <dbReference type="GO" id="GO:0042802">
 756 | <property type="term" value="F:identical protein binding"/>
 757 | <property type="evidence" value="ECO:0000314"/>
 758 | <property type="project" value="UniProtKB"/>
 759 | </dbReference>
 760 | <dbReference type="GO" id="GO:0042803">
 761 | <property type="term" value="F:protein homodimerization activity"/>
 762 | <property type="evidence" value="ECO:0000314"/>
 763 | <property type="project" value="UniProtKB"/>
 764 | </dbReference>
 765 | <dbReference type="GO" id="GO:0006915">
 766 | <property type="term" value="P:apoptotic process"/>
 767 | <property type="evidence" value="ECO:0000501"/>
 768 | <property type="project" value="UniProtKB-KW"/>
 769 | </dbReference>
 770 | <dbReference type="GO" id="GO:0007157">
 771 | <property type="term" value="P:heterophilic cell-cell adhesion via plasma membrane cell adhesion molecules"/>
 772 | <property type="evidence" value="ECO:0000315"/>
 773 | <property type="project" value="UniProtKB"/>
 774 | </dbReference>
 775 | <dbReference type="GO" id="GO:0007156">
 776 | <property type="term" value="P:homophilic cell adhesion via plasma membrane adhesion molecules"/>
 777 | <property type="evidence" value="ECO:0000315"/>
 778 | <property type="project" value="UniProtKB"/>
 779 | </dbReference>
 780 | <dbReference type="GO" id="GO:0034109">
 781 | <property type="term" value="P:homotypic cell-cell adhesion"/>
 782 | <property type="evidence" value="ECO:0000314"/>
 783 | <property type="project" value="UniProtKB"/>
 784 | </dbReference>
 785 | <dbReference type="GO" id="GO:0050900">
 786 | <property type="term" value="P:leukocyte migration"/>
 787 | <property type="evidence" value="ECO:0000304"/>
 788 | <property type="project" value="Reactome"/>
 789 | </dbReference>
 790 | <dbReference type="GO" id="GO:2000811">
 791 | <property type="term" value="P:negative regulation of anoikis"/>
 792 | <property type="evidence" value="ECO:0000314"/>
 793 | <property type="project" value="UniProtKB"/>
 794 | </dbReference>
 795 | <dbReference type="GO" id="GO:0043066">
 796 | <property type="term" value="P:negative regulation of apoptotic process"/>
 797 | <property type="evidence" value="ECO:0000314"/>
 798 | <property type="project" value="UniProtKB"/>
 799 | </dbReference>
 800 | <dbReference type="GO" id="GO:0010832">
 801 | <property type="term" value="P:negative regulation of myotube differentiation"/>
 802 | <property type="evidence" value="ECO:0000314"/>
 803 | <property type="project" value="UniProtKB"/>
 804 | </dbReference>
 805 | <dbReference type="Gene3D" id="2.60.40.10">
 806 | <property type="match status" value="7"/>
 807 | </dbReference>
 808 | <dbReference type="InterPro" id="IPR007110">
 809 | <property type="entry name" value="Ig-like_dom"/>
 810 | </dbReference>
 811 | <dbReference type="InterPro" id="IPR036179">
 812 | <property type="entry name" value="Ig-like_dom_sf"/>
 813 | </dbReference>
 814 | <dbReference type="InterPro" id="IPR013783">
 815 | <property type="entry name" value="Ig-like_fold"/>
 816 | </dbReference>
 817 | <dbReference type="InterPro" id="IPR003599">
 818 | <property type="entry name" value="Ig_sub"/>
 819 | </dbReference>
 820 | <dbReference type="InterPro" id="IPR003598">
 821 | <property type="entry name" value="Ig_sub2"/>
 822 | </dbReference>
 823 | <dbReference type="InterPro" id="IPR013106">
 824 | <property type="entry name" value="Ig_V-set"/>
 825 | </dbReference>
 826 | <dbReference type="Pfam" id="PF13895">
 827 | <property type="entry name" value="Ig_2"/>
 828 | <property type="match status" value="3"/>
 829 | </dbReference>
 830 | <dbReference type="Pfam" id="PF07686">
 831 | <property type="entry name" value="V-set"/>
 832 | <property type="match status" value="1"/>
 833 | </dbReference>
 834 | <dbReference type="SMART" id="SM00409">
 835 | <property type="entry name" value="IG"/>
 836 | <property type="match status" value="7"/>
 837 | </dbReference>
 838 | <dbReference type="SMART" id="SM00408">
 839 | <property type="entry name" value="IGc2"/>
 840 | <property type="match status" value="6"/>
 841 | </dbReference>
 842 | <dbReference type="SUPFAM" id="SSF48726">
 843 | <property type="entry name" value="SSF48726"/>
 844 | <property type="match status" value="7"/>
 845 | </dbReference>
 846 | <dbReference type="PROSITE" id="PS50835">
 847 | <property type="entry name" value="IG_LIKE"/>
 848 | <property type="match status" value="6"/>
 849 | </dbReference>
 850 | <proteinExistence type="evidence at protein level"/>
 851 | <keyword id="KW-0002">3D-structure</keyword>
 852 | <keyword id="KW-0025">Alternative splicing</keyword>
 853 | <keyword id="KW-0053">Apoptosis</keyword>
 854 | <keyword id="KW-0130">Cell adhesion</keyword>
 855 | <keyword id="KW-1003">Cell membrane</keyword>
 856 | <keyword id="KW-1015">Disulfide bond</keyword>
 857 | <keyword id="KW-0325">Glycoprotein</keyword>
 858 | <keyword id="KW-0336">GPI-anchor</keyword>
 859 | <keyword id="KW-0393">Immunoglobulin domain</keyword>
 860 | <keyword id="KW-0449">Lipoprotein</keyword>
 861 | <keyword id="KW-0472">Membrane</keyword>
 862 | <keyword id="KW-0553">Oncogene</keyword>
 863 | <keyword id="KW-0621">Polymorphism</keyword>
 864 | <keyword id="KW-1185">Reference proteome</keyword>
 865 | <keyword id="KW-0677">Repeat</keyword>
 866 | <keyword id="KW-0732">Signal</keyword>
 867 | <feature type="signal peptide">
 868 | <location>
 869 | <begin position="1"/>
 870 | <end position="34"/>
 871 | </location>
 872 | </feature>
 873 | <feature type="chain" description="Carcinoembryonic antigen-related cell adhesion molecule 5" id="PRO_0000014566">
 874 | <location>
 875 | <begin position="35"/>
 876 | <end position="685"/>
 877 | </location>
 878 | </feature>
 879 | <feature type="propeptide" description="Removed in mature form" id="PRO_0000014567" evidence="3">
 880 | <location>
 881 | <begin position="686"/>
 882 | <end position="702"/>
 883 | </location>
 884 | </feature>
 885 | <feature type="domain" description="Ig-like V-type" evidence="2">
 886 | <location>
 887 | <begin position="35"/>
 888 | <end position="144"/>
 889 | </location>
 890 | </feature>
 891 | <feature type="domain" description="Ig-like C2-type 1" evidence="4">
 892 | <location>
 893 | <begin position="145"/>
 894 | <end position="232"/>
 895 | </location>
 896 | </feature>
 897 | <feature type="domain" description="Ig-like C2-type 2" evidence="4">
 898 | <location>
 899 | <begin position="240"/>
 900 | <end position="315"/>
 901 | </location>
 902 | </feature>
 903 | <feature type="domain" description="Ig-like C2-type 3" evidence="4">
 904 | <location>
 905 | <begin position="323"/>
 906 | <end position="410"/>
 907 | </location>
 908 | </feature>
 909 | <feature type="domain" description="Ig-like C2-type 4" evidence="4">
 910 | <location>
 911 | <begin position="418"/>
 912 | <end position="495"/>
 913 | </location>
 914 | </feature>
 915 | <feature type="domain" description="Ig-like C2-type 5" evidence="4">
 916 | <location>
 917 | <begin position="501"/>
 918 | <end position="588"/>
 919 | </location>
 920 | </feature>
 921 | <feature type="domain" description="Ig-like C2-type 6" evidence="4">
 922 | <location>
 923 | <begin position="593"/>
 924 | <end position="675"/>
 925 | </location>
 926 | </feature>
 927 | <feature type="lipid moiety-binding region" description="GPI-anchor amidated alanine" evidence="12">
 928 | <location>
 929 | <position position="685"/>
 930 | </location>
 931 | </feature>
 932 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
 933 | <location>
 934 | <position position="104"/>
 935 | </location>
 936 | </feature>
 937 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
 938 | <location>
 939 | <position position="115"/>
 940 | </location>
 941 | </feature>
 942 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
 943 | <location>
 944 | <position position="152"/>
 945 | </location>
 946 | </feature>
 947 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
 948 | <location>
 949 | <position position="182"/>
 950 | </location>
 951 | </feature>
 952 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
 953 | <location>
 954 | <position position="197"/>
 955 | </location>
 956 | </feature>
 957 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
 958 | <location>
 959 | <position position="204"/>
 960 | </location>
 961 | </feature>
 962 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
 963 | <location>
 964 | <position position="208"/>
 965 | </location>
 966 | </feature>
 967 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5 11">
 968 | <location>
 969 | <position position="246"/>
 970 | </location>
 971 | </feature>
 972 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
 973 | <location>
 974 | <position position="256"/>
 975 | </location>
 976 | </feature>
 977 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
 978 | <location>
 979 | <position position="274"/>
 980 | </location>
 981 | </feature>
 982 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
 983 | <location>
 984 | <position position="288"/>
 985 | </location>
 986 | </feature>
 987 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
 988 | <location>
 989 | <position position="292"/>
 990 | </location>
 991 | </feature>
 992 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
 993 | <location>
 994 | <position position="309"/>
 995 | </location>
 996 | </feature>
 997 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
 998 | <location>
 999 | <position position="330"/>
1000 | </location>
1001 | </feature>
1002 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
1003 | <location>
1004 | <position position="351"/>
1005 | </location>
1006 | </feature>
1007 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
1008 | <location>
1009 | <position position="360"/>
1010 | </location>
1011 | </feature>
1012 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
1013 | <location>
1014 | <position position="375"/>
1015 | </location>
1016 | </feature>
1017 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
1018 | <location>
1019 | <position position="432"/>
1020 | </location>
1021 | </feature>
1022 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
1023 | <location>
1024 | <position position="466"/>
1025 | </location>
1026 | </feature>
1027 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
1028 | <location>
1029 | <position position="480"/>
1030 | </location>
1031 | </feature>
1032 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
1033 | <location>
1034 | <position position="508"/>
1035 | </location>
1036 | </feature>
1037 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
1038 | <location>
1039 | <position position="529"/>
1040 | </location>
1041 | </feature>
1042 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
1043 | <location>
1044 | <position position="553"/>
1045 | </location>
1046 | </feature>
1047 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5 9">
1048 | <location>
1049 | <position position="560"/>
1050 | </location>
1051 | </feature>
1052 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
1053 | <location>
1054 | <position position="580"/>
1055 | </location>
1056 | </feature>
1057 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
1058 | <location>
1059 | <position position="612"/>
1060 | </location>
1061 | </feature>
1062 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
1063 | <location>
1064 | <position position="650"/>
1065 | </location>
1066 | </feature>
1067 | <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine" evidence="5">
1068 | <location>
1069 | <position position="665"/>
1070 | </location>
1071 | </feature>
1072 | <feature type="disulfide bond" evidence="4">
1073 | <location>
1074 | <begin position="167"/>
1075 | <end position="215"/>
1076 | </location>
1077 | </feature>
1078 | <feature type="disulfide bond" evidence="4">
1079 | <location>
1080 | <begin position="259"/>
1081 | <end position="299"/>
1082 | </location>
1083 | </feature>
1084 | <feature type="disulfide bond" evidence="4">
1085 | <location>
1086 | <begin position="345"/>
1087 | <end position="393"/>
1088 | </location>
1089 | </feature>
1090 | <feature type="disulfide bond" evidence="4">
1091 | <location>
1092 | <begin position="437"/>
1093 | <end position="477"/>
1094 | </location>
1095 | </feature>
1096 | <feature type="disulfide bond" evidence="4">
1097 | <location>
1098 | <begin position="523"/>
1099 | <end position="571"/>
1100 | </location>
1101 | </feature>
1102 | <feature type="disulfide bond" evidence="4">
1103 | <location>
1104 | <begin position="615"/>
1105 | <end position="655"/>
1106 | </location>
1107 | </feature>
1108 | <feature type="splice variant" description="In isoform 2." id="VSP_053414" evidence="20">
1109 | <location>
1110 | <position position="320"/>
1111 | </location>
1112 | </feature>
1113 | <feature type="sequence variant" description="In dbSNP:rs12971352." id="VAR_061310">
1114 | <original>I</original>
1115 | <variation>V</variation>
1116 | <location>
1117 | <position position="80"/>
1118 | </location>
1119 | </feature>
1120 | <feature type="sequence variant" description="In dbSNP:rs28683503." id="VAR_061311">
1121 | <original>V</original>
1122 | <variation>A</variation>
1123 | <location>
1124 | <position position="83"/>
1125 | </location>
1126 | </feature>
1127 | <feature type="sequence variant" description="In dbSNP:rs3815780." id="VAR_056028">
1128 | <original>Q</original>
1129 | <variation>P</variation>
1130 | <location>
1131 | <position position="137"/>
1132 | </location>
1133 | </feature>
1134 | <feature type="sequence variant" description="In dbSNP:rs10407503." id="VAR_031091">
1135 | <original>A</original>
1136 | <variation>D</variation>
1137 | <location>
1138 | <position position="340"/>
1139 | </location>
1140 | </feature>
1141 | <feature type="sequence variant" description="In dbSNP:rs7249230." id="VAR_024493" evidence="13 16 17 18 19">
1142 | <original>K</original>
1143 | <variation>E</variation>
1144 | <location>
1145 | <position position="398"/>
1146 | </location>
1147 | </feature>
1148 | <feature type="sequence variant" description="In dbSNP:rs10423171." id="VAR_031092">
1149 | <original>R</original>
1150 | <variation>S</variation>
1151 | <location>
1152 | <position position="664"/>
1153 | </location>
1154 | </feature>
1155 | <feature type="sequence variant" description="In dbSNP:rs9621." id="VAR_056029">
1156 | <original>G</original>
1157 | <variation>R</variation>
1158 | <location>
1159 | <position position="678"/>
1160 | </location>
1161 | </feature>
1162 | <feature type="mutagenesis site" description="No effect on dimerization. Reduced affinity for E.coli Dr adhesins." evidence="10">
1163 | <original>F</original>
1164 | <variation>I</variation>
1165 | <location>
1166 | <position position="63"/>
1167 | </location>
1168 | </feature>
1169 | <feature type="mutagenesis site" description="Abolishes dimerization. Reduced affinity for E.coli Dr adhesins." evidence="10">
1170 | <original>F</original>
1171 | <variation>R</variation>
1172 | <location>
1173 | <position position="63"/>
1174 | </location>
1175 | </feature>
1176 | <feature type="mutagenesis site" description="Abolishes dimerization." evidence="7 10">
1177 | <original>S</original>
1178 | <variation>N</variation>
1179 | <location>
1180 | <position position="66"/>
1181 | </location>
1182 | </feature>
1183 | <feature type="mutagenesis site" description="Abolishes dimerization." evidence="7">
1184 | <original>Y</original>
1185 | <variation>A</variation>
1186 | <location>
1187 | <position position="68"/>
1188 | </location>
1189 | </feature>
1190 | <feature type="mutagenesis site" description="No effect on dimerization." evidence="7">
1191 | <original>Y</original>
1192 | <variation>F</variation>
1193 | <location>
1194 | <position position="68"/>
1195 | </location>
1196 | </feature>
1197 | <feature type="mutagenesis site" description="Abolishes dimerization." evidence="7">
1198 | <original>K</original>
1199 | <variation>A</variation>
1200 | <location>
1201 | <position position="69"/>
1202 | </location>
1203 | </feature>
1204 | <feature type="mutagenesis site" description="Abolishes dimerization." evidence="10">
1205 | <original>V</original>
1206 | <variation>A</variation>
1207 | <location>
1208 | <position position="73"/>
1209 | </location>
1210 | </feature>
1211 | <feature type="mutagenesis site" description="No effect on dimerization." evidence="10">
1212 | <original>D</original>
1213 | <variation>A</variation>
1214 | <location>
1215 | <position position="74"/>
1216 | </location>
1217 | </feature>
1218 | <feature type="mutagenesis site" description="Abolishes dimerization." evidence="10">
1219 | <original>D</original>
1220 | <variation>L</variation>
1221 | <variation>R</variation>
1222 | <location>
1223 | <position position="74"/>
1224 | </location>
1225 | </feature>
1226 | <feature type="mutagenesis site" description="Abolishes dimerization. Reduced affinity for E.coli Dr adhesins." evidence="7 10">
1227 | <original>Q</original>
1228 | <variation>L</variation>
1229 | <variation>R</variation>
1230 | <location>
1231 | <position position="78"/>
1232 | </location>
1233 | </feature>
1234 | <feature type="mutagenesis site" description="Abolishes dimerization. Reduced affinity for E.coli Dr adhesins." evidence="10">
1235 | <original>I</original>
1236 | <variation>A</variation>
1237 | <location>
1238 | <position position="125"/>
1239 | </location>
1240 | </feature>
1241 | <feature type="mutagenesis site" description="No effect on dimerization. Reduced affinity for E.coli Dr adhesins." evidence="10">
1242 | <original>L</original>
1243 | <variation>A</variation>
1244 | <variation>C</variation>
1245 | <location>
1246 | <position position="129"/>
1247 | </location>
1248 | </feature>
1249 | <feature type="mutagenesis site" description="Abolishes dimerization. Reduced affinity for E.coli Dr adhesins." evidence="10">
1250 | <original>L</original>
1251 | <variation>S</variation>
1252 | <location>
1253 | <position position="129"/>
1254 | </location>
1255 | </feature>
1256 | <feature type="mutagenesis site" description="Abolishes dimerization." evidence="10">
1257 | <original>E</original>
1258 | <variation>A</variation>
1259 | <location>
1260 | <position position="133"/>
1261 | </location>
1262 | </feature>
1263 | <feature type="sequence conflict" description="In Ref. 3; AAA62835." ref="3" evidence="21">
1264 | <original>F</original>
1265 | <variation>L</variation>
1266 | <location>
1267 | <position position="641"/>
1268 | </location>
1269 | </feature>
1270 | <feature type="sequence conflict" description="In Ref. 3; AAA62835." ref="3" evidence="21">
1271 | <original>T</original>
1272 | <variation>Q</variation>
1273 | <location>
1274 | <position position="646"/>
1275 | </location>
1276 | </feature>
1277 | <feature type="sequence conflict" description="In Ref. 3; AAA62835." ref="3" evidence="21">
1278 | <original>V</original>
1279 | <variation>A</variation>
1280 | <location>
1281 | <position position="689"/>
1282 | </location>
1283 | </feature>
1284 | <feature type="strand" evidence="1">
1285 | <location>
1286 | <begin position="37"/>
1287 | <end position="46"/>
1288 | </location>
1289 | </feature>
1290 | <feature type="strand" evidence="1">
1291 | <location>
1292 | <begin position="51"/>
1293 | <end position="57"/>
1294 | </location>
1295 | </feature>
1296 | <feature type="strand" evidence="1">
1297 | <location>
1298 | <begin position="60"/>
1299 | <end position="72"/>
1300 | </location>
1301 | </feature>
1302 | <feature type="helix" evidence="1">
1303 | <location>
1304 | <begin position="75"/>
1305 | <end position="77"/>
1306 | </location>
1307 | </feature>
1308 | <feature type="strand" evidence="1">
1309 | <location>
1310 | <begin position="78"/>
1311 | <end position="83"/>
1312 | </location>
1313 | </feature>
1314 | <feature type="turn" evidence="1">
1315 | <location>
1316 | <begin position="84"/>
1317 | <end position="87"/>
1318 | </location>
1319 | </feature>
1320 | <feature type="strand" evidence="1">
1321 | <location>
1322 | <begin position="88"/>
1323 | <end position="91"/>
1324 | </location>
1325 | </feature>
1326 | <feature type="strand" evidence="1">
1327 | <location>
1328 | <begin position="99"/>
1329 | <end position="101"/>
1330 | </location>
1331 | </feature>
1332 | <feature type="strand" evidence="1">
1333 | <location>
1334 | <begin position="107"/>
1335 | <end position="109"/>
1336 | </location>
1337 | </feature>
1338 | <feature type="helix" evidence="1">
1339 | <location>
1340 | <begin position="114"/>
1341 | <end position="116"/>
1342 | </location>
1343 | </feature>
1344 | <feature type="strand" evidence="1">
1345 | <location>
1346 | <begin position="118"/>
1347 | <end position="126"/>
1348 | </location>
1349 | </feature>
1350 | <feature type="strand" evidence="1">
1351 | <location>
1352 | <begin position="132"/>
1353 | <end position="141"/>
1354 | </location>
1355 | </feature>
1356 | <evidence key="1" type="ECO:0000244">
1357 | <source>
1358 | <dbReference type="PDB" id="2QSQ"/>
1359 | </source>
1360 | </evidence>
1361 | <evidence key="2" type="ECO:0000250">
1362 | <source>
1363 | <dbReference type="UniProtKB" id="P31997"/>
1364 | </source>
1365 | </evidence>
1366 | <evidence key="3" type="ECO:0000255"/>
1367 | <evidence key="4" type="ECO:0000255">
1368 | <source>
1369 | <dbReference type="PROSITE-ProRule" id="PRU00114"/>
1370 | </source>
1371 | </evidence>
1372 | <evidence key="5" type="ECO:0000255">
1373 | <source>
1374 | <dbReference type="PROSITE-ProRule" id="PRU00498"/>
1375 | </source>
1376 | </evidence>
1377 | <evidence key="6" type="ECO:0000269">
1378 | <source>
1379 | <dbReference type="PubMed" id="10436421"/>
1380 | </source>
1381 | </evidence>
1382 | <evidence key="7" type="ECO:0000269">
1383 | <source>
1384 | <dbReference type="PubMed" id="10864933"/>
1385 | </source>
1386 | </evidence>
1387 | <evidence key="8" type="ECO:0000269">
1388 | <source>
1389 | <dbReference type="PubMed" id="10910050"/>
1390 | </source>
1391 | </evidence>
1392 | <evidence key="9" type="ECO:0000269">
1393 | <source>
1394 | <dbReference type="PubMed" id="16740002"/>
1395 | </source>
1396 | </evidence>
1397 | <evidence key="10" type="ECO:0000269">
1398 | <source>
1399 | <dbReference type="PubMed" id="18086185"/>
1400 | </source>
1401 | </evidence>
1402 | <evidence key="11" type="ECO:0000269">
1403 | <source>
1404 | <dbReference type="PubMed" id="19159218"/>
1405 | </source>
1406 | </evidence>
1407 | <evidence key="12" type="ECO:0000269">
1408 | <source>
1409 | <dbReference type="PubMed" id="2317824"/>
1410 | </source>
1411 | </evidence>
1412 | <evidence key="13" type="ECO:0000269">
1413 | <source>
1414 | <dbReference type="PubMed" id="2342461"/>
1415 | </source>
1416 | </evidence>
1417 | <evidence key="14" type="ECO:0000269">
1418 | <source>
1419 | <dbReference type="PubMed" id="26483485"/>
1420 | </source>
1421 | </evidence>
1422 | <evidence key="15" type="ECO:0000269">
1423 | <source>
1424 | <dbReference type="PubMed" id="2803308"/>
1425 | </source>
1426 | </evidence>
1427 | <evidence key="16" type="ECO:0000269">
1428 | <source>
1429 | <dbReference type="PubMed" id="3033671"/>
1430 | </source>
1431 | </evidence>
1432 | <evidence key="17" type="ECO:0000269">
1433 | <source>
1434 | <dbReference type="PubMed" id="3220478"/>
1435 | </source>
1436 | </evidence>
1437 | <evidence key="18" type="ECO:0000269">
1438 | <source>
1439 | <dbReference type="PubMed" id="3670312"/>
1440 | </source>
1441 | </evidence>
1442 | <evidence key="19" type="ECO:0000269">
1443 | <source>
1444 | <dbReference type="PubMed" id="3814146"/>
1445 | </source>
1446 | </evidence>
1447 | <evidence key="20" type="ECO:0000303">
1448 | <source>
1449 | <dbReference type="PubMed" id="3814146"/>
1450 | </source>
1451 | </evidence>
1452 | <evidence key="21" type="ECO:0000305"/>
1453 | <sequence length="702" mass="76795" checksum="86318E244155DB58" modified="2011-01-11" version="3" precursor="true">MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSISSNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTLTLFNVTRNDTASYKCETQNPVSARRSDSVILNVLYGPDAPTISPLNTSYRSGENLNLSCHAASNPPAQYSWFVNGTFQQSTQELFIPNITVNNSGSYTCQAHNSDTGLNRTTVTTITVYAEPPKPFITSNNSNPVEDEDAVALTCEPEIQNTTYLWWVNNQSLPVSPRLQLSNDNRTLTLLSVTRNDVGPYECGIQNKLSVDHSDPVILNVLYGPDDPTISPSYTYYRPGVNLSLSCHAASNPPAQYSWLIDGNIQQHTQELFISNITEKNSGLYTCQANNSASGHSRTTVKTITVSAELPKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLSNGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISPPDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNNGTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVALI</sequence>
1454 | </entry>
1455 | <copyright>
1456 | Copyrighted by the UniProt Consortium, see https://www.uniprot.org/terms
1457 | Distributed under the Creative Commons Attribution (CC BY 4.0) License
1458 | </copyright>
1459 | </uniprot>


--------------------------------------------------------------------------------
/examples/ig/view:
--------------------------------------------------------------------------------
-1.910422069964390068e-01 -5.326551199259592639e-01 8.244863191311111450e-01
9.099226597463622168e-01 2.189434646111362570e-01 3.522841670538991998e-01
-3.681638186629090370e-01 8.175240438430914081e-01 4.428475206470018910e-01


--------------------------------------------------------------------------------
/ig.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jordisr/cellscape/f1ee7b480440825cea2ddffc4db029bf0d240ea2/ig.png


--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
[build-system]
requires = ["setuptools>=42"]
build-backend = "setuptools.build_meta"


--------------------------------------------------------------------------------
/setup.cfg:
--------------------------------------------------------------------------------
[metadata]
name = cellscape
version = 0.0.0
author = Jordi Silvestre-Ryan
description = Protein structure visualization with vector graphics cartoons
long_description = file: README.md
long_description_content_type = text/markdown
url = https://github.com/jordisr/cellscape
project_urls =
    Bug Tracker = https://github.com/jordisr/cellscape/issues
classifiers =
    Programming Language :: Python :: 3
    Operating System :: OS Independent

[options]
packages = find:
python_requires = >=3.6
install_requires =
    numpy
    scipy
    matplotlib
    shapely<1.8
    biopython>=1.75
    nglview

[options.entry_points]
console_scripts =
    cellscape = cellscape.cli:main


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
from setuptools import setup

if __name__ == '__main__':
    setup()


--------------------------------------------------------------------------------