├── .gitignore ├── 01_graph_from_edge_list.ipynb ├── 02_basic_graph_properties.ipynb ├── 03_graph_exploration_and_sampling.ipynb ├── 04_ML_on_graphs.ipynb ├── 05_Reddit_Pushshift_API.ipynb ├── 06_exploring_wikipedia.ipynb ├── LICENSE ├── README.md ├── data └── taxonomy_small.tsv ├── environment.yml ├── environment_local.yml └── figures ├── MHRWalgo.png ├── explorationstep.png ├── gcb.png ├── gfb.png ├── gff.png ├── gmh.png ├── graphdiameter.png ├── redditneighbors.png ├── spikyballfinal.png └── spikyballproba.png /.gitignore: -------------------------------------------------------------------------------- 1 | .ipynb_checkpoints/ 2 | -------------------------------------------------------------------------------- /01_graph_from_edge_list.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Introduction: build a graph from an edge list\n", 8 | "\n", 9 | "\n", 10 | "* Dataset: [Open Tree of Life](https://tree.opentreeoflife.org)\n", 11 | "* Tools: [pandas](https://pandas.pydata.org), [numpy](http://www.numpy.org), [networkx](https://networkx.github.io)" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## Importing packages" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "By convention, the first lines of code are always about importing the packages we'll use." 
26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": null, 31 | "metadata": {}, 32 | "outputs": [], 33 | "source": [ 34 | "import pandas as pd\n", 35 | "import numpy as np\n", 36 | "import networkx as nx" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "Tutorials on pandas can be found at:\n", 44 | "* https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html\n", 45 | "* https://pandas.pydata.org/pandas-docs/stable/getting_started/tutorials.html\n", 46 | "\n", 47 | "Tutorials on numpy can be found at:\n", 48 | "* https://numpy.org/doc/stable/user/quickstart.html\n", 49 | "* \n", 50 | "* \n", 51 | "\n", 52 | "A tutorial on networkx can be found at:\n", 53 | "* https://networkx.org/documentation/stable/tutorial.html" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "## Import the data\n", 61 | "\n", 62 | "We will work with an excerpt of the Tree of Life, which is provided with this notebook. This dataset is reduced to the first 1000 taxa (starting from the root node). The full version is available here: [Open Tree of Life](https://tree.opentreeoflife.org/about/taxonomy-version/ott3.0).\n", 63 | "\n", 64 | "![Public domain, https://en.wikipedia.org/wiki/File:Phylogenetic_tree.svg](https://upload.wikimedia.org/wikipedia/commons/thumb/7/70/Phylogenetic_tree.svg/800px-Phylogenetic_tree.svg.png)" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": null, 70 | "metadata": {}, 71 | "outputs": [], 72 | "source": [ 73 | "tree_of_life = pd.read_csv('data/taxonomy_small.tsv', sep='\\t\\|\\t?', encoding='utf-8', engine='python')" 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": {}, 79 | "source": [ 80 | "If you do not remember the details of a function, append `?` to its name to display its documentation:" 81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": null, 86 | "metadata": {}, 87 | "outputs": [], 88 | "source": [ 89 | "pd.read_csv?"
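]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The separator given to `read_csv` is a regular expression. As a minimal sketch of what it does, the cell below splits a single made-up row written with the tab-pipe-tab delimiter that the pattern implies (the row is hypothetical, not copied from `taxonomy_small.tsv`). The optional final tab (`\\t?`) lets the pattern also match a closing `\\t|` at the end of a row, which yields an empty last field; this is presumably where the `Unnamed: 7` column dropped below comes from."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"# Hypothetical row in the tab-pipe-tab layout (illustration only, not read from the data file)\n",
"line = '1\\t|\\t0\\t|\\tlife\\t|\\tno rank\\t|'\n",
"fields = re.split(r'\\t\\|\\t?', line)\n",
"print(fields)  # ['1', '0', 'life', 'no rank', '']"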
90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": {}, 95 | "source": [ 96 | "The separator is a regular expression. For more information, see the [regex documentation](https://docs.python.org/3.6/library/re.html)." 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": {}, 102 | "source": [ 103 | "Now, what is the object `tree_of_life`? It is a pandas DataFrame." 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": null, 109 | "metadata": {}, 110 | "outputs": [], 111 | "source": [ 112 | "tree_of_life" 113 | ] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": {}, 118 | "source": [ 119 | "The meaning of each column is described here:\n", 120 | "https://github.com/OpenTreeOfLife/reference-taxonomy/wiki/Interim-taxonomy-file-format" 121 | ] 122 | }, 123 | { 124 | "cell_type": "markdown", 125 | "metadata": {}, 126 | "source": [ 127 | "## Explore the table" 128 | ] 129 | }, 130 | { 131 | "cell_type": "code", 132 | "execution_count": null, 133 | "metadata": {}, 134 | "outputs": [], 135 | "source": [ 136 | "tree_of_life.columns" 137 | ] 138 | }, 139 | { 140 | "cell_type": "markdown", 141 | "metadata": {}, 142 | "source": [ 143 | "Let us drop the columns we will not use." 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": null, 149 | "metadata": {}, 150 | "outputs": [], 151 | "source": [ 152 | "tree_of_life = tree_of_life.drop(columns=['sourceinfo', 'uniqname', 'flags', 'Unnamed: 7'])" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": null, 158 | "metadata": {}, 159 | "outputs": [], 160 | "source": [ 161 | "tree_of_life.head()" 162 | ] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "metadata": {}, 167 | "source": [ 168 | "Pandas inferred the type of the values in each column (`int`, `float`, `string` and `string`; strings are stored with the `object` dtype). The `parent_uid` column holds floating-point values because it contains a missing value (the root node has no parent), converted to `NaN`, which is a float."
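]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal illustration on a toy Series (values made up for the example): a single missing value is enough to turn an integer column into `float64`. If integer ids are preferred, pandas also offers the nullable `Int64` dtype, which can hold missing values. This is only an option; the rest of this notebook keeps the float column."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"ids = pd.Series([1, np.nan, 3])\n",
"print(ids.dtype)  # float64: the NaN forces a floating-point column\n",
"\n",
"# Nullable integer dtype: the missing value becomes <NA>\n",
"ids_int = ids.astype('Int64')\n",
"print(ids_int.dtype, ids_int.isna().tolist())  # Int64 [False, True, False]"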
169 | ] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "execution_count": null, 174 | "metadata": {}, 175 | "outputs": [], 176 | "source": [ 177 | "print(tree_of_life['uid'].dtype, tree_of_life.parent_uid.dtype)" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "metadata": {}, 183 | "source": [ 184 | "Accessing individual values, by position with `iloc` or by label with `loc`:" 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": null, 190 | "metadata": {}, 191 | "outputs": [], 192 | "source": [ 193 | "tree_of_life.iloc[0, 2]" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": null, 199 | "metadata": {}, 200 | "outputs": [], 201 | "source": [ 202 | "tree_of_life.loc[0, 'name']" 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": {}, 208 | "source": [ 209 | "**Exercise**: Guess the output of the following line:" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": null, 215 | "metadata": {}, 216 | "outputs": [], 217 | "source": [ 218 | "# tree_of_life.uid[0] == tree_of_life.parent_uid[1]" 219 | ] 220 | }, 221 | { 222 | "cell_type": "markdown", 223 | "metadata": {}, 224 | "source": [ 225 | "Sorting the data:" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": null, 231 | "metadata": {}, 232 | "outputs": [], 233 | "source": [ 234 | "tree_of_life.sort_values(by='name').head()" 235 | ] 236 | }, 237 | { 238 | "cell_type": "markdown", 239 | "metadata": {}, 240 | "source": [ 241 | "*Remark:* Some methods, such as `sort_values`, do not modify the dataframe but return a new one (option `inplace=False` by default)."
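]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the remark concrete, here is a toy DataFrame (names and values made up): `sort_values` returns a sorted copy and leaves the original unchanged. The original is modified only if the result is reassigned or if `inplace=True` is passed, in which case the method returns `None`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.DataFrame({'name': ['b', 'a', 'c'], 'rank': ['genus', 'species', 'family']})\n",
"sorted_df = df.sort_values(by='name')  # returns a new DataFrame\n",
"print(df['name'].tolist())         # ['b', 'a', 'c'], unchanged\n",
"print(sorted_df['name'].tolist())  # ['a', 'b', 'c']\n",
"\n",
"result = df.sort_values(by='name', inplace=True)  # modifies df itself\n",
"print(result, df['name'].tolist())  # None ['a', 'b', 'c']"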
242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": null, 247 | "metadata": {}, 248 | "outputs": [], 249 | "source": [ 250 | "tree_of_life.head()" 251 | ] 252 | }, 253 | { 254 | "cell_type": "markdown", 255 | "metadata": {}, 256 | "source": [ 257 | "## Operations on the columns" 258 | ] 259 | }, 260 | { 261 | "cell_type": "markdown", 262 | "metadata": {}, 263 | "source": [ 264 | "Unique values, useful for categorical columns:" 265 | ] 266 | }, 267 | { 268 | "cell_type": "code", 269 | "execution_count": null, 270 | "metadata": {}, 271 | "outputs": [], 272 | "source": [ 273 | "tree_of_life['rank'].unique()" 274 | ] 275 | }, 276 | { 277 | "cell_type": "markdown", 278 | "metadata": {}, 279 | "source": [ 280 | "Selecting a single category:" 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "execution_count": null, 286 | "metadata": {}, 287 | "outputs": [], 288 | "source": [ 289 | "tree_of_life[tree_of_life['rank'] == 'species'].head()" 290 | ] 291 | }, 292 | { 293 | "cell_type": "markdown", 294 | "metadata": {}, 295 | "source": [ 296 | "How many species do we have?" 297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": null, 302 | "metadata": {}, 303 | "outputs": [], 304 | "source": [ 305 | "len(tree_of_life[tree_of_life['rank'] == 'species'])" 306 | ] 307 | }, 308 | { 309 | "cell_type": "code", 310 | "execution_count": null, 311 | "metadata": {}, 312 | "outputs": [], 313 | "source": [ 314 | "tree_of_life['rank'].value_counts()" 315 | ] 316 | }, 317 | { 318 | "cell_type": "markdown", 319 | "metadata": {}, 320 | "source": [ 321 | "**Exercise:** Display the entry with name 'Archaea', then display the entry of its parent." 322 | ] 323 | }, 324 | { 325 | "cell_type": "code", 326 | "execution_count": null, 327 | "metadata": {}, 328 | "outputs": [], 329 | "source": [ 330 | "# Your code here."
331 | ] 332 | }, 333 | { 334 | "cell_type": "markdown", 335 | "metadata": {}, 336 | "source": [ 337 | "## Preparing the data" 338 | ] 339 | }, 340 | { 341 | "cell_type": "markdown", 342 | "metadata": {}, 343 | "source": [ 344 | "Before building the graph, we need to reorganize the data. First, we separate the nodes and their properties from the edges." 345 | ] 346 | }, 347 | { 348 | "cell_type": "code", 349 | "execution_count": null, 350 | "metadata": {}, 351 | "outputs": [], 352 | "source": [ 353 | "nodes = tree_of_life[['uid', 'name', 'rank']]\n", 354 | "edges = tree_of_life[['uid', 'parent_uid']]" 355 | ] 356 | }, 357 | { 358 | "cell_type": "markdown", 359 | "metadata": {}, 360 | "source": [ 361 | "Second, we clean up the edge and node tables." 362 | ] 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": null, 367 | "metadata": {}, 368 | "outputs": [], 369 | "source": [ 370 | "edges.head()" 371 | ] 372 | }, 373 | { 374 | "cell_type": "code", 375 | "execution_count": null, 376 | "metadata": {}, 377 | "outputs": [], 378 | "source": [ 379 | "# Drop the first row as it is not encoding an edge (no parent for the first node)\n", 380 | "edges = edges.drop(0)\n", 381 | "edges.head()" 382 | ] 383 | }, 384 | { 385 | "cell_type": "markdown", 386 | "metadata": {}, 387 | "source": [ 388 | "The node table is indexed by node id:" 389 | ] 390 | }, 391 | { 392 | "cell_type": "code", 393 | "execution_count": null, 394 | "metadata": {}, 395 | "outputs": [], 396 | "source": [ 397 | "nodes.head()" 398 | ] 399 | }, 400 | { 401 | "cell_type": "code", 402 | "execution_count": null, 403 | "metadata": {}, 404 | "outputs": [], 405 | "source": [ 406 | "nodes = nodes.set_index('uid')  # reassign instead of modifying a column selection in place\n", 407 | "nodes.head()" 408 | ] 409 | }, 410 | { 411 | "cell_type": "markdown", 412 | "metadata": {}, 413 | "source": [ 414 | "## The graph\n", 415 | "Now that the data has the appropriate shape, we can build the graph with `networkx`. A simple option is to iterate over the rows of the dataframe and call `graph.add_edge` for each (parent, child) pair. Alternatively, `graph.add_edges_from` takes a whole list of edges as input." 416 | ] 417 | }, 418 | { 419 | "cell_type": "code", 420 | "execution_count": null, 421 | "metadata": {}, 422 | "outputs": [], 423 | "source": [ 424 | "# Build the graph edge by edge, from parent to child.\n", 425 | "graph = nx.DiGraph()  # DiGraph is the networkx class for directed graphs\n", 426 | "for source, target in zip(edges['parent_uid'], edges['uid']):\n", "    graph.add_edge(source, target)" 427 | ] 428 | }, 429 | { 430 | "cell_type": "markdown", 431 | "metadata": {}, 432 | "source": [ 433 | "We can also use the `add_edges_from` method instead of an explicit loop. Note the column order: since the graph is directed, each edge must be given as (parent, child)." 434 | ] 435 | }, 436 | { 437 | "cell_type": "code", 438 | "execution_count": null, 439 | "metadata": {}, 440 | "outputs": [], 441 | "source": [ 442 | "graph = nx.DiGraph()\n", 443 | "graph.add_edges_from(edges[['parent_uid', 'uid']].itertuples(name=None, index=False))" 444 | ] 445 | }, 446 | { 447 | "cell_type": "markdown", 448 | "metadata": {}, 449 | "source": [ 450 | "And finally, the dataframe can be used directly to create the graph thanks to the `from_pandas_edgelist` function."
451 | ] 452 | }, 453 | { 454 | "cell_type": "code", 455 | "execution_count": null, 456 | "metadata": {}, 457 | "outputs": [], 458 | "source": [ 459 | "graph = nx.from_pandas_edgelist(edges, source='parent_uid', target='uid', create_using=nx.DiGraph())" 460 | ] 461 | }, 462 | { 463 | "cell_type": "markdown", 464 | "metadata": {}, 465 | "source": [ 466 | "In addition, let us attach the node properties (name and rank) as node attributes:" 467 | ] 468 | }, 469 | { 470 | "cell_type": "code", 471 | "execution_count": null, 472 | "metadata": {}, 473 | "outputs": [], 474 | "source": [ 475 | "node_props = nodes.to_dict()  # {column name: {uid: value}}" 476 | ] 477 | }, 478 | { 479 | "cell_type": "code", 480 | "execution_count": null, 481 | "metadata": {}, 482 | "outputs": [], 483 | "source": [ 484 | "for key in node_props:\n", 485 | " nx.set_node_attributes(graph, node_props[key], key)" 486 | ] 487 | }, 488 | { 489 | "cell_type": "markdown", 490 | "metadata": {}, 491 | "source": [ 492 | "Let us check that the attributes are correctly recorded:" 493 | ] 494 | }, 495 | { 496 | "cell_type": "code", 497 | "execution_count": null, 498 | "metadata": {}, 499 | "outputs": [], 500 | "source": [ 501 | "print(graph.nodes[805080], graph.nodes[102415])" 502 | ] 503 | }, 504 | { 505 | "cell_type": "markdown", 506 | "metadata": {}, 507 | "source": [ 508 | "**Exercise:** \n", 509 | "* Have a look at the [networkx documentation](https://networkx.org/documentation/stable/tutorial.html) and display the number of nodes and edges of the graph.\n", 510 | "* Display the neighbors of the node named 'life', then its 2-hop neighbors (nodes are indexed by `uid`, not by name)." 511 | ] 512 | }, 513 | { 514 | "cell_type": "code", 515 | "execution_count": null, 516 | "metadata": {}, 517 | "outputs": [], 518 | "source": [ 519 | "# your code here" 520 | ] 521 | }, 522 | { 523 | "cell_type": "markdown", 524 | "metadata": {}, 525 | "source": [ 526 | "## Graph visualization\n", 527 | "\n", 528 | "To conclude, let us visualize the graph with the drawing functions of networkx, which rely on matplotlib."
529 | ] 530 | }, 531 | { 532 | "cell_type": "markdown", 533 | "metadata": {}, 534 | "source": [ 535 | "The following line is a [magic command](https://ipython.readthedocs.io/en/stable/interactive/magics.html). It enables plotting inside the notebook." 536 | ] 537 | }, 538 | { 539 | "cell_type": "code", 540 | "execution_count": null, 541 | "metadata": {}, 542 | "outputs": [], 543 | "source": [ 544 | "%matplotlib inline" 545 | ] 546 | }, 547 | { 548 | "cell_type": "markdown", 549 | "metadata": {}, 550 | "source": [ 551 | "You may also try `%matplotlib notebook` for zoomable, interactive plots." 552 | ] 553 | }, 554 | { 555 | "cell_type": "markdown", 556 | "metadata": {}, 557 | "source": [ 558 | "Let us draw the graph with two different [layout algorithms](https://en.wikipedia.org/wiki/Graph_drawing#Layout_methods). As you will see, networkx and matplotlib are not well suited to plotting graphs. We will see other visualization tools later on." 559 | ] 560 | }, 561 | { 562 | "cell_type": "code", 563 | "execution_count": null, 564 | "metadata": {}, 565 | "outputs": [], 566 | "source": [ 567 | "nx.draw_spectral(graph)" 568 | ] 569 | }, 570 | { 571 | "cell_type": "code", 572 | "execution_count": null, 573 | "metadata": {}, 574 | "outputs": [], 575 | "source": [ 576 | "nx.draw_spring(graph)\n", 577 | "# You may also display the names with the following command,\n", 578 | "# but in our case the graph is too big and the labels overlap:\n", 579 | "#\n", 580 | "# nx.draw_spring(graph, labels=node_props['name'])" 581 | ] 582 | }, 583 | { 584 | "cell_type": "markdown", 585 | "metadata": {}, 586 | "source": [ 587 | "## Saving the graph\n", 588 | "Save the graph to disk in the `gexf` format, which Gephi and other graph tools can read. You can then explore the graph with [Gephi](https://gephi.org/) and compare the visualizations."
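]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before writing the real graph, here is a small round trip on a toy graph (node ids and attribute values are made up). It illustrates two practical points. First, `nx.read_gexf` returns node ids as strings unless a `node_type` is given. Second, a list attribute must be serialized (here with `json.dumps`) before writing. Both points matter when `tree_of_life.gexf` is reloaded in the next notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import os\n",
"import tempfile\n",
"\n",
"import networkx as nx\n",
"\n",
"toy = nx.DiGraph()\n",
"toy.add_edge(1, 2)\n",
"toy.nodes[1]['name'] = 'root'\n",
"toy.nodes[1]['tags'] = json.dumps(['a', 'b'])  # lists must be stored as strings\n",
"\n",
"path = os.path.join(tempfile.mkdtemp(), 'toy.gexf')\n",
"nx.write_gexf(toy, path)\n",
"\n",
"print(sorted(nx.read_gexf(path).nodes()))  # ['1', '2']: ids come back as strings\n",
"reloaded = nx.read_gexf(path, node_type=int)  # convert ids back to integers\n",
"print(reloaded.nodes[1]['name'], json.loads(reloaded.nodes[1]['tags']))  # root ['a', 'b']"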
589 | ] 590 | }, 591 | { 592 | "cell_type": "code", 593 | "execution_count": null, 594 | "metadata": {}, 595 | "outputs": [], 596 | "source": [ 597 | "nx.write_gexf(graph, 'data/tree_of_life.gexf')" 598 | ] 599 | }, 600 | { 601 | "cell_type": "markdown", 602 | "metadata": {}, 603 | "source": [ 604 | "Note: the `gexf` format can store node and edge properties, unless they have a complex structure such as Python lists or dictionaries. In that case, these structures must be converted to strings (e.g. with `json.dumps`) before saving the graph." 605 | ] 606 | }, 607 | { 608 | "cell_type": "code", 609 | "execution_count": null, 610 | "metadata": {}, 611 | "outputs": [], 612 | "source": [] 613 | } 614 | ], 615 | "metadata": { 616 | "kernelspec": { 617 | "display_name": "Python 3", 618 | "language": "python", 619 | "name": "python3" 620 | }, 621 | "language_info": { 622 | "codemirror_mode": { 623 | "name": "ipython", 624 | "version": 3 625 | }, 626 | "file_extension": ".py", 627 | "mimetype": "text/x-python", 628 | "name": "python", 629 | "nbconvert_exporter": "python", 630 | "pygments_lexer": "ipython3", 631 | "version": "3.6.9" 632 | } 633 | }, 634 | "nbformat": 4, 635 | "nbformat_minor": 4 636 | } 637 | -------------------------------------------------------------------------------- /02_basic_graph_properties.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Basic but important graph properties\n", 8 | "\n", 9 | "Here we introduce important properties of graphs. These properties describe the structure of a graph and help understand its specificities. A social network is very different from a grid graph or a tree graph: they have different geometries and carry different information. The algorithms presented here make it possible to distinguish and categorize them.
We will see later on that the challenges faced when exploring, analyzing and visualizing graphs depend on these geometries.\n", 10 | "\n", 11 | "We use standard methods already implemented in networkx. For more information, see the networkx documentation and the page listing the available graph algorithms:\n", 12 | "* https://networkx.org/documentation/stable//index.html\n", 13 | "* https://networkx.org/documentation/stable//reference/algorithms/index.html\n", 14 | "\n", 15 | "Some of these properties are computationally intensive for large graphs (e.g. those based on computing all the shortest paths, or spectral methods), so we intentionally work with small graphs for pedagogical purposes." 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": null, 21 | "metadata": {}, 22 | "outputs": [], 23 | "source": [ 24 | "import networkx as nx\n", 25 | "import matplotlib.pyplot as plt\n", 26 | "import pandas as pd" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "We test these properties on several graphs. Three examples are listed below; uncomment the one you want to use."
34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": null, 39 | "metadata": {}, 40 | "outputs": [], 41 | "source": [ 42 | "## Load our demo graph\n", 43 | "#G = nx.read_gexf('data/tree_of_life.gexf').to_undirected()\n", 44 | "\n", 45 | "## Create a random graph\n", 46 | "N = 200 # number of nodes\n", 47 | "\n", 48 | "## Erdos Renyi random graph\n", 49 | "#p = 0.1 # probability of connection\n", 50 | "#G = nx.erdos_renyi_graph(N, p, seed=0)\n", 51 | "\n", 52 | "## Barabasi Albert graph (model of scale-free network)\n", 53 | "m = 4 # number of connections when adding a node\n", 54 | "G = nx.barabasi_albert_graph(N, m, seed=0)\n", 55 | "\n", 56 | "print('Number of nodes: {}, number of edges: {}.'.format(G.number_of_nodes(), G.number_of_edges()))" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": {}, 62 | "source": [ 63 | "Networkx provides a large collection of algorithms to analyze graphs. The output can be a single value associated with the whole graph, or a dictionary whose keys are node ids and whose values are the results computed for each node. Let us look at some examples." 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "## Node properties\n", 71 | "To evaluate the importance of a node in the network, several methods have been proposed in the literature. We review three of them:\n", 72 | "- the simple computation of the **degree** of the nodes,\n", 73 | "- the **betweenness centrality**, which measures the fraction of shortest paths passing through a node,\n", 74 | "- the famous **PageRank**, which is cheaper to compute than the betweenness centrality.\n", 75 | "\n", 76 | "All of them are standard methods available in `networkx`."
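]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before applying them to `G`, the three measures can be checked on a graph small enough to reason about by hand. In the star graph below (chosen for illustration), node 0 is the hub. It has the highest degree, lies on every shortest path between two leaves and gets the largest PageRank. The leaves have zero betweenness."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import networkx as nx\n",
"\n",
"toy = nx.star_graph(3)  # hub 0 connected to leaves 1, 2 and 3\n",
"print(dict(toy.degree()))  # {0: 3, 1: 1, 2: 1, 3: 1}\n",
"print(nx.betweenness_centrality(toy))  # largest value for the hub, 0.0 for the leaves\n",
"pr = nx.pagerank(toy)\n",
"print(max(pr, key=pr.get), round(sum(pr.values()), 6))  # 0 1.0: the hub ranks first, scores sum to 1"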
77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": null, 82 | "metadata": {}, 83 | "outputs": [], 84 | "source": [ 85 | "bc = nx.betweenness_centrality(G)\n", 86 | "pr = nx.pagerank(G)\n", 87 | "degree = dict(nx.degree(G))\n", 88 | "# Gather the node properties in a DataFrame\n", 89 | "node_props = pd.DataFrame({'degree': degree, 'b. centrality': bc, 'pagerank': pr})\n", 90 | "node_props.index.name = 'Node id'\n", 91 | "print('DataFrame of some node properties:')\n", 92 | "node_props.sort_values('pagerank', ascending=False).head(10)" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": {}, 98 | "source": [ 99 | "## Graph properties\n", 100 | "Some properties are more global and characterize the entire graph.\n", 101 | "![graph diameter](figures/graphdiameter.png)" 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": null, 107 | "metadata": {}, 108 | "outputs": [], 109 | "source": [ 110 | "# The diameter is the length of the longest shortest path in the network\n", 111 | "diameter = nx.diameter(G)\n", 112 | "# The average shortest path length gives a rough idea of the \"small-worldness\" of a graph\n", 113 | "av_shortest_path = nx.average_shortest_path_length(G)\n", 114 | "print('Graph diameter: {}. Average shortest path length: {}.'.format(diameter, av_shortest_path))" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": null, 120 | "metadata": {}, 121 | "outputs": [], 122 | "source": [ 123 | "# The density is the number of edges divided by the number of possible edges (those of a complete graph)\n", 124 | "density = nx.density(G)\n", 125 | "# The clustering coefficient of a node is the fraction of pairs of its neighbors that are connected (forming a triangle); it is averaged over all nodes\n", 126 | "cc = nx.average_clustering(G)\n", 127 | "print('Density: {}. Average clustering coeff.: {}'.format(density, cc))" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "The **k-core** is the largest subgraph in which every node has degree at least `k`, the degree being counted inside that subgraph. It indicates how hierarchical the network is: high-degree nodes that are densely connected to each other, forming a community, remain in the cores with the highest `k`." 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": null, 140 | "metadata": {}, 141 | "outputs": [], 142 | "source": [ 143 | "# k-core\n", 144 | "for k in range(10):\n", 145 | " print('k = {}, size of the subgraph: {} nodes.'.format(k, nx.k_core(G, k).number_of_nodes()))" 146 | ] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": {}, 151 | "source": [ 152 | "The **degree distribution** is one of the main descriptors of a network structure. Many real-world networks are approximately \"scale-free\": the number of nodes with a given degree decreases as a power law of the degree, which appears as a straight line on a log-log plot." 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": null, 158 | "metadata": {}, 159 | "outputs": [], 160 | "source": [ 161 | "m = 4  # minimal degree to display\n", 162 | "degree_freq = nx.degree_histogram(G)\n", 163 | "degrees = range(len(degree_freq))\n", 164 | "plt.figure(figsize=(12, 8))\n", 165 | "#plt.loglog(degrees[m:], degree_freq[m:],'go')\n", 166 | "plt.scatter(degrees[m:], degree_freq[m:])\n", 167 | "plt.xlim(m, max(degrees))\n", 168 | "plt.ylim(1, max(degree_freq))\n", 169 | "plt.yscale('log')\n", 170 | "plt.xscale('log')\n", 171 | "plt.xlabel('Degree')\n", 172 | "plt.ylabel('Frequency')\n", 173 | "plt.show()" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "## Save the graph for visualization (with Gephi)\n", 181 | "In the next presentation we will see how to visualize a network and its properties with Gephi.
We can save our networkx graph as a `gexf` file that can be read by Gephi." 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": null, 187 | "metadata": {}, 188 | "outputs": [], 189 | "source": [ 190 | "nx.write_gexf(G, 'data/basic_graph.gexf')" 191 | ] 192 | }, 193 | { 194 | "cell_type": "markdown", 195 | "metadata": {}, 196 | "source": [ 197 | "### Exercise 1\n", 198 | "Compute the graph properties for different small graphs and compare the results. You can find a large set of graphs to try here: https://networkx.org/documentation/stable//reference/generators.html" 199 | ] 200 | }, 201 | { 202 | "cell_type": "markdown", 203 | "metadata": {}, 204 | "source": [ 205 | "### Exercise 2\n", 206 | "Compute the properties for one or more large graphs and note which methods scale and which do not. Can you explain why?" 207 | ] 208 | }, 209 | { 210 | "cell_type": "code", 211 | "execution_count": null, 212 | "metadata": {}, 213 | "outputs": [], 214 | "source": [] 215 | } 216 | ], 217 | "metadata": { 218 | "kernelspec": { 219 | "display_name": "Python 3", 220 | "language": "python", 221 | "name": "python3" 222 | }, 223 | "language_info": { 224 | "codemirror_mode": { 225 | "name": "ipython", 226 | "version": 3 227 | }, 228 | "file_extension": ".py", 229 | "mimetype": "text/x-python", 230 | "name": "python", 231 | "nbconvert_exporter": "python", 232 | "pygments_lexer": "ipython3", 233 | "version": "3.6.9" 234 | } 235 | }, 236 | "nbformat": 4, 237 | "nbformat_minor": 4 238 | } 239 | -------------------------------------------------------------------------------- /03_graph_exploration_and_sampling.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Graph exploration and sampling\n", 8 | "\n", 9 | "Here we experiment with network exploration using the main graph samplers found in the literature.
The principle is the following: the initial graph is too large to be handled, so we need to extract a representative part of it for analysis. We hope that this reduced subgraph is representative of the large one, and some of these methods come with theoretical guarantees about it. \n", 10 | "\n", 11 | "The different samplers are designed to preserve particular graph properties when subsampling. We will see which properties are associated with each sampler and learn to select the most suitable one for a given application." 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": null, 17 | "metadata": {}, 18 | "outputs": [], 19 | "source": [ 20 | "import networkx as nx\n", 21 | "import littleballoffur as lbof\n", 22 | "import matplotlib.pyplot as plt\n", 23 | "from collections import Counter" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "The main samplers are implemented in the Python package \"Little Ball of Fur\" (https://github.com/benedekrozemberczki/littleballoffur), which we use here (`pip install littleballoffur`).\n", 31 | "The documentation can be found here:\n", 32 | "* https://little-ball-of-fur.readthedocs.io\n", 33 | "\n", 34 | "Let us load one of the datasets available in the package." 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": null, 40 | "metadata": {}, 41 | "outputs": [], 42 | "source": [ 43 | "# load a graph\n", 44 | "#reader = lbof.GraphReader(\"facebook\")\n", 45 | "reader = lbof.GraphReader(\"github\")\n", 46 | "G = reader.get_graph()\n", 47 | "print('Number of nodes: {}, number of edges: {}.'.format(G.number_of_nodes(), G.number_of_edges()))" 48 | ] 49 | }, 50 | { 51 | "cell_type": "markdown", 52 | "metadata": {}, 53 | "source": [ 54 | "Let us suppose this graph is too big for our analysis, so we need a reduced version of it. We first set the size of this reduced dataset."
55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": null, 60 | "metadata": {}, 61 | "outputs": [], 62 | "source": [ 63 | "# number of nodes in the subgraph\n", 64 | "number_of_nodes = int(0.01 * G.number_of_nodes())\n", 65 | "print('Number of nodes in the subgraph:', number_of_nodes)" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": {}, 71 | "source": [ 72 | "There are several ways to sample a graph. When you have access to the full graph, you may sample nodes or edges at random: this is the first family of methods. Alternatively, you can start from an initial group of nodes and collect part of the graph by exploring it (following connections) from these nodes. We focus on the latter approaches." 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": {}, 78 | "source": [ 79 | "## Exploring the network" 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "### General principle\n", 87 | "For the applications we have in mind, we want to start the exploration from an initial set of nodes. For example, it could be a particular user or group of users in a social network who post about a topic we are interested in. We want to know more about this topic and the related topics appearing in their exchanges. Our goal is to explore the network around this initial group, hence we use exploration methods.\n", 88 | "\n", 89 | "The exploration scheme is shown in the following figure. Starting from an initial node, the neighborhood is explored by randomly selecting a subset of the edges (possibly with different probability weights for different edges). The process is then iterated on the newly sampled nodes.\n", 90 | "![spikyball](figures/spikyballfinal.png)\n", 91 | "\n", 92 | "Left: snowball exploration, which follows all edges. Right: spikyball exploration, which follows a subset of the edges at each step.\n", 93 | "\n", 94 | "Among the most popular exploration methods are: \n", 95 | "- the Metropolis-Hastings random walk sampler ([paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.140.4864&rep=rep1&type=pdf), [paper](https://core.ac.uk/download/pdf/192275476.pdf)),\n", 96 | "- the Forest Fire sampler ([paper](https://cs.stanford.edu/people/jure/pubs/sampling-kdd06.pdf)).\n", 97 | "\n", 98 | "MHRW algorithm (with $k_v$ the degree of node $v$):\n", 99 | "![MHRW algorithm](figures/MHRWalgo.png)\n", 100 | "\n", 101 | "The Spikyball ([paper](https://www.mdpi.com/1999-4893/13/11/275)) adds more flexibility to the previous sampling schemes while keeping the exploration efficient. At each step of the exploration, all the outgoing edges are collected, and some of them are selected at random, with probabilities that depend on the edge weights and on the number of connections of their source and target nodes. You can bias the sampling toward:\n", 102 | "* nodes with high degree (parameter $\\alpha$),\n", 103 | "* nodes connected with large weights (parameter $\\beta$),\n", 104 | "* neighbors connected to several nodes already sampled (parameter $\\gamma$).\n", 105 | "\n", 106 | "\n", 107 | "![spikyball probability](figures/spikyballproba.png)\n", 108 | "\n", 109 | "One step of the exploration is illustrated in the following figure. The sampled graph is in the middle, in purple. The green nodes are the neighbors that can be selected at the next exploration step. Among all the outgoing edges, only some are selected: those drawn as straight lines.\n", 110 | "![graph exploration step](figures/explorationstep.png)" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "metadata": {}, 116 | "source": [ 117 | "### Application\n", 118 | "In the following, we experiment with the exploration methods on toy graphs. The goal is to understand the different ways to explore a network, the role of the parameters and their impact on the sampled network.
The initial set of nodes is chosen at random inside the exploration functions of `littleballoffur`, so we do not focus on a specific region of the network but rather on the way the exploration is performed. Later on, we will choose the initial nodes and apply the exploration to real networks.\n", 119 | "\n", 120 | "We save the graphs as `gexf` files in order to visualize them with Gephi." 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": null, 126 | "metadata": {}, 127 | "outputs": [], 128 | "source": [ 129 | "# Metropolis-Hastings random walk sampler\n", 130 | "sampler = lbof.MetropolisHastingsRandomWalkSampler(number_of_nodes = number_of_nodes)\n", 131 | "GMH = sampler.sample(G)\n", 132 | "nx.write_gexf(GMH, 'data/gmh.gexf')\n", 133 | "print('Subgraph with {} nodes and {} edges.'.format(GMH.number_of_nodes(),GMH.number_of_edges()))" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": null, 139 | "metadata": {}, 140 | "outputs": [], 141 | "source": [ 142 | "# Forest Fire sampler\n", 143 | "sampler = lbof.ForestFireSampler(number_of_nodes = number_of_nodes)\n", 144 | "GFF = sampler.sample(G)\n", 145 | "nx.write_gexf(GFF, 'data/gff.gexf')\n", 146 | "print('Subgraph with {} nodes and {} edges.'.format(GFF.number_of_nodes(),GFF.number_of_edges()))" 147 | ] 148 | }, 149 | { 150 | "cell_type": "markdown", 151 | "metadata": {}, 152 | "source": [ 153 | "A more general and flexible exploration approach, called \"Spikyball\" ([paper](https://www.mdpi.com/1999-4893/13/11/275)), can also be used."
154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": null, 159 | "metadata": {}, 160 | "outputs": [], 161 | "source": [ 162 | "# Fireball sampler is similar to the Forest Fire sampler\n", 163 | "sampler = lbof.SpikyBallSampler(number_of_nodes = number_of_nodes, sampling_probability=0.1, mode='fireball', \n", 164 | " initial_nodes_ratio=0.001)\n", 165 | "GFB = sampler.sample(G)\n", 166 | "nx.write_gexf(GFB, 'data/gfb.gexf')\n", 167 | "print('Subgraph with {} nodes and {} edges.'.format(GFB.number_of_nodes(), GFB.number_of_edges()))" 168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": null, 173 | "metadata": {}, 174 | "outputs": [], 175 | "source": [ 176 | "# Coreball sampler\n", 177 | "sampler = lbof.SpikyBallSampler(number_of_nodes = number_of_nodes, sampling_probability=0.1, mode='coreball',\n", 178 | " initial_nodes_ratio=0.001)\n", 179 | "GCB = sampler.sample(G)\n", 180 | "# Remove isolated nodes\n", 181 | "GCB = nx.Graph(GCB)\n", 182 | "GCB.remove_nodes_from(list(nx.isolates(GCB)))\n", 183 | "nx.write_gexf(GCB, 'data/gcb.gexf')\n", 184 | "print('Subgraph with {} nodes and {} edges.'.format(GCB.number_of_nodes(), GCB.number_of_edges()))" 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": null, 190 | "metadata": {}, 191 | "outputs": [], 192 | "source": [ 193 | "# Coreball 2 sampler\n", 194 | "sampler = lbof.SpikyBallSampler(number_of_nodes = number_of_nodes, sampling_probability=0.1, mode='coreball',\n", 195 | " initial_nodes_ratio=0.001, distrib_coeff=2)\n", 196 | "GCB2 = sampler.sample(G)\n", 197 | "# Remove isolated nodes\n", 198 | "GCB2 = nx.Graph(GCB2)\n", 199 | "GCB2.remove_nodes_from(list(nx.isolates(GCB2)))\n", 200 | "nx.write_gexf(GCB2, 'data/gcb2.gexf')\n", 201 | "print('Subgraph with {} nodes and {} edges.'.format(GCB2.number_of_nodes(), GCB2.number_of_edges()))" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": {}, 207 | "source": [ 208 | "**Exercise**: 
Visualize some of the sampled graphs with Gephi." 209 | ] 210 | }, 211 | { 212 | "cell_type": "markdown", 213 | "metadata": {}, 214 | "source": [ 215 | "### Visualisation of the graphs (from Gephi)\n", 216 | "Metropolis-Hasting RW, Forest Fire, Fireball and Coreball \n", 217 | "\n", 218 | " \n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | "
\"MetropolisHasting\" \"Forest\"Fireball\"\"Coreball\"
\n" 223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": {}, 228 | "source": [ 229 | "### Degree distribution\n", 230 | "Let us look at the degree distribution of these networks." 231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": null, 236 | "metadata": {}, 237 | "outputs": [], 238 | "source": [ 239 | "# Plot the degree distribution of a graph (number of nodes for each degree)\n", 240 | "def plot_degree(G,glabel):\n", 241 | " m=1 # minimal degree to display\n", 242 | " degree_freq = nx.degree_histogram(G)\n", 243 | " degrees = range(len(degree_freq))\n", 244 | " plt.scatter(degrees[m:], degree_freq[m:],label=glabel)\n", 245 | " return max(degrees),max(degree_freq)" 246 | ] 247 | }, 248 | { 249 | "cell_type": "code", 250 | "execution_count": null, 251 | "metadata": {}, 252 | "outputs": [], 253 | "source": [ 254 | "plt.figure(figsize=(12, 8))\n", 255 | "mx1,my1 = plot_degree(GMH,'MH')\n", 256 | "mx2,my2 = plot_degree(GFF,'FF')\n", 257 | "mx3,my3 = plot_degree(GFB,'FB')\n", 258 | "mx4,my4 = plot_degree(GCB,'CB')\n", 259 | "mx5,my5 = plot_degree(GCB2,'CB2')\n", 260 | "\n", 261 | "plt.xlim(1, max([mx1,mx2,mx3,mx4,mx5]))\n", 262 | "plt.ylim(1, max([my1,my2,my3,my4,my5]))\n", 263 | "plt.yscale('log')\n", 264 | "plt.xscale('log')\n", 265 | "plt.xlabel('Degree')\n", 266 | "plt.ylabel('Frequency')\n", 267 | "plt.legend()\n", 268 | "plt.show()" 269 | ] 270 | }, 271 | { 272 | "cell_type": "markdown", 273 | "metadata": {}, 274 | "source": [ 275 | "**Remark 1:** this visualization is simple to code but not the most appropriate. A more accurate curve could be obtained by grouping degrees into bins whose widths grow on a logarithmic scale." 276 | ] 277 | }, 278 | { 279 | "cell_type": "markdown", 280 | "metadata": {}, 281 | "source": [ 282 | "**Remark 2:** The sampled graphs are small enough to be visualized, but too small to give reliable statistics on the degree distribution."
283 | ] 284 | }, 285 | { 286 | "cell_type": "markdown", 287 | "metadata": {}, 288 | "source": [ 289 | "### A larger subgraph\n", 290 | "We increase the size of the sampled subgraph to have better statistics on the degree distribution." 291 | ] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "execution_count": null, 296 | "metadata": {}, 297 | "outputs": [], 298 | "source": [ 299 | "number_of_nodes = 3000\n", 300 | "# MHRW\n", 301 | "sampler = lbof.MetropolisHastingsRandomWalkSampler(number_of_nodes = number_of_nodes)\n", 302 | "GMH = sampler.sample(G)\n", 303 | "print('MHRW subgraph with {} nodes and {} edges.'.format(GMH.number_of_nodes(),GMH.number_of_edges()))\n", 304 | "\n", 305 | "# FF\n", 306 | "sampler = lbof.ForestFireSampler(number_of_nodes = number_of_nodes)\n", 307 | "GFF = sampler.sample(G)\n", 308 | "print('FF subgraph with {} nodes and {} edges.'.format(GFF.number_of_nodes(),GFF.number_of_edges()))\n", 309 | "\n", 310 | "# Fireball\n", 311 | "sampler = lbof.SpikyBallSampler(number_of_nodes = number_of_nodes, sampling_probability=0.05, mode='fireball', \n", 312 | " initial_nodes_ratio=0.001)\n", 313 | "GFB = sampler.sample(G)\n", 314 | "print('FB subgraph with {} nodes and {} edges.'.format(GFB.number_of_nodes(), GFB.number_of_edges()))\n", 315 | "\n", 316 | "# Coreball\n", 317 | "sampler = lbof.SpikyBallSampler(number_of_nodes = number_of_nodes, sampling_probability=0.1, mode='coreball',\n", 318 | " initial_nodes_ratio=0.001)\n", 319 | "GCB = sampler.sample(G)\n", 320 | "print('CB subgraph with {} nodes and {} edges.'.format(GCB.number_of_nodes(), GCB.number_of_edges()))\n", 321 | "\n", 322 | "# Coreball 2\n", 323 | "sampler = lbof.SpikyBallSampler(number_of_nodes = number_of_nodes, sampling_probability=0.1, mode='coreball',\n", 324 | " initial_nodes_ratio=0.001, distrib_coeff=2)\n", 325 | "GCB2 = sampler.sample(G)\n", 326 | "print('CB2 subgraph with {} nodes and {} edges.'.format(GCB2.number_of_nodes(), GCB2.number_of_edges()))\n", 327 | "\n", 328 
| "# Plot degree distribution\n", 329 | "plt.figure(figsize=(12, 8))\n", 330 | "mx1,my1 = plot_degree(GMH,'MH')\n", 331 | "mx2,my2 = plot_degree(GFF,'FF')\n", 332 | "mx3,my3 = plot_degree(GFB,'FB')\n", 333 | "mx4,my4 = plot_degree(GCB,'CB')\n", 334 | "mx5,my5 = plot_degree(GCB2,'CB2')\n", 335 | "\n", 336 | "plt.xlim(1, max([mx1,mx2,mx3,mx4,mx5]))\n", 337 | "plt.ylim(1, max([my1,my2,my3,my4,my5]))\n", 338 | "plt.yscale('log')\n", 339 | "plt.xscale('log')\n", 340 | "plt.xlabel('Degree')\n", 341 | "plt.ylabel('Frequency')\n", 342 | "plt.legend()\n", 343 | "plt.show()" 344 | ] 345 | }, 346 | { 347 | "cell_type": "markdown", 348 | "metadata": {}, 349 | "source": [ 350 | "### Degrees in the initial graph\n", 351 | "How do the different methods perform with respect to node degrees in the initial graph? Do they collect more high-degree nodes or not?" 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": null, 357 | "metadata": {}, 358 | "outputs": [], 359 | "source": [ 360 | "# Add the degree as a node attribute in the initial graph\n", 361 | "# so that it can be retrieved easily for the nodes of the subsampled graphs\n", 362 | "nx.set_node_attributes(G,dict(G.degree()), name='degree')" 363 | ] 364 | }, 365 | { 366 | "cell_type": "code", 367 | "execution_count": null, 368 | "metadata": {}, 369 | "outputs": [], 370 | "source": [ 371 | "# Plot the distribution of the initial-graph degrees of the nodes of a subgraph\n", 372 | "def plot_d_init(G,subgraph,glabel):\n", 373 | " m=1 # minimal degree to display\n", 374 | " d = [G.nodes[i]['degree'] for i in subgraph.nodes()]\n", 375 | " counter_dic = Counter(d)\n", 376 | " degrees = sorted(k for k in counter_dic if k >= m) # filter by degree value: Counter keys are in insertion order\n", 377 | " degree_freq = [counter_dic[k] for k in degrees]\n", 378 | " #degree_freq, degrees = np.histogram(d,bins=50)\n", 379 | " #print(len(degree_freq),len(degrees))\n", 380 | " #degree_freq = nx.degree_histogram(G)\n", 381 | " #degrees = range(len(degree_freq))\n", 382 | " plt.scatter(degrees, degree_freq, label=glabel)\n", 383 | " return max(degrees),max(degree_freq)" 384 | ] 385 | }, 386 | { 387 | "cell_type": "code", 388 | "execution_count": null, 389 | "metadata": {}, 390 | "outputs": [], 391 | "source": [ 392 | "plt.figure(figsize=(12, 8))\n", 393 | "mx1,my1 = plot_d_init(G,GMH,'MH')\n", 394 | "mx2,my2 = plot_d_init(G,GFF,'FF')\n", 395 | "mx3,my3 = plot_d_init(G,GFB,'FB')\n", 396 | "mx4,my4 = plot_d_init(G,GCB,'CB')\n", 397 | "mx5,my5 = plot_d_init(G,GCB2,'CB2')\n", 398 | "\n", 399 | "plt.xlim(1, max([mx1,mx2,mx3,mx4,mx5]))\n", 400 | "plt.ylim(1, max([my1,my2,my3,my4,my5]))\n", 401 | "plt.yscale('log')\n", 402 | "plt.xscale('log')\n", 403 | "plt.xlabel('Degree in the initial graph')\n", 404 | "plt.ylabel('Frequency')\n", 405 | "plt.legend()\n", 406 | "plt.show()" 407 | ] 408 | }, 409 | { 410 | "cell_type": "markdown", 411 | "metadata": {}, 412 | "source": [ 413 | "## Exercise\n", 414 | "Redo the experiments with different parameters for the fireball and coreball samplers, and on a different graph.\n", 415 | "Which exploration methods are best\n", 416 | "* for collecting more high-degree nodes?\n", 417 | "* for keeping the same degree distribution?\n", 418 | "* for exploring a larger part of the network?"
419 | ] 420 | }, 421 | { 422 | "cell_type": "code", 423 | "execution_count": null, 424 | "metadata": {}, 425 | "outputs": [], 426 | "source": [] 427 | } 428 | ], 429 | "metadata": { 430 | "kernelspec": { 431 | "display_name": "Python 3", 432 | "language": "python", 433 | "name": "python3" 434 | }, 435 | "language_info": { 436 | "codemirror_mode": { 437 | "name": "ipython", 438 | "version": 3 439 | }, 440 | "file_extension": ".py", 441 | "mimetype": "text/x-python", 442 | "name": "python", 443 | "nbconvert_exporter": "python", 444 | "pygments_lexer": "ipython3", 445 | "version": "3.6.9" 446 | } 447 | }, 448 | "nbformat": 4, 449 | "nbformat_minor": 4 450 | } 451 | -------------------------------------------------------------------------------- /04_ML_on_graphs.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Machine Learning on graphs\n", 8 | "Here we review some simple yet powerful machine learning methods on graphs." 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": null, 14 | "metadata": {}, 15 | "outputs": [], 16 | "source": [ 17 | "import numpy as np\n", 18 | "import networkx as nx\n", 19 | "import community as community_louvain\n", 20 | "import matplotlib.pyplot as plt" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "# Clustering and community detection (unsupervised learning)\n", 28 | "Many real-world networks possess communities: groups of nodes that are more densely connected to each other than to the rest of the network. Detecting these structures is important: it reveals the hierarchy, the organization and the interactions between nodes, and it helps classify parts of a network into categories. It can be seen as the graph counterpart of the unsupervised `k-means` clustering algorithm.
Instead of computing distances and grouping data points in a high-dimensional space, we use the network structure to detect the clusters.\n", 29 | "\n", 30 | "Many methods exist for community detection; see for example [this review](https://arxiv.org/abs/0906.0612). We will look at two of them: the first one is available in `networkx`, the second one is the popular Louvain method, which is fast to compute." 31 | ] 32 | }, 33 | { 34 | "cell_type": "markdown", 35 | "metadata": {}, 36 | "source": [ 37 | "## Girvan-Newman algorithm\n", 38 | "The [GN algorithm](https://en.wikipedia.org/wiki/Girvan%E2%80%93Newman_algorithm) iteratively removes the edges between communities. The removed edges are those with the highest number of shortest paths passing through them (the \"bottlenecks\" between communities). The idea is clear and intuitive. However, the computation is intensive, as the shortest paths have to be recomputed after each removal. Moreover, the number of communities has to be specified, as this algorithm does not have a stopping criterion (which can be an advantage or a drawback)."
39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": null, 44 | "metadata": {}, 45 | "outputs": [], 46 | "source": [ 47 | "# Girvan-Newman clustering\n", 48 | "G = nx.path_graph(8)\n", 49 | "\n", 50 | "num_clusters = 5 # desired number of clusters\n", 51 | "k = num_clusters - 1\n", 52 | "\n", 53 | "comp = nx.algorithms.community.centrality.girvan_newman(G)\n", 54 | "\n", 55 | "comp = list(comp)\n", 56 | "for idx in range(k):\n", 57 | " print(' {} communities: {}'.format(idx+2, comp[idx]))\n", 58 | "\n", 59 | "# Alternative that consumes the generator lazily (use it instead of list(comp) above)\n", 60 | "#import itertools\n", 61 | "#for idx,communities in enumerate(itertools.islice(comp, k)):\n", 62 | "# print(' {} communities: {}'.format(idx+2, tuple(sorted(c) for c in communities)))" 63 | ] 64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "metadata": {}, 68 | "source": [ 69 | "## Exercise\n", 70 | "* Apply it to a more complex network and visualize the communities with Gephi (see the [list of graph generators in networkx](https://networkx.org/documentation/stable/reference/generators.html); you may test it on the \"Karate club\" or the \"Les Miserables\" graph).\n", 71 | "* Try a larger network and find the limits of scalability. What is a reasonable number of nodes for this method?" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": {}, 77 | "source": [ 78 | "## Louvain community detection\n", 79 | "\n", 80 | "Community detection with the Louvain method. You have to install an external module, `pip install python-louvain` (it is imported as `community`); see the [module GitHub page](https://github.com/taynaud/python-louvain) or the [paper](https://arxiv.org/abs/0803.0476) for more information. This method is much more efficient than the previous one. It is a greedy, non-parametric algorithm that automatically finds the number of communities by (approximately) maximizing the modularity."
81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": null, 86 | "metadata": {}, 87 | "outputs": [], 88 | "source": [ 89 | "# The Louvain module is imported as \"community\"\n", 90 | "partition = community_louvain.best_partition(G)\n", 91 | "#community_louvain.modularity(partition, G) # modularity of the partition" 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": {}, 97 | "source": [ 98 | "`partition` is a dictionary where each node id is a key and its community is the value." 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": null, 104 | "metadata": {}, 105 | "outputs": [], 106 | "source": [ 107 | "# Re-order the partition into a dictionary {community: list of nodes}\n", 108 | "clusters = {}\n", 109 | "for i, v in partition.items():\n", 110 | " clusters.setdefault(v, []).append(i)\n", 111 | "print(clusters)" 112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": {}, 117 | "source": [ 118 | "## Label propagation (Semi-supervised learning)\n", 119 | "In this approach, the graph structure is combined with values (or feature vectors) associated with the nodes. Missing node values are inferred by propagating the known values to their neighbors." 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": null, 125 | "metadata": {}, 126 | "outputs": [], 127 | "source": [ 128 | "L = nx.normalized_laplacian_matrix(G)" 129 | ] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "metadata": {}, 134 | "source": [ 135 | "We use label spreading from this [publication](http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.115.3219), which smooths the labels over the graph. We assume 2 classes with one-hot encoding, i.e. the feature vectors on the nodes have dimension 2."
136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": null, 141 | "metadata": {}, 142 | "outputs": [], 143 | "source": [ 144 | "# labels\n", 145 | "labels = np.zeros((L.shape[0],2))\n", 146 | "labels[1,0] = 1 # node 1 has first label \n", 147 | "labels[5,1] = 1 # node 5 has second label\n", 148 | "print(labels)" 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "execution_count": null, 154 | "metadata": {}, 155 | "outputs": [], 156 | "source": [ 157 | "def labelSpreading(G, labels, alpha, tol=1e-3):\n", 158 | " L = nx.normalized_laplacian_matrix(G)\n", 159 | " S = np.identity(L.shape[0]) - L.toarray()\n", 160 | " max_iter = 1000\n", 161 | " Y = np.zeros(labels.shape) \n", 162 | " for i in range(max_iter):\n", 163 | " Y_tmp = Y.copy()\n", 164 | " Y = alpha * np.dot(S, Y) + (1 - alpha) * labels\n", 165 | " if np.linalg.norm(Y-Y_tmp) < tol:\n", 166 | " print('Converged after {} iterations.'.format(i))\n", 167 | " break\n", 168 | " return Y" 169 | ] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "execution_count": null, 174 | "metadata": {}, 175 | "outputs": [], 176 | "source": [ 177 | "smooth = labelSpreading(G,labels, 0.9)\n", 178 | "print(labels)\n", 179 | "print(smooth)" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": [ 186 | "Let us plot the results" 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": null, 192 | "metadata": {}, 193 | "outputs": [], 194 | "source": [ 195 | "pos = nx.spring_layout(G, iterations=200)\n", 196 | "\n", 197 | "plt.figure(figsize=(14, 6))\n", 198 | "ax1 = plt.subplot(1, 2, 1)\n", 199 | "ax2 = plt.subplot(1, 2, 2)\n", 200 | "#ax3 = plt.subplot(1, 4, 3)\n", 201 | "nx.draw(G, pos=pos, ax=ax1, node_color=smooth[:,0], cmap=plt.cm.Blues)\n", 202 | "nx.draw(G, pos=pos, ax=ax2, node_color=smooth[:,1], cmap=plt.cm.Blues)\n", 203 | "\n", 204 | "ax1.title.set_text(\"Smoothing of class 1 over the network\")\n", 205 | 
"ax2.title.set_text(\"Smoothing of class 2 over the network\")\n", 206 | "plt.show()" 207 | ] 208 | }, 209 | { 210 | "cell_type": "code", 211 | "execution_count": null, 212 | "metadata": {}, 213 | "outputs": [], 214 | "source": [ 215 | "propagated_labels = np.argmax(smooth,axis=1)\n", 216 | "#label_dic = {k:v for k,v in enumerate(propagated_labels)}\n", 217 | "print(propagated_labels)" 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": null, 223 | "metadata": {}, 224 | "outputs": [], 225 | "source": [ 226 | "nx.draw(G, pos=pos, node_color=propagated_labels, cmap=plt.cm.Blues)\n", 227 | "plt.title(\"Final classification\")\n", 228 | "plt.show()" 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": null, 234 | "metadata": {}, 235 | "outputs": [], 236 | "source": [] 237 | } 238 | ], 239 | "metadata": { 240 | "kernelspec": { 241 | "display_name": "Python 3", 242 | "language": "python", 243 | "name": "python3" 244 | }, 245 | "language_info": { 246 | "codemirror_mode": { 247 | "name": "ipython", 248 | "version": 3 249 | }, 250 | "file_extension": ".py", 251 | "mimetype": "text/x-python", 252 | "name": "python", 253 | "nbconvert_exporter": "python", 254 | "pygments_lexer": "ipython3", 255 | "version": "3.6.9" 256 | } 257 | }, 258 | "nbformat": 4, 259 | "nbformat_minor": 4 260 | } 261 | -------------------------------------------------------------------------------- /05_Reddit_Pushshift_API.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Exploring Reddit with the pushshift API\n", 8 | "This notebook give you examples of how to use the pushshift API for querying Reddit data.\n", 9 | "\n", 10 | "* Pushshift doc: https://github.com/pushshift/api\n", 11 | "* FAQ about Pushshift: https://www.reddit.com/r/pushshift/comments/bcxguf/new_to_pushshift_read_this_faq/" 12 | ] 13 | }, 14 | { 15 | 
"cell_type": "code", 16 | "execution_count": null, 17 | "metadata": {}, 18 | "outputs": [], 19 | "source": [ 20 | "import requests\n", 21 | "import pandas as pd" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "We define a convenience function to get data from Pushshift:" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": null, 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "def get_pushshift_data(data_type, params):\n", 38 | " \"\"\"\n", 39 | " Gets data from the pushshift api.\n", 40 | " \n", 41 | " data_type can be 'comment' or 'submission'.\n", 42 | " params is a dict of query parameters sent as the request payload.\n", 43 | " \n", 44 | " Read more: https://github.com/pushshift/api\n", 45 | " \n", 46 | " This function is inspired by:\n", 47 | " https://www.jcchouinard.com/how-to-use-reddit-api-with-python/\n", 48 | " \"\"\"\n", 49 | " \n", 50 | " base_url = f\"https://api.pushshift.io/reddit/search/{data_type}/\"\n", 51 | " request = requests.get(base_url, params=params)\n", 52 | " print('Query:')\n", 53 | " print(request.url)\n", 54 | " try:\n", 55 | " data = request.json().get(\"data\", [])\n", 56 | " except ValueError: # the response body is not valid JSON\n", 57 | " print('--- Request failed ---')\n", 58 | " data = []\n", 59 | " return data\n" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "This function accepts the parameters of the Pushshift API described in the documentation at https://github.com/pushshift/api. An example is given below." 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "## Example of request to the API\n", 74 | "Let us collect the comments written in the last 7 days in the subreddit `askscience`. The number of results returned is limited to 100, the upper limit of the API."
75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": null, 80 | "metadata": {}, 81 | "outputs": [], 82 | "source": [ 83 | "# Parameters for the Pushshift API\n", 84 | "data_type = \"comment\" # \"comment\" or \"submission\": search in comments or in submissions\n", 85 | "params = {\n", 86 | " \"subreddit\" : \"askscience\", # limit to one or a list of subreddit(s)\n", 87 | " \"after\" : \"7d\", # Timeframe: epoch value or integer + \"s\", \"m\", \"h\" or \"d\" (second, minute, hour, day)\n", 88 | " \"size\" : 100, # Number of results to return (at most 100 in the API)\n", 89 | " \"author\" : \"![deleted]\" # limit to a list of authors, or exclude the authors prefixed with a \"!\" mark\n", 90 | "}\n", 91 | "# Note: the option \"aggs\" (aggregate) has been deactivated in the API\n", 92 | "\n", 93 | "data = get_pushshift_data(data_type, params)\n", 94 | "if data: # check that something was returned\n", 95 | " df = pd.DataFrame.from_records(data)\n", 96 | " print('Some of the data returned:')\n", 97 | " print(df[['author', 'subreddit', 'score', 'created_utc', 'body']].head())\n", 98 | "else:\n", 99 | " print('The returned data is empty. Change the parameters.')" 100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": {}, 105 | "source": [ 106 | "## Authors of comments\n", 107 | "Let us collect the authors of comments in a subreddit over the last few days. The next function works around the limit on the number of results by sending several queries, each one excluding the authors already collected, so that no author is returned twice."
108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": null, 113 | "metadata": {}, 114 | "outputs": [], 115 | "source": [ 116 | "# Get the list of unique authors of comments in the API results\n", 117 | "# bypass the limit of 100 results by sending multiple queries\n", 118 | "def get_unique_authors(n_results, params):\n", 119 | " results_per_request = 100 # default nb of results per query\n", 120 | " n_queries = n_results // results_per_request + 1\n", 121 | " author_list = []\n", 122 | " author_neg_list = [\"![deleted]\"]\n", 123 | " for query in range(n_queries):\n", 124 | " params[\"author\"] = author_neg_list\n", 125 | " data = get_pushshift_data(data_type=\"comment\", params=params)\n", 126 | " df = pd.DataFrame.from_records(data)\n", 127 | " if df.empty:\n", 128 | " return author_list\n", 129 | " authors = list(df['author'].unique())\n", 130 | " # add ! mark\n", 131 | " authors_neg = [\"!\"+ a for a in authors]\n", 132 | " author_list += authors\n", 133 | " author_neg_list += authors_neg\n", 134 | " return author_list" 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": {}, 140 | "source": [ 141 | "Let us make a list of authors commenting on the subreddit \"askscience\"." 
142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": null, 147 | "metadata": {}, 148 | "outputs": [], 149 | "source": [ 150 | "# Collect the authors of comments from the last 2 days, querying about \"n_results\" comments\n", 151 | "subreddit = \"askscience\"\n", 152 | "data_type = \"comment\"\n", 153 | "params = {\n", 154 | " \"subreddit\" : subreddit,\n", 155 | " \"after\" : \"2d\"\n", 156 | "}\n", 157 | "n_results = 500\n", 158 | "author_list = get_unique_authors(n_results, params)\n", 159 | "print(\"Number of authors:\",len(author_list))" 160 | ] 161 | }, 162 | { 163 | "cell_type": "markdown", 164 | "metadata": {}, 165 | "source": [ 166 | "From the list of authors obtained, let us find where else they commented (in other subreddits)." 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": null, 172 | "metadata": {}, 173 | "outputs": [], 174 | "source": [ 175 | "# Collect the subreddits where the authors wrote comments and the number of comments\n", 176 | "from collections import Counter\n", 177 | "data_type = \"comment\"\n", 178 | "params = {\n", 179 | " \"size\" : 100\n", 180 | "}\n", 181 | "subreddits_count = Counter()\n", 182 | "for author in author_list:\n", 183 | " params[\"author\"] = author\n", 184 | " print(params[\"author\"])\n", 185 | " data = get_pushshift_data(data_type=data_type, params=params)\n", 186 | " if data: # skip if the request failed and data is empty\n", 187 | " df = pd.DataFrame.from_records(data)\n", 188 | " subreddits_count += Counter(dict(df['subreddit'].value_counts()))" 189 | ] 190 | }, 191 | { 192 | "cell_type": "markdown", 193 | "metadata": {}, 194 | "source": [ 195 | "## Network of subreddits (ego-graph)\n", 196 | "Let us build the ego-graph of the subreddit: other subreddits are connected to the main one when its users also commented in them."
197 | ] 198 | }, 199 | { 200 | "cell_type": "code", 201 | "execution_count": null, 202 | "metadata": {}, 203 | "outputs": [], 204 | "source": [ 205 | "# module for networks\n", 206 | "import networkx as nx" 207 | ] 208 | }, 209 | { 210 | "cell_type": "code", 211 | "execution_count": null, 212 | "metadata": {}, 213 | "outputs": [], 214 | "source": [ 215 | "threshold = 0.05\n", 216 | "G = nx.Graph()\n", 217 | "G.add_node(subreddit)\n", 218 | "self_refs = subreddits_count[subreddit]\n", 219 | "for sub,value in subreddits_count.items():\n", 220 | " post_ratio = value/self_refs\n", 221 | " if post_ratio >= threshold:\n", 222 | " G.add_edge(subreddit,sub, weight=post_ratio)\n", 223 | "print(\"Total number of edges in the graph:\",G.number_of_edges())" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "Here is an alternative way of generating the graph using pandas dataframes instead of a for loop (it might scale better on bigger graphs)." 231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": null, 236 | "metadata": {}, 237 | "outputs": [], 238 | "source": [ 239 | "threshold = 0.05\n", 240 | "subreddits_count_df = pd.DataFrame.from_dict(subreddits_count, orient='index', columns=['total'])\n", 241 | "subreddits_ratio_df = subreddits_count_df/subreddits_count_df.loc[subreddit]\n", 242 | "subreddits_ratio_df.rename(columns={'total': 'weight'}, inplace=True)\n", 243 | "filtered_sr_df = subreddits_ratio_df[subreddits_ratio_df['weight'] >= threshold].copy() # filter weights < threshold\n", 244 | "filtered_sr_df['source'] = subreddit\n", 245 | "filtered_sr_df['target'] = filtered_sr_df.index\n", 246 | "Gdf = nx.from_pandas_edgelist(filtered_sr_df, source='source', target='target', edge_attr=True)\n", 247 | "print(\"Total number of edges in the graph:\",Gdf.number_of_edges())" 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "execution_count": null, 253 | "metadata": {}, 254 | "outputs": [], 255 | 
"source": [ 256 | "# Write the graph to a file\n", 257 | "path = 'egograph.gexf'\n", 258 | "nx.write_gexf(G,path)" 259 | ] 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "metadata": {}, 264 | "source": [ 265 | "## Network of subreddit neighbors\n", 266 | "This second collection makes a distinction between the related subreddits. For each author, all the subreddits where he/she commented will be connected together. The weight of each connection will be proportional to the number of users commenting in both subreddits joined by the connection. The ego-graph becomes an approximate neighbor network for the central subreddit." 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": null, 272 | "metadata": { 273 | "scrolled": true 274 | }, 275 | "outputs": [], 276 | "source": [ 277 | "data_type = \"comment\"\n", 278 | "params = {\n", 279 | " \"size\" : 100\n", 280 | "}\n", 281 | "count_list = []\n", 282 | "for author in author_list:\n", 283 | " params[\"author\"] = author\n", 284 | " print(params[\"author\"])\n", 285 | " data = get_pushshift_data(data_type=data_type, params=params)\n", 286 | " if data:\n", 287 | " df = pd.DataFrame.from_records(data)\n", 288 | " count_list.append(Counter(dict(df['subreddit'].value_counts())))" 289 | ] 290 | }, 291 | { 292 | "cell_type": "code", 293 | "execution_count": null, 294 | "metadata": {}, 295 | "outputs": [], 296 | "source": [ 297 | "import itertools\n", 298 | "threshold = 0.05\n", 299 | "G = nx.Graph()\n", 300 | "\n", 301 | "for author_sub_count in count_list:\n", 302 | " sub_list = author_sub_count.most_common(10)\n", 303 | " # Compute all the combinations of subreddit pairs\n", 304 | " sub_combinations = list(itertools.combinations(sub_list, 2))\n", 305 | " for sub_pair in sub_combinations:\n", 306 | " node1 = sub_pair[0][0]\n", 307 | " node2 = sub_pair[1][0]\n", 308 | " if G.has_edge(node1, node2):\n", 309 | " G[node1][node2]['weight'] +=1\n", 310 | " else:\n", 311 | " G.add_edge(node1, node2, 
weight=1)\n", 312 | "print(\"Total number of edges {}, and nodes {}\".format(G.number_of_edges(),G.number_of_nodes()))" 313 | ] 314 | }, 315 | { 316 | "cell_type": "code", 317 | "execution_count": null, 318 | "metadata": {}, 319 | "outputs": [], 320 | "source": [ 321 | "# Sparsify the graph\n", 322 | "to_remove = [edge for edge in G.edges.data() if edge[2]['weight'] < 2]\n", 323 | "G.remove_edges_from(to_remove)" 324 | ] 325 | }, 326 | { 327 | "cell_type": "code", 328 | "execution_count": null, 329 | "metadata": {}, 330 | "outputs": [], 331 | "source": [ 332 | "# Remove isolated nodes\n", 333 | "G.remove_nodes_from(list(nx.isolates(G)))\n", 334 | "print(\"Total number of edges {}, and nodes {}\".format(G.number_of_edges(),G.number_of_nodes()))" 335 | ] 336 | }, 337 | { 338 | "cell_type": "code", 339 | "execution_count": null, 340 | "metadata": {}, 341 | "outputs": [], 342 | "source": [ 343 | "# Write the graph to a file\n", 344 | "path = 'graph.gexf'\n", 345 | "nx.write_gexf(G,path)" 346 | ] 347 | }, 348 | { 349 | "cell_type": "markdown", 350 | "metadata": {}, 351 | "source": [ 352 | "An example of the graph visualization you can obtain using Gephi:\n", 353 | "![Reddit neighbors](figures/redditneighbors.png \"Reddit neighbors\")" 354 | ] 355 | }, 356 | { 357 | "cell_type": "code", 358 | "execution_count": null, 359 | "metadata": {}, 360 | "outputs": [], 361 | "source": [] 362 | } 363 | ], 364 | "metadata": { 365 | "kernelspec": { 366 | "display_name": "Python 3", 367 | "language": "python", 368 | "name": "python3" 369 | }, 370 | "language_info": { 371 | "codemirror_mode": { 372 | "name": "ipython", 373 | "version": 3 374 | }, 375 | "file_extension": ".py", 376 | "mimetype": "text/x-python", 377 | "name": "python", 378 | "nbconvert_exporter": "python", 379 | "pygments_lexer": "ipython3", 380 | "version": "3.6.9" 381 | } 382 | }, 383 | "nbformat": 4, 384 | "nbformat_minor": 4 385 | } 386 | 
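The co-commenting graph construction in `05_Reddit_Pushshift_API.ipynb` above (each author links their 10 most frequent subreddits pairwise, edges shared by fewer than 2 authors are dropped, then isolated nodes are removed) can be condensed into a single function. The sketch below is an illustrative refactoring, not part of the original notebook: the function name `build_cocomment_graph`, its `top_k` and `min_weight` parameters, and the toy `Counter` data standing in for the Pushshift results are all hypothetical, so it runs without network access.

```python
import itertools
from collections import Counter

import networkx as nx


def build_cocomment_graph(count_list, top_k=10, min_weight=2):
    """Build a weighted subreddit co-commenting graph (hypothetical helper).

    count_list: one Counter per author, mapping subreddit -> number of comments.
    Each author links every pair among their top_k subreddits; an edge weight
    counts how many authors share that pair.
    """
    G = nx.Graph()
    for author_sub_count in count_list:
        top_subs = [sub for sub, _ in author_sub_count.most_common(top_k)]
        for node1, node2 in itertools.combinations(top_subs, 2):
            if G.has_edge(node1, node2):
                G[node1][node2]['weight'] += 1
            else:
                G.add_edge(node1, node2, weight=1)
    # Sparsify: drop edges supported by fewer than min_weight authors
    weak = [(u, v) for u, v, w in G.edges(data='weight') if w < min_weight]
    G.remove_edges_from(weak)
    # Remove the nodes left without any connection
    G.remove_nodes_from(list(nx.isolates(G)))
    return G


# Hypothetical toy data standing in for the Pushshift results
count_list = [
    Counter({'python': 5, 'datascience': 3, 'learnpython': 1}),
    Counter({'python': 2, 'datascience': 4}),
    Counter({'python': 7, 'learnpython': 2, 'gaming': 1}),
    Counter({'gaming': 9}),
]
G = build_cocomment_graph(count_list)
print(sorted(G.edges(data='weight')))
```

On this toy input only the pairs `python`/`datascience` and `python`/`learnpython` are shared by two authors, so they are the only edges kept, and `gaming` ends up isolated and is removed. Keeping the weight as a plain author count (rather than normalizing it) matches the sparsification threshold of 2 used in the notebook.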
-------------------------------------------------------------------------------- /06_exploring_wikipedia.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "247e90a8", 6 | "metadata": {}, 7 | "source": [ 8 | "# Exploring Wikipedia using its API\n", 9 | "\n", 10 | "No need to introduce Wikipedia: the online encyclopedia is now the default online reference for many subjects.\n", 11 | "Each page contains one or more links to other pages, so it is natural to apply the graph exploration methods we have seen previously. The whole graph of Wikipedia pages is quite big: the English version has ca. 7M pages and 500M links, and the graph is highly connected.\n", 12 | "\n", 13 | "A great thing about Wikipedia is that almost all of its data is open. Data is available either as [full dumps](https://dumps.wikimedia.org/) or [through the API](https://www.mediawiki.org/wiki/API:Main_page). Dumps are better suited for offline processing. There are a number of tools dedicated to processing Wikipedia data dumps, e.g. [Sparkwiki](https://github.com/epfl-lts2/sparkwiki). In this tutorial we will use the API to access the [English edition of Wikipedia](https://en.wikipedia.org); adapting the code to another language is fairly straightforward.\n", 14 | "\n", 15 | "## Experimenting with the API using the sandbox\n", 16 | "The API offers many possibilities; we only use the `query` action to retrieve data about pages. The [documentation](https://www.mediawiki.org/w/api.php?action=help&modules=query) provides a list of properties that can be retrieved for each page.\n", 17 | "\n", 18 | "The [API Sandbox](https://en.wikipedia.org/wiki/Special:ApiSandbox) helps test and build queries quickly. 
If we need to retrieve\n", 19 | "the categories of Albert Einstein's Wikipedia page, the query can be tried [here](https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&prop=categories&titles=Albert%20Einstein) or with the following sandbox URL:\n", 20 | "```\n", 21 | "https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&prop=categories&titles=Albert%20Einstein\n", 22 | "```\n", 23 | "\n", 24 | "Once we are satisfied with the data we retrieve, the query can be sent directly to the API. Let us load the categories of Einstein's Wikipedia page using Python `requests`. We add the `cllimit` parameter to retrieve the first 20 categories." 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": null, 30 | "id": "e82f4995", 31 | "metadata": {}, 32 | "outputs": [], 33 | "source": [ 34 | "import requests" 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": null, 40 | "id": "57929177", 41 | "metadata": {}, 42 | "outputs": [], 43 | "source": [ 44 | "r = requests.get('https://en.wikipedia.org/w/api.php?action=query&format=json&prop=categories&titles=Albert%20Einstein&cllimit=20')" 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": null, 50 | "id": "eddb7188", 51 | "metadata": {}, 52 | "outputs": [], 53 | "source": [ 54 | "einstein_cats = r.json()\n", 55 | "einstein_cats" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "id": "c7b683d9", 61 | "metadata": {}, 62 | "source": [ 63 | "The query result is JSON-formatted, and `r.json()` converts it directly to a Python dict." 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": null, 69 | "id": "7fd30d64", 70 | "metadata": {}, 71 | "outputs": [], 72 | "source": [ 73 | "einstein_cats['query']['pages']['736']['title']" 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "id": "e007a790", 79 | "metadata": {}, 80 | "source": [ 81 | "The query only returns the first 20 categories of the page. 
While the `cllimit` parameter could be increased, some cases require multiple queries to retrieve all the data of a page. The response contains a `continue` key with the information needed to retrieve the following categories. In this case, appending `clcontinue=736|American_science_writers` to our query retrieves the next categories." 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": null, 87 | "id": "ceac8b59", 88 | "metadata": {}, 89 | "outputs": [], 90 | "source": [ 91 | "rc = requests.get('https://en.wikipedia.org/w/api.php?action=query&format=json&prop=categories&titles=Albert%20Einstein&cllimit=20&clcontinue=736|American_science_writers')" 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": null, 97 | "id": "e74c2acb", 98 | "metadata": {}, 99 | "outputs": [], 100 | "source": [ 101 | "rc.json()" 102 | ] 103 | }, 104 | { 105 | "cell_type": "markdown", 106 | "id": "bc1bbd05", 107 | "metadata": {}, 108 | "source": [ 109 | "*Exercise*: experiment with queries yourself and retrieve data for different pages. The `categories`, `links` and `pageviews` properties are of particular interest to us.\n", 110 | "\n", 111 | "## Using the Wikipedia-API package\n", 112 | "While it is fairly simple to fully automate the retrieval of all the data over multiple requests, we leave this as an exercise for the reader and use a helper library that handles it for us.\n", 113 | "\n", 114 | "There are multiple options available; we will use the [Wikipedia-API](https://github.com/martin-majlis/Wikipedia-API) library. You may want to check its [documentation](https://wikipedia-api.readthedocs.io/en/latest/API.html)."
115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": null, 120 | "id": "275d5bac", 121 | "metadata": {}, 122 | "outputs": [], 123 | "source": [ 124 | "import wikipediaapi" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": null, 130 | "id": "e3fa4e43", 131 | "metadata": {}, 132 | "outputs": [], 133 | "source": [ 134 | "# create the api object\n", 135 | "api = wikipediaapi.Wikipedia('en')" 136 | ] 137 | }, 138 | { 139 | "cell_type": "markdown", 140 | "id": "d8464926", 141 | "metadata": {}, 142 | "source": [ 143 | "We can simply create a `Page` object and access its properties, which are lazily evaluated to limit the number of requests actually sent to the Wikipedia API." 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": null, 149 | "id": "2bd992d7", 150 | "metadata": {}, 151 | "outputs": [], 152 | "source": [ 153 | "albert = api.page('Albert Einstein')" 154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": null, 159 | "id": "02afb90c", 160 | "metadata": {}, 161 | "outputs": [], 162 | "source": [ 163 | "len(albert.categories)" 164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": null, 169 | "id": "d334fb88", 170 | "metadata": {}, 171 | "outputs": [], 172 | "source": [ 173 | "len(albert.links)" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "id": "4927f76e", 179 | "metadata": {}, 180 | "source": [ 181 | "Increasing the logging level shows which requests are actually sent:" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": null, 187 | "id": "dcb694df", 188 | "metadata": {}, 189 | "outputs": [], 190 | "source": [ 191 | "import sys\n", 192 | "# helper function\n", 193 | "def set_wikipediaapi_logging(level):\n", 194 | " wikipediaapi.logging.getLogger('wikipediaapi').handlers.clear() # ugly - remove handlers to avoid duplicates\n", 195 | " wikipediaapi.log.setLevel(level=level)\n", 196 | " # Set handler 
if you use Python in interactive mode\n", 197 | " out_hdlr = wikipediaapi.logging.StreamHandler(sys.stderr)\n", 198 | " out_hdlr.setFormatter(wikipediaapi.logging.Formatter('%(asctime)s %(message)s'))\n", 199 | " out_hdlr.setLevel(level)\n", 200 | " wikipediaapi.log.addHandler(out_hdlr)\n", 201 | "\n", 202 | "set_wikipediaapi_logging(wikipediaapi.logging.INFO)" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": null, 208 | "id": "26b52d45", 209 | "metadata": {}, 210 | "outputs": [], 211 | "source": [ 212 | "quantum = api.page('Quantum mechanics')" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": null, 218 | "id": "d1a4d1bc", 219 | "metadata": {}, 220 | "outputs": [], 221 | "source": [ 222 | "quantum.categories" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": null, 228 | "id": "a9553bb6", 229 | "metadata": {}, 230 | "outputs": [], 231 | "source": [ 232 | "quantum.summary" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": null, 238 | "id": "cf5a8893", 239 | "metadata": {}, 240 | "outputs": [], 241 | "source": [ 242 | "set_wikipediaapi_logging(wikipediaapi.logging.WARN) # reset logging" 243 | ] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "id": "10dbe049", 248 | "metadata": {}, 249 | "source": [ 250 | "## Graph exploration\n", 251 | "The methods presented in this tutorial can be used to explore the Wikipedia page graph. However, the package [littleballoffur](https://github.com/benedekrozemberczki/littleballoffur) used to demonstrate the concepts cannot be used directly with the API. 
Therefore those methods have been implemented in a different package: [spikexplore](https://github.com/epfl-lts2/spikexplore).\n", 252 | "\n", 253 | "The package has no release yet, so install it directly from GitHub using pip:\n", 254 | "```\n", 255 | "pip install git+https://github.com/epfl-lts2/spikexplore.git\n", 256 | "```" 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": null, 262 | "id": "8cc9a521", 263 | "metadata": {}, 264 | "outputs": [], 265 | "source": [ 266 | "import networkx as nx\n", 267 | "from spikexplore import graph_explore\n", 268 | "from spikexplore.backends.wikipedia import WikipediaNetwork\n", 269 | "from spikexplore.config import SamplingConfig, GraphConfig, DataCollectionConfig, WikipediaConfig" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "id": "61f1ac20", 275 | "metadata": {}, 276 | "source": [ 277 | "Spikexplore supports different backends: NetworkX (mostly for testing; in this case you can also use littleballoffur), Twitter (requires a developer account to obtain API keys), and Wikipedia.\n", 278 | "\n", 279 | "You must first create the sampling backend you will use to acquire data:" 280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": null, 285 | "id": "adb8bd43", 286 | "metadata": {}, 287 | "outputs": [], 288 | "source": [ 289 | "wiki_config = WikipediaConfig(lang='en') # adapt to your favorite language\n", 290 | "wiki_config.pages_ignored = [] # you can supply a list of page titles you want to ignore\n", 291 | "sampling_backend = WikipediaNetwork(wiki_config)" 292 | ] 293 | }, 294 | { 295 | "cell_type": "markdown", 296 | "id": "a1fc3047", 297 | "metadata": {}, 298 | "source": [ 299 | "A second configuration object contains the parameters used for the graph creation. In our case the page graph is unweighted, so keep the minimum edge weight at 1."
300 | ] 301 | }, 302 | { 303 | "cell_type": "code", 304 | "execution_count": null, 305 | "id": "a4e53fe4", 306 | "metadata": {}, 307 | "outputs": [], 308 | "source": [ 309 | "graph_config = GraphConfig(min_degree=1, min_weight=1)" 310 | ] 311 | }, 312 | { 313 | "cell_type": "markdown", 314 | "id": "de7d699f", 315 | "metadata": {}, 316 | "source": [ 317 | "Exploration parameters are stored in another object. In the example below, we randomly sample 10% of the edges encountered at each hop. The `max_nodes_per_hop` parameter provides additional fine-tuning to limit the growth of the graph, by limiting the number of new neighbors for each node. The `fireball` expansion type makes the exploration close to the forest fire sampling seen previously." 318 | ] 319 | }, 320 | { 321 | "cell_type": "code", 322 | "execution_count": null, 323 | "id": "91e69885", 324 | "metadata": {}, 325 | "outputs": [], 326 | "source": [ 327 | "data_collection_config = DataCollectionConfig(exploration_depth=2, random_subset_mode=\"percent\",\n", 328 | " random_subset_size=10, expansion_type=\"fireball\",\n", 329 | " max_nodes_per_hop=100)" 330 | ] 331 | }, 332 | { 333 | "cell_type": "markdown", 334 | "id": "6f8744e9", 335 | "metadata": {}, 336 | "source": [ 337 | "Finally, the graph and data collection configuration objects are combined into a single object that is passed to spikexplore:" 338 | ] 339 | }, 340 | { 341 | "cell_type": "code", 342 | "execution_count": null, 343 | "id": "2de68a8b", 344 | "metadata": {}, 345 | "outputs": [], 346 | "source": [ 347 | "sampling_config = SamplingConfig(graph_config, data_collection_config)" 348 | ] 349 | }, 350 | { 351 | "cell_type": "markdown", 352 | "id": "b77929a9", 353 | "metadata": {}, 354 | "source": [ 355 | "Let us define the starting points of the exploration (you might want to adapt them to your needs):" 356 | ] 357 | }, 358 | { 359 | "cell_type": "code", 360 | "execution_count": null, 361 | "id": "524e680b", 362 | "metadata": {}, 363 | "outputs": [], 364 | 
"source": [ 365 | "initial_nodes = ['Albert Einstein', 'Quantum mechanics', 'Theory of relativity']" 366 | ] 367 | }, 368 | { 369 | "cell_type": "code", 370 | "execution_count": null, 371 | "id": "74f49217", 372 | "metadata": {}, 373 | "outputs": [], 374 | "source": [ 375 | "#set_wikipediaapi_logging(wikipediaapi.logging.INFO) # optional, peek under the hood at what is happening !! PRINTS A LOT OF OUTPUT !!\n", 376 | "graph_result, _ = graph_explore.explore(sampling_backend, initial_nodes, sampling_config)\n", 377 | "print('Collected {} nodes and {} edges'.format(graph_result.number_of_nodes(), graph_result.number_of_edges()))" 378 | ] 379 | }, 380 | { 381 | "cell_type": "code", 382 | "execution_count": null, 383 | "id": "617d7ce9", 384 | "metadata": {}, 385 | "outputs": [], 386 | "source": [ 387 | "nx.write_gexf(graph_result, 'wiki_einstein.gexf') # save result to view in Gephi" 388 | ] 389 | }, 390 | { 391 | "cell_type": "markdown", 392 | "id": "56782d5a", 393 | "metadata": {}, 394 | "source": [ 395 | "You can now open the result graph in Gephi and see if the communities make sense. Try different sets of parameters (sampling ratio, initial nodes, etc.).\n", 396 | "\n", 397 | "With only 3 starting nodes and 2 hops, the graph grows quickly and takes around 1 minute to collect. Adding an extra hop would take longer (but should remain reasonable); try it yourself if you have some time. Given the large number of connections in Wikipedia (the Albert Einstein page has more than 1000 links), exploring with Snowball sampling would lead to a much larger number of connections. You can try it by setting `random_subset_size` to 100 and `max_nodes_per_hop` to a very large value, so that the exploration is not capped in any way." 398 | ] 399 | }, 400 | { 401 | "cell_type": "code", 402 | "execution_count": null, 403 | "id": "3798d18b", 404 | "metadata": {}, 405 | "outputs": [], 406 | "source": [ 407 | "# WARNING - this is going to take some time to collect (ca. 
10 minutes) !!\n", 408 | "data_collection_config_full = DataCollectionConfig(exploration_depth=2, random_subset_mode=\"percent\",\n", 409 | " random_subset_size=100, expansion_type=\"coreball\",\n", 410 | " degree=2, max_nodes_per_hop=1000000)\n", 411 | "sampling_config_full = SamplingConfig(graph_config, data_collection_config_full)\n", 412 | "# Uncomment this if you want to collect the full neighborhood\n", 413 | "# graph_full, _ = graph_explore.explore(sampling_backend, initial_nodes, sampling_config_full)\n", 414 | "# print('Collected {} nodes and {} edges'.format(graph_full.number_of_nodes(), graph_full.number_of_edges()))" 415 | ] 416 | }, 417 | { 418 | "cell_type": "markdown", 419 | "id": "b50b146c", 420 | "metadata": {}, 421 | "source": [ 422 | "Collecting the full neighborhood of the 3 initial nodes using Snowball sampling yields a graph with ca. 1800 nodes and 126000 edges! You can download the resulting graph using [this link](https://drive.switch.ch/index.php/s/qJr6qqBLnOEOR2f)." 423 | ] 424 | }, 425 | { 426 | "cell_type": "markdown", 427 | "id": "2ffd130b", 428 | "metadata": {}, 429 | "source": [ 430 | "## Adding features to the graph\n", 431 | "Thanks to the Wikipedia API, it is possible to retrieve the number of times a page has been viewed using the `pageviews` property. Unfortunately, this is not part of the Wikipedia-API package, so we need a small helper function to get those values." 432 | ] 433 | }, 434 | { 435 | "cell_type": "code", 436 | "execution_count": null, 437 | "id": "f9faaf63", 438 | "metadata": {}, 439 | "outputs": [], 440 | "source": [ 441 | "import numpy as np\n", 442 | "from urllib.parse import quote\n", 443 | "\n", 444 | "# get the pageview data for a list of pages, in chunks. This is not efficient: requests could be sent simultaneously\n", 445 | "# for multiple pages. 
There is no error checking: sending bad data will raise exceptions.\n", 446 | "def get_pageviews(page_titles, number_of_days=7, lang='en'):\n", 447 | " pageviews = {}\n", 448 | " # pageview query supports at most 50 pages per request (in practice MUCH less), split page titles into smaller chunks\n", 449 | " chunk_size = 5 # try bigger values, but you may get empty results\n", 450 | " nodesplit = np.array_split(list(page_titles), max(1, len(page_titles)//chunk_size)) # at least one chunk, even for fewer than chunk_size titles\n", 451 | " \n", 452 | " # get pageview data for each chunk\n", 453 | " for ns in nodesplit:\n", 454 | " pages = quote('|'.join(ns))\n", 455 | " url = 'https://{}.wikipedia.org/w/api.php?action=query&format=json&prop=pageviews&titles={}&pvipdays={}'.format(lang, pages, number_of_days)\n", 456 | " res = requests.get(url).json()\n", 457 | " for k,v in res['query']['pages'].items():\n", 458 | " pv = [n for n in v.get('pageviews', {}).values() if n is not None] # some pageviews values might be None -> skip them, but keep genuine zero counts\n", 459 | " pageviews[v['title']] = pv\n", 460 | "\n", 461 | " return pageviews" 462 | ] 463 | }, 464 | { 465 | "cell_type": "markdown", 466 | "id": "94bd5ee1", 467 | "metadata": {}, 468 | "source": [ 469 | "Let us retrieve the pageviews of all the nodes in the graph over the last 10 days:" 470 | ] 471 | }, 472 | { 473 | "cell_type": "code", 474 | "execution_count": null, 475 | "id": "562436a1", 476 | "metadata": {}, 477 | "outputs": [], 478 | "source": [ 479 | "graph_stats = get_pageviews(graph_result.nodes(), number_of_days=10)\n", 480 | "graph_stats" 481 | ] 482 | }, 483 | { 484 | "cell_type": "markdown", 485 | "id": "d1a7b986", 486 | "metadata": {}, 487 | "source": [ 488 | "We now compute the mean daily pageviews of each page, to be stored as a node attribute in the graph:" 489 | ] 490 | }, 491 | { 492 | "cell_type": "code", 493 | "execution_count": null, 494 | "id": "d4177d08", 495 | "metadata": {}, 496 | "outputs": [], 497 | "source": [ 498 | "pageviews_mean_graph = {k: np.mean(v) for k,v in 
graph_stats.items()}" 499 | ] 500 | }, 501 | { 502 | "cell_type": "markdown", 503 | "id": "57c4ecdc", 504 | "metadata": {}, 505 | "source": [ 506 | "Store these values as a node attribute and save the resulting graph:" 507 | ] 508 | }, 509 | { 510 | "cell_type": "code", 511 | "execution_count": null, 512 | "id": "3905f5a6", 513 | "metadata": {}, 514 | "outputs": [], 515 | "source": [ 516 | "nx.set_node_attributes(graph_result, pageviews_mean_graph, name='mean_pageviews')\n", 517 | "nx.write_gexf(graph_result, 'wiki_einstein_pageviews.gexf')" 518 | ] 519 | }, 520 | { 521 | "cell_type": "markdown", 522 | "id": "5c548433", 523 | "metadata": {}, 524 | "source": [ 525 | "You can now open the graph in Gephi and set the node size according to the number of visits each page received. These values can also be used to filter out nodes that receive few visits." 526 | ] 527 | }, 528 | { 529 | "cell_type": "code", 530 | "execution_count": null, 531 | "id": "b5872294", 532 | "metadata": {}, 533 | "outputs": [], 534 | "source": [ 535 | "print('Average pageviews range from min={} to max={}'.format(np.nanmin(list(pageviews_mean_graph.values())), np.nanmax(list(pageviews_mean_graph.values())))) # nan-aware: pages without data have a NaN mean" 536 | ] 537 | }, 538 | { 539 | "cell_type": "markdown", 540 | "id": "405dca06", 541 | "metadata": {}, 542 | "source": [ 543 | "*Exercise*: Acquire a bigger graph with pageviews data and use it to remove nodes that receive few visits.
" 544 | ] 545 | }, 546 | { 547 | "cell_type": "code", 548 | "execution_count": null, 549 | "id": "2bd46193", 550 | "metadata": {}, 551 | "outputs": [], 552 | "source": [] 553 | } 554 | ], 555 | "metadata": { 556 | "kernelspec": { 557 | "display_name": "Python 3", 558 | "language": "python", 559 | "name": "python3" 560 | }, 561 | "language_info": { 562 | "codemirror_mode": { 563 | "name": "ipython", 564 | "version": 3 565 | }, 566 | "file_extension": ".py", 567 | "mimetype": "text/x-python", 568 | "name": "python", 569 | "nbconvert_exporter": "python", 570 | "pygments_lexer": "ipython3", 571 | "version": "3.9.2" 572 | } 573 | }, 574 | "nbformat": 4, 575 | "nbformat_minor": 5 576 | } 577 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2021 EPFL LTS2 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Large Scale Graph Mining: Visualization, Exploration, and Analysis 2 | 3 | This repository contains notebooks and material for the tutorial on large network analysis, presented at The Web Conf 2021. 4 | 5 | You can find more information on the [Tutorial webpage](https://lts2.epfl.ch/reproducible-research/graph-exploration/). 6 | 7 | # Program 8 | 9 | * Introduction, setting up the environment, general presentation, building a graph from data using the Python modules Pandas and Networkx. 10 | * Standard network properties (small world, hubs, centrality, page rank, degree distribution), experiments with the Python module Networkx. 11 | * Graph visualization with Gephi. Layouts, visualizing node properties with color, size. Communities, centrality, page rank. Limits of visualization. [Link to tutorials](https://github.com/mizvol/gephi-tutorials/). Walkthrough videos: [Layouts](https://www.youtube.com/watch?v=aRZIeTroUog), [Publishing graphs online](https://www.youtube.com/watch?v=ok4iFOe9niU). 12 | * Principles of graph exploration and sampling. Reducing to a subgraph of interest with graph sampling, experiments on small toy graph models with the Python library [Little Ball of Fur](https://github.com/benedekrozemberczki/littleballoffur) (random walks, snowball sampling, Forest Fire, and the more advanced Spikyball). 13 | * Conclusion and debriefing of Part I. Challenges, problems, data bottlenecks in large graphs and how to overcome them.
14 | * Some unsupervised and semi-supervised machine learning on graphs: clustering and community detection, label propagation, combining the graph structure with data on nodes (attributed graph). How to apply these methods to large graphs: relation with the graph sampling of Part I. 15 | * Exploring online data: online graph sampling via an API where access is limited. Examples with Wikipedia and social networks (Reddit Pushshift API or Twitter). 16 | * Mini project on real and fresh large graph data, using an API and combining what has been learned during the day. 17 | 18 | Each section (except the graph visualization with Gephi) is associated with a Jupyter notebook. These notebooks are described below. 19 | 20 | # Notebooks 21 | 22 | You can run the notebooks online by clicking on the Binder button: 23 | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/epfl-lts2/GraphMining-TheWebConf2021/HEAD) 24 | 25 | You can also run them locally on your computer; see the [Local installation](#local-installation) section. 26 | 27 | The visualization part uses [Gephi](https://gephi.org): you need to [download the 28 | latest version](https://gephi.org/users/download/) and install it on your computer. 29 | It requires a working Java runtime environment. 30 | 31 | ## Outline 32 | 33 | The tutorial explores the world of (large) graphs using the following notebooks: 34 | * `01_graph_from_edge_list.ipynb` is the introduction to building and handling a graph with Python, using `networkx`. 35 | * `02_basic_graph_properties.ipynb` shows the different standard methods and tools to get information about the graph structure (graph diameter, connectivity, degree distribution...).
36 | * [Graph visualization using Gephi](https://github.com/mizvol/gephi-tutorials). 37 | * `03_graph_exploration_and_sampling.ipynb` presents the algorithms for large graph exploration and graph sampling. 38 | * `04_ML_on_graphs.ipynb` lets you experiment with unsupervised and semi-supervised machine learning on graphs. 39 | * `05_Reddit_Pushshift_API.ipynb` shows in practice how to explore the Reddit social network through the Reddit Pushshift API. 40 | * `06_exploring_wikipedia.ipynb` uses the Wikipedia API with the Spikyball approach to efficiently build a graph of pages and hyperlinks, enriched by additional page information. 41 | 42 | ![Reddit neighbors](figures/redditneighbors.png "Reddit neighbors") 43 | 44 | ## Local installation 45 | 46 | If you prefer running the notebooks locally instead of on Binder, you need to set 47 | up a number of tools. 48 | 49 | For a local installation, you will need [git], [Python], and packages from the [Python scientific stack][scipy]. 50 | If you don't know how to install those on your platform, we recommend installing [Miniconda] or [Anaconda], a distribution of the [conda] package and environment manager. 51 | Follow the instructions below to install it and create an environment for the course. 52 | 53 | 1. Download the Python 3.x installer for Windows, macOS, or Linux from and install with default settings. 54 | Skip this step if you already have conda installed (from [Miniconda] or [Anaconda]). 55 | * Windows: double-click on `Miniconda3-latest-Windows-x86_64.exe`. 56 | * macOS: double-click on `Miniconda3-latest-MacOSX-x86_64.pkg` or run `bash Miniconda3-latest-MacOSX-x86_64.sh` in a terminal. 57 | * Linux: run `bash Miniconda3-latest-Linux-x86_64.sh` in a terminal or use your package manager. 58 | 1. Open a terminal. 59 | Windows: open the Anaconda Prompt from the Start menu. 60 | 1. Install git with `conda install git`. 61 | 1.
Navigate to the folder where you want to store the course material with `cd path/to/folder`. 62 | 1. Download this repository with `git clone https://github.com/epfl-lts2/GraphMining-TheWebConf2021`. 63 | 1. Enter the repository with `cd GraphMining-TheWebConf2021`. 64 | 1. Create an environment with the packages required for the course with `conda env create -f environment_local.yml`. 65 | 66 | Every time you want to work, do the following: 67 | 68 | 1. Open a terminal. 69 | Windows: open the Anaconda Prompt from the Start menu. 70 | 1. Activate the environment with `conda activate graphmining-twc21-local`. 71 | 1. Navigate to the folder where you stored the course material with `cd path/to/folder/GraphMining-TheWebConf2021`. 72 | 1. Start Jupyter with `jupyter lab`. 73 | The command should open a new tab in your web browser. 74 | 1. Edit and run the notebooks from your browser. 75 | 1. Once done, you can run `conda deactivate` to leave the `graphmining-twc21-local` environment. 76 | 77 | [git]: https://git-scm.com 78 | [python]: https://www.python.org 79 | [scipy]: https://www.scipy.org 80 | [anaconda]: https://www.anaconda.com/download 81 | [miniconda]: https://conda.io/miniconda.html 82 | [conda]: https://conda.io 83 | [conda-forge]: https://conda-forge.org 84 | 85 | 86 | 87 | -------------------------------------------------------------------------------- /data/taxonomy_small.tsv: -------------------------------------------------------------------------------- 1 | uid | parent_uid | name | rank | sourceinfo | uniqname | flags | 2 | 805080 | | life | no rank | silva:0,ncbi:1,worms:1,gbif:0,irmng:0 | | | 3 | 93302 | 805080 | cellular organisms | no rank | ncbi:131567 | | | 4 | 996421 | 93302 | Archaea | domain | silva:D37982/#1,ncbi:2157,worms:8,gbif:2,irmng:12 | Archaea (domain silva:D37982/#1) | | 5 | 5246114 | 996421 | Marine Hydrothermal Vent Group 1(MHVG-1) | no rank - terminal | silva:AB302039/#2 | | | 6 | 102415 | 996421 | Thaumarchaeota | phylum | 
silva:D87348/#2,ncbi:651137,worms:559429,irmng:258 | | | 7 | 5246628 | 102415 | terrestrial group | no rank - terminal | silva:AB600373/#3 | | | 8 | 4795965 | 102415 | Marine Group I | no rank | silva:D87348/#3,ncbi:905826 | | | 9 | 5205649 | 4795965 | uncultured marine crenarchaeote 'Gulf of Maine' | species | silva:AGBE01001967,ncbi:1089683 | | sibling_higher | 10 | 5208050 | 4795965 | uncultured marine archaeon DCM858 | species | silva:AF121992,ncbi:105567 | | sibling_higher | 11 | 5205092 | 4795965 | uncultured marine group I thaumarchaeote | species | silva:JF715361,ncbi:360837 | | sibling_higher | 12 | 5205072 | 4795965 | uncultured Nitrosopumilaceae archaeon | species | silva:JN591993,ncbi:1118069 | | sibling_higher | 13 | 5208765 | 4795965 | uncultured marine archaeon DCM874 | species | silva:AF122001,ncbi:105576 | | sibling_higher | 14 | 179705 | 4795965 | Cenarchaeales | order | silva:AY192631/#4,ncbi:205948,worms:573555,irmng:10957 | | | 15 | 189165 | 179705 | Cenarchaeaceae | family | silva:AY192631/#5,ncbi:205957,worms:573556,gbif:4896414,irmng:105060 | | | 16 | 888219 | 189165 | Cenarchaeum | genus | silva:AY192631/#6,ncbi:46769,worms:573557,gbif:4896415,irmng:1030758 | | | 17 | 5207306 | 888219 | Thermoplasmatales archaeon Gpl | species | silva:JN881616,ncbi:261391 | | | 18 | 376618 | 888219 | Cenarchaeum symbiosum A | no rank - terminal | silva:DQ397549,ncbi:414004 | | | 19 | 4796244 | 888219 | crenarchaeote symbiont of Axinella sp. 
| species | silva:AF421159,ncbi:173517 | | | 20 | 4796252 | 888219 | crenarchaeote symbiont of Axinella verrucosa | species | silva:AF420237,ncbi:171716 | | | 21 | 5204995 | 888219 | uncultured Cenarchaeaceae thaumarchaeote | species | silva:DQ299278,ncbi:375545 | | | 22 | 363497 | 888219 | Cenarchaeum symbiosum | species | silva:AF083072,ncbi:46770,worms:573558,gbif:5900469,irmng:11734707 | | | 23 | 376617 | 363497 | Cenarchaeum symbiosum B | no rank - terminal | ncbi:414005 | | infraspecific | 24 | 5204996 | 888219 | Cenarchaeum environmental samples | no rank - terminal | ncbi:355925 | | was_container | 25 | 5204994 | 189165 | Cenarchaeaceae environmental samples | no rank - terminal | ncbi:375544 | | was_container | 26 | 5204998 | 179705 | Cenarchaeales environmental samples | no rank - terminal | ncbi:260466 | | was_container | 27 | 5204999 | 179705 | uncultured Cenarchaeales thaumarchaeote | species | ncbi:260467 | | sibling_higher,not_otu | 28 | 5205398 | 4795965 | uncultured crenarchaeote ODPB-A18 | species | silva:AF121098,ncbi:95930 | | sibling_higher | 29 | 5205625 | 4795965 | uncultured crenarchaeote ODPB-A3 | species | silva:AF121093,ncbi:95925 | | sibling_higher | 30 | 5205019 | 4795965 | uncultured marine crenarchaeote KM3-86-C1 | species | silva:EU686625,ncbi:526685 | | sibling_higher | 31 | 5205058 | 4795965 | uncultured Nitrosopumilales archaeon | species | silva:EF069380,ncbi:171534 | | sibling_higher | 32 | 5205057 | 4795965 | uncultured crenarchaeote 4B7 | species | silva:U40238,ncbi:44557 | | sibling_higher | 33 | 5204997 | 4795965 | uncultured Cenarchaeum sp. 
| species | silva:AB240745,ncbi:355926 | | sibling_higher | 34 | 5205427 | 4795965 | uncultured crenarchaeote DeepAnt-EC39 | species | silva:AY316120,ncbi:247023 | | sibling_higher | 35 | 5208046 | 4795965 | uncultured marine archaeon DCM74161 | species | silva:AF121988,ncbi:105563 | | sibling_higher | 36 | 5208240 | 4795965 | archaeon enrichment culture clone CN25 | species | silva:HQ338108,ncbi:914639 | | sibling_higher | 37 | 5208589 | 4795965 | uncultured archaeon SAGMA-3 | species | silva:AB050234,ncbi:140274 | | sibling_higher | 38 | 5208762 | 4795965 | uncultured marine archaeon DCM867 | species | silva:AF121998,ncbi:105573 | | sibling_higher | 39 | 770080 | 4795965 | Candidatus Nitrosopumilus koreensis AR1 | no rank - terminal | silva:CP003842,ncbi:1229908 | | | 40 | 5574944 | 4795965 | cluster HP459251 | no rank - terminal | silva:HP459251 | | | 41 | 5207875 | 4795965 | uncultured archaeon 19a-19 | species | silva:AJ294879,ncbi:138488 | | sibling_higher | 42 | 5208594 | 4795965 | uncultured archaeon SAGMA-8 | species | silva:AB050238,ncbi:140278 | | sibling_higher | 43 | 5208200 | 4795965 | uncultured archaeon CRA8-11cm | species | silva:AF119127,ncbi:93178 | | sibling_higher | 44 | 5207749 | 4795965 | uncultured archaeon W4-93a | species | silva:JQ085821,ncbi:1131007 | | sibling_higher | 45 | 5208047 | 4795965 | uncultured marine archaeon DCM74159 | species | silva:AF121987,ncbi:105562 | | sibling_higher | 46 | 5274697 | 4795965 | Order Incertae Sedis | no rank - terminal | silva:D87348/#4 | Order Incertae Sedis (in Marine Group I) | was_container | 47 | 5208100 | 4795965 | uncultured archaeon CRA20-0cm | species | silva:AF119130,ncbi:93181 | | sibling_higher | 48 | 5208763 | 4795965 | uncultured marine archaeon DCM862 | species | silva:AF121995,ncbi:105570 | | sibling_higher | 49 | 5207771 | 4795965 | uncultured marine archaeon TS235C310 | species | silva:AF052949,ncbi:80928 | | sibling_higher | 50 | 5205628 | 4795965 | uncultured crenarchaeote ODPB-A9 | 
species | silva:AF121096,ncbi:95928 | | sibling_higher | 51 | 5207867 | 4795965 | uncultured archaeon 19b-52 | species | silva:AJ294873,ncbi:138482 | | sibling_higher | 52 | 5205035 | 4795965 | uncultured marine crenarchaeote AD1000-56-E4 | species | silva:EU686623,ncbi:526641 | | sibling_higher | 53 | 5207868 | 4795965 | uncultured archaeon 19a-1 | species | silva:AJ294874,ncbi:138483 | | sibling_higher | 54 | 5205615 | 4795965 | unidentified hydrothermal vent archaeon PVA_OTU_2 | species | silva:U46678,ncbi:45967 | | sibling_higher | 55 | 5207852 | 4795965 | uncultured archaeon APA1-0cm | species | silva:AF119134,ncbi:93185 | | sibling_higher | 56 | 5205022 | 4795965 | uncultured marine crenarchaeote AD1000-207-H3 | species | silva:EU686633,ncbi:526637 | | sibling_higher | 57 | 5205191 | 4795965 | uncultured thaumarchaeote | species | silva:JN825303,ncbi:651141 | | sibling_higher | 58 | 5205356 | 4795965 | unidentified archaeon PM7 | species | silva:U71109,ncbi:52268 | | sibling_higher | 59 | 5208043 | 4795965 | uncultured marine archaeon DCM861 | species | silva:AF121994,ncbi:105569 | | sibling_higher | 60 | 5208198 | 4795965 | uncultured archaeon CRA7-0cm | species | silva:AF119125,ncbi:93176 | | sibling_higher | 61 | 5205627 | 4795965 | uncultured crenarchaeote ODPB-A7 | species | silva:AF121095,ncbi:95927 | | sibling_higher | 62 | 5208099 | 4795965 | uncultured archaeon APA3-0cm | species | silva:AF119136,ncbi:93187 | | sibling_higher | 63 | 5207670 | 4795965 | uncultured sponge symbiont PAAR8 | species | silva:AF186423,ncbi:105057 | | sibling_higher | 64 | 5208170 | 4795965 | unidentified hydrothermal vent archaeon PVA_OTU_3 | species | silva:U46679,ncbi:45968 | | sibling_higher | 65 | 5208767 | 4795965 | uncultured marine archaeon DCM871 | species | silva:AF121999,ncbi:105574 | | sibling_higher | 66 | 5574945 | 4795965 | Family Incertae Sedis | no rank - terminal | silva:D87348/#5 | Family Incertae Sedis (in Marine Group I) | incertae_sedis,was_container | 
67 | 1018283 | 4795965 | Candidatus Nitrosoarchaeum | genus | silva:AF293018/#6,ncbi:1007082 | | incertae_sedis | 68 | 5208590 | 1018283 | uncultured archaeon SAGMA-1 | species | silva:AB050232,ncbi:140272 | | incertae_sedis_inherited | 69 | 5205691 | 1018283 | uncultured Green Bay ferromanganous micronodule archaeon ARA7 | species | silva:AF293018,ncbi:140618 | | incertae_sedis_inherited | 70 | 1018291 | 1018283 | Candidatus Nitrosoarchaeum limnia | species | ncbi:1007084 | | incertae_sedis_inherited | 71 | 177954 | 1018291 | Candidatus Nitrosoarchaeum limnia SFB1 | no rank - terminal | silva:AEGP01000029,ncbi:886738 | | incertae_sedis_inherited,infraspecific | 72 | 889858 | 1018291 | Candidatus Nitrosoarchaeum limnia BG20 | no rank - terminal | ncbi:859192 | | incertae_sedis_inherited,infraspecific | 73 | 59398 | 1018283 | Candidatus Nitrosoarchaeum koreensis | species | ncbi:1088740 | | incertae_sedis_inherited | 74 | 784590 | 59398 | Candidatus Nitrosoarchaeum koreensis MY1 | no rank - terminal | silva:HQ331116,ncbi:1001994 | | incertae_sedis_inherited,infraspecific | 75 | 5580298 | 1018283 | Candidatus Nitrosoarchaeum environmental samples | no rank - terminal | ncbi:1297041 | | incertae_sedis_inherited,was_container | 76 | 5580299 | 1018283 | uncultured Candidatus Nitrosoarchaeum sp. 
| species | ncbi:1501411 | | incertae_sedis_inherited,not_otu | 77 | 5205071 | 4795965 | Nitrosopumilaceae environmental samples | no rank - terminal | ncbi:1118068 | | was_container | 78 | 933985 | 4795965 | Nitrosopumilaceae | family | ncbi:338190,worms:559432,gbif:6453386,irmng:121355 | | sibling_higher,merged,barren | 79 | 5205000 | 4795965 | Nitrosopumilales environmental samples | no rank - terminal | ncbi:371948 | | was_container | 80 | 5205049 | 4795965 | uncultured marine crenarchaeote HF4000_APKG10L15 | species | ncbi:455613 | | environmental,not_otu | 81 | 5205038 | 4795965 | uncultured marine crenarchaeote HF4000_APKG8D6 | species | ncbi:455602 | | environmental,not_otu | 82 | 5205026 | 4795965 | uncultured marine crenarchaeote HF4000_ANIW133O4 | species | ncbi:455574 | | environmental,not_otu | 83 | 5205067 | 4795965 | uncultured marine crenarchaeote KM3-47-D6 | species | ncbi:526680 | | environmental,not_otu | 84 | 5205042 | 4795965 | uncultured marine crenarchaeote HF4000_APKG8I13 | species | ncbi:455606 | | environmental,not_otu | 85 | 5205014 | 4795965 | uncultured marine crenarchaeote HF4000_APKG5C13 | species | ncbi:455591 | | environmental,not_otu | 86 | 5205025 | 4795965 | uncultured marine crenarchaeote HF4000_ANIW133M9 | species | ncbi:455573 | | environmental,not_otu | 87 | 5205068 | 4795965 | uncultured marine crenarchaeote SAT1000-21-C11 | species | ncbi:526689 | | environmental,not_otu | 88 | 5205024 | 4795965 | uncultured marine crenarchaeote HF4000_ANIW133K13 | species | ncbi:455572 | | environmental,not_otu | 89 | 5205060 | 4795965 | uncultured marine crenarchaeote HF4000_APKG3K8 | species | ncbi:455588 | | environmental,not_otu | 90 | 5205047 | 4795965 | uncultured marine crenarchaeote HF4000_APKG10F15 | species | ncbi:455611 | | environmental,not_otu | 91 | 5205046 | 4795965 | uncultured marine crenarchaeote HF4000_APKG9M20 | species | ncbi:455608 | | environmental,not_otu | 92 | 5205013 | 4795965 | uncultured marine crenarchaeote 
HF4000_APKG5B22 | species | ncbi:455590 | | environmental,not_otu | 93 | 5205045 | 4795965 | uncultured marine crenarchaeote HF4000_APKG8G15 | species | ncbi:455605 | | environmental,not_otu | 94 | 5205006 | 4795965 | uncultured marine crenarchaeote HF4000_APKG3E18 | species | ncbi:455585 | | environmental,not_otu | 95 | 5205016 | 4795965 | uncultured marine crenarchaeote HF4000_APKG6D9 | species | ncbi:455597 | | environmental,not_otu | 96 | 5205003 | 4795965 | uncultured marine crenarchaeote HF4000_APKG5N21 | species | ncbi:455593 | | environmental,not_otu | 97 | 5205018 | 4795965 | uncultured marine crenarchaeote HF4000_APKG6C9 | species | ncbi:455595 | | environmental,not_otu | 98 | 5205061 | 4795965 | uncultured marine crenarchaeote HF4000_ANIW141O9 | species | ncbi:455581 | | environmental,not_otu | 99 | 5205044 | 4795965 | uncultured marine crenarchaeote HF4000_APKG8G2 | species | ncbi:455604 | | environmental,not_otu | 100 | 5205056 | 4795965 | uncultured marine crenarchaeote HF4000_ANIW97M7 | species | ncbi:455568 | | environmental,not_otu | 101 | 5205005 | 4795965 | uncultured marine crenarchaeote HF4000_ANIW141N1 | species | ncbi:455580 | | environmental,not_otu | 102 | 5205063 | 4795965 | uncultured marine crenarchaeote HF4000_APKG2O16 | species | ncbi:455582 | | environmental,not_otu | 103 | 5205040 | 4795965 | uncultured marine crenarchaeote HF4000_APKG7F11 | species | ncbi:455600 | | environmental,not_otu | 104 | 5205012 | 4795965 | uncultured marine crenarchaeote HF4000_APKG5E24 | species | ncbi:455592 | | environmental,not_otu | 105 | 5205051 | 4795965 | uncultured marine crenarchaeote HF4000_ANIW97J3 | species | ncbi:455567 | | environmental,not_otu | 106 | 5205069 | 4795965 | uncultured marine crenarchaeote HF4000_ANIW133C7 | species | ncbi:455570 | | environmental,not_otu | 107 | 5205008 | 4795965 | uncultured marine crenarchaeote HF4000_ANIW141M12 | species | ncbi:455578 | | environmental,not_otu | 108 | 5205027 | 4795965 | uncultured marine 
crenarchaeote HF4000_ANIW137N13 | species | ncbi:455575 | | environmental,not_otu | 109 | 5205055 | 4795965 | uncultured marine crenarchaeote HF4000_ANIW93E5 | species | ncbi:455563 | | environmental,not_otu | 110 | 5205004 | 4795965 | uncultured marine crenarchaeote HF4000_APKG4H17 | species | ncbi:455589 | | environmental,not_otu | 111 | 5205034 | 4795965 | uncultured crenarchaeote 83A10 | species | ncbi:166585 | | environmental,not_otu | 112 | 5205066 | 4795965 | uncultured crenarchaeote 74A4 | species | ncbi:166279 | | environmental,not_otu | 113 | 5205053 | 4795965 | uncultured marine crenarchaeote HF4000_ANIW93I24 | species | ncbi:455565 | | environmental,not_otu | 114 | 5205070 | 4795965 | uncultured marine crenarchaeote HF4000_ANIW133I6 | species | ncbi:455571 | | environmental,not_otu | 115 | 5205028 | 4795965 | uncultured marine crenarchaeote HF4000_ANIW137N18 | species | ncbi:455576 | | environmental,not_otu | 116 | 5205009 | 4795965 | uncultured marine crenarchaeote HF4000_ANIW141M18 | species | ncbi:455579 | | environmental,not_otu | 117 | 5205021 | 4795965 | uncultured marine crenarchaeote AD1000-202-A2 | species | ncbi:526636 | | environmental,not_otu | 118 | 5205062 | 4795965 | uncultured marine crenarchaeote HF4000_APKG3B16 | species | ncbi:455583 | | environmental,not_otu | 119 | 5205007 | 4795965 | uncultured marine crenarchaeote HF4000_APKG3D24 | species | ncbi:455584 | | environmental,not_otu | 120 | 5205002 | 4795965 | uncultured marine crenarchaeote SAT1000-49-D2 | species | ncbi:526692 | | environmental,not_otu | 121 | 5205033 | 4795965 | uncultured crenarchaeote 31B02 | species | ncbi:166584 | | environmental,not_otu | 122 | 5205015 | 4795965 | uncultured marine crenarchaeote HF4000_APKG6D3 | species | ncbi:455596 | | environmental,not_otu | 123 | 5205032 | 4795965 | uncultured crenarchaeote 19H08 | species | ncbi:166583 | | environmental,not_otu | 124 | 5205020 | 4795965 | uncultured marine crenarchaeote HF4000_ANIW97P9 | species | 
ncbi:455569 | | environmental,not_otu | 125 | 5205037 | 4795965 | uncultured marine crenarchaeote HF4000_APKG9P22 | species | ncbi:455609 | | environmental,not_otu | 126 | 5205030 | 4795965 | uncultured marine crenarchaeote HF4000_APKG3J11 | species | ncbi:455587 | | environmental,not_otu | 127 | 5205052 | 4795965 | uncultured marine crenarchaeote HF4000_ANIW93J19 | species | ncbi:455566 | | environmental,not_otu | 128 | 5205010 | 4795965 | uncultured marine crenarchaeote HF4000_APKG6J21 | species | ncbi:455598 | | environmental,not_otu | 129 | 5205064 | 4795965 | uncultured marine crenarchaeote HF4000_APKG3H9 | species | ncbi:455586 | | environmental,not_otu | 130 | 5205065 | 4795965 | uncultured marine crenarchaeote KM3-34-D9 | species | ncbi:526677 | | environmental,not_otu | 131 | 5205041 | 4795965 | uncultured marine crenarchaeote HF4000_APKG7F19 | species | ncbi:455601 | | environmental,not_otu | 132 | 5205031 | 4795965 | uncultured crenarchaeote 15G10 | species | ncbi:166582 | | environmental,not_otu | 133 | 5205001 | 4795965 | uncultured marine crenarchaeote SAT1000-23-F7 | species | ncbi:526690 | | environmental,not_otu | 134 | 5205043 | 4795965 | uncultured marine crenarchaeote HF4000_APKG8O8 | species | ncbi:455607 | | environmental,not_otu | 135 | 5205036 | 4795965 | uncultured marine crenarchaeote AD1000-325-A12 | species | ncbi:526639 | | environmental,not_otu | 136 | 5205029 | 4795965 | uncultured marine crenarchaeote HF4000_ANIW141J13 | species | ncbi:455577 | | environmental,not_otu | 137 | 5205017 | 4795965 | uncultured marine crenarchaeote HF4000_APKG6B14 | species | ncbi:455594 | | environmental,not_otu | 138 | 5205011 | 4795965 | uncultured marine crenarchaeote HF4000_APKG6N3 | species | ncbi:455599 | | environmental,not_otu | 139 | 5205050 | 4795965 | uncultured marine crenarchaeote HF4000_APKG10I20 | species | ncbi:455612 | | environmental,not_otu | 140 | 5205048 | 4795965 | uncultured marine crenarchaeote HF4000_APKG10D8 | species | 
ncbi:455610 | | environmental,not_otu | 141 | 5205054 | 4795965 | uncultured marine crenarchaeote HF4000_ANIW93H17 | species | ncbi:455564 | | environmental,not_otu | 142 | 5205039 | 4795965 | uncultured marine crenarchaeote HF4000_APKG8D22 | species | ncbi:455603 | | environmental,not_otu | 143 | 701766 | 4795965 | Nitrosopumilales | order | ncbi:31932,worms:559431,irmng:12878 | | merged,barren | 144 | 5574946 | 4795965 | Marine Group I thaumarchaeote SCGC AAA799-D11 | species | ncbi:1502291 | | sibling_higher | 145 | 5574952 | 4795965 | Marine Group I thaumarchaeote SCGC AAA799-E16 | species | ncbi:1502292 | | sibling_higher | 146 | 5574953 | 4795965 | Marine Group I thaumarchaeote SCGC AAA799-N04 | species | ncbi:1502293 | | sibling_higher | 147 | 5247438 | 4795965 | Marine Group I thaumarchaeote SCGC AAA160-J20 | species | ncbi:1105053 | | sibling_higher | 148 | 5574947 | 4795965 | Marine Group I thaumarchaeote SCGC AAA799-P11 | species | ncbi:1502295 | | sibling_higher | 149 | 5574950 | 4795965 | Marine Group I thaumarchaeote SCGC AAA799-B03 | species | ncbi:1502289 | | sibling_higher | 150 | 5205089 | 4795965 | Marine Group I environmental samples | no rank - terminal | ncbi:905827 | | was_container | 151 | 5205093 | 4795965 | Thaumarchaeota archaeon SCGC AAA007-O23 | species | ncbi:913333 | | sibling_higher | 152 | 5574951 | 4795965 | Marine Group I thaumarchaeote SCGC AAA799-D07 | species | ncbi:1502290 | | sibling_higher | 153 | 5574949 | 4795965 | Marine Group I thaumarchaeote SCGC RSA3 | species | ncbi:1503183 | | sibling_higher | 154 | 5205088 | 4795965 | Marine Group I thaumarchaeote SCGC AB-629-I23 | species | ncbi:1131266 | | sibling_higher | 155 | 5205087 | 4795965 | Marine Group I thaumarchaeote SCGC AB-629-A13 | species | ncbi:1131267 | | sibling_higher | 156 | 5574948 | 4795965 | Marine Group I thaumarchaeote SCGC AAA799-O18 | species | ncbi:1502294 | | sibling_higher | 157 | 5205090 | 4795965 | archaeon enrichment culture clone AR | species | 
ncbi:1109400 | | environmental | 158 | 5205091 | 4795965 | archaeon enrichment culture clone SJ | species | ncbi:1109399 | | environmental | 159 | 933987 | 4795965 | Candidatus Nitrosopumilus | genus | silva:D87348/#6,ncbi:338191,worms:559433,gbif:6453387,irmng:1468524 | | sibling_higher | 160 | 5205074 | 933987 | uncultured Candidatus Nitrosopumilus sp. | species | silva:GU386315,ncbi:517606 | | | 161 | 5208197 | 933987 | uncultured archaeon CRA7-11cm | species | silva:AF119126,ncbi:93177 | | | 162 | 5205358 | 933987 | unidentified archaeon C6 | species | silva:U71112,ncbi:52266 | | | 163 | 5207873 | 933987 | uncultured archaeon 19a-4 | species | silva:AJ294875,ncbi:138484 | | | 164 | 5205360 | 933987 | unidentified archaeon C46 | species | silva:U71117,ncbi:52264 | | | 165 | 5205382 | 933987 | unculturable Mariana archaeon no. 1 | species | silva:D87348,ncbi:73126 | | | 166 | 5208115 | 933987 | uncultured sediment archaeon | species | silva:FN553854,ncbi:676242 | | | 167 | 604494 | 933987 | Nitrosopumilus sp. SJ | species | silva:AJVI01000001,ncbi:1027374 | | | 168 | 922836 | 933987 | Candidatus Nitrosopumilus sp. NM25 | species | silva:AB546961,ncbi:718286 | | | 169 | 5205357 | 933987 | unidentified archaeon PM8 | species | silva:U71110,ncbi:52269 | | | 170 | 5205629 | 933987 | uncultured crenarchaeote ODPB-A12 | species | silva:AF121097,ncbi:95929 | | | 171 | 5207855 | 933987 | uncultured archaeon CRA36-0cm | species | silva:AF119131,ncbi:93182 | | | 172 | 5205365 | 933987 | unidentified archaeon C20 | species | silva:U71114,ncbi:52261 | | | 173 | 5208257 | 933987 | archaeon enrichment culture clone CN150 | species | silva:HQ338109,ncbi:914640 | | | 174 | 5207871 | 933987 | uncultured archaeon 19a-14 | species | silva:AJ294877,ncbi:138486 | | | 175 | 5580292 | 933987 | Candidatus Nitrosopumilus koreensis | species | ncbi:1510466 | | merged | 176 | 5878051 | 933987 | Nitrosopumilus sp. 
LS_AOA | species | ncbi:1671876 | | | 177 | 5580295 | 933987 | Candidatus Nitrosopumilus adriaticus | species | ncbi:1580092 | | | 178 | 5878048 | 933987 | Nitrosopumilus sp. BACL13 MAG-120910-bin56 | species | ncbi:1655560 | | | 179 | 5878053 | 933987 | Nitrosopumilus sp. SW | species | ncbi:1818884 | | | 180 | 5361995 | 933987 | Candidatus Nitrosopumilus sp. PS0 | species | ncbi:1470067 | | | 181 | 770081 | 933987 | Candidatus Nitrosopumilus sp. AR2 | species | ncbi:1229909 | | | 182 | 297988 | 933987 | Candidatus Nitrosopumilus salaria | species | ncbi:1170320 | | | 183 | 136362 | 297988 | Candidatus Nitrosopumilus salaria BD31 | no rank - terminal | silva:AEXL02000090,ncbi:859350 | | infraspecific | 184 | 5580296 | 933987 | Candidatus Nitrosopumilus piranensis | species | ncbi:1582439 | | | 185 | 5205073 | 933987 | Nitrosopumilus environmental samples | no rank - terminal | ncbi:517605 | | was_container | 186 | 5878049 | 933987 | Nitrosopumilus sp. BACL13 MAG-121220-bin23 | species | ncbi:1655561 | | | 187 | 5878052 | 933987 | Nitrosopumilus sp. Nsub | species | ncbi:1776294 | | | 188 | 465781 | 933987 | Nitrosopumilus sp. AR | species | ncbi:1027373 | | | 189 | 5878047 | 933987 | Candidatus Nitrosopumilus sp. DDS1 | species | ncbi:1679446 | | | 190 | 5878050 | 933987 | Nitrosopumilus sp. DDS1 | species | ncbi:1740111 | | | 191 | 5580293 | 933987 | Nitrosopumilus sp. PRT-SC01 | species | ncbi:1527301 | | | 192 | 5361994 | 933987 | Candidatus Nitrosopumilus sp. HCA1 | species | ncbi:1470066 | | | 193 | 933975 | 933987 | Nitrosopumilus maritimus | species | ncbi:338192,worms:559434,gbif:6453388,irmng:11736333 | | | 194 | 922115 | 933975 | Nitrosopumilus maritimus SCM1 | no rank - terminal | silva:CP000866,ncbi:436308 | | infraspecific | 195 | 5580294 | 933987 | Nitrosopumilus sp. 
RSA3 | species | ncbi:1435467 | | | 196 | 5246629 | 102415 | Marine Benthic Group A | no rank | silva:AB177271/#3 | | | 197 | 5205059 | 5246629 | uncultured marine crenarchaeote KM3-153-F8 | species | silva:EU686631,ncbi:526665 | | | 198 | 5205648 | 5246629 | uncultured marine crenarchaeote | species | silva:FJ150808,ncbi:115413 | | | 199 | 5207853 | 5246629 | uncultured archaeon APA2-17cm | species | silva:AF119135,ncbi:93186 | | | 200 | 5205558 | 5246629 | uncultured crenarchaeote | species | silva:EU369858,ncbi:29281 | | | 201 | 5205171 | 5246629 | uncultured marine thaumarchaeote | species | silva:JF715383,ncbi:1167203 | | | 202 | 5207748 | 5246629 | uncultured archaeon W5-61a | species | silva:JQ085825,ncbi:1131008 | | | 203 | 5207149 | 5246629 | uncultured marine benthic group A euryarchaeote | species | silva:JN590083,ncbi:1082484 | | | 204 | 5246635 | 102415 | AK59 | no rank - terminal | silva:AB301997/#3 | | | 205 | 5246643 | 102415 | OPPD003 | no rank - terminal | silva:AY861942/#3 | | | 206 | 5246636 | 102415 | AK8 | no rank - terminal | silva:GQ848431/#3 | | | 207 | 5246651 | 102415 | pMC2A209 | no rank - terminal | silva:AB175574/#3 | | | 208 | 5246633 | 102415 | Group C3 | no rank | silva:U59984/#3 | | | 209 | 5205909 | 5246633 | uncultured marine crenarchaeote E6-3G | species | silva:HQ214610,ncbi:907719 | | | 210 | 5246634 | 102415 | Class Incertae Sedis | no rank - terminal | silva:EU239993/#3 | Class Incertae Sedis (in phylum Thaumarchaeota) | was_container | 211 | 5246649 | 102415 | Z273FA48 | no rank - terminal | silva:FJ485501/#3 | | | 212 | 5208593 | 102415 | uncultured archaeon SAGMA-Z | species | silva:AB050231,ncbi:140271 | | sibling_higher | 213 | 5246625 | 102415 | Miscellaneous Crenarchaeotic Group | no rank | silva:L25305/#3 | | | 214 | 5205620 | 5246625 | uncultured crenarchaeote pBRKC135 | species | silva:AF118657,ncbi:91314 | | | 215 | 5207854 | 5246625 | uncultured archaeon CRA9-27cm | species | silva:AF119129,ncbi:93180 | | | 216 | 
5208511 | 5246625 | uncultured archaeon 19c-51 | species | silva:AJ294896,ncbi:138505 | | | 217 | 5208020 | 5246625 | uncultured archaeon Arc.171 | species | silva:AF005765,ncbi:62249 | | | 218 | 5208563 | 5246625 | uncultured archaeon Arc.118 | species | silva:AF005761,ncbi:62236 | | | 219 | 5208749 | 5246625 | uncultured archaeon 19b-17 | species | silva:AJ294859,ncbi:138468 | | | 220 | 5205910 | 5246625 | uncultured marine crenarchaeote E37-7F | species | silva:HQ214611,ncbi:907717 | | | 221 | 5205948 | 5246625 | uncultured Desulfurococcales archaeon | species | silva:HQ700684,ncbi:307553 | | | 222 | 5207698 | 5246625 | uncultured archaeon Arc.212 | species | silva:AF005767,ncbi:62254 | | | 223 | 5205908 | 5246625 | uncultured marine crenarchaeote E48-1C | species | silva:HQ214612,ncbi:907718 | | | 224 | 5208426 | 5246625 | uncultured archaeon 19a-29 | species | silva:AJ294882,ncbi:138491 | | | 225 | 5208806 | 5246625 | uncultured archaeon 19b-39 | species | silva:AJ294869,ncbi:138478 | | | 226 | 5205385 | 5246625 | uncultured crenarchaeote MCG | species | silva:EU559699,ncbi:529375 | | | 227 | 5205314 | 5246625 | uncultured crenarchaeote pBRKC88 | species | silva:AF118662,ncbi:91317 | | | 228 | 5207905 | 5246625 | uncultured archaeon Arc.119 | species | silva:AF005762,ncbi:62240 | | | 229 | 5208057 | 5246625 | archaeon enrichment culture clone C4-6C-A | species | silva:GU196155,ncbi:698585 | | | 230 | 5207787 | 5246625 | uncultured thermal soil archaeon | species | silva:AF391992,ncbi:166500 | | | 231 | 5878054 | 5246625 | hot springs metagenome | species | silva:ADKI01000120,ncbi:433727 | | | 232 | 5207788 | 5246625 | archaeon enrichment culture clone C4-32C-A | species | silva:GU196176,ncbi:698565 | | | 233 | 5207804 | 5246625 | archaeon enrichment culture clone C3-1C-A | species | silva:GQ470597,ncbi:670835 | | | 234 | 5246630 | 102415 | Terrestrial Hot Spring Gp(THSCG) | no rank | silva:U63341/#3 | | | 235 | 5205086 | 5246630 | Candidatus Caldiarchaeum 
subterraneum | species | silva:JN881568,ncbi:311458 | | | 236 | 5208097 | 5246630 | uncultured archaeon 20c-54 | species | silva:AJ299202,ncbi:138407 | | | 237 | 5274699 | 5246630 | Order Incertae Sedis | no rank - terminal | silva:AB566230/#4 | Order Incertae Sedis (in Terrestrial Hot Spring Gp(THSCG)) | was_container | 238 | 5207913 | 5246630 | uncultured archaeon 20a-28 | species | silva:AJ299159,ncbi:138364 | | | 239 | 5207909 | 5246630 | uncultured archaeon 20a-12 | species | silva:AJ299155,ncbi:138360 | | | 240 | 5208622 | 5246630 | uncultured archaeon 20a-6 | species | silva:AJ299151,ncbi:138356 | | | 241 | 5208623 | 5246630 | uncultured archaeon 20a-7 | species | silva:AJ299152,ncbi:138357 | | | 242 | 5207910 | 5246630 | uncultured archaeon 20b-7 | species | silva:AJ299162,ncbi:138367 | | | 243 | 5208598 | 5246630 | uncultured archaeon 20b-27 | species | silva:AJ299172,ncbi:138377 | | | 244 | 4917978 | 5246630 | uncultured marine microorganism HF4000_APKG10H11 | species | silva:EU016667,ncbi:455559 | | | 245 | 5574954 | 5246630 | Family Incertae Sedis | no rank - terminal | silva:AB566230/#5 | Family Incertae Sedis (in Terrestrial Hot Spring Gp(THSCG)) | incertae_sedis,was_container | 246 | 4795961 | 5246630 | Candidatus Caldiarchaeum | genus | silva:AB566230/#6,ncbi:1048752 | | incertae_sedis,barren | 247 | 5246646 | 102415 | Sc-EA05 | no rank - terminal | silva:JQ684417/#3 | | | 248 | 5246627 | 102415 | Soil Crenarchaeotic Group(SCG) | no rank | silva:U68604/#3 | | | 249 | 5205313 | 5246627 | Crenarchaeote enrichment culture clone OREC-R1073 | species | silva:JF799649,ncbi:1050482 | | | 250 | 5205368 | 5246627 | unidentified archaeon SCA1150 | species | silva:U62812,ncbi:50850 | | | 251 | 5205475 | 5246627 | unidentified archaeon SCA1158 | species | silva:U62815,ncbi:50853 | | | 252 | 5205395 | 5246627 | Crenarchaeote enrichment culture clone OREC-R104 | species | silva:JF799591,ncbi:1050424 | | | 253 | 5205480 | 5246627 | unidentified archaeon SCA1170 | 
species | silva:U62817,ncbi:50855 | | | 254 | 5205295 | 5246627 | uncultured crenarchaeote TRC132-9 | species | silva:AF227639,ncbi:115024 | | | 255 | 5205481 | 5246627 | unidentified archaeon SCA1166 | species | silva:U62816,ncbi:50854 | | | 256 | 5205477 | 5246627 | unidentified archaeon SCA1151 | species | silva:U62813,ncbi:50851 | | | 257 | 5205566 | 5246627 | Crenarchaeote enrichment culture clone OREC-B1081 | species | silva:JF799656,ncbi:1050489 | | | 258 | 5274698 | 5246627 | Order Incertae Sedis | no rank - terminal | silva:U62820/#4 | Order Incertae Sedis (in Soil Crenarchaeotic Group(SCG)) | was_container | 259 | 5205695 | 5246627 | Crenarchaeote enrichment culture clone OREC-R1342 | species | silva:JF799622,ncbi:1050455 | | | 260 | 5208322 | 5246627 | uncultured ammonia-oxidizing archaeon | species | silva:JQ668662,ncbi:418404 | | | 261 | 5205297 | 5246627 | uncultured crenarchaeote TREC89-44 | species | silva:AY487101,ncbi:258871 | | | 262 | 5205479 | 5246627 | unidentified archaeon SCA1173 | species | silva:U62818,ncbi:50856 | | | 263 | 5205593 | 5246627 | uncultured crenarchaeote TREC89-34 | species | silva:AY487103,ncbi:258873 | | | 264 | 5205286 | 5246627 | Crenarchaeote enrichment culture clone OREC-B1021 | species | silva:JF799608,ncbi:1050441 | | | 265 | 5247440 | 5246627 | cluster CQ786497 | no rank - terminal | silva:CQ786497 | | | 266 | 5205476 | 5246627 | unidentified archaeon SCA1154 | species | silva:U62814,ncbi:50852 | | | 267 | 5208787 | 5246627 | uncultured soil archaeon | species | silva:JQ668088,ncbi:164850 | | | 268 | 5574957 | 5246627 | Family Incertae Sedis | no rank - terminal | silva:U62820/#5 | Family Incertae Sedis (in Soil Crenarchaeotic Group(SCG)) | incertae_sedis,was_container | 269 | 378065 | 5246627 | Candidatus Nitrososphaera | genus | silva:U62820/#6,ncbi:497726 | | incertae_sedis | 270 | 378062 | 378065 | Candidatus Nitrososphaera gargensis | species | silva:EU281334,ncbi:497727 | | incertae_sedis_inherited | 271 | 
5205408 | 378065 | Crenarchaeote enrichment culture clone OREC-B1045 | species | silva:JF799625,ncbi:1050458 | | incertae_sedis_inherited | 272 | 5205562 | 378065 | Crenarchaeote enrichment culture clone OREC-R1390 | species | silva:JF799664,ncbi:1050497 | | incertae_sedis_inherited | 273 | 5208591 | 378065 | uncultured archaeon SAGMA-2 | species | silva:AB050233,ncbi:140273 | | incertae_sedis_inherited | 274 | 811534 | 378065 | Candidatus Nitrososphaera gargensis Ga9.2 | no rank - terminal | silva:CP002408,ncbi:1237085 | | incertae_sedis_inherited | 275 | 5207940 | 378065 | uncultured archaeon SAGMA-W | species | silva:AB050228,ncbi:140268 | | incertae_sedis_inherited | 276 | 5205257 | 378065 | uncultured Nitrososphaera sp. | species | silva:JX047156,ncbi:759874 | | incertae_sedis_inherited | 277 | 5205644 | 378065 | Crenarchaeote enrichment culture clone OREC-R1050 | species | silva:JF799630,ncbi:1050463 | | incertae_sedis_inherited | 278 | 5208592 | 378065 | uncultured archaeon SAGMA-Y | species | silva:AB050230,ncbi:140270 | | incertae_sedis_inherited | 279 | 5205474 | 378065 | unidentified archaeon SCA11 | species | silva:U62820,ncbi:50858 | | incertae_sedis_inherited | 280 | 5205256 | 378065 | Nitrososphaera environmental samples | no rank - terminal | ncbi:759873 | | incertae_sedis_inherited,was_container | 281 | 5580300 | 378065 | Candidatus Nitrososphaera sp. N89-12 | species | ncbi:1622297 | | incertae_sedis_inherited | 282 | 118845 | 378065 | Nitrososphaera sp. JG1 | species | ncbi:1110358 | | incertae_sedis_inherited | 283 | 5878060 | 378065 | Candidatus Nitrososphaera sp. 
13_1_40CM_48_12 | species | ncbi:1805054 | | incertae_sedis_inherited | 284 | 400100 | 378065 | Nitrososphaera viennensis | species | ncbi:1034015 | | incertae_sedis_inherited | 285 | 750289 | 400100 | Nitrososphaera viennensis EN76 | no rank - terminal | silva:FR773157,ncbi:926571 | | incertae_sedis_inherited,infraspecific | 286 | 5580302 | 378065 | Candidatus Nitrososphaera evergladensis | species | ncbi:1459637 | | incertae_sedis_inherited | 287 | 5585661 | 5580302 | Candidatus Nitrososphaera evergladensis SR1 | no rank - terminal | ncbi:1459636 | | incertae_sedis_inherited,infraspecific | 288 | 5878061 | 378065 | Nitrososphaera sp. 13_1_20CM_3_36_3 | species | ncbi:1805257 | | incertae_sedis_inherited | 289 | 5580301 | 378065 | Candidatus Nitrososphaera sp. THUAOA | species | ncbi:1526932 | | incertae_sedis_inherited | 290 | 5580303 | 378065 | Nitrososphaera sp. enrichment culture | species | ncbi:1616795 | | incertae_sedis_inherited,environmental | 291 | 5246647 | 102415 | pSL12 | no rank | silva:U63343/#3 | | | 292 | 5205023 | 5246647 | uncultured marine crenarchaeote AD1000-23-H12 | species | silva:EU686635,ncbi:526638 | | | 293 | 5246652 | 102415 | HDBA-SITS389 | no rank - terminal | silva:HM187516/#3 | | | 294 | 5246644 | 102415 | FS243A-60 | no rank - terminal | silva:AB302012/#3 | | | 295 | 5246632 | 102415 | AK31 | no rank - terminal | silva:DQ190090/#3 | | | 296 | 5246640 | 102415 | F9P122000-Arc-2-E02 | no rank - terminal | silva:JQ221038/#3 | | | 297 | 5246639 | 102415 | D-F10 | no rank | silva:AB293212/#3 | | | 298 | 5205892 | 5246639 | uncultured Candidatus Nitrosocaldus sp. 
| species | silva:JN881577,ncbi:766501 | | | 299 | 5246638 | 102415 | AB64A-17 | no rank - terminal | silva:FR846896/#3 | | | 300 | 5246637 | 102415 | AK56 | no rank - terminal | silva:AB302036/#3 | | | 301 | 5246648 | 102415 | ArcC-u-cD06 | no rank - terminal | silva:EU307065/#3 | | | 302 | 5246642 | 102415 | AS48 | no rank - terminal | silva:JX047158/#3 | | | 303 | 5246641 | 102415 | Papm3A43 | no rank - terminal | silva:AB213098/#3 | | | 304 | 5246645 | 102415 | TOTO-A6-15 | no rank - terminal | silva:AB167488/#3 | | | 305 | 5246631 | 102415 | South African Gold Mine Gp 1(SAGMCG-1) | no rank | silva:AJ535128/#3 | | | 306 | 5207933 | 5246631 | uncultured archaeon SAGMA-V | species | silva:AB050227,ncbi:140267 | | | 307 | 5205954 | 5246631 | uncultured Thermoprotei archaeon | species | silva:HQ671235,ncbi:476105 | | | 308 | 5208822 | 5246631 | uncultured archaeon SAGMA-11 | species | silva:AB050241,ncbi:140281 | | | 309 | 5208823 | 5246631 | uncultured archaeon SAGMA-10 | species | silva:AB050240,ncbi:140280 | | | 310 | 5274700 | 5246631 | Order Incertae Sedis | no rank - terminal | silva:AJ535128/#4 | Order Incertae Sedis (in South African Gold Mine Gp 1(SAGMCG-1)) | was_container | 311 | 5574955 | 5246631 | Family Incertae Sedis | no rank - terminal | silva:AJ535128/#5 | Family Incertae Sedis (in South African Gold Mine Gp 1(SAGMCG-1)) | incertae_sedis,was_container | 312 | 4795963 | 5246631 | Candidatus Nitrosotalea | genus | silva:AJ535128/#6,ncbi:1078904 | | incertae_sedis | 313 | 5205311 | 4795963 | Crenarchaeote enrichment culture clone OREC-R1076 | species | silva:JF799651,ncbi:1050484 | | incertae_sedis_inherited | 314 | 5207939 | 4795963 | uncultured archaeon SAGMA-X | species | silva:AB050229,ncbi:140269 | | incertae_sedis_inherited | 315 | 5361996 | 4795963 | Nitrosotalea sp. 
Nd2 | species | ncbi:1499975 | | incertae_sedis_inherited | 316 | 4795964 | 4795963 | Candidatus Nitrosotalea devanaterra | species | ncbi:1078905 | | incertae_sedis_inherited | 317 | 5878063 | 4795963 | Candidatus Nitrosotalea environmental samples | no rank - terminal | ncbi:1617642 | | incertae_sedis_inherited,was_container | 318 | 5878064 | 4795963 | uncultured Candidatus Nitrosotalea sp. | species | ncbi:1617643 | | incertae_sedis_inherited,not_otu | 319 | 5246626 | 102415 | Marine Benthic Group B | no rank | silva:AB052992/#3 | | | 320 | 5207849 | 5246626 | uncultured archaeon APA3-11cm | species | silva:AF119137,ncbi:93188 | | | 321 | 5207568 | 5246626 | uncultured archaeon pPACMA-Y | species | silva:AB052992,ncbi:146990 | | | 322 | 5205935 | 5246626 | uncultured Desulfurococcus sp. | species | silva:AB240734,ncbi:158791 | | | 323 | 5205316 | 5246626 | uncultured crenarchaeote pBRKC86 | species | silva:AF118656,ncbi:91313 | | | 324 | 5205619 | 5246626 | uncultured crenarchaeote pBRKC129 | species | silva:AF118658,ncbi:91315 | | | 325 | 5208729 | 5246626 | uncultured archaeon VC2.1 Arc31 | species | silva:AF068822,ncbi:78316 | | | 326 | 5205622 | 5246626 | uncultured crenarchaeote pBRKC108 | species | silva:AF118664,ncbi:91318 | | | 327 | 5246650 | 102415 | HDBA-SITS413 | no rank - terminal | silva:HM187524/#3 | | | 328 | 5571543 | 102415 | Order Incertae Sedis | no rank - terminal | silva:EU239993/#4 | Order Incertae Sedis (silva:EU239993/#4) | incertae_sedis,was_container | 329 | 5571544 | 102415 | Family Incertae Sedis | no rank - terminal | silva:EU239993/#5 | Family Incertae Sedis (silva:EU239993/#5) | incertae_sedis,was_container | 330 | 4796256 | 102415 | Candidatus Nitrosocaldus | genus | silva:EU239993/#6,ncbi:498374 | | incertae_sedis | 331 | 4796257 | 4796256 | Candidatus Nitrosocaldus yellowstonii | species | ncbi:498375 | | incertae_sedis_inherited | 332 | 4796258 | 4796257 | Candidatus Nitrosocaldus yellowstonii HL72 | no rank - terminal | 
ncbi:1268556 | | incertae_sedis_inherited,infraspecific | 333 | 5205891 | 4796256 | Candidatus Nitrosocaldus environmental samples | no rank - terminal | ncbi:766500 | | incertae_sedis_inherited,was_container | 334 | 5205094 | 102415 | Thaumarchaeota environmental samples | no rank - terminal | ncbi:651140 | | was_container | 335 | 4795960 | 102415 | unclassified Thaumarchaeota | no rank - terminal | ncbi:651142 | | was_container,not_otu | 336 | 5571545 | 102415 | Nitrososphaeria | class | ncbi:1643678 | | | 337 | 686295 | 5571545 | Nitrososphaerales | order | ncbi:1033996 | | | 338 | 5878058 | 686295 | Nitrososphaerales environmental samples | no rank - terminal | ncbi:1740633 | | was_container | 339 | 686307 | 686295 | Nitrososphaeraceae | family | ncbi:1033997 | | | 340 | 5878055 | 686307 | Candidatus Nitrosocosmicus | genus | ncbi:1826864 | | | 341 | 5245645 | 5878055 | Thaumarchaeota archaeon MY3 | species | ncbi:1353260 | | | 342 | 5878057 | 5878055 | Candidatus Nitrosocosmicus sp. 
G61 | species | ncbi:1826872 | | | 343 | 5878056 | 5878055 | Candidatus Nitrosocosmicus franklandus | species | ncbi:1798806 | | | 344 | 5878059 | 686295 | uncultured Nitrososphaerales archaeon | species | ncbi:1740634 | | sibling_higher,not_otu | 345 | 5571817 | 102415 | uncultured marine thaumarchaeote KM3_170_G11 | species | ncbi:1456047 | | environmental,not_otu | 346 | 5572076 | 102415 | uncultured marine thaumarchaeote SAT1000_25_G12 | species | ncbi:1456399 | | environmental,not_otu | 347 | 5572081 | 102415 | uncultured marine thaumarchaeote SAT1000_27_B10 | species | ncbi:1456401 | | environmental,not_otu | 348 | 5205134 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b4-08 | species | ncbi:1238010 | | environmental | 349 | 5571608 | 102415 | uncultured marine thaumarchaeote KM3_25_B05 | species | ncbi:1456103 | | environmental,not_otu | 350 | 5572091 | 102415 | uncultured marine thaumarchaeote AD1000_21_H05 | species | ncbi:1455901 | | environmental,not_otu | 351 | 5571913 | 102415 | uncultured marine thaumarchaeote KM3_169_D08 | species | ncbi:1456040 | | environmental,not_otu | 352 | 5571779 | 102415 | uncultured marine thaumarchaeote SAT1000_45_H05 | species | ncbi:1456412 | | environmental,not_otu | 353 | 5205232 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b4-20 | species | ncbi:1238128 | | environmental | 354 | 5205205 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b4-19 | species | ncbi:1238127 | | environmental | 355 | 5571764 | 102415 | uncultured marine thaumarchaeote KM3_86_D01 | species | ncbi:1456320 | | environmental,not_otu | 356 | 5572010 | 102415 | uncultured marine thaumarchaeote KM3_34_C02 | species | ncbi:1456129 | | environmental,not_otu | 357 | 5205133 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b4-16 | species | ncbi:1238017 | | environmental | 358 | 5205165 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b1-13 | species | ncbi:1238099 | | environmental | 359 | 5205209 | 
102415 | thaumarchaeote enrichment culture clone Ec.FBa1b4-03 | species | ncbi:1238005 | | environmental | 360 | 5571753 | 102415 | uncultured marine thaumarchaeote KM3_28_B05 | species | ncbi:1456111 | | environmental,not_otu | 361 | 5571554 | 102415 | uncultured marine thaumarchaeote KM3_77_H05 | species | ncbi:1456288 | | environmental,not_otu | 362 | 5205196 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b9-08 | species | ncbi:1238029 | | environmental | 363 | 5572070 | 102415 | uncultured marine thaumarchaeote KM3_01_C08 | species | ncbi:1455951 | | environmental,not_otu | 364 | 5571846 | 102415 | uncultured marine thaumarchaeote KM3_82_G09 | species | ncbi:1456306 | | environmental,not_otu | 365 | 5205252 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-T16S-01 | species | ncbi:1237970 | | environmental | 366 | 5571835 | 102415 | uncultured marine thaumarchaeote KM3_02_H10 | species | ncbi:1455957 | | environmental,not_otu | 367 | 5571625 | 102415 | uncultured marine thaumarchaeote KM3_17_G07 | species | ncbi:1456063 | | environmental,not_otu | 368 | 5205168 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-T16S-04 | species | ncbi:1238062 | | environmental | 369 | 5571811 | 102415 | uncultured marine thaumarchaeote KM3_160_B06 | species | ncbi:1456030 | | environmental,not_otu | 370 | 5571635 | 102415 | uncultured marine thaumarchaeote KM3_70_D07 | species | ncbi:1456252 | | environmental,not_otu | 371 | 5205189 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-T16S-02 | species | ncbi:1238060 | | environmental | 372 | 5571866 | 102415 | uncultured marine thaumarchaeote KM3_130_H01 | species | ncbi:1456001 | | environmental,not_otu | 373 | 5571795 | 102415 | uncultured marine thaumarchaeote KM3_100_D10 | species | ncbi:1455979 | | environmental,not_otu | 374 | 5571963 | 102415 | uncultured marine thaumarchaeote KM3_71_H11 | species | ncbi:1456260 | | environmental,not_otu | 375 | 5571758 | 102415 | uncultured marine 
thaumarchaeote KM3_65_D11 | species | ncbi:1456225 | | environmental,not_otu | 376 | 5571906 | 102415 | uncultured marine thaumarchaeote SAT1000_40_A08 | species | ncbi:1456408 | | environmental,not_otu | 377 | 5571581 | 102415 | uncultured marine thaumarchaeote AD1000_66_F10 | species | ncbi:1455930 | | environmental,not_otu | 378 | 5571949 | 102415 | uncultured marine thaumarchaeote KM3_59_E10 | species | ncbi:1456211 | | environmental,not_otu | 379 | 5571887 | 102415 | uncultured marine thaumarchaeote KM3_94_B01 | species | ncbi:1456346 | | environmental,not_otu | 380 | 5571964 | 102415 | uncultured marine thaumarchaeote KM3_182_G12 | species | ncbi:1456067 | | environmental,not_otu | 381 | 5572029 | 102415 | uncultured marine thaumarchaeote KM3_153_E08 | species | ncbi:1456018 | | environmental,not_otu | 382 | 5205164 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b1-12 | species | ncbi:1238098 | | environmental | 383 | 5205154 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b1-23 | species | ncbi:1238107 | | environmental | 384 | 5572037 | 102415 | uncultured marine thaumarchaeote KM3_64_A03 | species | ncbi:1456220 | | environmental,not_otu | 385 | 5205180 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-TamoA-18 | species | ncbi:1237995 | | environmental | 386 | 5572014 | 102415 | uncultured marine thaumarchaeote KM3_145_B06 | species | ncbi:1456011 | | environmental,not_otu | 387 | 5571995 | 102415 | uncultured marine thaumarchaeote AD1000_70_C10 | species | ncbi:1455933 | | environmental,not_otu | 388 | 5571785 | 102415 | uncultured marine thaumarchaeote SAT1000_24_E05 | species | ncbi:1456397 | | environmental,not_otu | 389 | 5572009 | 102415 | uncultured marine thaumarchaeote KM3_03_B05 | species | ncbi:1455958 | | environmental,not_otu | 390 | 5571907 | 102415 | uncultured marine thaumarchaeote KM3_130_G11 | species | ncbi:1456000 | | environmental,not_otu | 391 | 5572043 | 102415 | uncultured marine thaumarchaeote 
KM3_41_D10 | species | ncbi:1456144 | | environmental,not_otu | 392 | 5571725 | 102415 | uncultured marine thaumarchaeote KM3_186_G04 | species | ncbi:1456071 | | environmental,not_otu | 393 | 5571703 | 102415 | uncultured marine thaumarchaeote KM3_33_G02 | species | ncbi:1456127 | | environmental,not_otu | 394 | 5571886 | 102415 | uncultured marine thaumarchaeote KM3_76_G12 | species | ncbi:1456285 | | environmental,not_otu | 395 | 5571609 | 102415 | uncultured marine thaumarchaeote KM3_140_D09 | species | ncbi:1456009 | | environmental,not_otu | 396 | 5571599 | 102415 | uncultured marine thaumarchaeote KM3_56_B06 | species | ncbi:1456201 | | environmental,not_otu | 397 | 5572038 | 102415 | uncultured marine thaumarchaeote SAT1000_09_C08 | species | ncbi:1456370 | | environmental,not_otu | 398 | 5571904 | 102415 | uncultured marine thaumarchaeote KM3_03_H02 | species | ncbi:1455963 | | environmental,not_otu | 399 | 5571917 | 102415 | uncultured marine thaumarchaeote AD1000_40_H03 | species | ncbi:1455914 | | environmental,not_otu | 400 | 5571843 | 102415 | uncultured marine thaumarchaeote KM3_35_E05 | species | ncbi:1456134 | | environmental,not_otu | 401 | 5571920 | 102415 | uncultured marine thaumarchaeote KM3_87_A02 | species | ncbi:1456325 | | environmental,not_otu | 402 | 5205229 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa2b1-08 | species | ncbi:1238048 | | environmental | 403 | 5571885 | 102415 | uncultured marine thaumarchaeote KM3_18_D11 | species | ncbi:1456075 | | environmental,not_otu | 404 | 5205136 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b4-10 | species | ncbi:1238012 | | environmental | 405 | 5571601 | 102415 | uncultured marine thaumarchaeote KM3_15_C08 | species | ncbi:1456026 | | environmental,not_otu | 406 | 5205238 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b4-14 | species | ncbi:1238122 | | environmental | 407 | 5571654 | 102415 | uncultured marine thaumarchaeote KM3_44_G08 | species | ncbi:1456152 
| | environmental,not_otu | 408 | 5571799 | 102415 | uncultured thaumarchaeote Rifle_16ft_4_minimus_1872 | species | ncbi:1665209 | | environmental,not_otu | 409 | 5571600 | 102415 | uncultured marine thaumarchaeote SAT1000_44_H06 | species | ncbi:1456411 | | environmental,not_otu | 410 | 5571937 | 102415 | uncultured marine thaumarchaeote KM3_79_H02 | species | ncbi:1456297 | | environmental,not_otu | 411 | 5571877 | 102415 | uncultured marine thaumarchaeote KM3_46_G12 | species | ncbi:1456162 | | environmental,not_otu | 412 | 5571570 | 102415 | uncultured marine thaumarchaeote SAT1000_06_B02 | species | ncbi:1456361 | | environmental,not_otu | 413 | 5205216 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-T16S-04 | species | ncbi:1237973 | | environmental | 414 | 5571911 | 102415 | uncultured marine thaumarchaeote KM3_170_F12 | species | ncbi:1456046 | | environmental,not_otu | 415 | 5571822 | 102415 | uncultured marine thaumarchaeote KM3_53_F06 | species | ncbi:1456185 | | environmental,not_otu | 416 | 5571681 | 102415 | uncultured marine thaumarchaeote KM3_53_E01 | species | ncbi:1456183 | | environmental,not_otu | 417 | 5571783 | 102415 | uncultured marine thaumarchaeote KM3_187_A08 | species | ncbi:1456073 | | environmental,not_otu | 418 | 5571759 | 102415 | uncultured marine thaumarchaeote AD1000_31_F12 | species | ncbi:1455906 | | environmental,not_otu | 419 | 5571677 | 102415 | uncultured marine thaumarchaeote SAT1000_06_A02 | species | ncbi:1456359 | | environmental,not_otu | 420 | 5571711 | 102415 | uncultured marine thaumarchaeote KM3_36_B08 | species | ncbi:1456135 | | environmental,not_otu | 421 | 5571693 | 102415 | uncultured marine thaumarchaeote KM3_85_A07 | species | ncbi:1456315 | | environmental,not_otu | 422 | 5571766 | 102415 | uncultured marine thaumarchaeote KM3_76_B07 | species | ncbi:1456283 | | environmental,not_otu | 423 | 5205167 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-TamoA-03 | species | ncbi:1238068 | | 
environmental | 424 | 5571586 | 102415 | uncultured marine thaumarchaeote KM3_153_F09 | species | ncbi:1456019 | | environmental,not_otu | 425 | 5571925 | 102415 | uncultured marine thaumarchaeote KM3_39_A11 | species | ncbi:1456140 | | environmental,not_otu | 426 | 5571710 | 102415 | uncultured marine thaumarchaeote KM3_01_G08 | species | ncbi:1455954 | | environmental,not_otu | 427 | 5571585 | 102415 | uncultured marine thaumarchaeote KM3_175_E11 | species | ncbi:1456055 | | environmental,not_otu | 428 | 5571837 | 102415 | uncultured marine thaumarchaeote KM3_204_F10 | species | ncbi:1456098 | | environmental,not_otu | 429 | 5571592 | 102415 | uncultured marine thaumarchaeote KM3_90_H07 | species | ncbi:1456345 | | environmental,not_otu | 430 | 5571567 | 102415 | uncultured marine thaumarchaeote AD1000_38_A02 | species | ncbi:1455911 | | environmental,not_otu | 431 | 5571560 | 102415 | uncultured marine thaumarchaeote AD1000_18_B11 | species | ncbi:1455896 | | environmental,not_otu | 432 | 5205127 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-TamoA-10 | species | ncbi:1237989 | | environmental | 433 | 5571615 | 102415 | uncultured marine thaumarchaeote AD1000_07_E11 | species | ncbi:1455886 | | environmental,not_otu | 434 | 5571928 | 102415 | uncultured marine thaumarchaeote KM3_62_H05 | species | ncbi:1456218 | | environmental,not_otu | 435 | 5571952 | 102415 | uncultured marine thaumarchaeote AD1000_31_G03 | species | ncbi:1455907 | | environmental,not_otu | 436 | 5205178 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-TamoA-12 | species | ncbi:1237991 | | environmental | 437 | 5571798 | 102415 | uncultured marine thaumarchaeote KM3_42_G11 | species | ncbi:1456149 | | environmental,not_otu | 438 | 5571992 | 102415 | uncultured marine thaumarchaeote AD1000_69_E02 | species | ncbi:1455932 | | environmental,not_otu | 439 | 5571744 | 102415 | uncultured marine thaumarchaeote KM3_156_B03 | species | ncbi:1456022 | | environmental,not_otu | 440 
| 5205193 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa2b1-02 | species | ncbi:1238042 | | environmental | 441 | 5205104 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-TamoA-05 | species | ncbi:1238070 | | environmental | 442 | 5205249 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-TamoA-24 | species | ncbi:1238088 | | environmental | 443 | 5205117 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-T16S-01 | species | ncbi:1238059 | | environmental | 444 | 5571648 | 102415 | uncultured marine thaumarchaeote KM3_75_C09 | species | ncbi:1456277 | | environmental,not_otu | 445 | 5571563 | 102415 | uncultured marine thaumarchaeote KM3_72_A09 | species | ncbi:1456261 | | environmental,not_otu | 446 | 5571856 | 102415 | uncultured marine thaumarchaeote KM3_72_D04 | species | ncbi:1456262 | | environmental,not_otu | 447 | 5571734 | 102415 | uncultured marine thaumarchaeote AD1000_71_D06 | species | ncbi:1455937 | | environmental,not_otu | 448 | 5571809 | 102415 | uncultured marine thaumarchaeote KM3_45_E05 | species | ncbi:1456156 | | environmental,not_otu | 449 | 5205101 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b9-02 | species | ncbi:1238023 | | environmental | 450 | 5571980 | 102415 | uncultured marine thaumarchaeote KM3_201_G04 | species | ncbi:1456094 | | environmental,not_otu | 451 | 5571714 | 102415 | uncultured marine thaumarchaeote SAT1000_04_B06 | species | ncbi:1456354 | | environmental,not_otu | 452 | 5205241 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa2b1-18 | species | ncbi:1238057 | | environmental | 453 | 5571660 | 102415 | uncultured marine thaumarchaeote KM3_176_H06 | species | ncbi:1456057 | | environmental,not_otu | 454 | 5572093 | 102415 | uncultured marine thaumarchaeote KM3_178_G06 | species | ncbi:1456060 | | environmental,not_otu | 455 | 5571722 | 102415 | uncultured marine thaumarchaeote KM3_66_A07 | species | ncbi:1456227 | | environmental,not_otu | 456 | 5571924 | 102415 | 
uncultured marine thaumarchaeote KM3_67_E09 | species | ncbi:1456238 | | environmental,not_otu | 457 | 5571602 | 102415 | uncultured marine thaumarchaeote KM3_40_F03 | species | ncbi:1456143 | | environmental,not_otu | 458 | 5571639 | 102415 | uncultured marine thaumarchaeote SAT1000_09_H09 | species | ncbi:1456371 | | environmental,not_otu | 459 | 5205224 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b4-01 | species | ncbi:1238109 | | environmental | 460 | 5571627 | 102415 | uncultured marine thaumarchaeote KM3_61_F01 | species | ncbi:1456213 | | environmental,not_otu | 461 | 5571595 | 102415 | uncultured marine thaumarchaeote KM3_23_F10 | species | ncbi:1456100 | | environmental,not_otu | 462 | 5571851 | 102415 | uncultured marine thaumarchaeote KM3_52_H04 | species | ncbi:1456178 | | environmental,not_otu | 463 | 5571808 | 102415 | uncultured marine thaumarchaeote KM3_135_A07 | species | ncbi:1456003 | | environmental,not_otu | 464 | 5571998 | 102415 | uncultured marine thaumarchaeote KM3_168_C12 | species | ncbi:1456038 | | environmental,not_otu | 465 | 5571884 | 102415 | uncultured marine thaumarchaeote KM3_05_F10 | species | ncbi:1455969 | | environmental,not_otu | 466 | 5571634 | 102415 | uncultured marine thaumarchaeote KM3_88_E12 | species | ncbi:1456336 | | environmental,not_otu | 467 | 5571664 | 102415 | uncultured marine thaumarchaeote KM3_84_A09 | species | ncbi:1456310 | | environmental,not_otu | 468 | 5571978 | 102415 | uncultured marine thaumarchaeote KM3_04_H11 | species | ncbi:1455968 | | environmental,not_otu | 469 | 5571938 | 102415 | uncultured marine thaumarchaeote KM3_25_D06 | species | ncbi:1456104 | | environmental,not_otu | 470 | 5571671 | 102415 | uncultured marine thaumarchaeote KM3_85_H09 | species | ncbi:1456318 | | environmental,not_otu | 471 | 5571854 | 102415 | uncultured marine thaumarchaeote SAT1000_06_F08 | species | ncbi:1456362 | | environmental,not_otu | 472 | 5571985 | 102415 | uncultured marine thaumarchaeote 
KM3_90_E04 | species | ncbi:1456343 | | environmental,not_otu | 473 | 5571981 | 102415 | uncultured marine thaumarchaeote KM3_04_E09 | species | ncbi:1455966 | | environmental,not_otu | 474 | 5572024 | 102415 | uncultured marine thaumarchaeote KM3_162_C12 | species | ncbi:1456032 | | environmental,not_otu | 475 | 5571721 | 102415 | uncultured marine thaumarchaeote SAT1000_50_F07 | species | ncbi:1456417 | | environmental,not_otu | 476 | 5571598 | 102415 | uncultured marine thaumarchaeote SAT1000_12_D12 | species | ncbi:1456378 | | environmental,not_otu | 477 | 5571643 | 102415 | uncultured marine thaumarchaeote KM3_67_B10 | species | ncbi:1456233 | | environmental,not_otu | 478 | 5571742 | 102415 | uncultured marine thaumarchaeote KM3_01_F02 | species | ncbi:1455952 | | environmental,not_otu | 479 | 5571919 | 102415 | uncultured marine thaumarchaeote KM3_73_B11 | species | ncbi:1456265 | | environmental,not_otu | 480 | 5571966 | 102415 | uncultured marine thaumarchaeote KM3_191_D11 | species | ncbi:1456079 | | environmental,not_otu | 481 | 5205144 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b4-04 | species | ncbi:1238112 | | environmental | 482 | 5205114 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa2b1-15 | species | ncbi:1238054 | | environmental | 483 | 5571984 | 102415 | uncultured marine thaumarchaeote KM3_48_E01 | species | ncbi:1456170 | | environmental,not_otu | 484 | 5571745 | 102415 | uncultured marine thaumarchaeote KM3_198_G09 | species | ncbi:1456089 | | environmental,not_otu | 485 | 5572054 | 102415 | uncultured marine thaumarchaeote AD1000_45_G09 | species | ncbi:1455919 | | environmental,not_otu | 486 | 5571673 | 102415 | uncultured marine thaumarchaeote KM3_73_E02 | species | ncbi:1456267 | | environmental,not_otu | 487 | 5571967 | 102415 | uncultured marine thaumarchaeote KM3_136_D12 | species | ncbi:1456005 | | environmental,not_otu | 488 | 5571596 | 102415 | uncultured marine thaumarchaeote KM3_34_B07 | species | 
ncbi:1456128 | | environmental,not_otu | 489 | 5205125 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b9-19 | species | ncbi:1238039 | | environmental | 490 | 5205179 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-TamoA-13 | species | ncbi:1237992 | | environmental | 491 | 5571613 | 102415 | uncultured marine thaumarchaeote KM3_161_D03 | species | ncbi:1456031 | | environmental,not_otu | 492 | 5571584 | 102415 | uncultured marine thaumarchaeote KM3_86_F11 | species | ncbi:1456322 | | environmental,not_otu | 493 | 5572074 | 102415 | uncultured marine thaumarchaeote KM3_199_E03 | species | ncbi:1456092 | | environmental,not_otu | 494 | 5571786 | 102415 | uncultured marine thaumarchaeote KM3_05_H01 | species | ncbi:1455971 | | environmental,not_otu | 495 | 5205246 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-TamoA-17 | species | ncbi:1238081 | | environmental | 496 | 5571804 | 102415 | uncultured marine thaumarchaeote SAT1000_48_C08 | species | ncbi:1456415 | | environmental,not_otu | 497 | 5571773 | 102415 | uncultured marine thaumarchaeote AD1000_14_H02 | species | ncbi:1455893 | | environmental,not_otu | 498 | 5205146 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b4-06 | species | ncbi:1238114 | | environmental | 499 | 5205187 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-TamoA-01 | species | ncbi:1238066 | | environmental | 500 | 5572068 | 102415 | uncultured marine thaumarchaeote KM3_32_G12 | species | ncbi:1456124 | | environmental,not_otu | 501 | 5572078 | 102415 | uncultured marine thaumarchaeote KM3_162_H04 | species | ncbi:1456033 | | environmental,not_otu | 502 | 5571990 | 102415 | uncultured marine thaumarchaeote SAT1000_05_A05 | species | ncbi:1456356 | | environmental,not_otu | 503 | 5572025 | 102415 | uncultured marine thaumarchaeote AD1000_41_B03 | species | ncbi:1455915 | | environmental,not_otu | 504 | 5571999 | 102415 | uncultured marine thaumarchaeote KM3_195_B01 | species | ncbi:1456083 | 
| environmental,not_otu | 505 | 5205230 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b9-20 | species | ncbi:1238040 | | environmental | 506 | 5571894 | 102415 | uncultured marine thaumarchaeote KM3_53_C08 | species | ncbi:1456181 | | environmental,not_otu | 507 | 5571674 | 102415 | uncultured marine thaumarchaeote KM3_103_A05 | species | ncbi:1455980 | | environmental,not_otu | 508 | 5572072 | 102415 | uncultured marine thaumarchaeote AD1000_14_F02 | species | ncbi:1455892 | | environmental,not_otu | 509 | 5205143 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b4-05 | species | ncbi:1238113 | | environmental | 510 | 5572031 | 102415 | uncultured marine thaumarchaeote KM3_84_E02 | species | ncbi:1456312 | | environmental,not_otu | 511 | 5205152 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-TamoA-17 | species | ncbi:1237994 | | environmental | 512 | 5205158 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b1-10 | species | ncbi:1238096 | | environmental | 513 | 5572089 | 102415 | uncultured marine thaumarchaeote KM3_54_C03 | species | ncbi:1456190 | | environmental,not_otu | 514 | 5571872 | 102415 | uncultured marine thaumarchaeote KM3_08_B06 | species | ncbi:1455978 | | environmental,not_otu | 515 | 5571922 | 102415 | uncultured marine thaumarchaeote KM3_16_C10 | species | ncbi:1456042 | | environmental,not_otu | 516 | 5571876 | 102415 | uncultured marine thaumarchaeote KM3_12_C10 | species | ncbi:1455998 | | environmental,not_otu | 517 | 5571750 | 102415 | uncultured marine thaumarchaeote KM3_55_A12 | species | ncbi:1456196 | | environmental,not_otu | 518 | 5571571 | 102415 | uncultured marine thaumarchaeote SAT1000_09_B07 | species | ncbi:1456367 | | environmental,not_otu | 519 | 5571629 | 102415 | uncultured marine thaumarchaeote AD1000_69_B10 | species | ncbi:1455931 | | environmental,not_otu | 520 | 5571988 | 102415 | uncultured marine thaumarchaeote KM3_196_E01 | species | ncbi:1456085 | | environmental,not_otu | 
521 | 5205124 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b9-18 | species | ncbi:1238038 | | environmental | 522 | 5571797 | 102415 | uncultured marine thaumarchaeote AD1000_36_B08 | species | ncbi:1455910 | | environmental,not_otu | 523 | 5571882 | 102415 | uncultured marine thaumarchaeote KM3_26_F01 | species | ncbi:1456108 | | environmental,not_otu | 524 | 5571604 | 102415 | uncultured marine thaumarchaeote KM3_56_C06 | species | ncbi:1456202 | | environmental,not_otu | 525 | 5205113 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa2b1-14 | species | ncbi:1238053 | | environmental | 526 | 5572059 | 102415 | uncultured marine thaumarchaeote SAT1000_15_E07 | species | ncbi:1456385 | | environmental,not_otu | 527 | 5571781 | 102415 | uncultured marine thaumarchaeote KM3_53_E03 | species | ncbi:1456184 | | environmental,not_otu | 528 | 5571736 | 102415 | uncultured marine thaumarchaeote KM3_41_D11 | species | ncbi:1456145 | | environmental,not_otu | 529 | 5571676 | 102415 | uncultured marine thaumarchaeote KM3_128_G11 | species | ncbi:1455996 | | environmental,not_otu | 530 | 5205110 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa2b1-11 | species | ncbi:1238051 | | environmental | 531 | 5571723 | 102415 | uncultured marine thaumarchaeote KM3_188_F10 | species | ncbi:1456074 | | environmental,not_otu | 532 | 5571620 | 102415 | uncultured marine thaumarchaeote AD1000_54_F06 | species | ncbi:1455925 | | environmental,not_otu | 533 | 5571945 | 102415 | uncultured marine thaumarchaeote KM3_26_G04 | species | ncbi:1456109 | | environmental,not_otu | 534 | 5571640 | 102415 | uncultured marine thaumarchaeote KM3_175_G11 | species | ncbi:1456056 | | environmental,not_otu | 535 | 5205151 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-T16S-08 | species | ncbi:1237977 | | environmental | 536 | 5571819 | 102415 | uncultured marine thaumarchaeote KM3_15_F02 | species | ncbi:1456029 | | environmental,not_otu | 537 | 5572087 | 102415 | 
uncultured marine thaumarchaeote KM3_51_F10 | species | ncbi:1456175 | | environmental,not_otu | 538 | 5571796 | 102415 | uncultured marine thaumarchaeote KM3_195_B03 | species | ncbi:1456084 | | environmental,not_otu | 539 | 5572019 | 102415 | uncultured marine thaumarchaeote AD1000_39_D02 | species | ncbi:1455912 | | environmental,not_otu | 540 | 5571842 | 102415 | uncultured marine thaumarchaeote KM3_88_D06 | species | ncbi:1456334 | | environmental,not_otu | 541 | 5205176 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-TamoA-14 | species | ncbi:1237993 | | environmental | 542 | 5571829 | 102415 | uncultured marine thaumarchaeote KM3_57_F01 | species | ncbi:1456208 | | environmental,not_otu | 543 | 5572047 | 102415 | uncultured marine thaumarchaeote KM3_31_F03 | species | ncbi:1456119 | | environmental,not_otu | 544 | 5205116 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa2b1-19 | species | ncbi:1238058 | | environmental | 545 | 5571898 | 102415 | uncultured marine thaumarchaeote KM3_03_H03 | species | ncbi:1455964 | | environmental,not_otu | 546 | 5205185 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-T16S-06 | species | ncbi:1238064 | | environmental | 547 | 5571942 | 102415 | uncultured marine thaumarchaeote KM3_85_C11 | species | ncbi:1456316 | | environmental,not_otu | 548 | 5571977 | 102415 | uncultured marine thaumarchaeote KM3_47_F06 | species | ncbi:1456168 | | environmental,not_otu | 549 | 5205111 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa2b1-10 | species | ncbi:1238050 | | environmental | 550 | 5571617 | 102415 | uncultured marine thaumarchaeote AD1000_106_A06 | species | ncbi:1455888 | | environmental,not_otu | 551 | 5571713 | 102415 | uncultured marine thaumarchaeote SAT1000_06_A07 | species | ncbi:1456360 | | environmental,not_otu | 552 | 5205121 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b9-13 | species | ncbi:1238033 | | environmental | 553 | 5572062 | 102415 | uncultured marine 
thaumarchaeote KM3_06_C02 | species | ncbi:1455976 | | environmental,not_otu | 554 | 5572064 | 102415 | uncultured marine thaumarchaeote KM3_33_B12 | species | ncbi:1456125 | | environmental,not_otu | 555 | 5571761 | 102415 | uncultured marine thaumarchaeote KM3_15_A07 | species | ncbi:1456025 | | environmental,not_otu | 556 | 5571860 | 102415 | uncultured marine thaumarchaeote KM3_11_E10 | species | ncbi:1455991 | | environmental,not_otu | 557 | 5205115 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa2b1-17 | species | ncbi:1238056 | | environmental | 558 | 5571770 | 102415 | uncultured marine thaumarchaeote KM3_89_A10 | species | ncbi:1456338 | | environmental,not_otu | 559 | 5571719 | 102415 | uncultured marine thaumarchaeote KM3_38_E04 | species | ncbi:1456139 | | environmental,not_otu | 560 | 5572053 | 102415 | uncultured marine thaumarchaeote KM3_82_D11 | species | ncbi:1456305 | | environmental,not_otu | 561 | 5571918 | 102415 | uncultured marine thaumarchaeote KM3_24_H04 | species | ncbi:1456101 | | environmental,not_otu | 562 | 5571915 | 102415 | uncultured marine thaumarchaeote KM3_73_F02 | species | ncbi:1456268 | | environmental,not_otu | 563 | 5572033 | 102415 | uncultured marine thaumarchaeote KM3_87_C09 | species | ncbi:1456327 | | environmental,not_otu | 564 | 5571874 | 102415 | uncultured marine thaumarchaeote KM3_105_E03 | species | ncbi:1455981 | | environmental,not_otu | 565 | 5205129 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b4-18 | species | ncbi:1238019 | | environmental | 566 | 5571838 | 102415 | uncultured marine thaumarchaeote AD1000_46_F05 | species | ncbi:1455921 | | environmental,not_otu | 567 | 5205217 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-T16S-02 | species | ncbi:1237971 | | environmental | 568 | 5205244 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-TamoA-21 | species | ncbi:1238085 | | environmental | 569 | 5205099 | 102415 | thaumarchaeote enrichment culture clone 
Ec.FBa-TamoA-04 | species | ncbi:1237985 | | environmental | 570 | 5571557 | 102415 | uncultured marine thaumarchaeote KM3_71_H03 | species | ncbi:1456259 | | environmental,not_otu | 571 | 5571960 | 102415 | uncultured marine thaumarchaeote AD1000_80_D11 | species | ncbi:1455942 | | environmental,not_otu | 572 | 5571688 | 102415 | uncultured marine thaumarchaeote KM3_73_E01 | species | ncbi:1456266 | | environmental,not_otu | 573 | 5572039 | 102415 | uncultured marine thaumarchaeote KM3_53_F08 | species | ncbi:1456186 | | environmental,not_otu | 574 | 5571566 | 102415 | uncultured marine thaumarchaeote KM3_70_E10 | species | ncbi:1456254 | | environmental,not_otu | 575 | 5205157 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b1-06 | species | ncbi:1238094 | | environmental | 576 | 5572058 | 102415 | uncultured marine thaumarchaeote KM3_16_E08 | species | ncbi:1456044 | | environmental,not_otu | 577 | 5571982 | 102415 | uncultured marine thaumarchaeote AD1000_100_C06 | species | ncbi:1455887 | | environmental,not_otu | 578 | 5571923 | 102415 | uncultured marine thaumarchaeote AD1000_70_G10 | species | ncbi:1455934 | | environmental,not_otu | 579 | 5572060 | 102415 | uncultured marine thaumarchaeote KM3_168_C06 | species | ncbi:1456037 | | environmental,not_otu | 580 | 5205095 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-TamoA-02 | species | ncbi:1237983 | | environmental | 581 | 5572083 | 102415 | uncultured marine thaumarchaeote SAT1000_22_C02 | species | ncbi:1456394 | | environmental,not_otu | 582 | 5205255 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-TamoA-22 | species | ncbi:1237999 | | environmental | 583 | 5571547 | 102415 | uncultured marine thaumarchaeote KM3_70_F07 | species | ncbi:1456256 | | environmental,not_otu | 584 | 5572088 | 102415 | uncultured marine thaumarchaeote KM3_06_B11 | species | ncbi:1455975 | | environmental,not_otu | 585 | 5571852 | 102415 | uncultured marine thaumarchaeote AD1000_19_G07 | species | 
ncbi:1455897 | | environmental,not_otu | 586 | 5571864 | 102415 | uncultured marine thaumarchaeote KM3_45_A02 | species | ncbi:1456154 | | environmental,not_otu | 587 | 5572018 | 102415 | uncultured marine thaumarchaeote KM3_74_C01 | species | ncbi:1456269 | | environmental,not_otu | 588 | 5205126 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b4-17 | species | ncbi:1238018 | | environmental | 589 | 5571805 | 102415 | uncultured marine thaumarchaeote KM3_179_B04 | species | ncbi:1456061 | | environmental,not_otu | 590 | 5571606 | 102415 | uncultured marine thaumarchaeote KM3_200_B02 | species | ncbi:1456093 | | environmental,not_otu | 591 | 5571767 | 102415 | uncultured marine thaumarchaeote AD1000_06_F06 | species | ncbi:1455885 | | environmental,not_otu | 592 | 5571661 | 102415 | uncultured marine thaumarchaeote KM3_79_H09 | species | ncbi:1456299 | | environmental,not_otu | 593 | 5571612 | 102415 | uncultured marine thaumarchaeote KM3_95_D02 | species | ncbi:1456347 | | environmental,not_otu | 594 | 5205097 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-TamoA-08 | species | ncbi:1237987 | | environmental | 595 | 5571950 | 102415 | uncultured marine thaumarchaeote AD1000_79_B02 | species | ncbi:1455941 | | environmental,not_otu | 596 | 5205221 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b1-18 | species | ncbi:1238102 | | environmental | 597 | 5571827 | 102415 | uncultured marine thaumarchaeote SAT1000_27_H05 | species | ncbi:1456402 | | environmental,not_otu | 598 | 5571587 | 102415 | uncultured marine thaumarchaeote KM3_65_A09 | species | ncbi:1456223 | | environmental,not_otu | 599 | 5205096 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-T16S-13 | species | ncbi:1237982 | | environmental | 600 | 5571769 | 102415 | uncultured marine thaumarchaeote KM3_55_G04 | species | ncbi:1456199 | | environmental,not_otu | 601 | 5571791 | 102415 | uncultured marine thaumarchaeote KM3_29_F10 | species | ncbi:1456114 | | 
environmental,not_otu | 602 | 5572001 | 102415 | uncultured marine thaumarchaeote KM3_46_G10 | species | ncbi:1456161 | | environmental,not_otu | 603 | 5571562 | 102415 | uncultured marine thaumarchaeote AD1000_44_B05 | species | ncbi:1455917 | | environmental,not_otu | 604 | 5571593 | 102415 | uncultured marine thaumarchaeote KM3_115_A11 | species | ncbi:1455988 | | environmental,not_otu | 605 | 5571971 | 102415 | uncultured marine thaumarchaeote KM3_201_G07 | species | ncbi:1456095 | | environmental,not_otu | 606 | 5571706 | 102415 | uncultured marine thaumarchaeote KM3_79_E03 | species | ncbi:1456296 | | environmental,not_otu | 607 | 5205199 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b4-19 | species | ncbi:1238020 | | environmental | 608 | 5571738 | 102415 | uncultured marine thaumarchaeote KM3_197_B03 | species | ncbi:1456087 | | environmental,not_otu | 609 | 5571741 | 102415 | uncultured marine thaumarchaeote KM3_156_A10 | species | ncbi:1456021 | | environmental,not_otu | 610 | 5572084 | 102415 | uncultured marine thaumarchaeote KM3_70_B05 | species | ncbi:1456248 | | environmental,not_otu | 611 | 5571863 | 102415 | uncultured marine thaumarchaeote KM3_29_B12 | species | ncbi:1456113 | | environmental,not_otu | 612 | 5571631 | 102415 | uncultured marine thaumarchaeote KM3_43_G12 | species | ncbi:1456151 | | environmental,not_otu | 613 | 5205132 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b4-15 | species | ncbi:1238016 | | environmental | 614 | 5572005 | 102415 | uncultured marine thaumarchaeote KM3_30_B02 | species | ncbi:1456115 | | environmental,not_otu | 615 | 5571792 | 102415 | uncultured marine thaumarchaeote KM3_172_D05 | species | ncbi:1456048 | | environmental,not_otu | 616 | 5571855 | 102415 | uncultured marine thaumarchaeote KM3_74_G04 | species | ncbi:1456274 | | environmental,not_otu | 617 | 5205233 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b4-16 | species | ncbi:1238124 | | environmental | 618 | 
5205109 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-TamoA-11 | species | ncbi:1238075 | | environmental | 619 | 5571979 | 102415 | uncultured marine thaumarchaeote SAT1000_23_E11 | species | ncbi:1456396 | | environmental,not_otu | 620 | 5571836 | 102415 | uncultured marine thaumarchaeote KM3_67_E02 | species | ncbi:1456235 | | environmental,not_otu | 621 | 5205135 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b4-09 | species | ncbi:1238011 | | environmental | 622 | 5571832 | 102415 | uncultured marine thaumarchaeote KM3_04_H06 | species | ncbi:1455967 | | environmental,not_otu | 623 | 5571968 | 102415 | uncultured marine thaumarchaeote KM3_86_B06 | species | ncbi:1456319 | | environmental,not_otu | 624 | 5571707 | 102415 | uncultured marine thaumarchaeote KM3_197_F10 | species | ncbi:1456088 | | environmental,not_otu | 625 | 5205141 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b4-03 | species | ncbi:1238111 | | environmental | 626 | 5205140 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b4-10 | species | ncbi:1238118 | | environmental | 627 | 5572013 | 102415 | uncultured marine thaumarchaeote SAT1000_17_H05 | species | ncbi:1456390 | | environmental,not_otu | 628 | 5572008 | 102415 | uncultured marine thaumarchaeote KM3_10_C07 | species | ncbi:1455986 | | environmental,not_otu | 629 | 5572021 | 102415 | uncultured marine thaumarchaeote KM3_62_H02 | species | ncbi:1456217 | | environmental,not_otu | 630 | 5572077 | 102415 | uncultured marine thaumarchaeote KM3_64_C01 | species | ncbi:1456221 | | environmental,not_otu | 631 | 5571987 | 102415 | uncultured marine thaumarchaeote SAT1000_51_C10 | species | ncbi:1456419 | | environmental,not_otu | 632 | 5571729 | 102415 | uncultured marine thaumarchaeote KM3_151_D03 | species | ncbi:1456016 | | environmental,not_otu | 633 | 5205182 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-TamoA-20 | species | ncbi:1237997 | | environmental | 634 | 5571825 | 102415 | 
uncultured marine thaumarchaeote SAT1000_25_G11 | species | ncbi:1456398 | | environmental,not_otu | 635 | 5205198 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b4-20 | species | ncbi:1238021 | | environmental | 636 | 5572046 | 102415 | uncultured marine thaumarchaeote KM3_53_D03 | species | ncbi:1456182 | | environmental,not_otu | 637 | 5205142 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b4-02 | species | ncbi:1238110 | | environmental | 638 | 5571954 | 102415 | uncultured marine thaumarchaeote KM3_190_B07 | species | ncbi:1456077 | | environmental,not_otu | 639 | 5572073 | 102415 | uncultured marine thaumarchaeote KM3_40_C06 | species | ncbi:1456142 | | environmental,not_otu | 640 | 5205254 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-TamoA-11 | species | ncbi:1237990 | | environmental | 641 | 5571972 | 102415 | uncultured marine thaumarchaeote KM3_106_G11 | species | ncbi:1455984 | | environmental,not_otu | 642 | 5205212 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-TamoA-24 | species | ncbi:1238001 | | environmental | 643 | 5571733 | 102415 | uncultured marine thaumarchaeote KM3_87_C05 | species | ncbi:1456326 | | environmental,not_otu | 644 | 5205240 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b9-16 | species | ncbi:1238036 | | environmental | 645 | 5571776 | 102415 | uncultured marine thaumarchaeote KM3_163_G06 | species | ncbi:1456034 | | environmental,not_otu | 646 | 5571644 | 102415 | uncultured marine thaumarchaeote KM3_54_F04 | species | ncbi:1456191 | | environmental,not_otu | 647 | 5571623 | 102415 | uncultured marine thaumarchaeote AD1000_98_C07 | species | ncbi:1455949 | | environmental,not_otu | 648 | 5572000 | 102415 | uncultured marine thaumarchaeote KM3_81_E07 | species | ncbi:1456300 | | environmental,not_otu | 649 | 5571555 | 102415 | uncultured marine thaumarchaeote SAT1000_07_E02 | species | ncbi:1456363 | | environmental,not_otu | 650 | 5572004 | 102415 | uncultured marine 
thaumarchaeote AD1000_46_C12 | species | ncbi:1455920 | | environmental,not_otu | 651 | 5571908 | 102415 | uncultured marine thaumarchaeote KM3_31_F07 | species | ncbi:1456120 | | environmental,not_otu | 652 | 5571572 | 102415 | uncultured marine thaumarchaeote KM3_12_B11 | species | ncbi:1455997 | | environmental,not_otu | 653 | 5571970 | 102415 | uncultured marine thaumarchaeote KM3_40_A11 | species | ncbi:1456141 | | environmental,not_otu | 654 | 5572027 | 102415 | uncultured marine thaumarchaeote KM3_54_G11 | species | ncbi:1456193 | | environmental,not_otu | 655 | 5205211 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b4-01 | species | ncbi:1238003 | | environmental | 656 | 5205169 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b1-22 | species | ncbi:1238106 | | environmental | 657 | 5571873 | 102415 | uncultured marine thaumarchaeote KM3_167_H09 | species | ncbi:1456036 | | environmental,not_otu | 658 | 5571642 | 102415 | uncultured marine thaumarchaeote KM3_87_F05 | species | ncbi:1456329 | | environmental,not_otu | 659 | 5571868 | 102415 | uncultured marine thaumarchaeote KM3_61_F08 | species | ncbi:1456214 | | environmental,not_otu | 660 | 5571989 | 102415 | uncultured marine thaumarchaeote KM3_86_G02 | species | ncbi:1456323 | | environmental,not_otu | 661 | 5572030 | 102415 | uncultured marine thaumarchaeote SAT1000_07_E05 | species | ncbi:1456364 | | environmental,not_otu | 662 | 5572028 | 102415 | uncultured marine thaumarchaeote KM3_75_F03 | species | ncbi:1456279 | | environmental,not_otu | 663 | 5571774 | 102415 | uncultured marine thaumarchaeote AD1000_17_C04 | species | ncbi:1455895 | | environmental,not_otu | 664 | 5572049 | 102415 | uncultured marine thaumarchaeote KM3_46_D02 | species | ncbi:1456158 | | environmental,not_otu | 665 | 5572063 | 102415 | uncultured marine thaumarchaeote SAT1000_41_C02 | species | ncbi:1456409 | | environmental,not_otu | 666 | 5205106 | 102415 | thaumarchaeote enrichment culture clone 
Ec.FBb-TamoA-12 | species | ncbi:1238076 | | environmental | 667 | 5571965 | 102415 | uncultured marine thaumarchaeote KM3_90_C11 | species | ncbi:1456342 | | environmental,not_otu | 668 | 5571892 | 102415 | uncultured marine thaumarchaeote KM3_32_D07 | species | ncbi:1456123 | | environmental,not_otu | 669 | 5205202 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b9-03 | species | ncbi:1238024 | | environmental | 670 | 5571605 | 102415 | uncultured marine thaumarchaeote SAT1000_53_A12 | species | ncbi:1456422 | | environmental,not_otu | 671 | 5205156 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-TamoA-15 | species | ncbi:1238079 | | environmental | 672 | 5571951 | 102415 | uncultured marine thaumarchaeote AD1000_06_A03 | species | ncbi:1455884 | | environmental,not_otu | 673 | 5205155 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b1-15 | species | ncbi:1238101 | | environmental | 674 | 5205215 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-T16S-03 | species | ncbi:1237972 | | environmental | 675 | 5571579 | 102415 | uncultured marine thaumarchaeote KM3_54_A10 | species | ncbi:1456188 | | environmental,not_otu | 676 | 5571561 | 102415 | uncultured marine thaumarchaeote KM3_87_E01 | species | ncbi:1456328 | | environmental,not_otu | 677 | 5571958 | 102415 | uncultured marine thaumarchaeote SAT1000_09_A04 | species | ncbi:1456366 | | environmental,not_otu | 678 | 5571726 | 102415 | uncultured marine thaumarchaeote KM3_01_F07 | species | ncbi:1455953 | | environmental,not_otu | 679 | 5205181 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-TamoA-19 | species | ncbi:1237996 | | environmental | 680 | 5205105 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-TamoA-06 | species | ncbi:1238071 | | environmental | 681 | 5205197 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b9-07 | species | ncbi:1238028 | | environmental | 682 | 5571901 | 102415 | uncultured marine thaumarchaeote SAT1000_10_H08 | 
species | ncbi:1456376 | | environmental,not_otu | 683 | 5572022 | 102415 | uncultured marine thaumarchaeote KM3_70_D05 | species | ncbi:1456251 | | environmental,not_otu | 684 | 5571879 | 102415 | uncultured marine thaumarchaeote KM3_106_B12 | species | ncbi:1455982 | | environmental,not_otu | 685 | 5205138 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b1-09 | species | ncbi:1238095 | | environmental | 686 | 5571755 | 102415 | uncultured marine thaumarchaeote KM3_137_A09 | species | ncbi:1456006 | | environmental,not_otu | 687 | 5205145 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b4-07 | species | ncbi:1238115 | | environmental | 688 | 5571683 | 102415 | uncultured marine thaumarchaeote AD1000_12_H07 | species | ncbi:1455891 | | environmental,not_otu | 689 | 5571633 | 102415 | uncultured marine thaumarchaeote KM3_146_F08 | species | ncbi:1456012 | | environmental,not_otu | 690 | 5571793 | 102415 | uncultured marine thaumarchaeote KM3_13_C11 | species | ncbi:1456007 | | environmental,not_otu | 691 | 5572055 | 102415 | uncultured marine thaumarchaeote KM3_18_F10 | species | ncbi:1456076 | | environmental,not_otu | 692 | 5572092 | 102415 | uncultured marine thaumarchaeote KM3_35_A11 | species | ncbi:1456130 | | environmental,not_otu | 693 | 5571784 | 102415 | uncultured marine thaumarchaeote SAT1000_13_B06 | species | ncbi:1456381 | | environmental,not_otu | 694 | 5571651 | 102415 | uncultured marine thaumarchaeote KM3_55_A11 | species | ncbi:1456195 | | environmental,not_otu | 695 | 5571883 | 102415 | uncultured marine thaumarchaeote AD1000_02_C08 | species | ncbi:1455880 | | environmental,not_otu | 696 | 5571941 | 102415 | uncultured marine thaumarchaeote SAT1000_35_E09 | species | ncbi:1456405 | | environmental,not_otu | 697 | 5571765 | 102415 | uncultured marine thaumarchaeote KM3_88_G03 | species | ncbi:1456337 | | environmental,not_otu | 698 | 5571802 | 102415 | uncultured marine thaumarchaeote KM3_157_H08 | species | ncbi:1456023 | | 
environmental,not_otu | 699 | 5571902 | 102415 | uncultured marine thaumarchaeote AD1000_30_G09 | species | ncbi:1455905 | | environmental,not_otu | 700 | 5571824 | 102415 | uncultured marine thaumarchaeote KM3_31_D02 | species | ncbi:1456117 | | environmental,not_otu | 701 | 5572002 | 102415 | uncultured marine thaumarchaeote KM3_181_D10 | species | ncbi:1456064 | | environmental,not_otu | 702 | 5205128 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-TamoA-09 | species | ncbi:1237988 | | environmental | 703 | 5571686 | 102415 | uncultured marine thaumarchaeote KM3_88_B06 | species | ncbi:1456332 | | environmental,not_otu | 704 | 5205108 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-TamoA-10 | species | ncbi:1238074 | | environmental | 705 | 5571929 | 102415 | uncultured marine thaumarchaeote SAT1000_22_D06 | species | ncbi:1456395 | | environmental,not_otu | 706 | 5571875 | 102415 | uncultured marine thaumarchaeote KM3_47_A04 | species | ncbi:1456165 | | environmental,not_otu | 707 | 5571859 | 102415 | uncultured marine thaumarchaeote KM3_74_H09 | species | ncbi:1456276 | | environmental,not_otu | 708 | 5572045 | 102415 | uncultured marine thaumarchaeote KM3_79_D07 | species | ncbi:1456295 | | environmental,not_otu | 709 | 5571789 | 102415 | uncultured marine thaumarchaeote KM3_16_G02 | species | ncbi:1456045 | | environmental,not_otu | 710 | 5571659 | 102415 | uncultured marine thaumarchaeote KM3_74_F07 | species | ncbi:1456272 | | environmental,not_otu | 711 | 5571749 | 102415 | uncultured marine thaumarchaeote AD1000_49_H01 | species | ncbi:1455922 | | environmental,not_otu | 712 | 5571820 | 102415 | uncultured marine thaumarchaeote KM3_67_A06 | species | ncbi:1456231 | | environmental,not_otu | 713 | 5571751 | 102415 | uncultured marine thaumarchaeote AD1000_88_E07 | species | ncbi:1455946 | | environmental,not_otu | 714 | 5571662 | 102415 | uncultured marine thaumarchaeote SAT1000_05_B05 | species | ncbi:1456357 | | environmental,not_otu 
| 715 | 5205194 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa2b1-01 | species | ncbi:1238041 | | environmental | 716 | 5571903 | 102415 | uncultured marine thaumarchaeote KM3_27_D09 | species | ncbi:1456110 | | environmental,not_otu | 717 | 5571912 | 102415 | uncultured marine thaumarchaeote KM3_67_A04 | species | ncbi:1456230 | | environmental,not_otu | 718 | 5205137 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b4-11 | species | ncbi:1238013 | | environmental | 719 | 5571763 | 102415 | uncultured marine thaumarchaeote SAT1000_15_B11 | species | ncbi:1456384 | | environmental,not_otu | 720 | 5571833 | 102415 | uncultured marine thaumarchaeote SAT1000_37_C08 | species | ncbi:1456406 | | environmental,not_otu | 721 | 5571947 | 102415 | uncultured marine thaumarchaeote KM3_05_G02 | species | ncbi:1455970 | | environmental,not_otu | 722 | 5571752 | 102415 | uncultured marine thaumarchaeote KM3_177_A10 | species | ncbi:1456058 | | environmental,not_otu | 723 | 5571632 | 102415 | uncultured marine thaumarchaeote KM3_83_E04 | species | ncbi:1456308 | | environmental,not_otu | 724 | 5571705 | 102415 | uncultured marine thaumarchaeote KM3_25_G08 | species | ncbi:1456105 | | environmental,not_otu | 725 | 5571694 | 102415 | uncultured marine thaumarchaeote KM3_72_F07 | species | ncbi:1456263 | | environmental,not_otu | 726 | 5571657 | 102415 | uncultured marine thaumarchaeote KM3_125_A06 | species | ncbi:1455993 | | environmental,not_otu | 727 | 5571893 | 102415 | uncultured marine thaumarchaeote SAT1000_12_G09 | species | ncbi:1456379 | | environmental,not_otu | 728 | 5572090 | 102415 | uncultured marine thaumarchaeote SAT1000_39_F02 | species | ncbi:1456407 | | environmental,not_otu | 729 | 5571698 | 102415 | uncultured marine thaumarchaeote AD1000_65_A02 | species | ncbi:1455929 | | environmental,not_otu | 730 | 5572036 | 102415 | uncultured marine thaumarchaeote AD1000_39_D08 | species | ncbi:1455913 | | environmental,not_otu | 731 | 5571718 | 
102415 | uncultured marine thaumarchaeote KM3_78_D07 | species | ncbi:1456291 | | environmental,not_otu | 732 | 5571708 | 102415 | uncultured marine thaumarchaeote KM3_74_E11 | species | ncbi:1456271 | | environmental,not_otu | 733 | 5205174 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b9-12 | species | ncbi:1238032 | | environmental | 734 | 5571840 | 102415 | uncultured marine thaumarchaeote KM3_11_F08 | species | ncbi:1455992 | | environmental,not_otu | 735 | 5571896 | 102415 | uncultured marine thaumarchaeote SAT1000_12_G12 | species | ncbi:1456380 | | environmental,not_otu | 736 | 5205206 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b4-07 | species | ncbi:1238009 | | environmental | 737 | 5571994 | 102415 | uncultured marine thaumarchaeote KM3_97_B07 | species | ncbi:1456350 | | environmental,not_otu | 738 | 5571739 | 102415 | uncultured marine thaumarchaeote KM3_33_G01 | species | ncbi:1456126 | | environmental,not_otu | 739 | 5571610 | 102415 | uncultured marine thaumarchaeote KM3_203_A01 | species | ncbi:1456097 | | environmental,not_otu | 740 | 5205207 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b4-05 | species | ncbi:1238007 | | environmental | 741 | 5571996 | 102415 | uncultured marine thaumarchaeote SAT1000_05_G10 | species | ncbi:1456358 | | environmental,not_otu | 742 | 5205223 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b1-25 | species | ncbi:1238108 | | environmental | 743 | 5571858 | 102415 | uncultured marine thaumarchaeote KM3_38_E02 | species | ncbi:1456138 | | environmental,not_otu | 744 | 5571910 | 102415 | uncultured marine thaumarchaeote KM3_175_E04 | species | ncbi:1456054 | | environmental,not_otu | 745 | 5571818 | 102415 | uncultured marine thaumarchaeote AD1000_82_B05 | species | ncbi:1455944 | | environmental,not_otu | 746 | 5571588 | 102415 | uncultured marine thaumarchaeote KM3_76_B05 | species | ncbi:1456282 | | environmental,not_otu | 747 | 5571826 | 102415 | uncultured marine 
thaumarchaeote KM3_78_A04 | species | ncbi:1456289 | | environmental,not_otu | 748 | 5571546 | 102415 | uncultured marine thaumarchaeote KM3_65_H02 | species | ncbi:1456226 | | environmental,not_otu | 749 | 5572044 | 102415 | uncultured marine thaumarchaeote KM3_72_H08 | species | ncbi:1456264 | | environmental,not_otu | 750 | 5571720 | 102415 | uncultured marine thaumarchaeote KM3_183_D02 | species | ncbi:1456068 | | environmental,not_otu | 751 | 5571934 | 102415 | uncultured marine thaumarchaeote AD1000_33_B07 | species | ncbi:1455908 | | environmental,not_otu | 752 | 5205247 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-TamoA-16 | species | ncbi:1238080 | | environmental | 753 | 5571780 | 102415 | uncultured marine thaumarchaeote KM3_44_H07 | species | ncbi:1456153 | | environmental,not_otu | 754 | 5571927 | 102415 | uncultured marine thaumarchaeote KM3_78_E10 | species | ncbi:1456292 | | environmental,not_otu | 755 | 5571658 | 102415 | uncultured marine thaumarchaeote SAT1000_26_E12 | species | ncbi:1456400 | | environmental,not_otu | 756 | 5571641 | 102415 | uncultured marine thaumarchaeote KM3_03_F11 | species | ncbi:1455961 | | environmental,not_otu | 757 | 5571895 | 102415 | uncultured marine thaumarchaeote KM3_177_D01 | species | ncbi:1456059 | | environmental,not_otu | 758 | 5571700 | 102415 | uncultured marine thaumarchaeote KM3_196_F10 | species | ncbi:1456086 | | environmental,not_otu | 759 | 5572048 | 102415 | uncultured marine thaumarchaeote KM3_82_C03 | species | ncbi:1456303 | | environmental,not_otu | 760 | 5571699 | 102415 | uncultured marine thaumarchaeote SAT1000_08_G06 | species | ncbi:1456365 | | environmental,not_otu | 761 | 5205234 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b4-17 | species | ncbi:1238125 | | environmental | 762 | 5571897 | 102415 | uncultured marine thaumarchaeote AD1000_118_C08 | species | ncbi:1455889 | | environmental,not_otu | 763 | 5571946 | 102415 | uncultured marine thaumarchaeote 
KM3_99_A02 | species | ncbi:1456353 | | environmental,not_otu | 764 | 5205218 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-TamoA-14 | species | ncbi:1238078 | | environmental | 765 | 5571582 | 102415 | uncultured marine thaumarchaeote SAT1000_09_B08 | species | ncbi:1456368 | | environmental,not_otu | 766 | 5205237 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b4-13 | species | ncbi:1238121 | | environmental | 767 | 5571653 | 102415 | uncultured marine thaumarchaeote AD1000_96_F07 | species | ncbi:1455948 | | environmental,not_otu | 768 | 5571881 | 102415 | uncultured marine thaumarchaeote KM3_42_C02 | species | ncbi:1456147 | | environmental,not_otu | 769 | 5205201 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b9-04 | species | ncbi:1238025 | | environmental | 770 | 5571583 | 102415 | uncultured marine thaumarchaeote KM3_67_E04 | species | ncbi:1456236 | | environmental,not_otu | 771 | 5572015 | 102415 | uncultured marine thaumarchaeote KM3_74_C10 | species | ncbi:1456270 | | environmental,not_otu | 772 | 5571930 | 102415 | uncultured marine thaumarchaeote KM3_153_B11 | species | ncbi:1456017 | | environmental,not_otu | 773 | 5571841 | 102415 | uncultured marine thaumarchaeote AD1000_80_F03 | species | ncbi:1455943 | | environmental,not_otu | 774 | 5205118 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-T16S-12 | species | ncbi:1237981 | | environmental | 775 | 5571800 | 102415 | uncultured marine thaumarchaeote KM3_70_D04 | species | ncbi:1456250 | | environmental,not_otu | 776 | 5571689 | 102415 | uncultured marine thaumarchaeote KM3_55_F05 | species | ncbi:1456198 | | environmental,not_otu | 777 | 5205227 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa2b1-05 | species | ncbi:1238045 | | environmental | 778 | 5571916 | 102415 | uncultured marine thaumarchaeote KM3_76_D06 | species | ncbi:1456284 | | environmental,not_otu | 779 | 5571787 | 102415 | uncultured marine thaumarchaeote KM3_51_A05 | species | 
ncbi:1456173 | | environmental,not_otu | 780 | 5571944 | 102415 | uncultured marine thaumarchaeote KM3_126_D02 | species | ncbi:1455994 | | environmental,not_otu | 781 | 5571771 | 102415 | uncultured marine thaumarchaeote AD1000_04_C02 | species | ncbi:1455881 | | environmental,not_otu | 782 | 5205102 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-TamoA-07 | species | ncbi:1238072 | | environmental | 783 | 5571624 | 102415 | uncultured marine thaumarchaeote KM3_168_H11 | species | ncbi:1456039 | | environmental,not_otu | 784 | 5571955 | 102415 | uncultured marine thaumarchaeote KM3_51_E02 | species | ncbi:1456174 | | environmental,not_otu | 785 | 5205173 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b9-15 | species | ncbi:1238035 | | environmental | 786 | 5205162 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b1-04 | species | ncbi:1238092 | | environmental | 787 | 5571899 | 102415 | uncultured marine thaumarchaeote AD1000_71_A04 | species | ncbi:1455935 | | environmental,not_otu | 788 | 5571943 | 102415 | uncultured marine thaumarchaeote SAT1000_52_D06 | species | ncbi:1456420 | | environmental,not_otu | 789 | 5571790 | 102415 | uncultured marine thaumarchaeote SAT1000_15_H02 | species | ncbi:1456386 | | environmental,not_otu | 790 | 5572082 | 102415 | uncultured marine thaumarchaeote KM3_57_F02 | species | ncbi:1456209 | | environmental,not_otu | 791 | 5571680 | 102415 | uncultured marine thaumarchaeote KM3_90_G11 | species | ncbi:1456344 | | environmental,not_otu | 792 | 5571724 | 102415 | uncultured marine thaumarchaeote KM3_68_B04 | species | ncbi:1456242 | | environmental,not_otu | 793 | 5571691 | 102415 | uncultured marine thaumarchaeote KM3_164_C03 | species | ncbi:1456035 | | environmental,not_otu | 794 | 5571578 | 102415 | uncultured marine thaumarchaeote KM3_52_B01 | species | ncbi:1456176 | | environmental,not_otu | 795 | 5571735 | 102415 | uncultured marine thaumarchaeote AD1000_54_E04 | species | ncbi:1455924 | | 
environmental,not_otu | 796 | 5205098 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-TamoA-07 | species | ncbi:1237986 | | environmental | 797 | 5571716 | 102415 | uncultured marine thaumarchaeote SAT1000_47_G01 | species | ncbi:1456413 | | environmental,not_otu | 798 | 5571556 | 102415 | uncultured marine thaumarchaeote KM3_08_A11 | species | ncbi:1455977 | | environmental,not_otu | 799 | 5572026 | 102415 | uncultured marine thaumarchaeote KM3_69_B11 | species | ncbi:1456244 | | environmental,not_otu | 800 | 5205119 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-T16S-11 | species | ncbi:1237980 | | environmental | 801 | 5571867 | 102415 | uncultured marine thaumarchaeote KM3_77_E10 | species | ncbi:1456287 | | environmental,not_otu | 802 | 5571652 | 102415 | uncultured marine thaumarchaeote KM3_47_G11 | species | ncbi:1456169 | | environmental,not_otu | 803 | 5571656 | 102415 | uncultured marine thaumarchaeote KM3_84_F11 | species | ncbi:1456314 | | environmental,not_otu | 804 | 5571909 | 102415 | uncultured marine thaumarchaeote KM3_182_G05 | species | ncbi:1456066 | | environmental,not_otu | 805 | 5571813 | 102415 | uncultured marine thaumarchaeote KM3_67_B11 | species | ncbi:1456234 | | environmental,not_otu | 806 | 5571806 | 102415 | uncultured marine thaumarchaeote AD1000_21_E03 | species | ncbi:1455900 | | environmental,not_otu | 807 | 5572051 | 102415 | uncultured marine thaumarchaeote KM3_50_F11 | species | ncbi:1456172 | | environmental,not_otu | 808 | 5571590 | 102415 | uncultured marine thaumarchaeote KM3_46_H07 | species | ncbi:1456163 | | environmental,not_otu | 809 | 5571936 | 102415 | uncultured marine thaumarchaeote KM3_15_E09 | species | ncbi:1456028 | | environmental,not_otu | 810 | 5571939 | 102415 | uncultured marine thaumarchaeote KM3_89_H08 | species | ncbi:1456341 | | environmental,not_otu | 811 | 5571621 | 102415 | uncultured marine thaumarchaeote KM3_26_A01 | species | ncbi:1456106 | | environmental,not_otu | 812 | 
5571717 | 102415 | uncultured marine thaumarchaeote KM3_70_B04 | species | ncbi:1456247 | | environmental,not_otu | 813 | 5571715 | 102415 | uncultured marine thaumarchaeote KM3_49_A08 | species | ncbi:1456171 | | environmental,not_otu | 814 | 5205172 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b9-17 | species | ncbi:1238037 | | environmental | 815 | 5205153 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-TamoA-25 | species | ncbi:1238002 | | environmental | 816 | 5571997 | 102415 | uncultured marine thaumarchaeote KM3_98_C03 | species | ncbi:1456352 | | environmental,not_otu | 817 | 5571577 | 102415 | uncultured marine thaumarchaeote KM3_30_B03 | species | ncbi:1456116 | | environmental,not_otu | 818 | 5571709 | 102415 | uncultured marine thaumarchaeote KM3_35_D03 | species | ncbi:1456132 | | environmental,not_otu | 819 | 5571692 | 102415 | uncultured marine thaumarchaeote KM3_60_F11 | species | ncbi:1456212 | | environmental,not_otu | 820 | 5572075 | 102415 | uncultured marine thaumarchaeote SAT1000_14_E04 | species | ncbi:1456383 | | environmental,not_otu | 821 | 5571948 | 102415 | uncultured marine thaumarchaeote KM3_15_E05 | species | ncbi:1456027 | | environmental,not_otu | 822 | 5572069 | 102415 | uncultured marine thaumarchaeote KM3_03_B08 | species | ncbi:1455959 | | environmental,not_otu | 823 | 5572011 | 102415 | uncultured marine thaumarchaeote KM3_183_D03 | species | ncbi:1456069 | | environmental,not_otu | 824 | 5571630 | 102415 | uncultured marine thaumarchaeote KM3_57_B01 | species | ncbi:1456205 | | environmental,not_otu | 825 | 5571889 | 102415 | uncultured marine thaumarchaeote KM3_54_H01 | species | ncbi:1456194 | | environmental,not_otu | 826 | 5205236 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b4-12 | species | ncbi:1238120 | | environmental | 827 | 5571728 | 102415 | uncultured marine thaumarchaeote KM3_173_E01 | species | ncbi:1456050 | | environmental,not_otu | 828 | 5572007 | 102415 | uncultured 
marine thaumarchaeote KM3_24_H11 | species | ncbi:1456102 | | environmental,not_otu | 829 | 5571914 | 102415 | uncultured marine thaumarchaeote KM3_75_C11 | species | ncbi:1456278 | | environmental,not_otu | 830 | 5205159 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b1-11 | species | ncbi:1238097 | | environmental | 831 | 5571933 | 102415 | uncultured marine thaumarchaeote KM3_70_D01 | species | ncbi:1456249 | | environmental,not_otu | 832 | 5571768 | 102415 | uncultured marine thaumarchaeote KM3_149_F07 | species | ncbi:1456013 | | environmental,not_otu | 833 | 5205250 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-T16S-09 | species | ncbi:1237978 | | environmental | 834 | 5571580 | 102415 | uncultured marine thaumarchaeote KM3_35_D07 | species | ncbi:1456133 | | environmental,not_otu | 835 | 5205242 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-TamoA-23 | species | ncbi:1238087 | | environmental | 836 | 5571839 | 102415 | uncultured marine thaumarchaeote KM3_05_H05 | species | ncbi:1455972 | | environmental,not_otu | 837 | 5571782 | 102415 | uncultured marine thaumarchaeote KM3_59_C01 | species | ncbi:1456210 | | environmental,not_otu | 838 | 5572061 | 102415 | uncultured marine thaumarchaeote KM3_57_D03 | species | ncbi:1456206 | | environmental,not_otu | 839 | 5571568 | 102415 | uncultured marine thaumarchaeote SAT1000_51_A06 | species | ncbi:1456418 | | environmental,not_otu | 840 | 5572020 | 102415 | uncultured marine thaumarchaeote KM3_84_F03 | species | ncbi:1456313 | | environmental,not_otu | 841 | 5571616 | 102415 | uncultured marine thaumarchaeote AD1000_63_F07 | species | ncbi:1455928 | | environmental,not_otu | 842 | 5572086 | 102415 | uncultured marine thaumarchaeote SAT1000_18_D03 | species | ncbi:1456391 | | environmental,not_otu | 843 | 5571845 | 102415 | uncultured marine thaumarchaeote SAT1000_20_C01 | species | ncbi:1456393 | | environmental,not_otu | 844 | 5571757 | 102415 | uncultured marine thaumarchaeote 
KM3_85_E11 | species | ncbi:1456317 | | environmental,not_otu | 845 | 5205190 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa2b1-16 | species | ncbi:1238055 | | environmental | 846 | 5571638 | 102415 | uncultured marine thaumarchaeote AD1000_89_F09 | species | ncbi:1455947 | | environmental,not_otu | 847 | 5571668 | 102415 | uncultured marine thaumarchaeote KM3_74_F11 | species | ncbi:1456273 | | environmental,not_otu | 848 | 5571737 | 102415 | uncultured marine thaumarchaeote KM3_107_D09 | species | ncbi:1455985 | | environmental,not_otu | 849 | 5571569 | 102415 | uncultured marine thaumarchaeote KM3_13_H10 | species | ncbi:1456008 | | environmental,not_otu | 850 | 5572080 | 102415 | uncultured marine thaumarchaeote KM3_77_D12 | species | ncbi:1456286 | | environmental,not_otu | 851 | 5205219 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b1-20 | species | ncbi:1238104 | | environmental | 852 | 5572066 | 102415 | uncultured marine thaumarchaeote KM3_55_D03 | species | ncbi:1456197 | | environmental,not_otu | 853 | 5572079 | 102415 | uncultured marine thaumarchaeote AD1000_04_G03 | species | ncbi:1455882 | | environmental,not_otu | 854 | 5205200 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b9-01 | species | ncbi:1238022 | | environmental | 855 | 5571849 | 102415 | uncultured marine thaumarchaeote KM3_95_H05 | species | ncbi:1456348 | | environmental,not_otu | 856 | 5571690 | 102415 | uncultured marine thaumarchaeote KM3_31_G08 | species | ncbi:1456121 | | environmental,not_otu | 857 | 5571574 | 102415 | uncultured marine thaumarchaeote KM3_88_C09 | species | ncbi:1456333 | | environmental,not_otu | 858 | 5205160 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b1-01 | species | ncbi:1238090 | | environmental | 859 | 5571857 | 102415 | uncultured marine thaumarchaeote KM3_82_B03 | species | ncbi:1456302 | | environmental,not_otu | 860 | 5572071 | 102415 | uncultured marine thaumarchaeote KM3_66_E12 | species | ncbi:1456229 
| | environmental,not_otu | 861 | 5571551 | 102415 | uncultured marine thaumarchaeote KM3_89_C12 | species | ncbi:1456339 | | environmental,not_otu | 862 | 5205203 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b9-06 | species | ncbi:1238027 | | environmental | 863 | 5571682 | 102415 | uncultured marine thaumarchaeote KM3_36_H05 | species | ncbi:1456136 | | environmental,not_otu | 864 | 5571731 | 102415 | uncultured marine thaumarchaeote SAT1000_10_G09 | species | ncbi:1456375 | | environmental,not_otu | 865 | 5205148 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b4-08 | species | ncbi:1238116 | | environmental | 866 | 5571746 | 102415 | uncultured marine thaumarchaeote SAT1000_16_E04 | species | ncbi:1456389 | | environmental,not_otu | 867 | 5571637 | 102415 | uncultured marine thaumarchaeote KM3_82_A11 | species | ncbi:1456301 | | environmental,not_otu | 868 | 5571760 | 102415 | uncultured marine thaumarchaeote KM3_63_D09 | species | ncbi:1456219 | | environmental,not_otu | 869 | 5205213 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-TamoA-23 | species | ncbi:1238000 | | environmental | 870 | 5205243 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-TamoA-22 | species | ncbi:1238086 | | environmental | 871 | 5571861 | 102415 | uncultured marine thaumarchaeote KM3_89_G03 | species | ncbi:1456340 | | environmental,not_otu | 872 | 5205107 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-TamoA-13 | species | ncbi:1238077 | | environmental | 873 | 5571975 | 102415 | uncultured marine thaumarchaeote KM3_67_H03 | species | ncbi:1456239 | | environmental,not_otu | 874 | 5571810 | 102415 | uncultured marine thaumarchaeote KM3_98_B07 | species | ncbi:1456351 | | environmental,not_otu | 875 | 5571669 | 102415 | uncultured thaumarchaeote Rifle_16ft_4_minimus_11813 | species | ncbi:1665208 | | environmental,not_otu | 876 | 5571704 | 102415 | uncultured marine thaumarchaeote KM3_70_D10 | species | ncbi:1456253 | | 
environmental,not_otu | 877 | 5205228 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa2b1-04 | species | ncbi:1238044 | | environmental | 878 | 5571678 | 102415 | uncultured marine thaumarchaeote SAT1000_04_F07 | species | ncbi:1456355 | | environmental,not_otu | 879 | 5571666 | 102415 | uncultured marine thaumarchaeote KM3_79_C04 | species | ncbi:1456294 | | environmental,not_otu | 880 | 5205184 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-T16S-08 | species | ncbi:1238065 | | environmental | 881 | 5572057 | 102415 | uncultured marine thaumarchaeote SAT1000_29_H07 | species | ncbi:1456403 | | environmental,not_otu | 882 | 5571932 | 102415 | uncultured marine thaumarchaeote KM3_56_D04 | species | ncbi:1456203 | | environmental,not_otu | 883 | 5571812 | 102415 | uncultured marine thaumarchaeote KM3_78_D03 | species | ncbi:1456290 | | environmental,not_otu | 884 | 5205100 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-TamoA-03 | species | ncbi:1237984 | | environmental | 885 | 5571976 | 102415 | uncultured marine thaumarchaeote KM3_37_D10 | species | ncbi:1456137 | | environmental,not_otu | 886 | 5571549 | 102415 | uncultured marine thaumarchaeote KM3_02_B09 | species | ncbi:1455956 | | environmental,not_otu | 887 | 5571622 | 102415 | uncultured marine thaumarchaeote KM3_79_B12 | species | ncbi:1456293 | | environmental,not_otu | 888 | 5571801 | 102415 | uncultured marine thaumarchaeote KM3_14_C04 | species | ncbi:1456014 | | environmental,not_otu | 889 | 5205188 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-T16S-03 | species | ncbi:1238061 | | environmental | 890 | 5205112 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa2b1-13 | species | ncbi:1238052 | | environmental | 891 | 5571695 | 102415 | uncultured marine thaumarchaeote AD1000_25_B10 | species | ncbi:1455903 | | environmental,not_otu | 892 | 5205161 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b1-02 | species | ncbi:1238091 | | environmental | 
893 | 5571816 | 102415 | uncultured marine thaumarchaeote KM3_66_E06 | species | ncbi:1456228 | | environmental,not_otu | 894 | 5571777 | 102415 | uncultured marine thaumarchaeote KM3_110_B01 | species | ncbi:1455987 | | environmental,not_otu | 895 | 5571684 | 102415 | uncultured marine thaumarchaeote KM3_68_A08 | species | ncbi:1456241 | | environmental,not_otu | 896 | 5571589 | 102415 | uncultured marine thaumarchaeote KM3_57_E08 | species | ncbi:1456207 | | environmental,not_otu | 897 | 5205204 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b9-05 | species | ncbi:1238026 | | environmental | 898 | 5571553 | 102415 | uncultured marine thaumarchaeote KM3_43_D05 | species | ncbi:1456150 | | environmental,not_otu | 899 | 5571628 | 102415 | uncultured marine thaumarchaeote KM3_55_H11 | species | ncbi:1456200 | | environmental,not_otu | 900 | 5571552 | 102415 | uncultured marine thaumarchaeote KM3_76_A03 | species | ncbi:1456281 | | environmental,not_otu | 901 | 5571665 | 102415 | uncultured marine thaumarchaeote SAT1000_11_C10 | species | ncbi:1456377 | | environmental,not_otu | 902 | 5572067 | 102415 | uncultured marine thaumarchaeote KM3_28_C05 | species | ncbi:1456112 | | environmental,not_otu | 903 | 5205149 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-T16S-05 | species | ncbi:1237974 | | environmental | 904 | 5205248 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa1b9-01 | species | ncbi:1238089 | | environmental | 905 | 5571575 | 102415 | uncultured marine thaumarchaeote KM3_61_H06 | species | ncbi:1456215 | | environmental,not_otu | 906 | 5572012 | 102415 | uncultured marine thaumarchaeote KM3_87_G04 | species | ncbi:1456330 | | environmental,not_otu | 907 | 5571645 | 102415 | uncultured marine thaumarchaeote KM3_03_G08 | species | ncbi:1455962 | | environmental,not_otu | 908 | 5571953 | 102415 | uncultured marine thaumarchaeote KM3_79_H05 | species | ncbi:1456298 | | environmental,not_otu | 909 | 5571576 | 102415 | uncultured 
marine thaumarchaeote AD1000_20_B03 | species | ncbi:1455899 | | environmental,not_otu | 910 | 5571828 | 102415 | uncultured marine thaumarchaeote KM3_47_A03 | species | ncbi:1456164 | | environmental,not_otu | 911 | 5571626 | 102415 | uncultured marine thaumarchaeote KM3_47_C08 | species | ncbi:1456167 | | environmental,not_otu | 912 | 5572035 | 102415 | uncultured marine thaumarchaeote KM3_181_G03 | species | ncbi:1456065 | | environmental,not_otu | 913 | 5571732 | 102415 | uncultured marine thaumarchaeote KM3_54_B03 | species | ncbi:1456189 | | environmental,not_otu | 914 | 5571772 | 102415 | uncultured marine thaumarchaeote SAT1000_18_G08 | species | ncbi:1456392 | | environmental,not_otu | 915 | 5571548 | 102415 | uncultured marine thaumarchaeote KM3_74_G09 | species | ncbi:1456275 | | environmental,not_otu | 916 | 5571821 | 102415 | uncultured marine thaumarchaeote SAT1000_10_F12 | species | ncbi:1456373 | | environmental,not_otu | 917 | 5571803 | 102415 | uncultured marine thaumarchaeote KM3_45_G08 | species | ncbi:1456157 | | environmental,not_otu | 918 | 5571670 | 102415 | uncultured marine thaumarchaeote KM3_87_H02 | species | ncbi:1456331 | | environmental,not_otu | 919 | 5571619 | 102415 | uncultured marine thaumarchaeote KM3_126_H01 | species | ncbi:1455995 | | environmental,not_otu | 920 | 5571986 | 102415 | uncultured marine thaumarchaeote SAT1000_48_G08 | species | ncbi:1456416 | | environmental,not_otu | 921 | 5571647 | 102415 | uncultured marine thaumarchaeote SAT1000_13_G01 | species | ncbi:1456382 | | environmental,not_otu | 922 | 5571573 | 102415 | uncultured marine thaumarchaeote KM3_88_D08 | species | ncbi:1456335 | | environmental,not_otu | 923 | 5571878 | 102415 | uncultured marine thaumarchaeote KM3_70_B03 | species | ncbi:1456246 | | environmental,not_otu | 924 | 5205139 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b4-11 | species | ncbi:1238119 | | environmental | 925 | 5571559 | 102415 | uncultured marine thaumarchaeote 
KM3_26_B10 | species | ncbi:1456107 | | environmental,not_otu | 926 | 5571687 | 102415 | uncultured marine thaumarchaeote KM3_04_D06 | species | ncbi:1455965 | | environmental,not_otu | 927 | 5571597 | 102415 | uncultured marine thaumarchaeote SAT1000_16_A03 | species | ncbi:1456387 | | environmental,not_otu | 928 | 5571959 | 102415 | uncultured marine thaumarchaeote KM3_83_D12 | species | ncbi:1456307 | | environmental,not_otu | 929 | 5571788 | 102415 | uncultured marine thaumarchaeote KM3_173_D12 | species | ncbi:1456049 | | environmental,not_otu | 930 | 5205147 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b4-09 | species | ncbi:1238117 | | environmental | 931 | 5571974 | 102415 | uncultured marine thaumarchaeote KM3_199_C08 | species | ncbi:1456090 | | environmental,not_otu | 932 | 5572034 | 102415 | uncultured marine thaumarchaeote KM3_154_A05 | species | ncbi:1456020 | | environmental,not_otu | 933 | 5571880 | 102415 | thaumarchaeote enrichment culture | species | ncbi:1608626 | | environmental | 934 | 5571685 | 102415 | uncultured marine thaumarchaeote KM3_03_D08 | species | ncbi:1455960 | | environmental,not_otu | 935 | 5571675 | 102415 | uncultured marine thaumarchaeote KM3_199_D03 | species | ncbi:1456091 | | environmental,not_otu | 936 | 5571957 | 102415 | uncultured marine thaumarchaeote KM3_16_C11 | species | ncbi:1456043 | | environmental,not_otu | 937 | 5571603 | 102415 | uncultured marine thaumarchaeote KM3_69_H10 | species | ncbi:1456245 | | environmental,not_otu | 938 | 5205186 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-TamoA-02 | species | ncbi:1238067 | | environmental | 939 | 5572042 | 102415 | uncultured marine thaumarchaeote KM3_46_F12 | species | ncbi:1456160 | | environmental,not_otu | 940 | 5205195 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa2b1-09 | species | ncbi:1238049 | | environmental | 941 | 5205226 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa2b1-06 | species | ncbi:1238046 | | 
environmental | 942 | 5571931 | 102415 | uncultured marine thaumarchaeote KM3_56_F06 | species | ncbi:1456204 | | environmental,not_otu | 943 | 5571565 | 102415 | uncultured marine thaumarchaeote AD1000_16_B05 | species | ncbi:1455894 | | environmental,not_otu | 944 | 5205231 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-TamoA-19 | species | ncbi:1238083 | | environmental | 945 | 5572003 | 102415 | uncultured marine thaumarchaeote KM3_187_A01 | species | ncbi:1456072 | | environmental,not_otu | 946 | 5572056 | 102415 | uncultured marine thaumarchaeote KM3_12_F11 | species | ncbi:1455999 | | environmental,not_otu | 947 | 5571871 | 102415 | uncultured marine thaumarchaeote KM3_97_A02 | species | ncbi:1456349 | | environmental,not_otu | 948 | 5571853 | 102415 | uncultured marine thaumarchaeote KM3_68_E02 | species | ncbi:1456243 | | environmental,not_otu | 949 | 5571727 | 102415 | uncultured marine thaumarchaeote KM3_42_E08 | species | ncbi:1456148 | | environmental,not_otu | 950 | 5205210 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b4-02 | species | ncbi:1238004 | | environmental | 951 | 5205120 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b9-14 | species | ncbi:1238034 | | environmental | 952 | 5571991 | 102415 | uncultured marine thaumarchaeote KM3_47_A07 | species | ncbi:1456166 | | environmental,not_otu | 953 | 5205208 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b4-04 | species | ncbi:1238006 | | environmental | 954 | 5571636 | 102415 | uncultured marine thaumarchaeote KM3_32_A02 | species | ncbi:1456122 | | environmental,not_otu | 955 | 5205214 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-T16S-07 | species | ncbi:1237976 | | environmental | 956 | 5572065 | 102415 | uncultured marine thaumarchaeote KM3_115_G03 | species | ncbi:1455989 | | environmental,not_otu | 957 | 5205253 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-TamoA-18 | species | ncbi:1238082 | | environmental | 958 | 
5205103 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-TamoA-09 | species | ncbi:1238073 | | environmental | 959 | 5205170 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b1-19 | species | ncbi:1238103 | | environmental | 960 | 5571672 | 102415 | uncultured marine thaumarchaeote KM3_11_C04 | species | ncbi:1455990 | | environmental,not_otu | 961 | 5571702 | 102415 | uncultured marine thaumarchaeote KM3_54_G03 | species | ncbi:1456192 | | environmental,not_otu | 962 | 5572085 | 102415 | uncultured marine thaumarchaeote AD1000_11_E10 | species | ncbi:1455890 | | environmental,not_otu | 963 | 5571848 | 102415 | uncultured marine thaumarchaeote KM3_86_E11 | species | ncbi:1456321 | | environmental,not_otu | 964 | 5571993 | 102415 | uncultured marine thaumarchaeote AD1000_88_A06 | species | ncbi:1455945 | | environmental,not_otu | 965 | 5205163 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b1-05 | species | ncbi:1238093 | | environmental | 966 | 5571730 | 102415 | uncultured marine thaumarchaeote SAT1000_31_A02 | species | ncbi:1456404 | | environmental,not_otu | 967 | 5571663 | 102415 | uncultured marine thaumarchaeote KM3_169_G08 | species | ncbi:1456041 | | environmental,not_otu | 968 | 5572017 | 102415 | uncultured marine thaumarchaeote SAT1000_43_E07 | species | ncbi:1456410 | | environmental,not_otu | 969 | 5571778 | 102415 | uncultured marine thaumarchaeote AD1000_77_H06 | species | ncbi:1455940 | | environmental,not_otu | 970 | 5205177 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b4-06 | species | ncbi:1238008 | | environmental | 971 | 5205123 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b9-11 | species | ncbi:1238031 | | environmental | 972 | 5205150 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-T16S-06 | species | ncbi:1237975 | | environmental | 973 | 5571747 | 102415 | uncultured marine thaumarchaeote KM3_158_B05 | species | ncbi:1456024 | | environmental,not_otu | 974 | 5571649 | 102415 
| uncultured marine thaumarchaeote KM3_84_D12 | species | ncbi:1456311 | | environmental,not_otu | 975 | 5572006 | 102415 | uncultured marine thaumarchaeote AD1000_60_A11 | species | ncbi:1455927 | | environmental,not_otu | 976 | 5205251 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa-T16S-10 | species | ncbi:1237979 | | environmental | 977 | 5571794 | 102415 | uncultured marine thaumarchaeote KM3_201_H03 | species | ncbi:1456096 | | environmental,not_otu | 978 | 5571869 | 102415 | uncultured marine thaumarchaeote KM3_82_D05 | species | ncbi:1456304 | | environmental,not_otu | 979 | 5571712 | 102415 | uncultured marine thaumarchaeote AD1000_24_H07 | species | ncbi:1455902 | | environmental,not_otu | 980 | 5571961 | 102415 | uncultured marine thaumarchaeote KM3_75_F06 | species | ncbi:1456280 | | environmental,not_otu | 981 | 5571926 | 102415 | uncultured marine thaumarchaeote KM3_71_E12 | species | ncbi:1456258 | | environmental,not_otu | 982 | 5205222 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b1-14 | species | ncbi:1238100 | | environmental | 983 | 5571900 | 102415 | uncultured marine thaumarchaeote KM3_06_B06 | species | ncbi:1455974 | | environmental,not_otu | 984 | 5205166 | 102415 | thaumarchaeote enrichment culture clone Ec.FBb-TamoA-04 | species | ncbi:1238069 | | environmental | 985 | 5571740 | 102415 | uncultured marine thaumarchaeote KM3_193_A03 | species | ncbi:1456081 | | environmental,not_otu | 986 | 5205235 | 102415 | thaumarchaeote enrichment culture clone Ec.MTa2b4-18 | species | ncbi:1238126 | | environmental | 987 | 5571679 | 102415 | uncultured marine thaumarchaeote KM3_23_E01 | species | ncbi:1456099 | | environmental,not_otu | 988 | 5571865 | 102415 | uncultured marine thaumarchaeote SAT1000_48_A08 | species | ncbi:1456414 | | environmental,not_otu | 989 | 5571850 | 102415 | uncultured marine thaumarchaeote SAT1000_10_G06 | species | ncbi:1456374 | | environmental,not_otu | 990 | 5571611 | 102415 | uncultured marine 
thaumarchaeote AD1000_26_G12 | species | ncbi:1455904 | | environmental,not_otu | 991 | 5572016 | 102415 | uncultured marine thaumarchaeote KM3_65_D04 | species | ncbi:1456224 | | environmental,not_otu | 992 | 5571969 | 102415 | uncultured marine thaumarchaeote KM3_186_C08 | species | ncbi:1456070 | | environmental,not_otu | 993 | 5571667 | 102415 | uncultured marine thaumarchaeote KM3_41_H02 | species | ncbi:1456146 | | environmental,not_otu | 994 | 5571890 | 102415 | uncultured marine thaumarchaeote KM3_52_F05 | species | ncbi:1456177 | | environmental,not_otu | 995 | 5571807 | 102415 | uncultured marine thaumarchaeote AD1000_54_F09 | species | ncbi:1455926 | | environmental,not_otu | 996 | 5571591 | 102415 | uncultured marine thaumarchaeote KM3_175_A05 | species | ncbi:1456051 | | environmental,not_otu | 997 | 5571756 | 102415 | uncultured marine thaumarchaeote KM3_46_E07 | species | ncbi:1456159 | | environmental,not_otu | 998 | 5571888 | 102415 | uncultured marine thaumarchaeote KM3_02_A10 | species | ncbi:1455955 | | environmental,not_otu | 999 | 5205131 | 102415 | thaumarchaeote enrichment culture clone Ec.FBa1b4-14 | species | ncbi:1238015 | | environmental | 1000 | 5572032 | 102415 | uncultured marine thaumarchaeote KM3_53_B02 | species | ncbi:1456180 | | environmental,not_otu | 1001 |
--------------------------------------------------------------------------------
/environment.yml:
--------------------------------------------------------------------------------
name: graphmining-twc21
channels:
  - conda-forge
dependencies:
  - pip
  - numpy
  - pandas
  - networkx
  - matplotlib
  - scipy
  - networkit==7.1
  - louvain
  - pip:
    - littleballoffur
    - git+https://github.com/epfl-lts2/spikexplore

--------------------------------------------------------------------------------
/environment_local.yml:
--------------------------------------------------------------------------------
name: graphmining-twc21-local
channels:
  - conda-forge
dependencies:
  - pip
  - numpy
  - pandas
  - networkx
  - matplotlib
  - scipy
  - networkit==7.1
  - louvain
  - jupyterlab
  - pip:
    - littleballoffur
    - git+https://github.com/epfl-lts2/spikexplore

--------------------------------------------------------------------------------
/figures/MHRWalgo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/epfl-lts2/GraphMining-TheWebConf2021/0bd86a5dd5772e951bd22ac169e6101718425337/figures/MHRWalgo.png
--------------------------------------------------------------------------------
/figures/explorationstep.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/epfl-lts2/GraphMining-TheWebConf2021/0bd86a5dd5772e951bd22ac169e6101718425337/figures/explorationstep.png
--------------------------------------------------------------------------------
/figures/gcb.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/epfl-lts2/GraphMining-TheWebConf2021/0bd86a5dd5772e951bd22ac169e6101718425337/figures/gcb.png
--------------------------------------------------------------------------------
/figures/gfb.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/epfl-lts2/GraphMining-TheWebConf2021/0bd86a5dd5772e951bd22ac169e6101718425337/figures/gfb.png
--------------------------------------------------------------------------------
/figures/gff.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/epfl-lts2/GraphMining-TheWebConf2021/0bd86a5dd5772e951bd22ac169e6101718425337/figures/gff.png
--------------------------------------------------------------------------------
/figures/gmh.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/epfl-lts2/GraphMining-TheWebConf2021/0bd86a5dd5772e951bd22ac169e6101718425337/figures/gmh.png
--------------------------------------------------------------------------------
/figures/graphdiameter.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/epfl-lts2/GraphMining-TheWebConf2021/0bd86a5dd5772e951bd22ac169e6101718425337/figures/graphdiameter.png
--------------------------------------------------------------------------------
/figures/redditneighbors.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/epfl-lts2/GraphMining-TheWebConf2021/0bd86a5dd5772e951bd22ac169e6101718425337/figures/redditneighbors.png
--------------------------------------------------------------------------------
/figures/spikyballfinal.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/epfl-lts2/GraphMining-TheWebConf2021/0bd86a5dd5772e951bd22ac169e6101718425337/figures/spikyballfinal.png
--------------------------------------------------------------------------------
/figures/spikyballproba.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/epfl-lts2/GraphMining-TheWebConf2021/0bd86a5dd5772e951bd22ac169e6101718425337/figures/spikyballproba.png
--------------------------------------------------------------------------------