├── .gitignore
├── 01 - Introduction.ipynb
├── 02 - Gradients and Edges.ipynb
├── 03 - Orientation Analysis.ipynb
├── LICENSE
├── README.md
├── context.py
├── data
│   ├── __init__.py
│   ├── bgs_rock
│   │   ├── N1495_ppl.jpg
│   │   ├── N1495_xpl.jpg
│   │   ├── README.md
│   │   ├── __init__.py
│   │   └── download.sh
│   ├── gebco
│   │   ├── __init__.py
│   │   └── seamounts.tif
│   └── naip
│       ├── README.md
│       ├── __init__.py
│       └── q2839_sw_NAIP2018_RGB_clipped.tif
├── environment.yml
├── examples
│   ├── __init__.py
│   ├── combine_adjacent_seamounts.py
│   ├── context.py
│   ├── seamount_detection.py
│   ├── seamount_detection_saving.py
│   ├── structure_tensor.py
│   ├── thin_section_grain_orientation.py
│   └── utils.py
├── exercises
│   ├── 3356_38_3085-3090m.jpg
│   ├── 3356_38_3085-3090m_crop.jpg
│   └── README.md
└── requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | *.egg-info 2 | build 3 | dist 4 | *.pyc 5 | *.swp 6 | docs/examples/*.rst 7 | docs/_static/examples/*.png 8 | docs/_build 9 | -------------------------------------------------------------------------------- /01 - Introduction.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "Geologic Image Processing in Python\n", 8 | "===============================\n", 9 | "\n", 10 | "For a geoscientist, some of the most useful and frequently used computational tools fall under the broad category of image processing. It's more than working with photographs or satellite imagery, though. All \"image processing\" means in this context is working with data that's on a regular grid. For example, a digital elevation model is every bit as much an image as a core photograph is. \n", 11 | "\n", 12 | "Outline for Today\n", 13 | "--------------------------\n", 14 | "\n", 15 | " * Overview / Introduction\n", 16 | " * Seamount Detection Example\n", 17 | " - Thresholding\n", 18 | " - Filtering\n", 19 | " - Segmentation\n", 20 | " - Simplification\n", 21 | " * Slope and Hillshade of Topographic Data\n", 22 | " - Gradients\n", 23 | " - Hillshade\n", 24 | " * Toe of Slope Detection\n", 25 | " - Laplacian\n", 26 | " - Skeletonization\n", 27 | " * Lineament Analysis from Aerial Photography\n", 28 | " - Edge detection\n", 29 | " - Hough Transform\n", 30 | " - Structure Tensor\n", 31 | " * Grain Detection in Thin Sections\n", 32 | " - Color-based segmentation\n", 33 | " - SLIC\n", 34 | " - Image moments\n", 35 | " \n", 36 | "\n", 37 | "#### Goals\n", 38 | "\n", 39 | "This tutorial will introduce you to some core image processing methods by solving a handful of realistic tasks related to geology and geophysics. The goal is to gain familiarity with key \"building blocks\" and terminology so that you can understand how to use common Python libraries such as `scipy.ndimage` and `skimage` in your day-to-day work. For many of you, these may seem like simple tasks and things that are trivial to accomplish in ArcGIS, ImageJ, or Photoshop. However, the terminology is a bit different when working with image processing and computer vision libraries. Many operations are called very different things, or are broken into smaller pieces. Therefore, it's important to understand how to string the fundamental operations that are usually exposed in programming libraries into the higher-level operations you're used to thinking about. \n", 40 | "\n", 41 | "There are a lot of great tutorials out there already for the libraries we'll be working with. 
However, there are not many geoscience-focused examples freely available. It's much easier to see how methods can be applied to your domain when there are examples of familiar tasks related to what you're doing. \n", 42 | "\n", 43 | "Often, particularly with image processing, what we want to do as scientists is significantly different from the tasks that image processing tutorials are aimed at. The methods are the same, but thousands of examples of manipulating cat photos don't always help form a connection to an analysis you're stuck on. Hopefully this tutorial can provide a bit of a bridge between the two worlds.\n", 44 | "\n", 45 | "This is not meant to be a complete introduction to image processing, or even a complete introduction to common geoscience image processing problems. Those of you already familiar with image processing methods will notice that we completely gloss over some very important points and do not fully explain many underlying methods. The goal here, however, is to give you a quick overview of what's possible by combining a few well-known methods. Hopefully after this tutorial you feel comfortable enough to start experimenting and learning more on your own.\n", 46 | "\n", 47 | "#### Libraries\n", 48 | "\n", 49 | "We'll focus on using a combination of `numpy`, `scipy.ndimage`, and `skimage`. `skimage` (scikit-image) is a leading image processing library that has many very nice features and exposes many advanced methods. `scipy.ndimage` is a bit more low-level, but has the advantage of both working in 3D (or N-D) and focusing on efficient implementations of common operations. We'll also use libraries like `rasterio` for reading and writing geospatial data, but we won't dive deeply into the details of working with geospatial data.\n", 50 | "\n", 51 | "#### Notes\n", 52 | "\n", 53 | "Because the Transform2020 tutorial is remote, it's difficult to provide the type of hands-on help we normally would. Therefore, this set of notebooks is meant to be a \"cookbook\" demonstrating common tasks and illustrating underlying principles through specific examples. We won't go over all of the details, but hopefully you can come back to these examples later and adjust them to your needs." 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "#### Additional Resources\n", 61 | "\n", 62 | "This tutorial assumes basic Python knowledge and at least some familiarity with `numpy`. However, if you're new to all of this, that's perfectly okay! In that case, focus on how the operations are chained together more than the details of the code. If you're looking for tutorials on learning Python for scientific purposes, here are some resources you may find useful:\n", 63 | "\n", 64 | "Scipy-lectures is a great introduction to scientific computing in python: https://scipy-lectures.org/index.html I can't recommend this strongly enough, honestly. If you've picked up basic python syntax and are looking for a place to start actually applying python, this is a great practical guide to the tools you'll need to know most. It has sections on most of the libraries we'll use today, and I'd recommend browsing through it even if you're an expert.\n", 65 | "\n", 66 | "The (new) official `scipy.ndimage` tutorial is also a good overview of a library we'll be using extensively here: https://docs.scipy.org/doc/scipy/reference/tutorial/ndimage.html However, it's most useful if you're already familiar with the basics of both image processing and numpy/scipy operations. 
You may find it a bit dense otherwise, but it's full of excellent examples of all of the key functionality in `scipy.ndimage`.\n", 67 | "\n", 68 | "There are many `skimage` tutorials out there, but the gallery is a great place to start: https://scikit-image.org/docs/dev/auto_examples/index.html Image processing operations are often visual, so it's not uncommon to suspect something is easily accomplished but not know the name of the operation. The `skimage` gallery is something of a visual index to many of the operations in the library." 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": {}, 74 | "source": [ 75 | "Diving in: Identifying Seamounts\n", 76 | "------------------------------------------------\n", 77 | "\n", 78 | "Let's get started with some concrete examples. If you'd like to see where we're going, you can jump straight to the complete example and run it:" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": null, 84 | "metadata": {}, 85 | "outputs": [], 86 | "source": [ 87 | "%matplotlib notebook\n", 88 | "%load examples/seamount_detection.py" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "We're going to try to detect, count, and calculate areas of seamounts based on bathymetry data. Along the way, we'll cover the following image processing concepts:\n", 96 | "\n", 97 | " * Array representation\n", 98 | " * Thresholding\n", 99 | " * Filtering\n", 100 | " * Segmentation\n", 101 | " \n", 102 | "Let's start by loading our data from a geotiff and taking a look at it:" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": null, 108 | "metadata": {}, 109 | "outputs": [], 110 | "source": [ 111 | "%matplotlib notebook\n", 112 | "import numpy as np\n", 113 | "import matplotlib.pyplot as plt\n", 114 | "import rasterio as rio\n", 115 | "\n", 116 | "from context import data\n", 117 | "\n", 118 | "# Let's load data from a geotiff using rasterio...\n", 119 | "with rio.open(data.gebco.seamounts, 'r') as src:\n", 120 | " bathy = src.read(1)\n", 121 | " \n", 122 | "print(bathy)" 123 | ] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "metadata": {}, 128 | "source": [ 129 | "So `bathy` is a 2D array with integer values. The units are meters relative to sea level. Note that more or less everything is negative: This is GEBCO bathymetry data from a region of the Western Pacific near the Marianas Trench.\n", 130 | "\n", 131 | "Let's take a look at what this data looks like:" 132 | ] 133 | }, 134 | { 135 | "cell_type": "code", 136 | "execution_count": null, 137 | "metadata": {}, 138 | "outputs": [], 139 | "source": [ 140 | "fig, ax = plt.subplots()\n", 141 | "im = ax.imshow(bathy, cmap='gray')\n", 142 | "fig.colorbar(im, orientation='horizontal')\n", 143 | "plt.show()" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": {}, 149 | "source": [ 150 | "We have a single array of data that we're displaying as grayscale. Let's go ahead and add some color to that. 
We'll discuss color in images in more detail later, but this is a good chance to briefly introduce using colormaps in matplotlib:" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": null, 156 | "metadata": {}, 157 | "outputs": [], 158 | "source": [ 159 | "fig, ax = plt.subplots(constrained_layout=True)\n", 160 | "im = ax.imshow(bathy, cmap='Blues_r', vmax=0)\n", 161 | "im.cmap.set_over('green') # Just display any land as green...\n", 162 | "fig.colorbar(im, orientation='horizontal')\n", 163 | "plt.show()" 164 | ] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "metadata": {}, 169 | "source": [ 170 | "Okay, so we're looking at a large number of seamounts rising up above the abyssal plain. It's really obvious where they are visually, but it would be nice to be able to quickly identify them programmatically. For example, we might want to look at their distribution by area or volume, or to just get a count without manually counting all of them.\n", 171 | "\n", 172 | "### Thresholding\n", 173 | "\n", 174 | "The simplest approach we could take would be to threshold the bathymetry data. The abyssal plain is usually around 4km depth due to the relatively constant thickness and density of oceanic crust. Therefore, we could try thresholding out anything above -3500 meters (i.e. anything shallower than 3500 m depth). Because this is a numpy array, the operation is quite simple:" 175 | ] 176 | }, 177 | { 178 | "cell_type": "code", 179 | "execution_count": null, 180 | "metadata": {}, 181 | "outputs": [], 182 | "source": [ 183 | "simple_threshold = bathy > -3500\n", 184 | "print(simple_threshold)" 185 | ] 186 | }, 187 | { 188 | "cell_type": "markdown", 189 | "metadata": {}, 190 | "source": [ 191 | "Note that we have an array of True/False values. This is a boolean array, and some of the operations we'll work with today only operate on this sort of boolean True/False array. \n", 192 | "\n", 193 | "Often, in image processing, you'll convert the True/False representation into a 1/0 representation. Behind the scenes, the `True` values above can be efficiently converted into `1` and the `False` values into `0`. For example:" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": null, 199 | "metadata": {}, 200 | "outputs": [], 201 | "source": [ 202 | "# \"view\" only changes the way we're interpreting the underlying data.\n", 203 | "# If you're not familiar with \"view\" vs \"astype\", use \"astype\". All I'm\n", 204 | "# showing here is that it's seamless to go from True/False --> 1/0\n", 205 | "print(simple_threshold.view(np.uint8)) " 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "Okay, enough beating around the bush. Let's take a look at what we've accomplished:" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": null, 218 | "metadata": {}, 219 | "outputs": [], 220 | "source": [ 221 | "fig, axes = plt.subplots(nrows=2, sharex=True, sharey=True, constrained_layout=True)\n", 222 | "axes[0].imshow(bathy, cmap='Blues_r', vmax=0)\n", 223 | "axes[1].imshow(simple_threshold.view(np.uint8))\n", 224 | "\n", 225 | "for ax in axes.flat:\n", 226 | " ax.set(xticks=[], yticks=[])\n", 227 | "\n", 228 | "plt.show()" 229 | ] 230 | }, 231 | { 232 | "cell_type": "markdown", 233 | "metadata": {}, 234 | "source": [ 235 | "The yellow regions are `True` in the boolean array. Note that we capture many seamounts out in the abyssal plain, but classify the entire volcanic arc and forearc in the west as a single large seamount. 
We also miss a lot of smaller seamounts that don't rise above the -3500 m line.\n", 236 | "\n", 237 | "Let's take a second to make a fancier display so we can explore what we capture and what we don't. I'm going to use a quick utility included with this tutorial to allow toggling of different overlays on the plot. We'll re-use this throughout the tutorial." 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": null, 243 | "metadata": { 244 | "scrolled": false 245 | }, 246 | "outputs": [], 247 | "source": [ 248 | "from context import utils\n", 249 | "\n", 250 | "fig, ax = plt.subplots()\n", 251 | "ax.imshow(bathy, cmap='Blues_r', vmax=0).cmap.set_over('green') \n", 252 | "\n", 253 | "# We'll mask any False values so that they're transparent\n", 254 | "im = ax.imshow(np.ma.masked_where(~simple_threshold, simple_threshold),\n", 255 | " vmin=0, vmax=1, label='>3500 mbsl')\n", 256 | "\n", 257 | "ax.set(xticks=[], yticks=[])\n", 258 | "\n", 259 | "utils.Toggler(im).show()" 260 | ] 261 | }, 262 | { 263 | "cell_type": "markdown", 264 | "metadata": {}, 265 | "source": [ 266 | "As you can see, we're doing an okay job of detecting large seamounts, but an awful job of detecting smaller ones and arc volcanoes. This is because we're using a fixed elevation threshold to determine whether or not something is a seamount.\n", 267 | "\n", 268 | "Visually, we'd determine whether or not a pixel is part of a seamount based on the area around it. We're looking for features that rise up from the surrounding topography. However, that \"base level\" of topography varies throughout our study area. Therefore we need a way of finding the \"background\" elevation. Remember that -- we'll come back to it soon.\n", 269 | "\n", 270 | "### Filters\n", 271 | "\n", 272 | "Filters (and convolution, which is very closely related) are a ubiquitous concept in image processing. Filtering an image is a type of \"moving window\" operation. For each pixel in the image, we take a region around it and apply some operation based on that region to define a new pixel value. Most commonly used filters involve multiplying each pixel in the region by a weight and then summing (i.e. a convolution). The array of weights is usually [called a kernel](https://en.wikipedia.org/wiki/Kernel_(image_processing)). This allows blurring, sharpening, edge detection, and many other useful operations. \n", 273 | "\n", 274 | "Other filters aren't defined by weights, but by more flexible operations. A simple example of this is a median filter, where the value of the pixel is the median of the pixels in some window surrounding it. To calculate a median, we need to sort all the pixels we're using and find the one in the middle -- it can't be defined by a weighted average. As a result, it can become slow if a large window is used. (Note: in practice, there are some shortcuts to the \"full\" sort, but either way a median filter is slower than filters defined by weights.)\n", 275 | "\n", 276 | "Let's use one of the simplest possible filters: [a uniform filter](https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.uniform_filter.html). A uniform filter is an average of all pixel values in a square region. It's simple, but it's fast. 
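The basic call is a one-liner (a minimal sketch; the variable name is just for illustration, and `size` is the window width in pixels):

```
import scipy.ndimage

# Average each pixel with its surrounding 50x50 pixel window
blurred = scipy.ndimage.uniform_filter(bathy, size=50)
```
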
In effect, it blurs the image:" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": null, 282 | "metadata": {}, 283 | "outputs": [], 284 | "source": [ 285 | "import scipy.ndimage\n", 286 | "\n", 287 | "fig, ax = plt.subplots()\n", 288 | "im = ax.imshow(bathy, cmap='Blues_r', vmax=0)\n", 289 | "\n", 290 | "ax.set(xticks=[], yticks=[])\n", 291 | "\n", 292 | "# Values are width of the square window in _pixels_\n", 293 | "def update(value):\n", 294 | " blurred = scipy.ndimage.uniform_filter(bathy, value)\n", 295 | " im.set_data(blurred)\n", 296 | "\n", 297 | "utils.Slider(ax, 0, 150, update, start=50).show()" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": {}, 303 | "source": [ 304 | "Okay, so we mentioned this was a moving window. We need to think about what happens at the edges. By default, most functions in `scipy.ndimage` use \"reflect\" boundary conditions. That's perfect for this use case, but it's good to have a look at the other options. The `mode` kwarg controls how boundaries are handled: (this is the same for any function where boundaries matter in `scipy.ndimage`)" 305 | ] 306 | }, 307 | { 308 | "cell_type": "code", 309 | "execution_count": null, 310 | "metadata": {}, 311 | "outputs": [], 312 | "source": [ 313 | "scipy.ndimage.uniform_filter?" 314 | ] 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "metadata": {}, 319 | "source": [ 320 | "### Using Filters in Seamount Detection\n", 321 | "\n", 322 | "Going back to our seamount detection problem, let's use the blurred / uniform-filtered bathymetry to define the \"background\" elevation. Seamounts or other peak-like features will be significantly higher than the background elevation and trenches or other trough-like features will be significantly below it. Therefore, we can identify seamounts by comparing the uniform-filtered bathymetry data to the original bathymetry data. Anything that's more than some amount higher than the background elevation, we'll consider a seamount:" 323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | "execution_count": null, 328 | "metadata": { 329 | "scrolled": false 330 | }, 331 | "outputs": [], 332 | "source": [ 333 | "# We'll use a simple filter to define the local background elevation and\n", 334 | "# assume anything more than 500m above the background is a seamount\n", 335 | "blurred = scipy.ndimage.uniform_filter(bathy, 150)\n", 336 | "better_threshold = bathy > (blurred + 500)\n", 337 | "\n", 338 | "fig, ax = plt.subplots()\n", 339 | "ax.imshow(bathy, cmap='Blues_r', vmax=0).cmap.set_over('green') \n", 340 | "\n", 341 | "# Compare this to our original 3500m threshold\n", 342 | "im1 = ax.imshow(np.ma.masked_where(~simple_threshold, simple_threshold),\n", 343 | " vmin=0, vmax=1, label='>3500m')\n", 344 | "im2 = ax.imshow(np.ma.masked_where(~better_threshold, better_threshold),\n", 345 | " vmin=0, vmax=1, label='Above Background')\n", 346 | "\n", 347 | "ax.set(xticks=[], yticks=[])\n", 348 | "fig.tight_layout()\n", 349 | "\n", 350 | "utils.Toggler(im1, im2).show()" 351 | ] 352 | }, 353 | { 354 | "cell_type": "markdown", 355 | "metadata": {}, 356 | "source": [ 357 | "### Cleanup of Thresholded Regions\n", 358 | "\n", 359 | "As you can see, we've done a better job identifying smaller seamounts, and we're no longer identifying the entire volcanic arc as one gigantic seamount.\n", 360 | "\n", 361 | "However, we're also identifying some very tiny features that we'd rather leave out, and the boundaries of each feature are very rough. 
It would be nice to \"clean up\" our classification so that we have smoother boundaries, holes are filled in, and very small regions are excluded.\n", 362 | "\n", 363 | "To do this, we'll operate on our thresholded boolean array, rather than operating on the bathymetry data directly. \n", 364 | "\n", 365 | "First, let's fill any \"holes\" in our classified areas. Anything that's fully surrounded by something we're considering a seamount is clearly a seamount as well. There aren't many holes in our classification, but if you look around, you can find several small ones. They may not matter much for this case, but it's an extremely common post-processing step in classifications, and one that's very frequently handy to know how to do.\n", 366 | "\n", 367 | "To fill holes, we'll rely on [mathematical morphology](https://en.wikipedia.org/wiki/Mathematical_morphology) (which is a cornerstone of image processing that comes from geology, by the way). Most mathematical morphology operators work on boolean arrays similar to what we have. It provides ways of identifying connected and non-connected regions, buffering, eroding, and similar key operations on classified images. It's straight-forward to identify holes that are surrounded by True values in a boolean array, and `scipy.ndimage` even gives us a one-step function to do it: [`scipy.ndimage.binary_fill_holes`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.binary_fill_holes.html#scipy.ndimage.binary_fill_holes)" 368 | ] 369 | }, 370 | { 371 | "cell_type": "code", 372 | "execution_count": null, 373 | "metadata": {}, 374 | "outputs": [], 375 | "source": [ 376 | "# Filling holes in our classified regions is straight-forward\n", 377 | "filled = scipy.ndimage.binary_fill_holes(better_threshold)\n", 378 | "\n", 379 | "# And let's compare the results:\n", 380 | "fig, ax = plt.subplots()\n", 381 | "ax.imshow(bathy, cmap='Blues_r', vmax=0).cmap.set_over('green') \n", 382 | "\n", 383 | "im1 = ax.imshow(np.ma.masked_where(~better_threshold, better_threshold),\n", 384 | " vmin=0, vmax=1, label='Original')\n", 385 | "im2 = ax.imshow(np.ma.masked_where(~filled, filled),\n", 386 | " vmin=0, vmax=1, label='Filled')\n", 387 | "\n", 388 | "ax.set(xticks=[], yticks=[])\n", 389 | "fig.tight_layout()\n", 390 | "\n", 391 | "utils.Toggler(im1, im2).show()" 392 | ] 393 | }, 394 | { 395 | "cell_type": "markdown", 396 | "metadata": {}, 397 | "source": [ 398 | "Next, let's do something a bit more interesting. Let's clean up our classification with a single filter. This will smooth boundaries and eliminate very small features we're not interested in.\n", 399 | "\n", 400 | "Remember the median filter we very briefly talked about earlier? You can also apply a median filter to boolean True/False data. In that case, the pixel will be whatever the majority of pixels in the region around it are. The large a region (a.k.a. kernel) we choose, the smoother/simpler the result will be. \n", 401 | "\n", 402 | "Let's have a look at that and experiment with different filter sizes. We'll use [`scipy.ndimage.median_filter`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.median_filter.html#scipy.ndimage.median_filter) for this.\n", 403 | "\n", 404 | "Quick note: If we had more than one class (e.g. say this was a land use classification with water, urban, forest, and agriculture classes), the equivalent \"cleanup\" operator would be a [majority filter](https://scikit-image.org/docs/stable/api/skimage.filters.rank.html#majority). 
\n", 405 | "\n", 406 | "This will take a bit to run. A median filter is a much slower operation than the other filters we've applied so far..." 407 | ] 408 | }, 409 | { 410 | "cell_type": "code", 411 | "execution_count": null, 412 | "metadata": {}, 413 | "outputs": [], 414 | "source": [ 415 | "widths = [3, 5, 11, 21]\n", 416 | "versions = ([filled] +\n", 417 | " [scipy.ndimage.median_filter(filled, x) for x in widths])\n", 418 | "\n", 419 | "fig, ax = plt.subplots()\n", 420 | "ax.imshow(bathy, cmap='Blues_r', vmax=0)\n", 421 | "\n", 422 | "ims = []\n", 423 | "for width, version in zip([0] + widths, versions):\n", 424 | " transparent = np.ma.masked_equal(version.view(np.uint8), 0)\n", 425 | " im = ax.imshow(transparent, vmin=0, vmax=1, label=f\"{width}x{width}\")\n", 426 | " ims.append(im)\n", 427 | "\n", 428 | "ax.set(xticks=[], yticks=[])\n", 429 | "fig.tight_layout()\n", 430 | "\n", 431 | "utils.Toggler(*ims).show()" 432 | ] 433 | }, 434 | { 435 | "cell_type": "markdown", 436 | "metadata": {}, 437 | "source": [ 438 | "Let's choose a final width and we'll fill holes again, as the median filter can introduce _new_ holes into the result:" 439 | ] 440 | }, 441 | { 442 | "cell_type": "code", 443 | "execution_count": null, 444 | "metadata": {}, 445 | "outputs": [], 446 | "source": [ 447 | "cleaned = scipy.ndimage.median_filter(better_threshold, 13)\n", 448 | "cleaned = scipy.ndimage.binary_fill_holes(cleaned)" 449 | ] 450 | }, 451 | { 452 | "cell_type": "markdown", 453 | "metadata": {}, 454 | "source": [ 455 | "Okay, so we've simplified things reasonably well, but we still have a single True/False array. \n", 456 | "\n", 457 | "### Identifying Individual Features\n", 458 | "\n", 459 | "Originally we said we were going to count the seamounts and calculate areas. How do we go from a bunch of True/False regions to a count? The answer is to use mathematical morphology again. We can identify and label each distinct group of `True` pixels that touch each other but do not touch any other `True` pixels. In `scipy.ndimage`, this is the [`label`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.label.html#scipy.ndimage.label) function. It assigns a unique value to each group of pixels that touch each other:" 460 | ] 461 | }, 462 | { 463 | "cell_type": "code", 464 | "execution_count": null, 465 | "metadata": {}, 466 | "outputs": [], 467 | "source": [ 468 | "labels, count = scipy.ndimage.label(cleaned)\n", 469 | "\n", 470 | "fig, ax = plt.subplots(constrained_layout=True)\n", 471 | "ax.imshow(bathy, cmap='Blues_r', vmax=0)\n", 472 | "ax.imshow(np.ma.masked_equal(labels, 0), cmap='tab20b')\n", 473 | "\n", 474 | "ax.set(xticks=[], yticks=[],\n", 475 | " title=f'There are {count} different seamounts!')\n", 476 | "plt.show()" 477 | ] 478 | }, 479 | { 480 | "cell_type": "markdown", 481 | "metadata": {}, 482 | "source": [ 483 | "Okay, great! `labels` is now an array where each separate region has a unique value. The colormap doesn't quite show it, but if you hover over each seamount, you'll see that the value is different.\n", 484 | "\n", 485 | "Remember we mentioned calculating area? Let's go ahead and get a pixel count for each seamount. 
We could do something like:\n", 486 | "\n", 487 | "```\n", 488 | "seamount_area = []\n", 489 | "for i in range(1, count + 1):\n", 490 | " seamount_area.append((labels == i).sum())\n", 491 | "```\n", 492 | "\n", 493 | "However, that's a bit inefficient in this case, and `scipy.ndimage` has a built-in function to do similar \"zonal statistics\", so let's go ahead and use [`scipy.ndimage.sum`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.sum.html) to get a pixel count for each individual seamount and then plot a histogram:" 494 | ] 495 | }, 496 | { 497 | "cell_type": "code", 498 | "execution_count": null, 499 | "metadata": { 500 | "scrolled": false 501 | }, 502 | "outputs": [], 503 | "source": [ 504 | "pixel_counts = scipy.ndimage.sum(cleaned, labels, np.arange(count)+1)\n", 505 | "\n", 506 | "fig, ax = plt.subplots(constrained_layout=True)\n", 507 | "ax.hist(pixel_counts, bins=40)\n", 508 | "ax.set(ylabel='Number of seamounts', xlabel='Area in pixels')\n", 509 | "plt.show()" 510 | ] 511 | }, 512 | { 513 | "cell_type": "markdown", 514 | "metadata": {}, 515 | "source": [ 516 | "Now we have a histogram, but what the heck does \"area in pixels\" mean? Units matter!\n", 517 | "\n", 518 | "### Calculating Area (for a geographic WGS84 raster)\n", 519 | "\n", 520 | "Let's convert that to something more sensible. We're working with data in geographic (latitude/longitude) WGS84 coordinates in this case. We can find the cellsize of the pixels, but it's in degrees. Because it's in degrees, the area of the pixel varies by latitude. \n", 521 | "\n", 522 | "Normally, we'd just multiply by a constant factor to convert pixel counts to area, but because we're working in degrees, the area varies depending on location.\n", 523 | "\n", 524 | "We can very closely approximate the true area with a simple conversion. A degree of latitude is a nearly constant ~111.3 km. A degree of longitude is ~111.3 km at the equator. At other latitudes, multiply by the cosine of the latitude to get the width of a degree of longitude. If you need more precision, you can reproject into a proper projection, but this is precise enough for most purposes.\n", 525 | "\n", 526 | "Let's use `rasterio` to get the latitude and longitude of each pixel and use that to calculate the area of each pixel, then we'll sum over each seamount's region to get their total area." 527 | ] 528 | }, 529 | { 530 | "cell_type": "code", 531 | "execution_count": null, 532 | "metadata": {}, 533 | "outputs": [], 534 | "source": [ 535 | "# Calculate area for each pixel (memory-inefficient, but fine for now)\n", 536 | "i, j = np.mgrid[:bathy.shape[0], :bathy.shape[1]]\n", 537 | "with rio.open(data.gebco.seamounts, 'r') as src:\n", 538 | " cellsize = src.transform.a # Only because pixels are square and north-south aligned\n", 539 | " lon, lat = src.xy(i, j) # rasterio's xy takes (row, col)\n", 540 | "area = (cellsize * 111.3)**2 * np.cos(np.radians(lat))\n", 541 | "\n", 542 | "areas = scipy.ndimage.sum(area, labels, np.arange(count)+1)\n", 543 | "\n", 544 | "fig, ax = plt.subplots(constrained_layout=True)\n", 545 | "ax.hist(areas, bins=40)\n", 546 | "ax.set(ylabel='Number of seamounts', xlabel='Area in $km^2$')\n", 547 | "plt.show()" 548 | ] 549 | }, 550 | { 551 | "cell_type": "markdown", 552 | "metadata": {}, 553 | "source": [ 554 | "Quick aside... Note the shape of the histogram. It's very similar to a [log-normal distribution](https://en.wikipedia.org/wiki/Log-normal_distribution) (note the long tail on the right-hand side). This is very common in any natural process that involves areas or volumes. 
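One quick way to see it (a sketch, reusing the `areas` array from the previous cell): histogram the logarithm of area, and the long tail should collapse into a roughly symmetric bell curve.

```
fig, ax = plt.subplots(constrained_layout=True)
ax.hist(np.log10(areas), bins=40)
ax.set(ylabel='Number of seamounts', xlabel='$log_{10}$(area in $km^2$)')
plt.show()
```
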
As an arm-wavy explanation: If you multiply together several independent positive random variables (say, a width and a height), the resulting distribution will be approximately log-normal. (We don't actually expect seamount size to be as simple as width x height in this case, but still, expect log-normal distributions instead of normal anytime you start working with areas or volumes.) Be careful about the way you interpret statistics like standard deviation or mode anytime area or volume pops up.\n", 555 | "\n", 556 | "### But I Need to Use This Data Elsewhere...\n", 557 | "\n", 558 | "I don't want to spend too long talking about geospatial data formats and I/O. However, it's the natural next step to ask. Therefore, I've included an example that both saves what we've done as a geotiff and vectorizes our regions and saves them as a shapefile:" 559 | ] 560 | }, 561 | { 562 | "cell_type": "code", 563 | "execution_count": null, 564 | "metadata": {}, 565 | "outputs": [], 566 | "source": [ 567 | "%load examples/seamount_detection_saving.py" 568 | ] 569 | }, 570 | { 571 | "cell_type": "markdown", 572 | "metadata": {}, 573 | "source": [ 574 | "### Review\n", 575 | "\n", 576 | "Okay, let's quickly review the steps we took and have a look at a complete, stand-alone example. To detect seamounts, we did the following:\n", 577 | "\n", 578 | " 1. Loaded bathymetry data into an array\n", 579 | " 2. Tried thresholding based on a constant elevation (didn't work well)\n", 580 | " 3. Estimated background elevations with a local average (uniform filter)\n", 581 | " 4. Detected seamounts as being regions more than 500m above the background (worked well)\n", 582 | " 5. Cleaned up our detection to remove small regions and have smoother boundaries (median filter)\n", 583 | " 6. Filled in any holes in our detected seamounts\n", 584 | " 7. Separated each connected region of pixels into a unique seamount (i.e. count how many)\n", 585 | " 8. Calculated the area distribution of seamounts" 586 | ] 587 | }, 588 | { 589 | "cell_type": "code", 590 | "execution_count": null, 591 | "metadata": {}, 592 | "outputs": [], 593 | "source": [ 594 | "%load examples/seamount_detection.py" 595 | ] 596 | }, 597 | { 598 | "cell_type": "markdown", 599 | "metadata": {}, 600 | "source": [ 601 | "Take-home Question:\n", 602 | "-------------------------------\n", 603 | "\n", 604 | "If you're comfortable with what we've done so far, here's a short challenge for you to try later:\n", 605 | "\n", 606 | "Some of these seamounts reach the surface and form islands. We usually don't consider a feature to be a seamount if a part of it reaches the surface. **Can you exclude all seamounts that reach the surface?**\n", 607 | "\n", 608 | "(Hint: This mostly needs numpy boolean indexing. There's also another zonal statistics function in [`scipy.ndimage`](https://docs.scipy.org/doc/scipy/reference/ndimage.html#measurements) that you might find useful, but there's definitely more than one way to accomplish this.)\n", 609 | "\n", 610 | "If you're very familiar with numpy already and that seems too obvious, try some of the [mathematical morphology operations](https://docs.scipy.org/doc/scipy/reference/ndimage.html#morphology) we didn't talk about. How would you join together seamounts that are within some distance of X pixels of each other so that they're considered a single connected region? (Hint: The most efficient way will change the shape of the regions slightly; one possible starting point is sketched below.) 
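A hedged sketch of that starting point (the `iterations` count is arbitrary, and `cleaned` is the array from the seamount example):

```
# Dilate then erode ("morphological closing") to merge regions that are
# within roughly 2 * iterations pixels of one another
joined = scipy.ndimage.binary_closing(cleaned, iterations=5)
labels, count = scipy.ndimage.label(joined)
```
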
There's more than one way to approach it, though.\n", 611 | "\n", 612 | "There are example answers in the `examples` folder for both of these if you get stuck. However, there's more than one right way to do it, so I'd encourage experimentation!" 613 | ] 614 | }, 615 | { 616 | "cell_type": "markdown", 617 | "metadata": {}, 618 | "source": [ 619 | "Break!\n", 620 | "----------\n", 621 | "\n", 622 | "Let's take five minutes. My voice needs a break!\n", 623 | "\n", 624 | "Next Section\n", 625 | "------------------\n", 626 | "\n", 627 | "We'll move on to talking about gradients in images next. We'll start out with simple slope and hillshade calculations, but then use the same methods to identify separate grains in thin section images.\n", 628 | "\n", 629 | "[02 - Gradients and Edges](./02%20-%20Gradients%20and%20Edges.ipynb)" 630 | ] 631 | } 632 | ], 633 | "metadata": { 634 | "kernelspec": { 635 | "display_name": "Python 3", 636 | "language": "python", 637 | "name": "python3" 638 | }, 639 | "language_info": { 640 | "codemirror_mode": { 641 | "name": "ipython", 642 | "version": 3 643 | }, 644 | "file_extension": ".py", 645 | "mimetype": "text/x-python", 646 | "name": "python", 647 | "nbconvert_exporter": "python", 648 | "pygments_lexer": "ipython3", 649 | "version": "3.8.3" 650 | } 651 | }, 652 | "nbformat": 4, 653 | "nbformat_minor": 4 654 | } 655 | -------------------------------------------------------------------------------- /02 - Gradients and Edges.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "Gradients, Edges, and Lineaments\n", 8 | "---------------------------------------------------\n", 9 | "\n", 10 | "Welcome back!\n", 11 | "\n", 12 | "In this section, we're going to explore image gradients, edge detectors, and lineament analysis. In other words, we're going to try to start detecting features in images that aren't defined by the raw value of the image. \n", 13 | "\n", 14 | "In geology, a lot of what we're interested in is defined by small local changes in our input data. We're often using local \"texture\" as well as overall value/color to make a determination about what's present. There's a wealth of different methods to investigate, but gradients are a good place to start.\n", 15 | "\n", 16 | "### Calculate Slope and Hillshade\n", 17 | "\n", 18 | "Let's start by going back to the same bathymetry data we were working with before. Sure, it's kinda boring data, but it's easy to reason about, and doesn't require much specialized knowledge to interpret.\n", 19 | "\n", 20 | "Let's look at the local image gradients (i.e. the partial derivatives of the surface in x/y). To be a true gradient, we'd need to include information about the cellsize, but for just a minute, let's ignore units...",
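"\n",
"(For reference: `np.gradient` uses second-order central differences in the interior, so in pixel units `dx[i, j] = (bathy[i, j+1] - bathy[i, j-1]) / 2`, with one-sided differences at the array edges.)"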
21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": null, 26 | "metadata": { 27 | "scrolled": false 28 | }, 29 | "outputs": [], 30 | "source": [ 31 | "%matplotlib notebook\n", 32 | "import numpy as np\n", 33 | "import matplotlib.pyplot as plt\n", 34 | "import rasterio as rio\n", 35 | "\n", 36 | "from context import data\n", 37 | "from context import utils\n", 38 | "\n", 39 | "# Let's load data from a geotiff using rasterio...\n", 40 | "with rio.open(data.gebco.seamounts, 'r') as src:\n", 41 | " bathy = src.read(1)\n", 42 | "\n", 43 | "# Simple differences in each direction\n", 44 | "dy, dx = np.gradient(bathy)\n", 45 | "\n", 46 | "# And let's compare the gradient to the image\n", 47 | "fig, axes = plt.subplots(nrows=3, sharex=True, sharey=True)\n", 48 | "axes[0].imshow(bathy, cmap='Blues_r', vmax=0).cmap.set_over('green') \n", 49 | "axes[1].imshow(dy, vmin=-100, vmax=100, cmap='coolwarm')\n", 50 | "im = axes[2].imshow(dx, vmin=-100, vmax=100, cmap='coolwarm')\n", 51 | "\n", 52 | "# Set up a single shared colorbar.\n", 53 | "cax = fig.add_axes([0.9, 0.3, 0.02, 0.4])\n", 54 | "cbar = fig.colorbar(im, cax=cax, label='Gradient')\n", 55 | "cbar.ax.yaxis.set_label_position('left')\n", 56 | "\n", 57 | "for label, ax in zip(['Data', 'DY', 'DX'], axes):\n", 58 | " ax.set(ylabel=label, xticks=[], yticks=[])\n", 59 | "plt.show()" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "Okay, great. Rate of change in each direction. By itself, that's not too interesting. However, let's take a look at the magnitude of these gradients. This is basically slope (though, again, we're ignoring units for now)." 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": null, 72 | "metadata": { 73 | "scrolled": false 74 | }, 75 | "outputs": [], 76 | "source": [ 77 | "# This is essentially slope with different units.\n", 78 | "gradient_magnitude = np.hypot(dx, dy)\n", 79 | "\n", 80 | "# And let's compare the gradient to the image\n", 81 | "fig, axes = plt.subplots(nrows=2, sharex=True, sharey=True)\n", 82 | "axes[0].imshow(bathy, cmap='Blues_r', vmax=0).cmap.set_over('green') \n", 83 | "im = axes[1].imshow(gradient_magnitude, vmin=0, vmax=200, cmap='gray_r')\n", 84 | "\n", 85 | "cax = fig.add_axes([0.9, 0.3, 0.02, 0.4])\n", 86 | "cbar = fig.colorbar(im, cax=cax, label='Gradient Magnitude')\n", 87 | "cbar.ax.yaxis.set_label_position('left')\n", 88 | "\n", 89 | "for ax in axes:\n", 90 | " ax.set(xticks=[], yticks=[])\n", 91 | "plt.show()" 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": {}, 97 | "source": [ 98 | "Now let's go ahead and do a quick slope calculation just to be able to compare:" 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": null, 104 | "metadata": {}, 105 | "outputs": [], 106 | "source": [ 107 | "# First we need to get the cellsize so that we know the \"run\" in \"rise over run\"\n", 108 | "with rio.open(data.gebco.seamounts, 'r') as src:\n", 109 | " # Assumes cells are square and north-south aligned\n", 110 | " cellsize_deg = src.transform.a \n", 111 | "\n", 112 | "# This actually varies from north to south, but we'll ignore that for now\n", 113 | "cellsize_m = 111.3 * 1000 * cellsize_deg\n", 114 | "\n", 115 | "# And now a slope calculation - Basically inverse tangent of gradient magnitude\n", 116 | "dy_m, dx_m = np.gradient(bathy, cellsize_m, cellsize_m)\n", 117 | "slope = np.degrees(np.arctan(np.hypot(dy_m, dx_m)))\n", 118 | "\n", 119 | "# Quick comparison plot...\n", 120 | 
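"# (Note: gradient_magnitude above is in meters of elevation change per pixel,\n",
"# while slope is in degrees, so the two maps share structure but not units.)\n",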
"fig, axes = plt.subplots(nrows=2, sharex=True, sharey=True)\n", 121 | "im1 = axes[0].imshow(gradient_magnitude, vmin=0, vmax=200, cmap='gray_r')\n", 122 | "im2 = axes[1].imshow(slope, vmin=0, vmax=30, cmap='viridis')\n", 123 | "fig.colorbar(im1, ax=axes[0])\n", 124 | "fig.colorbar(im2, ax=axes[1])\n", 125 | "axes[0].set(xticks=[], yticks=[], ylabel='Gradient Magnitude')\n", 126 | "axes[1].set(xticks=[], yticks=[], ylabel='Slope')\n", 127 | "plt.show()" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "You might have noticed that the gradient magnitude image is displayed with a reversed gray colormap so that high values are black. This is deliberate: It makes it visually similar to a hillshade, and it helps to treat the high gradient areas as shadows. However, given that this is geological image processing, we'd be remiss not to talk about hillshade. It's a different filter, but one that's commonly applied to non-topographic data in geology. It's a great way of visually highlighting small changes in an otherwise smoothly varying surface. As a result, those of us in the geosciences tend to use it as a visualization technique on all sorts of data.\n", 135 | "\n", 136 | "Let's compare gradient magnitude and hillshade. They're quite different, despite the visual similarity of the gradient magnitude visualization to shadows. Hillshade highlights smaller features, but is therefore much noiser: (Note plot below toggles layers -- note the controls below/on the plot. The default view will be the bathymetry data.)" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": null, 142 | "metadata": {}, 143 | "outputs": [], 144 | "source": [ 145 | "def hillshade(data, azdeg=315, altdeg=45, ve=1, cellsize=1):\n", 146 | " \"\"\"Don't actually use this - Just showing that it's straightforward.\"\"\"\n", 147 | " # Using trig here, but this is the dot product of the illumination vector \n", 148 | " # with the normal vector of the surface at each point\n", 149 | " az = np.radians(90 - azdeg)\n", 150 | " alt = np.radians(altdeg)\n", 151 | " dy, dx = np.gradient(ve * -data, -cellsize, cellsize)\n", 152 | " aspect = np.arctan2(dy, dx)\n", 153 | " slope = 0.5 * np.pi - np.arctan(np.hypot(dx, dy))\n", 154 | " intensity = (np.sin(alt) * np.sin(slope) +\n", 155 | " np.cos(alt) * np.cos(slope) * np.cos(az - aspect))\n", 156 | " return intensity\n", 157 | "\n", 158 | "fig, ax = plt.subplots()\n", 159 | "ax.imshow(bathy, cmap='Blues_r', vmax=0).cmap.set_over('green')\n", 160 | "im1 = ax.imshow(gradient_magnitude, vmin=0, vmax=200, cmap='gray_r',\n", 161 | " label='Grad. Mag.')\n", 162 | "im2 = ax.imshow(hillshade(bathy), vmin=-1, vmax=1, cmap='gray',\n", 163 | " label='Hillshade')\n", 164 | "\n", 165 | "ax.set(xticks=[], yticks=[])\n", 166 | "utils.Toggler(im1, im2).show()" 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "metadata": {}, 172 | "source": [ 173 | "Hillshade makes it difficult to see the long-wavelength / large-scale changes in the underlying data. For that reason, it's most commonly combined with a colormapped version of the underlying data to produce a visualization that shows both large-scale changes and fine detail. 
You've undoubtedly seen this a gazillion times, but it's very useful:" 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": null, 179 | "metadata": {}, 180 | "outputs": [], 181 | "source": [ 182 | "from matplotlib.colors import LightSource\n", 183 | "\n", 184 | "# We'll use a very slightly downsampled version to keep it fast\n", 185 | "# That's the reason for bathy[::2, ::2]\n", 186 | "# Also note that we're re-using \"cellsize_m\" from earlier...\n", 187 | "ls = LightSource(azdeg=315, altdeg=45)\n", 188 | "rgb = ls.shade(bathy[::2, ::2], cmap=plt.get_cmap('Blues_r'), \n", 189 | " blend_mode='soft', dx=cellsize_m, dy=cellsize_m, \n", 190 | " vert_exag=5)\n", 191 | "\n", 192 | "fig, ax = plt.subplots(constrained_layout=True)\n", 193 | "ax.imshow(rgb)\n", 194 | "ax.set(xticks=[], yticks=[], title='Shaded Bathymetry')\n", 195 | "plt.show()" 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "### Edge Detection Filters\n", 203 | "\n", 204 | "Okay, let's go back to our earlier gradient magnitude plot for a bit:" 205 | ] 206 | }, 207 | { 208 | "cell_type": "code", 209 | "execution_count": null, 210 | "metadata": {}, 211 | "outputs": [], 212 | "source": [ 213 | "fig, axes = plt.subplots(nrows=2, sharex=True, sharey=True)\n", 214 | "axes[0].imshow(bathy, cmap='Blues_r', vmax=0).cmap.set_over('green') \n", 215 | "im = axes[1].imshow(gradient_magnitude, vmin=0, vmax=200, cmap='gray_r')\n", 216 | "\n", 217 | "cax = fig.add_axes([0.9, 0.3, 0.02, 0.4])\n", 218 | "cbar = fig.colorbar(im, cax=cax, label='Gradient Magnitude')\n", 219 | "cbar.ax.yaxis.set_label_position('left')\n", 220 | "\n", 221 | "for ax in axes:\n", 222 | " ax.set(xticks=[], yticks=[])\n", 223 | "\n", 224 | "plt.show()" 225 | ] 226 | }, 227 | { 228 | "cell_type": "markdown", 229 | "metadata": {}, 230 | "source": [ 231 | "As you can see, the steep edges of each seamount are highlighted in black.\n", 232 | "\n", 233 | "A term you'll hear frequently in image processing is [\"edge detection\"](https://en.wikipedia.org/wiki/Edge_detection). The simplest and best known of these methods is the [Sobel filter](https://en.wikipedia.org/wiki/Sobel_operator). It's almost identical to the gradient calculations we've been using so far. Let's take a closer look:" 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": null, 239 | "metadata": {}, 240 | "outputs": [], 241 | "source": [ 242 | "# The near-equivalent of `np.gradient` is a \"sobel filter\" in image processing terms\n", 243 | "import scipy.ndimage \n", 244 | "\n", 245 | "# The divide by 8 here is because the absolute value of the sobel kernel sums to 8.\n", 246 | "# It's not _exactly_ the same as np.gradient, but it's _very_ close. 
Think of it as\n", 247 | "# a unitless, more efficient np.gradient with a little more averaging.\n", 248 | "correction = 8\n", 249 | "sobel_dy = scipy.ndimage.sobel(bathy.astype(int), axis=0) / correction\n", 250 | "sobel_dx = scipy.ndimage.sobel(bathy.astype(int), axis=1) / correction\n", 251 | "sobel_grad_mag = scipy.ndimage.generic_gradient_magnitude(bathy.astype(int), \n", 252 | " scipy.ndimage.sobel) / correction\n", 253 | "\n", 254 | "# ---------------------------------------------------------------------------------\n", 255 | "# Now let's make a fancy figure that just shows that they're near-identical...\n", 256 | "fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(7, 7), sharex=True, sharey=True)\n", 257 | "\n", 258 | "im1 = axes[0,0].imshow(dy, vmin=-100, vmax=100, cmap='coolwarm')\n", 259 | "axes[1,0].imshow(dx, vmin=-100, vmax=100, cmap='coolwarm')\n", 260 | "im2 = axes[2,0].imshow(gradient_magnitude, vmin=0, vmax=200, cmap='gray_r')\n", 261 | "\n", 262 | "axes[0,1].imshow(sobel_dy, vmin=-100, vmax=100, cmap='coolwarm')\n", 263 | "axes[1,1].imshow(sobel_dx, vmin=-100, vmax=100, cmap='coolwarm')\n", 264 | "axes[2,1].imshow(sobel_grad_mag, vmin=0, vmax=200, cmap='gray_r')\n", 265 | "\n", 266 | "for ax in axes.flat:\n", 267 | " ax.set(xticks=[], yticks=[])\n", 268 | "axes[0,0].set(title='np.gradient')\n", 269 | "axes[0,1].set(title='scipy.ndimage.sobel')\n", 270 | "\n", 271 | "# Right hand colorbar\n", 272 | "cax = fig.add_axes([0.85, 0.3, 0.02, 0.4])\n", 273 | "cbar = fig.colorbar(im1, cax=cax, label='Gradient')\n", 274 | "cbar.ax.yaxis.set_label_position('left')\n", 275 | "\n", 276 | "# Left hand colorbar\n", 277 | "cax = fig.add_axes([0.08, 0.3, 0.02, 0.4])\n", 278 | "cbar = fig.colorbar(im2, cax=cax, label='Gradient Magnitude')\n", 279 | "cbar.ax.yaxis.tick_left()\n", 280 | "cbar.ax.yaxis.set_label_position('right')\n", 281 | "cbar.ax.yaxis.label.set(rotation=-90, va='bottom')\n", 282 | "\n", 283 | "fig.subplots_adjust(left=0.15, right=0.8)\n", 284 | "plt.show()\n" 285 | ] 286 | }, 287 | { 288 | "cell_type": "markdown", 289 | "metadata": {}, 290 | "source": [ 291 | "They're effectively identical for a lot of use cases, apart from one being 8x larger than the other. However, note the slight differences even after we correct for the factor-of-8 difference between the two.\n", 292 | "\n", 293 | "Remember filters and kernels? This is a type of filter, and the difference is because the `np.gradient` kernel in the x-direction is basically `[[-0.5, 0, 0.5]]`, and the Sobel kernel in the x-direction is:\n", 294 | " \n", 295 | " [[-1, 0, 1],\n", 296 | " [-2, 0, 2],\n", 297 | " [-1, 0, 1]]\n", 298 | " \n", 299 | "In practice, the only difference is scaling (factor of 8) and a bit more averaging over nearby pixels. They're essentially the same thing.\n", 300 | "\n", 301 | "Regardless, use `np.gradient` when you care about the real-world units (e.g. a slope calculation) and use a Sobel filter (or other image processing edge filter) when you only care about relative differences.\n", 302 | "\n", 303 | "Okay, we're spending a lot of time on gradients, but let's briefly cover one more. For many images, it's useful to have a slightly smoother gradient. You could blur the input image slightly and then take the gradient, but you can accomplish the same thing in fewer steps with a single filter. A common smoothing/blurring filter is a Gaussian filter. 
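For reference, the 1-D Gaussian and its derivative are

$$G(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2 / (2\sigma^2)}, \qquad G'(x) = -\frac{x}{\sigma^2}\, G(x)$$

and convolving with $G'$ smooths and differentiates in one pass.
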
You can use the derivative of a Gaussian function as the filter kernel to calculate the gradient of a Gaussian-smoothed image in a single step:" 304 | ] 305 | }, 306 | { 307 | "cell_type": "code", 308 | "execution_count": null, 309 | "metadata": {}, 310 | "outputs": [], 311 | "source": [ 312 | "sigma = 5\n", 313 | "gauss_dy = scipy.ndimage.gaussian_filter1d(bathy.astype(int), sigma, axis=0, order=1)\n", 314 | "gauss_dx = scipy.ndimage.gaussian_filter1d(bathy.astype(int), sigma, axis=1, order=1)\n", 315 | "gauss_grad_mag = scipy.ndimage.gaussian_gradient_magnitude(bathy.astype(int), sigma)\n", 316 | "\n", 317 | "# ---------------------------------------------------------------------------------\n", 318 | "# Now let's make a fancy figure that just shows that they're near-identical...\n", 319 | "fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(7, 7), sharex=True, sharey=True)\n", 320 | "\n", 321 | "axes[0,0].imshow(dy, vmin=-100, vmax=100, cmap='coolwarm')\n", 322 | "axes[1,0].imshow(dx, vmin=-100, vmax=100, cmap='coolwarm')\n", 323 | "axes[2,0].imshow(gradient_magnitude, vmin=0, vmax=200, cmap='gray_r')\n", 324 | "\n", 325 | "im_dy = axes[0,1].imshow(gauss_dy, vmin=-100, vmax=100, cmap='coolwarm')\n", 326 | "im_dx = axes[1,1].imshow(gauss_dx, vmin=-100, vmax=100, cmap='coolwarm')\n", 327 | "im_mag = axes[2,1].imshow(gauss_grad_mag, vmin=0, vmax=200, cmap='gray_r')\n", 328 | "\n", 329 | "# Customize ticks and labels\n", 330 | "for ax in axes.flat:\n", 331 | " ax.set(xticks=[], yticks=[])\n", 332 | "axes[0,0].set(title='np.gradient')\n", 333 | "axes[0,1].set(title='Gaussian')\n", 334 | "\n", 335 | "# Right hand colorbar\n", 336 | "cax = fig.add_axes([0.85, 0.3, 0.02, 0.4])\n", 337 | "cbar = fig.colorbar(im_dx, cax=cax, label='Gradient')\n", 338 | "cbar.ax.yaxis.set_label_position('left')\n", 339 | "\n", 340 | "# Left hand colorbar\n", 341 | "cax = fig.add_axes([0.08, 0.3, 0.02, 0.4])\n", 342 | "cbar = fig.colorbar(im_mag, cax=cax, label='Gradient Magnitude')\n", 343 | "cbar.ax.yaxis.tick_left()\n", 344 | "cbar.ax.yaxis.set_label_position('right')\n", 345 | "cbar.ax.yaxis.label.set(rotation=-90, va='bottom')\n", 346 | "\n", 347 | "fig.subplots_adjust(left=0.15, right=0.8)\n", 348 | "\n", 349 | "plt.show()\n" 350 | ] 351 | }, 352 | { 353 | "cell_type": "markdown", 354 | "metadata": {}, 355 | "source": [ 356 | "`sigma` is analogous to the standard deviation of a normal distribution's \"bell curve\". The units are pixels, but the kernel in the filter is larger than `sigma`. Regardless, increasing it leads to more smoothing. To get a sense of how the result varies with `sigma`, here's a quick interactive plot:" 357 | ] 358 | }, 359 | { 360 | "cell_type": "code", 361 | "execution_count": null, 362 | "metadata": {}, 363 | "outputs": [], 364 | "source": [ 365 | "fig, ax = plt.subplots(constrained_layout=True)\n", 366 | "im = ax.imshow(gauss_grad_mag, vmin=0, vmax=200, cmap='gray_r')\n", 367 | "ax.set(xticks=[], yticks=[])\n", 368 | "\n", 369 | "def update(sigma):\n", 370 | " if sigma == 0:\n", 371 | " grad = gradient_magnitude\n", 372 | " else:\n", 373 | " grad = scipy.ndimage.gaussian_gradient_magnitude(bathy.astype(float), sigma)\n", 374 | " im.set_data(grad)\n", 375 | "\n", 376 | "# Note: This slider is fixed to integer intervals, but `sigma` doesn't have to be\n", 377 | "utils.Slider(ax, 0, 20, update, start=0).show()" 378 | ] 379 | }, 380 | { 381 | "cell_type": "markdown", 382 | "metadata": {}, 383 | "source": [ 384 | "As you can see, this is great for identifying large scale changes. 
Edge filters tend to enhance noise, and a Gaussian gradient magnitude filter gives you a flexible way of suppressing some of it. It's often applied to photographs and other \"non-smooth\" image data for that reason. We'll come back to it later, but for now, let's start to actually use these edge detection filters for something...\n", 385 | "\n", 386 | "### Finding the Toe of Slope\n", 387 | "\n", 388 | "We've spent _a lot_ of time on seamounts so far, but let's do one more example with them. Each seamount is surrounded by a talus cone that's much larger than the seamount itself. It's often useful to be able to identify the subtle end of this sort of feature. In common terms, we're looking for the toe of the slope. However, the same idea shows up in many other applications. Similarly, we might want to find the base of the steep cliffs surrounding the seamounts, regardless of what exact depth it's at.\n", 389 | "\n", 390 | "Gradients are first derivatives. What about second derivatives? Let's look at curvature. We'll only explore one filter for curvature: a Gaussian Laplace filter. First derivatives are noisy, but second derivatives are even more so. As a result, the noise suppression inherent in the Gaussian derivatives we just talked about is _very_ useful when working with second derivatives.\n", 391 | "\n", 392 | "Why second derivatives? They're good at finding maximum convexity or maximum concavity, which is what we're looking for in a \"toe of slope\" or \"base of cliff\" measurement." 393 | ] 394 | }, 395 | { 396 | "cell_type": "code", 397 | "execution_count": null, 398 | "metadata": {}, 399 | "outputs": [], 400 | "source": [ 401 | "sigma = 5\n", 402 | "gauss_laplace = scipy.ndimage.gaussian_laplace(bathy.astype(float), sigma)\n", 403 | "\n", 404 | "fig, ax = plt.subplots()\n", 405 | "\n", 406 | "im = ax.imshow(gauss_laplace, cmap='coolwarm', vmin=-10, vmax=10, label='Laplace')\n", 407 | "ax.imshow(rgb, extent=im.get_extent(), zorder=-1) # Re-using our hillshaded plot earlier...\n", 408 | "\n", 409 | "ax.set(xticks=[], yticks=[])\n", 410 | "utils.Toggler(im).show()" 411 | ] 412 | }, 413 | { 414 | "cell_type": "code", 415 | "execution_count": null, 416 | "metadata": {}, 417 | "outputs": [], 418 | "source": [ 419 | "# Quickly re-run our seamount detection and buffer it by 5 pixels\n", 420 | "background = scipy.ndimage.uniform_filter(bathy, 120)\n", 421 | "seamounts = bathy > (background + 500)\n", 422 | "seamounts = scipy.ndimage.binary_dilation(seamounts, iterations=5)\n", 423 | "\n", 424 | "# We want only convex edges, not concave (Negative values are concave)\n", 425 | "convexity_thresh = 2\n", 426 | "convex = gauss_laplace > convexity_thresh\n", 427 | "convex[seamounts] = False\n", 428 | "\n", 429 | "fig, ax = plt.subplots(constrained_layout=True)\n", 430 | "im = ax.imshow(np.ma.masked_equal(convex, 0), vmin=0, interpolation='nearest')\n", 431 | "ax.imshow(rgb, extent=im.get_extent(), zorder=-1)\n", 432 | "ax.set(xticks=[], yticks=[])\n", 433 | "plt.show()" 434 | ] 435 | }, 436 | { 437 | "cell_type": "markdown", 438 | "metadata": {}, 439 | "source": [ 440 | "Now let's turn those regions into more discrete lines. \n", 441 | "\n", 442 | "To do this, we'll use a \"skeletonization\" operation (a.k.a. a medial axis transform). This is also the first time we'll depart from `scipy.ndimage` and start diving into scikit-image. Up until now, all of these operations are N-dimensional and could be performed on volumes as well as 2D images. We'll start to depart from that now...",
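"\n",
"To see what skeletonization does in isolation, here's a tiny sketch (a hypothetical array, just for illustration): a solid block of `True` pixels collapses to a roughly one-pixel-wide centerline.\n",
"\n",
"```\n",
"import numpy as np\n",
"import skimage.morphology\n",
"\n",
"block = np.zeros((7, 7), dtype=bool)\n",
"block[1:6, 2:5] = True  # a 5x3 solid rectangle\n",
"print(skimage.morphology.skeletonize(block).astype(int))\n",
"```"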
443 | ] 444 | }, 445 | { 446 | "cell_type": "code", 447 | "execution_count": null, 448 | "metadata": {}, 449 | "outputs": [], 450 | "source": [ 451 | "import skimage.morphology\n", 452 | "\n", 453 | "linear_toe_of_slope = skimage.morphology.skeletonize(convex)\n", 454 | "\n", 455 | "fig, ax = plt.subplots(constrained_layout=True)\n", 456 | "im = ax.imshow(np.ma.masked_equal(linear_toe_of_slope, 0), \n", 457 | "               vmin=0, interpolation='nearest')\n", 458 | "ax.imshow(rgb, extent=im.get_extent(), zorder=-1)\n", 459 | "ax.set(xticks=[], yticks=[])\n", 460 | "plt.show()" 461 | ] 462 | }, 463 | { 464 | "cell_type": "markdown", 465 | "metadata": {}, 466 | "source": [ 467 | "Hey, not bad! It can definitely be improved upon, but for a few minutes' work, we've done a reasonably good job of defining something that's actually fairly tricky to get right.\n", 468 | "\n", 469 | "### Take Home Challenge\n", 470 | "\n", 471 | "Let's keep this one simple and hard all at the same time... **Can you tweak the parameters/methods we used to find the toe of slope to do a better job?** What parameters would you try adjusting first? Can you get a more continuous result? Can you exclude false positives away from seamounts?" 472 | ] 473 | }, 474 | { 475 | "cell_type": "markdown", 476 | "metadata": {}, 477 | "source": [ 478 | "### Next Section\n", 479 | "\n", 480 | "We'll look at lineaments and grain orientation analysis next:\n", 481 | "[03 - Orientation Analysis](./03%20-%20Orientation%20Analysis.ipynb)" 482 | ] 483 | } 484 | ], 485 | "metadata": { 486 | "kernelspec": { 487 | "display_name": "Python 3", 488 | "language": "python", 489 | "name": "python3" 490 | }, 491 | "language_info": { 492 | "codemirror_mode": { 493 | "name": "ipython", 494 | "version": 3 495 | }, 496 | "file_extension": ".py", 497 | "mimetype": "text/x-python", 498 | "name": "python", 499 | "nbconvert_exporter": "python", 500 | "pygments_lexer": "ipython3", 501 | "version": "3.8.3" 502 | } 503 | }, 504 | "nbformat": 4, 505 | "nbformat_minor": 4 506 | } 507 | -------------------------------------------------------------------------------- /03 - Orientation Analysis.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Lineament Analysis\n", 8 | "\n", 9 | "Let's take what we've learned about edge filters and start to do something more interesting...\n", 10 | "\n", 11 | "Many of us have spent time trying to interpret, digitize, and analyze orientations of linear features that are visible in data of some sort. There are very good reasons why it's often done by hand, but sometimes it's possible to automate the process.\n", 12 | "\n", 13 | "Let's take a look at some aerial photography data from near Arches National Park in Utah, USA. There are prominent linear features (joints/fractures, in this case) visible in the imagery. 
Ideally, we want a rose diagram of them without manually digitizing each one:" 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": null, 19 | "metadata": {}, 20 | "outputs": [], 21 | "source": [ 22 | "%matplotlib notebook\n", 23 | "import numpy as np\n", 24 | "import matplotlib.pyplot as plt\n", 25 | "import rasterio as rio\n", 26 | "import scipy.ndimage\n", 27 | "\n", 28 | "from context import data\n", 29 | "from context import utils\n", 30 | "\n", 31 | "with rio.open(data.naip.lineaments, 'r') as src:\n", 32 | " aerial_photo = src.read()\n", 33 | "\n", 34 | "# Rasterio has a \"bands-on-first-axis\" convention, matplotlib/etc has\n", 35 | "# a \"bands-on-last-axis\" convention. Use moveaxis to switch between.\n", 36 | "aerial_photo = np.moveaxis(aerial_photo, 0, -1)\n", 37 | "\n", 38 | "fig, ax = plt.subplots(constrained_layout=True)\n", 39 | "ax.imshow(aerial_photo)\n", 40 | "ax.set(xticks=[], yticks=[])\n", 41 | "plt.show()" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "For this type of analysis, we often don't need color information. It's easiest to leave it out. We'll analyze grayscale data instead. Let's use a simple average of RGB values:" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": null, 54 | "metadata": {}, 55 | "outputs": [], 56 | "source": [ 57 | "gray_aerial = aerial_photo.astype(float).mean(axis=-1)\n", 58 | "\n", 59 | "fig, ax = plt.subplots(constrained_layout=True)\n", 60 | "ax.imshow(gray_aerial, cmap='gray')\n", 61 | "ax.set(xticks=[], yticks=[])\n", 62 | "plt.show()" 63 | ] 64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "metadata": {}, 68 | "source": [ 69 | "Our lineaments are basically edges. Remember that we said gradient magnitude is a type of edge detector? Let's go ahead and look at the raw gradient magnitude. We don't care about absolute values at all in this case, so let's use a Sobel filter.\n", 70 | "\n", 71 | "Last time we used `scipy.ndimage.generic_gradient_magnitude` and `scipy.ndimage.sobel`. However, that's a bit verbose and we don't care about the separate X and Y components, so let's use scikit-image's Sobel filter method instead, which is a bit simpler. We'll use the \"toggler\" again so that we can easily compare it to the original imagery." 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": null, 77 | "metadata": {}, 78 | "outputs": [], 79 | "source": [ 80 | "import skimage.filters\n", 81 | "\n", 82 | "im_grad_mag = skimage.filters.sobel(gray_aerial)\n", 83 | "\n", 84 | "fig, ax = plt.subplots(constrained_layout=True)\n", 85 | "ax.imshow(aerial_photo)\n", 86 | "im = ax.imshow(im_grad_mag, cmap='gray_r', label='Sobel', vmin=0, vmax=50)\n", 87 | "ax.set(xticks=[], yticks=[])\n", 88 | "utils.Toggler(im).show()" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "Oy... That's noisy... \n", 96 | "\n", 97 | "Thankfully, we just talked about gaussian gradient magnitude as a way of producing less noisy gradients. 
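As an aside, the gaussian gradient magnitude we're about to use can be assembled by hand from pieces we've already seen. Here's a sketch of the equivalent long-hand version (reusing `gray_aerial` from the cells above; `order=[1, 0]` differentiates along rows while smoothing along columns, and vice versa):

```
gy = scipy.ndimage.gaussian_filter(gray_aerial, 3, order=[1, 0])
gx = scipy.ndimage.gaussian_filter(gray_aerial, 3, order=[0, 1])

# This should closely match
# scipy.ndimage.gaussian_gradient_magnitude(gray_aerial, 3)
by_hand = np.hypot(gx, gy)
```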
Let's apply it:" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": null, 103 | "metadata": {}, 104 | "outputs": [], 105 | "source": [ 106 | "sigma = 3\n", 107 | "gauss_grad_mag = scipy.ndimage.gaussian_gradient_magnitude(gray_aerial, sigma)\n", 108 | "\n", 109 | "fig, ax = plt.subplots(constrained_layout=True)\n", 110 | "ax.imshow(aerial_photo)\n", 111 | "im1 = ax.imshow(im_grad_mag, cmap='gray_r', label='Sobel', vmin=0, vmax=50)\n", 112 | "im2 = ax.imshow(gauss_grad_mag, cmap='gray_r', label='Gauss', vmin=0, vmax=15)\n", 113 | "ax.set(xticks=[], yticks=[])\n", 114 | "utils.Toggler(im1, im2).show()" 115 | ] 116 | }, 117 | { 118 | "cell_type": "markdown", 119 | "metadata": {}, 120 | "source": [ 121 | "Okay, we have something now that we could think about using directly. Let's try thresholding the gaussian gradient magnitude." 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": null, 127 | "metadata": {}, 128 | "outputs": [], 129 | "source": [ 130 | "grad_thresh = gauss_grad_mag > 5\n", 131 | "\n", 132 | "fig, ax = plt.subplots(constrained_layout=True)\n", 133 | "ax.imshow(gray_aerial, cmap='gray')\n", 134 | "im = ax.imshow(np.ma.masked_equal(grad_thresh, 0), vmin=0,\n", 135 | "               interpolation='nearest')\n", 136 | "ax.set(xticks=[], yticks=[])\n", 137 | "\n", 138 | "def update(thresh):\n", 139 | "    grad_thresh = gauss_grad_mag > thresh\n", 140 | "    im.set_data(np.ma.masked_equal(grad_thresh, 0))\n", 141 | "\n", 142 | "utils.Slider(ax, 1, 10, update, start=3).show()" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": {}, 148 | "source": [ 149 | "You can fairly easily imagine skeletonizing the classification we just created to get the exact edges. It's the same idea as what we used to get a nice line at the toe of slope around seamounts earlier.\n", 150 | "\n", 151 | "You might even imagine trying to do a slightly better job of skeletonizing so that nearby \"ridges\" link up instead of remaining separate features.\n", 152 | "\n", 153 | "That's the basic idea behind the [Canny filter](https://en.wikipedia.org/wiki/Canny_edge_detector). \n", 154 | "\n", 155 | "It's essentially skeletonizing a thresholded gaussian gradient magnitude, but it tries to join up nearby features to give nice, continuous lines. Let's give it a try on this data:" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": null, 161 | "metadata": {}, 162 | "outputs": [], 163 | "source": [ 164 | "import skimage.feature\n", 165 | "\n", 166 | "canny = skimage.feature.canny(gray_aerial, sigma=3)\n", 167 | "\n", 168 | "fig, ax = plt.subplots(constrained_layout=True)\n", 169 | "ax.imshow(gray_aerial, cmap='gray')\n", 170 | "ax.imshow(np.ma.masked_equal(canny, 0), vmin=0, interpolation='nearest')\n", 171 | "ax.set(xticks=[], yticks=[])\n", 172 | "plt.show()" 173 | ] 174 | }, 175 | { 176 | "cell_type": "markdown", 177 | "metadata": {}, 178 | "source": [ 179 | "Okay! Now we're getting closer to identifying our lineaments. However, we've still got a lot of noise and a lot of small features we're not too interested in.\n", 180 | "\n", 181 | "Not only that, but how are we going to turn these into actual lineaments we can get an orientation of? 
If we vectorized this result, the line segments would mostly be curved and wouldn't point along the features we're interested in.\n", 182 | "\n", 183 | "Basically, we want to extract straight lines from these ridges.\n", 184 | "\n", 185 | "A good way to detect straight lines in an image is to use a [Hough Transform](https://en.wikipedia.org/wiki/Hough_transform). The basic idea is to progressively rotate the image and sum along rows of the rotated result, then add each summed version as a new column. Straight lines form a local peak in the resulting array. I'm not going to go over this in too much detail, though.\n", 186 | "\n", 187 | "The key thing to know about a Hough transform is that it gives an _infinite_ line, i.e. `y = Ax + B`. You don't get start and end points, which are usually what we're interested in.\n", 188 | "\n", 189 | "Therefore, there's a variant called a [probabilistic Hough transform](https://scikit-image.org/docs/dev/auto_examples/edges/plot_line_hough_transform.html#probabilistic-hough-transform) that attempts to identify likely straight line segments of a given length. It results in discrete line segments with specific locations. The length can be used as a \"knob\" to tune whether you're finding lots of small linear features or fewer large linear features. \n", 190 | "\n", 191 | "Sounds perfect for this task! Let's apply it to our Canny-filtered edges above:" 192 | ] 193 | }, 194 | { 195 | "cell_type": "code", 196 | "execution_count": null, 197 | "metadata": {}, 198 | "outputs": [], 199 | "source": [ 200 | "from skimage.transform import probabilistic_hough_line\n", 201 | "from matplotlib.collections import LineCollection\n", 202 | "\n", 203 | "gap_ratio = 0.12\n", 204 | "\n", 205 | "fig, ax = plt.subplots(constrained_layout=True)\n", 206 | "ax.imshow(aerial_photo)\n", 207 | "col = ax.add_collection(LineCollection([], color='yellow'))\n", 208 | "ax.set(xticks=[], yticks=[])\n", 209 | "\n", 210 | "def update(length):\n", 211 | "    lines = probabilistic_hough_line(canny, line_length=length, \n", 212 | "                                     line_gap=int(length*gap_ratio))\n", 213 | "    col.set_segments(lines)\n", 214 | "\n", 215 | "utils.Slider(ax, 5, 50, update, start=10).show()" 216 | ] 217 | }, 218 | { 219 | "cell_type": "markdown", 220 | "metadata": {}, 221 | "source": [ 222 | "No matter how we do this, we pick up some features we're not interested in, and leave out some that we are. Overall, it's reasonable if we're mostly interested in orientations. 
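One convention worth pinning down before we plot anything: how a segment becomes an azimuth. Image rows increase downward, so we flip the sign of `dy`, and azimuth is measured clockwise from north. A quick sanity check (the `segment_azimuth` helper is invented for illustration and isn't part of the notebook's code):

```
import numpy as np

def segment_azimuth(p0, p1):
    """Azimuth in degrees, clockwise from north ("up" the image)."""
    (x0, y0), (x1, y1) = p0, p1
    # Negate dy because image y increases downward; 90 - angle for azimuth.
    return np.degrees(np.pi / 2 - np.arctan2(-(y1 - y0), x1 - x0))

print(segment_azimuth((0, 0), (0, -1)))  # Straight "up" the image --> 0
print(segment_azimuth((0, 0), (1, 0)))   # Due east --> 90
```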
Let's show a rose diagram of the lineaments we identified:" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": null, 228 | "metadata": { 229 | "scrolled": false 230 | }, 231 | "outputs": [], 232 | "source": [ 233 | "lines = probabilistic_hough_line(canny, line_length=30, \n", 234 | "                                 line_gap=5)\n", 235 | "    \n", 236 | "# Calculate azimuth\n", 237 | "lines = np.array(lines)\n", 238 | "dx, dy = np.squeeze(np.diff(lines, axis=1)).T\n", 239 | "# Negative dy due to image orientation, 90 - angle for azimuth\n", 240 | "angles = np.pi / 2 - np.arctan2(-dy, dx)\n", 241 | "\n", 242 | "fig = plt.figure(constrained_layout=True)\n", 243 | "ax1 = fig.add_subplot(2, 1, 1)\n", 244 | "ax2 = fig.add_subplot(2, 1, 2, projection='polar', theta_offset=np.pi/2,\n", 245 | "                      theta_direction=-1)\n", 246 | "ax1.imshow(aerial_photo)\n", 247 | "ax1.add_collection(LineCollection(lines, color='yellow'))\n", 248 | "ax2.hist(np.concatenate([angles, angles + np.pi]), bins=60)\n", 249 | "\n", 250 | "ax1.set(xticks=[], yticks=[])\n", 251 | "ax2.set(xticks=[], yticks=[], axisbelow=True)\n", 252 | "\n", 253 | "plt.show()" 254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "metadata": {}, 259 | "source": [ 260 | "Okay, this worked. However, it did a poor job of identifying many of the prominent lineaments. It also identified a lot of \"lineaments\" that are simply perpendicular to the sun direction. We should be able to do better.\n", 261 | "\n", 262 | "One of the ways we can identify lineaments is to look for _consistent directions_ of image gradients. Rather than just looking at the magnitudes, let's take direction into account as well.\n", 263 | "\n", 264 | "A common and very useful technique is the [structure tensor](https://en.wikipedia.org/wiki/Structure_tensor). The idea should be familiar to most folks who've worked on orientation statistics. We take the gradient vectors of the image within a moving window for each pixel. Then we build a 2x2 covariance matrix from the dx, dy components of those gradient vectors. This is the structure tensor -- it's a symmetric 2x2 matrix for each pixel in the image, based on the covariance of nearby gradients.\n", 265 | "\n", 266 | "In other words, it tells us how aligned image gradients are within each region of the image as well as how large they are.\n", 267 | "\n", 268 | "Let's dive right into a fairly large standalone example to understand what's going on." 269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": null, 274 | "metadata": { 275 | "scrolled": false 276 | }, 277 | "outputs": [], 278 | "source": [ 279 | "import numpy as np\n", 280 | "import matplotlib.pyplot as plt\n", 281 | "import rasterio as rio\n", 282 | "\n", 283 | "from skimage.feature import structure_tensor, structure_tensor_eigvals\n", 284 | "\n", 285 | "from context import data\n", 286 | "\n", 287 | "with rio.open(data.naip.lineaments, 'r') as src:\n", 288 | "    image = src.read()\n", 289 | "\n", 290 | "# This assumes a grayscale image. For simplicity, we'll just use RGB mean.\n", 291 | "data = image.astype(float).mean(axis=0)\n", 292 | "\n", 293 | "# Compute the structure tensor. This is basically local gradient similarity.\n", 294 | "# We're getting three components at each pixel that correspond to a 2x2\n", 295 | "# symmetric matrix. i.e. 
[[axx, axy],[axy, ayy]]\n", 296 | "axx, axy, ayy = structure_tensor(data, sigma=2.5, mode='mirror')\n", 297 | "\n", 298 | "# Then we'll compute the eigenvalues of that matrix.\n", 299 | "v1, v2 = structure_tensor_eigvals(axx, axy, ayy)\n", 300 | "\n", 301 | "# And calculate the eigenvector corresponding to the largest eigenvalue.\n", 302 | "dx, dy = v1 - axx, -axy\n", 303 | "\n", 304 | "# We have a vector at each pixel now. However, we don't really care about all\n", 305 | "# of them, only those with a large magnitude. Also, we don't need to worry\n", 306 | "# about every pixel, as adjacent values are very highly correlated. Therefore,\n", 307 | "# let's only consider every 10th pixel in each direction.\n", 308 | "\n", 309 | "# Top 10% of magnitudes (above the 90th percentile)\n", 310 | "mag = np.hypot(dx, dy)\n", 311 | "selection = mag > np.percentile(mag, 90)\n", 312 | "\n", 313 | "# Every 10th pixel (skipping left edge due to boundary effects)\n", 314 | "ds = np.zeros_like(selection)\n", 315 | "ds[::10, 10::10] = True\n", 316 | "selection = ds & selection\n", 317 | "\n", 318 | "\n", 319 | "# Now we'll visualize the selected (large) structure tensor directions both\n", 320 | "# superimposed on the image and as a rose diagram...\n", 321 | "fig = plt.figure(constrained_layout=True)\n", 322 | "ax1 = fig.add_subplot(2, 1, 1)\n", 323 | "ax2 = fig.add_subplot(2, 1, 2, projection='polar', theta_offset=np.pi/2,\n", 324 | "                      theta_direction=-1)\n", 325 | "\n", 326 | "ax1.imshow(np.moveaxis(image, 0, -1))\n", 327 | "\n", 328 | "y, x = np.mgrid[:dx.shape[0], :dx.shape[1]]\n", 329 | "\n", 330 | "no_arrow = dict(headwidth=0, headlength=0, headaxislength=0)\n", 331 | "ax1.quiver(x[selection], y[selection], dx[selection], dy[selection],\n", 332 | "           angles='xy', units='xy', pivot='middle', color='red', **no_arrow)\n", 333 | "\n", 334 | "\n", 335 | "# We actually want to be perpendicular to the direction of change, i.e.\n", 336 | "# we want to point _along_ the lineament. Therefore we'll subtract 90 degrees.\n", 337 | "# (Could have just gotten the direction of the smaller eigenvector, but we\n", 338 | "# need to base the magnitude on the largest eigenvector.)\n", 339 | "angle = np.arctan2(dy[selection], dx[selection]) - np.pi/2\n", 340 | "ax2.hist(np.concatenate([angle.ravel(), angle.ravel() + np.pi]), bins=120)\n", 341 | "\n", 342 | "ax1.set(xticks=[], yticks=[])\n", 343 | "ax2.set(xticks=[], yticks=[], axisbelow=True)\n", 344 | "plt.show()" 345 | ] 346 | }, 347 | { 348 | "cell_type": "markdown", 349 | "metadata": {}, 350 | "source": [ 351 | "Thin Section Grain Analysis\n", 352 | "-----------------------------------------\n", 353 | "\n", 354 | "We've done a lot so far with large-scale data. A key reason to know image processing methods, though, is that they apply at all scales. Let's shift gears and spend a bit of time working with photomicrographs. \n", 355 | "\n", 356 | "In this case, we'll try to measure the shape preferred orientation of mineral grains in a rock. We won't go too deeply into the specifics; let's just try to get an idea of the distribution of grain orientations in our sample. \n", 357 | "\n", 358 | "If you're not familiar with optical petrology, we're looking at a slide with a very thin (~30 microns) slice of rock attached to it. It's common to use both plane polarized light and cross polarized light. 
You can compare the change in appearance in the figure below:" 359 | ] 360 | }, 361 | { 362 | "cell_type": "code", 363 | "execution_count": null, 364 | "metadata": {}, 365 | "outputs": [], 366 | "source": [ 367 | "%matplotlib notebook\n", 368 | "import numpy as np\n", 369 | "import matplotlib.pyplot as plt\n", 370 | "import rasterio as rio\n", 371 | "import skimage.io\n", 372 | "\n", 373 | "from context import data\n", 374 | "from context import utils\n", 375 | "\n", 376 | "xpl_rgb = skimage.io.imread(data.bgs_rock.amphibolite_xpl)\n", 377 | "ppl_rgb = skimage.io.imread(data.bgs_rock.amphibolite_ppl)\n", 378 | "\n", 379 | "fig, ax = plt.subplots(constrained_layout=True)\n", 380 | "im_ppl = ax.imshow(ppl_rgb, label='Plane Polarized')\n", 381 | "im_xpl = ax.imshow(xpl_rgb, label='Cross Polarized')\n", 382 | "ax.set(xticks=[], yticks=[])\n", 383 | "utils.Toggler(im_xpl).show()" 384 | ] 385 | }, 386 | { 387 | "cell_type": "markdown", 388 | "metadata": {}, 389 | "source": [ 390 | "Here's the (poor) 40,000 foot (yes, I know... Units...) overview: \n", 391 | "\n", 392 | "Minerals with a high degree of birefringence (controlled by anisotropy in the speed of light through the crystal) can appear in a range of \"gaudy\" colors under cross polars. The color depends on the orientation of the crystal, so an individual color doesn't tell us much. The range of colors for a certain mineral is a useful indicator, though. \n", 393 | "\n", 394 | "In this case, we don't really care what the colors mean; we only want to use color to distinguish one grain from an adjacent grain. We want to identify distinct grains because we're interested in the shape of each one.\n", 395 | "\n", 396 | "If you look closely, you'll see that the colors within a grain aren't constant. They vary a bit. There are also lots of small mineral grains and inclusions within larger grains that we don't need to worry about as much for this analysis. Ideally we'd try to separate them, but it's okay if we don't. We mostly want to look at the orientation of the largest grains.\n", 397 | "\n", 398 | "Okay. So we want to identify distinct grains. In image processing terms, we want to segment the image. We'll do this by finding regions with similar colors. These \"regions of a similar color\" are often referred to as \"superpixels\". There is a _huge_ variety of methods to do this. Some use gradients to define \"watersheds\" of similar color, some use clustering, and you can even apply more flexible methods like a trained CNN.\n", 399 | "\n", 400 | "Let's apply a fairly well-known and widely used \"superpixel\" method: Simple Linear Iterative Clustering (SLIC). It's based on [K-means clustering](https://en.wikipedia.org/wiki/K-means_clustering), which is a common method to find groups of similar data. In this case, it's clustering spatially as well as in color-space. 
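If k-means itself is new to you, here's the core idea on toy 2D data (a standalone sketch using scipy's built-in `kmeans2` so we don't add any dependencies; in SLIC each point would be 5-dimensional rather than 2):

```
import numpy as np
from scipy.cluster.vq import kmeans2

# Two well-separated synthetic "color blobs" in 2D.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0, 0.5, (50, 2)),
                    rng.normal(4, 0.5, (50, 2))])

# k-means alternates between assigning points to the nearest centroid and
# recomputing the centroids until the assignments stop changing.
centroids, labels = kmeans2(points, 2, minit='++')
print(centroids)  # Should land near (0, 0) and (4, 4), in either order
```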
It's actually using X,Y coordinates directly in the clustering and working in 5 dimensions (RGB + XY).\n", 401 | "\n", 402 | "To make things a bit easier on ourselves, let's just work with the center of the image (this avoids needing to separate out the background that's not part of the thin section):" 403 | ] 404 | }, 405 | { 406 | "cell_type": "code", 407 | "execution_count": null, 408 | "metadata": { 409 | "scrolled": false 410 | }, 411 | "outputs": [], 412 | "source": [ 413 | "xpl_rgb = xpl_rgb[500:3000, 1000:4000, :]\n", 414 | "\n", 415 | "fig, ax = plt.subplots(constrained_layout=True)\n", 416 | "ax.imshow(xpl_rgb)\n", 417 | "ax.set(xticks=[], yticks=[])\n", 418 | "plt.show()" 419 | ] 420 | }, 421 | { 422 | "cell_type": "markdown", 423 | "metadata": {}, 424 | "source": [ 425 | "Now let's segment the image using SLIC and play around with the parameters a bit..." 426 | ] 427 | }, 428 | { 429 | "cell_type": "code", 430 | "execution_count": null, 431 | "metadata": {}, 432 | "outputs": [], 433 | "source": [ 434 | "import skimage.segmentation\n", 435 | "\n", 436 | "grains = skimage.segmentation.slic(xpl_rgb, sigma=0.5, multichannel=True,\n", 437 | "                                   n_segments=1500, compactness=0.1)\n", 438 | "\n", 439 | "# It's hard to color each grain with a unique color, so we'll show boundaries\n", 440 | "# in yellow instead of coloring them like we did before.\n", 441 | "overlay = skimage.segmentation.mark_boundaries(xpl_rgb, grains)\n", 442 | "\n", 443 | "fig, ax = plt.subplots(constrained_layout=True)\n", 444 | "ax.imshow(overlay)\n", 445 | "plt.show()" 446 | ] 447 | }, 448 | { 449 | "cell_type": "markdown", 450 | "metadata": {}, 451 | "source": [ 452 | "Here's a more complete example where we extract properties (such as long axis and short axis) of each segmented region to look at the distribution of grain orientations:" 453 | ] 454 | }, 455 | { 456 | "cell_type": "code", 457 | "execution_count": null, 458 | "metadata": {}, 459 | "outputs": [], 460 | "source": [ 461 | "import numpy as np\n", 462 | "import matplotlib.pyplot as plt\n", 463 | "import skimage.io\n", 464 | "import skimage.segmentation\n", 465 | "import skimage.measure\n", 466 | "import scipy.ndimage\n", 467 | "\n", 468 | "from context import data\n", 469 | "from context import utils\n", 470 | "\n", 471 | "# Amphibolite under cross polars (from BGS, see data/bgs_rock/README.md)\n", 472 | "xpl_rgb = skimage.io.imread(data.bgs_rock.amphibolite_xpl)\n", 473 | "\n", 474 | "# Let's use the center of the image to avoid needing to worry about the edges.\n", 475 | "xpl_rgb = xpl_rgb[500:3000, 1000:4000, :]\n", 476 | "\n", 477 | "# This attempts to group locally similar colors. It's kmeans in 5 dimensions\n", 478 | "# (RGB + XY). 
n_segments and compactness are the main \"knobs\" to turn.\n", 479 | "grains = skimage.segmentation.slic(xpl_rgb, sigma=0.5, multichannel=True,\n", 480 | "                                   n_segments=1500, compactness=0.1)\n", 481 | "\n", 482 | "# It's hard to color each grain with a unique color, so we'll show boundaries\n", 483 | "# in yellow instead of coloring them like we did before.\n", 484 | "overlay = skimage.segmentation.mark_boundaries(xpl_rgb, grains)\n", 485 | "\n", 486 | "# Now let's extract information about each individual grain we've classified.\n", 487 | "# In this case, we're only interested in orientation, but there's a lot more\n", 488 | "# we could extract.\n", 489 | "info = skimage.measure.regionprops(grains)\n", 490 | "\n", 491 | "# And calculate the orientation of the long axis of each grain...\n", 492 | "angles = []\n", 493 | "for item in info:\n", 494 | "    cov = item['inertia_tensor']\n", 495 | "    azi = np.degrees(np.arctan2((-2 * cov[0, 1]), (cov[0,0] - cov[1,1])))\n", 496 | "    angles.append(azi)\n", 497 | "\n", 498 | "# Make bidirectional (quick hack for plotting)\n", 499 | "angles = angles + [x + 180 for x in angles]\n", 500 | "\n", 501 | "# Now display the segmentation and a rose diagram\n", 502 | "fig = plt.figure(constrained_layout=True)\n", 503 | "ax1 = fig.add_subplot(1, 2, 1)\n", 504 | "ax2 = fig.add_subplot(1, 2, 2, projection='polar', theta_offset=np.pi/2,\n", 505 | "                      theta_direction=-1)\n", 506 | "ax1.imshow(overlay)\n", 507 | "ax2.hist(np.radians(angles), bins=60)\n", 508 | "\n", 509 | "ax1.set(xticks=[], yticks=[])\n", 510 | "ax2.set(xticklabels=[], yticklabels=[], axisbelow=True)\n", 511 | "plt.show()\n" 512 | ] 513 | }, 514 | { 515 | "cell_type": "markdown", 516 | "metadata": {}, 517 | "source": [ 518 | "### Wrapping it all up\n", 519 | "\n", 520 | "That's all for now, folks! Hopefully you can see ways to apply some of this to problems you're actively working on. There are a ton of very powerful methods exposed in common python image processing libraries, and it's easy to get a bit lost in the huge variety of options. With luck, this has given you enough of an understanding of common methods to start exploring on your own. There's a lot out there -- have fun with it!" 521 | ] 522 | } 523 | ], 524 | "metadata": { 525 | "kernelspec": { 526 | "display_name": "Python 3", 527 | "language": "python", 528 | "name": "python3" 529 | }, 530 | "language_info": { 531 | "codemirror_mode": { 532 | "name": "ipython", 533 | "version": 3 534 | }, 535 | "file_extension": ".py", 536 | "mimetype": "text/x-python", 537 | "name": "python", 538 | "nbconvert_exporter": "python", 539 | "pygments_lexer": "ipython3", 540 | "version": "3.8.3" 541 | } 542 | }, 543 | "nbformat": 4, 544 | "nbformat_minor": 4 545 | } 546 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) Joe Kington 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of 4 | this software and associated documentation files (the "Software"), to deal in 5 | the Software without restriction, including without limitation the rights to 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies 7 | of the Software, and to permit persons to whom the Software is furnished to do 8 | so, subject to the following conditions: 9 | 10 | The above copyright notice and this permission notice shall be included in all 11 | copies or substantial portions of the Software. 
12 | 13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 19 | SOFTWARE. 20 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Geologic Image Processing Tutorial - Transform 2020 2 | -------------------------------------------------- 3 | 4 | You can find a recording of this tutorial at: https://www.youtube.com/watch?v=3ZvRVB6Eeq4&feature=youtu.be 5 | 6 | This repository contains the material for the geologic image processing 7 | tutorial given on June 11th, 2020 at the Transform 2020 virtual 8 | conference. 9 | 10 | There's also an [upcoming workshop](https://www.nordicsrg.com/events) held by 11 | the [Nordic Sedimentary Research Group](https://www.nordicsrg.com/) that will 12 | use this tutorial. 13 | 14 | Binder Setup 15 | ------------ 16 | 17 | If you don't have a local python setup yet, you can run this tutorial in your 18 | browser by clicking 19 | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/joferkington/geo_image_processing_tutorial/master?filepath=01%20-%20Introduction.ipynb) 20 | 21 | However, there is a limit to how many people can use binder for this repo at 22 | any given time. If you're comfortable running things locally, consider 23 | following the instructions below. Getting a local python installation set up 24 | will also allow you to work with your own data. 25 | 26 | Conda Setup 27 | ----------- 28 | 29 | The easiest way to get a complete local installation is to use Anaconda. You 30 | can find an [overview and download 31 | link](https://www.anaconda.com/products/individual) on their main page as well 32 | as [more complete installation 33 | instructions](https://docs.anaconda.com/anaconda/install/). 34 | 35 | To create the conda environment for this tutorial run: 36 | 37 | ``` 38 | conda env create -f environment.yml 39 | ``` 40 | 41 | The environment is called `t20-thu-images` to match the Transform2020 slack 42 | channel and avoid conflicts with other tutorial's environment names. To switch 43 | to that environment, you'd use: 44 | 45 | ``` 46 | conda activate t20-thu-images 47 | ``` 48 | 49 | or select the environment when starting anaconda from the gui launcher. After 50 | that, you'd launch `jupyter notebook` and select the first notebook in this 51 | tutorial. 52 | 53 | Manual Setup 54 | ------------ 55 | 56 | Alternatively, the requirements for this are quite minimal, and you may already 57 | have what you need installed. This depends on: 58 | 59 | * rasterio 60 | * matplotlib 61 | * scipy 62 | * scikit-image 63 | * jupyter 64 | 65 | Any relatively recent version of the above libraries should be fine. We're not 66 | depending on any bleeding-edge functionality. In principle, these examples 67 | should work with python 2.7 as well as 3.5 or greater. However, things have 68 | not been tested extensively with python 2.7, and I'd recommend using python 3.5 69 | or greater if you're setting things up from scratch. 
70 | 71 | 72 | -------------------------------------------------------------------------------- /context.py: -------------------------------------------------------------------------------- 1 | import data 2 | from examples import utils 3 | -------------------------------------------------------------------------------- /data/__init__.py: -------------------------------------------------------------------------------- 1 | from . import naip 2 | from . import gebco 3 | from . import bgs_rock 4 | 5 | __all__ = ['naip', 'gebco', 'bgs_rock'] 6 | -------------------------------------------------------------------------------- /data/bgs_rock/N1495_ppl.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joferkington/geo_image_processing_tutorial/60880291e78c19f5daf65d1eda581ccbc6cc4299/data/bgs_rock/N1495_ppl.jpg -------------------------------------------------------------------------------- /data/bgs_rock/N1495_xpl.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joferkington/geo_image_processing_tutorial/60880291e78c19f5daf65d1eda581ccbc6cc4299/data/bgs_rock/N1495_xpl.jpg -------------------------------------------------------------------------------- /data/bgs_rock/README.md: -------------------------------------------------------------------------------- 1 | Amphibolite from Cairngorms, Scotland, UK under cross polarized (N1495_xpl.jpg) and plane polarized (N1495_ppl.jpg) light. 2 | 3 | Data from the British Geological Survey's mineralogy and petrology collection. 4 | Under an Open Government License: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/ 5 | 6 | Sample ID: N1495; COLLNOETS643; 7 | 8 | URL: https://www.bgs.ac.uk/data/britrocks/britrocks.cfc?method=viewSamples&sampleId=157565 9 | 10 | Location: https://gridreferencefinder.com/os.php?gr=NH9834036140|NH_s_9834_s_3614|1&t=NH%209834%203614&v=r 11 | -------------------------------------------------------------------------------- /data/bgs_rock/__init__.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | base = os.path.dirname(__file__) 4 | amphibolite_xpl = os.path.join(base, 'N1495_xpl.jpg') 5 | amphibolite_ppl = os.path.join(base, 'N1495_ppl.jpg') 6 | -------------------------------------------------------------------------------- /data/bgs_rock/download.sh: -------------------------------------------------------------------------------- 1 | #! 
/bin/sh 2 | # From https://www.bgs.ac.uk/data/britrocks/britrocks.cfc?method=viewSamples&sampleId=157565 3 | # Under Open Government License: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/ 4 | # Using gdal to retrieve data from IIP viewer 5 | gdal_translate 'IIP:http://www.largeimages.bgs.ac.uk/cgi-bin/iipsrv.fcgi?FIF=/opndata/Petrology_Images/ThinSections_JP2/290000/294119.jp2' N1495_xpl.jpg -of JPEG 6 | gdal_translate 'IIP:http://www.largeimages.bgs.ac.uk/cgi-bin/iipsrv.fcgi?FIF=/opndata/Petrology_Images/ThinSections_JP2/290000/294120.jp2' N1495_ppl.jpg -of JPEG 7 | -------------------------------------------------------------------------------- /data/gebco/__init__.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | base = os.path.dirname(__file__) 4 | seamounts = os.path.join(base, 'seamounts.tif') 5 | -------------------------------------------------------------------------------- /data/gebco/seamounts.tif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joferkington/geo_image_processing_tutorial/60880291e78c19f5daf65d1eda581ccbc6cc4299/data/gebco/seamounts.tif -------------------------------------------------------------------------------- /data/naip/README.md: -------------------------------------------------------------------------------- 1 | This is derived from the 2018 National Agriculture Imagery Program (NAIP) survey, accessed via https://raster.utah.gov . The area shown is near Moab, Utah, USA, just outside of Arches National Park (exact location information is embedded in the geotiff). 2 | 3 | NAIP data is a US government work and is therefore not subject to copyright within the United States. 4 | -------------------------------------------------------------------------------- /data/naip/__init__.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | base = os.path.dirname(__file__) 4 | lineaments = os.path.join(base, 'q2839_sw_NAIP2018_RGB_clipped.tif') 5 | -------------------------------------------------------------------------------- /data/naip/q2839_sw_NAIP2018_RGB_clipped.tif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joferkington/geo_image_processing_tutorial/60880291e78c19f5daf65d1eda581ccbc6cc4299/data/naip/q2839_sw_NAIP2018_RGB_clipped.tif -------------------------------------------------------------------------------- /environment.yml: -------------------------------------------------------------------------------- 1 | name: t20-thu-images 2 | channels: 3 | - conda-forge 4 | - defaults 5 | dependencies: 6 | - jupyter 7 | - matplotlib 8 | - rasterio 9 | - scipy 10 | - scikit-image 11 | - fiona 12 | - shapely 13 | -------------------------------------------------------------------------------- /examples/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joferkington/geo_image_processing_tutorial/60880291e78c19f5daf65d1eda581ccbc6cc4299/examples/__init__.py -------------------------------------------------------------------------------- /examples/combine_adjacent_seamounts.py: -------------------------------------------------------------------------------- 1 | """ 2 | Example of one possible solution to the "combine nearby seamounts into one" 3 | take-home question. 
4 | """ 5 | import numpy as np 6 | import matplotlib.pyplot as plt 7 | import rasterio as rio 8 | import scipy.ndimage 9 | 10 | from context import data 11 | from context import utils 12 | 13 | # Let's repeat some key steps of `seamount_detection.py` to separate seamounts 14 | with rio.open(data.gebco.seamounts, 'r') as src: 15 |     bathy = src.read(1) 16 |     cellsize = src.transform.a 17 | 18 | background = scipy.ndimage.uniform_filter(bathy, int(0.5 / cellsize)) 19 | threshold = bathy > (background + 500) 20 | 21 | cleaned = scipy.ndimage.median_filter(threshold, 15) 22 | orig_labels, orig_count = scipy.ndimage.label(cleaned) 23 | 24 | # Now let's try to combine any seamounts that are within 20 pixels of each other 25 | combined = scipy.ndimage.binary_closing(cleaned, iterations=20) 26 | 27 | # And we'll fill holes in the result, as we don't want any doughnuts 28 | filled = scipy.ndimage.binary_fill_holes(combined) 29 | 30 | # Separate into non-touching features 31 | final_labels, final_count = scipy.ndimage.label(filled) 32 | 33 | # Compare the differences... Note the "tails" connecting features that are just 34 | # barely within the threshold of each other. 35 | fig, ax = plt.subplots() 36 | ax.imshow(bathy, cmap='Blues_r', vmax=0, label='Bathymetry') 37 | ax.set(title=f'{orig_count} Seamounts Before Combining, {final_count} After') 38 | 39 | layers = [ 40 |     ax.imshow(np.ma.masked_equal(threshold, 0), label='Threshold'), 41 |     ax.imshow(np.ma.masked_equal(cleaned, 0), label='Cleaned'), 42 |     ax.imshow(np.ma.masked_equal(orig_labels, 0), cmap='tab20', 43 |               label='Orig Labels'), 44 |     ax.imshow(np.ma.masked_equal(combined, 0), label='Combined'), 45 |     ax.imshow(np.ma.masked_equal(final_labels, 0), cmap='tab20', 46 |               label='Final Labels'), 47 | ] 48 | 49 | ax.set(xticks=[], yticks=[]) 50 | fig.tight_layout() 51 | utils.Toggler(*layers).show() 52 | -------------------------------------------------------------------------------- /examples/context.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | 4 | proj_dir = os.path.join(os.path.dirname(__file__), '..') 5 | proj_dir = os.path.abspath(proj_dir) 6 | sys.path.insert(0, proj_dir) 7 | 8 | import data 9 | import utils 10 | -------------------------------------------------------------------------------- /examples/seamount_detection.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import matplotlib.pyplot as plt 3 | import rasterio as rio 4 | import scipy.ndimage 5 | 6 | from context import data 7 | from context import utils 8 | 9 | with rio.open(data.gebco.seamounts, 'r') as src: 10 |     bathy = src.read(1) 11 |     cellsize = src.transform.a  # Cells are square and N-S in this case 12 | 13 | # First let's try a simple threshold based on absolute depth 14 | # Depth in meters 15 | simple_threshold = bathy > -3500 16 | 17 | # Next let's try thresholding based on being more than 500m above a local 18 | # average within a 0.5 degree moving window. 
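# (Aside, added for clarity: `uniform_filter` is an N-D moving average, and
#  `int(0.5 / cellsize)` simply converts the 0.5 degree window into a width
#  in pixels. Any local-average filter would behave similarly here.)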
19 | background = scipy.ndimage.uniform_filter(bathy, int(0.5 / cellsize)) 20 | better_threshold = bathy > (background + 500) 21 | 22 | # And we'll apply some cleanup to the thresholded result 23 | cleaned = scipy.ndimage.median_filter(better_threshold, 15) 24 | filled = scipy.ndimage.binary_fill_holes(cleaned) 25 | labels, count = scipy.ndimage.label(filled) 26 | 27 | # And now let's compare all of those operations 28 | fig, ax = plt.subplots() 29 | ax.imshow(bathy, cmap='Blues_r', vmax=0, label='Bathymetry') 30 | 31 | layers = [ 32 | ax.imshow(np.ma.masked_equal(simple_threshold, 0), label='Simple'), 33 | ax.imshow(background, cmap='Blues_r', vmax=0, vmin=bathy.min(), 34 | label='Filtered Bathy'), 35 | ax.imshow(np.ma.masked_equal(better_threshold, 0), label='Better'), 36 | ax.imshow(np.ma.masked_equal(cleaned, 0), label='Cleaned'), 37 | ax.imshow(np.ma.masked_equal(filled, 0), label='Filled'), 38 | ax.imshow(np.ma.masked_equal(labels, 0), label='Labeled', cmap='tab20'), 39 | ] 40 | 41 | ax.set(xticks=[], yticks=[]) 42 | fig.tight_layout() 43 | utils.Toggler(*layers).show() 44 | 45 | # Now let's look at area distribution. In this projection (geographic) the area 46 | # of a pixel varies by latitude. 47 | i, j = np.mgrid[:bathy.shape[0], :bathy.shape[1]] 48 | with rio.open(data.gebco.seamounts, 'r') as src: 49 | lon, lat = src.xy(j, i) 50 | 51 | # In square km. ~111.32 is 1 degree in km at the equator 52 | area = (cellsize * 111.32)**2 * np.cos(np.radians(lat)) 53 | 54 | # Now we'll sum by zone in our labeled seamount array 55 | areas = scipy.ndimage.sum(area, labels, np.arange(count)+1) 56 | 57 | # And let's have a look at the distribution... 58 | fig, ax = plt.subplots() 59 | ax.hist(areas, bins=40) 60 | ax.set(xlabel='Area in $km^2$', ylabel='Number of seamounts') 61 | plt.show() 62 | -------------------------------------------------------------------------------- /examples/seamount_detection_saving.py: -------------------------------------------------------------------------------- 1 | """ 2 | Illustrates saving things back to a geotiff and vectorizing to a shapefile 3 | """ 4 | import numpy as np 5 | import matplotlib.pyplot as plt 6 | import rasterio as rio 7 | import rasterio.features 8 | import scipy.ndimage 9 | import fiona 10 | import shapely.geometry as geom 11 | 12 | from context import data 13 | from context import utils 14 | 15 | 16 | # First, let's reproduce the labeled array of seamounts and areas 17 | with rio.open(data.gebco.seamounts, 'r') as src: 18 | bathy = src.read(1) 19 | cellsize = src.transform.a # Cells are square and N-S in this case 20 | 21 | background = scipy.ndimage.uniform_filter(bathy, int(0.5 / cellsize)) 22 | better_threshold = bathy > (background + 500) 23 | cleaned = scipy.ndimage.median_filter(better_threshold, 15) 24 | filled = scipy.ndimage.binary_fill_holes(cleaned) 25 | labels, count = scipy.ndimage.label(filled) 26 | 27 | # ------ Save as a geotiff --------------------------------------------------- 28 | # Next, let's save the result as a geotiff. Because our data is the same size 29 | # as the original raster, it's quite straight-forward: 30 | 31 | # We'll copy over all settings from the original, but change two... 32 | with rio.open(data.gebco.seamounts, 'r') as src: 33 | profile = src.profile.copy() 34 | 35 | # Background features are 0, so we'll make that nodata/transparent. 36 | profile['nodata'] = 0 37 | profile['dtype'] = labels.dtype 38 | 39 | # And let's actually write out the new geotiff... 
40 | with rio.open('regions_flagged_as_seamounts.tif', 'w', **profile) as dst: 41 | dst.write(labels, 1) 42 | 43 | # ------ Save as a shapefile ------------------------------------------------- 44 | # Now let's vectorize the results and save them as a shapefile 45 | 46 | # Just to make things a bit more interesting, let's go ahead and calculate some 47 | # additional information to save in the shapefile's attribute table. 48 | deepest = scipy.ndimage.maximum(bathy, labels, np.arange(count) + 1) 49 | shallowest = scipy.ndimage.minimum(bathy, labels, np.arange(count) + 1) 50 | 51 | # We'll need the affine transformation and the projection to go from pixel 52 | # indices to actual locations. Let's grab those from the original geotiff. 53 | with rio.open(data.gebco.seamounts, 'r') as src: 54 | transform = src.transform 55 | crs = src.crs 56 | 57 | # Now let's specify our output shapefile's format... 58 | meta = {'crs': crs, 'schema': {}, 'driver': 'ESRI Shapefile'} 59 | meta['schema']['geometry'] = 'Polygon' 60 | # And now we'll define the fields in the attribute table 61 | meta['schema']['properties'] = {'raster_id': 'int', 62 | 'deepest': 'int', 63 | 'shallowest': 'int'} 64 | 65 | 66 | # We don't want the background 0 to be a feature, so let's mask it out. 67 | labels = np.ma.masked_equal(labels, 0) 68 | 69 | with fiona.open('regions_flagged_as_seamounts.shp', 'w', **meta) as dst: 70 | 71 | vectors = rio.features.shapes(labels, transform=transform, connectivity=8) 72 | for poly, val in vectors: 73 | val = int(val) # shapes returns a float, even when the input is ints. 74 | 75 | # The polygon we get here will have stairsteps along each pixel edge. 76 | # This part is optional, but it's often useful to simplify the geometry 77 | # instead of saving the full "stairstep" version. 78 | poly = geom.shape(poly).simplify(cellsize) 79 | poly = geom.mapping(poly) # Back to a dict 80 | 81 | record = {'geometry': poly, 82 | 'properties': {'deepest': int(deepest[val-1]), 83 | 'shallowest': int(shallowest[val-1]), 84 | 'raster_id': val}} 85 | dst.write(record) 86 | -------------------------------------------------------------------------------- /examples/structure_tensor.py: -------------------------------------------------------------------------------- 1 | """ 2 | Using a structure tensor for lineament analysis. 3 | """ 4 | import numpy as np 5 | import matplotlib.pyplot as plt 6 | import rasterio as rio 7 | 8 | from skimage.feature import structure_tensor, structure_tensor_eigvals 9 | 10 | from context import data 11 | 12 | with rio.open(data.naip.lineaments, 'r') as src: 13 | image = src.read() 14 | 15 | # This assumes a grayscale image. For simplicity, we'll just use RGB mean. 16 | data = image.astype(float).mean(axis=0) 17 | 18 | # Compute the structure tensor. This is basically local gradient similarity. 19 | # We're getting three components at each pixel that correspond to a 2x2 20 | # symmetric matrix. i.e. [[axx, axy],[axy, ayy]] 21 | axx, axy, ayy = structure_tensor(data, sigma=2.5, mode='mirror') 22 | 23 | # Then we'll compute the eigenvalues of that matrix. 24 | v1, v2 = structure_tensor_eigvals(axx, axy, ayy) 25 | 26 | # And calculate the eigenvector corresponding to the largest eigenvalue. 27 | dx, dy = v1 - axx, -axy 28 | 29 | # We have a vector at each pixel now. However, we don't really care about all 30 | # of them, only those with a large magnitude. Also, we don't need to worry 31 | # about every pixel, as adjacent values are very highly correlated. 
Therefore, 32 | # let's only consider every 10th pixel in each direction. 33 | 34 | # Top 10% of magnitudes (above the 90th percentile) 35 | mag = np.hypot(dx, dy) 36 | selection = mag > np.percentile(mag, 90) 37 | 38 | # Every 10th pixel (skipping left edge due to boundary effects) 39 | ds = np.zeros_like(selection) 40 | ds[::10, 10::10] = True 41 | selection = ds & selection 42 | 43 | 44 | # Now we'll visualize the selected (large) structure tensor directions both 45 | # superimposed on the image and as a rose diagram... 46 | fig = plt.figure(constrained_layout=True) 47 | ax1 = fig.add_subplot(2, 1, 1) 48 | ax2 = fig.add_subplot(2, 1, 2, projection='polar', theta_offset=np.pi/2, 49 |                       theta_direction=-1) 50 | 51 | ax1.imshow(np.moveaxis(image, 0, -1)) 52 | 53 | y, x = np.mgrid[:dx.shape[0], :dx.shape[1]] 54 | 55 | no_arrow = dict(headwidth=0, headlength=0, headaxislength=0) 56 | ax1.quiver(x[selection], y[selection], dx[selection], dy[selection], 57 |            angles='xy', units='xy', pivot='middle', color='red', **no_arrow) 58 | 59 | 60 | # We actually want to be perpendicular to the direction of change, i.e. 61 | # we want to point _along_ the lineament. Therefore we'll subtract 90 degrees. 62 | # (Could have just gotten the direction of the smaller eigenvector, but we 63 | # need to base the magnitude on the largest eigenvector.) 64 | angle = np.arctan2(dy[selection], dx[selection]) - np.pi/2 65 | angle = np.concatenate([angle, angle + np.pi])  # Make bidirectional 66 | ax2.hist(angle.ravel(), bins=120, edgecolor='C0', color='C0') 67 | 68 | ax1.set(xticks=[], yticks=[]) 69 | ax2.set(yticklabels=[], xticklabels=[], axisbelow=True) 70 | 71 | plt.show() 72 | -------------------------------------------------------------------------------- /examples/thin_section_grain_orientation.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import matplotlib.pyplot as plt 3 | import skimage.io 4 | import skimage.segmentation 5 | import skimage.measure 6 | import scipy.ndimage 7 | 8 | from context import data 9 | from context import utils 10 | 11 | # Amphibolite under cross polars (from BGS, see data/bgs_rock/README.md) 12 | xpl_rgb = skimage.io.imread(data.bgs_rock.amphibolite_xpl) 13 | 14 | # Let's use the center of the image to avoid needing to worry about the edges. 15 | xpl_rgb = xpl_rgb[500:3000, 1000:4000, :] 16 | 17 | # This attempts to group locally similar colors. It's kmeans in 5 dimensions 18 | # (RGB + XY). n_segments and compactness are the main "knobs" to turn. 19 | grains = skimage.segmentation.slic(xpl_rgb, sigma=0.5, multichannel=True, 20 |                                    n_segments=1500, compactness=0.1) 21 | 22 | # It's hard to color each grain with a unique color, so we'll show boundaries 23 | # in yellow instead of coloring them like we did before. 24 | overlay = skimage.segmentation.mark_boundaries(xpl_rgb, grains) 25 | 26 | # Now let's extract information about each individual grain we've classified. 27 | # In this case, we're only interested in orientation, but there's a lot more 28 | # we could extract. 29 | info = skimage.measure.regionprops(grains) 30 | 31 | # And calculate the orientation of the long axis of each grain... 
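# (A hedged note: the textbook principal-axis formula includes a factor of
#  1/2, i.e. theta = 0.5 * arctan2(2 * I_xy, I_xx - I_yy). The hand-rolled
#  version below omits that factor; regionprops also exposes an
#  'orientation' property that computes the major-axis angle directly and
#  makes a handy cross-check.)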
32 | angles = [] 33 | for item in info: 34 | cov = item['inertia_tensor'] 35 | azi = np.degrees(np.arctan2((-2 * cov[0, 1]), (cov[0,0] - cov[1,1]))) 36 | angles.append(azi) 37 | 38 | # Make bidirectional (quick hack for plotting) 39 | angles = angles + [x + 180 for x in angles] 40 | 41 | # Now display the segmentation and a rose diagram 42 | fig = plt.figure(constrained_layout=True) 43 | ax1 = fig.add_subplot(1, 2, 1) 44 | ax2 = fig.add_subplot(1, 2, 2, projection='polar', theta_offset=np.pi/2, 45 | theta_direction=-1) 46 | ax1.imshow(overlay) 47 | ax2.hist(np.radians(angles), bins=60) 48 | 49 | ax1.set(xticks=[], yticks=[]) 50 | ax2.set(xticklabels=[], yticklabels=[], axisbelow=True) 51 | plt.show() 52 | -------------------------------------------------------------------------------- /examples/utils.py: -------------------------------------------------------------------------------- 1 | import matplotlib.pyplot as plt 2 | import matplotlib as mpl 3 | import ipywidgets 4 | from matplotlib.colorbar import make_axes_gridspec 5 | import matplotlib.widgets 6 | 7 | 8 | class BaseWidget: 9 | 10 | def callback(self, value): 11 | raise NotImplementedError 12 | 13 | @property 14 | def is_native(self): 15 | """Whether or not we're working with a notebook backend.""" 16 | return mpl.get_backend().lower() not in ['nbagg', 'webagg', 'ipympl'] 17 | 18 | def show(self): 19 | """ 20 | Display the figure. Provided as a method to avoid the need to save a 21 | reference to this object. 22 | """ 23 | if self.is_native: 24 | plt.show() 25 | else: 26 | return ipywidgets.interact(self.callback, value=self.widget) 27 | 28 | def _make_room(self, ax, **kwargs): 29 | """Shrink parent axes and make a new axes for widgets.""" 30 | cax, _ = make_axes_gridspec(ax, **kwargs) 31 | cax.axis('off') 32 | cax.set(aspect=1) 33 | return cax 34 | 35 | 36 | 37 | class Toggler(BaseWidget): 38 | """ 39 | Toggle between several artists in a matplotlib figure. Uses ipython widgets 40 | if we're in a notebook, or native matplotlib widgets if we're not. 41 | """ 42 | def __init__(self, *layers): 43 | """ 44 | Parameters 45 | ---------- 46 | *layers : Matplotlib artist to enable toggling on/off. 47 | """ 48 | self.layers = layers 49 | self.lookup = {x.get_label(): x for x in self.layers} 50 | self.ax = self.layers[0].axes 51 | self.fig = self.ax.figure 52 | self.widget = self._native() if self.is_native else self._notebook() 53 | self._hide() 54 | 55 | def _notebook(self): 56 | return ipywidgets.RadioButtons(options=self.labels, 57 | description='Overlay:', 58 | value=self.labels[0], 59 | disabled=False) 60 | 61 | def _native(self): 62 | widget_ax = self._make_room(self.ax) 63 | widget = mpl.widgets.RadioButtons(widget_ax, self.labels, active=0, 64 | activecolor='black') 65 | widget.on_clicked(self.callback) 66 | return widget 67 | 68 | @property 69 | def labels(self): 70 | return ['Off'] + [layer.get_label() for layer in self.layers] 71 | 72 | def callback(self, value=None): 73 | self._hide() 74 | if self.is_native and not value: 75 | value = self.widget.value_selected 76 | 77 | layer = self.lookup.get(value) 78 | if layer: 79 | layer.set_visible(True) 80 | self.fig.canvas.draw_idle() 81 | 82 | def _hide(self): 83 | for artist in self.layers: 84 | artist.set_visible(False) 85 | 86 | 87 | class Slider(BaseWidget): 88 | """ 89 | Simple slider bar that uses a callback. Uses ipython widgets when in a 90 | notebook or native widgets if using a native backend. 
91 | """ 92 | def __init__(self, ax, vmin, vmax, callback, label='Value', start=None): 93 | self.ax = ax 94 | self.fig = self.ax.figure 95 | self.label = label 96 | self.vmin = vmin 97 | self.vmax = vmax 98 | self._callback = callback 99 | if start is None: 100 | start = (vmin + vmax) / 2 101 | self.start = start 102 | self.widget = self._native() if self.is_native else self._notebook() 103 | 104 | def _notebook(self): 105 | return ipywidgets.IntSlider(min=self.vmin, max=self.vmax, step=1, 106 | value=self.start) 107 | 108 | def _native(self): 109 | widget_ax = self._make_room(self.ax, orientation='horizontal') 110 | widget = mpl.widgets.Slider(widget_ax, self.label, self.vmin, 111 | self.vmax, self.start) 112 | widget.on_changed(self.callback) 113 | return widget 114 | 115 | def callback(self, value): 116 | self._callback(value) 117 | self.fig.canvas.draw_idle() 118 | -------------------------------------------------------------------------------- /exercises/3356_38_3085-3090m.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joferkington/geo_image_processing_tutorial/60880291e78c19f5daf65d1eda581ccbc6cc4299/exercises/3356_38_3085-3090m.jpg -------------------------------------------------------------------------------- /exercises/3356_38_3085-3090m_crop.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joferkington/geo_image_processing_tutorial/60880291e78c19f5daf65d1eda581ccbc6cc4299/exercises/3356_38_3085-3090m_crop.jpg -------------------------------------------------------------------------------- /exercises/README.md: -------------------------------------------------------------------------------- 1 | This is core photograph data from NPD: https://factpages.npd.no/en/wellbore/pageview/exploration/all/3356 2 | 3 | Let's look at some cross laminations and try to identify the boundaries between 4 | different sets. This is a bit of a "toy" exercise, but you might imagine 5 | trying to identify the average dip of bedding in core photographs using similar 6 | methods. 7 | 8 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | scipy 2 | numpy 3 | matplotlib 4 | scikit-image 5 | jupyter 6 | rasterio 7 | fiona 8 | shapely 9 | --------------------------------------------------------------------------------