├── README.md ├── main_tutorial ├── .ipynb_checkpoints │ └── pipeline-checkpoint.ipynb ├── README.md ├── example_cells_1.tif ├── example_cells_2.tif ├── tutorial_pipeline.py ├── tutorial_pipeline_batch.py ├── tutorial_pipeline_batch_partial_solutions.py ├── tutorial_pipeline_batch_solutions.py ├── tutorial_pipeline_partial_solutions.py └── tutorial_pipeline_solutions.py ├── optional_advanced_content ├── Multiprocessing │ ├── batch_multiprocessing.py │ ├── example_cells_1.tif │ ├── example_cells_2.tif │ └── example_multiprocessing.py ├── Vectorization │ ├── example_cells_1.tif │ ├── example_cells_1_segmented.npy │ └── example_vectorization.py ├── cluster_computation │ ├── README.md │ ├── batch_cluster.py │ ├── example_cells_1.tif │ ├── example_cells_2.tif │ └── example_cluster.py └── data_analysis │ ├── example_cells_1.tif │ ├── example_cells_1_green.npy │ ├── example_cells_1_segmented.npy │ ├── example_data_analysis.py │ └── example_data_analysis_solutions.py └── pre_tutorial ├── .ipynb_checkpoints ├── Short tutorial on functions-checkpoint.ipynb ├── arrays-and-numpy-checkpoint.ipynb ├── pre-tutorial-checkpoint.ipynb ├── pre_tutorial-checkpoint.ipynb └── tutorial-on-functions-checkpoint.ipynb ├── README.md ├── arrays-and-numpy.ipynb ├── ext_nuc_AP2_beta_subunit.tif ├── figA.jpeg ├── figC.png ├── figD.png ├── figE.png ├── figF.png ├── module_example.py ├── nuclei.png ├── nuclei.txt ├── pre-tutorial.ipynb ├── randimg.txt ├── results.txt └── tutorial-on-functions.ipynb /README.md: -------------------------------------------------------------------------------- 1 | Python Workshop - Image Processing 2 | =================================== 3 | 4 | **Please note that a new and improved version of the materials for this course is available [here](https://github.com/WhoIsJack/python-workshop-image-processing)!** 5 | 6 | 7 | ## Course Aims and Overview 8 | 9 | This course teaches the basics of bio-image processing, segmentation and analysis in python. It is based on tutorials that integrate explanations and exercises, enabling participants to build their own image analysis pipeline step by step. 10 | 11 | The `main_tutorial` uses single-cell segmentation of a confocal fluorescence microscopy image to illustrate key concepts from preprocessing to segmentation to data analysis. It includes a tutorial on how to apply such a pipeline to multiple images at once (batch processing). 12 | 13 | The main tutorial is complemented by the `pre-tutorial`, which reviews some basic python concepts using as an example a rat fibroblast image, and by the `optional_advanced_content`, which features further examples and tutorials on the topics of vectorization, multiprocessing, cluster computation and advanced data analysis. 14 | 15 | This course is aimed at people with basic to intermediate knowledge of python and basic knowledge of microscopy. For people with basic knowledge of image processing, the tutorials can be followed without attending the lectures. 16 | 17 | 18 | ## Instructions on following this course 19 | 20 | - If you have only very basic knowledge of python or if you are feeling a little rusty, you should start with the `pre-tutorial`, which includes three tutorials: one on numpy arrays, one on python functions and one on the basics of interacting with image data in Python. If you are more experienced, you may want to skim or skip the pre-tutorial. 21 | Note: The pre-tutorial is organized as an iPython notebook. 22 | 23 | - In the `main_tutorial`, it is recommended to follow the `tutorial_pipeline` first. By following the exercises, you should be able to implement your own segmentation pipeline. If you run into trouble, you can use the provided solutions as inspiration - however, it is *highly* recommended to spend a lot of time figuring things out yourself, as this is an important part of any programming exercise. If you are having a lot of trouble, you may want to use the `partial_solutions`, which give you some help yet still demand that you think about it yourself. After completing the segmentation pipeline, you can follow the `tutorial_pipeline_batch` to learn how to run your program for several images and collect all the results. 24 | Note: The main tutorial is organized simply as comments in an empty script. It is up to you to fill in the appropriate code. 25 | 26 | - Finally, the `advanced_content` contains an introductory example to three important techniques for making making your scripts faster and operating on large datasets, namely *vectorization*, *multiprocessing* and *cluster processing*. The `data_analysis` tutorial (currently in *BETA*!) is an introduction to piping segmentation results into more advanced statistical data analysis, including *feature extraction*, *PCA*, *clustering* and *graph-based analysis*. 27 | 28 | 29 | ## Concepts discussed in course lectures 30 | 31 | 1. **Basic Python (KS)** 32 | * Importing packages and modules 33 | * Reading files 34 | * Data and variable types 35 | * Importing data 36 | * Storing data in variables 37 | * Defining and using functions 38 | * Arrays, indexing, slicing 39 | * Control flow 40 | * Plotting images 41 | * Debugging by printing 42 | * Output formatting and writing files 43 | * Using the documentation 44 | 45 | 46 | 2. **Basics of BioImage Processing (KM)** 47 | * Images as numbers 48 | * Bit/colour depth 49 | * Colour maps and look up tables 50 | * Definition of Bio-image Analysis 51 | * Image Analysis definition for signal processing science 52 | * Image Analysis definition for biology 53 | * Algorithms and Workflows 54 | * Typical workflows in biology 55 | * Convolution and Filtering 56 | * Why do we do filtering? 57 | * Convolution in 1D, 2D and 3D 58 | * Pre-segmentation filtering 59 | * De-noising 60 | * Smoothing 61 | * Unsharp mask 62 | * Post-segmentation filtering 63 | * Tuning segmented structures 64 | * Mathematical morphology, erosion, dilation 65 | * Distance map 66 | * Watershed 67 | 68 | 3. **Introduction to the Tutorial Pipeline (JH)** 69 | * Automated Single-Cell Segmentation 70 | * Why? (Advantages of single-cell approaches) 71 | * How? (Standard segmentation pipeline build) 72 | * Preprocessing (smoothing, background subtraction) 73 | * Presegmentation (thresholding, seed detection) 74 | * Segmentation (seed expansion; e.g. watershed) 75 | * Postprocessing (removing artefacts, refining segmentation) 76 | * Quantification and analysis 77 | * What? (for the main tutorial: 2D spinning disc confocal fluorescence microscopy images of Zebrafish embryonic cells) 78 | * Who? (YOU!) 79 | 80 | 3. **Advanced material** 81 | * CellProfiler to automate image analysis workflows and python plugin module **(VH)** 82 | * Code Optimisation (vectorisation, multiprocessing, cluster processing) & advanced data analysis **(JH)** 83 | 84 | 85 | ## Instructors 86 | 87 | - Karin Sasaki 88 | - EMBL Centre for Biological Modelling 89 | - Organiser of course, practical materials preparation, tutor, TA 90 | - Jonas Hartmann 91 | - Gilmour Lab, CBB, EMBL 92 | - Pipeline developer, practical materials preparation, tutor, TA 93 | - Kora Miura 94 | - EMBL Centre for Molecular and Cellular Imaging 95 | - Tutor 96 | - Volker Hilsenstein 97 | - Scientific officer at the ALMF 98 | - Tutor, TA (image processing) 99 | - Toby Hodges 100 | - Bio-IT, EMBL 101 | - TA (python) 102 | - Aliaksandr Halavatyi 103 | - Postdoc at the Pepperkik group 104 | - TA (programming/image processing) 105 | - Imre Gaspar 106 | - Staff scientists at the Ephrussi group 107 | - TA (programming/image processing) 108 | 109 | 110 | ## Feedback 111 | 112 | We welcome any feedback on this course! 113 | 114 | Feel free to contact us at *karin.sasaki@embl.de* or *jonas.hartmann@embl.de*. 115 | -------------------------------------------------------------------------------- /main_tutorial/.ipynb_checkpoints/pipeline-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## SECTION 0 - SET UP" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "1. Make sure that all your python and image files are in the same directory.\n", 15 | "\n", 16 | "2. Remember that you can develop this pipeline using a) a simple text editor and running it on the terminal, b) using Spyder or c) a Jupyter notebook. For all of them, you need to navigate to the directory where all your files are saved.\n", 17 | " \n", 18 | " * On the terminal, type cd dir_path, replacing dir_path for the path of the directory\n", 19 | " * On Spyder and Jupyter nootebook it can be done interactively.\n" 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": null, 25 | "metadata": { 26 | "collapsed": true 27 | }, 28 | "outputs": [], 29 | "source": [] 30 | } 31 | ], 32 | "metadata": { 33 | "kernelspec": { 34 | "display_name": "Python 2", 35 | "language": "python", 36 | "name": "python2" 37 | }, 38 | "language_info": { 39 | "codemirror_mode": { 40 | "name": "ipython", 41 | "version": 2 42 | }, 43 | "file_extension": ".py", 44 | "mimetype": "text/x-python", 45 | "name": "python", 46 | "nbconvert_exporter": "python", 47 | "pygments_lexer": "ipython2", 48 | "version": "2.7.11" 49 | } 50 | }, 51 | "nbformat": 4, 52 | "nbformat_minor": 0 53 | } 54 | -------------------------------------------------------------------------------- /main_tutorial/README.md: -------------------------------------------------------------------------------- 1 | ## README on Main Tutorial 2 | 3 | ### DESCRIPTION 4 | This is a tutorial to exemplify fundamental concepts of automated image processing and segmentation, using python. 5 | 6 | This course assumes a basic knowledge of the Python Programming Language. For those at EMBL, this means that you have participated in *a* beginners course for programming, preferably a Python course. 7 | 8 | 9 | ### TASK 10 | Segmentation of 2D spinning-disc confocal fluorescence microscopy images of a membrane marker in Zebrafish early embryonic cells. 11 | 12 | 13 | ### REQUIREMENTS 14 | - Python 2.7 (we recommend the Anaconda distribution, which includes most of the required modules) 15 | - Modules: NumPy, SciPy, scikit-image, tifffile 16 | - A text/code editor 17 | 18 | 19 | ### HOW TO FOLLOW THIS TUTORIAL 20 | - Files you should have: 21 | - `tutorial_pipeline.py`; the tutorial Python script 22 | - `tutorial_pipeline_batch.py`; the tutorial introducing batch processing 23 | - The corresponding solutions and if desired partial solutions 24 | - `example_cells_1.tif` and `example_cells_2.tif`; these images are dual-color spinning-disc confocal confocal micrographs (40x) of two membrane-localised proteins during zebrafish early embryonic development (~10hpf). They have 2 colors and are 930 by 780 pixels. 25 | 26 | - The tutorial is self explanatory, indicating the steps to be taken next, what you should aim at achieving after carrying out those steps and the commands that could be used (there is not a one correct answer; these are suggestions). 27 | 28 | - With this exercise we want to encourage you to become a more independent programmer, so if there is a command you don't quite know how to use, make sure you read the documentation. To do so, most of the time it's enough to google for "python" and the name of the module or function you want to use. 29 | 30 | - If you are following this tutorial in class, if you have any questions, raise your hand and someone will come to help you. Otherwise, feel free to send your query to one of these two email addresses: 31 | *jonas.hartmann@embl.de* 32 | *karin.sasaki@embl.de* 33 | 34 | ### IMAGE PROCESSING CONCEPTS DISCUSSED IN THIS TUTORIAL 35 | - Loading and visualising images 36 | - Images are arrays of numbers; they can be indexed, sliced, etc... 37 | - Images contain 3 types of information: Intensity, Shape, Size (a good segmentation pipeline uses them all) 38 | - Preprocessing: smoothing, background substraction 39 | - Segmentation: adaptive thresholding, distance transformation, detection of maxima, watershed 40 | - Filtering: Discarding undesired objects, e.g. cells at the image boundaries 41 | - Analysis: Extracting measurements from segmentation 42 | - Saving output (segmentation, data, graphs) 43 | - Automation: How to apply a pipeline to all files in a directory (batch processing) 44 | 45 | ### PROGRAMMING CONCEPTS DISCUSSED/TRAINED IN THIS TUTORIAL 46 | - Python scripts, functions 47 | - Common variable types: numpy array, dictionaries 48 | - Control flow 49 | - Modules, packages, importing modules and packages and using them 50 | - Importing data 51 | - Using the documentation 52 | - Arrays and manipulation (dimensions, indexing, slicing, arithmetic) 53 | - Visualising images 54 | - Debugging by printing relevant information and plotting images at appropriate stages 55 | - Exporting data and writing files 56 | - Good practice 57 | 58 | -------------------------------------------------------------------------------- /main_tutorial/example_cells_1.tif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karinsasaki/python-workshop-image-processing/a6de0424bb184e55a769801bbedec1c8ce02dd5a/main_tutorial/example_cells_1.tif -------------------------------------------------------------------------------- /main_tutorial/example_cells_2.tif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karinsasaki/python-workshop-image-processing/a6de0424bb184e55a769801bbedec1c8ce02dd5a/main_tutorial/example_cells_2.tif -------------------------------------------------------------------------------- /main_tutorial/tutorial_pipeline_batch.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Tue Dec 22 00:12:38 2015 4 | 5 | @author: Created by Jonas Hartmann @ Gilmour Group @ EMBL Heidelberg 6 | Edited by Karin Sasaki @ CBM @ EMBL Heidelberg 7 | 8 | @descript: This is the batch version of 'tutorial_pipeline.py', which is an 9 | example pipeline for the segmentation of 2D confocal fluorescence 10 | microscopy images of a membrane marker in confluent epithel-like 11 | cells. This batch version serves to illustrate how such a pipeline 12 | can be run automatically on multiple images that are saved in the 13 | current directory. 14 | 15 | The pipeline is optimized to run with the provided example images, 16 | which are dual-color spinning-disc confocal micrographs (40x) of 17 | two membrane-localized proteins during zebrafish early embryonic 18 | development (~10hpf). 19 | 20 | @requires: Python 2.7 21 | NumPy 1.9, SciPy 0.15 22 | scikit-image 0.11.2, tifffile 0.3.1 23 | """ 24 | 25 | #%% 26 | #------------------------------------------------------------------------------ 27 | # SECTION 0 - SET UP 28 | 29 | # 1. (Re)check that the segmentation pipeline works! 30 | # You should already have the final pipeline that segments and quantifies one of the example images. Make sure it is working by running it in one go from start to finish. Check that you do not get errors and that output is what you expect. 31 | 32 | # 2. Check that you have the right data! 33 | # We provide two images ('example_cells_1.tif' and 'example_cells_2.tif') to test the batch version of the pipeline. Make sure you have them both ready in the working directory. 34 | 35 | # 3. Deal with Python 2.7 legacy 36 | from __future__ import division 37 | 38 | # 4. EXERCISE: Import modules required by the pipeline 39 | # Array manipulation package numpy as np 40 | # Plotting package matplotlib.pyplot as plt 41 | # Image processing package scipy.ndimage as ndi 42 | 43 | 44 | #%% 45 | #------------------------------------------------------------------------------ 46 | # SECTION 1 - PACKAGE PIPELINE INTO A FUNCTION 47 | 48 | # The goal of this script is to repeatedly run the segmentation algorithm you 49 | # programmed in tutorial_pipeline.py. The easiest way of packaging code to run 50 | # it multiple times is to make it into a function. 51 | 52 | # EXERCISE 53 | # Define a function that... 54 | # ...takes one argument as input: a filename as a string 55 | # ...returns two outputs: the final segmentation and the quantified data 56 | # ...reports that it is finished with the current file just before returning the result. 57 | 58 | # To do this, you need to copy the pipeline you developed, up to section 8, and paste it inside the function. Since the pipeline should run without any supervision by the user, remove (or comment out) any instances where an image would be shown. You can also exclude section 4. Make sure everything is set up such that the function can be called and the entire pipeline will run with the filename that is passed to the function. 59 | 60 | # Recall that to define a new function the syntax is 61 | # def function_name(input arguments): 62 | # """function documentation string""" 63 | # function procedure 64 | # return [expression] 65 | 66 | 67 | #%% 68 | #------------------------------------------------------------------------------ 69 | # SECTION 2 - EXECUTION SCRIPT 70 | 71 | # Now that the pipeline function is defined, we can run it for each image file in a directory and collect the results as they are returned. 72 | 73 | 74 | #------------------------ 75 | # Part A - Get the current working directory 76 | #------------------------ 77 | 78 | # Define a variable 'input_dir' with the path to the current working directory, where the images should be saved. In principle, you can also specify any other path where you store your images. 79 | 80 | # (i) Import the function 'getcwd' from the module 'os' 81 | 82 | # (ii) Get the name of the current working directory with 'getcwd' 83 | 84 | 85 | #------------------------ 86 | # Part B - Generate a list of image filenames 87 | #------------------------ 88 | 89 | # (i) Make a list variable containing the names of all the files in the directory, using the function 'listdir' from the module 'os'. (Suggested variable name: 'filelist') 90 | 91 | # (ii) From the above list, collect the filenames of only those files that are tifs and allocate them to a new list variable named 'tiflist'. Here, it is useful to use a for-loop to loop over all the names in 'filelist' and to use an if-statement combined with slicing (indexing) to check if the current string ends with the characters '.tif'. 92 | 93 | # (iii) Double-check that you have the right files in tiflist. You can either print the number of files in the list, or print all the names in the list. 94 | 95 | 96 | #------------------------ 97 | # Part C - Loop over the tiflist, run the pipeline for each filename and collect the results 98 | #------------------------ 99 | 100 | 101 | # (i) Initialise two empty lists, 'all_results' and 'all_segmentations', where you will collect the quantifications and the segmented images, respectively, for each file. 102 | 103 | # (ii) Write a for-loop that goes through every file in the tiflist. Within this for loop, you should: 104 | 105 | # Run the pipeline function and allocate the output to new variables; remember that this pipeline returns two arguments, so you need two output variables. 106 | 107 | # Add the output to the variables 'all_results' and 'all_segmentations', respectively. You can use the '.append' method to add them to the lists. 108 | 109 | # (iii) Check your understanding: 110 | # Try to think about the complete data structure of 'all_results' and 'all_segmentations'. What type of variable are they? What type of variable to they contain? What data is contained within these variables? You can try printing things to fully understand the data structure. 111 | 112 | # (iv) [OPTIONAL] Exception handling 113 | # It would be a good idea to make sure that not everything fails (i.e. the program stops and you loose all data) if there is an error in just one file. To avoid this, you can include a "try-except block" in your loop. To learn about handling exceptions (errors) in this way, visit http://www.tutorialspoint.com/python/python_exceptions.htm and https://wiki.python.org/moin/HandlingExceptions. Also, remember to include a warning message when the pipeline fails and to print the name of the file that caused the error, making a diagnosis possible. To do this properly, you should use the function 'warn' from the module 'warnings'. Finally, you may want to count how many times the pipeline runs correctly and print that number at the end, informing the user how many out of the total number of images were successfully segmented. 114 | 115 | 116 | #------------------------ 117 | # Part D - Print a short summary 118 | #------------------------ 119 | 120 | # Find out how many cells in total were detected, from all the images: 121 | 122 | # (i) Initialise a counter 'num_cells' to 0 123 | 124 | # (ii) Use a for loop that goes through 'all_results'; 125 | 126 | # For each entry, identify how many cells were segmented in the image (e.g. by getting the length of the "cell_id" entry in the result dict). Add this length to the counter. 127 | 128 | # (iii) Print a statement that reports the final count of cells detected, for all images segmented. 129 | 130 | 131 | #------------------------ 132 | # Part E - Quick visualisation of results 133 | #------------------------ 134 | 135 | # (i) Plot a scatter plot for all data and save the image: 136 | 137 | # Loop through all_results and scatter plot 'cell_size' vs 'the red_membrane_mean'. Remember to use a for-loop and the function 'enumerate'. 138 | 139 | # Save the image to a png file using 'plt.savefig'. 140 | 141 | 142 | # (ii) [OPTIONAL] You may want to give cells from different images different colors: 143 | 144 | # Use the module 'cm' (for colormaps) from 'plt' and choose a colormap, e.g. 'jet'. 145 | 146 | # Create the colormap with the number of colors required for the different images (in this example just 2). You can use 'range' or 'np.linspace' to ensure that you will always have the correct number of colors required, irrespective of the number of images you run the pipeline on. This colormap needs to be defined before making the plots. 147 | 148 | # When generating the scatter plot, use the parameter 'color' to use a different color from your colormap for each image you iterate through. Using 'enumerate' for the iterations makes this easier. For more info on 'color' see the docs of 'plt.scatter'. 149 | 150 | 151 | #------------------------ 152 | # Part F - Save all the segmentations as a "3D" tif 153 | #------------------------ 154 | 155 | # (i) Convert 'all_segmentations' to a 3D numpy array (instead of a list of 2D arrays) 156 | 157 | # (ii) Save the result to a tif file using the 'imsave' function from the 'tifffile' module 158 | 159 | # (iii) Have a look at the file in Fiji/ImageJ. The quality of segmentation across multiple images (that you did not use to optimize the pipeline) tells you how robust your pipeline is. 160 | 161 | 162 | #------------------------ 163 | # Part G - Save the quantification data as a txt file 164 | #------------------------ 165 | 166 | # Saving your data as tab- or comma-separated text files allows you to import it into other programs (excel, Prism, R, ...) for further analysis and visualization. 167 | 168 | # (i) Open an empty file object using "with open" (as explained at the end of the pipeline tutorial). Specify the file format to '.txt' and the mode to write ('w'). 169 | 170 | # (ii) The headers of the data are the key names of the dict containing the result for each input image (i.e. 'cell_id', 'green_mean', etc.). Write them on the first line of the file, separated by tabs ('\t'). You need the methods 'string.join' and 'file.write'. 171 | 172 | # (iii) Loop through each filename in 'tiflist' (using a for-loop and enumerate of 'tiflist'). For each filename... 173 | 174 | # ...write the filename itself. 175 | 176 | # ...extract the corresponding results from 'all_results'. 177 | 178 | # ...iterate over all the cells (using a for-loop and 'enumerate' of 'resultsDict["cell_id"]') and... 179 | 180 | # ...write the data of the cell, separated by '\t'. 181 | 182 | 183 | 184 | #%% 185 | #------------------------------------------------------------------------------ 186 | # SECTION 4 - RATIOMETRIC NORMALIZATION TO CONTROL CHANNEL 187 | 188 | # To correct for technical variability it is often useful to have an internal control, e.g. some fluorophore that we expect to be the same between all analyzed conditions, and use it to normalize other measurements. 189 | 190 | # For example, we can assume that our green channel is just a generic membrane marker, whereas the red channel is a labelled protein of interest. Thus, using the red/green ratio instead of the raw values from the red channel may yield a clearer result when comparing intensity measurements of the red protein of interest between different images. 191 | 192 | #------------------------ 193 | # Part A - Create the ratio 194 | #------------------------ 195 | 196 | # (i) Calculate the ratio of red membrane mean intensity to green membrane mean intensity for each cell in each image. Add the results to the 'result_dict' of each image with a new key, for example 'red_green_mem_ratio'. 197 | 198 | 199 | #------------------------ 200 | # Part B - Make a scatter plot, this time with the ratio 201 | #------------------------ 202 | 203 | # (i) Recreate the scatterplot from Section 3, part E, but plotting the ratio over cell size rather than the red membrane mean intensity. 204 | 205 | # (ii) Compare the two plots. Does the outcome match your expectations? Can you explain the newly 'generated' outliers? 206 | 207 | # Note: Depending on the type of data and the question, normalizing with internal controls can be crucial to arrive at the correct conclusion. However, as you can see from the outliers here, a ratio is not always the ideal approach to normalization. When doing data analysis, you may want to spend some time thinking about how best to normalize your data. Testing different outcomes using 'synthetic' data (created using random number generators) can also help to confirm that your normalization (or your analysis in general) does not bias your results. 208 | 209 | 210 | #------------------------------------------------------------------------------ 211 | #------------------------------------------------------------------------------ 212 | # THIS IS THE END OF THE TUTORIAL. 213 | #------------------------------------------------------------------------------ 214 | #------------------------------------------------------------------------------ 215 | -------------------------------------------------------------------------------- /main_tutorial/tutorial_pipeline_batch_partial_solutions.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Tue Dec 22 00:12:38 2015 4 | 5 | @author: Created by Jonas Hartmann @ Gilmour Group @ EMBL Heidelberg 6 | Edited by Karin Sasaki @ CBM @ EMBL Heidelberg 7 | 8 | @descript: This is the batch version of 'tutorial_pipeline.py', which is an 9 | example pipeline for the segmentation of 2D confocal fluorescence 10 | microscopy images of a membrane marker in confluent epithel-like 11 | cells. This batch version serves to illustrate how such a pipeline 12 | can be run automatically on multiple images that are saved in the 13 | current directory. 14 | 15 | The pipeline is optimized to run with the provided example images, 16 | which are dual-color spinning-disc confocal micrographs (40x) of 17 | two membrane-localized proteins during zebrafish early embryonic 18 | development (~10hpf). 19 | 20 | @requires: Python 2.7 21 | NumPy 1.9, SciPy 0.15 22 | scikit-image 0.11.2, tifffile 0.3.1 23 | """ 24 | 25 | 26 | #%% 27 | #------------------------------------------------------------------------------ 28 | # SECTION 0 - SET UP 29 | 30 | # 1. (Re)check that the segmentation pipeline works! 31 | # You should already have the final pipeline that segments and quantifies one of the example images. Make sure it is working by running it in one go from start to finish. Check that you do not get errors and that output is what you expect. 32 | 33 | # 2. Check that you have the right data! 34 | # We provide two images ('example_cells_1.tif' and 'example_cells_2.tif') to test the batch version of the pipeline. Make sure you have them both ready in the working directory. 35 | 36 | # 3. Deal with Python 2.7 legacy 37 | from __future__ import division 38 | 39 | # 4. EXERCISE: Import modules required by the pipeline 40 | import --- as np # Array manipulation package 41 | import --- as plt # Plotting package 42 | import --- as ndi # Image processing package 43 | 44 | 45 | #%% 46 | #------------------------------------------------------------------------------ 47 | # SECTION 1 - PACKAGE PIPELINE INTO A FUNCTION 48 | 49 | # The goal of this script is to repeatedly run the segmentation algorithm you 50 | # programmed in tutorial_pipeline.py. The easiest way of packaging code to run 51 | # it multiple times is to make it into a function. 52 | 53 | # EXERCISE 54 | # Define a function that... 55 | # ...takes one argument as input: a filename as a string 56 | # ...returns two outputs: the final segmentation and the quantified data 57 | # ...reports that it is finished with the current file just before returning the result. 58 | 59 | # To do this, you need to copy the pipeline you developed, up to section 8, and paste it inside the function. Since the pipeline should run without any supervision by the user, remove (or comment out) any instances where an image would be shown. You can also exclude section 4. Make sure everything is set up such that the function can be called and the entire pipeline will run with the filename that is passed to the function. 60 | 61 | # Recall that to define a new function the syntax is 62 | # def function_name(input arguments): 63 | # """function documentation string""" 64 | # function procedure 65 | # return [expression] 66 | 67 | --- pipeline(filename): 68 | 69 | # Report that the pipelineis being executed 70 | print " Starting pipeline for", filename 71 | 72 | # Import tif file 73 | import skimage.io as io # Image file manipulation module 74 | img = io.imread(filename) # Importing multi-color tif file 75 | 76 | # Slicing: We only work on one channel for segmentation 77 | green = img[0,:,:] 78 | 79 | 80 | #------------------------------------------------------------------------------ 81 | # PREPROCESSING AND SIMPLE CELL SEGMENTATION: 82 | # (I) SMOOTHING AND (II) ADAPTIVE THRESHOLDING 83 | 84 | # ------- 85 | # Part I 86 | # ------- 87 | 88 | # Gaussian smoothing 89 | sigma = 3 # Smoothing factor for Gaussian 90 | green_smooth = ndi.filters.gaussian_filter(green,sigma) # Perform smoothing 91 | 92 | 93 | # ------- 94 | # Part II 95 | # ------- 96 | 97 | # Create an adaptive background 98 | struct = ((np.mgrid[:31,:31][0] - 15)**2 + (np.mgrid[:31,:31][1] - 15)**2) <= 15**2 # Create a disk-shaped structural element 99 | from skimage.filters import rank # Import module containing mean filter function 100 | bg = rank.mean(green_smooth, selem=struct) # Run a mean filter over the image using the disc 101 | 102 | # Threshold using created background 103 | green_mem = green_smooth >= bg 104 | 105 | # Clean by morphological hole filling 106 | green_mem = ndi.binary_fill_holes(np.logical_not(green_mem)) 107 | 108 | 109 | #------------------------------------------------------------------------------ 110 | # IMPROVED CELL SEGMENTATION BY SEEDING AND EXPANSION: 111 | # (I) SEEDING BY DISTANCE TRANSFORM 112 | # (II) EXPANSION BY WATERSHED 113 | 114 | # ------- 115 | # Part I 116 | # ------- 117 | 118 | # Distance transform on thresholded membranes 119 | # Advantage of distance transform for seeding: It is quite robust to local 120 | # "holes" in the membranes. 121 | green_dt= ndi.distance_transform_edt(green_mem) 122 | 123 | # Dilating (maximum filter) of distance transform improves results 124 | green_dt = ndi.filters.maximum_filter(green_dt,size=10) 125 | 126 | # Retrieve and label the local maxima 127 | from skimage.feature import peak_local_max 128 | green_max = peak_local_max(green_dt,indices=False,min_distance=10) # Local maximum detection 129 | green_max = ndi.label(green_max)[0] # Labeling 130 | 131 | 132 | # ------- 133 | # Part II 134 | # ------- 135 | 136 | # Get the watershed function and run it 137 | from skimage.morphology import watershed 138 | green_ws = watershed(green_smooth,green_max) 139 | 140 | 141 | #------------------------------------------------------------------------------ 142 | # IDENTIFICATION OF CELL EDGES 143 | 144 | # Define the edge detection function 145 | def edge_finder(footprint_values): 146 | if (footprint_values == footprint_values[0]).all(): 147 | return 0 148 | else: 149 | return 1 150 | 151 | # Iterate the edge finder over the segmentation 152 | green_edges = ndi.filters.generic_filter(green_ws,edge_finder,size=3) 153 | 154 | 155 | #------------------------------------------------------------------------------ 156 | # POSTPROCESSING: REMOVING CELLS AT THE IMAGE BORDER 157 | 158 | # Create a mask for the image boundary pixels 159 | boundary_mask = np.ones_like(green_ws) # Initialize with all ones 160 | boundary_mask[1:-1,1:-1] = 0 # Set middle square to 0 161 | 162 | # Iterate over all cells in the segmentation 163 | current_label = 1 164 | for cell_id in np.unique(green_ws): 165 | 166 | # If the current cell touches the boundary, remove it 167 | if np.sum((green_ws==cell_id)*boundary_mask) != 0: 168 | green_ws[green_ws==cell_id] = 0 169 | 170 | # This is to keep the labeling continuous, which is cleaner 171 | else: 172 | green_ws[green_ws==cell_id] = current_label 173 | current_label += 1 174 | 175 | 176 | #------------------------------------------------------------------------------ 177 | # MEASUREMENTS: SINGLE-CELL AND MEMBRANE READOUTS 178 | 179 | # Initialize a dict for results of choice 180 | results = {"cell_id":[], "green_mean":[], "red_mean":[],"green_membrane_mean":[], 181 | "red_membrane_mean":[],"cell_size":[],"cell_outline":[]} 182 | 183 | # Iterate over segmented cells 184 | for cell_id in np.unique(green_ws)[1:]: 185 | 186 | # Mask the pixels of the current cell 187 | cell_mask = green_ws==cell_id 188 | edge_mask = np.logical_and(cell_mask,green_edges) 189 | 190 | # Get the current cell's values 191 | # Note that the original raw data is used for quantification! 192 | results["cell_id"].append(cell_id) 193 | results["green_mean"].append(np.mean(img[0,:,:][cell_mask])) 194 | results["red_mean"].append(np.mean(img[1,:,:][cell_mask])) 195 | results["green_membrane_mean"].append(np.mean(img[0,:,:][edge_mask])) 196 | results["red_membrane_mean"].append(np.mean(img[1,:,:][edge_mask])) 197 | results["cell_size"].append(np.sum(cell_mask)) 198 | results["cell_outline"].append(np.sum(edge_mask)) 199 | 200 | 201 | #------------------------------------------------------------------------------ 202 | # REPORT PROGRESS AND RETURN RESULTS 203 | 204 | print " Completed pipeline for", filename 205 | 206 | return green_ws, results 207 | 208 | 209 | #%% 210 | #------------------------------------------------------------------------------ 211 | # SECTION 2 - EXECUTION SCRIPT 212 | 213 | # Now that the pipeline function is defined, we can run it for each image file in a directory and collect the results as they are returned. 214 | 215 | 216 | #------------------------ 217 | # Part A - Get the current working directory 218 | #------------------------ 219 | 220 | # Define a variable 'input_dir' with the path to the current working directory, where the images should be saved. In principle, you can also specify any other path where you store your images. 221 | 222 | # (i) Import the function 'getcwd' from the module 'os' 223 | --- os --- getcwd 224 | 225 | # (ii) Get the name of the current working directory with 'getcwd' 226 | input_dir = ---- 227 | 228 | 229 | 230 | #------------------------ 231 | # Part B - Generate a list of image filenames 232 | #------------------------ 233 | 234 | # (i) Make a list variable containing the names of all the files in the directory, using the function 'listdir' from the module 'os'. (Suggested variable name: 'filelist') 235 | --- os --- listdir 236 | filelist = listdir(----) 237 | 238 | 239 | # (ii) From the above list, collect the filenames of only those files that are tifs and allocate them to a new list variable named 'tiflist'. Here, it is useful to use a for-loop to loop over all the names in 'filelist' and to use an if-statement combined with slicing (indexing) to check if the current string ends with the characters '.tif'. 240 | tiflist = [] 241 | --- filename --- filelist: 242 | --- filename[-4:]=='.tif': 243 | tiflist.---(filename) 244 | 245 | # (iii) Double-check that you have the right files in tiflist. You can either print the number of files in the list, or print all the names in the list. 246 | print "Found", ---(tiflist), "tif files in target directory" 247 | 248 | 249 | #------------------------ 250 | # Part C - Loop over the tiflist, run the pipeline for each filename and collect the results 251 | #------------------------ 252 | 253 | # (i) Initialise two empty lists, 'all_results' and 'all_segmentations', where you will collect the quantifications and the segmented images, respectively, for each file. 254 | all_results = [] 255 | all_segmentations = [] 256 | 257 | # (ii) Write a for-loop that goes through every file in the tiflist. Within this for loop, you should: 258 | print "Running batch processing" 259 | --- filename --- tiflist: # For every file in tiflist... 260 | 261 | # Run the pipeline function and allocate the output to new variables; remember that this pipeline returns two arguments, so you need two output variables. 262 | seg,results = ---(---) 263 | 264 | # Add the output to the variables 'all_results' and 'all_segmentations', respectively. You can use the '.append' method to add them to the lists. 265 | all_results.---(results) 266 | all_segmentations.---(seg) 267 | 268 | # (iii) Check your understanding: 269 | # Try to think about the complete data structure of 'all_results' and 'all_segmentations'. What type of variable are they? What type of variable to they contain? What data is contained within these variables? You can try printing things to fully understand the data structure. 270 | 271 | # (iv) [OPTIONAL] Exception handling 272 | # It would be a good idea to make sure that not everything fails (i.e. the program stops and you loose all data) if there is an error in just one file. To avoid this, you can include a "try-except block" in your loop. To learn about handling exceptions (errors) in this way, visit http://www.tutorialspoint.com/python/python_exceptions.htm and https://wiki.python.org/moin/HandlingExceptions. Also, remember to include a warning message when the pipeline fails and to print the name of the file that caused the error, making a diagnosis possible. To do this properly, you should use the function 'warn' from the module 'warnings'. Finally, you may want to count how many times the pipeline runs correctly and print that number at the end, informing the user how many out of the total number of images were successfully segmented. 273 | # see below 274 | 275 | 276 | # you can use the for loop below instead of the one above 277 | #--- filename --- tiflist: # For every file in tiflist... 278 | # 279 | # # Exception handling so the program can move on if one image fails for some reason. 280 | # try--- 281 | # 282 | # # Run the pipeline 283 | # seg,results = ---(---) 284 | # 285 | # # Add the results to our collection lists 286 | # all_results.---(results) 287 | # all_segmentations.---(seg) 288 | # 289 | # # Update the success counter 290 | # success_counter += 1 291 | # 292 | # # What to do if something goes wrong. 293 | # except Exception: 294 | # 295 | # # Warn the user, then carry on with the next file 296 | # from warnings import warn 297 | # warn("There was an exception in " --- filename + "!!!") 298 | 299 | 300 | 301 | #------------------------ 302 | # Part D - Print a short summary 303 | #------------------------ 304 | 305 | # Find out how many cells in total were detected, from all the images: 306 | 307 | # (i) Initialise a counter 'num_cells' to 0 308 | num_cells = 0 309 | 310 | # (ii) Use a for loop that goes through 'all_results'; 311 | for --- in ---: # For each image... 312 | 313 | # For each entry, identify how many cells were segmented in the image (e.g. by getting the length of the "cell_id" entry in the result dict). Add this length to the counter. 314 | num_cells = num_cells + len(---["cell_id"]) # ...add the number of cells detected. 315 | 316 | # (iii) Print a statement that reports the final count of cells detected, for all images segmented. 317 | print "Detected", ---, "cells in total" 318 | 319 | 320 | #------------------------ 321 | # Part E - Quick visualisation of results 322 | #------------------------ 323 | 324 | # (i) Plot a scatter plot for all data and save the image: 325 | 326 | # Loop through all_results and scatter plot 'cell_size' vs 'the red_membrane_mean'. Remember to use a for-loop and the function 'enumerate'. 327 | for image_id,resultDict in ---(all_results): 328 | # ...add the datapoints to the plot. 329 | plt.---(resultDict["cell_size"],resultDict[---]) 330 | 331 | # Label axes 332 | plt.x---("cell size") 333 | plt.---label("red membrane mean") 334 | 335 | # Save the image to a png file using 'plt.savefig'. 336 | plt.---('BATCH_all_cells_scatter.png', bbox_inches='tight') 337 | plt.show() 338 | 339 | 340 | # (ii) [OPTIONAL] You may want to give cells from different images different colors: 341 | 342 | # Use the module 'cm' (for colormaps) from 'plt' and choose a colormap, e.g. 'jet'. 343 | 344 | # Create the colormap with the number of colors required for the different images (in this example just 2). You can use 'range' or 'np.linspace' to ensure that you will always have the correct number of colors required, irrespective of the number of images you run the pipeline on. This colormap needs to be defined before making the plots. 345 | 346 | # When generating the scatter plot, use the parameter 'color' to use a different color from your colormap for each image you iterate through. Using 'enumerate' for the iterations makes this easier. For more info on 'color' see the docs of 'plt.scatter'. 347 | 348 | # Note: Use either the version below or the one above 349 | # Prepare colormap to color cells from each image differently 350 | colors = plt.---.jet(np.linspace(0,---,len(all_results))) 351 | 352 | # For each analyzed image... 353 | for image_id,resultDict in ---(all_results): 354 | 355 | # ...add the datapoints to the plot. 356 | plt.---(resultDict["cell_size"],resultDict[---],color=---[image_id]) 357 | 358 | # Label axes 359 | plt.x---("cell size") 360 | plt.---label("red membrane mean") 361 | 362 | # Show or save result 363 | plt.---('BATCH_all_cells_scatter.png', bbox_inches='tight') 364 | plt.show() 365 | 366 | 367 | #------------------------ 368 | # Part F - Save all the segmentations as a "3D" tif 369 | #------------------------ 370 | 371 | # (i) Convert 'all_segmentations' to a 3D numpy array (instead of a list of 2D arrays) 372 | all_segmentations = np.array(---) 373 | 374 | # (ii) Save the result to a tif file using the 'imsave' function from the 'tifffile' module 375 | from tifffile import imsave 376 | imsave("BATCH_segmentations.tif",---,bigtiff=True) 377 | 378 | # (iii) Have a look at the file in Fiji/ImageJ. The quality of segmentation across multiple images (that you did not use to optimize the pipeline) tells you how robust your pipeline is. 379 | 380 | 381 | #------------------------ 382 | # Part G - Save the quantification data as a txt file 383 | #------------------------ 384 | 385 | # Saving your data as tab- or comma-separated text files allows you to import it into other programs (excel, Prism, R, ...) for further analysis and visualization. 386 | 387 | # (i) Open an empty file object using "with open" (as explained at the end of the pipeline tutorial). Specify the file format to '.txt' and the mode to write ('w'). 388 | --- ---("BATCH_results.txt","w") --- txt_out: 389 | 390 | # (ii) The headers of the data are the key names of the dict containing the result for each input image (i.e. 'cell_id', 'green_mean', etc.). Write them on the first line of the file, separated by tabs ('\t'). You need the methods 'string.join' and 'file.write'. 391 | txt_out.---(''.---(key+'\t' for key in results.keys()) + '\n') 392 | 393 | # (iii) Loop through each filename in 'tiflist' (using a for-loop and enumerate of 'tiflist'). For each filename... 394 | for image_id,filename in ---(tiflist): 395 | 396 | # ...write the filename itself. 397 | txt_out.---(--- + "\n") 398 | 399 | # ...extract the corresponding results from 'all_results'. 400 | resultDict = all_results[---] 401 | 402 | # ...iterate over all the cells (using a for-loop and 'enumerate' of 'resultsDict["cell_id"]') and... 403 | for index,value in ---(resultDict["cell_id"]): 404 | 405 | # ...write the data of the cell, separated by '\t'. 406 | txt_out.---(''.---(str(resultDict[key][index])+'\t' for key in resultDict.keys()) + '\n') 407 | 408 | 409 | 410 | #%% 411 | #------------------------------------------------------------------------------ 412 | # SECTION 4 - RATIOMETRIC NORMALIZATION TO CONTROL CHANNEL 413 | 414 | # To correct for technical variability it is often useful to have an internal control, e.g. some fluorophore that we expect to be the same between all analyzed conditions, and use it to normalize other measurements. 415 | 416 | # For example, we can assume that our green channel is just a generic membrane marker, whereas the red channel is a labelled protein of interest. Thus, using the red/green ratio instead of the raw values from the red channel may yield a clearer result when comparing intensity measurements of the red protein of interest between different images. 417 | 418 | #------------------------ 419 | # Part A - Create the ratio 420 | #------------------------ 421 | 422 | # (i) Calculate the ratio of red membrane mean intensity to green membrane mean intensity for each cell in each image. Add the results to the 'result_dict' of each image with a new key, for example 'red_green_mem_ratio'. 423 | 424 | # For each image... 425 | for image_id,resultDict in ---(all_results): 426 | 427 | # Calculate red/green ratio and save it under a new key in result_dict. Done for each cell using list comprehension. 428 | all_results[image_id]["red_green_mem_ratio"] = [resultDict["---"][i] --- resultDict["---"][i] --- i --- range(len(resultDict["cell_id"]))] 429 | 430 | 431 | #------------------------ 432 | # Part B - Make a scatter plot, this time with the ratio 433 | #------------------------ 434 | # (i) Recreate the scatterplot from Section 3, part E, but plotting the ratio over cell size rather than the red membrane mean intensity. 435 | # Prepare colormap to color the cells of each image differently 436 | colors = plt.cm.jet(np.linspace(0,1,len(all_results))) 437 | 438 | # For each image... 439 | for ---,--- in ---(all_results): 440 | 441 | # ...add the data points to the scatter. 442 | plt.---(resultDict["---"],resultDict["---"],color=colors[image_id]) 443 | 444 | # Label axes 445 | plt.xlabel("cell size") 446 | plt.ylabel("red membrane mean") 447 | 448 | # Show or save result 449 | plt.---('BATCH_all_cells_ratio_scatter.png', bbox_inches='tight') 450 | plt.---() 451 | 452 | 453 | # (ii) Compare the two plots. Does the outcome match your expectations? Can you explain the newly 'generated' outliers? 454 | 455 | # Note: Depending on the type of data and the question, normalizing with internal controls can be crucial to arrive at the correct conclusion. However, as you can see from the outliers here, a ratio is not always the ideal approach to normalization. When doing data analysis, you may want to spend some time thinking about how best to normalize your data. Testing different outcomes using 'synthetic' data (created using random number generators) can also help to confirm that your normalization (or your analysis in general) does not bias your results. 456 | 457 | 458 | #------------------------------------------------------------------------------ 459 | #------------------------------------------------------------------------------ 460 | # THIS IS THE END OF THE TUTORIAL. 461 | #------------------------------------------------------------------------------ 462 | #------------------------------------------------------------------------------ 463 | -------------------------------------------------------------------------------- /main_tutorial/tutorial_pipeline_batch_solutions.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Tue Dec 22 00:12:38 2015 4 | 5 | @author: Created by Jonas Hartmann @ Gilmour Group @ EMBL Heidelberg 6 | Edited by Karin Sasaki @ CBM @ EMBL Heidelberg 7 | 8 | @descript: This is the batch version of 'tutorial_pipeline.py', which is an 9 | example pipeline for the segmentation of 2D confocal fluorescence 10 | microscopy images of a membrane marker in confluent epithel-like 11 | cells. This batch version serves to illustrate how such a pipeline 12 | can be run automatically on multiple images that are saved in the 13 | current directory. 14 | 15 | The pipeline is optimized to run with the provided example images, 16 | which are dual-color spinning-disc confocal micrographs (40x) of 17 | two membrane-localized proteins during zebrafish early embryonic 18 | development (~10hpf). 19 | 20 | @requires: Python 2.7 21 | NumPy 1.9, SciPy 0.15 22 | scikit-image 0.11.2, tifffile 0.3.1 23 | """ 24 | 25 | 26 | #%% 27 | #------------------------------------------------------------------------------ 28 | # SECTION 0 - SET UP 29 | 30 | # 1. (Re)check that the segmentation pipeline works! 31 | # You should already have the final pipeline that segments and quantifies one of the example images. Make sure it is working by running it in one go from start to finish. Check that you do not get errors and that output is what you expect. 32 | 33 | # 2. Check that you have the right data! 34 | # We provide two images ('example_cells_1.tif' and 'example_cells_2.tif') to test the batch version of the pipeline. Make sure you have them both ready in the working directory. 35 | 36 | # 3. Deal with Python 2.7 legacy 37 | from __future__ import division 38 | 39 | # 4. EXERCISE: Import modules required by the pipeline 40 | import numpy as np # Array manipulation package 41 | import matplotlib.pyplot as plt # Plotting package 42 | import scipy.ndimage as ndi # Image processing package 43 | 44 | 45 | #%% 46 | #------------------------------------------------------------------------------ 47 | # SECTION 1 - PACKAGE PIPELINE INTO A FUNCTION 48 | 49 | # The goal of this script is to repeatedly run the segmentation algorithm you 50 | # programmed in tutorial_pipeline.py. The easiest way of packaging code to run 51 | # it multiple times is to make it into a function. 52 | 53 | # EXERCISE 54 | # Define a function that... 55 | # ...takes one argument as input: a filename as a string 56 | # ...returns two outputs: the final segmentation and the quantified data 57 | # ...reports that it is finished with the current file just before returning the result. 58 | 59 | # To do this, you need to copy the pipeline you developed, up to section 8, and paste it inside the function. Since the pipeline should run without any supervision by the user, remove (or comment out) any instances where an image would be shown. You can also exclude section 4. Make sure everything is set up such that the function can be called and the entire pipeline will run with the filename that is passed to the function. 60 | 61 | # Recall that to define a new function the syntax is 62 | # def function_name(input arguments): 63 | # """function documentation string""" 64 | # function procedure 65 | # return [expression] 66 | 67 | def pipeline(filename): 68 | 69 | # Report that the pipeline is being executed 70 | print " Starting pipeline for", filename 71 | 72 | # Import tif file 73 | import skimage.io as io # Image file manipulation module 74 | img = io.imread(filename) # Importing multi-color tif file 75 | 76 | # Slicing: We only work on one channel for segmentation 77 | green = img[0,:,:] 78 | 79 | 80 | #------------------------------------------------------------------------------ 81 | # PREPROCESSING AND SIMPLE CELL SEGMENTATION: 82 | # (I) SMOOTHING AND (II) ADAPTIVE THRESHOLDING 83 | 84 | # ------- 85 | # Part I 86 | # ------- 87 | 88 | # Gaussian smoothing 89 | sigma = 3 # Smoothing factor for Gaussian 90 | green_smooth = ndi.filters.gaussian_filter(green,sigma) # Perform smoothing 91 | 92 | 93 | # ------- 94 | # Part II 95 | # ------- 96 | 97 | # Create an adaptive background 98 | struct = ((np.mgrid[:31,:31][0] - 15)**2 + (np.mgrid[:31,:31][1] - 15)**2) <= 15**2 # Create a disk-shaped structural element 99 | from skimage.filters import rank # Import module containing mean filter function 100 | bg = rank.mean(green_smooth, selem=struct) # Run a mean filter over the image using the disc 101 | 102 | # Threshold using created background 103 | green_mem = green_smooth >= bg 104 | 105 | # Clean by morphological hole filling 106 | green_mem = ndi.binary_fill_holes(np.logical_not(green_mem)) 107 | 108 | 109 | #------------------------------------------------------------------------------ 110 | # IMPROVED CELL SEGMENTATION BY SEEDING AND EXPANSION: 111 | # (I) SEEDING BY DISTANCE TRANSFORM 112 | # (II) EXPANSION BY WATERSHED 113 | 114 | # ------- 115 | # Part I 116 | # ------- 117 | 118 | # Distance transform on thresholded membranes 119 | # Advantage of distance transform for seeding: It is quite robust to local 120 | # "holes" in the membranes. 121 | green_dt= ndi.distance_transform_edt(green_mem) 122 | 123 | # Dilating (maximum filter) of distance transform improves results 124 | green_dt = ndi.filters.maximum_filter(green_dt,size=10) 125 | 126 | # Retrieve and label the local maxima 127 | from skimage.feature import peak_local_max 128 | green_max = peak_local_max(green_dt,indices=False,min_distance=10) # Local maximum detection 129 | green_max = ndi.label(green_max)[0] # Labeling 130 | 131 | 132 | # ------- 133 | # Part II 134 | # ------- 135 | 136 | # Get the watershed function and run it 137 | from skimage.morphology import watershed 138 | green_ws = watershed(green_smooth,green_max) 139 | 140 | 141 | #------------------------------------------------------------------------------ 142 | # IDENTIFICATION OF CELL EDGES 143 | 144 | # Define the edge detection function 145 | def edge_finder(footprint_values): 146 | if (footprint_values == footprint_values[0]).all(): 147 | return 0 148 | else: 149 | return 1 150 | 151 | # Iterate the edge finder over the segmentation 152 | green_edges = ndi.filters.generic_filter(green_ws,edge_finder,size=3) 153 | 154 | 155 | #------------------------------------------------------------------------------ 156 | # POSTPROCESSING: REMOVING CELLS AT THE IMAGE BORDER 157 | 158 | # Create a mask for the image boundary pixels 159 | boundary_mask = np.ones_like(green_ws) # Initialize with all ones 160 | boundary_mask[1:-1,1:-1] = 0 # Set middle square to 0 161 | 162 | # Iterate over all cells in the segmentation 163 | current_label = 1 164 | for cell_id in np.unique(green_ws): 165 | 166 | # If the current cell touches the boundary, remove it 167 | if np.sum((green_ws==cell_id)*boundary_mask) != 0: 168 | green_ws[green_ws==cell_id] = 0 169 | 170 | # This is to keep the labeling continuous, which is cleaner 171 | else: 172 | green_ws[green_ws==cell_id] = current_label 173 | current_label += 1 174 | 175 | 176 | #------------------------------------------------------------------------------ 177 | # MEASUREMENTS: SINGLE-CELL AND MEMBRANE READOUTS 178 | 179 | # Initialize a dict for results of choice 180 | results = {"cell_id":[], "green_mean":[], "red_mean":[],"green_membrane_mean":[], 181 | "red_membrane_mean":[],"cell_size":[],"cell_outline":[]} 182 | 183 | # Iterate over segmented cells 184 | for cell_id in np.unique(green_ws)[1:]: 185 | 186 | # Mask the pixels of the current cell 187 | cell_mask = green_ws==cell_id 188 | edge_mask = np.logical_and(cell_mask,green_edges) 189 | 190 | # Get the current cell's values 191 | # Note that the original raw data is used for quantification! 192 | results["cell_id"].append(cell_id) 193 | results["green_mean"].append(np.mean(img[0,:,:][cell_mask])) 194 | results["red_mean"].append(np.mean(img[1,:,:][cell_mask])) 195 | results["green_membrane_mean"].append(np.mean(img[0,:,:][edge_mask])) 196 | results["red_membrane_mean"].append(np.mean(img[1,:,:][edge_mask])) 197 | results["cell_size"].append(np.sum(cell_mask)) 198 | results["cell_outline"].append(np.sum(edge_mask)) 199 | 200 | 201 | #------------------------------------------------------------------------------ 202 | # REPORT PROGRESS AND RETURN RESULTS 203 | 204 | print " Completed pipeline for", filename 205 | 206 | return green_ws, results 207 | 208 | 209 | #%% 210 | #------------------------------------------------------------------------------ 211 | # SECTION 2 - EXECUTION SCRIPT 212 | 213 | # Now that the pipeline function is defined, we can run it for each image file in a directory and collect the results as they are returned. 214 | 215 | 216 | #------------------------ 217 | # Part A - Get the current working directory 218 | #------------------------ 219 | 220 | # Define a variable 'input_dir' with the path to the current working directory, where the images should be saved. In principle, you can also specify any other path where you store your images. 221 | 222 | # Import 'getcwd' and retrieve working directory 223 | from os import getcwd 224 | input_dir = getcwd() 225 | 226 | 227 | #------------------------ 228 | # Part B - Generate a list of image filenames 229 | #------------------------ 230 | 231 | # Make a list of files in the directory 232 | from os import listdir 233 | filelist = listdir(input_dir) 234 | 235 | # Collect the file names only for files that are tifs 236 | 237 | # Note: This is an elegant solution using "list comprehension". 238 | tiflist = [filename for filename in filelist if filename[-4:]=='.tif'] 239 | 240 | # This is the more classical solution. It does exactly the same: 241 | tiflist = [] 242 | for filename in filelist: 243 | if filename[-4:]=='.tif': 244 | tiflist.append(filename) 245 | 246 | # Check that you have the right files in tiflist. 247 | print "Found", len(tiflist), "tif files in target directory" 248 | 249 | 250 | #------------------------ 251 | # Part C - Loop over the tiflist, run the pipeline for each filename and collect the results 252 | #------------------------ 253 | 254 | # Initialise 'all_results' and 'all_segmentations' to store output. 255 | all_results = [] 256 | all_segmentations = [] 257 | 258 | # Initialise a counter to count how many times the pipeline is run succesfully 259 | success_counter = 0 260 | 261 | # Run the actual batch processing 262 | print "Running batch processing" 263 | for filename in tiflist: # For every file in tiflist... 264 | 265 | # Exception handling so the program can move on if one image fails for some reason. 266 | try: 267 | 268 | # Run the pipeline 269 | seg,results = pipeline(filename) 270 | 271 | # Add the results to our collection lists 272 | all_results.append(results) 273 | all_segmentations.append(seg) 274 | 275 | # Update the success counter 276 | success_counter += 1 277 | 278 | # What to do if something goes wrong. 279 | except Exception: 280 | 281 | # Warn the user, then carry on with the next file 282 | from warnings import warn 283 | warn("There was an exception in " + filename + "!!!") 284 | 285 | 286 | #------------------------ 287 | # Part D - Print a short summary 288 | #------------------------ 289 | 290 | # How many images were successfully analyzed? 291 | print "Successfully analyzed", success_counter, "of", len(tiflist), "images" 292 | 293 | # How many cells were segmented in total. 294 | num_cells = 0 295 | for resultDict in all_results: # For each image... 296 | num_cells = num_cells + len(resultDict["cell_id"]) # ...add the number of cells detected. 297 | 298 | # Print a statement that reports the final count of cells detected, for all images segmented. 299 | print "Detected", num_cells, "cells in total" 300 | 301 | 302 | #------------------------ 303 | # Part E - Quick visualisation of results 304 | #------------------------ 305 | 306 | # Scatter plot of red membrane mean intensity over cell size. 307 | 308 | # Prepare colormap to color cells from each image differently 309 | colors = plt.cm.jet(np.linspace(0,1,len(all_results))) 310 | 311 | # For each analyzed image... 312 | for image_id,resultDict in enumerate(all_results): 313 | 314 | # ...add the datapoints to the plot. 315 | plt.scatter(resultDict["cell_size"],resultDict["red_membrane_mean"],color=colors[image_id]) 316 | 317 | # Label axes 318 | plt.xlabel("cell size") 319 | plt.ylabel("red membrane mean") 320 | 321 | # Show or save result 322 | plt.savefig('BATCH_all_cells_scatter.png', bbox_inches='tight') 323 | plt.show() 324 | 325 | 326 | #------------------------ 327 | # Part F - Save all the segmentations as a "3D" tif 328 | #------------------------ 329 | 330 | # Convert 'all_segmentations' to a 3D numpy array 331 | all_segmentations = np.array(all_segmentations) 332 | 333 | # Save the result to a tif file using the 'imsave' function from the 'tifffile' module 334 | from tifffile import imsave 335 | imsave("BATCH_segmentations.tif",all_segmentations,bigtiff=True) 336 | 337 | 338 | #------------------------ 339 | # Part G - Save the quantification data as a txt file 340 | #------------------------ 341 | 342 | # Open an empty file using the context manager ('with') 343 | with open("BATCH_results.txt","w") as txt_out: 344 | 345 | # Write the headers (note the use of a "list comprehension"-style in-line for-loop) 346 | txt_out.write(''.join(key+'\t' for key in results.keys()) + '\n') 347 | 348 | # For each analyzed image... 349 | for image_id,filename in enumerate(tiflist): 350 | 351 | # ...write the filename 352 | txt_out.write(filename + "\n") 353 | 354 | # ...extract the corresponding results 355 | resultDict = all_results[image_id] 356 | 357 | # ...iterate over cells... 358 | for index,value in enumerate(resultDict["cell_id"]): 359 | 360 | # ...and write cell data (note the use of a "list comprehension"-style in-line for-loop) 361 | txt_out.write(''.join(str(resultDict[key][index])+'\t' for key in resultDict.keys()) + '\n') 362 | 363 | 364 | #%% 365 | #------------------------------------------------------------------------------ 366 | # SECTION 4 - RATIOMETRIC NORMALIZATION TO CONTROL CHANNEL 367 | 368 | # To correct for technical variability it is often useful to have an internal control, e.g. some fluorophore that we expect to be the same between all analyzed conditions, and use it to normalize other measurements. 369 | 370 | # For example, we can assume that our green channel is just a generic membrane marker, whereas the red channel is a labelled protein of interest. Thus, using the red/green ratio instead of the raw values from the red channel may yield a clearer result when comparing intensity measurements of the red protein of interest between different images. 371 | 372 | #------------------------ 373 | # Part A - Create the ratio 374 | #------------------------ 375 | 376 | # For each image... 377 | for image_id,resultDict in enumerate(all_results): 378 | 379 | # Calculate red/green ratio and save it under a new key in result_dict. Done for each cell using list comprehension. 380 | all_results[image_id]["red_green_mem_ratio"] = [resultDict["red_membrane_mean"][i] / resultDict["green_membrane_mean"][i] for i in range(len(resultDict["cell_id"]))] 381 | 382 | 383 | #------------------------ 384 | # Part B - Make a scatter plot, this time with the ratio 385 | #------------------------ 386 | 387 | # Scatterplot of red/green ratio over cell size. 388 | 389 | # Prepare colormap to color the cells of each image differently 390 | colors = plt.cm.jet(np.linspace(0,1,len(all_results))) 391 | 392 | # For each image... 393 | for image_id,resultDict in enumerate(all_results): 394 | 395 | # ...add the data points to the scatter. 396 | plt.scatter(resultDict["cell_size"],resultDict["red_green_mem_ratio"],color=colors[image_id]) 397 | 398 | # Label axes 399 | plt.xlabel("cell size") 400 | plt.ylabel("red membrane mean") 401 | 402 | # Show or save result 403 | plt.savefig('BATCH_all_cells_ratio_scatter.png', bbox_inches='tight') 404 | plt.show() 405 | 406 | 407 | #------------------------------------------------------------------------------ 408 | #------------------------------------------------------------------------------ 409 | # THIS IS THE END OF THE TUTORIAL. 410 | #------------------------------------------------------------------------------ 411 | #------------------------------------------------------------------------------ 412 | -------------------------------------------------------------------------------- /main_tutorial/tutorial_pipeline_solutions.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Tue Dec 22 00:12:38 2015 4 | 5 | @author: Jonas Hartmann @ Gilmour Group @ EMBL Heidelberg 6 | Edited by Karin Sasaki @ CBM @ EMBL Heidelberg 7 | 8 | @descript: This is an example pipeline for the segmentation of 2D confocal 9 | fluorescence microscopy images of a membrane marker in confluent 10 | epithel-like cells. It exemplifies many fundamental concepts of 11 | automated image processing and segmentation. 12 | 13 | The pipeline is optimized to run with the provided example images, 14 | which are dual-color spinning-disc confocal micrographs (40x) of 15 | two membrane-localized proteins during zebrafish early embryonic 16 | development (~10hpf). 17 | 18 | 'tutorial_pipeline_batch.py' shows how the same pipeline could be 19 | adapted to run automatically on multiple images in a directory. 20 | 21 | @requires: Python 2.7 22 | NumPy 1.9, SciPy 0.15 23 | scikit-image 0.11.2, tifffile 0.3.1 24 | """ 25 | 26 | #%% 27 | #------------------------------------------------------------------------------ 28 | # SECTION 0 - SET UP 29 | 30 | # 1. Remember that you can develop this pipeline using 31 | # a) a simple text editor and running it on the terminal 32 | # b) the Spyder IDE or 33 | # c) a Jupyter notebook. 34 | 35 | # 2. Make sure that all your python and image files are in the same directory, then make that directory your working directory. 36 | # - On the terminal, type "cd dir_path", replacing dir_path for the path of the directory 37 | # - In Spyder and Jupyter nootebook it can be done interactively. 38 | 39 | # 3. Python is continuously under development to get better and better. In some rare cases, these new improvements need to be specifically imported to be active. One such case is the division operation in Python 2.7, which has some undesired behavior for the division of integer numbers. We can easily fix this by importing the new and improved division function from Python 3. It makes sense to do this at the start of all Python 2.7 scripts. 40 | from __future__ import division 41 | 42 | # 4. This script consists of explanations and exercises that guide you to complete the pipeline. It is designed to give you a guided experience of what "real programming" is like. This is one of the reasons why the pre-tutorial is provided as a Jupyter Notebook, but this main tutorial is not; we, and our colleagues, mostly develop programs using a text editor and the terminal. In that same spirit, if you already have access to the solutions, we recommend that you try to solve the tutorial alone, without looking at them. 43 | 44 | # 5. If you are not feeling comfortable with the exercises, there is a partially-solved version that you can also follow. 45 | 46 | 47 | #%% 48 | #------------------------------------------------------------------------------ 49 | # SECTION 1 - IMPORT MODULES AND PACKAGES 50 | 51 | from __future__ import division # Python 2.7 legacy 52 | import numpy as np # Array manipulation package 53 | import matplotlib.pyplot as plt # Plotting package 54 | import scipy.ndimage as ndi # Image processing package 55 | 56 | 57 | #%% 58 | #------------------------------------------------------------------------------ 59 | # SECTION 2 - IMPORT AND PREPARE DATA 60 | 61 | # Image processing essentially means carrying out mathematical operations on images. For this purpose, it is useful to represent image data in orderly data structures called "arrays" for which many mathematical operations are well defined. Arrays are grids with rows and columns that are filled with numbers; in the case of image data, those numbers correspond to the pixel values of the image. Arrays can have any number of dimensions (or "axes"). For example, a 2D array could represent the x and y axis of a normal image, a 3D array could contain a z-stack (xyz), a 4D array could also have multiple channels for each image (xyzc) and a 5D array could have time on top of that (xyzct). 62 | 63 | # EXERCISE 64 | # We will now proceed to import the image data, verifying we get what we expect and specifying the data we will work with. Before you start, it makes sense to have a quick look at the data in Fiji/imagej so you know what you are working with. 65 | 66 | # Specify filename 67 | filename = "example_cells_1.tif" 68 | 69 | # Import tif files 70 | import skimage.io as io # Image file manipulation module 71 | img = io.imread(filename) # Importing multi-color tif file 72 | 73 | # Check that everything is in order 74 | print type(img) # Check that img is a variable of type ndarray 75 | print img.dtype # Check data type is 8uint 76 | print "Loaded array has shape", img.shape # Printing array shape; 2 colors, 930 by 780 pixels 77 | 78 | # Show image 79 | plt.imshow(img[0,:,:],interpolation='none',cmap='gray') # Showing one of the channels (notice "interpolation='none'"!) 80 | plt.show() 81 | 82 | # Slicing: We only work on one channel for segmentation 83 | green = img[0,:,:] 84 | 85 | 86 | #%% 87 | #------------------------------------------------------------------------------ 88 | # SECTION 3 - PREPROCESSING AND SIMPLE CELL SEGMENTATION: 89 | # (I) SMOOTHING AND (II) ADAPTIVE THRESHOLDING 90 | 91 | # ------- 92 | # Part I 93 | # ------- 94 | 95 | # Gaussian smoothing 96 | sigma = 3 # Smoothing factor for Gaussian 97 | green_smooth = ndi.filters.gaussian_filter(green,sigma) # Perform smoothing 98 | 99 | # visualise 100 | plt.imshow(green_smooth,interpolation='none',cmap='gray') 101 | plt.show() 102 | 103 | 104 | # ------- 105 | # Part II 106 | # ------- 107 | 108 | # Create an adaptive background 109 | struct = ((np.mgrid[:31,:31][0] - 15)**2 + (np.mgrid[:31,:31][1] - 15)**2) <= 15**2 # Create a disk-shaped structural element 110 | from skimage.filters import rank # Import module containing mean filter function 111 | bg = rank.mean(green_smooth, selem=struct) # Run a mean filter over the image using the disc 112 | 113 | # Threshold using created background 114 | green_mem = green_smooth >= bg 115 | 116 | # Clean by morphological hole filling 117 | green_mem = ndi.binary_fill_holes(np.logical_not(green_mem)) 118 | 119 | # Show the result 120 | plt.imshow(green_mem,interpolation='none',cmap='gray') 121 | plt.show() 122 | 123 | 124 | 125 | #%% 126 | #------------------------------------------------------------------------------ 127 | # SECTION 4 - CONNECTED COMPONENTS LABELING (OR: "WE COULD BE DONE NOW") 128 | 129 | # If the data is clean and we just want a very quick cell or membrane segmentation, we could be done now. All we would still need to do is to label the individual cells - in other words, to give each separate "connected component" an individual number. 130 | 131 | # Labeling connected components 132 | green_components = ndi.label(green_mem)[0] 133 | plt.imshow(green_components,interpolation='none', cmap='gray') 134 | plt.show() 135 | 136 | # The result you get here should look not to bad but will likely still have some problems. For example, some cells will be connected because there were small gaps between them in the membrane. Also, the membranes themselves are not partitioned to the individual cells, so we cannot make measurements of membrane intensities for each cell. These problems can be resolved by means of a "seeding-expansion" strategy, which we will implement below. 137 | 138 | 139 | 140 | #%% 141 | #------------------------------------------------------------------------------ 142 | # SECTION 5 - IMPROVED CELL SEGMENTATION BY SEEDING AND EXPANSION: 143 | # (I) SEEDING BY DISTANCE TRANSFORM 144 | # (II) EXPANSION BY WATERSHED 145 | # 146 | # Part I - Seeding refers to the identification of 'seeds', a few pixels that can assigned to each particular cell with great certainty. If available, a channel showing the cell nuclei is often used for seeding. However, using the membrane segmentation we have developed above, we can also generate relatively reliable seeds without the need to image nuclei. 147 | # Part II - The generated seeds are expanded into regions of the image where the cell assignment is less clear-cut than in the seed region itself. The goal is to expand each seed exactly up to the borders of the corresponding cell, resulting in a full segmentation. The watershed technique is the most common algorithm for expansion. 148 | 149 | # ------- 150 | # Part I 151 | # ------- 152 | 153 | # Distance transform on thresholded membranes 154 | # Advantage of distance transform for seeding: It is quite robust to local 155 | # "holes" in the membranes. 156 | green_dt= ndi.distance_transform_edt(green_mem) 157 | plt.imshow(green_dt,interpolation='none') 158 | plt.show() 159 | 160 | # Dilating (maximum filter) of distance transform improves results 161 | green_dt = ndi.filters.maximum_filter(green_dt,size=10) 162 | plt.imshow(green_dt,interpolation='none') 163 | plt.show() 164 | 165 | # Retrieve and label the local maxima 166 | from skimage.feature import peak_local_max 167 | green_max = peak_local_max(green_dt,indices=False,min_distance=10) # Local maximum detection 168 | green_max = ndi.label(green_max)[0] # Labeling 169 | 170 | # Show maxima as masked overlay 171 | plt.imshow(green_smooth,cmap='gray',interpolation='none') 172 | plt.imshow(np.ma.array(green_max,mask=green_max==0),interpolation='none') 173 | plt.show() 174 | 175 | 176 | # ------- 177 | # Part II 178 | # ------- 179 | 180 | # Watershedding is a relatively simple but powerful algorithm for expanding seeds. The image intensity is considered as a topographical map (with high intensities being "mountains" and low intensities "valleys") and water is poured into the valleys, starting from each of the seeds. The water first labels the lowest intensity pixels around the seeds, then continues to fill up. The cell boundaries (the 'mountains') are where the "waterfronts" between different seeds ultimately touch and stop expanding. 181 | 182 | # Get the watershed function and run it 183 | from skimage.morphology import watershed 184 | green_ws = watershed(green_smooth,green_max) 185 | 186 | # Show result as transparent overlay 187 | # Note: For a better visualization, see "FINDING CELL EDGES" below! 188 | plt.imshow(green_smooth,cmap='gray',interpolation='none') 189 | plt.imshow(green_ws,interpolation='none',alpha=0.7) 190 | plt.show() 191 | 192 | # OBSERVATION 193 | # Note that the previously connected cells are now mostly separated and the membranes are partitioned to their respective cells. Depending on the quality of the seeding, however, there may now be some cases of oversegmentation (a single cell split into multiple segmentation objects). This is a typical example of the trade-off between specificity and sensitivity one always has to face in computational classification tasks. As an advanced task, you can try to think of ways to fuse the wrongly oversegmented cells back together. 194 | 195 | 196 | #%% 197 | #------------------------------------------------------------------------------ 198 | # SECTION 6 - IDENTIFICATION OF CELL EDGES 199 | 200 | # Now that we have a full cell segmentation, we can retrieve the cell edges, that is the pixels bordering neighboring cells. This is useful for many purposes; in our case, for example, edge intensities are a good measure of membrane intensity, which may be a desired readout. The length of the edge (relative to cell size) is also an informative feature about the cell shape. Finally, showing colored edges is a nice way of visualizing cell segmentations. 201 | 202 | # There are many ways of identifying edge pixels in a fully labeled segmentation. It can be done using erosion or dilation, for example, or it can be done in an extremely fast and fully vectorized way (for this, see "Vectorization" in the optional advanced content). Here, we use a slow but intuitive method that also serves to showcase the 'generic_filter' function in ndimage. 203 | 204 | # 'ndi.filters.generic_filter' is a powerful way of quickly iterating any function over numpy arrays (including functions that use a structuring element). 'generic_filter' iterates a structure element over all the values in an array and passes the corresponding values to a user-defined function. The result returned by this function is then allocated to the pixel in the image that corresponds to the origin of the se. Check the documentation to find out more about the arguments for 'generic_filter'. 205 | 206 | # Define the edge detection function 207 | def edge_finder(footprint_values): 208 | if (footprint_values == footprint_values[0]).all(): 209 | return 0 210 | else: 211 | return 1 212 | 213 | # Iterate the edge finder over the segmentation 214 | green_edges = ndi.filters.generic_filter(green_ws,edge_finder,size=3) 215 | 216 | # Label the detected edges based on the underlying cells 217 | green_edges_labeled = green_edges * green_ws 218 | 219 | # Show them as masked overlay 220 | plt.imshow(green_smooth,cmap='gray',interpolation='none') 221 | plt.imshow(np.ma.array(green_edges_labeled,mask=green_edges_labeled==0),interpolation='none') 222 | plt.show() 223 | 224 | 225 | 226 | 227 | #%% 228 | #------------------------------------------------------------------------------ 229 | # SECTION 7 - POSTPROCESSING: REMOVING CELLS AT THE IMAGE BORDER 230 | 231 | # Segmentation is never perfect and it often makes sense to remove artefacts afterwards. For example, one could filter out objects that are too small, have a very strange shape, or very strange intensity values. Note that this is equivalent to the removal of outliers in data analysis and should only be done for good reason and with caution. 232 | 233 | # As an example of postprocessing, we will now filter out a particular group of problematic cells: those that are cut off at the image border. 234 | 235 | 236 | # Create a mask for the image boundary pixels 237 | boundary_mask = np.ones_like(green_ws) # Initialize with all ones 238 | boundary_mask[1:-1,1:-1] = 0 # Set middle square to 0 239 | 240 | # Iterate over all cells in the segmentation 241 | current_label = 1 242 | for cell_id in np.unique(green_ws): 243 | 244 | # If the current cell touches the boundary, remove it 245 | if np.sum((green_ws==cell_id)*boundary_mask) != 0: 246 | green_ws[green_ws==cell_id] = 0 247 | 248 | # This is to keep the labeling continuous, which is cleaner 249 | else: 250 | green_ws[green_ws==cell_id] = current_label 251 | current_label += 1 252 | 253 | # Show result as transparent overlay 254 | plt.imshow(green_smooth,cmap='gray',interpolation='none') 255 | plt.imshow(np.ma.array(green_ws,mask=green_ws==0),interpolation='none',alpha=0.7) 256 | plt.show() 257 | 258 | 259 | #%% 260 | #------------------------------------------------------------------------------ 261 | # SECTION 8 - MEASUREMENTS: SINGLE-CELL AND MEMBRANE READOUTS 262 | 263 | # Now that the cells and membranes in the image are segmented, we can quantify various readouts for every cell individually. Readouts can be based on the intensity in different channels in the original image or on the size and shape of the cells themselves. 264 | 265 | # To exemplify how different properties of cells can be measured, we will quantify the following: 266 | # Cell ID (so all other measurements can be traced back to the cell that was measured) 267 | # Mean intensity of each cell, for each channel 268 | # Mean intensity at the membrane of each cell, for each channel 269 | # The cell size, in terms of the number of pixels that make up the cell 270 | # The cell outline length, in terms of the number of pixels that make up the cell boundary 271 | 272 | # We will use a dictionary to collect all the information in an orderly fashion. 273 | 274 | 275 | # Initialize a dict for results of choice 276 | results = {"cell_id":[], "green_mean":[], "red_mean":[],"green_membrane_mean":[], 277 | "red_membrane_mean":[],"cell_size":[],"cell_outline":[]} 278 | 279 | # Iterate over segmented cells 280 | for cell_id in np.unique(green_ws)[1:]: 281 | 282 | # Mask the pixels of the current cell 283 | cell_mask = green_ws==cell_id 284 | edge_mask = np.logical_and(cell_mask,green_edges) 285 | 286 | # Get the current cell's values 287 | # Note that the original raw data is used for quantification! 288 | results["cell_id"].append(cell_id) 289 | results["green_mean"].append(np.mean(img[0,:,:][cell_mask])) 290 | results["red_mean"].append(np.mean(img[1,:,:][cell_mask])) 291 | results["green_membrane_mean"].append(np.mean(img[0,:,:][edge_mask])) 292 | results["red_membrane_mean"].append(np.mean(img[1,:,:][edge_mask])) 293 | results["cell_size"].append(np.sum(cell_mask)) 294 | results["cell_outline"].append(np.sum(edge_mask)) 295 | 296 | 297 | #%% 298 | #------------------------------------------------------------------------------ 299 | # SECTION 9 - SIMPLE ANALYSIS AND VISUALIZATION 300 | 301 | # Now that you have collected the readouts to a dictionary you can analyse them in any way you wish. This section shows how to do basic plotting and analysis of the results, including mapping the data back onto the image (as a 'heatmap') and producing boxplots, scatterplots and a linear fit. A more in-depth example of how to couple image analysis into advanced data analysis can be found in 'data_analysis' in the 'optional_advanced_material' directory. 302 | 303 | 304 | # (i) Print out the results you want to see 305 | for key in results.keys(): 306 | print "\n" + key 307 | print results[key] 308 | 309 | 310 | # (ii) Make box plots of the cell and membrane intensities, for both channels. 311 | plt.boxplot([results[key] for key in results.keys()][2:-1],labels=results.keys()[2:-1]) 312 | plt.show() 313 | 314 | 315 | # (iii) 316 | # Make a scatter plot to show if there is a dependancy of the membrane intensity (either channel) on the cell size (for example) and add the linear fit to scatter plot, to see the correlation. 317 | 318 | # Import the module stats from scipy 319 | from scipy import stats 320 | 321 | # Linear fit of cell size vs membrane intensity 322 | linfit = stats.linregress(results["cell_size"],results["red_membrane_mean"]) 323 | 324 | # Make scatter plot. 325 | plt.scatter(results["cell_size"],results["red_membrane_mean"]) 326 | plt.xlabel("cell size") 327 | plt.ylabel("red_membrane_mean") 328 | 329 | # Define the equation of the line that fits the data, using an anonymous function 330 | fit = lambda x: linfit[0] * x + linfit[1] 331 | 332 | # Get the fitted values (for graph limits) 333 | ax = plt.gca() 334 | x_lims = ax.get_xlim() 335 | fit_vals = map(fit,x_lims) 336 | 337 | # Plot the line 338 | plt.gca().set_autoscale_on(False) # Prevent the figure from rescaling when line is added, by using plt.gca().set_autoscale_on(False) 339 | plt.plot(x_lims,fit_vals,'r-',lw=2) 340 | plt.show() 341 | 342 | 343 | 344 | # (iv) Print out results from stats analysis 345 | linnames = ["slope","intercept","r-value","p-value","stderr"] # Names of esults of stats.linregress 346 | print "\nLinear fit of cell size to red membrane intensity" # Header 347 | for index,value in enumerate(linfit): # For each value... 348 | print " " + linnames[index] + "\t\t" + str(value) # ...print the result 349 | print " r-squared\t\t" + str(linfit[2]**2) # Also print R-squared 350 | 351 | 352 | # (v) Map the cell size and cell membrane back onto the image. 353 | sizes_8bit = results["cell_size"] / max(results["cell_size"]) * 255 # Map to 8bit 354 | size_map = np.zeros_like(green_ws,dtype=np.uint8) # Initialize image 355 | for index,cell_id in enumerate(np.unique(green_ws)[1:]): # Iterate over cells 356 | size_map[green_ws==cell_id] = sizes_8bit[index] # Assign corresponding cell size to cell pixels 357 | 358 | 359 | plt.imshow(green_smooth,cmap='gray',interpolation='none') # Set grayscale background image 360 | plt.imshow(np.ma.array(size_map,mask=size_map==0),interpolation='none',alpha=0.7) # Colored overlay 361 | plt.show() 362 | 363 | 364 | # (vi) 365 | # Note that this seems to return a highly significant p-value but a very low 366 | # correlation coefficient (r-value). We also would not expect this correlation 367 | # to be present in our data. This should prompt several considerations: 368 | # 1) What does this p-value actually mean? See help(stats.linregress) 369 | # 2) Since we have not filtered properly for artefacts (e.g. "cells" of very 370 | # small size), they might bias this particular fit. 371 | # 3) We're now working with a lot of datapoints. This can skew statistical 372 | # analyses! To some extent, we can correct for this by multiple testing 373 | # correction and by comparison with randomized datasets. Additionaly, a 374 | # closer look at Bayesian statistics is highly recommended for people 375 | # working with large datasets. 376 | 377 | 378 | #%% 379 | #------------------------------------------------------------------------------ 380 | # SECTION 10 - MEASUREMENTS: WRITING OUTPUT 381 | 382 | # There are several ways of presenting the output of a program. Data can be saved to files in a human-readable format (e.g. text files (e.g. to import into Excel), images, etc), or written to language-specific files for future use (i.e. instead of having to run the whole program again). Here you will learn some of this possibilities. 383 | 384 | 385 | # (i) 386 | # Write an image to a tif (could be opened e.g. in Fiji) 387 | 388 | # Get file handling function 389 | from tifffile import imsave 390 | 391 | # Save array to tif 392 | imsave(filename+"_labeledEdges.tif",green_edges_labeled,bigtiff=True) 393 | 394 | 395 | 396 | # (ii) 397 | # Write a figure to a png or pdf 398 | 399 | # Recreate scatter plot from above 400 | plt.scatter(results["cell_size"],results["red_membrane_mean"]) 401 | plt.xlabel("cell size") 402 | plt.ylabel("red membrane mean") 403 | 404 | # Save to png (rasterized) 405 | plt.savefig(filename+'_scatter.png', bbox_inches='tight') 406 | 407 | # Save to pdf (vectorized) 408 | plt.savefig(filename+'_scatter.pdf', bbox_inches='tight') 409 | 410 | 411 | 412 | # (iii) 413 | # Write a python file that can be reloaded in other Python programs 414 | import json 415 | with open('k_resultsDict.json', 'w') as fp: 416 | json.dump(results, fp) 417 | 418 | # This could be loaded again in this way: 419 | #with open(filename+'_resultsDict.json', 'r') as fp: 420 | # results = json.load(fp) 421 | 422 | 423 | # (iv) 424 | # Write a text file of the numerical data gathered (could be opened e.g. in Excel) 425 | with open(filename+"_output.txt","w") as txt_out: # Open an empty file object (with context manager) 426 | txt_out.write(''.join(key+'\t' for key in results.keys()) + '\n') # Write the headers 427 | for index,value in enumerate(results["cell_id"]): # Iterate over cells 428 | txt_out.write(''.join(str(results[key][index])+'\t' for key in results.keys()) + '\n') # Write cell data 429 | 430 | 431 | #------------------------------------------------------------------------------ 432 | #------------------------------------------------------------------------------ 433 | # THIS IS THE END OF THE TUTORIAL. 434 | #------------------------------------------------------------------------------ 435 | #------------------------------------------------------------------------------ 436 | 437 | 438 | -------------------------------------------------------------------------------- /optional_advanced_content/Multiprocessing/batch_multiprocessing.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Tue Dec 22 00:12:38 2015 4 | 5 | @author: Jonas Hartmann @ Gilmour Group @ EMBL Heidelberg 6 | 7 | @descript: A cleaned version of the batch segmentation pipeline, ready for 8 | multiprocessed execution. For further information, please see 9 | example_multiprocessing.py. 10 | 11 | @requires: Python 2.7 12 | NumPy 1.9, SciPy 0.15, scikit-image 0.11.3 13 | """ 14 | 15 | 16 | # IMPORT STUFF 17 | from __future__ import division # Python 2.7 legacy 18 | import numpy as np # Array manipulation package 19 | import scipy.ndimage as ndi # Image processing package 20 | 21 | 22 | #------------------------------------------------------------------------------ 23 | 24 | # PIPELINE FUNCTION 25 | 26 | def pipeline(filename): 27 | 28 | 29 | #------------------------------------------------------------------------------ 30 | 31 | # IMPORT AND SLICE DATA 32 | 33 | # Check if input filename exists, else return an error 34 | from os.path import isfile 35 | if not isfile(filename): 36 | from warnings import warn 37 | warn("Could not find file" + filename + '.tif') 38 | return "ERROR" 39 | 40 | # Import tif files 41 | import skimage.io as io # Image file manipulation module 42 | img = io.imread(filename) # Importing multi-color tif file 43 | img = np.array(img) # Converting MultiImage object to numpy array 44 | 45 | # Slicing: We only work on one channel for segmentation 46 | green = img[0,:,:] 47 | 48 | 49 | #------------------------------------------------------------------------------ 50 | 51 | # PREPROCESSING: SMOOTHING AND ADAPTIVE THRESHOLDING 52 | # It's standard to smoothen images to reduce technical noise - this improves 53 | # all subsequent image processing steps. Adaptive thresholding allows the 54 | # masking of foreground objects even if the background intensity varies across 55 | # the image. 56 | 57 | # Gaussian smoothing 58 | sigma = 3 # Smoothing factor for Gaussian 59 | green_smooth = ndi.filters.gaussian_filter(green,sigma) # Perform smoothing 60 | 61 | # Create an adaptive background 62 | #struct = ndi.iterate_structure(ndi.generate_binary_structure(2,1),24) # Create a diamond-shaped structural element 63 | struct = ((np.mgrid[:31,:31][0] - 15)**2 + (np.mgrid[:31,:31][1] - 15)**2) <= 15**2 # Create a disk-shaped structural element 64 | bg = ndi.filters.generic_filter(green_smooth,np.mean,footprint=struct) # Run a mean filter over the image using the disc 65 | 66 | # Threshold using created background 67 | green_thresh = green_smooth >= bg 68 | 69 | # Clean by morphological hole filling 70 | green_thresh = ndi.binary_fill_holes(np.logical_not(green_thresh)) 71 | 72 | # Show the result 73 | # plt.imshow(green_thresh,interpolation='none',cmap='gray') 74 | # plt.show() 75 | 76 | 77 | #------------------------------------------------------------------------------ 78 | 79 | # (SIDE NOTE: WE COULD BE DONE NOW) 80 | # If the data is very clean and/or we just want a quick look, we could simply 81 | # label all connected pixels now and consider the result our segmentation. 82 | 83 | # Labeling connected components 84 | # green_label = ndi.label(green_thresh)[0] 85 | # plt.imshow(green_label,interpolation='none') 86 | # plt.show() 87 | 88 | # However, to also partition the membranes to the cells, to generally improve 89 | # the segmentatation (e.g. split cells that end up connected here) and to 90 | # handle more complicated morphologies or to deal with lower quality data, 91 | # this approach is not sufficient. 92 | 93 | 94 | #------------------------------------------------------------------------------ 95 | 96 | # SEGMENTATION: SEEDING BY DISTANCE TRANSFORM 97 | # More advanced segmentation is usually a combination of seeding and expansion. 98 | # In seeding, we want to find a few pixels for each cell that we can assign to 99 | # said cell with great certainty. These 'seeds' are then expanded to partition 100 | # regions of the image where cell affiliation is less clear-cut. 101 | 102 | # Distance transform on thresholded membranes 103 | # Advantage of distance transform for seeding: It is quite robust to local 104 | # "holes" in the membranes. 105 | green_dt= ndi.distance_transform_edt(green_thresh) 106 | # plt.imshow(green_dt,interpolation='none') 107 | # plt.show() 108 | 109 | # Dilating (maximum filter) of distance transform improves results 110 | green_dt = ndi.filters.maximum_filter(green_dt,size=10) 111 | # plt.imshow(green_dt,interpolation='none') 112 | # plt.show() 113 | 114 | # Retrieve and label the local maxima 115 | from skimage.feature import peak_local_max 116 | green_max = peak_local_max(green_dt,indices=False,min_distance=10) # Local maximum detection 117 | green_max = ndi.label(green_max)[0] # Labeling 118 | 119 | # Show maxima as masked overlay 120 | # plt.imshow(green_smooth,cmap='gray',interpolation='none') 121 | # plt.imshow(np.ma.array(green_max,mask=green_max==0),interpolation='none') 122 | # plt.show() 123 | 124 | 125 | #------------------------------------------------------------------------------ 126 | 127 | # SEGMENTATION: EXPANSION BY WATERSHED 128 | # Watershedding is a relatively simple but powerful algorithm for expanding 129 | # seeds. The image intensity is considered as a topographical map (with high 130 | # intensities being "mountains" and low intensities "valleys") and water is 131 | # poured into the valleys from each of the seeds. The water first labels the 132 | # lowest intensity pixels around the seeds, then continues to fill up. The cell 133 | # boundaries are where the waterfronts between different seeds touch. 134 | 135 | # Get the watershed function and run it 136 | from skimage.morphology import watershed 137 | green_ws = watershed(green_smooth,green_max) 138 | 139 | # Show result as transparent overlay 140 | # Note: For a better visualization, see "FINDING CELL EDGES" below! 141 | # plt.imshow(green_smooth,cmap='gray',interpolation='none') 142 | # plt.imshow(green_ws,interpolation='none',alpha=0.7) 143 | # plt.show() 144 | 145 | # Notice that the previously connected cells are now mostly separated and the 146 | # membranes are partitioned to their respective cells. 147 | # ...however, we now see a few cases of oversegmentation! 148 | # This is a typical example of the trade-offs one has to face in any 149 | # computational classification task. 150 | 151 | 152 | #------------------------------------------------------------------------------ 153 | 154 | # POSTPROCESSING: REMOVING CELLS AT THE IMAGE BORDER 155 | # Since segmentation is never perfect, it often makes sense to remove artefacts 156 | # after the segmentation. For example, one could filter out cells that are too 157 | # big, have a strange shape, or strange intensity values. Similarly, supervised 158 | # machine learning can be used to identify cells of interest based on a 159 | # combination of various features. Another example of cells that should be 160 | # removed are those at the image boundary. 161 | 162 | # Create a mask for the image boundary pixels 163 | boundary_mask = np.ones_like(green_ws) # Initialize with all ones 164 | boundary_mask[1:-1,1:-1] = 0 # Set middle square to 0 165 | 166 | # Iterate over all cells in the segmentation 167 | current_label = 1 168 | for cell_id in np.unique(green_ws): 169 | 170 | # If the current cell touches the boundary, remove it 171 | if np.sum((green_ws==cell_id)*boundary_mask) != 0: 172 | green_ws[green_ws==cell_id] = 0 173 | 174 | # This is to keep the labeling continuous, which is cleaner 175 | else: 176 | green_ws[green_ws==cell_id] = current_label 177 | current_label += 1 178 | 179 | # Show result as transparent overlay 180 | # plt.imshow(green_smooth,cmap='gray',interpolation='none') 181 | # plt.imshow(np.ma.array(green_ws,mask=green_ws==0),interpolation='none',alpha=0.7) 182 | # plt.show() 183 | 184 | 185 | #------------------------------------------------------------------------------ 186 | 187 | # MEASUREMENTS: FINDING CELL EDGES 188 | # Finding cell edges is very useful for many purposes. In our example, edge 189 | # intensities are a measure of membrane intensities, which may be a desired 190 | # readout. The length of the edge (relative to cell size) is also a quite 191 | # informative feature about the cell shape. Finally, showing colored edges is 192 | # a nice way of visualizing segmentations. 193 | 194 | # How this works: The generic_filter function (see further below) iterates a 195 | # structure element (in this case a 3x3 square) over an image and passes all 196 | # the values within that element to some arbitrary function (in this case 197 | # edge_finder). The edge_finder function checks if all these pixels are the 198 | # same; if they are, the current pixel is not at an edge (return 0), otherwise 199 | # it is (return 1). generic_filter takes the returned values and organizes them 200 | # into an image again by setting the central pixel of each 3x3 square to the 201 | # respective return value from edge_finder. 202 | 203 | # Define the edge detection function 204 | def edge_finder(footprint_values): 205 | if (footprint_values == footprint_values[0]).all(): 206 | return 0 207 | else: 208 | return 1 209 | 210 | # Iterate the edge finder over the segmentation 211 | green_edges = ndi.filters.generic_filter(green_ws,edge_finder,size=3) 212 | 213 | # Label the detected edges based on the underlying cells 214 | # green_edges_labeled = green_edges * green_ws 215 | 216 | # Show them as masked overlay 217 | # plt.imshow(green_smooth,cmap='gray',interpolation='none') 218 | # plt.imshow(np.ma.array(green_edges_labeled,mask=green_edges_labeled==0),interpolation='none') 219 | # plt.show() 220 | 221 | 222 | #------------------------------------------------------------------------------ 223 | 224 | # MEASUREMENTS: SINGLE-CELL READOUTS 225 | # Now that the cells in the image are nicely segmented, we can quantify various 226 | # readouts for every cell individually. Readouts can be based on the intensity 227 | # in the original image, on intensities in other channels or on the size and 228 | # shape of the cells themselves. 229 | 230 | # Initialize a dict for results of choice 231 | results = {"cell_id":[], "green_mean":[], "red_mean":[],"green_mem_mean":[], 232 | "red_mem_mean":[],"cell_size":[],"cell_outline":[]} 233 | 234 | # Iterate over segmented cells 235 | for cell_id in np.unique(green_ws)[1:]: 236 | 237 | # Mask the pixels of the current cell 238 | cell_mask = green_ws==cell_id 239 | 240 | # Get the current cell's values 241 | # Note that the original raw data is used for quantification! 242 | results["cell_id"].append(cell_id) 243 | results["green_mean"].append(np.mean(img[0,:,:][cell_mask])) 244 | results["red_mean"].append(np.mean(img[1,:,:][cell_mask])) 245 | results["green_mem_mean"].append(np.mean(img[0,:,:][np.logical_and(cell_mask,green_edges)])) 246 | results["red_mem_mean"].append(np.mean(img[1,:,:][np.logical_and(cell_mask,green_edges)])) 247 | results["cell_size"].append(np.sum(cell_mask)) 248 | results["cell_outline"].append(np.sum(np.logical_and(cell_mask,green_edges))) 249 | 250 | 251 | #------------------------------------------------------------------------------ 252 | 253 | # Return result as tuple (important in multiprocessing) 254 | return (green_ws, results) 255 | 256 | 257 | #------------------------------------------------------------------------------ 258 | 259 | 260 | 261 | -------------------------------------------------------------------------------- /optional_advanced_content/Multiprocessing/example_cells_1.tif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karinsasaki/python-workshop-image-processing/a6de0424bb184e55a769801bbedec1c8ce02dd5a/optional_advanced_content/Multiprocessing/example_cells_1.tif -------------------------------------------------------------------------------- /optional_advanced_content/Multiprocessing/example_cells_2.tif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karinsasaki/python-workshop-image-processing/a6de0424bb184e55a769801bbedec1c8ce02dd5a/optional_advanced_content/Multiprocessing/example_cells_2.tif -------------------------------------------------------------------------------- /optional_advanced_content/Multiprocessing/example_multiprocessing.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Sat Mar 12 20:59:1 2016 4 | 5 | @author: Jonas Hartmann @ Gilmour Group @ EMBL Heidelberg 6 | 7 | @descript: Multiprocessing is a simple way of increasing the speed of code if 8 | it is impossible, insufficient or otherwise undesirable to do so by 9 | vectorization. Essentially, multiprocessing simply means running 10 | different independent parts of a program (for example an function 11 | that is run again and again on different data) at the same time 12 | instead of sequentially in a loop. This is an example of using 13 | multiprocessing to run the batch pipeline from the main tutorial on 14 | multiple different images at the same time, instead of processing 15 | them one by one. 16 | 17 | In Python, multiprocessing is handled in the multiprocessing module. 18 | The easiest way of using it is to initialize a pool of "worker" 19 | processes, which are then available to run the functions passed to 20 | them (or "mapped onto them"). Although this is relatively easy to 21 | do, multiprocessing has some quirks that need to be payed attention: 22 | 23 | 1) Functions passed to worker processes can take at most one object 24 | as input and return at most one object as output. If multiple 25 | parameters need to be passed, they must be packaged into a single 26 | object first (and then unpacked at the beginning of the function). 27 | 28 | 2) If functions write to files, print output or display graphs, 29 | great care is advised during multiprocessing, as the different 30 | subprocesses may try to do these things at the same time, which 31 | may result in a garbled chaos or even a crash. 32 | 33 | 3) Every worker process will start out by automatically trying to 34 | set up the same "environment" as the main process. This 35 | effectively means that each subprocess tries to execute the main 36 | script again at the start, which could obviously have catastrophic 37 | consequences. To prevent this, the main script must be "protected". 38 | This is done through the built-in variable __name__, which has the 39 | value "__main__" if the script is called from the main process and 40 | a different value if it's called by a worker process. This can be 41 | exploited to make sure that the main script is not completely 42 | re-run by each subprocess (see the beginning of this script). 43 | 44 | The following describes an example of how to run the batch pipeline 45 | using N parallel processes. It requires batch_multiprocessing.py, 46 | which is a "cleaned" version of the batch pipeline that accommodates 47 | the three quirks mentioned above. Note that all code outside of the 48 | actual pipeline function has been deleted to avoid a similar problem 49 | to (3) during the import of the function (Python executes all 50 | non-protected code blocks in a module when that module is imported!). 51 | 52 | Execution of the following example for 4 copies of the same image 53 | takes ~73s on my machine (2 available cores). Running the 4 copies 54 | without multiprocessing would take ~144s. 55 | 56 | @requires: Python 2.7 57 | NumPy 1.9, SciPy 0.15, matplotlib 1.5.1, scikit-image 0.11.3 58 | 59 | """ 60 | 61 | # IMPORT BASIC MODULES 62 | 63 | from __future__ import division # Python 2.7 legacy 64 | import numpy as np # Array manipulation package 65 | import matplotlib.pyplot as plt # Plotting package 66 | import scipy.ndimage as ndi # Image processing package 67 | 68 | 69 | #------------------------------------------------------------------------------ 70 | 71 | # PROTECTION OF THIS SCRIPT FOR MULTIPROCESSING 72 | # When subprocesses are initialized, they will first try to run this main 73 | # script again (this is done to set up the environment/name space properly). 74 | # Since we do not want the following to be run again and again, we have to 75 | # protect it. 76 | # The built-in variable __name__ is automatically set to "__main__" in the main 77 | # process but has other values in the subprocesses, which means those processes 78 | # will ignore the code block within the following if-statement: 79 | 80 | if __name__ == '__main__': 81 | 82 | 83 | #-------------------------------------------------------------------------- 84 | 85 | # PREPARATION 86 | 87 | # Begin timing 88 | from time import time 89 | before = time() 90 | 91 | # Generate a list of image filenames (just as in main tutorial) 92 | from os import listdir, getcwd 93 | filelist = listdir(getcwd()) 94 | tiflist = [fname for fname in filelist if fname[-4:]=='.tif'] 95 | 96 | # Prepare for multiprocessing 97 | N = 4 # Maximum number of processes used 98 | import multiprocessing.pool # Import multiprocessing class 99 | currentPool = multiprocessing.Pool(processes=N) # Create a pool of worker processes 100 | from batch_multiprocessing import pipeline # Import cleaned pipeline function 101 | 102 | 103 | # EXECUTION 104 | 105 | # Here, the function pipeline is executed by the current pool of worker 106 | # processes for each parameter (filename) in the tiflist and the output is 107 | # written into the output_list. 108 | output_list = currentPool.map(pipeline,tiflist) 109 | 110 | # This is necessary clean-up to make sure that all worker subprocesses are 111 | # properly terminated. It's more of a "safety" thing, since things can 112 | # *really* go wrong in multiprocessing... 113 | currentPool.close() 114 | currentPool.join() 115 | 116 | # Reorganize the output into the same shape as in the batch tutorial 117 | all_results = [output[1] for output in output_list if output != "ERROR"] 118 | all_segmentations = [output[0] for output in output_list if output != "ERROR"] 119 | 120 | 121 | # DOWNSTREAM HANDLING 122 | 123 | # End Timing 124 | after = time() 125 | print after - before 126 | 127 | # See if it worked by printing the short summary 128 | print "\nSuccessfully analyzed", len(all_results), "of", len(tiflist), "images" 129 | print "Detected", sum([len(resultDict["cell_id"]) for resultDict in all_results]), "cells in total" 130 | 131 | # See if it worked by showing the scatterplot 132 | colors = plt.cm.jet(np.linspace(0,1,len(all_results))) # To give cells from different images different colors 133 | for image_id,resultDict in enumerate(all_results): 134 | plt.scatter(resultDict["cell_size"],resultDict["red_mem_mean"],color=colors[image_id]) 135 | plt.xlabel("cell size") 136 | plt.ylabel("red_mem_mean") 137 | plt.show() 138 | 139 | 140 | #-------------------------------------------------------------------------- 141 | 142 | 143 | 144 | -------------------------------------------------------------------------------- /optional_advanced_content/Vectorization/example_cells_1.tif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karinsasaki/python-workshop-image-processing/a6de0424bb184e55a769801bbedec1c8ce02dd5a/optional_advanced_content/Vectorization/example_cells_1.tif -------------------------------------------------------------------------------- /optional_advanced_content/Vectorization/example_cells_1_segmented.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karinsasaki/python-workshop-image-processing/a6de0424bb184e55a769801bbedec1c8ce02dd5a/optional_advanced_content/Vectorization/example_cells_1_segmented.npy -------------------------------------------------------------------------------- /optional_advanced_content/Vectorization/example_vectorization.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Sat Mar 12 16:28:01 2016 4 | 5 | @author: Jonas Hartmann @ Gilmour Group @ EMBL Heidelberg 6 | 7 | @descript: An example of how vectorization can speed up data analysis code. 8 | Vectorization refers to the removal of iterations in favor of array 9 | operations. These generally compute much faster since they are 10 | implicitly parallel; if the same operation is done for all elements 11 | of an array independently - i.e. without a loop - multiple such 12 | operations can be done in parallel (or at least in quick succession 13 | without any overhead) on the CPU. 14 | 15 | This example is based on the cell segmentation generated in the 16 | main tutorial script, which is loaded from a npy file. The example 17 | shows a vectorized version of the edge pixel filter. 18 | 19 | @moreInfo: We previously used scipy.ndimage.generic_filter as a means of 20 | iterating over our segmentation and detecting the edges of each 21 | cell. Whilst generic_filter provides a means of fast array 22 | iteration, it would be much faster to use a vectorized approach, 23 | i.e. one that does not rely on iteration at all. 24 | 25 | One way to find cell border pixels without iterating is to generate 26 | all four possible "shifted-by-1" versions of the array. The edge 27 | pixels are those that do not have the same value in one of the 28 | shifted arrays as compared to the original. Since array comparison 29 | does not require iteration, this approach is bound to be much 30 | faster, especially for large arrays. 31 | 32 | However, there is a trade-off: generating 4 shifted copies of the 33 | image array requires a lot of memory, which can be a problem for 34 | big data. Such trade-offs between memory and speed are a common 35 | concern in code optimization. 36 | 37 | Note that vectorization is actually quite easy for a lot of common 38 | operations in programs; it just takes a bit of thinking and often 39 | some knowledge of linear algebra. However, there are also cases 40 | where the solution is not obvious or easily derived (this example 41 | is probably one such case - at least it was for me). In those 42 | cases, searching the internet for solutions to the problem (or a 43 | similar problem) is usually worth a try. 44 | 45 | @speed: This version takes ~0.016s to run on my machine, versus ~5.318s for 46 | the iteration-based implementation in the main tutorial. 47 | 48 | @requires: Python 2.7 49 | NumPy 1.9, scikit-image 0.11.3, matplotlib 1.5.1 50 | """ 51 | 52 | 53 | # PREPARATION 54 | 55 | # Module imports 56 | from __future__ import division # Python 2.7 legacy 57 | import numpy as np # Array manipulation package 58 | import matplotlib.pyplot as plt # Plotting package 59 | 60 | # Data import (segmentation from main tutorial) 61 | filename = 'example_cells_1' 62 | seg = np.load(filename+'_segmented.npy') 63 | 64 | # Begin timing 65 | from time import time 66 | before = time() 67 | 68 | 69 | ### EXECUTION 70 | 71 | # Padding adds values around the original array (here just 1 line of pixels) 72 | seg_pad = np.pad(seg,1,mode='reflect') 73 | 74 | # This generates a list of shifted-by-1 arrays by slicing sub-blocks out of the 75 | # padded original. 76 | seg_shifts = [seg_pad[:-2,:-2],seg_pad[:-2,2:],seg_pad[2:,:-2],seg_pad[2:,2:]] 77 | 78 | # Now it's just a matter of checking which pixels are different in a shifted 79 | # array compared to the original 80 | edges = np.zeros_like(seg) 81 | for shift in seg_shifts: 82 | edges[shift!=seg] = 1 83 | 84 | # Label the detected edges based on the underlying cells (as in the main tutorial) 85 | edges = edges * seg 86 | 87 | 88 | ### DOWNSTREAM HANDLING 89 | 90 | # End timing 91 | after = time() 92 | print after - before 93 | 94 | # Show result as masked overlay (as in the main tutorial) 95 | import skimage.io as io 96 | img = io.imread(filename+'.tif') 97 | plt.imshow(img[0,:,:],cmap='gray',interpolation='none') 98 | plt.imshow(np.ma.array(edges,mask=edges==0),interpolation='none') 99 | plt.show() 100 | 101 | 102 | 103 | -------------------------------------------------------------------------------- /optional_advanced_content/cluster_computation/README.md: -------------------------------------------------------------------------------- 1 | # Advanced Content 2 | 3 | ### Code Optimization 4 | 5 | Examples for how to speed up your code, relevant for anything that handles relatively large amounts of data (image analysis, data analysis, modeling, ...). There are scripts exemplifying three strategies: 6 | 7 | - **Vectorization** 8 | 9 | - Using the edge finder from the main script as an example, this demonstrates the drastic increase in speed that can often be achieved if operations are vectorized. 10 | 11 | - **Multiprocessing** 12 | 13 | - This shows how Python's multiprocessing module can be used to simultaneously run the batch pipeline from the main tutorial on several images. 14 | 15 | - **Cluster Processing** 16 | 17 | - An example of how to use a Python script to run another script multiple times with different input data. If run locally, this is very similar to multiprocessing, but with a bit of knowledge about high-performance cluster computing (see appropriate courses), this approach can be used to handle job submission and result collection on a computer cluster. 18 | 19 | It should be noted that one of the key aspects of code optimization is finding out *which part* of the code costs the most time and could be optimized for the greatest gain in speed. This is called *profiling* and there are a number of options for how to do it, both in the form of Python modules as well as built into IDEs like Spyder. Profiling is not discussed here, but as a very simple example the `time` module is used to test how long the different versions take to run. 20 | 21 | 22 | ### Advanced Data Analysis 23 | 24 | This tutorial illustrates how single-cell segmentation results can be piped into advanced data analysis. This is intended as a starting point for people to get into advanced data analysis with python. In particular, it shows off three important modules (scikit-learn, scipy.cluster and networkx) and illustrates a number of key concepts and methods (feature extraction, standardization/normalization, PCA and tSNE, clustering, graph representation). As a little bonus at the end, the xkcd-style plotting feature of matplotlib is shown. ;) 25 | 26 | *Important note: This tutorial is a **BETA** - it may contain bugs and other errors!* -------------------------------------------------------------------------------- /optional_advanced_content/cluster_computation/batch_cluster.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Tue Dec 22 00:12:38 2015 4 | 5 | @author: Jonas Hartmann @ Gilmour Group @ EMBL Heidelberg 6 | 7 | @descript: A version of main tutorial segmentation pipeline optimized for 8 | cluster submission. The filename parameter for the pipeline 9 | function is retrieved from commandline input. See example_cluster.py 10 | for details. 11 | 12 | @requires: Python 2.7 13 | NumPy 1.9, SciPy 0.15, scikit-image 0.11.3 14 | """ 15 | 16 | 17 | # IMPORT STUFF 18 | from __future__ import division # Python 2.7 legacy 19 | import numpy as np # Array manipulation package 20 | import scipy.ndimage as ndi # Image processing package 21 | 22 | 23 | #------------------------------------------------------------------------------ 24 | 25 | # GET FILENAME FROM COMMANDLINE 26 | import sys 27 | filename = sys.argv[1] 28 | 29 | 30 | #------------------------------------------------------------------------------ 31 | 32 | # IMPORT AND SLICE DATA 33 | 34 | # Check if input filename exists, else create a file reporting the problem 35 | # and then terminate. 36 | from os.path import isfile 37 | if not isfile(filename): 38 | error = "ERROR: Could not find file " + str(filename) 39 | import json 40 | with open(filename[:-4]+'_out.json', 'w') as fp: 41 | json.dump(error, fp) 42 | raise NameError("Could not find file " + str(filename)) 43 | 44 | # Import tif files 45 | import skimage.io as io # Image file manipulation module 46 | img = io.imread(filename) # Importing multi-color tif file 47 | img = np.array(img) # Converting MultiImage object to numpy array 48 | 49 | # Slicing: We only work on one channel for segmentation 50 | green = img[0,:,:] 51 | 52 | 53 | #------------------------------------------------------------------------------ 54 | 55 | # PREPROCESSING: SMOOTHING AND ADAPTIVE THRESHOLDING 56 | # It's standard to smoothen images to reduce technical noise - this improves 57 | # all subsequent image processing steps. Adaptive thresholding allows the 58 | # masking of foreground objects even if the background intensity varies across 59 | # the image. 60 | 61 | # Gaussian smoothing 62 | sigma = 3 # Smoothing factor for Gaussian 63 | green_smooth = ndi.filters.gaussian_filter(green,sigma) # Perform smoothing 64 | 65 | # Create an adaptive background 66 | #struct = ndi.iterate_structure(ndi.generate_binary_structure(2,1),24) # Create a diamond-shaped structural element 67 | struct = ((np.mgrid[:31,:31][0] - 15)**2 + (np.mgrid[:31,:31][1] - 15)**2) <= 15**2 # Create a disk-shaped structural element 68 | bg = ndi.filters.generic_filter(green_smooth,np.mean,footprint=struct) # Run a mean filter over the image using the disc 69 | 70 | # Threshold using created background 71 | green_thresh = green_smooth >= bg 72 | 73 | # Clean by morphological hole filling 74 | green_thresh = ndi.binary_fill_holes(np.logical_not(green_thresh)) 75 | 76 | # Show the result 77 | # plt.imshow(green_thresh,interpolation='none',cmap='gray') 78 | # plt.show() 79 | 80 | 81 | #------------------------------------------------------------------------------ 82 | 83 | # (SIDE NOTE: WE COULD BE DONE NOW) 84 | # If the data is very clean and/or we just want a quick look, we could simply 85 | # label all connected pixels now and consider the result our segmentation. 86 | 87 | # Labeling connected components 88 | # green_label = ndi.label(green_thresh)[0] 89 | # plt.imshow(green_label,interpolation='none') 90 | # plt.show() 91 | 92 | # However, to also partition the membranes to the cells, to generally improve 93 | # the segmentatation (e.g. split cells that end up connected here) and to 94 | # handle more complicated morphologies or to deal with lower quality data, 95 | # this approach is not sufficient. 96 | 97 | 98 | #------------------------------------------------------------------------------ 99 | 100 | # SEGMENTATION: SEEDING BY DISTANCE TRANSFORM 101 | # More advanced segmentation is usually a combination of seeding and expansion. 102 | # In seeding, we want to find a few pixels for each cell that we can assign to 103 | # said cell with great certainty. These 'seeds' are then expanded to partition 104 | # regions of the image where cell affiliation is less clear-cut. 105 | 106 | # Distance transform on thresholded membranes 107 | # Advantage of distance transform for seeding: It is quite robust to local 108 | # "holes" in the membranes. 109 | green_dt= ndi.distance_transform_edt(green_thresh) 110 | # plt.imshow(green_dt,interpolation='none') 111 | # plt.show() 112 | 113 | # Dilating (maximum filter) of distance transform improves results 114 | green_dt = ndi.filters.maximum_filter(green_dt,size=10) 115 | # plt.imshow(green_dt,interpolation='none') 116 | # plt.show() 117 | 118 | # Retrieve and label the local maxima 119 | from skimage.feature import peak_local_max 120 | green_max = peak_local_max(green_dt,indices=False,min_distance=10) # Local maximum detection 121 | green_max = ndi.label(green_max)[0] # Labeling 122 | 123 | # Show maxima as masked overlay 124 | # plt.imshow(green_smooth,cmap='gray',interpolation='none') 125 | # plt.imshow(np.ma.array(green_max,mask=green_max==0),interpolation='none') 126 | # plt.show() 127 | 128 | 129 | #------------------------------------------------------------------------------ 130 | 131 | # SEGMENTATION: EXPANSION BY WATERSHED 132 | # Watershedding is a relatively simple but powerful algorithm for expanding 133 | # seeds. The image intensity is considered as a topographical map (with high 134 | # intensities being "mountains" and low intensities "valleys") and water is 135 | # poured into the valleys from each of the seeds. The water first labels the 136 | # lowest intensity pixels around the seeds, then continues to fill up. The cell 137 | # boundaries are where the waterfronts between different seeds touch. 138 | 139 | # Get the watershed function and run it 140 | from skimage.morphology import watershed 141 | green_ws = watershed(green_smooth,green_max) 142 | 143 | # Show result as transparent overlay 144 | # Note: For a better visualization, see "FINDING CELL EDGES" below! 145 | # plt.imshow(green_smooth,cmap='gray',interpolation='none') 146 | # plt.imshow(green_ws,interpolation='none',alpha=0.7) 147 | # plt.show() 148 | 149 | # Notice that the previously connected cells are now mostly separated and the 150 | # membranes are partitioned to their respective cells. 151 | # ...however, we now see a few cases of oversegmentation! 152 | # This is a typical example of the trade-offs one has to face in any 153 | # computational classification task. 154 | 155 | 156 | #------------------------------------------------------------------------------ 157 | 158 | # POSTPROCESSING: REMOVING CELLS AT THE IMAGE BORDER 159 | # Since segmentation is never perfect, it often makes sense to remove artefacts 160 | # after the segmentation. For example, one could filter out cells that are too 161 | # big, have a strange shape, or strange intensity values. Similarly, supervised 162 | # machine learning can be used to identify cells of interest based on a 163 | # combination of various features. Another example of cells that should be 164 | # removed are those at the image boundary. 165 | 166 | # Create a mask for the image boundary pixels 167 | boundary_mask = np.ones_like(green_ws) # Initialize with all ones 168 | boundary_mask[1:-1,1:-1] = 0 # Set middle square to 0 169 | 170 | # Iterate over all cells in the segmentation 171 | current_label = 1 172 | for cell_id in np.unique(green_ws): 173 | 174 | # If the current cell touches the boundary, remove it 175 | if np.sum((green_ws==cell_id)*boundary_mask) != 0: 176 | green_ws[green_ws==cell_id] = 0 177 | 178 | # This is to keep the labeling continuous, which is cleaner 179 | else: 180 | green_ws[green_ws==cell_id] = current_label 181 | current_label += 1 182 | 183 | # Show result as transparent overlay 184 | # plt.imshow(green_smooth,cmap='gray',interpolation='none') 185 | # plt.imshow(np.ma.array(green_ws,mask=green_ws==0),interpolation='none',alpha=0.7) 186 | # plt.show() 187 | 188 | 189 | #------------------------------------------------------------------------------ 190 | 191 | # MEASUREMENTS: FINDING CELL EDGES 192 | # Finding cell edges is very useful for many purposes. In our example, edge 193 | # intensities are a measure of membrane intensities, which may be a desired 194 | # readout. The length of the edge (relative to cell size) is also a quite 195 | # informative feature about the cell shape. Finally, showing colored edges is 196 | # a nice way of visualizing segmentations. 197 | 198 | # How this works: The generic_filter function (see further below) iterates a 199 | # structure element (in this case a 3x3 square) over an image and passes all 200 | # the values within that element to some arbitrary function (in this case 201 | # edge_finder). The edge_finder function checks if all these pixels are the 202 | # same; if they are, the current pixel is not at an edge (return 0), otherwise 203 | # it is (return 1). generic_filter takes the returned values and organizes them 204 | # into an image again by setting the central pixel of each 3x3 square to the 205 | # respective return value from edge_finder. 206 | 207 | # Define the edge detection function 208 | def edge_finder(footprint_values): 209 | if (footprint_values == footprint_values[0]).all(): 210 | return 0 211 | else: 212 | return 1 213 | 214 | # Iterate the edge finder over the segmentation 215 | green_edges = ndi.filters.generic_filter(green_ws,edge_finder,size=3) 216 | 217 | # Label the detected edges based on the underlying cells 218 | # green_edges_labeled = green_edges * green_ws 219 | 220 | # Show them as masked overlay 221 | # plt.imshow(green_smooth,cmap='gray',interpolation='none') 222 | # plt.imshow(np.ma.array(green_edges_labeled,mask=green_edges_labeled==0),interpolation='none') 223 | # plt.show() 224 | 225 | 226 | #------------------------------------------------------------------------------ 227 | 228 | # MEASUREMENTS: SINGLE-CELL READOUTS 229 | # Now that the cells in the image are nicely segmented, we can quantify various 230 | # readouts for every cell individually. Readouts can be based on the intensity 231 | # in the original image, on intensities in other channels or on the size and 232 | # shape of the cells themselves. 233 | 234 | # Initialize a dict for results of choice 235 | results = {"cell_id":[], "green_mean":[], "red_mean":[],"green_mem_mean":[], 236 | "red_mem_mean":[],"cell_size":[],"cell_outline":[]} 237 | 238 | # Iterate over segmented cells 239 | for cell_id in np.unique(green_ws)[1:]: 240 | 241 | # Mask the pixels of the current cell 242 | cell_mask = green_ws==cell_id 243 | 244 | # Get the current cell's values 245 | # Note that the original raw data is used for quantification! 246 | results["cell_id"].append(cell_id) 247 | results["green_mean"].append(np.mean(img[0,:,:][cell_mask])) 248 | results["red_mean"].append(np.mean(img[1,:,:][cell_mask])) 249 | results["green_mem_mean"].append(np.mean(img[0,:,:][np.logical_and(cell_mask,green_edges)])) 250 | results["red_mem_mean"].append(np.mean(img[1,:,:][np.logical_and(cell_mask,green_edges)])) 251 | results["cell_size"].append(np.sum(cell_mask)) 252 | results["cell_outline"].append(np.sum(np.logical_and(cell_mask,green_edges))) 253 | 254 | 255 | #------------------------------------------------------------------------------ 256 | 257 | # SAVE RESULTS 258 | 259 | import json 260 | with open(filename[:-4]+'_out.json', 'w') as fp: 261 | json.dump(results, fp) 262 | 263 | 264 | #------------------------------------------------------------------------------ 265 | 266 | 267 | 268 | -------------------------------------------------------------------------------- /optional_advanced_content/cluster_computation/example_cells_1.tif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karinsasaki/python-workshop-image-processing/a6de0424bb184e55a769801bbedec1c8ce02dd5a/optional_advanced_content/cluster_computation/example_cells_1.tif -------------------------------------------------------------------------------- /optional_advanced_content/cluster_computation/example_cells_2.tif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karinsasaki/python-workshop-image-processing/a6de0424bb184e55a769801bbedec1c8ce02dd5a/optional_advanced_content/cluster_computation/example_cells_2.tif -------------------------------------------------------------------------------- /optional_advanced_content/cluster_computation/example_cluster.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Tue Dec 22 00:12:38 2015 4 | 5 | @author: Jonas Hartmann @ Gilmour Group @ EMBL Heidelberg 6 | 7 | @descript: Sometimes a single computer just doesn't cut it; a computer cluster 8 | is required. At EMBL, we have access to a high-performance 9 | computation (HPC) cluster with over 4000 CPUs, see: 10 | 11 | https://intranet.embl.de/it_services/services/computing/hpc_cluster/index.html 12 | 13 | The HPC cluster is used by submitting jobs from a (linux-based) 14 | server, using a queuing system called "LSF". IT offers courses on 15 | how this is done, and since many people are using the cluster, it 16 | is good to know what you are doing before trying it yourself, to 17 | avoid causing problems for others. 18 | 19 | For those who already know about LSF (or plan to learn about it), 20 | this is an example of how cluster computation could be handled with 21 | Python, using the batch processing pipeline established in the main 22 | tutorial. However, instead of submitting to the cluster, this 23 | script creates Python processes on the local machine, making it 24 | more or less equivalent to multi-processing. 25 | 26 | In principle, cluster handling requires two things: job submission 27 | and result collection. Here, the analysis pipeline is submitted 28 | with each image as a job and the resulting segmentations are 29 | collected when those jobs finish. Doing this on the clsuter would 30 | be slightly more complicated than doing it locally, but those who 31 | know about HPC/LSF should be able to figure it out. 32 | 33 | NOTE 1: This uses a "cleaned" version of the batch pipeline, which 34 | takes the input filename from a commandline argument and saves its 35 | output into a file. 36 | 37 | NOTE 2: This code is every so slightly dependent on the operating 38 | system used and on the paths of Python and the input files on the 39 | system. The current version is written for Windows and lines tagged 40 | as #OS! have to be adjusted for linux (and in some cases also for 41 | Windows machines if the paths are different). 42 | 43 | @requires: Python 2.7 44 | NumPy 1.9, SciPy 0.15, scikit-image 0.11.3, matpliotlib 1.5.1 45 | """ 46 | 47 | # IMPORT BASIC MODULES 48 | 49 | from __future__ import division # Python 2.7 legacy 50 | import numpy as np # Array manipulation package 51 | import matplotlib.pyplot as plt # Plotting package 52 | import json # Writing and reading python objects 53 | 54 | 55 | # PREPARATION 56 | 57 | # Generate a list of image filenames (just as in the batch tutorial) 58 | from os import listdir, getcwd 59 | filelist = listdir(getcwd()) 60 | tiflist = [fname for fname in filelist if fname[-4:]=='.tif'] 61 | 62 | 63 | # SUBMISSION 64 | 65 | # For each filename, use the commandline to execute the batch pipeline script 66 | # with that filename as an input. 67 | from os import system # Function to run commandline commands 68 | print "Submitting jobs..." 69 | for fname in tiflist: 70 | 71 | system('python batch_cluster.py "'+fname+'"') #OS! 72 | 73 | # For cluster submission, it would look something like this. However, note 74 | # that this is pseudo-code and would have to be adjusted at least slightly! 75 | #system("bsub -o out.txt -e error.txt 'python batch_cluster.py "+fname+"'") 76 | 77 | 78 | # RESULT COLLECTION 79 | 80 | all_results = [] # Initialize result list 81 | all_done = [] # This is used to check which images have already been processed 82 | errors = 0 # This is used to count errors 83 | 84 | # A while-loop to keep looking until all the output files have been retrieved. 85 | # Note that unexpected errors within the pipeline may cause this to become an 86 | # infinite loop; it would be better to implement this in a clean fashion that 87 | # handles exceptions in the pipelines properly (or at least stops automatically 88 | # after a certain amount of waiting time). 89 | while len(all_done) != len(tiflist): 90 | 91 | # Wait for 30 sec 92 | print "Waiting for results..." 93 | from time import sleep 94 | sleep(30) 95 | 96 | # Check for output files 97 | filelist = listdir(getcwd()) 98 | outlist = [fname for fname in filelist if '_out.json' in fname] 99 | all_done = all_done + outlist 100 | 101 | # For each available output file... 102 | for fname in outlist: 103 | 104 | # Load the output data 105 | with open(fname, 'r') as fp: 106 | results = json.load(fp) 107 | 108 | # Make sure errors are caught... 109 | if type(results) == str: 110 | errors += 1 111 | 112 | # If all is well, add the result to all other results 113 | else: 114 | all_results.append(results) 115 | 116 | # Then just delete the file; we don't need it anymore 117 | system('del "'+fname+'"') #OS! 118 | 119 | # Report on progress, then wait for a minute, then try again 120 | print "Retrieved", len(all_done), "result files of", len(tiflist), "with a total of", errors, "errors!" 121 | 122 | 123 | # DOWNSTREAM PROCESSING 124 | 125 | # See if it worked by printing the short summary 126 | print "\nSuccessfully analyzed", len(all_results), "of", len(tiflist), "images" 127 | print "Detected", sum([len(resultDict["cell_id"]) for resultDict in all_results]), "cells in total" 128 | 129 | # See if it worked by showing the scatterplot 130 | colors = plt.cm.jet(np.linspace(0,1,len(all_results))) # To give cells from different images different colors 131 | for image_id,resultDict in enumerate(all_results): 132 | plt.scatter(resultDict["cell_size"],resultDict["red_mem_mean"],color=colors[image_id]) 133 | plt.xlabel("cell size") 134 | plt.ylabel("red_mem_mean") 135 | plt.show() 136 | 137 | -------------------------------------------------------------------------------- /optional_advanced_content/data_analysis/example_cells_1.tif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karinsasaki/python-workshop-image-processing/a6de0424bb184e55a769801bbedec1c8ce02dd5a/optional_advanced_content/data_analysis/example_cells_1.tif -------------------------------------------------------------------------------- /optional_advanced_content/data_analysis/example_cells_1_green.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karinsasaki/python-workshop-image-processing/a6de0424bb184e55a769801bbedec1c8ce02dd5a/optional_advanced_content/data_analysis/example_cells_1_green.npy -------------------------------------------------------------------------------- /optional_advanced_content/data_analysis/example_cells_1_segmented.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karinsasaki/python-workshop-image-processing/a6de0424bb184e55a769801bbedec1c8ce02dd5a/optional_advanced_content/data_analysis/example_cells_1_segmented.npy -------------------------------------------------------------------------------- /optional_advanced_content/data_analysis/example_data_analysis.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Sat Mar 12 16:28:01 2016 4 | 5 | @author: Jonas Hartmann @ Gilmour Group @ EMBL Heidelberg 6 | 7 | @descript: A crude introduction on how to pipeline single-cell segmentation 8 | data into downstream analyses such as clustering. 9 | 10 | For people with limited experience in data analysis, this script is 11 | intended as an inspiration and incentive to think about possible 12 | advanced analyses downstream of segmentation. Solving the exercises 13 | without help may be difficult, so it may be a good idea to have a 14 | look at the solutions to get some idea of how the problems should 15 | be approached. However, once the principles are understood, it is 16 | an important part of the learning experience to build one's own 17 | implementation. 18 | 19 | More experienced people can use this script as a starting point to 20 | exploring the data analysis packages provided for Python. It also 21 | illustrates that Python readily allows the construction of complete 22 | and consistent analysis pipelines, from image preprocessing to 23 | feature extraction to clustering (and back). The exercises will be 24 | doable, and are intended as an incentive to think about the concepts, 25 | also in regards to ones own data. 26 | 27 | There are a number of machine learning, clustering and other data 28 | analysis packages for Python. As a starting point, I recommend you 29 | look into the following: 30 | 31 | - scikit-learn (scikit-learn.org/stable/) 32 | - scipy.cluster (docs.scipy.org/doc/scipy/reference/cluster.html) 33 | - networkx (networkx.github.io/) 34 | 35 | For people interested in Bayesian methods (not covered here), I 36 | recommend the PyMC package (pymc-devs.github.io/pymc/). 37 | 38 | @WARNING: This exercise and the associated solutions are a BETA! They have 39 | been implemented in a limited amount of time and have not been 40 | tested extensively. Furthermore, the example data used is rather 41 | uniform in regards to many conventional features and thus is not 42 | ideal to illustrate clustering. Nevertheless, the principles and 43 | packages introduced here should serve as a good inspiration or 44 | starting point for further study. 45 | 46 | @requires: Python 2.7 47 | NumPy 1.9, scikit-image 0.11.3, matplotlib 1.5.1 48 | SciPy 0.16.0, scikit-learn 0.15.2, networkx 1.9.1 49 | """ 50 | 51 | 52 | #------------------------------------------------------------------------------ 53 | 54 | ### PREPARATION 55 | 56 | ### Module imports 57 | from __future__ import division # Python 2.7 legacy 58 | import numpy as np # Array manipulation package 59 | import matplotlib.pyplot as plt # Plotting package 60 | import scipy.ndimage as ndi # Multidimensional image operations 61 | 62 | 63 | ### Importing image and segmentation data from main tutorial 64 | 65 | # Note: Loading from .npy is faster, so do the following once: 66 | #filename = 'example_cells_1' 67 | #import skimage.io as IO 68 | #img = IO.imread(filename+'.tif')[0,:,:] 69 | #img = ndi.filters.gaussian_filter(img,3) # Smoothing 70 | #np.save(filename+'_green',img) 71 | 72 | # Loading from npy 73 | filename = 'example_cells_1' 74 | img = np.load(filename+'_green.npy') 75 | seg = np.load(filename+'_segmented.npy') 76 | 77 | # Some frequently used variables 78 | labels = np.unique(seg)[1:] # Labels of cells in segmentation 79 | N = len(labels) # Number of cells in segmentation 80 | 81 | 82 | #------------------------------------------------------------------------------ 83 | 84 | ### FEATURE EXTRACTION 85 | # As discussed in the main tutorial, we can measure various quantities for each 86 | # cell once the cells have been segmented. Any such quantity can be used as a 87 | # feature to classify or cluster cells. Besides explicitly measured quantities, 88 | # there are algorithms/packages that measure a whole bunch of features at once. 89 | 90 | # All the extracted features together are called the 'feature space'. Each 91 | # sample can be considered a point in a space that has as many dimensions as 92 | # there are features. The feature space should be arranged as an array that 93 | # has the shape (n_samples,n_features). 94 | 95 | # EXERCISE 1: 96 | # Come up with at least 4 different features and measure them for each cell in 97 | # the segmentation of the main tutorial. 98 | 99 | # Hint: For many measures of shape and spatial distribution, it is useful to 100 | # first calculate the centroid of the segmented object and then think of 101 | # features relative to it. 102 | 103 | # Hint: It can be advantageous to use measures that are largely independent of 104 | # or normalized for cell size, so this factor does not end up dominating 105 | # the features. Cells size itself can be a useful feature, though. 106 | 107 | # Hint: Don't forget that we detected the membranes of each cell in the main 108 | # script. Importing this data may be useful for the calculation of 109 | # various features. 110 | 111 | # Hint: Make sure you visualize your data! 112 | # It can be very useful to have a look at what a feature looks like when 113 | # mapped to the actual image. This may already show interesting patterns, 114 | # or should at least confirm that the extracted value is consistent with 115 | # its rationale. For example, one could show a feature as a color-coded 116 | # semi-transparent overlay over the actual image. 117 | # Furthermore, box and scatter plots are great options for investigating 118 | # how the values of a feature are distributed and how features relate to 119 | # each other.. 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 | # EXERCISE 2: 128 | # Find and use a feature extraction algorithm that returns a large feature set 129 | # for each cell. The features could for example be related to shape or texture. 130 | 131 | 132 | 133 | 134 | 135 | 136 | 137 | 138 | # Note: You can save and later reload the features spaces you've generated with 139 | # np.save and np.load, so you don't need to run the feature extraction each 140 | # time you run the script. 141 | 142 | 143 | 144 | #------------------------------------------------------------------------------ 145 | 146 | ### NORMALIZATION AND STANDARDIZATION 147 | # Many classification and clustering algorithms need features to be normalized 148 | # and/or standardized, otherwise the absolute size of the feature could affect 149 | # the result (for example, you could get a different result if you use cell 150 | # size in um or in pixels, because the absolute numbers are different). 151 | 152 | # Normalization in this context generally means scaling your features to each 153 | # range from 0 to 1. Standardization means centering the features around zero 154 | # and scaling them to "unit variance" by dividing by their standard deviation. 155 | # A more elaborate version of this which often provides a good starting point 156 | # is a "whitening transform", which is implemented in the scipy.cluster module. 157 | 158 | # It's worthwhile to read up on normalization/standardization so you avoid 159 | # introducing errors/biases. For example, normalization of data with outliers 160 | # will compress the 'real' data into a very small range. Thus, outliers should 161 | # be removed before normalization/standardization. 162 | 163 | # EXERCISE 3: 164 | # Find a way to remove outliers from your feature space. 165 | 166 | 167 | 168 | 169 | 170 | 171 | 172 | 173 | 174 | # EXERCISE 4: 175 | # Standardize, normalize and/or whiten your feature space as you deem fit, 176 | # either by transforming the data yourself or using a module function. 177 | 178 | 179 | 180 | 181 | 182 | 183 | 184 | 185 | # Note: Don't forget to visualize your data again and compare to the raw data! 186 | 187 | 188 | 189 | #------------------------------------------------------------------------------ 190 | 191 | ### PRINCIPAL COMPONENT ANALYSIS (PCA) 192 | # The principal components of a feature space are the axes of greatest variance 193 | # of the data. By transforming our data to this "relevance-corrected" coordinate 194 | # system, we can achieve two things: 195 | # 1) Usually, most of the variance in a dataset falls onto just a few principal 196 | # components, so we can ignore the other ones as irrelevant, thus reducing 197 | # the number of features whilst maintaining all information. This is very 198 | # useful to facilitate subsequent analyses. 199 | # 2) Just PCA on its own can yield nice results. For example, different cell 200 | # populations that are not clearly separated by any single feature may 201 | # appear separated along a principal component. Furthermore, principal 202 | # components may correlate with other features of the data, which can be an 203 | # interesting result on its own. 204 | 205 | # EXERCISE 5: 206 | # Perform a PCA on your feature space and investigate the results. 207 | 208 | # Hint: You may want to use the PCA implementation of scikit-learn 209 | # Algorithms in sklearn are provided as "estimator objects". The general 210 | # workflow for using them is to first instantiate the estimator object, 211 | # passing general parameters, then to fit the estimator to your data and 212 | # finally to extract various results from the fitted estimator. 213 | 214 | 215 | 216 | 217 | 218 | 219 | 220 | 221 | 222 | 223 | 224 | #------------------------------------------------------------------------------ 225 | 226 | ### K-MEANS CLUSTERING 227 | # If you expect that you can split your population into distinct groups, an 228 | # easy way of doing so in an unsupervised fashion is k-means clustering. 229 | # K-means partitions samples into clusters based on their proximity to the 230 | # cluster's mean. 231 | 232 | # EXERCISE 6: 233 | # Perform k-means clustering on your data. To do so, you have to assume the 234 | # number of clusters. To begin with, just try it with 5 clusters. Try doing the 235 | # PCA for raw, normalized and PCA-transformed data to see the difference. Don't 236 | # forget to visualize your result. 237 | 238 | 239 | 240 | 241 | 242 | 243 | 244 | 245 | 246 | 247 | 248 | # ADDITIONAL EXERCISE: 249 | # Can you think of and implement a simple way of objectively choosing the 250 | # number of clusters for k-means? 251 | 252 | 253 | 254 | 255 | 256 | 257 | 258 | 259 | 260 | 261 | 262 | #------------------------------------------------------------------------------ 263 | 264 | ### tSNE ANALYSIS 265 | # Although PCA is great to reduce and visualize high-dimensional data, it only 266 | # works well on linear relationships and global trends. Therefore, alternative 267 | # algorithms optimized for non-linear, local relationships have also been 268 | # created. 269 | 270 | # These algorithms tend to be quite complicated and going into them is beyond 271 | # the scope of this course. This example is intended as a taste of what is out 272 | # there and to show people who already know about these methods that they are 273 | # implemented in Python. Note that it can be risky to use these algorithms if 274 | # you do not know what you are doing, so it may make sense to read up and/or to 275 | # consult with an expert before you do this kind of analysis. 276 | 277 | # This is not an exercise, just an example for you to study. 278 | # You can find the code in the solutions file. 279 | 280 | 281 | 282 | #------------------------------------------------------------------------------ 283 | 284 | ### GRAPH-BASED ANALYSIS 285 | # Graphs are a universal way of mathematically describing relationships, be 286 | # they based on similarity, interaction, or virtually anything else. Despite 287 | # their power, graph-based analyses have so far not been used extensively on 288 | # biological imaging data, but as microscopes and analysis algorithms improve, 289 | # they become increasingly feasible and will likely become very important in 290 | # the future. 291 | 292 | # The networkx module provides various functions for importing and generating 293 | # graphs, for operating and analyzing graphs and for exporting and visualizing 294 | # graphs. The following example shows how a simple graph based on our feature 295 | # space could be built and visualized. In doing so, it introduces the networkx 296 | # Graph object, which is the core of the networkx module. 297 | 298 | # This is not an exercise, just an example for you to study. 299 | # You can find the code in the solutions file. 300 | 301 | 302 | 303 | #------------------------------------------------------------------------------ 304 | 305 | ### BONUS: XCKD-STYLE PLOTS 306 | 307 | # CONGRATULATIONS! You made it to the very end of this debaucherous tutorial, 308 | # so you now get to see what is probably the most fantastic functionality in 309 | # matplotlib: plotting in the style of the xkcd webcomic! 310 | 311 | # This is not an exercise, just an example for you to study. 312 | # You can find the code in the solutions file. 313 | 314 | 315 | -------------------------------------------------------------------------------- /pre_tutorial/.ipynb_checkpoints/Short tutorial on functions-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# This is a short tutorial on defining and using functions/modules\n", 8 | "\n", 9 | "References:\n", 10 | "\n", 11 | "http://www.tutorialspoint.com/python/ \n", 12 | "\n", 13 | "https://github.com/tobyhodges/ITPP\n", 14 | "\n", 15 | "https://github.com/cmci/HTManalysisCourse/blob/master/CentreCourseProtocol.md#workflow-python-primer\n", 16 | "\n", 17 | "http://cmci.embl.de/documents/ijcourses" 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "## Defining a Function\n", 25 | "\n", 26 | "You can define functions to provide the required functionality. Here are simple rules to define a function in Python.\n", 27 | "\n", 28 | "Function blocks begin with the keyword def followed by the function name and parentheses ( ( ) ).\n", 29 | "\n", 30 | "Any input parameters or arguments should be placed within these parentheses. You can also define parameters inside these parentheses.\n", 31 | "\n", 32 | "The first statement of a function can be an optional statement - the documentation string of the function or docstring.\n", 33 | "\n", 34 | "The code block within every function starts with a colon (:) and is indented.\n", 35 | "\n", 36 | "The statement return [expression] exits a function, optionally passing back an expression to the caller. A return statement with no arguments is the same as return None." 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "## Syntax\n", 44 | "\n", 45 | "```def functionname( parameters ):\n", 46 | " \n", 47 | " \"function_docstring\"\n", 48 | " function_suite\n", 49 | " return [expression]```\n", 50 | " \n", 51 | "By default, parameters have a positional behavior and you need to inform them in the same order that they were defined." 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "metadata": {}, 57 | "source": [ 58 | "### Example\n", 59 | "\n", 60 | "The following function takes a string as input parameter and prints it on standard screen." 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": 2, 66 | "metadata": { 67 | "collapsed": true 68 | }, 69 | "outputs": [], 70 | "source": [ 71 | "def printme( str ):\n", 72 | " \"This prints a passed string into this function\"\n", 73 | " print str\n", 74 | " return" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "### Example\n", 82 | "\n", 83 | "The following function takes two numbers and returns and prints their sum." 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": 3, 89 | "metadata": { 90 | "collapsed": true 91 | }, 92 | "outputs": [], 93 | "source": [ 94 | "def addme( a, b ):\n", 95 | " \"This adds passed arguments.\"\n", 96 | " print a+b\n", 97 | " return a+b" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "## Calling a Function\n", 105 | "\n", 106 | "Defining a function only gives it a name, specifies the parameters that are to be included in the function and structures the blocks of code.\n", 107 | "\n", 108 | "Once the basic structure of a function is finalized, you can execute it by calling it from another function or directly from the Python prompt. Following is the example to call printme() function − " 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": 7, 114 | "metadata": { 115 | "collapsed": false 116 | }, 117 | "outputs": [ 118 | { 119 | "name": "stdout", 120 | "output_type": "stream", 121 | "text": [ 122 | "I'm first call to user defined function!\n", 123 | "Again second call to the same function\n", 124 | "3\n", 125 | "3\n" 126 | ] 127 | } 128 | ], 129 | "source": [ 130 | "printme(\"I'm first call to user defined function!\")\n", 131 | "printme(\"Again second call to the same function\") \n", 132 | "print addme(1,2)" 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": {}, 138 | "source": [ 139 | "## Function Arguments\n", 140 | "\n", 141 | "You can call a function by using the following types of formal arguments:\n", 142 | "\n", 143 | "### Required arguments\n", 144 | "\n", 145 | "Required arguments are the arguments passed to a function in correct positional order. Here, the number of arguments in the function call should match exactly with the function definition.\n", 146 | "\n", 147 | "To call the function printme(), you definitely need to pass one argument, otherwise it gives a syntax error.\n", 148 | "\n", 149 | "```Traceback (most recent call last):\n", 150 | "File \"test.py\", line 11, in \n", 151 | " printme();\n", 152 | "TypeError: printme() takes exactly 1 argument (0 given)```\n", 153 | "\n", 154 | "Similarly, for addme() you need to pass two arguments." 155 | ] 156 | }, 157 | { 158 | "cell_type": "markdown", 159 | "metadata": {}, 160 | "source": [ 161 | "### Keyword arguments\n", 162 | "\n", 163 | "Keyword arguments are related to the function calls. When you use keyword arguments in a function call, the caller identifies the arguments by the parameter name.\n", 164 | "\n", 165 | "This allows you to skip arguments or place them out of order because the Python interpreter is able to use the keywords provided to match the values with parameters. \n", 166 | "\n", 167 | "For example, the following code:" 168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": 8, 173 | "metadata": { 174 | "collapsed": false 175 | }, 176 | "outputs": [ 177 | { 178 | "name": "stdout", 179 | "output_type": "stream", 180 | "text": [ 181 | "My string\n" 182 | ] 183 | } 184 | ], 185 | "source": [ 186 | "printme( str = \"My string\")" 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": {}, 192 | "source": [ 193 | "The following example gives more clear picture. Note that the order of parameters does not matter." 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": 9, 199 | "metadata": { 200 | "collapsed": false 201 | }, 202 | "outputs": [ 203 | { 204 | "name": "stdout", 205 | "output_type": "stream", 206 | "text": [ 207 | "Name: miki\n", 208 | "Age: 50\n" 209 | ] 210 | } 211 | ], 212 | "source": [ 213 | "# Function definition is here\n", 214 | "def printinfo( name, age ):\n", 215 | " \"This prints a passed info into this function\"\n", 216 | " print \"Name: \", name\n", 217 | " print \"Age: \", age\n", 218 | " return\n", 219 | "\n", 220 | "# Now you can call printinfo function\n", 221 | "printinfo( age=50, name=\"miki\" )" 222 | ] 223 | }, 224 | { 225 | "cell_type": "markdown", 226 | "metadata": {}, 227 | "source": [ 228 | "### Default arguments\n", 229 | "\n", 230 | "A default argument is an argument that assumes a default value if a value is not provided in the function call for that argument. The following example gives an idea on default arguments, it prints default age if it is not passed −" 231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": 10, 236 | "metadata": { 237 | "collapsed": false 238 | }, 239 | "outputs": [ 240 | { 241 | "name": "stdout", 242 | "output_type": "stream", 243 | "text": [ 244 | "Name: miki\n", 245 | "Age 50\n", 246 | "Name: miki\n", 247 | "Age 35\n" 248 | ] 249 | } 250 | ], 251 | "source": [ 252 | "# Function definition is here\n", 253 | "def printinfo( name, age = 35 ):\n", 254 | " \"This prints a passed info into this function\"\n", 255 | " print \"Name: \", name\n", 256 | " print \"Age \", age\n", 257 | " return;\n", 258 | "\n", 259 | "# Now you can call printinfo function\n", 260 | "printinfo( age=50, name=\"miki\" )\n", 261 | "printinfo( name=\"miki\" )" 262 | ] 263 | }, 264 | { 265 | "cell_type": "markdown", 266 | "metadata": {}, 267 | "source": [ 268 | "### Variable-length arguments\n", 269 | "\n", 270 | "You may need to process a function for more arguments than you specified while defining the function. These arguments are called variable-length arguments and are not named in the function definition, unlike required and default arguments.\n", 271 | "\n", 272 | "Syntax for a function with non-keyword variable arguments is this −\n", 273 | "\n", 274 | "```def functionname([formal_args,] *var_args_tuple ):\n", 275 | " \"function_docstring\"\n", 276 | " function_suite\n", 277 | " return [expression]```\n", 278 | "\n", 279 | "An asterisk (*) is placed before the variable name that holds the values of all nonkeyword variable arguments. This tuple remains empty if no additional arguments are specified during the function call. Following is a simple example −" 280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": 11, 285 | "metadata": { 286 | "collapsed": false 287 | }, 288 | "outputs": [ 289 | { 290 | "name": "stdout", 291 | "output_type": "stream", 292 | "text": [ 293 | "Output is: \n", 294 | "10\n", 295 | "Output is: \n", 296 | "70\n", 297 | "60\n", 298 | "50\n" 299 | ] 300 | } 301 | ], 302 | "source": [ 303 | "# Function definition is here\n", 304 | "def printinfo( arg1, *vartuple ):\n", 305 | " \"This prints a variable passed arguments\"\n", 306 | " print \"Output is: \"\n", 307 | " print arg1\n", 308 | " for var in vartuple:\n", 309 | " print var\n", 310 | " return;\n", 311 | "\n", 312 | "# Now you can call printinfo function\n", 313 | "printinfo( 10 )\n", 314 | "printinfo( 70, 60, 50 )" 315 | ] 316 | }, 317 | { 318 | "cell_type": "markdown", 319 | "metadata": {}, 320 | "source": [ 321 | "### The Anonymous Functions\n", 322 | "\n", 323 | "These functions are called anonymous because they are not declared in the standard manner by using the def keyword. You can use the lambda keyword to create small anonymous functions.\n", 324 | "\n", 325 | "Lambda forms can take any number of arguments but return just one value in the form of an expression. They cannot contain commands or multiple expressions.\n", 326 | "\n", 327 | "An anonymous function cannot be a direct call to print because lambda requires an expression\n", 328 | "\n", 329 | "Lambda functions have their own local namespace and cannot access variables other than those in their parameter list and those in the global namespace.\n", 330 | "\n", 331 | "Although it appears that lambda's are a one-line version of a function, they are not equivalent to inline statements in C or C++, whose purpose is by passing function stack allocation during invocation for performance reasons." 332 | ] 333 | }, 334 | { 335 | "cell_type": "markdown", 336 | "metadata": {}, 337 | "source": [ 338 | "#### Syntax\n", 339 | "\n", 340 | "The syntax of lambda functions contains only a single statement, which is as follows:\n", 341 | "\n", 342 | "lambda [arg1 [,arg2,.....argn]]:expression\n", 343 | "\n", 344 | "Following is the example to show how lambda form of function works:" 345 | ] 346 | }, 347 | { 348 | "cell_type": "code", 349 | "execution_count": 12, 350 | "metadata": { 351 | "collapsed": false 352 | }, 353 | "outputs": [ 354 | { 355 | "name": "stdout", 356 | "output_type": "stream", 357 | "text": [ 358 | "Value of total : 30\n", 359 | "Value of total : 40\n" 360 | ] 361 | } 362 | ], 363 | "source": [ 364 | "# Function definition is here\n", 365 | "sum = lambda arg1, arg2: arg1 + arg2;\n", 366 | "\n", 367 | "# Now you can call sum as a function\n", 368 | "print \"Value of total : \", sum( 10, 20 )\n", 369 | "print \"Value of total : \", sum( 20, 20 )" 370 | ] 371 | }, 372 | { 373 | "cell_type": "markdown", 374 | "metadata": {}, 375 | "source": [ 376 | "### The return Statement\n", 377 | "\n", 378 | "The statement return [expression] exits a function, optionally passing back an expression to the caller. A return statement with no arguments is the same as return None.\n", 379 | "\n", 380 | "All the above examples are not returning any value. You can return a value from a function as follows:" 381 | ] 382 | }, 383 | { 384 | "cell_type": "code", 385 | "execution_count": 13, 386 | "metadata": { 387 | "collapsed": false 388 | }, 389 | "outputs": [ 390 | { 391 | "name": "stdout", 392 | "output_type": "stream", 393 | "text": [ 394 | "-10\n" 395 | ] 396 | } 397 | ], 398 | "source": [ 399 | "# Function definition is here\n", 400 | "def substractme( arg1, arg2 ):\n", 401 | " # Substracts the second parameter from the first and returns the result.\"\n", 402 | " total = arg1 - arg2\n", 403 | " return total;\n", 404 | "\n", 405 | "# Now you can call sum function\n", 406 | "total = substractme( 10, 20 );\n", 407 | "print total " 408 | ] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "execution_count": 14, 413 | "metadata": { 414 | "collapsed": false 415 | }, 416 | "outputs": [ 417 | { 418 | "name": "stdout", 419 | "output_type": "stream", 420 | "text": [ 421 | "3\n", 422 | "-1\n", 423 | "2\n", 424 | "3\n", 425 | "2\n", 426 | "-1\n" 427 | ] 428 | } 429 | ], 430 | "source": [ 431 | "# The retuned arguments are also order-specific. So if you have a function:\n", 432 | "def arithmetic( a, b ):\n", 433 | " sumab = a+b\n", 434 | " substractab = a-b\n", 435 | " multiplyab = a*b\n", 436 | " return sumab, substractab, multiplyab\n", 437 | " \n", 438 | "# This\n", 439 | "c, d, e = arithmetic(1,2)\n", 440 | "print c\n", 441 | "print d\n", 442 | "print e\n", 443 | "\n", 444 | "# does not alocate the same values to the varialbes c, d, e as this\n", 445 | "c, e, d = arithmetic(1,2)\n", 446 | "print c\n", 447 | "print d\n", 448 | "print e" 449 | ] 450 | }, 451 | { 452 | "cell_type": "markdown", 453 | "metadata": {}, 454 | "source": [ 455 | "### Scope of Variables\n", 456 | "\n", 457 | "All variables in a program may not be accessible at all locations in that program. This depends on where you have declared a variable.\n", 458 | "\n", 459 | "The scope of a variable determines the portion of the program where you can access a particular identifier. There are two basic scopes of variables in Python:\n", 460 | "\n", 461 | "Global variables\n", 462 | "\n", 463 | "Local variables\n", 464 | "\n", 465 | "#### Global vs. Local variables\n", 466 | "\n", 467 | "Variables that are defined inside a function body have a local scope, and those defined outside have a global scope.\n", 468 | "\n", 469 | "This means that local variables can be accessed only inside the function in which they are declared, whereas global variables can be accessed throughout the program body by all functions. When you call a function, the variables declared inside it are brought into scope. Following is a simple example −" 470 | ] 471 | }, 472 | { 473 | "cell_type": "code", 474 | "execution_count": 15, 475 | "metadata": { 476 | "collapsed": true 477 | }, 478 | "outputs": [], 479 | "source": [ 480 | "total = 0; # This is global variable." 481 | ] 482 | }, 483 | { 484 | "cell_type": "code", 485 | "execution_count": 17, 486 | "metadata": { 487 | "collapsed": false 488 | }, 489 | "outputs": [ 490 | { 491 | "name": "stdout", 492 | "output_type": "stream", 493 | "text": [ 494 | "Inside the function (local) total: -10\n", 495 | "Outside the function (global) total : 0\n" 496 | ] 497 | } 498 | ], 499 | "source": [ 500 | "# Function definition is here\n", 501 | "def substractme( arg1, arg2 ):\n", 502 | " # Substratcs the second parameter from the first and return them.\"\n", 503 | " total = arg1 - arg2; # Here total is a local variable.\n", 504 | " print \"Inside the function (local) total: \", total \n", 505 | " return total;\n", 506 | "\n", 507 | "# Now you can call sum function\n", 508 | "substractme( 10, 20 );\n", 509 | "print \"Outside the function (global) total : \", total " 510 | ] 511 | }, 512 | { 513 | "cell_type": "markdown", 514 | "metadata": {}, 515 | "source": [ 516 | "### Python Modules\n", 517 | "\n", 518 | "A module allows you to logically organise your Python code. Grouping related code into a module makes the code easier to understand and use. A module is a Python object with arbitrarily named attributes that you can bind and reference.\n", 519 | "\n", 520 | "Simply, a module is a file consisting of Python code. A module can define functions, classes and variables. A module can also include runnable code.\n", 521 | "\n", 522 | "For example, in a new file, copy and past the following, making sure you undertand the code. Name the file module_example.py:\n", 523 | "\n", 524 | "```def print_func( par ):\n", 525 | " print \"Hello : \", par\n", 526 | " return```\n", 527 | "\n", 528 | "In this example, the module we are intersted in ix modelu_example.py." 529 | ] 530 | }, 531 | { 532 | "cell_type": "markdown", 533 | "metadata": {}, 534 | "source": [ 535 | "### The import Statement\n", 536 | "\n", 537 | "You can use any Python source file as a module by executing an import statement in some other Python source file. The import has the following syntax:\n", 538 | "\n", 539 | "```import module1[, module2[,... moduleN]```\n", 540 | "\n", 541 | "When the interpreter encounters an import statement, it imports the module if the module is present in the search path. A search path is a list of directories that the interpreter searches before importing a module. For example, to import the module module_example.py, you need to put the following command at the top of the script −" 542 | ] 543 | }, 544 | { 545 | "cell_type": "code", 546 | "execution_count": 18, 547 | "metadata": { 548 | "collapsed": false 549 | }, 550 | "outputs": [ 551 | { 552 | "name": "stdout", 553 | "output_type": "stream", 554 | "text": [ 555 | "Hello : Zara\n" 556 | ] 557 | } 558 | ], 559 | "source": [ 560 | "# Import module support\n", 561 | "import module_example as modex\n", 562 | "\n", 563 | "# Now you can call defined functions of the module, as follows\n", 564 | "modex.print_func(\"Zara\")" 565 | ] 566 | }, 567 | { 568 | "cell_type": "markdown", 569 | "metadata": {}, 570 | "source": [ 571 | "A module is loaded only once, regardless of the number of times it is imported. This prevents the module execution from happening over and over again if multiple imports occur." 572 | ] 573 | }, 574 | { 575 | "cell_type": "markdown", 576 | "metadata": {}, 577 | "source": [ 578 | "### The from...import Statement and The from...import * Statement:\n", 579 | "\n", 580 | "Python's from statement lets you import specific attributes from a module into the current namespace. The from...import has the following syntax −\n", 581 | "\n", 582 | "#from modname import name1[, name2[, ... nameN]]\n", 583 | "\n", 584 | "For example, to import the function fibonacci from the module fib, use the following statement:" 585 | ] 586 | }, 587 | { 588 | "cell_type": "code", 589 | "execution_count": 20, 590 | "metadata": { 591 | "collapsed": true 592 | }, 593 | "outputs": [], 594 | "source": [ 595 | "from scipy import sum as s" 596 | ] 597 | }, 598 | { 599 | "cell_type": "markdown", 600 | "metadata": {}, 601 | "source": [ 602 | "This statement does not import the entire module/package scipy into the current namespace; it just introduces the item sum from the module scipy into the global symbol table of the importing module, and renames it to s.\n", 603 | "\n", 604 | "It is also possible to import all names from a module into the current namespace by using the following import statement:" 605 | ] 606 | }, 607 | { 608 | "cell_type": "code", 609 | "execution_count": 21, 610 | "metadata": { 611 | "collapsed": false, 612 | "scrolled": true 613 | }, 614 | "outputs": [ 615 | { 616 | "ename": "ImportError", 617 | "evalue": "No module named modname", 618 | "output_type": "error", 619 | "traceback": [ 620 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 621 | "\u001b[0;31mImportError\u001b[0m Traceback (most recent call last)", 622 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0mmodname\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 623 | "\u001b[0;31mImportError\u001b[0m: No module named modname" 624 | ] 625 | } 626 | ], 627 | "source": [ 628 | "from modname import *" 629 | ] 630 | }, 631 | { 632 | "cell_type": "code", 633 | "execution_count": null, 634 | "metadata": { 635 | "collapsed": true 636 | }, 637 | "outputs": [], 638 | "source": [] 639 | } 640 | ], 641 | "metadata": { 642 | "kernelspec": { 643 | "display_name": "Python 2", 644 | "language": "python", 645 | "name": "python2" 646 | }, 647 | "language_info": { 648 | "codemirror_mode": { 649 | "name": "ipython", 650 | "version": 2 651 | }, 652 | "file_extension": ".py", 653 | "mimetype": "text/x-python", 654 | "name": "python", 655 | "nbconvert_exporter": "python", 656 | "pygments_lexer": "ipython2", 657 | "version": "2.7.11" 658 | } 659 | }, 660 | "nbformat": 4, 661 | "nbformat_minor": 0 662 | } 663 | -------------------------------------------------------------------------------- /pre_tutorial/.ipynb_checkpoints/pre_tutorial-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Reminder of importing modules" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Modules in Python are simply Python files with the .py extension, which implement a set of functions. The purpose of writting modules is to group related code together, to make it easier to understand and use, sort of as a 'black box'.\n", 15 | "\n", 16 | "Packages are name spaces that contain multiple packages and modules themselves. You can think of them as 'directories'.\n", 17 | "\n", 18 | "To be able to use the functions in a particular package or module, you need to 'import' that module or package. To do so, you need to use the import command. For example, to import the package numpy, which enables the manipulation of arrays, you would do:" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": null, 24 | "metadata": { 25 | "collapsed": true 26 | }, 27 | "outputs": [], 28 | "source": [ 29 | "import numpy as np # Array manipulation package\n", 30 | "import matplotlib.pyplot as plt # Plotting package\n", 31 | "import skimage.io as io # Image file manipulation module" 32 | ] 33 | } 34 | ], 35 | "metadata": { 36 | "kernelspec": { 37 | "display_name": "Python 2", 38 | "language": "python", 39 | "name": "python2" 40 | }, 41 | "language_info": { 42 | "codemirror_mode": { 43 | "name": "ipython", 44 | "version": 2 45 | }, 46 | "file_extension": ".py", 47 | "mimetype": "text/x-python", 48 | "name": "python", 49 | "nbconvert_exporter": "python", 50 | "pygments_lexer": "ipython2", 51 | "version": "2.7.11" 52 | } 53 | }, 54 | "nbformat": 4, 55 | "nbformat_minor": 0 56 | } 57 | -------------------------------------------------------------------------------- /pre_tutorial/.ipynb_checkpoints/tutorial-on-functions-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Python functions and modules" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## 1. Defining a Function\n", 15 | "\n", 16 | "You can define functions to provide the required functionality. Here are simple rules to define a function in Python and the syntax:\n", 17 | "\n", 18 | "- Function blocks begin with the keyword `def` followed by the function name, parentheses ( ( ) ) and a colon (:).\n", 19 | "- Any input parameters or arguments should be placed within the parentheses. You can also define parameters inside these parentheses.\n", 20 | "- Next is the first statement of the function, which can be an optional statement - the documentation string of the function or docstring.\n", 21 | "- The code block is next. It needs to be indented.\n", 22 | "- The statement `return [expression]` exits a function and, optionally, passes back an expression to the caller. A return statement with no arguments is the same as `return None`.\n", 23 | "\n", 24 | "```\n", 25 | "def functionname( parameters ):\n", 26 | " \"function_docstring\"\n", 27 | " function_suite\n", 28 | " return [expression]\n", 29 | "```" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "#### Example\n", 37 | "\n", 38 | "The following function takes a string as input parameter and prints it on standard screen." 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 10, 44 | "metadata": { 45 | "collapsed": true 46 | }, 47 | "outputs": [], 48 | "source": [ 49 | "def printme( str ):\n", 50 | " \"This prints a passed string into this function\"\n", 51 | " print str\n", 52 | " return" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "#### Example \n", 60 | "\n", 61 | "The following function takes two numbers and returns and prints their sum." 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": 11, 67 | "metadata": { 68 | "collapsed": true 69 | }, 70 | "outputs": [], 71 | "source": [ 72 | "def addme( a, b ):\n", 73 | " \"This adds passed arguments.\"\n", 74 | " return a+b" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "## 2. Calling a Function\n", 82 | "\n", 83 | "Defining a function only gives it a name, specifies the parameters that are to be included in the function and structures the blocks of code.\n", 84 | "\n", 85 | "Once the basic structure of a function is finalized, you can execute it by calling it directly from the Python prompt. Note that by default, parameters have a positional behavior and, if there is more than one, you need to input them in the same order that they were defined. " 86 | ] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "metadata": {}, 91 | "source": [ 92 | "#### Example" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": 12, 98 | "metadata": { 99 | "collapsed": false 100 | }, 101 | "outputs": [ 102 | { 103 | "name": "stdout", 104 | "output_type": "stream", 105 | "text": [ 106 | "I'm first call to user defined function!\n", 107 | "Again second call to the same function\n", 108 | "1 plus 2 is 3\n" 109 | ] 110 | } 111 | ], 112 | "source": [ 113 | "printme(\"I'm first call to user defined function!\")\n", 114 | "printme(\"Again second call to the same function\") \n", 115 | "print '1 plus 2 is', addme(1,2)" 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "## 3. Function Arguments\n", 123 | "\n", 124 | "Functions have different types of arguments:\n", 125 | "\n", 126 | "#### Required arguments\n", 127 | "\n", 128 | "Required arguments are the arguments passed to a function in correct positional order. Here, the number of arguments in the function call should match exactly with the function definition.\n", 129 | "\n", 130 | "To call the function `printme()`, you definitely need to pass one argument, otherwise it gives a syntax error.\n", 131 | "\n", 132 | "Similarly, for `addme()` you need to pass two arguments." 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": {}, 138 | "source": [ 139 | "#### Keyword arguments\n", 140 | "\n", 141 | "Keyword arguments are related to the function calls. When you use keyword arguments in a function call, the caller identifies the arguments by the parameter name.\n", 142 | "\n", 143 | "This allows you to skip arguments or place them out of order because the Python interpreter is able to use the keywords provided to match the values with parameters. \n", 144 | "\n", 145 | "For example, the following code:" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 13, 151 | "metadata": { 152 | "collapsed": false 153 | }, 154 | "outputs": [ 155 | { 156 | "name": "stdout", 157 | "output_type": "stream", 158 | "text": [ 159 | "My string\n" 160 | ] 161 | } 162 | ], 163 | "source": [ 164 | "printme( str = \"My string\")" 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": {}, 170 | "source": [ 171 | "The following example gives a more clear picture. Note that when keyword arguments are used, the order of parameters does not matter." 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": 14, 177 | "metadata": { 178 | "collapsed": false 179 | }, 180 | "outputs": [ 181 | { 182 | "name": "stdout", 183 | "output_type": "stream", 184 | "text": [ 185 | "Name: miki\n", 186 | "Age: 50\n" 187 | ] 188 | } 189 | ], 190 | "source": [ 191 | "# Function definition is here\n", 192 | "def printinfo( name, age ):\n", 193 | " \"This prints a passed info into this function\"\n", 194 | " print \"Name: \", name\n", 195 | " print \"Age: \", age\n", 196 | " return\n", 197 | "\n", 198 | "# Now you can call printinfo function\n", 199 | "printinfo( age=50, name=\"miki\" )" 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": {}, 205 | "source": [ 206 | "#### Default arguments\n", 207 | "\n", 208 | "A default argument is an argument that assumes a default value if a value is not provided in the function call for that argument. The following example gives an idea on default arguments, it prints default age if it is not passed −" 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": 15, 214 | "metadata": { 215 | "collapsed": false 216 | }, 217 | "outputs": [ 218 | { 219 | "name": "stdout", 220 | "output_type": "stream", 221 | "text": [ 222 | "Name: miki\n", 223 | "Age 50\n", 224 | "Name: miki\n", 225 | "Age 35\n" 226 | ] 227 | } 228 | ], 229 | "source": [ 230 | "# Function definition is here\n", 231 | "def printinfo( name, age = 35 ):\n", 232 | " \"This prints a passed info into this function\"\n", 233 | " print \"Name: \", name\n", 234 | " print \"Age \", age\n", 235 | " return;\n", 236 | "\n", 237 | "# Now you can call printinfo function\n", 238 | "printinfo( age=50, name=\"miki\" )\n", 239 | "printinfo( name=\"miki\" )" 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "metadata": {}, 245 | "source": [ 246 | "#### Variable-length arguments\n", 247 | "\n", 248 | "You may need to process a function for more arguments than you specified while defining the function. These arguments are called variable-length arguments and are not named in the function definition, unlike required and default arguments.\n", 249 | "\n", 250 | "Syntax for a function with non-keyword variable arguments is this −\n", 251 | "\n", 252 | "```\n", 253 | "def functionname([formal_args,] *var_args_tuple ):\n", 254 | " \"function_docstring\"\n", 255 | " function_suite\n", 256 | " return [expression]\n", 257 | "```\n", 258 | "\n", 259 | "An asterisk (*) is placed before the variable name that holds the values of all nonkeyword variable arguments. This tuple remains empty if no additional arguments are specified during the function call. " 260 | ] 261 | }, 262 | { 263 | "cell_type": "code", 264 | "execution_count": 16, 265 | "metadata": { 266 | "collapsed": false 267 | }, 268 | "outputs": [ 269 | { 270 | "name": "stdout", 271 | "output_type": "stream", 272 | "text": [ 273 | "Output is: \n", 274 | "10\n", 275 | "Output is: \n", 276 | "70\n", 277 | "60\n", 278 | "50\n" 279 | ] 280 | } 281 | ], 282 | "source": [ 283 | "# Function definition is here\n", 284 | "def printinfo( arg1, *vartuple ):\n", 285 | " \"This prints a variable passed arguments\"\n", 286 | " print \"Output is: \"\n", 287 | " print arg1\n", 288 | " for var in vartuple:\n", 289 | " print var\n", 290 | " return;\n", 291 | "\n", 292 | "# Now you can call printinfo function\n", 293 | "printinfo( 10 )\n", 294 | "printinfo( 70, 60, 50 )" 295 | ] 296 | }, 297 | { 298 | "cell_type": "markdown", 299 | "metadata": {}, 300 | "source": [ 301 | "## 4. Anonymous Functions\n", 302 | "\n", 303 | "These functions are called anonymous because they are not declared in the standard manner by using the `def` keyword. You can use the `lambda` keyword to create small anonymous functions.\n", 304 | "\n", 305 | "Lambda forms can take any number of arguments but return just one value in the form of an expression. They cannot contain commands or multiple expressions." 306 | ] 307 | }, 308 | { 309 | "cell_type": "markdown", 310 | "metadata": {}, 311 | "source": [ 312 | "#### Syntax\n", 313 | "\n", 314 | "The syntax of `lambda` functions contains only a single statement, which is as follows:\n", 315 | "\n", 316 | "`lambda [arg1 [,arg2,.....argn]]:expression`" 317 | ] 318 | }, 319 | { 320 | "cell_type": "code", 321 | "execution_count": 12, 322 | "metadata": { 323 | "collapsed": false 324 | }, 325 | "outputs": [ 326 | { 327 | "name": "stdout", 328 | "output_type": "stream", 329 | "text": [ 330 | "Value of total : 30\n", 331 | "Value of total : 40\n" 332 | ] 333 | } 334 | ], 335 | "source": [ 336 | "# Function definition is here - this function has two arguments and it adds them up\n", 337 | "sum = lambda arg1, arg2: arg1 + arg2;\n", 338 | "\n", 339 | "# Now you can call sum as a function\n", 340 | "print \"Value of total : \", sum( 10, 20 )\n", 341 | "print \"Value of total : \", sum( 20, 20 )" 342 | ] 343 | }, 344 | { 345 | "cell_type": "markdown", 346 | "metadata": {}, 347 | "source": [ 348 | "## 5. The return Statement\n", 349 | "\n", 350 | "We briefly used the return statement `return [expression]` in the above functions, but let's try to explain it more explicitly: It exits a function, optionally, passing back an expression to the caller. A return statement with no arguments is the same as `return None`." 351 | ] 352 | }, 353 | { 354 | "cell_type": "code", 355 | "execution_count": 17, 356 | "metadata": { 357 | "collapsed": false 358 | }, 359 | "outputs": [ 360 | { 361 | "name": "stdout", 362 | "output_type": "stream", 363 | "text": [ 364 | "-10\n" 365 | ] 366 | } 367 | ], 368 | "source": [ 369 | "# This function returns an expression\n", 370 | "def substractme( arg1, arg2 ):\n", 371 | " # Substracts the second parameter from the first and returns the result.\"\n", 372 | " total = arg1 - arg2\n", 373 | " return total;\n", 374 | "\n", 375 | "# Now you can call sum function\n", 376 | "total = substractme( 10, 20 );\n", 377 | "print total " 378 | ] 379 | }, 380 | { 381 | "cell_type": "markdown", 382 | "metadata": {}, 383 | "source": [ 384 | "Warning: The retuned arguments are also order-specific! See below:" 385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "execution_count": 18, 390 | "metadata": { 391 | "collapsed": false 392 | }, 393 | "outputs": [ 394 | { 395 | "name": "stdout", 396 | "output_type": "stream", 397 | "text": [ 398 | "3\n", 399 | "-1\n", 400 | "2\n", 401 | "\n", 402 | "\n", 403 | "3\n", 404 | "2\n", 405 | "-1\n" 406 | ] 407 | } 408 | ], 409 | "source": [ 410 | "# if you have a function:\n", 411 | "def arithmetic( a, b ):\n", 412 | " sumab = a+b\n", 413 | " substractab = a-b\n", 414 | " multiplyab = a*b\n", 415 | " return sumab, substractab, multiplyab\n", 416 | " \n", 417 | "# This\n", 418 | "c, d, e = arithmetic(1,2)\n", 419 | "print c\n", 420 | "print d\n", 421 | "print e\n", 422 | "print '\\n'\n", 423 | "\n", 424 | "# does not alocate the same values to the varialbes c, d, e as this\n", 425 | "c, e, d = arithmetic(1,2)\n", 426 | "print c\n", 427 | "print d\n", 428 | "print e" 429 | ] 430 | }, 431 | { 432 | "cell_type": "markdown", 433 | "metadata": {}, 434 | "source": [ 435 | "## 5. Scope of Variables\n", 436 | "\n", 437 | "All variables in a program may not be accessible at all locations in that program. This depends on where you have declared a variable. The scope of a variable determines the portion of the program where you can access a particular identifier.\n", 438 | "\n", 439 | "There are two basic scopes of variables in Python:\n", 440 | "\n", 441 | "Global variables\n", 442 | "\n", 443 | "Local variables\n", 444 | "\n", 445 | "\n", 446 | "#### Global vs. Local variables\n", 447 | "\n", 448 | "Variables that are defined inside a function body have a local scope, and those defined outside have a global scope.\n", 449 | "\n", 450 | "This means that local variables can be accessed only inside the function in which they are declared, whereas global variables can be accessed throughout the program body by all functions. When you call a function, the variables declared inside it are brought into scope. Following is a simple example −" 451 | ] 452 | }, 453 | { 454 | "cell_type": "code", 455 | "execution_count": 15, 456 | "metadata": { 457 | "collapsed": true 458 | }, 459 | "outputs": [], 460 | "source": [ 461 | "total = 0; # This is global variable." 462 | ] 463 | }, 464 | { 465 | "cell_type": "code", 466 | "execution_count": 17, 467 | "metadata": { 468 | "collapsed": false 469 | }, 470 | "outputs": [ 471 | { 472 | "name": "stdout", 473 | "output_type": "stream", 474 | "text": [ 475 | "Inside the function (local) total: -10\n", 476 | "Outside the function (global) total : 0\n" 477 | ] 478 | } 479 | ], 480 | "source": [ 481 | "# Function definition is here\n", 482 | "def substractme( arg1, arg2 ):\n", 483 | " # Substracts the second parameter from the first and return them.\"\n", 484 | " total = arg1 - arg2; # Here total is a local variable.\n", 485 | " print \"Inside the function (local) total: \", total \n", 486 | " return total;\n", 487 | "\n", 488 | "# Now you can call sum function\n", 489 | "substractme( 10, 20 );\n", 490 | "print \"Outside the function (global) total : \", total " 491 | ] 492 | }, 493 | { 494 | "cell_type": "markdown", 495 | "metadata": {}, 496 | "source": [ 497 | "## 6. Python Modules\n", 498 | "\n", 499 | "Simply, a module is a file consisting of Python code. A module can define functions, classes and variables. A module can also include runnable code.\n", 500 | "\n", 501 | "A module allows you to logically organise your Python code. Grouping related code into a module makes the code easier to understand and use. \n", 502 | "\n", 503 | "For example, in a new file, copy and paste the following, making sure you undertand the code. Name the file module_example.py:\n", 504 | "\n", 505 | "```\n", 506 | "def print_func( par ):\n", 507 | " print \"Hello : \", par\n", 508 | " return\n", 509 | "```" 510 | ] 511 | }, 512 | { 513 | "cell_type": "markdown", 514 | "metadata": {}, 515 | "source": [ 516 | "#### The import Statement\n", 517 | "\n", 518 | "You can use any Python source file as a module by executing an import statement in some other Python source file. The import has the following syntax:\n", 519 | "\n", 520 | "`import module1[, module2[,... moduleN]`\n", 521 | "\n", 522 | "When the interpreter encounters an import statement, it imports the module if the module is present in the search path or the current directory.\n", 523 | "\n", 524 | "For example, to import the module module_example.py, you need to use the following command:" 525 | ] 526 | }, 527 | { 528 | "cell_type": "code", 529 | "execution_count": 18, 530 | "metadata": { 531 | "collapsed": false 532 | }, 533 | "outputs": [ 534 | { 535 | "name": "stdout", 536 | "output_type": "stream", 537 | "text": [ 538 | "Hello : Zara\n" 539 | ] 540 | } 541 | ], 542 | "source": [ 543 | "# Import module support\n", 544 | "import module_example as modex\n", 545 | "\n", 546 | "# Now you can call defined functions of the module, as follows\n", 547 | "modex.print_func(\"Zara\")" 548 | ] 549 | }, 550 | { 551 | "cell_type": "markdown", 552 | "metadata": {}, 553 | "source": [ 554 | "A module is loaded only once, regardless of the number of times it is imported. This prevents the module execution from happening over and over again if multiple imports occur." 555 | ] 556 | }, 557 | { 558 | "cell_type": "markdown", 559 | "metadata": {}, 560 | "source": [ 561 | "#### The `from...import` statement and the `from...import *` statement:\n", 562 | "\n", 563 | "Python's `from import` statement lets you import specific attributes from a module into the current namespace. It has the following syntaxL\n", 564 | "\n", 565 | "`from modname import name1[, name2[, ... nameN]]`\n", 566 | "\n", 567 | "For example, to import the function `sum` from the module `scipy`, use the following statement:" 568 | ] 569 | }, 570 | { 571 | "cell_type": "code", 572 | "execution_count": 19, 573 | "metadata": { 574 | "collapsed": true 575 | }, 576 | "outputs": [], 577 | "source": [ 578 | "from scipy import sum as s" 579 | ] 580 | }, 581 | { 582 | "cell_type": "markdown", 583 | "metadata": {}, 584 | "source": [ 585 | "This statement does not import the entire module/package scipy into the current namespace; it just introduces the item `sum` from the module `scipy`. Note that in this example it is renamed to `s`.\n", 586 | "\n", 587 | "It is also possible to import all names from a module into the current namespace by using the following import statement:" 588 | ] 589 | }, 590 | { 591 | "cell_type": "code", 592 | "execution_count": 21, 593 | "metadata": { 594 | "collapsed": false, 595 | "scrolled": true 596 | }, 597 | "outputs": [], 598 | "source": [ 599 | "from scipy import *" 600 | ] 601 | }, 602 | { 603 | "cell_type": "markdown", 604 | "metadata": {}, 605 | "source": [ 606 | "#### References:\n", 607 | "\n", 608 | "http://www.tutorialspoint.com/python/ , \n", 609 | "https://github.com/tobyhodges/ITPP , \n", 610 | "https://github.com/cmci/HTManalysisCourse/blob/master/CentreCourseProtocol.md#workflow-python-primer , \n", 611 | "http://cmci.embl.de/documents/ijcourses" 612 | ] 613 | }, 614 | { 615 | "cell_type": "code", 616 | "execution_count": null, 617 | "metadata": { 618 | "collapsed": true 619 | }, 620 | "outputs": [], 621 | "source": [] 622 | } 623 | ], 624 | "metadata": { 625 | "kernelspec": { 626 | "display_name": "Python 2", 627 | "language": "python", 628 | "name": "python2" 629 | }, 630 | "language_info": { 631 | "codemirror_mode": { 632 | "name": "ipython", 633 | "version": 2 634 | }, 635 | "file_extension": ".py", 636 | "mimetype": "text/x-python", 637 | "name": "python", 638 | "nbconvert_exporter": "python", 639 | "pygments_lexer": "ipython2", 640 | "version": "2.7.11" 641 | } 642 | }, 643 | "nbformat": 4, 644 | "nbformat_minor": 0 645 | } 646 | -------------------------------------------------------------------------------- /pre_tutorial/README.md: -------------------------------------------------------------------------------- 1 | ## README Pre-Tutorial on Image Processing and Analysis with Python 2 | 3 | 4 | ### DESCRIPTION 5 | Here you can find a tutorial that introduces basic, but necessary, concepts of digital images and their manipulation with Python. It also has a short tutorial on the Python library NumPy and another on Python functions. 6 | 7 | This course assumes a basic knowledge of the Python Programming Language. 8 | 9 | 10 | ### REQUIREMENTS 11 | - Python 2.7 (we recommend the Anaconda distribution, which includes most of the required modules) 12 | - Modules: NumPy, SciPy, scikit-image, tifffile 13 | - A text/code editor 14 | 15 | 16 | ### PROGRAMMING CONCEPTS AND CONTENT DISCUSSED IN THIS TUTORIAL 17 | - The Python NumPy library to manipulate images 18 | - Making functions in Python 19 | - Manipulating images with Python 20 | 21 | 22 | ### IMAGE PROCESSING CONCEPTS AND CONTENT DISCUSSED IN THIS TUTORIAL 23 | - The numerical and array nature of digital images 24 | - Bit-depth, variable types and data types of images in Python 25 | - Grayness resolution, RGB format and look up tables 26 | - Image arithmetic and unexpected errors due to data type 27 | 28 | 29 | 30 | ### HOW TO FOLLOW THIS TUTORIAL 31 | - Files you should have: ipython notebooks titled `pre-tutorial.ipynb`, `arrays-and-numpy.ipynb` and `tutorial-on-functions.ipyn`. 32 | - To start the ipython notebooks, start a new terminal window and type “jupyter notebook”, then press enter. When a new browser window opens, navigate to the folder where you have saved the tutorials. Click on the tutorial that you want to follow. 33 | - Using Jupyter notebook: When you want to type code into a cell, simply click on it to activate it. When you want to run the code, press shift+enter. You can learn how to use Jupyter notebooks from, e.g., https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/ 34 | - If you are following this tutorial in class, if you have any questions, raise your hand and someone will come to help you. Otherwise, feel free to send your query to one of these two email addresses: 35 | jonas.hartmann@embl.de 36 | karin.sasaki@embl.de 37 | 38 | -------------------------------------------------------------------------------- /pre_tutorial/ext_nuc_AP2_beta_subunit.tif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karinsasaki/python-workshop-image-processing/a6de0424bb184e55a769801bbedec1c8ce02dd5a/pre_tutorial/ext_nuc_AP2_beta_subunit.tif -------------------------------------------------------------------------------- /pre_tutorial/figA.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karinsasaki/python-workshop-image-processing/a6de0424bb184e55a769801bbedec1c8ce02dd5a/pre_tutorial/figA.jpeg -------------------------------------------------------------------------------- /pre_tutorial/figC.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karinsasaki/python-workshop-image-processing/a6de0424bb184e55a769801bbedec1c8ce02dd5a/pre_tutorial/figC.png -------------------------------------------------------------------------------- /pre_tutorial/figD.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karinsasaki/python-workshop-image-processing/a6de0424bb184e55a769801bbedec1c8ce02dd5a/pre_tutorial/figD.png -------------------------------------------------------------------------------- /pre_tutorial/figE.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karinsasaki/python-workshop-image-processing/a6de0424bb184e55a769801bbedec1c8ce02dd5a/pre_tutorial/figE.png -------------------------------------------------------------------------------- /pre_tutorial/figF.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karinsasaki/python-workshop-image-processing/a6de0424bb184e55a769801bbedec1c8ce02dd5a/pre_tutorial/figF.png -------------------------------------------------------------------------------- /pre_tutorial/module_example.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | def print_func( par ): 4 | print "Hello : ", par 5 | return -------------------------------------------------------------------------------- /pre_tutorial/nuclei.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karinsasaki/python-workshop-image-processing/a6de0424bb184e55a769801bbedec1c8ce02dd5a/pre_tutorial/nuclei.png -------------------------------------------------------------------------------- /pre_tutorial/randimg.txt: -------------------------------------------------------------------------------- 1 | 8.900000000000000000e+01 9.700000000000000000e+01 6.600000000000000000e+01 1.420000000000000000e+02 2.100000000000000000e+02 4.300000000000000000e+01 2.270000000000000000e+02 2.340000000000000000e+02 9.100000000000000000e+01 1.470000000000000000e+02 2 | 1.390000000000000000e+02 2.320000000000000000e+02 2.240000000000000000e+02 9.100000000000000000e+01 1.250000000000000000e+02 2.480000000000000000e+02 1.940000000000000000e+02 1.910000000000000000e+02 1.660000000000000000e+02 1.000000000000000000e+00 3 | 0.000000000000000000e+00 1.290000000000000000e+02 1.130000000000000000e+02 2.510000000000000000e+02 2.500000000000000000e+02 1.000000000000000000e+02 2.310000000000000000e+02 1.320000000000000000e+02 1.990000000000000000e+02 8.000000000000000000e+00 4 | 2.200000000000000000e+01 8.700000000000000000e+01 1.750000000000000000e+02 7.000000000000000000e+01 2.450000000000000000e+02 1.520000000000000000e+02 3.200000000000000000e+01 1.780000000000000000e+02 9.600000000000000000e+01 2.700000000000000000e+01 5 | 7.000000000000000000e+00 7.300000000000000000e+01 5.800000000000000000e+01 9.600000000000000000e+01 2.460000000000000000e+02 1.210000000000000000e+02 2.020000000000000000e+02 2.130000000000000000e+02 2.250000000000000000e+02 1.120000000000000000e+02 6 | 2.470000000000000000e+02 2.320000000000000000e+02 2.070000000000000000e+02 2.300000000000000000e+01 1.450000000000000000e+02 1.140000000000000000e+02 1.260000000000000000e+02 1.210000000000000000e+02 2.440000000000000000e+02 4.900000000000000000e+01 7 | 2.010000000000000000e+02 2.600000000000000000e+01 1.930000000000000000e+02 1.850000000000000000e+02 2.030000000000000000e+02 1.130000000000000000e+02 1.830000000000000000e+02 1.020000000000000000e+02 7.400000000000000000e+01 1.130000000000000000e+02 8 | 2.020000000000000000e+02 3.600000000000000000e+01 1.180000000000000000e+02 2.520000000000000000e+02 1.840000000000000000e+02 6.800000000000000000e+01 2.410000000000000000e+02 1.000000000000000000e+02 2.700000000000000000e+01 1.680000000000000000e+02 9 | 1.960000000000000000e+02 4.800000000000000000e+01 1.920000000000000000e+02 2.100000000000000000e+01 6.200000000000000000e+01 1.680000000000000000e+02 1.050000000000000000e+02 2.300000000000000000e+01 1.570000000000000000e+02 2.150000000000000000e+02 10 | 8.600000000000000000e+01 7.900000000000000000e+01 1.320000000000000000e+02 2.210000000000000000e+02 1.100000000000000000e+01 1.800000000000000000e+01 1.610000000000000000e+02 2.100000000000000000e+01 1.750000000000000000e+02 8.600000000000000000e+01 11 | -------------------------------------------------------------------------------- /pre_tutorial/results.txt: -------------------------------------------------------------------------------- 1 | {"protein": ["AP2", "p150glued"], "intensity": [8.3106078793474918, 9.8909087411511241], "number": [154370, 140631, 95877]} -------------------------------------------------------------------------------- /pre_tutorial/tutorial-on-functions.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Python functions and modules" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## 1. Defining a Function\n", 15 | "\n", 16 | "You can define functions to provide the required functionality. Here are simple rules to define a function in Python and the syntax:\n", 17 | "\n", 18 | "- Function blocks begin with the keyword `def` followed by the function name, parentheses ( ( ) ) and a colon (:).\n", 19 | "- Any input parameters or arguments should be placed within the parentheses. You can also define parameters inside these parentheses.\n", 20 | "- Next is the first statement of the function, which can be an optional statement - the documentation string of the function or docstring.\n", 21 | "- The code block is next. It needs to be indented.\n", 22 | "- The statement `return [expression]` exits a function and, optionally, passes back an expression to the caller. A return statement with no arguments is the same as `return None`.\n", 23 | "\n", 24 | "```\n", 25 | "def functionname( parameters ):\n", 26 | " \"function_docstring\"\n", 27 | " function_suite\n", 28 | " return [expression]\n", 29 | "```" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "#### Example\n", 37 | "\n", 38 | "The following function takes a string as input parameter and prints it on standard screen." 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 10, 44 | "metadata": { 45 | "collapsed": true 46 | }, 47 | "outputs": [], 48 | "source": [ 49 | "def printme( str ):\n", 50 | " \"This prints a passed string into this function\"\n", 51 | " print str\n", 52 | " return" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "#### Example \n", 60 | "\n", 61 | "The following function takes two numbers and returns and prints their sum." 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": 11, 67 | "metadata": { 68 | "collapsed": true 69 | }, 70 | "outputs": [], 71 | "source": [ 72 | "def addme( a, b ):\n", 73 | " \"This adds passed arguments.\"\n", 74 | " return a+b" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "## 2. Calling a Function\n", 82 | "\n", 83 | "Defining a function only gives it a name, specifies the parameters that are to be included in the function and structures the blocks of code.\n", 84 | "\n", 85 | "Once the basic structure of a function is finalized, you can execute it by calling it directly from the Python prompt. Note that by default, parameters have a positional behavior and, if there is more than one, you need to input them in the same order that they were defined. " 86 | ] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "metadata": {}, 91 | "source": [ 92 | "#### Example" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": 12, 98 | "metadata": { 99 | "collapsed": false 100 | }, 101 | "outputs": [ 102 | { 103 | "name": "stdout", 104 | "output_type": "stream", 105 | "text": [ 106 | "I'm first call to user defined function!\n", 107 | "Again second call to the same function\n", 108 | "1 plus 2 is 3\n" 109 | ] 110 | } 111 | ], 112 | "source": [ 113 | "printme(\"I'm first call to user defined function!\")\n", 114 | "printme(\"Again second call to the same function\") \n", 115 | "print '1 plus 2 is', addme(1,2)" 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "## 3. Function Arguments\n", 123 | "\n", 124 | "Functions have different types of arguments:\n", 125 | "\n", 126 | "#### Required arguments\n", 127 | "\n", 128 | "Required arguments are the arguments passed to a function in correct positional order. Here, the number of arguments in the function call should match exactly with the function definition.\n", 129 | "\n", 130 | "To call the function `printme()`, you definitely need to pass one argument, otherwise it gives a syntax error.\n", 131 | "\n", 132 | "Similarly, for `addme()` you need to pass two arguments." 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": {}, 138 | "source": [ 139 | "#### Keyword arguments\n", 140 | "\n", 141 | "Keyword arguments are related to the function calls. When you use keyword arguments in a function call, the caller identifies the arguments by the parameter name.\n", 142 | "\n", 143 | "This allows you to skip arguments or place them out of order because the Python interpreter is able to use the keywords provided to match the values with parameters. \n", 144 | "\n", 145 | "For example, the following code:" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 13, 151 | "metadata": { 152 | "collapsed": false 153 | }, 154 | "outputs": [ 155 | { 156 | "name": "stdout", 157 | "output_type": "stream", 158 | "text": [ 159 | "My string\n" 160 | ] 161 | } 162 | ], 163 | "source": [ 164 | "printme( str = \"My string\")" 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": {}, 170 | "source": [ 171 | "The following example gives a more clear picture. Note that when keyword arguments are used, the order of parameters does not matter." 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": 14, 177 | "metadata": { 178 | "collapsed": false 179 | }, 180 | "outputs": [ 181 | { 182 | "name": "stdout", 183 | "output_type": "stream", 184 | "text": [ 185 | "Name: miki\n", 186 | "Age: 50\n" 187 | ] 188 | } 189 | ], 190 | "source": [ 191 | "# Function definition is here\n", 192 | "def printinfo( name, age ):\n", 193 | " \"This prints a passed info into this function\"\n", 194 | " print \"Name: \", name\n", 195 | " print \"Age: \", age\n", 196 | " return\n", 197 | "\n", 198 | "# Now you can call printinfo function\n", 199 | "printinfo( age=50, name=\"miki\" )" 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": {}, 205 | "source": [ 206 | "#### Default arguments\n", 207 | "\n", 208 | "A default argument is an argument that assumes a default value if a value is not provided in the function call for that argument. The following example gives an idea on default arguments, it prints default age if it is not passed −" 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": 15, 214 | "metadata": { 215 | "collapsed": false 216 | }, 217 | "outputs": [ 218 | { 219 | "name": "stdout", 220 | "output_type": "stream", 221 | "text": [ 222 | "Name: miki\n", 223 | "Age 50\n", 224 | "Name: miki\n", 225 | "Age 35\n" 226 | ] 227 | } 228 | ], 229 | "source": [ 230 | "# Function definition is here\n", 231 | "def printinfo( name, age = 35 ):\n", 232 | " \"This prints a passed info into this function\"\n", 233 | " print \"Name: \", name\n", 234 | " print \"Age \", age\n", 235 | " return;\n", 236 | "\n", 237 | "# Now you can call printinfo function\n", 238 | "printinfo( age=50, name=\"miki\" )\n", 239 | "printinfo( name=\"miki\" )" 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "metadata": {}, 245 | "source": [ 246 | "#### Variable-length arguments\n", 247 | "\n", 248 | "You may need to process a function for more arguments than you specified while defining the function. These arguments are called variable-length arguments and are not named in the function definition, unlike required and default arguments.\n", 249 | "\n", 250 | "Syntax for a function with non-keyword variable arguments is this −\n", 251 | "\n", 252 | "```\n", 253 | "def functionname([formal_args,] *var_args_tuple ):\n", 254 | " \"function_docstring\"\n", 255 | " function_suite\n", 256 | " return [expression]\n", 257 | "```\n", 258 | "\n", 259 | "An asterisk (*) is placed before the variable name that holds the values of all nonkeyword variable arguments. This tuple remains empty if no additional arguments are specified during the function call. " 260 | ] 261 | }, 262 | { 263 | "cell_type": "code", 264 | "execution_count": 16, 265 | "metadata": { 266 | "collapsed": false 267 | }, 268 | "outputs": [ 269 | { 270 | "name": "stdout", 271 | "output_type": "stream", 272 | "text": [ 273 | "Output is: \n", 274 | "10\n", 275 | "Output is: \n", 276 | "70\n", 277 | "60\n", 278 | "50\n" 279 | ] 280 | } 281 | ], 282 | "source": [ 283 | "# Function definition is here\n", 284 | "def printinfo( arg1, *vartuple ):\n", 285 | " \"This prints a variable passed arguments\"\n", 286 | " print \"Output is: \"\n", 287 | " print arg1\n", 288 | " for var in vartuple:\n", 289 | " print var\n", 290 | " return;\n", 291 | "\n", 292 | "# Now you can call printinfo function\n", 293 | "printinfo( 10 )\n", 294 | "printinfo( 70, 60, 50 )" 295 | ] 296 | }, 297 | { 298 | "cell_type": "markdown", 299 | "metadata": {}, 300 | "source": [ 301 | "## 4. Anonymous Functions\n", 302 | "\n", 303 | "These functions are called anonymous because they are not declared in the standard manner by using the `def` keyword. You can use the `lambda` keyword to create small anonymous functions.\n", 304 | "\n", 305 | "Lambda forms can take any number of arguments but return just one value in the form of an expression. They cannot contain commands or multiple expressions." 306 | ] 307 | }, 308 | { 309 | "cell_type": "markdown", 310 | "metadata": {}, 311 | "source": [ 312 | "#### Syntax\n", 313 | "\n", 314 | "The syntax of `lambda` functions contains only a single statement, which is as follows:\n", 315 | "\n", 316 | "`lambda [arg1 [,arg2,.....argn]]:expression`" 317 | ] 318 | }, 319 | { 320 | "cell_type": "code", 321 | "execution_count": 12, 322 | "metadata": { 323 | "collapsed": false 324 | }, 325 | "outputs": [ 326 | { 327 | "name": "stdout", 328 | "output_type": "stream", 329 | "text": [ 330 | "Value of total : 30\n", 331 | "Value of total : 40\n" 332 | ] 333 | } 334 | ], 335 | "source": [ 336 | "# Function definition is here - this function has two arguments and it adds them up\n", 337 | "sum = lambda arg1, arg2: arg1 + arg2;\n", 338 | "\n", 339 | "# Now you can call sum as a function\n", 340 | "print \"Value of total : \", sum( 10, 20 )\n", 341 | "print \"Value of total : \", sum( 20, 20 )" 342 | ] 343 | }, 344 | { 345 | "cell_type": "markdown", 346 | "metadata": {}, 347 | "source": [ 348 | "## 5. The return Statement\n", 349 | "\n", 350 | "We briefly used the return statement `return [expression]` in the above functions, but let's try to explain it more explicitly: It exits a function, optionally, passing back an expression to the caller. A return statement with no arguments is the same as `return None`." 351 | ] 352 | }, 353 | { 354 | "cell_type": "code", 355 | "execution_count": 17, 356 | "metadata": { 357 | "collapsed": false 358 | }, 359 | "outputs": [ 360 | { 361 | "name": "stdout", 362 | "output_type": "stream", 363 | "text": [ 364 | "-10\n" 365 | ] 366 | } 367 | ], 368 | "source": [ 369 | "# This function returns an expression\n", 370 | "def substractme( arg1, arg2 ):\n", 371 | " # Substracts the second parameter from the first and returns the result.\"\n", 372 | " total = arg1 - arg2\n", 373 | " return total;\n", 374 | "\n", 375 | "# Now you can call sum function\n", 376 | "total = substractme( 10, 20 );\n", 377 | "print total " 378 | ] 379 | }, 380 | { 381 | "cell_type": "markdown", 382 | "metadata": {}, 383 | "source": [ 384 | "Warning: The retuned arguments are also order-specific! See below:" 385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "execution_count": 18, 390 | "metadata": { 391 | "collapsed": false 392 | }, 393 | "outputs": [ 394 | { 395 | "name": "stdout", 396 | "output_type": "stream", 397 | "text": [ 398 | "3\n", 399 | "-1\n", 400 | "2\n", 401 | "\n", 402 | "\n", 403 | "3\n", 404 | "2\n", 405 | "-1\n" 406 | ] 407 | } 408 | ], 409 | "source": [ 410 | "# if you have a function:\n", 411 | "def arithmetic( a, b ):\n", 412 | " sumab = a+b\n", 413 | " substractab = a-b\n", 414 | " multiplyab = a*b\n", 415 | " return sumab, substractab, multiplyab\n", 416 | " \n", 417 | "# This\n", 418 | "c, d, e = arithmetic(1,2)\n", 419 | "print c\n", 420 | "print d\n", 421 | "print e\n", 422 | "print '\\n'\n", 423 | "\n", 424 | "# does not alocate the same values to the varialbes c, d, e as this\n", 425 | "c, e, d = arithmetic(1,2)\n", 426 | "print c\n", 427 | "print d\n", 428 | "print e" 429 | ] 430 | }, 431 | { 432 | "cell_type": "markdown", 433 | "metadata": {}, 434 | "source": [ 435 | "## 5. Scope of Variables\n", 436 | "\n", 437 | "All variables in a program may not be accessible at all locations in that program. This depends on where you have declared a variable. The scope of a variable determines the portion of the program where you can access a particular identifier.\n", 438 | "\n", 439 | "There are two basic scopes of variables in Python:\n", 440 | "\n", 441 | "Global variables\n", 442 | "\n", 443 | "Local variables\n", 444 | "\n", 445 | "\n", 446 | "#### Global vs. Local variables\n", 447 | "\n", 448 | "Variables that are defined inside a function body have a local scope, and those defined outside have a global scope.\n", 449 | "\n", 450 | "This means that local variables can be accessed only inside the function in which they are declared, whereas global variables can be accessed throughout the program body by all functions. When you call a function, the variables declared inside it are brought into scope. Following is a simple example −" 451 | ] 452 | }, 453 | { 454 | "cell_type": "code", 455 | "execution_count": 15, 456 | "metadata": { 457 | "collapsed": true 458 | }, 459 | "outputs": [], 460 | "source": [ 461 | "total = 0; # This is global variable." 462 | ] 463 | }, 464 | { 465 | "cell_type": "code", 466 | "execution_count": 17, 467 | "metadata": { 468 | "collapsed": false 469 | }, 470 | "outputs": [ 471 | { 472 | "name": "stdout", 473 | "output_type": "stream", 474 | "text": [ 475 | "Inside the function (local) total: -10\n", 476 | "Outside the function (global) total : 0\n" 477 | ] 478 | } 479 | ], 480 | "source": [ 481 | "# Function definition is here\n", 482 | "def substractme( arg1, arg2 ):\n", 483 | " # Substracts the second parameter from the first and return them.\"\n", 484 | " total = arg1 - arg2; # Here total is a local variable.\n", 485 | " print \"Inside the function (local) total: \", total \n", 486 | " return total;\n", 487 | "\n", 488 | "# Now you can call sum function\n", 489 | "substractme( 10, 20 );\n", 490 | "print \"Outside the function (global) total : \", total " 491 | ] 492 | }, 493 | { 494 | "cell_type": "markdown", 495 | "metadata": {}, 496 | "source": [ 497 | "## 6. Python Modules\n", 498 | "\n", 499 | "Simply, a module is a file consisting of Python code. A module can define functions, classes and variables. A module can also include runnable code.\n", 500 | "\n", 501 | "A module allows you to logically organise your Python code. Grouping related code into a module makes the code easier to understand and use. \n", 502 | "\n", 503 | "For example, in a new file, copy and paste the following, making sure you undertand the code. Name the file module_example.py:\n", 504 | "\n", 505 | "```\n", 506 | "def print_func( par ):\n", 507 | " print \"Hello : \", par\n", 508 | " return\n", 509 | "```" 510 | ] 511 | }, 512 | { 513 | "cell_type": "markdown", 514 | "metadata": {}, 515 | "source": [ 516 | "#### The import Statement\n", 517 | "\n", 518 | "You can use any Python source file as a module by executing an import statement in some other Python source file. The import has the following syntax:\n", 519 | "\n", 520 | "`import module1[, module2[,... moduleN]`\n", 521 | "\n", 522 | "When the interpreter encounters an import statement, it imports the module if the module is present in the search path or the current directory.\n", 523 | "\n", 524 | "For example, to import the module module_example.py, you need to use the following command:" 525 | ] 526 | }, 527 | { 528 | "cell_type": "code", 529 | "execution_count": 18, 530 | "metadata": { 531 | "collapsed": false 532 | }, 533 | "outputs": [ 534 | { 535 | "name": "stdout", 536 | "output_type": "stream", 537 | "text": [ 538 | "Hello : Zara\n" 539 | ] 540 | } 541 | ], 542 | "source": [ 543 | "# Import module support\n", 544 | "import module_example as modex\n", 545 | "\n", 546 | "# Now you can call defined functions of the module, as follows\n", 547 | "modex.print_func(\"Zara\")" 548 | ] 549 | }, 550 | { 551 | "cell_type": "markdown", 552 | "metadata": {}, 553 | "source": [ 554 | "A module is loaded only once, regardless of the number of times it is imported. This prevents the module execution from happening over and over again if multiple imports occur." 555 | ] 556 | }, 557 | { 558 | "cell_type": "markdown", 559 | "metadata": {}, 560 | "source": [ 561 | "#### The `from...import` statement and the `from...import *` statement:\n", 562 | "\n", 563 | "Python's `from import` statement lets you import specific attributes from a module into the current namespace. It has the following syntaxL\n", 564 | "\n", 565 | "`from modname import name1[, name2[, ... nameN]]`\n", 566 | "\n", 567 | "For example, to import the function `sum` from the module `scipy`, use the following statement:" 568 | ] 569 | }, 570 | { 571 | "cell_type": "code", 572 | "execution_count": 19, 573 | "metadata": { 574 | "collapsed": true 575 | }, 576 | "outputs": [], 577 | "source": [ 578 | "from scipy import sum as s" 579 | ] 580 | }, 581 | { 582 | "cell_type": "markdown", 583 | "metadata": {}, 584 | "source": [ 585 | "This statement does not import the entire module/package scipy into the current namespace; it just introduces the item `sum` from the module `scipy`. Note that in this example it is renamed to `s`.\n", 586 | "\n", 587 | "It is also possible to import all names from a module into the current namespace by using the following import statement:" 588 | ] 589 | }, 590 | { 591 | "cell_type": "code", 592 | "execution_count": 21, 593 | "metadata": { 594 | "collapsed": false, 595 | "scrolled": true 596 | }, 597 | "outputs": [], 598 | "source": [ 599 | "from scipy import *" 600 | ] 601 | }, 602 | { 603 | "cell_type": "markdown", 604 | "metadata": {}, 605 | "source": [ 606 | "#### References:\n", 607 | "\n", 608 | "http://www.tutorialspoint.com/python/ , \n", 609 | "https://github.com/tobyhodges/ITPP , \n", 610 | "https://github.com/cmci/HTManalysisCourse/blob/master/CentreCourseProtocol.md#workflow-python-primer , \n", 611 | "http://cmci.embl.de/documents/ijcourses" 612 | ] 613 | }, 614 | { 615 | "cell_type": "code", 616 | "execution_count": null, 617 | "metadata": { 618 | "collapsed": true 619 | }, 620 | "outputs": [], 621 | "source": [] 622 | } 623 | ], 624 | "metadata": { 625 | "kernelspec": { 626 | "display_name": "Python 2", 627 | "language": "python", 628 | "name": "python2" 629 | }, 630 | "language_info": { 631 | "codemirror_mode": { 632 | "name": "ipython", 633 | "version": 2 634 | }, 635 | "file_extension": ".py", 636 | "mimetype": "text/x-python", 637 | "name": "python", 638 | "nbconvert_exporter": "python", 639 | "pygments_lexer": "ipython2", 640 | "version": "2.7.11" 641 | } 642 | }, 643 | "nbformat": 4, 644 | "nbformat_minor": 0 645 | } 646 | --------------------------------------------------------------------------------