├── 1. Basics.ipynb
├── 2. OpenCV - Basics.ipynb
├── 3. Operations on Images .ipynb
├── 4. Image Processing.ipynb
├── 5. Feature Detection.ipynb
├── 6. Video Analysis.ipynb
├── 7. Camera Calibration and 3D Reconstruction.ipynb
├── 8. Image Denoising and Inpainting.ipynb
├── Barcode Detector.ipynb
├── Digit Recognition.ipynb
├── Document Scanner.ipynb
├── Face Detection using Haar Cascades .ipynb
├── FloodFill.ipynb
├── Human Detection.ipynb
├── Optical Character Recognition using K Nearest Neighbours.ipynb
├── Optical Character Recognition using Support Vector Machines.ipynb
├── Path Planning and Obstacle Detection.ipynb
├── README.md
├── Shape Detection.ipynb
├── Texture Flow.ipynb
├── Zooming.ipynb
├── captures
│   ├── 1.png
│   ├── ataritm.png
│   ├── barcodedetection.png
│   ├── batman1.png
│   ├── br.png
│   ├── br2.png
│   ├── bthresh.png
│   ├── circles.png
│   ├── ck.png
│   ├── closing.png
│   ├── corners2.png
│   ├── cs.png
│   ├── denoise1.png
│   ├── digitrecognizer.png
│   ├── facedetection.png
│   ├── featuresmatched.png
│   ├── featuresmatched2.png
│   ├── floodfill.png
│   ├── foreground.png
│   ├── gradient.png
│   ├── hc.png
│   ├── hista.png
│   ├── histb.png
│   ├── histc.png
│   ├── histd.png
│   ├── histe.png
│   ├── humandetection1.png
│   ├── humandetection2.png
│   ├── imagepyramid.png
│   ├── joker2.png
│   ├── mask.png
│   ├── mpl.png
│   ├── opticalflow.png
│   ├── original.png
│   ├── res.png
│   ├── resizedip.png
│   ├── sat.png
│   ├── shapes.png
│   ├── shapesdetected.png
│   ├── shapesntext.jpg
│   ├── shapesthresh.png
│   ├── sparrow.png
│   ├── sparrows.png
│   ├── step1ds.png
│   ├── step2ds.png
│   ├── step3ds.png
│   ├── thres1.png
│   ├── trackbar.png
│   └── zoom.png
├── datasets
│   ├── digits.png
│   └── letter-recognition.data
├── images
│   ├── 9ball.jpg
│   ├── K.JPG
│   ├── S.JPG
│   ├── atari.jpg
│   ├── atarit.png
│   ├── batman.png
│   ├── boundingrect.png
│   ├── building.jpg
│   ├── calib_pattern.jpg
│   ├── calib_radial.jpg
│   ├── calib_result.jpg
│   ├── camshift_result.jpg
│   ├── chess.png
│   ├── circumcircle.png
│   ├── coins.png
│   ├── contour.jpg
│   ├── contourapprox.jpg
│   ├── contours.png
│   ├── cube.png
│   ├── cubeedge.jpg
│   ├── cubeedge.png
│   ├── denoise.png
│   ├── denoisedimage.jpg
│   ├── diamond.png
│   ├── fast_kp.jpg
│   ├── filter.jpg
│   ├── fitellipse.png
│   ├── fitline.jpg
│   ├── floodfillshapes.png
│   ├── flower - Copy.jpg
│   ├── flower.jpg
│   ├── google.jpg
│   ├── googlelogo.jpg
│   ├── grabcut_output1.jpg
│   ├── grad.png
│   ├── grad2.png
│   ├── gray.jpg
│   ├── grayscale.jpg
│   ├── hd.png
│   ├── hdigits.jpg
│   ├── high_contrast.jpg
│   ├── hist.png
│   ├── ij1.jpg
│   ├── im2.jpg
│   ├── im3.jpg
│   ├── im4.jpg
│   ├── image1.jpg
│   ├── image2.jpg
│   ├── inpaint_result.jpg
│   ├── invertedstar.png
│   ├── joker (1).png
│   ├── joker.png
│   ├── left08.jpg
│   ├── letters.JPG
│   ├── m1left.jpg
│   ├── m2right.jpg
│   ├── messi.png
│   ├── minion1.jpg
│   ├── minionleft.jpg
│   ├── minionright.jpg
│   ├── minions.jpg
│   ├── noise.jpg
│   ├── noise.png
│   ├── noise1.jpg
│   ├── noiseimage.jpg
│   ├── noisyim.jpg
│   ├── opencv_logo.jpg
│   ├── photo_1.jpg
│   ├── photo_2.jpg
│   ├── pokemon_games.png
│   ├── pose_1.jpg
│   ├── pose_2.jpg
│   ├── rect.png
│   ├── sat_noisy.jpg
│   ├── shapes.png
│   ├── shitomasi_block1.jpg
│   ├── sift_keypoints.jpg
│   ├── skew.png
│   ├── star.png
│   ├── star2.png
│   ├── starry_night.jpg
│   ├── surf_kp1.jpg
│   ├── surf_kp2.jpg
│   ├── template.jpg
│   ├── th.png
│   ├── th2.png
│   ├── th3.png
│   ├── th4.png
│   ├── triangle.png
│   ├── water_dt.jpg
│   ├── water_fgbg.jpg
│   ├── water_marker.jpg
│   ├── water_result.jpg
│   └── water_thresh.jpg
├── results
│   ├── Directblending.jpg
│   ├── Pyramidblending.jpg
│   ├── building1.png
│   ├── chessboard1.png
│   └── textureflow.jpg
└── videos
    ├── mean_shift.webm
    ├── meanshiftoutput.mp4
    ├── people-walking.mp4
    ├── slow_traffic.mp4
    └── sparrow.mp4
/1. Basics.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Programming Computer Vision : Basics\n",
8 | "___"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "## Reading Images\n",
16 | "Images can be read using Image class of Python library **PIL** [(Python Imaging Library)](http://www.pythonware.com/products/pil/)."
17 | ]
18 | },
19 | {
20 | "cell_type": "code",
21 | "execution_count": null,
22 | "metadata": {
23 | "collapsed": true
24 | },
25 | "outputs": [],
26 | "source": [
27 | "from PIL import Image\n",
28 | "im = Image.open('images/flower.jpg')"
29 | ]
30 | },
31 | {
32 | "cell_type": "markdown",
33 | "metadata": {},
34 | "source": [
35 | "The return value 'im' is a PIL image object. Thus the following image would be read.\n",
36 | "
"
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "___\n",
44 | "## Color conversions\n",
45 | "We can use the `convert()` method for Color conversions. An image can be converted to grayscale using the .convert('L') function where 'L' simply is a mode that defines images as 8-bit pixels of black & white. To learn about other modes, you can visit http://pillow.readthedocs.org/en/3.1.x/handbook/concepts.html.\n",
46 | "The library supports transformations between each supported mode and the 'L' and 'RGB' modes. To convert between other modes, you may have to use an intermediate image (typically an “RGB” image)."
47 | ]
48 | },
49 | {
50 | "cell_type": "code",
51 | "execution_count": null,
52 | "metadata": {
53 | "collapsed": true
54 | },
55 | "outputs": [],
56 | "source": [
57 | "gray = im.convert('L')\n",
58 | "gray.show()"
59 | ]
60 | },
61 | {
62 | "cell_type": "markdown",
63 | "metadata": {},
64 | "source": [
65 | "
"
66 | ]
67 | },
68 | {
69 | "cell_type": "markdown",
70 | "metadata": {},
71 | "source": [
72 | "___\n",
73 | "## Enhancement\n",
74 | "The ImageEnhance module can be used for image enhancement. Once created from an image, an enhancement object can be used to quickly try out different settings.\n",
75 | "You can adjust contrast, brightness, color balance and sharpness in this way."
76 | ]
77 | },
78 | {
79 | "cell_type": "code",
80 | "execution_count": null,
81 | "metadata": {
82 | "collapsed": true
83 | },
84 | "outputs": [],
85 | "source": [
86 | "from PIL import ImageEnhance\n",
87 | "\n",
88 | "enh = ImageEnhance.Contrast(im)\n",
89 | "enh.enhance(1.4).show(\"30% more contrast\")\n"
90 | ]
91 | },
92 | {
93 | "cell_type": "markdown",
94 | "metadata": {},
95 | "source": [
96 | "
"
97 | ]
98 | },
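{
"cell_type": "markdown",
"metadata": {},
"source": [
"The other enhancers follow the same pattern. Here is a minimal sketch; the factor values below are arbitrary choices:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# ImageEnhance was imported above; factors are arbitrary examples\n",
"ImageEnhance.Brightness(im).enhance(1.2).show()  # 20% brighter\n",
"ImageEnhance.Sharpness(im).enhance(2.0).show()   # sharpened\n",
"ImageEnhance.Color(im).enhance(0.5).show()       # desaturated"
]
},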
99 | {
100 | "cell_type": "markdown",
101 | "metadata": {},
102 | "source": [
103 | "___\n",
104 | "\n",
105 | "## Converting into other file format"
106 | ]
107 | },
108 | {
109 | "cell_type": "code",
110 | "execution_count": null,
111 | "metadata": {
112 | "collapsed": true
113 | },
114 | "outputs": [],
115 | "source": [
116 | "from __future__ import print_function\n",
117 | "import os, sys\n",
118 | "from PIL import Image\n",
119 | "\n",
120 | "def convertToJPEG():\n",
121 | " for infile in sys.argv[1:]:\n",
122 | " f, e = os.path.splitext(infile)\n",
123 | " outfile = f + \".jpg\"\n",
124 | " if infile != outfile:\n",
125 | " try:\n",
126 | " Image.open(infile).save(outfile)\n",
127 | " except IOError:\n",
128 | " print(\"cannot convert\", infile)"
129 | ]
130 | },
131 | {
132 | "cell_type": "markdown",
133 | "metadata": {},
134 | "source": [
135 | "This is a function that converts the images in our specified file format. The PIL function open() creates a PIL image object and the save() method saves the image to a file with the given filename.\n",
136 | "___\n",
137 | "## Creating Thumbnails"
138 | ]
139 | },
140 | {
141 | "cell_type": "code",
142 | "execution_count": null,
143 | "metadata": {
144 | "collapsed": true
145 | },
146 | "outputs": [],
147 | "source": [
148 | "from __future__ import print_function\n",
149 | "import os, sys\n",
150 | "from PIL import Image\n",
151 | "\n",
152 | "size = (128, 128)\n",
153 | "\n",
154 | "def createThumbnails():\n",
155 | " for infile in sys.argv[1:]:\n",
156 | " outfile = os.path.splitext(infile)[0] + \".thumbnail\"\n",
157 | " if infile != outfile:\n",
158 | " try:\n",
159 | " im = Image.open(infile)\n",
160 | " im.thumbnail(size)\n",
161 | " im.save(outfile, \"JPEG\")\n",
162 | " except IOError:\n",
163 | " print(\"cannot create thumbnail for\", infile)\n"
164 | ]
165 | },
166 | {
167 | "cell_type": "markdown",
168 | "metadata": {},
169 | "source": [
170 | "The thumbnail() method takes a tuple specifying the new size and converts the image to a thumbnail image with size that fits within the tuple.\n",
171 | "\n",
172 | "___\n",
173 | "## Copy and paste regions\n",
174 | "Cropping a region from an image is done using the crop() method."
175 | ]
176 | },
177 | {
178 | "cell_type": "code",
179 | "execution_count": null,
180 | "metadata": {
181 | "collapsed": true
182 | },
183 | "outputs": [],
184 | "source": [
185 | "box = (100,100,400,400)\n",
186 | "region = im.crop(box)"
187 | ]
188 | },
189 | {
190 | "cell_type": "markdown",
191 | "metadata": {},
192 | "source": [
193 | "\n",
194 | "The region is defined by a 4-tuple, where coordinates are (left, upper, right, lower). PIL uses a coordinate system with (0, 0) in the upper left corner. The extracted region can for example be rotated and then put back using the paste() method like this:"
195 | ]
196 | },
197 | {
198 | "cell_type": "code",
199 | "execution_count": null,
200 | "metadata": {
201 | "collapsed": true
202 | },
203 | "outputs": [],
204 | "source": [
205 | "region = region.transpose(Image.ROTATE_180)\n",
206 | "im.paste(region,box)\n"
207 | ]
208 | },
209 | {
210 | "cell_type": "markdown",
211 | "metadata": {},
212 | "source": [
213 | "____\n",
214 | "## Resize and rotate\n",
215 | "To resize an image, call resize() with a tuple giving the new size."
216 | ]
217 | },
218 | {
219 | "cell_type": "code",
220 | "execution_count": null,
221 | "metadata": {
222 | "collapsed": true
223 | },
224 | "outputs": [],
225 | "source": [
226 | "out = im.resize((128,128))"
227 | ]
228 | },
229 | {
230 | "cell_type": "markdown",
231 | "metadata": {},
232 | "source": [
233 | "To rotate an image, use counter clockwise angles and rotate() like this:"
234 | ]
235 | },
236 | {
237 | "cell_type": "code",
238 | "execution_count": null,
239 | "metadata": {
240 | "collapsed": true
241 | },
242 | "outputs": [],
243 | "source": [
244 | "out = im.rotate(45)"
245 | ]
246 | },
247 | {
248 | "cell_type": "markdown",
249 | "metadata": {},
250 | "source": [
251 | "___\n",
252 | "\n",
253 | "## Using Matplotlib to plot images, points and lines:"
254 | ]
255 | },
256 | {
257 | "cell_type": "code",
258 | "execution_count": 4,
259 | "metadata": {
260 | "collapsed": true
261 | },
262 | "outputs": [],
263 | "source": [
264 | "from PIL import Image\n",
265 | "from pylab import *\n",
266 | "# read image to array\n",
267 | "im = array(Image.open('images/flower.jpg'))\n",
268 | "# plot the image\n",
269 | "imshow(im)\n",
270 | "# some points\n",
271 | "x = [100,100,400,400]\n",
272 | "y = [200,500,200,500]\n",
273 | "# plot the points with red star-markers\n",
274 | "plot(x,y,'r*')\n",
275 | "# line plot connecting the first two points\n",
276 | "plot(x[:2],y[:2])\n",
277 | "# add title and show the plot\n",
278 | "title('Plotting: \"flower.jpg')\n",
279 | "show()"
280 | ]
281 | },
282 | {
283 | "cell_type": "markdown",
284 | "metadata": {},
285 | "source": [
286 | "
"
287 | ]
288 | },
289 | {
290 | "cell_type": "markdown",
291 | "metadata": {},
292 | "source": [
293 | "### Image Contours"
294 | ]
295 | },
296 | {
297 | "cell_type": "code",
298 | "execution_count": 7,
299 | "metadata": {
300 | "collapsed": false
301 | },
302 | "outputs": [
303 | {
304 | "data": {
305 | "text/plain": [
306 | "(0.0, 400.0, 0.0, 300.0)"
307 | ]
308 | },
309 | "execution_count": 7,
310 | "metadata": {},
311 | "output_type": "execute_result"
312 | }
313 | ],
314 | "source": [
315 | "from PIL import Image\n",
316 | "from pylab import *\n",
317 | "# read image to array\n",
318 | "im = array(Image.open('images/flower.jpg').convert('L'))\n",
319 | "# create a new figure\n",
320 | "figure()\n",
321 | "# don’t use colors\n",
322 | "gray()\n",
323 | "# show contours with origin upper left corner\n",
324 | "contour(im, origin='image')\n",
325 | "axis('equal')\n",
326 | "axis('off')\n"
327 | ]
328 | },
329 | {
330 | "cell_type": "markdown",
331 | "metadata": {},
332 | "source": [
333 | "
\n",
334 | "### Histograms:"
335 | ]
336 | },
337 | {
338 | "cell_type": "code",
339 | "execution_count": 8,
340 | "metadata": {
341 | "collapsed": true
342 | },
343 | "outputs": [],
344 | "source": [
345 | "figure()\n",
346 | "hist(im.flatten(),128)\n",
347 | "show()"
348 | ]
349 | },
350 | {
351 | "cell_type": "markdown",
352 | "metadata": {},
353 | "source": [
354 | "
\n",
355 | "This shows the distribution of pixel values. A number of bins is specified for the span of values and each bin gets a count of how many pixels have values in the bin’s range. The visualization of the (graylevel) image histogram is done using the hist() function.\n",
356 | "The second argument specifies the number of bins to use. Note that the image needs to be flattened first, because hist() takes a one-dimensional array as input. The method flatten() converts any array to a one-dimensional array with values taken row-wise.\n",
357 | "___\n",
358 | "\n",
359 | "## Graylevel transforms using NumPy"
360 | ]
361 | },
362 | {
363 | "cell_type": "code",
364 | "execution_count": 9,
365 | "metadata": {
366 | "collapsed": true
367 | },
368 | "outputs": [],
369 | "source": [
370 | "from PIL import Image\n",
371 | "from numpy import *\n",
372 | "im = array(Image.open('images/flower.jpg').convert('L'))\n",
373 | "im2 = 255 - im #invert image\n",
374 | "im3 = (100.0/255) * im + 100 #clamp to interval 100...200\n",
375 | "im4 = 255.0 * (im/255.0)**2 #squared"
376 | ]
377 | },
378 | {
379 | "cell_type": "markdown",
380 | "metadata": {},
381 | "source": [
382 | "Converting these numpy arrays back into our grayscale images:"
383 | ]
384 | },
385 | {
386 | "cell_type": "code",
387 | "execution_count": null,
388 | "metadata": {
389 | "collapsed": true
390 | },
391 | "outputs": [],
392 | "source": [
393 | "npim2 = Image.fromarray(uint8(im2))\n",
394 | "npim2.show()\n",
395 | "npim3 = Image.fromarray(uint8(im3))\n",
396 | "npim3.show()\n",
397 | "npim4 = Image.fromarray(uint8(im4))\n",
398 | "npim4.show()"
399 | ]
400 | },
401 | {
402 | "cell_type": "markdown",
403 | "metadata": {},
404 | "source": [
405 | "Thus the three transformed grayscale images can be compared as follows:\n",
406 | "
"
407 | ]
408 | },
409 | {
410 | "cell_type": "markdown",
411 | "metadata": {},
412 | "source": [
413 | "___\n",
414 | "## Image De-noising\n",
415 | "Image de-noising is the process of removing image noise while at the same time trying to preserve details and structures"
416 | ]
417 | },
418 | {
419 | "cell_type": "code",
420 | "execution_count": null,
421 | "metadata": {
422 | "collapsed": true
423 | },
424 | "outputs": [],
425 | "source": [
426 | "from numpy import *\n",
427 | "\n",
428 | "def denoise(im, U_init, tolerance=0.1, tau=0.125, tv_weight=100):\n",
429 | " \"\"\" An implementation of the Rudin-Osher-Fatemi (ROF) denoising model\n",
430 | " using the numerical procedure presented in Eq. (11) of A. Chambolle\n",
431 | " (2005). Implemented using periodic boundary conditions \n",
432 | " (essentially turning the rectangular image domain into a torus!).\n",
433 | " \n",
434 | " Input:\n",
435 | " im - noisy input image (grayscale)\n",
436 | " U_init - initial guess for U\n",
437 | " tv_weight - weight of the TV-regularizing term\n",
438 | " tau - steplength in the Chambolle algorithm\n",
439 | " tolerance - tolerance for determining the stop criterion\n",
440 | " \n",
441 | " Output:\n",
442 | " U - denoised and detextured image (also the primal variable)\n",
443 | " T - texture residual\"\"\"\n",
444 | " \n",
445 | " #---Initialization\n",
446 | " m,n = im.shape #size of noisy image\n",
447 | "\n",
448 | " U = U_init\n",
449 | " Px = im #x-component to the dual field\n",
450 | " Py = im #y-component of the dual field\n",
451 | " error = 1 \n",
452 | " iteration = 0\n",
453 | "\n",
454 | " #---Main iteration\n",
455 | " while (error > tolerance):\n",
456 | " Uold = U\n",
457 | "\n",
458 | " #Gradient of primal variable\n",
459 | " LyU = vstack((U[1:,:],U[0,:])) #Left translation w.r.t. the y-direction\n",
460 | " LxU = hstack((U[:,1:],U.take([0],axis=1))) #Left translation w.r.t. the x-direction\n",
461 | "\n",
462 | " GradUx = LxU-U #x-component of U's gradient\n",
463 | " GradUy = LyU-U #y-component of U's gradient\n",
464 | "\n",
465 | " #First we update the dual varible\n",
466 | " PxNew = Px + (tau/tv_weight)*GradUx #Non-normalized update of x-component (dual)\n",
467 | " PyNew = Py + (tau/tv_weight)*GradUy #Non-normalized update of y-component (dual)\n",
468 | " NormNew = maximum(1,sqrt(PxNew**2+PyNew**2))\n",
469 | "\n",
470 | " Px = PxNew/NormNew #Update of x-component (dual)\n",
471 | " Py = PyNew/NormNew #Update of y-component (dual)\n",
472 | "\n",
473 | " #Then we update the primal variable\n",
474 | " RxPx =hstack((Px.take([-1],axis=1),Px[:,0:-1])) #Right x-translation of x-component\n",
475 | " RyPy = vstack((Py[-1,:],Py[0:-1,:])) #Right y-translation of y-component\n",
476 | " DivP = (Px-RxPx)+(Py-RyPy) #Divergence of the dual field.\n",
477 | " U = im + tv_weight*DivP #Update of the primal variable\n",
478 | "\n",
479 | " #Update of error-measure\n",
480 | " error = linalg.norm(U-Uold)/sqrt(n*m);\n",
481 | " iteration += 1;\n",
482 | "\n",
483 | " print iteration, error\n",
484 | "\n",
485 | " #The texture residual\n",
486 | " T = im - U\n",
487 | " print 'Number of ROF iterations: ', iteration\n",
488 | " \n",
489 | " return U,T"
490 | ]
491 | },
492 | {
493 | "cell_type": "markdown",
494 | "metadata": {},
495 | "source": [
496 | "In this example, we used the function roll(), which as the name suggests, \"rolls\" the values of an array cyclically around an axis. This is very convenient for computing neighbor differences, in this case for derivatives. We also used linalg.norm() which measures the difference between two arrays (in this case the image matrices U and Uold)\n",
497 | "\n",
498 | "We can now use the denoise function to remove noise from a real image This is the image to be tested:\n",
499 | "
"
500 | ]
501 | },
502 | {
503 | "cell_type": "code",
504 | "execution_count": null,
505 | "metadata": {
506 | "collapsed": true
507 | },
508 | "outputs": [],
509 | "source": [
510 | "from PIL import Image\n",
511 | "import pylab\n",
512 | "\n",
513 | "im = array(Image.open('images/noiseimage.jpg').convert('L'))\n",
514 | "U,T = denoise(im,im)\n",
515 | "\n",
516 | "pylab.figure()\n",
517 | "pylab.gray()\n",
518 | "pylab.imshow(U)\n",
519 | "pylab.axis('equal')\n",
520 | "pylab.axis('off')\n",
521 | "pylab.show()"
522 | ]
523 | },
524 | {
525 | "cell_type": "markdown",
526 | "metadata": {},
527 | "source": [
528 | "The resulting de-noised image is:\n",
529 | "
\n",
530 | " ___"
531 | ]
532 | },
533 | {
534 | "cell_type": "markdown",
535 | "metadata": {},
536 | "source": [
537 | "Thus we are done with the basics of Computer Vision. Next we would level up a bit by exploring the OpenCV library.\n",
538 | "
"
539 | ]
540 | }
541 | ],
542 | "metadata": {
543 | "kernelspec": {
544 | "display_name": "Python 3",
545 | "language": "python",
546 | "name": "python3"
547 | },
548 | "language_info": {
549 | "codemirror_mode": {
550 | "name": "ipython",
551 | "version": 3
552 | },
553 | "file_extension": ".py",
554 | "mimetype": "text/x-python",
555 | "name": "python",
556 | "nbconvert_exporter": "python",
557 | "pygments_lexer": "ipython3",
558 | "version": "3.5.2"
559 | }
560 | },
561 | "nbformat": 4,
562 | "nbformat_minor": 0
563 | }
564 |
--------------------------------------------------------------------------------
/2. OpenCV - Basics.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# OpenCV : Basics\n",
8 | "\n",
9 | "___"
10 | ]
11 | },
12 | {
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
16 | "OpenCV (Open Source Computer Vision) is an image and video processing library of programming functions mainly aimed at real-time computer vision. OpenCV has bindings in C++, C, Python, Java and MATLAB/OCTAVE.\n",
17 | "\n",
18 | "**Applications** of OpenCV include variety of image and video analysis techniques like :"
19 | ]
20 | },
21 | {
22 | "cell_type": "markdown",
23 | "metadata": {},
24 | "source": [
25 | "\n",
26 | "- Egomotion
\n",
27 | "- Facial recognition system
\n",
28 | "- Gesture recognition
\n",
29 | "- Human–computer interaction (HCI)
\n",
30 | "- Mobile robotics
\n",
31 | "- Object Recognition
\n",
32 | "- Image Segmentation and recognition
\n",
33 | "- Stereopsis stereo vision: depth perception from 2 cameras
\n",
34 | "- Structure from motion (SFM)
\n",
35 | "- Motion tracking
\n",
36 | "- Augmented reality
\n",
37 | "- Optical Character Recognition
\n",
38 | "
\n"
39 | ]
40 | },
41 | {
42 | "cell_type": "markdown",
43 | "metadata": {},
44 | "source": [
45 | "and lots of others.\n",
46 | "\n",
47 | "___\n",
48 | "\n",
49 | "In the next set of examples, we will primarily be working on Python. Installing OpenCV for python requires two main libraries, with an optional third. Below Python packages are to be downloaded and installed to their default locations.\n",
50 | "\n",
51 | "1. Python-2.7.x.\n",
52 | "2. Numpy.\n",
53 | "3. Matplotlib (Matplotlib is optional, but recommended since we use it a lot in our tutorials).\n",
54 | "\n",
55 | "## Windows Users:\n",
56 | "Download the appropriate wheel (.whl) file of opencv for your corresponding operating system from https://www.lfd.uci.edu/~gohlke/pythonlibs/#opencv\n",
57 | "\n",
58 | "Then open Command Prompt and direct to the Scripts folder and install the modules using pip:\n",
59 | ">> `C:/Python34/Scripts`\n",
60 | "\n",
61 | ">> `pip install _youropencvwhlfile_.whl`\n",
62 | "\n",
63 | ">> `pip install numpy`\n",
64 | "\n",
65 | ">> `pip install matplotlib`\n",
66 | "\n",
67 | "If this method doesn't work, here's an alternative : \n",
68 | "\n",
69 | "* Download latest OpenCV release from [here](http://sourceforge.net/projects/opencvlibrary/files/opencv-win/2.4.6/OpenCV-2.4.6.0.exe/download) and double-click to extract it.\n",
70 | "* Goto opencv/build/python/2.7 folder.\n",
71 | "* Copy cv2.pyd to C:/Python27/lib/site-packages.\n",
72 | "* Open Python IDLE and type following codes in Python terminal."
73 | ]
74 | },
75 | {
76 | "cell_type": "code",
77 | "execution_count": null,
78 | "metadata": {
79 | "collapsed": false
80 | },
81 | "outputs": [],
82 | "source": [
83 | "import cv2\n",
84 | "print cv2.__version__"
85 | ]
86 | },
87 | {
88 | "cell_type": "markdown",
89 | "metadata": {},
90 | "source": [
91 | "If the results are printed out without any errors then you have successfully installed OpenCV-Python.\n",
92 | "\n",
93 | "## Linux / Mac Users:\n",
94 | ">> `pip3 install numpy` or `apt-get install python3-numpy`.\n",
95 | "\n",
96 | "You may need to apt-get install python3-pip.\n",
97 | "\n",
98 | ">> `pip3 install matplotlib` or `apt-get install python3-matplotlib`.\n",
99 | "\n",
100 | ">> `apt-get install python-OpenCV`.\n",
101 | "\n",
102 | "Matplotlib is an optional choice for visualizing video or image frames . Numpy will be primarily used for its array functionality. Finally, we will be using the python-specific bindings for OpenCV called python-OpenCV.\n",
103 | "\n",
104 | "**[Here](http://www.pyimagesearch.com/2016/10/24/ubuntu-16-04-how-to-install-opencv/)'s an alternative solution to build and install OpenCV in Ubuntu.**\n",
105 | "\n",
106 | "\n",
107 | "Once installed, Run the following python module imports:"
108 | ]
109 | },
110 | {
111 | "cell_type": "code",
112 | "execution_count": 3,
113 | "metadata": {
114 | "collapsed": true
115 | },
116 | "outputs": [],
117 | "source": [
118 | "import cv2\n",
119 | "import matplotlib\n",
120 | "import numpy"
121 | ]
122 | },
123 | {
124 | "cell_type": "markdown",
125 | "metadata": {},
126 | "source": [
127 | "If there are no errors then we are good to go!\n",
128 | "___"
129 | ]
130 | },
131 | {
132 | "cell_type": "markdown",
133 | "metadata": {},
134 | "source": [
135 | "## Getting started with images"
136 | ]
137 | },
138 | {
139 | "cell_type": "markdown",
140 | "metadata": {},
141 | "source": [
142 | "### Reading an image:\n",
143 | "\n",
144 | "Use the function _cv2.imread()_ to read an image. The image should be in the working directory or a full path of image should be given. I highly encourage you to use your own images as examples to increase fun as well as the learning curve.\n",
145 | "\n",
146 | "Second argument is a flag which specifies the way image should be read.\n",
147 | "\n",
148 | "cv2.IMREAD_COLOR : Loads a color image. Any transparency of image will be neglected. It is the default flag.\n",
149 | "cv2.IMREAD_GRAYSCALE : Loads image in grayscale mode\n",
150 | "cv2.IMREAD_UNCHANGED : Loads image as such including alpha channel"
151 | ]
152 | },
153 | {
154 | "cell_type": "code",
155 | "execution_count": 8,
156 | "metadata": {
157 | "collapsed": true
158 | },
159 | "outputs": [],
160 | "source": [
161 | "import numpy as np\n",
162 | "import cv2\n",
163 | "# Load an color image in grayscale\n",
164 | "img = cv2.imread('images/flower.jpg',0)\n",
165 | "# Warning: Even if the image path is wrong, it won’t throw any error, but print img will give you None\n"
166 | ]
167 | },
168 | {
169 | "cell_type": "markdown",
170 | "metadata": {},
171 | "source": [
172 | "### Displaying an image:\n",
173 | "Use the function _cv2.imshow()_ to display an image in a window. The window automatically fits to the image size.\n",
174 | "First argument is a window name which is a string. second argument is our image. "
175 | ]
176 | },
177 | {
178 | "cell_type": "code",
179 | "execution_count": 9,
180 | "metadata": {
181 | "collapsed": true
182 | },
183 | "outputs": [],
184 | "source": [
185 | "cv2.imshow('image',img)\n",
186 | "cv2.waitKey(0)\n",
187 | "cv2.destroyAllWindows()"
188 | ]
189 | },
190 | {
191 | "cell_type": "markdown",
192 | "metadata": {},
193 | "source": [
194 | "A GUI will open as a result and would look like:\n",
195 | "
"
196 | ]
197 | },
198 | {
199 | "cell_type": "markdown",
200 | "metadata": {},
201 | "source": [
202 | "_cv2.waitKey()_ is a keyboard binding function. Its argument is the time in milliseconds. The function waits for\n",
203 | "specified milliseconds for any keyboard event. If you press any key in that time, the program continues. If 0 is passed,\n",
204 | "it waits indefinitely for a key stroke. It can also be set to detect specific key strokes like, if key a is pressed etc which\n",
205 | "we will discuss below.\n",
206 | "\n",
207 | "_cv2.destroyAllWindows()_ simply destroys all the windows we created. If you want to destroy any specific window,\n",
208 | "use the function cv2.destroyWindow() where you pass the exact window name as the argument.\n",
209 | "\n",
210 | "Note: There is a special case where you can already create a window and load image to it later. In that case, you can\n",
211 | "specify whether window is resizable or not. It is done with the function cv2.namedWindow(). By default, the flag is\n",
212 | "cv2.WINDOW_AUTOSIZE. But if you specify flag to be cv2.WINDOW_NORMAL, you can resize window. It will be\n",
213 | "helpful when image is too large in dimension and adding track bar to windows.\n",
214 | "\n",
215 | "This can be done using:"
216 | ]
217 | },
218 | {
219 | "cell_type": "code",
220 | "execution_count": 11,
221 | "metadata": {
222 | "collapsed": false
223 | },
224 | "outputs": [],
225 | "source": [
226 | "cv2.namedWindow('image', cv2.WINDOW_NORMAL)\n",
227 | "cv2.imshow('image',img)\n",
228 | "cv2.waitKey(0)\n",
229 | "cv2.destroyAllWindows()"
230 | ]
231 | },
232 | {
233 | "cell_type": "markdown",
234 | "metadata": {},
235 | "source": [
236 | "### Write an image\n",
237 | "Use the function _cv2.imwrite()_ to save an image.\n",
238 | "First argument is the file name, second argument is the image you want to save."
239 | ]
240 | },
241 | {
242 | "cell_type": "code",
243 | "execution_count": 12,
244 | "metadata": {
245 | "collapsed": false
246 | },
247 | "outputs": [
248 | {
249 | "data": {
250 | "text/plain": [
251 | "True"
252 | ]
253 | },
254 | "execution_count": 12,
255 | "metadata": {},
256 | "output_type": "execute_result"
257 | }
258 | ],
259 | "source": [
260 | "cv2.imwrite('flowergray.png',img)\n",
261 | "# This will save the image in PNG format in the working directory."
262 | ]
263 | },
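{
"cell_type": "markdown",
"metadata": {},
"source": [
"Tying these together: as mentioned above, cv2.waitKey() can also branch on a specific key. A minimal sketch (re-using the grayscale img from before): exit on ESC, or save a copy when 's' is pressed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"cv2.imshow('image',img)\n",
"k = cv2.waitKey(0) & 0xFF\n",
"if k == 27:          # ESC key: exit without saving\n",
"    cv2.destroyAllWindows()\n",
"elif k == ord('s'):  # 's' key: save a copy, then exit\n",
"    cv2.imwrite('flowergray.png',img)\n",
"    cv2.destroyAllWindows()"
]
},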
264 | {
265 | "cell_type": "markdown",
266 | "metadata": {},
267 | "source": [
268 | "___\n",
269 | "## Getting started with Videos\n"
270 | ]
271 | },
272 | {
273 | "cell_type": "markdown",
274 | "metadata": {},
275 | "source": [
276 | "OpenCV provides a very simple interface to capture live stream with our own cameras. \n",
277 | "\n",
278 | "To capture a video, you need to create a VideoCapture object. Its argument can be either the device index or the name\n",
279 | "of a video file. Device index is just the number to specify which camera. \n",
280 | "If there are multiple cameras connected to your computer passing index as 0 or -1 would start the first camera; passing 1 as index would start the second camera and so on.\n",
281 | "\n",
282 | "After starting the respective camera, you can capture frame-by-frame. And at the end of capturing, we release the capture."
283 | ]
284 | },
285 | {
286 | "cell_type": "code",
287 | "execution_count": null,
288 | "metadata": {
289 | "collapsed": true
290 | },
291 | "outputs": [],
292 | "source": [
293 | "import numpy as np\n",
294 | "import cv2\n",
295 | "cap = cv2.VideoCapture(0)\n",
296 | "while(True):\n",
297 | " # Capture frame-by-frame\n",
298 | " ret, frame = cap.read()\n",
299 | " # Our operations on the frame come here\n",
300 | " gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)\n",
301 | " # Display the resulting frame\n",
302 | " cv2.imshow('frame',gray)\n",
303 | " if cv2.waitKey(1) & 0xFF == ord('q'):\n",
304 | " break\n",
305 | "# When everything done, release the capture\n",
306 | "cap.release()\n",
307 | "cv2.destroyAllWindows()"
308 | ]
309 | },
310 | {
311 | "cell_type": "markdown",
312 | "metadata": {},
313 | "source": [
314 | "This will capture a\n",
315 | "video from the camera (in this case the in-built webcam of my laptop), convert it into grayscale video and display it.\n",
316 | "\n",
317 | "cap.read() returns a bool (True/False). If frame is read correctly, it will be True. So you can check end of the\n",
318 | "video by checking this return value.\n",
319 | "Sometimes, cap may not have initialized the capture. In that case, this code shows error. You can check whether it is\n",
320 | "initialized or not by the method cap.isOpened(). If it is True, OK. Otherwise open it using cap.open()."
321 | ]
322 | },
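{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of that check (assuming the built-in camera at device index 0):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import cv2\n",
"cap = cv2.VideoCapture(0)\n",
"if not cap.isOpened():\n",
"    cap.open(0)  # try to (re)initialize device 0\n",
"print(cap.isOpened())\n",
"cap.release()"
]
},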
323 | {
324 | "cell_type": "markdown",
325 | "metadata": {},
326 | "source": [
327 | "### Playing Video from file\n",
328 | "It is same as capturing from Camera, just change camera index with video file name. Also while displaying the frame,\n",
329 | "use appropriate time for _cv2.waitKey()_. If it is too less, video will be very fast and if it is too high, video will be\n",
330 | "slow (Well, that is how you can display videos in slow motion). 25 milliseconds will be OK in normal cases"
331 | ]
332 | },
333 | {
334 | "cell_type": "code",
335 | "execution_count": null,
336 | "metadata": {
337 | "collapsed": true
338 | },
339 | "outputs": [],
340 | "source": [
341 | "import numpy as np\n",
342 | "import cv2\n",
343 | "cap = cv2.VideoCapture('videos/people-walking.mp4')\n",
344 | "while(cap.isOpened()):\n",
345 | " ret, frame = cap.read()\n",
346 | " gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)\n",
347 | " cv2.imshow('frame',gray)\n",
348 | " if cv2.waitKey(1) & 0xFF == ord('q'):\n",
349 | " break\n",
350 | "cap.release()\n",
351 | "cv2.destroyAllWindows()"
352 | ]
353 | },
354 | {
355 | "cell_type": "markdown",
356 | "metadata": {},
357 | "source": [
358 | "### Saving a Video\n",
359 | "What if we want to save the video after we capture it and process it frame-by-frame? For images, it is very simple, just\n",
360 | "use cv2.imwrite().\n",
361 | "\n",
362 | "This time we create a VideoWriter object. We should specify the output file name (eg: output.avi). Then we should\n",
363 | "specify the FourCC code . Then number of frames per second (fps) and frame size should\n",
364 | "be passed. And last one is isColor flag. If it is True, encoder expect color frame, otherwise it works with grayscale\n",
365 | "frame.\n",
366 | "\n",
367 | "FourCC is a 4-byte code used to specify the video codec. The list of available codes can be found in fourcc.org. It is\n",
368 | "platform dependent.\n",
369 | "\n",
370 | "FourCC code is passed as cv2.VideoWriter_fourcc(’M’,’J’,’P’,’G’) or\n",
371 | "cv2.VideoWriter_fourcc(*’MJPG) for MJPG.\n",
372 | "\n",
373 | "Below code captures from a Camera, flip every frame in vertical direction and saves it."
374 | ]
375 | },
376 | {
377 | "cell_type": "code",
378 | "execution_count": null,
379 | "metadata": {
380 | "collapsed": true
381 | },
382 | "outputs": [],
383 | "source": [
384 | "import numpy as np\n",
385 | "import cv2\n",
386 | "cap = cv2.VideoCapture(0)\n",
387 | "# Define the codec and create VideoWriter object\n",
388 | "fourcc = cv2.VideoWriter_fourcc(*'XVID')\n",
389 | "out = cv2.VideoWriter('output.avi',fourcc, 20.0, (640,480))\n",
390 | "while(cap.isOpened()):\n",
391 | " ret, frame = cap.read()\n",
392 | " if ret==True:\n",
393 | " frame = cv2.flip(frame,0)\n",
394 | " # write the flipped frame\n",
395 | " out.write(frame)\n",
396 | " cv2.imshow('frame',frame)\n",
397 | " if cv2.waitKey(1) & 0xFF == ord('q'):\n",
398 | " break\n",
399 | " else:\n",
400 | " break\n",
401 | "# Release everything if job is finished\n",
402 | "cap.release()\n",
403 | "out.release()\n",
404 | "cv2.destroyAllWindows()"
405 | ]
406 | },
407 | {
408 | "cell_type": "markdown",
409 | "metadata": {},
410 | "source": [
411 | "___"
412 | ]
413 | },
414 | {
415 | "cell_type": "markdown",
416 | "metadata": {},
417 | "source": [
418 | "## Drawing and Writing Text on Images\n",
419 | "To draw different shapes using OpenCV we would be using functions like:\n",
420 | "_cv2.line(), cv2.circle() , cv2.rectangle(), cv2.ellipse() etc\n",
421 | "\n",
422 | "In all the above functions, you will see some common arguments as given below:\n",
423 | "* img : The image where you want to draw the shapes\n",
424 | "* color : Color of the shape. for BGR, pass it as a tuple, eg: (255,0,0) for blue. For grayscale, just pass the scalar value.\n",
425 | "* thickness : Thickness of the line or circle etc. If -1 is passed for closed figures like circles, it will fill the shape. default thickness = 1\n",
426 | "* lineType : Type of line, whether 8-connected, anti-aliased line etc. By default, it is 8-connected. cv2.LINE_AA gives anti-aliased line which looks great for curves.\n",
427 | "\n",
428 | "To add text to images you need to specify following things:\n",
429 | "* Text data that you want to write \n",
430 | "* Position coordinates of where you want put it (i.e. bottom-left corner where data starts). \n",
431 | "* Font type (Check cv2.putText() docs for supported fonts)\n",
432 | "* Font Scale (specifies the size of font)\n",
433 | "* regular things like color, thickness, lineType etc. For better look, lineType = cv2.LINE_AA is recommended.\n"
434 | ]
435 | },
436 | {
437 | "cell_type": "code",
438 | "execution_count": 5,
439 | "metadata": {
440 | "collapsed": true
441 | },
442 | "outputs": [],
443 | "source": [
444 | "import numpy as np\n",
445 | "import cv2\n",
446 | "\n",
447 | "img = cv2.imread('images/flower.jpg',cv2.IMREAD_COLOR)\n",
448 | "\n",
449 | "cv2.line(img,(0,0),(150,150),(255,255,255),15) # line\n",
450 | "# To draw a line, you need to pass starting and ending coordinates of line.\n",
451 | "\n",
452 | "cv2.rectangle(img,(15,25),(200,150),(0,0,255),15) # red rect \n",
453 | "# To draw a rectangle, you need top-left corner and bottom-right corner of rectangle.\n",
454 | "\n",
455 | "cv2.circle(img,(100,63), 55, (0,255,0), -1) #circle\n",
456 | "# To draw a circle, you need its center coordinates and radius.\n",
457 | "\n",
458 | "cv2.ellipse(img,(256,256),(100,50),0,0,180,255,-1) #elipse\n",
459 | "# To draw the ellipse, we need to pass follwing arguments : 1.center location (x,y); 2.axes lengths (major axis length, minor axis length).\n",
460 | "# then the angle of rotation of ellipse in anti-clockwise direction.\n",
461 | "# startAngle and endAngle denotes the starting and ending of ellipse arc measured in clockwise direction from major axis.\n",
462 | "# i.e. giving values 0 and 360 gives the full ellipse\n",
463 | "\n",
464 | "pts = np.array([[10,5],[20,30],[70,20],[50,10]], np.int32) # polygon\n",
465 | "# To draw a polygon, first you need coordinates of vertices. Make those points into an array of shape ROWSx1x2 where\n",
466 | "# ROWS are number of vertices and it should be of type int32.\n",
467 | "pts = pts.reshape((-1,1,2))\n",
468 | "cv2.polylines(img, [pts], True, (0,255,255), 3)\n",
469 | "\n",
470 | "# writing\n",
471 | "font = cv2.FONT_HERSHEY_SIMPLEX\n",
472 | "cv2.putText(img,'Text!',(0,130), font, 1, (200,255,155), 2, cv2.LINE_AA)\n",
473 | "\n",
474 | "cv2.imshow('image',img)\n",
475 | "cv2.waitKey(0)\n",
476 | "cv2.destroyAllWindows()\n",
477 | "\n"
478 | ]
479 | },
480 | {
481 | "cell_type": "markdown",
482 | "metadata": {},
483 | "source": [
484 | "The output would be : \n",
485 | "
"
486 | ]
487 | },
488 | {
489 | "cell_type": "markdown",
490 | "metadata": {},
491 | "source": [
492 | "## Using Mouse as a Paint Brush:"
493 | ]
494 | },
495 | {
496 | "cell_type": "markdown",
497 | "metadata": {},
498 | "source": [
499 | "Here, we create a simple application which draws a circle on an image wherever we double-click on it.\n",
500 | "\n",
501 | "First we create a mouse callback function which is executed when a mouse event take place. Mouse event can be\n",
502 | "anything related to mouse like left-button down, left-button up, left-button double-click etc. It gives us the coordinates\n",
503 | "(x,y) for every mouse event. With this event and location, we can do whatever we like. To list all available events\n",
504 | "available, run the following code in Python terminal:"
505 | ]
506 | },
507 | {
508 | "cell_type": "code",
509 | "execution_count": 6,
510 | "metadata": {
511 | "collapsed": false
512 | },
513 | "outputs": [
514 | {
515 | "name": "stdout",
516 | "output_type": "stream",
517 | "text": [
518 | "['EVENT_FLAG_ALTKEY', 'EVENT_FLAG_CTRLKEY', 'EVENT_FLAG_LBUTTON', 'EVENT_FLAG_MBUTTON', 'EVENT_FLAG_RBUTTON', 'EVENT_FLAG_SHIFTKEY', 'EVENT_LBUTTONDBLCLK', 'EVENT_LBUTTONDOWN', 'EVENT_LBUTTONUP', 'EVENT_MBUTTONDBLCLK', 'EVENT_MBUTTONDOWN', 'EVENT_MBUTTONUP', 'EVENT_MOUSEHWHEEL', 'EVENT_MOUSEMOVE', 'EVENT_MOUSEWHEEL', 'EVENT_RBUTTONDBLCLK', 'EVENT_RBUTTONDOWN', 'EVENT_RBUTTONUP']\n"
519 | ]
520 | }
521 | ],
522 | "source": [
523 | "import cv2\n",
524 | "events = [i for i in dir(cv2) if 'EVENT' in i]\n",
525 | "print events"
526 | ]
527 | },
528 | {
529 | "cell_type": "markdown",
530 | "metadata": {},
531 | "source": [
532 | "Creating mouse callback function has a specific format which is same everywhere. It differs only in what the function\n",
533 | "does. So our mouse callback function does one thing, it draws a circle where we double-click."
534 | ]
535 | },
536 | {
537 | "cell_type": "code",
538 | "execution_count": null,
539 | "metadata": {
540 | "collapsed": true
541 | },
542 | "outputs": [],
543 | "source": [
544 | "import cv2\n",
545 | "import numpy as np\n",
546 | "# mouse callback function\n",
547 | "def draw_circle(event,x,y,flags,param):\n",
548 | " if event == cv2.EVENT_LBUTTONDBLCLK:\n",
549 | " cv2.circle(img,(x,y),100,(255,0,0),-1)\n",
550 | " \n",
551 | "# Create a black image, a window and bind the function to window\n",
552 | "img = np.zeros((512,512,3), np.uint8)\n",
553 | "cv2.namedWindow('image')\n",
554 | "cv2.setMouseCallback('image',draw_circle)\n",
555 | "\n",
556 | "while(1):\n",
557 | " cv2.imshow('image',img)\n",
558 | " if cv2.waitKey(20) & 0xFF == 27:\n",
559 | " break\n",
560 | "cv2.destroyAllWindows()"
561 | ]
562 | },
563 | {
564 | "cell_type": "markdown",
565 | "metadata": {},
566 | "source": [
567 | "Now we go for much more better application. In this, we draw either rectangles or circles (depending on the mode we\n",
568 | "select) by dragging the mouse like we do in Paint application. So our mouse callback function has two parts, one to\n",
569 | "draw rectangle and other to draw the circles. This specific example will be really helpful in creating and understanding\n",
570 | "some interactive applications like object tracking, image segmentation etc."
571 | ]
572 | },
573 | {
574 | "cell_type": "code",
575 | "execution_count": null,
576 | "metadata": {
577 | "collapsed": true
578 | },
579 | "outputs": [],
580 | "source": [
581 | "import cv2\n",
582 | "import numpy as np\n",
583 | "drawing = False # true if mouse is pressed\n",
584 | "mode = True # if True, draw rectangle. Press 'm' to toggle to curve\n",
585 | "ix,iy = -1,-1\n",
586 | "# mouse callback function\n",
587 | "def draw_circle(event,x,y,flags,param):\n",
588 | " global ix,iy,drawing,mode\n",
589 | " if event == cv2.EVENT_LBUTTONDOWN:\n",
590 | " drawing = True\n",
591 | " ix,iy = x,y\n",
592 | " elif event == cv2.EVENT_MOUSEMOVE:\n",
593 | " if drawing == True:\n",
594 | " if mode == True:\n",
595 | " cv2.rectangle(img,(ix,iy),(x,y),(0,255,0),-1)\n",
596 | " else:\n",
597 | " cv2.circle(img,(x,y),5,(0,0,255),-1)\n",
598 | " elif event == cv2.EVENT_LBUTTONUP:\n",
599 | " drawing = False\n",
600 | " if mode == True:\n",
601 | " cv2.rectangle(img,(ix,iy),(x,y),(0,255,0),-1)\n",
602 | " else:\n",
603 | " cv2.circle(img,(x,y),5,(0,0,255),-1)\n",
604 | "# Next we have to bind this mouse callback function to OpenCV window. In the main loop, we should set a keyboard binding for key ‘m’ to toggle between rectangle and circle. \n",
605 | "\n",
606 | "img = np.zeros((512,512,3), np.uint8)\n",
607 | "cv2.namedWindow('image')\n",
608 | "cv2.setMouseCallback('image',draw_circle)\n",
609 | "while(1):\n",
610 | " cv2.imshow('image',img)\n",
611 | " k = cv2.waitKey(1) & 0xFF\n",
612 | " if k == ord('m'):\n",
613 | " mode = not mode\n",
614 | " elif k == 27:\n",
615 | " break\n",
616 | "cv2.destroyAllWindows()"
617 | ]
618 | },
619 | {
620 | "cell_type": "markdown",
621 | "metadata": {},
622 | "source": [
623 | "## Trackbar as the Color Palette"
624 | ]
625 | },
626 | {
627 | "cell_type": "markdown",
628 | "metadata": {},
629 | "source": [
630 | "Here we will create a simple application which shows the color you specify. You have a window which shows the\n",
631 | "color and three trackbars to specify each of B,G,R colors. You slide the trackbar and correspondingly window color\n",
632 | "changes. By default, initial color will be set to Black.\n",
633 | "\n",
634 | "For _cv2.getTrackbarPos()_ function, first argument is the trackbar name, second one is the window name to which it is\n",
635 | "attached, third argument is the default value, fourth one is the maximum value and fifth one is the callback function which is executed everytime trackbar value changes. The callback function always has a default argument which is\n",
636 | "the trackbar position. In our case, function does nothing, so we simply pass.\n",
637 | "\n",
638 | "Another important application of trackbar is to use it as a button or switch. OpenCV, by default, doesn’t have button\n",
639 | "functionality. So you can use trackbar to get such functionality. In our application, we have created one switch in\n",
640 | "which application works only if switch is ON, otherwise screen is always black."
641 | ]
642 | },
643 | {
644 | "cell_type": "code",
645 | "execution_count": null,
646 | "metadata": {
647 | "collapsed": true
648 | },
649 | "outputs": [],
650 | "source": [
651 | "import cv2\n",
652 | "import numpy as np\n",
653 | "\n",
654 | "def nothing(x):\n",
655 | " pass\n",
656 | "\n",
657 | "# Create a black image, a window\n",
658 | "img = np.zeros((300,512,3), np.uint8)\n",
659 | "cv2.namedWindow('image')\n",
660 | "\n",
661 | "# create trackbars for color change\n",
662 | "cv2.createTrackbar('R','image',0,255,nothing)\n",
663 | "cv2.createTrackbar('G','image',0,255,nothing)\n",
664 | "cv2.createTrackbar('B','image',0,255,nothing)\n",
665 | "\n",
666 | "# create switch for ON/OFF functionality\n",
667 | "switch = '0 : OFF \\n1 : ON'\n",
668 | "cv2.createTrackbar(switch, 'image',0,1,nothing)\n",
669 | "\n",
670 | "while(1):\n",
671 | " cv2.imshow('image',img)\n",
672 | " k = cv2.waitKey(1) & 0xFF\n",
673 | " if k == 27:\n",
674 | " break\n",
675 | " # get current positions of four trackbars\n",
676 | " r = cv2.getTrackbarPos('R','image')\n",
677 | " g = cv2.getTrackbarPos('G','image')\n",
678 | " b = cv2.getTrackbarPos('B','image')\n",
679 | " s = cv2.getTrackbarPos(switch,'image')\n",
680 | " \n",
681 | " if s == 0:\n",
682 | " img[:] = 0\n",
683 | " else:\n",
684 | " img[:] = [b,g,r]\n",
685 | " \n",
686 | "cv2.destroyAllWindows()\n"
687 | ]
688 | },
689 | {
690 | "cell_type": "markdown",
691 | "metadata": {},
692 | "source": [
693 | "Our application would look something like this:\n",
694 | "\n",
695 | "
"
696 | ]
697 | },
698 | {
699 | "cell_type": "markdown",
700 | "metadata": {
701 | "collapsed": true
702 | },
703 | "source": [
704 | "___"
705 | ]
706 | }
707 | ],
708 | "metadata": {
709 | "kernelspec": {
710 | "display_name": "Python 3",
711 | "language": "python",
712 | "name": "python3"
713 | },
714 | "language_info": {
715 | "codemirror_mode": {
716 | "name": "ipython",
717 | "version": 3
718 | },
719 | "file_extension": ".py",
720 | "mimetype": "text/x-python",
721 | "name": "python",
722 | "nbconvert_exporter": "python",
723 | "pygments_lexer": "ipython3",
724 | "version": "3.5.2"
725 | }
726 | },
727 | "nbformat": 4,
728 | "nbformat_minor": 0
729 | }
730 |
--------------------------------------------------------------------------------
/7. Camera Calibration and 3D Reconstruction.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "deletable": true,
7 | "editable": true
8 | },
9 | "source": [
10 | " Camera Calibration and 3D Reconstruction
\n",
11 | "___"
12 | ]
13 | },
14 | {
15 | "cell_type": "markdown",
16 | "metadata": {
17 | "deletable": true,
18 | "editable": true
19 | },
20 | "source": [
21 | "\n",
22 | "# Camera Calibration\n",
23 | "The everyday used pinhole cameras introduce a lot of distortion to images. Two major distortions are radial distortion and tangential distortion.\n",
24 | "\n",
25 | "Due to radial distortion, straight lines will appear curved. Its effect is more as we move away from the center of image. For example, one image is shown below, where two edges of a chess board are marked with red lines. But you can see that border is not a straight line and doesn't match with the red line. All the expected straight lines are bulged out.\n",
26 | "
"
27 | ]
28 | },
29 | {
30 | "cell_type": "markdown",
31 | "metadata": {
32 | "deletable": true,
33 | "editable": true
34 | },
35 | "source": [
36 | "To understand these distortions in depth and get a matematical understanding, you can visit: Distortion(optics)\n",
37 | "\n",
38 | "For the stereo applications, these distortions need to be corrected first. To find the intrinsic and extrinsic parameters of camera, we have to provide some sample images of a well defined pattern like a chess board. We find some specific points in it ( square corners in chess board). We know its coordinates in real world space and we know its coordinates in image. With these data, some mathematical problem is solved in background to get the distortion coefficients. For better results, we need atleast 10 test patterns.\n",
39 | "\n",
40 | "We will use the image of chess board (see samples/cpp/left01.jpg – left14.jpg) that come with OpenCV istelf.\n",
41 | "\n",
42 | "Example image is shown below:\n"
43 | ]
44 | },
45 | {
46 | "cell_type": "markdown",
47 | "metadata": {
48 | "collapsed": true,
49 | "deletable": true,
50 | "editable": true
51 | },
52 | "source": [
53 | "
"
54 | ]
55 | },
56 | {
57 | "cell_type": "markdown",
58 | "metadata": {
59 | "deletable": true,
60 | "editable": true
61 | },
62 | "source": [
63 | "Important input datas needed for camera calibration is a set of 3D real world points and its corresponding 2D image points. 2D image points are OK which we can easily find from the image. (These image points are locations where two black squares touch each other in chess boards)\n",
64 | "\n",
65 | "What about the 3D points from real world space? Those images are taken from a static camera and chess boards are placed at different locations and orientations. So we need to know (X,Y,Z) values. But for simplicity, we can say chess board was kept stationary at XY plane, (so Z=0 always) and camera was moved accordingly. This consideration helps us to find only X,Y values. Now for X,Y values, we can simply pass the points as (0,0), (1,0), (2,0), ... which denotes the location of points. In this case, the results we get will be in the scale of size of chess board square. But if we know the square size, (say 30 mm), and we can pass the values as (0,0),(30,0),(60,0),..., we get the results in mm.\n",
66 | "\n",
67 | "3D points are called object points and 2D image points are called image points.\n",
68 | "\n",
69 | "So to find pattern in chess board, we use the function, cv2.findChessboardCorners(). We also need to pass what kind of pattern we are looking, like 8x8 grid, 5x5 grid etc. In this example, we use 7x6 grid. (Normally a chess board has 8x8 squares and 7x7 internal corners). It returns the corner points and retval which will be True if pattern is obtained. These corners will be placed in an order (from left-to-right, top-to-bottom)\n",
70 | "\n",
71 | "\n",
72 | "This function may not be able to find the required pattern in all the images. So one good option is to write the code such that, it starts the camera and check each frame for required pattern. Once pattern is obtained, find the corners and store it in a list. Also provides some interval before reading next frame so that we can adjust our chess board in different direction. Continue this process until required number of good patterns are obtained. Even in the example provided here, we are not sure out of 14 images given, how many are good. So we read all the images and take the good ones.\n",
73 | "Instead of chess board, we can use some circular grid, but then use the function _cv2.findCirclesGrid()_ to find the pattern. It is said that less number of images are enough when using circular grid.\n",
74 | "Once we find the corners, we can increase their accuracy using cv2.cornerSubPix(). We can also draw the pattern using cv2.drawChessboardCorners(). All these steps are included in below code:"
75 | ]
76 | },
77 | {
78 | "cell_type": "code",
79 | "execution_count": null,
80 | "metadata": {
81 | "collapsed": true,
82 | "deletable": true,
83 | "editable": true,
84 | "scrolled": true
85 | },
86 | "outputs": [],
87 | "source": [
88 | "import numpy as np\n",
89 | "import cv2\n",
90 | "import glob\n",
91 | "\n",
92 | "# termination criteria\n",
93 | "criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)\n",
94 | "\n",
95 | "# prepare object points, like (0,0,0), (1,0,0), (2,0,0) ....,(6,5,0)\n",
96 | "objp = np.zeros((6*7,3), np.float32)\n",
97 | "objp[:,:2] = np.mgrid[0:7,0:6].T.reshape(-1,2)\n",
98 | "\n",
99 | "# Arrays to store object points and image points from all the images.\n",
100 | "objpoints = [] # 3d point in real world space\n",
101 | "imgpoints = [] # 2d points in image plane.\n",
102 | "\n",
103 | "# You'll have to store the chessboard images in the directory of this script\n",
104 | "images = glob.glob('*.jpg')\n",
105 | "\n",
106 | "for fname in images:\n",
107 | " img = cv2.imread(fname)\n",
108 | " gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)\n",
109 | " \n",
110 | " # Find the chess board corners\n",
111 | " ret, corners = cv2.findChessboardCorners(gray, (7,6),None)\n",
112 | " \n",
113 | " # If found, add object points, image points (after refining them)\n",
114 | " if ret == True:\n",
115 | " objpoints.append(objp)\n",
116 | " corners2 = cv2.cornerSubPix(gray,corners,(11,11),(-1,-1),criteria)\n",
117 | " imgpoints.append(corners2)\n",
118 | " \n",
119 | " # Draw and display the corners\n",
120 | " img = cv2.drawChessboardCorners(img, (7,6), corners2,ret)\n",
121 | " cv2.imshow('img',img)\n",
122 | " cv2.waitKey(500)\n",
123 | " \n",
124 | "cv2.destroyAllWindows()"
125 | ]
126 | },
127 | {
128 | "cell_type": "markdown",
129 | "metadata": {
130 | "deletable": true,
131 | "editable": true
132 | },
133 | "source": [
134 | "One image with pattern drawn on it is shown below:\n",
135 | "
"
136 | ]
137 | },
138 | {
139 | "cell_type": "markdown",
140 | "metadata": {
141 | "deletable": true,
142 | "editable": true
143 | },
144 | "source": [
145 | "## Calibration:\n",
146 | "\n",
147 | "So now we have our object points and image points we are ready to go for calibration. For that we use the function, cv2.calibrateCamera(). It returns the camera matrix, distortion coefficients, rotation and translation vectors etc.\n"
148 | ]
149 | },
150 | {
151 | "cell_type": "code",
152 | "execution_count": null,
153 | "metadata": {
154 | "collapsed": false,
155 | "deletable": true,
156 | "editable": true
157 | },
158 | "outputs": [],
159 | "source": [
160 | "ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1],None,None)"
161 | ]
162 | },
163 | {
164 | "cell_type": "markdown",
165 | "metadata": {
166 | "deletable": true,
167 | "editable": true
168 | },
169 | "source": [
170 | "## Undistortion:\n",
171 | "\n",
172 | "We have got what we were trying. Now we can take an image and undistort it. OpenCV comes with two methods, we will see both. But before that, we can refine the camera matrix based on a free scaling parameter using cv2.getOptimalNewCameraMatrix(). If the scaling parameter alpha=0, it returns undistorted image with minimum unwanted pixels. So it may even remove some pixels at image corners. If alpha=1, all pixels are retained with some extra black images. It also returns an image ROI which can be used to crop the result.\n",
173 | "\n",
174 | "So we take a new image (left12.jpg in this case. That is the first image in this chapter)"
175 | ]
176 | },
177 | {
178 | "cell_type": "code",
179 | "execution_count": null,
180 | "metadata": {
181 | "collapsed": false,
182 | "deletable": true,
183 | "editable": true,
184 | "scrolled": true
185 | },
186 | "outputs": [],
187 | "source": [
188 | "img = cv2.imread('left12.jpg')\n",
189 | "h, w = img.shape[:2]\n",
190 | "newcameramtx, roi=cv2.getOptimalNewCameraMatrix(mtx,dist,(w,h),1,(w,h))"
191 | ]
192 | },
193 | {
194 | "cell_type": "markdown",
195 | "metadata": {
196 | "deletable": true,
197 | "editable": true
198 | },
199 | "source": [
200 | "### 1. Using cv2.undistort()\n",
201 | "\n",
202 | "This is the shortest path. Just call the function and use ROI obtained above to crop the result."
203 | ]
204 | },
205 | {
206 | "cell_type": "code",
207 | "execution_count": null,
208 | "metadata": {
209 | "collapsed": true,
210 | "deletable": true,
211 | "editable": true
212 | },
213 | "outputs": [],
214 | "source": [
215 | "# undistort\n",
216 | "dst = cv2.undistort(img, mtx, dist, None, newcameramtx)\n",
217 | "# crop the image\n",
218 | "x,y,w,h = roi\n",
219 | "dst = dst[y:y+h, x:x+w]\n",
220 | "cv2.imwrite('calibresult.png',dst)"
221 | ]
222 | },
223 | {
224 | "cell_type": "markdown",
225 | "metadata": {
226 | "deletable": true,
227 | "editable": true
228 | },
229 | "source": [
230 | "### 2. Using remapping\n",
231 | "\n",
232 | "This is curved path. First find a mapping function from distorted image to undistorted image. Then use the remap function."
233 | ]
234 | },
235 | {
236 | "cell_type": "code",
237 | "execution_count": null,
238 | "metadata": {
239 | "collapsed": true,
240 | "deletable": true,
241 | "editable": true
242 | },
243 | "outputs": [],
244 | "source": [
245 | "# undistort\n",
246 | "mapx,mapy = cv2.initUndistortRectifyMap(mtx,dist,None,newcameramtx,(w,h),5)\n",
247 | "dst = cv2.remap(img,mapx,mapy,cv2.INTER_LINEAR)\n",
248 | "# crop the image\n",
249 | "x,y,w,h = roi\n",
250 | "dst = dst[y:y+h, x:x+w]\n",
251 | "cv2.imwrite('calibresult.png',dst)"
252 | ]
253 | },
254 | {
255 | "cell_type": "markdown",
256 | "metadata": {
257 | "deletable": true,
258 | "editable": true
259 | },
260 | "source": [
261 | "Both the methods give the same result. See the result below:\n",
262 | "
\n",
263 | "\n",
264 | "You can see in the result that all the edges are straight.\n",
265 | "\n",
266 | "Now you can store the camera matrix and distortion coefficients using write functions in Numpy (np.savez, np.savetxt etc) for future uses."
267 | ]
268 | },
269 | {
270 | "cell_type": "markdown",
271 | "metadata": {},
272 | "source": [
273 | "
\n",
274 | "\n",
275 | "Following is the complete script that implements camera calibration for distorted images with chess board samples.\n",
276 | "It reads distorted images, calculates the calibration and write undistorted images to a folder nameed 'output'.\n",
277 | "\n",
278 | "#### usage:\n",
279 | " ```calibrate.py [--debug