├── .gitattributes
├── .gitignore
├── IntroParallelProgramming.sln
├── Lesson1-CubeNumbers
    ├── IntroParallelProgramming.vcxproj
    └── main.cu
├── Lesson4-Reduction
    └── Lesson4-Reduction.vcxproj
├── ProblemSet1-RGB2Gray
    ├── CMakeLists.txt
    ├── HW1.cpp
    ├── HW1_differenceImage.png
    ├── HW1_reference.png
    ├── Makefile
    ├── RGB2Gray.vcxproj
    ├── cinque_terre.gold
    ├── cinque_terre_gray.jpg
    ├── cinque_terre_small.jpg
    ├── compare.cpp
    ├── compare.h
    ├── main.cpp
    ├── reference_calc.cpp
    ├── reference_calc.h
    ├── student_func.cu
    ├── timer.h
    └── utils.h
├── ProblemSet2-Blur
    ├── CMakeLists.txt
    ├── HW2.cpp
    ├── HW2_differenceImage.png
    ├── HW2_reference.png
    ├── Makefile
    ├── ProblemSet2-Blur.vcxproj
    ├── cinque_terre.gold
    ├── cinque_terre_blur.jpg
    ├── cinque_terre_small.jpg
    ├── compare.cpp
    ├── compare.h
    ├── main.cpp
    ├── reference_calc.cpp
    ├── reference_calc.h
    ├── student_func.cu
    ├── timer.h
    └── utils.h
├── ProblemSet3-ToneMapping
    ├── CMakeLists.txt
    ├── HDR-image.jpg
    ├── HDR-image_mapped.png
    ├── HW3.cu
    ├── HW3_differenceImage.png
    ├── HW3_reference.png
    ├── HW3_reference_old.png
    ├── Makefile
    ├── ProblemSet3-ToneMapping.vcxproj
    ├── compare.cpp
    ├── compare.h
    ├── input.png
    ├── loadSaveImage.cpp
    ├── loadSaveImage.h
    ├── main.cpp
    ├── memorial.exr
    ├── memorial_large.exr
    ├── memorial_png.gold
    ├── memorial_png_large.gold
    ├── memorial_raw.png
    ├── memorial_raw_large.png
    ├── memorial_raw_large_mapped.png
    ├── memorial_raw_mapped.png
    ├── my_output.png
    ├── reference_calc.cpp
    ├── reference_calc.h
    ├── student_func.cu
    ├── timer.h
    └── utils.h
├── ProblemSet4-RedEyeRemoval
    ├── CMakeLists.txt
    ├── HW4.cu
    ├── HW4_output.png
    ├── Makefile
    ├── ProblemSet4-RedEyeRemoval.vcxproj
    ├── compare.cpp
    ├── compare.h
    ├── loadSaveImage.cpp
    ├── loadSaveImage.h
    ├── main.cpp
    ├── red_eye_effect.gold
    ├── red_eye_effect_5.jpg
    ├── red_eye_effect_5_out.jpg
    ├── red_eye_effect_template_5.jpg
    ├── reference_calc.cpp
    ├── reference_calc.h
    ├── student_func.cu
    ├── timer.h
    └── utils.h
├── ProblemSet5-OptimizedHistogram
    ├── ProblemSet5-OptimizedHistogram.vcxproj
    ├── main.cu
    ├── reference_calc.cpp
    ├── reference_calc.h
    ├── student.cu
    ├── timer.h
    └── utils.h
├── ProblemSet6-SeamlessImageCloning
    ├── HW6.cu
    ├── HW6_differenceImage.png
    ├── HW6_output.png
    ├── HW6_reference.png
    ├── ProblemSet6-SeamlessImageCloning.vcxproj
    ├── blended.gold
    ├── compare.cpp
    ├── compare.h
    ├── destination.png
    ├── loadSaveImage.cpp
    ├── loadSaveImage.h
    ├── main.cpp
    ├── reference_calc.cpp
    ├── reference_calc.h
    ├── source.png
    ├── student_func.cu
    ├── timer.h
    └── utils.h
└── README.md


/.gitattributes:
--------------------------------------------------------------------------------
 1 | ###############################################################################
 2 | # Set default behavior to automatically normalize line endings.
 3 | ###############################################################################
 4 | * text=auto
 5 | 
 6 | ###############################################################################
 7 | # Set default behavior for command prompt diff.
 8 | #
 9 | # This is needed for earlier builds of msysgit that do not have it on by
10 | # default for csharp files.
11 | # Note: This is only used by command line
12 | ###############################################################################
13 | #*.cs     diff=csharp
14 | 
15 | ###############################################################################
16 | # Set the merge driver for project and solution files
17 | #
18 | # Merging from the command prompt will add diff markers to the files if there
19 | # are conflicts (merging from VS is not affected by the settings below; in VS
20 | # the diff markers are never inserted). Diff markers may cause the following
21 | # file extensions to fail to load in VS. An alternative would be to treat
22 | # these files as binary, in which case they will always conflict and require user
23 | # intervention with every merge. To do so, just uncomment the entries below.
24 | ###############################################################################
25 | #*.sln       merge=binary
26 | #*.csproj    merge=binary
27 | #*.vbproj    merge=binary
28 | #*.vcxproj   merge=binary
29 | #*.vcproj    merge=binary
30 | #*.dbproj    merge=binary
31 | #*.fsproj    merge=binary
32 | #*.lsproj    merge=binary
33 | #*.wixproj   merge=binary
34 | #*.modelproj merge=binary
35 | #*.sqlproj   merge=binary
36 | #*.wwaproj   merge=binary
37 | 
38 | ###############################################################################
39 | # behavior for image files
40 | #
41 | # image files are treated as binary by default.
42 | ###############################################################################
43 | #*.jpg   binary
44 | #*.png   binary
45 | #*.gif   binary
46 | 
47 | ###############################################################################
48 | # diff behavior for common document formats
49 | # 
50 | # Convert binary document formats to text before diffing them. This feature
51 | # is only available from the command line. Turn it on by uncommenting the 
52 | # entries below.
53 | ###############################################################################
54 | #*.doc   diff=astextplain
55 | #*.DOC   diff=astextplain
56 | #*.docx  diff=astextplain
57 | #*.DOCX  diff=astextplain
58 | #*.dot   diff=astextplain
59 | #*.DOT   diff=astextplain
60 | #*.pdf   diff=astextplain
61 | #*.PDF   diff=astextplain
62 | #*.rtf   diff=astextplain
63 | #*.RTF   diff=astextplain
64 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
  1 | ## Ignore Visual Studio temporary files, build results, and
  2 | ## files generated by popular Visual Studio add-ons.
  3 | 
  4 | # User-specific files
  5 | *.suo
  6 | *.user
  7 | *.userosscache
  8 | *.sln.docstates
  9 | 
 10 | # User-specific files (MonoDevelop/Xamarin Studio)
 11 | *.userprefs
 12 | 
 13 | # Build results
 14 | [Dd]ebug/
 15 | [Dd]ebugPublic/
 16 | [Rr]elease/
 17 | [Rr]eleases/
 18 | [Xx]64/
 19 | [Xx]86/
 20 | [Bb]uild/
 21 | bld/
 22 | [Bb]in/
 23 | [Oo]bj/
 24 | 
 25 | # Visual Studio 2015 cache/options directory
 26 | .vs/
 27 | # Uncomment if you have tasks that create the project's static files in wwwroot
 28 | #wwwroot/
 29 | 
 30 | # MSTest test Results
 31 | [Tt]est[Rr]esult*/
 32 | [Bb]uild[Ll]og.*
 33 | 
 34 | # NUNIT
 35 | *.VisualState.xml
 36 | TestResult.xml
 37 | 
 38 | # Build Results of an ATL Project
 39 | [Dd]ebugPS/
 40 | [Rr]eleasePS/
 41 | dlldata.c
 42 | 
 43 | # DNX
 44 | project.lock.json
 45 | artifacts/
 46 | 
 47 | *_i.c
 48 | *_p.c
 49 | *_i.h
 50 | *.ilk
 51 | *.meta
 52 | *.obj
 53 | *.pch
 54 | *.pdb
 55 | *.pgc
 56 | *.pgd
 57 | *.rsp
 58 | *.sbr
 59 | *.tlb
 60 | *.tli
 61 | *.tlh
 62 | *.tmp
 63 | *.tmp_proj
 64 | *.log
 65 | *.vspscc
 66 | *.vssscc
 67 | .builds
 68 | *.pidb
 69 | *.svclog
 70 | *.scc
 71 | 
 72 | # Chutzpah Test files
 73 | _Chutzpah*
 74 | 
 75 | # Visual C++ cache files
 76 | ipch/
 77 | *.aps
 78 | *.ncb
 79 | *.opendb
 80 | *.opensdf
 81 | *.sdf
 82 | *.cachefile
 83 | *.VC.db
 84 | 
 85 | # Visual Studio profiler
 86 | *.psess
 87 | *.vsp
 88 | *.vspx
 89 | *.sap
 90 | 
 91 | # TFS 2012 Local Workspace
 92 | $tf/
 93 | 
 94 | # Guidance Automation Toolkit
 95 | *.gpState
 96 | 
 97 | # ReSharper is a .NET coding add-in
 98 | _ReSharper*/
 99 | *.[Rr]e[Ss]harper
100 | *.DotSettings.user
101 | 
102 | # JustCode is a .NET coding add-in
103 | .JustCode
104 | 
105 | # TeamCity is a build add-in
106 | _TeamCity*
107 | 
108 | # DotCover is a Code Coverage Tool
109 | *.dotCover
110 | 
111 | # NCrunch
112 | _NCrunch_*
113 | .*crunch*.local.xml
114 | nCrunchTemp_*
115 | 
116 | # MightyMoose
117 | *.mm.*
118 | AutoTest.Net/
119 | 
120 | # Web workbench (sass)
121 | .sass-cache/
122 | 
123 | # Installshield output folder
124 | [Ee]xpress/
125 | 
126 | # DocProject is a documentation generator add-in
127 | DocProject/buildhelp/
128 | DocProject/Help/*.HxT
129 | DocProject/Help/*.HxC
130 | DocProject/Help/*.hhc
131 | DocProject/Help/*.hhk
132 | DocProject/Help/*.hhp
133 | DocProject/Help/Html2
134 | DocProject/Help/html
135 | 
136 | # Click-Once directory
137 | publish/
138 | 
139 | # Publish Web Output
140 | *.[Pp]ublish.xml
141 | *.azurePubxml
142 | 
143 | # TODO: Uncomment the next line if you do not want to check in
144 | # your web deploy settings because they may include unencrypted
145 | # passwords
146 | #*.pubxml
147 | *.publishproj
148 | 
149 | # NuGet Packages
150 | *.nupkg
151 | # The packages folder can be ignored because of Package Restore
152 | **/packages/*
153 | # except build/, which is used as an MSBuild target.
154 | !**/packages/build/
155 | # Uncomment if necessary; however, it will generally be regenerated when needed
156 | #!**/packages/repositories.config
157 | # NuGet v3's project.json files produce more ignorable files
158 | *.nuget.props
159 | *.nuget.targets
160 | 
161 | # Microsoft Azure Build Output
162 | csx/
163 | *.build.csdef
164 | 
165 | # Microsoft Azure Emulator
166 | ecf/
167 | rcf/
168 | 
169 | # Windows Store app package directory
170 | AppPackages/
171 | BundleArtifacts/
172 | 
173 | # Visual Studio cache files
174 | # files ending in .cache can be ignored
175 | *.[Cc]ache
176 | # but keep track of directories ending in .cache
177 | !*.[Cc]ache/
178 | 
179 | # Others
180 | ClientBin/
181 | [Ss]tyle[Cc]op.*
182 | ~$*
183 | *~
184 | *.dbmdl
185 | *.dbproj.schemaview
186 | *.pfx
187 | *.publishsettings
188 | node_modules/
189 | orleans.codegen.cs
190 | 
191 | # RIA/Silverlight projects
192 | Generated_Code/
193 | 
194 | # Backup & report files from converting an old project file
195 | # to a newer Visual Studio version. Backup files are not needed,
196 | # because we have git ;-)
197 | _UpgradeReport_Files/
198 | Backup*/
199 | UpgradeLog*.XML
200 | UpgradeLog*.htm
201 | 
202 | # SQL Server files
203 | *.mdf
204 | *.ldf
205 | 
206 | # Business Intelligence projects
207 | *.rdl.data
208 | *.bim.layout
209 | *.bim_*.settings
210 | 
211 | # Microsoft Fakes
212 | FakesAssemblies/
213 | 
214 | # GhostDoc plugin setting file
215 | *.GhostDoc.xml
216 | 
217 | # Node.js Tools for Visual Studio
218 | .ntvs_analysis.dat
219 | 
220 | # Visual Studio 6 build log
221 | *.plg
222 | 
223 | # Visual Studio 6 workspace options file
224 | *.opt
225 | 
226 | # Visual Studio LightSwitch build output
227 | **/*.HTMLClient/GeneratedArtifacts
228 | **/*.DesktopClient/GeneratedArtifacts
229 | **/*.DesktopClient/ModelManifest.xml
230 | **/*.Server/GeneratedArtifacts
231 | **/*.Server/ModelManifest.xml
232 | _Pvt_Extensions
233 | 
234 | # LightSwitch generated files
235 | GeneratedArtifacts/
236 | ModelManifest.xml
237 | 
238 | # Paket dependency manager
239 | .paket/paket.exe
240 | 
241 | # FAKE - F# Make
242 | .fake/
243 | 


--------------------------------------------------------------------------------
/IntroParallelProgramming.sln:
--------------------------------------------------------------------------------
 1 | 
 2 | Microsoft Visual Studio Solution File, Format Version 12.00
 3 | # Visual Studio 14
 4 | VisualStudioVersion = 14.0.25420.1
 5 | MinimumVisualStudioVersion = 10.0.40219.1
 6 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "Lesson1-CubeNumbers", "Lesson1-CubeNumbers\IntroParallelProgramming.vcxproj", "{E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}"
 7 | EndProject
 8 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "ProblemSet1-RGB2Gray", "ProblemSet1-RGB2Gray\RGB2Gray.vcxproj", "{681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}"
 9 | EndProject
10 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "ProblemSet2-Blur", "ProblemSet2-Blur\ProblemSet2-Blur.vcxproj", "{5B684E70-F85A-4BE0-82A2-A45023E5F5CF}"
11 | EndProject
12 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "ProblemSet3-ToneMapping", "ProblemSet3-ToneMapping\ProblemSet3-ToneMapping.vcxproj", "{EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}"
13 | EndProject
14 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "ProblemSet4-RedEyeRemoval", "ProblemSet4-RedEyeRemoval\ProblemSet4-RedEyeRemoval.vcxproj", "{B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}"
15 | EndProject
16 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "ProblemSet5-OptimizedHistogram", "ProblemSet5-OptimizedHistogram\ProblemSet5-OptimizedHistogram.vcxproj", "{0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}"
17 | EndProject
18 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "ProblemSet6-SeamlessImageCloning", "ProblemSet6-SeamlessImageCloning\ProblemSet6-SeamlessImageCloning.vcxproj", "{5781233B-6022-4F34-B559-1473B9674B39}"
19 | EndProject
20 | Global
21 | 	GlobalSection(SolutionConfigurationPlatforms) = preSolution
22 | 		Debug|x64 = Debug|x64
23 | 		Debug|x86 = Debug|x86
24 | 		Release|x64 = Release|x64
25 | 		Release|x86 = Release|x86
26 | 	EndGlobalSection
27 | 	GlobalSection(ProjectConfigurationPlatforms) = postSolution
28 | 		{E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Debug|x64.ActiveCfg = Debug|x64
29 | 		{E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Debug|x64.Build.0 = Debug|x64
30 | 		{E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Debug|x86.ActiveCfg = Debug|Win32
31 | 		{E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Debug|x86.Build.0 = Debug|Win32
32 | 		{E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Release|x64.ActiveCfg = Release|x64
33 | 		{E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Release|x64.Build.0 = Release|x64
34 | 		{E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Release|x86.ActiveCfg = Release|Win32
35 | 		{E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Release|x86.Build.0 = Release|Win32
36 | 		{681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Debug|x64.ActiveCfg = Debug|x64
37 | 		{681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Debug|x64.Build.0 = Debug|x64
38 | 		{681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Debug|x86.ActiveCfg = Debug|Win32
39 | 		{681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Debug|x86.Build.0 = Debug|Win32
40 | 		{681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Release|x64.ActiveCfg = Release|x64
41 | 		{681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Release|x64.Build.0 = Release|x64
42 | 		{681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Release|x86.ActiveCfg = Release|Win32
43 | 		{681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Release|x86.Build.0 = Release|Win32
44 | 		{5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Debug|x64.ActiveCfg = Debug|x64
45 | 		{5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Debug|x64.Build.0 = Debug|x64
46 | 		{5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Debug|x86.ActiveCfg = Debug|Win32
47 | 		{5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Debug|x86.Build.0 = Debug|Win32
48 | 		{5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Release|x64.ActiveCfg = Release|x64
49 | 		{5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Release|x64.Build.0 = Release|x64
50 | 		{5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Release|x86.ActiveCfg = Release|Win32
51 | 		{5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Release|x86.Build.0 = Release|Win32
52 | 		{EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Debug|x64.ActiveCfg = Debug|x64
53 | 		{EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Debug|x64.Build.0 = Debug|x64
54 | 		{EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Debug|x86.ActiveCfg = Debug|Win32
55 | 		{EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Debug|x86.Build.0 = Debug|Win32
56 | 		{EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Release|x64.ActiveCfg = Release|x64
57 | 		{EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Release|x64.Build.0 = Release|x64
58 | 		{EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Release|x86.ActiveCfg = Release|Win32
59 | 		{EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Release|x86.Build.0 = Release|Win32
60 | 		{B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Debug|x64.ActiveCfg = Debug|x64
61 | 		{B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Debug|x64.Build.0 = Debug|x64
62 | 		{B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Debug|x86.ActiveCfg = Debug|Win32
63 | 		{B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Debug|x86.Build.0 = Debug|Win32
64 | 		{B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Release|x64.ActiveCfg = Release|x64
65 | 		{B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Release|x64.Build.0 = Release|x64
66 | 		{B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Release|x86.ActiveCfg = Release|Win32
67 | 		{B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Release|x86.Build.0 = Release|Win32
68 | 		{0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Debug|x64.ActiveCfg = Debug|x64
69 | 		{0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Debug|x64.Build.0 = Debug|x64
70 | 		{0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Debug|x86.ActiveCfg = Debug|Win32
71 | 		{0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Debug|x86.Build.0 = Debug|Win32
72 | 		{0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Release|x64.ActiveCfg = Release|x64
73 | 		{0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Release|x64.Build.0 = Release|x64
74 | 		{0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Release|x86.ActiveCfg = Release|Win32
75 | 		{0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Release|x86.Build.0 = Release|Win32
76 | 		{5781233B-6022-4F34-B559-1473B9674B39}.Debug|x64.ActiveCfg = Debug|x64
77 | 		{5781233B-6022-4F34-B559-1473B9674B39}.Debug|x64.Build.0 = Debug|x64
78 | 		{5781233B-6022-4F34-B559-1473B9674B39}.Debug|x86.ActiveCfg = Debug|Win32
79 | 		{5781233B-6022-4F34-B559-1473B9674B39}.Debug|x86.Build.0 = Debug|Win32
80 | 		{5781233B-6022-4F34-B559-1473B9674B39}.Release|x64.ActiveCfg = Release|x64
81 | 		{5781233B-6022-4F34-B559-1473B9674B39}.Release|x64.Build.0 = Release|x64
82 | 		{5781233B-6022-4F34-B559-1473B9674B39}.Release|x86.ActiveCfg = Release|Win32
83 | 		{5781233B-6022-4F34-B559-1473B9674B39}.Release|x86.Build.0 = Release|Win32
84 | 	EndGlobalSection
85 | 	GlobalSection(SolutionProperties) = preSolution
86 | 		HideSolutionNode = FALSE
87 | 	EndGlobalSection
88 | EndGlobal
89 | 


--------------------------------------------------------------------------------
/Lesson1-CubeNumbers/IntroParallelProgramming.vcxproj:
--------------------------------------------------------------------------------
  1 | <?xml version="1.0" encoding="utf-8"?>
  2 | <Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  3 |   <ItemGroup Label="ProjectConfigurations">
  4 |     <ProjectConfiguration Include="Debug|Win32">
  5 |       <Configuration>Debug</Configuration>
  6 |       <Platform>Win32</Platform>
  7 |     </ProjectConfiguration>
  8 |     <ProjectConfiguration Include="Debug|x64">
  9 |       <Configuration>Debug</Configuration>
 10 |       <Platform>x64</Platform>
 11 |     </ProjectConfiguration>
 12 |     <ProjectConfiguration Include="Release|Win32">
 13 |       <Configuration>Release</Configuration>
 14 |       <Platform>Win32</Platform>
 15 |     </ProjectConfiguration>
 16 |     <ProjectConfiguration Include="Release|x64">
 17 |       <Configuration>Release</Configuration>
 18 |       <Platform>x64</Platform>
 19 |     </ProjectConfiguration>
 20 |   </ItemGroup>
 21 |   <ItemGroup>
 22 |     <CudaCompile Include="main.cu" />
 23 |   </ItemGroup>
 24 |   <PropertyGroup Label="Globals">
 25 |     <ProjectGuid>{E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}</ProjectGuid>
 26 |     <RootNamespace>IntroParallelProgramming</RootNamespace>
 27 |     <ProjectName>Lesson1-CubeNumbers</ProjectName>
 28 |   </PropertyGroup>
 29 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
 30 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
 31 |     <ConfigurationType>Application</ConfigurationType>
 32 |     <UseDebugLibraries>true</UseDebugLibraries>
 33 |     <CharacterSet>MultiByte</CharacterSet>
 34 |     <PlatformToolset>v140</PlatformToolset>
 35 |   </PropertyGroup>
 36 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
 37 |     <ConfigurationType>Application</ConfigurationType>
 38 |     <UseDebugLibraries>true</UseDebugLibraries>
 39 |     <CharacterSet>MultiByte</CharacterSet>
 40 |     <PlatformToolset>v140</PlatformToolset>
 41 |   </PropertyGroup>
 42 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
 43 |     <ConfigurationType>Application</ConfigurationType>
 44 |     <UseDebugLibraries>false</UseDebugLibraries>
 45 |     <WholeProgramOptimization>true</WholeProgramOptimization>
 46 |     <CharacterSet>MultiByte</CharacterSet>
 47 |     <PlatformToolset>v140</PlatformToolset>
 48 |   </PropertyGroup>
 49 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
 50 |     <ConfigurationType>Application</ConfigurationType>
 51 |     <UseDebugLibraries>false</UseDebugLibraries>
 52 |     <WholeProgramOptimization>true</WholeProgramOptimization>
 53 |     <CharacterSet>MultiByte</CharacterSet>
 54 |     <PlatformToolset>v140</PlatformToolset>
 55 |   </PropertyGroup>
 56 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
 57 |   <ImportGroup Label="ExtensionSettings">
 58 |     <Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 8.0.props" />
 59 |   </ImportGroup>
 60 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
 61 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 62 |   </ImportGroup>
 63 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
 64 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 65 |   </ImportGroup>
 66 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
 67 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 68 |   </ImportGroup>
 69 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
 70 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 71 |   </ImportGroup>
 72 |   <PropertyGroup Label="UserMacros" />
 73 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
 74 |     <LinkIncremental>true</LinkIncremental>
 75 |   </PropertyGroup>
 76 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
 77 |     <LinkIncremental>true</LinkIncremental>
 78 |   </PropertyGroup>
 79 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
 80 |     <ClCompile>
 81 |       <WarningLevel>Level3</WarningLevel>
 82 |       <Optimization>Disabled</Optimization>
 83 |       <PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
 84 |     </ClCompile>
 85 |     <Link>
 86 |       <GenerateDebugInformation>true</GenerateDebugInformation>
 87 |       <SubSystem>Console</SubSystem>
 88 |       <AdditionalDependencies>cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)</AdditionalDependencies>
 89 |     </Link>
 90 |     <PostBuildEvent>
 91 |       <Command>echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
 92 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"</Command>
 93 |     </PostBuildEvent>
 94 |   </ItemDefinitionGroup>
 95 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
 96 |     <ClCompile>
 97 |       <WarningLevel>Level3</WarningLevel>
 98 |       <Optimization>Disabled</Optimization>
 99 |       <PreprocessorDefinitions>WIN32;WIN64;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
100 |     </ClCompile>
101 |     <Link>
102 |       <GenerateDebugInformation>true</GenerateDebugInformation>
103 |       <SubSystem>Console</SubSystem>
104 |       <AdditionalDependencies>cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)</AdditionalDependencies>
105 |     </Link>
106 |     <PostBuildEvent>
107 |       <Command>echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
108 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"</Command>
109 |     </PostBuildEvent>
110 |     <CudaCompile>
111 |       <TargetMachinePlatform>64</TargetMachinePlatform>
112 |     </CudaCompile>
113 |   </ItemDefinitionGroup>
114 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
115 |     <ClCompile>
116 |       <WarningLevel>Level3</WarningLevel>
117 |       <Optimization>MaxSpeed</Optimization>
118 |       <FunctionLevelLinking>true</FunctionLevelLinking>
119 |       <IntrinsicFunctions>true</IntrinsicFunctions>
120 |       <PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
121 |     </ClCompile>
122 |     <Link>
123 |       <GenerateDebugInformation>true</GenerateDebugInformation>
124 |       <EnableCOMDATFolding>true</EnableCOMDATFolding>
125 |       <OptimizeReferences>true</OptimizeReferences>
126 |       <SubSystem>Console</SubSystem>
127 |       <AdditionalDependencies>cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)</AdditionalDependencies>
128 |     </Link>
129 |     <PostBuildEvent>
130 |       <Command>echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
131 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"</Command>
132 |     </PostBuildEvent>
133 |   </ItemDefinitionGroup>
134 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
135 |     <ClCompile>
136 |       <WarningLevel>Level3</WarningLevel>
137 |       <Optimization>MaxSpeed</Optimization>
138 |       <FunctionLevelLinking>true</FunctionLevelLinking>
139 |       <IntrinsicFunctions>true</IntrinsicFunctions>
140 |       <PreprocessorDefinitions>WIN32;WIN64;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
141 |     </ClCompile>
142 |     <Link>
143 |       <GenerateDebugInformation>true</GenerateDebugInformation>
144 |       <EnableCOMDATFolding>true</EnableCOMDATFolding>
145 |       <OptimizeReferences>true</OptimizeReferences>
146 |       <SubSystem>Console</SubSystem>
147 |       <AdditionalDependencies>cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)</AdditionalDependencies>
148 |     </Link>
149 |     <PostBuildEvent>
150 |       <Command>echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
151 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"</Command>
152 |     </PostBuildEvent>
153 |     <CudaCompile>
154 |       <TargetMachinePlatform>64</TargetMachinePlatform>
155 |     </CudaCompile>
156 |   </ItemDefinitionGroup>
157 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
158 |   <ImportGroup Label="ExtensionTargets">
159 |     <Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 8.0.targets" />
160 |   </ImportGroup>
161 | </Project>


--------------------------------------------------------------------------------
/Lesson1-CubeNumbers/main.cu:
--------------------------------------------------------------------------------
 1 | #include "cuda_runtime.h"
 2 | #include "device_launch_parameters.h"
 3 | 
 4 | #include <stdio.h>
 5 | 
 6 | __global__ void cube(float * d_out, float * d_in) {
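 7 | 	// Each thread cubes one element. threadIdx.x is unique only within a block, so this indexing assumes a single-block launch.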
 8 | 	int idx = threadIdx.x;
 9 | 	float f = d_in[idx];
10 | 	d_out[idx] = f*f*f;
11 | }
12 | 
13 | int main(int argc, char ** argv) {
14 | 	const int ARRAY_SIZE = 96;
15 | 	const int ARRAY_BYTES = ARRAY_SIZE * sizeof(float);
16 | 
17 | 	// generate the input array on the host
18 | 	float h_in[ARRAY_SIZE];
19 | 	for (int i = 0; i < ARRAY_SIZE; i++) {
20 | 		h_in[i] = float(i);
21 | 	}
22 | 	float h_out[ARRAY_SIZE];
23 | 
24 | 	// declare GPU memory pointers
25 | 	float * d_in;
26 | 	float * d_out;
27 | 
28 | 	// allocate GPU memory
29 | 	cudaMalloc((void**)&d_in, ARRAY_BYTES);
30 | 	cudaMalloc((void**)&d_out, ARRAY_BYTES);
31 | 
32 | 	// transfer the array to the GPU
33 | 	cudaMemcpy(d_in, h_in, ARRAY_BYTES, cudaMemcpyHostToDevice);
34 | 
35 | 	// launch the kernel
36 | 	cube<<<1, ARRAY_SIZE>>>(d_out, d_in);
37 | 
38 | 	// copy back the result array to the CPU
39 | 	cudaMemcpy(h_out, d_out, ARRAY_BYTES, cudaMemcpyDeviceToHost);
40 | 
41 | 	// print out the resulting array
42 | 	for (int i = 0; i < ARRAY_SIZE; i++) {
43 | 		printf("%f", h_out[i]);
44 | 		printf(((i % 4) != 3) ? "\t" : "\n");
45 | 	}
46 | 
47 | 	cudaFree(d_in);
48 | 	cudaFree(d_out);
49 | 
50 | 	return 0;
51 | }
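
`cube<<<1, ARRAY_SIZE>>>` works here because all 96 threads fit in a single block; current NVIDIA GPUs cap a block at 1024 threads, so larger arrays need a grid of several blocks. With more than one block, each thread would compute a global index `blockIdx.x * blockDim.x + threadIdx.x` and skip indices at or beyond the array length. The host-side launch arithmetic is sketched below in plain C++ so it can be checked without a GPU; `blocksFor`, `countActiveThreads`, and the 256-thread block size used in the note are illustrative assumptions, not code from this repository.

```cpp
#include <cassert>

// Hypothetical helper: smallest block count such that
// blocks * threadsPerBlock >= n (integer ceiling division).
int blocksFor(int n, int threadsPerBlock) {
    assert(n >= 0 && threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// CPU walk-through of a multi-block launch: every (block, thread) pair forms
// the global index blockIdx.x * blockDim.x + threadIdx.x, and the idx < n
// guard leaves the unused tail of the last block idle.
int countActiveThreads(int n, int threadsPerBlock) {
    const int blocks = blocksFor(n, threadsPerBlock);
    int active = 0;
    for (int block = 0; block < blocks; ++block) {
        for (int thread = 0; thread < threadsPerBlock; ++thread) {
            const int idx = block * threadsPerBlock + thread;
            if (idx < n) {
                ++active;  // this thread would cube element idx
            }
        }
    }
    return active;
}
```

For instance, `blocksFor(1000, 256)` returns 4 (1024 threads in total), and `countActiveThreads(1000, 256)` confirms that the guard leaves exactly 1000 of them doing work. A multi-block version of the kernel would also need the array length as an extra parameter, which is an assumed signature change rather than the lesson's code.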

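`main` prints the GPU results but never verifies them. The problem sets in this repository pair each kernel with `reference_calc.cpp` and `compare.cpp` (their contents are not shown here); a hypothetical C++ sketch of that reference-and-compare idea for the cube example follows. `referenceCube` and `matchesReference` are illustrative names, and the tolerance is an assumed default: for this input the cubes are whole numbers below 2^24 that `float` represents exactly, but a tolerance is the safer habit for GPU-versus-CPU floating-point comparisons in general.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: the same f*f*f the cube kernel computes.
std::vector<float> referenceCube(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const float f = in[i];
        out[i] = f * f * f;
    }
    return out;
}

// True if both arrays have the same length and every element agrees within
// relTol: absolute tolerance for |ref| < 1, relative tolerance elsewhere.
bool matchesReference(const std::vector<float>& gpu,
                      const std::vector<float>& ref,
                      float relTol = 1e-5f) {
    assert(relTol >= 0.0f);
    if (gpu.size() != ref.size()) {
        return false;
    }
    for (std::size_t i = 0; i < gpu.size(); ++i) {
        const float diff = std::fabs(gpu[i] - ref[i]);
        const float scale = std::fmax(std::fabs(ref[i]), 1.0f);  // avoid dividing by ~0
        if (diff > relTol * scale) {
            return false;
        }
    }
    return true;
}
```

In `main.cu`, `h_in` and `h_out` could be copied into vectors after the device-to-host `cudaMemcpy` and passed to these functions before the results are printed.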

--------------------------------------------------------------------------------
/Lesson4-Reduction/Lesson4-Reduction.vcxproj:
--------------------------------------------------------------------------------
  1 | <?xml version="1.0" encoding="utf-8"?>
  2 | <Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  3 |   <ItemGroup Label="ProjectConfigurations">
  4 |     <ProjectConfiguration Include="Debug|Win32">
  5 |       <Configuration>Debug</Configuration>
  6 |       <Platform>Win32</Platform>
  7 |     </ProjectConfiguration>
  8 |     <ProjectConfiguration Include="Debug|x64">
  9 |       <Configuration>Debug</Configuration>
 10 |       <Platform>x64</Platform>
 11 |     </ProjectConfiguration>
 12 |     <ProjectConfiguration Include="Release|Win32">
 13 |       <Configuration>Release</Configuration>
 14 |       <Platform>Win32</Platform>
 15 |     </ProjectConfiguration>
 16 |     <ProjectConfiguration Include="Release|x64">
 17 |       <Configuration>Release</Configuration>
 18 |       <Platform>x64</Platform>
 19 |     </ProjectConfiguration>
 20 |   </ItemGroup>
 21 |   <PropertyGroup Label="Globals">
 22 |     <ProjectGuid>{0741C52D-C5E1-4C2F-A8E9-67C29CBF5B97}</ProjectGuid>
 23 |     <RootNamespace>Lesson4_Reduction</RootNamespace>
 24 |     <ProjectName>Lesson3-Reduction</ProjectName>
 25 |   </PropertyGroup>
 26 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
 27 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
 28 |     <ConfigurationType>Application</ConfigurationType>
 29 |     <UseDebugLibraries>true</UseDebugLibraries>
 30 |     <CharacterSet>MultiByte</CharacterSet>
 31 |     <PlatformToolset>v140</PlatformToolset>
 32 |   </PropertyGroup>
 33 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
 34 |     <ConfigurationType>Application</ConfigurationType>
 35 |     <UseDebugLibraries>true</UseDebugLibraries>
 36 |     <CharacterSet>MultiByte</CharacterSet>
 37 |     <PlatformToolset>v140</PlatformToolset>
 38 |   </PropertyGroup>
 39 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
 40 |     <ConfigurationType>Application</ConfigurationType>
 41 |     <UseDebugLibraries>false</UseDebugLibraries>
 42 |     <WholeProgramOptimization>true</WholeProgramOptimization>
 43 |     <CharacterSet>MultiByte</CharacterSet>
 44 |     <PlatformToolset>v140</PlatformToolset>
 45 |   </PropertyGroup>
 46 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
 47 |     <ConfigurationType>Application</ConfigurationType>
 48 |     <UseDebugLibraries>false</UseDebugLibraries>
 49 |     <WholeProgramOptimization>true</WholeProgramOptimization>
 50 |     <CharacterSet>MultiByte</CharacterSet>
 51 |     <PlatformToolset>v140</PlatformToolset>
 52 |   </PropertyGroup>
 53 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
 54 |   <ImportGroup Label="ExtensionSettings">
 55 |     <Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 8.0.props" />
 56 |   </ImportGroup>
 57 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
 58 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 59 |   </ImportGroup>
 60 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
 61 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 62 |   </ImportGroup>
 63 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
 64 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 65 |   </ImportGroup>
 66 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
 67 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 68 |   </ImportGroup>
 69 |   <PropertyGroup Label="UserMacros" />
 70 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
 71 |     <LinkIncremental>true</LinkIncremental>
 72 |   </PropertyGroup>
 73 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
 74 |     <LinkIncremental>true</LinkIncremental>
 75 |   </PropertyGroup>
 76 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
 77 |     <ClCompile>
 78 |       <WarningLevel>Level3</WarningLevel>
 79 |       <Optimization>Disabled</Optimization>
 80 |       <PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
 81 |     </ClCompile>
 82 |     <Link>
 83 |       <GenerateDebugInformation>true</GenerateDebugInformation>
 84 |       <SubSystem>Console</SubSystem>
 85 |       <AdditionalDependencies>cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)</AdditionalDependencies>
 86 |     </Link>
 87 |     <PostBuildEvent>
 88 |       <Command>echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
 89 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"</Command>
 90 |     </PostBuildEvent>
 91 |   </ItemDefinitionGroup>
 92 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
 93 |     <ClCompile>
 94 |       <WarningLevel>Level3</WarningLevel>
 95 |       <Optimization>Disabled</Optimization>
 96 |       <PreprocessorDefinitions>WIN32;WIN64;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
 97 |     </ClCompile>
 98 |     <Link>
 99 |       <GenerateDebugInformation>true</GenerateDebugInformation>
100 |       <SubSystem>Console</SubSystem>
101 |       <AdditionalDependencies>cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)</AdditionalDependencies>
102 |     </Link>
103 |     <PostBuildEvent>
104 |       <Command>echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
105 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"</Command>
106 |     </PostBuildEvent>
107 |     <CudaCompile>
108 |       <TargetMachinePlatform>64</TargetMachinePlatform>
109 |     </CudaCompile>
110 |   </ItemDefinitionGroup>
111 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
112 |     <ClCompile>
113 |       <WarningLevel>Level3</WarningLevel>
114 |       <Optimization>MaxSpeed</Optimization>
115 |       <FunctionLevelLinking>true</FunctionLevelLinking>
116 |       <IntrinsicFunctions>true</IntrinsicFunctions>
117 |       <PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
118 |     </ClCompile>
119 |     <Link>
120 |       <GenerateDebugInformation>true</GenerateDebugInformation>
121 |       <EnableCOMDATFolding>true</EnableCOMDATFolding>
122 |       <OptimizeReferences>true</OptimizeReferences>
123 |       <SubSystem>Console</SubSystem>
124 |       <AdditionalDependencies>cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)</AdditionalDependencies>
125 |     </Link>
126 |     <PostBuildEvent>
127 |       <Command>echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
128 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"</Command>
129 |     </PostBuildEvent>
130 |   </ItemDefinitionGroup>
131 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
132 |     <ClCompile>
133 |       <WarningLevel>Level3</WarningLevel>
134 |       <Optimization>MaxSpeed</Optimization>
135 |       <FunctionLevelLinking>true</FunctionLevelLinking>
136 |       <IntrinsicFunctions>true</IntrinsicFunctions>
137 |       <PreprocessorDefinitions>WIN32;WIN64;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
138 |     </ClCompile>
139 |     <Link>
140 |       <GenerateDebugInformation>true</GenerateDebugInformation>
141 |       <EnableCOMDATFolding>true</EnableCOMDATFolding>
142 |       <OptimizeReferences>true</OptimizeReferences>
143 |       <SubSystem>Console</SubSystem>
144 |       <AdditionalDependencies>cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)</AdditionalDependencies>
145 |     </Link>
146 |     <PostBuildEvent>
147 |       <Command>echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
148 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"</Command>
149 |     </PostBuildEvent>
150 |     <CudaCompile>
151 |       <TargetMachinePlatform>64</TargetMachinePlatform>
152 |     </CudaCompile>
153 |   </ItemDefinitionGroup>
154 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
155 |   <ImportGroup Label="ExtensionTargets">
156 |     <Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 8.0.targets" />
157 |   </ImportGroup>
158 | </Project>


--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/CMakeLists.txt:
--------------------------------------------------------------------------------
 1 | ############################################################################
 2 | # <summary> CMakeLists.txt for OpenCV and CUDA. </summary>
 3 | # <date>    2012-02-07          </date>
 4 | # <author>  Quan Tran Minh. edit by Johannes Kast, Michael Sarahan </author>
 5 | # <email>   quantm@unist.ac.kr  kast.jo@googlemail.com msarahan@gmail.com</email>
 6 | ############################################################################
 7 | 
 8 | # collect source files
 9 | 
10 | file( GLOB  hdr *.hpp *.h )
11 | file( GLOB  cu  *.cu)
12 | SET (HW1_files main.cpp reference_calc.cpp compare.cpp)
13 | 
14 | CUDA_ADD_EXECUTABLE(HW1 ${HW1_files} ${hdr} ${cu})


--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/HW1.cpp:
--------------------------------------------------------------------------------
 1 | #include <opencv2/core/core.hpp>
 2 | #include <opencv2/highgui/highgui.hpp>
 3 | #include <opencv2/opencv.hpp>
 4 | #include "utils.h"
 5 | #include <cuda.h>
 6 | #include <cuda_runtime.h>
 7 | #include <string>
 8 | 
 9 | static cv::Mat imageRGBA;
10 | static cv::Mat imageGrey;
11 | 
12 | static uchar4        *d_rgbaImage__;
13 | static unsigned char *d_greyImage__;
14 | 
15 | static size_t numRows() { return imageRGBA.rows; }
16 | static size_t numCols() { return imageRGBA.cols; }
17 | 
18 | //return types are void since any internal error will be handled by quitting
19 | //no point in returning error codes...
20 | //returns a pointer to an RGBA version of the input image
21 | //and a pointer to the single channel grey-scale output
22 | //on both the host and device
23 | static void preProcess(uchar4 **inputImage, unsigned char **greyImage,
24 |                 uchar4 **d_rgbaImage, unsigned char **d_greyImage,
25 |                 const std::string &filename) {
26 |   //make sure the context initializes ok
27 |   checkCudaErrors(cudaFree(0));
28 | 
29 |   cv::Mat image;
30 |   image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR);
31 |   if (image.empty()) {
32 |     std::cerr << "Couldn't open file: " << filename << std::endl;
33 |     exit(1);
34 |   }
35 | 
36 |   cv::cvtColor(image, imageRGBA, CV_BGR2RGBA);
37 | 
38 |   //allocate memory for the output
39 |   imageGrey.create(image.rows, image.cols, CV_8UC1);
40 | 
41 |   //This shouldn't ever happen given the way the images are created
42 |   //at least based upon my limited understanding of OpenCV, but better to check
43 |   if (!imageRGBA.isContinuous() || !imageGrey.isContinuous()) {
44 |     std::cerr << "Images aren't continuous!! Exiting." << std::endl;
45 |     exit(1);
46 |   }
47 | 
48 |   *inputImage = (uchar4 *)imageRGBA.ptr<unsigned char>(0);
49 |   *greyImage  = imageGrey.ptr<unsigned char>(0);
50 | 
51 |   const size_t numPixels = numRows() * numCols();
52 |   //allocate memory on the device for both input and output
53 |   checkCudaErrors(cudaMalloc(d_rgbaImage, sizeof(uchar4) * numPixels));
54 |   checkCudaErrors(cudaMalloc(d_greyImage, sizeof(unsigned char) * numPixels));
 55 |   checkCudaErrors(cudaMemset(*d_greyImage, 0, numPixels * sizeof(unsigned char))); //make sure no memory is left lying around
56 | 
57 |   //copy input array to the GPU
58 |   checkCudaErrors(cudaMemcpy(*d_rgbaImage, *inputImage, sizeof(uchar4) * numPixels, cudaMemcpyHostToDevice));
59 | 
60 |   d_rgbaImage__ = *d_rgbaImage;
61 |   d_greyImage__ = *d_greyImage;
62 | }
63 | 
64 | static void postProcess(const std::string& output_file, unsigned char* data_ptr) {
65 |   cv::Mat output(numRows(), numCols(), CV_8UC1, (void*)data_ptr);
66 | 
67 |   //output the image
68 |   cv::imwrite(output_file.c_str(), output);
69 | }
70 | 
71 | static void cleanup()
72 | {
73 |   //cleanup
74 |   cudaFree(d_rgbaImage__);
75 |   cudaFree(d_greyImage__);
76 | }
77 | 
78 | static void generateReferenceImage(std::string input_filename, std::string output_filename)
79 | {
80 |   cv::Mat reference = cv::imread(input_filename, CV_LOAD_IMAGE_GRAYSCALE);
81 | 
82 |   cv::imwrite(output_filename, reference);
83 | 
84 | }
85 | 
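`preProcess` relies on two layout facts. First, `cv::cvtColor` with `CV_BGR2RGBA` turns OpenCV's interleaved BGR pixels into RGBA with an opaque alpha channel. Second, one RGBA pixel is four bytes, so the device buffers hold `4 * numPixels` and `numPixels` bytes. The C++ sketch below illustrates both on a plain byte buffer. The `Rgba` struct is a hypothetical stand-in for CUDA's `uchar4`, and `bgrToRgba`, `rgbaBytes`, and `greyBytes` are illustrative helpers, not part of the homework code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for CUDA's uchar4 (four unsigned bytes x, y, z, w).
struct Rgba {
  std::uint8_t x, y, z, w;  // R, G, B, A
};
static_assert(sizeof(Rgba) == 4, "one RGBA pixel occupies 4 bytes");

// Sketch of what CV_BGR2RGBA does to an interleaved 8-bit BGR buffer:
// swap the blue and red channels and append an opaque alpha of 255.
std::vector<Rgba> bgrToRgba(const std::vector<std::uint8_t>& bgr) {
  std::vector<Rgba> out(bgr.size() / 3);
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = Rgba{bgr[3 * i + 2], bgr[3 * i + 1], bgr[3 * i], 255};
  }
  return out;
}

// Byte counts handed to cudaMalloc in preProcess for a numRows x numCols image.
std::size_t rgbaBytes(std::size_t numRows, std::size_t numCols) {
  return sizeof(Rgba) * numRows * numCols;
}
std::size_t greyBytes(std::size_t numRows, std::size_t numCols) {
  return sizeof(unsigned char) * numRows * numCols;
}
```

Because `imageRGBA` is checked to be continuous, its data can be reinterpreted as a flat array of 4-byte pixels. That is why `preProcess` can cast the row pointer to `uchar4 *` and copy it with a single `cudaMemcpy`.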


--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/HW1_differenceImage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet1-RGB2Gray/HW1_differenceImage.png


--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/HW1_reference.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet1-RGB2Gray/HW1_reference.png


--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/Makefile:
--------------------------------------------------------------------------------
 1 | NVCC=nvcc
 2 | 
 3 | ###################################
 4 | # These are the default install   #
 5 | # locations on most linux distros #
 6 | ###################################
 7 | 
 8 | OPENCV_LIBPATH=/usr/lib
 9 | OPENCV_INCLUDEPATH=/usr/include
10 | 
11 | ###################################################
12 | # On Macs the default install locations are below #
13 | ###################################################
14 | 
15 | #OPENCV_LIBPATH=/usr/local/lib
16 | #OPENCV_INCLUDEPATH=/usr/local/include
17 | 
18 | # or if using MacPorts
19 | 
20 | #OPENCV_LIBPATH=/opt/local/lib
21 | #OPENCV_INCLUDEPATH=/opt/local/include
22 | 
23 | OPENCV_LIBS=-lopencv_core -lopencv_imgproc -lopencv_highgui
24 | 
25 | CUDA_INCLUDEPATH=/usr/local/cuda-5.0/include
26 | 
27 | ######################################################
28 | # On Macs the default install locations are below    #
29 | # ####################################################
30 | 
31 | #CUDA_INCLUDEPATH=/usr/local/cuda/include
32 | #CUDA_LIBPATH=/usr/local/cuda/lib
33 | 
34 | NVCC_OPTS=-O3 -arch=sm_20 -Xcompiler -Wall -Xcompiler -Wextra -m64
35 | 
36 | GCC_OPTS=-O3 -Wall -Wextra -m64
37 | 
38 | student: main.o student_func.o compare.o reference_calc.o Makefile
39 | 	$(NVCC) -o HW1 main.o student_func.o compare.o reference_calc.o -L $(OPENCV_LIBPATH) $(OPENCV_LIBS) $(NVCC_OPTS)
40 | 
41 | main.o: main.cpp timer.h utils.h reference_calc.cpp compare.cpp HW1.cpp
42 | 	g++ -c main.cpp $(GCC_OPTS) -I $(CUDA_INCLUDEPATH) -I $(OPENCV_INCLUDEPATH)
43 | 
44 | student_func.o: student_func.cu utils.h
 45 | 	$(NVCC) -c student_func.cu $(NVCC_OPTS)
46 | 
47 | compare.o: compare.cpp compare.h
48 | 	g++ -c compare.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
49 | 
50 | reference_calc.o: reference_calc.cpp reference_calc.h
51 | 	g++ -c reference_calc.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
52 | 
53 | clean:
 54 | 	rm -f *.o *.png HW1
55 | 


--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/RGB2Gray.vcxproj:
--------------------------------------------------------------------------------
  1 | ﻿<?xml version="1.0" encoding="utf-8"?>
  2 | <Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  3 |   <ItemGroup Label="ProjectConfigurations">
  4 |     <ProjectConfiguration Include="Debug|Win32">
  5 |       <Configuration>Debug</Configuration>
  6 |       <Platform>Win32</Platform>
  7 |     </ProjectConfiguration>
  8 |     <ProjectConfiguration Include="Debug|x64">
  9 |       <Configuration>Debug</Configuration>
 10 |       <Platform>x64</Platform>
 11 |     </ProjectConfiguration>
 12 |     <ProjectConfiguration Include="Release|Win32">
 13 |       <Configuration>Release</Configuration>
 14 |       <Platform>Win32</Platform>
 15 |     </ProjectConfiguration>
 16 |     <ProjectConfiguration Include="Release|x64">
 17 |       <Configuration>Release</Configuration>
 18 |       <Platform>x64</Platform>
 19 |     </ProjectConfiguration>
 20 |   </ItemGroup>
 21 |   <ItemGroup>
 22 |     <ClCompile Include="compare.cpp" />
 23 |     <ClCompile Include="HW1.cpp" />
 24 |     <ClCompile Include="main.cpp" />
 25 |     <ClCompile Include="reference_calc.cpp" />
 26 |   </ItemGroup>
 27 |   <ItemGroup>
 28 |     <ClInclude Include="compare.h" />
 29 |     <ClInclude Include="reference_calc.h" />
 30 |     <ClInclude Include="timer.h" />
 31 |     <ClInclude Include="utils.h" />
 32 |   </ItemGroup>
 33 |   <ItemGroup>
 34 |     <CudaCompile Include="student_func.cu" />
 35 |   </ItemGroup>
 36 |   <PropertyGroup Label="Globals">
 37 |     <ProjectGuid>{681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}</ProjectGuid>
 38 |     <RootNamespace>RGB2Gray</RootNamespace>
 39 |     <ProjectName>ProblemSet1-RGB2Gray</ProjectName>
 40 |   </PropertyGroup>
 41 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
 42 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
 43 |     <ConfigurationType>Application</ConfigurationType>
 44 |     <UseDebugLibraries>true</UseDebugLibraries>
 45 |     <CharacterSet>MultiByte</CharacterSet>
 46 |     <PlatformToolset>v140</PlatformToolset>
 47 |   </PropertyGroup>
 48 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
 49 |     <ConfigurationType>Application</ConfigurationType>
 50 |     <UseDebugLibraries>true</UseDebugLibraries>
 51 |     <CharacterSet>MultiByte</CharacterSet>
 52 |     <PlatformToolset>v140</PlatformToolset>
 53 |   </PropertyGroup>
 54 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
 55 |     <ConfigurationType>Application</ConfigurationType>
 56 |     <UseDebugLibraries>false</UseDebugLibraries>
 57 |     <WholeProgramOptimization>true</WholeProgramOptimization>
 58 |     <CharacterSet>MultiByte</CharacterSet>
 59 |     <PlatformToolset>v140</PlatformToolset>
 60 |   </PropertyGroup>
 61 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
 62 |     <ConfigurationType>Application</ConfigurationType>
 63 |     <UseDebugLibraries>false</UseDebugLibraries>
 64 |     <WholeProgramOptimization>true</WholeProgramOptimization>
 65 |     <CharacterSet>MultiByte</CharacterSet>
 66 |     <PlatformToolset>v140</PlatformToolset>
 67 |   </PropertyGroup>
 68 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
 69 |   <ImportGroup Label="ExtensionSettings">
 70 |     <Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 8.0.props" />
 71 |   </ImportGroup>
 72 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
 73 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 74 |   </ImportGroup>
 75 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
 76 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 77 |   </ImportGroup>
 78 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
 79 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 80 |   </ImportGroup>
 81 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
 82 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 83 |   </ImportGroup>
 84 |   <PropertyGroup Label="UserMacros" />
 85 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
 86 |     <LinkIncremental>true</LinkIncremental>
 87 |   </PropertyGroup>
 88 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
 89 |     <LinkIncremental>true</LinkIncremental>
 90 |   </PropertyGroup>
 91 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
 92 |     <ClCompile>
 93 |       <WarningLevel>Level3</WarningLevel>
 94 |       <Optimization>Disabled</Optimization>
 95 |       <PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
 96 |     </ClCompile>
 97 |     <Link>
 98 |       <GenerateDebugInformation>true</GenerateDebugInformation>
 99 |       <SubSystem>Console</SubSystem>
100 |       <AdditionalDependencies>cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)</AdditionalDependencies>
101 |     </Link>
102 |     <PostBuildEvent>
103 |       <Command>echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
104 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"</Command>
105 |     </PostBuildEvent>
106 |   </ItemDefinitionGroup>
107 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
108 |     <ClCompile>
109 |       <WarningLevel>Level3</WarningLevel>
110 |       <Optimization>Disabled</Optimization>
111 |       <PreprocessorDefinitions>WIN32;WIN64;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
112 |     </ClCompile>
113 |     <Link>
114 |       <GenerateDebugInformation>true</GenerateDebugInformation>
115 |       <SubSystem>Console</SubSystem>
116 |       <AdditionalDependencies>cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)</AdditionalDependencies>
117 |     </Link>
118 |     <PostBuildEvent>
119 |       <Command>echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
120 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"</Command>
121 |     </PostBuildEvent>
122 |     <CudaCompile>
123 |       <TargetMachinePlatform>64</TargetMachinePlatform>
124 |     </CudaCompile>
125 |   </ItemDefinitionGroup>
126 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
127 |     <ClCompile>
128 |       <WarningLevel>Level3</WarningLevel>
129 |       <Optimization>MaxSpeed</Optimization>
130 |       <FunctionLevelLinking>true</FunctionLevelLinking>
131 |       <IntrinsicFunctions>true</IntrinsicFunctions>
132 |       <PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
133 |     </ClCompile>
134 |     <Link>
135 |       <GenerateDebugInformation>true</GenerateDebugInformation>
136 |       <EnableCOMDATFolding>true</EnableCOMDATFolding>
137 |       <OptimizeReferences>true</OptimizeReferences>
138 |       <SubSystem>Console</SubSystem>
139 |       <AdditionalDependencies>cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)</AdditionalDependencies>
140 |     </Link>
141 |     <PostBuildEvent>
142 |       <Command>echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
143 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"</Command>
144 |     </PostBuildEvent>
145 |   </ItemDefinitionGroup>
146 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
147 |     <ClCompile>
148 |       <WarningLevel>Level3</WarningLevel>
149 |       <Optimization>MaxSpeed</Optimization>
150 |       <FunctionLevelLinking>true</FunctionLevelLinking>
151 |       <IntrinsicFunctions>true</IntrinsicFunctions>
152 |       <PreprocessorDefinitions>WIN32;WIN64;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
153 |       <AdditionalIncludeDirectories>%(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);C:\opencv\build\include;C:\opencv\build\include\opencv2</AdditionalIncludeDirectories>
154 |     </ClCompile>
155 |     <Link>
156 |       <GenerateDebugInformation>true</GenerateDebugInformation>
157 |       <EnableCOMDATFolding>true</EnableCOMDATFolding>
158 |       <OptimizeReferences>true</OptimizeReferences>
159 |       <SubSystem>Console</SubSystem>
160 |       <AdditionalDependencies>cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;opencv_world320.lib;%(AdditionalDependencies)</AdditionalDependencies>
161 |       <AdditionalLibraryDirectories>%(AdditionalLibraryDirectories);$(CudaToolkitLibDir);C:\opencv\build\x64\vc14\lib</AdditionalLibraryDirectories>
162 |     </Link>
163 |     <PostBuildEvent>
164 |       <Command>echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
165 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"</Command>
166 |     </PostBuildEvent>
167 |     <CudaCompile>
168 |       <TargetMachinePlatform>64</TargetMachinePlatform>
169 |     </CudaCompile>
170 |   </ItemDefinitionGroup>
171 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
172 |   <ImportGroup Label="ExtensionTargets">
173 |     <Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 8.0.targets" />
174 |   </ImportGroup>
175 | </Project>


--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/cinque_terre.gold:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet1-RGB2Gray/cinque_terre.gold


--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/cinque_terre_gray.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet1-RGB2Gray/cinque_terre_gray.jpg


--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/cinque_terre_small.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet1-RGB2Gray/cinque_terre_small.jpg


--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/compare.cpp:
--------------------------------------------------------------------------------
 1 | #include <opencv2/core/core.hpp>
 2 | #include <opencv2/highgui/highgui.hpp>
 3 | #include <opencv2/opencv.hpp>
 4 | 
 5 | #include "utils.h"
 6 | 
 7 | void compareImages(std::string reference_filename, std::string test_filename, 
 8 |                    bool useEpsCheck, double perPixelError, double globalError)
 9 | {
10 |   cv::Mat reference = cv::imread(reference_filename, -1);
11 |   cv::Mat test = cv::imread(test_filename, -1);
12 | 
13 |   cv::Mat diff = abs(reference - test);
14 | 
15 |   cv::Mat diffSingleChannel = diff.reshape(1, 0); //convert to 1 channel, same # rows
16 | 
17 |   double minVal, maxVal;
18 | 
19 |   cv::minMaxLoc(diffSingleChannel, &minVal, &maxVal, NULL, NULL); //NULL because we don't care about location
20 | 
21 |   //now perform transform so that we bump values to the full range
22 | 
 23 |   if (maxVal > minVal) diffSingleChannel = (diffSingleChannel - minVal) * (255. / (maxVal - minVal)); //skip when the range is zero (e.g. identical images)
24 | 
25 |   diff = diffSingleChannel.reshape(reference.channels(), 0);
26 | 
27 |   cv::imwrite("HW1_differenceImage.png", diff);
28 |   //OK, now we can start comparing values...
29 |   unsigned char *referencePtr = reference.ptr<unsigned char>(0);
30 |   unsigned char *testPtr = test.ptr<unsigned char>(0);
31 | 
32 |   if (useEpsCheck) {
33 |     checkResultsEps(referencePtr, testPtr, reference.rows * reference.cols * reference.channels(), perPixelError, globalError);
34 |   }
35 |   else
36 |   {
37 |     checkResultsExact(referencePtr, testPtr, reference.rows * reference.cols * reference.channels());
38 |   }
39 | 
40 |   std::cout << "PASS" << std::endl;
41 |   return;
42 | }
43 | 
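`compareImages` builds its visualization in two steps. The first step is a per-pixel absolute difference. OpenCV documents `abs(A - B)` on 8-bit matrices as equivalent to `cv::absdiff(A, B)`, so negative differences are not clipped to 0. The second step is a linear stretch of `[minVal, maxVal]` onto `[0, 255]`. The sketch below performs both steps on plain `std::vector` buffers so the arithmetic is visible. `absDiff` and `stretchToFullRange` are hypothetical helpers, and the half-up rounding used here may differ slightly from OpenCV's saturating conversion.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Per-element |reference - test|, computed in int so a test value larger than
// the reference is not clipped. Assumes both buffers have the same length.
std::vector<std::uint8_t> absDiff(const std::vector<std::uint8_t>& reference,
                                  const std::vector<std::uint8_t>& test) {
  std::vector<std::uint8_t> diff(reference.size());
  for (std::size_t i = 0; i < reference.size(); ++i) {
    diff[i] = static_cast<std::uint8_t>(
        std::abs(static_cast<int>(reference[i]) - static_cast<int>(test[i])));
  }
  return diff;
}

// Linearly map [min, max] onto [0, 255] for display. If every value is equal
// (for example, a perfect match) there is no range to stretch, so the buffer
// is returned unchanged instead of dividing by zero.
std::vector<std::uint8_t> stretchToFullRange(std::vector<std::uint8_t> values) {
  if (values.empty()) return values;
  const auto mm = std::minmax_element(values.begin(), values.end());
  const int minVal = *mm.first;
  const int maxVal = *mm.second;
  if (maxVal == minVal) return values;
  for (auto& v : values) {
    // Round half up; the result always lies in [0, 255].
    v = static_cast<std::uint8_t>((v - minVal) * 255.0 / (maxVal - minVal) + 0.5);
  }
  return values;
}
```

The stretch is only for the saved difference image. The pass/fail decision that follows still compares the unscaled `reference` and `test` bytes.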


--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/compare.h:
--------------------------------------------------------------------------------
1 | #ifndef COMPARE_H__
2 | #define COMPARE_H__
3 | #include <string>
4 | void compareImages(std::string reference_filename, std::string test_filename, 
5 |                    bool useEpsCheck, double perPixelError, double globalError);
6 | 
7 | #endif
8 | 


--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/main.cpp:
--------------------------------------------------------------------------------
 1 | //Udacity HW1 Solution
 2 | 
 3 | #include <iostream>
 4 | #include "timer.h"
 5 | #include "utils.h"
 6 | #include <string>
 7 | #include <stdio.h>
 8 | #include "reference_calc.h"
 9 | #include "compare.h"
10 | 
11 | void your_rgba_to_greyscale(const uchar4 * const h_rgbaImage, 
12 |                             uchar4 * const d_rgbaImage,
13 |                             unsigned char* const d_greyImage, 
14 |                             size_t numRows, size_t numCols);
15 | 
16 | //include the definitions of the above functions for this homework
17 | #include "HW1.cpp"
18 | 
19 | int main(int argc, char **argv) {
20 |   uchar4        *h_rgbaImage, *d_rgbaImage;
21 |   unsigned char *h_greyImage, *d_greyImage;
22 | 
23 |   std::string input_file;
24 |   std::string output_file;
25 |   std::string reference_file;
26 |   double perPixelError = 0.0;
27 |   double globalError   = 0.0;
28 |   bool useEpsCheck = false;
29 |   switch (argc)
30 |   {
 31 |     case 2:
 32 |       input_file = std::string(argv[1]);
 33 |       output_file = "HW1_output.png";
 34 |       reference_file = "HW1_reference.png";
 35 |       break;
 36 |     case 3:
 37 |       input_file  = std::string(argv[1]);
 38 |       output_file = std::string(argv[2]);
 39 |       reference_file = "HW1_reference.png";
 40 |       break;
 41 |     case 4:
 42 |       input_file  = std::string(argv[1]);
 43 |       output_file = std::string(argv[2]);
 44 |       reference_file = std::string(argv[3]);
 45 |       break;
 46 |     case 6:
 47 |       useEpsCheck = true;
 48 |       input_file  = std::string(argv[1]);
 49 |       output_file = std::string(argv[2]);
 50 |       reference_file = std::string(argv[3]);
 51 |       perPixelError = atof(argv[4]);
 52 |       globalError   = atof(argv[5]);
 53 |       break;
 54 |     default:
 55 |       std::cerr << "Usage: ./HW1 input_file [output_filename] [reference_filename] [perPixelError globalError]" << std::endl;
 56 |       exit(1);
57 |   }
58 |   //load the image and give us our input and output pointers
59 |   preProcess(&h_rgbaImage, &h_greyImage, &d_rgbaImage, &d_greyImage, input_file);
60 | 
61 |   GpuTimer timer;
62 |   timer.Start();
63 |   //call the students' code
64 |   your_rgba_to_greyscale(h_rgbaImage, d_rgbaImage, d_greyImage, numRows(), numCols());
65 |   timer.Stop();
66 |   cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
67 | 
68 |   int err = printf("Your code ran in: %f msecs.\n", timer.Elapsed());
69 | 
70 |   if (err < 0) {
71 |     //Couldn't print! Probably the student closed stdout - bad news
72 |     std::cerr << "Couldn't print timing information! STDOUT Closed!" << std::endl;
73 |     exit(1);
74 |   }
75 | 
76 |   size_t numPixels = numRows()*numCols();
77 |   checkCudaErrors(cudaMemcpy(h_greyImage, d_greyImage, sizeof(unsigned char) * numPixels, cudaMemcpyDeviceToHost));
78 | 
79 |   //check results and output the grey image
80 |   postProcess(output_file, h_greyImage);
81 | 
82 |   referenceCalculation(h_rgbaImage, h_greyImage, numRows(), numCols());
83 | 
84 |   postProcess(reference_file, h_greyImage);
85 | 
86 |   //generateReferenceImage(input_file, reference_file);
87 |   compareImages(reference_file, output_file, useEpsCheck, perPixelError, 
88 |                 globalError);
89 | 
90 |   cleanup();
91 | 
92 |   return 0;
93 | }
94 | 
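The `switch (argc)` in `main` accepts exactly 2, 3, 4, or 6 arguments, counting the program name. `perPixelError` and `globalError` must be supplied together, and any other count prints the usage message. The following sketch restates that mapping as a testable function. `RunOptions` and `parseArgs` are hypothetical names, and unlike `main`, the sketch reports an invalid count by returning `false` instead of calling `exit(1)`.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical bundle of the values main() fills in from the command line,
// with the same defaults main() uses for omitted arguments.
struct RunOptions {
  std::string inputFile;
  std::string outputFile = "HW1_output.png";
  std::string referenceFile = "HW1_reference.png";
  bool useEpsCheck = false;
  double perPixelError = 0.0;
  double globalError = 0.0;
};

// args[0] is the program name, so args.size() plays the role of argc.
// Each case falls through to fill in the earlier positional arguments.
bool parseArgs(const std::vector<std::string>& args, RunOptions& opts) {
  switch (args.size()) {
    case 6:
      opts.useEpsCheck = true;
      opts.perPixelError = std::atof(args[4].c_str());
      opts.globalError = std::atof(args[5].c_str());
      // fall through
    case 4:
      opts.referenceFile = args[3];
      // fall through
    case 3:
      opts.outputFile = args[2];
      // fall through
    case 2:
      opts.inputFile = args[1];
      return true;
    default:
      return false;  // main() prints the usage message and exits here
  }
}
```

Passing a single tolerance (five arguments in total) is rejected rather than silently ignored. This matches the behavior of the original `switch`.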


--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/reference_calc.cpp:
--------------------------------------------------------------------------------
 1 | // for uchar4 struct
 2 | #include <cuda_runtime.h>
 3 | 
 4 | void referenceCalculation(const uchar4* const rgbaImage,
 5 |                           unsigned char *const greyImage,
 6 |                           size_t numRows,
 7 |                           size_t numCols)
 8 | {
 9 |   for (size_t r = 0; r < numRows; ++r) {
10 |     for (size_t c = 0; c < numCols; ++c) {
11 |       uchar4 rgba = rgbaImage[r * numCols + c];
12 |       float channelSum = .299f * rgba.x + .587f * rgba.y + .114f * rgba.z;
13 |       greyImage[r * numCols + c] = channelSum;
14 |     }
15 |   }
16 | }
17 | 
18 | 
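
The reference implementation computes the NTSC weighted sum in `float` and stores it through an implicit float-to-`unsigned char` conversion. That conversion truncates toward zero rather than rounding. The host-only sketch below shows the same per-pixel arithmetic and row-major traversal. It assumes a hypothetical `Rgba` struct standing in for CUDA's `uchar4`, so it compiles without the CUDA headers. `toGrey` and `greyscale` are illustrative names, not part of the homework code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's uchar4: x = R, y = G, z = B, w = A.
struct Rgba {
  unsigned char x, y, z, w;
};

// Same weights as referenceCalculation; alpha is ignored.
// The float result is truncated (not rounded) when converted to unsigned char.
unsigned char toGrey(Rgba p) {
  float intensity = .299f * p.x + .587f * p.y + .114f * p.z;
  return static_cast<unsigned char>(intensity);
}

// Row-major traversal: pixel (r, c) lives at offset r * numCols + c.
std::vector<unsigned char> greyscale(const std::vector<Rgba>& image,
                                     std::size_t numRows, std::size_t numCols) {
  assert(image.size() == numRows * numCols);
  std::vector<unsigned char> grey(numRows * numCols);
  for (std::size_t r = 0; r < numRows; ++r)
    for (std::size_t c = 0; c < numCols; ++c)
      grey[r * numCols + c] = toGrey(image[r * numCols + c]);
  return grey;
}
```

For example, a saturated red pixel gives 76.245 before conversion and is stored as 76.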


--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/reference_calc.h:
--------------------------------------------------------------------------------
1 | #ifndef REFERENCE_H__
2 | #define REFERENCE_H__
3 | 
4 | void referenceCalculation(const uchar4* const rgbaImage,
5 |                           unsigned char *const greyImage,
6 |                           size_t numRows,
7 |                           size_t numCols);
8 | 
9 | #endif


--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/student_func.cu:
--------------------------------------------------------------------------------
 1 | // Homework 1
 2 | // Color to Greyscale Conversion
 3 | 
 4 | //A common way to represent color images is known as RGBA - the color
 5 | //is specified by how much Red, Green and Blue it contains.
 6 | //The 'A' stands for Alpha and is used for transparency; it will be
 7 | //ignored in this homework.
 8 | 
 9 | //Each channel Red, Blue, Green and Alpha is represented by one byte.
10 | //Since we are using one byte for each color there are 256 different
11 | //possible values for each color.  This means we use 4 bytes per pixel.
12 | 
13 | //Greyscale images are represented by a single intensity value per pixel
14 | //which is one byte in size.
15 | 
16 | //To convert an image from color to grayscale one simple method is to
17 | //set the intensity to the average of the RGB channels.  But we will
18 | //use a more sophisticated method that takes into account how the eye 
19 | //perceives color and weights the channels unequally.
20 | 
21 | //The eye responds most strongly to green followed by red and then blue.
22 | //The NTSC (National Television System Committee) recommends the following
23 | //formula for color to greyscale conversion:
24 | 
25 | //I = .299f * R + .587f * G + .114f * B
26 | 
27 | //Notice the trailing f's on the numbers which indicate that they are 
28 | //single precision floating point constants and not double precision
29 | //constants.
30 | 
31 | //You should fill in the kernel as well as set the block and grid sizes
32 | //so that the entire image is processed.
33 | 
34 | #include "utils.h"
35 | #include "device_launch_parameters.h"
36 | 
37 | const size_t blockWidth = 32; //threads per block on one dimension (32*32 total)
38 | 
39 | __global__
40 | void rgba_to_greyscale(const uchar4* const rgbaImage,
41 |                        unsigned char* const greyImage,
42 |                        size_t numRows, size_t numCols)
43 | {
44 |   //Fill in the kernel to convert from color to greyscale
45 |   //the mapping from components of a uchar4 to RGBA is:
46 |   // .x -> R ; .y -> G ; .z -> B ; .w -> A
47 |   //
48 |   //The output (greyImage) at each pixel should be the result of
49 |   //applying the formula: output = .299f * R + .587f * G + .114f * B;
50 |   //Note: We will be ignoring the alpha channel for this conversion
51 | 
52 |   //First create a mapping from the 2D block and grid locations
53 |   //to an absolute 2D location in the image, then use that to
54 |   //calculate a 1D offset
55 | 	size_t idx_x = threadIdx.x + blockIdx.x*blockDim.x;
56 | 	size_t idx_y = threadIdx.y + blockIdx.y*blockDim.y;
57 | 
58 | 	if (idx_x >= numRows || idx_y >= numCols) return; //it can happen on the "remainder" block
59 | 	
60 | 	size_t idxvec = idx_x*numCols + idx_y;
61 | 	uchar4 rgb_value = rgbaImage[idxvec];
62 | 	greyImage[idxvec] = (unsigned char)(.299f*rgb_value.x + .587f*rgb_value.y + .114f*rgb_value.z);
63 | }
64 | 
65 | void your_rgba_to_greyscale(const uchar4 * const h_rgbaImage, uchar4 * const d_rgbaImage,
66 |                             unsigned char* const d_greyImage, size_t numRows, size_t numCols)
67 | {
 68 |   //Each block is blockWidth x blockWidth threads; round the number of
 69 |   //blocks up (ceiling division) so the grid covers every pixel
 70 |   
 71 |   const dim3 blockSize(blockWidth, blockWidth, 1);
 72 |   unsigned int numBlocksX = (unsigned int)((numRows + blockWidth - 1) / blockWidth);
 73 |   unsigned int numBlocksY = (unsigned int)((numCols + blockWidth - 1) / blockWidth);
74 |   const dim3 gridSize(numBlocksX,numBlocksY, 1);  
75 |   rgba_to_greyscale<<<gridSize, blockSize>>>(d_rgbaImage, d_greyImage, numRows, numCols);
76 |   
77 |   cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
78 | 
79 | }
80 | 
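
The launch needs enough blocks in each dimension to cover every pixel. Ceiling division, `(n + b - 1) / b`, gives exactly that count. By contrast, `n / b + 1` launches one wasted block whenever `n` is a multiple of `b`; the kernel's bounds check keeps that extra block harmless. The host-only sketch below uses a hypothetical `ceilDiv` helper and a hypothetical `LaunchDims` struct in place of `dim3`.

The sketch also reproduces the kernel's index arithmetic. Because `threadIdx.x` walks rows in `rgba_to_greyscale`, neighbouring threads in a warp touch pixels `numCols` elements apart. Mapping `x` to columns instead would make those global-memory accesses contiguous (coalesced).

```cpp
#include <cassert>
#include <cstddef>

// Number of blocks of size b needed to cover n items, rounded up.
std::size_t ceilDiv(std::size_t n, std::size_t b) {
  assert(b > 0);
  return (n + b - 1) / b;
}

// Hypothetical stand-in for the dim3 grid and block sizes.
struct LaunchDims {
  std::size_t gridX, gridY, blockX, blockY;
};

// Same orientation as student_func.cu: x covers rows, y covers columns.
LaunchDims launchFor(std::size_t numRows, std::size_t numCols, std::size_t blockWidth) {
  return {ceilDiv(numRows, blockWidth), ceilDiv(numCols, blockWidth), blockWidth, blockWidth};
}

// Flat offset that rgba_to_greyscale computes for one thread.
std::size_t flatIndexXRows(std::size_t tx, std::size_t bx, std::size_t ty, std::size_t by,
                           std::size_t blockDim, std::size_t numCols) {
  std::size_t idx_x = tx + bx * blockDim;  // row index
  std::size_t idx_y = ty + by * blockDim;  // column index
  return idx_x * numCols + idx_y;
}
```

For a 313 x 557 image with 32 x 32 blocks, `launchFor` yields a 10 x 18 grid.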


--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/timer.h:
--------------------------------------------------------------------------------
 1 | #ifndef GPU_TIMER_H__
 2 | #define GPU_TIMER_H__
 3 | 
 4 | #include <cuda_runtime.h>
 5 | 
 6 | struct GpuTimer
 7 | {
 8 |   cudaEvent_t start;
 9 |   cudaEvent_t stop;
10 | 
11 |   GpuTimer()
12 |   {
13 |     cudaEventCreate(&start);
14 |     cudaEventCreate(&stop);
15 |   }
16 | 
17 |   ~GpuTimer()
18 |   {
19 |     cudaEventDestroy(start);
20 |     cudaEventDestroy(stop);
21 |   }
22 | 
23 |   void Start()
24 |   {
25 |     cudaEventRecord(start, 0);
26 |   }
27 | 
28 |   void Stop()
29 |   {
30 |     cudaEventRecord(stop, 0);
31 |   }
32 | 
33 |   float Elapsed()
34 |   {
35 |     float elapsed;
36 |     cudaEventSynchronize(stop);
37 |     cudaEventElapsedTime(&elapsed, start, stop);
38 |     return elapsed;
39 |   }
40 | };
41 | 
42 | #endif  /* GPU_TIMER_H__ */
43 | 


--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/utils.h:
--------------------------------------------------------------------------------
 1 | #ifndef UTILS_H__
 2 | #define UTILS_H__
 3 | 
 4 | #include <iostream>
 5 | #include <iomanip>
 6 | #include <cuda.h>
 7 | #include <cuda_runtime.h>
 8 | #include <cuda_runtime_api.h>
 9 | #include <cassert>
10 | #include <cmath>
11 | #include <algorithm>
12 | 
13 | #define checkCudaErrors(val) check( (val), #val, __FILE__, __LINE__)
14 | 
15 | template<typename T>
16 | void check(T err, const char* const func, const char* const file, const int line) {
17 |   if (err != cudaSuccess) {
18 |     std::cerr << "CUDA error at: " << file << ":" << line << std::endl;
19 |     std::cerr << cudaGetErrorString(err) << " " << func << std::endl;
20 |     exit(1);
21 |   }
22 | }
23 | 
24 | template<typename T>
25 | void checkResultsExact(const T* const ref, const T* const gpu, size_t numElem) {
26 |   //check that the GPU result matches the CPU result
27 |   for (size_t i = 0; i < numElem; ++i) {
28 |     if (ref[i] != gpu[i]) {
29 |       std::cerr << "Difference at pos " << i << std::endl;
30 |       //the + is magic to convert char to int without messing
31 |       //with other types
32 |       std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
33 |                  "\nGPU      : " << +gpu[i] << std::endl;
34 |       exit(1);
35 |     }
36 |   }
37 | }
38 | 
39 | template<typename T>
40 | void checkResultsEps(const T* const ref, const T* const gpu, size_t numElem, double eps1, double eps2) {
41 |   assert(eps1 >= 0 && eps2 >= 0);
42 |   unsigned long long totalDiff = 0;
43 |   unsigned numSmallDifferences = 0;
44 |   for (size_t i = 0; i < numElem; ++i) {
45 |     //subtract smaller from larger in case of unsigned types
46 |     T smaller = std::min(ref[i], gpu[i]);
47 |     T larger = std::max(ref[i], gpu[i]);
48 |     T diff = larger - smaller;
49 |     if (diff > 0 && diff <= eps1) {
50 |       numSmallDifferences++;
51 |     }
52 |     else if (diff > eps1) {
53 |       std::cerr << "Difference at pos " << +i << " exceeds tolerance of " << eps1 << std::endl;
54 |       std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
55 |         "\nGPU      : " << +gpu[i] << std::endl;
56 |       exit(1);
57 |     }
58 |     totalDiff += diff * diff;
59 |   }
60 |   double percentSmallDifferences = (double)numSmallDifferences / (double)numElem;
61 |   if (percentSmallDifferences > eps2) {
62 |     std::cerr << "Total percentage of non-zero pixel differences between the two images exceeds " << 100.0 * eps2 << "%" << std::endl;
63 |     std::cerr << "Percentage of non-zero pixel differences: " << 100.0 * percentSmallDifferences << "%" << std::endl;
64 |     exit(1);
65 |   }
66 | }
67 | 
68 | //Uses the autodesk method of image comparison
69 | //Note that the tolerance here is in PIXELS, not a percentage of input pixels
70 | template<typename T>
71 | void checkResultsAutodesk(const T* const ref, const T* const gpu, size_t numElem, double variance, size_t tolerance)
72 | {
73 | 
74 |   size_t numBadPixels = 0;
75 |   for (size_t i = 0; i < numElem; ++i) {
76 |     T smaller = std::min(ref[i], gpu[i]);
77 |     T larger = std::max(ref[i], gpu[i]);
78 |     T diff = larger - smaller;
79 |     if (diff > variance)
80 |       ++numBadPixels;
81 |   }
82 | 
83 |   if (numBadPixels > tolerance) {
 84 |     std::cerr << "Too many bad pixels in the image: " << numBadPixels << "/" << tolerance << std::endl;
85 |     exit(1);
86 |   }
87 | }
88 | 
89 | #endif
90 | 
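
`checkResultsEps` and `checkResultsAutodesk` compute `larger - smaller` rather than `ref[i] - gpu[i]`. The reason is that, for an unsigned element type such as `unsigned char`, a negative difference stored back into `T` wraps around to a large value. The sketch below demonstrates the wrap-around and the min/max fix. It also counts Autodesk-style "bad" pixels, meaning elements whose difference exceeds `variance`; the resulting count is compared against an absolute pixel tolerance. `naiveDiff`, `absDiff` and `countBadPixels` are hypothetical names for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Naive difference: the int result of a - b is converted back to T,
// so for unsigned char, 3 - 5 == -2 wraps to 254.
template <typename T>
T naiveDiff(T a, T b) { return static_cast<T>(a - b); }

// Order-independent absolute difference that never goes negative.
template <typename T>
T absDiff(T a, T b) { return static_cast<T>(std::max(a, b) - std::min(a, b)); }

// Count elements whose absolute difference exceeds variance
// (mirrors the loop in checkResultsAutodesk).
template <typename T>
std::size_t countBadPixels(const T* ref, const T* gpu, std::size_t numElem, double variance) {
  assert(variance >= 0);
  std::size_t bad = 0;
  for (std::size_t i = 0; i < numElem; ++i)
    if (absDiff(ref[i], gpu[i]) > variance) ++bad;
  return bad;
}
```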


--------------------------------------------------------------------------------
/ProblemSet2-Blur/CMakeLists.txt:
--------------------------------------------------------------------------------
 1 | ############################################################################
 2 | # <summary> CMakeLists.txt for OpenCV and CUDA. </summary>
 3 | # <date>    2012-02-07          </date>
 4 | # <author>  Quan Tran Minh. edited by Johannes Kast, Michael Sarahan </author>
 5 | # <email>   quantm@unist.ac.kr  kast.jo@googlemail.com msarahan@gmail.com</email>
 6 | ############################################################################
 7 | 
 8 | # collect source files
 9 | 
10 | file( GLOB  hdr *.hpp *.h )
11 | file( GLOB  cu  *.cu)
12 | SET (HW2_files main.cpp reference_calc.cpp compare.cpp)
13 |     
14 | CUDA_ADD_EXECUTABLE(HW2 ${HW2_files} ${hdr} ${cu})
15 | 


--------------------------------------------------------------------------------
/ProblemSet2-Blur/HW2.cpp:
--------------------------------------------------------------------------------
  1 | #include <opencv2/core/core.hpp>
  2 | #include <opencv2/highgui/highgui.hpp>
  3 | #include <opencv2/opencv.hpp>
  4 | #include "utils.h"
  5 | #include <cuda.h>
  6 | #include <cuda_runtime.h>
  7 | #include <string>
  8 | 
  9 | static cv::Mat imageInputRGBA;
 10 | static cv::Mat imageOutputRGBA;
 11 | 
 12 | static uchar4 *d_inputImageRGBA__;
 13 | static uchar4 *d_outputImageRGBA__;
 14 | 
 15 | static float *h_filter__;
 16 | 
 17 | static size_t numRows() { return imageInputRGBA.rows; }
 18 | static size_t numCols() { return imageInputRGBA.cols; }
 19 | 
 20 | //return types are void since any internal error will be handled by quitting
 21 | //no point in returning error codes...
 22 | //returns pointers to the RGBA input and output images on both
 23 | //the host and device, device buffers for the three blurred
 24 | //color channels, and the host-side blur filter and its width
 25 | static void preProcess(uchar4 **h_inputImageRGBA, uchar4 **h_outputImageRGBA,
 26 |                 uchar4 **d_inputImageRGBA, uchar4 **d_outputImageRGBA,
 27 |                 unsigned char **d_redBlurred,
 28 |                 unsigned char **d_greenBlurred,
 29 |                 unsigned char **d_blueBlurred,
 30 |                 float **h_filter, int *filterWidth,
 31 |                 const std::string &filename) {
 32 | 
 33 |   //make sure the context initializes ok
 34 |   checkCudaErrors(cudaFree(0));
 35 | 
 36 |   cv::Mat image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR);
 37 |   if (image.empty()) {
 38 |     std::cerr << "Couldn't open file: " << filename << std::endl;
 39 |     exit(1);
 40 |   }
 41 | 
 42 |   cv::cvtColor(image, imageInputRGBA, CV_BGR2RGBA);
 43 | 
 44 |   //allocate memory for the output
 45 |   imageOutputRGBA.create(image.rows, image.cols, CV_8UC4);
 46 | 
 47 |   //This shouldn't ever happen given the way the images are created
 48 |   //at least based upon my limited understanding of OpenCV, but better to check
 49 |   if (!imageInputRGBA.isContinuous() || !imageOutputRGBA.isContinuous()) {
 50 |     std::cerr << "Images aren't continuous!! Exiting." << std::endl;
 51 |     exit(1);
 52 |   }
 53 | 
 54 |   *h_inputImageRGBA  = (uchar4 *)imageInputRGBA.ptr<unsigned char>(0);
 55 |   *h_outputImageRGBA = (uchar4 *)imageOutputRGBA.ptr<unsigned char>(0);
 56 | 
 57 |   const size_t numPixels = numRows() * numCols();
 58 |   //allocate memory on the device for both input and output
 59 |   checkCudaErrors(cudaMalloc(d_inputImageRGBA, sizeof(uchar4) * numPixels));
 60 |   checkCudaErrors(cudaMalloc(d_outputImageRGBA, sizeof(uchar4) * numPixels));
 61 |   checkCudaErrors(cudaMemset(*d_outputImageRGBA, 0, numPixels * sizeof(uchar4))); //make sure no stale data is left lying around
 62 | 
 63 |   //copy input array to the GPU
 64 |   checkCudaErrors(cudaMemcpy(*d_inputImageRGBA, *h_inputImageRGBA, sizeof(uchar4) * numPixels, cudaMemcpyHostToDevice));
 65 | 
 66 |   d_inputImageRGBA__  = *d_inputImageRGBA;
 67 |   d_outputImageRGBA__ = *d_outputImageRGBA;
 68 | 
 69 |   //now create the filter that they will use
 70 |   const int blurKernelWidth = 9;
 71 |   const float blurKernelSigma = 2.;
 72 | 
 73 |   *filterWidth = blurKernelWidth;
 74 | 
 75 |   //create and fill the filter we will convolve with
 76 |   *h_filter = new float[blurKernelWidth * blurKernelWidth];
 77 |   h_filter__ = *h_filter;
 78 | 
 79 |   float filterSum = 0.f; //for normalization
 80 | 
 81 |   for (int r = -blurKernelWidth/2; r <= blurKernelWidth/2; ++r) {
 82 |     for (int c = -blurKernelWidth/2; c <= blurKernelWidth/2; ++c) {
 83 |       float filterValue = expf( -(float)(c * c + r * r) / (2.f * blurKernelSigma * blurKernelSigma));
 84 |       (*h_filter)[(r + blurKernelWidth/2) * blurKernelWidth + c + blurKernelWidth/2] = filterValue;
 85 |       filterSum += filterValue;
 86 |     }
 87 |   }
 88 | 
 89 |   float normalizationFactor = 1.f / filterSum;
 90 | 
 91 |   for (int r = -blurKernelWidth/2; r <= blurKernelWidth/2; ++r) {
 92 |     for (int c = -blurKernelWidth/2; c <= blurKernelWidth/2; ++c) {
 93 |       (*h_filter)[(r + blurKernelWidth/2) * blurKernelWidth + c + blurKernelWidth/2] *= normalizationFactor;
 94 |     }
 95 |   }
 96 | 
 97 |   //blurred
 98 |   checkCudaErrors(cudaMalloc(d_redBlurred,    sizeof(unsigned char) * numPixels));
 99 |   checkCudaErrors(cudaMalloc(d_greenBlurred,  sizeof(unsigned char) * numPixels));
100 |   checkCudaErrors(cudaMalloc(d_blueBlurred,   sizeof(unsigned char) * numPixels));
101 |   checkCudaErrors(cudaMemset(*d_redBlurred,   0, sizeof(unsigned char) * numPixels));
102 |   checkCudaErrors(cudaMemset(*d_greenBlurred, 0, sizeof(unsigned char) * numPixels));
103 |   checkCudaErrors(cudaMemset(*d_blueBlurred,  0, sizeof(unsigned char) * numPixels));
104 | }
105 | 
106 | static void postProcess(const std::string& output_file, uchar4* data_ptr) {
107 |   cv::Mat output(numRows(), numCols(), CV_8UC4, (void*)data_ptr);
108 | 
109 |   cv::Mat imageOutputBGR;
110 |   cv::cvtColor(output, imageOutputBGR, CV_RGBA2BGR);
111 |   //output the image
112 |   cv::imwrite(output_file.c_str(), imageOutputBGR);
113 | }
114 | 
115 | static void cleanUp(void)
116 | {
117 |   cudaFree(d_inputImageRGBA__);
118 |   cudaFree(d_outputImageRGBA__);
119 |   delete[] h_filter__;
120 | }
121 | 
122 | 
123 | // An unused bit of code showing how to accomplish this assignment using OpenCV.  It is much faster 
124 | //    than the naive implementation in reference_calc.cpp.
125 | static void generateReferenceImage(std::string input_file, std::string reference_file, int kernel_size)
126 | {
127 | 	cv::Mat input = cv::imread(input_file);
128 | 	// Create an identical image for the output as a placeholder
129 | 	cv::Mat reference = cv::imread(input_file);
130 | 	cv::GaussianBlur(input, reference, cv::Size2i(kernel_size, kernel_size),0);
131 | 	cv::imwrite(reference_file, reference);
132 | }
133 | 
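
`preProcess` builds the filter in two passes. First it fills a `blurKernelWidth` x `blurKernelWidth` grid with unnormalized Gaussian weights, exp(-(r² + c²) / (2σ²)). Then it scales every weight by 1/sum so the weights add up to 1, which keeps a blurred region of constant colour at its original intensity. The sketch below repeats that construction as a standalone function returning a row-major `std::vector<float>`. The name `makeGaussianFilter` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Row-major width x width Gaussian filter, normalized to sum to 1.
// Mirrors the two loops in preProcess (which uses width 9, sigma 2).
std::vector<float> makeGaussianFilter(int width, float sigma) {
  assert(width % 2 == 1 && sigma > 0.f);
  const int half = width / 2;
  std::vector<float> filter(width * width);
  float sum = 0.f;
  for (int r = -half; r <= half; ++r) {
    for (int c = -half; c <= half; ++c) {
      float v = std::exp(-static_cast<float>(c * c + r * r) / (2.f * sigma * sigma));
      filter[(r + half) * width + (c + half)] = v;
      sum += v;
    }
  }
  const float norm = 1.f / sum;  // normalization factor, as in preProcess
  for (float& v : filter) v *= norm;
  return filter;
}
```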


--------------------------------------------------------------------------------
/ProblemSet2-Blur/HW2_differenceImage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet2-Blur/HW2_differenceImage.png


--------------------------------------------------------------------------------
/ProblemSet2-Blur/HW2_reference.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet2-Blur/HW2_reference.png


--------------------------------------------------------------------------------
/ProblemSet2-Blur/Makefile:
--------------------------------------------------------------------------------
 1 | NVCC=nvcc
 2 | 
 3 | ###################################
 4 | # These are the default install   #
 5 | # locations on most linux distros #
 6 | ###################################
 7 | 
 8 | OPENCV_LIBPATH=/usr/lib
 9 | OPENCV_INCLUDEPATH=/usr/include
10 | 
11 | ###################################################
12 | # On Macs the default install locations are below #
13 | ###################################################
14 | 
15 | #OPENCV_LIBPATH=/usr/local/lib
16 | #OPENCV_INCLUDEPATH=/usr/local/include
17 | 
18 | OPENCV_LIBS=-lopencv_core -lopencv_imgproc -lopencv_highgui
19 | CUDA_INCLUDEPATH=/usr/local/cuda-5.0/include
20 | 
21 | ######################################################
22 | # On Macs the default install locations are below    #
23 | # ####################################################
24 | 
25 | #CUDA_INCLUDEPATH=/usr/local/cuda/include
26 | #CUDA_LIBPATH=/usr/local/cuda/lib
27 | 
28 | NVCC_OPTS=-O3 -arch=sm_20 -Xcompiler -Wall -Xcompiler -Wextra -m64
29 | 
30 | GCC_OPTS=-O3 -Wall -Wextra -m64
31 | 
32 | student: main.o student_func.o compare.o reference_calc.o Makefile
33 | 	$(NVCC) -o HW2 main.o student_func.o compare.o reference_calc.o -L $(OPENCV_LIBPATH) $(OPENCV_LIBS) $(NVCC_OPTS)
34 | 
35 | main.o: main.cpp timer.h utils.h HW2.cpp
36 | 	g++ -c main.cpp $(GCC_OPTS) -I $(OPENCV_INCLUDEPATH) -I $(CUDA_INCLUDEPATH)
37 | 
38 | student_func.o: student_func.cu reference_calc.cpp utils.h
39 | 	nvcc -c student_func.cu $(NVCC_OPTS)
40 | 
41 | compare.o: compare.cpp compare.h
42 | 	g++ -c compare.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
43 | 
44 | reference_calc.o: reference_calc.cpp reference_calc.h
45 | 	g++ -c reference_calc.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
46 | 
47 | clean:
 48 | 	rm -f *.o *.png HW2
49 | 


--------------------------------------------------------------------------------
/ProblemSet2-Blur/cinque_terre.gold:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet2-Blur/cinque_terre.gold


--------------------------------------------------------------------------------
/ProblemSet2-Blur/cinque_terre_blur.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet2-Blur/cinque_terre_blur.jpg


--------------------------------------------------------------------------------
/ProblemSet2-Blur/cinque_terre_small.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet2-Blur/cinque_terre_small.jpg


--------------------------------------------------------------------------------
/ProblemSet2-Blur/compare.cpp:
--------------------------------------------------------------------------------
 1 | #include <opencv2/core/core.hpp>
 2 | #include <opencv2/highgui/highgui.hpp>
 3 | #include <opencv2/opencv.hpp>
 4 | 
 5 | #include "utils.h"
 6 | 
 7 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck,
 8 | 				   double perPixelError, double globalError)
 9 | {
10 |   cv::Mat reference = cv::imread(reference_filename, -1);
11 |   cv::Mat test = cv::imread(test_filename, -1);
12 | 
 13 |   cv::Mat diff; cv::absdiff(reference, test, diff); //plain reference - test would saturate negative differences to 0
14 | 
15 |   cv::Mat diffSingleChannel = diff.reshape(1, 0); //convert to 1 channel, same # rows
16 | 
17 |   double minVal, maxVal;
18 | 
19 |   cv::minMaxLoc(diffSingleChannel, &minVal, &maxVal, NULL, NULL); //NULL because we don't care about location
20 | 
21 |   //now perform transform so that we bump values to the full range
 22 |   if (maxVal > minVal) //skip the stretch (avoid dividing by zero) when all differences are equal
 23 |     diffSingleChannel = (diffSingleChannel - minVal) * (255. / (maxVal - minVal));
24 | 
25 |   diff = diffSingleChannel.reshape(reference.channels(), 0);
26 | 
27 |   cv::imwrite("HW2_differenceImage.png", diff);
28 |   //OK, now we can start comparing values...
29 |   unsigned char *referencePtr = reference.ptr<unsigned char>(0);
30 |   unsigned char *testPtr = test.ptr<unsigned char>(0);
31 | 
32 |   if (useEpsCheck) {
33 |     checkResultsEps(referencePtr, testPtr, reference.rows * reference.cols * reference.channels(), perPixelError, globalError);
34 |   }
35 |   else
36 |   {
37 |     checkResultsExact(referencePtr, testPtr, reference.rows * reference.cols * reference.channels());
38 |   }
39 | 
40 |   std::cout << "PASS" << std::endl;
41 |   return;
42 | }
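
`compareImages` stretches the difference image linearly, (d - min) * 255 / (max - min), so that small errors become visible in `HW2_differenceImage.png`. This stretch affects only the visualization; the pass/fail check runs on the raw reference and test pixels. The sketch below applies the same stretch to a plain byte buffer and rounds to the nearest integer. It guards the max == min case (for example, identical images), where the formula would otherwise divide by zero. `stretchToFullRange` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Linearly map [min, max] of the input onto [0, 255], rounding to nearest.
// If all values are equal, return all zeros instead of dividing by zero.
std::vector<unsigned char> stretchToFullRange(const std::vector<unsigned char>& d) {
  assert(!d.empty());
  auto mm = std::minmax_element(d.begin(), d.end());
  const int lo = *mm.first;
  const int hi = *mm.second;
  std::vector<unsigned char> out(d.size(), 0);
  if (hi == lo) return out;
  const double scale = 255.0 / (hi - lo);
  for (std::size_t i = 0; i < d.size(); ++i)
    out[i] = static_cast<unsigned char>((d[i] - lo) * scale + 0.5);
  return out;
}
```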


--------------------------------------------------------------------------------
/ProblemSet2-Blur/compare.h:
--------------------------------------------------------------------------------
1 | #ifndef COMPARE_H__
2 | #define COMPARE_H__
3 | 
4 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck,
5 | 				   double perPixelError, double globalError);
6 | 
7 | #endif


--------------------------------------------------------------------------------
/ProblemSet2-Blur/main.cpp:
--------------------------------------------------------------------------------
  1 | //Udacity HW2 Driver
  2 | 
  3 | #include <iostream>
  4 | #include "timer.h"
  5 | #include "utils.h"
  6 | #include <string>
  7 | #include <stdio.h>
  8 | 
  9 | #include "reference_calc.h"
 10 | #include "compare.h"
 11 | 
 12 | //include the definitions of the above functions for this homework
 13 | #include "HW2.cpp"
 14 | 
 15 | 
 16 | /*******  DEFINED IN student_func.cu *********/
 17 | 
 18 | void your_gaussian_blur(const uchar4 * const h_inputImageRGBA, uchar4 * const d_inputImageRGBA,
 19 |                         uchar4* const d_outputImageRGBA,
 20 |                         const size_t numRows, const size_t numCols,
 21 |                         unsigned char *d_redBlurred,
 22 |                         unsigned char *d_greenBlurred,
 23 |                         unsigned char *d_blueBlurred,
 24 |                         const int filterWidth);
 25 | 
 26 | void allocateMemoryAndCopyToGPU(const size_t numRowsImage, const size_t numColsImage,
 27 |                                 const float* const h_filter, const size_t filterWidth);
 28 | 
 29 | 
 30 | /*******  Begin main *********/
 31 | 
 32 | int main(int argc, char **argv) {
 33 |   uchar4 *h_inputImageRGBA,  *d_inputImageRGBA;
 34 |   uchar4 *h_outputImageRGBA, *d_outputImageRGBA;
 35 |   unsigned char *d_redBlurred, *d_greenBlurred, *d_blueBlurred;
 36 | 
 37 |   float *h_filter;
 38 |   int    filterWidth;
 39 | 
 40 |   std::string input_file;
 41 |   std::string output_file;
 42 |   std::string reference_file;
 43 |   double perPixelError = 0.0;
 44 |   double globalError   = 0.0;
 45 |   bool useEpsCheck = false;
 46 |   switch (argc)
 47 |   {
 48 |     case 2:
 49 |       input_file = std::string(argv[1]);
 50 |       output_file = "HW2_output.png";
 51 |       reference_file = "HW2_reference.png";
 52 |       break;
 53 |     case 3:
 54 |       input_file  = std::string(argv[1]);
 55 |       output_file = std::string(argv[2]);
 56 |       reference_file = "HW2_reference.png";
 57 |       break;
 58 |     case 4:
 59 |       input_file  = std::string(argv[1]);
 60 |       output_file = std::string(argv[2]);
 61 |       reference_file = std::string(argv[3]);
 62 |       break;
 63 |     case 6:
 64 |       useEpsCheck=true;
 65 |       input_file  = std::string(argv[1]);
 66 |       output_file = std::string(argv[2]);
 67 |       reference_file = std::string(argv[3]);
 68 |       perPixelError = atof(argv[4]);
 69 |       globalError   = atof(argv[5]);
 70 |       break;
 71 |     default:
 72 |       std::cerr << "Usage: ./HW2 input_file [output_filename] [reference_filename] [perPixelError globalError]" << std::endl;
 73 |       exit(1);
 74 |   }
 75 |   //load the image and give us our input and output pointers
 76 |   preProcess(&h_inputImageRGBA, &h_outputImageRGBA, &d_inputImageRGBA, &d_outputImageRGBA,
 77 |              &d_redBlurred, &d_greenBlurred, &d_blueBlurred,
 78 |              &h_filter, &filterWidth, input_file);
 79 | 
 80 |   allocateMemoryAndCopyToGPU(numRows(), numCols(), h_filter, filterWidth);
 81 |   GpuTimer timer;
 82 |   timer.Start();
 83 |   //call the students' code
 84 |   your_gaussian_blur(h_inputImageRGBA, d_inputImageRGBA, d_outputImageRGBA, numRows(), numCols(),
 85 |                      d_redBlurred, d_greenBlurred, d_blueBlurred, filterWidth);
 86 |   timer.Stop();
 87 |   cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
 88 |   int err = printf("Your code ran in: %f msecs.\n", timer.Elapsed());
 89 | 
 90 |   if (err < 0) {
 91 |     //Couldn't print! Probably the student closed stdout - bad news
 92 |     std::cerr << "Couldn't print timing information! STDOUT Closed!" << std::endl;
 93 |     exit(1);
 94 |   }
 95 | 
 96 |   //check results and output the blurred image
 97 | 
 98 |   size_t numPixels = numRows()*numCols();
 99 |   //copy the output back to the host
100 |   checkCudaErrors(cudaMemcpy(h_outputImageRGBA, d_outputImageRGBA__, sizeof(uchar4) * numPixels, cudaMemcpyDeviceToHost));
101 | 
102 |   postProcess(output_file, h_outputImageRGBA);
103 | 
104 |   referenceCalculation(h_inputImageRGBA, h_outputImageRGBA,
105 |                        numRows(), numCols(),
106 |                        h_filter, filterWidth);
107 | 
108 |   postProcess(reference_file, h_outputImageRGBA);
109 | 
110 |     //  Cheater easy way with OpenCV
111 |     //generateReferenceImage(input_file, reference_file, filterWidth);
112 | 
113 |   compareImages(reference_file, output_file, useEpsCheck, perPixelError, globalError);
114 | 
115 |   checkCudaErrors(cudaFree(d_redBlurred));
116 |   checkCudaErrors(cudaFree(d_greenBlurred));
117 |   checkCudaErrors(cudaFree(d_blueBlurred));
118 | 
119 |   cleanUp();
120 | 
121 |   return 0;
122 | }
123 | 


--------------------------------------------------------------------------------
/ProblemSet2-Blur/reference_calc.cpp:
--------------------------------------------------------------------------------
 1 | #include <algorithm>
 2 | #include <cassert>
 3 | // for uchar4 struct
 4 | #include <cuda_runtime.h>
 5 | 
 6 | void channelConvolution(const unsigned char* const channel,
 7 |                         unsigned char* const channelBlurred,
 8 |                         const size_t numRows, const size_t numCols,
 9 |                         const float *filter, const int filterWidth)
10 | {
11 |   //Dealing with an even width filter is trickier
12 |   assert(filterWidth % 2 == 1);
13 | 
14 |   //For every pixel in the image
15 |   for (int r = 0; r < (int)numRows; ++r) {
16 |     for (int c = 0; c < (int)numCols; ++c) {
17 |       float result = 0.f;
18 |       //For every value in the filter around the pixel (c, r)
19 |       for (int filter_r = -filterWidth/2; filter_r <= filterWidth/2; ++filter_r) {
20 |         for (int filter_c = -filterWidth/2; filter_c <= filterWidth/2; ++filter_c) {
21 |           //Find the global image position for this filter position
22 |           //clamp to boundary of the image
 23 |           int image_r = std::min(std::max(r + filter_r, 0), static_cast<int>(numRows - 1));
24 |           int image_c = std::min(std::max(c + filter_c, 0), static_cast<int>(numCols - 1));
25 | 
26 |           float image_value = static_cast<float>(channel[image_r * numCols + image_c]);
27 |           float filter_value = filter[(filter_r + filterWidth/2) * filterWidth + filter_c + filterWidth/2];
28 | 
29 |           result += image_value * filter_value;
30 |         }
31 |       }
32 | 
33 |       channelBlurred[r * numCols + c] = result;
34 |     }
35 |   }
36 | }
37 | 
38 | void referenceCalculation(const uchar4* const rgbaImage, uchar4 *const outputImage,
39 |                           size_t numRows, size_t numCols,
40 |                           const float* const filter, const int filterWidth)
41 | {
42 |   unsigned char *red   = new unsigned char[numRows * numCols];
43 |   unsigned char *blue  = new unsigned char[numRows * numCols];
44 |   unsigned char *green = new unsigned char[numRows * numCols];
45 | 
46 |   unsigned char *redBlurred   = new unsigned char[numRows * numCols];
47 |   unsigned char *blueBlurred  = new unsigned char[numRows * numCols];
48 |   unsigned char *greenBlurred = new unsigned char[numRows * numCols];
49 | 
50 |   //First we separate the incoming RGBA image into three separate channels
51 |   //for Red, Green and Blue
52 |   for (size_t i = 0; i < numRows * numCols; ++i) {
53 |     uchar4 rgba = rgbaImage[i];
54 |     red[i]   = rgba.x;
55 |     green[i] = rgba.y;
56 |     blue[i]  = rgba.z;
57 |   }
58 | 
59 |   //Now we can do the convolution for each of the color channels
60 |   channelConvolution(red, redBlurred, numRows, numCols, filter, filterWidth);
61 |   channelConvolution(green, greenBlurred, numRows, numCols, filter, filterWidth);
62 |   channelConvolution(blue, blueBlurred, numRows, numCols, filter, filterWidth);
63 | 
64 |   //now recombine into the output image - Alpha is 255 for no transparency
65 |   for (size_t i = 0; i < numRows * numCols; ++i) {
66 |     uchar4 rgba = make_uchar4(redBlurred[i], greenBlurred[i], blueBlurred[i], 255);
67 |     outputImage[i] = rgba;
68 |   }
69 | 
70 |   delete[] red;
71 |   delete[] green;
72 |   delete[] blue;
73 | 
74 |   delete[] redBlurred;
75 |   delete[] greenBlurred;
76 |   delete[] blueBlurred;
77 | }
78 | 
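
`channelConvolution` handles pixels near the border by clamping each sampled coordinate into the image (clamp-to-edge), so the filter never reads outside the buffer. The sketch below repeats that per-channel loop on `std::vector` buffers and truncates the result the same way the reference does. The function name `convolveClamped` is hypothetical. The checks afterwards use three small cases:

- a normalized filter leaves a constant image unchanged;
- a 1x1 identity filter reproduces the input;
- on a one-row image, the clamped sample at the right edge reuses the last pixel.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Convolve one 8-bit channel with an odd-width, row-major filter,
// clamping sample coordinates to the image border (as in channelConvolution).
std::vector<unsigned char> convolveClamped(const std::vector<unsigned char>& channel,
                                           int numRows, int numCols,
                                           const std::vector<float>& filter, int filterWidth) {
  assert(filterWidth % 2 == 1);
  assert(static_cast<int>(channel.size()) == numRows * numCols);
  assert(static_cast<int>(filter.size()) == filterWidth * filterWidth);
  const int half = filterWidth / 2;
  std::vector<unsigned char> out(channel.size());
  for (int r = 0; r < numRows; ++r) {
    for (int c = 0; c < numCols; ++c) {
      float result = 0.f;
      for (int fr = -half; fr <= half; ++fr) {
        for (int fc = -half; fc <= half; ++fc) {
          // Clamp the sampled position to the valid image range.
          int ir = std::min(std::max(r + fr, 0), numRows - 1);
          int ic = std::min(std::max(c + fc, 0), numCols - 1);
          result += channel[ir * numCols + ic] * filter[(fr + half) * filterWidth + (fc + half)];
        }
      }
      out[r * numCols + c] = static_cast<unsigned char>(result);  // truncates like the reference
    }
  }
  return out;
}
```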


--------------------------------------------------------------------------------
/ProblemSet2-Blur/reference_calc.h:
--------------------------------------------------------------------------------
1 | #ifndef REFERENCE_H__
2 | #define REFERENCE_H__
3 | 
4 | void referenceCalculation(const uchar4* const rgbaImage, uchar4 *const outputImage,
5 |                           size_t numRows, size_t numCols,
6 |                           const float* const filter, const int filterWidth);
7 | 
8 | #endif


--------------------------------------------------------------------------------
/ProblemSet2-Blur/timer.h:
--------------------------------------------------------------------------------
 1 | #ifndef GPU_TIMER_H__
 2 | #define GPU_TIMER_H__
 3 | 
 4 | #include <cuda_runtime.h>
 5 | 
 6 | struct GpuTimer
 7 | {
 8 |   cudaEvent_t start;
 9 |   cudaEvent_t stop;
10 | 
11 |   GpuTimer()
12 |   {
13 |     cudaEventCreate(&start);
14 |     cudaEventCreate(&stop);
15 |   }
16 | 
17 |   ~GpuTimer()
18 |   {
19 |     cudaEventDestroy(start);
20 |     cudaEventDestroy(stop);
21 |   }
22 | 
23 |   void Start()
24 |   {
25 |     cudaEventRecord(start, 0);
26 |   }
27 | 
28 |   void Stop()
29 |   {
30 |     cudaEventRecord(stop, 0);
31 |   }
32 | 
33 |   float Elapsed()
34 |   {
35 |     float elapsed;
36 |     cudaEventSynchronize(stop);
37 |     cudaEventElapsedTime(&elapsed, start, stop);
38 |     return elapsed;
39 |   }
40 | };
41 | 
42 | #endif  /* GPU_TIMER_H__ */
43 | 

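`GpuTimer` records a CUDA event on the default stream (stream `0`) in `Start()` and `Stop()`, and `Elapsed()` waits on `stop` with `cudaEventSynchronize` before `cudaEventElapsedTime` reports the interval in milliseconds. For timing host-only code such as a CPU reference calculation, a timer with the same interface can be built on `std::chrono::steady_clock`. The `CpuTimer` below is a hypothetical sketch, not part of this repository; it measures wall-clock time only, so any kernel launched in between must be synchronised before `Stop()` to be counted.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical host-side counterpart to GpuTimer with the same interface.
// It knows nothing about the GPU: asynchronous kernel launches must be
// synchronised before Stop() if their runtime should be included.
struct CpuTimer {
  std::chrono::steady_clock::time_point start;
  std::chrono::steady_clock::time_point stop;

  void Start() { start = std::chrono::steady_clock::now(); }
  void Stop() { stop = std::chrono::steady_clock::now(); }

  // Milliseconds between Start() and Stop(), matching GpuTimer's unit.
  float Elapsed() const {
    assert(stop >= start);  // Stop() must follow Start()
    std::chrono::duration<float, std::milli> ms = stop - start;
    return ms.count();
  }
};
```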

--------------------------------------------------------------------------------
/ProblemSet2-Blur/utils.h:
--------------------------------------------------------------------------------
 1 | #ifndef UTILS_H__
 2 | #define UTILS_H__
 3 | 
 4 | #include <iostream>
 5 | #include <iomanip>
 6 | #include <cuda.h>
 7 | #include <cuda_runtime.h>
 8 | #include <cuda_runtime_api.h>
 9 | #include <cassert>
10 | #include <algorithm>
11 | #include <cstdlib>   //for exit
12 | #define checkCudaErrors(val) check( (val), #val, __FILE__, __LINE__)
13 | 
14 | template<typename T>
15 | void check(T err, const char* const func, const char* const file, const int line) {
16 |   if (err != cudaSuccess) {
17 |     std::cerr << "CUDA error at: " << file << ":" << line << std::endl;
18 |     std::cerr << cudaGetErrorString(err) << " " << func << std::endl;
19 |     exit(1);
20 |   }
21 | }
22 | 
23 | template<typename T>
24 | void checkResultsExact(const T* const ref, const T* const gpu, size_t numElem) {
25 |   //check that the GPU result matches the CPU result
26 |   for (size_t i = 0; i < numElem; ++i) {
27 |     if (ref[i] != gpu[i]) {
28 |       std::cerr << "Difference at pos " << i << std::endl;
29 |       //the + is magic to convert char to int without messing
30 |       //with other types
31 |       std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
32 |                  "\nGPU      : " << +gpu[i] << std::endl;
33 |       exit(1);
34 |     }
35 |   }
36 | }
37 | 
38 | template<typename T>
39 | void checkResultsEps(const T* const ref, const T* const gpu, size_t numElem, double eps1, double eps2) {
40 |   assert(eps1 >= 0 && eps2 >= 0);
41 |   unsigned long long totalDiff = 0;
42 |   unsigned numSmallDifferences = 0;
43 |   for (size_t i = 0; i < numElem; ++i) {
44 |     //subtract smaller from larger in case of unsigned types
45 |     T smaller = std::min(ref[i], gpu[i]);
46 |     T larger = std::max(ref[i], gpu[i]);
47 |     T diff = larger - smaller;
48 |     if (diff > 0 && diff <= eps1) {
49 |       numSmallDifferences++;
50 |     }
51 |     else if (diff > eps1) {
52 |       std::cerr << "Difference at pos " << +i << " exceeds tolerance of " << eps1 << std::endl;
53 |       std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
54 |         "\nGPU      : " << +gpu[i] << std::endl;
55 |       exit(1);
56 |     }
57 |     totalDiff += diff * diff;
58 |   }
59 |   double fractionSmallDifferences = (double)numSmallDifferences / (double)numElem;
60 |   if (fractionSmallDifferences > eps2) {
61 |     std::cerr << "Percentage of pixels with a non-zero difference exceeds " << 100.0 * eps2 << "%" << std::endl;
62 |     std::cerr << "Percentage of non-zero pixel differences: " << 100.0 * fractionSmallDifferences << "%" << std::endl;
63 |     exit(1);
64 |   }
65 | }
66 | 
67 | //Uses the Autodesk method of image comparison
68 | //Note that the tolerance here is an absolute number of PIXELS, not a percentage of input pixels
69 | template<typename T>
70 | void checkResultsAutodesk(const T* const ref, const T* const gpu, size_t numElem, double variance, size_t tolerance)
71 | {
72 | 
73 |   size_t numBadPixels = 0;
74 |   for (size_t i = 0; i < numElem; ++i) {
75 |     T smaller = std::min(ref[i], gpu[i]);
76 |     T larger = std::max(ref[i], gpu[i]);
77 |     T diff = larger - smaller;
78 |     if (diff > variance)
79 |       ++numBadPixels;
80 |   }
81 | 
82 |   if (numBadPixels > tolerance) {
83 |     std::cerr << "Too many bad pixels in the image: " << numBadPixels << "/" << tolerance << std::endl;
84 |     exit(1);
85 |   }
86 | }
87 | 
88 | #endif
89 | 

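`checkResultsEps` applies two thresholds: any element whose absolute difference exceeds `eps1` fails immediately, and the share of elements with a smaller but non-zero difference must not exceed `eps2`, which is a fraction even though the error message prints it as a percentage. The sketch below restates that logic for 8-bit data as a function that returns `bool` instead of calling `exit`, which makes it easy to exercise in isolation; `epsCheckPasses` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-exiting restatement of checkResultsEps for 8-bit data.
// eps1: largest allowed per-element absolute difference.
// eps2: largest allowed fraction (0..1) of elements that differ at all.
bool epsCheckPasses(const unsigned char* ref, const unsigned char* gpu,
                    std::size_t numElem, double eps1, double eps2) {
  assert(eps1 >= 0 && eps2 >= 0);
  std::size_t numSmallDifferences = 0;
  for (std::size_t i = 0; i < numElem; ++i) {
    // Subtract the smaller from the larger so unsigned values never wrap.
    int diff = ref[i] > gpu[i] ? ref[i] - gpu[i] : gpu[i] - ref[i];
    if (diff > eps1) return false;        // one large error fails outright
    if (diff > 0) ++numSmallDifferences;  // small but non-zero difference
  }
  double fraction =
      numElem ? static_cast<double>(numSmallDifferences) / numElem : 0.0;
  return fraction <= eps2;
}
```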

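`checkResultsAutodesk` only counts: elements whose difference is greater than `variance` are bad, and the check fails when their count exceeds `tolerance`, an absolute number rather than a fraction. The comment calls these pixels, but the loop runs over whatever elements it is given, so for interleaved image data each channel value counts separately. A minimal `bool`-returning restatement, again under a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-exiting restatement of checkResultsAutodesk.
// variance:  per-element difference that is still acceptable.
// tolerance: maximum number of elements allowed to exceed `variance`.
bool autodeskCheckPasses(const unsigned char* ref, const unsigned char* gpu,
                         std::size_t numElem, double variance,
                         std::size_t tolerance) {
  assert(variance >= 0);
  std::size_t numBadElements = 0;
  for (std::size_t i = 0; i < numElem; ++i) {
    int diff = ref[i] > gpu[i] ? ref[i] - gpu[i] : gpu[i] - ref[i];
    if (diff > variance) ++numBadElements;
  }
  return numBadElements <= tolerance;
}
```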
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/CMakeLists.txt:
--------------------------------------------------------------------------------
 1 | ############################################################################
 2 | # <summary> CMakeLists.txt for OpenCV and CUDA. </summary>
 3 | # <date>    2012-02-07          </date>
 4 | # <author>  Quan Tran Minh. edited by Johannes Kast, Michael Sarahan </author>
 5 | # <email>   quantm@unist.ac.kr  kast.jo@googlemail.com msarahan@gmail.com</email>
 6 | ############################################################################
 7 | # minimum required cmake version
 8 | cmake_minimum_required(VERSION 2.8)
 9 | find_package(CUDA QUIET REQUIRED)
10 | find_package(OpenCV REQUIRED)
11 | include_directories(${OpenCV_INCLUDE_DIRS})
12 | 
13 | SET (compare_files compare.cpp)
14 | 
15 | file( GLOB  hdr *.hpp *.h )
16 | file( GLOB  cu  *.cu)
17 | SET (HW3_files main.cpp loadSaveImage.cpp reference_calc.cpp compare.cpp)
18 | 
19 | CUDA_ADD_EXECUTABLE(HW3 ${HW3_files} ${hdr} ${cu})
20 | target_link_libraries(HW3 ${OpenCV_LIBS})
21 | 


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/HDR-image.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/HDR-image.jpg


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/HDR-image_mapped.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/HDR-image_mapped.png


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/HW3_differenceImage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/HW3_differenceImage.png


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/HW3_reference.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/HW3_reference.png


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/HW3_reference_old.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/HW3_reference_old.png


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/Makefile:
--------------------------------------------------------------------------------
 1 | NVCC=nvcc
 2 | 
 3 | ###################################
 4 | # These are the default install   #
 5 | # locations on most linux distros #
 6 | ###################################
 7 | 
 8 | OPENCV_LIBPATH=/usr/lib
 9 | OPENCV_INCLUDEPATH=/usr/include
10 | 
11 | ###################################################
12 | # On Macs the default install locations are below #
13 | ###################################################
14 | 
15 | #OPENCV_LIBPATH=/usr/local/lib
16 | #OPENCV_INCLUDEPATH=/usr/local/include
17 | 
18 | OPENCV_LIBS=-lopencv_core -lopencv_imgproc -lopencv_highgui
19 | 
20 | CUDA_INCLUDEPATH=/usr/local/cuda-5.0/include
21 | 
22 | ######################################################
23 | # On Macs the default install locations are below    #
24 | ######################################################
25 | 
26 | #CUDA_INCLUDEPATH=/usr/local/cuda/include
27 | #CUDA_LIBPATH=/usr/local/cuda/lib
28 | 
29 | NVCC_OPTS=-O3 -arch=sm_20 -Xcompiler -Wall -Xcompiler -Wextra -m64
30 | 
31 | GCC_OPTS=-O3 -Wall -Wextra -m64
32 | 
33 | student: main.o student_func.o HW3.o loadSaveImage.o compare.o reference_calc.o Makefile
34 | 	$(NVCC) -o HW3 main.o student_func.o HW3.o loadSaveImage.o compare.o reference_calc.o -L $(OPENCV_LIBPATH) $(OPENCV_LIBS) $(NVCC_OPTS)
35 | 
36 | main.o: main.cpp timer.h utils.h reference_calc.h compare.h
37 | 	g++ -c main.cpp $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
38 | 
39 | HW3.o: HW3.cu loadSaveImage.h utils.h
40 | 	$(NVCC) -c HW3.cu -I $(OPENCV_INCLUDEPATH) $(NVCC_OPTS)
41 | 
42 | loadSaveImage.o: loadSaveImage.cpp loadSaveImage.h
43 | 	g++ -c loadSaveImage.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
44 | 
45 | compare.o: compare.cpp compare.h
46 | 	g++ -c compare.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
47 | 
48 | reference_calc.o: reference_calc.cpp reference_calc.h
49 | 	g++ -c reference_calc.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
50 | 
51 | student_func.o: student_func.cu utils.h
52 | 	$(NVCC) -c student_func.cu $(NVCC_OPTS)
53 | 
54 | clean:
55 | 	rm -f *.o HW3
56 | 	find . -type f -name '*.exr' | grep -v memorial | xargs rm -f
57 | 


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/ProblemSet3-ToneMapping.vcxproj:
--------------------------------------------------------------------------------
  1 | <?xml version="1.0" encoding="utf-8"?>
  2 | <Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  3 |   <ItemGroup Label="ProjectConfigurations">
  4 |     <ProjectConfiguration Include="Debug|Win32">
  5 |       <Configuration>Debug</Configuration>
  6 |       <Platform>Win32</Platform>
  7 |     </ProjectConfiguration>
  8 |     <ProjectConfiguration Include="Debug|x64">
  9 |       <Configuration>Debug</Configuration>
 10 |       <Platform>x64</Platform>
 11 |     </ProjectConfiguration>
 12 |     <ProjectConfiguration Include="Release|Win32">
 13 |       <Configuration>Release</Configuration>
 14 |       <Platform>Win32</Platform>
 15 |     </ProjectConfiguration>
 16 |     <ProjectConfiguration Include="Release|x64">
 17 |       <Configuration>Release</Configuration>
 18 |       <Platform>x64</Platform>
 19 |     </ProjectConfiguration>
 20 |   </ItemGroup>
 21 |   <ItemGroup>
 22 |     <ClCompile Include="compare.cpp" />
 23 |     <ClCompile Include="loadSaveImage.cpp" />
 24 |     <ClCompile Include="main.cpp" />
 25 |     <ClCompile Include="reference_calc.cpp" />
 26 |   </ItemGroup>
 27 |   <ItemGroup>
 28 |     <ClInclude Include="compare.h" />
 29 |     <ClInclude Include="loadSaveImage.h" />
 30 |     <ClInclude Include="reference_calc.h" />
 31 |     <ClInclude Include="timer.h" />
 32 |     <ClInclude Include="utils.h" />
 33 |   </ItemGroup>
 34 |   <ItemGroup>
 35 |     <CudaCompile Include="HW3.cu" />
 36 |     <CudaCompile Include="student_func.cu" />
 37 |   </ItemGroup>
 38 |   <PropertyGroup Label="Globals">
 39 |     <ProjectGuid>{EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}</ProjectGuid>
 40 |     <RootNamespace>ProblemSet3_ToneMapping</RootNamespace>
 41 |   </PropertyGroup>
 42 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
 43 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
 44 |     <ConfigurationType>Application</ConfigurationType>
 45 |     <UseDebugLibraries>true</UseDebugLibraries>
 46 |     <CharacterSet>MultiByte</CharacterSet>
 47 |     <PlatformToolset>v140</PlatformToolset>
 48 |   </PropertyGroup>
 49 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
 50 |     <ConfigurationType>Application</ConfigurationType>
 51 |     <UseDebugLibraries>true</UseDebugLibraries>
 52 |     <CharacterSet>MultiByte</CharacterSet>
 53 |     <PlatformToolset>v140</PlatformToolset>
 54 |   </PropertyGroup>
 55 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
 56 |     <ConfigurationType>Application</ConfigurationType>
 57 |     <UseDebugLibraries>false</UseDebugLibraries>
 58 |     <WholeProgramOptimization>true</WholeProgramOptimization>
 59 |     <CharacterSet>MultiByte</CharacterSet>
 60 |     <PlatformToolset>v140</PlatformToolset>
 61 |   </PropertyGroup>
 62 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
 63 |     <ConfigurationType>Application</ConfigurationType>
 64 |     <UseDebugLibraries>false</UseDebugLibraries>
 65 |     <WholeProgramOptimization>true</WholeProgramOptimization>
 66 |     <CharacterSet>MultiByte</CharacterSet>
 67 |     <PlatformToolset>v140</PlatformToolset>
 68 |   </PropertyGroup>
 69 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
 70 |   <ImportGroup Label="ExtensionSettings">
 71 |     <Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 8.0.props" />
 72 |   </ImportGroup>
 73 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
 74 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 75 |   </ImportGroup>
 76 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
 77 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 78 |   </ImportGroup>
 79 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
 80 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 81 |   </ImportGroup>
 82 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
 83 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 84 |   </ImportGroup>
 85 |   <PropertyGroup Label="UserMacros" />
 86 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
 87 |     <LinkIncremental>true</LinkIncremental>
 88 |   </PropertyGroup>
 89 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
 90 |     <LinkIncremental>true</LinkIncremental>
 91 |   </PropertyGroup>
 92 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
 93 |     <ClCompile>
 94 |       <WarningLevel>Level3</WarningLevel>
 95 |       <Optimization>Disabled</Optimization>
 96 |       <PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
 97 |     </ClCompile>
 98 |     <Link>
 99 |       <GenerateDebugInformation>true</GenerateDebugInformation>
100 |       <SubSystem>Console</SubSystem>
101 |       <AdditionalDependencies>cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)</AdditionalDependencies>
102 |     </Link>
103 |     <PostBuildEvent>
104 |       <Command>echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
105 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"</Command>
106 |     </PostBuildEvent>
107 |   </ItemDefinitionGroup>
108 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
109 |     <ClCompile>
110 |       <WarningLevel>Level3</WarningLevel>
111 |       <Optimization>Disabled</Optimization>
112 |       <PreprocessorDefinitions>WIN32;WIN64;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
113 |     </ClCompile>
114 |     <Link>
115 |       <GenerateDebugInformation>true</GenerateDebugInformation>
116 |       <SubSystem>Console</SubSystem>
117 |       <AdditionalDependencies>cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)</AdditionalDependencies>
118 |     </Link>
119 |     <PostBuildEvent>
120 |       <Command>echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
121 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"</Command>
122 |     </PostBuildEvent>
123 |     <CudaCompile>
124 |       <TargetMachinePlatform>64</TargetMachinePlatform>
125 |     </CudaCompile>
126 |   </ItemDefinitionGroup>
127 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
128 |     <ClCompile>
129 |       <WarningLevel>Level3</WarningLevel>
130 |       <Optimization>MaxSpeed</Optimization>
131 |       <FunctionLevelLinking>true</FunctionLevelLinking>
132 |       <IntrinsicFunctions>true</IntrinsicFunctions>
133 |       <PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
134 |     </ClCompile>
135 |     <Link>
136 |       <GenerateDebugInformation>true</GenerateDebugInformation>
137 |       <EnableCOMDATFolding>true</EnableCOMDATFolding>
138 |       <OptimizeReferences>true</OptimizeReferences>
139 |       <SubSystem>Console</SubSystem>
140 |       <AdditionalDependencies>cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)</AdditionalDependencies>
141 |     </Link>
142 |     <PostBuildEvent>
143 |       <Command>echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
144 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"</Command>
145 |     </PostBuildEvent>
146 |   </ItemDefinitionGroup>
147 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
148 |     <ClCompile>
149 |       <WarningLevel>Level3</WarningLevel>
150 |       <Optimization>MaxSpeed</Optimization>
151 |       <FunctionLevelLinking>true</FunctionLevelLinking>
152 |       <IntrinsicFunctions>true</IntrinsicFunctions>
153 |       <PreprocessorDefinitions>WIN32;WIN64;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
154 |       <AdditionalIncludeDirectories>%(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);C:\opencv\build\include;C:\opencv\build\include\opencv2</AdditionalIncludeDirectories>
155 |     </ClCompile>
156 |     <Link>
157 |       <GenerateDebugInformation>true</GenerateDebugInformation>
158 |       <EnableCOMDATFolding>true</EnableCOMDATFolding>
159 |       <OptimizeReferences>true</OptimizeReferences>
160 |       <SubSystem>Console</SubSystem>
161 |       <AdditionalDependencies>cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;opencv_world320.lib;%(AdditionalDependencies)</AdditionalDependencies>
162 |       <AdditionalLibraryDirectories>%(AdditionalLibraryDirectories);$(CudaToolkitLibDir);C:\opencv\build\x64\vc14\lib</AdditionalLibraryDirectories>
163 |     </Link>
164 |     <PostBuildEvent>
165 |       <Command>echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
166 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"</Command>
167 |     </PostBuildEvent>
168 |     <CudaCompile>
169 |       <TargetMachinePlatform>64</TargetMachinePlatform>
170 |       <CodeGeneration>compute_61,sm_61</CodeGeneration>
171 |     </CudaCompile>
172 |   </ItemDefinitionGroup>
173 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
174 |   <ImportGroup Label="ExtensionTargets">
175 |     <Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 8.0.targets" />
176 |   </ImportGroup>
177 | </Project>


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/compare.cpp:
--------------------------------------------------------------------------------
 1 | #include <opencv2/opencv.hpp>
 2 | #include <cstdlib>
 3 | #include <iostream>
 4 | #include <string>
 5 | #include "utils.h"
 6 | 
 7 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck,
 8 |                    double perPixelError, double globalError)
 9 | {
10 |   cv::Mat reference = cv::imread(reference_filename, -1);
11 |   cv::Mat test = cv::imread(test_filename, -1);
12 | 
13 |   if (reference.empty() || test.empty()) {
14 |     std::cerr << "Couldn't open " << (reference.empty() ? reference_filename : test_filename) << std::endl;
15 |     exit(1);
16 |   }
17 | 
18 |   //the pointer comparisons below assume both images have identical layouts
19 |   if (reference.size() != test.size() || reference.type() != test.type()) {
20 |     std::cerr << "Reference and test images differ in size or type!" << std::endl;
21 |     exit(1);
22 |   }
23 | 
24 |   //for two Mats OpenCV evaluates abs(A - B) as cv::absdiff, so 8-bit values don't saturate
25 |   cv::Mat diff = abs(reference - test);
26 | 
27 |   cv::Mat diffSingleChannel = diff.reshape(1, 0); //convert to 1 channel, same # rows
28 | 
29 |   double minVal, maxVal;
30 | 
31 |   cv::minMaxLoc(diffSingleChannel, &minVal, &maxVal, NULL, NULL); //NULL because we don't care about location
32 | 
33 |   //now stretch the values to the full 0..255 range; skip this when the images
34 |   //are identical, since maxVal - minVal would then be zero
35 |   if (maxVal > minVal) {
36 |     diffSingleChannel = (diffSingleChannel - minVal) * (255. / (maxVal - minVal));
37 |   }
38 | 
39 |   diff = diffSingleChannel.reshape(reference.channels(), 0);
40 | 
41 |   cv::imwrite("HW3_differenceImage.png", diff);
42 |   //OK, now we can start comparing values...
43 |   unsigned char *referencePtr = reference.ptr<unsigned char>(0);
44 |   unsigned char *testPtr = test.ptr<unsigned char>(0);
45 |   const size_t numElem = reference.rows * reference.cols * reference.channels();
46 | 
47 |   if (useEpsCheck) {
48 |     checkResultsEps(referencePtr, testPtr, numElem, perPixelError, globalError);
49 |   }
50 |   else
51 |   {
52 |     checkResultsExact(referencePtr, testPtr, numElem);
53 |   }
54 | 
55 |   std::cout << "PASS" << std::endl;
56 |   return;
57 | }
58 | 

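The difference image written by `compareImages` is a per-channel absolute difference followed by a linear stretch of the observed range onto 0..255. For two `cv::Mat` operands OpenCV expands `abs(A - B)` to `cv::absdiff`, so 8-bit values do not saturate at zero. The dependency-free sketch below performs the same arithmetic on a flat byte buffer. `stretchedAbsDiff` is a hypothetical helper, and returning an all-zero image when the range is empty (identical inputs) is an assumption of this sketch rather than behaviour taken from the original.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper mirroring the difference-image arithmetic:
// |ref - test| per element, then a linear stretch of [min, max] onto [0, 255].
std::vector<unsigned char> stretchedAbsDiff(const std::vector<unsigned char>& ref,
                                            const std::vector<unsigned char>& test) {
  assert(ref.size() == test.size());
  std::vector<unsigned char> diff(ref.size());
  for (std::size_t i = 0; i < ref.size(); ++i)
    diff[i] = static_cast<unsigned char>(ref[i] > test[i] ? ref[i] - test[i]
                                                          : test[i] - ref[i]);
  if (diff.empty()) return diff;

  auto mm = std::minmax_element(diff.begin(), diff.end());
  int lo = *mm.first, hi = *mm.second;
  // Identical images give a zero range: return zeros instead of dividing by 0.
  if (hi == lo) return std::vector<unsigned char>(diff.size(), 0);

  double scale = 255.0 / (hi - lo);
  for (auto& d : diff)
    d = static_cast<unsigned char>((d - lo) * scale + 0.5);  // round to nearest
  return diff;
}
```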

--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/compare.h:
--------------------------------------------------------------------------------
 1 | #ifndef COMPARE_H__
 2 | #define COMPARE_H__
 3 | 
 4 | #include <string>
 5 | 
 6 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck,
 7 |                    double perPixelError, double globalError);
 8 | 
 9 | #endif
10 | 


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/input.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/input.png


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/loadSaveImage.cpp:
--------------------------------------------------------------------------------
  1 | #include <opencv2/core/core.hpp>
  2 | #include <opencv2/highgui/highgui.hpp>
  3 | #include <opencv2/opencv.hpp>
  4 | #include <vector>
  5 | #include <stdio.h>
  6 | #include "cuda_runtime.h"
  7 | 
  8 | //The caller becomes responsible for the returned pointer. This
  9 | //is done in the interest of keeping this code as simple as possible.
 10 | //In production code this is a bad idea - we should use RAII
 11 | //to ensure the memory is freed.  DO NOT COPY THIS AND USE IN PRODUCTION
 12 | //CODE!!!
 13 | void loadImageHDR(const std::string &filename,
 14 |                   float **imagePtr,
 15 |                   size_t *numRows, size_t *numCols)
 16 | {
 17 |   cv::Mat originImg = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR | CV_LOAD_IMAGE_ANYDEPTH);
 18 | 
 19 |   if (originImg.empty()) {
 20 |     std::cerr << "Couldn't open file: " << filename << std::endl;
 21 |     exit(1);
 22 |   }
 23 | 
 24 |   //the caller receives floats, so convert to 32-bit float if needed
 25 |   cv::Mat image;
 26 |   if (originImg.type() != CV_32FC3) {
 27 |     originImg.convertTo(image, CV_32FC3);
 28 |   } else {
 29 |     image = originImg;
 30 |   }
 31 | 
 32 |   if (image.channels() != 3) {
 33 |     std::cerr << "Image must be color!" << std::endl;
 34 |     exit(1);
 35 |   }
 36 | 
 37 |   if (!image.isContinuous()) {
 38 |     std::cerr << "Image isn't continuous!" << std::endl;
 39 |     exit(1);
 40 |   }
 41 | 
 42 |   const size_t numElem = (size_t)image.rows * image.cols * image.channels();
 43 |   *imagePtr = new float[numElem];
 44 |   float *cvPtr = image.ptr<float>(0);
 45 |   for (size_t i = 0; i < numElem; ++i)
 46 |     (*imagePtr)[i] = cvPtr[i];
 47 | 
 48 |   *numRows = image.rows;
 49 |   *numCols = image.cols;
 50 | }
 51 | 
 52 | void loadImageRGBA(const std::string &filename,
 53 |                    uchar4 **imagePtr,
 54 |                    size_t *numRows, size_t *numCols)
 55 | {
 56 |   cv::Mat image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR);
 57 |   if (image.empty()) {
 58 |     std::cerr << "Couldn't open file: " << filename << std::endl;
 59 |     exit(1);
 60 |   }
 61 | 
 62 |   if (image.channels() != 3) {
 63 |     std::cerr << "Image must be color!" << std::endl;
 64 |     exit(1);
 65 |   }
 66 | 
 67 |   if (!image.isContinuous()) {
 68 |     std::cerr << "Image isn't continuous!" << std::endl;
 69 |     exit(1);
 70 |   }
 71 | 
 72 |   cv::Mat imageRGBA;
 73 |   cv::cvtColor(image, imageRGBA, CV_BGR2RGBA);
 74 | 
 75 |   *imagePtr = new uchar4[image.rows * image.cols];
 76 | 
 77 |   unsigned char *cvPtr = imageRGBA.ptr<unsigned char>(0);
 78 |   for (size_t i = 0; i < (size_t)image.rows * image.cols; ++i) {
 79 |     (*imagePtr)[i].x = cvPtr[4 * i + 0];
 80 |     (*imagePtr)[i].y = cvPtr[4 * i + 1];
 81 |     (*imagePtr)[i].z = cvPtr[4 * i + 2];
 82 |     (*imagePtr)[i].w = cvPtr[4 * i + 3];
 83 |   }
 84 | 
 85 |   *numRows = image.rows;
 86 |   *numCols = image.cols;
 87 | }
 88 | 
 89 | void saveImageRGBA(const uchar4* const image,
 90 |                    const size_t numRows, const size_t numCols,
 91 |                    const std::string &output_file)
 92 | {
 93 |   int sizes[2];
 94 |   sizes[0] = numRows;
 95 |   sizes[1] = numCols;
 96 |   cv::Mat imageRGBA(2, sizes, CV_8UC4, (void *)image);
 97 |   cv::Mat imageOutputBGR;
 98 |   cv::cvtColor(imageRGBA, imageOutputBGR, CV_RGBA2BGR);
 99 |   //output the image
100 |   cv::imwrite(output_file.c_str(), imageOutputBGR);
101 | }
102 | 
103 | //output an exr file
104 | //assumed to already be BGR
105 | void saveImageHDR(const float* const image,
106 |                   const size_t numRows, const size_t numCols,
107 |                   const std::string &output_file)
108 | {
109 |   int sizes[2];
110 |   sizes[0] = numRows;
111 |   sizes[1] = numCols;
112 | 
113 |   cv::Mat imageHDR(2, sizes, CV_32FC3, (void *)image);
114 | 
115 |   cv::Mat scaled = imageHDR * 255; //new buffer, so the caller's const data isn't overwritten
116 | 
117 |   cv::imwrite(output_file.c_str(), scaled);
118 | }
119 | 

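`loadImageRGBA` depends on OpenCV's BGR channel order: `cv::cvtColor` with `CV_BGR2RGBA` swaps red and blue and appends an alpha channel (255 for 8-bit images), and the loop then copies each 4-byte group into a `uchar4`. The same repacking on a raw interleaved BGR buffer is sketched below; `Rgba8` is a hypothetical stand-in for CUDA's `uchar4` so that the example builds without the CUDA headers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's uchar4 (x = R, y = G, z = B, w = A).
struct Rgba8 {
  unsigned char x, y, z, w;
};

// Repack an interleaved BGR byte buffer (OpenCV's default channel order)
// into RGBA structs with an opaque alpha channel.
std::vector<Rgba8> bgrToRgba(const std::vector<unsigned char>& bgr) {
  assert(bgr.size() % 3 == 0);
  std::vector<Rgba8> out(bgr.size() / 3);
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i].x = bgr[3 * i + 2];  // red
    out[i].y = bgr[3 * i + 1];  // green
    out[i].z = bgr[3 * i + 0];  // blue
    out[i].w = 255;             // fully opaque
  }
  return out;
}
```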

--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/loadSaveImage.h:
--------------------------------------------------------------------------------
 1 | #ifndef LOADSAVEIMAGE_H__
 2 | #define LOADSAVEIMAGE_H__
 3 | 
 4 | #include <string>
 5 | #include <cuda_runtime.h> //for uchar4
 6 | 
 7 | void loadImageHDR(const std::string &filename,
 8 |                   float **imagePtr,
 9 |                   size_t *numRows, size_t *numCols);
10 | 
11 | void loadImageRGBA(const std::string &filename,
12 |                    uchar4 **imagePtr,
13 |                    size_t *numRows, size_t *numCols);
14 | 
15 | void saveImageRGBA(const uchar4* const image,
16 |                    const size_t numRows, const size_t numCols,
17 |                    const std::string &output_file);
18 | 
19 | void saveImageHDR(const float* const image,
20 |                   const size_t numRows, const size_t numCols,
21 |                   const std::string &output_file);
22 | 
23 | #endif
24 | 


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/main.cpp:
--------------------------------------------------------------------------------
  1 | //Udacity HW3 Driver
  2 | 
  3 | #include <iostream>
  4 | #include "timer.h"
  5 | #include "utils.h"
  6 | #include <string>
  7 | #include <stdio.h>
  8 | #include <algorithm>
  9 | 
 10 | #include "compare.h"
 11 | #include "reference_calc.h"
 12 | 
 13 | // Functions from HW3.cu
 14 | void preProcess(float **d_luminance, unsigned int **d_cdf,
 15 |                 size_t *numRows, size_t *numCols, unsigned int *numBins,
 16 |                 const std::string& filename);
 17 | 
 18 | void postProcess(const std::string& output_file, size_t numRows, size_t numCols,
 19 |                  float min_logLum, float max_logLum);
 20 | 
 21 | void cleanupGlobalMemory(void);
 22 | 
 23 | // Function from student_func.cu
 24 | void your_histogram_and_prefixsum(const float* const d_luminance,
 25 |                                   unsigned int* const d_cdf,
 26 |                                   float &min_logLum,
 27 |                                   float &max_logLum,
 28 |                                   const size_t numRows,
 29 |                                   const size_t numCols,
 30 |                                   const size_t numBins);
 31 | 
 32 | 
 33 | int main(int argc, char **argv) {
 34 |   float *d_luminance;
 35 |   unsigned int *d_cdf;
 36 | 
 37 |   size_t numRows, numCols;
 38 |   unsigned int numBins;
 39 | 
 40 |   std::string input_file;
 41 |   std::string output_file;
 42 |   std::string reference_file;
 43 |   double perPixelError = 0.0;
 44 |   double globalError   = 0.0;
 45 |   bool useEpsCheck = false;
 46 | 
 47 |   switch (argc)
 48 |   {
 49 |     case 2:
 50 |       input_file = std::string(argv[1]);
 51 |       output_file = "HW3_output.png";
 52 |       reference_file = "HW3_reference.png";
 53 |       break;
 54 |     case 3:
 55 |       input_file  = std::string(argv[1]);
 56 |       output_file = std::string(argv[2]);
 57 |       reference_file = "HW3_reference.png";
 58 |       break;
 59 |     case 4:
 60 |       input_file  = std::string(argv[1]);
 61 |       output_file = std::string(argv[2]);
 62 |       reference_file = std::string(argv[3]);
 63 |       break;
 64 |     case 6:
 65 |       useEpsCheck = true;
 66 |       input_file  = std::string(argv[1]);
 67 |       output_file = std::string(argv[2]);
 68 |       reference_file = std::string(argv[3]);
 69 |       perPixelError = atof(argv[4]);
 70 |       globalError   = atof(argv[5]);
 71 |       break;
 72 |     default:
 73 |       std::cerr << "Usage: ./HW3 input_file [output_filename] [reference_filename] [perPixelError globalError]" << std::endl;
 74 |       exit(1);
 75 |   }
 76 |   //load the image and give us our input and output pointers
 77 |   preProcess(&d_luminance, &d_cdf,
 78 |              &numRows, &numCols, &numBins, input_file);
 79 | 
 80 |   GpuTimer timer;
 81 |   float min_logLum, max_logLum;
 82 |   min_logLum = 0.f;
 83 |   max_logLum = 1.f;
 84 |   timer.Start();
 85 |   //call the students' code
 86 |   your_histogram_and_prefixsum(d_luminance, d_cdf, min_logLum, max_logLum,
 87 |                                numRows, numCols, numBins);
 88 |   timer.Stop();
 89 |   cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
 90 |   int err = printf("Your code ran in: %f msecs.\n", timer.Elapsed());
 91 | 
 92 |   if (err < 0) {
 93 |     //Couldn't print! Probably the student closed stdout - bad news
 94 |     std::cerr << "Couldn't print timing information! STDOUT Closed!" << std::endl;
 95 |     exit(1);
 96 |   }
 97 | 
 98 |   float *h_luminance = (float *) malloc(sizeof(float)*numRows*numCols);
 99 |   unsigned int *h_cdf = (unsigned int *) malloc(sizeof(unsigned int)*numBins);
100 | 
101 |   checkCudaErrors(cudaMemcpy(h_luminance, d_luminance, numRows*numCols*sizeof(float), cudaMemcpyDeviceToHost));
102 | 
103 |   //output the tone-mapped image produced from the student's CDF
104 |   postProcess(output_file, numRows, numCols, min_logLum, max_logLum);
105 | 
106 |   //recompute the luminance range on the host, seeded from the first pixel
107 |   //so the reference doesn't depend on the student's min/max values
108 |   min_logLum = max_logLum = h_luminance[0];
109 |   for (size_t i = 1; i < numCols * numRows; ++i) {
110 |     min_logLum = std::min(h_luminance[i], min_logLum);
111 |     max_logLum = std::max(h_luminance[i], max_logLum);
112 |   }
113 | 
114 |   referenceCalculation(h_luminance, h_cdf, numRows, numCols, numBins, min_logLum, max_logLum);
115 | 
116 |   checkCudaErrors(cudaMemcpy(d_cdf, h_cdf, sizeof(unsigned int) * numBins, cudaMemcpyHostToDevice));
117 | 
118 |   //output the tone-mapped image produced from the reference CDF
119 |   postProcess(reference_file, numRows, numCols, min_logLum, max_logLum);
120 | 
121 |   cleanupGlobalMemory();
122 |   free(h_luminance);
123 |   free(h_cdf);
124 | 
125 |   compareImages(reference_file, output_file, useEpsCheck, perPixelError, globalError);
126 | 
127 |   return 0;
128 | }
129 | 


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/memorial.exr:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial.exr


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/memorial_large.exr:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial_large.exr


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/memorial_png.gold:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial_png.gold


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/memorial_png_large.gold:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial_png_large.gold


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/memorial_raw.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial_raw.png


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/memorial_raw_large.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial_raw_large.png


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/memorial_raw_large_mapped.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial_raw_large_mapped.png


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/memorial_raw_mapped.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial_raw_mapped.png


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/my_output.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/my_output.png


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/reference_calc.cpp:
--------------------------------------------------------------------------------
 1 | #include <algorithm>
 2 | #include <cassert>
 3 | void referenceCalculation(const float* const h_logLuminance, unsigned int* const h_cdf,
 4 |                           const size_t numRows, const size_t numCols, const size_t numBins, 
 5 |                           float &logLumMin, float &logLumMax)
 6 | {
 7 |   logLumMin = h_logLuminance[0];
 8 |   logLumMax = h_logLuminance[0];
 9 | 
10 |   //Step 1
11 |   //first we find the minimum and maximum across the entire image
12 |   for (size_t i = 1; i < numCols * numRows; ++i) {
13 |     logLumMin = std::min(h_logLuminance[i], logLumMin);
14 |     logLumMax = std::max(h_logLuminance[i], logLumMax);
15 |   }
16 | 
17 |   //Step 2
18 |   float logLumRange = logLumMax - logLumMin;
19 | 
20 |   //Step 3
21 |   //next we use the now known range to compute
22 |   //a histogram of numBins bins
23 |   unsigned int *histo = new unsigned int[numBins];
24 | 
25 |   for (size_t i = 0; i < numBins; ++i) histo[i] = 0;
26 | 
27 |   for (size_t i = 0; i < numCols * numRows; ++i) {
28 |     unsigned int bin = std::min(static_cast<unsigned int>(numBins - 1),
29 |                            static_cast<unsigned int>((h_logLuminance[i] - logLumMin) / logLumRange * numBins));
30 |     histo[bin]++;
31 |   }
32 | 
33 |   //Step 4
34 |   //finally we perform an exclusive scan (prefix sum)
35 |   //on the histogram to get the cumulative distribution
36 |   h_cdf[0] = 0;
37 |   for (size_t i = 1; i < numBins; ++i) {
38 |     h_cdf[i] = h_cdf[i - 1] + histo[i - 1];
39 |   }
40 | 
41 |   delete[] histo;
42 | }
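
Running the same four steps on the 14-element example from the assignment header in student_func.cu makes the clamping and the exclusive scan concrete. `computeExclusiveCdf` is a hypothetical host-only helper that reuses the reference formula on a `std::vector`; like `referenceCalculation`, it assumes the input is non-empty and not constant (range > 0).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper mirroring referenceCalculation (steps 1-4).
// Assumes lum is non-empty and not all equal (range > 0).
std::vector<unsigned int> computeExclusiveCdf(const std::vector<float>& lum,
                                              std::size_t numBins,
                                              std::vector<unsigned int>& histo) {
  // Steps 1-2: min, max and range
  const float lo = *std::min_element(lum.begin(), lum.end());
  const float hi = *std::max_element(lum.begin(), lum.end());
  const float range = hi - lo;

  // Step 3: same formula as the reference; the maximum value is clamped into the last bin
  histo.assign(numBins, 0);
  for (float v : lum) {
    unsigned int bin = std::min(static_cast<unsigned int>(numBins - 1),
                                static_cast<unsigned int>((v - lo) / range * numBins));
    ++histo[bin];
  }

  // Step 4: exclusive scan, so cdf[0] is always 0
  std::vector<unsigned int> cdf(numBins, 0);
  for (std::size_t i = 1; i < numBins; ++i) cdf[i] = cdf[i - 1] + histo[i - 1];
  return cdf;
}
```

For that input it yields histo = [4 7 3] and an exclusive cdf of [0 4 11]; the value 9 is the maximum, so its raw bin index 3 is clamped into the last bin, 2.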


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/reference_calc.h:
--------------------------------------------------------------------------------
1 | #ifndef REFERENCE_H__
2 | #define REFERENCE_H__
3 | 
4 |   void referenceCalculation(const float* const h_logLuminance, unsigned int* const h_cdf,
5 |                           const size_t numRows, const size_t numCols, const size_t numBins, 
6 | 						  float &logLumMin, float &logLumMax);
7 | 
8 | #endif
9 | 


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/student_func.cu:
--------------------------------------------------------------------------------
  1 | /* Udacity Homework 3
  2 |    HDR Tone-mapping
  3 | 
  4 |   Background HDR
  5 |   ==============
  6 | 
  7 |   A High Dynamic Range (HDR) image contains a wider variation of intensity
  8 |   and color than is allowed by the RGB format with 1 byte per channel that we
  9 |   have used in the previous assignment.  
 10 | 
 11 |   To store this extra information we use single precision floating point for
 12 |   each channel.  This allows for an extremely wide range of intensity values.
 13 | 
 14 |   In the image for this assignment (the inside of a church with light coming in
 15 |   through stained-glass windows), the raw input floating-point values for the
 16 |   channels range from 0 to 275.  But the mean is .41 and 98% of the values are
 17 |   less than 3!  This means that certain areas (the windows) are extremely bright
 18 |   compared to everywhere else.  If we linearly map this [0-275] range into the
 19 |   [0-255] range that we have been using then most values will be mapped to zero!
 20 |   The only thing we will be able to see are the very brightest areas - the
 21 |   windows - everything else will appear pitch black.
 22 | 
 23 |   The problem is that although we have cameras capable of recording the wide
 24 |   range of intensity that exists in the real world our monitors are not capable
 25 |   of displaying them.  Our eyes are also quite capable of observing a much wider
 26 |   range of intensities than our image formats / monitors are capable of
 27 |   displaying.
 28 | 
 29 |   Tone-mapping is a process that transforms the intensities in the image so that
 30 |   the brightest values aren't nearly so far away from the mean.  That way when
 31 |   we transform the values into [0-255] we can actually see the entire image.
 32 |   There are many ways to perform this process and it is as much an art as a
 33 |   science - there is no single "right" answer.  In this homework we will
 34 |   implement one possible technique.
 35 | 
 36 |   Background Chrominance-Luminance
 37 |   ================================
 38 | 
 39 |   The RGB space that we have been using to represent images can be thought of as
 40 |   one possible set of axes spanning a three dimensional space of color.  We
 41 |   sometimes choose other axes to represent this space because they make certain
 42 |   operations more convenient.
 43 | 
 44 |   Another possible way of representing a color image is to separate the color
 45 |   information (chromaticity) from the brightness information.  There are
 46 |   multiple different methods for doing this - a common one during the analog
 47 |   television days was known as Chrominance-Luminance or YUV.
 48 | 
 49 |   We choose to represent the image in this way so that we can remap only the
 50 |   intensity channel and then recombine the new intensity values with the color
 51 |   information to form the final image.
 52 | 
 53 |   Old TV signals used to be transmitted in this way so that black & white
 54 |   televisions could display the luminance channel while color televisions would
 55 |   display all three of the channels.
 56 |   
 57 | 
 58 |   Tone-mapping
 59 |   ============
 60 | 
 61 |   In this assignment we are going to transform the luminance channel (actually
 62 |   the log of the luminance, but this is unimportant for the parts of the
 63 |   algorithm that you will be implementing) by compressing its range to [0, 1].
 64 |   To do this we need the cumulative distribution of the luminance values.
 65 | 
 66 |   Example
 67 |   -------
 68 | 
 69 |   input : [2 4 3 3 1 7 4 5 7 0 9 4 3 2]
 70 |   min / max / range: 0 / 9 / 9
 71 | 
 72 |   histo with 3 bins: [4 7 3]
 73 | 
 74 |   inclusive cdf : [4 11 14]
 75 |   exclusive cdf : [0 4 11]  (step 4 below asks for the exclusive scan)
 76 | 
 77 |   Your task is to calculate this cumulative distribution by following these
 78 |   steps.
 79 | 
 80 | */
 81 | 
 82 | #include "utils.h"
 83 | #include "device_launch_parameters.h"
 84 | //#include "reference_calc.cpp"
 85 | #include <stdio.h>
 86 | #include <float.h>
 87 | #include <limits.h>
 88 | 
 89 | const int BLOCK_SIZE = 1024;
 90 | 
 91 | __device__ float _min(float a, float b) {
 92 | 	return a < b ? a : b;
 93 | }
 94 | 
 95 | __device__ float _max(float a, float b) {
 96 | 	return a > b ? a : b;
 97 | }
 98 | 
 99 | __global__ void minmax_reduce(float* d_out, const float * d_in, int input_size,bool isMin) {
100 | 	
101 | 	extern __shared__ float sdata[];
102 | 	
103 | 	int tid = threadIdx.x;
104 | 	int global_id = tid + blockDim.x*blockIdx.x;
105 | 	
106 | 	if (global_id >= input_size) { sdata[tid] = d_in[0]; } //dummy init (does not modify the final result)
107 | 	else sdata[tid] = d_in[global_id];
108 | 	__syncthreads();
109 | 	for (int s = blockDim.x/2; s > 0; s>>=1){
110 | 		if (tid < s) sdata[tid] = isMin ? _min(sdata[tid], sdata[tid + s]) : _max(sdata[tid], sdata[tid + s]);
111 | 		__syncthreads();
112 | 	}
113 | 	if (tid == 0) {
114 | 		d_out[blockIdx.x] = sdata[0];
115 | 	}
116 | }
117 | 
118 | 
119 | 
120 | __global__ void histo_atomic(unsigned int* out_histo,  const float * d_in, int numBins, int input_size, float minVal, float rangeVals) {
121 | 	int tid = threadIdx.x;
122 | 	int global_id = tid + blockDim.x*blockIdx.x;
123 | 	if (global_id >= input_size) return;
124 | 	int bin = (int)((d_in[global_id] - minVal) / rangeVals * numBins); //same operation order as the CPU reference
125 | 	bin = bin >= numBins ? numBins - 1 : bin; //max value bin is the last of the histo
126 | 	atomicAdd(&(out_histo[bin]), 1);
127 | }
128 | 
129 | 
130 | //--------HILLIS-STEELE SCAN----------
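131 | //Step-efficient (log2(n) steps) but not work-efficient; fine for a small vector such as the histogram.
132 | //Launched as one block with one thread per element, so it handles at most 1024 elements (max threads per block).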
133 | __global__ void scan_hillis_steele(unsigned int* d_out,const unsigned int* d_in, int size) {
134 | 	extern __shared__ unsigned int temp[];
135 | 	int tid = threadIdx.x;
136 | 	int pout = 0,pin=1;
137 | 	temp[tid] = tid>0? d_in[tid-1]:0; //exclusive scan
138 | 	__syncthreads();
139 | 
140 | 	//double buffered
141 | 	for (int off = 1; off < size; off <<= 1) {
142 | 		pout = 1 - pout;
143 | 		pin = 1 - pout;
144 | 		if (tid >= off) temp[size*pout + tid] = temp[size*pin + tid]+temp[size*pin + tid - off];
145 | 		else temp[size*pout + tid] = temp[size*pin + tid];
146 | 		__syncthreads();
147 | 	}
148 | 	d_out[tid] = temp[pout*size + tid];
149 | }
150 | 
151 | 
152 | float reduce(const float* const d_logLuminance, int input_size,bool isMin) {
153 | 	int threads = BLOCK_SIZE;
154 | 	float* d_current_in = NULL;
155 | 	int size = input_size;
156 | 	int blocks = ceil(1.0f*size / threads); 
157 | 	while (true) {
158 | 		//allocate memory for intermediate results
159 | 		//printf("Size %d blocks %d\n", size,blocks);
160 | 		float* d_out;
161 | 		checkCudaErrors(cudaMalloc(&d_out, blocks * sizeof(float)));
162 | 		//call reduce kernel: if first iteration use original vector, otherwise use the last intermediate result.
163 | 		const float* d_in = (d_current_in == NULL) ? d_logLuminance : d_current_in;
164 | 		minmax_reduce<<<blocks, threads, threads * sizeof(float)>>>(d_out, d_in, size, isMin);
165 | 		cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
166 | 
167 | 		//free last intermediate result
168 | 		if (d_current_in != NULL) checkCudaErrors(cudaFree(d_current_in));
169 | 
170 | 		if (blocks == 1) {
171 | 			//end of reduction reached
172 | 			float h_out;
173 | 			checkCudaErrors(cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost));
174 | 			return h_out;
175 | 		}
176 | 		size = blocks;
177 | 		blocks = ceil(1.0f*size / threads); 
178 | 		if (blocks == 0)blocks++;
179 | 		d_current_in = d_out;//point to new intermediate result
180 | 		
181 | 	}
182 | 	
183 | }
184 | 
185 | 
186 | unsigned int* compute_histogram(const float* const d_logLuminance, int numBins, int input_size, float minVal, float rangeVals) {
187 | 	unsigned int* d_histo;
188 | 	checkCudaErrors(cudaMalloc(&d_histo, numBins * sizeof(unsigned int)));
189 | 	checkCudaErrors(cudaMemset(d_histo, 0, numBins * sizeof(unsigned int)));
190 | 	int threads = BLOCK_SIZE;
191 | 	int blocks = ceil(1.0f*input_size / threads);
192 | 	histo_atomic << <blocks, threads >> >(d_histo, d_logLuminance, numBins, input_size, minVal, rangeVals);
193 | 	cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
194 | 	return d_histo;
195 | }
196 | 
197 | void your_histogram_and_prefixsum(const float* const d_logLuminance,
198 |                                   unsigned int* const d_cdf,
199 |                                   float &min_logLum,
200 |                                   float &max_logLum,
201 |                                   const size_t numRows,
202 |                                   const size_t numCols,
203 |                                   const size_t numBins)
204 | {
205 |   /*Here are the steps you need to implement
206 |     1) find the minimum and maximum value in the input logLuminance channel
207 |        store in min_logLum and max_logLum
208 |     2) subtract them to find the range
209 |     3) generate a histogram of all the values in the logLuminance channel using
210 |        the formula: bin = (lum[i] - lumMin) / lumRange * numBins
211 |     4) Perform an exclusive scan (prefix sum) on the histogram to get
212 |        the cumulative distribution of luminance values (this should go in the
213 |        incoming d_cdf pointer which already has been allocated for you)       */
214 | 
215 | 	//1. Reduce
216 | 	int input_size = numRows*numCols;
217 | 	min_logLum = reduce(d_logLuminance, input_size, true);
218 | 	max_logLum = reduce(d_logLuminance, input_size, false);
219 | 	//printf("%f %f\n", min_logLum, max_logLum);
220 | 
221 | 	//2. Range
222 | 	float range = max_logLum - min_logLum;
223 | 
224 | 	//3. Histogram
225 | 	unsigned int* d_histo=compute_histogram(d_logLuminance, numBins, input_size, min_logLum, range);
226 | 
227 | 	//4. CDF (scan)
228 | 	//Assumption: numBins<=1024
229 | 	scan_hillis_steele<<<1, numBins, 2 * numBins * sizeof(unsigned int)>>>(d_cdf, d_histo, numBins);
230 | 	checkCudaErrors(cudaGetLastError());
231 | 	checkCudaErrors(cudaFree(d_histo));
232 | 
233 | }
234 | 
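
The multi-pass reduction in `reduce()` and `minmax_reduce` can be checked on the CPU with a sketch that replays the same steps: blocks of `blockSize` elements, out-of-range slots padded with the pass's first element (safe, because that element is already part of the data), a halving tree reduction per block, and repeated passes over the per-block results until one block remains. `simulateBlockReduce` is a hypothetical name; it assumes a non-empty input and a power-of-two `blockSize`, as the kernel's `s >>= 1` loop does with `BLOCK_SIZE = 1024`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU replay of reduce()/minmax_reduce. Assumes data is non-empty
// and blockSize is a power of two (the kernel halves s each step).
float simulateBlockReduce(std::vector<float> data, std::size_t blockSize, bool isMin) {
  while (true) {
    const std::size_t blocks = (data.size() + blockSize - 1) / blockSize;
    std::vector<float> partial(blocks);
    for (std::size_t b = 0; b < blocks; ++b) {
      // Load one block into "shared memory"; pad past the end with data[0],
      // which cannot change a min or max because it is already in the input
      std::vector<float> sdata(blockSize);
      for (std::size_t t = 0; t < blockSize; ++t) {
        const std::size_t g = b * blockSize + t;
        sdata[t] = g < data.size() ? data[g] : data[0];
      }
      // Tree reduction: at each step the lower half combines with the upper half
      for (std::size_t s = blockSize / 2; s > 0; s >>= 1)
        for (std::size_t t = 0; t < s; ++t)
          sdata[t] = isMin ? std::min(sdata[t], sdata[t + s])
                           : std::max(sdata[t], sdata[t + s]);
      partial[b] = sdata[0];  // what thread 0 writes to d_out[blockIdx.x]
    }
    if (blocks == 1) return partial[0];
    data = partial;  // next pass reduces the per-block results
  }
}
```

With 1000 inputs and a block size of 8, for example, the passes shrink the data to 125, 16, 2 and finally 1 value, which is the same shape of loop `reduce()` runs with 1024-thread blocks.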

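A CPU replay of `scan_hillis_steele` shows why the kernel keeps two buffers of `size` elements in shared memory: each round reads only from buffer `pin` and writes only to buffer `pout`, so no element sees a value already updated in the same round. The initial shift by one (with 0 at the front) turns the inclusive Hillis-Steele scan into the exclusive scan the assignment asks for. `simulateHillisSteeleExclusive` is a hypothetical helper; the inner loop over `t` stands in for the parallel threads, and the end of each round stands in for `__syncthreads()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU replay of scan_hillis_steele (double-buffered, exclusive).
std::vector<unsigned int> simulateHillisSteeleExclusive(const std::vector<unsigned int>& in) {
  const std::size_t n = in.size();
  std::vector<unsigned int> temp(2 * n);
  // Shift right by one and insert 0: this makes the scan exclusive
  for (std::size_t t = 0; t < n; ++t) temp[t] = t > 0 ? in[t - 1] : 0;
  std::size_t pout = 0, pin = 1;
  for (std::size_t off = 1; off < n; off <<= 1) {
    pout = 1 - pout;  // swap buffers each round
    pin = 1 - pout;
    // Reads come only from buffer pin, writes go only to buffer pout
    for (std::size_t t = 0; t < n; ++t)
      temp[n * pout + t] = t >= off ? temp[n * pin + t] + temp[n * pin + t - off]
                                    : temp[n * pin + t];
  }
  std::vector<unsigned int> out(n);
  for (std::size_t t = 0; t < n; ++t) out[t] = temp[n * pout + t];
  return out;
}
```

The scan finishes in ceil(log2 n) rounds but performs O(n log n) additions, which is acceptable for a histogram of at most 1024 bins and is the trade-off the kernel's comment refers to.
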

--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/timer.h:
--------------------------------------------------------------------------------
 1 | #ifndef GPU_TIMER_H__
 2 | #define GPU_TIMER_H__
 3 | 
 4 | #include <cuda_runtime.h>
 5 | 
 6 | struct GpuTimer
 7 | {
 8 |   cudaEvent_t start;
 9 |   cudaEvent_t stop;
10 | 
11 |   GpuTimer()
12 |   {
13 |     cudaEventCreate(&start);
14 |     cudaEventCreate(&stop);
15 |   }
16 | 
17 |   ~GpuTimer()
18 |   {
19 |     cudaEventDestroy(start);
20 |     cudaEventDestroy(stop);
21 |   }
22 | 
23 |   void Start()
24 |   {
25 |     cudaEventRecord(start, 0);
26 |   }
27 | 
28 |   void Stop()
29 |   {
30 |     cudaEventRecord(stop, 0);
31 |   }
32 | 
33 |   float Elapsed()
34 |   {
35 |     float elapsed;
36 |     cudaEventSynchronize(stop);
37 |     cudaEventElapsedTime(&elapsed, start, stop);
38 |     return elapsed;
39 |   }
40 | };
41 | 
42 | #endif  /* GPU_TIMER_H__ */
43 | 
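
`GpuTimer` records CUDA events on the default stream, and `Elapsed()` waits on the stop event before `cudaEventElapsedTime` reports the interval in milliseconds. For timing host code such as `referenceCalculation`, a hypothetical `CpuTimer` with the same interface could be built on `std::chrono::steady_clock`. It measures CPU wall-clock time only, so it would not capture GPU work unless the caller synchronizes (for example with `cudaDeviceSynchronize()`) before `Stop()`.

```cpp
#include <cassert>
#include <chrono>
#include <ratio>

// Hypothetical host-side counterpart to GpuTimer (same Start/Stop/Elapsed names).
// It measures wall-clock time on the CPU only.
struct CpuTimer {
  std::chrono::steady_clock::time_point start, stop;
  void Start() { start = std::chrono::steady_clock::now(); }
  void Stop() { stop = std::chrono::steady_clock::now(); }
  // Milliseconds, the same unit cudaEventElapsedTime reports
  float Elapsed() const {
    return std::chrono::duration<float, std::milli>(stop - start).count();
  }
};
```

Wrapping `referenceCalculation` in such a timer would give a rough CPU baseline to set beside the "Your code ran in" figure printed by HW3's driver.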


--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/utils.h:
--------------------------------------------------------------------------------
 1 | #ifndef UTILS_H__
 2 | #define UTILS_H__
 3 | 
 4 | #include <iostream>
 5 | #include <iomanip>
 6 | #include <cuda.h>
 7 | #include <cuda_runtime.h>
 8 | #include <cuda_runtime_api.h>
 9 | #include <cassert>
10 | #include <cmath>
11 | #include <algorithm>
12 | 
13 | #define checkCudaErrors(val) check( (val), #val, __FILE__, __LINE__)
14 | 
15 | template<typename T>
16 | void check(T err, const char* const func, const char* const file, const int line) {
17 |   if (err != cudaSuccess) {
18 |     std::cerr << "CUDA error at: " << file << ":" << line << std::endl;
19 |     std::cerr << cudaGetErrorString(err) << " " << func << std::endl;
20 |     exit(1);
21 |   }
22 | }
23 | 
24 | template<typename T>
25 | void checkResultsExact(const T* const ref, const T* const gpu, size_t numElem) {
26 |   //check that the GPU result matches the CPU result
27 |   for (size_t i = 0; i < numElem; ++i) {
28 |     if (ref[i] != gpu[i]) {
29 |       std::cerr << "Difference at pos " << i << std::endl;
30 |       //the + is magic to convert char to int without messing
31 |       //with other types
32 |       std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
33 |                  "\nGPU      : " << +gpu[i] << std::endl;
34 |       exit(1);
35 |     }
36 |   }
37 | }
38 | 
39 | template<typename T>
40 | void checkResultsEps(const T* const ref, const T* const gpu, size_t numElem, double eps1, double eps2) {
41 |   assert(eps1 >= 0 && eps2 >= 0);
42 |   unsigned long long totalDiff = 0;
43 |   unsigned numSmallDifferences = 0;
44 |   for (size_t i = 0; i < numElem; ++i) {
45 |     //subtract smaller from larger in case of unsigned types
46 |     T smaller = std::min(ref[i], gpu[i]);
47 |     T larger = std::max(ref[i], gpu[i]);
48 |     T diff = larger - smaller;
49 |     if (diff > 0 && diff <= eps1) {
50 |       numSmallDifferences++;
51 |     }
52 |     else if (diff > eps1) {
53 |       std::cerr << "Difference at pos " << +i << " exceeds tolerance of " << eps1 << std::endl;
54 |       std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
55 |         "\nGPU      : " << +gpu[i] << std::endl;
56 |       exit(1);
57 |     }
58 |     totalDiff += diff * diff;
59 |   }
60 |   double percentSmallDifferences = (double)numSmallDifferences / (double)numElem;
61 |   if (percentSmallDifferences > eps2) {
62 |     std::cerr << "Total percentage of non-zero pixel difference between the two images exceeds " << 100.0 * eps2 << "%" << std::endl;
63 |     std::cerr << "Percentage of non-zero pixel differences: " << 100.0 * percentSmallDifferences << "%" << std::endl;
64 |     exit(1);
65 |   }
66 | }
67 | 
68 | //Uses the autodesk method of image comparison
69 | //Note that the tolerance here is a number of PIXELS, not a percentage of input pixels
70 | template<typename T>
71 | void checkResultsAutodesk(const T* const ref, const T* const gpu, size_t numElem, double variance, size_t tolerance)
72 | {
73 | 
74 |   size_t numBadPixels = 0;
75 |   for (size_t i = 0; i < numElem; ++i) {
76 |     T smaller = std::min(ref[i], gpu[i]);
77 |     T larger = std::max(ref[i], gpu[i]);
78 |     T diff = larger - smaller;
79 |     if (diff > variance)
80 |       ++numBadPixels;
81 |   }
82 | 
83 |   if (numBadPixels > tolerance) {
84 |     std::cerr << "Too many bad pixels in the image: " << numBadPixels << "/" << tolerance << std::endl;
85 |     exit(1);
86 |   }
87 | }
88 | 
89 | #endif
90 | 
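
`checkResultsEps` subtracts the smaller value from the larger one because, for unsigned types such as `unsigned char`, computing `ref[i] - gpu[i]` and storing it back in `T` would wrap around (10 - 12 becomes 254) instead of going negative. The same logic can be written as a hypothetical `withinTolerance` predicate that reports failure instead of calling `exit(1)`, which is easier to unit test; it assumes `numElem > 0`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical non-exiting variant of checkResultsEps: true when every
// per-element difference is <= eps1 and the fraction of elements that differ
// at all is <= eps2.
template <typename T>
bool withinTolerance(const T* ref, const T* gpu, std::size_t numElem, double eps1, double eps2) {
  std::size_t numSmallDifferences = 0;
  for (std::size_t i = 0; i < numElem; ++i) {
    // Larger minus smaller avoids wraparound for unsigned T
    T diff = std::max(ref[i], gpu[i]) - std::min(ref[i], gpu[i]);
    if (diff > eps1) return false;  // a single large difference fails immediately
    if (diff > 0) ++numSmallDifferences;
  }
  return static_cast<double>(numSmallDifferences) / numElem <= eps2;
}
```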


--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/CMakeLists.txt:
--------------------------------------------------------------------------------
 1 | ############################################################################
 2 | # <summary> CMakeLists.txt for OpenCV and CUDA. </summary>
 3 | # <date>    2012-02-07          </date>
 4 | # <author>  Quan Tran Minh. edit by Johannes Kast, Michael Sarahan </author>
 5 | # <email>   quantm@unist.ac.kr  kast.jo@googlemail.com msarahan@gmail.com</email>
 6 | ############################################################################
 7 | 
 8 | # collect source files
 9 | 
10 | file( GLOB  hdr *.hpp *.h )
11 | file( GLOB  cu  *.cu)
12 | SET (HW4_files main.cpp loadSaveImage.cpp reference_calc.cpp compare.cpp)
13 | 
14 | CUDA_ADD_EXECUTABLE(HW4 ${HW4_files} ${hdr} ${cu})
15 | 
16 | 
17 | 


--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/HW4_output.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet4-RedEyeRemoval/HW4_output.png


--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/Makefile:
--------------------------------------------------------------------------------
 1 | NVCC=/usr/local/cuda-5.0/bin/nvcc
 2 | #NVCC=nvcc
 3 | 
 4 | ###################################
 5 | # These are the default install   #
 6 | # locations on most linux distros #
 7 | ###################################
 8 | 
 9 | OPENCV_LIBPATH=/usr/lib
10 | OPENCV_INCLUDEPATH=/usr/include
11 | 
12 | ###################################################
13 | # On Macs the default install locations are below #
14 | ###################################################
15 | 
16 | #OPENCV_LIBPATH=/usr/local/lib
17 | #OPENCV_INCLUDEPATH=/usr/local/include
18 | 
19 | OPENCV_LIBS=-lopencv_core -lopencv_imgproc -lopencv_highgui
20 | 
21 | CUDA_INCLUDEPATH=/usr/local/cuda-5.0/include
22 | # CUDA_INCLUDEPATH=/usr/local/cuda/lib64/include
23 | # CUDA_INCLUDEPATH=/usr/local/cuda-5.0/include
24 | # CUDA_INCLUDEPATH=/Developer/NVIDIA/CUDA-5.0/include
25 | 
26 | ######################################################
27 | # On Macs the default install locations are below    #
28 | # ####################################################
29 | 
30 | #CUDA_INCLUDEPATH=/usr/local/cuda/include
31 | #CUDA_LIBPATH=/usr/local/cuda/lib
32 | CUDA_LIBPATH=/usr/local/cuda-5.0/lib64
33 | 
34 | NVCC_OPTS=-O3 -arch=sm_20 -Xcompiler -Wall -Xcompiler -Wextra -m64
35 | 
36 | GCC_OPTS=-O3 -Wall -Wextra -m64
37 | 
38 | student: main.o student_func.o HW4.o loadSaveImage.o compare.o reference_calc.o Makefile
39 | 	$(NVCC) -o HW4 main.o student_func.o HW4.o loadSaveImage.o compare.o reference_calc.o -L $(OPENCV_LIBPATH) $(OPENCV_LIBS) $(NVCC_OPTS)
40 | 
41 | main.o: main.cpp timer.h utils.h reference_calc.h
42 | 	g++ -c main.cpp $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
43 | 
44 | HW4.o: HW4.cu loadSaveImage.h utils.h
45 | 	$(NVCC) -c HW4.cu -I $(OPENCV_INCLUDEPATH) $(NVCC_OPTS)
46 | 
47 | loadSaveImage.o: loadSaveImage.cpp loadSaveImage.h
48 | 	g++ -c loadSaveImage.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
49 | 
50 | compare.o: compare.cpp compare.h
51 | 	g++ -c compare.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
52 | 
53 | reference_calc.o: reference_calc.cpp reference_calc.h
54 | 	g++ -c reference_calc.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
55 | 
56 | student_func.o: student_func.cu reference_calc.cpp utils.h
57 | 	$(NVCC) -c student_func.cu $(NVCC_OPTS)
58 | 
59 | clean:
60 | 	rm -f *.o *.png HW4
61 | 


--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/compare.cpp:
--------------------------------------------------------------------------------
 1 | #include <opencv2/opencv.hpp>
 2 | #include "utils.h"
 3 | 
 4 | 
 5 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck,
 6 | 				   double perPixelError, double globalError)
 7 | {
 8 |   cv::Mat reference = cv::imread(reference_filename, -1);
 9 |   cv::Mat test = cv::imread(test_filename, -1);
10 | 
11 |   cv::Mat diff; cv::absdiff(reference, test, diff); //reference - test would saturate negative 8-bit differences to 0
12 | 
13 |   cv::Mat diffSingleChannel = diff.reshape(1, 0); //convert to 1 channel, same # rows
14 | 
15 |   double minVal, maxVal;
16 | 
17 |   cv::minMaxLoc(diffSingleChannel, &minVal, &maxVal, NULL, NULL); //NULL because we don't care about location
18 | 
19 |   //now perform transform so that we bump values to the full range
20 | 
21 |   if (maxVal > minVal) diffSingleChannel = (diffSingleChannel - minVal) * (255. / (maxVal - minVal)); //skip when images are identical
22 | 
23 |   diff = diffSingleChannel.reshape(reference.channels(), 0);
24 | 
25 |   cv::imwrite("HW4_differenceImage.png", diff);
26 |   //OK, now we can start comparing values...
27 |   unsigned char *referencePtr = reference.ptr<unsigned char>(0);
28 |   unsigned char *testPtr = test.ptr<unsigned char>(0);
29 | 
30 |   if (useEpsCheck) {
31 |     checkResultsEps(referencePtr, testPtr, reference.rows * reference.cols * reference.channels(), perPixelError, globalError);
32 |   }
33 |   else
34 |   {
35 |     checkResultsExact(referencePtr, testPtr, reference.rows * reference.cols * reference.channels());
36 |   }
37 | 
38 |   std::cout << "PASS" << std::endl;
39 |   return;
40 | }
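
`compareImages` builds a difference image and stretches it to the full 0-255 range with (d - min) * 255 / (max - min). A plain C++ sketch of that idea, on flattened 8-bit channel data, takes an absolute difference per element (OpenCV's saturating 8-bit subtraction would otherwise clamp negative differences to 0) and skips the stretch when max equals min, e.g. for identical images. `differenceImage` is a hypothetical helper, and rounding to nearest is an assumption meant to approximate OpenCV's `saturate_cast`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side version of the difference image in compareImages:
// per-channel absolute difference, then a min-max stretch to [0, 255].
std::vector<unsigned char> differenceImage(const std::vector<unsigned char>& ref,
                                           const std::vector<unsigned char>& cand) {
  std::vector<unsigned char> diff(ref.size());
  for (std::size_t i = 0; i < ref.size(); ++i)
    diff[i] = static_cast<unsigned char>(ref[i] > cand[i] ? ref[i] - cand[i] : cand[i] - ref[i]);
  if (diff.empty()) return diff;
  const auto mm = std::minmax_element(diff.begin(), diff.end());
  const double minVal = *mm.first, maxVal = *mm.second;
  // All differences equal (e.g. identical images): nothing to stretch
  if (maxVal == minVal) return diff;
  for (auto& d : diff)
    d = static_cast<unsigned char>((d - minVal) * (255.0 / (maxVal - minVal)) + 0.5);
  return diff;
}
```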


--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/compare.h:
--------------------------------------------------------------------------------
1 | #ifndef HW4_H__
2 | #define HW4_H__
3 | #include <string>
4 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck,
5 | 				   double perPixelError, double globalError);
6 | 
7 | #endif


--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/loadSaveImage.cpp:
--------------------------------------------------------------------------------
  1 | #include <opencv2/core/core.hpp>
  2 | #include <opencv2/highgui/highgui.hpp>
  3 | #include <opencv2/opencv.hpp>
  4 | #include <vector>
  5 | #include "cuda_runtime.h"
  6 | 
  7 | //The caller becomes responsible for the returned pointer. This
  8 | //is done in the interest of keeping this code as simple as possible.
  9 | //In production code this is a bad idea - we should use RAII
 10 | //to ensure the memory is freed.  DO NOT COPY THIS AND USE IN PRODUCTION
 11 | //CODE!!!
 12 | void loadImageHDR(const std::string &filename,
 13 |                   float **imagePtr,
 14 |                   size_t *numRows, size_t *numCols)
 15 | {
 16 |   cv::Mat image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR | CV_LOAD_IMAGE_ANYDEPTH);
 17 |   if (image.empty()) {
 18 |     std::cerr << "Couldn't open file: " << filename << std::endl;
 19 |     exit(1);
 20 |   }
 21 | 
 22 |   if (image.channels() != 3) {
 23 |     std::cerr << "Image must be color!" << std::endl;
 24 |     exit(1);
 25 |   }
 26 | 
 27 |   if (!image.isContinuous()) {
 28 |     std::cerr << "Image isn't continuous!" << std::endl;
 29 |     exit(1);
 30 |   }
 31 | 
 32 |   *imagePtr = new float[image.rows * image.cols * image.channels()];
 33 | 
 34 |   float *cvPtr = image.ptr<float>(0);
 35 |   for (size_t i = 0; i < image.rows * image.cols * image.channels(); ++i)
 36 |     (*imagePtr)[i] = cvPtr[i];
 37 | 
 38 |   *numRows = image.rows;
 39 |   *numCols = image.cols;
 40 | }
 41 | 
 42 | void loadImageRGBA(const std::string &filename,
 43 |                    uchar4 **imagePtr,
 44 |                    size_t *numRows, size_t *numCols)
 45 | {
 46 |   cv::Mat image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR);
 47 |   if (image.empty()) {
 48 |     std::cerr << "Couldn't open file: " << filename << std::endl;
 49 |     exit(1);
 50 |   }
 51 | 
 52 |   if (image.channels() != 3) {
 53 |     std::cerr << "Image must be color!" << std::endl;
 54 |     exit(1);
 55 |   }
 56 | 
 57 |   if (!image.isContinuous()) {
 58 |     std::cerr << "Image isn't continuous!" << std::endl;
 59 |     exit(1);
 60 |   }
 61 | 
 62 |   cv::Mat imageRGBA;
 63 |   cv::cvtColor(image, imageRGBA, CV_BGR2RGBA);
 64 | 
 65 |   *imagePtr = new uchar4[image.rows * image.cols];
 66 | 
 67 |   unsigned char *cvPtr = imageRGBA.ptr<unsigned char>(0);
 68 |   for (size_t i = 0; i < image.rows * image.cols; ++i) {
 69 |     (*imagePtr)[i].x = cvPtr[4 * i + 0];
 70 |     (*imagePtr)[i].y = cvPtr[4 * i + 1];
 71 |     (*imagePtr)[i].z = cvPtr[4 * i + 2];
 72 |     (*imagePtr)[i].w = cvPtr[4 * i + 3];
 73 |   }
 74 | 
 75 |   *numRows = image.rows;
 76 |   *numCols = image.cols;
 77 | }
 78 | 
 79 | void saveImageRGBA(const uchar4* const image,
 80 |                    const size_t numRows, const size_t numCols,
 81 |                    const std::string &output_file)
 82 | {
 83 |   int sizes[2];
 84 |   sizes[0] = numRows;
 85 |   sizes[1] = numCols;
 86 |   cv::Mat imageRGBA(2, sizes, CV_8UC4, (void *)image);
 87 |   cv::Mat imageOutputBGR;
 88 |   cv::cvtColor(imageRGBA, imageOutputBGR, CV_RGBA2BGR);
 89 |   //output the image
 90 |   cv::imwrite(output_file.c_str(), imageOutputBGR);
 91 | }
 92 | 
 93 | //output an exr file
 94 | //assumed to already be BGR
 95 | void saveImageHDR(const float* const image,
 96 |                   const size_t numRows, const size_t numCols,
 97 |                   const std::string &output_file)
 98 | {
 99 |   int sizes[2];
100 |   sizes[0] = numRows;
101 |   sizes[1] = numCols;
102 | 
103 |   cv::Mat imageHDR(2, sizes, CV_32FC3, (void *)image);
104 | 
105 |   imageHDR = imageHDR * 255;
106 | 
107 |   cv::imwrite(output_file.c_str(), imageHDR);
108 | }
109 | 
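
`loadImageRGBA` relies on `cv::cvtColor` with `CV_BGR2RGBA` to reorder OpenCV's BGR channels and add an alpha channel, then copies the bytes into `uchar4` structs. The same repacking, written by hand, is a short loop. `Uchar4` here is a hypothetical stand-in for CUDA's `uchar4` so the sketch compiles without the CUDA headers, and alpha is set to 255 (fully opaque), which is what that conversion produces for 8-bit images.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's uchar4 so this compiles without cuda_runtime.h
struct Uchar4 { unsigned char x, y, z, w; };

// Packs interleaved 8-bit BGR pixels into RGBA structs (x=R, y=G, z=B, w=A).
std::vector<Uchar4> bgrToRgba(const std::vector<unsigned char>& bgr) {
  std::vector<Uchar4> out(bgr.size() / 3);
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i].x = bgr[3 * i + 2];  // R comes from the third byte
    out[i].y = bgr[3 * i + 1];  // G stays in the middle
    out[i].z = bgr[3 * i + 0];  // B comes from the first byte
    out[i].w = 255;             // opaque alpha
  }
  return out;
}
```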


--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/loadSaveImage.h:
--------------------------------------------------------------------------------
 1 | #ifndef LOADSAVEIMAGE_H__
 2 | #define LOADSAVEIMAGE_H__
 3 | 
 4 | #include <string>
 5 | #include <cuda_runtime.h> //for uchar4
 6 | 
 7 | void loadImageHDR(const std::string &filename,
 8 |                   float **imagePtr,
 9 |                   size_t *numRows, size_t *numCols);
10 | 
11 | void loadImageRGBA(const std::string &filename,
12 |                    uchar4 **imagePtr,
13 |                    size_t *numRows, size_t *numCols);
14 | 
15 | void saveImageRGBA(const uchar4* const image,
16 |                    const size_t numRows, const size_t numCols,
17 |                    const std::string &output_file);
18 | 
19 | void saveImageHDR(const float* const image,
20 |                   const size_t numRows, const size_t numCols,
21 |                   const std::string &output_file);
22 | 
23 | #endif
24 | 


--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/main.cpp:
--------------------------------------------------------------------------------
  1 | //Udacity HW4 Driver
  2 | 
  3 | #include <iostream>
  4 | #include "timer.h"
  5 | #include "utils.h"
  6 | #include <string>
  7 | #include <stdio.h>
  8 | #include <thrust/host_vector.h>
  9 | #include <thrust/device_vector.h>
 10 | 
 11 | #include "compare.h"
 12 | #include "reference_calc.h"
 13 | 
 14 | void preProcess(unsigned int **inputVals,
 15 |                 unsigned int **inputPos,
 16 |                 unsigned int **outputVals,
 17 |                 unsigned int **outputPos,
 18 |                 size_t &numElems,
 19 |                 const std::string& filename,
 20 |                 const std::string& template_file);
 21 | 
 22 | void postProcess(const unsigned int* const outputVals,
 23 |                  const unsigned int* const outputPos,
 24 |                  const size_t numElems,
 25 |                  const std::string& output_file);
 26 | 
 27 | void your_sort(unsigned int* const inputVals,
 28 |                unsigned int* const inputPos,
 29 |                unsigned int* const outputVals,
 30 |                unsigned int* const outputPos,
 31 |                const size_t numElems);
 32 | 
 33 | int main(int argc, char **argv) {
 34 |   unsigned int *inputVals;
 35 |   unsigned int *inputPos;
 36 |   unsigned int *outputVals;
 37 |   unsigned int *outputPos;
 38 | 
 39 |   size_t numElems;
 40 | 
 41 |   std::string input_file;
 42 |   std::string template_file;
 43 |   std::string output_file;
 44 |   std::string reference_file;
 45 |   double perPixelError = 0.0;
 46 |   double globalError   = 0.0;
 47 |   bool useEpsCheck = false;
 48 | 
 49 |   switch (argc)
 50 |   {
 51 |     case 3:
 52 |       input_file    = std::string(argv[1]);
 53 |       template_file = std::string(argv[2]);
 54 |       output_file   = "HW4_output.png";
 55 |       break;
 56 |     case 4:
 57 |       input_file    = std::string(argv[1]);
 58 |       template_file = std::string(argv[2]);
 59 |       output_file   = std::string(argv[3]);
 60 |       break;
 61 |     default:
 62 |       std::cerr << "Usage: ./HW4 input_file template_file [output_filename]" << std::endl;
 63 |       exit(1);
 64 |   }
 65 |   //load the image and give us our input and output pointers
 66 |   preProcess(&inputVals, &inputPos, &outputVals, &outputPos, numElems, input_file, template_file);
 67 | 
 68 |   /****************************************************************************
 69 |   * Reference check (moved here from HW4.cu).                                 *
 70 |   *                                                                           *
 71 |   * The host copies of the inputs MUST be taken BEFORE your_sort runs: a      *
 72 |   * ping-pong radix sort overwrites inputVals/inputPos, so copying them       *
 73 |   * afterwards would hand the reference an already-sorted array and the       *
 74 |   * comparison below would always pass.                                       *
 75 |   *                                                                           *
 76 |   * The reference sort runs on the host after the timer has stopped, so it    *
 77 |   * does not affect the reported GPU time.                                    *
 78 |   *                                                                           *
 79 |   * Thrust containers are used for copying memory from the GPU                *
 80 |   * ************************************************************************* */
 81 |   thrust::device_ptr<unsigned int> d_inputVals(inputVals);
 82 |   thrust::device_ptr<unsigned int> d_inputPos(inputPos);
 83 | 
 84 |   thrust::host_vector<unsigned int> h_inputVals(d_inputVals,
 85 |                                                 d_inputVals + numElems);
 86 |   thrust::host_vector<unsigned int> h_inputPos(d_inputPos,
 87 |                                                d_inputPos + numElems);
 88 | 
 89 |   GpuTimer timer;
 90 |   timer.Start();
 91 | 
 92 |   //call the students' code
 93 |   your_sort(inputVals, inputPos, outputVals, outputPos, numElems);
 94 | 
 95 |   timer.Stop();
 96 |   cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
 97 |   printf("\n");
 98 |   int err = printf("Your code ran in: %f msecs.\n", timer.Elapsed());
 99 | 
100 |   if (err < 0) {
101 |     //Couldn't print! Probably the student closed stdout - bad news
102 |     std::cerr << "Couldn't print timing information! STDOUT Closed!" << std::endl;
103 |     exit(1);
104 |   }
105 | 
106 |   //check results and output the red-eye corrected image
107 |   postProcess(outputVals, outputPos, numElems, output_file);
108 | 
109 |   //run the reference radix sort on the untouched host copies
110 |   thrust::host_vector<unsigned int> h_outputVals(numElems);
111 |   thrust::host_vector<unsigned int> h_outputPos(numElems);
112 | 
113 |   reference_calculation(&h_inputVals[0], &h_inputPos[0],
114 |                         &h_outputVals[0], &h_outputPos[0],
115 |                         numElems);
116 | 
117 |   //postProcess(valsPtr, posPtr, numElems, reference_file);
118 | 
119 |   //compareImages(reference_file, output_file, useEpsCheck, perPixelError, globalError);
120 | 
121 |   thrust::device_ptr<unsigned int> d_outputVals(outputVals);
122 |   thrust::device_ptr<unsigned int> d_outputPos(outputPos);
123 | 
124 |   thrust::host_vector<unsigned int> h_yourOutputVals(d_outputVals,
125 |                                                      d_outputVals + numElems);
126 |   thrust::host_vector<unsigned int> h_yourOutputPos(d_outputPos,
127 |                                                     d_outputPos + numElems);
128 | 
129 |   checkResultsExact(&h_outputVals[0], &h_yourOutputVals[0], numElems);
130 |   checkResultsExact(&h_outputPos[0], &h_yourOutputPos[0], numElems);
131 | 
132 |   checkCudaErrors(cudaFree(inputVals));
133 |   checkCudaErrors(cudaFree(inputPos));
134 |   checkCudaErrors(cudaFree(outputVals));
135 |   checkCudaErrors(cudaFree(outputPos));
136 | 
137 |   return 0;
138 | }
139 | 


--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/red_eye_effect.gold:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet4-RedEyeRemoval/red_eye_effect.gold


--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/red_eye_effect_5.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet4-RedEyeRemoval/red_eye_effect_5.jpg


--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/red_eye_effect_5_out.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet4-RedEyeRemoval/red_eye_effect_5_out.jpg


--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/red_eye_effect_template_5.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet4-RedEyeRemoval/red_eye_effect_template_5.jpg


--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/reference_calc.cpp:
--------------------------------------------------------------------------------
 1 | #include <algorithm>
 2 | // For memset
 3 | #include <cstring>
 4 | 
 5 | void reference_calculation(unsigned int* inputVals,
 6 |                            unsigned int* inputPos,
 7 |                            unsigned int* outputVals,
 8 |                            unsigned int* outputPos,
 9 |                            const size_t numElems)
10 | {
11 |   const int numBits = 1;
12 |   const int numBins = 1 << numBits;
13 | 
14 |   unsigned int *binHistogram = new unsigned int[numBins];
15 |   unsigned int *binScan      = new unsigned int[numBins];
16 | 
17 |   unsigned int *vals_src = inputVals;
18 |   unsigned int *pos_src  = inputPos;
19 | 
20 |   unsigned int *vals_dst = outputVals;
21 |   unsigned int *pos_dst  = outputPos;
22 | 
23 |   //a simple radix sort - only correct when the number of passes, ceil(32 / numBits), is even (e.g. numBits = 1, 2, 4, 8, 16)
24 |   for (unsigned int i = 0; i < 8 * sizeof(unsigned int); i += numBits) {
25 |     unsigned int mask = (numBins - 1) << i;
26 | 
27 |     memset(binHistogram, 0, sizeof(unsigned int) * numBins); //zero out the bins
28 |     memset(binScan, 0, sizeof(unsigned int) * numBins); //zero out the bins
29 | 
30 |     //perform histogram of data & mask into bins
31 |     for (unsigned int j = 0; j < numElems; ++j) {
32 |       unsigned int bin = (vals_src[j] & mask) >> i;
33 |       binHistogram[bin]++;
34 |     }
35 | 
36 |     //perform exclusive prefix sum (scan) on binHistogram to get starting
37 |     //location for each bin
38 |     for (unsigned int j = 1; j < numBins; ++j) {
39 |       binScan[j] = binScan[j - 1] + binHistogram[j - 1];
40 |     }
41 | 
42 |     //Gather everything into the correct location
43 |     //need to move vals and positions
44 |     for (unsigned int j = 0; j < numElems; ++j) {
45 |       unsigned int bin = (vals_src[j] & mask) >> i;
46 |       vals_dst[binScan[bin]] = vals_src[j];
47 |       pos_dst[binScan[bin]]  = pos_src[j];
48 |       binScan[bin]++;
49 |     }
50 | 
51 |     //swap the buffers (pointers only)
52 |     std::swap(vals_dst, vals_src);
53 |     std::swap(pos_dst, pos_src);
54 |   }
55 | 
56 |   //we did an even number of iterations, need to copy from input buffer into output
57 |   std::copy(inputVals, inputVals + numElems, outputVals);
58 |   std::copy(inputPos, inputPos + numElems, outputPos);
59 | 
60 |   delete[] binHistogram;
61 |   delete[] binScan;
62 | }
63 | 
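reference_calculation copies from the input buffers at the end because it assumes an even number of passes. A minimal host-side sketch (not part of the assignment; `radix_sort_pairs` is a hypothetical name) that instead tracks which buffer holds the latest pass, so odd pass counts such as numBits = 3 (11 passes) also land in the right place:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch: LSD radix sort of (value, position) pairs, numBits bits per pass.
// Assumes 1 <= numBits <= 16. Each pass is a stable counting sort, so the
// positions travel with their values as in reference_calculation.
void radix_sort_pairs(std::vector<unsigned int>& vals,
                      std::vector<unsigned int>& pos,
                      unsigned int numBits)
{
  assert(numBits >= 1 && numBits <= 16 && vals.size() == pos.size());
  const std::size_t n = vals.size();
  const unsigned int numBins = 1u << numBits;
  const unsigned int totalBits = 8 * sizeof(unsigned int);

  std::vector<unsigned int> tmpVals(n), tmpPos(n);
  std::vector<unsigned int>* srcV = &vals;
  std::vector<unsigned int>* srcP = &pos;
  std::vector<unsigned int>* dstV = &tmpVals;
  std::vector<unsigned int>* dstP = &tmpPos;

  for (unsigned int shift = 0; shift < totalBits; shift += numBits) {
    std::vector<std::size_t> histo(numBins, 0), start(numBins, 0);
    for (std::size_t j = 0; j < n; ++j)                 // histogram of digits
      ++histo[((*srcV)[j] >> shift) & (numBins - 1)];
    for (unsigned int b = 1; b < numBins; ++b)          // exclusive prefix sum
      start[b] = start[b - 1] + histo[b - 1];
    for (std::size_t j = 0; j < n; ++j) {               // stable scatter
      const unsigned int bin = ((*srcV)[j] >> shift) & (numBins - 1);
      (*dstV)[start[bin]] = (*srcV)[j];
      (*dstP)[start[bin]] = (*srcP)[j];
      ++start[bin];
    }
    std::swap(srcV, dstV);                              // ping-pong the buffers
    std::swap(srcP, dstP);
  }
  // srcV/srcP now point at the most recently written buffers; copy only if
  // that is the scratch storage (an odd number of passes).
  if (srcV != &vals) {
    vals = *srcV;
    pos = *srcP;
  }
}
```

For example, sorting values {5, 3, 5, 0, 3} with positions {0, 1, 2, 3, 4} gives values {0, 3, 3, 5, 5} and positions {3, 1, 4, 0, 2}: equal keys keep their original order, which is what makes the multi-pass LSD scheme correct.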


--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/reference_calc.h:
--------------------------------------------------------------------------------
 1 | #ifndef REFERENCE_H__
 2 | #define REFERENCE_H__
 3 | 
 4 | 
 5 | //A simple un-optimized reference radix sort calculation
 6 | //Only deals with power-of-2 radices
 7 | 
 8 | 
 9 | void reference_calculation(unsigned int* inputVals,
10 |                            unsigned int* inputPos,
11 |                            unsigned int* outputVals,
12 |                            unsigned int* outputPos,
13 |                            const size_t numElems);
14 | #endif


--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/student_func.cu:
--------------------------------------------------------------------------------
  1 | //Udacity HW 4
  2 | //Radix Sorting
  3 | 
  4 | #include "utils.h"
  5 | #include "device_launch_parameters.h"
  6 | #include <thrust/host_vector.h>
  7 | 
  8 | const int BLOCK_SIZE = 1024;
  9 | 
 10 | /* Red Eye Removal
 11 |    ===============
 12 |    
 13 |    For this assignment we are implementing red eye removal.  This is
 14 |    accomplished by first creating a score for every pixel that tells us how
 15 |    likely it is to be a red eye pixel.  We have already done this for you - you
 16 |    are receiving the scores and need to sort them in ascending order so that we
 17 |    know which pixels to alter to remove the red eye.
 18 | 
 19 |    Note: ascending order == smallest to largest
 20 | 
 21 |    Each score is associated with a position; when you sort the scores, you must
 22 |    also move the positions accordingly.
 23 | 
 24 |    Implementing Parallel Radix Sort with CUDA
 25 |    ==========================================
 26 | 
 27 |    The basic idea is to build, on each pass, a histogram of how many times
 28 |    each "digit" occurs.  Then we scan this histogram so that we know where to
 29 |    put the output of each digit.  For example, the first 1 must come after all
 30 |    the 0s, so we have to know how many 0s there are before we can start moving
 31 |    1s into the correct position.
 32 | 
 33 |    1) Histogram of the number of occurrences of each digit
 34 |    2) Exclusive Prefix Sum of Histogram
 35 |    3) Determine relative offset of each digit
 36 |         For example [0 0 1 1 0 0 1]
 37 |                 ->  [0 1 0 1 2 3 2]
 38 |    4) Combine the results of steps 2 & 3 to determine the final
 39 |       output location for each element and move it there
 40 | 
 41 |    LSB Radix sort is an out-of-place sort and you will need to ping-pong values
 42 |    between the input and output buffers we have provided.  Make sure the final
 43 |    sorted results end up in the output buffer!  Hint: You may need to do a copy
 44 |    at the end.
 45 | 
 46 |  */
 47 | 
 48 | 
 49 | 
 50 | __global__ void predicate(unsigned int* predicate, const unsigned int* d_in, size_t numElems,int bit) {
 51 | 	int tid = threadIdx.x;
 52 | 	int global_id = tid + blockDim.x*blockIdx.x;
 53 | 	if (global_id >= numElems) return;
 54 | 	unsigned int bin = ((d_in[global_id] >> bit) & 1u);
 55 | 	predicate[global_id] =bin;
 56 | }
 57 | 
 58 | 
 59 | __global__ void bielloch_scan(unsigned int* d_out, const unsigned int* d_in, size_t input_size, unsigned int* blockSums) {
 60 | 	extern __shared__ unsigned int data[];
 61 | 	
 62 | 	int tid = threadIdx.x;
 63 | 	int offset = 1;
 64 | 	int abs_start = 2*blockDim.x*blockIdx.x;
 65 | 	
 66 | 	data[2 * tid] =(abs_start+2*tid)<input_size? d_in[abs_start+2 * tid]:0;
 67 | 	data[2 * tid+1] = (abs_start + 2 * tid+1)<input_size ? d_in[abs_start+2 * tid+1]:0;
 68 | 
 69 | 	for (int d = (2 * blockDim.x) >>1; d>0; d>>=1) {
 70 | 		__syncthreads();
 71 | 		
 72 | 		if (tid < d) {
 73 | 			int ai = offset*(2 * tid + 1) - 1;
 74 | 			int bi = offset*(2 * tid + 2) - 1;
 75 | 			
 76 | 			data[bi] += data[ai];
 77 | 		}
 78 | 		offset <<= 1;
 79 | 	}
 80 | 	if (tid == 0)data[2*blockDim.x - 1] = 0;
 81 | 
 82 | 	for (int d = 1; d < 2 * blockDim.x; d<<=1) {
 83 | 		offset >>= 1;
 84 | 		__syncthreads();
 85 | 		if (tid < d) {
 86 | 			int ai = offset*(2 * tid + 1) - 1;
 87 | 			int bi = offset*(2 * tid + 2) - 1;
 88 | 			unsigned int t = data[ai];
 89 | 			data[ai] = data[bi];
 90 | 			data[bi] += t;
 91 | 		}
 92 | 	}
 93 | 
 94 | 	__syncthreads();
 95 | 	
 96 | 	if (abs_start + 2 * tid < input_size) {
 97 | 		d_out[abs_start + 2 * tid] = data[2 * tid];
 98 | 	}
 99 | 	if (abs_start + 2 * tid+1 < input_size) {
100 | 		d_out[abs_start + 2 * tid+1] = data[2 * tid+1];
101 | 	}
102 | 
103 | 	if (tid == 0) {
104 | 		blockSums[blockIdx.x] = data[blockDim.x * 2 - 1];
105 | 		if(abs_start + blockDim.x * 2 - 1<input_size)blockSums[blockIdx.x]+=d_in[abs_start + blockDim.x * 2 - 1];
106 | 	}
107 | }
108 | 
109 | __global__ void adjustIncrement(unsigned int* d, unsigned int* incr, size_t input_size){
110 | 	int pos = blockIdx.x * blockDim.x*2 + threadIdx.x * 2 + 1;
111 | 	if (pos< input_size)
112 | 	{
113 | 		d[pos] += incr[blockIdx.x];
114 | 		d[pos-1] += incr[blockIdx.x];
115 | 	}
116 | 	else if (pos-1 < input_size)
117 | 	{
118 | 		d[pos-1] += incr[blockIdx.x];
119 | 	}
120 | }
121 | 
122 | __global__ void negatePredicate(unsigned int* predicate, size_t input_size) {
123 | 	int tid = threadIdx.x;
124 | 	int pos = blockDim.x*blockIdx.x + tid;
125 | 	if (pos >= input_size)return;
126 | 	predicate[pos] = predicate[pos] ? 0 : 1;
127 | }
128 | 
129 | __global__ void moveElements(unsigned int* d_out, const unsigned int* d_in, const unsigned int* d_histo, 
130 | 								const unsigned int* d_predicate,const unsigned int* d_scan_true, const unsigned int* d_scan_false, size_t input_size) {
131 | 	int tid = threadIdx.x;
132 | 	int pos = blockDim.x*blockIdx.x + tid;
133 | 	if (pos >= input_size)return;
134 | 	//calculate new index of element at position pos
135 | 	int newindex;	
136 | 	if (d_predicate[pos])newindex = d_histo[0] + d_scan_false[pos];
137 | 	else newindex = d_histo[1] + d_scan_true[pos];
138 | 	if (newindex >= input_size) return; //defensive bounds check: a correct scan always gives newindex < input_size
139 | 	d_out[newindex] = d_in[pos];
140 | }
141 | 
142 | 
143 | 
144 | unsigned int biellochScan(unsigned int* d_scan, unsigned int* d_pred, size_t numElems) {
145 | 	
146 | 	int num_double_blocks = (int)((numElems + 2 * BLOCK_SIZE - 1) / (2 * BLOCK_SIZE)); //integer ceil division
147 | 	unsigned int* d_blocksums;
148 | 	checkCudaErrors(cudaMalloc(&d_blocksums, num_double_blocks * sizeof(unsigned int)));
149 | 	bielloch_scan << <num_double_blocks, BLOCK_SIZE, 2 * BLOCK_SIZE*sizeof(unsigned int) >> > (d_scan, d_pred, numElems, d_blocksums);
150 | 	cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
151 | 
152 | 	unsigned int finalSum;
153 | 	//Scan of the blocksums array
154 | 	if (num_double_blocks > 1) {
155 | 		unsigned int* d_scan_temp;
156 | 		checkCudaErrors(cudaMalloc(&d_scan_temp, num_double_blocks * sizeof(unsigned int)));
157 | 		finalSum=biellochScan(d_scan_temp, d_blocksums, num_double_blocks);
158 | 		adjustIncrement << <num_double_blocks, BLOCK_SIZE >> > (d_scan, d_scan_temp, numElems);
159 | 		cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
160 | 		checkCudaErrors(cudaFree(d_scan_temp));
161 | 	}
162 | 	else {
163 | 		//a single block: its block sum is already the total
164 | 		checkCudaErrors(cudaMemcpy(&finalSum, d_blocksums, sizeof(unsigned int), cudaMemcpyDeviceToHost));
165 | 	}
166 | 	checkCudaErrors(cudaFree(d_blocksums)); //free in both branches
167 | 	
168 | 	return finalSum;
169 | 
170 | }
171 | 
172 | void your_sort(unsigned int* const d_inputVals,
173 |                unsigned int* const d_inputPos,
174 |                unsigned int* const d_outputVals,
175 |                unsigned int* const d_outputPos,
176 |                size_t numElems)
177 | { 
178 |   //PUT YOUR SORT HERE
179 | 	int num_blocks = (int)((numElems + BLOCK_SIZE - 1) / BLOCK_SIZE); //integer ceil division
180 | 	
181 | 	unsigned int h_histo[2];
182 | 	h_histo[0] = 0;
183 | 
184 | 	unsigned int* d_histo;
185 | 	unsigned int* d_pred;
186 | 	unsigned int* d_scan_true;
187 | 	unsigned int* d_scan_false;
188 | 	
189 | 	checkCudaErrors(cudaMalloc(&d_histo, 2 * sizeof(unsigned int)));
190 | 	checkCudaErrors(cudaMalloc(&d_pred, numElems*sizeof(unsigned int)));
191 | 	checkCudaErrors(cudaMalloc(&d_scan_true, numElems * sizeof(unsigned int)));
192 | 	checkCudaErrors(cudaMalloc(&d_scan_false, numElems * sizeof(unsigned int)));
193 | 	//for each of the 32 bits
194 | 	for (size_t i = 0; i < 32; i++) {
195 | 
196 | 		//compute predicate
197 | 		if (i % 2 == 0)predicate << <num_blocks, BLOCK_SIZE >> > (d_pred, d_inputVals, numElems, i);
198 | 		else predicate << <num_blocks, BLOCK_SIZE >> > (d_pred, d_outputVals, numElems, i);
199 | 		cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
200 | 
201 | 		
202 | 	
203 | 		//Exclusive prefix sum of the 2-bin histogram is [0, numFalse].
204 | 		//numFalse is a sum-reduce of the negated predicate, which biellochScan returns as its total
205 | 
206 | 		//Compute offsets of elements whose bit is 1
207 | 		//Blelloch scan
208 | 		unsigned int number_trues=biellochScan(d_scan_true, d_pred, numElems);
209 | 
210 | 		//Flip the predicate so it marks elements whose bit is 0
211 | 		negatePredicate << <num_blocks, BLOCK_SIZE >> > (d_pred, numElems);
212 | 		cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
213 | 
214 | 		//Compute offsets of elements whose bit is 0
215 | 		unsigned int number_falses=biellochScan(d_scan_false, d_pred, numElems);
216 | 
217 | 		h_histo[1] = number_falses;
218 | 		checkCudaErrors(cudaMemcpy(d_histo, h_histo, 2 * sizeof(unsigned int), cudaMemcpyHostToDevice));
219 | 
220 | 		//Moving elements and indices
221 | 		if (i % 2 == 0) {
222 | 			moveElements << <num_blocks, BLOCK_SIZE >> > (d_outputVals, d_inputVals, d_histo, d_pred, d_scan_true, d_scan_false, numElems);
223 | 			cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
224 | 			moveElements << <num_blocks, BLOCK_SIZE >> > (d_outputPos, d_inputPos, d_histo, d_pred, d_scan_true, d_scan_false, numElems);
225 | 
226 | 		}
227 | 		else {
228 | 			moveElements << <num_blocks, BLOCK_SIZE >> > (d_inputVals, d_outputVals, d_histo, d_pred, d_scan_true, d_scan_false, numElems);
229 | 			cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
230 | 			moveElements << <num_blocks, BLOCK_SIZE >> > (d_inputPos, d_outputPos, d_histo, d_pred, d_scan_true, d_scan_false, numElems);
231 | 
232 | 		}
233 | 		cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
234 | 
235 | 	}
236 | 
237 | 	//After 32 (an even number of) ping-pong passes the sorted data is back in the input buffers: copy it into the outputs
238 | 	checkCudaErrors(cudaMemcpy(d_outputVals, d_inputVals, numElems * sizeof(unsigned int), cudaMemcpyDeviceToDevice));
239 | 	checkCudaErrors(cudaMemcpy(d_outputPos, d_inputPos, numElems * sizeof(unsigned int), cudaMemcpyDeviceToDevice));
240 | 
241 | 	
242 | 	checkCudaErrors(cudaFree(d_histo));
243 | 	checkCudaErrors(cudaFree(d_pred));
244 | 	checkCudaErrors(cudaFree(d_scan_true));
245 | 	checkCudaErrors(cudaFree(d_scan_false));
246 | 
247 | }
248 | 
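The four steps in the header comment of student_func.cu can be checked on the host. This sketch (hypothetical helper names, plain C++ rather than CUDA) reproduces the comment's example [0 0 1 1 0 0 1] -> [0 1 0 1 2 3 2] and the destination index that moveElements derives from the predicate, the two exclusive scans and the scanned histogram [0, numFalse]:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Step 3: relative offset of each element among elements with the same bit.
// Sequentially this is two running exclusive scans, one per bit value.
std::vector<std::size_t> relative_offsets(const std::vector<unsigned int>& bits)
{
  std::vector<std::size_t> rel(bits.size());
  std::size_t seen[2] = {0, 0};
  for (std::size_t j = 0; j < bits.size(); ++j)
    rel[j] = seen[bits[j] ? 1 : 0]++;
  return rel;
}

// Steps 1, 2 and 4: histogram [numFalse, numTrue], its exclusive scan
// [0, numFalse], and the final destination = scanned histogram + offset.
std::vector<std::size_t> split_destinations(const std::vector<unsigned int>& bits)
{
  std::size_t numFalse = 0;                         // histogram bin 0
  for (unsigned int b : bits) numFalse += (b == 0);
  const std::vector<std::size_t> rel = relative_offsets(bits);
  std::vector<std::size_t> dest(bits.size());
  for (std::size_t j = 0; j < bits.size(); ++j)
    dest[j] = bits[j] ? numFalse + rel[j] : rel[j];
  return dest;
}
```

For the bits [0 0 1 1 0 0 1] there are four zeros, so the destinations are [0 1 4 5 2 3 6]: zeros fill slots 0 to 3 in their original order and ones follow from slot 4, which keeps each pass stable.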

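bielloch_scan implements the work-efficient Blelloch exclusive scan (up-sweep, then down-sweep) over the 2 * blockDim.x elements of one block in shared memory. The sketch below replays the same index arithmetic sequentially on the host, with the loop over `tid` standing in for the threads. It is illustrative only and assumes a power-of-two length, which the kernel ensures by zero-padding each block to 2 * blockDim.x elements (2048 for BLOCK_SIZE = 1024):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential replay of bielloch_scan for one block (illustrative only).
// data.size() must be a non-zero power of two; each iteration of the inner
// `tid` loops corresponds to one thread, and each outer iteration to one
// __syncthreads()-separated level of the tree.
std::vector<unsigned int> blelloch_exclusive_scan(std::vector<unsigned int> data)
{
  const std::size_t n = data.size();
  assert(n > 0 && (n & (n - 1)) == 0);
  std::size_t offset = 1;

  // Up-sweep (reduce): partial sums accumulate toward data[n - 1].
  for (std::size_t d = n >> 1; d > 0; d >>= 1) {
    for (std::size_t tid = 0; tid < d; ++tid) {
      const std::size_t ai = offset * (2 * tid + 1) - 1;
      const std::size_t bi = offset * (2 * tid + 2) - 1;
      data[bi] += data[ai];
    }
    offset <<= 1;
  }

  // Clearing the root discards the block total, which is why the kernel
  // rebuilds blockSums from the last exclusive value plus the last input.
  data[n - 1] = 0;

  // Down-sweep: swap-and-add pushes prefix sums back down the tree.
  for (std::size_t d = 1; d < n; d <<= 1) {
    offset >>= 1;
    for (std::size_t tid = 0; tid < d; ++tid) {
      const std::size_t ai = offset * (2 * tid + 1) - 1;
      const std::size_t bi = offset * (2 * tid + 2) - 1;
      const unsigned int t = data[ai];
      data[ai] = data[bi];
      data[bi] += t;
    }
  }
  return data;
}
```

For {3, 1, 7, 0, 4, 1, 6, 3} this yields {0, 3, 4, 11, 11, 15, 16, 22}; the block total 25 is the last exclusive value 22 plus the last input 3, matching how blockSums is filled.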

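biellochScan handles inputs longer than one block by scanning each 2 * BLOCK_SIZE chunk, scanning the per-block totals recursively, and adding them back with adjustIncrement. A host-side model of that structure, assuming a hypothetical function name, a small `chunk` parameter in place of 2 * BLOCK_SIZE, and a plain sequential scan inside each chunk instead of the Blelloch tree:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host model of biellochScan's hierarchy (illustrative; hypothetical name).
// `chunk` stands in for 2 * BLOCK_SIZE and must be >= 2 so the recursion
// on the chunk totals always shrinks. Returns the grand total, like the
// value biellochScan returns.
unsigned int hierarchical_exclusive_scan(const std::vector<unsigned int>& in,
                                         std::vector<unsigned int>& out,
                                         std::size_t chunk)
{
  assert(chunk >= 2);
  const std::size_t n = in.size();
  const std::size_t numChunks = (n + chunk - 1) / chunk;  // integer ceil
  out.assign(n, 0);
  std::vector<unsigned int> chunkSums(numChunks, 0);

  // 1) exclusive scan inside each chunk (one CUDA block each) + its total
  for (std::size_t c = 0; c < numChunks; ++c) {
    unsigned int running = 0;
    for (std::size_t j = c * chunk; j < n && j < (c + 1) * chunk; ++j) {
      out[j] = running;
      running += in[j];
    }
    chunkSums[c] = running;  // what bielloch_scan writes to blockSums[c]
  }
  if (numChunks <= 1) return numChunks == 1 ? chunkSums[0] : 0u;

  // 2) scan the chunk totals recursively, as biellochScan does
  std::vector<unsigned int> increments;
  const unsigned int total = hierarchical_exclusive_scan(chunkSums, increments, chunk);

  // 3) add each chunk's offset to all of its elements (adjustIncrement)
  for (std::size_t j = 0; j < n; ++j)
    out[j] += increments[j / chunk];
  return total;
}
```

Because each level shrinks the input by a factor of `chunk`, the recursion depth is logarithmic in numElems; with the real chunk of 2048 a single extra level already covers about four million elements.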
--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/timer.h:
--------------------------------------------------------------------------------
 1 | #ifndef GPU_TIMER_H__
 2 | #define GPU_TIMER_H__
 3 | 
 4 | #include <cuda_runtime.h>
 5 | 
 6 | struct GpuTimer
 7 | {
 8 |   cudaEvent_t start;
 9 |   cudaEvent_t stop;
10 | 
11 |   GpuTimer()
12 |   {
13 |     cudaEventCreate(&start);
14 |     cudaEventCreate(&stop);
15 |   }
16 | 
17 |   ~GpuTimer()
18 |   {
19 |     cudaEventDestroy(start);
20 |     cudaEventDestroy(stop);
21 |   }
22 | 
23 |   void Start()
24 |   {
25 |     cudaEventRecord(start, 0);
26 |   }
27 | 
28 |   void Stop()
29 |   {
30 |     cudaEventRecord(stop, 0);
31 |   }
32 | 
33 |   float Elapsed()
34 |   {
35 |     float elapsed;
36 |     cudaEventSynchronize(stop);
37 |     cudaEventElapsedTime(&elapsed, start, stop);
38 |     return elapsed;
39 |   }
40 | };
41 | 
42 | #endif  /* GPU_TIMER_H__ */
43 | 


--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/utils.h:
--------------------------------------------------------------------------------
 1 | #ifndef UTILS_H__
 2 | #define UTILS_H__
 3 | 
 4 | #include <iostream>
 5 | #include <iomanip>
 6 | #include <cuda.h>
 7 | #include <cuda_runtime.h>
 8 | #include <cuda_runtime_api.h>
 9 | #include <cassert>
10 | #include <cmath>
11 | #include <algorithm>
12 | 
13 | #define checkCudaErrors(val) check( (val), #val, __FILE__, __LINE__)
14 | 
15 | template<typename T>
16 | void check(T err, const char* const func, const char* const file, const int line) {
17 |   if (err != cudaSuccess) {
18 |     std::cerr << "CUDA error at: " << file << ":" << line << std::endl;
19 |     std::cerr << cudaGetErrorString(err) << " " << func << std::endl;
20 |     exit(1);
21 |   }
22 | }
23 | 
24 | template<typename T>
25 | void checkResultsExact(const T* const ref, const T* const gpu, size_t numElem) {
26 |   //check that the GPU result matches the CPU result
27 |   for (size_t i = 0; i < numElem; ++i) {
28 |     if (ref[i] != gpu[i]) {
29 |       std::cerr << "Difference at pos " << i << std::endl;
30 |       //the + is magic to convert char to int without messing
31 |       //with other types
32 |       std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
33 |                  "\nGPU      : " << +gpu[i] << std::endl;
34 |       exit(1);
35 |     }
36 |   }
37 | }
38 | 
39 | template<typename T>
40 | void checkResultsEps(const T* const ref, const T* const gpu, size_t numElem, double eps1, double eps2) {
41 |   assert(eps1 >= 0 && eps2 >= 0);
42 |   unsigned long long totalDiff = 0;
43 |   unsigned numSmallDifferences = 0;
44 |   for (size_t i = 0; i < numElem; ++i) {
45 |     //subtract smaller from larger in case of unsigned types
46 |     T smaller = std::min(ref[i], gpu[i]);
47 |     T larger = std::max(ref[i], gpu[i]);
48 |     T diff = larger - smaller;
49 |     if (diff > 0 && diff <= eps1) {
50 |       numSmallDifferences++;
51 |     }
52 |     else if (diff > eps1) {
53 |       std::cerr << "Difference at pos " << +i << " exceeds tolerance of " << eps1 << std::endl;
54 |       std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
55 |         "\nGPU      : " << +gpu[i] << std::endl;
56 |       exit(1);
57 |     }
58 |     totalDiff += diff * diff;
59 |   }
60 |   double percentSmallDifferences = (double)numSmallDifferences / (double)numElem;
61 |   if (percentSmallDifferences > eps2) {
62 |     std::cerr << "Total percentage of non-zero pixel difference between the two images exceeds " << 100.0 * eps2 << "%" << std::endl;
63 |     std::cerr << "Percentage of non-zero pixel differences: " << 100.0 * percentSmallDifferences << "%" << std::endl;
64 |     exit(1);
65 |   }
66 | }
67 | 
68 | //Uses the autodesk method of image comparison
69 | //Note that the tolerance here is in PIXELS, not a percentage of input pixels
70 | template<typename T>
71 | void checkResultsAutodesk(const T* const ref, const T* const gpu, size_t numElem, double variance, size_t tolerance)
72 | {
73 | 
74 |   size_t numBadPixels = 0;
75 |   for (size_t i = 0; i < numElem; ++i) {
76 |     T smaller = std::min(ref[i], gpu[i]);
77 |     T larger = std::max(ref[i], gpu[i]);
78 |     T diff = larger - smaller;
79 |     if (diff > variance)
80 |       ++numBadPixels;
81 |   }
82 | 
83 |   if (numBadPixels > tolerance) {
84 |     std::cerr << "Too many bad pixels in the image: " << numBadPixels << "/" << tolerance << std::endl;
85 |     exit(1);
86 |   }
87 | }
88 | 
89 | #endif
90 | 
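checkResultsAutodesk fails when more than `tolerance` elements differ by more than `variance`, and the unary + in these helpers promotes character types to int before printing. A hypothetical non-exiting variant (useful in unit tests, where `exit(1)` is unwelcome), plus a one-line illustration of the + promotion:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical non-exiting variant of checkResultsAutodesk: counts elements
// whose absolute difference exceeds `variance` and reports whether that count
// stays within `tolerance` (an absolute number of pixels, not a percentage).
template <typename T>
bool within_pixel_tolerance(const T* ref, const T* gpu, std::size_t numElem,
                            double variance, std::size_t tolerance,
                            std::size_t* numBadOut = nullptr)
{
  std::size_t numBad = 0;
  for (std::size_t i = 0; i < numElem; ++i) {
    // subtract smaller from larger so unsigned types cannot wrap around
    T diff = std::max(ref[i], gpu[i]) - std::min(ref[i], gpu[i]);
    if (diff > variance) ++numBad;
  }
  if (numBadOut) *numBadOut = numBad;
  return numBad <= tolerance;
}

// Unary + promotes unsigned char to int, so 200 prints as "200" instead of
// being written to the stream as a raw character byte.
std::string format_value(unsigned char v)
{
  std::ostringstream os;
  os << +v;
  return os.str();
}
```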


--------------------------------------------------------------------------------
/ProblemSet5-OptimizedHistogram/ProblemSet5-OptimizedHistogram.vcxproj:
--------------------------------------------------------------------------------
  1 | ﻿<?xml version="1.0" encoding="utf-8"?>
  2 | <Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  3 |   <ItemGroup Label="ProjectConfigurations">
  4 |     <ProjectConfiguration Include="Debug|Win32">
  5 |       <Configuration>Debug</Configuration>
  6 |       <Platform>Win32</Platform>
  7 |     </ProjectConfiguration>
  8 |     <ProjectConfiguration Include="Debug|x64">
  9 |       <Configuration>Debug</Configuration>
 10 |       <Platform>x64</Platform>
 11 |     </ProjectConfiguration>
 12 |     <ProjectConfiguration Include="Release|Win32">
 13 |       <Configuration>Release</Configuration>
 14 |       <Platform>Win32</Platform>
 15 |     </ProjectConfiguration>
 16 |     <ProjectConfiguration Include="Release|x64">
 17 |       <Configuration>Release</Configuration>
 18 |       <Platform>x64</Platform>
 19 |     </ProjectConfiguration>
 20 |   </ItemGroup>
 21 |   <PropertyGroup Label="Globals">
 22 |     <ProjectGuid>{0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}</ProjectGuid>
 23 |     <RootNamespace>ProblemSet5_OptimizedHistogram</RootNamespace>
 24 |   </PropertyGroup>
 25 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
 26 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
 27 |     <ConfigurationType>Application</ConfigurationType>
 28 |     <UseDebugLibraries>true</UseDebugLibraries>
 29 |     <CharacterSet>MultiByte</CharacterSet>
 30 |     <PlatformToolset>v140</PlatformToolset>
 31 |   </PropertyGroup>
 32 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
 33 |     <ConfigurationType>Application</ConfigurationType>
 34 |     <UseDebugLibraries>true</UseDebugLibraries>
 35 |     <CharacterSet>MultiByte</CharacterSet>
 36 |     <PlatformToolset>v140</PlatformToolset>
 37 |   </PropertyGroup>
 38 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
 39 |     <ConfigurationType>Application</ConfigurationType>
 40 |     <UseDebugLibraries>false</UseDebugLibraries>
 41 |     <WholeProgramOptimization>true</WholeProgramOptimization>
 42 |     <CharacterSet>MultiByte</CharacterSet>
 43 |     <PlatformToolset>v140</PlatformToolset>
 44 |   </PropertyGroup>
 45 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
 46 |     <ConfigurationType>Application</ConfigurationType>
 47 |     <UseDebugLibraries>false</UseDebugLibraries>
 48 |     <WholeProgramOptimization>true</WholeProgramOptimization>
 49 |     <CharacterSet>MultiByte</CharacterSet>
 50 |     <PlatformToolset>v140</PlatformToolset>
 51 |   </PropertyGroup>
 52 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
 53 |   <ImportGroup Label="ExtensionSettings">
 54 |     <Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 8.0.props" />
 55 |   </ImportGroup>
 56 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
 57 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 58 |   </ImportGroup>
 59 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
 60 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 61 |   </ImportGroup>
 62 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
 63 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 64 |   </ImportGroup>
 65 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
 66 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 67 |   </ImportGroup>
 68 |   <PropertyGroup Label="UserMacros" />
 69 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
 70 |     <LinkIncremental>true</LinkIncremental>
 71 |   </PropertyGroup>
 72 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
 73 |     <LinkIncremental>true</LinkIncremental>
 74 |   </PropertyGroup>
 75 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
 76 |     <ClCompile>
 77 |       <WarningLevel>Level3</WarningLevel>
 78 |       <Optimization>Disabled</Optimization>
 79 |       <PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
 80 |     </ClCompile>
 81 |     <Link>
 82 |       <GenerateDebugInformation>true</GenerateDebugInformation>
 83 |       <SubSystem>Console</SubSystem>
 84 |       <AdditionalDependencies>cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)</AdditionalDependencies>
 85 |     </Link>
 86 |     <PostBuildEvent>
 87 |       <Command>echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
 88 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"</Command>
 89 |     </PostBuildEvent>
 90 |   </ItemDefinitionGroup>
 91 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
 92 |     <ClCompile>
 93 |       <WarningLevel>Level3</WarningLevel>
 94 |       <Optimization>Disabled</Optimization>
 95 |       <PreprocessorDefinitions>WIN32;WIN64;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
 96 |     </ClCompile>
 97 |     <Link>
 98 |       <GenerateDebugInformation>true</GenerateDebugInformation>
 99 |       <SubSystem>Console</SubSystem>
100 |       <AdditionalDependencies>cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)</AdditionalDependencies>
101 |     </Link>
102 |     <PostBuildEvent>
103 |       <Command>echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
104 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"</Command>
105 |     </PostBuildEvent>
106 |     <CudaCompile>
107 |       <TargetMachinePlatform>64</TargetMachinePlatform>
108 |     </CudaCompile>
109 |   </ItemDefinitionGroup>
110 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
111 |     <ClCompile>
112 |       <WarningLevel>Level3</WarningLevel>
113 |       <Optimization>MaxSpeed</Optimization>
114 |       <FunctionLevelLinking>true</FunctionLevelLinking>
115 |       <IntrinsicFunctions>true</IntrinsicFunctions>
116 |       <PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
117 |     </ClCompile>
118 |     <Link>
119 |       <GenerateDebugInformation>true</GenerateDebugInformation>
120 |       <EnableCOMDATFolding>true</EnableCOMDATFolding>
121 |       <OptimizeReferences>true</OptimizeReferences>
122 |       <SubSystem>Console</SubSystem>
123 |       <AdditionalDependencies>cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)</AdditionalDependencies>
124 |     </Link>
125 |     <PostBuildEvent>
126 |       <Command>echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
127 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"</Command>
128 |     </PostBuildEvent>
129 |   </ItemDefinitionGroup>
130 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
131 |     <ClCompile>
132 |       <WarningLevel>Level3</WarningLevel>
133 |       <Optimization>MaxSpeed</Optimization>
134 |       <FunctionLevelLinking>true</FunctionLevelLinking>
135 |       <IntrinsicFunctions>true</IntrinsicFunctions>
136 |       <PreprocessorDefinitions>WIN32;WIN64;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
137 |     </ClCompile>
138 |     <Link>
139 |       <GenerateDebugInformation>true</GenerateDebugInformation>
140 |       <EnableCOMDATFolding>true</EnableCOMDATFolding>
141 |       <OptimizeReferences>true</OptimizeReferences>
142 |       <SubSystem>Console</SubSystem>
143 |       <AdditionalDependencies>cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)</AdditionalDependencies>
144 |     </Link>
145 |     <PostBuildEvent>
146 |       <Command>echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
147 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"</Command>
148 |     </PostBuildEvent>
149 |     <CudaCompile>
150 |       <TargetMachinePlatform>64</TargetMachinePlatform>
151 |     </CudaCompile>
152 |   </ItemDefinitionGroup>
153 |   <ItemGroup>
154 |     <CudaCompile Include="main.cu" />
155 |     <CudaCompile Include="student.cu" />
156 |   </ItemGroup>
157 |   <ItemGroup>
158 |     <ClCompile Include="reference_calc.cpp" />
159 |   </ItemGroup>
160 |   <ItemGroup>
161 |     <ClInclude Include="reference_calc.h" />
162 |     <ClInclude Include="timer.h" />
163 |     <ClInclude Include="utils.h" />
164 |   </ItemGroup>
165 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
166 |   <ImportGroup Label="ExtensionTargets">
167 |     <Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 8.0.targets" />
168 |   </ImportGroup>
169 | </Project>


--------------------------------------------------------------------------------
/ProblemSet5-OptimizedHistogram/main.cu:
--------------------------------------------------------------------------------
  1 | #include <cstdlib>
  2 | #include <iostream>
  3 | #include <cstdio>
  4 | #include <fstream>
  5 | #include "utils.h"
  6 | #include "timer.h"
  7 | #include <cstdio>
  8 | #if defined(_WIN16) || defined(_WIN32) || defined(_WIN64)
  9 | #include <Windows.h>
 10 | #else
 11 | #include <sys/time.h>
 12 | #endif
 13 | 
 14 | #include <thrust/random/linear_congruential_engine.h>
 15 | #include <thrust/random/normal_distribution.h>
 16 | #include <thrust/random/uniform_int_distribution.h>
 17 | 
 18 | #include "reference_calc.h"
 19 | 
 20 | void computeHistogram(const unsigned int *const d_vals,
 21 |                       unsigned int* const d_histo,
 22 |                       const unsigned int numBins,
 23 |                       const unsigned int numElems);
 24 | 
 25 | int main(void)
 26 | {
 27 |   const unsigned int numBins = 1024;
 28 |   const unsigned int numElems = 10000 * numBins;
 29 |   const float stddev = 100.f;
 30 | 
 31 |   unsigned int *vals = new unsigned int[numElems];
 32 |   unsigned int *h_vals = new unsigned int[numElems];
 33 |   unsigned int *h_studentHisto = new unsigned int[numBins];
 34 |   unsigned int *h_refHisto = new unsigned int[numBins];
 35 | 
 36 | #if defined(_WIN16) || defined(_WIN32) || defined(_WIN64)
 37 |   srand(GetTickCount());
 38 | #else
 39 |   timeval tv;
 40 |   gettimeofday(&tv, NULL);
 41 | 
 42 |   srand(tv.tv_usec);
 43 | #endif
 44 | 
 45 |   //make the mean unpredictable, but close enough to the middle
 46 |   //so that timings are unaffected
 47 |   unsigned int mean = rand() % 100 + 462;
 48 | 
 49 |   //Output mean so that grading can happen with the same inputs
 50 |   std::cout << mean << std::endl;
 51 | 
 52 |   thrust::minstd_rand rng;
 53 | 
 54 |   thrust::random::normal_distribution<float> normalDist((float)mean, stddev);
 55 | 
 56 | 
 57 | 
 58 |   // Generate the random values
 59 |   for (size_t i = 0; i < numElems; ++i) {
 60 |     vals[i] = std::min((unsigned int) std::max((int)normalDist(rng), 0), numBins - 1);
 61 |   }
 62 | 
 63 |   unsigned int *d_vals, *d_histo;
 64 | 
 65 |   GpuTimer timer; 
 66 | 
 67 |   checkCudaErrors(cudaMalloc(&d_vals,    sizeof(unsigned int) * numElems));
 68 |   checkCudaErrors(cudaMalloc(&d_histo,   sizeof(unsigned int) * numBins));
 69 |   checkCudaErrors(cudaMemset(d_histo, 0, sizeof(unsigned int) * numBins));
 70 | 
 71 |   checkCudaErrors(cudaMemcpy(d_vals, vals, sizeof(unsigned int) * numElems, cudaMemcpyHostToDevice));
 72 | 
 73 |   timer.Start();
 74 |   computeHistogram(d_vals, d_histo, numBins, numElems);
 75 |   timer.Stop();
 76 |   int err = printf("Your code ran in: %f msecs.\n", timer.Elapsed());
 77 | 
 78 |   if (err < 0) {
 79 |     //Couldn't print! Probably the student closed stdout - bad news
 80 |     std::cerr << "Couldn't print timing information! STDOUT Closed!" << std::endl;
 81 |     exit(1);
 82 |   }
 83 | 
 84 |   // copy the student-computed histogram back to the host
 85 |   checkCudaErrors(cudaMemcpy(h_studentHisto, d_histo, sizeof(unsigned int) * numBins, cudaMemcpyDeviceToHost));
 86 | 
 87 |   //generate reference for the given mean
 88 |   reference_calculation(vals, h_refHisto, numBins, numElems);
 89 | 
 90 |   //Now do the comparison
 91 |   checkResultsExact(h_refHisto, h_studentHisto, numBins);
 92 | 
 93 |   delete[] vals;  delete[] h_vals;
 94 |   delete[] h_refHisto;
 95 |   delete[] h_studentHisto;
 96 | 
 97 |   cudaFree(d_vals);
 98 |   cudaFree(d_histo);
 99 | 
100 |   return 0;
101 | }
102 | 
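
main.cu draws its inputs from a normal distribution centred a little below or above bin 512. It then clamps every sample into [0, numBins - 1], so each value is already a valid bin index. The following is a host-only sketch of that step. It assumes the C++ standard `<random>` engine and distribution in place of the thrust ones, so it will not reproduce the same sequence. `makeClampedNormalBins` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: numElems bin indices drawn from N(mean, stddev) and clamped to
// [0, numBins - 1], mirroring the clamping in main.cu. std::minstd_rand and
// std::normal_distribution stand in for the thrust engine/distribution (different sequence).
std::vector<unsigned int> makeClampedNormalBins(unsigned int numElems, unsigned int numBins,
                                                float mean, float stddev, unsigned int seed) {
  assert(numBins > 0);
  std::minstd_rand rng(seed);
  std::normal_distribution<float> dist(mean, stddev);
  std::vector<unsigned int> vals(numElems);
  for (unsigned int i = 0; i < numElems; ++i) {
    // Truncate toward zero like the (int) cast in main.cu, then clamp both ends.
    int v = std::max(static_cast<int>(dist(rng)), 0);
    vals[i] = std::min(static_cast<unsigned int>(v), numBins - 1);
  }
  return vals;
}
```

With a standard deviation of 100 and a mean near 512, both clamps almost never trigger. Most samples fall within a few hundred bins of the mean, which is the property the assignment suggests exploiting.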


--------------------------------------------------------------------------------
/ProblemSet5-OptimizedHistogram/reference_calc.cpp:
--------------------------------------------------------------------------------
 1 | #include <cstdlib>
 2 | //Reference Histogram calculation
 3 | 
 4 | void reference_calculation(const unsigned int* const vals,
 5 |                            unsigned int* const histo,
 6 |                            const size_t numBins,
 7 |                            const size_t numElems)
 8 | 
 9 | {
10 |   //zero out bins
11 |   for (size_t i = 0; i < numBins; ++i)
12 |     histo[i] = 0;
13 | 
14 |   //go through vals and increment appropriate bin
15 |   for (size_t i = 0; i < numElems; ++i)
16 |     histo[vals[i]]++;
17 | }
18 | 


--------------------------------------------------------------------------------
/ProblemSet5-OptimizedHistogram/reference_calc.h:
--------------------------------------------------------------------------------
 1 | #ifndef REFERENCE_H__
 2 | #define REFERENCE_H__
 3 | 
 4 | //Reference Histogram calculation
 5 | 
 6 | void reference_calculation(const unsigned int* const vals,
 7 |                            unsigned int* const histo,
 8 |                            const size_t numBins,
 9 |                            const size_t numElems);
10 | 
11 | #endif


--------------------------------------------------------------------------------
/ProblemSet5-OptimizedHistogram/student.cu:
--------------------------------------------------------------------------------
 1 | /* Udacity HW5
 2 |    Histogramming for Speed
 3 | 
 4 |    The goal of this assignment is to compute a histogram
 5 |    as fast as possible.  We have simplified the problem as much as
 6 |    possible to allow you to focus solely on the histogramming algorithm.
 7 | 
 8 |    The input values that you need to histogram are already the exact
 9 |    bins that need to be updated.  This is unlike in HW3 where you needed
10 |    to compute the range of the data and then do:
11 |    bin = (val - valMin) / valRange to determine the bin.
12 | 
13 |    Here the bin is just:
14 |    bin = val
15 | 
16 |    so the serial histogram calculation looks like:
17 |    for (i = 0; i < numElems; ++i)
18 |      histo[val[i]]++;
19 | 
20 |    That's it!  Your job is to make it run as fast as possible!
21 | 
22 |    The values are normally distributed - you may take
23 |    advantage of this fact in your implementation.
24 | 
25 | */
26 | 
27 | 
28 | #include "utils.h"
29 | #include "device_launch_parameters.h"
30 | #include <thrust/host_vector.h>
31 | 
32 | const int N_THREADS =  1024;
33 | 
34 | 
35 | 
36 | __global__
37 | void naiveHisto(const unsigned int* const vals, //INPUT
38 | 	unsigned int* const histo,      //OUTPUT
39 | 	int numVals)
40 | {
41 | 	int tid = threadIdx.x;
42 | 	int global_id = tid + blockDim.x*blockIdx.x;
43 | 	if (global_id >= numVals) return;
44 | 	atomicAdd(&(histo[vals[global_id]]), 1);
45 | }
46 | 
47 | __global__
48 | void perBlockHisto(const unsigned int* const vals, //INPUT
49 | 	unsigned int* const histo,      //OUTPUT
50 | 	int numVals,int numBins) {
51 | 
52 | 	extern __shared__ unsigned int sharedHisto[]; //size as original histo
53 | 
54 | 	//strided initialization: shared memory is private to each block, so every block zeroes its own copy of the histogram
55 | 	for (int i = threadIdx.x; i < numBins; i += blockDim.x) {
56 | 		sharedHisto[i] = 0;
57 | 	}
58 | 
59 | 	__syncthreads();
60 | 
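61 | 	int globalid = threadIdx.x + blockIdx.x*blockDim.x;
62 | 	if (globalid < numVals) atomicAdd(&sharedHisto[vals[globalid]], 1); //guard the tail block; no early return, since every thread must reach __syncthreads()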
63 | 	
64 | 	__syncthreads();
65 | 
66 | 	for (int i = threadIdx.x; i < numBins; i += blockDim.x) {
67 | 		atomicAdd(&histo[i], sharedHisto[i]);
68 | 	}
69 | 
70 | 
71 | }
72 | 
73 | 
74 | 
75 | void computeHistogram(const unsigned int* const d_vals, //INPUT
76 |                       unsigned int* const d_histo,      //OUTPUT
77 |                       const unsigned int numBins,
78 |                       const unsigned int numElems)
79 | {
80 |   //TODO Launch the yourHisto kernel
81 | 
82 | 	int blocks = (numElems + N_THREADS - 1) / N_THREADS; //integer ceiling: numElems / N_THREADS alone truncates before ceil() can round up
83 | 
84 | 	//naiveHisto <<< blocks, N_THREADS >>> (d_vals, d_histo, numElems);
85 | 
86 | 
87 | 	//more than 7x speedup over naiveHisto
88 | 	perBlockHisto<<<blocks, N_THREADS, sizeof(unsigned int) * numBins>>>(d_vals, d_histo, numElems, numBins);
89 | 
90 |   //if you want to use/launch more than one kernel,
91 |   //feel free
92 | 
93 |   cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
94 | }
95 | 
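
`perBlockHisto` privatizes the histogram. Each block counts into its own shared-memory copy, and only the merge step touches global memory, so global atomics drop from one per element to about one per bin per block. Grid sizing needs a true ceiling. With unsigned operands, `n / blockSize` truncates before any `ceil` call can round it up, so the integer idiom `(n + blockSize - 1) / blockSize` is used instead. Below is a host-only emulation of the same structure. It is not the kernel itself, and `numBlocksFor` and `privatizedHisto` are hypothetical names. It can be checked against `reference_calculation`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Number of blocks needed to cover n elements: integer ceiling division.
unsigned int numBlocksFor(unsigned int n, unsigned int blockSize) {
  assert(blockSize > 0);
  return (n + blockSize - 1) / blockSize;
}

// Host emulation of a privatized histogram: each "block" counts into a private
// histogram (the role shared memory plays on the GPU), then adds it into the global one.
std::vector<unsigned int> privatizedHisto(const std::vector<unsigned int>& vals,
                                          unsigned int numBins, unsigned int blockSize) {
  std::vector<unsigned int> histo(numBins, 0);
  const unsigned int n = static_cast<unsigned int>(vals.size());
  const unsigned int blocks = numBlocksFor(n, blockSize);
  for (unsigned int b = 0; b < blocks; ++b) {
    std::vector<unsigned int> local(numBins, 0);              // zeroed per block
    const unsigned int begin = b * blockSize;
    const unsigned int end = std::min(begin + blockSize, n);   // tail block may be partial
    for (unsigned int i = begin; i < end; ++i) ++local[vals[i]];
    for (unsigned int bin = 0; bin < numBins; ++bin) histo[bin] += local[bin];  // merge
  }
  return histo;
}
```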


--------------------------------------------------------------------------------
/ProblemSet5-OptimizedHistogram/timer.h:
--------------------------------------------------------------------------------
 1 | #ifndef GPU_TIMER_H__
 2 | #define GPU_TIMER_H__
 3 | 
 4 | #include <cuda_runtime.h>
 5 | 
 6 | struct GpuTimer
 7 | {
 8 |   cudaEvent_t start;
 9 |   cudaEvent_t stop;
10 | 
11 |   GpuTimer()
12 |   {
13 |     cudaEventCreate(&start);
14 |     cudaEventCreate(&stop);
15 |   }
16 | 
17 |   ~GpuTimer()
18 |   {
19 |     cudaEventDestroy(start);
20 |     cudaEventDestroy(stop);
21 |   }
22 | 
23 |   void Start()
24 |   {
25 |     cudaEventRecord(start, 0);
26 |   }
27 | 
28 |   void Stop()
29 |   {
30 |     cudaEventRecord(stop, 0);
31 |   }
32 | 
33 |   float Elapsed()
34 |   {
35 |     float elapsed;
36 |     cudaEventSynchronize(stop);
37 |     cudaEventElapsedTime(&elapsed, start, stop);
38 |     return elapsed;
39 |   }
40 | };
41 | 
42 | #endif  /* GPU_TIMER_H__ */
43 | 
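
`GpuTimer` records CUDA events on the default stream. `Elapsed()` waits on the stop event because kernel launches return before the GPU has finished. Host-only work such as `reference_calculation` can be timed with an ordinary monotonic clock instead. What follows is a minimal sketch of such a counterpart, assuming `std::chrono::steady_clock`. `CpuTimer` is a hypothetical name and is not part of this repository.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical host-side counterpart to GpuTimer, built on std::chrono::steady_clock
// (monotonic, so unaffected by wall-clock adjustments). Reports milliseconds like GpuTimer.
struct CpuTimer {
  std::chrono::steady_clock::time_point start, stop;

  void Start() { start = std::chrono::steady_clock::now(); }
  void Stop() { stop = std::chrono::steady_clock::now(); }

  float Elapsed() const {
    assert(stop >= start);  // Stop() must follow Start()
    // No synchronization needed: host code has already finished when Stop() returns.
    std::chrono::duration<float, std::milli> ms = stop - start;
    return ms.count();
  }
};
```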


--------------------------------------------------------------------------------
/ProblemSet5-OptimizedHistogram/utils.h:
--------------------------------------------------------------------------------
 1 | #ifndef UTILS_H__
 2 | #define UTILS_H__
 3 | 
 4 | #include <iostream>
 5 | #include <iomanip>
 6 | #include <cuda.h>
 7 | #include <cuda_runtime.h>
 8 | #include <cuda_runtime_api.h>
 9 | #include <cassert>
10 | #include <cmath>
11 | #include <algorithm>
12 | 
13 | #define checkCudaErrors(val) check( (val), #val, __FILE__, __LINE__)
14 | 
15 | template<typename T>
16 | void check(T err, const char* const func, const char* const file, const int line) {
17 |   if (err != cudaSuccess) {
18 |     std::cerr << "CUDA error at: " << file << ":" << line << std::endl;
19 |     std::cerr << cudaGetErrorString(err) << " " << func << std::endl;
20 |     exit(1);
21 |   }
22 | }
23 | 
24 | template<typename T>
25 | void checkResultsExact(const T* const ref, const T* const gpu, size_t numElem) {
26 |   //check that the GPU result matches the CPU result
27 |   for (size_t i = 0; i < numElem; ++i) {
28 |     if (ref[i] != gpu[i]) {
29 |       std::cerr << "Difference at pos " << i << std::endl;
30 |       //the + is magic to convert char to int without messing
31 |       //with other types
32 |       std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
33 |                  "\nGPU      : " << +gpu[i] << std::endl;
34 |       exit(1);
35 |     }
36 |   }
37 | }
38 | 
39 | template<typename T>
40 | void checkResultsEps(const T* const ref, const T* const gpu, size_t numElem, double eps1, double eps2) {
41 |   assert(eps1 >= 0 && eps2 >= 0);
42 |   unsigned long long totalDiff = 0;
43 |   unsigned numSmallDifferences = 0;
44 |   for (size_t i = 0; i < numElem; ++i) {
45 |     //subtract smaller from larger in case of unsigned types
46 |     T smaller = std::min(ref[i], gpu[i]);
47 |     T larger = std::max(ref[i], gpu[i]);
48 |     T diff = larger - smaller;
49 |     if (diff > 0 && diff <= eps1) {
50 |       numSmallDifferences++;
51 |     }
52 |     else if (diff > eps1) {
53 |       std::cerr << "Difference at pos " << +i << " exceeds tolerance of " << eps1 << std::endl;
54 |       std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
55 |         "\nGPU      : " << +gpu[i] << std::endl;
56 |       exit(1);
57 |     }
58 |     totalDiff += diff * diff;
59 |   }
60 |   double percentSmallDifferences = (double)numSmallDifferences / (double)numElem;
61 |   if (percentSmallDifferences > eps2) {
62 |     std::cerr << "Total percentage of non-zero pixel difference between the two images exceeds " << 100.0 * eps2 << "%" << std::endl;
63 |     std::cerr << "Percentage of non-zero pixel differences: " << 100.0 * percentSmallDifferences << "%" << std::endl;
64 |     exit(1);
65 |   }
66 | }
67 | 
68 | //Uses the autodesk method of image comparison
69 | //Note that the tolerance here is in PIXELS, not a percentage of input pixels
70 | template<typename T>
71 | void checkResultsAutodesk(const T* const ref, const T* const gpu, size_t numElem, double variance, size_t tolerance)
72 | {
73 | 
74 |   size_t numBadPixels = 0;
75 |   for (size_t i = 0; i < numElem; ++i) {
76 |     T smaller = std::min(ref[i], gpu[i]);
77 |     T larger = std::max(ref[i], gpu[i]);
78 |     T diff = larger - smaller;
79 |     if (diff > variance)
80 |       ++numBadPixels;
81 |   }
82 | 
83 |   if (numBadPixels > tolerance) {
84 |     std::cerr << "Too many bad pixels in the image: " << numBadPixels << "/" << tolerance << std::endl;
85 |     exit(1);
86 |   }
87 | }
88 | 
89 | #endif
90 | 
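
`checkResultsEps` applies two tolerances. No single element may differ by more than `eps1`, and the fraction of elements with a small nonzero difference may not exceed `eps2`. The sketch below restates that rule as a predicate that returns `false` instead of calling `exit(1)`, which makes it easier to exercise directly. `withinEps` is a hypothetical name, and the sketch is fixed to `unsigned char` data for simplicity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical predicate with the same two-level rule as checkResultsEps:
//  - any element differing by more than eps1 fails immediately;
//  - otherwise the fraction of elements with 0 < diff <= eps1 must not exceed eps2.
bool withinEps(const std::vector<unsigned char>& ref, const std::vector<unsigned char>& gpu,
               double eps1, double eps2) {
  assert(ref.size() == gpu.size() && eps1 >= 0 && eps2 >= 0);
  std::size_t numSmall = 0;
  for (std::size_t i = 0; i < ref.size(); ++i) {
    // Larger minus smaller, so the difference is never negative.
    int diff = std::max(ref[i], gpu[i]) - std::min(ref[i], gpu[i]);
    if (diff > eps1) return false;  // per-element tolerance exceeded
    if (diff > 0) ++numSmall;
  }
  return ref.empty() || static_cast<double>(numSmall) / ref.size() <= eps2;
}
```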


--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/HW6.cu:
--------------------------------------------------------------------------------
 1 | #include "utils.h"
 2 | #include <cuda.h>
 3 | #include <cuda_runtime.h>
 4 | #include <string>
 5 | #include <iostream>
 6 | 
 7 | #include "loadSaveImage.h"
 8 | #include <stdio.h>
 9 | 
10 | 
11 | //return types are void since any internal error will be handled by quitting
12 | //no point in returning error codes...
13 | void preProcess( uchar4 **sourceImg,
14 |                  size_t &numRows,  size_t &numCols,
15 |                  uchar4 **destImg, 
16 |                  uchar4 **blendedImg, const std::string& source_filename,
17 |                  const std::string& dest_filename){
18 | 
19 |   //make sure the context initializes ok
20 |   checkCudaErrors(cudaFree(0));
21 | 
22 |   size_t numRowsSource, numColsSource, numRowsDest, numColsDest;
23 | 
24 |   loadImageRGBA(source_filename, sourceImg, &numRowsSource, &numColsSource);
25 |   loadImageRGBA(dest_filename, destImg, &numRowsDest, &numColsDest);
26 | 
27 |   assert(numRowsSource == numRowsDest);
28 |   assert(numColsSource == numColsDest);
29 | 
30 |   numRows = numRowsSource;
31 |   numCols = numColsSource;
32 | 
33 |   *blendedImg = new uchar4[numRows * numCols];
34 | 
35 | }
36 | 
37 | void postProcess(const uchar4* const blendedImg,
38 |                  const size_t numRowsDest, const size_t numColsDest,
39 |                  const std::string& output_file)
40 | {
41 |   //just need to save the image...
42 |   saveImageRGBA(blendedImg, numRowsDest, numColsDest, output_file);
43 | }
44 | 


--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/HW6_differenceImage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet6-SeamlessImageCloning/HW6_differenceImage.png


--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/HW6_output.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet6-SeamlessImageCloning/HW6_output.png


--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/HW6_reference.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet6-SeamlessImageCloning/HW6_reference.png


--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/blended.gold:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet6-SeamlessImageCloning/blended.gold


--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/compare.cpp:
--------------------------------------------------------------------------------
 1 | #include <opencv2/opencv.hpp>
 2 | #include "utils.h"
 3 | 
 4 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck,
 5 | 				   double perPixelError, double globalError)
 6 | {
 7 |   cv::Mat reference = cv::imread(reference_filename, -1);
 8 |   cv::Mat test = cv::imread(test_filename, -1);
 9 | 
10 |   cv::Mat diff = abs(reference - test);
11 | 
12 |   cv::Mat diffSingleChannel = diff.reshape(1, 0); //convert to 1 channel, same # rows
13 | 
14 |   double minVal, maxVal;
15 | 
16 |   cv::minMaxLoc(diffSingleChannel, &minVal, &maxVal, NULL, NULL); //NULL because we don't care about location
17 | 
18 |   //now perform transform so that we bump values to the full range
19 | 
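20 |   if (maxVal > minVal) diffSingleChannel = (diffSingleChannel - minVal) * (255. / (maxVal - minVal)); //skip when the range is zero (e.g. identical images) to avoid dividing by zero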
21 | 
22 |   diff = diffSingleChannel.reshape(reference.channels(), 0);
23 | 
24 |   cv::imwrite("HW6_differenceImage.png", diff);
25 |   //OK, now we can start comparing values...
26 |   unsigned char *referencePtr = reference.ptr<unsigned char>(0);
27 |   unsigned char *testPtr = test.ptr<unsigned char>(0);
28 | 
29 |   if (useEpsCheck) {
30 |     checkResultsEps(referencePtr, testPtr, reference.rows * reference.cols * reference.channels(), perPixelError, globalError);
31 |   }
32 |   else
33 |   {
34 |     checkResultsExact(referencePtr, testPtr, reference.rows * reference.cols * reference.channels());
35 |   }
36 | 
37 |   std::cout << "PASS" << std::endl;
38 |   return;
39 | }
40 | 
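
Before saving `HW6_differenceImage.png`, `compareImages` stretches the absolute differences so the smallest maps to 0 and the largest maps to 255. That makes faint errors visible. The following is a host-only sketch of that mapping on flat byte buffers. It adds an explicit zero-range guard, because identical images would otherwise divide by zero. `stretchDiff` is a hypothetical name, and the round-to-nearest choice is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper: |ref - test| per byte, linearly stretched to 0..255.
// When all differences are equal (e.g. identical images) an all-zero buffer is returned.
std::vector<unsigned char> stretchDiff(const std::vector<unsigned char>& ref,
                                       const std::vector<unsigned char>& test) {
  assert(ref.size() == test.size());
  std::vector<unsigned char> out(ref.size(), 0);
  if (ref.empty()) return out;

  std::vector<int> diff(ref.size());
  for (std::size_t i = 0; i < ref.size(); ++i)
    diff[i] = std::abs(static_cast<int>(ref[i]) - static_cast<int>(test[i]));

  auto mm = std::minmax_element(diff.begin(), diff.end());
  const int minVal = *mm.first, maxVal = *mm.second;
  if (maxVal == minVal) return out;  // zero range: nothing to stretch

  for (std::size_t i = 0; i < diff.size(); ++i)
    // Linear map [minVal, maxVal] -> [0, 255], rounded to nearest.
    out[i] = static_cast<unsigned char>((diff[i] - minVal) * 255.0 / (maxVal - minVal) + 0.5);
  return out;
}
```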


--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/compare.h:
--------------------------------------------------------------------------------
1 | #ifndef HW3_H__
2 | #define HW3_H__
3 | #include <string>
4 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck,
5 | 				   double perPixelError, double globalError);
6 | 
7 | #endif
8 | 


--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/destination.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet6-SeamlessImageCloning/destination.png


--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/loadSaveImage.cpp:
--------------------------------------------------------------------------------
  1 | #include <opencv2/core/core.hpp>
  2 | #include <opencv2/highgui/highgui.hpp>
  3 | #include <opencv2/opencv.hpp>
  4 | #include <vector>
  5 | #include "cuda_runtime.h"
  6 | 
  7 | //The caller becomes responsible for the returned pointer. This
  8 | //is done in the interest of keeping this code as simple as possible.
  9 | //In production code this is a bad idea - we should use RAII
 10 | //to ensure the memory is freed.  DO NOT COPY THIS AND USE IN PRODUCTION
 11 | //CODE!!!
 12 | void loadImageHDR(const std::string &filename,
 13 |                   float **imagePtr,
 14 |                   size_t *numRows, size_t *numCols)
 15 | {
 16 |   cv::Mat image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR | CV_LOAD_IMAGE_ANYDEPTH);
 17 |   if (image.empty()) {
 18 |     std::cerr << "Couldn't open file: " << filename << std::endl;
 19 |     exit(1);
 20 |   }
 21 | 
 22 |   if (image.channels() != 3) {
 23 |     std::cerr << "Image must be color!" << std::endl;
 24 |     exit(1);
 25 |   }
 26 | 
 27 |   if (!image.isContinuous()) {
 28 |     std::cerr << "Image isn't continuous!" << std::endl;
 29 |     exit(1);
 30 |   }
 31 | 
 32 |   *imagePtr = new float[image.rows * image.cols * image.channels()];
 33 | 
 34 |   float *cvPtr = image.ptr<float>(0);
 35 |   for (size_t i = 0; i < image.rows * image.cols * image.channels(); ++i)
 36 |     (*imagePtr)[i] = cvPtr[i];
 37 | 
 38 |   *numRows = image.rows;
 39 |   *numCols = image.cols;
 40 | }
 41 | 
 42 | void loadImageGrey(const std::string &filename,
 43 |                    unsigned char **imagePtr,
 44 |                    size_t *numRows, size_t *numCols)
 45 | {
 46 |   cv::Mat image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_GRAYSCALE);
 47 |   if (image.empty()) {
 48 |     std::cerr << "Couldn't open file: " << filename << std::endl;
 49 |     exit(1);
 50 |   }
 51 | 
 52 |   if (image.channels() != 1) {
 53 |     std::cerr << "Image must be greyscale!" << std::endl;
 54 |     exit(1);
 55 |   }
 56 | 
 57 |   if (!image.isContinuous()) {
 58 |     std::cerr << "Image isn't continuous!" << std::endl;
 59 |     exit(1);
 60 |   }
 61 | 
 62 |   *imagePtr = new unsigned char[image.rows * image.cols];
 63 | 
 64 |   unsigned char *cvPtr = image.ptr<unsigned char>(0);
 65 |   for (size_t i = 0; i < image.rows * image.cols; ++i) {
 66 |     (*imagePtr)[i] = cvPtr[i];
 67 |   }
 68 | 
 69 |   *numRows = image.rows;
 70 |   *numCols = image.cols;
 71 | }
 72 | void loadImageRGBA(const std::string &filename,
 73 |                    uchar4 **imagePtr,
 74 |                    size_t *numRows, size_t *numCols)
 75 | {
 76 |   cv::Mat image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR);
 77 |   if (image.empty()) {
 78 |     std::cerr << "Couldn't open file: " << filename << std::endl;
 79 |     exit(1);
 80 |   }
 81 | 
 82 |   if (image.channels() != 3) {
 83 |     std::cerr << "Image must be color!" << std::endl;
 84 |     exit(1);
 85 |   }
 86 | 
 87 |   if (!image.isContinuous()) {
 88 |     std::cerr << "Image isn't continuous!" << std::endl;
 89 |     exit(1);
 90 |   }
 91 | 
 92 |   cv::Mat imageRGBA;
 93 |   cv::cvtColor(image, imageRGBA, CV_BGR2RGBA);
 94 | 
 95 |   *imagePtr = new uchar4[image.rows * image.cols];
 96 | 
 97 |   unsigned char *cvPtr = imageRGBA.ptr<unsigned char>(0);
 98 |   for (size_t i = 0; i < image.rows * image.cols; ++i) {
 99 |     (*imagePtr)[i].x = cvPtr[4 * i + 0];
100 |     (*imagePtr)[i].y = cvPtr[4 * i + 1];
101 |     (*imagePtr)[i].z = cvPtr[4 * i + 2];
102 |     (*imagePtr)[i].w = cvPtr[4 * i + 3];
103 |   }
104 | 
105 |   *numRows = image.rows;
106 |   *numCols = image.cols;
107 | }
108 | 
109 | void saveImageRGBA(const uchar4* const image,
110 |                    const size_t numRows, const size_t numCols,
111 |                    const std::string &output_file)
112 | {
113 |   int sizes[2];
114 |   sizes[0] = numRows;
115 |   sizes[1] = numCols;
116 |   cv::Mat imageRGBA(2, sizes, CV_8UC4, (void *)image);
117 |   cv::Mat imageOutputBGR;
118 |   cv::cvtColor(imageRGBA, imageOutputBGR, CV_RGBA2BGR);
119 |   //output the image
120 |   cv::imwrite(output_file.c_str(), imageOutputBGR);
121 | }
122 | 
123 | //output an exr file
124 | //assumed to already be BGR
125 | void saveImageHDR(const float* const image,
126 |                   const size_t numRows, const size_t numCols,
127 |                   const std::string &output_file)
128 | {
129 |   int sizes[2];
130 |   sizes[0] = numRows;
131 |   sizes[1] = numCols;
132 | 
133 |   cv::Mat imageHDR(2, sizes, CV_32FC3, (void *)image);
134 | 
135 |   imageHDR = imageHDR * 255;
136 | 
137 |   cv::imwrite(output_file.c_str(), imageHDR);
138 | }
139 | 
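
`loadImageRGBA` lets OpenCV read the file as interleaved BGR bytes. It converts to RGBA with `cvtColor` and then copies each pixel into a `uchar4`, with `x`, `y`, `z`, `w` holding R, G, B, A. Below is a standalone sketch of the combined reorder-and-pack step. It assumes a local `RGBA` struct as a stand-in for CUDA's `uchar4`, so it compiles without the CUDA headers. It also assumes an opaque alpha of 255, which is what OpenCV writes when it adds an alpha channel to 8-bit data. `bgrToRGBA` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's uchar4 (same member names).
struct RGBA { unsigned char x, y, z, w; };

// Interleaved 8-bit BGR (OpenCV's default order) -> packed RGBA structs.
std::vector<RGBA> bgrToRGBA(const std::vector<unsigned char>& bgr) {
  assert(bgr.size() % 3 == 0);
  std::vector<RGBA> out(bgr.size() / 3);
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i].x = bgr[3 * i + 2];  // R
    out[i].y = bgr[3 * i + 1];  // G
    out[i].z = bgr[3 * i + 0];  // B
    out[i].w = 255;             // opaque alpha
  }
  return out;
}
```

`saveImageRGBA` performs the inverse. It wraps the `uchar4` buffer in a 4-channel `cv::Mat`, converts it back to BGR, and drops alpha before writing.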


--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/loadSaveImage.h:
--------------------------------------------------------------------------------
 1 | #ifndef LOADSAVEIMAGE_H__
 2 | #define LOADSAVEIMAGE_H__
 3 | 
 4 | #include <string>
 5 | #include <cuda_runtime.h> //for uchar4
 6 | 
 7 | void loadImageHDR(const std::string &filename,
 8 |                   float **imagePtr,
 9 |                   size_t *numRows, size_t *numCols);
10 | 
11 | void loadImageRGBA(const std::string &filename,
12 |                    uchar4 **imagePtr,
13 |                    size_t *numRows, size_t *numCols);
14 | 
15 | void loadImageGrey(const std::string &filename,
16 |                    unsigned char **imagePtr,
17 |                    size_t *numRows, size_t *numCols);
18 | 
19 | void saveImageRGBA(const uchar4* const image,
20 |                    const size_t numRows, const size_t numCols,
21 |                    const std::string &output_file);
22 | 
23 | void saveImageHDR(const float* const image,
24 |                   const size_t numRows, const size_t numCols,
25 |                   const std::string &output_file);
26 | 
27 | #endif
28 | 


--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/main.cpp:
--------------------------------------------------------------------------------
  1 | //Udacity HW6 Driver
  2 | 
  3 | #include <iostream>
  4 | #include "timer.h"
  5 | #include "utils.h"
  6 | #include <string>
  7 | #include <stdio.h>
  8 | 
  9 | #include <opencv2/core/core.hpp>
 10 | #include <opencv2/highgui/highgui.hpp>
 11 | #include <opencv2/opencv.hpp>
 12 | 
 13 | #include "reference_calc.h"
 14 | #include "compare.h"
 15 | 
 16 | void preProcess( uchar4 **sourceImg, size_t &numRowsSource,  size_t &numColsSource,
 17 |                  uchar4 **destImg,
 18 |                  uchar4 **blendedImg, const std::string& source_filename,
 19 |                  const std::string& dest_filename);
 20 | 
 21 | void postProcess(const uchar4* const blendedImg,
 22 |                  const size_t numRowsDest, const size_t numColsDest,
 23 |                  const std::string& output_file);
 24 | 
 25 | void your_blend(const uchar4* const sourceImg,
 26 |                 const size_t numRowsSource, const size_t numColsSource,
 27 |                 const uchar4* const destImg,
 28 |                 uchar4* const blendedImg);
 29 | 
 30 | int main(int argc, char **argv) {
 31 |   uchar4 *h_sourceImg, *h_destImg, *h_blendedImg;
 32 |   size_t numRowsSource, numColsSource;
 33 | 
 34 |   std::string input_source_file;
 35 |   std::string input_dest_file;
 36 |   std::string output_file;
 37 | 
 38 |   std::string reference_file;
 39 |   double perPixelError = 0.0;
 40 |   double globalError   = 0.0;
 41 |   bool useEpsCheck = false;
 42 | 
 43 |   switch (argc)
 44 |   {
 45 |   	case 3:
 46 |   	  input_source_file  = std::string(argv[1]);
 47 |   	  input_dest_file = std::string(argv[2]);
 48 |       output_file = "HW6_output.png";
 49 |   	  reference_file = "HW6_reference.png";
 50 |   	  break;
 51 |   	case 4:
 52 |   	  input_source_file  = std::string(argv[1]);
 53 |   	  input_dest_file = std::string(argv[2]);
 54 |       output_file = std::string(argv[3]);
 55 |   	  reference_file = "HW6_reference.png";
 56 |   	  break;
 57 |   	case 5:
 58 |   	  input_source_file  = std::string(argv[1]);
 59 |   	  input_dest_file = std::string(argv[2]);
 60 |   	  output_file = std::string(argv[3]);
 61 |   	  reference_file = std::string(argv[4]);
 62 |   	  break;
 63 |   	case 7:
 64 |   	  useEpsCheck=true;
 65 |   	  input_source_file  = std::string(argv[1]);
 66 |   	  input_dest_file = std::string(argv[2]);
 67 |   	  output_file = std::string(argv[3]);
 68 |   	  reference_file = std::string(argv[4]);
 69 |   	  perPixelError = atof(argv[5]);
 70 |       globalError   = atof(argv[6]);
 71 |   	  break;
 72 |   	default:
 73 |         std::cerr << "Usage: ./HW6 input_source_file input_dest_filename [output_filename] [reference_filename] [perPixelError] [globalError]" << std::endl;
 74 |         exit(1);
 75 |     }
 76 | 
 77 |   //load the image and give us our input and output pointers
 78 |   preProcess(&h_sourceImg, numRowsSource, numColsSource,
 79 |              &h_destImg,
 80 |              &h_blendedImg, input_source_file, input_dest_file);
 81 | 
 82 |   GpuTimer timer;
 83 |   timer.Start();
 84 | 
 85 |   //call the students' code
 86 |   your_blend(h_sourceImg, numRowsSource, numColsSource,
 87 |              h_destImg,
 88 |              h_blendedImg);
 89 | 
 90 |   timer.Stop();
 91 |   cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
 92 |   int err = printf("Your code ran in: %f msecs.\n", timer.Elapsed());
 93 |   printf("\n");
 94 |   if (err < 0) {
 95 |     //Couldn't print! Probably the student closed stdout - bad news
 96 |     std::cerr << "Couldn't print timing information! STDOUT Closed!" << std::endl;
 97 |     exit(1);
 98 |   }
 99 | 
100 |   //save the blended image to the output file
101 |   postProcess(h_blendedImg, numRowsSource, numColsSource, output_file);
102 | 
103 |   // calculate the reference image
104 |   uchar4* h_reference = new uchar4[numRowsSource*numColsSource];
105 |   reference_calc(h_sourceImg, numRowsSource, numColsSource,
106 |                    h_destImg, h_reference);
107 | 
108 |   // save the reference image
109 |   postProcess(h_reference, numRowsSource, numColsSource, reference_file);
110 | 
111 |   compareImages(reference_file, output_file, useEpsCheck, perPixelError, globalError);
112 | 
113 |   delete[] h_reference;
114 |   delete[] h_destImg;
115 |   delete[] h_sourceImg;
116 |   delete[] h_blendedImg;
117 |   return 0;
118 | }
119 | 
120 | 


--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/reference_calc.cpp:
--------------------------------------------------------------------------------
  1 | //Udacity HW 6
  2 | //Poisson Blending Reference Calculation
  3 | 
  4 | #include "utils.h"
  5 | #include <vector>
  6 | #include <cstring> //memset, memcpy
  7 | //Performs one iteration of the solver
  8 | void computeIteration(const unsigned char* const dstImg,
  9 |                       const unsigned char* const strictInteriorPixels,
 10 |                       const unsigned char* const borderPixels,
 11 |                       const std::vector<uint2>& interiorPixelList,
 12 |                       const size_t numColsSource,
 13 |                       const float* const f,
 14 |                       const float* const g,
 15 |                       float* const f_next)
 16 | {
 17 | 
 18 | 
 19 |   for (size_t i = 0; i < interiorPixelList.size(); ++i) {
 20 |     float blendedSum = 0.f;
 21 |     float borderSum  = 0.f;
 22 | 
 23 |     uint2 coord = interiorPixelList[i];
 24 | 
 25 |     unsigned int offset = coord.x * numColsSource + coord.y;
 26 | 
 27 |     //process all 4 neighbor pixels
 28 |     //for each pixel if it is an interior pixel
 29 |     //then we add the previous f, otherwise if it is a
 30 |     //border pixel then we add the value of the destination
 31 |     //image at the border.  These border values are our boundary
 32 |     //conditions.
 33 |     if (strictInteriorPixels[offset - 1]) {
 34 |       blendedSum += f[offset - 1];
 35 |     }
 36 |     else {
 37 |       borderSum += dstImg[offset - 1];
 38 |     }
 39 | 
 40 |     if (strictInteriorPixels[offset + 1]) {
 41 |       blendedSum += f[offset + 1];
 42 |     }
 43 |     else {
 44 |       borderSum += dstImg[offset + 1];
 45 |     }
 46 | 
 47 |     if (strictInteriorPixels[offset - numColsSource]) {
 48 |       blendedSum += f[offset - numColsSource];
 49 |     }
 50 |     else {
 51 |       borderSum += dstImg[offset - numColsSource];
 52 |     }
 53 | 
 54 |     if (strictInteriorPixels[offset + numColsSource]) {
 55 |       blendedSum += f[offset + numColsSource];
 56 |     }
 57 |     else {
 58 |       borderSum += dstImg[offset + numColsSource];
 59 |     }
 60 | 
 61 |     float f_next_val = (blendedSum + borderSum + g[offset]) / 4.f;
 62 | 
 63 |     f_next[offset] = std::min(255.f, std::max(0.f, f_next_val)); //clip to [0, 255]
 64 |   }
 65 | 
 66 | }
 67 | 
 68 | //pre-compute the values of g, which depend only on the source image
 69 | //and do not change between iterations.
 70 | void computeG(const unsigned char* const channel,
 71 |               float* const g,
 72 |               const size_t numColsSource,
 73 |               const std::vector<uint2>& interiorPixelList)
 74 | {
 75 |   for (size_t i = 0; i < interiorPixelList.size(); ++i) {
 76 |     uint2 coord = interiorPixelList[i];
 77 |     unsigned int offset = coord.x * numColsSource + coord.y;
 78 | 
 79 |     float sum = 4.f * channel[offset];
 80 | 
 81 |     sum -= (float)channel[offset - 1] + (float)channel[offset + 1];
 82 |     sum -= (float)channel[offset + numColsSource] + (float)channel[offset - numColsSource];
 83 | 
 84 |     g[offset] = sum;
 85 |   }
 86 | }
 87 | 
 88 | void reference_calc(const uchar4* const h_sourceImg,
 89 |                     const size_t numRowsSource, const size_t numColsSource,
 90 |                     const uchar4* const h_destImg,
 91 |                     uchar4* const h_blendedImg){
 92 | 
 93 |   //we need to create a list of border pixels and interior pixels
 94 |   //this is a conceptually simple implementation, not a particularly efficient one...
 95 | 
 96 |   //first create mask
 97 |   size_t srcSize = numRowsSource * numColsSource;
 98 |   unsigned char* mask = new unsigned char[srcSize];
 99 | 
100 |   for (size_t i = 0; i < srcSize; ++i) {
101 |     mask[i] = (h_sourceImg[i].x + h_sourceImg[i].y + h_sourceImg[i].z < 3 * 255) ? 1 : 0;
102 |   }
103 | 
104 |   //next compute strictly interior and border pixels (arrays zeroed: the loop below skips the image edges)
105 |   unsigned char *borderPixels = new unsigned char[srcSize]();
106 |   unsigned char *strictInteriorPixels = new unsigned char[srcSize]();
107 | 
108 |   std::vector<uint2> interiorPixelList;
109 | 
110 |   //the source region in the homework isn't near an image boundary, so we can
111 |   //simplify the conditionals a little...
112 |   for (size_t r = 1; r < numRowsSource - 1; ++r) {
113 |     for (size_t c = 1; c < numColsSource - 1; ++c) {
114 |       if (mask[r * numColsSource + c]) {
115 |         if (mask[(r -1) * numColsSource + c] && mask[(r + 1) * numColsSource + c] &&
116 |             mask[r * numColsSource + c - 1] && mask[r * numColsSource + c + 1]) {
117 |           strictInteriorPixels[r * numColsSource + c] = 1;
118 |           borderPixels[r * numColsSource + c] = 0;
119 |           interiorPixelList.push_back(make_uint2(r, c));
120 |         }
121 |         else {
122 |           strictInteriorPixels[r * numColsSource + c] = 0;
123 |           borderPixels[r * numColsSource + c] = 1;
124 |         }
125 |       }
126 |       else {
127 |           strictInteriorPixels[r * numColsSource + c] = 0;
128 |           borderPixels[r * numColsSource + c] = 0;
129 | 
130 |       }
131 |     }
132 |   }
133 | 
134 |   //split the source and destination images into their .x/.y/.z channels (the
135 |   //red/blue/green names are labels only; every channel is processed the same way)
136 |   unsigned char* red_src   = new unsigned char[srcSize];
137 |   unsigned char* blue_src  = new unsigned char[srcSize];
138 |   unsigned char* green_src = new unsigned char[srcSize];
139 | 
140 |   for (size_t i = 0; i < srcSize; ++i) {
141 |     red_src[i]   = h_sourceImg[i].x;
142 |     blue_src[i]  = h_sourceImg[i].y;
143 |     green_src[i] = h_sourceImg[i].z;
144 |   }
145 | 
146 |   unsigned char* red_dst   = new unsigned char[srcSize];
147 |   unsigned char* blue_dst  = new unsigned char[srcSize];
148 |   unsigned char* green_dst = new unsigned char[srcSize];
149 | 
150 |   for (size_t i = 0; i < srcSize; ++i) {
151 |     red_dst[i]   = h_destImg[i].x;
152 |     blue_dst[i]  = h_destImg[i].y;
153 |     green_dst[i] = h_destImg[i].z;
154 |   }
155 | 
156 |   //next we'll precompute the g term - it never changes, no need to recompute every iteration
157 |   float *g_red   = new float[srcSize];
158 |   float *g_blue  = new float[srcSize];
159 |   float *g_green = new float[srcSize];
160 | 
161 |   memset(g_red,   0, srcSize * sizeof(float));
162 |   memset(g_blue,  0, srcSize * sizeof(float));
163 |   memset(g_green, 0, srcSize * sizeof(float));
164 | 
165 |   computeG(red_src,   g_red,   numColsSource, interiorPixelList);
166 |   computeG(blue_src,  g_blue,  numColsSource, interiorPixelList);
167 |   computeG(green_src, g_green, numColsSource, interiorPixelList);
168 | 
169 |   //for each color channel we'll need two buffers and we'll ping-pong between them
170 |   float *blendedValsRed_1 = new float[srcSize];
171 |   float *blendedValsRed_2 = new float[srcSize];
172 | 
173 |   float *blendedValsBlue_1 = new float[srcSize];
174 |   float *blendedValsBlue_2 = new float[srcSize];
175 | 
176 |   float *blendedValsGreen_1 = new float[srcSize];
177 |   float *blendedValsGreen_2 = new float[srcSize];
178 | 
179 |   //the initial condition (IC) is the source image: copy it into both buffers
180 |   for (size_t i = 0; i < srcSize; ++i) {
181 |     blendedValsRed_1[i] = red_src[i];
182 |     blendedValsRed_2[i] = red_src[i];
183 |     blendedValsBlue_1[i] = blue_src[i];
184 |     blendedValsBlue_2[i] = blue_src[i];
185 |     blendedValsGreen_1[i] = green_src[i];
186 |     blendedValsGreen_2[i] = green_src[i];
187 |   }
188 | 
189 |   //Perform the solve on each color channel
190 |   const size_t numIterations = 800;
191 |   for (size_t i = 0; i < numIterations; ++i) {
192 |     computeIteration(red_dst, strictInteriorPixels, borderPixels,
193 |                      interiorPixelList, numColsSource, blendedValsRed_1, g_red,
194 |                      blendedValsRed_2);
195 | 
196 |     std::swap(blendedValsRed_1, blendedValsRed_2);
197 |   }
198 | 
199 |   for (size_t i = 0; i < numIterations; ++i) {
200 |     computeIteration(blue_dst, strictInteriorPixels, borderPixels,
201 |                      interiorPixelList, numColsSource, blendedValsBlue_1, g_blue,
202 |                      blendedValsBlue_2);
203 | 
204 |     std::swap(blendedValsBlue_1, blendedValsBlue_2);
205 |   }
206 | 
207 |   for (size_t i = 0; i < numIterations; ++i) {
208 |     computeIteration(green_dst, strictInteriorPixels, borderPixels,
209 |                      interiorPixelList, numColsSource, blendedValsGreen_1, g_green,
210 |                      blendedValsGreen_2);
211 | 
212 |     std::swap(blendedValsGreen_1, blendedValsGreen_2);
213 |   }
214 |   std::swap(blendedValsRed_1,   blendedValsRed_2);   //put output into _2
215 |   std::swap(blendedValsBlue_1,  blendedValsBlue_2);  //put output into _2
216 |   std::swap(blendedValsGreen_1, blendedValsGreen_2); //put output into _2
217 | 
218 |   //copy the destination image to the output
219 |   memcpy(h_blendedImg, h_destImg, sizeof(uchar4) * srcSize);
220 | 
221 |   //copy computed values for the interior into the output
222 |   for (size_t i = 0; i < interiorPixelList.size(); ++i) {
223 |     uint2 coord = interiorPixelList[i];
224 | 
225 |     unsigned int offset = coord.x * numColsSource + coord.y;
226 | 
227 |     h_blendedImg[offset].x = blendedValsRed_2[offset];
228 |     h_blendedImg[offset].y = blendedValsBlue_2[offset];
229 |     h_blendedImg[offset].z = blendedValsGreen_2[offset];
230 |   }
231 | 
232 |   //wow, we allocated a lot of memory!
233 |   delete[] mask;
234 |   delete[] blendedValsRed_1;
235 |   delete[] blendedValsRed_2;
236 |   delete[] blendedValsBlue_1;
237 |   delete[] blendedValsBlue_2;
238 |   delete[] blendedValsGreen_1;
239 |   delete[] blendedValsGreen_2;
240 |   delete[] g_red;
241 |   delete[] g_blue;
242 |   delete[] g_green;
243 |   delete[] red_src;
244 |   delete[] red_dst;
245 |   delete[] blue_src;
246 |   delete[] blue_dst;
247 |   delete[] green_src;
248 |   delete[] green_dst;
249 |   delete[] borderPixels;
250 |   delete[] strictInteriorPixels;
251 | }
252 | 
253 | 
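For reference, the update that `computeIteration` applies to every strictly interior pixel $p$ (with `g` precomputed by `computeG`) can be written as below. Here $s$ is the source channel, $d$ the destination channel, $N(p)$ the four neighbours of $p$, $\Omega$ the set of strictly interior pixels, $W$ is `numColsSource`, and $f^{(0)} = s$:

```latex
\begin{aligned}
g_p &= 4\,s_p - \sum_{q \in N(p)} s_q,
  \qquad N(p) = \{\,p-1,\ p+1,\ p-W,\ p+W\,\} \\
f^{(k+1)}_p &= \min\Bigl(255,\ \max\Bigl(0,\ \tfrac{1}{4}\Bigl[\sum_{q \in N(p)\cap\Omega} f^{(k)}_q
  + \sum_{q \in N(p)\setminus\Omega} d_q + g_p\Bigr]\Bigr)\Bigr)
\end{aligned}
```

Neighbours outside $\Omega$ contribute the fixed destination value $d_q$, which is the boundary condition mentioned in the comment inside `computeIteration`; the reference runs 800 such iterations per channel, ping-ponging between two buffers.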


--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/reference_calc.h:
--------------------------------------------------------------------------------
 1 | #ifndef REFERENCE_H__
 2 | #define REFERENCE_H__
 3 | #include <cuda_runtime.h> //uchar4
 4 | void reference_calc(const uchar4* const h_sourceImg,
 5 |                     const size_t numRowsSource, const size_t numColsSource,
 6 |                     const uchar4* const h_destImg,
 7 |                     uchar4* const h_blendedImg);
 8 | 
 9 | #endif
10 | 


--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/source.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet6-SeamlessImageCloning/source.png


--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/timer.h:
--------------------------------------------------------------------------------
 1 | #ifndef GPU_TIMER_H__
 2 | #define GPU_TIMER_H__
 3 | 
 4 | #include <cuda_runtime.h>
 5 | 
 6 | struct GpuTimer
 7 | {
 8 |   cudaEvent_t start;
 9 |   cudaEvent_t stop;
10 | 
11 |   GpuTimer()
12 |   {
13 |     cudaEventCreate(&start);
14 |     cudaEventCreate(&stop);
15 |   }
16 | 
17 |   ~GpuTimer()
18 |   {
19 |     cudaEventDestroy(start);
20 |     cudaEventDestroy(stop);
21 |   }
22 | 
23 |   void Start()
24 |   {
25 |     cudaEventRecord(start, 0);
26 |   }
27 | 
28 |   void Stop()
29 |   {
30 |     cudaEventRecord(stop, 0);
31 |   }
32 | 
33 |   float Elapsed()
34 |   {
35 |     float elapsed;
36 |     cudaEventSynchronize(stop);
37 |     cudaEventElapsedTime(&elapsed, start, stop);
38 |     return elapsed;
39 |   }
40 | };
41 | 
42 | #endif  /* GPU_TIMER_H__ */
43 | 


--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/utils.h:
--------------------------------------------------------------------------------
 1 | #ifndef UTILS_H__
 2 | #define UTILS_H__
 3 | 
 4 | #include <iostream>
 5 | #include <iomanip>
 6 | #include <cuda.h>
 7 | #include <cuda_runtime.h>
 8 | #include <cuda_runtime_api.h>
 9 | #include <cassert>
10 | #include <cmath>
11 | #include <algorithm>
12 | 
13 | #define checkCudaErrors(val) check( (val), #val, __FILE__, __LINE__)
14 | 
15 | template<typename T>
16 | void check(T err, const char* const func, const char* const file, const int line) {
17 |   if (err != cudaSuccess) {
18 |     std::cerr << "CUDA error at: " << file << ":" << line << std::endl;
19 |     std::cerr << cudaGetErrorString(err) << " " << func << std::endl;
20 |     exit(1);
21 |   }
22 | }
23 | 
24 | template<typename T>
25 | void checkResultsExact(const T* const ref, const T* const gpu, size_t numElem) {
26 |   //check that the GPU result matches the CPU result
27 |   for (size_t i = 0; i < numElem; ++i) {
28 |     if (ref[i] != gpu[i]) {
29 |       std::cerr << "Difference at pos " << i << std::endl;
30 |       //the + is magic to convert char to int without messing
31 |       //with other types
32 |       std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
33 |                  "\nGPU      : " << +gpu[i] << std::endl;
34 |       exit(1);
35 |     }
36 |   }
37 | }
38 | 
39 | template<typename T>
40 | void checkResultsEps(const T* const ref, const T* const gpu, size_t numElem, double eps1, double eps2) {
41 |   assert(eps1 >= 0 && eps2 >= 0);
42 |   unsigned long long totalDiff = 0;
43 |   unsigned numSmallDifferences = 0;
44 |   for (size_t i = 0; i < numElem; ++i) {
45 |     //subtract smaller from larger in case of unsigned types
46 |     T smaller = std::min(ref[i], gpu[i]);
47 |     T larger = std::max(ref[i], gpu[i]);
48 |     T diff = larger - smaller;
49 |     if (diff > 0 && diff <= eps1) {
50 |       numSmallDifferences++;
51 |     }
52 |     else if (diff > eps1) {
53 |       std::cerr << "Difference at pos " << +i << " exceeds tolerance of " << eps1 << std::endl;
54 |       std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
55 |         "\nGPU      : " << +gpu[i] << std::endl;
56 |       exit(1);
57 |     }
58 |     totalDiff += diff * diff;
59 |   }
60 |   double percentSmallDifferences = (double)numSmallDifferences / (double)numElem;
61 |   if (percentSmallDifferences > eps2) {
62 |     std::cerr << "Total percentage of non-zero pixel difference between the two images exceeds " << 100.0 * eps2 << "%" << std::endl;
63 |     std::cerr << "Percentage of non-zero pixel differences: " << 100.0 * percentSmallDifferences << "%" << std::endl;
64 |     exit(1);
65 |   }
66 | }
67 | 
68 | //Uses the Autodesk method of image comparison
69 | //Note that the tolerance here is in PIXELS, not a percentage of input pixels
70 | template<typename T>
71 | void checkResultsAutodesk(const T* const ref, const T* const gpu, size_t numElem, double variance, size_t tolerance)
72 | {
73 | 
74 |   size_t numBadPixels = 0;
75 |   for (size_t i = 0; i < numElem; ++i) {
76 |     T smaller = std::min(ref[i], gpu[i]);
77 |     T larger = std::max(ref[i], gpu[i]);
78 |     T diff = larger - smaller;
79 |     if (diff > variance)
80 |       ++numBadPixels;
81 |   }
82 | 
83 |   if (numBadPixels > tolerance) {
84 |     std::cerr << "Too many bad pixels in the image: " << numBadPixels << "/" << tolerance << std::endl;
85 |     exit(1);
86 |   }
87 | }
88 | 
89 | #endif
90 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # udacity-IntroToParallelProgramming
 2 | Proposed solutions for CS344 - Introduction to Parallel Programming (Udacity)
 3 | 
 4 | Testing environment: Visual Studio 2015 x64 + NVIDIA CUDA 8.0 + OpenCV 3.2.0
 5 | 
 6 | For each problem set, the core of the algorithm is implemented in the _student_func.cu_ file.
 7 | 
 8 | ## Problem Set 1 - RGB2Gray
 9 | ### Objective
10 | Convert an input RGBA image into its grayscale version (ignoring the A channel).
11 | ### Topics
12 | Example of a **map** primitive operation on a data structure.
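A minimal CPU sketch of the map pattern, assuming a hypothetical `Rgba` struct in place of CUDA's `uchar4` and the common ITU-R BT.601 luma weights (0.299, 0.587, 0.114), which this README does not state. Each output depends only on the input element at the same index, which is what lets a GPU assign one thread per pixel:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for CUDA's uchar4 (r = .x, g = .y, b = .z, a = .w).
struct Rgba { std::uint8_t r, g, b, a; };

// Per-element function of the map: reads one pixel, writes one value.
// Luma weights are an assumption (BT.601); the A channel is ignored.
std::uint8_t toGray(const Rgba& p) {
  return static_cast<std::uint8_t>(0.299f * p.r + 0.587f * p.g + 0.114f * p.b);
}

// Apply toGray independently to every pixel; on the GPU each thread
// would evaluate toGray for exactly one index.
std::vector<std::uint8_t> rgbaToGray(const std::vector<Rgba>& img) {
  std::vector<std::uint8_t> gray(img.size());
  std::transform(img.begin(), img.end(), gray.begin(), toGray);
  return gray;
}
```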
13 | 
14 | ## Problem Set 2 - Blur
15 | ### Objective
16 | Apply a Gaussian blur convolution filter to an input RGBA image (blur each channel independently, ignoring the A channel).
17 | ### Topics
18 | Example of a **stencil** primitive operation on a 2D array. **Shared memory** is used to speed up the algorithm. Both global-memory and shared-memory kernels are provided; the latter gives approx. a 1.6x speedup over the former.
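A CPU sketch of the stencil access pattern for a single channel, assuming a row-major channel, a normalized `width` x `width` filter, and clamp-to-edge borders (an assumed policy; `blurChannel` is an illustrative name, not the function in _student_func.cu_). A GPU kernel would compute one output pixel per thread and, in the shared-memory variant, first load a tile plus its halo so that neighbouring threads reuse the same reads:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Stencil: every output pixel reads a fixed width x width neighbourhood of
// the input channel (row-major, rows x cols). Out-of-range neighbours are
// clamped to the nearest edge pixel (assumed border policy).
std::vector<std::uint8_t> blurChannel(const std::vector<std::uint8_t>& in,
                                      int rows, int cols,
                                      const std::vector<float>& filter,
                                      int width) {
  std::vector<std::uint8_t> out(in.size());
  const int half = width / 2;
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      float sum = 0.f;
      for (int fr = -half; fr <= half; ++fr) {
        for (int fc = -half; fc <= half; ++fc) {
          const int rr = std::min(std::max(r + fr, 0), rows - 1);
          const int cc = std::min(std::max(c + fc, 0), cols - 1);
          sum += filter[(fr + half) * width + (fc + half)] * in[rr * cols + cc];
        }
      }
      // Round and clamp back to the 8-bit range.
      const long v = std::lround(sum);
      out[r * cols + c] = static_cast<std::uint8_t>(std::min(std::max(v, 0L), 255L));
    }
  }
  return out;
}
```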
19 | 
20 | ## Problem Set 3 - Tone Mapping
21 | ### Objective
22 | Map a High Dynamic Range image into an image for a device supporting a smaller range of intensity values.
23 | ### Topics
24 | - Compute the range of intensity values of the input image: min and max **reduce**.
25 | - Compute the **histogram** of intensity values (1024-bin array).
26 | - Compute the cumulative distribution function of the histogram: Hillis & Steele **scan** algorithm (step-efficient, well suited to small arrays such as the histogram).
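The three steps above can be simulated serially on the CPU as below. This is a sketch, not the GPU code: the function names are hypothetical, and the binning rule `(x - lo) / (hi - lo) * numBins`, clamped to the last bin, is an assumption. Each loop mirrors what one kernel launch would do in parallel:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Min/max reduce: tree reduction where each round combines elements
// `stride` apart, mirroring a shared-memory reduce kernel.
// Assumes v is non-empty.
template <typename Op>
float treeReduce(std::vector<float> v, Op op) {
  for (std::size_t stride = 1; stride < v.size(); stride *= 2) {
    for (std::size_t i = 0; i + stride < v.size(); i += 2 * stride) {
      v[i] = op(v[i], v[i + stride]);
    }
  }
  return v[0];
}

// Histogram with an assumed binning rule; x == hi is clamped into the
// last bin. Assumes lo <= x <= hi for every x.
std::vector<unsigned> histogram(const std::vector<float>& v, float lo, float hi,
                                std::size_t numBins) {
  std::vector<unsigned> bins(numBins, 0);
  const float range = hi - lo;
  for (float x : v) {
    std::size_t b = range > 0.f ? static_cast<std::size_t>((x - lo) / range * numBins) : 0;
    ++bins[std::min(b, numBins - 1)];
  }
  return bins;
}

// Hillis & Steele inclusive scan: log2(n) steps, each adding the element
// `offset` positions to the left, double-buffered between two arrays.
std::vector<unsigned> hillisSteeleScan(std::vector<unsigned> in) {
  std::vector<unsigned> out(in.size());
  for (std::size_t offset = 1; offset < in.size(); offset *= 2) {
    for (std::size_t i = 0; i < in.size(); ++i) {
      out[i] = in[i] + (i >= offset ? in[i - offset] : 0u);
    }
    std::swap(in, out);
  }
  return in;
}
```

Hillis & Steele does O(n log n) work but only log2(n) steps, which is why it suits a 1024-bin histogram better than a large array.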
27 | 
28 | ## Problem Set 4 - Red Eye Removal
29 | ### Objective
30 | Remove the red-eye effect from an input RGBA image (using Normalized Cross Correlation against a training template).
31 | ### Topics
32 | Sorting on the GPU: given an input array of NCC scores, sort it in ascending order with a **radix sort**. For each bit:
33 | - Compute a predicate vector (0: false, 1: true).
34 | - Perform a **Blelloch scan** on the predicate vector (for both the false and the true case).
35 | - From the Blelloch scan, extract a histogram of predicate values (start offsets [0, numberOfFalses]) and an offset vector (the actual result of the scan).
36 | - A move kernel computes the new index of each element (using the two structures above) and moves it.
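A serial sketch of the per-bit steps above, assuming 32-bit unsigned keys. `exclusiveScan` computes the same result a Blelloch scan produces on the GPU, but sequentially; all names are hypothetical and the values that travel with the keys in the homework are omitted for brevity:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Exclusive scan: out[i] = in[0] + ... + in[i-1]. A Blelloch scan computes
// exactly this on the GPU; here it is done serially.
std::vector<std::uint32_t> exclusiveScan(const std::vector<std::uint32_t>& in) {
  std::vector<std::uint32_t> out(in.size());
  std::uint32_t running = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = running;
    running += in[i];
  }
  return out;
}

// LSD radix sort, one bit per pass, ascending and stable.
std::vector<std::uint32_t> radixSort(std::vector<std::uint32_t> keys) {
  const std::size_t n = keys.size();
  std::vector<std::uint32_t> tmp(n);
  for (unsigned bit = 0; bit < 32; ++bit) {
    // 1. Predicate vectors for "bit is 0" and "bit is 1".
    std::vector<std::uint32_t> isZero(n), isOne(n);
    for (std::size_t i = 0; i < n; ++i) {
      isOne[i] = (keys[i] >> bit) & 1u;
      isZero[i] = 1u - isOne[i];
    }
    // 2. Scans give each element's rank within its own group.
    const std::vector<std::uint32_t> zeroRank = exclusiveScan(isZero);
    const std::vector<std::uint32_t> oneRank = exclusiveScan(isOne);
    // 3. Two-bin histogram: the 1-group starts right after all the 0s.
    std::uint32_t numZeros = 0;
    for (std::uint32_t z : isZero) numZeros += z;
    // 4. Move each element to its new position.
    for (std::size_t i = 0; i < n; ++i) {
      const std::size_t dst = isZero[i] ? zeroRank[i] : numZeros + oneRank[i];
      tmp[dst] = keys[i];
    }
    keys.swap(tmp);
  }
  return keys;
}
```

Because each pass is stable (elements keep their relative order within a group), the passes from least to most significant bit compose into a full sort.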
37 | 
38 | ## Problem Set 5 - Optimized histogram computation
39 | ### Objective
40 | Improve the performance of histogram computation on the GPU over the simple global-atomics solution.
41 | ### Topics
42 | **Per-block** histogram computation. Each block computes its own histogram in shared memory, and the histograms are combined at the end in global memory (more than 7x speedup over the global-atomics implementation, while remaining relatively simple).
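The per-block idea can be simulated on the CPU as below: a sketch assuming every value is already a valid bin index and `blockSize > 0`, with a hypothetical function name. Each "block" owns a private histogram (shared memory on the GPU, so contention stays on-chip), and only the merge touches the global histogram:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Per-block histogram, simulated serially. Assumes vals[i] < numBins
// and blockSize > 0.
std::vector<unsigned> perBlockHistogram(const std::vector<unsigned>& vals,
                                        std::size_t numBins,
                                        std::size_t blockSize) {
  std::vector<unsigned> global(numBins, 0);
  for (std::size_t start = 0; start < vals.size(); start += blockSize) {
    // Private histogram for this block (shared memory on the GPU).
    std::vector<unsigned> local(numBins, 0);
    const std::size_t end = std::min(start + blockSize, vals.size());
    for (std::size_t i = start; i < end; ++i) {
      ++local[vals[i]];
    }
    // Merge: one global update per bin per block, instead of one per element.
    for (std::size_t b = 0; b < numBins; ++b) {
      global[b] += local[b];
    }
  }
  return global;
}
```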
43 | 
44 | ## Problem Set 6 - Seamless Image Cloning
45 | ### Objective
46 | Given a target image (e.g. a swimming pool), seamlessly paste the masked region of a source image (e.g. a hippo) into it.
47 | ### Topics
48 | The algorithm performs Jacobi iterations on the source and target images to blend one into the other.
49 | - Given the mask, detect the interior points and the border points.
50 | - Since the algorithm only has to run on the interior points, compute the **bounding box** of the mask region to restrict the Jacobi iterations to a sub-image.
51 | - Split the images into their R, G and B channels.
52 | - Run 800 Jacobi iterations on each channel. The code uses **CUDA streams** to run the same kernel concurrently on the 3 channels (3x speedup on my machine, 1.5x on the Udacity machine). The Jacobi kernel makes extensive use of shared memory, so the number of threads per block has been reduced to maximize SM occupancy.
53 | - Recombine the 3 channels to form the output image.
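The first two steps above can be sketched on the CPU as follows, assuming the mask is a row-major array where non-zero marks a masked pixel (the reference calculation builds it from non-white source pixels). The enum, struct and function names are hypothetical; unlike the reference, this version also treats pixels beyond the image edge as outside the mask:

```cpp
#include <algorithm>
#include <vector>

// Per-pixel classification: outside the mask, border (in the mask but with
// some 4-neighbour outside it), or strict interior (all 4 neighbours inside).
enum PixelClass : unsigned char { Outside = 0, Border = 1, Interior = 2 };

struct BBox { int rMin, rMax, cMin, cMax; };  // inclusive bounds

// Bounding box of all mask pixels; assumes the mask is non-empty.
BBox maskBoundingBox(const std::vector<unsigned char>& mask, int rows, int cols) {
  BBox box{rows, -1, cols, -1};
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      if (mask[r * cols + c]) {
        box.rMin = std::min(box.rMin, r);
        box.rMax = std::max(box.rMax, r);
        box.cMin = std::min(box.cMin, c);
        box.cMax = std::max(box.cMax, c);
      }
    }
  }
  return box;
}

// Classify only the pixels inside the bounding box; the rest stay Outside.
std::vector<unsigned char> classify(const std::vector<unsigned char>& mask,
                                    int rows, int cols, const BBox& box) {
  auto in = [&](int r, int c) {
    return r >= 0 && r < rows && c >= 0 && c < cols && mask[r * cols + c] != 0;
  };
  std::vector<unsigned char> cls(mask.size(), Outside);
  for (int r = box.rMin; r <= box.rMax; ++r) {
    for (int c = box.cMin; c <= box.cMax; ++c) {
      if (in(r, c)) {
        const bool interior = in(r - 1, c) && in(r + 1, c) && in(r, c - 1) && in(r, c + 1);
        cls[r * cols + c] = interior ? Interior : Border;
      }
    }
  }
  return cls;
}
```

Restricting the scan to the bounding box is the same saving the GPU version gets by launching the Jacobi kernels on a sub-image only.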
54 | 


--------------------------------------------------------------------------------