├── .gitattributes
├── .gitignore
├── IntroParallelProgramming.sln
├── Lesson1-CubeNumbers
│   ├── IntroParallelProgramming.vcxproj
│   └── main.cu
├── Lesson4-Reduction
│   └── Lesson4-Reduction.vcxproj
├── ProblemSet1-RGB2Gray
│   ├── CMakeLists.txt
│   ├── HW1.cpp
│   ├── HW1_differenceImage.png
│   ├── HW1_reference.png
│   ├── Makefile
│   ├── RGB2Gray.vcxproj
│   ├── cinque_terre.gold
│   ├── cinque_terre_gray.jpg
│   ├── cinque_terre_small.jpg
│   ├── compare.cpp
│   ├── compare.h
│   ├── main.cpp
│   ├── reference_calc.cpp
│   ├── reference_calc.h
│   ├── student_func.cu
│   ├── timer.h
│   └── utils.h
├── ProblemSet2-Blur
│   ├── CMakeLists.txt
│   ├── HW2.cpp
│   ├── HW2_differenceImage.png
│   ├── HW2_reference.png
│   ├── Makefile
│   ├── ProblemSet2-Blur.vcxproj
│   ├── cinque_terre.gold
│   ├── cinque_terre_blur.jpg
│   ├── cinque_terre_small.jpg
│   ├── compare.cpp
│   ├── compare.h
│   ├── main.cpp
│   ├── reference_calc.cpp
│   ├── reference_calc.h
│   ├── student_func.cu
│   ├── timer.h
│   └── utils.h
├── ProblemSet3-ToneMapping
│   ├── CMakeLists.txt
│   ├── HDR-image.jpg
│   ├── HDR-image_mapped.png
│   ├── HW3.cu
│   ├── HW3_differenceImage.png
│   ├── HW3_reference.png
│   ├── HW3_reference_old.png
│   ├── Makefile
│   ├── ProblemSet3-ToneMapping.vcxproj
│   ├── compare.cpp
│   ├── compare.h
│   ├── input.png
│   ├── loadSaveImage.cpp
│   ├── loadSaveImage.h
│   ├── main.cpp
│   ├── memorial.exr
│   ├── memorial_large.exr
│   ├── memorial_png.gold
│   ├── memorial_png_large.gold
│   ├── memorial_raw.png
│   ├── memorial_raw_large.png
│   ├── memorial_raw_large_mapped.png
│   ├── memorial_raw_mapped.png
│   ├── my_output.png
│   ├── reference_calc.cpp
│   ├── reference_calc.h
│   ├── student_func.cu
│   ├── timer.h
│   └── utils.h
├── ProblemSet4-RedEyeRemoval
│   ├── CMakeLists.txt
│   ├── HW4.cu
│   ├── HW4_output.png
│   ├── Makefile
│   ├── ProblemSet4-RedEyeRemoval.vcxproj
│   ├── compare.cpp
│   ├── compare.h
│   ├── loadSaveImage.cpp
│   ├── loadSaveImage.h
│   ├── main.cpp
│   ├── red_eye_effect.gold
│   ├── red_eye_effect_5.jpg
│   ├── red_eye_effect_5_out.jpg
│   ├── red_eye_effect_template_5.jpg
│   ├── reference_calc.cpp
│   ├── reference_calc.h
│   ├── student_func.cu
│   ├── timer.h
│   └── utils.h
├── ProblemSet5-OptimizedHistogram
│   ├── ProblemSet5-OptimizedHistogram.vcxproj
│   ├── main.cu
│   ├── reference_calc.cpp
│   ├── reference_calc.h
│   ├── student.cu
│   ├── timer.h
│   └── utils.h
├── ProblemSet6-SeamlessImageCloning
│   ├── HW6.cu
│   ├── HW6_differenceImage.png
│   ├── HW6_output.png
│   ├── HW6_reference.png
│   ├── ProblemSet6-SeamlessImageCloning.vcxproj
│   ├── blended.gold
│   ├── compare.cpp
│   ├── compare.h
│   ├── destination.png
│   ├── loadSaveImage.cpp
│   ├── loadSaveImage.h
│   ├── main.cpp
│   ├── reference_calc.cpp
│   ├── reference_calc.h
│   ├── source.png
│   ├── student_func.cu
│   ├── timer.h
│   └── utils.h
└── README.md
/.gitattributes:
--------------------------------------------------------------------------------
1 | ###############################################################################
2 | # Set default behavior to automatically normalize line endings.
3 | ###############################################################################
4 | * text=auto
5 |
6 | ###############################################################################
7 | # Set default behavior for command prompt diff.
8 | #
9 | # This is needed for earlier builds of msysgit that do not have it on by
10 | # default for csharp files.
11 | # Note: This is only used by command line
12 | ###############################################################################
13 | #*.cs diff=csharp
14 |
15 | ###############################################################################
16 | # Set the merge driver for project and solution files
17 | #
18 | # Merging from the command prompt will add diff markers to the files if there
19 | # are conflicts (Merging from VS is not affected by the settings below, in VS
20 | # the diff markers are never inserted). Diff markers may cause the following
21 | # file extensions to fail to load in VS. An alternative would be to treat
22 | # these files as binary and thus will always conflict and require user
23 | # intervention with every merge. To do so, just uncomment the entries below
24 | ###############################################################################
25 | #*.sln merge=binary
26 | #*.csproj merge=binary
27 | #*.vbproj merge=binary
28 | #*.vcxproj merge=binary
29 | #*.vcproj merge=binary
30 | #*.dbproj merge=binary
31 | #*.fsproj merge=binary
32 | #*.lsproj merge=binary
33 | #*.wixproj merge=binary
34 | #*.modelproj merge=binary
35 | #*.sqlproj merge=binary
36 | #*.wwaproj merge=binary
37 |
38 | ###############################################################################
39 | # behavior for image files
40 | #
41 | # image files are treated as binary by default.
42 | ###############################################################################
43 | #*.jpg binary
44 | #*.png binary
45 | #*.gif binary
46 |
47 | ###############################################################################
48 | # diff behavior for common document formats
49 | #
50 | # Convert binary document formats to text before diffing them. This feature
51 | # is only available from the command line. Turn it on by uncommenting the
52 | # entries below.
53 | ###############################################################################
54 | #*.doc diff=astextplain
55 | #*.DOC diff=astextplain
56 | #*.docx diff=astextplain
57 | #*.DOCX diff=astextplain
58 | #*.dot diff=astextplain
59 | #*.DOT diff=astextplain
60 | #*.pdf diff=astextplain
61 | #*.PDF diff=astextplain
62 | #*.rtf diff=astextplain
63 | #*.RTF diff=astextplain
64 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | ## Ignore Visual Studio temporary files, build results, and
2 | ## files generated by popular Visual Studio add-ons.
3 |
4 | # User-specific files
5 | *.suo
6 | *.user
7 | *.userosscache
8 | *.sln.docstates
9 |
10 | # User-specific files (MonoDevelop/Xamarin Studio)
11 | *.userprefs
12 |
13 | # Build results
14 | [Dd]ebug/
15 | [Dd]ebugPublic/
16 | [Rr]elease/
17 | [Rr]eleases/
18 | [Xx]64/
19 | [Xx]86/
20 | [Bb]uild/
21 | bld/
22 | [Bb]in/
23 | [Oo]bj/
24 |
25 | # Visual Studio 2015 cache/options directory
26 | .vs/
27 | # Uncomment if you have tasks that create the project's static files in wwwroot
28 | #wwwroot/
29 |
30 | # MSTest test Results
31 | [Tt]est[Rr]esult*/
32 | [Bb]uild[Ll]og.*
33 |
34 | # NUNIT
35 | *.VisualState.xml
36 | TestResult.xml
37 |
38 | # Build Results of an ATL Project
39 | [Dd]ebugPS/
40 | [Rr]eleasePS/
41 | dlldata.c
42 |
43 | # DNX
44 | project.lock.json
45 | artifacts/
46 |
47 | *_i.c
48 | *_p.c
49 | *_i.h
50 | *.ilk
51 | *.meta
52 | *.obj
53 | *.pch
54 | *.pdb
55 | *.pgc
56 | *.pgd
57 | *.rsp
58 | *.sbr
59 | *.tlb
60 | *.tli
61 | *.tlh
62 | *.tmp
63 | *.tmp_proj
64 | *.log
65 | *.vspscc
66 | *.vssscc
67 | .builds
68 | *.pidb
69 | *.svclog
70 | *.scc
71 |
72 | # Chutzpah Test files
73 | _Chutzpah*
74 |
75 | # Visual C++ cache files
76 | ipch/
77 | *.aps
78 | *.ncb
79 | *.opendb
80 | *.opensdf
81 | *.sdf
82 | *.cachefile
83 | *.VC.db
84 |
85 | # Visual Studio profiler
86 | *.psess
87 | *.vsp
88 | *.vspx
89 | *.sap
90 |
91 | # TFS 2012 Local Workspace
92 | $tf/
93 |
94 | # Guidance Automation Toolkit
95 | *.gpState
96 |
97 | # ReSharper is a .NET coding add-in
98 | _ReSharper*/
99 | *.[Rr]e[Ss]harper
100 | *.DotSettings.user
101 |
102 | # JustCode is a .NET coding add-in
103 | .JustCode
104 |
105 | # TeamCity is a build add-in
106 | _TeamCity*
107 |
108 | # DotCover is a Code Coverage Tool
109 | *.dotCover
110 |
111 | # NCrunch
112 | _NCrunch_*
113 | .*crunch*.local.xml
114 | nCrunchTemp_*
115 |
116 | # MightyMoose
117 | *.mm.*
118 | AutoTest.Net/
119 |
120 | # Web workbench (sass)
121 | .sass-cache/
122 |
123 | # Installshield output folder
124 | [Ee]xpress/
125 |
126 | # DocProject is a documentation generator add-in
127 | DocProject/buildhelp/
128 | DocProject/Help/*.HxT
129 | DocProject/Help/*.HxC
130 | DocProject/Help/*.hhc
131 | DocProject/Help/*.hhk
132 | DocProject/Help/*.hhp
133 | DocProject/Help/Html2
134 | DocProject/Help/html
135 |
136 | # Click-Once directory
137 | publish/
138 |
139 | # Publish Web Output
140 | *.[Pp]ublish.xml
141 | *.azurePubxml
142 |
143 | # TODO: Un-comment the next line if you do not want to checkin
144 | # your web deploy settings because they may include unencrypted
145 | # passwords
146 | #*.pubxml
147 | *.publishproj
148 |
149 | # NuGet Packages
150 | *.nupkg
151 | # The packages folder can be ignored because of Package Restore
152 | **/packages/*
153 | # except build/, which is used as an MSBuild target.
154 | !**/packages/build/
155 | # Uncomment if necessary however generally it will be regenerated when needed
156 | #!**/packages/repositories.config
157 | # NuGet v3's project.json files produce more ignorable files
158 | *.nuget.props
159 | *.nuget.targets
160 |
161 | # Microsoft Azure Build Output
162 | csx/
163 | *.build.csdef
164 |
165 | # Microsoft Azure Emulator
166 | ecf/
167 | rcf/
168 |
169 | # Windows Store app package directory
170 | AppPackages/
171 | BundleArtifacts/
172 |
173 | # Visual Studio cache files
174 | # files ending in .cache can be ignored
175 | *.[Cc]ache
176 | # but keep track of directories ending in .cache
177 | !*.[Cc]ache/
178 |
179 | # Others
180 | ClientBin/
181 | [Ss]tyle[Cc]op.*
182 | ~$*
183 | *~
184 | *.dbmdl
185 | *.dbproj.schemaview
186 | *.pfx
187 | *.publishsettings
188 | node_modules/
189 | orleans.codegen.cs
190 |
191 | # RIA/Silverlight projects
192 | Generated_Code/
193 |
194 | # Backup & report files from converting an old project file
195 | # to a newer Visual Studio version. Backup files are not needed,
196 | # because we have git ;-)
197 | _UpgradeReport_Files/
198 | Backup*/
199 | UpgradeLog*.XML
200 | UpgradeLog*.htm
201 |
202 | # SQL Server files
203 | *.mdf
204 | *.ldf
205 |
206 | # Business Intelligence projects
207 | *.rdl.data
208 | *.bim.layout
209 | *.bim_*.settings
210 |
211 | # Microsoft Fakes
212 | FakesAssemblies/
213 |
214 | # GhostDoc plugin setting file
215 | *.GhostDoc.xml
216 |
217 | # Node.js Tools for Visual Studio
218 | .ntvs_analysis.dat
219 |
220 | # Visual Studio 6 build log
221 | *.plg
222 |
223 | # Visual Studio 6 workspace options file
224 | *.opt
225 |
226 | # Visual Studio LightSwitch build output
227 | **/*.HTMLClient/GeneratedArtifacts
228 | **/*.DesktopClient/GeneratedArtifacts
229 | **/*.DesktopClient/ModelManifest.xml
230 | **/*.Server/GeneratedArtifacts
231 | **/*.Server/ModelManifest.xml
232 | _Pvt_Extensions
233 |
234 | # LightSwitch generated files
235 | GeneratedArtifacts/
236 | ModelManifest.xml
237 |
238 | # Paket dependency manager
239 | .paket/paket.exe
240 |
241 | # FAKE - F# Make
242 | .fake/
243 |
--------------------------------------------------------------------------------
/IntroParallelProgramming.sln:
--------------------------------------------------------------------------------
1 |
2 | Microsoft Visual Studio Solution File, Format Version 12.00
3 | # Visual Studio 14
4 | VisualStudioVersion = 14.0.25420.1
5 | MinimumVisualStudioVersion = 10.0.40219.1
6 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "Lesson1-CubeNumbers", "Lesson1-CubeNumbers\IntroParallelProgramming.vcxproj", "{E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}"
7 | EndProject
8 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "ProblemSet1-RGB2Gray", "ProblemSet1-RGB2Gray\RGB2Gray.vcxproj", "{681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}"
9 | EndProject
10 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "ProblemSet2-Blur", "ProblemSet2-Blur\ProblemSet2-Blur.vcxproj", "{5B684E70-F85A-4BE0-82A2-A45023E5F5CF}"
11 | EndProject
12 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "ProblemSet3-ToneMapping", "ProblemSet3-ToneMapping\ProblemSet3-ToneMapping.vcxproj", "{EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}"
13 | EndProject
14 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "ProblemSet4-RedEyeRemoval", "ProblemSet4-RedEyeRemoval\ProblemSet4-RedEyeRemoval.vcxproj", "{B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}"
15 | EndProject
16 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "ProblemSet5-OptimizedHistogram", "ProblemSet5-OptimizedHistogram\ProblemSet5-OptimizedHistogram.vcxproj", "{0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}"
17 | EndProject
18 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "ProblemSet6-SeamlessImageCloning", "ProblemSet6-SeamlessImageCloning\ProblemSet6-SeamlessImageCloning.vcxproj", "{5781233B-6022-4F34-B559-1473B9674B39}"
19 | EndProject
20 | Global
21 | GlobalSection(SolutionConfigurationPlatforms) = preSolution
22 | Debug|x64 = Debug|x64
23 | Debug|x86 = Debug|x86
24 | Release|x64 = Release|x64
25 | Release|x86 = Release|x86
26 | EndGlobalSection
27 | GlobalSection(ProjectConfigurationPlatforms) = postSolution
28 | {E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Debug|x64.ActiveCfg = Debug|x64
29 | {E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Debug|x64.Build.0 = Debug|x64
30 | {E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Debug|x86.ActiveCfg = Debug|Win32
31 | {E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Debug|x86.Build.0 = Debug|Win32
32 | {E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Release|x64.ActiveCfg = Release|x64
33 | {E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Release|x64.Build.0 = Release|x64
34 | {E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Release|x86.ActiveCfg = Release|Win32
35 | {E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Release|x86.Build.0 = Release|Win32
36 | {681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Debug|x64.ActiveCfg = Debug|x64
37 | {681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Debug|x64.Build.0 = Debug|x64
38 | {681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Debug|x86.ActiveCfg = Debug|Win32
39 | {681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Debug|x86.Build.0 = Debug|Win32
40 | {681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Release|x64.ActiveCfg = Release|x64
41 | {681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Release|x64.Build.0 = Release|x64
42 | {681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Release|x86.ActiveCfg = Release|Win32
43 | {681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Release|x86.Build.0 = Release|Win32
44 | {5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Debug|x64.ActiveCfg = Debug|x64
45 | {5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Debug|x64.Build.0 = Debug|x64
46 | {5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Debug|x86.ActiveCfg = Debug|Win32
47 | {5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Debug|x86.Build.0 = Debug|Win32
48 | {5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Release|x64.ActiveCfg = Release|x64
49 | {5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Release|x64.Build.0 = Release|x64
50 | {5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Release|x86.ActiveCfg = Release|Win32
51 | {5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Release|x86.Build.0 = Release|Win32
52 | {EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Debug|x64.ActiveCfg = Debug|x64
53 | {EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Debug|x64.Build.0 = Debug|x64
54 | {EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Debug|x86.ActiveCfg = Debug|Win32
55 | {EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Debug|x86.Build.0 = Debug|Win32
56 | {EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Release|x64.ActiveCfg = Release|x64
57 | {EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Release|x64.Build.0 = Release|x64
58 | {EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Release|x86.ActiveCfg = Release|Win32
59 | {EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Release|x86.Build.0 = Release|Win32
60 | {B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Debug|x64.ActiveCfg = Debug|x64
61 | {B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Debug|x64.Build.0 = Debug|x64
62 | {B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Debug|x86.ActiveCfg = Debug|Win32
63 | {B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Debug|x86.Build.0 = Debug|Win32
64 | {B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Release|x64.ActiveCfg = Release|x64
65 | {B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Release|x64.Build.0 = Release|x64
66 | {B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Release|x86.ActiveCfg = Release|Win32
67 | {B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Release|x86.Build.0 = Release|Win32
68 | {0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Debug|x64.ActiveCfg = Debug|x64
69 | {0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Debug|x64.Build.0 = Debug|x64
70 | {0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Debug|x86.ActiveCfg = Debug|Win32
71 | {0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Debug|x86.Build.0 = Debug|Win32
72 | {0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Release|x64.ActiveCfg = Release|x64
73 | {0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Release|x64.Build.0 = Release|x64
74 | {0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Release|x86.ActiveCfg = Release|Win32
75 | {0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Release|x86.Build.0 = Release|Win32
76 | {5781233B-6022-4F34-B559-1473B9674B39}.Debug|x64.ActiveCfg = Debug|x64
77 | {5781233B-6022-4F34-B559-1473B9674B39}.Debug|x64.Build.0 = Debug|x64
78 | {5781233B-6022-4F34-B559-1473B9674B39}.Debug|x86.ActiveCfg = Debug|Win32
79 | {5781233B-6022-4F34-B559-1473B9674B39}.Debug|x86.Build.0 = Debug|Win32
80 | {5781233B-6022-4F34-B559-1473B9674B39}.Release|x64.ActiveCfg = Release|x64
81 | {5781233B-6022-4F34-B559-1473B9674B39}.Release|x64.Build.0 = Release|x64
82 | {5781233B-6022-4F34-B559-1473B9674B39}.Release|x86.ActiveCfg = Release|Win32
83 | {5781233B-6022-4F34-B559-1473B9674B39}.Release|x86.Build.0 = Release|Win32
84 | EndGlobalSection
85 | GlobalSection(SolutionProperties) = preSolution
86 | HideSolutionNode = FALSE
87 | EndGlobalSection
88 | EndGlobal
89 |
--------------------------------------------------------------------------------
/Lesson1-CubeNumbers/IntroParallelProgramming.vcxproj:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | Debug
6 | Win32
7 |
8 |
9 | Debug
10 | x64
11 |
12 |
13 | Release
14 | Win32
15 |
16 |
17 | Release
18 | x64
19 |
20 |
21 |
22 |
23 |
24 |
25 | {E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}
26 | IntroParallelProgramming
27 | Lesson1-CubeNumbers
28 |
29 |
30 |
31 | Application
32 | true
33 | MultiByte
34 | v140
35 |
36 |
37 | Application
38 | true
39 | MultiByte
40 | v140
41 |
42 |
43 | Application
44 | false
45 | true
46 | MultiByte
47 | v140
48 |
49 |
50 | Application
51 | false
52 | true
53 | MultiByte
54 | v140
55 |
56 |
57 |
58 |
59 |
60 |
61 |
62 |
63 |
64 |
65 |
66 |
67 |
68 |
69 |
70 |
71 |
72 |
73 |
74 | true
75 |
76 |
77 | true
78 |
79 |
80 |
81 | Level3
82 | Disabled
83 | WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)
84 |
85 |
86 | true
87 | Console
88 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)
89 |
90 |
91 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
92 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
93 |
94 |
95 |
96 |
97 | Level3
98 | Disabled
99 | WIN32;WIN64;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)
100 |
101 |
102 | true
103 | Console
104 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)
105 |
106 |
107 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
108 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
109 |
110 |
111 | 64
112 |
113 |
114 |
115 |
116 | Level3
117 | MaxSpeed
118 | true
119 | true
120 | WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)
121 |
122 |
123 | true
124 | true
125 | true
126 | Console
127 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)
128 |
129 |
130 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
131 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
132 |
133 |
134 |
135 |
136 | Level3
137 | MaxSpeed
138 | true
139 | true
140 | WIN32;WIN64;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)
141 |
142 |
143 | true
144 | true
145 | true
146 | Console
147 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)
148 |
149 |
150 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
151 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
152 |
153 |
154 | 64
155 |
156 |
157 |
158 |
159 |
160 |
161 |
--------------------------------------------------------------------------------
/Lesson1-CubeNumbers/main.cu:
--------------------------------------------------------------------------------
1 | #include "cuda_runtime.h"
2 | #include "device_launch_parameters.h"
3 |
4 | #include <stdio.h>
5 |
6 | __global__ void cube(float * d_out, float * d_in) {
7 |
8 |     int idx = threadIdx.x;
9 |     float f = d_in[idx];
10 |     d_out[idx] = f*f*f;
11 | }
12 |
13 | int main(int argc, char ** argv) {
14 |     const int ARRAY_SIZE = 96;
15 |     const int ARRAY_BYTES = ARRAY_SIZE * sizeof(float);
16 |
17 |     // generate the input array on the host
18 |     float h_in[ARRAY_SIZE];
19 |     for (int i = 0; i < ARRAY_SIZE; i++) {
20 |         h_in[i] = float(i);
21 |     }
22 |     float h_out[ARRAY_SIZE];
23 |
24 |     // declare GPU memory pointers
25 |     float * d_in;
26 |     float * d_out;
27 |
28 |     // allocate GPU memory
29 |     cudaMalloc((void**)&d_in, ARRAY_BYTES);
30 |     cudaMalloc((void**)&d_out, ARRAY_BYTES);
31 |
32 |     // transfer the array to the GPU
33 |     cudaMemcpy(d_in, h_in, ARRAY_BYTES, cudaMemcpyHostToDevice);
34 |
35 |     // launch the kernel: one block of ARRAY_SIZE threads
36 |     cube<<<1, ARRAY_SIZE>>>(d_out, d_in);
37 |
38 |     // copy the result array back to the CPU
39 |     cudaMemcpy(h_out, d_out, ARRAY_BYTES, cudaMemcpyDeviceToHost);
40 |
41 |     // print out the resulting array, four values per line
42 |     for (int i = 0; i < ARRAY_SIZE; i++) {
43 |         printf("%f", h_out[i]);
44 |         printf(((i % 4) != 3) ? "\t" : "\n");
45 |     }
46 |
47 |     cudaFree(d_in);
48 |     cudaFree(d_out);
49 |
50 |     return 0;
51 | }
--------------------------------------------------------------------------------
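The `cube` kernel above indexes only by `threadIdx.x` and is launched as a single block, so it works only while `ARRAY_SIZE` stays within the per-block thread limit (1024 threads on compute capability 2.0 and later). A common generalization combines the block index, block size and thread index into a global index and skips the extra threads in the last block. The sketch below shows that mapping as plain host-side C++ rather than CUDA, so it can run without a GPU: the two loops stand in for the grid. `numBlocksFor` and `cubeGridEmulated` are hypothetical names, not part of this repository.

```cpp
#include <cassert>
#include <vector>

// Number of blocks needed to cover n elements with blockSize threads per block
// (ceiling division), i.e. the first value a host would pass inside <<<...>>>.
int numBlocksFor(int n, int blockSize) {
    assert(blockSize > 0);
    return (n + blockSize - 1) / blockSize;
}

// Host-side emulation of a bounds-checked cube kernel on a 1D grid.
// Each (blk, tid) pair plays the role of one CUDA thread; in a real kernel,
// blk and tid would be blockIdx.x and threadIdx.x, and blockSize blockDim.x.
// `out` must already hold at least in.size() elements.
void cubeGridEmulated(std::vector<float>& out, const std::vector<float>& in, int blockSize) {
    const int n = static_cast<int>(in.size());
    const int numBlocks = numBlocksFor(n, blockSize);
    for (int blk = 0; blk < numBlocks; ++blk) {
        for (int tid = 0; tid < blockSize; ++tid) {
            const int idx = blk * blockSize + tid;  // global thread index
            if (idx < n) {                          // the last block may be only partly used
                const float f = in[idx];
                out[idx] = f * f * f;
            }
        }
    }
}
```

For 96 elements and 64 threads per block, `numBlocksFor` returns 2 and the last 32 emulated threads of the second block skip the write. A CUDA version of the same idea would also need the element count passed to the kernel (a hypothetical extra `n` parameter that `main.cu` does not have), launched as `cube<<<numBlocks, 64>>>(d_out, d_in, n)`.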
/Lesson4-Reduction/Lesson4-Reduction.vcxproj:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | Debug
6 | Win32
7 |
8 |
9 | Debug
10 | x64
11 |
12 |
13 | Release
14 | Win32
15 |
16 |
17 | Release
18 | x64
19 |
20 |
21 |
22 | {0741C52D-C5E1-4C2F-A8E9-67C29CBF5B97}
23 | Lesson4_Reduction
24 | Lesson3-Reduction
25 |
26 |
27 |
28 | Application
29 | true
30 | MultiByte
31 | v140
32 |
33 |
34 | Application
35 | true
36 | MultiByte
37 | v140
38 |
39 |
40 | Application
41 | false
42 | true
43 | MultiByte
44 | v140
45 |
46 |
47 | Application
48 | false
49 | true
50 | MultiByte
51 | v140
52 |
53 |
54 |
55 |
56 |
57 |
58 |
59 |
60 |
61 |
62 |
63 |
64 |
65 |
66 |
67 |
68 |
69 |
70 |
71 | true
72 |
73 |
74 | true
75 |
76 |
77 |
78 | Level3
79 | Disabled
80 | WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)
81 |
82 |
83 | true
84 | Console
85 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)
86 |
87 |
88 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
89 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
90 |
91 |
92 |
93 |
94 | Level3
95 | Disabled
96 | WIN32;WIN64;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)
97 |
98 |
99 | true
100 | Console
101 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)
102 |
103 |
104 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
105 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
106 |
107 |
108 | 64
109 |
110 |
111 |
112 |
113 | Level3
114 | MaxSpeed
115 | true
116 | true
117 | WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)
118 |
119 |
120 | true
121 | true
122 | true
123 | Console
124 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)
125 |
126 |
127 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
128 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
129 |
130 |
131 |
132 |
133 | Level3
134 | MaxSpeed
135 | true
136 | true
137 | WIN32;WIN64;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)
138 |
139 |
140 | true
141 | true
142 | true
143 | Console
144 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)
145 |
146 |
147 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
148 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
149 |
150 |
151 | 64
152 |
153 |
154 |
155 |
156 |
157 |
158 |
--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/CMakeLists.txt:
--------------------------------------------------------------------------------
1 | ############################################################################
2 | # CMakeLists.txt for OpenCV and CUDA.
3 | # 2012-02-07
4 | # Quan Tran Minh. Edited by Johannes Kast, Michael Sarahan
5 | # quantm@unist.ac.kr kast.jo@googlemail.com msarahan@gmail.com
6 | ############################################################################
7 |
8 | # collect source files
9 |
10 | file( GLOB hdr *.hpp *.h )
11 | file( GLOB cu *.cu)
12 | SET (HW1_files main.cpp reference_calc.cpp compare.cpp)
13 |
14 | CUDA_ADD_EXECUTABLE(HW1 ${HW1_files} ${hdr} ${cu})
--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/HW1.cpp:
--------------------------------------------------------------------------------
1 | #include
2 | #include
3 | #include
4 | #include "utils.h"
5 | #include
6 | #include
7 | #include
8 |
9 | static cv::Mat imageRGBA;
10 | static cv::Mat imageGrey;
11 |
12 | static uchar4 *d_rgbaImage__;
13 | static unsigned char *d_greyImage__;
14 |
15 | static size_t numRows() { return imageRGBA.rows; }
16 | static size_t numCols() { return imageRGBA.cols; }
17 |
18 | //return types are void since any internal error will be handled by quitting
19 | //no point in returning error codes...
20 | //returns a pointer to an RGBA version of the input image
21 | //and a pointer to the single channel grey-scale output
22 | //on both the host and device
23 | static void preProcess(uchar4 **inputImage, unsigned char **greyImage,
24 |                        uchar4 **d_rgbaImage, unsigned char **d_greyImage,
25 |                        const std::string &filename) {
26 |     //make sure the context initializes ok
27 |     checkCudaErrors(cudaFree(0));
28 |
29 |     cv::Mat image;
30 |     image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR);
31 |     if (image.empty()) {
32 |         std::cerr << "Couldn't open file: " << filename << std::endl;
33 |         exit(1);
34 |     }
35 |
36 |     cv::cvtColor(image, imageRGBA, CV_BGR2RGBA);
37 |
38 |     //allocate memory for the output
39 |     imageGrey.create(image.rows, image.cols, CV_8UC1);
40 |
41 |     //This shouldn't ever happen given the way the images are created
42 |     //at least based upon my limited understanding of OpenCV, but better to check
43 |     if (!imageRGBA.isContinuous() || !imageGrey.isContinuous()) {
44 |         std::cerr << "Images aren't continuous!! Exiting." << std::endl;
45 |         exit(1);
46 |     }
47 |
48 |     *inputImage = (uchar4 *)imageRGBA.ptr(0);
49 |     *greyImage = imageGrey.ptr(0);
50 |
51 |     const size_t numPixels = numRows() * numCols();
52 |     //allocate memory on the device for both input and output
53 |     checkCudaErrors(cudaMalloc(d_rgbaImage, sizeof(uchar4) * numPixels));
54 |     checkCudaErrors(cudaMalloc(d_greyImage, sizeof(unsigned char) * numPixels));
55 |     checkCudaErrors(cudaMemset(*d_greyImage, 0, numPixels * sizeof(unsigned char))); //make sure no memory is left lying around
56 |
57 |     //copy input array to the GPU
58 |     checkCudaErrors(cudaMemcpy(*d_rgbaImage, *inputImage, sizeof(uchar4) * numPixels, cudaMemcpyHostToDevice));
59 |
60 |     d_rgbaImage__ = *d_rgbaImage;
61 |     d_greyImage__ = *d_greyImage;
62 | }
63 |
64 | static void postProcess(const std::string& output_file, unsigned char* data_ptr) {
65 |     cv::Mat output(numRows(), numCols(), CV_8UC1, (void*)data_ptr);
66 |
67 |     //output the image
68 |     cv::imwrite(output_file.c_str(), output);
69 | }
70 |
71 | static void cleanup()
72 | {
73 |     //cleanup
74 |     cudaFree(d_rgbaImage__);
75 |     cudaFree(d_greyImage__);
76 | }
77 |
78 | static void generateReferenceImage(std::string input_filename, std::string output_filename)
79 | {
80 |     cv::Mat reference = cv::imread(input_filename, CV_LOAD_IMAGE_GRAYSCALE);
81 |
82 |     cv::imwrite(output_filename, reference);
83 |
84 | }
85 |
--------------------------------------------------------------------------------
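`preProcess` in `HW1.cpp` insists that both images are continuous so that `imageRGBA.ptr(0)` can be treated as one flat, row-major array of `numRows() * numCols()` `uchar4` pixels. The sketch below is a host-side C++ illustration of how a 2D launch could map each thread to one pixel of that array and convert it to grey. It assumes the BT.601 luma weights 0.299, 0.587 and 0.114 for R, G and B with truncation to `unsigned char`; the formula actually used by `reference_calc.cpp` is not shown in this file. `Pixel` stands in for CUDA's `uchar4`, and `gridDimFor` and `rgbaToGreyEmulated` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for CUDA's uchar4 (x = R, y = G, z = B, w = A), so this builds without CUDA headers.
struct Pixel { unsigned char x, y, z, w; };

// Blocks needed along one dimension (ceiling division).
std::size_t gridDimFor(std::size_t extent, std::size_t blockDim) {
    assert(blockDim > 0);
    return (extent + blockDim - 1) / blockDim;
}

// Emulates a 2D grid of blockDim x blockDim threads; each emulated thread converts one pixel.
// `grey` must already hold numRows * numCols elements.
void rgbaToGreyEmulated(const std::vector<Pixel>& rgba, std::vector<unsigned char>& grey,
                        std::size_t numRows, std::size_t numCols, std::size_t blockDim) {
    const std::size_t gridX = gridDimFor(numCols, blockDim);
    const std::size_t gridY = gridDimFor(numRows, blockDim);
    for (std::size_t by = 0; by < gridY; ++by) {
        for (std::size_t bx = 0; bx < gridX; ++bx) {
            for (std::size_t ty = 0; ty < blockDim; ++ty) {
                for (std::size_t tx = 0; tx < blockDim; ++tx) {
                    const std::size_t col = bx * blockDim + tx;
                    const std::size_t row = by * blockDim + ty;
                    if (row >= numRows || col >= numCols) continue;  // threads past the image edge do nothing
                    const std::size_t i = row * numCols + col;       // row-major offset, as in preProcess
                    const Pixel p = rgba[i];
                    const float g = 0.299f * p.x + 0.587f * p.y + 0.114f * p.z;  // assumed weights
                    grey[i] = static_cast<unsigned char>(g);                     // truncates toward zero
                }
            }
        }
    }
}
```

With 16-wide blocks, an image 100 columns wide needs `gridDimFor(100, 16)` = 7 blocks across, and the 12 emulated threads past the last column skip the write. Guarding on both `row` and `col` is what lets the grid be rounded up without reading or writing outside the buffers allocated in `preProcess`.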
/ProblemSet1-RGB2Gray/HW1_differenceImage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet1-RGB2Gray/HW1_differenceImage.png
--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/HW1_reference.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet1-RGB2Gray/HW1_reference.png
--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/Makefile:
--------------------------------------------------------------------------------
1 | NVCC=nvcc
2 |
3 | ###################################
4 | # These are the default install #
5 | # locations on most linux distros #
6 | ###################################
7 |
8 | OPENCV_LIBPATH=/usr/lib
9 | OPENCV_INCLUDEPATH=/usr/include
10 |
11 | ###################################################
12 | # On Macs the default install locations are below #
13 | ###################################################
14 |
15 | #OPENCV_LIBPATH=/usr/local/lib
16 | #OPENCV_INCLUDEPATH=/usr/local/include
17 |
18 | # or if using MacPorts
19 |
20 | #OPENCV_LIBPATH=/opt/local/lib
21 | #OPENCV_INCLUDEPATH=/opt/local/include
22 |
23 | OPENCV_LIBS=-lopencv_core -lopencv_imgproc -lopencv_highgui
24 |
25 | CUDA_INCLUDEPATH=/usr/local/cuda-5.0/include
26 |
27 | ######################################################
28 | # On Macs the default install locations are below #
29 | # ####################################################
30 |
31 | #CUDA_INCLUDEPATH=/usr/local/cuda/include
32 | #CUDA_LIBPATH=/usr/local/cuda/lib
33 |
34 | NVCC_OPTS=-O3 -arch=sm_20 -Xcompiler -Wall -Xcompiler -Wextra -m64
35 |
36 | GCC_OPTS=-O3 -Wall -Wextra -m64
37 |
38 | student: main.o student_func.o compare.o reference_calc.o Makefile
39 | $(NVCC) -o HW1 main.o student_func.o compare.o reference_calc.o -L $(OPENCV_LIBPATH) $(OPENCV_LIBS) $(NVCC_OPTS)
40 |
41 | main.o: main.cpp timer.h utils.h reference_calc.cpp compare.cpp HW1.cpp
42 | g++ -c main.cpp $(GCC_OPTS) -I $(CUDA_INCLUDEPATH) -I $(OPENCV_INCLUDEPATH)
43 |
44 | student_func.o: student_func.cu utils.h
45 | nvcc -c student_func.cu $(NVCC_OPTS)
46 |
47 | compare.o: compare.cpp compare.h
48 | g++ -c compare.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
49 |
50 | reference_calc.o: reference_calc.cpp reference_calc.h
51 | g++ -c reference_calc.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
52 |
53 | clean:
54 | rm -f *.o *.png HW1
55 |
--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/RGB2Gray.vcxproj:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | Debug
6 | Win32
7 |
8 |
9 | Debug
10 | x64
11 |
12 |
13 | Release
14 | Win32
15 |
16 |
17 | Release
18 | x64
19 |
20 |
21 |
22 |
23 |
24 |
25 |
26 |
27 |
28 |
29 |
30 |
31 |
32 |
33 |
34 |
35 |
36 |
37 | {681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}
38 | RGB2Gray
39 | ProblemSet1-RGB2Gray
40 |
41 |
42 |
43 | Application
44 | true
45 | MultiByte
46 | v140
47 |
48 |
49 | Application
50 | true
51 | MultiByte
52 | v140
53 |
54 |
55 | Application
56 | false
57 | true
58 | MultiByte
59 | v140
60 |
61 |
62 | Application
63 | false
64 | true
65 | MultiByte
66 | v140
67 |
68 |
69 |
70 |
71 |
72 |
73 |
74 |
75 |
76 |
77 |
78 |
79 |
80 |
81 |
82 |
83 |
84 |
85 |
86 | true
87 |
88 |
89 | true
90 |
91 |
92 |
93 | Level3
94 | Disabled
95 | WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)
96 |
97 |
98 | true
99 | Console
100 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)
101 |
102 |
103 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
104 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
105 |
106 |
107 |
108 |
109 | Level3
110 | Disabled
111 | WIN32;WIN64;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)
112 |
113 |
114 | true
115 | Console
116 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)
117 |
118 |
119 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
120 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
121 |
122 |
123 | 64
124 |
125 |
126 |
127 |
128 | Level3
129 | MaxSpeed
130 | true
131 | true
132 | WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)
133 |
134 |
135 | true
136 | true
137 | true
138 | Console
139 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)
140 |
141 |
142 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
143 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
144 |
145 |
146 |
147 |
148 | Level3
149 | MaxSpeed
150 | true
151 | true
152 | WIN32;WIN64;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)
153 | %(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);C:\opencv\build\include;C:\opencv\build\include\opencv2
154 |
155 |
156 | true
157 | true
158 | true
159 | Console
160 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;opencv_world320.lib;%(AdditionalDependencies)
161 | %(AdditionalLibraryDirectories);$(CudaToolkitLibDir);C:\opencv\build\x64\vc14\lib
162 |
163 |
164 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
165 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
166 |
167 |
168 | 64
169 |
170 |
171 |
172 |
173 |
174 |
175 |
--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/cinque_terre.gold:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet1-RGB2Gray/cinque_terre.gold
--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/cinque_terre_gray.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet1-RGB2Gray/cinque_terre_gray.jpg
--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/cinque_terre_small.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet1-RGB2Gray/cinque_terre_small.jpg
--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/compare.cpp:
--------------------------------------------------------------------------------
1 | #include <opencv2/core/core.hpp>
2 | #include <opencv2/highgui/highgui.hpp>
3 | #include <iostream>
4 |
5 | #include "utils.h"
6 |
7 | void compareImages(std::string reference_filename, std::string test_filename,
8 | bool useEpsCheck, double perPixelError, double globalError)
9 | {
10 | cv::Mat reference = cv::imread(reference_filename, -1);
11 | cv::Mat test = cv::imread(test_filename, -1);
12 |
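13 | cv::Mat diff;
14 | cv::absdiff(reference, test, diff); //|reference - test| per byte; plain uchar subtraction would clip negative results to 0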
15 | cv::Mat diffSingleChannel = diff.reshape(1, 0); //convert to 1 channel, same # rows
16 |
17 | double minVal, maxVal;
18 |
19 | cv::minMaxLoc(diffSingleChannel, &minVal, &maxVal, NULL, NULL); //NULL because we don't care about location
20 |
21 | //now perform transform so that we bump values to the full range
22 |
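23 | if (maxVal > minVal) //identical images give maxVal == minVal; skip the stretch to avoid dividing by zero
24 | diffSingleChannel = (diffSingleChannel - minVal) * (255. / (maxVal - minVal));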
25 | diff = diffSingleChannel.reshape(reference.channels(), 0);
26 |
27 | cv::imwrite("HW1_differenceImage.png", diff);
28 | //OK, now we can start comparing values...
29 | unsigned char *referencePtr = reference.ptr(0);
30 | unsigned char *testPtr = test.ptr(0);
31 |
32 | if (useEpsCheck) {
33 | checkResultsEps(referencePtr, testPtr, reference.rows * reference.cols * reference.channels(), perPixelError, globalError);
34 | }
35 | else
36 | {
37 | checkResultsExact(referencePtr, testPtr, reference.rows * reference.cols * reference.channels());
38 | }
39 |
40 | std::cout << "PASS" << std::endl;
41 | return;
42 | }
43 |
--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/compare.h:
--------------------------------------------------------------------------------
1 | #ifndef COMPARE_H__
2 | #define COMPARE_H__
3 | #include <string>
4 | void compareImages(std::string reference_filename, std::string test_filename,
5 | bool useEpsCheck, double perPixelError, double globalError);
6 |
7 | #endif
8 |
--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/main.cpp:
--------------------------------------------------------------------------------
1 | //Udacity HW1 Solution
2 |
3 | #include <iostream>
4 | #include "timer.h"
5 | #include "utils.h"
6 | #include <string>
7 | #include <stdio.h>
8 | #include "reference_calc.h"
9 | #include "compare.h"
10 |
11 | void your_rgba_to_greyscale(const uchar4 * const h_rgbaImage,
12 | uchar4 * const d_rgbaImage,
13 | unsigned char* const d_greyImage,
14 | size_t numRows, size_t numCols);
15 |
16 | //include the definitions of the above functions for this homework
17 | #include "HW1.cpp"
18 |
19 | int main(int argc, char **argv) {
20 | uchar4 *h_rgbaImage, *d_rgbaImage;
21 | unsigned char *h_greyImage, *d_greyImage;
22 |
23 | std::string input_file;
24 | std::string output_file;
25 | std::string reference_file;
26 | double perPixelError = 0.0;
27 | double globalError = 0.0;
28 | bool useEpsCheck = false;
29 | switch (argc)
30 | {
31 | case 2:
32 | input_file = std::string(argv[1]);
33 | output_file = "HW1_output.png";
34 | reference_file = "HW1_reference.png";
35 | break;
36 | case 3:
37 | input_file = std::string(argv[1]);
38 | output_file = std::string(argv[2]);
39 | reference_file = "HW1_reference.png";
40 | break;
41 | case 4:
42 | input_file = std::string(argv[1]);
43 | output_file = std::string(argv[2]);
44 | reference_file = std::string(argv[3]);
45 | break;
46 | case 6:
47 | useEpsCheck=true;
48 | input_file = std::string(argv[1]);
49 | output_file = std::string(argv[2]);
50 | reference_file = std::string(argv[3]);
51 | perPixelError = atof(argv[4]);
52 | globalError = atof(argv[5]);
53 | break;
54 | default:
55 | std::cerr << "Usage: ./HW1 input_file [output_filename] [reference_filename] [perPixelError] [globalError]" << std::endl;
56 | exit(1);
57 | }
58 | //load the image and give us our input and output pointers
59 | preProcess(&h_rgbaImage, &h_greyImage, &d_rgbaImage, &d_greyImage, input_file);
60 |
61 | GpuTimer timer;
62 | timer.Start();
63 | //call the students' code
64 | your_rgba_to_greyscale(h_rgbaImage, d_rgbaImage, d_greyImage, numRows(), numCols());
65 | timer.Stop();
66 | cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
67 |
68 | int err = printf("Your code ran in: %f msecs.\n", timer.Elapsed());
69 |
70 | if (err < 0) {
71 | //Couldn't print! Probably the student closed stdout - bad news
72 | std::cerr << "Couldn't print timing information! STDOUT Closed!" << std::endl;
73 | exit(1);
74 | }
75 |
76 | size_t numPixels = numRows()*numCols();
77 | checkCudaErrors(cudaMemcpy(h_greyImage, d_greyImage, sizeof(unsigned char) * numPixels, cudaMemcpyDeviceToHost));
78 |
79 | //check results and output the grey image
80 | postProcess(output_file, h_greyImage);
81 |
82 | referenceCalculation(h_rgbaImage, h_greyImage, numRows(), numCols());
83 |
84 | postProcess(reference_file, h_greyImage);
85 |
86 | //generateReferenceImage(input_file, reference_file);
87 | compareImages(reference_file, output_file, useEpsCheck, perPixelError,
88 | globalError);
89 |
90 | cleanup();
91 |
92 | return 0;
93 | }
94 |
--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/reference_calc.cpp:
--------------------------------------------------------------------------------
1 | // for uchar4 struct
2 | #include <cuda_runtime.h>
3 |
4 | void referenceCalculation(const uchar4* const rgbaImage,
5 | unsigned char *const greyImage,
6 | size_t numRows,
7 | size_t numCols)
8 | {
9 | for (size_t r = 0; r < numRows; ++r) {
10 | for (size_t c = 0; c < numCols; ++c) {
11 | uchar4 rgba = rgbaImage[r * numCols + c];
12 | float channelSum = .299f * rgba.x + .587f * rgba.y + .114f * rgba.z;
13 | greyImage[r * numCols + c] = channelSum;
14 | }
15 | }
16 | }
17 |
18 |
--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/reference_calc.h:
--------------------------------------------------------------------------------
1 | #ifndef REFERENCE_H__
2 | #define REFERENCE_H__
3 |
4 | void referenceCalculation(const uchar4* const rgbaImage,
5 | unsigned char *const greyImage,
6 | size_t numRows,
7 | size_t numCols);
8 |
9 | #endif
--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/student_func.cu:
--------------------------------------------------------------------------------
1 | // Homework 1
2 | // Color to Greyscale Conversion
3 |
4 | //A common way to represent color images is known as RGBA - the color
5 | //is specified by how much Red, Green and Blue is in it.
6 | //The 'A' stands for Alpha and is used for transparency; it will be
7 | //ignored in this homework.
8 |
9 | //Each channel Red, Blue, Green and Alpha is represented by one byte.
10 | //Since we are using one byte for each color there are 256 different
11 | //possible values for each color. This means we use 4 bytes per pixel.
12 |
13 | //Greyscale images are represented by a single intensity value per pixel
14 | //which is one byte in size.
15 |
16 | //To convert an image from color to greyscale, one simple method is to
17 | //set the intensity to the average of the RGB channels. But we will
18 | //use a more sophisticated method that takes into account how the eye
19 | //perceives color and weights the channels unequally.
20 |
21 | //The eye responds most strongly to green followed by red and then blue.
22 | //The NTSC (National Television System Committee) recommends the following
23 | //formula for color to greyscale conversion:
24 |
25 | //I = .299f * R + .587f * G + .114f * B
26 |
27 | //Notice the trailing f's on the numbers which indicate that they are
28 | //single precision floating point constants and not double precision
29 | //constants.
30 |
31 | //You should fill in the kernel as well as set the block and grid sizes
32 | //so that the entire image is processed.
33 |
34 | #include "utils.h"
35 | #include "device_launch_parameters.h"
36 |
37 | const size_t blockWidth = 32; //threads per block on one dimension (32*32 total)
38 |
39 | __global__
40 | void rgba_to_greyscale(const uchar4* const rgbaImage,
41 | unsigned char* const greyImage,
42 | size_t numRows, size_t numCols)
43 | {
44 | //Fill in the kernel to convert from color to greyscale
45 | //the mapping from components of a uchar4 to RGBA is:
46 | // .x -> R ; .y -> G ; .z -> B ; .w -> A
47 | //
48 | //The output (greyImage) at each pixel should be the result of
49 | //applying the formula: output = .299f * R + .587f * G + .114f * B;
50 | //Note: We will be ignoring the alpha channel for this conversion
51 |
52 | //First create a mapping from the 2D block and grid locations
53 | //to an absolute 2D location in the image, then use that to
54 | //calculate a 1D offset
55 | size_t idx_x = threadIdx.x + blockIdx.x*blockDim.x;
56 | size_t idx_y = threadIdx.y + blockIdx.y*blockDim.y;
57 |
58 | if (idx_x >= numRows || idx_y >= numCols) return; //it can happen on the "remainder" block
59 |
60 | size_t idxvec = idx_x*numCols + idx_y;
61 | uchar4 rgb_value = rgbaImage[idxvec];
62 | greyImage[idxvec] = (unsigned char)(.299f*rgb_value.x + .587f*rgb_value.y + .114f*rgb_value.z);
63 | }
64 |
65 | void your_rgba_to_greyscale(const uchar4 * const h_rgbaImage, uchar4 * const d_rgbaImage,
66 | unsigned char* const d_greyImage, size_t numRows, size_t numCols)
67 | {
68 | //Use blockWidth x blockWidth threads per block and enough blocks
69 | //in each dimension to cover every row and column of the image
70 |
71 | const dim3 blockSize(blockWidth,blockWidth, 1);
72 | unsigned int numBlocksX = (unsigned int)((numRows + blockWidth - 1) / blockWidth); //ceil(numRows / blockWidth)
73 | unsigned int numBlocksY = (unsigned int)((numCols + blockWidth - 1) / blockWidth); //ceil(numCols / blockWidth)
74 | const dim3 gridSize(numBlocksX,numBlocksY, 1);
75 | rgba_to_greyscale<<<gridSize, blockSize>>>(d_rgbaImage, d_greyImage, numRows, numCols);
76 |
77 | cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
78 |
79 | }
80 |
--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/timer.h:
--------------------------------------------------------------------------------
1 | #ifndef GPU_TIMER_H__
2 | #define GPU_TIMER_H__
3 |
4 | #include <cuda_runtime.h>
5 |
6 | struct GpuTimer
7 | {
8 | cudaEvent_t start;
9 | cudaEvent_t stop;
10 |
11 | GpuTimer()
12 | {
13 | cudaEventCreate(&start);
14 | cudaEventCreate(&stop);
15 | }
16 |
17 | ~GpuTimer()
18 | {
19 | cudaEventDestroy(start);
20 | cudaEventDestroy(stop);
21 | }
22 |
23 | void Start()
24 | {
25 | cudaEventRecord(start, 0);
26 | }
27 |
28 | void Stop()
29 | {
30 | cudaEventRecord(stop, 0);
31 | }
32 |
33 | float Elapsed()
34 | {
35 | float elapsed;
36 | cudaEventSynchronize(stop);
37 | cudaEventElapsedTime(&elapsed, start, stop);
38 | return elapsed;
39 | }
40 | };
41 |
42 | #endif /* GPU_TIMER_H__ */
43 |
--------------------------------------------------------------------------------
/ProblemSet1-RGB2Gray/utils.h:
--------------------------------------------------------------------------------
1 | #ifndef UTILS_H__
2 | #define UTILS_H__
3 |
4 | #include <iostream>
5 | #include <iomanip>
6 | #include <cuda.h>
7 | #include <cuda_runtime.h>
8 | #include <cuda_runtime_api.h>
9 | #include <cassert>
10 | #include <cmath>
11 | #include <algorithm>
12 |
13 | #define checkCudaErrors(val) check( (val), #val, __FILE__, __LINE__)
14 |
15 | template<typename T>
16 | void check(T err, const char* const func, const char* const file, const int line) {
17 | if (err != cudaSuccess) {
18 | std::cerr << "CUDA error at: " << file << ":" << line << std::endl;
19 | std::cerr << cudaGetErrorString(err) << " " << func << std::endl;
20 | exit(1);
21 | }
22 | }
23 |
24 | template<typename T>
25 | void checkResultsExact(const T* const ref, const T* const gpu, size_t numElem) {
26 | //check that the GPU result matches the CPU result
27 | for (size_t i = 0; i < numElem; ++i) {
28 | if (ref[i] != gpu[i]) {
29 | std::cerr << "Difference at pos " << i << std::endl;
30 | //the + is magic to convert char to int without messing
31 | //with other types
32 | std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
33 | "\nGPU : " << +gpu[i] << std::endl;
34 | exit(1);
35 | }
36 | }
37 | }
38 |
39 | template<typename T>
40 | void checkResultsEps(const T* const ref, const T* const gpu, size_t numElem, double eps1, double eps2) {
41 | assert(eps1 >= 0 && eps2 >= 0);
42 | unsigned long long totalDiff = 0;
43 | unsigned numSmallDifferences = 0;
44 | for (size_t i = 0; i < numElem; ++i) {
45 | //subtract smaller from larger in case of unsigned types
46 | T smaller = std::min(ref[i], gpu[i]);
47 | T larger = std::max(ref[i], gpu[i]);
48 | T diff = larger - smaller;
49 | if (diff > 0 && diff <= eps1) {
50 | numSmallDifferences++;
51 | }
52 | else if (diff > eps1) {
53 | std::cerr << "Difference at pos " << +i << " exceeds tolerance of " << eps1 << std::endl;
54 | std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
55 | "\nGPU : " << +gpu[i] << std::endl;
56 | exit(1);
57 | }
58 | totalDiff += diff * diff;
59 | }
60 | double percentSmallDifferences = (double)numSmallDifferences / (double)numElem;
61 | if (percentSmallDifferences > eps2) {
62 | std::cerr << "Total percentage of non-zero pixel difference between the two images exceeds " << 100.0 * eps2 << "%" << std::endl;
63 | std::cerr << "Percentage of non-zero pixel differences: " << 100.0 * percentSmallDifferences << "%" << std::endl;
64 | exit(1);
65 | }
66 | }
67 |
68 | //Uses the Autodesk method of image comparison
69 | //Note that the tolerance here is in PIXELS, not a percentage of input pixels
70 | template<typename T>
71 | void checkResultsAutodesk(const T* const ref, const T* const gpu, size_t numElem, double variance, size_t tolerance)
72 | {
73 |
74 | size_t numBadPixels = 0;
75 | for (size_t i = 0; i < numElem; ++i) {
76 | T smaller = std::min(ref[i], gpu[i]);
77 | T larger = std::max(ref[i], gpu[i]);
78 | T diff = larger - smaller;
79 | if (diff > variance)
80 | ++numBadPixels;
81 | }
82 |
83 | if (numBadPixels > tolerance) {
84 | std::cerr << "Too many bad pixels in the image: " << numBadPixels << "/" << tolerance << std::endl;
85 | exit(1);
86 | }
87 | }
88 |
89 | #endif
90 |
--------------------------------------------------------------------------------
/ProblemSet2-Blur/CMakeLists.txt:
--------------------------------------------------------------------------------
1 | ############################################################################
2 | # CMakeLists.txt for OpenCV and CUDA.
3 | # 2012-02-07
4 | # Quan Tran Minh. edit by Johannes Kast, Michael Sarahan
5 | # quantm@unist.ac.kr kast.jo@googlemail.com msarahan@gmail.com
6 | ############################################################################
7 |
8 | # collect source files
9 |
10 | file( GLOB hdr *.hpp *.h )
11 | file( GLOB cu *.cu)
12 | SET (HW2_files main.cpp reference_calc.cpp compare.cpp)
13 |
14 | CUDA_ADD_EXECUTABLE(HW2 ${HW2_files} ${hdr} ${cu})
15 |
--------------------------------------------------------------------------------
/ProblemSet2-Blur/HW2.cpp:
--------------------------------------------------------------------------------
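1 | #include <opencv2/core/core.hpp>
2 | #include <opencv2/highgui/highgui.hpp>
3 | #include <opencv2/imgproc/imgproc.hpp>
4 | #include "utils.h"
5 | #include <cuda.h>
6 | #include <cuda_runtime.h>
7 | #include <string>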
8 |
9 | static cv::Mat imageInputRGBA;
10 | static cv::Mat imageOutputRGBA;
11 |
12 | static uchar4 *d_inputImageRGBA__;
13 | static uchar4 *d_outputImageRGBA__;
14 |
15 | static float *h_filter__;
16 |
17 | static size_t numRows() { return imageInputRGBA.rows; }
18 | static size_t numCols() { return imageInputRGBA.cols; }
19 |
20 | //return types are void since any internal error will be handled by quitting
21 | //no point in returning error codes...
22 | //returns host and device pointers to the RGBA input and output images,
23 | //device buffers for the blurred red, green and blue channels,
24 | //and the normalized Gaussian filter (plus its width) on the host
25 | static void preProcess(uchar4 **h_inputImageRGBA, uchar4 **h_outputImageRGBA,
26 | uchar4 **d_inputImageRGBA, uchar4 **d_outputImageRGBA,
27 | unsigned char **d_redBlurred,
28 | unsigned char **d_greenBlurred,
29 | unsigned char **d_blueBlurred,
30 | float **h_filter, int *filterWidth,
31 | const std::string &filename) {
32 |
33 | //make sure the context initializes ok
34 | checkCudaErrors(cudaFree(0));
35 |
36 | cv::Mat image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR);
37 | if (image.empty()) {
38 | std::cerr << "Couldn't open file: " << filename << std::endl;
39 | exit(1);
40 | }
41 |
42 | cv::cvtColor(image, imageInputRGBA, CV_BGR2RGBA);
43 |
44 | //allocate memory for the output
45 | imageOutputRGBA.create(image.rows, image.cols, CV_8UC4);
46 |
47 | //This shouldn't ever happen given the way the images are created
48 | //at least based upon my limited understanding of OpenCV, but better to check
49 | if (!imageInputRGBA.isContinuous() || !imageOutputRGBA.isContinuous()) {
50 | std::cerr << "Images aren't continuous!! Exiting." << std::endl;
51 | exit(1);
52 | }
53 |
54 | *h_inputImageRGBA = (uchar4 *)imageInputRGBA.ptr(0);
55 | *h_outputImageRGBA = (uchar4 *)imageOutputRGBA.ptr(0);
56 |
57 | const size_t numPixels = numRows() * numCols();
58 | //allocate memory on the device for both input and output
59 | checkCudaErrors(cudaMalloc(d_inputImageRGBA, sizeof(uchar4) * numPixels));
60 | checkCudaErrors(cudaMalloc(d_outputImageRGBA, sizeof(uchar4) * numPixels));
61 | checkCudaErrors(cudaMemset(*d_outputImageRGBA, 0, numPixels * sizeof(uchar4))); //make sure no stale memory is left lying around
62 |
63 | //copy input array to the GPU
64 | checkCudaErrors(cudaMemcpy(*d_inputImageRGBA, *h_inputImageRGBA, sizeof(uchar4) * numPixels, cudaMemcpyHostToDevice));
65 |
66 | d_inputImageRGBA__ = *d_inputImageRGBA;
67 | d_outputImageRGBA__ = *d_outputImageRGBA;
68 |
69 | //now create the filter that they will use
70 | const int blurKernelWidth = 9;
71 | const float blurKernelSigma = 2.;
72 |
73 | *filterWidth = blurKernelWidth;
74 |
75 | //create and fill the filter we will convolve with
76 | *h_filter = new float[blurKernelWidth * blurKernelWidth];
77 | h_filter__ = *h_filter;
78 |
79 | float filterSum = 0.f; //for normalization
80 |
81 | for (int r = -blurKernelWidth/2; r <= blurKernelWidth/2; ++r) {
82 | for (int c = -blurKernelWidth/2; c <= blurKernelWidth/2; ++c) {
83 | float filterValue = expf( -(float)(c * c + r * r) / (2.f * blurKernelSigma * blurKernelSigma));
84 | (*h_filter)[(r + blurKernelWidth/2) * blurKernelWidth + c + blurKernelWidth/2] = filterValue;
85 | filterSum += filterValue;
86 | }
87 | }
88 |
89 | float normalizationFactor = 1.f / filterSum;
90 |
91 | for (int r = -blurKernelWidth/2; r <= blurKernelWidth/2; ++r) {
92 | for (int c = -blurKernelWidth/2; c <= blurKernelWidth/2; ++c) {
93 | (*h_filter)[(r + blurKernelWidth/2) * blurKernelWidth + c + blurKernelWidth/2] *= normalizationFactor;
94 | }
95 | }
96 |
97 | //allocate and zero the device buffers for the blurred red, green and blue channels
98 | checkCudaErrors(cudaMalloc(d_redBlurred, sizeof(unsigned char) * numPixels));
99 | checkCudaErrors(cudaMalloc(d_greenBlurred, sizeof(unsigned char) * numPixels));
100 | checkCudaErrors(cudaMalloc(d_blueBlurred, sizeof(unsigned char) * numPixels));
101 | checkCudaErrors(cudaMemset(*d_redBlurred, 0, sizeof(unsigned char) * numPixels));
102 | checkCudaErrors(cudaMemset(*d_greenBlurred, 0, sizeof(unsigned char) * numPixels));
103 | checkCudaErrors(cudaMemset(*d_blueBlurred, 0, sizeof(unsigned char) * numPixels));
104 | }
105 |
106 | static void postProcess(const std::string& output_file, uchar4* data_ptr) {
107 | cv::Mat output(numRows(), numCols(), CV_8UC4, (void*)data_ptr);
108 |
109 | cv::Mat imageOutputBGR;
110 | cv::cvtColor(output, imageOutputBGR, CV_RGBA2BGR);
111 | //output the image
112 | cv::imwrite(output_file.c_str(), imageOutputBGR);
113 | }
114 |
115 | static void cleanUp(void)
116 | {
117 | cudaFree(d_inputImageRGBA__);
118 | cudaFree(d_outputImageRGBA__);
119 | delete[] h_filter__;
120 | }
121 |
122 |
123 | // An unused bit of code showing how to accomplish this assignment using OpenCV. It is much faster
124 | // than the naive implementation in reference_calc.cpp.
125 | static void generateReferenceImage(std::string input_file, std::string reference_file, int kernel_size)
126 | {
127 | cv::Mat input = cv::imread(input_file);
128 | // Create an identical image for the output as a placeholder
129 | cv::Mat reference = cv::imread(input_file);
130 | cv::GaussianBlur(input, reference, cv::Size2i(kernel_size, kernel_size),0);
131 | cv::imwrite(reference_file, reference);
132 | }
133 |
--------------------------------------------------------------------------------
/ProblemSet2-Blur/HW2_differenceImage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet2-Blur/HW2_differenceImage.png
--------------------------------------------------------------------------------
/ProblemSet2-Blur/HW2_reference.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet2-Blur/HW2_reference.png
--------------------------------------------------------------------------------
/ProblemSet2-Blur/Makefile:
--------------------------------------------------------------------------------
1 | NVCC=nvcc
2 |
3 | ###################################
4 | # These are the default install #
5 | # locations on most linux distros #
6 | ###################################
7 |
8 | OPENCV_LIBPATH=/usr/lib
9 | OPENCV_INCLUDEPATH=/usr/include
10 |
11 | ###################################################
12 | # On Macs the default install locations are below #
13 | ###################################################
14 |
15 | #OPENCV_LIBPATH=/usr/local/lib
16 | #OPENCV_INCLUDEPATH=/usr/local/include
17 |
18 | OPENCV_LIBS=-lopencv_core -lopencv_imgproc -lopencv_highgui
19 | CUDA_INCLUDEPATH=/usr/local/cuda-5.0/include
20 |
21 | ######################################################
22 | # On Macs the default install locations are below #
23 | ######################################################
24 |
25 | #CUDA_INCLUDEPATH=/usr/local/cuda/include
26 | #CUDA_LIBPATH=/usr/local/cuda/lib
27 |
28 | NVCC_OPTS=-O3 -arch=sm_20 -Xcompiler -Wall -Xcompiler -Wextra -m64
29 |
30 | GCC_OPTS=-O3 -Wall -Wextra -m64
31 |
32 | student: main.o student_func.o compare.o reference_calc.o Makefile
33 | $(NVCC) -o HW2 main.o student_func.o compare.o reference_calc.o -L $(OPENCV_LIBPATH) $(OPENCV_LIBS) $(NVCC_OPTS)
34 |
35 | main.o: main.cpp timer.h utils.h HW2.cpp
36 | g++ -c main.cpp $(GCC_OPTS) -I $(OPENCV_INCLUDEPATH) -I $(CUDA_INCLUDEPATH)
37 |
38 | student_func.o: student_func.cu reference_calc.cpp utils.h
39 | nvcc -c student_func.cu $(NVCC_OPTS)
40 |
41 | compare.o: compare.cpp compare.h
42 | g++ -c compare.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
43 |
44 | reference_calc.o: reference_calc.cpp reference_calc.h
45 | g++ -c reference_calc.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
46 |
47 | clean:
48 | rm -f *.o *.png HW2
49 |
--------------------------------------------------------------------------------
/ProblemSet2-Blur/cinque_terre.gold:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet2-Blur/cinque_terre.gold
--------------------------------------------------------------------------------
/ProblemSet2-Blur/cinque_terre_blur.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet2-Blur/cinque_terre_blur.jpg
--------------------------------------------------------------------------------
/ProblemSet2-Blur/cinque_terre_small.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet2-Blur/cinque_terre_small.jpg
--------------------------------------------------------------------------------
/ProblemSet2-Blur/compare.cpp:
--------------------------------------------------------------------------------
1 | #include <opencv2/core/core.hpp>
2 | #include <opencv2/highgui/highgui.hpp>
3 | #include <iostream>
4 |
5 | #include "utils.h"
6 |
7 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck,
8 | double perPixelError, double globalError)
9 | {
10 | cv::Mat reference = cv::imread(reference_filename, -1);
11 | cv::Mat test = cv::imread(test_filename, -1);
12 |
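13 | cv::Mat diff;
14 | cv::absdiff(reference, test, diff); //|reference - test| per byte; plain uchar subtraction would clip negative results to 0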
15 | cv::Mat diffSingleChannel = diff.reshape(1, 0); //convert to 1 channel, same # rows
16 |
17 | double minVal, maxVal;
18 |
19 | cv::minMaxLoc(diffSingleChannel, &minVal, &maxVal, NULL, NULL); //NULL because we don't care about location
20 |
21 | //now perform transform so that we bump values to the full range
22 |
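23 | if (maxVal > minVal) //identical images give maxVal == minVal; skip the stretch to avoid dividing by zero
24 | diffSingleChannel = (diffSingleChannel - minVal) * (255. / (maxVal - minVal));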
25 | diff = diffSingleChannel.reshape(reference.channels(), 0);
26 |
27 | cv::imwrite("HW2_differenceImage.png", diff);
28 | //OK, now we can start comparing values...
29 | unsigned char *referencePtr = reference.ptr(0);
30 | unsigned char *testPtr = test.ptr(0);
31 |
32 | if (useEpsCheck) {
33 | checkResultsEps(referencePtr, testPtr, reference.rows * reference.cols * reference.channels(), perPixelError, globalError);
34 | }
35 | else
36 | {
37 | checkResultsExact(referencePtr, testPtr, reference.rows * reference.cols * reference.channels());
38 | }
39 |
40 | std::cout << "PASS" << std::endl;
41 | return;
42 | }
--------------------------------------------------------------------------------
/ProblemSet2-Blur/compare.h:
--------------------------------------------------------------------------------
1 | #ifndef COMPARE_H__
2 | #define COMPARE_H__
3 | #include <string>
4 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck,
5 | double perPixelError, double globalError);
6 |
7 | #endif
--------------------------------------------------------------------------------
/ProblemSet2-Blur/main.cpp:
--------------------------------------------------------------------------------
1 | //Udacity HW2 Driver
2 |
3 | #include <iostream>
4 | #include "timer.h"
5 | #include "utils.h"
6 | #include <string>
7 | #include <stdio.h>
8 |
9 | #include "reference_calc.h"
10 | #include "compare.h"
11 |
12 | //include the definitions of the above functions for this homework
13 | #include "HW2.cpp"
14 |
15 |
16 | /******* DEFINED IN student_func.cu *********/
17 |
18 | void your_gaussian_blur(const uchar4 * const h_inputImageRGBA, uchar4 * const d_inputImageRGBA,
19 | uchar4* const d_outputImageRGBA,
20 | const size_t numRows, const size_t numCols,
21 | unsigned char *d_redBlurred,
22 | unsigned char *d_greenBlurred,
23 | unsigned char *d_blueBlurred,
24 | const int filterWidth);
25 |
26 | void allocateMemoryAndCopyToGPU(const size_t numRowsImage, const size_t numColsImage,
27 | const float* const h_filter, const size_t filterWidth);
28 |
29 |
30 | /******* Begin main *********/
31 |
32 | int main(int argc, char **argv) {
33 | uchar4 *h_inputImageRGBA, *d_inputImageRGBA;
34 | uchar4 *h_outputImageRGBA, *d_outputImageRGBA;
35 | unsigned char *d_redBlurred, *d_greenBlurred, *d_blueBlurred;
36 |
37 | float *h_filter;
38 | int filterWidth;
39 |
40 | std::string input_file;
41 | std::string output_file;
42 | std::string reference_file;
43 | double perPixelError = 0.0;
44 | double globalError = 0.0;
45 | bool useEpsCheck = false;
46 | switch (argc)
47 | {
48 | case 2:
49 | input_file = std::string(argv[1]);
50 | output_file = "HW2_output.png";
51 | reference_file = "HW2_reference.png";
52 | break;
53 | case 3:
54 | input_file = std::string(argv[1]);
55 | output_file = std::string(argv[2]);
56 | reference_file = "HW2_reference.png";
57 | break;
58 | case 4:
59 | input_file = std::string(argv[1]);
60 | output_file = std::string(argv[2]);
61 | reference_file = std::string(argv[3]);
62 | break;
63 | case 6:
64 | useEpsCheck=true;
65 | input_file = std::string(argv[1]);
66 | output_file = std::string(argv[2]);
67 | reference_file = std::string(argv[3]);
68 | perPixelError = atof(argv[4]);
69 | globalError = atof(argv[5]);
70 | break;
71 | default:
72 | std::cerr << "Usage: ./HW2 input_file [output_filename] [reference_filename] [perPixelError] [globalError]" << std::endl;
73 | exit(1);
74 | }
75 | //load the image and give us our input and output pointers
76 | preProcess(&h_inputImageRGBA, &h_outputImageRGBA, &d_inputImageRGBA, &d_outputImageRGBA,
77 | &d_redBlurred, &d_greenBlurred, &d_blueBlurred,
78 | &h_filter, &filterWidth, input_file);
79 |
80 | allocateMemoryAndCopyToGPU(numRows(), numCols(), h_filter, filterWidth);
81 | GpuTimer timer;
82 | timer.Start();
83 | //call the students' code
84 | your_gaussian_blur(h_inputImageRGBA, d_inputImageRGBA, d_outputImageRGBA, numRows(), numCols(),
85 | d_redBlurred, d_greenBlurred, d_blueBlurred, filterWidth);
86 | timer.Stop();
87 | cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
88 | int err = printf("Your code ran in: %f msecs.\n", timer.Elapsed());
89 |
90 | if (err < 0) {
91 | //Couldn't print! Probably the student closed stdout - bad news
92 | std::cerr << "Couldn't print timing information! STDOUT Closed!" << std::endl;
93 | exit(1);
94 | }
95 |
96 | //check results and output the blurred image
97 |
98 | size_t numPixels = numRows()*numCols();
99 | //copy the output back to the host
100 | checkCudaErrors(cudaMemcpy(h_outputImageRGBA, d_outputImageRGBA__, sizeof(uchar4) * numPixels, cudaMemcpyDeviceToHost));
101 |
102 | postProcess(output_file, h_outputImageRGBA);
103 |
104 | referenceCalculation(h_inputImageRGBA, h_outputImageRGBA,
105 | numRows(), numCols(),
106 | h_filter, filterWidth);
107 |
108 | postProcess(reference_file, h_outputImageRGBA);
109 |
110 | // Shortcut: generate the reference image with OpenCV instead
111 | //generateReferenceImage(input_file, reference_file, filterWidth);
112 |
113 | compareImages(reference_file, output_file, useEpsCheck, perPixelError, globalError);
114 |
115 | checkCudaErrors(cudaFree(d_redBlurred));
116 | checkCudaErrors(cudaFree(d_greenBlurred));
117 | checkCudaErrors(cudaFree(d_blueBlurred));
118 |
119 | cleanUp();
120 |
121 | return 0;
122 | }
123 |
--------------------------------------------------------------------------------
/ProblemSet2-Blur/reference_calc.cpp:
--------------------------------------------------------------------------------
1 | #include <algorithm>
2 | #include <cassert>
3 | // for uchar4 struct
4 | #include <cuda_runtime.h>
5 |
6 | void channelConvolution(const unsigned char* const channel,
7 | unsigned char* const channelBlurred,
8 | const size_t numRows, const size_t numCols,
9 | const float *filter, const int filterWidth)
10 | {
11 | //Dealing with an even width filter is trickier
12 | assert(filterWidth % 2 == 1);
13 |
14 | //For every pixel in the image
15 | for (int r = 0; r < (int)numRows; ++r) {
16 | for (int c = 0; c < (int)numCols; ++c) {
17 | float result = 0.f;
18 | //For every value in the filter around the pixel (c, r)
19 | for (int filter_r = -filterWidth/2; filter_r <= filterWidth/2; ++filter_r) {
20 | for (int filter_c = -filterWidth/2; filter_c <= filterWidth/2; ++filter_c) {
21 | //Find the global image position for this filter position
22 | //clamp to boundary of the image
23 | int image_r = std::min(std::max(r + filter_r, 0), static_cast<int>(numRows - 1));
24 | int image_c = std::min(std::max(c + filter_c, 0), static_cast<int>(numCols - 1));
25 |
26 | float image_value = static_cast<float>(channel[image_r * numCols + image_c]);
27 | float filter_value = filter[(filter_r + filterWidth/2) * filterWidth + filter_c + filterWidth/2];
28 |
29 | result += image_value * filter_value;
30 | }
31 | }
32 |
33 | channelBlurred[r * numCols + c] = result;
34 | }
35 | }
36 | }
37 |
38 | void referenceCalculation(const uchar4* const rgbaImage, uchar4 *const outputImage,
39 | size_t numRows, size_t numCols,
40 | const float* const filter, const int filterWidth)
41 | {
42 | unsigned char *red = new unsigned char[numRows * numCols];
43 | unsigned char *blue = new unsigned char[numRows * numCols];
44 | unsigned char *green = new unsigned char[numRows * numCols];
45 |
46 | unsigned char *redBlurred = new unsigned char[numRows * numCols];
47 | unsigned char *blueBlurred = new unsigned char[numRows * numCols];
48 | unsigned char *greenBlurred = new unsigned char[numRows * numCols];
49 |
50 | //First we separate the incoming RGBA image into three separate channels
51 | //for Red, Green and Blue
52 | for (size_t i = 0; i < numRows * numCols; ++i) {
53 | uchar4 rgba = rgbaImage[i];
54 | red[i] = rgba.x;
55 | green[i] = rgba.y;
56 | blue[i] = rgba.z;
57 | }
58 |
59 | //Now we can do the convolution for each of the color channels
60 | channelConvolution(red, redBlurred, numRows, numCols, filter, filterWidth);
61 | channelConvolution(green, greenBlurred, numRows, numCols, filter, filterWidth);
62 | channelConvolution(blue, blueBlurred, numRows, numCols, filter, filterWidth);
63 |
64 | //now recombine into the output image - Alpha is 255 for no transparency
65 | for (size_t i = 0; i < numRows * numCols; ++i) {
66 | uchar4 rgba = make_uchar4(redBlurred[i], greenBlurred[i], blueBlurred[i], 255);
67 | outputImage[i] = rgba;
68 | }
69 |
70 | delete[] red;
71 | delete[] green;
72 | delete[] blue;
73 |
74 | delete[] redBlurred;
75 | delete[] greenBlurred;
76 | delete[] blueBlurred;
77 | }
78 |
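The reference loops over every pixel serially. A common way to port this to the GPU is to give each output pixel its own thread, so the body of the two outer loops in `channelConvolution` becomes one thread's work. The C++ sketch below (plain host code, not CUDA) factors that per-pixel work into a hypothetical `blurPixel` function, with a hypothetical serial `blurChannel` driver standing in for the kernel launch. It assumes an odd `filterWidth` and keeps the reference's clamp-to-edge boundary handling and float-to-`unsigned char` truncation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: computes ONE output pixel of the clamped convolution,
// i.e. the work a single CUDA thread would do in a thread-per-pixel kernel.
// Assumes filterWidth is odd, as channelConvolution asserts.
unsigned char blurPixel(const std::vector<unsigned char>& channel,
                        int numRows, int numCols,
                        const std::vector<float>& filter, int filterWidth,
                        int r, int c)
{
  const int half = filterWidth / 2;
  float result = 0.f;
  for (int fr = -half; fr <= half; ++fr) {
    for (int fc = -half; fc <= half; ++fc) {
      // Clamp-to-edge: out-of-range neighbours reuse the nearest border pixel.
      int ir = std::min(std::max(r + fr, 0), numRows - 1);
      int ic = std::min(std::max(c + fc, 0), numCols - 1);
      result += static_cast<float>(channel[ir * numCols + ic]) *
                filter[(fr + half) * filterWidth + (fc + half)];
    }
  }
  // Same float -> unsigned char truncation as the reference code.
  return static_cast<unsigned char>(result);
}

// Hypothetical serial driver: on the GPU, each (r, c) pair would be one thread.
std::vector<unsigned char> blurChannel(const std::vector<unsigned char>& channel,
                                       int numRows, int numCols,
                                       const std::vector<float>& filter, int filterWidth)
{
  std::vector<unsigned char> out(channel.size());
  for (int r = 0; r < numRows; ++r)
    for (int c = 0; c < numCols; ++c)
      out[r * numCols + c] = blurPixel(channel, numRows, numCols, filter, filterWidth, r, c);
  return out;
}
```

Each output pixel reads only the input channel and the filter and writes a single location, so the (r, c) iterations are independent; that independence is what makes the one-thread-per-pixel mapping safe.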
--------------------------------------------------------------------------------
/ProblemSet2-Blur/reference_calc.h:
--------------------------------------------------------------------------------
1 | #ifndef REFERENCE_H__
2 | #define REFERENCE_H__
3 |
4 | void referenceCalculation(const uchar4* const rgbaImage, uchar4 *const outputImage,
5 | size_t numRows, size_t numCols,
6 | const float* const filter, const int filterWidth);
7 |
8 | #endif
--------------------------------------------------------------------------------
/ProblemSet2-Blur/timer.h:
--------------------------------------------------------------------------------
1 | #ifndef GPU_TIMER_H__
2 | #define GPU_TIMER_H__
3 |
4 | #include <cuda_runtime.h>
5 |
6 | struct GpuTimer
7 | {
8 | cudaEvent_t start;
9 | cudaEvent_t stop;
10 |
11 | GpuTimer()
12 | {
13 | cudaEventCreate(&start);
14 | cudaEventCreate(&stop);
15 | }
16 |
17 | ~GpuTimer()
18 | {
19 | cudaEventDestroy(start);
20 | cudaEventDestroy(stop);
21 | }
22 |
23 | void Start()
24 | {
25 | cudaEventRecord(start, 0);
26 | }
27 |
28 | void Stop()
29 | {
30 | cudaEventRecord(stop, 0);
31 | }
32 |
33 | float Elapsed()
34 | {
35 | float elapsed;
36 | cudaEventSynchronize(stop);
37 | cudaEventElapsedTime(&elapsed, start, stop);
38 | return elapsed;
39 | }
40 | };
41 |
42 | #endif /* GPU_TIMER_H__ */
43 |
--------------------------------------------------------------------------------
/ProblemSet2-Blur/utils.h:
--------------------------------------------------------------------------------
1 | #ifndef UTILS_H__
2 | #define UTILS_H__
3 |
4 | #include <iostream>
5 | #include <iomanip>
6 | #include <cuda.h>
7 | #include <cuda_runtime.h>
8 | #include <cuda_runtime_api.h>
9 | #include <cassert>
10 | #include <cmath>
11 |
12 | #define checkCudaErrors(val) check( (val), #val, __FILE__, __LINE__)
13 |
14 | template<typename T>
15 | void check(T err, const char* const func, const char* const file, const int line) {
16 | if (err != cudaSuccess) {
17 | std::cerr << "CUDA error at: " << file << ":" << line << std::endl;
18 | std::cerr << cudaGetErrorString(err) << " " << func << std::endl;
19 | exit(1);
20 | }
21 | }
22 |
23 | template<typename T>
24 | void checkResultsExact(const T* const ref, const T* const gpu, size_t numElem) {
25 | //check that the GPU result matches the CPU result
26 | for (size_t i = 0; i < numElem; ++i) {
27 | if (ref[i] != gpu[i]) {
28 | std::cerr << "Difference at pos " << i << std::endl;
29 | //the + is magic to convert char to int without messing
30 | //with other types
31 | std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
32 | "\nGPU : " << +gpu[i] << std::endl;
33 | exit(1);
34 | }
35 | }
36 | }
37 |
38 | template<typename T>
39 | void checkResultsEps(const T* const ref, const T* const gpu, size_t numElem, double eps1, double eps2) {
40 | assert(eps1 >= 0 && eps2 >= 0);
41 | unsigned long long totalDiff = 0;
42 | unsigned numSmallDifferences = 0;
43 | for (size_t i = 0; i < numElem; ++i) {
44 | //subtract smaller from larger in case of unsigned types
45 | T smaller = std::min(ref[i], gpu[i]);
46 | T larger = std::max(ref[i], gpu[i]);
47 | T diff = larger - smaller;
48 | if (diff > 0 && diff <= eps1) {
49 | numSmallDifferences++;
50 | }
51 | else if (diff > eps1) {
52 | std::cerr << "Difference at pos " << +i << " exceeds tolerance of " << eps1 << std::endl;
53 | std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
54 | "\nGPU : " << +gpu[i] << std::endl;
55 | exit(1);
56 | }
57 | totalDiff += diff * diff;
58 | }
59 | double percentSmallDifferences = (double)numSmallDifferences / (double)numElem;
60 | if (percentSmallDifferences > eps2) {
61 | std::cerr << "Total percentage of non-zero pixel difference between the two images exceeds " << 100.0 * eps2 << "%" << std::endl;
62 | std::cerr << "Percentage of non-zero pixel differences: " << 100.0 * percentSmallDifferences << "%" << std::endl;
63 | exit(1);
64 | }
65 | }
66 |
67 | //Uses the Autodesk method of image comparison
68 | //Note that the tolerance here is in PIXELS, not a percentage of input pixels
69 | template<typename T>
70 | void checkResultsAutodesk(const T* const ref, const T* const gpu, size_t numElem, double variance, size_t tolerance)
71 | {
72 |
73 | size_t numBadPixels = 0;
74 | for (size_t i = 0; i < numElem; ++i) {
75 | T smaller = std::min(ref[i], gpu[i]);
76 | T larger = std::max(ref[i], gpu[i]);
77 | T diff = larger - smaller;
78 | if (diff > variance)
79 | ++numBadPixels;
80 | }
81 |
82 | if (numBadPixels > tolerance) {
83 | std::cerr << "Too many bad pixels in the image: " << numBadPixels << "/" << tolerance << std::endl;
84 | exit(1);
85 | }
86 | }
87 |
88 | #endif
89 |
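The "subtract smaller from larger" comment in `checkResultsEps` matters because `T` is `unsigned char` when image bytes are compared. The sketch below, using hypothetical helper names, contrasts that approach with a naive `a - b` stored back into `T`, which wraps around instead of going negative.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: absolute difference computed the way checkResultsEps
// does it, subtracting the smaller value from the larger so it never wraps.
template <typename T>
T absDiffUnsigned(T a, T b)
{
  T smaller = std::min(a, b);
  T larger = std::max(a, b);
  return static_cast<T>(larger - smaller);
}

// Hypothetical naive version for comparison: when b > a, converting the
// negative result back to an unsigned T wraps modulo 2^bits.
template <typename T>
T naiveDiff(T a, T b)
{
  return static_cast<T>(a - b);
}
```

With `unsigned char`, `naiveDiff(3, 5)` yields 254 rather than 2, which would wrongly exceed any small `eps1` tolerance.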
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/CMakeLists.txt:
--------------------------------------------------------------------------------
1 | ############################################################################
2 | # CMakeLists.txt for OpenCV and CUDA.
3 | # 2012-02-07
4 | # Quan Tran Minh. edited by Johannes Kast, Michael Sarahan
5 | # quantm@unist.ac.kr kast.jo@googlemail.com msarahan@gmail.com
6 | ############################################################################
7 | # minimum required cmake version
8 | cmake_minimum_required(VERSION 2.8)
9 | find_package(CUDA QUIET REQUIRED)
10 |
11 | SET (compare_files compare.cpp)
12 |
13 | file( GLOB hdr *.hpp *.h )
14 | file( GLOB cu *.cu)
15 | SET (HW3_files main.cpp loadSaveImage.cpp reference_calc.cpp compare.cpp)
16 |
17 | CUDA_ADD_EXECUTABLE(HW3 ${HW3_files} ${hdr} ${cu})
18 |
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/HDR-image.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/HDR-image.jpg
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/HDR-image_mapped.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/HDR-image_mapped.png
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/HW3_differenceImage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/HW3_differenceImage.png
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/HW3_reference.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/HW3_reference.png
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/HW3_reference_old.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/HW3_reference_old.png
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/Makefile:
--------------------------------------------------------------------------------
1 | NVCC=nvcc
2 |
3 | ###################################
4 | # These are the default install #
5 | # locations on most linux distros #
6 | ###################################
7 |
8 | OPENCV_LIBPATH=/usr/lib
9 | OPENCV_INCLUDEPATH=/usr/include
10 |
11 | ###################################################
12 | # On Macs the default install locations are below #
13 | ###################################################
14 |
15 | #OPENCV_LIBPATH=/usr/local/lib
16 | #OPENCV_INCLUDEPATH=/usr/local/include
17 |
18 | OPENCV_LIBS=-lopencv_core -lopencv_imgproc -lopencv_highgui
19 |
20 | CUDA_INCLUDEPATH=/usr/local/cuda-5.0/include
21 |
22 | ######################################################
23 | # On Macs the default install locations are below #
24 | # ####################################################
25 |
26 | #CUDA_INCLUDEPATH=/usr/local/cuda/include
27 | #CUDA_LIBPATH=/usr/local/cuda/lib
28 |
29 | NVCC_OPTS=-O3 -arch=sm_20 -Xcompiler -Wall -Xcompiler -Wextra -m64
30 |
31 | GCC_OPTS=-O3 -Wall -Wextra -m64
32 |
33 | student: main.o student_func.o HW3.o loadSaveImage.o compare.o reference_calc.o Makefile
34 | $(NVCC) -o HW3 main.o student_func.o HW3.o loadSaveImage.o compare.o reference_calc.o -L $(OPENCV_LIBPATH) $(OPENCV_LIBS) $(NVCC_OPTS)
35 |
36 | main.o: main.cpp timer.h utils.h reference_calc.h compare.h
37 | g++ -c main.cpp $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
38 |
39 | HW3.o: HW3.cu loadSaveImage.h utils.h
40 | $(NVCC) -c HW3.cu -I $(OPENCV_INCLUDEPATH) $(NVCC_OPTS)
41 |
42 | loadSaveImage.o: loadSaveImage.cpp loadSaveImage.h
43 | g++ -c loadSaveImage.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
44 |
45 | compare.o: compare.cpp compare.h
46 | g++ -c compare.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
47 |
48 | reference_calc.o: reference_calc.cpp reference_calc.h
49 | g++ -c reference_calc.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
50 |
51 | student_func.o: student_func.cu utils.h
52 | $(NVCC) -c student_func.cu $(NVCC_OPTS)
53 |
54 | clean:
55 | rm -f *.o HW3
56 | find . -type f -name '*.exr' | grep -v memorial | xargs rm -f
57 |
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/ProblemSet3-ToneMapping.vcxproj:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | Debug
6 | Win32
7 |
8 |
9 | Debug
10 | x64
11 |
12 |
13 | Release
14 | Win32
15 |
16 |
17 | Release
18 | x64
19 |
20 |
21 |
22 |
23 |
24 |
25 |
26 |
27 |
28 |
29 |
30 |
31 |
32 |
33 |
34 |
35 |
36 |
37 |
38 |
39 | {EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}
40 | ProblemSet3_ToneMapping
41 |
42 |
43 |
44 | Application
45 | true
46 | MultiByte
47 | v140
48 |
49 |
50 | Application
51 | true
52 | MultiByte
53 | v140
54 |
55 |
56 | Application
57 | false
58 | true
59 | MultiByte
60 | v140
61 |
62 |
63 | Application
64 | false
65 | true
66 | MultiByte
67 | v140
68 |
69 |
70 |
71 |
72 |
73 |
74 |
75 |
76 |
77 |
78 |
79 |
80 |
81 |
82 |
83 |
84 |
85 |
86 |
87 | true
88 |
89 |
90 | true
91 |
92 |
93 |
94 | Level3
95 | Disabled
96 | WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)
97 |
98 |
99 | true
100 | Console
101 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)
102 |
103 |
104 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
105 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
106 |
107 |
108 |
109 |
110 | Level3
111 | Disabled
112 | WIN32;WIN64;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)
113 |
114 |
115 | true
116 | Console
117 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)
118 |
119 |
120 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
121 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
122 |
123 |
124 | 64
125 |
126 |
127 |
128 |
129 | Level3
130 | MaxSpeed
131 | true
132 | true
133 | WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)
134 |
135 |
136 | true
137 | true
138 | true
139 | Console
140 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)
141 |
142 |
143 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
144 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
145 |
146 |
147 |
148 |
149 | Level3
150 | MaxSpeed
151 | true
152 | true
153 | WIN32;WIN64;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)
154 | %(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);C:\opencv\build\include;C:\opencv\build\include\opencv2
155 |
156 |
157 | true
158 | true
159 | true
160 | Console
161 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;opencv_world320d.lib;%(AdditionalDependencies)
162 | %(AdditionalLibraryDirectories);$(CudaToolkitLibDir);C:\opencv\build\x64\vc14\lib
163 |
164 |
165 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
166 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
167 |
168 |
169 | 64
170 | compute_61,sm_61
171 |
172 |
173 |
174 |
175 |
176 |
177 |
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/compare.cpp:
--------------------------------------------------------------------------------
1 | #include <opencv2/opencv.hpp>
2 | #include "utils.h"
3 |
4 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck,
5 | double perPixelError, double globalError)
6 | {
7 | cv::Mat reference = cv::imread(reference_filename, -1);
8 | cv::Mat test = cv::imread(test_filename, -1);
9 |
10 | cv::Mat diff = abs(reference - test);
11 |
12 | cv::Mat diffSingleChannel = diff.reshape(1, 0); //convert to 1 channel, same # rows
13 |
14 | double minVal, maxVal;
15 |
16 | cv::minMaxLoc(diffSingleChannel, &minVal, &maxVal, NULL, NULL); //NULL because we don't care about location
17 |
18 | //now perform transform so that we bump values to the full range
19 |
20 | diffSingleChannel = (diffSingleChannel - minVal) * (255. / (maxVal - minVal));
21 |
22 | diff = diffSingleChannel.reshape(reference.channels(), 0);
23 |
24 | cv::imwrite("HW3_differenceImage.png", diff);
25 | //OK, now we can start comparing values...
26 | unsigned char *referencePtr = reference.ptr<unsigned char>(0);
27 | unsigned char *testPtr = test.ptr<unsigned char>(0);
28 |
29 | if (useEpsCheck) {
30 | checkResultsEps(referencePtr, testPtr, reference.rows * reference.cols * reference.channels(), perPixelError, globalError);
31 | }
32 | else
33 | {
34 | checkResultsExact(referencePtr, testPtr, reference.rows * reference.cols * reference.channels());
35 | }
36 |
37 | std::cout << "PASS" << std::endl;
38 | return;
39 | }
40 |
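`compareImages` stretches the difference image to the full 0-255 range with v' = (v - minVal) * 255 / (maxVal - minVal) before writing `HW3_differenceImage.png`. The sketch below reproduces that formula on a plain `std::vector` (the function name is hypothetical) and adds an explicit guard for identical images, where maxVal == minVal and the original expression divides by zero. Rounding to the nearest integer is an assumption of this sketch and may not match OpenCV's conversion bit for bit.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: min/max contrast stretch of a difference image,
// v' = (v - minVal) * 255 / (maxVal - minVal), with a zero-range guard.
std::vector<unsigned char> stretchToFullRange(const std::vector<unsigned char>& diff)
{
  std::vector<unsigned char> out(diff.size(), 0);
  if (diff.empty()) return out;

  const auto mm = std::minmax_element(diff.begin(), diff.end());
  const double minVal = *mm.first;
  const double maxVal = *mm.second;

  // Guard: identical images give maxVal == minVal; leave the output all zeros.
  if (maxVal == minVal) return out;

  const double scale = 255.0 / (maxVal - minVal);
  for (std::size_t i = 0; i < diff.size(); ++i)
    out[i] = static_cast<unsigned char>(std::lround((diff[i] - minVal) * scale));
  return out;
}
```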
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/compare.h:
--------------------------------------------------------------------------------
1 | #ifndef HW3_H__
2 | #define HW3_H__
3 |
4 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck,
5 | double perPixelError, double globalError);
6 |
7 | #endif
8 |
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/input.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/input.png
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/loadSaveImage.cpp:
--------------------------------------------------------------------------------
1 | #include
2 | #include
3 | #include
4 | #include
5 | #include
6 | #include "cuda_runtime.h"
7 |
8 | //The caller becomes responsible for the returned pointer. This
9 | //is done in the interest of keeping this code as simple as possible.
10 | //In production code this is a bad idea - we should use RAII
11 | //to ensure the memory is freed. DO NOT COPY THIS AND USE IN PRODUCTION
12 | //CODE!!!
13 | void loadImageHDR(const std::string &filename,
14 | float **imagePtr,
15 | size_t *numRows, size_t *numCols)
16 | {
17 | cv::Mat originImg = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR | CV_LOAD_IMAGE_ANYDEPTH);
18 |
19 | cv::Mat image;
20 |
21 | if(originImg.type() != CV_32FC3){
22 | originImg.convertTo(image,CV_32FC3);
23 | } else{
24 | image = originImg;
25 | }
26 |
27 | if (image.empty()) {
28 | std::cerr << "Couldn't open file: " << filename << std::endl;
29 | exit(1);
30 | }
31 |
32 | if (image.channels() != 3) {
33 | std::cerr << "Image must be color!" << std::endl;
34 | exit(1);
35 | }
36 |
37 | if (!image.isContinuous()) {
38 | std::cerr << "Image isn't continuous!" << std::endl;
39 | exit(1);
40 | }
41 |
42 | *imagePtr = new float[image.rows * image.cols * image.channels()];
43 |
44 | float *cvPtr = image.ptr<float>(0);
45 | for (size_t i = 0; i < image.rows * image.cols * image.channels(); ++i)
46 | (*imagePtr)[i] = cvPtr[i];
47 |
48 | *numRows = image.rows;
49 | *numCols = image.cols;
50 | }
51 |
52 | void loadImageRGBA(const std::string &filename,
53 | uchar4 **imagePtr,
54 | size_t *numRows, size_t *numCols)
55 | {
56 | cv::Mat image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR);
57 | if (image.empty()) {
58 | std::cerr << "Couldn't open file: " << filename << std::endl;
59 | exit(1);
60 | }
61 |
62 | if (image.channels() != 3) {
63 | std::cerr << "Image must be color!" << std::endl;
64 | exit(1);
65 | }
66 |
67 | if (!image.isContinuous()) {
68 | std::cerr << "Image isn't continuous!" << std::endl;
69 | exit(1);
70 | }
71 |
72 | cv::Mat imageRGBA;
73 | cv::cvtColor(image, imageRGBA, CV_BGR2RGBA);
74 |
75 | *imagePtr = new uchar4[image.rows * image.cols];
76 |
77 | unsigned char *cvPtr = imageRGBA.ptr<unsigned char>(0);
78 | for (size_t i = 0; i < image.rows * image.cols; ++i) {
79 | (*imagePtr)[i].x = cvPtr[4 * i + 0];
80 | (*imagePtr)[i].y = cvPtr[4 * i + 1];
81 | (*imagePtr)[i].z = cvPtr[4 * i + 2];
82 | (*imagePtr)[i].w = cvPtr[4 * i + 3];
83 | }
84 |
85 | *numRows = image.rows;
86 | *numCols = image.cols;
87 | }
88 |
89 | void saveImageRGBA(const uchar4* const image,
90 | const size_t numRows, const size_t numCols,
91 | const std::string &output_file)
92 | {
93 | int sizes[2];
94 | sizes[0] = numRows;
95 | sizes[1] = numCols;
96 | cv::Mat imageRGBA(2, sizes, CV_8UC4, (void *)image);
97 | cv::Mat imageOutputBGR;
98 | cv::cvtColor(imageRGBA, imageOutputBGR, CV_RGBA2BGR);
99 | //output the image
100 | cv::imwrite(output_file.c_str(), imageOutputBGR);
101 | }
102 |
103 | //output an exr file
104 | //assumed to already be BGR
105 | void saveImageHDR(const float* const image,
106 | const size_t numRows, const size_t numCols,
107 | const std::string &output_file)
108 | {
109 | int sizes[2];
110 | sizes[0] = numRows;
111 | sizes[1] = numCols;
112 |
113 | cv::Mat imageHDR(2, sizes, CV_32FC3, (void *)image);
114 |
115 | imageHDR = imageHDR * 255;
116 |
117 | cv::imwrite(output_file.c_str(), imageHDR);
118 | }
119 |
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/loadSaveImage.h:
--------------------------------------------------------------------------------
1 | #ifndef LOADSAVEIMAGE_H__
2 | #define LOADSAVEIMAGE_H__
3 |
4 | #include <string>
5 | #include <cuda_runtime.h> //for uchar4
6 |
7 | void loadImageHDR(const std::string &filename,
8 | float **imagePtr,
9 | size_t *numRows, size_t *numCols);
10 |
11 | void loadImageRGBA(const std::string &filename,
12 | uchar4 **imagePtr,
13 | size_t *numRows, size_t *numCols);
14 |
15 | void saveImageRGBA(const uchar4* const image,
16 | const size_t numRows, const size_t numCols,
17 | const std::string &output_file);
18 |
19 | void saveImageHDR(const float* const image,
20 | const size_t numRows, const size_t numCols,
21 | const std::string &output_file);
22 |
23 | #endif
24 |
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/main.cpp:
--------------------------------------------------------------------------------
1 | //Udacity HW3 Driver
2 |
3 | #include
4 | #include "timer.h"
5 | #include "utils.h"
6 | #include
7 | #include
8 | #include
9 |
10 | #include "compare.h"
11 | #include "reference_calc.h"
12 |
13 | // Functions from HW3.cu
14 | void preProcess(float **d_luminance, unsigned int **d_cdf,
15 | size_t *numRows, size_t *numCols, unsigned int *numBins,
16 | const std::string& filename);
17 |
18 | void postProcess(const std::string& output_file, size_t numRows, size_t numCols,
19 | float min_logLum, float max_logLum);
20 |
21 | void cleanupGlobalMemory(void);
22 |
23 | // Function from student_func.cu
24 | void your_histogram_and_prefixsum(const float* const d_luminance,
25 | unsigned int* const d_cdf,
26 | float &min_logLum,
27 | float &max_logLum,
28 | const size_t numRows,
29 | const size_t numCols,
30 | const size_t numBins);
31 |
32 |
33 | int main(int argc, char **argv) {
34 | float *d_luminance;
35 | unsigned int *d_cdf;
36 |
37 | size_t numRows, numCols;
38 | unsigned int numBins;
39 |
40 | std::string input_file;
41 | std::string output_file;
42 | std::string reference_file;
43 | double perPixelError = 0.0;
44 | double globalError = 0.0;
45 | bool useEpsCheck = false;
46 |
47 | switch (argc)
48 | {
49 | case 2:
50 | input_file = std::string(argv[1]);
51 | output_file = "HW3_output.png";
52 | reference_file = "HW3_reference.png";
53 | break;
54 | case 3:
55 | input_file = std::string(argv[1]);
56 | output_file = std::string(argv[2]);
57 | reference_file = "HW3_reference.png";
58 | break;
59 | case 4:
60 | input_file = std::string(argv[1]);
61 | output_file = std::string(argv[2]);
62 | reference_file = std::string(argv[3]);
63 | break;
64 | case 6:
65 | useEpsCheck=true;
66 | input_file = std::string(argv[1]);
67 | output_file = std::string(argv[2]);
68 | reference_file = std::string(argv[3]);
69 | perPixelError = atof(argv[4]);
70 | globalError = atof(argv[5]);
71 | break;
72 | default:
73 | std::cerr << "Usage: ./HW3 input_file [output_filename] [reference_filename] [perPixelError] [globalError]" << std::endl;
74 | exit(1);
75 | }
76 | //load the image and give us our input and output pointers
77 | preProcess(&d_luminance, &d_cdf,
78 | &numRows, &numCols, &numBins, input_file);
79 |
80 | GpuTimer timer;
81 | float min_logLum, max_logLum;
82 | min_logLum = 0.f;
83 | max_logLum = 1.f;
84 | timer.Start();
85 | //call the students' code
86 | your_histogram_and_prefixsum(d_luminance, d_cdf, min_logLum, max_logLum,
87 | numRows, numCols, numBins);
88 | timer.Stop();
89 | cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
90 | int err = printf("Your code ran in: %f msecs.\n", timer.Elapsed());
91 |
92 | if (err < 0) {
93 | //Couldn't print! Probably the student closed stdout - bad news
94 | std::cerr << "Couldn't print timing information! STDOUT Closed!" << std::endl;
95 | exit(1);
96 | }
97 |
98 | float *h_luminance = (float *) malloc(sizeof(float)*numRows*numCols);
99 | unsigned int *h_cdf = (unsigned int *) malloc(sizeof(unsigned int)*numBins);
100 |
101 | checkCudaErrors(cudaMemcpy(h_luminance, d_luminance, numRows*numCols*sizeof(float), cudaMemcpyDeviceToHost));
102 |
103 | //check results and output the tone-mapped image
104 | postProcess(output_file, numRows, numCols, min_logLum, max_logLum);
105 |
106 | for (size_t i = 1; i < numCols * numRows; ++i) {
107 | min_logLum = std::min(h_luminance[i], min_logLum);
108 | max_logLum = std::max(h_luminance[i], max_logLum);
109 | }
110 |
111 | referenceCalculation(h_luminance, h_cdf, numRows, numCols, numBins, min_logLum, max_logLum);
112 |
113 | checkCudaErrors(cudaMemcpy(d_cdf, h_cdf, sizeof(unsigned int) * numBins, cudaMemcpyHostToDevice));
114 |
115 | //check results and output the tone-mapped image
116 | postProcess(reference_file, numRows, numCols, min_logLum, max_logLum);
117 |
118 | cleanupGlobalMemory();
119 |
120 | compareImages(reference_file, output_file, useEpsCheck, perPixelError, globalError);
121 |
122 | return 0;
123 | }
124 |
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/memorial.exr:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial.exr
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/memorial_large.exr:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial_large.exr
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/memorial_png.gold:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial_png.gold
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/memorial_png_large.gold:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial_png_large.gold
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/memorial_raw.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial_raw.png
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/memorial_raw_large.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial_raw_large.png
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/memorial_raw_large_mapped.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial_raw_large_mapped.png
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/memorial_raw_mapped.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial_raw_mapped.png
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/my_output.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/my_output.png
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/reference_calc.cpp:
--------------------------------------------------------------------------------
1 | #include <algorithm> // std::min, std::max
2 | #include <cstddef>   // size_t
3 | void referenceCalculation(const float* const h_logLuminance, unsigned int* const h_cdf,
4 | const size_t numRows, const size_t numCols, const size_t numBins,
5 | float &logLumMin, float &logLumMax)
6 | {
7 | logLumMin = h_logLuminance[0];
8 | logLumMax = h_logLuminance[0];
9 |
10 | //Step 1
11 | //first we find the minimum and maximum across the entire image
12 | for (size_t i = 1; i < numCols * numRows; ++i) {
13 | logLumMin = std::min(h_logLuminance[i], logLumMin);
14 | logLumMax = std::max(h_logLuminance[i], logLumMax);
15 | }
16 |
17 | //Step 2
18 | float logLumRange = logLumMax - logLumMin;
19 |
20 | //Step 3
21 | //next we use the now known range to compute
22 | //a histogram of numBins bins
23 | unsigned int *histo = new unsigned int[numBins];
24 |
25 | for (size_t i = 0; i < numBins; ++i) histo[i] = 0;
26 |
27 | for (size_t i = 0; i < numCols * numRows; ++i) {
28 | unsigned int bin = std::min(static_cast<unsigned int>(numBins - 1),
29 | static_cast<unsigned int>((h_logLuminance[i] - logLumMin) / logLumRange * numBins));
30 | histo[bin]++;
31 | }
32 |
33 | //Step 4
34 | //finally we perform an exclusive scan (prefix sum)
35 | //on the histogram to get the cumulative distribution
36 | h_cdf[0] = 0;
37 | for (size_t i = 1; i < numBins; ++i) {
38 | h_cdf[i] = h_cdf[i - 1] + histo[i - 1];
39 | }
40 |
41 | delete[] histo;
42 | }
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/reference_calc.h:
--------------------------------------------------------------------------------
1 | #ifndef REFERENCE_H__
2 | #define REFERENCE_H__
3 |
4 | void referenceCalculation(const float* const h_logLuminance, unsigned int* const h_cdf,
5 | const size_t numRows, const size_t numCols, const size_t numBins,
6 | float &logLumMin, float &logLumMax);
7 |
8 | #endif
9 |
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/student_func.cu:
--------------------------------------------------------------------------------
1 | /* Udacity Homework 3
2 | HDR Tone-mapping
3 |
4 | Background HDR
5 | ==============
6 |
7 | A High Dynamic Range (HDR) image contains a wider variation of intensity
8 | and color than is allowed by the RGB format with 1 byte per channel that we
9 | have used in the previous assignment.
10 |
11 | To store this extra information we use single precision floating point for
12 | each channel. This allows for an extremely wide range of intensity values.
13 |
14 | In the image for this assignment, the inside of a church with light coming in
15 | through stained glass windows, the raw input floating point values for the
16 | channels range from 0 to 275. But the mean is .41 and 98% of the values are
17 | less than 3! This means that certain areas (the windows) are extremely bright
18 | compared to everywhere else. If we linearly map this [0-275] range into the
19 | [0-255] range that we have been using then most values will be mapped to zero!
20 | The only thing we will be able to see are the very brightest areas - the
21 | windows - everything else will appear pitch black.
22 |
23 | The problem is that although we have cameras capable of recording the wide
24 | range of intensity that exists in the real world, our monitors are not capable
25 | of displaying it. Our eyes are also quite capable of observing a much wider
26 | range of intensities than our image formats / monitors are capable of
27 | displaying.
28 |
29 | Tone-mapping is a process that transforms the intensities in the image so that
30 | the brightest values aren't nearly so far away from the mean. That way when
31 | we transform the values into [0-255] we can actually see the entire image.
32 | There are many ways to perform this process and it is as much an art as a
33 | science - there is no single "right" answer. In this homework we will
34 | implement one possible technique.
35 |
36 | Background Chrominance-Luminance
37 | ================================
38 |
39 | The RGB space that we have been using to represent images can be thought of as
40 | one possible set of axes spanning a three dimensional space of color. We
41 | sometimes choose other axes to represent this space because they make certain
42 | operations more convenient.
43 |
44 | Another possible way of representing a color image is to separate the color
45 | information (chromaticity) from the brightness information. There are
46 | multiple different methods for doing this - a common one during the analog
47 | television days was known as Chrominance-Luminance or YUV.
48 |
49 | We choose to represent the image in this way so that we can remap only the
50 | intensity channel and then recombine the new intensity values with the color
51 | information to form the final image.
52 |
53 | Old TV signals used to be transmitted in this way so that black & white
54 | televisions could display the luminance channel while color televisions would
55 | display all three of the channels.
56 |
57 |
58 | Tone-mapping
59 | ============
60 |
61 | In this assignment we are going to transform the luminance channel (actually
62 | the log of the luminance, but this is unimportant for the parts of the
63 | algorithm that you will be implementing) by compressing its range to [0, 1].
64 | To do this we need the cumulative distribution of the luminance values.
65 |
66 | Example
67 | -------
68 |
69 | input : [2 4 3 3 1 7 4 5 7 0 9 4 3 2]
70 | min / max / range: 0 / 9 / 9
71 |
72 | histo with 3 bins: [4 7 3]
73 |
74 | cdf (inclusive): [4 11 14]; the exclusive scan required below gives [0 4 11]
75 |
76 |
77 | Your task is to calculate this cumulative distribution by following these
78 | steps.
79 |
80 | */
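The worked example above can be checked with a small host-side C++ sketch. The helper names `computeHistogram`, `inclusiveScan` and `exclusiveScan` are hypothetical, not part of the assignment. The binning follows the formula `bin = (lum[i] - lumMin) / lumRange * numBins` with the maximum clamped into the last bin, as `reference_calc.cpp` does, and the sketch assumes a non-zero range. It reproduces the histogram `[4 7 3]` and shows that `[4 11 14]` is the inclusive scan, while the exclusive scan requested in step 4 is `[0 4 11]`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Histogram with the assignment's formula bin = (x - min) / range * numBins;
// x == max lands on numBins and is clamped into the last bin.
std::vector<unsigned int> computeHistogram(const std::vector<float>& xs, std::size_t numBins) {
    assert(!xs.empty() && numBins > 0);
    const float lo = *std::min_element(xs.begin(), xs.end());
    const float hi = *std::max_element(xs.begin(), xs.end());
    const float range = hi - lo;
    assert(range > 0.0f);  // assumption: not every value is identical
    std::vector<unsigned int> histo(numBins, 0);
    for (float x : xs) {
        std::size_t bin = static_cast<std::size_t>((x - lo) / range * numBins);
        ++histo[std::min(bin, numBins - 1)];
    }
    return histo;
}

// Inclusive scan: out[i] = in[0] + ... + in[i]
std::vector<unsigned int> inclusiveScan(const std::vector<unsigned int>& in) {
    std::vector<unsigned int> out(in.size());
    unsigned int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        running += in[i];
        out[i] = running;
    }
    return out;
}

// Exclusive scan: out[0] = 0, out[i] = in[0] + ... + in[i - 1]
std::vector<unsigned int> exclusiveScan(const std::vector<unsigned int>& in) {
    std::vector<unsigned int> out(in.size(), 0);
    for (std::size_t i = 1; i < in.size(); ++i) out[i] = out[i - 1] + in[i - 1];
    return out;
}
```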
81 |
82 | #include "utils.h"
83 | #include "device_launch_parameters.h"
84 | //#include "reference_calc.cpp"
85 | #include <cmath>   // ceil
86 | #include <cstdio>  // printf (debug output)
87 | #include <cuda_runtime.h>
88 |
89 | const int BLOCK_SIZE = 1024;
90 |
91 | __device__ float _min(float a, float b) {
92 | return a < b ? a : b;
93 | }
94 |
95 | __device__ float _max(float a, float b) {
96 | return a > b ? a : b;
97 | }
98 |
99 | __global__ void minmax_reduce(float* d_out, const float * d_in, int input_size,bool isMin) {
100 |
101 | extern __shared__ float sdata[];
102 |
103 | int tid = threadIdx.x;
104 | int global_id = tid + blockDim.x*blockIdx.x;
105 |
106 | if (global_id >= input_size) { sdata[tid] = d_in[0]; } //dummy init (does not modify the final result)
107 | else sdata[tid] = d_in[global_id];
108 | __syncthreads();
109 | for (int s = blockDim.x/2; s > 0; s>>=1){
110 | if (tid < s) sdata[tid] = isMin ? _min(sdata[tid], sdata[tid + s]) : _max(sdata[tid], sdata[tid + s]);
111 | __syncthreads();
112 | }
113 | if (tid == 0) {
114 | d_out[blockIdx.x] = sdata[0];
115 | }
116 | }
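`minmax_reduce` is a shared-memory tree reduction: in every round, the threads with `tid < s` fold `sdata[tid + s]` into `sdata[tid]`, and `s` halves until slot 0 holds the block's result. The sketch below is a sequential host model of one block (the name `blockReduceModel` is hypothetical). It assumes `blockDim` is a power of two and that the input slice fits in one block. Finishing the inner loop before halving `s` plays the role of `__syncthreads()`, and padding unused slots with `in[0]` mirrors the kernel's dummy initialization.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential model of one minmax_reduce block.
float blockReduceModel(const std::vector<float>& in, std::size_t blockDim, bool isMin) {
    assert(!in.empty() && in.size() <= blockDim);
    assert((blockDim & (blockDim - 1)) == 0);  // power of two, like BLOCK_SIZE
    std::vector<float> sdata(blockDim);
    // Slots past the end get in[0], which cannot change a min or a max.
    for (std::size_t tid = 0; tid < blockDim; ++tid)
        sdata[tid] = tid < in.size() ? in[tid] : in[0];
    for (std::size_t s = blockDim / 2; s > 0; s >>= 1) {
        // "Threads" tid < s combine pairs; slots >= s are only read this round.
        for (std::size_t tid = 0; tid < s; ++tid) {
            const float a = sdata[tid], b = sdata[tid + s];
            sdata[tid] = isMin ? (a < b ? a : b) : (a > b ? a : b);
        }
        // Loop boundary here corresponds to __syncthreads() in the kernel.
    }
    return sdata[0];
}
```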
117 |
118 |
119 |
120 | __global__ void histo_atomic(unsigned int* out_histo, const float * d_in, int numBins, int input_size, float minVal, float rangeVals) {
121 | int tid = threadIdx.x;
122 | int global_id = tid + blockDim.x*blockIdx.x;
123 | if (global_id >= input_size) return;
124 | int bin = ((d_in[global_id] - minVal)*numBins) / rangeVals;
125 | bin = bin == numBins ? numBins - 1 : bin; //max value bin is the last of the histo
126 | atomicAdd(&(out_histo[bin]), 1);
127 | }
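A host-side sketch of the bin computation in `histo_atomic` (the name `binIndex` is hypothetical). Like the kernel, it scales before dividing, which can round differently from the reference's divide-then-scale order for values that sit very close to a bin boundary. It clamps with `>=` rather than the kernel's `==`, a slightly more defensive variant, and it assumes a non-zero range. On the GPU many threads can hit the same bin at once, which is why the kernel increments with `atomicAdd`; this single-threaded model needs no atomics.

```cpp
#include <cassert>

// Mirrors histo_atomic: scale, divide, truncate toward zero, then clamp
// x == max (which yields numBins) into the last bin.
int binIndex(float x, float minVal, float rangeVals, int numBins) {
    assert(numBins > 0 && rangeVals > 0.0f);
    int bin = static_cast<int>(((x - minVal) * numBins) / rangeVals);
    return bin >= numBins ? numBins - 1 : bin;
}
```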
128 |
129 |
130 | //--------HILLIS-STEELE SCAN----------
131 | //Step-efficient (log2(n) steps) but not work-efficient: acceptable because the histogram is small
132 | //Single-block kernel: works on at most 1024 elements (the per-block thread limit)
133 | __global__ void scan_hillis_steele(unsigned int* d_out,const unsigned int* d_in, int size) {
134 | extern __shared__ unsigned int temp[];
135 | int tid = threadIdx.x;
136 | int pout = 0,pin=1;
137 | temp[tid] = tid>0? d_in[tid-1]:0; //exclusive scan
138 | __syncthreads();
139 |
140 | //double buffered
141 | for (int off = 1; off < size; off <<= 1) {
142 | pout = 1 - pout;
143 | pin = 1 - pout;
144 | if (tid >= off) temp[size*pout + tid] = temp[size*pin + tid]+temp[size*pin + tid - off];
145 | else temp[size*pout + tid] = temp[size*pin + tid];
146 | __syncthreads();
147 | }
148 | d_out[tid] = temp[pout*size + tid];
149 | }
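The double-buffered Hillis-Steele scan can be modelled on the host as below (the name `hillisSteeleExclusive` is hypothetical). It performs ceil(log2 n) rounds and O(n log n) additions: more total work than a work-efficient (Blelloch) scan, but fewer steps, which is the trade-off the kernel's comment describes for a histogram of at most 1024 bins. The initial copy shifts the input right by one, turning the inclusive algorithm into the exclusive scan the assignment asks for.

```cpp
#include <cstddef>
#include <vector>

// Host model of scan_hillis_steele, ping-ponging between two buffers.
std::vector<unsigned int> hillisSteeleExclusive(const std::vector<unsigned int>& in) {
    const std::size_t n = in.size();
    std::vector<unsigned int> buf[2] = {std::vector<unsigned int>(n), std::vector<unsigned int>(n)};
    for (std::size_t i = 0; i < n; ++i) buf[0][i] = i > 0 ? in[i - 1] : 0;  // shift right by one
    int pout = 0;
    for (std::size_t off = 1; off < n; off <<= 1) {
        pout = 1 - pout;            // write into the other buffer
        const int pin = 1 - pout;   // read from the one written last round
        for (std::size_t i = 0; i < n; ++i)
            buf[pout][i] = i >= off ? buf[pin][i] + buf[pin][i - off] : buf[pin][i];
    }
    return buf[pout];
}
```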
150 |
151 |
152 | float reduce(const float* const d_logLuminance, int input_size,bool isMin) {
153 | int threads = BLOCK_SIZE;
154 | float* d_current_in = NULL;
155 | int size = input_size;
156 | int blocks = ceil(1.0f*size / threads);
157 | while (true) {
158 | //allocate memory for intermediate results
159 | //printf("Size %d blocks %d\n", size,blocks);
160 | float* d_out;
161 | checkCudaErrors(cudaMalloc(&d_out, blocks * sizeof(float)));
162 | //call reduce kernel: if first iteration use original vector, otherwise use the last intermediate result.
163 | if (d_current_in == NULL) minmax_reduce<<<blocks, threads, threads * sizeof(float)>>>(d_out, d_logLuminance, size, isMin);
164 | else minmax_reduce<<<blocks, threads, threads * sizeof(float)>>>(d_out, d_current_in, size, isMin);
165 | cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
166 |
167 | //free last intermediate result
168 | if (d_current_in != NULL) checkCudaErrors(cudaFree(d_current_in));
169 |
170 | if (blocks == 1) {
171 | //end of reduction reached
172 | float h_out;
173 | checkCudaErrors(cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost));
174 | return h_out;
175 | }
176 | size = blocks;
177 | blocks = ceil(1.0f*size / threads);
178 | if (blocks == 0)blocks++;
179 | d_current_in = d_out;//point to new intermediate result
180 |
181 | }
182 |
183 | }
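`reduce` launches one pass per level, feeding each pass's per-block partial results into the next until a pass fits in a single block. The sketch below (hypothetical helper `reducePassGrids`) lists the grid size of each pass, assuming 1024 threads per block as in `BLOCK_SIZE`. It uses integer ceiling division. The original's `ceil(1.0f*size / threads)` goes through `float`, which cannot represent every integer above 2^24, so for such inputs it can undercount blocks and skip trailing elements: 16777217 elements give 16384 blocks instead of 16385.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Grid size of each pass of a multi-pass reduction that stops once a pass
// fits in one block. Hypothetical helper for illustration.
std::vector<std::size_t> reducePassGrids(std::size_t inputSize, std::size_t threads) {
    assert(inputSize > 0 && threads > 0);
    std::vector<std::size_t> grids;
    std::size_t size = inputSize;
    while (true) {
        const std::size_t blocks = (size + threads - 1) / threads;  // integer ceil(size / threads)
        grids.push_back(blocks);
        if (blocks == 1) break;  // this pass leaves one value: the final min or max
        size = blocks;           // next pass reduces the per-block partial results
    }
    return grids;
}
```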
184 |
185 |
186 | unsigned int* compute_histogram(const float* const d_logLuminance, int numBins, int input_size, float minVal, float rangeVals) {
187 | unsigned int* d_histo;
188 | checkCudaErrors(cudaMalloc(&d_histo, numBins * sizeof(unsigned int)));
189 | checkCudaErrors(cudaMemset(d_histo, 0, numBins * sizeof(unsigned int)));
190 | int threads = BLOCK_SIZE;
191 | int blocks = ceil(1.0f*input_size / threads);
192 | histo_atomic<<<blocks, threads>>>(d_histo, d_logLuminance, numBins, input_size, minVal, rangeVals);
193 | cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
194 | return d_histo;
195 | }
196 |
197 | void your_histogram_and_prefixsum(const float* const d_logLuminance,
198 | unsigned int* const d_cdf,
199 | float &min_logLum,
200 | float &max_logLum,
201 | const size_t numRows,
202 | const size_t numCols,
203 | const size_t numBins)
204 | {
205 | /*Here are the steps you need to implement
206 | 1) find the minimum and maximum value in the input logLuminance channel
207 | store in min_logLum and max_logLum
208 | 2) subtract them to find the range
209 | 3) generate a histogram of all the values in the logLuminance channel using
210 | the formula: bin = (lum[i] - lumMin) / lumRange * numBins
211 | 4) Perform an exclusive scan (prefix sum) on the histogram to get
212 | the cumulative distribution of luminance values (this should go in the
213 | incoming d_cdf pointer which already has been allocated for you) */
214 |
215 | //1. Reduce
216 | int input_size = numRows*numCols;
217 | min_logLum = reduce(d_logLuminance, input_size, true);
218 | max_logLum = reduce(d_logLuminance, input_size, false);
219 | //printf("%f %f\n", min_logLum, max_logLum);
220 |
221 | //2. Range
222 | float range = max_logLum - min_logLum;
223 |
224 | //3. Histogram
225 | unsigned int* d_histo=compute_histogram(d_logLuminance, numBins, input_size, min_logLum, range);
226 |
227 | //4. CDF (scan)
228 | //Assumption: numBins<=1024
229 | scan_hillis_steele<<<1, numBins, 2 * numBins * sizeof(unsigned int)>>>(d_cdf, d_histo, numBins);
230 |
231 | checkCudaErrors(cudaFree(d_histo));
232 |
233 | }
234 |
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/timer.h:
--------------------------------------------------------------------------------
1 | #ifndef GPU_TIMER_H__
2 | #define GPU_TIMER_H__
3 |
4 | #include <cuda_runtime.h>
5 |
6 | struct GpuTimer
7 | {
8 | cudaEvent_t start;
9 | cudaEvent_t stop;
10 |
11 | GpuTimer()
12 | {
13 | cudaEventCreate(&start);
14 | cudaEventCreate(&stop);
15 | }
16 |
17 | ~GpuTimer()
18 | {
19 | cudaEventDestroy(start);
20 | cudaEventDestroy(stop);
21 | }
22 |
23 | void Start()
24 | {
25 | cudaEventRecord(start, 0);
26 | }
27 |
28 | void Stop()
29 | {
30 | cudaEventRecord(stop, 0);
31 | }
32 |
33 | float Elapsed()
34 | {
35 | float elapsed;
36 | cudaEventSynchronize(stop);
37 | cudaEventElapsedTime(&elapsed, start, stop);
38 | return elapsed;
39 | }
40 | };
41 |
42 | #endif /* GPU_TIMER_H__ */
43 |
--------------------------------------------------------------------------------
/ProblemSet3-ToneMapping/utils.h:
--------------------------------------------------------------------------------
1 | #ifndef UTILS_H__
2 | #define UTILS_H__
3 |
4 | #include <iostream>
5 | #include <iomanip>
6 | #include <cuda.h>
7 | #include <cuda_runtime.h>
8 | #include <cuda_runtime_api.h>
9 | #include <cassert>
10 | #include <cmath>
11 | #include <algorithm>
12 |
13 | #define checkCudaErrors(val) check( (val), #val, __FILE__, __LINE__)
14 |
15 | template<typename T>
16 | void check(T err, const char* const func, const char* const file, const int line) {
17 | if (err != cudaSuccess) {
18 | std::cerr << "CUDA error at: " << file << ":" << line << std::endl;
19 | std::cerr << cudaGetErrorString(err) << " " << func << std::endl;
20 | exit(1);
21 | }
22 | }
23 |
24 | template<typename T>
25 | void checkResultsExact(const T* const ref, const T* const gpu, size_t numElem) {
26 | //check that the GPU result matches the CPU result
27 | for (size_t i = 0; i < numElem; ++i) {
28 | if (ref[i] != gpu[i]) {
29 | std::cerr << "Difference at pos " << i << std::endl;
30 | //the + is magic to convert char to int without messing
31 | //with other types
32 | std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
33 | "\nGPU : " << +gpu[i] << std::endl;
34 | exit(1);
35 | }
36 | }
37 | }
38 |
39 | template<typename T>
40 | void checkResultsEps(const T* const ref, const T* const gpu, size_t numElem, double eps1, double eps2) {
41 | assert(eps1 >= 0 && eps2 >= 0);
42 | unsigned long long totalDiff = 0;
43 | unsigned numSmallDifferences = 0;
44 | for (size_t i = 0; i < numElem; ++i) {
45 | //subtract smaller from larger in case of unsigned types
46 | T smaller = std::min(ref[i], gpu[i]);
47 | T larger = std::max(ref[i], gpu[i]);
48 | T diff = larger - smaller;
49 | if (diff > 0 && diff <= eps1) {
50 | numSmallDifferences++;
51 | }
52 | else if (diff > eps1) {
53 | std::cerr << "Difference at pos " << +i << " exceeds tolerance of " << eps1 << std::endl;
54 | std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
55 | "\nGPU : " << +gpu[i] << std::endl;
56 | exit(1);
57 | }
58 | totalDiff += diff * diff;
59 | }
60 | double percentSmallDifferences = (double)numSmallDifferences / (double)numElem;
61 | if (percentSmallDifferences > eps2) {
62 | std::cerr << "Total percentage of non-zero pixel difference between the two images exceeds " << 100.0 * eps2 << "%" << std::endl;
63 | std::cerr << "Percentage of non-zero pixel differences: " << 100.0 * percentSmallDifferences << "%" << std::endl;
64 | exit(1);
65 | }
66 | }
67 |
68 | //Uses the autodesk method of image comparison
69 | //Note that the tolerance here is in PIXELS, not a percentage of input pixels
70 | template<typename T>
71 | void checkResultsAutodesk(const T* const ref, const T* const gpu, size_t numElem, double variance, size_t tolerance)
72 | {
73 |
74 | size_t numBadPixels = 0;
75 | for (size_t i = 0; i < numElem; ++i) {
76 | T smaller = std::min(ref[i], gpu[i]);
77 | T larger = std::max(ref[i], gpu[i]);
78 | T diff = larger - smaller;
79 | if (diff > variance)
80 | ++numBadPixels;
81 | }
82 |
83 | if (numBadPixels > tolerance) {
84 | std::cerr << "Too many bad pixels in the image: " << numBadPixels << "/" << tolerance << std::endl;
85 | exit(1);
86 | }
87 | }
88 |
89 | #endif
90 |
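`checkResultsEps` combines two tolerances: `eps1` bounds each element's difference, and `eps2` bounds the fraction of elements that differ at all. A minimal sketch of the same rule that returns a `bool` instead of calling `exit(1)` (the name `withinEps` is hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Non-exiting variant of the checkResultsEps rule.
// eps1: largest allowed per-element difference.
// eps2: largest allowed fraction of elements with 0 < diff <= eps1.
template <typename T>
bool withinEps(const T* ref, const T* gpu, std::size_t numElem, double eps1, double eps2) {
    assert(eps1 >= 0 && eps2 >= 0 && numElem > 0);
    std::size_t numSmall = 0;
    for (std::size_t i = 0; i < numElem; ++i) {
        // larger - smaller stays non-negative even for unsigned T
        const T diff = std::max(ref[i], gpu[i]) - std::min(ref[i], gpu[i]);
        if (diff > eps1) return false;  // one element out of tolerance fails outright
        if (diff > 0) ++numSmall;
    }
    return static_cast<double>(numSmall) / numElem <= eps2;
}
```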
--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/CMakeLists.txt:
--------------------------------------------------------------------------------
1 | ############################################################################
2 | # CMakeLists.txt for OpenCV and CUDA.
3 | # 2012-02-07
4 | # Quan Tran Minh. edit by Johannes Kast, Michael Sarahan
5 | # quantm@unist.ac.kr kast.jo@googlemail.com msarahan@gmail.com
6 | ############################################################################
7 |
8 | # collect source files
9 |
10 | file( GLOB hdr *.hpp *.h )
11 | file( GLOB cu *.cu)
12 | SET (HW4_files main.cpp loadSaveImage.cpp reference_calc.cpp compare.cpp)
13 |
14 | CUDA_ADD_EXECUTABLE(HW4 ${HW4_files} ${hdr} ${img} ${cu})
15 |
16 |
17 |
--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/HW4_output.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet4-RedEyeRemoval/HW4_output.png
--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/Makefile:
--------------------------------------------------------------------------------
1 | NVCC=/usr/local/cuda-5.0/bin/nvcc
2 | #NVCC=nvcc
3 |
4 | ###################################
5 | # These are the default install #
6 | # locations on most linux distros #
7 | ###################################
8 |
9 | OPENCV_LIBPATH=/usr/lib
10 | OPENCV_INCLUDEPATH=/usr/include
11 |
12 | ###################################################
13 | # On Macs the default install locations are below #
14 | ###################################################
15 |
16 | #OPENCV_LIBPATH=/usr/local/lib
17 | #OPENCV_INCLUDEPATH=/usr/local/include
18 |
19 | OPENCV_LIBS=-lopencv_core -lopencv_imgproc -lopencv_highgui
20 |
21 | CUDA_INCLUDEPATH=/usr/local/cuda-5.0/include
22 | # CUDA_INCLUDEPATH=/usr/local/cuda/lib64/include
23 | # CUDA_INCLUDEPATH=/usr/local/cuda-5.0/include
24 | # CUDA_INCLUDEPATH=/Developer/NVIDIA/CUDA-5.0/include
25 |
26 | ######################################################
27 | # On Macs the default install locations are below #
28 | # ####################################################
29 |
30 | #CUDA_INCLUDEPATH=/usr/local/cuda/include
31 | #CUDA_LIBPATH=/usr/local/cuda/lib
32 | CUDA_LIBPATH=/usr/local/cuda-5.0/lib64
33 |
34 | NVCC_OPTS=-O3 -arch=sm_20 -Xcompiler -Wall -Xcompiler -Wextra -m64
35 |
36 | GCC_OPTS=-O3 -Wall -Wextra -m64
37 |
38 | student: main.o student_func.o HW4.o loadSaveImage.o compare.o reference_calc.o Makefile
39 | $(NVCC) -o HW4 main.o student_func.o HW4.o loadSaveImage.o compare.o reference_calc.o -L $(OPENCV_LIBPATH) $(OPENCV_LIBS) $(NVCC_OPTS)
40 |
41 | main.o: main.cpp timer.h utils.h reference_calc.h
42 | g++ -c main.cpp $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
43 |
44 | HW4.o: HW4.cu loadSaveImage.h utils.h
45 | $(NVCC) -c HW4.cu -I $(OPENCV_INCLUDEPATH) $(NVCC_OPTS)
46 |
47 | loadSaveImage.o: loadSaveImage.cpp loadSaveImage.h
48 | g++ -c loadSaveImage.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
49 |
50 | compare.o: compare.cpp compare.h
51 | g++ -c compare.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
52 |
53 | reference_calc.o: reference_calc.cpp reference_calc.h
54 | g++ -c reference_calc.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)
55 |
56 | student_func.o: student_func.cu reference_calc.cpp utils.h
57 | $(NVCC) -c student_func.cu $(NVCC_OPTS)
58 |
59 | clean:
60 | rm -f *.o *.png HW4
61 |
--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/compare.cpp:
--------------------------------------------------------------------------------
1 | #include <opencv2/opencv.hpp>
2 | #include "utils.h"
3 |
4 |
5 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck,
6 | double perPixelError, double globalError)
7 | {
8 | cv::Mat reference = cv::imread(reference_filename, -1);
9 | cv::Mat test = cv::imread(test_filename, -1);
10 |
11 | cv::Mat diff;
12 | cv::absdiff(reference, test, diff); //per-element |reference - test|; plain subtraction of 8-bit images saturates at 0
13 | cv::Mat diffSingleChannel = diff.reshape(1, 0); //convert to 1 channel, same # rows
14 |
15 | double minVal, maxVal;
16 |
17 | cv::minMaxLoc(diffSingleChannel, &minVal, &maxVal, NULL, NULL); //NULL because we don't care about location
18 |
19 | //now perform transform so that we bump values to the full range
20 |
21 | diffSingleChannel = (diffSingleChannel - minVal) * (255. / (maxVal - minVal));
22 |
23 | diff = diffSingleChannel.reshape(reference.channels(), 0);
24 |
25 | cv::imwrite("HW4_differenceImage.png", diff);
26 | //OK, now we can start comparing values...
27 | unsigned char *referencePtr = reference.ptr(0);
28 | unsigned char *testPtr = test.ptr(0);
29 |
30 | if (useEpsCheck) {
31 | checkResultsEps(referencePtr, testPtr, reference.rows * reference.cols * reference.channels(), perPixelError, globalError);
32 | }
33 | else
34 | {
35 | checkResultsExact(referencePtr, testPtr, reference.rows * reference.cols * reference.channels());
36 | }
37 |
38 | std::cout << "PASS" << std::endl;
39 | return;
40 | }
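`compareImages` contrast-stretches the difference image with `(d - minVal) * (255 / (maxVal - minVal))` so that small errors become visible. The plain-C++ sketch below (hypothetical name `stretchDiff`, working on a flat byte buffer rather than a `cv::Mat`) applies the same mapping with round-to-nearest. It adds a guard the original lacks: when the two images are identical, `maxVal - minVal` is 0 and the original divides by zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Stretch a difference buffer to [0, 255]; an all-equal buffer maps to zeros.
std::vector<unsigned char> stretchDiff(const std::vector<unsigned char>& d) {
    std::vector<unsigned char> out(d.size(), 0);
    if (d.empty()) return out;
    const auto mm = std::minmax_element(d.begin(), d.end());
    const double lo = *mm.first, hi = *mm.second;
    if (hi == lo) return out;  // added guard: avoids 255 / 0
    for (std::size_t i = 0; i < d.size(); ++i)
        out[i] = static_cast<unsigned char>((d[i] - lo) * (255.0 / (hi - lo)) + 0.5);
    return out;
}
```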
--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/compare.h:
--------------------------------------------------------------------------------
1 | #ifndef HW4_H__
2 | #define HW4_H__
3 |
4 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck,
5 | double perPixelError, double globalError);
6 |
7 | #endif
--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/loadSaveImage.cpp:
--------------------------------------------------------------------------------
1 | #include <iostream>
2 | #include <opencv2/core/core.hpp>
3 | #include <opencv2/highgui/highgui.hpp>
4 | #include <opencv2/imgproc/imgproc.hpp>
5 | #include "cuda_runtime.h"
6 |
7 | //The caller becomes responsible for the returned pointer. This
8 | //is done in the interest of keeping this code as simple as possible.
9 | //In production code this is a bad idea - we should use RAII
10 | //to ensure the memory is freed. DO NOT COPY THIS AND USE IN PRODUCTION
11 | //CODE!!!
12 | void loadImageHDR(const std::string &filename,
13 | float **imagePtr,
14 | size_t *numRows, size_t *numCols)
15 | {
16 | cv::Mat image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR | CV_LOAD_IMAGE_ANYDEPTH);
17 | if (image.empty()) {
18 | std::cerr << "Couldn't open file: " << filename << std::endl;
19 | exit(1);
20 | }
21 |
22 | if (image.channels() != 3) {
23 | std::cerr << "Image must be color!" << std::endl;
24 | exit(1);
25 | }
26 |
27 | if (!image.isContinuous()) {
28 | std::cerr << "Image isn't continuous!" << std::endl;
29 | exit(1);
30 | }
31 |
32 | *imagePtr = new float[image.rows * image.cols * image.channels()];
33 |
34 | float *cvPtr = image.ptr<float>(0);
35 | for (size_t i = 0; i < image.rows * image.cols * image.channels(); ++i)
36 | (*imagePtr)[i] = cvPtr[i];
37 |
38 | *numRows = image.rows;
39 | *numCols = image.cols;
40 | }
41 |
42 | void loadImageRGBA(const std::string &filename,
43 | uchar4 **imagePtr,
44 | size_t *numRows, size_t *numCols)
45 | {
46 | cv::Mat image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR);
47 | if (image.empty()) {
48 | std::cerr << "Couldn't open file: " << filename << std::endl;
49 | exit(1);
50 | }
51 |
52 | if (image.channels() != 3) {
53 | std::cerr << "Image must be color!" << std::endl;
54 | exit(1);
55 | }
56 |
57 | if (!image.isContinuous()) {
58 | std::cerr << "Image isn't continuous!" << std::endl;
59 | exit(1);
60 | }
61 |
62 | cv::Mat imageRGBA;
63 | cv::cvtColor(image, imageRGBA, CV_BGR2RGBA);
64 |
65 | *imagePtr = new uchar4[image.rows * image.cols];
66 |
67 | unsigned char *cvPtr = imageRGBA.ptr(0);
68 | for (size_t i = 0; i < image.rows * image.cols; ++i) {
69 | (*imagePtr)[i].x = cvPtr[4 * i + 0];
70 | (*imagePtr)[i].y = cvPtr[4 * i + 1];
71 | (*imagePtr)[i].z = cvPtr[4 * i + 2];
72 | (*imagePtr)[i].w = cvPtr[4 * i + 3];
73 | }
74 |
75 | *numRows = image.rows;
76 | *numCols = image.cols;
77 | }
78 |
79 | void saveImageRGBA(const uchar4* const image,
80 | const size_t numRows, const size_t numCols,
81 | const std::string &output_file)
82 | {
83 | int sizes[2];
84 | sizes[0] = numRows;
85 | sizes[1] = numCols;
86 | cv::Mat imageRGBA(2, sizes, CV_8UC4, (void *)image);
87 | cv::Mat imageOutputBGR;
88 | cv::cvtColor(imageRGBA, imageOutputBGR, CV_RGBA2BGR);
89 | //output the image
90 | cv::imwrite(output_file.c_str(), imageOutputBGR);
91 | }
92 |
93 | //output an exr file
94 | //assumed to already be BGR
95 | void saveImageHDR(const float* const image,
96 | const size_t numRows, const size_t numCols,
97 | const std::string &output_file)
98 | {
99 | int sizes[2];
100 | sizes[0] = numRows;
101 | sizes[1] = numCols;
102 |
103 | cv::Mat imageHDR(2, sizes, CV_32FC3, (void *)image);
104 |
105 | imageHDR = imageHDR * 255;
106 |
107 | cv::imwrite(output_file.c_str(), imageHDR);
108 | }
109 |
--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/loadSaveImage.h:
--------------------------------------------------------------------------------
1 | #ifndef LOADSAVEIMAGE_H__
2 | #define LOADSAVEIMAGE_H__
3 |
4 | #include <string>
5 | #include <cuda_runtime.h> //for uchar4
6 |
7 | void loadImageHDR(const std::string &filename,
8 | float **imagePtr,
9 | size_t *numRows, size_t *numCols);
10 |
11 | void loadImageRGBA(const std::string &filename,
12 | uchar4 **imagePtr,
13 | size_t *numRows, size_t *numCols);
14 |
15 | void saveImageRGBA(const uchar4* const image,
16 | const size_t numRows, const size_t numCols,
17 | const std::string &output_file);
18 |
19 | void saveImageHDR(const float* const image,
20 | const size_t numRows, const size_t numCols,
21 | const std::string &output_file);
22 |
23 | #endif
24 |
--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/main.cpp:
--------------------------------------------------------------------------------
1 | //Udacity HW4 Driver
2 |
3 | #include <iostream>
4 | #include "timer.h"
5 | #include "utils.h"
6 | #include <string>
7 | #include <cstdio>
8 | #include <thrust/device_vector.h>
9 | #include <thrust/host_vector.h>
10 |
11 | #include "compare.h"
12 | #include "reference_calc.h"
13 |
14 | void preProcess(unsigned int **inputVals,
15 | unsigned int **inputPos,
16 | unsigned int **outputVals,
17 | unsigned int **outputPos,
18 | size_t &numElems,
19 | const std::string& filename,
20 | const std::string& template_file);
21 |
22 | void postProcess(const unsigned int* const outputVals,
23 | const unsigned int* const outputPos,
24 | const size_t numElems,
25 | const std::string& output_file);
26 |
27 | void your_sort(unsigned int* const inputVals,
28 | unsigned int* const inputPos,
29 | unsigned int* const outputVals,
30 | unsigned int* const outputPos,
31 | const size_t numElems);
32 |
33 | int main(int argc, char **argv) {
34 | unsigned int *inputVals;
35 | unsigned int *inputPos;
36 | unsigned int *outputVals;
37 | unsigned int *outputPos;
38 |
39 | size_t numElems;
40 |
41 | std::string input_file;
42 | std::string template_file;
43 | std::string output_file;
44 | std::string reference_file;
45 | double perPixelError = 0.0;
46 | double globalError = 0.0;
47 | bool useEpsCheck = false;
48 |
49 | switch (argc)
50 | {
51 | case 3:
52 | input_file = std::string(argv[1]);
53 | template_file = std::string(argv[2]);
54 | output_file = "HW4_output.png";
55 | break;
56 | case 4:
57 | input_file = std::string(argv[1]);
58 | template_file = std::string(argv[2]);
59 | output_file = std::string(argv[3]);
60 | break;
61 | default:
62 | std::cerr << "Usage: ./HW4 input_file template_file [output_filename]" << std::endl;
63 | exit(1);
64 | }
65 | //load the image and give us our input and output pointers
66 | preProcess(&inputVals, &inputPos, &outputVals, &outputPos, numElems, input_file, template_file);
67 |
68 | GpuTimer timer;
69 | timer.Start();
70 |
71 | //call the students' code
72 | your_sort(inputVals, inputPos, outputVals, outputPos, numElems);
73 |
74 | timer.Stop();
75 | cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
76 | printf("\n");
77 | int err = printf("Your code ran in: %f msecs.\n", timer.Elapsed());
78 |
79 | if (err < 0) {
80 | //Couldn't print! Probably the student closed stdout - bad news
81 | std::cerr << "Couldn't print timing information! STDOUT Closed!" << std::endl;
82 | exit(1);
83 | }
84 |
85 | //check results and output the red-eye corrected image
86 | postProcess(outputVals, outputPos, numElems, output_file);
87 |
88 | // check code moved from HW4.cu
89 | /****************************************************************************
90 | * You can use the code below to help with debugging, but make sure to *
91 | * comment it out again before submitting your assignment for grading, *
92 | * otherwise this code will take too much time and make it seem like your *
93 | * GPU implementation isn't fast enough. *
94 | * *
95 | * In HW4.cu this check ran BEFORE your code, in case you accidentally change *
96 | * the inputs in your radix sort; here it runs after your_sort, so it assumes the inputs are unchanged. *
97 | * *
98 | * This code performs the reference radix sort on the host and compares your *
99 | * sorted values to the reference. *
100 | * *
101 | * Thrust containers are used for copying memory from the GPU *
102 | * ************************************************************************* */
103 | thrust::device_ptr<unsigned int> d_inputVals(inputVals);
104 | thrust::device_ptr<unsigned int> d_inputPos(inputPos);
105 |
106 | thrust::host_vector<unsigned int> h_inputVals(d_inputVals,
107 | d_inputVals+numElems);
108 | thrust::host_vector<unsigned int> h_inputPos(d_inputPos,
109 | d_inputPos + numElems);
110 |
111 | thrust::host_vector<unsigned int> h_outputVals(numElems);
112 | thrust::host_vector<unsigned int> h_outputPos(numElems);
113 |
114 | reference_calculation(&h_inputVals[0], &h_inputPos[0],
115 | &h_outputVals[0], &h_outputPos[0],
116 | numElems);
117 |
118 | //postProcess(valsPtr, posPtr, numElems, reference_file);
119 |
120 | //compareImages(reference_file, output_file, useEpsCheck, perPixelError, globalError);
121 |
122 | thrust::device_ptr<unsigned int> d_outputVals(outputVals);
123 | thrust::device_ptr<unsigned int> d_outputPos(outputPos);
124 |
125 | thrust::host_vector<unsigned int> h_yourOutputVals(d_outputVals,
126 | d_outputVals + numElems);
127 | thrust::host_vector<unsigned int> h_yourOutputPos(d_outputPos,
128 | d_outputPos + numElems);
129 |
130 | checkResultsExact(&h_outputVals[0], &h_yourOutputVals[0], numElems);
131 | checkResultsExact(&h_outputPos[0], &h_yourOutputPos[0], numElems);
132 |
133 | checkCudaErrors(cudaFree(inputVals));
134 | checkCudaErrors(cudaFree(inputPos));
135 | checkCudaErrors(cudaFree(outputVals));
136 | checkCudaErrors(cudaFree(outputPos));
137 |
138 | return 0;
139 | }
140 |
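The check in `main` compares against `reference_calculation`, an LSD radix sort that, for each digit, builds a histogram, exclusive-scans it into starting offsets, and scatters keys and positions stably. A host sketch of that scheme with a configurable digit width follows (the name `radixSortWithPos` is hypothetical). It assumes `numBits` is a proper divisor of the key width. Because it swaps whole vectors after each pass, the result always ends up in `vals`, so no final copy is needed regardless of the pass count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// LSD radix sort on unsigned int keys that carries a position array along
// (histogram -> exclusive scan -> stable scatter per digit).
void radixSortWithPos(std::vector<unsigned int>& vals, std::vector<unsigned int>& pos,
                      unsigned int numBits) {
    const unsigned int keyBits = 8 * sizeof(unsigned int);
    assert(vals.size() == pos.size());
    // numBits must divide keyBits so every key bit is visited exactly once
    assert(numBits > 0 && numBits < keyBits && keyBits % numBits == 0);
    const unsigned int numBins = 1u << numBits;
    std::vector<unsigned int> vTmp(vals.size()), pTmp(pos.size());
    for (unsigned int shift = 0; shift < keyBits; shift += numBits) {
        const unsigned int mask = (numBins - 1) << shift;
        std::vector<unsigned int> histo(numBins, 0), offset(numBins, 0);
        for (unsigned int v : vals) ++histo[(v & mask) >> shift];
        // exclusive scan: offset[b] = number of keys with a smaller digit
        for (unsigned int b = 1; b < numBins; ++b) offset[b] = offset[b - 1] + histo[b - 1];
        for (std::size_t j = 0; j < vals.size(); ++j) {
            const unsigned int bin = (vals[j] & mask) >> shift;
            vTmp[offset[bin]] = vals[j];
            pTmp[offset[bin]] = pos[j];
            ++offset[bin];  // stable: equal digits keep their input order
        }
        vals.swap(vTmp);  // ping-pong: this pass's output is the next pass's input
        pos.swap(pTmp);
    }
}
```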
--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/red_eye_effect.gold:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet4-RedEyeRemoval/red_eye_effect.gold
--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/red_eye_effect_5.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet4-RedEyeRemoval/red_eye_effect_5.jpg
--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/red_eye_effect_5_out.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet4-RedEyeRemoval/red_eye_effect_5_out.jpg
--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/red_eye_effect_template_5.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet4-RedEyeRemoval/red_eye_effect_template_5.jpg
--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/reference_calc.cpp:
--------------------------------------------------------------------------------
1 | #include <algorithm>
2 | // For memset
3 | #include <cstring>
4 |
5 | void reference_calculation(unsigned int* inputVals,
6 | unsigned int* inputPos,
7 | unsigned int* outputVals,
8 | unsigned int* outputPos,
9 | const size_t numElems)
10 | {
11 | const int numBits = 1;
12 | const int numBins = 1 << numBits;
13 |
14 | unsigned int *binHistogram = new unsigned int[numBins];
15 | unsigned int *binScan = new unsigned int[numBins];
16 |
17 | unsigned int *vals_src = inputVals;
18 | unsigned int *pos_src = inputPos;
19 |
20 | unsigned int *vals_dst = outputVals;
21 | unsigned int *pos_dst = outputPos;
22 |
23 | //a simple radix sort - assumes numBits divides 32 and 32 / numBits is even, so the final copy below is valid
24 | for (unsigned int i = 0; i < 8 * sizeof(unsigned int); i += numBits) {
25 | unsigned int mask = (numBins - 1) << i;
26 |
27 | memset(binHistogram, 0, sizeof(unsigned int) * numBins); //zero out the bins
28 | memset(binScan, 0, sizeof(unsigned int) * numBins); //zero out the bins
29 |
30 | //perform histogram of data & mask into bins
31 | for (unsigned int j = 0; j < numElems; ++j) {
32 | unsigned int bin = (vals_src[j] & mask) >> i;
33 | binHistogram[bin]++;
34 | }
35 |
36 | //perform exclusive prefix sum (scan) on binHistogram to get starting
37 | //location for each bin
38 | for (unsigned int j = 1; j < numBins; ++j) {
39 | binScan[j] = binScan[j - 1] + binHistogram[j - 1];
40 | }
41 |
42 | //Gather everything into the correct location
43 | //need to move vals and positions
44 | for (unsigned int j = 0; j < numElems; ++j) {
45 | unsigned int bin = (vals_src[j] & mask) >> i;
46 | vals_dst[binScan[bin]] = vals_src[j];
47 | pos_dst[binScan[bin]] = pos_src[j];
48 | binScan[bin]++;
49 | }
50 |
51 | //swap the buffers (pointers only)
52 | std::swap(vals_dst, vals_src);
53 | std::swap(pos_dst, pos_src);
54 | }
55 |
56 | //we did an even number of iterations, need to copy from input buffer into output
57 | std::copy(inputVals, inputVals + numElems, outputVals);
58 | std::copy(inputPos, inputPos + numElems, outputPos);
59 |
60 | delete[] binHistogram;
61 | delete[] binScan;
62 | }
63 |
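With numBits = 1, each pass of the reference above is a 1-bit "split": binScan[0] is 0, binScan[1] is the number of elements whose bit is 0, and the running binScan[bin]++ is a serial exclusive scan inside each bin. The sketch below is a host-only C++ illustration of that split written in the predicate / scan / scatter form that student_func.cu uses on the GPU. The helper names splitDestinations and radixSortPairs are hypothetical, and std::vector buffers stand in for the assignment's raw device pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: destination index of each element for one 1-bit split pass.
// Elements whose bit is 0 go to the front in their original order; elements whose
// bit is 1 follow, also in order. This stability is what makes LSD radix sort work.
std::vector<std::size_t> splitDestinations(const std::vector<unsigned int>& vals,
                                           unsigned int bit) {
    // Histogram: count the elements whose bit is 0 (bin 0).
    std::size_t numZeros = 0;
    for (unsigned int v : vals) {
        if (((v >> bit) & 1u) == 0) ++numZeros;
    }
    // Exclusive scans of the two predicates, computed serially as running counters.
    std::size_t zerosBefore = 0;
    std::size_t onesBefore = 0;
    std::vector<std::size_t> dest(vals.size());
    for (std::size_t i = 0; i < vals.size(); ++i) {
        if ((vals[i] >> bit) & 1u) {
            dest[i] = numZeros + onesBefore++;  // 1s start after all the 0s
        } else {
            dest[i] = zerosBefore++;
        }
    }
    return dest;
}

// Hypothetical helper: LSD radix sort of (value, position) pairs, one bit per pass.
void radixSortPairs(std::vector<unsigned int>& vals, std::vector<unsigned int>& pos) {
    assert(vals.size() == pos.size());
    std::vector<unsigned int> tmpVals(vals.size());
    std::vector<unsigned int> tmpPos(pos.size());
    for (unsigned int bit = 0; bit < 32; ++bit) {
        const std::vector<std::size_t> dest = splitDestinations(vals, bit);
        for (std::size_t i = 0; i < vals.size(); ++i) {  // scatter values and positions
            tmpVals[dest[i]] = vals[i];
            tmpPos[dest[i]] = pos[i];
        }
        vals.swap(tmpVals);  // ping-pong: the newest pass now lives in vals
        pos.swap(tmpPos);
    }
}
```

On the bit pattern [0 0 1 1 0 0 1] from the HW4 notes, splitDestinations(bits, 0) gives [0 1 4 5 2 3 6]: the relative offsets [0 1 0 1 2 3 2], with the count of zeros (4) added for the 1s. Because vector::swap exchanges the buffers themselves, the sorted data always ends in vals, so this sketch needs no final copy.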
--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/reference_calc.h:
--------------------------------------------------------------------------------
1 | #ifndef REFERENCE_H__
2 | #define REFERENCE_H__
3 |
4 |
5 | //A simple un-optimized reference radix sort calculation
6 | //Only deals with power-of-2 radices
7 |
8 |
9 | void reference_calculation(unsigned int* inputVals,
10 | unsigned int* inputPos,
11 | unsigned int* outputVals,
12 | unsigned int* outputPos,
13 | const size_t numElems);
14 | #endif
--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/student_func.cu:
--------------------------------------------------------------------------------
1 | //Udacity HW 4
2 | //Radix Sorting
3 |
4 | #include "utils.h"
5 | #include "device_launch_parameters.h"
6 | #include <cmath>
7 |
8 | const int BLOCK_SIZE = 1024;
9 |
10 | /* Red Eye Removal
11 | ===============
12 |
13 | For this assignment we are implementing red eye removal. This is
14 | accomplished by first creating a score for every pixel that tells us how
15 | likely it is to be a red eye pixel. We have already done this for you - you
16 | are receiving the scores and need to sort them in ascending order so that we
17 | know which pixels to alter to remove the red eye.
18 |
19 | Note: ascending order == smallest to largest
20 |
21 | Each score is associated with a position; when you sort the scores, you must
22 | also move the positions accordingly.
23 |
24 | Implementing Parallel Radix Sort with CUDA
25 | ==========================================
26 |
27 | The basic idea is to construct a histogram on each pass of how many of each
28 | "digit" there are. Then we scan this histogram so that we know where to put
29 | the output of each digit. For example, the first 1 must come after all the
30 | 0s, so we have to know how many 0s there are to be able to start moving 1s
31 | into the correct position.
32 |
33 | 1) Histogram of the number of occurrences of each digit
34 | 2) Exclusive Prefix Sum of Histogram
35 | 3) Determine relative offset of each digit
36 | For example [0 0 1 1 0 0 1]
37 | -> [0 1 0 1 2 3 2]
38 | 4) Combine the results of steps 2 & 3 to determine the final
39 | output location for each element and move it there
40 |
41 | LSB Radix sort is an out-of-place sort and you will need to ping-pong values
42 | between the input and output buffers we have provided. Make sure the final
43 | sorted results end up in the output buffer! Hint: You may need to do a copy
44 | at the end.
45 |
46 | */
47 |
48 |
49 |
50 | __global__ void predicate(unsigned int* predicate, const unsigned int* d_in, size_t numElems,int bit) {
51 | int tid = threadIdx.x;
52 | int global_id = tid + blockDim.x*blockIdx.x;
53 | if (global_id >= numElems) return;
54 | unsigned int bin = ((d_in[global_id] >> bit) & 1u);
55 | predicate[global_id] =bin;
56 | }
57 |
58 |
59 | __global__ void bielloch_scan(unsigned int* d_out, const unsigned int* d_in, size_t input_size, unsigned int* blockSums) {
60 | extern __shared__ unsigned int data[];
61 |
62 | int tid = threadIdx.x;
63 | int offset = 1;
64 | int abs_start = 2*blockDim.x*blockIdx.x;
65 |
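66 | data[2 * tid] = (abs_start + 2 * tid) < input_size ? d_in[abs_start + 2 * tid] : 0;
67 | data[2 * tid + 1] = (abs_start + 2 * tid + 1) < input_size ? d_in[abs_start + 2 * tid + 1] : 0;
68 |
69 | for (int d = (2 * blockDim.x) >> 1; d > 0; d >>= 1) {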
70 | __syncthreads();
71 |
72 | if (tid < d) {
73 | int ai = offset*(2 * tid + 1) - 1;
74 | int bi = offset*(2 * tid + 2) - 1;
75 |
76 | data[bi] += data[ai];
77 | }
78 | offset <<= 1;
79 | }
80 | if (tid == 0)data[2*blockDim.x - 1] = 0;
81 |
82 | for (int d = 1; d < 2 * blockDim.x; d<<=1) {
83 | offset >>= 1;
84 | __syncthreads();
85 | if (tid < d) {
86 | int ai = offset*(2 * tid + 1) - 1;
87 | int bi = offset*(2 * tid + 2) - 1;
88 | unsigned int t = data[ai];
89 | data[ai] = data[bi];
90 | data[bi] += t;
91 | }
92 | }
93 |
94 | __syncthreads();
95 |
96 | if (abs_start + 2 * tid < input_size) {
97 | d_out[abs_start + 2 * tid] = data[2 * tid];
98 | }
99 | if (abs_start + 2 * tid+1 < input_size) {
100 | d_out[abs_start + 2 * tid+1] = data[2 * tid+1];
101 | }
102 |
103 | if (tid == 0) {
104 | blockSums[blockIdx.x] = data[blockDim.x * 2 - 1];
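105 | if (abs_start + blockDim.x * 2 - 1 < input_size) blockSums[blockIdx.x] += d_in[abs_start + blockDim.x * 2 - 1];
106 | }
107 | }
108 |
109 | //Adds to every element of d_scan the scanned sum of all preceding
110 | //double blocks (2*BLOCK_SIZE elements each), turning the per-block
111 | //scans into a scan of the whole array
112 | __global__ void adjustIncrement(unsigned int* d_scan, const unsigned int* d_increment, size_t input_size) {
113 | int pos = blockDim.x*blockIdx.x + threadIdx.x;
114 | if (pos >= input_size) return;
115 | d_scan[pos] += d_increment[pos / (2 * BLOCK_SIZE)];
116 | }
117 |
118 |
119 |
120 | //Flips the predicate: 1 where the current bit is 0, 0 otherwise
121 | __global__ void negatePredicate(unsigned int* predicate, size_t input_size) {
122 | int tid = threadIdx.x;
123 | int pos = blockDim.x*blockIdx.x + tid;
124 |
125 | if (pos >= input_size)return;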
126 | predicate[pos] = predicate[pos] ? 0 : 1;
127 | }
128 |
129 | __global__ void moveElements(unsigned int* d_out, const unsigned int* d_in, const unsigned int* d_histo,
130 | const unsigned int* d_predicate,const unsigned int* d_scan_true, const unsigned int* d_scan_false, size_t input_size) {
131 | int tid = threadIdx.x;
132 | int pos = blockDim.x*blockIdx.x + tid;
133 | if (pos >= input_size)return;
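134 | //calculate new index of element at position pos; d_predicate was negated, so it is 1 where the bit is 0
135 | int newindex;
136 | if (d_predicate[pos])newindex = d_histo[0] + d_scan_false[pos]; //bit 0: goes to the front
137 | else newindex = d_histo[1] + d_scan_true[pos]; //bit 1: goes after all d_histo[1] zeros
138 | if (newindex >= input_size) return; //guard against writing past the end of d_out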
139 | d_out[newindex] = d_in[pos];
140 | }
141 |
142 |
143 |
144 | unsigned int biellochScan(unsigned int* d_scan, unsigned int* d_pred, size_t numElems) {
145 | //each block of BLOCK_SIZE threads scans 2*BLOCK_SIZE elements in shared memory
146 | int num_double_blocks = (int)((numElems + 2 * BLOCK_SIZE - 1) / (2 * BLOCK_SIZE));
147 | unsigned int* d_blocksums;
148 | checkCudaErrors(cudaMalloc(&d_blocksums, num_double_blocks * sizeof(unsigned int)));
149 | bielloch_scan <<<num_double_blocks, BLOCK_SIZE, 2 * BLOCK_SIZE * sizeof(unsigned int)>>> (d_scan, d_pred, numElems, d_blocksums);
150 | cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
151 |
152 | unsigned int finalSum;
153 | //Scan of the blocksums array
154 | if (num_double_blocks > 1) {
155 | unsigned int* d_scan_temp;
156 | checkCudaErrors(cudaMalloc(&d_scan_temp, num_double_blocks * sizeof(unsigned int)));
157 | finalSum=biellochScan(d_scan_temp, d_blocksums, num_double_blocks);
158 | adjustIncrement <<<(unsigned int)((numElems + BLOCK_SIZE - 1) / BLOCK_SIZE), BLOCK_SIZE>>> (d_scan, d_scan_temp, numElems);
159 | cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
160 | checkCudaErrors(cudaFree(d_scan_temp));
161 | }
162 | else {
163 | //a single block: its sum is the total
164 | checkCudaErrors(cudaMemcpy(&finalSum, d_blocksums, sizeof(unsigned int), cudaMemcpyDeviceToHost));
165 | }
166 | checkCudaErrors(cudaFree(d_blocksums));
167 |
168 | return finalSum;
169 |
170 | }
171 |
172 | void your_sort(unsigned int* const d_inputVals,
173 | unsigned int* const d_inputPos,
174 | unsigned int* const d_outputVals,
175 | unsigned int* const d_outputPos,
176 | size_t numElems)
177 | {
178 | //PUT YOUR SORT HERE
179 | int num_blocks = ceil(1.0f*numElems / BLOCK_SIZE);
180 |
181 | unsigned int h_histo[2];
182 | h_histo[0] = 0;
183 |
184 | unsigned int* d_histo;
185 | unsigned int* d_pred;
186 | unsigned int* d_scan_true;
187 | unsigned int* d_scan_false;
188 |
189 | checkCudaErrors(cudaMalloc(&d_histo, 2 * sizeof(unsigned int)));
190 | checkCudaErrors(cudaMalloc(&d_pred, numElems*sizeof(unsigned int)));
191 | checkCudaErrors(cudaMalloc(&d_scan_true, numElems * sizeof(unsigned int)));
192 | checkCudaErrors(cudaMalloc(&d_scan_false, numElems * sizeof(unsigned int)));
193 | //for each of the 32 bits
194 | for (size_t i = 0; i < 32; i++) {
195 |
196 | //compute predicate
197 | if (i % 2 == 0)predicate <<<num_blocks, BLOCK_SIZE>>> (d_pred, d_inputVals, numElems, i);
198 | else predicate <<<num_blocks, BLOCK_SIZE>>> (d_pred, d_outputVals, numElems, i);
199 | cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
200 |
201 |
202 |
203 | //The exclusive prefix sum of the 2-bin histogram is [0, numFalses], where numFalses counts the elements whose bit is 0.
204 | //numFalses is a sum-reduce of the negated predicate, which biellochScan already returns as the total of its block sums.
205 |
206 | //Compute the relative offset of elements whose bit is 1
207 | //(Blelloch scan)
210 | //Flip the predicate: it now marks elements whose bit is 0
211 | negatePredicate <<<num_blocks, BLOCK_SIZE>>> (d_pred, numElems);
212 | cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
213 |
214 | //Compute the relative offset of elements whose bit is 0
215 | unsigned int number_falses=biellochScan(d_scan_false, d_pred, numElems);
216 |
217 | h_histo[1] = number_falses;
218 | checkCudaErrors(cudaMemcpy(d_histo, h_histo, 2 * sizeof(unsigned int), cudaMemcpyHostToDevice));
219 |
220 | //Moving elements and indices
221 | if (i % 2 == 0) {
222 | moveElements <<<num_blocks, BLOCK_SIZE>>> (d_outputVals, d_inputVals, d_histo, d_pred, d_scan_true, d_scan_false, numElems);
223 | cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
224 | moveElements <<<num_blocks, BLOCK_SIZE>>> (d_outputPos, d_inputPos, d_histo, d_pred, d_scan_true, d_scan_false, numElems);
225 |
226 | }
227 | else {
228 | moveElements <<<num_blocks, BLOCK_SIZE>>> (d_inputVals, d_outputVals, d_histo, d_pred, d_scan_true, d_scan_false, numElems);
229 | cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
230 | moveElements <<<num_blocks, BLOCK_SIZE>>> (d_inputPos, d_outputPos, d_histo, d_pred, d_scan_true, d_scan_false, numElems);
231 |
232 | }
233 | cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
234 |
235 | }
236 |
237 | //32 passes is an even number, so the sorted data ends in the input buffers: copy it into the output buffers
238 | checkCudaErrors(cudaMemcpy(d_outputVals, d_inputVals, numElems * sizeof(unsigned int), cudaMemcpyDeviceToDevice));
239 | checkCudaErrors(cudaMemcpy(d_outputPos, d_inputPos, numElems * sizeof(unsigned int), cudaMemcpyDeviceToDevice));
240 |
241 |
242 | checkCudaErrors(cudaFree(d_histo));
243 | checkCudaErrors(cudaFree(d_pred));
244 | checkCudaErrors(cudaFree(d_scan_true));
245 | checkCudaErrors(cudaFree(d_scan_false));
246 |
247 | }
248 |
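The bielloch_scan kernel and the biellochScan host function above implement a two-level work-efficient exclusive scan in Blelloch's up-sweep / down-sweep style: each thread block scans 2*BLOCK_SIZE elements in shared memory, the per-block totals are scanned recursively, and adjustIncrement adds those offsets back. The host-only C++ sketch below replays the same ai/bi index arithmetic serially so the logic can be checked without a GPU; the inner tid loops stand in for the threads of one block. The names blellochBlock and exclusiveScan are hypothetical, and the block total is read straight after the up-sweep rather than recomputed from the last input as the kernel does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: scans one block in place (exclusive) and returns the value
// bielloch_scan stores in blockSums. data.size() plays the role of 2 * blockDim.x.
unsigned int blellochBlock(std::vector<unsigned int>& data) {
    const std::size_t n = data.size();
    assert(n >= 2 && (n & (n - 1)) == 0);  // power of two, at least 2
    std::size_t offset = 1;
    // Up-sweep (reduce): afterwards data[n - 1] holds the block total.
    for (std::size_t d = n >> 1; d > 0; d >>= 1) {
        for (std::size_t tid = 0; tid < d; ++tid) {
            const std::size_t ai = offset * (2 * tid + 1) - 1;
            const std::size_t bi = offset * (2 * tid + 2) - 1;
            data[bi] += data[ai];
        }
        offset <<= 1;
    }
    const unsigned int total = data[n - 1];
    data[n - 1] = 0;  // clear the root, as thread 0 does in the kernel
    // Down-sweep: each node passes its value to the left child and the sum to the right.
    for (std::size_t d = 1; d < n; d <<= 1) {
        offset >>= 1;
        for (std::size_t tid = 0; tid < d; ++tid) {
            const std::size_t ai = offset * (2 * tid + 1) - 1;
            const std::size_t bi = offset * (2 * tid + 2) - 1;
            const unsigned int t = data[ai];
            data[ai] = data[bi];
            data[bi] += t;
        }
    }
    return total;
}

// Hypothetical helper: exclusive scan of any length. Scans fixed-size blocks, scans
// the block totals recursively, then adds each block's offset (adjustIncrement's job).
// blockElems must be a power of two >= 2 so the recursion shrinks.
std::vector<unsigned int> exclusiveScan(const std::vector<unsigned int>& in,
                                        std::size_t blockElems) {
    const std::size_t numBlocks = (in.size() + blockElems - 1) / blockElems;
    std::vector<unsigned int> out(in.size());
    std::vector<unsigned int> blockSums(numBlocks);
    for (std::size_t b = 0; b < numBlocks; ++b) {
        const std::size_t start = b * blockElems;
        std::vector<unsigned int> data(blockElems, 0);  // zero-pad the last block
        for (std::size_t i = 0; i < blockElems && start + i < in.size(); ++i)
            data[i] = in[start + i];
        blockSums[b] = blellochBlock(data);
        for (std::size_t i = 0; i < blockElems && start + i < in.size(); ++i)
            out[start + i] = data[i];
    }
    if (numBlocks > 1) {
        const std::vector<unsigned int> increments = exclusiveScan(blockSums, blockElems);
        for (std::size_t i = 0; i < in.size(); ++i)
            out[i] += increments[i / blockElems];
    }
    return out;
}
```

For example, exclusiveScan({3, 1, 7, 0, 4, 1, 6, 3}, 4) scans the two halves to [0 3 4 11] and [0 4 5 11], scans the totals [11 14] to [0 11], and adds 11 to the second half, giving [0 3 4 11 11 15 16 22].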
--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/timer.h:
--------------------------------------------------------------------------------
1 | #ifndef GPU_TIMER_H__
2 | #define GPU_TIMER_H__
3 |
4 | #include <cuda_runtime.h>
5 |
6 | struct GpuTimer
7 | {
8 | cudaEvent_t start;
9 | cudaEvent_t stop;
10 |
11 | GpuTimer()
12 | {
13 | cudaEventCreate(&start);
14 | cudaEventCreate(&stop);
15 | }
16 |
17 | ~GpuTimer()
18 | {
19 | cudaEventDestroy(start);
20 | cudaEventDestroy(stop);
21 | }
22 |
23 | void Start()
24 | {
25 | cudaEventRecord(start, 0);
26 | }
27 |
28 | void Stop()
29 | {
30 | cudaEventRecord(stop, 0);
31 | }
32 |
33 | float Elapsed()
34 | {
35 | float elapsed;
36 | cudaEventSynchronize(stop);
37 | cudaEventElapsedTime(&elapsed, start, stop);
38 | return elapsed;
39 | }
40 | };
41 |
42 | #endif /* GPU_TIMER_H__ */
43 |
--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/utils.h:
--------------------------------------------------------------------------------
1 | #ifndef UTILS_H__
2 | #define UTILS_H__
3 |
4 | #include <iostream>
5 | #include <iomanip>
6 | #include <cuda.h>
7 | #include <cuda_runtime.h>
8 | #include <cuda_runtime_api.h>
9 | #include <cassert>
10 | #include <cmath>
11 | #include <algorithm>
12 |
13 | #define checkCudaErrors(val) check( (val), #val, __FILE__, __LINE__)
14 |
15 | template<typename T>
16 | void check(T err, const char* const func, const char* const file, const int line) {
17 | if (err != cudaSuccess) {
18 | std::cerr << "CUDA error at: " << file << ":" << line << std::endl;
19 | std::cerr << cudaGetErrorString(err) << " " << func << std::endl;
20 | exit(1);
21 | }
22 | }
23 |
24 | template<typename T>
25 | void checkResultsExact(const T* const ref, const T* const gpu, size_t numElem) {
26 | //check that the GPU result matches the CPU result
27 | for (size_t i = 0; i < numElem; ++i) {
28 | if (ref[i] != gpu[i]) {
29 | std::cerr << "Difference at pos " << i << std::endl;
30 | //unary + promotes char types to int so they print as numbers;
31 | //wider types print unchanged
32 | std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
33 | "\nGPU : " << +gpu[i] << std::endl;
34 | exit(1);
35 | }
36 | }
37 | }
38 |
39 | template<typename T>
40 | void checkResultsEps(const T* const ref, const T* const gpu, size_t numElem, double eps1, double eps2) {
41 | assert(eps1 >= 0 && eps2 >= 0);
42 | unsigned long long totalDiff = 0;
43 | unsigned numSmallDifferences = 0;
44 | for (size_t i = 0; i < numElem; ++i) {
45 | //subtract smaller from larger in case of unsigned types
46 | T smaller = std::min(ref[i], gpu[i]);
47 | T larger = std::max(ref[i], gpu[i]);
48 | T diff = larger - smaller;
49 | if (diff > 0 && diff <= eps1) {
50 | numSmallDifferences++;
51 | }
52 | else if (diff > eps1) {
53 | std::cerr << "Difference at pos " << +i << " exceeds tolerance of " << eps1 << std::endl;
54 | std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
55 | "\nGPU : " << +gpu[i] << std::endl;
56 | exit(1);
57 | }
58 | totalDiff += diff * diff;
59 | }
60 | double percentSmallDifferences = (double)numSmallDifferences / (double)numElem;
61 | if (percentSmallDifferences > eps2) {
62 | std::cerr << "Total percentage of non-zero pixel difference between the two images exceeds " << 100.0 * eps2 << "%" << std::endl;
63 | std::cerr << "Percentage of non-zero pixel differences: " << 100.0 * percentSmallDifferences << "%" << std::endl;
64 | exit(1);
65 | }
66 | }
67 |
68 | //Uses the autodesk method of image comparison
69 | //Note that the tolerance here is in PIXELS, not a percentage of input pixels
70 | template<typename T>
71 | void checkResultsAutodesk(const T* const ref, const T* const gpu, size_t numElem, double variance, size_t tolerance)
72 | {
73 |
74 | size_t numBadPixels = 0;
75 | for (size_t i = 0; i < numElem; ++i) {
76 | T smaller = std::min(ref[i], gpu[i]);
77 | T larger = std::max(ref[i], gpu[i]);
78 | T diff = larger - smaller;
79 | if (diff > variance)
80 | ++numBadPixels;
81 | }
82 |
83 | if (numBadPixels > tolerance) {
84 | std::cerr << "Too many bad pixels in the image: " << numBadPixels << "/" << tolerance << std::endl;
85 | exit(1);
86 | }
87 | }
88 |
89 | #endif
90 |
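The `+` in checkResultsExact and checkResultsEps relies on integral promotion: unary plus applied to a char-sized value (for example the unsigned char pixels that compare.cpp checks) yields an int, so the stream prints a number instead of a character, while wider types print exactly as before. A minimal sketch of that behaviour; formatForReport is a hypothetical helper, not part of utils.h.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats a value the way utils.h streams ref[i] and gpu[i].
template <typename T>
std::string formatForReport(const T& value) {
    std::ostringstream out;
    out << +value;  // unary + promotes char types to int; wider types print the same
    return out.str();
}
```

Streaming static_cast<unsigned char>(65) directly prints "A", whereas formatForReport of the same value yields "65".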
--------------------------------------------------------------------------------
/ProblemSet5-OptimizedHistogram/ProblemSet5-OptimizedHistogram.vcxproj:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | Debug
6 | Win32
7 |
8 |
9 | Debug
10 | x64
11 |
12 |
13 | Release
14 | Win32
15 |
16 |
17 | Release
18 | x64
19 |
20 |
21 |
22 | {0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}
23 | ProblemSet5_OptimizedHistogram
24 |
25 |
26 |
27 | Application
28 | true
29 | MultiByte
30 | v140
31 |
32 |
33 | Application
34 | true
35 | MultiByte
36 | v140
37 |
38 |
39 | Application
40 | false
41 | true
42 | MultiByte
43 | v140
44 |
45 |
46 | Application
47 | false
48 | true
49 | MultiByte
50 | v140
51 |
52 |
53 |
54 |
55 |
56 |
57 |
58 |
59 |
60 |
61 |
62 |
63 |
64 |
65 |
66 |
67 |
68 |
69 |
70 | true
71 |
72 |
73 | true
74 |
75 |
76 |
77 | Level3
78 | Disabled
79 | WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)
80 |
81 |
82 | true
83 | Console
84 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)
85 |
86 |
87 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
88 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
89 |
90 |
91 |
92 |
93 | Level3
94 | Disabled
95 | WIN32;WIN64;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)
96 |
97 |
98 | true
99 | Console
100 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)
101 |
102 |
103 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
104 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
105 |
106 |
107 | 64
108 |
109 |
110 |
111 |
112 | Level3
113 | MaxSpeed
114 | true
115 | true
116 | WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)
117 |
118 |
119 | true
120 | true
121 | true
122 | Console
123 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)
124 |
125 |
126 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
127 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
128 |
129 |
130 |
131 |
132 | Level3
133 | MaxSpeed
134 | true
135 | true
136 | WIN32;WIN64;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)
137 |
138 |
139 | true
140 | true
141 | true
142 | Console
143 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)
144 |
145 |
146 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
147 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
148 |
149 |
150 | 64
151 |
152 |
153 |
154 |
155 |
156 |
157 |
158 |
159 |
160 |
161 |
162 |
163 |
164 |
165 |
166 |
167 |
168 |
169 |
--------------------------------------------------------------------------------
/ProblemSet5-OptimizedHistogram/main.cu:
--------------------------------------------------------------------------------
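1 | #include <cstdlib>
2 | #include <iostream>
3 | #include <algorithm>
4 | #include <cassert>
5 | #include "utils.h"
6 | #include "timer.h"
7 | #include <cstdio>
8 | #if defined(_WIN16) || defined(_WIN32) || defined(_WIN64)
9 | #include <windows.h>
10 | #else
11 | #include <sys/time.h>
12 | #endif
13 |
14 | #include <thrust/random/linear_congruential_engine.h>
15 | #include <thrust/random/normal_distribution.h>
16 | #include <thrust/random/uniform_int_distribution.h>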
17 |
18 | #include "reference_calc.h"
19 |
20 | void computeHistogram(const unsigned int *const d_vals,
21 | unsigned int* const d_histo,
22 | const unsigned int numBins,
23 | const unsigned int numElems);
24 |
25 | int main(void)
26 | {
27 | const unsigned int numBins = 1024;
28 | const unsigned int numElems = 10000 * numBins;
29 | const float stddev = 100.f;
30 |
31 | unsigned int *vals = new unsigned int[numElems];
32 | unsigned int *h_vals = new unsigned int[numElems];
33 | unsigned int *h_studentHisto = new unsigned int[numBins];
34 | unsigned int *h_refHisto = new unsigned int[numBins];
35 |
36 | #if defined(_WIN16) || defined(_WIN32) || defined(_WIN64)
37 | srand(GetTickCount());
38 | #else
39 | timeval tv;
40 | gettimeofday(&tv, NULL);
41 |
42 | srand(tv.tv_usec);
43 | #endif
44 |
45 | //make the mean unpredictable, but close enough to the middle
46 | //so that timings are unaffected
47 | unsigned int mean = rand() % 100 + 462;
48 |
49 | //Output mean so that grading can happen with the same inputs
50 | std::cout << mean << std::endl;
51 |
52 | thrust::minstd_rand rng;
53 |
54 | thrust::random::normal_distribution<float> normalDist((float)mean, stddev);
55 |
56 |
57 |
58 | // Generate the random values
59 | for (size_t i = 0; i < numElems; ++i) {
60 | vals[i] = std::min((unsigned int) std::max((int)normalDist(rng), 0), numBins - 1);
61 | }
62 |
63 | unsigned int *d_vals, *d_histo;
64 |
65 | GpuTimer timer;
66 |
67 | checkCudaErrors(cudaMalloc(&d_vals, sizeof(unsigned int) * numElems));
68 | checkCudaErrors(cudaMalloc(&d_histo, sizeof(unsigned int) * numBins));
69 | checkCudaErrors(cudaMemset(d_histo, 0, sizeof(unsigned int) * numBins));
70 |
71 | checkCudaErrors(cudaMemcpy(d_vals, vals, sizeof(unsigned int) * numElems, cudaMemcpyHostToDevice));
72 |
73 | timer.Start();
74 | computeHistogram(d_vals, d_histo, numBins, numElems);
75 | timer.Stop();
76 | int err = printf("Your code ran in: %f msecs.\n", timer.Elapsed());
77 |
78 | if (err < 0) {
79 | //Couldn't print! Probably the student closed stdout - bad news
80 | std::cerr << "Couldn't print timing information! STDOUT Closed!" << std::endl;
81 | exit(1);
82 | }
83 |
84 | // copy the student-computed histogram back to the host
85 | checkCudaErrors(cudaMemcpy(h_studentHisto, d_histo, sizeof(unsigned int) * numBins, cudaMemcpyDeviceToHost));
86 |
87 | //generate reference for the given mean
88 | reference_calculation(vals, h_refHisto, numBins, numElems);
89 |
90 | //Now do the comparison
91 | checkResultsExact(h_refHisto, h_studentHisto, numBins);
92 |
93 | delete[] vals; delete[] h_vals;
94 | delete[] h_refHisto;
95 | delete[] h_studentHisto;
96 |
97 | cudaFree(d_vals);
98 | cudaFree(d_histo);
99 |
100 | return 0;
101 | }
102 |
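The harness draws each input from a normal distribution with standard deviation 100 and a mean in [462, 561], truncates it to an integer and clamps it into [0, numBins - 1]. Since roughly 99.7% of normal draws lie within three standard deviations, nearly all values fall in about 600 of the 1024 bins, which is the property the HW5 notes say you may exploit. The host-only sketch below reproduces that generation with the standard library's std::mt19937 and std::normal_distribution in place of thrust's minstd_rand and normal_distribution, so its values will not match the harness's; makeInputs is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch of main.cu's input generation: normal samples truncated
// to int and clamped into the valid bin range [0, numBins - 1].
std::vector<unsigned int> makeInputs(std::size_t numElems, unsigned int numBins,
                                     float mean, float stddev, unsigned int seed) {
    assert(numBins > 0);  // numBins - 1 below must not wrap around
    std::mt19937 rng(seed);
    std::normal_distribution<float> normalDist(mean, stddev);
    std::vector<unsigned int> vals(numElems);
    for (std::size_t i = 0; i < numElems; ++i) {
        const int sample = static_cast<int>(normalDist(rng));  // truncates toward zero
        vals[i] = std::min(static_cast<unsigned int>(std::max(sample, 0)), numBins - 1);
    }
    return vals;
}
```

With mean 512 and stddev 100, almost every value lands between bins 212 and 812, so a privatized histogram touches only a band of hot bins.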
--------------------------------------------------------------------------------
/ProblemSet5-OptimizedHistogram/reference_calc.cpp:
--------------------------------------------------------------------------------
1 | #include <cstddef>
2 | //Reference Histogram calculation
3 |
4 | void reference_calculation(const unsigned int* const vals,
5 | unsigned int* const histo,
6 | const size_t numBins,
7 | const size_t numElems)
8 |
9 | {
10 | //zero out bins
11 | for (size_t i = 0; i < numBins; ++i)
12 | histo[i] = 0;
13 |
14 | //go through vals and increment appropriate bin
15 | for (size_t i = 0; i < numElems; ++i)
16 | histo[vals[i]]++;
17 | }
18 |
--------------------------------------------------------------------------------
/ProblemSet5-OptimizedHistogram/reference_calc.h:
--------------------------------------------------------------------------------
1 | #ifndef REFERENCE_H__
2 | #define REFERENCE_H__
3 |
4 | //Reference Histogram calculation
5 |
6 | void reference_calculation(const unsigned int* const vals,
7 | unsigned int* const histo,
8 | const size_t numBins,
9 | const size_t numElems);
10 |
11 | #endif
--------------------------------------------------------------------------------
/ProblemSet5-OptimizedHistogram/student.cu:
--------------------------------------------------------------------------------
1 | /* Udacity HW5
2 | Histogramming for Speed
3 |
4 | The goal of this assignment is to compute a histogram
5 | as fast as possible. We have simplified the problem as much as
6 | possible to allow you to focus solely on the histogramming algorithm.
7 |
8 | The input values that you need to histogram are already the exact
9 | bins that need to be updated. This is unlike in HW3 where you needed
10 | to compute the range of the data and then do:
11 | bin = (val - valMin) / valRange to determine the bin.
12 |
13 | Here the bin is just:
14 | bin = val
15 |
16 | so the serial histogram calculation looks like:
17 | for (i = 0; i < numElems; ++i)
18 | histo[val[i]]++;
19 |
20 | That's it! Your job is to make it run as fast as possible!
21 |
22 | The values are normally distributed - you may take
23 | advantage of this fact in your implementation.
24 |
25 | */
26 |
27 |
28 | #include "utils.h"
29 | #include "device_launch_parameters.h"
30 | #include <cmath>
31 |
32 | const int N_THREADS = 1024;
33 |
34 |
35 |
36 | __global__
37 | void naiveHisto(const unsigned int* const vals, //INPUT
38 | unsigned int* const histo, //OUTPUT
39 | int numVals)
40 | {
41 | int tid = threadIdx.x;
42 | int global_id = tid + blockDim.x*blockIdx.x;
43 | if (global_id >= numVals) return;
44 | atomicAdd(&(histo[vals[global_id]]), 1);
45 | }
46 |
47 | __global__
48 | void perBlockHisto(const unsigned int* const vals, //INPUT
49 | unsigned int* const histo, //OUTPUT
50 | int numVals,int numBins) {
51 |
52 | extern __shared__ unsigned int sharedHisto[]; //one private counter per bin, sized at launch
53 |
54 | //strided initialization: each thread zeroes every blockDim.x-th bin, so any numBins is covered
55 | for (int i = threadIdx.x; i < numBins; i += blockDim.x) {
56 | sharedHisto[i] = 0;
57 | }
58 |
59 | __syncthreads();
60 |
61 | int globalid = threadIdx.x + blockIdx.x*blockDim.x;
62 | if (globalid < numVals) atomicAdd(&sharedHisto[vals[globalid]], 1); //no early return: every thread must reach __syncthreads
63 |
64 | __syncthreads();
65 |
66 | for (int i = threadIdx.x; i < numBins; i += blockDim.x) {
67 | atomicAdd(&histo[i], sharedHisto[i]);
68 | }
69 |
70 |
71 | }
72 |
73 |
74 |
75 | void computeHistogram(const unsigned int* const d_vals, //INPUT
76 | unsigned int* const d_histo, //OUTPUT
77 | const unsigned int numBins,
78 | const unsigned int numElems)
79 | {
80 | //TODO Launch the yourHisto kernel
81 |
82 | int blocks = (numElems + N_THREADS - 1) / N_THREADS; //round up so every element is covered
83 |
84 | //naiveHisto <<< blocks, N_THREADS >>> (d_vals, d_histo, numElems);
85 |
86 |
87 | //more than 7x speedup over naiveHisto
88 | perBlockHisto <<<blocks, N_THREADS, numBins * sizeof(unsigned int)>>> (d_vals, d_histo, numElems, numBins);
89 |
90 | //if you want to use/launch more than one kernel,
91 | //feel free
92 |
93 | cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
94 | }
95 |
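perBlockHisto is a privatized histogram: each thread block counts into its own copy in shared memory and then merges it into the global histogram with one atomicAdd per bin, instead of one global atomicAdd per input as in naiveHisto. The host-only C++ sketch below follows the same data flow and shows that the merged result equals the serial histogram; it cannot reproduce the GPU speedup, which is usually attributed to faster shared-memory atomics and far less contention on global memory. privatizedHistogram is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host emulation of perBlockHisto: one private histogram per block
// of blockSize inputs, merged into the global histogram once per bin.
std::vector<unsigned int> privatizedHistogram(const std::vector<unsigned int>& vals,
                                              std::size_t numBins,
                                              std::size_t blockSize) {
    assert(blockSize > 0);
    std::vector<unsigned int> histo(numBins, 0);
    for (std::size_t start = 0; start < vals.size(); start += blockSize) {
        std::vector<unsigned int> local(numBins, 0);  // the block's shared-memory copy
        // Count this block's inputs; the bound check covers a partial last block.
        for (std::size_t i = start; i < start + blockSize && i < vals.size(); ++i) {
            ++local[vals[i]];
        }
        // Merge: one update per bin instead of one per input element.
        for (std::size_t b = 0; b < numBins; ++b) {
            histo[b] += local[b];
        }
    }
    return histo;
}
```

The per-block merge costs numBins updates per block, so the approach pays off when each block sees many more inputs than there are bins (here 1024 inputs per block against 1024 bins, with most inputs concentrated in a few hundred of them).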
--------------------------------------------------------------------------------
/ProblemSet5-OptimizedHistogram/timer.h:
--------------------------------------------------------------------------------
1 | #ifndef GPU_TIMER_H__
2 | #define GPU_TIMER_H__
3 |
4 | #include <cuda_runtime.h>
5 |
6 | struct GpuTimer
7 | {
8 | cudaEvent_t start;
9 | cudaEvent_t stop;
10 |
11 | GpuTimer()
12 | {
13 | cudaEventCreate(&start);
14 | cudaEventCreate(&stop);
15 | }
16 |
17 | ~GpuTimer()
18 | {
19 | cudaEventDestroy(start);
20 | cudaEventDestroy(stop);
21 | }
22 |
23 | void Start()
24 | {
25 | cudaEventRecord(start, 0);
26 | }
27 |
28 | void Stop()
29 | {
30 | cudaEventRecord(stop, 0);
31 | }
32 |
33 | float Elapsed()
34 | {
35 | float elapsed;
36 | cudaEventSynchronize(stop);
37 | cudaEventElapsedTime(&elapsed, start, stop);
38 | return elapsed;
39 | }
40 | };
41 |
42 | #endif /* GPU_TIMER_H__ */
43 |
--------------------------------------------------------------------------------
/ProblemSet5-OptimizedHistogram/utils.h:
--------------------------------------------------------------------------------
1 | #ifndef UTILS_H__
2 | #define UTILS_H__
3 |
4 | #include <iostream>
5 | #include <iomanip>
6 | #include <cuda.h>
7 | #include <cuda_runtime.h>
8 | #include <cuda_runtime_api.h>
9 | #include <cassert>
10 | #include <cmath>
11 | #include <algorithm>
12 |
13 | #define checkCudaErrors(val) check( (val), #val, __FILE__, __LINE__)
14 |
15 | template<typename T>
16 | void check(T err, const char* const func, const char* const file, const int line) {
17 | if (err != cudaSuccess) {
18 | std::cerr << "CUDA error at: " << file << ":" << line << std::endl;
19 | std::cerr << cudaGetErrorString(err) << " " << func << std::endl;
20 | exit(1);
21 | }
22 | }
23 |
24 | template<typename T>
25 | void checkResultsExact(const T* const ref, const T* const gpu, size_t numElem) {
26 | //check that the GPU result matches the CPU result
27 | for (size_t i = 0; i < numElem; ++i) {
28 | if (ref[i] != gpu[i]) {
29 | std::cerr << "Difference at pos " << i << std::endl;
30 | //unary + promotes char types to int so they print as numbers;
31 | //wider types print unchanged
32 | std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
33 | "\nGPU : " << +gpu[i] << std::endl;
34 | exit(1);
35 | }
36 | }
37 | }
38 |
39 | template<typename T>
40 | void checkResultsEps(const T* const ref, const T* const gpu, size_t numElem, double eps1, double eps2) {
41 | assert(eps1 >= 0 && eps2 >= 0);
42 | unsigned long long totalDiff = 0;
43 | unsigned numSmallDifferences = 0;
44 | for (size_t i = 0; i < numElem; ++i) {
45 | //subtract smaller from larger in case of unsigned types
46 | T smaller = std::min(ref[i], gpu[i]);
47 | T larger = std::max(ref[i], gpu[i]);
48 | T diff = larger - smaller;
49 | if (diff > 0 && diff <= eps1) {
50 | numSmallDifferences++;
51 | }
52 | else if (diff > eps1) {
53 | std::cerr << "Difference at pos " << +i << " exceeds tolerance of " << eps1 << std::endl;
54 | std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
55 | "\nGPU : " << +gpu[i] << std::endl;
56 | exit(1);
57 | }
58 | totalDiff += diff * diff;
59 | }
60 | double percentSmallDifferences = (double)numSmallDifferences / (double)numElem;
61 | if (percentSmallDifferences > eps2) {
62 | std::cerr << "Total percentage of non-zero pixel difference between the two images exceeds " << 100.0 * eps2 << "%" << std::endl;
63 | std::cerr << "Percentage of non-zero pixel differences: " << 100.0 * percentSmallDifferences << "%" << std::endl;
64 | exit(1);
65 | }
66 | }
67 |
68 | //Uses the autodesk method of image comparison
69 | //Note that the tolerance here is in PIXELS, not a percentage of input pixels
70 | template<typename T>
71 | void checkResultsAutodesk(const T* const ref, const T* const gpu, size_t numElem, double variance, size_t tolerance)
72 | {
73 |
74 | size_t numBadPixels = 0;
75 | for (size_t i = 0; i < numElem; ++i) {
76 | T smaller = std::min(ref[i], gpu[i]);
77 | T larger = std::max(ref[i], gpu[i]);
78 | T diff = larger - smaller;
79 | if (diff > variance)
80 | ++numBadPixels;
81 | }
82 |
83 | if (numBadPixels > tolerance) {
84 | std::cerr << "Too many bad pixels in the image: " << numBadPixels << "/" << tolerance << std::endl;
85 | exit(1);
86 | }
87 | }
88 |
89 | #endif
90 |
--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/HW6.cu:
--------------------------------------------------------------------------------
1 | #include "utils.h"
2 | #include
3 | #include
4 | #include
5 | #include
6 |
7 | #include "loadSaveImage.h"
8 | #include
9 |
10 |
11 | //return types are void since any internal error will be handled by quitting
12 | //no point in returning error codes...
13 | void preProcess( uchar4 **sourceImg,
14 | size_t &numRows, size_t &numCols,
15 | uchar4 **destImg,
16 | uchar4 **blendedImg, const std::string& source_filename,
17 | const std::string& dest_filename){
18 |
19 | //make sure the context initializes ok
20 | checkCudaErrors(cudaFree(0));
21 |
22 | size_t numRowsSource, numColsSource, numRowsDest, numColsDest;
23 |
24 | loadImageRGBA(source_filename, sourceImg, &numRowsSource, &numColsSource);
25 | loadImageRGBA(dest_filename, destImg, &numRowsDest, &numColsDest);
26 |
27 | assert(numRowsSource == numRowsDest);
28 | assert(numColsSource == numColsDest);
29 |
30 | numRows = numRowsSource;
31 | numCols = numColsSource;
32 |
33 | *blendedImg = new uchar4[numRows * numCols];
34 |
35 | }
36 |
37 | void postProcess(const uchar4* const blendedImg,
38 | const size_t numRowsDest, const size_t numColsDest,
39 | const std::string& output_file)
40 | {
41 | //just need to save the image...
42 | saveImageRGBA(blendedImg, numRowsDest, numColsDest, output_file);
43 | }
44 |
--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/HW6_differenceImage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet6-SeamlessImageCloning/HW6_differenceImage.png
--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/HW6_output.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet6-SeamlessImageCloning/HW6_output.png
--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/HW6_reference.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet6-SeamlessImageCloning/HW6_reference.png
--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/blended.gold:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet6-SeamlessImageCloning/blended.gold
--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/compare.cpp:
--------------------------------------------------------------------------------
1 | #include <opencv2/highgui/highgui.hpp>
2 | #include "utils.h"
3 |
4 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck,
5 | double perPixelError, double globalError)
6 | {
7 | cv::Mat reference = cv::imread(reference_filename, -1);
8 | cv::Mat test = cv::imread(test_filename, -1);
9 |
10 | cv::Mat diff; cv::absdiff(reference, test, diff); //|reference - test| (plain subtraction saturates at 0 for 8-bit images)
11 |
12 | cv::Mat diffSingleChannel = diff.reshape(1, 0); //convert to 1 channel, same # rows
13 |
14 | double minVal, maxVal;
15 |
16 | cv::minMaxLoc(diffSingleChannel, &minVal, &maxVal, NULL, NULL); //NULL because we don't care about location
17 |
18 | //now perform transform so that we bump values to the full range
19 | if (maxVal > minVal) //skip when the images are identical (avoids dividing by zero)
20 | diffSingleChannel = (diffSingleChannel - minVal) * (255. / (maxVal - minVal));
21 |
22 | diff = diffSingleChannel.reshape(reference.channels(), 0);
23 |
24 | cv::imwrite("HW6_differenceImage.png", diff);
25 | //OK, now we can start comparing values...
26 | unsigned char *referencePtr = reference.ptr(0);
27 | unsigned char *testPtr = test.ptr(0);
28 |
29 | if (useEpsCheck) {
30 | checkResultsEps(referencePtr, testPtr, reference.rows * reference.cols * reference.channels(), perPixelError, globalError);
31 | }
32 | else
33 | {
34 | checkResultsExact(referencePtr, testPtr, reference.rows * reference.cols * reference.channels());
35 | }
36 |
37 | std::cout << "PASS" << std::endl;
38 | return;
39 | }
40 |
--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/compare.h:
--------------------------------------------------------------------------------
1 | #ifndef COMPARE_H__
2 | #define COMPARE_H__
3 |
4 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck,
5 | double perPixelError, double globalError);
6 |
7 | #endif
8 |
--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/destination.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet6-SeamlessImageCloning/destination.png
--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/loadSaveImage.cpp:
--------------------------------------------------------------------------------
1 | #include <iostream>
2 | #include <opencv2/core/core.hpp>
3 | #include <opencv2/highgui/highgui.hpp>
4 | #include <opencv2/imgproc/imgproc.hpp>
5 | #include "cuda_runtime.h"
6 |
7 | //The caller becomes responsible for the returned pointer. This
8 | //is done in the interest of keeping this code as simple as possible.
9 | //In production code this is a bad idea - we should use RAII
10 | //to ensure the memory is freed. DO NOT COPY THIS AND USE IN PRODUCTION
11 | //CODE!!!
12 | void loadImageHDR(const std::string &filename,
13 | float **imagePtr,
14 | size_t *numRows, size_t *numCols)
15 | {
16 | cv::Mat image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR | CV_LOAD_IMAGE_ANYDEPTH);
17 | if (image.empty()) {
18 | std::cerr << "Couldn't open file: " << filename << std::endl;
19 | exit(1);
20 | }
21 |
22 | if (image.channels() != 3) {
23 | std::cerr << "Image must be color!" << std::endl;
24 | exit(1);
25 | }
26 |
27 | if (!image.isContinuous()) {
28 | std::cerr << "Image isn't continuous!" << std::endl;
29 | exit(1);
30 | }
31 |
32 | *imagePtr = new float[image.rows * image.cols * image.channels()];
33 |
34 | float *cvPtr = image.ptr<float>(0);
35 | for (size_t i = 0; i < image.rows * image.cols * image.channels(); ++i)
36 | (*imagePtr)[i] = cvPtr[i];
37 |
38 | *numRows = image.rows;
39 | *numCols = image.cols;
40 | }
41 |
42 | void loadImageGrey(const std::string &filename,
43 | unsigned char **imagePtr,
44 | size_t *numRows, size_t *numCols)
45 | {
46 | cv::Mat image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_GRAYSCALE);
47 | if (image.empty()) {
48 | std::cerr << "Couldn't open file: " << filename << std::endl;
49 | exit(1);
50 | }
51 |
52 | if (image.channels() != 1) {
53 | std::cerr << "Image must be greyscale!" << std::endl;
54 | exit(1);
55 | }
56 |
57 | if (!image.isContinuous()) {
58 | std::cerr << "Image isn't continuous!" << std::endl;
59 | exit(1);
60 | }
61 |
62 | *imagePtr = new unsigned char[image.rows * image.cols];
63 |
64 | unsigned char *cvPtr = image.ptr(0);
65 | for (size_t i = 0; i < image.rows * image.cols; ++i) {
66 | (*imagePtr)[i] = cvPtr[i];
67 | }
68 |
69 | *numRows = image.rows;
70 | *numCols = image.cols;
71 | }
72 | void loadImageRGBA(const std::string &filename,
73 | uchar4 **imagePtr,
74 | size_t *numRows, size_t *numCols)
75 | {
76 | cv::Mat image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR);
77 | if (image.empty()) {
78 | std::cerr << "Couldn't open file: " << filename << std::endl;
79 | exit(1);
80 | }
81 |
82 | if (image.channels() != 3) {
83 | std::cerr << "Image must be color!" << std::endl;
84 | exit(1);
85 | }
86 |
87 | if (!image.isContinuous()) {
88 | std::cerr << "Image isn't continuous!" << std::endl;
89 | exit(1);
90 | }
91 |
92 | cv::Mat imageRGBA;
93 | cv::cvtColor(image, imageRGBA, CV_BGR2RGBA);
94 |
95 | *imagePtr = new uchar4[image.rows * image.cols];
96 |
97 | unsigned char *cvPtr = imageRGBA.ptr(0);
98 | for (size_t i = 0; i < image.rows * image.cols; ++i) {
99 | (*imagePtr)[i].x = cvPtr[4 * i + 0];
100 | (*imagePtr)[i].y = cvPtr[4 * i + 1];
101 | (*imagePtr)[i].z = cvPtr[4 * i + 2];
102 | (*imagePtr)[i].w = cvPtr[4 * i + 3];
103 | }
104 |
105 | *numRows = image.rows;
106 | *numCols = image.cols;
107 | }
108 |
109 | void saveImageRGBA(const uchar4* const image,
110 | const size_t numRows, const size_t numCols,
111 | const std::string &output_file)
112 | {
113 | int sizes[2];
114 | sizes[0] = numRows;
115 | sizes[1] = numCols;
116 | cv::Mat imageRGBA(2, sizes, CV_8UC4, (void *)image);
117 | cv::Mat imageOutputBGR;
118 | cv::cvtColor(imageRGBA, imageOutputBGR, CV_RGBA2BGR);
119 | //output the image
120 | cv::imwrite(output_file.c_str(), imageOutputBGR);
121 | }
122 |
123 | //output an exr file
124 | //assumed to already be BGR
125 | void saveImageHDR(const float* const image,
126 | const size_t numRows, const size_t numCols,
127 | const std::string &output_file)
128 | {
129 | int sizes[2];
130 | sizes[0] = numRows;
131 | sizes[1] = numCols;
132 |
133 | cv::Mat imageHDR(2, sizes, CV_32FC3, (void *)image);
134 |
135 | imageHDR = imageHDR * 255;
136 |
137 | cv::imwrite(output_file.c_str(), imageHDR);
138 | }
139 |
--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/loadSaveImage.h:
--------------------------------------------------------------------------------
1 | #ifndef LOADSAVEIMAGE_H__
2 | #define LOADSAVEIMAGE_H__
3 |
4 | #include <string>
5 | #include <cuda_runtime.h> //for uchar4
6 |
7 | void loadImageHDR(const std::string &filename,
8 | float **imagePtr,
9 | size_t *numRows, size_t *numCols);
10 |
11 | void loadImageRGBA(const std::string &filename,
12 | uchar4 **imagePtr,
13 | size_t *numRows, size_t *numCols);
14 |
15 | void loadImageGrey(const std::string &filename,
16 | unsigned char **imagePtr,
17 | size_t *numRows, size_t *numCols);
18 |
19 | void saveImageRGBA(const uchar4* const image,
20 | const size_t numRows, const size_t numCols,
21 | const std::string &output_file);
22 |
23 | void saveImageHDR(const float* const image,
24 | const size_t numRows, const size_t numCols,
25 | const std::string &output_file);
26 |
27 | #endif
28 |
--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/main.cpp:
--------------------------------------------------------------------------------
1 | //Udacity HW6 Driver
2 |
3 | #include <iostream>
4 | #include "timer.h"
5 | #include "utils.h"
6 | #include <string>
7 | #include <cstdio>
8 |
9 | #include
10 | #include
11 | #include
12 |
13 | #include "reference_calc.h"
14 | #include "compare.h"
15 |
16 | void preProcess( uchar4 **sourceImg, size_t &numRowsSource, size_t &numColsSource,
17 | uchar4 **destImg,
18 | uchar4 **blendedImg, const std::string& source_filename,
19 | const std::string& dest_filename);
20 |
21 | void postProcess(const uchar4* const blendedImg,
22 | const size_t numRowsDest, const size_t numColsDest,
23 | const std::string& output_file);
24 |
25 | void your_blend(const uchar4* const sourceImg,
26 | const size_t numRowsSource, const size_t numColsSource,
27 | const uchar4* const destImg,
28 | uchar4* const blendedImg);
29 |
30 | int main(int argc, char **argv) {
31 | uchar4 *h_sourceImg, *h_destImg, *h_blendedImg;
32 | size_t numRowsSource, numColsSource;
33 |
34 | std::string input_source_file;
35 | std::string input_dest_file;
36 | std::string output_file;
37 |
38 | std::string reference_file;
39 | double perPixelError = 0.0;
40 | double globalError = 0.0;
41 | bool useEpsCheck = false;
42 |
43 | switch (argc)
44 | {
45 | case 3:
46 | input_source_file = std::string(argv[1]);
47 | input_dest_file = std::string(argv[2]);
48 | output_file = "HW6_output.png";
49 | reference_file = "HW6_reference.png";
50 | break;
51 | case 4:
52 | input_source_file = std::string(argv[1]);
53 | input_dest_file = std::string(argv[2]);
54 | output_file = std::string(argv[3]);
55 | reference_file = "HW6_reference.png";
56 | break;
57 | case 5:
58 | input_source_file = std::string(argv[1]);
59 | input_dest_file = std::string(argv[2]);
60 | output_file = std::string(argv[3]);
61 | reference_file = std::string(argv[4]);
62 | break;
63 | case 7:
64 | useEpsCheck=true;
65 | input_source_file = std::string(argv[1]);
66 | input_dest_file = std::string(argv[2]);
67 | output_file = std::string(argv[3]);
68 | reference_file = std::string(argv[4]);
69 | perPixelError = atof(argv[5]);
70 | globalError = atof(argv[6]);
71 | break;
72 | default:
73 | std::cerr << "Usage: ./HW6 input_source_file input_dest_filename [output_filename] [reference_filename] [perPixelError] [globalError]" << std::endl;
74 | exit(1);
75 | }
76 |
77 | //load the image and give us our input and output pointers
78 | preProcess(&h_sourceImg, numRowsSource, numColsSource,
79 | &h_destImg,
80 | &h_blendedImg, input_source_file, input_dest_file);
81 |
82 | GpuTimer timer;
83 | timer.Start();
84 |
85 | //call the students' code
86 | your_blend(h_sourceImg, numRowsSource, numColsSource,
87 | h_destImg,
88 | h_blendedImg);
89 |
90 | timer.Stop();
91 | cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
92 | int err = printf("Your code ran in: %f msecs.\n", timer.Elapsed());
93 | printf("\n");
94 | if (err < 0) {
95 | //Couldn't print! Probably the student closed stdout - bad news
96 | std::cerr << "Couldn't print timing information! STDOUT Closed!" << std::endl;
97 | exit(1);
98 | }
99 |
100 | //output the blended image (it is compared against the reference below)
101 | postProcess(h_blendedImg, numRowsSource, numColsSource, output_file);
102 |
103 | // calculate the reference image
104 | uchar4* h_reference = new uchar4[numRowsSource*numColsSource];
105 | reference_calc(h_sourceImg, numRowsSource, numColsSource,
106 | h_destImg, h_reference);
107 |
108 | // save the reference image
109 | postProcess(h_reference, numRowsSource, numColsSource, reference_file);
110 |
111 | compareImages(reference_file, output_file, useEpsCheck, perPixelError, globalError);
112 |
113 | delete[] h_reference;
114 | delete[] h_destImg;
115 | delete[] h_sourceImg;
116 | delete[] h_blendedImg;
117 | return 0;
118 | }
119 |
120 |
--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/reference_calc.cpp:
--------------------------------------------------------------------------------
1 | //Udacity HW 6
2 | //Poisson Blending Reference Calculation
3 |
4 | #include "utils.h"
5 | #include <vector>
6 |
7 | //Performs one iteration of the solver
8 | void computeIteration(const unsigned char* const dstImg,
9 | const unsigned char* const strictInteriorPixels,
10 | const unsigned char* const borderPixels,
11 | const std::vector<uint2>& interiorPixelList,
12 | const size_t numColsSource,
13 | const float* const f,
14 | const float* const g,
15 | float* const f_next)
16 | {
17 |
18 |
19 | for (size_t i = 0; i < interiorPixelList.size(); ++i) {
20 | float blendedSum = 0.f;
21 | float borderSum = 0.f;
22 |
23 | uint2 coord = interiorPixelList[i];
24 |
25 | unsigned int offset = coord.x * numColsSource + coord.y;
26 |
27 | //process all 4 neighbor pixels
28 | //for each pixel if it is an interior pixel
29 | //then we add the previous f, otherwise if it is a
30 | //border pixel then we add the value of the destination
31 | //image at the border. These border values are our boundary
32 | //conditions.
33 | if (strictInteriorPixels[offset - 1]) {
34 | blendedSum += f[offset - 1];
35 | }
36 | else {
37 | borderSum += dstImg[offset - 1];
38 | }
39 |
40 | if (strictInteriorPixels[offset + 1]) {
41 | blendedSum += f[offset + 1];
42 | }
43 | else {
44 | borderSum += dstImg[offset + 1];
45 | }
46 |
47 | if (strictInteriorPixels[offset - numColsSource]) {
48 | blendedSum += f[offset - numColsSource];
49 | }
50 | else {
51 | borderSum += dstImg[offset - numColsSource];
52 | }
53 |
54 | if (strictInteriorPixels[offset + numColsSource]) {
55 | blendedSum += f[offset + numColsSource];
56 | }
57 | else {
58 | borderSum += dstImg[offset + numColsSource];
59 | }
60 |
61 | float f_next_val = (blendedSum + borderSum + g[offset]) / 4.f;
62 |
63 | f_next[offset] = std::min(255.f, std::max(0.f, f_next_val)); //clip to [0, 255]
64 | }
65 |
66 | }
67 |
68 | //pre-compute the values of g, which depend only on the source image
69 | //and aren't iteration dependent.
70 | void computeG(const unsigned char* const channel,
71 | float* const g,
72 | const size_t numColsSource,
73 | const std::vector<uint2>& interiorPixelList)
74 | {
75 | for (size_t i = 0; i < interiorPixelList.size(); ++i) {
76 | uint2 coord = interiorPixelList[i];
77 | unsigned int offset = coord.x * numColsSource + coord.y;
78 |
79 | float sum = 4.f * channel[offset];
80 |
81 | sum -= (float)channel[offset - 1] + (float)channel[offset + 1];
82 | sum -= (float)channel[offset + numColsSource] + (float)channel[offset - numColsSource];
83 |
84 | g[offset] = sum;
85 | }
86 | }
87 |
88 | void reference_calc(const uchar4* const h_sourceImg,
89 | const size_t numRowsSource, const size_t numColsSource,
90 | const uchar4* const h_destImg,
91 | uchar4* const h_blendedImg){
92 |
93 | //we need to create a list of border pixels and interior pixels
94 | //this is a conceptually simple implementation, not a particularly efficient one...
95 |
96 | //first create mask
97 | size_t srcSize = numRowsSource * numColsSource;
98 | unsigned char* mask = new unsigned char[srcSize];
99 |
100 | for (int i = 0; i < srcSize; ++i) {
101 | mask[i] = (h_sourceImg[i].x + h_sourceImg[i].y + h_sourceImg[i].z < 3 * 255) ? 1 : 0;
102 | }
103 |
104 | //next compute strictly interior pixels and border pixels
105 | unsigned char *borderPixels = new unsigned char[srcSize](); //() zero-initializes, so the image-edge rows/cols are not left uninitialized
106 | unsigned char *strictInteriorPixels = new unsigned char[srcSize]();
107 |
108 | std::vector<uint2> interiorPixelList;
109 |
110 | //the source region in the homework isn't near an image boundary, so we can
111 | //simplify the conditionals a little...
112 | for (size_t r = 1; r < numRowsSource - 1; ++r) {
113 | for (size_t c = 1; c < numColsSource - 1; ++c) {
114 | if (mask[r * numColsSource + c]) {
115 | if (mask[(r -1) * numColsSource + c] && mask[(r + 1) * numColsSource + c] &&
116 | mask[r * numColsSource + c - 1] && mask[r * numColsSource + c + 1]) {
117 | strictInteriorPixels[r * numColsSource + c] = 1;
118 | borderPixels[r * numColsSource + c] = 0;
119 | interiorPixelList.push_back(make_uint2(r, c));
120 | }
121 | else {
122 | strictInteriorPixels[r * numColsSource + c] = 0;
123 | borderPixels[r * numColsSource + c] = 1;
124 | }
125 | }
126 | else {
127 | strictInteriorPixels[r * numColsSource + c] = 0;
128 | borderPixels[r * numColsSource + c] = 0;
129 |
130 | }
131 | }
132 | }
133 |
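134 | //split the source and destination images into their respective
135 | //channels (note: blue_* holds .y, the green channel, and green_* holds .z, the blue channel; the names are swapped but used consistently)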
136 | unsigned char* red_src = new unsigned char[srcSize];
137 | unsigned char* blue_src = new unsigned char[srcSize];
138 | unsigned char* green_src = new unsigned char[srcSize];
139 |
140 | for (int i = 0; i < srcSize; ++i) {
141 | red_src[i] = h_sourceImg[i].x;
142 | blue_src[i] = h_sourceImg[i].y;
143 | green_src[i] = h_sourceImg[i].z;
144 | }
145 |
146 | unsigned char* red_dst = new unsigned char[srcSize];
147 | unsigned char* blue_dst = new unsigned char[srcSize];
148 | unsigned char* green_dst = new unsigned char[srcSize];
149 |
150 | for (int i = 0; i < srcSize; ++i) {
151 | red_dst[i] = h_destImg[i].x;
152 | blue_dst[i] = h_destImg[i].y;
153 | green_dst[i] = h_destImg[i].z;
154 | }
155 |
156 | //next we'll precompute the g term - it never changes, no need to recompute every iteration
157 | float *g_red = new float[srcSize];
158 | float *g_blue = new float[srcSize];
159 | float *g_green = new float[srcSize];
160 |
161 | memset(g_red, 0, srcSize * sizeof(float));
162 | memset(g_blue, 0, srcSize * sizeof(float));
163 | memset(g_green, 0, srcSize * sizeof(float));
164 |
165 | computeG(red_src, g_red, numColsSource, interiorPixelList);
166 | computeG(blue_src, g_blue, numColsSource, interiorPixelList);
167 | computeG(green_src, g_green, numColsSource, interiorPixelList);
168 |
169 | //for each color channel we'll need two buffers and we'll ping-pong between them
170 | float *blendedValsRed_1 = new float[srcSize];
171 | float *blendedValsRed_2 = new float[srcSize];
172 |
173 | float *blendedValsBlue_1 = new float[srcSize];
174 | float *blendedValsBlue_2 = new float[srcSize];
175 |
176 | float *blendedValsGreen_1 = new float[srcSize];
177 | float *blendedValsGreen_2 = new float[srcSize];
178 |
179 | //the initial condition (IC) is the source image, copy it over
180 | for (size_t i = 0; i < srcSize; ++i) {
181 | blendedValsRed_1[i] = red_src[i];
182 | blendedValsRed_2[i] = red_src[i];
183 | blendedValsBlue_1[i] = blue_src[i];
184 | blendedValsBlue_2[i] = blue_src[i];
185 | blendedValsGreen_1[i] = green_src[i];
186 | blendedValsGreen_2[i] = green_src[i];
187 | }
188 |
189 | //Perform the solve on each color channel
190 | const size_t numIterations = 800;
191 | for (size_t i = 0; i < numIterations; ++i) {
192 | computeIteration(red_dst, strictInteriorPixels, borderPixels,
193 | interiorPixelList, numColsSource, blendedValsRed_1, g_red,
194 | blendedValsRed_2);
195 |
196 | std::swap(blendedValsRed_1, blendedValsRed_2);
197 | }
198 |
199 | for (size_t i = 0; i < numIterations; ++i) {
200 | computeIteration(blue_dst, strictInteriorPixels, borderPixels,
201 | interiorPixelList, numColsSource, blendedValsBlue_1, g_blue,
202 | blendedValsBlue_2);
203 |
204 | std::swap(blendedValsBlue_1, blendedValsBlue_2);
205 | }
206 |
207 | for (size_t i = 0; i < numIterations; ++i) {
208 | computeIteration(green_dst, strictInteriorPixels, borderPixels,
209 | interiorPixelList, numColsSource, blendedValsGreen_1, g_green,
210 | blendedValsGreen_2);
211 |
212 | std::swap(blendedValsGreen_1, blendedValsGreen_2);
213 | }
214 | std::swap(blendedValsRed_1, blendedValsRed_2); //put output into _2
215 | std::swap(blendedValsBlue_1, blendedValsBlue_2); //put output into _2
216 | std::swap(blendedValsGreen_1, blendedValsGreen_2); //put output into _2
217 |
218 | //copy the destination image to the output
219 | memcpy(h_blendedImg, h_destImg, sizeof(uchar4) * srcSize);
220 |
221 | //copy computed values for the interior into the output
222 | for (size_t i = 0; i < interiorPixelList.size(); ++i) {
223 | uint2 coord = interiorPixelList[i];
224 |
225 | unsigned int offset = coord.x * numColsSource + coord.y;
226 |
227 | h_blendedImg[offset].x = blendedValsRed_2[offset];
228 | h_blendedImg[offset].y = blendedValsBlue_2[offset];
229 | h_blendedImg[offset].z = blendedValsGreen_2[offset];
230 | }
231 |
232 | //wow, we allocated a lot of memory!
233 | delete[] mask;
234 | delete[] blendedValsRed_1;
235 | delete[] blendedValsRed_2;
236 | delete[] blendedValsBlue_1;
237 | delete[] blendedValsBlue_2;
238 | delete[] blendedValsGreen_1;
239 | delete[] blendedValsGreen_2;
240 | delete[] g_red;
241 | delete[] g_blue;
242 | delete[] g_green;
243 | delete[] red_src;
244 | delete[] red_dst;
245 | delete[] blue_src;
246 | delete[] blue_dst;
247 | delete[] green_src;
248 | delete[] green_dst;
249 | delete[] borderPixels;
250 | delete[] strictInteriorPixels;
251 | }
252 |
253 |
--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/reference_calc.h:
--------------------------------------------------------------------------------
1 | #ifndef REFERENCE_H__
2 | #define REFERENCE_H__
3 |
4 | void reference_calc(const uchar4* const h_sourceImg,
5 | const size_t numRowsSource, const size_t numColsSource,
6 | const uchar4* const h_destImg,
7 | uchar4* const h_blendedImg);
8 |
9 | #endif
10 |
--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/source.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet6-SeamlessImageCloning/source.png
--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/timer.h:
--------------------------------------------------------------------------------
1 | #ifndef GPU_TIMER_H__
2 | #define GPU_TIMER_H__
3 |
4 | #include <cuda_runtime.h>
5 |
6 | struct GpuTimer
7 | {
8 | cudaEvent_t start;
9 | cudaEvent_t stop;
10 |
11 | GpuTimer()
12 | {
13 | cudaEventCreate(&start);
14 | cudaEventCreate(&stop);
15 | }
16 |
17 | ~GpuTimer()
18 | {
19 | cudaEventDestroy(start);
20 | cudaEventDestroy(stop);
21 | }
22 |
23 | void Start()
24 | {
25 | cudaEventRecord(start, 0);
26 | }
27 |
28 | void Stop()
29 | {
30 | cudaEventRecord(stop, 0);
31 | }
32 |
33 | float Elapsed()
34 | {
35 | float elapsed;
36 | cudaEventSynchronize(stop);
37 | cudaEventElapsedTime(&elapsed, start, stop);
38 | return elapsed;
39 | }
40 | };
41 |
42 | #endif /* GPU_TIMER_H__ */
43 |
--------------------------------------------------------------------------------
/ProblemSet6-SeamlessImageCloning/utils.h:
--------------------------------------------------------------------------------
1 | #ifndef UTILS_H__
2 | #define UTILS_H__
3 |
4 | #include <iostream>
5 | #include <iomanip>
6 | #include <cuda.h>
7 | #include <cuda_runtime.h>
8 | #include <cuda_runtime_api.h>
9 | #include <cassert>
10 | #include <cmath>
11 | #include <algorithm>
12 |
13 | #define checkCudaErrors(val) check( (val), #val, __FILE__, __LINE__)
14 |
15 | template<typename T>
16 | void check(T err, const char* const func, const char* const file, const int line) {
17 | if (err != cudaSuccess) {
18 | std::cerr << "CUDA error at: " << file << ":" << line << std::endl;
19 | std::cerr << cudaGetErrorString(err) << " " << func << std::endl;
20 | exit(1);
21 | }
22 | }
23 |
24 | template<typename T>
25 | void checkResultsExact(const T* const ref, const T* const gpu, size_t numElem) {
26 | //check that the GPU result matches the CPU result
27 | for (size_t i = 0; i < numElem; ++i) {
28 | if (ref[i] != gpu[i]) {
29 | std::cerr << "Difference at pos " << i << std::endl;
30 | //the + is magic to convert char to int without messing
31 | //with other types
32 | std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
33 | "\nGPU : " << +gpu[i] << std::endl;
34 | exit(1);
35 | }
36 | }
37 | }
38 |
39 | template<typename T>
40 | void checkResultsEps(const T* const ref, const T* const gpu, size_t numElem, double eps1, double eps2) {
41 | assert(eps1 >= 0 && eps2 >= 0);
42 | unsigned long long totalDiff = 0;
43 | unsigned numSmallDifferences = 0;
44 | for (size_t i = 0; i < numElem; ++i) {
45 | //subtract smaller from larger in case of unsigned types
46 | T smaller = std::min(ref[i], gpu[i]);
47 | T larger = std::max(ref[i], gpu[i]);
48 | T diff = larger - smaller;
49 | if (diff > 0 && diff <= eps1) {
50 | numSmallDifferences++;
51 | }
52 | else if (diff > eps1) {
53 | std::cerr << "Difference at pos " << +i << " exceeds tolerance of " << eps1 << std::endl;
54 | std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
55 | "\nGPU : " << +gpu[i] << std::endl;
56 | exit(1);
57 | }
58 | totalDiff += diff * diff;
59 | }
60 | double percentSmallDifferences = (double)numSmallDifferences / (double)numElem;
61 | if (percentSmallDifferences > eps2) {
62 | std::cerr << "Total percentage of non-zero pixel difference between the two images exceeds " << 100.0 * eps2 << "%" << std::endl;
63 | std::cerr << "Percentage of non-zero pixel differences: " << 100.0 * percentSmallDifferences << "%" << std::endl;
64 | exit(1);
65 | }
66 | }
67 |
68 | //Uses the autodesk method of image comparison
69 | //Note that the tolerance here is in PIXELS, not a percentage of input pixels
70 | template<typename T>
71 | void checkResultsAutodesk(const T* const ref, const T* const gpu, size_t numElem, double variance, size_t tolerance)
72 | {
73 |
74 | size_t numBadPixels = 0;
75 | for (size_t i = 0; i < numElem; ++i) {
76 | T smaller = std::min(ref[i], gpu[i]);
77 | T larger = std::max(ref[i], gpu[i]);
78 | T diff = larger - smaller;
79 | if (diff > variance)
80 | ++numBadPixels;
81 | }
82 |
83 | if (numBadPixels > tolerance) {
84 | std::cerr << "Too many bad pixels in the image: " << numBadPixels << "/" << tolerance << std::endl;
85 | exit(1);
86 | }
87 | }
88 |
89 | #endif
90 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # udacity-IntroToParallelProgramming
2 | Proposed solutions for the CS344 - Introduction To Parallel Programming course (Udacity).
3 |
4 | Testing Environment: Visual Studio 2015 x64 + nVidia CUDA 8.0 + OpenCV 3.2.0
5 |
6 | For each problem set, the core of the algorithm to be implemented is located in the _student_func.cu_ file.
7 |
8 | ## Problem Set 1 - RGB2Gray
9 | ### Objective
10 | Convert an input RGBA image into a grayscale version (ignoring the A channel).
11 | ### Topics
12 | Example of a **map** primitive operation on a data structure.
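A minimal host-side C++ sketch of this map, assuming BT.601 luma weights (0.299, 0.587, 0.114) and a hypothetical `RGBA8` struct standing in for CUDA's `uchar4`; the kernel in `student_func.cu` may differ in details such as rounding. Each output pixel depends only on the matching input pixel, so the serial loop index plays the role of the global thread index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side stand-in for CUDA's uchar4 (x = R, y = G, z = B, w = A).
struct RGBA8 {
    std::uint8_t x, y, z, w;
};

// Per-pixel function of the map: reads one input pixel, writes one output
// value, and never looks at neighbors. BT.601 luma weights are assumed;
// the alpha channel is ignored. The float result is truncated to 8 bits.
inline std::uint8_t toGray(const RGBA8& p) {
    float luma = 0.299f * p.x + 0.587f * p.y + 0.114f * p.z;
    return static_cast<std::uint8_t>(luma);
}

// Serial stand-in for the kernel launch: iteration i corresponds to the
// thread whose global index is i.
std::vector<std::uint8_t> rgbaToGray(const std::vector<RGBA8>& img) {
    std::vector<std::uint8_t> gray(img.size());
    for (std::size_t i = 0; i < img.size(); ++i) {
        gray[i] = toGray(img[i]);
    }
    return gray;
}
```

In the CUDA version the loop disappears: each thread computes `i = blockIdx.x * blockDim.x + threadIdx.x`, checks `i < numPixels`, and applies the same per-pixel function.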
13 |
14 | ## Problem Set 2 - Blur
15 | ### Objective
16 | Apply a Gaussian blur convolution filter to an input RGBA image (blur each channel independently, ignoring the A channel).
17 | ### Topics
18 | Example of a **stencil** primitive operation on a 2D array. Uses **shared memory** to speed up the algorithm. Both global-memory and shared-memory kernels are provided; the latter gives approx. 1.6x speedup over the former.
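A serial C++ sketch of the stencil on a single channel, assuming a row-major image, a square odd-width filter whose weights sum to 1, and clamp-to-edge handling of out-of-range neighbors; `blurChannel` is a hypothetical name. On the GPU each output pixel would be one thread, and the shared-memory variant would first stage a tile plus a halo of `filterWidth / 2` pixels so that the inner loops read shared memory instead of global memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Weighted sum over a filterWidth x filterWidth neighborhood of every pixel.
// Neighbors outside the image are clamped to the nearest edge pixel.
// The float result is truncated back to 8 bits.
std::vector<unsigned char> blurChannel(const std::vector<unsigned char>& in,
                                       int numRows, int numCols,
                                       const std::vector<float>& filter,
                                       int filterWidth) {
    assert(filterWidth % 2 == 1);
    assert(in.size() == static_cast<std::size_t>(numRows) * numCols);
    assert(filter.size() == static_cast<std::size_t>(filterWidth) * filterWidth);
    const int half = filterWidth / 2;
    std::vector<unsigned char> out(in.size());
    for (int r = 0; r < numRows; ++r) {
        for (int c = 0; c < numCols; ++c) {
            float sum = 0.f;
            for (int fr = -half; fr <= half; ++fr) {
                for (int fc = -half; fc <= half; ++fc) {
                    // clamp the neighbor coordinates to the image
                    int nr = std::min(std::max(r + fr, 0), numRows - 1);
                    int nc = std::min(std::max(c + fc, 0), numCols - 1);
                    float w = filter[(fr + half) * filterWidth + (fc + half)];
                    sum += w * in[nr * numCols + nc];
                }
            }
            out[r * numCols + c] = static_cast<unsigned char>(sum);
        }
    }
    return out;
}
```

For a color image the same function is applied independently to the R, G and B channels.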
19 |
20 | ## Problem Set 3 - Tone Mapping
21 | ### Objective
22 | Map a High Dynamic Range image into an image for a device supporting a smaller range of intensity values.
23 | ### Topics
24 | - Compute the range of intensity values of the input image with min and max **reduce** operations.
25 | - Compute the **histogram** of intensity values (1024-bin array).
26 | - Compute the cumulative distribution function of the histogram with the Hillis & Steele **scan** algorithm (step-efficient, well suited to small arrays such as the histogram).
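The histogram and scan steps can be sketched serially as follows. This assumes the min and max from the reduce step are already known and that every value lies in `[minVal, maxVal]`; `histogram` and `hillisSteeleInclusiveScan` are hypothetical names. The scan keeps two buffers and swaps them after each step, mirroring a kernel that separates steps with `__syncthreads()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Equal-width binning over [minVal, maxVal]; maxVal itself lands in the last
// bin. On the GPU this is one thread per value doing atomicAdd on its bin.
std::vector<unsigned> histogram(const std::vector<float>& vals, float minVal,
                                float maxVal, std::size_t numBins) {
    assert(numBins > 0);
    std::vector<unsigned> bins(numBins, 0);
    const float range = maxVal - minVal;
    for (float v : vals) {
        std::size_t bin = range > 0.f
            ? static_cast<std::size_t>((v - minVal) / range * numBins)
            : 0;
        bins[std::min(bin, numBins - 1)] += 1;
    }
    return bins;
}

// Hillis & Steele inclusive scan: ceil(log2(n)) steps. In each step every
// element i >= offset adds the element `offset` positions to its left,
// reading from one buffer and writing to the other.
std::vector<unsigned> hillisSteeleInclusiveScan(std::vector<unsigned> in) {
    std::vector<unsigned> out(in.size());
    for (std::size_t offset = 1; offset < in.size(); offset *= 2) {
        for (std::size_t i = 0; i < in.size(); ++i) {  // "all threads" at once
            out[i] = i >= offset ? in[i] + in[i - offset] : in[i];
        }
        std::swap(in, out);
    }
    return in;
}
```

If the tone-mapping step expects an exclusive CDF, it can be obtained by shifting the inclusive result one position to the right and inserting a 0 at the front.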
27 |
28 | ## Problem Set 4 - Red-eye removal
29 | ### Objective
30 | Remove the red-eye effect from an input RGBA image (using Normalized Cross Correlation against a training template).
31 | ### Topics
32 | Sorting on the GPU: given an input array of NCC scores, sort it in ascending order with **radix sort**. For each bit:
33 | - Compute a predicate vector (0: false, 1: true).
34 | - Perform a **Blelloch scan** on the predicate vector (for both the false and the true cases).
35 | - From the Blelloch scan, extract a histogram of predicate values, [0, numberOfFalses], and an offset vector (the actual result of the scan).
36 | - A move kernel computes the new index of each element (using the two structures above) and moves it there.
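A serial C++ sketch of one way to implement these steps, assuming 32-bit unsigned keys (the NCC scores and their accompanying pixel positions are left out) and hypothetical function names. Unlike the description above, it scans only the "false" (bit == 0) predicate and derives each "true" element's position as `numberOfFalses + (i - scan[i])`, which is equivalent to scanning the true predicate separately.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Blelloch (work-efficient) exclusive scan, emulated serially.
// Requires a power-of-two length (callers pad with zeros).
void blellochExclusiveScan(std::vector<unsigned>& a) {
    const std::size_t n = a.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    // Up-sweep (reduce): build partial sums in place.
    for (std::size_t stride = 1; stride < n; stride *= 2)
        for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride)
            a[i] += a[i - stride];
    // Down-sweep: clear the root, then push prefixes down the tree.
    a[n - 1] = 0;
    for (std::size_t stride = n / 2; stride >= 1; stride /= 2)
        for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
            unsigned left = a[i - stride];
            a[i - stride] = a[i];
            a[i] += left;
        }
}

// Stable LSD radix sort, one bit per pass.
std::vector<unsigned> radixSort(std::vector<unsigned> keys) {
    const std::size_t n = keys.size();
    if (n == 0) return keys;
    std::size_t padded = 1;
    while (padded < n) padded *= 2;
    std::vector<unsigned> out(n);
    for (unsigned bit = 0; bit < 32; ++bit) {
        // 1. predicate: 1 where the current bit is 0 (zero padding past n)
        std::vector<unsigned> pred(padded, 0);
        for (std::size_t i = 0; i < n; ++i)
            pred[i] = ((keys[i] >> bit) & 1u) == 0;
        // 2. exclusive scan: scan[i] = number of "false" keys before i
        std::vector<unsigned> scan = pred;
        blellochExclusiveScan(scan);
        // 3. histogram: "true" keys start right after all "false" keys
        const std::size_t numFalses = scan[n - 1] + pred[n - 1];
        // 4. move: i - scan[i] is the number of "true" keys before i
        for (std::size_t i = 0; i < n; ++i) {
            std::size_t dst = pred[i] ? scan[i] : numFalses + (i - scan[i]);
            out[dst] = keys[i];
        }
        keys.swap(out);
    }
    return keys;
}
```

Because each pass is stable, sorting from the least significant bit upward leaves the keys fully sorted after the last pass.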
37 |
38 | ## Problem Set 5 - Optimized histogram computation
39 | ### Objective
40 | Improve histogram computation performance on the GPU over the simple global-atomic solution.
41 | ### Topics
42 | **Per-block** histogram computation. Each block computes its own histogram in shared memory, and the per-block histograms are combined at the end in global memory (more than 7x speedup over the global atomic implementation, while remaining relatively simple).
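A serial sketch of the per-block strategy, assuming the inputs are already bin indices in `[0, numBins)`; `perBlockHistogram` is a hypothetical name. Each "block" owns a contiguous chunk of the input and counts into a private histogram (shared memory on the GPU, where atomics only contend within one block); the private histograms are then added into the global one, which costs one global update per bin per block instead of one per input element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

std::vector<unsigned> perBlockHistogram(const std::vector<unsigned>& vals,
                                        std::size_t numBins,
                                        std::size_t blockSize) {
    assert(blockSize > 0);
    std::vector<unsigned> global(numBins, 0);
    for (std::size_t start = 0; start < vals.size(); start += blockSize) {
        std::vector<unsigned> local(numBins, 0);  // plays the role of shared memory
        const std::size_t end = std::min(start + blockSize, vals.size());
        for (std::size_t i = start; i < end; ++i) {
            assert(vals[i] < numBins);
            ++local[vals[i]];  // block-local atomicAdd on the GPU
        }
        for (std::size_t b = 0; b < numBins; ++b) {
            global[b] += local[b];  // one global atomicAdd per bin per block
        }
    }
    return global;
}
```

The result does not depend on `blockSize`; only the amount of contention on the global counters does.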
43 |
44 | ## Problem Set 6 - Seamless Image Cloning
45 | ### Objective
46 | Given a target image (e.g. a swimming pool), seamlessly paste a masked region of a source image (e.g. a hippo) into it.
47 | ### Topics
48 | The algorithm consists of performing Jacobi iterations on the source and target images to blend one into the other.
49 | - Given the mask, detect the interior points and the boundary points
50 | - Since the algorithm has to be performed only on the interior points, compute the **bounding box** of the mask region to restrict the Jacobi iterations to a subimage.
51 | - Split the images into their R, G and B channels.
52 | - Run 800 Jacobi iterations on each channel. The code uses **CUDA Streams** to run the same kernel concurrently on the 3 channels (3x speedup on my machine, 1.5x on the Udacity machine). The Jacobi kernel makes extensive use of shared memory, so the number of threads per block has been reduced to maximize SM occupancy.
53 | - Recombine the 3 channels to form the output image.
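A minimal sketch of the bounding-box step, assuming a row-major 0/1 mask such as the one `reference_calc` builds (a pixel belongs to the mask when R + G + B < 3 * 255); `BBox` and `maskBoundingBox` are hypothetical names. Because every neighbor of a strictly interior pixel is itself masked, the box already contains every pixel the Jacobi update reads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive bounds of the masked region; `empty` is true when no pixel is set.
struct BBox {
    std::size_t rMin, rMax, cMin, cMax;
    bool empty;
};

BBox maskBoundingBox(const std::vector<unsigned char>& mask,
                     std::size_t numRows, std::size_t numCols) {
    assert(mask.size() == numRows * numCols);
    BBox box{numRows, 0, numCols, 0, true};
    for (std::size_t r = 0; r < numRows; ++r) {
        for (std::size_t c = 0; c < numCols; ++c) {
            if (mask[r * numCols + c]) {
                box.rMin = std::min(box.rMin, r);
                box.rMax = std::max(box.rMax, r);
                box.cMin = std::min(box.cMin, c);
                box.cMax = std::max(box.cMax, c);
                box.empty = false;
            }
        }
    }
    return box;
}
```

On the GPU the four bounds could be computed with min/max reductions over the coordinates of masked pixels, similar to the reduce step of Problem Set 3; the Jacobi kernels are then launched only over the `(rMax - rMin + 1) x (cMax - cMin + 1)` subimage.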
54 |
--------------------------------------------------------------------------------