├── .gitattributes ├── .gitignore ├── IntroParallelProgramming.sln ├── Lesson1-CubeNumbers ├── IntroParallelProgramming.vcxproj └── main.cu ├── Lesson4-Reduction └── Lesson4-Reduction.vcxproj ├── ProblemSet1-RGB2Gray ├── CMakeLists.txt ├── HW1.cpp ├── HW1_differenceImage.png ├── HW1_reference.png ├── Makefile ├── RGB2Gray.vcxproj ├── cinque_terre.gold ├── cinque_terre_gray.jpg ├── cinque_terre_small.jpg ├── compare.cpp ├── compare.h ├── main.cpp ├── reference_calc.cpp ├── reference_calc.h ├── student_func.cu ├── timer.h └── utils.h ├── ProblemSet2-Blur ├── CMakeLists.txt ├── HW2.cpp ├── HW2_differenceImage.png ├── HW2_reference.png ├── Makefile ├── ProblemSet2-Blur.vcxproj ├── cinque_terre.gold ├── cinque_terre_blur.jpg ├── cinque_terre_small.jpg ├── compare.cpp ├── compare.h ├── main.cpp ├── reference_calc.cpp ├── reference_calc.h ├── student_func.cu ├── timer.h └── utils.h ├── ProblemSet3-ToneMapping ├── CMakeLists.txt ├── HDR-image.jpg ├── HDR-image_mapped.png ├── HW3.cu ├── HW3_differenceImage.png ├── HW3_reference.png ├── HW3_reference_old.png ├── Makefile ├── ProblemSet3-ToneMapping.vcxproj ├── compare.cpp ├── compare.h ├── input.png ├── loadSaveImage.cpp ├── loadSaveImage.h ├── main.cpp ├── memorial.exr ├── memorial_large.exr ├── memorial_png.gold ├── memorial_png_large.gold ├── memorial_raw.png ├── memorial_raw_large.png ├── memorial_raw_large_mapped.png ├── memorial_raw_mapped.png ├── my_output.png ├── reference_calc.cpp ├── reference_calc.h ├── student_func.cu ├── timer.h └── utils.h ├── ProblemSet4-RedEyeRemoval ├── CMakeLists.txt ├── HW4.cu ├── HW4_output.png ├── Makefile ├── ProblemSet4-RedEyeRemoval.vcxproj ├── compare.cpp ├── compare.h ├── loadSaveImage.cpp ├── loadSaveImage.h ├── main.cpp ├── red_eye_effect.gold ├── red_eye_effect_5.jpg ├── red_eye_effect_5_out.jpg ├── red_eye_effect_template_5.jpg ├── reference_calc.cpp ├── reference_calc.h ├── student_func.cu ├── timer.h └── utils.h ├── ProblemSet5-OptimizedHistogram ├── 
ProblemSet5-OptimizedHistogram.vcxproj ├── main.cu ├── reference_calc.cpp ├── reference_calc.h ├── student.cu ├── timer.h └── utils.h ├── ProblemSet6-SeamlessImageCloning ├── HW6.cu ├── HW6_differenceImage.png ├── HW6_output.png ├── HW6_reference.png ├── ProblemSet6-SeamlessImageCloning.vcxproj ├── blended.gold ├── compare.cpp ├── compare.h ├── destination.png ├── loadSaveImage.cpp ├── loadSaveImage.h ├── main.cpp ├── reference_calc.cpp ├── reference_calc.h ├── source.png ├── student_func.cu ├── timer.h └── utils.h └── README.md /.gitattributes: -------------------------------------------------------------------------------- 1 | ############################################################################### 2 | # Set default behavior to automatically normalize line endings. 3 | ############################################################################### 4 | * text=auto 5 | 6 | ############################################################################### 7 | # Set default behavior for command prompt diff. 8 | # 9 | # This is needed for earlier builds of msysgit that do not have it on by 10 | # default for csharp files. 11 | # Note: This is only used by the command line 12 | ############################################################################### 13 | #*.cs diff=csharp 14 | 15 | ############################################################################### 16 | # Set the merge driver for project and solution files 17 | # 18 | # Merging from the command prompt will add diff markers to the files if there 19 | # are conflicts (Merging from VS is not affected by the settings below, in VS 20 | # the diff markers are never inserted). Diff markers may cause the following 21 | # file extensions to fail to load in VS. An alternative would be to treat 22 | # these files as binary and thus will always conflict and require user 23 | # intervention with every merge.
To do so, just uncomment the entries below 24 | ############################################################################### 25 | #*.sln merge=binary 26 | #*.csproj merge=binary 27 | #*.vbproj merge=binary 28 | #*.vcxproj merge=binary 29 | #*.vcproj merge=binary 30 | #*.dbproj merge=binary 31 | #*.fsproj merge=binary 32 | #*.lsproj merge=binary 33 | #*.wixproj merge=binary 34 | #*.modelproj merge=binary 35 | #*.sqlproj merge=binary 36 | #*.wwaproj merge=binary 37 | 38 | ############################################################################### 39 | # behavior for image files 40 | # 41 | # image files are treated as binary by default. 42 | ############################################################################### 43 | #*.jpg binary 44 | #*.png binary 45 | #*.gif binary 46 | 47 | ############################################################################### 48 | # diff behavior for common document formats 49 | # 50 | # Convert binary document formats to text before diffing them. This feature 51 | # is only available from the command line. Turn it on by uncommenting the 52 | # entries below. 53 | ############################################################################### 54 | #*.doc diff=astextplain 55 | #*.DOC diff=astextplain 56 | #*.docx diff=astextplain 57 | #*.DOCX diff=astextplain 58 | #*.dot diff=astextplain 59 | #*.DOT diff=astextplain 60 | #*.pdf diff=astextplain 61 | #*.PDF diff=astextplain 62 | #*.rtf diff=astextplain 63 | #*.RTF diff=astextplain 64 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | ## Ignore Visual Studio temporary files, build results, and 2 | ## files generated by popular Visual Studio add-ons. 
3 | 4 | # User-specific files 5 | *.suo 6 | *.user 7 | *.userosscache 8 | *.sln.docstates 9 | 10 | # User-specific files (MonoDevelop/Xamarin Studio) 11 | *.userprefs 12 | 13 | # Build results 14 | [Dd]ebug/ 15 | [Dd]ebugPublic/ 16 | [Rr]elease/ 17 | [Rr]eleases/ 18 | [Xx]64/ 19 | [Xx]86/ 20 | [Bb]uild/ 21 | bld/ 22 | [Bb]in/ 23 | [Oo]bj/ 24 | 25 | # Visual Studio 2015 cache/options directory 26 | .vs/ 27 | # Uncomment if you have tasks that create the project's static files in wwwroot 28 | #wwwroot/ 29 | 30 | # MSTest test Results 31 | [Tt]est[Rr]esult*/ 32 | [Bb]uild[Ll]og.* 33 | 34 | # NUNIT 35 | *.VisualState.xml 36 | TestResult.xml 37 | 38 | # Build Results of an ATL Project 39 | [Dd]ebugPS/ 40 | [Rr]eleasePS/ 41 | dlldata.c 42 | 43 | # DNX 44 | project.lock.json 45 | artifacts/ 46 | 47 | *_i.c 48 | *_p.c 49 | *_i.h 50 | *.ilk 51 | *.meta 52 | *.obj 53 | *.pch 54 | *.pdb 55 | *.pgc 56 | *.pgd 57 | *.rsp 58 | *.sbr 59 | *.tlb 60 | *.tli 61 | *.tlh 62 | *.tmp 63 | *.tmp_proj 64 | *.log 65 | *.vspscc 66 | *.vssscc 67 | .builds 68 | *.pidb 69 | *.svclog 70 | *.scc 71 | 72 | # Chutzpah Test files 73 | _Chutzpah* 74 | 75 | # Visual C++ cache files 76 | ipch/ 77 | *.aps 78 | *.ncb 79 | *.opendb 80 | *.opensdf 81 | *.sdf 82 | *.cachefile 83 | *.VC.db 84 | 85 | # Visual Studio profiler 86 | *.psess 87 | *.vsp 88 | *.vspx 89 | *.sap 90 | 91 | # TFS 2012 Local Workspace 92 | $tf/ 93 | 94 | # Guidance Automation Toolkit 95 | *.gpState 96 | 97 | # ReSharper is a .NET coding add-in 98 | _ReSharper*/ 99 | *.[Rr]e[Ss]harper 100 | *.DotSettings.user 101 | 102 | # JustCode is a .NET coding add-in 103 | .JustCode 104 | 105 | # TeamCity is a build add-in 106 | _TeamCity* 107 | 108 | # DotCover is a Code Coverage Tool 109 | *.dotCover 110 | 111 | # NCrunch 112 | _NCrunch_* 113 | .*crunch*.local.xml 114 | nCrunchTemp_* 115 | 116 | # MightyMoose 117 | *.mm.* 118 | AutoTest.Net/ 119 | 120 | # Web workbench (sass) 121 | .sass-cache/ 122 | 123 | # Installshield output folder 124 | 
[Ee]xpress/ 125 | 126 | # DocProject is a documentation generator add-in 127 | DocProject/buildhelp/ 128 | DocProject/Help/*.HxT 129 | DocProject/Help/*.HxC 130 | DocProject/Help/*.hhc 131 | DocProject/Help/*.hhk 132 | DocProject/Help/*.hhp 133 | DocProject/Help/Html2 134 | DocProject/Help/html 135 | 136 | # Click-Once directory 137 | publish/ 138 | 139 | # Publish Web Output 140 | *.[Pp]ublish.xml 141 | *.azurePubxml 142 | 143 | # TODO: Un-comment the next line if you do not want to checkin 144 | # your web deploy settings because they may include unencrypted 145 | # passwords 146 | #*.pubxml 147 | *.publishproj 148 | 149 | # NuGet Packages 150 | *.nupkg 151 | # The packages folder can be ignored because of Package Restore 152 | **/packages/* 153 | # except build/, which is used as an MSBuild target. 154 | !**/packages/build/ 155 | # Uncomment if necessary however generally it will be regenerated when needed 156 | #!**/packages/repositories.config 157 | # NuGet v3's project.json files produces more ignoreable files 158 | *.nuget.props 159 | *.nuget.targets 160 | 161 | # Microsoft Azure Build Output 162 | csx/ 163 | *.build.csdef 164 | 165 | # Microsoft Azure Emulator 166 | ecf/ 167 | rcf/ 168 | 169 | # Windows Store app package directory 170 | AppPackages/ 171 | BundleArtifacts/ 172 | 173 | # Visual Studio cache files 174 | # files ending in .cache can be ignored 175 | *.[Cc]ache 176 | # but keep track of directories ending in .cache 177 | !*.[Cc]ache/ 178 | 179 | # Others 180 | ClientBin/ 181 | [Ss]tyle[Cc]op.* 182 | ~$* 183 | *~ 184 | *.dbmdl 185 | *.dbproj.schemaview 186 | *.pfx 187 | *.publishsettings 188 | node_modules/ 189 | orleans.codegen.cs 190 | 191 | # RIA/Silverlight projects 192 | Generated_Code/ 193 | 194 | # Backup & report files from converting an old project file 195 | # to a newer Visual Studio version. 
Backup files are not needed, 196 | # because we have git ;-) 197 | _UpgradeReport_Files/ 198 | Backup*/ 199 | UpgradeLog*.XML 200 | UpgradeLog*.htm 201 | 202 | # SQL Server files 203 | *.mdf 204 | *.ldf 205 | 206 | # Business Intelligence projects 207 | *.rdl.data 208 | *.bim.layout 209 | *.bim_*.settings 210 | 211 | # Microsoft Fakes 212 | FakesAssemblies/ 213 | 214 | # GhostDoc plugin setting file 215 | *.GhostDoc.xml 216 | 217 | # Node.js Tools for Visual Studio 218 | .ntvs_analysis.dat 219 | 220 | # Visual Studio 6 build log 221 | *.plg 222 | 223 | # Visual Studio 6 workspace options file 224 | *.opt 225 | 226 | # Visual Studio LightSwitch build output 227 | **/*.HTMLClient/GeneratedArtifacts 228 | **/*.DesktopClient/GeneratedArtifacts 229 | **/*.DesktopClient/ModelManifest.xml 230 | **/*.Server/GeneratedArtifacts 231 | **/*.Server/ModelManifest.xml 232 | _Pvt_Extensions 233 | 234 | # LightSwitch generated files 235 | GeneratedArtifacts/ 236 | ModelManifest.xml 237 | 238 | # Paket dependency manager 239 | .paket/paket.exe 240 | 241 | # FAKE - F# Make 242 | .fake/ 243 | -------------------------------------------------------------------------------- /IntroParallelProgramming.sln: -------------------------------------------------------------------------------- 1 |  2 | Microsoft Visual Studio Solution File, Format Version 12.00 3 | # Visual Studio 14 4 | VisualStudioVersion = 14.0.25420.1 5 | MinimumVisualStudioVersion = 10.0.40219.1 6 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "Lesson1-CubeNumbers", "Lesson1-CubeNumbers\IntroParallelProgramming.vcxproj", "{E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}" 7 | EndProject 8 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "ProblemSet1-RGB2Gray", "ProblemSet1-RGB2Gray\RGB2Gray.vcxproj", "{681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}" 9 | EndProject 10 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "ProblemSet2-Blur", "ProblemSet2-Blur\ProblemSet2-Blur.vcxproj", "{5B684E70-F85A-4BE0-82A2-A45023E5F5CF}" 11 | 
EndProject 12 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "ProblemSet3-ToneMapping", "ProblemSet3-ToneMapping\ProblemSet3-ToneMapping.vcxproj", "{EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}" 13 | EndProject 14 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "ProblemSet4-RedEyeRemoval", "ProblemSet4-RedEyeRemoval\ProblemSet4-RedEyeRemoval.vcxproj", "{B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}" 15 | EndProject 16 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "ProblemSet5-OptimizedHistogram", "ProblemSet5-OptimizedHistogram\ProblemSet5-OptimizedHistogram.vcxproj", "{0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}" 17 | EndProject 18 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "ProblemSet6-SeamlessImageCloning", "ProblemSet6-SeamlessImageCloning\ProblemSet6-SeamlessImageCloning.vcxproj", "{5781233B-6022-4F34-B559-1473B9674B39}" 19 | EndProject 20 | Global 21 | GlobalSection(SolutionConfigurationPlatforms) = preSolution 22 | Debug|x64 = Debug|x64 23 | Debug|x86 = Debug|x86 24 | Release|x64 = Release|x64 25 | Release|x86 = Release|x86 26 | EndGlobalSection 27 | GlobalSection(ProjectConfigurationPlatforms) = postSolution 28 | {E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Debug|x64.ActiveCfg = Debug|x64 29 | {E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Debug|x64.Build.0 = Debug|x64 30 | {E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Debug|x86.ActiveCfg = Debug|Win32 31 | {E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Debug|x86.Build.0 = Debug|Win32 32 | {E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Release|x64.ActiveCfg = Release|x64 33 | {E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Release|x64.Build.0 = Release|x64 34 | {E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Release|x86.ActiveCfg = Release|Win32 35 | {E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6}.Release|x86.Build.0 = Release|Win32 36 | {681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Debug|x64.ActiveCfg = Debug|x64 37 | {681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Debug|x64.Build.0 = Debug|x64 38 | {681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Debug|x86.ActiveCfg = Debug|Win32 39 
| {681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Debug|x86.Build.0 = Debug|Win32 40 | {681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Release|x64.ActiveCfg = Release|x64 41 | {681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Release|x64.Build.0 = Release|x64 42 | {681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Release|x86.ActiveCfg = Release|Win32 43 | {681E3CC1-F969-459A-BAF7-C9DBB5E78FEA}.Release|x86.Build.0 = Release|Win32 44 | {5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Debug|x64.ActiveCfg = Debug|x64 45 | {5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Debug|x64.Build.0 = Debug|x64 46 | {5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Debug|x86.ActiveCfg = Debug|Win32 47 | {5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Debug|x86.Build.0 = Debug|Win32 48 | {5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Release|x64.ActiveCfg = Release|x64 49 | {5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Release|x64.Build.0 = Release|x64 50 | {5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Release|x86.ActiveCfg = Release|Win32 51 | {5B684E70-F85A-4BE0-82A2-A45023E5F5CF}.Release|x86.Build.0 = Release|Win32 52 | {EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Debug|x64.ActiveCfg = Debug|x64 53 | {EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Debug|x64.Build.0 = Debug|x64 54 | {EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Debug|x86.ActiveCfg = Debug|Win32 55 | {EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Debug|x86.Build.0 = Debug|Win32 56 | {EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Release|x64.ActiveCfg = Release|x64 57 | {EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Release|x64.Build.0 = Release|x64 58 | {EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Release|x86.ActiveCfg = Release|Win32 59 | {EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9}.Release|x86.Build.0 = Release|Win32 60 | {B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Debug|x64.ActiveCfg = Debug|x64 61 | {B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Debug|x64.Build.0 = Debug|x64 62 | {B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Debug|x86.ActiveCfg = Debug|Win32 63 | {B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Debug|x86.Build.0 = Debug|Win32 64 | 
{B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Release|x64.ActiveCfg = Release|x64 65 | {B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Release|x64.Build.0 = Release|x64 66 | {B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Release|x86.ActiveCfg = Release|Win32 67 | {B3DC55E1-BA50-4DBA-9FD5-0B6B96FDCE5A}.Release|x86.Build.0 = Release|Win32 68 | {0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Debug|x64.ActiveCfg = Debug|x64 69 | {0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Debug|x64.Build.0 = Debug|x64 70 | {0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Debug|x86.ActiveCfg = Debug|Win32 71 | {0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Debug|x86.Build.0 = Debug|Win32 72 | {0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Release|x64.ActiveCfg = Release|x64 73 | {0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Release|x64.Build.0 = Release|x64 74 | {0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Release|x86.ActiveCfg = Release|Win32 75 | {0FDE11C1-E4D4-4E2C-B6B0-E39417A02491}.Release|x86.Build.0 = Release|Win32 76 | {5781233B-6022-4F34-B559-1473B9674B39}.Debug|x64.ActiveCfg = Debug|x64 77 | {5781233B-6022-4F34-B559-1473B9674B39}.Debug|x64.Build.0 = Debug|x64 78 | {5781233B-6022-4F34-B559-1473B9674B39}.Debug|x86.ActiveCfg = Debug|Win32 79 | {5781233B-6022-4F34-B559-1473B9674B39}.Debug|x86.Build.0 = Debug|Win32 80 | {5781233B-6022-4F34-B559-1473B9674B39}.Release|x64.ActiveCfg = Release|x64 81 | {5781233B-6022-4F34-B559-1473B9674B39}.Release|x64.Build.0 = Release|x64 82 | {5781233B-6022-4F34-B559-1473B9674B39}.Release|x86.ActiveCfg = Release|Win32 83 | {5781233B-6022-4F34-B559-1473B9674B39}.Release|x86.Build.0 = Release|Win32 84 | EndGlobalSection 85 | GlobalSection(SolutionProperties) = preSolution 86 | HideSolutionNode = FALSE 87 | EndGlobalSection 88 | EndGlobal 89 | -------------------------------------------------------------------------------- /Lesson1-CubeNumbers/IntroParallelProgramming.vcxproj: -------------------------------------------------------------------------------- 1 |  2 | 3 | 4 | 5 | Debug 6 | Win32 7 | 8 | 9 | Debug 10 | x64 11 | 12 
| 13 | Release 14 | Win32 15 | 16 | 17 | Release 18 | x64 19 | 20 | 21 | 22 | 23 | 24 | 25 | {E60EEE8A-D8AC-48FF-A1E9-4A5BB39805B6} 26 | IntroParallelProgramming 27 | Lesson1-CubeNumbers 28 | 29 | 30 | 31 | Application 32 | true 33 | MultiByte 34 | v140 35 | 36 | 37 | Application 38 | true 39 | MultiByte 40 | v140 41 | 42 | 43 | Application 44 | false 45 | true 46 | MultiByte 47 | v140 48 | 49 | 50 | Application 51 | false 52 | true 53 | MultiByte 54 | v140 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | true 75 | 76 | 77 | true 78 | 79 | 80 | 81 | Level3 82 | Disabled 83 | WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions) 84 | 85 | 86 | true 87 | Console 88 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies) 89 | 90 | 91 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 92 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 93 | 94 | 95 | 96 | 97 | Level3 98 | Disabled 99 | WIN32;WIN64;_DEBUG;_CONSOLE;%(PreprocessorDefinitions) 100 | 101 | 102 | true 103 | Console 104 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies) 105 | 106 | 107 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 108 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 109 | 110 | 111 | 64 112 | 113 | 114 | 115 | 116 | Level3 117 | MaxSpeed 118 | true 119 | true 120 | WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions) 121 | 122 | 123 | true 124 | true 125 | true 126 | Console 127 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies) 128 | 129 | 130 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 131 | copy 
"$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 132 | 133 | 134 | 135 | 136 | Level3 137 | MaxSpeed 138 | true 139 | true 140 | WIN32;WIN64;NDEBUG;_CONSOLE;%(PreprocessorDefinitions) 141 | 142 | 143 | true 144 | true 145 | true 146 | Console 147 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies) 148 | 149 | 150 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 151 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 152 | 153 | 154 | 64 155 | 156 | 157 | 158 | 159 | 160 | 161 | -------------------------------------------------------------------------------- /Lesson1-CubeNumbers/main.cu: -------------------------------------------------------------------------------- 1 | #include "cuda_runtime.h" 2 | #include "device_launch_parameters.h" 3 | 4 | #include 5 | 6 | __global__ void cube(float * d_out, float * d_in) { 7 | 8 | int idx = threadIdx.x; 9 | float f = d_in[idx]; 10 | d_out[idx] = f*f*f; 11 | } 12 | 13 | int main(int argc, char ** argv) { 14 | const int ARRAY_SIZE = 96; 15 | const int ARRAY_BYTES = ARRAY_SIZE * sizeof(float); 16 | 17 | // generate the input array on the host 18 | float h_in[ARRAY_SIZE]; 19 | for (int i = 0; i < ARRAY_SIZE; i++) { 20 | h_in[i] = float(i); 21 | } 22 | float h_out[ARRAY_SIZE]; 23 | 24 | // declare GPU memory pointers 25 | float * d_in; 26 | float * d_out; 27 | 28 | // allocate GPU memory 29 | cudaMalloc((void**)&d_in, ARRAY_BYTES); 30 | cudaMalloc((void**)&d_out, ARRAY_BYTES); 31 | 32 | // transfer the array to the GPU 33 | cudaMemcpy(d_in, h_in, ARRAY_BYTES, cudaMemcpyHostToDevice); 34 | 35 | // launch the kernel 36 | cube << <1, ARRAY_SIZE >> >(d_out, d_in); 37 | 38 | // copy back the result array to the CPU 39 | cudaMemcpy(h_out, d_out, ARRAY_BYTES, cudaMemcpyDeviceToHost); 40 | 41 | // print out the resulting array 42 | for (int i = 0; i < ARRAY_SIZE; i++) { 43 | printf("%f", h_out[i]); 44 | 
printf(((i % 4) != 3) ? "\t" : "\n"); 45 | } 46 | 47 | cudaFree(d_in); 48 | cudaFree(d_out); 49 | 50 | return 0; 51 | } -------------------------------------------------------------------------------- /Lesson4-Reduction/Lesson4-Reduction.vcxproj: -------------------------------------------------------------------------------- 1 |  2 | 3 | 4 | 5 | Debug 6 | Win32 7 | 8 | 9 | Debug 10 | x64 11 | 12 | 13 | Release 14 | Win32 15 | 16 | 17 | Release 18 | x64 19 | 20 | 21 | 22 | {0741C52D-C5E1-4C2F-A8E9-67C29CBF5B97} 23 | Lesson4_Reduction 24 | Lesson3-Reduction 25 | 26 | 27 | 28 | Application 29 | true 30 | MultiByte 31 | v140 32 | 33 | 34 | Application 35 | true 36 | MultiByte 37 | v140 38 | 39 | 40 | Application 41 | false 42 | true 43 | MultiByte 44 | v140 45 | 46 | 47 | Application 48 | false 49 | true 50 | MultiByte 51 | v140 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | true 72 | 73 | 74 | true 75 | 76 | 77 | 78 | Level3 79 | Disabled 80 | WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions) 81 | 82 | 83 | true 84 | Console 85 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies) 86 | 87 | 88 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 89 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 90 | 91 | 92 | 93 | 94 | Level3 95 | Disabled 96 | WIN32;WIN64;_DEBUG;_CONSOLE;%(PreprocessorDefinitions) 97 | 98 | 99 | true 100 | Console 101 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies) 102 | 103 | 104 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 105 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 106 | 107 | 108 | 64 109 | 110 | 111 | 112 | 113 | Level3 114 | MaxSpeed 115 | true 116 | true 117 | 
WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions) 118 | 119 | 120 | true 121 | true 122 | true 123 | Console 124 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies) 125 | 126 | 127 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 128 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 129 | 130 | 131 | 132 | 133 | Level3 134 | MaxSpeed 135 | true 136 | true 137 | WIN32;WIN64;NDEBUG;_CONSOLE;%(PreprocessorDefinitions) 138 | 139 | 140 | true 141 | true 142 | true 143 | Console 144 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies) 145 | 146 | 147 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 148 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 149 | 150 | 151 | 64 152 | 153 | 154 | 155 | 156 | 157 | 158 | -------------------------------------------------------------------------------- /ProblemSet1-RGB2Gray/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | ############################################################################ 2 | # CMakeLists.txt for OpenCV and CUDA. 3 | # 2012-02-07 4 | # Quan Tran Minh. 
edit by Johannes Kast, Michael Sarahan 5 | # quantm@unist.ac.kr kast.jo@googlemail.com msarahan@gmail.com 6 | ############################################################################ 7 | 8 | # collect source files 9 | 10 | file( GLOB hdr *.hpp *.h ) 11 | file( GLOB cu *.cu) 12 | SET (HW1_files main.cpp reference_calc.cpp compare.cpp) 13 | 14 | CUDA_ADD_EXECUTABLE(HW1 ${HW1_files} ${hdr} ${cu}) -------------------------------------------------------------------------------- /ProblemSet1-RGB2Gray/HW1.cpp: -------------------------------------------------------------------------------- 1 | #include <opencv2/core/core.hpp> 2 | #include <opencv2/highgui/highgui.hpp> 3 | #include <opencv2/opencv.hpp> 4 | #include "utils.h" 5 | #include <cuda.h> 6 | #include <cuda_runtime.h> 7 | #include <string> 8 | 9 | static cv::Mat imageRGBA; 10 | static cv::Mat imageGrey; 11 | 12 | static uchar4 *d_rgbaImage__; 13 | static unsigned char *d_greyImage__; 14 | 15 | static size_t numRows() { return imageRGBA.rows; } 16 | static size_t numCols() { return imageRGBA.cols; } 17 | 18 | //return types are void since any internal error will be handled by quitting 19 | //no point in returning error codes...
20 | //returns a pointer to an RGBA version of the input image 21 | //and a pointer to the single channel grey-scale output 22 | //on both the host and device 23 | static void preProcess(uchar4 **inputImage, unsigned char **greyImage, 24 | uchar4 **d_rgbaImage, unsigned char **d_greyImage, 25 | const std::string &filename) { 26 | //make sure the context initializes ok 27 | checkCudaErrors(cudaFree(0)); 28 | 29 | cv::Mat image; 30 | image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR); 31 | if (image.empty()) { 32 | std::cerr << "Couldn't open file: " << filename << std::endl; 33 | exit(1); 34 | } 35 | 36 | cv::cvtColor(image, imageRGBA, CV_BGR2RGBA); 37 | 38 | //allocate memory for the output 39 | imageGrey.create(image.rows, image.cols, CV_8UC1); 40 | 41 | //This shouldn't ever happen given the way the images are created 42 | //at least based upon my limited understanding of OpenCV, but better to check 43 | if (!imageRGBA.isContinuous() || !imageGrey.isContinuous()) { 44 | std::cerr << "Images aren't continuous!! Exiting." 
<< std::endl; 45 | exit(1); 46 | } 47 | 48 | *inputImage = (uchar4 *)imageRGBA.ptr(0); 49 | *greyImage = imageGrey.ptr(0); 50 | 51 | const size_t numPixels = numRows() * numCols(); 52 | //allocate memory on the device for both input and output 53 | checkCudaErrors(cudaMalloc(d_rgbaImage, sizeof(uchar4) * numPixels)); 54 | checkCudaErrors(cudaMalloc(d_greyImage, sizeof(unsigned char) * numPixels)); 55 | checkCudaErrors(cudaMemset(*d_greyImage, 0, numPixels * sizeof(unsigned char))); //make sure no memory is left laying around 56 | 57 | //copy input array to the GPU 58 | checkCudaErrors(cudaMemcpy(*d_rgbaImage, *inputImage, sizeof(uchar4) * numPixels, cudaMemcpyHostToDevice)); 59 | 60 | d_rgbaImage__ = *d_rgbaImage; 61 | d_greyImage__ = *d_greyImage; 62 | } 63 | 64 | static void postProcess(const std::string& output_file, unsigned char* data_ptr) { 65 | cv::Mat output(numRows(), numCols(), CV_8UC1, (void*)data_ptr); 66 | 67 | //output the image 68 | cv::imwrite(output_file.c_str(), output); 69 | } 70 | 71 | static void cleanup() 72 | { 73 | //cleanup 74 | cudaFree(d_rgbaImage__); 75 | cudaFree(d_greyImage__); 76 | } 77 | 78 | static void generateReferenceImage(std::string input_filename, std::string output_filename) 79 | { 80 | cv::Mat reference = cv::imread(input_filename, CV_LOAD_IMAGE_GRAYSCALE); 81 | 82 | cv::imwrite(output_filename, reference); 83 | 84 | } 85 | -------------------------------------------------------------------------------- /ProblemSet1-RGB2Gray/HW1_differenceImage.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet1-RGB2Gray/HW1_differenceImage.png -------------------------------------------------------------------------------- /ProblemSet1-RGB2Gray/HW1_reference.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet1-RGB2Gray/HW1_reference.png -------------------------------------------------------------------------------- /ProblemSet1-RGB2Gray/Makefile: -------------------------------------------------------------------------------- 1 | NVCC=nvcc 2 | 3 | ################################### 4 | # These are the default install # 5 | # locations on most linux distros # 6 | ################################### 7 | 8 | OPENCV_LIBPATH=/usr/lib 9 | OPENCV_INCLUDEPATH=/usr/include 10 | 11 | ################################################### 12 | # On Macs the default install locations are below # 13 | ################################################### 14 | 15 | #OPENCV_LIBPATH=/usr/local/lib 16 | #OPENCV_INCLUDEPATH=/usr/local/include 17 | 18 | # or if using MacPorts 19 | 20 | #OPENCV_LIBPATH=/opt/local/lib 21 | #OPENCV_INCLUDEPATH=/opt/local/include 22 | 23 | OPENCV_LIBS=-lopencv_core -lopencv_imgproc -lopencv_highgui 24 | 25 | CUDA_INCLUDEPATH=/usr/local/cuda-5.0/include 26 | 27 | ###################################################### 28 | # On Macs the default install locations are below # 29 | # #################################################### 30 | 31 | #CUDA_INCLUDEPATH=/usr/local/cuda/include 32 | #CUDA_LIBPATH=/usr/local/cuda/lib 33 | 34 | NVCC_OPTS=-O3 -arch=sm_20 -Xcompiler -Wall -Xcompiler -Wextra -m64 35 | 36 | GCC_OPTS=-O3 -Wall -Wextra -m64 37 | 38 | student: main.o student_func.o compare.o reference_calc.o Makefile 39 | $(NVCC) -o HW1 main.o student_func.o compare.o reference_calc.o -L $(OPENCV_LIBPATH) $(OPENCV_LIBS) $(NVCC_OPTS) 40 | 41 | main.o: main.cpp timer.h utils.h reference_calc.cpp compare.cpp HW1.cpp 42 | g++ -c main.cpp $(GCC_OPTS) -I $(CUDA_INCLUDEPATH) -I $(OPENCV_INCLUDEPATH) 43 | 44 | student_func.o: student_func.cu utils.h 45 | nvcc -c student_func.cu $(NVCC_OPTS) 46 | 47 | compare.o: compare.cpp compare.h 48 | g++ -c 
compare.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH) 49 | 50 | reference_calc.o: reference_calc.cpp reference_calc.h 51 | g++ -c reference_calc.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH) 52 | 53 | clean: 54 | rm -f *.o *.png HW1 55 | -------------------------------------------------------------------------------- /ProblemSet1-RGB2Gray/RGB2Gray.vcxproj: -------------------------------------------------------------------------------- 1 |  2 | 3 | 4 | 5 | Debug 6 | Win32 7 | 8 | 9 | Debug 10 | x64 11 | 12 | 13 | Release 14 | Win32 15 | 16 | 17 | Release 18 | x64 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | {681E3CC1-F969-459A-BAF7-C9DBB5E78FEA} 38 | RGB2Gray 39 | ProblemSet1-RGB2Gray 40 | 41 | 42 | 43 | Application 44 | true 45 | MultiByte 46 | v140 47 | 48 | 49 | Application 50 | true 51 | MultiByte 52 | v140 53 | 54 | 55 | Application 56 | false 57 | true 58 | MultiByte 59 | v140 60 | 61 | 62 | Application 63 | false 64 | true 65 | MultiByte 66 | v140 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | true 87 | 88 | 89 | true 90 | 91 | 92 | 93 | Level3 94 | Disabled 95 | WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions) 96 | 97 | 98 | true 99 | Console 100 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies) 101 | 102 | 103 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 104 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 105 | 106 | 107 | 108 | 109 | Level3 110 | Disabled 111 | WIN32;WIN64;_DEBUG;_CONSOLE;%(PreprocessorDefinitions) 112 | 113 | 114 | true 115 | Console 116 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies) 117 | 118 | 119 | echo copy
"$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 120 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 121 | 122 | 123 | 64 124 | 125 | 126 | 127 | 128 | Level3 129 | MaxSpeed 130 | true 131 | true 132 | WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions) 133 | 134 | 135 | true 136 | true 137 | true 138 | Console 139 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies) 140 | 141 | 142 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 143 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 144 | 145 | 146 | 147 | 148 | Level3 149 | MaxSpeed 150 | true 151 | true 152 | WIN32;WIN64;NDEBUG;_CONSOLE;%(PreprocessorDefinitions) 153 | %(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);C:\opencv\build\include;C:\opencv\build\include\opencv2 154 | 155 | 156 | true 157 | true 158 | true 159 | Console 160 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;opencv_world320d.lib;%(AdditionalDependencies) 161 | %(AdditionalLibraryDirectories);$(CudaToolkitLibDir);C:\opencv\build\x64\vc14\lib 162 | 163 | 164 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 165 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 166 | 167 | 168 | 64 169 | 170 | 171 | 172 | 173 | 174 | 175 | -------------------------------------------------------------------------------- /ProblemSet1-RGB2Gray/cinque_terre.gold: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet1-RGB2Gray/cinque_terre.gold -------------------------------------------------------------------------------- /ProblemSet1-RGB2Gray/cinque_terre_gray.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet1-RGB2Gray/cinque_terre_gray.jpg -------------------------------------------------------------------------------- /ProblemSet1-RGB2Gray/cinque_terre_small.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet1-RGB2Gray/cinque_terre_small.jpg -------------------------------------------------------------------------------- /ProblemSet1-RGB2Gray/compare.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | #include "utils.h" 6 | 7 | void compareImages(std::string reference_filename, std::string test_filename, 8 | bool useEpsCheck, double perPixelError, double globalError) 9 | { 10 | cv::Mat reference = cv::imread(reference_filename, -1); 11 | cv::Mat test = cv::imread(test_filename, -1); 12 | 13 | cv::Mat diff = abs(reference - test); 14 | 15 | cv::Mat diffSingleChannel = diff.reshape(1, 0); //convert to 1 channel, same # rows 16 | 17 | double minVal, maxVal; 18 | 19 | cv::minMaxLoc(diffSingleChannel, &minVal, &maxVal, NULL, NULL); //NULL because we don't care about location 20 | 21 | //now perform transform so that we bump values to the full range 22 | 23 | diffSingleChannel = (diffSingleChannel - minVal) * (255. / (maxVal - minVal)); 24 | 25 | diff = diffSingleChannel.reshape(reference.channels(), 0); 26 | 27 | cv::imwrite("HW1_differenceImage.png", diff); 28 | //OK, now we can start comparing values... 
29 | unsigned char *referencePtr = reference.ptr(0); 30 | unsigned char *testPtr = test.ptr(0); 31 | 32 | if (useEpsCheck) { 33 | checkResultsEps(referencePtr, testPtr, reference.rows * reference.cols * reference.channels(), perPixelError, globalError); 34 | } 35 | else 36 | { 37 | checkResultsExact(referencePtr, testPtr, reference.rows * reference.cols * reference.channels()); 38 | } 39 | 40 | std::cout << "PASS" << std::endl; 41 | return; 42 | } 43 | -------------------------------------------------------------------------------- /ProblemSet1-RGB2Gray/compare.h: -------------------------------------------------------------------------------- 1 | #ifndef COMPARE_H__ 2 | #define COMPARE_H__ 3 | 4 | void compareImages(std::string reference_filename, std::string test_filename, 5 | bool useEpsCheck, double perPixelError, double globalError); 6 | 7 | #endif 8 | -------------------------------------------------------------------------------- /ProblemSet1-RGB2Gray/main.cpp: -------------------------------------------------------------------------------- 1 | //Udacity HW1 Solution 2 | 3 | #include 4 | #include "timer.h" 5 | #include "utils.h" 6 | #include 7 | #include 8 | #include "reference_calc.h" 9 | #include "compare.h" 10 | 11 | void your_rgba_to_greyscale(const uchar4 * const h_rgbaImage, 12 | uchar4 * const d_rgbaImage, 13 | unsigned char* const d_greyImage, 14 | size_t numRows, size_t numCols); 15 | 16 | //include the definitions of the above functions for this homework 17 | #include "HW1.cpp" 18 | 19 | int main(int argc, char **argv) { 20 | uchar4 *h_rgbaImage, *d_rgbaImage; 21 | unsigned char *h_greyImage, *d_greyImage; 22 | 23 | std::string input_file; 24 | std::string output_file; 25 | std::string reference_file; 26 | double perPixelError = 0.0; 27 | double globalError = 0.0; 28 | bool useEpsCheck = false; 29 | switch (argc) 30 | { 31 | case 2: 32 | input_file = std::string(argv[1]); 33 | output_file = "HW1_output.png"; 34 | reference_file = 
"HW1_reference.png"; 35 | break; 36 | case 3: 37 | input_file = std::string(argv[1]); 38 | output_file = std::string(argv[2]); 39 | reference_file = "HW1_reference.png"; 40 | break; 41 | case 4: 42 | input_file = std::string(argv[1]); 43 | output_file = std::string(argv[2]); 44 | reference_file = std::string(argv[3]); 45 | break; 46 | case 6: 47 | useEpsCheck=true; 48 | input_file = std::string(argv[1]); 49 | output_file = std::string(argv[2]); 50 | reference_file = std::string(argv[3]); 51 | perPixelError = atof(argv[4]); 52 | globalError = atof(argv[5]); 53 | break; 54 | default: 55 | std::cerr << "Usage: ./HW1 input_file [output_filename] [reference_filename] [perPixelError] [globalError]" << std::endl; 56 | exit(1); 57 | } 58 | //load the image and give us our input and output pointers 59 | preProcess(&h_rgbaImage, &h_greyImage, &d_rgbaImage, &d_greyImage, input_file); 60 | 61 | GpuTimer timer; 62 | timer.Start(); 63 | //call the students' code 64 | your_rgba_to_greyscale(h_rgbaImage, d_rgbaImage, d_greyImage, numRows(), numCols()); 65 | timer.Stop(); 66 | cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError()); 67 | 68 | int err = printf("Your code ran in: %f msecs.\n", timer.Elapsed()); 69 | 70 | if (err < 0) { 71 | //Couldn't print! Probably the student closed stdout - bad news 72 | std::cerr << "Couldn't print timing information! STDOUT Closed!" 
<< std::endl; 73 | exit(1); 74 | } 75 | 76 | size_t numPixels = numRows()*numCols(); 77 | checkCudaErrors(cudaMemcpy(h_greyImage, d_greyImage, sizeof(unsigned char) * numPixels, cudaMemcpyDeviceToHost)); 78 | 79 | //check results and output the grey image 80 | postProcess(output_file, h_greyImage); 81 | 82 | referenceCalculation(h_rgbaImage, h_greyImage, numRows(), numCols()); 83 | 84 | postProcess(reference_file, h_greyImage); 85 | 86 | //generateReferenceImage(input_file, reference_file); 87 | compareImages(reference_file, output_file, useEpsCheck, perPixelError, 88 | globalError); 89 | 90 | cleanup(); 91 | 92 | return 0; 93 | } 94 | -------------------------------------------------------------------------------- /ProblemSet1-RGB2Gray/reference_calc.cpp: -------------------------------------------------------------------------------- 1 | // for uchar4 struct 2 | #include 3 | 4 | void referenceCalculation(const uchar4* const rgbaImage, 5 | unsigned char *const greyImage, 6 | size_t numRows, 7 | size_t numCols) 8 | { 9 | for (size_t r = 0; r < numRows; ++r) { 10 | for (size_t c = 0; c < numCols; ++c) { 11 | uchar4 rgba = rgbaImage[r * numCols + c]; 12 | float channelSum = .299f * rgba.x + .587f * rgba.y + .114f * rgba.z; 13 | greyImage[r * numCols + c] = channelSum; 14 | } 15 | } 16 | } 17 | 18 | -------------------------------------------------------------------------------- /ProblemSet1-RGB2Gray/reference_calc.h: -------------------------------------------------------------------------------- 1 | #ifndef REFERENCE_H__ 2 | #define REFERENCE_H__ 3 | 4 | void referenceCalculation(const uchar4* const rgbaImage, 5 | unsigned char *const greyImage, 6 | size_t numRows, 7 | size_t numCols); 8 | 9 | #endif -------------------------------------------------------------------------------- /ProblemSet1-RGB2Gray/student_func.cu: -------------------------------------------------------------------------------- 1 | // Homework 1 2 | // Color to Greyscale Conversion 3 | 4 | //A 
common way to represent color images is known as RGBA - the color 5 | //is specified by how much Red, Green and Blue is in it. 6 | //The 'A' stands for Alpha and is used for transparency; it will be 7 | //ignored in this homework. 8 | 9 | //Each channel (Red, Green, Blue and Alpha) is represented by one byte. 10 | //Since we are using one byte for each color there are 256 different 11 | //possible values for each color. This means we use 4 bytes per pixel. 12 | 13 | //Greyscale images are represented by a single intensity value per pixel 14 | //which is one byte in size. 15 | 16 | //To convert an image from color to greyscale, one simple method is to 17 | //set the intensity to the average of the RGB channels. But we will 18 | //use a more sophisticated method that takes into account how the eye 19 | //perceives color and weights the channels unequally. 20 | 21 | //The eye responds most strongly to green followed by red and then blue. 22 | //The NTSC (National Television System Committee) recommends the following 23 | //formula for color to greyscale conversion: 24 | 25 | //I = .299f * R + .587f * G + .114f * B 26 | 27 | //Notice the trailing f's on the numbers which indicate that they are 28 | //single precision floating point constants and not double precision 29 | //constants. 30 | 31 | //You should fill in the kernel as well as set the block and grid sizes 32 | //so that the entire image is processed.
33 | 34 | #include "utils.h" 35 | #include "device_launch_parameters.h" 36 | 37 | const size_t blockWidth = 32; //threads per block on one dimension (32*32 total) 38 | 39 | __global__ 40 | void rgba_to_greyscale(const uchar4* const rgbaImage, 41 | unsigned char* const greyImage, 42 | size_t numRows, size_t numCols) 43 | { 44 | //Fill in the kernel to convert from color to greyscale 45 | //the mapping from components of a uchar4 to RGBA is: 46 | // .x -> R ; .y -> G ; .z -> B ; .w -> A 47 | // 48 | //The output (greyImage) at each pixel should be the result of 49 | //applying the formula: output = .299f * R + .587f * G + .114f * B; 50 | //Note: We will be ignoring the alpha channel for this conversion 51 | 52 | //First create a mapping from the 2D block and grid locations 53 | //to an absolute 2D location in the image, then use that to 54 | //calculate a 1D offset 55 | size_t idx_x = threadIdx.x + blockIdx.x*blockDim.x; 56 | size_t idx_y = threadIdx.y + blockIdx.y*blockDim.y; 57 | 58 | if (idx_x >= numRows || idx_y >= numCols) return; //it can happen on the "remainder" block 59 | 60 | size_t idxvec = idx_x*numCols + idx_y; 61 | uchar4 rgb_value = rgbaImage[idxvec]; 62 | greyImage[idxvec] = (unsigned char)(.299f*rgb_value.x + .587f*rgb_value.y + .114f*rgb_value.z); 63 | } 64 | 65 | void your_rgba_to_greyscale(const uchar4 * const h_rgbaImage, uchar4 * const d_rgbaImage, 66 | unsigned char* const d_greyImage, size_t numRows, size_t numCols) 67 | { 68 | //You must fill in the correct sizes for the blockSize and gridSize 69 | //currently only one block with one thread is being launched 70 | 71 | const dim3 blockSize(blockWidth,blockWidth, 1); 72 | unsigned int numBlocksX = (unsigned int)(numRows / blockWidth + 1); 73 | unsigned int numBlocksY = (unsigned int)(numCols / blockWidth + 1); 74 | const dim3 gridSize(numBlocksX,numBlocksY, 1); 75 | rgba_to_greyscale<<<gridSize, blockSize>>>(d_rgbaImage, d_greyImage, numRows, numCols); 76 | 77 | cudaDeviceSynchronize();
checkCudaErrors(cudaGetLastError()); 78 | 79 | } 80 | -------------------------------------------------------------------------------- /ProblemSet1-RGB2Gray/timer.h: -------------------------------------------------------------------------------- 1 | #ifndef GPU_TIMER_H__ 2 | #define GPU_TIMER_H__ 3 | 4 | #include 5 | 6 | struct GpuTimer 7 | { 8 | cudaEvent_t start; 9 | cudaEvent_t stop; 10 | 11 | GpuTimer() 12 | { 13 | cudaEventCreate(&start); 14 | cudaEventCreate(&stop); 15 | } 16 | 17 | ~GpuTimer() 18 | { 19 | cudaEventDestroy(start); 20 | cudaEventDestroy(stop); 21 | } 22 | 23 | void Start() 24 | { 25 | cudaEventRecord(start, 0); 26 | } 27 | 28 | void Stop() 29 | { 30 | cudaEventRecord(stop, 0); 31 | } 32 | 33 | float Elapsed() 34 | { 35 | float elapsed; 36 | cudaEventSynchronize(stop); 37 | cudaEventElapsedTime(&elapsed, start, stop); 38 | return elapsed; 39 | } 40 | }; 41 | 42 | #endif /* GPU_TIMER_H__ */ 43 | -------------------------------------------------------------------------------- /ProblemSet1-RGB2Gray/utils.h: -------------------------------------------------------------------------------- 1 | #ifndef UTILS_H__ 2 | #define UTILS_H__ 3 | 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | 13 | #define checkCudaErrors(val) check( (val), #val, __FILE__, __LINE__) 14 | 15 | template<typename T> 16 | void check(T err, const char* const func, const char* const file, const int line) { 17 | if (err != cudaSuccess) { 18 | std::cerr << "CUDA error at: " << file << ":" << line << std::endl; 19 | std::cerr << cudaGetErrorString(err) << " " << func << std::endl; 20 | exit(1); 21 | } 22 | } 23 | 24 | template<typename T> 25 | void checkResultsExact(const T* const ref, const T* const gpu, size_t numElem) { 26 | //check that the GPU result matches the CPU result 27 | for (size_t i = 0; i < numElem; ++i) { 28 | if (ref[i] != gpu[i]) { 29 | std::cerr << "Difference at pos " << i << std::endl; 30 | //the + is magic to convert char to
int without messing 31 | //with other types 32 | std::cerr << "Reference: " << std::setprecision(17) << +ref[i] << 33 | "\nGPU : " << +gpu[i] << std::endl; 34 | exit(1); 35 | } 36 | } 37 | } 38 | 39 | template<typename T> 40 | void checkResultsEps(const T* const ref, const T* const gpu, size_t numElem, double eps1, double eps2) { 41 | assert(eps1 >= 0 && eps2 >= 0); 42 | unsigned long long totalDiff = 0; 43 | unsigned numSmallDifferences = 0; 44 | for (size_t i = 0; i < numElem; ++i) { 45 | //subtract smaller from larger in case of unsigned types 46 | T smaller = std::min(ref[i], gpu[i]); 47 | T larger = std::max(ref[i], gpu[i]); 48 | T diff = larger - smaller; 49 | if (diff > 0 && diff <= eps1) { 50 | numSmallDifferences++; 51 | } 52 | else if (diff > eps1) { 53 | std::cerr << "Difference at pos " << +i << " exceeds tolerance of " << eps1 << std::endl; 54 | std::cerr << "Reference: " << std::setprecision(17) << +ref[i] << 55 | "\nGPU : " << +gpu[i] << std::endl; 56 | exit(1); 57 | } 58 | totalDiff += diff * diff; 59 | } 60 | double percentSmallDifferences = (double)numSmallDifferences / (double)numElem; 61 | if (percentSmallDifferences > eps2) { 62 | std::cerr << "Total percentage of non-zero pixel difference between the two images exceeds " << 100.0 * eps2 << "%" << std::endl; 63 | std::cerr << "Percentage of non-zero pixel differences: " << 100.0 * percentSmallDifferences << "%" << std::endl; 64 | exit(1); 65 | } 66 | } 67 | 68 | //Uses the autodesk method of image comparison 69 | //Note that the tolerance here is in PIXELS not a percentage of input pixels 70 | template<typename T> 71 | void checkResultsAutodesk(const T* const ref, const T* const gpu, size_t numElem, double variance, size_t tolerance) 72 | { 73 | 74 | size_t numBadPixels = 0; 75 | for (size_t i = 0; i < numElem; ++i) { 76 | T smaller = std::min(ref[i], gpu[i]); 77 | T larger = std::max(ref[i], gpu[i]); 78 | T diff = larger - smaller; 79 | if (diff > variance) 80 | ++numBadPixels; 81 | } 82 | 83 | if (numBadPixels >
tolerance) { 84 | std::cerr << "Too many bad pixels in the image." << numBadPixels << "/" << tolerance << std::endl; 85 | exit(1); 86 | } 87 | } 88 | 89 | #endif 90 | -------------------------------------------------------------------------------- /ProblemSet2-Blur/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | ############################################################################ 2 | # CMakeLists.txt for OpenCV and CUDA. 3 | # 2012-02-07 4 | # Quan Tran Minh. edit by Johannes Kast, Michael Sarahan 5 | # quantm@unist.ac.kr kast.jo@googlemail.com msarahan@gmail.com 6 | ############################################################################ 7 | 8 | # collect source files 9 | 10 | file( GLOB hdr *.hpp *.h ) 11 | file( GLOB cu *.cu) 12 | SET (HW2_files main.cpp reference_calc.cpp compare.cpp) 13 | 14 | CUDA_ADD_EXECUTABLE(HW2 ${HW2_files} ${hdr} ${cu}) 15 | -------------------------------------------------------------------------------- /ProblemSet2-Blur/HW2.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include "utils.h" 5 | #include 6 | #include 7 | #include 8 | 9 | static cv::Mat imageInputRGBA; 10 | static cv::Mat imageOutputRGBA; 11 | 12 | static uchar4 *d_inputImageRGBA__; 13 | static uchar4 *d_outputImageRGBA__; 14 | 15 | static float *h_filter__; 16 | 17 | static size_t numRows() { return imageInputRGBA.rows; } 18 | static size_t numCols() { return imageInputRGBA.cols; } 19 | 20 | //return types are void since any internal error will be handled by quitting 21 | //no point in returning error codes... 
22 | //returns a pointer to an RGBA version of the input image 23 | //and a pointer to the single channel grey-scale output 24 | //on both the host and device 25 | static void preProcess(uchar4 **h_inputImageRGBA, uchar4 **h_outputImageRGBA, 26 | uchar4 **d_inputImageRGBA, uchar4 **d_outputImageRGBA, 27 | unsigned char **d_redBlurred, 28 | unsigned char **d_greenBlurred, 29 | unsigned char **d_blueBlurred, 30 | float **h_filter, int *filterWidth, 31 | const std::string &filename) { 32 | 33 | //make sure the context initializes ok 34 | checkCudaErrors(cudaFree(0)); 35 | 36 | cv::Mat image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR); 37 | if (image.empty()) { 38 | std::cerr << "Couldn't open file: " << filename << std::endl; 39 | exit(1); 40 | } 41 | 42 | cv::cvtColor(image, imageInputRGBA, CV_BGR2RGBA); 43 | 44 | //allocate memory for the output 45 | imageOutputRGBA.create(image.rows, image.cols, CV_8UC4); 46 | 47 | //This shouldn't ever happen given the way the images are created 48 | //at least based upon my limited understanding of OpenCV, but better to check 49 | if (!imageInputRGBA.isContinuous() || !imageOutputRGBA.isContinuous()) { 50 | std::cerr << "Images aren't continuous!! Exiting." 
<< std::endl; 51 | exit(1); 52 | } 53 | 54 | *h_inputImageRGBA = (uchar4 *)imageInputRGBA.ptr(0); 55 | *h_outputImageRGBA = (uchar4 *)imageOutputRGBA.ptr(0); 56 | 57 | const size_t numPixels = numRows() * numCols(); 58 | //allocate memory on the device for both input and output 59 | checkCudaErrors(cudaMalloc(d_inputImageRGBA, sizeof(uchar4) * numPixels)); 60 | checkCudaErrors(cudaMalloc(d_outputImageRGBA, sizeof(uchar4) * numPixels)); 61 | checkCudaErrors(cudaMemset(*d_outputImageRGBA, 0, numPixels * sizeof(uchar4))); //make sure no memory is left laying around 62 | 63 | //copy input array to the GPU 64 | checkCudaErrors(cudaMemcpy(*d_inputImageRGBA, *h_inputImageRGBA, sizeof(uchar4) * numPixels, cudaMemcpyHostToDevice)); 65 | 66 | d_inputImageRGBA__ = *d_inputImageRGBA; 67 | d_outputImageRGBA__ = *d_outputImageRGBA; 68 | 69 | //now create the filter that they will use 70 | const int blurKernelWidth = 9; 71 | const float blurKernelSigma = 2.; 72 | 73 | *filterWidth = blurKernelWidth; 74 | 75 | //create and fill the filter we will convolve with 76 | *h_filter = new float[blurKernelWidth * blurKernelWidth]; 77 | h_filter__ = *h_filter; 78 | 79 | float filterSum = 0.f; //for normalization 80 | 81 | for (int r = -blurKernelWidth/2; r <= blurKernelWidth/2; ++r) { 82 | for (int c = -blurKernelWidth/2; c <= blurKernelWidth/2; ++c) { 83 | float filterValue = expf( -(float)(c * c + r * r) / (2.f * blurKernelSigma * blurKernelSigma)); 84 | (*h_filter)[(r + blurKernelWidth/2) * blurKernelWidth + c + blurKernelWidth/2] = filterValue; 85 | filterSum += filterValue; 86 | } 87 | } 88 | 89 | float normalizationFactor = 1.f / filterSum; 90 | 91 | for (int r = -blurKernelWidth/2; r <= blurKernelWidth/2; ++r) { 92 | for (int c = -blurKernelWidth/2; c <= blurKernelWidth/2; ++c) { 93 | (*h_filter)[(r + blurKernelWidth/2) * blurKernelWidth + c + blurKernelWidth/2] *= normalizationFactor; 94 | } 95 | } 96 | 97 | //blurred 98 | checkCudaErrors(cudaMalloc(d_redBlurred, sizeof(unsigned 
char) * numPixels)); 99 | checkCudaErrors(cudaMalloc(d_greenBlurred, sizeof(unsigned char) * numPixels)); 100 | checkCudaErrors(cudaMalloc(d_blueBlurred, sizeof(unsigned char) * numPixels)); 101 | checkCudaErrors(cudaMemset(*d_redBlurred, 0, sizeof(unsigned char) * numPixels)); 102 | checkCudaErrors(cudaMemset(*d_greenBlurred, 0, sizeof(unsigned char) * numPixels)); 103 | checkCudaErrors(cudaMemset(*d_blueBlurred, 0, sizeof(unsigned char) * numPixels)); 104 | } 105 | 106 | static void postProcess(const std::string& output_file, uchar4* data_ptr) { 107 | cv::Mat output(numRows(), numCols(), CV_8UC4, (void*)data_ptr); 108 | 109 | cv::Mat imageOutputBGR; 110 | cv::cvtColor(output, imageOutputBGR, CV_RGBA2BGR); 111 | //output the image 112 | cv::imwrite(output_file.c_str(), imageOutputBGR); 113 | } 114 | 115 | static void cleanUp(void) 116 | { 117 | cudaFree(d_inputImageRGBA__); 118 | cudaFree(d_outputImageRGBA__); 119 | delete[] h_filter__; 120 | } 121 | 122 | 123 | // An unused bit of code showing how to accomplish this assignment using OpenCV. It is much faster 124 | // than the naive implementation in reference_calc.cpp. 
125 | static void generateReferenceImage(std::string input_file, std::string reference_file, int kernel_size) 126 | { 127 | cv::Mat input = cv::imread(input_file); 128 | // Create an identical image for the output as a placeholder 129 | cv::Mat reference = cv::imread(input_file); 130 | cv::GaussianBlur(input, reference, cv::Size2i(kernel_size, kernel_size),0); 131 | cv::imwrite(reference_file, reference); 132 | } 133 | -------------------------------------------------------------------------------- /ProblemSet2-Blur/HW2_differenceImage.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet2-Blur/HW2_differenceImage.png -------------------------------------------------------------------------------- /ProblemSet2-Blur/HW2_reference.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet2-Blur/HW2_reference.png -------------------------------------------------------------------------------- /ProblemSet2-Blur/Makefile: -------------------------------------------------------------------------------- 1 | NVCC=nvcc 2 | 3 | ################################### 4 | # These are the default install # 5 | # locations on most linux distros # 6 | ################################### 7 | 8 | OPENCV_LIBPATH=/usr/lib 9 | OPENCV_INCLUDEPATH=/usr/include 10 | 11 | ################################################### 12 | # On Macs the default install locations are below # 13 | ################################################### 14 | 15 | #OPENCV_LIBPATH=/usr/local/lib 16 | #OPENCV_INCLUDEPATH=/usr/local/include 17 | 18 | OPENCV_LIBS=-lopencv_core -lopencv_imgproc -lopencv_highgui 19 | CUDA_INCLUDEPATH=/usr/local/cuda-5.0/include 20 | 21 | 
###################################################### 22 | # On Macs the default install locations are below # 23 | # #################################################### 24 | 25 | #CUDA_INCLUDEPATH=/usr/local/cuda/include 26 | #CUDA_LIBPATH=/usr/local/cuda/lib 27 | 28 | NVCC_OPTS=-O3 -arch=sm_20 -Xcompiler -Wall -Xcompiler -Wextra -m64 29 | 30 | GCC_OPTS=-O3 -Wall -Wextra -m64 31 | 32 | student: main.o student_func.o compare.o reference_calc.o Makefile 33 | $(NVCC) -o HW2 main.o student_func.o compare.o reference_calc.o -L $(OPENCV_LIBPATH) $(OPENCV_LIBS) $(NVCC_OPTS) 34 | 35 | main.o: main.cpp timer.h utils.h HW2.cpp 36 | g++ -c main.cpp $(GCC_OPTS) -I $(OPENCV_INCLUDEPATH) -I $(CUDA_INCLUDEPATH) 37 | 38 | student_func.o: student_func.cu reference_calc.cpp utils.h 39 | nvcc -c student_func.cu $(NVCC_OPTS) 40 | 41 | compare.o: compare.cpp compare.h 42 | g++ -c compare.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH) 43 | 44 | reference_calc.o: reference_calc.cpp reference_calc.h 45 | g++ -c reference_calc.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH) 46 | 47 | clean: 48 | rm -f *.o *.png hw 49 | -------------------------------------------------------------------------------- /ProblemSet2-Blur/cinque_terre.gold: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet2-Blur/cinque_terre.gold -------------------------------------------------------------------------------- /ProblemSet2-Blur/cinque_terre_blur.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet2-Blur/cinque_terre_blur.jpg -------------------------------------------------------------------------------- /ProblemSet2-Blur/cinque_terre_small.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet2-Blur/cinque_terre_small.jpg -------------------------------------------------------------------------------- /ProblemSet2-Blur/compare.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | #include "utils.h" 6 | 7 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck, 8 | double perPixelError, double globalError) 9 | { 10 | cv::Mat reference = cv::imread(reference_filename, -1); 11 | cv::Mat test = cv::imread(test_filename, -1); 12 | 13 | cv::Mat diff = abs(reference - test); 14 | 15 | cv::Mat diffSingleChannel = diff.reshape(1, 0); //convert to 1 channel, same # rows 16 | 17 | double minVal, maxVal; 18 | 19 | cv::minMaxLoc(diffSingleChannel, &minVal, &maxVal, NULL, NULL); //NULL because we don't care about location 20 | 21 | //now perform transform so that we bump values to the full range 22 | 23 | diffSingleChannel = (diffSingleChannel - minVal) * (255. / (maxVal - minVal)); 24 | 25 | diff = diffSingleChannel.reshape(reference.channels(), 0); 26 | 27 | cv::imwrite("HW2_differenceImage.png", diff); 28 | //OK, now we can start comparing values... 
29 | unsigned char *referencePtr = reference.ptr(0); 30 | unsigned char *testPtr = test.ptr(0); 31 | 32 | if (useEpsCheck) { 33 | checkResultsEps(referencePtr, testPtr, reference.rows * reference.cols * reference.channels(), perPixelError, globalError); 34 | } 35 | else 36 | { 37 | checkResultsExact(referencePtr, testPtr, reference.rows * reference.cols * reference.channels()); 38 | } 39 | 40 | std::cout << "PASS" << std::endl; 41 | return; 42 | } -------------------------------------------------------------------------------- /ProblemSet2-Blur/compare.h: -------------------------------------------------------------------------------- 1 | #ifndef COMPARE_H__ 2 | #define COMPARE_H__ 3 | 4 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck, 5 | double perPixelError, double globalError); 6 | 7 | #endif -------------------------------------------------------------------------------- /ProblemSet2-Blur/main.cpp: -------------------------------------------------------------------------------- 1 | //Udacity HW2 Driver 2 | 3 | #include 4 | #include "timer.h" 5 | #include "utils.h" 6 | #include 7 | #include 8 | 9 | #include "reference_calc.h" 10 | #include "compare.h" 11 | 12 | //include the definitions of the above functions for this homework 13 | #include "HW2.cpp" 14 | 15 | 16 | /******* DEFINED IN student_func.cu *********/ 17 | 18 | void your_gaussian_blur(const uchar4 * const h_inputImageRGBA, uchar4 * const d_inputImageRGBA, 19 | uchar4* const d_outputImageRGBA, 20 | const size_t numRows, const size_t numCols, 21 | unsigned char *d_redBlurred, 22 | unsigned char *d_greenBlurred, 23 | unsigned char *d_blueBlurred, 24 | const int filterWidth); 25 | 26 | void allocateMemoryAndCopyToGPU(const size_t numRowsImage, const size_t numColsImage, 27 | const float* const h_filter, const size_t filterWidth); 28 | 29 | 30 | /******* Begin main *********/ 31 | 32 | int main(int argc, char **argv) { 33 | uchar4 *h_inputImageRGBA, 
*d_inputImageRGBA; 34 | uchar4 *h_outputImageRGBA, *d_outputImageRGBA; 35 | unsigned char *d_redBlurred, *d_greenBlurred, *d_blueBlurred; 36 | 37 | float *h_filter; 38 | int filterWidth; 39 | 40 | std::string input_file; 41 | std::string output_file; 42 | std::string reference_file; 43 | double perPixelError = 0.0; 44 | double globalError = 0.0; 45 | bool useEpsCheck = false; 46 | switch (argc) 47 | { 48 | case 2: 49 | input_file = std::string(argv[1]); 50 | output_file = "HW2_output.png"; 51 | reference_file = "HW2_reference.png"; 52 | break; 53 | case 3: 54 | input_file = std::string(argv[1]); 55 | output_file = std::string(argv[2]); 56 | reference_file = "HW2_reference.png"; 57 | break; 58 | case 4: 59 | input_file = std::string(argv[1]); 60 | output_file = std::string(argv[2]); 61 | reference_file = std::string(argv[3]); 62 | break; 63 | case 6: 64 | useEpsCheck=true; 65 | input_file = std::string(argv[1]); 66 | output_file = std::string(argv[2]); 67 | reference_file = std::string(argv[3]); 68 | perPixelError = atof(argv[4]); 69 | globalError = atof(argv[5]); 70 | break; 71 | default: 72 | std::cerr << "Usage: ./HW2 input_file [output_filename] [reference_filename] [perPixelError] [globalError]" << std::endl; 73 | exit(1); 74 | } 75 | //load the image and give us our input and output pointers 76 | preProcess(&h_inputImageRGBA, &h_outputImageRGBA, &d_inputImageRGBA, &d_outputImageRGBA, 77 | &d_redBlurred, &d_greenBlurred, &d_blueBlurred, 78 | &h_filter, &filterWidth, input_file); 79 | 80 | allocateMemoryAndCopyToGPU(numRows(), numCols(), h_filter, filterWidth); 81 | GpuTimer timer; 82 | timer.Start(); 83 | //call the students' code 84 | your_gaussian_blur(h_inputImageRGBA, d_inputImageRGBA, d_outputImageRGBA, numRows(), numCols(), 85 | d_redBlurred, d_greenBlurred, d_blueBlurred, filterWidth); 86 | timer.Stop(); 87 | cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError()); 88 | int err = printf("Your code ran in: %f msecs.\n", timer.Elapsed()); 89 | 90 | if 
(err < 0) { 91 | //Couldn't print! Probably the student closed stdout - bad news 92 | std::cerr << "Couldn't print timing information! STDOUT Closed!" << std::endl; 93 | exit(1); 94 | } 95 | 96 | //check results and output the blurred image 97 | 98 | size_t numPixels = numRows()*numCols(); 99 | //copy the output back to the host 100 | checkCudaErrors(cudaMemcpy(h_outputImageRGBA, d_outputImageRGBA__, sizeof(uchar4) * numPixels, cudaMemcpyDeviceToHost)); 101 | 102 | postProcess(output_file, h_outputImageRGBA); 103 | 104 | referenceCalculation(h_inputImageRGBA, h_outputImageRGBA, 105 | numRows(), numCols(), 106 | h_filter, filterWidth); 107 | 108 | postProcess(reference_file, h_outputImageRGBA); 109 | 110 | // Cheater easy way with OpenCV 111 | //generateReferenceImage(input_file, reference_file, filterWidth); 112 | 113 | compareImages(reference_file, output_file, useEpsCheck, perPixelError, globalError); 114 | 115 | checkCudaErrors(cudaFree(d_redBlurred)); 116 | checkCudaErrors(cudaFree(d_greenBlurred)); 117 | checkCudaErrors(cudaFree(d_blueBlurred)); 118 | 119 | cleanUp(); 120 | 121 | return 0; 122 | } 123 | -------------------------------------------------------------------------------- /ProblemSet2-Blur/reference_calc.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | // for uchar4 struct 4 | #include 5 | 6 | void channelConvolution(const unsigned char* const channel, 7 | unsigned char* const channelBlurred, 8 | const size_t numRows, const size_t numCols, 9 | const float *filter, const int filterWidth) 10 | { 11 | //Dealing with an even width filter is trickier 12 | assert(filterWidth % 2 == 1); 13 | 14 | //For every pixel in the image 15 | for (int r = 0; r < (int)numRows; ++r) { 16 | for (int c = 0; c < (int)numCols; ++c) { 17 | float result = 0.f; 18 | //For every value in the filter around the pixel (c, r) 19 | for (int filter_r = -filterWidth/2; filter_r <= filterWidth/2; ++filter_r) { 20 | for 
(int filter_c = -filterWidth/2; filter_c <= filterWidth/2; ++filter_c) { 21 | //Find the global image position for this filter position 22 | //clamp to boundary of the image 23 | int image_r = std::min(std::max(r + filter_r, 0), static_cast<int>(numRows - 1)); 24 | int image_c = std::min(std::max(c + filter_c, 0), static_cast<int>(numCols - 1)); 25 | 26 | float image_value = static_cast<float>(channel[image_r * numCols + image_c]); 27 | float filter_value = filter[(filter_r + filterWidth/2) * filterWidth + filter_c + filterWidth/2]; 28 | 29 | result += image_value * filter_value; 30 | } 31 | } 32 | 33 | channelBlurred[r * numCols + c] = result; 34 | } 35 | } 36 | } 37 | 38 | void referenceCalculation(const uchar4* const rgbaImage, uchar4 *const outputImage, 39 | size_t numRows, size_t numCols, 40 | const float* const filter, const int filterWidth) 41 | { 42 | unsigned char *red = new unsigned char[numRows * numCols]; 43 | unsigned char *blue = new unsigned char[numRows * numCols]; 44 | unsigned char *green = new unsigned char[numRows * numCols]; 45 | 46 | unsigned char *redBlurred = new unsigned char[numRows * numCols]; 47 | unsigned char *blueBlurred = new unsigned char[numRows * numCols]; 48 | unsigned char *greenBlurred = new unsigned char[numRows * numCols]; 49 | 50 | //First we separate the incoming RGBA image into three separate channels 51 | //for Red, Green and Blue 52 | for (size_t i = 0; i < numRows * numCols; ++i) { 53 | uchar4 rgba = rgbaImage[i]; 54 | red[i] = rgba.x; 55 | green[i] = rgba.y; 56 | blue[i] = rgba.z; 57 | } 58 | 59 | //Now we can do the convolution for each of the color channels 60 | channelConvolution(red, redBlurred, numRows, numCols, filter, filterWidth); 61 | channelConvolution(green, greenBlurred, numRows, numCols, filter, filterWidth); 62 | channelConvolution(blue, blueBlurred, numRows, numCols, filter, filterWidth); 63 | 64 | //now recombine into the output image - Alpha is 255 for no transparency 65 | for (size_t i = 0; i < numRows * numCols; ++i)
{ 66 | uchar4 rgba = make_uchar4(redBlurred[i], greenBlurred[i], blueBlurred[i], 255); 67 | outputImage[i] = rgba; 68 | } 69 | 70 | delete[] red; 71 | delete[] green; 72 | delete[] blue; 73 | 74 | delete[] redBlurred; 75 | delete[] greenBlurred; 76 | delete[] blueBlurred; 77 | } 78 | -------------------------------------------------------------------------------- /ProblemSet2-Blur/reference_calc.h: -------------------------------------------------------------------------------- 1 | #ifndef REFERENCE_H__ 2 | #define REFERENCE_H__ 3 | 4 | void referenceCalculation(const uchar4* const rgbaImage, uchar4 *const outputImage, 5 | size_t numRows, size_t numCols, 6 | const float* const filter, const int filterWidth); 7 | 8 | #endif -------------------------------------------------------------------------------- /ProblemSet2-Blur/timer.h: -------------------------------------------------------------------------------- 1 | #ifndef GPU_TIMER_H__ 2 | #define GPU_TIMER_H__ 3 | 4 | #include 5 | 6 | struct GpuTimer 7 | { 8 | cudaEvent_t start; 9 | cudaEvent_t stop; 10 | 11 | GpuTimer() 12 | { 13 | cudaEventCreate(&start); 14 | cudaEventCreate(&stop); 15 | } 16 | 17 | ~GpuTimer() 18 | { 19 | cudaEventDestroy(start); 20 | cudaEventDestroy(stop); 21 | } 22 | 23 | void Start() 24 | { 25 | cudaEventRecord(start, 0); 26 | } 27 | 28 | void Stop() 29 | { 30 | cudaEventRecord(stop, 0); 31 | } 32 | 33 | float Elapsed() 34 | { 35 | float elapsed; 36 | cudaEventSynchronize(stop); 37 | cudaEventElapsedTime(&elapsed, start, stop); 38 | return elapsed; 39 | } 40 | }; 41 | 42 | #endif /* GPU_TIMER_H__ */ 43 | -------------------------------------------------------------------------------- /ProblemSet2-Blur/utils.h: -------------------------------------------------------------------------------- 1 | #ifndef UTILS_H__ 2 | #define UTILS_H__ 3 | 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | 12 | #define checkCudaErrors(val) check( (val), #val, 
__FILE__, __LINE__) 13 | 14 | template<typename T> 15 | void check(T err, const char* const func, const char* const file, const int line) { 16 | if (err != cudaSuccess) { 17 | std::cerr << "CUDA error at: " << file << ":" << line << std::endl; 18 | std::cerr << cudaGetErrorString(err) << " " << func << std::endl; 19 | exit(1); 20 | } 21 | } 22 | 23 | template<typename T> 24 | void checkResultsExact(const T* const ref, const T* const gpu, size_t numElem) { 25 | //check that the GPU result matches the CPU result 26 | for (size_t i = 0; i < numElem; ++i) { 27 | if (ref[i] != gpu[i]) { 28 | std::cerr << "Difference at pos " << i << std::endl; 29 | //the + is magic to convert char to int without messing 30 | //with other types 31 | std::cerr << "Reference: " << std::setprecision(17) << +ref[i] << 32 | "\nGPU : " << +gpu[i] << std::endl; 33 | exit(1); 34 | } 35 | } 36 | } 37 | 38 | template<typename T> 39 | void checkResultsEps(const T* const ref, const T* const gpu, size_t numElem, double eps1, double eps2) { 40 | assert(eps1 >= 0 && eps2 >= 0); 41 | unsigned long long totalDiff = 0; 42 | unsigned numSmallDifferences = 0; 43 | for (size_t i = 0; i < numElem; ++i) { 44 | //subtract smaller from larger in case of unsigned types 45 | T smaller = std::min(ref[i], gpu[i]); 46 | T larger = std::max(ref[i], gpu[i]); 47 | T diff = larger - smaller; 48 | if (diff > 0 && diff <= eps1) { 49 | numSmallDifferences++; 50 | } 51 | else if (diff > eps1) { 52 | std::cerr << "Difference at pos " << +i << " exceeds tolerance of " << eps1 << std::endl; 53 | std::cerr << "Reference: " << std::setprecision(17) << +ref[i] << 54 | "\nGPU : " << +gpu[i] << std::endl; 55 | exit(1); 56 | } 57 | totalDiff += diff * diff; 58 | } 59 | double percentSmallDifferences = (double)numSmallDifferences / (double)numElem; 60 | if (percentSmallDifferences > eps2) { 61 | std::cerr << "Total percentage of non-zero pixel difference between the two images exceeds " << 100.0 * eps2 << "%" << std::endl; 62 | std::cerr << "Percentage of non-zero pixel
differences: " << 100.0 * percentSmallDifferences << "%" << std::endl; 63 | exit(1); 64 | } 65 | } 66 | 67 | //Uses the Autodesk method of image comparison 68 | //Note that the tolerance here is in PIXELS, not a percentage of input pixels 69 | template<typename T> 70 | void checkResultsAutodesk(const T* const ref, const T* const gpu, size_t numElem, double variance, size_t tolerance) 71 | { 72 | 73 | size_t numBadPixels = 0; 74 | for (size_t i = 0; i < numElem; ++i) { 75 | T smaller = std::min(ref[i], gpu[i]); 76 | T larger = std::max(ref[i], gpu[i]); 77 | T diff = larger - smaller; 78 | if (diff > variance) 79 | ++numBadPixels; 80 | } 81 | 82 | if (numBadPixels > tolerance) { 83 | std::cerr << "Too many bad pixels in the image: " << numBadPixels << "/" << tolerance << std::endl; 84 | exit(1); 85 | } 86 | } 87 | 88 | #endif 89 | -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | ############################################################################ 2 | # CMakeLists.txt for OpenCV and CUDA. 3 | # 2012-02-07 4 | # Quan Tran Minh.
edit by Johannes Kast, Michael Sarahan 5 | # quantm@unist.ac.kr kast.jo@googlemail.com msarahan@gmail.com 6 | ############################################################################ 7 | # minimum required cmake version 8 | cmake_minimum_required(VERSION 2.8) 9 | find_package(CUDA QUIET REQUIRED) 10 | 11 | SET (compare_files compare.cpp) 12 | 13 | file( GLOB hdr *.hpp *.h ) 14 | file( GLOB cu *.cu) 15 | SET (HW3_files main.cpp loadSaveImage.cpp reference_calc.cpp compare.cpp) 16 | 17 | CUDA_ADD_EXECUTABLE(HW3 ${HW3_files} ${hdr} ${cu}) 18 | -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/HDR-image.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/HDR-image.jpg -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/HDR-image_mapped.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/HDR-image_mapped.png -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/HW3_differenceImage.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/HW3_differenceImage.png -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/HW3_reference.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/HW3_reference.png -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/HW3_reference_old.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/HW3_reference_old.png -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/Makefile: -------------------------------------------------------------------------------- 1 | NVCC=nvcc 2 | 3 | ################################### 4 | # These are the default install # 5 | # locations on most linux distros # 6 | ################################### 7 | 8 | OPENCV_LIBPATH=/usr/lib 9 | OPENCV_INCLUDEPATH=/usr/include 10 | 11 | ################################################### 12 | # On Macs the default install locations are below # 13 | ################################################### 14 | 15 | #OPENCV_LIBPATH=/usr/local/lib 16 | #OPENCV_INCLUDEPATH=/usr/local/include 17 | 18 | OPENCV_LIBS=-lopencv_core -lopencv_imgproc -lopencv_highgui 19 | 20 | CUDA_INCLUDEPATH=/usr/local/cuda-5.0/include 21 | 22 | ###################################################### 23 | # On Macs the default install locations are below # 24 | # #################################################### 25 | 26 | #CUDA_INCLUDEPATH=/usr/local/cuda/include 27 | #CUDA_LIBPATH=/usr/local/cuda/lib 28 | 29 | NVCC_OPTS=-O3 -arch=sm_20 -Xcompiler -Wall -Xcompiler -Wextra -m64 30 | 31 | GCC_OPTS=-O3 -Wall -Wextra -m64 32 | 33 | student: main.o student_func.o HW3.o loadSaveImage.o compare.o reference_calc.o Makefile 34 | $(NVCC) -o HW3 main.o student_func.o HW3.o loadSaveImage.o compare.o reference_calc.o -L $(OPENCV_LIBPATH) $(OPENCV_LIBS) $(NVCC_OPTS) 35 
| 36 | main.o: main.cpp timer.h utils.h reference_calc.h compare.h 37 | g++ -c main.cpp $(GCC_OPTS) -I $(CUDA_INCLUDEPATH) 38 | 39 | HW3.o: HW3.cu loadSaveImage.h utils.h 40 | $(NVCC) -c HW3.cu -I $(OPENCV_INCLUDEPATH) $(NVCC_OPTS) 41 | 42 | loadSaveImage.o: loadSaveImage.cpp loadSaveImage.h 43 | g++ -c loadSaveImage.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH) 44 | 45 | compare.o: compare.cpp compare.h 46 | g++ -c compare.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH) 47 | 48 | reference_calc.o: reference_calc.cpp reference_calc.h 49 | g++ -c reference_calc.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH) 50 | 51 | student_func.o: student_func.cu utils.h 52 | $(NVCC) -c student_func.cu $(NVCC_OPTS) 53 | 54 | clean: 55 | rm -f *.o hw 56 | find . -type f -name '*.exr' | grep -v memorial | xargs rm -f 57 | -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/ProblemSet3-ToneMapping.vcxproj: -------------------------------------------------------------------------------- 1 |  2 | 3 | 4 | 5 | Debug 6 | Win32 7 | 8 | 9 | Debug 10 | x64 11 | 12 | 13 | Release 14 | Win32 15 | 16 | 17 | Release 18 | x64 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | {EF9EF6B2-0414-41ED-86D3-DBF8A4799BA9} 40 | ProblemSet3_ToneMapping 41 | 42 | 43 | 44 | Application 45 | true 46 | MultiByte 47 | v140 48 | 49 | 50 | Application 51 | true 52 | MultiByte 53 | v140 54 | 55 | 56 | Application 57 | false 58 | true 59 | MultiByte 60 | v140 61 | 62 | 63 | Application 64 | false 65 | true 66 | MultiByte 67 | v140 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | true 88 | 89 | 90 | true 91 | 92 | 93 | 94 | Level3 95 | Disabled 96 | WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions) 97 | 98 | 99 | true 100 | Console 101 | 
cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies) 102 | 103 | 104 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 105 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 106 | 107 | 108 | 109 | 110 | Level3 111 | Disabled 112 | WIN32;WIN64;_DEBUG;_CONSOLE;%(PreprocessorDefinitions) 113 | 114 | 115 | true 116 | Console 117 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies) 118 | 119 | 120 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 121 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 122 | 123 | 124 | 64 125 | 126 | 127 | 128 | 129 | Level3 130 | MaxSpeed 131 | true 132 | true 133 | WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions) 134 | 135 | 136 | true 137 | true 138 | true 139 | Console 140 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies) 141 | 142 | 143 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 144 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 145 | 146 | 147 | 148 | 149 | Level3 150 | MaxSpeed 151 | true 152 | true 153 | WIN32;WIN64;NDEBUG;_CONSOLE;%(PreprocessorDefinitions) 154 | %(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);C:\opencv\build\include;C:\opencv\build\include\opencv2 155 | 156 | 157 | true 158 | true 159 | true 160 | Console 161 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;opencv_world320d.lib;%(AdditionalDependencies) 162 | %(AdditionalLibraryDirectories);$(CudaToolkitLibDir);C:\opencv\build\x64\vc14\lib 163 | 164 | 165 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 166 | copy 
"$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 167 | 168 | 169 | 64 170 | compute_61,sm_61 171 | 172 | 173 | 174 | 175 | 176 | 177 | -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/compare.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include "utils.h" 3 | 4 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck, 5 | double perPixelError, double globalError) 6 | { 7 | cv::Mat reference = cv::imread(reference_filename, -1); 8 | cv::Mat test = cv::imread(test_filename, -1); 9 | 10 | cv::Mat diff = abs(reference - test); 11 | 12 | cv::Mat diffSingleChannel = diff.reshape(1, 0); //convert to 1 channel, same # rows 13 | 14 | double minVal, maxVal; 15 | 16 | cv::minMaxLoc(diffSingleChannel, &minVal, &maxVal, NULL, NULL); //NULL because we don't care about location 17 | 18 | //now perform transform so that we bump values to the full range 19 | 20 | diffSingleChannel = (diffSingleChannel - minVal) * (255. / (maxVal - minVal)); 21 | 22 | diff = diffSingleChannel.reshape(reference.channels(), 0); 23 | 24 | cv::imwrite("HW3_differenceImage.png", diff); 25 | //OK, now we can start comparing values... 
26 | unsigned char *referencePtr = reference.ptr<unsigned char>(0); 27 | unsigned char *testPtr = test.ptr<unsigned char>(0); 28 | 29 | if (useEpsCheck) { 30 | checkResultsEps(referencePtr, testPtr, reference.rows * reference.cols * reference.channels(), perPixelError, globalError); 31 | } 32 | else 33 | { 34 | checkResultsExact(referencePtr, testPtr, reference.rows * reference.cols * reference.channels()); 35 | } 36 | 37 | std::cout << "PASS" << std::endl; 38 | return; 39 | } 40 | -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/compare.h: -------------------------------------------------------------------------------- 1 | #ifndef HW3_H__ 2 | #define HW3_H__ 3 | 4 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck, 5 | double perPixelError, double globalError); 6 | 7 | #endif 8 | -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/input.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/input.png -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/loadSaveImage.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include 6 | #include "cuda_runtime.h" 7 | 8 | //The caller becomes responsible for the returned pointer. This 9 | //is done in the interest of keeping this code as simple as possible. 10 | //In production code this is a bad idea - we should use RAII 11 | //to ensure the memory is freed. DO NOT COPY THIS AND USE IN PRODUCTION 12 | //CODE!!!
13 | void loadImageHDR(const std::string &filename, 14 | float **imagePtr, 15 | size_t *numRows, size_t *numCols) 16 | { 17 | cv::Mat originImg = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR | CV_LOAD_IMAGE_ANYDEPTH); 18 | 19 | cv::Mat image; 20 | 21 | if(originImg.type() != CV_32FC3){ 22 | originImg.convertTo(image,CV_32FC3); 23 | } else{ 24 | image = originImg; 25 | } 26 | 27 | if (image.empty()) { 28 | std::cerr << "Couldn't open file: " << filename << std::endl; 29 | exit(1); 30 | } 31 | 32 | if (image.channels() != 3) { 33 | std::cerr << "Image must be color!" << std::endl; 34 | exit(1); 35 | } 36 | 37 | if (!image.isContinuous()) { 38 | std::cerr << "Image isn't continuous!" << std::endl; 39 | exit(1); 40 | } 41 | 42 | *imagePtr = new float[image.rows * image.cols * image.channels()]; 43 | 44 | float *cvPtr = image.ptr(0); 45 | for (size_t i = 0; i < image.rows * image.cols * image.channels(); ++i) 46 | (*imagePtr)[i] = cvPtr[i]; 47 | 48 | *numRows = image.rows; 49 | *numCols = image.cols; 50 | } 51 | 52 | void loadImageRGBA(const std::string &filename, 53 | uchar4 **imagePtr, 54 | size_t *numRows, size_t *numCols) 55 | { 56 | cv::Mat image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR); 57 | if (image.empty()) { 58 | std::cerr << "Couldn't open file: " << filename << std::endl; 59 | exit(1); 60 | } 61 | 62 | if (image.channels() != 3) { 63 | std::cerr << "Image must be color!" << std::endl; 64 | exit(1); 65 | } 66 | 67 | if (!image.isContinuous()) { 68 | std::cerr << "Image isn't continuous!" 
<< std::endl; 69 | exit(1); 70 | } 71 | 72 | cv::Mat imageRGBA; 73 | cv::cvtColor(image, imageRGBA, CV_BGR2RGBA); 74 | 75 | *imagePtr = new uchar4[image.rows * image.cols]; 76 | 77 | unsigned char *cvPtr = imageRGBA.ptr(0); 78 | for (size_t i = 0; i < image.rows * image.cols; ++i) { 79 | (*imagePtr)[i].x = cvPtr[4 * i + 0]; 80 | (*imagePtr)[i].y = cvPtr[4 * i + 1]; 81 | (*imagePtr)[i].z = cvPtr[4 * i + 2]; 82 | (*imagePtr)[i].w = cvPtr[4 * i + 3]; 83 | } 84 | 85 | *numRows = image.rows; 86 | *numCols = image.cols; 87 | } 88 | 89 | void saveImageRGBA(const uchar4* const image, 90 | const size_t numRows, const size_t numCols, 91 | const std::string &output_file) 92 | { 93 | int sizes[2]; 94 | sizes[0] = numRows; 95 | sizes[1] = numCols; 96 | cv::Mat imageRGBA(2, sizes, CV_8UC4, (void *)image); 97 | cv::Mat imageOutputBGR; 98 | cv::cvtColor(imageRGBA, imageOutputBGR, CV_RGBA2BGR); 99 | //output the image 100 | cv::imwrite(output_file.c_str(), imageOutputBGR); 101 | } 102 | 103 | //output an exr file 104 | //assumed to already be BGR 105 | void saveImageHDR(const float* const image, 106 | const size_t numRows, const size_t numCols, 107 | const std::string &output_file) 108 | { 109 | int sizes[2]; 110 | sizes[0] = numRows; 111 | sizes[1] = numCols; 112 | 113 | cv::Mat imageHDR(2, sizes, CV_32FC3, (void *)image); 114 | 115 | imageHDR = imageHDR * 255; 116 | 117 | cv::imwrite(output_file.c_str(), imageHDR); 118 | } 119 | -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/loadSaveImage.h: -------------------------------------------------------------------------------- 1 | #ifndef LOADSAVEIMAGE_H__ 2 | #define LOADSAVEIMAGE_H__ 3 | 4 | #include 5 | #include //for uchar4 6 | 7 | void loadImageHDR(const std::string &filename, 8 | float **imagePtr, 9 | size_t *numRows, size_t *numCols); 10 | 11 | void loadImageRGBA(const std::string &filename, 12 | uchar4 **imagePtr, 13 | size_t *numRows, size_t *numCols); 14 | 15 | void 
saveImageRGBA(const uchar4* const image, 16 | const size_t numRows, const size_t numCols, 17 | const std::string &output_file); 18 | 19 | void saveImageHDR(const float* const image, 20 | const size_t numRows, const size_t numCols, 21 | const std::string &output_file); 22 | 23 | #endif 24 | -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/main.cpp: -------------------------------------------------------------------------------- 1 | //Udacity HW3 Driver 2 | 3 | #include 4 | #include "timer.h" 5 | #include "utils.h" 6 | #include 7 | #include 8 | #include 9 | 10 | #include "compare.h" 11 | #include "reference_calc.h" 12 | 13 | // Functions from HW3.cu 14 | void preProcess(float **d_luminance, unsigned int **d_cdf, 15 | size_t *numRows, size_t *numCols, unsigned int *numBins, 16 | const std::string& filename); 17 | 18 | void postProcess(const std::string& output_file, size_t numRows, size_t numCols, 19 | float min_logLum, float max_logLum); 20 | 21 | void cleanupGlobalMemory(void); 22 | 23 | // Function from student_func.cu 24 | void your_histogram_and_prefixsum(const float* const d_luminance, 25 | unsigned int* const d_cdf, 26 | float &min_logLum, 27 | float &max_logLum, 28 | const size_t numRows, 29 | const size_t numCols, 30 | const size_t numBins); 31 | 32 | 33 | int main(int argc, char **argv) { 34 | float *d_luminance; 35 | unsigned int *d_cdf; 36 | 37 | size_t numRows, numCols; 38 | unsigned int numBins; 39 | 40 | std::string input_file; 41 | std::string output_file; 42 | std::string reference_file; 43 | double perPixelError = 0.0; 44 | double globalError = 0.0; 45 | bool useEpsCheck = false; 46 | 47 | switch (argc) 48 | { 49 | case 2: 50 | input_file = std::string(argv[1]); 51 | output_file = "HW3_output.png"; 52 | reference_file = "HW3_reference.png"; 53 | break; 54 | case 3: 55 | input_file = std::string(argv[1]); 56 | output_file = std::string(argv[2]); 57 | reference_file = "HW3_reference.png"; 58 | 
break; 59 | case 4: 60 | input_file = std::string(argv[1]); 61 | output_file = std::string(argv[2]); 62 | reference_file = std::string(argv[3]); 63 | break; 64 | case 6: 65 | useEpsCheck=true; 66 | input_file = std::string(argv[1]); 67 | output_file = std::string(argv[2]); 68 | reference_file = std::string(argv[3]); 69 | perPixelError = atof(argv[4]); 70 | globalError = atof(argv[5]); 71 | break; 72 | default: 73 | std::cerr << "Usage: ./HW3 input_file [output_filename] [reference_filename] [perPixelError] [globalError]" << std::endl; 74 | exit(1); 75 | } 76 | //load the image and give us our input and output pointers 77 | preProcess(&d_luminance, &d_cdf, 78 | &numRows, &numCols, &numBins, input_file); 79 | 80 | GpuTimer timer; 81 | float min_logLum, max_logLum; 82 | min_logLum = 0.f; 83 | max_logLum = 1.f; 84 | timer.Start(); 85 | //call the students' code 86 | your_histogram_and_prefixsum(d_luminance, d_cdf, min_logLum, max_logLum, 87 | numRows, numCols, numBins); 88 | timer.Stop(); 89 | cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError()); 90 | int err = printf("Your code ran in: %f msecs.\n", timer.Elapsed()); 91 | 92 | if (err < 0) { 93 | //Couldn't print! Probably the student closed stdout - bad news 94 | std::cerr << "Couldn't print timing information! STDOUT Closed!" 
<< std::endl; 95 | exit(1); 96 | } 97 | 98 | float *h_luminance = (float *) malloc(sizeof(float)*numRows*numCols); 99 | unsigned int *h_cdf = (unsigned int *) malloc(sizeof(unsigned int)*numBins); 100 | 101 | checkCudaErrors(cudaMemcpy(h_luminance, d_luminance, numRows*numCols*sizeof(float), cudaMemcpyDeviceToHost)); 102 | 103 | //check results and output the tone-mapped image 104 | postProcess(output_file, numRows, numCols, min_logLum, max_logLum); 105 | 106 | for (size_t i = 1; i < numCols * numRows; ++i) { 107 | min_logLum = std::min(h_luminance[i], min_logLum); 108 | max_logLum = std::max(h_luminance[i], max_logLum); 109 | } 110 | 111 | referenceCalculation(h_luminance, h_cdf, numRows, numCols, numBins, min_logLum, max_logLum); 112 | 113 | checkCudaErrors(cudaMemcpy(d_cdf, h_cdf, sizeof(unsigned int) * numBins, cudaMemcpyHostToDevice)); 114 | 115 | //check results and output the tone-mapped image 116 | postProcess(reference_file, numRows, numCols, min_logLum, max_logLum); 117 | 118 | cleanupGlobalMemory(); 119 | 120 | compareImages(reference_file, output_file, useEpsCheck, perPixelError, globalError); 121 | 122 | return 0; 123 | } 124 | -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/memorial.exr: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial.exr -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/memorial_large.exr: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial_large.exr -------------------------------------------------------------------------------- 
/ProblemSet3-ToneMapping/memorial_png.gold: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial_png.gold -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/memorial_png_large.gold: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial_png_large.gold -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/memorial_raw.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial_raw.png -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/memorial_raw_large.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial_raw_large.png -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/memorial_raw_large_mapped.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial_raw_large_mapped.png -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/memorial_raw_mapped.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/memorial_raw_mapped.png -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/my_output.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet3-ToneMapping/my_output.png -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/reference_calc.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | void referenceCalculation(const float* const h_logLuminance, unsigned int* const h_cdf, 4 | const size_t numRows, const size_t numCols, const size_t numBins, 5 | float &logLumMin, float &logLumMax) 6 | { 7 | logLumMin = h_logLuminance[0]; 8 | logLumMax = h_logLuminance[0]; 9 | 10 | //Step 1 11 | //first we find the minimum and maximum across the entire image 12 | for (size_t i = 1; i < numCols * numRows; ++i) { 13 | logLumMin = std::min(h_logLuminance[i], logLumMin); 14 | logLumMax = std::max(h_logLuminance[i], logLumMax); 15 | } 16 | 17 | //Step 2 18 | float logLumRange = logLumMax - logLumMin; 19 | 20 | //Step 3 21 | //next we use the now known range to compute 22 | //a histogram of numBins bins 23 | unsigned int *histo = new unsigned int[numBins]; 24 | 25 | for (size_t i = 0; i < numBins; ++i) histo[i] = 0; 26 | 27 | for (size_t i = 0; i < numCols * numRows; ++i) { 28 | unsigned int bin = std::min(static_cast<unsigned int>(numBins - 1), 29 | static_cast<unsigned int>((h_logLuminance[i] - logLumMin) / logLumRange * numBins)); 30 | histo[bin]++; 31 | } 32 | 33 | //Step 4 34 | //finally we perform an exclusive scan (prefix sum) 35 | //on the histogram to get the
cumulative distribution 36 | h_cdf[0] = 0; 37 | for (size_t i = 1; i < numBins; ++i) { 38 | h_cdf[i] = h_cdf[i - 1] + histo[i - 1]; 39 | } 40 | 41 | delete[] histo; 42 | } -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/reference_calc.h: -------------------------------------------------------------------------------- 1 | #ifndef REFERENCE_H__ 2 | #define REFERENCE_H__ 3 | 4 | void referenceCalculation(const float* const h_logLuminance, unsigned int* const h_cdf, 5 | const size_t numRows, const size_t numCols, const size_t numBins, 6 | float &logLumMin, float &logLumMax); 7 | 8 | #endif 9 | -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/student_func.cu: -------------------------------------------------------------------------------- 1 | /* Udacity Homework 3 2 | HDR Tone-mapping 3 | 4 | Background HDR 5 | ============== 6 | 7 | A High Dynamic Range (HDR) image contains a wider variation of intensity 8 | and color than is allowed by the RGB format with 1 byte per channel that we 9 | have used in the previous assignment. 10 | 11 | To store this extra information we use single precision floating point for 12 | each channel. This allows for an extremely wide range of intensity values. 13 | 14 | In the image for this assignment, the inside of a church with light coming in 15 | through stained glass windows, the raw input floating point values for the 16 | channels range from 0 to 275. But the mean is 0.41 and 98% of the values are 17 | less than 3! This means that certain areas (the windows) are extremely bright 18 | compared to everywhere else. If we linearly map this [0-275] range into the 19 | [0-255] range that we have been using then most values will be mapped to zero! 20 | The only thing we will be able to see are the very brightest areas - the 21 | windows - everything else will appear pitch black.
22 | 23 | The problem is that although we have cameras capable of recording the wide 24 | range of intensity that exists in the real world our monitors are not capable 25 | of displaying them. Our eyes are also quite capable of observing a much wider 26 | range of intensities than our image formats / monitors are capable of 27 | displaying. 28 | 29 | Tone-mapping is a process that transforms the intensities in the image so that 30 | the brightest values aren't nearly so far away from the mean. That way when 31 | we transform the values into [0-255] we can actually see the entire image. 32 | There are many ways to perform this process and it is as much an art as a 33 | science - there is no single "right" answer. In this homework we will 34 | implement one possible technique. 35 | 36 | Background Chrominance-Luminance 37 | ================================ 38 | 39 | The RGB space that we have been using to represent images can be thought of as 40 | one possible set of axes spanning a three dimensional space of color. We 41 | sometimes choose other axes to represent this space because they make certain 42 | operations more convenient. 43 | 44 | Another possible way of representing a color image is to separate the color 45 | information (chromaticity) from the brightness information. There are 46 | multiple different methods for doing this - a common one during the analog 47 | television days was known as Chrominance-Luminance or YUV. 48 | 49 | We choose to represent the image in this way so that we can remap only the 50 | intensity channel and then recombine the new intensity values with the color 51 | information to form the final image. 52 | 53 | Old TV signals used to be transmitted in this way so that black & white 54 | televisions could display the luminance channel while color televisions would 55 | display all three of the channels. 
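The separation described above can be sketched as follows. This is an illustration only: the helper names are hypothetical, the BT.601 luma weights are an assumption tied to the analog-era YUV mentioned above, and the conversion that the assignment's driver code actually uses is not shown in this file. Scaling all three channels by the same factor is just one simple way to change brightness while leaving the color ratios alone.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: brightness (luma) of an RGB triple using the
// BT.601 weights. Assumed for illustration, not taken from the assignment.
inline float luma_bt601(float r, float g, float b) {
    return 0.299f * r + 0.587f * g + 0.114f * b;
}

// Hypothetical helper: rescale a color so that its luma becomes new_luma
// while the ratios between the channels (the color information) are kept.
inline void remap_luma(float& r, float& g, float& b, float new_luma) {
    const float old_luma = luma_bt601(r, g, b);
    assert(old_luma > 0.0f);  // a pure black pixel has no color ratio to keep
    const float scale = new_luma / old_luma;
    r *= scale;
    g *= scale;
    b *= scale;
}
```

A black & white display would only need the value returned by luma_bt601; a tone-mapper would compute a new luma and hand it to something like remap_luma.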
56 | 57 | 58 | Tone-mapping 59 | ============ 60 | 61 | In this assignment we are going to transform the luminance channel (actually 62 | the log of the luminance, but this is unimportant for the parts of the 63 | algorithm that you will be implementing) by compressing its range to [0, 1]. 64 | To do this we need the cumulative distribution of the luminance values. 65 | 66 | Example 67 | ------- 68 | 69 | input : [2 4 3 3 1 7 4 5 7 0 9 4 3 2] 70 | min / max / range: 0 / 9 / 9 71 | 72 | histo with 3 bins: [4 7 3] 73 | 74 | cdf : [4 11 14] (inclusive sum; the exclusive scan this assignment asks for gives [0 4 11]) 75 | 76 | 77 | Your task is to calculate this cumulative distribution by following these 78 | steps. 79 | 80 | */ 81 | 82 | #include "utils.h" 83 | #include "device_launch_parameters.h" 84 | //#include "reference_calc.cpp" 85 | #include 86 | #include 87 | #include 88 | 89 | const int BLOCK_SIZE = 1024; 90 | 91 | __device__ float _min(float a, float b) { 92 | return a < b ? a : b; 93 | } 94 | 95 | __device__ float _max(float a, float b) { 96 | return a > b ? a : b; 97 | } 98 | 99 | __global__ void minmax_reduce(float* d_out, const float * d_in, int input_size,bool isMin) { 100 | 101 | extern __shared__ float sdata[]; 102 | 103 | int tid = threadIdx.x; 104 | int global_id = tid + blockDim.x*blockIdx.x; 105 | 106 | if (global_id >= input_size) { sdata[tid] = d_in[0]; } //dummy init (does not modify the final result) 107 | else sdata[tid] = d_in[global_id]; 108 | __syncthreads(); 109 | for (int s = blockDim.x/2; s > 0; s>>=1){ 110 | if (tid < s) sdata[tid] = isMin ?
_min(sdata[tid], sdata[tid + s]) : _max(sdata[tid], sdata[tid + s]); 111 | __syncthreads(); 112 | } 113 | if (tid == 0) { 114 | d_out[blockIdx.x] = sdata[0]; 115 | } 116 | } 117 | 118 | 119 | 120 | __global__ void histo_atomic(unsigned int* out_histo, const float * d_in, int numBins, int input_size, float minVal, float rangeVals) { 121 | int tid = threadIdx.x; 122 | int global_id = tid + blockDim.x*blockIdx.x; 123 | if (global_id >= input_size) return; 124 | int bin = ((d_in[global_id] - minVal)*numBins) / rangeVals; 125 | bin = bin == numBins ? numBins - 1 : bin; //max value bin is the last of the histo 126 | atomicAdd(&(out_histo[bin]), 1); 127 | } 128 | 129 | 130 | //--------HILLIS-STEELE SCAN---------- 131 | //Optimal step efficiency (histogram is a relatively small vector) 132 | //Works on maximum 1024 (Pascal) elems vector. 133 | __global__ void scan_hillis_steele(unsigned int* d_out,const unsigned int* d_in, int size) { 134 | extern __shared__ unsigned int temp[]; 135 | int tid = threadIdx.x; 136 | int pout = 0,pin=1; 137 | temp[tid] = tid>0? 
d_in[tid-1]:0; //exclusive scan 138 | __syncthreads(); 139 | 140 | //double buffered 141 | for (int off = 1; off < size; off <<= 1) { 142 | pout = 1 - pout; 143 | pin = 1 - pout; 144 | if (tid >= off) temp[size*pout + tid] = temp[size*pin + tid]+temp[size*pin + tid - off]; 145 | else temp[size*pout + tid] = temp[size*pin + tid]; 146 | __syncthreads(); 147 | } 148 | d_out[tid] = temp[pout*size + tid]; 149 | } 150 | 151 | 152 | float reduce(const float* const d_logLuminance, int input_size,bool isMin) { 153 | int threads = BLOCK_SIZE; 154 | float* d_current_in = NULL; 155 | int size = input_size; 156 | int blocks = ceil(1.0f*size / threads); 157 | while (true) { 158 | //allocate memory for intermediate results 159 | //printf("Size %d blocks %d\n", size,blocks); 160 | float* d_out; 161 | checkCudaErrors(cudaMalloc(&d_out, blocks * sizeof(float))); 162 | //call reduce kernel: if first iteration use original vector, otherwise use the last intermediate result. 163 | if (d_current_in == NULL) minmax_reduce << > > (d_out, d_logLuminance, size, isMin); 164 | else minmax_reduce << > > (d_out, d_current_in, size, isMin);; 165 | cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError()); 166 | 167 | //free last intermediate result 168 | if (d_current_in != NULL) checkCudaErrors(cudaFree(d_current_in)); 169 | 170 | if (blocks == 1) { 171 | //end of reduction reached 172 | float h_out; 173 | checkCudaErrors(cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost)); 174 | return h_out; 175 | } 176 | size = blocks; 177 | blocks = ceil(1.0f*size / threads); 178 | if (blocks == 0)blocks++; 179 | d_current_in = d_out;//point to new intermediate result 180 | 181 | } 182 | 183 | } 184 | 185 | 186 | unsigned int* compute_histogram(const float* const d_logLuminance, int numBins, int input_size, float minVal, float rangeVals) { 187 | unsigned int* d_histo; 188 | checkCudaErrors(cudaMalloc(&d_histo, numBins * sizeof(unsigned int))); 189 | checkCudaErrors(cudaMemset(d_histo, 0, 
numBins * sizeof(unsigned int))); 190 | int threads = BLOCK_SIZE; 191 | int blocks = ceil(1.0f*input_size / threads); 192 | histo_atomic << > >(d_histo, d_logLuminance, numBins, input_size, minVal, rangeVals); 193 | cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError()); 194 | return d_histo; 195 | } 196 | 197 | void your_histogram_and_prefixsum(const float* const d_logLuminance, 198 | unsigned int* const d_cdf, 199 | float &min_logLum, 200 | float &max_logLum, 201 | const size_t numRows, 202 | const size_t numCols, 203 | const size_t numBins) 204 | { 205 | /*Here are the steps you need to implement 206 | 1) find the minimum and maximum value in the input logLuminance channel 207 | store in min_logLum and max_logLum 208 | 2) subtract them to find the range 209 | 3) generate a histogram of all the values in the logLuminance channel using 210 | the formula: bin = (lum[i] - lumMin) / lumRange * numBins 211 | 4) Perform an exclusive scan (prefix sum) on the histogram to get 212 | the cumulative distribution of luminance values (this should go in the 213 | incoming d_cdf pointer which already has been allocated for you) */ 214 | 215 | //1. Reduce 216 | int input_size = numRows*numCols; 217 | min_logLum = reduce(d_logLuminance, input_size, true); 218 | max_logLum = reduce(d_logLuminance, input_size, false); 219 | //printf("%f %f\n", min_logLum, max_logLum); 220 | 221 | //2. Range 222 | float range = max_logLum - min_logLum; 223 | 224 | //3. Histogram 225 | unsigned int* d_histo=compute_histogram(d_logLuminance, numBins, input_size, min_logLum, range); 226 | 227 | //4. 
CDF (scan) 228 | //Assumption: numBins<=1024 229 | scan_hillis_steele << <1, numBins, 2*numBins*sizeof(unsigned int) >> > (d_cdf,d_histo, numBins); 230 | 231 | checkCudaErrors(cudaFree(d_histo)); 232 | 233 | } 234 | -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/timer.h: -------------------------------------------------------------------------------- 1 | #ifndef GPU_TIMER_H__ 2 | #define GPU_TIMER_H__ 3 | 4 | #include 5 | 6 | struct GpuTimer 7 | { 8 | cudaEvent_t start; 9 | cudaEvent_t stop; 10 | 11 | GpuTimer() 12 | { 13 | cudaEventCreate(&start); 14 | cudaEventCreate(&stop); 15 | } 16 | 17 | ~GpuTimer() 18 | { 19 | cudaEventDestroy(start); 20 | cudaEventDestroy(stop); 21 | } 22 | 23 | void Start() 24 | { 25 | cudaEventRecord(start, 0); 26 | } 27 | 28 | void Stop() 29 | { 30 | cudaEventRecord(stop, 0); 31 | } 32 | 33 | float Elapsed() 34 | { 35 | float elapsed; 36 | cudaEventSynchronize(stop); 37 | cudaEventElapsedTime(&elapsed, start, stop); 38 | return elapsed; 39 | } 40 | }; 41 | 42 | #endif /* GPU_TIMER_H__ */ 43 | -------------------------------------------------------------------------------- /ProblemSet3-ToneMapping/utils.h: -------------------------------------------------------------------------------- 1 | #ifndef UTILS_H__ 2 | #define UTILS_H__ 3 | 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | 13 | #define checkCudaErrors(val) check( (val), #val, __FILE__, __LINE__) 14 | 15 | template 16 | void check(T err, const char* const func, const char* const file, const int line) { 17 | if (err != cudaSuccess) { 18 | std::cerr << "CUDA error at: " << file << ":" << line << std::endl; 19 | std::cerr << cudaGetErrorString(err) << " " << func << std::endl; 20 | exit(1); 21 | } 22 | } 23 | 24 | template 25 | void checkResultsExact(const T* const ref, const T* const gpu, size_t numElem) { 26 | //check that the GPU result matches the CPU result 27 
| for (size_t i = 0; i < numElem; ++i) { 28 | if (ref[i] != gpu[i]) { 29 | std::cerr << "Difference at pos " << i << std::endl; 30 | //the + is magic to convert char to int without messing 31 | //with other types 32 | std::cerr << "Reference: " << std::setprecision(17) << +ref[i] << 33 | "\nGPU : " << +gpu[i] << std::endl; 34 | exit(1); 35 | } 36 | } 37 | } 38 | 39 | template<typename T> 40 | void checkResultsEps(const T* const ref, const T* const gpu, size_t numElem, double eps1, double eps2) { 41 | assert(eps1 >= 0 && eps2 >= 0); 42 | unsigned long long totalDiff = 0; 43 | unsigned numSmallDifferences = 0; 44 | for (size_t i = 0; i < numElem; ++i) { 45 | //subtract smaller from larger in case of unsigned types 46 | T smaller = std::min(ref[i], gpu[i]); 47 | T larger = std::max(ref[i], gpu[i]); 48 | T diff = larger - smaller; 49 | if (diff > 0 && diff <= eps1) { 50 | numSmallDifferences++; 51 | } 52 | else if (diff > eps1) { 53 | std::cerr << "Difference at pos " << +i << " exceeds tolerance of " << eps1 << std::endl; 54 | std::cerr << "Reference: " << std::setprecision(17) << +ref[i] << 55 | "\nGPU : " << +gpu[i] << std::endl; 56 | exit(1); 57 | } 58 | totalDiff += diff * diff; 59 | } 60 | double percentSmallDifferences = (double)numSmallDifferences / (double)numElem; 61 | if (percentSmallDifferences > eps2) { 62 | std::cerr << "Total percentage of non-zero pixel difference between the two images exceeds " << 100.0 * eps2 << "%" << std::endl; 63 | std::cerr << "Percentage of non-zero pixel differences: " << 100.0 * percentSmallDifferences << "%" << std::endl; 64 | exit(1); 65 | } 66 | } 67 | 68 | //Uses the autodesk method of image comparison 69 | //Note that the tolerance here is in PIXELS, not a percentage of input pixels 70 | template<typename T> 71 | void checkResultsAutodesk(const T* const ref, const T* const gpu, size_t numElem, double variance, size_t tolerance) 72 | { 73 | 74 | size_t numBadPixels = 0; 75 | for (size_t i = 0; i < numElem; ++i) { 76 | T smaller = std::min(ref[i],
gpu[i]); 77 | T larger = std::max(ref[i], gpu[i]); 78 | T diff = larger - smaller; 79 | if (diff > variance) 80 | ++numBadPixels; 81 | } 82 | 83 | if (numBadPixels > tolerance) { 84 | std::cerr << "Too many bad pixels in the image." << numBadPixels << "/" << tolerance << std::endl; 85 | exit(1); 86 | } 87 | } 88 | 89 | #endif 90 | -------------------------------------------------------------------------------- /ProblemSet4-RedEyeRemoval/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | ############################################################################ 2 | # CMakeLists.txt for OpenCV and CUDA. 3 | # 2012-02-07 4 | # Quan Tran Minh. edit by Johannes Kast, Michael Sarahan 5 | # quantm@unist.ac.kr kast.jo@googlemail.com msarahan@gmail.com 6 | ############################################################################ 7 | 8 | # collect source files 9 | 10 | file( GLOB hdr *.hpp *.h ) 11 | file( GLOB cu *.cu) 12 | SET (HW4_files main.cpp loadSaveImage.cpp reference_calc.cpp compare.cpp) 13 | 14 | CUDA_ADD_EXECUTABLE(HW4 ${HW4_files} ${hdr} ${img} ${cu}) 15 | 16 | 17 | -------------------------------------------------------------------------------- /ProblemSet4-RedEyeRemoval/HW4_output.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet4-RedEyeRemoval/HW4_output.png -------------------------------------------------------------------------------- /ProblemSet4-RedEyeRemoval/Makefile: -------------------------------------------------------------------------------- 1 | NVCC=/usr/local/cuda-5.0/bin/nvcc 2 | #NVCC=nvcc 3 | 4 | ################################### 5 | # These are the default install # 6 | # locations on most linux distros # 7 | ################################### 8 | 9 | OPENCV_LIBPATH=/usr/lib 10 | OPENCV_INCLUDEPATH=/usr/include 11 | 12 | 
################################################### 13 | # On Macs the default install locations are below # 14 | ################################################### 15 | 16 | #OPENCV_LIBPATH=/usr/local/lib 17 | #OPENCV_INCLUDEPATH=/usr/local/include 18 | 19 | OPENCV_LIBS=-lopencv_core -lopencv_imgproc -lopencv_highgui 20 | 21 | CUDA_INCLUDEPATH=/usr/local/cuda-5.0/include 22 | # CUDA_INCLUDEPATH=/usr/local/cuda/lib64/include 23 | # CUDA_INCLUDEPATH=/usr/local/cuda-5.0/include 24 | # CUDA_INCLUDEPATH=/Developer/NVIDIA/CUDA-5.0/include 25 | 26 | ###################################################### 27 | # On Macs the default install locations are below # 28 | # #################################################### 29 | 30 | #CUDA_INCLUDEPATH=/usr/local/cuda/include 31 | #CUDA_LIBPATH=/usr/local/cuda/lib 32 | CUDA_LIBPATH=/usr/local/cuda-5.0/lib64 33 | 34 | NVCC_OPTS=-O3 -arch=sm_20 -Xcompiler -Wall -Xcompiler -Wextra -m64 35 | 36 | GCC_OPTS=-O3 -Wall -Wextra -m64 37 | 38 | student: main.o student_func.o HW4.o loadSaveImage.o compare.o reference_calc.o Makefile 39 | $(NVCC) -o HW4 main.o student_func.o HW4.o loadSaveImage.o compare.o reference_calc.o -L $(OPENCV_LIBPATH) $(OPENCV_LIBS) $(NVCC_OPTS) 40 | 41 | main.o: main.cpp timer.h utils.h reference_calc.h 42 | g++ -c main.cpp $(GCC_OPTS) -I $(CUDA_INCLUDEPATH) 43 | 44 | HW4.o: HW4.cu loadSaveImage.h utils.h 45 | $(NVCC) -c HW4.cu -I $(OPENCV_INCLUDEPATH) $(NVCC_OPTS) 46 | 47 | loadSaveImage.o: loadSaveImage.cpp loadSaveImage.h 48 | g++ -c loadSaveImage.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH) 49 | 50 | compare.o: compare.cpp compare.h 51 | g++ -c compare.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH) 52 | 53 | reference_calc.o: reference_calc.cpp reference_calc.h 54 | g++ -c reference_calc.cpp -I $(OPENCV_INCLUDEPATH) $(GCC_OPTS) -I $(CUDA_INCLUDEPATH) 55 | 56 | student_func.o: student_func.cu reference_calc.cpp utils.h 57 | $(NVCC) -c student_func.cu $(NVCC_OPTS) 58 | 59 | 
clean: 60 | rm -f *.o *.png HW4 61 | -------------------------------------------------------------------------------- /ProblemSet4-RedEyeRemoval/compare.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include "utils.h" 3 | 4 | 5 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck, 6 | double perPixelError, double globalError) 7 | { 8 | cv::Mat reference = cv::imread(reference_filename, -1); 9 | cv::Mat test = cv::imread(test_filename, -1); 10 | 11 | cv::Mat diff = abs(reference - test); 12 | 13 | cv::Mat diffSingleChannel = diff.reshape(1, 0); //convert to 1 channel, same # rows 14 | 15 | double minVal, maxVal; 16 | 17 | cv::minMaxLoc(diffSingleChannel, &minVal, &maxVal, NULL, NULL); //NULL because we don't care about location 18 | 19 | //now perform transform so that we bump values to the full range 20 | 21 | diffSingleChannel = (diffSingleChannel - minVal) * (255. / (maxVal - minVal)); 22 | 23 | diff = diffSingleChannel.reshape(reference.channels(), 0); 24 | 25 | cv::imwrite("HW4_differenceImage.png", diff); 26 | //OK, now we can start comparing values...
27 | unsigned char *referencePtr = reference.ptr(0); 28 | unsigned char *testPtr = test.ptr(0); 29 | 30 | if (useEpsCheck) { 31 | checkResultsEps(referencePtr, testPtr, reference.rows * reference.cols * reference.channels(), perPixelError, globalError); 32 | } 33 | else 34 | { 35 | checkResultsExact(referencePtr, testPtr, reference.rows * reference.cols * reference.channels()); 36 | } 37 | 38 | std::cout << "PASS" << std::endl; 39 | return; 40 | } -------------------------------------------------------------------------------- /ProblemSet4-RedEyeRemoval/compare.h: -------------------------------------------------------------------------------- 1 | #ifndef HW4_H__ 2 | #define HW4_H__ 3 | 4 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck, 5 | double perPixelError, double globalError); 6 | 7 | #endif -------------------------------------------------------------------------------- /ProblemSet4-RedEyeRemoval/loadSaveImage.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include "cuda_runtime.h" 6 | 7 | //The caller becomes responsible for the returned pointer. This 8 | //is done in the interest of keeping this code as simple as possible. 9 | //In production code this is a bad idea - we should use RAII 10 | //to ensure the memory is freed. DO NOT COPY THIS AND USE IN PRODUCTION 11 | //CODE!!! 12 | void loadImageHDR(const std::string &filename, 13 | float **imagePtr, 14 | size_t *numRows, size_t *numCols) 15 | { 16 | cv::Mat image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR | CV_LOAD_IMAGE_ANYDEPTH); 17 | if (image.empty()) { 18 | std::cerr << "Couldn't open file: " << filename << std::endl; 19 | exit(1); 20 | } 21 | 22 | if (image.channels() != 3) { 23 | std::cerr << "Image must be color!" << std::endl; 24 | exit(1); 25 | } 26 | 27 | if (!image.isContinuous()) { 28 | std::cerr << "Image isn't continuous!" 
<< std::endl; 29 | exit(1); 30 | } 31 | 32 | *imagePtr = new float[image.rows * image.cols * image.channels()]; 33 | 34 | float *cvPtr = image.ptr(0); 35 | for (size_t i = 0; i < image.rows * image.cols * image.channels(); ++i) 36 | (*imagePtr)[i] = cvPtr[i]; 37 | 38 | *numRows = image.rows; 39 | *numCols = image.cols; 40 | } 41 | 42 | void loadImageRGBA(const std::string &filename, 43 | uchar4 **imagePtr, 44 | size_t *numRows, size_t *numCols) 45 | { 46 | cv::Mat image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR); 47 | if (image.empty()) { 48 | std::cerr << "Couldn't open file: " << filename << std::endl; 49 | exit(1); 50 | } 51 | 52 | if (image.channels() != 3) { 53 | std::cerr << "Image must be color!" << std::endl; 54 | exit(1); 55 | } 56 | 57 | if (!image.isContinuous()) { 58 | std::cerr << "Image isn't continuous!" << std::endl; 59 | exit(1); 60 | } 61 | 62 | cv::Mat imageRGBA; 63 | cv::cvtColor(image, imageRGBA, CV_BGR2RGBA); 64 | 65 | *imagePtr = new uchar4[image.rows * image.cols]; 66 | 67 | unsigned char *cvPtr = imageRGBA.ptr(0); 68 | for (size_t i = 0; i < image.rows * image.cols; ++i) { 69 | (*imagePtr)[i].x = cvPtr[4 * i + 0]; 70 | (*imagePtr)[i].y = cvPtr[4 * i + 1]; 71 | (*imagePtr)[i].z = cvPtr[4 * i + 2]; 72 | (*imagePtr)[i].w = cvPtr[4 * i + 3]; 73 | } 74 | 75 | *numRows = image.rows; 76 | *numCols = image.cols; 77 | } 78 | 79 | void saveImageRGBA(const uchar4* const image, 80 | const size_t numRows, const size_t numCols, 81 | const std::string &output_file) 82 | { 83 | int sizes[2]; 84 | sizes[0] = numRows; 85 | sizes[1] = numCols; 86 | cv::Mat imageRGBA(2, sizes, CV_8UC4, (void *)image); 87 | cv::Mat imageOutputBGR; 88 | cv::cvtColor(imageRGBA, imageOutputBGR, CV_RGBA2BGR); 89 | //output the image 90 | cv::imwrite(output_file.c_str(), imageOutputBGR); 91 | } 92 | 93 | //output an exr file 94 | //assumed to already be BGR 95 | void saveImageHDR(const float* const image, 96 | const size_t numRows, const size_t numCols, 97 | const 
std::string &output_file) 98 | { 99 | int sizes[2]; 100 | sizes[0] = numRows; 101 | sizes[1] = numCols; 102 | 103 | cv::Mat imageHDR(2, sizes, CV_32FC3, (void *)image); 104 | 105 | imageHDR = imageHDR * 255; 106 | 107 | cv::imwrite(output_file.c_str(), imageHDR); 108 | } 109 | -------------------------------------------------------------------------------- /ProblemSet4-RedEyeRemoval/loadSaveImage.h: -------------------------------------------------------------------------------- 1 | #ifndef LOADSAVEIMAGE_H__ 2 | #define LOADSAVEIMAGE_H__ 3 | 4 | #include 5 | #include //for uchar4 6 | 7 | void loadImageHDR(const std::string &filename, 8 | float **imagePtr, 9 | size_t *numRows, size_t *numCols); 10 | 11 | void loadImageRGBA(const std::string &filename, 12 | uchar4 **imagePtr, 13 | size_t *numRows, size_t *numCols); 14 | 15 | void saveImageRGBA(const uchar4* const image, 16 | const size_t numRows, const size_t numCols, 17 | const std::string &output_file); 18 | 19 | void saveImageHDR(const float* const image, 20 | const size_t numRows, const size_t numCols, 21 | const std::string &output_file); 22 | 23 | #endif 24 | -------------------------------------------------------------------------------- /ProblemSet4-RedEyeRemoval/main.cpp: -------------------------------------------------------------------------------- 1 | //Udacity HW4 Driver 2 | 3 | #include 4 | #include "timer.h" 5 | #include "utils.h" 6 | #include 7 | #include 8 | #include 9 | #include 10 | 11 | #include "compare.h" 12 | #include "reference_calc.h" 13 | 14 | void preProcess(unsigned int **inputVals, 15 | unsigned int **inputPos, 16 | unsigned int **outputVals, 17 | unsigned int **outputPos, 18 | size_t &numElems, 19 | const std::string& filename, 20 | const std::string& template_file); 21 | 22 | void postProcess(const unsigned int* const outputVals, 23 | const unsigned int* const outputPos, 24 | const size_t numElems, 25 | const std::string& output_file); 26 | 27 | void your_sort(unsigned int* const 
inputVals, 28 | unsigned int* const inputPos, 29 | unsigned int* const outputVals, 30 | unsigned int* const outputPos, 31 | const size_t numElems); 32 | 33 | int main(int argc, char **argv) { 34 | unsigned int *inputVals; 35 | unsigned int *inputPos; 36 | unsigned int *outputVals; 37 | unsigned int *outputPos; 38 | 39 | size_t numElems; 40 | 41 | std::string input_file; 42 | std::string template_file; 43 | std::string output_file; 44 | std::string reference_file; 45 | double perPixelError = 0.0; 46 | double globalError = 0.0; 47 | bool useEpsCheck = false; 48 | 49 | switch (argc) 50 | { 51 | case 3: 52 | input_file = std::string(argv[1]); 53 | template_file = std::string(argv[2]); 54 | output_file = "HW4_output.png"; 55 | break; 56 | case 4: 57 | input_file = std::string(argv[1]); 58 | template_file = std::string(argv[2]); 59 | output_file = std::string(argv[3]); 60 | break; 61 | default: 62 | std::cerr << "Usage: ./HW4 input_file template_file [output_filename]" << std::endl; 63 | exit(1); 64 | } 65 | //load the image and give us our input and output pointers 66 | preProcess(&inputVals, &inputPos, &outputVals, &outputPos, numElems, input_file, template_file); 67 | 68 | GpuTimer timer; 69 | timer.Start(); 70 | 71 | //call the students' code 72 | your_sort(inputVals, inputPos, outputVals, outputPos, numElems); 73 | 74 | timer.Stop(); 75 | cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError()); 76 | printf("\n"); 77 | int err = printf("Your code ran in: %f msecs.\n", timer.Elapsed()); 78 | 79 | if (err < 0) { 80 | //Couldn't print! Probably the student closed stdout - bad news 81 | std::cerr << "Couldn't print timing information! STDOUT Closed!" 
<< std::endl; 82 | exit(1); 83 | } 84 | 85 | //check results and output the red-eye corrected image 86 | postProcess(outputVals, outputPos, numElems, output_file); 87 | 88 | // check code moved from HW4.cu 89 | /**************************************************************************** 90 | * You can use the code below to help with debugging, but make sure to * 91 | * comment it out again before submitting your assignment for grading, * 92 | * otherwise this code will take too much time and make it seem like your * 93 | * GPU implementation isn't fast enough. * 94 | * * 95 | * This code MUST RUN BEFORE YOUR CODE in case you accidentally change * 96 | * the input values when implementing your radix sort. * 97 | * * 98 | * This code performs the reference radix sort on the host and compares your * 99 | * sorted values to the reference. * 100 | * * 101 | * Thrust containers are used for copying memory from the GPU * 102 | * ************************************************************************* */ 103 | thrust::device_ptr d_inputVals(inputVals); 104 | thrust::device_ptr d_inputPos(inputPos); 105 | 106 | thrust::host_vector h_inputVals(d_inputVals, 107 | d_inputVals+numElems); 108 | thrust::host_vector h_inputPos(d_inputPos, 109 | d_inputPos + numElems); 110 | 111 | thrust::host_vector h_outputVals(numElems); 112 | thrust::host_vector h_outputPos(numElems); 113 | 114 | reference_calculation(&h_inputVals[0], &h_inputPos[0], 115 | &h_outputVals[0], &h_outputPos[0], 116 | numElems); 117 | 118 | //postProcess(valsPtr, posPtr, numElems, reference_file); 119 | 120 | //compareImages(reference_file, output_file, useEpsCheck, perPixelError, globalError); 121 | 122 | thrust::device_ptr d_outputVals(outputVals); 123 | thrust::device_ptr d_outputPos(outputPos); 124 | 125 | thrust::host_vector h_yourOutputVals(d_outputVals, 126 | d_outputVals + numElems); 127 | thrust::host_vector h_yourOutputPos(d_outputPos, 128 | d_outputPos + numElems); 129 | 130 | 
checkResultsExact(&h_outputVals[0], &h_yourOutputVals[0], numElems); 131 | checkResultsExact(&h_outputPos[0], &h_yourOutputPos[0], numElems); 132 | 133 | checkCudaErrors(cudaFree(inputVals)); 134 | checkCudaErrors(cudaFree(inputPos)); 135 | checkCudaErrors(cudaFree(outputVals)); 136 | checkCudaErrors(cudaFree(outputPos)); 137 | 138 | return 0; 139 | } 140 | -------------------------------------------------------------------------------- /ProblemSet4-RedEyeRemoval/red_eye_effect.gold: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet4-RedEyeRemoval/red_eye_effect.gold -------------------------------------------------------------------------------- /ProblemSet4-RedEyeRemoval/red_eye_effect_5.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet4-RedEyeRemoval/red_eye_effect_5.jpg -------------------------------------------------------------------------------- /ProblemSet4-RedEyeRemoval/red_eye_effect_5_out.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet4-RedEyeRemoval/red_eye_effect_5_out.jpg -------------------------------------------------------------------------------- /ProblemSet4-RedEyeRemoval/red_eye_effect_template_5.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet4-RedEyeRemoval/red_eye_effect_template_5.jpg -------------------------------------------------------------------------------- 
/ProblemSet4-RedEyeRemoval/reference_calc.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | // For memset 3 | #include 4 | 5 | void reference_calculation(unsigned int* inputVals, 6 | unsigned int* inputPos, 7 | unsigned int* outputVals, 8 | unsigned int* outputPos, 9 | const size_t numElems) 10 | { 11 | const int numBits = 1; 12 | const int numBins = 1 << numBits; 13 | 14 | unsigned int *binHistogram = new unsigned int[numBins]; 15 | unsigned int *binScan = new unsigned int[numBins]; 16 | 17 | unsigned int *vals_src = inputVals; 18 | unsigned int *pos_src = inputPos; 19 | 20 | unsigned int *vals_dst = outputVals; 21 | unsigned int *pos_dst = outputPos; 22 | 23 | //a simple radix sort - only guaranteed to work for numBits that are multiples of 2 24 | for (unsigned int i = 0; i < 8 * sizeof(unsigned int); i += numBits) { 25 | unsigned int mask = (numBins - 1) << i; 26 | 27 | memset(binHistogram, 0, sizeof(unsigned int) * numBins); //zero out the bins 28 | memset(binScan, 0, sizeof(unsigned int) * numBins); //zero out the bins 29 | 30 | //perform histogram of data & mask into bins 31 | for (unsigned int j = 0; j < numElems; ++j) { 32 | unsigned int bin = (vals_src[j] & mask) >> i; 33 | binHistogram[bin]++; 34 | } 35 | 36 | //perform exclusive prefix sum (scan) on binHistogram to get starting 37 | //location for each bin 38 | for (unsigned int j = 1; j < numBins; ++j) { 39 | binScan[j] = binScan[j - 1] + binHistogram[j - 1]; 40 | } 41 | 42 | //Gather everything into the correct location 43 | //need to move vals and positions 44 | for (unsigned int j = 0; j < numElems; ++j) { 45 | unsigned int bin = (vals_src[j] & mask) >> i; 46 | vals_dst[binScan[bin]] = vals_src[j]; 47 | pos_dst[binScan[bin]] = pos_src[j]; 48 | binScan[bin]++; 49 | } 50 | 51 | //swap the buffers (pointers only) 52 | std::swap(vals_dst, vals_src); 53 | std::swap(pos_dst, pos_src); 54 | } 55 | 56 | //we did an even number of iterations, need to copy 
from input buffer into output 57 | std::copy(inputVals, inputVals + numElems, outputVals); 58 | std::copy(inputPos, inputPos + numElems, outputPos); 59 | 60 | delete[] binHistogram; 61 | delete[] binScan; 62 | } 63 | -------------------------------------------------------------------------------- /ProblemSet4-RedEyeRemoval/reference_calc.h: -------------------------------------------------------------------------------- 1 | #ifndef REFERENCE_H__ 2 | #define REFERENCE_H__ 3 | 4 | 5 | //A simple un-optimized reference radix sort calculation 6 | //Only deals with power-of-2 radices 7 | 8 | 9 | void reference_calculation(unsigned int* inputVals, 10 | unsigned int* inputPos, 11 | unsigned int* outputVals, 12 | unsigned int* outputPos, 13 | const size_t numElems); 14 | #endif -------------------------------------------------------------------------------- /ProblemSet4-RedEyeRemoval/student_func.cu: -------------------------------------------------------------------------------- 1 | //Udacity HW 4 2 | //Radix Sorting 3 | 4 | #include "utils.h" 5 | #include "device_launch_parameters.h" 6 | #include 7 | 8 | const int BLOCK_SIZE = 1024; 9 | 10 | /* Red Eye Removal 11 | =============== 12 | 13 | For this assignment we are implementing red eye removal. This is 14 | accomplished by first creating a score for every pixel that tells us how 15 | likely it is to be a red eye pixel. We have already done this for you - you 16 | are receiving the scores and need to sort them in ascending order so that we 17 | know which pixels to alter to remove the red eye. 18 | 19 | Note: ascending order == smallest to largest 20 | 21 | Each score is associated with a position, when you sort the scores, you must 22 | also move the positions accordingly. 23 | 24 | Implementing Parallel Radix Sort with CUDA 25 | ========================================== 26 | 27 | The basic idea is to construct a histogram on each pass of how many of each 28 | "digit" there are. 
   Then we scan this histogram so that we know where to put
   the output of each digit.  For example, the first 1 must come after all the
   0s so we have to know how many 0s there are to be able to start moving 1s
   into the correct position.

   1) Histogram of the number of occurrences of each digit
   2) Exclusive Prefix Sum of Histogram
   3) Determine relative offset of each digit
        For example [0 0 1 1 0 0 1]
                ->  [0 1 0 1 2 3 2]
   4) Combine the results of steps 2 & 3 to determine the final
      output location for each element and move it there

   LSB Radix sort is an out-of-place sort and you will need to ping-pong values
   between the input and output buffers we have provided.  Make sure the final
   sorted results end up in the output buffer!  Hint: You may need to do a copy
   at the end.

 */

//NOTE: every kernel launch configuration (<<<...>>>) in this file, parts of
//bielloch_scan, and the adjustIncrement / negatePredicate kernels were lost
//when this listing was extracted. They are reconstructed below from the
//surviving fragments and call sites; treat them as a best guess.

//Writes bit `bit` of every input value into predicate[] (1 if set, 0 otherwise)
__global__ void predicate(unsigned int* predicate, const unsigned int* d_in, size_t numElems, int bit) {
    int tid = threadIdx.x;
    int global_id = tid + blockDim.x*blockIdx.x;
    if (global_id >= numElems) return;
    unsigned int bin = ((d_in[global_id] >> bit) & 1u);
    predicate[global_id] = bin;
}


//Work-efficient exclusive scan (Blelloch) of one tile of 2*blockDim.x elements
//per block. blockSums[b] receives the total of tile b.
__global__ void bielloch_scan(unsigned int* d_out, const unsigned int* d_in, size_t input_size, unsigned int* blockSums) {
    extern __shared__ unsigned int data[];

    int tid = threadIdx.x;
    int offset = 1;
    int abs_start = 2*blockDim.x*blockIdx.x;

    //load the tile, padding with zeros past the end of the input (reconstructed)
    data[2 * tid]     = (abs_start + 2 * tid     < input_size) ? d_in[abs_start + 2 * tid]     : 0;
    data[2 * tid + 1] = (abs_start + 2 * tid + 1 < input_size) ? d_in[abs_start + 2 * tid + 1] : 0;

    //up-sweep (reduce) phase; loop header reconstructed
    for (int d = (2 * blockDim.x) >> 1; d > 0; d >>= 1) {
        __syncthreads();

        if (tid < d) {
            int ai = offset*(2 * tid + 1) - 1;
            int bi = offset*(2 * tid + 2) - 1;

            data[bi] += data[ai];
        }
        offset <<= 1;
    }
    if (tid == 0) data[2*blockDim.x - 1] = 0;

    //down-sweep phase
    for (int d = 1; d < 2 * blockDim.x; d <<= 1) {
        offset >>= 1;
        __syncthreads();
        if (tid < d) {
            int ai = offset*(2 * tid + 1) - 1;
            int bi = offset*(2 * tid + 2) - 1;
            unsigned int t = data[ai];
            data[ai] = data[bi];
            data[bi] += t;
        }
    }

    __syncthreads();

    if (abs_start + 2 * tid < input_size) {
        d_out[abs_start + 2 * tid] = data[2 * tid];
    }
    if (abs_start + 2 * tid + 1 < input_size) {
        d_out[abs_start + 2 * tid + 1] = data[2 * tid + 1];
    }

    if (tid == 0) {
        //exclusive scan value of the last slot, plus the last input element
        //when it exists, gives the total of this tile (second half reconstructed)
        blockSums[blockIdx.x] = data[blockDim.x * 2 - 1];
        if (abs_start + blockDim.x * 2 - 1 < input_size)
            blockSums[blockIdx.x] += d_in[abs_start + blockDim.x * 2 - 1];
    }
}

//Reconstructed: adds the scanned total of all preceding tiles to every element
//of this block's tile. Uses the same tile layout as bielloch_scan.
__global__ void adjustIncrement(unsigned int* d_out, const unsigned int* d_incr, size_t input_size) {
    int tid = threadIdx.x;
    int abs_start = 2*blockDim.x*blockIdx.x;
    unsigned int incr = d_incr[blockIdx.x];
    if (abs_start + 2 * tid < input_size) d_out[abs_start + 2 * tid] += incr;
    if (abs_start + 2 * tid + 1 < input_size) d_out[abs_start + 2 * tid + 1] += incr;
}

//Flips every predicate entry in place (signature and index computation reconstructed)
__global__ void negatePredicate(unsigned int* predicate, size_t input_size) {
    int tid = threadIdx.x;
    int pos = blockDim.x*blockIdx.x + tid;
    if (pos >= input_size) return;
    predicate[pos] = predicate[pos] ? 0 : 1;
}

//Scatters each element to its sorted position for the current bit.
//d_predicate is the NEGATED predicate here (1 means the bit is 0), and
//d_histo holds the exclusive scan of the 2-bin histogram: [0, number of zeros].
__global__ void moveElements(unsigned int* d_out, const unsigned int* d_in, const unsigned int* d_histo,
    const unsigned int* d_predicate, const unsigned int* d_scan_true, const unsigned int* d_scan_false, size_t input_size) {
    int tid = threadIdx.x;
    int pos = blockDim.x*blockIdx.x + tid;
    if (pos >= input_size) return;
    //calculate new index of element at position pos
    int newindex;
    if (d_predicate[pos]) newindex = d_histo[0] + d_scan_false[pos];
    else newindex = d_histo[1] + d_scan_true[pos];
    if (newindex >= input_size) return; //IMP
    d_out[newindex] = d_in[pos];
}



//Exclusive scan of d_pred into d_scan for arrays of any length.
//Returns the sum of all elements of d_pred.
unsigned int biellochScan(unsigned int* d_scan, unsigned int* d_pred, size_t numElems) {

    int num_double_blocks = ceil(1.0f*numElems / (2*BLOCK_SIZE));
    unsigned int* d_blocksums;
    checkCudaErrors(cudaMalloc(&d_blocksums, num_double_blocks * sizeof(unsigned int)));
    bielloch_scan<<<num_double_blocks, BLOCK_SIZE, 2 * BLOCK_SIZE * sizeof(unsigned int)>>>(d_scan, d_pred, numElems, d_blocksums);
    cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());

    unsigned int finalSum;
    //Scan of the blocksums array
    if (num_double_blocks > 1) {
        unsigned int* d_scan_temp;
        checkCudaErrors(cudaMalloc(&d_scan_temp, num_double_blocks * sizeof(unsigned int)));
        finalSum = biellochScan(d_scan_temp, d_blocksums, num_double_blocks);
        adjustIncrement<<<num_double_blocks, BLOCK_SIZE>>>(d_scan, d_scan_temp, numElems);
        cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
        checkCudaErrors(cudaFree(d_scan_temp));
    }
    else {
        checkCudaErrors(cudaMemcpy(&finalSum, d_blocksums, sizeof(unsigned int), cudaMemcpyDeviceToHost));
    }
    //free the block sums on both paths (the original leaked them in the multi-block case)
    checkCudaErrors(cudaFree(d_blocksums));

    return finalSum;

}

void your_sort(unsigned int* const d_inputVals,
               unsigned int* const d_inputPos,
               unsigned int* const d_outputVals,
               unsigned int* const d_outputPos,
               size_t numElems)
{
    //PUT YOUR SORT HERE
    int num_blocks = ceil(1.0f*numElems / BLOCK_SIZE);

    unsigned int h_histo[2];
    h_histo[0] = 0;

    unsigned int* d_histo;
    unsigned int* d_pred;
    unsigned int* d_scan_true;
    unsigned int* d_scan_false;

    checkCudaErrors(cudaMalloc(&d_histo, 2 * sizeof(unsigned int)));
    checkCudaErrors(cudaMalloc(&d_pred, numElems * sizeof(unsigned int)));
    checkCudaErrors(cudaMalloc(&d_scan_true, numElems * sizeof(unsigned int)));
    checkCudaErrors(cudaMalloc(&d_scan_false, numElems * sizeof(unsigned int)));
    //for each of the 32 bits
    for (size_t i = 0; i < 32; i++) {

        //compute predicate; even passes read the input buffers, odd passes the output buffers
        if (i % 2 == 0) predicate<<<num_blocks, BLOCK_SIZE>>>(d_pred, d_inputVals, numElems, i);
        else predicate<<<num_blocks, BLOCK_SIZE>>>(d_pred, d_outputVals, numElems, i);
        cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());

        //Exclusive Prefix Sum of 2-bins histogram is: [0 numFalse].
        //You can obtain it by a sum-reduce on the predicate, which is what the
        //value returned by biellochScan gives us

        //Compute offset of positives
        //Blelloch scan
        unsigned int number_trues = biellochScan(d_scan_true, d_pred, numElems);

        //Flip bits
        negatePredicate<<<num_blocks, BLOCK_SIZE>>>(d_pred, numElems);
        cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());

        //Compute offset of negatives
        unsigned int number_falses = biellochScan(d_scan_false, d_pred, numElems);

        h_histo[1] = number_falses;
        checkCudaErrors(cudaMemcpy(d_histo, h_histo, 2 * sizeof(unsigned int), cudaMemcpyHostToDevice));

        //Moving elements and indices
        if (i % 2 == 0) {
            moveElements<<<num_blocks, BLOCK_SIZE>>>(d_outputVals, d_inputVals, d_histo, d_pred, d_scan_true, d_scan_false, numElems);
            cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
            moveElements<<<num_blocks, BLOCK_SIZE>>>(d_outputPos, d_inputPos, d_histo, d_pred, d_scan_true, d_scan_false, numElems);
        }
        else {
            moveElements<<<num_blocks, BLOCK_SIZE>>>(d_inputVals, d_outputVals, d_histo, d_pred, d_scan_true, d_scan_false, numElems);
            cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
            moveElements<<<num_blocks, BLOCK_SIZE>>>(d_inputPos, d_outputPos, d_histo, d_pred, d_scan_true, d_scan_false, numElems);
        }
        cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());

    }

    //After 32 passes (an even count) the sorted data sits in the input buffers.
    //Copy result into d_outputVals
    checkCudaErrors(cudaMemcpy(d_outputVals, d_inputVals, numElems * sizeof(unsigned int), cudaMemcpyDeviceToDevice));
    checkCudaErrors(cudaMemcpy(d_outputPos, d_inputPos, numElems * sizeof(unsigned int), cudaMemcpyDeviceToDevice));


    checkCudaErrors(cudaFree(d_histo));
    checkCudaErrors(cudaFree(d_pred));
    checkCudaErrors(cudaFree(d_scan_true));
    checkCudaErrors(cudaFree(d_scan_false));

}

--------------------------------------------------------------------------------
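The four steps listed in the student_func.cu header comment (histogram, exclusive prefix sum, relative offsets, scatter) can be easier to follow in a serial form first. The following is a minimal host-side C++ sketch of one split pass on a single bit, plus the 32-pass loop around it. It is not part of the repository: the names `splitByBit` and `radixSortByBit` are hypothetical, and the sketch uses `std::vector` rather than the device buffers the assignment provides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One stable split pass of an LSB radix sort on a single bit (CPU sketch).
// Keys whose `bit` is 0 keep their relative order and come first, followed
// by keys whose `bit` is 1. Positions move together with their keys.
void splitByBit(std::vector<unsigned int>& vals,
                std::vector<unsigned int>& pos,
                unsigned int bit)
{
    assert(vals.size() == pos.size());
    const std::size_t n = vals.size();

    // Step 1: two-bin histogram (how many keys have this bit at 0 and at 1).
    std::size_t histo[2] = {0, 0};
    for (std::size_t i = 0; i < n; ++i) histo[(vals[i] >> bit) & 1u]++;

    // Step 2: exclusive prefix sum of the histogram gives each bin's start.
    const std::size_t binStart[2] = {0, histo[0]};

    // Steps 3 and 4: the running count within a bin is the relative offset;
    // bin start plus relative offset is the destination index.
    std::vector<unsigned int> valsOut(n), posOut(n);
    std::size_t seen[2] = {0, 0};
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned int b = (vals[i] >> bit) & 1u;
        const std::size_t dst = binStart[b] + seen[b]++;
        valsOut[dst] = vals[i];
        posOut[dst]  = pos[i];
    }
    vals.swap(valsOut);
    pos.swap(posOut);
}

// Full sort: 32 passes, least significant bit first.
void radixSortByBit(std::vector<unsigned int>& vals,
                    std::vector<unsigned int>& pos)
{
    for (unsigned int bit = 0; bit < 32; ++bit) splitByBit(vals, pos, bit);
}
```

On the GPU the running counter `seen[b]` is what the two exclusive scans (of the predicate and of its negation) compute in parallel, and `binStart` corresponds to the two-entry `d_histo` array.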
/ProblemSet4-RedEyeRemoval/timer.h:
--------------------------------------------------------------------------------
#ifndef GPU_TIMER_H__
#define GPU_TIMER_H__

#include <cuda_runtime.h> //header name lost in extraction; <cuda_runtime.h> is assumed (cudaEvent_t API)

struct GpuTimer
{
  cudaEvent_t start;
  cudaEvent_t stop;

  GpuTimer()
  {
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
  }

  ~GpuTimer()
  {
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
  }

  void Start()
  {
    cudaEventRecord(start, 0);
  }

  void Stop()
  {
    cudaEventRecord(stop, 0);
  }

  float Elapsed()
  {
    float elapsed;
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&elapsed, start, stop);
    return elapsed;
  }
};

#endif  /* GPU_TIMER_H__ */

--------------------------------------------------------------------------------
/ProblemSet4-RedEyeRemoval/utils.h:
--------------------------------------------------------------------------------
#ifndef UTILS_H__
#define UTILS_H__

//The seven header names below were lost in extraction; this list is a
//reconstruction based on what the code in this file uses.
#include <iostream>
#include <iomanip>
#include <cuda.h>
#include <cuda_runtime.h>
#include <cuda_runtime_api.h>
#include <cassert>
#include <cmath>

#define checkCudaErrors(val) check( (val), #val, __FILE__, __LINE__)

template<typename T>
void check(T err, const char* const func, const char* const file, const int line) {
  if (err != cudaSuccess) {
    std::cerr << "CUDA error at: " << file << ":" << line << std::endl;
    std::cerr << cudaGetErrorString(err) << " " << func << std::endl;
    exit(1);
  }
}

template<typename T>
void checkResultsExact(const T* const ref, const T* const gpu, size_t numElem) {
  //check that the GPU result matches the CPU result
  for (size_t i = 0; i < numElem; ++i) {
    if (ref[i] != gpu[i]) {
      std::cerr << "Difference at pos " << i << std::endl;
      //the + is magic to convert char to int without messing
      //with other types
      std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
                 "\nGPU      : " << +gpu[i] << std::endl;
      exit(1);
    }
  }
}

template<typename T>
void checkResultsEps(const T* const ref, const T* const gpu, size_t numElem, double eps1, double eps2) {
  assert(eps1 >= 0 && eps2 >= 0);
  unsigned long long totalDiff = 0;
  unsigned numSmallDifferences = 0;
  for (size_t i = 0; i < numElem; ++i) {
    //subtract smaller from larger in case of unsigned types
    T smaller = std::min(ref[i], gpu[i]);
    T larger = std::max(ref[i], gpu[i]);
    T diff = larger - smaller;
    if (diff > 0 && diff <= eps1) {
      numSmallDifferences++;
    }
    else if (diff > eps1) {
      std::cerr << "Difference at pos " << +i << " exceeds tolerance of " << eps1 << std::endl;
      std::cerr << "Reference: " << std::setprecision(17) << +ref[i] <<
        "\nGPU      : " << +gpu[i] << std::endl;
      exit(1);
    }
    totalDiff += diff * diff;
  }
  double percentSmallDifferences = (double)numSmallDifferences / (double)numElem;
  if (percentSmallDifferences > eps2) {
    std::cerr << "Total percentage of non-zero pixel difference between the two images exceeds " << 100.0 * eps2 << "%" << std::endl;
    std::cerr << "Percentage of non-zero pixel differences: " << 100.0 * percentSmallDifferences << "%" << std::endl;
    exit(1);
  }
}

//Uses the autodesk method of image comparison
//Note that the tolerance here is in PIXELS, not a percentage of input pixels
template<typename T>
void checkResultsAutodesk(const T* const ref, const T* const gpu, size_t numElem, double variance, size_t tolerance)
{

  size_t numBadPixels = 0;
  for (size_t i = 0; i < numElem; ++i) {
    T smaller = std::min(ref[i], gpu[i]);
    T larger = std::max(ref[i], gpu[i]);
    T diff = larger - smaller;
    if (diff > variance)
      ++numBadPixels;
  }

  if (numBadPixels > tolerance) {
    std::cerr << "Too many bad pixels in the image."
<< numBadPixels << "/" << tolerance << std::endl; 85 | exit(1); 86 | } 87 | } 88 | 89 | #endif 90 | -------------------------------------------------------------------------------- /ProblemSet5-OptimizedHistogram/ProblemSet5-OptimizedHistogram.vcxproj: -------------------------------------------------------------------------------- 1 |  2 | 3 | 4 | 5 | Debug 6 | Win32 7 | 8 | 9 | Debug 10 | x64 11 | 12 | 13 | Release 14 | Win32 15 | 16 | 17 | Release 18 | x64 19 | 20 | 21 | 22 | {0FDE11C1-E4D4-4E2C-B6B0-E39417A02491} 23 | ProblemSet5_OptimizedHistogram 24 | 25 | 26 | 27 | Application 28 | true 29 | MultiByte 30 | v140 31 | 32 | 33 | Application 34 | true 35 | MultiByte 36 | v140 37 | 38 | 39 | Application 40 | false 41 | true 42 | MultiByte 43 | v140 44 | 45 | 46 | Application 47 | false 48 | true 49 | MultiByte 50 | v140 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | true 71 | 72 | 73 | true 74 | 75 | 76 | 77 | Level3 78 | Disabled 79 | WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions) 80 | 81 | 82 | true 83 | Console 84 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies) 85 | 86 | 87 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 88 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 89 | 90 | 91 | 92 | 93 | Level3 94 | Disabled 95 | WIN32;WIN64;_DEBUG;_CONSOLE;%(PreprocessorDefinitions) 96 | 97 | 98 | true 99 | Console 100 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies) 101 | 102 | 103 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 104 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 105 | 106 | 107 | 64 108 | 109 | 110 | 111 | 112 | Level3 113 | MaxSpeed 114 | true 115 | true 116 | 
WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions) 117 | 118 | 119 | true 120 | true 121 | true 122 | Console 123 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies) 124 | 125 | 126 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 127 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 128 | 129 | 130 | 131 | 132 | Level3 133 | MaxSpeed 134 | true 135 | true 136 | WIN32;WIN64;NDEBUG;_CONSOLE;%(PreprocessorDefinitions) 137 | 138 | 139 | true 140 | true 141 | true 142 | Console 143 | cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies) 144 | 145 | 146 | echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 147 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)" 148 | 149 | 150 | 64 151 | 152 | 153 | 154 | 155 | 156 | 157 | 158 | 159 | 160 | 161 | 162 | 163 | 164 | 165 | 166 | 167 | 168 | 169 | -------------------------------------------------------------------------------- /ProblemSet5-OptimizedHistogram/main.cu: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include "utils.h" 6 | #include "timer.h" 7 | #include 8 | #if defined(_WIN16) || defined(_WIN32) || defined(_WIN64) 9 | #include 10 | #else 11 | #include 12 | #endif 13 | 14 | #include 15 | #include 16 | #include 17 | 18 | #include "reference_calc.h" 19 | 20 | void computeHistogram(const unsigned int *const d_vals, 21 | unsigned int* const d_histo, 22 | const unsigned int numBins, 23 | const unsigned int numElems); 24 | 25 | int main(void) 26 | { 27 | const unsigned int numBins = 1024; 28 | const unsigned int numElems = 10000 * numBins; 29 | const float stddev = 100.f; 30 | 31 | unsigned int *vals = new unsigned int[numElems]; 32 | unsigned int *h_vals = new 
unsigned int[numElems]; 33 | unsigned int *h_studentHisto = new unsigned int[numBins]; 34 | unsigned int *h_refHisto = new unsigned int[numBins]; 35 | 36 | #if defined(_WIN16) || defined(_WIN32) || defined(_WIN64) 37 | srand(GetTickCount()); 38 | #else 39 | timeval tv; 40 | gettimeofday(&tv, NULL); 41 | 42 | srand(tv.tv_usec); 43 | #endif 44 | 45 | //make the mean unpredictable, but close enough to the middle 46 | //so that timings are unaffected 47 | unsigned int mean = rand() % 100 + 462; 48 | 49 | //Output mean so that grading can happen with the same inputs 50 | std::cout << mean << std::endl; 51 | 52 | thrust::minstd_rand rng; 53 | 54 | thrust::random::normal_distribution normalDist((float)mean, stddev); 55 | 56 | 57 | 58 | // Generate the random values 59 | for (size_t i = 0; i < numElems; ++i) { 60 | vals[i] = std::min((unsigned int) std::max((int)normalDist(rng), 0), numBins - 1); 61 | } 62 | 63 | unsigned int *d_vals, *d_histo; 64 | 65 | GpuTimer timer; 66 | 67 | checkCudaErrors(cudaMalloc(&d_vals, sizeof(unsigned int) * numElems)); 68 | checkCudaErrors(cudaMalloc(&d_histo, sizeof(unsigned int) * numBins)); 69 | checkCudaErrors(cudaMemset(d_histo, 0, sizeof(unsigned int) * numBins)); 70 | 71 | checkCudaErrors(cudaMemcpy(d_vals, vals, sizeof(unsigned int) * numElems, cudaMemcpyHostToDevice)); 72 | 73 | timer.Start(); 74 | computeHistogram(d_vals, d_histo, numBins, numElems); 75 | timer.Stop(); 76 | int err = printf("Your code ran in: %f msecs.\n", timer.Elapsed()); 77 | 78 | if (err < 0) { 79 | //Couldn't print! Probably the student closed stdout - bad news 80 | std::cerr << "Couldn't print timing information! STDOUT Closed!" 
<< std::endl; 81 | exit(1); 82 | } 83 | 84 | // copy the student-computed histogram back to the host 85 | checkCudaErrors(cudaMemcpy(h_studentHisto, d_histo, sizeof(unsigned int) * numBins, cudaMemcpyDeviceToHost)); 86 | 87 | //generate reference for the given mean 88 | reference_calculation(vals, h_refHisto, numBins, numElems); 89 | 90 | //Now do the comparison 91 | checkResultsExact(h_refHisto, h_studentHisto, numBins); 92 | 93 | delete[] h_vals; 94 | delete[] h_refHisto; 95 | delete[] h_studentHisto; 96 | 97 | cudaFree(d_vals); 98 | cudaFree(d_histo); 99 | 100 | return 0; 101 | } 102 | -------------------------------------------------------------------------------- /ProblemSet5-OptimizedHistogram/reference_calc.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | //Reference Histogram calculation 3 | 4 | void reference_calculation(const unsigned int* const vals, 5 | unsigned int* const histo, 6 | const size_t numBins, 7 | const size_t numElems) 8 | 9 | { 10 | //zero out bins 11 | for (size_t i = 0; i < numBins; ++i) 12 | histo[i] = 0; 13 | 14 | //go through vals and increment appropriate bin 15 | for (size_t i = 0; i < numElems; ++i) 16 | histo[vals[i]]++; 17 | } 18 | -------------------------------------------------------------------------------- /ProblemSet5-OptimizedHistogram/reference_calc.h: -------------------------------------------------------------------------------- 1 | #ifndef REFERENCE_H__ 2 | #define REFERENCE_H__ 3 | 4 | //Reference Histogram calculation 5 | 6 | void reference_calculation(const unsigned int* const vals, 7 | unsigned int* const histo, 8 | const size_t numBins, 9 | const size_t numElems); 10 | 11 | #endif -------------------------------------------------------------------------------- /ProblemSet5-OptimizedHistogram/student.cu: -------------------------------------------------------------------------------- 1 | /* Udacity HW5 2 | Histogramming for Speed 3 | 4 | The goal of this 
   assignment is to compute a histogram
   as fast as possible.  We have simplified the problem as much as
   possible to allow you to focus solely on the histogramming algorithm.

   The input values that you need to histogram are already the exact
   bins that need to be updated.  This is unlike in HW3 where you needed
   to compute the range of the data and then do:
   bin = (val - valMin) / valRange to determine the bin.

   Here the bin is just:
   bin = val

   so the serial histogram calculation looks like:
   for (i = 0; i < numElems; ++i)
     histo[val[i]]++;

   That's it!  Your job is to make it run as fast as possible!

   The values are normally distributed - you may take
   advantage of this fact in your implementation.

*/


#include "utils.h"
#include "device_launch_parameters.h"
#include <cuda_runtime.h> //header name lost in extraction; <cuda_runtime.h> is an assumption

const int N_THREADS = 1024;



//One global atomicAdd per input value: simple, but heavily contended
__global__
void naiveHisto(const unsigned int* const vals, //INPUT
                unsigned int* const histo,      //OUTPUT
                int numVals)
{
    int tid = threadIdx.x;
    int global_id = tid + blockDim.x*blockIdx.x;
    if (global_id >= numVals) return;
    atomicAdd(&(histo[vals[global_id]]), 1);
}

//Each block builds a private histogram in shared memory, then merges it
//into the global histogram with one atomicAdd per bin
__global__
void perBlockHisto(const unsigned int* const vals, //INPUT
                   unsigned int* const histo,      //OUTPUT
                   int numVals, int numBins) {

    extern __shared__ unsigned int sharedHisto[]; //same size as the global histo

    //strided initialization: works even when numBins exceeds the block size
    for (int i = threadIdx.x; i < numBins; i += blockDim.x) {
        sharedHisto[i] = 0;
    }

    __syncthreads();

    int globalid = threadIdx.x + blockIdx.x*blockDim.x;
    //guard against reading past the end of vals when numVals is not a
    //multiple of the block size (the original had no bounds check)
    if (globalid < numVals) {
        atomicAdd(&sharedHisto[vals[globalid]], 1);
    }

    __syncthreads();

    for (int i = threadIdx.x; i < numBins; i += blockDim.x) {
        atomicAdd(&histo[i], sharedHisto[i]);
    }


}



void computeHistogram(const unsigned int* const d_vals, //INPUT
                      unsigned int* const d_histo,      //OUTPUT
                      const unsigned int numBins,
                      const unsigned int numElems)
{
    //TODO Launch the yourHisto kernel

    //integer ceiling division; the original ceil(numElems / N_THREADS)
    //truncated before ceil() was applied
    int blocks = (numElems + N_THREADS - 1) / N_THREADS;

    //naiveHisto <<< blocks, N_THREADS >>> (d_vals, d_histo, numElems);


    //more than 7x speedup over naiveHisto
    //(launch configuration lost in extraction; reconstructed: one shared
    //histogram of numBins counters per block)
    perBlockHisto<<<blocks, N_THREADS, numBins * sizeof(unsigned int)>>>(d_vals, d_histo, numElems, numBins);

    //if you want to use/launch more than one kernel,
    //feel free

    cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
}

--------------------------------------------------------------------------------
/ProblemSet5-OptimizedHistogram/timer.h:
--------------------------------------------------------------------------------
#ifndef GPU_TIMER_H__
#define GPU_TIMER_H__

#include <cuda_runtime.h> //header name lost in extraction; <cuda_runtime.h> is assumed (cudaEvent_t API)

struct GpuTimer
{
  cudaEvent_t start;
  cudaEvent_t stop;

  GpuTimer()
  {
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
  }

  ~GpuTimer()
  {
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
  }

  void Start()
  {
    cudaEventRecord(start, 0);
  }

  void Stop()
  {
    cudaEventRecord(stop, 0);
  }

  float Elapsed()
  {
    float elapsed;
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&elapsed, start, stop);
    return elapsed;
  }
};

#endif  /* GPU_TIMER_H__ */

--------------------------------------------------------------------------------
/ProblemSet5-OptimizedHistogram/utils.h:
--------------------------------------------------------------------------------
#ifndef UTILS_H__
#define UTILS_H__

//The seven header names below were lost in extraction; this list is a
//reconstruction based on what the code in this file uses.
#include <iostream>
#include <iomanip>
#include <cuda.h>
#include <cuda_runtime.h>
#include <cuda_runtime_api.h>
#include <cassert>
#include <cmath>

#define checkCudaErrors(val) check( (val), #val, __FILE__, __LINE__)

template<typename T>
void check(T err, const char*
const func, const char* const file, const int line) { 17 | if (err != cudaSuccess) { 18 | std::cerr << "CUDA error at: " << file << ":" << line << std::endl; 19 | std::cerr << cudaGetErrorString(err) << " " << func << std::endl; 20 | exit(1); 21 | } 22 | } 23 | 24 | template 25 | void checkResultsExact(const T* const ref, const T* const gpu, size_t numElem) { 26 | //check that the GPU result matches the CPU result 27 | for (size_t i = 0; i < numElem; ++i) { 28 | if (ref[i] != gpu[i]) { 29 | std::cerr << "Difference at pos " << i << std::endl; 30 | //the + is magic to convert char to int without messing 31 | //with other types 32 | std::cerr << "Reference: " << std::setprecision(17) << +ref[i] << 33 | "\nGPU : " << +gpu[i] << std::endl; 34 | exit(1); 35 | } 36 | } 37 | } 38 | 39 | template 40 | void checkResultsEps(const T* const ref, const T* const gpu, size_t numElem, double eps1, double eps2) { 41 | assert(eps1 >= 0 && eps2 >= 0); 42 | unsigned long long totalDiff = 0; 43 | unsigned numSmallDifferences = 0; 44 | for (size_t i = 0; i < numElem; ++i) { 45 | //subtract smaller from larger in case of unsigned types 46 | T smaller = std::min(ref[i], gpu[i]); 47 | T larger = std::max(ref[i], gpu[i]); 48 | T diff = larger - smaller; 49 | if (diff > 0 && diff <= eps1) { 50 | numSmallDifferences++; 51 | } 52 | else if (diff > eps1) { 53 | std::cerr << "Difference at pos " << +i << " exceeds tolerance of " << eps1 << std::endl; 54 | std::cerr << "Reference: " << std::setprecision(17) << +ref[i] << 55 | "\nGPU : " << +gpu[i] << std::endl; 56 | exit(1); 57 | } 58 | totalDiff += diff * diff; 59 | } 60 | double percentSmallDifferences = (double)numSmallDifferences / (double)numElem; 61 | if (percentSmallDifferences > eps2) { 62 | std::cerr << "Total percentage of non-zero pixel difference between the two images exceeds " << 100.0 * eps2 << "%" << std::endl; 63 | std::cerr << "Percentage of non-zero pixel differences: " << 100.0 * percentSmallDifferences << "%" << std::endl; 64 
| exit(1); 65 | } 66 | } 67 | 68 | //Uses the autodesk method of image comparison 69 | //Note the the tolerance here is in PIXELS not a percentage of input pixels 70 | template 71 | void checkResultsAutodesk(const T* const ref, const T* const gpu, size_t numElem, double variance, size_t tolerance) 72 | { 73 | 74 | size_t numBadPixels = 0; 75 | for (size_t i = 0; i < numElem; ++i) { 76 | T smaller = std::min(ref[i], gpu[i]); 77 | T larger = std::max(ref[i], gpu[i]); 78 | T diff = larger - smaller; 79 | if (diff > variance) 80 | ++numBadPixels; 81 | } 82 | 83 | if (numBadPixels > tolerance) { 84 | std::cerr << "Too many bad pixels in the image." << numBadPixels << "/" << tolerance << std::endl; 85 | exit(1); 86 | } 87 | } 88 | 89 | #endif 90 | -------------------------------------------------------------------------------- /ProblemSet6-SeamlessImageCloning/HW6.cu: -------------------------------------------------------------------------------- 1 | #include "utils.h" 2 | #include 3 | #include 4 | #include 5 | #include 6 | 7 | #include "loadSaveImage.h" 8 | #include 9 | 10 | 11 | //return types are void since any internal error will be handled by quitting 12 | //no point in returning error codes... 
13 | void preProcess( uchar4 **sourceImg, 14 | size_t &numRows, size_t &numCols, 15 | uchar4 **destImg, 16 | uchar4 **blendedImg, const std::string& source_filename, 17 | const std::string& dest_filename){ 18 | 19 | //make sure the context initializes ok 20 | checkCudaErrors(cudaFree(0)); 21 | 22 | size_t numRowsSource, numColsSource, numRowsDest, numColsDest; 23 | 24 | loadImageRGBA(source_filename, sourceImg, &numRowsSource, &numColsSource); 25 | loadImageRGBA(dest_filename, destImg, &numRowsDest, &numColsDest); 26 | 27 | assert(numRowsSource == numRowsDest); 28 | assert(numColsSource == numColsDest); 29 | 30 | numRows = numRowsSource; 31 | numCols = numColsSource; 32 | 33 | *blendedImg = new uchar4[numRows * numCols]; 34 | 35 | } 36 | 37 | void postProcess(const uchar4* const blendedImg, 38 | const size_t numRowsDest, const size_t numColsDest, 39 | const std::string& output_file) 40 | { 41 | //just need to save the image... 42 | saveImageRGBA(blendedImg, numRowsDest, numColsDest, output_file); 43 | } 44 | -------------------------------------------------------------------------------- /ProblemSet6-SeamlessImageCloning/HW6_differenceImage.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet6-SeamlessImageCloning/HW6_differenceImage.png -------------------------------------------------------------------------------- /ProblemSet6-SeamlessImageCloning/HW6_output.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet6-SeamlessImageCloning/HW6_output.png -------------------------------------------------------------------------------- /ProblemSet6-SeamlessImageCloning/HW6_reference.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet6-SeamlessImageCloning/HW6_reference.png -------------------------------------------------------------------------------- /ProblemSet6-SeamlessImageCloning/blended.gold: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet6-SeamlessImageCloning/blended.gold -------------------------------------------------------------------------------- /ProblemSet6-SeamlessImageCloning/compare.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include "utils.h" 3 | 4 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck, 5 | double perPixelError, double globalError) 6 | { 7 | cv::Mat reference = cv::imread(reference_filename, -1); 8 | cv::Mat test = cv::imread(test_filename, -1); 9 | 10 | cv::Mat diff = abs(reference - test); 11 | 12 | cv::Mat diffSingleChannel = diff.reshape(1, 0); //convert to 1 channel, same # rows 13 | 14 | double minVal, maxVal; 15 | 16 | cv::minMaxLoc(diffSingleChannel, &minVal, &maxVal, NULL, NULL); //NULL because we don't care about location 17 | 18 | //now perform transform so that we bump values to the full range 19 | 20 | diffSingleChannel = (diffSingleChannel - minVal) * (255. / (maxVal - minVal)); 21 | 22 | diff = diffSingleChannel.reshape(reference.channels(), 0); 23 | 24 | cv::imwrite("HW6_differenceImage.png", diff); 25 | //OK, now we can start comparing values... 
26 | unsigned char *referencePtr = reference.ptr(0); 27 | unsigned char *testPtr = test.ptr(0); 28 | 29 | if (useEpsCheck) { 30 | checkResultsEps(referencePtr, testPtr, reference.rows * reference.cols * reference.channels(), perPixelError, globalError); 31 | } 32 | else 33 | { 34 | checkResultsExact(referencePtr, testPtr, reference.rows * reference.cols * reference.channels()); 35 | } 36 | 37 | std::cout << "PASS" << std::endl; 38 | return; 39 | } 40 | -------------------------------------------------------------------------------- /ProblemSet6-SeamlessImageCloning/compare.h: -------------------------------------------------------------------------------- 1 | #ifndef HW3_H__ 2 | #define HW3_H__ 3 | 4 | void compareImages(std::string reference_filename, std::string test_filename, bool useEpsCheck, 5 | double perPixelError, double globalError); 6 | 7 | #endif 8 | -------------------------------------------------------------------------------- /ProblemSet6-SeamlessImageCloning/destination.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet6-SeamlessImageCloning/destination.png -------------------------------------------------------------------------------- /ProblemSet6-SeamlessImageCloning/loadSaveImage.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include "cuda_runtime.h" 6 | 7 | //The caller becomes responsible for the returned pointer. This 8 | //is done in the interest of keeping this code as simple as possible. 9 | //In production code this is a bad idea - we should use RAII 10 | //to ensure the memory is freed. DO NOT COPY THIS AND USE IN PRODUCTION 11 | //CODE!!! 
12 | void loadImageHDR(const std::string &filename, 13 | float **imagePtr, 14 | size_t *numRows, size_t *numCols) 15 | { 16 | cv::Mat image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR | CV_LOAD_IMAGE_ANYDEPTH); 17 | if (image.empty()) { 18 | std::cerr << "Couldn't open file: " << filename << std::endl; 19 | exit(1); 20 | } 21 | 22 | if (image.channels() != 3) { 23 | std::cerr << "Image must be color!" << std::endl; 24 | exit(1); 25 | } 26 | 27 | if (!image.isContinuous()) { 28 | std::cerr << "Image isn't continuous!" << std::endl; 29 | exit(1); 30 | } 31 | 32 | *imagePtr = new float[image.rows * image.cols * image.channels()]; 33 | 34 | float *cvPtr = image.ptr(0); 35 | for (size_t i = 0; i < image.rows * image.cols * image.channels(); ++i) 36 | (*imagePtr)[i] = cvPtr[i]; 37 | 38 | *numRows = image.rows; 39 | *numCols = image.cols; 40 | } 41 | 42 | void loadImageGrey(const std::string &filename, 43 | unsigned char **imagePtr, 44 | size_t *numRows, size_t *numCols) 45 | { 46 | cv::Mat image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_GRAYSCALE); 47 | if (image.empty()) { 48 | std::cerr << "Couldn't open file: " << filename << std::endl; 49 | exit(1); 50 | } 51 | 52 | if (image.channels() != 1) { 53 | std::cerr << "Image must be greyscale!" << std::endl; 54 | exit(1); 55 | } 56 | 57 | if (!image.isContinuous()) { 58 | std::cerr << "Image isn't continuous!" 
<< std::endl; 59 | exit(1); 60 | } 61 | 62 | *imagePtr = new unsigned char[image.rows * image.cols]; 63 | 64 | unsigned char *cvPtr = image.ptr(0); 65 | for (size_t i = 0; i < image.rows * image.cols; ++i) { 66 | (*imagePtr)[i] = cvPtr[i]; 67 | } 68 | 69 | *numRows = image.rows; 70 | *numCols = image.cols; 71 | } 72 | void loadImageRGBA(const std::string &filename, 73 | uchar4 **imagePtr, 74 | size_t *numRows, size_t *numCols) 75 | { 76 | cv::Mat image = cv::imread(filename.c_str(), CV_LOAD_IMAGE_COLOR); 77 | if (image.empty()) { 78 | std::cerr << "Couldn't open file: " << filename << std::endl; 79 | exit(1); 80 | } 81 | 82 | if (image.channels() != 3) { 83 | std::cerr << "Image must be color!" << std::endl; 84 | exit(1); 85 | } 86 | 87 | if (!image.isContinuous()) { 88 | std::cerr << "Image isn't continuous!" << std::endl; 89 | exit(1); 90 | } 91 | 92 | cv::Mat imageRGBA; 93 | cv::cvtColor(image, imageRGBA, CV_BGR2RGBA); 94 | 95 | *imagePtr = new uchar4[image.rows * image.cols]; 96 | 97 | unsigned char *cvPtr = imageRGBA.ptr(0); 98 | for (size_t i = 0; i < image.rows * image.cols; ++i) { 99 | (*imagePtr)[i].x = cvPtr[4 * i + 0]; 100 | (*imagePtr)[i].y = cvPtr[4 * i + 1]; 101 | (*imagePtr)[i].z = cvPtr[4 * i + 2]; 102 | (*imagePtr)[i].w = cvPtr[4 * i + 3]; 103 | } 104 | 105 | *numRows = image.rows; 106 | *numCols = image.cols; 107 | } 108 | 109 | void saveImageRGBA(const uchar4* const image, 110 | const size_t numRows, const size_t numCols, 111 | const std::string &output_file) 112 | { 113 | int sizes[2]; 114 | sizes[0] = numRows; 115 | sizes[1] = numCols; 116 | cv::Mat imageRGBA(2, sizes, CV_8UC4, (void *)image); 117 | cv::Mat imageOutputBGR; 118 | cv::cvtColor(imageRGBA, imageOutputBGR, CV_RGBA2BGR); 119 | //output the image 120 | cv::imwrite(output_file.c_str(), imageOutputBGR); 121 | } 122 | 123 | //output an exr file 124 | //assumed to already be BGR 125 | void saveImageHDR(const float* const image, 126 | const size_t numRows, const size_t numCols, 127 | const 
std::string &output_file) 128 | { 129 | int sizes[2]; 130 | sizes[0] = numRows; 131 | sizes[1] = numCols; 132 | 133 | cv::Mat imageHDR(2, sizes, CV_32FC3, (void *)image); 134 | 135 | imageHDR = imageHDR * 255; 136 | 137 | cv::imwrite(output_file.c_str(), imageHDR); 138 | } 139 | -------------------------------------------------------------------------------- /ProblemSet6-SeamlessImageCloning/loadSaveImage.h: -------------------------------------------------------------------------------- 1 | #ifndef LOADSAVEIMAGE_H__ 2 | #define LOADSAVEIMAGE_H__ 3 | 4 | #include 5 | #include //for uchar4 6 | 7 | void loadImageHDR(const std::string &filename, 8 | float **imagePtr, 9 | size_t *numRows, size_t *numCols); 10 | 11 | void loadImageRGBA(const std::string &filename, 12 | uchar4 **imagePtr, 13 | size_t *numRows, size_t *numCols); 14 | 15 | void loadImageGrey(const std::string &filename, 16 | unsigned char **imagePtr, 17 | size_t *numRows, size_t *numCols); 18 | 19 | void saveImageRGBA(const uchar4* const image, 20 | const size_t numRows, const size_t numCols, 21 | const std::string &output_file); 22 | 23 | void saveImageHDR(const float* const image, 24 | const size_t numRows, const size_t numCols, 25 | const std::string &output_file); 26 | 27 | #endif 28 | -------------------------------------------------------------------------------- /ProblemSet6-SeamlessImageCloning/main.cpp: -------------------------------------------------------------------------------- 1 | //Udacity HW6 Driver 2 | 3 | #include 4 | #include "timer.h" 5 | #include "utils.h" 6 | #include 7 | #include 8 | 9 | #include 10 | #include 11 | #include 12 | 13 | #include "reference_calc.h" 14 | #include "compare.h" 15 | 16 | void preProcess( uchar4 **sourceImg, size_t &numRowsSource, size_t &numColsSource, 17 | uchar4 **destImg, 18 | uchar4 **blendedImg, const std::string& source_filename, 19 | const std::string& dest_filename); 20 | 21 | void postProcess(const uchar4* const blendedImg, 22 | const size_t 
numRowsDest, const size_t numColsDest, 23 | const std::string& output_file); 24 | 25 | void your_blend(const uchar4* const sourceImg, 26 | const size_t numRowsSource, const size_t numColsSource, 27 | const uchar4* const destImg, 28 | uchar4* const blendedImg); 29 | 30 | int main(int argc, char **argv) { 31 | uchar4 *h_sourceImg, *h_destImg, *h_blendedImg; 32 | size_t numRowsSource, numColsSource; 33 | 34 | std::string input_source_file; 35 | std::string input_dest_file; 36 | std::string output_file; 37 | 38 | std::string reference_file; 39 | double perPixelError = 0.0; 40 | double globalError = 0.0; 41 | bool useEpsCheck = false; 42 | 43 | switch (argc) 44 | { 45 | case 3: 46 | input_source_file = std::string(argv[1]); 47 | input_dest_file = std::string(argv[2]); 48 | output_file = "HW6_output.png"; 49 | reference_file = "HW6_reference.png"; 50 | break; 51 | case 4: 52 | input_source_file = std::string(argv[1]); 53 | input_dest_file = std::string(argv[2]); 54 | output_file = std::string(argv[3]); 55 | reference_file = "HW6_reference.png"; 56 | break; 57 | case 5: 58 | input_source_file = std::string(argv[1]); 59 | input_dest_file = std::string(argv[2]); 60 | output_file = std::string(argv[3]); 61 | reference_file = std::string(argv[4]); 62 | break; 63 | case 7: 64 | useEpsCheck=true; 65 | input_source_file = std::string(argv[1]); 66 | input_dest_file = std::string(argv[2]); 67 | output_file = std::string(argv[3]); 68 | reference_file = std::string(argv[4]); 69 | perPixelError = atof(argv[5]); 70 | globalError = atof(argv[6]); 71 | break; 72 | default: 73 | std::cerr << "Usage: ./HW6 input_source_file input_dest_filename [output_filename] [reference_filename] [perPixelError] [globalError]" << std::endl; 74 | exit(1); 75 | } 76 | 77 | //load the image and give us our input and output pointers 78 | preProcess(&h_sourceImg, numRowsSource, numColsSource, 79 | &h_destImg, 80 | &h_blendedImg, input_source_file, input_dest_file); 81 | 82 | GpuTimer timer; 83 | 
timer.Start(); 84 | 85 | //call the students' code 86 | your_blend(h_sourceImg, numRowsSource, numColsSource, 87 | h_destImg, 88 | h_blendedImg); 89 | 90 | timer.Stop(); 91 | cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError()); 92 | int err = printf("Your code ran in: %f msecs.\n", timer.Elapsed()); 93 | printf("\n"); 94 | if (err < 0) { 95 | //Couldn't print! Probably the student closed stdout - bad news 96 | std::cerr << "Couldn't print timing information! STDOUT Closed!" << std::endl; 97 | exit(1); 98 | } 99 | 100 | //save the blended image produced by the student code 101 | postProcess(h_blendedImg, numRowsSource, numColsSource, output_file); 102 | 103 | // calculate the reference image 104 | uchar4* h_reference = new uchar4[numRowsSource*numColsSource]; 105 | reference_calc(h_sourceImg, numRowsSource, numColsSource, 106 | h_destImg, h_reference); 107 | 108 | // save the reference image 109 | postProcess(h_reference, numRowsSource, numColsSource, reference_file); 110 | 111 | compareImages(reference_file, output_file, useEpsCheck, perPixelError, globalError); 112 | 113 | delete[] h_reference; 114 | delete[] h_destImg; 115 | delete[] h_sourceImg; 116 | delete[] h_blendedImg; 117 | return 0; 118 | } 119 | 120 | -------------------------------------------------------------------------------- /ProblemSet6-SeamlessImageCloning/reference_calc.cpp: -------------------------------------------------------------------------------- 1 | //Udacity HW 6 2 | //Poisson Blending Reference Calculation 3 | 4 | #include "utils.h" 5 | #include 6 | 7 | //Performs one iteration of the solver 8 | void computeIteration(const unsigned char* const dstImg, 9 | const unsigned char* const strictInteriorPixels, 10 | const unsigned char* const borderPixels, 11 | const std::vector<uint2>& interiorPixelList, 12 | const size_t numColsSource, 13 | const float* const f, 14 | const float* const g, 15 | float* const f_next) 16 | { 17 | unsigned int off = interiorPixelList[0].x * numColsSource + 
interiorPixelList[0].y; 18 | 19 | for (size_t i = 0; i < interiorPixelList.size(); ++i) { 20 | float blendedSum = 0.f; 21 | float borderSum = 0.f; 22 | 23 | uint2 coord = interiorPixelList[i]; 24 | 25 | unsigned int offset = coord.x * numColsSource + coord.y; 26 | 27 | //process all 4 neighbor pixels 28 | //for each pixel if it is an interior pixel 29 | //then we add the previous f, otherwise if it is a 30 | //border pixel then we add the value of the destination 31 | //image at the border. These border values are our boundary 32 | //conditions. 33 | if (strictInteriorPixels[offset - 1]) { 34 | blendedSum += f[offset - 1]; 35 | } 36 | else { 37 | borderSum += dstImg[offset - 1]; 38 | } 39 | 40 | if (strictInteriorPixels[offset + 1]) { 41 | blendedSum += f[offset + 1]; 42 | } 43 | else { 44 | borderSum += dstImg[offset + 1]; 45 | } 46 | 47 | if (strictInteriorPixels[offset - numColsSource]) { 48 | blendedSum += f[offset - numColsSource]; 49 | } 50 | else { 51 | borderSum += dstImg[offset - numColsSource]; 52 | } 53 | 54 | if (strictInteriorPixels[offset + numColsSource]) { 55 | blendedSum += f[offset + numColsSource]; 56 | } 57 | else { 58 | borderSum += dstImg[offset + numColsSource]; 59 | } 60 | 61 | float f_next_val = (blendedSum + borderSum + g[offset]) / 4.f; 62 | 63 | f_next[offset] = std::min(255.f, std::max(0.f, f_next_val)); //clip to [0, 255] 64 | } 65 | 66 | } 67 | 68 | //pre-compute the values of g, which depend only the source image 69 | //and aren't iteration dependent. 
70 | void computeG(const unsigned char* const channel, 71 | float* const g, 72 | const size_t numColsSource, 73 | const std::vector<uint2>& interiorPixelList) 74 | { 75 | for (size_t i = 0; i < interiorPixelList.size(); ++i) { 76 | uint2 coord = interiorPixelList[i]; 77 | unsigned int offset = coord.x * numColsSource + coord.y; 78 | 79 | float sum = 4.f * channel[offset]; 80 | 81 | sum -= (float)channel[offset - 1] + (float)channel[offset + 1]; 82 | sum -= (float)channel[offset + numColsSource] + (float)channel[offset - numColsSource]; 83 | 84 | g[offset] = sum; 85 | } 86 | } 87 | 88 | void reference_calc(const uchar4* const h_sourceImg, 89 | const size_t numRowsSource, const size_t numColsSource, 90 | const uchar4* const h_destImg, 91 | uchar4* const h_blendedImg){ 92 | 93 | //we need to create a list of border pixels and interior pixels 94 | //this is a conceptually simple implementation, not a particularly efficient one... 95 | 96 | //first create mask 97 | size_t srcSize = numRowsSource * numColsSource; 98 | unsigned char* mask = new unsigned char[srcSize]; 99 | 100 | for (int i = 0; i < srcSize; ++i) { 101 | mask[i] = (h_sourceImg[i].x + h_sourceImg[i].y + h_sourceImg[i].z < 3 * 255) ? 1 : 0; 102 | } 103 | 104 | //next compute strictly interior pixels and border pixels 105 | unsigned char *borderPixels = new unsigned char[srcSize]; 106 | unsigned char *strictInteriorPixels = new unsigned char[srcSize]; 107 | 108 | std::vector<uint2> interiorPixelList; 109 | 110 | //the source region in the homework isn't near an image boundary, so we can 111 | //simplify the conditionals a little... 
112 | for (size_t r = 1; r < numRowsSource - 1; ++r) { 113 | for (size_t c = 1; c < numColsSource - 1; ++c) { 114 | if (mask[r * numColsSource + c]) { 115 | if (mask[(r -1) * numColsSource + c] && mask[(r + 1) * numColsSource + c] && 116 | mask[r * numColsSource + c - 1] && mask[r * numColsSource + c + 1]) { 117 | strictInteriorPixels[r * numColsSource + c] = 1; 118 | borderPixels[r * numColsSource + c] = 0; 119 | interiorPixelList.push_back(make_uint2(r, c)); 120 | } 121 | else { 122 | strictInteriorPixels[r * numColsSource + c] = 0; 123 | borderPixels[r * numColsSource + c] = 1; 124 | } 125 | } 126 | else { 127 | strictInteriorPixels[r * numColsSource + c] = 0; 128 | borderPixels[r * numColsSource + c] = 0; 129 | 130 | } 131 | } 132 | } 133 | 134 | //split the source and destination images into their respective 135 | //channels 136 | unsigned char* red_src = new unsigned char[srcSize]; 137 | unsigned char* blue_src = new unsigned char[srcSize]; 138 | unsigned char* green_src = new unsigned char[srcSize]; 139 | 140 | for (int i = 0; i < srcSize; ++i) { 141 | red_src[i] = h_sourceImg[i].x; 142 | blue_src[i] = h_sourceImg[i].y; 143 | green_src[i] = h_sourceImg[i].z; 144 | } 145 | 146 | unsigned char* red_dst = new unsigned char[srcSize]; 147 | unsigned char* blue_dst = new unsigned char[srcSize]; 148 | unsigned char* green_dst = new unsigned char[srcSize]; 149 | 150 | for (int i = 0; i < srcSize; ++i) { 151 | red_dst[i] = h_destImg[i].x; 152 | blue_dst[i] = h_destImg[i].y; 153 | green_dst[i] = h_destImg[i].z; 154 | } 155 | 156 | //next we'll precompute the g term - it never changes, no need to recompute every iteration 157 | float *g_red = new float[srcSize]; 158 | float *g_blue = new float[srcSize]; 159 | float *g_green = new float[srcSize]; 160 | 161 | memset(g_red, 0, srcSize * sizeof(float)); 162 | memset(g_blue, 0, srcSize * sizeof(float)); 163 | memset(g_green, 0, srcSize * sizeof(float)); 164 | 165 | computeG(red_src, g_red, numColsSource, interiorPixelList); 
166 | computeG(blue_src, g_blue, numColsSource, interiorPixelList); 167 | computeG(green_src, g_green, numColsSource, interiorPixelList); 168 | 169 | //for each color channel we'll need two buffers and we'll ping-pong between them 170 | float *blendedValsRed_1 = new float[srcSize]; 171 | float *blendedValsRed_2 = new float[srcSize]; 172 | 173 | float *blendedValsBlue_1 = new float[srcSize]; 174 | float *blendedValsBlue_2 = new float[srcSize]; 175 | 176 | float *blendedValsGreen_1 = new float[srcSize]; 177 | float *blendedValsGreen_2 = new float[srcSize]; 178 | 179 | //IC is the source image, copy over 180 | for (size_t i = 0; i < srcSize; ++i) { 181 | blendedValsRed_1[i] = red_src[i]; 182 | blendedValsRed_2[i] = red_src[i]; 183 | blendedValsBlue_1[i] = blue_src[i]; 184 | blendedValsBlue_2[i] = blue_src[i]; 185 | blendedValsGreen_1[i] = green_src[i]; 186 | blendedValsGreen_2[i] = green_src[i]; 187 | } 188 | 189 | //Perform the solve on each color channel 190 | const size_t numIterations = 800; 191 | for (size_t i = 0; i < numIterations; ++i) { 192 | computeIteration(red_dst, strictInteriorPixels, borderPixels, 193 | interiorPixelList, numColsSource, blendedValsRed_1, g_red, 194 | blendedValsRed_2); 195 | 196 | std::swap(blendedValsRed_1, blendedValsRed_2); 197 | } 198 | 199 | for (size_t i = 0; i < numIterations; ++i) { 200 | computeIteration(blue_dst, strictInteriorPixels, borderPixels, 201 | interiorPixelList, numColsSource, blendedValsBlue_1, g_blue, 202 | blendedValsBlue_2); 203 | 204 | std::swap(blendedValsBlue_1, blendedValsBlue_2); 205 | } 206 | 207 | for (size_t i = 0; i < numIterations; ++i) { 208 | computeIteration(green_dst, strictInteriorPixels, borderPixels, 209 | interiorPixelList, numColsSource, blendedValsGreen_1, g_green, 210 | blendedValsGreen_2); 211 | 212 | std::swap(blendedValsGreen_1, blendedValsGreen_2); 213 | } 214 | std::swap(blendedValsRed_1, blendedValsRed_2); //put output into _2 215 | std::swap(blendedValsBlue_1, blendedValsBlue_2); 
//put output into _2 216 | std::swap(blendedValsGreen_1, blendedValsGreen_2); //put output into _2 217 | 218 | //copy the destination image to the output 219 | memcpy(h_blendedImg, h_destImg, sizeof(uchar4) * srcSize); 220 | 221 | //copy computed values for the interior into the output 222 | for (size_t i = 0; i < interiorPixelList.size(); ++i) { 223 | uint2 coord = interiorPixelList[i]; 224 | 225 | unsigned int offset = coord.x * numColsSource + coord.y; 226 | 227 | h_blendedImg[offset].x = blendedValsRed_2[offset]; 228 | h_blendedImg[offset].y = blendedValsBlue_2[offset]; 229 | h_blendedImg[offset].z = blendedValsGreen_2[offset]; 230 | } 231 | 232 | //wow, we allocated a lot of memory! 233 | delete[] mask; 234 | delete[] blendedValsRed_1; 235 | delete[] blendedValsRed_2; 236 | delete[] blendedValsBlue_1; 237 | delete[] blendedValsBlue_2; 238 | delete[] blendedValsGreen_1; 239 | delete[] blendedValsGreen_2; 240 | delete[] g_red; 241 | delete[] g_blue; 242 | delete[] g_green; 243 | delete[] red_src; 244 | delete[] red_dst; 245 | delete[] blue_src; 246 | delete[] blue_dst; 247 | delete[] green_src; 248 | delete[] green_dst; 249 | delete[] borderPixels; 250 | delete[] strictInteriorPixels; 251 | } 252 | 253 | -------------------------------------------------------------------------------- /ProblemSet6-SeamlessImageCloning/reference_calc.h: -------------------------------------------------------------------------------- 1 | #ifndef REFERENCE_H__ 2 | #define REFERENCE_H__ 3 | 4 | void reference_calc(const uchar4* const h_sourceImg, 5 | const size_t numRowsSource, const size_t numColsSource, 6 | const uchar4* const h_destImg, 7 | uchar4* const h_blendedImg); 8 | 9 | #endif 10 | -------------------------------------------------------------------------------- /ProblemSet6-SeamlessImageCloning/source.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/nickspell/udacity-IntroToParallelProgramming/4e5a0a4866ab02831099b4623cb715859178bd99/ProblemSet6-SeamlessImageCloning/source.png -------------------------------------------------------------------------------- /ProblemSet6-SeamlessImageCloning/timer.h: -------------------------------------------------------------------------------- 1 | #ifndef GPU_TIMER_H__ 2 | #define GPU_TIMER_H__ 3 | 4 | #include 5 | 6 | struct GpuTimer 7 | { 8 | cudaEvent_t start; 9 | cudaEvent_t stop; 10 | 11 | GpuTimer() 12 | { 13 | cudaEventCreate(&start); 14 | cudaEventCreate(&stop); 15 | } 16 | 17 | ~GpuTimer() 18 | { 19 | cudaEventDestroy(start); 20 | cudaEventDestroy(stop); 21 | } 22 | 23 | void Start() 24 | { 25 | cudaEventRecord(start, 0); 26 | } 27 | 28 | void Stop() 29 | { 30 | cudaEventRecord(stop, 0); 31 | } 32 | 33 | float Elapsed() 34 | { 35 | float elapsed; 36 | cudaEventSynchronize(stop); 37 | cudaEventElapsedTime(&elapsed, start, stop); 38 | return elapsed; 39 | } 40 | }; 41 | 42 | #endif /* GPU_TIMER_H__ */ 43 | -------------------------------------------------------------------------------- /ProblemSet6-SeamlessImageCloning/utils.h: -------------------------------------------------------------------------------- 1 | #ifndef UTILS_H__ 2 | #define UTILS_H__ 3 | 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | 13 | #define checkCudaErrors(val) check( (val), #val, __FILE__, __LINE__) 14 | 15 | template<typename T> 16 | void check(T err, const char* const func, const char* const file, const int line) { 17 | if (err != cudaSuccess) { 18 | std::cerr << "CUDA error at: " << file << ":" << line << std::endl; 19 | std::cerr << cudaGetErrorString(err) << " " << func << std::endl; 20 | exit(1); 21 | } 22 | } 23 | 24 | template<typename T> 25 | void checkResultsExact(const T* const ref, const T* const gpu, size_t numElem) { 26 | //check that the GPU result matches the CPU result 27 | for (size_t i = 0; i < numElem; 
++i) { 28 | if (ref[i] != gpu[i]) { 29 | std::cerr << "Difference at pos " << i << std::endl; 30 | //the + is magic to convert char to int without messing 31 | //with other types 32 | std::cerr << "Reference: " << std::setprecision(17) << +ref[i] << 33 | "\nGPU : " << +gpu[i] << std::endl; 34 | exit(1); 35 | } 36 | } 37 | } 38 | 39 | template<typename T> 40 | void checkResultsEps(const T* const ref, const T* const gpu, size_t numElem, double eps1, double eps2) { 41 | assert(eps1 >= 0 && eps2 >= 0); 42 | unsigned long long totalDiff = 0; 43 | unsigned numSmallDifferences = 0; 44 | for (size_t i = 0; i < numElem; ++i) { 45 | //subtract smaller from larger in case of unsigned types 46 | T smaller = std::min(ref[i], gpu[i]); 47 | T larger = std::max(ref[i], gpu[i]); 48 | T diff = larger - smaller; 49 | if (diff > 0 && diff <= eps1) { 50 | numSmallDifferences++; 51 | } 52 | else if (diff > eps1) { 53 | std::cerr << "Difference at pos " << +i << " exceeds tolerance of " << eps1 << std::endl; 54 | std::cerr << "Reference: " << std::setprecision(17) << +ref[i] << 55 | "\nGPU : " << +gpu[i] << std::endl; 56 | exit(1); 57 | } 58 | totalDiff += diff * diff; 59 | } 60 | double percentSmallDifferences = (double)numSmallDifferences / (double)numElem; 61 | if (percentSmallDifferences > eps2) { 62 | std::cerr << "Total percentage of non-zero pixel difference between the two images exceeds " << 100.0 * eps2 << "%" << std::endl; 63 | std::cerr << "Percentage of non-zero pixel differences: " << 100.0 * percentSmallDifferences << "%" << std::endl; 64 | exit(1); 65 | } 66 | } 67 | 68 | //Uses the autodesk method of image comparison 69 | //Note that the tolerance here is in PIXELS not a percentage of input pixels 70 | template<typename T> 71 | void checkResultsAutodesk(const T* const ref, const T* const gpu, size_t numElem, double variance, size_t tolerance) 72 | { 73 | 74 | size_t numBadPixels = 0; 75 | for (size_t i = 0; i < numElem; ++i) { 76 | T smaller = std::min(ref[i], gpu[i]); 77 | T larger = 
std::max(ref[i], gpu[i]); 78 | T diff = larger - smaller; 79 | if (diff > variance) 80 | ++numBadPixels; 81 | } 82 | 83 | if (numBadPixels > tolerance) { 84 | std::cerr << "Too many bad pixels in the image." << numBadPixels << "/" << tolerance << std::endl; 85 | exit(1); 86 | } 87 | } 88 | 89 | #endif 90 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # udacity-IntroToParallelProgramming 2 | CS344 - Introduction To Parallel Programming course (Udacity) proposed solutions 3 | 4 | Testing Environment: Visual Studio 2015 x64 + nVidia CUDA 8.0 + OpenCV 3.2.0 5 | 6 | For each problem set, the core of the algorithm to be implemented is located in the _student_func.cu_ file. 7 | 8 | ## Problem Set 1 - RGB2Gray: 9 | ### Objective 10 | Convert an input RGBA image into a grayscale version (ignoring the A channel). 11 | ### Topics 12 | Example of a **map** primitive operation on a data structure. 13 | 14 | ## Problem Set 2 - Blur 15 | ### Objective 16 | Apply a Gaussian blur convolution filter to an input RGBA image (blur each channel independently, ignoring the A channel). 17 | ### Topics 18 | Example of a **stencil** primitive operation on a 2D array. Use of **shared memory** to speed up the algorithm. Both global memory and shared memory based kernels are provided, the latter providing approx. 1.6x speedup over the former. 19 | 20 | ## Problem Set 3 - Tone Mapping 21 | ### Objective 22 | Map a High Dynamic Range image into an image for a device supporting a smaller range of intensity values. 23 | ### Topics 24 | - Compute range of intensity values of the input image: min and max **reduce** implemented. 
25 | - Compute **histogram** of intensity values (1024-values array) 26 | - Compute the cumulative distribution function of the histogram: Hillis & Steele **scan** algorithm (step-efficient, well suited for small arrays like the histogram one). 27 | 28 | ## Problem Set 4 - Red eyes removal 29 | ### Objective 30 | Remove the red-eye effect from an input RGBA image (it uses Normalized Cross Correlation against a training template). 31 | ### Topics 32 | Sorting algorithms with GPU: given an input array of NCC scores, sort it in ascending order: **radix sort**. For each bit: 33 | - Compute a predicate vector (0:false, 1:true) 34 | - Perform a **Blelloch Scan** on the predicate vector (for both the false and the true cases) 35 | - From the Blelloch Scan, extract: a histogram of predicate values [0 numberOfFalses], an offset vector (the actual result of scan) 36 | - A move kernel computes the new index of each element (using the two structures above), and moves it. 37 | 38 | ## Problem Set 5 - Optimized histogram computation 39 | ### Objective 40 | Improve the histogram computation performance on GPU over the simple global atomic solution. 41 | ### Topics 42 | **Per-block** histogram computation. Each block computes its own histogram in shared memory, and histograms are combined at the end in global memory (more than 7x speedup over global atomic implementation, while being relatively simple). 43 | 44 | ## Problem Set 6 - Seamless Image Cloning 45 | ### Objective 46 | Given a target image (e.g. a swimming pool), do a seamless attachment of a source image mask (e.g. a hippo). 47 | ### Topics 48 | The algorithm consists of performing Jacobi iterations on the source and target image to blend one with the other. 49 | - Given the mask, detect the interior points and the boundary points 50 | - Since the algorithm has to be performed only on the interior points, compute the **bounding box** of the mask region to restrict the Jacobi iterations to a subimage. 
51 | - Split the images into the R, G and B channels. 52 | - Run 800 Jacobi iterations on each channel. The code makes use of **CUDA Streams** to run the same kernel concurrently on the 3 different channels (speedup of 3x on my machine, of 1.5x on the Udacity machine). The Jacobi kernel makes extensive use of shared memory, so the number of threads per block has been reduced to maximize SM occupancy. 53 | - Recombine the 3 channels to form the output image. 54 | --------------------------------------------------------------------------------
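The Jacobi steps listed for Problem Set 6 can be illustrated with a small host-side sketch. This is not the CUDA kernel from _student_func.cu_. It is a plain C++ rendition of the same update rule used by `computeIteration` and `computeG` in `reference_calc.cpp`, written for a single channel. The names `jacobiSweep`, `computeGuidance` and `blendChannel` are hypothetical, and the sketch assumes every interior pixel has all four neighbours inside the image.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Guidance term g: 4 * src[p] minus the four neighbours of p in the source
// channel. It depends only on the source image, so it is computed once.
std::vector<float> computeGuidance(const std::vector<unsigned char>& src,
                                   const std::vector<unsigned char>& interior,
                                   std::size_t numCols)
{
    std::vector<float> g(src.size(), 0.f);
    for (std::size_t p = 0; p < src.size(); ++p) {
        if (!interior[p]) continue;
        g[p] = 4.f * src[p]
             - static_cast<float>(src[p - 1]) - static_cast<float>(src[p + 1])
             - static_cast<float>(src[p - numCols]) - static_cast<float>(src[p + numCols]);
    }
    return g;
}

// One Jacobi sweep: each interior pixel becomes the clamped average of its
// four neighbours plus g. Interior neighbours contribute the previous guess f;
// any other neighbour contributes the destination image (boundary condition).
void jacobiSweep(const std::vector<unsigned char>& dst,
                 const std::vector<unsigned char>& interior,
                 const std::vector<float>& g,
                 const std::vector<float>& f,
                 std::vector<float>& fNext,
                 std::size_t numCols)
{
    assert(dst.size() == interior.size() && g.size() == f.size() &&
           f.size() == fNext.size() && dst.size() == f.size());
    for (std::size_t p = 0; p < f.size(); ++p) {
        if (!interior[p]) continue;
        const std::size_t nbr[4] = {p - 1, p + 1, p - numCols, p + numCols};
        float sum = 0.f;
        for (std::size_t n : nbr) {
            sum += interior[n] ? f[n] : static_cast<float>(dst[n]);
        }
        fNext[p] = std::min(255.f, std::max(0.f, (sum + g[p]) / 4.f)); // clip to [0, 255]
    }
}

// Ping-pong between two buffers, both initialised with the source channel.
// After each swap the most recent result lives in `current`.
std::vector<float> blendChannel(const std::vector<unsigned char>& src,
                                const std::vector<unsigned char>& dst,
                                const std::vector<unsigned char>& interior,
                                std::size_t numCols,
                                std::size_t numIterations)
{
    const std::vector<float> g = computeGuidance(src, interior, numCols);
    std::vector<float> current(src.begin(), src.end());
    std::vector<float> next = current;
    for (std::size_t it = 0; it < numIterations; ++it) {
        jacobiSweep(dst, interior, g, current, next, numCols);
        std::swap(current, next);
    }
    return current;
}
```

Calling `blendChannel` once per colour channel with 800 iterations mirrors the reference flow. The three calls are independent of each other, which is the property that lets the GPU version launch one kernel per channel on separate CUDA streams.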