├── AESRand_Paper
    ├── .gitignore
    ├── README.md
    ├── Makefile
    ├── Makefile.power9
    ├── Makefile.arm
    ├── UnitTest.cpp
    ├── AESRand.h
    ├── test.c
    ├── AESRand.hpp
    └── AESRand.cpp
├── AESRand
    ├── AESRand
    │   ├── pch.h
    │   ├── pch.cpp
    │   ├── AESRand.cpp
    │   ├── AESRand.vcxproj.filters
    │   ├── others.cpp
    │   └── AESRand.vcxproj
    ├── FloatTest
    │   ├── pch.h
    │   ├── pch.cpp
    │   ├── FloatTest.cpp
    │   ├── FloatTest.vcxproj.filters
    │   └── FloatTest.vcxproj
    ├── IntegerRangeTest
    │   ├── pch.h
    │   ├── pch.cpp
    │   ├── IntegerRangeTest.cpp
    │   ├── IntegerRangeTest.vcxproj.filters
    │   └── IntegerRangeTest.vcxproj
    └── AESRand.sln
├── AESRand_Linux
    ├── .gitignore
    ├── Makefile
    ├── AESRand.cpp
    ├── AESRand_BigCrush.cpp
    ├── README
    └── AESRand_BigCrush2.cpp
├── LICENSE
├── BenchmarkResults.md
├── .gitignore
├── PractRand.md
└── README.md


/AESRand_Paper/.gitignore:
--------------------------------------------------------------------------------
1 | AESRand
2 | AESRand.o
3 | AESRand.s
4 | *.swp
5 | UnitTest
6 | 


--------------------------------------------------------------------------------
/AESRand/AESRand/pch.h:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dragontamer/AESRand/HEAD/AESRand/AESRand/pch.h


--------------------------------------------------------------------------------
/AESRand/AESRand/pch.cpp:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dragontamer/AESRand/HEAD/AESRand/AESRand/pch.cpp


--------------------------------------------------------------------------------
/AESRand/FloatTest/pch.h:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dragontamer/AESRand/HEAD/AESRand/FloatTest/pch.h


--------------------------------------------------------------------------------
/AESRand_Linux/.gitignore:
--------------------------------------------------------------------------------
1 | AESRand_BigCrush
2 | AESRand_BigCrush2
3 | AESRand
4 | AESRand.s
5 | *.swp
6 | 


--------------------------------------------------------------------------------
/AESRand/FloatTest/pch.cpp:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dragontamer/AESRand/HEAD/AESRand/FloatTest/pch.cpp


--------------------------------------------------------------------------------
/AESRand/AESRand/AESRand.cpp:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dragontamer/AESRand/HEAD/AESRand/AESRand/AESRand.cpp


--------------------------------------------------------------------------------
/AESRand/FloatTest/FloatTest.cpp:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dragontamer/AESRand/HEAD/AESRand/FloatTest/FloatTest.cpp


--------------------------------------------------------------------------------
/AESRand/IntegerRangeTest/pch.h:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dragontamer/AESRand/HEAD/AESRand/IntegerRangeTest/pch.h


--------------------------------------------------------------------------------
/AESRand/IntegerRangeTest/pch.cpp:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dragontamer/AESRand/HEAD/AESRand/IntegerRangeTest/pch.cpp


--------------------------------------------------------------------------------
/AESRand/IntegerRangeTest/IntegerRangeTest.cpp:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dragontamer/AESRand/HEAD/AESRand/IntegerRangeTest/IntegerRangeTest.cpp


--------------------------------------------------------------------------------
/AESRand_Paper/README.md:
--------------------------------------------------------------------------------
1 | While the other directories were made for testing or prototyping, this directory will be the final
2 | version for the expected paper. 
3 | 
4 | The goal is for this single directory to be portable across ARM, x86, and Power9.
5 | 
6 | 


--------------------------------------------------------------------------------
/AESRand_Paper/Makefile:
--------------------------------------------------------------------------------
 1 | all: AESRand.s AESRand.o UnitTest
 2 | 
 3 | clean:
 4 | 	rm -f AESRand.s AESRand AESRand.o UnitTest
 5 | 
 6 | AESRand.s: AESRand.cpp
 7 | 	g++ -maes -S AESRand.cpp -o AESRand.s
 8 | 
 9 | AESRand.o: AESRand.cpp
10 | 	g++ -g -c -maes AESRand.cpp -o AESRand.o
11 | 
12 | UnitTest: UnitTest.cpp AESRand.o
13 | 	g++ -g -maes UnitTest.cpp AESRand.o -o UnitTest
14 | 


--------------------------------------------------------------------------------
/AESRand_Paper/Makefile.power9:
--------------------------------------------------------------------------------
 1 | all: AESRand.s AESRand.o UnitTest
 2 | 
 3 | clean:
 4 | 	rm -f AESRand.s AESRand AESRand.o UnitTest
 5 | 
 6 | AESRand.s: AESRand.cpp
 7 | 	g++ -mcrypto -O0 -S AESRand.cpp -o AESRand.s
 8 | 
 9 | AESRand.o: AESRand.cpp
10 | 	g++ -g -c -mcrypto -O0 AESRand.cpp -o AESRand.o
11 | 
12 | UnitTest: UnitTest.cpp AESRand.o
13 | 	g++ -g -mcrypto -O0 UnitTest.cpp AESRand.o -o UnitTest
14 | 


--------------------------------------------------------------------------------
/AESRand_Paper/Makefile.arm:
--------------------------------------------------------------------------------
 1 | GPP=g++-8.3
 2 | 
 3 | all: AESRand.s AESRand.o UnitTest
 4 | 
 5 | clean:
 6 | 	rm -f AESRand.s AESRand AESRand.o UnitTest
 7 | 
 8 | AESRand.s: AESRand.cpp
 9 | 	$(GPP) -std=c++11 -march=armv8-a+simd+crypto -S AESRand.cpp -o AESRand.s
10 | 
11 | AESRand.o: AESRand.cpp
12 | 	$(GPP) -g -c -std=c++11 -march=armv8-a+simd+crypto AESRand.cpp -o AESRand.o
13 | 
14 | UnitTest: UnitTest.cpp AESRand.o
15 | 	$(GPP) -g -std=c++11 -march=armv8-a+simd+crypto UnitTest.cpp AESRand.o -o UnitTest
16 | 


--------------------------------------------------------------------------------
/AESRand_Linux/Makefile:
--------------------------------------------------------------------------------
 1 | all: AESRand.s AESRand AESRand_BigCrush2
 2 | 
 3 | clean:
 4 | 	rm -f AESRand.s AESRand AESRand_BigCrush AESRand_BigCrush2
 5 | 
 6 | AESRand.s: AESRand.cpp
 7 | 	g++ -march=westmere -O2 -S AESRand.cpp -o AESRand.s
 8 | 
 9 | AESRand: AESRand.cpp
10 | 	g++ -march=westmere -O2 AESRand.cpp -o AESRand
11 | 
12 | AESRand_BigCrush: AESRand_BigCrush.cpp
13 | 	g++ -march=westmere -O2 AESRand_BigCrush.cpp -o AESRand_BigCrush -ltestu01
14 | 
15 | AESRand_BigCrush2: AESRand_BigCrush2.cpp
16 | 	g++ -march=westmere -O2 AESRand_BigCrush2.cpp -o AESRand_BigCrush2 -ltestu01
17 | 


--------------------------------------------------------------------------------
/AESRand_Paper/UnitTest.cpp:
--------------------------------------------------------------------------------
 1 | #include <iostream>
 2 | #include "AESRand.h"
 3 | #include <string.h>
 4 | 
 5 | int main(){
 6 | 	simd128 state = AESRand_init(); 
 7 | 	AESRand_increment(state);
 8 | 	std::array<uint32_t, 8> ints = AESRand_rand_uint32(state);
 9 | 	std::array<uint32_t, 8> matches = 
10 | 		{
11 | 			0x12e826e6,
12 | 			0x6c302fd5,
13 | 			0x83155f50,
14 | 			0xc33a3964,
15 | 			0x337eacb1,
16 | 			0xe74bf1c4,
17 | 			0xbf8be05e,
18 | 			0x5068aca6,
19 | 		};
20 | 
21 | 	if(memcmp((void*)&ints[0], (void*)&matches[0], sizeof(ints)) == 0){
22 | 		std::cout << "Unit Test passed" << std::endl;
23 | 	} else {
24 | 		std::cout << "Unit Test failed" << std::endl;
25 | 		for(int i=0; i<8; i++){
26 | 			std::cout << std::hex << ints[i] << "  " << matches[i] << std::endl;
27 | 		}
28 | 	}
29 | 
30 | 
31 | }
32 | 
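
The test above is a known-answer test: it generates eight 32-bit words and compares them with fixed expected values. A minimal sketch of the same comparison follows, assuming one wants `std::array`'s element-wise `==` instead of `memcmp` and a non-zero exit status on failure so that `make` or a CI job can detect it. `check_known_answer` and `exit_status` are hypothetical helpers and are not part of AESRand.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <iostream>

// Hypothetical helper (not part of AESRand): compares the generated words
// with the expected known-answer values and prints every mismatching index.
// Returns true when all eight words match.
bool check_known_answer(const std::array<std::uint32_t, 8>& got,
                        const std::array<std::uint32_t, 8>& expected) {
    if (got == expected) {  // std::array compares element-wise
        return true;
    }
    for (std::size_t i = 0; i < got.size(); ++i) {
        if (got[i] != expected[i]) {
            std::cout << "word " << i << ": got 0x" << std::hex << got[i]
                      << ", expected 0x" << expected[i] << std::dec << '\n';
        }
    }
    return false;
}

// Maps the test result to a process exit status: 0 on success, 1 on failure.
int exit_status(bool passed) {
    return passed ? 0 : 1;
}
```

In `main`, this would be used as `return exit_status(check_known_answer(ints, matches));`.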


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2018 dragontamer
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/AESRand_Linux/AESRand.cpp:
--------------------------------------------------------------------------------
 1 | #include <iostream>
 2 | #include <immintrin.h>
 3 | #include <array>
 4 | 
 5 | 
 6 | __m128i AESRand_init(){
 7 | 	return _mm_setzero_si128(); 
 8 | }
 9 | 
10 | __m128i increment = _mm_set_epi8(0x2f, 0x2b, 0x29, 0x25, 0x1f, 0x1d, 0x17, 0x13, 
11 | 		0x11, 0x0D, 0x0B, 0x07, 0x05, 0x03, 0x02, 0x01); 
12 | 
13 | void AESRand_increment(__m128i& state){
14 | 	state += increment; 
15 | }
16 | 
17 | std::array<__m128i, 2> AESRand_rand(const __m128i state){
18 | 	__m128i penultimate = _mm_aesenc_si128(state, increment); 
19 | 	return {_mm_aesenc_si128(penultimate, increment), _mm_aesdec_si128(penultimate, increment)};
20 | }
21 | 
22 | int main(){
23 | 	std::cout << "Running 5-billion iterations (160 Billion-bytes of Random Data)" << std::endl; 
24 | 	__m128i state = AESRand_init(); 
25 | 	__m128i total = _mm_setzero_si128(); 
26 | 	__m128i total2 = _mm_setzero_si128(); 
27 | 
28 | 	for(long long i=0; i<5000000000; i++){
29 | 		AESRand_increment(state); 
30 | 		auto rands = AESRand_rand(state); 
31 | 		total += rands[0];
32 | 		total2 += rands[1]; 
33 | 	}
34 | 
35 | 	total += total2;
36 | 	std::cout << "Dummy print to negate optimizer: " << total[0] << std::endl;
37 | }
38 | 
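
The state update `state += increment` relies on GCC's vector extension, where `__m128i` behaves as two independent 64-bit lanes. The following is a portable model of that counter, assuming this lane-wise behaviour; `CounterState`, `counter_init` and `counter_increment` are illustrative names and do not appear in the repository.

```cpp
#include <cstdint>

// Portable model of the counter, assuming GCC's behaviour where `+=` on
// __m128i adds the two 64-bit lanes independently (no carry between lanes).
struct CounterState {
    std::uint64_t lo;  // bytes 0..7 of the 128-bit register
    std::uint64_t hi;  // bytes 8..15 of the 128-bit register
};

// _mm_set_epi8 lists its bytes from most significant to least significant,
// so the increment constant splits into these two lane values.
constexpr std::uint64_t kIncrementLo = 0x110D0B0705030201ULL;
constexpr std::uint64_t kIncrementHi = 0x2F2B29251F1D1713ULL;

CounterState counter_init() {
    return CounterState{0, 0};
}

// One counter step; unsigned overflow wraps modulo 2^64 in each lane.
void counter_increment(CounterState& s) {
    s.lo += kIncrementLo;
    s.hi += kIncrementHi;
}
```

Both lane constants are odd, so each lane steps through all 2^64 values before it repeats. Under this model the two lanes advance in lockstep, which means the counter as a whole repeats after 2^64 increments.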


--------------------------------------------------------------------------------
/AESRand_Linux/AESRand_BigCrush.cpp:
--------------------------------------------------------------------------------
 1 | #include <iostream>
 2 | #include <immintrin.h>
 3 | #include <array>
 4 | #include <stdio.h>
 5 | 
 6 | extern "C"{
 7 | 	#include <testu01/unif01.h>
 8 | 	#include <testu01/bbattery.h>
 9 | }
10 | 
11 | 
12 | __m128i AESRand_init(){
13 | 	return _mm_setzero_si128(); 
14 | }
15 | 
16 | __m128i increment = _mm_set_epi8(0x2f, 0x2b, 0x29, 0x25, 0x1f, 0x1d, 0x17, 0x13, 
17 | 		0x11, 0x0D, 0x0B, 0x07, 0x05, 0x03, 0x02, 0x01); 
18 | 
19 | void AESRand_increment(__m128i& state){
20 | 	state += increment; 
21 | }
22 | 
23 | std::array<__m128i, 2> AESRand_rand(const __m128i state){
24 | 	__m128i penultimate = _mm_aesenc_si128(state, increment); 
25 | 	return {_mm_aesenc_si128(penultimate, increment), _mm_aesdec_si128(penultimate, increment)};
26 | }
27 | 
28 | __m128i state = _mm_setzero_si128(); 
29 | 
30 | unsigned int AESRand_gen(void){ 
31 | 	AESRand_increment(state);
32 | 	auto rands = AESRand_rand(state); 
33 | 	return _mm_extract_epi32(rands[0], 0);  
34 | }
35 | 
36 | int main(){
37 | 	// Thanks to http://www.pcg-random.org/posts/how-to-test-with-testu01.html
38 | 	unif01_Gen* gen = unif01_CreateExternGenBits("AESRand Bottom32", AESRand_gen); 
39 | 	bbattery_BigCrush(gen); 
40 | 	unif01_DeleteExternGenBits(gen); 
41 | 	return 0;
42 | }
43 | 


--------------------------------------------------------------------------------
/AESRand/FloatTest/FloatTest.vcxproj.filters:
--------------------------------------------------------------------------------
 1 | <?xml version="1.0" encoding="utf-8"?>
 2 | <Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
 3 |   <ItemGroup>
 4 |     <Filter Include="Source Files">
 5 |       <UniqueIdentifier>{4FC737F1-C7A5-4376-A066-2A32D752A2FF}</UniqueIdentifier>
 6 |       <Extensions>cpp;c;cc;cxx;def;odl;idl;hpj;bat;asm;asmx</Extensions>
 7 |     </Filter>
 8 |     <Filter Include="Header Files">
 9 |       <UniqueIdentifier>{93995380-89BD-4b04-88EB-625FBE52EBFB}</UniqueIdentifier>
10 |       <Extensions>h;hh;hpp;hxx;hm;inl;inc;ipp;xsd</Extensions>
11 |     </Filter>
12 |     <Filter Include="Resource Files">
13 |       <UniqueIdentifier>{67DA6AB6-F800-4c08-8B7A-83BB121AAD01}</UniqueIdentifier>
14 |       <Extensions>rc;ico;cur;bmp;dlg;rc2;rct;bin;rgs;gif;jpg;jpeg;jpe;resx;tiff;tif;png;wav;mfcribbon-ms</Extensions>
15 |     </Filter>
16 |   </ItemGroup>
17 |   <ItemGroup>
18 |     <ClInclude Include="pch.h">
19 |       <Filter>Header Files</Filter>
20 |     </ClInclude>
21 |   </ItemGroup>
22 |   <ItemGroup>
23 |     <ClCompile Include="pch.cpp">
24 |       <Filter>Source Files</Filter>
25 |     </ClCompile>
26 |     <ClCompile Include="FloatTest.cpp">
27 |       <Filter>Source Files</Filter>
28 |     </ClCompile>
29 |   </ItemGroup>
30 | </Project>


--------------------------------------------------------------------------------
/AESRand/IntegerRangeTest/IntegerRangeTest.vcxproj.filters:
--------------------------------------------------------------------------------
 1 | <?xml version="1.0" encoding="utf-8"?>
 2 | <Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
 3 |   <ItemGroup>
 4 |     <Filter Include="Source Files">
 5 |       <UniqueIdentifier>{4FC737F1-C7A5-4376-A066-2A32D752A2FF}</UniqueIdentifier>
 6 |       <Extensions>cpp;c;cc;cxx;def;odl;idl;hpj;bat;asm;asmx</Extensions>
 7 |     </Filter>
 8 |     <Filter Include="Header Files">
 9 |       <UniqueIdentifier>{93995380-89BD-4b04-88EB-625FBE52EBFB}</UniqueIdentifier>
10 |       <Extensions>h;hh;hpp;hxx;hm;inl;inc;ipp;xsd</Extensions>
11 |     </Filter>
12 |     <Filter Include="Resource Files">
13 |       <UniqueIdentifier>{67DA6AB6-F800-4c08-8B7A-83BB121AAD01}</UniqueIdentifier>
14 |       <Extensions>rc;ico;cur;bmp;dlg;rc2;rct;bin;rgs;gif;jpg;jpeg;jpe;resx;tiff;tif;png;wav;mfcribbon-ms</Extensions>
15 |     </Filter>
16 |   </ItemGroup>
17 |   <ItemGroup>
18 |     <ClInclude Include="pch.h">
19 |       <Filter>Header Files</Filter>
20 |     </ClInclude>
21 |   </ItemGroup>
22 |   <ItemGroup>
23 |     <ClCompile Include="pch.cpp">
24 |       <Filter>Source Files</Filter>
25 |     </ClCompile>
26 |     <ClCompile Include="IntegerRangeTest.cpp">
27 |       <Filter>Source Files</Filter>
28 |     </ClCompile>
29 |   </ItemGroup>
30 | </Project>


--------------------------------------------------------------------------------
/AESRand_Linux/README:
--------------------------------------------------------------------------------
 1 | This Linux version serves as a compact and simplified proof-of-concept
 2 | with far less verbosity than the Windows implementation. 
 3 | 
 4 | For more notes on why certain values were chosen, read the Windows
 5 | implementation.
 6 | 
 7 | This Linux version is designed for maximum compatibility, with
 8 | -march=westmere in the Makefile. -march=skylake (or any other AVX
 9 | computer) generates cleaner AVX instructions, but doesn't seem
10 | to affect this test very much. Westmere CPUs were first released
11 | on January 7, 2010, so I expect that most people's computers today
12 | would be able to run this RNGAES code.
13 | 
14 | Try running with the "time" command. An example run on my machine is:
15 | 
16 | time ./AESRand
17 | Running 5-billion iterations (160 Billion-bytes of Random Data)
18 | Dummy print to negate optimizer: -535139616294573357
19 | 
20 | real    0m4.818s
21 | user    0m4.797s
22 | sys     0m0.000s
23 | 
24 | This gives a speed of 1.04 billion iterations/sec, or 33.2 GBps of
25 | random data. My computer varies between 3.4GHz and 4GHz, so the code
26 | runs somewhere between 3.3 and 3.9 cycles per iteration
27 | (4.818 seconds x clock rate / 5 billion iterations).
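
The throughput figures above follow from the iteration count, the 32 bytes produced per iteration, and the measured wall-clock time. A small sketch of that arithmetic, with hypothetical helper names and decimal gigabytes (1 GB = 1e9 bytes) assumed:

```cpp
#include <cassert>

// Iterations completed per second of wall-clock time.
double iterations_per_second(double iterations, double seconds) {
    assert(seconds > 0.0);
    return iterations / seconds;
}

// Throughput in decimal gigabytes per second (1 GB = 1e9 bytes).
double gigabytes_per_second(double iterations, double bytes_per_iteration,
                            double seconds) {
    assert(seconds > 0.0);
    return iterations * bytes_per_iteration / seconds / 1e9;
}
```

With 5e9 iterations, 32 bytes per iteration and the 4.818 s `real` time from the run above, these give roughly 1.04e9 iterations per second and 33.2 GB/s.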
28 | 
29 | --------
30 | 
31 | AESRandGenerator is a 2nd program I wrote to be tested with TestU01's
32 | "BigCrush". It's simply the generator that pipes its output to stdout.
33 | 


--------------------------------------------------------------------------------
/AESRand_Paper/AESRand.h:
--------------------------------------------------------------------------------
 1 | #ifndef AESRAND_H
 2 | #define AESRAND_H
 3 | 
 4 | // I expect ifdefs galore in this file 
 5 | 
 6 | #if __amd64__
 7 | #include <immintrin.h>
 8 | typedef __m128i simd128;
 9 | typedef __m128 simd128_float;
10 | typedef __m128i simd128_uint32;
11 | #endif
12 | 
13 | #if _ARCH_PPC64
14 | #include <altivec.h>
15 | typedef vector unsigned long long  simd128;
16 | typedef vector float simd128_float;
17 | typedef vector unsigned int simd128_uint32;
18 | #endif
19 | 
20 | #if __aarch64__ 
21 | #include <arm_neon.h>
22 | typedef uint8x16_t  simd128;
23 | typedef float32x4_t simd128_float;
24 | typedef uint32x4_t simd128_uint32;
25 | #endif
26 | 
27 | #include <array>
28 | #include <cstdint>
29 | 
30 | simd128 AESRand_init();
31 | void AESRand_increment(simd128& state);
32 | std::array<simd128, 2> AESRand_rand(const simd128 state);
33 | 
34 | std::array<float, 8> AESRand_rand_float(const simd128 state); 
35 | std::array<uint32_t, 8> AESRand_rand_uint32(const simd128 state); 
36 | 
37 | /*
38 | std::array<uint32_t, 8> AESRand_randInt_range16(const simd128 state, uint16_t lower_bound, uint16_t upper_bound); 
39 | std::array<uint32_t, 4> AESRand_randInt_range32(const simd128 state, uint32_t lower_bound, uint32_t upper_bound); 
40 | uint64_t AESRand_randInt_range64(const simd128 state, uint64_t lower_bound, uint64_t upper_bound); 
41 | */
42 | 
43 | #endif
44 | 
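
The header declares `AESRand_rand_float` and, in the commented-out block, integer range functions, but their implementations are not shown here. The sketch below illustrates two common scalar techniques for turning a 32-bit random word into a float in [0, 1) or an integer in a range. It is an assumption about how such conversions can be done, not the repository's method; `to_unit_float` and `to_range` are hypothetical names, and the half-open range `[lower, upper)` is also an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Sketch (assumption, not taken from the repository): map a 32-bit random
// word to a float in [0, 1) by keeping the top 24 bits, which is the number
// of significand bits a float represents exactly.
float to_unit_float(std::uint32_t bits) {
    return static_cast<float>(bits >> 8) * (1.0f / 16777216.0f);  // 2^-24
}

// Sketch (assumption): map a 32-bit random word into [lower, upper) with a
// multiply-and-shift. There is no rejection step, so a small bias remains
// whenever the range size does not divide 2^32.
std::uint32_t to_range(std::uint32_t bits, std::uint32_t lower,
                       std::uint32_t upper) {
    assert(lower < upper);
    const std::uint64_t span = static_cast<std::uint64_t>(upper) - lower;
    return lower + static_cast<std::uint32_t>((bits * span) >> 32);
}
```

The multiply-and-shift avoids a division per sample. If exact uniformity matters, a rejection step would have to be added on top of it.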


--------------------------------------------------------------------------------
/AESRand/AESRand/AESRand.vcxproj.filters:
--------------------------------------------------------------------------------
 1 | <?xml version="1.0" encoding="utf-8"?>
 2 | <Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
 3 |   <ItemGroup>
 4 |     <Filter Include="Source Files">
 5 |       <UniqueIdentifier>{4FC737F1-C7A5-4376-A066-2A32D752A2FF}</UniqueIdentifier>
 6 |       <Extensions>cpp;c;cc;cxx;def;odl;idl;hpj;bat;asm;asmx</Extensions>
 7 |     </Filter>
 8 |     <Filter Include="Header Files">
 9 |       <UniqueIdentifier>{93995380-89BD-4b04-88EB-625FBE52EBFB}</UniqueIdentifier>
10 |       <Extensions>h;hh;hpp;hxx;hm;inl;inc;ipp;xsd</Extensions>
11 |     </Filter>
12 |     <Filter Include="Resource Files">
13 |       <UniqueIdentifier>{67DA6AB6-F800-4c08-8B7A-83BB121AAD01}</UniqueIdentifier>
14 |       <Extensions>rc;ico;cur;bmp;dlg;rc2;rct;bin;rgs;gif;jpg;jpeg;jpe;resx;tiff;tif;png;wav;mfcribbon-ms</Extensions>
15 |     </Filter>
16 |   </ItemGroup>
17 |   <ItemGroup>
18 |     <ClInclude Include="pch.h">
19 |       <Filter>Header Files</Filter>
20 |     </ClInclude>
21 |   </ItemGroup>
22 |   <ItemGroup>
23 |     <ClCompile Include="pch.cpp">
24 |       <Filter>Source Files</Filter>
25 |     </ClCompile>
26 |     <ClCompile Include="AESRand.cpp">
27 |       <Filter>Source Files</Filter>
28 |     </ClCompile>
29 |     <ClCompile Include="others.cpp">
30 |       <Filter>Source Files</Filter>
31 |     </ClCompile>
32 |   </ItemGroup>
33 | </Project>


--------------------------------------------------------------------------------
/AESRand_Linux/AESRand_BigCrush2.cpp:
--------------------------------------------------------------------------------
 1 | #include <iostream>
 2 | #include <immintrin.h>
 3 | #include <array>
 4 | #include <stdio.h>
 5 | 
 6 | extern "C"{
 7 | 	#include <testu01/unif01.h>
 8 | 	#include <testu01/bbattery.h>
 9 | }
10 | 
11 | 
12 | __m128i AESRand_init(){
13 | 	return _mm_setzero_si128(); 
14 | }
15 | 
16 | __m128i increment = _mm_set_epi8(0x2f, 0x2b, 0x29, 0x25, 0x1f, 0x1d, 0x17, 0x13, 
17 | 		0x11, 0x0D, 0x0B, 0x07, 0x05, 0x03, 0x02, 0x01); 
18 | 
19 | void AESRand_increment(__m128i& state){
20 | 	state += increment; 
21 | }
22 | 
23 | std::array<__m128i, 2> AESRand_rand(const __m128i state){
24 | 	__m128i penultimate = _mm_aesenc_si128(state, increment); 
25 | 	return {_mm_aesenc_si128(penultimate, increment), _mm_aesdec_si128(penultimate, increment)};
26 | }
27 | 
28 | __m128i state = _mm_setzero_si128(); 
29 | uint32_t buffer[8] __attribute__ ((aligned (16))); 
30 | int buffer_state=8; 
31 | 
32 | // This 2nd test exercises all 8 numbers produced by each AESRand_rand call
33 | unsigned int AESRand_gen(void){ 
34 | 	if(buffer_state>=8){
35 | 		AESRand_increment(state);
36 | 		auto rands = AESRand_rand(state); 
37 | 		_mm_storeu_si128((__m128i*)&buffer[0], rands[0]);
38 | 		_mm_storeu_si128((__m128i*)&buffer[4], rands[1]);
39 | 		buffer_state = 0;
40 | 	}
41 | 
42 | 	return buffer[buffer_state++];
43 | }
44 | 
45 | int main(){
46 | 	// Thanks to http://www.pcg-random.org/posts/how-to-test-with-testu01.html
47 | 	unif01_Gen* gen = unif01_CreateExternGenBits("AESRand All 8xint32", AESRand_gen); 
48 | 	bbattery_BigCrush(gen); 
49 | 	unif01_DeleteExternGenBits(gen); 
50 | 	return 0;
51 | }
52 | 
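
`AESRand_gen` above hands TestU01 one 32-bit word per call and only runs the generator again once all eight buffered words have been consumed. The sketch below shows the same refill-on-empty pattern without AES-NI. `BlockSource` is a hypothetical stand-in that fills blocks from a plain counter, and `BufferedWords` is an illustrative name; neither is part of the repository.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for AESRand_rand: fills a block of eight words from
// a simple counter so the buffering logic can be followed without AES-NI.
struct BlockSource {
    std::uint32_t next_value = 0;

    std::array<std::uint32_t, 8> next_block() {
        std::array<std::uint32_t, 8> block{};
        for (auto& word : block) {
            word = next_value++;
        }
        return block;
    }
};

// Hands out one word per call and refills only when the buffer is empty,
// mirroring the buffer / buffer_state pair used by AESRand_gen.
class BufferedWords {
public:
    std::uint32_t next() {
        if (position_ >= buffer_.size()) {  // buffer exhausted: refill
            buffer_ = source_.next_block();
            position_ = 0;
            ++refills_;
        }
        return buffer_[position_++];
    }

    // Number of blocks requested from the source so far.
    std::size_t refills() const { return refills_; }

private:
    BlockSource source_;
    std::array<std::uint32_t, 8> buffer_{};
    std::size_t position_ = 8;  // start "empty" so the first call refills
    std::size_t refills_ = 0;
};
```

Starting with the position at the end of the buffer plays the same role as `buffer_state=8` in the original: the first call triggers a refill, so no separate initialisation step is needed.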


--------------------------------------------------------------------------------
/AESRand_Paper/test.c:
--------------------------------------------------------------------------------
 1 | #if 0
 2 | #include <stdio.h>
 3 | #include <altivec.h>
 4 | 
 5 | void printArray(char array[16]){
 6 | 	for(int i=0; i<16; i++){
 7 | 		printf("%02x ", array[i]); 
 8 | 	}
 9 | 	printf("\n"); 
10 | }
11 | 
12 | int main(){
13 | 	char array[16];
14 | 	vector unsigned long long simd128 = {0, 1};
15 | 	memcpy(array, &simd128, 16); 
16 | 	printArray(array); 
17 | 	simd128 = vec_perm(simd128, simd128, (vector unsigned char){15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0});
18 | 	simd128 = __builtin_crypto_vcipher(simd128, (vector unsigned long long){0,0}); 
19 | 	simd128 = vec_perm(simd128, simd128, (vector unsigned char){15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0});
20 | 	memcpy(array, &simd128, 16); 
21 | 	printArray(array); 
22 | }
23 | #endif
24 | 
25 | #include <string.h>
26 | #include <stdio.h>
27 | #include <arm_neon.h>
28 | 
29 | void printArray(uint8_t array[16]){
30 | 	for(int i=0; i<16; i++){
31 | 		printf("%02x ", array[i]); 
32 | 	}
33 | 	printf("\n"); 
34 | }
35 | 
36 | int main(){
37 | 	//uint8_t increment[16] = {0x2f, 0x2b, 0x29, 0x25, 0x1f, 0x1d, 0x17, 0x13,
38 | 	//		0x11, 0x0D, 0x0B, 0x07, 0x05, 0x03, 0x02, 0x01};
39 | 	uint8_t increment[16] = {0x01, 0x02, 0x03, 0x05, 0x07, 0x0B, 0x0D, 0x11, 
40 | 				0x13, 0x17, 0x1d, 0x1f, 0x25, 0x29, 0x2b, 0x2f};
41 | 	uint8_t array[16] = {
42 | 				0, 0, 0, 0,
43 | 				0, 0, 0, 0,
44 | 				1, 0, 0, 0,
45 | 				0, 0, 0, 0,
46 | 			};
47 | 	uint8x16_t simd128 = vld1q_u8(array);
48 | 	printArray(array); 
49 | 
50 | 	simd128 = vaesmcq_u8(vaeseq_u8(simd128, vdupq_n_u8(0)));
51 | 	memcpy(array, &simd128, 16); 
52 | 	printArray(array); 
53 | 
54 | 	simd128 ^= vld1q_u8(increment);
55 | 	memcpy(array, &simd128, 16); 
56 | 	printArray(array); 
57 | 
58 | /*
59 | 	simd128 = vec_perm(simd128, simd128, (vector unsigned char){15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0});
60 | 	simd128 = __builtin_crypto_vcipher(simd128, (vector unsigned long long){0,0}); 
61 | 	simd128 = vec_perm(simd128, simd128, (vector unsigned char){15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0});
62 | 	memcpy(array, &simd128, 16); 
63 | 	printArray(array); 
64 | */
65 | }
66 | 
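
The two `increment` arrays in the ARM test (one commented out, one active) hold the same constant in opposite byte orders. `_mm_set_epi8` takes its arguments from the most significant byte down to byte 0, while a byte-array load such as `vld1q_u8` reads byte 0 first. A small sketch of that relationship, with `kSetEpi8Order` and `to_memory_order` as illustrative names:

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Arguments exactly as written in the _mm_set_epi8 call on x86
// (byte 15 first, byte 0 last).
const std::array<std::uint8_t, 16> kSetEpi8Order = {
    0x2f, 0x2b, 0x29, 0x25, 0x1f, 0x1d, 0x17, 0x13,
    0x11, 0x0D, 0x0B, 0x07, 0x05, 0x03, 0x02, 0x01};

// Returns the same constant in memory order (byte 0 first), which is the
// order a byte-array load expects.
std::array<std::uint8_t, 16> to_memory_order(
        std::array<std::uint8_t, 16> set_epi8_order) {
    std::reverse(set_epi8_order.begin(), set_epi8_order.end());
    return set_epi8_order;
}
```

Reversing the x86 argument list yields `0x01, 0x02, 0x03, ...`, which matches the active `increment` array in the test above.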


--------------------------------------------------------------------------------
/BenchmarkResults.md:
--------------------------------------------------------------------------------
 1 | I redid the test while locking my CPU to 3.4 GHz (AMD Zen options: P0 state 3.4 GHz, P1+ was disabled, Zen Core Performance Boost off). I manually verified in Ryzen Master that the CPU remained at 3.4GHz during the test.
 2 | 
 3 | Pattern for calculating cycles per iteration: (number of seconds) * 3.4GHz clock speed / (number of iterations). Number of iterations is either 10-billion, or 5-billion, depending on how the specific benchmark worked.
 4 | 
 5 | * Single-State AESRand: 3.69 cycles per iteration (32-bytes)
 6 | * Double-State AESRand: 2.95 cycles per iteration (32-bytes)
 7 | * mt19937: 14.76 cycles per iteration (4-bytes)
 8 | * pcg32 (Unrolled): 5.46 cycles per iteration (4-bytes)
 9 | * xoshiro256plus (Unrolled): 3.32 cycles per iteration (8-bytes)
10 | * PlusOne XMM-registers: 1.59 cycles per iteration (Dummy Control, like BogoMIPS)
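
The cycles-per-iteration pattern described above can be written as a one-line helper. This is a sketch with a hypothetical function name. The iteration counts used below (5 billion for the single-state run, 10 billion for the double-state run) are assumptions that reproduce the listed figures.

```cpp
#include <cassert>

// Cycles per iteration = seconds * clock frequency (Hz) / iterations.
double cycles_per_iteration(double seconds, double clock_hz,
                            double iterations) {
    assert(iterations > 0.0);
    return seconds * clock_hz / iterations;
}
```

For example, `cycles_per_iteration(5.4278, 3.4e9, 5e9)` is about 3.69 and `cycles_per_iteration(8.67641, 3.4e9, 10e9)` is about 2.95.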
11 | 
12 | The sources of "overhead" in each benchmark are:
13 | 1. The "For" loop: one-add per iteration (i++), and the cmp/jnz instruction (i<=ITERATIONS). Unrolling reduced this overhead, but modern CPUs are good at executing the loop-logic in parallel, which mitigates this overhead.
14 | 2. Two SIMD-adds in Single-State AESRand for the "Dummy Print".
15 | 3. Four SIMD-adds for Double-state AESRand
16 | 4. One 32-bit add for mt19937
17 | 5. One 32-bit add for pcg32
18 | 6. One 64-bit add for xoshiro256plus
19 | 
20 | Raw Results (AMD Threadripper 1950x locked to 3.4GHz)
21 | ===========
22 | 
23 | Beginning Single-state 'serial' test
24 | Total Seconds: 5.4278
25 | GBps: 27.4534
26 | Dummy Benchmark anti-optimizer print: 1706011378085583560
27 | Beginning Parallel (2x) test: instruction-level parallelism
28 | Time: 8.67641
29 | GBps: 34.3487
30 | Dummy Benchmark anti-optimizer print: 1283732354369314394
31 | 
32 | Testing mt19937
33 | Time: 21.7154
34 | GBps: 0.857752
35 | Dummy Benchmark anti-optimizer print: 1680273558
36 | 
37 | Testing pcg32 Unrolled x4
38 | Time: 8.0232
39 | GBps: 2.32157
40 | Dummy Benchmark anti-optimizer print: 2362602604
41 | 
42 | Testing pcg32
43 | Time: 8.22974
44 | GBps: 2.26331
45 | Dummy Benchmark anti-optimizer print: 757965796
46 | 
47 | Testing xoshiro256plus Unrolled x4
48 | Time: 4.88474
49 | GBps: 7.62639
50 | Dummy Benchmark anti-optimizer print: 2202972135473059297
51 | 
52 | Testing xoshiro256plus
53 | Time: 5.09052
54 | GBps: 7.31809
55 | Dummy Benchmark anti-optimizer print: 5290432412060736627
56 | 
57 | Beginning PlusOne XMM-registers Test
58 | Total Seconds: 2.3376
59 | GBps: 63.7456
60 | Dummy Benchmark anti-optimizer print: 6553255931290448384


--------------------------------------------------------------------------------
/AESRand/AESRand.sln:
--------------------------------------------------------------------------------
 1 | 
 2 | Microsoft Visual Studio Solution File, Format Version 12.00
 3 | # Visual Studio 15
 4 | VisualStudioVersion = 15.0.28010.2003
 5 | MinimumVisualStudioVersion = 10.0.40219.1
 6 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "AESRand", "AESRand\AESRand.vcxproj", "{F91B1300-34D7-459B-B40C-3479AF111436}"
 7 | EndProject
 8 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "FloatTest", "FloatTest\FloatTest.vcxproj", "{F86DAFE3-4A80-4F98-B2BF-63D36B17BA35}"
 9 | EndProject
10 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "IntegerRangeTest", "IntegerRangeTest\IntegerRangeTest.vcxproj", "{14562E75-9BAB-4663-BFE3-51C96298EC81}"
11 | EndProject
12 | Global
13 | 	GlobalSection(SolutionConfigurationPlatforms) = preSolution
14 | 		Debug|x64 = Debug|x64
15 | 		Debug|x86 = Debug|x86
16 | 		Release|x64 = Release|x64
17 | 		Release|x86 = Release|x86
18 | 	EndGlobalSection
19 | 	GlobalSection(ProjectConfigurationPlatforms) = postSolution
20 | 		{F91B1300-34D7-459B-B40C-3479AF111436}.Debug|x64.ActiveCfg = Debug|x64
21 | 		{F91B1300-34D7-459B-B40C-3479AF111436}.Debug|x64.Build.0 = Debug|x64
22 | 		{F91B1300-34D7-459B-B40C-3479AF111436}.Debug|x86.ActiveCfg = Debug|Win32
23 | 		{F91B1300-34D7-459B-B40C-3479AF111436}.Debug|x86.Build.0 = Debug|Win32
24 | 		{F91B1300-34D7-459B-B40C-3479AF111436}.Release|x64.ActiveCfg = Release|x64
25 | 		{F91B1300-34D7-459B-B40C-3479AF111436}.Release|x64.Build.0 = Release|x64
26 | 		{F91B1300-34D7-459B-B40C-3479AF111436}.Release|x86.ActiveCfg = Release|Win32
27 | 		{F91B1300-34D7-459B-B40C-3479AF111436}.Release|x86.Build.0 = Release|Win32
28 | 		{F86DAFE3-4A80-4F98-B2BF-63D36B17BA35}.Debug|x64.ActiveCfg = Debug|x64
29 | 		{F86DAFE3-4A80-4F98-B2BF-63D36B17BA35}.Debug|x64.Build.0 = Debug|x64
30 | 		{F86DAFE3-4A80-4F98-B2BF-63D36B17BA35}.Debug|x86.ActiveCfg = Debug|Win32
31 | 		{F86DAFE3-4A80-4F98-B2BF-63D36B17BA35}.Debug|x86.Build.0 = Debug|Win32
32 | 		{F86DAFE3-4A80-4F98-B2BF-63D36B17BA35}.Release|x64.ActiveCfg = Release|x64
33 | 		{F86DAFE3-4A80-4F98-B2BF-63D36B17BA35}.Release|x64.Build.0 = Release|x64
34 | 		{F86DAFE3-4A80-4F98-B2BF-63D36B17BA35}.Release|x86.ActiveCfg = Release|Win32
35 | 		{F86DAFE3-4A80-4F98-B2BF-63D36B17BA35}.Release|x86.Build.0 = Release|Win32
36 | 		{14562E75-9BAB-4663-BFE3-51C96298EC81}.Debug|x64.ActiveCfg = Debug|x64
37 | 		{14562E75-9BAB-4663-BFE3-51C96298EC81}.Debug|x64.Build.0 = Debug|x64
38 | 		{14562E75-9BAB-4663-BFE3-51C96298EC81}.Debug|x86.ActiveCfg = Debug|Win32
39 | 		{14562E75-9BAB-4663-BFE3-51C96298EC81}.Debug|x86.Build.0 = Debug|Win32
40 | 		{14562E75-9BAB-4663-BFE3-51C96298EC81}.Release|x64.ActiveCfg = Release|x64
41 | 		{14562E75-9BAB-4663-BFE3-51C96298EC81}.Release|x64.Build.0 = Release|x64
42 | 		{14562E75-9BAB-4663-BFE3-51C96298EC81}.Release|x86.ActiveCfg = Release|Win32
43 | 		{14562E75-9BAB-4663-BFE3-51C96298EC81}.Release|x86.Build.0 = Release|Win32
44 | 	EndGlobalSection
45 | 	GlobalSection(SolutionProperties) = preSolution
46 | 		HideSolutionNode = FALSE
47 | 	EndGlobalSection
48 | 	GlobalSection(ExtensibilityGlobals) = postSolution
49 | 		SolutionGuid = {B018B1BB-6411-45E9-B712-56A5A37D7ACA}
50 | 	EndGlobalSection
51 | EndGlobal
52 | 


--------------------------------------------------------------------------------
/AESRand/AESRand/others.cpp:
--------------------------------------------------------------------------------
  1 | #include "pch.h"
  2 | 
  3 | // Sorry for the mess. I copy/pasted http://vigna.di.unimi.it/xorshift/xoshiro256plus.c and
  4 | // http://www.pcg-random.org/download.html#minimal-c-implementation into this file.
  5 | 
  6 | 
  7 | #include <stdint.h>
  8 | 
  9 | typedef struct { uint64_t state;  uint64_t inc; } pcg32_random_t;
 10 | 
 11 | uint32_t pcg32_random_r(pcg32_random_t* rng)
 12 | {
 13 | 	uint64_t oldstate = rng->state;
 14 | 	// Advance internal state
 15 | 	rng->state = oldstate * 6364136223846793005ULL + (rng->inc | 1);
 16 | 	// Calculate output function (XSH RR), uses old state for max ILP
 17 | 	uint32_t xorshifted = ((oldstate >> 18u) ^ oldstate) >> 27u;
 18 | 	uint32_t rot = oldstate >> 59u;
 19 | 	return (xorshifted >> rot) | (xorshifted << ((-static_cast<int32_t>(rot)) & 31));
 20 | }
 21 | 
 22 | #include <stdint.h>
 23 | 
 24 | /* This is xoshiro256+ 1.0, our best and fastest generator for floating-point
 25 |    numbers. We suggest to use its upper bits for floating-point
 26 |    generation, as it is slightly faster than xoshiro256**. It passes all
 27 |    tests we are aware of except for the lowest three bits, which might
 28 |    fail linearity tests (and just those), so if low linear complexity is
 29 |    not considered an issue (as it is usually the case) it can be used to
 30 |    generate 64-bit outputs, too.
 31 | 
 32 |    We suggest to use a sign test to extract a random Boolean value, and
 33 |    right shifts to extract subsets of bits.
 34 | 
 35 |    The state must be seeded so that it is not everywhere zero. If you have
 36 |    a 64-bit seed, we suggest to seed a splitmix64 generator and use its
 37 |    output to fill s. */
 38 | 
 39 | 
 40 | static inline uint64_t rotl(const uint64_t x, int k) {
 41 | 	return (x << k) | (x >> (64 - k));
 42 | }
 43 | 
 44 | 
 45 | uint64_t s[4];
 46 | 
 47 | uint64_t next(void) {
 48 | 	const uint64_t result_plus = s[0] + s[3];
 49 | 
 50 | 	const uint64_t t = s[1] << 17;
 51 | 
 52 | 	s[2] ^= s[0];
 53 | 	s[3] ^= s[1];
 54 | 	s[1] ^= s[2];
 55 | 	s[0] ^= s[3];
 56 | 
 57 | 	s[2] ^= t;
 58 | 
 59 | 	s[3] = rotl(s[3], 45);
 60 | 
 61 | 	return result_plus;
 62 | }
 63 | 
 64 | 
 65 | /* This is the jump function for the generator. It is equivalent
 66 |    to 2^128 calls to next(); it can be used to generate 2^128
 67 |    non-overlapping subsequences for parallel computations. */
 68 | 
 69 | void jump(void) {
 70 | 	static const uint64_t JUMP[] = { 0x180ec6d33cfd0aba, 0xd5a61266f0c9392c, 0xa9582618e03fc9aa, 0x39abdc4529b1661c };
 71 | 
 72 | 	uint64_t s0 = 0;
 73 | 	uint64_t s1 = 0;
 74 | 	uint64_t s2 = 0;
 75 | 	uint64_t s3 = 0;
 76 | 	for (int i = 0; i < sizeof JUMP / sizeof *JUMP; i++)
 77 | 		for (int b = 0; b < 64; b++) {
 78 | 			if (JUMP[i] & UINT64_C(1) << b) {
 79 | 				s0 ^= s[0];
 80 | 				s1 ^= s[1];
 81 | 				s2 ^= s[2];
 82 | 				s3 ^= s[3];
 83 | 			}
 84 | 			next();
 85 | 		}
 86 | 
 87 | 	s[0] = s0;
 88 | 	s[1] = s1;
 89 | 	s[2] = s2;
 90 | 	s[3] = s3;
 91 | }
 92 | 
 93 | 
 94 | /* This is the long-jump function for the generator. It is equivalent to
 95 |    2^192 calls to next(); it can be used to generate 2^64 starting points,
 96 |    from each of which jump() will generate 2^64 non-overlapping
 97 |    subsequences for parallel distributed computations. */
 98 | 
 99 | void long_jump(void) {
100 | 	static const uint64_t LONG_JUMP[] = { 0x76e15d3efefdcbbf, 0xc5004e441c522fb3, 0x77710069854ee241, 0x39109bb02acbe635 };
101 | 
102 | 	uint64_t s0 = 0;
103 | 	uint64_t s1 = 0;
104 | 	uint64_t s2 = 0;
105 | 	uint64_t s3 = 0;
106 | 	for (int i = 0; i < sizeof LONG_JUMP / sizeof *LONG_JUMP; i++)
107 | 		for (int b = 0; b < 64; b++) {
108 | 			if (LONG_JUMP[i] & UINT64_C(1) << b) {
109 | 				s0 ^= s[0];
110 | 				s1 ^= s[1];
111 | 				s2 ^= s[2];
112 | 				s3 ^= s[3];
113 | 			}
114 | 			next();
115 | 		}
116 | 
117 | 	s[0] = s0;
118 | 	s[1] = s1;
119 | 	s[2] = s2;
120 | 	s[3] = s3;
121 | }
122 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
  1 | ## Ignore Visual Studio temporary files, build results, and
  2 | ## files generated by popular Visual Studio add-ons.
  3 | ##
  4 | ## Get latest from https://github.com/github/gitignore/blob/master/VisualStudio.gitignore
  5 | 
  6 | # User-specific files
  7 | *.suo
  8 | *.user
  9 | *.userosscache
 10 | *.sln.docstates
 11 | 
 12 | # User-specific files (MonoDevelop/Xamarin Studio)
 13 | *.userprefs
 14 | 
 15 | # Build results
 16 | [Dd]ebug/
 17 | [Dd]ebugPublic/
 18 | [Rr]elease/
 19 | [Rr]eleases/
 20 | x64/
 21 | x86/
 22 | bld/
 23 | [Bb]in/
 24 | [Oo]bj/
 25 | [Ll]og/
 26 | 
 27 | # Visual Studio 2015/2017 cache/options directory
 28 | .vs/
 29 | # Uncomment if you have tasks that create the project's static files in wwwroot
 30 | #wwwroot/
 31 | 
 32 | # Visual Studio 2017 auto generated files
 33 | Generated\ Files/
 34 | 
 35 | # MSTest test Results
 36 | [Tt]est[Rr]esult*/
 37 | [Bb]uild[Ll]og.*
 38 | 
 39 | # NUNIT
 40 | *.VisualState.xml
 41 | TestResult.xml
 42 | 
 43 | # Build Results of an ATL Project
 44 | [Dd]ebugPS/
 45 | [Rr]eleasePS/
 46 | dlldata.c
 47 | 
 48 | # Benchmark Results
 49 | BenchmarkDotNet.Artifacts/
 50 | 
 51 | # .NET Core
 52 | project.lock.json
 53 | project.fragment.lock.json
 54 | artifacts/
 55 | **/Properties/launchSettings.json
 56 | 
 57 | # StyleCop
 58 | StyleCopReport.xml
 59 | 
 60 | # Files built by Visual Studio
 61 | *_i.c
 62 | *_p.c
 63 | *_i.h
 64 | *.ilk
 65 | *.meta
 66 | *.obj
 67 | *.iobj
 68 | *.pch
 69 | *.pdb
 70 | *.ipdb
 71 | *.pgc
 72 | *.pgd
 73 | *.rsp
 74 | *.sbr
 75 | *.tlb
 76 | *.tli
 77 | *.tlh
 78 | *.tmp
 79 | *.tmp_proj
 80 | *.log
 81 | *.vspscc
 82 | *.vssscc
 83 | .builds
 84 | *.pidb
 85 | *.svclog
 86 | *.scc
 87 | 
 88 | # Chutzpah Test files
 89 | _Chutzpah*
 90 | 
 91 | # Visual C++ cache files
 92 | ipch/
 93 | *.aps
 94 | *.ncb
 95 | *.opendb
 96 | *.opensdf
 97 | *.sdf
 98 | *.cachefile
 99 | *.VC.db
100 | *.VC.VC.opendb
101 | 
102 | # Visual Studio profiler
103 | *.psess
104 | *.vsp
105 | *.vspx
106 | *.sap
107 | 
108 | # Visual Studio Trace Files
109 | *.e2e
110 | 
111 | # TFS 2012 Local Workspace
112 | $tf/
113 | 
114 | # Guidance Automation Toolkit
115 | *.gpState
116 | 
117 | # ReSharper is a .NET coding add-in
118 | _ReSharper*/
119 | *.[Rr]e[Ss]harper
120 | *.DotSettings.user
121 | 
122 | # JustCode is a .NET coding add-in
123 | .JustCode
124 | 
125 | # TeamCity is a build add-in
126 | _TeamCity*
127 | 
128 | # DotCover is a Code Coverage Tool
129 | *.dotCover
130 | 
131 | # AxoCover is a Code Coverage Tool
132 | .axoCover/*
133 | !.axoCover/settings.json
134 | 
135 | # Visual Studio code coverage results
136 | *.coverage
137 | *.coveragexml
138 | 
139 | # NCrunch
140 | _NCrunch_*
141 | .*crunch*.local.xml
142 | nCrunchTemp_*
143 | 
144 | # MightyMoose
145 | *.mm.*
146 | AutoTest.Net/
147 | 
148 | # Web workbench (sass)
149 | .sass-cache/
150 | 
151 | # Installshield output folder
152 | [Ee]xpress/
153 | 
154 | # DocProject is a documentation generator add-in
155 | DocProject/buildhelp/
156 | DocProject/Help/*.HxT
157 | DocProject/Help/*.HxC
158 | DocProject/Help/*.hhc
159 | DocProject/Help/*.hhk
160 | DocProject/Help/*.hhp
161 | DocProject/Help/Html2
162 | DocProject/Help/html
163 | 
164 | # Click-Once directory
165 | publish/
166 | 
167 | # Publish Web Output
168 | *.[Pp]ublish.xml
169 | *.azurePubxml
170 | # Note: Comment the next line if you want to checkin your web deploy settings,
171 | # but database connection strings (with potential passwords) will be unencrypted
172 | *.pubxml
173 | *.publishproj
174 | 
175 | # Microsoft Azure Web App publish settings. Comment the next line if you want to
176 | # checkin your Azure Web App publish settings, but sensitive information contained
177 | # in these scripts will be unencrypted
178 | PublishScripts/
179 | 
180 | # NuGet Packages
181 | *.nupkg
182 | # The packages folder can be ignored because of Package Restore
183 | **/[Pp]ackages/*
184 | # except build/, which is used as an MSBuild target.
185 | !**/[Pp]ackages/build/
186 | # Uncomment if necessary however generally it will be regenerated when needed
187 | #!**/[Pp]ackages/repositories.config
188 | # NuGet v3's project.json files produces more ignorable files
189 | *.nuget.props
190 | *.nuget.targets
191 | 
192 | # Microsoft Azure Build Output
193 | csx/
194 | *.build.csdef
195 | 
196 | # Microsoft Azure Emulator
197 | ecf/
198 | rcf/
199 | 
200 | # Windows Store app package directories and files
201 | AppPackages/
202 | BundleArtifacts/
203 | Package.StoreAssociation.xml
204 | _pkginfo.txt
205 | *.appx
206 | 
207 | # Visual Studio cache files
208 | # files ending in .cache can be ignored
209 | *.[Cc]ache
210 | # but keep track of directories ending in .cache
211 | !*.[Cc]ache/
212 | 
213 | # Others
214 | ClientBin/
215 | ~$*
216 | *~
217 | *.dbmdl
218 | *.dbproj.schemaview
219 | *.jfm
220 | *.pfx
221 | *.publishsettings
222 | orleans.codegen.cs
223 | 
224 | # Including strong name files can present a security risk 
225 | # (https://github.com/github/gitignore/pull/2483#issue-259490424)
226 | #*.snk
227 | 
228 | # Since there are multiple workflows, uncomment next line to ignore bower_components
229 | # (https://github.com/github/gitignore/pull/1529#issuecomment-104372622)
230 | #bower_components/
231 | 
232 | # RIA/Silverlight projects
233 | Generated_Code/
234 | 
235 | # Backup & report files from converting an old project file
236 | # to a newer Visual Studio version. Backup files are not needed,
237 | # because we have git ;-)
238 | _UpgradeReport_Files/
239 | Backup*/
240 | UpgradeLog*.XML
241 | UpgradeLog*.htm
242 | ServiceFabricBackup/
243 | *.rptproj.bak
244 | 
245 | # SQL Server files
246 | *.mdf
247 | *.ldf
248 | *.ndf
249 | 
250 | # Business Intelligence projects
251 | *.rdl.data
252 | *.bim.layout
253 | *.bim_*.settings
254 | *.rptproj.rsuser
255 | 
256 | # Microsoft Fakes
257 | FakesAssemblies/
258 | 
259 | # GhostDoc plugin setting file
260 | *.GhostDoc.xml
261 | 
262 | # Node.js Tools for Visual Studio
263 | .ntvs_analysis.dat
264 | node_modules/
265 | 
266 | # Visual Studio 6 build log
267 | *.plg
268 | 
269 | # Visual Studio 6 workspace options file
270 | *.opt
271 | 
272 | # Visual Studio 6 auto-generated workspace file (contains which files were open etc.)
273 | *.vbw
274 | 
275 | # Visual Studio LightSwitch build output
276 | **/*.HTMLClient/GeneratedArtifacts
277 | **/*.DesktopClient/GeneratedArtifacts
278 | **/*.DesktopClient/ModelManifest.xml
279 | **/*.Server/GeneratedArtifacts
280 | **/*.Server/ModelManifest.xml
281 | _Pvt_Extensions
282 | 
283 | # Paket dependency manager
284 | .paket/paket.exe
285 | paket-files/
286 | 
287 | # FAKE - F# Make
288 | .fake/
289 | 
290 | # JetBrains Rider
291 | .idea/
292 | *.sln.iml
293 | 
294 | # CodeRush
295 | .cr/
296 | 
297 | # Python Tools for Visual Studio (PTVS)
298 | __pycache__/
299 | *.pyc
300 | 
301 | # Cake - Uncomment if you are using it
302 | # tools/**
303 | # !tools/packages.config
304 | 
305 | # Tabs Studio
306 | *.tss
307 | 
308 | # Telerik's JustMock configuration file
309 | *.jmconfig
310 | 
311 | # BizTalk build output
312 | *.btp.cs
313 | *.btm.cs
314 | *.odx.cs
315 | *.xsd.cs
316 | 
317 | # OpenCover UI analysis results
318 | OpenCover/
319 | 
320 | # Azure Stream Analytics local run output 
321 | ASALocalRun/
322 | 
323 | # MSBuild Binary and Structured Log
324 | *.binlog
325 | 
326 | # NVidia Nsight GPU debugger configuration file
327 | *.nvuser
328 | 
329 | # MFractors (Xamarin productivity tool) working folder 
330 | .mfractor/
331 | 


--------------------------------------------------------------------------------
/AESRand_Paper/AESRand.hpp:
--------------------------------------------------------------------------------
  1 | #ifdef __amd64__
  2 | 
  3 | simd128 AESRand_init(){
  4 | 	return _mm_setzero_si128(); 
  5 | }
  6 | 
  7 | static simd128 increment = _mm_set_epi8(0x2f, 0x2b, 0x29, 0x25, 0x1f, 0x1d, 0x17, 0x13, 
  8 | 		0x11, 0x0D, 0x0B, 0x07, 0x05, 0x03, 0x02, 0x01); 
  9 | 
 10 | void AESRand_increment(simd128& state){
 11 | 	state += increment; 
 12 | }
 13 | 
 14 | std::array<simd128, 2> AESRand_rand(const simd128 state){
 15 | 	simd128 penultimate = _mm_aesenc_si128(state, increment); 
 16 | 	return {_mm_aesenc_si128(penultimate, increment), _mm_aesdec_si128(penultimate, increment)};
 17 | }
 18 | 
 19 | static __m128 toFloats(__m128i input){
 20 | 	// Clear the sign and exponent bits, keeping only the 23 mantissa bits
 21 | 	__m128i isolate = _mm_andnot_si128(_mm_set1_epi32(0xff800000), input);
 22 | 
 23 | 	// 0x3f800000 is the bit pattern of the float 1.0; OR-ing it in sets the exponent
 24 | 	__m128i addExponent = _mm_or_si128(_mm_set1_epi32(0x3f800000), isolate);
 25 | 
 26 | 	// Reinterpreted as floats, the values now lie in [1.0, 2.0)
 27 | 	__m128 one = _mm_set1_ps(1.0f);
 28 | 
 29 | 	// Subtracting 1.0 maps them onto [0, 1). Only 23 of each lane's 32 random bits are
 30 | 	// used; the disabled block below sketches a way to reuse the other 9 bits.
 31 | 	__m128 fastResult = _mm_sub_ps(_mm_castsi128_ps(addExponent), one);
 32 | 
 33 | 	return fastResult;
 34 | 
 35 | #if 0
 36 | 	// This code takes the 9-unused bits and uses them in the bottom sometimes. This
 37 | 	// may add extra bits of precision in some cases at the cost of possibly returning
 38 | 	// a denormal.
 39 | 	__m128i unused9bits = _mm_and_si128(_mm_set1_epi32(0xff800000), input);
 40 | 	unused9bits = _mm_srli_epi32(unused9bits, 23);
 41 | 
 42 | 	// Doing an _mm_xor_ps with those 9 bits results in a NaN error. Do the XOR
 43 | 	// in the integer domain, then convert back.
 44 | 
 45 | 	return _mm_castsi128_ps(_mm_xor_si128(_mm_castps_si128(fastResult), unused9bits));
 46 | #endif
 47 | }
 48 | 
 49 | std::array<uint32_t, 8> AESRand_rand_uint32(const simd128 state){
 50 | 	auto rands = AESRand_rand(state); 
 51 | 
 52 | 	std::array<uint32_t, 8> toReturn;
 53 | 	_mm_storeu_si128((__m128i*)&toReturn[0], rands[0]);
 54 | 	_mm_storeu_si128((__m128i*)&toReturn[4], rands[1]);
 55 | 	return toReturn; 
 56 | }
 57 | 
 58 | std::array<float, 8> AESRand_rand_float(const simd128 state){
 59 | 	auto rands = AESRand_rand(state); 
 60 | 	__m128 simd0 = toFloats(rands[0]);
 61 | 	__m128 simd1 = toFloats(rands[1]);
 62 | 
 63 | 	std::array<float, 8> toReturn;
 64 | 	_mm_storeu_ps(&toReturn[0], simd0);
 65 | 	_mm_storeu_ps(&toReturn[4], simd1);
 66 | 	return toReturn; 
 67 | }
 68 | 
 69 | #endif //amd64
 70 | 
 71 | #ifdef _ARCH_PPC64
 72 | 
 73 | // PPC Intrinsics defined in "64-bit ELF V2 ABI Specification", chapter 6 and Appendix A.
 74 | // GCC defines the crypto-extension
 75 | 
 76 | // PowerPC operates on big-endian FIPS 197 compatible AES-vectors
 77 | // Convert to big-endian and back to retain compatibility with AMD64
 78 | static simd128 endianConv(simd128 in){
 79 | 	return vec_perm(in, in, (vector unsigned char){15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0});
 80 | }
 81 | 
 82 | simd128 AESRand_init(){
 83 | 	return (simd128) {0, 0}; 
 84 | }
 85 | 
 86 | static simd128 increment = {0x110d0b0705030201, 0x2f2b29251f1d1713};
 87 | 
 88 | void AESRand_increment(simd128& state){
 89 | 	state += increment; 
 90 | 	//state = vec_add(state, increment); 
 91 | }
 92 | 
 93 | std::array<simd128, 2> AESRand_rand(const simd128 state){
 94 | 	simd128 state_endian = endianConv(state);
 95 | 	simd128 increment_endian = endianConv(increment);
 96 | 	simd128 penultimate = __builtin_crypto_vcipher(state_endian, increment_endian); 
 97 | 	simd128 first_ret = __builtin_crypto_vcipher(penultimate, increment_endian); 
 98 | 	simd128 second_ret = __builtin_crypto_vncipher(penultimate, (vector unsigned long long) {0,0}); 
 99 | 
100 | 	// Note: this is suboptimal. A "MixColumns" can be applied at compile-time to the
101 | 	// increment_endian value to combine this XOR with the above vncipher command. Depends how much
102 | 	// we care about optimization...
103 | 	second_ret ^= increment_endian;
104 | 	return {endianConv(first_ret), endianConv(second_ret)}; 
105 | }
106 | 
107 | std::array<uint32_t, 8> AESRand_rand_uint32(const simd128 state){
108 | 	auto rands = AESRand_rand(state); 
109 | 
110 | 	std::array<uint32_t, 8> toReturn;
111 | 	toReturn[0] = rands[0][0];
112 | 	toReturn[1] = rands[0][0] >> 32;
113 | 	toReturn[2] = rands[0][1];
114 | 	toReturn[3] = rands[0][1] >> 32;
115 | 	toReturn[4] = rands[1][0];
116 | 	toReturn[5] = rands[1][0] >> 32;
117 | 	toReturn[6] = rands[1][1];
118 | 	toReturn[7] = rands[1][1] >> 32;
119 | //	_mm_storeu_si128((__m128i*)&toReturn[0], rands[0]);
120 | //	_mm_storeu_si128((__m128i*)&toReturn[4], rands[1]);
121 | 	return toReturn; 
122 | }
123 | 
124 | #endif
125 | 
126 | #if __aarch64__
127 | 
128 | simd128 AESRand_init(){
129 | 	// Zero the state directly; XOR-ing an uninitialized vector with itself reads an indeterminate value.
130 | 	return vdupq_n_u8(0);
131 | }
132 | 
133 | // Same bytes as the amd64 increment. _mm_set_epi8 lists bytes from most to least significant; this array is in memory order.
134 | uint8_t increment[16] = {0x01, 0x02, 0x03, 0x05, 0x07, 0x0B, 0x0D, 0x11,
135 | 	0x13, 0x17, 0x1d, 0x1f, 0x25, 0x29, 0x2b, 0x2f};
136 | 
137 | void AESRand_increment(simd128& state){
138 | 	simd128 inc = vld1q_u8(increment); 
139 | 	state = vaddq_u8(state, inc); 
140 | }
141 | 
142 | std::array<simd128, 2> AESRand_rand(const simd128 state){
143 | 	simd128 inc = vld1q_u8(increment);
144 | 	simd128 penultimate_intel = vaesmcq_u8(vaeseq_u8(state, vdupq_n_u8(0)));
145 | 	simd128 penultimate_arm_enc = vaesmcq_u8(vaeseq_u8(penultimate_intel, (inc)));
146 | 	simd128 penultimate_arm_dec = vaesimcq_u8(vaesdq_u8(penultimate_intel, (inc)));
147 | 	return {veorq_u8(penultimate_arm_enc, (inc)), veorq_u8(penultimate_arm_dec, inc)};
148 | }
149 | 
150 | std::array<uint32_t, 8> AESRand_rand_uint32(const simd128 state){
151 | 	auto rands = AESRand_rand(state); 
152 | 
153 | 	std::array<uint32_t, 8> toReturn;
154 | 	vst1q_u8((uint8_t*) &toReturn[0], rands[0]);
155 | 	vst1q_u8((uint8_t*) &toReturn[4], rands[1]);
156 | //	_mm_storeu_si128((__m128i*)&toReturn[0], rands[0]);
157 | //	_mm_storeu_si128((__m128i*)&toReturn[4], rands[1]);
158 | 	return toReturn; 
159 | }
160 | 
161 | /*
162 | static __m128 toFloats(__m128i input){
163 | 	// Clear the sign and exponent bits, keeping only the 23 mantissa bits
164 | 	__m128i isolate = _mm_andnot_si128(_mm_set1_epi32(0xff800000), input);
165 | 
166 | 	// 0x3f800000 is the magic number representing floating point 1.0
167 | 	__m128i addExponent = _mm_or_si128(_mm_set1_epi32(0x3f800000), isolate);
168 | 
169 | 	// Numbers are now in between [1.0, 2.0)
170 | 	__m128 one = _mm_set1_ps(1.0);
171 | 
172 | 	// Result is now in [0, 1), but we may have lost some bits.
173 | 	// We could return now, but... we can regain 9-lost bits without much effort
174 | 	__m128 fastResult = _mm_sub_ps(_mm_castsi128_ps(addExponent), one);
175 | 
176 | 	return fastResult;
177 | 
178 | #if 0
179 | 	// This code takes the 9-unused bits and uses them in the bottom sometimes. This
180 | 	// may add extra bits of precision in some cases at the cost of possibly returning
181 | 	// a denormal.
182 | 	__m128i unused9bits = _mm_and_si128(_mm_set1_epi32(0xff800000), input);
183 | 	unused9bits = _mm_srli_epi32(unused9bits, 23);
184 | 
185 | 	//Doing an _mm_xor_ps with those 9-bits results in a NAN error. Do xors 
186 | 	// in the integer domain, then convert back.
187 | 
188 | 	return _mm_xor_ps(fastResult, _mm_castsi128_ps(unused9bits));
189 | #endif
190 | }
191 | 
192 | std::array<float, 8> AESRand_rand_float(const simd128 state){
193 | 	auto rands = AESRand_rand(state); 
194 | 	__m128 simd0 = toFloats(rands[0]);
195 | 	__m128 simd1 = toFloats(rands[1]);
196 | 
197 | 	std::array<float, 8> toReturn;
198 | 	_mm_storeu_ps(&toReturn[0], simd0);
199 | 	_mm_storeu_ps(&toReturn[4], simd1);
200 | 	return toReturn; 
201 | }
202 | */
203 | #endif
204 | 


--------------------------------------------------------------------------------
/AESRand_Paper/AESRand.cpp:
--------------------------------------------------------------------------------
  1 | #include "AESRand.h"
  2 | 
  3 | #ifdef __amd64__
  4 | 
  5 | simd128 AESRand_init(){
  6 | 	return _mm_setzero_si128(); 
  7 | }
  8 | 
  9 | static simd128 increment = _mm_set_epi8(0x2f, 0x2b, 0x29, 0x25, 0x1f, 0x1d, 0x17, 0x13, 
 10 | 		0x11, 0x0D, 0x0B, 0x07, 0x05, 0x03, 0x02, 0x01); 
 11 | 
 12 | void AESRand_increment(simd128& state){
 13 | 	state += increment; 
 14 | }
 15 | 
 16 | std::array<simd128, 2> AESRand_rand(const simd128 state){
 17 | 	simd128 penultimate = _mm_aesenc_si128(state, increment); 
 18 | 	return {_mm_aesenc_si128(penultimate, increment), _mm_aesdec_si128(penultimate, increment)};
 19 | }
 20 | 
 21 | static __m128 toFloats(__m128i input){
 22 | 	// Clear the sign and exponent bits, keeping only the 23 mantissa bits
 23 | 	__m128i isolate = _mm_andnot_si128(_mm_set1_epi32(0xff800000), input);
 24 | 
 25 | 	// 0x3f800000 is the bit pattern of the float 1.0; OR-ing it in sets the exponent
 26 | 	__m128i addExponent = _mm_or_si128(_mm_set1_epi32(0x3f800000), isolate);
 27 | 
 28 | 	// Reinterpreted as floats, the values now lie in [1.0, 2.0)
 29 | 	__m128 one = _mm_set1_ps(1.0f);
 30 | 
 31 | 	// Subtracting 1.0 maps them onto [0, 1). Only 23 of each lane's 32 random bits are
 32 | 	// used; the disabled block below sketches a way to reuse the other 9 bits.
 33 | 	__m128 fastResult = _mm_sub_ps(_mm_castsi128_ps(addExponent), one);
 34 | 
 35 | 	return fastResult;
 36 | 
 37 | #if 0
 38 | 	// This code takes the 9-unused bits and uses them in the bottom sometimes. This
 39 | 	// may add extra bits of precision in some cases at the cost of possibly returning
 40 | 	// a denormal.
 41 | 	__m128i unused9bits = _mm_and_si128(_mm_set1_epi32(0xff800000), input);
 42 | 	unused9bits = _mm_srli_epi32(unused9bits, 23);
 43 | 
 44 | 	// Doing an _mm_xor_ps with those 9 bits results in a NaN error. Do the XOR
 45 | 	// in the integer domain, then convert back.
 46 | 
 47 | 	return _mm_castsi128_ps(_mm_xor_si128(_mm_castps_si128(fastResult), unused9bits));
 48 | #endif
 49 | }
 50 | 
 51 | std::array<uint32_t, 8> AESRand_rand_uint32(const simd128 state){
 52 | 	auto rands = AESRand_rand(state); 
 53 | 
 54 | 	std::array<uint32_t, 8> toReturn;
 55 | 	_mm_storeu_si128((__m128i*)&toReturn[0], rands[0]);
 56 | 	_mm_storeu_si128((__m128i*)&toReturn[4], rands[1]);
 57 | 	return toReturn; 
 58 | }
 59 | 
 60 | std::array<float, 8> AESRand_rand_float(const simd128 state){
 61 | 	auto rands = AESRand_rand(state); 
 62 | 	__m128 simd0 = toFloats(rands[0]);
 63 | 	__m128 simd1 = toFloats(rands[1]);
 64 | 
 65 | 	std::array<float, 8> toReturn;
 66 | 	_mm_storeu_ps(&toReturn[0], simd0);
 67 | 	_mm_storeu_ps(&toReturn[4], simd1);
 68 | 	return toReturn; 
 69 | }
 70 | 
 71 | #endif //amd64
 72 | 
 73 | #ifdef _ARCH_PPC64
 74 | 
 75 | // PPC Intrinsics defined in "64-bit ELF V2 ABI Specification", chapter 6 and Appendix A.
 76 | // GCC defines the crypto-extension
 77 | 
 78 | // PowerPC operates on big-endian FIPS 197 compatible AES-vectors
 79 | // Convert to big-endian and back to retain compatibility with AMD64
 80 | static simd128 endianConv(simd128 in){
 81 | 	return vec_perm(in, in, (vector unsigned char){15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0});
 82 | }
 83 | 
 84 | simd128 AESRand_init(){
 85 | 	return (simd128) {0, 0}; 
 86 | }
 87 | 
 88 | static simd128 increment = {0x110d0b0705030201, 0x2f2b29251f1d1713};
 89 | 
 90 | void AESRand_increment(simd128& state){
 91 | 	state += increment; 
 92 | 	//state = vec_add(state, increment); 
 93 | }
 94 | 
 95 | std::array<simd128, 2> AESRand_rand(const simd128 state){
 96 | 	simd128 state_endian = endianConv(state);
 97 | 	simd128 increment_endian = endianConv(increment);
 98 | 	simd128 penultimate = __builtin_crypto_vcipher(state_endian, increment_endian); 
 99 | 	simd128 first_ret = __builtin_crypto_vcipher(penultimate, increment_endian); 
100 | 	simd128 second_ret = __builtin_crypto_vncipher(penultimate, (vector unsigned long long) {0,0}); 
101 | 
102 | 	// Note: this is suboptimal. A "MixColumns" can be applied at compile-time to the
103 | 	// increment_endian value to combine this XOR with the above vncipher command. Depends how much
104 | 	// we care about optimization...
105 | 	second_ret ^= increment_endian;
106 | 	return {endianConv(first_ret), endianConv(second_ret)}; 
107 | }
108 | 
109 | std::array<uint32_t, 8> AESRand_rand_uint32(const simd128 state){
110 | 	auto rands = AESRand_rand(state); 
111 | 
112 | 	std::array<uint32_t, 8> toReturn;
113 | 	toReturn[0] = rands[0][0];
114 | 	toReturn[1] = rands[0][0] >> 32;
115 | 	toReturn[2] = rands[0][1];
116 | 	toReturn[3] = rands[0][1] >> 32;
117 | 	toReturn[4] = rands[1][0];
118 | 	toReturn[5] = rands[1][0] >> 32;
119 | 	toReturn[6] = rands[1][1];
120 | 	toReturn[7] = rands[1][1] >> 32;
121 | //	_mm_storeu_si128((__m128i*)&toReturn[0], rands[0]);
122 | //	_mm_storeu_si128((__m128i*)&toReturn[4], rands[1]);
123 | 	return toReturn; 
124 | }
125 | 
126 | #endif
127 | 
128 | #if __aarch64__
129 | 
130 | simd128 AESRand_init(){
131 | 	// Zero the state directly; XOR-ing an uninitialized vector with itself reads an indeterminate value.
132 | 	return vdupq_n_u8(0);
133 | }
134 | 
135 | // Same bytes as the amd64 increment. _mm_set_epi8 lists bytes from most to least significant; this array is in memory order.
136 | uint8_t increment[16] = {0x01, 0x02, 0x03, 0x05, 0x07, 0x0B, 0x0D, 0x11,
137 | 	0x13, 0x17, 0x1d, 0x1f, 0x25, 0x29, 0x2b, 0x2f};
138 | 
139 | void AESRand_increment(simd128& state){
140 | 	simd128 inc = vld1q_u8(increment); 
141 | 	state = vaddq_u8(state, inc); 
142 | }
143 | 
144 | std::array<simd128, 2> AESRand_rand(const simd128 state){
145 | 	simd128 inc = vld1q_u8(increment);
146 | 	simd128 penultimate_intel = vaesmcq_u8(vaeseq_u8(state, vdupq_n_u8(0)));
147 | 	simd128 penultimate_arm_enc = vaesmcq_u8(vaeseq_u8(penultimate_intel, (inc)));
148 | 	simd128 penultimate_arm_dec = vaesimcq_u8(vaesdq_u8(penultimate_intel, (inc)));
149 | 	return {veorq_u8(penultimate_arm_enc, (inc)), veorq_u8(penultimate_arm_dec, inc)};
150 | }
151 | 
152 | std::array<uint32_t, 8> AESRand_rand_uint32(const simd128 state){
153 | 	auto rands = AESRand_rand(state); 
154 | 
155 | 	std::array<uint32_t, 8> toReturn;
156 | 	vst1q_u8((uint8_t*) &toReturn[0], rands[0]);
157 | 	vst1q_u8((uint8_t*) &toReturn[4], rands[1]);
158 | //	_mm_storeu_si128((__m128i*)&toReturn[0], rands[0]);
159 | //	_mm_storeu_si128((__m128i*)&toReturn[4], rands[1]);
160 | 	return toReturn; 
161 | }
162 | 
163 | /*
164 | static __m128 toFloats(__m128i input){
165 | 	// Clear the sign and exponent bits, keeping only the 23 mantissa bits
166 | 	__m128i isolate = _mm_andnot_si128(_mm_set1_epi32(0xff800000), input);
167 | 
168 | 	// 0x3f800000 is the magic number representing floating point 1.0
169 | 	__m128i addExponent = _mm_or_si128(_mm_set1_epi32(0x3f800000), isolate);
170 | 
171 | 	// Numbers are now in between [1.0, 2.0)
172 | 	__m128 one = _mm_set1_ps(1.0);
173 | 
174 | 	// Result is now in [0, 1), but we may have lost some bits.
175 | 	// We could return now, but... we can regain 9-lost bits without much effort
176 | 	__m128 fastResult = _mm_sub_ps(_mm_castsi128_ps(addExponent), one);
177 | 
178 | 	return fastResult;
179 | 
180 | #if 0
181 | 	// This code takes the 9-unused bits and uses them in the bottom sometimes. This
182 | 	// may add extra bits of precision in some cases at the cost of possibly returning
183 | 	// a denormal.
184 | 	__m128i unused9bits = _mm_and_si128(_mm_set1_epi32(0xff800000), input);
185 | 	unused9bits = _mm_srli_epi32(unused9bits, 23);
186 | 
187 | 	//Doing an _mm_xor_ps with those 9-bits results in a NAN error. Do xors 
188 | 	// in the integer domain, then convert back.
189 | 
190 | 	return _mm_xor_ps(fastResult, _mm_castsi128_ps(unused9bits));
191 | #endif
192 | }
193 | 
194 | std::array<float, 8> AESRand_rand_float(const simd128 state){
195 | 	auto rands = AESRand_rand(state); 
196 | 	__m128 simd0 = toFloats(rands[0]);
197 | 	__m128 simd1 = toFloats(rands[1]);
198 | 
199 | 	std::array<float, 8> toReturn;
200 | 	_mm_storeu_ps(&toReturn[0], simd0);
201 | 	_mm_storeu_ps(&toReturn[4], simd1);
202 | 	return toReturn; 
203 | }
204 | */
205 | #endif
206 | 


--------------------------------------------------------------------------------
/PractRand.md:
--------------------------------------------------------------------------------
  1 | Initial Results from PractRand
  2 | 
  3 | Preliminary PractRand Results on AESRand_increment
  4 | ------------
  5 | 
  6 | RNG_test using PractRand version 0.94
  7 | RNG = RNG_stdin, seed = unknown
  8 | test set = core, folding = standard(unknown format)
  9 | 
 10 | rng=RNG_stdin, seed=unknown
 11 | length= 256 megabytes (2^28 bytes), time= 2.1 seconds
 12 |   no anomalies in 213 test result(s)
 13 | 
 14 | rng=RNG_stdin, seed=unknown
 15 | length= 512 megabytes (2^29 bytes), time= 4.1 seconds
 16 |   no anomalies in 229 test result(s)
 17 | 
 18 | rng=RNG_stdin, seed=unknown
 19 | length= 1 gigabyte (2^30 bytes), time= 7.5 seconds
 20 |   no anomalies in 248 test result(s)
 21 | 
 22 | rng=RNG_stdin, seed=unknown
 23 | length= 2 gigabytes (2^31 bytes), time= 14.2 seconds
 24 |   no anomalies in 266 test result(s)
 25 | 
 26 | rng=RNG_stdin, seed=unknown
 27 | length= 4 gigabytes (2^32 bytes), time= 28.0 seconds
 28 |   no anomalies in 282 test result(s)
 29 | 
 30 | rng=RNG_stdin, seed=unknown
 31 | length= 8 gigabytes (2^33 bytes), time= 53.5 seconds
 32 |   no anomalies in 299 test result(s)
 33 | 
 34 | rng=RNG_stdin, seed=unknown
 35 | length= 16 gigabytes (2^34 bytes), time= 108 seconds
 36 |   no anomalies in 315 test result(s)
 37 | 
 38 | rng=RNG_stdin, seed=unknown
 39 | length= 32 gigabytes (2^35 bytes), time= 208 seconds
 40 |   no anomalies in 328 test result(s)
 41 | 
 42 | rng=RNG_stdin, seed=unknown
 43 | length= 64 gigabytes (2^36 bytes), time= 437 seconds
 44 |   no anomalies in 344 test result(s)
 45 | 
 46 | rng=RNG_stdin, seed=unknown
 47 | length= 128 gigabytes (2^37 bytes), time= 844 seconds
 48 |   no anomalies in 359 test result(s)
 49 | 
 50 | rng=RNG_stdin, seed=unknown
 51 | length= 256 gigabytes (2^38 bytes), time= 1620 seconds
 52 |   no anomalies in 372 test result(s)
 53 | 
 54 | rng=RNG_stdin, seed=unknown
 55 | length= 512 gigabytes (2^39 bytes), time= 3484 seconds
 56 |   no anomalies in 387 test result(s)
 57 | 
 58 | rng=RNG_stdin, seed=unknown
 59 | length= 1 terabyte (2^40 bytes), time= 7024 seconds
 60 |   no anomalies in 401 test result(s)
 61 | 
 62 | rng=RNG_stdin, seed=unknown
 63 | length= 2 terabytes (2^41 bytes), time= 13255 seconds
 64 |   no anomalies in 413 test result(s)
 65 | 
 66 | rng=RNG_stdin, seed=unknown
 67 | length= 4 terabytes (2^42 bytes), time= 27845 seconds
 68 |   no anomalies in 426 test result(s)
 69 | 
 70 | rng=RNG_stdin, seed=unknown
 71 | length= 8 terabytes (2^43 bytes), time= 56894 seconds
 72 |   no anomalies in 438 test result(s)
 73 | 
 74 | Preliminary PractRand Results on AESRand_parallelStream "Plus Pi"
 75 | -------------------------------------------------------
 76 | 
 77 | This version uses:
 78 | 
 79 |     __m128i AESRand_parallelStream(__m128i originalStream) {
 80 |         __m128i copy = originalStream;
 81 |         copy.m128i_u64[1] += 0x3141592653589793; 
 82 |         return copy;
 83 |     }
 84 | 
 85 | One "unusual" result at 4GB of test, but not unusual enough
 86 | to fail PractRand's default settings. 
 87 | 
 88 | RNG_test using PractRand version 0.94
 89 | RNG = RNG_stdin, seed = unknown
 90 | test set = core, folding = standard(unknown format)
 91 | 
 92 | rng=RNG_stdin, seed=unknown
 93 | length= 256 megabytes (2^28 bytes), time= 2.3 seconds
 94 |   no anomalies in 213 test result(s)
 95 | 
 96 | rng=RNG_stdin, seed=unknown
 97 | length= 512 megabytes (2^29 bytes), time= 4.4 seconds
 98 |   no anomalies in 229 test result(s)
 99 | 
100 | rng=RNG_stdin, seed=unknown
101 | length= 1 gigabyte (2^30 bytes), time= 8.0 seconds
102 |   no anomalies in 248 test result(s)
103 | 
104 | rng=RNG_stdin, seed=unknown
105 | length= 2 gigabytes (2^31 bytes), time= 15.0 seconds
106 |   no anomalies in 266 test result(s)
107 | 
108 | rng=RNG_stdin, seed=unknown
109 | length= 4 gigabytes (2^32 bytes), time= 28.3 seconds
110 |   Test Name                         Raw       Processed     Evaluation
111 |   BCFN(2+2,13-0,T)                  R=  +8.2  p =  6.6e-4   unusual
112 |   ...and 281 test result(s) without anomalies
113 | 
114 | rng=RNG_stdin, seed=unknown
115 | length= 8 gigabytes (2^33 bytes), time= 57.6 seconds
116 |   no anomalies in 299 test result(s)
117 | 
118 | rng=RNG_stdin, seed=unknown
119 | length= 16 gigabytes (2^34 bytes), time= 117 seconds
120 |   no anomalies in 315 test result(s)
121 | 
122 | rng=RNG_stdin, seed=unknown
123 | length= 32 gigabytes (2^35 bytes), time= 220 seconds
124 |   no anomalies in 328 test result(s)
125 | 
126 | rng=RNG_stdin, seed=unknown
127 | length= 64 gigabytes (2^36 bytes), time= 461 seconds
128 |   no anomalies in 344 test result(s)
129 | 
130 | rng=RNG_stdin, seed=unknown
131 | length= 128 gigabytes (2^37 bytes), time= 900 seconds
132 |   no anomalies in 359 test result(s)
133 | 
134 | rng=RNG_stdin, seed=unknown
135 | length= 256 gigabytes (2^38 bytes), time= 1733 seconds
136 |   no anomalies in 372 test result(s)
137 | 
138 | rng=RNG_stdin, seed=unknown
139 | length= 512 gigabytes (2^39 bytes), time= 3646 seconds
140 |   no anomalies in 387 test result(s)
141 | 
142 | rng=RNG_stdin, seed=unknown
143 | length= 1 terabyte (2^40 bytes), time= 7479 seconds
144 |   no anomalies in 401 test result(s)
145 | 
146 | rng=RNG_stdin, seed=unknown
147 | length= 2 terabytes (2^41 bytes), time= 14248 seconds
148 |   no anomalies in 413 test result(s)
149 | 
150 | rng=RNG_stdin, seed=unknown
151 | length= 4 terabytes (2^42 bytes), time= 29950 seconds
152 |   no anomalies in 426 test result(s)
153 | 
154 | rng=RNG_stdin, seed=unknown
155 | length= 8 terabytes (2^43 bytes), time= 60801 seconds
156 |   Test Name                         Raw       Processed     Evaluation
157 |   BRank(12):64K(1)                  R= +1078  p~=  1.1e-325   FAIL !!!!!!
158 |   ...and 437 test result(s) without anomalies
159 |  
160 | Preliminary PractRand Results on AESRand_parallelStream Knuth LCGRNG
161 | ----------------
162 | 
163 |  RNG_test using PractRand version 0.94
164 | RNG = RNG_stdin, seed = unknown
165 | test set = core, folding = standard(unknown format)
166 | 
167 | rng=RNG_stdin, seed=unknown
168 | length= 256 megabytes (2^28 bytes), time= 2.1 seconds
169 |   no anomalies in 213 test result(s)
170 | 
171 | rng=RNG_stdin, seed=unknown
172 | length= 512 megabytes (2^29 bytes), time= 3.9 seconds
173 |   no anomalies in 229 test result(s)
174 | 
175 | rng=RNG_stdin, seed=unknown
176 | length= 1 gigabyte (2^30 bytes), time= 7.2 seconds
177 |   no anomalies in 248 test result(s)
178 | 
179 | rng=RNG_stdin, seed=unknown
180 | length= 2 gigabytes (2^31 bytes), time= 13.6 seconds
181 |   no anomalies in 266 test result(s)
182 | 
183 | rng=RNG_stdin, seed=unknown
184 | length= 4 gigabytes (2^32 bytes), time= 25.4 seconds
185 |   no anomalies in 282 test result(s)
186 | 
187 | rng=RNG_stdin, seed=unknown
188 | length= 8 gigabytes (2^33 bytes), time= 52.4 seconds
189 |   no anomalies in 299 test result(s)
190 | 
191 | rng=RNG_stdin, seed=unknown
192 | length= 16 gigabytes (2^34 bytes), time= 109 seconds
193 |   no anomalies in 315 test result(s)
194 | 
195 | rng=RNG_stdin, seed=unknown
196 | length= 32 gigabytes (2^35 bytes), time= 210 seconds
197 |   no anomalies in 328 test result(s)
198 | 
199 | rng=RNG_stdin, seed=unknown
200 | length= 64 gigabytes (2^36 bytes), time= 439 seconds
201 |   no anomalies in 344 test result(s)
202 | 
203 | rng=RNG_stdin, seed=unknown
204 | length= 128 gigabytes (2^37 bytes), time= 861 seconds
205 |   no anomalies in 359 test result(s)
206 | 
207 | rng=RNG_stdin, seed=unknown
208 | length= 256 gigabytes (2^38 bytes), time= 1637 seconds
209 |   no anomalies in 372 test result(s)
210 | 
211 | rng=RNG_stdin, seed=unknown
212 | length= 512 gigabytes (2^39 bytes), time= 3481 seconds
213 |   no anomalies in 387 test result(s)
214 | 
215 | rng=RNG_stdin, seed=unknown
216 | length= 1 terabyte (2^40 bytes), time= 7016 seconds
217 |   no anomalies in 401 test result(s)
218 | 
219 | rng=RNG_stdin, seed=unknown
220 | length= 2 terabytes (2^41 bytes), time= 13248 seconds
221 |   no anomalies in 413 test result(s)
222 | 
223 | rng=RNG_stdin, seed=unknown
224 | length= 4 terabytes (2^42 bytes), time= 27881 seconds
225 |   no anomalies in 426 test result(s)
226 | 
227 | rng=RNG_stdin, seed=unknown
228 | length= 8 terabytes (2^43 bytes), time= 56968 seconds
229 |   Test Name                         Raw       Processed     Evaluation
230 |   BRank(12):64K(1)                  R= +1078  p~=  1.1e-325   FAIL !!!!!!
231 |   ...and 437 test result(s) without anomalies
232 | 


--------------------------------------------------------------------------------
/AESRand/FloatTest/FloatTest.vcxproj:
--------------------------------------------------------------------------------
  1 | <?xml version="1.0" encoding="utf-8"?>
  2 | <Project DefaultTargets="Build" ToolsVersion="15.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  3 |   <ItemGroup Label="ProjectConfigurations">
  4 |     <ProjectConfiguration Include="Debug|Win32">
  5 |       <Configuration>Debug</Configuration>
  6 |       <Platform>Win32</Platform>
  7 |     </ProjectConfiguration>
  8 |     <ProjectConfiguration Include="Release|Win32">
  9 |       <Configuration>Release</Configuration>
 10 |       <Platform>Win32</Platform>
 11 |     </ProjectConfiguration>
 12 |     <ProjectConfiguration Include="Debug|x64">
 13 |       <Configuration>Debug</Configuration>
 14 |       <Platform>x64</Platform>
 15 |     </ProjectConfiguration>
 16 |     <ProjectConfiguration Include="Release|x64">
 17 |       <Configuration>Release</Configuration>
 18 |       <Platform>x64</Platform>
 19 |     </ProjectConfiguration>
 20 |   </ItemGroup>
 21 |   <PropertyGroup Label="Globals">
 22 |     <VCProjectVersion>15.0</VCProjectVersion>
 23 |     <ProjectGuid>{F86DAFE3-4A80-4F98-B2BF-63D36B17BA35}</ProjectGuid>
 24 |     <Keyword>Win32Proj</Keyword>
 25 |     <RootNamespace>FloatTest</RootNamespace>
 26 |     <WindowsTargetPlatformVersion>10.0.17134.0</WindowsTargetPlatformVersion>
 27 |   </PropertyGroup>
 28 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
 29 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
 30 |     <ConfigurationType>Application</ConfigurationType>
 31 |     <UseDebugLibraries>true</UseDebugLibraries>
 32 |     <PlatformToolset>v141</PlatformToolset>
 33 |     <CharacterSet>Unicode</CharacterSet>
 34 |   </PropertyGroup>
 35 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
 36 |     <ConfigurationType>Application</ConfigurationType>
 37 |     <UseDebugLibraries>false</UseDebugLibraries>
 38 |     <PlatformToolset>v141</PlatformToolset>
 39 |     <WholeProgramOptimization>true</WholeProgramOptimization>
 40 |     <CharacterSet>Unicode</CharacterSet>
 41 |   </PropertyGroup>
 42 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
 43 |     <ConfigurationType>Application</ConfigurationType>
 44 |     <UseDebugLibraries>true</UseDebugLibraries>
 45 |     <PlatformToolset>v141</PlatformToolset>
 46 |     <CharacterSet>Unicode</CharacterSet>
 47 |   </PropertyGroup>
 48 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
 49 |     <ConfigurationType>Application</ConfigurationType>
 50 |     <UseDebugLibraries>false</UseDebugLibraries>
 51 |     <PlatformToolset>v141</PlatformToolset>
 52 |     <WholeProgramOptimization>true</WholeProgramOptimization>
 53 |     <CharacterSet>Unicode</CharacterSet>
 54 |   </PropertyGroup>
 55 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
 56 |   <ImportGroup Label="ExtensionSettings">
 57 |   </ImportGroup>
 58 |   <ImportGroup Label="Shared">
 59 |   </ImportGroup>
 60 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
 61 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 62 |   </ImportGroup>
 63 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
 64 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 65 |   </ImportGroup>
 66 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
 67 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 68 |   </ImportGroup>
 69 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
 70 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 71 |   </ImportGroup>
 72 |   <PropertyGroup Label="UserMacros" />
 73 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
 74 |     <LinkIncremental>false</LinkIncremental>
 75 |   </PropertyGroup>
 76 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
 77 |     <LinkIncremental>true</LinkIncremental>
 78 |   </PropertyGroup>
 79 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
 80 |     <LinkIncremental>true</LinkIncremental>
 81 |   </PropertyGroup>
 82 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
 83 |     <LinkIncremental>false</LinkIncremental>
 84 |   </PropertyGroup>
 85 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
 86 |     <ClCompile>
 87 |       <PrecompiledHeader>Use</PrecompiledHeader>
 88 |       <WarningLevel>Level3</WarningLevel>
 89 |       <Optimization>MaxSpeed</Optimization>
 90 |       <FunctionLevelLinking>true</FunctionLevelLinking>
 91 |       <IntrinsicFunctions>true</IntrinsicFunctions>
 92 |       <SDLCheck>true</SDLCheck>
 93 |       <PreprocessorDefinitions>NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
 94 |       <ConformanceMode>true</ConformanceMode>
 95 |       <PrecompiledHeaderFile>pch.h</PrecompiledHeaderFile>
 96 |     </ClCompile>
 97 |     <Link>
 98 |       <SubSystem>Console</SubSystem>
 99 |       <EnableCOMDATFolding>true</EnableCOMDATFolding>
100 |       <OptimizeReferences>true</OptimizeReferences>
101 |       <GenerateDebugInformation>true</GenerateDebugInformation>
102 |     </Link>
103 |   </ItemDefinitionGroup>
104 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
105 |     <ClCompile>
106 |       <PrecompiledHeader>Use</PrecompiledHeader>
107 |       <WarningLevel>Level3</WarningLevel>
108 |       <Optimization>Disabled</Optimization>
109 |       <SDLCheck>true</SDLCheck>
110 |       <PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
111 |       <ConformanceMode>true</ConformanceMode>
112 |       <PrecompiledHeaderFile>pch.h</PrecompiledHeaderFile>
113 |     </ClCompile>
114 |     <Link>
115 |       <SubSystem>Console</SubSystem>
116 |       <GenerateDebugInformation>true</GenerateDebugInformation>
117 |     </Link>
118 |   </ItemDefinitionGroup>
119 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
120 |     <ClCompile>
121 |       <PrecompiledHeader>Use</PrecompiledHeader>
122 |       <WarningLevel>Level3</WarningLevel>
123 |       <Optimization>Disabled</Optimization>
124 |       <SDLCheck>true</SDLCheck>
125 |       <PreprocessorDefinitions>_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
126 |       <ConformanceMode>true</ConformanceMode>
127 |       <PrecompiledHeaderFile>pch.h</PrecompiledHeaderFile>
128 |     </ClCompile>
129 |     <Link>
130 |       <SubSystem>Console</SubSystem>
131 |       <GenerateDebugInformation>true</GenerateDebugInformation>
132 |     </Link>
133 |   </ItemDefinitionGroup>
134 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
135 |     <ClCompile>
136 |       <PrecompiledHeader>Use</PrecompiledHeader>
137 |       <WarningLevel>Level3</WarningLevel>
138 |       <Optimization>MaxSpeed</Optimization>
139 |       <FunctionLevelLinking>true</FunctionLevelLinking>
140 |       <IntrinsicFunctions>true</IntrinsicFunctions>
141 |       <SDLCheck>true</SDLCheck>
142 |       <PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
143 |       <ConformanceMode>true</ConformanceMode>
144 |       <PrecompiledHeaderFile>pch.h</PrecompiledHeaderFile>
145 |     </ClCompile>
146 |     <Link>
147 |       <SubSystem>Console</SubSystem>
148 |       <EnableCOMDATFolding>true</EnableCOMDATFolding>
149 |       <OptimizeReferences>true</OptimizeReferences>
150 |       <GenerateDebugInformation>true</GenerateDebugInformation>
151 |     </Link>
152 |   </ItemDefinitionGroup>
153 |   <ItemGroup>
154 |     <ClInclude Include="pch.h" />
155 |   </ItemGroup>
156 |   <ItemGroup>
157 |     <ClCompile Include="FloatTest.cpp" />
158 |     <ClCompile Include="pch.cpp">
159 |       <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Create</PrecompiledHeader>
160 |       <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Create</PrecompiledHeader>
161 |       <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Create</PrecompiledHeader>
162 |       <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Create</PrecompiledHeader>
163 |     </ClCompile>
164 |   </ItemGroup>
165 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
166 |   <ImportGroup Label="ExtensionTargets">
167 |   </ImportGroup>
168 | </Project>


--------------------------------------------------------------------------------
/AESRand/IntegerRangeTest/IntegerRangeTest.vcxproj:
--------------------------------------------------------------------------------
  1 | <?xml version="1.0" encoding="utf-8"?>
  2 | <Project DefaultTargets="Build" ToolsVersion="15.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  3 |   <ItemGroup Label="ProjectConfigurations">
  4 |     <ProjectConfiguration Include="Debug|Win32">
  5 |       <Configuration>Debug</Configuration>
  6 |       <Platform>Win32</Platform>
  7 |     </ProjectConfiguration>
  8 |     <ProjectConfiguration Include="Release|Win32">
  9 |       <Configuration>Release</Configuration>
 10 |       <Platform>Win32</Platform>
 11 |     </ProjectConfiguration>
 12 |     <ProjectConfiguration Include="Debug|x64">
 13 |       <Configuration>Debug</Configuration>
 14 |       <Platform>x64</Platform>
 15 |     </ProjectConfiguration>
 16 |     <ProjectConfiguration Include="Release|x64">
 17 |       <Configuration>Release</Configuration>
 18 |       <Platform>x64</Platform>
 19 |     </ProjectConfiguration>
 20 |   </ItemGroup>
 21 |   <PropertyGroup Label="Globals">
 22 |     <VCProjectVersion>15.0</VCProjectVersion>
 23 |     <ProjectGuid>{14562E75-9BAB-4663-BFE3-51C96298EC81}</ProjectGuid>
 24 |     <Keyword>Win32Proj</Keyword>
 25 |     <RootNamespace>IntegerRangeTest</RootNamespace>
 26 |     <WindowsTargetPlatformVersion>10.0.17134.0</WindowsTargetPlatformVersion>
 27 |   </PropertyGroup>
 28 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
 29 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
 30 |     <ConfigurationType>Application</ConfigurationType>
 31 |     <UseDebugLibraries>true</UseDebugLibraries>
 32 |     <PlatformToolset>v141</PlatformToolset>
 33 |     <CharacterSet>Unicode</CharacterSet>
 34 |   </PropertyGroup>
 35 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
 36 |     <ConfigurationType>Application</ConfigurationType>
 37 |     <UseDebugLibraries>false</UseDebugLibraries>
 38 |     <PlatformToolset>v141</PlatformToolset>
 39 |     <WholeProgramOptimization>true</WholeProgramOptimization>
 40 |     <CharacterSet>Unicode</CharacterSet>
 41 |   </PropertyGroup>
 42 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
 43 |     <ConfigurationType>Application</ConfigurationType>
 44 |     <UseDebugLibraries>true</UseDebugLibraries>
 45 |     <PlatformToolset>v141</PlatformToolset>
 46 |     <CharacterSet>Unicode</CharacterSet>
 47 |   </PropertyGroup>
 48 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
 49 |     <ConfigurationType>Application</ConfigurationType>
 50 |     <UseDebugLibraries>false</UseDebugLibraries>
 51 |     <PlatformToolset>v141</PlatformToolset>
 52 |     <WholeProgramOptimization>true</WholeProgramOptimization>
 53 |     <CharacterSet>Unicode</CharacterSet>
 54 |   </PropertyGroup>
 55 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
 56 |   <ImportGroup Label="ExtensionSettings">
 57 |   </ImportGroup>
 58 |   <ImportGroup Label="Shared">
 59 |   </ImportGroup>
 60 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
 61 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 62 |   </ImportGroup>
 63 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
 64 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 65 |   </ImportGroup>
 66 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
 67 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 68 |   </ImportGroup>
 69 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
 70 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 71 |   </ImportGroup>
 72 |   <PropertyGroup Label="UserMacros" />
 73 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
 74 |     <LinkIncremental>false</LinkIncremental>
 75 |   </PropertyGroup>
 76 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
 77 |     <LinkIncremental>true</LinkIncremental>
 78 |   </PropertyGroup>
 79 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
 80 |     <LinkIncremental>true</LinkIncremental>
 81 |   </PropertyGroup>
 82 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
 83 |     <LinkIncremental>false</LinkIncremental>
 84 |   </PropertyGroup>
 85 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
 86 |     <ClCompile>
 87 |       <PrecompiledHeader>Use</PrecompiledHeader>
 88 |       <WarningLevel>Level3</WarningLevel>
 89 |       <Optimization>MaxSpeed</Optimization>
 90 |       <FunctionLevelLinking>true</FunctionLevelLinking>
 91 |       <IntrinsicFunctions>true</IntrinsicFunctions>
 92 |       <SDLCheck>true</SDLCheck>
 93 |       <PreprocessorDefinitions>NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
 94 |       <ConformanceMode>true</ConformanceMode>
 95 |       <PrecompiledHeaderFile>pch.h</PrecompiledHeaderFile>
 96 |       <LanguageStandard>stdcpp17</LanguageStandard>
 97 |     </ClCompile>
 98 |     <Link>
 99 |       <SubSystem>Console</SubSystem>
100 |       <EnableCOMDATFolding>true</EnableCOMDATFolding>
101 |       <OptimizeReferences>true</OptimizeReferences>
102 |       <GenerateDebugInformation>true</GenerateDebugInformation>
103 |     </Link>
104 |   </ItemDefinitionGroup>
105 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
106 |     <ClCompile>
107 |       <PrecompiledHeader>Use</PrecompiledHeader>
108 |       <WarningLevel>Level3</WarningLevel>
109 |       <Optimization>Disabled</Optimization>
110 |       <SDLCheck>true</SDLCheck>
111 |       <PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
112 |       <ConformanceMode>true</ConformanceMode>
113 |       <PrecompiledHeaderFile>pch.h</PrecompiledHeaderFile>
114 |     </ClCompile>
115 |     <Link>
116 |       <SubSystem>Console</SubSystem>
117 |       <GenerateDebugInformation>true</GenerateDebugInformation>
118 |     </Link>
119 |   </ItemDefinitionGroup>
120 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
121 |     <ClCompile>
122 |       <PrecompiledHeader>Use</PrecompiledHeader>
123 |       <WarningLevel>Level3</WarningLevel>
124 |       <Optimization>Disabled</Optimization>
125 |       <SDLCheck>true</SDLCheck>
126 |       <PreprocessorDefinitions>_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
127 |       <ConformanceMode>true</ConformanceMode>
128 |       <PrecompiledHeaderFile>pch.h</PrecompiledHeaderFile>
129 |       <LanguageStandard>stdcpp17</LanguageStandard>
130 |     </ClCompile>
131 |     <Link>
132 |       <SubSystem>Console</SubSystem>
133 |       <GenerateDebugInformation>true</GenerateDebugInformation>
134 |     </Link>
135 |   </ItemDefinitionGroup>
136 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
137 |     <ClCompile>
138 |       <PrecompiledHeader>Use</PrecompiledHeader>
139 |       <WarningLevel>Level3</WarningLevel>
140 |       <Optimization>MaxSpeed</Optimization>
141 |       <FunctionLevelLinking>true</FunctionLevelLinking>
142 |       <IntrinsicFunctions>true</IntrinsicFunctions>
143 |       <SDLCheck>true</SDLCheck>
144 |       <PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
145 |       <ConformanceMode>true</ConformanceMode>
146 |       <PrecompiledHeaderFile>pch.h</PrecompiledHeaderFile>
147 |     </ClCompile>
148 |     <Link>
149 |       <SubSystem>Console</SubSystem>
150 |       <EnableCOMDATFolding>true</EnableCOMDATFolding>
151 |       <OptimizeReferences>true</OptimizeReferences>
152 |       <GenerateDebugInformation>true</GenerateDebugInformation>
153 |     </Link>
154 |   </ItemDefinitionGroup>
155 |   <ItemGroup>
156 |     <ClInclude Include="pch.h" />
157 |   </ItemGroup>
158 |   <ItemGroup>
159 |     <ClCompile Include="IntegerRangeTest.cpp" />
160 |     <ClCompile Include="pch.cpp">
161 |       <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Create</PrecompiledHeader>
162 |       <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Create</PrecompiledHeader>
163 |       <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Create</PrecompiledHeader>
164 |       <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Create</PrecompiledHeader>
165 |     </ClCompile>
166 |   </ItemGroup>
167 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
168 |   <ImportGroup Label="ExtensionTargets">
169 |   </ImportGroup>
170 | </Project>


--------------------------------------------------------------------------------
/AESRand/AESRand/AESRand.vcxproj:
--------------------------------------------------------------------------------
  1 | <?xml version="1.0" encoding="utf-8"?>
  2 | <Project DefaultTargets="Build" ToolsVersion="15.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  3 |   <ItemGroup Label="ProjectConfigurations">
  4 |     <ProjectConfiguration Include="Debug|Win32">
  5 |       <Configuration>Debug</Configuration>
  6 |       <Platform>Win32</Platform>
  7 |     </ProjectConfiguration>
  8 |     <ProjectConfiguration Include="Release|Win32">
  9 |       <Configuration>Release</Configuration>
 10 |       <Platform>Win32</Platform>
 11 |     </ProjectConfiguration>
 12 |     <ProjectConfiguration Include="Debug|x64">
 13 |       <Configuration>Debug</Configuration>
 14 |       <Platform>x64</Platform>
 15 |     </ProjectConfiguration>
 16 |     <ProjectConfiguration Include="Release|x64">
 17 |       <Configuration>Release</Configuration>
 18 |       <Platform>x64</Platform>
 19 |     </ProjectConfiguration>
 20 |   </ItemGroup>
 21 |   <PropertyGroup Label="Globals">
 22 |     <VCProjectVersion>15.0</VCProjectVersion>
 23 |     <ProjectGuid>{F91B1300-34D7-459B-B40C-3479AF111436}</ProjectGuid>
 24 |     <Keyword>Win32Proj</Keyword>
 25 |     <RootNamespace>AESRand</RootNamespace>
 26 |     <WindowsTargetPlatformVersion>10.0.17134.0</WindowsTargetPlatformVersion>
 27 |   </PropertyGroup>
 28 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
 29 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
 30 |     <ConfigurationType>Application</ConfigurationType>
 31 |     <UseDebugLibraries>true</UseDebugLibraries>
 32 |     <PlatformToolset>v141</PlatformToolset>
 33 |     <CharacterSet>Unicode</CharacterSet>
 34 |   </PropertyGroup>
 35 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
 36 |     <ConfigurationType>Application</ConfigurationType>
 37 |     <UseDebugLibraries>false</UseDebugLibraries>
 38 |     <PlatformToolset>v141</PlatformToolset>
 39 |     <WholeProgramOptimization>true</WholeProgramOptimization>
 40 |     <CharacterSet>Unicode</CharacterSet>
 41 |   </PropertyGroup>
 42 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
 43 |     <ConfigurationType>Application</ConfigurationType>
 44 |     <UseDebugLibraries>true</UseDebugLibraries>
 45 |     <PlatformToolset>v141</PlatformToolset>
 46 |     <CharacterSet>Unicode</CharacterSet>
 47 |   </PropertyGroup>
 48 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
 49 |     <ConfigurationType>Application</ConfigurationType>
 50 |     <UseDebugLibraries>false</UseDebugLibraries>
 51 |     <PlatformToolset>v141</PlatformToolset>
 52 |     <WholeProgramOptimization>true</WholeProgramOptimization>
 53 |     <CharacterSet>Unicode</CharacterSet>
 54 |   </PropertyGroup>
 55 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
 56 |   <ImportGroup Label="ExtensionSettings">
 57 |   </ImportGroup>
 58 |   <ImportGroup Label="Shared">
 59 |   </ImportGroup>
 60 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
 61 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 62 |   </ImportGroup>
 63 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
 64 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 65 |   </ImportGroup>
 66 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
 67 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 68 |   </ImportGroup>
 69 |   <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
 70 |     <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
 71 |   </ImportGroup>
 72 |   <PropertyGroup Label="UserMacros" />
 73 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
 74 |     <LinkIncremental>true</LinkIncremental>
 75 |   </PropertyGroup>
 76 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
 77 |     <LinkIncremental>true</LinkIncremental>
 78 |   </PropertyGroup>
 79 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
 80 |     <LinkIncremental>false</LinkIncremental>
 81 |   </PropertyGroup>
 82 |   <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
 83 |     <LinkIncremental>false</LinkIncremental>
 84 |   </PropertyGroup>
 85 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
 86 |     <ClCompile>
 87 |       <PrecompiledHeader>Use</PrecompiledHeader>
 88 |       <WarningLevel>Level3</WarningLevel>
 89 |       <Optimization>Disabled</Optimization>
 90 |       <SDLCheck>true</SDLCheck>
 91 |       <PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
 92 |       <ConformanceMode>true</ConformanceMode>
 93 |       <PrecompiledHeaderFile>pch.h</PrecompiledHeaderFile>
 94 |     </ClCompile>
 95 |     <Link>
 96 |       <SubSystem>Console</SubSystem>
 97 |       <GenerateDebugInformation>true</GenerateDebugInformation>
 98 |     </Link>
 99 |   </ItemDefinitionGroup>
100 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
101 |     <ClCompile>
102 |       <PrecompiledHeader>Use</PrecompiledHeader>
103 |       <WarningLevel>Level3</WarningLevel>
104 |       <Optimization>Disabled</Optimization>
105 |       <SDLCheck>true</SDLCheck>
106 |       <PreprocessorDefinitions>_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
107 |       <ConformanceMode>true</ConformanceMode>
108 |       <PrecompiledHeaderFile>pch.h</PrecompiledHeaderFile>
109 |       <EnableEnhancedInstructionSet>AdvancedVectorExtensions2</EnableEnhancedInstructionSet>
110 |       <AssemblerOutput>All</AssemblerOutput>
111 |     </ClCompile>
112 |     <Link>
113 |       <SubSystem>Console</SubSystem>
114 |       <GenerateDebugInformation>true</GenerateDebugInformation>
115 |     </Link>
116 |   </ItemDefinitionGroup>
117 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
118 |     <ClCompile>
119 |       <PrecompiledHeader>Use</PrecompiledHeader>
120 |       <WarningLevel>Level3</WarningLevel>
121 |       <Optimization>MaxSpeed</Optimization>
122 |       <FunctionLevelLinking>true</FunctionLevelLinking>
123 |       <IntrinsicFunctions>true</IntrinsicFunctions>
124 |       <SDLCheck>true</SDLCheck>
125 |       <PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
126 |       <ConformanceMode>true</ConformanceMode>
127 |       <PrecompiledHeaderFile>pch.h</PrecompiledHeaderFile>
128 |     </ClCompile>
129 |     <Link>
130 |       <SubSystem>Console</SubSystem>
131 |       <EnableCOMDATFolding>true</EnableCOMDATFolding>
132 |       <OptimizeReferences>true</OptimizeReferences>
133 |       <GenerateDebugInformation>true</GenerateDebugInformation>
134 |     </Link>
135 |   </ItemDefinitionGroup>
136 |   <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
137 |     <ClCompile>
138 |       <PrecompiledHeader>Use</PrecompiledHeader>
139 |       <WarningLevel>Level3</WarningLevel>
140 |       <Optimization>MaxSpeed</Optimization>
141 |       <FunctionLevelLinking>true</FunctionLevelLinking>
142 |       <IntrinsicFunctions>true</IntrinsicFunctions>
143 |       <SDLCheck>true</SDLCheck>
144 |       <PreprocessorDefinitions>NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
145 |       <ConformanceMode>true</ConformanceMode>
146 |       <PrecompiledHeaderFile>pch.h</PrecompiledHeaderFile>
147 |       <EnableEnhancedInstructionSet>AdvancedVectorExtensions2</EnableEnhancedInstructionSet>
148 |       <AssemblerOutput>All</AssemblerOutput>
149 |     </ClCompile>
150 |     <Link>
151 |       <SubSystem>Console</SubSystem>
152 |       <EnableCOMDATFolding>true</EnableCOMDATFolding>
153 |       <OptimizeReferences>true</OptimizeReferences>
154 |       <GenerateDebugInformation>true</GenerateDebugInformation>
155 |     </Link>
156 |   </ItemDefinitionGroup>
157 |   <ItemGroup>
158 |     <ClInclude Include="pch.h" />
159 |   </ItemGroup>
160 |   <ItemGroup>
161 |     <ClCompile Include="AESRand.cpp" />
162 |     <ClCompile Include="others.cpp" />
163 |     <ClCompile Include="pch.cpp">
164 |       <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Create</PrecompiledHeader>
165 |       <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Create</PrecompiledHeader>
166 |       <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Create</PrecompiledHeader>
167 |       <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Create</PrecompiledHeader>
168 |     </ClCompile>
169 |   </ItemGroup>
170 |   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
171 |   <ImportGroup Label="ExtensionTargets">
172 |   </ImportGroup>
173 | </Project>


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # AESRand
  2 | A prototype implementation of a pseudo-RNG based on hardware-accelerated AES instructions and 128-bit SIMD
  3 | 
  4 | TL;DR
  5 | --------
  6 | * State: 128 bits (One XMM register)
  7 | * 256-bits / 32-bytes generated per iteration
  8 | * Incredible speed: roughly 3.7 CPU cycles per iteration
  9 | * Cycle Length: 2^64
 10 | * Independent Streams: 2^64
 11 | * Core RNG passes 8TB+ tests. Parallel-stream generator fails at 8TB of PractRand (passes 4TB)
 12 | * Tested: ~29.2 GBps (Gigabytes per second) single-thread / single-core. 
 13 | * Dual-stream version achieves 37.1 GBps
 14 | * A throughput of ~8.5 Bytes per cycle. Or roughly 3.73 cycles per 256-bit iteration.
 15 | * Faster than xoshiro256plus, pcg32, and std::mt19937
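
The cycle and throughput figures above can be cross-checked with a little arithmetic. The following is a minimal sketch, not code from this repository: the function names are hypothetical, and it assumes "GB" means 10^9 bytes. The README does not state the clock speed, so the implied clock is only a consistency check.

```cpp
// 256 bits are produced per iteration.
constexpr double kBytesPerIteration = 32.0;

// Bytes generated per CPU cycle for a measured cycles-per-iteration figure.
constexpr double bytes_per_cycle(double cycles_per_iteration) {
    return kBytesPerIteration / cycles_per_iteration;
}

// Clock rate in GHz implied by a measured throughput (GB/s, assuming
// 1 GB = 10^9 bytes) and a measured cycles-per-iteration figure.
constexpr double implied_clock_ghz(double gb_per_second, double cycles_per_iteration) {
    return gb_per_second / bytes_per_cycle(cycles_per_iteration);
}
```

With 3.73 cycles per iteration this gives roughly 8.6 bytes per cycle, and 29.2 GBps then corresponds to a clock of about 3.4 GHz under the stated assumption.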
 16 | 
 17 | The shortest sample code is in the [Simplified Linux Version](AESRand_Linux/AESRand.cpp).
 18 | Commentary is provided in this README, as well as through many comments in the Windows version.
 19 | The Windows version contains a self-included benchmark to compare against the speed of 
 20 | xoshiro256, pcg32, and std::mt19937.
 21 | 
 22 | Design Principles
 23 | -------
 24 | 
 25 | 1. AESRound-based. Every x86 CPU since Intel Westmere (2010) and AMD Bulldozer (2011) can execute
 26 | not only a single "aesenc" instruction (AES Encode Round)... but can also
 27 | execute them incredibly quickly: at least one per cycle. AMD Ryzen / EPYC CPUs 
 28 | can even execute them TWICE per cycle if they are independent. With a latency
 29 | of roughly 4-cycles on modern Intel Skylake CPUs, the 128-bit AES-encode 
 30 | instruction is faster than a 64-bit multiply.
 31 | 
 32 | 	Note: AESRound is implemented on all major CPUs of the modern (Nov 2018) era.
 33 | 	Power9 has the vcipher instruction, which seems to be identical to the x86 aesenc 
 34 | 	instruction. ARM unfortunately plays a bit differently, but a sequence of AESE, 
 35 | 	AESMC, and XOR would replicate the x86 "aesenc" instruction.
 36 | 
 37 | 	AESD, AESIMC, and XOR would together be equivalent to an x86 "aesdec" instruction.
 38 | 	It is time to take advantage of the universal AES-hardware instructions
 39 | 	embedded in all of our CPUs. Even our cell phones can do this in 2018.
 40 | 
 41 | 2. SIMD-acceleration -- Modern computers are 128-bit, 256-bit, or even 512-bit machines.
 42 | Because AES is only defined for 128-bits, I stick with 128-bit. Power9 and ARM machines
 43 | also support 128-bit SIMD easily. Future CPUs will probably be more SIMD-heavy. If anyone
 44 | can think of how to extend this concept out to 256-bit (YMM) registers and beyond, they
 45 | probably can beat the results I have here!
 46 | 
 47 | 3. PCG-random.org "Simple counter" + "Mixer" design -- PCG-random.org has a two-step
 48 | RNG process. The "counter" (which was a multiply-based LCGRNG in the pcg32_random_r code), and
 49 | a "mixer" (which was a simple shift add xor hash-function). AESRand_increment serves as
 50 | the "counter", while AESRand_rand serves as the "mixer".
 51 | 
 52 | 4. Minimum latency on the "Counter" -- The latency of the counter-portion of this RNG
 53 | (AESRand_increment) is the absolute limit to the speed of any RNG. If it takes 5-cycles to
 54 | update the state, your RNG will take 5-cycles (or more) per iteration. I've minimized
 55 | the latency of AESRand_increment to 1-cycle, the absolute minimum latency.
 56 | 
 57 | 5. Instruction level parallelism (ILP) -- All instructions of the "mixer" portion of the RNG
 58 | (AESRand_rand) have a throughput of 1-per-cycle or more. AMD Zen can execute two AES
 59 | instructions per clock (and thus has a throughput of 2-per-cycle!!). Notice the 
 60 | signature of AESRand_rand(const \__m128i state). The state MUST be a constant to take
 61 | advantage of ILP. Aside from the counter-latency, each iteration i can execute in parallel
 62 | with future iterations i+1, i+2, i+3, etc. etc. Modern CPUs are incredibly good at capturing 
 63 | this parallelism and internally pipelining the AES-instructions of the mixer. ILP allows you
 64 | to beat the latency-characteristics of your instructions. For example, every iteration
 65 | has a latency of 4 cycles per AESENC, or 8-cycles of latency total. However, I've tested
 66 | 3.7 cycles per iteration. The magic of ILP makes this possible. 
 67 | 
 68 | 6. Full invertibility -- http://www.burtleburtle.net/bob/hash/doobs.html The JOAAT hash has a concept
 69 | of a "bit funnel", which is a BAD thing for hashes. If you provably have full-invertibility, it means you
 70 | never lose information. It's kind of a hard concept to describe, but it is fundamental to the design
 71 | of RNGs, cryptography, and so forth. Arithmetic over GF(2) fields is built entirely around
 72 | the concept of invertible operations. The XOR, Add, and AES-encode instructions all have inverses
 73 | (XOR, Subtract, and AES-decode respectively), and therefore have the greatest chance of passing
 74 | statistical tests... as long as the bits are "shuffled" enough.
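
The counter/mixer split described in points 3 to 5 can be sketched portably. This is not the AESRand code: `SketchState`, `sketch_increment`, `sketch_rand`, and `sketch_next` are hypothetical names, the state is shrunk to 64 bits, and a SplitMix64-style multiply/xorshift finalizer stands in for the AES rounds so that the example compiles without AES intrinsics. What it is meant to show is the shape: a one-instruction counter update, and a mixer that only reads the state.

```cpp
#include <cstdint>

// Hypothetical 64-bit stand-in for the 128-bit XMM state.
struct SketchState {
    std::uint64_t counter;
};

// "Counter": a single add. Because the increment is odd, the counter
// visits all 2^64 values before repeating.
inline void sketch_increment(SketchState& s) {
    s.counter += 0x9E3779B97F4A7C15ULL;
}

// "Mixer": takes the state by const value and never writes to it, so the
// mixing work for iteration i does not delay the counter for iteration i+1.
// Every step (xorshift, multiply by an odd constant) is invertible.
inline std::uint64_t sketch_rand(const SketchState s) {
    std::uint64_t z = s.counter;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// One iteration: mix the current state, then advance the counter.
inline std::uint64_t sketch_next(SketchState& s) {
    const std::uint64_t out = sketch_rand(s);
    sketch_increment(s);
    return out;
}
```

The only loop-carried dependency is the add in `sketch_increment`, which is why the counter latency, not the mixer latency, bounds the iteration rate.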
 75 | 
 76 | Benchmark Results
 77 | --------
 78 | 
 79 | [Click here](BenchmarkResults.md) for the latest benchmark results.
 80 | 
 81 | This is a very simple timer-based benchmark, where I simply run the various RNGs to be tested
 82 | (AESRand, mt19937, pcg32, and xoshiro256plus) in a tight loop of 5-billion iterations. To ensure that
 83 | the optimizer does NOT remove the RNG code, I have a "total" value that adds up every output
 84 | of the RNG, and eventually prints it out to the screen.
 85 | 
 86 | Before and after the 5-billion-iteration loop, I call Windows' "QueryPerformanceCounter" to log the time.
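
The structure of such a timer-based benchmark can be sketched as follows. This is an illustration under stated assumptions rather than the benchmark in this repository: `std::chrono::steady_clock` is swapped in for QueryPerformanceCounter so that the sketch is portable, and `standin_rng` is a hypothetical xorshift-style generator that exists only to give the loop something to time. `BenchResult` and `run_benchmark` are likewise hypothetical names.

```cpp
#include <chrono>
#include <cstdint>

struct BenchResult {
    std::uint64_t total;  // sum of every output; using it keeps the loop from being optimized away
    double seconds;       // wall-clock time spent in the loop
};

// Hypothetical stand-in generator (xorshift-style), not one of the RNGs under test.
inline std::uint64_t standin_rng(std::uint64_t& state) {
    state ^= state << 13;
    state ^= state >> 7;
    state ^= state << 17;
    return state;
}

inline BenchResult run_benchmark(std::uint64_t iterations) {
    std::uint64_t state = 0x2545F4914F6CDD1DULL;  // arbitrary nonzero seed
    std::uint64_t total = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        total += standin_rng(state);  // accumulate so every output is observable
    }
    const auto stop = std::chrono::steady_clock::now();
    return {total, std::chrono::duration<double>(stop - start).count()};
}
```

A caller would print `total` alongside the elapsed time; printing the sum is what forces the compiler to keep the generator calls.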
 87 | 
 88 | I checked the generated assembly (After building in VS2017, check the "AESRand.cod" file).
 89 | The "mt19937" code was NOT inlined, which may be a disadvantage and why it's so much slower than
 90 | the other RNGs.
 91 | 
 92 | PCG32 and xoshiro256plus were both inlined well. I wasn't sure how well they'd adapt to ILP, so I
 93 | created a 4x manually unrolled version for both of them. The unrolled versions don't seem to be
 94 | faster or slower. I admit that I haven't used those RNGs before, so I'm not entirely sure if I've
 95 | set up their ideal conditions.
 96 | 
 97 | 
 98 | Weaknesses and Future Work
 99 | ----------------
100 | AESRand is surprisingly BAD at 1-bit changes. If I changed the increment to a single-bit change
101 | like [0x1, 0, 0, 0, ...], it would take 4, maybe 5 aesenc instructions before the code could get
102 | above 8GB of tests in PractRand.
103 | 
104 | I experimented with various other reversible functions documented on Lemire's blog
105 | https://lemire.me/blog/2016/08/09/how-many-reversible-integer-operations-do-you-know/. XOR, Adds,
106 | bitshifts, multiplies-with-odd numbers, and more are all interesting, but the AES-instructions
107 | seemed to mix bits better than any of the primitive instructions.
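
To make "reversible" concrete, here is a minimal sketch of three of the primitive operations mentioned above (add, XOR, multiply by an odd number) together with their inverses on 64-bit words. The constants and the names `scramble`, `unscramble`, and `inverse_odd` are hypothetical, chosen for illustration only. None of this is part of AESRand.

```cpp
#include <cstdint>

constexpr std::uint64_t kAdd = 0x9E3779B97F4A7C15ULL;  // arbitrary constant
constexpr std::uint64_t kXor = 0xD1B54A32D192ED03ULL;  // arbitrary constant
constexpr std::uint64_t kMul = 0xBF58476D1CE4E5B9ULL;  // must be odd to be invertible mod 2^64

// Multiplicative inverse of an odd number modulo 2^64 via Newton iteration.
// x = a is already correct in the low 3 bits; each step doubles the number
// of correct bits (3, 6, 12, 24, 48, 96), so 5 steps cover all 64 bits.
inline std::uint64_t inverse_odd(std::uint64_t a) {
    std::uint64_t x = a;
    for (int i = 0; i < 5; ++i) {
        x *= 2 - a * x;
    }
    return x;
}

// Forward direction: add, then XOR, then multiply by an odd constant.
inline std::uint64_t scramble(std::uint64_t x) {
    x += kAdd;
    x ^= kXor;
    x *= kMul;
    return x;
}

// Inverse direction: undo each step in the opposite order.
inline std::uint64_t unscramble(std::uint64_t y) {
    y *= inverse_odd(kMul);
    y ^= kXor;
    y -= kAdd;
    return y;
}
```

Because every step has an inverse, `unscramble(scramble(x)) == x` for every 64-bit `x`, so no two inputs can collide.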
108 | 
109 | The one instruction that holds a lot of promise is PCLMULQDQ (Carry-less Multiply). This is a
110 | 64-bit x 64-bit polynomial multiply on 128-bit XMM registers. Roughly 3 or 4 PCLMULQDQ, along
111 | with some bitshifts and XORs, could implement the 128-bit carryless multiply used in GCM 
112 | (galois counter mode). And this seems to be a very good way to "disperse bits" and create
113 | an avalanche-effect.
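
For readers unfamiliar with carry-less multiplication, the 64-bit x 64-bit polynomial multiply can be written in plain software. This is a bit-by-bit reference sketch with hypothetical names (`U128`, `clmul64`). It shows the arithmetic only and makes no attempt to match the speed of the hardware instruction.

```cpp
#include <cstdint>

// 128-bit result split into two 64-bit halves.
struct U128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Carry-less (polynomial over GF(2)) multiply: like schoolbook binary
// multiplication, but partial products are combined with XOR instead of
// addition, so no carries propagate between bit positions.
inline U128 clmul64(std::uint64_t a, std::uint64_t b) {
    U128 r{0, 0};
    for (unsigned i = 0; i < 64; ++i) {
        if ((b >> i) & 1u) {
            r.lo ^= a << i;
            if (i != 0) {
                r.hi ^= a >> (64 - i);  // bits of a shifted past bit 63
            }
        }
    }
    return r;
}
```

As a small check, `clmul64(3, 3)` yields 5: (x + 1)^2 = x^2 + 1 over GF(2), whereas ordinary integer multiplication gives 9.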
114 | 
115 | Furthermore, 64-bit carryless multiply is implemented on x86 (PCLMULQDQ), ARMv8 (PMULL and PMULL2
116 | on ARM64, VMULL on ARM32), and Power9 (vpmsumd: Vector Polynomial Multiply-Sum Doubleword). These instructions serve
117 | as the basis for GCM-mode, Elliptic Curve Cryptography, and other important developments in the modern
118 | cipher world. I expect all future CPUs to have carryless-multiply implemented due to their importance
119 | to the cryptography community.
120 | 
121 | However, my Threadripper 1950x appears to run the PCLMULQDQ instruction as microcode, and thus it only has
122 | a throughput of one PCLMULQDQ every TWO cycles (4x less throughput than aesenc). In effect, running
123 | aesenc 4x in a row has more throughput, on my machine at least. Intel machines are documented to run 
124 | one PCLMULQDQ per cycle, and thus PCLMULQDQ may be a faster base to use on Intel machines. Further investigation 
125 | into the relative speeds of these cryptography instructions, across the different modern CPUs could be important.
126 | 
127 | aesenc has 4 steps: SubBytes, ShiftRows, MixColumns, and XOR Round Key. SubBytes is absolutely excellent for
128 | RNG work. ShiftRows is useful, but only with multiple AES-instructions in a row. MixColumns is unfortunately 
129 | only a 32-bit operation, albeit parallel across 4-different 32-bit values. Still, a single aesenc or aesdec 
130 | disperses bits across 32-bits of the state. After two rounds of AES, any particular input bit only 
131 | affects half of the bits: 64-bits per 128-bit XMM register (or a total of ~128-bits of the 256-bit output)
132 | 
133 | So 2-rounds of AES is NOT sufficient to have a proper avalanche (defined as a 50% chance to flip any bit of 
134 | the output). I get around the severe 1-bit weakness by ensuring that all 128-bits of state changes on every
135 | iteration.
136 | 
137 | The "Parallel Stream" generator only changes the top 64-bits of the input. This is the "weak direction" of the
138 | random number generator, which fails after 8TB of testing in PractRand. Nonetheless, the ability to support
139 | parallel streams is important in today's world of highly-parallelized simulations. Passing 4TB of PractRand
140 | means that 34-Billion parallel streams were created, and PractRand was unable to detect
141 | any statistical correlation between their start points. So at least 2^35 high-quality parallel streams are 
142 | available to use.
143 | 
144 | Thanks and Notes
145 | ------------
146 | The core algorithm is based on pcg32, documented here: http://www.pcg-random.org/. The idea to 
147 | split "counter" with "mixer" is an incredibly effective design on modern machines with large amounts of
148 | instruction-level parallelism.
149 | 
150 | The theory of hashing by Bob Jenkins is what most made me "get" cipher design. Bob Jenkins's
151 | page is absolutely excellent, and his "theory of funnels" put me on the right track. 
152 | http://www.burtleburtle.net/bob/hash/doobs.html
153 | 
154 | Daniel Lemire's blog is filled to the brim with SIMD tips and tricks. His article here also documents
155 | MANY reversible functions. While none of these reversible operations ended up in this implementation,
156 | the page served as a valuable reference in my experiments. 
157 | https://lemire.me/blog/2016/08/09/how-many-reversible-integer-operations-do-you-know/
158 | 
159 | Donald Knuth's "The Art of Computer Programming", volume 2, serves as a great introduction to the
160 | overall theory of RNGs.
161 | 
162 | PractRand: http://pracrand.sourceforge.net/ -- an incredibly awesome RNG-testing utility that 
163 | actually works on Windows (and works easily!).
164 | 
165 | Agner Fog's instruction tables: I was constantly referencing Agner Fog's latency and throughput tables 
166 | throughout the coding of this RNG: https://www.agner.org/optimize/
167 | 
168 | PractRand Results
169 | ------------
170 | 
171 | [Click here](PractRand.md) for PractRand results.
172 | 
173 | 
174 | BigCrush Results
175 | ------------
176 | 
177 | AESRand_Linux contains two BigCrush tests, which require TestU01 in order to be run. The "primary" AESRand generator passes BigCrush through multiple means: reversed bits, forward bits and so forth. TestU01 is limited to 32-bit tests, so it is a bit odd to try to adapt a 256-bit generator like AESRand to TestU01's interface.
178 | 


--------------------------------------------------------------------------------