├── .gitignore
├── README.md
├── nanbox.h
├── nanbox_shortstring.h
├── shortstring_demo.c
└── test.c


/.gitignore:
--------------------------------------------------------------------------------
a.out

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
nanbox
======

A type that can store various types of data in 64 bits using NaN-boxing.

The header file `nanbox.h` defines a type `nanbox_t` which can be used to store either a double, a 32-bit integer, a pointer, a boolean, null or one of a few additional values named 'undefined', 'empty' and 'deleted', plus five additional 'auxiliary' types of data of up to 48 bits. The encoding scheme differs between 32-bit and 64-bit platforms but the size of `nanbox_t` is always 64 bits.

How does it work?
-----------------

NaN-boxing is a way to store various information in unused NaN-space in the IEEE754 representation.

Any value with the top 13 bits set represents a *quiet NaN*. The remaining bits are called the 'payload'. NaNs produced by hardware and C-library functions typically have a payload of zero. We assume that all quiet NaNs with a non-zero payload can be used to encode whatever we want.

On 64-bit platforms, unused bits in pointers are also used to encode various information. The representation is inspired by that used by Webkit's JavaScriptCore. It *should work* on most 32-bit and 64-bit little endian and big endian machines. (See Testing below.)
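
To make the bit layout described above concrete, here is a minimal sketch of how the top 13 bits and the payload of a double can be inspected. The helper names (`bits_of`, `double_of`, `top13_set`, `payload_of`, `quiet_nan_demo`) are hypothetical and chosen for illustration; they are not part of `nanbox.h`, which uses its own encoding on top of this idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration only; none of these are in nanbox.h. */

/* Raw 64-bit pattern of a double (memcpy sidesteps aliasing issues). */
static inline uint64_t bits_of(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Double with the given raw 64-bit pattern. */
static inline double double_of(uint64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* True if the top 13 bits (sign, 11 exponent bits, quiet bit) are all set. */
static inline bool top13_set(uint64_t bits) {
    return (bits >> 51) == 0x1fff;
}

/* The remaining 51 bits, i.e. the 'payload'. */
static inline uint64_t payload_of(uint64_t bits) {
    return bits & 0x0007ffffffffffffull;
}

/* Usage sketch: a quiet NaN carrying the arbitrary payload 42. */
static inline void quiet_nan_demo(void) {
    uint64_t raw = 0xfff8000000000000ull | 42;
    double boxed = double_of(raw);
    assert(boxed != boxed);            /* compares unequal to itself: a NaN */
    assert(top13_set(raw));
    assert(payload_of(raw) == 42);
    assert(!top13_set(bits_of(3.14))); /* ordinary doubles fall outside */
}
```

Calling `quiet_nan_demo()` from a test program should pass silently on an IEEE754 platform. The sketch only shows that a non-zero payload still yields a NaN; how the payload bits are divided between tags and values is up to the encoding scheme.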

Functions
---------

A number of very short functions, all declared `static inline`, are defined to encode values as `nanbox_t`:

```c
nanbox_t nanbox_from_double(double d);
nanbox_t nanbox_from_int(int32_t i);
nanbox_t nanbox_from_pointer(void* pointer);
nanbox_t nanbox_from_boolean(bool b);
nanbox_t nanbox_null(void);
nanbox_t nanbox_undefined(void);
nanbox_t nanbox_empty(void);
nanbox_t nanbox_deleted(void);
nanbox_t nanbox_true(void);  /* the same as nanbox_from_boolean(true) */
nanbox_t nanbox_false(void); /* the same as nanbox_from_boolean(false) */
```

... to check the type:

```c
bool nanbox_is_double(nanbox_t value);
bool nanbox_is_int(nanbox_t value);
bool nanbox_is_pointer(nanbox_t value);
bool nanbox_is_boolean(nanbox_t value);
bool nanbox_is_null(nanbox_t value);
bool nanbox_is_undefined(nanbox_t value);
bool nanbox_is_empty(nanbox_t value);
bool nanbox_is_deleted(nanbox_t value);
bool nanbox_is_true(nanbox_t value);
bool nanbox_is_false(nanbox_t value);
bool nanbox_is_number(nanbox_t value);            /* either int or double */
bool nanbox_is_undefined_or_null(nanbox_t value); /* either */
bool nanbox_is_aux(nanbox_t value);               /* auxiliary space */
```

... and to decode the value:

```c
double  nanbox_to_double(nanbox_t value);
int32_t nanbox_to_int(nanbox_t value);
void*   nanbox_to_pointer(nanbox_t value);
bool    nanbox_to_boolean(nanbox_t value);
double  nanbox_to_number(nanbox_t value); /* value can be int or double */
```

Before fetching the value using these functions, you should make sure the nanbox is holding a value of the correct type, e.g. using the corresponding `nanbox_is_...` function. If the encoded value is not of the correct type, the results of the `nanbox_to_...` functions are undefined.
If compiled with assertions, you will get a failed assertion when trying to fetch a value of the wrong type.

The 'empty' value
-----------------

The 'empty' value is designed to be used to represent empty slots in e.g. a hashtable. It is guaranteed to consist of a single repeated byte. This is to make sure `memset` can be used to set all the elements in an array of nanboxes to 'empty'. The macro `NANBOX_EMPTY_BYTE` represents the byte that, when repeated 8 times (64 bits), makes up an 'empty' value.

```c
void foo(void) {
    nanbox_t boxes[100];
    // Initialize the boxes to empty values
    memset(boxes, NANBOX_EMPTY_BYTE, sizeof(nanbox_t) * 100);
    // ...
}
```

User-defined prefix instead of 'nanbox'
---------------------------------------

You can define `NANBOX_PREFIX` to the prefix you want, before including
`nanbox.h`. Then, the functions and types will be e.g.
`bool myprefix_is_double(myprefix_t value)`, etc. By undefining `NANBOX_H` and
redefining `NANBOX_PREFIX` (and possibly some of the other macros such as
`NANBOX_POINTER_TYPE`) you can include `nanbox.h` multiple times to create
multiple instances of the nanbox type.

User-defined pointer type
-------------------------

When encoding and decoding pointers to/from a nanbox, the pointer type `void*` is used by default. This can be changed by defining `NANBOX_POINTER_TYPE` to the pointer type of choice, before including `nanbox.h`. The type must be a pointer type, because unused bits in the pointers are used to encode various data.

Auxiliary data
--------------

Apart from doubles, pointers, ints, booleans, null, etc. there are still some bits left to store even more types of data in a nanbox. We call this 'auxiliary space'.
To check if the type of data in a nanbox is 'auxiliary data', the function `nanbox_is_aux` can be used, but accessing the data itself requires some insight into the internal representation of the nanbox. `nanbox_t` is a union type, which means it can be accessed in multiple ways. The easiest way to access the raw nanbox data is as a 64-bit integer using `nanbox.as_int64`. You can only store 64-bit integer values in the range `NANBOX_MIN_AUX`..`NANBOX_MAX_AUX`, which is the 'auxiliary space'. You can store 5 * 2^48 distinct values in this range, or equivalently, 5 types of 48-bit values.

Another way to access the data is to use `tag` and `payload`. These each represent 32 bits of the nanbox data. If a nanbox has its tag (`nanbox.as_bits.tag`) in the range `NANBOX_MIN_AUX_TAG`..`NANBOX_MAX_AUX_TAG` and a payload (`nanbox.as_bits.payload`) being any 32-bit integer value, then the nanbox data is in auxiliary space.

Short strings
-------------

As an example of what auxiliary data can be used for, the file `nanbox_shortstring.h` is included, which implements a scheme to store strings of up to 6 bytes in the auxiliary space of a nanbox. The functions `nanbox_is_shortstring`, `nanbox_shortstring_create`, etc. are defined and a small demo program is included in `shortstring_demo.c`.

Testing
-------

Tested on
* x86-64 (Intel Core 2 Duo), Mac OS X (Darwin 10.0.8) in 64-bit and 32-bit mode, using gcc version 4.2.1 (Apple Inc. build 5664).

I would like to add more architectures to the above list, especially non-Intel ones such as ARM and big endian systems such as SPARC. If you test this on another architecture or with another compiler, please drop me a line!

To test with gcc, use the command `gcc -std=c99 -Wall -pedantic -o test test.c` to compile the test. It should produce no warnings. The executable `test` should run without outputting any errors.

On x86-64 platforms, it is also possible to test in 32-bit mode using the -m32 flag as in `gcc -m32 -std=c99 -Wall -pedantic -o test test.c`.

--------------------------------------------------------------------------------
/nanbox.h:
--------------------------------------------------------------------------------
/*
 * The MIT License (MIT)
 *
 * Copyright (c) 2013 Viktor Söderqvist
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 */

/*
 * nanbox.h
 * --------
 *
 * This file provides a way to store various types of data in a 64-bit
 * slot, including a type tag, using NaN-boxing. NaN-boxing is a way to store
 * various information in unused NaN-space in the IEEE754 representation. For
 * 64-bit platforms, unused bits in pointers are also used to encode various
 * information.
 * The representation is inspired by that used by Webkit's
 * JavaScriptCore.
 *
 * Datatypes that can be stored:
 *
 *   * int (int32_t)
 *   * double
 *   * pointer
 *   * boolean (true and false)
 *   * null
 *   * undefined
 *   * empty
 *   * deleted
 *   * aux 'auxiliary data' (5 types of 48-bit values)
 *
 * Any value with the top 13 bits set represents a quiet NaN. The remaining
 * bits are called the 'payload'. NaNs produced by hardware and C-library
 * functions typically have a payload of zero. We assume that all quiet
 * NaNs with a non-zero payload can be used to encode whatever we want.
 */

#ifndef NANBOX_H
#define NANBOX_H

/*
 * Define this before including this file to get functions and type prefixed
 * with something other than "nanbox".
 */
#ifndef NANBOX_PREFIX
#define NANBOX_PREFIX nanbox
#endif

/* User-defined pointer type. Defaults to void*. Must be a pointer type. */
#ifndef NANBOX_POINTER_TYPE
#define NANBOX_POINTER_TYPE void*
#endif

/*
 * User-defined auxiliary types. Default to void*. These types must be pointer
 * types or 32-bit types. (Pointers on 64-bit platforms always begin with 16
 * bits of zero.)
 */
#ifndef NANBOX_AUX1_TYPE
#define NANBOX_AUX1_TYPE void*
#endif
#ifndef NANBOX_AUX2_TYPE
#define NANBOX_AUX2_TYPE void*
#endif
#ifndef NANBOX_AUX3_TYPE
#define NANBOX_AUX3_TYPE void*
#endif
#ifndef NANBOX_AUX4_TYPE
#define NANBOX_AUX4_TYPE void*
#endif
#ifndef NANBOX_AUX5_TYPE
#define NANBOX_AUX5_TYPE void*
#endif


#include <stddef.h>  // size_t
#include <stdint.h>  // int64_t, int32_t
#include <stdbool.h> // bool, true, false
#include <string.h>  // memset
#include <assert.h>

/*
 * Macros to expand the prefix.
 */
#undef NANBOX_XXNAME
#define NANBOX_XXNAME(prefix, name) prefix ## name
#undef NANBOX_XNAME
#define NANBOX_XNAME(prefix, name) NANBOX_XXNAME(prefix, name)
#undef NANBOX_NAME
#define NANBOX_NAME(name) NANBOX_XNAME(NANBOX_PREFIX, name)

/*
 * Detect OS and endianness.
 *
 * Most of this is inspired by WTF/wtf/Platform.h in Webkit's source code.
 */

/* Unix? */
#if defined(_AIX) \
    || defined(__APPLE__) /* Darwin */ \
    || defined(__FreeBSD__) || defined(__DragonFly__) \
    || defined(__FreeBSD_kernel__) \
    || defined(__GNU__) /* GNU/Hurd */ \
    || defined(__linux__) \
    || defined(__NetBSD__) \
    || defined(__OpenBSD__) \
    || defined(__QNXNTO__) \
    || defined(sun) || defined(__sun) /* Solaris */ \
    || defined(unix) || defined(__unix) || defined(__unix__)
#define NANBOX_UNIX 1
#endif

/* Windows? */
#if defined(WIN32) || defined(_WIN32)
#define NANBOX_WINDOWS 1
#endif

/* 64-bit mode? (Mostly equivalent to how WebKit does it) */
#if ((defined(__x86_64__) || defined(_M_X64)) \
     && (defined(NANBOX_UNIX) || defined(NANBOX_WINDOWS))) \
    || (defined(__ia64__) && defined(__LP64__)) /* Itanium in LP64 mode */ \
    || defined(__alpha__) /* DEC Alpha */ \
    || (defined(__sparc__) && defined(__arch64__) || defined (__sparcv9)) /* BE */ \
    || defined(__s390x__) /* S390 64-bit (BE) */ \
    || (defined(__ppc64__) || defined(__PPC64__)) \
    || defined(__aarch64__) /* ARM 64-bit */
#define NANBOX_64 1
#else
#define NANBOX_32 1
#endif

/* Big endian?
 * (Mostly equivalent to how WebKit does it) */
#if defined(__MIPSEB__) /* MIPS 32-bit */ \
    || defined(__ppc__) || defined(__PPC__) /* CPU(PPC) - PowerPC 32-bit */ \
    || defined(__powerpc__) || defined(__powerpc) || defined(__POWERPC__) \
    || defined(_M_PPC) || defined(__PPC) \
    || defined(__ppc64__) || defined(__PPC64__) /* PowerPC 64-bit */ \
    || defined(__sparc) /* Sparc 32bit */ \
    || defined(__sparc__) /* Sparc 64-bit */ \
    || defined(__s390x__) /* S390 64-bit */ \
    || defined(__s390__) /* S390 32-bit */ \
    || defined(__ARMEB__) /* ARM big endian */ \
    || ((defined(__CC_ARM) || defined(__ARMCC__)) /* ARM RealView compiler */ \
        && defined(__BIG_ENDIAN))
#define NANBOX_BIG_ENDIAN 1
#endif

/*
 * In 32-bit mode, the double is unmasked. In 64-bit mode, the pointer is
 * unmasked.
 */
union NANBOX_NAME(_u) {
    uint64_t as_int64;
#if defined(NANBOX_64)
    NANBOX_POINTER_TYPE pointer;
#endif
    double as_double;
#ifdef NANBOX_BIG_ENDIAN
    struct {
        uint32_t tag;
        uint32_t payload;
    } as_bits;
#else
    struct {
        uint32_t payload;
        uint32_t tag;
    } as_bits;
#endif
};

#undef NANBOX_T
#define NANBOX_T NANBOX_NAME(_t)
typedef union NANBOX_NAME(_u) NANBOX_T;

#if defined(NANBOX_64)

/*
 * 64-bit platforms
 *
 * This range of NaN space is represented by 64-bit numbers beginning with
 * 13 bits of ones. That is, the first 16 bits are 0xFFF8 or higher. In
 * practice, no higher value is used for NaNs. We rely on the fact that no
 * valid double-precision numbers will be "higher" than this (compared as an
 * uint64).
 *
 * By adding 7 * 2^48 as a 64-bit integer addition, we shift the first 16 bits
 * in the doubles from the range 0000..FFF8 to the range 0007..FFFF.
 * Doubles are decoded by reversing this operation, i.e. subtracting the same
 * number.
 *
 * The top 16 bits denote the type of the encoded nanbox_t:
 *
 *     Pointer {  0000:PPPP:PPPP:PPPP
 *              / 0001:xxxx:xxxx:xxxx
 *     Aux.    {           ...
 *              \ 0005:xxxx:xxxx:xxxx
 *     Integer {  0006:0000:IIII:IIII
 *              / 0007:****:****:****
 *     Double  {           ...
 *              \ FFFF:****:****:****
 *
 * 32-bit signed integers are marked with the 16-bit tag 0x0006.
 *
 * The tags 0x0001..0x0005 can be used to store five additional types of
 * auxiliary data, each storing up to 48 bits of payload.
 *
 * The tag 0x0000 denotes a pointer, or another form of tagged immediate.
 * Boolean, 'null', 'undefined' and 'deleted' are represented by specific,
 * invalid pointer values:
 *
 *     False:     0x06
 *     True:      0x07
 *     Undefined: 0x0a
 *     Null:      0x02
 *     Empty:     0x00
 *     Deleted:   0x05
 *
 * All of these except Empty have bit 0 or bit 1 set.
 */

#define NANBOX_VALUE_EMPTY   0x0llu
#define NANBOX_VALUE_DELETED 0x5llu

// Booleans have bits 1 and 2 set. True also has bit 0 set.
#define NANBOX_VALUE_FALSE 0x06llu
#define NANBOX_VALUE_TRUE  0x07llu

// Null and undefined both have bit 1 set. Undefined also has bit 3 set.
#define NANBOX_VALUE_UNDEFINED 0x0Allu
#define NANBOX_VALUE_NULL      0x02llu

// This value is 7 * 2^48, used to encode doubles such that the encoded value
// will begin with a 16-bit pattern within the range 0x0007..0xFFFF.
#define NANBOX_DOUBLE_ENCODE_OFFSET 0x0007000000000000llu
// If the 16 first bits are 0x0006, this indicates an integer number. Any
// larger value is a double, so we can use >= to check for either integer or
// double.
#define NANBOX_MIN_NUMBER 0x0006000000000000llu
#define NANBOX_HIGH16_TAG 0xffff000000000000llu

// There are 5 * 2^48 auxiliary values that can be stored in the 64-bit
// integer range NANBOX_MIN_AUX..NANBOX_MAX_AUX.
#define NANBOX_MIN_AUX_TAG 0x00010000
#define NANBOX_MAX_AUX_TAG 0x0005ffff
#define NANBOX_MIN_AUX 0x0001000000000000llu
#define NANBOX_MAX_AUX 0x0005ffffffffffffllu

// NANBOX_MASK_POINTER defines the allowed non-zero bits in a pointer.
#define NANBOX_MASK_POINTER 0x0000fffffffffffcllu

// The 'empty' value is guaranteed to consist of a repeated single byte,
// so that it should be easy to memset an array of nanboxes to 'empty' using
// NANBOX_EMPTY_BYTE as the value for every byte.
#define NANBOX_EMPTY_BYTE 0x0

// Define bool nanbox_is_xxx(NANBOX_T val) and NANBOX_T nanbox_xxx(void)
// with empty, deleted, true, false, undefined and null substituted for xxx.
#define NANBOX_IMMEDIATE_VALUE_FUNCTIONS(NAME, VALUE) \
    static inline NANBOX_T NANBOX_NAME(_##NAME)(void) { \
        NANBOX_T val; \
        val.as_int64 = VALUE; \
        return val; \
    } \
    static inline bool NANBOX_NAME(_is_##NAME)(NANBOX_T val) { \
        return val.as_int64 == VALUE; \
    }
NANBOX_IMMEDIATE_VALUE_FUNCTIONS(empty, NANBOX_VALUE_EMPTY)
NANBOX_IMMEDIATE_VALUE_FUNCTIONS(deleted, NANBOX_VALUE_DELETED)
NANBOX_IMMEDIATE_VALUE_FUNCTIONS(false, NANBOX_VALUE_FALSE)
NANBOX_IMMEDIATE_VALUE_FUNCTIONS(true, NANBOX_VALUE_TRUE)
NANBOX_IMMEDIATE_VALUE_FUNCTIONS(undefined, NANBOX_VALUE_UNDEFINED)
NANBOX_IMMEDIATE_VALUE_FUNCTIONS(null, NANBOX_VALUE_NULL)

static inline bool NANBOX_NAME(_is_undefined_or_null)(NANBOX_T val) {
    // Undefined and null are the same if we remove the 'undefined' bit.
    return (val.as_int64 & ~8) == NANBOX_VALUE_NULL;
}

static inline bool NANBOX_NAME(_is_boolean)(NANBOX_T val) {
    // True and false are the same if we remove the 'true' bit.
    return (val.as_int64 & ~1) == NANBOX_VALUE_FALSE;
}
static inline bool NANBOX_NAME(_to_boolean)(NANBOX_T val) {
    assert(NANBOX_NAME(_is_boolean)(val));
    return val.as_int64 & 1;
}
static inline NANBOX_T NANBOX_NAME(_from_boolean)(bool b) {
    NANBOX_T val;
    val.as_int64 = b ? NANBOX_VALUE_TRUE : NANBOX_VALUE_FALSE;
    return val;
}

/* true if val is a double or an int */
static inline bool NANBOX_NAME(_is_number)(NANBOX_T val) {
    return val.as_int64 >= NANBOX_MIN_NUMBER;
}

static inline bool NANBOX_NAME(_is_int)(NANBOX_T val) {
    return (val.as_int64 & NANBOX_HIGH16_TAG) == NANBOX_MIN_NUMBER;
}
static inline NANBOX_T NANBOX_NAME(_from_int)(int32_t i) {
    NANBOX_T val;
    val.as_int64 = NANBOX_MIN_NUMBER | (uint32_t)i;
    return val;
}
static inline int32_t NANBOX_NAME(_to_int)(NANBOX_T val) {
    assert(NANBOX_NAME(_is_int)(val));
    return (int32_t)val.as_int64;
}

static inline bool NANBOX_NAME(_is_double)(NANBOX_T val) {
    return NANBOX_NAME(_is_number)(val) && !NANBOX_NAME(_is_int)(val);
}
static inline NANBOX_T NANBOX_NAME(_from_double)(double d) {
    NANBOX_T val;
    val.as_double = d;
    val.as_int64 += NANBOX_DOUBLE_ENCODE_OFFSET;
    assert(NANBOX_NAME(_is_double)(val));
    return val;
}
static inline double NANBOX_NAME(_to_double)(NANBOX_T val) {
    assert(NANBOX_NAME(_is_double)(val));
    val.as_int64 -= NANBOX_DOUBLE_ENCODE_OFFSET;
    return val.as_double;
}

static inline bool NANBOX_NAME(_is_pointer)(NANBOX_T val) {
    return !(val.as_int64 & ~NANBOX_MASK_POINTER) && val.as_int64;
}
static inline NANBOX_POINTER_TYPE
NANBOX_NAME(_to_pointer)(NANBOX_T val) {
    assert(NANBOX_NAME(_is_pointer)(val));
    return val.pointer;
}
static inline NANBOX_T NANBOX_NAME(_from_pointer)(NANBOX_POINTER_TYPE pointer) {
    NANBOX_T val;
    val.pointer = pointer;
    assert(NANBOX_NAME(_is_pointer)(val));
    return val;
}

static inline bool NANBOX_NAME(_is_aux)(NANBOX_T val) {
    return val.as_int64 >= NANBOX_MIN_AUX &&
           val.as_int64 <= NANBOX_MAX_AUX;
}

/* end if NANBOX_64 */
#elif defined(NANBOX_32)

/*
 * On 32-bit platforms we use the following NaN-boxing scheme:
 *
 * For values that do not contain a double value, the high 32 bits contain the
 * tag values listed below, which all correspond to NaN-space. When the tag is
 * 'pointer', 'integer' or 'boolean', the value (the 'payload') is stored
 * in the lower 32 bits. In the case of all other tags the payload is 0.
 */
#define NANBOX_MAX_DOUBLE_TAG     0xfff80000
#define NANBOX_INT_TAG            0xfff80001
#define NANBOX_MIN_AUX_TAG        0xfff90000
#define NANBOX_MAX_AUX_TAG        0xfffdffff
#define NANBOX_POINTER_TAG        0xfffffffa
#define NANBOX_BOOLEAN_TAG        0xfffffffb
#define NANBOX_UNDEFINED_TAG      0xfffffffc
#define NANBOX_NULL_TAG           0xfffffffd
#define NANBOX_DELETED_VALUE_TAG  0xfffffffe
#define NANBOX_EMPTY_VALUE_TAG    0xffffffff

// The 'empty' value is guaranteed to consist of a repeated single byte,
// so that it should be easy to memset an array of nanboxes to 'empty' using
// NANBOX_EMPTY_BYTE as the value for every byte.
#define NANBOX_EMPTY_BYTE 0xff

/* The minimum and maximum uint64_t values for the auxiliary range */
#define NANBOX_MIN_AUX 0xfff9000000000000llu
#define NANBOX_MAX_AUX 0xfffdffffffffffffllu

// Define nanbox_xxx and nanbox_is_xxx for deleted, undefined and null.
#define NANBOX_IMMEDIATE_VALUE_FUNCTIONS(NAME, TAG) \
    static inline NANBOX_T NANBOX_NAME(_##NAME)(void) { \
        NANBOX_T val; \
        val.as_bits.tag = TAG; \
        val.as_bits.payload = 0; \
        return val; \
    } \
    static inline bool NANBOX_NAME(_is_##NAME)(NANBOX_T val) { \
        return val.as_bits.tag == TAG; \
    }
NANBOX_IMMEDIATE_VALUE_FUNCTIONS(deleted, NANBOX_DELETED_VALUE_TAG)
NANBOX_IMMEDIATE_VALUE_FUNCTIONS(undefined, NANBOX_UNDEFINED_TAG)
NANBOX_IMMEDIATE_VALUE_FUNCTIONS(null, NANBOX_NULL_TAG)

// The undefined and null tags differ only in one bit
static inline bool NANBOX_NAME(_is_undefined_or_null)(NANBOX_T val) {
    return (val.as_bits.tag & ~1) == NANBOX_UNDEFINED_TAG;
}

static inline NANBOX_T NANBOX_NAME(_empty)(void) {
    NANBOX_T val;
    val.as_int64 = 0xffffffffffffffffllu;
    return val;
}
static inline bool NANBOX_NAME(_is_empty)(NANBOX_T val) {
    return val.as_bits.tag == 0xffffffff;
}

/* Returns true if the value is in auxiliary space */
static inline bool NANBOX_NAME(_is_aux)(NANBOX_T val) {
    return val.as_bits.tag >= NANBOX_MIN_AUX_TAG &&
           val.as_bits.tag < NANBOX_POINTER_TAG;
}

// Define nanbox_is_yyy, nanbox_to_yyy and nanbox_from_yyy for
// boolean, int, pointer and aux1-aux5
#define NANBOX_TAGGED_VALUE_FUNCTIONS(NAME, TYPE, TAG) \
    static inline bool NANBOX_NAME(_is_##NAME)(NANBOX_T val) { \
        return val.as_bits.tag == TAG; \
    } \
    static inline TYPE NANBOX_NAME(_to_##NAME)(NANBOX_T val) { \
        assert(val.as_bits.tag == TAG); \
        return (TYPE)val.as_bits.payload; \
    } \
    static inline NANBOX_T NANBOX_NAME(_from_##NAME)(TYPE a) { \
        NANBOX_T val; \
        val.as_bits.tag = TAG; \
        val.as_bits.payload = (int32_t)a; \
        return val; \
    }

NANBOX_TAGGED_VALUE_FUNCTIONS(boolean, bool, NANBOX_BOOLEAN_TAG)
NANBOX_TAGGED_VALUE_FUNCTIONS(int, int32_t, NANBOX_INT_TAG)
NANBOX_TAGGED_VALUE_FUNCTIONS(pointer, NANBOX_POINTER_TYPE, NANBOX_POINTER_TAG)

static inline NANBOX_T NANBOX_NAME(_true)(void) {
    return NANBOX_NAME(_from_boolean)(true);
}
static inline NANBOX_T NANBOX_NAME(_false)(void) {
    return NANBOX_NAME(_from_boolean)(false);
}
static inline bool NANBOX_NAME(_is_true)(NANBOX_T val) {
    return val.as_bits.tag == NANBOX_BOOLEAN_TAG && val.as_bits.payload;
}
static inline bool NANBOX_NAME(_is_false)(NANBOX_T val) {
    return val.as_bits.tag == NANBOX_BOOLEAN_TAG && !val.as_bits.payload;
}

static inline bool NANBOX_NAME(_is_double)(NANBOX_T val) {
    return val.as_bits.tag < NANBOX_INT_TAG;
}
// is number = is double or is int
static inline bool NANBOX_NAME(_is_number)(NANBOX_T val) {
    return val.as_bits.tag <= NANBOX_INT_TAG;
}

static inline NANBOX_T NANBOX_NAME(_from_double)(double d) {
    NANBOX_T val;
    val.as_double = d;
    assert(NANBOX_NAME(_is_double)(val) &&
           val.as_bits.tag <= NANBOX_MAX_DOUBLE_TAG);
    return val;
}
static inline double NANBOX_NAME(_to_double)(NANBOX_T val) {
    assert(NANBOX_NAME(_is_double)(val));
    return val.as_double;
}

#endif /* elif NANBOX_32 */

/*
 * Representation-independent functions
 */

static inline double NANBOX_NAME(_to_number)(NANBOX_T val) {
    assert(NANBOX_NAME(_is_number)(val));
    return NANBOX_NAME(_is_int)(val) ?
        NANBOX_NAME(_to_int)(val)
        : NANBOX_NAME(_to_double)(val);
}

#endif /* NANBOX_H */

--------------------------------------------------------------------------------
/nanbox_shortstring.h:
--------------------------------------------------------------------------------
#ifndef NANBOX_SHORTSTRING_H
#define NANBOX_SHORTSTRING_H
/*
 * Short strings
 * -------------
 * Strings of up to 6 bytes can be stored in a NANBOX_T in so-called 'auxiliary
 * space'. The space used is NANBOX_MIN_AUX..(NANBOX_MIN_AUX + 3 * 2^48 - 1).
 */

#include "nanbox.h"

static inline bool NANBOX_NAME(_is_shortstring)(NANBOX_T val) {
    return val.as_bits.tag >= NANBOX_MIN_AUX_TAG &&
           val.as_bits.tag <= NANBOX_MIN_AUX_TAG + 0x0002ffff;
}
static inline char* NANBOX_NAME(_shortstring_chars)(NANBOX_T* val) {
    assert(NANBOX_NAME(_is_shortstring)(*val));
#ifdef NANBOX_BIG_ENDIAN
    /* Parentheses are required: == binds tighter than &. */
    if ((val->as_bits.tag & 0xffff0000) == NANBOX_MIN_AUX_TAG)
        return (char*)val + 4; /* skip tag and length */
    else
        return (char*)val + 2; /* skip tag */
#else
    return (char*)val;
#endif
}

static inline unsigned NANBOX_NAME(_shortstring_length)(NANBOX_T val) {
    assert(NANBOX_NAME(_is_shortstring)(val));
    if (val.as_bits.tag <= NANBOX_MIN_AUX_TAG + 4)
        return val.as_bits.tag - NANBOX_MIN_AUX_TAG;
    else
        return ((val.as_bits.tag - NANBOX_MIN_AUX_TAG) >> 16) + 4;
}

// creates a short string of length bytes with undefined contents
static inline NANBOX_T NANBOX_NAME(_shortstring_create_undef)(unsigned length) {
    NANBOX_T val;
    assert(length <= 6);
    if (length <= 4)
        val.as_bits.tag = NANBOX_MIN_AUX_TAG + length;
    else
        val.as_bits.tag = NANBOX_MIN_AUX_TAG + ((length - 4) << 16);
    val.as_bits.payload = 0;
    return val;
}

// copies length bytes of chars.
// (nul bytes are copied like any other byte)
static inline NANBOX_T
NANBOX_NAME(_shortstring_create)(const char *chars, unsigned length) {
    NANBOX_T val = NANBOX_NAME(_shortstring_create_undef)(length);
    memcpy(NANBOX_NAME(_shortstring_chars)(&val), chars, length);
    assert(NANBOX_NAME(_is_shortstring)(val));
    return val;
}
#endif

--------------------------------------------------------------------------------
/shortstring_demo.c:
--------------------------------------------------------------------------------
#include "nanbox_shortstring.h"
#include <stdio.h>  // printf, scanf
#include <string.h> // strcmp, strlen

int main() {
    printf("Enter short strings of up to 6 chars to dump, q to quit.\n");
    while (1) {
        char buf[7];
        size_t len;
        printf("Short string --> ");
        if (scanf("%6s", buf) == EOF)
            break;
        if (!strcmp(buf, "q"))
            break;
        len = strlen(buf);
        nanbox_t val = nanbox_shortstring_create(buf, len);
        printf("%p \"%.6s\" (length %u)\n",
               (void*)val.as_int64,
               nanbox_shortstring_chars(&val),
               nanbox_shortstring_length(val));
    }
    printf("\n");
    return 0;
}

--------------------------------------------------------------------------------
/test.c:
--------------------------------------------------------------------------------
#include <assert.h> // assert
#include <math.h>   // pow, sqrt, log, asin, acos
#include <string.h> // strcmp

#include "nanbox.h"

// This macro stores a value VALUE of type TYPE in a nanbox, checks the type,
// converts back and checks that we got the value back. It also checks that the
// nanbox is not of any other type.
//
// The only NaN expression that is possible to test with this macro is
// 0.0/0.0. (There is some special logic to NaN, because NaN != NaN.)
#define TO_NANBOX_AND_BACK(TYPE, VALUE) do { \
        nanbox_t x = nanbox_from_##TYPE(VALUE); \
        assert(nanbox_is_##TYPE(x)); \
        /* decode and test == to the original, except for NaN */ \
        if (!strcmp(#TYPE, "double") && !strcmp(#VALUE, "0.0/0.0")) \
            assert(VALUE != nanbox_to_##TYPE(x)); \
        else \
            assert(VALUE == nanbox_to_##TYPE(x)); \
        assert(nanbox_is_double(x) == !strcmp(#TYPE, "double")); \
        assert(nanbox_is_int(x) == !strcmp(#TYPE, "int")); \
        assert(nanbox_is_pointer(x) == !strcmp(#TYPE, "pointer")); \
        assert(nanbox_is_boolean(x) == !strcmp(#TYPE, "boolean")); \
        assert(nanbox_is_number(x) == (!strcmp(#TYPE, "double") || \
                                       !strcmp(#TYPE, "int"))); \
        assert(!nanbox_is_null(x)); \
        assert(!nanbox_is_undefined(x)); \
        assert(!nanbox_is_empty(x)); \
        assert(!nanbox_is_deleted(x)); \
        assert(!nanbox_is_aux(x)); \
        assert(nanbox_is_true(x) == !strcmp(#VALUE, "true")); \
        assert(nanbox_is_false(x) == !strcmp(#VALUE, "false")); \
    } while(0)

// Use this to create and check a nanbox of null, undefined, empty, deleted,
// true or false. It tests that it is of the correct type and no other type.
#define TO_NANBOX_AND_CHECK(VALUE) do { \
        nanbox_t x = nanbox_##VALUE(); \
        assert(!nanbox_is_double(x)); \
        assert(!nanbox_is_int(x)); \
        assert(!nanbox_is_pointer(x)); \
        assert(!nanbox_is_number(x)); \
        assert(!nanbox_is_aux(x)); \
        assert(nanbox_is_boolean(x) == (!strcmp(#VALUE, "true") || \
                                        !strcmp(#VALUE, "false"))); \
        assert(nanbox_is_undefined_or_null(x) == \
               (!strcmp(#VALUE, "undefined") || !strcmp(#VALUE, "null"))); \
        assert(nanbox_is_null(x) == !strcmp(#VALUE, "null")); \
        assert(nanbox_is_undefined(x) == !strcmp(#VALUE, "undefined")); \
        assert(nanbox_is_empty(x) == !strcmp(#VALUE, "empty")); \
        assert(nanbox_is_deleted(x) == !strcmp(#VALUE, "deleted")); \
    } while(0)

// Defined below.
// Called from main.
void test_nan(void);

int main() {
    // Size should be 64 bits (8 bytes)
    assert(sizeof(nanbox_t) == 8);

    // Test storing various doubles, including NaN and infinity.
    TO_NANBOX_AND_BACK(double, -0.0);
    TO_NANBOX_AND_BACK(double, 3.14);
    TO_NANBOX_AND_BACK(double, 1.0/0.0);
    TO_NANBOX_AND_BACK(double, -1.0/0.0);
    TO_NANBOX_AND_BACK(double, 0.0/0.0);

    // Test storing int, pointer and boolean
    TO_NANBOX_AND_BACK(int, 42);
    TO_NANBOX_AND_BACK(pointer, &x);
    TO_NANBOX_AND_BACK(boolean, true);
    TO_NANBOX_AND_BACK(boolean, false);

    // The remaining types/values
    //TO_NANBOX_AND_CHECK(null);
    //TO_NANBOX_AND_CHECK(undefined);
    TO_NANBOX_AND_CHECK(empty);
    TO_NANBOX_AND_CHECK(deleted);
    TO_NANBOX_AND_CHECK(true);
    TO_NANBOX_AND_CHECK(false);

    test_nan();

    return 0;
}

// A macro to check that a double is a canonical NaN, i.e. one that we accept.
// Also, nanboxes it and checks that it is identified as a double.
#define ASSERT_CANNONICAL_NAN(VALUE) do { \
        double d = VALUE; \
        uint64_t n = *(uint64_t*)&d; \
        assert((n | 0x8000000000000000llu) == 0xfff8000000000000llu); \
        assert(nanbox_is_double(nanbox_from_double(VALUE))); \
    } while(0)

void test_nan(void) {
    double nan = 0.0/0.0, inf = 1.0/0.0, ninf = -1.0/0.0;
    assert(nan != nan);
    ASSERT_CANNONICAL_NAN(0.0/0.0);
    ASSERT_CANNONICAL_NAN(nan);
    ASSERT_CANNONICAL_NAN(nan + 42);
    ASSERT_CANNONICAL_NAN(-inf * nan);

    ASSERT_CANNONICAL_NAN(inf/inf);
    ASSERT_CANNONICAL_NAN(ninf/inf);
    ASSERT_CANNONICAL_NAN(0 * inf);
    ASSERT_CANNONICAL_NAN(0 * ninf);
    ASSERT_CANNONICAL_NAN(inf * 0);
    ASSERT_CANNONICAL_NAN(inf + ninf);
    ASSERT_CANNONICAL_NAN(ninf + inf);

    ASSERT_CANNONICAL_NAN(pow(-1.0, 3.14));
    ASSERT_CANNONICAL_NAN(sqrt(-1.0));
    ASSERT_CANNONICAL_NAN(log(-1.0));
    ASSERT_CANNONICAL_NAN(asin(2.0));
    ASSERT_CANNONICAL_NAN(acos(2.0));
}

--------------------------------------------------------------------------------