├── .gitignore
├── README.md
├── nanbox.h
├── nanbox_shortstring.h
├── shortstring_demo.c
└── test.c


/.gitignore:
--------------------------------------------------------------------------------
1 | a.out
2 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | nanbox
  2 | ======
  3 | 
  4 | A type that can store various types of data in 64 bits using NaN-boxing.
  5 | 
  6 | The header file `nanbox.h` defines a type `nanbox_t` which can store a double, a 32-bit integer, a pointer, a boolean, null, or one of a few additional values named 'undefined', 'empty' and 'deleted', plus five additional 'auxiliary' types of data of up to 48 bits each. The encoding scheme differs between 32-bit and 64-bit platforms, but the size of `nanbox_t` is always 64 bits.
  7 | 
  8 | How does it work?
  9 | -----------------
 10 | 
 11 | NaN-boxing is a way to store various information in the unused NaN space of the IEEE 754 double-precision representation.
 12 | 
 13 | Any value with the top 13 bits set represents a *quiet NaN*. The remaining bits are called the 'payload'. NaNs produced by hardware and C-library functions typically have a payload of zero. We assume that all quiet NaNs with a non-zero payload can be used to encode whatever we want.
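
The bit layout described above can be inspected directly. The following is a minimal sketch, not part of `nanbox.h`; the helper names (`bits_of_double`, `double_of_bits`, `top13_set`, `nan_payload`) are hypothetical and chosen only for illustration. It assumes `double` is a 64-bit IEEE 754 value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Copy the raw bits of a double into a 64-bit integer. */
static uint64_t bits_of_double(double d) {
	uint64_t bits;
	memcpy(&bits, &d, sizeof bits);
	return bits;
}

/* Build a double from raw bits. */
static double double_of_bits(uint64_t bits) {
	double d;
	memcpy(&d, &bits, sizeof d);
	return d;
}

/* True if the top 13 bits (sign, 11 exponent bits, quiet bit) are all set. */
static bool top13_set(double d) {
	return (bits_of_double(d) >> 51) == 0x1fff;
}

/* The remaining low 51 bits are what the text calls the payload. */
static uint64_t nan_payload(double d) {
	return bits_of_double(d) & 0x0007ffffffffffffull;
}
```

For example, `top13_set(double_of_bits(0xfff800000000002aull))` holds and the payload of that value is `0x2a`, while ordinary numbers such as `1.0` do not have the top 13 bits set.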
 14 | 
 15 | On 64-bit platforms, unused bits in pointers are also used to encode various information. The representation is inspired by that used by WebKit's JavaScriptCore. It *should work* on most 32-bit and 64-bit little-endian and big-endian machines. (See Testing below.)
 16 | 
 17 | Functions
 18 | ---------
 19 | 
 20 | A number of very short functions, all declared `static inline`, are defined to encode values as `nanbox_t`:
 21 | 
 22 | ```c
 23 | nanbox_t nanbox_from_double(double d);
 24 | nanbox_t nanbox_from_int(int32_t i);
 25 | nanbox_t nanbox_from_pointer(void* pointer);
 26 | nanbox_t nanbox_from_boolean(bool b);
 27 | nanbox_t nanbox_null(void);
 28 | nanbox_t nanbox_undefined(void);
 29 | nanbox_t nanbox_empty(void);
 30 | nanbox_t nanbox_deleted(void);
 31 | nanbox_t nanbox_true(void);   /* the same as nanbox_from_boolean(true) */
 32 | nanbox_t nanbox_false(void);  /* the same as nanbox_from_boolean(false) */
 33 | ```
 34 | 
 35 | ... to check the type:
 36 | 
 37 | ```c
 38 | bool nanbox_is_double(nanbox_t value);
 39 | bool nanbox_is_int(nanbox_t value);
 40 | bool nanbox_is_pointer(nanbox_t value);
 41 | bool nanbox_is_boolean(nanbox_t value);
 42 | bool nanbox_is_null(nanbox_t value);
 43 | bool nanbox_is_undefined(nanbox_t value);
 44 | bool nanbox_is_empty(nanbox_t value);
 45 | bool nanbox_is_deleted(nanbox_t value);
 46 | bool nanbox_is_true(nanbox_t value);
 47 | bool nanbox_is_false(nanbox_t value);
 48 | bool nanbox_is_number(nanbox_t value);  /* either int or double */
 49 | bool nanbox_is_undefined_or_null(nanbox_t value); /* either */
 50 | bool nanbox_is_aux(nanbox_t value);     /* auxiliary space */
 51 | ```
 52 | 
 53 | ... and to decode the value:
 54 | 
 55 | ```c
 56 | double nanbox_to_double(nanbox_t value);
 57 | int32_t nanbox_to_int(nanbox_t value);
 58 | void* nanbox_to_pointer(nanbox_t value);
 59 | bool nanbox_to_boolean(nanbox_t value);
 60 | double nanbox_to_number(nanbox_t value); /* value can be int or double */
 61 | ```
 62 | 
 63 | Before fetching the value using these functions, you should make sure the nanbox is holding a value of the correct type, e.g. using the corresponding `nanbox_is_...` function. If the encoded value is not of the correct type, the results of the `nanbox_to_...` functions are undefined. If compiled with assertions, you will get a failed assertion when trying to fetch a value of the wrong type.
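
The check-then-decode pattern can be sketched without the header. The block below is a stripped-down stand-in for the 64-bit number encoding only; it assumes the constants used by `nanbox.h` on 64-bit platforms (integer tag `0x0006`, double offset 7 * 2<sup>48</sup>). The `mini_` names are hypothetical and are not part of the library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t mini_box;  /* hypothetical stand-in for nanbox_t */

#define MINI_DOUBLE_OFFSET 0x0007000000000000ull  /* 7 * 2^48 */
#define MINI_INT_TAG       0x0006000000000000ull
#define MINI_HIGH16        0xffff000000000000ull

static mini_box mini_from_int(int32_t i) {
	return MINI_INT_TAG | (uint32_t)i;
}
static mini_box mini_from_double(double d) {
	uint64_t bits;
	memcpy(&bits, &d, sizeof bits);
	return bits + MINI_DOUBLE_OFFSET;
}
static bool mini_is_int(mini_box b) {
	return (b & MINI_HIGH16) == MINI_INT_TAG;
}
static bool mini_is_double(mini_box b) {
	return b >= MINI_INT_TAG && !mini_is_int(b);
}
static int32_t mini_to_int(mini_box b) {
	assert(mini_is_int(b));      /* wrong type: fail loudly */
	return (int32_t)(uint32_t)b;
}
static double mini_to_double(mini_box b) {
	double d;
	assert(mini_is_double(b));   /* wrong type: fail loudly */
	b -= MINI_DOUBLE_OFFSET;
	memcpy(&d, &b, sizeof d);
	return d;
}
/* Check the type first, then call the matching decoder. */
static double mini_to_number(mini_box b) {
	return mini_is_int(b) ? (double)mini_to_int(b) : mini_to_double(b);
}
```

With the real header, the same pattern would be written with `nanbox_is_int`, `nanbox_to_int`, `nanbox_is_double` and `nanbox_to_double`.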
 64 | 
 65 | The 'empty' value
 66 | -----------------
 67 | 
 68 | The 'empty' value is designed to be used to represent empty slots in e.g. a hashtable. It is guaranteed to consist of a single repeated byte, so that `memset` can be used to set all the elements in an array of nanboxes to 'empty'. The macro `NANBOX_EMPTY_BYTE` represents the byte that, when repeated 8 times (64 bits), makes up an 'empty' value.
 69 | 
 70 | ```c
 71 | void foo(void) {
 72 | 	nanbox_t boxes[100];
 73 | 	// Initialize the boxes to empty values
 74 | 	memset(boxes, NANBOX_EMPTY_BYTE, sizeof(nanbox_t) * 100);
 75 | 	// ...
 76 | }
 77 | ```
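
Why a repeated byte matters can be shown with plain 64-bit integers. This is only a sketch with hypothetical helpers (`repeat_byte`, `fill_slots`). The two byte values used in the comments are the ones `nanbox.h` defines for `NANBOX_EMPTY_BYTE`: `0x0` on 64-bit platforms and `0xff` on 32-bit platforms.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build a 64-bit value by repeating one byte 8 times, as memset would. */
static uint64_t repeat_byte(unsigned char byte) {
	uint64_t v;
	memset(&v, byte, sizeof v);
	return v;
}

/* Fill an array of 64-bit slots with the repeated byte. */
static void fill_slots(uint64_t *slots, size_t count, unsigned char byte) {
	memset(slots, byte, count * sizeof *slots);
}
```

Here `repeat_byte(0x00)` gives `0`, the raw 'empty' value in the 64-bit scheme, and `repeat_byte(0xff)` gives `0xffffffffffffffff`, the raw 'empty' value in the 32-bit scheme.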
 78 | 
 79 | User-defined prefix instead of 'nanbox'
 80 | ---------------------------------------
 81 | 
 82 | You can define `NANBOX_PREFIX` to the prefix you want, before including
 83 | `nanbox.h`. Then, the functions and types will be e.g.
 84 | `bool myprefix_is_double(myprefix_t value)`, etc. By undefining `NANBOX_H` and
 85 | redefining `NANBOX_PREFIX` (and possibly some of the other macros such as
 86 | `NANBOX_POINTER_TYPE`) you can include `nanbox.h` multiple times to create
 87 | multiple instances of the nanbox type.
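
The prefixing relies on two-level token pasting, the same technique as the `NANBOX_NAME` macros in `nanbox.h`. The sketch below reproduces that technique in isolation. The `DEMO_` macros, the `myprefix` prefix and the `_is_zero` function are hypothetical and exist only to show how the names are formed.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PREFIX myprefix

/* Two levels of indirection so DEMO_PREFIX is expanded before pasting. */
#define DEMO_XXNAME(prefix, name) prefix ## name
#define DEMO_XNAME(prefix, name) DEMO_XXNAME(prefix, name)
#define DEMO_NAME(name) DEMO_XNAME(DEMO_PREFIX, name)

/* Expands to: typedef int myprefix_t; */
typedef int DEMO_NAME(_t);

/* Expands to: static inline bool myprefix_is_zero(myprefix_t v) */
static inline bool DEMO_NAME(_is_zero)(DEMO_NAME(_t) v) {
	return v == 0;
}
```

Without the intermediate `DEMO_XNAME` step, the `##` operator would paste the literal token `DEMO_PREFIX` instead of its expansion.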
 88 | 
 89 | User-defined pointer type
 90 | -------------------------
 91 | 
 92 | When encoding and decoding pointers to/from a nanbox, the pointer type `void*` is used by default. This can be changed by defining `NANBOX_POINTER_TYPE` to the pointer type of choice, before including `nanbox.h`. The type must be a pointer type, because unused bits in the pointers are used to encode various data.
 93 | 
 94 | Auxiliary data
 95 | --------------
 96 | 
 97 | Apart from doubles, pointers, ints, booleans, null, etc., there are still some bits left to store even more types of data in a nanbox. We call this 'auxiliary space'. To check whether the data in a nanbox is 'auxiliary data', use the function `nanbox_is_aux`. Accessing the data itself requires some insight into the internal representation of the nanbox. `nanbox_t` is a union type, which means it can be accessed in multiple ways. The easiest way to access the raw data of a nanbox is as a 64-bit integer using `nanbox.as_int64`. Auxiliary values are the 64-bit integer values in the range `NANBOX_MIN_AUX`..`NANBOX_MAX_AUX`. You can store 5 * 2<sup>48</sup> distinct values in this range, or equivalently, 5 types of 48-bit values.
 98 | 
 99 | Another way to access the data is to use `tag` and `payload`. These each represent 32 bits of the nanbox data. If a nanbox has its tag (`nanbox.as_bits.tag`) in the range `NANBOX_MIN_AUX_TAG`..`NANBOX_MAX_AUX_TAG`, with its payload (`nanbox.as_bits.payload`) being any 32-bit integer value, then the nanbox data is in auxiliary space.
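
As an illustration of the "5 types of 48-bit values" view, the sketch below packs a type number and a 48-bit payload into a raw 64-bit value. It assumes the 64-bit layout, where `nanbox.h` defines the auxiliary range as `0x0001000000000000`..`0x0005ffffffffffff`. The `aux_` helpers are hypothetical and not provided by the header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AUX_MIN       0x0001000000000000ull  /* NANBOX_MIN_AUX on 64-bit */
#define AUX_MAX       0x0005ffffffffffffull  /* NANBOX_MAX_AUX on 64-bit */
#define AUX_PAYLOAD48 0x0000ffffffffffffull  /* low 48 bits */

/* Pack an aux type (1..5) and a 48-bit payload into a raw 64-bit value. */
static uint64_t aux_pack(unsigned type, uint64_t payload) {
	assert(type >= 1 && type <= 5);
	assert(payload <= AUX_PAYLOAD48);
	return ((uint64_t)type << 48) | payload;
}

/* Same range check as nanbox_is_aux uses on 64-bit platforms. */
static bool aux_is_aux(uint64_t raw) {
	return raw >= AUX_MIN && raw <= AUX_MAX;
}

/* Split a raw aux value back into its type and payload. */
static unsigned aux_type(uint64_t raw) {
	return (unsigned)(raw >> 48);
}
static uint64_t aux_payload(uint64_t raw) {
	return raw & AUX_PAYLOAD48;
}
```

With the real type, the raw value would be read from and written to `nanbox.as_int64`. On 32-bit platforms the range constants differ, so code like this should use `NANBOX_MIN_AUX` and `NANBOX_MAX_AUX` rather than literal values.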
100 | 
101 | Short strings
102 | -------------
103 | 
104 | As an example of what auxiliary data can be used for, the file `nanbox_shortstring.h` is included. It implements a scheme to store strings of up to 6 bytes in the auxiliary space of a nanbox. The functions `nanbox_is_shortstring`, `nanbox_shortstring_create`, etc. are defined there, and a small demo program is included in `shortstring_demo.c`.
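
The length of a short string is kept in the 32-bit tag. The following sketch mirrors the arithmetic in `nanbox_shortstring.h` under the assumption of the 64-bit value of `NANBOX_MIN_AUX_TAG` (`0x00010000`). The `ss_` names are hypothetical. Lengths 0 to 4 are stored in the low bits of the tag, while lengths 5 and 6 are stored in the upper 16 bits, leaving the low 16 bits of the tag free for two more characters.

```c
#include <assert.h>
#include <stdint.h>

#define SS_MIN_AUX_TAG 0x00010000u  /* NANBOX_MIN_AUX_TAG on 64-bit */

/* Tag for a string of the given length (0..6), before chars are copied. */
static uint32_t ss_tag_for_length(unsigned length) {
	assert(length <= 6);
	if (length <= 4)
		return SS_MIN_AUX_TAG + length;
	return SS_MIN_AUX_TAG + ((length - 4) << 16);
}

/* Recover the length; for 5 and 6 the low 16 tag bits may hold chars. */
static unsigned ss_length_from_tag(uint32_t tag) {
	if (tag <= SS_MIN_AUX_TAG + 4)
		return tag - SS_MIN_AUX_TAG;
	return ((tag - SS_MIN_AUX_TAG) >> 16) + 4;
}
```

Because the length of a 5- or 6-byte string is read from the upper half of the tag, it stays correct after characters have been written into the lower half.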
105 | 
106 | Testing
107 | -------
108 | 
109 | Tested on
110 |   * x86-64 (Intel Core 2 Duo), Mac OS X (Darwin 10.0.8) in 64-bit and 32-bit mode, using gcc version 4.2.1 (Apple Inc. build 5664).
111 | 
112 | I would like to add more architectures to the above list, especially non-Intel ones such as ARM and big endian systems such as SPARC. If you test this on another architecture or with another compiler, please drop me a line!
113 | 
114 | To test with gcc, use the command `gcc -std=c99 -Wall -pedantic -o test test.c` to compile the test. It should produce no warnings. The executable `test` should run without outputting any errors.
115 | 
116 | On x86-64 platforms, it is also possible to test in 32-bit mode using the -m32 flag as in `gcc -m32 -std=c99 -Wall -pedantic -o test test.c`.
117 | 


--------------------------------------------------------------------------------
/nanbox.h:
--------------------------------------------------------------------------------
  1 | /*
  2 |  * The MIT License (MIT)
  3 |  * 
  4 |  * Copyright (c) 2013 Viktor Söderqvist
  5 |  * 
  6 |  * Permission is hereby granted, free of charge, to any person obtaining a copy
  7 |  * of this software and associated documentation files (the "Software"), to deal
  8 |  * in the Software without restriction, including without limitation the rights
  9 |  * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 10 |  * copies of the Software, and to permit persons to whom the Software is
 11 |  * furnished to do so, subject to the following conditions:
 12 |  * 
 13 |  * The above copyright notice and this permission notice shall be included in
 14 |  * all copies or substantial portions of the Software.
 15 |  * 
 16 |  * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 17 |  * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 18 |  * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 19 |  * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 20 |  * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 21 |  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 22 |  * THE SOFTWARE.
 23 |  */
 24 | 
 25 | /*
 26 |  * nanbox.h
 27 |  * --------
 28 |  *
 29 |  * This file provides a way to store various types of data in a 64-bit
 30 |  * slot, including a type tag, using NaN-boxing.  NaN-boxing is a way to store
 31 |  * various information in unused NaN-space in the IEEE754 representation.  For
 32 |  * 64-bit platforms, unused bits in pointers are also used to encode various
 33 |  * information.  The representation is inspired by that used by WebKit's
 34 |  * JavaScriptCore.
 35 |  *
 36 |  * Datatypes that can be stored:
 37 |  *
 38 |  *   * int (int32_t)
 39 |  *   * double
 40 |  *   * pointer
 41 |  *   * boolean (true and false)
 42 |  *   * null
 43 |  *   * undefined
 44 |  *   * empty
 45 |  *   * deleted
 46 |  *   * aux 'auxiliary data' (5 types of 48-bit values)
 47 |  *
 48 |  * Any value with the top 13 bits set represents a quiet NaN.  The remaining
 49 |  * bits are called the 'payload'. NaNs produced by hardware and C-library
 50 |  * functions typically produce a payload of zero.  We assume that all quiet
 51 |  * NaNs with a non-zero payload can be used to encode whatever we want.
 52 |  */
 53 | 
 54 | #ifndef NANBOX_H
 55 | #define NANBOX_H
 56 | 
 57 | /*
 58 |  * Define this before including this file to get functions and type prefixed
 59 |  * with something other than "nanbox".
 60 |  */
 61 | #ifndef NANBOX_PREFIX
 62 | #define NANBOX_PREFIX nanbox
 63 | #endif
 64 | 
 65 | /* User-defined pointer type. Defaults to void*. Must be a pointer type. */
 66 | #ifndef NANBOX_POINTER_TYPE
 67 | #define NANBOX_POINTER_TYPE void*
 68 | #endif
 69 | 
 70 | /*
 71 |  * User-defined auxiliary types. Default to void*. These types must be pointer
 72 |  * types or 32-bit types. (Pointers on 64-bit platforms always begin with 16
 73 |  * bits of zero.)
 74 |  */
 75 | #ifndef NANBOX_AUX1_TYPE
 76 | #define NANBOX_AUX1_TYPE void*
 77 | #endif
 78 | #ifndef NANBOX_AUX2_TYPE
 79 | #define NANBOX_AUX2_TYPE void*
 80 | #endif
 81 | #ifndef NANBOX_AUX3_TYPE
 82 | #define NANBOX_AUX3_TYPE void*
 83 | #endif
 84 | #ifndef NANBOX_AUX4_TYPE
 85 | #define NANBOX_AUX4_TYPE void*
 86 | #endif
 87 | #ifndef NANBOX_AUX5_TYPE
 88 | #define NANBOX_AUX5_TYPE void*
 89 | #endif
 90 | 
 91 | 
 92 | #include <stddef.h>  // size_t
 93 | #include <stdint.h>  // int64_t, int32_t
 94 | #include <stdbool.h> // bool, true, false
 95 | #include <string.h>  // memset
 96 | #include <assert.h>
 97 | 
 98 | /*
 99 |  * Macros to expand the prefix.
100 |  */
101 | #undef NANBOX_XXNAME
102 | #define NANBOX_XXNAME(prefix, name) prefix ## name
103 | #undef NANBOX_XNAME
104 | #define NANBOX_XNAME(prefix, name) NANBOX_XXNAME(prefix, name)
105 | #undef NANBOX_NAME
106 | #define NANBOX_NAME(name) NANBOX_XNAME(NANBOX_PREFIX, name)
107 | 
108 | /*
109 |  * Detect OS and endianness.
110 |  *
111 |  * Most of this is inspired by WTF/wtf/Platform.h in Webkit's source code.
112 |  */
113 | 
114 | /* Unix? */
115 | #if defined(_AIX) \
116 |     || defined(__APPLE__) /* Darwin */ \
117 |     || defined(__FreeBSD__) || defined(__DragonFly__) \
118 |     || defined(__FreeBSD_kernel__) \
119 |     || defined(__GNU__) /* GNU/Hurd */ \
120 |     || defined(__linux__) \
121 |     || defined(__NetBSD__) \
122 |     || defined(__OpenBSD__) \
123 |     || defined(__QNXNTO__) \
124 |     || defined(sun) || defined(__sun) /* Solaris */ \
125 |     || defined(unix) || defined(__unix) || defined(__unix__)
126 | #define NANBOX_UNIX 1
127 | #endif
128 | 
129 | /* Windows? */
130 | #if defined(WIN32) || defined(_WIN32)
131 | #define NANBOX_WINDOWS 1
132 | #endif
133 | 
134 | /* 64-bit mode? (Mostly equivalent to how WebKit does it) */
135 | #if ((defined(__x86_64__) || defined(_M_X64)) \
136 |      && (defined(NANBOX_UNIX) || defined(NANBOX_WINDOWS))) \
137 |     || (defined(__ia64__) && defined(__LP64__)) /* Itanium in LP64 mode */ \
138 |     || defined(__alpha__) /* DEC Alpha */ \
139 |     || (defined(__sparc__) && defined(__arch64__) || defined (__sparcv9)) /* BE */ \
140 |     || defined(__s390x__) /* S390 64-bit (BE) */ \
141 |     || (defined(__ppc64__) || defined(__PPC64__)) \
142 |     || defined(__aarch64__) /* ARM 64-bit */
143 | #define NANBOX_64 1
144 | #else
145 | #define NANBOX_32 1
146 | #endif
147 | 
148 | /* Big endian? (Mostly equivalent to how WebKit does it) */
149 | #if defined(__MIPSEB__) /* MIPS 32-bit */ \
150 |     || defined(__ppc__) || defined(__PPC__) /* CPU(PPC) - PowerPC 32-bit */ \
151 |     || defined(__powerpc__) || defined(__powerpc) || defined(__POWERPC__) \
152 |     || defined(_M_PPC) || defined(__PPC) \
153 |     || defined(__ppc64__) || defined(__PPC64__) /* PowerPC 64-bit */ \
154 |     || defined(__sparc)   /* Sparc 32bit */  \
155 |     || defined(__sparc__) /* Sparc 64-bit */ \
156 |     || defined(__s390x__) /* S390 64-bit */ \
157 |     || defined(__s390__)  /* S390 32-bit */ \
158 |     || defined(__ARMEB__) /* ARM big endian */ \
159 |     || ((defined(__CC_ARM) || defined(__ARMCC__)) /* ARM RealView compiler */ \
160 |         && defined(__BIG_ENDIAN))
161 | #define NANBOX_BIG_ENDIAN 1
162 | #endif
163 | 
164 | /*
165 |  * In 32-bit mode, the double is unmasked. In 64-bit mode, the pointer is
166 |  * unmasked.
167 |  */
168 | union NANBOX_NAME(_u) {
169 | 	uint64_t as_int64;
170 | 	#if defined(NANBOX_64)
171 | 	NANBOX_POINTER_TYPE pointer;
172 | 	#endif
173 | 	double as_double;
174 | 	#ifdef NANBOX_BIG_ENDIAN
175 | 	struct {
176 | 		uint32_t tag;
177 | 		uint32_t payload;
178 | 	} as_bits;
179 | 	#else
180 | 	struct {
181 | 		uint32_t payload;
182 | 		uint32_t tag;
183 | 	} as_bits;
184 | 	#endif
185 | };
186 | 
187 | #undef NANBOX_T
188 | #define NANBOX_T NANBOX_NAME(_t)
189 | typedef union NANBOX_NAME(_u) NANBOX_T;
190 | 
191 | #if defined(NANBOX_64)
192 | 
193 | /*
194 |  * 64-bit platforms
195 |  *
196 |  * This range of NaN space is represented by 64-bit numbers beginning with
197 |  * 13 bits of ones. That is, the first 16 bits are 0xFFF8 or higher.  In
198 |  * practice, no higher value is used for NaNs.  We rely on the fact that no
199 |  * valid double-precision numbers will be "higher" than this (compared as a
200 |  * uint64).
201 |  *
202 |  * By adding 7 * 2^48 as a 64-bit integer addition, we shift the first 16 bits
203 |  * in the doubles from the range 0000..FFF8 to the range 0007..FFFF.  Doubles
204 |  * are decoded by reversing this operation, i.e. subtracting the same number.
205 |  *
206 |  * The top 16-bits denote the type of the encoded nanbox_t:
207 |  *
208 |  *     Pointer {  0000:PPPP:PPPP:PPPP
209 |  *             /  0001:xxxx:xxxx:xxxx
210 |  *     Aux.   {           ...
211 |  *             \  0005:xxxx:xxxx:xxxx
212 |  *     Integer {  0006:0000:IIII:IIII
213 |  *              / 0007:****:****:****
214 |  *     Double  {          ...
215 |  *              \ FFFF:****:****:****
216 |  *
217 |  * 32-bit signed integers are marked with the 16-bit tag 0x0006.
218 |  *
219 |  * The tags 0x0001..0x0005 can be used to store five additional types of
220 |  * 48-bit auxiliary data, each storing up to 48 bits of payload.
221 |  *
222 |  * The tag 0x0000 denotes a pointer, or another form of tagged immediate.
223 |  * Boolean, 'null', 'undefined' and 'deleted' are represented by specific,
224 |  * invalid pointer values:
225 |  *
226 |  *     False:     0x06
227 |  *     True:      0x07
228 |  *     Undefined: 0x0a
229 |  *     Null:      0x02
230 |  *     Empty:     0x00
231 |  *     Deleted:   0x05
232 |  *
233 |  * All of these except Empty have bit 0 or bit 1 set.
234 |  */
235 | 
236 | #define NANBOX_VALUE_EMPTY       0x0llu
237 | #define NANBOX_VALUE_DELETED     0x5llu
238 | 
239 | // Booleans have bits 1 and 2 set. True also has bit 0 set.
240 | #define NANBOX_VALUE_FALSE       0x06llu
241 | #define NANBOX_VALUE_TRUE        0x07llu
242 | 
243 | // Null and undefined both have bit 1 set. Undefined also has bit 3 set.
244 | #define NANBOX_VALUE_UNDEFINED   0x0Allu
245 | #define NANBOX_VALUE_NULL        0x02llu
246 | 
247 | // This value is 7 * 2^48, used to encode doubles such that the encoded value
248 | // will begin with a 16-bit pattern within the range 0x0007..0xFFFF.
249 | #define NANBOX_DOUBLE_ENCODE_OFFSET 0x0007000000000000llu
250 | // If the first 16 bits are 0x0006, this indicates an integer number.  Any
251 | // larger value is a double, so we can use >= to check for either integer or
252 | // double.
253 | #define NANBOX_MIN_NUMBER           0x0006000000000000llu
254 | #define NANBOX_HIGH16_TAG           0xffff000000000000llu
255 | 
256 | // 5 * 2^48 auxiliary values can be stored in the 64-bit integer
257 | // range NANBOX_MIN_AUX..NANBOX_MAX_AUX.
258 | #define NANBOX_MIN_AUX_TAG          0x00010000
259 | #define NANBOX_MAX_AUX_TAG          0x0005ffff
260 | #define NANBOX_MIN_AUX              0x0001000000000000llu
261 | #define NANBOX_MAX_AUX              0x0005ffffffffffffllu
262 | 
263 | // NANBOX_MASK_POINTER defines the allowed non-zero bits in a pointer.
264 | #define NANBOX_MASK_POINTER         0x0000fffffffffffcllu
265 | 
266 | // The 'empty' value is guaranteed to consist of a repeated single byte,
267 | // so that it should be easy to memset an array of nanboxes to 'empty' using
268 | // NANBOX_EMPTY_BYTE as the value for every byte.
269 | #define NANBOX_EMPTY_BYTE           0x0
270 | 
271 | // Define bool nanbox_is_xxx(NANBOX_T val) and NANBOX_T nanbox_xxx(void)
272 | // with empty, deleted, true, false, undefined and null substituted for xxx.
273 | #define NANBOX_IMMEDIATE_VALUE_FUNCTIONS(NAME, VALUE)                \
274 | 	static inline NANBOX_T NANBOX_NAME(_##NAME)(void) {        \
275 | 		NANBOX_T val;                                        \
276 | 		val.as_int64 = VALUE;                                \
277 | 		return val;                                          \
278 | 	}                                                            \
279 | 	static inline bool NANBOX_NAME(_is_##NAME)(NANBOX_T val) { \
280 | 		return val.as_int64 == VALUE;                        \
281 | 	}
282 | NANBOX_IMMEDIATE_VALUE_FUNCTIONS(empty, NANBOX_VALUE_EMPTY)
283 | NANBOX_IMMEDIATE_VALUE_FUNCTIONS(deleted, NANBOX_VALUE_DELETED)
284 | NANBOX_IMMEDIATE_VALUE_FUNCTIONS(false, NANBOX_VALUE_FALSE)
285 | NANBOX_IMMEDIATE_VALUE_FUNCTIONS(true, NANBOX_VALUE_TRUE)
286 | NANBOX_IMMEDIATE_VALUE_FUNCTIONS(undefined, NANBOX_VALUE_UNDEFINED)
287 | NANBOX_IMMEDIATE_VALUE_FUNCTIONS(null, NANBOX_VALUE_NULL)
288 | 
289 | static inline bool NANBOX_NAME(_is_undefined_or_null)(NANBOX_T val) {
290 | 	// Undefined and null are the same if we remove the 'undefined' bit.
291 | 	return (val.as_int64 & ~8) == NANBOX_VALUE_NULL;
292 | }
293 | 
294 | static inline bool NANBOX_NAME(_is_boolean)(NANBOX_T val) {
295 | 	// True and false are the same if we remove the 'true' bit.
296 | 	return (val.as_int64 & ~1) == NANBOX_VALUE_FALSE;
297 | }
298 | static inline bool NANBOX_NAME(_to_boolean)(NANBOX_T val) {
299 | 	assert(NANBOX_NAME(_is_boolean)(val));
300 | 	return val.as_int64 & 1;
301 | }
302 | static inline NANBOX_T NANBOX_NAME(_from_boolean)(bool b) {
303 | 	NANBOX_T val;
304 | 	val.as_int64 = b ? NANBOX_VALUE_TRUE : NANBOX_VALUE_FALSE;
305 | 	return val;
306 | }
307 | 
308 | /* true if val is a double or an int */
309 | static inline bool NANBOX_NAME(_is_number)(NANBOX_T val) {
310 | 	return val.as_int64 >= NANBOX_MIN_NUMBER;
311 | }
312 | 
313 | static inline bool NANBOX_NAME(_is_int)(NANBOX_T val) {
314 | 	return (val.as_int64 & NANBOX_HIGH16_TAG) == NANBOX_MIN_NUMBER;
315 | }
316 | static inline NANBOX_T NANBOX_NAME(_from_int)(int32_t i) {
317 | 	NANBOX_T val;
318 | 	val.as_int64 = NANBOX_MIN_NUMBER | (uint32_t)i;
319 | 	return val;
320 | }
321 | static inline int32_t NANBOX_NAME(_to_int)(NANBOX_T val) {
322 | 	assert(NANBOX_NAME(_is_int)(val));
323 | 	return (int32_t)val.as_int64;
324 | }
325 | 
326 | static inline bool NANBOX_NAME(_is_double)(NANBOX_T val) {
327 | 	return NANBOX_NAME(_is_number)(val) && !NANBOX_NAME(_is_int)(val);
328 | }
329 | static inline NANBOX_T NANBOX_NAME(_from_double)(double d) {
330 | 	NANBOX_T val;
331 | 	val.as_double = d;
332 | 	val.as_int64 += NANBOX_DOUBLE_ENCODE_OFFSET;
333 | 	assert(NANBOX_NAME(_is_double)(val));
334 | 	return val;
335 | }
336 | static inline double NANBOX_NAME(_to_double)(NANBOX_T val) {
337 | 	assert(NANBOX_NAME(_is_double)(val));
338 | 	val.as_int64 -= NANBOX_DOUBLE_ENCODE_OFFSET;
339 | 	return val.as_double;
340 | }
341 | 
342 | static inline bool NANBOX_NAME(_is_pointer)(NANBOX_T val) {
343 | 	return !(val.as_int64 & ~NANBOX_MASK_POINTER) && val.as_int64;
344 | }
345 | static inline NANBOX_POINTER_TYPE NANBOX_NAME(_to_pointer)(NANBOX_T val) {
346 | 	assert(NANBOX_NAME(_is_pointer)(val));
347 | 	return val.pointer;
348 | }
349 | static inline NANBOX_T NANBOX_NAME(_from_pointer)(NANBOX_POINTER_TYPE pointer) {
350 | 	NANBOX_T val;
351 | 	val.pointer = pointer;
352 | 	assert(NANBOX_NAME(_is_pointer)(val));
353 | 	return val;
354 | }
355 | 
356 | static inline bool NANBOX_NAME(_is_aux)(NANBOX_T val) {
357 | 	return val.as_int64 >= NANBOX_MIN_AUX &&
358 | 	       val.as_int64 <= NANBOX_MAX_AUX;
359 | }
360 | 
361 | /* end if NANBOX_64 */
362 | #elif defined(NANBOX_32)
363 | 
364 | /*
365 |  * On 32-bit platforms we use the following NaN-boxing scheme:
366 |  *
367 |  * For values that do not contain a double value, the high 32 bits contain the
368 |  * tag values listed below, which all correspond to NaN-space. When the tag is
369 |  * 'pointer', 'integer' and 'boolean', their values (the 'payload') are stored
370 |  * in the lower 32 bits. In the case of all other tags the payload is 0.
371 |  */
372 | #define NANBOX_MAX_DOUBLE_TAG     0xfff80000
373 | #define NANBOX_INT_TAG            0xfff80001
374 | #define NANBOX_MIN_AUX_TAG        0xfff90000
375 | #define NANBOX_MAX_AUX_TAG        0xfffdffff
376 | #define NANBOX_POINTER_TAG        0xfffffffa
377 | #define NANBOX_BOOLEAN_TAG        0xfffffffb
378 | #define NANBOX_UNDEFINED_TAG      0xfffffffc
379 | #define NANBOX_NULL_TAG           0xfffffffd
380 | #define NANBOX_DELETED_VALUE_TAG  0xfffffffe
381 | #define NANBOX_EMPTY_VALUE_TAG    0xffffffff
382 | 
383 | // The 'empty' value is guaranteed to consist of a repeated single byte,
384 | // so that it should be easy to memset an array of nanboxes to 'empty' using
385 | // NANBOX_EMPTY_BYTE as the value for every byte.
386 | #define NANBOX_EMPTY_BYTE 0xff
387 | 
388 | /* The minimum and maximum uint64_t values of the auxiliary range */
389 | #define NANBOX_MIN_AUX            0xfff9000000000000llu
390 | #define NANBOX_MAX_AUX            0xfffdffffffffffffllu
391 | 
392 | // Define nanbox_xxx and nanbox_is_xxx for deleted, undefined and null.
393 | #define NANBOX_IMMEDIATE_VALUE_FUNCTIONS(NAME, TAG)                   \
394 | 	static inline NANBOX_T NANBOX_NAME(_##NAME)(void) {       \
395 | 		NANBOX_T val;                                         \
396 | 		val.as_bits.tag = TAG;                                \
397 | 		val.as_bits.payload = 0;                              \
398 | 		return val;                                           \
399 | 	}                                                             \
400 | 	static inline bool NANBOX_NAME(_is_##NAME)(NANBOX_T val) {  \
401 | 		return val.as_bits.tag == TAG;                        \
402 | 	}
403 | NANBOX_IMMEDIATE_VALUE_FUNCTIONS(deleted, NANBOX_DELETED_VALUE_TAG)
404 | NANBOX_IMMEDIATE_VALUE_FUNCTIONS(undefined, NANBOX_UNDEFINED_TAG)
405 | NANBOX_IMMEDIATE_VALUE_FUNCTIONS(null, NANBOX_NULL_TAG)
406 | 
407 | // The undefined and null tags differ only in one bit
408 | static inline bool NANBOX_NAME(_is_undefined_or_null)(NANBOX_T val) {
409 | 	return (val.as_bits.tag & ~1) == NANBOX_UNDEFINED_TAG;
410 | }
411 | 
412 | static inline NANBOX_T NANBOX_NAME(_empty)(void) {
413 | 	NANBOX_T val;
414 | 	val.as_int64 = 0xffffffffffffffffllu;
415 | 	return val;
416 | }
417 | static inline bool NANBOX_NAME(_is_empty)(NANBOX_T val) {
418 | 	return val.as_bits.tag == 0xffffffff;
419 | }
420 | 
421 | /* Returns true if the value is in auxiliary space */
422 | static inline bool NANBOX_NAME(_is_aux)(NANBOX_T val) {
423 | 	return val.as_bits.tag >= NANBOX_MIN_AUX_TAG &&
424 | 	       val.as_bits.tag < NANBOX_POINTER_TAG;
425 | }
426 | 
427 | // Define nanbox_is_yyy, nanbox_to_yyy and nanbox_from_yyy for
428 | // boolean, int, pointer and aux1-aux5
429 | #define NANBOX_TAGGED_VALUE_FUNCTIONS(NAME, TYPE, TAG) \
430 | 	static inline bool NANBOX_NAME(_is_##NAME)(NANBOX_T val) { \
431 | 		return val.as_bits.tag == TAG; \
432 | 	} \
433 | 	static inline TYPE NANBOX_NAME(_to_##NAME)(NANBOX_T val) { \
434 | 		assert(val.as_bits.tag == TAG); \
435 | 		return (TYPE)val.as_bits.payload; \
436 | 	} \
437 | 	static inline NANBOX_T NANBOX_NAME(_from_##NAME)(TYPE a) { \
438 | 		NANBOX_T val; \
439 | 		val.as_bits.tag = TAG; \
440 | 		val.as_bits.payload = (int32_t)a; \
441 | 		return val; \
442 | 	}
443 | 
444 | NANBOX_TAGGED_VALUE_FUNCTIONS(boolean, bool, NANBOX_BOOLEAN_TAG)
445 | NANBOX_TAGGED_VALUE_FUNCTIONS(int, int32_t, NANBOX_INT_TAG)
446 | NANBOX_TAGGED_VALUE_FUNCTIONS(pointer, NANBOX_POINTER_TYPE, NANBOX_POINTER_TAG)
447 | 
448 | static inline NANBOX_T NANBOX_NAME(_true)(void) {
449 | 	return NANBOX_NAME(_from_boolean)(true);
450 | }
451 | static inline NANBOX_T NANBOX_NAME(_false)(void) {
452 | 	return NANBOX_NAME(_from_boolean)(false);
453 | }
454 | static inline bool NANBOX_NAME(_is_true)(NANBOX_T val) {
455 | 	return val.as_bits.tag == NANBOX_BOOLEAN_TAG && val.as_bits.payload;
456 | }
457 | static inline bool NANBOX_NAME(_is_false)(NANBOX_T val) {
458 | 	return val.as_bits.tag == NANBOX_BOOLEAN_TAG && !val.as_bits.payload;
459 | }
460 | 
461 | static inline bool NANBOX_NAME(_is_double)(NANBOX_T val) {
462 | 	return val.as_bits.tag < NANBOX_INT_TAG;
463 | }
464 | // is number = is double or is int
465 | static inline bool NANBOX_NAME(_is_number)(NANBOX_T val) {
466 | 	return val.as_bits.tag <= NANBOX_INT_TAG;
467 | }
468 | 
469 | static inline NANBOX_T NANBOX_NAME(_from_double)(double d) {
470 | 	NANBOX_T val;
471 | 	val.as_double = d;
472 | 	assert(NANBOX_NAME(_is_double)(val) &&
473 | 	       val.as_bits.tag <= NANBOX_MAX_DOUBLE_TAG);
474 | 	return val;
475 | }
476 | static inline double NANBOX_NAME(_to_double)(NANBOX_T val) {
477 | 	assert(NANBOX_NAME(_is_double)(val));
478 | 	return val.as_double;
479 | }
480 | 
481 | #endif /* elif NANBOX_32 */
482 | 
483 | /*
484 |  * Representation-independent functions
485 |  */
486 | 
487 | static inline double NANBOX_NAME(_to_number)(NANBOX_T val) {
488 | 	assert(NANBOX_NAME(_is_number)(val));
489 | 	return NANBOX_NAME(_is_int)(val) ? NANBOX_NAME(_to_int)(val)
490 | 	                                 : NANBOX_NAME(_to_double)(val);
491 | }
492 | 
493 | #endif /* NANBOX_H */
494 | 


--------------------------------------------------------------------------------
/nanbox_shortstring.h:
--------------------------------------------------------------------------------
 1 | #ifndef NANBOX_SHORTSTRING_H
 2 | #define NANBOX_SHORTSTRING_H
 3 | /*
 4 |  * Short strings
 5 |  * -------------
 6 |  * Strings of up to 6 bytes can be stored in a NANBOX_T in so-called 'auxiliary
 7 |  * space'. The space used is NANBOX_MIN_AUX..(NANBOX_MIN_AUX + 3 * 2^48 - 1).
 8 |  */
 9 | 
10 | #include "nanbox.h"
11 | 
12 | static inline bool NANBOX_NAME(_is_shortstring)(NANBOX_T val) {
13 | 	return val.as_bits.tag >= NANBOX_MIN_AUX_TAG &&
14 | 	       val.as_bits.tag <= NANBOX_MIN_AUX_TAG + 0x0002ffff;
15 | }
16 | static inline char* NANBOX_NAME(_shortstring_chars)(NANBOX_T* val) {
17 | 	assert(NANBOX_NAME(_is_shortstring)(*val));
18 | 	#ifdef NANBOX_BIG_ENDIAN
19 | 	if ((val->as_bits.tag & 0xffff0000) == NANBOX_MIN_AUX_TAG)
20 | 		return (char*)val + 4; /* skip tag and length */
21 | 	else
22 | 		return (char*)val + 2; /* skip tag */
23 | 	#else
24 | 	return (char*)val;
25 | 	#endif
26 | }
27 | 
28 | static inline unsigned NANBOX_NAME(_shortstring_length)(NANBOX_T val) {
29 | 	assert(NANBOX_NAME(_is_shortstring)(val));
30 | 	if (val.as_bits.tag <= NANBOX_MIN_AUX_TAG + 4)
31 | 		return val.as_bits.tag - NANBOX_MIN_AUX_TAG;
32 | 	else
33 | 		return ((val.as_bits.tag - NANBOX_MIN_AUX_TAG) >> 16) + 4;
34 | }
35 | 
36 | // creates a short string of length bytes with undefined contents
37 | static inline NANBOX_T NANBOX_NAME(_shortstring_create_undef)(unsigned length) {
38 | 	NANBOX_T val;
39 | 	assert(length <= 6);
40 | 	if (length <= 4)
41 | 		val.as_bits.tag = NANBOX_MIN_AUX_TAG + length;
42 | 	else
43 | 		val.as_bits.tag = NANBOX_MIN_AUX_TAG + ((length - 4) << 16);
44 | 	val.as_bits.payload = 0;
45 | 	return val;
46 | }
47 | 
48 | // copies length bytes of chars. (nul bytes are copied like any other byte)
49 | static inline NANBOX_T
50 | NANBOX_NAME(_shortstring_create)(const char *chars, unsigned length) {
51 | 	NANBOX_T val = NANBOX_NAME(_shortstring_create_undef)(length);
52 | 	memcpy(NANBOX_NAME(_shortstring_chars)(&val), chars, length);
53 | 	assert(NANBOX_NAME(_is_shortstring)(val));
54 | 	return val;
55 | }
56 | #endif
57 | 


--------------------------------------------------------------------------------
/shortstring_demo.c:
--------------------------------------------------------------------------------
 1 | #include "nanbox_shortstring.h"
 2 | #include <stdlib.h>
 3 | #include <stdio.h>
 4 | 
 5 | int main(void) {
 6 | 	printf("Enter short strings of up to 6 chars to dump, q to quit.\n");
 7 | 	while (1) {
 8 | 		char buf[7];
 9 | 		size_t len;
10 | 		printf("Short string --> ");
11 | 		if (scanf("%6s", buf) == EOF)
12 | 			break;
13 | 		if (!strcmp(buf, "q"))
14 | 			break;
15 | 		len = strlen(buf);
16 | 		nanbox_t val = nanbox_shortstring_create(buf, len);
17 | 		printf("0x%016llx \"%.6s\" (length %u)\n",
18 | 		       (unsigned long long)val.as_int64,
19 | 		       nanbox_shortstring_chars(&val),
20 | 		       nanbox_shortstring_length(val));
21 | 	}
22 | 	printf("\n");
23 | 	return 0;
24 | }
25 | 


--------------------------------------------------------------------------------
/test.c:
--------------------------------------------------------------------------------
  1 | #include <assert.h>
  2 | #include <math.h>
  3 | #include <stdio.h>
  4 | 
  5 | #include "nanbox.h"
  6 | 
  7 | // This macro stores a value VALUE of type TYPE in a nanbox, checks the type,
  8 | // converts back and checks that we got the value back. It also checks that the
  9 | // nanbox is not of any other type.
 10 | //
 11 | // The only NaN expression that is possible to test with this macro is
 12 | // 0.0/0.0. (There is some special logic for NaN, because NaN != NaN.)
 13 | #define TO_NANBOX_AND_BACK(TYPE, VALUE) do {                                  \
 14 | 	nanbox_t x = nanbox_from_##TYPE(VALUE);                               \
 15 | 	assert(nanbox_is_##TYPE(x));                                          \
 16 | 	/* decode and test == to the original, except for NaN */              \
 17 | 	if (!strcmp(#TYPE, "double") && !strcmp(#VALUE, "0.0/0.0"))           \
 18 | 		assert(VALUE != nanbox_to_##TYPE(x));                         \
 19 | 	else                                                                  \
 20 | 		assert(VALUE == nanbox_to_##TYPE(x));                         \
 21 | 	assert(nanbox_is_double(x)  == !strcmp(#TYPE, "double"));             \
 22 | 	assert(nanbox_is_int(x)     == !strcmp(#TYPE, "int"));                \
 23 | 	assert(nanbox_is_pointer(x) == !strcmp(#TYPE, "pointer"));            \
 24 | 	assert(nanbox_is_boolean(x) == !strcmp(#TYPE, "boolean"));            \
 25 | 	assert(nanbox_is_number(x)  == (!strcmp(#TYPE, "double") ||           \
 26 | 	                                !strcmp(#TYPE, "int")));              \
 27 | 	assert(!nanbox_is_null(x));                                           \
 28 | 	assert(!nanbox_is_undefined(x));                                      \
 29 | 	assert(!nanbox_is_empty(x));                                          \
 30 | 	assert(!nanbox_is_deleted(x));                                        \
 31 | 	assert(!nanbox_is_aux(x));                                            \
 32 | 	assert(nanbox_is_true(x)    == !strcmp(#VALUE, "true"));              \
 33 | 	assert(nanbox_is_false(x)   == !strcmp(#VALUE, "false"));             \
 34 | } while(0)
 35 | 
 36 | // Use this to create and check a nanbox of null, undefined, empty, deleted,
 37 | // true or false. It tests that it is of the correct type and no other type.
 38 | #define TO_NANBOX_AND_CHECK(VALUE) do {                                       \
 39 | 	nanbox_t x = nanbox_##VALUE();                                        \
 40 | 	assert(!nanbox_is_double(x));                                         \
 41 | 	assert(!nanbox_is_int(x));                                            \
 42 | 	assert(!nanbox_is_pointer(x));                                        \
 43 | 	assert(!nanbox_is_number(x));                                         \
 44 | 	assert(!nanbox_is_aux(x));                                            \
 45 | 	assert(nanbox_is_boolean(x)   == (!strcmp(#VALUE, "true") ||          \
 46 | 	                                  !strcmp(#VALUE, "false")));         \
 47 | 	assert(nanbox_is_undefined_or_null(x) ==                              \
 48 | 	       (!strcmp(#VALUE, "undefined") || !strcmp(#VALUE, "null")));    \
 49 | 	assert(nanbox_is_null(x)      == !strcmp(#VALUE, "null"));            \
 50 | 	assert(nanbox_is_undefined(x) == !strcmp(#VALUE, "undefined"));       \
 51 | 	assert(nanbox_is_empty(x)     == !strcmp(#VALUE, "empty"));           \
 52 | 	assert(nanbox_is_deleted(x)   == !strcmp(#VALUE, "deleted"));         \
 53 | } while(0)
 54 | 
 55 | // Defined below. Called from main.
 56 | void test_nan(void);
 57 | 
 58 | int main() {
 59 | 	// Size should be 64 bits (8 bytes)
 60 | 	assert(sizeof(nanbox_t) == 8);
 61 | 
 62 | 	// Test storing various doubles, including NaN and infinity.
 63 | 	TO_NANBOX_AND_BACK(double, -0.0);
 64 | 	TO_NANBOX_AND_BACK(double, 3.14);
 65 | 	TO_NANBOX_AND_BACK(double, 1.0/0.0);
 66 | 	TO_NANBOX_AND_BACK(double, -1.0/0.0);
 67 | 	TO_NANBOX_AND_BACK(double, 0.0/0.0);
 68 | 
 69 | 	// Test storing int, pointer and boolean
 70 | 	TO_NANBOX_AND_BACK(int, 42);
 71 | 	TO_NANBOX_AND_BACK(pointer, &x); // &x is the macro's own local nanbox_t x
 72 | 	TO_NANBOX_AND_BACK(boolean, true);
 73 | 	TO_NANBOX_AND_BACK(boolean, false);
 74 | 
 75 | 	// The remaining types/values
 76 | 	//TO_NANBOX_AND_CHECK(null);
 77 | 	//TO_NANBOX_AND_CHECK(undefined);
 78 | 	TO_NANBOX_AND_CHECK(empty);
 79 | 	TO_NANBOX_AND_CHECK(deleted);
 80 | 	TO_NANBOX_AND_CHECK(true);
 81 | 	TO_NANBOX_AND_CHECK(false);
 82 | 
 83 | 	test_nan();
 84 | 
 85 | 	return 0;
 86 | }
 87 | 
 88 | // A macro to check that a double is a canonical NaN, i.e. one that we accept.
 89 | // It also nanboxes the value and checks that it is identified as a double.
 90 | #define ASSERT_CANNONICAL_NAN(VALUE) do {                                         \
 91 | 	double d = VALUE;                                                         \
 92 | 	uint64_t n; memcpy(&n, &d, sizeof n);                                     \
 93 | 	assert((n | 0x8000000000000000llu) == 0xfff8000000000000llu);             \
 94 | 	assert(nanbox_is_double(nanbox_from_double(VALUE)));                      \
 95 | } while(0)
 96 | 
 97 | void test_nan(void) {
 98 | 	double nan = 0.0/0.0, inf = 1.0/0.0, ninf = -1.0/0.0;
 99 | 	assert(nan != nan);
100 | 	ASSERT_CANNONICAL_NAN(0.0/0.0);
101 | 	ASSERT_CANNONICAL_NAN(nan);
102 | 	ASSERT_CANNONICAL_NAN(nan + 42);
103 | 	ASSERT_CANNONICAL_NAN(-inf * nan);
104 | 
105 | 	ASSERT_CANNONICAL_NAN(inf/inf);
106 | 	ASSERT_CANNONICAL_NAN(ninf/inf);
107 | 	ASSERT_CANNONICAL_NAN(0 * inf);
108 | 	ASSERT_CANNONICAL_NAN(0 * ninf);
109 | 	ASSERT_CANNONICAL_NAN(inf * 0);
110 | 	ASSERT_CANNONICAL_NAN(inf + ninf);
111 | 	ASSERT_CANNONICAL_NAN(ninf + inf);
112 | 
113 | 	ASSERT_CANNONICAL_NAN(pow(-1.0, 3.14));
114 | 	ASSERT_CANNONICAL_NAN(sqrt(-1.0));
115 | 	ASSERT_CANNONICAL_NAN(log(-1.0));
116 | 	ASSERT_CANNONICAL_NAN(asin(2.0));
117 | 	ASSERT_CANNONICAL_NAN(acos(2.0));
118 | }
119 | 


--------------------------------------------------------------------------------