├── .gitignore
├── README.md
├── nanbox.h
├── nanbox_shortstring.h
├── shortstring_demo.c
└── test.c
/.gitignore:
--------------------------------------------------------------------------------
1 | a.out
2 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | nanbox
2 | ======
3 |
4 | A type that can store various types of data in 64 bits using NaN-boxing.
5 |
6 | The header file `nanbox.h` defines a type `nanbox_t` which can be used to store either a double, a 32-bit integer, a pointer, a boolean, null or one of a few additional values named 'undefined', 'empty' and 'deleted', plus five additional 'auxiliary' types of data of up to 48 bits each. The encoding scheme differs between 32-bit and 64-bit platforms but the size of `nanbox_t` is always 64 bits.
7 |
8 | How does it work?
9 | -----------------
10 |
11 | NaN-boxing is a way to store various information in unused NaN-space in the IEEE754 representation.
12 |
13 | Any value with the top 13 bits set (the sign bit, the 11 exponent bits and the most significant mantissa bit) represents a *quiet NaN*. The remaining bits are called the 'payload'. NaNs produced by hardware and C-library functions typically have a payload of zero. We assume that all quiet NaNs with a non-zero payload can be used to encode whatever we want.
14 |
15 | On 64-bit platforms, unused bits in pointers are also used to encode various information. The representation is inspired by that used by WebKit's JavaScriptCore. It *should work* on most 32-bit and 64-bit little-endian and big-endian machines. (See Testing below.)
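
The 64-bit double encoding described in the comments of `nanbox.h` (adding 7 * 2^48 to the bit pattern) can be sketched in isolation as follows. This is a minimal, self-contained illustration and not the header's actual code: all `sketch_` names are hypothetical, and it assumes that no double has a bit pattern above `0xfff8000000000000`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 7 * 2^48: the offset added to a double's bit pattern (64-bit scheme). */
#define SKETCH_DOUBLE_OFFSET 0x0007000000000000ull

/* Hypothetical helper: reinterpret a double as its IEEE 754 bit pattern. */
static uint64_t sketch_bits_of(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Encode: shift the pattern so that its top 16 bits land in 0x0007..0xFFFF. */
static uint64_t sketch_box_double(double d) {
    uint64_t bits = sketch_bits_of(d);
    /* Assumption: only patterns up to 0xfff8000000000000 occur. */
    assert(bits <= 0xfff8000000000000ull);
    return bits + SKETCH_DOUBLE_OFFSET;
}

/* A boxed value holds a double when its top 16 bits are 0x0007 or higher. */
static int sketch_is_double(uint64_t boxed) {
    return boxed >= SKETCH_DOUBLE_OFFSET;
}

/* Decode: subtract the offset and reinterpret the bits as a double. */
static double sketch_unbox_double(uint64_t boxed) {
    double d;
    assert(sketch_is_double(boxed));
    boxed -= SKETCH_DOUBLE_OFFSET;
    memcpy(&d, &boxed, sizeof d);
    return d;
}
```

Because every encoded double ends up at or above `0x0007000000000000`, the values below that threshold remain free for pointers, auxiliary data and integers.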
16 |
17 | Functions
18 | ---------
19 |
20 | A number of very short functions, all declared `static inline`, are defined to encode values as `nanbox_t`:
21 |
22 | ```c
23 | nanbox_t nanbox_from_double(double d);
24 | nanbox_t nanbox_from_int(int32_t i);
25 | nanbox_t nanbox_from_pointer(void* pointer);
26 | nanbox_t nanbox_from_boolean(bool b);
27 | nanbox_t nanbox_null(void);
28 | nanbox_t nanbox_undefined(void);
29 | nanbox_t nanbox_empty(void);
30 | nanbox_t nanbox_deleted(void);
31 | nanbox_t nanbox_true(void); /* the same as nanbox_from_boolean(true) */
32 | nanbox_t nanbox_false(void); /* the same as nanbox_from_boolean(false) */
33 | ```
34 |
35 | ... to check the type:
36 |
37 | ```c
38 | bool nanbox_is_double(nanbox_t value);
39 | bool nanbox_is_int(nanbox_t value);
40 | bool nanbox_is_pointer(nanbox_t value);
41 | bool nanbox_is_boolean(nanbox_t value);
42 | bool nanbox_is_null(nanbox_t value);
43 | bool nanbox_is_undefined(nanbox_t value);
44 | bool nanbox_is_empty(nanbox_t value);
45 | bool nanbox_is_deleted(nanbox_t value);
46 | bool nanbox_is_true(nanbox_t value);
47 | bool nanbox_is_false(nanbox_t value);
48 | bool nanbox_is_number(nanbox_t value); /* either int or double */
49 | bool nanbox_is_undefined_or_null(nanbox_t value); /* either */
50 | bool nanbox_is_aux(nanbox_t value); /* auxiliary space */
51 | ```
52 |
53 | ... and to decode the value:
54 |
55 | ```c
56 | double nanbox_to_double(nanbox_t value);
57 | int32_t nanbox_to_int(nanbox_t value);
58 | void* nanbox_to_pointer(nanbox_t value);
59 | bool nanbox_to_boolean(nanbox_t value);
60 | double nanbox_to_number(nanbox_t value); /* value can be int or double */
61 | ```
62 |
63 | Before fetching the value using these functions, you should make sure the nanbox is holding a value of the correct type, e.g. using the corresponding `nanbox_is_...` function. If the encoded value is not of the correct type, the results of the `nanbox_to_...` functions are undefined. If compiled with assertions, you will get a failed assertion when trying to fetch a value of the wrong type.
64 |
65 | The 'empty' value
66 | -----------------
67 |
68 | The 'empty' value is designed to be used to represent empty slots in e.g. a hashtable. It is guaranteed to consist of a single repeated byte. This is to make sure `memset` can be used to set all the elements in an array of nanboxes to 'empty'. The macro `NANBOX_EMPTY_BYTE` represents the byte that, when repeated 8 times (64 bits), makes up an 'empty' value.
69 |
70 | ```c
71 | void foo(void) {
72 | nanbox_t boxes[100];
73 | // Initialize the boxes to empty values
74 | memset(boxes, NANBOX_EMPTY_BYTE, sizeof(nanbox_t) * 100);
75 | // ...
76 | }
77 | ```
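
To see why the single-repeated-byte guarantee matters, here is a small self-contained sketch that works on plain `uint64_t` slots instead of `nanbox_t`. The names `SKETCH_EMPTY_BYTE`, `sketch_fill_empty` and `sketch_all_empty` are hypothetical. The byte value `0xff` is the one `nanbox.h` uses on 32-bit platforms; on 64-bit platforms it is `0x0`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for NANBOX_EMPTY_BYTE on a 32-bit build. */
#define SKETCH_EMPTY_BYTE 0xff

/* Fill `count` 64-bit slots with the repeated byte. */
static void sketch_fill_empty(uint64_t *slots, size_t count) {
    assert(slots != NULL || count == 0);
    memset(slots, SKETCH_EMPTY_BYTE, count * sizeof *slots);
}

/* Returns 1 if every slot equals the byte repeated eight times. */
static int sketch_all_empty(const uint64_t *slots, size_t count) {
    const uint64_t empty = 0x0101010101010101ull * SKETCH_EMPTY_BYTE;
    for (size_t i = 0; i < count; i++) {
        if (slots[i] != empty)
            return 0;
    }
    return 1;
}
```

Since `memset` can only write one byte value, a sentinel made of eight different bytes could not be initialized this way and would need an explicit loop.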
78 |
79 | User-defined prefix instead of 'nanbox'
80 | ---------------------------------------
81 |
82 | You can define `NANBOX_PREFIX` to the prefix you want, before including
83 | `nanbox.h`. Then, the functions and types will be e.g.
84 | `bool myprefix_is_double(myprefix_t value)`, etc. By undefining `NANBOX_H` and
85 | redefining `NANBOX_PREFIX` (and possibly some of the other macros such as
86 | `NANBOX_POINTER_TYPE`) you can include `nanbox.h` multiple times to create
87 | multiple instances of the nanbox type.
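
The prefix mechanism relies on two-level macro expansion followed by token pasting. The sketch below shows the technique on its own, without `nanbox.h`. The `SKETCH_` macros and the `alpha`/`beta` prefixes are made up for illustration; they mirror the role of `NANBOX_NAME` and `NANBOX_PREFIX`.

```c
#include <assert.h>

/* Two-level expansion so that the prefix macro is expanded before pasting. */
#define SKETCH_PASTE_RAW(prefix, name) prefix ## name
#define SKETCH_PASTE(prefix, name) SKETCH_PASTE_RAW(prefix, name)
#define SKETCH_NAME(name) SKETCH_PASTE(SKETCH_PREFIX, name)

/* First "instance": the function is named alpha_answer. */
#define SKETCH_PREFIX alpha
static int SKETCH_NAME(_answer)(void) { return 1; }
#undef SKETCH_PREFIX

/* Second "instance": the same definition now produces beta_answer. */
#define SKETCH_PREFIX beta
static int SKETCH_NAME(_answer)(void) { return 2; }
#undef SKETCH_PREFIX

/* Both generated functions coexist in the same translation unit. */
static void sketch_check(void) {
    assert(alpha_answer() == 1);
    assert(beta_answer() == 2);
}
```

Without the intermediate `SKETCH_PASTE` level, the `##` operator would paste the literal token `SKETCH_PREFIX` instead of its current value.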
88 |
89 | User-defined pointer type
90 | -------------------------
91 |
92 | When encoding and decoding pointers to/from a nanbox, the pointer type `void*` is used by default. This can be changed by defining `NANBOX_POINTER_TYPE` to the pointer type of choice, before including `nanbox.h`. The type must be a pointer type, because unused bits in the pointers are used to encode various data.
93 |
94 | Auxiliary data
95 | --------------
96 |
97 | Apart from doubles, pointers, ints, booleans, null, etc. there are still some bits left to store even more types of data in a nanbox. We call this 'auxiliary space'. To check if the type of data in a nanbox is 'auxiliary data', the function `nanbox_is_aux` can be used, but accessing the data itself requires some insight into the internal representation of the nanbox. `nanbox_t` is a union type, which means it can be accessed in multiple ways. The easiest way to access the raw nanbox data is as a 64-bit integer using `nanbox.as_int64`. You can only store 64-bit integer values in the range `NANBOX_MIN_AUX`..`NANBOX_MAX_AUX`, which is the 'auxiliary space'. You can store 5 * 2^48 distinct values in this range, or equivalently, 5 types of 48-bit values.
98 |
99 | Another way to access the data is to use `tag` and `payload`. These each represent 32 bits of the nanbox data. If a nanbox has its tag (`nanbox.as_bits.tag`) in the range `NANBOX_MIN_AUX_TAG`..`NANBOX_MAX_AUX_TAG` and a payload (`nanbox.as_bits.payload`) being any 32-bit integer value, then the nanbox data is in auxiliary space.
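
As a rough illustration of the 64-bit layout, the following self-contained sketch packs a 48-bit payload under one of the five auxiliary types and takes it apart again. It operates on a raw `uint64_t` and does not include `nanbox.h`. All `sketch_` names are hypothetical; the range constants are copied from the 64-bit branch of the header, and the 32-bit layout uses different values.

```c
#include <assert.h>
#include <stdint.h>

/* Bounds of the auxiliary range in the 64-bit layout (as in nanbox.h). */
#define SKETCH_MIN_AUX 0x0001000000000000ull
#define SKETCH_MAX_AUX 0x0005ffffffffffffull
#define SKETCH_MASK_48 0x0000ffffffffffffull

/* Pack a 48-bit payload under aux type 1..5, stored in the top 16 bits. */
static uint64_t sketch_aux_pack(unsigned type, uint64_t payload) {
    assert(type >= 1 && type <= 5);
    assert(payload <= SKETCH_MASK_48);
    return ((uint64_t)type << 48) | payload;
}

/* Range test equivalent to the 64-bit nanbox_is_aux. */
static int sketch_is_aux(uint64_t raw) {
    return raw >= SKETCH_MIN_AUX && raw <= SKETCH_MAX_AUX;
}

/* Extract the aux type (1..5) from the top 16 bits. */
static unsigned sketch_aux_type(uint64_t raw) {
    assert(sketch_is_aux(raw));
    return (unsigned)(raw >> 48);
}

/* Extract the 48-bit payload from the low bits. */
static uint64_t sketch_aux_payload(uint64_t raw) {
    assert(sketch_is_aux(raw));
    return raw & SKETCH_MASK_48;
}
```

With the real header on a 64-bit platform, a value produced this way would be assigned to the `as_int64` member of a `nanbox_t`.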
100 |
101 | Short strings
102 | -------------
103 |
104 | As an example of what auxiliary data can be used for, the file `nanbox_shortstring.h` is included, which implements a scheme to store strings of up to 6 bytes in the auxiliary space of a nanbox. The functions `nanbox_is_shortstring`, `nanbox_shortstring_create`, etc. are defined and a small demo program is included in `shortstring_demo.c`.
105 |
106 | Testing
107 | -------
108 |
109 | Tested on
110 | * x86-64 (Intel Core 2 Duo), Mac OS X (Darwin 10.0.8) in 64-bit and 32-bit mode, using gcc version 4.2.1 (Apple Inc. build 5664).
111 |
112 | I would like to add more architectures to the above list, especially non-Intel ones such as ARM and big endian systems such as SPARC. If you test this on another architecture or with another compiler, please drop me a line!
113 |
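114 | To test with gcc, use the command `gcc -std=c99 -Wall -pedantic -o test test.c` to compile the test. It should produce no warnings. The executable `test` should run without outputting any errors. On systems where the math library is linked separately (e.g. Linux with glibc), you may need to append `-lm`, since `test.c` calls `pow`, `sqrt` and other `math.h` functions.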
115 |
116 | On x86-64 platforms, it is also possible to test in 32-bit mode using the -m32 flag as in `gcc -m32 -std=c99 -Wall -pedantic -o test test.c`.
117 |
--------------------------------------------------------------------------------
/nanbox.h:
--------------------------------------------------------------------------------
1 | /*
2 | * The MIT License (MIT)
3 | *
4 | * Copyright (c) 2013 Viktor Söderqvist
5 | *
6 | * Permission is hereby granted, free of charge, to any person obtaining a copy
7 | * of this software and associated documentation files (the "Software"), to deal
8 | * in the Software without restriction, including without limitation the rights
9 | * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
10 | * copies of the Software, and to permit persons to whom the Software is
11 | * furnished to do so, subject to the following conditions:
12 | *
13 | * The above copyright notice and this permission notice shall be included in
14 | * all copies or substantial portions of the Software.
15 | *
16 | * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
17 | * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
18 | * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
19 | * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
20 | * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
21 | * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
22 | * THE SOFTWARE.
23 | */
24 |
25 | /*
26 | * nanbox.h
27 | * --------
28 | *
29 | * This file provides a way to store various types of data in a 64-bit
30 | * slot, including a type tag, using NaN-boxing. NaN-boxing is a way to store
31 | * various information in unused NaN-space in the IEEE754 representation. For
32 | * 64-bit platforms, unused bits in pointers are also used to encode various
33 | * information. The representation is inspired by that used by WebKit's
34 | * JavaScriptCore.
35 | *
36 | * Datatypes that can be stored:
37 | *
38 | * * int (int32_t)
39 | * * double
40 | * * pointer
41 | * * boolean (true and false)
42 | * * null
43 | * * undefined
44 | * * empty
45 | * * deleted
46 | * * aux 'auxiliary data' (5 types of 48-bit values)
47 | *
48 | * Any value with the top 13 bits set represents a quiet NaN. The remaining
49 | * bits are called the 'payload'. NaNs produced by hardware and C-library
50 | * functions typically produce a payload of zero. We assume that all quiet
51 | * NaNs with a non-zero payload can be used to encode whatever we want.
52 | */
53 |
54 | #ifndef NANBOX_H
55 | #define NANBOX_H
56 |
57 | /*
58 | * Define this before including this file to get functions and type prefixed
59 | * with something other than "nanbox".
60 | */
61 | #ifndef NANBOX_PREFIX
62 | #define NANBOX_PREFIX nanbox
63 | #endif
64 |
65 | /* User-defined pointer type. Defaults to void*. Must be a pointer type. */
66 | #ifndef NANBOX_POINTER_TYPE
67 | #define NANBOX_POINTER_TYPE void*
68 | #endif
69 |
70 | /*
71 | * User-defined auxiliary types. Default to void*. These types must be pointer
72 | * types or 32-bit types. (Pointers on 64-bit platforms always begin with 16
73 | * bits of zero.)
74 | */
75 | #ifndef NANBOX_AUX1_TYPE
76 | #define NANBOX_AUX1_TYPE void*
77 | #endif
78 | #ifndef NANBOX_AUX2_TYPE
79 | #define NANBOX_AUX2_TYPE void*
80 | #endif
81 | #ifndef NANBOX_AUX3_TYPE
82 | #define NANBOX_AUX3_TYPE void*
83 | #endif
84 | #ifndef NANBOX_AUX4_TYPE
85 | #define NANBOX_AUX4_TYPE void*
86 | #endif
87 | #ifndef NANBOX_AUX5_TYPE
88 | #define NANBOX_AUX5_TYPE void*
89 | #endif
90 |
91 |
92 | #include <stddef.h> // size_t
93 | #include <stdint.h> // int64_t, int32_t
94 | #include <stdbool.h> // bool, true, false
95 | #include <string.h> // memset
96 | #include <assert.h>
97 |
98 | /*
99 | * Macros to expand the prefix.
100 | */
101 | #undef NANBOX_XXNAME
102 | #define NANBOX_XXNAME(prefix, name) prefix ## name
103 | #undef NANBOX_XNAME
104 | #define NANBOX_XNAME(prefix, name) NANBOX_XXNAME(prefix, name)
105 | #undef NANBOX_NAME
106 | #define NANBOX_NAME(name) NANBOX_XNAME(NANBOX_PREFIX, name)
107 |
108 | /*
109 | * Detect OS and endianness.
110 | *
111 | * Most of this is inspired by WTF/wtf/Platform.h in WebKit's source code.
112 | */
113 |
114 | /* Unix? */
115 | #if defined(_AIX) \
116 | || defined(__APPLE__) /* Darwin */ \
117 | || defined(__FreeBSD__) || defined(__DragonFly__) \
118 | || defined(__FreeBSD_kernel__) \
119 | || defined(__GNU__) /* GNU/Hurd */ \
120 | || defined(__linux__) \
121 | || defined(__NetBSD__) \
122 | || defined(__OpenBSD__) \
123 | || defined(__QNXNTO__) \
124 | || defined(sun) || defined(__sun) /* Solaris */ \
125 | || defined(unix) || defined(__unix) || defined(__unix__)
126 | #define NANBOX_UNIX 1
127 | #endif
128 |
129 | /* Windows? */
130 | #if defined(WIN32) || defined(_WIN32)
131 | #define NANBOX_WINDOWS 1
132 | #endif
133 |
134 | /* 64-bit mode? (Mostly equivalent to how WebKit does it) */
135 | #if ((defined(__x86_64__) || defined(_M_X64)) \
136 | && (defined(NANBOX_UNIX) || defined(NANBOX_WINDOWS))) \
137 | || (defined(__ia64__) && defined(__LP64__)) /* Itanium in LP64 mode */ \
138 | || defined(__alpha__) /* DEC Alpha */ \
139 | || (defined(__sparc__) && defined(__arch64__) || defined (__sparcv9)) /* BE */ \
140 | || defined(__s390x__) /* S390 64-bit (BE) */ \
141 | || (defined(__ppc64__) || defined(__PPC64__)) \
142 | || defined(__aarch64__) /* ARM 64-bit */
143 | #define NANBOX_64 1
144 | #else
145 | #define NANBOX_32 1
146 | #endif
147 |
148 | /* Big endian? (Mostly equivalent to how WebKit does it) */
149 | #if defined(__MIPSEB__) /* MIPS 32-bit */ \
150 | || defined(__ppc__) || defined(__PPC__) /* CPU(PPC) - PowerPC 32-bit */ \
151 | || defined(__powerpc__) || defined(__powerpc) || defined(__POWERPC__) \
152 | || defined(_M_PPC) || defined(__PPC) \
153 | || defined(__ppc64__) || defined(__PPC64__) /* PowerPC 64-bit */ \
154 | || defined(__sparc) /* Sparc 32bit */ \
155 | || defined(__sparc__) /* Sparc 64-bit */ \
156 | || defined(__s390x__) /* S390 64-bit */ \
157 | || defined(__s390__) /* S390 32-bit */ \
158 | || defined(__ARMEB__) /* ARM big endian */ \
159 | || ((defined(__CC_ARM) || defined(__ARMCC__)) /* ARM RealView compiler */ \
160 | && defined(__BIG_ENDIAN))
161 | #define NANBOX_BIG_ENDIAN 1
162 | #endif
163 |
164 | /*
165 | * In 32-bit mode, the double is unmasked. In 64-bit mode, the pointer is
166 | * unmasked.
167 | */
168 | union NANBOX_NAME(_u) {
169 | uint64_t as_int64;
170 | #if defined(NANBOX_64)
171 | NANBOX_POINTER_TYPE pointer;
172 | #endif
173 | double as_double;
174 | #ifdef NANBOX_BIG_ENDIAN
175 | struct {
176 | uint32_t tag;
177 | uint32_t payload;
178 | } as_bits;
179 | #else
180 | struct {
181 | uint32_t payload;
182 | uint32_t tag;
183 | } as_bits;
184 | #endif
185 | };
186 |
187 | #undef NANBOX_T
188 | #define NANBOX_T NANBOX_NAME(_t)
189 | typedef union NANBOX_NAME(_u) NANBOX_T;
190 |
191 | #if defined(NANBOX_64)
192 |
193 | /*
194 | * 64-bit platforms
195 | *
196 | * This range of NaN space is represented by 64-bit numbers beginning with
197 | * 13 bits of ones. That is, the first 16 bits are 0xFFF8 or higher. In
198 | * practice, no higher value is used for NaNs. We rely on the fact that no
199 | * valid double-precision numbers will be "higher" than this (compared as an
200 | * uint64).
201 | *
202 | * By adding 7 * 2^48 as a 64-bit integer addition, we shift the first 16 bits
203 | * in the doubles from the range 0000..FFF8 to the range 0007..FFFF. Doubles
204 | * are decoded by reversing this operation, i.e. subtracting the same number.
205 | *
206 | * The top 16-bits denote the type of the encoded nanbox_t:
207 | *
208 | * Pointer { 0000:PPPP:PPPP:PPPP
209 | * / 0001:xxxx:xxxx:xxxx
210 | * Aux. { ...
211 | * \ 0005:xxxx:xxxx:xxxx
212 | * Integer { 0006:0000:IIII:IIII
213 | * / 0007:****:****:****
214 | * Double { ...
215 | * \ FFFF:****:****:****
216 | *
217 | * 32-bit signed integers are marked with the 16-bit tag 0x0006.
218 | *
219 | * The tags 0x0001..0x0005 can be used to store five additional types of
220 | * auxiliary data, each storing up to 48 bits of payload.
221 | *
222 | * The tag 0x0000 denotes a pointer, or another form of tagged immediate.
223 | * Boolean, 'null', 'undefined' and 'deleted' are represented by specific,
224 | * invalid pointer values:
225 | *
226 | * False: 0x06
227 | * True: 0x07
228 | * Undefined: 0x0a
229 | * Null: 0x02
230 | * Empty: 0x00
231 | * Deleted: 0x05
232 | *
233 | * All of these except Empty have bit 0 or bit 1 set.
234 | */
235 |
236 | #define NANBOX_VALUE_EMPTY 0x0llu
237 | #define NANBOX_VALUE_DELETED 0x5llu
238 |
239 | // Booleans have bits 1 and 2 set. True also has bit 0 set.
240 | #define NANBOX_VALUE_FALSE 0x06llu
241 | #define NANBOX_VALUE_TRUE 0x07llu
242 |
243 | // Null and undefined both have bit 1 set. Undefined also has bit 3 set.
244 | #define NANBOX_VALUE_UNDEFINED 0x0Allu
245 | #define NANBOX_VALUE_NULL 0x02llu
246 |
247 | // This value is 7 * 2^48, used to encode doubles such that the encoded value
248 | // will begin with a 16-bit pattern within the range 0x0007..0xFFFF.
249 | #define NANBOX_DOUBLE_ENCODE_OFFSET 0x0007000000000000llu
250 | // If the first 16 bits are 0x0006, this indicates an integer number. Any
251 | // larger value is a double, so we can use >= to check for either integer or
252 | // double.
253 | #define NANBOX_MIN_NUMBER 0x0006000000000000llu
254 | #define NANBOX_HIGH16_TAG 0xffff000000000000llu
255 |
256 | // A total of 5 * 2^48 auxiliary values can be stored in the 64-bit integer
257 | // range NANBOX_MIN_AUX..NANBOX_MAX_AUX.
258 | #define NANBOX_MIN_AUX_TAG 0x00010000
259 | #define NANBOX_MAX_AUX_TAG 0x0005ffff
260 | #define NANBOX_MIN_AUX 0x0001000000000000llu
261 | #define NANBOX_MAX_AUX 0x0005ffffffffffffllu
262 |
263 | // NANBOX_MASK_POINTER defines the allowed non-zero bits in a pointer.
264 | #define NANBOX_MASK_POINTER 0x0000fffffffffffcllu
265 |
266 | // The 'empty' value is guaranteed to consist of a repeated single byte,
267 | // so that it should be easy to memset an array of nanboxes to 'empty' using
268 | // NANBOX_EMPTY_BYTE as the value for every byte.
269 | #define NANBOX_EMPTY_BYTE 0x0
270 |
271 | // Define bool nanbox_is_xxx(NANBOX_T val) and NANBOX_T nanbox_xxx(void)
272 | // with empty, deleted, true, false, undefined and null substituted for xxx.
273 | #define NANBOX_IMMEDIATE_VALUE_FUNCTIONS(NAME, VALUE) \
274 | static inline NANBOX_T NANBOX_NAME(_##NAME)(void) { \
275 | NANBOX_T val; \
276 | val.as_int64 = VALUE; \
277 | return val; \
278 | } \
279 | static inline bool NANBOX_NAME(_is_##NAME)(NANBOX_T val) { \
280 | return val.as_int64 == VALUE; \
281 | }
282 | NANBOX_IMMEDIATE_VALUE_FUNCTIONS(empty, NANBOX_VALUE_EMPTY)
283 | NANBOX_IMMEDIATE_VALUE_FUNCTIONS(deleted, NANBOX_VALUE_DELETED)
284 | NANBOX_IMMEDIATE_VALUE_FUNCTIONS(false, NANBOX_VALUE_FALSE)
285 | NANBOX_IMMEDIATE_VALUE_FUNCTIONS(true, NANBOX_VALUE_TRUE)
286 | NANBOX_IMMEDIATE_VALUE_FUNCTIONS(undefined, NANBOX_VALUE_UNDEFINED)
287 | NANBOX_IMMEDIATE_VALUE_FUNCTIONS(null, NANBOX_VALUE_NULL)
288 |
289 | static inline bool NANBOX_NAME(_is_undefined_or_null)(NANBOX_T val) {
290 | // Undefined and null are the same if we remove the 'undefined' bit.
291 | return (val.as_int64 & ~8) == NANBOX_VALUE_NULL;
292 | }
293 |
294 | static inline bool NANBOX_NAME(_is_boolean)(NANBOX_T val) {
295 | // True and false are the same if we remove the 'true' bit.
296 | return (val.as_int64 & ~1) == NANBOX_VALUE_FALSE;
297 | }
298 | static inline bool NANBOX_NAME(_to_boolean)(NANBOX_T val) {
299 | assert(NANBOX_NAME(_is_boolean)(val));
300 | return val.as_int64 & 1;
301 | }
302 | static inline NANBOX_T NANBOX_NAME(_from_boolean)(bool b) {
303 | NANBOX_T val;
304 | val.as_int64 = b ? NANBOX_VALUE_TRUE : NANBOX_VALUE_FALSE;
305 | return val;
306 | }
307 |
308 | /* true if val is a double or an int */
309 | static inline bool NANBOX_NAME(_is_number)(NANBOX_T val) {
310 | return val.as_int64 >= NANBOX_MIN_NUMBER;
311 | }
312 |
313 | static inline bool NANBOX_NAME(_is_int)(NANBOX_T val) {
314 | return (val.as_int64 & NANBOX_HIGH16_TAG) == NANBOX_MIN_NUMBER;
315 | }
316 | static inline NANBOX_T NANBOX_NAME(_from_int)(int32_t i) {
317 | NANBOX_T val;
318 | val.as_int64 = NANBOX_MIN_NUMBER | (uint32_t)i;
319 | return val;
320 | }
321 | static inline int32_t NANBOX_NAME(_to_int)(NANBOX_T val) {
322 | assert(NANBOX_NAME(_is_int)(val));
323 | return (int32_t)val.as_int64;
324 | }
325 |
326 | static inline bool NANBOX_NAME(_is_double)(NANBOX_T val) {
327 | return NANBOX_NAME(_is_number)(val) && !NANBOX_NAME(_is_int)(val);
328 | }
329 | static inline NANBOX_T NANBOX_NAME(_from_double)(double d) {
330 | NANBOX_T val;
331 | val.as_double = d;
332 | val.as_int64 += NANBOX_DOUBLE_ENCODE_OFFSET;
333 | assert(NANBOX_NAME(_is_double)(val));
334 | return val;
335 | }
336 | static inline double NANBOX_NAME(_to_double)(NANBOX_T val) {
337 | assert(NANBOX_NAME(_is_double)(val));
338 | val.as_int64 -= NANBOX_DOUBLE_ENCODE_OFFSET;
339 | return val.as_double;
340 | }
341 |
342 | static inline bool NANBOX_NAME(_is_pointer)(NANBOX_T val) {
343 | return !(val.as_int64 & ~NANBOX_MASK_POINTER) && val.as_int64;
344 | }
345 | static inline NANBOX_POINTER_TYPE NANBOX_NAME(_to_pointer)(NANBOX_T val) {
346 | assert(NANBOX_NAME(_is_pointer)(val));
347 | return val.pointer;
348 | }
349 | static inline NANBOX_T NANBOX_NAME(_from_pointer)(NANBOX_POINTER_TYPE pointer) {
350 | NANBOX_T val;
351 | val.pointer = pointer;
352 | assert(NANBOX_NAME(_is_pointer)(val));
353 | return val;
354 | }
355 |
356 | static inline bool NANBOX_NAME(_is_aux)(NANBOX_T val) {
357 | return val.as_int64 >= NANBOX_MIN_AUX &&
358 | val.as_int64 <= NANBOX_MAX_AUX;
359 | }
360 |
361 | /* end if NANBOX_64 */
362 | #elif defined(NANBOX_32)
363 |
364 | /*
365 | * On 32-bit platforms we use the following NaN-boxing scheme:
366 | *
367 | * For values that do not contain a double value, the high 32 bits contain the
368 | * tag values listed below, which all correspond to NaN-space. When the tag is
369 | * 'pointer', 'integer' or 'boolean', their values (the 'payload') are stored
370 | * in the lower 32 bits. In the case of all other tags the payload is 0.
371 | */
372 | #define NANBOX_MAX_DOUBLE_TAG 0xfff80000
373 | #define NANBOX_INT_TAG 0xfff80001
374 | #define NANBOX_MIN_AUX_TAG 0xfff90000
375 | #define NANBOX_MAX_AUX_TAG 0xfffdffff
376 | #define NANBOX_POINTER_TAG 0xfffffffa
377 | #define NANBOX_BOOLEAN_TAG 0xfffffffb
378 | #define NANBOX_UNDEFINED_TAG 0xfffffffc
379 | #define NANBOX_NULL_TAG 0xfffffffd
380 | #define NANBOX_DELETED_VALUE_TAG 0xfffffffe
381 | #define NANBOX_EMPTY_VALUE_TAG 0xffffffff
382 |
383 | // The 'empty' value is guaranteed to consist of a repeated single byte,
384 | // so that it should be easy to memset an array of nanboxes to 'empty' using
385 | // NANBOX_EMPTY_BYTE as the value for every byte.
386 | #define NANBOX_EMPTY_BYTE 0xff
387 |
388 | /* The minimum and maximum uint64_t values of the auxiliary range */
389 | #define NANBOX_MIN_AUX 0xfff9000000000000llu
390 | #define NANBOX_MAX_AUX 0xfffdffffffffffffllu
391 |
392 | // Define nanbox_xxx and nanbox_is_xxx for deleted, undefined and null.
393 | #define NANBOX_IMMEDIATE_VALUE_FUNCTIONS(NAME, TAG) \
394 | static inline NANBOX_T NANBOX_NAME(_##NAME)(void) { \
395 | NANBOX_T val; \
396 | val.as_bits.tag = TAG; \
397 | val.as_bits.payload = 0; \
398 | return val; \
399 | } \
400 | static inline bool NANBOX_NAME(_is_##NAME)(NANBOX_T val) { \
401 | return val.as_bits.tag == TAG; \
402 | }
403 | NANBOX_IMMEDIATE_VALUE_FUNCTIONS(deleted, NANBOX_DELETED_VALUE_TAG)
404 | NANBOX_IMMEDIATE_VALUE_FUNCTIONS(undefined, NANBOX_UNDEFINED_TAG)
405 | NANBOX_IMMEDIATE_VALUE_FUNCTIONS(null, NANBOX_NULL_TAG)
406 |
407 | // The undefined and null tags differ only in one bit
408 | static inline bool NANBOX_NAME(_is_undefined_or_null)(NANBOX_T val) {
409 | return (val.as_bits.tag & ~1) == NANBOX_UNDEFINED_TAG;
410 | }
411 |
412 | static inline NANBOX_T NANBOX_NAME(_empty)(void) {
413 | NANBOX_T val;
414 | val.as_int64 = 0xffffffffffffffffllu;
415 | return val;
416 | }
417 | static inline bool NANBOX_NAME(_is_empty)(NANBOX_T val) {
418 | return val.as_bits.tag == 0xffffffff;
419 | }
420 |
421 | /* Returns true if the value is in auxiliary space */
422 | static inline bool NANBOX_NAME(_is_aux)(NANBOX_T val) {
423 | return val.as_bits.tag >= NANBOX_MIN_AUX_TAG &&
424 | val.as_bits.tag <= NANBOX_MAX_AUX_TAG;
425 | }
426 |
427 | // Define nanbox_is_yyy, nanbox_to_yyy and nanbox_from_yyy for
428 | // boolean, int, pointer and aux1-aux5
429 | #define NANBOX_TAGGED_VALUE_FUNCTIONS(NAME, TYPE, TAG) \
430 | static inline bool NANBOX_NAME(_is_##NAME)(NANBOX_T val) { \
431 | return val.as_bits.tag == TAG; \
432 | } \
433 | static inline TYPE NANBOX_NAME(_to_##NAME)(NANBOX_T val) { \
434 | assert(val.as_bits.tag == TAG); \
435 | return (TYPE)val.as_bits.payload; \
436 | } \
437 | static inline NANBOX_T NANBOX_NAME(_from_##NAME)(TYPE a) { \
438 | NANBOX_T val; \
439 | val.as_bits.tag = TAG; \
440 | val.as_bits.payload = (int32_t)a; \
441 | return val; \
442 | }
443 |
444 | NANBOX_TAGGED_VALUE_FUNCTIONS(boolean, bool, NANBOX_BOOLEAN_TAG)
445 | NANBOX_TAGGED_VALUE_FUNCTIONS(int, int32_t, NANBOX_INT_TAG)
446 | NANBOX_TAGGED_VALUE_FUNCTIONS(pointer, NANBOX_POINTER_TYPE, NANBOX_POINTER_TAG)
447 |
448 | static inline NANBOX_T NANBOX_NAME(_true)(void) {
449 | return NANBOX_NAME(_from_boolean)(true);
450 | }
451 | static inline NANBOX_T NANBOX_NAME(_false)(void) {
452 | return NANBOX_NAME(_from_boolean)(false);
453 | }
454 | static inline bool NANBOX_NAME(_is_true)(NANBOX_T val) {
455 | return val.as_bits.tag == NANBOX_BOOLEAN_TAG && val.as_bits.payload;
456 | }
457 | static inline bool NANBOX_NAME(_is_false)(NANBOX_T val) {
458 | return val.as_bits.tag == NANBOX_BOOLEAN_TAG && !val.as_bits.payload;
459 | }
460 |
461 | static inline bool NANBOX_NAME(_is_double)(NANBOX_T val) {
462 | return val.as_bits.tag < NANBOX_INT_TAG;
463 | }
464 | // is number = is double or is int
465 | static inline bool NANBOX_NAME(_is_number)(NANBOX_T val) {
466 | return val.as_bits.tag <= NANBOX_INT_TAG;
467 | }
468 |
469 | static inline NANBOX_T NANBOX_NAME(_from_double)(double d) {
470 | NANBOX_T val;
471 | val.as_double = d;
472 | assert(NANBOX_NAME(_is_double)(val) &&
473 | val.as_bits.tag <= NANBOX_MAX_DOUBLE_TAG);
474 | return val;
475 | }
476 | static inline double NANBOX_NAME(_to_double)(NANBOX_T val) {
477 | assert(NANBOX_NAME(_is_double)(val));
478 | return val.as_double;
479 | }
480 |
481 | #endif /* elif NANBOX_32 */
482 |
483 | /*
484 | * Representation-independent functions
485 | */
486 |
487 | static inline double NANBOX_NAME(_to_number)(NANBOX_T val) {
488 | assert(NANBOX_NAME(_is_number)(val));
489 | return NANBOX_NAME(_is_int)(val) ? NANBOX_NAME(_to_int)(val)
490 | : NANBOX_NAME(_to_double)(val);
491 | }
492 |
493 | #endif /* NANBOX_H */
494 |
--------------------------------------------------------------------------------
/nanbox_shortstring.h:
--------------------------------------------------------------------------------
1 | #ifndef NANBOX_SHORTSTRING_H
2 | #define NANBOX_SHORTSTRING_H
3 | /*
4 | * Short strings
5 | * -------------
6 | * Strings of up to 6 bytes can be stored in a NANBOX_T in so-called 'auxiliary
7 | * space'. The space used is NANBOX_MIN_AUX..(NANBOX_MIN_AUX + 3 * 2^48 - 1).
8 | */
9 |
10 | #include "nanbox.h"
11 |
12 | static inline bool NANBOX_NAME(_is_shortstring)(NANBOX_T val) {
13 | return val.as_bits.tag >= NANBOX_MIN_AUX_TAG &&
14 | val.as_bits.tag <= NANBOX_MIN_AUX_TAG + 0x0002ffff;
15 | }
16 | static inline char* NANBOX_NAME(_shortstring_chars)(NANBOX_T* val) {
17 | assert(NANBOX_NAME(_is_shortstring)(*val));
18 | #ifdef NANBOX_BIG_ENDIAN
19 | if ((val->as_bits.tag & 0xffff0000) == NANBOX_MIN_AUX_TAG)
20 | return (char*)val + 4; /* skip tag and length */
21 | else
22 | return (char*)val + 2; /* skip tag */
23 | #else
24 | return (char*)val;
25 | #endif
26 | }
27 |
28 | static inline unsigned NANBOX_NAME(_shortstring_length)(NANBOX_T val) {
29 | assert(NANBOX_NAME(_is_shortstring)(val));
30 | if (val.as_bits.tag <= NANBOX_MIN_AUX_TAG + 4)
31 | return val.as_bits.tag - NANBOX_MIN_AUX_TAG;
32 | else
33 | return ((val.as_bits.tag - NANBOX_MIN_AUX_TAG) >> 16) + 4;
34 | }
35 |
36 | // creates a short string of `length` bytes with undefined contents
37 | static inline NANBOX_T NANBOX_NAME(_shortstring_create_undef)(unsigned length) {
38 | NANBOX_T val;
39 | assert(length <= 6);
40 | if (length <= 4)
41 | val.as_bits.tag = NANBOX_MIN_AUX_TAG + length;
42 | else
43 | val.as_bits.tag = NANBOX_MIN_AUX_TAG + ((length - 4) << 16);
44 | val.as_bits.payload = 0;
45 | return val;
46 | }
47 |
48 | // copies length bytes of chars. (nul bytes are copied like any other byte)
49 | static inline NANBOX_T
50 | NANBOX_NAME(_shortstring_create)(const char *chars, unsigned length) {
51 | NANBOX_T val = NANBOX_NAME(_shortstring_create_undef)(length);
52 | memcpy(NANBOX_NAME(_shortstring_chars)(&val), chars, length);
53 | assert(NANBOX_NAME(_is_shortstring)(val));
54 | return val;
55 | }
56 | #endif
57 |
--------------------------------------------------------------------------------
/shortstring_demo.c:
--------------------------------------------------------------------------------
1 | #include "nanbox_shortstring.h"
2 | #include <stdio.h>
3 | #include <string.h>
4 |
5 | int main() {
6 | printf("Enter short strings of up to 6 chars to dump, q to quit.\n");
7 | while (1) {
8 | char buf[7];
9 | size_t len;
10 | printf("Short string --> ");
11 | if (scanf("%6s", buf) == EOF)
12 | break;
13 | if (!strcmp(buf, "q"))
14 | break;
15 | len = strlen(buf);
16 | nanbox_t val = nanbox_shortstring_create(buf, len);
17 | printf("0x%016llx \"%.*s\" (length %u)\n",
18 | (unsigned long long)val.as_int64,
19 | (int)len, nanbox_shortstring_chars(&val),
20 | nanbox_shortstring_length(val));
21 | }
22 | printf("\n");
23 | return 0;
24 | }
25 |
--------------------------------------------------------------------------------
/test.c:
--------------------------------------------------------------------------------
1 | #include <assert.h>
2 | #include <math.h>
3 | #include <string.h>
4 |
5 | #include "nanbox.h"
6 |
7 | // This macro stores a value VALUE of type TYPE in a nanbox, checks the type,
8 | // converts back and checks that we got the value back. It also checks that the
9 | // nanbox is not of any other type.
10 | //
11 | // The only NaN expression that is possible to test with this macro is
12 | // 0.0/0.0. (There is some special logic for NaN, because NaN != NaN.)
13 | #define TO_NANBOX_AND_BACK(TYPE, VALUE) do { \
14 | nanbox_t x = nanbox_from_##TYPE(VALUE); \
15 | assert(nanbox_is_##TYPE(x)); \
16 | /* decode and test == to the original, except for NaN */ \
17 | if (!strcmp(#TYPE, "double") && !strcmp(#VALUE, "0.0/0.0")) \
18 | assert(VALUE != nanbox_to_##TYPE(x)); \
19 | else \
20 | assert(VALUE == nanbox_to_##TYPE(x)); \
21 | assert(nanbox_is_double(x) == !strcmp(#TYPE, "double")); \
22 | assert(nanbox_is_int(x) == !strcmp(#TYPE, "int")); \
23 | assert(nanbox_is_pointer(x) == !strcmp(#TYPE, "pointer")); \
24 | assert(nanbox_is_boolean(x) == !strcmp(#TYPE, "boolean")); \
25 | assert(nanbox_is_number(x) == (!strcmp(#TYPE, "double") || \
26 | !strcmp(#TYPE, "int"))); \
27 | assert(!nanbox_is_null(x)); \
28 | assert(!nanbox_is_undefined(x)); \
29 | assert(!nanbox_is_empty(x)); \
30 | assert(!nanbox_is_deleted(x)); \
31 | assert(!nanbox_is_aux(x)); \
32 | assert(nanbox_is_true(x) == !strcmp(#VALUE, "true")); \
33 | assert(nanbox_is_false(x) == !strcmp(#VALUE, "false")); \
34 | } while(0)
35 |
36 | // Use this to create and check a nanbox of null, undefined, empty, deleted,
37 | // true or false. It tests that it is of the correct type and no other type.
38 | #define TO_NANBOX_AND_CHECK(VALUE) do { \
39 | nanbox_t x = nanbox_##VALUE(); \
40 | assert(!nanbox_is_double(x)); \
41 | assert(!nanbox_is_int(x)); \
42 | assert(!nanbox_is_pointer(x)); \
43 | assert(!nanbox_is_number(x)); \
44 | assert(!nanbox_is_aux(x)); \
45 | assert(nanbox_is_boolean(x) == (!strcmp(#VALUE, "true") || \
46 | !strcmp(#VALUE, "false"))); \
47 | assert(nanbox_is_undefined_or_null(x) == \
48 | (!strcmp(#VALUE, "undefined") || !strcmp(#VALUE, "null"))); \
49 | assert(nanbox_is_null(x) == !strcmp(#VALUE, "null")); \
50 | assert(nanbox_is_undefined(x) == !strcmp(#VALUE, "undefined")); \
51 | assert(nanbox_is_empty(x) == !strcmp(#VALUE, "empty")); \
52 | assert(nanbox_is_deleted(x) == !strcmp(#VALUE, "deleted")); \
53 | } while(0)
54 |
55 | // Defined below. Called from main.
56 | void test_nan(void);
57 |
58 | int main() {
59 | // Size should be 64 bits (8 bytes)
60 | assert(sizeof(nanbox_t) == 8);
61 |
62 | // Test storing various doubles, including NaN and infinity.
63 | TO_NANBOX_AND_BACK(double, -0.0);
64 | TO_NANBOX_AND_BACK(double, 3.14);
65 | TO_NANBOX_AND_BACK(double, 1.0/0.0);
66 | TO_NANBOX_AND_BACK(double, -1.0/0.0);
67 | TO_NANBOX_AND_BACK(double, 0.0/0.0);
68 |
69 | // Test storing int, pointer and boolean
70 | TO_NANBOX_AND_BACK(int, 42);
71 | TO_NANBOX_AND_BACK(pointer, &x);
72 | TO_NANBOX_AND_BACK(boolean, true);
73 | TO_NANBOX_AND_BACK(boolean, false);
74 |
75 | // The remaining types/values
76 | //TO_NANBOX_AND_CHECK(null);
77 | //TO_NANBOX_AND_CHECK(undefined);
78 | TO_NANBOX_AND_CHECK(empty);
79 | TO_NANBOX_AND_CHECK(deleted);
80 | TO_NANBOX_AND_CHECK(true);
81 | TO_NANBOX_AND_CHECK(false);
82 |
83 | test_nan();
84 |
85 | return 0;
86 | }
87 |
88 | // A macro to check that a double is a canonical NaN, i.e. one that we accept.
89 | // Also, nanboxes it and checks that it is identified as a double.
90 | #define ASSERT_CANNONICAL_NAN(VALUE) do { \
91 | double d = VALUE; \
92 | uint64_t n; memcpy(&n, &d, sizeof n); /* avoids strict-aliasing issues */ \
93 | assert((n | 0x8000000000000000llu) == 0xfff8000000000000llu); \
94 | assert(nanbox_is_double(nanbox_from_double(VALUE))); \
95 | } while(0)
96 |
97 | void test_nan(void) {
98 | double nan = 0.0/0.0, inf = 1.0/0.0, ninf = -1.0/0.0;
99 | assert(nan != nan);
100 | ASSERT_CANNONICAL_NAN(0.0/0.0);
101 | ASSERT_CANNONICAL_NAN(nan);
102 | ASSERT_CANNONICAL_NAN(nan + 42);
103 | ASSERT_CANNONICAL_NAN(-inf * nan);
104 |
105 | ASSERT_CANNONICAL_NAN(inf/inf);
106 | ASSERT_CANNONICAL_NAN(ninf/inf);
107 | ASSERT_CANNONICAL_NAN(0 * inf);
108 | ASSERT_CANNONICAL_NAN(0 * ninf);
109 | ASSERT_CANNONICAL_NAN(inf * 0);
110 | ASSERT_CANNONICAL_NAN(inf + ninf);
111 | ASSERT_CANNONICAL_NAN(ninf + inf);
112 |
113 | ASSERT_CANNONICAL_NAN(pow(-1.0, 3.14));
114 | ASSERT_CANNONICAL_NAN(sqrt(-1.0));
115 | ASSERT_CANNONICAL_NAN(log(-1.0));
116 | ASSERT_CANNONICAL_NAN(asin(2.0));
117 | ASSERT_CANNONICAL_NAN(acos(2.0));
118 | }
119 |
--------------------------------------------------------------------------------