Are you a programmer who wants to learn more about bits and bytes? Or do you just need a refresher for your project? This cheatsheet is for you.
Background
Looking at your computer screen, what do you see? Text, colours, images, videos and much more. Everything you see, everything you store on your computer, every button you press is a stream of ones and zeros. Why? How does it work?
Computer parts are able to store data via tiny electronic components called flip-flops. Each can be set to either of two states: “one” or “zero.” Such a component stores a single value we call a bit.
To store values other than ones and zeroes, a series of bits can be grouped. If one bit can have two different states, a group of two bits can have four. A group of four bits can have up to sixteen unique states. The number of states grows exponentially, doubling with every bit that is added to the group: s = 2^n, where s is the number of states and n the number of bits in the group.
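The relation between group size and number of states can be checked with a short Python sketch. The helper name `state_count` is made up for illustration.

```python
def state_count(bits: int) -> int:
    """Return how many distinct states a group of `bits` bits can hold."""
    return 2 ** bits


for n in (1, 2, 4, 8):
    print(n, "bits ->", state_count(n), "states")
```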
The de facto standard is to call a group of 8 bits a byte. The more explicit term for an 8-bit group is octet.
Binary is the purest representation of data on computers, but it isn’t easy to understand what the data represents. How do we get from binary to other values?
Representation
Many typed programming languages have datatypes like integers, floats, strings, etc. These datatypes tell the computer how to interpret binary data. Instead of modifying bits and bytes to add two integers, we can use arithmetic operators, like a plus (+), and the computer will know what to do at the binary level. Dynamically typed languages do the same, but also remove the need to declare which variable has which type.
A common datatype is the unsigned integer (uint). When an integer is signed, the first bit indicates whether it is positive or negative. When it’s unsigned, the datatype can only store non-negative integers. So a uint is an integer that must be equal to or greater than 0.
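To see how the same bits give different values depending on the datatype, here is a small Python sketch. It relies on `int.from_bytes`, which reads signed values as two's complement; that is the scheme most hardware uses, and the first bit still tells whether the value is negative. The variable names are illustrative.

```python
raw = bytes([0b11111111])  # one byte with every bit set

# Same bits, two interpretations
as_unsigned = int.from_bytes(raw, byteorder="big", signed=False)
as_signed = int.from_bytes(raw, byteorder="big", signed=True)

print(as_unsigned)  # 255
print(as_signed)    # -1
```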
Let’s have a look at the binary representation of such an integer, in 8 bits. A uint starts at 0. Incrementing is very similar to our decimal system:
1. Increment the least significant digit
2. If the least significant digit exceeds its maximum value, reset it and increment the digit one place to the left
// No bits set => zero
00000000 = 0
// Increment the least significant bit
00000001 = 1
// The least significant bit exceeds its maximum, so we reset it and increment the bit to its left
00000010 = 2
00000011 = 3
00101010 = 42
11111111 = 255
Understanding uint is quite valuable, as many programming languages use the datatype as the default representation for binary data.
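The table of 8-bit values above can be reproduced in Python, as a minimal sketch using the built-in `format` and `int` functions. The variable name `parsed` is illustrative.

```python
# Print each value as an 8-bit binary string, padded with leading zeros
for value in (0, 1, 2, 3, 42, 255):
    print(format(value, "08b"), "=", value)

# Parsing goes the other way: base-2 text back to an integer
parsed = int("00101010", 2)
print(parsed)  # 42
```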
At this point you might wonder why you’re reading this page. Why should you understand bits and bytes? How can you benefit from this knowledge when programming in higher-level languages?
Imagine we’re building a web application that needs data from a webserver. The conventional communication format is JavaScript Object Notation (JSON). Let’s say we’re sending an array of 8 random integers ranging from 0 to 255:
Because the data is sent as a string with a specific format, we’ll need 8 bits per character, assuming it’s UTF-8 encoded (every character in such a string is ASCII, which UTF-8 stores in a single byte). Ignoring whitespace, the JSON string above is 31 characters, making it 248 bits.
We can send the exact same data in binary, but far more efficiently. Instead of returning a formatted string, because that’s what JSON is, we can send the data as one binary blob. For numbers ranging from 0 to 255, we need 8 bits to represent each. Because each number has a fixed bit length, we can just concatenate them. The complete blob will be 64 bits long, saving just under 75% of the data!
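The size comparison can be sketched in Python. The sample array below is a hypothetical stand-in, chosen so that its compact JSON form is 31 characters long; all variable names are illustrative.

```python
import json

values = [142, 24, 255, 200, 13, 178, 101, 231]  # hypothetical sample data

# JSON without whitespace, encoded as UTF-8 (one byte per character here)
as_json = json.dumps(values, separators=(",", ":")).encode("utf-8")
# Binary blob: one byte per number, simply concatenated
as_blob = bytes(values)

json_bits = len(as_json) * 8
blob_bits = len(as_blob) * 8
saving = 1 - blob_bits / json_bits

print(json_bits, blob_bits, f"{saving:.1%}")  # 248 64 74.2%
```

The receiver can turn the blob back into numbers with `list(as_blob)`, but only because both sides agree beforehand that every number takes exactly one byte.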
The strength of JSON isn’t size, but flexibility. Because it’s a formatted string, its length and contents can vary freely. The strength of binary is its size. It all depends on your use case, but a bandwidth reduction of almost 75% is worth considering.
Operators
Bitwise operators are operators that allow you to modify data at the bit level. Welcome to Boolean algebra.
In the following examples, we will define binary data using “binary literals.” This means they are prefixed with 0b, which is common in many programming languages. For more information, see notes.
NOT
Flipping, or negating, bits can be done using the NOT (~) operator. The operator toggles all the bits.
~0b01 = 10
~0b11 = 00
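In Python, integers have no fixed width, so `~0b01` evaluates to -2 rather than a two-bit pattern. A minimal sketch that reproduces the two-bit results above masks the result to the width of interest; `WIDTH`, `MASK` and `bit_not` are illustrative names.

```python
WIDTH = 2                # number of bits we care about (an assumption for this example)
MASK = (1 << WIDTH) - 1  # 0b11


def bit_not(value: int) -> int:
    """Flip the lowest WIDTH bits of value and drop everything above them."""
    return ~value & MASK


print(format(bit_not(0b01), "02b"))  # 10
print(format(bit_not(0b11), "02b"))  # 00
```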
AND
The AND (&) operator returns 1 for each bit only if the corresponding bits of both operands are 1.
0b01 & 0b11 = 01
0b01 & 0b00 = 00
OR
The OR (|) operator returns 1 for each bit where the corresponding bit of either operand is 1.
XOR
Similar to the OR operator, the XOR (^) operator, short for exclusive OR, returns 1 where exactly one of the corresponding bits of the operands is 1, and 0 where both bits are equal.
It could be considered a shorthand of the following:
a = 0b11
b = 0b10
(a | b) & ~(a & b) = 01
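The three two-operand operators, and the XOR longhand above, can be tried out in Python. This is a small sketch; the name `longhand` is illustrative.

```python
a = 0b11
b = 0b10

print(format(a & b, "02b"))  # 10
print(format(a | b, "02b"))  # 11
print(format(a ^ b, "02b"))  # 01

# XOR written out with OR, AND and NOT
longhand = (a | b) & ~(a & b)
print(longhand == a ^ b)  # True
```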
Shift
Bit shifting is the act of shifting a set of bits to the left or the right.
To shift bits to the left, use <<. Zero bits are added on the right-hand side.
0b1001 << 2 = 100100
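The same left shift in Python, as a brief sketch with illustrative variable names. Each shift by one position doubles the value, so shifting by two multiplies it by four.

```python
value = 0b1001        # 9
shifted = value << 2  # two zero bits are appended on the right

print(format(shifted, "b"))  # 100100
print(shifted)               # 36
```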
To shift bits the other way, use >>. This will discard the right-hand bits.
Note that this operation retains the sign bit (the first bit) of signed integers. This means that negative integers stay negative.
0b1001 >> 2 = 10
When shifting bits to the right, notice that the number of bits decreases? A zero-fill right shift (>>>) also adds zero bits on the left-hand side, so the number of bits is unchanged.
Unlike a regular right shift, the zero-fill right shift also moves the sign bit in a signed integer, which is often undesired.
0b1001 >>> 2 = 0010
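Python has no >>> operator, and its >> keeps the sign. A zero-fill right shift can be approximated by first masking the value to a fixed width, as in this sketch. The 8-bit `WIDTH` and the helper name `zero_fill_right_shift` are assumptions made for the example.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1  # 0b11111111


def zero_fill_right_shift(value: int, n: int) -> int:
    """Treat value as a WIDTH-bit pattern and shift in zeros from the left."""
    return (value & MASK) >> n


print(0b1001 >> 2)                   # 2, i.e. 0b10
print(-8 >> 1)                       # -4, the sign is kept
print(zero_fill_right_shift(-8, 1))  # 124, i.e. 0b01111100
```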
Practical
Read a bit at a specific position
First, the set of bits must be shifted to the right until the bit of interest is all the way on the right. To discard all other bits, we can use the AND operator with a so-called bitmask.
bitmask = 0b1

// the bit of interest is moved all the way to the right, then all other bits are cancelled out with the bitmask
(0b1101 >> 2) & bitmask = 1
(0b1101 >> 1) & bitmask = 0
The bitmask determines how many bits are returned, so to get two bits, a two-bit bitmask is required.
bitmask = 0b11
(0b1101 >> 2) & bitmask = 11
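Shifting and masking can be wrapped in one small Python helper. This is a sketch; the function name `read_bits` and its parameters are illustrative, with position 0 meaning the rightmost bit.

```python
def read_bits(value: int, position: int, width: int = 1) -> int:
    """Return `width` bits of `value`, starting at bit `position` (0 = rightmost)."""
    mask = (1 << width) - 1  # width=1 -> 0b1, width=2 -> 0b11, ...
    return (value >> position) & mask


print(read_bits(0b1101, 2))     # 1
print(read_bits(0b1101, 1))     # 0
print(read_bits(0b1101, 2, 2))  # 3, i.e. 0b11
```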
Set a bit
To set a specific bit to 1, you can use the OR operator. First, shift a 1 to the position you wish to set; the OR operator does the rest.
byte = 0b0000
byte | (0b1 << 2) = 0100
To set a specific bit to 0, use the AND operator with an inverted bitmask: shift a 1 to the position you wish to clear, flip every bit with NOT, and AND the result with the value.
byte = 0b1111
byte & ~(0b1 << 2) = 1011
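Both operations as a minimal Python sketch; `set_bit` and `clear_bit` are illustrative helper names. Note that ANDing with `0b0 << 2` would wipe the whole value, because a shifted zero is still zero, which is why the mask has to be an inverted 1.

```python
def set_bit(value: int, n: int) -> int:
    """Return value with bit n (0 = rightmost) set to 1."""
    return value | (1 << n)


def clear_bit(value: int, n: int) -> int:
    """Return value with bit n (0 = rightmost) set to 0."""
    return value & ~(1 << n)


print(format(set_bit(0b0000, 2), "04b"))    # 0100
print(format(clear_bit(0b1111, 2), "04b"))  # 1011
```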
If the new bit has a dynamic value, the following allows you to change a bit to any value at a given position.
x = 1 // new value of bit...
n = 2 // at this location
byte = 0b0010

byte ^ ((-x ^ byte) & (1 << n)) = 0110
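The expression works because `-x` is all ones when x is 1 and all zeros when x is 0. A Python sketch, with `write_bit` as an illustrative helper name:

```python
def write_bit(value: int, n: int, x: int) -> int:
    """Return value with bit n (0 = rightmost) forced to x, where x is 0 or 1."""
    # (-x ^ value) has a 1 wherever value differs from the wanted bit;
    # the mask keeps only position n, and the outer XOR flips it if needed.
    return value ^ ((-x ^ value) & (1 << n))


print(format(write_bit(0b0010, 2, 1), "04b"))  # 0110
print(format(write_bit(0b0110, 2, 0), "04b"))  # 0010
```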
Toggle a bit at a specific position
The XOR operator returns 1 if its operands are unequal. XORing a bit with 1 therefore toggles it, while XORing with 0 leaves it unchanged.
n = 2 // at this location
byte = 0b0100
byte ^ (0b1 << n) = 0000
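As a Python sketch, with `toggle_bit` as an illustrative helper name. Toggling the same position twice returns the original value.

```python
def toggle_bit(value: int, n: int) -> int:
    """Return value with bit n (0 = rightmost) flipped."""
    return value ^ (1 << n)


print(format(toggle_bit(0b0100, 2), "04b"))  # 0000
print(format(toggle_bit(0b0000, 2), "04b"))  # 0100
```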
Store flags
Flags, a fancy name for “options,” can easily be stored in a byte. This example is inspired by the TCP protocol.
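A minimal Python sketch of the idea, with flag names borrowed from the control bits of the TCP header. The constants and variable names are assumptions made for illustration; each flag occupies its own bit, so any combination fits in one byte.

```python
# One bit per flag
FIN = 0b000001
SYN = 0b000010
RST = 0b000100
PSH = 0b001000
ACK = 0b010000
URG = 0b100000

flags = SYN | ACK            # set two options at once
has_ack = bool(flags & ACK)  # True: the ACK bit is set
has_fin = bool(flags & FIN)  # False: the FIN bit is not set
flags &= ~SYN                # clear SYN again, leaving only ACK

print(format(flags, "06b"))  # 010000
```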
Hex to RGB
Colours are often stored as hexadecimals. Sometimes, you will want to get the value of each channel. Note that hexadecimal is just another representation of a uint.
// mask = 11111111
mask = 0xFF

// rgb = 11100110 01000010 00011001
rgb = 0xE64219

// to get the red component,
// shift 16 bits to the right
// and keep the lowest 8 bits
// 11100110
(rgb >> 16) & mask = 0xE6 // = 11100110

// to get green,
// shift 8 bits to the right
// and keep only the lowest 8 bits
// 11100110 01000010
(rgb >> 8) & mask = 0x42 // = 01000010

// blue is the lowest 8 bits
// 11100110 01000010 00011001
rgb & mask = 0x19 // = 00011001
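The three extractions can be combined into one Python function. This is a sketch; `hex_to_rgb` is an illustrative name.

```python
def hex_to_rgb(colour: int):
    """Split a 24-bit colour into its (red, green, blue) channels."""
    mask = 0xFF  # keeps the lowest 8 bits
    red = (colour >> 16) & mask
    green = (colour >> 8) & mask
    blue = colour & mask
    return red, green, blue


print(hex_to_rgb(0xE64219))  # (230, 66, 25)
```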
RGB to hex
You can do the opposite as well: combine the RGB channels into one hexadecimal value.
r = 0xE6 // 11100110
g = 0x42 // 01000010
b = 0x19 // 00011001

// 11100110 01000010 00011001
(r << 16) | (g << 8) | b = 0xE64219
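The reverse direction as a Python sketch, assuming each channel is already within 0 to 255. `rgb_to_hex` is an illustrative name.

```python
def rgb_to_hex(r: int, g: int, b: int) -> int:
    """Pack three 8-bit channels into one 24-bit colour value."""
    return (r << 16) | (g << 8) | b


colour = rgb_to_hex(0xE6, 0x42, 0x19)
print(f"#{colour:06X}")  # #E64219
```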
Notes
The above code is pseudo-code and may not work in all languages. Other than the basics, like datatypes and equations, there are a few things to consider to make this work in your language.
Declare binary literals
Although many languages seem to support binary literals by prefixing them with 0b, some do not.
For more information, please refer to the documentation of your preferred language.
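As one illustration, Python accepts the 0b prefix directly. Parsing a base-2 string, shown here with Python's built-in `int`, is a common fallback in languages that lack binary literals, though the exact function differs per language. The variable names are illustrative.

```python
literal = 0b101010         # binary literal
parsed = int("101010", 2)  # the same value parsed from base-2 text

print(literal == parsed)  # True
print(bin(parsed))        # 0b101010
```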
Bitwise operators
Some programming languages use different bitwise operators than those used in this document. Please consult the documentation of the language in question.
Contribute
To ensure quality, this cheatsheet is open to contributions. If you run into errors, have suggestions, or feel you can lend a hand in any way, be sure to open an issue or pull request in the GitHub repository. Thanks!