Sometimes a number is just a number. But binary computers have long offered us many ways to interpret an n-bit word:
It is usual for programming language type systems to distinguish sharply between addreses, floating point numbers, and integers.
Signed integers could be handled in many ways (e.g., by using base -2). In practice three ways are used, one of them common.
It also has a seductive effect. For sign-and-magnitude and ones complement arithmetic, the natural thing for the machine to do when a result was too big to fit was to raise an exception. For 2s complement, it was extremely tempting to simply define the results to be taken modulo 2^{n}. That makes sense for adding, subtracting, and multiplying (at the price of making < <- > and >= questionable), but it never did make sense for division, because
does not imply that
For the rest of this note, we shall assume that the bits of an integer are numbered from 0 as the least significant bit to n==31 (for a C/Java int) or n>=63 (for a C long long or Java long) and that the value of a signed integer a is that v between -2^{n-1} and 2^{n-1}-1 inclusive such that
Recall your truth tables
x | y | NOT y | x AND y | x OR y | x XOR y |
False | False | True | False | False | False |
False | True | False | False | True | True |
True | False | True | False | True | True |
True | True | False | True | True | False |
There are several ways to represent True and False in a computer. Two machines I've used took any even number as False and any odd number as True; their conditional branch instructions looked at the least significant bit only. Some languages I've used took any negative number as True and any non-negative number as False; their conditional branch instructions looked at the sign bit only. C and its successors take 0 as False and any other number as True. Note that operations in C that deliver a newly calculated logical result return 0 or 1, never -1.
We shall take 0 for False and 1 for True. Here are the truth tables again.
x | y | NOT y | x AND y | x OR y | x XOR y |
0 | 0 | 1 | 0 | 0 | 0 |
0 | 1 | 0 | 0 | 1 | 1 |
1 | 0 | 1 | 0 | 1 | 1 |
1 | 1 | 0 | 1 | 1 | 0 |
The bitwise operations in C work on all the bits in a word independently:
a = ~y | a[i] = NOT y[i] |
a = x & y | a[i] = x[i] AND y[i] |
a = x | y | a[i] = x[i] OR y[i] |
a = x ^ y | a[i] = x[i] XOR y[i] |
Note the difference between ~ and !: !0 is 1, but ~0 is all 1s (= -1 in 2s complement).
But what if you want to do the equivalent of testing a[i], setting a[1] = 1, or clearing a[i] = 0? We need things called shift instructions for that. Conventionally, a machine will have four of them:
On twos-complement machines, the signed left shift, if any, will do the same as the unsigned one, except perhaps for setting flags differently. For example, the ARM CPU in your phone offers LSL (logical=unsigned shift left), LSR (logical shift right), and ASR (arithmetic=signed shift right).
In C, there are two shift operators: << and >>. Historic practice was that
unsigned << amount | is logical shift left |
signed << amount | is arithmetic shift left |
unsigned >> amount | is logical shift right |
signed >> amount | is arithmetic shift right |
although the standard weaseled out and allows a compiler to use the unsigned right shift for signed numbers if it really feels like it. To get around this, Java offers
signed << amount | some left shift, you can't tell |
signed >> amount | arithmetic shift right |
signed >>> amount | logical shift right |
So here is one way we can use these operators.
To test a[i] | ((a >> i) & 1) != 0 |
To clear a[i] = 0 | a &=~ 1 << i |
To set a[i] = 1 | a |= 1 << i |
Suppose we want to count in base B. For example, suppose we wanted to write a function to test whether a given string is parenthesis balanced. Here's an attempt:
boolean is_balanced(String s) { int depth = 0; final int n = s.length(); for (int i = 0; i < n; i++) { switch (s.charAt(i)) { case '(': depth++; break; case ')': if (depth == 0) return false; depth--; break; default : break; } } return depth == 0; }
It would be a very good idea to test this. The first time I typed it up I accidentally omitted the depth-- statement. I only noticed because the tests came out wrong. In order to test it, we'd like to generate strings of L characters drawn from the alphabet {'x','(', ')'}.
final int L = 4 /* for example */; final char[] digit = new char[L+1]; for (int i = 0; i <= L; i++) digit[i] = 'x' /*0*/; while (digit[L] == 'x') { final String s = new String(digit, 0, L); System.out.println(s + " => " + is_balanced(s)); int i = 0; while (digit[i] == ')' /* B-1 */) { digit[i] = 'x'; /* 0 */ i++; } digit[i] = digit[i] == 'x' ? '(' : ')'; /* increment digit[i] */ }
Suppose we want to count using base 2. Then we could use an array of Boolean values:
final boolean[] digit = new boolean[L+1]; for (int i = 0; i <= L; i++) digit[i] = false; while (!digit[L]) { ... use digit[] ... int i = 0; while (digit[i]) { digit[i] = false; i++; } digit[i] = true; }
But if L < 32, we can pack the bits into an int, or if L < 64, we can pack them into a (long) long.
int a = 0; while (((a >> L) & 1) == 0) { ... use a ... int i = 0; while (((a >> i) & 1) != 0) { /* test bit i */ a &=~ 1 << i; /* clear bit i */ i++; } a |= 1 << i; /* set bit i */ }
Or we can go a step further:
for (int a = 0; a < (1 << L); i++) { ... use a ... }
This is the fastest, simplest, way I know to do an exhaustive search over choices out of L things. The only way to improve on it is not to do an exhaustive search.
A string of n bits (an n-bit integer) can represent a subset of {0...n-1}. The value i is in the set if and only if the ith bit is 1.
x UNION y | x | y |
x INTERSECTION y | x & y |
x RELATIVE-COMPLEMENT y | x &~ y |
x SYMMETRIC-DIFFERENCE y | x ^ y |
i IS-ELEMENT-OF x | (((x >> i) & 1) != 0 |
x UNION {i} | x | (1 << i) |
x RELATIVE-COMPLEMENT {i} | x &~ (1 << i) |