# Bitwise operations

Sometimes a number is just a number. But binary computers have long offered us many ways to interpret an n-bit word:

• as an address
• as an n-bit floating point number
• as an n-bit unsigned integer [0 to 2^n − 1]
• as logical values
• as an n-bit signed integer
• as a vector of n bits
• as a subset of {0,1,...,n-1}.

It is usual for programming language type systems to distinguish sharply between addresses, floating point numbers, and integers.

• PL/I distinguished between BINARY FIXED (15), a type with arithmetic operations, and BIT (16), a type with bit vector operations.
• Algol 68 distinguished between int, a type with arithmetic operations, and bits, a type with bit vector operations.
• Pascal distinguishes between 0..65535, a type with arithmetic operations, and set of 0..15, a type with set operations.
• Ada distinguishes between Integer, a type with arithmetic operations, and array (0:31) of Boolean, a type with bit vector operations.
• Common Lisp and several Scheme implementations have unbounded integers and bit vectors as distinct types with distinct operations. Common Lisp also offers (differently named) bitwise operations on integers.
• Many programming languages do not make any distinction between integers and word-size bit strings. In particular, C's grandparent BCPL, C's parent B, C, C++, Objective C, Objective C++, Swift, C#, Java, and Javascript offer bitwise operations on numbers.

## Integers

Signed integers could be handled in many ways (e.g., by using base -2). In practice three ways are used, one of them common.

• sign and magnitude: (s,m) where m is in the range [0 to 2^(n−1) − 1], the value being (−1)^s × m. This is the most intuitive representation, being close to how we do things. It is pleasantly symmetric, so that you can take the absolute value of any number. Some people profess to see a problem with the fact that (0,0) and (1,0) both represent 0, but in six years of using such a machine five days a week it was never a problem for me. This is of course the way signs are handled in IEEE floating-point. Machines using this representation are in use to this day.
• ones complement: to represent a negative number, flip all its bits. This too is pleasantly symmetric, and from a programmer's point of view it is hard to tell apart from the sign and magnitude representation. (It is also sort of cute that the NOT instruction and the NEGATE instruction are the same instruction.) Machines using this representation are in use to this day, despite (0...0) and (1...1) both representing 0. The one issue this causes in these two representations is that there is a distinction between “are these the same bit strings” and “do these represent the same number”, but we have that issue in floating-point anyway.
• The dominant representation, used in IBM z/Series, POWER and PowerPC, SPARC, ARM, and of course the Slimy Tentacled Monster that Won, the x86 series, is called 2s complement, and represents numbers modulo 2^n. This representation has the extremely unpleasant property that there are more negative numbers than positive ones, so that it is impossible to write an absolute value function for integers that gives correct answers for all representable integers.

It also has a seductive effect. For sign-and-magnitude and ones complement arithmetic, the natural thing for the machine to do when a result was too big to fit was to raise an exception. For 2s complement, it was extremely tempting to simply define the results to be taken modulo 2^n. That makes sense for adding, subtracting, and multiplying (at the price of making <, <=, >, and >= questionable), but it never did make sense for division, because

x ≡ y (mod M)

does not imply that

x/z ≡ y/z (mod M)
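The absolute-value problem is easy to see in Java; here is a small demonstration (the class name `AbsDemo` is mine):

```java
public class AbsDemo {
    public static void main(String[] args) {
        // In 32-bit 2s complement, -2147483648 has no positive counterpart,
        // so Math.abs overflows and returns its argument unchanged.
        System.out.println(Math.abs(Integer.MIN_VALUE)); // prints -2147483648
        // The same wrap-around makes overflow silent for + and *:
        System.out.println(Integer.MAX_VALUE + 1);       // prints -2147483648
    }
}
```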

For the rest of this note, we shall assume that the bits of an integer are numbered from 0 (the least significant bit) to n−1, where n == 32 for a C/Java int and n == 64 for a C long long or Java long, and that the value of a signed integer a is the v between −2^(n−1) and 2^(n−1)−1 inclusive such that

v ≡ Σ_{i=0..n−1} a_i·2^i (mod 2^n)
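As a sanity check on this definition, here is a sketch (the class and method names are mine) that recomputes the value of an int from its bits:

```java
public class ValueDemo {
    // Recompute the signed value of x from its bits a_0..a_31, per
    // v ≡ sum of a_i * 2^i (mod 2^n), choosing v in [-2^(n-1), 2^(n-1)-1].
    static int valueOf(int x) {
        long sum = 0;
        for (int i = 0; i < 32; i++) {
            sum += ((long) ((x >> i) & 1)) << i;  // add a_i * 2^i
        }
        if (sum >= (1L << 31)) sum -= (1L << 32); // pick the representative in range
        return (int) sum;
    }
    public static void main(String[] args) {
        System.out.println(valueOf(-1));  // prints -1
        System.out.println(valueOf(42));  // prints 42
    }
}
```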

## Bitwise operations

Recall your truth tables
| x     | y     | NOT y | x AND y | x OR y | x XOR y |
|-------|-------|-------|---------|--------|---------|
| False | False | True  | False   | False  | False   |
| False | True  | False | False   | True   | True    |
| True  | False | True  | False   | True   | True    |
| True  | True  | False | True    | True   | False   |

There are several ways to represent True and False in a computer. Two machines I've used took any even number as False and any odd number as True; their conditional branch instructions looked at the least significant bit only. Some languages I've used took any negative number as True and any non-negative number as False; their conditional branch instructions looked at the sign bit only. C and its successors take 0 as False and any other number as True. Note that operations in C that deliver a newly calculated logical result return 0 or 1, never -1.

We shall take 0 for False and 1 for True. Here are the truth tables again.
| x | y | NOT y | x AND y | x OR y | x XOR y |
|---|---|-------|---------|--------|---------|
| 0 | 0 | 1     | 0       | 0      | 0       |
| 0 | 1 | 0     | 0       | 1      | 1       |
| 1 | 0 | 1     | 0       | 1      | 1       |
| 1 | 1 | 0     | 1       | 1      | 0       |

The bitwise operations in C work on all the bits in a word independently:
```
a = ~y       a[i] = NOT y[i]
a = x & y    a[i] = x[i] AND y[i]
a = x | y    a[i] = x[i] OR y[i]
a = x ^ y    a[i] = x[i] XOR y[i]
```

Note the difference between ~ and !: !0 is 1, but ~0 is all 1s (= -1 in 2s complement).
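In Java the contrast cannot even be typed wrongly, since ! applies only to booleans; a quick illustration of ~ (class name is mine):

```java
public class NotDemo {
    public static void main(String[] args) {
        System.out.println(~0);  // prints -1 : all 32 bits set
        System.out.println(~5);  // prints -6 : ~x == -x - 1 in 2s complement
        // In C, !0 is 1 and !5 is 0; in Java !5 does not compile,
        // so the C pitfall of confusing !x with ~x cannot arise.
    }
}
```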

But what if you want to do the equivalent of testing a[i], setting a[i] = 1, or clearing a[i] = 0? We need things called shift instructions for that. Conventionally, a machine will have four of them:

• unsigned left shift (move all bits towards high end, shift 0s in at the bottom, don't treat sign specially).
• signed left shift (move non-sign bits towards high end, shift 0s in at the bottom).
• unsigned right shift (move all bits towards low end, shift 0s in at the top, don't treat sign specially).
• signed right shift (move non-sign bits towards low end, shift 0s (sign and magnitude) or copies of the sign bit (the others) in at the top).

On twos-complement machines, the signed left shift, if any, will do the same as the unsigned one, except perhaps for setting flags differently. For example, the ARM CPU in your phone offers LSL (logical=unsigned shift left), LSR (logical shift right), and ASR (arithmetic=signed shift right).

In C, there are two shift operators: << and >>. Historic practice was that
```
unsigned << amount    logical shift left
signed   << amount    arithmetic shift left
unsigned >> amount    logical shift right
signed   >> amount    arithmetic shift right
```

although the standard weaseled out and allows a compiler to use the unsigned right shift for signed numbers if it really feels like it. To get around this, Java offers
```
signed <<  amount    some left shift, you can't tell which
signed >>  amount    arithmetic shift right
signed >>> amount    logical shift right
```
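A small demonstration of the difference between Java's two right shifts (class name is mine):

```java
public class ShiftDemo {
    public static void main(String[] args) {
        int x = -8;                  // bit pattern 111...11000
        System.out.println(x >> 1);  // prints -4         : copies of the sign bit shifted in
        System.out.println(x >>> 1); // prints 2147483644 : zeros shifted in at the top
        System.out.println(x << 1);  // prints -16        : zeros shifted in at the bottom
    }
}
```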

So here is one way we can use these operators.
```
To test  a[i]         ((a >> i) & 1) != 0
To clear a[i] = 0     a &= ~(1 << i)
To set   a[i] = 1     a |= 1 << i
```

(Note the parentheses in the clear mask: ~ binds tighter than <<, so a &= ~1 << i would clear bits 0 through i, not just bit i.)
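Wrapped up as helper methods (the class and method names are mine):

```java
public class BitOps {
    static boolean test(int a, int i)  { return ((a >> i) & 1) != 0; } // is bit i set?
    static int     set(int a, int i)   { return a | (1 << i); }        // force bit i to 1
    static int     clear(int a, int i) { return a & ~(1 << i); }       // force bit i to 0
    public static void main(String[] args) {
        int a = 0b0101;
        System.out.println(test(a, 0));  // prints true
        System.out.println(set(a, 1));   // prints 7
        System.out.println(clear(a, 2)); // prints 1
    }
}
```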

## Counting in base B

Suppose we want to count in base B. For example, suppose we wanted to write a function to test whether a given string is parenthesis balanced. Here's an attempt:

```java
boolean is_balanced(String s) {
    int depth = 0;
    final int n = s.length();
    for (int i = 0; i < n; i++) {
        switch (s.charAt(i)) {
            case '(': depth++; break;
            case ')': if (depth == 0) return false;
                      depth--; break;
            default : break;
        }
    }
    return depth == 0;
}
```

It would be a very good idea to test this. The first time I typed it up I accidentally omitted the depth-- statement. I only noticed because the tests came out wrong. In order to test it, we'd like to generate strings of L characters drawn from the alphabet {'x','(', ')'}.

```java
final int L = 4 /* for example */;
final char[] digit = new char[L+1];
for (int i = 0; i <= L; i++) digit[i] = 'x' /* 0 */;
while (digit[L] == 'x') {
    final String s = new String(digit, 0, L);
    System.out.println(s + " => " + is_balanced(s));
    int i = 0;
    while (digit[i] == ')' /* B-1 */) {
        digit[i] = 'x';  /*  0  */
        i++;
    }
    digit[i] = digit[i] == 'x' ? '(' : ')'; /* increment digit[i] */
}
```

Suppose we want to count using base 2. Then we could use an array of Boolean values:

```java
final boolean[] digit = new boolean[L+1];
for (int i = 0; i <= L; i++) digit[i] = false;
while (!digit[L]) {
    ... use digit[] ...
    int i = 0;
    while (digit[i]) {
        digit[i] = false;
        i++;
    }
    digit[i] = true;
}
```

But if L < 32, we can pack the bits into an int, or if L < 64, into a Java long (a C long long).

```java
int a = 0;
while (((a >> L) & 1) == 0) {
    ... use a ...
    int i = 0;
    while (((a >> i) & 1) != 0) { /* test bit i */
        a &= ~(1 << i);           /* clear bit i */
        i++;
    }
    a |= 1 << i;                  /* set bit i */
}
```
```

Or we can go a step further:

```java
for (int a = 0; a < (1 << L); a++) {
    ... use a ...
}
```
```

This is the fastest, simplest way I know to do an exhaustive search over all 2^L choices of a subset of L things. The only way to improve on it is not to do an exhaustive search.
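For instance, here is a self-contained sketch (the class and method names are mine) that uses this loop to enumerate every subset of {0,...,L-1}:

```java
import java.util.ArrayList;
import java.util.List;

public class Subsets {
    // Enumerate all 2^L subsets of {0,...,L-1} by counting a from 0 to 2^L - 1;
    // bit i of a says whether i is in the subset.
    static List<String> subsets(int L) {
        List<String> result = new ArrayList<>();
        for (int a = 0; a < (1 << L); a++) {
            StringBuilder sb = new StringBuilder("{");
            for (int i = 0; i < L; i++)
                if (((a >> i) & 1) != 0) sb.append(' ').append(i);
            result.add(sb.append(" }").toString());
        }
        return result;
    }
    public static void main(String[] args) {
        subsets(3).forEach(System.out::println); // 8 lines, from { } to { 0 1 2 }
    }
}
```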

## Sets

A string of n bits (an n-bit integer) can represent a subset of {0...n-1}. The value i is in the set if and only if the ith bit is 1.
```
x UNION y                   x | y
x INTERSECTION y            x & y
x RELATIVE-COMPLEMENT y     x & ~y
x SYMMETRIC-DIFFERENCE y    x ^ y
i IS-ELEMENT-OF x           ((x >> i) & 1) != 0
x UNION {i}                 x | (1 << i)
x RELATIVE-COMPLEMENT {i}   x & ~(1 << i)
```
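A brief Java illustration of these correspondences (class name is mine):

```java
public class SetDemo {
    public static void main(String[] args) {
        int x = 0b0110; // the set {1, 2}
        int y = 0b0101; // the set {0, 2}
        System.out.println(Integer.toBinaryString(x | y));  // prints 111 : union {0,1,2}
        System.out.println(Integer.toBinaryString(x & y));  // prints 100 : intersection {2}
        System.out.println(Integer.toBinaryString(x & ~y)); // prints 10  : x minus y = {1}
        System.out.println(Integer.toBinaryString(x ^ y));  // prints 11  : symmetric difference {0,1}
    }
}
```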