About Bit manipulation

Bit manipulation

In programming, particularly Wimp programming, there are many occasions when there is need to set, change or extract individual bits within a 'word' or within a single byte, and it's worth reviewing the general principles of effecting these actions.

Indirection operators

First, let's recap the main two indirection operators ? and !.

? concerns a single byte (8 bits, conventionally numbered from 0-7, with Bit 0 the right-most bit).

The expression:

?A% means "the contents of " (the value held in) the single byte whose memory address is A% - and it can be used on both sides of an equation. Thus:

?A%=?(B%+16) "make the contents of the byte at address A% equal the contents of the byte at address (B%+16)"

?A%=?B% + 16 "make the contents of the byte at address A% equal to 16 plus the contents of the byte at address B%".

The form A%?n% is also allowed. This means the same as ?(A%+n%) where n% is a positive integer or 0.

!A% concerns a 4-byte 'word' (32 bits, conventionally numbered from 0-31, with Bit 0 the right-most bit).

All the above meanings for ? translate across to ! except that they now apply to 32 bits instead of 8.

Numbers or flags?

It is also worth distinguishing early on between using a byte or word to hold a number as opposed to using the bits of a byte/word as 'flags' whose state represents something quite unrelated to the actual number represented. Thus, using something like this:

c%!0 = a%!0 + b%!0

usually means that we are interested in the number values held in the 4-byte words at memory addresses a%, b% and c%.

However, if we see the sequence:

iconflags% = &7000139
block%!20 = iconflags%

we are pretty sure that the bytes in the memory block starting at block%+20 are being used differently and their individual bit states are more important than the number that happens to be represented by them.

We can, of course, use bit manipulation for both cases, but it helps if you know which type you are playing with before you start!

By and large, using bit manipulation on numbers is only helpful for multiplication/division by 2. (Each left-shift of the bits in the number multiplies it by 2, each right shift does a DIV 2 operation.)

The rest of this article will therefore concentrate on the other usage, i.e. where we are mainly interested in whether the individual bits are set or unset - or where we want to write/read a value from a group of bits within a byte or word.

The tools we have available include the bit-wise operators AND, OR, EOR, etc, plus the bit-shift operators <<, >> and >>>.

AND, OR and EOR

AND, OR and EOR each operate on two values/operands (e.g. value1 AND value2) to produce a single output value. The important thing is that they operate separately on corresponding pairs of bits within each operand.

At the bit level:

AND gives an output of 1 only if both its operands are 1, otherwise the output is 0;

OR gives an output of 1 if either (or both) operands are 1, otherwise the output is 0;

EOR gives an output of 1 if either (but not both) operands are 1, otherwise the output is 0.

Let's start with a single byte, i.e. comprising 8 bits, conventionally numbered from 0-7 with Bit 0 being the least significant (the right-most) bit. Hence a single byte can hold values:

%00000000  (0)

%11111111  (255)

If we want to evaluate manually:

%00000011 AND %00000101

the method to use is to write the two operands one above the other, like this:

00000011
00000101  AND

and then working from one end to the other (it doesn't matter which way) we evaluate the AND result of each vertical pair of bits in turn. In the above case, the result is:

00000001

because 0 AND 0 = 0, 0 AND 1 = 0, 1 AND 0 = 0 and 1 AND 1 = 1.

Using a similar process for OR and EOR, with the same operands, the results will be, respectively:

00000111
00000110

I have deliberately refrained from using the decimal equivalents here because, for instance:

3 AND 5 = 1

doesn't show what is really going on and can be a bit bewildering.

As a practical example, have a look at the ASCII numbers for the upper case alphabet letters A-Z. They are 65-90. Their lower-case counterparts are 97-122. Each upper case letter is therefore paired with its lower case version separated by 32.

Aha (I hear you exclaim), 32 has just got to be significant in binary terms. And you're right, of course! 32 is %00100000. So, if we take any upper case letter and OR its ASCII value with 32 we will get its lower case equivalent e.g.:

%01000001 OR %00100000 = %01100001   (or 65 OR 32 = 97 in decimal)

Or, more generally, ORing with 32 (%00100000, only its Bit 5 is 'set') only affects Bit 5 of the other operand, because Bit 5 is ORed with 1, which results in it being 'set' - whereas all the other bits of %01000001 (65) are ORed with 0 and are therefore unchanged.

So, if we want to end up with particular bits in a byte being set, we simply arrange to OR the original value with the value having just those bits set, e.g. to set, say, bits 0, 1 and 6 of a single byte we need to OR the byte value with %01000011 (67).

Conversely, if we want to produce an 'unset' bit (or bits), we would need to AND the original value with a value having all the bits set except the bit(s) we are interested in. Thus:

%01100001 AND %11011111 = %010000001  (97 AND 223 = 65)

This is the reverse of the ASCII A/a example above.

EOR is typically used in the same way as OR except the result is that the bit(s) in question have their value changed from 0 to 1 (and vice versa) e.g.

%0110001 EOR %00110000 = %01010001

This is very useful for toggling an effect between two possible states e.g. icon enabled/disabled.

It is worth noting that you can also 'unset' a bit by first using OR (to positively set it) followed by EOR (to toggle it to unset).

Bit-shift operators

Because our brain power is limited (or mine is anyway!) the above operators get somewhat unwieldy to use for more than about 1 byte/8 bits and that's where the shift operators come into their own (and some people prefer to use only them, anyway). The operators are <<, >> and >>>, and it is vital to remember that they all act on a 4-byte (32-bit) 'word'.

The two operators << and >>> work exactly the same but in opposite directions. The format of the use is:

value%<<numberofshifts%

For example:

%1011<<6

which results in:

%1011000000 (or, more correctly, %00000000000000000000001011000000)

i.e. bits of the original value have been shifted six times to the left and trailing zeros have been added to make up the space. At the left-hand end, bits are simply 'dropped off the end' and are lost - so %1011<<29 would give a result

%01100000000000000000000000000000.

Similarly:

%110110010 >>3 = %110110 (with 26 leading zeros implied!)

The third shift-operator is >> and is generally more useful for manipulating numbers because it shifts the bits in basically the same way as >>>, but preserves the original value of the left-most bit (Bit 31) which, for integer numbers, is 1 for negative numbers and 0 for positive numbers. Note that, if you start with a negative number (Bit 31 =1) then using, say, >>3 will result in Bits 29-31 all being 1 - as you would expect when you start to think about it.

However, as we shall see, for many non-number uses, it doesn't matter whether we use >> or >>> because we are usually more interested in the original contents of a bit (or bits) rather than what the left-most bits are filled with after right shifting.

In practice, << is often used to write bit values into a byte or word, whereas >> or >>> is used to read values.

Suppose we want to find the value of a particular bit in a single byte - say, Bit 3. One way is to use:

value2%=(value1%>>bitofinterest%) AND 1

For example:

flag%=(block%?4)>>3 AND 1

will read the value of just Bit 3 of whatever is in the byte at address block%+4. Let's see why. Assume the value stored in block%+4 is %01011101.

Then %01011101>>3 will be %00001011 and:

%00001011 AND %00000001 = %00000001

i.e. the result is 1, which is the value of the original Bit 3.

If we want to ensure that a particular single bit is set in an existing word - without altering any of the other bits:

!word% = !word% OR (1<<17)

will set Bit 17 without affecting the other bits (because OR 0 leaves the original value unaltered).

If we want to insert a value into any bits which we know are all currently unset (0) then we could use ordinary addition - because this has the same effect as OR in these specific circumstances. For example:

!word%=&FFFFFF00 creates a value with all bits set except 0-7

!word%= !word% + (%1010<<3)

would make Bits 0-7 become %01010000 and all the rest would be unaltered, i.e. would still be set (1).

It therefore follows (I can hear my Maths teacher saying!) that if we want to insert a value into some bits (say, Bits 12-15) of an existing word - whatever their current state - and without disturbing whatever is in the other bits, we first need to ensure that bits 12-15 are cleared (unset) and only then insert the required value. Thus:

!word% = !word AND &FFFF0FFF

to unset bits 12-15 leaving the rest untouched, then:

!word% = !word + (%1101<<12) to insert the value into the cleared bits.

However, there is a disadvantage in using + rather than OR. You will note that brackets have been put around the shift operation here. This is vital, because the shift operators have a lower precedence/priority than + but higher than AND/OR/EOR. This means that you can very easily get the wrong result if you use, instead:

!word%=!word% + %1101<<12

Try it.

For this reason, I feel it is always safer to use OR rather than +.

General rules

I think that gives the gist of bit manipulation, and I will finish off by recapping some general rules:

any single bit value is preserved if it is ANDed with 1 (or ORed with 0)
any single bit value is unset (becomes 0) if it is ANDed with 0
any single bit value is set (becomes 1) if it is ORed with 1
any single bit value is changed (from 0 to 1, or vice versa) if EORed with 1

NOT

Finally, there are other bit-wise operators, but NOT is probably the only one of significance in our current context. Strictly, NOT acts on a single 4-byte word and simply reverses every one of its 32 bits, i.e. each 1 is made 0 and vice versa. However, the most frequent use of NOT in programming is simply to change TRUE into FALSE or vice versa - and if you recall that TRUE is represented by -1 (&FFFFFFFF) you will see that the TRUE/FALSE relationship is just one example of the NOT bit-reversal process.

Source:	Archive Magazine 14.5 - "Learners' Column"
Publication:	Archive Magazine
Contributor:	Ray Favre