Implementing CRC32 in Python

Aug 15, 2024

In this post I’ll explain how cyclic redundancy checks (CRC) works, and implement a basic version of CRC32 in python. In my opinion, the best way to go about learning something is by doing it, so I recommend you to implement this yourself. If you want to do some CRC calculations, here is a good calculator.

Why?

Cyclic redundancy checks are numbers you append to moving data (either on a network, or even transferring from storage to memory) that allows you to confirm that the data has no errors. Moving data around without some sort of error detection would be pretty useless.

How?

The maths involved is interesting but because I’m not an expert on finite fields I’m not going to delve too deeply into things. You can find decent books on the theory behind Galois fields and proofs on how well CRC works on the internet. The main principle of cyclic redundancy checks is to represent that data as a polynomial in GF(2). This means that the data will be a polynomial, where the coefficients are either 1 or 0. Since data is represented in binary in computers, this is perfect.

Here is an example:

\(11010 \Rightarrow 1x^{4} + 1x^{3} + 0x^{2} + 1x^{1} + 0x^{0} \Rightarrow 1x^{4} + 1x^{3} +1x^{1}\)

We then decide on a generator polynomial. We’ll use this to calculate the number that we append at the end of the message. An example polynomial might be:

\(111 \Rightarrow 1x^2 + 1x^1 + 1x^{0} \Rightarrow 1x^{2} + 1x^{1} + 1\)

We will use this polynomial to divide into our data polynomial. This is the exact same long division algorithm you would have learned in school for dividing polynomial, and finding their remainder. We’re using GF(2) arithmetic. Subtraction and addition are equivalent and are the same as an XOR operation.

The remainder 01 is our redundant information that we will append to our outgoing message 11010. The sender will transmit 1101001 and the receiver, who also has the generator polynomial will divide 111 into 11010 and will check to see if the remainder is the same as what was received. I left out the quotient because it’s irrelevant to CRC. All we need is the remainder.

Basic Implementation

Our generator polynomial for CRC32 will be 32bits long as the name implies. The polynomial we’ll be using is 0x04C11DB7. The biggest thing to note here is that in an actual implementation, we don’t need the leading 1, since everytime we XOR 1 with 1 it will be 0.

# Basic implementation if crc32
def crc32(message:bytearray, poly:int):
    bitmask = 0xFFFFFFFF
    crc = 0

    for byte in message:
        for _ in range(8):
            b = byte & (1<<7) != 0
            divide = bitmask if (crc & (1<<31) != 0) else 0
            crc = (crc << 1) | b
            crc ^= (poly & divide)
            byte <<= 1
    return (crc & bitmask) # to keep it at 32bits long

Some little details

There are a few things that can be improved upon above, before we go implementing anything.

First, it’s going to be a pain having to wait for the entire message to come in before we get to calculate the remainder. We want this to be really fast, and start calculating the remainder while the message is coming in.

To do this, in our little example above, we will append 2 zeroes to our outgoing message, before we calculate the remainder. Like so

So, our outgoing message will be 1101011. This allows will allow the receiver to calculate the remainder as the message is coming in. If the remainder is 0, we know (or most likely) have the correct data.

What happens if leading 0’s get added/deleted during transmission? It will still be divisible by the generator, but obviously the wrong data. The solution is to prepend a certain amount of 1’s (in our case 32, since we’re going to implement a 32 bit CRC) to the data.

What happens if ending 0’s get added/deleted during transmission? Well this wouldn’t be a problem if our message (data + remainder) ended with a 1, but if it ends with a 0 this error may go undetected. In this case, we will end the function with an XOR of some value to fix this problem.

The resulting code becomes:

# Better implementation of crc32, leading/trailing zeros
def crc32_improved(message:bytearray, poly:int, init:int=0, final_xor:int=0):

    bitmask = 0xFFFFFFFF
    crc = init

    for byte in message:
        for _ in range(8):
            b = bitmask if byte & (1<<7) != 0 else 0
            divide = bitmask if (crc & (1<<31)) != 0 else 0
            crc = (crc << 1) ^ (poly & (b ^ divide))
            byte <<= 1
    return (crc & bitmask) ^ final_xor

There are some other optimizations you could make. You could make a look up table that does some caching like so:

from functools import lru_cache
@lru_cache
def create_lut(poly): return [crc32_improved(bytearray([x]), poly) for x in range(256)]
# Implementation of crc32 with look up table
def crc32_lut(message:bytearray, poly:int):
    """
    Generates a crc with a look up table for improved speed
    """
    bitmask = 0xFFFFFFFF
    crc = 0

    lut = create_lut(poly)
    for m in message:
        index = (int(m) ^ (crc >> 24)) & 0xFF
        crc = (crc << 8) ^ lut[index]

    return crc & bitmask

Silicon Chip Cookies

Discussion about this post