Data Representation · 4 question types
Past paper frequency (2018 to 2024)
This topic accounts for approximately 4% of your exam marks.
File size calculations and lossless vs lossy compression are regular 3 to 4 mark questions.
RLE is a simple lossless method that works well on data containing long runs of the same value repeating in a row.
The decoder reverses this by expanding each pair back into count copies of value.
Example — Apply RLE to the string "AAAABBBCCDAA".
Walking through the string:
The encoded form is (4, A)(3, B)(2, C)(1, D)(2, A), often written as 4A3B2C1D2A. The original is 12 characters; the encoded form is 10 characters, so RLE has saved a small amount of space.
Example — Apply RLE to the binary sequence 11111100001111111100.
Walking through:
Encoded form: (6, 1)(4, 0)(8, 1)(2, 0), often written as 6,1 4,0 8,1 2,0. The original is 20 bits; the count values can be stored more compactly than the raw run, which is where the saving comes from.
Example — A row of pixels in a simple bitmap reads R R R R R B B B G G. Apply RLE.
The pixels split into three runs:
Encoded: (5, R)(3, B)(2, G), often written as 5R3B2G. Ten pixels have been encoded as three pairs.
RLE is only effective if the data has long runs:
In the worst case, RLE can actually increase the file size: if every value is different from the next, each value becomes a 1-value run, so the output is roughly twice the original size.
The decoded data is bit-identical to the original. RLE counts as lossless because expanding each (count, value) pair back to count copies of value perfectly restores the input.