DATA COMPRESSION #2
          -------------------
          
          By Frankenstein
          Thanx to Spookryder
          
          In the previous issue The Gatekeeper (TG) already told you about
          lots of different methods. Read it first if you haven't done it
          already!
          
          CLEAR UP TIME!
          
          You poor guys read about LZW, packers, arcers and MAGNUS
          crunchers. For those who aren't familiar with this stuff, it
          could be very confusing.
          What you have to know is that it all has to do with data
          compression!
          
          Data compression can be divided into two groups:
          
          - Substitution methods
          - Statistic methods
          
          TG mentioned this already in issue #3. He was talking about
          'dictionary based' methods, which we call substitution methods
          during this article.
          
          
          Substitution methods:
          
          - Run length encoding
          - Liniar pattern matching
          - Lempel-Ziv-Welch (LZW)
            (used by ARC and PKZIP)
          
          Since WosFilm wrote two articles about LZW already, we won't
          talk about this method here.
          
          Statistic methods:
          
          - STATIC HUFFMAN ENCODING
            or: minimal redundancy method
          - DYNAMIC HUFFMAN ENCODING
            also called adaptive Huffman
            (used by LHARC)
          - ARITHMETIC ENCODING
          
          Okidoki! (I like this phrase from M.A.S.H.) Let's begin....
          
          
          RUN LENGTH ENCODING (RLE)
          -------------------------
          This is a fairly simple method which is very useful for
          compressing pictures and other sequentialy repeating data.
          With run-length encoding we replace a sequence of the same bytes
          by a code. This code includes:
          
          - UNIQUE byte (code mark)  [BYTE]
          - replace byte IMAGE       [BYTE]
          - AMOUNT of replaced bytes [BYTE]
          
          So there has to be one byte which isn't used in the entire file.
          This byte will be used as a code mark for the unpacker.
          Because a code requires three bytes (UNIQUE, IMAGE and AMOUNT)
          it's only useful to replace sequences which are longer than
          three bytes.
          But what if there's no UNIQUE byte in the file? Well, a UNIQUE
          byte doesn't have to be a never occuring byte! In fact you only
          have to know which byte occures the least. Whenever a UNIQUE
          byte occures in the file, it should be replaced by a code. The
          replace byte IMAGE is the same as the UNIQUE byte in such case.
          The disadvantage of this system is that you loose two bytes
          every time a single UNIQUE byte appears in the file. But since
          the UNIQUE byte is the least occuring byte of the file, it won't
          be noticed in most cases.
          
          The method:
          
          - Create a table of frequences (for each byte a counter of how
          much that byte occures in the file).
          - The UNIQUE byte should now be set to the byte image of the
          lowest frequency in the table.
          - Now replace byte sequences (longer than three) by a code
          (UNIQUE, IMAGE, AMOUNT).
          - When a UNIQUE byte is found, it has to be replaced by a code
          (also if it's only one single byte).
          
          Mostly sequences of UNIQUE bytes never happen in a file.
          Therefore you could also choose for a system of replacing the
          UNIQUE bytes by a code of two bytes, namely:
          
          - UNIQUE byte
          - UNIQUE byte (byte image)
          
          When the UNPACKER finds a sequence of two UNIQUE bytes it
          shouldn't expect an AMOUNT anymore.
          
          Koala pictures are packed with RLE, but it's slightly different
          from the method described here. Mostly koala packes a few bytes
          better than the system with a unique byte.


          >> How come? <<
          Ok, let's take a look....
          
          Example data stream:
          
          4,4,4,4,4,4,4,4,5,6
          7,8,9,3,3,3,3,3,3,3
          
          First eight bytes can be replaced. Then there are five bytes
          which can NOT be replaced, followed by a replaceable sequence of
          seven bytes.
          The packed file will be like this:
          
          [128+8,4], [0+5,5,6,7,8,9], [128+7,3]
          
          Indication byte:
          128 (bit 7 set)   = fill byte follows
            0 (bit 7 clear) = not packable
          
          The amount is included in the indication byte thus it can
          replace sequences of 127 bytes (128+127=255).
          But also notice that it now can replace sequences of three bytes
          because it only uses two bytes for the code!
          
          
          LINEAIR PATTERN MATCHING
          ------------------------
          This method also replaces strings by a code. It searches for
          strings which appear for the second time in the file. A buffer
          is used to keep track of the string being searched.
          The code is build as follows:
          
          - UNIQUE byte
          - POSITION
          - match LENGHT
          
          And now you expect me to tell you how this method works right?
          Wrong! I'm getting a little tired of this article. Next time
          you'll read more about this method. Then I'll also describe the
          static and dynamic Huffman methods. Also look out for the source
          code of the static Huffman depacker.
          Real freaks can try to rip this routine from the 'Watch Diz'
          demo! (I called it the GIGA unpacker).
          
          If you know more about the methods mentioned in this article, or
          maybe know about other methods, please write to Mega Magazine.
          
          <THE END>