LZW DATA COMPRESSION
--------------------

Written by: WosFilm (Sweden)

-"OH NO!!" I hear all of you cry out.
-"It's that maniac WosFilm from Sweden writing a lot of crap again!"

But no, be calm! This is serious stuff! I don't know which techniques all of
you use when compressing data, but here is a rather good one called LZW
(Lempel-Ziv-Welch), which is best used to compress text, screens etc.
(highly redundant stuff...). I haven't made any programs, because right now
I haven't got the time (and I'm not a very good coder anyway...), but
hopefully I might get a chance to see what you folks can make out of it.

The algorithm is actually very simple: LZW compression replaces strings of
characters with single codes. It does not do any analysis of the incoming
characters. Instead, it just adds every new string of characters it sees to
a table of strings. Compression occurs when a single code is output in place
of a string of characters. Many other techniques use string tables too, but
they have to save the table along with the compressed data, whereas the LZW
string table is built up automatically during compression as well as during
decompression.

One of the problems when using this algorithm on an 8-bit computer is that
the codes generated during compression must have more bits than a single
character, since codes 0-255 are by default assigned to the standard
character set. The remaining codes are assigned to strings as the algorithm
proceeds. E.g. when running with 12-bit codes, 0-255 refer to individual
bytes and 256-4095 refer to strings from the table.

Here are the compress and decompress algorithms in a form everybody ought to
understand:

LZW COMPRESS:

  STRING = get input byte
  WHILE there are still input bytes DO
    CHAR = get input byte
    IF STRING+CHAR is in the table THEN
      STRING = STRING+CHAR
    ELSE
      output the code for STRING
      add STRING+CHAR to the table
      STRING = CHAR
    END of IF
  END of WHILE
  output the code for STRING

LZW DECOMPRESS:

  Read OLDCODE
  output OLDCODE
  WHILE there are still input bytes DO
    Read NEWCODE
    IF NEWCODE is not in the table THEN
      STRING = get string for OLDCODE
      STRING = STRING+CHAR       (CHAR from the previous pass)
    ELSE
      STRING = get string for NEWCODE
    END of IF
    output STRING
    CHAR = first character in STRING
    add OLDCODE+CHAR to the table
    OLDCODE = NEWCODE
  END of WHILE

A sample string (/WED/WE/WEE/WEB/WET) is used to demonstrate the algorithms:

Compression process:

Input string: /WED/WE/WEE/WEB/WET

  CHAR      Code      String
  input     output    table
  -------   -------   --------
  /
  W         /         256=/W
  E         W         257=WE
  D         E         258=ED
  /         D         259=D/
  W
  E         256       260=/WE
  /         E         261=E/
  W
  E
  E         260       262=/WEE
  /
  W         261       263=E/W
  E
  B         257       264=WEB
  /         B         265=B/
  W
  E
  T         260       266=/WET
            T

If you step through the start of the algorithm for this string, you can see
that in the first pass through the loop the system performs a check to see
if the string /W is in the table. Since it doesn't find the string, it
outputs the code for / and adds the string /W to the table. After the system
reads in the third character, E, the second string code, WE, is added to the
table, and the code for W is output. This process continues until, in the
second word, the characters /W are read in, matching string number 256. In
this case, the system outputs code 256 and adds a 3-character string, /WE,
to the table. And so it continues. As you can see, the table fills up
rapidly, and that's another problem with this technique (unless you have a
130XE, maybe...).
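To make the pseudocode above a little more concrete, here is a minimal
sketch of both routines in Python (not the 8-bit assembly you would actually
want; the names lzw_compress and lzw_decompress are just my own labels). It
follows the pseudocode directly: the table starts with codes 0-255 for
single bytes, new strings get codes from 256 upwards, and no limit is put on
the table size.

  # A minimal sketch of LZW, following the pseudocode above (Python 3).
  # Codes 0-255 stand for single bytes; new strings get codes from 256 up.
  # Purely illustrative: no 12-bit packing and no limit on the table size.

  def lzw_compress(data):
      table = {bytes([i]): i for i in range(256)}   # string -> code
      next_code = 256
      string = data[:1]                             # STRING = get input byte
      codes = []
      for byte in data[1:]:                         # WHILE there are input bytes
          char = bytes([byte])                      # CHAR = get input byte
          if string + char in table:
              string = string + char                # STRING = STRING+CHAR
          else:
              codes.append(table[string])           # output the code for STRING
              table[string + char] = next_code      # add STRING+CHAR to the table
              next_code += 1
              string = char                         # STRING = CHAR
      codes.append(table[string])                   # output the code for STRING
      return codes

  def lzw_decompress(codes):
      table = {i: bytes([i]) for i in range(256)}   # code -> string
      next_code = 256
      oldcode = codes[0]
      output = [table[oldcode]]                     # output OLDCODE
      for newcode in codes[1:]:                     # WHILE there are input codes
          if newcode not in table:                  # NEWCODE not in the table yet
              string = table[oldcode] + table[oldcode][:1]
          else:
              string = table[newcode]
          output.append(string)                     # output STRING
          char = string[:1]                         # CHAR = first character in STRING
          table[next_code] = table[oldcode] + char  # add OLDCODE+CHAR to the table
          next_code += 1
          oldcode = newcode
      return b"".join(output)

  data = b"/WED/WE/WEE/WEB/WET"
  codes = lzw_compress(data)
  print(codes)                           # the code column of the table above
  print(lzw_decompress(codes) == data)   # True

Run on the sample string, it prints the same code sequence as in the
compression table above (with the single characters shown as their byte
values), and the decompressor gets the original string back.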
Step through the decompression process below to see how the string table is
built up to be an exact duplicate of the table in the compression process.

Decompression process:

Input codes: / W E D 256 E 260 261 257 B 260 T

  NEW     OLD     STR.    CH.    STRING
  CODE    CODE    OUTP.          TABLE
  ------  ------  ------  ----   -------
  /       /       /
  W       /       W       W      256=/W
  E       W       E       E      257=WE
  D       E       D       D      258=ED
  256     D       /W      /      259=D/
  E       256     E       E      260=/WE
  260     E       /WE     /      261=E/
  261     260     E/      E      262=/WEE
  257     261     WE      W      263=E/W
  B       257     B       B      264=WEB
  260     B       /WE     /      265=B/
  T       260     T       T      266=/WET

Results of LZW compression:

As the examples show, the LZW technique works best on data streams that
contain repeated strings of any kind. Because of this, it does extremely
well when compressing text (compression levels of at least 50%) or highly
redundant database files (at best about 10% of the original size!). Other
data files, like programs, obviously vary a lot.

The technique does not do so well when compressing very large files, since
there is a fixed number of strings in the table (4096 strings for 12-bit
codes etc.), and once all the entries are filled the compression ratio
begins to degrade if the later sections of the file have different
characteristics from the section the string table was built from. One way of
solving this is to keep track of the compression ratio and, after a certain
amount of degradation, flush the table and rebuild it from scratch. Another
method is to monitor how frequently the strings are used and erase those
that are rarely used.

So the main problem with LZW compression is the management of the string
table: searching through it..., it fills up quickly..., and it does not have
a fixed size even though there is a fixed number of strings, since its size
depends on the total length of all the strings, and the individual strings
differ in length... But I'm sure some of you can get around these problems,
and I hope you do, because this technique seems to be very efficient even
though the algorithms are extremely simple. If any of you manages to make
something out of this, I'd be glad to know...

                              * WosFilm *

(examples taken from Dr. Dobb's Journal, October 1989)
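To make the "flush and rebuild from scratch" idea above a bit more concrete,
here is one possible variation of the compressor sketch from earlier (again
Python, purely illustrative). It reserves one code as a CLEAR signal, the
way the UNIX compress and GIF formats do; that choice, and the names
MAX_CODES, CLEAR_CODE, fresh_table and lzw_compress_flushing, are
assumptions of this sketch, not something described above. The trigger used
here is simply "table full", whereas monitoring the compression ratio, as
suggested above, would only change when the CLEAR code is emitted. The
matching decompressor throws its table away and starts over, exactly as at
the beginning of the stream, whenever it reads CLEAR_CODE.

  MAX_CODES  = 4096          # 12-bit codes: 0-4095
  CLEAR_CODE = 256           # reserved: "the table has been flushed"

  def fresh_table():
      # Single bytes only; string codes start at 257 since 256 is reserved.
      return {bytes([i]): i for i in range(256)}, 257

  def lzw_compress_flushing(data):
      table, next_code = fresh_table()
      string = data[:1]
      codes = []
      for byte in data[1:]:
          char = bytes([byte])
          if string + char in table:
              string = string + char
              continue
          codes.append(table[string])
          if next_code < MAX_CODES:
              table[string + char] = next_code
              next_code += 1
          else:
              # Table full: tell the decompressor to start over, then rebuild.
              codes.append(CLEAR_CODE)
              table, next_code = fresh_table()
          string = char
      codes.append(table[string])
      return codes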