Diskcomm file format version 3.2, revision 1.0

Diskcomm archives represent the contents of diskettes used with the Classic Atari computers. Various pieces of information related to the format and the contents of the diskette are stored in the archive file. To reduce storage requirements, compression algorithms are applied to the data. A large archive may have to be split into multiple files so that it can be stored on diskettes.

On an Atari disk, data is organized in sectors, which are numbered starting from 1. There are various disk sizes. The most common diskette formats are the single density diskette, which holds 720 sectors of 128 bytes, the enhanced density diskette, which holds 1040 sectors of 128 bytes, and the double density diskette, which holds 720 sectors of 256 bytes. There are various other formats, but single density and enhanced density are used most, since these are supported by the 1050 disk drive. Other formats require an XF551, an 815, or a third party disk drive, such as the Percom, Indus GT, Trak, Black Box with floppy board, MIO, and the HDI, to name just a few.

Diskcomm always uses 1040 sectors for enhanced density diskettes, so for this format the number of sectors in the archive is fixed. For single density and double density disks, the maximum number of sectors can be set to any value from 1 to 9999.

By definition, the first three sectors of any Atari disk contain only 128 bytes, since this is considered the boot area. The first three sectors of a double density disk therefore contain only 128 bytes of data. Diskcomm still stores these sectors as 256 byte sectors within the archive, and the remaining 128 bytes simply contain zeroes.

The sectors of a disk are stored in the archive sequentially. Sectors of data within a Diskcomm archive are compressed.
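
The geometry described above can be summarized in a short Python sketch. The names FORMATS, disk_bytes and archive_sector are hypothetical and are not part of Diskcomm; the numbers come from the text above. The sketch assumes that the three boot sectors hold 128 bytes on every format, and that a short sector is padded with zeroes to the full sector size when it is stored in the archive.

```python
# Hypothetical names; the sector counts and sizes are taken from the text.
BOOT_SECTORS = 3        # the boot area: always 128 byte sectors
BOOT_SECTOR_SIZE = 128

FORMATS = {
    "single": (720, 128),     # (default sector count, bytes per sector)
    "enhanced": (1040, 128),
    "double": (720, 256),
}


def disk_bytes(fmt):
    """Number of data bytes held by a full diskette of the given format."""
    count, size = FORMATS[fmt]
    boot = min(count, BOOT_SECTORS)
    return boot * BOOT_SECTOR_SIZE + (count - boot) * size


def archive_sector(fmt, data):
    """Pad sector data with zeroes to the size used inside the archive."""
    size = FORMATS[fmt][1]
    if len(data) > size:
        raise ValueError("sector data is longer than the sector size")
    return data.ljust(size, b"\x00")
```

For a double density disk, archive_sector turns a 128 byte boot sector into 256 bytes whose second half is all zeroes.
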
While creating the archive, Diskcomm examines the contents of each sector that is processed. Based on these contents, one of several compression algorithms is used to reduce the amount of storage required to represent the sector.

Sectors that contain nothing but zeroes are considered empty sectors. Empty sectors are not stored in the archive. A flag in the information stored for the preceding sector indicates this, and the next sector that contains data is preceded by its sector number. It is assumed that the diskette will be formatted before the archive is written back to it, so that initially all sectors on the output disk contain zeroes. Therefore, there is no need to store empty sectors in the archive. When the archive is written back to disk, the sector number included in the archive is used to skip these sectors.

For sectors that contain data, the contents of the sector are compared to the contents of the last preceding sector that contained data. Empty sectors have no influence on this comparison, since they are skipped. There are five different algorithms that can be applied. Each of them is applied to the sector in turn, and if the result is successful, the resulting compressed data is appended to the archive buffer, with the type of compression prepended. As noted before, if the preceding sector was empty, the sector number is prepended to all of this, in the 6502 low/high byte order.

Older versions of Diskcomm used a sixth algorithm. It has been made obsolete by one of the remaining five algorithms, so it is no longer applied when an archive is created. However, some very old archives may still contain sectors that were compressed with this algorithm.

Compression of sectors continues until memory runs out, or until there are no more sectors left to process.
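
The handling of empty sectors and sector numbers can be illustrated with a small Python sketch. The function names are hypothetical, and the sketch only decides which sectors are stored and which of them need a sector number in front. It does not compress anything, and it assumes that all sectors belong to a single pass.

```python
import struct


def is_empty(sector):
    """A sector that contains nothing but zeroes is an empty sector."""
    return not any(sector)


def encode_sector_number(number):
    """Two bytes in the 6502 low/high byte order."""
    return struct.pack("<H", number)


def decode_sector_number(data):
    """Inverse of encode_sector_number."""
    return struct.unpack("<H", data)[0]


def plan_sectors(sectors):
    """Yield (sector number, needs number prefix) for every stored sector.

    Sectors are numbered from 1. Empty sectors are skipped, and the first
    stored sector after a gap is preceded by its sector number.
    """
    last_stored = None
    for number, data in enumerate(sectors, start=1):
        if is_empty(data):
            continue
        yield number, last_stored != number - 1
        last_stored = number
```

Sector 720, for example, is stored as the two bytes hex D0 02.
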
Due to memory limitations, a maximum of just over 24K of data can be stored in the archive buffer. When appending compressed data causes the buffer to contain 24K of compressed data, the buffer is full, and it is flushed to disk. A system that has more than 64K of memory can hold multiple buffers before the data is flushed to disk. Each buffer load is considered to be a pass in the compression of the disk. A pass is an undefined number of compressed sectors, and it is considered complete when hex 5F02 (dec 24322) bytes of data or more have been accumulated. A pass can never contain more than hex 6002 bytes.

Each pass starts with the header, which consists of two bytes. The first byte is either hex FA or hex F9. When the archive is split into multiple files, this byte contains hex F9, otherwise it contains hex FA.

The second byte of the header combines three pieces of information. The format of the original disk is indicated in bit 5 and bit 6 of the second byte. Bit value 00 is used for single density disks, bit value 01 (hex 20) is used for enhanced density disks, and bit value 10 (hex 40) is used for double density disks. Bit value 11 is undefined. Bits 0 to 4 are used to indicate the pass number. Passes are numbered sequentially, starting at 1. Since there are 5 bits available for this, the highest possible pass number is 31. Therefore, the largest archive will be no larger than 31 times 24K, unless the pass count is allowed to roll over to zero. The high order bit of the second byte (bit 7) is set when this pass is the last pass.

Since compression is started before the user is asked what to do, the question of dividing the archive into smaller files is only presented if there is more than one pass. If all data can be stored in one pass, this question is not presented, and an archive with header byte hex FA is created.

The first sector within a pass is always preceded by its sector number.
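
As an illustration, the two header bytes could be taken apart as in the following Python sketch. PassHeader and parse_pass_header are hypothetical names, not something Diskcomm itself defines, and the sketch assumes that the undefined bit value 11 should be rejected.

```python
from typing import NamedTuple

# Bits 6 and 5 of the second header byte.
DENSITIES = {0: "single", 1: "enhanced", 2: "double"}


class PassHeader(NamedTuple):
    multi_file: bool    # first byte is hex F9
    last_pass: bool     # bit 7 of the second byte
    density: str        # bits 6 and 5 of the second byte
    pass_number: int    # bits 0 to 4 of the second byte


def parse_pass_header(data):
    """Decode the two byte header found at the start of every pass."""
    if len(data) < 2:
        raise ValueError("a pass header is two bytes long")
    archive_type, info = data[0], data[1]
    if archive_type not in (0xFA, 0xF9):
        raise ValueError("not a Diskcomm pass header")
    density_bits = (info >> 5) & 0x03
    if density_bits not in DENSITIES:
        raise ValueError("disk format bit value 11 is undefined")
    return PassHeader(
        multi_file=(archive_type == 0xF9),
        last_pass=bool(info & 0x80),
        density=DENSITIES[density_bits],
        pass_number=info & 0x1F,
    )
```

A single pass, single density archive would start with the bytes hex FA 81: a single file archive, last pass set, pass number 1.
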
Format description

    <Diskcomm archive>   = <pass> ...
    <pass>               = <archive type> <pass information> <sector number> <sector data> ... <end of pass>
    <pass information>   = <last pass flag> + <disk format> + <pass number>
    <sector data>        = <compression> <compressed data> [ <sector number> ]
    <compression>        = <sector number flag> + <compression type>
    <archive type>       = FA | F9
    <last pass flag>     = 00 | 80
    <disk format>        = 00 | 20 | 40
    <end of pass>        = 45
    <sector number flag> = 00 | 80
    <compression type>   = 41 | 42 | 43 | 44 | 46 | 47

Format description in plain English.

Diskcomm archive
A Diskcomm archive consists of one or more passes. When an archive is split into multiple files, each pass is stored in a separate file.

Pass
A pass consists of an archive type code, followed by pass information, followed by the starting sector number, followed by one or more sector data packets, followed by the end of pass code.

Archive type
The archive type indicates whether this is a multi file archive or not.

Sector data
A sector data packet starts with one byte that indicates the compression type for the sector. After the compression type, the compressed data for the sector follows. Its contents depend on the type of compression, and it can contain any number of bytes, from zero up to the length of the sector for the type of disk, either 128 or 256 bytes. The high order bit of the compression type is used to indicate whether or not a sector number follows the compressed data. If this bit is zero, a sector number follows the data. If this bit is one, there is no sector number following the compressed data.

Sector number
An unsigned sector number, which is two bytes. The first byte is the low order portion of the number, the second byte is the high order portion of the number. It normally ranges from 1 to 9999.

End of pass
The value hex 45.

Compression type
One of the following hex values: 41, 42, 43, 44, 46 or 47. The meaning of these values is described below.

Type 41, modify begin.
The compression is relative to the previous sector. The sector data contains only the beginning portion. The last portion is not changed. The first byte of the sector data specifies at what offset to start modifying the sector. The remaining bytes of the sector data are used to modify the beginning portion of the sector.
This modification takes place starting at the byte at the start offset, working towards the beginning of the sector, up to and including the byte at offset zero, the first byte of the sector. This implies that the data bytes are stored in reverse order in the sector data.

Type 42, 128 byte DOS sector.
This is an obsolete compression type that was used by early versions of Diskcomm. Those versions supported only single density diskettes, so this type of sector is always 128 bytes long. Programs that decode archives should be aware of this type. Using it for creating new archives is not recommended. The sector data contains five bytes. The first byte of the sector data is used to initialize the first 124 bytes of the sector. The remaining four bytes are stored in the last four bytes of the sector.

Type 43, compressed sector.
The sector data contains substrings. These substrings alternate between uncompressed and compressed, starting with an uncompressed substring. Each of these substrings starts with a byte that specifies the ending offset of the resulting data in the sector. When this ending offset is reached, the end of the substring is reached, and the ending offset becomes the starting position for the next substring. The starting position for the first substring is offset zero.

An uncompressed substring contains as many bytes as are needed to fill the sector from the start position up to, but not including, the ending offset. If the starting position of an uncompressed substring is equal to its ending offset, there is no further data, so in effect this is a null string. This is used when there are two portions of data within the sector that can be compressed, without other data in between these portions. The uncompressed substring must be present, therefore a null string must be used in this case.

Compressed substrings are always two bytes in length.
The compressed substring starts with a byte that indicates the ending offset. The second byte contains the fill character. The portion of the sector starting at the start offset, up to, but not including, the ending offset, is set to the value of this fill character. After the compressed substring, another uncompressed substring follows.

For double density disks, the ending offset for the last substring is 256. Since there is only one byte to represent the ending offset, this is stored as zero. However, zero is also an offset that can be used for the first uncompressed substring, where it indicates that the first uncompressed substring is a null string.

The end of this type of compressed sector is reached when all bytes in the sector have been processed. This can occur at the end of an uncompressed substring. In this case, there will not be a compressed substring following the uncompressed substring. Likewise, if it occurs at the end of a compressed substring, there will not be an uncompressed substring following it.

Type 44, modify end.
The compression is relative to the previous sector. The sector data contains only the ending portion. The beginning portion is not changed. The first byte of the sector data specifies at what offset to start modifying the sector. The remaining bytes of the sector data are used to modify the ending portion of the sector. This modification takes place starting at the byte at the start offset, up to and including the last byte of the sector.

Type 45, end of pass.
This compression type indicates the end of a pass, so it is not a real compression type. There is no sector data for this type. For a multi file archive, this indicates the end of the file. The archive is continued in the next file, unless this pass was the last pass. For single file archives, this indicates that the next pass follows within this file, unless this was the last pass. The next pass starts with a header again, followed by a sector number.

Type 46, same as before.
This compression type indicates that the data for this sector is identical to the data of the previous non-zero sector. There is no sector data for this type.

Type 47, uncompressed sector.
The sector data contains the number of bytes required to fill an entire sector, either 128 or 256 bytes. No compression of any kind is performed on this sector type.

Previous sector.
The buffer that holds the contents of the previous non-zero sector is initialized at the start of a pass if the archive is a multi file archive. For single file archives, this buffer is cleared at the start of the first pass only.

Known bugs and anomalies.
It looks like Diskcomm has some slight problems. Double density sectors are 256 bytes long. If the buffer contains hex 5EFF bytes, and the sector cannot be compressed, and a sector number must be included, we must add 259 bytes to the buffer. To mark the end of pass, we have to add either one hex 45 byte, or hex 45 00 45. This might add up to three extra bytes. The buffer starts at hex 2F00, and if we would add these 259 bytes to the hex 5EFF bytes, we would write up into the Diskcomm code which starts at hex 9000. This area happens to hold the maximum sector number as input by the user. This makes the pass longer than hex 6002 bytes.

On reading, this is also a problem. Diskcomm will not store the first two bytes. The header is processed first. Then it tries to read hex 6000 bytes. Within these hex 6000 bytes, the end of pass compression type must be included. This will be missing though, so Diskcomm will not be able to process the file. This problem only occurs with double density disks in the specified exceptional conditions.
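
To tie the compression types together, here is a minimal Python sketch of a decoder for the sector data of a single packet, written from the descriptions above. It is not Diskcomm's own code. The name decode_sector and its parameters are hypothetical. The sketch assumes that the caller has already read the compression type and masked off the high order bit, that previous holds the last non-zero sector at the full sector size, and that an ending offset of zero means 256 everywhere except in the first uncompressed substring of a type 43 sector.

```python
import io


def _take(stream, count):
    """Read exactly count bytes from the stream or fail."""
    data = stream.read(count)
    if len(data) != count:
        raise ValueError("unexpected end of sector data")
    return data


def _end_offset(stream, start, size, zero_is_null):
    """Read a one byte ending offset; zero means 256 unless zero_is_null."""
    end = _take(stream, 1)[0]
    if end == 0 and not zero_is_null:
        end = 256
    if not start <= end <= size:
        raise ValueError("ending offset out of range")
    return end


def decode_sector(ctype, stream, previous, size):
    """Decode the data of one sector packet and return the sector contents."""
    if len(previous) != size:
        raise ValueError("previous sector must have the full sector size")
    sector = bytearray(previous)
    if ctype == 0x41:                      # modify begin, data is reversed
        offset = _take(stream, 1)[0]
        if offset >= size:
            raise ValueError("start offset out of range")
        sector[:offset + 1] = _take(stream, offset + 1)[::-1]
    elif ctype == 0x42:                    # obsolete 128 byte DOS sector
        data = _take(stream, 5)
        sector[:124] = data[:1] * 124
        sector[124:128] = data[1:]
    elif ctype == 0x43:                    # alternating substrings
        pos = 0
        while pos < size:
            # Uncompressed substring, possibly a null string.
            end = _end_offset(stream, pos, size, zero_is_null=(pos == 0))
            sector[pos:end] = _take(stream, end - pos)
            pos = end
            if pos == size:
                break
            # Compressed substring: ending offset, then the fill character.
            end = _end_offset(stream, pos, size, zero_is_null=False)
            sector[pos:end] = _take(stream, 1) * (end - pos)
            pos = end
    elif ctype == 0x44:                    # modify end
        offset = _take(stream, 1)[0]
        if offset >= size:
            raise ValueError("start offset out of range")
        sector[offset:] = _take(stream, size - offset)
    elif ctype == 0x46:                    # same as the previous sector
        pass
    elif ctype == 0x47:                    # uncompressed sector
        sector[:] = _take(stream, size)
    else:
        raise ValueError("unknown compression type")
    return bytes(sector)
```

With this sketch, a type 43 packet whose sector data is hex 02 41 42 80 20 decodes to the two bytes hex 41 42 followed by 126 bytes of hex 20 on a 128 byte sector.
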