From: Ernest R. Schreurs
Subject: DCM / Diskcomm file format specifications revision 1.0
Date: Thursday, February 05, 1998 5:50 PM

Here are the specifications for the Diskcomm file format. Sorry that this post is rather long, but over time, lots of people have asked for it, so I hope it will be appreciated anyway. Please send me some feedback if you think this text is unclear. Keep those XL's/XE's humming.

Diskcomm file format version 3.2, revision 1.0, February 1998.

Diskcomm archives represent the contents of diskettes used for the Classic Atari computers. Various pieces of information related to the format and the contents of the diskette are stored in the archive file. To reduce the storage space requirements, compression algorithms are applied to the data. For some large archive files, it may be necessary to split the archive into multiple files, in order to be able to store the archive on diskettes.

On an Atari disk, data is organized in sectors. These sectors are numbered starting from 1. There are various disk sizes. The most common ones are the standard diskettes. Common diskette formats are the single density diskette, which holds 720 sectors of 128 bytes, the enhanced density diskette, which holds 1040 sectors of 128 bytes, and the double density diskette, which holds 720 sectors of 256 bytes. There are various other formats, but the single density and the enhanced density are used most, since these are supported by the 1050 disk drive. Other formats require an XF551, 815, or some third party disk drive, like the Percom, Indus GT, Trak, Black Box with floppy board, MIO, and the HDI, to name just a few.

Diskcomm will always use 1040 sectors for enhanced density diskettes, so for this type of format, the number of sectors processed is defined by the format. For single density and double density disks, the maximum number of sectors can be set to any value from 1 to 9999.
By definition, the first three sectors of any Atari disk contain 128 bytes, since this is considered the boot area. So the first three sectors of double density disks will also contain only 128 bytes of data. Diskcomm still stores these sectors as sectors of 256 bytes within the archive, and the remaining 128 bytes will simply contain zeroes.

The sectors of a disk are stored in the archive sequentially. Sectors of data within a Diskcomm archive are compressed. While creating the archive, Diskcomm examines the contents of each sector that is processed. Based on these contents, one of several compression algorithms is used to reduce the amount of storage required for representing the contents of this sector.

Sectors that contain nothing but zeroes are considered empty sectors. Empty sectors are not stored in the archive. Instead, a flag is set in the information stored for the preceding non-empty sector, and that preceding sector is followed by the sector number of the next sector that contains data. It is assumed that the diskette will be formatted before writing the archive back to a diskette, and thus that initially all sectors on the output disk will contain zeroes. Therefore, there is no need to store empty sectors in the archive. When the archive is written back to disk, the sector number included in the archive is used to skip these empty sectors.

For sectors that contain data, the contents of the sector are compared to the contents of the last preceding sector containing data. Empty sectors have no influence on this comparison, since they are skipped. As noted before, there is a flag that indicates that a sector number follows the sector data. If this flag indicates that a sector number follows the sector data, the number of the current sector is appended to the archive buffer, in the 6502 low/high byte format. Then an attempt is made to compress the current sector.
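The empty sector test and the low/high byte sector number described above can be illustrated with a small Python sketch. The function names are made up for this example and are not part of Diskcomm; the only assumption is that a sector is available as a bytes object.

```python
import struct


def is_empty_sector(sector: bytes) -> bool:
    """A sector that contains nothing but zero bytes counts as empty."""
    return not any(sector)


def encode_sector_number(number: int) -> bytes:
    """Store a sector number as two bytes, low byte first (6502 order)."""
    return struct.pack("<H", number)


def decode_sector_number(raw: bytes) -> int:
    """Read a two byte sector number, low byte first."""
    return struct.unpack("<H", raw[:2])[0]
```

For example, sector 720 (hex 02D0) is written as the bytes D0 02.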
There are four different compression algorithms that can be applied. Each of them is applied to the sector in turn, and if the result is successful, the resulting compressed data is appended to the archive buffer, with the type of compression prepended. Older versions of Diskcomm used a fifth algorithm. This is now obsoleted by one of the remaining four algorithms, so this old algorithm is no longer applied when an archive is being created. However, some very old archives may still contain a sector that was compressed by this algorithm. If the sector data cannot be compressed by any of the four algorithms, the data is stored uncompressed.

Compression of sectors continues until memory runs out, or until there are no more sectors left to process. Due to memory limitations, there is a maximum of just over 24K of data that can be stored in the archive buffer. When appending the compressed data to the buffer causes the buffer to contain about 24K of compressed data, the buffer is full, and it is flushed to disk. A system that has more than 64K of memory can hold multiple buffers before the data is actually flushed to disk. Each buffer load is considered to be a pass in the compression of the disk. A pass is an undefined number of compressed sectors, and is considered complete when hex 5F02 (decimal 24322) bytes of data or more have been accumulated. The end of pass information is then appended to the pass. A pass must be no larger than hex 6002 bytes.

Each pass starts with the header, which consists of two bytes. The first byte is either hex FA or hex F9. When the archive is split up into multiple files, this byte will contain hex F9, otherwise it will contain hex FA. The second byte of the header combines three pieces of information. The format of the original disk is indicated in bit 5 and bit 6 of the second byte. Bit value 00 is used for single density disks, bit value 01 is used for enhanced density disks, and bit value 10 is used for double density disks.
Bit value 11 is undefined. Bits 0 to 4 are used to indicate the pass number. Each pass is numbered sequentially, starting at 1. Since there are 5 bits available for this, the highest possible pass number is 31. Therefore, the largest archive will be no larger than 31 times 24K, unless the pass count is allowed to roll over to zero. The high order bit of the second byte (bit 7) is set when this pass is the last pass.

Since compression is started before asking what the user wants to do, the question of dividing the archive into smaller files is only presented to the user if there is more than one pass. If all data can be stored in one pass, this question is not presented, and an archive with header type hex FA is created. The first sector within a pass will always be preceded by its sector number.

Format description, values are in hex:

  <Diskcomm archive>  = {pass}
  <pass>              = <archive type> <pass information> <sector number>
                        {sector data} <end of pass>
  <pass information>  = <last pass flag> + <disk format> + <pass number>
  <sector data>       = <content type> [compressed data] [sector number]
  <content type>      = <sequential flag> + <compression type>
  <archive type>      = F9 | FA
  <last pass flag>    = 00 | 80
  <disk format>       = 00 | 20 | 40
  <end of pass>       = 45
  <sequential flag>   = 00 | 80
  <compression type>  = 41 | 42 | 43 | 44 | 46 | 47
  <sector number>     = 0001 - 270F
  <pass number>       = 01 - 1F
  <compressed data>   = Sector contents, see below.

Format description in plain English.

Diskcomm archive: A Diskcomm archive consists of one or more passes. When an archive is split into multiple files, each pass is stored in a separate file.

Pass: A pass consists of an archive type code, followed by pass information, followed by the starting sector number, followed by one or more sector data packets, followed by the end of pass code.

Archive type: The archive type indicates whether this is a multi file archive (F9) or not (FA).

Sector data: A sector data packet consists of one byte that indicates the content type for the sector data packet. After the content type, the compressed data for the sector follows. The contents of this data depend on the type of compression, and it can contain any number of bytes, from zero up to the length of the sector for the type of disk, either 128 or 256 bytes.
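As an illustration of the two header bytes that start each pass, here is a minimal Python sketch that splits them into their parts. The names PassHeader, DISK_FORMATS and parse_pass_header are invented for this example, and rejecting the undefined bit value 11 is a choice made here, not something the format prescribes.

```python
from typing import NamedTuple

# Bits 5 and 6 of the second header byte.
DISK_FORMATS = {0b00: "single", 0b01: "enhanced", 0b10: "double"}


class PassHeader(NamedTuple):
    multi_file: bool   # first byte is F9 (True) or FA (False)
    last_pass: bool    # bit 7 of the second byte
    disk_format: str   # bits 5 and 6 of the second byte
    pass_number: int   # bits 0 to 4 of the second byte


def parse_pass_header(header: bytes) -> PassHeader:
    """Decode the two byte header found at the start of every pass."""
    archive_type, info = header[0], header[1]
    if archive_type not in (0xF9, 0xFA):
        raise ValueError("not a Diskcomm pass header")
    format_bits = (info >> 5) & 0b11
    if format_bits not in DISK_FORMATS:
        raise ValueError("disk format bit value 11 is undefined")
    return PassHeader(
        multi_file=(archive_type == 0xF9),
        last_pass=bool(info & 0x80),
        disk_format=DISK_FORMATS[format_bits],
        pass_number=info & 0x1F,
    )
```

With this sketch, the header bytes FA C1 decode to a single file archive, last pass, double density, pass number 1.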
The high order bit of the content type is used to indicate whether or not a sector number will follow the compressed data. If this bit is zero, a sector number will follow the data. If this bit is one, there will not be a sector number following the compressed data.

Sequential flag: This flag indicates whether or not the sector packet contains a sector number. If this flag has the value 00, a sector number will follow the sector data. If it has the value 80, there will not be a sector number following the sector data, and the next sector is the next sequential sector.

Content type: The high order bit of this byte is the sequential flag. The remaining low order bits are the compression type.

Sector number: An unsigned sector number, which is two bytes. The first byte is the low order portion of the number, the second byte is the high order portion of the number. It normally ranges from 1 to 9999 decimal.

Pass number: A sequence number assigned to each pass. It normally ranges from 1 to 31 decimal. This might roll over to zero after 31.

End of pass: The value hex 45.

Compression type: One of the following hex values: 41, 42, 43, 44, 46 or 47. The meaning of these values is described below.

Type 41, modify begin.

The compression is relative to the previous sector. The sector data contains only the beginning portion. The last portion is not changed. The first byte of the sector data specifies at what offset to start modifying the sector. The remaining bytes of the sector data are used to modify the beginning portion of the sector. This modification takes place starting at the byte at the start offset, working towards the beginning of the sector, up to and including the byte at offset zero, the first byte of the sector. This implies that the data bytes are stored in reverse order in the sector data.

Type 42, 128 byte DOS sector.

This is an obsolete compression type that was used by early versions of Diskcomm.
Earlier versions of Diskcomm supported only single density diskettes, so this type of sector always represents 128 bytes. Programs that decode archives should be aware of this. Using it for creating new archives is not recommended. The sector data contains five bytes. The first byte of the sector data is used to initialize the first 124 bytes of the sector. The remaining four bytes are stored in the last four bytes of the sector.

Type 43, compressed sector.

The sector data contains substrings. These substrings alternate between uncompressed and compressed, starting with an uncompressed substring. Each of these substrings starts with a byte that specifies the ending offset of the resulting data in the sector. When this ending offset position is reached, the end of the substring is reached, and the byte at this ending offset is the starting position for the next substring. The starting position for the first substring is at offset zero.

An uncompressed substring will contain as many bytes as are needed to fill the sector from the start position up to, but not including the end offset. For uncompressed substrings, if the starting position offset is equal to the ending offset, there is no further data, so in effect, this is a null string. This is used when there are two portions of data within the sector that can be compressed, without other data in between these portions. The uncompressed substring must be present, therefore a null string must be used in this case.

Compressed substrings are always two bytes in length. The compressed substring starts with a byte that indicates the ending offset. The second byte contains the fill character. The portion of the sector starting at the start offset, up to, but not including the ending offset, is set to the value of this fill character. After the compressed substring, another uncompressed substring follows.

For double density disks, the ending offset for the last substring is 256.
Since there is only one byte to represent the ending offset, this is stored as zero. However, zero is an offset that can be used for the first uncompressed string, to indicate that the first uncompressed string is a null string.

The end of this type of compressed sector is reached when all bytes in the sector have been processed. This can occur at the end of an uncompressed substring. In this case, there will not be a compressed substring following the uncompressed string. Likewise, if it occurs at the end of a compressed substring, there will not be an uncompressed string following it.

Type 44, modify end.

The compression is relative to the previous sector. The sector data contains only the ending portion. The beginning portion is not changed. The first byte of the sector data specifies at what offset to start modifying the sector. The remaining bytes of the sector data are used to modify the ending portion of the sector. This modification takes place starting at the byte at the start offset, up to, and including the last byte of the sector.

Type 45, end of pass.

This compression type indicates the end of a pass, so it is not a real compression type. There is no sector data for this type. For a multi file archive, this indicates the end of the file. The archive is continued in the next file, unless this pass was the last pass. For single file archives, this indicates that the next pass follows within this file, unless this was the last pass. The next pass starts with a header again, followed by a sector number.

Type 46, same as before.

This compression type indicates that the data for this sector is identical to the data of the previous non-zero sector. There is no sector data for this type.

Type 47, uncompressed sector.

The sector data contains the number of bytes required to fill an entire sector, either 128 or 256 bytes. No compression of any kind is performed on this sector type.

Previous sector.
The buffer that holds the contents of the previous non-zero sector is initialized at the start of a pass if the archive is a multi file archive. For single file archives, this buffer is cleared at the start of the first pass only.

Known bugs and anomalies.

This specification gives rise to some anomalies. When the last sector in a pass has the flag set that indicates that a sector number must follow it, this sector number has no meaning, since the next pass will always start out with a sector number. Diskcomm might not have the next sector available. Therefore, it cannot always determine whether or not it is an empty sector. Since a sector number must be included once we set this flag, a fake sector number is appended. The value hex 0045 is used for this. This is also true for the last pass. Note that this is stored in the low/high byte order.

Diskcomm processes sectors in chunks of 18 sectors, or chunks of 9 sectors if the disk is double density. These chunks might include empty sectors. The last sector in these chunks will always be followed by a sector number, since Diskcomm does not read ahead to determine the contents of the next sector. This is not a requirement. On creating an archive, Diskcomm just happens to do this. So a sector number might be included even if the next sector is non-zero.

It looks like Diskcomm has some slight problems. Double density sectors are 256 bytes long. If the buffer contains hex 5EFF bytes, and the sector cannot be compressed, and a sector number must be included, we must add 259 bytes to the buffer. To mark the end of pass, we have to add either one hex 45 byte, or hex 45 00 45. This might add up to three extra bytes. The pass would be hex 6003 bytes long. This makes the pass longer than hex 6002 bytes. On reading, this is also a problem. Diskcomm will not store the first two bytes, since the two header bytes are read and processed first. Then it tries to read hex 6000 bytes.
Within these hex 6000 bytes, the end of pass code must be included. This will be missing though, so Diskcomm will not be able to process the file. This problem only occurs with double density disks in the specified exceptional conditions.

When a pass contains exactly hex 6002 bytes, Diskcomm will terminate processing after this pass. Therefore, passes should be less than hex 6002 in length. This can only occur with archives of double density disks.

For unknown reasons, the passes above pass number 31 have their pass number reduced by one. Only the five low order bits are stored.

For multi-file archives, a selected character of the filename is incremented for each pass. This will eventually cause an invalid character to be used in the filename, depending on the restrictions imposed by the DOS used.

Send comments to:

Ernest R. Schreurs
Kempenlandstraat 8
5211 VN Den Bosch
The Netherlands

ernest@wxs.nl
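As an illustration of compression types 41, 43 and 44 as described above, here is a minimal Python sketch of how a reader might expand them. It is not taken from Diskcomm itself. All function names are invented for this example, each function returns the rebuilt sector plus the number of data bytes it used, and the handling of a zero ending offset in type 43 (null string only for the very first substring, otherwise 256) is one reading of the text.

```python
from typing import Tuple


def decode_modify_begin(data: bytes, previous: bytes) -> Tuple[bytes, int]:
    """Type 41: overwrite offsets start..0 of the previous sector (reversed)."""
    start = data[0]
    payload = data[1:start + 2]            # start + 1 bytes are expected
    if start >= len(previous) or len(payload) != start + 1:
        raise ValueError("bad or truncated type 41 data")
    sector = bytearray(previous)
    for i, value in enumerate(payload):    # first data byte lands at 'start'
        sector[start - i] = value
    return bytes(sector), 1 + len(payload)


def decode_modify_end(data: bytes, previous: bytes) -> Tuple[bytes, int]:
    """Type 44: overwrite offsets start..last of the previous sector."""
    start = data[0]
    count = len(previous) - start
    payload = data[1:1 + count]
    if count <= 0 or len(payload) != count:
        raise ValueError("bad or truncated type 44 data")
    sector = bytearray(previous)
    sector[start:] = payload
    return bytes(sector), 1 + count


def decode_compressed(data: bytes, sector_size: int) -> Tuple[bytes, int]:
    """Type 43: alternating uncompressed and fill substrings."""
    sector = bytearray(sector_size)
    src = 0                 # read position in the sector data
    pos = 0                 # write position in the sector
    compressed = False      # the first substring is always uncompressed
    first = True
    while pos < sector_size:
        end = data[src]
        src += 1
        if end == 0 and not first:
            end = 256       # assumption: zero stands for 256 after the start
        first = False
        if end < pos or end > sector_size:
            raise ValueError("bad ending offset in type 43 data")
        if compressed:
            sector[pos:end] = bytes([data[src]]) * (end - pos)
            src += 1
        else:
            chunk = data[src:src + end - pos]
            if len(chunk) != end - pos:
                raise ValueError("truncated type 43 data")
            sector[pos:end] = chunk
            src += len(chunk)
        pos = end
        compressed = not compressed
    return bytes(sector), src
```

For example, the type 43 data 02 41 42 7E 00 80 43 44 expands to a 128 byte sector holding "AB", 124 zero bytes, and "CD".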