Gordon Cameron
presents a tutorial on bubble sorting with a unique program that
allows you to sort lists within word processing files
There are
many occasions when it would be useful to be able to sort data into
either numerical or alphabetical order. Some examples are Telephone
Directories, Book/Record lists, software lists etc. Many such files
are held on disk and these are of particular interest as they can be
easily manipulated. This article will deal with sorting a file into
either alphabetical or reverse alphabetical order.
There are two main ways of performing sorts of this kind on disk
files (which are usually text files). One is to do operations on the
file itself, reading it in a couple of lines at a time say,
performing an operation, writing the results to another file, and
repeating the procedure over and over but this involves a lot of
disk operations. One alternative would be to use a RAMDISK, where
the text files could be stored. This increases the speed
considerably and, obviously, eliminates disk operation but, despite
this, the method is cumbersome and clumsy involving multiple opening
and closing of files and only allowing sequential indexing of data.
SORT IN MEMORY FOR SPEED
The method I will use involves copying the source file into memory,
in this case into an array, and sorting the data in the array.
Finally, the resultant sorted array can be saved to an object file.
Before going on I had better explain some terms before I lose
anyone. The source file is the original unsorted file which you wish
to sort and the object file will be the file AFTER it has been
sorted. A text file is a file which consists of characters usually
produced by a word processor or text editor etc. and which can be
loaded using either the INPUT or GET commands, depending on the
format.
The source file need not be a word-processor file. If you are, for
example, writing a software list program, you may save your file
directly to disk using either PUT or PRINT. These files can just as
easily be sorted using the methods I describe below.
SORTING WORD PROCESSED TEXT
Firstly, we must read the file into the array. The program required
is shown in Listing 1. Line 10 dimensions the array FILE$ to hold
10000 characters. This may not be enough or it may be too much for
your needs so alter it as you like. 10000 is, however, a good round
number to be getting on with. The source text file may have been
saved with either PRINT or PUT. The program deals with this, and
reads the data into the string FILE$. This may take a while. The
program is fairly unique in that it can sort just part of a text
file so that lists forming part of a document can be sorted. This
facility is often missing from many commercial word processors.
It is important that you mark the beginning and end of the data that
you wish to be sorted. Mark the beginning with a `$' symbol followed
by RETURN on a separate line. Mark the end of the data with the 'up
arrow' symbol, again on a line by itself. The data in between these
two markers will be the data that is manipulated and sorted.
Anything before is stored in GARBST$, and anything after in GARBEND$.
This facility is provided as many word processors (e.g. Superscript,
Mini Office II) add their own control characters to the beginning
and/or end of your actual text. Using this method we retain these
special codes but they are obviously not to be sorted. They are
rewritten to the object file at the end of the session, so that the
object file will load back in to the word processor.
This
feature is very valuable as it means you can use a word processor to
enter data, the program here to sort the data, and the word
processor thereafter to inspect and adjust again if necessary. The
above feature is also useful if you only want to sort PART of a
file. Simply position the markers round the part you want sorted,
and the program will handle the rest. Again, you can alter the sizes
of GARBST$ and GARBEND$ at will.
The text to be sorted is stored sequentially in the array, with each
screen line of the file occupying 40 spaces in the array. Note this
limitation which means that data longer than 40 characters cannot be
sorted by this program. Once we have the data from the file in
memory we can access any line at will but how do we sort this data ?
DIFFERENT TYPES OF SORT
There are several sorting algorithms but the most common are the
Bubble sort, Shell sort and the Quick sort. For this article I will
look at the easiest of the three, the Bubble sort.
The best way to explain the sort is by way of a simple example. Only
6 'lines' of data will be used but the method can obviously be
extended to much larger files. Listing 2 will sort your data. This
program should be used in conjunction with the loader given in
Listing 1.
The data in our example is a list of software titles as follows:
Line 1 — Kennedy Approach
Line 2
— Pawn
Line 3 — Star Raiders
Line 4 — Phantom
Line 5 — Summer Games
Line 6
— Decathlon
The bubble
sort works through the list checking if the first line of data is
greater than (in this case, 'higher up' in the alphabet) the next
line. If it is, it swaps them otherwise it goes on to the next item.
If a reverse alphabetical sort is required then we simply check if
the first item is LESS than the second and again swap them if so.
This swapping continues down the 'list' of lines and when the end is
reached the last line will be correct (i.e. the last line will
contain the 'greatest' or highest in the alphabet text sequence).
The procedure is then repeated with each line being checked
excluding the last line as it is correct. This is repeated
continually until all the values are correctly placed. Note that the
first line is contained in FILE$(1, 40), the second in FILE$(41,
80), the third in FILE$(81, 120) and so on so each line in the list
must consist of a maximum of 40 characters only.
The Atari can work out with its inbuilt functions whether, for
example, 'Phantom' > 'Summer Games' or not, so you don't have to
bother with this. Your Atari does this part for you. Using the above
algorithm the data transforms in the following way:
Pass 1.1
Is Kennedy Approach (line 1) > Pawn (line 2)? No, so don't swap
Pass
1.2
Is Pawn (line 2) > Star Raiders (line 3)? No, so continue
Pass 1.3
Is Star Raiders (line 3) > Phantom (line 4)?
YES! So swap. Phantom which was line 4 now becomes line 3 and Star
Raiders becomes line 4
This continues with Star Raiders being compared with Summer Games —
no swap occurs. Summer Games is greater than Decathlon so a swap
occurs and the list after the first pass is as follows:
BEFORE
|
|
AFTER |
Line 1
|
Kennedy Approach |
Line 1 |
Line 2 |
Pawn
|
Line 2 |
Line 4 |
Phantom
|
Line 3 |
Line 3 |
Star Raiders |
Line 4 |
Line 6
|
Decathlon |
Line 5 |
Line 5 |
Summer Games |
Line 6 |
Note that Summer Games has taken its proper position at the bottom
of the list, as it is the 'highest in the alphabet' so in the next
pass, we don't need to check this line with the previous one as we
KNOW that it is correct. Whereas before we had to make 5 comparisons
(1 with 2, 2 with 3, 3 with 4, 4 with 5 and 5 with 6), next time we
need only make 4 comparisons as we don't need to compare with the
last line.
In the next pass, only one swap occurs, that of Decathlon with Star
Raiders. In pass 3 there is again only one swap — Decathlon with
Phantom and in pass 4 only one swap again — Decathlon with Pawn. In
pass 5, Decathlon is swapped with Kennedy Approach and this yields
the final sorted sequence:
AT START
|
|
END |
Line 6 |
Decathlon |
Line 1 |
Line 1 |
Kennedy Approach |
Line 2 |
Line 2 |
Pawn
|
Line 3 |
Line 4 |
Phantom
|
Line 4 |
Line 3 |
Star Raiders |
Line 5 |
Line 5 |
Summer Games |
Line 6 |
And that's
the bubble sort for you!
SIMPLE BUT EFFECTIVE
The Bubble sort is very simple, but it can be very effective,
however there are drawbacks. Notice that, although Summer Games
reached its location almost at once, Decathlon took 5 swaps and all
5 Passes and therefore reached its destination very slowly. This
demonstrates a possible disadvantage of the Bubble Sort — although
only one item was out of place, it took many swaps to get it to its
destination. Imagine the list being hundreds of lines long with,
say, 'AAA' right at the bottom and all other lines in sorted order.
What a waste! If there were 500 lines, it would take 499 passes and
swaps just to get this line to its correct position!
Although an extreme example, this demonstrates the shortcomings of
the bubble sort algorithm but there are alternatives. In a later
article I will show you the Shell sort. Bye for now!
Listing
1
|
|
|
Listing 2
|
|
|