The use of hash tables is a standard method of storing records so that requested records can be retrieved very quickly. Hash tables rely upon the use of a hashing function to work properly. Such a function is computed based on one or more of the values contained within the record. The purpose of this program is to evaluate the relative benefits of using two different hashing functions for the records in a particular file.
To help you with this program, two procedures have been written for your use.
1) LOAD is a Pascal procedure that reads all records from a file (PROJECT:5DATA.DAT) into an array. Each record consists of a 24-character
name (encrypted for security), a longword key value, a 5-character date, and a word (2-byte) statistic value. To call LOAD,
you must provide two arguments: the address of the array to hold the records and the address of a longword that holds the
number of records in the array. In declaring storage for the array of records, you should allow for up to 1000 records.
2) EVAL is a FORTRAN subroutine that performs a frequency count on an array of hash values. To call EVAL, you must
provide three arguments: the address of an array of hash values (each one byte long), the address of a longword holding
the number of hash values, and the address of an array of frequency counts (each entry one byte long). The array of hash
values may have up to 1000 entries; the array of frequency counts has 67 entries.
Your main assembly language program should be sequenced as follows.
1. Call LOAD to read in the entire collection of records and determine how many records there are (let's call this COUNT).
2. Use an assembly language procedure named MID (that you must write) to compute an array of hash values. While computing these
hash values, display the the first ten key values and their associated hash values; the display should have appropriate headings.
3. Call EVAL to perform a frequency count on the hash values produced by MID.
4. Display a list of bad hash values - these are values that have a frequency count that is less than the lower bound, (COUNT / 134),
or greater than the upper bound ((COUNT * 3) / 134). The display should have appropriate headings and should show the hash value and its frequency.
5. Use an assembly language procedure named SUM (that you must write) to compute a second array of hash values. While computing
the hash values, display the first ten names and their associated hash values; the display should have appropriate headings.
6. Call EVAL to perform a frequency count on the hash values produced by SUM.
7. Display a list of the bad hash values in this second array; the criteria for "bad" is the same; and the display form is
the same as for the previous display.
The two assembly language procedures that you must write are each designed to perform a hashing function. All hashing functions are designed to be applied to a field of one record and produce one hash value. MID and SUM simply use different portions of the record and compute in different ways. You will need to set up loops to call MID and SUM so that they may be applied to each record in the array of records.
MID
In terms of comparing the MID and SUM hashing functions, whichever one has the fewest bad has values is the better hashing function. You can find EVAL in the files EVAL.FOR and EVAL.OBJ in the PROJECT: directory. You can find LOAD in the files LOAD.PAS and LOAD.OBJ in the PROJECT: directory, as well.
Hand in a .LIS listing of the main program, a .LIS listing of MID and SUM (I recommend that you
put both procedures in one file), and a CARBONCOPY printout that captures the execution of the program.