Project #5

The use of hash tables is a standard method of storing records so that requested records can be retrieved very quickly. Hash tables rely upon the use of a hashing function to work properly. Such a function is computed based on one or more of the values contained within the record. The purpose of this program is to evaluate the relative benefits of using two different hashing functions for the records in a particular file.

To help you with this program, two procedures have been written for your use.

1) LOAD is a Pascal procedure that reads all records from a file (PROJECT:5DATA.DAT) into an array. Each record consists of a 24-character name (encrypted for security), a longword key value, a 5-character date, and a word (2-byte) statistic value. To call LOAD, you must provide two arguments: the address of the array to hold the records and the address of a longword that holds the number of records in the array. In declaring storage for the array of records, you should allow for up to 1000 records.

2) EVAL is a FORTRAN subroutine that performs a frequency count on an array of hash values. To call EVAL, you must provide three arguments: the address of an array of hash values (each one byte long), the address of a longword holding the number of hash values, and the address of an array of frequency counts (each entry one byte long). The array of hash values may have up to 1000 entries; the array of frequency counts has 67 entries.

Your main assembly language program should be sequenced as follows.

1. Call LOAD to read in the entire collection of records and determine how many records there are (let's call this COUNT).

2. Use an assembly language procedure named MID (that you must write) to compute an array of hash values. While computing these hash values, display the the first ten key values and their associated hash values; the display should have appropriate headings.

3. Call EVAL to perform a frequency count on the hash values produced by MID.

4. Display a list of bad hash values - these are values that have a frequency count that is less than the lower bound, (COUNT / 134), or greater than the upper bound ((COUNT * 3) / 134). The display should have appropriate headings and should show the hash value and its frequency.

5. Use an assembly language procedure named SUM (that you must write) to compute a second array of hash values. While computing the hash values, display the first ten names and their associated hash values; the display should have appropriate headings.

6. Call EVAL to perform a frequency count on the hash values produced by SUM.

7. Display a list of the bad hash values in this second array; the criteria for "bad" is the same; and the display form is the same as for the previous display.

The two assembly language procedures that you must write are each designed to perform a hashing function. All hashing functions are designed to be applied to a field of one record and produce one hash value. MID and SUM simply use different portions of the record and compute in different ways. You will need to set up loops to call MID and SUM so that they may be applied to each record in the array of records.

performs what is known as a mid-square hash. It takes two arguments: the address of a longword key (in a record) and the address of a 1-byte hash value (in a hash value array). MID should compute the hash value in the following manner. First, calculate the square of the key, ignoring any overflow. Then, taking only the two middle bytes of the square (which is a longword), compute the absolute value (this will be a word value). Finally, find the remainder when the absolute value is divided by 67. This remainder will fit into one byte; it is the hash value.

performs what is known as lexical summation. It takes two arguments: the address of a 24-byte name (in a record) and the address of a 1-byte hash value (in a hash value array). SUM should compute the hash value in the following manner. Treat the name as an array of 24 one-byte numbers. Sum the 24 name bytes, ignoring any overflow. Compute the absolute value of this sum (this will be a byte value). Finally, find the remainder when the absolute value is divided by 67. This remainder is the hash value.

In terms of comparing the MID and SUM hashing functions, whichever one has the fewest bad has values is the better hashing function. You can find EVAL in the files EVAL.FOR and EVAL.OBJ in the PROJECT: directory. You can find LOAD in the files LOAD.PAS and LOAD.OBJ in the PROJECT: directory, as well.

Hand in a .LIS listing of the main program, a .LIS listing of MID and SUM (I recommend that you put both procedures in one file), and a CARBONCOPY printout that captures the execution of the program.

Program #5 (Due 17 November 1994)