Extra Credit Project
(Due 11 December 2006
- NO Late submissions will be accepted)
We have twice dealt with the search for multiple credit card transactions in the 15-minute collection of transactions in card-events.txt. We need to consider one (or two) other alternative(s) to finding the multiple transactions quickly. The first part of this project is to use a HashMap to try to find transactions with the same credit card number. If the card number is used as the key for storing transactions, then when an attempt to store a second transaction with the same key is made, a non-null value is returned by the put method. In this way, multiple transactions can be recognized.
A HashMap does not sort the data. Instead, it stores the data in such a way that subsequent retrieval is very fast (O(1) on average). Thus, the HashMap holds the promise of being even faster than either the radix sort or the treesort. Make a program that finds the multiple transactions based on using a HashMap and time the action in the same manner* as for project #6. Note: when instantiating the HashMap, make sure you specify enough table entries and a load threshold so that the rehash() method is NOT needed.
Hand in a small table showing the times to find and report the multiple transactions for all three approaches: radix sort, tree sort, and hashmap. The times should show the processing of the data in the file card-events.txt and the data in the file 30minutes.txt which contains a 30-minute collection of transactions. Note whether or not you have included file reading times in the table values. I would expect the table to be typed by you based on the timed executions of the three programs. Also, hand in a well documented program which performs the HashMap approach - copy this program to a folder named extra in your folder on the P: drive for COSC 310. You can obtain the two data files from I:\jlwolfe\310
*If you can separate the file reading time from the time needed to enter transactions into the HashMap and note the repeats, you will have a more accurate comparison for the three approaches.
Below is a table showing the execution
times that I got with various algorithms. All times are in seconds
and do not include file read times. I was reading the files
from my thumb drive; this took about 4 seconds for card-events.txt and
about 10 seconds for 30minutes.txt. The times you get will vary because
of your implementation of the algorithms, the machine that you are using,
and the exact points where you get the time. Other factors may come
into play as well, such as the size and load threshold of the hash map,
the number of buckets for chaining, and more.
|
|
Project 5 |
Project 6 |
|
(long key) |
(String key) |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Incidentally, if a simple O(N2)
sort had been used to order the transactions and then search them for the
repeated credit card numbers, processing card-events would take about 97
seconds and processing 30minutes would take about 606 seconds. These
are estimates based on sample runs.
Extra Extra Credit
(Also, Due 11 Dec 2006,
with NO exceptions)
The HashMap class uses an open addressing approach to storing the items. In the book, the authors provide several classes used to implement a chaining approach for using hash table. Make a program that uses the book's classes for chaining to form a hash table that finds the multiple transactions and time the action in the same manner as for the other programs. In this program, you MUST replace the use of the hashCode method with a method of your own devising to determine the bucket to access for storing the transactions. You may set the number of buckets and the load threshold to whatever you think is reasonable; however, you still should avoid rehashing. The hashCode() method is used in both the get and put methods of HashtableChain.
The classes provided by the book are HashtableChain and KWHashMap. You can get them from the I: drive ( I:\jlwolfe\310 ). You will find that the rehash() method is missing; you should create a stub for it and make sure it is not invoked.
Add entries to the table mentioned above
for this chaining approach to the hash table. My sample table shows
int the last column the results that I got with a very simple replacement
for hashCode(). Hand in a well documented program that performs this
task.
Note: If you choose to modify
the programs to accept keyboard input of the file name, make sure you do
not include the prompting and user input time in the execution times for
the actions.