Indiana University of Pennsylvania
Computer Science Department
CO 110 Spring 1984
McKelvey and Wolfe
Programming Project #6
For project 6, you are to write a program to process survey
data from two files, to produce several frequency reports on this
data, to detect errors in the data, to write an error record
file, and to list the contents of the error file.
A two-part survey has been conducted on the same group of
people. The first part of the survey involved movie attendance;
the results are contained in file 110-FILMS.COMPSCI. The records
in this file contain the following data.
Columns   Data
-------   -------------------------
1-4       Survey form number
6-7       Age (a 2-digit number)
9         Sex (M or F)
11        Number of movies seen per month (0 to 9); 9 is
          used for 9 or more per month
13-27     Favorite type of movie (Adventure, Mystery,
          Comedy, Romance, Science Fiction, Historical,
          Musical, or Documentary)
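The column layout above can be read as a set of fixed character positions. The following is a minimal sketch only, written in Python for brevity rather than in the language used for the course; the function name, the field names, and the sample record are made up for illustration. Every field is kept as characters so that bad data cannot cause a read error.

```python
def parse_films_record(line):
    """Split one 110-FILMS.COMPSCI record into character fields by column."""
    # Pad short lines so every slice below has something to return.
    line = line.rstrip("\n").ljust(27)
    return {
        "form": line[0:4],                 # columns 1-4
        "age": line[5:7],                  # columns 6-7
        "sex": line[8:9],                  # column 9
        "movies": line[10:11],             # column 11
        "favorite": line[12:27].rstrip(),  # columns 13-27, trailing blanks dropped
    }


# Hypothetical sample record, not taken from the real data file.
record = parse_films_record("0042 27 F 3 Science Fiction")
print(record)
```

Column numbers in the table start at 1, while the slice positions in the sketch start at 0, so columns 6-7 become positions 5 and 6.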
The second part of the survey involved use of the media; the
results are contained in file 110-MEDIA.COMPSCI. The records in
this file contain the following data.
Columns   Data
-------   --------------------------------
1-4       Survey form number
6-7       Number of hours per day that the TV is on
9         Number of hours per day that the radio is on
12        Number of magazines subscribed to
14        Reads a daily newspaper (Y or N)
16        Reads a Sunday newspaper (Y or N)
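The media record can be split the same way. This is a sketch under the same assumptions as any column-based reader: it is written in Python for illustration only, the function and field names are hypothetical, and the sample record is invented. All fields stay as characters and are not converted to numbers.

```python
def parse_media_record(line):
    """Split one 110-MEDIA.COMPSCI record into character fields by column."""
    # Pad short lines so every slice below has something to return.
    line = line.rstrip("\n").ljust(16)
    return {
        "form": line[0:4],         # columns 1-4
        "tv_hours": line[5:7],     # columns 6-7
        "radio_hours": line[8:9],  # column 9
        "magazines": line[11:12],  # column 12
        "daily": line[13:14],      # column 14
        "sunday": line[15:16],     # column 16
    }


# Hypothetical sample record, not taken from the real data file.
media = parse_media_record("0042 05 3  2 Y N")
print(media)
```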
Most people responded to the movie part of the survey; fewer
responded to the media part of the survey; some responded to only
the media part. Both files have the data already sorted in
ascending order on the survey form number.
The data in the 110-FILMS.COMPSCI file has not been reliably
entered. Some of the records have erroneous characters in the
age, sex, and number-of-movies-seen fields. In addition, the
favorite movie type field may have various spelling errors.
However, the survey form number is guaranteed to be correct.
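One way to picture the error checks is as a routine that names every bad field on a record. The sketch below is in Python for illustration only and rests on assumptions that the handout does not settle: an age must be exactly two digit characters (a leading blank is treated as an error), and the favorite type is compared without regard to upper or lower case. The names `is_numeric` and `invalid_fields` are hypothetical.

```python
MOVIE_TYPES = ("Adventure", "Mystery", "Comedy", "Romance",
               "Science Fiction", "Historical", "Musical", "Documentary")


def is_numeric(field, width):
    """True when the field is exactly `width` characters, all of them digits."""
    return len(field) == width and all(ch in "0123456789" for ch in field)


def invalid_fields(age, sex, movies, favorite):
    """Return the names of the fields that fail their check, in record order."""
    bad = []
    if not is_numeric(age, 2):
        bad.append("age")
    if sex not in ("M", "F"):
        bad.append("sex")
    if not is_numeric(movies, 1):
        bad.append("movies")
    # Assumption: spelling is compared without regard to case.
    if favorite.strip().upper() not in {name.upper() for name in MOVIE_TYPES}:
        bad.append("favorite")
    return bad


print(invalid_fields("27", "F", "3", "Science Fiction"))
print(invalid_fields("2x", "X", "?", "Comdy"))
```

A record is counted in the reports only when the returned list is empty.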
Your program is to read the 110-FILMS.COMPSCI file and print
the following frequency reports.
1. A report that shows the number of people in each of the
following age groups: 12 and under, 13 to 18, 19 to 25, 26 to 35,
36 to 50, and 51 and over. Also, print a line that shows the
average age of the people surveyed. (See the sample below.)
2. A report that shows the number of people who see 0, 1, 2, ...
9 or more movies per month. Also, print a line that shows the
average number of movies seen per month by the people surveyed.
3. A report that shows the number of males whose favorite movie
type is Adventure, Mystery, etc.; a report that shows the number
of females whose favorite movie type is Adventure, Mystery, etc.;
and a combined report for both sexes that shows numbers for
favorite movie type.
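The third report can be built from two arrays of counts, one per sex, with the combined report formed by adding them. The sketch below is in Python for illustration only; `tally_favorites` is a hypothetical name, and the input is assumed to contain only records that have already passed every error check.

```python
MOVIE_TYPES = ("Adventure", "Mystery", "Comedy", "Romance",
               "Science Fiction", "Historical", "Musical", "Documentary")


def tally_favorites(valid_records):
    """Count favorite movie types for males, females, and both combined.

    `valid_records` is a sequence of (sex, favorite) pairs that have
    already been checked, so sex is "M" or "F" and favorite is spelled
    exactly as in MOVIE_TYPES.
    """
    counts = {"M": [0] * len(MOVIE_TYPES), "F": [0] * len(MOVIE_TYPES)}
    for sex, favorite in valid_records:
        counts[sex][MOVIE_TYPES.index(favorite)] += 1
    combined = [m + f for m, f in zip(counts["M"], counts["F"])]
    return counts["M"], counts["F"], combined


# Hypothetical data for illustration.
males, females, combined = tally_favorites(
    [("M", "Comedy"), ("F", "Comedy"), ("F", "Musical"), ("M", "Adventure")]
)
print(males, females, combined)
```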
Following is a sample of the report form to be used. It shows
how the first report might be shown.
                    Age Distribution

   0-12   13-18   19-25   26-35   36-50   51-99   Total
    xxx     xxx     xxx     xxx     xxx     xxx    xxxx

   The average age is xx.xx
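The age report amounts to placing each valid age into one of six groups, counting, and averaging. The following sketch, in Python for illustration only, shows one way this might be done; the names `age_group` and `age_report`, the column widths, and the sample ages are all assumptions and not part of the assignment.

```python
AGE_UPPER_BOUNDS = (12, 18, 25, 35, 50, 99)
AGE_LABELS = ("0-12", "13-18", "19-25", "26-35", "36-50", "51-99")


def age_group(age):
    """Return the index (0 to 5) of the age group that contains `age`."""
    for index, upper in enumerate(AGE_UPPER_BOUNDS):
        if age <= upper:
            return index
    raise ValueError("age is larger than two digits allow")


def age_report(ages):
    """Return (counts, average, report text) for a list of valid ages."""
    counts = [0] * len(AGE_UPPER_BOUNDS)
    for age in ages:
        counts[age_group(age)] += 1
    total = sum(counts)
    average = sum(ages) / total if total else 0.0
    lines = [
        "Age Distribution",
        "".join(f"{label:>7}" for label in AGE_LABELS) + f"{'Total':>7}",
        "".join(f"{count:>7d}" for count in counts) + f"{total:>7d}",
        f"The average age is {average:5.2f}",
    ]
    return counts, average, "\n".join(lines)


# Hypothetical ages for illustration.
counts, average, text = age_report([10, 15, 22, 30, 40, 60, 12, 13])
print(text)
```

Only ages from records with no errors in any field should be passed in, since a record with any error is left out of every report.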
If a record has any error, the data on that record must NOT
be counted in ANY of the reports. Also, when an error is found,
the program must write an entry in the error file, SURVEY-ERRORS.
Before writing into your error file, your program must check the
110-MEDIA.COMPSCI file to see if the same survey form whose data
is in error was also returned in the media survey (see suggestion
#2). When an error occurs, all information associated with that
particular survey form must be written to the error file. If
there is no entry in 110-MEDIA.COMPSCI, only the survey form
number, age, sex, number of movies seen, and favorite type are
written in the error file. If there is an entry in the media
file, the number of TV hours, number of radio hours, number of
magazines, and the newspaper indicators must also be written to
the SURVEY-ERRORS file.
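Because both files are in ascending survey form number order, the media record for an erroneous form can be found by moving forward through the media file, never backward. The sketch below, in Python for illustration only, shows that single-pass matching. It assumes form numbers are fixed-width character strings that compare correctly as text; `match_errors_to_media` and the sample data are hypothetical.

```python
def match_errors_to_media(error_forms, media_records):
    """Pair each erroneous form number with its media data, or None.

    Both inputs must be in ascending form-number order. Each media
    record is a (form, rest_of_record) tuple of character strings.
    The media records are read once, front to back.
    """
    media_iter = iter(media_records)
    current = next(media_iter, None)
    pairs = []
    for form in error_forms:
        # Skip media records that belong to earlier form numbers.
        while current is not None and current[0] < form:
            current = next(media_iter, None)
        if current is not None and current[0] == form:
            pairs.append((form, current[1]))
        else:
            pairs.append((form, None))
    return pairs


# Hypothetical data for illustration.
media = [("0003", "04 2  1 Y N"), ("0005", "10 0  3 N Y"), ("0009", "02 5  0 Y Y")]
pairs = match_errors_to_media(["0002", "0005", "0008", "0009"], media)
print(pairs)
```

A pair whose second value is None corresponds to an error record that carries only the movie data.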
After reading all records in the 110-FILMS.COMPSCI file,
your program should print the reports and close both survey files
and the error file. Then, the program should read the error file
and print out each error record in the following form.
Records with Errors
Survey# Age Sex Favorite TV hrs Radio Mags Daily Sunday
xxxx xx x xxx. . . x xx x x x x
SUGGESTIONS
1. Use arrays to keep track of the frequency counts for each of
the reports.
2. Plan on reading each file only once. Both files are sorted
in survey form number order. By careful reading, you can match
the corresponding parts of a survey form, rather than search for
the media part after an error has been found in the movie part.
3. Make a subprogram to check for invalid (non-numeric)
characters. You can use this subprogram to check the age and
number-of-movies-seen fields, if you use an internal file for the
reading. Also, you could use a subprogram to match the records
in the two data files and to write the error file.
4. Read the 110-MEDIA.COMPSCI file using character variables.
This prevents you from getting any errors because of invalid
characters in this file. You should also write and read the
error file, SURVEY-ERRORS, using character variables to be sure
the values written and read reflect the contents of the survey
files.
5. Do not use unit numbers for the files that are between 0 and
9 or between 100 and 109. Some of these numbers have specific
system meaning.
6. When your program opens the data files, 110-FILMS.COMPSCI and
110-MEDIA.COMPSCI, specify USAGE='INPUT,SHARED' so that many
students can use these files at the same time.
7. You have some freedom in terms of the form of the reports and
the error listing; however, ALL values that are printed must be
identified (annotated), such as in the sample report form. Try
to make the report as readable as possible.
REQUIREMENTS
1. You must hand in a program listing and output from a batch
run of your program. The report output must begin at the top of
a new page; the listing of error records must be on a separate
page.
2. Documentation within the program must be of the same form as
in previous programs.
3. Be sure your work is your own. This program will be
carefully scrutinized for plagiarism and collaboration; heavy
penalties will be assessed on offenders.