Indiana University of Pennsylvania
Computer Science Department
CO 110  Spring 1984
McKelvey and Wolfe

     Programming Project #6

    For project 6, you are to write a program that processes survey data from two files, produces several frequency reports on the data, detects errors in the data, writes an error record file, and lists the contents of the error file.

    A two-part survey has been conducted on the same group of people.  The first part of the survey involved movie attendance; the results are contained in file 110-FILMS.COMPSCI.  The records in this file contain the following data.

    Columns   Data
    -------   -------------------------
     1-4      Survey form number
     6-7      Age (a 2-digit number)
       9      Sex (M or F)
      11      Number of movies seen per month (0 to 9); 9 is
                 used if 9 or more per month
    13-27     Favorite type of movie (Adventure, Mystery,
                Comedy, Romance, Science Fiction, Historical,
                Musical, or Documentary)
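
    The column layout above can be read by slicing each record at fixed positions.  The following is only a sketch, written in Python for illustration rather than in the language used for this course.  The names FilmRecord and parse_film_line and the sample record are made up for the example; they are not part of the assignment.

```python
from typing import NamedTuple


class FilmRecord(NamedTuple):
    form: str      # survey form number
    age: str       # kept as text; validated later
    sex: str
    movies: str    # kept as text; validated later
    favorite: str


def parse_film_line(line: str) -> FilmRecord:
    """Slice one record by column. Fields stay as text so bad data cannot crash the read."""
    line = line.rstrip("\n").ljust(27)      # pad short lines so every slice exists
    return FilmRecord(
        form=line[0:4],                     # columns 1-4
        age=line[5:7],                      # columns 6-7
        sex=line[8:9],                      # column 9
        movies=line[10:11],                 # column 11
        favorite=line[12:27].strip(),       # columns 13-27, trailing blanks dropped
    )


sample = "0042 23 M 3 Comedy"               # made-up record, not from the real file
record = parse_film_line(sample)
print(record)
```

    Keeping every field as text at this stage means that an erroneous character in the age or number-of-movies field is caught by your own checks, not by a failed numeric read.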

The second part of the survey involved use of the media; the results are contained in file 110-MEDIA.COMPSCI.  The records in this file contain the following data.

    Columns   Data
    -------   --------------------------------
      1-4     Survey form number
      6-7     Number of hours per day that the TV is on
        9     Number of hours per day that the radio is on
       12     Number of magazines subscribed to
       14     Reads a daily newspaper (Y or N)
       16     Reads a Sunday newspaper (Y or N)

Most people responded to the movie part of the survey; fewer responded to the media part of the survey; some responded to only the media part.  Both files have the data already sorted in ascending order on the survey form number.

    The data in the 110-FILMS.COMPSCI file has not been reliably entered.  Some of the records have erroneous characters in the age, sex, and number-of-movies-seen fields.  In addition, the favorite movie type field may have various spelling errors.   However, the survey form number is guaranteed to be correct.
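
    One possible way to organize the error checks is sketched below in Python, for illustration only.  The sketch makes several assumptions that you should confirm: ages are entered as exactly two digits (for example, 05), a misspelled movie type counts as an error, and movie types are spelled with the capitalization shown in the record layout.  The names is_numeric and film_errors are made up for the example.

```python
MOVIE_TYPES = {
    "Adventure", "Mystery", "Comedy", "Romance",
    "Science Fiction", "Historical", "Musical", "Documentary",
}


def is_numeric(field: str) -> bool:
    """True only if the field is non-empty and every character is a digit 0-9."""
    return len(field) > 0 and all(ch in "0123456789" for ch in field)


def film_errors(age: str, sex: str, movies: str, favorite: str) -> list:
    """Return the names of the fields that fail validation (empty list if none)."""
    errors = []
    if not (len(age) == 2 and is_numeric(age)):        # assumption: two digits required
        errors.append("age")
    if sex not in ("M", "F"):
        errors.append("sex")
    if not (len(movies) == 1 and is_numeric(movies)):
        errors.append("movies")
    if favorite.strip() not in MOVIE_TYPES:            # assumption: exact spelling and case
        errors.append("favorite")
    return errors


print(film_errors("23", "M", "3", "Comedy"))     # prints []
print(film_errors("2X", "m", "3", "Commedy"))    # prints ['age', 'sex', 'favorite']
```

    The survey form number is not checked because it is guaranteed to be correct.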



    Your program is to read the 110-FILMS.COMPSCI file and print the following frequency reports.

1.  A report that shows the number of people in each of the following age groups: 12 and under, 13 to 18, 19 to 25, 26 to 35, 36 to 50, and 51 and over.  Also, print a line that shows the average age of the people surveyed.  (See sample below)

2.  A report that shows the number of people who see 0, 1, 2, ... 9 or more movies per month.  Also, print a line that shows the average number of movies seen per month by the people surveyed.

3.  A report that shows the number of males whose favorite movie type is Adventure, Mystery, etc.; a report that shows the number of females whose favorite movie type is Adventure, Mystery, etc.; and a combined report for both sexes that shows the numbers for each favorite movie type.
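
    The counts for reports 2 and 3 can be kept in arrays indexed by the number of movies and by the movie type.  The Python sketch below is for illustration only; the name tally and the sample data are made up.  It assumes that only error-free records are passed in and that the number of movies has already been converted to an integer.  Because 9 stands for 9 or more, the average it computes can understate the true average.

```python
MOVIE_TYPES = ["Adventure", "Mystery", "Comedy", "Romance",
               "Science Fiction", "Historical", "Musical", "Documentary"]


def tally(records):
    """records: iterable of (sex, movies_per_month, favorite) from error-free forms."""
    per_month = [0] * 10                           # index 9 means "9 or more"
    by_sex = {"M": [0] * len(MOVIE_TYPES), "F": [0] * len(MOVIE_TYPES)}
    for sex, movies, favorite in records:
        per_month[movies] += 1
        by_sex[sex][MOVIE_TYPES.index(favorite)] += 1
    # The combined report is the element-by-element sum of the two per-sex arrays.
    combined = [m + f for m, f in zip(by_sex["M"], by_sex["F"])]
    total = sum(per_month)
    average = sum(n * c for n, c in enumerate(per_month)) / total if total else 0.0
    return per_month, by_sex, combined, average


valid = [("M", 3, "Comedy"), ("F", 1, "Mystery"), ("F", 9, "Comedy")]   # made-up data
per_month, by_sex, combined, average = tally(valid)
print(per_month)
print(combined)
print(f"The average number of movies seen per month is {average:.2f}")
```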

Following is a sample of the report form to be used.  It shows how the first report might be shown.

                     Age Distribution

   0-12    13-18    19-25    26-35    36-50    51-99   Total
    xxx      xxx      xxx      xxx      xxx      xxx    xxxx

   The average age is xx.xx
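
    A report like the sample could be produced from an array of six counters, one per age group.  The Python sketch below is for illustration only; the names UPPER_BOUNDS and age_distribution and the list of ages are made up.  It assumes the ages have already been validated and converted to integers.

```python
from bisect import bisect_left

UPPER_BOUNDS = [12, 18, 25, 35, 50]      # top age of each group except the last
LABELS = ["0-12", "13-18", "19-25", "26-35", "36-50", "51-99"]


def age_distribution(ages):
    """Count how many ages fall in each of the six groups."""
    counts = [0] * len(LABELS)
    for age in ages:
        # bisect_left gives the index of the first upper bound >= age,
        # or 5 (the last group) when age is above every bound.
        counts[bisect_left(UPPER_BOUNDS, age)] += 1
    return counts


ages = [10, 12, 13, 18, 19, 30, 40, 51, 99]      # made-up, already validated
counts = age_distribution(ages)
average = sum(ages) / len(ages)

print("Age Distribution".center(62))
print()
print("".join(f"{label:>9}" for label in LABELS) + f"{'Total':>8}")
print("".join(f"{c:>9}" for c in counts) + f"{sum(counts):>8}")
print()
print(f"   The average age is {average:.2f}")
```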

    If a record has any error, the data on that record must NOT be counted in ANY of the reports.  Also, when an error is found, the program must write an entry in the error file, SURVEY-ERRORS.  Before writing to the error file, your program must check the 110-MEDIA.COMPSCI file to see whether the survey form in error was also returned in the media survey (see suggestion #2).  All information associated with that survey form must be written to the error file.  If there is no entry in 110-MEDIA.COMPSCI, only the survey form number, age, sex, number of movies seen, and favorite type are written.  If there is an entry in the media file, the number of TV hours, number of radio hours, number of magazines, and the newspaper indicators must also be written.
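
    Because both files are sorted on the survey form number, the media record for a given form (if any) can be found in a single pass over both files.  The Python sketch below is for illustration only.  The names match_media and error_record, the layout of the error record, and the sample records are all made up; the sketch also assumes that form numbers are written the same way in both files, so that they can be compared as 4-character text.

```python
def match_media(film_lines, media_lines):
    """Yield (film_line, media_line_or_None) pairs, reading each input only once."""
    media_iter = iter(media_lines)
    media = next(media_iter, None)
    for film in film_lines:
        key = film[0:4]
        # Skip media records with smaller form numbers (people who
        # answered only the media part of the survey).
        while media is not None and media[0:4] < key:
            media = next(media_iter, None)
        if media is not None and media[0:4] == key:
            yield film, media
        else:
            yield film, None


def error_record(film, media):
    """Build one error-file line (hypothetical layout): film data, then media data if present."""
    text = film.rstrip("\n")[:27].ljust(27)
    if media is not None:
        text += " " + media.rstrip("\n")[5:16]     # media columns 6-16
    return text


films = ["0001 2X M 3 Comedy", "0003 30 F 2 Mystery", "0004 41 M 0 Musical"]  # made-up
media = ["0002 04 2  1 Y N", "0003 06 1  3 Y Y", "0005 02 0  0 N N"]          # made-up
pairs = list(match_media(films, media))
for film, med in pairs:
    print(repr(error_record(film, med)))
```

    In your program, only the records found to be in error would be written to SURVEY-ERRORS; the loop above prints every pair simply to show the matching.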

    After reading all records in the 110-FILMS.COMPSCI file, your program should print the reports and close both survey files and the error file.  Then, the program should read the error file and print out each error record in the following form.

                  Records with Errors

Survey#  Age  Sex  Favorite    TV hrs  Radio  Mags  Daily  Sunday
 xxxx    xx    x  xxx. . . x    xx      x      x     x       x
 

    SUGGESTIONS

1.  Use arrays to keep track of the frequency counts for each of the reports.

2.  Plan on reading each file only once.  Both files are sorted in survey form number order.  By advancing through the two files in step, you can match the corresponding parts of a survey form as you go, rather than searching for the media part after an error has been found in the movie part.

3.  Make a subprogram to check for invalid (non-numeric) characters.  You can use this subprogram to check the age and number-of-movies-seen fields, if you use an internal file for the reading.  Also, you could use a subprogram to match the records in the two data files and to write the error file.

4.  Read the 110-MEDIA.COMPSCI file using character variables.   This prevents you from getting any errors because of invalid characters in this file.  You should also write and read the error file, SURVEY-ERRORS, using character variables to be sure the values written and read reflect the contents of the survey files.

5.  Do not use file unit numbers between 0 and 9 or between 100 and 109.  Some of these numbers have specific system meanings.

6.  When your program opens the data files, 110-FILMS.COMPSCI and 110-MEDIA.COMPSCI, specify USAGE='INPUT,SHARED' so that many students can use these files at the same time.

7.  You have some freedom in terms of the form of the reports and the error listing; however, ALL values that are printed must be identified (annotated), such as in the sample report form.  Try to make the report as readable as possible.


    REQUIREMENTS

1.  You must hand in a program listing and output from a batch run of your program.  The report output must begin at the top of a new page;  the listing of error records must be on a separate page.

2.  Documentation within the program must be of the same form as in previous programs.

3.  Be sure your work is your own.  This program will be carefully scrutinized for plagiarism and collaboration; heavy penalties will be assessed on offenders.