Skip navigation.
Home

Project 3 - C Heap Library


  • Due Tuesday July 21, 11:59 pm
    Submit the proj3.tar.gz file electronically on the engr submission site.
  • You must use the Unix system calls open, close, lseek, write, and read.
    You must NOT use their analogous procedure calls from the standard I/O library (fopen, fclose, fseek, fwrite, fread)
  • Comment your code well enough that you will understand it 6 months from now.

This assignment asks you to write a heap library using Unix system I/O. The purpose is to get you familiar with reading and writing to files with system I/O, file position, atomic operations, file locking, libraries, and the ideas behind standard I/O file pointers. You will write the library in C. You need to know how to work with strings, malloc and free, pointers, and structures in C; all things you should have learned in C Programming!

A test script will be provided on the class web page. Instructions on how to run the test will be included at the top of that file.

1 Overview

Project 3 defines a record as an int key, char[] description, and an int order. You will be storing records in a heap data structure according to the value of key. The root node record of the heap will contain the record with the lowest key. The int order indicates the order in which this record was added to the heap. If this was the first record ever added, order = 1. If this record was added after 85 other records, then order = 86. The int order does not change even if records with a lower order are removed from the heap. The char[] description is a string that contains the data of the record.

Heaps should be a familiar data structure to you so I won't go into the definition of a heap. One way to implement a heap is with an array. The root node record will have index 0. A record at index n will have a parent at index (n-1)/2. If a record at index n has a left child, it will be at index 2n + 1. If it also has a right child, it will be at index 2n + 2.

Your heap will be contained in a file on disk. You will never read in the whole file and manipulate it. Instead, when you need to add a record to the heap, you will write your new record at the end of the array and bubble it up to maintain the heap property. You will have to keep track somehow of the file offset of the last leaf node so that you will know where to first write a new record. When you remove a record from the heap, you will always remove the record at the root node position (index 0). Then you will move the record at the last leaf node position into the root node location and bubble it down to maintain the heap property. You will only have a couple records in memory at any given time. The structure of the heap is maintained on disk. You will know the index of a record by its file offset.

2 Specifications

Your heap will handle fixed sized records. Each record contains the int key, char[] description, and int order. The size of the char[] description is specified by the user when they create the heap.

The description is a C string, terminated by a 0 byte. The terminating 0 byte is counted in the size of the description, so if a user specifies a description size of 20 they can use only 19 characters for the description; the terminating 0 takes up the 20th byte. Descriptions can be shorter than the description size specified, but they always take up the full description size bytes on disk. Therefore, all the records in a given heap have the same number of bytes.

You will have to write the header file, heap.h, and the code for 7 functions, described below. Here is the outline of the header file; you will need to complete this.

2.1 Header file (incomplete)

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>

#define HEAP_PATH_MAX 128

typedef struct {
       /* You determine what goes in here */
} HEAP_FILE;

/* Function Declarations */
int heap_create( char *file_name, int description_size );
HEAP_FILE *heap_open( char *file_name );
int heap_close( HEAP_FILE *h_fp );
int heap_add( HEAP_FILE *h_fp, int key, char *desc );
int heap_remove_min( HEAP_FILE *h_fp, int *key, char *desc, int *order );
int heap_empty( HEAP_FILE *h_fp );
int heap_desc_size( HEAP_FILE *h_fp );

Note that this file specifies a maximum size for a path name. This is meant to help make your code a little simpler. If you need a string to hold a pathname you can use an array of size HEAP_PATH_MAX and be sure it is big enough; you do not have to use malloc for this.

2.2 int heap_create( char *file_name, int description_size )
This function creates the heap file file_name where the size of the description is as specified. If the file_name already exists, the create function should not change it, and it should return -1 to indicate an error. The length of the file name (plus the .LCK extension described later) must be less than HEAP_PATH_MAX; if a longer file name is given it is an error and -1 is returned. If there is any other error, heap_create should also return -1. If the create is successful, it should return 0.

2.3 HEAP_FILE *heap_open( char *file_name )
This function opens the heap file file_name. It allocates a HEAP_FILE structure and returns a pointer to it. The structure stores information about the open file. Just what information you need to save in this structure is up to you to figure out. The pointer to the HEAP_FILE structure will be used in subsequent adds, removes, and a final close. Multiple files can be open at once and even the same file can be opened more than once at the same time; this pointer distinguishes between the open files.

If there is any error heap_open will return a NULL pointer.

2.4 int heap_close( HEAP_FILE *h_fp )
This function closes the heap file specified by h_fp. It returns 0 if successful and -1 for any error.

2.5 int heap_add( HEAP_FILE *h_fp, int key, char *desc )
This function writes out a record with the given key and desc to the open heap file specified by h_fp. The heap structure must be maintained on disk.

Writes are atomic. That is, if two processes are trying to write to the heap at the same time, one process will complete the write before the other process begins. See the section below on atomic writes below for more information on this.

If the write is successful 0 is returned. If there is any error, -1 is returned.

2.6 int heap_remove_min( HEAP_FILE *h_fp, int *key, char *desc, int *order )
This function reads in the record with the minimum key from the open heap file specified by h_fp. The description for that record is put in the memory pointed to by desc. The size of the description that is read is determined by the description size used to create the heap.

The record is also removed from the heap on disk and the heap structure is maintained.

If the read is successful 0 is returned. If the heap is empty, a 1 is returned. If there is any other error, -1 is returned.

2.7 int heap_empty( HEAP_FILE *h_fp )
This function determines if the open heap file specified by h_fp is empty or not. If it is empty, a 1 is returned. If it is not empty, a 0 is returned. If there is any error, -1 is returned.

2.8 int heap_desc_size( HEAP_FILE *h_fp )
Returns the description size used when the heap was created.

3 Atomic Writes

It is very important that adds to your heap are atomic. Here is an example of what could go wrong when two processes are adding a record to the same file. Say process 1 and process 2 both want to add records at the same time. So they both need to initially write their records at the end of the array. Let's say that Process 1 finds out that the end of the array is at file offset 1000 and then is swapped out of the CPU. Process 2 is swapped into the CPU and finds out that the end of the array is at file offset 1000 and writes its record there. Process 1 eventually gets back into the CPU and picks up where it left off, which was going to file offset 1000 and writing a record. A record has been overwritten.

You need to be sure this, and similar problems, do not happen. You will do this by making adds atomic; that is, one add must finish completely before the next starts, even when the adds are done by two different processes.

This will be implemented by creating a lock file using the system call open with the O_CREAT and O_EXCL flags. If a process wants to add to the heap, it must first create the lock file. If it succeeds, then it has the lock and it can add. If it fails, then some other process must have created the lock file first, so this process must wait and try again later to see if it can create the lock file. See your text for information on this; we will also talk about this in class. You will not write anything inside the lock file. The existence of the lock file is what is important. When the process is finished in the critical section, it removes the lock file. Use the remove( file_name ) command.

There will be a lock file for each heap. The lock file should use the same pathname as the heap file itself, with the extension .LCK added at the end. So the heap file /users/student/test/mydata will use a lock file named /users/student/test/mydata.LCK.

This locking is only used internally in your add routine; the users of your library cannot do any explicit locking themselves.

In creating the lock file pathname, you can assume that no pathname will be longer than HEAP_PATH_MAX. If the constructed pathname of the lock file would be longer, you can return an error.

Do you need to use file locking for remove_min or any other function?

4 Building the Library

You should compile the code for your library on Linux using the gcc command.

You must put each function in a separate file: heapcreate.c, heapopen.c, heapclose.c, heapdescsize.c, heapadd.c, heapremovemin.c, and heapempty.c. Your header file should be named heap.h. (If you use these names exactly for your C files the makefile we develop in class will work for your project)

A library is a collection of the compiled C functions that other programs can use. The linker, which is called automatically when you run cc or gcc, pulls the needed code out of the library and puts it with the main code to make a complete program. You can compile but not link the library because, by itself, if is not a complete program. To compile a C file without linking, use the commands:
gcc -c heapcreate.c
and so on.

Then use ar to build the library with the command:
ar -r libheap.a heapcreate.o heapopen.o ....
This puts all the compiled functions together in the library libheap.a ready to be linked with a main program. See man ar for info on the ar command.

To test your library you will have to write a main program, say heaptest.c. Then it can be compiled and linked with the heap library with the command:
gcc -o heaptest heaptest.c libheap.a

By putting the code for each function in a separate file, the compiler and linker will only add those functions from that library that are actually used in your program. So if heaptest.c only adds to the heap and never calls heap_remove_min, the code for heap_remove_min would not be included in the executable file heaptest.

You will turn in one file, a compressed tar ball named proj3.tar.gz of all your source code generated by the command:
tar -czf proj3.tar.gz heap.h heapempty.c heapremovemin.c heapadd.c heapopen.c heapcreate.c heapclose.c heapdescsize.c
Notice that heaptest.c is NOT in the tarball. We will use our own code with your library.

5 Example

Here is an example of code that uses your library.

5.1 Library Usage
If this is called heaptest.c, I would compile and link it with the command

gcc -o heaptest heaptest.c libheap.a

or I would put this command in a makefile. Either way an executable file named heaptest is created.

#include <stdio.h>
#include <stdlib.h>
#include "heap.h"
int main(int argc, char *argv[]) {
     
   /* this program needs a parameter which is used to name the heap file */ 
   if(argc == 2) {
      heap_create(argv[1], 10); // descriptions have to be < 10 characters
      HEAP_FILE *heap_fp;
      heap_fp = heap_open(argv[1]);
      char buf[10];
      memset(buf, ' ', 10); // fill buf with 10 ' ' characters
      strcpy(buf, "Larry 50"); // copy this string into buf
      heap_add(heap_fp, 50, buf);
   
      memset(buf, ' ', 10);
      strcpy(buf, "Ken 30");
      heap_add(heap_fp, 30, buf);
   
      memset(buf, ' ', 10);
      strcpy(buf, "Sue 8");
      heap_add(heap_fp, 8, buf);
      memset(buf, ' ', 10);
      strcpy(buf, "Mary 3");
      heap_add(heap_fp, 3, buf);
      memset(buf, ' ', 10); 
      strcpy(buf, "Mom 40"); 
      heap_add(heap_fp, 40, buf);
      memset(buf, ' ', 10); 
      strcpy(buf, "Jim 5"); 
      heap_add(heap_fp, 5, buf);
      int key, order;
      heap_remove_min(heap_fp, &key, buf, &order);
      printf("minimum was %d, %s, %d\n", key, buf, order);
      memset(buf, ' ', 10);
      strcpy(buf, "Alice 200");
      heap_add(heap_fp, 200, buf);
      heap_remove_min(heap_fp, &key, buf, &order);
      printf("minimum was %d, %s, %d\n", key, buf, order);
   
      heap_remove_min(heap_fp, &key, buf, &order);
      printf("minimum was %d, %s, %d\n", key, buf, order);
 
      heap_close(heap_fp);
   }
   return(EXIT_SUCCESS);
}
  

5.2 Output
When the code above is run, the output should look like this:

flip$ heaptest afile
minimum was 3, Mary 3, 4
minimum was 5, Jim 5, 6
minimum was 8, Sue 8, 3

flip$

od -Ad -c afile

<-

this command allows you to see what is in afile in character format.

 

   0000000 Things that you put into your HEAP_FILE structure will be written here. 
   0000016 The size of my HEAP_FILE structure is 144. So, the root node starts at 144.
   *       My records have size 20.
   0000128  \n  \0  \0  \0 003  \0  \0  \0 004  \0  \0  \0  \a  \0  \0  \0
   0000144   K   e   n       3   0  \0   0  \0  \0  \0  \0 036  \0  \0  \0
   0000160 002  \0  \0  \0   M   o   m       4   0  \0   0  \0  \0  \0  \0
   0000176   (  \0  \0  \0 005  \0  \0  \0   A   l   i   c   e       2   0
   0000192   0  \0  \0  \0 310  \0  \0  \0  \a  \0  \0  \0   L   a   r   r
   0000208   y       5   0  \0  \0  \0  \0   2  \0  \0  \0 001  \0  \0  \0
   0000224
 

flip$ od -Ad -d afile <- this command allows you to see what is in afile in decimal format.

   0000000 Things that you put into your HEAP_FILE structure will be written here.
   0000016 The size of my HEAP_FILE structure is 144. So, the root node starts at 144.
   *       My records have size 20.
   0000128    10     0     3     0     4     0     7     0
   0000144 25931  8302 12339 12288     0     0    30     0
   0000160     2     0 28493  8301 12340 12288     0     0
   0000176    40     0     5     0 27713 25449  8293 12338
   0000192    48     0   200     0     7     0 24908 29298
   0000208  8313 12341     0     0    50     0     1     0
   0000224
 

6 makefile

Please note that the testscript requires that your code is inside a tarball. The directions for making this tarball are at the top of the testscript. As you troubleshoot your code, you must recreate your tarball everytime. My testscript excutes the code in the tarball. It would be very wise to use a makefile that updates your tarball everytime you change any source code. Here is a makefile you can use/modify.
cp /nfs/stak/a2/classes/eecs/summer2009/cs311/public_html/files/projects/project3/makefile .

7 Test script and Grading Criteria

Test script:

cp /nfs/stak/a2/classes/eecs/summer2009/cs311/public_html/files/projects/project3/TestFiles/testScript .

Grading Criteria:

cp /nfs/stak/a2/classes/eecs/summer2009/cs311/public_html/files/projects/project3/gradingCriteria.xls .