Parting Thoughts on C
David Dagon
___________________________________________________________________________
=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

0. What?
---------------------------

These lecture notes wrap up a discussion of C in cs1312.  To emphasize
the basics, the first section considers a simple Java program, and
discusses how it can be converted to C.  The subsequent sections discuss
how to code a stack in C, and how to adapt this stack for use in a
simple graph search program.

1. Review
----------------------------

By now, we've seen enough of the syntax of C that we should be able to
code simple programs.  To test this theory, let's convert a simple
program from Java into C.

We start with a very simple program to read in two numbers, and print
out the greatest common divisor, using Euclid's algorithm.  To review,
Euclid's algorithm is expressed as:

    public int gcd(int n, int m) {
        int k;
        if(m==0)
            return(n);
        else {
            k = n%m;
            return(gcd(m,k));
        }
    }

A simple Java program that reads in two numbers and calculates their
gcd appears as:

    import java.io.*;

    public class Euclid {

        public static int gcd(int n, int m) {
            int k;
            if(m==0)
                return(n);
            else {
                k = n%m;
                return(gcd(m,k));
            }
        }

        public static void exitOnError(String strMessage){
            System.err.println(strMessage);
            System.exit(1);
        }

        public static void main(String[] args){
            BufferedReader br = new BufferedReader
                (new InputStreamReader(System.in));
            String strFirst = null, strSecond = null;
            try {
                System.out.print("Please enter the first number: ");
                strFirst = br.readLine();
                System.out.print("Please enter the second number: ");
                strSecond = br.readLine();
            }
            catch (IOException e){
                exitOnError("Cannot read input");
            }
            int n = Integer.parseInt(strFirst);
            int m = Integer.parseInt(strSecond);
            System.out.println("The gcd of " + n + " and " + m +
                               " is " + gcd(n,m));
        }
    } // Euclid

Notice that for convenience, we've made the gcd
method static.  Notice that we've also included very simple error
handling: any problems with I/O are presumed fatal.

How can this be converted to C?  Let's look at individual parts of the
Java program and see what must change.

a. No Classes
-------------------------

Perhaps the most significant difference is the lack of classes in C.
Our Java program *must* encapsulate all the 'gcd' related behavior in a
class.  So it begins with:

    import java.io.*;

    public class Euclid {

In C, we have no classes in which to place code.  As a result, we
merely declare functions, and these are called as appropriate.  Notice
that since the Java methods were static, we avoid having to simulate
class instances in C.  We might even think of C functions as being
similar to "static" Java methods, insofar as object instances are not
needed for their invocation.

b. Similar Methods
-------------------------

The heart of the Java program is the static method, gcd(int, int):

    public static int gcd(int n, int m) {
        int k;
        if(m==0)
            return(n);
        else {
            k = n%m;
            return(gcd(m,k));
        }
    }

Just because we lack classes in C does not mean we should give up on
abstraction.  In C, we might write a function:

    int gcd(int n, int m){
        int k;
        if(0==m)
            return(n);
        else {
            k = n%m;
            return(gcd(m,k));
        }
    }

We also have an error handling method to address.  The Java program
merely catches *any* I/O exception, and exits.  We can write a similar
function in C:

    void error(const char* msg){
        puts(msg);
        exit(1);
    }

Note that we take in a "const char*" instead of a char*, since the
messages we send in will be string literals.

It's fairly common in C to create such a function.  By centrally
writing an exit function, one can close any open files, close any
network connections, print any warnings, write to any logs, and
otherwise gracefully exit.

Coding the input routines in C shows another major difference between
C and Java.  While the Java program required the creation of a
BufferedReader object, the C program has direct facilities to read
input.
The scanf function is used, and its return value checked:

    int main (void) {
        int n, m, ret;
        printf("Please enter the first number: ");
        ret = scanf("%d", &n);
        if (ret==0 || ret == EOF)
            error("Cannot read input");
        printf("Please enter the second number: ");
        ret = scanf("%d", &m);
        if (ret==0 || ret == EOF)
            error("Cannot read input");
        printf("The gcd of %d and %d is %d\n", n, m, gcd(n,m));
        return 0;
    }

According to the scanf(3) man page, scanf returns the number of items
it successfully converted.  A '0' is returned if input was available
but could not be converted, and EOF is returned if the input ran out
(or a read error occurred) before anything was converted.  In either
case, the program calls a single function "error(const char*)" with an
appropriate message.

The completed program appears as:

    /* euclid.c */
    #include <stdio.h>
    #include <stdlib.h>   /* for exit() */

    int gcd (int n, int m){
        int k;
        if(m==0)
            return(n);
        else {
            k = n%m;
            return(gcd(m,k));
        }
    }

    void error(const char* msg){
        puts(msg);
        exit(1);
    }

    int main (void) {
        int n, m, ret;
        printf("Please enter the first number: ");
        ret = scanf("%d", &n);
        if (ret==0 || ret == EOF)
            error("Cannot read input");
        printf("Please enter the second number: ");
        ret = scanf("%d", &m);
        if (ret==0 || ret == EOF)
            error("Cannot read input");
        printf("The gcd of %d and %d is %d\n", n, m, gcd(n,m));
        return 0;
    }

c. Summary
----------------

This exercise has exposed the following points:

  1) Since C and Java are similar languages (from the Algol family),
     simple programs in Java can be mapped into C programs.

  2) A key difference is the lack of classes in C.  For static (class)
     methods in Java, this does not present a tremendous problem.  (We
     will soon see that without classes, C requires the careful
     selection of data structures.)

  3) In many ways, C's file processing, printing, and input mechanisms
     are easier to use.  (A consideration not exposed in this lesson is
     that Java's I/O facilities are far more powerful, allowing for
     input/output in a variety of formats.)
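As a small variation on the input check discussed in this section, the
sketch below compares the return value against the number of
conversions requested (1), which covers both the '0' case and the EOF
case in a single test.  The names gcd_iter and read_int are
hypothetical helpers introduced only for this illustration, and the
loop form of Euclid's algorithm is an alternative to the recursive
version in the notes, not a replacement for it.

```c
#include <stdio.h>

/* Loop form of Euclid's algorithm (assumed name: gcd_iter).
 * Gives the same result as the recursive gcd in the notes. */
int gcd_iter(int n, int m) {
    while (m != 0) {
        int k = n % m;   /* remainder becomes the new divisor */
        n = m;
        m = k;
    }
    return n;
}

/* Hypothetical helper: tries to read one int from the stream.
 * Returns 1 on success and 0 otherwise.  fscanf returns the
 * number of items converted, so anything other than 1 here
 * (that is, 0 or EOF) counts as failure. */
int read_int(FILE *in, int *out) {
    return fscanf(in, "%d", out) == 1;
}
```

A caller might write "if (!read_int(stdin, &n)) error("Cannot read
input");" in place of the two-part test on ret.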
___________________________________________________________________________
=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2. A Simple Stack in C
-------------------------------------

Previous lectures have _thoroughly_ discussed how to create stacks in
Java.  Let's consider how we might create a similar structure in C.

a. A First Attempt
-------------------------------------

A very simple approach is to create an array, and manage access to the
array through functions that implement stack behavior.  We could begin
by declaring a large array of some size:

    int MAX = 25;
    ...
    int stack[MAX];

or even:

    #define MAX (25)
    int stack[MAX];

(The #define form is the one we use below: an array declared outside
of any function needs a constant size, and an int variable does not
count as a constant in C.)

Recall that mere declaration of the int array is sufficient to create
the array.  In C, unlike Java, arrays are not objects.

We will also need a variable to keep track of where we are in the
stack:

    int counter = 0;

Pushing a piece of data onto the stack is simple enough:

    void push (int x) {
        if (counter>=MAX) {   /* valid slots are 0 .. MAX-1 */
            printf("Stack full.\n");
            return;
        }
        stack[counter] = x;
        counter++;
    }

The function merely takes in a data element, checks if there's room
left in the stack, and if so, adds it to the array.

(Aside: how to design for error handling in C is difficult, because of
the lack of exception types.  It might be a better idea to return a
success-or-failure value from the push function, to indicate what
happened.  That way, code attempting to push onto a full stack can
recover.  But this is a simple example, and to keep the code easy to
read, we merely print a message.)
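The aside above can be made concrete.  What follows is only a minimal
sketch of the "success-or-failure" idea, assuming the same array and
counter as in the text; the choice of 1 for success and 0 for failure
is an assumption made for this illustration.  Note that the test is
counter >= MAX, since the valid slots run from 0 to MAX-1.

```c
#include <stdio.h>

#define MAX 25

static int stack[MAX];
static int counter = 0;

/* Variant of push that reports what happened instead of printing.
 * Returns 1 if x was stored, 0 if the stack was already full. */
int push(int x) {
    if (counter >= MAX) {
        return 0;          /* no room: caller decides how to recover */
    }
    stack[counter] = x;
    counter++;
    return 1;
}
```

Calling code can then write "if (!push(x)) { ... }" and decide for
itself whether a full stack is fatal.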
A pop operation is likewise simple:

    int pop(void){
        if (counter<=0) {     /* nothing left to pop */
            printf("Stack underflow.\n");
            return 0;
        }
        counter--;
        return stack[counter];
    }

All together, a simple stack program might appear as:

    /* stack.c */
    #include <stdio.h>

    #define MAX 25

    int counter;
    int stack[MAX];

    void push (int x) {
        if (counter>=MAX) {
            printf("Stack full.\n");
            return;
        }
        stack[counter] = x;
        counter++;
    }

    int pop(void){
        if (counter<=0) {
            printf("Stack underflow.\n");
            return 0;
        }
        counter--;
        return stack[counter];
    }

    int main(void){
        int i = 0;
        counter = 0;
        for (i=0; i < MAX; i++){
            push(i*2);
        }
        for (i=0; i < MAX; i++){
            printf("Stack[%d] = %d\n", i, pop() );
        }
        return 0;
    }

When we run the program, we find that '0' is one of the values we've
pushed onto the stack.  Yet this is the same value returned from the
pop function on error.

Just like the push operation, the pop checks for the validity of the
stack size, and then either indicates an error (0) or returns a value.
As before, error handling is a bit of a problem here.  Sometimes, 0 is
a valid entry on the stack.  At other times, it's an error.  How can we
address this problem?

b. Possible Solutions
---------------------------

So calling the pop function might return an error, or it might return
valid data.  With the current design, we just don't know what a '0'
means in our program.  (And we are unwilling to impose limitations on
our stack; perhaps we *want* to store a few 0 elements.)  How can we
work around this problem?

One approach might be to consider what happens in Java.  A typical pop
method might read as:

    public Object pop();

And of course, the calling method would check the return value:

    Object oTemp = myStackRef.pop();
    if (oTemp != null) {
        // do something
    }
    else {
        // error handling
    }

There might even be an Exception thrown from the method, such as:

    public Object pop() throws NoSuchElementException;

But by looking at the example without the fancy exception handling we
begin to get an idea how to handle our stack method.
If the simple Java pop() method fails, it returns null.  We could do
something similar in C, and return a pointer to a structure or array.
That way, if an error occurs, we can return NULL instead of a proper
pointer into the array.

Aha!  So a pointer will help us detect the problem.  We rewrite our
pop function to read:

    int* pop(void){
        if (counter<=0) {
            printf("Stack underflow.\n");
            return NULL;
        }
        counter--;
        return &stack[counter];
    }

Note that there are only _three_ changes we've made to what the
function returns.  They include:

  a) changing the return type from int to int*

  b) changing the error return value to NULL from 0

  c) changing the success return value to the _address_ of the array
     element, so that we match the int* return type.

This allows us to store '0' elements in the array.  Errors are
indicated by returning NULL instead of 0.

Let's review WHY we made these changes.  First, the return type is
changed to an int pointer so that we can send back valid data,
including what would otherwise be a 'flag' value.  We can return all
zeros if needed, and still not cause confusion.

Second, the two return statements are changed: the error case returns
NULL, and a successful pop returns the _address_ of the array element.
Using the address-of operator was necessary, so that we matched our
promised return type.

The code calling the pop function needs just a little change as well:

    int main(void){
        int i = 0;
        int *val;
        counter = 0;
        for (i=0; i < MAX; i++){
            push(i);
        }
        /* intentional underflow */
        for (i=0; i < MAX+1; i++){
            val = pop();
            if (val==NULL) {
                printf("error detected.\n");
            }
            else {
                printf("Stack[%d] = %d\n", i, *val );
            }
        }
        return 0;
    }

Note that we've declared a new variable, an int* called val, so we can
hold the address returned from a call to pop.  We also use some error
handling around the pop function calls.  If NULL is returned, we know
there's been a mishap.  Else, we print the data.
Note that our 'for' loop now iterates through the size of the array
plus one--this allows us to intentionally cause an underflow and test
our error handling.

All together, our new stack program might read:

    /* stack2.c */
    #include <stdio.h>

    #define MAX 25

    int counter;
    int stack[MAX];

    void push (int x) {
        if (counter>=MAX) {
            printf("Stack full.\n");
            return;
        }
        stack[counter] = x;
        counter++;
    }

    int* pop(void){
        if (counter<=0) {
            printf("Stack underflow.\n");
            return NULL;
        }
        counter--;
        return &stack[counter];
    }

    int main(void){
        int i = 0;
        int *val;
        counter = 0;
        for (i=0; i < MAX; i++){
            push(i);
        }
        for (i=0; i < MAX+1; i++){
            val = pop();
            if (val==NULL) {
                printf("error detected.\n");
            }
            else {
                printf("Stack[%d] = %d\n", i, *val );
            }
        }
        return 0;
    }

As you study C in more classes, you will find that many library
functions that return pointers use a NULL return value to indicate
error.  (One common exception: those functions that return a count of
values processed--such as bytes that were transferred in--will
sometimes return -1, or some defined error value.)

c. Summary
----------------------

This section has covered the following points:

  a) A stack is a stack.  You can code it using dynamic or static
     blocks of memory.

  b) The return value from functions is difficult to manage.  It's a
     good idea to indicate success/failure from a function call.

  c) Returning pointers allows you to indicate error by returning
     NULL.  That way, other data won't be confused by "flag" values.

  d) This puts a premium on checking return values from function
     calls.  For cs1312, you probably will not have problems if you
     neglect to check a few return values.  But that would start a bad
     habit--one that is sure to burn you later.  Get used to checking
     return values.

___________________________________________________________________________
=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%=x%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

3. A Graph Search in C
-------------------------------------

You're probably saying to yourself: "A graph search in C?  That's
nuts!"  Well, we've done quite a bit in C already, and a graph search
is not that much of a reach.

To keep things simple, we'll create a DFS search that makes use of the
stack idea seen above.  The particular graph problem we'll model will
involve a set of nodes that represent cities.  Here's a sketch of the
graph:

               800
   Atlanta ----------- Miami
      |                |\
      |               /  \
      | 600       85 /    \ 925
      |             /      \
      |            /        \
   Dallas ---- Tampa ----- Tallahassee
          175     \  250      /
                   \         /
                130 \       / 195
                     \     /
      Chicago ------- London
                240

a. Graph Representation
-----------------------------

First, let's select a means of representing the graph in memory.  For
a simple problem, it might be best to design a struct that holds the
edge data.  The struct could have two char arrays to represent the
"from" and "to" cities, as well as a field for the distance.  For
reasons to be discussed later, we'll also have a flag in the struct
called "backtrack".  (This will prove useful during searching.)  With
associated variables our declaration might appear as:

    struct _edge {
        char from[25];
        char to[25];
        int distance;
        int backtrack;  /* used in searching */
    };

    struct _edge graph_edge[MAX];
    int graph_size;

We could fuss with the variable names, but this gets the idea across.

Next, let's consider how we can populate this graph.  It might be
useful to hold this information in a file on disk.  This could get
complicated.  To represent this graph in a file might be tricky; we
don't want to select an encoding scheme (commas, semi-colons, etc.)
that proves too difficult to unpack.  (Other classes will teach you
about 'scanning' input streams.)  So, we'll just use a very simple
technique of placing the "from" city on one line, the "to" city on the
next, and the distance value on the third.  We'll also simplify the
reading process by placing a single int at the top of the file,
indicating how many edges we can expect.
So our file on disk appears as:

    9              <----------- Means there are 9 edges total
    Atlanta
    Miami
    800
    Atlanta        <---- from Atlanta
    Dallas         <---- to Dallas
    600            <---- 600
    Dallas
    Tampa
    175
    Tampa
    Miami
    85
    Tampa
    Tallahassee
    250
    Tampa
    London
    130
    Miami
    Tallahassee
    925
    Tallahassee
    London
    195
    London
    Chicago
    240

The arrow comments to the side are not part of the file, but appear
above to indicate what individual items mean.  This structure is
merely a representation of the graph.

Reading in this data is simple enough.  We can open a file stream
with:

    FILE *fp;
    fp = fopen("file_name", "r");

To read in the entire file, we just need a temporary variable for our
counter, a 'global' value to hold the number of edges read in, and a
FILE* reference.

    void read_data(const char* file){
        int temp;
        FILE *fp;
        temp = 0;
        fp = fopen(file, "r");
        if (NULL == fp) {
            printf ("Unable to read file: %s\n", file);
            exit(1);
        }
        fscanf(fp, "%d", &graph_size);
        for (temp=0; temp < graph_size; temp++){
            /* a char array already acts as a char*, so no '&' here */
            fscanf(fp, "%24s", graph_edge[temp].from);
            fscanf(fp, "%24s", graph_edge[temp].to);
            fscanf(fp, "%d", &graph_edge[temp].distance);
        }
        fclose(fp);
    }

Because we want to keep this example simple we don't do any elaborate
error handling.  We instead just assume the text file on disk is
accurate (and holds no more than MAX edges).

At this point, we could even code a quick and dirty "dump_graph"
function to verify that we've read things in correctly.

    void dump_graph(void) {
        int temp;
        for(temp=0; temp < graph_size; temp++){
            printf("Edge [%d]: from %s to %s, distance %d\n",
                   temp,
                   graph_edge[temp].from,
                   graph_edge[temp].to,
                   graph_edge[temp].distance);
        }
    }

This merely iterates through the graph, and prints each edge in the
array.

b. Testing
--------------

At this point, it might be useful to test our program out, merely to
see if we can populate the array and dump its contents.
Our program at this point might appear as:

    #include <stdio.h>
    #include <stdlib.h>   /* for exit() */

    #define MAX 100
    #define FILE_NAME "graph_data.txt"

    struct _edge {
        char from[25];
        char to[25];
        int distance;
        int backtrack;  /* used in searching */
    };

    struct _edge graph_edge[MAX];
    int graph_size;

    void read_data(const char* file){
        int temp;
        FILE *fp;
        temp = 0;
        fp = fopen(file, "r");
        if (NULL == fp) {
            printf ("Unable to read file: %s\n", file);
            exit(1);
        }
        fscanf(fp, "%d", &graph_size);
        for (temp=0; temp < graph_size; temp++){
            fscanf(fp, "%24s", graph_edge[temp].from);
            fscanf(fp, "%24s", graph_edge[temp].to);
            fscanf(fp, "%d", &graph_edge[temp].distance);
        }
        fclose(fp);
    }

    void dump_graph(void) {
        int temp;
        for(temp=0; temp < graph_size; temp++){
            printf("Edge [%d]: from %s to %s, distance %d\n",
                   temp,
                   graph_edge[temp].from,
                   graph_edge[temp].to,
                   graph_edge[temp].distance);
        }
    }

    int main (void){
        read_data(FILE_NAME);
        dump_graph();
        return 0;
    }

Testing shows that it works well enough, given the assumptions we've
made about the input data.

c. Graph Functions
-----------------------

Next, let's add a function that will identify the distance between two
cities.  Ideally, we'd like to specify an arbitrary start and end.  So
the function might take in two pointers, and return the distance, if
found.

    int get_distance(char *from, char *to){
        int temp = 0;
        while (temp < graph_size) {
            if (!strcmp(graph_edge[temp].to, to) &&
                !strcmp(graph_edge[temp].from, from)) {
                return graph_edge[temp].distance;
            }
            temp++;
        }
        return 0;   /* no edge exists */
    }

The "strcmp" function is declared in "string.h", so we'll have to
"#include <string.h>" as well as "#include <stdio.h>".  The function is
similar to the "equals()" method in Java, except that it works only
with null-terminated char arrays, and it signals equality differently:
if there are no differences between two strings, strcmp returns 0.
So, if calls to strcmp both return 0 for the 'to' and 'from' cities,
we know that our search of the array has come to an end.

d. Path Functions and Data Representation
----------------------------------------------

Now that we have a graph structure in memory, and a few functions to
inspect the graph, let's turn our attention to the path.  How can we
represent the path discovered during a DFS?

One solution would be to store the path developed during the DFS in a
stack.  We therefore declare a new struct, and some associated
variables:

    struct _stack {
        char from[25];
        char to[25];
        int distance;
    };

    int stack_counter;
    struct _stack stack[MAX];   /* To hold our search result */

This is similar to the graph_edge structure, except we do not have to
provide flags to avoid backtracking.  We also have a counter, to keep
track of the current stack size.

The push and pop functions that service this stack are a little
different from the above example.  The previous example merely pushed
and popped single ints.  In our case, we wish to push and pop three
values at a time: to, from, and distance.

We could pass in a struct pointer (in fact, this would be more
efficient), but to keep it simple, we'll adopt a very basic approach:
when data is pushed onto the stack, it is copied into the stack
structure.  (Obviously, it would be far better to have the stack
struct hold pointers, and merely adjust what is pointed to; but this is
a basic DFS, not an efficient one.  We can later adapt this program to
use pointers if necessary.)

The push function would therefore take in three parameters:

    void push(char *from, char *to, int d){
        if (stack_counter<MAX) {
            strcpy(stack[stack_counter].from, from);
            strcpy(stack[stack_counter].to, to);
            stack[stack_counter].distance = d;
            stack_counter++;
        }
        else {
            printf("Stack full\n");
        }
    }

The pop function works in reverse.  It takes in two buffers and a
pointer to an int, and copies the top entry of the stack back out
through them:

    void pop(char *from, char *to, int *d){
        if (stack_counter>0) {
            stack_counter--;
            strcpy(from, stack[stack_counter].from);
            strcpy(to, stack[stack_counter].to);
            *d = stack[stack_counter].distance;
        }
        else {
            printf("Stack underflow\n");
        }
    }

This keeps the example simple, if somewhat inefficient.  Again, it
would have been more efficient to return a pointer to a struct in
memory.

e. DFS Search
---------------------

We next turn to an implementation of the DFS.
During critical parts of the search, we will be "on" a certain node,
and will need to know what nodes are adjacent.  Since we did not
represent the nodes in their own data structure, we'll have to
improvise a function:

    int find_adjacency(char *from, char *temp_buffer){
        int temp=0;
        while(temp < graph_size) {
            if (!strcmp(graph_edge[temp].from, from) &&
                !graph_edge[temp].backtrack){
                strcpy(temp_buffer, graph_edge[temp].to);
                graph_edge[temp].backtrack = 1;
                return graph_edge[temp].distance;
            }
            temp++;
        }
        return 0;
    }

The find_adjacency function takes in a starting node (char *from), and
also a temporary buffer where it can copy the results.  (Again,
returning a pointer would be faster, but perhaps a bit more
complicated.)  If a node has adjacencies that have not been visited in
the search (determined by the 'backtrack' flag in the graph_edge
struct), it copies the destination name into the temporary buffer,
and returns the distance value.

All that remains is the DFS itself.  The search function will
manipulate the stack, creating a path from start to finish.  It does
this by allocating a temporary character buffer, and filling this
buffer with the name of the node adjacent to the current city.  (The
find_adjacency function above populates the temporary buffer.)  If no
new adjacency is found, the DFS "back tracks" by popping off cities
from the stack.  (Recall that pop is an operation that alters the
parameters passed in.)

    void find_path(char *from, char *to){
        int d;
        char temp_buffer[25];
        d = get_distance(from, to);
        if (d!=0) {
            push(from, to, d);
            return;
        }
        d = find_adjacency(from, temp_buffer);
        if (d != 0){
            push(from, to, d);
            find_path(temp_buffer, to);
        }
        else if (stack_counter > 0){
            pop(from, to, &d);
            find_path(from, to);
        }
    }

After this, we need only write a main function that strings these
functions together, and prompts the user for a start/end city.  (The
show_path function it calls appears in the complete listing.)

    int main (void){
        char from[25], to[25];
        read_data(FILE_NAME);
        dump_graph();
        printf("What is the start node? ");
        scanf ("%24s", from);
        printf("What is the end node? ");
        scanf ("%24s", to);
        find_path(from, to);
        /* print out the resulting path */
        show_path(to);
        return 0;
    }

All together, our graph program might appear as:

    /*
     * graph.c -- reads in a graph from disk and performs a
     *            simple dfs.
     */

    #include <stdio.h>
    #include <stdlib.h>   /* for exit() */
    #include <string.h>

    #define MAX 100
    #define FILE_NAME "graph_data.txt"

    /*=======================================================*/
    /*            D A T A   S T R U C T U R E S              */
    /*=======================================================*/

    /*------------------------ G r a p h --------------------*/
    struct _edge {
        char from[25];
        char to[25];
        int distance;
        int backtrack;  /* used in searching */
    };

    struct _edge graph_edge[MAX];
    int graph_size;

    /*-------------------------- P a t h ---------------------*/
    struct _stack {
        char from[25];
        char to[25];
        int distance;
    };

    int stack_counter;
    struct _stack stack[MAX];   /* To hold our search result */

    /*=======================================================*/
    /*               S T A C K   M E T H O D S               */
    /*=======================================================*/

    /*
     * Pushes a value onto the stack
     */
    void push(char *from, char *to, int d){
        if (stack_counter<MAX) {
            strcpy(stack[stack_counter].from, from);
            strcpy(stack[stack_counter].to, to);
            stack[stack_counter].distance = d;
            stack_counter++;
        }
        else {
            printf("Stack full\n");
        }
    }

    /*
     * Pops a value off the stack, copying it into the
     * parameters passed in
     */
    void pop(char *from, char *to, int *d){
        if (stack_counter>0) {
            stack_counter--;
            strcpy(from, stack[stack_counter].from);
            strcpy(to, stack[stack_counter].to);
            *d = stack[stack_counter].distance;
        }
        else {
            printf("Stack underflow\n");
        }
    }

    /*=======================================================*/
    /*               G R A P H   M E T H O D S               */
    /*=======================================================*/

    /*
     * Determines the distance between two cities, if any.
     *
     */
    int get_distance(char *from, char *to){
        int temp = 0;
        while (temp < graph_size) {
            if (!strcmp(graph_edge[temp].to, to) &&
                !strcmp(graph_edge[temp].from, from)) {
                return graph_edge[temp].distance;
            }
            temp++;
        }
        return 0;  /* no edge exists */
    }

    /*
     * Finds an adjacent city, but avoids cycles.  This
     * is called during a search to find new paths.
     *
     */
    int find_adjacency(char *from, char *temp_buffer){
        int temp=0;
        while(temp < graph_size) {
            if (!strcmp(graph_edge[temp].from, from) &&
                !graph_edge[temp].backtrack){
                strcpy(temp_buffer, graph_edge[temp].to);
                graph_edge[temp].backtrack = 1;
                return graph_edge[temp].distance;
            }
            temp++;
        }
        return 0;
    }

    /*=======================================================*/
    /*          S E A R C H I N G   M E T H O D S            */
    /*=======================================================*/

    /*
     * The dfs
     */
    void find_path(char *from, char *to){
        int d;
        char temp_buffer[25];
        d = get_distance(from, to);
        if (d!=0) {
            push(from, to, d);
            return;
        }
        d = find_adjacency(from, temp_buffer);
        if (d != 0){
            push(from, to, d);
            find_path(temp_buffer, to);
        }
        else if (stack_counter > 0){
            pop(from, to, &d);
            find_path(from, to);
        }
    }

    /*
     * Displays the path.  When the search is completed,
     * the stack will contain the path.  This
     * function merely dumps this out.
     *
     */
    void show_path(char *destination) {
        int total_dist, temp;
        total_dist = temp = 0;
        while (temp < stack_counter){
            printf("%s to ", stack[temp].from);
            total_dist += stack[temp].distance;
            temp++;
        }
        printf("%s\n", destination);
        printf("Total distance is %d\n", total_dist);
    }

    /*=======================================================*/
    /*             U T I L I T Y   M E T H O D S             */
    /*=======================================================*/

    /*
     * Reads in data from a file.  Assumptions are made about
     * the formatting of this file.
     *
     */
    void read_data(const char* file){
        int temp;
        FILE *fp;
        temp = 0;
        fp = fopen(file, "r");
        if (NULL == fp) {
            printf ("Unable to read file: %s\n", file);
            exit(1);
        }
        /* find out how many edges are in the file */
        fscanf(fp, "%d", &graph_size);
        /* read in each edge */
        for (temp=0; temp < graph_size; temp++){
            fscanf(fp, "%24s", graph_edge[temp].from);
            fscanf(fp, "%24s", graph_edge[temp].to);
            fscanf(fp, "%d", &graph_edge[temp].distance);
        }
        fclose(fp);
    }

    /*
     * A utility method to dump out the graph in memory.
     */
    void dump_graph(void) {
        int temp;
        for(temp=0; temp < graph_size; temp++){
            printf("Edge [%d]: from %s to %s, distance %d\n",
                   temp,
                   graph_edge[temp].from,
                   graph_edge[temp].to,
                   graph_edge[temp].distance);
        }
    }

    /*=======================================================*/
    /*               M A I N   F U N C T I O N               */
    /*=======================================================*/

    /*
     * Main
     */
    int main (void){
        char from[25], to[25];
        read_data(FILE_NAME);
        printf("What is the start node? ");
        scanf ("%24s", from);
        printf("What is the end node? ");
        scanf ("%24s", to);
        dump_graph();
        find_path(from, to);
        show_path(to);
        return 0;
    }

f. Summary
------------------

This section considered the following:

  a) A graph search is not such a big deal, even in C.  It's just a
     matter of breaking it down into small functions, and testing them
     as you go along.

  b) The selection of data structures in C will often determine the
     efficiency of your functions.  In the example above, the structs
     were made to hold char arrays, and not pointers to chars.  This
     meant that data had to be copied back and forth in our search.
     This was slower, but perhaps easier to follow for a first-time
     graph search.  A better solution would be to rewrite the
     structures so that pointer manipulation is used.

  c) What data gets represented in memory is also a critical factor in
     your search.  Above, we hold only adjacencies.  Since we don't
     have a separate "node" structure, we have to code some helper
     functions for the search.
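To give a feel for the pointer-based rewrite suggested in point (b),
here is one possible sketch, not a drop-in replacement for graph.c.
It assumes a path stack that stores the addresses of existing edge
structs instead of copying their contents.  The names struct edge,
push_edge, pop_edge, and path_distance are all hypothetical, chosen
only for this illustration.

```c
#include <stddef.h>   /* for NULL */

#define MAX 100

/* Same fields as the edge struct in the notes (tag name assumed). */
struct edge {
    char from[25];
    char to[25];
    int distance;
};

/* The path holds pointers to edges; no strings are copied. */
static const struct edge *path[MAX];
static int path_size = 0;

/* Returns 1 if the pointer was stored, 0 if the path is full. */
int push_edge(const struct edge *e) {
    if (path_size >= MAX) {
        return 0;
    }
    path[path_size] = e;
    path_size++;
    return 1;
}

/* Returns the most recently pushed edge, or NULL on underflow. */
const struct edge *pop_edge(void) {
    if (path_size <= 0) {
        return NULL;
    }
    path_size--;
    return path[path_size];
}

/* Sums the distances of every edge currently on the path. */
int path_distance(void) {
    int i, total = 0;
    for (i = 0; i < path_size; i++) {
        total += path[i]->distance;
    }
    return total;
}
```

Pushing and popping now move a single pointer rather than two
25-character arrays, and NULL doubles as the error value, just as in
the int stack of section 2.  The pointers stay valid only as long as
the edges they refer to do, which is fine for a global graph array.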