In this assignment, you'll get practice working with strings and arrays. Manipulating text strings and performing array operations are two of the most common tasks in computer programs, and they are among the most useful skills you will learn in this course.
You'll write a Java program that reads in a large body of text (e.g., a chapter of a book) and produces some statistics from it. More specifically, your program will read an integer N from the command line that specifies how many unique words at the start of the text to "track." Your objective is to count how often each of these N words occurs in the entire text (including its initial appearance). Additionally, your program will identify the longest sentence in the text, where the length of a sentence is measured by the number of words in it. More details about these tasks are described below. Be sure to read the entire assignment before writing this program. Note that this is a basic command-line program without any GUI.
The first thing this program must do is find the frequencies of the first N unique words in the input text, where N is the command-line argument. This may be broken into several steps:
If you are using jGRASP, select the "Run Arguments" and "Run in MSDOS Window" options under the Run menu for the file that you are working on. If these two options are not selected, your program will not function correctly in jGRASP. You will then be able to enter a command-line argument in the text field above the source code.
If you are using the command line, provide the command-line argument directly after the class name (in this example, we assume that the class is called TextAnalyzer, compiled to TextAnalyzer.class):
% java TextAnalyzer 1000
Try printing out the value of N to make sure that you are reading the command-line argument correctly.
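A minimal sketch of reading N is shown below. The class name ArgDemo and the helper parseN are hypothetical, and rejecting a zero or negative N is an assumption that the assignment does not state.

```java
// Hypothetical class name; in your assignment this logic would live in TextAnalyzer.
public class ArgDemo {

    // Parse the first command-line argument as N, the number of unique words to track.
    static int parseN(String[] args) {
        if (args.length < 1) {
            throw new IllegalArgumentException("Usage: java TextAnalyzer N");
        }
        // Integer.parseInt throws NumberFormatException for non-integer input.
        int n = Integer.parseInt(args[0]);
        // Assumption: N must be positive; the assignment does not say so explicitly.
        if (n <= 0) {
            throw new IllegalArgumentException("N must be positive, but got " + n);
        }
        return n;
    }

    public static void main(String[] args) {
        int n = parseN(args);
        // Echo N back to confirm the argument was read correctly.
        System.out.println("N = " + n);
    }
}
```

Running "java ArgDemo 1000" prints "N = 1000". Putting the parsing in its own method keeps main short and makes the argument check easy to test on its own.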
Your program should keep reading words for as long as input is available, which allows it to handle multiple lines of text. Keep in mind that you should ignore punctuation when deciding whether two words match. As the example run below shows, capitalization is ignored as well ("The" and "the" count as the same word).
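One way to structure the word counting with parallel arrays is sketched below. It assumes Scanner-based reading, that "ignoring punctuation" means deleting every character that is not an ASCII letter or digit, and that matching is case-insensitive, as the sample run suggests. The class WordFrequencySketch and its helper methods are hypothetical names, not part of the assignment.

```java
import java.util.Scanner;

// Hypothetical class name; a sketch, not a reference solution.
public class WordFrequencySketch {

    // Assumed normalization: drop every character that is not an ASCII letter
    // or digit, then lowercase, so "The", "the" and "the." all compare equal.
    static String normalize(String token) {
        return token.replaceAll("[^A-Za-z0-9]", "").toLowerCase();
    }

    // Returns the index of word in words[0..size), or -1 if it is not tracked.
    static int indexOf(String[] words, int size, String word) {
        for (int i = 0; i < size; i++) {
            if (words[i].equals(word)) {
                return i;
            }
        }
        return -1;
    }

    // Reads tokens until input runs out. The first n distinct words become the
    // tracked words, and every occurrence of a tracked word is counted.
    // Returns how many words were tracked (fewer than n for short input).
    static int countWords(Scanner in, int n, String[] words, int[] counts) {
        int size = 0;
        while (in.hasNext()) {
            String word = normalize(in.next());
            if (word.isEmpty()) {
                continue; // the token was pure punctuation, e.g. "--"
            }
            int i = indexOf(words, size, word);
            if (i >= 0) {
                counts[i]++;
            } else if (size < n) {
                words[size] = word;
                counts[size] = 1;
                size++;
            }
        }
        return size;
    }

    public static void main(String[] args) {
        int n = Integer.parseInt(args[0]);
        String[] words = new String[n];
        int[] counts = new int[n];
        int size = countWords(new Scanner(System.in), n, words, counts);
        System.out.println("Word frequencies:");
        for (int i = 0; i < size; i++) {
            System.out.println(words[i] + ": " + counts[i]);
        }
    }
}
```

Given the sample text from the example run with N = 5, countWords tracks the, beginning, this, is, and middle with counts 3, 1, 3, 3, and 1, matching the sample output. This normalization also strips non-ASCII letters (for example, é), so adjust the pattern if your input needs them.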
The second thing this program must do is find the longest sentence in the input text, i.e., the sentence with the greatest number of words. For simplicity, you may assume that sentences are separated by a period ('.'), an exclamation point ('!'), or a question mark ('?').
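A possible sketch of the longest-sentence search follows. It assumes that any whitespace-separated token whose last character is '.', '!' or '?' ends a sentence, that the first sentence wins on a tie, and that trailing words with no terminator form a final sentence. The class LongestSentenceSketch is a hypothetical name.

```java
import java.util.Scanner;

// Hypothetical class name; a sketch, not a reference solution.
public class LongestSentenceSketch {

    // Assumption: a token ends a sentence if its last character is '.', '!' or '?'.
    // A closing quote after the punctuation (e.g. end.") is not recognized.
    static boolean endsSentence(String token) {
        char last = token.charAt(token.length() - 1);
        return last == '.' || last == '!' || last == '?';
    }

    // Returns the sentence with the most words, as its original tokens joined
    // by single spaces. On a tie, the earliest such sentence is kept.
    static String longestSentence(Scanner in) {
        StringBuilder current = new StringBuilder();
        int currentWords = 0;
        String best = "";
        int bestWords = 0;
        while (in.hasNext()) {
            String token = in.next(); // Scanner tokens are never empty
            if (currentWords > 0) {
                current.append(' ');
            }
            current.append(token);
            currentWords++;
            if (endsSentence(token)) {
                // Strictly greater, so the first of several equal-length sentences wins.
                if (currentWords > bestWords) {
                    best = current.toString();
                    bestWords = currentWords;
                }
                current.setLength(0);
                currentWords = 0;
            }
        }
        // Words after the last terminator are treated as one final sentence.
        if (currentWords > bestWords) {
            best = current.toString();
        }
        return best;
    }

    public static void main(String[] args) {
        String best = longestSentence(new Scanner(System.in));
        System.out.println("Longest sentence: \"" + best + "\"");
    }
}
```

On the sample text from the example run, longestSentence returns "This is another line of input." Because standard input can only be read once, your actual TextAnalyzer would need to find the longest sentence and count word frequencies in the same reading loop rather than in two separate passes.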
Here's an example run with N = 5. If you are using jGRASP, enter the value "5" in the Run Arguments text field above your program's source code. In the run below, the text typed between the command and <Ctrl-Z> is keyboard input; everything after it is program output. Pressing Ctrl-Z (on Windows, followed by Enter) signals the end of input, so your program stops reading. On UNIX-like systems such as Linux and macOS, use Ctrl-D instead.
% java TextAnalyzer 5
The beginning. This is the middle sentence. This is the end. 1234 1234 1234. This is another line of input. Here's another.
<Ctrl-Z>
Word frequencies:
the: 3
beginning: 1
this: 3
is: 3
middle: 1
Longest sentence: "This is another line of input."
After testing your program a little, you will notice that retyping the sample text over and over quickly becomes tedious. To avoid this, you can have your program read its input from a file via input redirection. (Later in the term you will learn how to read from files directly, but you don't need to do that now.) To redirect input, use the less-than sign '<', which tells the shell to feed the file named after the sign to your program as its standard input. Unfortunately, jGRASP does not appear to handle this properly: it treats the '<' as just another item in the array of command-line arguments. Thus, to use input redirection, you will need to run your program from the command line. But never fear, we've been doing that all term, and it's always good practice. We will provide a couple of sample input files, hw7-alice1.txt and hw7-alice2.txt, for you to try out. Download those files and/or create some of your own to test your program on.
When using the command line, you will run your program like this:
% java TextAnalyzer 5 < hw7-alice1.txt
Note that you do not need to change your program at all to get this functionality. The same program handles both text typed at the keyboard, as in the example run above, and text redirected from a file, as just discussed. In both cases, your program simply reads from the standard input stream, System.in, in exactly the same way.
You should also create some of your own text files to try out as input. On a really big file, provide an N of something like 1000 and check that your program still produces correct results. The Palindrome, Pig Latin, and syllable counter programs that we have discussed in class and that appear in your book may be helpful.
Potential extra credit for ambitious students: once your program has done its work from the command line, i.e., it has read the input and calculated its values, have it pop up a graphical display that presents statistics about the text. You might draw a bar chart of the word frequencies, show the longest sentence, and so on. But make sure that your program still takes its input from the command line as described above.
After you have finished your program, turn in the files via Webwork. You will be submitting multiple files, so please make sure they are named as shown below: