Presidential and Vice Presidential Debate Stats and Script
This page collects resources for basic textual analysis of the presidential
and vice presidential debates. A self-contained Python script parses each
debate and outputs a number of files listing:
- summary statistics of the speakers' responses (e.g., average response length in words),
- the most frequently used words and phrases,
- a compressed timeline showing when phrases were repeated,
- a list of the longest words used by each candidate, and
- each speaker's responses independent of the other speakers' speech.
Statistics for all televised debates are available.
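As an illustration of the first kind of statistic, here is a minimal sketch of computing average response length in words. It is not the code from debate_stats.py: the function name and the "SPEAKER: text" transcript layout are assumptions made for this example.

```python
from collections import defaultdict

def average_response_length(transcript):
    """Return {speaker: mean words per response}.

    Assumes (hypothetically) one response per line, formatted as
    'SPEAKER: text'. Lines without a colon or without text are skipped.
    """
    totals = defaultdict(int)   # total words spoken per speaker
    counts = defaultdict(int)   # number of responses per speaker
    for line in transcript.splitlines():
        speaker, sep, text = line.partition(":")
        if not sep or not text.strip():
            continue
        speaker = speaker.strip().upper()
        totals[speaker] += len(text.split())
        counts[speaker] += 1
    return {s: totals[s] / counts[s] for s in totals}

# Made-up sample text, not taken from the debates.
sample = """\
KERRY: We can do better than this.
BUSH: It is hard work.
KERRY: I have a plan.
"""
print(average_response_length(sample))
```

Splitting on whitespace is the simplest possible definition of a "word"; a real transcript would likely need extra handling for punctuation and moderator text.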
Sample findings
A word of caution: these statistics count exact word matches, so they ignore
semantically similar content expressed in different words. Word stemming is
used to help normalize the stats (e.g., "lasts", "lasted", and "lasting" all
reduce to "last"), but stemming is not foolproof. Where stemming is used, all
variations encountered are listed.
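To make the stemming caveat concrete, here is a rough sketch that groups word variations under a shared stem. The suffix rules are a toy stand-in chosen for this example, not the stemmer the script uses, and the function names are hypothetical.

```python
from collections import defaultdict

# Toy suffix rules, checked in order. This is an assumption for
# illustration only; real stemmers use far more careful rules.
SUFFIXES = ("ing", "ed", "s")

def toy_stem(word):
    """Strip the first matching suffix, keeping at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def group_by_stem(words):
    """Map each stem to the sorted list of variations encountered."""
    groups = defaultdict(set)
    for w in words:
        groups[toy_stem(w)].add(w.lower())
    return {stem: sorted(variants) for stem, variants in groups.items()}

print(group_by_stem(["lasts", "lasted", "lasting", "last"]))
```

The same rules also trim "this" to "thi", which is the kind of mistake that makes listing every variation alongside its stem worthwhile.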
First Presidential Debate
Here are a few of the findings from the first debate, in no particular order:
- Kerry's average response length was 215 words; Bush's was 155
- Bush said the phrase "hard work" 11 times
- Bush said "Saddam Hussein" 9 more times than he said "Bin Laden" (14 vs. 5
occurrences)
- Bush said "free Iraq" 12 times. Kerry did not say that exact phrase at all
- Bush said "mixed messages" 5 times
- The longest words Bush uttered were "transformational" and
"counterterrorism". Kerry's were "extraordinarily" and "authoritatively".
Vice Presidential Debate
- Edwards' average response length was 181 words; Cheney's was 165
- The top two-word phrases used by Edwards:
  - vice president (48 times, the #1 two-word phrase)
  - John Kerry (33 times)
  - American people (26 times)
  - health care (23 times)
- For Cheney, the top two-word phrases (filtering out some common phrases,
  such as "we've got"):
  - I think (23 times, the #1 two-word phrase)
  - United States (11 times)
  - Saddam Hussein (9 times)
  - John Kerry (9 times)
- "Bin Laden" was said 2 times by Cheney, 8 times by Edwards
Second Debate
- In this debate, Bush did not use the phrase "mixed messages" at all
- Bush's average response length rose to 181 words, up from 155 words in the
first debate
- The second most frequently used 2-word phrase by Kerry in the second
debate was "health care", which he said 19 times. (The number one
two-word phrase used by Kerry was "I'm going")
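The two-word phrase counts in these findings can be approximated with a short sketch like the one below. It is an illustration under stated assumptions, not the script's actual method: the function name, the tokenizing regular expression, and the `exclude` parameter for filtering common phrases are all hypothetical.

```python
import re
from collections import Counter

def top_bigrams(text, n=3, exclude=()):
    """Return the n most frequent two-word phrases in text.

    Words are lowercased runs of letters and apostrophes. Pairs are
    formed across sentence boundaries, which a more careful version
    would probably avoid.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(" ".join(pair) for pair in zip(words, words[1:]))
    for phrase in exclude:
        counts.pop(phrase, None)  # drop filtered common phrases
    return counts.most_common(n)

# Made-up sample text, not taken from the debates.
sample = "I think so. I think not. I think we've got it, we've got it."
print(top_bigrams(sample, n=2, exclude={"we've got"}))
```

Because this sketch matches exact strings, it shares the limitation noted earlier: differently worded phrases with the same meaning are counted separately.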
Third Debate
Stats for the final debate are posted, but I have not analyzed them.
Downloads
You can download the Python script to generate the stats on your own, or you can
download the files it generated directly. The script is completely
self-contained, and even includes the debate text.
- debate_stats.py (the script)
- debate_stats.zip (all the generated stats in one zip file; includes the
script)
The output files are tab-delimited data files that can be read into other
software, such as Matlab or Excel. Comment lines currently begin with "% ",
which suits Matlab; you can delete those lines to import the files more easily
into something like Excel.
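If you would rather read the files back into Python, a sketch along these lines should work. It assumes only what is described above (tab-delimited rows, with comment lines starting with "%"); the function name and the two-column sample layout are hypothetical, so check the real files for their actual columns.

```python
import csv
import io

def read_stats(lines):
    """Parse tab-delimited rows, skipping '%' comment lines and blanks."""
    data_lines = (ln for ln in lines if not ln.startswith("%"))
    return [row for row in csv.reader(data_lines, delimiter="\t") if row]

# In-memory stand-in for an output file; the column layout is assumed.
sample = io.StringIO("% phrase\tcount\nhard work\t11\nmixed messages\t5\n")
rows = read_stats(sample)
print(rows)
```

With a real file, pass an open file object instead, for example `open(path, newline="")` where `path` points at one of the generated stats files. Values come back as strings, so convert counts with `int()` as needed.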
For repeated phrases, there are now HTML files that format the output into
tables more conducive to browsing. Text files remain for importing the data
into third-party software.
To give you an example of the files' contents, here are the most common
phrases, composed of two words, uttered by Kerry and Bush in the first debate:
Here are the individual files:
Have fun,
Mike (mterry@cc.gatech.edu)