I. Public versus private
(This lecture was given by Mitch; here are my notes for the
same material.)
In the previous lecture, we suggested that there may be a potential
problem with our Python simulation of my Coke Machine empire.
In particular, that numcans variable is pretty much accessible from
everywhere. So any old method from any object can not only read what's
in there, but write over it too. This could lead to problems,
especially in large programming projects.
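To make the worry concrete, here's a minimal sketch. The class below is a cut-down stand-in for the CokeMachine from the previous lecture (an assumption for illustration, not the full class), and print is written with parentheses so the sketch should run under either Python 2 or Python 3:

```python
class CokeMachine:              # cut-down stand-in, for illustration only
    def __init__(self):
        self.numcans = 20

cs = CokeMachine()
cs.numcans = -5                 # any outside code can overwrite the count
print(cs.numcans)               # the machine now claims a nonsense inventory
```

Nothing in the language objects to the assignment; the object simply holds whatever the last writer put there.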
In Python, by default, all attributes of an object are "public"...all
attributes of a class instance are accessible without any restrictions. As
Beazley's "Python: Essential Reference" goes on to say:
It also implies that everything defined in a base class is inherited and
accessible within a derived class. This behavior is often undesirable in
object-oriented applications because it exposes the internal implementation
of an object and it can lead to namespace conflicts between objects defined
in a derived class and those defined in a base class.
To fix this problem, all names in a class that start with a double
underscore, such as __Foo, are mangled to form a new name of the form
_Classname__Foo. This effectively provides a way for a class to have
private attributes, since private names used in a derived class won't
collide with the same private names used in a base class. For example:
class A:
    def __init__(self):
        self.__X = 3        # Mangled to self._A__X
class B(A):
    def __init__(self):
        A.__init__(self)
        self.__X = 37       # Mangled to self._B__X
Although this scheme provides the illusion of data hiding, there's no strict
mechanism in place to prevent access to the "private" attributes of a class.
In particular, if the name of the class and corresponding private attribute
are known, they can be accessed using the mangled name.
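Here's a small sketch of that last point, using the A and B classes from the quoted example (with B derived from A). It's written so that it should run under either Python 2 or Python 3:

```python
class A:
    def __init__(self):
        self.__X = 3            # stored on the instance as _A__X

class B(A):
    def __init__(self):
        A.__init__(self)
        self.__X = 37           # stored on the instance as _B__X

b = B()
print(sorted(vars(b).keys()))   # both mangled names live on the same object
print(b._A__X)                  # the "private" value from A is reachable anyway
print(b._B__X)                  # and so is the one from B
```

The two attributes don't collide, which is the whole point of mangling, but neither one is hidden from anybody who knows the class name.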
So in short, while Java for example enforces a distinction between public and
private attributes, in Python everything is public. With Python's help you
can make stuff look private, but ultimately it's public anyway. For our
example, that's not so important, but in your future you may encounter
situations where not having that real public/private distinction may lead you
to choose some other OO language over Python. That's ok...it's why we took
the time to talk about this.
Let's get back to vending machines. How do we simulate the purchase of a
Coke? We must create a BuyCoke method. It'll be easy. It'll look like this,
after we add it to the CokeMachine class:
class CokeMachine:
    def __init__(self):
        self.numcans = 20
        print "Adding another Coke machine to your empire"
    def BuyCoke(self):
        if self.numcans > 0:
            self.numcans = self.numcans - 1
            print "Have a nice frosty Coke!"
            print "%3d frosty cans of Coke remaining." % (self.numcans)
        else:
            print "Sorry, out of Coke."
Here are some helpful additions to SimCoke:
class SimCoke:
    def __init__(self):
        print " "
        print "Here's the Coke Machine Simulator"
        cs = CokeMachine()
        ee = CokeMachine()
        print "The cs machine has %3d frosty cans of Coke waiting for you." \
              % (cs.numcans)
        print "The ee machine has %3d frosty cans of Coke waiting for you." \
              % (ee.numcans)
        cs.BuyCoke()
        cs.BuyCoke()
        print "The cs machine has %3d frosty cans of Coke waiting for you." \
              % (cs.numcans)
        print "The ee machine has %3d frosty cans of Coke waiting for you." \
              % (ee.numcans)
And here's the output when we run it:
Here's the Coke Machine Simulator
Adding another Coke machine to your empire
Adding another Coke machine to your empire
The cs machine has 20 frosty cans of Coke waiting for you.
The ee machine has 20 frosty cans of Coke waiting for you.
Have a nice frosty Coke!
19 frosty cans of Coke remaining.
Have a nice frosty Coke!
18 frosty cans of Coke remaining.
The cs machine has 18 frosty cans of Coke waiting for you.
The ee machine has 20 frosty cans of Coke waiting for you.
Our Scheme-based Coke machine simulator let us load Cokes into the machine
too. We can do the same thing here with the addition of another method to the
CokeMachine class. We'll call it LoadCoke:
    def LoadCoke(self, loadcans):
        self.numcans = self.numcans + loadcans
        print "%3d cans of Coke added" % (loadcans)
        print "%3d cans of Coke now available" % (self.numcans)
That should be pretty straightforward...nothing weird here, except we're
going to explicitly pass a parameter to LoadCoke. Watch for it in the
ever-growing SimCoke below. Here's the whole thing:
class CokeMachine:
    def __init__(self):
        self.numcans = 20
        print "Adding another Coke machine to your empire"
    def BuyCoke(self):
        if self.numcans > 0:
            self.numcans = self.numcans - 1
            print "Have a nice frosty Coke!"
            print "%3d frosty cans of Coke remaining." % (self.numcans)
        else:
            print "Sorry, out of Coke."
    def LoadCoke(self, loadcans):
        self.numcans = self.numcans + loadcans
        print "%3d cans of Coke added" % (loadcans)
        print "%3d cans of Coke now available" % (self.numcans)
class SimCoke:
    def __init__(self):
        print " "
        print "Here's the Coke Machine Simulator"
        cs = CokeMachine()
        ee = CokeMachine()
        print "The cs machine has %3d frosty cans of Coke waiting for you." \
              % (cs.numcans)
        print "The ee machine has %3d frosty cans of Coke waiting for you." \
              % (ee.numcans)
        cs.BuyCoke()
        cs.BuyCoke()
        print "The cs machine has %3d frosty cans of Coke waiting for you." \
              % (cs.numcans)
        print "The ee machine has %3d frosty cans of Coke waiting for you." \
              % (ee.numcans)
        cs.LoadCoke(10)
        print "The cs machine has %3d frosty cans of Coke waiting for you." \
              % (cs.numcans)
        ee.LoadCoke(-20)
        print "The ee machine has %3d frosty cans of Coke waiting for you." \
              % (ee.numcans)
        ee.BuyCoke()
SimCoke()
And here's some output:
Here's the Coke Machine Simulator
Adding another Coke machine to your empire
Adding another Coke machine to your empire
The cs machine has 20 frosty cans of Coke waiting for you.
The ee machine has 20 frosty cans of Coke waiting for you.
Have a nice frosty Coke!
19 frosty cans of Coke remaining.
Have a nice frosty Coke!
18 frosty cans of Coke remaining.
The cs machine has 18 frosty cans of Coke waiting for you.
The ee machine has 20 frosty cans of Coke waiting for you.
10 cans of Coke added
28 cans of Coke now available
The cs machine has 28 frosty cans of Coke waiting for you.
-20 cans of Coke added
0 cans of Coke now available
The ee machine has 0 frosty cans of Coke waiting for you.
Sorry, out of Coke.
II. More Coke Machine stuff
As I grow my collection of Coke machines, I'd like to be able to find out
easily just how many Coke machines I own. So what I need to do is to have the
Coke machine constructor update some sort of counter for the whole class of
Coke machines every time a new Coke machine is created. To do that, we'll
introduce a variable that belongs to the class, not to any particular
instance or object. This is roughly the equivalent of a static variable in
Java: we're saying that the variable is always around so long as the class
is around, regardless of whether instances (or objects) of the class have
been created.
So we add the totalmachines variable up at the top of the definition of the
CokeMachine class so that we don't confuse it with the methods associated
with the objects that will be created, and then we make sure to increment
this variable inside the __init__ method that will be executed every time a
new Coke machine is created (i.e., every time we generate a new instance of
the CokeMachine class):
class CokeMachine:
    totalmachines = 0
    def __init__(self):
        CokeMachine.totalmachines = CokeMachine.totalmachines + 1
        self.numcans = 20
        print "Adding another Coke machine to your empire"
    def BuyCoke(self):
        if self.numcans > 0:
            self.numcans = self.numcans - 1
            print "Have a nice frosty Coke!"
            print "%3d frosty cans of Coke remaining." % (self.numcans)
        else:
            print "Sorry, out of Coke."
    def LoadCoke(self, loadcans):
        self.numcans = self.numcans + loadcans
        print "%3d cans of Coke added" % (loadcans)
        print "%3d cans of Coke now available" % (self.numcans)
class SimCoke:
    def __init__(self):
        print " "
        print "Here's the Coke Machine Simulator"
        cs = CokeMachine()
        ee = CokeMachine()
        print "The cs machine has %3d frosty cans of Coke waiting for you." \
              % (cs.numcans)
        print "The ee machine has %3d frosty cans of Coke waiting for you." \
              % (ee.numcans)
        cs.BuyCoke()
        cs.BuyCoke()
        print "The cs machine has %3d frosty cans of Coke waiting for you." \
              % (cs.numcans)
        print "The ee machine has %3d frosty cans of Coke waiting for you." \
              % (ee.numcans)
        cs.LoadCoke(10)
        print "The cs machine has %3d frosty cans of Coke waiting for you." \
              % (cs.numcans)
        ee.LoadCoke(-20)
        print "The ee machine has %3d frosty cans of Coke waiting for you." \
              % (ee.numcans)
        ee.BuyCoke()
        print "There are %2d Coke Machines in your empire" \
              % (CokeMachine.totalmachines)
SimCoke()
Lo and behold, here's some output!
Here's the Coke Machine Simulator
Adding another Coke machine to your empire
Adding another Coke machine to your empire
The cs machine has 20 frosty cans of Coke waiting for you.
The ee machine has 20 frosty cans of Coke waiting for you.
Have a nice frosty Coke!
19 frosty cans of Coke remaining.
Have a nice frosty Coke!
18 frosty cans of Coke remaining.
The cs machine has 18 frosty cans of Coke waiting for you.
The ee machine has 20 frosty cans of Coke waiting for you.
10 cans of Coke added
28 cans of Coke now available
The cs machine has 28 frosty cans of Coke waiting for you.
-20 cans of Coke added
0 cans of Coke now available
The ee machine has 0 frosty cans of Coke waiting for you.
Sorry, out of Coke.
There are 2 Coke Machines in your empire
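One detail in that constructor is worth a quick sketch: the counter is updated through the class name, CokeMachine.totalmachines, not through self. The two classes below are hypothetical stand-ins (Machine and BrokenMachine are names made up for this illustration) showing what appears to happen if you go through self instead:

```python
class Machine:                  # hypothetical stand-in for CokeMachine
    total = 0                   # class variable, shared by every instance

    def __init__(self):
        Machine.total = Machine.total + 1   # updates the shared counter

class BrokenMachine:            # hypothetical: counts through self instead
    total = 0

    def __init__(self):
        # reads the class value, then creates a separate per-instance attribute
        self.total = self.total + 1

a, b = Machine(), Machine()
c, d = BrokenMachine(), BrokenMachine()
print(Machine.total)            # the shared counter saw both machines
print(BrokenMachine.total)      # the class-level counter never moved
print(d.total)                  # each instance only counted itself
```

Assigning through self creates a brand-new instance attribute that shadows the class variable, so every machine ends up thinking it's the only one in the empire.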
III. Inheritance
Let's just say, for the sake of argument, that I'm introducing a new line of
Coke machines that work like my older Coke machines, which I still want to
keep because they're making money, but they have some additional features
that the old Coke machines don't have. For example, maybe I want to keep
track of how many cans of Coke a particular machine has sold, so I can ping
it once in a while to find out how much profit I've made on that machine.
I could create a new class of machine, called "CokeMachine2002", by copying
all the code from the CokeMachine class and just adding my additional code.
Or I could just create the CokeMachine2002 class by telling Python that
the CokeMachine2002 class takes on all the attributes and methods of the
original CokeMachine class. When a class is derived from another class, that
new class *inherits* the data structures and methods of the original class.
You don't have to copy code; Python does the work for you. The original
class is called the parent class, the base class, or the superclass. The new,
extended class is called the child class or subclass or, in Python, the
derived class. In general, the subclass has everything that the superclass
has, plus additional stuff to make it more specialized than the superclass.
"Inheritance means being able to declare a type which
builds on the fields (data and methods) of a previously
declared type. As well as inheriting all the operations
and data, you get the chance to declare your own
versions and new versions of the methods to refine,
specialize, replace, or extend the ones in the parent
class."
(Peter van der Linden, "Just Java 2", p. 150)
Here's a simple example of how to extend a class. Let's start with just the
basics. First, I tell Python that CokeMachine2002 inherits from CokeMachine
by adding a parameter list in the class statement that lists all the classes
that CokeMachine2002 will inherit from:
class CokeMachine2002(CokeMachine):
    pass
And that's all I need, at minimum, to extend the CokeMachine class. The
CokeMachine2002 class is now a clone of the CokeMachine class and can
be used in exactly the same way:
class SimCoke:
    def __init__(self):
        print " "
        print "Here's the Coke Machine Simulator"
        cs = CokeMachine()
        ee = CokeMachine()
        ie = CokeMachine2002()
        print "The ie machine has %3d frosty cans of Coke waiting for you." \
              % (ie.numcans)
        ie.BuyCoke()
        print "The ie machine has %3d frosty cans of Coke waiting for you." \
              % (ie.numcans)
        ie.LoadCoke(10)
        print "The ie machine has %3d frosty cans of Coke waiting for you." \
              % (ie.numcans)
        print "There are %2d Coke Machines in your empire" \
              % (CokeMachine.totalmachines)
And here's the output, just so you can see for yourself:
Here's the Coke Machine Simulator
Adding another Coke machine to your empire
Adding another Coke machine to your empire
Adding another Coke machine to your empire
The ie machine has 20 frosty cans of Coke waiting for you.
Have a nice frosty Coke!
19 frosty cans of Coke remaining.
The ie machine has 19 frosty cans of Coke waiting for you.
10 cans of Coke added
29 cans of Coke now available
The ie machine has 29 frosty cans of Coke waiting for you.
There are 3 Coke Machines in your empire
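If you'd like to check the parent/child relationship directly instead of inferring it from the output, Python's built-in isinstance and issubclass functions will tell you. A minimal sketch, using a cut-down CokeMachine (an assumption for illustration; the real class has more in it):

```python
class CokeMachine:                      # cut-down stand-in, for illustration
    def __init__(self):
        self.numcans = 20

class CokeMachine2002(CokeMachine):     # inherits everything, adds nothing
    pass

ie = CokeMachine2002()
print(isinstance(ie, CokeMachine2002))  # ie is an instance of the subclass...
print(isinstance(ie, CokeMachine))      # ...and counts as a CokeMachine too
print(issubclass(CokeMachine2002, CokeMachine))
print(ie.numcans)                       # set by the inherited __init__
```

That's also why the ie machine bumped the totalmachines counter in the output above: creating it ran the constructor it inherited from CokeMachine.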
Now where was I? Oh yes, I wanted to give CokeMachine2002 the ability to keep
track of how many cans it has sold and report on the profit. CokeMachine2002
should have already inherited a variable that keeps track of how many cans
are in the machine...the one called "numcans". Still, I need to add a
variable to keep track of how many cans have been sold. Then I can multiply
that number by the profit per can to get my profit from the machine at that
time:
class CokeMachine2002(CokeMachine):
    def __init__(self):
        self.soldcans = 0
    def GetProfit(self, sellprice, cost):
        print "This machine has made %0.2f profit" \
              % (self.soldcans * (sellprice - cost))
There's a problem here though. When I add an __init__ method here, it
overrides the one that was inherited, so numcans never gets created and
things blow up. I need to make sure that the __init__ method from the base
class gets done too:
class CokeMachine2002(CokeMachine):
    def __init__(self):
        CokeMachine.__init__(self)
        self.soldcans = 0
    def GetProfit(self, sellprice, cost):
        print "This machine has made %0.2f profit" \
              % (self.soldcans * (sellprice - cost))
That's gonna work just fine when I run it, except that I end up with no
profit because I'm not updating the number of cans sold when I execute
BuyCoke. So I have to redefine BuyCoke and make sure that I execute the
BuyCoke method from the base class too:
class CokeMachine2002(CokeMachine):
    def __init__(self):
        CokeMachine.__init__(self)
        self.soldcans = 0
    def GetProfit(self, sellprice, cost):
        print "This machine has made %0.2f profit" \
              % (self.soldcans * (sellprice - cost))
    def BuyCoke(self):
        CokeMachine.BuyCoke(self)
        self.soldcans = self.soldcans + 1
Here's the whole thing, followed by some output:
class CokeMachine:
    totalmachines = 0
    def __init__(self):
        CokeMachine.totalmachines = CokeMachine.totalmachines + 1
        self.numcans = 20
        print "Adding another Coke machine to your empire"
    def BuyCoke(self):
        if self.numcans > 0:
            self.numcans = self.numcans - 1
            print "Have a nice frosty Coke!"
            print "%3d frosty cans of Coke remaining." % (self.numcans)
        else:
            print "Sorry, out of Coke."
    def LoadCoke(self, loadcans):
        self.numcans = self.numcans + loadcans
        print "%3d cans of Coke added" % (loadcans)
        print "%3d cans of Coke now available" % (self.numcans)
class CokeMachine2002(CokeMachine):
    def __init__(self):
        CokeMachine.__init__(self)
        self.soldcans = 0
    def GetProfit(self, sellprice, cost):
        print "This machine has made %0.2f profit" \
              % (self.soldcans * (sellprice - cost))
    def BuyCoke(self):
        CokeMachine.BuyCoke(self)
        self.soldcans = self.soldcans + 1
class SimCoke:
    def __init__(self):
        print " "
        print "Here's the Coke Machine Simulator"
        cs = CokeMachine()
        ee = CokeMachine()
        ie = CokeMachine2002()
        print "The ie machine has %3d frosty cans of Coke waiting for you." \
              % (ie.numcans)
        ie.BuyCoke()
        print "The ie machine has %3d frosty cans of Coke waiting for you." \
              % (ie.numcans)
        ie.LoadCoke(10)
        print "The ie machine has %3d frosty cans of Coke waiting for you." \
              % (ie.numcans)
        print "There are %2d Coke Machines in your empire" \
              % (CokeMachine.totalmachines)
        ie.GetProfit(0.75, 0.25)
SimCoke()
Here's the Coke Machine Simulator
Adding another Coke machine to your empire
Adding another Coke machine to your empire
Adding another Coke machine to your empire
The ie machine has 20 frosty cans of Coke waiting for you.
Have a nice frosty Coke!
19 frosty cans of Coke remaining.
The ie machine has 19 frosty cans of Coke waiting for you.
10 cans of Coke added
29 cans of Coke now available
The ie machine has 29 frosty cans of Coke waiting for you.
There are 3 Coke Machines in your empire
This machine has made 0.50 profit
And now that I've finally made a profit, I'm going to simulate my
earning a fortune in quarters. I'm almost a zillionaire...I can feel it.
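Before we leave the Coke machines, here's a sketch of the "things blow up" problem from earlier in this section, where a subclass defines its own __init__ and forgets to call the one in the base class. ForgetfulMachine and CarefulMachine are hypothetical names made up for this illustration, and CokeMachine is cut down to the one attribute that matters:

```python
class CokeMachine:                      # cut-down stand-in, for illustration
    def __init__(self):
        self.numcans = 20

class ForgetfulMachine(CokeMachine):    # hypothetical: skips the base __init__
    def __init__(self):
        self.soldcans = 0               # numcans is never created

class CarefulMachine(CokeMachine):      # hypothetical: calls the base __init__
    def __init__(self):
        CokeMachine.__init__(self)      # creates numcans first
        self.soldcans = 0

try:
    ForgetfulMachine().numcans
    outcome = "numcans exists"
except AttributeError:
    outcome = "numcans was never created"

print(outcome)
print(CarefulMachine().numcans)
```

The error doesn't show up when the forgetful machine is created; it only appears later, when something (like BuyCoke) first tries to touch numcans.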
IV. Simple sorting
(The following stuff is close to what Mitch presented, except that
he started with bubble sort instead of insertion sort, and he
presented his stuff in Python instead of Scheme. The time
complexity of bubble sort is the same as insertion sort.
If I get a moment, I'll try to add the Python stuff, and
bubble sort, in here. -- Kurt, Dec. 8)
Let's shift gears a little. No, make that a lot. One of the
classic problems in computer science is to sort stuff...to
take an unsorted sequence of data items and put them into some sorted
order. Why sort stuff? Well, basically, it's to make searching for
data items a whole lot easier, and searching for data is something
that computers spend lots of time on. And as I'm sure you know from
personal experience, it's easier to find something in an orderly
environment than in a disorderly environment. (Check my office
sometime for an example of a disorderly environment.) Consider the
phone book. It's relatively easy to find somebody's phone number,
because the listings are sorted alphabetically on the (last) name of
the person you're trying to find. But imagine if the phone book
wasn't sorted...what if the names were randomly ordered? It would
take a whole lot longer to find that special someone, wouldn't it? So
that's why sorting is such a big deal. Additionally, there are things
that happen in computer science that aren't exactly sorting, but they
behave like sorting algorithms, so studying sorting has value that
extends beyond sorting problems. But that's grist for another course
later in your CS education, if you choose to go that far.
To make things a little more concrete, consider the problem of
sorting the following list of eight numbers:
7 3 1 4 8 6 2 5
Let's make it even more familiar:
(7 3 1 4 8 6 2 5)
How could we sort this list? Well, the most intuitively obvious way
to sort a list is to apply something that computer scientists like to
call "insertion sort". We start with an unsorted list, and another
empty list that will end up holding the sorted items. We take the
first item of the unsorted list and insert it in the "right" place in
the sorted list.
unsorted list           sorted list
(7 3 1 4 8 6 2 5)       ()
Since the sorted list is empty on the first pass, this is pretty
easy...we just put the item on the list.
(3 1 4 8 6 2 5)         (7)
Now we take the next item off the unsorted list, and we successively
compare it to each item on the sorted list, looking for the right
place to insert it.
(1 4 8 6 2 5)           (3 7)
We keep doing this until we run out of items on the unsorted list.
(4 8 6 2 5)             (1 3 7)
(8 6 2 5)               (1 3 4 7)
(6 2 5)                 (1 3 4 7 8)
(2 5)                   (1 3 4 6 7 8)
(5)                     (1 2 3 4 6 7 8)
()                      (1 2 3 4 5 6 7 8)
When that happens, the other list contains all the original elements,
but now they're sorted!
Here's a Scheme implementation of insertion sort:
; insertion sort
;
; given an unsorted list and an empty list that will eventually
; hold the sorted items, repeat the following until the unsorted list
; is empty:
; 1) remove the first element from the unsorted list
; 2) traverse the sorted list from left to right one item at
; a time, comparing them to the item removed from the
; unsorted list
; 3) when you find an item in the sorted list that is greater than
; or equal to the item from the unsorted list, insert the item
; from the unsorted list just before the item you stopped at
; in the sorted list (or at the end if you never stop)
;
; insertionsort expects a list of numbers passed via the parameter
; sortlist
;
; insertionsort returns a list of the numbers passed through sortlist
; after being sorted from lowest value to highest value, left to right
(define (insertionsort sortlist)
  (insertionsort-helper sortlist '()))
(define (insertionsort-helper sortlist result)
  (cond ((null? sortlist) result)
        (else (insertionsort-helper (cdr sortlist)
                                    (insert (car sortlist) result)))))
(define (insert item alreadysorted)
  (cond ((null? alreadysorted) (cons item '()))
        ((<= item (car alreadysorted))
         (cons item alreadysorted))
        (else (cons (car alreadysorted)
                    (insert item (cdr alreadysorted))))))
Just out of curiosity, what do you suppose is the time complexity for
insertion sort? Well, assuming we want worst-case complexity (which
is often what people are asking for when they ask about complexity),
we need to ask what's the cost of inserting an item into a sorted
list. And while we're at it, we need to decide what the unit of cost
should be. Last week, we counted conses. We could do that again, or
we could count a single comparison (i.e., is this thing less than,
equal to, or greater than that thing?) as a unit of cost. I'll leave
it up to you to convince yourself that in this problem, counting
conses and counting comparisons will work out to the same thing, more
or less...at least, they'll be in the same order of complexity, which
is all we really care about for now.
So how many comparisons happen when inserting the first unsorted
element into the sorted list? Since the sorted list is empty,
there are zero comparisons. How about for the next unsorted element?
In the worst case, there is one comparison. How about for the next
unsorted element? In the worst case, there are now two comparisons,
because the sorted list has two elements. This continues until we run
out of unsorted items. That happens at the time when there are n-1
sorted items on the sorted list, so the number of comparisons on this
pass, in the worst case, would be n-1. So the worst-case total number
of comparisons would be 0 + 1 + 2 + ... + n-2 + n-1, which is
n*(n-1)/2. You've seen a pattern like this before, haven't you? So
the time complexity, in terms of comparisons performed in a
worst-case insertion sort, is O((n^2-n)/2), which reduces to O(n^2-n)
because we don't care about constants like 1/2. And O(n^2-n) reduces
to O(n^2) because n^2 grows much faster than n as n gets really big,
so we can ignore the n term. Our answer, therefore, is O(n^2).
Unfortunately, O(n^2) is kind of undesirable. Remember, this big-O
stuff doesn't give us an exact count of the number of comparisons or
conses or whatever for a given procedure on a given input. What it
gives us is a useful approximation of how fast the amount of work
grows in proportion to increases in the size of the input. So what we
can deduce from O(n^2) is that as the size of the problem grows, or
in this case as the size of the list to be sorted grows, the amount
of work performed grows in direct proportion to the square of the
size of the problem. And don't forget that average-case behavior or
best-case behavior may be very different from worst-case behavior. In
the case of insertion sort, best-case time complexity (which occurs
when the unsorted list is the reverse of the desired sorted
list...prove it to yourself at home) is O(n) (prove that to yourself
too!).
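If you'd rather watch the counts than prove them, here's a rough sketch that tallies comparisons. It assumes the same strategy as the Scheme insert procedure above (scan the sorted list left to right and stop at the first item that is greater than or equal to the new one); the function names are made up for this illustration:

```python
def insert_counting(item, result):
    """Insert item into the sorted list result, scanning left to right.
    Returns the number of comparisons made."""
    comparisons = 0
    for i in range(len(result)):
        comparisons += 1
        if item <= result[i]:           # found the first item >= the new one
            result.insert(i, item)
            return comparisons
    result.append(item)                 # bigger than everything: goes last
    return comparisons

def count_comparisons(inputlist):
    """Insertion-sort inputlist; return (total comparisons, sorted list)."""
    result = []
    total = 0
    for item in inputlist:
        total += insert_counting(item, result)
    return total, result

n = 8
worst, _ = count_comparisons(list(range(1, n + 1)))    # already in order
best, _ = count_comparisons(list(range(n, 0, -1)))     # reverse order
print(worst)    # n*(n-1)/2, which is 28 when n is 8
print(best)     # n-1, which is 7 when n is 8
```

With this left-to-right scan, input that's already in order is the expensive case, because every new item has to walk past everything sorted so far.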
Of course, you don't have to do insertion sort using lists and
recursion. You could use vectors and iteration. Here's one
sample incarnation:
;; this procedure creates an output vector with same size
;; as input vector (to be sorted) and fills the output
;; vector with non-numeric junk
;; then the procedure iterates over the items in the
;; input vector and calls insert
(define (insertionsort sortvector)
  (do ((result (make-vector (vector-length sortvector) '*))
       (max (vector-length sortvector))
       (index 0))
      ((>= index max) result)
    (insert (vector-ref sortvector index) result max index)
    (set! index (+ index 1))
    (display result) ;; this is just to watch what happens
    (newline))) ;; ditto
;; this procedure takes the item to be inserted in the
;; sorted list, along with the sorted list so far, and
;; finds the place where the item should be inserted.
;; then this procedure calls insert2
;; this procedure works by side-effect on result,
;; so what's returned doesn't matter
(define (insert new-item result max inputitemindex)
  (do ((index 0))
      ((>= index max) 'nothing) ;; insert works by side-effect
    (cond [(or (not (number? (vector-ref result index)))
               (< new-item (vector-ref result index)))
           (insert2 new-item index result max inputitemindex)
           (set! index max)]
          [else
           (set! index (+ index 1))])))
;; this procedure iterates right-to-left (backward,
;; whatever) through the sorted list, moving every
;; item one slot to the right until it has moved
;; the number pointed at by index, which opens
;; up a space for the new item to be inserted.
;; this procedure works by side-effect on result,
;; so what's returned doesn't matter
;; Note: this procedure has been optimized some, so
;; as not to move things it doesn't need to...
;; ultimately the insertion sort algorithm is
;; still O(n^2) in the worst case.
(define (insert2 new-item index result max inputitemindex)
  (do ((leftpointer index)
       (rightpointer inputitemindex)) ;; if we use max instead of
                                      ;; inputitemindex from way up
                                      ;; above, we'll move asterisks
                                      ;; we don't have to, but it's
                                      ;; not gonna change our Big-O
      ((= rightpointer leftpointer) (vector-set! result leftpointer new-item))
    (vector-set! result rightpointer (vector-ref result (- rightpointer 1)))
    (set! rightpointer (- rightpointer 1))))
The verbosity of Scheme's do structure makes the code a little bit
ugly, but it should run faster than the previous version since we're
not building lists...we've eliminated the expense of the conses. But
if we're counting comparisons, this version still has time complexity
of O(n^2).
And of course we could do the whole thing in Python, taking advantage
of Python's much cooler iterative forms as well as built in
methods for inserting into the middle of a list and appending
to the end:
def InsertionSort(inputlist):
    result = inputlist[0:1]          # seed the result list with the
                                     # first item from input list
    print result
    for item in inputlist[1:]:       # begin with the second item from input list
        InsertItem(item, result)
        print result
    return result                    # this function returns the sorted list
                                     # without munging the input list
def InsertItem(item, result):
    for i in range(0, len(result)):  # look at everything in result and
        if item < result[i]:         # find place to insert item
            result.insert(i, item)   # insert the item in that place
            return result            # what's returned is unimportant as
                                     # this procedure works with side effects
                                     # on result
        else:
            pass
    result.append(item)              # if the item isn't inserted somewhere
                                     # then stick it on the end of the list
Here are some test cases:
print "start test1"
test1 = [6, 2, 4, 1, 3, 5]
print test1
print InsertionSort(test1)
print test1
print " "
print "start test2"
test2 = [6, 5, 4, 3, 2, 1]
print test2
print InsertionSort(test2)
print test2
print " "
print "start test3"
test3 = [1, 2, 3, 4, 5, 6]
print test3
print InsertionSort(test3)
print test3
print " "
print "end"
Here's the output:
start test1
[6, 2, 4, 1, 3, 5]
[6]
[2, 6]
[2, 4, 6]
[1, 2, 4, 6]
[1, 2, 3, 4, 6]
[1, 2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6]
[6, 2, 4, 1, 3, 5]
start test2
[6, 5, 4, 3, 2, 1]
[6]
[5, 6]
[4, 5, 6]
[3, 4, 5, 6]
[2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6]
[6, 5, 4, 3, 2, 1]
start test3
[1, 2, 3, 4, 5, 6]
[1]
[1, 2]
[1, 2, 3]
[1, 2, 3, 4]
[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6]
end
But now we're back to manipulating lists and pointers, and that's
expensive, and in terms of comparisons, it's still gonna have
time complexity of O(n^2). Could we speed this up? Sure, we
could translate the vector-based Scheme approach above into
a Python version where we treat lists as arrays, and avoid
list operators like insert and append. But still, it's O(n^2).
That's the nature of insertion sort. No matter how you optimize
things, as n, the number of things to be sorted, grows, the amount
of work to be done grows along the lines of n^2. That's what this
Big-O stuff is all about. It's not about comparing different
implementations of the same algorithm, it's about describing the
behavior of a class of algorithms as the input grows.
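For the curious, here's a rough sketch of what that array-style Python version might look like. It's not a line-by-line translation of the Scheme vector code: this one works on a copy of the input and scans right to left while it shifts, which is the more common textbook arrangement, and the function name is made up for this illustration:

```python
def insertion_sort_array(inputlist):
    """Insertion sort using only indexing and assignment on the working list."""
    result = list(inputlist)            # copy, so the input isn't munged
    for right in range(1, len(result)):
        item = result[right]            # next unsorted item
        slot = right
        # slide bigger items one position right to open up a gap
        while slot > 0 and result[slot - 1] > item:
            result[slot] = result[slot - 1]
            slot -= 1
        result[slot] = item             # drop the item into the gap
    return result

print(insertion_sort_array([6, 2, 4, 1, 3, 5]))
```

There are no calls to insert or append inside the loops, but the nested loops are still there, so the worst case (for this version, a list in reverse order) is still O(n^2).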
V. Bubblesort
Mitch talked about bubblesort in class. It's an interesting
algorithm because it's not immediately obvious that this approach
could get stuff sorted, and the code can be really compact.
However, this approach too has a time complexity of O(n^2).
So algorithm complexity isn't about how much code there is,
it's still only about how much work gets done in terms of
the cost measure that you use.
Mitch's explanation looked like this:
Traverse an (unsorted) collection of elements from front
to back or left to right
"Bubble" the largest value in the list to the end or right
using pairwise comparisons and swapping.
You keep repeating the two steps above until the collection
is sorted...at most, if there are n things to be sorted,
you'll have to traverse the collection n-1 times.
Here's an example from Mitch's slides. Start with this collection:
77 42 35 12 101 5
^
We start our first traversal at the left. 77 is greater than 42, so
we swap:
42 77 35 12 101 5
   ^
77 is greater than 35 so we swap again
42 35 77 12 101 5
      ^
77 is greater than 12 so we swap again
42 35 12 77 101 5
         ^
77 is less than 101 so we leave 77 where it is and
resume again with trying to bubble 101 up or to the right:
42 35 12 77 101 5
            ^
101 is greater than 5 so we swap those values
42 35 12 77 5 101
              ^
Now we're at the end of the list. We've completed one traversal,
and we observe that the biggest value in the list has bubbled
to the top or right. One item is sorted, and n-1 items remain
to be sorted, so we have to traverse the list again, but we only
need to traverse n-1 elements. After the next pass the list looks
like this:
35 12 42 5 77 101
And then the process continues like this:
12 35 5 42 77 101
12 5 35 42 77 101
5 12 35 42 77 101
There were 6 items in the list, and 6-1 or 5 traversals. The
analysis of bubblesort works out just like with insertion sort.
In a list of n items, on the first pass you'll do n-1 comparisons
and at most n-1 swaps. On the next pass you'll do n-2 comparisons
and at most n-2 swaps. And so on. Add 'em all up and you get
(n - 1) + (n - 2) + (n - 3) + ... + 1 comparisons/worst-case-swaps.
Have you seen that before? Sure, it's the same series that we
saw with insertion sort above. Looks like O(n^2) again.
Here's a Scheme-based approach to bubblesort on a vector of
numbers...if you want to bubblesort a list of numbers, feel
free to provide your own code:
(define (bubblesort invector)
  (do ((i 0)
       (max (- (vector-length invector) 1)))
      ((>= i max) invector)
    (traverse invector)
    (print invector)
    (newline)
    (set! i (+ i 1))))
(define (traverse invector)
  (do ((j 0)
       (max (- (vector-length invector) 1)))
      ((>= j max) invector)
    (cond [(> (vector-ref invector j) (vector-ref invector (+ j 1)))
           (swap invector j (+ j 1))]
          [else 'nothing])
    (set! j (+ j 1))))
(define (swap invector i1 i2)
  (let ((temp (vector-ref invector i1)))
    (vector-set! invector i1 (vector-ref invector i2))
    (vector-set! invector i2 temp)))
Here's a little bit of output:
> (bubblesort (vector 6 5 4 3 2 1))
#6(5 4 3 2 1 6)
#6(4 3 2 1 5 6)
#6(3 2 1 4 5 6)
#6(2 1 3 4 5 6)
#6(1 2 3 4 5 6)
#6(1 2 3 4 5 6) ;; this one is returned, not printed
> (bubblesort (vector 5 1 4 6 2 3))
#6(1 4 5 2 3 6)
#6(1 4 2 3 5 6)
#6(1 2 3 4 5 6)
#6(1 2 3 4 5 6)
#6(1 2 3 4 5 6)
#6(1 2 3 4 5 6) ;; this one is returned, not printed
>
Bubblesort is much more concisely expressed in Python. Here's
Mitch's version with one teeny weeny modification so that it
doesn't do any more work than necessary:
def bsort(array):
    for i in range(0, len(array)-1):
        for j in range(0, len(array)-1):
            if array[j] > array[j+1]:
                swap(array, j, j+1)

def swap(array, i1, i2):
    temp = array[i1]
    array[i1] = array[i2]
    array[i2] = temp
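You can trim the work a little further. The sketch below is my own variation, not Mitch's code, and the name bsort_early_exit is hypothetical. It makes two assumptions explicit: after i passes the last i items are already in place, so the inner loop can stop short of them, and a pass that makes no swaps means the list is already sorted, so we can quit early:

```python
def bsort_early_exit(array):
    """Bubble sort in place; a hypothetical variant, not the lecture's code."""
    n = len(array)
    for i in range(n - 1):
        swapped = False
        # After i passes, the last i items are already in their final spots.
        for j in range(n - 1 - i):
            if array[j] > array[j + 1]:
                array[j], array[j + 1] = array[j + 1], array[j]
                swapped = True
        if not swapped:   # a pass with no swaps means the list is sorted
            break
    return array

print(bsort_early_exit([5, 1, 4, 6, 2, 3]))   # [1, 2, 3, 4, 5, 6]
```

The worst case is still O(n^2), but on a list that's already sorted this version makes a single pass and stops.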
VI. Using tree recursion as a better sort of sorting strategy
So now the question is "can we do better than O(n^2) with sorting?"
The answer is yes, and the trick is that you have to think about the
sorting problem differently. Here's another way to sort a list: To
sort a list, you cut the list into two smaller lists of say equal
length. Call them the lefthalf and the righthalf. Now you sort the
lefthalf and the righthalf independently. The results should be that
you get back two sorted lists. You then merge those two lists in such
a way that the result is one sorted list containing all the elements
of the two smaller sorted lists. In other words, to sort a list, you
cut the list into halves, sort the halves, and then put the results
back together into a sorted list. How do you sort the halves? Well,
you just cut the halves in half, sort those halves (now quarters),
and merge the results. And how do you merge the quarters? It should
be obvious by now.
Here's a more graphical way of looking at the same problem-solving
approach. Again, we start with the unordered list:
(7 3 1 4 8 6 2 5)
To sort this list, we split the list into two halves, and take a leap
of faith that if we sort each of the two halves and then merge
the results together, we'll get a sorted list:
          (7 3 1 4 8 6 2 5)
                  ^
                  |
                merge
        ^                   ^
        |                   |
    (7 3 1 4)           (8 6 2 5)
Then to sort those two lists, we split each of them in half, sort
them, and merge the results:
          (7 3 1 4 8 6 2 5)
                  ^
                  |
                merge
        ^                   ^
        |                   |
    (7 3 1 4)           (8 6 2 5)
        ^                   ^
        |                   |
      merge               merge
   ^         ^         ^         ^
   |         |         |         |
 (7 3)     (1 4)     (8 6)     (2 5)
We keep splitting, sorting, and merging until we get down to lists
with only one element, because those lists are already sorted for us!
          (7 3 1 4 8 6 2 5)
                  ^
                  |
                merge
        ^                   ^
        |                   |
    (7 3 1 4)           (8 6 2 5)
        ^                   ^
        |                   |
      merge               merge
   ^         ^         ^         ^
   |         |         |         |
 (7 3)     (1 4)     (8 6)     (2 5)
   ^         ^         ^         ^
   |         |         |         |
 merge     merge     merge     merge
 ^    ^    ^    ^    ^    ^    ^    ^
 |    |    |    |    |    |    |    |
(7)  (3)  (1)  (4)  (8)  (6)  (2)  (5)
Note that, on the way down, we haven't really sorted anything yet!
All we did was keep splitting our lists until we got down to
one-element lists, and each of those is sorted by definition. The
real "sorting" will happen in the merges that are performed from this
point on. For example, when we merge the two lists (7) and (3), the
merge process will produce the sorted list (3 7). The other merges
will produce (1 4), (6 8), and (2 5):
          (7 3 1 4 8 6 2 5)
                  ^
                  |
                merge
        ^                   ^
        |                   |
    (7 3 1 4)           (8 6 2 5)
        ^                   ^
        |                   |
      merge               merge
   ^         ^         ^         ^
   |         |         |         |
 (3 7)     (1 4)     (6 8)     (2 5)
   ^         ^         ^         ^
   |         |         |         |
 merge     merge     merge     merge   <- start
 ^    ^    ^    ^    ^    ^    ^    ^     here
 |    |    |    |    |    |    |    |
(7)  (3)  (1)  (4)  (8)  (6)  (2)  (5)
We now merge those two-element lists to get two sorted four-element lists:
          (7 3 1 4 8 6 2 5)
                  ^
                  |
                merge
        ^                   ^
        |                   |
    (1 3 4 7)           (2 5 6 8)
        ^                   ^
        |                   |
      merge               merge   <- now do these
   ^         ^         ^         ^
   |         |         |         |
 (3 7)     (1 4)     (6 8)     (2 5)
   ^         ^         ^         ^
   |         |         |         |
 merge     merge     merge     merge
 ^    ^    ^    ^    ^    ^    ^    ^
 |    |    |    |    |    |    |    |
(7)  (3)  (1)  (4)  (8)  (6)  (2)  (5)
Finally, we merge the two four-element lists to get one
eight-element, entirely sorted list:
          (1 2 3 4 5 6 7 8)
                  ^
                  |
                merge             <- now we merge
        ^                   ^        this one
        |                   |
    (1 3 4 7)           (2 5 6 8)
        ^                   ^
        |                   |
      merge               merge   <- now do these
   ^         ^         ^         ^
   |         |         |         |
 (3 7)     (1 4)     (6 8)     (2 5)
   ^         ^         ^         ^
   |         |         |         |
 merge     merge     merge     merge
 ^    ^    ^    ^    ^    ^    ^    ^
 |    |    |    |    |    |    |    |
(7)  (3)  (1)  (4)  (8)  (6)  (2)  (5)
This is an example of a problem-solving strategy called "divide and
conquer": make the problem into smaller problems, solve the smaller
problems, and join the results. You've been doing this all along, you
just didn't know it had a name. And the particular sorting algorithm
that implements this divide and conquer strategy is called mergesort.
VII. Mergesort in Scheme
Here's an English description of the mergesort procedure we illustrated above:
To sort a list, you cut the list into two smaller lists of say equal
length. Call them the lefthalf and the righthalf. Now you sort the
lefthalf and the righthalf independently. The results should be that
you get back two sorted lists. You then merge those two lists in such
a way that the result is one sorted list containing all the elements
of the two smaller sorted lists.
When I look at this description and think about how to design a
mergesort program, I see at least four different procedures that are
named in the description: a sorting procedure, a procedure to return
the lefthalf of a list, a procedure to return the righthalf of a
list, and a procedure for merging two sorted lists.
And when I start to think about that sorting procedure, I want to
call it mergesort, since that's the name of the sorting algorithm I'm
trying to implement. That top-level procedure does exactly what the
English description says: to mergesort a list, merge the results of
calling mergesort on the lefthalf of the list and the righthalf of
the list:
(define (mergesort sortlist)
  (cond ((null? sortlist) '())              ;; in case someone tries to
                                            ;; sort the empty list
        ((null? (cdr sortlist)) sortlist)   ;; making sure we don't split
                                            ;; a one-element list and
                                            ;; recurse forever
        (else (merge (mergesort (lefthalf sortlist))
                     (mergesort (righthalf sortlist))))))
Figuring out what the left half of a list is involves finding the
length of the list, dividing by 2, and then peeling off that many
elements of the list and returning those elements as a list:
(define (lefthalf sortlist)
  (lefthalf-helper sortlist (floor (/ (length sortlist) 2))))

(define (lefthalf-helper sortlist endcount)
  (cond ((= endcount 0) '())
        (else (cons (car sortlist)
                    (lefthalf-helper (cdr sortlist) (- endcount 1))))))
Figuring out what the right half of a list is involves finding the
length of the list, dividing by 2, and then peeling off that many
elements of the list and throwing them away, returning what's left
over:
(define (righthalf sortlist)
  (righthalf-helper sortlist (floor (/ (length sortlist) 2))))

(define (righthalf-helper sortlist startcount)
  (cond ((= startcount 0) sortlist)
        (else (righthalf-helper (cdr sortlist) (- startcount 1)))))
By using the same number for counting off list elements, we minimize
the potential for arithmetic errors in lefthalf and righthalf, which
could in turn give us erroneous left halves and right halves.
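The same idea is easy to see in Python, where slicing does the peeling for us. This is just a sketch for comparison, and the function name halves is made up here. Both halves are cut at one shared midpoint, so no element can be lost or duplicated:

```python
def halves(items):
    """Split a list at floor(len/2), using one shared midpoint."""
    mid = len(items) // 2           # the same count serves both halves
    return items[:mid], items[mid:]

left, right = halves([7, 3, 1, 4, 8])
print(left, right)                  # [7, 3] [1, 4, 8]
```

With an odd-length list the right half gets the extra element, which matches what lefthalf and righthalf do with floor.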
Finally, merging two sorted lists is not unlike putting together the
two halves of a zipper, except that you have to do some comparisons
to maintain sorted order as you merge:
(define (merge list1 list2)
  (cond ((null? list1) list2)
        ((null? list2) list1)
        ((<= (car list1) (car list2))
         (cons (car list1) (merge (cdr list1) list2)))
        (else
         (cons (car list2) (merge list1 (cdr list2))))))
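For comparison with the Python bubblesort, here's a rough Python sketch of the same design. It is not a line-by-line translation of the Scheme: I'm assuming slicing in place of lefthalf and righthalf, and merge is written as a loop rather than recursively, since Python doesn't handle deep recursion on long lists as gracefully as Scheme does:

```python
def merge(list1, list2):
    """Merge two sorted lists into one sorted list, keeping duplicates."""
    merged = []
    i = j = 0
    while i < len(list1) and j < len(list2):
        if list1[i] <= list2[j]:      # same <= test as the Scheme version
            merged.append(list1[i])
            i += 1
        else:
            merged.append(list2[j])
            j += 1
    merged.extend(list1[i:])          # at most one of these two
    merged.extend(list2[j:])          # still has anything left in it
    return merged

def mergesort(sortlist):
    """Return a new sorted list; lists of 0 or 1 items are already sorted."""
    if len(sortlist) <= 1:
        return list(sortlist)
    mid = len(sortlist) // 2
    return merge(mergesort(sortlist[:mid]), mergesort(sortlist[mid:]))

print(mergesort([7, 3, 1, 4, 8, 6, 2, 5]))   # [1, 2, 3, 4, 5, 6, 7, 8]
```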
VIII. Analysis of mergesort
Here's the mergesort program all together:
;; mergesort is the top-level function...it takes a list of numbers
;; to be sorted, splits the list into two equal-sized (plus or minus 1)
;; sublists, calls mergesort on the two sublists recursively, and then
;; calls merge to merge the resulting sorted sublists. mergesort stops
;; recursing on a list with only one element, which by definition is
;; a sorted list.
(define (mergesort sortlist)
  (cond ((null? sortlist) '())              ;; in case someone tries to
                                            ;; sort the empty list
        ((null? (cdr sortlist)) sortlist)   ;; making sure we don't split
                                            ;; a one-element list and
                                            ;; recurse forever
        (else (merge (mergesort (lefthalf sortlist))
                     (mergesort (righthalf sortlist))))))

;; lefthalf returns a list of the first n elements of a list, where
;; n is floor(length of the list/2)
(define (lefthalf sortlist)
  (lefthalf-helper sortlist (floor (/ (length sortlist) 2))))

(define (lefthalf-helper sortlist endcount)
  (cond ((= endcount 0) '())
        (else (cons (car sortlist)
                    (lefthalf-helper (cdr sortlist) (- endcount 1))))))

;; righthalf returns a list of all but the first n elements of a
;; list, where n is floor(length of the list/2)
(define (righthalf sortlist)
  (righthalf-helper sortlist (floor (/ (length sortlist) 2))))

(define (righthalf-helper sortlist startcount)
  (cond ((= startcount 0) sortlist)
        (else (righthalf-helper (cdr sortlist) (- startcount 1)))))

;; merge takes two sorted lists as arguments and merges the two lists
;; into a single list while retaining the sorted order...duplicate
;; elements are retained
(define (merge list1 list2)
  (cond ((null? list1) list2)
        ((null? list2) list1)
        ((<= (car list1) (car list2))
         (cons (car list1) (merge (cdr list1) list2)))
        (else
         (cons (car list2) (merge list1 (cdr list2))))))
We started down this road because we were looking for a sorting
algorithm that gave us better time complexity than insertion sort's
O(n^2), so you're probably expecting that mergesort gives us that
better time complexity. Good guess. (And remember, "algorithm" is
just another way of saying "a set of instructions which when executed
solves some problem or does something useful". It's pretty much
interchangeable with the word "procedure" in this course, although we
sometimes use the word algorithm when we want to make note of the
separation between the high-level approach to solving a problem
(algorithm) and the actual implementation in some specific
programming language (procedure). However, you should also know that
there's another aspect of the definition for "algorithm" that we're
ignoring here: by definition, an algorithm is guaranteed to halt.
It's an important theoretical distinction, but you don't need to
worry about it for the time being. Now back to our regularly
scheduled lecture notes.) Let's analyze the complexity of the
mergesort algorithm.
We start by noticing that when it split the original unsorted list
into two lists, the algorithm had to deal with n elements---at the
very least, to split the eight-element list into two four-element
lists, the algorithm had to peel off the first four elements, or n/2
elements. And just to make life simpler, let's not worry about
low-level details like how many conses were performed...let's analyze
the algorithm independently of how things are done in Scheme, so for
our unit cost we'll just count how many times a list element is
"handled" by the algorithm (you can convince yourself that the
analysis would still hold up if we counted conses, although the
constants will change some).
What about at the next level of splitting? Well, to split two
four-element lists into four two-element lists, the algorithm had to
peel off the first two elements of each of two lists, and that's four
elements again, or n/2 elements. In fact, every time the algorithm
splits all the lists at a given level of splitting, it has to handle
n/2 elements. So the work done at each level of splitting is O(n/2),
and since the constant isn't all that important to us, we say the
time complexity at each level is O(n).
Once the splitting is all done, what's the time complexity of
merging? Well, when you merge eight lists into four lists, at worst
you handle every list element. And when you merge four lists into
two, again you handle every list element, in the worst case. So it's
easy to see that the time complexity at every level of merging is
O(n).
So at every level of doing either splitting or merging, the
complexity is O(n). More accurately, it's O(2n), but of course the
constant once again is thrown away. Now we have to ask how many
levels of splitting or merging there are. If the number of levels
remains constant regardless of the n, then we'll be able to throw
that away and declare that we have a sorting algorithm that has O(n)
worst-case time complexity, and that would be worthy of a doctoral
dissertation or two. On the other hand, if the number of levels is
somehow proportional to n, then we may have to include that in our
analysis. Oh, in case you were waiting to have your Ph.D. bestowed on
you now, rest assured that the number of levels does not remain
constant.
How the number of levels is affected by n may not be immediately
obvious to you...it's the kind of thing that mathematicians get paid
to notice. And what they'd notice is this: when n = 8, we see 3
levels of splitting and 3 levels of merging. What if n = 4? You can
see from the figures above that it takes 2 levels of splitting to get
from one four-element list to four one-element lists, and not
surprisingly there are 2 levels of merging to get back to a sorted
four-element list. The same figures show you that if n = 2, there
will be 1 level of splitting and 1 level of merging. In other words,
every time we divide n by 2, we reduce the number of levels by one.
What if we doubled n? It won't take you much work to convince
yourself that if n = 16, there will be 4 levels of splitting and 4
more of merging. In other words, it looks like
2^1 list elements results in 1 level of splitting and 1 level of merging
2^2 list elements results in 2 levels of splitting and 2 levels of merging
2^3 list elements results in 3 levels of splitting and 3 levels of merging
2^4 list elements results in 4 levels of splitting and 4 levels of merging
and so on
Or, in fewer words, if n = 2^k, then there are k levels of splitting
and k levels of merging. If you know anything about how logarithms
work, then you know that the base 2 logarithm of 2^k is k. And if you
don't know about base 2 logarithms, that's just a way of saying that
k is the exponent that we have to raise 2 to in order to produce 2^k.
So the base 2 logarithm of 16 is 4, and the base 2 logarithm of 8 is
3, and the base 2 logarithm of 4 is 2, and so on. We can shorten that
text to read like this:
log_2 16 = 4
log_2 8 = 3
log_2 4 = 2
In general then, when there are n elements in the unsorted list that
mergesort begins with, there will be
log_2 n levels of splitting and
log_2 n levels of merging.
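You don't have to draw the figures to count the levels. Here's a small Python sketch (the name split_levels is made up for this illustration) that keeps halving n the way mergesort does and counts how many times it can do so before reaching one-element lists. I'm assuming the floor-based split used above, so when n isn't a power of 2 the bigger half decides the depth, and the count comes out as the base 2 logarithm rounded up:

```python
import math

def split_levels(n):
    """Count levels of splitting needed to reach one-element lists."""
    levels = 0
    while n > 1:
        n = (n + 1) // 2     # size of the larger half after a floor(n/2) split
        levels += 1
    return levels

for n in (2, 4, 8, 16, 1000):
    # second and third columns should agree
    print(n, split_levels(n), math.ceil(math.log2(n)))
```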
When we're dealing with issues of computational complexity and Big-O
notation, we ignore bases in the same way that we ignore other
constants...we're really only interested in the variables when we're
analyzing how the complexity changes as the size of the problem
grows. So we can say that there are log n levels of splitting and log
n levels of merging for an n-element list, or O(2 log n) levels
altogether. Again, we eliminate the constant and say that there are
on the order of log n levels of splitting and merging. Since there
are on the order of n elements handled at each level, and there are
on the order of log n levels of handling, we can then say that the
worst-case time complexity for mergesort is O(n) * O(log n), which we
can combine as O(n log n) time complexity for mergesort. And O(n log
n) is the best worst-case time complexity you're going to find for
any sorting algorithm that works by comparing elements, so mergesort
turns out to be pretty darn good.
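If you want to see the n log n behavior rather than just reason about it, here's a Python sketch that counts comparisons while it sorts. A few assumptions: it counts comparisons made during merging rather than every time an element is "handled" (so the constants differ from the analysis above), it uses random data rather than a worst-case input, and the name mergesort_count is made up for this illustration:

```python
import math
import random

def mergesort_count(items):
    """Return (sorted_list, number_of_comparisons_made_while_merging)."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, left_count = mergesort_count(items[:mid])
    right, right_count = mergesort_count(items[mid:])
    merged, i, j, count = [], 0, 0, 0
    while i < len(left) and j < len(right):
        count += 1                       # one comparison per loop iteration
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged += left[i:] + right[j:]       # leftovers need no comparisons
    return merged, left_count + right_count + count

random.seed(0)                           # fixed seed so runs are repeatable
for n in (8, 64, 1024):
    data = [random.random() for _ in range(n)]
    _, comparisons = mergesort_count(data)
    print(n, comparisons, round(n * math.log2(n)))
```

The comparison count should stay at or below n log n for these power-of-2 sizes, since each level of merging makes fewer than n comparisons.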
Somebody once asked why we don't just throw away the log n part
and call it O(n) complexity. We could throw away the log n part if
the complexity were analyzed to be O(n + log n), because log n
doesn't grow very much as n gets big, so adding log n to n doesn't
change the complexity much as n grows. We keep the dominant term n
and toss everything else. But when we're multiplying n times log n,
that's different. Even though log n grows slowly, it does grow, and
as n increases to very big numbers, O(n log n) diverges greatly from
O(n), so ignoring the log n in the product n log n would give us
estimates of complexity that would be way off base for large values
of n.
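Some numbers may help here. This Python sketch (the function name ratios is made up for this illustration) divides both n + log n and n log n by n, to show how far each one strays from plain old n as n grows:

```python
import math

def ratios(n):
    """Return ((n + log n) / n, (n log n) / n) using base 2 logarithms."""
    log_n = math.log2(n)
    return (n + log_n) / n, (n * log_n) / n

for n in (10, 1_000, 1_000_000):
    added, multiplied = ratios(n)
    print(f"n={n:>9}  (n + log n)/n = {added:.6f}  (n log n)/n = {multiplied:.2f}")
```

The first ratio creeps toward 1, which is why the added log n can be tossed. The second ratio is just log n itself, and it keeps climbing, which is why the multiplied log n has to stay.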
The analysis of algorithm performance is a big part of what good
computer scientists do, or should be doing, when creating software.
You'll see more of this sort of stuff as you continue with your
career in computer science, assuming that's your career choice. Until
then, you should know at least that there are different sorts of
classic time-complexity behavior that you can expect. We can
summarize them in the following table, adapted from the book
"Algorithmics" by David Harel:
               O(K) or O(1)       constant time                  good
               O(log N)           logarithmic time
polynomial     O(N)               linear time
time           O(N log N)         N log N time
               O(N^2)             N-squared or quadratic time
               O(N^3)             cubic time
               O(N^K)             etc.
               -------------
               O(2^N) or O(K^N)   exponential time
exponential    O(N!)              factorial time
time           O(N^N)             forget it                      bad
In general, anything that falls in the realm of polynomial time is
considered a reasonable algorithm. Even if the numbers get really big
as N increases, those numbers are relatively small in the
mathematical arena. On the other hand, any algorithm that exhibits
exponential time complexity (or space complexity for that matter) is
considered to be unreasonable or intractable.
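To get a feel for how wide the gap between the two halves of that table is, here's a Python sketch (the function name growth_row is made up for this illustration) that evaluates several of the table's functions for a few small values of N:

```python
import math

def growth_row(n):
    """Evaluate several of the table's growth functions at a single N."""
    return {
        "log N": math.log2(n),
        "N": n,
        "N log N": n * math.log2(n),
        "N^2": n ** 2,
        "N^3": n ** 3,
        "2^N": 2 ** n,
        "N!": math.factorial(n),
    }

for n in (10, 20, 50):
    row = growth_row(n)
    # three significant digits are plenty to see the trend
    print(n, {name: f"{value:.3g}" for name, value in row.items()})
```

Even at N = 50, the polynomial entries are numbers you could write on an index card, while 2^N and N! have left the building.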
Still, you have to keep in mind that we've been working with
worst-case scenarios all along, and you really need to consider
average-case and best-case performance as well. While you may come up
with an algorithm that has unreasonable worst-case complexity, it may
have really good average-case complexity. And if the worst-case
scenario doesn't occur very often, or if you can recognize when it
does and do something else instead, then your algorithm may be pretty
good after all. Being able to do this kind of analysis well requires
lots of practice, of course, and a little bit of art thrown in with
your science. It's fun stuff once you get the hang of it.
Copyright (c) 2003 by Kurt Eiselt. All rights reserved, with
the exception of stuff that belongs to somebody else.
Last revised: December 11, 2003