Computer Science 2500, Fall '11
Course Diary
Copyright 2011 by H.T. Wareham
All rights reserved
Wednesday, September 7 (Lecture #1)
[LutzL, Sections 1-3; Class Notes]
- Went over course outline.
- Introduction to Python
- Python is a (primarily interpreted) scripting language, unlike
traditional compiled languages like C, C++, and Java.
- Why Python?
- Easy to learn and use: Python hides messy low-level
details of programming such as variable (type)
declaration, memory management, and construction and
manipulation of advanced abstract data types like
sequences and dictionaries, and lets users focus on
programming what they need to do.
Some claim this leads to shorter code that is more
likely to be correct than that produced in other
languages like Java and C (Loui (2008), p. 23).
- Powerful: Advanced data types like sequences, sets, and
dictionaries as well as operations on these data types
are part of the core syntax, enabling easy manipulation
of nested heterogeneous data. Object-oriented
programming with classes, methods, and inheritance is
also built in, but need only be used when necessary
(cf. Java). Custom applications are supported by
easily-accessible modules, e.g., numerical
computing, graphical user interfaces (GUI).
- Mixable: Built for easy interfacing with code written in
other programming languages. This is useful for both
creating wrappers around legacy code and developing
new applications which exploit the strengths of
different programming languages (Python as glue).
- Free and well-supported: Python is open-source and
has a very active development community. In addition to
supporting core language evolution, this community is
also very active in developing modules for particular
applications, e.g., computational biology,
numerical computing, natural language processing.
- The major downside with current versions of Python (as with
scripting languages in general) is runtime and memory
efficiency. However, given increasing computer speed and
memory availability, special-purpose Python modules,
e.g., NumPy, integrated code components from high-speed
and memory-efficient languages like C, and various
optimization techniques, these are not insurmountable
problems.
- Invoking Python on Unix
- Using the interpreter (interactive mode).
- Type "python" and then enter code to run after
the ">>>" Python command prompt.
- Allows quick testing of Python code fragments; can
also import larger pieces of code from system files
and run them inside the interpreter.
- Executing Python scripts (program mode)
- Given a piece of Python code in system file
X.py that takes command-line arguments a1,
a2, ..., type "python X.py a1 a2 ..." to run X.py
on those arguments.
- Note implicit invocation of interpreter via
initial "python" in the command above.
- In this course, we will focus on running Python by executing
scripts as described above. There are other manners of running Python on
Unix, e.g., directly executable / compiled Python
scripts, as well as under other operating systems such
as Windows; for details, see LutzL, Chapters 2 and 3 and
Appendix A.
- What Python can do for you: Some sample scripts
- Example: Printing the square of 10 (sq.py).
- Note two forms of documentation: line-comments
(lines starting with "#") and docstrings (set of
lines surrounded by triple quotes).
- This script is simple, but very limited. What if we want to take the squares
of one or more command-line arguments? What about squaring numbers
stored in a file? And how about some error-checking? We will look
at how to handle these issues (and much more) in the next lecture.
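The course file sq.py is not reproduced in this diary, so the following is only a guessed-at sketch of what such a script might look like. It shows the two forms of documentation mentioned above; the function name square() is an assumption made for illustration. The print call uses a single parenthesized argument so that it behaves the same under Python 2.6+ and Python 3.

```python
"""
sq.py (hypothetical reconstruction): print the square of 10.

Usage: python sq.py
"""

# A line-comment: everything after the hash-mark is ignored.
def square(n):
    """Return n multiplied by itself."""
    return n * n

# A single parenthesized argument prints identically under
# Python 2.6+ and Python 3.
print("The square of 10 is %d" % square(10))
```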
Friday, September 9 (Lecture #2)
[Class Notes]
- Introduction to Python (Cont'd)
- What Python can do for you: Some sample scripts (Cont'd)
- Example: Printing the square of a command-line argument
(sqarg1.py).
- Uses system list-variable sys.argv
- sys.argv[0]: Python script filename
- sys.argv[i], i > 0: command-line argument i.
- Note conversion using int() -- this means all
command-line arguments are strings by default.
- Note continuation of command over two lines using
"\"-character at end of 1st line.
- Example: Printing the square of a command-line argument (error-checking)
(sqarg2.py).
- Uses len(sys.argv) (number of elements in
list sys.argv).
- Note that statement-blocks associated with control structure
statements like conditionals and loops are denoted by indenting.
It takes a bit of getting used to, but this
actually turns out to be a lot safer than
the traditional begin/end and parenthesis-pair
statement-block delimiters.
- Example: Printing the squares of all command-line arguments
(no error-checking)
(sqargs1.py).
- Note interesting for-loop -- is operating over
a (sub-)list!
- What happens if there are no command-line arguments?
If a command line argument is not a number?
- If there are no arguments, the command-line
argument sub-list is empty and nothing is
done, which is OK. However, if a command-line
argument is not an integer, interesting things
happen (try entering "python sqargs1.py a").
- Example: Printing the squares of all command-line arguments
(with error-checking)
(sqargs2.py).
- Uses try-except construct; note how much simpler
this is than error-handling in other languages,
e.g., Java.
- Example: Printing the squares of all numbers in a single-column
file
(sqfile1.py)
[sqfile1.dat].
- Once again, the for-loop is iterating over a list --
however, in this case, it is the list of lines in
a file.
- Note how much simpler Python file I/O is than in
other languages, e.g., Java.
- Example: Printing the squares of all numbers in a file
(sqfile2.py)
[sqfile2_1.dat,
sqfile2_2.dat].
- Note how much simpler splitting lines into
space/tab-separated arguments is in Python than
Java.
- Example: Counting the number of occurrences of the word "line" in a file
(woc1.py)
[woc.dat].
- Note use of backslash to produce double quotes in output.
- Note that simple string-comparison operators are built into
Python cf. string comparison via
compareTo() method in Java.
- Example: Counting the number of case-sensitive and -insensitive occurrences of a command-line specified word in a file
(woc2.py)
- Note slightly different syntax for intermediate
conditional clauses.
- Example: Multiple transformations of 4-column datafile (i.e.,
delete 1st column, swap 2nd and 4th columns, scale
3rd column by factor)
(filetran.py)
[filetran.dat].
- Note list-assignment statement -- have you ever
seen anything like this before?
- Having seen some of the neat things Python can do (in
remarkably little (and eminently comprehensible) code),
let's get a grounding in the fundamentals of Python, starting
with documentation and I/O.
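As the scripts named in this lecture are not included here, the sketch below only illustrates three of the ideas noted above: looping over the sub-list sys.argv[1:], catching a bad conversion with try-except, and list-assignment. The function names squares_of_args() and transform(), the message wording, and the sample argument list are all assumptions, not the course's code.

```python
def squares_of_args(argv):
    """Return one result line per command-line argument in argv[1:]."""
    results = []
    for arg in argv[1:]:              # for-loop over a sub-list
        try:
            n = int(arg)              # arguments arrive as strings
        except ValueError:
            results.append("%s is not an integer" % arg)
        else:
            results.append("%d squared is %d" % (n, n * n))
    return results

def transform(line, factor):
    """Drop column 1, swap columns 2 and 4, scale column 3 by factor."""
    c1, c2, c3, c4 = line.split()     # list-assignment to four names
    return "%s %s %s" % (c4, float(c3) * factor, c2)

# A fixed list stands in for sys.argv so the sketch runs anywhere.
for result in squares_of_args(["sqargs2.py", "3", "a"]):
    print(result)
```

In a real script, sys.argv (after import sys) would be passed in place of the fixed list.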
Monday, September 12 (Lecture #3)
[LutzL, Sections 9 and 15; Class Notes]
- Basic Python I: Documentation
- The remainder of any line after a hash-mark ("#") is
a comment; use this for in-line comments and comment-blocks
in the code.
- Convention for scripts is to have triple-quoted string
(docstring) at top of file with details on command-line
arguments and anything else worth saying about the
script, e.g., who wrote it, what it does. Such
docstrings can be accessed by Python documentation software;
this will be discussed more later in the course, when we get
to defining functions and modules.
- Within the Python interpreter, to view the docstring
associated with an imported program or function X,
type "print X.__doc__".
- Basic Python I: Text I/O
- Two types of I/O in Python: text and binary. Focus on text in this
course (though binary will be mentioned briefly when we discuss object
file storage in Python later in the course).
- Text I/O in Python is line-oriented.
- Can do text I/O wrt keyboard and screen in Python using raw_input()
and print.
- print Format: print arg1, arg2, ... {,}
- Note that args can be string literals or any type in Python
(which will automatically be converted to the appropriate strings).
- print automatically adds a line-return; this can be
suppressed using a trailing comma.
- Function raw_input() returns a line of input from the
keyboard with the final line-return stripped off; if a string-literal
is given as a function parameter, that string is printed as a
prompt.
- Example: Echo uppercase version of keyboard input to
screen (echo.py)
- As of Python 3.0, print has been replaced by function
print() and raw_input() has been replaced
by input(); for details, see pages 297-302 and 49-50 in
LutzL.
- Wrt files, we can read the lines in a textfile X very nicely now with
the for line in open(X): construct -- however, for file
writing and appending and the more complex types of file
reading, we need to look at general file I/O methods.
- Note that in pre-3.0 versions of Python, open() in the
for-open() construct can be replaced by file;
though the latter is arguably more readable, it will no longer be
legal in 3.0+ versions of Python, and hence the former should
be used.
- File access commands:
- f = open(filename) / f = open(filename, "r"): Open
file for reading.
- f = open(filename, "w"): Open file for writing
- Careful! If file filename already exists, will erase
contents and re-open for writing.
- f = open(filename, "a"): Open existing file for writing
to end of that file, i.e., appending.
- f.close()
- Python automatically closes all files when a script
finishes, so you don't have to do this; however,
it is still a good habit to clean up after
yourself.
- f.flush(): Write any buffered output to the file immediately.
- open() actually has a third buffer-size argument;
setting this to 0 means (in case of writing) given
string is immediately written out (can be handy in
long-running programs which, if they crash, can lose
buffer contents -- however, doing this means your
program loses the speed benefits of buffered I/O).
- Note access to a file-object in these commands; this is one
of the inbuilt types in Python. We'll look at more of these
types (namely, strings and the various types of numbers)
a week or so from now.
- File read commands:
- line = f.readline(): Read and return line from file.
- Returns null / 0-length string "" at end of file.
- lines = f.readlines(): Read and return file as list of
lines.
- line = f.read(): Read and return file as a string (which
may contain multiple lines).
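A minimal sketch of the file access and read commands listed above, run against a scratch file. The file name demo.txt and all variable names are arbitrary choices for this illustration; the temporary directory is used only so that no existing file is overwritten.

```python
import os
import tempfile

# A scratch file in a temporary directory (the name demo.txt is arbitrary).
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

f = open(path, "w")          # "w" creates the file (or erases an existing one)
f.write("first\n")
f.close()

f = open(path, "a")          # "a" adds to the end of the existing file
f.write("second\n")
f.close()

f = open(path)               # same as open(path, "r")
line1 = f.readline()         # "first\n"
line2 = f.readline()         # "second\n"
at_end = f.readline()        # "" signals end of file
f.close()

f = open(path)
lines = f.readlines()        # ["first\n", "second\n"]
f.close()

f = open(path)
text = f.read()              # "first\nsecond\n"
f.close()
```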
Wednesday, September 14 (Lecture #4)
[LutzL, Sections 9, 10, and 12; Class Notes]
- Basic Python I: Text File I/O (Cont'd)
- All the preceding talk of lines in files raises the following
question: what is a line in Python?
- A line is a character-string with a special
terminator-character (or characters).
- Line-terminators in Python include '\0' (null /
end-of-file), '\n' (newline), '\r' (carriage return),
and '\r\n' (Microsoft end-of-line).
- When Python reads a line from a file, it reads
characters until it encounters a terminator and it
includes that terminator at the end of the returned
line (can get rid of this with rstrip(); this is
particularly handy when using print, which adds its
own newline).
- Example: Print file to screen using for line in
file(X): construct (cat0.py)
- Example: Print file to screen using readline()
(cat1.py)
- Note use of while-loop
- Python does not allow assignment statements inside
conditions; hence, cannot use classic C / Java
construct while (line = f.readline()) ...
- Example: Print file to screen using readlines()
(cat2.py)
- Example: Print file to screen using read()
(cat3.py)
- File write commands:
- f.write(s): Write string s to file.
- print >> f, e: Write string specified by expression e
to file f.
- Until we look at string formatting commands later in this
course, the print >> f command is a great shortcut for
printing stuff to files.
- Example: File-copy using readlines() and write()
(copy1.py)
- Example: File-copy using readlines() and print >>
(copy2.py)
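The cat and copy scripts themselves are not shown in this diary; the sketch below is one possible way to combine the readline() while-loop with write(), under the assumption that the whole file is text. The function name copy_file() and the file names are made up for illustration. Because Python does not allow an assignment inside the while-condition, readline() is called once before the loop and once at the bottom of the loop body.

```python
import os
import tempfile

def copy_file(src, dst):
    """Copy text file src to dst one line at a time; return the line count."""
    fin = open(src)
    fout = open(dst, "w")
    count = 0
    line = fin.readline()
    while line != "":                 # "" is returned only at end of file
        fout.write(line)              # line still carries its terminator
        count += 1
        line = fin.readline()         # read the next line at the loop bottom
    fin.close()
    fout.close()
    return count

# Build a small input file in a temporary directory, then copy it.
folder = tempfile.mkdtemp()
src = os.path.join(folder, "in.txt")
dst = os.path.join(folder, "out.txt")
f = open(src, "w")
f.write("alpha\nbeta\n")
f.close()

copied = copy_file(src, dst)

# rstrip() removes the terminator that each returned line includes.
stripped = []
for line in open(dst):
    stripped.append(line.rstrip())
```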
- Basic Python I: Control Structures
- General notes
- Syntax: header: block
- Note that a header-line always terminates with a
full colon (":").
- A block is a sequence of statements, all indented the
same.
- Control structures can be nested -- indenting can vary
by block, but it must be the same within each block.
- Can have multiple statements on one line -- separate them
by a semi-colon.
- Do not need statement terminator semicolons --
Python will let you do this, but it is not part of
the Python aesthetic, i.e., Python is not
C / C++ / Java.
- How can you have one statement over multiple lines?
- Old way: Use backslashes at end of lines to be
continued.
- New (and preferred) way: Enclose the expression in
parentheses, or break the line inside an already-open
bracket, e.g., after a comma in a parenthesized
argument list or a square-bracketed list-specification.
- Conditional branching: The if-elif-else statement
- Syntax: if (condition1): block1
{{elif (condition2): block2 ...}
{else: blockN}}
- Conditions are Boolean expressions, i.e.,
expressions (unlike arithmetic expressions) whose
values are either true or false.
- Consists of one or more Boolean expressions
connected by logical operators (not, and, or).
- Boolean expressions can be literals (True (non-zero)
or False (zero)) or relational expressions,
e.g., s < 10, tag == "AUT".
- Lutz recommends that parentheses should only be used
where absolutely necessary -- however, this can
compromise code readability, and is in my opinion
an example of pushing the Python aesthetic too far.
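The following sketch pulls together the block, continuation, and if-elif-else notes above. The function classify(), its categories, and the sample values are invented purely for illustration; only the syntax is the point.

```python
def classify(s, tag):
    """Label a (score, tag) pair; the categories are made up for illustration."""
    if s < 10 and tag == "AUT":           # relational expressions joined by and
        label = "low automatic"
    elif (s < 10) or (tag == "MAN"):      # optional parentheses for readability
        label = "low or manual"
    elif not tag:                         # an empty string counts as false
        label = "untagged"
    else:
        label = "other"
    return label

# One statement over several lines: an open bracket continues the line.
labels = [classify(5, "AUT"),
          classify(5, "XYZ"),
          classify(50, ""),
          classify(50, "AUT")]

# Several statements on one line, separated by a semicolon (legal, but rare).
a = 1; b = 2
```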
Friday, September 16 (Lecture #5)
[LutzL, Sections 12 and 13; Class Notes]
- Basic Python I: Control Structures (Cont'd)
- Deterministic looping: The for statement (the partial story)
- Syntax: for x in iterable-thingie: block
- By iterable-thingie, we mean anything that is a set of
elements that you can iterate over. We already know
we can do it with lists and textfiles. As we will see
in lectures to come, this is also true for many other
structures in Python, e.g., strings,
dictionaries.
- This is all well and good; however, suppose I
actually want a for-loop over an index, going from
a lower to an upper point by some increment?
- Try for x in range({lower,} upper + 1 {, increment})
(if you are dealing with integers).
- Example: Classifying food items (version #2)
(produce2.py)
- Example: Evaluating nested summation
expressions (evalsum.py)
- A for-loop over a floating-point index is best
coded as a while-loop.
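A small sketch of the two looping idioms above. The particular double summation is an assumption (evalsum.py is not shown in this diary), as are the function names nested_sum() and float_steps().

```python
def nested_sum(n):
    """Return the sum over i = 1..n of the sum over j = 1..i of i * j."""
    total = 0
    for i in range(1, n + 1):         # upper bound n is included via n + 1
        for j in range(1, i + 1):
            total += i * j
    return total

def float_steps(lower, upper, increment):
    """Collect lower, lower + increment, ... while the value is <= upper."""
    values = []
    x = lower
    while x <= upper:                 # range() only accepts integers
        values.append(x)
        x += increment
    return values

evens = list(range(0, 10 + 1, 2))     # list() is needed only in Python 3
```

Note that repeated floating-point addition can accumulate rounding error, so a loop like float_steps() may stop one step early or late for increments such as 0.1 that have no exact binary representation.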
Monday, September 19 (Lecture #6)
[LutzL, Sections 4, 5, and 13; Class Notes]
- Basic Python I: Control Structures (Cont'd)
- Deterministic looping using the for-loop is all well and good -- however, what if you want to stop
partway through a loop's execution? Unlike C or Java,
modification of a loop index-variable cannot be used to
terminate execution, and you cannot add a condition to
the for-loop.
- Example: Finding an element in a list
(Version #0: for-loop) (find0.py)
- In these cases, we need the traditional while-loop (which,
given the excessive flexibility of for-loops in C and Java
and the resulting potential for unreadable code, is perhaps
a good thing).
- Conditional looping: The while statement (the partial story)
- Syntax: while condition: block
- Useful if you don't know how many times you are
looping, or if you want to stop partway through a
process; implement the latter using a Boolean variable
and a more complex while-termination condition.
- Example: Finding an element in a list
(Version #1) (find1.py)
- This approach to partial-looping is well-known and standard,
but it is wordy and hence prone to implementation error.
Can we do better?
- Loop-execution modification: The break and continue statements
- The break and continue statements are very restricted
versions of the dread goto statement, and are arguably
the last gasp of goto in modern programming language
design.
- The continue statement inside a loop causes execution
to ignore the statements below the continue in
the loop body and jump directly into the next loop
iteration -- can save a level of if-nesting.
- Example: Counting and summing the positive
non-zero integers in a file (Version #1)
(sumfile1.py)
[sumfile1.dat]
- The break statement inside a loop causes execution to
jump to the first statement after the loop, i.e.,
it stops loop iteration cold -- in a while-loop,
can save a level of if-nesting and extra conditions in
the loop-condition designed to stop iteration
partway through (and in the case of for-loops,
is the only thing that can stop iteration short of
sys.exit()).
- In the case of nested loops, a break only terminates
the innermost loop in which it is nested -- the
other loops still continue on.
- Example: Finding an element in a list
(Version #2) (find2.py)
- The break and continue statements are handy, but
they make code less readable and there
are valid alternatives in the language -- hence,
they probably shouldn't be used that much and when they
are, they should be used with care.
- Often when using break to stop a loop, we need to know
afterwards whether or not break was executed during
the loop, e.g., was the requested item found in
the list or not? As we saw above, this can be simulated
with a Boolean variable and some additional if-logic.
Can we do better?
- Looping (the full story): The else clause
- The block associated with a loop else-clause only
executes if the loop terminated normally, i.e., without
executing a break.
- Example: Finding an element in a list
(Version #3a: while-loop)
(find3a.py)
- Example: Finding an element in a list
(Version #3b: for-loop)
(find3b.py)
- Perhaps this is how we get rid of the break statement
for once and for all -- however, should elimination of
one piece of special-purpose language syntax be done
by introducing yet another piece of special-purpose
language syntax? This is something for you future
script-language designers to mull over.
- Doing nothing at all: The pass statement
- Yes, this is the statement that does nothing at all --
why bother?
- To satisfy Python syntax, e.g., catching an
exception and doing nothing about it.
- As a placeholder for code that will be written
later.
- Example: Counting and summing the positive
non-zero integers in a file (Version #2)
(sumfile2.py)
[sumfile2.dat]
- Like break and continue, a handy statement on occasion,
but it probably shouldn't be used that much, and
when it is, it should be used with care.
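The find and sumfile scripts are not reproduced in this diary, so the functions below are only sketches of the techniques discussed in this lecture, with invented names: a Boolean-variable while-loop, a for-loop using break with a loop else-clause, and a loop using continue and pass. A list of strings stands in for the lines of a file.

```python
def find_flag(items, target):
    """Version with a Boolean variable and a compound while-condition."""
    i = 0
    found = False
    while i < len(items) and not found:
        if items[i] == target:
            found = True
        else:
            i += 1
    if found:
        return i
    return -1

def find_break(items, target):
    """Version with break plus a loop else-clause."""
    for i in range(len(items)):
        if items[i] == target:
            break                     # skips the else-block below
    else:
        i = -1                        # runs only if the loop never hit break
    return i

def sum_positive(tokens):
    """Count and sum the positive integers among a list of strings."""
    count = total = 0
    for token in tokens:
        try:
            n = int(token)
        except ValueError:
            pass                      # not an integer: deliberately ignore it
        else:
            if n <= 0:
                continue              # skip to the next token
            count += 1
            total += n
    return count, total
```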
- Basic Python I: Object-Types
- General Notes
- In Python, types are associated with objects, not
variables, e.g., What is "Bob"?
- It is critical to make distinctions between objects,
references (to objects), and variables (which
hold references to objects), particularly
when dealing with more complex types like
lists and dictionaries.
- There are ways of determining the type of the object
referenced by a variable X, e.g., type(X), isinstance(X, type);
however, this is frowned upon in Python.
- None
- Corresponds to an undefined value.
- As variables do not have types, useful if you want to
see if a variable has been assigned a value yet.
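A brief sketch of the points above: the type belongs to the object, not to the variable, and None can mark a variable as not yet holding a useful value. All variable names here are arbitrary.

```python
x = "Bob"                         # x refers to a string object
type_before = type(x)
x = 42                            # the same variable now refers to an integer
type_after = type(x)

is_int = isinstance(x, int)       # True: the object is an integer

result = None                     # no useful value assigned yet
if result is None:                # "is" is the usual test for None
    result = "assigned"
```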
Wednesday, September 21 (Lecture #7)
[LutzL, Sections 4 and 5; Class Notes]
- Basic Python I: Object-Types (Cont'd)
- Numbers
- Is actually a collection of types:
- Integers (32-bit ("short") / infinite digit (long))
- Floating-Point (64-bit double precision)
- Complex Numbers, e.g., 5 + 4j
- For complex number c, use c.real and c.imag
to access the real and imaginary parts of
c, respectively.
- Boolean (True (non-zero), False(0))
- Long integers particularly handy in certain applications.
- Example: Counting the number of unrooted binary
trees on n leaves (Version #1)
(numbtree1.py)
- Literals
- Regular integer, e.g., -7, 0, 146
- Long integer, e.g., -9999999991119999L,
100000000000000000002L
- Regular floating-point, e.g., -0.727, 54.7, 0.0
- Scientific (exponential) notation, e.g.,
-7e+0, 4.55e+200, 1.1111e-27
- Operators
- Symbolic: N1 + N2, N1 - N2, N1 * N2, N1 / N2,
N1 ** N2 (exponentiation),
N1 % N2 (modulus)
- For any operator op above, can
have op=, which is useful if we want to replace
the left operand of that operation by the
result of that operation, e.g.,
x += y.
- Beware! Division (/) in pre-3.0 versions of
Python is argument-dependent (classic division) -- if float on
either side, does true division but if integers on
both sides, does floor division,
e.g., 5 / 2 = 2,
5 / -2 = -3.
- Phased in special floor-division operator (//) as of
version 2.2; in 3.0+ versions, / will do true
division regardless of arguments (see LutzL, pp. 117-121 for
details).
- To start using Python 3.0+ division in Python 2.6
scripts, put the statement from __future__ import
division at the top of your program.
- Relational: Standard set from C / C++ / Java + is
- is-operator checks reference-equality as
opposed to value-equality; for small integers
these usually coincide, though this should not
be relied on, and we will see clearer differences
later on.
- Note that relational operators can be chained.
- Functions: abs(N), divmod(N1, N2), pow(N1, N2), round(N)
- Note double functionality of round(); can
round to closest integer (round(N)) or
to nearest number of digits
(round(N, numdigits))
- The latter does not truncate digits if
the result cannot be realized in
fixed-length floating point representation,
e.g., round(1.464,1) = 1.5 but
round(1.426, 2) = 1.42999999999 ...
- Other functions are available in special-purpose
math modules; we will talk more about these
later in the course.
- Numerical Conversions
- Implicit: When two numeric arguments of different
type are in an
expression, arguments are converted where possible
such that no information (magnitude / # digits) is lost,
i.e., integer goes to long integer goes to
floating-point goes to complex.
Otherwise, explicit conversion is required.
- Explicit: complex(N), float(N), int(N), long(N), bool(N)
- When using int() or long() to convert strings, can use
optional second radix argument, e.g.,
int("101") = 101 but int("101", 2) = 5.
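The sketch below illustrates several of the numeric notes from this lecture. It uses // so that the floor-division results are the same under Python 2.2+ and Python 3. The tree-counting function is an assumption about what numbtree1.py computes: it uses the standard count of unrooted binary trees on n labelled leaves, (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5), and its name is invented.

```python
def num_unrooted_trees(n):
    """Return (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5), for n >= 3 leaves."""
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k                    # grows past 32 bits without overflow
    return count

floor_pos = 5 // 2                    # 2
floor_neg = 5 // -2                   # -3: floor division rounds down
true_div = 5 / 2.0                    # 2.5, since one operand is a float
quotient, remainder = divmod(17, 5)   # (3, 2)
in_range = 0 < 7 <= 10                # chained relational operators: True
binary = int("101", 2)                # 5
c = 5 + 4j
parts = (c.real, c.imag)              # (5.0, 4.0)
big = 2 ** 100                        # far beyond a 32-bit integer
```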
Friday, September 23 (Lecture #8)
[LutzL, Sections 5 and 7; PyCook, Sections 3.12-3.14; Class Notes]
- Basic Python I: Object-Types (Cont'd)
- Numbers (Cont'd)
- In addition to the inbuilt numeric types, Python can
support others via import -- most important (and to
my knowledge, only current) one of these is Decimal.
- Stores numbers as strings of decimal digits rather
than in binary.
- Excellent for representing fixed-precision currency
calculations or, more importantly, representing
quantities that cannot be represented exactly in
binary, e.g., 0.1 + 0.1 + 0.1 - 0.3 =
5.5511151231257827e-17 in Python floating-point
arithmetic.
- Not used in general because of speed penalty
(typically more than 1000x slower than
corresponding floating-point applications).
- Example: A simple adding machine (Version #1:
floating-point) (addmach1.py)
- Note use of raw_input() function to grab user input
line from the keyboard.
- Example: A simple adding machine (Version #2:
Decimal) (addmach2.py)
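The adding-machine scripts are not shown in this diary; the sketch below only contrasts the two arithmetic styles on the 0.1 + 0.1 + 0.1 - 0.3 example given above. The function names and the fixed list of entries (standing in for keyboard input) are assumptions.

```python
from decimal import Decimal

def add_floats(amounts):
    """Sum amount-strings using binary floating-point."""
    total = 0.0
    for a in amounts:
        total += float(a)
    return total

def add_decimals(amounts):
    """Sum amount-strings using Decimal, which stores decimal digits."""
    total = Decimal("0")
    for a in amounts:
        total += Decimal(a)           # build from the string, not from a float
    return total

entries = ["0.1", "0.1", "0.1", "-0.3"]
float_total = add_floats(entries)         # a tiny non-zero residue
decimal_total = add_decimals(entries)     # exactly zero
```

Constructing each Decimal directly from the input string matters here: converting to float first would carry the binary rounding error into the Decimal value.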
- Strings
- Strings are immutable sequences of 8-bit (ASCII) /
16-bit (Unicode) characters.
- Literals
- Single-quoted -- can contain double quotes / special
characters indicated by backslash.
- Double-quoted -- can contain single quotes / special
characters indicated by backslash.
- Triple double-quoted -- can contain any type of
quote / can continue over several lines.
- Unicode (u'...').
- Raw (r'...') -- backslash disabled.
- Useful for embedding special characters, e.g.,
\t, \n in strings for subsequent printing
and interpretation.
- Create regular and Unicode strings using str() and
unicode().
- String treated as list of characters -- can access
individual positions (x[i]) or obtain substrings or
subsequences by slices.
- Slices operate like traditional for-loop indices.
- x[i:j] -- substring starting at index i and
ending just before index j.
- Negative indices start at the end of the
string and work backwards.
- x[i:j:k] -- subsequence starting at index i
(increment by k) and ending just before
index j.
- Good for subsequences (x[::3] (every third
element), x[0::2] (1st, 3rd, 5th, ... elements),
x[1::2] (2nd, 4th, 6th, ... elements)) or
reversing strings (x[-1::-1]).
- Strings are immutable -- hence, cannot change
string element, e.g., s[2] = 'x' will
give an error.
- Example: Finding a substring in a file
(Version #1: slices) (findsub1.py)
- There are so many string operators in Python that you
really should look around and see if there's one that
does what you want before you write any string-operation
yourself.
- Operators
- Symbolic: S1 + S2 (concatenation), S * N (repetition), S1 in S2 (substring detection)
- Example: Creating repetitions of a given string
(Version #1: +) (stringreps1.py)
- Example: Creating repetitions of a given string
(Version #2: *) (stringreps2.py)
- Example: Finding a substring in a file
(Version #2: in) (findsub2.py)
- Relational: Standard set from C / C++ / Java + is
- Relational comparisons done relative to
lexicographic (phone book) order.
- is-operator checks reference-equality as
opposed to value-equality; this can vary
depending on string-length, because internal
Python optimization stores short strings as
references to a single object but long
strings as separate objects (LutzL, p. 187).
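A short sketch of the indexing, slicing, and operator notes from this lecture, using an arbitrary seven-character string. All variable names are illustrative only.

```python
x = "abcdefg"

middle = x[1:4]               # "bcd": index 1 up to, but not including, 4
tail = x[-3:]                 # "efg": negative indices count from the end
every_third = x[::3]          # "adg"
first_third_fifth = x[0::2]   # "aceg"
second_fourth = x[1::2]       # "bdf"
backwards = x[::-1]           # "gfedcba" (x[-1::-1] gives the same result)

try:
    x[2] = "X"                # strings are immutable ...
    changed = True
except TypeError:
    changed = False           # ... so item assignment raises TypeError

repeated = "ab" * 3           # "ababab"
joined = "spam" + "eggs"      # "spameggs"
contains = "def" in x         # True
ordered = "apple" < "banana"  # True: lexicographic order
```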
Monday, September 26 (Lecture #9)
[LutzL, Section 7; Class Notes]
- Basic Python I: Object-Types (Cont'd)
- Strings (Cont'd)
- Operators (Cont'd)
- Function:
- S1.find(S2[, start [,end]]): returns starting index
of S2 in S1 or -1 if S2 not found in S1.
- S1.count(S2): returns number of occurrences of S2 in S1.
- S.replace(old,new [, count])
- Example: Replacing all occurrences of word x in a file
with word y (replaceword.py)
[replaceword.dat]
- Try python replaceword.py replaceword.dat frog X
with X = "dog", "cat", and "car" for progressively
weirder haikus.
- Note that as strings are immutable, any
of these functions that appear to change
a string actually return changed copies
of that string!
- S.split({string})
- Example:
Extract and print employee last names in a data file
(lastname.py)
[lastname.dat]
- S.partition(string), S.rpartition(string)
- S.strip(), S.rstrip()
- S.capitalize()
- S.lower(), S.upper()
- S.isdigit(), S.isalpha(), S.islower(),
S.isupper(), S.isspace()
- Example: Determine the case of all letters in a file
(filecase.py)
[fcLower.dat,
fcUpper.dat,
fcMixed.dat]
- S.startswith(string-tuple), S.endswith(string-tuple)
- ... and many others ...
- Many of these functions are also available
in the string module; however, that module's
versions of these functions vanish in Python 3.0
(as they are all now string methods in the Python
language proper), so you should not write
any more code using it (and should change
any code you inherit to not use it).
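The sketch below exercises several of the string functions listed in this lecture. The sample record line and the haiku-like text are invented (the course data files are not shown here), as are all variable names.

```python
line = "  Smith, John, 1972, Sales  \n"

clean = line.strip()                      # "Smith, John, 1972, Sales"
fields = clean.split(", ")                # ["Smith", "John", "1972", "Sales"]
last_name = fields[0]
year_is_number = fields[2].isdigit()      # True

text = "old pond / a frog jumps in / frog song"
where = text.find("frog")                 # 13
missing = text.find("toad")               # -1, not an exception
how_many = text.count("frog")             # 2
swapped = text.replace("frog", "dog")     # a changed copy; text is unchanged
once = text.replace("frog", "dog", 1)     # replace only the first occurrence

head, sep, rest = "key=value=more".partition("=")
front, sep2, last = "key=value=more".rpartition("=")

is_data = "report.dat".endswith((".dat", ".txt"))   # tuple of suffixes
shout = last_name.upper()                 # "SMITH"
```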
Wednesday, September 28 (Lecture #10)
[LutzL, Section 7; Class Notes]
- Basic Python I: Object-Types (Cont'd)
- Strings (Cont'd)
- Syntax modifications
- Strings are iterable-thingies; hence, can use them
in for-loops and iterate over the characters from
front to back, e.g. for c in line: ...
- Example: Counting the number of times a
particular character occurs in a string (Version
#1: find) (numchar1.py)
- Example: Counting the number of times a
particular character occurs in a string (Version
#2: for-loop) (numchar2.py)
- String formatting expressions
- Analogous to string-construction using printf
in C and C++.
- Form: "... %x ..." % argument-tuple
- String x in the above can take on a wide variety
of values, indicating not only the type to
be printed but, perhaps more importantly,
its formatting in terms of digits, spacing,
and justification.
- Commonly-used types: d (decimal), i (integer),
f (float),
e (exponential/scientific notation),
s (string), c (character)
- Formats (L = minimum field length, D = number of
digits after the decimal point):
- decimal: %{+}{-|0}{L}d
- integer: %{+}{-|0}{L}i
- float: %{+}{-|0}{L}.{D}f
- exponential: %{+}{-|0}{L}.{D}e
- string: %{-}{L}s
- Example: Counting the number of unrooted
binary trees on n leaves (Version #2)
(numbtree2.py)
- Displays as long as value can be stored
as a float.
- Need way to compactly display really long
integers using scientific notation. Not
yet in Python (another sign of numeric
handling in transition), but we can
simulate it in code.
- Example: Counting the number of unrooted
binary trees on n leaves (Version #3)
(numbtree3.py)[Courtesy of Jason Gedge]
- String formatting method calls (LutzL, pp. 183-193)
- Introduced in Python 2.6.
- Has more Python-particular manner of doing string formatting;
however, is not clearly better to use in current version.
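A sketch of the formatting expressions above, followed by one possible way to simulate scientific notation for very long integers. The helper sci_notation() is an assumption and not the numbtree3.py code credited above; note that it truncates rather than rounds the mantissa and handles only non-negative integers.

```python
padded_int = "%05d" % 42                  # "00042"
signed_float = "%+08.3f" % 3.14159        # "+003.142"
sci_float = "%.2e" % 12345.678            # "1.23e+04"
left_str = "%-6s|" % "ab"                 # "ab    |"
mixed = "X%02d%5s" % (2, "spam")          # "X02 spam"
method_call = "{0} has {1:.2f}".format("Bob", 2.5)   # "Bob has 2.50"

def sci_notation(n, digits):
    """Format non-negative integer n as d.ddd...e+X (mantissa truncated)."""
    s = str(n)
    # Pad with zeros in case n has fewer digits than requested.
    mantissa = s[0] + "." + s[1:1 + digits].ljust(digits, "0")
    return "%se+%d" % (mantissa, len(s) - 1)
```

For example, sci_notation(2 ** 100, 4) works on the digit string directly, so it does not depend on whether the value fits in a float.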
Friday, September 30 (Lecture #11)
[LutzL, Sections 8 and 9; Class Notes]
- Basic Python I: Object-Types (Cont'd)
- Sequences
- All sequences are heterogeneous, i.e., the types
of the elements can be different (and even be other
sequences (see below)).
- Mutable sequences are called lists, and immutable
sequences are called tuples.
- Literals
- Empty list: []
- List: [x1, x2, ..., xn]
- Empty tuple: ()
- Tuple: (x1, x2, ..., xn)
- Single-element tuple denoted (x1,) (to
avoid interpretation as x1 with surrounding
parentheses).
- Can be split across multiple lines (and frequently
are).
- Can include lists as elements, creating nested
lists; however, if specified as variables, can
have unexpected problems (see notes on
list representation below).
- Type conversion: Can convert a list to a tuple using
tuple(), and can convert a tuple to a list using
list().
- Can use slice-syntax (see notes on Strings above) to
access elements and sublists.
- In the case of lists, can use slices to change
elements or sublist via assignment -- moreover,
the lists on the two sides of the assignment do
not need to be the same length! (but they must
both be lists (including strings))
- Multiple indices allow access to nested-sequence
elements.
- Example: 2-D integer matrices as doubly-nested
lists of integers.
- Example: sparse 2-D integer matrices as doubly-nested
lists.
- Example: sparse 2-D binary matrices as lists of tuples.
- Example: employee records as nested lists.
- Note that two different errors can occur when list indices are
out of range -- IndexError (if there is a valid
sublist at the requested index-level but the
index is out of range in that sublist) and TypeError
(if there is no sublist at the requested index-level).
- Operators:
- Symbolic: L1 + L2, L * N, X in L, del L[Slice]
- Note that + and * flatten argument-lists prior to
concatenating them; can get around
this by enclosing argument-lists in square brackets
(thus effectively adding a level of nesting).
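The following sketch illustrates the sequence notes from this lecture: nested indexing, the two out-of-range errors, slice assignment with unequal lengths, and the behaviour of + with and without an extra level of brackets. The matrix and list contents are arbitrary.

```python
matrix = [[1, 2, 3],
          [4, 5, 6]]              # 2-D matrix as a doubly-nested list
corner = matrix[1][2]             # 6

try:
    matrix[1][5]
except IndexError:
    error1 = "IndexError"         # row 1 exists, but has no index 5

try:
    matrix[1][2][0]
except TypeError:
    error2 = "TypeError"          # matrix[1][2] is an integer, not a list

L = [0, 1, 2, 3, 4]
L[1:3] = [9, 9, 9, 9]             # two elements replaced by four
grown = L[:]                      # a copy: [0, 9, 9, 9, 9, 3, 4]
L[1:5] = []                       # four elements replaced by none
del L[0]                          # L is now [3, 4]

flat = [1, 2] + [3, 4]            # [1, 2, 3, 4]
nested = [[1, 2]] + [[3, 4]]      # [[1, 2], [3, 4]]
single = (7,)                     # one-element tuple; (7) is just 7
as_list = list((1, "two", 3.0))   # [1, "two", 3.0]
as_tuple = tuple(as_list)         # (1, "two", 3.0)
```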
Monday, October 3 (Lecture #12)
[LutzL, Sections 8 and 9; Class Notes]
- Basic Python I: Object-Types (Cont'd)
- Sequences (Cont'd)
- Operators (Cont'd):
- Symbolic: (Cont'd)
- Note that in operates in a "shallow" manner,
in that it will only search in the immediate
list-elements, and not dive into sublists.
- Example: Search in integer-list
- Example: Search in sparse 2D binary matrix list, cf.,
sparse 2D integer matrix list.
- Relational: Standard set from C / C++ / Java + is
- Relational comparisons done relative to
lexicographic (phone book) order, invoking
type-appropriate relational operator results for
individual element-pairs; moreover, if
list-elements are both lists, comparison is
done recursively, i.e., list
comparison is deep!
- is-operator checks reference-equality as
opposed to value-equality; does not depend
on list length, cf. strings.
- Function
- Sequence attribute: len(L)
- Change list:
- L.append(X), e.g., L.append(7), L.append([7, 8])
(append item X to end of list L)
- L1.extend(L2) (append elements of list L2 to the end of list L1)
- L.pop() (remove and return last element of L)
- Treats L as stack, with end of list being top of
stack.
- L.index(X) (return index of X in L if X is in L and
ValueError exception otherwise)
- Note inconsistency with behavior of find() in
strings.
- L.count(X) (returns number of occurrences of X in L (shallow))
- L.sort(), L.reverse() (see below)
- Note that append(), extend(), pop(), sort(), and reverse()
change list L in place and (apart from pop()) return None;
hence, assigning the result to L is not necessary (and
indeed dangerous, as it erases contents of L).
- Return changed copy of sequence: sorted(S)
- Note that index() and reverse() are
shallow (in that they operate only on the top-level
elements in a list), but sort() and sorted() are deep in that the
comparison used to order the top-level elements operates in
a recursive manner, diving into sublists as necessary.
The latter holds because the relational operators on lists operate
in a deep manner!
- Example: Printing entries in a sparse 2D integer matrix
by co-ordinate order.
- Example: Sorting list of employee records by an arbitrary
field (Version #1).
- Combine sequences into list of tuples: zip(S1, S2, ...)
- Iterate a specific function over a sequence: min(S), max(S), sum(S)
- All these functions are shallow.
- As Python 2.x allows comparison of arbitrary
types, min() and max() will produce
results (albeit unexpected ones) for
nested sequences (Python 3.0+ raises TypeError for comparisons
between unrelated types). However, sum()
raises TypeError on nested sequences.
- Iterate arbitrary functions over sequences: map(F, S)
- Example: Converting a line into a list of integers.
- Creating a string from a sequence: Str.join(S-Str)
- Example: Converting a list of integers into a line.
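A short sketch of the list functions from this lecture, written for Python 3 (where zip() and map() must be wrapped in list() to obtain lists); all values are made up for illustration:

```python
L = [3, 1, 2]
L.append([7, 8])        # adds one element (the sublist): [3, 1, 2, [7, 8]]
last = L.pop()          # removes and returns [7, 8]
L.extend([7, 8])        # adds two elements: [3, 1, 2, 7, 8]

result = L.sort()       # sorts L in place and returns None
# Writing L = L.sort() would therefore replace the list with None.

S = (5, 4, 6)
T = sorted(S)           # new sorted list; S itself is unchanged

pairs = list(zip("abc", [1, 2, 3]))   # [('a', 1), ('b', 2), ('c', 3)]

# Converting a line into a list of integers, and back into a line.
line = "10 20 30"
numbers = list(map(int, line.split()))
total = sum(numbers)
line_again = " ".join(map(str, numbers))
```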
Wednesday, October 5
Friday, October 7
Monday, October 10
- Midterm break; no lecture
Wednesday, October 12 (Lecture #13)
[Class Notes]
- Went over answers for Class Exam #1.
Friday, October 14 (Lecture #14)
[LutzL, Sections 8, 9, and 14; Class Notes]
- Basic Python I: Object-Types (Cont'd)
- Sequences (Cont'd)
- Syntax modifications:
- Iteration over sequences: for X in L: ...
- Sequence assignment: X1, X2, ... = L
- List L may be nested; however, number of top-level
elements in L must be the same as the number of
variables on the left-hand side.
- If Python 3.0+, the right-hand side can be longer
than the left-hand side if the left-hand side
includes a single starred variable; those elements
not assigned to other variables on the left-hand
side will be placed in a list which is assigned
to that starred variable.
- String formatting over tuple of arguments: Str % T, e.g.,
"X%02d%5s" % (2, "spam")
- Length of tuple T must exactly equal the number of targets in Str.
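The syntax modifications above might be exercised as follows; this is a sketch assuming Python 3 (for the starred form), and the record layout is hypothetical:

```python
record = ["Bob", [1990, 5], "clerk"]
name, (year, month), job = record   # nesting on the left mirrors nesting in the list

first, *rest = [1, 2, 3, 4]         # Python 3 only: rest collects the leftovers

label = "X%02d%5s" % (2, "spam")    # two targets, so a 2-tuple of arguments
```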
- List comprehensions
- Are actually sequence comprehensions.
- Syntax: [op(x) for x in S {if cond(x)}]
- x can itself be a list, as long as op operates on lists, e.g.,
[(x, y + 1) for x, y in L if y >= 2].
- Example: Sorting list of employee records by an arbitrary
field (Version #2).
- Example: Counting the number of occurrences of a command-line
specified word in a file (woc3.py)
- Example: Counting and summing the positive non-zero integers
in a file (sumfile3.py)
- Can run much faster than loop-version, as it is
executing a construct directly in the interpreter
rather than being interpreted on a
statement-by-statement basis.
- List generators
- Though they are fast, list comprehensions may be
costly in space as they must generate the whole
list before any subsequent operation applied to
that list is invoked.
- In situations where memory space is at a premium,
convert a list comprehension to a list generator (generator expression)
by changing the enclosing square brackets to
parentheses. This generator will only create the
list elements one at a time, saving space, and
can be used anywhere that the list comprehension was
only iterated over once (a generator does not support indexing or len()).
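A small sketch contrasting a list comprehension with the corresponding generator; the data are invented for illustration:

```python
L = [("a", 1), ("b", 2), ("c", 3)]
bumped = [(x, y + 1) for x, y in L if y >= 2]   # keeps only pairs with y >= 2

squares_list = [n * n for n in range(1000)]     # whole list built at once
squares_gen = (n * n for n in range(1000))      # values produced on demand

total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)               # same result, no full list in memory

# A generator can only be traversed once; a second pass yields nothing.
second_pass = sum(squares_gen)
```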
- List representations in Python, and why you should care
- Lists are implemented as arrays of references;
hence, changes made to an object via one variable
and its reference also show up for all variables
that reference that object, e.g., if
L1 = [1, 2, 3] and L2 = [4, L1, 6], executing
L1[1] = 7 changes both L1 and L2.
- How then, do you copy lists?
- Call to list() function, i.e.,
L2 = list(L1) [shallow]
- Slice-copy, i.e., L2 = L1[:] [shallow]
- Deep-copy, i.e., L2 = copy.deepcopy(L1) [requires import of copy module]
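The difference between the shallow and deep copies listed above can be seen in a sketch like this one (variable names shallow and deep are illustrative):

```python
import copy

L1 = [1, 2, 3]
L2 = [4, L1, 6]
L1[1] = 7                     # visible through L2 as well: [4, [1, 7, 3], 6]

shallow = L2[:]               # like list(L2): new outer list, shared sublist
deep = copy.deepcopy(L2)      # new outer list and new sublist

L1[0] = 99
# shallow[1] still refers to L1, so it sees the change; deep[1] does not.
```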
Monday, October 17 (Lecture #15)
[Class Notes]
- Basic Python I: Object-Types (Cont'd)
- Sets
- Sets are mutable unordered heterogeneous collections of
hashable-type objects in which two objects of equal
value cannot occur, i.e., sets in Python are
not multisets. Frozensets are immutable sets.
- "Hashable" essentially means that there is a
hash-function associated with that type which
can produce an index-value for any object of
that type. These values are used for very fast
lookup.
- All immutable types are hashable; note that this
includes tuples but not lists (and frozensets
but not sets).
- Prior to Python 3.0, sets are created by calling
set() or frozenset() with a list (or an
expression that produces a list) as an argument.
- If a string is passed in as the argument to set() or frozenset() ,
it creates the set of all unique characters in the
string!
- Python 3.0+ introduces set literals using curly braces,
e.g., S = {1, 3, "Bob", (1, 4)} (elements must be hashable, so a list such as [1, 4] is not allowed).
- Type conversion: Can convert a set to a frozenset using
frozenset(), and can convert a frozenset to a set using
set().
- Operators:
- Symbolic: X in S (membership), S1 | S2 (union),
S1 & S2 (intersection), S1 - S2 (difference),
S1 ^ S2 (symmetric difference = (union of S1
and S2) - (intersection of S1 and S2))
- For any of the set-combination operators x above (|, &, -, ^), can
have x=, which is useful if we want to replace
one of the sets in that operation by the
result of that operation, e.g.,
s1 ^= s2.
- Relational: Standard set from C / C++ / Java + is
- Relational comparisons other than == and != compute
subset-relations (and in the case of the less-than
and greater-than operators, proper subset
relations).
- is-operator checks reference-equality as
opposed to value-equality; does not depend
on set size, cf. strings.
- Function:
- Set attributes: len(S) (number of elements in S)
- Set modification: S.add(X), S.remove(X) (raises KeyError if X not
in S), S.discard(X) (no error raised if X not in S),
S.pop() (remove and return arbitrary element from S; raises
KeyError if S is empty), S.clear() (remove all elements from S)
- Set combination (compute and return result): S1.union(S2), S1.intersection(S2),
S1.difference(S2), S1.symmetric_difference(S2)
- Set combination (compute and store result in S1): S1.update(S2) (union of S1 and S2)
- Note that S1 cannot be a frozenset.
- Set comparison (return boolean): S1.issubset(S2), S1.issuperset(S2)
- Set copy (return copy of S1): S2 = S1.copy()
- Sort set: sorted(S) (returns sorted list of set elements)
- As sets are heterogeneous, list organized by set-element type, with
numbers (integers and floats, all interpreted as floats) followed
by strings and tuples.
- Handling of complex numbers in sets causes problems for sorted; this
may be indicative of a Python type in transition (like long integers).
- Syntax modifications:
- for-in construct for iteration over sets
- As order of elements in set is not predictable, elements
will come out in unpredictable order (but it is the same
order one sees when the set is printed).
- Set comprehensions (available in Python 3.0+)
- Syntax: {op(x) for x in S {if cond(x)}}
- Sets are very useful for maintaining collections of distinct objects occurring in
another collection, e.g., list of
birds observed during a particular time-interval.
- Example: Measuring textfile similarity wrt distinct character content
(charsim.py)
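A sketch of the set operators and of one possible character-content similarity measure. The function char_similarity() and its formula (shared distinct characters divided by all distinct characters) are assumptions made for illustration and are not necessarily what charsim.py computes:

```python
S1 = set([1, 2, 3, 3])        # duplicates collapse: three elements remain
S2 = {3, 4}

union = S1 | S2
common = S1 & S2
only_in_s1 = S1 - S2
in_one_only = S1 ^ S2

proper_subset = {1, 2} < S1   # True
not_proper = S1 < S1          # False: a set is not a proper subset of itself
subset_or_equal = S1 <= S1    # True

letters = set("hello")        # set of distinct characters in the string

def char_similarity(text1, text2):
    """Hypothetical measure: shared distinct characters / all distinct characters."""
    chars1, chars2 = set(text1), set(text2)
    if not (chars1 | chars2):
        return 1.0            # two empty texts are treated as identical
    return len(chars1 & chars2) / float(len(chars1 | chars2))
```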
Wednesday, October 19 (Lecture #16)
[LutzL, Section 8; Class Notes]
- Basic Python I: Object-Types (Cont'd)
- Dictionaries
- Dictionaries are mutable unordered heterogeneous
collections of key : value pairs. Values can be any
object (including nested dictionaries or lists), but
keys must be hashable-type objects. Each key : value
pair is known as an item.
- Literals: {key1 : value1, key2 : value2, ...} or {}
(empty dictionary)
- Can also create dictionaries using dict(), which
takes either lists of 2-tuples (in which the first
value is interpreted as the item-key and the second
value is interpreted as the item-value) or in
a special keyword-form, e.g.,
dict(name = 'bob', age = 45), in which each x = y
expression is interpreted as an item whose key is x
and value is y.
- The 2-tuple version of dict() allows handy creation
of dictionaries using zip().
- Operators:
- Symbolic: K in D (key membership), del D[K] (removes entry with
key K in D; returns KeyError if K not key in D)
- Relational: Standard set from C / C++ / Java + is
- Relational comparisons done relative to sorted
lexicographic (phone book) order on key-value pairs, invoking
type-appropriate relational operator results for
individual element-pairs; moreover, if
keys are both tuples and/or values are sequences, comparison is
done in a deep manner.
- is-operator checks reference-equality as
opposed to value-equality; does not depend
on dictionary size, cf. strings.
- Function:
- Dictionary attributes:
- len(D) (number of key-value pairs in D)
- D.items() (returns list of key-value pairs as tuples)
- D.keys() (returns list of keys of items in D)
- D.values() (returns list (not set) of values of items in D)
- sorted(D) (returns sorted list of keys of items in D)
- D[K] (returns value of item with key K in D if such an item in D and
KeyError otherwise)
- D.get(K {,Def}) (like D[K] except if item with key K not in D, returns None
if default-value argument not present and Def otherwise)
- Dictionary modification:
- D[K] = V (add item (K,V) to D if no item with key K in D, and update
value of item with key K to V otherwise)
- D.pop(K) (removes item with K in D and returns value associated with item if
item with key K in D and returns KeyError otherwise)
- D.clear() (remove all items in D)
- Dictionary combination: D1.update(D2) (add all items in D2 to D1; if an item in D2
has the same key as an item in D1, replace the value of that item in D1 with the value
associated with that key in D2)
- Dictionary copy: D.copy() (shallow)
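The dictionary operations above, in a brief sketch with made-up keys and values:

```python
keys = ["name", "age"]
values = ["bob", 45]
D = dict(zip(keys, values))                       # built from 2-tuples via zip()
same_as_keyword = (D == dict(name="bob", age=45)) # keyword form gives an equal dictionary

D["job"] = "clerk"                   # new key: adds an item
D["age"] = 46                        # existing key: updates the value

missing = D.get("salary")            # None rather than KeyError
fallback = D.get("salary", 0)        # 0, the supplied default

removed = D.pop("job")               # returns 'clerk' and deletes the item
D.update({"age": 47, "dept": "sales"})   # overwrites age, adds dept
ordered_keys = sorted(D)             # sorted list of keys
```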
Friday, October 21 (Lecture #17)
[LutzL, Sections 8 and 9; Class Notes]
- Basic Python I: Object-Types (Cont'd)
- Dictionaries (Cont'd)
- If dictionaries are nested, can access lower-level elements by d[kl1][kl2] ...
syntax (analogous to nested-list element access); indeed, we can mix
levels of lists and dictionaries as long as we use appropriate indices or
keys at the appropriate nested-structure level to access elements.
- Example: Employee record-storage with arbitrary
employee attributes via dictionary (keyed on employee
ID) of dictionaries (keyed on employee attribute).
- Syntax modifications:
- for-in construct for iteration over
dictionary keys: for K in D: ...
- As order of items in dictionary is not predictable, elements
will come out in unpredictable order (but it is the same
order one sees when the dictionary is printed).
- To iterate over sorted list of keys, use for K in sorted(D): ...
- Example: Sparse n-dimensional matrix
storage via dictionary keyed on n-tuples.
- Dictionaries have many uses:
- Sparse data structures
- Record-structures
- Management of records indexed by non-integers
- Example: Computing and printing list of author names
in decreasing order by publication count (Version #1:
Lists)
(authorCount1.py)
[authorCount.dat]
- Example: Computing and printing list of author names
in decreasing order by publication count (Version #2:
Dictionaries and List-Comprehensions)
(authorCount2.py)
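Two of the dictionary uses listed above, sketched with invented data (this is not the code of authorCount2.py, just one way such a count could be done):

```python
# Sparse 2D matrix: only non-zero entries are stored, keyed on (row, col).
matrix = {(0, 3): 5, (2, 1): 7}
entry = matrix.get((1, 1), 0)        # absent entries read as 0

# Counting occurrences with a dictionary (hypothetical author data).
authors = ["Lee", "Kim", "Lee", "Roy", "Kim", "Lee"]
counts = {}
for name in authors:
    counts[name] = counts.get(name, 0) + 1

# Sort (count, name) pairs in decreasing order of count.
ranked = sorted([(n, name) for name, n in counts.items()], reverse=True)
```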
- Storing Persistent Objects (Part I)
- How do we store the various object-types we've seen so
far between program executions? This is typically done
in files. For now, let's look at the simplest types
of file storage, and leave discussion of advanced
file-indexing and database-style access for later in
the course.
- Can write string-representations of objects obtained
using str() or repr() to text files; these can be
read back in either with user-written parsing or
the eval() function.
- With str(), the representation of a string may
leave out quotes; in this case, repr() is safer.
- eval() will actually execute any Python
command given in the string-argument; should be
used with extreme care.
- More compact string-representations of types can be
obtained using the pickle module.
- To use, import the pickle module (import pickle)
and use pickle.dump(X,fout) to write an object
X to an open file fout; to retrieve that
object from the file fin subsequently
reopened for reading,
use X = pickle.load(fin). (In Python 3.0+, both files must be
opened in binary mode, i.e., "wb" and "rb".)
- Can store multiple objects in one file; just have
to make sure that you re-load them in the same
order in which they were dumped.
- pickle uses a technique called serialization to
create these string representations, which
are ideal for transferring Python objects over the
Internet.
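Both storage approaches above can be sketched as follows, assuming Python 3 (hence the binary file modes); the record and the temporary file name are illustrative:

```python
import os
import pickle
import tempfile

record = {"name": "bob", "scores": [1, 2, 3]}

# repr() gives a string that eval() can turn back into an equal object.
text = repr(record)
rebuilt = eval(text)     # acceptable here only because we produced the string ourselves

# pickle round trip through a file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "record.pkl")
with open(path, "wb") as fout:       # binary mode is required in Python 3
    pickle.dump(record, fout)
    pickle.dump([4, 5], fout)        # a second object in the same file

with open(path, "rb") as fin:
    first = pickle.load(fin)         # objects come back in dump order
    second = pickle.load(fin)
```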
Monday, October 24 (Lecture #18)
[LutzL, Sections 16-18; Class Notes]
- Basic Python II: Functions (Cont'd)
- Why use functions?
- Implementing recursion, e.g., searching a
nested list.
- Single parameterized occurrences of commonly-used pieces
of code, which can lead to fewer errors if
modifications are required, e.g., reading in
textfiles and converting them to lists of words.
- Hides low-level details, which makes
calling code more readable, e.g., hide
pickling load and dump commands inside database
load and dump functions.
- Allows specification of application-specific function
libraries, which simplifies application development,
e.g., sparse 2D matrix manipulation.
- Basic syntax: def f(x1, x2, ...): statements; return
{something}
- Critical to define functions used at top of file;
if they are at the bottom, will not be accessible
to main program.
- What is our main program now? For our purposes at the
moment, it is the block of non-function code at
the end. However, once we start creating scripts
consisting purely of functions, this will have to
be modified slightly.
- Example: Recursively searching a nested list
(deepfindEx.py)
- Example: Summing the integers in a file (no range check)
(sumfunc1.py)
- Example: Summing the integers in a file (range check #1)
(sumfunc2.py)
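As a sketch of recursion via functions, one possible nested-list search is shown below; deep_find() is a hypothetical name and this is not necessarily the approach taken in deepfindEx.py:

```python
def deep_find(item, nested):
    """Return True if item occurs at any nesting level of the list nested."""
    for element in nested:
        if element == item:
            return True
        # Dive into sublists; plain "in" would stop at the top level.
        if isinstance(element, list) and deep_find(item, element):
            return True
    return False
```

For instance, deep_find(3, [1, [2, [3]]]) succeeds where 3 in [1, [2, [3]]] does not.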
- Interesting features of Python functions
- Function parameters are not typed and can vary in number
- Function parameter / call-argument matching can be
done positionally or by parameter name (keyword
form).
- Non-keyword call-arguments matched positionally as far
as possible, and remainder are placed in *X (if
*X is included as a function parameter).
- Keyword call-arguments not matched are placed in **X
(if **X is included as a function parameter).
- Can interpolate positional arguments into a
function call with a *List list-variable and
keyword arguments with a **Dict dictionary-variable.
- With overloaded / polymorphic operators, allows
true multi-type functions (sort of like Java
generics).
- Example: Summing the integers in one or more files (range check #1)
(sumfunc3.py)
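The argument-matching rules above can be observed with a small function that simply returns what it received; describe() and its parameter names are made up for this sketch:

```python
def describe(first, second, *extra, **options):
    """Return the four pieces so the matching can be inspected."""
    return first, second, extra, options

positional = describe(1, 2, 3, 4)              # excess positionals land in extra
by_keyword = describe(second="b", first="a")   # matched by parameter name
leftover = describe(1, 2, colour="red")        # unmatched keyword lands in options

args = [1, 2, 3]
kwargs = {"colour": "red"}
interpolated = describe(*args, **kwargs)       # list and dictionary unpacked into the call
```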
Wednesday, October 26 (Lecture #19)
[LutzL, Sections 16-18; Class Notes]
- Basic Python II: Functions (Cont'd)
- Interesting features of Python functions
- Function parameters can have default values set in
the parameter list itself (analogue of keyword-form
in function call).
- Example: Summing the integers in one or more files (range check #2)
(sumfunc4.py)
- Function parameter values set "by assignment"
- Attempts to change immutable objects in a function
will cause errors; hence, for all practical
purposes, immutable objects are passed by value
and mutable objects are passed by reference.
- Function return values are not typed and can vary in
number
- Any comma-separated list will be treated as a
tuple; hence, even though technically one thing
is returned, you can return any number of things
(as items in that tuple).
- Example: Processing the integers in one or more files (range check #2)
(sumfunc5.py)
- Functions are objects
- At most basic, this means functions can be assigned
to variables and stored / passed around like
other objects (which makes map() much more
powerful, for instance).
- Functions can now be defined anywhere in the code,
even inside conditional statements or loops,
e.g., conditional definition of a function
itself rather than implementing conditional
behavior by conditional statements inside a
function.
- To trigger function-object f, use apply(f, pargs
{, kargs}) or f(arg1, arg2, ..., argn); apply is particularly useful if
you do not know the number of arguments at
coding-time.
- The features of apply() are so convenient that
they have been made part of the core language
syntax as of Python 2.6, namely f(*parg, **karg), where parg
and karg are a list and a dictionary,
respectively. As apply() will be eliminated
in future versions of Python, should get used
to using the new syntax.
- Note how this new syntax is consistent with
how excess arguments are handled in current
Python function calls.
- Can embed functions into lists and dictionaries
to create jump tables, which specify (by index
or key) the actions to be performed in a
particular situation.
- Example: A simple adding machine (Version #3:
Decimal + jump table) (addmach3.py)
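A minimal jump-table sketch using the f(*pargs, **kargs) call syntax, plus a function returning several values as a tuple. The names are hypothetical and this is not the code of addmach3.py:

```python
def add(x, y):
    return x + y

def subtract(x, y):
    return x - y

# Jump table: the key selects which function-object to call.
operations = {"+": add, "-": subtract}

def calculate(symbol, *pargs, **kargs):
    """Look up the function for symbol and call it with the given arguments."""
    f = operations[symbol]
    return f(*pargs, **kargs)      # plays the role of apply(f, pargs, kargs)

def min_max(values):
    """Return two results at once, packed as a tuple."""
    return min(values), max(values)

low, high = min_max([4, 9, 1])
```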
- Variable scope
- Python maintains a hierarchy of variable namespaces;
the same variable-name may exist in multiple
namespaces, each with a different associated object
and type-interpretation.
- The LEGB Rule.
- Specifies namespace-order in which Python
looks for variable-interpretations (local
(function), enclosing function (in reverse nesting
order), global (module), built-in (Python
language)).
- Can be short-circuited with the global statement;
however, you really shouldn't ...
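The LEGB lookup order can be traced in a sketch like the following, where the same name x lives in three namespaces (all names are illustrative):

```python
x = "global"

def outer():
    x = "enclosing"
    def inner():
        x = "local"
        return x             # local x wins
    def inner_no_local():
        return x             # no local x, so the enclosing one is found
    return inner(), inner_no_local()

def read_global():
    return x                 # no local or enclosing x: the module-level one

def rebind_global():
    global counter           # assignment now targets the module namespace
    counter = 1

rebind_global()
```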
Friday, October 28 (Lecture #20)
[LutzL, Sections 16-19 and 21-22; Class Notes]
- Basic Python II: Functions (Cont'd)
- Lambda expressions
- Syntax: lambda arg1, arg2, ..., argn: expression
- Essentially, allows the definition of very short
anonymous functions.
- Why not just use a regular function?
- Can appear anywhere an expression does,
e.g., function-argument to function
(like map()), jump tables.
- Allows functions to be defined closer to where they
are used (code proximity).
- Example: A simple adding machine (Version #4:
Decimal + jump table + lambda expressions)
(addmach4.py)
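A sketch of lambda expressions in the positions mentioned above (jump table, function-argument), written for Python 3; the table and records are invented and are not taken from addmach4.py:

```python
# A jump table with the functions written inline.
operations = {
    "+": lambda x, y: x + y,
    "*": lambda x, y: x * y,
}
product = operations["*"](3, 4)

# A lambda as the function-argument to map().
doubled = list(map(lambda n: 2 * n, [1, 2, 3]))

# A lambda as a sort key: order records by their second field.
records = [("bob", 45), ("sue", 30)]
by_age = sorted(records, key=lambda rec: rec[1])
```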
- Basic Python II: Modules
- What Is (and Isn't) A Module in Python
- A module is a collection of variable-names and their
associated objects; these variable-object pairs are
known as attributes.
- A module in Python can correspond not just to a
Python script but to a collection of functions and/or
data structures
written in other languages such as C or Fortran
that are accessed by Python scripts.
- A module is more than an included library or a
compile-time directive (in that it is an
assignment-like statement that is executed) and less
than a true OOP-style object or class (in
that it does not implement the privacy portion of
encapsulation, or force data in a module to be
manipulated purely by functions in that module).
- Accessing Module Attributes via Import
- Three syntactic variants:
- import X
- from X import Y {as Z}
- from X import *
- The first variant makes all attributes Y of module X
accessible by the syntax X.Y, the second adds attribute
Y of X to the calling module's namespace such that it
can be accessed directly as Y (or Z, if the as-clause
is used), and the third adds all
attributes Y in X to the calling module's namespace
directly.
- An import-statement does three things in order: finds
the requested module, compiles it to bytecode (maybe),
and executes its statements (from top to bottom).
- Finding is done in local directory or under
guidance of Python path list (stored in
sys.path).
- Compilation (to a .pyc bytecode file) is done if
script does not contain a main program (see below)
and .pyc file does not exist or changes have been
made to file since previous .pyc creation.
- Execution creates all functions and objects
specified by the script.
- Is this convenient? Lord yes. Is it time-consuming?
Again, Lord yes. This is why, in situations where
imports of the same module occur multiple times,
e.g., the interactive interpreter
environment, all three steps are only done on
first import and subsequent imports only
link to the established module-object.
- Problematic if relying on value-initialization
via import.
- Can get around this (to a degree) with
reload().
- An import gives access to and the ability
to change all imported attributes -- this cannot
be overridden.
- Module Coding Guidelines
- The from-versions actually invoke full imports,
so they do not save time by selective import.
Be careful using these (particularly from *), as
they will overwrite the values of variables in
the calling module with the same name as imported
attributes.
- With from *, can prevent import by either naming
a variable with an initial underscore, or
restricting the imported attributes to those
in list __all__, e.g.,
__all__ = ["x", "y2", "procFile"].
- Note that this is not a private declaration,
as stuff hidden in this manner can still be made
accessible by a regular import statement,
i.e., you can hide but you can still be
run.
- Imports of whole directories of modules at once are also
possible, and are desirable in larger Python systems --
however, we will not cover such package imports in
this course.
- Use __main__ to delimit main-program code (by using the
if __name__ == "__main__": construct to delimit code
that is run if the script is run in stand-alone mode;
consider using this in tandem with a main() function).
- If module consists purely of functions that are not
run in stand-alone mode, e.g., a math
function library, use the main program to store
module self-test code.
- Example: Processing the integers in one or more files (module + main program)
(sumfunc6a.py [module],
sumfunc6b.py [main program])
Monday, October 31 (Lecture #21)
[Class Notes]
- Basic Python II: Modules (Cont'd)
- Module Coding Guidelines (Cont'd)
- Associate docstrings with each attribute of importance (module,
data-structure, function) by placing docstring immediately
after attribute-definition.
- To get a quick overview of variables and functions
associated with a module X, from within the Python
interpreter, import that module (import X) and print
its associated doc-string (print X.__doc__). One can
display the docstring associated with any attribute
of that module similarly (print X.Y.__doc__).
- Alternatively, to get a nicely-formatted description
of all docstrings associated with a module X, import
the module and use help(X).
- Use _X and __all__ to limit namespace pollution from
imports.
- Follow standard software engineering practice,
e.g.,
- Minimize coupling of modules via use of "global"
objects/ data structures to pass information
between modules.
- Maximize cohesion of modules by making sure
attributes in a module have a common
sensibly-defined purpose and that these
attributes associated with this purpose are not
split across multiple modules.
- Given that we now know about how modules work in Python,
let's spend the next few lectures looking at services
provided by some of the standard Python modules.
- Accessing Python interpreter internals: The sys Module
- Attributes we have seen so far: argv, path, exit(),
getrefcount()
- Variables stdin, stdout, and stderr store the
file-objects associated with where interpreter
input comes from and interpreter output and error
messages go. The defaults for these are the keyboard
and terminal screen, respectively -- however, these
can be changed, e.g., redirect error messages
to a specific file.
- The original version of each stream X is stored
in __X__, and can be recovered; however, Python
being Python, you can change these too ...
- Example: Fun and games with stdin, stdout, and
stderr (sysRedirect.py)
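A sketch of stream redirection, assuming Python 3; an in-memory io.StringIO object stands in for a log file, and this is not the code of sysRedirect.py:

```python
import io
import sys

buffer = io.StringIO()        # in-memory file-object standing in for a log file
saved = sys.stdout            # remember whatever stream is currently installed
sys.stdout = buffer           # print() output now goes into the buffer
try:
    print("captured line")
finally:
    sys.stdout = saved        # sys.__stdout__ would restore the interpreter's original

captured = buffer.getvalue()
```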
- Accessing System Files and Directories: The os,
shutil, and glob Modules
- The os module operates on an abstract file system which
is a directory-tree with non-empty directories as
internal nodes and files (and empty directories) as leaves.
- Can designate any directory as a current working
directory (cwd).
- Can designate directory-paths linking entities in
the tree. Each such path can be cracked into a
directory-path and an entity-name (with the latter
being empty if the entity is a directory).
- Each entity in the tree has a unique associated
directory-path from the root-directory to that
entity (absolute path).
- Each entity in the tree has a unique associated
directory-path from a designated cwd to that
entity (relative path).
- By operating on an abstract file-system whose
path-specifics are stored as variables, e.g.,
path separator, the os module can be customized to
allow generic file and directory access on many types
of operating systems.
- This in turn allows you to write operating-system
invariant code! (provided, of course, that all
file-manipulation is done using os-module
variables and functions).
- Services provided by os:
- Variables:
- name (name of operating system)
- curdir (string denoting current directory; "." under Linux)
- pardir (string denoting parent of current directory; ".." under Linux)
- sep (string denoting directory-path separator; "/" under Linux)
- extsep (string denoting filename-extension separator; "." under Linux)
- Functions:
- Entity characteristics:
- Entities specified by path-strings.
- access(P, {os.R_OK, os.W_OK, os.X_OK}) (returns True if P
accessible in requested manner and False otherwise)
- listdir(P) (returns list of all files and directories in
directory P, including "invisible" dot-files)
- Change entity characteristics:
- chmod(P, mode) (reset access-permissions of P to mode)
- rename(oldP, newP) (rename entity oldP as newP; corresponds
to a mv command in Linux)
- Create / delete entities: remove(P), mkdir(P),
rmdir(P)
- rmdir() only removes an empty directory;
to remove all files and directories in
a directory, use shutil.rmtree(P).
Wednesday, November 2
Friday, November 4 (Lecture #22)
[PyNut, Section 9; Class Notes]
- Went over answers to Class Exam #2.
- Basic Python II: Modules (Cont'd)
- Accessing System Files and Directories: The os,
shutil, and glob Modules (Cont'd)
- The os.path sub-module provides additional services for
manipulating paths themselves
- Characteristics of entity reached by path:
exists(P), getsize(P),
getmtime(P), isfile(P), isdir(P), islink(P)
- Path characteristics: abspath(P) (absolute path of P),
dirname(P) (non-terminal directory path),
basename(P) (terminal entity-name), split(P)
(returns pair (dirname(P), basename(P)))
- Path construction: join(P1, P2, ...) (given directories P1, P2, ... and
optional terminal file, constructs path of entities
in sequence separated by sep)
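The os.path services above in a short sketch; the path components are invented and need not exist on disk:

```python
import os

path = os.path.join("data", "reports", "summary.txt")  # os.sep is placed between parts
directory, filename = os.path.split(path)
same_parts = (directory == os.path.dirname(path) and
              filename == os.path.basename(path))

here_is_dir = os.path.isdir(os.curdir)      # the cwd is a directory
absolute = os.path.abspath(os.curdir)       # absolute path of the cwd
```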
Monday, November 7
- No lecture; instructor sick
Wednesday, November 9 (Lecture #23)
[PyNut, Section 9; Class Notes]
- Basic Python II: Modules (Cont'd)
- Accessing System Files and Directories: The os,
shutil, and glob Modules (Cont'd)
- Example: Listing all readable files in the current
directory (listDir1.py)
- Traversing a directory tree
- If all you want to do is visit and perform the
same operation on each file in a
directory tree (possibly accumulating results
from each file in a variable or list as you go),
use os.path.walk() (Python 2.x only; Python 3.0+ provides os.walk() instead).
- Usage: os.path.walk(root, myfunc, arg),
where myfunc has the form myfunc(arg,
dirname, files) such that dirname is the
directory being examined and files is a
list of all files in that directory.
- If you want to do something more complex, code up
a traversal yourself using your favorite
recursive tree-traversal algorithm as a template
(one such example is on p. 124 in Section 3.4.7 of
Langtangen (2008)).
- Example: Listing all readable files in the directory tree
rooted at the current directory (listDirTree.py)
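A sketch of a tree traversal that lists readable files. It uses os.walk() rather than os.path.walk(), since os.walk() is available in both Python 2.3+ and Python 3; readable_files() and the temporary tree are hypothetical and this is not the code of listDirTree.py:

```python
import os
import tempfile

def readable_files(root):
    """Return sorted paths of all readable files in the tree rooted at root."""
    found = []
    for dirname, subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirname, name)
            if os.access(path, os.R_OK):
                found.append(path)
    return sorted(found)

# Build a tiny throwaway tree to run the function on.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "sub"))
for rel in ("a.txt", os.path.join("sub", "b.txt")):
    with open(os.path.join(root, rel), "w") as f:
        f.write("x")

listing = readable_files(root)
```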
- To copy files and directories, use shutil.copy(oldP, newP), shutil.copy2(oldP, newP), and
shutil.copytree(oldP, newP).
- copy() modifies last access / creation time while copy2() does not.
- Selective directory listing with glob
- listdir() is well and good for listing all files in
a directory. However, we often only want files of
a particular type, e.g., Python scripts,
files whose names start with capital letters.
- glob(pat) in module glob returns a list of all files
in the current working directory whose names match
the pattern pat.
- Patterns incorporate ordinary symbols and
various pattern-specifiers, e.g.,
? (any single character), * (0 or more
characters), [x ... y] (any one of
characters x .. y), [x-y] (any one of
characters in Unicode / ASCII range x - y).
- Example: Listing all readable files in the current
directory with a specified extension (listDir2.py)
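A sketch of glob patterns run against a throwaway directory; the file names and the helper names() are invented for illustration (this is not listDir2.py):

```python
import glob
import os
import tempfile

folder = tempfile.mkdtemp()
for name in ("run.py", "test.py", "Notes.txt", "a1.dat"):
    open(os.path.join(folder, name), "w").close()   # create empty files

def names(pattern):
    """Return the sorted base-names in folder that match pattern."""
    matches = glob.glob(os.path.join(folder, pattern))
    return sorted(os.path.basename(p) for p in matches)

python_scripts = names("*.py")          # any name ending in .py
letter_digit = names("[a-z][0-9]*")     # lowercase letter, then digit, then anything
four_then_ext = names("????.*")         # exactly four characters before the first dot
```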
- Pattern-Matching: The re Module
- The elementary pattern-matching in glob would be a useful
thing to have for string-processing in general. Such a
general facility is provided by the re module, in which
patterns are represented as regular expressions.
- What is a regular expression?
- A regular expression specifies a set of strings; if
a given string s is in that set, s is said to match the
regular expression.
- At its most basic, a regular expression is a
sequence of units, where each unit specifies
a choice of one or more things that are repeated
some number of times.
- Something (entity): Symbol
- Choice:
- . (any character except \n)
- [s] (any character in string s)
- [x-y] (any character between characters x and y inclusive in Unicode)
- [^s] (any character not in string s)
- \w (any word character, i.e., [a-zA-Z0-9_])
- \W (any non-word character, i.e., [^\w])
- \d (any digit character, i.e., [0-9])
- \D (any non-digit character, i.e., [^\d])
- \s (any space character, i.e., [ \t\n\r\f\v])
- \S (any non-space character, i.e., [^\s])
- Repetition (quantifiers):
- * (0 or more occurrences)
- + (1 or more occurrences)
- ? (0 or 1 occurrences, i.e., optional)
- {m} (m occurrences)
- {m,} (m or more occurrences)
- {m,n} (m to n inclusive occurrences)
Note that special status of symbols overridden
inside square brackets (a+ vs. [a+]). To
avoid proliferation of backslashes used to create
escape-versions of characters, use raw strings
(r'...').
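The choice and repetition specifiers above can be tried out with re.findall(), which returns every non-overlapping match; the sample strings are made up:

```python
import re

words = re.findall(r"\w+", "spam, eggs_2 & ham")         # \w also covers digits and _
digits = re.findall(r"\d{2,3}", "7 42 123 98765")        # runs of two to three digits
optional = re.findall(r"colou?r", "color colour colr")   # the u may be absent
plus_outside = re.findall(r"a+", "caaab+")               # + is a quantifier here
plus_inside = re.findall(r"[a+]", "caaab+")              # + is an ordinary symbol here

# Without the r prefix the backslash itself must be escaped.
same_pattern = (r"\d+" == "\\d+")
```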
Thursday, November 10
- Final Exam Notes
I am making up your final exam now; things may change a bit, but I'm pretty
sure of the general format. The exam will be
closed-book. It will be 120 minutes long and has a total of 120
marks (this is not coincidental; I have tried to make the number
of marks for a question equivalent to the number of minutes
it should take you to do it). There will be 3 questions:
- Give short code-fragments and functions (4 parts; 60 marks)
- Describe the output of a Tkinter script (15 marks)
- Write functions associated with an example system (4 parts; 45 marks)
Topics include all material covered up to and including
GUI design using the Tkinter module. You may also find the
following of use:
- Final exam (Fall 2008) (11 pages: PDF)
- Answers to final exam (Fall 2008) (5 pages: PDF)
- GUI code for Question #2 on final exam (Fall 2008) (tf_GUI_F08.py)
- Final exam (Fall 2009) (9 pages: PDF)
- Answers to final exam (Fall 2009) (4 pages: PDF)
- GUI code for Question #2 on final exam (Fall 2009) (tf_GUI_F09.py)
- Final exam (Fall 2010) (9 pages: PDF)
- Answers to final exam (Fall 2010) (4 pages: PDF)
- GUI code for Question #2 on final exam (Fall 2010) (tf_GUI_F10.py)
I hope the above helps, and I wish you all the best of luck with
this exam and your other exams.
Friday, November 11
- Remembrance Day; no lecture
Monday, November 14 (Lecture #24)
[PyNut, Section 9; Class Notes]
- Basic Python II: Modules (Cont'd)
- Pattern-Matching: The re Module (Cont'd)
- What is a regular expression? (Cont'd)
- Example: Recognizing integers.
- Example: Recognizing floating-point numbers.
- Example: Recognizing exponential numbers.
- Units can be built out of other units by grouping
with parentheses; aside from aiding clarity, such
groups can also be accessed by position going from
left to right in the expression (\i, for i greater
than or equal to 1), or even given names (which we will not get
into in this course).
- Such backreferences allow reference to previous matches later
in pattern, e.g., (\d+)X\1.
- If you do not want parentheses to be interpreted as a
group, use (?: ...).
- Example: Recognizing (and breaking down)
proper names.
- In languages like Python, regular expressions are
augmented to consider matches of one or more
substrings inside a string.
- Can specify where in string match must occur to
be valid, e.g.,
- \A (beginning of string only)
- ^ (beginning of string; with re.MULTILINE, also after each \n)
- \Z (end of string only)
- $ (end of string or just before a final \n; with re.MULTILINE, also before each \n)
- If multiple matches are possible, can
override greedy default (longest) to match
shortest (trailing ? on quantifier).
- Example: Recognizing name at start (end) of file vs.
start (end) of any line in file.
- Example: Recognizing XML-tagged entities
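The ideas from this lecture can be tried out with a short sketch. The patterns and sample strings below are illustrative assumptions (they are not the patterns developed in class), and the code assumes Python 3. It shows number recognition, a backreference, greedy versus non-greedy matching, and the position anchors.

```python
import re

# Hypothetical patterns (not the ones from class) for three kinds of numbers.
INT_PAT = r"[+-]?\d+"
FLT_PAT = r"[+-]?\d+\.\d+"
EXP_PAT = r"[+-]?\d+(?:\.\d+)?[eE][+-]?\d+"   # (?: ...) groups without capturing


def is_whole_match(pattern, s):
    """Return True if pattern matches all of s (anchored at both ends)."""
    return re.search(r"\A(?:" + pattern + r")\Z", s) is not None


# Backreference: \1 must repeat exactly what group 1 matched.
same = re.search(r"(\d+)X\1", "12X12")
different = re.search(r"(\d+)X\1", "12X34")

# Greedy (longest) versus non-greedy (shortest) matching of tagged text.
tagged = "<b>one</b><b>two</b>"
greedy = re.findall(r"<b>.*</b>", tagged)
lazy = re.findall(r"<b>(.*?)</b>", tagged)

# Anchors on a two-line string that ends with a newline.
lines = "Ann\nBob\n"
first_of_string = re.findall(r"^\w+", lines)
first_of_lines = re.findall(r"^\w+", lines, re.MULTILINE)
last_before_newline = re.findall(r"\w+$", lines)
last_at_very_end = re.findall(r"\w+\Z", lines)
```

Here greedy holds the whole tagged string as a single match, while lazy holds the two tag contents separately. last_at_very_end is empty because the string ends with a newline rather than a word character.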
Wednesday, November 16 (Lecture #25)
[PyNut, Section 9; Class Notes]
- Basic Python II: Modules (Cont'd)
- Pattern-Matching: The re Module (Cont'd)
- Regular expression matching in Python: The match object
- Describes result of matching a particular regular
expression against a particular string.
- Variables:
- string (string on which match was performed)
- re (re-object used to make match)
- pos (requested start-index of match)
- endpos (requested finish-index of match in string)
- Functions:
- Matched-group characteristics:
- group(gid=0) (returns string matched by group gid (whole string matched
if no gid specified) or None if no match by group gid)
- groups() (returns tuple of all strings matched by groups, with None for
each group that did not match)
- start(gid=0) (returns start-index of string matched by group gid (start
of whole match if no gid specified) or -1 if no match by group gid)
- end(gid=0) (returns finish-index of string matched by group gid (finish
of whole match if no gid specified) or -1 if no match by group gid)
- span(gid=0) (returns (m.start(gid), m.end(gid)))
- Applying matched groups: expand(s) (return copy of s in which all backreferences
to matched groups are replaced)
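As a minimal sketch of the match-object variables and functions listed above, the following code matches a hypothetical exponential-number pattern (an assumption for illustration, with an optional fraction group that does not participate in this match) against a sample string. Python 3 is assumed.

```python
import re

# Hypothetical pattern: mantissa, optional fraction, and exponent.
s = "value = 25e-3 units"
m = re.search(r"(\d+)(\.\d+)?[eE]([+-]?\d+)", s)

whole = m.group()            # text of the whole match
mantissa = m.group(1)        # text of group 1
fraction = m.group(2)        # None: group 2 took no part in the match
all_groups = m.groups()      # tuple with one entry per group
whole_span = m.span()        # (m.start(), m.end()) of the whole match
exponent_span = m.span(3)    # span of group 3
fraction_start = m.start(2)  # -1 for a group that did not match
rewritten = m.expand(r"\1 x 10^\3")   # backreferences replaced by group text

# Variables describing how the match was made.
searched_string = m.string
pattern_used = m.re.pattern
requested_range = (m.pos, m.endpos)
```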
- Applying regular expressions
- Creating regular expression objects: r = re.compile(pattern {,flags})
- A regular-expression object (re) has variable
pattern giving the pattern-string from which
it was created.
- During compilation, can also specify various
flags that modify interpretation of pattern,
e.g.,
- re.IGNORECASE (make match case-insensitive)
- re.DOTALL (allow .-character in pattern to also match \n)
- re.VERBOSE (ignores whitespace / #-comments in pattern)
- re.MULTILINE (makes ^ and $ also match at the start and end of each line, not just of the whole string)
- Services on regular expression objects:
- Apply re to string:
- r.match(s, start=0, end=sys.maxint) (returns match-object for
match of r to s starting at s-index start and finishing
before s-index end, and None if no match of r in s)
- r.search(s, start=0, end=sys.maxint) (returns match-object for
match of r to s starting at or after s-index start and finishing
before s-index end, and None if no match of r in s)
- r.findall(s) (return list of
(non-overlapping) substrings of s
matched by r)
- Manipulate string using re:
- r.split(s) (return a list of the substrings of s that lie between
non-overlapping matches of r in s; compare with s.split())
- r.sub(repl, s) (if repl is string, return copy of s in
which all matches with r are replaced by repl (with
backreferences to groups in r in repl replaced appropriately);
if repl is function-object that takes match-object as only
parameter, return copy of s in which all matches with r
are replaced with string returned by repl(m))
- r.subn(repl, s) (returns 2-tuple (r.sub(repl, s), n) where n is number
of substitutions made, i.e., matches of r in s)
- The services above are available through the
re module itself, if the regular expression is
given as a pattern to the function, e.g., re.sub(pattern, repl, s); however, these
versions lack some of the functionality
available in the re-object versions, e.g.,
cannot specify start / end positions for matches
in string.
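The re-object services described in this lecture can be sketched as follows. The "Last, First" name pattern and the sample strings are assumptions chosen for illustration, and Python 3 is assumed. Note that when a pattern contains groups, findall() returns the group texts (as tuples) rather than the whole matched substrings.

```python
import re

# Hypothetical pattern for names written as "Last, First".
name_re = re.compile(r"(\w+),\s*(\w+)")
names = "Lovelace, Ada; Turing, Alan; Hopper, Grace"
labelled = "names: Turing, Alan"

# match() must succeed at the start index; search() may succeed later on.
at_start = name_re.match(labelled)        # None: "names:" is not a name
anywhere = name_re.search(labelled)       # finds "Turing, Alan"
from_index = name_re.match(labelled, 7)   # match attempted at index 7

pairs = name_re.findall(names)            # group texts of every match
pieces = re.compile(r";\s*").split(names) # text between the separators

# sub() with a replacement string that uses backreferences ...
first_last = name_re.sub(r"\2 \1", names)
# ... and with a function that receives each match-object.
initials = name_re.sub(lambda m: m.group(2)[0] + ". " + m.group(1), names)
rewritten, count = name_re.subn(r"\2 \1", names)

# A flag given at compile time modifies how the pattern is interpreted.
found_ignoring_case = re.compile(r"turing", re.IGNORECASE).search(names)
```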
Friday, November 18 (Lecture #26)
[PyNut, Section 9; Class Notes]
- Basic Python II: Modules (Cont'd)
- Pattern-Matching: The re Module (Cont'd)
- Example: Breaking exponential numbers into parts
(expflt1.py)
- Example: Extracting real-number values of exponential numbers
(expflt2.py)
- Example: Breaking proper names into parts
(nameparse1.py)
[names.txt]
- Example: Rewriting proper names (Version #1: match-expand version)
(nameparse2.py)
- Example: Rewriting proper names (Version #2: re-sub string version)
(nameparse3.py)
- Example: Rewriting proper names (Version #3: re-sub function version)
(nameparse4.py)
- Example: Jazzing up of all proper names in an annotated file
(jazzname.py)
[aname.txt]
- Example: Counting the number of proper names in an annotated file
(countname.py)
Monday, November 21 (Lecture #27)
[PyNut, Section 17; PyProg, Sections 8 and 9; Class Notes]
- Basic Python II: Modules (Cont'd)
- GUI Development: The Tkinter Module
- Root Window setup
- Create root window: root = Tk()
- Customize appearance of window by calling methods
relative to the created root window-object,
e.g., root.title(S)
- Once GUI is set up, trigger execution of GUI using
root.mainloop() (see below).
- Basic widgets:
- Create widget-objects by calling various functions.
First parameter of each of these functions is
always the container in which the widget is
placed; remaining parameters (typically specified
in keyword-fashion) specify appearance and
behavior of widget.
- Information passed in and out of widgets via
special Tkinter variables (IntVar(), StringVar(),
DoubleVar(), BooleanVar()), which are
manipulated using methods get() and set().
- Information-display widgets:
- Two Forms:
- Label(parent, text="Text")
- Label(parent, textvariable=svar)
Former good for small (possibly multi-line,
with newline-embedded text) static text
displays, and latter good for dynamic text
displays.
- Information-entry widgets
- Entry(parent, textvariable=svar)
- Models single-line text-entry field.
- Text associated with / entered into field
is stored in string-variable svar.
- Checkbutton(parent, variable=ivar,
text="Text")
- Models on/off button.
- Integer-variable ivar has value 1 (0) if
button (not) pressed down with mouse click.
- Radiobutton(parent, variable=type-var,
value=type-val, text="Text")
- Models one of a set of radio buttons,
i.e., a set of buttons in which
only one member can be pressed down at
a time.
- Group of radio buttons specified as
set of radio-button widgets operating
off the same variable type-var (which,
as the name suggests, may be of any
valid Tkinter variable type).
- Variable type-var has value associated
with currently depressed radio button in
group.
- Scale(parent, label="Text", variable=dvar,
from_=dvalL, to=dvalU, tickinterval=dvalI,
resolution=dvalR, showvalue=YES, orient=str)
- Models entry of floating-point value by
slider-scale in range dvalL to dvalU
inclusive. Slider tick-interval is
dvalI.
- Double-variable dvar has value associated
with slider-position as rounded to
nearest floating-point number modulo
resolution dvalR.
- Orientation of slider can be 'horizontal'
or 'vertical'.
Wednesday, November 23 (Lecture #28)
[PyNut, Section 17; PyProg, Sections 8 and 9; Class Notes]
- Basic Python II: Modules (Cont'd)
- GUI Development: The Tkinter Module (Cont'd)
- Basic widgets: (Cont'd)
- Command-activation widgets:
- We will only consider one such widget, the
control-button.
- Syntax: Button(parent, text="Text",
command=func-name)
- Elementary event-handling done inside this
widget using the command parameter,
i.e., function func-name is executed
when button is pressed down.
- Due to the framework-structure of Tkinter
(in which control is handed by Python
to Tkinter and events are handed back
from Tkinter to Python for processing),
event-handling is also known as callback
processing and event-handler functions
are known as callback functions.
- Default is that callback functions have
no parameters; if parameters are
necessary, enclose callback function in
a "helper" lambda function (making sure
that parameters are interpreted
correctly at call-time).
- To exit GUI, may want to define button with
callback function set to either root.quit
(resume execution of script after
root.mainloop() (see below)) or sys.exit
(terminate GUI and script).
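The warning about parameters being interpreted correctly at call-time can be illustrated without opening any window. The sketch below builds the kind of parameterless functions one would pass as command=... to Button; the function report() and the labels are hypothetical. A lambda that refers to a loop variable looks that variable up only when it is called, so every callback sees the final value. Binding the current value as a default argument avoids this.

```python
def report(label):
    """Hypothetical callback that needs a parameter."""
    return "pressed " + label


def make_callbacks(labels):
    """Return (late, bound): two lists of parameterless callbacks."""
    late, bound = [], []
    for name in labels:
        # Pitfall: name is looked up when the lambda is called, after the loop.
        late.append(lambda: report(name))
        # Fix: the default argument captures the value of name right now.
        bound.append(lambda name=name: report(name))
    return late, bound


late, bound = make_callbacks(["north", "south", "east"])
late_results = [f() for f in late]
bound_results = [f() for f in bound]
```

In late_results all three callbacks report the last label, whereas bound_results reports one label per callback.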
- Widget layout in container
- Done by calling layout-method relative to each
widget.
- Layout methods:
- pack(expand={YES, NO}, fill={BOTH, X, Y},
side={TOP, BOTTOM, LEFT, RIGHT})
- grid(row=int, column=int)
Can implement exact placement by pixel-position
using place(), but this is very complex to use --
pack() and grid() are usually preferable.
- When using pack(), establish contents of top and
bottom sides before adding contents of left and
right sides; otherwise, horizontal extent of
window may be misjudged by Tkinter and contents
may be mixed up.
- When using pack(), can control (to a degree)
placement of widgets when window is resized using
parameters expand and fill.
- Example: Stacked radiobutton group
- All positions in a specified grid need not be
filled when using grid() -- will fill unused
positions in with white space automatically.
- Example: Compass radiobutton group
- Should not mix layout-types in a single container
-- results may be unexpected.
- Once all widgets set up (including Tkinter variables and
callback functions) and configured to have the root
window as their parent container, call root.mainloop()
to hand control to Tkinter and trigger GUI creation and
execution.
- Example: Basic Tkinter GUI (pack()-layout,
no-resizing) (GUI1.py)
- Example: Basic Tkinter GUI (pack()-layout,
automatic resizing) (GUI2.py)
Friday, November 25 (Lecture #29)
[PyNut, Section 17; PyProg, Sections 8 and 9; Class Notes]
- Basic Python II: Modules (Cont'd)
The former is good for adding pretty front-ends to
existing scripts, while the latter is more suited to
interactive sessions alternating I/O and processing.
Example: Basic front-end GUI
(GUI4a.py)
Example: Basic interactive-session GUI
(GUI4b.py)
Example: Extracting real-number values of
exponential numbers (front-end GUI)
(expfltGUI1.py)
Example: Extracting real-number values of
exponential numbers (interactive-session GUI)
(expfltGUI2.py)
Monday, November 28 (Lecture #30)
[PyNut, Section 16; Class Notes]
- Basic Python II: Modules (Cont'd)
- General Numerical Processing: The math, cmath,
random, and gmpy Modules
- Many commonly-used mathematics functions are given in the
math module; where applicable, versions of these
functions for complex numbers are given in the cmath
module.
- These modules also have variables giving the values
of mathematical constants e and pi (which, oddly
enough, have the names e and pi).
- The random module provides many functions associated with
uniform distributions, e.g.,
- seed(x=None) (sets seed to hashable object x; otherwise,
sets seed to platform-specific source of randomness,
e.g., system time (latter done automatically
when random module is loaded)).
- random() (returns a random float r in the range 0 <= r < 1,
i.e., 0 inclusive and 1 exclusive)
- uniform(l, u) (returns a random float in the range
l to u inclusive)
- choice(S) (returns random element from sequence S)
- sample(S, k) (returns list of k randomly-selected
elements from sequence S)
- shuffle(S) (does in-place random shuffle of elements
in mutable sequence S)
The random module also offers these services
relative to other commonly-used distributions,
e.g., Gaussian, exponential; if you are
manipulating such distributions, do consult the
documentation on this module to see if what you
need has already been provided.
- The gmpy module implements efficient arbitrary-precision
integer and float types, as well as a rational-number
type.
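A minimal sketch of the math, cmath, and random services listed above follows. The seed value and the ranges are arbitrary assumptions, and Python 3 is assumed. Seeding with a fixed value makes the pseudo-random sequence repeatable, which is useful when testing.

```python
import cmath
import math
import random

circle_area = math.pi * 2.0 ** 2   # constant pi from the math module
real_root = math.sqrt(16.0)        # real-valued square root
complex_root = cmath.sqrt(-1)      # complex version of the same function


def draw(seed):
    """Return a tuple of sample draws made after seeding with seed."""
    random.seed(seed)
    r = random.random()             # float r with 0 <= r < 1
    u = random.uniform(1.0, 5.0)    # float between 1.0 and 5.0
    deck = list(range(10))
    pick = random.choice(deck)      # one random element
    hand = random.sample(deck, 3)   # three distinct elements
    random.shuffle(deck)            # in-place shuffle of the list
    g = random.gauss(0.0, 1.0)      # draw from a Gaussian distribution
    return r, u, pick, hand, deck, g


first = draw(2500)
second = draw(2500)                 # same seed, so the same draws
```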
- Efficient Manipulation of (Numerical) Multidimensional
Arrays: The NumPy Module
- As noted earlier in this course, nested
lists in Python allow easy implementation of
multidimensional numerical array processing; however,
large-scale numerical processing done in this fashion is
very slow. The flexibility of Python multidimensional lists
(heterogeneous, non-contiguous memory storage, mutable) is
purchased at the expense of processing efficiency!
- The multidimensional array type ndarray underlying
NumPy (by virtue of being homogeneous, immutable
(sort of; see below), and based on a contiguous chunk
of memory) regains efficiency at the expense of
flexibility.
- An ndarray is an n-dimensional array of fixed size in which
each element is of a fixed array-specific numerical type.
The number of dimensions is the ndarray's rank, and the number
of elements along a particular dimension is that dimension's
length. Each ndarray has the following associated
variable-attributes:
- shape: Tuple giving lengths of array dimensions.
- ndim: Number of array dimensions.
- size: Number of elements in array, i.e.,
product of shape-tuple elements.
- dtype: Object describing numeric type of array elements,
drawn from set {byte, int, float, complex, uint8, uint16, uint32,
uint64, int8, int16, int32, int64, float32, float64,
float96, complex64, complex128, complex192}.
- itemsize: Number of bytes required to store a single
array element.
- There is no ndarray literal. However, there are a variety of
ways of creating ndarrays:
- Create a one-dimensional array which is subsequently
reshaped to have multiple dimensions (see below),
e.g.,
- arange(l, u, i): A one-dimensional ndarray with
elements l up to but not including u relative
to increment i.
- linspace(l, u, n): A one-dimensional ndarray
with n elements evenly spaced between l and u
inclusive.
Note that l, u, and i can be either integer or
floating-point; however, given difficulties with
trying to get exact floating-point quantities
under fixed floating-point precision, linspace()
is safer for generating floating-point sequences.
- Create an ndarray from a nested-list representation
L of a multidimensional array (array(L {, dtype}),
where dtype is one of the numerical element-types
described above).
- Create a special-purpose ndarray with a specified
shape and type via function stype(shape-tuple {, dtype}),
where dtype is one of the numerical element-types
described above and stype is one of ones (all ones),
zeros (all zeroes), or empty (arbitrary-value).
- An ndarray a's shape can be modified using a.transpose() (return view
of a with reversed shape-tuple of a), a.reshape(s) (return view
of a with shape-tuple s, where product of elements of a.shape = product of elements of s),
a.resize(s) (reshape a according to s in-place), and a.ravel()
(return one-dimensional array (a view where possible) of elements in a
in enumeration-order of a (rightmost index changes fastest)).
- Note that views are not copies; are rather references to same
areas of memory with different indexing-rules.
- Access elements and slices of ndarrays using nested list indexing and
slice syntax (a[i][j][k]) or collapsed version of same (a[i,j,k]),
which is more efficient.
- Can also extract list of arbitrary elements using a Boolean
matrix B of the same shape as the operand (a[B]) or a
Boolean expression BExp on ndarray a which is evaluated
element-wise on a (a[BExp], e.g., a[a == 10]) (see below).
- Note that in ndarrays (unlike lists), slices do not
produce copies, but are rather references to
areas of memory. To create true copies, use
a.copy().
- Operators:
- Symbolic: Standard Python arithmetic operators
- Are applied element-wise to create matrices of
same size (and upcasted type) as operands if
argument-matrices of same size.
- If operand matrices are not of same size, these
matrices are augmented to be the same. This
is called broadcasting, and as the rules of
broadcasting are intricate, they will not
be covered here.
- Can do operations in-place, e.g., a += b.
- Note that a * b does not give conventional
matrix multiplication; need special function
(see below).
- Relational: Standard Python relational operators
- Are applied element-wise to generate
matrices of boolean values from operand
matrices (which may themselves be augmented
by broadcasting if necessary).
- Can use these boolean matrices as indices
(see above) or as input to matrix-to-scalar
summary functions (see below).
- Function:
- Most math and cmath functions are available in forms
that operate on ndarrays; f(a) returns a copy of
ndarray a as modified element-wise by function f().
- dot(a, b) returns matrix resulting from conventional
matrix-multiplication of a and b.
- There is also a group of matrix-to-scalar functions that
summarize matrices, e.g., min(), max(),
sum(), prod().
- Are many others ...
- Syntax modifications:
- Enumerate over rows / outermost ndarray index: for r in a: ...
- Enumerate over elements of ndarray: for e in a.flat: ...
- Enumerate over index-element pairs of ndarray: for ind, elm in ndenumerate(a): ...
- As convenient as it is, ndenumerate(a) is much slower than
enumeration via a.flat.
- Display ndarray: print a
- If space not sufficient to display full basic-matrix slice, will
replace central elements of slice with dots to indicate missing
elements.
- NumPy supplies special functions a.dump(f) and load(f) to write ndarrays to /
read arrays from file f in space-efficient pickle format.
- A special NumPy sub-type matrix is supplied for high-speed 2-D ndarray
operations. Note that under matrix, the *-operator corresponds to matrix
multiplication.
- Two lessons can be drawn from the above:
- If you are doing numerical processing, get familiar with the various
numeric-processing libraries in Python.
- Modules like gmpy and NumPy can be seen as temporary, given
that the efficiency concerns that motivated their
creation may be irrelevant or may not matter as much
in future as computers get faster; however, they
are certainly necessary now, to enlarge the
potential user-base for Python to hard-core
numerical processing folk (the Fortran / C / C++
Brigade).
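The ndarray behaviour described above (creation, views versus copies, Boolean indexing, and element-wise versus matrix multiplication) can be sketched as follows. This assumes that NumPy is installed and imported under the conventional name np; the array values are arbitrary.

```python
import numpy as np

a = np.arange(0, 12, 1)          # elements 0..11; the upper bound 12 is excluded
b = a.reshape((3, 4))            # a view of a with shape (3, 4)
b[0, 0] = 99                     # writing through the view also changes a[0]

c = b.copy()                     # a true copy
c[0, 0] = -1                     # leaves a and b untouched

x = np.linspace(0.0, 1.0, 5)     # 5 evenly spaced values, both ends included

evens = b[b % 2 == 0]            # Boolean expression used as an index
elementwise = b * b              # element-wise product, same shape as b
matrix_product = np.dot(b, b.transpose())   # conventional matrix product

row_sums = [int(row.sum()) for row in b]    # iterate over rows of b
total = int(b.sum())             # matrix-to-scalar summary
```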
- Matlab-like Plotting: The Matplotlib Module
- Setting up plot-data
- Arrays of x/y-coordinates are stored internally in
Matplotlib as 1-D NumPy arrays. However,
the functions that use these co-ordinates will
accept sequences (lists or tuples) and do
appropriate conversions.
- You can create NumPy arrays of x- or y-co-ordinates
directly using arange(l, u{, i}); such arrays
are useful when specifying the plotted points
in x-data/function format (see below).
- Describing a plot surface
- Immediate attributes of the plot surface can be
set by various functions, e.g.,
- xlabel(s): Set (horizontal) x-axis label to s.
- ylabel(s): Set (vertically-rotated) y-axis
label to s.
- title(s): Set title of plot (centered above
plot) to s.
- Can also control the portion of the co-ordinates
that are plotted using xlim(l, u) and ylim(l, u).
If these are present, they must occur after the
descriptions of individual plot lines (see below).
- Describing and creating an individual line-plot
- General syntax: {p =} plot(point-spec, line-spec)
- The plotted x/y points can be specified in three
ways:
- y-data, e.g., plot(y): A y-coordinate
list or array is given, and x-coordinates in
the range 0,...,len(y)-1 are generated
automatically and paired with the appropriate
y-values.
- x/y-data, e.g., plot(x, y): x- and
y-coordinate lists or arrays are given and
automatically paired in zip-fashion.
- x-data/function, e.g., plot(x, f):
An x-coordinate NumPy array is given with a
function f(), and y-coordinates are generated
automatically using f() and paired with the
appropriate x-values. The function may be
expressed as a Python function object or
a NumPy array-expression written in terms
of x, e.g., x * x, (2 * x) + 1,
2 ** x.
- Plot-line characteristics are expressed in terms of
a three-region string in which the first,
second, and third regions are codes for the
requested line color, line style, and x/y-point
marker style. The most commonly-used codes are as
follows:
- Line color: b (blue), g (green), r (red),
c (cyan), m (magenta), y (yellow), k (black)
- Line style: - (solid), -- (dashed), -.
(dash-dot), : (dotted), null string (no line
connecting point-markers)
- x/y-point marker style: . (point), o (circle),
^ (triangle), s (square), D (diamond),
p (pentagon), h (hexagon), + (plus-sign),
x (cross)
- A plot-window displaying the described plot is created
using show() (this transfers control to the
plot-window; on termination of this
window, control is passed back to the plot-generating
Python script). The plot can also be saved to a file
using savefig(filename.ext), where ext
specifies the format in which the plot is saved.
- Example: Plotting a y-data line
(mplot1.py)
- Example: Plotting an x/y-data line
(mplot2.py)
- Example: Plotting an x-data/function line
(mplot3.py)
- Example: Generalized plotting of an
x-data/function line
(gmplot1.py)
[gmplot1_1.dat,
gmplot1_2.dat]
Wednesday, November 30 (Lecture #31)
[PyCook, Section 8; PyNut, Section 18; Class Notes]
- Basic Python II: Modules (Cont'd)
- Matlab-like Plotting: The Matplotlib Module (Cont'd)
- Advanced plotting
- Multiple single-line plots on one surface: An n x m
grid is implicitly specified using calls to
subplot().
- Prior to each plot() call, have a
call subplot(n, m, i) which specifies the
grid dimensions (rows x columns) and the
index-position i in which that plot is placed
(i is in the range 1, ..., m * n and
indicates positions starting at the upper
left-hand corner and moving left to right and
down the rows to the lower right-hand
corner).
- Individual x- and y-labels of the sub-plots
may be set by placing the appropriate
xlabel() and ylabel() calls between the
subplot() and plot() calls.
- Multi-line plots: In this case, create a list P of
n plot-objects via the appropriate calls to plot()
along with an n-length list L of strings (possibly
containing embedded LaTeX code) describing the
individual plot-lines, and call function
legend(P, L, loc=loc-str), which generates a
single plot in which all specified lines are
plotted and a legend is placed on the plot in the
position specified by loc-str, e.g.,
"upper right", "lower center", "lower left".
- Example: Sub-plotting several x-data/function
lines on a 2 x 2 plot-surface grid
(mplot4.py)
- Example: Multi-plotting several x-data/function
lines on a single plot surface
(mplot5.py)
- Example: Generalized multi-plotting of several
x-data/function lines on a single plot-surface
(gmplot2.py)
[gmplot2_1.dat]
- Testing, Debugging, and Optimizing Python Scripts: The
doctest, timeit,
profile, and pstats Modules
- Testing, debugging, and optimizing are the activities
underlying the Ordered Holy Trinity of Programming:
Make it run, make it right, make it fast.
- Systematic testing is typically done by making sure
that a program produces correct answers relative to
a specified set of test cases (if correctness is judged
by a test case producing the same answer as that
produced by a previous program thought to be correct,
this is called regression testing).
- Testing can be done relative to individual program units
(typically functions) or the system as a whole; focus
here on the former.
- Simple unit testing: The doctest module
- Good for testing test-cases that are simple outputs
of functions.
- To invoke, have as main program import of doctest
and statement doctest.testmod(); this will locate
all examples in doc-strings with associated
outputs and automatically run examples and compare
against outputs, flagging those that differ.
- Make sure given outputs for examples in
doc-strings are themselves correct! This
can be ensured by cut-and-paste of examples
and outputs from Python interpreter session.
- Example: Testing number-string generation
(numstr1.py)
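A minimal sketch of doctest-style unit testing follows. The function number_string() is a hypothetical stand-in for the number-string generator (the contents of numstr1.py are not reproduced here). In a script one would normally just call doctest.testmod() in the main program, as described above; the helper run_examples() instead runs the examples of a single function explicitly so that the counts of failed and attempted examples can be inspected.

```python
import doctest


def number_string(n):
    """Return the integers 0 .. n-1 as one space-separated string.

    >>> number_string(5)
    '0 1 2 3 4'
    >>> number_string(0)
    ''
    """
    return " ".join(str(i) for i in range(n))


def run_examples(func):
    """Run the doc-string examples of func; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(func.__doc__, {func.__name__: func},
                              func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


failed, attempted = run_examples(number_string)
```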
- Once you have isolated a problem by testing and can
trigger it when necessary with one or more test cases,
debugging is in order.
- At heart, debugging is essentially interrogating various
objects at specific points in a program run to see if
they are what you think they should be. This is most
simply done with print statements; however, there are
several modules that allow more advanced forms of
interrogation, e.g., inspect, pdb.
- Much of Python is optimized already, meaning that code
will often be fast enough. If you must optimize, do the
following in order:
- Make sure there is a speed problem, i.e.,
run speed benchmark tests.
- Find out what parts of the code are taking the
most time, and are hence worth optimizing
(profiling).
- Do large-scale optimization, i.e.,
choose better algorithms.
- Do small-scale optimization, i.e.,
choose better statements / constructs.
Friday, December 2 (Lecture #32)
[PyCook, Section 8; PyNut, Section 18; Class Notes]
- Basic Python II: Modules (Cont'd)
- Testing, Debugging, and Optimizing Python Scripts: The
doctest, timeit,
profile, and pstats Modules (Cont'd)
- Profiling code performance: The profile
and pstats modules
- If programs or program portions are executed a large
number of times, an ordinary wristwatch suffices to
do gross benchmarking.
- Distinguish several types of execution time:
- elapsed time : wallclock time
- system time : time spent by operating system doing
I/O
- user time : time spent processing data
- CPU time : total execution time (system + user)
- Precisely measuring execution time of code-segment: The
os.times() function
- To use this function, import os, call t0 =
os.times() before the code-segment of interest, and
t1 = os.times() after the code-segment.
- Given t0 and t1, for that code-segment, elapsed time =
t1[4] - t0[4], user time = t1[0] - t0[0], system
time = t1[1] - t0[1], and CPU time = system time +
user time.
- Example: Timing number-string generation #1
(numstr2.py)
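The os.times() recipe above can be wrapped in a small helper. This is a sketch only: time_segment() and the workload sum_of_squares() are hypothetical names, and on some platforms (e.g., Windows) the elapsed-time entry is reported as zero.

```python
import os


def sum_of_squares(n):
    """Hypothetical workload to be timed."""
    return sum(i * i for i in range(n))


def time_segment(func, *args):
    """Call func(*args); return (result, dict of timings in seconds)."""
    t0 = os.times()
    result = func(*args)
    t1 = os.times()
    user = t1[0] - t0[0]        # time spent processing data
    system = t1[1] - t0[1]      # time spent by the operating system
    elapsed = t1[4] - t0[4]     # wallclock time
    timings = {"user": user, "system": system,
               "cpu": user + system, "elapsed": elapsed}
    return result, timings


result, timings = time_segment(sum_of_squares, 200000)
```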
- Precisely measuring execution time of statement(s) (by default,
wallclock rather than CPU time): The timeit module
- To use this module after import, set up a Timer
object by specifying one or more statements and the
setup-statements for those statements, and then
call timeit() relative to the Timer object with
the requested number of iterations of the
statements.
- Example: Timing number-string generation #2
(numstr3.py)
- There is also a command-line version of timeit
(see doc-string of timeit module and page 484 of
PyNut for details).
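As a sketch of the Timer-object usage described above, the code below times two ways of building a number-string: repeated concatenation and join(). The statement strings and the helper time_both() are assumptions for illustration (they are not the contents of numstr3.py). No particular outcome is claimed, since the relative timings depend on n and on the interpreter.

```python
import timeit

# Setup-statement, run once before timing; %d is filled in with the list length.
SETUP = "nums = [str(i) for i in range(%d)]"

# Statement 1: build the string by repeated concatenation.
CONCAT = """
s = ''
for x in nums:
    s += x + ' '
"""

# Statement 2: build the string with join().
JOIN = "s = ' '.join(nums)"


def time_both(n, iterations=200):
    """Return (concat_seconds, join_seconds) for lists of length n."""
    setup = SETUP % n
    t_concat = timeit.Timer(CONCAT, setup).timeit(iterations)
    t_join = timeit.Timer(JOIN, setup).timeit(iterations)
    return t_concat, t_join


t_concat, t_join = time_both(250)
```

Calling time_both() for several values of n gives a simple timing study of the kind recommended in these notes.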
- The run()-function in profile runs a particular
command via exec() and stores profiling
information on that command in a specified file;
the information in such files may then be sorted
and/or reduced prior to display using the Stats
object in pstats.
- To profile programs, you may find it useful to
create a special main-program function that is
callable via exec().
- Example: A profiler for program PyText3.py
(adapted from PyText2.py in Assignment #7)
(profile_PyText3.py)
[PyText3.py,
nm1.dat,
nm2.dat,
nm3.dat,
nm4.dat,
tc1.dat,
tc2.dat,
tc3.dat,
tc4.dat,
com1.txt]
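A minimal profiling sketch in the same spirit follows. It does not reproduce profile_PyText3.py; the functions build_words(), count_words(), and main() are hypothetical stand-ins for a program with a special main-program function. The sketch uses profile.runctx(), a variant of run() that takes explicit namespaces, stores the profiling information in a temporary file, and then sorts and displays it with a pstats Stats object.

```python
import io
import os
import profile
import pstats
import tempfile


def build_words(n):
    """Hypothetical data-generation step."""
    return ["w%d" % (i % 50) for i in range(n)]


def count_words(words):
    """Hypothetical processing step: count occurrences of each word."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts


def main(n=2000):
    """Special main-program function to be run under the profiler."""
    return count_words(build_words(n))


def profile_main():
    """Profile main(); return (total number of calls, report text)."""
    fd, path = tempfile.mkstemp(suffix=".prof")
    os.close(fd)
    try:
        # Run the command and store the profiling information in a file.
        profile.runctx("main()", {"main": main}, {}, path)
        out = io.StringIO()
        stats = pstats.Stats(path, stream=out)
        # Shorten file names, sort by cumulative time, show up to 10 entries.
        stats.strip_dirs().sort_stats("cumulative").print_stats(10)
        return stats.total_calls, out.getvalue()
    finally:
        os.remove(path)


total_calls, report = profile_main()
```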
- More often than not, large-scale optimization via better
choice of algorithms suffices to handle problems
identified by profiling. If further optimization is
necessary, the following are some common ways of
obtaining additional speedup:
- Use join() rather than repeated concatenation
to create strings (O(n^2) to O(n)).
- ... Though this only matters when the
string-lists
are of sufficient length! (try for n = 5,
50, 100, and 250 in numstr2.py and numstr3.py
above). This is not surprising, given the
leading constant and trailing terms hidden by
O() notation.
- Use "decoration" instead of special sort()
comparators (5x speedup).
- Avoid from X import * where possible.
- Replace loops over lists with list comprehensions.
- In multi-nested loops, "hoist" code that does not
depend on inner loop indices to outer loops
(this includes use of global variables).
- Inline short functions
- Avoid module prefixes for frequently-called
functions by using from X import Y.
Before starting any time-consuming optimization, it is
always worth doing a timing study to make sure that the
effort is truly worthwhile in terms of potential
increases in code performance (see repeated-string
example above).
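The "decoration" idea in the list above can be sketched as follows, assuming the goal is to sort words by length. The function names and the word list are hypothetical. In Python 3 a comparator function has to be wrapped with functools.cmp_to_key(); the decorated version instead sorts tuples of (sort-value, original position, item), so that no comparator is called at all. Whether this gives a worthwhile speedup should be checked with a timing study.

```python
from functools import cmp_to_key


def sort_with_comparator(words):
    """Sort words by length using a special comparator function."""
    def by_length(a, b):
        return (len(a) > len(b)) - (len(a) < len(b))
    return sorted(words, key=cmp_to_key(by_length))


def sort_decorated(words):
    """Sort words by length using decorate / sort / undecorate."""
    # Decorate: the position i keeps equal-length words in original order.
    decorated = [(len(w), i, w) for i, w in enumerate(words)]
    decorated.sort()
    # Undecorate: keep only the original items.
    return [w for _, _, w in decorated]


words = ["pear", "fig", "banana", "kiwi", "plum", "date"]
by_comparator = sort_with_comparator(words)
by_decoration = sort_decorated(words)
```

The built-in form sorted(words, key=len) performs the decoration internally and gives the same ordering.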
- Course Evaluation Questionnaire (CEQ)
References
- Langtangen, H.P. (2008) Python Scripting for Computational Science
(Third Edition). Texts in Computational Science and Engineering no. 3.
Springer; Berlin.
- Loui, R.P. (2008) "In Praise of Scripting: Real Programming Pragmatism."
IEEE Computer, 41(7), 22-26. [PDF]
- Lutz, M. (2009) Learning Python (Fourth edition). O'Reilly.
(Abbreviated above as LutzL)
- Lutz, M. (2006) Programming Python (Third edition). O'Reilly.
(Abbreviated above as LutzP)
- Martelli, A. (2006) Python in a Nutshell (Second edition). O'Reilly.
(Abbreviated above as PyNut)
- Martelli, A., Ravenscroft, A.M., and Ascher, D. (2005) Python
Cookbook (Second edition). O'Reilly.
(Abbreviated above as PyCook)
Created: June 28, 2011
Last Modified: November 18, 2011