Computer Science 3718, Fall '00
Course Diary
Week 1,
Week 2,
Week 3,
Week 4,
Week 5,
Week 6,
Week 7,
Week 8,
Week 9,
Week 10,
Week 11,
Week 12,
Week 13,
Week 14,
(end of page)
Wednesday, September 6 (Lecture #1)
- Modules: Why should we care?
[Brooks 1995]
- Origins of the software crisis: 1950 - 1970
- Problem not limitations of hardware
but difficulty of keeping conceptual
structure of complex systems straight in
the minds of programmers.
- Modules as partial solution to software crisis.
- Modular systems easier to develop, implement,
debug, and maintain.
Friday, September 8 (Lecture #2)
- Modules and Module Decompositions
[Parnas 1972b; Pfleeger 1998,
Chapter 5]
- Modules as logical pieces of system / work-team assignments.
- There are many possible module decompositions; which is
best in a given situation (and hence, what kinds of
modules are best in that situation)?
- Example: KWIC (Key Word In Context) Index
[Parnas 1972b; see also Pfleeger 1998, Chapter 5, pp. 195-198,
206-207, and 238-242].
- Decomposition by task
- Decomposition by (abstract) data type => Information
hiding!
- Utility of information hiding
- Harlan Mills: All details should be public -- Code
open to critical review and can take advantage of
details in other parts of program to be more efficient;
however, can cause conceptual overload in programmers.
- David Parnas: Details should be encapsulated in
modules and made available only if absolutely
necessary (via module interfaces) -- code may be
less efficient but is much easier to deal with
conceptually.
Monday, September 11 (Lecture #3)
- Introduction to Java
- Instructor was sick but tried to lecture anyway; bad idea.
Material covered much more clearly in next lecture.
Wednesday, September 13 (Lecture #4)
- Introduction to Java (Take 2)
- Structure of Java programs
- Class as data / operations bundle; corresponds most
naturally to data-oriented module in the sense of
Parnas (1972b), but can also simulate task-oriented
module by use of static binding modifier on
methods, e.g., standard Java class Math.
- Data: Primitive data types and objects.
- Methods (= operations on data).
- Example: Stack module and main program.
- Some useful Java classes (String, Math).
Friday, September 15 (Lecture #5)
- Introduction to Java (Cont'd)
- Some subtle aspects of the static binding modifier
and constructor definition.
- Screen and File I/O: Setup and methods.
- Exceptions and exception handling.
- The StringTokenizer class.
Monday, September 18 (Lecture #6)
- Introduction to Java (Cont'd)
- Example program: Point class, input from file,
method overloading, this qualifier, use of
parseType methods to extract values
from strings.
Wednesday, September 20 (Lecture #7)
- Introduction to Java (Cont'd)
- Packages: Bundling related classes together.
- The import statement.
- The protected class-element qualifier.
- Inheritance: Building Classes from other classes
- The extends class-modifier.
- Method overriding and variable shadowing.
- Polymorphic reference.
- Example classes: Point and Pixel,
Person and Lecturer.
- Placed copy of Lewis and Loftus (1998) on library reserve
today to help those getting used to Java.
Friday, September 22 (Lecture #8)
- Introduction to Java (Cont'd)
- Abstract classes.
- Interfaces.
- Example: Introduction to running example for this term:
Automated creation of chemotherapy drug prescriptions for cancer
patients (CPS system).
Monday, September 25 (Lecture #9)
- Modules and Module Decompositions
[H&S, Chapter 6; Pfleeger, Chapter 5]
- Create system design from requirements document (what users
want) and specification document (what system designers
say is feasible given the requirements document); system
design must then be decomposed into modules, which are
pieces of the program that are reasonably independent of one
another and are of manageable size and complexity.
- Many possible decompositions - how do we pick a good one?
- Advantages of a good decomposition
[H&S, pp. 95-96]:
- Shorter development time (as modules are independent,
programming teams can work in parallel).
- Improved verification (Simpler modules makes for easier
manual / automated verification against specification).
- Reduced maintenance cost (provided modules encapsulate
anticipated changes (see below)).
- Evaluate modules / module decompositions on two types of
characteristics:
- Logical, e.g., what is the module hiding?
- Structural, e.g., how well is it hidden?
(inter/intradependence of modules in decomposition;
also known as coupling and cohesion (see next lecture)).
- Every kind of module hides something; call that something the
module's secret. There are several types of secrets,
and hence modules:
- (External) behavior, e.g., I/O formats / devices.
[I/O streams in Java]
- Hardware, e.g. operating system / machine details.
[Java Virtual Machine]
- "Software"
- Algorithms / tasks, e.g. sequence of operations.
[Procedures / methods in Java]
- Data structures / implementations, e.g.
abstract data types [Queue / Checker
modules in Assignment #1].
- This is a module in the sense of Parnas (1972b)
and an object wrt Java.
- General rule of thumb: Build modules around potentially
changeable elements of a system to protect other portions of
the system from such changes (anticipated changes)
(H&S, p. 96).
- Example: Sample decompositions of CPS system by
task and by object.
Wednesday, September 27 (Lecture #10)
- Modules and Module Decompositions
[Garlan, Kaiser, and Notkin 1992; Pfleeger, Chapter 5]
- Each type of module secret implies not only different types of
modules, but also different styles of programming and
appropriate programming languages: (Pfleeger, Chapter 5,
pp. 195-201, 238-241).
- Task hiding; hierarchical (top-down) decomposition
(Structured programming)
- Data hiding; Data abstraction / decomposition
(information hiding in
sense of (Parnas 1972b); object-oriented programming)
- All module decompositions hide complexity of programs and
make them easier to deal with conceptually; however, they
differ in how easy allow subsequent changes to programs.
- Task hiding assumes system function and data formats
known and stable; resulting code efficient but very
hard to change.
- Data hiding assumes system function known and stable
but data formats subject to change; resulting code
less efficient but underlying data formats can be
changed easily.
- Other types of decomposition (which in turn require different
styles of programming and different programming languages /
environments) are possible, e.g., Tool abstraction
(event-driven programming?)
[Garlan, Kaiser, and Notkin 1992]; assumes both system
function and data formats are subject to change; resulting
code less efficient but system function and underlying data
formats can be changed easily (or so GKN claim).
- Switch focus to structural characteristics, as measured
(qualitatively) by coupling and cohesion
(concepts proposed by Constantine and Yourdon in 1978
(Pfleeger, Chapter 5, pp. 217-222))
- Coupling: Measure of dependence of module on other
modules.
- Measured with respect to a set of modules created by a
system design decomposition.
- Modules should have the least coupling possible, i.e.,
be as independent as possible.
- Types of coupling (from most to least coupled):
- Content coupling: Module A can arbitrarily
modify data or execute pieces of code associated
with module B.
- Common coupling: Modules A and B operate on
shared data, e.g., a common file that is
read and written.
- Control coupling: Code in module B is
only called by, and hence is under the control
of, module A.
- Stamp coupling: Modules A and B exchange
data only via subroutine parameters, where these
parameters can be data structures or objects,
i.e., modules A and B share information
about each other's internal data storage schemes.
- Data coupling: Modules A and B exchange
data only via subroutine parameters, where these
parameters are restricted to be elementary data
types like int or char
i.e., modules A and B are ignorant of
each other's internal data storage schemes.
- Uncoupled: No module exchanges data with
or has control over / calls subroutines associated
with any other module.
- Uncoupled modules cannot form a system; therefore, some
coupling is necessary -- the issue is, how much?
- Cohesion: Measure of the strength of the (logical)
relationship of components within a module.
- Measured with respect to individual modules.
- Modules should be as cohesive as possible.
- Types of cohesion (from least to most cohesive):
- Coincidental cohesion: There is no
relationship between the module components,
e.g., the elements are in one module
because they don't fit anywhere else.
- Logical cohesion: All elements in the
module are associated with a particular
type of data, e.g., I/O in general.
- Procedural cohesion: All elements in the
module are associated with a particular
type of operation, e.g., any library
of math functions in any programming language.
- Communicational cohesion: All elements
in the module are associated with a core
of data elements that belong together (either
by virtue of storage in some media or for
logical reasons, e.g., the data elements
of a Java object).
- Sequential cohesion: All elements in the
module are associated with a particular sequence of
operations in a computation, e.g., the
computational steps required to make a deposit
in a bank.
- Functional cohesion: The elements in the
module are necessary and sufficient; none can
can be added or removed, e.g., a good
abstract data type implementation..
- Note that functional cohesion is achievable, cf.
an uncoupled module decomposition.
- Oddly enough, correctly-designed objects
implemented in OO-languages like Java are by
nature minimally coupled and tend to be
maximally cohesive. Coincidence? You be the judge.
Friday, September 29
- Instructor sick; no lecture.
Monday, October 2 (Lecture #11)
- Specifying Module Interfaces
[H&S, Chapter 7; Parnas 1972a]
- An interface between two programs is essentially the
assumptions that two programs make about each other, where
an assumption could be a service (function) offered by a
module or some limitation implicit in the services
offered by a module.
- A module interface specification (MIS) is a statement
of both the assumptions a user (calling module) can make
about a particular module and (to a lesser degree) the
assumptions that module makes about a user.
- Characteristics of a good module interface specification
(Parnas 1972a, p. 330):
- Reveals only the necessary amount of information about
module internals to a calling program and no
more (hides module implementation details from
calling programs / users).
- Reveals only the necessary amount of information about
a potential calling program to the module and no
more (prevents module implementor from making
unjustified assumptions about calling programs).
- Module specification is in principle formal
enough to be automatically verifiable.
- Module specification uses informal notation that is
comprehensible to both users and implementors.
- The tension between the third and fourth characteristics
continues to this day, and is a burning issue in the formal
verification community (indeed, Parnas has flipped between
the poles implicit in these characteristics at least once
since 1972); this will be examined in more depth in the final
lectures of this course.
- Abstract MIS proposed by Parnas (Parnas 1972a) as an
alternative to using the implementation of a module
as that module's interface specification, i.e.,
concrete / implementation MIS.
- Abstract MIS has two components: syntax (static aspects;
statement of services and limitations) and semantics
(dynamic aspects; behavior of module services).
- Why use abstract MIS (as opposed to implementation MIS)?
(H&S, pp. 107-108)
- Implementation MIS slow system development (cannot
implement modules that call each other in parallel;
cannot test system until fully implemented).
- May be difficult to extract (simple) interface from
(complex) implementation.
- Implementation may not be available, i.e., proprietary
code.
- Implementation typically does not encode or state
assumptions about or limitations of module usage;
this is a common source of subsequent software
failures.
- When MIS go wrong: The September 1999 burnup of the Mars
Climate Orbiter due to an imperial / metric units mixup
between JPL and Martin Marrietta Corporation software.
- Specify syntax of abstract MIS as
a table describing the name, input and output types,
and exceptions associated with each function in the
module (H&S, pp. 108-109).
- Specify semantics as (H&S, pp. 109-111):
- State variables (abstract, , integer
sequence for integer stack; focuses on behavior
and hides as much implementation detail as possible)
and constants.
- State invariants, e.g., conditions that
are true both before and after execution of
module functions; useful in both delimiting possible
internal states of module sand in subsequent module
verification.
- Assumptions about module use, e.g., in case of
integer stack, init-function called before any others
are called.
- Descriptions of module functions in terms of induced
module state-transitions and generated exceptions.
- For now, describe module functions in terms of pseudocode
or natural language; formal behavioral specification
notations will be examined later in this course.
- Example: Stack MIS (H&S, pp. 109-110).
- Criteria for judging quality of abstract MIS (H&S,
pp. 113-115):
- Each function in MIS is essential, i.e., no
function can be removed without destroying module
functionality and no extra functions are included.
- Each function in MIS is minimal, e.g., no
function can be decomposed into or simulated by
the action of two or more other functions in the
module (remove and return top element version of
pop vs. remove (pop) and return
(top) MIS for a stack object).
- MIS as a whole is opaque, i.e., hides
appropriate details from both user and module
implementer.
Wednesday, October 4 (Lecture #12)
- Specifying Module Interfaces (Cont'd)
[H&S, Chapters 2 and 7; Parnas 1972a]
- Example: MIS for three basic object types
(sets, sequences, tuples) (H&S, pp. 112-114).
- Quality-criteria for MIS mentioned at end of last lecture
often conflict, and have hidden costs in terms of additional
coding and verification; MIS design is, ultimately, a skill
that comes with experience at balancing programming aesthetics
and development costs.
- In (Parnas 1972a), Parnas suggested that method-names in MIS
should not be mnemonic and indicative of function, as
service-users and implementors may assume different
function than that specified in semantics; interesting
idea, but perhaps not practical.
- Dealing with abnormal processing (H&S, pp. 26-27).
- Specify normal vs. abnormal processing possibilities.
- Characteristics of abnormal processing:
- Illegal requests, e.g., trying to pop an empty
stack.
- Resource limitations, e.g., trying to push
onto a full stack, running out of system memory
when creating a new object-instance.
- Once this normal vs. abnormal distinction made, further
subdivide abnormal cases into those that can and
should be detected and signaled by the module
(exceptions) and those that are not
(assumptions (about function-use)).
- Note that such exceptions and assumptions are
precisely those recorded in the syntax and
semantics of an H&S-style MIS as described in the
previous lecture!
- These three distinctions (normal, exception, assumption)
form the specification trichotomy.
- Characteristics of an assumption:
- Absolutely undetectable violation, e.g.,
checking if pointer passed into a function denotes
a valid address in memory.
- Violations undetectable in practice for cost /
performance reasons, e.g., bounds checking
for array indexes.
- The decision of whether to treat an abnormal situation as
an exception or an assumption has no hard-and-fast rules;
it is a judgment based on experience, programming
aesthetics, and costs in terms additional coding and
verification.
Friday, October 6
- Class cancelled.
- Proposal put forward to move midterm exam to October 20, as
October 18 is same day as 3740 midterm. If anyone disagrees,
please let me know by next Wednesday's class.
Monday, October 9
- No lecture; Thanksgiving holiday.
Wednesday, October 11 (Lecture #13)
- Specifying Module Interfaces (Cont'd)
[A&G, Chapter 7; H&S, Chapters 2 and 7; Parnas 1972a]
- Let's continue with some further thoughts on handling
abnormal processing in programs.
- Parnas scheme for MIS (Parnas 1972): For each method in an
abstract MIS, specify transition in module state induced
by normal processing and separately specify exceptions
associated with that method.
- Makes for clearer specification; error-checking is
messy and distracts from the underlying logic of
normal processing (the clarity vs. correctness
dilemma (A&G, p. 151)).
- If any exception is issued, method is not allowed to
make any (irreversible) change to the module state,
i.e., method-calls that generate exceptions
are invisible.
- Calling program is responsible for handling any
exception by either (1) dealing with the exception
or (2) in turn passing its own exception to its
calling program.
- If exception is passed on, Parnas suggests should
rephrase it in terms only of the calling program;
extension of information hiding philosophy to
exceptions.
- Such exception rephrasing arguably causes a loss of
information about the underlying cause of the
exception.
- Perhaps such rephrasing should only be done after a
software system is debugged and ready for use?
- Given that an exception needs to be signaled, how do we do
this? Three ways (H&S, pp. 110-111):
- Values returned by methods (as method-value or as
method parameter).
- Changing flow of control (on exception, control transfers
to another part of the code by transfer to label or
subroutine that is passed into method that issues
exception).
- Explicit language construct. e.g., the Java
exception mechanism.
- The first and third of these methods are most common in Java;
second can be done, but with great difficulty.
- First and third options are not mutually exclusive in a
program; may want to use to flag "lesser" and "greater"
exceptions, respectively -- using the Java exception
mechanisms for all exceptions may not be appropriate,
e.g., using exceptions to terminate a file line read
loop (A&G, pp. 159-160).
- This effectively splits part of the specification
trichotomy down into two more alternatives.
- Lesser exceptions are unexpected in the sense that their
precise time of occurrence is unknown but they will eventually
occur, e.g., reaching the end of input from a file;
greater exceptions are truly unexpected, e.g.,
attempting to read after reaching the end of file input
stream.
- Signal lesser exceptions by returned values of a method,
e.g., an add method for a database returning
false if the given entry is already in a database and
true otherwise / the Java readLine method's
return of null on reaching the end of input; signal
greater exceptions using Java's exception mechanism.
- May alternatively signal lesser exceptions indirectly by
additional subroutines that signal an imminent exception,
e.g., a special routine EOF that returns
true if end of file has been reached and false
otherwise.
- Example: Deciding which kinds of signaling to use
for various exceptions (A&G, Exercise 7.2, p. 160).
- "Deciding which situations are expected and which are not is a
fuzzy area. The point is not to abuse exceptions as a way
to report expected situations." (A&G, p. 160)
Friday, October 13 (Lecture #14)
- Specifying Module Interfaces (Cont'd)
- It's official; the midterm has been moved to October 20.
- Example: Design of the FlatFile module
in assignment #2 (
Version 1.0).
- Listed module services and return types; justified
boolean return types on methods addEntry
and replaceEntry as signaling the lesser
exceptions (in the sense of the last lecture) of
unexpectedly finding or not finding entries, respectively.
- Noted "hole" introduced by methods getText() and
saveText; is done for efficiency.
- Can we force file to be saved once it is loaded? With
difficulty -- for this assignment, will make this the
responsibility of calling programs.
- Students pointed out that to truly hide nature of field-storage,
should hand in entries as 1-D arrays of fields and text as
2-D arrays of fields / 1-D arrays of entries. This makes
field-separator invisible and makes interface to other
field-storage media other than files (such as standard
database packages) seamless. Will probably make this
change over the weekend to assignment.
- Considered what exceptions are signaled and which methods have
what exception-signaling obligations.
- Two choices for signaling overall module consistency
exceptions, e.g., files missing, some lines of
file have wrong number of fields:
- All methods check on invocation.
- Initialization method does all checks and all other
routines are careful to maintain module consistency.
- Two choices for signaling method parameter
exceptions, e.g., requested field number > num
field per entry in file, given field value has embedded
field separator.
- All methods check all parameters on invocation.
- Only those methods that change the module state are
required to check for sensible parameter values.
- Have adopted latter stance in both cases; tradeoff
between completeness of error checking and added cost
of error checking (both in program development (to add
extra error-checking code) and program execution (time
overhead for executing extra error-checking code)).
- Consequences of these design decisions is that
constructor does most module consistency checking up
front, and methods addEntry
and replaceEntry are required to do parameter
checking -- have also allowed the tagged-methods to do
do parameter checking as well, but only on the given field
number.
- Error on my part; method saveText also changes the
module state! Should also generate BadFlatFile
exception.
Monday, October 16 (Lecture #15)
- Specifying Module Interfaces (Cont'd)
- Example: Design of the FlatFile module
in assignment #2 (Cont'd) (
Version 1.0).
- Discussed decision to change entry-parameters in FlatFile
MIS to 1-D arrays of field-strings; will change module-name to
GenericTable in next version of assignment.
- Specify file name as table name in constructor.
- No longer makes sense to specify field separator in
constructor => Choose field separator from outer reaches
of the Unicode character set.
- How do we now signal that a given field contains the file
field separator? Two proposals:
- Encode each occurrence of field-separator in fields when
stored; decode appropriately when retrieved.
- Issue generic bad field character exception.
- Former suggestion better but more expensive; will stick with
latter suggestion for this assignment (and project).
- Discussed suggestion to make hasEntry return integer
denoting line number of entry in file. Would be convenient
for subsequent calls to replaceEntry; however, as public
method, would expose too many details of module storage. Could
be useful private method, though.
- Example: Design of the PasswordDb module
in assignment #2 (
Version 1.0).
- Discussed original decision to just return FlatFile
exceptions in constructor; have decided against this -- will
"sugar coat" exceptions by rephrasing them in PasswordDb
terminology (as suggested by Parnas (Parnas 1972a)).
- Discussed lack of specific exceptions for bad given fields in
addEntry and changePassword. Have decided against
this -- bad decision. Will throw "sugar coated" versions of
corresponding FlatFile exceptions.
- Lesson of the Day: Once one decides to throw exceptions in a
rational manner, don't skimp on defining new exceptions -- it
is a lot of work, though.
- Was pointed out that security hole is introduced by
PasswordExists /
hasEntry / addEntry under old policy of
requiring both names and passwords to be unique in the
password database (once password discovered, could query
relative to all names to get useful password-name combination).
- Have decided that names must be unique but passwords need
not be; will adjust addEntry and remove
passwordExists.
- Must keep hasEntry to retain module functionality --
is a security hole, but service is required.
- Degree of paranoia about security hole related to users of
module -- if core system function, may not matter. Lesson
here is that modules cannot be designed in splendid
isolation.
- Discussed decision to leave imposition of constraints (length /
character composition) on passwords and names to calling
program. Have decided against this; will implement as part
of passwordDb entry-modification methods and will
introduce appropriate exceptions.
- Discussed leak of PasswordDb implementation details by
allowing methods load, save, and
isLoaded.
- Ideally, the choice of whether to use loaded or file
storage should be done at the level of the FlatFile
module in response to usage characteristics. However,
this is somewhat complex, and have decided to leave
scheme in this assignment as it is.
Wednesday, October 18 (Lecture #16)
- Discussed midterm exam.
- Exam will consist of three questions: A program trace, concepts
and definitions, and MIS design and (partial) implementation.
- Exam out of 50 marks (26 marks Java programming / 24 marks
concepts and design).
- Java knowledge up to and including arrays (the first 2/3 of
2710) should be sufficient for the programming portion;
all other material can be derived from the on-line class
notes, the assignments, and the examples covered in class
(and no, I will not be referring to particular aspects of
any assignment, rather the manner in which they were
specified).
- Discussed specification and system decomposition of course project
(an on-line system for taking multiple choice exams); details will be
officially posted shortly.
- In light of various delays in issuing / correcting assignments,
proposed new deadlines scheme:
- Assignment 2: Due Friday, Oct 27
- Assignment 3: Given Friday, Oct 27 / Due Friday, Nov 10
- Assignment 4: Given Friday, Nov 10 / Due Friday, Nov 24
- Term Project 4: Given soon / Due Friday, Dec 1
Assignments #2 and #3 are key components of term project; will
provide source code for those who can't get it working (with
appropriate deduction of marks from value of term project). Will
also provide source code for non-exam core of project around
Nov 20 for those who can't get that working and wish to try to
finish the rest (again, subject to substantial deduction in value
for term project).
- As with the rescheduling of the midterm, I'll give people a week to
argue with me over this scheme; then it will become law.
- Best of luck to you all with the exam tomorrow, and please
remember -- don't panic! This is common sense stuff.
Friday, October 20
Monday, October 23 (Lecture #17)
- Refining Module Specifications into Code
[H&S, Chapters 8 and 9]
Wednesday, October 25 (Lecture #18)
- Coding Guidelines [K&P, Chapter 1]
- From the viewpoint of formal software development, writing
code is the easy part -- all major decisions will have been
made previously during the specification and design phases.
However, in practice, there is still latitude for creativity
in creating well-written readable (as well as
badly-written unreadable) code.
- Why bother with programming style? Anecdotally, well-written
code is easier to understand, shorter, and most importantly,
seems to contain fewer errors.
- Guidelines (by subject):
- Names, e.g., identifiers:
- Use longer descriptive names for global variables,
shorter names for local variables.
- Use names to show relationships and differences between
variables.
- Use active function names, e.g.,
initialize instead of initialization.
- Use names that accurately reflect what a variable or
function does, cf. Parnas (1972a).
- Comments:
- Don't belabor the obvious.
- Clarify, don't confuse.
- Don't contradict the code, i.e., always update
comments to reflect changes in the code.
- Expressions:
- Indent to show structure.
- Parenthesize fully to resolve ambiguity.
- Break up complex expressions.
- Be clear, not concise.
- Be careful with side effects, cf. i = i++;
- Use constants to relationships between "magic numbers"
in the code.
- Use nested if-else clauses instead of
statements to encode alternatives.
- General:
- Use indenting consistent with the code you are
working on.
- Use forms of programming constructs, e.g., loops,
memory allocation, that are standard practice
(idioms) in the implementation language you are
using, e.g., standard forms of loops in C.
- Security:
- Minimize critical code segments, e.g., the
code between when a file is opened and closed.
- Check (non)existence and permissions on all files
manipulated by code.
- Check all function return values.
- Check all function arguments for invalid values.
- Use full file pathnames.
- Where possible, invoke array-bound checking.
- disable core dumping.
Friday, October 27 (Lecture #19)
- Coding Guidelines (Cont'd)
- The final security guidelines discussed in the last class don't
hold in Java, which does not (to my knowledge) dump core when
programs terminate abnormally and which always does array
and string index checking
- Java has at least one interesting hole related to inheritance
and method overriding in derived classes -- namely, in a
security-related class, it may be possible to extend that class
and obtain all previous functionality while overriding security
checks, e.g., in Assignment #2, making a hasEntry
method
always return true regardless of submitted name and
password in a class derived from PasswordDb.
- Can get around this by appending the modifier final to the
class or method of interest -- in much the same way that
final renders a data-element a constant by forbidding
changes of value, final prevents classes from being
extended or methods from being overridden in derived classes.
- This still leaves potential holes from inheritance (what if a
final-ized class or method still relies on
non-final-ized classes or methods in another class?);
however, it is yet another tool in restricting access along
the lines of the various Java visibility modifiers.
- Verifying Program Correctness: Testing and Inspection
[H&S, Chapters 2 and 10; K&P, Chapter 6; Pfleeger 1998, Chapter 7]
- Many coding guidelines in last lecture fall under name of
defensive programming (K&P, Section 6.1); one tries
to anticipate and deal with various invalid situations that
the code may encounter (including those generated by
buggy code). By testing,we mean looking for problems in
code in general.
- What is testing?
- "... a determined, systematic attempt to break a program
that you think is working." (K&P, p. 139)
- The purpose of testing is to look for bugs; this is not a process
for the meek or gentle of heart.
- Why is this often not the case? Pfleeger (p. 286) speculates
that programmers are taught defensive attitudes towards their
own code during training. Assignments require code that only
need work in certain situations (often those that show the code
at its best); the real world requires code that works under
all situations. Hence, implementors are often not the best
people to test code; this is often to other programmers or
users to try out nasty test cases (in the case of users, often
inadvertently), e.g., video-store employees in the
Goulds.
- Implementer is ultimately best person to test code;
however, requires that we cultivate attitude of
egoless programming among programmers.
- Testing should be systematic in three senses (H&S, p. 185):
- Testing should be planned from the beginning; where
possible, code should be designed to be testable in the
sense of its results being observable..
- Testing should be documented, e.g., the
reasoning behind choices for test-cases.
- Test-sets should be maintained to keep up with
changing code; tests sets are not used just once --
rather, they should be run each time the code is
changed.
- Three approaches to testing:
- Testing in the traditional sense, i.e., running code
on particular inputs to see if it generates the
expected outputs.
- Code inspection and/or walkthroughs.
- Formal proofs of code correctness.
For the moment, focus on testing in the traditional sense.
- Two levels of traditional testing:
- Function testing, e.g., individual methods.
- System (integration) testing, e.g., module / system.
For the moment, focus on function testing.
- Testing as a six-step process (H&S, p. 186):
- Build a test harness.
- Select test cases.
- Generate expected outputs for test cases.
- Execute code on test cases within test harness.
- Compare actual with expected outputs.
- Evaluate results of comparison.
This is, of course, an iterative process; results of one
round of testing may entail the development of a more
specific suite of test-cases to track a bug down.
- The fourth and fifth of these steps are automatable and should be
(more about this in another lecture); focus for now on step (2),
the selection of test cases.
- Four approaches to selecting test cases (H&S, Section 10.4;
Pfleeger, pp. 287-288):
- Exhaustive, i.e., test everything.
- Functional (Black Box), i.e., test subset of
possible combinations of parameter-values to code
without knowledge of code structure.
- Structural (White Box), i.e., assume
knowledge of code structure and arrange test cases so
that each line of code and each alternative implicit
in each line of code is executed at least once in
executing the test cases.
- Random, i.e., generate test cases at random.
Exhaustive is impossible in all but the simplest cases; second
and third approaches attempt to shatter space of possible inputs
into classes of equivalent inputs wrt code behavior, and
hence achieve reasonable-size test cases by selecting single
members of these equivalence classes for testing; philosophy of
fourth approach is to defeat any attempt to hide where bugs
might be by potentially executing any input, e.g.
video-store employees in the Goulds.
Monday, October 30
- Class cancelled.
- Midterms will be returned in Wednesday's class.
- Please look at Assignment #3 and the term project descriptions for
discussion after midterm handback in Wednesday's class.
Tuesday, October 31
- Due to unilateral declaration of Midterm Break, assignment #2 now
due next Tuesday (Nov 7).
Wednesday, November 1
- Midterm break: No classes.
Friday, November 3
- Midterm break: No classes.
Monday, November 6
Thursday, November 9
- According to the latest University release, all classes are cancelled
now for this Friday and next Monday, and all assignments and exams are
postponed. I will thus once again postpone deadline for submission of
Assignment #2, this time indefinitely. However, as this assignment is
part of your term project, I would encourage you to do it and
Assignment #3 and to start thinking about your term project.
- Please note that I will be away from this Saturday (Nov 11) to the
following Tuesday night (Nov 14). I can only hope that by then all
this strike business is settled, and we can get back to what we
do normally around here.
Monday, November 13
Wednesday, November 15 (Lecture #20)
- Went over answers to midterm. Class average was approximately 65%.
If you want to discuss my marking (and I know there is room
for discussion), please come see me; I especially encourage people who
got borderline pass grades to come to see me.
- As we have missed time and the term is marching on, I proposed a
new assignment / project schedule: Assignment #2 will now be due
Monday, Nov 20; Assignment #3 will be due Monday, November 27; and
the project will be due Tuesday, Dec 5. I will cancel Assignment #4;
however, I will post a mock assignment on material that would have
been covered in Assignment #34 with answers, for study purposes for
the final. This means that the 20% for assignments will now be
distributed over Assignments # 1-3. I have several other suggestions
from students I'm also considering re. new schedule; if you have any
yourself, please let me know. If there are no objections, the new
scheme will go into effect on Friday.
Friday, November 17 (Lecture #21)
- Verifying Program Correctness: Testing and Inspection (Cont'd)
[H&S, Chapters 10; K&P, Chapter 6; Pfleeger 1998, Chapter 7]
- Look more closely at functional (H&S, pp. 191-192) and structural
(H&S, pp. 192-194) test-case selection.
- Functional / Black Box Testing: Treat the functions
of a module as black boxes / input-output
relations. For each parameter of each function, derive
a set of important parameter-values. The test set
for that function is then the cross product of the
derived sets for all parameters of that function.
- If parameter is a numerical interval, select
values above and below the interval, the boundary
values of the interval, and a value inside the
interval, e.g., for the interval
[-5,...,24], two possible sets of
test-values are {-6,-5,0,24,25}
and {-6,-5,5,24,25}.
- If parameter is an enumerated type, e.g., types
of fruit or vegetables, partition values into
sets of "equivalent" values and select one
representative value from each set. Alternatively,
if there are several values and none is equivalent to
the others, e.g., the values true
and false for the boolean data type,
simply select the set of all values.
You can also apply these rules to select important states
of module, and then create sequences of function-calls
to exercise transitions between these states, e.g.,
in case of stack, important states are empty, partially
full, and full stacks.
- (H&S, p. 192) For example, given a function
f(p1,p2) where p1 is an integer
restricted to the interval [1,...,100] and
p2 belongs to the enumerated type
{red,blue,green}, reasonable test-sets for
p1 and p2 are {0,1,50,100,101}
and {red,blue,green}, respectively, and a
reasonable test-set for f() would be
{0,1,50,100,101} X {red,blue,green} =
{{0,red},{0,blue},{0.green},{1,red},{1,blue>,{1,green},
...,{101,red},{101,blue},{101,green}}.
- Structural / Clear Box Testing: Assume all code for the
module is visible, i.e., it is a "clear box". Select
test cases that cause certain aspects of the code to be
executed, i.e., select a set of test cases such that
each statement / each conditional branch-point path /
each execution path is executed in the process of
running at least one test case in the set.
- (H&S, pp. 193) For example, consider the following
Java method that
checks a given integer and increments global
counters for positive and even integers accordingly:
public static void tst(int x){
if (x > 0)
if (x % 2 == 0)
}
This method has four possible execution-paths
depending on
what combination of the if-statements is
executed. Two possible test-sets that cause each of
the possible execution-paths to be executed are
{-2,-1,1,2} and {-3,0,2,5}.
- Functional testing is relatively easy to automate but may not
be feasible due to the combinatorial explosion in the number
of possible test cases. Structural testing is theoretically
better at selecting important test cases but it is not very
automatable and moreover may still suffer from a
combinatorial explosion problem (especially if selected test
cases must exercise higher-level structures such as
conditional branch-points or execution paths).
- H&S (page 194) advocate a mixed strategy:
Use the functional approach to derive an initial test set and
then perform structural analysis to trim / augment this
test set.
- Can also add test cases on basis of experience dealing with
common failings of particular data-structure or operation
implementations. For example, a stack implemented as a linked
list should be tested not only in empty and full states but when
there are one, two, and three elements in the stack (as problems
frequently arise with correctly updating "head" pointers in such
lists); for same reason, should also test that elements can be
added, deleted, and then added again.
- Once you have test cases, there are at least two ways to apply
these test cases:
- Test harness in which test cases are entered by user,
.e.g., user enters commands one-by-one through
interface that offers each module function as a menu
option.
- Test harness that automatically runs all test cases without
user intervention (test driver); test cases can
either be hard-coded into test program or are read from
specifications in a system file (see H&S, pp. 195-197).
First approach works best with small modules or large modules in
initial debugging; second approach more work, but is more efficient
and effective in general.
- A special case of automatic testing is when one implementation
of a module is tested against a second (either earlier or
parallel) implementation of that module (regression
testing). Must still select test cases carefully, but now are
only concerned that outputs are identical.
- This type of testing is especially useful to determine if
a new version of a program are compatible with older
versions, e.g., programming language compilers /
interpreters.
Monday, November 20 (Lecture #22)
- Verifying Program Correctness: Testing and Inspection (Cont'd)
[H&S, Chapter 2; K&P, Chapter 6; Pfleeger 1998]
- Example: Test-case selection for stack (Assignment #1,
Part (a)).
- Assume precise values in stack do not matter; focus on
testing transitions between important states (empty /
partially full (one, two, or three elements) / full stacks);
multiple partially full cases needed to ensure that linked
list implementations are adequate.
- Arrange tests to add elements up to a certain number of
elements, delete them all plus one other delete attempt,
and then add the elements again; after each pop
or push operation, print out stack-state to make
sure all is going as expected.
- Make sure exceptions associated with popping empty and
pushing full stacks are exercised by test cases.
- Example: Test-case selection for class GenericTable
(Assignment #2).
- Order tests to exercise methods in order of calling and
complexity, e.g. start with initialization /
constructor, then do state display (numEntries,
hasEntry), simple state change (addEntry,
replaceEntry), and complex state change
(numTaggedEntries, getTaggedEntries,
getTable, saveTable) in that order.
- Keep in mind dependencies between methods when ordering
test cases,e.g., as addEntry calls
hasEntry, should test hasEntry before
addEntry.
- Within a method, test exceptions before normal functions.
- Test all methods relative to both empty and
non-empty files (with one and two entries to see if there
are problems with adding entries); moreover, try
initializing, adding, and then reinitializing to see if
added elements are still there.
- Test-case suites should be structured to move from simple to
more complex cases; moreover, if bugs encountered wrt to simple
cases, testing should stop until that bug is fixed and only then
resumed -- as simple-case bugs may interact in unexpected ways,
not useful to continue testing more complex cases in presence
of such bugs.
- Stress testing is a special kind of testing in which
very-large, randomly generated inputs are thrown at a program;
useful at exposing problems not considered by programmers when
selecting test cases, e.g., handling of string / array
overflows and type conversions.
- Example The Ariane 5 Disaster: $500 million dollars
up in smoke because of an incorrectly handled type
conversion (running example in Pfleeger 1998; see also
K&P, p. 157).
- Switch gears now and consider inspection (also known as
code walkthrough or review) as a method for verifying program
correctness.
- Informally, individual programmers often use elements of
inspection to test and debug their code, e.g.,
getting someone else to read a piece of code or explaining a
piece of code to someone else (who need not be sentient,
e.g., K&P's help desk teddy bear (p. 123)).
- Are many formal versions of inspection; consider version
described in H&S (pp. 28-29).
- Inspection consist of a moderator, a reader who explains
a piece of code to the inspection-group, and a set of
inspectors who try to find flaws in the reader's
explanation.
- Goal of inspection meetings is to find
errors, not to fix them -- that is the programmer's
responsibility. Whether they did or not can be checked in a
followup round of inspection.
- Works best if reader is not programmer who created
product being inspected.
- Inspection does not require executable code; hence, can actually
be applied to any product in the software development process,
e.g., requirements documents, MIS, MID, code, test plans.
- Experience varies, but consensus seems to be that inspection is
surprisingly cost-effective (Pfleeger 1998, pp. 289-292); for
example, H&S (p. 28) note that at Bell-Northern Research (BNR)
in Ottawa, on average, inspection found one fault for every
person-hour invested, while testing required 2 - 4 person-hours
to find a fault. Moreover, inspection often found faults before
the product was installed in the field, at which point it would
4.5 person-days to fix.
Wednesday, November 22 (Lecture #23)
- Verifying Program Correctness: Testing and Inspection (Cont'd)
[H&S, Chapters 2 and 10; K&P, Chapter 6; Pfleeger 1998, Chapter 7]
- Compare testing and inspection (H&S, pp. 30-31):
- Testing
- Pro:
- Many steps can be automated easily.
- Correctness guaranteed over test cases considered.
- Con:
- Results not general, e.g., are specific to
test cases considered.
- Debugging still required.
- Only applicable to executable code.
- Inspection
- Con:
- Very difficult (if not impossible) to automate.
- Correctness not guaranteed over any test cases.
- Pro:
- Results are weak wrt to guaranteeing correctness,
but may hold over very large subsets of test cases.
- No debugging required; act of finding fault typically
points at source of fault.
- Applicable to work-products at all stages of the
software development cycle.
- Our discussions of testing and inspection thus far have focused on
individual modules in isolation; how does one go about testing systems
of coupled modules? This is the subject matter of system
(integration) testing.
- Need way of testing successively larger subsets of modules in
some order; localizing found errors to such subsets of code
makes finding and fixing bugs easier.
- Assume approximate hierarchical relations between modules in the
system.
- Several possible orders for testing (H&S, pp. 189-191;
Pfleeger 1998, pp. 303-310):
- Big-Bang: Test all components individually, then test
them all together and pray.
- Top-down: Test topmost modules in hierarchy first,
and then extend groups downward until all subsets tested.
At all points in testing, replace calls to methods in lower
modules not being tested by special trivial methods called
stubs that return constant or randomly-chosen values.
- Bottom-up: Test bottom modules in hierarchy first, and
then extend groups upwards until all subsets tested.
At all points in testing, replace calls to methods in higher
modules not being tested by test harnesses / test drivers.
- Sandwich: Do top-down and bottom-up testing in
parallel to converge on "target layer" of hierarchy.
- Big-bang integration works on occasion, but if errors occur, they
are very difficult to track down. Top-down integration good for
testing top-most components first, which often have most
important roles in functionality, but requires a lot of
coding effort for stub creation and maintenance. Bottom-up
integration good at detecting lower-level system faults early,
but requires a great deal of coding effort for driver creation
and maintenance. Sandwich integration has the pros and cons of
both top-down and bottom-up integration; it requires much more
coding effort than any other method considered, but it also has
the possibility of finishing faster.
- Bottom-up integration is arguably the natural way to test
object-oriented system (simple objects first, followed by more
complex ones).
- Which integration method you choose may depend on your
situation, e.g., deadlines and available programmer
manpower.
- Example: Microsoft's "Synch-and-Stabilize" software
development strategy (Pfleeger 1998, pp. 309-310).
- Need to get products out fast; moreover, feature-sets of
products need to be adjustable during development to reflect
changing client needs, and features need to be implemented
in order of importance to client needs.
- Divide features into three groups (most crucial, desirable,
least critical); assign small programmer teams (3-8 people)
to work on various feature-sets in parallel, starting with
most most critical. Work of all teams (big-bang?) integrated
and tested daily. Work on features proceeds to less critical
features until all features implemented or deadline for
product is reached.
- Requires a tremendous amount of programmer manpower, but it
does get products out fast, with focus on most important
features being implemented earliest; also makes products
adjustable during development (up to 30% change in
product features over course of development reported).
- "The most important rule of testing is to do it." (K&P, p. 162).
Friday, November 24 (Lecture #24)
- Finding and Fixing Code Errors: The Art of Debugging [K&P, Chapter 5]
- Despite our best efforts in programming, errors occur in code.
Methodologies and tools that reduce or eliminate such errors are
still in their infancy, and work for small systems if at all.
The facts of life are, as programmers, we must still learn
to debug manually.
- Debuggers are an aid to debugging, in displaying stack traces and
values of variables and allowing us to step through code.
However, using a debugger is not in itself debugging.
Several problems:
- Debuggers not always available.
- Debuggers seldom work for complex systems, e.g.,
distributed or multi-process systems.
- Easy to get lost in trivial details of debugging displays
rather than actual debugging.
- Debugging is fundamentally a process of backward reasoning from
visible effects to causes in code, and is for now best done by
human thought.
=> THINK FIRST -- TYPE LATER <=
- K&P find the following sequence of debugging actions useful,
moving from simple to complex on the basis of how good the clues
are about the nature of the error.
- Make sure you're really dealing with a bug, and not just
an obsolete version of the code or an executable that's not
up to date.
- Good Clues, Easy Bugs:
- Look for familiar patterns; maybe you've seen
and fixed this bug before.
- Examine the most recent change in the code; if it
worked before, dollars to donuts the error is in
the most recently modified region of code.
- Don't make the same mistake twice; if you find a bug
and fix it, make sure there are no other occurrences
of the same buggy code in the program (this is why
I insist so often on replacing multiple occurrences
of similar pieces of code in a program by a single
appropriately parameterized method).
- Before you do any typing, think, e.g., have
someone else read your code / explain it to someone
else.
- No Clues, Hard Bugs:
- Make the bug easily reproducible, i.e., create
the smallest sequence of actions / smallest input that
causes a bug to manifest itself.
- Look for suspicious patterns in the occurrence of a
bug that may be related to the value of some system
constant (what K&P call "the numerology of failure"),
e.g., counters that flip out at powers of 2
(8, 16, 32, 64) or characters that go missing at
block boundaries (256, 512, 1024, 2048).
- Eliminate code (in a binary fashion) to locate the
region in which the bug occurs, i.e., delete
code until the bug goes away.
- Insert output statements in the code (in a binary
fashion) to locate the region in which the bug occurs,
e.g., messages that read "Made it to here" or
"Dear God, you should never see this".
- Insert checking-code for the validity of various values
or data-structures.
- Write program-status output frequently to a log file.
- When All Else Fails ...
- Look for resource "leaks", e.g., dangling
pointers, uninitialized variables, failures to
allocate sufficient memory or close open files.
- Use the step-by-step execution mode on the debugger;
perhaps your mental model of how the code is working
is so subtly wrong that you will only realize it when
you have all the gory details of execution in front
of you.
- When outputting messages during debug, make sure any output
buffers are flushed immediately after issuing the output;
otherwise, these messages may not be printed before the fault
occurs and you may be misled.
- Note that we should only invoke the full power of a debugger as
a last resort -- as stressed above, thought should always precede
typing.
Monday, November 27 (Lecture #25)
- Finding and Fixing Code Errors: The Art of Debugging (Cont'd)
[K&P, Chapter 5]
- In addition to debuggers (or indeed, instead of them), the
following software tools are of use in debugging:
- Making sure you actually have a bug:
- Maintaining a library of different versions of parts of
the program code (rcs).
- Maintenance of code dependencies and automatic
recompilation of code on the basis of those
dependencies (make).
- Manipulating debugging output:
- Finding all lines in which a given string occurs in a
file; useful for finding out if a particular message
occurred (grep).
- Finding all differences between two files; useful in
finding out where a correct and a buggy version of a
program diverge, or if two versions of a program
produce the same output under regression testing
(diff).
Programs for performing these tasks under the UNIX operating
system are given in bold-face. To find out more about these
commands, consult on-line manuals using the man
command, e.g., man make.
- Improving Code Performance [K&P, Chapter 7]
- Once you have code debugged and working, you may want to speed
up the code or make it more efficient wrt memory use. However,
you should only do this if you have to -- otherwise, it is
an intellectual exercise rather than useful work.
- "... the first rule of optimization is don't
(K&P, p.165)
- Once you have decided to optimize your code, it is useful to
go through these steps in sequence:
- Get accurate running times for your code on a variety of
cases. Moreover, where possible, use CPU time figures rather
rather than elapsed (clock) time figures -- if you are on a
multi-user system and there are a number of other programs
running, these two times will not be the same.
- If these running times indicate that optimization is
warranted, isolate those portions of your program which
are being used the most frequently and/or are taking up
most of the run time (this can be done with a profiling
tool; one such tool for C language programs on the UNIX
system is prof). These portions are important because
effort spent on optimizing these portions will have the
greatest benefits in terms of better performance.
- Optimize the selected portions of code by various techniques:
- If possible, invoke code optimization in whatever
compiler you are using.
- Invoke more efficient algorithms and/or data structures,
e.g., progression through recursive / iterative
/ lookup table implementations of factorial function.
- "Tune" your code (see K&P, Sections 7.4-7.6 for
details).
- Note that optimization often results in code that is more complex
and less comprehensible; such code may run faster but be
harder to maintain and modify. Hence, there is a tradeoff
between efficiency and utility in software development, and you
may have to choose which is more important for a given project.
- Documentation: It's Not Just For Users Anymore
(H&S, Chapter 2, Sections 2.2-2.3; Parnas and Clements 1986)
- As we've talked about it in this course, each stage of software
development has been characterized by a type of document;
arguably, these documents are more important than the code,
as we could reconstruct code from the documents but not
vice versa.
- Documentation is often seen as a necessary evil in software
development; all too often, documentation is written after
the project is over by people who don't want to do it and have
not been given any guidance on how the documentation should
be written. Thus, most documentation is often boring and
inconsistent, and characterized by a "stream of execution" style
that describes (often wrongly) how the code executes.
- Should have standards for documentation such that each component
document has a particular well-defined role, structure, and
notation and each fact about the software development process
has a single well-defined place in the documentation. Such
documentation is a concise and rigorous explanation of how
a product works rather than a history of how the product
was developed (just as a mathematical proof is simplified
and tidied up for publication; all we care about is the result,
not the process by which we arrived at the proof).
- Ideally, such rational documentation would be a natural
outgrowth of a rational software development process. Even
though this process and the initial documentation is seldom
rational because of various unavoidable human factors and the
rational is thus an ideal, Parnas
and Clements argue that we should always write documents as if
the process had been rational, e.g., we should fake it.
There give two reasons for this:
- The rational ideal is a useful standard in large
organizations, both for document creation and review.
- The rational ideal is a useful goal for programmers; even
if it is not always attained, the effort expended to
understand the code thoroughly enough to attempt to
write rational documentation will lead to a better
understanding of the software at all points in the
development process, and thus lead to a better final
product.
- "It is very hard to be a rational designer; even faking that
process is quite difficult. However, the result is a product
that can be understood, maintained, and reused. If the
project is worth doing, the methods described here are
worth using." (Parnas and Clements 1986, p. 256)
Wednesday, November 29 (Lecture #26)
- Building A Better Tomorrow: Trends in Software Development
[Brooks 1995]
- Let's spend a few lectures looking at the future of software
development as seen through the eyes of Brooks and Harel in
the "Silver Bullet" Trilogy (Brooks 1986 (reprinted as Brooks
1995, Chapter 16); Brooks 1995, Chapter 17; Harel 1992).
- Brooks sees the big problem in software development as dealing
with the complexity of the underlying conceptual design of
software. This complexity is inherent in software and cannot
be abstracted away because:
- Software models human organizations and human-derived
processes, which are in themselves complex
(conformity);
- Software can and must change to reflect the demands of
changing client needs (changeability); and
- The underlying conceptual design of software has no
one satisfactory representation, visual or mathematical,
with which programmers can reason about it
(invisibility).
- Most progress to date in software development has been via
tools that remove accidental (or rather, incidental)
difficulties in constructing software, e.g., high-level
languages, time-sharing operating systems, program development
environments; however, few tackle the essential difficulty, which
is the inherent complexity of software (see above).
- Now, where does this crazy "silver bullet" metaphor come from?
In legend, the werewolf was a monster that emerged suddenly
from the familiar, and he could only be slain with a silver
bullet. In much the same way, software crises seem to emerge
unexpectedly from apparently simple system requirements. Many
researchers have claimed to have a "silver bullet" that will
make software development easy, e.g., automatic
programming, object-oriented programming, visual languages,
expert systems.
- Brooks in 1986 suggested that no one technique was a silver
bullet in itself -- an opinion he stated all the more strongly
looking back in 1995. Rather, if there is a solution, it must
come from a fusion of new software development methodologies
and new software development tools -- in particular, new
software specification notations that allow for easier,
automatable specification of systems and system
requirements, translation between various levels of
systems specifications, and verification of the
correctness of these specifications relative to requirements.
- Consider several promising methodologies as selected by Brooks
in 1995:
- Progressive Refinement: Instead of proceeding through
the software process stage by stage for the whole system
(the "waterfall" model; arguably the model
underlying this course), start with a basic skeleton of a
system menu and add functionality piece by piece in
response to client requirements (the incremental build
model), with the clients being involved at all points in
the software development process in viewing and critiquing
the evolving system as well as making suggestions about
what the system should do (rapid prototyping).
- Such an organic process of "growing" complex software
seems to work remarkably well, both in dealing with
complexity and being responsive to client needs,
e.g., the Microsoft "synch-and-stabilize"
software development process (see notes for Lecture
23 (November 22) above).
- "Buy vs./and Build": Instead of developing a
customized system
from scratch, buy a generic version off the shelf and
configure it ("Buy vs. Build": the Brooks 1986 vision) or
better yet, buy generic components off the shelf and
configure them into the system you want ("Buy and Build":
the Brooks 1995 vision) (metaprogramming).
- Brooks notes that this has been driven by market
forces -- namely, the decreasing cost of hardware
relative to software and the explosion in the number
of computers available (and hence a corresponding
explosion in the number of
computer-(semi)literate users).
- Note that market forces could potentially
reverse this trend to once again favor custom over
generic software. For example, the cheap labor costs
associated with software development in the Third
World (Asia, Africa) as opposed to North America
may flip the hardware / software cost ratio back to
the way it used to be (Tony Middleton, private
communication). I conjecture that this may be so
in the short term, but would in time revert to
favoring generic software in the face of
ever-increasing system complexity.
Friday, December 1 (Lecture #27)
- Building A Better Tomorrow: Trends in Software Development (Cont'd)
[Davis 1993, Chapter 4; Harel 1984, 1987, 1992; H&S, Chapter 3;
Iglewski and Mincer-Daszkiewicz 1997]
- Having talked about some promising software development
methodologies last lecture, let's talk now about some trends in
tools. It seems that in order to attack conceptual complexity,
we need better ways of reasoning about and representing concepts,
i.e., better system specification notations.
- Characteristics of a good system specification notation:
- Should reduce ambiguity, incompleteness, and
inconsistency.
- Should serve as a basis for design, translation,
and verification.
- Natural language or pseudocode often used
but is either too vague (former) or too
program-specific (latter) -- focus here on abstract
specification notations.
- There is a tension between rigor and comprehensibility
in specification notations --
formality makes specifications rigorous (and hence
makes property-checking and testing automatable),
while "informality" makes specifications comprehensible
to those unfamiliar with formalisms, e.g., other
programmers or clients.
- If comprehensibility is essential, be informal; if
rigor is essential, e.g., safety-critical
systems like nuclear reactors, pacemakers, or
air traffic control systems, be formal.
- Two major types of specification notations:
- State-Oriented
- System viewed as collection of states, and
behavior stated in terms of how
functions calls / conditions / events cause
the system to change its state.
- The state of a system can be seen as the
collection of the values of all data
structures in the system.
- A natural approach for modeling reactive
systems, i.e., non-terminating
(possibly
distributed) systems that react to events.
- Function-Oriented
- System behavior stated in terms of that
system's input/output function or as the
set of legal sequences of (module) function
calls and their outputs.
- Seen as more abstract specification
in that it does not require a model of a
system's state (though portions of this
state are exposed in the formats of inputs
and outputs).
- A natural approach for modeling
transformational systems, i.e., systems
that transform inputs to outputs and
terminate.
- Focus first on state-oriented notations.
- Many state-oriented notations are based on finite-state
machines (FSM) (Davis 1993, Chapter 4, pp. 217 - 223;
H&S, Chapter 3, pp. 51-60),
which extend finite-state acceptors/automata (FSA)
by considering inputs as function calls or (external)
events, having no final states, and associating outputs
with either state-transitions (Mealy FSM) or states
(Moore FSM).
- Mealy FSM good at modeling transient outputs,
e.g., returned function values.
- Moore FSM good at modeling long-term outputs,
e.g., output on graphic display device.
- FSM very versatile, however, have several problems:
- Cannot groups states together or express hierarchical
relations between groups of states => difficult to
refine FSM specifications towards code.
- All states and transitions must be represented
explicitly => potential exponential state explosion
problem, e.g., a system consisting of n
light bulbs that can be lit or dark has 2^n
possible states.
- Difficult to concisely represent process communication /
synchronization in FSM without causing state
explosion.
- There have been many extensions proposed to FSM that address
these problems. One of these proposed notations is that for
statecharts (Davis 1993, Chapter 4, pp. 223 - 233;
Harel 1987)
- Allow concise formulation of identical
transitions to multiple states as well as
hierarchical grouping of states into "superstates".
- Allows primitive concurrency / process
synchronization.
- More formal notations, i.e., Z or B,
define states and transitions
mathematically in terms of set theory or first-order
logic and allow concise and rigorous formulations of very
large or
infinite-state systems -- however, specifications in
these notations are difficult for those unfamiliar with
these notations to comprehend (Davis 1993, Chapter 4,
pp. 259 - 260).
- Focus now on function-oriented notations.
- Two simple but effective function-oriented notations are
decision tables and decision trees (Davis 1993, Chapter 4,
pp. 244-248).
- A decision table is a table whose rows are split into
two groups, conditions (C = {c1, c2, ..., cn}) and
actions (A = {a1, a2, ..., am}). Each column of the
table corresponds to a subset of C that describes a
particular input situation and the subset of A that
is the system's response to that situation, cf. FSM
encodes set of outputs corresponding to a single input.
- Decision tables tend to be very large (have 2^n
columns for n decisions); however, they can be easily
stored and edited using standard spreadsheet packages.
- Decision tables are good in initial stages of problem
analysis, when you want to make sure you have
considered outputs associated with all possible
combinations of inputs; are also most concise form
when most possible combinations of outputs are
possible.
- A decision tree is essentially a tree-shaped flowchart
whose internal nodes correspond to the conditions in C
and whose leaves are subsets of the actions in A.
- Decision trees tend to be compact; moreover, they
beautifully encode the importance of various
conditions in deciding which outputs must occur, i.e.,
key conditions are at or near the root of the
decision tree.
- Decision trees are good for summarizing decision
tables; they also subtly introduce a sequential
order on decisions that need to be made to
relate inputs to outputs that can then be used
in refining the behavior specification into code.
- A more mathematical function-based notation underlies the
Trace Assertion Method (TAM). TAM was defined by
Bartussek and Parnas in 1978; it allows function-oriented
specification of the behavior of modules, where each
module has an associated set of access-functions
(Iglewski and Mincer-Daszkiewicz 1997).
- A trace is a sequence of interleaved function inputs,
function calls, and function outputs, e.g.,
i.f_1.o_1.f_2.o_2.f_3.o_3 ... o_(n - 1).f_n.o_n;
note that
the module state at the end of the trace is implicit in
the trace itself.
- The set of all possible traces for a module is partitioned
into classes of equivalent traces, i.e., traces that
end up in the same state.
- Represent each such class with a canonical
trace.
- Each such class of traces is equivalent to
a state of the module -- in particular, the
state of the module at the end of the execution
of each trace in the class, cf. the
Myhill-Nerode theorem which shows the equivalence
of regular languages and FSM.
- A TAM system specification consists of:
- The module interface specification.
- A canonical trace function.
- An extension function (shows legal extensions to a given
trace).
- An output relation (shows output associated with a
given trace).
- TAM allows designers to specify a module's behavior with
minimal release of module state details, e.g., those
details of the state implicit in access-function input and
output formats.
- Object- and function-oriented notations are ultimately
equivalent in descriptive power; which one you use depends
more on what is more comprehensible in a given situation.
- Note that both types of notation have both visual and
mathematical versions. The usual assumption is that visual
notations are more comprehensible / less formal and
mathematical notations are less comprehensible / more formal.
- Is this necessarily so? Must there always be a tradeoff
between comprehensibility and rigor in adopting a notation,
or can we have compressible yet rigorous notations,
e.g., a visual notation with a well-defined and
powerful semantics? Harel would argue the latter
(Harel 1984, 1992) and claims to have developed such
notations (Harel 1987, 2000; Harel and Politi 1998).
We will explore Harel's vision of how such notations might
revolutionize software development in the next and final
lecture.
Monday, December 4 (Lecture #28)
- Building A Better Tomorrow: Trends in Software Development (Cont'd)
[Brooks 1995; Harel 1992, 2000]
- Look at Harel's dream for a maximally-automated software
development process (Harel 1992, 2000).
- Consider the following drastically simplified view of the
software development process:
clients =(a)=> Requirements <(b)=(c)> Specification =(d)=> Code
Traditionally, processes (a)-(d) have been done manually, and
have gone under the names requirements management (a), testing /
inspection (b), system decomposition / MIS / MID development
(c), and programming (d).
- Can we do better? Can we achieve better living through automation,
at least wrt the software development process? Harel thinks so,
and the key is better software specification notations --
in particular, visual notations like statecharts and its
descendents that have rigorous, well-defined semantics,
e.g., UML (Unified
Modeling Language).
- Several tools already exist that use such notations
(ObjecTime
(
ObjecTime Technical Papers Archive),
Rational Rose RealTime
(
Rational Home Page),
Rhapsody,
Statemate,
(Harel et al. 1990,
Harel and Politi 1998)).
At present, such tools can execute system specifications
directly (executable models), produce passable code
(currently as software and soon as firmware or specifications for
special-purpose hardware) from system specifications, and, where
requirements are phrased in terms
of "stories" describing how a system responds to various
sequences of events (scenarios; these are very much like
traces in TAM (see previous lecture)), automatically execute
specifications relative to these sequences and compare actual
with expected results, i.e., "play-out" scenarios
(sort of a glorified test harness).
- Harel sees the challenges as automating steps (b) and (c).
- Verification (step (c)) involves not only checking that a
specification satisfies requirements over a small set of
inputs but over all sets of inputs relative to some set of
desirable properties encoded in the requirements. These properties
can be loosely classified as safety properties, i.e.
, something bad never happens (reactor meltdown), and
liveness properties , i.e., something good always
happens (system resources are always available / resource
deadlock never occurs). Harel has worked on encoding such
properties in scenario-variants called anti-scenarios.
Verification is computationally very difficult (in some
cases nonrecursive) in general, but it is practical in restricted
cases, particularly those involving finite-state specifications.
- Synthesis (step (c) seems to be much more difficult
to do automatically. Research is underway. Harel suspects that
it will involve a mix of current methodologies, e.g.,
progressive refinement (see notes for November 29 (Lecture 26)),
with new methodologies and "...a deep and profound wisdom ..."
(Harel 2000, p. 9) arising from accumulated experience using
synthesis tools of the future.
- Harel (Harel 2000, p. 9) sketches a new form of rapid prototyping
called scenario play-in that he is working on now. In
this scheme, clients would operate a rudimentary user interface
to a system under development and "play in" scenarios of how the
system should operate, in effect building up the requirements
automatically (almost like training a neural net). No papers as
of yet ... Watch the skies! ...
- In summary, for the diagram above, it seems that we can at
present automatically do (d) passably and (c) on occasion;
(a) and (b) await future research.
- There is, however, one very important thing that we can do for
now. Brooks (1986) actually made one more methodological
proposal for improving software development:
... Improve software developers themselves! ...
Various studies have confirmed that the most important factor
(by a factor of four over the next factor) (Harel 1995, p. 276)
in the creation of quality software is the quality of the
software development team. Hence, much effort is usefully
expended on the care and cultivation of such teams (Harel 1995,
pp. 276-279) -- tools are not everything!
- "The central question of how to improve the software art
centers, as it always has, on people." (Brooks 1995, p. 202).
- If software development is at its heart conceptual design,
software developers need not only to be technically proficient in
tools, techniques, and methodologies but also to be curious,
playful, and above all creative,
living embodiments of both engineering and art. Training such
people is very difficult (Brooks (1995), pp, 202-203, gives
some suggestions) -- however, doing this successfully might
ultimately be the greatest advance we can make in improving
the process of developing software in future.
End-of-Term Notes:
- Dec 8, 7:15pm
- In question #3 of the final, I deleted the stuff on testing.
- Questions #3 and #4 of the final are now worth 40 and 50 marks,
respectively.
- The last 3 lectures in the course (the "Trends" lectures) will
not be covered on the final exam.
- Dec 6, 5:00pm
- Let me start out by apologizing to those of you who I have snapped
at or been brief with the last few days; I am very busy
right now and unfortunately, strain tends to appear at the most
inconvenient times. That being said, please don't ask questions
or drop by to chat unless it's absolutely necessary.
- I am still working on getting the assignments marked; they will
probably not be finished before the exam, but you don't need them.
- Several people have asked if, in light of the somewhat unusual chain
of events that transpired this semester, they can get a deferred
exam or term project. I would rather defer the term project, and
will do so for a few people if they *** REALLY *** need to
do so. This will entail people getting an INCOMPLETE on their
mark for the course this term, handing in their projects on
Wednesday January 10, 2001 (the day before classes start), and
getting a real mark after I mark their projects in
mid-January, 2001. If you *** REALLY *** need to do this,
talk to me and I'll do up a memo to that effect (so if, as Elaine
in the General Office puts it, "I get hit by a yellow bus", you
have some proof that this was all OK by me).
- The exam is this coming Saturday (Dec 9) from 3-5pm in SN-2067.
I have the final exam made up now. It will have 4 questions:
- Definitions (stuff since the midterm; please see the on-line
notes) (10 marks).
- Deriving test cases via the black box and white box test-case
selection criteria (I have augmented the notes for
November 17 above; please go back and look at this again)
(20 marks).
- Given an abstract description of an object and an MIS,
derive test cases, correct a faulty method implementation
of a function in that MIS, and provide a partial
implementation of a module for that object based on the
given MIS (45 marks).
- Given an abstract description of an object, derive an MIS
(using the MIS format in the previous question) and provide a
partial implementation of a module for that object based on
that MIS (45 marks).
The Java involved in the programming is all very basic stuff so
don't panic. The format and marks assigned may change slightly as I
look things over tonight, but I think this is it. How do you study
for this exam, you ask? I think the
best preparation is having worked on the assignments and reading
over the on-line notes; reading the stuff on reserve in the library
should not be necessary.
References:
-
Arnold, K. and Gosling, G. (1998) The Java Programming Language
(Second Edition). Addison-Wesley, Reading, MA. (abbreviated A&G
above)
-
Brooks, Jr., F.P. (1986) "No silver bullet -- essence and accidents of
software engineering." In H.J. Kugler (ed.), Information Processing
86, pp. 1069-1076. Elsevier Science, Amsterdam. Reprinted in
IEEE Computer, 20(4) (April 1987), 10-19 and Brooks (1995).
-
Brooks, Jr., F.P. (1995) The Mythical Man-Month (20th Anniversary
Edition). Addison-Wesley, New Jersey.
-
Davis, A.M. (1993) Software Requirements: Objects, Functions, and
States (revised edition). Prentice Hall PTR, New Jersey. [Chapter 4]
-
Garlan, D., Kaiser, G.E., and Notkin, D. (1992) "Using Tool Abstraction to
Compose Systems." IEEE Computer, June, 30-38.
-
Harel, D. (1984) "On Visual Formalisms." Communications of the ACM,
31(5), 514-530.
-
Harel, D. (1987) "Statecharts: A Visual Formalism for Complex Systems."
Science of Computer Programming, 8, 231-274.
-
Harel, D. (1992) "Biting the Silver Bullet: Towards a Brighter Future
for System Development." IEEE Computer, January, 8-20.
-
Harel, D. (2000) "From Play-In Scenarios to Code: An Achievable Dream."
In Proceedings of the 5th International Conference on Implementation
and Application of Automata (CIAA'2000), pp. 1-15.
-
Harel, D., Lachover, H., Naamad, A., Pnueli, A., Politi, M., Sherman, R.,
Shtull-Trauring, A., and Trakhtenbrot, M. (1990) "STATEMATE: A Working
Environment for the Development of Complex Reactive Systems." IEEE
Transactions on Software Engineering, 16(4), 403-414.
-
Harel, D. and Politi, M. (1998) Modeling Reactive Systems with
Statecharts: The Statemate Approach. McGraw-Hill; New York.
-
Hoffman, D. and Strooper, P. (1999) Software Design, Automated Testing
and Maintenance. International Thomson Computer Press, London.
[Chapters 2, 3, 7, 8, 9, and 10] (abbreviated H&S above)
-
Iglewski, M. and Mincer-Daszkiewicz, J. (1997) "Internal design of
modules specified in the trace assertion method." Science of Computer
Programming, 28, 139-170.
-
Kernighan, B.W. and Pike, R. (1999) The Practice of Programming.
Addison-Wesley, Reading, MA. [Chapters 1, 5, 6, and 7] (abbreviated
K&P above)
- Lewis, H. and Loftus, W. (1998) Java Software Solutions:
Foundations of Program Design. First Edition. Addison-Wesley,
Reading, MA. (book
resource website)
-
Parnas, D.L. (1972a) "A Technique for Software Module Specification with
Examples." Communications of the ACM, 15(5), 330-336.
-
Parnas, D.L. (1972b) "On the Criteria To Be Used in Decomposing Systems into
Modules." Communications of the ACM, 15(12), 1053-1058.
-
Parnas, D.L. and Clements, P.C. (1986) "A Rational Design Process: How and
Why to Fake It." IEEE Transactions on Software Engineering, SE-12(2),
251-257.
-
Pfleeger, S.L. (1998) Software Engineering: Theory and Practice.
Prentice Hall, New Jersey. [Chapters 5 and 7]
This page is adapted from that for CS3718 (Fall 1999)
created by Mike Rendell.
Created: September 1, 2000
Last Modified: December 9, 2000