Computer Science 3200, Fall '24
Course Diary
Copyright 2024 by H.T. Wareham, R. Campbell, and D. Churchill
All rights reserved
Week 1,
Week 2,
Week 3,
Week 4,
(In-class Exam #1 Notes),
Week 5,
Week 6,
Week 7,
Week 8,
Week 9,
(In-class Exam #2 Notes),
Week 10,
Week 11,
Week 12,
(Final Exam Notes),
Week 13,
(end of diary)
In the notes below, RN3, RN4, and DC3200 will refer
to Russell and Norvig (2010), Russell and Norvig (2022), and David
Churchill's COMP 3200 course notes for the Fall 2023 semester.
Thursday, September 5 (Lecture #1)
[DC3200, Lecture #1;
RN3, Chapter 1; Class Notes]
- Went over course outline.
- As this course is on algorithmic techniques for artificial
intelligence, let us first give brief overviews of intelligence,
artificial intelligence, and algorithms.
- What is intelligence?
- Intelligence is at its heart the ability to perform
particular tasks, e.g.,
- problem solving, e.g., playing complex games like
chess and Go, proving mathematical theorems,
logical inference.
- being creative, e.g., producing literature, music, or
art, doing research.
- using tools to perform tasks
- social interaction with others
- remembering / learning
- talking
- seeing
- Intelligence is typically associated with agents and the
behavior these agents exhibit when interacting with an
external environment (more on this in the next lecture).
- Intelligence has often been seen as anything humans do that
animals don't, e.g., proposed biological Kingdom-level class
Psychozoa for humans (Huxley (1957)); as evidence has
accumulated
of lower-level intelligence in animals, e.g., gorilla
communication by American Sign Language (ASL), counting
by crows, memory in bees, complex social interaction and
tool use by chimpanzees, intelligence has been
redefined in terms of complex high-level tasks, e.g.,
(1-2) above, to preserve human superiority.
- This view of intelligence as exclusively human (and
high-level adult
human at that) surfaces in what we find anomalous (and
funny), e.g., "Children / animals can't do that!"
- Example: Not the Nine O'Clock News (1980): Gerald the Gorilla
(Link)
- What is artificial intelligence (AI)?
- Mythology and history are filled with artificial creatures
that exhibited the appearance of (if not actual)
intelligence, e.g., the bronze giant Talos, Roger Bacon's
talking bronze head, European clock and musical automata,
Vaucanson's mechanical duck (1739).
- The term "artificial intelligence" was coined by John
McCarthy to describe the goal of a two-month
summer workshop held at Dartmouth College in 1956. AI
is essentially the implementation of intelligent abilities
by machines (in particular, electronic computers).
- Many AI application areas, e.g.,
- Problem solvers, e.g., General Problem Solver (GPS),
artificial theorem provers, chess and Go programs
(Deep Blue (1997); AlphaGo (2016)).
- Logical / probabilistic reasoning systems, e.g.,
expert systems, Bayesian networks.
- Generative AI, e.g., ChatGPT, DALL-E
- (Swarm) Robotics
- Machine learning
- Natural Language Processing (NLP)
- Computer Vision
- Can distinguish types of AI based on mode of
implementation (weak / strong), implementation
mechanisms (symbolic / connectionist), and extent of
implemented ability (Artificial Specialized Intelligence
(ASI) / Artificial General Intelligence (AGI)).
- Strong AI uses computational approximation of human /
natural cognitive mechanisms; Weak AI uses any
possible computational mechanisms.
- Symbolic AI uses computer programs operating
within discrete-entity frameworks, e.g., logic;
Connectionist AI uses approximations of biological
brain mechanisms, e.g., neural networks.
- ASI implements a single intelligent ability, e.g.,
recognizing faces, answering questions,
playing chess; AGI implements all intelligent abilities
performed by a human being.
- The initial focus of AI research was on strong symbolic ASI
for high-level intelligent abilities (tasks (1-2) above) with
the hope that developed ASI modules could be combined to
create AGI. However, given the difficulties of implementing
even supposedly low-level intelligent abilities (tasks
(3-7) above), let alone AGI, current research tends more to
weak connectionist ASI.
- Though it has gone through several "winters" in the last
65+ years (Haigh (2023a, 2023b, 2024a, 2024b, 2025)), AI is currently in
a boom period, with
weak connectionist systems exhibiting seemingly
human- (and sometimes trans-human-) level performance,
e.g., ChatGPT, medical image analysis.
- It is important to keep the distinctions made above in mind,
particularly between ASI and AGI; it is often in the
interests of researchers and corporations to claim
that their systems do far more than they actually do,
and the resulting expectations can have embarrassing, e.g.,
automated machine (mis)translation, or even lethal, e.g.,
(non-)autonomous vehicles, consequences.
- What is an algorithm?
- Classical CS definition: Given a problem P which specifies
input-output pairings, an algorithm for P is a finite
sequence of instructions which, given an input I, specifies
how to compute that input's associated output in a finite
amount of time.
- This is the type of algorithm underlying symbolic GOFAI
(Good Old-Fashioned AI); such algorithms are both created
(hand-crafted) by and understandable to humans.
- Many modern AI systems have broadened the term "algorithm" to
include methods derived by machine learning systems that have
been trained on massive amounts of data; such algorithms are
machine-crafted and in many cases not easily understandable
by humans, e.g., BERTology = the field of research devoted
to trying to understand why the NLP BERT Large Language
Model produces the outputs that it does.
- It is critically important to keep the hand-crafted /
machine-crafted distinction in mind given emerging
challenges in real-world AI.
- In order for AI systems to operate in tandem with
humans or autonomously, these systems must be
trustworthy, i.e.,
- These systems are reliable, i.e., they produce
wanted behaviors consistently;
- These systems are fair, i.e., they do not encode
bias that compromises their behavior; and
- These systems are explainable, i.e., the reasons
why a system makes the decisions it does can be
recovered in an efficient manner.
While these properties can be ensured in hand-crafted
algorithms, it is not yet known if this can be done
(even in principle, let alone efficiently (Buyl and De
Bie (2024); Adolfi et al (2024a, 2024b))) for machine-crafted
algorithms.
- AI is most certainly in an exciting yet unnerving state at present,
but if one can forego knee-jerk panic, there are
many opportunities for both applying existing and developing new
algorithmic techniques for AI.
- In this course, we will first look at symbolic hand-crafted
algorithmic techniques (search (single agent / multi-agent),
constraint satisfaction, logical reasoning, planning)
and then move on to symbolic and connectionist machine-crafted
techniques (evolutionary computation, reinforcement learning,
neural networks and deep learning).
- But before all that, next lecture, a closer look at agents and
environments -- see you then!
- Potential exam questions
Tuesday, September 10 (Lecture #2)
[DC3200, Lecture #2;
RN3, Chapter 2; Class Notes]
- Agents and Environments
- Agents
- An agent exists and performs intelligent behaviors,
e.g., makes decisions and takes actions, in a specific
environment and interacts with that environment via
a sense-understand-plan-act loop, i.e.,
- An agent perceives its environment through
sensors, e.g., camera, touch, taste, keyboard
input; these sensors provide percepts, which
form percept sequences over time.
- Agents act in their environment through
actuators, e.g., limbs, tools, allowable moves in
a game; actuators perform actions that can
modify the environment (including the agent's
internal state and/or position in the environment).
- An agent's choice of action is specified as an
agent function and implemented as an agent
program relative to a particular architecture.
- Agent functions are also known as policies
and there are various names for agent
programs depending on the application, e.g.,
controllers (robotics).
- In general, an agent's action-choice can
depend on its entire percept-history to
date; however, we will in this course
focus on agents whose action-choices
depend on the current percept.
- There are typically many ways to implement
an agent function, e.g., lookup tables,
IF-THEN statement blocks, neural networks.
- An agent may have knowledge about its environment,
e.g., environment model, game rules / allowable
moves.
- EXAMPLE: Some example agents:
- Roomba vacuum cleaner
- Human players in a game of chess
- Self-driving vehicle
- AI personal assistant
What are the sensors, percepts, actuators, actions, and
knowledge (if any) associated with each of these agents?
- An agent may have multiple internal states, with
transitions between these states (triggered by percepts
and producing actions) encoded in a state-transition
function.
- EXAMPLE: Three deterministic finite-state agents that
operate in a 2D grid-world (Wareham and Vardy (2018)).
- State transitions are specified as (state, percept,
action, state) tuples, where an action is
specified as an (environment-change, agent move)
pair.
- Agent processing starts in state q0.
- What does each of these agents do?
- EXAMPLE: A stochastic agent
- Specified as a table whose x- and y-axes are
percepts and table entries are probability-vectors
specifying the probabilities of a set of actions
relative to a set of x/y-percept combinations.
- Consider the following agent which specifies the
probabilities of
what to do (wear coat, carry umbrella, do nothing
extra) if you want to take a walk outside
depending on the weather ((rainy / cloudy / sunny)
x (cold / neutral / hot)):
- A rational agent does the best possible action
at each point wrt optimizing some pre-defined
performance measure; such an agent has an optimal
policy.
- Note that rationality is NOT omniscience, i.e.,
knowing the complete environment and the
outcomes of all (even random) actions.
- EXAMPLE: Going across street at crosswalk to
see friend vs not crossing street
because of incoming meteor from space.
- Four basic types of agents:
- Simple reflex agents
- This simplest type of agent reacts without
understanding or planning by using
sets of encoded IF-THEN rules that
directly link percepts and actions, e.g.,
the deterministic finite-state and
stochastic agents given above.
- The intelligence encoded in such agents
can be impressive, e.g., Brooks' Genghis
robot, basic obstacle avoidance in
self-driving vehicles, but such agents are
ultimately limited in that there is little
understanding or planning in the
action-selection process (a small code sketch
of a simple reflex agent follows the four
agent types below).
- Model-based agents
- This type of agent maintains a non-trivial
internal state that incorporates knowledge
of how the world works and a model of the
world encountered to date (including the
agent's own past actions in the world, e.g.,
the two-state deterministic finite-state
agent given above).
- The intelligence encoded in such agents now
allows understanding to be part of the
action-selection process.
- Goal-based agents
- This type of agent incorporates a description
of an agent's wanted goal, e.g., the
destination to be reached, as well as a
way of evaluating if that goal has been
achieved, e.g., are we at the requested
destination?
- Typically more complex than simple reflex
agents but much more flexible in terms of
the intelligence they can encode (in particular,
planning as part of the action-selection
process).
- Utility-based agents
- This type of agent incorporates a utility
function that internalizes some performance
measure, e.g., the distance traveled on
a particular path from the start point to
the destination, and is thus capable of
optimizing this performance measure, e.g.,
choosing the shortest path from a start
point to the destination.
- Such agents are more complex still but can
(given an appropriate utility function)
encode general rationality.
- Note that choosing the appropriate utility
function is crucial to the encoded degree
of rationality and this is typically
very hard to do.
- The intelligence in the four types of
agents above is hard-coded using hand-crafted
algorithms.
A fifth and most complex type of agent, the learning
agent, can learn from experience and both become rational
and maintain this rationality in the face of changing
environments. The resulting agent programs
are machine-crafted and will be discussed in
the final lectures of this course.
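- As a concrete illustration of the IF-THEN percept-to-action
mapping used by simple reflex agents, here is a minimal Python
sketch; the two-cell vacuum world, the percept format, and the
action names are illustrative assumptions rather than course code.

  def simple_reflex_vacuum(percept):
      # Percept is assumed to be a (location, status) pair for a two-cell world.
      location, status = percept
      if status == "dirty":      # IF dirty THEN suck
          return "suck"
      return "right" if location == "A" else "left"   # ELSE move to the other cell

  print(simple_reflex_vacuum(("A", "dirty")))   # suck
  print(simple_reflex_vacuum(("B", "clean")))   # left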
Thursday, September 12 (Lecture #3)
[ DC3200, Lecture #2 and
DC3200, Lecture #3;
RN3, Chapters 2 and 3; Class Notes]
- Agents and Environments (Cont'd)
- Environments
- The current configuration of an environment is that
environment's current state.
- A configuration encodes all entities in an
environment, including the agents (and, if
applicable, the positions and internal states
of these agents) within the environment.
- EXAMPLE: A configuration of a Solitaire card
game would show all cards and their order,
both in the visible stacks of cards and the deck
of undrawn cards.
- What an agent perceives may not be a complete
representation of an environment's state;
however, such observations are all the agent
has to make its decisions.
- EXAMPLE: An agent playing a game of Solitaire can
only see the visible stacks of cards and their
orders in these stacks.
- Environments can be viewed as problems to which agents
provide solutions, e.g., a path between locations on a
map, a sequence of rule-applications that results in a
wanted goal.
- EXAMPLE: Some example agents:
- Roomba vacuum cleaner
- Human players in a game of chess
- Self-driving vehicle
- AI personal assistant
What are the environments, environment configurations,
and problems associated with each of these agents?
- Environments have associated state and action spaces.
- State space = the set of possible
configurations of the environment.
- Action space = the set of possible actions
available from a state.
- Average / worst-case measures of the sizes of
state and action spaces are frequently used as
measures of environment complexity.
- Are good measures of the computational
complexity of searching state / action
spaces by exhaustive search.
- EXAMPLE: State space sizes for some games
- Tic-Tac-Toe: 10^4
- Chess: 10^{43}
- Go: 10^{172}
- EXAMPLE: Total # sequences of actions in some game
(game tree complexity)
- Tic-Tac-Toe: <= 10^5
- Chess: <= 10^{123}
- Go: <= 10^{360}
- Starcraft II: > 10^{1000}
- Given that there are approximately 10^{80} atoms
in the universe and these atoms are expected to
decay into a quark mist in 10^{70} years, the
opportunities for completing exhaustive searches
for the largest state spaces are severely limited;
that is why you need the smartest search algorithms
possible (some of which we will be looking at
over the next several lectures).
- Environments have several properties:
- Fully vs partially observable by agent, e.g.,
chess vs poker.
- Deterministic vs stochastic from viewpoint
of agent, e.g., chess vs poker.
- Episodic vs sequential, i.e., next episode does not
or does depend on previous episodes, e.g., OCR
(Optical Character Recognition) vs chess.
- Static vs dynamic, i.e., environment changes
discretely when agents act or continuously
(possibly even during agent decision), e.g.,
turn-based (chess) vs real-time (Starcraft) games.
- Discrete vs continuous conception of time and
environment change, e.g., poker /chess vs
self-driving vehicles.
- Single agent vs multiple agents, e.g., Solitaire
vs chess vs self-driving vehicles.
- Complete vs incomplete knowledge of actions and
their consequences, e.g., chess vs real life.
- Example environments and their properties:
- EXAMPLE: Some example agents:
- Roomba vacuum cleaner
- Human players in a game of chess
- Self-driving vehicle
- AI personal assistant
What are the properties of the environments associated
with each of these agents?
- The properties of an environment dramatically affect
the computational difficulties of designing agents for
and operating agents within environments; we will see
this in action over the course of our lectures, many
of which will consider only the simplest environments.
- Potential exam questions
- Problem-solving and search
- Problem-solving agents are rational agents that act in
environments to solve the problems posed by those
environments.
- In the simplest case, they are goal-based agents
that find a sequence of actions that satisfies
some goal, e.g., find a path from a start point to
a given destination.
- More complex cases involve utility-based agents
that find a goal-achieving sequence of actions
that optimize some performance measure, e.g., find
a shortest path (in terms of distance
traveled) from a start point to a specified
destination.
- EXAMPLE: Finding a (shortest) path from our
classroom at MUN to the Winners store at the
Avalon Mall.
- A solution (also known as a solution path) is
a sequence of actions that terminates in the
problem's goal.
- A well-defined problem consists of:
- An initial state that the agent starts in.
- The actions possible from each state.
- Specified as a successor function, e.g.,
from position (x,y) in a 2D grid-based maze,
you can go up, down, left, or right.
- This is typically refined relative to each
state to the viable actions for that state,
i.e., the actions that do not violate
constraints like moving into a wall in a
2D maze.
- The set of goal states
- Specified as a goal test function that returns
"Yes" if a given state is in the set of goal
states and "No" otherwise, i.e., are we
just outside the main door of the Winners
store at the Avalon Mall? Is our chess
opponent in checkmate?
- Action cost function
- Most problems assume positive non-zero
action costs.
- The cost of a solution is the sum of the costs
of the actions in that solution's associated path;
an optimal solution has the lowest cost over all
possible solutions.
- EXAMPLE: Shortest path between two vertices in an
edge-weighted graph G.
- Problem definition:
- Start vertex in G.
- For each vertex v in G, the set of edges
with v as an endpoint.
- Destination vertex in G.
- Edge-weight function for G.
- The problem objective is to find a shortest (in
terms of summed edge-weights) path in G from the
start vertex to the destination vertex.
- EXAMPLE: Shortest path between two locations in a
2D grid G incorporating obstacles.
- Problem definition:
- Start location in G.
- For each location l in G that is not part of an
obstacle, the subset of possible moves
{up, down, left, right} that do not land on
some obstacle when executed at l.
- Destination location in G.
- Function that assigns cost 1 to up, down,
left, and right moves.
- The problem objective is to find a shortest (in
terms of # of moves) path in G from the
start location to the destination location.
- EXAMPLE: Most efficient plan for making a cup of tea.
- Problem definition:
- State consisting of {cup(empty), teaBag,
kettle(empty), faucet, electricity}.
- The set A of actions, i.e.,
- kettle(empty) + faucet => -kettle(empty) +
kettle(fullCold)
[fill kettle with cold water]
- cup(empty) + faucet => -cup(empty) +
cup(fullCold)
[fill cup with cold water]
- kettle(fullCold) + electricity =>
-kettle(fullCold) + kettle(fullHot)
[boil kettle]
- cup(fullCold) + electricity =>
-cup(fullCold) + cup(fullHot)
[boil cup]
- cup(empty) + teaBag => -teaBag +
-cup(empty) + cup(teaBag)
[add tea bag to cup]
- cup(empty) + kettle(fullHot) =>
-cup(empty) + cup(fullHot)
[add hot water to cup]
- cup(teaBag) + kettle(fullHot) =>
-cup(teaBag) + cup(tea)
[make tea I]
- cup(fullHot) + teaBag =>
-cup(fullHot) + -teaBag + cup(tea)
[make tea II]
Each action X => Y (where X is the cause
and Y is the consequence) is an IF-THEN
rule whose application to a state s is viable
if for each x (-x) in X, x is (not) in s and
results in a state s' that modifies s by
adding (removing) each y to (from) s such that
y (-y) is in Y, e.g., the initial state is
modified to state s' = {cup(teaBag),
kettle(empty), faucet, electricity} by the
application of action 5.
Given the above, for each possible state s
that is a subset of {faucet, electricity,
kettle(empty), kettle(fullCold),
kettle(fullHot), teaBag, cup(empty),
cup(fullCold), cup(fullHot), cup(teaBag),
cup(tea)}, the set of viable actions for s is the
subset of A that can be viably applied to s.
- The set of goal states, i.e., any state including cup(tea).
- Function that assigns cost 1 to each action.
- The problem objective is to find a most efficient
(in terms of # of actions) plan that, starting in
the initial state, creates a hot cup of tea.
- The formalism above is a simplified version of
the input language for the STRIPS (STanford
Research Institute Problem Solver) automated
planning system (Fikes and Nilsson (1971); see
also Bylander (1994)) and will be used in Assignment #2.
- The type of state in this problem has what is
known as a factored representation, i.e.,
a state with a
multi-component internal structure; we will be
using this type of problem-state frequently
later in the course.
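- To make the viability check and state update described in the
tea example above concrete, here is a minimal Python sketch;
representing states as sets of fact strings and actions as
(preconditions, effects) pairs with a leading "-" marking negated
items is an assumption for illustration, not the Assignment #2 format.

  def viable(state, action):
      # An action is viable if every positive precondition is in the
      # state and every negated precondition ("-x") is absent.
      pre, _ = action
      return all((p[1:] not in state) if p.startswith("-") else (p in state)
                 for p in pre)

  def apply_action(state, action):
      # Successor state: add positive effects, remove negated ones.
      _, eff = action
      new_state = set(state)
      for e in eff:
          if e.startswith("-"):
              new_state.discard(e[1:])
          else:
              new_state.add(e)
      return frozenset(new_state)

  # Action 5: cup(empty) + teaBag => -teaBag + -cup(empty) + cup(teaBag)
  add_tea_bag = ({"cup(empty)", "teaBag"},
                 {"-teaBag", "-cup(empty)", "cup(teaBag)"})
  s0 = frozenset({"cup(empty)", "teaBag", "kettle(empty)", "faucet", "electricity"})
  if viable(s0, add_tea_bag):
      # Prints the state s' from the example: cup(teaBag), kettle(empty),
      # faucet, electricity (in some order).
      print(apply_action(s0, add_tea_bag))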
Tuesday, September 17 (Lecture #4)
[DC3200, Lecture #3;
RN3, Chapter 3; Class Notes]
- Problem-solving and search (Cont'd)
- Search is the process of exploring a problem's search
space to find a solution. This process generates a
search tree in which the root node corresponds to the
initial problem state, edges correspond to actions, and
paths from the root to other nodes correspond to
possible solution paths.
- Starting with the root node, the search tree is
generated by expanding nodes, i.e., generating
the nodes reachable from a node n by viable actions,
in some specified manner until either a goal
node is found or it is established that no
goal-state is reachable by any sequence of actions
starting at the root node.
- At each point in this process, the list of
leaf nodes that has been generated but
not yet expanded is known as the fringe.
- Search algorithms can be distinguished in
how they select nodes from the fringe to
expand.
- Each search tree node corresponds to a problem
state; however, if actions are reversible,
a problem state can be associated with multiple
search tree nodes and a finite-size problem state
space can have an infinite-size search tree.
- EXAMPLE: Search tree for problem of finding a
shortest path between two vertices in a graph.
[DC3200, Lecture #3, Slides 14-26
(PDF)]
- EXAMPLE: Search tree for 15 puzzle
[DC3200, Lecture #3, Slides 27-28
(PDF)]
- Must distinguish problem-states and search-tree
nodes - the former are configurations of the
problem environment and the latter are bookkeeping
data structures allowing recovery of solution
paths to goals. To this end, a search tree node
n typically contains the following data:
- The problem state associated with n.
- A pointer to n's parent node, i.e., the node
whose expansion generated n.
- The action by which n was generated from its
parent node.
- The cost of the sequence of actions on the
path in the search tree from the root node
to n.
- This is traditionally denoted by g(n).
- The depth of n, i.e., the number of actions on
the path in the search tree from the root node
to n.
Additional data may also be stored in the node
depending on the search algorithm used.
- ALGORITHM: General uninformed tree search
1. Function Tree-Search(problem, strategy)
2. fringe = {Node(problem.initial_state)}
3. while (true)
4. if (fringe.empty) return fail
5. node = strategy.select_node(fringe)
6. if (node.state is goal) return solution
7. else fringe.add(Expand(node, problem))
- ALGORITHM: Node expansion [ASSIGNMENT #1]
1. Function Expand(node, problem)
2. successors = {}
3. for a in viableActions(problem.actions, node)
4. s = Node(node.state.do_action(a))
5. s.parent = node
6. s.action = a
7. s.g = node.g + s.action.cost
8. s.depth = node.depth + 1
9. successors.add(s)
10. return successors
- We will build on these algorithms below when we
consider different search strategies.
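- As a concrete complement to the pseudocode, here is a minimal
runnable Python sketch of Tree-Search and Expand; the problem
interface (initial_state, viable_actions, result, action_cost,
is_goal) and the select callback standing in for the strategy are
illustrative assumptions, not the Assignment #1 interface.

  from collections import deque

  class Node:
      # Search tree node: bookkeeping data wrapped around a problem state.
      def __init__(self, state, parent=None, action=None, g=0, depth=0):
          self.state, self.parent, self.action = state, parent, action
          self.g, self.depth = g, depth

  def expand(node, problem):
      # Generate the successors of node reachable by viable actions.
      successors = []
      for a in problem.viable_actions(node.state):
          s = Node(problem.result(node.state, a), parent=node, action=a,
                   g=node.g + problem.action_cost(a), depth=node.depth + 1)
          successors.append(s)
      return successors

  def tree_search(problem, select):
      # select(fringe) removes and returns the next fringe node to expand.
      fringe = deque([Node(problem.initial_state)])
      while fringe:
          node = select(fringe)
          if problem.is_goal(node.state):
              return node      # solution path recoverable via parent pointers
          fringe.extend(expand(node, problem))
      return None              # fail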
Search strategies
- Measures for assessing the problem-solving performance
of a search-strategy algorithm s:
- Completeness, i.e., is s guaranteed
to find a solution if it exists?
- Optimality, i.e., is s guaranteed
to find an optimal solution?
- Time complexity, i.e., how long does it take for
s to find a solution or establish that a solution
does not exist?
- Measures the total number of search tree nodes
generated.
- Space complexity, i.e., how much memory is needed
to execute s?
- Measures the maximum number of search tree
nodes that must be stored at any point in the
search process.
Aspects of the search tree typically used to derive
time and space complexity functions are the branching
factor b (the maximum number of successor nodes for
any tree node) and the depth d (of the shallowest
goal node, if a solution exists, and the maximum depth
of a generated node, if one does not). Note that the
size of a search tree is upper-bounded by b^{d+1}.
- Two categories of search strategies:
- Uninformed (blind) search
- Has no extra information about states in a
search tree beyond that in the problem
description, i.e., algorithm operations
limited to successor, generation, and
goal test.
- Informed search
- Has extra information which allows
guesses about which states are more
likely to be on useful / optimal
solution paths, i.e., algorithm operations
additionally include the value of heuristic
node evaluation function h(n).
- Hopefully leads to faster search.
We will look first at uninformed search strategies
and then move on to informed search strategies, with
an intermediate stop to consider closed lists, a
key algorithmic technique for optimizing the
performance of both types of search strategies.
Uninformed (blind) search strategies
- Breadth-First Search (BFS)
Lines modified from the algorithm for general
uninformed tree search are indicated by "<=".
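- A minimal runnable sketch of BFS (an illustrative Python rendering
reusing the Node and expand helpers from the tree-search sketch
above, not the course's reference pseudocode); the changed lines
are marked "<=" in the comments.

  from collections import deque

  def bfs(problem):
      fringe = deque([Node(problem.initial_state)])   # <= FIFO queue fringe
      while fringe:
          node = fringe.popleft()                     # <= expand shallowest node first
          if problem.is_goal(node.state):
              return node
          fringe.extend(expand(node, problem))
      return None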
- EXAMPLE: BFS for a shortest (# edges) path between
vertices A and C in an unweighted graph.
[DC3200, Lecture #3, Slides 44-48
(PDF)]
- EXAMPLE: BFS for a shortest (Manhattan distance) path between two
locations in a 2D grid with obstacles
[DC3200, Lecture #3, Slides 49-54
(PDF)]
- EXAMPLE: BFS for a shortest (summed
action-cost) solution to a modified
STRIPS planning problem based on facts
[q, a, b, c], initial state [q], goal state
[c], and actions
- q => -q + a (cost 1)
- q => -q + b (cost 1)
- q => -q + c (cost 1)
- a => -a + b (cost 1)
- b => -b + c (cost 1)
- Performance of BFS
- BFS is complete.
- BFS can explore the entire search tree;
hence, if there is a goal node
in the tree, BFS will find it.
- Note that if there are multiple
goal nodes in the search tree,
BFS will find the shallowest one.
- BFS is not optimal in general.
- This is because the path to the
shallowest goal node need not
have the optimal cost.
- EXAMPLE: BFS is not optimal for finding a
shortest (summed edge-weights) path between vertices A and C in an
edge-weighted graph.
[DC3200, Lecture #3, Slide 57
(PDF)]
- EXAMPLE: BFS is not optimal in finding a
shortest (summed action-cost) solution to a modified
STRIPS planning problem based on facts
[q, a, b, c], initial state [q], goal state
[c], and actions
- q => -q + c (cost 5)
- q => -q + b (cost 1)
- q => -q + a (cost 1)
- a => -a + b (cost 2)
- b => -b + c (cost 1)
- BFS is optimal if (1) path costs
never decrease as depth increases
or (2) all action costs are the
same.
- Time Complexity: O(b^{d+1}), as all nodes
in a search tree of depth d with
branching factor b may be generated.
- Space Complexity: O(b^{d+1}), as all
generated nodes must be stored in
memory to allow reconstruction of
solution paths from goal nodes via
parent pointers.
Uniform Cost Search (UCS)
Lines modified from the algorithm for general
uninformed tree search are indicated by "<=".
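A corresponding minimal Python sketch of UCS, again reusing Node and
expand from the tree-search sketch above; the priority-queue fringe
ordered by path cost g(n) replaces the FIFO queue (changed lines are
marked "<=" in the comments).

  import heapq, itertools

  def ucs(problem):
      counter = itertools.count()                     # tie-breaker for equal costs
      fringe = [(0, next(counter), Node(problem.initial_state))]   # <= priority queue on g(n)
      while fringe:
          g, _, node = heapq.heappop(fringe)          # <= expand lowest-g node first
          if problem.is_goal(node.state):
              return node
          for s in expand(node, problem):
              heapq.heappush(fringe, (s.g, next(counter), s))
      return None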
EXAMPLE: UCS for a
shortest (summed edge-weights) path between vertices A and C in an
edge-weighted graph.
[DC3200, Lecture #3, Slide 57
(PDF)]
EXAMPLE: UCS for a shortest (summed
action-cost) solution to a modified
STRIPS planning problem based on facts
[q, a, b, c], initial state [q], goal state
[c], and actions
- q => -q + c (cost 5)
- q => -q + b (cost 1)
- q => -q + a (cost 1)
- a => -a + b (cost 2)
- b => -b + c (cost 1)
Performance of UCS
- UCS is complete
- UCS is optimal if all action costs are at least some epsilon > 0
- Time Complexity: O(b^{1 +
floor(C*/epsilon)}), where C* is the
cost of an optimal solution
- This is because path cost is no
longer linked to depth.
- Can be much bigger than b^{d+1} in the
worst case.
- Space Complexity: O(b^{1 +
floor(C*/epsilon)})
Thursday, September 19 (Lecture #5)
[DC3200, Lecture #3;
RN3, Chapter 3; Class Notes]
- Problem-solving and Search (Cont'd)
- Uninformed (blind) search strategies (Cont'd)
- Depth-First Search (DFS)
Lines modified from the algorithm for general
uninformed tree search are indicated by "<=".
- ALGORITHM: DFS (recursive)
1. Function DFS(node, problem)
2. if (node.state is goal) return solution
3. successors = Expand(node, problem)
4. for (s in successors):
5. result = DFS(s, problem)
6. if (result != fail) return result
7. return fail
Call DFS(root_node, problem)
- EXAMPLE: DFS for a solution to a modified
STRIPS planning problem based on facts
[q, a, b, c], initial state [q], goal state
[c], and actions
- q => -q + c (cost 1)
- q => -q + b (cost 1)
- q => -q + a (cost 1)
- a => -a + b (cost 1)
- b => -b + c (cost 1)
- Performance of DFS
- DFS is not complete in general
- Can enter infinite loop if does
not keep track of already-visited
states.
- EXAMPLE: DFS for a
path between vertices A and B in an
unweighted graph.
[DC3200, Lecture #3, Slide 57
(PDF)]
- Infinite loops can be prevented
by algorithm
enhancements (see DLS and ID-DFS
below) or by the
use of closed lists (see below).
- DFS is not optimal in general
- This is because it returns the
first goal found.
- EXAMPLE: If B is visited before D, DFS is
not optimal for finding a shortest (summed
edge-weight) path between vertices A and C
in an
edge-weighted graph.
[DC3200, Lecture #3, Slide 57
(PDF)]
- EXAMPLE: DFS is not optimal for
finding a shortest (either # moves
or summed action-cost) solution to
the modified STRIPS planning
problem given above.
- Time Complexity: O(b^m), where m is
the depth of the deepest node.
- Space Complexity: O(bm)
- This is because in the worst case,
DFS stores a path of m nodes from
the root along with the b
successors of each node on this
path.
- Depth-Limited Search (DLS)
- Solves the infinite-loop DFS problem by
limiting the maximum search depth to some
L > 0.
- ALGORITHM: DLS (recursive)
1. Function DLS(node, L, problem)
2. if (node.depth > L) return "cutoff" <=
3. if (node.state is goal) return solution
4. successors = Expand(node, problem)
5. for (s in successors):
6. result = DLS(s, L, problem) <=
7. if (result != "cutoff") return result <=
8. return "cutoff" <=
Call DLS(root_node, L, problem)
Lines modified from the algorithm for DFS
(recursive) are indicated by "<=".
- Performance of DLS
- DLS is not complete
when solution depth > L
- DLS is not optimal in general
- This is because it returns the
first goal found.
- Time Complexity: O(b^L)
- Space Complexity: O(bL)
Iterative Deepening Depth-First Search (ID-DFS)
- In DLS, one chooses maximum search depth L
by analysis of the problem's state space;
alternatively, one can gradually increase
the depth limit until a solution is found.
This is what ID-DFS does.
- ALGORITHM: ID-DFS
1. Function ID-DFS(node, problem)
2. for d in (1,infty)
3. result = DLS(node, d, problem)
4. if (result != "cutoff") return result
Performance of ID-DFS
- ID-DFS is complete
- ID-DFS is optimal under same conditions
as BFS
- Time Complexity: O(b^{d+1}), where d is the
depth of the shallowest goal node
- As nodes may be recomputed as
L is increased, this may seem wasteful;
however, in practice, ID-DFS generates only
about twice as many nodes as BFS.
- Space Complexity: O(bd)
As it combines the advantages of both BFS and
DFS, ID-DFS is in general preferable to both
BFS and DFS.
TABLE: Uninformed search strategy comparison
[DC3200, Lecture #3, Slide 79
(PDF)]
INTERLUDE: A look at the mechanics of
Assignment #1.
- How do you modify the BFS, UCS, and DFS
(non-recursive) algorithms above to
print all solutions to a given
problem?
None of the algorithms presented above remember the
states that have already been expanded. Re-expanding
such states leads to duplicated effort and (if actions
are reversible) infinite loops.
Such states can be remembered and avoided using
closed lists, which are lists of previously expanded
(and hence explored) nodes.
- To ensure efficient lookup, closed lists are often
implemented using dictionaries and hash tables
(the latter of which will be described in detail
in several lectures).
In addition to eliminating infinite loops, such
lists can lead to an exponential savings in the
number of nodes generated; however, these lists
are not universal cures for all of the ills that
can plague search
(see below).
Following terminology in Russell and Norvig, tree
search algorithms that use closed lists (1) rename
fringe lists as open lists and, perhaps more
confusingly, (2) are renamed graph search algorithms
(after the classic BFS graph search algorithm that
implements closed lists implicitly using a node
coloring scheme to avoid re-exploring
previously-encountered vertices).
ALGORITHM: General uninformed graph-based search
1. Function Graph-Search(problem, strategy)
2. closed = {} <=
3. open = {Node(problem.initial_state)}
4. while (true)
5. if (open.empty) return fail
6. node = strategy.select_node(open)
7. if (node.state is goal) return solution
8. if (node.state in closed) continue <=
9. closed.add(node.state) <=
10. open.add(Expand(node, problem))
Lines modified from the algorithm for general
uninformed tree search are indicated by "<=".
ALGORITHM: Graph-based BFS
1. Function gbBFS(problem)
2. closed = {} <=
3. open = Queue{Node(problem.initial_state)}
4. while (true)
5. if (open.empty) return fail
6. node = open.pop()
7. if (node.state is goal) return solution
8. if (node.state in closed) continue <=
9. closed.add(node.state) <=
10. open.add(Expand(node, problem))
Lines modified from the algorithm for (tree-based) BFS
are indicated by "<=".
ALGORITHM: Graph-based DFS
1. Function gbDFS(problem)
2. closed = {} <=
3. open = Stack{Node(problem.initial_state)}
4. while (true)
5. if (open.empty) return fail
6. node = open.pop()
7. if (node.state is goal) return solution
8. if (node.state in closed) continue <=
9. closed.add(node.state) <=
10. open.add(Expand(node, problem))
Lines modified from the algorithm for (tree-based) DFS
are indicated by "<=".
Graph-based search algorithms still generate search
trees; however, as each problem state can now
label only one search tree node, the search tree is
effectively overlaid on the state-space graph whose
vertices correspond to problem states and arcs
correspond to actions linking pairs of states.
- Problem states may repeat in the fringe data
structure, as states may be encountered several
times before they are expanded and added to the
closed list.
EXAMPLE: Operation of gbBFS in finding
a shortest (# edges) path between vertices A and C in
a graph. [txt]
EXAMPLE: Operation of gbDFS; in finding a path between
vertices A and C in a graph.
[txt]
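As a concrete (hedged) illustration of graph-based search, here is a
Python rendering of gbBFS that reuses Node and expand from the earlier
tree-search sketch; it assumes problem states are hashable so they can
be stored in the closed set, and it recovers the solution path by
walking parent pointers.

  from collections import deque

  def gb_bfs(problem):
      closed = set()                                   # <= closed list of expanded states
      open_list = deque([Node(problem.initial_state)])
      while open_list:
          node = open_list.popleft()
          if problem.is_goal(node.state):
              actions = []
              while node.parent is not None:           # walk parent pointers to the root
                  actions.append(node.action)
                  node = node.parent
              return list(reversed(actions))
          if node.state in closed:                     # <= skip already-expanded states
              continue
          closed.add(node.state)                       # <=
          open_list.extend(expand(node, problem))
      return None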
Performance of a graph-based version of search
strategy s
- Completeness is as in the tree-based version of s.
- Optimality depends on s.
- As graph-based search effectively discards
all paths to a node but the one found first,
it is only optimal if we can guarantee that
the first solution found is the optimal
one, e.g., BFS or UCS with all actions having
the same cost.
- Time Complexity: O(#states)
- This is because each state is expanded
at most once and in the worst case,
every state in the environment must be
expanded; hence, it is the same for
every search strategy.
- In practice, the actual number of generated
states is far lower than this worst case.
- Space Complexity: O(#states)
- This is because of the closed list.
- Note that this also applies to graph-based
versions of strategies whose tree-based versions
ran in linear space, e.g., DFS and its variants.
Tuesday, September 24 (Lecture #6)
[DC3200, Lecture #5;
RN3, Chapter 3; Class Notes]
- Problem Solving and Search (Cont'd)
- Informed (heuristic) search strategies
- Uses problem-specific knowledge encoded as
heuristic functions that (ideally, but not
always) cut down the total number of nodes
searched and speed up search times to goal.
- Heuristic function h(n) gives a (hopefully
good) estimate of the cost of the optimal
path from node n to a goal node. If n is a
goal node, h(n) = 0.
- h*(n) = perfect heuristic which gives exact
cost of optimal path from n to goal node
(and wouldn't that be nice ...)
- EXAMPLE: A heuristic h1(n) for shortest-path in
edge-weighted graphs is the cost of the
lowest-cost edge
originating at n that does not double back to
an already-explored node.
- EXAMPLE: A heuristic h2(n) for shortest-path in 2D
grids is the Manhattan distance from n to
the requested location.
- EXAMPLE: A heuristic h3(n) for efficient plans to
make tea is the modified Hamming distance from
the problem state n to a possible goal state.
- Replace all non-specified items in the goal
by * to generate pg and then compute the
Hamming distance between the non-star
positions in pg and the state encoded in n.
- For example, given a goal p1 + -p3 + p4 in
a five-item state, pg = 1*01* and the
modified Hamming distance from pg to
sn = 01000 is 2 (position 3 matches and
positions 1 and 4 do not match).
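- A small Python sketch of the modified Hamming distance heuristic
just described, assuming the goal pattern and the state are given as
strings of '0'/'1' characters (with '*' for unspecified goal positions):

  def modified_hamming(goal_pattern, state_bits):
      # Count mismatches only at the goal's specified (non-'*') positions.
      return sum(1 for g, s in zip(goal_pattern, state_bits)
                 if g != "*" and g != s)

  print(modified_hamming("1*01*", "01000"))   # 2, as in the example above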
- Types of heuristics
- An admissible heuristic never
overestimates the distance from a node
n to a goal node, i.e., h(n) <= h*(n).
- Can be seen as an optimistic guess.
- When h() is admissible, f(n) = g(n) +
h(n) never overestimates the cost
of a best-possible path from the initial
state to a goal state that goes
through n.
- A consistent heuristic (also called a
monotone heuristic) has
|h(a) - h(b)| <= c(a,b), where c(a,b) is the
optimal cost of a path from a to b.
- With such a heuristic, the difference of
the heuristic values of two nodes must
be less than or equal to the optimal
cost of a path between those nodes.
- Every consistent heuristic is also
admissible; indeed, it is hard (but not
impossible) to construct an admissible
heuristic that is not consistent.
- When h(n) is consistent, the values
of f(n) = g(n) + h(n) along any
path are non-decreasing; this will turn
out to be very useful later in our
story.
- EXAMPLE: Which of the heuristics h1(n), h2(n),
and h3(n) above are admissible? Which are
also consistent?
- Informed search strategies differ in how they
combine g(n) and h(n) to create a function
f(n) that is used to select the next node
in the open list to expand.
- Tree-based Greedy Best-First Search (GBeFS)
- Select node in open list with
lowest f(n) value, where f(n) = h(n),
i.e., only cares about distance to goal.
- Resembles DFS in that it tries a single
path all the way to the goal and backs
up when it hits a "dead end".
- EXAMPLE: BFS vs GBeFS in shortest-path in 2D
grid world with obstacles.
[DC3200, Lecture #5, Slides 19-33
(PDF)]
- ALGORITHM: Tree-based GBeFS
1. Function GBeFS(problem, h(n))
2. open = PriorityQueue{f(n) = h(n)} <=
3. open.add(Node(problem.initial_state))
4. while (true)
5. if (open.empty) return fail
6. node = open.min_f_value() <=
7. if (node.state is goal) return solution
8. else open.add(Expand(node, problem))
Lines modified from the algorithm for UCS
are indicated by "<=".
- Performance of GBeFS (similar to DFS)
- GBeFS is not complete
- May not find goal; may even "get
lost" and retrace paths if
heuristic is bad.
- GBeFS is not optimal
- Time Complexity: O(b^m), where m =
maximum explored node depth.
- Space Complexity: O(bm)
- Though performance not so great in general,
performs well for carefully crafted
heuristics; such heuristics can be derived
using means-ends analysis.
- GBeFS was the core search strategy underlying
the earliest automated reasoning systems
Logic Theorist (1956) (Newell and Simon
(1956)) and GPS (General Problem Solver
(1959) (Newell, Shaw, and Simon (1959)) as
well as foundational computational modeling
work on problem solving by humans
(Newell and Simon (1972)).
- Problem Solving and Search (Cont'd)
- Informed (heuristic) search strategies (Cont'd)
- Tree-based A* search
- Most well-known BeFS algorithm.
- Select node in open list with
lowest f(n) value, where f(n) = g(n) + h(n),
i.e., cares about total distance from
start state to goal.
- ALGORITHM: Tree-based A* search
1. Function A*(problem, h(n))
2. open = PriorityQueue{f(n) = g(n) + h(n)} <=
3. open.add(Node(problem.initial_state))
4. while (true)
5. if (open.empty) return fail
6. node = open.min_f_value() <=
7. if (node.state is goal) return solution
8. else open.add(Expand(node, problem))
Lines modified from the algorithm for UCS
are indicated by "<=".
- Performance of Tree-based A*
- Tree-based A* is complete
- Will eventually visit all nodes
in the search tree, starting with
those whose f-values are lowest.
- Tree-based A* is optimal if h(n) is admissible
- Time Complexity: O(b^m), where m =
maximum explored node depth.
- Space Complexity: O(bm)
- Graph-based A* search
- Let us now try to optimize the runtime of
A* search by using closed lists.
- ALGORITHM: Graph-based A* search
1. Function gbA*(problem, h(n))
2. closed = {} <=
3. open = PriorityQueue{f(n) = g(n) + h(n)}
4. open.add(Node(problem.initial_state))
5. while (true)
6. if (open.empty) return fail
7. node = open.min_f_value()
8. if (node.state is goal) return solution
9. if (node.state in closed) continue <=
10. closed.add(node.state) <=
11. open.add(Expand(node, problem))
Lines modified from the algorithm for
tree-based A* search are indicated by "<=".
- Performance of Graph-based A*
- Graph-based A* is complete
- Graph-based A* is optimal if h(n) is consistent
- Admissible heuristics do not
suffice for optimality, as
"raw" graph-based search only
retains the first-encountered path
to a repeated state (where a
later-encountered path may be
better).
- If a consistent heuristic is used,
the values of f(n) along any path
are non-decreasing; as the
sequence of nodes expanded by
graph-based A* is in non-decreasing
order of f(n), any node selected
for expansion is on an optimal
path as all later nodes are at
least as expensive!
- Time Complexity: O(b^m)
- Space Complexity: O(b^m)
- EXAMPLE: (VERY detailed) Graph-based A* search
for path between two locations in a 2D grid
with obstacles
[DC3200, Lecture #5, Slides 68-147
(PDF)]
- Graph-based A* also has the property that
no other optimal search algorithm using the
same heuristic is guaranteed to expand fewer
nodes; that being said, the
number of nodes that need to be searched and
retained (as part of the closed list) is
still exponential in the worst case and can
be prohibitive in practice.
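- For completeness, a hedged runnable Python sketch of graph-based A*,
reusing Node and expand from the earlier tree-search sketch; the
heuristic h is passed in as a function of a state, and the itertools
counter is just a tie-breaker for equal f-values.

  import heapq, itertools

  def gb_a_star(problem, h):
      counter = itertools.count()
      root = Node(problem.initial_state)
      open_heap = [(h(root.state), next(counter), root)]   # ordered by f(n) = g(n) + h(n)
      closed = set()
      while open_heap:
          _, _, node = heapq.heappop(open_heap)
          if problem.is_goal(node.state):
              return node
          if node.state in closed:
              continue
          closed.add(node.state)
          for s in expand(node, problem):
              heapq.heappush(open_heap, (s.g + h(s.state), next(counter), s))
      return None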
Potential exam questions
There are a variety of ways of improving the performance in
practice (though not in the worst case) of the search
algorithms above; these ways tend to be specific to
particular types of environments, e.g., 2D grids
(see DC3200, Lecture #7), or search algorithms, e.g.,
graph-based A* search (see DC3200, Lecture #5).
An important example of the latter is using appropriate data
structures to optimize membership query time in open and
closed lists in graph-based search. Some common data
structures and their worst-case membership query times for
an open / closed list with n items are:
- List /array: O(n)
- Set / dictionary (via balanced search trees): O(log n)
- Hash table: O(n)
- Perfect hash table: O(1)
Given their simplicity and the possibility of O(1) membership
query time, it is worth looking at hash tables and the
manners in which they can be made as perfect as possible.
Hash Tables: Overview and Implementation
- A hash table is at its heart a 1D array that stores
instances of more complex data structures for easy
lookup.
- A commonly-stored data structure is key-value
pairs.
- EXAMPLE: A hash table for person name / phone number
pairs with 16 buckets.
- Given a data structure d, a hash function is used to
compute an integer that is d's (initial) position in the
hash table (also called d's bucket). Such functions are
made as simple as possible to optimize membership query
time.
- As keys are typically less
complex than their associated values, the hash
function is applied to the key and the full
key / value pair is stored in the bucket.
- In a perfect hash table (see example above), each data
structure maps to a
unique position in the hash table (giving O(1)
membership query time if the hash function is simple
enough).
- If two or more data structures map to the same
position, i.e., a hash collision has occurred, an
appropriate collision resolution policy is used to
redistribute the colliding items in the hash table such
that membership query time is degraded as little as
possible.
- EXAMPLE: A hash collision in a hash table for
person name / phone number pairs with 16 buckets.
Hash functions
- Have many uses in addition to hash tables, e.g.,
password protection, data verification (checksum
creation), proximity queries in computational geometry
(geometric hashing).
- Hash functions typically have an associated input
format and range of input values; if a hash function is
applied to an unexpected value or a value outside the
given range, the results may be poor or undefined.
- EXAMPLE: Hashing a character string to an integer.
[DC3200, Lecture #6, Slide 18
(PDF)]
- Output ranges, i.e., the available storage indices in a
hash table, can be ensured by the modulus operator.
- Hash function properties
- There are a variety of properties a hash function
can have, whose necessity depends on the
application, e.g.,
- Determinism (A good hash function should
always produce the same output for
the same input)
- Uniformity (a good hash function maps inputs as
evenly as possible over its output range to
prevent hash collisions)
- Data Normalization (Ignores irrelevant features
of the input, e.g., upper / lower case
letters; is application-dependent, e.g.,
data lookup (good) vs password protection
(bad)).
- Continuity (Small changes in the input have
small changes in the output; is
application-dependent, e.g., geometric
hashing (good) vs password protection
(bad))
- Reversibility / invertibility (input can be
easily recovered from the output; is
application-dependent, e.g., geometric
hashing (good) vs password protection
(bad))
- One can also characterize hash functions in terms
of purely mathematical properties, e.g.,
- Injective (Every input maps to a unique
output, i.e., the function is one-to-one)
- Note that some outputs may not be mapped.
- (Many) Injective functions are
invertible.
- Such functions are perfect hash
functions!
- Surjective (Every output has at least one input
that maps to it, i.e., the function is onto)
- (Many) Surjective functions are not
invertible.
- Bijective (Every input maps to a unique output
and every output maps to a unique input,
i.e., the function is one-to-one and onto)
- Such functions are minimal perfect
hash functions, i.e., they are perfect
hash functions with the
smallest-possible associated hash
tables!
- EXAMPLE: Diagrams of injective, surjective and
bijective functions
[DC3200, Lecture #6, Slide 40
(PDF)]
- Hash function algorithms
- Simple hash functions
- Trivial hashing: When (typically small) data
can be directly translated to its hash
value, e.g., h(ASCII character) = ASCII
character code.
- Modulus hashing: h(x) = x mod n
- Only applicable to integers
- Focuses on lower-order bits in given
integer; truncates higher-order bits.
- Can have uniformity problems if the range of
x is not a multiple of n.
- Binning hashing: h(x) = x / n
- Only applicable to integers
- Focuses on higher-order bits in given
integer; truncates lower-order bits.
- Uniformity characteristics almost the
opposite of modulus hashing.
- String hashing
- Character summation hashing: Sum the ASCII /
Unicode values of the characters in the
string.
- Block summation hashing: Treat each successive
block of n characters as a single long
integer whose bits are the concatenated
ASCII / Unicode codes for those characters
and sum the long integers corresponding to
all character-blocks.
- EXAMPLE: What are the characteristics of
both of these schemes wrt -jectivity,
uniformity, invertibility, and input
character order?
- Geometric hashing
- Converts n-dimensional location input
(x1, x2, ..., xn) into an integer.
- For a 2D grid with width wd and height ht,
h(x,y) = (y * wd) + x; note that this
h() is a minimal perfect hash function
whose output i is invertible (y = i / wd,
x = i % wd).
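- A two-line Python sketch of the 2D grid hash above and its inverse
(wd is the grid width; the function names are illustrative):

  def grid_hash(x, y, wd):
      return y * wd + x            # minimal perfect hash for a wd-wide grid

  def grid_unhash(i, wd):
      return i % wd, i // wd       # inverse: recover (x, y) from the index i

  print(grid_hash(3, 2, 10), grid_unhash(23, 10))   # 23 (3, 2)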
Hash collisions: Overview and Resolution Policies
- Collision resolution is expensive and should be
avoided where possible.
- Perfect (injective) hash functions ensure that there
are no collisions; however, this requires that the
input space is smaller than the hash table, which
is seldom the case -- indeed, the input space is
typically MUCH larger than the hash table.
- If one is forced (as one typically is) to use imperfect
hash functions, the probability of a collision
rises surprisingly fast with each new hash.
- The probability of a hash collision after n hashes
with a uniform hash function into a hash table
with m buckets is 50% when n is approximately
sqrt(m).
- EXAMPLE: The Birthday Paradox
- The probability that two people in a group
share a birthday is 50% when the group has
only 23 people!
- This is effectively the probability of a
hash collision when hashing n = 23 (birthday /
person name) pairs into a hash table with
m = 365 buckets.
- EXAMPLE: The probability of a hash collision after
n = 70,000 (250,000) hashes into a hash
table indexed by 32-bit integers, i.e.,
m = 2^{32}, is 50% (99%).
- Another way of quantifying this degradation of hash
table performance is the load factor = n/m, with
performance decreasing (due to hash collisions) as
load factor increases.
- There are a variety of hash collision resolution policies,
e.g.,
- Overwriting
- If a bucket is already occupied, overwrite it.
- There are a variety of ways of choosing which
of the old and new values to store, e.g.,
newest, oldest, minimum, maximum, deepest
search result.
- Though simple, this policy loses too much data
and hence is not used in practice.
- Chaining
- Make each bucket a list, and whenever data d
hashes to a bucket, add d to that bucket's
list (see the sketch after this list of policies).
- Allows a hash table to grow beyond its initial
size as necessary.
- If the hash function is uniform, each bucket's
list will be small and membership queries
will be as efficient as possible, i.e.,
query time = average list size.
- In worst case, could have one bucket with
all n hashed data items in that bucket's list,
giving O(n) query time.
- Open Addressing (OA)
- Requires that each data item have two
components k1 and k2 that can function as
lookup keys.
- General procedure for inserting data item
d whose k1 and k2 components have values
v1 and v2:
- If table location h(v1) is unoccupied,
store d there.
- If table location h(v1) is occupied and
its k2 component does not have value v2,
scan forward in table (possibly
repeatedly) in some fixed manner m until
an unoccupied location is found to
store d.
- General procedure for membership query on
data item d whose k1 and k2 components have
values v1 and v2:
- If table location h(v1) is unoccupied,
return false.
- If table location h(v1) is occupied and
its k2 component has value v2,
return true.
- Otherwise, scan forward in table (possibly
repeatedly) in some fixed manner m until
an unoccupied location is found (return
false), an occupied location is found
whose k2 component has value v2 (return
true), or all table locations have been
checked (return false).
- There are a variety of forward scanning
options, e.g.,
- linear probing (advance forward one
location at a time, i.e., scan interval
is 1)
- Can result in clustering of occupied
table locations.
- Even a few dense clusters in the table
can dramatically slow queries.
- quadratic probing (scan interval grows as
a quadratic function of the probe number)
- double hashing (scan interval is
secondary hash of d)
Many other options exist, all seeking to
minimize clustering
- Coalesced hashing
- Combines chaining and OA by making
each bucket's list consist of other
buckets in the table.
- Superior memory performance to chaining by
itself as long as the number of stored
data items does not exceed the table size.
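A minimal Python sketch of the chaining policy described above; the
bucket count, the use of Python's built-in hash, and the method names
are illustrative assumptions:

  class ChainingHashTable:
      # Chaining: each bucket holds a list of (key, value) pairs.
      def __init__(self, num_buckets=16, hash_fn=hash):
          self.buckets = [[] for _ in range(num_buckets)]
          self.hash_fn = hash_fn

      def insert(self, key, value):
          bucket = self.buckets[self.hash_fn(key) % len(self.buckets)]
          bucket.append((key, value))

      def contains(self, key):
          # Membership query: scan only the bucket the key hashes to.
          bucket = self.buckets[self.hash_fn(key) % len(self.buckets)]
          return any(k == key for k, _ in bucket)

  t = ChainingHashTable()
  t.insert("Alice", "555-0100")
  print(t.contains("Alice"), t.contains("Bob"))   # True False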
There is no universally-applicable hash table scheme --
rather, there are best (found to date) schemes for particular
applications, e.g., Zobrist hashing for game-state lists
in chess.
Friday, September 27
Tuesday, October 1 (Lecture #8)
[RN3, Chapter 4; Class Notes]
- Problem Solving and Search (Cont'd)
- Before we leave the topic of single-agent search, it is
worth looking at a class of search methods that are not
necessarily complete or optimal but are often used in
practice.
- Local search
- Useful for problems where the goal state is what is
wanted and the path of actions used to create that goal
is irrelevant, e.g., combinatorial optimization
problems like Maximum Clique:
Maximum Clique
Input: A graph G = (V,E).
Output: A maximum clique of G, i.e., a
largest-possible subset
V' of V such that each pair of vertices in V' is
connected by an edge in G.
- Assumes a problem state-space where each state s
has a "neighbourhood" N(s) of states, each of which are
reachable by applying a single action to s.
- EXAMPLE: The state space for the Maximum Clique
problem for a graph G.
The state space is the set of all subsets of vertices
of G that define clique-subgraphs, e.g., {A,B,C},
{D,E}, {E}. The neighbourhood of a state s is the
set of all states corresponding to those vertex-subsets
of G that induce clique-subgraphs and that can be
created by the addition or removal of a single vertex
to or from s, e.g., the neighbourhood of state {B,C}
is {{B}, {C}, {A,B,C}}.
- These neighbourhoods, when combined with a
state-evaluation function, induce a state-space
landscape (Note that though the discussion and
algorithms below are phrased in terms of finding
maximum-value states on such landscapes, they can also
be rephrased in terms of finding minimum-value states,
e.g., (greedy) valley-descending search).
- EXAMPLE (RN3, Figure 4.1):
Useful features of such landscapes are:
- Global maximum: A state with the highest value
over all states in the state space.
- Local maximum: A state with the highest value
in that state's neighbourhood.
- Plateau: A set of adjacent states of the same value.
Can distinguish between local maximum plateau (all
states adjacent to the plateau have lower value
than the states in the plateau) and shoulder
plateau (some states adjacent to the plateau have
higher value than the states in the plateau).
Though global maxima are most desirable, local search
runs the (frequently unavoidable) risk of being trapped
in local maxima.
- Types of local search are distinguished by the methods
used to escape local maxima.
- Hill-climbing search (greedy search)
- This algorithm finds a local maximum, local
maximum plateau, or shoulder that is accessible
from the initial state (a code sketch of basic
hill-climbing for cliques follows the variants below).
- EXAMPLE: Basic hill-climbing search for
finding (maximal) cliques.
In what ways can hill-climbing search proceed
from state {A} in the graph above? State
{F}? State {E}? State {D1}?
- Not complete or optimal in general but is
typically fast and very memory-efficient.
- Is complete and optimal if the
state-space landscape has a particular
"dome" structure (Cormen et al
(2009), Section 16.4), e.g.,
Prim's and Kruskal's algorithms for
finding minimum spanning trees.
- Variants:
- Limited sideways-move hill climbing.
- Allows sideways moves in hopes that
an encountered plateau is actually a
shoulder.
- Need to limit the number of sideways
moves so you do not go into an
infinite loop on an actual plateau.
- Stochastic hill-climbing.
- Chooses at random among uphill moves
with uphill move probability
proportional to degree of solution
improvement.
- First-choice hill-climbing.
- Chooses at random among uphill moves
with uphill move probability
proportional to degree of solution
improvement until either a move resulting
in a state with a value better than
that of the current state is found or
there are no uphill moves from the
current state, i.e., a plateau has
been reached.
- Random-restart hill climbing.
- Implements common dictum: "If at
first you don't succeed, try again."
- Performs hill-climbing repeatedly
from randomly-chosen start states
until goal is found.
- Is complete, as one will eventually
generate the goal state.
- If probability of attaining a goal
from a randomly-chosen start state
is p, the expected number of
required restarts is 1/p.
- Though all of these variants are slower
than basic hill-climbing search, they
can be remarkably effective in certain
state-space landscapes, e.g. allowing up
to 100 sideways moves in a hill-climbing
search for the 8-Queens problem
increases the percentage of solved
instances from 14% to 94%.
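- A hedged Python sketch of basic hill-climbing for (maximal) cliques
using the add/remove-one-vertex neighbourhood defined earlier; the
adjacency-dict graph and the start state are illustrative, not the
course's example graph.

  def neighbours(state, graph):
      # Clique states reachable by adding or removing one vertex.
      result = [state - {v} for v in state]
      for v in set(graph) - state:
          if all(v in graph[u] for u in state):      # v connected to every member
              result.append(state | {v})
      return result

  def hill_climb(graph, start=frozenset()):
      current = frozenset(start)
      while True:
          best = max(neighbours(current, graph), key=len, default=current)
          if len(best) <= len(current):              # local maximum reached
              return current
          current = best

  # Hypothetical graph as an adjacency dict; A, B, C form a triangle.
  g = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"E"}, "E": {"D"}}
  print(hill_climb(g, {"B"}))   # frozenset({'A', 'B', 'C'}) (element order may vary)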
- Local beam search
- Unlike hill-climbing, which only has one
current state, local beam search keeps track
of the best k states found so far and performs
hill-climbing simultaneously on all of them.
- EXAMPLE: Basic local beam search for
finding (maximal) cliques.
- Is not equivalent to running k independent
random-restart hill-climbing searches in
parallel; in local beam search, useful
information is passed between searches and
resources are focused dynamically where
progress is being made, i.e., "No progress
there? Help me over here!".
- Not complete or optimal in general but is
typically fast and very memory-efficient.
- Variants:
- Stochastic local beam search
- Chooses k new states at random among
uphill moves from the k current
states with uphill move probability
proportional to degree of solution
improvement.
- Analogous to natural selection in
biological evolution.
- Local search and its variants are characterized by an
increasing use of computer time as a substitute for
hand-crafted search strategy (Tony Middleton's "cook
until done") -- as such, these methods are a first
taste of the machine-crafted algorithms we will be
looking at in the final lectures of this course.
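            - PYTHON SKETCH (illustrative, not from lecture): greedy
              (steepest-ascent) hill-climbing for maximal cliques, using the
              size of the current clique as the state-evaluation function.
              The example graph, is_clique(), and neighbourhood() below are
              assumptions made for illustration; they are not the graph used
              in the in-class example.

                  graph = {                      # hypothetical undirected graph (adjacency sets)
                      "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
                      "D": {"C", "E"}, "E": {"D"}, "F": set(),
                  }

                  def is_clique(vertices):
                      # Every pair of distinct vertices must be adjacent.
                      return all(v in graph[u] for u in vertices for v in vertices if u != v)

                  def neighbourhood(state):
                      # All clique states reachable by removing or adding a single vertex.
                      neighbours = [state - {v} for v in state]
                      neighbours += [state | {v} for v in graph
                                     if v not in state and is_clique(state | {v})]
                      return neighbours

                  def hill_climb(start):
                      current = set(start)
                      while True:
                          better = [n for n in neighbourhood(current) if len(n) > len(current)]
                          if not better:
                              return current            # local maximum = a maximal clique
                          current = max(better, key=len)   # greedy: take the best uphill move

                  # e.g., hill_climb({"A"}) returns {"A", "B", "C"} for the graph above.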
- Potential exam questions
Thursday, October 3 (Lecture #9)
[DC3200, Lecture #10;
RN3, Chapter 5; Class Notes]
Lines modified from the MaxValue (one-level) algorithm
are indicated by "<=".
ALGORITHM: MaxValue and MinValue (full tree)
1. Function MaxValue(s) Function MinValue(s)
2. if leaf(s) if leaf(s)
3. return eval(s) return eval(s)
4. v = -infinity v = +infinity
5. for (c in children(s)) for (c in children(s))
        6.    v' = MinValue(c)                   v' = MaxValue(c)
7. if (v' > v) v = v' if (v' < v) v = v'
8. return v return v
Initial Call: MaxValue(startState)
eval(s) returns score wrt the current (maximizing) player.
ALGORITHM: MaxValue and MinValue (limited-depth)
1. Function MaxValue(s, d) Function MinValue(s, d) <=
2. if (leaf(s) or d >= MaxD) if (leaf(s) or d >= MaxD) <=
3. return eval(s) return eval(s)
4. v = -infinity v = +infinity
5. for (c in children(s)) for (c in children(s))
6. v' = MinValue(c, d+1) v' = MaxValue(c, d+1) <=
7. if (v' > v) v = v' if (v' < v) v = v'
8. return v return v
Initial Call: MaxValue(startState, 0)
Lines modified from the MaxValue and MinValue (full tree)
algorithms are indicated by "<=".
ALGORITHM: MiniMax search (depth-limited)
1. Function MiniMax(s, d, isMax)
2. if (leaf(s) or d >= MaxD)
3. return eval(s)
4. if (isMax)
5. v = -infinity
6. for (c in children(s))
7. v = max(v, MiniMax(c, d+1, false))
8. else
9. v = +infinity
10. for (c in children(s))
11. v = min(v, MiniMax(c, d+1, true))
12. return v
Initial call: MiniMax(startState, 0, true)
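            - PYTHON SKETCH (illustrative, not the Assignment #2 code): the
              depth-limited MiniMax above. The game-state interface
              (is_leaf(), children(), eval()) and MAX_DEPTH are assumptions
              made for illustration; eval() is taken to score a state wrt the
              maximizing player, as in the pseudocode.

                  MAX_DEPTH = 3      # assumed lookahead limit

                  def minimax(state, depth, is_max):
                      # Stop at terminal states or at the depth limit and score the
                      # state from the maximizing player's point of view.
                      if state.is_leaf() or depth >= MAX_DEPTH:
                          return state.eval()
                      if is_max:
                          value = float("-inf")
                          for child in state.children():
                              value = max(value, minimax(child, depth + 1, False))
                      else:
                          value = float("inf")
                          for child in state.children():
                              value = min(value, minimax(child, depth + 1, True))
                      return value

                  # Initial call: minimax(start_state, 0, True)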
One final (and trivial) change is required to ensure
that the best action for the current (max) player to take is
stored in the start state.
ALGORITHM: MiniMax search (depth-limited) + saved best action [ASSIGNMENT #2]
1. Function MiniMax(s, d, isMax)
2. if (leaf(s) or d >= MaxD)
3. return eval(s)
4. if (isMax)
5. v = -infinity
6. for (c in children(s))
7. v' = MiniMax(c, d+1, false) <=
8. if (v' > v) <=
9. v = v' <=
10. if (d == 0) self.currentBestAction = c.action <=
11. else
12. v = +infinity
13. for (c in children(s))
14. v = min(v, MiniMax(c, d+1, true))
15. return v
Initial call: MiniMax(startState, 0, true)
INTERLUDE: A look at the mechanics of
Assignment #2.
EXAMPLE: MiniMax generation and evaluation of an
altNim game tree from Assignment #2 (maxD = 2, i.e.,
two-move lookahead / startOdd = True / # given sticks = 6)
(see Test #3 in Assignment #2).
EXAMPLE: MiniMax generation and evaluation of an
altNim game tree from Assignment #2 (maxD = 3, i.e.,
three-move lookahead / startOdd = False / # given
sticks = 15) (see Test #6 in Assignment #2).
Potential exam questions
Tuesday, October 8
Thursday, October 10
- No lecture; class cancelled
Tuesday, October 15
- No lecture; midterm break
Thursday, October 17
- No lecture; class cancelled
Tuesday, October 22 (Lecture #10)
[DC3200, Lecture #10;
RN3, Chapters 5 and 6; Class Notes]
- Went over In-class exam # 1.
- Problem Solving and Search (Cont'd)
- Given that min(a, b) = -max(-a, -b), we can shorten the
MiniMax algorithm above to create the NegaMax algorithm.
- NegaMax is harder to understand and debug but it is the
version of MiniMax that is used in practice.
- ALGORITHM: NegaMax search (depth-limited)
  1. Function NegaMax(s, d)
  2.   if (leaf(s) or d >= MaxD)
  3.     return eval(s)
  4.   v = -infinity
  5.   for (c in children(s))
  6.     v = max(v, -NegaMax(c, d+1))
  7.   return v
  Initial call: NegaMax(startState, 0)
  eval(s) returns the score wrt the player whose turn it is at s.
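            - PYTHON SKETCH (illustrative): the depth-limited NegaMax above,
              assuming a hypothetical state interface in which
              eval_for_player_to_move() scores a state from the point of view
              of the player whose turn it is (the sign flip only works under
              that convention).

                  MAX_DEPTH = 3

                  def negamax(state, depth):
                      if state.is_leaf() or depth >= MAX_DEPTH:
                          return state.eval_for_player_to_move()
                      value = float("-inf")
                      for child in state.children():
                          # A child is scored from the opponent's point of view, so negate it.
                          value = max(value, -negamax(child, depth + 1))
                      return value

                  # Initial call: negamax(start_state, 0)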
- Performance of MiniMax search
- MiniMax is complete wrt given maximum depth MaxD
- MiniMax is optimal wrt given maximum depth MaxD
- Time Complexity: O(b^{MaxD})
- Space Complexity: O(b * MaxD)
- MiniMax plays the Nash Equilibrium, i.e., each player
makes the best response to the possible actions of
the other player and thus neither player can gain by
deviating from the moves proposed by MiniMax.
- A closer examination of the intermediate results created
at the internal game tree nodes during MiniMax search
reveals that these results can be used to prune parts
of the game tree that cannot yield solutions better than
those that have been found so far.
- EXAMPLE: MiniMax + pruning (one move each player)
[DC3200, Lecture #10, Slides 40-50
(PDF)]
- This optimization of MiniMax is known as AlphaBeta pruning.
- Introduces two new MiniMax parameters:
- alpha: Best alternative for the Max player on
this particular path through the game tree.
- beta: Best alternative for the Min player on
this particular path through the game tree.
- The interval [alpha, beta] is an ever-narrowing window
as MiniMax search progresses which "good" paths
must pass through and thus allows other paths to be
pruned.
- ALGORITHM: MaxValue and MinValue (limited-depth + AlphaBeta)
1. Function MaxValue(s, a, b, d) Function MinValue(s, a, b, d) <=
2. if (leaf(s) or d >= MaxD) if (leaf(s) or d >= MaxD)
3. return eval(s) return eval(s)
4. v = -infinity v = +infinity
5. for (c in children(s)) for (c in children(s))
6. v' = MinValue(c, a, b, d+1) v' = MaxValue(c, a, b, d+1) <=
7. if (v' > v) v = v' if (v' < v) v = v'
8. if (v' >= b) return v if (v' <= a) return v <=
        9.    if (v' > a) a = v'                 if (v' < b) b = v'      <=
10. return v return v
Initial Call: MaxValue(startState, -infinity, +infinity, 0)
Lines modified from the MaxValue and MinValue (limited-depth)
algorithms are indicated by "<=".
EXAMPLE: MaxValue + AlphaBeta pruning (two moves each player)
[DC3200, Lecture #10, Slides 56-110
(PDF)]
ALGORITHM: MiniMax search (depth-limited + AlphaBeta)
1. Function MiniMax(s, a, b, d, isMax) <=
2. if (leaf(s) or d >= MaxD)
3. return eval(s)
4. if (isMax)
5. v = -infinity
6. for (c in children(s))
        7.      v' = MiniMax(c, a, b, d+1, false)                        <=
8. if (v' > v) v = v' <=
        9.      if (v' >= b) return v                                    <=
10. if (v' > a) a = v' <=
11. else
12. v = +infinity
13. for (c in children(s))
        14.     v' = MiniMax(c, a, b, d+1, true)                         <=
15. if (v' < v) v = v' <=
16. if (v' <= a) return v <=
17. if (v' < b) b = v' <=
18. return v
Initial call: MiniMax(startState, -infinity, +infinity, 0, true)
Lines modified from the MiniMax (limited-depth)
algorithm are indicated by "<=".
One final (and trivial) change is required to ensure
that the best action for the current (max) player to take is
stored in the start state.
ALGORITHM: MiniMax search (depth-limited + AlphaBeta + saved
best action)
1. Function MiniMax(s, a, b, d, isMax)
2. if (leaf(s) or d >= MaxD)
3. return eval(s)
4. if (isMax)
5. v = -infinity
6. for (c in children(s))
        7.      v' = MiniMax(c, a, b, d+1, false)
8. if (v' > v) v = v'
        9.      if (v' >= b) return v
10. if (v' > a) <=
        11.       a = v'                                                 <=
12. if (d == 0) self.currentBestAction = c.action <=
13. else
14. v = +infinity
15. for (c in children(s))
        16.     v' = MiniMax(c, a, b, d+1, true)
17. if (v' < v) v = v'
18. if (v' <= a) return v
19. if (v' < b) b = v'
20. return v
Initial call: MiniMax(startState, -infinity, +infinity, 0, true)
Lines modified from the MiniMax (limited-depth + AlphaBeta)
algorithm are indicated by "<=".
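            - PYTHON SKETCH (illustrative, not the Assignment #2 code):
              depth-limited MiniMax with AlphaBeta pruning that also records
              the best root action. The state interface (is_leaf(),
              children(), eval(), action) and the best-action dictionary are
              assumptions made for illustration.

                  MAX_DEPTH = 4

                  def alphabeta(state, alpha, beta, depth, is_max, best=None):
                      if state.is_leaf() or depth >= MAX_DEPTH:
                          return state.eval()
                      if is_max:
                          value = float("-inf")
                          for child in state.children():
                              v = alphabeta(child, alpha, beta, depth + 1, False, best)
                              if v > value:
                                  value = v
                                  if depth == 0 and best is not None:
                                      best["action"] = child.action   # remember best root move
                              if value >= beta:
                                  return value     # beta cutoff: Min will avoid this branch
                              alpha = max(alpha, value)
                      else:
                          value = float("inf")
                          for child in state.children():
                              v = alphabeta(child, alpha, beta, depth + 1, True, best)
                              value = min(value, v)
                              if value <= alpha:
                                  return value     # alpha cutoff: Max will avoid this branch
                              beta = min(beta, value)
                      return value

                  # Initial call:
                  #   best = {}
                  #   alphabeta(start_state, float("-inf"), float("inf"), 0, True, best)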
With good move ordering, AlphaBeta pruning cuts the number
of nodes searched by MiniMax from O(b^d) to approximately
2b^{d/2} (which
can be proven to be the best possible). This allows
searches to depth d under MiniMax with a fixed amount
of computational resources to go to depth 2d under
MiniMax + AlphaBeta with the same resources, e.g.,
a previously bad gameplaying program can be enhanced to
beat a human world champion!
The MiniMax + AlphaBeta algorithm can be shortened
by rearranging the code above to be more
efficient; a runtime limit can also be enforced by
using the iterative deepening strategy previously
applied to Depth-First Search in Lecture #5 (see
Lecture #10 of DC3200 for details).
Potential exam questions
- In most of our previously discussed search algorithms
(cf. local search), our primary concern when solving
problems was with the sequences of actions that resulted
in goal states rather than the goal states themselves.
- Are there other types of search algorithms for problems
where the goal states are the primary concern?
- Let us focus on Constraint Satisfaction Problems
(CSPs).
- In a CSP, states have factored representations (see
Lecture #3) that are defined by a set of variables
X = {X1, X2, ..., Xn}, each of which has a value from
some variable-specific domain Di, and a set C of
constraints on the values of those variables.
- A constraint is described by a pair <(scope),
rel> where scope is a tuple of
variables from X and rel is a relation specifying
the values of the variables in scope allowed
by the constraint.
- The number of variables in scope determines
that constraint's arity (e.g., 1 = unary, 2 = binary,
3 = ternary, m = m-ary where m is arbitrary (global
constraint)).
- There are typically several ways to specify a
particular variable relation.
- EXAMPLE: We can specify the binary constraint on
variables X1 and X2 with common domain {A,B} that X1
and X2 have different values as <(X1,X2),[(A,B),(B,A)]>
or <(X1,X2), X1 \neq X2>.
- Each state in the state-space of a CSP specifies an
assignment of values [X1 = v1, X2 = v2, ..., Xn = vn]
to the variables in the CSP.
- A value-assignment that gives a value to each variable
in a CSP is said to be complete; if one or
more variables are not assigned values, the
value-assignment is partial.
- A value-assignment that satisfies all given constraints
in C for a CSP is said to be consistent; if one
or more constraints in C are not satisfied, the
value-assignment is inconsistent.
- A solution (goal state) for a CSP is a complete
consistent value-assignment for that CSP.
- EXAMPLE: A CSP for coloring a map of Australia.
- In map coloring, we want to assign colors to the
regions in the map such that no two adjacent regions
have the same color.
- Coloring a map of Australia (part (a)) can be modeled as
a CSP with variables X = {WA, NT, Q, NSW, V, SA, T}
over the common domain D = {red, green, blue} and
the binary constraints C = {SA \neq WA, SA \neq NT,
SA \neq Q, SA \neq NSW, SA \neq V, WA \neq NT, NT \neq
Q, Q \neq NSW, NSW \neq V} (where X \neq Y is
shorthand for the constraint <(X,Y), X \neq Y>).
- Such a CSP with all binary constraints can be modeled
as a constraint graph (part (b)), which is an
undirected graph whose vertices correspond to variables
and edges correspond to constraints.
- CSPs with m-ary constraints, m >= 2, can be
modeled as constraint hypergraphs (See RN3,
Figure 6.2).
- There are many possible solutions for the map
coloring CSP above, e.g., [WA = red, NT = green,
Q = red, NSW = green, V = red, SA = blue, T = red].
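            - PYTHON SKETCH (illustrative): one way to store the
              map-of-Australia CSP above, with each binary difference
              constraint kept as a <scope, rel> pair in which rel is a Python
              predicate; consistent() checks a (possibly partial) assignment
              against every constraint whose scope is fully assigned.

                  variables = ["WA", "NT", "Q", "NSW", "V", "SA", "T"]
                  domains = {v: {"red", "green", "blue"} for v in variables}

                  def different(x, y):
                      return x != y

                  constraints = [
                      (("SA", "WA"), different), (("SA", "NT"), different),
                      (("SA", "Q"), different), (("SA", "NSW"), different),
                      (("SA", "V"), different), (("WA", "NT"), different),
                      (("NT", "Q"), different), (("Q", "NSW"), different),
                      (("NSW", "V"), different),
                  ]

                  def consistent(assignment):
                      # Violated only if a constraint's scope is fully assigned
                      # and its relation does not hold.
                      for (scope, rel) in constraints:
                          values = [assignment.get(v) for v in scope]
                          if None not in values and not rel(*values):
                              return False
                      return True

                  # e.g., consistent({"WA": "red", "NT": "green"}) is True;
                  #       consistent({"WA": "red", "NT": "red"}) is False.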
- EXAMPLE: Sudoku
  - In Sudoku, one is given a partial assignment from
the domain {1, 2, ..., 9} to the squares of a 9 x 9
grid (left) and we want to fill in the unassigned
squares such that no number appears twice in any
column, row, or 3 x 3 box (right).
- A CSP for a game of Sudoku corresponds to a set X of
81 variables (one for each square in the 9 x 9 grid)
over the common domain D = {1, 2, ..., 9}, a given
partial value-assignment to X wrt D (i.e., the
initially specified square-numbers), and a set C of
9 + 9 + 9 = 27 global constraints specifying that
the values of the 9 variables in each column, row,
and 3 x 3 box are different, e.g., AllDiff(A1, A2,
A3, A4, A5, A6, A7, A8, A9) for the first row in the
9 x 9 grid.
- One way of solving a CSP is to generate and evaluate all
possible value-assignments for that CSP's variables.
- This can be done in a tree with depth d = |X| and
branching factor b = max |Di|, where each level
assigns values to a particular variable and each leaf
encodes a complete assignment to the variables in X.
- This tree presupposes fixed orders on both the
variables in X and the values in each domain Di.
- EXAMPLE: A portion of a complete CSP value-assignment tree for
colouring a map of Australia:
- Generating complete value-assignment trees is
memory-expensive as such trees can have (b^{d + 1} - 1)/(b - 1) nodes;
can lower memory requirements by using Depth-First Search
to generate leaves one at a time.
- ALGORITHM: CSP DFS (RN3, Figure 6.5 variant)
1. Function CSP_Search(csp)
2. return CSP_DFS({}, 1, csp)
3. Function CSP_DFS(assignment, d, csp)
4. if (depth d = |X| + 1, i.e., assignment is complete, and
assignment is consistent)
5. return assignment
6. else
7. var = Xd
8. for each value in Dd
9. add {Xd = value} to assignment
10. result = CSP_DFS(assignment, d + 1, csp)
11. if (result \neq failure)
12. return result
        13.       remove {Xd = value} from assignment
        14.   return failure
- The above fixes the memory problem but it is still
time-expensive as we may generate up to (b^{d + 1} - 1)/(b - 1) nodes
in the worst case.
Indeed, the NP-completeness of many CSP problems suggests
that such brute-force algorithms may be the best we can
do if we want to solve all input instances of such CSPs.
- ... But we typically don't want to solve all instances
of a particular CSP problem, only the ones we
encounter in practice -- can we do better in practice?
Thursday, October 24 (Lecture #11)
[RN3, Chapters 6 and 7; Class Notes]
- Problem Solving and Search (Cont'd)
- Constraint Satisfaction Problems (CSPs) (Cont'd)
- Let's start by unpacking the constraints from
leaf-evaluation.
- Observe that if a partial assignment does not satisfy
some constraint ci in C, no completion of this partial
assignment can satisfy ci. Hence,
we can stop search at any node whose partial assignment is
inconsistent with some constraint in C and thus prune the
subtree rooted at that node.
- ALGORITHM: CSP DFS + Partial (RN3, Figure 6.5 variant)
1. Function CSP_Search(csp)
2. return CSP_DFS2({}, 1, csp)
3. Function CSP_DFS2(assignment, d, csp)
4. if (depth d = |X| + 1, i.e., assignment is complete) <=
5. return assignment
6. else
7. var = Xd
8. for each value in Dd
9. if (value is consistent with assignment) <=
10. add {Xd = value} to assignment
11. result = CSP_DFS2(assignment, d + 1, csp)
12. if (result \neq failure)
13. return result
14. remove {Xd = value} from assignment
15. return failure
Lines modified from the algorithm CSP DFS
are indicated by "<=".
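            - PYTHON SKETCH (illustrative): CSP DFS + Partial as backtracking
              search, reusing the variables / domains / consistent()
              representation sketched after the map-coloring example; fixed
              variable and value orders are assumed, as in the pseudocode
              above.

                  def backtracking_search(assignment, variables, domains):
                      # Depth-first search over value assignments, pruning any
                      # partial assignment that is inconsistent with some constraint.
                      if len(assignment) == len(variables):
                          return assignment                    # complete consistent assignment
                      var = next(v for v in variables if v not in assignment)
                      for value in domains[var]:
                          if consistent({**assignment, var: value}):
                              assignment[var] = value
                              result = backtracking_search(assignment, variables, domains)
                              if result is not None:
                                  return result
                              del assignment[var]
                      return None                              # failure

                  # e.g., backtracking_search({}, variables, domains) returns one
                  # complete colouring of the map of Australia.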
- EXAMPLE: A portion of a CSP value-assignment tree for
colouring a map of Australia which allows pruning of
inconsistent partial assignment subtrees (dashed box):
- Further improvements arise from dynamic variable /
domain-value ordering heuristics.
- Variables can be ordered by the minimum-remaining
value (MRV) or degree heuristics, which
respectively prefer
the variables with the most constrained domains
and the variables that are involved in the largest
number of constraints, e.g., choose SA first in
coloring a map of Australia.
- Domain values can be ordered by the least-constraining
heuristic, which prefers values that rule out the
fewest choices for neighbouring values in the
constraint graph and thus allows maximum flexibility
for future value-choices.
- Good variable-ordering heuristics prune the largest
parts of search trees quickest (fail-first) while
good domain-value ordering heuristics consider first
those values most likely to lead to solutions
(fail-last).
- ALGORITHM: CSP DFS + Partial + Ordering (RN3, Figure 6.5 variant)
1. Function CSP_Search(csp)
2. return CSP_DFS3({}, csp)
3. Function CSP_DFS3(assignment, csp)
4. if (assignment is complete)
5. return assignment
6. else
7. var = selectUnassignedVariable(csp, assignment) <=
8. for each value in orderDomainValues(var, assignment, csp) <=
9. if (value is consistent with assignment)
10. add {var = value} to assignment
11. result = CSP_DFS3(assignment, csp)
12. if (result \neq failure)
13. return result
14. remove {var = value} from assignment
15. return failure
Lines modified from the algorithm CSP DFS + Partial
are indicated by "<=".
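            - PYTHON SKETCH (illustrative): one simple way to implement the
              minimum-remaining-values heuristic as selectUnassignedVariable
              for the representation sketched earlier; the counting scheme
              below is an assumption, not the only possible implementation.

                  def select_unassigned_variable_mrv(assignment, variables, domains):
                      # Pick the unassigned variable with the fewest domain values
                      # that remain consistent with the current partial assignment.
                      unassigned = [v for v in variables if v not in assignment]
                      def legal_values(var):
                          return sum(1 for value in domains[var]
                                     if consistent({**assignment, var: value}))
                      return min(unassigned, key=legal_values)

                  # e.g., once some of SA's neighbours have been coloured, SA tends
                  # to be chosen early because it has the fewest legal values left.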
- We can also make inferences over the constraints themselves
(constraint propagation) to
derive useful domain restrictions for variables.
- Such constraint consistency checks can involve one
or more constraints of varying constraint arities.
- Enforcing constraints that a variable has a
particular value (unary consistency),
e.g., the constraint <(SA) SA = red> that the SA
region of the map of Australia must be
colored red.
- Enforcing constraints on pairs of variables
(arc
consistency) such that every value in a
variable's domain can satisfy all binary
constraints containing that variable, e.g.,
the constraint <(WA,NT) WA \neq NT> is satisfied
by picking any color for WA and not picking the
same color for NT.
- Enforcing (sets of) constraints on 3 variables
(path consistency), e.g., WA, NT, and
SA are pairwise linked by difference constraints
and hence must have three different colors.
- Constraint propagation typically reduces the sizes
of variable domains; if in this process any variable
domain becomes empty, at least one of the type
of constraints being checked cannot be satisfied
by any assignment of values to the variables in the
CSP.
- ALGORITHM: AC-3 for arc consistency (RN3, Figure 6.3)
1. Function AC-3(csp)
2. q = queue({arcs in csp})
3. while (q is not empty)
4. (Xi, Xj) = q.pop()
5. if revise(csp, Xi, Xj)
6. if (|Di| = 0) return false
7. for (each Xk \neq Xj that is a neighbour of Xi)
8. q.add((Xk, Xi))
9. return true
10. Function revise(csp, Xi, Xj)
11. revised = false
12. for (each vi in Di)
13. if (no value vj in Dj allows (vi,vj) to satisfy
the constraint between Xi and Xj in csp)
14. delete vi from Di
        15.       revised = true
16. return revised
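            - PYTHON SKETCH (illustrative): AC-3 for the binary
              ((Xi, Xj), rel) constraint representation sketched earlier,
              assuming at most one constraint per pair of variables and
              domains stored as mutable sets of values.

                  from collections import deque

                  def ac3(domains, constraints):
                      # Treat each binary constraint as two directed arcs.
                      rel = {}
                      for (xi, xj), r in constraints:
                          rel[(xi, xj)] = r
                          rel[(xj, xi)] = lambda a, b, r=r: r(b, a)
                      queue = deque(rel.keys())
                      while queue:
                          xi, xj = queue.popleft()
                          if revise(domains, xi, xj, rel):
                              if not domains[xi]:
                                  return False     # an emptied domain: CSP has no solution
                              for (xk, xl) in rel:
                                  if xl == xi and xk != xj:
                                      queue.append((xk, xi))
                      return True

                  def revise(domains, xi, xj, rel):
                      revised = False
                      for vi in set(domains[xi]):
                          # Delete vi if no value for Xj can satisfy the (Xi, Xj) constraint.
                          if not any(rel[(xi, xj)](vi, vj) for vj in domains[xj]):
                              domains[xi].discard(vi)
                              revised = True
                      return revised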
- In the best case, constraint propagation derives solutions
and replaces search, e.g., constraint propagation algorithm
for map 2-coloring; however, it is more commonly done
as preprocessing for and subsequent interleaving with
search.
- ALGORITHM: CSP DFS + Partial + Ordering + Inference (RN3, Figure 6.5)
1. Function CSP_Search(csp)
2. return CSP_DFS4({}, csp)
3. Function CSP_DFS4(assignment, csp)
4. if (assignment is complete)
5. return assignment
6. else
7. var = selectUnassignedVariable(csp, assignment)
8. for each value in orderDomainValues(var, assignment, csp)
9. if (value is consistent with assignment)
10. add {var = value} to assignment
11. inferences = getInferences(csp, var, assignment) <=
12. if (inferences \neq failure) <=
13. add inferences to assignment <=
14. result = CSP_DFS4(assignment, csp)
15. if (result \neq failure)
16. return result
        17.         remove {var = value} and inferences from assignment   <=
18. return failure
Lines modified from the algorithm CSP DFS + Partial +
Ordering are indicated by "<=".
- More complex algorithm enhancements yield further
improvements in performance, e.g., intelligent
backtracking (backjumping).
- Local search may also be done with respect to CSPs, using
a state-space landscape based on complete assignments where
states are adjacent if they differ in a single variable
assignment and the state evaluation function is the number
of constraints in C that are not satisfied by the assignment
encoded in that state.
- ALGORITHM: MinConflicts (RN3, Figure 6.8)
1. Function minConflicts(csp, maxSteps)
2. current = an initial complete assignment for csp
        3.   for i = 1 to maxSteps do
4. if current is consistent return current
5. var = a randomly-chosen conflicted variable from X
6. value = the value v for var that minimizes
conflicts(var, v, current, csp)
        7.     set var = value in current
8. return failure
conflicts(var, v, current, csp) returns the number of constraints
in csp that would be violated by setting var to v in current
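            - PYTHON SKETCH (illustrative): MinConflicts for the
              representation sketched earlier; the random initial complete
              assignment and the tie-breaking below are deliberately simple
              assumptions.

                  import random

                  def conflicts(var, value, current, constraints):
                      # Number of constraints violated by setting var = value in current.
                      trial = {**current, var: value}
                      return sum(1 for (scope, rel) in constraints
                                 if all(v in trial for v in scope)
                                 and not rel(*[trial[v] for v in scope]))

                  def min_conflicts(variables, domains, constraints, max_steps=10000):
                      current = {v: random.choice(sorted(domains[v])) for v in variables}
                      for _ in range(max_steps):
                          conflicted = [v for v in variables
                                        if conflicts(v, current[v], current, constraints) > 0]
                          if not conflicted:
                              return current                   # consistent complete assignment
                          var = random.choice(conflicted)
                          current[var] = min(domains[var],
                                             key=lambda val: conflicts(var, val, current, constraints))
                      return None                              # failure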
- Min-conflicts is surprisingly effective for many CSPs. For example,
the run-time of Min-conflicts on the n-Queens problem is
independent of problem size such that the million-Queens
problem is solved in an average of 50 steps!
Potential exam questions
Logical Reasoning
- Use richer representations of knowledge to go from
domain-specific search procedures to domain-general
search procedures. Here we will focus on knowledge
representations based on various logics.
- Knowledge-based agents
- knowledge base (KB) = set of (true) sentences.
- Interact with KB by TELL and ASK operations, both of
which may involve inference, i.e., creating new
sentences from old.
- Inference obeys the requirement that any new
sentences that are derived must follow
(in a technical sense we will discuss later) from
one or more of the sentences known from
previous TELL interactions or inferences.
- ALGORITHM: Knowledge-based agent percept->action computations (RN3, Figure 7.1)
1. Function KB-Agent(KB, t, percept)
2. TELL(KB, makePerceptSentence(percept, t))
3. action = ASK(KB, makeActionQuery(t))
4. TELL(KB, makeActionSentence(action, t))
5. t = t + 1
6. return action
- A KB initially encodes only background knowledge, and
is enriched both by percepts obtained during agent
interaction with the environment and inference over
that KB. As such, KBs are often described at the
knowledge level, i.e., what is known and inferred,
rather than the implementation level, i.e., how what is
known is physically represented.
- An agent's behavior can be derived purely from a KB,
percepts, and inference (declarative approach) or
from hard-coded behaviors (procedural approach).
- Successful agents often combine these approaches,
using declarative knowledge for generality and
compiling this knowledge into procedural code when
necessary for efficiency.
EXAMPLE: Wumpus World (RN3, Figure 7.2)
- Wumpus World (WW) is a puzzle game in which an agent
must explore a cave to find and return a cache of gold
while avoiding bottomless pits and the deadly wumpus.
- The WW problem is described as follows:
- Environment: A 4 x 4 grid of squares containing
one wumpus, one cache of gold, and several
bottomless pits. The agent always starts at
square [1,1] facing right and initially has no
knowledge of the environment except that agent's
percept at [1,1].
- Sensors: An agent has five one-bit sensors:
- Stench: Returns True if the wumpus (alive
or dead) is in same square as the agent or is
in a square directly (not diagonally)
adjacent to the agent.
- Breeze: Returns True if the agent is
directly adjacent to a bottomless pit.
- Glitter: Returns True if the agent
is in the same square as the cache of gold.
- Bump: Returns True if the agent walks
into one of the four walls surrounding the
grid.
- Scream: Returns True if the wumpus
dies after being struck by an arrow (see
below).
A percept will be given as a binary vector of the
current values for these sensors. For example, if
there is a stench and a breeze but no glitter,
bump or scream, the percept will be [S,B,-,-,-].
- Actuators: The agent can perform the
following actions:
- The agent can move forward one square
(Forward), turn left 90 degrees
(TurnLeft), or turn right 90 degrees
(TurnRight).
- If a move forward would cause an agent
to bump into a wall, the agent does
not move.
- If an agent enters a square with a live
wumpus or a bottomless pit, the agent
suffers a horrible death.
- The agent can Shoot its only arrow
forward in the direction the agent is facing.
- The arrow travels in that direction
until it either hits the wumpus (and
kills it) or strikes a wall.
- If the agent is in the same square as the
cache of gold, the agent can Grab the
cache.
- If the agent is on square [1,1], it can
Climb out of the cave.
- Performance Measure: +1000 for climbing out
of the cave with the gold, -1000 for falling into
a bottomless pit or being eaten by the wumpus,
-10 for shooting the arrow, and -1 for each other
action.
The game ends when the agent dies or climbs out of the
cave. Note that even if an agent lives, it may not be
successful, e.g., the cache of gold may be surrounded
by bottomless pits and hence inaccessible.
- The main challenge to an agent in WW is that agent's
initial ignorance of the configuration of the
environment; this ignorance is incrementally dispelled
by reasoning-guided exploration, e.g.,
- In [1,1], percept is [-,-,-,-,-]; squares [1,2]
and [2,1] are OK to move into => move to
square [2,1].
- In square [2,1], percept is [-,B,-,-,-]; squares
[2,2] and [3,1] may have bottomless pits =>
move back to [1,1] and then to [1,2].
- In [1,2], percept is [S,-,-,-,-]; the wumpus cannot
be in [2,2] (as stench would have been perceived
when visiting [2,1]) so the wumpus is in [1,3] and
[2,2] is OK => move to square [2,2].
- In [2,2], percept is [-,-,-,-,-]; squares [2,3] and
[3,2] are OK => move to square [2,3].
- In [2,3], percept is [S,B,G,-,-]; the cache of
gold is in this square => grab cache of gold,
retrace path to [1,1] on OK squares, and climb
out of the cave.
Several of the inferences made above, e.g., in the third step, are
difficult because they involve knowledge gained at
different times and in different places.
- Note that when the agent draws a conclusion from
available information, that conclusion is only
guaranteed to be correct if the available information
is correct. Hence, it is critical that sensors
return correct values and actuators correctly
and completely perform requested actions.
Tuesday, October 29 (Lecture #12)
[RN3, Chapter 7; Class Notes]
- Logical reasoning (Cont'd)
- Logic: An Overview
- Each logic has an associated syntax and semantics, where
the syntax specifies all well-formed sentences in
that logic and the semantics specifies whether or
not each well-formed sentence in that logic is true with
respect to each possible world (or, more formally,
model).
- A possible world is a (potentially) real world
one or more agents exist within, while a model
is a mathematical abstraction which fixes the
truth or falsity of every relevant sentence
relative to variable-values proposed by that
model.
- If a sentence s is true in a model m, we say
that m satisfies s or m is a model of s.
- EXAMPLE: A possible world consists of x men and
y women sitting at a table playing bridge; as
a game of bridge requires exactly 4 players,
the sentence s = "x + y = 4" must be true.
Formally, all possible models for this world are
the assignments of non-negative
integers to x and y, and the models that satisfy
s are those that assign values to x and y that
add to 4, e.g., [x = 1 / y = 3], [x = 4 / y = 0].
- Given a definition of truth, we can now define logical
reasoning / inference as the process of correctly
deriving a conclusion s from a body of knowledge
KB (KB |-i s).
- a |= b means that sentence a entails sentence b,
i.e., b follows from a.
- More formally, a |= b if and only if M(a) \subseteq
M(b), i.e., in every model for which a is true, b
is also true.
- Given this definition of entailment, logical
inference is the process of deriving a valid
conclusion s from a knowledge base KB (sometimes
written as KB |-i s).
- Think of all consequences of KB as a haystack and
a particular consequence s as a needle.
Entailment can say whether or not a specified
needle is in a haystack, and inference is the
finding of that needle.
- Model checking is an inference procedure that
enumerates all possible worlds and for a sentence
s checks if s is true in all models for which KB
is true; as such, it shows one way in which
entailment can be used to derive conclusions.
- EXAMPLE: Model checking for Wumpus World (RN3, Figure 7.5)
- Let KB be the KB at the beginning of step (2) in
the reasoning-based exploration of WW above.
Relative to KB, consider the following two
sentences:
- a1 = "There is no pit in [1,2]".
- a2 = "There is no pit in [2,2]".
- From inspection, we can see that for every model
in which KB is true, a1 is also true and hence
KB |= a1. However, we can also see that in some
models in which KB is true, a2 is false, i.e.,
KB |\= a2. Hence we can conclude that there is
no pit in [1,2] but we cannot conclude that
there is not (or, for that matter, is) a pit
in [2,2].
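            - PYTHON SKETCH (illustrative): entailment by model checking,
              enumerating every assignment of truth values to the relevant
              symbols; sentences are represented here as Python predicates
              over a model dictionary (an assumption made for brevity, not
              RN3's pseudocode).

                  from itertools import product

                  def entails(kb, query, symbols):
                      for values in product([True, False], repeat=len(symbols)):
                          model = dict(zip(symbols, values))
                          if kb(model) and not query(model):
                              return False     # a model of KB in which the query fails
                      return True

                  # e.g., with KB = R2 and R4 from the Wumpus World PL KB below:
                  kb = lambda m: (m["B11"] == (m["P12"] or m["P21"])) and not m["B11"]
                  not_p12 = lambda m: not m["P12"]
                  print(entails(kb, not_p12, ["B11", "P12", "P21"]))   # True: KB |= not P12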
- Desirable properties of inference algorithms:
- Soundness (truth-preserving): An inference algorithm
that only derives entailed conclusions is
sound, i.e., the inference
algorithm does not make up things.
- Completeness: An inference procedure that can derive
any conclusion that is entailed is complete,
i.e., it is not the case that there are entailed
conclusions that the inference algorithm
cannot derive.
- Logical reasoning should also be grounded in the
real world, i.e., if s is derived from a KB by a
sound inference procedure then s should be true in the
real world. One does this by making sure sensors and
actuators are correct (grounded percepts) and taking
care in grounding inference rules in the real world.
- Propositional logic (PL)
- Syntax
- Sentences in PL can be atomic sentences consisting of a
single Boolean literal, i.e., True, False, or
propositional symbol, e.g., P, Q, R, or complex
sentences built up using logical connectives, e.g.,
not, and, or, => (implies), <=> (if and only if), and
parentheses.
- Though there is a precedence order on logical
connectives, it is perhaps best to use
parentheses to denote the wanted recursive
structure of complex sentences.
- Syntax of PL (RN3, Figure 7.7)
- Semantics
- The truth value of a sentence s in PL for a given
model M is built up recursively in a bottom-up
fashion from the truth-values of the propositional
symbols in s wrt M by applying the truth tables for
the logical connectives in s.
- The truth table for => is
counterintuitive because P => Q (unlike
natural language "if P then Q") does not imply causation
or relevance between P and Q, e.g., "5 is odd => Tokyo
is the capital of Japan" is true in PL. Rather, P => Q
should be interpreted as "If P is true then I am claiming
Q is true; otherwise, I am making no claim".
- EXAMPLE: A PL KB for Wumpus World
- Focus for now on the immutable aspects of WW.
This will be done for each location [x,y] in the
cave with the following symbols:
- Pxy: True if there is a pit in [x,y].
- Wxy: True if there is a wumpus (dead or
alive) in [x,y].
- Bxy: True if the agent perceives a breeze in
[x,y].
- Sxy: True if the agent perceives a stench in
[x,y].
- The following sentences will suffice to derive
not P12 (there is no pit in [1,2]):
- R1: not P11 (there is no pit in [1,1])
- R2: B11 <=> (P12 or P21)
- R3: B21 <=> (P11 or P22 or P31)
- R4: not B11
- R5: B21
- PL Inference procedures
- Given a knowledge base KB and a sentence s in
  PL, we want to determine if KB |= s, e.g.,
for the WW KB, is it the case that KB |= not P12?
- We will here consider two broad classes of
inference procedures: model checking and theorem
proving.
- Model checking
- Advanced model checking
- We will look at two families of
algorithms that perform well in practice
using model-tree pruning and local
search, respectively.
- We also i) rephrase entailment statements
KB => s (which is equivalent to (not KB or s))
as determining the unsatisfiability of
(KB and not s) (which is obtained from
the above by applying DeMorgan's Law)
and ii) rephrase both KB and s in
Conjunctive Normal Form (CNF) in
which a sentence is a conjunction (AND)
of disjunctive (OR) clauses, e.g.,
(A or not
B) and (B or not C or D) and (B or D).
- There is a standard procedure for
transforming an arbitrary PL
sentence into a CNF equivalent
(RN3, pp. 253-254).
- As many other problems can be reduced
to CNF SAT, our derived algorithms
have a much broader applicability.
- Backtracking + heuristics
(Davis-Putnam-Logemann-Loveland (DPLL))
- DPLL is a depth-first search of
the model-tree that uses partial
model unsatisfiability and
various sentence-reducing
heuristics to prune subtrees
containing non-satisfying
models, e.g.,
- Pure heuristic: Assign pure
symbols (symbols that have
the same sign in all
occurrences) the value that
makes them true.
- Unit heuristic: Assign symbols
in single-symbol clauses the
value that makes them true.
Note that both heuristics when
invoked may trigger a cascade
of subsequent sentence reductions.
- ALGORITHM: DPLL-Satisfiable?
(RN3, Figure 7.17)
1. Function DPLL-Satisfiable?(s)
2. sym = list of proposition symbols in s
3. cls = list of clauses in CNF representation of s
4. return DPLL(cls, sym, {})
5. Function DPLL(cls, sym, model)
6. if every clause in cls is true in model return true
7. if some clause in cls is false in model return false
8. (P, value) = Find-Pure-Symbol(sym, cls, model)
9. if P is non-null return DPLL(cls, sym - P, model + {P = value})
10. (P, value) = Find-Unit-Clause(cls, model)
11. if P is non-null return DPLL(cls, sym - P, model + {P = value})
12. P = First(sym)
13. rest = Rest(sym)
14. return (DPLL(cls, rest, model + {P = true}) or
DPLL(cls, rest, model + {P = false}))
- A variety of scale-up enhancements
to the DPLL algorithm, e.g.,
variable and value ordering, allow
modern SAT solvers to tackle
input instances with tens of
millions of variables.
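            - PYTHON SKETCH (illustrative): the DPLL procedure above,
              assuming the sentence has already been converted to CNF and
              each clause is a set of (symbol, sign) literals, e.g.,
              ("P", False) for not P.

                  def dpll_satisfiable(clauses, symbols):
                      return dpll(clauses, list(symbols), {})

                  def dpll(clauses, symbols, model):
                      unknown = []
                      for clause in clauses:
                          value = clause_value(clause, model)
                          if value is False:
                              return False           # some clause is false in model
                          if value is None:
                              unknown.append(clause)
                      if not unknown:
                          return True                # every clause is true in model
                      p, value = find_pure_symbol(symbols, unknown)
                      if p is not None:              # pure heuristic
                          return dpll(clauses, [s for s in symbols if s != p], {**model, p: value})
                      p, value = find_unit_clause(unknown, model)
                      if p is not None:              # unit heuristic
                          return dpll(clauses, [s for s in symbols if s != p], {**model, p: value})
                      p, rest = symbols[0], symbols[1:]
                      return (dpll(clauses, rest, {**model, p: True}) or
                              dpll(clauses, rest, {**model, p: False}))

                  def clause_value(clause, model):
                      # True if some literal is true, False if all literals are false,
                      # None (unknown) otherwise.
                      unknown = False
                      for sym, sign in clause:
                          if sym in model:
                              if model[sym] == sign:
                                  return True
                          else:
                              unknown = True
                      return None if unknown else False

                  def find_pure_symbol(symbols, clauses):
                      for s in symbols:
                          signs = {sign for clause in clauses for (sym, sign) in clause if sym == s}
                          if len(signs) == 1:
                              return s, signs.pop()
                      return None, None

                  def find_unit_clause(clauses, model):
                      for clause in clauses:
                          unassigned = [(sym, sign) for (sym, sign) in clause if sym not in model]
                          if len(unassigned) == 1 and clause_value(clause, model) is None:
                              return unassigned[0]
                      return None, None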
- Local search
- Uses a model-evaluation function
h(m) that counts the number of
clauses in the given instance of
CNF SAT that are not satisfied
by m.
- At each step, the algorithm picks
an unsatisfied clause and
selects at random one of two
ways to choose a symbol in that
clause to flip.
- ALGORITHM: WalkSAT (RN3, Figure 7.18)
1. Function WalkSAT(cls, p, maxFlips)
2. model = random assignment of true / false to symbols in cls
3. for i = 1 to maxFlips do
        4.   if model satisfies all clauses in cls return model
5. c = randomly-selected clause in cls that is false in model
6. with probability p
7. flip the value in model of a randomly-selected
symbol in c
8. else
        9.     flip the value in model of the symbol in c whose
flipped value maximizes the number of satisfied clauses
10. return failure
- A returned failure can mean either
that we did not give the algorithm
enough time or that the given
sentence is unsatisfiable; only
the former can be fixed by
increasing maxFlips. Hence,
WalkSAT is more commonly used when
we expect that a solution exists
than when we need to show
unsatisfiability / entailment.
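            - PYTHON SKETCH (illustrative): WalkSAT using the same
              (symbol, sign) clause representation as the DPLL sketch above;
              the default values of p and max_flips are arbitrary.

                  import random

                  def walksat(clauses, p=0.5, max_flips=10000):
                      symbols = {sym for clause in clauses for (sym, _) in clause}
                      model = {sym: random.choice([True, False]) for sym in symbols}
                      for _ in range(max_flips):
                          false_clauses = [c for c in clauses if not satisfied(c, model)]
                          if not false_clauses:
                              return model                     # model satisfies all clauses
                          clause = random.choice(false_clauses)
                          if random.random() < p:
                              # Random walk: flip a randomly-chosen symbol from the clause.
                              sym = random.choice([s for (s, _) in clause])
                          else:
                              # Greedy step: flip the symbol that maximizes satisfied clauses.
                              sym = max((s for (s, _) in clause),
                                        key=lambda s: count_satisfied(clauses, {**model, s: not model[s]}))
                          model[sym] = not model[sym]
                      return None     # failure: out of flips, or the sentence may be unsatisfiable

                  def satisfied(clause, model):
                      return any(model[sym] == sign for (sym, sign) in clause)

                  def count_satisfied(clauses, model):
                      return sum(1 for c in clauses if satisfied(c, model))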
Wednesday, October 30
Thursday, October 31 (Lecture #13)
[RN3, Chapter 7; Class Notes]
This rule is sound (RN3, p. 253).
Observe that if repeated applications
of the resolution rule to a CNF
sentence s results in the empty clause
then s is unsatisfiable (as a
disjunctive clause is true if and only
if that clause has at least one literal
that evaluates to true). Given that
KB |= s if and only if (KB and not s)
is unsatisfiable, we now have a
resolution-based algorithm for proving
entailment.
EXAMPLE: Theorem Proving by Hand in
Wumpus World (Part III)
- Suppose the agent in WW is in [1,1], perceives
no breeze, and wishes to establish that there is
no pit in [1,2], i.e., not P12. The relevant
knowledge base is
KB = R2 and R4 = (B11 <=> P12 or P21) and not B11
which means that
KB and not (not P12) = ((B11 <=> P12 or P21)
and not B11 and P12)
whose CNF form has five disjunctive clauses:
- C1: not P21 or B11
- C2: not B11 or P12 or P21
- C3: not P12 or B11
- C4: not B11
- C5: P12
Observe that the resolution of C3 and C4 gives
clause (not P12), whose resolution with C5 is the
empty clause. Hence, there is no pit in [1,2].
ALGORITHM: PL-Resolution (RN3,
Figure 7.12)
1. Function PL-Resolution(KB, s)
2. cls = list of clauses in CNF representation of (KB and not s)
3. new = {}
4. while (true)
5. for each pair of clauses (ci, cj) in cls do
6. resolvents = PL-Resolve(ci, cj)
7. if resolvents contains the empty clause return true
8. new = new + resolvents
9. if new \subseteq cls return false
10. cls = cls + new
PL-Resolve(c1, c2) returns the set of all possible clauses
obtained by resolving c1 and c2.
It can be shown that this algorithm is
both sound (RN3, p. 253) and complete
(RN3, pp. 255-256).
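            - PYTHON SKETCH (illustrative): PL-Resolution over clauses
              represented as frozensets of (symbol, sign) literals;
              conversion of (KB and not s) to CNF is assumed to have been
              done already.

                  from itertools import combinations

                  def pl_resolution(clauses):
                      clauses = set(clauses)
                      while True:
                          new = set()
                          for ci, cj in combinations(clauses, 2):
                              for resolvent in pl_resolve(ci, cj):
                                  if not resolvent:
                                      return True      # empty clause derived: entailment holds
                                  new.add(resolvent)
                          if new <= clauses:
                              return False             # no new clauses: entailment fails
                          clauses |= new

                  def pl_resolve(c1, c2):
                      # All clauses obtained by resolving c1 and c2 on a complementary literal pair.
                      resolvents = []
                      for (sym, sign) in c1:
                          if (sym, not sign) in c2:
                              resolvents.append(frozenset((c1 - {(sym, sign)}) |
                                                          (c2 - {(sym, not sign)})))
                      return resolvents

                  # e.g., the five Wumpus World clauses C1-C5 above:
                  C = [frozenset({("P21", False), ("B11", True)}),
                       frozenset({("B11", False), ("P12", True), ("P21", True)}),
                       frozenset({("P12", False), ("B11", True)}),
                       frozenset({("B11", False)}),
                       frozenset({("P12", True)})]
                  print(pl_resolution(C))   # True: there is no pit in [1,2]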
There are a number of cases in which the
full power of resolution is not needed
to do logical inference. One restricted case
that allows efficient (indeed, linear time)
inference is knowledge bases composed of
definite clauses.
- A definite clause is a disjunction of
one or more literals in which exactly
one literal is positive, e.g., (not P or
Q), (P or not Q or not R), and (R) are
all definite clauses while (not P or
not R) and (not Q) are not.
- Knowledge bases containing only definite
clauses are interesting for three
reasons:
- Any definite clause can be
rewritten as a conjunction of the
negations of the negated literals
(the clause's premise) that
implies the unnegated literal
(the clause's conclusion),
e.g., (P or not Q or not R)
is logically equivalent to
(Q and R) => P. A definite
clause consisting of a single
(positive) literal P is
equivalent to the implication
True => P and is called a fact.
- Logical inference over definite
clauses can be done by either
the forward or backward chaining
algorithms.
- Both of these algorithms, when
implemented carefully, run in
time linear in the size of KB.
- Both of these algorithms can perhaps
best be explained by visualizing
a definite clause KB as an AND-OR
graph.
- An AND-OR graph is a directed
graph in which each non-leaf
vertex is either an AND vertex
or an OR vertex.
- A definite clause KB can be converted
  to an AND-OR graph in which:
- each symbol is a vertex;
- each fact is a leaf;
- each symbol which is the
conclusion of exactly one
clause is an AND-vertex
which is connected to the
symbol-vertices in that
clause's premise; and
- each symbol which is the
conclusion of multiple clauses
is an OR-vertex connected to
the AND-vertices
corresponding to these
clauses, each of which
is in turn connected to the
symbol-vertices in their
respective premises.
- EXAMPLE: A definite clause KB and its
associated AND-OR graph (RN3,
Figure 7.16)
In the examples below, the five clauses
will be abbreviated as Q, P, M, L1, and
L2, respectively.
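            - PYTHON SKETCH (illustrative): forward chaining over a
              definite-clause KB, with each clause written as a
              (premise-symbol-set, conclusion-symbol) pair and facts as
              clauses with empty premises; the tiny KB at the bottom is
              hypothetical, not the Figure 7.16 KB.

                  from collections import deque

                  def forward_chaining(clauses, query):
                      count = {i: len(premise) for i, (premise, _) in enumerate(clauses)}
                      inferred = set()
                      agenda = deque(conclusion for (premise, conclusion) in clauses if not premise)
                      while agenda:
                          p = agenda.popleft()
                          if p == query:
                              return True
                          if p in inferred:
                              continue
                          inferred.add(p)
                          for i, (premise, conclusion) in enumerate(clauses):
                              if p in premise:
                                  count[i] -= 1
                                  if count[i] == 0:      # all premises now known: fire the clause
                                      agenda.append(conclusion)
                      return False

                  # Hypothetical KB: (A and B) => C, C => D, plus facts A and B.
                  kb = [({"A", "B"}, "C"), ({"C"}, "D"), (set(), "A"), (set(), "B")]
                  print(forward_chaining(kb, "D"))   # True under this KB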
Tuesday, November 5 (Lecture #14)
[RN3, Chapter 7; Class Notes]
- Logical reasoning (Cont'd)
- Propositional logic (PL) (Cont'd)
The facts required for BC to
derive each possible implication
relative to these clauses
are given in the following table:
        Derivable      |             Required Facts
        implications   |    A    |    B    |    C    |    D
        ---------------+---------+---------+---------+---------
             S1        |    X    |    X    |         |
             S2        |    X    |         |    X    |    X
             S3        |         |         |    X    |    X
             D1        |    X    |    X    |    X    |    X
             D2        |    X    |         |    X    |    X
Note that we are assuming here that each
implication can only be derived in one
way and hence has only one set of
required facts. If an implication was
derivable in several ways, we would have
to list for each implication the
fact-sets relative to which that
implication would be derived. For
example, if there was an additional
clause "B + C => S1" in the above,
S1 would be derivable via the fact-sets
{A,B} and {B,C}.
- BC as given here can state whether
or not the available facts allow
a proof of a specified implication;
however, it does not produce any (let
alone all) such proofs (which is
required for explainability, e.g.,
medical diagnosis).
INTERLUDE: A look at the mechanics of
Assignment #3.
Potential exam questions
Thursday, November 7
Tuesday, November 12 (Lecture #15)
[RN3, Chapter 7; Class Notes]
Potential exam questions
Thursday, November 14 (Lecture #16)
[DC3200, Lecture #12;
Class Notes]
Potential exam questions
Tuesday, November 19 (Lecture #17)
[DC3200, Lecture #14;
RN3, Chapter 21; Class Notes]
- Reinforcement Learning (RL)
- Reinforcement learning is a type of machine learning that is
inspired by mechanisms in Behavioral psychology.
- RL proposes learning through interactions with an
environment where each action has an associated
positive or negative reward, e.g., pleasure or pain.
- There is no explicit teacher, e.g., the
training set of correct (input-output) examples
in supervised learning; rather, teaching is both
implicit in and localized to action rewards.
- There is no evolution of a population as in EC but
rather updating over time of an individual's
beliefs about situation-appropriate actions
in response to interactions with an environment.
- The goal of an RL agent is to maximize summed rewards
over time.
- RL can implement problem-solving by structuring the
environment and action rewards to achieve a goal,
e.g., structuring a human community to enforce a
utopia (Skinner (1948)).
- RL is not an algorithm but rather a problem specification
method -- there are many algorithmic techniques for
solving RL-specified problems.
- The overall structure of reinforcement learning:
- The reinforcement learning loop:
- EXAMPLE: The Cart-Pole and Mountain Car problems as
formulated in RL [DC3200 Lecture #14, Slides 19-21
(PDF)]
- The four components of an RL-specified problem:
- Agent
- Environment
- Policy
- A policy is a mapping from states of the environment
to the actions to be taken at those states.
- A policy is the core component of an RL agent in
that it defines that agent's behavior.
- Can be implemented by a variety of deterministic or
stochastic mechanisms, e.g., lookup table (see
Lecture #2), finite-state machine, neural network.
- EXAMPLE: Blackjack tabular policy [DC3200 Lecture
#14, Slide #26 (PDF)]
- Policies change over time in response to
interactions as specified by the chosen RL
algorithm.
- Reward function
- Maps each perceived state (or state-action
pair) to a single number (the reward); as
such, it implicitly defines the goal of an
RL-specified problem.
- Rewards come from the environment, not the agent.
- The agent's goal is to maximize the expected
summed rewards over time (that agent's
return).
- EXAMPLE: Reward function for finding a
shortest path.
- Goal is terminal state (the destination).
- Non-goal states have reward -1.
- Getting to goal in the fewest number of steps
maximizes the summed rewards / return.
- EXAMPLE: Reward function for learning to bet
well on horse races.
- Winning bet = positive reward proportional to
money won.
- Losing bet = negative reward proportional to
money lost.
- Learning to bet well maximizes the summed
rewards / return over time.
- An optimal policy maximizes the expected return
over time; how can we quantify expected return?
=> value functions!
- If a reward function tells us what is immediately
good, a value function indicates what is
expected (in the probabilistic sense) to
eventually be good or bad over time, i.e.,
long-term good.
- The value of a state s or state-action-pair
(s,a) is the return an agent can expect to
accumulate in the future given that it is
currently in s or currently in s and performs
action a.
- Value is much harder to determine than rewards,
and requires more complex calculation and
estimation techniques.
- EXAMPLE: Value function for finding a
shortest path.
- Let non-goal states have values inversely
proportional to the lengths of the shortest
path from those states to the goal.
- Actions which take us to high-value states have
the highest value.
- EXAMPLE: Value function for learning to bet
well on horse races.
- Let individual horses have bet-values
proportional
to their records of winning past races.
- Horses with the best records of winning past
races have the highest probability of winning
the current race (as well as future races)
and hence bets on such
horses have the highest value.
- An agent's policy should ideally prefer at each
possible state not the highest-reward action but
rather the highest-value action.
- One of the main challenges in RL is the trade-off between
exploration and exploitation.
- To maximize rewards, an agent must prefer actions that
it knows produce good results, i.e., the agent must
exploit knowledge it has.
- In order to learn which actions produce good results,
an agent must first try a wide range of actions, i.e.,
the agent must explore to acquire knowledge.
- This is also frequently a challenge in real life.
- EXAMPLE: When choosing a restaurant, do you eat at
somewhere you've had a good meal in the past or try
somewhere new?
- EXAMPLE: When betting on horse races, do you bet on a
horse that won for you before or bet on a new horse?
- There are a wide variety of algorithmic techniques for
implementing RL and dealing with the issues mentioned
above, e.g., Temporal Difference Learning, Monte Carlo RL,
Q-Learning; please see Sutton and Barto (2018) (a truly
excellent textbook, available online for free to boot)
and Chapter 21 of RN3 for details.
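            - PYTHON SKETCH (illustrative): tabular Q-learning, one of the RL
              techniques named above. The environment interface (reset(),
              actions(state), and step(state, action) returning
              (next_state, reward, done)) and all parameter values are
              assumptions made for illustration.

                  import random
                  from collections import defaultdict

                  def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
                      q = defaultdict(float)     # value estimate for each (state, action) pair
                      for _ in range(episodes):
                          state, done = env.reset(), False
                          while not done:
                              # Exploration vs. exploitation: usually exploit the
                              # best-known action, occasionally explore a random one.
                              if random.random() < epsilon:
                                  action = random.choice(env.actions(state))
                              else:
                                  action = max(env.actions(state), key=lambda a: q[(state, a)])
                              next_state, reward, done = env.step(state, action)
                              best_next = 0.0 if done else max(q[(next_state, a)]
                                                               for a in env.actions(next_state))
                              # Temporal-difference update toward reward + discounted future value.
                              q[(state, action)] += alpha * (reward + gamma * best_next
                                                             - q[(state, action)])
                              state = next_state
                      return q   # a greedy policy reads off argmax_a q[(s, a)] at each state s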
- Potential exam questions
Thursday, November 21 (Lecture #18)
[DC3200 Lecture #19 and
Lecture #20;
RN4, Chapter 21; Class Notes]
Friday, November 22
Tuesday, November 26
- Guest Lecture: Data Representations as a Driving Force in
AI Research
(Dr. Rory Campbell, University of Western Ontario)[PDF]
Thursday, November 28 (Lecture #19)
[Class Notes]
- Epilogue: The Future of Algorithmic Methods for AI
References
- Adolfi, F., Vilas, M., and Wareham, T. (2024a) "`Complexity-Theoretic
Limits on the Promises of Artificial Neural Network
Reverse-Engineering." In the Proceedings of the 46th Annual Meeting
of the Cognitive Science Society (CogSci 2024).
- Adolfi, F., Vilas, M., and Wareham, T. (2024b) "The Computational
Complexity of Circuit Discovery for Inner Interpretability."
arXiv.2410.08025 (71 pages).
[PDF]
- Barcelo, P., Monet, M., Perez, J., and Subercaseaux, B. (2020) "Model
interpretability through the lens of computational complexity." In
Advances in neural information processing systems, 33, 15487-15498.
[PDF]
- Bassan, S., Amir, G., and Katz, G. (2024) "Local vs. Global
Interpretability: A Computational Complexity Perspective."
arXiv preprint arXiv:2406.02981.
[PDF]
- Bellogín, A., Grau, O., Larsson, S., Schimpf, G., Sengupta, B., and
Solmaz, G. (2024) "The EU AI Act and the Wager on Trustworthy AI."
Communications of the ACM, 67(12), 58-65.
[PDF]
- Buyl, M. and De Bie, T. (2024) "Inherent Limitations of AI Fairness."
Communications of the ACM, 67(2), 48-55. [HTML]
- Bylander, T. (1994) "The Computational Complexity of Propositional
STRIPS Planning." Artificial Intelligence, 69(1-2), 165-204.
- Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C. (2009)
Introduction to Algorithms. Third Edition. The MIT Press;
Cambridge, MA. [Abbreviated above as CLRS]
- Dhar, V. (2024) "The Paradigm Shifts in Artificial Intelligence."
Communications of the ACM, 67(11), 50-59. [HTML]
- Fikes, R.E. and Nilsson, N.J. (1971) "STRIPS: A New Approach to the
Application of Theorem Proving to Problem Solving." Artificial
Intelligence, 2(3-4), 189-208.
- Garey, M.R. and Johnson, D.S. (1979) Computers and
Intractability: A Guide to the Theory of NP-Completeness.
W.H. Freeman; San Francisco, CA.
- Guest, O. and Martin, A. E. (2023) "On logical inference over brains,
behaviour, and artificial neural networks." Computational Brain &
Behavior, 6(2), 213-227
[PDF]
- Guest, O. and Martin, A.E. (2024) "A Metatheory of Classical and
Modern Connectionism." Preprint. [PDF]
- Haigh, T. (2023a) "Conjoined Twins: Artificial Intelligence and the
Invention of Computer Science." Communications of the ACM,
66(6), 33-37. [HTML]
- Haigh, T. (2023b) "There was No 'First' AI Winter." Communications
of the ACM, 66(12), 35-39. [HTML]
- Haigh, T. (2024a) "How the AI Boom Went Bust." Communications
of the ACM, 67(2), 22-26. [HTML]
- Haigh, T. (2024b) "Between the Booms: AI in Winter." Communications
of the ACM, 67(11), 19-23. [HTML]
- Haigh, T. (2025) "Artificial Intelligence: Then and Now." Communications
of the ACM, 68(2), 24-29. [HTML]
- Huxley, J. (1957) New Bottles for New Wine. Chatto & Windus.
- Kedia, A. and Rasu, M. (2020) Hands-On Python Natural Language
Processing. Packt Publishing; Birmingham, UK.
- Livni, R., Shalev-Shwartz, S., and Shamir, O. (2014) "On the
computational efficiency of training neural networks." Advances
in Neural Information Processing Systems, 27, 1-9.
- Marcus, G. and Davis, E. (2019) Rebooting AI: Building Artificial
Intelligence We Can Trust. Pantheon Books; New York.
- Marcus, G. and Davis, E. (2021) "Insights for AI from the Human Mind",
Communications of the ACM, 64(1), 38-41. [Text]
- Newell, A. and Simon, H.A. (1956) The Logic Theory Machine: A
Complex Information Processing System. RAND Technical
Report P-868.
- Newell, A., Shaw, J.C., and Simon, H.A. (1959) Report on a General
Problem Solving Program. RAND Technical Report P-1584.
- Newell, A. and Simon, H.A. (1972) Human Problem Solving.
Prentice-Hall.
- Russell, S. and Norvig, P. (2010) Artificial Intelligence: A Modern
Approach (3rd Edition). Prentice-Hall.
- Russell, S. and Norvig, P. (2022) Artificial Intelligence: A Modern
Approach (4th Edition). Prentice-Hall.
- Schmidhuber, J. (2015) "Deep learning in neural networks: An
overview." Neural Networks, 61, 85-117.
- Seuss, Dr. (1970) One fish two fish red fish blue fish.
Random House.
- Shein, E. (2024) "The Impact of AI on Computer Science Education."
Communications of the ACM, 67(11), 13-15.
[HTML]
- Skinner, B.F. (1948) Walden Two. Hackett Publishing
Company.
- Song, L., Vempala, S., Wilmes, J., and Xie, B. (2017) "On the
complexity of learning neural networks." In Advances in Neural
Information Processing Systems, 30, 1-9.
- Sutton, R.S. and Barto, A.G. (2018) Reinforcement Learning: An
Introduction (Second Edition). The MIT Press.
[Website]
- Van Rooij, I., Guest, O., Adolfi, F., de Haan, R., Kolokolova, A.,
and Rich, P. (2024) "Reclaiming AI as a theoretical tool for
cognitive science." Computational Brain & Behavior, 1-21.
[PDF]
- Wareham, T. (2022) "Creating Teams of Simple Agents for Specified
Tasks: A Computational Complexity Perspective."
CoRR abs/2205.02061 (20 pages).
[PDF]
- Wareham, T. and Vardy, A. (2018) "Putting It Together: The
Computational Complexity of Designing Robot Controllers and
Environments for Distributed Construction." Swarm
Intelligence, 12(2), 111-128.
Created: June 26, 2024
Last Modified: January 23, 2025