
CHAPTER 1
INTRODUCTION

CHAPTER 2 INTELLIGENT AGENTS
function TABLE-DRIVEN-AGENT(percept) returns an action
  persistent: percepts, a sequence, initially empty
              table, a table of actions, indexed by percept sequences, initially fully specified

  append percept to the end of percepts
  action ← LOOKUP(percepts, table)
  return action

Figure 2.7 The TABLE-DRIVEN-AGENT program is invoked for each new percept and returns an action each time. It retains the complete percept sequence in memory.
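For concreteness, here is a minimal Python sketch of the same table-driven scheme; the percept encoding and the tiny two-location table are hypothetical, not part of the figure.

# Minimal sketch of TABLE-DRIVEN-AGENT: the full percept history is the lookup key.
# The percept encoding and the example table below are hypothetical.
def make_table_driven_agent(table):
    percepts = []                                  # complete percept sequence, kept in memory
    def agent(percept):
        percepts.append(percept)
        return table.get(tuple(percepts))          # action <- LOOKUP(percepts, table)
    return agent

# Tiny illustrative table for a two-location vacuum world.
table = {
    (("A", "Clean"),): "Right",
    (("A", "Dirty"),): "Suck",
    (("A", "Clean"), ("B", "Dirty")): "Suck",
}
agent = make_table_driven_agent(table)
print(agent(("A", "Clean")))   # -> Right
print(agent(("B", "Dirty")))   # -> Suck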
function REFLEX-VACUUM-AGENT([location, status]) returns an action
  if status = Dirty then return Suck
  else if location = A then return Right
  else if location = B then return Left

Figure 2.8 The agent program for a simple reflex agent in the two-location vacuum environment. This program implements the agent function tabulated in Figure ??.
function SIMPLE-REFLEX-AGENT(percept) returns an action
  persistent: rules, a set of condition–action rules

  state ← INTERPRET-INPUT(percept)
  rule ← RULE-MATCH(state, rules)
  action ← rule.ACTION
  return action
Figure 2.10 A simple reflex agent. It acts according to a rule whose condition matches the current state, as defined by the percept.

function MODEL-BASED-REFLEX-AGENT(percept) returns an action
  persistent: state, the agent's current conception of the world state
              transition_model, a description of how the next state depends on the current state and action
              sensor_model, a description of how the current world state is reflected in the agent's percepts
              rules, a set of condition–action rules
              action, the most recent action, initially none

  state ← UPDATE-STATE(state, action, percept, transition_model, sensor_model)
  rule ← RULE-MATCH(state, rules)
  action ← rule.ACTION
  return action
Figure 2.12 A model-based reflex agent. It keeps track of the current state of the world, using an internal model. It then chooses an action in the same way as the reflex agent.
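A minimal Python sketch of the same model-based reflex loop, with the world model reduced to a function that rebuilds the state from the latest percept; the rules and percept format are illustrative assumptions.

# Minimal sketch of MODEL-BASED-REFLEX-AGENT. The world model, rules, and percept
# format below are illustrative stand-ins for the abstract functions in the figure.
def make_model_based_reflex_agent(update_state, rules):
    state, action = None, None
    def agent(percept):
        nonlocal state, action
        state = update_state(state, action, percept)      # UPDATE-STATE(...)
        for condition, act in rules:                      # RULE-MATCH(state, rules)
            if condition(state):
                action = act
                return action
        return None
    return agent

# Illustrative two-location vacuum world: the state just records the latest percept.
rules = [
    (lambda s: s["status"] == "Dirty", "Suck"),
    (lambda s: s["location"] == "A", "Right"),
    (lambda s: s["location"] == "B", "Left"),
]
agent = make_model_based_reflex_agent(
    lambda state, action, percept: {"location": percept[0], "status": percept[1]}, rules)
print(agent(("A", "Dirty")))   # -> Suck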

CHAPTER 3
SOLVING PROBLEMS BY SEARCHING
function BEST-FIRST-SEARCH(problem, f) returns a solution node or failure
  node ← NODE(STATE=problem.INITIAL)
  frontier ← a priority queue ordered by f, with node as an element
  reached ← a lookup table, with one entry with key problem.INITIAL and value node
  while not IS-EMPTY(frontier) do
    node ← POP(frontier)
    if problem.IS-GOAL(node.STATE) then return node
    for each child in EXPAND(problem, node) do
      s ← child.STATE
      if s is not in reached or child.PATH-COST < reached[s].PATH-COST then
        reached[s] ← child
        add child to frontier
  return failure

function EXPAND(problem, node) yields nodes
  s ← node.STATE
  for each action in problem.ACTIONS(s) do
    s′ ← problem.RESULT(s, action)
    cost ← node.PATH-COST + problem.ACTION-COST(s, action, s′)
    yield NODE(STATE=s′, PARENT=node, ACTION=action, PATH-COST=cost)

Figure 3.7 The best-first search algorithm, and the function for expanding a node. The data structures used here are described in Section ??. See Appendix B for yield.

function BREADTH-FIRST-SEARCH(problem) returns a solution node or failure
  node ← NODE(problem.INITIAL)
  if problem.IS-GOAL(node.STATE) then return node
  frontier ← a FIFO queue, with node as an element
  reached ← {problem.INITIAL}
  while not IS-EMPTY(frontier) do
    node ← POP(frontier)
    for each child in EXPAND(problem, node) do
      s ← child.STATE
      if problem.IS-GOAL(s) then return child
      if s is not in reached then
        add s to reached
        add child to frontier
  return failure

function UNIFORM-COST-SEARCH(problem) returns a solution node, or failure
  return BEST-FIRST-SEARCH(problem, PATH-COST)

Figure 3.9 Breadth-first search and uniform-cost search algorithms.

function ITERATIVE-DEEPENING-SEARCH(problem) returns a solution node or failure
  for depth = 0 to ∞ do
    result ← DEPTH-LIMITED-SEARCH(problem, depth)
    if result ≠ cutoff then return result

function DEPTH-LIMITED-SEARCH(problem, l) returns a node or failure or cutoff
  frontier ← a LIFO queue (stack) with NODE(problem.INITIAL) as an element
  result ← failure
  while not IS-EMPTY(frontier) do
    node ← POP(frontier)
    if problem.IS-GOAL(node.STATE) then return node
    if DEPTH(node) > l then
      result ← cutoff
    else if not IS-CYCLE(node) do
      for each child in EXPAND(problem, node) do
        add child to frontier
  return result

Figure 3.12 Iterative deepening and depth-limited tree-like search. Iterative deepening repeatedly applies depth-limited search with increasing limits. It returns one of three different types of values: either a solution node; or failure, when it has exhausted all nodes and proved there is no solution at any depth; or cutoff, to mean there might be a solution at a deeper depth than l. This is a tree-like search algorithm that does not keep track of reached states, and thus uses much less memory than best-first search, but runs the risk of visiting the same state multiple times on different paths. Also, if the IS-CYCLE check does not check all cycles, then the algorithm may get caught in a loop.

function BIBF-SEARCH(problemF, fF, problemB, fB) returns a solution node, or failure
  nodeF ← NODE(problemF.INITIAL)    // Node for a start state
  nodeB ← NODE(problemB.INITIAL)    // Node for a goal state
  frontierF ← a priority queue ordered by fF, with nodeF as an element
  frontierB ← a priority queue ordered by fB, with nodeB as an element
  reachedF ← a lookup table, with one key nodeF.STATE and value nodeF
  reachedB ← a lookup table, with one key nodeB.STATE and value nodeB
  solution ← failure
  while not TERMINATED(solution, frontierF, frontierB) do
    if fF(TOP(frontierF)) < fB(TOP(frontierB)) then
      solution ← PROCEED(F, problemF, frontierF, reachedF, reachedB, solution)
    else solution ← PROCEED(B, problemB, frontierB, reachedB, reachedF, solution)
  return solution

function PROCEED(dir, problem, frontier, reached, reached2, solution) returns a solution
  // Expand node on frontier; check against the other frontier in reached2.
  // The variable "dir" is the direction: either F for forward or B for backward.
  node ← POP(frontier)
  for each child in EXPAND(problem, node) do
    s ← child.STATE
    if s not in reached or PATH-COST(child) < PATH-COST(reached[s]) then
      reached[s] ← child
      add child to frontier
      if s is in reached2 then
        solution2 ← JOIN-NODES(dir, child, reached2[s])
        if PATH-COST(solution2) < PATH-COST(solution) then solution ← solution2
  return solution

Figure 3.14 Bidirectional best-first search keeps two frontiers and two tables of reached states. When a path in one frontier reaches a state that was also reached in the other half of the search, the two paths are joined (by the function JOIN-NODES) to form a solution. The first solution we get is not guaranteed to be the best; the function TERMINATED determines when to stop looking for new solutions.

function RECURSIVE-BEST-FIRST-SEARCH(problem) returns a solution or failure
  solution, fvalue ← RBFS(problem, NODE(problem.INITIAL), ∞)
  return solution

function RBFS(problem, node, f_limit) returns a solution or failure, and a new f-cost limit
  if problem.IS-GOAL(node.STATE) then return node
  successors ← LIST(EXPAND(node))
  if successors is empty then return failure, ∞
  for each s in successors do    // update f with value from previous search
    s.f ← max(s.PATH-COST + h(s), node.f)
  while true do
    best ← the node in successors with lowest f-value
    if best.f > f_limit then return failure, best.f
    alternative ← the second-lowest f-value among successors
    result, best.f ← RBFS(problem, best, min(f_limit, alternative))
    if result ≠ failure then return result, best.f

Figure 3.22 The algorithm for recursive best-first search.

CHAPTER 4
SEARCH IN COMPLEX ENVIRONMENTS
function HILL-CLIMBING(problem) returns a state that is a local maximum
  current ← problem.INITIAL
  while true do
    neighbor ← a highest-valued successor state of current
    if VALUE(neighbor) ≤ VALUE(current) then return current
    current ← neighbor

Figure 4.2 The hill-climbing search algorithm, which is the most basic local search technique. At each step the current node is replaced by the best neighbor.
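A minimal Python sketch of hill climbing on a one-dimensional integer landscape; the objective function and neighborhood are illustrative assumptions.

# Minimal sketch of HILL-CLIMBING. The objective and neighborhood are illustrative.
def hill_climbing(value, neighbors, start):
    current = start
    while True:
        neighbor = max(neighbors(current), key=value)       # highest-valued successor
        if value(neighbor) <= value(current):
            return current                                   # local maximum
        current = neighbor

value = lambda x: -(x - 7) ** 2          # peak at x = 7
neighbors = lambda x: [x - 1, x + 1]
print(hill_climbing(value, neighbors, start=0))   # -> 7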
function SIMULATED-ANNEALING(problem, schedule) returns a solution state
  current ← problem.INITIAL
  for t = 1 to ∞ do
    T ← schedule(t)
    if T = 0 then return current
    next ← a randomly selected successor of current
    ΔE ← VALUE(current) − VALUE(next)
    if ΔE > 0 then current ← next
    else current ← next only with probability e^(−ΔE/T)

Figure 4.4 The simulated annealing algorithm, a version of stochastic hill climbing where some downhill moves are allowed. The schedule input determines the value of the "temperature" T as a function of time.
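A minimal Python sketch of simulated annealing on the same kind of landscape, maximizing the value and accepting a downhill move with probability exp(ΔE/T) where ΔE = value(next) − value(current); the objective, neighborhood, and cooling schedule are illustrative assumptions.

import math, random

# Minimal sketch of simulated annealing, maximizing value(x).
def simulated_annealing(value, neighbors, start, schedule):
    current = start
    for t in range(1, 10000):
        T = schedule(t)
        if T <= 0:
            return current
        nxt = random.choice(neighbors(current))
        dE = value(nxt) - value(current)
        if dE > 0 or random.random() < math.exp(dE / T):    # always accept uphill moves
            current = nxt
    return current

value = lambda x: -(x - 7) ** 2
neighbors = lambda x: [x - 1, x + 1]
schedule = lambda t: max(0.0, 2.0 * (1 - t / 2000))   # linear cooling, an assumption
print(simulated_annealing(value, neighbors, 0, schedule))   # usually near 7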

function GENETIC-ALGORITHM(population, fitness) returns an individual
  repeat
    weights ← WEIGHTED-BY(population, fitness)
    population2 ← empty list
    for i = 1 to SIZE(population) do
      parent1, parent2 ← WEIGHTED-RANDOM-CHOICES(population, weights, 2)
      child ← REPRODUCE(parent1, parent2)
      if (small random probability) then child ← MUTATE(child)
      add child to population2
    population ← population2
  until some individual is fit enough, or enough time has elapsed
  return the best individual in population, according to fitness

function REPRODUCE(parent1, parent2) returns an individual
  n ← LENGTH(parent1)
  c ← random number from 1 to n
  return APPEND(SUBSTRING(parent1, 1, c), SUBSTRING(parent2, c+1, n))

Figure 4.7 A genetic algorithm. Within the function, population is an ordered list of individuals, weights is a list of corresponding fitness values for each individual, and fitness is a function to compute these values.
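A minimal Python sketch of the genetic algorithm on the toy "max-ones" problem, where individuals are bit strings and fitness counts the 1s; the population size, mutation rate, and generation limit are illustrative assumptions.

import random

# Minimal sketch of GENETIC-ALGORITHM on the max-ones problem (illustrative parameters).
def reproduce(p1, p2):
    c = random.randint(1, len(p1) - 1)                 # single crossover point
    return p1[:c] + p2[c:]

def genetic_algorithm(population, fitness, mutation_rate=0.05, generations=200):
    for _ in range(generations):
        weights = [fitness(ind) for ind in population]
        population2 = []
        for _ in range(len(population)):
            p1, p2 = random.choices(population, weights=weights, k=2)
            child = reproduce(p1, p2)
            if random.random() < mutation_rate:        # MUTATE: flip one random bit
                i = random.randrange(len(child))
                child[i] = 1 - child[i]
            population2.append(child)
        population = population2
        if max(map(fitness, population)) == len(population[0]):
            break                                      # some individual is fit enough
    return max(population, key=fitness)

population = [[random.randint(0, 1) for _ in range(12)] for _ in range(30)]
print(genetic_algorithm(population, fitness=sum))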
function AND-OR-SEARCH(problem) returns a conditional plan, or failure
  return OR-SEARCH(problem, problem.INITIAL, [])

function OR-SEARCH(problem, state, path) returns a conditional plan, or failure
  if problem.IS-GOAL(state) then return the empty plan
  if IS-CYCLE(path) then return failure
  for each action in problem.ACTIONS(state) do
    plan ← AND-SEARCH(problem, RESULTS(state, action), [state] + path)
    if plan ≠ failure then return [action] + plan
  return failure

function AND-SEARCH(problem, states, path) returns a conditional plan, or failure
  for each si in states do
    plani ← OR-SEARCH(problem, si, path)
    if plani = failure then return failure
  return [if s1 then plan1 else if s2 then plan2 else . . . if sn−1 then plann−1 else plann]

Figure 4.10 An algorithm for searching AND–OR graphs generated by nondeterministic environments. A solution is a conditional plan that considers every nondeterministic outcome and makes a plan for each one.
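A minimal Python sketch of AND-OR search, returning failure as None, the empty plan as an empty list, and an AND node's conditional plan as a dictionary from outcome states to subplans; the toy nondeterministic problem is an illustrative assumption.

# Minimal sketch of AND-OR search over a hand-coded nondeterministic problem.
def and_or_search(problem):
    return or_search(problem, problem["initial"], [])

def or_search(problem, state, path):
    if state in problem["goals"]:
        return []                                   # the empty plan
    if state in path:
        return None                                 # cycle -> failure
    for action in problem["actions"](state):
        plan = and_search(problem, problem["results"](state, action), [state] + path)
        if plan is not None:
            return [action, plan]                   # action followed by the conditional plan
    return None

def and_search(problem, states, path):
    plans = {}
    for s in states:
        plan = or_search(problem, s, path)
        if plan is None:
            return None
        plans[s] = plan
    return plans                                    # "if s1 then plan1 else if s2 ..." as a dict

# Toy erratic-style problem: action "a" from state 1 may land in state 2 or 3 (illustrative).
problem = {
    "initial": 1,
    "goals": {4},
    "actions": lambda s: ["a", "b"] if s != 4 else [],
    "results": lambda s, a: {("a", 1): {2, 3}, ("b", 1): {1},
                             ("a", 2): {4}, ("b", 2): {1},
                             ("a", 3): {4}, ("b", 3): {4}}[(a, s)],
}
print(and_or_search(problem))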

function ONLINE-DFS-AGENT(problem, s′) returns an action
  persistent: result, a table mapping (s, a) to s′, initially empty
              untried, a table mapping s to a list of untried actions
              unbacktracked, a table mapping s to a list of states never backtracked to
              s, a, the previous state and action, initially null

  if problem.IS-GOAL(s′) then return stop
  if s′ is a new state (not in untried) then untried[s′] ← problem.ACTIONS(s′)
  if s is not null then
    result[s, a] ← s′
    add s to the front of unbacktracked[s′]
  if untried[s′] is empty then
    if unbacktracked[s′] is empty then return stop
    else a ← an action b such that result[s′, b] = POP(unbacktracked[s′])
  else a ← POP(untried[s′])
  s ← s′
  return a

Figure 4.20 An online search agent that uses depth-first exploration. The agent can safely explore only in state spaces in which every action can be "undone" by some other action.
function LRTA*-AGENT(problem, s′, h) returns an action
  persistent: result, a table mapping (s, a) to s′, initially empty
              H, a table mapping s to a cost estimate, initially empty
              s, a, the previous state and action, initially null

  if IS-GOAL(s′) then return stop
  if s′ is a new state (not in H) then H[s′] ← h(s′)
  if s is not null then
    result[s, a] ← s′
    H[s] ← min_{b ∈ ACTIONS(s)} LRTA*-COST(s, b, result[s, b], H)
  a ← argmin_{b ∈ ACTIONS(s′)} LRTA*-COST(problem, s′, b, result[s′, b], H)
  s ← s′
  return a

function LRTA*-COST(problem, s, a, s′, H) returns a cost estimate
  if s′ is undefined then return h(s)
  else return problem.ACTION-COST(s, a, s′) + H[s′]

Figure 4.23 LRTA*-AGENT selects an action according to the values of neighboring states, which are updated as the agent moves about the state space.

CHAPTER 5
ADVERSARIAL SEARCH AND GAMES
function MINIMAX-SEARCH(game, state) returns an action
  player ← game.TO-MOVE(state)
  value, move ← MAX-VALUE(game, state)
  return move

function MAX-VALUE(game, state) returns a (utility, move) pair
  if game.IS-TERMINAL(state) then return game.UTILITY(state, player), null
  v ← −∞
  for each a in game.ACTIONS(state) do
    v2, a2 ← MIN-VALUE(game, game.RESULT(state, a))
    if v2 > v then
      v, move ← v2, a
  return v, move

function MIN-VALUE(game, state) returns a (utility, move) pair
  if game.IS-TERMINAL(state) then return game.UTILITY(state, player), null
  v ← +∞
  for each a in game.ACTIONS(state) do
    v2, a2 ← MAX-VALUE(game, game.RESULT(state, a))
    if v2 < v then
      v, move ← v2, a
  return v, move

Figure 5.3 An algorithm for calculating the optimal move using minimax.

function ALPHA-BETA-SEARCH(game, state) returns an action
  player ← game.TO-MOVE(state)
  value, move ← MAX-VALUE(game, state, −∞, +∞)
  return move

function MAX-VALUE(game, state, α, β) returns a (utility, move) pair
  if game.IS-TERMINAL(state) then return game.UTILITY(state, player), null
  v ← −∞
  for each a in game.ACTIONS(state) do
    v2, a2 ← MIN-VALUE(game, game.RESULT(state, a), α, β)
    if v2 > v then
      v, move ← v2, a
      α ← MAX(α, v)
    if v ≥ β then return v, move
  return v, move
function MIN-VALUE(game, state, α, β) returns a (utility, move) pair
  if game.IS-TERMINAL(state) then return game.UTILITY(state, player), null
  v ← +∞
  for each a in game.ACTIONS(state) do
    v2, a2 ← MAX-VALUE(game, game.RESULT(state, a), α, β)
    if v2 < v then
      v, move ← v2, a
      β ← MIN(β, v)
    if v ≤ α then return v, move
  return v, move

Figure 5.7 The alpha–beta search algorithm. Notice that these functions are the same as the MINIMAX-SEARCH functions in Figure ??, except that we maintain bounds in the variables α and β, and use them to cut off search when a value is outside the bounds.

function MONTE-CARLO-TREE-SEARCH(state) returns an action
  tree ← NODE(state)
  while IS-TIME-REMAINING() do
    leaf ← SELECT(tree)
    child ← EXPAND(leaf)
    result ← SIMULATE(child)
    BACK-PROPAGATE(result, child)
  return the move in ACTIONS(state) whose node has highest number of playouts

Figure 5.11 The Monte Carlo tree search algorithm. A game tree, tree, is initialized, and then we repeat a cycle of SELECT / EXPAND / SIMULATE / BACK-PROPAGATE until we run out of time, and return the move that led to the node with the highest number of playouts.

CHAPTER 6
CONSTRAINT SATISFACTION PROBLEMS

function AC-3(csp) returns false if an inconsistency is found and true otherwise
  queue ← a queue of arcs, initially all the arcs in csp
  while queue is not empty do
    (Xi, Xj) ← POP(queue)
    if REVISE(csp, Xi, Xj) then
      if size of Di = 0 then return false
      for each Xk in Xi.NEIGHBORS − {Xj} do
        add (Xk, Xi) to queue
  return true

function REVISE(csp, Xi, Xj) returns true iff we revise the domain of Xi
  revised ← false
  for each x in Di do
    if no value y in Dj allows (x, y) to satisfy the constraint between Xi and Xj then
      delete x from Di
      revised ← true
  return revised

Figure 6.3 The arc-consistency algorithm AC-3. After applying AC-3, either every arc is arc-consistent, or some variable has an empty domain, indicating that the CSP cannot be solved. The name "AC-3" was used by the algorithm's inventor (?) because it was the third version developed in the paper.

function BACKTRACKING-SEARCH(csp) returns a solution or failure
  return BACKTRACK(csp, { })

function BACKTRACK(csp, assignment) returns a solution or failure
  if assignment is complete then return assignment
  var ← SELECT-UNASSIGNED-VARIABLE(csp, assignment)
  for each value in ORDER-DOMAIN-VALUES(csp, var, assignment) do
    if value is consistent with assignment then
      add {var = value} to assignment
      inferences ← INFERENCE(csp, var, assignment)
      if inferences ≠ failure then
        add inferences to csp
        result ← BACKTRACK(csp, assignment)
        if result ≠ failure then return result
        remove inferences from csp
      remove {var = value} from assignment
  return failure

Figure 6.5 A simple backtracking algorithm for constraint satisfaction problems. The algorithm is modeled on the recursive depth-first search of Chapter ??. The functions SELECT-UNASSIGNED-VARIABLE and ORDER-DOMAIN-VALUES implement the general-purpose heuristics discussed in Section ??. The INFERENCE function can optionally impose arc-, path-, or k-consistency, as desired. If a value choice leads to failure (noticed either by INFERENCE or by BACKTRACK), then value assignments (including those made by INFERENCE) are retracted and a new value is tried.
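A minimal Python sketch of backtracking search, without inference or ordering heuristics, on the classic Australia map-coloring CSP; the problem encoding is an illustrative assumption.

# Minimal sketch of BACKTRACKING-SEARCH on a binary CSP with "not equal" constraints.
def backtracking_search(variables, domains, neighbors):
    def consistent(var, value, assignment):
        return all(assignment.get(n) != value for n in neighbors[var])

    def backtrack(assignment):
        if len(assignment) == len(variables):
            return assignment
        var = next(v for v in variables if v not in assignment)   # SELECT-UNASSIGNED-VARIABLE
        for value in domains[var]:                                # ORDER-DOMAIN-VALUES
            if consistent(var, value, assignment):
                assignment[var] = value
                result = backtrack(assignment)
                if result is not None:
                    return result
                del assignment[var]
        return None                                               # failure

    return backtrack({})

variables = ["WA", "NT", "SA", "Q", "NSW", "V", "T"]
domains = {v: ["red", "green", "blue"] for v in variables}
neighbors = {"WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"], "SA": ["WA", "NT", "Q", "NSW", "V"],
             "Q": ["NT", "SA", "NSW"], "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": []}
print(backtracking_search(variables, domains, neighbors))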
function MIN-CONFLICTS(csp, max_steps) returns a solution or failure
  inputs: csp, a constraint satisfaction problem
          max_steps, the number of steps allowed before giving up

  current ← an initial complete assignment for csp
  for i = 1 to max_steps do
    if current is a solution for csp then return current
    var ← a randomly chosen conflicted variable from csp.VARIABLES
    value ← the value v for var that minimizes CONFLICTS(csp, var, v, current)
    set var = value in current
  return failure

Figure 6.9 The MIN-CONFLICTS local search algorithm for CSPs. The initial state may be chosen randomly or by a greedy assignment process that chooses a minimal-conflict value for each variable in turn. The CONFLICTS function counts the number of constraints violated by a particular value, given the rest of the current assignment.

function TREE-CSP-SOLVER(csp) returns a solution, or failure
  inputs: csp, a CSP with components X, D, C

  n ← number of variables in X
  assignment ← an empty assignment
  root ← any variable in X
  X ← TOPOLOGICALSORT(X, root)
  for j = n down to 2 do
    MAKE-ARC-CONSISTENT(PARENT(Xj), Xj)
    if it cannot be made consistent then return failure
  for i = 1 to n do
    assignment[Xi] ← any consistent value from Di
    if there is no consistent value then return failure
  return assignment

Figure 6.11 The TREE-CSP-SOLVER algorithm for solving tree-structured CSPs. If the CSP has a solution, we will find it in linear time; if not, we will detect a contradiction.

CHAPTER 7
LOGICAL AGENTS

function KB-AGENT(percept) returns an action
  persistent: KB, a knowledge base
              t, a counter, initially 0, indicating time

  TELL(KB, MAKE-PERCEPT-SENTENCE(percept, t))
  action ← ASK(KB, MAKE-ACTION-QUERY(t))
  TELL(KB, MAKE-ACTION-SENTENCE(action, t))
  t ← t + 1
  return action

Figure 7.1 A generic knowledge-based agent. Given a percept, the agent adds the percept to its knowledge base, asks the knowledge base for the best action, and tells the knowledge base that it has in fact taken that action.

function TT-ENTAILS?(KB, α) returns true or false
  inputs: KB, the knowledge base, a sentence in propositional logic
          α, the query, a sentence in propositional logic

  symbols ← a list of the proposition symbols in KB and α
  return TT-CHECK-ALL(KB, α, symbols, { })

function TT-CHECK-ALL(KB, α, symbols, model) returns true or false
  if EMPTY?(symbols) then
    if PL-TRUE?(KB, model) then return PL-TRUE?(α, model)
    else return true    // when KB is false, always return true
  else
    P ← FIRST(symbols)
    rest ← REST(symbols)
    return (TT-CHECK-ALL(KB, α, rest, model ∪ {P = true})
            and
            TT-CHECK-ALL(KB, α, rest, model ∪ {P = false}))

Figure 7.10 A truth-table enumeration algorithm for deciding propositional entailment. (TT stands for truth table.) PL-TRUE? returns true if a sentence holds within a model. The variable model represents a partial model—an assignment to some of the symbols. The keyword and here is an infix function symbol in the pseudocode programming language, not an operator in propositional logic; it takes two arguments and returns true or false.

function PL-RESOLUTION(KB, α) returns true or false
  inputs: KB, the knowledge base, a sentence in propositional logic
          α, the query, a sentence in propositional logic

  clauses ← the set of clauses in the CNF representation of KB ∧ ¬α
  new ← { }
  while true do
    for each pair of clauses Ci, Cj in clauses do
      resolvents ← PL-RESOLVE(Ci, Cj)
      if resolvents contains the empty clause then return true
      new ← new ∪ resolvents
    if new ⊆ clauses then return false
    clauses ← clauses ∪ new

Figure 7.13 A simple resolution algorithm for propositional logic.
PL-RESOLVE returns the set of all possible clauses obtained by resolving its two inputs.

function PL-FC-ENTAILS?(KB, q) returns true or false
  inputs: KB, the knowledge base, a set of propositional definite clauses
          q, the query, a proposition symbol

  count ← a table, where count[c] is initially the number of symbols in clause c's premise
  inferred ← a table, where inferred[s] is initially false for all symbols
  queue ← a queue of symbols, initially symbols known to be true in KB
  while queue is not empty do
    p ← POP(queue)
    if p = q then return true
    if inferred[p] = false then
      inferred[p] ← true
      for each clause c in KB where p is in c.PREMISE do
        decrement count[c]
        if count[c] = 0 then add c.CONCLUSION to queue
  return false

Figure 7.15 The forward-chaining algorithm for propositional logic. The agenda keeps track of symbols known to be true but not yet "processed." The count table keeps track of how many premises of each implication are not yet proven. Whenever a new symbol p from the agenda is processed, the count is reduced by one for each implication in whose premise p appears (easily identified in constant time with appropriate indexing). If a count reaches zero, all the premises of the implication are known, so its conclusion can be added to the agenda. Finally, we need to keep track of which symbols have been processed; a symbol that is already in the set of inferred symbols need not be added to the agenda again. This avoids redundant work and prevents loops caused by implications such as P ⇒ Q and Q ⇒ P.

function DPLL-SATISFIABLE?(s) returns true or false
  inputs: s, a sentence in propositional logic

  clauses ← the set of clauses in the CNF representation of s
  symbols ← a list of the proposition symbols in s
  return DPLL(clauses, symbols, { })

function DPLL(clauses, symbols, model) returns true or false
  if every clause in clauses is true in model then return true
  if some clause in clauses is false in model then return false
  P, value ← FIND-PURE-SYMBOL(symbols, clauses, model)
  if P is non-null then return DPLL(clauses, symbols – P, model ∪ {P=value})
  P, value ← FIND-UNIT-CLAUSE(clauses, model)
  if P is non-null then return DPLL(clauses, symbols – P, model ∪ {P=value})
  P ← FIRST(symbols); rest ← REST(symbols)
  return DPLL(clauses, rest, model ∪ {P=true}) or DPLL(clauses, rest, model ∪ {P=false})

Figure 7.17 The DPLL algorithm for checking satisfiability of a sentence in propositional logic. The ideas behind FIND-PURE-SYMBOL and FIND-UNIT-CLAUSE are described in the text; each returns a symbol (or null) and the truth value to assign to that symbol. Like TT-ENTAILS?, DPLL operates over partial models.

function WALKSAT(clauses, p, max_flips) returns a satisfying model or failure
  inputs: clauses, a set of clauses in propositional logic
          p, the probability of choosing to do a "random walk" move, typically around 0.5
          max_flips, number of value flips allowed before giving up

  model ← a random assignment of true/false to the symbols in clauses
  for each i = 1 to max_flips do
    if model satisfies clauses then return model
    clause ← a randomly selected clause from clauses that is false in model
    if RANDOM(0, 1) ≤ p then
      flip the value in model of a randomly selected symbol from clause
    else flip whichever symbol in clause maximizes the number of satisfied clauses
  return failure

Figure 7.18 The WALKSAT algorithm for checking satisfiability by randomly flipping the values of variables. Many versions of the algorithm exist.
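A minimal Python sketch of WALKSAT, with clauses written as lists of signed integers in the DIMACS style; the flip limit and the small test formula are illustrative assumptions.

import random

# Minimal sketch of WALKSAT. A positive literal n means symbol n is true,
# a negative literal -n means symbol n is false.
def walksat(clauses, p=0.5, max_flips=10000):
    symbols = {abs(l) for clause in clauses for l in clause}
    model = {s: random.choice([True, False]) for s in symbols}
    satisfied = lambda clause: any(model[abs(l)] == (l > 0) for l in clause)
    for _ in range(max_flips):
        unsatisfied = [c for c in clauses if not satisfied(c)]
        if not unsatisfied:
            return model
        clause = random.choice(unsatisfied)
        if random.random() <= p:
            s = abs(random.choice(clause))            # random walk move
        else:                                         # greedy move: maximize satisfied clauses
            def score(sym):
                model[sym] = not model[sym]
                n = sum(satisfied(c) for c in clauses)
                model[sym] = not model[sym]
                return n
            s = max((abs(l) for l in clause), key=score)
        model[s] = not model[s]
    return None                                       # failure

# (A or not B) and (B or C) and (not A or not C), an illustrative instance.
print(walksat([[1, -2], [2, 3], [-1, -3]]))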
function HYBRID-WUMPUS-AGENT(percept) returns an action
  inputs: percept, a list, [stench, breeze, glitter, bump, scream]
  persistent: KB, a knowledge base, initially the atemporal "wumpus physics"
              t, a counter, initially 0, indicating time
              plan, an action sequence, initially empty

  TELL(KB, MAKE-PERCEPT-SENTENCE(percept, t))
  TELL the KB the temporal "physics" sentences for time t
  safe ← {[x, y] : ASK(KB, OK^t_{x,y}) = true}
  if ASK(KB, Glitter^t) = true then
    plan ← [Grab] + PLAN-ROUTE(current, {[1,1]}, safe) + [Climb]
  if plan is empty then
    unvisited ← {[x, y] : ASK(KB, L^t′_{x,y}) = false for all t′ ≤ t}
    plan ← PLAN-ROUTE(current, unvisited ∩ safe, safe)
  if plan is empty and ASK(KB, HaveArrow^t) = true then
    possible_wumpus ← {[x, y] : ASK(KB, ¬W_{x,y}) = false}
    plan ← PLAN-SHOT(current, possible_wumpus, safe)
  if plan is empty then    // no choice but to take a risk
    not_unsafe ← {[x, y] : ASK(KB, ¬OK^t_{x,y}) = false}
    plan ← PLAN-ROUTE(current, unvisited ∩ not_unsafe, safe)
  if plan is empty then
    plan ← PLAN-ROUTE(current, {[1, 1]}, safe) + [Climb]
  action ← POP(plan)
  TELL(KB, MAKE-ACTION-SENTENCE(action, t))
  t ← t + 1
  return action

function PLAN-ROUTE(current, goals, allowed) returns an action sequence
  inputs: current, the agent's current position
          goals, a set of squares; try to plan a route to one of them
          allowed, a set of squares that can form part of the route

  problem ← ROUTE-PROBLEM(current, goals, allowed)
  return SEARCH(problem)    // Any search algorithm from Chapter ??

Figure 7.20 A hybrid agent program for the wumpus world. It uses a propositional knowledge base to infer the state of the world, and a combination of problem-solving search and domain-specific code to choose actions. Each time HYBRID-WUMPUS-AGENT is called, it adds the percept to the knowledge base, and then either relies on a previously defined plan or creates a new plan, and pops off the first step of the plan as the action to do next.

function SATPLAN(init, transition, goal, T_max) returns solution or failure
  inputs: init, transition, goal, constitute a description of the problem
          T_max, an upper limit for plan length

  for t = 0 to T_max do
    cnf ← TRANSLATE-TO-SAT(init, transition, goal, t)
    model ← SAT-SOLVER(cnf)
    if model is not null then return EXTRACT-SOLUTION(model)
  return failure

Figure 7.22 The SATPLAN algorithm. The planning problem is translated into a CNF sentence in which the goal is asserted to hold at a fixed time step t and axioms are included for each time step up to t. If the satisfiability algorithm finds a model, then a plan is extracted by looking at those proposition symbols that refer to actions and are assigned true in the model. If no model exists, then the process is repeated with the goal moved one step later.

CHAPTER 8
FIRST-ORDER LOGIC

CHAPTER 9
INFERENCE IN FIRST-ORDER LOGIC

function UNIFY(x, y, θ=empty) returns a substitution to make x and y identical, or failure
  if θ = failure then return failure
  else if x = y then return θ
  else if VARIABLE?(x) then return UNIFY-VAR(x, y, θ)
  else if VARIABLE?(y) then return UNIFY-VAR(y, x, θ)
  else if COMPOUND?(x) and COMPOUND?(y) then
    return UNIFY(ARGS(x), ARGS(y), UNIFY(OP(x), OP(y), θ))
  else if LIST?(x) and LIST?(y) then
    return UNIFY(REST(x), REST(y), UNIFY(FIRST(x), FIRST(y), θ))
  else return failure

function UNIFY-VAR(var, x, θ) returns a substitution
  if {var/val} ∈ θ for some val then return UNIFY(val, x, θ)
  else if {x/val} ∈ θ for some val then return UNIFY(var, val, θ)
  else if OCCUR-CHECK?(var, x) then return failure
  else return add {var/x} to θ

Figure 9.1 The unification algorithm.
The arguments x and y can be any expression: a constant or variable, or a compound expression such as a complex sentence or term, or a list of expressions. The argument θ is a substitution, initially the empty substitution, but with {var/val} pairs added to it as we recurse through the inputs, comparing the expressions element by element. In a compound expression such as F(A, B), OP(x) picks out the function symbol F and ARGS(x) picks out the argument list (A, B).

function FOL-FC-ASK(KB, α) returns a substitution or false
  inputs: KB, the knowledge base, a set of first-order definite clauses
          α, the query, an atomic sentence

  while true do
    new ← { }    // The set of new sentences inferred on each iteration
    for each rule in KB do
      (p1 ∧ . . . ∧ pn ⇒ q) ← STANDARDIZE-VARIABLES(rule)
      for each θ such that SUBST(θ, p1 ∧ . . . ∧ pn) = SUBST(θ, p′1 ∧ . . . ∧ p′n)
              for some p′1, . . . , p′n in KB
        q′ ← SUBST(θ, q)
        if q′ does not unify with some sentence already in KB or new then
          add q′ to new
          φ ← UNIFY(q′, α)
          if φ is not failure then return φ
    if new = { } then return false
    add new to KB

Figure 9.3 A conceptually straightforward, but inefficient, forward-chaining algorithm. On each iteration, it adds to KB all the atomic sentences that can be inferred in one step from the implication sentences and the atomic sentences already in KB. The function STANDARDIZE-VARIABLES replaces all variables in its arguments with new ones that have not been used before.

function FOL-BC-ASK(KB, query) returns a generator of substitutions
  return FOL-BC-OR(KB, query, { })

function FOL-BC-OR(KB, goal, θ) returns a substitution
  for each rule in FETCH-RULES-FOR-GOAL(KB, goal) do
    (lhs ⇒ rhs) ← STANDARDIZE-VARIABLES(rule)
    for each θ′ in FOL-BC-AND(KB, lhs, UNIFY(rhs, goal, θ)) do
      yield θ′

function FOL-BC-AND(KB, goals, θ) returns a substitution
  if θ = failure then return
  else if LENGTH(goals) = 0 then yield θ
  else
    first, rest ← FIRST(goals), REST(goals)
    for each θ′ in FOL-BC-OR(KB, SUBST(θ, first), θ) do
      for each θ′′ in FOL-BC-AND(KB, rest, θ′) do
        yield θ′′

Figure 9.6 A simple backward-chaining algorithm for first-order knowledge bases.

procedure APPEND(ax, y, az, continuation)
  trail ← GLOBAL-TRAIL-POINTER()
  if ax = [ ] and UNIFY(y, az) then CALL(continuation)
  RESET-TRAIL(trail)
  a, x, z ← NEW-VARIABLE(), NEW-VARIABLE(), NEW-VARIABLE()
  if UNIFY(ax, [a] + x) and UNIFY(az, [a | z]) then APPEND(x, y, z, continuation)

Figure 9.8 Pseudocode representing the result of compiling the Append predicate. The function NEW-VARIABLE returns a new variable, distinct from all other variables used so far. The procedure CALL(continuation) continues execution with the specified continuation.

CHAPTER 10
KNOWLEDGE REPRESENTATION

CHAPTER 11
AUTOMATED PLANNING

Init(At(C1, SFO) ∧ At(C2, JFK) ∧ At(P1, SFO) ∧ At(P2, JFK)
     ∧ Cargo(C1) ∧ Cargo(C2) ∧ Plane(P1) ∧ Plane(P2)
     ∧ Airport(JFK) ∧ Airport(SFO))
Goal(At(C1, JFK) ∧ At(C2, SFO))
Action(Load(c, p, a),
  PRECOND: At(c, a) ∧ At(p, a) ∧ Cargo(c) ∧ Plane(p) ∧ Airport(a)
  EFFECT: ¬At(c, a) ∧ In(c, p))
Action(Unload(c, p, a),
  PRECOND: In(c, p) ∧ At(p, a) ∧ Cargo(c) ∧ Plane(p) ∧ Airport(a)
  EFFECT: At(c, a) ∧ ¬In(c, p))
Action(Fly(p, from, to),
  PRECOND: At(p, from) ∧ Plane(p) ∧ Airport(from) ∧ Airport(to)
  EFFECT: ¬At(p, from) ∧ At(p, to))

Figure 11.1 A PDDL description of an air cargo transportation planning problem.
Init(Tire(Flat) ∧ Tire(Spare) ∧ At(Flat, Axle) ∧ At(Spare, Trunk))
Goal(At(Spare, Axle))
Action(Remove(obj, loc),
  PRECOND: At(obj, loc)
  EFFECT: ¬At(obj, loc) ∧ At(obj, Ground))
Action(PutOn(t, Axle),
  PRECOND: Tire(t) ∧ At(t, Ground) ∧ ¬At(Flat, Axle) ∧ ¬At(Spare, Axle)
  EFFECT: ¬At(t, Ground) ∧ At(t, Axle))
Action(LeaveOvernight,
  PRECOND:
  EFFECT: ¬At(Spare, Ground) ∧ ¬At(Spare, Axle) ∧ ¬At(Spare, Trunk)
          ∧ ¬At(Flat, Ground) ∧ ¬At(Flat, Axle) ∧ ¬At(Flat, Trunk))

Figure 11.2 The simple spare tire problem.

Init(On(A, Table) ∧ On(B, Table) ∧ On(C, A)
     ∧ Block(A) ∧ Block(B) ∧ Block(C) ∧ Clear(B) ∧ Clear(C) ∧ Clear(Table))
Goal(On(A, B) ∧ On(B, C))
Action(Move(b, x, y),
  PRECOND: On(b, x) ∧ Clear(b) ∧ Clear(y) ∧ Block(b) ∧ Block(y) ∧ (b≠x) ∧ (b≠y) ∧ (x≠y),
  EFFECT: On(b, y) ∧ Clear(x) ∧ ¬On(b, x) ∧ ¬Clear(y))
Action(MoveToTable(b, x),
  PRECOND: On(b, x) ∧ Clear(b) ∧ Block(b) ∧ Block(x),
  EFFECT: On(b, Table) ∧ Clear(x) ∧ ¬On(b, x))

Figure 11.4 A planning problem in the blocks world: building a three-block tower. One solution is the sequence [MoveToTable(C, A), Move(B, Table, C), Move(A, Table, B)].

Refinement(Go(Home, SFO),
  STEPS: [Drive(Home, SFOLongTermParking), Shuttle(SFOLongTermParking, SFO)])
Refinement(Go(Home, SFO),
  STEPS: [Taxi(Home, SFO)])
Refinement(Navigate([a, b], [x, y]),
  PRECOND: a = x ∧ b = y
  STEPS: [ ])
Refinement(Navigate([a, b], [x, y]),
  PRECOND: Connected([a, b], [a − 1, b])
  STEPS: [Left, Navigate([a − 1, b], [x, y])])
Refinement(Navigate([a, b], [x, y]),
  PRECOND: Connected([a, b], [a + 1, b])
  STEPS: [Right, Navigate([a + 1, b], [x, y])])
...

Figure 11.7 Definitions of possible refinements for two high-level actions: going to San Francisco airport and navigating in the vacuum world. In the latter case, note the recursive nature of the refinements and the use of preconditions.

function HIERARCHICAL-SEARCH(problem, hierarchy) returns a solution or failure
  frontier ← a FIFO queue with [Act] as the only element
  while true do
    if IS-EMPTY(frontier) then return failure
    plan ← POP(frontier)    // chooses the shallowest plan in frontier
    hla ← the first HLA in plan, or null if none
    prefix, suffix ← the action subsequences before and after hla in plan
    outcome ← RESULT(problem.INITIAL, prefix)
    if hla is null then    // so plan is primitive and outcome is its result
      if problem.IS-GOAL(outcome) then return plan
    else
      for each sequence in REFINEMENTS(hla, outcome, hierarchy) do
        add APPEND(prefix, sequence, suffix) to frontier

Figure 11.8 A breadth-first implementation of hierarchical forward planning search. The initial plan supplied to the algorithm is [Act]. The REFINEMENTS function returns a set of action sequences, one for each refinement of the HLA whose preconditions are satisfied by the specified state, outcome.
function ANGELIC-SEARCH(problem, hierarchy, initialPlan) returns solution or fail
  frontier ← a FIFO queue with initialPlan as the only element
  while true do
    if EMPTY?(frontier) then return fail
    plan ← POP(frontier)    // chooses the shallowest node in frontier
    if REACH+(problem.INITIAL, plan) intersects problem.GOAL then
      if plan is primitive then return plan    // REACH+ is exact for primitive plans
      guaranteed ← REACH−(problem.INITIAL, plan) ∩ problem.GOAL
      if guaranteed ≠ { } and MAKING-PROGRESS(plan, initialPlan) then
        finalState ← any element of guaranteed
        return DECOMPOSE(hierarchy, problem.INITIAL, plan, finalState)
    hla ← some HLA in plan
    prefix, suffix ← the action subsequences before and after hla in plan
    outcome ← RESULT(problem.INITIAL, prefix)
    for each sequence in REFINEMENTS(hla, outcome, hierarchy) do
      frontier ← Insert(APPEND(prefix, sequence, suffix), frontier)

function DECOMPOSE(hierarchy, s0, plan, sf) returns a solution
  solution ← an empty plan
  while plan is not empty do
    action ← REMOVE-LAST(plan)
    si ← a state in REACH−(s0, plan) such that sf ∈ REACH−(si, action)
    problem ← a problem with INITIAL = si and GOAL = sf
    solution ← APPEND(ANGELIC-SEARCH(problem, hierarchy, action), solution)
    sf ← si
  return solution

Figure 11.11 A hierarchical planning algorithm that uses angelic semantics to identify and commit to high-level plans that work while avoiding high-level plans that don't. The predicate MAKING-PROGRESS checks to make sure that we aren't stuck in an infinite regression of refinements. At top level, call ANGELIC-SEARCH with [Act] as the initialPlan.

Jobs({AddEngine1 ≺ AddWheels1 ≺ Inspect1}, {AddEngine2 ≺ AddWheels2 ≺ Inspect2})
Resources(EngineHoists(1), WheelStations(1), Inspectors(2), LugNuts(500))
Action(AddEngine1, DURATION: 30, USE: EngineHoists(1))
Action(AddEngine2, DURATION: 60, USE: EngineHoists(1))
Action(AddWheels1, DURATION: 30, CONSUME: LugNuts(20), USE: WheelStations(1))
Action(AddWheels2, DURATION: 15, CONSUME: LugNuts(20), USE: WheelStations(1))
Action(Inspect_i, DURATION: 10, USE: Inspectors(1))

Figure 11.13 A job-shop scheduling problem for assembling two cars, with resource constraints. The notation A ≺ B means that action A must precede action B.

CHAPTER 12
QUANTIFYING UNCERTAINTY

function DT-AGENT(percept) returns an action
  persistent: belief_state, probabilistic beliefs about the current state of the world
              action, the agent's action

  update belief_state based on action and percept
  calculate outcome probabilities for actions,
      given action descriptions and current belief_state
  select action with highest expected utility
      given probabilities of outcomes and utility information
  return action

Figure 12.1 A decision-theoretic agent that selects rational actions.

CHAPTER 13
PROBABILISTIC REASONING

function ENUMERATION-ASK(X, e, bn) returns a distribution over X
  inputs: X, the query variable
          e, observed values for variables E
          bn, a Bayes net with variables vars

  Q(X) ← a distribution over X, initially empty
  for each value xi of X do
    Q(xi) ← ENUMERATE-ALL(vars, e_{xi})
        where e_{xi} is e extended with X = xi
  return NORMALIZE(Q(X))

function ENUMERATE-ALL(vars, e) returns a real number
  if EMPTY?(vars) then return 1.0
  V ← FIRST(vars)
  if V is an evidence variable with value v in e
    then return P(v | parents(V)) × ENUMERATE-ALL(REST(vars), e)
    else return Σ_v P(v | parents(V)) × ENUMERATE-ALL(REST(vars), e_v)
        where e_v is e extended with V = v

Figure 13.11 The enumeration algorithm for exact inference in Bayes nets.
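A minimal Python sketch of inference by enumeration on a small hand-coded Boolean network; the network encoding (a topologically ordered list of variables with CPTs) and the burglary-style numbers are illustrative assumptions.

# Minimal sketch of ENUMERATION-ASK / ENUMERATE-ALL. The network is a topologically
# ordered list of (variable, parents, CPT) triples; all variables are Boolean here.
def enumerate_all(net, evidence):
    if not net:
        return 1.0
    (var, parents, cpt), rest = net[0], net[1:]
    key = tuple(evidence[p] for p in parents)
    if var in evidence:
        return cpt[key][evidence[var]] * enumerate_all(rest, evidence)
    return sum(cpt[key][val] * enumerate_all(rest, {**evidence, var: val})
               for val in (True, False))

def enumeration_ask(X, evidence, net):
    q = {val: enumerate_all(net, {**evidence, X: val}) for val in (True, False)}
    total = sum(q.values())
    return {val: p / total for val, p in q.items()}   # NORMALIZE

def cpt(table):   # map parent values -> P(var = True), expanded to {True: p, False: 1 - p}
    return {k: {True: p, False: 1 - p} for k, p in table.items()}

net = [
    ("Burglary", [], cpt({(): 0.001})),
    ("Earthquake", [], cpt({(): 0.002})),
    ("Alarm", ["Burglary", "Earthquake"],
     cpt({(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001})),
    ("JohnCalls", ["Alarm"], cpt({(True,): 0.90, (False,): 0.05})),
    ("MaryCalls", ["Alarm"], cpt({(True,): 0.70, (False,): 0.01})),
]
print(enumeration_ask("Burglary", {"JohnCalls": True, "MaryCalls": True}, net))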
function ELIMINATION-ASK(X, e, bn) returns a distribution over X
  inputs: X, the query variable
          e, observed values for variables E
          bn, a Bayesian network with variables vars

  factors ← [ ]
  for each V in ORDER(vars) do
    factors ← [MAKE-FACTOR(V, e)] + factors
    if V is a hidden variable then factors ← SUM-OUT(V, factors)
  return NORMALIZE(POINTWISE-PRODUCT(factors))

Figure 13.13 The variable elimination algorithm for exact inference in Bayes nets.

function PRIOR-SAMPLE(bn) returns an event sampled from the prior specified by bn
  inputs: bn, a Bayesian network specifying joint distribution P(X1, . . . , Xn)

  x ← an event with n elements
  for each variable Xi in X1, . . . , Xn do
    x[i] ← a random sample from P(Xi | parents(Xi))
  return x

Figure 13.16 A sampling algorithm that generates events from a Bayesian network. Each variable is sampled according to the conditional distribution given the values already sampled for the variable's parents.

function REJECTION-SAMPLING(X, e, bn, N) returns an estimate of P(X | e)
  inputs: X, the query variable
          e, observed values for variables E
          bn, a Bayesian network
          N, the total number of samples to be generated
  local variables: C, a vector of counts for each value of X, initially zero

  for j = 1 to N do
    x ← PRIOR-SAMPLE(bn)
    if x is consistent with e then
      C[j] ← C[j] + 1 where x_j is the value of X in x
  return NORMALIZE(C)

Figure 13.17 The rejection-sampling algorithm for answering queries given evidence in a Bayesian network.

function LIKELIHOOD-WEIGHTING(X, e, bn, N) returns an estimate of P(X | e)
  inputs: X, the query variable
          e, observed values for variables E
          bn, a Bayesian network specifying joint distribution P(X1, . . . , Xn)
          N, the total number of samples to be generated
  local variables: W, a vector of weighted counts for each value of X, initially zero

  for j = 1 to N do
    x, w ← WEIGHTED-SAMPLE(bn, e)
    W[j] ← W[j] + w where x_j is the value of X in x
  return NORMALIZE(W)

function WEIGHTED-SAMPLE(bn, e) returns an event and a weight
  w ← 1; x ← an event with n elements, with values fixed from e
  for i = 1 to n do
    if Xi is an evidence variable with value x_i in e
      then w ← w × P(Xi = x_i | parents(Xi))
      else x[i] ← a random sample from P(Xi | parents(Xi))
  return x, w

Figure 13.18 The likelihood-weighting algorithm for inference in Bayesian networks. In WEIGHTED-SAMPLE, each nonevidence variable is sampled according to the conditional distribution given the values already sampled for the variable's parents, while a weight is accumulated based on the likelihood for each evidence variable.

function GIBBS-ASK(X, e, bn, N) returns an estimate of P(X | e)
  local variables: C, a vector of counts for each value of X, initially zero
                   Z, the nonevidence variables in bn
                   x, the current state of the network, initialized from e

  initialize x with random values for the variables in Z
  for k = 1 to N do
    choose any variable Zi from Z according to any distribution ρ(i)
    set the value of Zi in x by sampling from P(Zi | mb(Zi))
    C[j] ← C[j] + 1 where x_j is the value of X in x
  return NORMALIZE(C)

Figure 13.20 The Gibbs sampling algorithm for approximate inference in Bayes nets; this version chooses variables at random, but cycling through the variables also works.

CHAPTER 14
PROBABILISTIC REASONING OVER TIME

function FORWARD-BACKWARD(ev, prior) returns a vector of probability distributions
  inputs: ev, a vector of evidence values for steps 1, . . . , t
          prior, the prior distribution on the initial state, P(X0)
  local variables: fv, a vector of forward messages for steps 0, . . . , t
                   b, a representation of the backward message, initially all 1s
                   sv, a vector of smoothed estimates for steps 1, . . . , t

  fv[0] ← prior
  for i = 1 to t do
    fv[i] ← FORWARD(fv[i − 1], ev[i])
  for i = t down to 1 do
    sv[i] ← NORMALIZE(fv[i] × b)
    b ← BACKWARD(b, ev[i])
  return sv

Figure 14.4 The forward–backward algorithm for smoothing: computing posterior probabilities of a sequence of states given a sequence of observations. The FORWARD and BACKWARD operators are defined by Equations (??) and (??), respectively.

function FIXED-LAG-SMOOTHING(e_t, hmm, d) returns a distribution over X_{t−d}
  inputs: e_t, the current evidence for time step t
          hmm, a hidden Markov model with S × S transition matrix T
          d, the length of the lag for smoothing
  persistent: t, the current time, initially 1
              f, the forward message P(X_t | e_{1:t}), initially hmm.PRIOR
              B, the d-step backward transformation matrix, initially the identity matrix
              e_{t−d:t}, double-ended list of evidence from t − d to t, initially empty
  local variables: O_{t−d}, O_t, diagonal matrices containing the sensor model information

  add e_t to the end of e_{t−d:t}
  O_t ← diagonal matrix containing P(e_t | X_t)
  if t > d then
    f ← FORWARD(f, e_{t−d})
    remove e_{t−d−1} from the beginning of e_{t−d:t}
    O_{t−d} ← diagonal matrix containing P(e_{t−d} | X_{t−d})
    B ← O_{t−d}^{−1} T^{−1} B T O_t
  else B ← B T O_t
  t ← t + 1
  if t > d + 1 then return NORMALIZE(f × B1) else return null

Figure 14.6 An algorithm for smoothing with a fixed time lag of d steps, implemented as an online algorithm that outputs the new smoothed estimate given the observation for a new time step. Notice that the final output NORMALIZE(f × B1) is just α f × b, by Equation (??).
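A minimal Python sketch of forward–backward smoothing for a two-state HMM, written with plain lists; the umbrella-style transition and sensor numbers are illustrative assumptions.

# Minimal sketch of FORWARD-BACKWARD for a two-state HMM (states: 0 = rain, 1 = not rain).
def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def forward(f, e, T, O):     # filtering update with observation index e
    return normalize([O[e][j] * sum(T[i][j] * f[i] for i in range(2)) for j in range(2)])

def backward(b, e, T, O):
    return [sum(T[i][j] * O[e][j] * b[j] for j in range(2)) for i in range(2)]

def forward_backward(ev, prior, T, O):
    t = len(ev)
    fv = [prior] + [None] * t
    for i in range(1, t + 1):
        fv[i] = forward(fv[i - 1], ev[i - 1], T, O)
    sv, b = [None] * t, [1.0, 1.0]
    for i in range(t, 0, -1):
        sv[i - 1] = normalize([fv[i][s] * b[s] for s in range(2)])
        b = backward(b, ev[i - 1], T, O)
    return sv

T = [[0.7, 0.3], [0.3, 0.7]]          # illustrative transition model P(X_{t+1} | X_t)
O = {1: [0.9, 0.2], 0: [0.1, 0.8]}    # illustrative sensor model P(observation | state)
print(forward_backward([1, 1], prior=[0.5, 0.5], T=T, O=O))   # smoothed P(rain) per step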
function PARTICLE-FILTERING(e, N, dbn) returns a set of samples for the next time step
  inputs: e, the new incoming evidence
          N, the number of samples to be maintained
          dbn, a DBN defined by P(X0), P(X1 | X0), and P(E1 | X1)
  persistent: S, a vector of samples of size N, initially generated from P(X0)
  local variables: W, a vector of weights of size N

  for i = 1 to N do
    S[i] ← sample from P(X1 | X0 = S[i])    // step 1
    W[i] ← P(e | X1 = S[i])                 // step 2
  S ← WEIGHTED-SAMPLE-WITH-REPLACEMENT(N, S, W)    // step 3
  return S

Figure 14.17 The particle filtering algorithm implemented as a recursive update operation with state (the set of samples). Each of the sampling operations involves sampling the relevant slice variables in topological order, much as in PRIOR-SAMPLE. The WEIGHTED-SAMPLE-WITH-REPLACEMENT operation can be implemented to run in O(N) expected time. The step numbers refer to the description in the text.
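A minimal Python sketch of one particle-filtering update for a one-dimensional random-walk model with Gaussian motion and sensor noise; the model and its parameters are illustrative assumptions.

import math, random

# Minimal sketch of PARTICLE-FILTERING for a 1D random walk with noisy observations.
def gaussian(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)   # unnormalized density suffices for weights

def particle_filter(observation, samples, motion_sigma=1.0, obs_sigma=1.0):
    # step 1: propagate each sample through the transition model
    samples = [random.gauss(x, motion_sigma) for x in samples]
    # step 2: weight each sample by the likelihood of the observation
    weights = [gaussian(observation, x, obs_sigma) for x in samples]
    # step 3: resample with replacement in proportion to the weights
    return random.choices(samples, weights=weights, k=len(samples))

samples = [0.0] * 1000                      # initial belief: the object starts at 0
for z in [1.0, 2.2, 2.8, 4.1]:              # noisy position observations
    samples = particle_filter(z, samples)
print(sum(samples) / len(samples))          # posterior mean, roughly tracking the observations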

CHAPTER 15
PROBABILISTIC PROGRAMMING
type Researcher, Paper, Citation
random String Name(Researcher)
random String Title(Paper)
random Paper PubCited(Citation)
random String Text(Citation)
random Boolean Professor(Researcher)
origin Researcher Author(Paper)

#Researcher ∼ OM(3, 1)
Name(r) ∼ NamePrior()
Professor(r) ∼ Boolean(0.2)
#Paper(Author = r) ∼ if Professor(r) then OM(1.5, 0.5) else OM(1, 0.5)
Title(p) ∼ PaperTitlePrior()
PubCited(c) ∼ UniformChoice({Paper p})
Text(c) ∼ HMMGrammar(Name(Author(PubCited(c))), Title(PubCited(c)))

Figure 15.5 An OUPM for citation information extraction. For simplicity the model assumes one author per paper and omits details of the grammar and error models.

#SeismicEvents ∼ Poisson(T ∗ λ_e)
Time(e) ∼ UniformReal(0, T)
EarthQuake(e) ∼ Boolean(0.999)
Location(e) ∼ if Earthquake(e) then SpatialPrior() else UniformEarth()
Depth(e) ∼ if Earthquake(e) then UniformReal(0, 700) else Exactly(0)
Magnitude(e) ∼ Exponential(log(10))
Detected(e, p, s) ∼ Logistic(weights(s, p), Magnitude(e), Depth(e), Dist(e, s))
#Detections(site = s) ∼ Poisson(T ∗ λ_f(s))
#Detections(event = e, phase = p, station = s) = if Detected(e, p, s) then 1 else 0
OnsetTime(a, s) if (event(a) = null) then ∼ UniformReal(0, T)
  else = Time(event(a)) + GeoTT(Dist(event(a), s), Depth(event(a)), phase(a))
         + Laplace(μ_t(s), σ_t(s))
Amplitude(a, s) if (event(a) = null) then ∼ NoiseAmpModel(s)
  else = AmpModel(Magnitude(event(a)), Dist(event(a), s), Depth(event(a)), phase(a))
Azimuth(a, s) if (event(a) = null) then ∼ UniformReal(0, 360)
  else = GeoAzimuth(Location(event(a)), Depth(event(a)), phase(a), Site(s))
         + Laplace(0, σ_a(s))
Slowness(a, s) if (event(a) = null) then ∼ UniformReal(0, 20)
  else = GeoSlowness(Location(event(a)), Depth(event(a)), phase(a), Site(s))
         + Laplace(0, σ_s(s))
ObservedPhase(a, s) ∼ CategoricalPhaseModel(phase(a))

Figure 15.6 A simplified version of the NET-VISA model (see text).
#Aircraft(EntryTime = t) ∼ Poisson(λ_a)
Exits(a, t) ∼ if InFlight(a, t) then Boolean(α_e)
InFlight(a, t) = (t = EntryTime(a)) ∨ (InFlight(a, t−1) ∧ ¬Exits(a, t−1))
X(a, t) ∼ if t = EntryTime(a) then InitX()
          else if InFlight(a, t) then N(F X(a, t−1), Σ_x)
#Blip(Source = a, Time = t) ∼ if InFlight(a, t) then Bernoulli(DetectionProb(X(a, t)))
#Blip(Time = t) ∼ Poisson(λ_f)
Z(b) ∼ if Source(b) = null then UniformZ(R) else N(H X(Source(b), Time(b)), Σ_z)

Figure 15.9 An OUPM for radar tracking of multiple targets with false alarms, detection failure, and entry and exit of aircraft. The rate at which new aircraft enter the scene is λ_a, while the probability per time step that an aircraft exits the scene is α_e. False alarm blips (i.e., ones not produced by an aircraft) appear uniformly in space at a rate of λ_f per time step. The probability that an aircraft is detected (i.e., produces a blip) depends on its current position.

function GENERATE-IMAGE() returns an image with some letters
  letters ← GENERATE-LETTERS(10)
  return RENDER-NOISY-IMAGE(letters, 32, 128)

function GENERATE-LETTERS(λ) returns a vector of letters
  n ∼ Poisson(λ)
  letters ← [ ]
  for i = 1 to n do
    letters[i] ∼ UniformChoice({a, b, c, ···})
  return letters

function RENDER-NOISY-IMAGE(letters, width, height) returns a noisy image of the letters
  clean_image ← RENDER(letters, width, height, text_top=10, text_left=10)
  noisy_image ← [ ]
  noise_variance ∼ UniformReal(0.1, 1)
  for row = 1 to width do
    for col = 1 to height do
      noisy_image[row, col] ∼ N(clean_image[row, col], noise_variance)
  return noisy_image

Figure 15.11 Generative program for an open-universe probability model for optical character recognition. The generative program produces degraded images containing sequences of letters by generating each sequence, rendering it into a 2D image, and incorporating additive noise at each pixel.
function GENERATE-MARKOV-LETTERS(λ) returns a vector of letters
  n ∼ Poisson(λ)
  letters ← [ ]
  letter_probs ← MARKOV-INITIAL()
  for i = 1 to n do
    letters[i] ∼ Categorical(letter_probs)
    letter_probs ← MARKOV-TRANSITION(letters[i])
  return letters

Figure 15.15 Generative program for an improved optical character recognition model that generates letters according to a letter bigram model whose pairwise letter frequencies are estimated from a list of English words.

CHAPTER 16
MAKING SIMPLE DECISIONS
function INFORMATION-GATHERING-AGENT(percept) returns an action
  persistent: D, a decision network

  integrate percept into D
  j ← the value that maximizes VPI(E_j) / C(E_j)
  if VPI(E_j) > C(E_j)
    then return Request(E_j)
  else return the best action from D
Figure 16.9 Design of a simple, myopic information-gathering agent. The agent works by repeatedly selecting the observation with the highest information value, until the cost of the next observation is greater than its expected benefit.

CHAPTER 17
MAKING COMPLEX DECISIONS
function VALUE-ITERATION(mdp, ε) returns a utility function
  inputs: mdp, an MDP with states S, actions A(s), transition model P(s′ | s, a),
              rewards R(s, a, s′), discount γ
          ε, the maximum error allowed in the utility of any state
  local variables: U, U′, vectors of utilities for states in S, initially zero
                   δ, the maximum relative change in the utility of any state

  repeat
    U ← U′; δ ← 0
    for each state s in S do
      U′[s] ← max_{a ∈ A(s)} Q-VALUE(mdp, s, a, U)
      if |U′[s] − U[s]| > δ then δ ← |U′[s] − U[s]|
  until δ ≤ ε(1−γ)/γ
  return U

Figure 17.6 The value iteration algorithm for calculating utilities of states. The termination condition is from Equation (??).
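A minimal Python sketch of value iteration on a tiny hand-coded MDP; the state space, transition model, rewards, discount, and tolerance are illustrative assumptions.

# Minimal sketch of VALUE-ITERATION. Transitions map (state, action) to a list of
# (probability, next_state, reward) triples; the toy MDP below is illustrative.
def q_value(mdp, s, a, U, gamma):
    return sum(p * (r + gamma * U[s2]) for (p, s2, r) in mdp["transitions"][(s, a)])

def value_iteration(mdp, gamma=0.9, epsilon=1e-6):
    U = {s: 0.0 for s in mdp["states"]}
    while True:
        U2, delta = dict(U), 0.0
        for s in mdp["states"]:
            actions = mdp["actions"](s)
            if actions:
                U2[s] = max(q_value(mdp, s, a, U, gamma) for a in actions)
            delta = max(delta, abs(U2[s] - U[s]))
        U = U2
        if delta <= epsilon * (1 - gamma) / gamma:
            return U

# Two-state chain: from s0, "go" reaches the terminal state s1 (reward 1) or stays put.
mdp = {
    "states": ["s0", "s1"],
    "actions": lambda s: ["go", "stay"] if s == "s0" else [],
    "transitions": {
        ("s0", "go"):   [(0.8, "s1", 1.0), (0.2, "s0", 0.0)],
        ("s0", "stay"): [(1.0, "s0", 0.0)],
    },
}
print(value_iteration(mdp))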
function POLICY-ITERATION(mdp) returns a policy
  inputs: mdp, an MDP with states S, actions A(s), transition model P(s′ | s, a)
  local variables: U, a vector of utilities for states in S, initially zero
                   π, a policy vector indexed by state, initially random

  repeat
    U ← POLICY-EVALUATION(π, U, mdp)
    unchanged? ← true
    for each state s in S do
      a∗ ← argmax_{a ∈ A(s)} Q-VALUE(mdp, s, a, U)
      if Q-VALUE(mdp, s, a∗, U) > Q-VALUE(mdp, s, π[s], U) then
        π[s] ← a∗; unchanged? ← false
  until unchanged?
  return π

Figure 17.9 The policy iteration algorithm for calculating an optimal policy.

function POMDP-VALUE-ITERATION(pomdp, ε) returns a utility function
  inputs: pomdp, a POMDP with states S, actions A(s), transition model P(s′ | s, a),
              sensor model P(e | s), rewards R(s), discount γ
          ε, the maximum error allowed in the utility of any state
  local variables: U, U′, sets of plans p with associated utility vectors α_p

  U′ ← a set containing just the empty plan [ ], with α_[ ](s) = R(s)
  repeat
    U ← U′
    U′ ← the set of all plans consisting of an action and, for each possible next percept,
         a plan in U with utility vectors computed according to Equation (??)
    U′ ← REMOVE-DOMINATED-PLANS(U′)
  until MAX-DIFFERENCE(U, U′) ≤ ε(1−γ)/γ
  return U

Figure 17.16 A high-level sketch of the value iteration algorithm for POMDPs. The REMOVE-DOMINATED-PLANS step and MAX-DIFFERENCE test are typically implemented as linear programs.

CHAPTER 18
MULTIAGENT DECISION MAKING
Actors(A, B)
Init(At(A, LeftBaseline) ∧ At(B, RightNet) ∧
     Approaching(Ball, RightBaseline) ∧ Partner(A, B) ∧ Partner(B, A))
Goal(Returned(Ball) ∧ (At(x, RightNet) ∨ At(x, LeftNet)))
Action(Hit(actor, Ball),
  PRECOND: Approaching(Ball, loc) ∧ At(actor, loc)
  EFFECT: Returned(Ball))
Action(Go(actor, to),
  PRECOND: At(actor, loc) ∧ to ≠ loc,
  EFFECT: At(actor, to) ∧ ¬At(actor, loc))
Figure 18.1 The doubles tennis problem. Two actors, A and B, are playing together and can be in one of four locations: LeftBaseline, RightBaseline, LeftNet, and RightNet. The ball can be returned only if a player is in the right place. The NoOp action is a dummy, which has no effect. Note that each action must include the actor as an argument.

CHAPTER 19
LEARNING FROM EXAMPLES
function LEARN-DECISION-TREE(examples, attributes, parent_examples) returns a tree
  if examples is empty then return PLURALITY-VALUE(parent_examples)
  else if all examples have the same classification then return the classification
  else if attributes is empty then return PLURALITY-VALUE(examples)
  else
    A ← argmax_{a ∈ attributes} IMPORTANCE(a, examples)
    tree ← a new decision tree with root test A
    for each value v of A do
      exs ← {e : e ∈ examples and e.A = v}
      subtree ← LEARN-DECISION-TREE(exs, attributes − A, examples)
      add a branch to tree with label (A = v) and subtree subtree
    return tree

Figure 19.5 The decision tree learning algorithm. The function IMPORTANCE is described in Section ??. The function PLURALITY-VALUE selects the most common output value among a set of examples, breaking ties randomly.
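A minimal Python sketch of decision-tree learning with an information-gain IMPORTANCE function; the example encoding (dicts with a "label" key) and the tiny restaurant-style dataset are illustrative assumptions.

import math
from collections import Counter

# Minimal sketch of LEARN-DECISION-TREE with entropy-based information gain.
def plurality_value(examples):
    return Counter(e["label"] for e in examples).most_common(1)[0][0]

def entropy(examples):
    counts = Counter(e["label"] for e in examples).values()
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts)

def importance(a, examples):
    remainder = 0.0
    for v in {e[a] for e in examples}:
        exs = [e for e in examples if e[a] == v]
        remainder += len(exs) / len(examples) * entropy(exs)
    return entropy(examples) - remainder            # information gain

def learn_decision_tree(examples, attributes, parent_examples=()):
    if not examples:
        return plurality_value(parent_examples)
    if len({e["label"] for e in examples}) == 1:
        return examples[0]["label"]
    if not attributes:
        return plurality_value(examples)
    A = max(attributes, key=lambda a: importance(a, examples))
    tree = {}
    for v in {e[A] for e in examples}:              # branch on observed values of A
        exs = [e for e in examples if e[A] == v]
        tree[(A, v)] = learn_decision_tree(exs, [a for a in attributes if a != A], examples)
    return tree

data = [
    {"patrons": "Some", "hungry": "Yes", "label": "Yes"},
    {"patrons": "Full", "hungry": "No",  "label": "No"},
    {"patrons": "Some", "hungry": "No",  "label": "Yes"},
    {"patrons": "Full", "hungry": "Yes", "label": "No"},
    {"patrons": "None", "hungry": "No",  "label": "No"},
]
print(learn_decision_tree(data, ["patrons", "hungry"]))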

function MODEL-SELECTION(Learner, examples, k) returns a (hypothesis, error rate) pair
  err ← an array, indexed by size, storing validation-set error rates
  training_set, test_set ← a partition of examples into two sets
  for size = 1 to ∞ do
    err[size] ← CROSS-VALIDATION(Learner, size, training_set, k)
    if err is starting to increase significantly then
      best_size ← the value of size with minimum err[size]
      h ← Learner(best_size, training_set)
      return h, ERROR-RATE(h, test_set)

function CROSS-VALIDATION(Learner, size, examples, k) returns error rate
  N ← the number of examples
  errs ← 0
  for i = 1 to k do
    validation_set ← examples[(i − 1) × N/k : i × N/k]
    training_set ← examples − validation_set
    h ← Learner(size, training_set)
    errs ← errs + ERROR-RATE(h, validation_set)
  return errs / k    // average error rate on validation sets, across k-fold cross-validation

Figure 19.8 An algorithm to select the model that has the lowest validation error. It builds models of increasing complexity, and chooses the one with the best empirical error rate, err, on the validation data set. Learner(size, examples) returns a hypothesis whose complexity is set by the parameter size, and which is trained on examples. In CROSS-VALIDATION, each iteration of the for loop selects a different slice of the examples as the validation set, and keeps the other examples as the training set. It then returns the average validation set error over all the folds. Once we have determined which value of the size parameter is best, MODEL-SELECTION returns the model (i.e., learner/hypothesis) of that size, trained on all the training examples, along with its error rate on the held-out test examples.
function DECISION-LIST-LEARNING(examples) returns a decision list, or failure
  if examples is empty then return the trivial decision list No
  t ← a test that matches a nonempty subset examples_t of examples
      such that the members of examples_t are all positive or all negative
  if there is no such t then return failure
  if the examples in examples_t are positive then o ← Yes else o ← No
  return a decision list with initial test t and outcome o and remaining tests given by
      DECISION-LIST-LEARNING(examples − examples_t)

Figure 19.11 An algorithm for learning decision lists.

function ADABOOST(examples, L, K) returns a hypothesis
  inputs: examples, set of N labeled examples (x1, y1), . . . , (xN, yN)
          L, a learning algorithm
          K, the number of hypotheses in the ensemble
  local variables: w, a vector of N example weights, initially all 1/N
                   h, a vector of K hypotheses
                   z, a vector of K hypothesis weights

  ε ← a small positive number, used to avoid division by zero
  for k = 1 to K do
    h[k] ← L(examples, w)
    error ← 0
    for j = 1 to N do    // Compute the total error for h[k]
      if h[k](xj) ≠ yj then error ← error + w[j]
    if error > 1/2 then break from loop
    error ← min(error, 1 − ε)
    for j = 1 to N do    // Give more weight to the examples h[k] got wrong
      if h[k](xj) = yj then w[j] ← w[j] · error/(1 − error)
    w ← NORMALIZE(w)
    z[k] ← (1/2) log((1 − error)/error)    // Give more weight to accurate h[k]
  return FUNCTION(x) : Σ_i z_i h_i(x)

Figure 19.25 The ADABOOST variant of the boosting method for ensemble learning. The algorithm generates hypotheses by successively reweighting the training examples. The function WEIGHTED-MAJORITY generates a hypothesis that returns the output value with the highest vote from the hypotheses in h, with votes weighted by z. For regression problems, or for binary classification with two classes −1 and 1, this is Σ_k h[k] z[k].

CHAPTER 20
LEARNING PROBABILISTIC MODELS

CHAPTER 21
DEEP LEARNING

CHAPTER 22
REINFORCEMENT LEARNING
function PASSIVE-ADP-LEARNER(percept) returns an action
  inputs: percept, a percept indicating the current state s′ and reward signal r
  persistent: π, a fixed policy
              mdp, an MDP with model P, rewards R, actions A, discount γ
              U, a table of utilities for states, initially empty
              N_{s′|s,a}, a table of outcome count vectors indexed by state and action, initially zero
              s, a, the previous state and action, initially null

  if s′ is new then U[s′] ← 0
  if s is not null then
    increment N_{s′|s,a}[s, a][s′]
    R[s, a, s′] ← r
    add a to A[s]
    P(· | s, a) ← NORMALIZE(N_{s′|s,a}[s, a])
    U ← POLICY-EVALUATION(π, U, mdp)
  s, a ← s′, π[s′]
  return a

Figure 22.2 A passive reinforcement learning agent based on adaptive dynamic programming. The agent chooses a value for γ and then incrementally computes the P and R values of the MDP. The POLICY-EVALUATION function solves the fixed-policy Bellman equations, as described on page ??.

function PASSIVE-TD-LEARNER(percept) returns an action
  inputs: percept, a percept indicating the current state s′ and reward signal r
  persistent: π, a fixed policy
              s, the previous state, initially null
              U, a table of utilities for states, initially empty
              Ns, a table of frequencies for states, initially zero

  if s′ is new then U[s′] ← 0
  if s is not null then
    increment Ns[s]
    U[s] ← U[s] + α(Ns[s]) × (r + γ U[s′] − U[s])
  s ← s′
  return π[s′]

Figure 22.4 A passive reinforcement learning agent that learns utility estimates using temporal differences. The step-size function α(n) is chosen to ensure convergence.
function Q-LEARNING-AGENT(percept) returns an action
  inputs: percept, a percept indicating the current state s′ and reward signal r
  persistent: Q, a table of action values indexed by state and action, initially zero
              Nsa, a table of frequencies for state–action pairs, initially zero
              s, a, the previous state and action, initially null

  if s is not null then
    increment Nsa[s, a]
    Q[s, a] ← Q[s, a] + α(Nsa[s, a])(r + γ max_{a′} Q[s′, a′] − Q[s, a])
  s, a ← s′, argmax_{a′} f(Q[s′, a′], Nsa[s′, a′])
  return a

Figure 22.8 An exploratory Q-learning agent. It is an active learner that learns the value Q(s, a) of each action in each situation. It uses the same exploration function f as the exploratory ADP agent, but avoids having to learn the transition model.
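A minimal Python sketch of tabular Q-learning with ε-greedy exploration standing in for the exploration function f, run on a small deterministic chain environment; the environment and all parameters are illustrative assumptions.

import random

# Minimal sketch of tabular Q-learning on a deterministic chain: move left or right,
# reward 1 for reaching the rightmost (terminal) state. All parameters are illustrative.
def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1):
    actions = [-1, +1]                               # move left or right along the chain
    Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:                     # rightmost state is terminal
            if random.random() < epsilon:
                a = random.choice(actions)           # explore
            else:
                a = max(actions, key=lambda act: Q[(s, act)])   # exploit
            s2 = min(max(s + a, 0), n_states - 1)
            r = 1.0 if s2 == n_states - 1 else 0.0
            best_next = max(Q[(s2, b)] for b in actions) if s2 != n_states - 1 else 0.0
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q

Q = q_learning()
policy = {s: max([-1, 1], key=lambda a: Q[(s, a)]) for s in range(4)}
print(policy)    # greedy policy: move right (+1) in every nonterminal state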

CHAPTER 23
NATURAL LANGUAGE PROCESSING
function CYK-PARSE(words, grammar) returns a table of parse trees
  inputs: words, a list of words
          grammar, a structure with LEXICALRULES and GRAMMARRULES

  T ← a table    // T[X, i, k] is most probable X tree spanning words_{i:k}
  P ← a table, initially all 0    // P[X, i, k] is probability of tree T[X, i, k]
  // Insert lexical categories for each word.
  for i = 1 to LEN(words) do
    for each (X, p) in grammar.LEXICALRULES(words_i) do
      P[X, i, i] ← p
      T[X, i, i] ← TREE(X, words_i)
  // Construct X_{i:k} from Y_{i:j} + Z_{j+1:k}, shortest spans first.
  for each (i, j, k) in SUBSPANS(LEN(words)) do
    for each (X, Y, Z, p) in grammar.GRAMMARRULES do
      PYZ ← P[Y, i, j] × P[Z, j+1, k] × p
      if PYZ > P[X, i, k] do
        P[X, i, k] ← PYZ
        T[X, i, k] ← TREE(X, T[Y, i, j], T[Z, j+1, k])
  return T

function SUBSPANS(N) yields (i, j, k) tuples
  for length = 2 to N do
    for i = 1 to N + 1 − length do
      k ← i + length − 1
      for j = i to k − 1 do
        yield (i, j, k)
Figure 23.5 The CYK algorithm for parsing. Given a sequence of words, it finds the most probable parse tree for the sequence and its subsequences. The table P[X, i, k] gives the probability of the most probable tree of category X spanning wordsi:k. The output table T[X, i, k] contains the most probable tree of category X spanning positions i to k inclusive. The function SUBSPANS returns all tuples (i, j, k) covering a span of wordsi:k, with i ≤ j < k, listing the tuples by increasing length of the i : k span, so that when we go to combine two shorter spans into a longer one, the shorter spans are already in the table. LEXICALRULES(word) returns a collection of (X, p) pairs, one for each rule of the form X → word [p], and GRAMMARRULES gives (X, Y, Z, p) tuples, one for each grammar rule of the form X → Y Z [p].
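As a runnable counterpart to CYK-PARSE and SUBSPANS above, here is a small Python sketch. The dictionary-based tables, the tuple representation of trees, and the toy grammar at the bottom are illustrative assumptions; indices are 1-based and inclusive, as in the figure.

def subspans(N):
    """Yield (i, j, k) with 1 <= i <= j < k <= N, shortest spans first."""
    for length in range(2, N + 1):
        for i in range(1, N + 2 - length):
            k = i + length - 1
            for j in range(i, k):
                yield i, j, k

def cyk_parse(words, lexical_rules, grammar_rules):
    """lexical_rules(word) -> [(X, p)]; grammar_rules: [(X, Y, Z, p)] for X -> Y Z [p].
    Returns (P, T): probabilities and best trees keyed by (X, i, k), 1-based inclusive."""
    P, T = {}, {}
    for i, word in enumerate(words, start=1):          # insert lexical categories
        for X, p in lexical_rules(word):
            P[(X, i, i)] = p
            T[(X, i, i)] = (X, word)
    for i, j, k in subspans(len(words)):               # combine shorter spans first
        for X, Y, Z, p in grammar_rules:
            pyz = P.get((Y, i, j), 0.0) * P.get((Z, j + 1, k), 0.0) * p
            if pyz > P.get((X, i, k), 0.0):
                P[(X, i, k)] = pyz
                T[(X, i, k)] = (X, T[(Y, i, j)], T[(Z, j + 1, k)])
    return P, T

# A toy grammar, purely illustrative: S -> NP VP, VP -> V NP.
lex = {"fish": [("NP", 0.5), ("V", 0.3)], "people": [("NP", 0.5)]}
rules = [("S", "NP", "VP", 1.0), ("VP", "V", "NP", 1.0)]
P, T = cyk_parse("people fish fish".split(), lambda w: lex.get(w, []), rules)
print(T.get(("S", 1, 3)))   # best S tree spanning the whole sentence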
[ [S [NP-2 Her eyes]
     [VP were
         [VP glazed
             [NP *-2]
             [SBAR-ADV as if
                 [S [NP she]
                    [VP did n’t
                        [VP [VP hear [NP *-1]]
                            or
                            [VP [ADVP even] see [NP *-1]]
                            [NP-1 him]]]]]]]] .]
Figure 23.8 Annotated tree for the sentence “Her eyes were glazed as if she didn’t hear or even see him.” from the Penn Treebank. Note a grammatical phenomenon we have not covered yet: the movement of a phrase from one part of the tree to another. This tree analyzes the phrase “hear or even see him” as consisting of two constituent VPs, [VP hear [NP *-1]] and [VP [ADVP even] see [NP *-1]], both of which have a missing object, denoted *-1, which refers to the NP labeled elsewhere in the tree as [NP-1 him]. Similarly, the [NP *-2] refers to the [NP-2 Her eyes].

CHAPTER 24
DEEP LEARNING FOR NATURAL LANGUAGE PROCESSING

It is a truth universally acknowledged that the earth is not the center of the universe. There are those who assert there is. I do not accept them, but others I consider to be of the same opinion. The truth is, however, that if there are other than the center, and if there are any other living things in the universe and if they are not human, then we do not yet have our answers. We have to go on. This page gives a simplified, simplified answer to the problem. We don’t have all the answers. The truth is, however, that the truth is out there.

When Gregor Samsa woke up one morning, he did not notice anything strange. “When my wife is looking at me, I feel like she is looking at a piece of art,” he said. “I think she is admiring something I have created.” The idea is that by looking at your own life, you learn something important and become a better person. It is a theory that emerged from psychologist Daniel Goleman’s work, in which he asked “How do you know you’re not a loser?”

Alice was beginning to get very tired of sitting with her sister on the bank. She sat up, yawned, and said, with a loud little scream, “I hope you don’t mind if I keep on doing what I should like to do, and if someone asks me which of us will do more, don’t tell them that I won’t do much, my dear sister.”

All happy families are alike; each happy family is like a garden of paradise. The only difference between happy families and unhappy families, is that the unhappy family doesn’t have any flowers or trees.

Tell me a story. Tell me a story. Tell me a story. Tell me a story. Tell me a story. Tell me a story. Tell me a story. Tell me a story. Tell me a story. Tell me a story. Tell me a story. Tell me a story. Please fill out the following details. Thank you... Thank you for your interest in this interview. Please wait...

Figure 24.13 Example completion texts generated by the GPT-2 language model, given the prompts in bold. Most of the texts are quite fluent English, at least locally. The final example demonstrates that sometimes the model just breaks down.

CHAPTER 25
COMPUTER VISION

CHAPTER 26
ROBOTICS

function MONTE-CARLO-LOCALIZATION(a, z, N, P(X′|X, v, ω), P(z|z∗), map) returns a set of samples, S, for the next time step
inputs: a, robot velocities v and ω
    z, a vector of M range scan data points
    P(X′|X, v, ω), motion model
    P(z|z∗), a range sensor noise model
    map, a 2D map of the environment
persistent: S, a vector of N samples
local variables: W, a vector of N weights
    S′, a temporary vector of N samples
if S is empty then   // initialization phase
    for i = 1 to N do
        S[i] ← sample from P(X0)
for i = 1 to N do   // update cycle
    S′[i] ← sample from P(X′|X = S[i], v, ω)
    W[i] ← 1
    for j = 1 to M do
        z∗ ← RAYCAST(j, X = S′[i], map)
        W[i] ← W[i] · P(zj | z∗)
S ← WEIGHTED-SAMPLE-WITH-REPLACEMENT(N, S′, W)
return S
Figure 26.6 A Monte Carlo localization algorithm using a range-scan sensor model with independent noise. (A short Python sketch of this update cycle follows the remaining chapter headings below.)

CHAPTER 27
PHILOSOPHY, ETHICS, AND SAFETY OF AI

CHAPTER 28
THE FUTURE OF AI

CHAPTER 29
MATHEMATICAL BACKGROUND

CHAPTER 30
NOTES ON LANGUAGES AND ALGORITHMS
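Referring back to the MONTE-CARLO-LOCALIZATION algorithm of Figure 26.6, the following is a heavily simplified Python sketch of one update cycle for a robot in a 1-D corridor: a single forward displacement replaces (v, ω), one range reading replaces the M-point scan, and the Gaussian sensor model, the noise parameters, and the toy corridor map are all illustrative assumptions.

import math
import random

def monte_carlo_localization(S, v, z, N, corridor_length,
                             motion_sigma=0.1, sensor_sigma=0.2):
    """One update cycle of the particle filter, specialized to a 1-D corridor.
    S: current samples (positions); v: commanded displacement; z: measured
    distance to the wall ahead; N: number of particles to maintain."""
    if not S:                                    # initialization phase
        S = [random.uniform(0.0, corridor_length) for _ in range(N)]
    # Motion update: sample from the motion model P(X' | X, v).
    S_new = [x + v + random.gauss(0.0, motion_sigma) for x in S]
    # Measurement update: weight each particle by the sensor model P(z | z*).
    W = []
    for x in S_new:
        z_star = corridor_length - x             # stand-in for RAYCAST
        W.append(math.exp(-0.5 * ((z - z_star) / sensor_sigma) ** 2))
    # Resample N particles with probability proportional to their weights.
    return random.choices(S_new, weights=W, k=N)

# One illustrative step: a 10 m corridor, belief centered near x = 2, a move of
# 1 m, then a range reading of 6.9 m to the far wall.
random.seed(0)
S = [random.gauss(2.0, 0.5) for _ in range(500)]
S = monte_carlo_localization(S, v=1.0, z=6.9, N=500, corridor_length=10.0)
print(sum(S) / len(S))   # posterior mean position, roughly 3.1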

CS代考计算机代写 algorithm flex deep learning Bayesian network data structure Bayesian decision tree AI Hidden Markov Mode chain 1

1
INTRODUCTION
CHAPTER

CHAPTER 2 INTELLIGENT AGENTS
function TABLE-DRIVEN-AGENT(percept) returns an action persistent: percepts, a sequence, initially empty
table, a table of actions, indexed by percept sequences, initially fully specified
append percept to the end of percepts action ←LOOKUP(percepts,table) return action
Figure 2.7 The TABLE-DRIVEN-AGENT program is invoked for each new percept and re- turns an action each time. It retains the complete percept sequence in memory.
function REFLEX-VACUUM-AGENT([location,status]) returns an action
if status = Dirty then return Suck else if location = A then return Right else if location = B then return Left
Figure2.8 Theagentprogramforasimplereflexagentinthetwo-locationvacuumenviron- ment. This program implements the agent function tabulated in Figure ??.
function SIMPLE-REFLEX-AGENT(percept) returns an action persistent: rules, a set of condition–action rules
state ← INTERPRET-INPUT(percept) rule ← RULE-MATCH(state, rules) action ← rule.ACTION
return action
Figure 2.10 A simple reflex agent. It acts according to a rule whose condition matches the current state, as defined by the percept.

function MODEL-BASED-REFLEX-AGENT(percept) returns an action persistent: state, the agent’s current conception of the world state
transition model , a description of how the next state depends on the current state and action
sensor model , a description of how the current world state is reflected in the agent’s percepts
rules, a set of condition–action rules action, the most recent action, initially none
state←UPDATE-STATE(state,action,percept,transition model,sensor model) rule ← RULE-MATCH(state, rules)
action ← rule.ACTION
return action
Figure 2.12 A model-based reflex agent. It keeps track of the current state of the world, using an internal model. It then chooses an action in the same way as the reflex agent.
3

CHAPTER 3
SOLVING PROBLEMS BY SEARCHING
function BEST-FIRST-SEARCH(problem, f ) returns a solution node or failure
node ← NODE(STATE=problem.INITIAL)
frontier ← a priority queue ordered by f , with node as an element
reached ← a lookup table, with one entry with key problem.INITIAL and value node while not IS-EMPTY(frontier) do
node ← POP(frontier)
if problem.IS-GOAL(node.STATE) then return node for each child in EXPAND(problem, node) do
s ←child.STATE
if s is not in reached or child.PATH-COST < reached[s].PATH-COST then reached[s]←child add child to frontier return failure function EXPAND(problem,node) yields nodes s ← node.STATE for each action in problem.ACTIONS(s) do s′ ← problem.RESULT(s, action) cost ←node.PATH-COST + problem.ACTION-COST(s,action,s′) yield NODE(STATE=s′, PARENT=node, ACTION=action, PATH-COST=cost) Figure 3.7 The best-first search algorithm, and the function for expanding a node. The data structures used here are described in Section ??. See Appendix B for yield. 5 function BREADTH-FIRST-SEARCH(problem) returns a solution node or failure node ← NODE(problem.INITIAL) if problem.IS-GOAL(node.STATE) then return node frontier ← a FIFO queue, with node as an element reached ← {problem.INITIAL} while not IS-EMPTY(frontier) do node ← POP(frontier) for each child in EXPAND(problem, node) do s ←child.STATE if problem.IS-GOAL(s) then return child if s is not in reached then add s to reached add child to frontier return failure function UNIFORM-COST-SEARCH(problem) returns a solution node, or failure return BEST-FIRST-SEARCH(problem, PATH-COST) Figure 3.9 Breadth-first search and uniform-cost search algorithms. function ITERATIVE-DEEPENING-SEARCH(problem) returns a solution node or failure fordepth =0to∞do result ←DEPTH-LIMITED-SEARCH(problem,depth) if result ̸= cutoff then return result function DEPTH-LIMITED-SEARCH(problem, l) returns a node or failure or cutoff frontier ← a LIFO queue (stack) with NODE(problem.INITIAL) as an element result ← failure while not IS-EMPTY(frontier) do node ← POP(frontier) if problem.IS-GOAL(node.STATE) then return node if DEPTH(node) > l then
result ← cutoff
else if not IS-CYCLE(node) do
for each child in EXPAND(problem, node) do add child to frontier
return result
Figure 3.12 Iterative deepening and depth-limited tree-like search. Iterative deepening re- peatedly applies depth-limited search with increasing limits. It returns one of three different types of values: either a solution node; or failure, when it has exhausted all nodes and proved there is no solution at any depth; or cutoff , to mean there might be a solution at a deeper depth than l. This is a tree-like search algorithm that does not keep track of reached states, and thus uses much less memory than best-first search, but runs the risk of visiting the same state multiple times on different paths. Also, if the IS-CYCLE check does not check all cycles, then the algorithm may get caught in a loop.

6 Chapter 3 Solving Problems by Searching
function BIBF-SEARCH(problemF , fF , problemB, fB) returns a solution node, or failure nodeF ← NODE(problemF .INITIAL) // Node for a start state
nodeB ←NODE(problemB.INITIAL) //Node for a goal state frontierF ←apriorityqueueorderedbyfF,withnodeF asanelement
frontier B ← a priority queue ordered by fB , with node B as an element reachedF ← a lookup table, with one key nodeF .STATE and value nodeF reachedB ←a lookup table, with one key nodeB.STATE and value nodeB solution ← failure
while not TERMINATED(solution, frontierF , frontierB) do if fF (TOP(frontierF )) < fB(TOP(frontierB)) then solution←PROCEED(F,problemF frontierF,reachedF,reachedB,solution) else solution ← PROCEED(B, problemB, frontierB, reachedB, reachedF , solution) return solution function PROCEED(dir, problem, frontier, reached, reached2, solution) returns a solution // Expand node on frontier; check against the other frontier in reached 2 . // The variable “dir” is the direction: either F for forward or B for backward. node ← POP(frontier) for each child in EXPAND(problem, node) do s ←child.STATE if s not in reached or PATH-COST(child) < PATH-COST(reached[s]) then reached[s]←child add child to frontier if s is in reached2 then solution2 ← JOIN-NODES(dir, child, reached2[s])) if PATH-COST(solution2) < PATH-COST(solution) then solution ← solution2 return solution Figure 3.14 Bidirectional best-first search keeps two frontiers and two tables of reached states. When a path in one frontier reaches a state that was also reached in the other half of the search, the two paths are joined (by the function JOIN-NODES) to form a solution. The first solution we get is not guaranteed to be the best; the function TERMINATED determines when to stop looking for new solutions. 7 function RECURSIVE-BEST-FIRST-SEARCH(problem) returns a solution or failure solution, fvalue ← RBFS(problem, NODE(problem.INITIAL), ∞) return solution function RBFS(problem,node,f limit) returns a solution or failure, and a new f-cost limit if problem.IS-GOAL(node.STATE) then return node successors ← LIST(EXPAND(node)) if successors is empty then return failure, ∞ for each s in successors do // update f with value from previous search s.f ← max(s.PATH-COST + h(s), node.f )) while true do best ← the node in successors with lowest f -value if best.f > f limit then return failure, best.f
alternative ← the second-lowest f -value among successors result,best.f ←RBFS(problem,best,min(f limit,alternative)) if result ̸= failure then return result, best.f
Figure 3.22 The algorithm for recursive best-first search.

CHAPTER 4 SEARCH IN COMPLEX
ENVIRONMENTS
function HILL-CLIMBING(problem) returns a state that is a local maximum current ← problem.INITIAL
whiletrue do
neighbor ← a highest-valued successor state of current
if VALUE(neighbor) ≤ VALUE(current) then return current current ← neighbor
Figure 4.2 The hill-climbing search algorithm, which is the most basic local search tech- nique. At each step the current node is replaced by the best neighbor.
function SIMULATED-ANNEALING(problem,schedule) returns a solution state current ← problem.INITIAL
fort =1to∞do
T ←schedule(t)
if T = 0 then return current
next ← a randomly selected successor of current ∆E ← VALUE(current) – VALUE(next)
if ∆E > 0 then current ← next
else current ← next only with probability e−∆E/T
Figure 4.4 The simulated annealing algorithm, a version of stochastic hill climbing where some downhill moves are allowed. The schedule input determines the value of the “temper- ature” T as a function of time.

9
function GENETIC-ALGORITHM(population,fitness) returns an individual repeat
weights ← WEIGHTED-BY(population, fitness) population2 ← empty list
for i = 1 to SIZE(population) do
parent1, parent2 ←WEIGHTED-RANDOM-CHOICES(population,weights,2) child ← REPRODUCE(parent1 , parent2 )
if (small random probability) then child ← MUTATE(child)
add child to population2
population ← population2
until some individual is fit enough, or enough time has elapsed return the best individual in population, according to fitness
function REPRODUCE(parent1 , parent2 ) returns an individual
n ← LENGTH(parent1)
c ← random number from 1 to n
return APPEND(SUBSTRING(parent1,1,c),SUBSTRING(parent2,c +1,n))
Figure 4.7 A genetic algorithm. Within the function, population is an ordered list of indi- viduals, weights is a list of corresponding fitness values for each individual, and fitness is a function to compute these values.
function AND-OR-SEARCH(problem) returns a conditional plan, or failure return OR-SEARCH(problem,problem.INITIAL,[])
function OR-SEARCH(problem, state, path) returns a conditional plan, or failure if problem.IS-GOAL(state) then return the empty plan
if IS-CYCLE(path) then return failure
for each action in problem.ACTIONS(state) do
plan←AND-SEARCH(problem,RESULTS(state,action),[state] + path])
if plan ̸= failure then return [action] + plan] return failure
function AND-SEARCH(problem, states, path) returns a conditional plan, or failure for each si in states do
plani ← OR-SEARCH(problem, si, path)
if plani = failure then return failure
return [if s1 then plan1 else if s2 then plan2 else . . . if sn−1 then plann−1 else plann]
Figure 4.10 An algorithm for searching AND–OR graphs generated by nondeterministic en- vironments. A solution is a conditional plan that considers every nondeterministic outcome and makes a plan for each one.

10 Chapter 4 Search in Complex Environments
function ONLINE-DFS-AGENT(problem, s′) returns an action s, a, the previous state and action, initially null
persistent: result, a table mapping (s, a) to s′, initially empty
untried, a table mapping s to a list of untried actions
unbacktracked , a table mapping s to a list of states never backtracked to
if problem.IS-GOAL(s′) then return stop
if s′ is a new state (not in untried) then untried[s′]←problem.ACTIONS(s′) if s is not null then
result[s,a]←s′
add s to the front of unbacktracked[s′] if untried[s′] is empty then
if unbacktracked[s′] is empty then return stop
else a ←an action b such that result[s′,b] = POP(unbacktracked[s′]) else a ← POP(untried[s′])
s ←s′
return a
Figure 4.20 An online search agent that uses depth-first exploration. The agent can safely explore only in state spaces in which every action can be “undone” by some other action.
function LRTA*-AGENT(problem, s′, h) returns an action s, a, the previous state and action, initially null
persistent: result, a table mapping (s, a) to s′, initially empty
H , a table mapping s to a cost estimate, initially empty
if IS-GOAL(s′) then return stop
if s′ is a new state (not in H) then H[s′]←h(s′) if s is not null then
result[s,a]←s′
H [s] ← min LRTA*-COST(s, b, result[s, b], H )
b∈ACTIONS(s)
a ← argmin LRTA*-COST(problem, s′,b,result[s′,b],H)
b∈ACTIONS(s) s ←s′
return a
function LRTA*-COST(problem,s,a,s′,H) returns a cost estimate if s′ is undefined then return h(s)
else return problem.ACTION-COST(s, a, s′) + H[s′]
Figure 4.23 LRTA∗-AGENT selects an action according to the values of neighboring states, which are updated as the agent moves about the state space.

CHAPTER 5
ADVERSARIAL SEARCH AND GAMES
function MINIMAX-SEARCH(game, state) returns an action player ← game.TO-MOVE(state)
value, move ← MAX-VALUE(game, state)
return move
function MAX-VALUE(game, state) returns a (utility, move) pair
if game.IS-TERMINAL(state) then return game.UTILITY(state, player), null v ← −∞
for each a in game.ACTIONS(state) do
v2 , a2 ← MIN-VALUE(game, game.RESULT(state, a)) ifv2 >v then
v , move ← v2 , a return v, move
function MIN-VALUE(game, state) returns a (utility, move) pair
if game.IS-TERMINAL(state) then return game.UTILITY(state, player), null v ← +∞
for each a in game.ACTIONS(state) do
v2 , a2 ← MAX-VALUE(game, game.RESULT(state, a)) ifv2 v then
v, move ←v2, a
α←MAX(α, v)
ifv ≥ βthenreturnv,move
return v, move
function MIN-VALUE(game, state,α,β) returns a (utility, move) pair
if game.IS-TERMINAL(state) then return game.UTILITY(state, player), null v ← +∞
for each a in game.ACTIONS(state) do
v2 , a2 ← MAX-VALUE(game, game.RESULT(state, a), α, β) if v2 < v then v, move ←v2, a β ← MIN(β, v) ifv ≤ αthenreturnv,move return v, move Figure 5.7 The alpha–beta search algorithm. Notice that these functions are the same as the MINIMAX-SEARCH functions in Figure ??, except that we maintain bounds in the variables α and β, and use them to cut off search when a value is outside the bounds. function MONTE-CARLO-TREE-SEARCH(state) returns an action tree ← NODE(state) while IS-TIME-REMAINING() do leaf ← SELECT(tree) child ← EXPAND(leaf ) result ← SIMULATE(child) BACK-PROPAGATE(result, child) return the move in ACTIONS(state) whose node has highest number of playouts Figure 5.11 The Monte Carlo tree search algorithm. A game tree, tree, is initialized, and then we repeat a cycle of SELECT / EXPAND / SIMULATE / BACK-PROPAGATE until we run out of time, and return the move that led to the node with the highest number of playouts. CHAPTER 6 CONSTRAINT SATISFACTION PROBLEMS function AC-3(csp) returns false if an inconsistency is found and true otherwise queue ← a queue of arcs, initially all the arcs in csp while queue is not empty do (Xi, Xj)←POP(queue) if REVISE(csp, Xi, Xj) then if size of Di = 0 then return false for each Xk in Xi.NEIGHBORS - {Xj} do add (Xk, Xi) to queue return true function REVISE(csp, Xi, Xj) returns true iff we revise the domain of Xi revised ← false for each x in Di do if no value y in Dj allows (x,y) to satisfy the constraint between Xi and Xj then delete x from Di revised ← true return revised Figure 6.3 The arc-consistency algorithm AC-3. After applying AC-3, either every arc is arc-consistent, or some variable has an empty domain, indicating that the CSP cannot be solved. The name “AC-3” was used by the algorithm’s inventor (?) because it was the third version developed in the paper. 14 Chapter 6 Constraint Satisfaction Problems function BACKTRACKING-SEARCH(csp) returns a solution or failure return BACKTRACK(csp, { }) function BACKTRACK(csp,assignment) returns a solution or failure if assignment is complete then return assignment var ←SELECT-UNASSIGNED-VARIABLE(csp,assignment) for each value in ORDER-DOMAIN-VALUES(csp,var,assignment) do ifvalueisconsistentwithassignment then add {var = value} to assignment inferences ←INFERENCE(csp,var,assignment) if inferences ̸= failure then add inferences to csp result ←BACKTRACK(csp,assignment) if result ̸= failure then return result remove inferences from csp remove {var = value} from assignment return failure Figure 6.5 A simple backtracking algorithm for constraint satisfaction problems. The algorithm is modeled on the recursive depth-first search of Chapter ??. The functions SELECT-UNASSIGNED-VARIABLE and ORDER-DOMAIN-VALUES, implement the general- purpose heuristics discussed in Section ??. The INFERENCE function can optionally im- pose arc-, path-, or k-consistency, as desired. If a value choice leads to failure (noticed either by INFERENCE or by BACKTRACK), then value assignments (including those made by INFERENCE) are retracted and a new value is tried. 
function MIN-CONFLICTS(csp,max steps) returns a solution or failure inputs: csp, a constraint satisfaction problem max steps, the number of steps allowed before giving up current ← an initial complete assignment for csp fori =1tomax steps do if current is a solution for csp then return current var ← a randomly chosen conflicted variable from csp.VARIABLES value ←the value v for var that minimizes CONFLICTS(csp,var,v,current) set var = value in current return failure Figure 6.9 The MIN-CONFLICTS local search algorithm for CSPs. The initial state may be chosen randomly or by a greedy assignment process that chooses a minimal-conflict value for each variable in turn. The CONFLICTS function counts the number of constraints violated by a particular value, given the rest of the current assignment. function TREE-CSP-SOLVER(csp) returns a solution, or failure inputs: csp, a CSP with components X, D, C n ← number of variables in X assignment ← an empty assignment root ← any variable in X X ←TOPOLOGICALSORT(X,root) forj =n downto2do MAKE-ARC-CONSISTENT(PARENT(Xj),Xj) if it cannot be made consistent then return failure for i = 1 to n do assignment [Xi ] ← any consistent value from Di if there is no consistent value then return failure return assignment Figure 6.11 The TREE-CSP-SOLVER algorithm for solving tree-structured CSPs. If the CSP has a solution, we will find it in linear time; if not, we will detect a contradiction. 15 CHAPTER 7 LOGICAL AGENTS function KB-AGENT(percept) returns an action persistent: KB, a knowledge base t, a counter, initially 0, indicating time TELL(KB,MAKE-PERCEPT-SENTENCE(percept,t)) action ← ASK(KB, MAKE-ACTION-QUERY(t)) TELL(KB,MAKE-ACTION-SENTENCE(action,t)) t←t+1 return action Figure 7.1 A generic knowledge-based agent. Given a percept, the agent adds the percept to its knowledge base, asks the knowledge base for the best action, and tells the knowledge base that it has in fact taken that action. 17 function TT-ENTAILS?(KB,α) returns true or false inputs: KB , the knowledge base, a sentence in propositional logic α, the query, a sentence in propositional logic symbols ← a list of the proposition symbols in KB and α return TT-CHECK-ALL(KB,α,symbols,{}) function TT-CHECK-ALL(KB,α,symbols,model) returns true or false if EMPTY?(symbols) then if PL-TRUE?(KB,model) then return PL-TRUE?(α,model) else return true // when KB is false, always return true else P ← FIRST(symbols) rest ← REST(symbols) return (TT-CHECK-ALL(KB,α,rest,model ∪ {P = true}) and TT-CHECK-ALL(KB,α,rest,model ∪ {P = false })) Figure 7.10 A truth-table enumeration algorithm for deciding propositional entailment. (TT stands for truth table.) PL-TRUE? returns true if a sentence holds within a model. The variable model represents a partial model—an assignment to some of the symbols. The key- word and here is an infix function symbol in the pseudocode programming language, not an operator in proposition logic; it takes two arguments and returns true or false. function PL-RESOLUTION(KB, α) returns true or false inputs: KB , the knowledge base, a sentence in propositional logic α, the query, a sentence in propositional logic clauses ← the set of clauses in the CNF representation of KB ∧ ¬α new ← { } whiletrue do for each pair of clauses Ci, Cj in clauses do resolvents ← PL-RESOLVE(Ci, Cj ) if resolvents contains the empty clause then return true new ← new ∪ resolvents if new ⊆ clauses then return false clauses ← clauses ∪ new Figure 7.13 A simple resolution algorithm for propositional logic. 
PL-RESOLVE returns the set of all possible clauses obtained by resolving its two inputs. 18 Chapter 7 Logical Agents function PL-FC-ENTAILS?(KB,q) returns true or false inputs: KB , the knowledge base, a set of propositional definite clauses q, the query, a proposition symbol count ← a table, where count[c] is initially the number of symbols in clause c’s premise inferred ← a table, where inferred [s ] is initially false for all symbols queue ← a queue of symbols, initially symbols known to be true in KB while queue is not empty do p ← POP(queue) if p = q then return true if inferred[p] = false then inferred[p]←true for each clause c in KB where p is in c.PREMISE do decrement count[c] if count[c] = 0 then add c.CONCLUSION to queue return false Figure 7.15 The forward-chaining algorithm for propositional logic. The agenda keeps track of symbols known to be true but not yet “processed.” The count table keeps track of how many premises of each implication are not yet proven. Whenever a new symbol p from the agenda is processed, the count is reduced by one for each implication in whose premise p appears (easily identified in constant time with appropriate indexing.) If a count reaches zero, all the premises of the implication are known, so its conclusion can be added to the agenda. Finally, we need to keep track of which symbols have been processed; a symbol that is already in the set of inferred symbols need not be added to the agenda again. This avoids redundant work and prevents loops caused by implications such as P ⇒ Q and Q ⇒ P . function DPLL-SATISFIABLE?(s) returns true or false inputs: s, a sentence in propositional logic clauses ← the set of clauses in the CNF representation of s symbols ← a list of the proposition symbols in s return DPLL(clauses, symbols, { }) function DPLL(clauses,symbols,model) returns true or false if every clause in clauses is true in model then return true if some clause in clauses is false in model then return false P,value ←FIND-PURE-SYMBOL(symbols,clauses,model) if P is non-null then return DPLL(clauses,symbols – P,model ∪ {P=value}) P,value ←FIND-UNIT-CLAUSE(clauses,model) if P is non-null then return DPLL(clauses,symbols – P,model ∪ {P=value}) P ← FIRST(symbols); rest ← REST(symbols) return DPLL(clauses,rest,model ∪ {P=true}) or DPLL(clauses,rest,model ∪ {P=false})) Figure 7.17 The DPLL algorithm for checking satisfiability of a sentence in propositional logic. The ideas behind FIND-PURE-SYMBOL and FIND-UNIT-CLAUSE are described in the text; each returns a symbol (or null) and the truth value to assign to that symbol. Like TT-ENTAILS?, DPLL operates over partial models. function WALKSAT(clauses,p,max flips) returns a satisfying model or failure inputs: clauses , a set of clauses in propositional logic p, the probability of choosing to do a “random walk” move, typically around 0.5 max flips, number of value flips allowed before giving up model ← a random assignment of true/false to the symbols in clauses foreachi= 1tomax flips do if model satisfies clauses then return model clause ← a randomly selected clause from clauses that is false in model if RANDOM(0, 1) ≤ p then flip the value in model of a randomly selected symbol from clause else flip whichever symbol in clause maximizes the number of satisfied clauses return failure Figure 7.18 The WALKSAT algorithm for checking satisfiability by randomly flipping the values of variables. Many versions of the algorithm exist. 
19 20 Chapter 7 Logical Agents function HYBRID-WUMPUS-AGENT(percept) returns an action inputs: percept, a list, [stench,breeze,glitter,bump,scream] persistent: KB , a knowledge base, initially the atemporal “wumpus physics” t, a counter, initially 0, indicating time plan, an action sequence, initially empty TELL(KB,MAKE-PERCEPT-SENTENCE(percept,t)) TELL the KB the temporal “physics” sentences for time t safe ←{[x,y] : ASK(KB,OKtx,y) = true} if ASK(KB,Glittert) = true then plan ← [Grab] + PLAN-ROUTE(current, {[1,1]}, safe) + [Climb] if plan is empty then unvisited←{[x,y]:ASK(KB,Lt′ )=falseforallt′≤t} x,y plan ← PLAN-ROUTE(current, unvisited ∩ safe, safe) if plan is empty and ASK(KB,HaveArrowt) = true then possible wumpus ←{[x,y] : ASK(KB,¬ Wx,y) = false} plan←PLAN-SHOT(current,possible wumpus,safe) if plan is empty then // no choice but to take a risk not unsafe←{[x,y] : ASK(KB,¬OKtx,y) = false} plan←PLAN-ROUTE(current,unvisited∩not unsafe,safe) if plan is empty then plan ← PLAN-ROUTE(current, {[1, 1]}, safe) + [Climb] action ← POP(plan) TELL(KB,MAKE-ACTION-SENTENCE(action,t)) t←t+1 return action function PLAN-ROUTE(current,goals,allowed) returns an action sequence inputs: current, the agent’s current position goals, a set of squares; try to plan a route to one of them allowed , a set of squares that can form part of the route problem ←ROUTE-PROBLEM(current,goals,allowed) return SEARCH(problem) // Any search algorithm from Chapter ?? Figure 7.20 A hybrid agent program for the wumpus world. It uses a propositional knowl- edge base to infer the state of the world, and a combination of problem-solving search and domain-specific code to choose actions. Each time HYBRID-WUMPUS-AGENT is called, it adds the percept to the knowledge base, and then either relies on a previously-defined plan or creates a new plan, and pops off the first step of the plan as the action to do next. function SATPLAN(init, transition, goal, T max) returns solution or failure inputs: init, transition, goal, constitute a description of the problem T max, an upper limit for plan length fort =0toTmax do cnf ←TRANSLATE-TO-SAT(init, transition, goal,t) model ← SAT-SOLVER(cnf ) if model is not null then return EXTRACT-SOLUTION(model) return failure Figure 7.22 The SATPLAN algorithm. The planning problem is translated into a CNF sen- tence in which the goal is asserted to hold at a fixed time step t and axioms are included for each time step up to t. If the satisfiability algorithm finds a model, then a plan is extracted by looking at those proposition symbols that refer to actions and are assigned true in the model. If no model exists, then the process is repeated with the goal moved one step later. 21 8 FIRST-ORDER LOGIC CHAPTER CHAPTER 9 INFERENCE IN FIRST-ORDER LOGIC function UNIFY(x,y,θ=empty) returns a substitution to make x and y identical, or failure if θ = failure then return failure else if x = y then return θ else if VARIABLE?(x) then return UNIFY-VAR(x,y,θ) else if VARIABLE?(y) then return UNIFY-VAR(y,x,θ) else if COMPOUND?(x) and COMPOUND?(y) then return UNIFY(ARGS(x),ARGS(y),UNIFY(OP(x),OP(y),θ)) else if LIST?(x) and LIST?(y) then return UNIFY(REST(x),REST(y),UNIFY(FIRST(x),FIRST(y),θ)) else return failure function UNIFY-VAR(var,x,θ) returns a substitution if {var/val} ∈ θ for some val then return UNIFY(val,x,θ) else if {x/val} ∈ θ for some val then return UNIFY(var,val,θ) else if OCCUR-CHECK?(var,x) then return failure else return add {var /x } to θ Figure 9.1 The unification algorithm. 
The arguments x and y can be any expression: a constant or variable, or a compound expression such as a complex sentence or term, or a list of expressions. The argument θ is a substitution, initially the empty substitution, but with {var/val} pairs added to it as we recurse through the inputs, comparing the expressions element by element. In a compound expression such as F (A, B), OP(x ) field picks out the function symbol F and ARGS(x ) field picks out the argument list (A, B). 24 Chapter 9 Inference in First-Order Logic function FOL-FC-ASK(KB,α) returns a substitution or false inputs: KB , the knowledge base, a set of first-order definite clauses α, the query, an atomic sentence while true do new ← { } // The set of new sentences inferred on each iteration for each rule in KB do (p1 ∧ . . . ∧ pn ⇒ q) ← STANDARDIZE-VARIABLES(rule) for each θ such that SUBST(θ,p1 ∧ ... ∧ pn) = SUBST(θ,p1′ ∧ ... ∧ pn′ ) for some p1′ ,...,pn′ in KB q′ ←SUBST(θ,q) if q′ does not unify with some sentence already in KB or new then add q′ to new φ←UNIFY(q′,α) if φ is not failure then return φ if new = { } then return false add new to KB Figure 9.3 A conceptually straightforward, but inefficient, forward-chaining algorithm. On each iteration, it adds to KB all the atomic sentences that can be inferred in one step from the implication sentences and the atomic sentences already in KB. The function STANDARDIZE-VARIABLES replaces all variables in its arguments with new ones that have not been used before. function FOL-BC-ASK(KB,query) returns a generator of substitutions return FOL-BC-OR(KB,query,{}) function FOL-BC-OR(KB,goal,θ) returns a substitution for each rule in FETCH-RULES-FOR-GOAL(KB, goal) do (lhs ⇒ rhs)←STANDARDIZE-VARIABLES(rule) for each θ′ in FOL-BC-AND(KB,lhs,UNIFY(rhs, goal, θ)) do yield θ′ function FOL-BC-AND(KB,goals,θ) returns a substitution if θ = failure then return else if LENGTH(goals) = 0 then yield θ else first,rest ← FIRST(goals), REST(goals) for each θ′ in FOL-BC-OR(KB, SUBST(θ, first), θ) do for each θ′′ in FOL-BC-AND(KB,rest,θ′) do yield θ′′ Figure 9.6 A simple backward-chaining algorithm for first-order knowledge bases. procedure APPEND(ax,y,az,continuation) trail ← GLOBAL-TRAIL-POINTER() if ax = [ ] and UNIFY(y, az ) then CALL(continuation) RESET-TRAIL(trail) a,x,z ←NEW-VARIABLE(), NEW-VARIABLE(), NEW-VARIABLE() if UNIFY(ax,[a] + x) and UNIFY(az,[a | z]) then APPEND(x,y,z,continuation) Figure 9.8 Pseudocode representing the result of compiling the Append predicate. The function NEW-VARIABLE returns a new variable, distinct from all other variables used so far. The procedure CALL(continuation) continues execution with the specified continuation. 25 10 KNOWLEDGE REPRESENTATION CHAPTER 11 AUTOMATED PLANNING Init(At(C1, SFO) ∧ At(C2, JFK) ∧ At(P1, SFO) ∧ At(P2, JFK) ∧ Cargo(C1) ∧ Cargo(C2) ∧ Plane(P1) ∧ Plane(P2) ∧ Airport(JFK) ∧ Airport(SFO)) Goal(At(C1, JFK) ∧ At(C2, SFO)) Action(Load(c, p, a), PRECOND: At(c, a) ∧ At(p, a) ∧ Cargo(c) ∧ Plane(p) ∧ Airport(a) EFFECT: ¬ At(c, a) ∧ In(c, p)) Action(Unload(c, p, a), PRECOND: In(c, p) ∧ At(p, a) ∧ Cargo(c) ∧ Plane(p) ∧ Airport(a) EFFECT: At(c, a) ∧ ¬ In(c, p)) Action(Fly(p, from, to), PRECOND: At(p, from) ∧ Plane(p) ∧ Airport(from) ∧ Airport(to) EFFECT: ¬ At(p, from) ∧ At(p, to)) Figure 11.1 A PDDL description of an air cargo transportation planning problem. 
Init(Tire(Flat) ∧ Tire(Spare) ∧ At(Flat,Axle) ∧ At(Spare,Trunk)) Goal (At (Spare , Axle )) Action(Remove(obj , loc), PRECOND: At(obj,loc) EFFECT: ¬ At(obj,loc) ∧ At(obj,Ground)) Action(PutOn(t, Axle), PRECOND: Tire(t) ∧ At(t, Ground) ∧ ¬ At(Flat, Axle) ∧ ¬ At(Spare, Axle) EFFECT: ¬ At(t, Ground) ∧ At(t, Axle)) Action (LeaveOvernight , PRECOND: EFFECT: ¬ At(Spare, Ground) ∧ ¬ At(Spare, Axle) ∧ ¬ At(Spare, Trunk) ∧ ¬ At(Flat,Ground) ∧ ¬ At(Flat,Axle) ∧ ¬ At(Flat, Trunk)) Figure 11.2 The simple spare tire problem. CHAPTER 28 Chapter 11 Automated Planning Init(On(A,Table) ∧ On(B,Table) ∧ On(C,A) ∧ Block(A) ∧ Block(B) ∧ Block(C) ∧ Clear(B) ∧ Clear(C) ∧ Clear(Table)) Goal(On(A,B) ∧ On(B,C)) Action(Move(b, x, y), PRECOND: On(b,x) ∧ Clear(b) ∧ Clear(y) ∧ Block(b) ∧ Block(y) ∧ (b̸=x) ∧ (b̸=y) ∧ (x̸=y), EFFECT:On(b,y) ∧ Clear(x) ∧ ¬On(b,x) ∧ ¬Clear(y)) Action (MoveToTable (b, x), PRECOND: On(b,x) ∧ Clear(b) ∧ Block(b) ∧ Block(x), EFFECT: On(b,Table) ∧ Clear(x) ∧ ¬On(b,x)) Figure 11.4 A planning problem in the blocks world: building a three-block tower. One solution is the sequence [MoveToTable(C, A), Move(B, Table, C), Move(A, Table, B)]. Refinement (Go (Home , SFO ), STEPS: [Drive(Home, SFOLongTermParking), Shuttle (SFOLongTermParking , SFO )] ) Refinement (Go (Home , SFO ), STEPS: [Taxi(Home,SFO)] ) Refinement(Navigate([a, b], [x, y]), PRECOND: a=x ∧ b=y STEPS: [ ] ) Refinement(Navigate([a, b], [x, y]), PRECOND:Connected([a, b], [a − 1, b]) STEPS: [Left, Navigate([a − 1, b], [x, y])] ) Refinement(Navigate([a, b], [x, y]), PRECOND:Connected([a, b], [a + 1, b]) STEPS: [Right, Navigate([a + 1, b], [x, y])] ) ... Figure 11.7 Definitions of possible refinements for two high-level actions: going to San Francisco airport and navigating in the vacuum world. In the latter case, note the recursive nature of the refinements and the use of preconditions. function HIERARCHICAL-SEARCH(problem,hierarchy) returns a solution or failure frontier ← a FIFO queue with [Act] as the only element while true do if IS-EMPTY(frontier) then return failure plan ←POP(frontier) //chooses the shallowest plan in frontier hla ← the first HLA in plan, or null if none prefix ,suffix ← the action subsequences before and after hla in plan outcome ← RESULT(problem.INITIAL, prefix) if hla is null then // so plan is primitive and outcome is its result if problem.IS-GOAL(outcome) then return plan else for each sequence in REFINEMENTS(hla,outcome,hierarchy) do add APPEND(prefix,sequence,suffix) to frontier Figure 11.8 A breadth-first implementation of hierarchical forward planning search. The initial plan supplied to the algorithm is [Act]. The REFINEMENTS function returns a set of action sequences, one for each refinement of the HLA whose preconditions are satisfied by the specified state, outcome. 
29 30 Chapter 11 Automated Planning function ANGELIC-SEARCH(problem,hierarchy,initialPlan) returns solution or fail frontier ← a FIFO queue with initialPlan as the only element while true do if EMPTY?(frontier) then return fail plan ←POP(frontier) //chooses the shallowest node in frontier if REACH+(problem.INITIAL,plan) intersects problem.GOAL then if plan is primitive then return plan // REACH+ is exact for primitive plans guaranteed←REACH−(problem.INITIAL,plan) ∩ problem.GOAL if guaranteed̸={ } and MAKING-PROGRESS(plan, initialPlan) then finalState ← any element of guaranteed return DECOMPOSE(hierarchy,problem.INITIAL,plan,finalState) hla ← some HLA in plan prefix ,suffix ← the action subsequences before and after hla in plan outcome ← RESULT(problem.INITIAL, prefix) for each sequence in REFINEMENTS(hla,outcome,hierarchy) do frontier ←Insert(APPEND(prefix,sequence,suffix),frontier) function DECOMPOSE(hierarchy,s0 ,plan,sf ) returns a solution solution ← an empty plan while plan is not empty do action ← REMOVE-LAST(plan) si ←a state in REACH−(s0 ,plan) such that sf ∈REACH−(si,action) problem ← a problem with INITIAL = si and GOAL = sf solution ←APPEND(ANGELIC-SEARCH(problem,hierarchy,action),solution) sf ←si return solution Figure 11.11 A hierarchical planning algorithm that uses angelic semantics to identify and commit to high-level plans that work while avoiding high-level plans that don’t. The predi- cate MAKING-PROGRESS checks to make sure that we aren’t stuck in an infinite regression of refinements. At top level, call ANGELIC-SEARCH with [Act] as the initialPlan. Jobs ({AddEngine1 ≺ AddWheels1 ≺ Inspect1 }, {AddEngine2 ≺ AddWheels2 ≺ Inspect2 }) Resources(EngineHoists(1), WheelStations(1), Inspectors(e2), LugNuts(500)) Action(AddEngine1, DURATION:30, USE:EngineHoists(1)) Action(AddEngine2, DURATION:60, USE:EngineHoists(1)) Action(AddWheels1, DURATION:30, CONSUME:LugNuts(20), USE:WheelStations(1)) Action(AddWheels2, DURATION:15, CONSUME:LugNuts(20), USE:WheelStations(1)) Action(Inspecti, DURATION:10, USE:Inspectors(1)) Figure 11.13 A job-shop scheduling problem for assembling two cars, with resource con- straints. The notation A ≺ B means that action A must precede action B. 31 12 QUANTIFYING UNCERTAINTY function DT-AGENT(percept) returns an action persistent: belief state, probabilistic beliefs about the current state of the world action, the agent’s action update belief state based on action and percept calculate outcome probabilities for actions, given action descriptions and current belief state select action with highest expected utility given probabilities of outcomes and utility information return action Figure 12.1 A decision-theoretic agent that selects rational actions. CHAPTER 13 PROBABILISTIC REASONING function ENUMERATION-ASK(X , e, bn) returns a distribution over X inputs: X , the query variable e, observed values for variables E bn, a Bayes net with variables vars Q(X ) ← a distribution over X , initially empty for each value xi of X do Q(xi) ← ENUMERATE-ALL(vars, exi ) where exi is e extended with X = xi return NORMALIZE(Q(X)) function ENUMERATE-ALL(vars, e) returns a real number if EMPTY?(vars) then return 1.0 V ← FIRST(vars) if V is an evidence variable with value v in e thenreturnP(v|parents(V)) × ENUMERATE-ALL(REST(vars),e) elsereturn􏲹v P(v|parents(V)) × ENUMERATE-ALL(REST(vars),ev) where ev is e extended with V = v Figure 13.11 The enumeration algorithm for exact inference in Bayes nets. 
function ELIMINATION-ASK(X , e, bn) returns a distribution over X inputs: X , the query variable e, observed values for variables E bn, a Bayesian network with variables vars factors ← [ ] for each V in ORDER(vars) do factors←[MAKE-FACTOR(V,e)] + factors if V is a hidden variable then factors ←SUM-OUT(V,factors) return NORMALIZE(POINTWISE-PRODUCT(factors)) Figure 13.13 The variable elimination algorithm for exact inference in Bayes nets. CHAPTER 34 Chapter 13 Probabilistic Reasoning function PRIOR-SAMPLE(bn) returns an event sampled from the prior specified by bn inputs: bn , a Bayesian network specifying joint distribution P(X1 , . . . , Xn ) x ← an event with n elements for each variable Xi in X1,...,Xn do x[i] ← a random sample from P(Xi | parents(Xi)) return x Figure 13.16 A sampling algorithm that generates events from a Bayesian network. Each variable is sampled according to the conditional distribution given the values already sampled for the variable’s parents. function REJECTION-SAMPLING(X , e, bn, N ) returns an estimate of P(X | e) inputs: X , the query variable e, observed values for variables E bn, a Bayesian network N , the total number of samples to be generated local variables: C, a vector of counts for each value of X , initially zero for j = 1 to N do x ← PRIOR-SAMPLE(bn) if x is consistent with e then C[j]←C[j]+1 where xj is the value of X in x return NORMALIZE(C) Figure 13.17 The rejection-sampling algorithm for answering queries given evidence in a Bayesian network. 35 function LIKELIHOOD-WEIGHTING(X , e, bn, N ) returns an estimate of P(X | e) inputs: X , the query variable e, observed values for variables E bn, a Bayesian network specifying joint distribution P(X1, . . . , Xn) N , the total number of samples to be generated local variables: W, a vector of weighted counts for each value of X , initially zero for j = 1 to N do x, w ← WEIGHTED-SAMPLE(bn, e) W[j]←W[j]+w wherexj isthevalueofX inx return NORMALIZE(W) function WEIGHTED-SAMPLE(bn, e) returns an event and a weight w ← 1; x ← an event with n elements, with values fixed from e fori =1tondo if Xi is an evidence variable with value xij in e then w ←w × P(Xi = xij |parents(Xi)) else x[i] ← a random sample from P(Xi | parents(Xi )) return x, w Figure 13.18 The likelihood-weighting algorithm for inference in Bayesian networks. In WEIGHTED-SAMPLE, each nonevidence variable is sampled according to the conditional distribution given the values already sampled for the variable’s parents, while a weight is accumulated based on the likelihood for each evidence variable. function GIBBS-ASK(X , e, bn, N ) returns an estimate of P(X | e) local variables: C, a vector of counts for each value of X , initially zero Z, the nonevidence variables in bn x, the current state of the network, initialized from e initialize x with random values for the variables in Z fork =1toN do choose any variable Zi from Z according to any distribution ρ(i) set the value of Zi in x by sampling from P(Zi | mb(Zi)) C[j]←C[j]+1wherexj isthevalueofX inx return NORMALIZE(C) Figure 13.20 The Gibbs sampling algorithm for approximate inference in Bayes nets; this version chooses variables at random, but cycling through the variables but also works. 14 PROBABILISTIC REASONING OVER TIME function FORWARD-BACKWARD(ev, prior ) returns a vector of probability distributions inputs: ev, a vector of evidence values for steps 1, . . . , t prior, the prior distribution on the initial state, P(X0) local variables: fv, a vector of forward messages for steps 0, . . . 
, t b, a representation of the backward message, initially all 1s sv, a vector of smoothed estimates for steps 1, . . . , t fv[0] ← prior for i = 1 to t do fv[i] ← FORWARD(fv[i − 1], ev[i]) fori= tdownto1do sv[i] ← NORMALIZE(fv[i] × b) b ← BACKWARD(b, ev[i]) return sv Figure 14.4 The forward–backward algorithm for smoothing: computing posterior prob- abilities of a sequence of states given a sequence of observations. The FORWARD and BACKWARD operators are defined by Equations (??) and (??), respectively. CHAPTER function FIXED-LAG-SMOOTHING(et,hmm,d) returns a distribution over Xt−d inputs: et, the current evidence for time step t hmm, a hidden Markov model with S × S transition matrix T d, the length of the lag for smoothing persistent: t, the current time, initially 1 f, the forward message P(Xt | e1:t), initially hmm.PRIOR B, the d-step backward transformation matrix, initially the identity matrix et−d:t, double-ended list of evidence from t − d to t, initially empty local variables: Ot−d , Ot , diagonal matrices containing the sensor model information add et to the end of et−d:t Ot ← diagonal matrix containing P(et | Xt) if t > d then
f ← FORWARD(f, et−d)
remove et−d−1 from the beginning of et−d:t
Ot−d ← diagonal matrix containing P(et−d | Xt−d )
B ← O−1 T−1BTOt t−d
else B ← BTOt
t←t+1
if t > d + 1 then return NORMALIZE(f × B1) else return null
Figure 14.6 An algorithm for smoothing with a fixed time lag of d steps, implemented as an online algorithm that outputs the new smoothed estimate given the observation for a new time step. Notice that the final output NORMALIZE(f × B1) is just α f × b, by Equation (??).
function PARTICLE-FILTERING(e, N , dbn) returns a set of samples for the next time step inputs: e, the new incoming evidence
N , the number of samples to be maintained
dbn, a DBN defined by P(X0), P(X1 | X0), and P(E1 | X1) persistent: S, a vector of samples of size N, initially generated from P(X0) local variables: W , a vector of weights of size N
37
for i = 1 to N do
S[i]←sample from P(X1 |X0 = S[i]) //step 1 W[i]←P(e|X1 = S[i]) //step 2
S ← WEIGHTED-SAMPLE-WITH-REPLACEMENT(N , S , W ) return S
// step 3
Figure 14.17 The particle filtering algorithm implemented as a recursive update oper- ation with state (the set of samples). Each of the sampling operations involves sam- pling the relevant slice variables in topological order, much as in PRIOR-SAMPLE. The WEIGHTED-SAMPLE-WITH-REPLACEMENT operation can be implemented to run in O(N) expected time. The step numbers refer to the description in the text.

15
PROBABILISTIC PROGRAMMING
CHAPTER
type Researcher, Paper, Citation random String Name(Researcher) random String Title(Paper)
random Paper PubCited(Citation) random String Text(Citation)
random Boolean Professor(Researcher) origin Researcher Author(Paper)
#Researcher ∼ OM(3,1)
Name(r) ∼ NamePrior()
Professor(r) ∼ Boolean(0.2)
#Paper (Author = r) ∼ if Professor (r) then OM (1.5, 0.5) else OM (1, 0.5) Title(p) ∼ PaperTitlePrior()
CitedPaper(c) ∼ UniformChoice({Paper p})
Text(c) ∼ HMMGrammar(Name(Author(CitedPaper(c))),Title(CitedPaper(c)))
Figure15.5 AnOUPMforcitationinformationextraction.Forsimplicitythemodelassumes one author per paper and omits details of the grammar and error models.

39
#SeismicEvents ∼ Poisson(T ∗ λe)
Time(e) ∼ UniformReal(0,T)
EarthQuake(e) ∼ Boolean(0.999)
Location(e) ∼ if Earthquake(e) then SpatialPrior() else UniformEarth() Depth (e) ∼ if Earthquake (e) then UniformReal (0, 700) else Exactly (0) Magnitude(e) ∼ Exponential(log(10))
Detected(e,p,s) ∼ Logistic(weights(s,p),Magnitude(e), Depth(e), Dist(e,s)) #Detections(site = s) ∼ Poisson(T ∗ λf (s))
#Detections(event=e, phase=p, station=s) = ifDetected(e,p,s)then1else0 OnsetTime(a,s) if (event(a) = null) then ∼ UniformReal(0,T)
else=Time(event(a)) + GeoTT(Dist(event(a),s),Depth(event(a)),phase(a)) + Laplace(μt(s), σt(s))
Amplitude(a,s) if (event(a) = null) then ∼ NoiseAmpModel(s)
else = AmpModel (Magnitude (event (a)), Dist (event (a), s), Depth (event (a)), phase (a))
Azimuth(a, s) if (event(a) = null) then ∼ UniformReal(0, 360)
else = GeoAzimuth(Location(event(a)),Depth(event(a)),phase(a),Site(s))
+ Laplace (0, σa (s)) Slowness(a,s)if(event(a)=null)then ∼ UniformReal(0,20)
else = GeoSlowness(Location(event(a)),Depth(event(a)),phase(a),Site(s)) + Laplace(0,σs(s))
ObservedPhase(a,s) ∼ CategoricalPhaseModel(phase(a)) Figure 15.6 A simplified version of the NET-VISA model (see text).
#Aircraft(EntryTime =t) ∼ Poisson(λa)
Exits(a,t) ∼ if InFlight(a,t) then Boolean(αe)
InFlight(a,t) = (t=EntryTime(a)) ∨ (InFlight(a,t−1) ∧ ¬Exits(a,t−1)) X(a,t) ∼ ift = EntryTime(a)thenInitX()
else if InFlight (a, t) then N (F X (a, t − 1), Σx )
#Blip(Source=a, Time=t) ∼ if InFlight(a,t) then Bernoulli(DetectionProb(X(a,t))) #Blip(Time=t) ∼ Poisson(λf )
Z(b) ∼ if Source(b)=null then UniformZ(R) else N(HX(Source(b),Time(b)),Σz)
Figure 15.9 An OUPM for radar tracking of multiple targets with false alarms, detection failure, and entry and exit of aircraft. The rate at which new aircraft enter the scene is λa, while the probability per time step that an aircraft exits the scene is αe. False alarm blips (i.e., ones not produced by an aircraft) appear uniformly in space at a rate of λf per time step. The probability that an aircraft is detected (i.e., produces a blip) depends on its current position.

40 Chapter 15 Probabilistic Programming
function GENERATE-IMAGE() returns an image with some letters letters ← GENERATE-LETTERS(10)
return RENDER-NOISY-IMAGE(letters, 32, 128)
function GENERATE-LETTERS(λ) returns a vector of letters n ∼ Poisson(λ)
letters ← [ ]
for i = 1 to n do
letters[i] ∼ UniformChoice({a,b,c,···}) return letters
function RENDER-NOISY-IMAGE(letters,width,height) returns a noisy image of the letters clean image←RENDER(letters,width,height,text top=10,text left=10)
noisy image←[]
noise variance ∼ UniformReal(0.1, 1)
for row = 1 to width do for col = 1 to height do
noisy image[row,col] ∼ N(clean image[row,col],noise variance) return noisy image
Figure15.11 Generativeprogramforanopen-universeprobabilitymodelforopticalcharac- ter recognition. The generative program produces degraded images containing sequences of letters by generating each sequence, rendering it into a 2D image, and incorporating additive noise at each pixel.
function GENERATE-MARKOV-LETTERS(λ) returns a vector of letters n ∼ Poisson(λ)
letters ← [ ]
letter probs ← MARKOV-INITIAL()
for i = 1 to n do
letters[i] ∼ Categorical(letter probs)
letter probs ← MARKOV-TRANSITION(letters[i])
return letters
Figure 15.15 Generative program for an improved optical character recognition model that generates letters according to a letter bigram model whose pairwise letter frequencies are estimated from a list of English words.

16
MAKING SIMPLE DECISIONS
function INFORMATION-GATHERING-AGENT(percept) returns an action persistent: D, a decision network
integrate percept into D
j ← the value that maximizes VPI (Ej ) / C (Ej ) if VPI(Ej) > C(Ej)
then return Request(Ej) else return the best action from D
Figure 16.9 Design of a simple, myopic information-gathering agent. The agent works by repeatedly selecting the observation with the highest information value, until the cost of the next observation is greater than its expected benefit.
CHAPTER

17
MAKING COMPLEX DECISIONS
function VALUE-ITERATION(mdp, ǫ) returns a utility function
inputs: mdp , an MDP with states S , actions A(s), transition model P (s′ | s, a),
rewards R(s, a, s′), discount γ
ǫ, the maximum error allowed in the utility of any state
local variables: U , U ′, vectors of utilities for states in S , initially zero
δ, the maximum relative change in the utility of any state
repeat
U ← U ′; δ ← 0
for each state s in S do
U ′[s] ← maxa ∈ A(s) Q-VALUE(mdp, s, a, U )
if|U′[s] − U[s]| > δthenδ←|U′[s] − U[s]| untilδ ≤ ǫ(1−γ)/γ
return U
Figure 17.6 The value iteration algorithm for calculating utilities of states. The termination
condition is from Equation (??).
function POLICY-ITERATION(mdp) returns a policy
inputs: mdp , an MDP with states S , actions A(s), transition model P (s′ | s, a) local variables: U , a vector of utilities for states in S , initially zero
π, a policy vector indexed by state, initially random
repeat
U ← POLICY-EVALUATION(π, U , mdp) unchanged ? ← true
for each state s in S do
a∗ ← argmax Q-VALUE(mdp, s, a, U ) a∈A(s)
ifQ-VALUE(mdp,s,a∗,U) > Q-VALUE(mdp,s,π[s],U)then π[s]←a∗; unchanged?←false
until unchanged? return π
Figure 17.9 The policy iteration algorithm for calculating an optimal policy.
CHAPTER

function POMDP-VALUE-ITERATION(pomdp, ǫ) returns a utility function
inputs: pomdp , a POMDP with states S , actions A(s), transition model P (s′ | s, a),
sensor model P (e | s), rewards R(s), discount γ
ǫ, the maximum error allowed in the utility of any state
local variables: U , U ′, sets of plans p with associated utility vectors αp
U ′ ← a set containing just the empty plan [ ], with α[ ](s) = R(s) repeat
U ←U′
U ′ ← the set of all plans consisting of an action and, for each possible next percept,
a plan in U with utility vectors computed according to Equation (??) U ′ ← REMOVE-DOMINATED-PLANS(U ′)
untilMAX-DIFFERENCE(U,U′) ≤ ǫ(1−γ)/γ return U
Figure 17.16 A high-level sketch of the value iteration algorithm for POMDPs. The REMOVE-DOMINATED-PLANS step and MAX-DIFFERENCE test are typically implemented as linear programs.
43

18
MULTIAGENT DECISION MAKING
Actors(A,B)
Init(At(A,LeftBaseline) ∧ At(B,RightNet) ∧
Approaching(Ball,RightBaseline) ∧ Partner(A,B) ∧ Partner(B,A) Goal(Returned(Ball) ∧ (At(x,RightNet) ∨ At(x,LeftNet)) Action(Hit(actor,Ball),
PRECOND:Approaching(Ball,loc) ∧ At(actor,loc)
EFFECT:Returned(Ball)) Action(Go(actor,to),
PRECOND:At(actor,loc) ∧ to ̸= loc, EFFECT:At(actor,to) ∧ ¬ At(actor,loc))
Figure 18.1 The doubles tennis problem. Two actors, A and B, are playing together and can be in one of four locations: LeftBaseline, RightBaseline, LeftNet, and RightNet. The ball can be returned only if a player is in the right place. The NoOp action is a dummy, which has no effect. Note that each action must include the actor as an argument.
CHAPTER

19
LEARNING FROM EXAMPLES
function LEARN-DECISION-TREE(examples, attributes, parent examples) returns a tree
if examples is empty then return PLURALITY-VALUE(parent examples) else if all examples have the same classification then return the classification else if attributes is empty then return PLURALITY-VALUE(examples)
else
A ← argmaxa ∈ attributes IMPORTANCE(a, examples) tree ← a new decision tree with root test A
for each value v of A do
exs ←{e : e∈examples and e.A = v}
subtree ← LEARN-DECISION-TREE(exs, attributes − A, examples) add a branch to tree with label (A = v ) and subtree subtree
return tree
Figure19.5 Thedecisiontreelearningalgorithm.ThefunctionIMPORTANCEisdescribedin Section ??. The function PLURALITY-VALUE selects the most common output value among a set of examples, breaking ties randomly.
CHAPTER

46 Chapter 19 Learning from Examples
function MODEL-SELECTION(Learner, examples, k) returns a (hypothesis, error rate) pair
   err ← an array, indexed by size, storing validation-set error rates
   training set, test set ← a partition of examples into two sets
   for size = 1 to ∞ do
      err[size] ← CROSS-VALIDATION(Learner, size, training set, k)
      if err is starting to increase significantly then
         best size ← the value of size with minimum err[size]
         h ← Learner(best size, training set)
         return h, ERROR-RATE(h, test set)

function CROSS-VALIDATION(Learner, size, examples, k) returns error rate
   N ← the number of examples
   errs ← 0
   for i = 1 to k do
      validation set ← examples[(i − 1) × N/k : i × N/k]
      training set ← examples − validation set
      h ← Learner(size, training set)
      errs ← errs + ERROR-RATE(h, validation set)
   return errs / k   // average error rate on validation sets, across k-fold cross-validation
Figure 19.8 An algorithm to select the model that has the lowest validation error. It builds models of increasing complexity and chooses the one with the best empirical error rate, err, on the validation data set. Learner(size, examples) returns a hypothesis whose complexity is set by the parameter size, and which is trained on examples. In CROSS-VALIDATION, each iteration of the for loop selects a different slice of the examples as the validation set, and keeps the other examples as the training set. It then returns the average validation set error over all the folds. Once we have determined which value of the size parameter is best, MODEL-SELECTION returns the model (i.e., learner/hypothesis) of that size, trained on all the training examples, along with its error rate on the held-out test examples.
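The following Python sketch mirrors the two functions above. It assumes learner(size, examples) returns a hypothesis h with h(x) giving a predicted label and that examples are (x, y) pairs; since "starting to increase significantly" is left open in the figure, it stops after a fixed patience with no improvement. The 80/20 split, max_size, and patience values are illustrative.

def error_rate(h, examples):
    return sum(h(x) != y for x, y in examples) / len(examples)

def cross_validation(learner, size, examples, k):
    fold = len(examples) // k
    errs = 0.0
    for i in range(k):
        validation = examples[i * fold:(i + 1) * fold]
        training = examples[:i * fold] + examples[(i + 1) * fold:]
        errs += error_rate(learner(size, training), validation)
    return errs / k                      # average validation error over the k folds

def model_selection(learner, examples, k, max_size=20, patience=3):
    split = int(0.8 * len(examples))
    training, test = examples[:split], examples[split:]
    err, best_size = {}, None
    for size in range(1, max_size + 1):
        err[size] = cross_validation(learner, size, training, k)
        if best_size is None or err[size] < err[best_size]:
            best_size = size
        if size - best_size >= patience:    # validation error has stopped improving
            break
    h = learner(best_size, training)
    return h, error_rate(h, test)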
function DECISION-LIST-LEARNING(examples) returns a decision list, or failure
if examples is empty then return the trivial decision list No
   t ← a test that matches a nonempty subset examples_t of examples
      such that the members of examples_t are all positive or all negative
   if there is no such t then return failure
   if the examples in examples_t are positive then o ← Yes else o ← No
   return a decision list with initial test t and outcome o and remaining tests given by
      DECISION-LIST-LEARNING(examples − examples_t)

Figure 19.11 An algorithm for learning decision lists.
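A simple way to realize the "test" in this figure is to restrict tests to single attribute = value literals and search them greedily, as in the hedged Python sketch below; the book allows small conjunctions of literals, so this is a narrower but still valid instance of the scheme. Examples are assumed to be dicts with a boolean 'label' key.

def decision_list_learning(examples):
    if not examples:
        return [('trivial', None, 'No')]               # the trivial decision list "No"
    for a in examples[0]:
        if a == 'label':
            continue
        for v in {e[a] for e in examples}:
            matched = [e for e in examples if e[a] == v]
            labels = {e['label'] for e in matched}
            if matched and len(labels) == 1:           # a pure, nonempty subset
                outcome = 'Yes' if labels == {True} else 'No'
                rest = decision_list_learning([e for e in examples if e[a] != v])
                if rest == 'failure':
                    return 'failure'
                return [(a, v, outcome)] + rest
    return 'failure'                                   # no such test exists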

function ADABOOST(examples, L, K ) returns a hypothesis
   inputs: examples, set of N labeled examples (x1, y1), . . . , (xN, yN)
      L, a learning algorithm
      K, the number of hypotheses in the ensemble
   local variables: w, a vector of N example weights, initially all 1/N
      h, a vector of K hypotheses
      z, a vector of K hypothesis weights

   ε ← a small positive number, used to avoid division by zero
   for k = 1 to K do
      h[k] ← L(examples, w)
      error ← 0
      for j = 1 to N do   // Compute the total error for h[k]
         if h[k](xj) ≠ yj then error ← error + w[j]
      if error > 1/2 then break from loop
      error ← min(error, 1 − ε)
      for j = 1 to N do   // Give more weight to the examples h[k] got wrong
         if h[k](xj) = yj then w[j] ← w[j] · error/(1 − error)
      w ← NORMALIZE(w)
      z[k] ← (1/2) log((1 − error)/error)   // Give more weight to accurate h[k]
   return Function(x) : Σ_k z[k] h[k](x)
Figure 19.25 The ADABOOST variant of the boosting method for ensemble learning. The algorithm generates hypotheses by successively reweighting the training examples. The function WEIGHTED-MAJORITY generates a hypothesis that returns the output value with the highest vote from the hypotheses in h, with votes weighted by z. For regression problems, or for binary classification with two classes -1 and 1, this is Σ_k h[k] z[k].
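A Python sketch of the same procedure for binary labels in {-1, +1} follows. learner(examples, w) is assumed to return a weak hypothesis h with h(x) in {-1, +1}, and the clamping of error away from both 0 and 1 is a small robustness addition beyond the pseudocode's ε.

import math

def adaboost(examples, learner, K):
    # examples: list of (x, y) pairs with y in {-1, +1}
    N = len(examples)
    w = [1.0 / N] * N
    h, z = [], []
    eps = 1e-10                                   # small constant, as in the figure
    for _ in range(K):
        hk = learner(examples, w)
        error = sum(wj for wj, (x, y) in zip(w, examples) if hk(x) != y)
        if error > 0.5:
            break                                 # weak hypothesis no better than chance
        error = min(max(error, eps), 1 - eps)     # clamp away from 0 and 1
        # Down-weight the examples hk got right (equivalently, up-weight its mistakes).
        w = [wj * (error / (1 - error)) if hk(x) == y else wj
             for wj, (x, y) in zip(w, examples)]
        total = sum(w)
        w = [wj / total for wj in w]              # NORMALIZE
        h.append(hk)
        z.append(0.5 * math.log((1 - error) / error))
    # The returned hypothesis is the sign of the weighted vote, as in the caption.
    return lambda x: 1 if sum(zk * hk(x) for zk, hk in zip(z, h)) >= 0 else -1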

CHAPTER 20
LEARNING PROBABILISTIC MODELS

CHAPTER 21
DEEP LEARNING

CHAPTER 22
REINFORCEMENT LEARNING
function PASSIVE-ADP-LEARNER(percept) returns an action
   inputs: percept, a percept indicating the current state s′ and reward signal r
   persistent: π, a fixed policy
      mdp, an MDP with model P, rewards R, actions A, discount γ
      U, a table of utilities for states, initially empty
      N_{s′|s,a}, a table of outcome count vectors indexed by state and action, initially zero
      s, a, the previous state and action, initially null

   if s′ is new then U[s′] ← 0
   if s is not null then
      increment N_{s′|s,a}[s, a][s′]
      R[s, a, s′] ← r
      add a to A[s]
      P(· | s, a) ← NORMALIZE(N_{s′|s,a}[s, a])
   U ← POLICY-EVALUATION(π, U, mdp)
   s, a ← s′, π[s′]
   return a
Figure 22.2 A passive reinforcement learning agent based on adaptive dynamic programming. The agent chooses a value for γ and then incrementally computes the P and R values of the MDP. The POLICY-EVALUATION function solves the fixed-policy Bellman equations, as described on page ??.
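For comparison with the figure, here is a hedged Python sketch of the passive ADP learner. It keeps running outcome counts and observed rewards and re-solves the fixed-policy Bellman equations with a fixed number of sweeps; the closure-based interface, the count structures, and eval_iters are all illustrative, not part of the figure.

from collections import defaultdict

def make_adp_learner(pi, gamma, eval_iters=50):
    U = {}                                       # utilities of states seen so far
    R = {}                                       # R[(s, a, s2)] = observed reward
    N = defaultdict(lambda: defaultdict(int))    # N[(s, a)][s2] = outcome counts
    prev = {'s': None, 'a': None}

    def policy_evaluation():
        # Repeatedly apply the fixed-policy Bellman update over states seen so far.
        for _ in range(eval_iters):
            for s in list(U):
                a = pi[s]
                total = sum(N[(s, a)].values())
                if total == 0:
                    continue
                U[s] = sum(n / total * (R.get((s, a, s2), 0.0) + gamma * U.get(s2, 0.0))
                           for s2, n in N[(s, a)].items())

    def step(s_next, r):
        U.setdefault(s_next, 0.0)
        s, a = prev['s'], prev['a']
        if s is not None:
            N[(s, a)][s_next] += 1
            R[(s, a, s_next)] = r
        policy_evaluation()
        prev['s'], prev['a'] = s_next, pi[s_next]
        return prev['a']
    return step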
function PASSIVE-TD-LEARNER(percept) returns an action
   inputs: percept, a percept indicating the current state s′ and reward signal r
   persistent: π, a fixed policy
      s, the previous state, initially null
      U, a table of utilities for states, initially empty
      Ns, a table of frequencies for states, initially zero

   if s′ is new then U[s′] ← 0
   if s is not null then
      increment Ns[s]
      U[s] ← U[s] + α(Ns[s]) × (r + γ U[s′] − U[s])
   s ← s′
   return π[s′]
Figure 22.4 A passive reinforcement learning agent that learns utility estimates using temporal differences. The step-size function α(n) is chosen to ensure convergence.
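The TD(0) update in this figure is only a few lines of Python. In the sketch below the policy pi and discount gamma are supplied by the caller, and the step-size schedule alpha(n) = 60/(59 + n) is just one harmonic-style schedule that satisfies the usual convergence conditions; all names are illustrative.

def make_td_learner(pi, gamma, alpha=lambda n: 60 / (59 + n)):
    U, Ns = {}, {}
    prev = {'s': None}                        # previous state
    def step(s_next, r):
        U.setdefault(s_next, 0.0)
        s = prev['s']
        if s is not None:
            Ns[s] = Ns.get(s, 0) + 1
            # Temporal-difference update toward the sampled target r + gamma * U[s'].
            U[s] += alpha(Ns[s]) * (r + gamma * U[s_next] - U[s])
        prev['s'] = s_next
        return pi[s_next]
    return step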
function Q-LEARNING-AGENT(percept) returns an action
   inputs: percept, a percept indicating the current state s′ and reward signal r
   persistent: Q, a table of action values indexed by state and action, initially zero
      Nsa, a table of frequencies for state–action pairs, initially zero
      s, a, the previous state and action, initially null

   if s is not null then
      increment Nsa[s, a]
      Q[s, a] ← Q[s, a] + α(Nsa[s, a]) (r + γ max_{a′} Q[s′, a′] − Q[s, a])
   s, a ← s′, argmax_{a′} f(Q[s′, a′], Nsa[s′, a′])
   return a
Figure 22.8 An exploratory Q-learning agent. It is an active learner that learns the value Q(s, a) of each action in each situation. It uses the same exploration function f as the exploratory ADP agent, but avoids having to learn the transition model.
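A matching Python sketch of the exploratory Q-learning agent is given below. The exploration function f returns an optimistic value R_plus for state–action pairs tried fewer than N_e times; R_plus, N_e, and the step-size schedule are illustrative parameters, not taken from the figure.

def make_q_learner(actions, gamma, alpha=lambda n: 60 / (59 + n),
                   R_plus=1.0, N_e=5):
    Q, Nsa = {}, {}
    prev = {'s': None, 'a': None}

    def f(q, n):                                   # exploration function
        return R_plus if n < N_e else q

    def step(s_next, r):
        s, a = prev['s'], prev['a']
        if s is not None:
            Nsa[(s, a)] = Nsa.get((s, a), 0) + 1
            best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions(s_next))
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha(Nsa[(s, a)]) * (
                r + gamma * best_next - Q.get((s, a), 0.0))
        # Pick the next action by exploration-adjusted value.
        a_next = max(actions(s_next),
                     key=lambda a2: f(Q.get((s_next, a2), 0.0), Nsa.get((s_next, a2), 0)))
        prev['s'], prev['a'] = s_next, a_next
        return a_next
    return step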

CHAPTER 23
NATURAL LANGUAGE PROCESSING
function CYK-PARSE(words, grammar) returns a table of parse trees
   inputs: words, a list of words
      grammar, a structure with LEXICALRULES and GRAMMARRULES
   T ← a table   // T[X, i, k] is most probable X tree spanning words_{i:k}
   P ← a table, initially all 0   // P[X, i, k] is probability of tree T[X, i, k]
   // Insert lexical categories for each word.
   for i = 1 to LEN(words) do
      for each (X, p) in grammar.LEXICALRULES(words_i) do
         P[X, i, i] ← p
         T[X, i, i] ← TREE(X, words_i)
   // Construct X_{i:k} from Y_{i:j} + Z_{j+1:k}, shortest spans first.
   for each (i, j, k) in SUBSPANS(LEN(words)) do
      for each (X, Y, Z, p) in grammar.GRAMMARRULES do
         PYZ ← P[Y, i, j] × P[Z, j+1, k] × p
         if PYZ > P[X, i, k] then
            P[X, i, k] ← PYZ
            T[X, i, k] ← TREE(X, T[Y, i, j], T[Z, j+1, k])
   return T

function SUBSPANS(N) yields (i, j, k) tuples
   for length = 2 to N do
      for i = 1 to N + 1 − length do
         k ← i + length − 1
         for j = i to k − 1 do
            yield (i, j, k)
Figure 23.5 The CYK algorithm for parsing. Given a sequence of words, it finds the most probable parse tree for the sequence and its subsequences. The table P[X, i, k] gives the probability of the most probable tree of category X spanning words_{i:k}. The output table T[X, i, k] contains the most probable tree of category X spanning positions i to k inclusive. The function SUBSPANS returns all tuples (i, j, k) covering a span of words_{i:k}, with i ≤ j < k, listing the tuples by increasing length of the i:k span, so that when we go to combine two shorter spans into a longer one, the shorter spans are already in the table. LEXICALRULES(word) returns a collection of (X, p) pairs, one for each rule of the form X → word [p], and GRAMMARRULES gives (X, Y, Z, p) tuples, one for each grammar rule of the form X → Y Z [p].
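The figure translates almost line for line into Python. In the sketch below, lexical_rules(word) returns (category, probability) pairs and grammar_rules is a list of (X, Y, Z, probability) tuples for rules X → Y Z; the 1-based indexing follows the pseudocode, and a missing table entry stands for probability 0. These representation choices are illustrative.

def subspans(n):
    for length in range(2, n + 1):
        for i in range(1, n + 2 - length):
            k = i + length - 1
            for j in range(i, k):
                yield i, j, k

def cyk_parse(words, lexical_rules, grammar_rules):
    words = [None] + list(words)               # 1-based indexing, as in the figure
    n = len(words) - 1
    P, T = {}, {}
    for i in range(1, n + 1):                  # insert lexical categories
        for X, p in lexical_rules(words[i]):
            P[(X, i, i)] = p
            T[(X, i, i)] = (X, words[i])
    for i, j, k in subspans(n):                # combine shorter spans, shortest first
        for X, Y, Z, p in grammar_rules:
            pyz = P.get((Y, i, j), 0.0) * P.get((Z, j + 1, k), 0.0) * p
            if pyz > P.get((X, i, k), 0.0):
                P[(X, i, k)] = pyz
                T[(X, i, k)] = (X, T[(Y, i, j)], T[(Z, j + 1, k)])
    return T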
[ [S [NP-2 Her eyes]
     [VP were
         [VP glazed [NP *-2] .]
         [SBAR-ADV as if
             [S [NP she]
                [VP did n’t
                    [VP [VP hear [NP *-1]]
                        or
                        [VP [ADVP even] see [NP *-1]]
                        [NP-1 him]]]]]]]]

Figure 23.8 Annotated tree for the sentence “Her eyes were glazed as if she didn’t hear or even see him.” from the Penn Treebank. Note a grammatical phenomenon we have not covered yet: the movement of a phrase from one part of the tree to another. This tree analyzes the phrase “hear or even see him” as consisting of two constituent VPs, [VP hear [NP *-1]] and [VP [ADVP even] see [NP *-1]], both of which have a missing object, denoted *-1, which refers to the NP labeled elsewhere in the tree as [NP-1 him]. Similarly, the [NP *-2] refers to the [NP-2 Her eyes].

CHAPTER 24
DEEP LEARNING FOR NATURAL LANGUAGE PROCESSING

It is a truth universally acknowledged that the earth is not the center of the universe. There are those who assert there is. I do not accept them, but others I consider to be of the same opinion. The truth is, however, that if there are other than the center, and if there are any other living things in the universe and if they are not human, then we do not yet have our answers. We have to go on. This page gives a simplified, simplified answer to the problem. We don’t have all the answers. The truth is, however, that the truth is out there.

When Gregor Samsa woke up one morning, he did not notice anything strange. “When my wife is looking at me, I feel like she is looking at a piece of art,” he said. “I think she is admiring something I have created.” The idea is that by looking at your own life, you learn something important and become a better person. It is a theory that emerged from psychologist Daniel Goleman’s work, in which he asked “How do you know you’re not a loser?”

Alice was beginning to get very tired of sitting with her sister on the bank. She sat up, yawned, and said, with a loud little scream, “I hope you don’t mind if I keep on doing what I should like to do, and if someone asks me which of us will do more, don’t tell them that I won’t do much, my dear sister.”

All happy families are alike; each happy family is like a garden of paradise. The only difference between happy families and unhappy families, is that the unhappy family doesn’t have any flowers or trees.

Tell me a story. Tell me a story. Tell me a story. Tell me a story. Tell me a story. Tell me a story. Tell me a story. Tell me a story. Tell me a story. Tell me a story. Tell me a story. Tell me a story. Please fill out the following details. Thank you... Thank you for your interest in this interview. Please wait...

Figure 24.13 Example completion texts generated by the GPT-2 language model, given the prompts in bold. Most of the texts are quite fluent English, at least locally. The final example demonstrates that sometimes the model just breaks down.

CHAPTER 25
COMPUTER VISION

CHAPTER 26
ROBOTICS

function MONTE-CARLO-LOCALIZATION(a, z, N, P(X′|X, v, ω), P(z|z∗), map)
      returns a set of samples, S, for the next time step
   inputs: a, robot velocities v and ω
      z, a vector of M range scan data points
      P(X′|X, v, ω), motion model
      P(z|z∗), a range sensor noise model
      map, a 2D map of the environment
   persistent: S, a vector of N samples
   local variables: W, a vector of N weights
      S′, a temporary vector of N samples

   if S is empty then   // initialization phase
      for i = 1 to N do
         S[i] ← sample from P(X0)
   for i = 1 to N do   // update cycle
      S′[i] ← sample from P(X′ | X = S[i], v, ω)
      W[i] ← 1
      for j = 1 to M do
         z∗ ← RAYCAST(j, X = S′[i], map)
         W[i] ← W[i] · P(zj | z∗)
   S ← WEIGHTED-SAMPLE-WITH-REPLACEMENT(N, S′, W)
   return S

Figure 26.6 A Monte Carlo localization algorithm using a range-scan sensor model with independent noise.

CHAPTER 27
PHILOSOPHY, ETHICS, AND SAFETY OF AI

CHAPTER 28
THE FUTURE OF AI

CHAPTER 29
MATHEMATICAL BACKGROUND

CHAPTER 30
NOTES ON LANGUAGES AND ALGORITHMS
