hw3s

CS 540 - Introduction To Artificial Intelligence Spring(2000)

Homework #3 - Search

Problem 1 - Basic Search Strategies

Note: this is a paper-and-pencil question.

Assume you have the following search space.

State Name Next State Name Cost

S A 15

S B 12

S C 3

A F 10

B D 8

D G 4

C D 10

C E 6

E D 15

E G 14

F G 7

Part A

Draw this search space as a directed graph (ie, states are nodes and actions are directed arcs, labeled with their costs).

Part B

For each of the following, during each cycle of the search algorithm show the OPEN and CLOSED lists and report which node is expanded. Also report the solution path found.

Breadth-First Search
Depth-First Search
Uniform-Cost Search
Interative Deepening

The START STATE is S and the GOAL STATE is G. When multiple nodes tie for their position in OPEN, they should appear in alphabetical order front-to-back (ie, left-to-right). Eg, after the first cycle of breadth-first search, OPEN should equal {A, B, C}.

Part C

Again using the same START and GOAL states of Part B, along with the h function below, report what each of the following search strategies do.

   A* Search
   Hill Climbing (using f = h as its scoring function)
   Beam Search (with a beam width of 2 and f = h as its scoring function)

The Estimated
Cost to a Goal

State Name h

S 23

A 15

B 11

C 12

D 3

E 18

F 6

G 0

General Pseudocode for Searching

The following is the basic outline for the various search algorithms (some steps are modified depending on the specifics of the search, e.g. if doing hill climbing, iterative deepening, or beam search).

OPEN = { startNode } // Nodes under consideration.
CLOSED = { } // Nodes we're done with.

while OPEN is not empty
{

remove an item from OPEN based on search strategy used
- call it X

if goalState?(X) return the solution found

    otherwise // Expand node X.
    {
      1) add X to CLOSED
      2) generate the immediate neighbors (ie, children of X)
      3) eliminate those children already in OPEN or CLOSED
      4) add REMAINING children to OPEN
    }

}
return FAILURE // Failed if OPEN exhausted without a goal being found.

Problem 2 - Searching an Intranet

When you surf the web, you as a user occasionally click on some interesting hyperlinks (like this one) in a web page hoping to find another interesting page. After you follow the link and get to this new page, you read it for a while, click on links on this page, and travel to other pages, and on and on.

In this problem, you will implement in Java a system that uses various search algorithms to find information in a pseudo World-Wide Web. The states in this problem are web pages, and the actions are to follow a hyperlink.

Some code that describes the main method you must use and provides some hints appears in the file WebSearch.java

You can see some sample intranets in this directory.

Some sample statistics will be provided here later.

It's simplest if you copy the entire intranets directory (and its children subdirectories) to your own file system (see the directories below); if you're tight on filespace (you'll need about 1.1 MB if you copy all nine intranets) and are using Unix, you can make symbolic links to the directories as follows:

     mkdir ~/private/intranets
     ln -s /p/course/cs540-shavlik/public/intranets/intranet1 ~/private/intranets
     ln -s /p/course/cs540-shavlik/public/intranets/intranet2 ~/private/intranets
     ...
     ln -s /p/course/cs540-shavlik/public/intranets/intranet9 ~/private/intranets

You need only to turn in results on intranets 3, 5, and 7, so only use those three intranets if filespace is a problem.

In each of these intranets, page1.html is intended to be the START node, and the GOAL node is the one that contains QUERY1 QUERY2 QUERY3 QUERY4.

For ease in parsing web pages, all our pages will have a very simple structure. Besides simple words (eg, w6, QUERY2), there will be hyperlinks with the following syntax:

<A HREF = pageN.html word word ... word </A

There won't be any punctuation, quote marks, etc. There will always be spaces around words, the equal sign, and the HTML markers. Words on these pages are either w# or QUERY#, where # is some integer. Thus, you can simply walk through the source text of pages using Java's StringTokenizer, as illustrated in the provided hints.

Notice that you can use your Web browser to manually search these intranets. This will give you a good feel for the task. Also, be aware that web browsers provide a menu option that allows you to view the source of the page being displayed. Feel free to make your own simpler intranets for debugging purposes.

Part A

Write code that performs breadth-first and depth-first search and iterative-deepening for this task. Your code should report the number of nodes expanded and the solution path (ie, the path from the start state to the goal state - it is fine to print this path 'backwards,' from the goal to the start).

Part B

The nine sample intranets have been randomly created, with the following propensities (from weakest to strongest):

the more QUERY words on a page, the more likely the links on that page lead to the goal node
the more QUERY words in the hypertext associated with a hyperlink, the more likely that hyperlink leads to the goal node
the more consecutive and in numerical order QUERY words there are in a hyperlink, the more likely that hyperlink leads to the goal node (eg, seeing QUERY1 QUERY2 QUERY3 is a very good indicator)

Use the above information to devise a heuristic function for use in best-first search. Describe your motivation for your heuristic. Notice that unlike the standard approach where the heuristic is applied to the next state, here we want to use our heuristic to decide which hyperlink to 'click on' (fetch) next. This means that your heuristic function scores each arc (hyperlink) coming out of the current node and not each child node. You're not required to write an admissible heuristic.

Now consider this other heuristic function:

If none of the links in the page have any QUERY words, then choose the link that is closest to the beginning of the page;
otherwise choose randomly from the links in the page that have QUERY words with probability equal to the number of QUERY words in the link divided by the number of QUERY words in all the links in the page.

Write code that performs best-first search for this task considering both of the above heuristics. Note that in the second heuristic, you will only be following the one link that was probabilistically selected from all the links in the page. Thus, this is equivalent to a best-first beam search with a beam width of 1.

Also, extend your code to handle beam search for the first heuristic, with beam-widths of 2 and 1. Compare your results using the first heuristic and a beam search with beam-width 1 to the second heuristic. As in Part A, your code should report the number of nodes expanded and the solution path.

What to Turn In

Run all the search strategies used in Problem 2 on intranets 3, 5, and 7, and turn in a printout of the results obtained (nodes visited and solution path found). DON'T TURN IN THE RESULTS OF RUNNING THE CODE WITH debugging = true.

Create a directory called HW3 in your ~/CS540-handin/ directory, and put your Java files there. Also turn in a printout of your commented code and a neatly written lab report that includes the material and answers requested above.

State Name	Next State Name	Cost
S	A	15
S	B	12
S	C	3
A	F	10
B	D	8
D	G	4
C	D	10
C	E	6
E	D	15
E	G	14
F	G	7