CSCI-561 – Spring 2018 – Foundations of Artificial Intelligence homework

Problem description

GridWorld is a 2D rectangular grid of size (Nrows, Ncolumns) with an agent starting off at one grid cell, moving from cell to cell through the grid, and eventually exiting after collecting a reward. This grid environment is described as follows:

State space: GridWorld has Nrows × Ncolumns distinct states. We use s to denote a state. The agent starts in the bottom-left cell (row 1, column 1, marked as a green cell). There exist one or more terminal states (blue cells) that can be located anywhere in the grid (except the bottom-left cell). There may also be walls (red cells) that the agent cannot move to. An instance of this grid is shown in the figure below.

Actions: At every non-terminal state, the agent can either walk or run in any of the four directions (Up, Down, Left, and Right), which results in 8 possible actions: Walk Up, Walk Down, Walk Left, Walk Right, Run Up, Run Down, Run Left, Run Right. At a terminal state, the only possible action is Exit. We use A(s) to denote the set of all possible actions at state s.
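For concreteness, the action set A(s) could be represented as in the sketch below; the tuple encoding and all names are illustrative choices, not requirements of the assignment:

# One possible encoding of states and A(s); names are illustrative only.
WALK_ACTIONS = ["Walk Up", "Walk Down", "Walk Left", "Walk Right"]
RUN_ACTIONS = ["Run Up", "Run Down", "Run Left", "Run Right"]
ALL_MOVES = WALK_ACTIONS + RUN_ACTIONS  # matches the tie-breaking order given later

def possible_actions(state, terminals, walls):
    # state is a (row, column) tuple; terminals and walls are sets of such tuples
    if state in terminals:
        return ["Exit"]
    if state in walls:
        return []  # wall cells are never occupied, so A(s) is empty
    return list(ALL_MOVES)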
Transition model: GridWorld is stochastic because the actions can be unreliable. In this environment, action Walk X (X can be Up, Down, Left, or Right) moves the agent one cell in the X direction with probability pwalk, but with probabilities 0.5(1 - pwalk) and 0.5(1 - pwalk) it moves the agent one cell at angles of +90 and -90 degrees to the direction X, respectively. Furthermore, action Run X moves the agent two cells in the X direction with probability prun, but with probabilities 0.5(1 - prun) and 0.5(1 - prun) it moves the agent two cells at angles of +90 and -90 degrees to the direction X, respectively.
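The following Python sketch implements this transition model, assuming states are (row, column) tuples with row 1 at the bottom, and assuming a run also fails when the intermediate cell is a wall (the examples below are consistent with this reading). The names successors, DELTAS, and SIDEWAYS are illustrative:

# Offsets for each direction, as (row, column) deltas with row 1 at the bottom.
DELTAS = {"Up": (1, 0), "Down": (-1, 0), "Left": (0, -1), "Right": (0, 1)}
# The two unintended 90-degree outcomes for each intended direction.
SIDEWAYS = {"Up": ("Left", "Right"), "Down": ("Right", "Left"),
            "Left": ("Down", "Up"), "Right": ("Up", "Down")}

def successors(state, action, nrows, ncols, walls, p_walk, p_run):
    # Return {next_state: probability} for a Walk/Run action taken at state.
    kind, direction = action.split()  # e.g. "Run Up" -> ("Run", "Up")
    p_ok = p_walk if kind == "Walk" else p_run
    steps = 1 if kind == "Walk" else 2
    outcomes = [(direction, p_ok),
                (SIDEWAYS[direction][0], 0.5 * (1.0 - p_ok)),
                (SIDEWAYS[direction][1], 0.5 * (1.0 - p_ok))]
    dist = {}
    for d, p in outcomes:
        dr, dc = DELTAS[d]
        nxt = state
        # Advance one cell at a time; bumping into a wall or the grid edge
        # fails the whole move (assumption: a run cannot jump over a wall).
        for i in range(1, steps + 1):
            r, c = state[0] + i * dr, state[1] + i * dc
            if not (1 <= r <= nrows and 1 <= c <= ncols) or (r, c) in walls:
                nxt = state
                break
            nxt = (r, c)
        dist[nxt] = dist.get(nxt, 0.0) + p
    return dist

Applied to the Walk Up at (4,4) example below, this returns {(5,4): pwalk, (4,5): 0.5(1 - pwalk), (4,3): 0.5(1 - pwalk)}, matching the stated dynamics.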
If moving in a particular direction causes the agent to bump into a wall, the movement fails, and the agent stays in the same cell (i, j). We write P(s′ | s, a) to denote the probability of reaching state s′ if action a is done in state s. The following examples illustrate the environment dynamics:

Assume that the agent chooses action Walk Up at (4,4) as shown in the figure below. This action moves the agent to (5,4) with probability pwalk, but with probability 0.5(1 - pwalk) it moves the agent right to (4,5), and with probability 0.5(1 - pwalk) it moves the agent left to (4,3).

Assume that the agent chooses action Run Up at (4,4) as shown in the figure below. This action moves the agent two cells up, but because it causes the agent to bump into a wall, the agent stays at (4,4) with probability prun. With probability 0.5(1 - prun), the agent moves two cells right to (4,6). Finally, this action moves the agent two cells left with probability 0.5(1 - prun), but because of the wall at (4,2), the agent stays at (4,4). Hence, as a result of this action, the agent moves to (4,6) with probability 0.5(1 - prun) and stays at (4,4) with probability prun + 0.5(1 - prun).
Assume that the agent chooses action Walk Right at (4,1) as shown in the figure below. Then, the agent moves to (5,1) and (3,1), each with probability 0.5(1 - pwalk), and stays at (4,1) with probability pwalk.
Assume that the agent chooses action Run Right at (4,1) as shown in the figure below. Then, the agent moves to (2,1) with probability 0.5(1 - prun) and stays at (4,1) with probability prun + 0.5(1 - prun).

Rewards: When the agent takes action a in state s, it receives a reward, R(s, a). For all non-terminal states s: R(s, Walk X) = Rwalk (a constant) and R(s, Run X) = Rrun (a constant). Furthermore, if there are K terminal states, the reward in the k-th terminal state s is R(s, Exit) = Rterminal(k) (a constant). The agent prefers receiving its reward as soon as possible. Therefore, it applies a discount factor, γ, to future rewards. For example, if it runs once and then exits from the k-th terminal state, it will receive a total reward of Rrun + γ Rterminal(k). If it instead walks twice and then exits from the k-th terminal state, it will receive a total reward of Rwalk + γ Rwalk + γ^2 Rterminal(k).
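As a quick numeric check of the walk-walk-exit example, with made-up placeholder values for γ, Rwalk, and Rterminal(k):

# Numeric check of the discounted-return example above; all values are
# made-up placeholders (in practice they are read from the input file).
gamma = 0.9
R_walk = -1.0
R_terminal = 10.0
# Walk, walk, then exit: each later reward is discounted one more step.
total = R_walk + gamma * R_walk + gamma ** 2 * R_terminal
print(total)  # -1.0 + 0.9 * -1.0 + 0.81 * 10.0 = 6.2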
Your Task
In this assignment, you need to find the action that the agent should take at each state to maximize its expected reward. You can use any algorithm for this homework; one possibility is sketched below.
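The sketch below uses value iteration over preassembled dictionaries (which builders such as the successors() sketch above could fill in); value_iteration and its parameter names are illustrative, and this is only one acceptable implementation:

def value_iteration(states, actions_of, trans, reward, gamma, tol=1e-9):
    # states     : iterable of all non-wall states, as (row, column) tuples
    # actions_of : dict state -> action list, in the tie-breaking preference order
    # trans      : dict (state, action) -> {next_state: probability};
    #              Exit actions map to an empty dict because the episode ends
    # reward     : dict (state, action) -> immediate reward R(s, a)
    states = list(states)
    V = dict((s, 0.0) for s in states)

    def q_value(s, a):
        return reward[(s, a)] + gamma * sum(
            p * V[s2] for s2, p in trans[(s, a)].items())

    while True:  # converges for gamma < 1; otherwise cap the iteration count
        delta = 0.0
        for s in states:
            best = max(q_value(s, a) for a in actions_of[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break

    # Extract the policy: scanning actions in preference order and keeping the
    # first strict maximum implements the tie-breaking rule in the notes below.
    policy = {}
    for s in states:
        best_a, best_q = None, None
        for a in actions_of[s]:
            q = q_value(s, a)
            if best_q is None or q > best_q:
                best_a, best_q = a, q
        policy[s] = best_a
    return V, policy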
Implementation Notes
- Ties between the possible actions are broken based on the following preference order: Walk Up > Walk Down > Walk Left > Walk Right > Run Up > Run Down > Run Left > Run Right.
- The grid will contain at most 1,000,000 cells.
- As the wall states are never reached, it does not matter what the optimal actions are at these states. However, for consistency, you should print None as the optimal action for a wall state.
File Formats

Input format: input.txt contains the following lines, in order (a parsing sketch follows the list):
- The first line includes two positive integers separated by a comma, where the first one is the number of rows (Nrows) of the grid, and the second one is the number of columns (Ncolumns) of the grid.
- The next line is a number (greater than or equal to zero) that specifies the number of wall cells in the grid.
- Each of the following lines (one per wall) includes 2 values separated by a comma, indicating the row and column numbers of the grid where a wall is located.
- The next line is a number (greater than or equal to one) that specifies the number of terminal states in the grid.
- Each of the following lines (one per terminal state) includes 3 values separated by a comma. The first two values in the k-th line indicate the row and column numbers of the grid where the k-th terminal state is located, and the third value (a float) denotes Rterminal(k).
- The next line contains two probability values separated by a comma, where the first value is pwalk and the second value is prun.
- The next line contains two floating-point numbers separated by a comma, where the first value is Rwalk and the second value is Rrun.
- The last line contains the discount factor γ, which is a number in the interval [0,1].
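A sketch of one way to read this file is shown below; parse_input and the local variable names are illustrative, not part of the specification:

def parse_input(path="input.txt"):
    # Parse input.txt in the order described above; returns plain Python values.
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    nrows, ncols = [int(x) for x in lines[0].split(",")]
    n_walls = int(lines[1])
    walls = set()
    for line in lines[2:2 + n_walls]:
        r, c = [int(x) for x in line.split(",")]
        walls.add((r, c))
    i = 2 + n_walls
    n_terminals = int(lines[i])
    terminals = {}  # (row, column) -> Rterminal(k)
    for line in lines[i + 1:i + 1 + n_terminals]:
        r, c, reward = line.split(",")
        terminals[(int(r), int(c))] = float(reward)
    i += 1 + n_terminals
    p_walk, p_run = [float(x) for x in lines[i].split(",")]
    r_walk, r_run = [float(x) for x in lines[i + 1].split(",")]
    gamma = float(lines[i + 2])
    return nrows, ncols, walls, terminals, p_walk, p_run, r_walk, r_run, gamma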
Output format:
- You must use Python 2.7 to implement your homework assignment. You are allowed to use standard libraries only. You have to implement any other functions or methods by yourself.
- You need to create a file named hw3cs561s2018.py. When you submit the homework on labs.vocareum.com, the following command will be executed: python hw3cs561s2018.py
- Your code must create a file named output.txt and print its output there. For each test case, the grading script will put an input.txt file in your work folder, run your program (which reads input.txt), and check the output.txt file generated by your code. The grading script will replace the files automatically, so you do NOT need to do anything for that part.
- Homework should be submitted through Vocareum. Homework submitted through email, in person during office hours, or through any channel other than Vocareum will not be accepted. Please only upload your code to the /work directory. Don't create any subfolder or upload any other files. Please refer to http://help.vocareum.com/article/30-getting-started-students to get started with Vocareum.
- Your program should handle all test cases within a reasonable time. A maximum runtime of 3 minutes per test case will be set on Vocareum.
- You are strongly recommended to submit at least two hours ahead of the deadline, as the submission system may be slow around the deadline and possibly cause a late submission. Late submissions will not be graded.