Algorithm in C

Tags: agentic-ai, algorithms, learning, workspace
cstic wrote (#1):
An agent moves through a space composed of several corridors connecting three T-shaped intersections, labeled S1, S2, and S3. Intersections S1 and S2 have corridors to the east, south, and west; intersection S3 has corridors to the east, north, and west. At each intersection the agent has 2 possible actions: turn left or turn right. Actions are deterministic. If the agent reaches a wall (dead end), it is instantly teleported to intersection S3, arriving from the south. Thus, the agent only makes a choice when it reaches one of the 3 intersections, facing the wall of the T. In terms of a reward-based learning algorithm, the 3 choice points are the 3 states of the environment. The agent's state-transition matrix is given in Table 1, and the immediate rewards are given in Table 2. The immediate reward can be interpreted as a negative reward for every meter of corridor travelled up to the next choice point, plus a reward on teleportation (R and P). Write a program that:

- determines the optimal state-action values (Q-learning) and displays them;
- extracts the optimal policy from those state-action values and displays it as:

  State | Action selected according to the optimal policy
  S1    |
  S2    |
  S3    |
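Since Tables 1 and 2 are not reproduced in the post, here is a minimal C sketch of the requested program, under stated assumptions: the transition table (next), the reward table (r), the learning parameters, and the start state are all placeholders, not the assignment's data, and must be replaced with the real values. The sketch runs tabular Q-learning with uniform random exploration and then prints the learned Q-values together with the greedy (optimal) policy.

    /* Tabular Q-learning for the 3-state corridor problem.
     * NOTE: Tables 1 and 2 are not given in the post, so next[][] and
     * r[][] below are placeholder assumptions, not the assignment's data. */
    #include <stdio.h>
    #include <stdlib.h>

    #define NSTATES  3      /* S1, S2, S3 */
    #define NACTIONS 2      /* 0 = left, 1 = right */
    #define STEPS    10000
    #define ALPHA    0.1    /* learning rate */
    #define GAMMA    0.9    /* discount factor */

    /* Deterministic successor state for each (state, action) -- assumed. */
    static const int next[NSTATES][NACTIONS] = {
        {1, 2},  /* S1: left -> S2, right -> S3 */
        {2, 0},  /* S2: left -> S3, right -> S1 */
        {0, 1},  /* S3: left -> S1, right -> S2 */
    };

    /* Immediate reward: negative per metre of corridor, plus the
     * teleportation reward/penalty (R and P) -- assumed values. */
    static const double r[NSTATES][NACTIONS] = {
        {-2.0, -4.0},
        {-3.0, -1.0},
        {-2.0, 10.0},
    };

    int main(void)
    {
        double Q[NSTATES][NACTIONS] = {{0.0}};
        int s = 2;  /* start at S3, where teleportation drops the agent */

        srand(42);
        for (int t = 0; t < STEPS; ++t) {
            int a  = rand() % NACTIONS;   /* explore uniformly at random */
            int s2 = next[s][a];
            double best = Q[s2][0] > Q[s2][1] ? Q[s2][0] : Q[s2][1];
            /* Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)) */
            Q[s][a] += ALPHA * (r[s][a] + GAMMA * best - Q[s][a]);
            s = s2;
        }

        puts("State | Q(left)  Q(right) | action per optimal policy");
        for (int i = 0; i < NSTATES; ++i) {
            printf("  S%d  | %7.3f  %7.3f | %s\n", i + 1,
                   Q[i][0], Q[i][1], Q[i][0] >= Q[i][1] ? "left" : "right");
        }
        return 0;
    }

With the real Tables 1 and 2 plugged in, the printed greedy action per state is exactly the "State | Action selected according to the optimal policy" output the assignment asks for.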

Garth J Lancaster wrote (#2):

So what's your problem? You're not expecting us to write your code for you, are you? You could start by reading http://www-anw.cs.umass.edu/rlr/domains.html, for example; you might get some hints from some of those examples. 'g'
