Algorithm in C

Tags: agentic-ai, algorithms, learning, workspace
cstic wrote (#1):
An agent moves through a space composed of several corridors connecting three T-shaped intersections, labeled S1, S2, and S3. Intersections S1 and S2 have corridors to the east, south, and west; intersection S3 has corridors to the east, north, and west. At each intersection the agent has 2 possible actions: turn left or turn right. Actions are deterministic. If the agent reaches a wall (dead end), it is instantly teleported to intersection S3, arriving from the south. Thus, the agent only makes a choice when it reaches one of the 3 intersections, facing the wall of the T. In terms of a reward-based learning algorithm, the 3 choice points are the 3 states of the environment. The agent's state-transition matrix is given in Table 1, and the immediate rewards are given in Table 2. The immediate reward can be interpreted as a negative reward for every meter of corridor travelled up to the next choice point, plus a reward on teleportation (R and P). Write a program that:

- determines the optimal state-action values (Q-learning) and displays them;
- extracts the optimal policy from those state-action values and displays it as:

  State | Action selected according to the optimal policy
  S1    |
  S2    |
  S3    |
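Since Tables 1 and 2 are not reproduced in the post, here is a minimal C sketch of the requested program, under stated assumptions: the transition table (next), the reward table (r), the learning parameters, and the start state are all placeholders, not the assignment's data, and must be replaced with the real values. The sketch runs tabular Q-learning with uniform random exploration and then prints the learned Q-values together with the greedy (optimal) policy.

    /* Tabular Q-learning for the 3-state corridor problem.
     * NOTE: Tables 1 and 2 are not given in the post, so next[][] and
     * r[][] below are placeholder assumptions, not the assignment's data. */
    #include <stdio.h>
    #include <stdlib.h>

    #define NSTATES  3      /* S1, S2, S3 */
    #define NACTIONS 2      /* 0 = left, 1 = right */
    #define STEPS    10000
    #define ALPHA    0.1    /* learning rate */
    #define GAMMA    0.9    /* discount factor */

    /* Deterministic successor state for each (state, action) -- assumed. */
    static const int next[NSTATES][NACTIONS] = {
        {1, 2},  /* S1: left -> S2, right -> S3 */
        {2, 0},  /* S2: left -> S3, right -> S1 */
        {0, 1},  /* S3: left -> S1, right -> S2 */
    };

    /* Immediate reward: negative per metre of corridor, plus the
     * teleportation reward/penalty (R and P) -- assumed values. */
    static const double r[NSTATES][NACTIONS] = {
        {-2.0, -4.0},
        {-3.0, -1.0},
        {-2.0, 10.0},
    };

    int main(void)
    {
        double Q[NSTATES][NACTIONS] = {{0.0}};
        int s = 2;  /* start at S3, where teleportation drops the agent */

        srand(42);
        for (int t = 0; t < STEPS; ++t) {
            int a  = rand() % NACTIONS;   /* explore uniformly at random */
            int s2 = next[s][a];
            double best = Q[s2][0] > Q[s2][1] ? Q[s2][0] : Q[s2][1];
            /* Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)) */
            Q[s][a] += ALPHA * (r[s][a] + GAMMA * best - Q[s][a]);
            s = s2;
        }

        puts("State | Q(left)  Q(right) | action per optimal policy");
        for (int i = 0; i < NSTATES; ++i) {
            printf("  S%d  | %7.3f  %7.3f | %s\n", i + 1,
                   Q[i][0], Q[i][1], Q[i][0] >= Q[i][1] ? "left" : "right");
        }
        return 0;
    }

With the real Tables 1 and 2 plugged in, the printed greedy action per state is exactly the "State | Action selected according to the optimal policy" output the assignment asks for.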

Garth J Lancaster wrote (#2):

So what's your problem? You're not expecting us to write your code for you, are you? You could start by reading http://www-anw.cs.umass.edu/rlr/domains.html, for example; you might get some hints from some of those examples. 'g'
