Q_Learning to determine best path on the goal state

    longmen2022
    wrote:

    Hi everyone, I am using Q-learning in Python to determine the best action for each square of a 4x4 board. The parameters are living_reward = -0.1, goal_reward = 100, forbidden_reward = -100, discount rate gamma = 0.1, learning rate alpha = 0.5, greedy probability epsilon = 0.3, and max iterations = 10000. When two actions tie for the maximum Q-value (for example, up and right), the final policy is printed using a clockwise priority (up, right, down, left). I feed the input below to my code

    12 7 5 6 p

    and it prints out the following output:

    1 right
    2 right
    3 up
    4 left
    5 forbid
    6 wall-square
    7 goal
    8 left
    9 up
    10 left
    11 left
    12 goal
    13 right
    14 left
    15 left
    16 left

    However, I expect it to print this output:

    1 right
    2 right
    3 up
    4 up
    5 forbid
    6 wall-square
    7 goal
    8 up
    9 up
    10 up
    11 up
    12 goal
    13 up
    14 up
    15 up
    16 up

    It goes wrong at indices 4, 8, 10, 11, 13, 14, 15, and 16. For the second input

    15 12 8 6 p

    it prints out this

    1 up
    2 right
    3 up
    4 left
    5 up
    6 wall-square
    7 up
    8 forbid
    9 up
    10 up
    11 up
    12 goal
    13 right
    14 right
    15 goal
    16 left

    while I am looking for this output

    1 up
    2 right
    3 up
    4 left
    5 up
    6 wall-square
    7 up
    8 forbid
    9 up
    10 up
    11 up
    12 goal
    13 right
    14 right
    15 goal
    16 up

    It goes wrong only at index 16: instead of going up, it goes left. Could anyone advise what is wrong with my code? My code is below. Any advice would be much appreciated! Thanks

    import random
    import numpy as np
    import enum

    EACH_STEP_REWARD = -0.1
    GOAL_SQUARE_REWARD = 100
    FORBIDDEN_SQUARE_REWARD = -100
    DISCOUNT_RATE_GAMMA = 0.1 # Discount Rate
    LEARNING_RATE_ALPHA = 0.3 # Learning Rate
    GREEDY_PROBABILITY_EPSILON = 0.5 # Greedy Probability
    ITERATION_MAX_NUM = 10000 # Will be 10,000
    START_LABEL = 2
    LEVEL = 4

    class Direction(enum.Enum):
        up = 1
        right = 2
        down = 3
        left = 0
        exit = 4

    class Node:
        def __init__(self, title, next, Goal=False, Forbidden=False, Wall=False, qValues=None, actions=None):
            self.title = title
            self.next = next
            self.qValues = [qValues] * 5
            self.move = [actions] * 5
            self.goal = Goal
            self.forbidden = Forbidden
            self.wall = Wall

        def max_Q_value(self):
            if self.wall:
                return False
            max_q = []
            for q in self.qValues:
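
For reference, here is a minimal sketch of the two pieces the question hinges on: the tabular Q-learning update and the clockwise tie-break used when printing the policy. It is not the code from the post. The names `q_update`, `best_action`, `PRIORITY`, and the tolerance `tol` are assumptions introduced for illustration, and treating Q-values within `tol` of the maximum as tied is only one possible reading of "similar max q-value".

```python
import math

# Clockwise priority used when printing the final policy (from the post).
PRIORITY = ("up", "right", "down", "left")


def q_update(q_old, reward, max_q_next, alpha=0.5, gamma=0.1):
    """One tabular Q-learning update.

    alpha and gamma default to the values stated in the post's text.
    max_q_next is the largest Q-value of the successor square
    (0 if the successor is terminal).
    """
    sample = reward + gamma * max_q_next
    return (1 - alpha) * q_old + alpha * sample


def best_action(q_values, tol=1e-9):
    """Return the action with the highest Q-value, breaking ties clockwise.

    q_values maps each action name in PRIORITY to its Q-value.
    tol is a hypothetical tolerance: any value within tol of the
    maximum counts as tied, and the first tied action in PRIORITY wins.
    """
    best = max(q_values[a] for a in PRIORITY)
    for action in PRIORITY:
        if math.isclose(q_values[action], best, rel_tol=0.0, abs_tol=tol):
            return action


# Example: "up" and "right" tie, so the clockwise priority picks "up".
print(best_action({"up": 1.0, "right": 1.0, "down": 0.0, "left": 0.5}))
```

With gamma = 0.1, the goal's contribution to a Q-value shrinks by a factor of ten per step, so the action values of squares far from a goal can lie very close together. In that regime, whether near-equal values are treated as a tie may change which direction is printed.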
    