
Problem: DQN algorithm not converging on Snake

Summary

I'm using the DQN algorithm to play Snake. The input to the neural network is a stack of 4 images taken from the game at 80x80 resolution. The output is an array of 4 values, one for each direction. The problem is that the program does not converge, and I have a lot of doubts about the replay function, where I train the neural network on a batch of 32 events. Here is the code snippet:

```python
def replay(self, batch_size):
    minibatch = random.sample(self.memory, batch_size)
    for state, action, reward, next_state, done in minibatch:
        target = reward
        if not done:
            target = (reward + self.gamma *
                      np.amax(self.model.predict(next_state)[0]))
        target_f = self.model.predict(state)
        target_f[0][action] = target
        self.model.fit(state, target_f, epochs=1, verbose=0)
    if self.epsilon > self.epsilon_min:
        self.epsilon *= self.epsilon_decay
```

Rewards:

- +1 for eating an apple
- 0 for making a move without dying
- -1000 for hitting a wall or for the snake dying
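One practical concern with a loop like the one above is that it calls `model.predict` twice per transition and fits on single samples, which is slow and gives noisy single-sample gradient updates. A minimal sketch of a batched variant is shown below; the standalone `batched_replay` function and its signature are my own rearrangement (the original is a method on the agent class), assuming stored states are already shaped `(1, H, W, C)` and the model is Keras-style:

```python
import random
import numpy as np

def batched_replay(model, memory, batch_size, gamma):
    """One DQN training step over a sampled minibatch, with batched predicts."""
    minibatch = random.sample(memory, batch_size)
    states = np.vstack([m[0] for m in minibatch])       # (batch, H, W, C)
    next_states = np.vstack([m[3] for m in minibatch])

    # One forward pass per array instead of two model calls per transition
    q_current = model.predict(states)
    q_next = model.predict(next_states)

    for i, (_, action, reward, _, done) in enumerate(minibatch):
        target = reward
        if not done:
            target = reward + gamma * np.amax(q_next[i])
        q_current[i][action] = target  # only the taken action's target changes

    # Single gradient update on the whole batch
    model.fit(states, q_current, epochs=1, verbose=0)
```

This keeps the targets for the actions that were not taken equal to the network's own current predictions, so only the taken action contributes to the loss, just as in the original per-sample loop.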

Full text

Problem over DQN Algorithm not converging on snake

Asked 6 years, 6 months ago Modified today Viewed 1k times


1 vote

I'm using a DQN algorithm to play Snake. The input of the neural network is a stack of 4 images taken from the game at 80x80. The output is an array of 4 values, one for every direction. The problem is that the program does not converge, and I have a lot of doubts about the replay function, where I train the neural network over a batch of 32 events. That's the snippet:

```python
def replay(self, batch_size):
    minibatch = random.sample(self.memory, batch_size)
    for state, action, reward, next_state, done in minibatch:
        target = reward
        if not done:
            target = (reward + self.gamma *
                      np.amax(self.model.predict(next_state)[0]))
        target_f = self.model.predict(state)
        target_f[0][action] = target
        self.model.fit(state, target_f, epochs=1, verbose=0)
    if self.epsilon > self.epsilon_min:
        self.epsilon *= self.epsilon_decay
```

Targets are:

- +1 for eating an apple
- 0 for doing a movement without dying
- -1000 for hitting a wall or the snake hitting itself

Tags: reinforcement-learning, python, tensorflow, q-learning, dqn

Asked Jun 22, 2019 at 17:43 by Roberto Aureli; edited Jun 22, 2019 at 19:04 by skillsmuggler

Comments:

- olinarr (Jun 22, 2019 at 18:10): Hi! Questions about implementation are off-topic here; you may want to try other SE sites. Anyway, isn't -1000 a bit too much?
- Roberto Aureli (Jun 22, 2019 at 18:15): Sorry. I was trying to convince the snake not to die, but I can't get around the Q approximation with the neural network.
- greedsin (Sep 10, 2021 at 19:31): @RobertoAureli Did you make some progress? I have the same problem that the snake is not converging.
- Roberto Aureli (Sep 10, 2021 at 20:31): @greedsin Unfortunately not, I stopped the project.
- tnfru (Sep 10, 2021 at 22:06): @greedsin -1000 is really bad scaling for the loss; it will explode the gradient and not lead to good results. You should give something like +1/10 for eating and -1 for dying. Also, deep Q-learning is notoriously hard on pixels; you might want to look into Data-regularized Q-Learning.

1 Answer

0 votes, answered Jun 22, 2019 at 18:56 by skillsmuggler:

I think the main issue here is that you are trying to train the snake (network) on images. This will create a lot of issues, as there are no set parameters that the model can learn from. From images, there is no logical way to define the boundary, directions, and objects on the board. It would be much easier to write a simple computer-vision script or game API to provide actual meaningful inputs to the model. Here is a great article on building a model to play the snake game; the author also provides the game API for input along with example code to train the snake game.

Comments:

- Roberto Aureli (Jun 22, 2019 at 19:04): Thanks for your answer. The problem is that I need to do it that way for academic purposes. I'm taking inspiration from cs.toronto.edu/~vmnih/docs/dqn.pdf, where the state is only the image and the reward is where I have more freedom. This is another example: medium.com/@hugo.sjoberg88/…
- JustOneMan (Feb 18, 2023 at 11:34): Is the snake game harder for DQN than any Atari game Google used to train their DQN?
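The reward-scaling advice in the comments (small, bounded rewards instead of -1000 for dying, in the spirit of the reward clipping used in the DeepMind Atari DQN work) can be sketched as a simple reward function. The function name and the exact values are illustrative, not taken from the question's code:

```python
def shaped_reward(ate_apple, died):
    """Bounded rewards on a comparable scale, instead of -1000 for dying."""
    if died:
        return -1.0   # death penalty on the same order as other rewards
    if ate_apple:
        return 0.1    # modest positive signal for eating an apple
    return 0.0        # neutral step
```

Keeping all rewards within roughly [-1, 1] keeps the Q-targets, and hence the gradients, on a consistent scale across transitions.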
