Toy Reinforcement Learning Project

My toy project for studying and applying reinforcement learning:
- The agent tries to reach a goal state both "safely" and "fast".
- Missiles and shells are launched at the agent along the way.
- The agent can measure a missile's position only when it is "nearby", and the measurement is noisy.
- The agent must therefore learn to evade these missiles.
- The agent has a limited fuel supply that is consumed as it moves.
- Actions are continuous: forward acceleration and a turn angle.
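For concreteness, the setup above could be sketched roughly as follows. Every name, constant, and dynamics rule here is my own assumption for illustration, not part of the original project:

```python
import math
import random

class ToyMissileEnv:
    """Toy 2-D environment: reach the goal fast while dodging a missile.

    State: agent (x, y, heading, speed, fuel) plus one missile position.
    Actions are continuous: forward acceleration and a turn angle.
    Observations of the missile are noisy, and only available when it
    is within `sensor_range` (this is what makes the problem a POMDP).
    """

    def __init__(self, goal=(10.0, 10.0), fuel=100.0, sensor_range=4.0, noise=0.5):
        self.goal = goal
        self.sensor_range = sensor_range
        self.noise = noise
        self.x = self.y = self.heading = self.speed = 0.0
        self.fuel = fuel
        self.missile = [10.0, 0.0]          # launched toward the agent

    def step(self, accel, turn):
        self.heading += turn
        self.speed += accel
        self.x += self.speed * math.cos(self.heading)
        self.y += self.speed * math.sin(self.heading)
        self.fuel -= abs(accel) + 0.1       # moving burns fuel
        # The missile homes in on the agent.
        dx, dy = self.x - self.missile[0], self.y - self.missile[1]
        d = math.hypot(dx, dy) or 1e-9
        self.missile[0] += 0.8 * dx / d
        self.missile[1] += 0.8 * dy / d
        # Noisy observation, available only when the missile is nearby.
        obs = None
        if d <= self.sensor_range:
            obs = (self.missile[0] + random.gauss(0, self.noise),
                   self.missile[1] + random.gauss(0, self.noise))
        goal_d = math.hypot(self.goal[0] - self.x, self.goal[1] - self.y)
        done = goal_d < 1.0 or d < 0.5 or self.fuel <= 0
        reward = -0.01 * goal_d - (10.0 if d < 0.5 else 0.0) \
                 + (10.0 if goal_d < 1.0 else 0.0)
        return obs, reward, done
```

The reward shaping (small per-step penalty for distance, large penalty for being hit, bonus for reaching the goal) is one arbitrary choice among many.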


I need some hints, and names of RL algorithms that fit this case:
- I think this is a POMDP, but can I model it as an MDP and just ignore the noise?
- If it is a POMDP, what is the recommended method for estimating the state probabilities?
- Which is better to use in this case: value iteration or policy iteration?
- Can I use a neural network (NN) to model the dynamics of the environment instead of explicit equations?
- If yes, is there a specific type of NN?
- I assume the actions should be discretized, right?

I know it will take time and effort to study such a topic, but I really want to.
Feel free to answer only some of the questions if you can't answer them all.
Thanks


A nice toy problem. It is indeed a POMDP, and RL is a reasonable fit. Taking the questions in turn:

I think this is a POMDP, but can I model it as an MDP and just ignore the noise?

Strictly speaking, it is a POMDP. If the observation noise is small and the missile is usually within sensor range, you can often get away with treating the observations as if they were the true state and solving it as an MDP. But when the missile is out of range, part of the state is genuinely hidden, so formally the problem remains a POMDP.

If it is a POMDP, what is the recommended method for estimating the state probabilities?

The standard approach is to maintain a belief state: a probability distribution over the hidden part of the state (here, the missile position), updated after every observation via Bayes' rule. In practice the belief is usually approximated, for example with a particle filter.
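To make the belief-tracking idea concrete, here is a minimal particle-filter step for a 1-D missile position. The motion model, noise levels, and the 1-D simplification are all my own assumptions for the sketch:

```python
import math
import random

def particle_filter_step(particles, observation, motion=1.0,
                         motion_noise=0.3, obs_noise=0.5):
    """One predict/update/resample cycle of a particle filter.

    `particles` approximate the belief over a missile's 1-D position.
    If `observation` is None (missile out of sensor range), only the
    motion model is applied and the belief simply spreads out.
    """
    # Predict: propagate each particle through the motion model.
    particles = [p + motion + random.gauss(0, motion_noise) for p in particles]
    if observation is None:
        return particles
    # Update: weight particles by the Gaussian observation likelihood.
    weights = [math.exp(-((p - observation) ** 2) / (2 * obs_noise ** 2))
               for p in particles]
    total = sum(weights) or 1e-12
    weights = [w / total for w in weights]
    # Resample in proportion to the weights.
    return random.choices(particles, weights=weights, k=len(particles))
```

Repeated calls with real observations concentrate the particles around the true position; calls with `observation=None` let the belief diffuse, which matches the "only visible when nearby" sensor in the question.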

Which is better in this case: value iteration or policy iteration?

Both are dynamic-programming methods that assume a known model and a finite state and action space, so neither applies directly to a continuous problem like yours. If you discretize everything, either will work; otherwise you need methods based on sampling and function approximation.
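For reference, value iteration itself is only a few lines once the model is known and finite. A sketch for deterministic transitions (the general case sums over next-state probabilities); the toy 3-state chain in the usage below is my own example:

```python
def value_iteration(n_states, transitions, rewards, gamma=0.9, tol=1e-6):
    """Value iteration for a known, finite, deterministic MDP.

    transitions[s][a] -> next state, rewards[s][a] -> immediate reward.
    An empty transitions[s] marks a terminal state.
    """
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            if not transitions[s]:          # terminal state: value stays 0
                continue
            best = max(rewards[s][a] + gamma * V[transitions[s][a]]
                       for a in transitions[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V
```

Example: a chain 0 -> 1 -> 2 where only the last step pays reward 1 gives V = [0.9, 1.0, 0.0] with gamma = 0.9, showing how the discount propagates value backwards.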

Can I use a NN to model the dynamics of the environment? If yes, is there a specific type of NN?

Yes, a NN can learn the dynamics from sampled transitions.
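The usual starting point is a plain feed-forward network (MLP) regressing the next state on the current state and action. A minimal numpy sketch with a hand-written backward pass; the architecture and hyperparameters are my own toy choices:

```python
import numpy as np

def train_dynamics_model(states, actions, next_states,
                         hidden=32, lr=0.05, epochs=2000, seed=0):
    """Fit a one-hidden-layer MLP to predict s' from (s, a).

    A stand-in for the usual model-based RL setup: collect
    (state, action, next_state) transitions, then regress next_state
    on the concatenated (state, action) with mean-squared error.
    """
    rng = np.random.default_rng(seed)
    X = np.hstack([states, actions])          # inputs: state-action pairs
    Y = next_states
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, Y.shape[1])); b2 = np.zeros(Y.shape[1])
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)              # forward pass
        pred = H @ W2 + b2
        err = pred - Y                        # MSE gradient, backprop by hand
        gW2 = H.T @ err / len(X); gb2 = err.mean(0)
        dH = (err @ W2.T) * (1 - H ** 2)
        gW1 = X.T @ dH / len(X); gb1 = dH.mean(0)
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
    def predict(s, a):
        h = np.tanh(np.hstack([s, a]) @ W1 + b1)
        return h @ W2 + b2
    return predict
```

Trained on transitions from a toy rule like s' = s + a, the learned model's predictions track the true dynamics closely, which is all a planner needs from it.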

I assume the actions should be discretized, right?

Not necessarily, but it simplifies things a great deal. With discretized states and actions you can apply tabular methods (e.g. QLearning) directly.
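Once actions (and states) are discretized, tabular Q-Learning is only a few lines. A generic sketch; the chain environment in the test below is my own toy example, and any hashable discretized state works as a key:

```python
import random
from collections import defaultdict

def q_learning(step_fn, n_episodes=500, actions=(0, 1),
               alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy.

    `step_fn(state, action)` must return (next_state, reward, done);
    states can be anything hashable, which is exactly where the
    discretization of a continuous problem comes in.
    """
    random.seed(seed)
    Q = defaultdict(float)
    for _ in range(n_episodes):
        s, done = 0, False
        while not done:
            if random.random() < eps:                 # explore
                a = random.choice(actions)
            else:                                     # exploit
                a = max(actions, key=lambda act: Q[(s, act)])
            s2, r, done = step_fn(s, a)
            target = r if done else r + gamma * max(Q[(s2, act)] for act in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```

On a 5-state chain where only moving right toward the last state pays off, the learned table prefers the right-moving action everywhere, as expected.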

If you are new to the topic, start with Sutton and Barto. If you want working code to study alongside the theory, there are RL implementations on github (in Python): the abstract_rl package, with simple_rl.py (the simpler tabular algorithms, QLearning among them) and base_rl. Reading and experimenting with such code is a good way to learn.


Can I use a NN to model the dynamics of the environment? If yes, is there a specific type of NN?

You can, but first ask whether you need an explicit model of the dynamics at all: what would you actually use it for? Model-free RL sidesteps the question entirely.

I assume the actions should be discretized, right?

Not necessarily. Actor-Critic methods, for example, work with continuous actions directly. There is also RL based on Gaussian Processes. Search google for these.
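To illustrate the continuous-action point: a policy-gradient method can adjust the parameters of a Gaussian action distribution directly, with no discretization. Below is a minimal REINFORCE sketch with a state-independent Gaussian policy on a one-step toy problem; the whole setup (bandit-style episodes, the quadratic reward in the test) is my own simplification, not the answerer's code:

```python
import random

def reinforce_gaussian(reward_fn, episodes=3000, lr=0.02, sigma=0.5, seed=0):
    """REINFORCE with a state-independent Gaussian policy N(mu, sigma).

    One-step episodes: sample an action, observe a reward, and move
    `mu` along the score-function gradient (a - mu) / sigma**2 scaled
    by the reward. A running-mean baseline reduces gradient variance.
    """
    random.seed(seed)
    mu, baseline = 0.0, 0.0
    for _ in range(episodes):
        a = random.gauss(mu, sigma)              # sample a continuous action
        r = reward_fn(a)
        baseline += 0.05 * (r - baseline)        # track the average reward
        mu += lr * (r - baseline) * (a - mu) / sigma ** 2
    return mu
```

With a reward peaked at some target action, the policy mean drifts toward that target; full Actor-Critic methods replace the crude baseline with a learned value function and condition the policy on the state.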


Source: https://habr.com/ru/post/1745707/

