15 puzzle java source code5/18/2023 ![]() Whether we are playing a game or solving a physics problem or predicting the stock market with RL, we are always facing a current situation(or state) and a set of actions that we can take from that state. We might get stuck in a local maxima which is not a solution really. That is a greedy strategy and might not yield a globally optimal solution. ![]() We don’t want to necessarily maximize the reward with each action. The reward might be given to us after each action we take or could be a lump-sum reward at the end of the game or activity(That is if there is an end). The reward is like a signal which tells us if we are doing good or bad. ![]() The very abstract and general idea behind RL is that we solve the problem by experiencing(playing) it and get rewards for actions we take.
0 Comments
Leave a Reply. |