Multi Armed Bandit Problem

What is the Multi Armed Bandit Problem

  • A branch of reinforcement learning.
  • A one-armed bandit is a slot machine in a casino.
  • These machines take your money very quickly, because the chance of winning on a slot machine is very low.
  • The probability that the player wins is lower than the probability that the player loses.
  • Assume that the distribution of returns (the probability of winning) is different for each slot machine in the casino.
  • The player does not know these distributions.
  • When a player plays more than one slot machine (e.g., gambling on 5 slot machines at the same time), we want to know how the player should play them to maximize the return.
  • The player has to explore to learn which machine pays best, but the longer the player spends exploring, the more money is wasted on the low-return slot machines.
  • But if you do not spend enough time exploring, your estimate of which machine is best might be unreliable.
  • The goal of the Multi Armed Bandit problem is to find the slot machine with the maximum return as quickly as possible.
  • This is the challenge that we are going to solve with some simple artificial intelligence methods:
    • Upper Confidence Bound
    • Thompson Sampling 
  • If 1 represents a positive return, the goal is to find the slot machine whose return distribution has a mean closest to 1.
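
The two methods named above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a definitive implementation: each slot machine is assumed to pay 1 with a fixed probability and 0 otherwise, the win rates in `probs` are made-up values, and the names `pull`, `ucb1`, and `thompson` are hypothetical helpers introduced for this example. The UCB variant shown is the common UCB1 rule, and the Thompson Sampling variant assumes a Beta(1, 1) prior on each machine's win rate.

```python
import math
import random


def pull(true_prob, rng):
    """Simulated slot machine (assumption): pays 1 with probability true_prob, else 0."""
    return 1 if rng.random() < true_prob else 0


def ucb1(true_probs, rounds, rng):
    """Upper Confidence Bound (UCB1): play the machine with the highest optimistic estimate."""
    n_arms = len(true_probs)
    counts = [0] * n_arms      # how many times each machine was played
    rewards = [0.0] * n_arms   # total reward collected from each machine
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1        # play every machine once to initialise the estimates
        else:
            # average reward so far plus a bonus that shrinks as a machine is played more
            arm = max(
                range(n_arms),
                key=lambda i: rewards[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        counts[arm] += 1
        rewards[arm] += pull(true_probs[arm], rng)
    return counts


def thompson(true_probs, rounds, rng):
    """Thompson Sampling: sample a plausible win rate per machine, play the best sample."""
    n_arms = len(true_probs)
    wins = [0] * n_arms
    losses = [0] * n_arms
    for _ in range(rounds):
        # Beta(wins + 1, losses + 1) is the posterior under a Beta(1, 1) prior
        samples = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(n_arms)]
        arm = samples.index(max(samples))
        if pull(true_probs[arm], rng):
            wins[arm] += 1
        else:
            losses[arm] += 1
    return [wins[i] + losses[i] for i in range(n_arms)]


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.10, 0.15, 0.20, 0.25, 0.60]  # hypothetical win rates, unknown to the player
    print("UCB1 plays per machine:    ", ucb1(probs, 5000, rng))
    print("Thompson plays per machine:", thompson(probs, 5000, rng))
```

Both functions return how often each machine was played. In a run like this, most plays should end up on the machine with the highest win rate, while the low-return machines are tried only often enough to rule them out.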

What is Reinforcement Learning

  • The Multi Armed Bandit Problem is related to reinforcement learning. 
  • It is not the only type of problem that reinforcement learning can solve; it is just one example.
  • Reinforcement learning can solve many other kinds of problems.
  • For example, reinforcement learning is used to train robots to walk.
  • To make a robot walk, you can program it with an explicit sequence of actions, or you can use reinforcement learning to train it to walk, which is a very interesting approach.
  • You tell the robot all the actions it can make. 
  • You tell the robot the goal is to walk forward. 
  • Whenever the robot moves forward, it is given a reward (+1), and whenever it moves backward, it is given a punishment (-1 or 0).
  • So the robot tries random sets of actions and sees what they lead to.
  • The robot remembers the sets of actions that lead to a good result and repeats them more often.
  • Eventually, it learns how to walk without a programmer writing the code for how to walk.
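
The reward loop described above can be illustrated with a toy sketch. It is a heavy simplification and rests on assumptions that are not in the text: the "robot" has only three hypothetical actions, each action earns a fixed reward (+1 forward, -1 backward, 0 standing still), and the robot chooses with a simple epsilon-greedy rule, meaning it usually repeats the action with the best average reward so far and occasionally tries a random one. Real walking requires learning sequences of actions that depend on the robot's state, which this sketch does not model.

```python
import random

# Hypothetical action set and rewards, chosen only for illustration
ACTIONS = ["step_backward", "stand_still", "step_forward"]
REWARD = {"step_backward": -1, "stand_still": 0, "step_forward": 1}


def train(episodes, epsilon, rng):
    """Learn by trial and error which action earns the most reward."""
    totals = {a: 0.0 for a in ACTIONS}  # total reward received per action
    counts = {a: 0 for a in ACTIONS}    # how often each action was tried

    def average(action):
        # average reward of an action so far (0.0 if it was never tried)
        return totals[action] / counts[action] if counts[action] else 0.0

    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)        # explore: try a random action
        else:
            action = max(ACTIONS, key=average)  # exploit: repeat what worked best
        totals[action] += REWARD[action]
        counts[action] += 1
    return max(ACTIONS, key=average), counts


if __name__ == "__main__":
    best, counts = train(episodes=1000, epsilon=0.1, rng=random.Random(0))
    print("Preferred action:", best)
    print("Times each action was tried:", counts)
```

Nothing in the code says that stepping forward is the right thing to do. The preference comes only from the rewards the robot collects while trying actions.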
