## What is the Multi-Armed Bandit Problem

- The multi-armed bandit problem is a classic problem in reinforcement learning.
- A one-armed bandit is a slot machine in a casino.
- These machines take your money very quickly: the chance of winning on a slot machine is very low.
- On any play, the player is more likely to lose than to win.
- Assume that each slot machine in the casino has a different distribution of returns (probability of winning), and that the player does not know these distributions.
- When a player plays more than one slot machine (e.g., gambling on 5 slot machines at once), we want to know how the player should play them to maximize the return.
- To find out which machine pays best, the player has to try them all. The longer the player keeps exploring, the more money is wasted on the low-return machines.
- But if you do not spend enough time exploring, your estimate of each machine's return might not be reliable.
- The goal of **Multi-Armed Bandit** is to find the slot machine with the maximum return as quickly as possible.
- This is the challenge that we are going to solve with some simple artificial intelligence methods:
  - Upper Confidence Bound
  - Thompson Sampling

- If a reward of 1 means a positive return (a win) and 0 means a loss, the goal is to find the slot machine whose reward distribution has the mean closest to 1.
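
To make the two methods named above concrete, here is a minimal sketch in Python. It assumes each slot machine pays 1 with a fixed hidden probability and 0 otherwise. The win probabilities, the number of rounds, and all function names are made up for illustration, and the UCB variant shown is the common UCB1 formula, which may differ from the exact version used elsewhere.

```python
import math
import random


def pull(prob, rng):
    """Play one machine once: reward 1 with probability `prob`, else 0."""
    return 1 if rng.random() < prob else 0


def run_ucb1(probs, rounds, rng):
    """UCB1: play the machine with the highest average reward plus bonus."""
    n_arms = len(probs)
    counts = [0] * n_arms   # how many times each machine was played
    totals = [0] * n_arms   # total reward collected from each machine
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1     # play every machine once to get a first estimate
        else:
            # The bonus term shrinks as a machine is played more often.
            arm = max(
                range(n_arms),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        counts[arm] += 1
        totals[arm] += pull(probs[arm], rng)
    return counts, sum(totals)


def run_thompson(probs, rounds, rng):
    """Thompson Sampling with a Beta(1, 1) prior on each win probability."""
    n_arms = len(probs)
    wins = [0] * n_arms
    losses = [0] * n_arms
    for _ in range(rounds):
        # Draw one plausible win probability per machine, play the best draw.
        samples = [
            rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(n_arms)
        ]
        arm = max(range(n_arms), key=lambda i: samples[i])
        if pull(probs[arm], rng):
            wins[arm] += 1
        else:
            losses[arm] += 1
    counts = [wins[i] + losses[i] for i in range(n_arms)]
    return counts, sum(wins)


# Hypothetical win probabilities for 5 machines (unknown to the player).
true_probs = [0.10, 0.20, 0.30, 0.40, 0.80]
rng = random.Random(0)
ucb_counts, ucb_reward = run_ucb1(true_probs, 2000, rng)
ts_counts, ts_reward = run_thompson(true_probs, 2000, rng)
print("UCB1 plays per machine:    ", ucb_counts, "total reward:", ucb_reward)
print("Thompson plays per machine:", ts_counts, "total reward:", ts_reward)
```

In both runs most of the plays should end up on the machine with the highest win probability, while the low-return machines are only played often enough to rule them out.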

## What is Reinforcement Learning

- The **Multi-Armed Bandit Problem** is related to reinforcement learning.
- It is just one example: it is not the only type of problem that reinforcement learning can solve.
- Reinforcement learning can solve many other kinds of problems.
- For example, reinforcement learning is used to train robots to walk.
- To make a robot walk, you can program it with an explicit sequence of actions, or you can use reinforcement learning to train it to walk in a very interesting way:
- You tell the robot all the actions it can take.
- You tell the robot that the goal is to walk forward.
- Whenever the robot moves forward, it is given a reward (+1), and whenever it moves backward, it is given a punishment (-1 or 0).
- The robot tries random sets of actions and sees what they lead to.
- The robot remembers the sets of actions that lead to a good result and repeats them more often.
- Eventually, it learns how to walk without the programmer writing code that tells it how to walk.
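
The reward-and-repeat loop described above can be sketched with a toy example. It is a deliberate simplification: the "robot" has only two hypothetical actions and no state, whereas a real walking robot must learn long sequences of actions. The action names, the exploration rate `epsilon`, and the helper functions are assumptions made for illustration.

```python
import random

# Hypothetical action set; a real robot would have many more actions.
ACTIONS = ["step_backward", "step_forward"]


def reward_for(action):
    """+1 for moving forward, -1 for moving backward, as described above."""
    return 1 if action == "step_forward" else -1


def train(episodes, epsilon, rng):
    """Learn the average reward of each action by trial and error."""
    value = {a: 0.0 for a in ACTIONS}   # running average reward per action
    tries = {a: 0 for a in ACTIONS}     # how often each action was tried
    position = 0
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)           # explore: try a random action
        else:
            action = max(ACTIONS, key=value.get)   # exploit: repeat what worked
        r = reward_for(action)
        tries[action] += 1
        # Incremental update of the average reward for this action.
        value[action] += (r - value[action]) / tries[action]
        position += r
    return value, position


value, position = train(episodes=200, epsilon=0.1, rng=random.Random(0))
print("learned action values:", value)
print("final position:", position)
```

Nothing in the code tells the robot that stepping forward is the right choice. It only sees the rewards, yet it ends up preferring `step_forward` because that action has the higher average reward.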

##### Other Topics on Simple Artificial Intelligence

**Multi-Armed Bandit**

- Upper Confidence Bound
- Thompson Sampling

##### Other Topics – Association Rule:

- Apriori
- Eclat

##### Other Topics – Multivariate Analysis:

##### Other Topics on Deep Learning:

- Natural Language Processing (NLP)
- Artificial Neural Networks (ANN)
- Convolutional Neural Networks (CNN)
- Recurrent Neural Networks (RNN)
- Self-Organizing Maps (SOM)
- Boltzmann Machines
- Autoencoders
- XGBoost
