# Upper Confidence Bound

## What is Upper Confidence Bound (UCB)

• One of the methods to solve the Multi Armed Bandit problem is to use the Upper Confidence Bound.
• Solve the exploitation-exploration and trade-of problem as the number of round increases.
• We are going to calculate the average of reward and the confidence bound (or variance) for each slot machine at each round.
• Since the confidence bound is going to be re-calculated at each round, we might select different machine every round.
• As the number of play increases,
• the average of reward is going to approach to its real value,
• and the confidence bound is going to shrink (variance decreases).
• [recall the Law of Large Number]
• Hence, the machine with the HIGHEST reward + variance could yield the best return.
• The idea is to select the slot machine with the HIGHEST Confidence Bound.

## Upper Confidence Bound Algorithm

Randomly select slot machines to play for few times at the beginning, and then start the Algorithm as below.

Step 1.) At each round, we have to calculate 2 numbers for each slot machine

• where n = round.
• N(n) = the number of times that the slot machine was played or gambled.
• R(n) = the sum of rewards of the slot machine up.

Step 2.) Calculate the average reward r(n), and confidence interval.

• r(n) = the average reward of slot machine = R(n) / N(n).
• [r(n) – Δ(n)  , r(n) + Δ(n)] = confidence interval.
• recall that the confidence interval is [lower bound , upper bound].
• where Δ(n) = √(3/2)(log(n)/N(n)).

Step 3.) Select the Slot Machine with the MAX UCB ie [ r(n) + Δ(n) ].

Repeat Step 1 to Step 3, and the player will find the slot machine with the best return at the end.

##### Other Topics on Deep Learning :
• Natural Language Processing (NLP)
• Artificial Neural Networks (ANN)
• Convolutional Neural Networks (CNN)
• Recurrent Neural Networks (RNN)
• Self-Organizing Maps (SOM)
• Boltzmann Machines
• Autoencoders
• XGBoost