Do you want proof that hidden variables allow better optimization? Here it is.
Imagine an optimizer that takes two input variables instead of one. The Perkun section of the zubr specification looks as follows:
values
{
    value FALSE, TRUE;
}
variables
{
    input variable alpha:{FALSE, TRUE}, reward:{FALSE, TRUE};
    hidden variable gamma:{FALSE, TRUE};
    output variable action:{FALSE, TRUE};
}
There are two input variables now: alpha and reward. What are the semantics? Alpha follows the sequence FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE,... and so on, independently of the agent's action. But the agent does not know where in the sequence we begin. Action is a bet - it is an attempt to predict the next alpha. Depending on the action the agent receives a reward - immediate information about whether the prediction was correct. Reward TRUE means the prediction was right, FALSE means no reward.
You can execute the program directly from my server:
http://www.pawelbiernacki.net/hiddenVariableBasedPredictor.jnlp
For example, let us start with FALSE, FALSE (alpha=FALSE, reward=FALSE). The program sets its initial belief to gamma=>FALSE at 50% and gamma=>TRUE at 50%. The chosen action is FALSE (he bets the next alpha will be FALSE). Let us assume he was wrong and the next alpha is TRUE. So there is no reward: enter TRUE, FALSE.
Now he knows that gamma is FALSE (the belief reflects this). The action will be TRUE. So he thinks the next alpha will be TRUE. Let's confirm his expectations: enter TRUE, TRUE. Now gamma=>TRUE. Action => FALSE.
In short - thanks to the state based on hidden variables, his prediction will always be correct after the first two signals. He will always get the reward TRUE. Only at the beginning is there uncertainty (reflected by the belief).
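The walk-through above can be replayed with a small belief tracker. This is only a minimal sketch in Python, not the Java code that zubr generates: the class name BeliefPredictor, the transition table NEXT_STATE and the tie-break (bet FALSE when both outcomes are equally likely) are assumptions made for illustration. The reward input is left out of the update, because in this model the observed alpha alone is enough to rule out a gamma value.

```python
# Assumed model: the pair (alpha, gamma) steps deterministically through the
# four positions of the cycle FALSE, FALSE, TRUE, TRUE.
NEXT_STATE = {
    (False, False): (False, True),   # first FALSE  -> second FALSE
    (False, True):  (True, False),   # second FALSE -> first TRUE
    (True, False):  (True, True),    # first TRUE   -> second TRUE
    (True, True):   (False, False),  # second TRUE  -> first FALSE
}


class BeliefPredictor:
    """Hypothetical predictor that keeps a probability distribution over gamma."""

    def __init__(self, alpha):
        self.alpha = alpha
        self.belief = {False: 0.5, True: 0.5}  # P(gamma), unknown at the start

    def action(self):
        """Bet on the next alpha; a 50/50 tie falls back to FALSE (assumed)."""
        p_true = sum(
            p for gamma, p in self.belief.items()
            if NEXT_STATE[(self.alpha, gamma)][0]
        )
        return p_true > 0.5

    def observe(self, alpha):
        """Keep only the gamma values consistent with the newly observed alpha."""
        new_belief = {False: 0.0, True: 0.0}
        for gamma, p in self.belief.items():
            next_alpha, next_gamma = NEXT_STATE[(self.alpha, gamma)]
            if next_alpha == alpha:
                new_belief[next_gamma] += p
        total = sum(new_belief.values())
        if total == 0:
            raise ValueError("observation contradicts the assumed alpha cycle")
        self.belief = {g: p / total for g, p in new_belief.items()}
        self.alpha = alpha


predictor = BeliefPredictor(alpha=False)   # start with alpha = FALSE
first_bet = predictor.action()             # belief is 50/50, so the bet is FALSE
predictor.observe(True)                    # the bet was wrong, gamma is now known
second_bet = predictor.action()            # bets TRUE
```

After the single observe call the belief puts all its weight on gamma=FALSE, which matches the second step of the example above.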
When you compare this optimizer (in fact - this predictor) with functions based merely on the input variables, taken in the order (alpha, reward), you will see that no function can beat him. I found two functions that are pretty good:
f1(FALSE, FALSE) = FALSE
f1(FALSE, TRUE) = FALSE
f1(TRUE, FALSE) = TRUE
f1(TRUE, TRUE) = FALSE
f2(FALSE, FALSE) = FALSE
f2(FALSE, TRUE) = TRUE
f2(TRUE, FALSE) = TRUE
f2(TRUE, TRUE) = TRUE
I tested all 16 possible functions - only f1 and f2 get close. But even they keep making mistakes after the first two signals. In contrast, our predictor generated by zubr can make at most one mistake; after the first two signals he makes no more.
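The comparison with the 16 stateless functions can be reproduced with a short simulation. It is a sketch under assumptions, not the test actually used for this post: the helper names (long_run_accuracy, all_tables), the warm-up length and the argument order (alpha, reward) are all chosen here for illustration.

```python
from itertools import product

ALPHA_CYCLE = [False, False, True, True]
INPUTS = list(product([False, True], repeat=2))  # (alpha, reward) pairs


def long_run_accuracy(table, start, first_reward, warmup=8, steps=400):
    """Fraction of correct bets made by a stateless table after a warm-up."""
    pos, reward, hits = start, first_reward, 0
    for t in range(warmup + steps):
        action = table[(ALPHA_CYCLE[pos], reward)]  # bet on the next alpha
        pos = (pos + 1) % 4
        reward = (action == ALPHA_CYCLE[pos])
        if t >= warmup:
            hits += reward
    return hits / steps


def all_tables():
    """Yield every function from (alpha, reward) to an action: 2**4 = 16 tables."""
    for outputs in product([False, True], repeat=4):
        yield dict(zip(INPUTS, outputs))


f1 = {(False, False): False, (False, True): False,
      (True, False): True, (True, True): False}
f2 = {(False, False): False, (False, True): True,
      (True, False): True, (True, True): True}

# Best result over all tables, all starting positions and both initial rewards.
best = max(
    long_run_accuracy(table, start, first_reward)
    for table in all_tables()
    for start in range(4)
    for first_reward in (False, True)
)
```

In this simulation f1 and f2 both settle at three correct bets out of four, and no table does better. The reason is simple: if every bet were right, the reward would always be TRUE, so the first and the second FALSE would present the same input (FALSE, TRUE) although they need different bets.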
If you take a look at the file example22_hidden_variables_based_predictor.zubr (unpack perkun and see the "examples" folder) you will see that we use a custom dialog (extending JDialog) in the method getInput. This was necessary because we have two input variables here. You may process the example file with zubr:
zubr example22_hidden_variables_based_predictor.zubr > MyOptimizer.java
The resulting Java code can be compiled (remember to place it in a package "optimizer").
What is the conclusion? For the case discussed here, an optimizer/predictor with a state is much better than any function based only on the input variables. The state should be based on the hidden variables (it is not the only possibility, but the most natural one). This was the problem with the AI - we tried to achieve this with IF THEN rules, and IF THEN can only see the current input. The hidden variables are a natural way to compress our knowledge about the past, the history.