Optimization
It is often useful to construct a distribution \(d^\prime\) that is consistent with some marginals of \(d\) but otherwise optimizes some information measure. For example, we may want a distribution that matches the pairwise marginals of another distribution but otherwise has maximum entropy:
In [1]: import dit
   ...: from dit.algorithms.distribution_optimizers import MaxEntOptimizer
In [2]: xor = dit.example_dists.Xor()
In [3]: meo = MaxEntOptimizer(xor, [[0,1], [0,2], [1,2]])
In [4]: meo.optimize()
Out[4]:
fun: -3.0000019471980566
jac: array([-2.99999976, -2.99999982, -2.99999982, -3.00000003, -2.99999988,
-3.0000003 , -2.9999997 , -3.00000024])
message: 'Optimization terminated successfully.'
nfev: 936
nit: 85
njev: 85
status: 0
success: True
x: array([0.12500004, 0.12500006, 0.1250001 , 0.12500009, 0.12500009,
0.12500011, 0.12500008, 0.12500007])
In [5]: dp = meo.construct_dist()
In [6]: print(dp)
Class: Distribution
Alphabet: ('0', '1') for all rvs
Base: linear
Outcome Class: str
Outcome Length: 3
RV Names: None
x p(x)
000 3789582/30316657
001 1/8
010 1705318/13642543
011 4231858/33854863
100 3632480/29059839
101 1627191/13017527
110 2030444/16243553
111 1158708/9269665
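The optimizer output above is close to 1/8 for every outcome. As a rough sanity check, the following sketch uses plain Python rather than dit (the helpers `marginal` and `entropy_bits` are hypothetical names introduced here, not part of the library). It confirms that the uniform distribution over three bits has the same pairwise marginals as XOR and an entropy of 3 bits, which matches `fun` being about -3 if the objective is the negated entropy in bits (an assumption).

```python
from itertools import product
from math import log2


def marginal(dist, indices):
    """Sum a joint distribution {outcome: prob} down to the given variable indices."""
    out = {}
    for outcome, p in dist.items():
        key = tuple(outcome[i] for i in indices)
        out[key] = out.get(key, 0.0) + p
    return out


def entropy_bits(dist):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


# XOR: x2 = x0 ^ x1, with x0 and x1 independent and uniform.
xor = {(a, b, a ^ b): 0.25 for a, b in product((0, 1), repeat=2)}

# Uniform distribution over all eight 3-bit outcomes.
uniform = {o: 0.125 for o in product((0, 1), repeat=3)}

pairs = [(0, 1), (0, 2), (1, 2)]
same_marginals = all(
    marginal(xor, pair) == marginal(uniform, pair) for pair in pairs
)

print(same_marginals)                             # True
print(entropy_bits(xor), entropy_bits(uniform))   # 2.0 3.0
```

Because no distribution over eight outcomes can exceed 3 bits of entropy, the uniform distribution is the maximizer under these constraints, and the small deviations from 1/8 in the printed result are numerical noise from the optimizer.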
Helper Functions
Helper functions handle common optimization problems. Two of them are imported below:
In [7]: from dit.algorithms import maxent_dist, marginal_maxent_dists
The first, maxent_dist, constructs the maximum entropy distribution with the specified marginals fixed. It encapsulates the steps run above:
In [8]: print(maxent_dist(xor, [[0,1], [0,2], [1,2]]))
Class: Distribution
Alphabet: ('0', '1') for all rvs
Base: linear
Outcome Class: str
Outcome Length: 3
RV Names: None
x p(x)
000 608652/4869215
001 734715/5877721
010 620444/4963553
011 619746/4957967
100 603291/4826329
101 736544/5892351
110 540494/4323951
111 545257/4362057
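To build intuition for what "maximum entropy with fixed marginals" means, here is a minimal sketch of iterative proportional fitting (IPF). This is a different, classical technique, not the numerical optimizer dit uses above. Everything in it is an assumption made for illustration: the function `ipf_maxent`, the example distribution `p`, binary alphabets, and the number of sweeps.

```python
from itertools import product
from math import log2


def marginal(dist, indices):
    """Sum a joint distribution {outcome: prob} down to the given variable indices."""
    out = {}
    for outcome, p in dist.items():
        key = tuple(outcome[i] for i in indices)
        out[key] = out.get(key, 0.0) + p
    return out


def entropy_bits(dist):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def ipf_maxent(target, constraints, sweeps=200):
    """IPF from a uniform start (binary variables assumed).

    Each step rescales q so that one marginal matches the target's;
    cycling through all constraints approaches the maximum entropy
    distribution that matches every listed marginal.
    """
    n = len(next(iter(target)))
    outcomes = list(product((0, 1), repeat=n))
    q = {o: 1.0 / len(outcomes) for o in outcomes}
    fixed = {c: marginal(target, c) for c in constraints}
    for _ in range(sweeps):
        for c in constraints:
            current = marginal(q, c)
            for o in outcomes:
                key = tuple(o[i] for i in c)
                if current[key] > 0:
                    q[o] *= fixed[c].get(key, 0.0) / current[key]
    return q


# Hypothetical strictly positive example: weights 1..8 over the 3-bit outcomes.
outcomes = list(product((0, 1), repeat=3))
p = {o: w / 36 for o, w in zip(outcomes, range(1, 9))}

pairs = [(0, 1), (0, 2), (1, 2)]
q = ipf_maxent(p, pairs)

# Largest remaining mismatch across all pairwise marginals.
max_gap = max(
    abs(marginal(q, c)[k] - v) for c in pairs for k, v in marginal(p, c).items()
)
print(max_gap)
print(entropy_bits(p), entropy_bits(q))
```

The fitted `q` should agree with `p` on every pairwise marginal while having entropy at least as large, since `p` itself satisfies the constraints. For XOR the uniform starting point already matches all pairwise marginals, so the loop would leave it unchanged.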
The second, marginal_maxent_dists, constructs several maximum entropy distributions, one per subset size, each with all marginals over subsets of variables of that size fixed:
In [9]: k0, k1, k2, k3 = marginal_maxent_dists(xor)
where k0 is the maximum entropy distribution over the same alphabets as xor; k1 fixes \(p(x_0)\), \(p(x_1)\), and \(p(x_2)\); k2 fixes \(p(x_0, x_1)\), \(p(x_0, x_2)\), and \(p(x_1, x_2)\) (as in the maxent_dist example above); and finally k3 fixes \(p(x_0, x_1, x_2)\) (i.e., it is the distribution we started with).
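The lowest and highest orders have simple closed forms, which the following sketch computes by hand in plain Python instead of calling dit. It relies on two standard facts: with nothing fixed, the maximum entropy distribution is uniform over the alphabet, and with only single-variable marginals fixed, it is the product of those marginals. The example distribution `p` and the helper names are hypothetical, chosen so that the orders differ.

```python
from itertools import product
from math import log2, prod


def marginal(dist, indices):
    """Sum a joint distribution {outcome: prob} down to the given variable indices."""
    out = {}
    for outcome, p in dist.items():
        key = tuple(outcome[i] for i in indices)
        out[key] = out.get(key, 0.0) + p
    return out


def entropy_bits(dist):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


# Hypothetical biased distribution on the XOR support (not dit's Xor example).
p = {(0, 0, 0): 0.5, (0, 1, 1): 0.2, (1, 0, 1): 0.2, (1, 1, 0): 0.1}
n = 3
outcomes = list(product((0, 1), repeat=n))

# Order 0: nothing fixed, so the maxent distribution is uniform.
k0 = {o: 1.0 / len(outcomes) for o in outcomes}

# Order 1: single-variable marginals fixed, so maxent is their product.
singles = [marginal(p, (i,)) for i in range(n)]
k1 = {o: prod(singles[i][(o[i],)] for i in range(n)) for o in outcomes}

# Order 3: the full joint is fixed, so the result is p itself.
k3 = dict(p)

for name, dist in [("k0", k0), ("k1", k1), ("k3", k3)]:
    print(name, round(entropy_bits(dist), 4))
```

Entropy can only decrease as more marginals are fixed, because each higher order adds constraints to the same maximization. Order 2 has no closed form in general, which is where a numerical method is needed.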