Optimization¶
It is often useful to construct a distribution \(d^\prime\) which is consistent with some marginal aspects of \(d\), but otherwise optimizes some information measure. For example, perhaps we are interested in constructing a distribution which matches pairwise marginals with another, but otherwise has maximum entropy:
In [1]: from dit.algorithms.distribution_optimizers import MaxEntOptimizer
In [2]: xor = dit.example_dists.Xor()
In [3]: meo = MaxEntOptimizer(xor, [[0,1], [0,2], [1,2]])
In [4]: meo.optimize()
Out[4]:
fun: -3.0000008720202707
jac: array([-2.99999866, -3.00000134, -3.0000014 , -2.99999836, -3.00000128,
-2.99999848, -2.99999863, -3.00000134])
message: 'Optimization terminated successfully.'
nfev: 1036
nit: 94
njev: 94
status: 0
success: True
x: array([0.12500016, 0.12499993, 0.12499992, 0.12500014, 0.1249999 ,
0.12500015, 0.12500016, 0.12499992])
In [5]: dp = meo.construct_dist()
In [6]: print(dp)
Class: Distribution
Alphabet: ('0', '1') for all rvs
Base: linear
Outcome Class: str
Outcome Length: 3
RV Names: None
x p(x)
000 133440/1067519
001 126623/1012985
010 127732/1021857
011 132302/1058415
100 135619/1084953
101 128122/1024975
110 128112/1024895
111 132216/1057729
Helper Functions¶
There are three special functions to handle common optimization problems:
In [7]: from dit.algorithms import maxent_dist, marginal_maxent_dists
The first is maximum entropy distributions with specific fixed marginals. It encapsulates the steps run above:
In [8]: print(maxent_dist(xor, [[0,1], [0,2], [1,2]]))
Class: Distribution
Alphabet: ('0', '1') for all rvs
Base: linear
Outcome Class: str
Outcome Length: 3
RV Names: None
x p(x)
000 2653341/21226727
001 1885872/15086977
010 1882798/15062385
011 4381242/35049935
100 1787501/14300007
101 4372858/34982865
110 1817034/14536273
111 1479631/11837047
The second constructs several maximum entropy distributions, each with all subsets of variables of a particular size fixed:
In [9]: k0, k1, k2, k3 = marginal_maxent_dists(xor)
where k0
is the maxent dist corresponding the same alphabets as xor
; k1
fixes \(p(x_0)\), \(p(x_1)\), and \(p(x_2)\); k2
fixes \(p(x_0, x_1)\), \(p(x_0, x_2)\), and \(p(x_1, x_2)\) (as in the maxent_dist
example above), and finally k3
fixes \(p(x_0, x_1, x_2)\) (e.g. is the distribution we started with).