Optimization

It is often useful to construct a distribution \(d^\prime\) which is consistent with some marginal aspects of \(d\), but otherwise optimizes some information measure. For example, perhaps we are interested in constructing a distribution which matches pairwise marginals with another, but otherwise has maximum entropy:

In [1]: from dit.algorithms.distribution_optimizers import MaxEntOptimizer

In [2]: xor = dit.example_dists.Xor()

In [3]: meo = MaxEntOptimizer(xor, [[0,1], [0,2], [1,2]])

In [4]: meo.optimize()
Out[4]: 
     fun: -3.0000008720202707
     jac: array([-2.99999866, -3.00000134, -3.0000014 , -2.99999836, -3.00000128,
       -2.99999848, -2.99999863, -3.00000134])
 message: 'Optimization terminated successfully.'
    nfev: 1036
     nit: 94
    njev: 94
  status: 0
 success: True
       x: array([0.12500016, 0.12499993, 0.12499992, 0.12500014, 0.1249999 ,
       0.12500015, 0.12500016, 0.12499992])

In [5]: dp = meo.construct_dist()

In [6]: print(dp)
Class:          Distribution
Alphabet:       ('0', '1') for all rvs
Base:           linear
Outcome Class:  str
Outcome Length: 3
RV Names:       None

x     p(x)
000   133440/1067519
001   126623/1012985
010   127732/1021857
011   132302/1058415
100   135619/1084953
101   128122/1024975
110   128112/1024895
111   132216/1057729

Helper Functions

There are three special functions to handle common optimization problems:

In [7]: from dit.algorithms import maxent_dist, marginal_maxent_dists

The first is maximum entropy distributions with specific fixed marginals. It encapsulates the steps run above:

In [8]: print(maxent_dist(xor, [[0,1], [0,2], [1,2]]))
Class:          Distribution
Alphabet:       ('0', '1') for all rvs
Base:           linear
Outcome Class:  str
Outcome Length: 3
RV Names:       None

x     p(x)
000   2653341/21226727
001   1885872/15086977
010   1882798/15062385
011   4381242/35049935
100   1787501/14300007
101   4372858/34982865
110   1817034/14536273
111   1479631/11837047

The second constructs several maximum entropy distributions, each with all subsets of variables of a particular size fixed:

In [9]: k0, k1, k2, k3 = marginal_maxent_dists(xor)

where k0 is the maxent dist corresponding the same alphabets as xor; k1 fixes \(p(x_0)\), \(p(x_1)\), and \(p(x_2)\); k2 fixes \(p(x_0, x_1)\), \(p(x_0, x_2)\), and \(p(x_1, x_2)\) (as in the maxent_dist example above), and finally k3 fixes \(p(x_0, x_1, x_2)\) (e.g. is the distribution we started with).