Optimization via simulated annealing

Local vs global optimization

As discussed to some extent in this example, the objective function \(\chi^2\) corresponds to a high-dimensional space with many local minima and extremely large variations in gradients and curvature. This complexity poses a considerable challenge for numerical optimization. Local minimizers are designed to find the closest minimum and thus rely on a good starting point. Global minimizers attempt to alleviate this problem by combining a more coarse-grained sampling of the parameter space with local minimization (using loose convergence criteria for the latter). Both local and global minimizers are available in atomicrex via the NLopt library. It is also possible to employ the Python interface to hook up atomicrex with suitable Python libraries (e.g., SciPy) or to implement custom optimization strategies. An example of the latter approach is discussed in the following.
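The dependence of local minimizers on the starting point can be illustrated with a minimal sketch. The one-dimensional double-well objective below is purely hypothetical (unrelated to the actual \(\chi^2\)); plain gradient descent simply converges to whichever minimum lies closest to the initial guess:

```python
def f(x):
    """Toy double-well objective with local minima at x = -1 and x = +1."""
    return (x**2 - 1)**2

def grad_f(x):
    """Analytic derivative of f."""
    return 4 * x * (x**2 - 1)

def local_minimize(x0, lr=0.01, steps=2000):
    """Plain gradient descent: converges to the minimum nearest to x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

# Starting points on opposite sides of the barrier at x = 0
# end up in different minima.
print(local_minimize(-0.5), local_minimize(+0.5))
```

A global strategy would have to add some mechanism (random restarts, annealing, etc.) on top of such a local search in order to escape the basin of the starting point.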

Basic algorithm for simulated annealing

Simulated annealing is a particular optimization strategy that can be very effective for complex and/or high-dimensional parameter spaces. The basic algorithm is in fact very simple and involves only evaluations of the objective function itself (but not of its derivatives). Specifically, it consists of the following basic steps:

  1. Set up an atomicrex fitjob.
  2. Initialize counters for the number of trial steps, the acceptance rate for different types of steps etc.
  3. Randomly select one of the parameters \(i\).
  4. Generate a trial parameter value \(p_i^{trial} = p_i^{previous} + \delta_i R\) where \(R\) is a uniformly distributed real number \(R \in [-1,1]\) and \(\delta_i\) is the trial step size.
  5. Evaluate the objective function using the new parameter vector \(\boldsymbol{p}^{trial}\) yielding \(\chi^2_{trial}\).
  6. Apply the Metropolis criterion:
    1. Accept if \(\Delta\chi^2 = \chi^2_{trial} - \chi^2_{previous} < 0\).
    2. Else generate a uniformly distributed random number \(\phi \in [0,1)\). Accept if \(\phi < \exp(-\Delta\chi^2 / \Theta)\); otherwise reject. Here, \(\Theta\) is the current temperature of the ensemble being sampled.
  7. Update counters and collect averages.
  8. In periodic intervals, rescale the trial step sizes to yield the targeted acceptance rate. A reasonable target value is usually about 0.2 (see [Fre12] for more information).
  9. Return to step #3.
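The steps above can be sketched as follows for a toy two-dimensional objective function. This is a minimal illustration only: the stand-in `chi2`, the cooling schedule, and all numerical values are made up for demonstration, and the adaptive rescaling of the trial step sizes (step 8) is omitted for brevity.

```python
import math
import random

random.seed(42)

def chi2(p):
    """Toy stand-in for the real objective function chi^2,
    with many local minima along each parameter direction."""
    return sum(x**2 + 2 * math.sin(5 * x)**2 for x in p)

params = [3.0, -2.0]        # steps 1-2: initial parameters and counters
deltas = [0.5, 0.5]         # per-parameter trial step sizes delta_i
chi = chi2(params)
accepted = 0
n_steps = 20000
T0, T1 = 5.0, 0.01          # linear cooling from T0 down to T1

for it in range(n_steps):
    T = T0 + (T1 - T0) * it / n_steps
    i = random.randrange(len(params))           # step 3: pick parameter i
    delta = deltas[i] * random.uniform(-1, 1)   # step 4: trial value
    params[i] += delta
    chi_new = chi2(params)                      # step 5: evaluate
    dchi = chi_new - chi
    # step 6: Metropolis criterion at the current temperature T
    if dchi < 0 or random.uniform(0, 1) < math.exp(-dchi / T):
        chi, accepted = chi_new, accepted + 1   # accept trial step
    else:
        params[i] -= delta                      # reject: restore parameter

print('final chi2: {:.4f}  params: {}'.format(chi, params))
```

The sample_objective_function.py script listed below follows the same pattern but evaluates the real objective function through the atomicrex Python interface.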

A practical example

The algorithm outlined above can be varied and tailored in a number of ways. It is implemented for demonstration purposes in the sample_objective_function.py script in the subdirectory examples/monte_carlo_sampling (see below for a code listing). The potential sampled here is identical to the one already used in several other examples: [1], [2], [3], [4].

Objective function as a function of Monte Carlo trial step.

Objective function (blue) and temperature (orange) during a simulated annealing run. The temperature is reduced linearly from its initial value to close to zero during the course of the simulation. A log-scale is used for the x-axis in order to highlight the initial decrease in the objective function.

As illustrated in the figure above, the objective function drops considerably during the first 20,000 to 40,000 MC trial steps, after which it merely oscillates. One could thus terminate the simulated annealing procedure at this point and continue with a conventional local optimization approach, e.g., a gradient-based minimizer.
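A simple stopping heuristic for such a switch could compare windowed averages of \(\chi^2\) and terminate the annealing once the decrease between windows becomes negligible. This is not part of the script below; the following is merely one possible sketch, in which the window size, tolerance, and the synthetic \(\chi^2\) trace are made up:

```python
import numpy as np

def plateaued(chi_history, window=5000, rel_tol=1e-3):
    """Return True once the windowed mean of chi^2 has stopped
    decreasing by more than rel_tol between consecutive windows."""
    if len(chi_history) < 2 * window:
        return False
    prev = np.mean(chi_history[-2 * window:-window])
    last = np.mean(chi_history[-window:])
    return (prev - last) < rel_tol * prev

# Synthetic chi^2 trace: rapid initial drop, then noisy oscillation
# (qualitatively mimicking the figure above).
rng = np.random.default_rng(1)
trace = list(20000 * np.exp(-np.arange(40000) / 3000)
             + 15000 + 100 * rng.standard_normal(40000))
print(plateaued(trace))
```

In practice such a check would be evaluated inside the MC loop, e.g., at the same interval at which output is written.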

Parameter variation during simulated annealing.

Evolution of potential parameters during the simulated annealing run.

The figure above illustrates the variation of the four potential parameters during the simulated annealing procedure. These variations can be compared with the change in, e.g., the \(A\) parameter observed during the "conventional" (local) optimization described in this example. In the latter case the parameter \(A\) changes very little during the optimization (\(A_{initial} = 500\) vs \(A_{final} = 500.214\); compare this example). While the local optimization readily reaches its (rather tight) convergence criteria, the final objective function (\(\chi^2_{final} = 30535.6\)) is substantially higher than in the simulated annealing case considered here (\(\chi^2_{final} = 15621.1\)).

The observed behavior can be attributed to the very "rough" objective function landscape that arises even for this simple functional form. It is at least partly related to the very large ratio between the largest and smallest eigenvalues of the Hessian matrix \(\nabla^2_{\boldsymbol p}\,\chi^2\), which was discussed here and results from the combination of exponential and linear terms in the functional form.
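The effect of such ill-conditioning can be reproduced with a small toy calculation (not the actual atomicrex Hessian; the model and numerical values below are illustrative only): even a plain exponential least-squares model produces a huge ratio between the largest and smallest eigenvalues of the Gauss-Newton Hessian.

```python
import numpy as np

# Toy least-squares model y = A * exp(-lam * r), mimicking the exponential
# pair function used in this example; parameter values are illustrative.
r = np.linspace(1.0, 4.0, 50)
A, lam = 500.0, 2.37

# Jacobian of the residuals with respect to the parameters (A, lam)
J = np.column_stack([np.exp(-lam * r),            # d(residual)/dA
                     -A * r * np.exp(-lam * r)])  # d(residual)/dlam

H = 2 * J.T @ J   # Gauss-Newton approximation to the Hessian of chi^2
eigvals = np.linalg.eigvalsh(H)   # ascending order
print('eigenvalue ratio: {:.3g}'.format(eigvals[-1] / eigvals[0]))
```

The two Jacobian columns are nearly collinear (a change in \(A\) can be largely compensated by a change in \(\lambda\)), which is what drives the smallest eigenvalue toward zero.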

Location of files

examples/monte_carlo_sampling

Input files

  • sample_objective_function.py: script for sampling the objective function

    #!/usr/bin/env python3
    
    """
    This script enables sampling of the parameter space of a potential using
    Monte Carlo (MC) simulations. It can also be used to conduct parameter
    optimization via simulated annealing.
    """
    
    from __future__ import print_function
    import atomicrex
    import random
    import numpy as np
    import argparse
    
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument('temperature_initial',
                        type=float,
                        help='Initial temperature')
    defval = None
    parser.add_argument('-f', '--temperature_final',
                        default=defval,
                        type=float, metavar='T',
                        help='Final temperature'
                        ' [default: identical to initial temperature]')
    defval = 400000
    parser.add_argument('-n', dest='number_of_steps',
                        type=int, metavar='N',
                        default=defval,
                        help='Number of MC trial steps'
                        ' [default: {}]'.format(defval))
    defval = 0.2
    parser.add_argument('-t', dest='target_acceptance_probability',
                        type=float, metavar='phi',
                        default=defval,
                        help='Target acceptance probability for trial step'
                        ' optimization, see `<https://arxiv.org/abs/1211.4440>`_'
                        ' [default: {}]'.format(defval))
    defval = 300
    parser.add_argument('-w', dest='sampling_window',
                        type=int, metavar='N',
                        default=defval,
                        help='Number of MC trial steps over which acceptance'
                        ' probability is sampled [default: {}]'.format(defval))
    defval = 100
    parser.add_argument('-o', dest='output_spacing',
                        type=int, metavar='N',
                        default=defval,
                        help='Number of MC trial steps between output'
                        ' [default: {}]'.format(defval))
    defval = 15611
    parser.add_argument('-s', dest='seed',
                        type=int, metavar='N',
                        default=defval,
                        help='Seed for random number generator'
                        ' [default: {}]'.format(defval))
    args = parser.parse_args()
    
    if args.temperature_final is None:
        args.temperature_final = args.temperature_initial
    
    print('Arguments:')
    for arg in args.__dict__:
        print(' {:32}: {}'.format(arg, args.__dict__[arg]))
    print('')
    
    if args.temperature_final <= 0.0:
        print('Final temperature must be positive.')
        exit(1)
    
    # set up job
    print('Optimize parameters (without "temperature")')
    job = atomicrex.Job()
    job.parse_input_file('main.xml')
    job.prepare_fitting()
    print()
    
    # output initial parameters
    print('Initial residual: {}'.format(job.calculate_residual()))
    print('Initial parameters:')
    print('%-14s' % 'parameters:')
    for p in job.get_potential_parameters():
        print(' {:10}'.format(p), end='')
    print('')
    print('{:14}'.format('1st derivative:'))
    grad = job.calculate_gradient()
    for g in grad:
        print('{:10}'.format(g), end='')
    print('')
    print('{:14}'.format('2nd derivative:'))
    hessian = job.calculate_hessian()
    for row in hessian:
        for d in row:
            print('{:10}'.format(d), end='')
        print('')
    print()
    print()
    
    # output properties
    for name, structure in sorted(job.structures.items()):
        print()
        print(' Structure: {}'.format(name))
        structure.compute_properties()
        structure.print_properties()
    print()
    
    # set up Monte Carlo
    print('trial step lengths:', end='')
    trial_step_lengths = 1.0 / np.diag(hessian)
    for s in trial_step_lengths:
        print(' {:12}'.format(s), end='')
    print('')
    random.seed(args.seed)
    
    # run Monte Carlo
    print('Monte Carlo sampling of parameter space:')
    params = job.get_potential_parameters()
    chi = job.calculate_residual()
    number_of_rejected_trial_steps = 0
    events = []
    for p in params:
        events.append([])
    temperature_delta = args.temperature_final - args.temperature_initial
    temperature_delta /= args.number_of_steps
    for it in range(args.number_of_steps + 1):
    
        temperature = args.temperature_initial + temperature_delta * it
    
        # trial step
        ip = random.randrange(len(params))
        delta = trial_step_lengths[ip] * random.uniform(-1, 1)
        params[ip] += delta
        chi_new = job.calculate_residual(params)
        events[ip].append(1)
        if len(events[ip]) > args.sampling_window:
            events[ip].pop(0)
        if chi_new < chi:
            chi = chi_new
            accprob = 1.0
        else:
            accprob = np.exp((chi - chi_new) / temperature)
            if accprob > random.uniform(0, 1):
                chi = chi_new
            else:
                number_of_rejected_trial_steps += 1
                params[ip] -= delta
                events[ip][-1] = 0
    
        # output and basic analysis
        if it % args.output_spacing != 0:
            continue
    
        print('mc', end='')
        print(' {:8}'.format(it), end='')
        print(' {:12.8}'.format(temperature), end='')
        print(' {:12.8g}'.format(chi), end='')
    
        print('  ', end='')
        for p in params:
            print(' {:12.8g}'.format(p), end='')
    
        print('  ', end='')
        for k, e in enumerate(events):
            try:
                prob = float(np.sum(e)) / len(e)
            except ZeroDivisionError:
                prob = 0.0
            print(' {:9.5}'.format(prob), end='')
            print(' ( {:9.5})'.format(trial_step_lengths[k]), end='')
    
            # adapt trial step lengths
            if k == ip:
                if prob > 0:
                    trial_step_lengths[ip] *= prob
                    trial_step_lengths[ip] /= args.target_acceptance_probability
                else:
                    trial_step_lengths[ip] *= 0.9
    
        s = job.structures['FCC']
        s.compute_properties()
        print(' {:9.3}'.format(s.properties['atomic-energy'].computed_value),
              end='')
        print(' {:7.3}'.format(s.properties['lattice-parameter'].computed_value),
              end='')
        print(' {:9.2}'.format(s.properties['C11'].computed_value), end='')
        print(' {:9.2}'.format(s.properties['C12'].computed_value), end='')
        print(' {:9.2}'.format(s.properties['C44'].computed_value), end='')
    
        print('')
    
  • main.xml: main input file

    <?xml version="1.0" encoding="iso-8859-1"?>
    <job>
      <name>EAM potential</name>
      
      <verbosity>medium</verbosity>
    
      <fitting>
    <BFGS conv-threshold="1e-10" max-iter="100" />
      </fitting>
    
      <atom-types>
        <species>Al</species>
      </atom-types>
    
      <potentials>
        <xi:include href="potential.xml"
		xmlns:xi="http://www.w3.org/2001/XInclude" />
      </potentials>
      
      <structures>
        <xi:include href="structures.xml"
		xmlns:xi="http://www.w3.org/2001/XInclude" />
      </structures>
    
    </job>
    
  • potential.xml: initial parameter set (included in main input file via XML Inclusions)

    <eam id='Al' species-a='*' species-b='*'>
    
      <functions>
    
        <user-function id='V'>
          <input-var>r</input-var>
          <expression>
    	A * exp(-lambda * r)
          </expression>
          <derivative>
    	-lambda * A * exp(-lambda * r)
          </derivative>
          <param name='A'>500.2</param>
          <param name='lambda'>2.37</param>
          <fit-dof>
    	<A min='10.0'/>
    	<lambda min='0.001'/>
          </fit-dof>
          <screening>
    	<user-function id='rho_screening'>
    	  <cutoff>6.5</cutoff>
    	  <input-var>r</input-var>
    	  <expression>
    	    1 - 1/(1 + ((r - cutoff) / h)^4)
    	  </expression>
    	  <derivative>
    	    4 * h^4 * (r - cutoff)^3 / ((h^4 + (r - cutoff)^4)^2)
    	  </derivative>
    	  <param name='h'>3</param>
    	</user-function>
          </screening>
        </user-function>
    
        <user-function id='rho'>
          <input-var>r</input-var>
          <expression>
    	exp(-twomu * r)
          </expression>
          <derivative>
    	-twomu * exp(-twomu * r)
          </derivative>
          <param name='twomu'>1.38</param>
          <fit-dof>
    	<twomu min='0.02'/>
          </fit-dof>
          <screening>
    	<user-function id='rho_screening'>
    	  <cutoff>6.5</cutoff>
    	  <input-var>r</input-var>
    	  <expression>
    	    1 - 1/(1 + ((r - cutoff) / h)^4)
    	  </expression>
    	  <derivative>
    	    4 * h^4 * (r - cutoff)^3 / ((h^4 + (r - cutoff)^4)^2)
    	  </derivative>
    	  <param name='h'>3</param>
    	</user-function>
          </screening>
        </user-function>
    
        <user-function id='F'>
          <input-var>rho</input-var>
          <expression>
    	-D * sqrt(rho)
          </expression>
          <derivative>
            -D / (2 * sqrt(rho))
          </derivative>
          <param name='D'>15</param>
          <fit-dof>
    	<D min='0.1'/>
          </fit-dof>
        </user-function>
        
      </functions>
      
      <mapping>
        <pair-interaction species-a='*' species-b='*' function='V' />
        <electron-density species-a='*' species-b='*' function='rho' />
        <embedding-energy species='*' function='F' />
      </mapping>
      
    </eam>
    
  • structures.xml: input structures (included in main input file via XML Inclusions)

    <group>
    
      <fcc-lattice id='FCC' relative-weight='3'>
        <atom-type>Al</atom-type>
        <lattice-parameter>4.0</lattice-parameter>
        <properties>
          <lattice-parameter fit='true' target='4.032' relative-weight='3' min='3.0' max='5.0'/>
          <atomic-energy fit='true' target='-3.36' relative-weight='3'/>
          <bulk-modulus fit='false' target='80.9' relative-weight='1'/>
          <C11/>
          <C12/>
          <C44/>
        </properties>
      </fcc-lattice>
      
      <user-structure id='structure_1'>
        <poscar-file>POSCAR_res_POSCAR_0.9.traj_forces</poscar-file>
        <properties>
          <atomic-forces fit='true' output-all='true'/>
        </properties>
      </user-structure>
      
      <user-structure id='structure_2'>
        <poscar-file>POSCAR_res_POSCAR_1.0.traj_forces</poscar-file>
        <properties>
          <atomic-forces fit='true' output-all='true'/>
        </properties>
      </user-structure>
      
      <user-structure id='structure_3'>
        <poscar-file>POSCAR_res_POSCAR_1.1.traj_forces</poscar-file>
        <properties>
          <atomic-forces fit='true' output-all='true'/>
        </properties>
      </user-structure>
      
    </group>