Optimization via simulated annealing

Local vs global optimization

As discussed to some extent in this example, the objective function \(\chi^2\) corresponds to a high-dimensional space with many local minima and extremely large variations in gradients and curvature. This complexity poses a considerable challenge for numerical optimization. Local minimizers are designed to find the closest minimum and thus rely on a good starting point. Global minimizers attempt to alleviate this problem by combining a more coarse-grained sampling of the parameter space with local minimization (using loose convergence criteria for the latter). Both local and global minimizers are available in atomicrex via the NLopt library. It is also possible to employ the Python interface to hook up atomicrex with suitable Python libraries (e.g., SciPy) or to implement custom optimization strategies. An example of the latter approach is discussed in the following.
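The dependence of local minimizers on the starting point can be illustrated with a minimal sketch. The one-dimensional double-well objective below is purely hypothetical (unrelated to the actual \(\chi^2\)); plain gradient descent simply converges to whichever minimum lies closest to the initial guess:

```python
def f(x):
    """Toy double-well objective with local minima at x = -1 and x = +1."""
    return (x**2 - 1)**2

def grad_f(x):
    """Analytic derivative of f."""
    return 4 * x * (x**2 - 1)

def local_minimize(x0, lr=0.01, steps=2000):
    """Plain gradient descent: converges to the minimum nearest to x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

# Starting points on opposite sides of the barrier at x = 0
# end up in different minima.
print(local_minimize(-0.5), local_minimize(+0.5))
```

A global strategy would have to add some mechanism (random restarts, annealing, etc.) on top of such a local search in order to escape the basin of the starting point.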

Basic algorithm for simulated annealing

Simulated annealing is a particular optimization strategy that can be very effective for complex and/or high-dimensional parameter spaces. The basic algorithm is in fact very simple and involves only evaluations of the objective function itself (but not of its derivatives). Specifically, it consists of the following basic steps:

  1. Set up an atomicrex fitjob.
  2. Initialize counters for the number of trial steps, the acceptance rate for different types of steps etc.
  3. Randomly select one of the parameters \(i\).
  4. Generate a trial parameter value \(p_i^{trial} = p_i^{previous} + \delta_i R\) where \(R\) is a uniformly distributed real number \(R \in [-1,1]\) and \(\delta_i\) is the trial step size.
  5. Evaluate the objective function using the new parameter vector \(\boldsymbol{p}^{trial}\) yielding \(\chi^2_{trial}\).
  6. Apply the Metropolis criterion:
    1. Accept if \(\Delta\chi^2 = \chi^2_{trial} - \chi^2_{previous} < 0\).
    2. Else generate a uniformly distributed random number \(\phi \in [0,1)\). Accept if \(\phi < \exp(-\Delta\chi^2 / \Theta)\); otherwise reject. Here, \(\Theta\) is the current temperature of the ensemble being sampled.
  7. Update counters and collect averages.
  8. In periodic intervals, rescale the trial step sizes to yield the targeted acceptance rate. A reasonable target value is usually about 0.2 (see [Fre12] for more information).
  9. Return to step #3.
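The steps above can be sketched as follows for a toy two-dimensional objective function. This is a minimal illustration only: the stand-in `chi2`, the cooling schedule, and all numerical values are made up for demonstration, and the adaptive rescaling of the trial step sizes (step 8) is omitted for brevity.

```python
import math
import random

random.seed(42)

def chi2(p):
    """Toy stand-in for the real objective function chi^2,
    with many local minima along each parameter direction."""
    return sum(x**2 + 2 * math.sin(5 * x)**2 for x in p)

params = [3.0, -2.0]        # steps 1-2: initial parameters and counters
deltas = [0.5, 0.5]         # per-parameter trial step sizes delta_i
chi = chi2(params)
accepted = 0
n_steps = 20000
T0, T1 = 5.0, 0.01          # linear cooling from T0 down to T1

for it in range(n_steps):
    T = T0 + (T1 - T0) * it / n_steps
    i = random.randrange(len(params))           # step 3: pick parameter i
    delta = deltas[i] * random.uniform(-1, 1)   # step 4: trial value
    params[i] += delta
    chi_new = chi2(params)                      # step 5: evaluate
    dchi = chi_new - chi
    # step 6: Metropolis criterion at the current temperature T
    if dchi < 0 or random.uniform(0, 1) < math.exp(-dchi / T):
        chi, accepted = chi_new, accepted + 1   # accept trial step
    else:
        params[i] -= delta                      # reject: restore parameter

print('final chi2: {:.4f}  params: {}'.format(chi, params))
```

The sample_objective_function.py script listed below follows the same pattern but evaluates the real objective function through the atomicrex Python interface.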

A practical example

The algorithm outlined above can be varied and tailored in a number of ways. It is implemented for demonstration purposes in the sample_objective_function.py script in the subdirectory examples/monte_carlo_sampling (see below for a code listing). The potential sampled here is identical to the one already used in several other examples: [1], [2], [3], [4].

Objective function as a function of Monte Carlo trial step.

Objective function (blue) and temperature (orange) during a simulated annealing run. The temperature is reduced linearly from its initial value to close to zero during the course of the simulation. A log-scale is used for the x-axis in order to highlight the initial decrease in the objective function.

As illustrated in the figure above, the objective function drops considerably during the first 20,000 to 40,000 MC trial steps, after which it merely oscillates. One could thus terminate the simulated annealing procedure at this point and continue with a conventional local optimization approach, e.g., a gradient-based minimizer.
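A simple stopping heuristic for such a switch could compare windowed averages of \(\chi^2\) and terminate the annealing once the decrease between windows becomes negligible. This is not part of the script below; the following is merely one possible sketch, in which the window size, tolerance, and the synthetic \(\chi^2\) trace are made up:

```python
import numpy as np

def plateaued(chi_history, window=5000, rel_tol=1e-3):
    """Return True once the windowed mean of chi^2 has stopped
    decreasing by more than rel_tol between consecutive windows."""
    if len(chi_history) < 2 * window:
        return False
    prev = np.mean(chi_history[-2 * window:-window])
    last = np.mean(chi_history[-window:])
    return (prev - last) < rel_tol * prev

# Synthetic chi^2 trace: rapid initial drop, then noisy oscillation
# (qualitatively mimicking the figure above).
rng = np.random.default_rng(1)
trace = list(20000 * np.exp(-np.arange(40000) / 3000)
             + 15000 + 100 * rng.standard_normal(40000))
print(plateaued(trace))
```

In practice such a check would be evaluated inside the MC loop, e.g., at the same interval at which output is written.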

Parameter variation during simulated annealing.

Evolution of potential parameters during the simulated annealing run.

The figure above illustrates the variation of the four potential parameters during the simulated annealing procedure. These variations can be compared with the change in, e.g., the \(A\) parameter observed during the "conventional" (local) optimization described in this example. In the latter case the parameter \(A\) changes very little during the optimization (\(A_{initial} = 500\) vs \(A_{final} = 500.214\); compare this example). While the local optimization readily reaches its (rather tight) convergence criteria, the final objective function (\(\chi^2_{final} = 30535.6\)) is substantially higher than in the simulated annealing case considered here (\(\chi^2_{final} = 15621.1\)).

The observed behavior can be attributed to the very "rough" objective function landscape that arises even for this simple functional form. It is at least partly related to the very large ratio between the largest and smallest eigenvalues of the Hessian matrix \(\nabla^2_{\boldsymbol p}\,\chi^2\), which was discussed here and results from the combination of exponential and linear terms in the functional form.
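The effect of such ill-conditioning can be reproduced with a small toy calculation (not the actual atomicrex Hessian; the model and numerical values below are illustrative only): even a plain exponential least-squares model produces a huge ratio between the largest and smallest eigenvalues of the Gauss-Newton Hessian.

```python
import numpy as np

# Toy least-squares model y = A * exp(-lam * r), mimicking the exponential
# pair function used in this example; parameter values are illustrative.
r = np.linspace(1.0, 4.0, 50)
A, lam = 500.0, 2.37

# Jacobian of the residuals with respect to the parameters (A, lam)
J = np.column_stack([np.exp(-lam * r),            # d(residual)/dA
                     -A * r * np.exp(-lam * r)])  # d(residual)/dlam

H = 2 * J.T @ J   # Gauss-Newton approximation to the Hessian of chi^2
eigvals = np.linalg.eigvalsh(H)   # ascending order
print('eigenvalue ratio: {:.3g}'.format(eigvals[-1] / eigvals[0]))
```

The two Jacobian columns are nearly collinear (a change in \(A\) can be largely compensated by a change in \(\lambda\)), which is what drives the smallest eigenvalue toward zero.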

Location of files

examples/monte_carlo_sampling

Input files

  • sample_objective_function.py: script for sampling the objective function

    #!/usr/bin/env python3
    
    """
    This script enables sampling of the parameter space of a potential using
    Monte Carlo (MC) simulations. It can also be used to conduct parameter
    optimization via simulated annealing.
    """
    
    from __future__ import print_function
    import atomicrex
    import random
    import numpy as np
    import argparse
    
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument('temperature_initial',
                        type=float,
                        help='Initial temperature')
    defval = None
    parser.add_argument('-f', '--temperature_final',
                        default=defval,
                        type=float, metavar='T',
                        help='Final temperature'
                        ' [default: identical to initial temperature]')
    defval = 400000
    parser.add_argument('-n', dest='number_of_steps',
                        type=int, metavar='N',
                        default=defval,
                        help='Number of MC trial steps'
                        ' [default: {}]'.format(defval))
    defval = 0.2
    parser.add_argument('-t', dest='target_acceptance_probability',
                        type=float, metavar='phi',
                        default=defval,
                        help='Target acceptance probability for trial step'
                        ' optimization, see `<https://arxiv.org/abs/1211.4440>`_'
                        ' [default: {}]'.format(defval))
    defval = 300
    parser.add_argument('-w', dest='sampling_window',
                        type=int, metavar='N',
                        default=defval,
                        help='Number of MC trial steps over which acceptance'
                        ' probability is sampled [default: {}]'.format(defval))
    defval = 100
    parser.add_argument('-o', dest='output_spacing',
                        type=int, metavar='N',
                        default=defval,
                        help='Number of MC trial steps between output'
                        ' [default: {}]'.format(defval))
    defval = 15611
    parser.add_argument('-s', dest='seed',
                        type=int, metavar='N',
                        default=defval,
                        help='Seed for random number generator'
                        ' [default: {}]'.format(defval))
    args = parser.parse_args()
    
    if args.temperature_final is None:
        args.temperature_final = args.temperature_initial
    
    print('Arguments:')
    for arg in args.__dict__:
        print(' {:32}: {}'.format(arg, args.__dict__[arg]))
    print('')
    
    if args.temperature_final <= 0.0:
        print('Final temperature must be positive.')
        exit(1)
    
    # set up job
    print('Optimize parameters (without "temperature")')
    job = atomicrex.Job()
    job.parse_input_file('main.xml')
    job.prepare_fitting()
    print()
    
    # output initial parameters
    print('Initial residual: {}'.format(job.calculate_residual()))
    print('Initial parameters:')
    print('%-14s' % 'parameters:')
    for p in job.get_potential_parameters():
        print(' {:10}'.format(p), end='')
    print('')
    print('{:14}'.format('1st derivative:'))
    grad = job.calculate_gradient()
    for g in grad:
        print('{:10}'.format(g), end='')
    print('')
    print('{:14}'.format('2nd derivative:'))
    hessian = job.calculate_hessian()
    for row in hessian:
        for d in row:
            print('{:10}'.format(d), end='')
        print('')
    print()
    print()
    
    # output properties
    for name, structure in sorted(job.structures.items()):
        print()
        print(' Structure: {}'.format(name))
        structure.compute_properties()
        structure.print_properties()
    print()
    
    # set up Monte Carlo
    print('trial step lengths:', end='')
    trial_step_lengths = 1.0 / np.diag(hessian)
    for s in trial_step_lengths:
        print(' {:12}'.format(s), end='')
    print('')
    random.seed(args.seed)
    
    # run Monte Carlo
    print('Monte Carlo sampling of parameter space:')
    params = job.get_potential_parameters()
    chi = job.calculate_residual()
    number_of_rejected_trial_steps = 0
    events = []
    for p in params:
        events.append([])
    temperature_delta = args.temperature_final - args.temperature_initial
    temperature_delta /= args.number_of_steps
    for it in range(args.number_of_steps + 1):
    
        temperature = args.temperature_initial + temperature_delta * it
    
        # trial step
        ip = random.randrange(len(params))
        delta = trial_step_lengths[ip] * random.uniform(-1, 1)
        params[ip] += delta
        chi_new = job.calculate_residual(params)
        events[ip].append(1)
        if len(events[ip]) > args.sampling_window:
            events[ip].pop(0)
        if chi_new < chi:
            chi = chi_new
            accprob = 1.0
        else:
            accprob = np.exp((chi - chi_new) / temperature)
            if accprob > random.uniform(0, 1):
                chi = chi_new
            else:
                number_of_rejected_trial_steps += 1
                params[ip] -= delta
                events[ip][-1] = 0
    
        # output and basic analysis
        if it % args.output_spacing != 0:
            continue
    
        print('mc', end='')
        print(' {:8}'.format(it), end='')
        print(' {:12.8}'.format(temperature), end='')
        print(' {:12.8g}'.format(chi), end='')
    
        print('  ', end='')
        for p in params:
            print(' {:12.8g}'.format(p), end='')
    
        print('  ', end='')
        for k, e in enumerate(events):
            try:
                prob = float(np.sum(e)) / len(e)
            except ZeroDivisionError:
                prob = 0.0
            print(' {:9.5}'.format(prob), end='')
            print(' ( {:9.5})'.format(trial_step_lengths[k]), end='')
    
            # adapt trial step lengths
            if k == ip:
                if prob > 0:
                    trial_step_lengths[ip] *= prob
                    trial_step_lengths[ip] /= args.target_acceptance_probability
                else:
                    trial_step_lengths[ip] *= 0.9
    
        s = job.structures['FCC']
        s.compute_properties()
        print(' {:9.3}'.format(s.properties['atomic-energy'].computed_value),
              end='')
        print(' {:7.3}'.format(s.properties['lattice-parameter'].computed_value),
              end='')
        print(' {:9.2}'.format(s.properties['C11'].computed_value), end='')
        print(' {:9.2}'.format(s.properties['C12'].computed_value), end='')
        print(' {:9.2}'.format(s.properties['C44'].computed_value), end='')
    
        print('')
    
  • main.xml: main input file

    <?xml version="1.0" encoding="iso-8859-1"?>
    <job>
      <name>EAM potential</name>
      
      <verbosity>medium</verbosity>
    
      <fitting>
    <BFGS conv-threshold="1e-10" max-iter="100" />
      </fitting>
    
      <atom-types>
        <species>Al</species>
      </atom-types>
    
      <potentials>
        <xi:include href="potential.xml"
		xmlns:xi="http://www.w3.org/2001/XInclude" />
      </potentials>
      
      <structures>
        <xi:include href="structures.xml"
		xmlns:xi="http://www.w3.org/2001/XInclude" />
      </structures>
    
    </job>
    
  • potential.xml: initial parameter set (included in main input file via XML Inclusions)

    <eam id='Al' species-a='*' species-b='*'>
    
      <functions>
    
        <user-function id='V'>
          <input-var>r</input-var>
          <expression>
    	A * exp(-lambda * r)
          </expression>
          <derivative>
    	-lambda * A * exp(-lambda * r)
          </derivative>
          <param name='A'>500.2</param>
          <param name='lambda'>2.37</param>
          <fit-dof>
    	<A min='10.0'/>
    	<lambda min='0.001'/>
          </fit-dof>
          <screening>
    	<user-function id='rho_screening'>
    	  <cutoff>6.5</cutoff>
    	  <input-var>r</input-var>
    	  <expression>
    	    1 - 1/(1 + ((r - cutoff) / h)^4)
    	  </expression>
    	  <derivative>
    	    4 * h^4 * (r - cutoff)^3 / ((h^4 + (r - cutoff)^4)^2)
    	  </derivative>
    	  <param name='h'>3</param>
    	</user-function>
          </screening>
        </user-function>
    
        <user-function id='rho'>
          <input-var>r</input-var>
          <expression>
    	exp(-twomu * r)
          </expression>
          <derivative>
    	-twomu * exp(-twomu * r)
          </derivative>
          <param name='twomu'>1.38</param>
          <fit-dof>
    	<twomu min='0.02'/>
          </fit-dof>
          <screening>
    	<user-function id='rho_screening'>
    	  <cutoff>6.5</cutoff>
    	  <input-var>r</input-var>
    	  <expression>
    	    1 - 1/(1 + ((r - cutoff) / h)^4)
    	  </expression>
    	  <derivative>
    	    4 * h^4 * (r - cutoff)^3 / ((h^4 + (r - cutoff)^4)^2)
    	  </derivative>
    	  <param name='h'>3</param>
    	</user-function>
          </screening>
        </user-function>
    
        <user-function id='F'>
          <input-var>rho</input-var>
          <expression>
    	-D * sqrt(rho)
          </expression>
          <derivative>
            -D / (2 * sqrt(rho))
          </derivative>
          <param name='D'>15</param>
          <fit-dof>
    	<D min='0.1'/>
          </fit-dof>
        </user-function>
        
      </functions>
      
      <mapping>
        <pair-interaction species-a='*' species-b='*' function='V' />
        <electron-density species-a='*' species-b='*' function='rho' />
        <embedding-energy species='*' function='F' />
      </mapping>
      
    </eam>
    
  • structures.xml: input structures (included in main input file via XML Inclusions)

    <group>
    
      <fcc-lattice id='FCC' relative-weight='3'>
        <atom-type>Al</atom-type>
        <lattice-parameter>4.0</lattice-parameter>
        <properties>
          <lattice-parameter fit='true' target='4.032' relative-weight='3' min='3.0' max='5.0'/>
          <atomic-energy fit='true' target='-3.36' relative-weight='3'/>
          <bulk-modulus fit='false' target='80.9' relative-weight='1'/>
          <C11/>
          <C12/>
          <C44/>
        </properties>
      </fcc-lattice>
      
      <user-structure id='structure_1'>
        <poscar-file>POSCAR_res_POSCAR_0.9.traj_forces</poscar-file>
        <properties>
          <atomic-forces fit='true' output-all='true'/>
        </properties>
      </user-structure>
      
      <user-structure id='structure_2'>
        <poscar-file>POSCAR_res_POSCAR_1.0.traj_forces</poscar-file>
        <properties>
          <atomic-forces fit='true' output-all='true'/>
        </properties>
      </user-structure>
      
      <user-structure id='structure_3'>
        <poscar-file>POSCAR_res_POSCAR_1.1.traj_forces</poscar-file>
        <properties>
          <atomic-forces fit='true' output-all='true'/>
        </properties>
      </user-structure>
      
    </group>