Optimization via simulated annealing¶
Local vs global optimization¶
As discussed to some extent in this example the objective function \(\chi^2\) corresponds to a highly dimensional space with many local minima and an extremely large variation in gradients and curvature. This complexity poses a considerable challenge for numerical optimization. Local minimizers are designed to find the closest minimum and thus rely on a good starting point. Global minimizer minimizers attempt to alleviate this problem by combining a more coarse grained sampling of the parameter space with local minimization (using low convergence targets for the later). Both local and global minimizers are available in atomicrex via the NLopt library. It is also possible to employ the Python interface to hook up atomicrex with suitable Python libraries (e.g., SciPy) or to implement custom optimization strategies. An example for the latter approach is discussed in the following.
Basic algorithm for simulation annealing¶
Simulated annealing is a particular optimization strategy that can be very effective for complex and/or multi-dimensional parameter spaces. The basic algorithm is in fact very simple and involves only evaluations of the objective function itself (but not derivatives thereof). Specifically it consists of the following basic steps:
Set up an atomicrex fitjob.
Initialize counters for the number of trial steps, the acceptance rate for different types of steps etc.
Randomly select one of the parameters \(i\)
Generate a trial parameter value \(p_i^{trial} = p_i^{previous} + \delta_i R\) where \(R\) is a uniformly distributed real number \(R \in [-1,1]\) and \(\delta_i\) is the trial step size.
Evaluate the objective function using the new parameter vector \(\boldsymbol{p}^{trial}\) yielding \(\chi^2_{trial}\).
Apply the Metropolis criterion:
Accept if \(\Delta\chi^2 = \chi^2_{trial} - \chi^2_{previous} < 0\).
Else generate a uniformly distributed random number \(\phi \in [0,1)\). Accept if \(\phi < \exp(-\Delta\chi^2 / \Theta)\); Reject otherwise. Here, \(\Theta\) is the current temperature of the ensemble being sampled.
Update counters and collect averages.
In periodic intervals, rescale the trial step sizes to yield the targeted acceptance rate. A reasonable target value is usually about 0.2 (see [Fre12] for more information.)
Return to to step #3.
A practical example¶
The algorithm outlined above can be varied and tailored in a number of ways. It is implemented for demonstration in the sample_objective_function.py script in the subdirectory examples/monte_carlo_sampling (see below for a code listing). The potential sampled here is identical to the one used already in several other examples: [1], [2], [3], [4].
As illustrated in the figure above the objective function drops considerably during the first 20,000 to 40,000 MC trial steps, after which it merely oscillates. In this case, one could thus terminate the simulated annealing procedure at this point and continue with a conventional local optimization approach, e.g., a gradient based minimizer.
The figure above illustrates the variation of the four potential parameters during the simulated annealing procedure. These variations can be compared with the change in, e.g., the \(A\) parameter that is observed during the “conventional” (local) optimization described in this example. In the latter case the parameter \(A\) changes very little during the optimization (e.g., \(A_{initial} = 500\) vs \(A_{final} = 500.214\), compare this example) compared to the simulated annealing case. While the optimization procedure readily reaches the (rather tight) convergence criteria, the final objective function (\(\chi^2_{final}=30535.6\)) is substantially higher than in the present case (\(\chi^2_{final} = 15621.1\)).
The observed behavior can be attributed to the very “rough” landscape that applies already for this simple functional form. It is at least partly related to the very large ratio between the largest and smallest eigenvalue of the Hessian matrix \(\nabla^2_{\boldsymbol p}\,\chi^2\), which was discussed here and is related to the combination of exponential and linear terms in the functional form.
Location of files¶
examples/monte_carlo_sampling
Input files¶
sample_objective_function.py: script for sampling the objective function
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188
#!/usr/bin/env python3 """ This script enable sampling of the parameter space of a potential using Monte Carlo (MC) simulations. It can also be used to conduct parameter optimization via simulated annealing. """ from __future__ import print_function import atomicrex import random import numpy as np import argparse parser = argparse.ArgumentParser(description=__doc__) parser.add_argument('temperature_initial', type=float, help='Initial temperature') defval = None parser.add_argument('-f', '--temperature_final', default=defval, type=float, metavar='T', help='Final temperature' ' [default: identical to initial temperature]') defval = 400000 parser.add_argument('-n', dest='number_of_steps', type=int, metavar='N', default=defval, help='Number of MC trial steps' ' [default: {}]'.format(defval)) defval = 0.2 parser.add_argument('-t', dest='target_acceptance_probability', type=float, metavar='phi', default=defval, help='Target acceptance probability for trial step' ' optimization, see `<https://arxiv.org/abs/1211.4440>`_' ' [default: {}]'.format(defval)) defval = 300 parser.add_argument('-w', dest='sampling_window', type=int, metavar='N', default=defval, help='Number of MC trial steps over which acceptance' ' probability is sampled [default: {}]'.format(defval)) defval = 100 parser.add_argument('-o', dest='output_spacing', type=int, metavar='N', default=defval, help='Number of MC trial steps between output' ' [default: {}]'.format(defval)) defval = 15611 parser.add_argument('-s', dest='seed', type=int, metavar='N', default=defval, help='Seed for random number generator' ' [default: {}]'.format(defval)) args = parser.parse_args() if args.temperature_final is None: args.temperature_final = args.temperature_initial print('Arguments:') for arg in args.__dict__: print(' {:32}: {}'.format(arg, args.__dict__[arg])) print('') if args.temperature_final <= 0.0: print('Final temperature must be positive.') exit(1) # set up job print('Optimize parameters (without "temperature")') job = atomicrex.Job() job.parse_input_file('main.xml') job.prepare_fitting() print # output initial parameters print('Initial residual: {}'.format(job.calculate_residual())) print('Initial parameters:') print('%-14s' % 'parameters:') for p in job.get_potential_parameters(): print(' {:10}'.format(p), end='') print('') print('{:14}'.format('1st derivative:')) grad = job.calculate_gradient() for g in grad: print('{:10}'.format(g), end='') print('') print('{:14}'.format('2nd derivative:')) hessian = job.calculate_hessian() for row in hessian: for d in row: print('{:10}'.format(d), end='') print('') print print # output properties for name, structure in sorted(job.structures.items()): print print(' Structure: {}'.format(name)) structure.compute_properties() structure.print_properties() print # set up Monte Carlo print('trial step lengths:'), trial_step_lengths = 1.0 / np.diag(hessian) for s in trial_step_lengths: print(' {:12}'.format(s), end='') print('') random.seed(args.seed) # run Monte Carlo print('Monte Carlo sampling of parameter space:') params = job.get_potential_parameters() chi = job.calculate_residual() number_of_rejected_trial_steps = 0 events = [] for p in params: events.append([]) temperature_delta = args.temperature_final - args.temperature_initial temperature_delta /= args.number_of_steps for it in range(args.number_of_steps + 1): temperature = args.temperature_initial + temperature_delta * it # trial step ip = random.randrange(len(params)) delta = trial_step_lengths[ip] * random.uniform(-1, 1) params[ip] += delta chi_new = job.calculate_residual(params) events[ip].append(1) if len(events[ip]) > args.sampling_window: events[ip].pop(0) if chi_new < chi: chi = chi_new accprob = 1.0 else: accprob = np.exp((chi - chi_new) / temperature) if accprob > random.uniform(0, 1): chi = chi_new else: number_of_rejected_trial_steps += 1 params[ip] -= delta events[ip][-1] = 0 # output and basic analysis if it % args.output_spacing != 0: continue print('mc', end='') print(' {:8}'.format(it), end='') print(' {:12.8}'.format(temperature), end='') print(' {:12.8g}'.format(chi), end='') print(' ', end='') for p in params: print(' {:12.8g}'.format(p), end='') print(' ', end='') for k, e in enumerate(events): try: prob = float(np.sum(e)) / len(e) except: prob = 0.0 print(' {:9.5}'.format(prob), end='') print(' ( {:9.5})'.format(trial_step_lengths[k]), end='') # adapt trial step lengths if k == ip: if prob > 0: trial_step_lengths[ip] *= prob trial_step_lengths[ip] /= args.target_acceptance_probability else: trial_step_lengths[ip] *= 0.9 s = job.structures['FCC'] s.compute_properties() print(' {:9.3}'.format(s.properties['atomic-energy'].computed_value), end='') print(' {:7.3}'.format(s.properties['lattice-parameter'].computed_value), end='') print(' {:9.2}'.format(s.properties['C11'].computed_value), end='') print(' {:9.2}'.format(s.properties['C12'].computed_value), end='') print(' {:9.2}'.format(s.properties['C44'].computed_value), end='') print('')
main.xml: main input file
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
<?xml version="1.0" encoding="iso-8859-1"?> <job> <name>EAM potential</name> <verbosity>medium</verbosity> <fitting> <BFGS conv-threshold="1e-10" max-iter="00" /> </fitting> <atom-types> <species>Al</species> </atom-types> <potentials> <xi:include href="potential.xml" xmlns:xi="http://www.w3.org/2003/XInclude" /> </potentials> <structures> <xi:include href="structures.xml" xmlns:xi="http://www.w3.org/2003/XInclude" /> </structures> </job>
potential.xml: initial parameter set (included in main input file via XML Inclusions)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83
<eam id='Al' species-a='*' species-b='*'> <functions> <user-function id='V'> <input-var>r</input-var> <expression> A * exp(-lambda * r) </expression> <derivative> -lambda * A * exp(-lambda * r) </derivative> <param name='A'>500.2</param> <param name='lambda'>2.37</param> <fit-dof> <A min='10.0'/> <lambda min='0.001'/> </fit-dof> <screening> <user-function id='rho_screening'> <cutoff>6.5</cutoff> <input-var>r</input-var> <expression> 1 - 1/(1 + ((r - cutoff) / h)^4) </expression> <derivative> 4 * h^4 * (r - cutoff)^3 / ((h^4 + (r - cutoff)^4)^2) </derivative> <param name='h'>3</param> </user-function> </screening> </user-function> <user-function id='rho'> <input-var>r</input-var> <expression> exp(-twomu * r) </expression> <derivative> -twomu * exp(-twomu * r) </derivative> <param name='twomu'>1.38</param> <fit-dof> <twomu min='0.02'/> </fit-dof> <screening> <user-function id='rho_screening'> <cutoff>6.5</cutoff> <input-var>r</input-var> <expression> 1 - 1/(1 + ((r - cutoff) / h)^4) </expression> <derivative> 4 * h^4 * (r - cutoff)^3 / ((h^4 + (r - cutoff)^4)^2) </derivative> <param name='h'>3</param> </user-function> </screening> </user-function> <user-function id='F'> <input-var>rho</input-var> <expression> -D * sqrt(rho) </expression> <derivative> -D / (2 * sqrt(rho)) </derivative> <param name='D'>15</param> <fit-dof> <D min='0.1'/> </fit-dof> </user-function> </functions> <mapping> <pair-interaction species-a='*' species-b='*' function='V' /> <electron-density species-a='*' species-b='*' function='rho' /> <embedding-energy species='*' function='F' /> </mapping> </eam>
structures.xml: input structures (included in main input file via XML Inclusions)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
<group> <fcc-lattice id='FCC' relative-weight='3'> <atom-type>Al</atom-type> <lattice-parameter>4.0</lattice-parameter> <properties> <lattice-parameter fit='true' target='4.032' relative-weight='3' min='3.0' max='5.0'/> <atomic-energy fit='true' target='-3.36' relative-weight='3'/> <bulk-modulus fit='false' target='80.9' relative-weight='1'/> <C11/> <C12/> <C44/> </properties> </fcc-lattice> <user-structure id='structure_1'> <poscar-file>POSCAR_res_POSCAR_0.9.traj_forces</poscar-file> <properties> <atomic-forces fit='true' output-all='true'/> </properties> </user-structure> <user-structure id='structure_2'> <poscar-file>POSCAR_res_POSCAR_1.0.traj_forces</poscar-file> <properties> <atomic-forces fit='true' output-all='true'/> </properties> </user-structure> <user-structure id='structure_3'> <poscar-file>POSCAR_res_POSCAR_1.1.traj_forces</poscar-file> <properties> <atomic-forces fit='true' output-all='true'/> </properties> </user-structure> </group>