  • Title: Mathematica Tutorial: Unconstrained Optimization
  • Description: Instruction on using Mathematica to solve unconstrained optimization, nonlinear equations, and nonlinear fitting problems.
  • Author: Rob Knapp
  • Size: 19.75 MB
  • Format: PDF

Document preview

Wolfram Mathematica® Tutorial Collection

UNCONSTRAINED OPTIMIZATION

For use with Wolfram Mathematica® 7.0 and later.

For the latest updates and corrections to this manual:
visit reference.wolfram.com
For information on additional copies of this documentation:
visit the Customer Service website at www.wolfram.com/services/customerservice
or email Customer Service at info@wolfram.com
Comments on this manual are welcomed at:
comments@wolfram.com
Content authored by:
Rob Knapp

Printed in the United States of America.
15 14 13 12 11 10 9 8 7 6 5 4 3 2

©2008 Wolfram Research, Inc.
All rights reserved. No part of this document may be reproduced or transmitted, in any form or by any means,
electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the copyright
holder.
Wolfram Research is the holder of the copyright to the Wolfram Mathematica software system ("Software") described
in this document, including without limitation such aspects of the system as its code, structure, sequence,
organization, “look and feel,” programming language, and compilation of command names. Use of the Software unless
pursuant to the terms of a license granted by Wolfram Research or as otherwise authorized by law is an infringement
of the copyright.
Wolfram Research, Inc. and Wolfram Media, Inc. ("Wolfram") make no representations, express,
statutory, or implied, with respect to the Software (or any aspect thereof), including, without limitation,
any implied warranties of merchantability, interoperability, or fitness for a particular purpose, all of
which are expressly disclaimed. Wolfram does not warrant that the functions of the Software will meet
your requirements or that the operation of the Software will be uninterrupted or error free. As such,
Wolfram does not recommend the use of the software described in this document for applications in
which errors or omissions could threaten life, injury or significant loss.
Mathematica, MathLink, and MathSource are registered trademarks of Wolfram Research, Inc. J/Link, MathLM,
.NET/Link, and webMathematica are trademarks of Wolfram Research, Inc. Windows is a registered trademark of
Microsoft Corporation in the United States and other countries. Macintosh is a registered trademark of Apple
Computer, Inc. All other trademarks used herein are the property of their respective owners. Mathematica is not
associated with Mathematica Policy Research, Inc.

Contents
Introduction

Methods for Local Minimization
    Introduction
    Newton's Method
    Quasi-Newton Methods
    Gauss-Newton Methods
    Nonlinear Conjugate Gradient Methods
    Principal Axis Method

Methods for Solving Nonlinear Equations
    Introduction
    Newton's Method
    The Secant Method
    Brent's Method

Step Control
    Introduction
    Line Search Methods
    Trust Region Methods

Setting Up Optimization Problems in Mathematica
    Specifying Derivatives
    Variables and Starting Conditions
    Termination Conditions
    Symbolic Evaluation

UnconstrainedProblems Package
    Plotting Search Data
    Test Problems

References

Introduction to Unconstrained Optimization
Mathematica has a collection of commands that do unconstrained optimization (FindMinimum
and FindMaximum), solve nonlinear equations (FindRoot), and solve nonlinear fitting problems
(FindFit). All these functions work, in general, by doing a search, starting at some initial
values and taking steps that decrease (or, for FindMaximum, increase) an objective or merit
function.
The search process for FindMaximum is somewhat analogous to a climber trying to reach a
mountain peak in a thick fog; at any given point, all that climbers know is their position, how
steep the slope is, and the direction of the fall line. One approach is always to go uphill. As
long as climbers go uphill steeply enough, they will eventually reach a peak, though it may not
be the highest one. Similarly, in a search for a maximum, most methods are ascent methods,
where every step increases the height; they stop when they reach any peak, whether it is the
highest one or not.
The analogy with hill climbing can be reversed to consider descent methods for finding local
minima. For the most part, the optimization literature considers the problem of finding minima,
and since this applies to most of the Mathematica commands, this documentation follows that
convention from here on.
For example, the function x sin(x + 1) is not bounded from below, so it has no global minimum,
but it has an infinite number of local minima.
This loads a package that contains some utility functions.
In[1]:=

<< Optimization`UnconstrainedProblems`
This shows a plot of the function x Sin[x + 1].

In[2]:=

Plot[x Sin[x + 1], {x, -10, 10}]

Out[2]= [plot of x Sin[x + 1] over -10 ≤ x ≤ 10, showing several local minima]

This shows the steps taken by FindMinimum for the function x Sin[x + 1] starting at x = 0.
In[3]:=

FindMinimumPlot[x Sin[x + 1], {x, 0}]

Out[3]= {{-0.240125, {x -> -0.520269}}, {Steps -> 5, Function -> 6, Gradient -> 6},
    [plot of the search steps near the minimum at x ≈ -0.52]}
The FindMinimumPlot command is defined in the Optimization`UnconstrainedProblems`
package loaded automatically by this notebook. It runs FindMinimum, keeps track of the function
and gradient evaluations and steps taken during the search (using the EvaluationMonitor and
StepMonitor options), and shows them superimposed on a plot of the function. Steps are
indicated with blue lines, function evaluations are shown with green points, and gradient
evaluations are shown with red points. The minimum found is shown with a large black point.
From the plot, it is clear that FindMinimum has found a local minimum point.
This shows the steps taken by FindMinimum for the function x Sin[x + 1] starting at x = 2.
In[4]:=

FindMinimumPlot[x Sin[x + 1], {x, 2}]

Out[4]= {{-3.83922, {x -> 3.95976}}, {Steps -> 4, Function -> 9, Gradient -> 9},
    [plot of the search steps near the minimum at x ≈ 3.96]}
Starting at 2, FindMinimum heads to a different local minimum, at which the function value is
smaller than at the first minimum found.
From these two plots, you might come to the conclusion that if you start at a point where the
function is sloping downward, you will always head toward the next minimum in that direction.
However, this is not always the case; the steps FindMinimum takes are typically determined
using the value of the function and its derivatives, so if the derivative is quite small,
FindMinimum may think it has to go quite a long way to find a minimum point.


This shows the steps taken by FindMinimum for the function x Sin[x + 1] starting at x = 7.
In[5]:=

FindMinimumPlot[x Sin[x + 1], {x, 7}]

Out[5]= {{-41.4236, {x -> 41.4356}}, {Steps -> 3, Function -> 14, Gradient -> 14},
    [plot of the search steps, ending near the minimum at x ≈ 41.44]}

When starting at x = 7, which is near a local maximum, the first step is quite large, so
FindMinimum returns a completely different local minimum.
All these commands have "find" in their name because, in general, their design is to search to
find any point where the desired condition is satisfied. The point found may not be the only one
(in the case of roots) or even the best one (in the case of fits, minima, or maxima), or, as you
have seen, not even the closest one to the starting condition. In other words, the goal is to find
any point at which there is a root or a local maximum or minimum. In contrast, the function
NMinimize tries harder to find the global minimum for the function, but NMinimize is also
generally given constraints to bound the problem domain. However, there is a price to pay for
this generality: NMinimize has to do much more work and, in fact, may call one of the "Find"
functions to polish a result at the end of its process, so it generally takes much more time than
the "Find" functions.
In two dimensions, the minimization problem is more complicated because both a step direction
and step length need to be determined.
This shows the steps taken by FindMinimum to find a local minimum of the function
cos(x^2 - 3 y) + sin(x^2 + y^2) starting at the point {x, y} = {1, 1}.
In[6]:=

FindMinimumPlot[Cos[x^2 - 3 y] + Sin[x^2 + y^2], {{x, 1}, {y, 1}}]

Out[6]= {{-2., {x -> 1.37638, y -> 1.67868}}, {Steps -> 9, Function -> 13, Gradient -> 13},
    [contour plot of the function with the search steps superimposed]}

The FindMinimumPlot command for two dimensions is similar to the one-dimensional case, but
it shows the steps and evaluations superimposed on a contour plot of the function. In this
example, it is apparent that FindMinimum needed to change direction several times to get to
the local minimum. You may notice that the first step starts in the direction of steepest descent
(i.e., perpendicular to the contour or parallel to the gradient). Steepest descent is indeed a
possible strategy for local minimization, but it often does not converge quickly. In subsequent
steps in this example, you may notice that the search direction is not exactly perpendicular to
the contours. The search is using information from past steps to try to get information about
the curvature of the function, which typically gives it a better direction to go. Another strategy,
which usually converges faster, but can be more expensive, is to use the second derivative of
the function. This is usually referred to as "Newton's" method.
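The slow convergence of pure steepest descent noted above can be seen in a short sketch. This is Python rather than Wolfram Language, and the quadratic, step size, and iteration count are hypothetical choices made only to exhibit the behavior: steps follow the negative gradient, oscillating across the steep direction while creeping along the shallow one.

```python
# Illustrative sketch (not Wolfram code): fixed-step steepest descent
# on the poorly scaled quadratic f(x, y) = x^2 + 10 y^2, whose
# minimum is at (0, 0).

def grad(x, y):
    """Gradient of f(x, y) = x^2 + 10 y^2."""
    return 2.0 * x, 20.0 * y

def steepest_descent(x, y, step=0.09, iters=100):
    history = [(x, y)]
    for _ in range(iters):
        gx, gy = grad(x, y)
        x, y = x - step * gx, y - step * gy
        history.append((x, y))
    return history

path = steepest_descent(1.0, 1.0)
x_final, y_final = path[-1]
# With step = 0.09, y overshoots and changes sign each iteration
# (since 0.09 * 20 > 1), while x shrinks by only a factor of 0.82
# per step: many steps are needed despite always moving downhill.
```

Methods that use curvature information, like the Newton steps shown next, avoid this zigzagging.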
This shows the steps taken using Newton's method.
In[7]:=

FindMinimumPlot[Cos[x^2 - 3 y] + Sin[x^2 + y^2], {{x, 1}, {y, 1}}, Method -> "Newton"]

Out[7]= {{-2., {x -> 1.37638, y -> 1.67868}},
    {Steps -> 5, Function -> 6, Gradient -> 6, Hessian -> 6},
    [contour plot of the function with the search steps superimposed]}
In this example, it is clear that the extra information that "Newton's" method uses about the
curvature of the function makes a big difference in how many steps it takes to get to the minimum. Even though Newton's method takes fewer steps, it may take more total execution time
since the symbolic Hessian has to be computed once and then evaluated numerically at each
step.
Usually there are tradeoffs between the rate of convergence or total number of steps taken and
cost per step. Depending on the size of the problems you want to solve, you may want to pick a
particular method to best match that tradeoff for a particular problem. This documentation is
intended to help you understand those choices as well as some ways to get the best results
from the functions in Mathematica. For the most part, examples will be used to illustrate the
ideas, but a limited exposition on the mathematical theory behind the methods will be given so
that you can better understand how the examples work.


For the most part, local minimization methods for a function f are based on a quadratic model
q_k(p) = f(x_k) + ∇f(x_k)^T p + (1/2) p^T B_k p.   (1)

The subscript k refers to the kth iterative step. In Newton's method, the model is based on the
exact Hessian matrix, B_k = ∇²f(x_k), but other methods use approximations to ∇²f(x_k), which are
typically less expensive to compute. A trial step s_k is typically computed to be the minimizer of
the model, which satisfies the system of linear equations

B_k s_k = -∇f(x_k)
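As a hypothetical illustration of this step equation (in Python rather than Wolfram Language): for a single variable, B_k reduces to the scalar f''(x_k), so solving B_k s_k = -∇f(x_k) is a simple division. The test function below is an arbitrary choice with local minima at x = ±√2.

```python
# Sketch of the Newton iteration for minimizing f(x) = x^4/4 - x^2.
# f'(x) = x^3 - 2x and f''(x) = 3x^2 - 2 play the roles of the
# gradient and the (1-D) Hessian B_k.

def fprime(x):
    return x**3 - 2.0 * x       # gradient f'(x)

def fsecond(x):
    return 3.0 * x**2 - 2.0     # second derivative, the 1-D B_k

def newton_minimize(x, iters=10):
    for _ in range(iters):
        s = -fprime(x) / fsecond(x)   # solve B_k s_k = -f'(x_k)
        x = x + s                     # x_{k+1} = x_k + s_k
    return x

x_star = newton_minimize(2.0)   # converges rapidly to sqrt(2)
```

Starting from x = 2, which is close enough to the minimizer at √2, the iterates converge quadratically, as the text describes for a sufficiently good starting value.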
If f is sufficiently smooth and x_k is sufficiently close to a local minimum, then with B_k = ∇²f(x_k),
the sequence of steps x_{k+1} = x_k + s_k is guaranteed to converge to the local minimum. However, in
a typical search, the starting value is rarely close enough to give the desired convergence.
Furthermore, B_k is often an approximation to the actual Hessian and, at the beginning of a
search, the approximation is frequently quite inaccurate. Thus, it is necessary to provide
additional control of the step sequence to improve the chance and rate of convergence. There are
two frequently used methods for controlling the steps: line search and trust region methods.
In a "line search" method, for each trial step s_k found, a one-dimensional search is done along
the direction of s_k so that x_{k+1} = x_k + α_k s_k. You could choose α_k so that it minimizes f(x_{k+1})
in this direction, but this is excessive; with conditions that only require that f(x_{k+1}) decrease
sufficiently in value and slope, convergence can be proven for reasonable approximations B_k.
Mathematica uses a formulation of these conditions called the Wolfe conditions.
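A minimal backtracking sketch of this idea follows (Python, not Wolfram code). It enforces only the sufficient-decrease (Armijo) half of the Wolfe conditions; the full Wolfe conditions add a curvature test on the slope at the trial point. The constants c1 and the shrink factor are conventional textbook choices, not Mathematica's internal settings.

```python
# Backtracking line search: shrink alpha until the step gives a
# sufficient decrease relative to the directional derivative.

def backtracking_line_search(f, grad_f, x, direction, c1=1e-4, shrink=0.5):
    alpha = 1.0
    fx = f(x)
    slope = grad_f(x) * direction          # directional derivative (1-D case)
    while f(x + alpha * direction) > fx + c1 * alpha * slope:
        alpha *= shrink                    # trial step too long: back off
    return alpha

# Usage: one descent step for f(x) = x^2 starting at x = 3.
f = lambda x: x * x
grad_f = lambda x: 2.0 * x
x0 = 3.0
d = -grad_f(x0)                            # steepest-descent direction
alpha = backtracking_line_search(f, grad_f, x0, d)
x_new = x0 + alpha * d                     # x_{k+1} = x_k + alpha_k s_k
```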
In a "trust region" method, a radius Δ_k is maintained within which the quadratic model q_k(p) in
equation (1) is "trusted" to be reasonably representative of the function. Then, instead of
solving for the unconstrained minimum of (1), the trust region method tries to find the
constrained minimum of (1) with ||p|| ≤ Δ_k. If the x_k are sufficiently close to a minimum and the
model is good, then often the minimum lies within the region, and convergence is quite rapid.
However, near the start of a search, the minimum typically lies on the boundary, and there are a
number of techniques to find an approximate solution to the constrained problem. Once an
approximate solution is found, the actual reduction of the function value is compared to the
predicted reduction and, depending on how close the actual value is to the predicted one, an
adjustment is made to Δ_{k+1}.
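One common form of that radius adjustment can be sketched as follows (Python; the thresholds 0.25 and 0.75 and the scale factors are conventional textbook choices, not Mathematica's internal settings).

```python
# Trust-region radius update: rho is the ratio of actual to predicted
# reduction; step_hit_boundary indicates the step was limited by the
# current radius.

def update_radius(delta, rho, step_hit_boundary):
    if rho < 0.25:
        return delta / 4.0       # model predicted poorly: shrink the region
    if rho > 0.75 and step_hit_boundary:
        return 2.0 * delta       # model predicted well and the radius was binding: expand
    return delta                 # otherwise keep the radius unchanged

# Usage on hypothetical values:
shrunk = update_radius(1.0, 0.1, False)   # poor agreement -> smaller radius
grown = update_radius(1.0, 0.9, True)     # good agreement at the boundary -> larger radius
kept = update_radius(1.0, 0.5, True)      # middling agreement -> radius unchanged
```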


For minimization of a smooth univariate function, all that is necessary mathematically is to find
a point at which the derivative is zero and the second derivative is positive. In multiple
dimensions, this means that the gradient vanishes and the Hessian needs to be positive definite.
(If the Hessian is only positive semidefinite, the second-order test is inconclusive: the point
may be a non-strict minimizer, or not a minimizer at all.) As a numerical algorithm converges, it
is necessary to keep track of the convergence and make some judgment as to when a minimum has
been approached closely enough. This is based on the sequence of steps taken and the values of
the function, its gradient, and possibly its Hessian at these points. Usually, the Mathematica
Find… functions will issue a message if they cannot be fairly certain that this judgment is
correct. However, keep in mind that discontinuous functions or functions with rapid changes of
scale can fool any numerical algorithm.
When solving "nonlinear equations", many of the same issues arise as when finding a "local
minimum". In fact, by considering a so-called merit function, which is zero at the root of the
eq