The main function : rdiff()
¶
The differentiation function is called rdiff()
and is called with the following parameters:
rdiff( ex::Expr; outsym::Symbol; order::Int, init... )
Arguments¶
ex: | is a Julia Expression containing the code to derive |
---|---|
outsym: | (default = nothing) is the symbol of the variable within ex containing the expression output (the result whose derivatives are needed). This variable must evaluate to a Real . If not specified, outsym defaults to nothing which signals to rdiff that the last statement is the result of interest for derivation. |
order: | (default = 1) is an integer indicating the derivation order (1 for 1st order, etc.). Order 0 is allowed and will produce an expression that is a processed version of ex with some variables names rewritten and possibly some optimizations. |
init: | (multiple keyword arguments) is one or several symbol / DataType pairs used to indicate for which variable a derivative is needed and how they should be interpreted. By default the generated expression will yield the derivative for each variable given unless the variable is listed in the ignore argument. |
evalmod: | (default=Main) module where the expression is meant to be evaluated. External variables and functions should be evaluable in this module. |
debug: | (default=false) indicates if rdiff should dump the graph of the generating expression, instead of returning the expression itself. |
allorders: | (default=true) indicates whether to generate the code for all orders up to order (true) or only the last order. |
ignore: | (default=[]) do not differentiate against the listed variables, useful if you are not interested in having the derivative of one of several variables in init . |
Output¶
An expression which, when evaluated, will return a tuple containing the expression value and the derivative at first, second , etc.. order.
Usage¶
rdiff
takes an expression consisting of a subset of Julia statements ( assignments, getindex, setindex!, for loops, function calls ) and transforms it into a new expression whose evaluation will provide the derivatives at all orders between 0 and the order specified (unless allorders
is false).
The generated expression will attempt to remove all unneeded calculations (e.g. x + 0) and factorize repeated function calls as much as possible.
All the variables appearing in the init
argument are considered as the expression’s arguments and a derivative is calculated for it (and cross derivatives if order is >= 2), unless they are listed in the ``ignore`` argument. The other variables, if not defined by the expression, are expected to be top level variables in evalmod
. If they are not defined there an error will be thrown.
For orders >= 2 only a single variable, of type Real or Vector, is allowed. For orders 0 and 1 variables can be of type Real, Vector or Matrix and can be in an unlimited number:
julia> rdiff( :(x^3) , x=Float64) # first order
:(begin
(x^3,3 * x^2.0)
end)
julia> rdiff( :(x^3) , order=3, x=Float64) # orders up to 3
:(begin
(x^3,3 * x^2.0,2.0 * (x * 3),6.0)
end)
rdiff
runs several simplification heuristics on the generated code to remove neutral statements and factorize repeated calculations. For instance calculating the derivatives of sin(x)
for large orders will reduce to the calculations of sin(x)
and cos(x)
:
julia> rdiff( :(sin(x)) , order=10, x=Float64) # derivatives up to order 10
:(begin
_tmp1 = sin(x)
_tmp2 = cos(x)
_tmp3 = -_tmp1
_tmp4 = -_tmp2
_tmp5 = -_tmp3
(_tmp1,_tmp2,_tmp3,_tmp4,_tmp5,_tmp2,_tmp3,_tmp4,_tmp5,_tmp2,_tmp3)
end)
The expression produced can easily be turned into a function with the @eval
macro:
julia> res = rdiff( :(sin(x)) , order=10, x=Float64)
julia> @eval foo(x) = $res
julia> foo(2.)
(0.9092974268256817,-0.4161468365471424,-0.9092974268256817,0.4161468365471424,0.9092974268256817,-0.4161468365471424,-0.9092974268256817,0.4161468365471424,0.9092974268256817,-0.4161468365471424,-0.9092974268256817)
When a second derivative expression is needed, only a single derivation variable is allowed. If you are dealing with a function of several (scalar) variables you will have you aggregate them into a vector:
julia> ex = :( (1 - x[1])^2 + 100(x[2] - x[1]^2)^2 ) # the rosenbrock function
julia> res = rdiff(ex, x=Vector{Float64}, order=2)
:(begin
_tmp1 = 1
_tmp2 = 2
_tmp3 = 100.0
_tmp4 = _tmp1 - x[_tmp1]
_tmp5 = length(x)
_tmp6 = zeros(size(x))
_tmp7 = x[_tmp2] - x[_tmp1] ^ _tmp2
_tmp8 = zeros((_tmp5,_tmp5))
_tmp9 = _tmp2 * (_tmp7 * _tmp3)
_tmp10 = -_tmp9
_tmp6[_tmp1] = _tmp6[_tmp1] + (_tmp2 * (x[_tmp1] * _tmp10) + -(_tmp2 * _tmp4))
_tmp6[_tmp2] = _tmp6[_tmp2] + _tmp9
for _idx1 = _tmp1:_tmp5
_tmp11 = zeros(size(_tmp6))
_tmp12 = zeros(size(x))
_tmp11[_idx1] = _tmp11[_idx1] + 1.0
_tmp13 = _tmp11[_tmp2]
_tmp11[_tmp2] = 0.0
_tmp11[_tmp2] = _tmp11[_tmp2] + _tmp13
_tmp14 = _tmp2 * _tmp11[_tmp1]
_tmp15 = _tmp3 * (_tmp2 * (_tmp13 + -(x[_tmp1] * _tmp14)))
_tmp12[_tmp1] = _tmp12[_tmp1] + ((_tmp10 * _tmp14 + _tmp2 * (x[_tmp1] * -_tmp15)) + -(_tmp2 * -(_tmp11[_tmp1])))
_tmp12[_tmp2] = _tmp12[_tmp2] + _tmp15
_tmp8[(_idx1 - 1) * _tmp5 + 1:_idx1 * _tmp5] = _tmp12
end
(_tmp4 ^ _tmp2 + 100 * _tmp7 ^ _tmp2,_tmp6,_tmp8)
end)
julia> @eval foo(x) = $res
julia> foo([0.5, 2.])
(306.5,[-351.0,350.0],
2x2 Array{Float64,2}:
-498.0 -200.0
-200.0 200.0)
foo(x)
returns a tuple containing respectively the value of the expression at x
, the gradient (a 2-vector) and the hessian (a 2x2 matrix)
Limitations¶
The canonical implementation of
for
loops derivation in reverse accumulation requires the caching of the complete state of each iteration which makes the generated code complex and memory intensive. The current algorithm uses a simpler approach that limits the kind of loops that can be correctly derived : in short, loops should not have any kind of recursivity in them (the calculations of each iteration should not depend on the calculations of previous iterations):# will work for i in 1:n a = f(x[i]) b = a + g(y[i]) c[i] = b end # will (probably) not work for i in 1:n c[i] = f( c[i-1] ) end
However simple accumulations are an instance of recursive calculations that will work:
# will work
for i in 1:n
a += b[i] # new a value depends on previous a
end
for
loops are limited to a single index. If you have afor i,j in 1:10, 1:10
in your expression you will have to translate it to nested loops as a workaround- All variables should be type-stable (not change from a scalar to a vector for example).
- Only a limited set of Julia semantics are supported at this stage. Some frequently used statements such as comprehensions,
if else
,while
loops cannot be used in the expression. - Mutating functions cannot be used (with the exception of
setindex!
andsetfield!
).