Support exact Hessian of objective function J in NonLinMPC and MovingHorizonEstimator #193

Open
franckgaga opened this issue Apr 20, 2025 · 9 comments · May be fixed by #194
Comments

@franckgaga
Member

No description provided.

@franckgaga franckgaga changed the title Support exact Hessian of objective function Support exact Hessian of objective function in NonLinMPC and MovingHorizonEstimator Apr 20, 2025
@franckgaga franckgaga changed the title Support exact Hessian of objective function in NonLinMPC and MovingHorizonEstimator Support exact Hessian of objective function J in NonLinMPC and MovingHorizonEstimator Apr 30, 2025
@franckgaga
Member Author

franckgaga commented Apr 30, 2025

@gdalle What's the approach to troubleshoot overly conservative TracerSparsityDetector() results?

I've almost finished the implementation of exact objective function Hessians for NonLinMPC. I'm doing some tests on my side with this Hessian backend:

hess = AutoSparse(
        AutoForwardDiff(),
        sparsity_detector  = TracerSparsityDetector(), 
        coloring_algorithm = GreedyColoringAlgorithm()
    )

and looking at the sparsity pattern with:

prep_∇²J = prepare_hessian(Jfunc!, hess, Z̃_J, context_J...; strict)
display(sparsity_pattern(prep_∇²J))

always results in a fully dense pattern, e.g.:

65×65 SparseArrays.SparseMatrixCSC{Bool, Int64} with 4225 stored entries:
⎡⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⎤
⎢⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⎥
⎢⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⎥
⎢⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⎥
⎢⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⎥
⎢⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⎥
⎣⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⎦

I looked at the entries and many of them are zero. I'm fairly certain that many of them are globally zero (as is generally the case for the Hessian of an NLP) but not correctly detected by TracerSparsityDetector().

I also tried TracerLocalSparsityDetector(), and the detection works better:

65×65 SparseArrays.SparseMatrixCSC{Bool, Int64} with 185 stored entries:
⎡⠑⣤⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎤
⎢⠀⠉⠿⢇⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠘⢻⣶⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠿⣧⡄⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠉⢱⣶⣀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠘⠿⣧⡄⠀⎥
⎣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⢿⣷⎦

It also works well for the optimization in my quick closed-loop tests. But I'm not entirely confident in it: this tracer may be too optimistic, and the optimization may fail in some corner cases.

@franckgaga
Member Author

franckgaga commented Apr 30, 2025

Here's another hint that the global detector is "wrong". If I use this sparse symbolic backend:

adbackend = AutoSparse(AutoSymbolics())

the returned sparsity pattern is the same as the one returned by TracerLocalSparsityDetector():

65×65 SparseArrays.SparseMatrixCSC{Bool, Int64} with 185 stored entries:
⎡⠑⣤⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎤
⎢⠀⠉⠿⢇⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠘⢻⣶⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠿⣧⡄⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠉⢱⣶⣀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠘⠿⣧⡄⠀⎥
⎣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⢿⣷⎦

I assume that the built-in sparsity pattern detector of AutoSymbolics() is a global one.

@gdalle

gdalle commented Apr 30, 2025

Could you maybe provide

  1. An MWE using MPC?
  2. Ideally an MWE without MPC?

Essentially the question is whether this is unavoidable or not. Right now I can't tell you ^^

@franckgaga
Member Author

franckgaga commented Apr 30, 2025

I'm not able to produce a MWE without MPC.jl, unfortunately. 🫤

You can dev the hessian_objective_debug branch of ModelPredictiveControl.jl (to modify it as you wish), and see the results of:

using ModelPredictiveControl, ControlSystemsBase
using DifferentiationInterface, SparseConnectivityTracer, SparseMatrixColorings
import ForwardDiff

f(x,u,_,_) = 0.1*x + u
h(x,_,_)   = x
model = NonLinModel(f, h, 50.0, 1, 1, 1, solver=nothing)

adbackend = AutoSparse(
        AutoForwardDiff(),
        sparsity_detector  = TracerSparsityDetector(),  # or TracerLocalSparsityDetector(),
        coloring_algorithm = GreedyColoringAlgorithm()
    )
mpc = NonLinMPC(model; Hp=10, hessian=adbackend, transcription=MultipleShooting())
res = sim!(mpc, 1, x̂_0=zeros(2), x_0=zeros(1))

giving for TracerSparsityDetector():

23×23 SparseArrays.SparseMatrixCSC{Bool, Int64} with 529 stored entries:
⎡⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡇⎤
⎢⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡇⎥
⎢⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡇⎥
⎢⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡇⎥
⎢⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡇⎥
⎣⠿⠿⠿⠿⠿⠿⠿⠿⠿⠿⠿⠇⎦

and for TracerLocalSparsityDetector():

23×23 SparseArrays.SparseMatrixCSC{Bool, Int64} with 43 stored entries:
⎡⠑⣤⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎤
⎢⠀⠀⠛⣤⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠛⣤⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠛⣤⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠛⣤⠀⠀⎥
⎣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠛⠄⎦

@gdalle

gdalle commented Apr 30, 2025

My best guess is that you're hitting some of the overly conservative linear algebra overloads, but I'm not sure which one. Will investigate

@gdalle

gdalle commented May 1, 2025

OK, I figured it out and there are two layers of shady stuff going on.

The underlying issue is that SCT ignores values when handling global tracers, even the values of fixed, ordinary numbers. In a nutshell, f(x) = 0 * x and g(x) = 1 * x have the same sparsity pattern. This is something we should fix by special-casing * behavior, I have opened adrhill/SparseConnectivityTracer.jl#242 to keep track.
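The 0 * x behavior is easy to reproduce in isolation. A minimal sketch (toy function, not from MPC.jl; patterns follow the global/local semantics described above):

```julia
using SparseConnectivityTracer

# The 0.0 coefficient is an "accidental" zero: the global tracer ignores
# values, so the (1, 2) cross term survives in the Hessian pattern.
f(x) = x[1]^2 + 0.0 * x[1] * x[2]

hessian_sparsity(f, rand(2), TracerSparsityDetector())       # keeps the (1, 2) entry
hessian_sparsity(f, rand(2), TracerLocalSparsityDetector())  # prunes it, since 0.0 * anything is 0
```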

But this wouldn't be an issue at all if MPC handled structural sparsity better. I managed to make the SCT routine work in #198 by replacing some dense matrices with sparse ones, which turns "accidental" zeros into structural zeros. I am not sure this is the right implementation for your use case, but I definitely think you should pay more attention to this idea in general.

Many of the matrices you're working with are extremely sparse and structured, like block diagonals or, often, straight-up diagonals. Is there a reason why you don't use any of the options available in the ecosystem, like SparseArrays or BlockDiagonals? Sticking to dense storage seems to cause excessive memory consumption, and possibly slower linear-algebraic operations.
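A sketch of the idea with hypothetical matrices (not the actual MPC.jl internals): storing a block-diagonal layout with SparseArrays makes the off-diagonal zeros structural, so a tracer can exploit them.

```julia
using SparseArrays

A, B = rand(2, 2), rand(2, 2)

# Dense layout: the off-diagonal zeros are only "accidental" values.
M_dense = [A zeros(2, 2); zeros(2, 2) B]

# Sparse layout: the same zeros are structural (not stored at all).
M_sparse = blockdiag(sparse(A), sparse(B))
nnz(M_sparse)  # 8 stored entries, versus 16 values in the dense version
```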

@franckgaga
Member Author

Thanks a lot for your time and the PR, there are certainly great ideas in it.

I avoided "hard-coded" sparse matrices in MPC.jl to avoid losing performance on small matrices. It is well known that dense matrices are generally faster than sparse ones when the matrices are small. This is even more important for real-time applications like MPC. Indeed, a plant model that is well suited for real-time applications is a simple, low-order one. If an engineer has the choice between a crude, low-order plant model (thus with small matrices) and a high-fidelity, high-order one (thus with large matrices), the crude model is typically better suited for MPC. Remember that MPC incorporates feedback; it is thus more forgiving of approximate plant models than open-loop optimal control (trajectory optimization). That being said:

  1. If I can use LinearAlgebra.Diagonal, LinearAlgebra.Hermitian or any other structured matrix type available in the standard library, I use it. Did you notice any diagonal matrix that was not wrapped as a LinearAlgebra.Diagonal in the code?
  2. Yes, I use plain dense matrices for block-diagonal matrices in the package. Using BlockDiagonals.jl is a neat idea. I'm not able to load the docs (not sure why), but do you know if matrix operations like + and * are specialized on them?

@gdalle

gdalle commented May 1, 2025

I guess it's hard to create a framework that works equally well with small and large instances. But there are various ways to be small or large. Typically, regarding block-diagonal matrices, are we talking about the size of each block, or the number of blocks?

  1. Yes, for instance every diagm creates a dense matrix whereas Diagonal doesn't. Same when you do naive concatenation, e.g. to add the slack variable, you get a dense matrix by default.
  2. I would assume that basic operations are implemented, but I don't know how many or how well. In any case, sparse matrices are super optimized so the comparison between both tools may be interesting.
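A quick illustration of both points above, with toy sizes just to show the storage behavior:

```julia
using LinearAlgebra, SparseArrays

d = [1.0, 2.0, 3.0]
diagm(d)     # materializes a dense 3×3 Matrix{Float64} (9 stored values)
Diagonal(d)  # structured wrapper, stores only the 3 diagonal entries

# Naive concatenation densifies: appending a slack column falls back to Matrix.
[Diagonal(d) zeros(3)]            # dense 3×4 Matrix{Float64}
[sparse(Diagonal(d)) spzeros(3)]  # stays a SparseMatrixCSC
```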

Perhaps an interesting question to ask is whether these large matrices are even needed in the first place. I don't know what requires the block diagonal structure (I guess time indices?) but maybe there is a way to apply these matrices as lazy operators, without even materializing them. In other words, maybe the way we write down the math is not the way we should code?

Take everything I say with a grain of salt of course, I know nothing about optimal control. But I'd be happy to collaborate on research if new perspectives open from the software side

@gdalle

gdalle commented May 2, 2025

@franckgaga there's an SCT fix on the branch adrhill/SparseConnectivityTracer.jl#243 if you want to take it out for a spin
