-
Notifications
You must be signed in to change notification settings - Fork 1.5k
[Perf] Reduce allocations in basic circuit operations #2620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: staging
Are you sure you want to change the base?
Conversation
Note: we can probably remove some of the repeated code comments and instead refer to a single reference implementation that has them. |
Analysis regarding impact on pricing and deployment limits.For simplicity, we assume the number of variables equals the number of constraints, which is roughly the case for most deployed programs. And recall that compute is currently priced at 100 credits / second. Currently a This PR's benchmark implies 5-threaded runtime of (5x2x1'721'211)/1.15=14'967'052 (constraints+variables) per second. So:
|
Since I'm now switching to a different optimization, I'll post the final bits related to arithmetic circuit operations that I've found since the filing of this PR. This is the end result of these changes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice optimization!
Related to what Victor said, I was thinking if it is possible to remove code duplications here with a new helper function (without cloning)?
I see the bitxor operation 5 times:
let output = Boolean(
E::new_variable(Mode::Private, match self.eject_value() ^ other.eject_value() {
true => E::BaseField::one(),
false => E::BaseField::zero(),
})
.into(),
);
and also the bitand operation 5 times:
let output = Boolean(
E::new_variable(Mode::Private, match self.eject_value() & other.eject_value() {
true => E::BaseField::one(),
false => E::BaseField::zero(),
})
.into(),
);
I'll see what I can do to compress these 👍. |
Good idea! |
1dfb988
to
cd70ab9
Compare
I rebased and applied the first round of "compressions", which already greatly improved the diff:
I'm going to do another pass in case I can find any substitutions that can be made without a drop in performance. Also, these new changes have actually resulted in a further small improvement:
|
I managed to squeeze the impls a bit further:
The perf improvement was reduced a little bit, but it's still a win:
It might still be possible to reduce the amount of duplication a bit more. |
Thank you. The |
Done. I was worried that it would wipe more of the performance win, but actually the impact was pretty benign:
|
2a32e60
to
c325d9d
Compare
c325d9d
to
15e3b79
Compare
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
…d Variables Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
…nation Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some suggestions on this PR ages ago but never finished the review... Hopefully we can merge this soon?
}; | ||
use smallvec::SmallVec; | ||
use std::borrow::Cow; | ||
|
||
// Before high level program operations are converted into constraints, they are first tracked as linear combinations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we change this to a doc comment (///
instead of //
) and use Markdown formatting for the list?
false => { | ||
match output.terms.binary_search_by(|(v, _)| v.cmp(variable)) { | ||
match output.terms.binary_search_by(|(v, _)| v.cmp(&variable)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be slightly nicer to do
output.terms.binary_search_by_key(&variable, |(v, _)| v)
@@ -249,7 +227,7 @@ impl<F: PrimeField> Add<Variable<F>> for LinearCombination<F> { | |||
|
|||
#[allow(clippy::op_ref)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can remove this decorator, because the &other
part is gone. (See here what the lint does)
} | ||
} | ||
|
||
impl<F: PrimeField> Add<LinearCombination<F>> for LinearCombination<F> { | ||
type Output = Self; | ||
|
||
fn add(self, other: Self) -> Self::Output { | ||
self + &other | ||
if self.terms.len() > other.terms.len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the idea here that we expand the larger terms to reduce heap allocations? Might be good to add a comment.
self + &other | ||
if self.terms.len() > other.terms.len() { | ||
let mut output = self; | ||
output += Cow::Owned(other); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is AddAssign more efficient than Add? Again, it might be good to document that with a comment.
These performance improvements were found via profiling of the
large_program
, which is included in the 1st commit. It is clear that a lot of the time is being spent on allocations (which go into millions), many of which are linked to some of the basic circuit operations.This PR is not the end of this endeavor, but I decided to isolate these changes in order for the diff to not spiral out of control. While it is already considerable, note that almost 20% is the benchmark, and most of the new additions are almost exact duplicates of the existing code, but with unnecessary clone operations removed. The current code reimplements many operations using the "lowest common denominator", which reduces the amount of code, but results in performance penalties. I believe this amount of duplication is acceptable considering how impactful these operations are, and how unlikely they are to change in the future.
Based on the new
large_program
(which involves the synthesis of1_721_211
constraints) benchmark, these changes result in a boost to runtime performance:(updated results)
According to additional heap profiling, these changes reduce the total number of allocations involved by over 22%.
As suggested by @vicsn, I ran the
credits.sh
Canary script with these changes, and saw no change to the diff, suggesting that there is no impact to logic.