
Re-transmission of empty votes #150


Closed
yacovm opened this issue Apr 24, 2025 · 4 comments

Comments


yacovm commented Apr 24, 2025

Currently, if the nodes suspect the leader is faulty, they send an empty vote and write the empty vote to the WAL to indicate they have timed out on that round.

However, the empty vote broadcast may not reach enough nodes, in which case an empty notarization cannot be assembled.

We should therefore try to re-broadcast the empty vote after a timeout, but without writing it again to the WAL.

dpc commented Apr 25, 2025

Please excuse my ignorance. I have never implemented a BFT algorithm myself, and I wonder: whenever the whitepaper (this or any other BFT one) says "broadcast X", doesn't it mean that delivery of X must keep being attempted until the receiver acknowledges it?

I was actually thinking about how best to model this. I guess it needs to be persisted in case of a crash? And it needs to continue even after the current round is gone on the local peer? But that would also require storing it in some external per-destination-peer buffer. And then... how do you trim that buffer so it doesn't grow unbounded?

The paper just says "broadcast X" and ignores the matter altogether. :D

I wonder if, for this reason, it wouldn't be better to reverse the communication in the protocol and simply have each side request the things it expects, assuming the RPC layer can do delayed responses. Instead of "broadcast vote", every peer that is missing a vote would just "request vote", and if the vote is not ready yet, the RPC would delay its response until it is. The latency is the same.

dpc commented Apr 25, 2025

So it all would go:

  • all peers request vote from all other peers
  • the leader responds with a proposal, all other peers with their votes

In parallel, peers would always be subscribed to finalization messages from all other peers.

If any peer is missing anything from the past (something that was already done), it would just be provided to them again.

This might even avoid storing any WAL at all. A peer just needs to persist all the votes it collected (most importantly its own, so it doesn't accidentally become faulty).

dpc commented Apr 25, 2025

The whole state machine also neatly becomes just a "who do I still need votes from". Yes. I like that very much. Someone should tell me why (if) I'm wrong. :D

@samliok samliok linked a pull request May 2, 2025 that will close this issue
samliok commented May 14, 2025

Closed with #159.

@samliok samliok closed this as completed May 14, 2025