Re-transmission of empty votes #150
Comments
Please excuse my ignorance. I have never implemented a BFT algorithm myself, and I wonder: whenever the whitepaper (this or any other BFT one) says "broadcast X", doesn't that mean delivery of X has to keep being attempted until the receiver acks it successfully?

I was thinking about how best to model this. I guess it needs to be persisted in case of a crash? And it needs to continue even after the local peer has moved past the current round? But that would also require storing it in some external per-destination-peer buffer. But then... how do you trim that buffer so it doesn't grow unbounded? The paper just says "broadcast X" and ignores the matter altogether. :D

I wonder if, for this reason, it wouldn't be better to reverse the communication in the protocol and simply have each side request the things it expects, assuming the RPC layer can do delayed responses. Instead of "broadcast vote", every peer that is missing a vote would just "request vote", and if the vote is not ready yet, the RPC call would simply wait until it is. The latency is the same.
So it all would go:
In parallel, peers would always be subscribed to finalization messages from all other peers. If a peer is missing anything from the past (anything that was already done), it would just be provided to them again. This might even avoid the need for a WAL. A peer just needs to persist all the votes it has collected (most importantly its own, so it doesn't accidentally become faulty).
The whole state machine also neatly reduces to "who do I still need votes from?". Yes, I like that very much. Someone should tell me why (if) I'm wrong. :D
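To make the pull-based idea above a bit more concrete, here is a minimal sketch. Everything in it is hypothetical and only illustrates the comment: `VoteTracker` is the "who do I still need votes from" state, and `VoteResponder` stands in for an RPC layer with delayed responses by parking requests until the local vote exists. Neither name comes from the project or the paper, and real networking, signatures, and persistence are left out.

```python
class VoteTracker:
    """Hypothetical per-round state: which peers do I still need a vote from?"""

    def __init__(self, peers, quorum):
        self.peers = set(peers)
        self.quorum = quorum
        self.votes = {}  # peer -> vote; this is what would be persisted

    def record(self, peer, vote):
        # Keep only the first vote from a known peer.
        if peer in self.peers:
            self.votes.setdefault(peer, vote)

    def still_needed(self):
        # Once a quorum is collected there is nothing left to request.
        if len(self.votes) >= self.quorum:
            return []
        return sorted(self.peers - set(self.votes))


class VoteResponder:
    """Hypothetical serving side: answers "request vote", possibly later."""

    def __init__(self):
        self.vote = None
        self.waiting = []  # requesters parked until the vote is ready

    def request(self, requester):
        # Answer immediately if the vote exists, otherwise delay the response.
        if self.vote is not None:
            return self.vote
        self.waiting.append(requester)
        return None

    def publish(self, vote):
        # The local vote is now ready: release every parked request.
        self.vote = vote
        ready, self.waiting = self.waiting, []
        return [(requester, vote) for requester in ready]
```

A peer that restarts would rebuild its `VoteTracker` from the persisted votes and simply re-issue requests to whatever `still_needed()` returns, which is the reason the comment suggests a WAL might not be required.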
Closed with #159.
Currently, if the nodes suspect the leader is faulty, they send an empty vote and write the empty vote to the WAL to indicate they have timed out on that round.
However, the empty vote broadcast may not reach enough nodes, in which case an empty notarization cannot be assembled.
We should therefore try to re-broadcast the empty vote after a timeout, but without writing it again to the WAL.
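The proposed behaviour could look roughly like the sketch below. It is an illustration under assumptions, not the project's implementation: `EmptyVoteSender`, its fields, and the tuple record format are all made up for the example, the WAL and the network are plain lists, and timer scheduling is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class EmptyVoteSender:
    """Hypothetical helper: persist an empty vote once, re-broadcast it many times."""

    wal: list = field(default_factory=list)      # stand-in for the write-ahead log
    outbox: list = field(default_factory=list)   # stand-in for the network broadcast
    timed_out: set = field(default_factory=set)  # rounds already recorded in the WAL
    done: set = field(default_factory=set)       # rounds with an empty notarization

    def on_timeout(self, round_no: int) -> bool:
        """Handle the first round timeout and every later re-broadcast timeout.

        Returns True if the caller should schedule another timeout.
        """
        if round_no in self.done:
            return False  # notarization assembled, stop re-broadcasting
        if round_no not in self.timed_out:
            # Only the first timeout for a round is written to the WAL.
            self.wal.append(("empty_vote", round_no))
            self.timed_out.add(round_no)
        # Every timeout, first or repeated, sends the empty vote again.
        self.outbox.append(("empty_vote", round_no))
        return True

    def on_empty_notarization(self, round_no: int) -> None:
        # Enough empty votes were collected for this round.
        self.done.add(round_no)
```

Calling `on_timeout` repeatedly for the same round grows `outbox` but leaves exactly one WAL record for that round, which is the property the issue asks for.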