MIT DCC Lab 3, Part A
Implementation of leader election
Part A is to implement leader election. Some skeleton code is provided, along with a basic Raft struct. The goal is to get these skeleton nodes to agree on a leader, even across a faulty network and crashing nodes.
// A Go object implementing a single Raft peer.
type Raft struct {
mu sync.Mutex // Lock to protect shared access to this peer's state
peers []*labrpc.ClientEnd // RPC end points of all peers
persister *Persister // Object to hold this peer's persisted state
me int // this peer's index into peers[]
dead int32 // set by Kill()
// Your data here (3A, 3B, 3C).
// Look at the paper's Figure 2 for a description of what
// state a Raft server must maintain.
}
// return currentTerm and whether this server
// believes it is the leader.
func (rf *Raft) GetState() (int, bool) {
var term int
var isleader bool
// Your code here (3A).
return term, isleader
}
// example RequestVote RPC arguments structure.
// field names must start with capital letters!
type RequestVoteArgs struct {
// Your data here (3A, 3B).
}
// example RequestVote RPC reply structure.
// field names must start with capital letters!
type RequestVoteReply struct {
// Your data here (3A).
}
// example RequestVote RPC handler.
func (rf *Raft) RequestVote(args *RequestVoteArgs, reply *RequestVoteReply) {
// Your code here (3A, 3B).
}
func (rf *Raft) ticker() {
for rf.killed() == false {
// Your code here (3A)
// Check if a leader election should be started.
// pause for a random amount of time between 50 and 350
// milliseconds.
ms := 50 + (rand.Int63() % 300)
time.Sleep(time.Duration(ms) * time.Millisecond)
}
}
Step 1: Getting The Data In Order
When you’re unsure where to start, it’s always a good idea to clarify what problem you’re trying to solve, and then what the unknowns are. (This is from G. Polya’s “How To Solve It”.) In this case, the question is what procedures we need to write and what data they’ll be operating on.
There are five core bits of logic defined in the Raft paper:
- The election timer
- The heartbeat
- RequestVote
- AppendEntries
- Terms
Procedure 1 is a slightly randomized timer, slower than the heartbeat. If the timer goes off, we become a candidate and initiate an election. An election here means requesting votes from all peers; if we receive a majority for this term, we become the leader.
Procedure 2 is a ticker that regularly sends out AppendEntries requests. These requests reset our peers’ election timers, letting us maintain leader status as long as we can successfully talk to a majority of our peers.
Procedure 3 defines whether we give our vote to a peer. We do so if their log is at least as up to date as ours and we haven’t voted for anyone in the term they’re requesting a vote for. For example, if we’re on term 4, they request a vote for term 5, and we haven’t voted yet, we’ll give it to them. Anyone else who then tries to become leader for term 5 won’t get our vote.
Procedure 4 is simple for now, since we have no actual log logic. Whenever we receive an AppendEntries RPC with a term greater than or equal to our current term, we just reset our election timer.
Terms are a more general idea: they’re how nodes know who is behind and who is ahead. All RPC calls need to come with a term attached. If a node ever receives a term higher than its current term, it can be sure it has fallen behind and should become a follower. If it receives an RPC with a lower term, that RPC is outdated and can be ignored.
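This “step down when you see a newer term” rule gets used by every RPC handler below, so it’s worth a helper. becomeFollower is called throughout the later code but its body never appears; here’s a minimal sketch, using the state fields defined in the next step (the exact body is an assumption):

// Demote to follower and adopt the newer term.
// Caller must hold rf.mu.
func (rf *Raft) becomeFollower(term int) {
	rf.nodeType = Follower
	rf.currentTerm = term
	rf.votedFor = -1      // our vote is available again in the new term
	rf.collectedVotes = 0 // abandon any in-flight candidacy
}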
Writing it all out, the state we need to add to the skeleton is easy to enumerate:
Each node needs to keep an election timer, its node type (follower, candidate, or leader), its current term, how many votes it has received, and who it has voted for this term (if anyone).
When asking for a vote, the other node needs our term and how up to date our log is, so it can tell whether we’re worthy of leader status, plus who we are so it can track who it voted for. In return, it answers with its term (as all RPCs do) and whether it grants us its vote.
type NodeType uint8
const (
Follower NodeType = iota
Candidate
Leader
)
// A Go object implementing a single Raft peer.
type Raft struct {
// New state for 3A (the skeleton fields, plus placeholders like commitIndex used below, are omitted for brevity)
nodeType NodeType
electionTimer *time.Timer
currentTerm int
votedFor int
collectedVotes int
}
type RequestVoteArgs struct {
Term int
CandidateId int
LastLogIndex int
LastLogTerm int
}
type RequestVoteReply struct {
Term int
VoteGranted bool
}
Step 2: Vote Logic
The next clearest step is to write the RequestVote RPC. Since we have the Args and Reply structs ready to go, we can just focus on the logic. Extrapolating a bit from the paper, I wrote this little function. There is likely lots of room for refactoring, but that comes after the tests pass.
func (rf *Raft) RequestVote(args *RequestVoteArgs, reply *RequestVoteReply) {
rf.mu.Lock()
defer rf.mu.Unlock()
reply.Term = rf.currentTerm
// If they're behind, reject. Not qualified!
if args.Term < rf.currentTerm {
reply.VoteGranted = false
return
}
// If we're behind, we must be out of sync. Become a follower.
if args.Term > rf.currentTerm {
rf.becomeFollower(args.Term)
reply.Term = rf.currentTerm
}
// If we've already voted this term or are a leader, reject
if (rf.votedFor != -1 && rf.votedFor != args.CandidateId) || rf.nodeType == Leader {
reply.VoteGranted = false
return
}
// We only give them our vote if we haven't voted this term *and* they're at least as caught up as us.
// Since logs never get appended to, this is just always true for now.
if rf.UpToDate(args.LastLogTerm, args.LastLogIndex) {
rf.votedFor = args.CandidateId
reply.VoteGranted = true
rf.resetElectionTimer()
return
}
reply.VoteGranted = false
}
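UpToDate implements the paper’s §5.4.1 “at least as up to date” comparison. A sketch, assuming the lastLogIndex/lastLogTerm helpers that show up later (with an empty log in 3A it always returns true):

// Is the candidate's log at least as up to date as ours?
func (rf *Raft) UpToDate(lastLogTerm, lastLogIndex int) bool {
	if lastLogTerm != rf.lastLogTerm() {
		return lastLogTerm > rf.lastLogTerm() // later term wins
	}
	return lastLogIndex >= rf.lastLogIndex() // same term: longer log wins
}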
Step 3: Voting
Now that we have a way for nodes to ask for permission to become a leader, nodes need to actually participate in democracy. For the sake of simplicity, I just use two goroutines with for loops.
func (rf *Raft) startElectionInitiationTimer() {
for !rf.killed() {
select {
case <-rf.electionTimer.C:
rf.mu.Lock()
isLeader := rf.nodeType == Leader
rf.mu.Unlock()
if isLeader {
rf.pauseElectionTimer()
continue
}
rf.becomeCandidate()
rf.requestVotes()
rf.resetElectionTimer()
}
}
}
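becomeCandidate never appears in full either; a minimal sketch consistent with how it’s called (it takes the lock itself, since the timer loop calls it without holding rf.mu):

// Start a new election: bump the term and vote for ourselves.
func (rf *Raft) becomeCandidate() {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	rf.nodeType = Candidate
	rf.currentTerm++
	rf.votedFor = rf.me
	rf.collectedVotes = 1 // our own vote counts
}

The second goroutine handles the heartbeat: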
func (rf *Raft) startHeartbeat() {
for !rf.killed() {
time.Sleep(heartrate)
rf.mu.Lock()
isLeader := rf.nodeType == Leader
if isLeader {
rf.sendHeartbeats()
}
rf.mu.Unlock()
}
}
func (rf *Raft) startLoops() {
go rf.startElectionInitiationTimer()
go rf.startHeartbeat()
// Starts the timer for the first time
rf.resetElectionTimer()
}
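All of this is wired up in the lab’s Make constructor. A sketch of the initialization, assuming the skeleton’s signature (the heartrate constant is the 100ms heartbeat mentioned later; the exact body is an assumption):

const heartrate = 100 * time.Millisecond

func Make(peers []*labrpc.ClientEnd, me int, persister *Persister, applyCh chan ApplyMsg) *Raft {
	rf := &Raft{
		peers:         peers,
		persister:     persister,
		me:            me,
		nodeType:      Follower,
		votedFor:      -1, // -1 means "haven't voted this term"
		electionTimer: time.NewTimer(randElectionTimeout()),
	}
	rf.startLoops()
	return rf
}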
Some basic helper functions are used to make it easy to work with the election timer.
func randElectionTimeout() time.Duration {
return time.Duration(150+rand.Int63()%300) * time.Millisecond
}
func (rf *Raft) pauseElectionTimer() {
// Stop the timer. If it returns false, that means it already fired, so clear the channel.
if !rf.electionTimer.Stop() {
select {
case <-rf.electionTimer.C:
default:
}
}
}
func (rf *Raft) resetElectionTimer() {
rf.pauseElectionTimer()
rf.electionTimer.Reset(randElectionTimeout())
}
Then, a bit of core logic to request votes and send heartbeats.
func (rf *Raft) requestVotes() {
// Snapshot our state under the lock before fanning out RPCs.
rf.mu.Lock()
args := &RequestVoteArgs{
Term: rf.currentTerm,
CandidateId: rf.me,
LastLogIndex: rf.lastLogIndex(),
LastLogTerm: rf.lastLogTerm(),
}
rf.mu.Unlock()
for i := range rf.peers {
if i == rf.me {
continue
}
go func(peer int) {
reply := &RequestVoteReply{}
ok := rf.sendRequestVote(peer, args, reply)
rf.mu.Lock()
defer rf.mu.Unlock()
if args.Term != rf.currentTerm {
return
}
if ok && reply.Term > rf.currentTerm {
rf.becomeFollower(reply.Term)
return
}
if ok && reply.VoteGranted {
rf.collectedVotes++
// Promote at most once; extra votes after winning shouldn't re-trigger this.
if rf.nodeType == Candidate && rf.collectedVotes >= len(rf.peers)/2+1 {
rf.nodeType = Leader
rf.sendHeartbeats()
}
}
}(i)
}
}
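Two more helpers used above never made it into the post. sendRequestVote is the skeleton’s thin wrapper over labrpc, and the lastLog* helpers are trivial stubs until 3B adds a real log; sketches of both:

func (rf *Raft) sendRequestVote(server int, args *RequestVoteArgs, reply *RequestVoteReply) bool {
	return rf.peers[server].Call("Raft.RequestVote", args, reply)
}

// With no log yet, "last log" is just the zero value. 3B makes these real.
func (rf *Raft) lastLogIndex() int { return 0 }
func (rf *Raft) lastLogTerm() int  { return 0 }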
// Expects rf.mu to be held by the caller.
func (rf *Raft) sendHeartbeats() {
args := &AppendEntriesArgs{
Term: rf.currentTerm,
LeaderId: rf.me,
// Right now these are always 0 and 1 because log operations aren't implemented yet. They'll become meaningful in 3B.
PrevLogIndex: 0,
PrevLogTerm: 1,
Entries: make([]LogEntry, 0),
// Also unused right now
LeaderCommit: rf.commitIndex,
}
for i := range rf.peers {
if i == rf.me {
continue
}
go func(peer int) {
// Heartbeats are just empty AppendEntries! Keeps things simple. Love it.
reply := &AppendEntriesReply{}
ok := rf.sendHeartbeat(peer, args, reply)
rf.mu.Lock()
defer rf.mu.Unlock()
// Request is stale, ignore.
if args.Term != rf.currentTerm {
return
}
// If a peer is ahead of us, demote ourselves
if ok && reply.Term > rf.currentTerm {
rf.becomeFollower(reply.Term)
return
}
}(i)
}
}
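For completeness, here’s the receiving side of the heartbeat, which also never appeared above: the AppendEntries structs and handler implementing procedure 4 from Step 1. Field names follow the paper’s Figure 2; the bodies are a sketch of the minimal 3A behavior (accept any term at least as high as ours and reset the election timer):

type LogEntry struct {
	Term    int
	Command interface{}
}

type AppendEntriesArgs struct {
	Term         int
	LeaderId     int
	PrevLogIndex int
	PrevLogTerm  int
	Entries      []LogEntry
	LeaderCommit int
}

type AppendEntriesReply struct {
	Term    int
	Success bool
}

// Thin labrpc wrapper, mirroring sendRequestVote.
func (rf *Raft) sendHeartbeat(server int, args *AppendEntriesArgs, reply *AppendEntriesReply) bool {
	return rf.peers[server].Call("Raft.AppendEntries", args, reply)
}

// Procedure 4: any AppendEntries from a term >= ours pushes back our election timer.
func (rf *Raft) AppendEntries(args *AppendEntriesArgs, reply *AppendEntriesReply) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	reply.Term = rf.currentTerm
	// Stale leader: reject so it knows to step down.
	if args.Term < rf.currentTerm {
		reply.Success = false
		return
	}
	// A current or newer leader is alive; make sure we're a follower on its term.
	if args.Term > rf.currentTerm || rf.nodeType != Follower {
		rf.becomeFollower(args.Term)
	}
	reply.Success = true
	rf.resetElectionTimer()
}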
And… that’s it!
Test (3A): initial election ...
... Passed -- 3.1 3 62 15986 0
Test (3A): election after network failure ...
... Passed -- 4.6 3 142 28090 0
Test (3A): multiple elections ...
... Passed -- 5.5 7 708 134514 0
PASS
ok 6.5840/raft 13.118s
Boom!
Final structure
By default all nodes are followers with randomized election timers and a 100ms heartbeat.
If a node’s election timer goes off, they become a candidate and request votes from their peers.
If they get majority vote, they start sending heartbeats.
If at any point they receive data indicating they’re outdated (either from RequestVote or AppendEntries), they step down from office right away.
If the other nodes stop hearing heartbeats, eventually the one with the shortest timer will initiate an election and become the new leader. The cycle repeats.
It’s a pretty simple algorithm, and that’s by design: understandability was the designers’ explicit goal.
Final Conclusions
What I talked about earlier, getting the data in order before starting on the algorithm implementation, turned out to be pretty critical here.
In a distributed system, debugging can take many steps. Since there’s a lot happening at once on many machines, bugs have more places to hide.
By cementing the data structures first before trying to do the algorithm design, it cut down on the number of things I had to keep in my head while actually writing the code. A bunch of different conditions have to be handled, so my mind had to be on logic rather than data.
I wish I had handled the locks a bit better. Some functions expect the data to be unlocked before they’re called; others expect the caller to already hold the lock. The decision between the two was made more or less on a whim, based on nothing but the context each function happened to be called from. In practice, I guess this is fine. But it makes the logic harder to follow, and I spent at least an hour debugging a deadlock caused by a function trying to acquire a lock that a caller far up the stack was already holding.
I think this was the thick of it, though. With leader election working and all tests passing, I have a strong base to build on, and it’s clear where to start next time.