MIT DCC Lab 3, Part A

Implementation of leader election


Part A is to implement leader election. Some skeleton code is provided, as well as a basic Raft struct. The goal is to get these skeleton nodes to elect a leader even across a faulty network and crashing nodes.

// A Go object implementing a single Raft peer.
type Raft struct {
	mu        sync.Mutex          // Lock to protect shared access to this peer's state
	peers     []*labrpc.ClientEnd // RPC end points of all peers
	persister *Persister          // Object to hold this peer's persisted state
	me        int                 // this peer's index into peers[]
	dead      int32               // set by Kill()

	// Your data here (3A, 3B, 3C).
	// Look at the paper's Figure 2 for a description of what
	// state a Raft server must maintain.
}

// return currentTerm and whether this server
// believes it is the leader.
func (rf *Raft) GetState() (int, bool) {

	var term int
	var isleader bool
	// Your code here (3A).
	return term, isleader
}

// example RequestVote RPC arguments structure.
// field names must start with capital letters!
type RequestVoteArgs struct {
	// Your data here (3A, 3B).
}

// example RequestVote RPC reply structure.
// field names must start with capital letters!
type RequestVoteReply struct {
	// Your data here (3A).
}

// example RequestVote RPC handler.
func (rf *Raft) RequestVote(args *RequestVoteArgs, reply *RequestVoteReply) {
	// Your code here (3A, 3B).
}
func (rf *Raft) ticker() {
	for rf.killed() == false {

		// Your code here (3A)
		// Check if a leader election should be started.


		// pause for a random amount of time between 50 and 350
		// milliseconds.
		ms := 50 + (rand.Int63() % 300)
		time.Sleep(time.Duration(ms) * time.Millisecond)
	}
}

Step 1: Getting The Data In Order

When you’re unsure where to start, it’s always a good idea to clarify what problem you’re trying to solve, and then what the unknowns are (an idea from G. Polya’s “How To Solve It”). In this case, the question is what procedures we need to write and what data they’ll be operating on.

There are five core bits of logic defined in the Raft paper:

  1. The election timer
  2. The heartbeat
  3. RequestVote
  4. AppendEntries
  5. Terms

Procedure 1 is a slightly randomized timer, slower than the heartbeat. If the timer goes off, we become a candidate and initiate an election. An election here means requesting votes from all peers. If we receive a majority for this term, we become the leader.

Procedure 2 is a ticker that regularly sends out AppendEntries requests. These requests reset our peers’ election timers, allowing us to maintain leader status as long as we can successfully talk to a majority of our peers.

Procedure 3 defines whether we give our vote to a peer. We do this if their log is at least as up to date as ours and we haven’t voted for anyone in the term they’re requesting a vote for. For example, if we’re on term 4 and they request a vote for term 5, and we haven’t voted yet, we’ll give it to them. Anyone else who then tries to become leader for term 5 fails.

Procedure 4 is simple for now since we have no actual logging logic. Whenever we receive an AppendEntries RPC with a term greater than or equal to our current term, we just reset our election timer.

Terms are a more general idea. They’re how nodes know who is behind and who is ahead. All RPC calls need to come with a term attached. If a node ever receives a term higher than its current term, it can be sure it has fallen behind and should become a follower. If it receives an RPC with a lower term, that RPC is outdated and can safely be ignored.
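That rule is mechanical enough to sketch now. Roughly, it looks like this as a helper (observeTerm is a hypothetical name, and it leans on a becomeFollower helper sketched later):

// Applies the term rule to an incoming RPC term. Expects rf.mu to be held.
// Returns false if the RPC is stale and should be ignored.
func (rf *Raft) observeTerm(term int) bool {
	if term > rf.currentTerm {
		// We've fallen behind; step down and adopt the newer term.
		rf.becomeFollower(term)
	}
	return term >= rf.currentTerm
}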

Writing it all out, the state we need to add to the skeleton is easy to enumerate:

Nodes need to keep an election timer, their node type (follower, candidate, leader), their current term, how many votes they’ve received, and who they’ve voted for this term (if anyone).

When asking for a vote, the other node needs our term and how up to date our log is, so it can tell whether we’re worthy of leader status, plus who we are so it can track who it voted for. In return, it answers with its term (as all RPCs do) and whether it grants us its vote.

type NodeType uint8

const (
	Follower NodeType = iota
	Candidate
	Leader
)

// A Go object implementing a single Raft peer.
type Raft struct {
	// Skeleton fields (mu, peers, persister, me, dead) omitted.

	nodeType       NodeType
	electionTimer  *time.Timer
	currentTerm    int
	votedFor       int
	collectedVotes int
}

type RequestVoteArgs struct {
	Term         int
	CandidateId  int
	LastLogIndex int
	LastLogTerm  int
}

type RequestVoteReply struct {
	Term        int
	VoteGranted bool
}
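
With that state in place, the skeleton’s GetState becomes immediately fillable. A quick sketch:

// return currentTerm and whether this server
// believes it is the leader.
func (rf *Raft) GetState() (int, bool) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	return rf.currentTerm, rf.nodeType == Leader
}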

Step 2: Vote Logic

The next clearest step is to write the RequestVote RPC. Since we have the Args and Reply structs ready to go, we can just focus on the logic. Extrapolating a bit from the paper, I wrote this little function. There is likely lots of room for refactoring, but that comes after the tests pass.

func (rf *Raft) RequestVote(args *RequestVoteArgs, reply *RequestVoteReply) {
	rf.mu.Lock()
	defer rf.mu.Unlock()

	reply.Term = rf.currentTerm

	// If they're behind, reject. Not qualified!
	if args.Term < rf.currentTerm {
		reply.VoteGranted = false
		return
	}

	// If we're behind, we must be out of sync. Become a follower.
	if args.Term > rf.currentTerm {
		rf.becomeFollower(args.Term)
		reply.Term = rf.currentTerm
	}

	// If we've already voted this term or are a leader, reject
	if (rf.votedFor != -1 && rf.votedFor != args.CandidateId) || rf.nodeType == Leader {
		reply.VoteGranted = false
		return
	}

	// Only grant our vote if we haven't voted this term *and* they're at
	// least as caught up as us. Since logs never get appended to, this is
	// just always true for now.
	if rf.UpToDate(args.LastLogTerm, args.LastLogIndex) {
		rf.votedFor = args.CandidateId
		reply.VoteGranted = true

		rf.resetElectionTimer()

		return
	}

	reply.VoteGranted = false
}
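
Two helpers appear above that aren’t in the skeleton: becomeFollower and UpToDate. The exact bodies here are my sketch, but roughly (both expect rf.mu to be held by the caller):

// Expects rf.mu to be held by the caller.
func (rf *Raft) becomeFollower(term int) {
	rf.nodeType = Follower
	rf.currentTerm = term
	rf.votedFor = -1 // new term, so our vote is available again
	rf.collectedVotes = 0
	rf.resetElectionTimer() // restart the timer in case it was paused while we were leader
}

// With no log entries yet, every candidate is trivially up to date.
func (rf *Raft) UpToDate(lastLogTerm, lastLogIndex int) bool {
	return true
}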

Step 3: Voting

Now that we have a way for nodes to ask for permission to become a leader, nodes need to actually participate in democracy. For the sake of simplicity, I just use two goroutines with for loops.

func (rf *Raft) startElectionInitiationTimer() {
	for !rf.killed() {
		<-rf.electionTimer.C

		rf.mu.Lock()
		isLeader := rf.nodeType == Leader
		rf.mu.Unlock()

		if isLeader {
			rf.pauseElectionTimer()
			continue
		}

		rf.becomeCandidate()
		rf.requestVotes()
		rf.resetElectionTimer()
	}
}

func (rf *Raft) startHeartbeat() {
	for !rf.killed() {
		time.Sleep(heartrate)

		rf.mu.Lock()
		isLeader := rf.nodeType == Leader

		if isLeader {
			rf.sendHeartbeats()
		}

		rf.mu.Unlock()
	}
}

func (rf *Raft) startLoops() {

	go rf.startElectionInitiationTimer()
	go rf.startHeartbeat()

    // Starts the timer for the first time
	rf.resetElectionTimer()
}
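
The heartrate constant never appears above; it’s just the heartbeat interval, 100ms as noted in the final structure below:

const heartrate = 100 * time.Millisecond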

Some basic helper functions are used to make it easy to work with the election timer.

func randElectionTimeout() time.Duration {
	return time.Duration(150+rand.Int63()%300) * time.Millisecond
}

func (rf *Raft) pauseElectionTimer() {
	// Stop the timer. If it returns false, that means it already fired, so clear the channel.
	if !rf.electionTimer.Stop() {
		select {
		case <-rf.electionTimer.C:
		default:
		}
	}
}

func (rf *Raft) resetElectionTimer() {
	rf.pauseElectionTimer()
	rf.electionTimer.Reset(randElectionTimeout())
}
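
becomeCandidate and the two log accessors are also referenced without being shown. A minimal sketch of what they need to do (the exact bodies are my assumption):

func (rf *Raft) becomeCandidate() {
	rf.mu.Lock()
	defer rf.mu.Unlock()

	rf.nodeType = Candidate
	rf.currentTerm++      // new election, new term
	rf.votedFor = rf.me   // vote for ourselves...
	rf.collectedVotes = 1 // ...and count it
}

// With an empty log, these are just zeroes for now.
func (rf *Raft) lastLogIndex() int { return 0 }
func (rf *Raft) lastLogTerm() int  { return 0 }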

Then, a bit of core logic to request votes and send heartbeats.

func (rf *Raft) requestVotes() {
	args := &RequestVoteArgs{
		Term:         rf.currentTerm,
		CandidateId:  rf.me,
		LastLogIndex: rf.lastLogIndex(),
		LastLogTerm:  rf.lastLogTerm(),
	}

	for i := range rf.peers {
		if i == rf.me {
			continue
		}

		go func(peer int) {
			reply := &RequestVoteReply{}
			ok := rf.sendRequestVote(peer, args, reply)

			rf.mu.Lock()
			defer rf.mu.Unlock()

			if args.Term != rf.currentTerm {
				return
			}

			if ok && reply.Term > rf.currentTerm {
				rf.becomeFollower(reply.Term)
				return
			}

			if ok && reply.VoteGranted {
				rf.collectedVotes++

				// Only a candidate can win; this guards against a late vote
				// promoting us after we've already stepped down.
				if rf.nodeType == Candidate && rf.collectedVotes >= len(rf.peers)/2+1 {
					rf.nodeType = Leader
					rf.sendHeartbeats()
				}
			}
		}(i)
	}
}

func (rf *Raft) sendHeartbeats() {
	args := &AppendEntriesArgs{
		Term:         rf.currentTerm,
		LeaderId:     rf.me,

		// Right now these are always 0 and 1 because actual log operations
		// aren't implemented yet. Eventually they'll change once entries
		// start getting appended.
		PrevLogIndex: 0,
		PrevLogTerm:  1,
		Entries:      make([]LogEntry, 0),

		// Also unused right now
		LeaderCommit: rf.commitIndex,
	}

	for i := range rf.peers {
		if i == rf.me {
			continue
		}

		go func(peer int) {
			// Heartbeats are just empty AppendEntries! Keeps things simple. Love it.
			reply := &AppendEntriesReply{}
			ok := rf.sendHeartbeat(peer, args, reply)

			rf.mu.Lock()
			defer rf.mu.Unlock()

			// Request is stale, ignore.
			if args.Term != rf.currentTerm {
				return
			}

			// If a peer is ahead of us, demote ourselves.
			if ok && reply.Term > rf.currentTerm {
				rf.becomeFollower(reply.Term)
				return
			}
		}(i)
	}
}
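
One piece that never appears above is the receiving side: the AppendEntries handler from procedure 4, plus the sendHeartbeat wrapper around labrpc. Here’s a minimal sketch, assuming an AppendEntriesReply with Term and Success fields:

func (rf *Raft) sendHeartbeat(server int, args *AppendEntriesArgs, reply *AppendEntriesReply) bool {
	return rf.peers[server].Call("Raft.AppendEntries", args, reply)
}

func (rf *Raft) AppendEntries(args *AppendEntriesArgs, reply *AppendEntriesReply) {
	rf.mu.Lock()
	defer rf.mu.Unlock()

	// Stale leader; ignore.
	if args.Term < rf.currentTerm {
		reply.Term = rf.currentTerm
		reply.Success = false
		return
	}

	if args.Term > rf.currentTerm {
		rf.becomeFollower(args.Term) // we've fallen behind; adopt the newer term
	} else if rf.nodeType == Candidate {
		// Someone else won this term; stand down without clearing our vote.
		rf.nodeType = Follower
	}

	reply.Term = rf.currentTerm
	reply.Success = true
	rf.resetElectionTimer()
}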

And… that’s it!

Test (3A): initial election ...
  ... Passed --   3.1  3   62   15986    0
Test (3A): election after network failure ...
  ... Passed --   4.6  3  142   28090    0
Test (3A): multiple elections ...
  ... Passed --   5.5  7  708  134514    0
PASS
ok      6.5840/raft     13.118s

Boom!


Final structure

By default all nodes are followers with randomized election timers and a 100ms heartbeat.

If a node’s election timer goes off, they become a candidate and request votes from their peers.

If they get majority vote, they start sending heartbeats.

If at any point they receive data indicating they’re outdated (either from RequestVote or AppendEntries), they step down from office right away.

If the other nodes stop hearing heartbeats, eventually the one with the shortest timer will initiate an election and become the new leader. The cycle repeats.

It’s a pretty simple algorithm, which is exactly what its designers were aiming for.


Final Conclusions

What I talked about earlier, getting the data in order before embarking on the algorithm implementation, turned out to be pretty critical here.

Debugging a distributed system is its own challenge. Since there’s a lot happening at once on many machines, bugs have more places to hide.

Cementing the data structures before starting the algorithm design cut down on the number of things I had to keep in my head while actually writing the code. A bunch of different conditions have to be handled, so my mind needed to be on the logic rather than the data.

I wish I had handled the locks a bit better. Some functions expect the data to be unlocked before they’re called; others expect it to be locked by the caller. The decision between the two was made on a whim, based on nothing but the context the function happened to be called in. In a private codebase, I guess this is fine. But it makes the logic harder to follow, and I spent at least an hour debugging a deadlock caused by a function trying to acquire a lock that was already held by a far-off caller.
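
For the curious, the bug had this shape (hypothetical names; Go’s sync.Mutex is not reentrant, so a helper that locks internally hangs the moment a caller already holds the lock):

func (rf *Raft) outer() {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	rf.inner() // deadlock: inner tries to re-acquire rf.mu
}

func (rf *Raft) inner() {
	rf.mu.Lock() // blocks forever
	defer rf.mu.Unlock()
	// ...
}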

I think this was the thick of it, though. Since leader election is working and passes all the tests, it gives me a strong base to work from. It’s clear where to start next time.