Raft 实验 Part C 总结回顾

在 MIT 6.824 分布式系统课程中，第二个实验是实现分布式共识算法 Raft，分成了 A/B/C 三个部分。之前的 A 和 B 部分实现了 Raft 中的领导选举（Leader election）和日志复制（Log replication），第 C 部分将会实现保存持久状态（Persistent State）。

Part 2A 和 Part 2B 部分基本已经把整个 Raft 的框架搭好了。如果基于Raft的服务器重新启动，则应从中断的位置恢复服务。这就要求 Raft 保持持久状态，使其在重启后仍然有效。论文的 Figure2 提到了哪些状态应该保持不变。

前两部分文章链接：

实验部分

Persistent State

论文图 2 直接告诉了我们 Persistent state on all servers: currentTerm、votedFor 和 log[]。在这个部分里面，我们不需要负责把数据写到磁盘上的文件里，而是只需要通过 Persister 对象实现 persist() 和 readPersist() 函数保存和恢复状态。

1
2
3
4
5
6
7
8
9
func (rf *Raft) persist() {
	// Your code here (2C).
	w := new(bytes.Buffer)
	e := labgob.NewEncoder(w)
	e.Encode(rf.currentTerm)
	e.Encode(rf.votedFor)
	e.Encode(rf.log)
	rf.persister.SaveRaftState(w.Bytes())
}

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
func (rf *Raft) readPersist(data []byte) {
	if data == nil || len(data) < 1 {
		return
	}
	// Your code here (2C).
	r := bytes.NewBuffer(data)
	d := labgob.NewDecoder(r)
	d.Decode(&rf.currentTerm)
	d.Decode(&rf.votedFor)
	d.Decode(&rf.log)
}

基本按照原来注释里的例子写就可以了。如果想实现的更好一些，也可以再定义一个结构体。注意在开头要 import "../labgob"

那么现在问题来了，该在哪些地方调用 persist() 和 readPersist() 呢？直觉告诉我们当上面提到的三个状态（currentTerm、votedFor 和 log[]）改变了就应该进行持久化。回顾一下之前的实现：

Make() 会在 Raft 实例被初始化的时候调用，因此这里会用到 readPersist() 来读取先前保存好的状态
Leader 在 Start() 函数里可能会更新自己本地的 log，需要 persist()
当 Raft 转变为 Candidate 和 Follower 的时候，currentTerm 和 votedFor 都可能发生变化，因此在相应地方加上 persist()
RequestVote 和 AppendEntries 函数同样也会改变状态，最后也需要 persist()

如果以上都完成了，Part C 的前几个基本测试就能够通过。

Back up nextIndex Optimization

不过还有几个 unreliable 的测试是没法通过的。这里我花了很久的时间去 Debug，因为理论上按 Figure 2 把所有的细节实现好了，那么 Figure 8 的问题是不会再出现的。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
Test (2C): Figure 8 (unreliable) ...
--- FAIL: TestFigure8Unreliable2C (41.11s)
    config.go:475: one(4800) failed to reach agreement
Test (2C): churn ...
--- FAIL: TestReliableChurn2C (26.33s)
    config.go:475: one(7825959370052427039) failed to reach agreement
Test (2C): unreliable churn ...
--- FAIL: TestUnreliableChurn2C (26.18s)
    config.go:475: one(1988810278690071914) failed to reach agreement
FAIL

后来又读了一遍实验要求，发现这么一条 Hint：

You will probably need the optimization that backs up nextIndex by more than one entry at a time. Look at the extended Raft paper starting at the bottom of page 7 and top of page 8 (marked by a gray line). The paper is vague about the details; you will need to fill in the gaps, perhaps with the help of the 6.824 Raft lectures.

简而言之就是在之前的实现里面，每次当 AppendEntries 因为不一致的 log 而返回 false 时，我们只让 nextIndex 往回退了一个位置。对于有很多不一致的 entry 的话这样做显然不够 efficient。因此需要优化这一步骤，具体的做法是在 AppendEntriesReply 结构中增加一个 conflictIndex 项。

如果 Follower 没有 prevLogIndex 那一项，即 Follower log 长度不够，那么 conflictIndex = len(log)。具体可以看 Student's Guide to Raft，他同时用了 conflictIndex 和 conflictTerm，这里我偷懒了，最好要补上。之后 decrement nextIndex and retry 那一步改写成：

1
2
3
4
5
6
7
8
9
// If last log index ≥ nextIndex for a follower: send
// AppendEntries RPC with log entries starting at nextIndex
// • If successful: update nextIndex and matchIndex for follower (§5.3)
// • If AppendEntries fails because of log inconsistency:
// decrement nextIndex and retry (§5.3)
if reply.Success {
} else {
	rf.nextIndex[server] = reply.conflictIndex
}

测试

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
➜  raft git:(lab2) ✗ go test -run 2C                     
Test (2C): basic persistence ...
labgob warning: Decoding into a non-default variable/field int may not work
  ... Passed --   3.4  3  308   64734    6
Test (2C): more persistence ...
  ... Passed --  18.7  5 2364  413576   16
Test (2C): partitioned leader and one follower crash, leader restarts ...
  ... Passed --   2.1  3   82   18519    4
Test (2C): Figure 8 ...
  ... Passed --  26.8  5 26948 6112820   14
Test (2C): unreliable agreement ...
  ... Passed --   3.5  5  228   80581  246
Test (2C): Figure 8 (unreliable) ...
  ... Passed --  27.9  5 3900 21820584  132
Test (2C): churn ...
  ... Passed --  16.2  5 1204 1825170  696
Test (2C): unreliable churn ...
  ... Passed --  16.3  5 1755 1290154   72
PASS
ok      _/src/raft  117.704s

总结回顾

Raft 简明扼要，比较好懂，但是实现起来要注意的细节非常多，得把 Figure 2 仔仔细细的看完并且实现每一个部分。因为是分布式的系统，Raft 调试起来也会有不小的麻烦，Youtube 上的这节课 Lecture 5: Go, Threads, and Raft 对调试会提供不少的帮助，后半段视频也提供了 Leader Election 部分思路。

这门课以前 TA 写的 Student's Guide to Raft 非常详细的解释了 Raft 实现中的一些注意点，非常有用。我在写完了整个 Lab 之后才看，发现很多坑和理解了很久的部分，这篇文章都有所涉及。甚至对于一些具体的细节也解释的非常清楚，强烈推荐看一下。

Raft 实验 Part C 总结回顾

实验部分

Persistent State

Back up nextIndex Optimization

测试

总结回顾

参考

相关文章：