├── LICENSE
├── README.md
├── project1
│   ├── CS380D_P1.pdf
│   ├── Raft Locking Advice.pdf
│   ├── Raft Structure Advice.pdf
│   └── src
│       ├── labgob
│       │   ├── labgob.go
│       │   └── test_test.go
│       ├── labrpc
│       │   ├── labrpc.go
│       │   └── test_test.go
│       └── raft
│           ├── config.go
│           ├── persister.go
│           ├── raft.go
│           ├── test_test.go
│           └── util.go
├── slides
│   ├── dist-sys-slides.pdf
│   ├── slides-class1.pdf
│   ├── slides-lecture2.pdf
│   ├── slides-lecture3.pdf
│   ├── slides-lecture4.pdf
│   ├── slides-lecture5.pdf
│   ├── slides-lecture6.pdf
│   ├── slides-lecture7.pdf
│   └── slides-lecture8.pdf
└── slides_from_spring_2020
    ├── .DS_Store
    ├── all-slides.pdf
    ├── dist-sys-slides-feb11.pdf
    ├── dist-sys-slides-feb13.pdf
    ├── dist-sys-slides-feb18.pdf
    ├── dist-sys-slides-feb20.pdf
    ├── dist-sys-slides-feb25.pdf
    ├── dist-sys-slides-feb27.pdf
    ├── dist-sys-slides-feb4.pdf
    ├── dist-sys-slides-feb6.pdf
    ├── dist-sys-slides-jan23.pdf
    ├── dist-sys-slides-jan28.pdf
    ├── dist-sys-slides-jan30.pdf
    ├── dist-sys-slides-mar3.pdf
    ├── dist-sys-slides-mar5.pdf
    ├── slides-spring20.key
    └── slides.pdf
/LICENSE:
--------------------------------------------------------------------------------
1 | License
2 |
3 | Copyright (c) 2021, Vijay Chidambaram and the University of Texas at Austin
4 | All rights reserved.
5 |
6 | These course materials, including, but not limited to, lecture notes,
7 | homeworks, and projects are copyright protected. You must ask me
8 | permission to use these materials.
9 |
10 | I do not grant to you the right to publish these materials for profit
11 | in any form. Any unauthorized copying of the class materials is a
12 | violation of federal law and may result in disciplinary actions being
13 | taken against the student or other legal action against an outside
14 | entity. Additionally, the sharing of class materials without the
15 | specific, express approval of the instructor may be a violation of the
16 | University's Student Honor Code and an act of academic dishonesty,
17 | which could result in further disciplinary action. This includes,
18 | among other things, uploading class materials to websites for the
19 | purpose of sharing those materials with other current or future
20 | students.
21 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## CS 380D Distributed Systems (Spring 2021)
2 |
3 | Welcome to CS 380D Distributed Systems. This is a course designed to
4 | expose students to the basics of distributed systems. This course has
5 | elements of both undergraduate and graduate classes in it. Like
6 | undergraduate classes, there will be lectures where I present the
 7 | fundamentals. Like graduate classes, there will be lectures where we will
8 | read and discuss research papers. We will also be doing two projects:
9 | an implementation project, and a research project.
10 |
11 | Undergrads considering taking this course: this course will be very
12 | different from the traditional undergraduate course. Whenever we are
13 | discussing papers, the reading for each class will be heavy. The
14 | research project will be vague (by definition). That being said,
15 | undergraduate students have previously taken and excelled in this
16 | class.
17 |
18 | This course will introduce students to
19 | a range of exciting topics including:
20 |
21 | - State machines
22 | - Consensus
23 | - Failure detectors
24 | - Distributed Storage Systems
25 | - Byzantine failures
26 | - Lamport clocks
27 | - Snapshots
28 | - Consistency models
29 | - Replication protocols
30 | - MapReduce
31 |
32 | and more!
33 |
34 | **Piazza Link: [piazza.com/utexas/spring2021/cs380d](http://piazza.com/utexas/spring2021/cs380d)**
35 |
36 | **Canvas Link:
37 | [https://utexas.instructure.com/courses/1304719](https://utexas.instructure.com/courses/1304719)**
38 |
39 | **Class Timing and Location**: TuTh 3:30 pm - 5 pm, [Zoom](https://utexas.zoom.us/j/99987044895)
40 |
41 | **[Schedule](https://docs.google.com/spreadsheets/d/1oZlRvpU_vd8KGebd1mJD71xMpTFW3-e1vMuhYJDJqjs/edit?usp=sharing)**
42 |
43 | **Instructor: [Vijay Chidambaram](https://www.cs.utexas.edu/~vijay/)**
44 |
45 | Email: vijayc@utexas.edu
46 |
47 | Office hours: Fri 3 to 4 PM, [Zoom](https://utexas.zoom.us/j/92558966439)
48 |
49 | **TA: [Bingye Li](https://www.linkedin.com/in/bingye-li/)**
50 |
51 | Email: libingye@utexas.edu
52 |
53 | Office hours: Wed 5-6 PM CST, [Zoom](https://utexas.zoom.us/j/92494447869)
54 |
55 | ### Learning Goals and Objectives
56 |
57 | By the end of the course, the student must know basic concepts in building distributed systems, must have first-hand experience in building and testing a small distributed system, and must have gained experience in either distributed-systems research or working on open-source distributed systems.
58 |
59 | ### Grading
60 |
61 | **10%** Weekly reading
62 | **20%** Midterm-1
63 | **20%** Midterm-2
64 | **25%** Project 1: Implementing a distributed key-value store
65 | **25%** Project 2: Research project
66 |
67 | ### Letter grades
68 |
69 | Letter grades will be given out based on absolute scores. Typically, grades are given as follows:
70 |
71 | ```
72 | >= 94 A
73 | >= 90 A-
74 | >= 87.5 B+
75 | >= 85 B
76 | >= 80 B-
77 | >= 77.5 C+
78 | ```
79 |
80 | ### Extra Credit
81 |
82 | There are a number of opportunities to earn extra credit. One of the
83 | goals of this class is to get you started in research in distributed
84 | systems.
85 |
86 | 1. Introduction note. Submit a PDF to the Canvas extra-credit assignment with your
87 | name and photo, introducing yourself. Tell me some interesting facts
88 | about yourself! This is worth 0.25% (of the total grade) extra credit.
89 |
90 | 2. Class survey. You will receive 0.25% for completing an official survey near the end of
91 | class. You will need to upload a screenshot of the completed survey to Canvas.
92 |
93 | ### Exams
94 |
95 | There will be two midterms. There will not be a final exam.
96 |
97 | Midterm 1: **Mar 12th** (in-class)
98 | Midterm 2: **May 7th** (in-class, tentatively)
99 |
100 | You will be allowed one A4 sheet of paper on which you can bring notes
101 | for the exam. Laptops, tablets, and ereaders are **banned** from
102 | exams. You should not need them in an exam, and they are far too
103 | flexible as communication devices for non-communication
104 | policies to be enforceable. Any use of a communication
105 | device for any reason in the exam room will earn an automatic zero on
106 | the exam.
107 |
108 | ### Projects
109 |
110 | There will be two big projects in the course. Students will work in
111 | groups of 1--2 for Project 1 and 1--3 for Project 2.
112 |
113 | The first project will involve building Raft, a replicated state machine protocol. More details can be found here.
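
For orientation, here is a minimal sketch of the interface that the included tester (`project1/src/raft/config.go`) drives against your Raft implementation. The `exampleUsage` helper and the literal command value are made up for illustration, but `Make`, `Start`, `GetState`, and `ApplyMsg` are the names the tester actually calls.

```go
package raft

import "../labrpc" // same GOPATH-style relative import used by config.go

// exampleUsage is a hypothetical helper showing how the tester exercises a peer.
func exampleUsage(peers []*labrpc.ClientEnd, me int, ps *Persister) {
	applyCh := make(chan ApplyMsg)     // Raft delivers committed entries here
	rf := Make(peers, me, ps, applyCh) // start (or restart) a Raft peer
	term, isLeader := rf.GetState()    // current term, and whether this peer thinks it is the leader
	index, _, ok := rf.Start(100)      // propose a command; ok is false unless this peer is the leader
	_, _, _, _ = term, isLeader, index, ok
	for m := range applyCh {
		_ = m.CommandIndex // committed entries arrive in increasing index order
	}
}
```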
114 |
115 |
116 | The second project is an open-ended research project. The team will
117 | need to extend an existing research project or come up with a new
118 | idea. The team is required to build the prototype, perform
119 | experiments, and write up a conference-style 12-page report.
120 |
121 | ### Deadlines (tentative)
122 |
123 | **Mar 5** Project 1 due

Students with disabilities may request appropriate academic
132 | accommodations from the Division of Diversity and Community
133 | Engagement, Services for Students with Disabilities, 512-471-6259,
134 | http://www.utexas.edu/diversity/ddce/ssd/.
135 |
136 | Religious Holy Days: A student who is absent from an
137 | examination or cannot meet an assignment deadline due to the
138 | observance of a religious holy day may take the exam on an alternate
139 | day or submit the assignment up to 24 hours late without penalty, if
140 | proper notice of the planned absence has been given. Notice must be
141 | given at least 14 days prior to the classes which will be missed. For
142 | religious holy days that fall within the first 2 weeks of the
143 | semester, notice should be given on the first day of the
144 | semester. Notice must be personally delivered to the instructor and
145 | signed and dated by the instructor, or sent certified mail. Email
146 | notification will be accepted if received, but a student submitting
147 | email notification must receive email confirmation from the
148 | instructor.
149 |
150 | #### Collaboration
151 |
152 | 1. The students are encouraged to do the projects in groups of two or three.
153 | 2. All exams are done individually, with absolutely no collaboration.
154 | 3. Each student must present.
155 | 4. I strongly encourage you to discuss the projects and assignments with
156 | anyone you can. That's the way good science happens. But all work and
157 | writeup for the assignment must be your own, and only your own.
158 | 5. As a professional, you should acknowledge significant contributions or
159 | collaborations in your written or spoken presentations.
160 | 6. The student code of conduct
161 | is here. Intellectual
162 | dishonesty can end your career, and it is your responsibility to stay
163 | on the right side of the line. If you are not sure about something,
164 | ask.
165 | 7. **The penalty for cheating on an exam, project or assignment in
166 | this course is an F in the course and a referral to the Dean of
167 | Students office.**
168 | 8. You cross over from collaboration to cheating when you look at
169 | another person/team's source code. **Discussing ideas is okay,
170 | sharing code is not**.
171 | 9. You also may not look at any course project material relating to
172 | any project similar to or the same as this course's class
173 | projects. For example, you may not look at the work done by a
174 | student in past years' courses, and you may not look at similar
175 | course projects at other universities.
176 | 10. All submitted work must be new and original.
177 |
178 | #### Late Policy
179 |
180 | 1. All projects/assignments must be submitted in class the day they
181 | are due.
182 | 2. For each day a project/assignment is late, you lose 5% of the
183 | points for that project. So if you submit two days after the
184 | deadline, your maximum points on that project will be 90%.
185 | 3. In this class, it is always better to do the work (even late) than not
186 | do it at all.
187 | 4. If you become ill: contact the instructor. A medical note is
188 | required to miss an exam.
189 |
190 | ### Acknowledgements
191 |
192 | This course is inspired by (and uses material from) courses taught
193 | by Lorenzo Alvisi, Don
194 | Porter, Alison
195 | Norman, Remzi
196 | Arpaci-Dusseau, Simon
197 | Peter, and Chris
198 | Rossbach.
199 |
200 | The course follows Prof. [Martin Kleppmann](https://martin.kleppmann.com/)'s course on [Distributed
201 | Systems](https://www.cst.cam.ac.uk/teaching/2021/ConcDisSys), and uses [his lecture slides](https://martin.kleppmann.com/2020/11/18/distributed-systems-and-elliptic-curves.html).
202 |
203 | ### Copyright
204 |
205 | Copyright Notice: These course materials, including, but not
206 | limited to, lecture notes, homeworks, and projects are copyright
207 | protected. You must ask me, or the original author, permission to use
208 | these materials.
209 |
210 | I do not grant to you the right to publish these materials for profit
211 | in any form. Any unauthorized copying of the class materials is a
212 | violation of federal law and may result in disciplinary actions
213 | being taken against the student or other legal action against an
214 | outside entity. Additionally, the sharing of class materials without
215 | the specific, express approval of the instructor may be a violation
216 | of the University's Student Honor Code and an act of academic
217 | dishonesty, which could result in further disciplinary action. This
218 | includes, among other things, uploading class materials to websites
219 | for the purpose of sharing those materials with other current or
220 | future students.
221 |
--------------------------------------------------------------------------------
/project1/CS380D_P1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/project1/CS380D_P1.pdf
--------------------------------------------------------------------------------
/project1/Raft Locking Advice.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/project1/Raft Locking Advice.pdf
--------------------------------------------------------------------------------
/project1/Raft Structure Advice.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/project1/Raft Structure Advice.pdf
--------------------------------------------------------------------------------
/project1/src/labgob/labgob.go:
--------------------------------------------------------------------------------
 1 | package labgob
 2 |
 3 | //
 4 | // trying to send non-capitalized fields over RPC produces a range of
 5 | // misbehavior, including both mysterious incorrect computation and
 6 | // outright crashes. so this wrapper around Go's encoding/gob warns
 7 | // about non-capitalized field names.
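//
// for example (an illustrative sketch; the Args struct name is made up):
//
//	type Args struct {
//		Term int // capitalized: encoded and decoded normally
//		term int // lower-case: gob silently drops it on the wire
//	}
//	var w bytes.Buffer
//	labgob.NewEncoder(&w).Encode(Args{Term: 1})
//
// the Encode call above succeeds, but labgob prints
// "labgob error: lower-case field term of Args ... will break your Raft"
// so the mistake is visible instead of silent.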
8 | // 9 | 10 | import "encoding/gob" 11 | import "io" 12 | import "reflect" 13 | import "fmt" 14 | import "sync" 15 | import "unicode" 16 | import "unicode/utf8" 17 | 18 | var mu sync.Mutex 19 | var errorCount int // for TestCapital 20 | var checked map[reflect.Type]bool 21 | 22 | type LabEncoder struct { 23 | gob *gob.Encoder 24 | } 25 | 26 | func NewEncoder(w io.Writer) *LabEncoder { 27 | enc := &LabEncoder{} 28 | enc.gob = gob.NewEncoder(w) 29 | return enc 30 | } 31 | 32 | func (enc *LabEncoder) Encode(e interface{}) error { 33 | checkValue(e) 34 | return enc.gob.Encode(e) 35 | } 36 | 37 | func (enc *LabEncoder) EncodeValue(value reflect.Value) error { 38 | checkValue(value.Interface()) 39 | return enc.gob.EncodeValue(value) 40 | } 41 | 42 | type LabDecoder struct { 43 | gob *gob.Decoder 44 | } 45 | 46 | func NewDecoder(r io.Reader) *LabDecoder { 47 | dec := &LabDecoder{} 48 | dec.gob = gob.NewDecoder(r) 49 | return dec 50 | } 51 | 52 | func (dec *LabDecoder) Decode(e interface{}) error { 53 | checkValue(e) 54 | checkDefault(e) 55 | return dec.gob.Decode(e) 56 | } 57 | 58 | func Register(value interface{}) { 59 | checkValue(value) 60 | gob.Register(value) 61 | } 62 | 63 | func RegisterName(name string, value interface{}) { 64 | checkValue(value) 65 | gob.RegisterName(name, value) 66 | } 67 | 68 | func checkValue(value interface{}) { 69 | checkType(reflect.TypeOf(value)) 70 | } 71 | 72 | func checkType(t reflect.Type) { 73 | k := t.Kind() 74 | 75 | mu.Lock() 76 | // only complain once, and avoid recursion. 77 | if checked == nil { 78 | checked = map[reflect.Type]bool{} 79 | } 80 | if checked[t] { 81 | mu.Unlock() 82 | return 83 | } 84 | checked[t] = true 85 | mu.Unlock() 86 | 87 | switch k { 88 | case reflect.Struct: 89 | for i := 0; i < t.NumField(); i++ { 90 | f := t.Field(i) 91 | rune, _ := utf8.DecodeRuneInString(f.Name) 92 | if unicode.IsUpper(rune) == false { 93 | // ta da 94 | fmt.Printf("labgob error: lower-case field %v of %v in RPC or persist/snapshot will break your Raft\n", 95 | f.Name, t.Name()) 96 | mu.Lock() 97 | errorCount += 1 98 | mu.Unlock() 99 | } 100 | checkType(f.Type) 101 | } 102 | return 103 | case reflect.Slice, reflect.Array, reflect.Ptr: 104 | checkType(t.Elem()) 105 | return 106 | case reflect.Map: 107 | checkType(t.Elem()) 108 | checkType(t.Key()) 109 | return 110 | default: 111 | return 112 | } 113 | } 114 | 115 | // 116 | // warn if the value contains non-default values, 117 | // as it would if one sent an RPC but the reply 118 | // struct was already modified. if the RPC reply 119 | // contains default values, GOB won't overwrite 120 | // the non-default value. 121 | // 122 | func checkDefault(value interface{}) { 123 | if value == nil { 124 | return 125 | } 126 | checkDefault1(reflect.ValueOf(value), 1, "") 127 | } 128 | 129 | func checkDefault1(value reflect.Value, depth int, name string) { 130 | if depth > 3 { 131 | return 132 | } 133 | 134 | t := value.Type() 135 | k := t.Kind() 136 | 137 | switch k { 138 | case reflect.Struct: 139 | for i := 0; i < t.NumField(); i++ { 140 | vv := value.Field(i) 141 | name1 := t.Field(i).Name 142 | if name != "" { 143 | name1 = name + "." 
+ name1 144 | } 145 | checkDefault1(vv, depth+1, name1) 146 | } 147 | return 148 | case reflect.Ptr: 149 | if value.IsNil() { 150 | return 151 | } 152 | checkDefault1(value.Elem(), depth+1, name) 153 | return 154 | case reflect.Bool, 155 | reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64, 156 | reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, 157 | reflect.Uintptr, reflect.Float32, reflect.Float64, 158 | reflect.String: 159 | if reflect.DeepEqual(reflect.Zero(t).Interface(), value.Interface()) == false { 160 | mu.Lock() 161 | if errorCount < 1 { 162 | what := name 163 | if what == "" { 164 | what = t.Name() 165 | } 166 | // this warning typically arises if code re-uses the same RPC reply 167 | // variable for multiple RPC calls, or if code restores persisted 168 | // state into variable that already have non-default values. 169 | fmt.Printf("labgob warning: Decoding into a non-default variable/field %v may not work\n", 170 | what) 171 | } 172 | errorCount += 1 173 | mu.Unlock() 174 | } 175 | return 176 | } 177 | } 178 | -------------------------------------------------------------------------------- /project1/src/labgob/test_test.go: -------------------------------------------------------------------------------- 1 | package labgob 2 | 3 | import "testing" 4 | 5 | import "bytes" 6 | 7 | type T1 struct { 8 | T1int0 int 9 | T1int1 int 10 | T1string0 string 11 | T1string1 string 12 | } 13 | 14 | type T2 struct { 15 | T2slice []T1 16 | T2map map[int]*T1 17 | T2t3 interface{} 18 | } 19 | 20 | type T3 struct { 21 | T3int999 int 22 | } 23 | 24 | // 25 | // test that we didn't break GOB. 26 | // 27 | func TestGOB(t *testing.T) { 28 | e0 := errorCount 29 | 30 | w := new(bytes.Buffer) 31 | 32 | Register(T3{}) 33 | 34 | { 35 | x0 := 0 36 | x1 := 1 37 | t1 := T1{} 38 | t1.T1int1 = 1 39 | t1.T1string1 = "6.824" 40 | t2 := T2{} 41 | t2.T2slice = []T1{T1{}, t1} 42 | t2.T2map = map[int]*T1{} 43 | t2.T2map[99] = &T1{1, 2, "x", "y"} 44 | t2.T2t3 = T3{999} 45 | 46 | e := NewEncoder(w) 47 | e.Encode(x0) 48 | e.Encode(x1) 49 | e.Encode(t1) 50 | e.Encode(t2) 51 | } 52 | data := w.Bytes() 53 | 54 | { 55 | var x0 int 56 | var x1 int 57 | var t1 T1 58 | var t2 T2 59 | 60 | r := bytes.NewBuffer(data) 61 | d := NewDecoder(r) 62 | if d.Decode(&x0) != nil || 63 | d.Decode(&x1) != nil || 64 | d.Decode(&t1) != nil || 65 | d.Decode(&t2) != nil { 66 | t.Fatalf("Decode failed") 67 | } 68 | 69 | if x0 != 0 { 70 | t.Fatalf("wrong x0 %v\n", x0) 71 | } 72 | if x1 != 1 { 73 | t.Fatalf("wrong x1 %v\n", x1) 74 | } 75 | if t1.T1int0 != 0 { 76 | t.Fatalf("wrong t1.T1int0 %v\n", t1.T1int0) 77 | } 78 | if t1.T1int1 != 1 { 79 | t.Fatalf("wrong t1.T1int1 %v\n", t1.T1int1) 80 | } 81 | if t1.T1string0 != "" { 82 | t.Fatalf("wrong t1.T1string0 %v\n", t1.T1string0) 83 | } 84 | if t1.T1string1 != "6.824" { 85 | t.Fatalf("wrong t1.T1string1 %v\n", t1.T1string1) 86 | } 87 | if len(t2.T2slice) != 2 { 88 | t.Fatalf("wrong t2.T2slice len %v\n", len(t2.T2slice)) 89 | } 90 | if t2.T2slice[1].T1int1 != 1 { 91 | t.Fatalf("wrong slice value\n") 92 | } 93 | if len(t2.T2map) != 1 { 94 | t.Fatalf("wrong t2.T2map len %v\n", len(t2.T2map)) 95 | } 96 | if t2.T2map[99].T1string1 != "y" { 97 | t.Fatalf("wrong map value\n") 98 | } 99 | t3 := (t2.T2t3).(T3) 100 | if t3.T3int999 != 999 { 101 | t.Fatalf("wrong t2.T2t3.T3int999\n") 102 | } 103 | } 104 | 105 | if errorCount != e0 { 106 | t.Fatalf("there were errors, but should not have been") 107 | } 108 | } 109 | 110 | type T4 struct { 111 | Yes int 112 | no int 
113 | } 114 | 115 | // 116 | // make sure we check capitalization 117 | // labgob prints one warning during this test. 118 | // 119 | func TestCapital(t *testing.T) { 120 | e0 := errorCount 121 | 122 | v := []map[*T4]int{} 123 | 124 | w := new(bytes.Buffer) 125 | e := NewEncoder(w) 126 | e.Encode(v) 127 | data := w.Bytes() 128 | 129 | var v1 []map[T4]int 130 | r := bytes.NewBuffer(data) 131 | d := NewDecoder(r) 132 | d.Decode(&v1) 133 | 134 | if errorCount != e0+1 { 135 | t.Fatalf("failed to warn about lower-case field") 136 | } 137 | } 138 | 139 | // 140 | // check that we warn when someone sends a default value over 141 | // RPC but the target into which we're decoding holds a non-default 142 | // value, which GOB seems not to overwrite as you'd expect. 143 | // 144 | // labgob does not print a warning. 145 | // 146 | func TestDefault(t *testing.T) { 147 | e0 := errorCount 148 | 149 | type DD struct { 150 | X int 151 | } 152 | 153 | // send a default value... 154 | dd1 := DD{} 155 | 156 | w := new(bytes.Buffer) 157 | e := NewEncoder(w) 158 | e.Encode(dd1) 159 | data := w.Bytes() 160 | 161 | // and receive it into memory that already 162 | // holds non-default values. 163 | reply := DD{99} 164 | 165 | r := bytes.NewBuffer(data) 166 | d := NewDecoder(r) 167 | d.Decode(&reply) 168 | 169 | if errorCount != e0+1 { 170 | t.Fatalf("failed to warn about decoding into non-default value") 171 | } 172 | } 173 | -------------------------------------------------------------------------------- /project1/src/labrpc/labrpc.go: -------------------------------------------------------------------------------- 1 | package labrpc 2 | 3 | // 4 | // channel-based RPC, for 824 labs. 5 | // 6 | // simulates a network that can lose requests, lose replies, 7 | // delay messages, and entirely disconnect particular hosts. 8 | // 9 | // we will use the original labrpc.go to test your code for grading. 10 | // so, while you can modify this code to help you debug, please 11 | // test against the original before submitting. 12 | // 13 | // adapted from Go net/rpc/server.go. 14 | // 15 | // sends labgob-encoded values to ensure that RPCs 16 | // don't include references to program objects. 17 | // 18 | // net := MakeNetwork() -- holds network, clients, servers. 19 | // end := net.MakeEnd(endname) -- create a client end-point, to talk to one server. 20 | // net.AddServer(servername, server) -- adds a named server to network. 21 | // net.DeleteServer(servername) -- eliminate the named server. 22 | // net.Connect(endname, servername) -- connect a client to a server. 23 | // net.Enable(endname, enabled) -- enable/disable a client. 24 | // net.Reliable(bool) -- false means drop/delay messages 25 | // 26 | // end.Call("Raft.AppendEntries", &args, &reply) -- send an RPC, wait for reply. 27 | // the "Raft" is the name of the server struct to be called. 28 | // the "AppendEntries" is the name of the method to be called. 29 | // Call() returns true to indicate that the server executed the request 30 | // and the reply is valid. 31 | // Call() returns false if the network lost the request or reply 32 | // or the server is down. 33 | // It is OK to have multiple Call()s in progress at the same time on the 34 | // same ClientEnd. 35 | // Concurrent calls to Call() may be delivered to the server out of order, 36 | // since the network may re-order messages. 37 | // Call() is guaranteed to return (perhaps after a delay) *except* if the 38 | // handler function on the server side does not return. 
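//
// for example (an illustrative sketch; AppendEntriesArgs/Reply stand in for
// whatever argument and reply structs the caller has defined):
//
//	args := AppendEntriesArgs{}
//	reply := AppendEntriesReply{}
//	ok := end.Call("Raft.AppendEntries", &args, &reply)
//	if !ok {
//		// the request or reply was dropped, or the server is dead/disconnected;
//		// callers usually treat this like a timeout and retry later.
//	}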
39 | // the server RPC handler function must declare its args and reply arguments 40 | // as pointers, so that their types exactly match the types of the arguments 41 | // to Call(). 42 | // 43 | // srv := MakeServer() 44 | // srv.AddService(svc) -- a server can have multiple services, e.g. Raft and k/v 45 | // pass srv to net.AddServer() 46 | // 47 | // svc := MakeService(receiverObject) -- obj's methods will handle RPCs 48 | // much like Go's rpcs.Register() 49 | // pass svc to srv.AddService() 50 | // 51 | 52 | import "../labgob" 53 | import "bytes" 54 | import "reflect" 55 | import "sync" 56 | import "log" 57 | import "strings" 58 | import "math/rand" 59 | import "time" 60 | import "sync/atomic" 61 | 62 | type reqMsg struct { 63 | endname interface{} // name of sending ClientEnd 64 | svcMeth string // e.g. "Raft.AppendEntries" 65 | argsType reflect.Type 66 | args []byte 67 | replyCh chan replyMsg 68 | } 69 | 70 | type replyMsg struct { 71 | ok bool 72 | reply []byte 73 | } 74 | 75 | type ClientEnd struct { 76 | endname interface{} // this end-point's name 77 | ch chan reqMsg // copy of Network.endCh 78 | done chan struct{} // closed when Network is cleaned up 79 | } 80 | 81 | // send an RPC, wait for the reply. 82 | // the return value indicates success; false means that 83 | // no reply was received from the server. 84 | func (e *ClientEnd) Call(svcMeth string, args interface{}, reply interface{}) bool { 85 | req := reqMsg{} 86 | req.endname = e.endname 87 | req.svcMeth = svcMeth 88 | req.argsType = reflect.TypeOf(args) 89 | req.replyCh = make(chan replyMsg) 90 | 91 | qb := new(bytes.Buffer) 92 | qe := labgob.NewEncoder(qb) 93 | qe.Encode(args) 94 | req.args = qb.Bytes() 95 | 96 | // 97 | // send the request. 98 | // 99 | select { 100 | case e.ch <- req: 101 | // the request has been sent. 102 | case <-e.done: 103 | // entire Network has been destroyed. 104 | return false 105 | } 106 | 107 | // 108 | // wait for the reply. 
109 | // 110 | rep := <-req.replyCh 111 | if rep.ok { 112 | rb := bytes.NewBuffer(rep.reply) 113 | rd := labgob.NewDecoder(rb) 114 | if err := rd.Decode(reply); err != nil { 115 | log.Fatalf("ClientEnd.Call(): decode reply: %v\n", err) 116 | } 117 | return true 118 | } else { 119 | return false 120 | } 121 | } 122 | 123 | type Network struct { 124 | mu sync.Mutex 125 | reliable bool 126 | longDelays bool // pause a long time on send on disabled connection 127 | longReordering bool // sometimes delay replies a long time 128 | ends map[interface{}]*ClientEnd // ends, by name 129 | enabled map[interface{}]bool // by end name 130 | servers map[interface{}]*Server // servers, by name 131 | connections map[interface{}]interface{} // endname -> servername 132 | endCh chan reqMsg 133 | done chan struct{} // closed when Network is cleaned up 134 | count int32 // total RPC count, for statistics 135 | bytes int64 // total bytes send, for statistics 136 | } 137 | 138 | func MakeNetwork() *Network { 139 | rn := &Network{} 140 | rn.reliable = true 141 | rn.ends = map[interface{}]*ClientEnd{} 142 | rn.enabled = map[interface{}]bool{} 143 | rn.servers = map[interface{}]*Server{} 144 | rn.connections = map[interface{}](interface{}){} 145 | rn.endCh = make(chan reqMsg) 146 | rn.done = make(chan struct{}) 147 | 148 | // single goroutine to handle all ClientEnd.Call()s 149 | go func() { 150 | for { 151 | select { 152 | case xreq := <-rn.endCh: 153 | atomic.AddInt32(&rn.count, 1) 154 | atomic.AddInt64(&rn.bytes, int64(len(xreq.args))) 155 | go rn.processReq(xreq) 156 | case <-rn.done: 157 | return 158 | } 159 | } 160 | }() 161 | 162 | return rn 163 | } 164 | 165 | func (rn *Network) Cleanup() { 166 | close(rn.done) 167 | } 168 | 169 | func (rn *Network) Reliable(yes bool) { 170 | rn.mu.Lock() 171 | defer rn.mu.Unlock() 172 | 173 | rn.reliable = yes 174 | } 175 | 176 | func (rn *Network) LongReordering(yes bool) { 177 | rn.mu.Lock() 178 | defer rn.mu.Unlock() 179 | 180 | rn.longReordering = yes 181 | } 182 | 183 | func (rn *Network) LongDelays(yes bool) { 184 | rn.mu.Lock() 185 | defer rn.mu.Unlock() 186 | 187 | rn.longDelays = yes 188 | } 189 | 190 | func (rn *Network) readEndnameInfo(endname interface{}) (enabled bool, 191 | servername interface{}, server *Server, reliable bool, longreordering bool, 192 | ) { 193 | rn.mu.Lock() 194 | defer rn.mu.Unlock() 195 | 196 | enabled = rn.enabled[endname] 197 | servername = rn.connections[endname] 198 | if servername != nil { 199 | server = rn.servers[servername] 200 | } 201 | reliable = rn.reliable 202 | longreordering = rn.longReordering 203 | return 204 | } 205 | 206 | func (rn *Network) isServerDead(endname interface{}, servername interface{}, server *Server) bool { 207 | rn.mu.Lock() 208 | defer rn.mu.Unlock() 209 | 210 | if rn.enabled[endname] == false || rn.servers[servername] != server { 211 | return true 212 | } 213 | return false 214 | } 215 | 216 | func (rn *Network) processReq(req reqMsg) { 217 | enabled, servername, server, reliable, longreordering := rn.readEndnameInfo(req.endname) 218 | 219 | if enabled && servername != nil && server != nil { 220 | if reliable == false { 221 | // short delay 222 | ms := (rand.Int() % 27) 223 | time.Sleep(time.Duration(ms) * time.Millisecond) 224 | } 225 | 226 | if reliable == false && (rand.Int()%1000) < 100 { 227 | // drop the request, return as if timeout 228 | req.replyCh <- replyMsg{false, nil} 229 | return 230 | } 231 | 232 | // execute the request (call the RPC handler). 
233 | // in a separate thread so that we can periodically check 234 | // if the server has been killed and the RPC should get a 235 | // failure reply. 236 | ech := make(chan replyMsg) 237 | go func() { 238 | r := server.dispatch(req) 239 | ech <- r 240 | }() 241 | 242 | // wait for handler to return, 243 | // but stop waiting if DeleteServer() has been called, 244 | // and return an error. 245 | var reply replyMsg 246 | replyOK := false 247 | serverDead := false 248 | for replyOK == false && serverDead == false { 249 | select { 250 | case reply = <-ech: 251 | replyOK = true 252 | case <-time.After(100 * time.Millisecond): 253 | serverDead = rn.isServerDead(req.endname, servername, server) 254 | if serverDead { 255 | go func() { 256 | <-ech // drain channel to let the goroutine created earlier terminate 257 | }() 258 | } 259 | } 260 | } 261 | 262 | // do not reply if DeleteServer() has been called, i.e. 263 | // the server has been killed. this is needed to avoid 264 | // situation in which a client gets a positive reply 265 | // to an Append, but the server persisted the update 266 | // into the old Persister. config.go is careful to call 267 | // DeleteServer() before superseding the Persister. 268 | serverDead = rn.isServerDead(req.endname, servername, server) 269 | 270 | if replyOK == false || serverDead == true { 271 | // server was killed while we were waiting; return error. 272 | req.replyCh <- replyMsg{false, nil} 273 | } else if reliable == false && (rand.Int()%1000) < 100 { 274 | // drop the reply, return as if timeout 275 | req.replyCh <- replyMsg{false, nil} 276 | } else if longreordering == true && rand.Intn(900) < 600 { 277 | // delay the response for a while 278 | ms := 200 + rand.Intn(1+rand.Intn(2000)) 279 | // Russ points out that this timer arrangement will decrease 280 | // the number of goroutines, so that the race 281 | // detector is less likely to get upset. 282 | time.AfterFunc(time.Duration(ms)*time.Millisecond, func() { 283 | atomic.AddInt64(&rn.bytes, int64(len(reply.reply))) 284 | req.replyCh <- reply 285 | }) 286 | } else { 287 | atomic.AddInt64(&rn.bytes, int64(len(reply.reply))) 288 | req.replyCh <- reply 289 | } 290 | } else { 291 | // simulate no reply and eventual timeout. 292 | ms := 0 293 | if rn.longDelays { 294 | // let Raft tests check that leader doesn't send 295 | // RPCs synchronously. 296 | ms = (rand.Int() % 7000) 297 | } else { 298 | // many kv tests require the client to try each 299 | // server in fairly rapid succession. 300 | ms = (rand.Int() % 100) 301 | } 302 | time.AfterFunc(time.Duration(ms)*time.Millisecond, func() { 303 | req.replyCh <- replyMsg{false, nil} 304 | }) 305 | } 306 | 307 | } 308 | 309 | // create a client end-point. 310 | // start the thread that listens and delivers. 
311 | func (rn *Network) MakeEnd(endname interface{}) *ClientEnd { 312 | rn.mu.Lock() 313 | defer rn.mu.Unlock() 314 | 315 | if _, ok := rn.ends[endname]; ok { 316 | log.Fatalf("MakeEnd: %v already exists\n", endname) 317 | } 318 | 319 | e := &ClientEnd{} 320 | e.endname = endname 321 | e.ch = rn.endCh 322 | e.done = rn.done 323 | rn.ends[endname] = e 324 | rn.enabled[endname] = false 325 | rn.connections[endname] = nil 326 | 327 | return e 328 | } 329 | 330 | func (rn *Network) AddServer(servername interface{}, rs *Server) { 331 | rn.mu.Lock() 332 | defer rn.mu.Unlock() 333 | 334 | rn.servers[servername] = rs 335 | } 336 | 337 | func (rn *Network) DeleteServer(servername interface{}) { 338 | rn.mu.Lock() 339 | defer rn.mu.Unlock() 340 | 341 | rn.servers[servername] = nil 342 | } 343 | 344 | // connect a ClientEnd to a server. 345 | // a ClientEnd can only be connected once in its lifetime. 346 | func (rn *Network) Connect(endname interface{}, servername interface{}) { 347 | rn.mu.Lock() 348 | defer rn.mu.Unlock() 349 | 350 | rn.connections[endname] = servername 351 | } 352 | 353 | // enable/disable a ClientEnd. 354 | func (rn *Network) Enable(endname interface{}, enabled bool) { 355 | rn.mu.Lock() 356 | defer rn.mu.Unlock() 357 | 358 | rn.enabled[endname] = enabled 359 | } 360 | 361 | // get a server's count of incoming RPCs. 362 | func (rn *Network) GetCount(servername interface{}) int { 363 | rn.mu.Lock() 364 | defer rn.mu.Unlock() 365 | 366 | svr := rn.servers[servername] 367 | return svr.GetCount() 368 | } 369 | 370 | func (rn *Network) GetTotalCount() int { 371 | x := atomic.LoadInt32(&rn.count) 372 | return int(x) 373 | } 374 | 375 | func (rn *Network) GetTotalBytes() int64 { 376 | x := atomic.LoadInt64(&rn.bytes) 377 | return x 378 | } 379 | 380 | // 381 | // a server is a collection of services, all sharing 382 | // the same rpc dispatcher. so that e.g. both a Raft 383 | // and a k/v server can listen to the same rpc endpoint. 384 | // 385 | type Server struct { 386 | mu sync.Mutex 387 | services map[string]*Service 388 | count int // incoming RPCs 389 | } 390 | 391 | func MakeServer() *Server { 392 | rs := &Server{} 393 | rs.services = map[string]*Service{} 394 | return rs 395 | } 396 | 397 | func (rs *Server) AddService(svc *Service) { 398 | rs.mu.Lock() 399 | defer rs.mu.Unlock() 400 | rs.services[svc.name] = svc 401 | } 402 | 403 | func (rs *Server) dispatch(req reqMsg) replyMsg { 404 | rs.mu.Lock() 405 | 406 | rs.count += 1 407 | 408 | // split Raft.AppendEntries into service and method 409 | dot := strings.LastIndex(req.svcMeth, ".") 410 | serviceName := req.svcMeth[:dot] 411 | methodName := req.svcMeth[dot+1:] 412 | 413 | service, ok := rs.services[serviceName] 414 | 415 | rs.mu.Unlock() 416 | 417 | if ok { 418 | return service.dispatch(methodName, req) 419 | } else { 420 | choices := []string{} 421 | for k, _ := range rs.services { 422 | choices = append(choices, k) 423 | } 424 | log.Fatalf("labrpc.Server.dispatch(): unknown service %v in %v.%v; expecting one of %v\n", 425 | serviceName, serviceName, methodName, choices) 426 | return replyMsg{false, nil} 427 | } 428 | } 429 | 430 | func (rs *Server) GetCount() int { 431 | rs.mu.Lock() 432 | defer rs.mu.Unlock() 433 | return rs.count 434 | } 435 | 436 | // an object with methods that can be called via RPC. 437 | // a single server may have more than one Service. 
438 | type Service struct { 439 | name string 440 | rcvr reflect.Value 441 | typ reflect.Type 442 | methods map[string]reflect.Method 443 | } 444 | 445 | func MakeService(rcvr interface{}) *Service { 446 | svc := &Service{} 447 | svc.typ = reflect.TypeOf(rcvr) 448 | svc.rcvr = reflect.ValueOf(rcvr) 449 | svc.name = reflect.Indirect(svc.rcvr).Type().Name() 450 | svc.methods = map[string]reflect.Method{} 451 | 452 | for m := 0; m < svc.typ.NumMethod(); m++ { 453 | method := svc.typ.Method(m) 454 | mtype := method.Type 455 | mname := method.Name 456 | 457 | //fmt.Printf("%v pp %v ni %v 1k %v 2k %v no %v\n", 458 | // mname, method.PkgPath, mtype.NumIn(), mtype.In(1).Kind(), mtype.In(2).Kind(), mtype.NumOut()) 459 | 460 | if method.PkgPath != "" || // capitalized? 461 | mtype.NumIn() != 3 || 462 | //mtype.In(1).Kind() != reflect.Ptr || 463 | mtype.In(2).Kind() != reflect.Ptr || 464 | mtype.NumOut() != 0 { 465 | // the method is not suitable for a handler 466 | //fmt.Printf("bad method: %v\n", mname) 467 | } else { 468 | // the method looks like a handler 469 | svc.methods[mname] = method 470 | } 471 | } 472 | 473 | return svc 474 | } 475 | 476 | func (svc *Service) dispatch(methname string, req reqMsg) replyMsg { 477 | if method, ok := svc.methods[methname]; ok { 478 | // prepare space into which to read the argument. 479 | // the Value's type will be a pointer to req.argsType. 480 | args := reflect.New(req.argsType) 481 | 482 | // decode the argument. 483 | ab := bytes.NewBuffer(req.args) 484 | ad := labgob.NewDecoder(ab) 485 | ad.Decode(args.Interface()) 486 | 487 | // allocate space for the reply. 488 | replyType := method.Type.In(2) 489 | replyType = replyType.Elem() 490 | replyv := reflect.New(replyType) 491 | 492 | // call the method. 493 | function := method.Func 494 | function.Call([]reflect.Value{svc.rcvr, args.Elem(), replyv}) 495 | 496 | // encode the reply. 
497 | rb := new(bytes.Buffer) 498 | re := labgob.NewEncoder(rb) 499 | re.EncodeValue(replyv) 500 | 501 | return replyMsg{true, rb.Bytes()} 502 | } else { 503 | choices := []string{} 504 | for k, _ := range svc.methods { 505 | choices = append(choices, k) 506 | } 507 | log.Fatalf("labrpc.Service.dispatch(): unknown method %v in %v; expecting one of %v\n", 508 | methname, req.svcMeth, choices) 509 | return replyMsg{false, nil} 510 | } 511 | } 512 | -------------------------------------------------------------------------------- /project1/src/labrpc/test_test.go: -------------------------------------------------------------------------------- 1 | package labrpc 2 | 3 | import "testing" 4 | import "strconv" 5 | import "sync" 6 | import "runtime" 7 | import "time" 8 | import "fmt" 9 | 10 | type JunkArgs struct { 11 | X int 12 | } 13 | type JunkReply struct { 14 | X string 15 | } 16 | 17 | type JunkServer struct { 18 | mu sync.Mutex 19 | log1 []string 20 | log2 []int 21 | } 22 | 23 | func (js *JunkServer) Handler1(args string, reply *int) { 24 | js.mu.Lock() 25 | defer js.mu.Unlock() 26 | js.log1 = append(js.log1, args) 27 | *reply, _ = strconv.Atoi(args) 28 | } 29 | 30 | func (js *JunkServer) Handler2(args int, reply *string) { 31 | js.mu.Lock() 32 | defer js.mu.Unlock() 33 | js.log2 = append(js.log2, args) 34 | *reply = "handler2-" + strconv.Itoa(args) 35 | } 36 | 37 | func (js *JunkServer) Handler3(args int, reply *int) { 38 | js.mu.Lock() 39 | defer js.mu.Unlock() 40 | time.Sleep(20 * time.Second) 41 | *reply = -args 42 | } 43 | 44 | // args is a pointer 45 | func (js *JunkServer) Handler4(args *JunkArgs, reply *JunkReply) { 46 | reply.X = "pointer" 47 | } 48 | 49 | // args is a not pointer 50 | func (js *JunkServer) Handler5(args JunkArgs, reply *JunkReply) { 51 | reply.X = "no pointer" 52 | } 53 | 54 | func (js *JunkServer) Handler6(args string, reply *int) { 55 | js.mu.Lock() 56 | defer js.mu.Unlock() 57 | *reply = len(args) 58 | } 59 | 60 | func (js *JunkServer) Handler7(args int, reply *string) { 61 | js.mu.Lock() 62 | defer js.mu.Unlock() 63 | *reply = "" 64 | for i := 0; i < args; i++ { 65 | *reply = *reply + "y" 66 | } 67 | } 68 | 69 | func TestBasic(t *testing.T) { 70 | runtime.GOMAXPROCS(4) 71 | 72 | rn := MakeNetwork() 73 | defer rn.Cleanup() 74 | 75 | e := rn.MakeEnd("end1-99") 76 | 77 | js := &JunkServer{} 78 | svc := MakeService(js) 79 | 80 | rs := MakeServer() 81 | rs.AddService(svc) 82 | rn.AddServer("server99", rs) 83 | 84 | rn.Connect("end1-99", "server99") 85 | rn.Enable("end1-99", true) 86 | 87 | { 88 | reply := "" 89 | e.Call("JunkServer.Handler2", 111, &reply) 90 | if reply != "handler2-111" { 91 | t.Fatalf("wrong reply from Handler2") 92 | } 93 | } 94 | 95 | { 96 | reply := 0 97 | e.Call("JunkServer.Handler1", "9099", &reply) 98 | if reply != 9099 { 99 | t.Fatalf("wrong reply from Handler1") 100 | } 101 | } 102 | } 103 | 104 | func TestTypes(t *testing.T) { 105 | runtime.GOMAXPROCS(4) 106 | 107 | rn := MakeNetwork() 108 | defer rn.Cleanup() 109 | 110 | e := rn.MakeEnd("end1-99") 111 | 112 | js := &JunkServer{} 113 | svc := MakeService(js) 114 | 115 | rs := MakeServer() 116 | rs.AddService(svc) 117 | rn.AddServer("server99", rs) 118 | 119 | rn.Connect("end1-99", "server99") 120 | rn.Enable("end1-99", true) 121 | 122 | { 123 | var args JunkArgs 124 | var reply JunkReply 125 | // args must match type (pointer or not) of handler. 
126 | e.Call("JunkServer.Handler4", &args, &reply) 127 | if reply.X != "pointer" { 128 | t.Fatalf("wrong reply from Handler4") 129 | } 130 | } 131 | 132 | { 133 | var args JunkArgs 134 | var reply JunkReply 135 | // args must match type (pointer or not) of handler. 136 | e.Call("JunkServer.Handler5", args, &reply) 137 | if reply.X != "no pointer" { 138 | t.Fatalf("wrong reply from Handler5") 139 | } 140 | } 141 | } 142 | 143 | // 144 | // does net.Enable(endname, false) really disconnect a client? 145 | // 146 | func TestDisconnect(t *testing.T) { 147 | runtime.GOMAXPROCS(4) 148 | 149 | rn := MakeNetwork() 150 | defer rn.Cleanup() 151 | 152 | e := rn.MakeEnd("end1-99") 153 | 154 | js := &JunkServer{} 155 | svc := MakeService(js) 156 | 157 | rs := MakeServer() 158 | rs.AddService(svc) 159 | rn.AddServer("server99", rs) 160 | 161 | rn.Connect("end1-99", "server99") 162 | 163 | { 164 | reply := "" 165 | e.Call("JunkServer.Handler2", 111, &reply) 166 | if reply != "" { 167 | t.Fatalf("unexpected reply from Handler2") 168 | } 169 | } 170 | 171 | rn.Enable("end1-99", true) 172 | 173 | { 174 | reply := 0 175 | e.Call("JunkServer.Handler1", "9099", &reply) 176 | if reply != 9099 { 177 | t.Fatalf("wrong reply from Handler1") 178 | } 179 | } 180 | } 181 | 182 | // 183 | // test net.GetCount() 184 | // 185 | func TestCounts(t *testing.T) { 186 | runtime.GOMAXPROCS(4) 187 | 188 | rn := MakeNetwork() 189 | defer rn.Cleanup() 190 | 191 | e := rn.MakeEnd("end1-99") 192 | 193 | js := &JunkServer{} 194 | svc := MakeService(js) 195 | 196 | rs := MakeServer() 197 | rs.AddService(svc) 198 | rn.AddServer(99, rs) 199 | 200 | rn.Connect("end1-99", 99) 201 | rn.Enable("end1-99", true) 202 | 203 | for i := 0; i < 17; i++ { 204 | reply := "" 205 | e.Call("JunkServer.Handler2", i, &reply) 206 | wanted := "handler2-" + strconv.Itoa(i) 207 | if reply != wanted { 208 | t.Fatalf("wrong reply %v from Handler1, expecting %v", reply, wanted) 209 | } 210 | } 211 | 212 | n := rn.GetCount(99) 213 | if n != 17 { 214 | t.Fatalf("wrong GetCount() %v, expected 17\n", n) 215 | } 216 | } 217 | 218 | // 219 | // test net.GetTotalBytes() 220 | // 221 | func TestBytes(t *testing.T) { 222 | runtime.GOMAXPROCS(4) 223 | 224 | rn := MakeNetwork() 225 | defer rn.Cleanup() 226 | 227 | e := rn.MakeEnd("end1-99") 228 | 229 | js := &JunkServer{} 230 | svc := MakeService(js) 231 | 232 | rs := MakeServer() 233 | rs.AddService(svc) 234 | rn.AddServer(99, rs) 235 | 236 | rn.Connect("end1-99", 99) 237 | rn.Enable("end1-99", true) 238 | 239 | for i := 0; i < 17; i++ { 240 | args := "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" 241 | args = args + args 242 | args = args + args 243 | reply := 0 244 | e.Call("JunkServer.Handler6", args, &reply) 245 | wanted := len(args) 246 | if reply != wanted { 247 | t.Fatalf("wrong reply %v from Handler6, expecting %v", reply, wanted) 248 | } 249 | } 250 | 251 | n := rn.GetTotalBytes() 252 | if n < 4828 || n > 6000 { 253 | t.Fatalf("wrong GetTotalBytes() %v, expected about 5000\n", n) 254 | } 255 | 256 | for i := 0; i < 17; i++ { 257 | args := 107 258 | reply := "" 259 | e.Call("JunkServer.Handler7", args, &reply) 260 | wanted := args 261 | if len(reply) != wanted { 262 | t.Fatalf("wrong reply len=%v from Handler6, expecting %v", len(reply), wanted) 263 | } 264 | } 265 | 266 | nn := rn.GetTotalBytes() - n 267 | if nn < 1800 || nn > 2500 { 268 | t.Fatalf("wrong GetTotalBytes() %v, expected about 2000\n", nn) 269 | } 270 | } 271 | 272 | // 273 | // test RPCs from concurrent ClientEnds 274 | 
// 275 | func TestConcurrentMany(t *testing.T) { 276 | runtime.GOMAXPROCS(4) 277 | 278 | rn := MakeNetwork() 279 | defer rn.Cleanup() 280 | 281 | js := &JunkServer{} 282 | svc := MakeService(js) 283 | 284 | rs := MakeServer() 285 | rs.AddService(svc) 286 | rn.AddServer(1000, rs) 287 | 288 | ch := make(chan int) 289 | 290 | nclients := 20 291 | nrpcs := 10 292 | for ii := 0; ii < nclients; ii++ { 293 | go func(i int) { 294 | n := 0 295 | defer func() { ch <- n }() 296 | 297 | e := rn.MakeEnd(i) 298 | rn.Connect(i, 1000) 299 | rn.Enable(i, true) 300 | 301 | for j := 0; j < nrpcs; j++ { 302 | arg := i*100 + j 303 | reply := "" 304 | e.Call("JunkServer.Handler2", arg, &reply) 305 | wanted := "handler2-" + strconv.Itoa(arg) 306 | if reply != wanted { 307 | t.Fatalf("wrong reply %v from Handler1, expecting %v", reply, wanted) 308 | } 309 | n += 1 310 | } 311 | }(ii) 312 | } 313 | 314 | total := 0 315 | for ii := 0; ii < nclients; ii++ { 316 | x := <-ch 317 | total += x 318 | } 319 | 320 | if total != nclients*nrpcs { 321 | t.Fatalf("wrong number of RPCs completed, got %v, expected %v", total, nclients*nrpcs) 322 | } 323 | 324 | n := rn.GetCount(1000) 325 | if n != total { 326 | t.Fatalf("wrong GetCount() %v, expected %v\n", n, total) 327 | } 328 | } 329 | 330 | // 331 | // test unreliable 332 | // 333 | func TestUnreliable(t *testing.T) { 334 | runtime.GOMAXPROCS(4) 335 | 336 | rn := MakeNetwork() 337 | defer rn.Cleanup() 338 | rn.Reliable(false) 339 | 340 | js := &JunkServer{} 341 | svc := MakeService(js) 342 | 343 | rs := MakeServer() 344 | rs.AddService(svc) 345 | rn.AddServer(1000, rs) 346 | 347 | ch := make(chan int) 348 | 349 | nclients := 300 350 | for ii := 0; ii < nclients; ii++ { 351 | go func(i int) { 352 | n := 0 353 | defer func() { ch <- n }() 354 | 355 | e := rn.MakeEnd(i) 356 | rn.Connect(i, 1000) 357 | rn.Enable(i, true) 358 | 359 | arg := i * 100 360 | reply := "" 361 | ok := e.Call("JunkServer.Handler2", arg, &reply) 362 | if ok { 363 | wanted := "handler2-" + strconv.Itoa(arg) 364 | if reply != wanted { 365 | t.Fatalf("wrong reply %v from Handler1, expecting %v", reply, wanted) 366 | } 367 | n += 1 368 | } 369 | }(ii) 370 | } 371 | 372 | total := 0 373 | for ii := 0; ii < nclients; ii++ { 374 | x := <-ch 375 | total += x 376 | } 377 | 378 | if total == nclients || total == 0 { 379 | t.Fatalf("all RPCs succeeded despite unreliable") 380 | } 381 | } 382 | 383 | // 384 | // test concurrent RPCs from a single ClientEnd 385 | // 386 | func TestConcurrentOne(t *testing.T) { 387 | runtime.GOMAXPROCS(4) 388 | 389 | rn := MakeNetwork() 390 | defer rn.Cleanup() 391 | 392 | js := &JunkServer{} 393 | svc := MakeService(js) 394 | 395 | rs := MakeServer() 396 | rs.AddService(svc) 397 | rn.AddServer(1000, rs) 398 | 399 | e := rn.MakeEnd("c") 400 | rn.Connect("c", 1000) 401 | rn.Enable("c", true) 402 | 403 | ch := make(chan int) 404 | 405 | nrpcs := 20 406 | for ii := 0; ii < nrpcs; ii++ { 407 | go func(i int) { 408 | n := 0 409 | defer func() { ch <- n }() 410 | 411 | arg := 100 + i 412 | reply := "" 413 | e.Call("JunkServer.Handler2", arg, &reply) 414 | wanted := "handler2-" + strconv.Itoa(arg) 415 | if reply != wanted { 416 | t.Fatalf("wrong reply %v from Handler2, expecting %v", reply, wanted) 417 | } 418 | n += 1 419 | }(ii) 420 | } 421 | 422 | total := 0 423 | for ii := 0; ii < nrpcs; ii++ { 424 | x := <-ch 425 | total += x 426 | } 427 | 428 | if total != nrpcs { 429 | t.Fatalf("wrong number of RPCs completed, got %v, expected %v", total, nrpcs) 430 | } 431 | 432 | js.mu.Lock() 433 | 
defer js.mu.Unlock() 434 | if len(js.log2) != nrpcs { 435 | t.Fatalf("wrong number of RPCs delivered") 436 | } 437 | 438 | n := rn.GetCount(1000) 439 | if n != total { 440 | t.Fatalf("wrong GetCount() %v, expected %v\n", n, total) 441 | } 442 | } 443 | 444 | // 445 | // regression: an RPC that's delayed during Enabled=false 446 | // should not delay subsequent RPCs (e.g. after Enabled=true). 447 | // 448 | func TestRegression1(t *testing.T) { 449 | runtime.GOMAXPROCS(4) 450 | 451 | rn := MakeNetwork() 452 | defer rn.Cleanup() 453 | 454 | js := &JunkServer{} 455 | svc := MakeService(js) 456 | 457 | rs := MakeServer() 458 | rs.AddService(svc) 459 | rn.AddServer(1000, rs) 460 | 461 | e := rn.MakeEnd("c") 462 | rn.Connect("c", 1000) 463 | 464 | // start some RPCs while the ClientEnd is disabled. 465 | // they'll be delayed. 466 | rn.Enable("c", false) 467 | ch := make(chan bool) 468 | nrpcs := 20 469 | for ii := 0; ii < nrpcs; ii++ { 470 | go func(i int) { 471 | ok := false 472 | defer func() { ch <- ok }() 473 | 474 | arg := 100 + i 475 | reply := "" 476 | // this call ought to return false. 477 | e.Call("JunkServer.Handler2", arg, &reply) 478 | ok = true 479 | }(ii) 480 | } 481 | 482 | time.Sleep(100 * time.Millisecond) 483 | 484 | // now enable the ClientEnd and check that an RPC completes quickly. 485 | t0 := time.Now() 486 | rn.Enable("c", true) 487 | { 488 | arg := 99 489 | reply := "" 490 | e.Call("JunkServer.Handler2", arg, &reply) 491 | wanted := "handler2-" + strconv.Itoa(arg) 492 | if reply != wanted { 493 | t.Fatalf("wrong reply %v from Handler2, expecting %v", reply, wanted) 494 | } 495 | } 496 | dur := time.Since(t0).Seconds() 497 | 498 | if dur > 0.03 { 499 | t.Fatalf("RPC took too long (%v) after Enable", dur) 500 | } 501 | 502 | for ii := 0; ii < nrpcs; ii++ { 503 | <-ch 504 | } 505 | 506 | js.mu.Lock() 507 | defer js.mu.Unlock() 508 | if len(js.log2) != 1 { 509 | t.Fatalf("wrong number (%v) of RPCs delivered, expected 1", len(js.log2)) 510 | } 511 | 512 | n := rn.GetCount(1000) 513 | if n != 1 { 514 | t.Fatalf("wrong GetCount() %v, expected %v\n", n, 1) 515 | } 516 | } 517 | 518 | // 519 | // if an RPC is stuck in a server, and the server 520 | // is killed with DeleteServer(), does the RPC 521 | // get un-stuck? 
522 | // 523 | func TestKilled(t *testing.T) { 524 | runtime.GOMAXPROCS(4) 525 | 526 | rn := MakeNetwork() 527 | defer rn.Cleanup() 528 | 529 | e := rn.MakeEnd("end1-99") 530 | 531 | js := &JunkServer{} 532 | svc := MakeService(js) 533 | 534 | rs := MakeServer() 535 | rs.AddService(svc) 536 | rn.AddServer("server99", rs) 537 | 538 | rn.Connect("end1-99", "server99") 539 | rn.Enable("end1-99", true) 540 | 541 | doneCh := make(chan bool) 542 | go func() { 543 | reply := 0 544 | ok := e.Call("JunkServer.Handler3", 99, &reply) 545 | doneCh <- ok 546 | }() 547 | 548 | time.Sleep(1000 * time.Millisecond) 549 | 550 | select { 551 | case <-doneCh: 552 | t.Fatalf("Handler3 should not have returned yet") 553 | case <-time.After(100 * time.Millisecond): 554 | } 555 | 556 | rn.DeleteServer("server99") 557 | 558 | select { 559 | case x := <-doneCh: 560 | if x != false { 561 | t.Fatalf("Handler3 returned successfully despite DeleteServer()") 562 | } 563 | case <-time.After(100 * time.Millisecond): 564 | t.Fatalf("Handler3 should return after DeleteServer()") 565 | } 566 | } 567 | 568 | func TestBenchmark(t *testing.T) { 569 | runtime.GOMAXPROCS(4) 570 | 571 | rn := MakeNetwork() 572 | defer rn.Cleanup() 573 | 574 | e := rn.MakeEnd("end1-99") 575 | 576 | js := &JunkServer{} 577 | svc := MakeService(js) 578 | 579 | rs := MakeServer() 580 | rs.AddService(svc) 581 | rn.AddServer("server99", rs) 582 | 583 | rn.Connect("end1-99", "server99") 584 | rn.Enable("end1-99", true) 585 | 586 | t0 := time.Now() 587 | n := 100000 588 | for iters := 0; iters < n; iters++ { 589 | reply := "" 590 | e.Call("JunkServer.Handler2", 111, &reply) 591 | if reply != "handler2-111" { 592 | t.Fatalf("wrong reply from Handler2") 593 | } 594 | } 595 | fmt.Printf("%v for %v\n", time.Since(t0), n) 596 | // march 2016, rtm laptop, 22 microseconds per RPC 597 | } 598 | -------------------------------------------------------------------------------- /project1/src/raft/config.go: -------------------------------------------------------------------------------- 1 | package raft 2 | 3 | // 4 | // support for Raft tester. 5 | // 6 | // we will use the original config.go to test your code for grading. 7 | // so, while you can modify this code to help you debug, please 8 | // test with the original before submitting. 
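//
// to run this tester (a sketch using standard Go tooling; the -run pattern
// depends on the test names in test_test.go):
//
//	cd project1/src/raft
//	go test -race
//	go test -race -run <SingleTestName>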
9 | // 10 | 11 | import "../labrpc" 12 | import "log" 13 | import "sync" 14 | import "testing" 15 | import "runtime" 16 | import "math/rand" 17 | import crand "crypto/rand" 18 | import "math/big" 19 | import "encoding/base64" 20 | import "time" 21 | import "fmt" 22 | 23 | func randstring(n int) string { 24 | b := make([]byte, 2*n) 25 | crand.Read(b) 26 | s := base64.URLEncoding.EncodeToString(b) 27 | return s[0:n] 28 | } 29 | 30 | func makeSeed() int64 { 31 | max := big.NewInt(int64(1) << 62) 32 | bigx, _ := crand.Int(crand.Reader, max) 33 | x := bigx.Int64() 34 | return x 35 | } 36 | 37 | type config struct { 38 | mu sync.Mutex 39 | t *testing.T 40 | net *labrpc.Network 41 | n int 42 | rafts []*Raft 43 | applyErr []string // from apply channel readers 44 | connected []bool // whether each server is on the net 45 | saved []*Persister 46 | endnames [][]string // the port file names each sends to 47 | logs []map[int]interface{} // copy of each server's committed entries 48 | start time.Time // time at which make_config() was called 49 | // begin()/end() statistics 50 | t0 time.Time // time at which test_test.go called cfg.begin() 51 | rpcs0 int // rpcTotal() at start of test 52 | cmds0 int // number of agreements 53 | bytes0 int64 54 | maxIndex int 55 | maxIndex0 int 56 | } 57 | 58 | var ncpu_once sync.Once 59 | 60 | func make_config(t *testing.T, n int, unreliable bool) *config { 61 | ncpu_once.Do(func() { 62 | if runtime.NumCPU() < 2 { 63 | fmt.Printf("warning: only one CPU, which may conceal locking bugs\n") 64 | } 65 | rand.Seed(makeSeed()) 66 | }) 67 | runtime.GOMAXPROCS(4) 68 | cfg := &config{} 69 | cfg.t = t 70 | cfg.net = labrpc.MakeNetwork() 71 | cfg.n = n 72 | cfg.applyErr = make([]string, cfg.n) 73 | cfg.rafts = make([]*Raft, cfg.n) 74 | cfg.connected = make([]bool, cfg.n) 75 | cfg.saved = make([]*Persister, cfg.n) 76 | cfg.endnames = make([][]string, cfg.n) 77 | cfg.logs = make([]map[int]interface{}, cfg.n) 78 | cfg.start = time.Now() 79 | 80 | cfg.setunreliable(unreliable) 81 | 82 | cfg.net.LongDelays(true) 83 | 84 | // create a full set of Rafts. 85 | for i := 0; i < cfg.n; i++ { 86 | cfg.logs[i] = map[int]interface{}{} 87 | cfg.start1(i) 88 | } 89 | 90 | // connect everyone 91 | for i := 0; i < cfg.n; i++ { 92 | cfg.connect(i) 93 | } 94 | 95 | return cfg 96 | } 97 | 98 | // shut down a Raft server but save its persistent state. 99 | func (cfg *config) crash1(i int) { 100 | cfg.disconnect(i) 101 | cfg.net.DeleteServer(i) // disable client connections to the server. 102 | 103 | cfg.mu.Lock() 104 | defer cfg.mu.Unlock() 105 | 106 | // a fresh persister, in case old instance 107 | // continues to update the Persister. 108 | // but copy old persister's content so that we always 109 | // pass Make() the last persisted state. 110 | if cfg.saved[i] != nil { 111 | cfg.saved[i] = cfg.saved[i].Copy() 112 | } 113 | 114 | rf := cfg.rafts[i] 115 | if rf != nil { 116 | cfg.mu.Unlock() 117 | rf.Kill() 118 | cfg.mu.Lock() 119 | cfg.rafts[i] = nil 120 | } 121 | 122 | if cfg.saved[i] != nil { 123 | raftlog := cfg.saved[i].ReadRaftState() 124 | cfg.saved[i] = &Persister{} 125 | cfg.saved[i].SaveRaftState(raftlog) 126 | } 127 | } 128 | 129 | // 130 | // start or re-start a Raft. 131 | // if one already exists, "kill" it first. 132 | // allocate new outgoing port file names, and a new 133 | // state persister, to isolate previous instance of 134 | // this server. since we cannot really kill it. 
135 | // 136 | func (cfg *config) start1(i int) { 137 | cfg.crash1(i) 138 | 139 | // a fresh set of outgoing ClientEnd names. 140 | // so that old crashed instance's ClientEnds can't send. 141 | cfg.endnames[i] = make([]string, cfg.n) 142 | for j := 0; j < cfg.n; j++ { 143 | cfg.endnames[i][j] = randstring(20) 144 | } 145 | 146 | // a fresh set of ClientEnds. 147 | ends := make([]*labrpc.ClientEnd, cfg.n) 148 | for j := 0; j < cfg.n; j++ { 149 | ends[j] = cfg.net.MakeEnd(cfg.endnames[i][j]) 150 | cfg.net.Connect(cfg.endnames[i][j], j) 151 | } 152 | 153 | cfg.mu.Lock() 154 | 155 | // a fresh persister, so old instance doesn't overwrite 156 | // new instance's persisted state. 157 | // but copy old persister's content so that we always 158 | // pass Make() the last persisted state. 159 | if cfg.saved[i] != nil { 160 | cfg.saved[i] = cfg.saved[i].Copy() 161 | } else { 162 | cfg.saved[i] = MakePersister() 163 | } 164 | 165 | cfg.mu.Unlock() 166 | 167 | // listen to messages from Raft indicating newly committed messages. 168 | applyCh := make(chan ApplyMsg) 169 | go func() { 170 | for m := range applyCh { 171 | err_msg := "" 172 | if m.CommandValid == false { 173 | // ignore other types of ApplyMsg 174 | } else { 175 | v := m.Command 176 | cfg.mu.Lock() 177 | for j := 0; j < len(cfg.logs); j++ { 178 | if old, oldok := cfg.logs[j][m.CommandIndex]; oldok && old != v { 179 | // some server has already committed a different value for this entry! 180 | err_msg = fmt.Sprintf("commit index=%v server=%v %v != server=%v %v", 181 | m.CommandIndex, i, m.Command, j, old) 182 | } 183 | } 184 | _, prevok := cfg.logs[i][m.CommandIndex-1] 185 | cfg.logs[i][m.CommandIndex] = v 186 | if m.CommandIndex > cfg.maxIndex { 187 | cfg.maxIndex = m.CommandIndex 188 | } 189 | cfg.mu.Unlock() 190 | 191 | if m.CommandIndex > 1 && prevok == false { 192 | err_msg = fmt.Sprintf("server %v apply out of order %v", i, m.CommandIndex) 193 | } 194 | } 195 | 196 | if err_msg != "" { 197 | log.Fatalf("apply error: %v\n", err_msg) 198 | cfg.applyErr[i] = err_msg 199 | // keep reading after error so that Raft doesn't block 200 | // holding locks... 201 | } 202 | } 203 | }() 204 | 205 | rf := Make(ends, i, cfg.saved[i], applyCh) 206 | 207 | cfg.mu.Lock() 208 | cfg.rafts[i] = rf 209 | cfg.mu.Unlock() 210 | 211 | svc := labrpc.MakeService(rf) 212 | srv := labrpc.MakeServer() 213 | srv.AddService(svc) 214 | cfg.net.AddServer(i, srv) 215 | } 216 | 217 | func (cfg *config) checkTimeout() { 218 | // enforce a two minute real-time limit on each test 219 | if !cfg.t.Failed() && time.Since(cfg.start) > 120*time.Second { 220 | cfg.t.Fatal("test took longer than 120 seconds") 221 | } 222 | } 223 | 224 | func (cfg *config) cleanup() { 225 | for i := 0; i < len(cfg.rafts); i++ { 226 | if cfg.rafts[i] != nil { 227 | cfg.rafts[i].Kill() 228 | } 229 | } 230 | cfg.net.Cleanup() 231 | cfg.checkTimeout() 232 | } 233 | 234 | // attach server i to the net. 235 | func (cfg *config) connect(i int) { 236 | // fmt.Printf("connect(%d)\n", i) 237 | 238 | cfg.connected[i] = true 239 | 240 | // outgoing ClientEnds 241 | for j := 0; j < cfg.n; j++ { 242 | if cfg.connected[j] { 243 | endname := cfg.endnames[i][j] 244 | cfg.net.Enable(endname, true) 245 | } 246 | } 247 | 248 | // incoming ClientEnds 249 | for j := 0; j < cfg.n; j++ { 250 | if cfg.connected[j] { 251 | endname := cfg.endnames[j][i] 252 | cfg.net.Enable(endname, true) 253 | } 254 | } 255 | } 256 | 257 | // detach server i from the net. 
258 | func (cfg *config) disconnect(i int) { 259 | // fmt.Printf("disconnect(%d)\n", i) 260 | 261 | cfg.connected[i] = false 262 | 263 | // outgoing ClientEnds 264 | for j := 0; j < cfg.n; j++ { 265 | if cfg.endnames[i] != nil { 266 | endname := cfg.endnames[i][j] 267 | cfg.net.Enable(endname, false) 268 | } 269 | } 270 | 271 | // incoming ClientEnds 272 | for j := 0; j < cfg.n; j++ { 273 | if cfg.endnames[j] != nil { 274 | endname := cfg.endnames[j][i] 275 | cfg.net.Enable(endname, false) 276 | } 277 | } 278 | } 279 | 280 | func (cfg *config) rpcCount(server int) int { 281 | return cfg.net.GetCount(server) 282 | } 283 | 284 | func (cfg *config) rpcTotal() int { 285 | return cfg.net.GetTotalCount() 286 | } 287 | 288 | func (cfg *config) setunreliable(unrel bool) { 289 | cfg.net.Reliable(!unrel) 290 | } 291 | 292 | func (cfg *config) bytesTotal() int64 { 293 | return cfg.net.GetTotalBytes() 294 | } 295 | 296 | func (cfg *config) setlongreordering(longrel bool) { 297 | cfg.net.LongReordering(longrel) 298 | } 299 | 300 | // check that there's exactly one leader. 301 | // try a few times in case re-elections are needed. 302 | func (cfg *config) checkOneLeader() int { 303 | for iters := 0; iters < 10; iters++ { 304 | ms := 450 + (rand.Int63() % 100) 305 | time.Sleep(time.Duration(ms) * time.Millisecond) 306 | 307 | leaders := make(map[int][]int) 308 | for i := 0; i < cfg.n; i++ { 309 | if cfg.connected[i] { 310 | if term, leader := cfg.rafts[i].GetState(); leader { 311 | leaders[term] = append(leaders[term], i) 312 | } 313 | } 314 | } 315 | 316 | lastTermWithLeader := -1 317 | for term, leaders := range leaders { 318 | if len(leaders) > 1 { 319 | cfg.t.Fatalf("term %d has %d (>1) leaders", term, len(leaders)) 320 | } 321 | if term > lastTermWithLeader { 322 | lastTermWithLeader = term 323 | } 324 | } 325 | 326 | if len(leaders) != 0 { 327 | return leaders[lastTermWithLeader][0] 328 | } 329 | } 330 | cfg.t.Fatalf("expected one leader, got none") 331 | return -1 332 | } 333 | 334 | // check that everyone agrees on the term. 335 | func (cfg *config) checkTerms() int { 336 | term := -1 337 | for i := 0; i < cfg.n; i++ { 338 | if cfg.connected[i] { 339 | xterm, _ := cfg.rafts[i].GetState() 340 | if term == -1 { 341 | term = xterm 342 | } else if term != xterm { 343 | cfg.t.Fatalf("servers disagree on term") 344 | } 345 | } 346 | } 347 | return term 348 | } 349 | 350 | // check that there's no leader 351 | func (cfg *config) checkNoLeader() { 352 | for i := 0; i < cfg.n; i++ { 353 | if cfg.connected[i] { 354 | _, is_leader := cfg.rafts[i].GetState() 355 | if is_leader { 356 | cfg.t.Fatalf("expected no leader, but %v claims to be leader", i) 357 | } 358 | } 359 | } 360 | } 361 | 362 | // how many servers think a log entry is committed? 363 | func (cfg *config) nCommitted(index int) (int, interface{}) { 364 | count := 0 365 | var cmd interface{} = nil 366 | for i := 0; i < len(cfg.rafts); i++ { 367 | if cfg.applyErr[i] != "" { 368 | cfg.t.Fatal(cfg.applyErr[i]) 369 | } 370 | 371 | cfg.mu.Lock() 372 | cmd1, ok := cfg.logs[i][index] 373 | cfg.mu.Unlock() 374 | 375 | if ok { 376 | if count > 0 && cmd != cmd1 { 377 | cfg.t.Fatalf("committed values do not match: index %v, %v, %v\n", 378 | index, cmd, cmd1) 379 | } 380 | count += 1 381 | cmd = cmd1 382 | } 383 | } 384 | return count, cmd 385 | } 386 | 387 | // wait for at least n servers to commit. 388 | // but don't wait forever. 
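// Illustrative note (editorial addition, not part of the original config.go):
// when startTerm > -1 and some peer has advanced past it, wait() returns -1
// instead of a command, so callers typically check for that sentinel, e.g.
//
//   v := cfg.wait(index, servers, term)
//   if ix, ok := v.(int); ok && ix == -1 {
//       // the term moved on; agreement at this index can no longer be expected
//   }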
389 | func (cfg *config) wait(index int, n int, startTerm int) interface{} { 390 | to := 10 * time.Millisecond 391 | for iters := 0; iters < 30; iters++ { 392 | nd, _ := cfg.nCommitted(index) 393 | if nd >= n { 394 | break 395 | } 396 | time.Sleep(to) 397 | if to < time.Second { 398 | to *= 2 399 | } 400 | if startTerm > -1 { 401 | for _, r := range cfg.rafts { 402 | if t, _ := r.GetState(); t > startTerm { 403 | // someone has moved on 404 | // can no longer guarantee that we'll "win" 405 | return -1 406 | } 407 | } 408 | } 409 | } 410 | nd, cmd := cfg.nCommitted(index) 411 | if nd < n { 412 | cfg.t.Fatalf("only %d decided for index %d; wanted %d\n", 413 | nd, index, n) 414 | } 415 | return cmd 416 | } 417 | 418 | // do a complete agreement. 419 | // it might choose the wrong leader initially, 420 | // and have to re-submit after giving up. 421 | // entirely gives up after about 10 seconds. 422 | // indirectly checks that the servers agree on the 423 | // same value, since nCommitted() checks this, 424 | // as do the threads that read from applyCh. 425 | // returns index. 426 | // if retry==true, may submit the command multiple 427 | // times, in case a leader fails just after Start(). 428 | // if retry==false, calls Start() only once, in order 429 | // to simplify the early Lab 2B tests. 430 | func (cfg *config) one(cmd interface{}, expectedServers int, retry bool) int { 431 | t0 := time.Now() 432 | starts := 0 433 | for time.Since(t0).Seconds() < 10 { 434 | // try all the servers, maybe one is the leader. 435 | index := -1 436 | for si := 0; si < cfg.n; si++ { 437 | starts = (starts + 1) % cfg.n 438 | var rf *Raft 439 | cfg.mu.Lock() 440 | if cfg.connected[starts] { 441 | rf = cfg.rafts[starts] 442 | } 443 | cfg.mu.Unlock() 444 | if rf != nil { 445 | index1, _, ok := rf.Start(cmd) 446 | if ok { 447 | index = index1 448 | break 449 | } 450 | } 451 | } 452 | 453 | if index != -1 { 454 | // somebody claimed to be the leader and to have 455 | // submitted our command; wait a while for agreement. 456 | t1 := time.Now() 457 | for time.Since(t1).Seconds() < 2 { 458 | nd, cmd1 := cfg.nCommitted(index) 459 | if nd > 0 && nd >= expectedServers { 460 | // committed 461 | if cmd1 == cmd { 462 | // and it was the command we submitted. 463 | return index 464 | } 465 | } 466 | time.Sleep(20 * time.Millisecond) 467 | } 468 | if retry == false { 469 | cfg.t.Fatalf("one(%v) failed to reach agreement", cmd) 470 | } 471 | } else { 472 | time.Sleep(50 * time.Millisecond) 473 | } 474 | } 475 | cfg.t.Fatalf("one(%v) failed to reach agreement", cmd) 476 | return -1 477 | } 478 | 479 | // start a Test. 480 | // print the Test message. 481 | // e.g. cfg.begin("Test (2B): RPC counts aren't too high") 482 | func (cfg *config) begin(description string) { 483 | fmt.Printf("%s ...\n", description) 484 | cfg.t0 = time.Now() 485 | cfg.rpcs0 = cfg.rpcTotal() 486 | cfg.bytes0 = cfg.bytesTotal() 487 | cfg.cmds0 = 0 488 | cfg.maxIndex0 = cfg.maxIndex 489 | } 490 | 491 | // end a Test -- the fact that we got here means there 492 | // was no failure. 493 | // print the Passed message, 494 | // and some performance numbers. 
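// Illustrative sketch (editorial addition, not part of the original config.go):
// begin() and end() are meant to bracket each test body, with checkOneLeader()
// and one() doing the actual work in between, e.g.
//
//   cfg := make_config(t, 3, false)
//   defer cfg.cleanup()
//   cfg.begin("Test (2B): example agreement")
//   cfg.checkOneLeader()          // wait until some server wins an election
//   index := cfg.one(42, 3, true) // replicate the value 42 on all 3 servers
//   if index != 1 {
//       t.Fatalf("expected the first entry at index 1, got %v", index)
//   }
//   cfg.end()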
495 | func (cfg *config) end() { 496 | cfg.checkTimeout() 497 | if cfg.t.Failed() == false { 498 | cfg.mu.Lock() 499 | t := time.Since(cfg.t0).Seconds() // real time 500 | npeers := cfg.n // number of Raft peers 501 | nrpc := cfg.rpcTotal() - cfg.rpcs0 // number of RPC sends 502 | nbytes := cfg.bytesTotal() - cfg.bytes0 // number of bytes 503 | ncmds := cfg.maxIndex - cfg.maxIndex0 // number of Raft agreements reported 504 | cfg.mu.Unlock() 505 | 506 | fmt.Printf(" ... Passed --") 507 | fmt.Printf(" %4.1f %d %4d %7d %4d\n", t, npeers, nrpc, nbytes, ncmds) 508 | } 509 | } 510 | -------------------------------------------------------------------------------- /project1/src/raft/persister.go: -------------------------------------------------------------------------------- 1 | package raft 2 | 3 | // 4 | // support for Raft and kvraft to save persistent 5 | // Raft state (log &c) and k/v server snapshots. 6 | // 7 | // we will use the original persister.go to test your code for grading. 8 | // so, while you can modify this code to help you debug, please 9 | // test with the original before submitting. 10 | // 11 | 12 | import "sync" 13 | 14 | type Persister struct { 15 | mu sync.Mutex 16 | raftstate []byte 17 | snapshot []byte 18 | } 19 | 20 | func MakePersister() *Persister { 21 | return &Persister{} 22 | } 23 | 24 | func (ps *Persister) Copy() *Persister { 25 | ps.mu.Lock() 26 | defer ps.mu.Unlock() 27 | np := MakePersister() 28 | np.raftstate = ps.raftstate 29 | np.snapshot = ps.snapshot 30 | return np 31 | } 32 | 33 | func (ps *Persister) SaveRaftState(state []byte) { 34 | ps.mu.Lock() 35 | defer ps.mu.Unlock() 36 | ps.raftstate = state 37 | } 38 | 39 | func (ps *Persister) ReadRaftState() []byte { 40 | ps.mu.Lock() 41 | defer ps.mu.Unlock() 42 | return ps.raftstate 43 | } 44 | 45 | func (ps *Persister) RaftStateSize() int { 46 | ps.mu.Lock() 47 | defer ps.mu.Unlock() 48 | return len(ps.raftstate) 49 | } 50 | 51 | // Save both Raft state and K/V snapshot as a single atomic action, 52 | // to help avoid them getting out of sync. 53 | func (ps *Persister) SaveStateAndSnapshot(state []byte, snapshot []byte) { 54 | ps.mu.Lock() 55 | defer ps.mu.Unlock() 56 | ps.raftstate = state 57 | ps.snapshot = snapshot 58 | } 59 | 60 | func (ps *Persister) ReadSnapshot() []byte { 61 | ps.mu.Lock() 62 | defer ps.mu.Unlock() 63 | return ps.snapshot 64 | } 65 | 66 | func (ps *Persister) SnapshotSize() int { 67 | ps.mu.Lock() 68 | defer ps.mu.Unlock() 69 | return len(ps.snapshot) 70 | } 71 | -------------------------------------------------------------------------------- /project1/src/raft/raft.go: -------------------------------------------------------------------------------- 1 | package raft 2 | 3 | // 4 | // this is an outline of the API that raft must expose to 5 | // the service (or tester). see comments below for 6 | // each of these functions for more details. 7 | // 8 | // rf = Make(...) 9 | // create a new Raft server. 10 | // rf.Start(command interface{}) (index, term, isleader) 11 | // start agreement on a new log entry 12 | // rf.GetState() (term, isLeader) 13 | // ask a Raft for its current term, and whether it thinks it is leader 14 | // ApplyMsg 15 | // each time a new entry is committed to the log, each Raft peer 16 | // should send an ApplyMsg to the service (or tester) 17 | // in the same server. 
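// Illustrative sketch (editorial addition, not part of the original raft.go):
// the service or tester is expected to drive the API outlined above roughly
// like this; peers, me and persister stand in for values the caller already has.
//
//   applyCh := make(chan ApplyMsg)
//   go func() {
//       for msg := range applyCh {
//           if msg.CommandValid {
//               // hand msg.Command at msg.CommandIndex to the service
//           }
//       }
//   }()
//   rf := Make(peers, me, persister, applyCh)
//   if index, term, isLeader := rf.Start("some command"); isLeader {
//       DPrintf("submitted command at index %v in term %v", index, term)
//   }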
18 | // 19 | 20 | import "sync" 21 | import "sync/atomic" 22 | import "../labrpc" 23 | 24 | // import "bytes" 25 | // import "../labgob" 26 | 27 | 28 | 29 | // 30 | // as each Raft peer becomes aware that successive log entries are 31 | // committed, the peer should send an ApplyMsg to the service (or 32 | // tester) on the same server, via the applyCh passed to Make(). set 33 | // CommandValid to true to indicate that the ApplyMsg contains a newly 34 | // committed log entry. 35 | // 36 | // in Lab 3 you'll want to send other kinds of messages (e.g., 37 | // snapshots) on the applyCh; at that point you can add fields to 38 | // ApplyMsg, but set CommandValid to false for these other uses. 39 | // 40 | type ApplyMsg struct { 41 | CommandValid bool 42 | Command interface{} 43 | CommandIndex int 44 | } 45 | 46 | // 47 | // A Go object implementing a single Raft peer. 48 | // 49 | type Raft struct { 50 | mu sync.Mutex // Lock to protect shared access to this peer's state 51 | peers []*labrpc.ClientEnd // RPC end points of all peers 52 | persister *Persister // Object to hold this peer's persisted state 53 | me int // this peer's index into peers[] 54 | dead int32 // set by Kill() 55 | 56 | // Your data here (2A, 2B, 2C). 57 | // Look at the paper's Figure 2 for a description of what 58 | // state a Raft server must maintain. 59 | 60 | } 61 | 62 | // return currentTerm and whether this server 63 | // believes it is the leader. 64 | func (rf *Raft) GetState() (int, bool) { 65 | 66 | var term int 67 | var isleader bool 68 | // Your code here (2A). 69 | return term, isleader 70 | } 71 | 72 | // 73 | // save Raft's persistent state to stable storage, 74 | // where it can later be retrieved after a crash and restart. 75 | // see paper's Figure 2 for a description of what should be persistent. 76 | // 77 | func (rf *Raft) persist() { 78 | // Your code here (2C). 79 | // Example: 80 | // w := new(bytes.Buffer) 81 | // e := labgob.NewEncoder(w) 82 | // e.Encode(rf.xxx) 83 | // e.Encode(rf.yyy) 84 | // data := w.Bytes() 85 | // rf.persister.SaveRaftState(data) 86 | } 87 | 88 | 89 | // 90 | // restore previously persisted state. 91 | // 92 | func (rf *Raft) readPersist(data []byte) { 93 | if data == nil || len(data) < 1 { // bootstrap without any state? 94 | return 95 | } 96 | // Your code here (2C). 97 | // Example: 98 | // r := bytes.NewBuffer(data) 99 | // d := labgob.NewDecoder(r) 100 | // var xxx 101 | // var yyy 102 | // if d.Decode(&xxx) != nil || 103 | // d.Decode(&yyy) != nil { 104 | // error... 105 | // } else { 106 | // rf.xxx = xxx 107 | // rf.yyy = yyy 108 | // } 109 | } 110 | 111 | 112 | 113 | 114 | // 115 | // example RequestVote RPC arguments structure. 116 | // field names must start with capital letters! 117 | // 118 | type RequestVoteArgs struct { 119 | // Your data here (2A, 2B). 120 | } 121 | 122 | // 123 | // example RequestVote RPC reply structure. 124 | // field names must start with capital letters! 125 | // 126 | type RequestVoteReply struct { 127 | // Your data here (2A). 128 | } 129 | 130 | // 131 | // example RequestVote RPC handler. 132 | // 133 | func (rf *Raft) RequestVote(args *RequestVoteArgs, reply *RequestVoteReply) { 134 | // Your code here (2A, 2B). 135 | } 136 | 137 | // 138 | // example code to send a RequestVote RPC to a server. 139 | // server is the index of the target server in rf.peers[]. 140 | // expects RPC arguments in args. 141 | // fills in *reply with RPC reply, so caller should 142 | // pass &reply. 
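// Illustrative sketch (editorial addition, not part of the original raft.go):
// a caller usually invokes sendRequestVote() from its own goroutine per peer
// and only inspects the reply when the returned bool is true; the (currently
// empty) args/reply structs will gain fields in 2A.
//
//   for peer := range rf.peers {
//       if peer == rf.me {
//           continue
//       }
//       go func(server int) {
//           args := &RequestVoteArgs{} // fill in the candidate's term, id, last log info
//           reply := &RequestVoteReply{}
//           if rf.sendRequestVote(server, args, reply) {
//               // lock rf.mu, then examine the reply and count the vote
//           }
//       }(peer)
//   }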
143 | // the types of the args and reply passed to Call() must be 144 | // the same as the types of the arguments declared in the 145 | // handler function (including whether they are pointers). 146 | // 147 | // The labrpc package simulates a lossy network, in which servers 148 | // may be unreachable, and in which requests and replies may be lost. 149 | // Call() sends a request and waits for a reply. If a reply arrives 150 | // within a timeout interval, Call() returns true; otherwise 151 | // Call() returns false. Thus Call() may not return for a while. 152 | // A false return can be caused by a dead server, a live server that 153 | // can't be reached, a lost request, or a lost reply. 154 | // 155 | // Call() is guaranteed to return (perhaps after a delay) *except* if the 156 | // handler function on the server side does not return. Thus there 157 | // is no need to implement your own timeouts around Call(). 158 | // 159 | // look at the comments in ../labrpc/labrpc.go for more details. 160 | // 161 | // if you're having trouble getting RPC to work, check that you've 162 | // capitalized all field names in structs passed over RPC, and 163 | // that the caller passes the address of the reply struct with &, not 164 | // the struct itself. 165 | // 166 | func (rf *Raft) sendRequestVote(server int, args *RequestVoteArgs, reply *RequestVoteReply) bool { 167 | ok := rf.peers[server].Call("Raft.RequestVote", args, reply) 168 | return ok 169 | } 170 | 171 | 172 | // 173 | // the service using Raft (e.g. a k/v server) wants to start 174 | // agreement on the next command to be appended to Raft's log. if this 175 | // server isn't the leader, returns false. otherwise start the 176 | // agreement and return immediately. there is no guarantee that this 177 | // command will ever be committed to the Raft log, since the leader 178 | // may fail or lose an election. even if the Raft instance has been killed, 179 | // this function should return gracefully. 180 | // 181 | // the first return value is the index that the command will appear at 182 | // if it's ever committed. the second return value is the current 183 | // term. the third return value is true if this server believes it is 184 | // the leader. 185 | // 186 | func (rf *Raft) Start(command interface{}) (int, int, bool) { 187 | index := -1 188 | term := -1 189 | isLeader := true 190 | 191 | // Your code here (2B). 192 | 193 | 194 | return index, term, isLeader 195 | } 196 | 197 | // 198 | // the tester doesn't halt goroutines created by Raft after each test, 199 | // but it does call the Kill() method. your code can use killed() to 200 | // check whether Kill() has been called. the use of atomic avoids the 201 | // need for a lock. 202 | // 203 | // the issue is that long-running goroutines use memory and may chew 204 | // up CPU time, perhaps causing later tests to fail and generating 205 | // confusing debug output. any goroutine with a long-running loop 206 | // should call killed() to check whether it should stop. 207 | // 208 | func (rf *Raft) Kill() { 209 | atomic.StoreInt32(&rf.dead, 1) 210 | // Your code here, if desired. 211 | } 212 | 213 | func (rf *Raft) killed() bool { 214 | z := atomic.LoadInt32(&rf.dead) 215 | return z == 1 216 | } 217 | 218 | // 219 | // the service or tester wants to create a Raft server. the ports 220 | // of all the Raft servers (including this one) are in peers[]. this 221 | // server's port is peers[me]. all the servers' peers[] arrays 222 | // have the same order. 
persister is a place for this server to 223 | // save its persistent state, and also initially holds the most 224 | // recent saved state, if any. applyCh is a channel on which the 225 | // tester or service expects Raft to send ApplyMsg messages. 226 | // Make() must return quickly, so it should start goroutines 227 | // for any long-running work. 228 | // 229 | func Make(peers []*labrpc.ClientEnd, me int, 230 | persister *Persister, applyCh chan ApplyMsg) *Raft { 231 | rf := &Raft{} 232 | rf.peers = peers 233 | rf.persister = persister 234 | rf.me = me 235 | 236 | // Your initialization code here (2A, 2B, 2C). 237 | 238 | // initialize from state persisted before a crash 239 | rf.readPersist(persister.ReadRaftState()) 240 | 241 | 242 | return rf 243 | } 244 | -------------------------------------------------------------------------------- /project1/src/raft/test_test.go: -------------------------------------------------------------------------------- 1 | package raft 2 | 3 | // 4 | // Raft tests. 5 | // 6 | // we will use the original test_test.go to test your code for grading. 7 | // so, while you can modify this code to help you debug, please 8 | // test with the original before submitting. 9 | // 10 | 11 | import "testing" 12 | import "fmt" 13 | import "time" 14 | import "math/rand" 15 | import "sync/atomic" 16 | import "sync" 17 | 18 | // The tester generously allows solutions to complete elections in one second 19 | // (much more than the paper's range of timeouts). 20 | const RaftElectionTimeout = 1000 * time.Millisecond 21 | 22 | func TestInitialElection2A(t *testing.T) { 23 | servers := 3 24 | cfg := make_config(t, servers, false) 25 | defer cfg.cleanup() 26 | 27 | cfg.begin("Test (2A): initial election") 28 | 29 | // is a leader elected? 30 | cfg.checkOneLeader() 31 | 32 | // sleep a bit to avoid racing with followers learning of the 33 | // election, then check that all peers agree on the term. 34 | time.Sleep(50 * time.Millisecond) 35 | term1 := cfg.checkTerms() 36 | if term1 < 1 { 37 | t.Fatalf("term is %v, but should be at least 1", term1) 38 | } 39 | 40 | // does the leader+term stay the same if there is no network failure? 41 | time.Sleep(2 * RaftElectionTimeout) 42 | term2 := cfg.checkTerms() 43 | if term1 != term2 { 44 | fmt.Printf("warning: term changed even though there were no failures") 45 | } 46 | 47 | // there should still be a leader. 48 | cfg.checkOneLeader() 49 | 50 | cfg.end() 51 | } 52 | 53 | func TestReElection2A(t *testing.T) { 54 | servers := 3 55 | cfg := make_config(t, servers, false) 56 | defer cfg.cleanup() 57 | 58 | cfg.begin("Test (2A): election after network failure") 59 | 60 | leader1 := cfg.checkOneLeader() 61 | 62 | // if the leader disconnects, a new one should be elected. 63 | cfg.disconnect(leader1) 64 | cfg.checkOneLeader() 65 | 66 | // if the old leader rejoins, that shouldn't 67 | // disturb the new leader. 68 | cfg.connect(leader1) 69 | leader2 := cfg.checkOneLeader() 70 | 71 | // if there's no quorum, no leader should 72 | // be elected. 73 | cfg.disconnect(leader2) 74 | cfg.disconnect((leader2 + 1) % servers) 75 | time.Sleep(2 * RaftElectionTimeout) 76 | cfg.checkNoLeader() 77 | 78 | // if a quorum arises, it should elect a leader. 79 | cfg.connect((leader2 + 1) % servers) 80 | cfg.checkOneLeader() 81 | 82 | // re-join of last node shouldn't prevent leader from existing. 
83 | cfg.connect(leader2) 84 | cfg.checkOneLeader() 85 | 86 | cfg.end() 87 | } 88 | 89 | func TestBasicAgree2B(t *testing.T) { 90 | servers := 3 91 | cfg := make_config(t, servers, false) 92 | defer cfg.cleanup() 93 | 94 | cfg.begin("Test (2B): basic agreement") 95 | 96 | iters := 3 97 | for index := 1; index < iters+1; index++ { 98 | nd, _ := cfg.nCommitted(index) 99 | if nd > 0 { 100 | t.Fatalf("some have committed before Start()") 101 | } 102 | 103 | xindex := cfg.one(index*100, servers, false) 104 | if xindex != index { 105 | t.Fatalf("got index %v but expected %v", xindex, index) 106 | } 107 | } 108 | 109 | cfg.end() 110 | } 111 | 112 | // 113 | // check, based on counting bytes of RPCs, that 114 | // each command is sent to each peer just once. 115 | // 116 | func TestRPCBytes2B(t *testing.T) { 117 | servers := 3 118 | cfg := make_config(t, servers, false) 119 | defer cfg.cleanup() 120 | 121 | cfg.begin("Test (2B): RPC byte count") 122 | 123 | cfg.one(99, servers, false) 124 | bytes0 := cfg.bytesTotal() 125 | 126 | iters := 10 127 | var sent int64 = 0 128 | for index := 2; index < iters+2; index++ { 129 | cmd := randstring(5000) 130 | xindex := cfg.one(cmd, servers, false) 131 | if xindex != index { 132 | t.Fatalf("got index %v but expected %v", xindex, index) 133 | } 134 | sent += int64(len(cmd)) 135 | } 136 | 137 | bytes1 := cfg.bytesTotal() 138 | got := bytes1 - bytes0 139 | expected := int64(servers) * sent 140 | if got > expected+50000 { 141 | t.Fatalf("too many RPC bytes; got %v, expected %v", got, expected) 142 | } 143 | 144 | cfg.end() 145 | } 146 | 147 | func TestFailAgree2B(t *testing.T) { 148 | servers := 3 149 | cfg := make_config(t, servers, false) 150 | defer cfg.cleanup() 151 | 152 | cfg.begin("Test (2B): agreement despite follower disconnection") 153 | 154 | cfg.one(101, servers, false) 155 | 156 | // disconnect one follower from the network. 157 | leader := cfg.checkOneLeader() 158 | cfg.disconnect((leader + 1) % servers) 159 | 160 | // the leader and remaining follower should be 161 | // able to agree despite the disconnected follower. 162 | cfg.one(102, servers-1, false) 163 | cfg.one(103, servers-1, false) 164 | time.Sleep(RaftElectionTimeout) 165 | cfg.one(104, servers-1, false) 166 | cfg.one(105, servers-1, false) 167 | 168 | // re-connect 169 | cfg.connect((leader + 1) % servers) 170 | 171 | // the full set of servers should preserve 172 | // previous agreements, and be able to agree 173 | // on new commands. 
174 | cfg.one(106, servers, true) 175 | time.Sleep(RaftElectionTimeout) 176 | cfg.one(107, servers, true) 177 | 178 | cfg.end() 179 | } 180 | 181 | func TestFailNoAgree2B(t *testing.T) { 182 | servers := 5 183 | cfg := make_config(t, servers, false) 184 | defer cfg.cleanup() 185 | 186 | cfg.begin("Test (2B): no agreement if too many followers disconnect") 187 | 188 | cfg.one(10, servers, false) 189 | 190 | // 3 of 5 followers disconnect 191 | leader := cfg.checkOneLeader() 192 | cfg.disconnect((leader + 1) % servers) 193 | cfg.disconnect((leader + 2) % servers) 194 | cfg.disconnect((leader + 3) % servers) 195 | 196 | index, _, ok := cfg.rafts[leader].Start(20) 197 | if ok != true { 198 | t.Fatalf("leader rejected Start()") 199 | } 200 | if index != 2 { 201 | t.Fatalf("expected index 2, got %v", index) 202 | } 203 | 204 | time.Sleep(2 * RaftElectionTimeout) 205 | 206 | n, _ := cfg.nCommitted(index) 207 | if n > 0 { 208 | t.Fatalf("%v committed but no majority", n) 209 | } 210 | 211 | // repair 212 | cfg.connect((leader + 1) % servers) 213 | cfg.connect((leader + 2) % servers) 214 | cfg.connect((leader + 3) % servers) 215 | 216 | // the disconnected majority may have chosen a leader from 217 | // among their own ranks, forgetting index 2. 218 | leader2 := cfg.checkOneLeader() 219 | index2, _, ok2 := cfg.rafts[leader2].Start(30) 220 | if ok2 == false { 221 | t.Fatalf("leader2 rejected Start()") 222 | } 223 | if index2 < 2 || index2 > 3 { 224 | t.Fatalf("unexpected index %v", index2) 225 | } 226 | 227 | cfg.one(1000, servers, true) 228 | 229 | cfg.end() 230 | } 231 | 232 | func TestConcurrentStarts2B(t *testing.T) { 233 | servers := 3 234 | cfg := make_config(t, servers, false) 235 | defer cfg.cleanup() 236 | 237 | cfg.begin("Test (2B): concurrent Start()s") 238 | 239 | var success bool 240 | loop: 241 | for try := 0; try < 5; try++ { 242 | if try > 0 { 243 | // give solution some time to settle 244 | time.Sleep(3 * time.Second) 245 | } 246 | 247 | leader := cfg.checkOneLeader() 248 | _, term, ok := cfg.rafts[leader].Start(1) 249 | if !ok { 250 | // leader moved on really quickly 251 | continue 252 | } 253 | 254 | iters := 5 255 | var wg sync.WaitGroup 256 | is := make(chan int, iters) 257 | for ii := 0; ii < iters; ii++ { 258 | wg.Add(1) 259 | go func(i int) { 260 | defer wg.Done() 261 | i, term1, ok := cfg.rafts[leader].Start(100 + i) 262 | if term1 != term { 263 | return 264 | } 265 | if ok != true { 266 | return 267 | } 268 | is <- i 269 | }(ii) 270 | } 271 | 272 | wg.Wait() 273 | close(is) 274 | 275 | for j := 0; j < servers; j++ { 276 | if t, _ := cfg.rafts[j].GetState(); t != term { 277 | // term changed -- can't expect low RPC counts 278 | continue loop 279 | } 280 | } 281 | 282 | failed := false 283 | cmds := []int{} 284 | for index := range is { 285 | cmd := cfg.wait(index, servers, term) 286 | if ix, ok := cmd.(int); ok { 287 | if ix == -1 { 288 | // peers have moved on to later terms 289 | // so we can't expect all Start()s to 290 | // have succeeded 291 | failed = true 292 | break 293 | } 294 | cmds = append(cmds, ix) 295 | } else { 296 | t.Fatalf("value %v is not an int", cmd) 297 | } 298 | } 299 | 300 | if failed { 301 | // avoid leaking goroutines 302 | go func() { 303 | for range is { 304 | } 305 | }() 306 | continue 307 | } 308 | 309 | for ii := 0; ii < iters; ii++ { 310 | x := 100 + ii 311 | ok := false 312 | for j := 0; j < len(cmds); j++ { 313 | if cmds[j] == x { 314 | ok = true 315 | } 316 | } 317 | if ok == false { 318 | t.Fatalf("cmd %v missing in %v", x, cmds) 319 | } 
320 | } 321 | 322 | success = true 323 | break 324 | } 325 | 326 | if !success { 327 | t.Fatalf("term changed too often") 328 | } 329 | 330 | cfg.end() 331 | } 332 | 333 | func TestRejoin2B(t *testing.T) { 334 | servers := 3 335 | cfg := make_config(t, servers, false) 336 | defer cfg.cleanup() 337 | 338 | cfg.begin("Test (2B): rejoin of partitioned leader") 339 | 340 | cfg.one(101, servers, true) 341 | 342 | // leader network failure 343 | leader1 := cfg.checkOneLeader() 344 | cfg.disconnect(leader1) 345 | 346 | // make old leader try to agree on some entries 347 | cfg.rafts[leader1].Start(102) 348 | cfg.rafts[leader1].Start(103) 349 | cfg.rafts[leader1].Start(104) 350 | 351 | // new leader commits, also for index=2 352 | cfg.one(103, 2, true) 353 | 354 | // new leader network failure 355 | leader2 := cfg.checkOneLeader() 356 | cfg.disconnect(leader2) 357 | 358 | // old leader connected again 359 | cfg.connect(leader1) 360 | 361 | cfg.one(104, 2, true) 362 | 363 | // all together now 364 | cfg.connect(leader2) 365 | 366 | cfg.one(105, servers, true) 367 | 368 | cfg.end() 369 | } 370 | 371 | func TestBackup2B(t *testing.T) { 372 | servers := 5 373 | cfg := make_config(t, servers, false) 374 | defer cfg.cleanup() 375 | 376 | cfg.begin("Test (2B): leader backs up quickly over incorrect follower logs") 377 | 378 | cfg.one(rand.Int(), servers, true) 379 | 380 | // put leader and one follower in a partition 381 | leader1 := cfg.checkOneLeader() 382 | cfg.disconnect((leader1 + 2) % servers) 383 | cfg.disconnect((leader1 + 3) % servers) 384 | cfg.disconnect((leader1 + 4) % servers) 385 | 386 | // submit lots of commands that won't commit 387 | for i := 0; i < 50; i++ { 388 | cfg.rafts[leader1].Start(rand.Int()) 389 | } 390 | 391 | time.Sleep(RaftElectionTimeout / 2) 392 | 393 | cfg.disconnect((leader1 + 0) % servers) 394 | cfg.disconnect((leader1 + 1) % servers) 395 | 396 | // allow other partition to recover 397 | cfg.connect((leader1 + 2) % servers) 398 | cfg.connect((leader1 + 3) % servers) 399 | cfg.connect((leader1 + 4) % servers) 400 | 401 | // lots of successful commands to new group. 402 | for i := 0; i < 50; i++ { 403 | cfg.one(rand.Int(), 3, true) 404 | } 405 | 406 | // now another partitioned leader and one follower 407 | leader2 := cfg.checkOneLeader() 408 | other := (leader1 + 2) % servers 409 | if leader2 == other { 410 | other = (leader2 + 1) % servers 411 | } 412 | cfg.disconnect(other) 413 | 414 | // lots more commands that won't commit 415 | for i := 0; i < 50; i++ { 416 | cfg.rafts[leader2].Start(rand.Int()) 417 | } 418 | 419 | time.Sleep(RaftElectionTimeout / 2) 420 | 421 | // bring original leader back to life, 422 | for i := 0; i < servers; i++ { 423 | cfg.disconnect(i) 424 | } 425 | cfg.connect((leader1 + 0) % servers) 426 | cfg.connect((leader1 + 1) % servers) 427 | cfg.connect(other) 428 | 429 | // lots of successful commands to new group. 
430 | for i := 0; i < 50; i++ { 431 | cfg.one(rand.Int(), 3, true) 432 | } 433 | 434 | // now everyone 435 | for i := 0; i < servers; i++ { 436 | cfg.connect(i) 437 | } 438 | cfg.one(rand.Int(), servers, true) 439 | 440 | cfg.end() 441 | } 442 | 443 | func TestCount2B(t *testing.T) { 444 | servers := 3 445 | cfg := make_config(t, servers, false) 446 | defer cfg.cleanup() 447 | 448 | cfg.begin("Test (2B): RPC counts aren't too high") 449 | 450 | rpcs := func() (n int) { 451 | for j := 0; j < servers; j++ { 452 | n += cfg.rpcCount(j) 453 | } 454 | return 455 | } 456 | 457 | leader := cfg.checkOneLeader() 458 | 459 | total1 := rpcs() 460 | 461 | if total1 > 30 || total1 < 1 { 462 | t.Fatalf("too many or few RPCs (%v) to elect initial leader\n", total1) 463 | } 464 | 465 | var total2 int 466 | var success bool 467 | loop: 468 | for try := 0; try < 5; try++ { 469 | if try > 0 { 470 | // give solution some time to settle 471 | time.Sleep(3 * time.Second) 472 | } 473 | 474 | leader = cfg.checkOneLeader() 475 | total1 = rpcs() 476 | 477 | iters := 10 478 | starti, term, ok := cfg.rafts[leader].Start(1) 479 | if !ok { 480 | // leader moved on really quickly 481 | continue 482 | } 483 | cmds := []int{} 484 | for i := 1; i < iters+2; i++ { 485 | x := int(rand.Int31()) 486 | cmds = append(cmds, x) 487 | index1, term1, ok := cfg.rafts[leader].Start(x) 488 | if term1 != term { 489 | // Term changed while starting 490 | continue loop 491 | } 492 | if !ok { 493 | // No longer the leader, so term has changed 494 | continue loop 495 | } 496 | if starti+i != index1 { 497 | t.Fatalf("Start() failed") 498 | } 499 | } 500 | 501 | for i := 1; i < iters+1; i++ { 502 | cmd := cfg.wait(starti+i, servers, term) 503 | if ix, ok := cmd.(int); ok == false || ix != cmds[i-1] { 504 | if ix == -1 { 505 | // term changed -- try again 506 | continue loop 507 | } 508 | t.Fatalf("wrong value %v committed for index %v; expected %v\n", cmd, starti+i, cmds) 509 | } 510 | } 511 | 512 | failed := false 513 | total2 = 0 514 | for j := 0; j < servers; j++ { 515 | if t, _ := cfg.rafts[j].GetState(); t != term { 516 | // term changed -- can't expect low RPC counts 517 | // need to keep going to update total2 518 | failed = true 519 | } 520 | total2 += cfg.rpcCount(j) 521 | } 522 | 523 | if failed { 524 | continue loop 525 | } 526 | 527 | if total2-total1 > (iters+1+3)*3 { 528 | t.Fatalf("too many RPCs (%v) for %v entries\n", total2-total1, iters) 529 | } 530 | 531 | success = true 532 | break 533 | } 534 | 535 | if !success { 536 | t.Fatalf("term changed too often") 537 | } 538 | 539 | time.Sleep(RaftElectionTimeout) 540 | 541 | total3 := 0 542 | for j := 0; j < servers; j++ { 543 | total3 += cfg.rpcCount(j) 544 | } 545 | 546 | if total3-total2 > 3*20 { 547 | t.Fatalf("too many RPCs (%v) for 1 second of idleness\n", total3-total2) 548 | } 549 | 550 | cfg.end() 551 | } 552 | 553 | func TestPersist12C(t *testing.T) { 554 | servers := 3 555 | cfg := make_config(t, servers, false) 556 | defer cfg.cleanup() 557 | 558 | cfg.begin("Test (2C): basic persistence") 559 | 560 | cfg.one(11, servers, true) 561 | 562 | // crash and re-start all 563 | for i := 0; i < servers; i++ { 564 | cfg.start1(i) 565 | } 566 | for i := 0; i < servers; i++ { 567 | cfg.disconnect(i) 568 | cfg.connect(i) 569 | } 570 | 571 | cfg.one(12, servers, true) 572 | 573 | leader1 := cfg.checkOneLeader() 574 | cfg.disconnect(leader1) 575 | cfg.start1(leader1) 576 | cfg.connect(leader1) 577 | 578 | cfg.one(13, servers, true) 579 | 580 | leader2 := cfg.checkOneLeader() 581 | 
cfg.disconnect(leader2) 582 | cfg.one(14, servers-1, true) 583 | cfg.start1(leader2) 584 | cfg.connect(leader2) 585 | 586 | cfg.wait(4, servers, -1) // wait for leader2 to join before killing i3 587 | 588 | i3 := (cfg.checkOneLeader() + 1) % servers 589 | cfg.disconnect(i3) 590 | cfg.one(15, servers-1, true) 591 | cfg.start1(i3) 592 | cfg.connect(i3) 593 | 594 | cfg.one(16, servers, true) 595 | 596 | cfg.end() 597 | } 598 | 599 | func TestPersist22C(t *testing.T) { 600 | servers := 5 601 | cfg := make_config(t, servers, false) 602 | defer cfg.cleanup() 603 | 604 | cfg.begin("Test (2C): more persistence") 605 | 606 | index := 1 607 | for iters := 0; iters < 5; iters++ { 608 | cfg.one(10+index, servers, true) 609 | index++ 610 | 611 | leader1 := cfg.checkOneLeader() 612 | 613 | cfg.disconnect((leader1 + 1) % servers) 614 | cfg.disconnect((leader1 + 2) % servers) 615 | 616 | cfg.one(10+index, servers-2, true) 617 | index++ 618 | 619 | cfg.disconnect((leader1 + 0) % servers) 620 | cfg.disconnect((leader1 + 3) % servers) 621 | cfg.disconnect((leader1 + 4) % servers) 622 | 623 | cfg.start1((leader1 + 1) % servers) 624 | cfg.start1((leader1 + 2) % servers) 625 | cfg.connect((leader1 + 1) % servers) 626 | cfg.connect((leader1 + 2) % servers) 627 | 628 | time.Sleep(RaftElectionTimeout) 629 | 630 | cfg.start1((leader1 + 3) % servers) 631 | cfg.connect((leader1 + 3) % servers) 632 | 633 | cfg.one(10+index, servers-2, true) 634 | index++ 635 | 636 | cfg.connect((leader1 + 4) % servers) 637 | cfg.connect((leader1 + 0) % servers) 638 | } 639 | 640 | cfg.one(1000, servers, true) 641 | 642 | cfg.end() 643 | } 644 | 645 | func TestPersist32C(t *testing.T) { 646 | servers := 3 647 | cfg := make_config(t, servers, false) 648 | defer cfg.cleanup() 649 | 650 | cfg.begin("Test (2C): partitioned leader and one follower crash, leader restarts") 651 | 652 | cfg.one(101, 3, true) 653 | 654 | leader := cfg.checkOneLeader() 655 | cfg.disconnect((leader + 2) % servers) 656 | 657 | cfg.one(102, 2, true) 658 | 659 | cfg.crash1((leader + 0) % servers) 660 | cfg.crash1((leader + 1) % servers) 661 | cfg.connect((leader + 2) % servers) 662 | cfg.start1((leader + 0) % servers) 663 | cfg.connect((leader + 0) % servers) 664 | 665 | cfg.one(103, 2, true) 666 | 667 | cfg.start1((leader + 1) % servers) 668 | cfg.connect((leader + 1) % servers) 669 | 670 | cfg.one(104, servers, true) 671 | 672 | cfg.end() 673 | } 674 | 675 | // 676 | // Test the scenarios described in Figure 8 of the extended Raft paper. Each 677 | // iteration asks a leader, if there is one, to insert a command in the Raft 678 | // log. If there is a leader, that leader will fail quickly with a high 679 | // probability (perhaps without committing the command), or crash after a while 680 | // with low probability (most likey committing the command). If the number of 681 | // alive servers isn't enough to form a majority, perhaps start a new server. 682 | // The leader in a new term may try to finish replicating log entries that 683 | // haven't been committed yet. 
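// Illustrative note (editorial addition, not part of the original test file):
// the Figure 8 scenario is why a leader may only advance its commit point by
// counting replicas of entries from its *own* current term (Raft paper,
// section 5.4.2). A sketch of that rule, using field and helper names that
// are NOT part of this skeleton:
//
//   for n := rf.commitIndex + 1; n <= rf.lastLogIndex(); n++ {
//       if rf.log[n].Term != rf.currentTerm {
//           continue // never commit an old-term entry directly by counting
//       }
//       count := 1 // the leader itself
//       for i := range rf.peers {
//           if i != rf.me && rf.matchIndex[i] >= n {
//               count++
//           }
//       }
//       if count > len(rf.peers)/2 {
//           rf.commitIndex = n // this also commits everything before n
//       }
//   }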
684 | // 685 | func TestFigure82C(t *testing.T) { 686 | servers := 5 687 | cfg := make_config(t, servers, false) 688 | defer cfg.cleanup() 689 | 690 | cfg.begin("Test (2C): Figure 8") 691 | 692 | cfg.one(rand.Int(), 1, true) 693 | 694 | nup := servers 695 | for iters := 0; iters < 1000; iters++ { 696 | leader := -1 697 | for i := 0; i < servers; i++ { 698 | if cfg.rafts[i] != nil { 699 | _, _, ok := cfg.rafts[i].Start(rand.Int()) 700 | if ok { 701 | leader = i 702 | } 703 | } 704 | } 705 | 706 | if (rand.Int() % 1000) < 100 { 707 | ms := rand.Int63() % (int64(RaftElectionTimeout/time.Millisecond) / 2) 708 | time.Sleep(time.Duration(ms) * time.Millisecond) 709 | } else { 710 | ms := (rand.Int63() % 13) 711 | time.Sleep(time.Duration(ms) * time.Millisecond) 712 | } 713 | 714 | if leader != -1 { 715 | cfg.crash1(leader) 716 | nup -= 1 717 | } 718 | 719 | if nup < 3 { 720 | s := rand.Int() % servers 721 | if cfg.rafts[s] == nil { 722 | cfg.start1(s) 723 | cfg.connect(s) 724 | nup += 1 725 | } 726 | } 727 | } 728 | 729 | for i := 0; i < servers; i++ { 730 | if cfg.rafts[i] == nil { 731 | cfg.start1(i) 732 | cfg.connect(i) 733 | } 734 | } 735 | 736 | cfg.one(rand.Int(), servers, true) 737 | 738 | cfg.end() 739 | } 740 | 741 | func TestUnreliableAgree2C(t *testing.T) { 742 | servers := 5 743 | cfg := make_config(t, servers, true) 744 | defer cfg.cleanup() 745 | 746 | cfg.begin("Test (2C): unreliable agreement") 747 | 748 | var wg sync.WaitGroup 749 | 750 | for iters := 1; iters < 50; iters++ { 751 | for j := 0; j < 4; j++ { 752 | wg.Add(1) 753 | go func(iters, j int) { 754 | defer wg.Done() 755 | cfg.one((100*iters)+j, 1, true) 756 | }(iters, j) 757 | } 758 | cfg.one(iters, 1, true) 759 | } 760 | 761 | cfg.setunreliable(false) 762 | 763 | wg.Wait() 764 | 765 | cfg.one(100, servers, true) 766 | 767 | cfg.end() 768 | } 769 | 770 | func TestFigure8Unreliable2C(t *testing.T) { 771 | servers := 5 772 | cfg := make_config(t, servers, true) 773 | defer cfg.cleanup() 774 | 775 | cfg.begin("Test (2C): Figure 8 (unreliable)") 776 | 777 | cfg.one(rand.Int()%10000, 1, true) 778 | 779 | nup := servers 780 | for iters := 0; iters < 1000; iters++ { 781 | if iters == 200 { 782 | cfg.setlongreordering(true) 783 | } 784 | leader := -1 785 | for i := 0; i < servers; i++ { 786 | _, _, ok := cfg.rafts[i].Start(rand.Int() % 10000) 787 | if ok && cfg.connected[i] { 788 | leader = i 789 | } 790 | } 791 | 792 | if (rand.Int() % 1000) < 100 { 793 | ms := rand.Int63() % (int64(RaftElectionTimeout/time.Millisecond) / 2) 794 | time.Sleep(time.Duration(ms) * time.Millisecond) 795 | } else { 796 | ms := (rand.Int63() % 13) 797 | time.Sleep(time.Duration(ms) * time.Millisecond) 798 | } 799 | 800 | if leader != -1 && (rand.Int()%1000) < int(RaftElectionTimeout/time.Millisecond)/2 { 801 | cfg.disconnect(leader) 802 | nup -= 1 803 | } 804 | 805 | if nup < 3 { 806 | s := rand.Int() % servers 807 | if cfg.connected[s] == false { 808 | cfg.connect(s) 809 | nup += 1 810 | } 811 | } 812 | } 813 | 814 | for i := 0; i < servers; i++ { 815 | if cfg.connected[i] == false { 816 | cfg.connect(i) 817 | } 818 | } 819 | 820 | cfg.one(rand.Int()%10000, servers, true) 821 | 822 | cfg.end() 823 | } 824 | 825 | func internalChurn(t *testing.T, unreliable bool) { 826 | 827 | servers := 5 828 | cfg := make_config(t, servers, unreliable) 829 | defer cfg.cleanup() 830 | 831 | if unreliable { 832 | cfg.begin("Test (2C): unreliable churn") 833 | } else { 834 | cfg.begin("Test (2C): churn") 835 | } 836 | 837 | stop := int32(0) 838 | 839 | // create 
concurrent clients 840 | cfn := func(me int, ch chan []int) { 841 | var ret []int 842 | ret = nil 843 | defer func() { ch <- ret }() 844 | values := []int{} 845 | for atomic.LoadInt32(&stop) == 0 { 846 | x := rand.Int() 847 | index := -1 848 | ok := false 849 | for i := 0; i < servers; i++ { 850 | // try them all, maybe one of them is a leader 851 | cfg.mu.Lock() 852 | rf := cfg.rafts[i] 853 | cfg.mu.Unlock() 854 | if rf != nil { 855 | index1, _, ok1 := rf.Start(x) 856 | if ok1 { 857 | ok = ok1 858 | index = index1 859 | } 860 | } 861 | } 862 | if ok { 863 | // maybe leader will commit our value, maybe not. 864 | // but don't wait forever. 865 | for _, to := range []int{10, 20, 50, 100, 200} { 866 | nd, cmd := cfg.nCommitted(index) 867 | if nd > 0 { 868 | if xx, ok := cmd.(int); ok { 869 | if xx == x { 870 | values = append(values, x) 871 | } 872 | } else { 873 | cfg.t.Fatalf("wrong command type") 874 | } 875 | break 876 | } 877 | time.Sleep(time.Duration(to) * time.Millisecond) 878 | } 879 | } else { 880 | time.Sleep(time.Duration(79+me*17) * time.Millisecond) 881 | } 882 | } 883 | ret = values 884 | } 885 | 886 | ncli := 3 887 | cha := []chan []int{} 888 | for i := 0; i < ncli; i++ { 889 | cha = append(cha, make(chan []int)) 890 | go cfn(i, cha[i]) 891 | } 892 | 893 | for iters := 0; iters < 20; iters++ { 894 | if (rand.Int() % 1000) < 200 { 895 | i := rand.Int() % servers 896 | cfg.disconnect(i) 897 | } 898 | 899 | if (rand.Int() % 1000) < 500 { 900 | i := rand.Int() % servers 901 | if cfg.rafts[i] == nil { 902 | cfg.start1(i) 903 | } 904 | cfg.connect(i) 905 | } 906 | 907 | if (rand.Int() % 1000) < 200 { 908 | i := rand.Int() % servers 909 | if cfg.rafts[i] != nil { 910 | cfg.crash1(i) 911 | } 912 | } 913 | 914 | // Make crash/restart infrequent enough that the peers can often 915 | // keep up, but not so infrequent that everything has settled 916 | // down from one change to the next. Pick a value smaller than 917 | // the election timeout, but not hugely smaller. 918 | time.Sleep((RaftElectionTimeout * 7) / 10) 919 | } 920 | 921 | time.Sleep(RaftElectionTimeout) 922 | cfg.setunreliable(false) 923 | for i := 0; i < servers; i++ { 924 | if cfg.rafts[i] == nil { 925 | cfg.start1(i) 926 | } 927 | cfg.connect(i) 928 | } 929 | 930 | atomic.StoreInt32(&stop, 1) 931 | 932 | values := []int{} 933 | for i := 0; i < ncli; i++ { 934 | vv := <-cha[i] 935 | if vv == nil { 936 | t.Fatal("client failed") 937 | } 938 | values = append(values, vv...) 
939 | } 940 | 941 | time.Sleep(RaftElectionTimeout) 942 | 943 | lastIndex := cfg.one(rand.Int(), servers, true) 944 | 945 | really := make([]int, lastIndex+1) 946 | for index := 1; index <= lastIndex; index++ { 947 | v := cfg.wait(index, servers, -1) 948 | if vi, ok := v.(int); ok { 949 | really = append(really, vi) 950 | } else { 951 | t.Fatalf("not an int") 952 | } 953 | } 954 | 955 | for _, v1 := range values { 956 | ok := false 957 | for _, v2 := range really { 958 | if v1 == v2 { 959 | ok = true 960 | } 961 | } 962 | if ok == false { 963 | cfg.t.Fatalf("didn't find a value") 964 | } 965 | } 966 | 967 | cfg.end() 968 | } 969 | 970 | func TestReliableChurn2C(t *testing.T) { 971 | internalChurn(t, false) 972 | } 973 | 974 | func TestUnreliableChurn2C(t *testing.T) { 975 | internalChurn(t, true) 976 | } 977 | -------------------------------------------------------------------------------- /project1/src/raft/util.go: -------------------------------------------------------------------------------- 1 | package raft 2 | 3 | import "log" 4 | 5 | // Debugging 6 | const Debug = 0 7 | 8 | func DPrintf(format string, a ...interface{}) (n int, err error) { 9 | if Debug > 0 { 10 | log.Printf(format, a...) 11 | } 12 | return 13 | } 14 | -------------------------------------------------------------------------------- /slides/dist-sys-slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides/dist-sys-slides.pdf -------------------------------------------------------------------------------- /slides/slides-class1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides/slides-class1.pdf -------------------------------------------------------------------------------- /slides/slides-lecture2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides/slides-lecture2.pdf -------------------------------------------------------------------------------- /slides/slides-lecture3.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides/slides-lecture3.pdf -------------------------------------------------------------------------------- /slides/slides-lecture4.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides/slides-lecture4.pdf -------------------------------------------------------------------------------- /slides/slides-lecture5.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides/slides-lecture5.pdf -------------------------------------------------------------------------------- /slides/slides-lecture6.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides/slides-lecture6.pdf -------------------------------------------------------------------------------- /slides/slides-lecture7.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides/slides-lecture7.pdf -------------------------------------------------------------------------------- /slides/slides-lecture8.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides/slides-lecture8.pdf -------------------------------------------------------------------------------- /slides_from_spring_2020/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides_from_spring_2020/.DS_Store -------------------------------------------------------------------------------- /slides_from_spring_2020/all-slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides_from_spring_2020/all-slides.pdf -------------------------------------------------------------------------------- /slides_from_spring_2020/dist-sys-slides-feb11.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides_from_spring_2020/dist-sys-slides-feb11.pdf -------------------------------------------------------------------------------- /slides_from_spring_2020/dist-sys-slides-feb13.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides_from_spring_2020/dist-sys-slides-feb13.pdf -------------------------------------------------------------------------------- /slides_from_spring_2020/dist-sys-slides-feb18.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides_from_spring_2020/dist-sys-slides-feb18.pdf -------------------------------------------------------------------------------- /slides_from_spring_2020/dist-sys-slides-feb20.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides_from_spring_2020/dist-sys-slides-feb20.pdf -------------------------------------------------------------------------------- /slides_from_spring_2020/dist-sys-slides-feb25.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides_from_spring_2020/dist-sys-slides-feb25.pdf -------------------------------------------------------------------------------- /slides_from_spring_2020/dist-sys-slides-feb27.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides_from_spring_2020/dist-sys-slides-feb27.pdf -------------------------------------------------------------------------------- /slides_from_spring_2020/dist-sys-slides-feb4.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides_from_spring_2020/dist-sys-slides-feb4.pdf -------------------------------------------------------------------------------- /slides_from_spring_2020/dist-sys-slides-feb6.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides_from_spring_2020/dist-sys-slides-feb6.pdf -------------------------------------------------------------------------------- /slides_from_spring_2020/dist-sys-slides-jan23.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides_from_spring_2020/dist-sys-slides-jan23.pdf -------------------------------------------------------------------------------- /slides_from_spring_2020/dist-sys-slides-jan28.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides_from_spring_2020/dist-sys-slides-jan28.pdf -------------------------------------------------------------------------------- /slides_from_spring_2020/dist-sys-slides-jan30.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides_from_spring_2020/dist-sys-slides-jan30.pdf -------------------------------------------------------------------------------- /slides_from_spring_2020/dist-sys-slides-mar3.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides_from_spring_2020/dist-sys-slides-mar3.pdf -------------------------------------------------------------------------------- /slides_from_spring_2020/dist-sys-slides-mar5.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides_from_spring_2020/dist-sys-slides-mar5.pdf -------------------------------------------------------------------------------- /slides_from_spring_2020/slides-spring20.key: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides_from_spring_2020/slides-spring20.key -------------------------------------------------------------------------------- /slides_from_spring_2020/slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vijay03/cs380d-s21/32d3e21bcb810293134f9518e3d55a5fc04164a7/slides_from_spring_2020/slides.pdf --------------------------------------------------------------------------------