In this assignment, you will implement some useful algorithms that apply to friendship graphs of the Facebook kind.

You will work on this assignment in PAIRS. INDIVIDUAL SUBMISSIONS WILL NOT BE ACCEPTED.

Read the DCS Academic Integrity Policy for Programming Assignments; you are responsible for abiding by the policy. In particular, note that "All Violations of the Academic Integrity Policy will be reported by the instructor to the appropriate Dean". (For this assignment, each pair of partners must write its own code.)

There will NOT be an extension pass option for this assignment.

In this program, you will implement some useful algorithms for graphs that represent friendships, such as those on Facebook. A friendship graph is an undirected graph without any weights on the edges. It is a simple graph because there are no self loops (a self loop is an edge from a vertex to itself) or multiple edges (a multiple edge means more than one edge between a pair of vertices).

The vertices in the graphs for this assignment represent two kinds of people: students and non-students. Each vertex will store the name of the person. If the person is a student, the name of the school will also be stored.

Here's a sample friendship graph:

    (sam,rutgers)---(jane,rutgers)-----(bob,rutgers)   (sergei,rutgers)
                          |                  |                |
                          |                  |                |
                  (kaitlin,rutgers)       (samir)----(aparna,rutgers)
                              |                           |
                              |                           |
    (ming,penn state)----(nick,penn state)----(ricardo,penn state)
                                 |
                                 |
                        (heather,penn state)


                  (michele,cornell)----(rachel)
                          |
                          |
    (rich,ucla)---(tom,ucla)

Note that the graph may not be connected, as seen in this example, which has two "islands" that are not connected to each other by any edge. (Throughout this assignment, "clique" is used informally to mean such an island of connected friends, not a complete subgraph in the graph-theory sense.) Also note that all the vertices represent students with names of schools, except for rachel and samir, who are not students.

You want to be able to focus exclusively on students in a particular school, and all the friendships between them. To do this, you will have to extract an appropriate subgraph out of the full graph. Here is the subgraph of students at rutgers, extracted from the original sample graph:

    (sam,rutgers)---(jane,rutgers)-----(bob,rutgers)   (sergei,rutgers)
                          |                                   |
                          |                                   |
                  (kaitlin,rutgers)                  (aparna,rutgers)

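A minimal Java sketch of this extraction is shown below. It assumes a hypothetical `SimpleGraph` with parallel `name`/`school` arrays and `ArrayList` adjacency lists (a simplification of the required adjacency linked lists), names and schools already stored in lower case, and `null` as the school of a non-student. Kept vertices are renumbered so that the subgraph stands on its own.

```java
import java.util.*;

// Hypothetical minimal graph: parallel arrays plus adjacency lists.
// school[i] is null for non-students; names and schools are lower case.
class SimpleGraph {
    final String[] name, school;
    final List<List<Integer>> adj = new ArrayList<>();

    SimpleGraph(String[] name, String[] school) {
        this.name = name;
        this.school = school;
        for (int i = 0; i < name.length; i++) adj.add(new ArrayList<>());
    }

    void addEdge(int a, int b) { adj.get(a).add(b); adj.get(b).add(a); }
}

class SchoolSubgraph {
    // Keeps only the students at the given school and the friendships among them.
    static SimpleGraph extract(SimpleGraph g, String schoolName) {
        String wanted = schoolName.toLowerCase();
        int[] newIndex = new int[g.name.length];   // old vertex -> new vertex, or -1
        List<Integer> kept = new ArrayList<>();
        for (int v = 0; v < g.name.length; v++) {
            boolean keep = wanted.equals(g.school[v]);   // false for non-students (null)
            newIndex[v] = keep ? kept.size() : -1;
            if (keep) kept.add(v);
        }
        String[] names = new String[kept.size()], schools = new String[kept.size()];
        for (int i = 0; i < kept.size(); i++) {
            names[i] = g.name[kept.get(i)];
            schools[i] = g.school[kept.get(i)];
        }
        SimpleGraph sub = new SimpleGraph(names, schools);
        for (int v : kept)
            for (int w : g.adj.get(v))
                // Add each edge once (v < w), and only if both ends were kept.
                if (v < w && newIndex[w] >= 0) sub.addEdge(newIndex[v], newIndex[w]);
        return sub;
    }
}
```

On the sample graph, extracting "rutgers" this way keeps the six rutgers students and the four friendships among them, which is exactly the subgraph drawn above.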
sam wants an intro to aparna through friends and friends of friends. There are two possible chains of intros:

    sam--jane--kaitlin--nick--ricardo--aparna
    or
    sam--jane--bob--samir--aparna

The second chain is preferable since it is shorter (four links instead of five).

If sam wants to be introduced to michele through a chain of friends, he is out of luck since there is no chain that leads from sam to michele in the graph.

Note that this algorithm does NOT have any restriction on the composition of the vertices: a vertex along the shortest chain need NOT be a student at a particular school, or even a student. So, for instance, you may have to find the shortest intro chain from nick to samir, which has the following solution:

    nick--ricardo--aparna--samir

which consists of two penn state students, one rutgers student, and one non-student.

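Because every link in an intro chain counts the same, breadth-first search (BFS) finds a shortest chain: it first reaches each person along a chain with the fewest links. The sketch below is one way to do this; the names `IntroChain` and `shortestChain` are hypothetical, and it works on vertex numbers with an `ArrayList` adjacency structure, leaving the translation between numbers and names to the caller.

```java
import java.util.*;

class IntroChain {
    // Returns the vertices on a shortest chain from src to dst,
    // or an empty list if no chain exists.
    static List<Integer> shortestChain(List<List<Integer>> adj, int src, int dst) {
        int[] prev = new int[adj.size()];        // predecessor on the BFS tree
        Arrays.fill(prev, -1);
        boolean[] seen = new boolean[adj.size()];
        Deque<Integer> queue = new ArrayDeque<>();
        seen[src] = true;
        queue.add(src);
        while (!queue.isEmpty()) {
            int v = queue.poll();
            if (v == dst) break;                 // dst reached by a shortest chain
            for (int w : adj.get(v)) {
                if (!seen[w]) {
                    seen[w] = true;
                    prev[w] = v;                 // remember how w was first reached
                    queue.add(w);
                }
            }
        }
        if (!seen[dst]) return Collections.emptyList();
        LinkedList<Integer> chain = new LinkedList<>();
        for (int v = dst; v != -1; v = prev[v]) chain.addFirst(v);  // walk back to src
        return chain;
    }
}
```

With the sample people numbered in file order, the chain from sam to aparna comes back as sam, jane, bob, samir, aparna, and the chain from nick to samir as nick, ricardo, aparna, samir.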
Students tend to form cliques with their friends, which creates islands that do not connect with each other. If these cliques could be identified, particularly in the student population at a particular school, introductions could be made between people in different cliques to build larger networks of friendships at that school.

In the sample graph, there are two island cliques for students at rutgers:

    (sam,rutgers)---(jane,rutgers)-----(bob,rutgers)   (sergei,rutgers)
                          |                                   |
                          |                                   |
                  (kaitlin,rutgers)                  (aparna,rutgers)

If we were to look at students at penn state instead, there is a single clique:

    (ming,penn state)----(nick,penn state)----(ricardo,penn state)
                                 |
                                 |
                        (heather,penn state)

And again, a single clique for students at ucla:

    (rich,ucla)---(tom,ucla)

And one for students at cornell:

    (michele,cornell)

From these examples, it should be clear that if there is at least one student in the graph that goes to a particular school, then there must be at least one island clique in the graph for students at that school.

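Put differently, the island cliques at a school are the connected components of that school's subgraph. The hypothetical `SchoolCliques.cliques` sketch below finds them with an iterative DFS that only steps onto friends attending the target school, so the subgraph does not have to be built first. It assumes lower-case school names with `null` for non-students, and it returns only the vertex numbers of each island; the friendships inside an island would still have to be collected to print it as a subgraph.

```java
import java.util.*;

class SchoolCliques {
    // Returns one list of vertex numbers per island of students at the target school.
    static List<List<Integer>> cliques(List<List<Integer>> adj, String[] school, String target) {
        String wanted = target.toLowerCase();
        boolean[] seen = new boolean[adj.size()];
        List<List<Integer>> result = new ArrayList<>();
        for (int start = 0; start < adj.size(); start++) {
            if (seen[start] || !wanted.equals(school[start])) continue;
            List<Integer> island = new ArrayList<>();
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(start);
            seen[start] = true;
            while (!stack.isEmpty()) {
                int v = stack.pop();
                island.add(v);
                for (int w : adj.get(v)) {
                    // Only walk through friends who attend the same school.
                    if (!seen[w] && wanted.equals(school[w])) {
                        seen[w] = true;
                        stack.push(w);
                    }
                }
            }
            result.add(island);   // every student at the school lands in exactly one island
        }
        return result;
    }
}
```

On the sample graph this gives two islands for rutgers ({sam, jane, kaitlin, bob} and {sergei, aparna}), one for penn state, one for ucla, and the single-person island {michele} for cornell; a school with no students yields an empty list.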
If jane were to leave rutgers, sam would no longer be able to connect with anyone else, because jane was the "connector" who could pull sam in to hang out with her other friends. Similarly, aparna is a connector, since without her, sergei would not be able to "reach" anyone else. (And there are more connectors in the graph...)

On the other hand, samir is not a connector. Even if he were to leave, everyone else could still "reach" whoever they could when samir was there, even though they may have to go through a longer chain.

A precise definition of a connector in any undirected graph is a vertex, v, such that there are at least two other vertices x and w for which every path between x and w goes through v. For example, v=jane, x=sam, and w=bob.

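The definition can also be checked directly, if slowly: v is a connector exactly when deleting v splits its island, so the graph without v has more connected components than the full graph. The hypothetical `ConnectorCheck` sketch below does this in O(V·(V+E)) time. It is much slower than the DFS approach that follows, but it can serve as a reference when testing a DFS-based version on small graphs.

```java
import java.util.*;

class ConnectorCheck {
    // Counts connected components while pretending vertex `removed` is absent
    // (pass -1 to count components of the whole graph).
    static int components(List<List<Integer>> adj, int removed) {
        boolean[] seen = new boolean[adj.size()];
        int count = 0;
        for (int s = 0; s < adj.size(); s++) {
            if (s == removed || seen[s]) continue;
            count++;                              // s starts a new component
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(s);
            seen[s] = true;
            while (!stack.isEmpty()) {
                for (int w : adj.get(stack.pop())) {
                    if (w != removed && !seen[w]) { seen[w] = true; stack.push(w); }
                }
            }
        }
        return count;
    }

    // v is a connector iff removing it increases the number of components.
    static boolean isConnector(List<List<Integer>> adj, int v) {
        return components(adj, v) > components(adj, -1);
    }
}
```

On the sample graph (people numbered in file order), this confirms that jane and aparna are connectors and samir is not.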
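Finding all connectors in an undirected graph can be done using DFS (depth-first search), while keeping track of two more numbers for every vertex v. These are:

  - dfsnum(v): the DFS number, assigned when v is first visited. DFS numbers are dealt out in increasing order (1, 2, 3, ...).
  - back(v): set equal to dfsnum(v) when v is visited, and lowered later as follows. If a neighbor w of v is already visited, back(v) is set to min(back(v), dfsnum(w)). When the DFS backs up from a neighbor w to v and dfsnum(v) > back(w), back(v) is set to min(back(v), back(w)).
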
When the DFS backs up from a neighbor, w, to v, if dfsnum(v) ≤ back(w), then v is identified as a connector, IF v is NOT the starting point for the DFS. (If v is a starting point for DFS, it can be a connector, but another check must be made; see the examples below. The examples don't tell you how to identify such cases; that's up to you.)

Here are some examples that show how this works.

    A--B--C

The DFS starts at A. Neighbors for a vertex are stored in REVERSE alphabetical order:

    A: B
    B: C,A
    C: B

    dfs @ A 1/1 (dfsnum/back)
    dfs @ B 2/2
    dfs @ C 3/3
    neighbor B is already visited => C 3/2
    dfsnum(B) <= back(C) B is a CONNECTOR
    nbr A is already visited => B 2/1
    dfsnum(A) <= back(B) A is starting point of DFS, NOT connector

    A--B--C

The same example as the first, except DFS starts at B. This shows how, even though B is the starting point, it is still identified (correctly) as a connector. The trace is not complete because it does not show HOW B is determined to be a connector in the last line; that's for you to figure out. Neighbors are stored in reverse alphabetical order as before.

    dfs @ B 1/1
    dfs @ C 2/2
    nbr B is already visited => C 2/1
    dfsnum(B) <= back(C) B is starting point, NOT connector
    dfs @ A 3/3
    nbr B is already visited => A 3/1
    dfsnum(B) <= back(A) B is starting point, but is a CONNECTOR

    A---B---C
        |   |
        E---D---F

DFS starts at A. Neighbors are stored in reverse alphabetical order again:

    A: B
    B: E,C,A
    C: D,B
    D: F,E,C
    E: D,B
    F: D

    dfs @ A 1/1
    dfs @ B 2/2
    dfs @ E 3/3
    dfs @ D 4/4
    dfs @ F 5/5
    nbr D is already visited => F 5/4
    dfsnum(D) <= back(F) => D is a CONNECTOR
    nbr E already visited => D 4/3
    dfs @ C 6/6
    nbr D already visited => C 6/4
    nbr B already visited => C 6/2
    dfsnum(D) > back(C) => D 4/2
    dfsnum(E) > back(D) => E 3/2
    nbr B is already visited => E 3/2
    dfsnum(B) <= back(E) => B is a CONNECTOR
    nbr C is already visited => B 2/2
    nbr A is already visited => B 2/1
    dfsnum(A) <= back(B) A is starting point, NOT connector

    A---B---C
        |   |
        E---D---F

Same example as the previous one, except that the DFS starts at D and neighbors are stored in alphabetical order. Connectors are still correctly identified as B and D.

    A: B
    B: A,C,E
    C: B,D
    D: C,E,F
    E: B,D
    F: D

    dfs @ D 1/1
    dfs @ C 2/2
    dfs @ B 3/3
    dfs @ A 4/4
    nbr B is already visited => A 4/3
    dfsnum(B) <= back(A) => B is a CONNECTOR
    nbr C is already visited => B 3/2
    dfs @ E 5/5
    nbr B is already visited => E 5/3
    nbr D is already visited => E 5/1
    dfsnum(B) > back(E) => B 3/1
    dfsnum(C) > back(B) => C 2/1
    nbr D is already visited => C 2/1
    dfsnum(D) <= back(C) D is starting point, NOT connector
    nbr E is already visited => D 1/1
    dfs @ F 6/6
    nbr D is already visited => F 6/1
    dfsnum(D) <= back(F) D is starting point, is a CONNECTOR

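The update rules shown in the traces above fit into a recursive DFS. The `Connectors` class below is a sketch rather than the required design: it uses an `ArrayList` adjacency structure, keeps `dfsnum` and `back` in arrays, restarts the DFS in every island, and handles the starting vertex with the standard textbook rule (assumed here, though it agrees with the last two traces) that a starting vertex is a connector once the DFS backs up to it from a second DFS child.

```java
import java.util.*;

class Connectors {
    private final List<List<Integer>> adj;
    private final int[] dfsnum, back;     // dfsnum 0 means "not yet visited"
    private int counter = 0;
    private final Set<Integer> connectors = new TreeSet<>();

    Connectors(List<List<Integer>> adj) {
        this.adj = adj;
        dfsnum = new int[adj.size()];
        back = new int[adj.size()];
    }

    Set<Integer> find() {
        for (int s = 0; s < adj.size(); s++)
            if (dfsnum[s] == 0) dfs(s, true);   // one DFS per island
        return connectors;
    }

    private void dfs(int v, boolean isStart) {
        dfsnum[v] = back[v] = ++counter;
        int children = 0;                      // DFS children of v so far
        for (int w : adj.get(v)) {
            if (dfsnum[w] != 0) {              // neighbor already visited
                back[v] = Math.min(back[v], dfsnum[w]);
                continue;
            }
            dfs(w, false);
            children++;
            if (dfsnum[v] > back[w]) {
                back[v] = Math.min(back[v], back[w]);
            } else if (!isStart || children > 1) {
                // dfsnum(v) <= back(w): a non-start vertex is a connector;
                // a start vertex is one only from its second DFS child on.
                connectors.add(v);
            }
        }
    }
}
```

On the sample graph (people numbered in file order), this reports jane, michele, aparna, nick, and tom as the connectors.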
You will write a program called Friends that will read a graph file, build a graph (using the adjacency linked lists representation), and implement the four algorithms described above. The following describes the inputs and outputs for all the functionality you need to implement, as well as point assignment.

    15
    sam|y|rutgers
    jane|y|rutgers
    michele|y|cornell
    sergei|y|rutgers
    ricardo|y|penn state
    kaitlin|y|rutgers
    samir|n
    aparna|y|rutgers
    ming|y|penn state
    nick|y|penn state
    bob|y|rutgers
    heather|y|penn state
    rachel|n
    rich|y|ucla
    tom|y|ucla
    sam|jane
    jane|bob
    jane|kaitlin
    kaitlin|nick
    bob|samir
    sergei|aparna
    samir|aparna
    aparna|ricardo
    nick|ricardo
    ming|nick
    heather|nick
    michele|rachel
    michele|tom
    tom|rich

The first line has the number of people in the graph (15 in this case).

The next set of lines has information about the people in the graph, one line per person (15 lines in this example), with the '|' used to separate the fields. In each line, the first field is the name of the person. Names of people can have any character except '|', and are case insensitive. The second field is 'y' if the person is a student, and 'n' if not. The third field is only present for students, and is the name of the school the student attends. The name of a school can have any character except '|', and is case insensitive.

The last set of lines, following the people information, lists the friendships between people, one friendship per line. Since friendship works both ways, any friendship is only listed once, and the order in which the names of the friends are listed does not matter.

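The sketch below shows one way to turn lines in this format into the adjacency-linked-list representation. The class and field names (`Neighbor`, `Person`, `FriendGraph`) are hypothetical. Names and schools are lower-cased so that lookups are case insensitive, each line is trimmed (an assumption meant to tolerate stray whitespace and Windows line endings), and the lines are assumed to have already been read from the file, for example with a `Scanner`.

```java
import java.util.*;

// One node in a person's adjacency linked list.
class Neighbor {
    final int vertex;     // vertex number of the friend
    final Neighbor next;  // next friend in the list, or null
    Neighbor(int vertex, Neighbor next) { this.vertex = vertex; this.next = next; }
}

// A vertex: the person's name, school (null for non-students), and friend list.
class Person {
    final String name;
    final String school;
    Neighbor first;       // head of the adjacency linked list
    Person(String name, String school) { this.name = name; this.school = school; }
}

class FriendGraph {
    final Person[] people;                               // vertex number -> person
    final Map<String, Integer> index = new HashMap<>();  // lower-case name -> vertex number

    // Builds the graph from the lines of a graph file in the format above.
    FriendGraph(List<String> lines) {
        int n = Integer.parseInt(lines.get(0).trim());
        people = new Person[n];
        for (int i = 0; i < n; i++) {
            String[] f = lines.get(1 + i).trim().split("\\|");
            String name = f[0].toLowerCase();
            String school = f[1].equalsIgnoreCase("y") ? f[2].toLowerCase() : null;
            people[i] = new Person(name, school);
            index.put(name, i);
        }
        for (int k = 1 + n; k < lines.size(); k++) {
            String line = lines.get(k).trim();
            if (line.isEmpty()) continue;               // tolerate trailing blank lines
            String[] f = line.split("\\|");
            // Assumes both names were declared in the people section.
            int a = index.get(f[0].toLowerCase());
            int b = index.get(f[1].toLowerCase());
            // Each friendship is listed once, but goes into both people's lists.
            people[a].first = new Neighbor(b, people[a].first);
            people[b].first = new Neighbor(a, people[b].first);
        }
    }

    // Number of friends of the named person (case insensitive).
    int degree(String name) {
        int d = 0;
        for (Neighbor p = people[index.get(name.toLowerCase())].first; p != null; p = p.next) d++;
        return d;
    }
}
```

The `people` array gives name lookup by vertex number in O(1), and the `HashMap` gives the reverse lookup in expected O(1) time.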
    sam--jane--bob--samir--aparna

If there is no way to get from the first person to the second person, then the output should be a message to this effect.

    Clique 1:

    <subgraph output>

    Clique 2:

    <subgraph output>

    etc...

Note: If there is even one student at the named school in the graph, then there must be at least one clique in the output. If the graph has no students at all at that school, then the output will be empty.

Remember, translating between a person's name and vertex number, as well as the other way around, should be done efficiently. When implementing the algorithms, you will need to maintain additional data structures; pay particular attention to the space you use for the data structures.

Your program MUST be called Friends; in other words, you must have a file named Friends.java with a main method.

You may implement as many classes as you want, and separate them into packages as needed.

You may import any of the classes from java.lang, java.io and java.util, but you may NOT import classes from any of the other packages in the standard Java API, and you may NOT import classes from any external Java APIs. (Of course, if you have more than one package in your application, you can cross-import classes among them.) Note: The java.util.TreeMap class may be useful; it implements a red-black tree, which, like the AVL tree, is a balanced binary search tree with O(log n) worst-case insert/delete/search times.

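For example, a `TreeMap` built with the comparator `String.CASE_INSENSITIVE_ORDER` handles case-insensitive name lookup directly, and an `ArrayList` covers the reverse direction. `NameIndex` and its methods are hypothetical names for this sketch, which uses only java.util and java.lang.

```java
import java.util.*;

// Two-way translation between person names and vertex numbers.
class NameIndex {
    // Case-insensitive keys: "Sam" and "sam" map to the same vertex.
    private final Map<String, Integer> byName = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    private final List<String> byNumber = new ArrayList<>();

    // Returns the vertex number for name, assigning the next free one if it is new.
    int add(String name) {
        Integer existing = byName.get(name);
        if (existing != null) return existing;
        byName.put(name, byNumber.size());
        byNumber.add(name);
        return byNumber.size() - 1;
    }

    Integer number(String name) { return byName.get(name); }  // O(log n) worst case; null if absent
    String name(int number) { return byNumber.get(number); }   // O(1)
}
```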
Export your entire Eclipse project as a zip archive into a file called friends.zip, and upload this file to Sakai. (See the Eclipse page for how to do this.)

If you submit an incorrect/incomplete project, you will lose credit. So before you submit, you should check that all is well by importing your friends.zip file as a project into Eclipse (in a different workspace, so it doesn't conflict with your existing project), and running it.

ONLY ONE PERSON per team should submit the assignment.

--------------------------------------------------------------------------------