├── LICENSE.md
├── README.md
└── docs
├── 1.png
├── 10.png
├── 11.jpg
├── 12.jpg
├── 13.jpg
├── 14.jpg
├── 15.png
├── 16.jpg
├── 17.jpg
├── 18.jpg
├── 19.jpg
├── 2.jpg
├── 20.jpg
├── 21.jpg
├── 22.jpg
├── 23.png
├── 24.jpg
├── 25.jpg
├── 26.jpg
├── 27.jpg
├── 28.jpg
├── 29.jpg
├── 3.jpg
├── 30.jpg
├── 31.png
├── 32.png
├── 33.jpg
├── 34.jpg
├── 4.jpg
├── 5.jpg
├── 6.jpg
├── 7.jpg
├── 8.jpg
└── 9.jpg
/LICENSE.md:
--------------------------------------------------------------------------------
1 |
2 | The MIT License (MIT)
3 |
4 | Copyright (c) 2020 Xinyao Niu
5 |
6 | Permission is hereby granted, free of charge, to any person obtaining a copy
7 | of this software and associated documentation files (the "Software"), to deal
8 | in the Software without restriction, including without limitation the rights
9 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
10 | copies of the Software, and to permit persons to whom the Software is
11 | furnished to do so, subject to the following conditions:
12 |
13 | The above copyright notice and this permission notice shall be included in all
14 | copies or substantial portions of the Software.
15 |
16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
17 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
18 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
19 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
20 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
21 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
22 | SOFTWARE.
23 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # COMP90024-CCC
2 |
3 | ## Week1 - How we got here
4 | 1. What is cloud computing?
5 | - In 2013, Cloud computing is a jargon term without a commonly accepted non-ambiguous scientific or technical definition. (Anything that is not on your computer, e.g.: gmail)
6 | - In 2016, Proponents claim that cloud computing allows companies to avoid upfront infrastructure costs, and focus on projects that differentiate their businesses instead of on infrastructure. Proponents also claim that cloud computing allows enterprises to get their applications up and running faster, with improved manageability and less maintenance, and enables IT to more rapidly adjust resources to meet fluctuating and unpredictable business demand. Cloud providers typically use a "pay as you go" model. This can lead to unexpectedly high charges if administrators do not adapt to the cloud pricing model. (Everyone has different flavor)
7 | 2. Cloud Characteristics (Lecture notes and then my paraphrasing)
8 | - On-demand self-service
9 | - A consumer can provision computing capabilities as needed without requiring human interaction with each service provider.
10 | - Scale computing resources up and down by needs without requiring human interaction with each service provider.
11 | - For anyone in any time - infinite availability (key)
12 | - Networked access
13 | - Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous client platforms.
14 | - Resources can be accessed through the network and adapted to heterogeneous client platforms.
15 | - Resource pooling
16 | - The provider's computing resources are pooled to serve multiple consumers using a multi-tenant model potentially with different physical and virtual resources that can be dynamically assigned and reassigned according to consumer demand.
17 | - Provider’s resources are pooled and can be dynamically assigned and reassigned by need.
18 | - Enough resource to scale up & down
19 | - Rapid Elasticity
20 | - Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly upon demand.
21 | - Capabilities can scale easily and rapidly upon demand.
22 | - Measured Service
23 | - Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service.
24 | - Resourcing optimization by measuring usage
25 | - monitor for load balance (e.g.: nginx)
26 | 3. Flavour
27 | - Compute clouds
28 | - Amazon Elastic compute cloud
29 | - Azure
30 | - Data Clouds
31 | - Amazon Simple Storage Service
32 | - Google docs
33 | - iCloud
34 | - Dropbox
35 | - Application Clouds
36 | - App store
37 | - Virtual image factories
38 | - Public (pay per use by credit card) / Private (Unimelb research cloud) / Hybrid (e.g. MRC runs out of resources and buys from Amazon) / Mobile / Health Clouds
39 | - complexity arises in: deciding what we can move out, what it costs to stay in, and who is allowed to make this happen
40 | 4. History - trends in computing
41 | 1. Computing and Communication Technologies (r)evolution
42 | - from centralised to decentralised
43 | 2. distributed system history
44 | - Once upon a time we had standards
45 | - Then we had more standards
46 | - mid-90s: focused on computer-computer interaction
47 | - internet: peer-to-peer
48 | - challenge: sharing data between different organizations
49 | - soln: grid computing
50 | - Grid: you only need access to the resource, whether it is data or a supercomputer; the grid handles the process of moving things
51 | - problem: people have different ways to do it
52 | - Distributed System
53 | - **Transparency** and **heterogeneity** in computer-computer interactions
54 | - Finding resources -> Binding resources -> run time type checking -> invoking resources
55 | - Dealing with heterogeneity of systems
56 | - Challenges
57 | - Complexity of implementations
58 | - Vendor specific solutions
59 | - Scalability problem
60 | - Sharing data between different organizations
61 | - Grid Computing
62 | - From computer-computer focus to organisation-organisation focus
63 | - Can be thought of as a distributed system with non-interactive workloads.
64 | - It is in contrast to the traditional notion of a supercomputer, which has many processors connected by a local high-speed computer bus instead of Ethernet.
65 | - Grid computing is distinguished from conventional high-performance computing systems such as cluster computing in that grid computers have each node set to perform a different task/application. Grid computers also tend to be more heterogeneous and geographically dispersed (thus not physically coupled) than cluster computers.
66 | - Although a single grid can be dedicated to a particular application, commonly a grid is used for a variety of purposes. Grids are often constructed with general-purpose grid middleware software libraries. Grid sizes can be quite large.
67 | - Challenge
68 | - What resources are available
69 | - To determine the status of resources
70 | - Job scheduling
71 | - Virtual organization support
72 | - Security
73 | - Public key infrastructure
74 |
75 | 5. Comparison between Grid/Cluster/Cloud Computing
76 | ```
77 | Clusters "tend" to be tightly coupled, e.g. a bunch of servers in a rack with high speed interconnects - we'll go into some details of this in week 3;
78 | Grid is/was more loosely coupled resources that provided single sign-on access to distributed resources that are often hosted by different organisations;
79 | Cloud = we'll get to that soon! ;o)
80 | ```
81 | - Grid computing
82 | - Refer to the top
83 | - Cluster Computing
84 | - Clusters tend to be tightly coupled, e.g. a bunch of servers in a rack with high speed interconnects
85 | - Example
86 | - Super computer
87 | - Cloud Computing
88 | - Refer to week 5
89 | - Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources(networks, servers, storage, applications, services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
90 |
91 | ### past exam
92 | - > [2013 Q1] A) Explain what is meant by the terms:
93 | - > Grid Computing [1]
94 | - focus on organizational collaboration, coordination, activity and the technologies for doing it
95 | - > Cluster Computing [1]
96 | - multiple rack-mounted servers which are accessible and across which you can run jobs
97 | - > Cloud Computing [1]
98 | - is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources(networks, servers, storage, applications, services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
99 | - > [2013 Q1, 2017 Q1 B [5]] B) Current Cloud Computing systems do not solve many key challenges of large-scale distributed systems. Discuss. [7]
100 | - see below
101 | - > [sample Q2 A] Describe some of the current challenges associated with large-scale distributed systems. [4]
102 | - distributed systems didn't solve data heterogeneity, and we have big data challenges.
103 | - distributed systems have scalability issues and assume fixed hardware; we have distributed computers running on different hardware systems.
104 | - fault tolerance not solved
105 | - Many diverse faults can happen with distributed systems
106 | - e.g. server failures or partial failures, network outages, overloading of components etc.
107 | - There is no simple solution to this that has been widely adopted/accepted.
108 | - results in software stack
109 | - Each system tends to develop its own technical solution
110 | - e.g. using queuing or having back-ups/failovers of systems for failures.
111 | - This can result in complex software stacks and recipes that have to be cooked to address specific needs/demands.
112 | - (And none of the following erroneous assumptions can safely be made - from the end of the week 2 lecture)
113 | - The network is reliable
114 | - Latency is zero
115 | - Bandwidth is infinite - I can send any amount of data I wish between any nodes
116 | - The network is secure
117 | - Topology doesn't change - Node x is always there
118 | - There is one administrator
119 | - Transport cost is zero - I can send as much data as I like for free
120 | - The network is homogeneous
121 | - Time is ubiquitous - Clock is same across all computers in network
122 | - > [sample Q2 B] Cloud computing solves some of these issues but not all. Explain. [4]
123 | - scalability and elastic scaling (purchase cloud resources only when you need them)
124 | - software deployment easier as we have snapshots/scripted deployment
125 | - more tools available, e.g.: load balancers, proven solutions, that you might not have in a traditional distributed system
126 | - data centres are better networked and targeted at solving your problems
127 | - geospatially distributed and easy to migrate application
128 | - doesn't address many of the above though (e.g. bandwidth from user/organization to data centre)
129 | - > [2015 Q1] A) Describe some of the erroneous assumptions that are often made in designing large-scale distributed systems. [5]
130 | - above
131 | - > [2014 Q1] A) Discuss the major trends in research and research computing over the last 20 years that have led to the emergence of Cloud computing. [6]
132 | - Mainframes
133 | - mainframes moved to distributed systems
134 | - decentralised PCs
135 | - explosion of the Internet
136 | - distributed systems moved back to centralised systems
137 | - scale of compute/storage
138 | - clouds and data centres
139 |
140 | ## Week2 - Domain Drivers – tour of some big data projects
141 | 1. compute scaling
142 | - method:
143 | 1. Vertical Computational Scaling
144 | - Have faster processors
145 | - disadv: processor speed is limited;
Moore's law no longer holds - CPUs have stopped getting faster as we expected
147 | 2. Horizontal Computational Scaling
148 | - Have more processors
149 | - adv:
150 | - 1) Easy to **add more** (more cores or cluster of nodes)
151 | - add more =
152 | - Single machine multiple cores
153 | - Typical laptop/PC/server these days
154 | - Loosely coupled collection/cluster of machines
155 | - Pooling/sharing of resources
156 | - Dedicated vs available only when not in use by others
157 | - Tightly coupled cluster of machines
158 | - Typical HPC/HTC set-up (SPARTAN)
159 | - With many servers in the same room, often with fast message-passing interconnects
160 | - Widely distributed clusters of machines
161 | - UK NGS, EGEE
162 | - Hybrid combination of the above
163 | - Leads to many challenges with distributed systems
164 | - Shared state
165 | - Delayed and lost messages in message passing
166 | - 2) cost increase not so much
167 | - disadv:
168 | - 1) **add more** limitation (see week3 - Amdahl's law)
169 | - 2) harder to design, develop, test
170 | 2. network scaling
171 | - volume of data on network grows each year
172 | 3. massive amounts of data generated over time require compute infrastructure
173 | - e.g. mapping the sky with data from telescopes
174 | 4. Cloud Computing in Different Domains
175 | - **High energy physics**
176 | - Astrophysics
177 | - **Macro-micro simulations**
178 | - Electronics
179 | - Arts and humanities
180 | - Life sciences
181 | - Extensive Research Community
182 | - Parkville Precinct for example
183 | - Many people care about them
184 | - Health, Food, Environment – truly interdisciplinary!
185 | - Interacts with virtually every discipline
186 | - Physics, Chemistry, Maths/Stats, Nano-engineering, …
187 | - Thousands of databases relevant to bioinformatics (and growing!)
188 | - Heterogeneity, Interdependence, Complexity, Change, …
189 | - Some of the Big Questions/Challenges
190 | - How does a cell work?
191 | - How does a brain work?
192 | - How does an organism develop?
193 | - Why do people who eat less tend to live longer?
194 | - Social sciences
195 | - Aurin
196 | - Clinical sciences
197 | - Data sharing and ethics
198 | - **e-Health**
199 | - **Security**
200 | - environmental
201 | - social
202 | - geographical
203 | - Genome
204 | - Hierarchical statistical system simulations
205 | - Very large device and circuit simulations
206 | - 3D devices
207 | - 10^5 circuit components
208 | - Large statistical samples
209 | - 1000 - 100000 3D simulations
210 | - 4D 1000 - 100000 circuit simulations
211 | - Complex flow and storage of data
212 | - Many files per simulation
213 | - Metadata capture and data provenance
214 | - Collaboration between 5 partners
215 | - Multidisciplinary background
216 | - Complex data exchange
217 | - Stringent security requirements
218 | - Commercial IP
219 | - Expensive software licenses
220 |
221 | 5. challenges are shaping the technological landscape
222 | - Challenges arise from multiple perspectives in research domains: big data, big compute, big distribution, big collaboration, big security
223 | - Tools, technologies and methodologies have been/can/are evolving to tackle these challenges
224 | - There is a huge amount of work still to be done
225 | - Domain knowledge is also required
226 |
227 | ## Week3 - Overview of Distributed and Parallel Computing Systems
228 | 1. Question: If n processors (cores) are thrown at a problem how much faster will it go?
229 | - Some terminology:
230 | - Speedup S(n) = T(1) / T(n): runtime on one processor divided by runtime on n processors
231 | - Proportion of speed up depends on parts of program that cannot be parallelised
232 | 1. Amdahl's law
233 | - assumes a fixed problem size – sometimes can’t predict length of time required for jobs,
234 | - e.g. state space exploration or differential equations that don’t solve
235 | - S(N) = 1 / ((1 - P) + P/N), where P is the proportion of the program that can be parallelised and N is the number of processors
236 | - That is, if 95% of the program can be parallelized, the theoretical maximum speedup using parallel computing would be 20 times, no matter how many processors are used.
237 | - If the non-parallelisable part takes 1H, then no matter how many cores are used, it won’t complete in < 1H
238 | - Amdahl’s Law greatly simplifies the real world
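As a quick sketch of the law above (plain Python, illustrative function name), the speedup formula can be evaluated directly to see the plateau:

```python
def amdahl_speedup(p, n):
    """Theoretical speedup of a program whose fraction p is
    parallelisable, run on n processors (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / n)

# 95% parallelisable: the speedup plateaus near 1/(1-p) = 20,
# no matter how many processors are thrown at the problem.
for n in (4, 64, 4096):
    print(n, round(amdahl_speedup(0.95, n), 2))
```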
239 | 2. Gustafson-Barsis's Law
240 | - speedup is a linear formula dependent on the number of processes and the fraction of time to run sequential parts
241 | - S(N) = N - alpha * (N - 1), where alpha is the fraction of the runtime spent in the sequential part
242 | - Faster (more parallel) equipment available, larger problems can be solved in the same time.
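A one-function sketch of the scaled-speedup formula (plain Python), checked against the 2014 exam numbers that appear later in these notes (12 s serial out of a 128 s run on 32 processors):

```python
def gustafson_speedup(n, alpha):
    """Scaled speedup on n processors, where alpha is the fraction of
    the (parallel) runtime spent in the serial part (Gustafson-Barsis)."""
    return n - alpha * (n - 1)

print(gustafson_speedup(32, 12 / 128))  # 29.09375
```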
243 | 3. comparison
244 | - Amdahl's Law: with a fixed problem size, speedup is bounded by the sequential part, no matter how many processors are added.
245 | - Gustafson-Barsis's Law: with enough processors, and problems that grow to match, larger workloads complete in the same time, so speedup keeps scaling.
246 | 2. Computer Architecture
247 | - At the simplest level a computer comprises:
248 | - CPU for executing programs
249 | - Memory that stores running programs and related data
250 | - I/O systems
251 | - keyboards, networks
252 | - Permanent storage for reading/writing data into and out of memory
253 | - HPC needs to keep balance of these
254 | - Based on the problem needs to be solved
255 | - There are many different ways to design/architect computers
256 | - different flavours suitable to different problems (below)
257 | | | Single Instruction | Multiple Instruction |
258 | | ------------- | ------------------ | -------------------- |
259 | | Single Data | SISD | MISD |
260 | | Multiple Data | SIMD | MIMD |
261 | - Single Instruction, Single Data Stream (SISD)
262 | - Sequential computer which exploits no parallelism in either the instruction or data streams
263 | - Single control unit fetches single instruction stream from memory. The CU/CPU then generates appropriate control signals to direct single processing element to operate on single Data Stream, i.e. one operation at a time.
264 | - Example
265 | - von Neumann computer
266 | - Multiple Instruction, Single Data stream (MISD)
267 | - **Parallel** computing architecture where many functional units (PU/CPU) perform different operations on the same data
268 | - Example
269 | - fault-tolerant computer architectures: multiple error checking on the same data source
270 | - Single Instruction, Multiple Data Stream (SIMD)
271 | - Multiple processing elements that perform the same operation on multiple data points simultaneously
272 | - Focusing on data level parallelism: many parallel computations, but only a single process (instruction) at a given moment (**Concurrency**)
273 | - Example
274 | - to improve performance of multimedia use such as for image processing
275 | - Multiple Instruction, Multiple Data stream (MIMD)
276 | - Number of processors that function **asynchronously** and independently.
277 | - at any time, different processors may be executing different instructions on different pieces of data
278 | - Machines can be shared memory or distributed memory categories.
279 | - Depends on how MIMD processors access memory
280 | - Example
281 | - HPC
282 | 3. Approaches for Parallelism (Where and how)
283 | - Explicit vs Implicit Parallelisation
284 | - Implicit Parallelism
285 | - **Compiler** is responsible for identifying parallelism and scheduling of calculations and the placement of data
286 | - Disadv: Pretty hard to do
287 | - Explicit Parallelisation
288 | - **Programmer** is responsible for most of the parallelization effort
289 | - Hardware
290 | - Hardware Parallelisation
291 | - **Cache**: much faster than reading/writing to main memory; instruction cache, data cache (multi-level) and translation lookaside buffer used for virtual-physical address translation (more later on Cloud and hypervisors).
292 | - **Add CPU (parallelisation)**: Parallelisation by adding extra CPU to allow more instructions to be processed per cycle. Usually shares arithmetic units.
293 | - Disadv: Heavy use of one type of computation can tie up all the available units of the CPU preventing other threads from using them.
294 | - **Multiple cores**: Multiple cores that can process data and perform computational tasks in parallel.
295 | - Disadv: Typically share same cache, but issue of cache read/write performance and cache coherence.
296 | - Disadv: Possibility of cache stalls (CPU not doing anything whilst waiting for caching)
297 | - To address CPUs idling while waiting on caches, many chips have a mixture of caches: L1 per core, L2 per pair of cores, L3 shared across all cores.
298 | - Disadv: typical to have different cache speeds and cache sizes (higher hit rates but potentially higher latency).
299 | - Symmetric Multiprocessing (SMP)
300 | - Two (or more) identical processors connected to a single, shared main memory, with full access to all I/O devices, controlled by a single OS instance that treats all processors equally. Each processor executes different programs and works on different data but with capability of sharing common resources (memory, I/O device, …). Processors can be connected in a variety of ways: buses, crossbar switches, meshes.
301 | - Disadv: More complex to program since need to program both for CPU and inter-processor communication (bus).
302 | - Non-Uniform Memory Access (NUMA)
303 | - provides speed-up by allowing a processor to access its own local memory faster than non-local memory.
304 | - Caveat: performance improves only as long as data are localized to specific processes/processors.
305 | - Key is allocating memory/processors in NUMA friendly ways,
306 | - e.g. to avoid scheduling/locking and (expensive) inter-processor communication. Approaches such as ccNUMA with range of cache coherency protocols/products.
307 | - Operating System
308 | - parallel vs interleaved semantics
309 | - Most modern multi-core operating systems support different "forms" of parallelisation
310 | - e.g.: A || B vs A ||| B
311 | - Compute parallelism
312 | - Processes
313 | - Used to realize tasks, structure activities
314 | - Threads
315 | - Native threads
316 | - Fork, Spawn, Join
317 | - Green threads
318 | - Scheduled by a VM instead of natively by the OS
319 | - Data parallelism
320 | - Caching
321 | - Software/Applications
322 | - Programming language supports a range of parallelisation/concurrency features
323 | - Threads, thread pools, locks, semaphores ...
324 | - Programming languages developed specifically for parallel/concurrent systems
325 | - Key issues:
326 | - Deadlock
327 | - Processes involved constantly waiting for each other
328 | - LiveLock
329 | - Processes constantly change state with regard to one another, but none are progressing
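A minimal sketch of the classic deadlock fix (plain Python `threading`, illustrative names): if every thread acquires the two locks in the same global order, no pair of threads can end up each holding one lock and waiting forever for the other.

```python
import threading

lock_a, lock_b = threading.Lock(), threading.Lock()
counter = 0

def worker():
    global counter
    for _ in range(1000):
        # Always acquire lock_a before lock_b. Mixed orderings across
        # threads are what make the classic deadlock possible.
        with lock_a:
            with lock_b:
                counter += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000
```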
330 | - Message Passing Interface (MPI)
331 | - Widely adopted approach for message passing in parallel systems
332 | - Supports point-point, broadcast communications
333 | - Key MPI functions
334 | - ```
335 | MPI_Init :initiate MPI computation
336 | MPI_Finalize :terminate computation
337 | MPI_COMM_SIZE :determine number of processors
338 | MPI_COMM_RANK :determine my process identifier
339 | MPI_SEND :send a message
340 | MPI_RECV :receive a message
341 | ```
342 | - Adv:
343 | - Standardised, widely adopted, portable, performant
344 | - Parallelisation = user's problem (the user controls how to parallelise)
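Real MPI needs an MPI library and a launcher (e.g. `mpirun`), so as an illustration only, here is the same point-to-point send/receive idea sketched with Python's standard `multiprocessing` module, with `Pipe` standing in for `MPI_SEND`/`MPI_RECV` between two processes (all names illustrative):

```python
from multiprocessing import Process, Pipe

def worker(conn):
    msg = conn.recv()        # analogous to MPI_RECV
    conn.send(msg.upper())   # analogous to MPI_SEND
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()                # roughly the MPI_Init / process-creation step
    parent_end.send("hello")
    print(parent_end.recv()) # prints HELLO
    p.join()                 # roughly the MPI_Finalize step
```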
345 | - (HT) Condor
346 | - A specialized workload management system for compute-intensive jobs developed at University of Wisconsin
347 | - Adv:
348 | - Offers job queueing mechanisms, scheduling policies, priority schemes, resource monitoring/management
349 | - User submits jobs to Condor and it chooses when and where to run the jobs, monitors their progress, and informs the user upon completion
350 | - Allows to harvest “free” (?) CPU power from otherwise idle desktop workstations
351 | - e.g. use desktop machines when keyboard and mouse are idle
352 | - on key press detected, checkpoint and migrate the job to a different (idle) machine
353 | - No need for shared file system across machines
354 | - Data can be staged to machines as/when needed
355 | - Can work across organisational boundaries
356 | - Condor Flocking
357 | - ClassAds
358 | - Advertise resources and accept jobs (according to policy)
359 | - Data Parallelism Approaches (week 9)
360 | - Challenges of big data
361 | - The most important kind of parallelism challenge?
362 | - Distributed data
363 | - CAP Theorem: Consistency, Availability, Partition tolerance
364 | - ACID <-> BASE
365 | - Distributed File Systems
366 | - e.g. Hadoop, Lustre, Ceph…
367 | 4. Erroneous Assumptions of Distributed Systems (detail see slides)
368 | - Challenges with Distribution
369 | - "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable" by Leslie Lamport
370 | - The network is reliable
371 | - Latency is zero
372 | - Bandwidth is infinite - I can send any amount of data I wish between any nodes
373 | - The network is secure
374 | - People sending data to my services
375 | - Repeated password attempts, SQL injections, …!?
376 | - People actively attacking me
377 | - Distributed denial of service attacks
378 | - People reading the data sent over the network
379 | - Man in the middle attacks
380 | - People masquerading as one of my nodes
381 | - Spoofing
382 | - People breaking into one of my nodes
383 | - Trojans, viruses, brute force attacks, …
384 | - People stealing the physical hardware
385 | - Topology doesn't change - Node x is always there
386 | - There is one administrator
387 | - e.g.: Firewall changes, server reconfigurations, services, access control (students/staff/others…)
388 | - Transport cost is zero - I can send as much data as I like for free
389 | - The network is homogeneous
390 | - Time is ubiquitous - Clock is same across all computers in network
391 | - [-- Assumption ends --]
392 | - issues of heterogeneity of compute, data, security from lecture 1
393 | - Distributed systems are widespread - The Internet
394 | - Many approaches to design parallel or distributed systems (below)
395 | - No single algorithm
396 | - No single technical solution
397 | - Eco-system of approaches explored over time and many open research questions/challenges
398 | - Flavour of some of these…
399 | 5. Strategies for Development of Parallel/Distributed Systems
400 | - strategies: (detail see slides)
401 | - Automatic parallelization
402 | - Parallel libraries
403 | - Major recoding
404 | - Challenges:
405 | - dependence analysis is hard for code that uses pointers, recursion, …;
406 | - loops can have unknown number of iterations;
407 | - access to global resources, e.g. shared variables
408 | 6. Design Stages of Parallel Programs
409 | - Partitioning
410 | - Decomposition of computational activities and data into smaller tasks
411 | - Numerous parallelisation paradigms:
412 | - Master-Worker/task-farming
413 | - Master decomposes the problem into small tasks
414 | - distributes to workers and gathers partial results to produce the result
415 | - Master-worker/task-farming is like divide and conquer with master doing both split and join operation
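The master-worker steps above can be sketched in a few lines (threads as the "workers" for simplicity; a real task farm would use processes or cluster nodes, and all names here are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def worker(chunk):
    """Each worker computes a partial result for its task."""
    return sum(x * x for x in chunk)

def master(data, n_workers=4):
    # Master decomposes the problem into small tasks...
    chunks = [data[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # ...distributes them to workers, then gathers partial results.
        partials = pool.map(worker, chunks)
    return sum(partials)

print(master(list(range(100))))  # 328350, the sum of squares 0..99
```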
417 | - Divide and Conquer
418 | - 1) A problem is divided into two or more sub problems
419 | - 2) each of these sub problems are solved independently
420 | - 3) their results are combined
421 | - 3 operations: split, compute, and join
424 | - Single Program Multiple Data (SPMD)
425 | - Each process executes the same piece of code, but on different parts of the data
426 | - Data is typically split among the available processors
427 | - Data splitting and analysis can be done in many ways
428 | - Commonly exploited model: MapReduce
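A toy SPMD/MapReduce sketch (plain Python, illustrative names): the same `map_phase` program runs over every chunk of the data, and a reduce step combines the partial counts.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def map_phase(chunk):
    # The same code runs on every chunk of the data (SPMD).
    return Counter(chunk.split())

def reduce_phase(counters):
    # Combine the partial results from all the map tasks.
    total = Counter()
    for c in counters:
        total += c
    return total

docs = ["the cat sat", "the dog sat", "the cat ran"]
with ThreadPoolExecutor() as pool:
    counts = reduce_phase(pool.map(map_phase, docs))
print(counts["the"], counts["sat"])  # 3 2
```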
430 | - Pipelining
431 | - Suitable for applications involving multiple stages of execution
432 | - typically operate on large number of data sets.
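A pipeline can be sketched with Python generators (illustrative stage names): each stage consumes items from the previous one as they are emitted, like an assembly line; with real threads/processes the stages would overlap in time.

```python
def read(data):        # stage 1: produce items
    for x in data:
        yield x

def square(items):     # stage 2: transform each item
    for x in items:
        yield x * x

def keep_even(items):  # stage 3: filter
    for x in items:
        if x % 2 == 0:
            yield x

# Chain the stages: each item flows through all three in turn.
pipeline = keep_even(square(read(range(6))))
print(list(pipeline))  # [0, 4, 16]
```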
434 | - Speculation
435 | - Used when it is quite difficult to achieve parallelism through the previous paradigms
436 | - use "look ahead" execution
437 | - Like look-ahead: if the data is predictable, we can use the predicted value to do the following action while waiting for the real data.
438 | - If the prediction is incorrect, we have to take corrective action.
439 | - procedure:
440 | - Consider a (long running) producer P and a consumer C such that C depends on P for the value of some variable V. If the value of V is *predictable*, we can execute C speculatively using a predicted value in parallel with P.
441 | - If the prediction turns out to be correct, we gain performance since C doesn’t wait for P anymore.
442 | - If the prediction is incorrect (which we can find out when P completes), we have to take corrective action, cancel C and restart C with the right value of V again.
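The producer/consumer procedure above can be sketched with futures (plain Python, illustrative names and timings): C runs speculatively on a predicted V while P is still computing, and is redone only on a misprediction.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def producer():
    """Long-running P that eventually yields the value of V."""
    time.sleep(0.1)
    return 10

def consumer(v):
    """C, which depends on P's value V."""
    return v * v

with ThreadPoolExecutor() as pool:
    predicted_v = 10                            # our guess for V
    p_fut = pool.submit(producer)               # run P
    c_fut = pool.submit(consumer, predicted_v)  # run C speculatively, in parallel
    v = p_fut.result()
    if v == predicted_v:
        result = c_fut.result()  # prediction correct: C is already done
    else:
        result = consumer(v)     # misprediction: redo C with the real V

print(result)  # 100
```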
443 | - Parametric Computation
444 | - not discussed?
445 | - Communication (relates with MPI)
446 | - Flow of information and coordination among tasks that are created in the partition stage
447 | - Agglomeration
448 | - (performance measuring) Tasks and communication created in above stages are evaluated for performance and implementation cost
449 | - Tasks may be grouped into larger tasks to improve communication
450 | - Individual communications can be bundled
451 | - Mapping/Scheduling
452 | - (design to be able to scale up/down) Assign tasks to processors such that job completion time is minimized and resource utilization is maximized
453 |
454 | ### past exam
455 | - > [sample Q5] A) Explain Amdahl's law and discuss the challenges of its practical implementation. [2]
456 | - Program always bound by limitations caused by sequential part.
457 | - no matter how many cores are thrown at the problem, speedup is limited by the sequential part of the algorithm.
458 | - Also includes overheads required to deal with parallelism (loops, variables, communications)
459 | - > [2014 Q4] A) Define Gustafson-Barsis’ law for scaled speed-up of parallel programs. [2]
460 | - Gustafson-Barsis’s Law suggests that with enough processors and remaining tasks, speed up will always meet the requirement. Faster (more parallel) equipment available, larger problems can be solved in the same time.
461 | - > [2014 Q4] B) A parallel program takes 128 seconds to run on 32 processors. The total time spent in the sequential part of the program is 12 seconds. What is the scaled speedup? [2]
462 | - S(N) = N - alpha * (N - 1) where N = n processors, alpha = time in sequential part / total parallel runtime (here 12/128)
463 | - S(N) = 32 - (12/128) * (32-1) = 931/32 = 29.09375
464 | - > [2014 Q4] C) According to Gustafson-Barsis’ law, how much faster could the application _theoretically_ run if it ran across all 32 processors compared to running on a single processor? [3]
465 | - we know from b/ that it (theoretically) runs 29.09375 times faster using 32 processors compared to running on a single processor.
466 | - If it takes 128 seconds with the 32 processor case then it would (theoretically) take 29.09375*128 = 3724 seconds in the single processor case.
467 | - > [2014 Q4] D) Why is theoretically italicized in the above? [3]
468 | - you are not factoring in the overheads of dealing with the scaled system. Parallel processing carries additional overheads, e.g. loops, communications, variables introduced to deal with the parallel aspects, which you don't have in sequential programs.
469 |
470 | - > [2014 Q3] A) What is Flynn’s Taxonomy? [2]
471 | - | | Single Instruction | Multiple Instruction |
472 | | ------------- | ------------------ | -------------------- |
473 | | Single Data | SISD | MISD |
474 | | Multiple Data | SIMD | MIMD |
475 |
476 | - > a. What have been the implications of Flynn’s taxonomy on modern computer architectures?
477 | Give examples of its consequences on modern multi-core servers and clusters of servers such as the University of Melbourne Edward HPC facility. [4]
478 | - The HPC uses MIMD so you can have multiple applications running at the same time, reading/writing/processing multiple different types of data but still on the same cluster
479 | - > [2015 Q4] A) Explain the following terms in the context of high performance computing.
480 | - > a. Data parallelization [1]
481 | - you have a large amount of data and need to process, analyse and aggregate it in parallel, typically by splitting it into smaller chunks.
482 | - > b. Compute parallelization [1]
483 | - using many processes and many threads to process things concurrently
484 | - > [2015 Q4] D) Compute parallelization of an application can be achieved through a variety of paradigms including task farming and single program multiple data. Describe these approaches and explain when they might best be applied. [3]
485 | - Master-Worker/task-farming
486 | - Master decomposes the problem into small tasks
487 | - distributes to workers and gathers partial results to produce the result
488 | - Master-worker/task-farming is like divide and conquer, with the master doing both the split and the join operations
489 | - Single Program Multiple Data (SPMD)
490 | - Each process executes the same piece of code, but on different parts of the data
491 | - Data is typically split among the available processors
492 | - Data splitting and analysis can be done in many ways
493 | - Commonly exploited model: MapReduce
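Both paradigms can be sketched in a few lines (a sketch using Python's thread pool as a stand-in for cluster workers; a real HPC code would typically use MPI):

```python
from concurrent.futures import ThreadPoolExecutor

def worker(chunk):
    # SPMD: every worker runs the same code on its own part of the data
    return sum(x * x for x in chunk)

data = list(range(100))
# Master decomposes the problem into small tasks ...
chunks = [data[i::4] for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    # ... distributes them to workers ...
    partials = list(pool.map(worker, chunks))
# ... and gathers the partial results to produce the result
result = sum(partials)
print(result)  # 328350, the sum of squares 0..99
```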
494 |
495 |
496 | ## Week4 - The Spartan HPC System
497 | - Some background on supercomputing, high performance computing, parallel computing, research computing (they're not the same thing!).
498 | - Supercomputer
499 | - Any single computer system that has exceptional processing power for its time.
500 | - Clustered computing
501 | - is when two or more computers serve a single resource
502 | - e.g.: A collection of smaller computers strapped together with a high-speed local network
503 | - Adv: improves performance and provides redundancy;
504 | - HPC - high performance computing
505 | - It is any computer system whose architecture allows for above average performance
506 | - The clustered HPC is the most efficient, economical, and scalable method, and for that reason it **dominates supercomputing**.
507 | - Parallel and Research Programming
508 | - Parallel computing refers to the submission of jobs or processes over multiple processors and by splitting up the data or tasks between them
509 | - With a cluster architecture, applications can be more easily parallelised across them.
510 | - Research computing is the software applications used by a research community to aid research.
511 | - challenge: there is a skills gap, and it must be addressed because as the volume, velocity, and variety of datasets increase, researchers will need the skills to process this data.
512 | 2. Flynn’s Taxonomy and Multicore System
513 | - Over time computing systems have moved towards multi-processor, multi-core, and often multi-threaded and multi-node systems.
514 | - As computing technology has moved increasingly to the MIMD taxonomic classification additional categories have been added:
515 | - Single program, multiple data streams (SPMD)
516 | - Multiple program, multiple data streams (MPMD)
517 | 3. Things that are more important than performance
518 | - Correctness of code and signal
519 | - Clarity of code and architecture
520 | - Reliability of code and equipment
521 | - Modularity of code and components
522 | - Readability of code and hardware documentation
523 | - Compatibility of code and hardware
524 | 4. x-windows forwarding
525 | - allows you to start up a remote application (on Spartan) but forward the display to your local machine.
526 | 5. Why Modules?
527 | - Modules have the advantage of being shared by many users on a system while easily allowing multiple installations of the same application with different versions and compilation options. Some users want the latest and greatest version of an application for the features it offers; in other cases, such as someone participating in a long-running research project, a consistent version is desired. In both cases consistency, and therefore reproducibility, is attained.
528 | - Why performance and scale matters, and why it should matter to you.
529 | - An introduction to Spartan, University of Melbourne's HPC/cloud hybrid system
530 | - Logging in, help, and environment modules.
531 | - Job submission with Slurm workload manager; simple submissions, multicore, multi-node, job arrays, job dependencies, interactive jobs.
532 | - Parallel programming with shared memory and threads (OpenMP) and distributed memory and message passing (OpenMPI)
533 | - Tantalising hints about more advanced material on message passing routines.
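The shared-memory vs message-passing distinction can be shown in miniature (a sketch: Python threads and a queue stand in for OpenMP threads and MPI messages; this illustrates the two models, not the actual OpenMP/MPI APIs):

```python
import threading
import queue

# Shared memory (OpenMP-style): threads update one common variable under a lock
total = 0
lock = threading.Lock()

def add(n):
    global total
    with lock:               # protect the shared state from races
        total += n

threads = [threading.Thread(target=add, args=(i,)) for i in range(10)]
for t in threads: t.start()
for t in threads: t.join()
print(total)  # 45

# Message passing (MPI-style): workers share nothing; data travels in messages
q = queue.Queue()

def send(n):
    q.put(n)                 # explicit message instead of shared state

workers = [threading.Thread(target=send, args=(i,)) for i in range(10)]
for w in workers: w.start()
for w in workers: w.join()
received = sum(q.get() for _ in range(10))
print(received)  # 45
```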
534 |
535 | ### past exam
536 | - > [2015 Q4] B) Explain the role of a job scheduler on a high performance computing system like the University of Melbourne Edward cluster. What commands can be used to influence the behavior of the job scheduler in supporting parallel jobs running on single or multiple nodes (servers)? [3]
537 | - you can specify wall time, number of processes, and number of threads in Slurm scripts
538 | - the job scheduler then schedules your job based on these requests
539 | - wall time has a massive influence on this
540 | - if you request a small (but sufficient) wall time, the scheduler may be able to start your job sooner
541 | - > [sample Q5] B) The actual performance as experienced by users of shared-access HPC facilities such as the Edward cluster at the University of Melbourne can vary – where here performance can be considered as the throughput of jobs, i.e. from the time of first job submission to the time of last job completion. Explain why this can happen. [2]
542 | - Stuck in queue
543 | - Overall usage of facility (some nodes can be super busy)
544 | - e.g.: I/O or node load
545 | - not all nodes are identical
546 | - the nature of the application itself
547 | - > [sample Q5] C) Explain how the Edward cluster has been set up to minimize this. [2]
548 | - Stuck in queue: Multiple queues dedicated to certain jobs
549 | - e.g.: Cloud, physical, ...
550 | - Overall usage of facility (some nodes can be super busy): Queueing system to only schedule jobs when resources free
551 | - (avoid starvation/blocking of system by users with large reservation demands for their jobs)
552 | - Modules set up with the main libraries installed
553 | - > [sample Q5] D) Explain what users can do to optimize their throughput (use) of the Edward cluster. [2]
554 | - wall time choices (minimal necessary)
555 | - If too large, the job may sit in the queue for longer than it actually needs
556 | - If too small, the job may be terminated before it finishes
557 | - avoid demanding large-scale resources unnecessarily
558 | - load the right modules
559 | - benchmark with small data, then scale up to an appropriately large size
560 | - > [sample Q5] E) Describe some of the challenges with application benchmarking on HPC facilities. [2]
561 | - jobs can be stuck in the queue for a long time
562 | - a shared facility is not just for you, so the same application cannot be guaranteed to give the same timings across runs
563 | - benchmarking apps is hard
564 | - different algorithm implementations give different performance
565 | - benchmarks such as LINPACK use a fixed set of algorithms that do not reflect real-world apps
566 | - e.g.: Twitter analytics
567 | - > [2014 Q3, 2015 Q4 C [1]] B) What features does the Edward HPC facility offer to allow utilization of multiple servers (nodes)? [2]
568 | - firstly, multiple nodes exist and are available to jobs
569 | - secondly, in your Slurm scripts you can specify the resources you need (nodes/threads/cores); the scheduler allows you to express these requirements
570 | - > [2014 Q3] C) Why is the accuracy of the wall time estimate important to Edward end users? [2]
571 | - If too large, the job may sit in the queue for longer than it actually needs
572 | - If too small, the job may be terminated before it finishes
573 | - > [2015 Q4] A) Explain the following terms in the context of high performance computing.
574 | - > c. Wall-time [1]
575 | - the time limit you specify at job submission, by which you expect the job to finish
576 |
577 | ## Week5 - Cloud Computing & ~~Getting to Grips with the University of Melbourne Research Cloud~~
578 | Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
580 | 1. Deployment Models
581 |
582 || Private| Community| Public| Hybrid |
583 |---|---|---|---|---|
584 |pro|1. **Control** <br>2. **Consolidation of resources** <br>3. **Easier to secure** - easy to set up a firewall <br>4. **More trust**| |1. Utility computing <br>2. **Can focus on core business** - no need to look after infrastructure or do DevOps <br>3. Cost-effective - use as much as you need <br>4. “Right-sizing” <br>5. Democratisation of computing|1. **Cloud-bursting** - use the private cloud, but burst into the public cloud when needed (this is what makes it a hybrid cloud)|
585 |con|1. Relevance to core business? e.g. Netflix moving to Amazon <br>2. Staff/management overheads - need DevOps <br>3. Hardware obsolescence - need to refresh hardware <br>4. Over/under-utilisation challenges - recycling resources| |1. **Security** - others may be able to see your sensitive data <br>2. Loss of control <br>3. **Possible lock-in** - difficult to switch to Azure if using AWS <br>4. Dependency on the Cloud provider's continued existence|1. How do you move data/resources when needed? <br>2. How to decide (in real time?) what data can go to the public cloud? <br>3. Is the public cloud compliant with PCI-DSS (Payment Card Industry Data Security Standard)?|
586 |example| | | | Eucalyptus, VMware vCloud Hybrid Service|
587 |
588 | 2. Delivery Models
589 | - responsibilities shift from the customer to the provider as you move from IaaS to PaaS to SaaS
592 | | | IaaS| PaaS|SaaS|
593 |---|---|---|---|
594 | | example| Amazon Web Services <br>Oracle Public Cloud <br>NeCTAR| Azure| Gmail|
595 |
596 | ### past exam
597 | - > [2015 Q6] C) Describe some of the challenges in delivering hybrid Clouds? [2]
598 | - How do you move data/resources when needed?
599 | - How to decide (in real time?) what data can go to public cloud?
600 | - Is the public cloud compliant with PCI-DSS (Payment Card Industry – Data Security Standard)?
601 | - > [2015 Q6] B) What are the advantages/disadvantages of public, private and hybrid clouds? [5]
602 | - see the table below
603 | - > [2014 Q2] A) According to Wikipedia “Cloud Computing is a colloquial expression used to describe a variety of different types of computing concepts that involve a large number of computers that are connected through a real-time communication network (typically the Internet). Cloud computing is a jargon term without a commonly accepted non-ambiguous scientific or technical definition”.
604 | - > a. Is this justified? Your answer should cover:
605 | - > i. public, private and hybrid Cloud computing models and their advantages and disadvantages; [4]
606 |
607 || Private|Public| Hybrid |
608 |---|---|---|---|
609 |pro|1. **Control** <br>2. **Consolidation of resources** <br>3. **Easier to secure** - easy to set up a firewall <br>4. **More trust**|1. Utility computing <br>2. **Can focus on core business** - no need to look after infrastructure or do DevOps <br>3. Cost-effective - use as much as you need <br>4. “Right-sizing” <br>5. Democratisation of computing|1. **Cloud-bursting** - use the private cloud, but burst into the public cloud when needed|
610 |con|1. Relevance to core business? e.g. Netflix moving to Amazon <br>2. Staff/management overheads - need DevOps <br>3. Hardware obsolescence - need to refresh hardware <br>4. Over/under-utilisation challenges - recycling resources|1. **Security** - others may be able to see your sensitive data <br>2. Loss of control <br>3. **Possible lock-in** - difficult to switch to Azure if using AWS <br>4. Dependency on the Cloud provider's continued existence|1. How do you move data/resources when needed? <br>2. How to decide (in real time?) what data can go to the public cloud? <br>3. Is the public cloud compliant with PCI-DSS (Payment Card Industry Data Security Standard)?|
611 | - > ii. the different flavours of “X as a Service (XaaS)” models including their associated advantages and disadvantages. [4]
612 | - IaaS
613 | - adv
614 | - gives access to infrastructure on which we can deploy our own services, e.g. using Ansible/Heat
615 | - disadv
616 | - you need to spend time building such services yourself
617 | - PaaS
618 | - adv
619 | - almost everything is organized by professionals, at the same time you have some freedom of action
620 | - disadv
621 | - great dependency on the vendor
622 | - SaaS
623 | - adv
624 | - everything is organized for you by professionals
625 | - disadv
626 | - no freedom, you fully depend on the vendor
627 | - > [2015 Q6] A) Describe the terms Cloud-based IaaS, PaaS and SaaS and give examples for each. [3]
628 | - IaaS
629 | - computing infrastructure delivered as a service, on top of which we can deploy our own services using tools such as Ansible/Heat
630 | - Amazon Web Services
631 | - PaaS
632 | - a computing platform where almost everything is managed by professionals, while you keep some freedom of action
633 | - Azure
634 | - SaaS
635 | - software delivered as a service, where everything is managed for you by professionals
636 | - Gmail
637 | - > [sample Q2 C] What are availability zones in NeCTAR and what restrictions do they impose on NeCTAR Cloud-based application developers? [2]
638 | - availability zone: locations of data centers used to provide logical view of cloud
639 | - restriction: can't mount volumes to VMs in remote locations. If you have computer in Melbourne, you can't have your storage somewhere else in a different availability zone and you can't mount that volume.
640 | - > [2017 Q6] b. What are the implications of availability zones with regards to virtual machine instance creation and data volumes offered by NeCTAR? [2]
641 | - For data volumes: volumes cannot be mounted to VMs in a different (remote) availability zone.
642 | - For instance creation: instances (on NeCTAR) can be created in any availability zone.
643 |
644 |
645 | ## Workshop week5: Auto-Deployment -- Ansible
646 | - Reason for auto-deployment (**comparison**)
647 | - It is easy to forget what software we installed and what steps we took to configure the system
648 | - Manual process is error-prone, can be non-repeatable
649 | - Snapshots are monolithic
650 | - provide no record of what has changed
651 | - Manual deployment provides no record of what has changed
652 | - Automation is the mechanism used to make servers reach a desirable state.
653 | - Automation provides (**advantages**)
654 | - A record of what you did
655 | - Knowledge about the system in code
656 | - Making the process repeatable
657 | - Making the process programmable
658 | - Infrastructure as Code
659 | - Configuration management (CM) tools
660 | - Configuration management refers to the process of systematically handling changes to a system in a way that it maintains integrity over time.
661 | - Ansible is an automation tool for configuring and managing computers.
662 | - Features about ansible (Pros)
663 | - Easy to learn
664 | - Playbooks in YAML, templates in Jinja2
665 | - Sequential execution
666 | - Minimal requirements
667 | - No need for centralized management servers/daemons
668 | - Single command to install
669 | - Using SSH to connect to target machine
670 | - Idempotent
671 | - Executing N times no different to executing once
672 | - Prevents side-effects from re-running scripts
673 | - Extensible
674 | - Write your own modules
675 | - Rolling updates
676 | - Useful for continuous deployment/zero downtime deployment
677 | - Inventory management
678 | - Dynamic inventory from external data sources
679 | - Execute tasks against host patterns
680 | - Ansible Vault for encryption
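Idempotency is the key property above: a task describes a desired state and only acts if the system is not already in it. A rough sketch of the idea in Python (a hypothetical `ensure_line` task of our own, not Ansible's implementation; the config file name is illustrative):

```python
import os
import tempfile

def ensure_line(path, line):
    """Idempotent task: make sure `line` is present in the file.
    Running it N times has the same effect as running it once."""
    existing = []
    if os.path.exists(path):
        with open(path) as f:
            existing = f.read().splitlines()
    if line in existing:
        return False                 # already in the desired state: no change
    with open(path, "a") as f:
        f.write(line + "\n")
    return True                      # changed

path = os.path.join(tempfile.mkdtemp(), "sshd_config")  # illustrative path
changed_first = ensure_line(path, "PermitRootLogin no")
changed_again = ensure_line(path, "PermitRootLogin no")
print(changed_first, changed_again)  # True False
```

Ansible modules report exactly this changed/unchanged distinction, which is what makes re-running a playbook safe.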
681 |
682 | ### past exam
683 | - > [Sample Q1, 2017 Q7 B each [2], 2015 Q7 A [4, 3]] Applications can be deployed across Clouds either through creation and deployment of virtual images (snapshots) or through scripting the installation and configuration of software applications.
684 | - > What are the benefits and drawbacks of these approaches? [3]
685 | - Snapshots
686 | - benefits
687 | - Snapshots are easy, can be created just by clicking buttons on dashboard
688 | - drawbacks
689 | - No record of how the instance was built -> no control
690 | - Scripting
691 | - benefits
692 | - Scripting allows to do much more
693 | - start application
694 | - configure application
695 | - deploy application
696 | - upgrade system
697 | - thus, you have more control over the system
698 | - Scripting has complete record of how to build and deploy
699 | - drawbacks
700 | - harder than clicking buttons to create a snapshot
701 | - > Discuss the mechanisms used to support these approaches. You may refer to specific tools used to support these processes on the NeCTAR Research Cloud. [3]
702 | - openstack API (Nova/Glance/Swift/etc)
703 | - openstack Service (Heat/etc)
704 | - templates the flavor of deployment
705 | - e.g.: specify the version of Ubuntu used
706 | - Ansible scripting allows to automate software deployment including tasks/role
707 | - > Describe the approach that would be taken using Ansible for scripted deployment of SaaS solutions onto the Cloud. [2]
708 | - Create a playbook that contains YAML files. Typical contents include variables, inventories and roles/tasks/templates. Inventories will include the servers/database used for the software etc etc.
709 | - Then say how you would run the script using openrc.sh etc. Note 2 points so massive amounts of detail not needed.
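A minimal playbook sketch of that approach (the host group, package, and file paths are all illustrative, not from the course):

```yaml
# site.yml - illustrative only
- hosts: webservers          # group defined in the inventory file
  become: yes
  vars:
    app_port: 8080
  tasks:
    - name: Install nginx
      apt:
        name: nginx
        state: present       # idempotent - only installs if missing
    - name: Deploy config from a Jinja2 template
      template:
        src: templates/app.conf.j2
        dest: /etc/nginx/conf.d/app.conf
      notify: restart nginx
  handlers:
    - name: restart nginx
      service:
        name: nginx
        state: restarted
```

It would then be run with something like `ansible-playbook -i hosts site.yml` (after sourcing openrc.sh if a dynamic OpenStack inventory is used).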
710 |
711 | ## Week 6 – Web Services, ReST Services ~~and Twitter demo~~
712 |
713 | ### SOA
714 |
715 | 1. What's in an Architecture?
716 | - A (system) architecture is just the way different components are distributed on computers,
717 | - and the way in which they interact with each other.
718 | 2. Why Service-oriented Architectures - SOA?
719 | - When an architecture is completely contained within the same machine, components can communicate directly
720 | - e.g. through function calls or object instantiations.
721 | - However, when components are distributed such a direct approach typically cannot be used (e.g. Assignment 2!)
722 | - Therefore, components (more properly, systems) have to interact in more loosely-coupled ways.
723 | - **Services** are often used for this. Typically combinations and commonality of services can be used to form a **Service-oriented Architecture (SoA)**.
724 | 3. SOA goal
725 |
726 | |||
727 | |---|---|
728 | |A set of externally facing services|that a business wants to provide to external collaborators|
729 | |An architectural pattern|based on service providers, one or more brokers, and service requestors based on agreed service descriptions|
730 | |A set of architectural principles, patterns and criteria|that support modularity, encapsulation, loose coupling, separation of concerns, reuse and composability|
731 | |A programming model|complete with standards, tools and technologies that supports development and support of services (note that there can be many flavours of services)|
732 | |A middleware solution|optimized for service assembly, orchestration, monitoring, and management (can include tools and approaches that combine services together, e.g. as workflows - see the later security lecture)|
733 |
734 | 4. SOA design principle
735 |
737 | |||example|
737 | |---|---|---|
738 | |Standardized service contract| Services adhere to a communications agreement, as defined collectively by one or more service-description documents.|Use defined twitter API|
739 | |Service loose coupling| Services maintain a relationship that minimizes dependencies and only requires that they maintain an awareness of each other.|
740 | |Service abstraction| Beyond descriptions in the service contract, services hide logic from the outside world.|Twitter decides the API you use, i.e. the rules for what you can see; things you have no access to stay hidden|
741 | |Service reusability| Logic is divided into services with the intention of promoting reuse.|
742 | |Service autonomy| Services have control over the logic they encapsulate.|e.g. you cannot retrieve tweets older than 2 weeks, however much you want to|
743 | |Service statelessness| Services minimize resource consumption by deferring the management of state information when necessary.|
744 | |Service discoverability| Services are supplemented with communicative meta data by which they can be effectively discovered and interpreted.|
745 | |Service composability| Services are effective composition participants, regardless of the size and complexity of the composition.|
746 | |Service granularity| a design consideration to provide optimal scope at the right granular level of the business functionality in a service operation.|
747 | |Service normalization| services are decomposed and/or consolidated to a level that minimizes redundancy, for performance optimization, access, and aggregation.|
748 | |Service optimization| high-quality services that serve specific functions are generally preferable to general purpose low-quality ones.|
749 | |Service relevance| functionality is presented at a level of granularity recognized by the user as a meaningful service.|
750 | |Service encapsulation| many services are consolidated for use under a SOA and their inner workings hidden.|
751 | |Service location transparency| the ability of a service consumer to invoke a service regardless of its actual location in the network.|clients use only a URL to reach the service on the web, regardless of where the service is located|
752 |
753 | ### Web Services
754 | 1. Web Services & SOA
755 | - Web Services = SOA for the Web
756 | - Both use HTTP, hence can run over the web (although SOAP/WS often run over other protocols as well)
757 | - Web services used to implement SOA
758 | 2. Web Services flavor
759 | - (main focus of the lecture) SOAP-based Web Services
760 | - (main focus of the lecture) ReST-based Web Services
761 | - Both flavours to call services over HTTP
762 | - Geospatial services (WFS, WMS, WPS…)
763 | - Health services (HL7)
764 | - SDMX (Statistical Data Markup eXchange)
765 | - a standard for exchanging statistical data around the world
766 | 3. SOAP/WS v.s. ReST
767 | SOAP (Simple Object Access Protocol)
768 |
769 | |ReST|SOAP/WS|
770 | |---|---|
771 | |is centered around resources, and the way they can be manipulated (added, deleted, etc.) remotely|built upon the Remote Procedure Call paradigm (a language-independent function call that spans another system)|
772 | |Actually ReST is more of a style to use HTTP than a separate protocol|while SOAP/WS is a stack of protocols that covers every aspect of using a remote service, from service discovery, to service description, to the actual request/response|
773 | 4. How to describe the functionality offered by a web service?
774 | - WSDL: is an XML-based interface description language that describes the functionality offered by a web service.
775 | - WSDL provides a machine-readable description of how the service can be called, what parameters it expects, and what results/data structures it returns:
776 | - Definition – what it does
777 | - Target Namespace – context for naming things
778 | - Data Types – simple/complex data structures inputs/outputs
779 | - Messages – messages and structures exchanged between client and server
780 | - Port Type - encapsulate input/output messages into one logical operation
781 | - Bindings - bind the operation to the particular port type
782 | - Service - name given to the web service itself
783 |
784 | ### ReST-based Web Services
785 | 1. What is ReST?
786 | Representational State Transfer (ReST) is intended to evoke an image of how a well-designed Web application behaves: a network of web pages (a virtual state-machine), where the user progresses through an application by selecting links (state transitions), resulting in the next page (representing the next state of the application) being transferred to the user and rendered for their use.
787 | 2. How it works?
789 | - e.g. a client accesses a service (Amazon), requests a product, and a representation of it comes back
790 | ```
791 | 1. Clients requests Resource through Identifier (URL)
792 | 2. Server/proxy sends representation of Resource
793 | 3. This puts the client in a certain state.
794 | 4. Representation contains URLs allowing navigation.
795 | 5. Client follows URL to fetch another resource.
796 | 6. This transitions client into yet another state.
797 | 7. Representational State Transfer!
798 | ```
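The navigation loop above can be mimicked with an in-memory stand-in for a server (the URLs and states here are invented for illustration):

```python
# Each representation carries links to further resources (HATEOAS)
resources = {
    "/products":      {"state": "catalogue",    "links": ["/products/AXFC"]},
    "/products/AXFC": {"state": "product page", "links": ["/cart"]},
    "/cart":          {"state": "cart",         "links": []},
}

def get(url):
    """Server sends back a representation of the resource."""
    return resources[url]

# Client navigates by following links, transitioning state each time
url, visited = "/products", []
while url:
    rep = get(url)                # representation transferred to the client
    visited.append(rep["state"])  # client is now in a new state
    url = rep["links"][0] if rep["links"] else None
print(visited)  # ['catalogue', 'product page', 'cart']
```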
799 | 3. ReST Best Practices (principle)
800 | - Keep your URIs short – and create URIs that don’t change.
801 | - URIs should be opaque identifiers that are meant to be discovered by following hyperlinks, not constructed by the client.
802 | - Use nouns, not verbs in URLs
803 | - Make all HTTP GETs side-effect free. Doing so makes the request "safe".
804 | - Use links in your responses to requests! Doing so connects your response with other data. It enables client applications to be "self-propelled". That is, the response itself contains info about "what's the next step to take". Contrast this to responses that do not contain links. Thus, the decision of "what's the next step to take" must be made out-of-band.
805 | - Minimize the use of query strings.
806 | - For example:
807 | - Prefer: http://www.amazon.com/products/AXFC
808 | - Over: http://www.amazon.com/products?product-id=AXFC
809 | - Use HTTP status codes to convey errors/success
810 | - In general, keep the REST principles in mind.
811 | - In particular:
812 | - Addressability (discussed above about address design)
813 | - Uniform Interface (below)
814 | - Resources and Representations instead of RPC (below Resource section)
815 | - HATEOAS (below)
816 | 4. ReST – Uniform Interface
817 | - Uniform Interface adds four more constraints:
818 | - Identification of Resources
819 | - All important resources are identified by one (uniform) resource identifier mechanism (e.g. HTTP URL)
820 | - Manipulation of Resources through representations
821 | - Each resource can have one or more representations. Such as application/xml, application/json, text/html, etc. Clients and servers negotiate to select representation.
822 | - Self-descriptive messages
823 | - Requests and responses contain not only data but additional headers describing how the content should be handled.
824 | - (HTTP GET, HEAD, OPTIONS, PUT, POST, DELETE, CONNECT, TRACE, PATCH): everyone knows what these mean, so there is no need to look up documentation (one advantage over SOAP)
825 | - HATEOAS: Hypermedia as the Engine of Application State
826 | - Resource representations contain links to identified resources
827 | - Resources and state can be used by navigating links
828 | - links make interconnected resources navigable
829 | - without navigation, identifying new resources is service-specific
830 | - RESTful applications **navigate** instead of **calling**
831 | - Representations contain information about possible traversals
832 | - application navigates to the next resource depending on link semantics
833 | - navigation can be delegated since all links use identifiers
834 | - Making Resources Navigable
835 | - RPC-oriented systems need to expose the available functions
836 | - functions are essential for interacting with a service
837 | - introspection or interface descriptions make functions discoverable
838 | - ReSTful systems use a Uniform Interface
839 | - no need to learn about functions
840 | - To find resources
841 | - find them by following links from other resources
842 | - learn about them by using URI Templates
843 | - understand them by recognizing representations
844 |
845 | 4. Resource
846 | - is anything that’s important enough to be referenced as a thing in itself.
847 | - e.g.:
848 | ```
849 | If your users might
850 | - want to create a hypertext link to it
851 | - make or refute assertions about it
852 | - retrieve or cache a representation of it
853 | - include all or part of it by reference into another representation
854 | - annotate it
855 | - or perform other operations on it
856 | ...then you should make it a resource.
857 | ```
858 | 3. Resource-Oriented Architecture (ROA)
859 | - what is it?
860 | - is a way of turning a problem into a RESTful web service:
861 | an arrangement of URIs, HTTP, and XML that works like the rest of the Web
863 | - can be used to support the definition and creation of services or service endpoints.
864 | - ROA \& Rest
865 | - ROA has a style of supporting Restful services that allows folk to interact/navigate their functionality (HATEOS etc). The services still do PUT, POST, GET etc.
866 | - ROA v.s. SOA
867 | - similar
868 | - Much of the philosophy behind SOA applies to ROA,
869 | - e.g. services should support abstraction, contract, autonomy etc,
870 | - difference
871 | - ROA is compared to SOA as a different (better) approach that can be used to support the definition and creation of services or service endpoints.
872 | - ROA has advantages
873 | - there is no need to understand what methods mean or deal with complex WSDL etc. You can mix/match service models
874 | - e.g. consider the AURIN architecture with ReST, SOAP and many other service flavours.
875 | - ROA procedure
876 | - ```
877 | 1. Figure out the data set
878 | 2. Split the data set into resources and for each kind of resource
879 | 3. Name the resources with URIs
880 | 4. Expose a subset of the uniform interface
881 | 5. Design the representation(s) accepted from the client
882 | 6. Design the representation(s) served to the client
883 | 7. Integrate this resource into existing resources, using hypermedia links and forms
884 | 8. Consider the typical course of events: what’s supposed to happen? - How would a user interact with it?
885 | 9. Consider error conditions: what might go wrong?
886 | ```
887 | - ROA actions
888 | - | Action | HTTP METHOD |
889 | | :---------------: | :--------------------------------------: |
890 | | Create Resource | PUT to a new URI POST to an existing URI |
891 | | Retrieve Resource | GET |
892 | | Update Resource | POST to an existing URI |
893 | | Delete Resource | DELETE |
894 | - Don't simply map PUT to update and POST to create:
895 | - **PUT** should be used when target resource URL is known by the client.
896 | - **POST** should be used when target resource URL is server generated
897 | 7. HTTP Methods can be
898 | - **Safe**
899 | - Do not change, repeating a call is equivalent to not making a call at all
900 | - GET , OPTION, HEAD
901 | - **Idempotent**
902 | - Effect of repeating a call is equivalent to making a single call; methods that are not idempotent can have side-effects when repeated
903 | - *PUT*, *DELETE*
904 | - **Neither**
905 | - POST
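These properties can be made concrete with a toy in-memory resource store (a sketch; function names mirror the HTTP verbs, this is not a real HTTP server):

```python
import itertools

store = {}
ids = itertools.count(1)

def GET(url):                 # safe: no state change at all
    return store.get(url)

def PUT(url, doc):            # idempotent: repeating it changes nothing more
    store[url] = doc

def DELETE(url):              # idempotent: deleting twice == deleting once
    store.pop(url, None)

def POST(collection, doc):    # neither: each call creates a new resource
    url = f"{collection}/{next(ids)}"
    store[url] = doc
    return url

PUT("/products/AXFC", {"name": "widget"})
PUT("/products/AXFC", {"name": "widget"})       # repeat: same final state
first = POST("/products", {"name": "gadget"})
second = POST("/products", {"name": "gadget"})  # repeat: a *new* resource
print(len(store), first, second)  # 3 /products/1 /products/2
```

This also illustrates the PUT-vs-POST note above: PUT targets a URL the client already knows, while POST asks the server to generate one.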
906 |
907 | ### past exam
908 | - > [2015 Q2] A) Explain the general principles that should underlie the design of Service-Oriented Architectures (SOA). [7]
909 | - |||example|
910 | |---|---|---|
911 | |Standardized service contract| Services adhere to a communications agreement, as defined collectively by one or more service-description documents.|Use defined twitter API|
912 | |Service loose coupling| Services maintain a relationship that minimizes dependencies and only requires that they maintain an awareness of each other.|
913 | |Service abstraction| Beyond descriptions in the service contract, services hide logic from the outside world.|Twitter decides the API you use, i.e. the rules for what you can see; things you have no access to stay hidden|
914 | |Service reusability| Logic is divided into services with the intention of promoting reuse.|
915 | |Service autonomy| Services have control over the logic they encapsulate.|e.g. you cannot retrieve tweets older than 2 weeks, however much you want to|
916 | |Service statelessness| Services minimize resource consumption by deferring the management of state information when necessary.|
917 | |Service discoverability| Services are supplemented with communicative meta data by which they can be effectively discovered and interpreted.|
918 | |Service composability| Services are effective composition participants, regardless of the size and complexity of the composition.|
919 | |Service granularity| a design consideration to provide optimal scope at the right granular level of the business functionality in a service operation.|
920 | |Service normalization| services are decomposed and/or consolidated to a level that minimizes redundancy, for performance optimization, access, and aggregation.|
921 | |Service optimization| high-quality services that serve specific functions are generally preferable to general purpose low-quality ones.|
922 | |Service relevance| functionality is presented at a level of granularity recognized by the user as a meaningful service.|
923 | |Service encapsulation| many services are consolidated for use under a SOA and their inner workings hidden.|
924 | |Service location transparency| the ability of a service consumer to invoke a service regardless of its actual location in the network.|client only use url to use the service on the web regardless of location of service
925 | - > [2015 Q2] B) Explain why and how Cloud infrastructures have benefited from SOA. [3]
926 | - Standardized interfaces mean external clients need not worry about how the cloud performs tasks internally
927 | - When an architecture is completely contained within the same machine, components can communicate directly
928 | - e.g. through function calls or object instantiations.
929 | - However, when components are distributed such a direct approach typically cannot be used (e.g. Assignment 2!)
930 | - Therefore, components (more properly, systems) have to interact in more loosely-coupled ways.
931 | - **Services** are often used for this. Typically combinations and commonality of services can be used to form a **Service-oriented Architecture (SoA)**.
932 | - > [2014 Q1] B) How has the evolution of service-oriented architectures supported Cloud computing? [2]
933 | - SOA has a uniform interface, abstraction, a standard contract, etc., which avoids every Cloud building its own bespoke solution (from the forum; below from the recording)
934 | - you have a standardized interface for the service-oriented architecture
935 | - it offers autonomy
936 | - you are providing an interface that people/software can interact with
937 | - Where it helps cloud computing: if every cloud provider built its interfaces using different technologies, you would have to learn each provider's programming model, which can be a major bottleneck. Adopting an SOA style like ReST helps to solve this problem. For example, OpenStack provides APIs through which you can interact with the cloud, with sets of libraries for doing so.
938 | - > [2013 Q4] A) Compare and contrast Representational State Transfer (ReST) based web services and Simple Object Access Protocol (SOAP)-based web services for implementing service-oriented architectures. [8]
939 | - They are different flavors of web services
940 | - complexity of SOAP
941 | - namespaces and standardization around operation names and parameters
942 | - uses XML, which is bloated and not easy to use
943 | - SOAP uses a stack of protocols that covers every aspect of using a remote service, from service discovery, to service description, to the actual request/response; ReST uses plain HTTP rather than a separate protocol stack
944 | - SOAP uses WSDL, an XML-based interface description language that describes the functionality offered by a web service. WSDL provides a machine-readable description of how the service can be called, what parameters it expects, and what results/data structures it returns.
945 | - ReST, by contrast, doesn't deal with complex WSDL; you can mix and match service models
946 | - ReST has no need to define what methods mean: there is only a very small set of methods (PUT, POST, GET, etc.). This very limited vocabulary provides many advantages:
947 | - it simplifies understanding, both for developers implementing the system and for clients interacting with it
948 | - SOAP has many more standards to learn compared to ReST
949 | - SOAP is built upon the Remote Procedure Call paradigm (a language-independent function call that invokes another system), while ReST is centred on resources and the ways they can be manipulated (added, deleted, etc.) remotely
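To make the "bloated XML vs plain HTTP" point concrete, here is a sketch comparing a SOAP-style envelope with the equivalent ReST request for a hypothetical `getTweet` operation (the namespace URL and operation name are invented for illustration):

```python
# A SOAP-style request wraps the call in an XML envelope with namespaces
# and an operation name ("getTweet") that a WSDL document would define.
soap_message = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Body>
    <m:getTweet xmlns:m="http://example.org/tweets">
      <m:tweetId>42</m:tweetId>
    </m:getTweet>
  </soap:Body>
</soap:Envelope>"""

# The ReST equivalent needs no envelope and no operation name: the verb
# comes from HTTP itself, and the resource is identified by its URL.
rest_request = "GET /tweets/42 HTTP/1.1"

print(len(soap_message), "vs", len(rest_request))
```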
950 | - > [2015 Q3] A) _SOAP is dead; ReST is the future!_ Explain this statement with regards to Representational State Transfer (ReST) based web services compared to Simple Object Access Protocol (SOAP)-based web services for implementing service-oriented architectures. [5]
951 | - above
952 | - > [2016 Q4] A) Representational State Transfer (ReST) based web services are often used for creating Resourceoriented Architectures (ROA) whilst Simple Object Access Protocol (SOAP)-based web services are often used to implement Service-oriented Architectures (SOA). Discuss the similarities and differences between a ROA and a SOA. [3]
953 | - similarities
954 | - Much of the philosophy behind SOA applies to ROA:
955 | - Standardized service contract
956 | - Services adhere to a communications agreement, as defined collectively by one or more service-description documents (e.g. use the defined Twitter API)
957 | - Service abstraction
958 | - Beyond descriptions in the service contract, services hide logic from the outside world (e.g. Twitter decides the API you use, hiding what you have no access to)
959 | - Service autonomy
960 | - Services have control over the logic they encapsulate (e.g. the service decides that you cannot get tweets older than 2 weeks)
961 | - both use HTTP for the communication between client and resource
962 | - differences
963 | - ROA is positioned against SOA as a different (arguably better) approach that can be used to support the definition and creation of services or service endpoints.
964 | - ROA has no need to define what methods mean or to deal with complex WSDL.
965 | - > [2013 Q4] B) Explain the differences between ReST-based PUT and POST methods and explain when one should be used over another. [2]
966 | - PUT creates or fully replaces a resource at a URL chosen by the client (idempotent)
967 | - POST asks the server to create (or process) a subordinate resource, with the server generating the new resource's URL (not idempotent)
968 | - **PUT** should be used when target resource URL is known by the client.
969 | - **POST** should be used when target resource URL is server generated
970 | - > [2014 Q1] C) A HTTP method can be _idempotent_
971 | - > What is meant by this italicized term? [1]
972 | - The effect of repeating a call is equivalent to making a single call; non-idempotent methods can have side-effects when repeated
973 | - > Give an example of an idempotent ReST method. [1]
974 | - PUT
975 | - > [2015 Q3] B) HTTP methods can be safe or idempotent.
976 | - > a. What is meant by a safe HTTP method? [1]
977 | - Does not change server state; repeating a call is equivalent to not making the call at all
978 | - > b. Give an example of a safe HTTP method. [1]
979 | - GET
980 | - > c. What is meant by an idempotent HTTP method? [1]
981 | - The effect of repeating a call is equivalent to making a single call; non-idempotent methods can have side-effects when repeated
982 | - > d. Give an example of an idempotent HTTP method. [1]
983 | - PUT
984 | - > e. Give an example of a HTTP method that is neither safe nor idempotent? [1]
985 | - POST
986 |
987 | ## Week 7 – Big Data and CouchDB
988 | ### "Big data" challenges and architectures
989 | #### challenge
990 | 1. four "Vs"
991 |
992 | |Big data challenge|Description|
993 | |---|---|
994 | |Volume| No one really knows how much new data is being generated, but the amount of information being collected is huge.
995 | |Velocity| **the frequency (that data arrive)** at which new data is being brought into the system and analytics performed
996 | |Variety| **the variability and complexity** of data schema. The more complex the data schema(s) you have, the higher the probability of them changing along the way, adding more complexity.
997 | |Veracity| **the level of trust** in the data accuracy (provenance); the more diverse sources you have, the more unstructured they are, the less veracity you have.
998 |
999 | 2. Big Data Calls for Ad hoc Solutions
1000 | - While Relational DBMSs are extremely good at ensuring consistency, they rely on normalized data models that, in a world of big data (think about Veracity and Variety) can no longer be taken for granted.
1001 | - Therefore, it makes sense to use DBMSs that are built upon data models that are not relational (relational model: tables and relationships amongst tables).
1002 | - While there is nothing preventing SQL from being used in distributed environments, alternative query languages have been used for distributed databases, hence they are sometimes called NoSQL DBMSs
1003 |
1004 | 3. DBMSs for Distributed Environments
1005 | - |Type|Description|Examples|
1006 | |---|---|---|
1007 | |Key-value store|a DBMS that allows the retrieval of a chunk of data given a key: **fast, but crude**|Redis, PostgreSQL Hstore, Berkeley DB|
1008 | |BigTable DBMS|stores data in columns grouped into column families, with rows potentially containing different columns of the same family, for **quick retrieval**|Apache Cassandra, Apache Accumulo|
1009 | |Document-oriented DBMS|stores data as structured documents, usually expressed as XML or JSON; document-oriented databases are one of the main categories of NoSQL databases|Apache CouchDB, MongoDB|
1010 | |NoSQL DBMS|umbrella term: while there is nothing preventing SQL from being used in distributed environments, alternative query languages have been used for distributed databases, hence these are sometimes called NoSQL DBMSs||
1011 | - Why Document-oriented DBMS for Big data?
1012 | - While Relational DBMSs are extremely good for ensuring consistency and availability, the normalization that lies at the heart of a relational database model implies fine-grained data, which are less conducive to partition-tolerance than coarse-grained data.
1013 | - Example:
1014 | - A typical contact database in a relational data model may include: a person table, a telephone table, an email table and an address table, all relate to each other.
1015 | - The same database in a document-oriented database would entail one document type only, with telephones numbers, email addresses, etc., nested as arrays in the same document.
1019 | - Relational databases find it challenging to handle such huge data volumes; to address this, RDBMSs add more CPUs or more memory to the database management system to scale up vertically
1020 | - The majority of the data comes in a semi-structured or unstructured format from social media, audio, video, texts, and emails.
1021 | - Big data is generated at very high velocity. RDBMSs struggle with high velocity because they are designed for steady data retention rather than rapid growth
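The contact-database example above, sketched in code (field names are illustrative):

```python
import json

# Relational model: four normalized, fine-grained tables linked by keys
person     = {"id": 1, "name": "Ada Lovelace"}
telephones = [{"person_id": 1, "number": "+61 3 9000 0000"}]
emails     = [{"person_id": 1, "address": "ada@example.org"}]
addresses  = [{"person_id": 1, "city": "Melbourne"}]

# Document model: one coarse-grained, self-contained document with the
# related data nested as arrays -- easier to shard and replicate as one unit
contact_doc = {
    "_id": "person:1",
    "name": "Ada Lovelace",
    "telephones": ["+61 3 9000 0000"],
    "emails": ["ada@example.org"],
    "addresses": [{"city": "Melbourne"}],
}

print(json.dumps(contact_doc, indent=2))
```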
1022 |
1023 | 4. Brewer’s CAP Theorem
1024 | - Consistency, Availability, Partition-Tolerance
1025 |
1026 | |||
1027 | |---|---|
1028 | |Consistency|every client receiving an answer receives **the same answer** from all nodes in the cluster (it doesn't depend on which node is queried)
1029 | |Availability|every client receives **an answer** from any node in the cluster (which might differ from node to node)
1030 | |Partition-Tolerance|the cluster **keeps on operating** when one or more nodes cannot communicate with the rest of the cluster
1031 | - Brewer’s CAP Theorem: you can only pick any two of Consistency, Availability and Partition-Tolerance.
1033 | - The CAP theorem forces us to consider trade-offs among different options
1034 | - (not quite) Although the theorem presents the three qualities as symmetrical, in practice Consistency and Availability are the two at odds when a Partition happens
1035 | - “Hard” network partitions may be rare, but “soft” ones are not (a slow node may be considered dead even if it is not); ultimately, every partition is detected by a timeout
1036 | - Can have consequences that impact the cluster as a whole
1037 | - e.g. a distributed join is only complete when all sub-queries return
1038 | - Traditional DBMS architectures were not concerned with network partitions, since all data were supposed to be in a small, co-located cluster of servers
1039 | - Consequence:
1040 | - The emphasis on numerous commodity servers can result in an increased number of hardware failures
1041 |
1042 | 5. CAP Theorem and the Classification of Distributed Processing Algorithms
1044 |
1045 | 1. Two phase commit: Consistency and Availability
1046 | - This is the usual algorithm used in relational DBMS's (and MongoDB)
1047 |
1048 | ||what does it entail?|how?|
1049 | |---|---|---|
1050 | |enforces consistency|every database is in a consistent state, and all are left in the same state|1. locking data that are within the transaction scope<br>2. performing transactions on write-ahead logs<br>3. completing transactions (commit) only when all nodes in the cluster have performed the transaction<br>4. aborting transactions (rollback) when a partition is detected|
1051 | |reduces availability|data locks; processing stops in case of partition||
1052 | - Conclusion
1053 | - Therefore, two-phase commit is a good solution when the cluster is co-located, less good when it is distributed
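The two-phase commit steps above can be sketched as a minimal single-process simulation (class and method names are invented for illustration):

```python
# Minimal two-phase commit sketch (illustrative, single-process simulation).
# Phase 1: every node votes on the prepared transaction; phase 2: the
# coordinator commits only if ALL nodes voted yes, otherwise rolls back.
class Node:
    def __init__(self, name, healthy=True):
        self.name, self.healthy = name, healthy
        self.log = []            # stand-in for a write-ahead log

    def prepare(self, txn):
        if not self.healthy:     # a partitioned/dead node cannot vote yes
            return False
        self.log.append(("prepared", txn))
        return True

    def commit(self, txn):
        self.log.append(("committed", txn))

    def rollback(self, txn):
        self.log.append(("rolled_back", txn))

def two_phase_commit(nodes, txn):
    votes = [n.prepare(txn) for n in nodes]   # phase 1: prepare/vote
    if all(votes):                            # phase 2: commit everywhere
        for n in nodes:
            n.commit(txn)
        return "committed"
    for n in nodes:                           # any "no" vote aborts everywhere
        n.rollback(txn)
    return "rolled_back"
```

A single unhealthy (e.g. partitioned) node forces the whole transaction to roll back, which is exactly why availability suffers.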
1054 | 2. Paxos: Consistency and Partition-Tolerance
1055 | - This family of algorithms is driven by consensus, and is both partition-tolerant and consistent
1056 | - In Paxos, every node is either a proposer or an acceptor:
1057 | - a proposer proposes a value (with a timestamp)
1058 | - an acceptor can accept or refuse it (e.g. the acceptor refuses if it has already received a more recent value)
1059 | - When a proposer has received a sufficient number of acceptances (a quorum is reached), a confirmation message is sent to the acceptors with the agreed value
1060 | - Conclusion
1061 | - Paxos clusters can recover from partitions and maintain consistency, but the smaller part of a partition (the part that is not in the quorum) will not send responses, hence the availability is compromised
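A heavily simplified sketch of the quorum idea (real Paxos has a prepare/promise phase before accept; this toy models only proposal numbers and majority acceptance):

```python
class Acceptor:
    def __init__(self):
        self.highest = -1              # highest proposal number seen
        self.value = None

    def accept(self, number, value):
        # refuse proposals older than one already accepted
        if number < self.highest:
            return False
        self.highest = number
        self.value = value
        return True

def propose(acceptors, number, value):
    # a value is chosen once a majority (quorum) of acceptors accept it
    accepted = sum(a.accept(number, value) for a in acceptors)
    return accepted > len(acceptors) // 2
```

The minority side of a partition can never assemble a quorum, so it stops answering: consistency and partition-tolerance are kept, availability is sacrificed.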
1062 | 3. Multi-Version Concurrency Control (MVCC): Availability and Partition-tolerance
1063 | - MVCC is
1064 | - **a method to ensure availability** (every node in a cluster always accepts requests) and
1065 | - **some sort of recovery from a partition** by reconciling the single databases with revisions (data are not replaced, they are just given a new revision number)
1066 | - In MVCC, **concurrent updates are possible without distributed locks** (in optimistic locking only the local copy of the object is locked), since the updates will have different revision numbers;
1067 | - the transaction that completes last will get a higher revision number, hence will be considered as the current value.
1068 | - In case of a cluster partition, concurrent requests with the same revision number going to two partitioned nodes are both accepted, but once the partition is healed there will be a conflict.
1069 | - The conflict has to be resolved somehow (CouchDB returns a list of all current conflicts, which are then left to the application to resolve).
1070 | - To achieve consistency, Bitcoin uses a form of MVCC based on proof-of-work (which is a proxy for the computing power used in a transaction) and on repeated confirmations by a majority of nodes of a history of transactions.
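A sketch of the revision mechanism (inspired by CouchDB's `_rev` idea, but the class and return values here are invented):

```python
class MVCCStore:
    def __init__(self):
        self.docs = {}                      # _id -> (revision, body)

    def put(self, _id, body, rev=0):
        # Optimistic concurrency: no distributed lock is taken; the write
        # is only refused if the caller's revision is stale
        current_rev = self.docs.get(_id, (0, None))[0]
        if rev != current_rev:
            return {"error": "conflict"}    # caller must fetch and reconcile
        new_rev = current_rev + 1           # data is not replaced in place:
        self.docs[_id] = (new_rev, body)    # it just gets a new revision
        return {"ok": True, "rev": new_rev}
```

Every node can keep accepting writes (availability); a stale revision surfaces as a conflict that the application resolves later, rather than a blocked transaction.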
1071 |
1072 | #### Architecture
1073 | 1. Sharding
1074 | - What is it?
1075 | - Sharding is the partitioning of a database "horizontally", i.e. the database rows (or documents) are partitioned into subsets that are stored on different servers.
1077 | - shard: each such subset of rows
1078 | - Number of shards
1079 | - must be at least the number of replicas
1080 | - the number of shards equals the maximum number of nodes (lest a node contain the same shard file twice)
1081 | - Number of nodes
1082 | - must be at least the number of replicas (usually set to 3)
1083 | - the maximum number of nodes equals the number of shards (lest a node contain the same shard file twice)
1084 | - The main advantages of a sharded database:
1085 | - improved performance through the distribution of computing load across nodes
1086 | - i.e. better distribution of data
1087 | - easier to move data files around,
1088 | - e.g. when adding new nodes to the cluster
1089 | - sharding strategies:
1090 | |Strategy|Description|
1091 | |---|---|
1092 | |Hash sharding|distributes rows evenly across the cluster|
1093 | |Range sharding|similar rows (say, tweets coming from the same area) are stored on the same node|
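The two strategies can be sketched as follows (q = 4 and the range boundaries are arbitrary illustrative choices):

```python
import zlib

q = 4  # number of shards

def hash_shard(key):
    # Hash sharding: a deterministic hash spreads keys evenly, but keys
    # that are similar (e.g. tweets from one area) end up scattered
    return zlib.crc32(key.encode()) % q

def range_shard(key, boundaries=("g", "n", "t")):
    # Range sharding: keys fall into lexicographic ranges, so similar
    # keys land on the same shard (good for range queries)
    for shard, upper in enumerate(boundaries):
        if key < upper:
            return shard
    return len(boundaries)
```

Hash sharding gives the best load balance; range sharding keeps related rows together at the cost of potential hot spots.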
1094 | 2. Replication and Sharding
1095 | - What is replication?
1096 | - Replication is the action of storing the same row (or document) on different nodes to make the database fault-tolerant.
1097 | - (adv) Replication and sharding can be combined with the objective of maximizing availability while maintaining a minimum level of data safety.
1098 | - A bit of nomenclature:
1099 | - n is the number of replicas (how many times the same data item is repeated across the cluster)
1100 | - q is the number of shards (how many files a database is split into)
1101 | - n * q is the total number of shard files distributed in the different nodes of the cluster
1103 |
1104 | - There are 16 shard files, since the three-node clustered database has n=2 replicas and q=8 shards (n × q = 16).
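The arithmetic from the nomenclature above, as a sketch (the placement rule is a naive illustration, not CouchDB's actual algorithm):

```python
n, q, num_nodes = 2, 8, 3   # replicas, shards, nodes (as in the example)

# every (shard, replica) pair is one shard file on disk
shard_files = [(s, r) for s in range(q) for r in range(n)]

# naive placement: consecutive nodes hold the replicas of a shard, so no
# node ever stores two copies of the same shard (requires num_nodes >= n)
placement = {node: [] for node in range(num_nodes)}
for shard, replica in shard_files:
    placement[(shard + replica) % num_nodes].append((shard, replica))
```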
1105 | 3. Partitions
1106 | - What is it?
1107 | - A partition is a grouping of logically related rows in the same shard
1108 | - e.g.: all the tweets of the same user
1109 | - Advantage:
1110 | - Partitioning improves performance by restricting queries to a single shard
1111 | - To be effective, partitions have to be relatively small (certainly smaller than a shard)
1112 | - A database has to be declared “partitioned” during its creation
1113 | - Partitions are a new feature of CouchDB 3.x
1114 | 4. MapReduce Algorithms
1115 | - What is it?
1116 | - This family of algorithms is particularly suited to parallel computing of the Single-Instruction, Multiple-Data type (SIMD) (see Flynn's taxonomy in a previous lecture)
1117 | - Advantage:
1118 | - parallelism
1119 | - greatly reducing network traffic by moving the process to where data are
1120 | - Procedure:
1121 | 1. Map: distributes the work across machines, emitting intermediate key-value pairs, while
1122 | 2. Reduce: hierarchically summarizes them until the result is obtained.
1123 |
- e.g. word count (a runnable Python version of the classic pseudocode):
```python
from collections import defaultdict

def map_fn(name, document):
    # Map: emit a (word, 1) pair for every word in the document
    return [(word, 1) for word in document.split()]

def reduce_fn(word, partial_counts):
    # Reduce: sum the partial counts for one word
    return (word, sum(partial_counts))

def map_reduce(documents):
    # shuffle: group the mapped pairs by key, then reduce each group
    groups = defaultdict(list)
    for name, document in documents.items():
        for word, count in map_fn(name, document):
            groups[word].append(count)
    return dict(reduce_fn(w, counts) for w, counts in groups.items())
```
1132 | 5. Clustered database architecture
1133 | - Distributed databases are run over “clusters”, that is, sets of connected computers
1134 | - Clusters are needed to:
1135 | - Distribute the computing load over multiple computers, e.g. to improve availability
1136 | - Store multiple copies of data, e.g. to achieve redundancy
1137 | - Consider two document-oriented DBMSs (CouchDB and MongoDB) and their typical cluster architectures
1138 | 6. CouchDB Cluster Architecture
1140 | - In this example there are 3 nodes, 4 shards and a replica number of 2
1141 | - replica: copy of data
1142 | - All nodes answer requests (read or write) at the same time
1143 | - no master
1144 | - Sharding (splitting of data across nodes) is done on every node
1145 | - if a read request for shard A arrives at node 1 (which holds shard A), node 1 answers it
1146 | - if a read request for shard A arrives at node 2 (which does not hold shard A), it is redirected to node 1 or node 3, which answers it
1147 | - How Shards Look in CouchDB see lecture 7 slide 23
1148 | - and section "Replication and Sharding" below
1149 | ```
1150 | This is the content of the data/shards directory on a node of a three-node cluster
1151 | The test database has q=8, n=2, hence 16 shards files
1152 | The *.couch files are the actual files where data are stored
1153 | The sub-directories are named after the document _ids ranges
1154 | ```
1155 | - When a node does not contain a document (say, a document of Shard A is requested to Node 2), the node requests it from another node (say, Node 1) and returns it to the client
1156 | - Scalability: Nodes can be added/removed easily, and their shards are re-balanced automatically upon addition/deletion of nodes
1157 | - Quorums
1158 | - Write
1159 | - Can only complete successfully if the document is committed to a quorum of replicas, usually a simple majority
1160 | - Read
1161 | - Can complete successfully only if a quorum of replicas return matching documents
1162 | 7. MongoDB Cluster Architecture
1164 | - Sharding (splitting of data) is done at the replica set level
1165 | - i.e.: it involves more than one cluster (a shard is on top of a replica set)
1166 | - write requests can be answered only by the primary node in a replica set
1167 | - read requests can be answered by every node (including secondary nodes) in the set
1168 | - depending on the specifics of the configuration
1169 | - Updates flow only from the primary to the secondary
1170 | - If a primary node fails, or discovers it is connected to a minority of nodes, a secondary of the same replica set is elected as the primary
1171 | - Data are balanced across replica sets
1172 | - Arbiters (MongoDB instances without data) can assist in breaking a tie in elections.
1173 | - Since a quorum has to be reached, it is better to have an odd number of voting members (the arbiter in this diagram is only illustrative)
1174 | 8. MongoDB vs CouchDB Clusters
1175 |
1176 | | | MongoDB | CouchDB |
1177 | | -------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
1178 | | Complexity | Higher | Lower |
1179 | | Availability | Lower | Higher |
1180 | | Accessibility | MongoDB software routers must be embedded in application servers | Any HTTP client can connect |
1181 | | Data Integrity | Losing two nodes in the MongoDB example implies losing write access to half the data, and possibly read access too, depending on the cluster configuration parameters and the nature (primary or secondary) of the lost nodes | Losing two nodes out of three in the CouchDB example implies losing access to between 1/4 and 1/2 of the data, depending on the nodes that fail |
1182 | | Functionality | Some features, such as unique indexes, are not supported in MongoDB sharded environments | Supported |
1183 | | CAP | **Two-phase commit** for replicating data from primary to secondary; **Paxos-like** election of a primary node in a replica set | MVCC |
1184 |
1185 | ||CouchDB|MongoDB|
1186 | |---|---|---|
1187 | |clusters are (difference in API)|less complex|more complex|
1188 | |clusters are|more available|less available, as - by default - only primary nodes can talk to clients for read operations (and exclusively so for write operations)|
1189 | |software routers|any HTTP client can connect to CouchDB|software routers (MongoS) must be embedded in application servers|
1190 | |losing two nodes|out of three in the CouchDB architecture shown means **losing access to between 1/4 and 1/2 of the data, depending on the nodes that fail**|in the MongoDB example implies **losing write access** to half the data (although there are ten nodes in the cluster instead of three), and possibly read access too, depending on the cluster configuration parameters and the nature (primary or secondary) of the lost nodes|
1191 | |some features (such as unique indexes)|supported|not supported in MongoDB sharded environments|
1192 | |distributed processing algorithm|MVCC|a mix of two-phase commit (for replicating data from primary to secondary nodes) and Paxos-like election (to elect a primary node in a replica set)|
1193 | |emphasis (both are partition-tolerant)|Availability|Consistency|
1194 | - These differences are rooted in different approaches to an unsolvable problem, the problem defined by Brewer's CAP Theorem (the first five rows above)
1196 | - The different choices of strategy explain the different cluster architectures of these two DBMSs (the last two rows above)
1198 |
1199 |
1200 | ### Introduction to CouchDB
1201 | - Part 2: Introduction to CouchDB, recording 07:: 01:15:16
1202 |
1203 | ### past exam
1204 | - > [2016 Q3] A) Big data is often associated with data having a range of properties including high volume, high velocity and high variety (heterogeneity).
1205 | Discuss the advantages, disadvantages and suitability more generally of the following data solutions with regards to these big data properties:
1206 | Your answer should include the way in which these solutions implement MapReduce.
1207 | - > a. CouchDB [3]
1208 | - document-oriented approach (a less fine-grained schema, which is typically what is needed for many big data scenarios)
1209 | - being a document-oriented database helps to address the data variety challenge
1210 | - supports MVCC for availability and partition tolerance
1211 | - supports MapReduce for data analytics
1212 | - MapReduce in CouchDB parallelizes the processing of huge data volumes by breaking it into many small pieces, which helps to address the high-volume challenge
1213 | - may not be suitable for all big data scenarios (e.g. where consistency is needed)
1214 | - MapReduce analytics are not as rich as other frameworks'
1215 | - e.g. for running machine learning algorithms, hence we have Spark
1216 | - supports unique indexes, which helps to save storage space when there is data duplication (relevant to the high-volume challenge)
1217 | - > b. Apache Hadoop Distributed File System (HDFS) [3]
1218 | - Apache Hadoop started as a way to distribute files over a cluster and execute MapReduce tasks
1219 | - HDFS is a distributed file system, so it is suited to high-velocity data (no single-server bottleneck)
1220 | - MapReduce for big data processing, and an increased block size suited to larger data
1221 | - handling variety requires custom programming
1222 | - > c. Apache Spark [3]
1223 | - Spark was designed to reduce the latency inherent in the Hadoop approach to the execution of MapReduce jobs
1224 | - Spark supports large in memory analysis
1225 | - richer data processing capabilities (plug-ins)
1226 | - typically used with HDFS to benefit from above two
1227 | - maybe also mention RDDs etc
1228 | - > What other data properties can be associated with big data challenges? [1]
1229 | - Veracity: the level of trust in the data accuracy (provenance); the more diverse sources you have, the more unstructured they are, the less veracity you have.
1230 | - > [2013 Q7] A) Many research domains are facing "big data" challenges. Big data is not just related to the size of the data sets. Explain. [5]
1231 | - |Big data challenge|Description|
1232 | |---|---|
1233 | |Volume| No one really knows how much new data is being generated, but the amount of information being collected is huge.
1234 | |Velocity| **the frequency (that data arrive)** at which new data is being brought into the system and analytics performed
1235 | |Variety| **the variability and complexity** of data schema. The more complex the data schema(s) you have, the higher the probability of them changing along the way, adding more complexity.
1236 | |Veracity| **the level of trust** in the data accuracy (provenance); the more diverse sources you have, the more unstructured they are, the less veracity you have.
1237 | - For life science, there can be all kinds of flavours of datasets, and it's not as simple as integrating them all, as there can be huge heterogeneity across data sets.
1238 | - > [2013 Q7] B) What capabilities are currently offered or will be required for Cloud Computing infrastructures such as the NeCTAR Research Cloud to tackle these "big data" challenges. [5]
1239 | You may refer to specific research disciplines, e.g. life sciences, astrophysics, urban research (or others!) in your answer to part A) and B) of this question.
1240 | - You can have access to a CouchDB service on the cloud, but this does not help with the high-velocity problem, sensitive-data problems, or the other challenges in the big data space. These problems must be solved by engineers and shouldn't be expected to be solved by NeCTAR.
1241 | - For life science, there is no cancer database. You have to build your own by using the infrastructure provided by the NeCTAR.
1242 | - No fine-grained security service, you have to build it on the cloud.
1243 | - For life science, there can be all kinds of flavours of datasets, and you need to provide a service to integrate them and make the result accessible.
1244 | - For life science, there are thousands of databases relevant to bioinformatics, and growing! I.e. we know there can be lots of data but we don't know exactly how much, so NeCTAR should be scalable to the volumes of data to be stored.
1245 |
1246 | - > [2013 Q3] A) Explain the consequences of Brewer's CAP theorem on distributed databases. [4]
1247 | - Brewer’s CAP Theorem: you can only pick any two of Consistency, Availability and Partition-Tolerance.
1248 | - Two phase commit can achieve Consistency and Availability
1249 | - Paxos can achieve Consistency and Partition-Tolerance
1250 | - Multi-Version Concurrency Control (MVCC) can achieve Availability and Partition-tolerance
1251 | - > [2013 Q3] B) Describe which aspects of the CAP theorem are supported by the following database technologies:
1252 | - > non-SQL (unstructured) databases such as CouchDB. [2]
1253 | - CouchDB uses MVCC to support Availability and Partition-tolerance
1254 | - > relational databases such as PostGreSQL. [2]
1255 | - Relational DBMSs are extremely good for ensuring consistency and availability
1256 | - > Describe the advantages of MapReduce compared to other more traditional data processing approaches. [2]
1257 | - You can map work out to multiple servers, run reductions on all the data, and it scales.
1258 | - parallelism
1259 | - greatly reducing network traffic by moving the process to where data are
1260 | - > [sample Q4] A) In the context of distributed databases, explain the concepts of:
1261 | - > Consistency [1]
1262 | - every client receiving an answer receives **the same answer** from all nodes in the cluster (it doesn't depend on which node is queried)
1263 | - > Availability [1]
1264 | - every client receives **an answer** from any node in the cluster (which might differ from node to node)
1265 | - > [sample Q4] B) Give an example of a database technology that supports Availability in the presence of a (network) partition. [1]
1266 | - CouchDB, which uses Multi-Version Concurrency Control (MVCC)
1267 | - > [sample Q4] C) In the context of CouchDB clusters what is the meaning of:
1268 | - > Replica number [1]
1269 | - Number of copies of the same shard kept in the cluster
1270 | - > Number of shards [1]
1271 | - Number of horizontal partitions (shards) into which the database is split
1272 | - > Read quorum [1]
1273 | - Minimum number of nodes that have to return the same result for a read operation for it to be declared valid and sent back to the client
1274 | - > Write quorum [1]
1275 | - Minimum number of nodes that have to acknowledge a write operation for it to be accepted
1276 | - > [2014 Q5] A) Discuss the advantages and disadvantages of unstructured (noSQL) databases such as CouchDB for dealing with “big data” compared to more traditional databases, e.g. relational databases such as MySQL.
1277 | Your answer should cover challenges with data distribution, traditional database ACID properties, heterogeneity of data and large-scale data processing. [6]
1278 | - A traditional database has schemas, keys, and tables, while a NoSQL database does not, which makes it more flexible.
1279 | - In a traditional database we have to write queries, while in a NoSQL database we often don't need to
1280 | - Because we have heterogeneous data, we would like to store it as documents in a NoSQL database, which is not supported by traditional databases
1281 | - The majority of the data comes in a semi-structured or unstructured format from social media, audio, video, texts, and emails.
1282 | - Because NoSQL databases support MapReduce, we can run MapReduce across many servers, which scales out the processing for large-scale analytics
1283 | - For ACID, MySQL requires transactions to complete atomically; enforcing this across a distributed database sharded over multiple servers, where every node must return the same answer, causes significant overhead. CouchDB relaxes this: nodes can fail, but you can still get results.
1284 | - While Relational DBMSs are extremely good for ensuring consistency and availability, the normalization that lies at the heart of a relational database model implies fine-grained data, which are less conducive to partition-tolerance than coarse-grained data.
1285 | - Example:
1286 | - A typical contact database in a relational data model may include: a person table, a telephone table, an email table and an address table, all relate to each other.
1287 | - The same database in a document-oriented database would entail one document type only, with telephone numbers, email addresses, etc., nested as arrays in the same document.
1288 | - While Relational DBMSs are extremely good at ensuring consistency, they rely on normalized data models that, in a world of big data (Veracity and Variety) can no longer be taken for granted.
1289 | - Therefore, it makes sense to use DBMSs that are built upon data models that are not relational (relational model: tables and relationships amongst tables).
1290 | - Relational databases find it challenging to handle such huge data volumes. To address this, RDBMSs add more central processing units (CPUs) or more memory to scale up vertically
1291 | - Big data is generated at a very high velocity. RDBMSs struggle with high velocity because they are designed for steady data retention rather than rapid growth
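The MapReduce point above can be illustrated with a toy word count (illustrative of the model only — CouchDB views are actually written in JavaScript): the map step runs on each document independently, so it can execute on whichever node holds that shard, and only the reduced results are combined.

```python
from collections import defaultdict

docs = [
    {"text": "big data big"},
    {"text": "data"},
]

def map_doc(doc):
    # map step: applied to one document at a time, on the node storing it
    for word in doc["text"].split():
        yield word, 1

def reduce_values(values):
    # reduce step: combines all values emitted for one key
    return sum(values)

emitted = defaultdict(list)
for doc in docs:                      # each doc can be mapped on a different node
    for key, value in map_doc(doc):
        emitted[key].append(value)

counts = {k: reduce_values(v) for k, v in emitted.items()}
print(counts)  # {'big': 2, 'data': 2}
```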
1292 |
1293 | ## Workshop week6: Containerization and docker
1294 | ### Virtualization vs Containerization
1295 | - Virtualization
1296 | - Pros
1297 | - Application containment
1298 | - Horizontal scalability
1299 | - Cons
1300 | - The guest OS and binaries can give rise to duplication between VMs, wasting server processor, memory and disk resources and limiting the number of VMs each server can support -> virtualization overhead
1301 | - Containerization
1302 | - Pros
1303 | - It allows virtual instances to share a single host OS (and associated drivers, binaries, libraries) to reduce these wasted resources since each container only holds the application and related binaries. The rest are shared among the containers.
1304 | - |Parameter | Virtual Machines | Container |
1305 | | ------------- | ------------------------------------------------------------ | --------------------------------------------- |
1306 | | Guest OS | Run on virtual Hardware, have their own OS kernels | Share same OS kernel |
1307 | | Communication | Through Ethernet devices | IPC mechanisms (pipes, sockets) |
1308 | | Security | Depends on the Hypervisor | Requires close scrutiny |
1309 | | Performance | Small overhead incurs when instructions are translated from guest to host OS | Near native performance |
1310 | | Isolation | File systems and libraries are not shared between guest and host OS | File systems can be shared, and libraries are |
1311 | | Startup time | Slow (minutes) | Fast (a few seconds) |
1312 | | Storage | Large | Small (most are reusable) |
1313 |
1314 | - In the real world they can co-exist
1315 | - When deploying applications on the cloud, the base computation unit is a Virtual Machine. Usually Docker containers are deployed on top of VMs.
1316 | - Containers are not always better
1317 | - It depends on:
1318 | - The size of the task at hand
1319 | - The life span of the application
1320 | - Security concerns
1321 | - Host operating system
1322 |
1323 | ### What is a Container?
1324 | - Similar concept of resources isolation and allocation as a virtual machine
1325 | - Without bundling the entire hardware environment and full OS
1326 | - What container runtimes are in use?
1327 | - Docker
1328 | - The leading software container platform
1329 | - Containerd
1330 | - cri-o
1331 |
1332 | ### Docker
1333 | - What is it?
1334 | - the most successful containerization technology.
1335 | - Docker Nomenclature
1336 | - Container: a process that behaves like an independent machine; it is a runtime instance of a Docker image.
1337 | - Image: a blueprint for a container.
1338 | - Dockerfile: the recipe to create an image.
1339 | - Registry: a hosted service containing repositories of images. E.g., the Docker Hub (https://hub.docker.com)
1340 | - Repository: a set of Docker images.
1341 | - Tag: a label applied to a Docker image in a repository.
1342 | - Docker Compose: a tool for defining and running multi-container Docker applications.
1343 | - Docker SWARM: a standalone native clustering / orchestration tool for Docker.
1344 | - Manage Data in Docker
1345 | - By default, data inside a Docker container won’t be persisted once the container no longer exists.
1346 | - You can copy data in and out of a container.
1347 | - Docker has two options for containers to store files on the host machine, so that the files are persisted even after the container stops.
1348 | - Docker volumes (Managed by Docker, /var/lib/docker/volume/)
1349 | - Bind mounts (Managed by the user, anywhere on the file system)
1350 | - different networking options
1351 | - host: every container uses the host network stack, which means all containers share the host's IP address; hence a port cannot be shared across containers
1352 | - bridge: containers can re-use the same port, as they have different IP addresses, and can each expose a port of their own mapped to a port of the host, making the containers somewhat visible from the outside.
1353 |
1354 | ### Dockerfile
1355 | - ```
1356 | FROM nginx:latest
1357 |
1358 | ENV WELCOME_STRING "nginx in Docker"
1359 |
1360 | WORKDIR /usr/share/nginx/html
1361 |
1362 | COPY ["./entrypoint.sh", "/"]
1363 |
1364 | RUN cp index.html index_backup.html; \
1365 | chmod +x /entrypoint.sh; \
1366 | apt-get update && apt-get install -qy vim
1367 | # above run at build time
1368 | # below run at start up
1369 |
1370 | ENTRYPOINT ["/entrypoint.sh"]
1371 | CMD ["nginx", "-g", "daemon off;"]
1372 | ```
1373 | - ENTRYPOINT
1374 | - ENTRYPOINT gets executed when the container starts. CMD specifies arguments that will be fed to the ENTRYPOINT.
1375 | - Unless it is overridden, ENTRYPOINT will always be executed.
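The `entrypoint.sh` copied by the Dockerfile above is not shown in these notes; a minimal hypothetical version following the common pattern (one-off setup, then `exec "$@"` so the CMD becomes the container's main process) can be simulated outside Docker:

```shell
# Hypothetical entrypoint.sh (the real file is not shown in the notes)
cat > entrypoint.sh <<'EOF'
#!/bin/sh
echo "starting: ${WELCOME_STRING:-nginx in Docker}"
exec "$@"
EOF
chmod +x entrypoint.sh

# Simulate what Docker does at startup: ENTRYPOINT runs with CMD as its arguments
WELCOME_STRING="nginx in Docker" ./entrypoint.sh echo "nginx -g 'daemon off;' would run here"
```

Because of `exec`, the CMD process replaces the shell, so in a real container it becomes PID 1 and receives signals directly.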
1376 |
1377 |
1378 |
1379 | ### What are Orchestration Tools?
1380 | - Container orchestration technologies provide a framework for integrating and managing containers **at scale**
1381 | - Goals/benefits
1382 | - Simplify container management process
1383 | - Help to manage availability and scaling of containers
1384 | - Features
1385 | - Networking
1386 | - Scaling
1387 | - Service discovery and load balancing
1388 | - Health check and self-healing
1389 | - Security
1390 | - Rolling update
1391 | - Tools
1392 | - Kubernetes and Hosted Kubernetes
1393 | - Docker SWARM / Docker Compose
1394 | - OpenShift
1395 |
1396 | ### Docker SWARM
1397 | - What is Docker SWARM (the correct name: Docker in SWARM mode)?
1398 | - It is a Docker orchestration tool.
1399 | - Why Docker SWARM?
1400 | - Hundreds of containers to manage?
1401 | - Scalability
1402 | - Self-healing
1403 | - Rolling updates
1404 | - Features
1405 | - Raft consensus group
1406 | - consists of internal distributed state store and all manager nodes.
1407 | - Internal Distributed State Store
1408 | - built-in key-value store of Docker Swarm mode.
1409 | - Manager Node
1410 | - It conducts orchestration and management tasks. Docker Swarm mode allows multiple manager nodes in a cluster. However, only one of the manager nodes can be selected as a leader.
1411 | - Worker Node
1412 | - receives and executes tasks directly from the manager node
1413 | - Node Availability
1414 | - In Docker Swarm mode, all nodes with ACTIVE availability can be assigned new tasks, even the manager node can assign itself new tasks (unless it is in DRAIN mode)
1415 | - Service
1416 | - consists of one or more replica tasks which are specified by users when first creating the service.
1417 | - Task
1418 | - A task in Docker Swarm mode refers to the combination of a single docker container and commands of how it will be run.
1419 |
1420 | ### past exam
1421 | - > [Sample Q1] Applications can be deployed across Clouds either through creation and deployment of virtual images (snapshots) or through scripting the installation and configuration of software applications.
1422 | - > Container based solutions such as Docker have advantages and disadvantages compared to traditional Cloud-based virtualization solutions based upon hypervisors. Discuss. [4]
1423 | - Guest OS
1424 | - VMs run on virtual hardware and have their own OS kernels, which introduces virtualization overhead. Containers let virtual instances share a single host OS, reducing these wasted resources.
1425 | - Communication
1426 | - VMs communicate through Ethernet devices. Containers communicate through IPC mechanisms (pipes, sockets).
1427 | - Security
1428 | - VM security depends on the hypervisor. Containers require close scrutiny.
1429 | - Performance
1430 | - VMs incur a small overhead when instructions are translated from guest to host OS. Containers have near-native performance.
1431 | - Isolation
1432 | - In VMs, file systems and libraries are not shared between guest and host OS. In containers, file systems and libraries can be shared.
1433 | - Startup time
1434 | - VM startup time is slow (minutes). Container startup time is fast (a few seconds).
1435 | - Storage
1436 | - VMs require a large amount of storage. Containers require little, and most of it is reusable.
1437 | - (Virtualization vs Containerization table above)
1438 | - > [sample Q6] A) What are container orchestration technologies? What are the main benefits of using container orchestration tools? Name two of the most popular Docker orchestration tools? [3]
1439 | - Container orchestration technologies provide a framework for integrating and managing containers **at scale**
1440 | - benefits
1441 | - Simplify container management process
1442 | - Help to manage availability and scaling of containers
1443 | - Docker orchestration tools
1444 | - Kubernetes
1445 | - Docker SWARM
1446 | - > [2017 Q4] d ii What is the relationship between a Docker Image and a Docker
1447 | Container? [1]
1448 | - Container is a process that behaves like an independent machine, it is a runtime instance of a docker image.
1449 | - Image is a blueprint for a container.
1450 |
1451 | - > [sample Q6] B) A researcher wants to attach to an already running Postgresql container and list all of the databases it contains. The command to list all of the database is psql -U postgres -c “\l”. The name of the container is postgres and it exposes the port 5432 to the host. Is the following command correct? If not, please correct it: docker exec -p 5432 --name postgres sh -c psql -U postgres -c “\l” [3]
1452 | - docker exec -t postgres sh -c "psql -U postgres -c \"\l\""
1453 | - docker exec -t postgres psql -U postgres -c "\l"
1454 |
1455 | - > [sample Q6] C) The following Docker compose file starts two Docker containers that are used to run a WordPress website. What are the equivalent Docker commands that could be used to start these two containers individually? [4]
1456 | ```
1457 | version: '3.6'
1458 |
1459 | services:
1460 |
1461 | wordpress:
1462 |
1463 | image: wordpress
1464 |
1465 | restart: always
1466 |
1467 | ports:
1468 |
1469 | - 8080:80
1470 |
1471 | environment:
1472 |
1473 | WORDPRESS_DB_HOST: database
1474 |
1475 | WORDPRESS_DB_USER: wordpress
1476 |
1477 | WORDPRESS_DB_PASSWORD: wordpress
1478 |
1479 | WORDPRESS_DB_NAME: wordpress
1480 |
1481 | database:
1482 |
1483 | image: mysql:5.7
1484 |
1485 | restart: always
1486 |
1487 | environment:
1488 |
1489 | MYSQL_DATABASE: wordpress
1490 |
1491 | MYSQL_USER: wordpress
1492 |
1493 | MYSQL_PASSWORD: wordpress
1494 |
1495 | MYSQL_ROOT_PASSWORD: P@ssw0rd
1496 |
1497 | volumes:
1498 |
1499 | - /data/mysql:/var/lib/mysql
1500 | ```
1501 | - ```
1502 | docker run -d --name wordpress -e WORDPRESS_DB_HOST=database \
1503 |   -e WORDPRESS_DB_USER=wordpress \
1504 |   -e WORDPRESS_DB_PASSWORD=wordpress \
1505 |   -e WORDPRESS_DB_NAME=wordpress \
1506 |   -p 8080:80 --restart always wordpress
1507 | ```
1508 | - ```
1509 | docker run -d --name database -e MYSQL_DATABASE=wordpress \
1510 |   -e MYSQL_USER=wordpress \
1511 |   -e MYSQL_PASSWORD=wordpress \
1512 |   -e MYSQL_ROOT_PASSWORD=P@ssw0rd \
1513 |   -v /data/mysql:/var/lib/mysql \
1514 |   --restart always mysql:5.7
1515 | ```
 - (naming the MySQL container `database`, and running both containers on a shared user-defined network, is needed for the WordPress container to resolve `WORDPRESS_DB_HOST=database`)
1516 |
1517 | ## Week 8.1 – Virtualisation
1518 | Terminology
1521 | |||
1522 | |---|---|
1523 | |Virtual Machine Monitor/Hypervisor|The virtualisation layer between the underlying hardware and the virtual machines / guest operating systems it supports. Gives the perception of a whole machine.
1524 | |Virtual Machine|A representation of a real machine using hardware/software that can host a guest operating system
1525 | |Guest Operating System|An operating system that runs in a virtual machine environment that would otherwise run directly on a separate physical system.
1526 | 1. What happens in a VM?
1528 | - Inside the virtual machine there are a Virtual Network Device and a virtual disk: VHD (Virtual Hard Disk), VMDK (Virtual Machine Disk), or qcow2 (QEMU Copy on Write)
1529 | - Guest OS apps “think” they write to hard disk but translated to virtualised host hard drive by VMM
1530 | - Which one is determined by image that is launched
1531 |
1532 | 2. Motivation (why we want VM/virtualization/advantages) ~~& History~~
1533 | |motivation||
1534 | |---|---|
1535 | |Server Consolidation|1. Increased utilisation
2. Reduced energy consumption
1536 | |Personal virtual machines can be created on demand|1. No hardware purchase needed
2. Public cloud computing - no lock-in to a provider such as Amazon
1537 | |Security/Isolation|Share a single machine with multiple users - you don't want everyone to see what you are doing
1538 | |Hardware independence|Relocate to different hardware
1539 | - originally, virtual machine = an efficient, isolated duplicate of the real machine
1540 | - Properties of interest (can also be thought as motivation):
1541 | - Fidelity
1542 | - Software on the VMM executes behaviour identical to that demonstrated when running on the machine directly, barring timing effects
1543 | - Performance
1544 | - An overwhelming majority of guest instructions executed by hardware without VMM intervention
1545 | - Safety
1546 | - The VMM manages all hardware resources
1547 | - history see lecture 8.1 slide 7
1548 | 3. Classification of Instructions
1549 | ||||
1550 | |---|---|---|
1551 | |Privileged Instructions|instructions that trap if the processor is in user mode and do not trap in kernel mode
1552 | |Sensitive Instructions|instructions whose behaviour depends on the mode or configuration of the hardware|Different behaviours depending on whether in user or kernel mode
- e.g. the POPF instruction (for interrupt flag handling)
1553 | |Innocuous Instructions|instructions that are neither privileged nor sensitive|Read data, add numbers etc
1554 | - Popek and Goldberg Theorem
1555 | - For any conventional third generation computer, a virtual machine monitor may be constructed if the set of **sensitive instructions** for that computer is a subset of the set of **privileged instructions** i.e. have to be trappable
1556 | - x86 architecture was historically not virtualisable, due to **sensitive instructions that could not be trapped**
1557 | - Intel and AMD introduced extensions to make x86 virtualisable
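The theorem's condition is just a subset test; as a sketch (instruction names are illustrative, not a complete instruction set):

```python
def virtualisable(sensitive: set, privileged: set) -> bool:
    # Classically virtualisable via trap-and-emulate iff every sensitive
    # instruction also traps in user mode, i.e. is privileged.
    return sensitive <= privileged

# Classic x86 failed the test: e.g. POPF silently ignores the interrupt
# flag in user mode instead of trapping, so it is sensitive but not privileged.
print(virtualisable({"POPF", "SGDT"}, {"LGDT", "HLT"}))          # False
print(virtualisable({"POPF", "SGDT"}, {"POPF", "SGDT", "LGDT"}))  # True
```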
1558 | 3. What are the requirements for virtualisation?
1560 | |Typical Virtualisation Strategy||Achieved by|problem|
1561 | |---|---|---|---|
1562 | |De-privileging (trap-and-emulate)|trap-and-emulate: VMM emulates the effect on system/hardware resources of privileged instructions whose execution traps into the VMM|running GuestOS at a lower hardware priority level than the VMM|Problematic on some architectures where privileged instructions do not trap when executed at de-privileged level
1563 | |Primary/shadow structures|1. VMM maintains “shadow” copies of critical structures whose “primary” versions are manipulated by the GuestOS, e.g. memory page tables
2. Primary copies needed to ensure correct versions are visible to GuestOS
1564 | |Memory traces|Controlling access to memory so that the shadow and primary structure remain coherent|write-protect primary copies so that update operations cause page faults which can be caught, interpreted, and addressed
- so that someone else's app/code doesn't crash the server you are using!
1565 | - Do sensitive instructions and privileged instructions both need to be trap-and-emulate?
1566 | - All sensitive/privileged instructions have to be dealt with. Some will need to be emulated/translated
1567 | - others can just happen depending on the mode and/or whether para-virtualisation is supported.
1568 | - (Popek and Goldberg Theorem above and sth below)
1569 | 4. Virtualisation approaches (compare with each other pair wise 1 v.s. 2, ...)
1570 | |Aspects of VMMs|What is it?|e.g.|Advantages|Disadvantages|
1571 | |---|---|---|---|---|
1572 | |Full virtualisation|allow an unmodified guest OS to run in isolation by simulating full hardware
- Guest OS has no idea it is not on physical machine|VMWare|1. Guest is unaware it is executing within a VM
2. Guest OS need not be modified
3. No hardware or OS assistance required
4. Can run legacy OS|1. can be less efficient|
1573 | |Para-virtualisation|- VMM/Hypervisor exposes special interface to guest OS for better performance. Requires a modified/hypervisor-aware Guest OS
- Can optimise systems to use this interface since not all instructions need to be trapped/dealt with because "VMM emulates the effect on system/hardware resources of privileged instructions whose execution traps into the VMM"|Xen|1. Lower virtualisation overheads, so better performance|1. Need to modify guest OS - Can’t run arbitrary OS!
2. Less portable
3. Less compatibility
1574 | |Hardware-assisted virtualisation|Hardware provides architectural support for running a Hypervisor
- New processors typically have this
- Requires that all sensitive instructions trappable|KVM|1. Good performance
2. Easier to implement
3. Advanced implementation supports hardware assisted DMA, memory virtualisation|1. Needs hardware support
1575 | |software virtualization|Any virtualisation that does not involve hardware support.|
1576 | |Binary Translation|Trap and execute occurs by scanning guest instruction stream and replacing sensitive instructions with emulated code
- Don’t need hardware support, but can be much harder to achieve|VMWare|1. Guest OS need not be modified
2. No hardware or OS assistance required
3. Can run legacy OS|1. Overheads
2. Complicated
3. Need to replace instructions “on-the-fly”
4. Library support to help this, e.g. vCUDA
1577 | |Bare Metal Hypervisor|VMM runs directly on actual hardware
- Boots up and runs on actual physical machine
- VMM has to support device drivers, all HW mgt |VMWare ESX Server
1578 | |Hosted Virtualisation|VMM runs on top of another operating system|VMWare Workstation
1579 | |Operating System Level Virtualisation|1. Lightweight VMs
2. Instead of whole-system virtualisation, the OS creates mini-containers|Docker|1. Lightweight
2. Many more VMs on same hardware
3. Can be used to package applications and all OS dependencies into container|1. Can only run apps designed for the same OS
2. Cannot host a different guest OS
3. Can only use native file systems
4. Uses same resources as other containers|
1580 | |Memory Virtualisation|VMM maintains shadow page tables in lock-step with the page tables.
detail see below section|||1. Adds additional management overhead
1581 | - (ring diagrams) for Full virtualisation and Binary Translation
1582 | - but in each case there can be some differences in rings for each service, see lecture 8.1 slides 15, 19
1583 | - (ring diagrams) for Para-virtualisation and Hardware-assisted virtualisation
1584 | - they differ in the ring 0 service, see lecture 8.1 slides 16, 18
1585 | - New Ring -1 for VMM supported Page tables, virtual memory mgt, direct memory access for high speed reads etc
1585 | - New Ring -1 for VMM supported Page tables, virtual memory mgt, direct memory access for high speed reads etc
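The binary-translation approach above can be sketched as a toy model (real translators rewrite machine code on the fly, not mnemonics): scan the guest instruction stream and replace sensitive instructions with calls into the VMM's emulation code, while innocuous instructions run natively.

```python
SENSITIVE = {"POPF", "SGDT"}  # illustrative set of sensitive instructions

def translate(instructions):
    """Replace sensitive instructions with VMM emulation calls."""
    out = []
    for ins in instructions:
        if ins in SENSITIVE:
            out.append(f"CALL vmm_emulate_{ins}")  # emulated by the VMM
        else:
            out.append(ins)                        # innocuous: run as-is
    return out

print(translate(["MOV", "POPF", "ADD"]))
# ['MOV', 'CALL vmm_emulate_POPF', 'ADD']
```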
1586 |
1587 | 5. Memory Virtualisation
1590 | - In the conventional case, page tables store the logical page number to physical page number mappings
1593 | - In the VMM case, the VMM maintains shadow page tables in lock-step with the guest page tables. Additional management overhead is added.
1594 | - Shadow Page Tables
1595 | - VMM maintains shadow page tables in lock-step with the page tables
1597 | - (in the lecture figure, one OS is shown in blue, the other in green)
1598 | - Disadv: Adds additional management overhead
1599 | - Hardware performs guest -> physical and physical -> machine translation
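The two mappings the VMM composes can be sketched as follows (all table contents here are made-up illustrations): the shadow page table caches the composed guest-virtual to machine mapping so the hardware MMU can use it directly.

```python
# guest-virtual page -> guest-physical page (maintained by the guest OS)
guest_page_table = {0x1: 0x10, 0x2: 0x11}
# guest-physical page -> machine frame (maintained by the VMM)
vmm_page_table = {0x10: 0xA0, 0x11: 0xA7}

# The shadow page table is the composition of the two, kept in lock-step
# with the guest's table via memory traces (write-protected primary copies).
shadow_page_table = {gv: vmm_page_table[gp] for gv, gp in guest_page_table.items()}

print(shadow_page_table)  # {1: 160, 2: 167}
```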
1600 | 6. Live migration
1601 | - maintaining continuity of service while data are moved
1602 | - Live Migration from a Virtualisation Perspective / live migration of virtual machines
1603 | - (see the migration-steps figure in the lecture slides)
1604 |
1605 | ### past exam
1606 | - > [2014 Q7] A) Define the following terms and their relevance to Cloud Computing:
1607 | - > a. Hypervisor [1]
1608 | - above
1609 | - > b. Virtual machine [1]
1610 | - A representation of a real machine using hardware/software that can host a guest operating system
1611 | - > c. Machine image [1]
1612 | - is a Compute Engine resource that stores all the configuration, metadata, permissions, and data from one or more disks required to create a virtual machine (VM) instance.
1613 | - > d. Object Store [1]
1614 | - is a strategy that manages and manipulates data storage as distinct units
1615 | - > e. Volume Store [1]
1616 | - Store = Storage
1617 | - Volume storage is the virtual equivalent of a USB drive. A USB drive retains your data, whether it is plugged in or not. Manipulating the data on a USB drive requires that it is plugged into a computer and that it is mounted by the operating system. Your USB drive can be unplugged and plugged into another (newer, bigger, better) computer, but your USB drive can only ever be plugged in to one computer at a time.
1618 | - Equivalently a volume in your Nectar project can retain your data, whether it is attached to an instance or not. Manipulating the data on the volume requires that is attached to an instance, and that the file systems is mounted by the operating system. Your volume can be detached and attached to another (newer, bigger, better) instance, but your volume can only ever be attached to one instance at a time.
1619 | - > f. Key-pair [1]
1620 | - A key pair consists of a private key and a public key.
1621 | - > [2013 Q5] A) Explain what is meant by the following terms:
1622 | - > Virtual Machine Monitor/Hypervisor [1]
1623 | - is a technology that provides virtualization via a virtualisation layer between the underlying hardware and the virtual machines / guest operating systems it supports.
1624 | - > Full virtualization [1]
1625 | - allow an unmodified guest OS to run in isolation by simulating full hardware
1626 | - Guest OS has no idea it is not on physical machine
1627 | - > Para-virtualization [1]
1628 | - VMM/Hypervisor exposes special interface to guest OS for better performance. Requires a modified/hypervisor-aware Guest OS
1629 | - Can optimise systems to use this interface since not all instructions need to be trapped/dealt with
1630 | - > Shadow page tables [1]
1631 | - The VMM (virtual machine monitor) maintains shadow page tables: each virtual machine believes it is managing its own address spaces and page tables, but it is not. The VMM keeps a logical mapping from the guest's page tables to real machine memory, and every guest memory update is reflected, indirectly, in the shadow page table.
1632 | - > Explain how hardware virtualization and software virtualization can differ in their treatment of shadow page tables. [2]
1633 |
1635 | - Main issue is that the hardware does a lot of the management of shadow page tables and hence is faster but needs all calls to be trappable by hardware. Doing it via software virtualisation requires sensitive calls to be trapped and handled by the VMM which is slower.
1636 | - The VMM needs to keep shadow page tables synchronised with guest page tables. Adding para-virtualisation on top of hardware virtualization (which maintains the shadow page tables) can improve things from a performance perspective.
1637 | - > Explain the advantages and disadvantages of virtual machines. [2]
1638 | - adv
1639 | - reuse hardware and run multiple different OSs on the same physical system
1640 | - disadv
1641 | - performance overhead
1642 | - privacy and security issues
1643 | - slow VM startup time
1644 | - > [2017 Q7 C [3]] Describe the typical steps that are required to support live migration of virtual machine instances using a Cloud facility such as the NeCTAR Research Cloud. [2]
1645 | - picture above
1646 | - > [2016 Q5] A) Popek and Goldberg laid down the foundations for computer virtualization in their 1974 paper, Formal Requirements for Third Generation Architectures.
1647 | - > a. Identify and explain the different types of classification of instruction sets for virtualization to occur according to the theorem of Popek and Goldberg. You should include the relationships between the instruction sets. [3]
1648 | - |||
1649 | |---|---|
1650 | |Privileged Instructions|instructions that trap if the processor is in user mode and do not trap in kernel mode
1651 | |Sensitive Instructions|instructions whose behaviour depends on the mode or configuration of the hardware|
1652 | |Innocuous Instructions|instructions that are neither privileged nor sensitive|
1653 | - relation = subset
1654 | - For any conventional third generation computer, a virtual machine monitor may be constructed if the set of **sensitive instructions** for that computer is a subset of the set of **privileged instructions** i.e. have to be trappable
1655 | - innocuous instructions do not need to be trapped and dealt with and hence can be considered separately.
1656 | - > b. Describe how these principles are realized by modern virtual machine monitors/hypervisors. [2]
1657 | - |Typical Virtualisation Strategy||
1658 | |---|---|
1659 | |De-privileging (trap-and-emulate)|trap-and-emulate: VMM emulates the effect on system/hardware resources of privileged instructions whose execution traps into the VMM|
1660 | |Primary/shadow structures|1. VMM maintains “shadow” copies of critical structures whose “primary” versions are manipulated by the GuestOS, e.g. memory page tables
2. Primary copies needed to ensure correct versions are visible to GuestOS
1661 | |Memory traces|Controlling access to memory so that the shadow and primary structure remain coherent
1662 | - sensitive/privileged instructions (calls) have to be trapped and dealt with
1663 | - > c. Explain the differences between full virtualization and para-virtualisation. Give an example of a hypervisor that uses full virtualization and an example of a hypervisor that uses paravirtualisation. [2]
1664 | - full virtualization
1665 | - allow an unmodified guest OS to run in isolation by simulating full hardware
1666 | - VMWare
1667 | - para-virtualisation
1668 | - VMM/Hypervisor exposes special interface to a modified/hypervisor-aware Guest OS for better performance.
1669 | - Xen
1670 | - > d. Describe the role of a virtual machine manager/hypervisor with regards to memory management and shadow page tables. [3]
1671 | - VMM maintains shadow page tables in lock-step with the page tables. Additional management overhead is added.
1672 | - VMM emulates the effect on system/hardware resources of privileged instructions whose execution traps into the VMM
1673 | - VMM maintains “shadow” copies of critical structures whose “primary” versions are manipulated by the GuestOS
1674 | - Control access to memory so that the shadow and primary structure remain coherent
1675 |
1676 |
1677 | ## Week 8.2 – OpenStack & Comparing and Contrasting AWS with NeCTAR Cloud
1678 | - Offers free and open-source software platform for cloud computing for **IaaS**
1679 | - Consists of interrelated components (services) that control / support compute, storage, and networking resources
1680 | - Often used through web-based dashboards, through command-line tools, or programmatically through ReSTful APIs
1681 |
1682 | ### Openstack architecture
1684 |
1685 | - As a user, you log in through Horizon to get access to the cloud and obtain an identity
1686 | - the identity is passed among the components
1687 | - operations are restricted based on your identity and the resources available
1688 | - e.g. you cannot reformat another user's disk
1689 | - launch a server/instance
1690 | - the identity is passed among the components
1691 | - operations are restricted based on your identity and the resources available
1692 | - use pre-existing instance, use Ubuntu via Glance's image service
1693 | - attach some storage
1694 | - object storage via Swift or
1695 | - block storage via Cinder
1696 | - setup firewall, ssh port via Neutron's Networking Services
1697 | - **Identity works as a glue among components**
1698 |
1699 | ### Typically asynchronous queuing systems used (AMQP)
1700 | -
1701 |
1702 | - AMQP: a queueing service used for load balancing when large numbers of requests arrive
1703 | - e.g. an instance-creation request is queued and starts when a slot is released
1704 |
1705 | ### Key Services
1706 | #### Keystone -- Identity Service
1707 | - Provides an authentication and authorization service for OpenStack services
1708 | - Tracks users/permissions
1709 | - Provides a catalog of endpoints for all OpenStack services
1710 | - Each service registered during install
1711 | - Know where they are and who can do what with them
1712 | - Project membership
1713 | - firewall rules
1714 | - Generic authorization system
1715 | - More refer to week10 ==TODO==
1716 | #### Nova -- Compute Service
1717 | - Manages the lifecycle of compute instances in an OpenStack environment
1718 | - Responsibilities for virtual machines on demand, include
1719 | - spawning
1720 | - scheduling
1721 | - Decommissioning
1722 | - Virtualisation agnostic
1723 | - A key point of its success, as it allows OpenStack to work with **any kind** of virtualisation solution, including
1724 | - XenAPI, Hyper-V, VMWare ESX
1725 | - Docker
1726 | - You are not bound to any specific solution
1727 | - ==(The following not covered in detail in the lecture)==
1728 | - API
1729 | - Nova-api
1730 | - Accepts/responds to end user API calls
1731 | - Supports openStack Compute & EC2 & admin APIs
1732 | - Compute Core
1733 | - Nova-compute
1734 | - Daemon that creates/terminates VMs through hypervisor APIs
1735 | - Nova-scheduler
1736 | - schedules VM instance requests from queue and determines which server host to run
1737 | - Nova-conductor
1738 | - Mediates interactions between compute services and other components, e.g. image database
1739 | - Networking
1740 | - Nova-network
1741 | - Accepts network tasks from queue and manipulates network, e.g. changing IP table rules
1743 |
1744 | - I need a VM with: 64Gb memory, 8vCPUs, in Melbourne, running Ubuntu 12.04
1745 | - The call comes in through the load balancer and is buffered in the queue
1746 | - nova-api Accepts/responds to end user API calls
1747 | - Nova-scheduler schedules VM instance requests from queue and determines which server host to run
1748 | - Nova-conductor mediates interactions between compute services and other components, e.g. image database
1749 | #### Swift - Object Storage
1750 | - Stores and retrieves arbitrary unstructured data objects via ReSTful API
1751 | - e.g.: VM images and data
1752 | - This service can be used to access arbitrary unstructured data
1753 | - Fault tolerant with data replication and scale-out architecture
1754 | - Available from anywhere; persists until deleted
1755 | - Allows to write objects and files to multiple drives, ensuring the data is replicated across a server cluster
1756 | - Can be used with/without **Nova**
1757 | - Client/admin support
1758 | - Swift client allows users to submit commands to ReST API through command line clients to configure/connect object storage to VMs
1759 | #### Cinder -- Block Storage
1760 | - Provides persistent block storage to virtual machines (instances) and supports creation and management of block storage devices
1761 | - Cinder access associated with a VM
1762 | - Cinder-api
1763 | - routes requests to cinder-volume
1764 | - Cinder-volume
1765 | - interacts with block storage service and scheduler to read/write requests; can interact with multiple flavours of storage (flexible driver architecture)
1766 | - Cinder-scheduler
1767 | - selects optimal storage provider node to create volumes (ala nova-scheduler)
1768 | - Cinder-backup
1769 | - provides backups of any type of volume to a backup storage provider
1770 | #### Glance -- Image Service
1771 | - Accepts requests for disk or server images and their associated metadata (from **Swift**) and retrieves / installs (through **Nova**)
1772 | - The image data is stored in **Swift**, but the image is found and retrieved through **Glance**
1773 | - API
1774 | - Glance-api
1775 | - Image discovery, retrieval and storage requests
1776 | - Glance-registry
1777 | - Stores, processes and retrieves metadata about images, e.g. size and type
1778 | - Ubuntu 14.04
1779 | - My last good snapshot
1780 | - I (the owner) can control who can access the snapshot using **Keystone**
1781 | #### Neutron -- Networking Services
1782 | - Supports networking of OpenStack services
1783 | - subnet
1784 | - Network in and out
1785 | - Network security group
1786 | - Offers an API for users to define networks and the attachments into them
1787 | - e.g.:
1788 | - switches
1789 | - routers
1790 | - Pluggable architecture that supports multiple networking vendors and technologies
1791 | - Neutron-server
1792 | - accepts and routes API requests to appropriate plug-ins for action
1793 | - Port management, e.g. default SSH, VM-specific rules, ...
1794 | - More broadly configuration of availability zone networking, e.g. subnets, DHCP, ...
1795 | #### Horizon -- Dashboard Service
1796 | - Provides a web-based self-service portal to interact with underlying OpenStack services, such as
1797 | 1. launching an instance
1798 | 2. assigning IP addresses
1799 | 3. configuring access controls
1800 | - Based on Python/Django web application
1801 | - Requires Nova, Keystone, Glance, Neutron
1802 | - Other services optional...
1803 | #### Trove -- Database Service
1804 | - Provides scalable and reliable Cloud database (DBaaS) functionality for both relational and non-relational database engines
1805 | - Benefits
1806 | - Resource isolation
1807 | - high performance
1808 | - automates deployment
1809 | - config
1810 | - patching
1811 | - backups
1812 | - restores
1813 | - monitoring
1814 | - ...
1815 | - Use image service for each DB type and trove-manage to offer them to tenants/user communities
1816 | #### Sahara -- Data Processing Service
1817 | - Provides capabilities to provision and scale Hadoop clusters in OpenStack by specifying parameters such as Hadoop version, cluster topology and node hardware details
1818 | - User fills in details and Sahara supports the automated deployment of infrastructure with support for addition/removal of worker nodes on demand
1819 | #### Heat -- Orchestration Service
1820 | - Template-driven service to manage the lifecycle of applications deployed on OpenStack
1821 | - Stack
1822 | - Another name for the template, and the procedure of creating the infrastructure and required resources from the template file
1823 | - Can be integrated with automation tools such as Chef, Puppet, and Ansible
1826 | - Heat details
1827 | - heat_template_version: specifies which version of the Heat template language the template was written for (optional)
1828 | - Description: describes the intent of the template to a human audience (optional)
1829 | - Parameters: the arguments that the user might be required to provide (optional)
1830 | - Resources: the specifications of resources that are to be created (mandatory)
1831 | - Outputs: any expected values that are to be returned once the template has been processed (optional)
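- The five sections above can be illustrated with a minimal Heat template that launches a single instance (a sketch only; the image and flavour values are placeholders):

```yaml
heat_template_version: 2015-10-15

description: Launch a single instance (illustrative; values are placeholders)

parameters:
  key_name:
    type: string
    description: Name of an existing key pair

resources:
  my_instance:
    type: OS::Nova::Server
    properties:
      key_name: { get_param: key_name }
      image: ubuntu-18.04   # placeholder image name
      flavor: m1.small      # placeholder flavour

outputs:
  instance_ip:
    description: IP address of the created instance
    value: { get_attr: [my_instance, first_address] }
```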
1832 |
1833 | ### Creating Stacks in MRC/NeCTAR
1834 | 1. Create the template file according to your requirements
1835 | 2. Provide environment details (name of key file, image id, etc)
1836 | 3. Select a name for your stack and confirm the parameters
1837 | 4. Make sure rollback checkbox is marked, so if anything goes wrong, all partially created resources get dumped too
1838 | 5. Wait for the magic to happen!
1839 |
1840 | ### past exam
1841 | - > [sample Q3 A] The NeCTAR Research Cloud is based on the OpenStack technology. Describe the role and features of the following OpenStack components:
1842 | - Nova [1] (one of below)
1843 | - Manages the lifecycle of compute instances in an OpenStack environment
1844 | - Responsibilities for virtual machines on demand, include spawning, scheduling and decommissioning
1845 | - Horizon [1]
1846 | - Provides a web-based self-service portal to interact with underlying OpenStack services, such as launching an instance, assigning IP addresses and configuring access controls.
1847 | - Heat [1]
1848 | - Template-driven service to manage lifecycle of applications deployed on Openstack
1849 | - Glance [1]
1850 | - Accepts requests for disk or server images and their associated metadata (from **Swift**) and retrieves / installs (through **Nova**)
1851 | - Swift [1]
1852 | - Stores and retrieves arbitrary unstructured data objects via ReSTful API, e.g.: VM images and data
1853 | - Keystone [1]
1854 | - Provides an authentication and authorization service for OpenStack services
1855 | - Tracks users/permissions
1856 | - Provides a catalog of endpoints for all OpenStack services
1857 | - Neutron [1]
1858 | - Supports networking of OpenStack services
1859 | - Offers an API for users to define networks and the attachments into them, e.g.: switches, routers
1860 | - Port management, e.g. default SSH, VM-specific rules, ...
1861 | - > [sample Q3 B] Describe the interplay between these components that allows a researcher to create an instance of a virtual machine through a pre-existing snapshot from a non-public NeCTAR Cloud image, e.g. a snapshot created by a user. [3]
1862 | - Authenticate via Keystone: provide the unimelb id and password for MRC. The Keystone identity enables us to use other components in the system, so the system knows it is us using them.
1863 | - A daemon creates/terminates VMs through hypervisor APIs via Nova-compute
1864 | - schedules VM instance requests from queue and determines which server host to run via Nova-scheduler
1865 | - Mediates interactions between compute services and other components, e.g. image database via Nova-conductor
1866 | - looking up the resources required via Swift/Glance
1867 | - preparing the VM on the required machine
1868 | - > Describe the approach that would be taken using the openStack Heat service for deployment of SaaS solutions onto the Cloud. [2]
1869 | 1. Create the template file according to your requirements
1870 | 2. Provide environment details (name of key file, image id, etc)
1871 | 3. Select a name for your stack and confirm the parameters
1872 | 4. Make sure rollback checkbox is marked, so if anything goes wrong, all partially created resources get dumped too
1873 | 5. Wait for the magic to happen!
1874 |
1875 |
1876 | ## Week 8.3 - Serverless (Function as a Service (FaaS))
1877 | 1. Why Functions?
1878 | - A function in computer science is typically a piece of code that takes in parameters and returns a value
1879 | - Functions are the founding concept of functional programming - one of the oldest programming paradigms
1880 | - Why are they used in FaaS?
1881 | - Functions in serverless computing are:
1882 | - free of side-effects,
1883 | - What is it?
1884 | - A function that does not modify the state of the system
1885 | - e.g.: a function that takes an image and returns a thumbnail of that image
1886 | - A function that changes the system somehow is not side-effect free
1887 | - e.g.: a function that writes to the file system the thumbnail of an image
1888 | - How does being side-effect free benefit parallel execution?
1889 | - Side-effect free functions can be run in parallel, and are guaranteed to return the same output given the same input
1890 | - How does it affect FaaS?
1891 | - Side-effects are almost inevitable in a relatively complex system.
Therefore, consideration must be given to how to make functions with side effects run in parallel, as typically required in FaaS environments.
1893 | - ephemeral,
1894 | - Synchronous/Asynchronous Functions
1895 | - Relationship to FaaS
1896 | - By default functions in FaaS are synchronous, hence they return their result immediately
1897 | - However, there may be functions that take longer to return a result; they incur timeouts and lock connections with clients in the process, so it is better to transform them into asynchronous functions
1898 | - a publish/subscribe pattern involving a queuing system can be used to deal with asynchronous functions
1899 | - How do they work?
1900 | |Function|How|
1901 | |---|---|
1902 | |Synchronous functions|return their result immediately
1903 | |Asynchronous functions|return a code that informs the client that the execution has started, and then trigger an event when the execution completes
1904 | - stateless,
1905 | - What is it?
1906 | - A subset of functions with side-effects is composed of stateful functions
1907 | |Type|What is it?|e.g.|
1908 | |---|---|---|
1909 | |stateful function|is one whose output changes in relation to internally stored information (hence its input cannot entirely predict its output)| a function that adds items to a "shopping cart" and retains that information internally
1910 | |stateless function|is one that does not store information internally|adding an item to a "shopping cart" stored in a DBMS service and not internally would make the function above stateless, but not side-effect free.
1911 | - Why is it important for FaaS?
1912 | - Because there are multiple instances of the same function, and there is no guarantee the same user would call the same function instance twice.
1913 | - which make them ideal for
1914 | - parallel execution and
1915 | - rapid scaling-up and -down
1916 | - Functions are free of side-effects, ephemeral, and stateless, which make them ideal for parallel execution and rapid scaling-up and -down, hence their use in FaaS
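- The properties above can be contrasted in a short sketch (plain Python, purely illustrative — the function names are made up):

```python
# A side-effect-free, stateless function: same input always yields the same
# output, and no internal or external state is touched. Ideal for FaaS.
def thumbnail_size(width, height, max_side=128):
    scale = max_side / max(width, height)
    return (round(width * scale), round(height * scale))

# A stateful counterpart: its result depends on internally stored information,
# so two calls with the same input can return different outputs.
class Cart:
    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)
        return len(self.items)  # changes across calls with the same input

print(thumbnail_size(1024, 768))   # (128, 96)
cart = Cart()
print(cart.add("book"), cart.add("book"))  # 1 2
```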
1917 | 2. Function & FaaS
1918 | - Side effects
1919 | - stateful & stateless
1920 | - Synchronous/Asynchronous Functions
1921 | 3. What is Function as a Service (FaaS)?
1922 | - FaaS is also known as Serverless computing
1923 | - FaaS is an extreme form of microservice architecture
1924 | - The idea behind Serverless/FaaS is to develop software applications without bothering with the infrastructure (especially scaling-up and down as load increases or decreases). Therefore, it is more Server-unseen than Server-less
1925 | - What does it do?
1926 | - A FaaS service allows functions to be added, removed, updated, executed, and auto-scaled
1927 | 4. Why do we need FaaS?
1928 | |Reason|How|
1929 | |---|---|
1930 | |Simpler deployment|the service provider takes care of the infrastructure
1931 | |Reduced computing costs|only the time during which functions are executed is billed
1932 | |Reduced application complexity|due to loosely-coupled architecture
1933 | 5. FaaS application
1934 | - Functions are triggered by events
1935 | - Functions can call each other
1936 | - Functions and events can be combined to build software applications
1937 | - Combining event-driven scenarios and functions resembles how User Interface software is built: user actions trigger the execution of pieces of code
1938 | - E.g.: FaaS Services and Frameworks
1939 | - Amazon’s AWS Lambda
1940 | - Google Cloud Functions
1941 | - Azure Functions by Microsoft
1942 | - proprietary FaaS services v.s. open-source FaaS frameworks
1943 | - open-source FaaS frameworks can be deployed on your cluster, peered into, disassembled, and improved by you.
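- The event-driven pattern described above can be sketched in a few lines of Python (illustrative only; real FaaS platforms wire events to functions through their own gateways):

```python
# Functions register for events; triggering an event runs every function
# bound to it, mimicking how FaaS combines events and functions.
handlers = {}

def on(event):
    def register(fn):
        handlers.setdefault(event, []).append(fn)
        return fn
    return register

def trigger(event, payload):
    return [fn(payload) for fn in handlers.get(event, [])]

@on("image.uploaded")
def make_thumbnail(name):
    return f"thumbnail of {name}"

@on("image.uploaded")
def notify_owner(name):
    return f"notified owner about {name}"

print(trigger("image.uploaded", "cat.jpg"))
```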
1944 |
1945 | ### past exam
1946 | - > [sample Q7] A) In the context of Cloud, what is meant by serverless computing? [1]
1947 | - A way of developing applications as **collections of functions** that are deployed on a computing infrastructure without the need to manage it.
1948 | - > [sample Q7] B) List three reasons why it may be beneficial to choose a serverless solution. [3]
1949 | - |Reason|How|
1950 | |---|---|
1951 | |Simpler deployment|the service provider takes care of the infrastructure
1952 | |Reduced computing costs|only the time during which functions are executed is billed
1953 | |Reduced application complexity|due to loosely-coupled architecture
1954 | - "Why we need Faas?" above
1955 | - > [sample Q7] C) Discuss the role of functions in serverless computing. Your answer should include key properties of functions that make them suitable for serverless environments. [3]
1956 | - Serverless applications are composed of functions
1957 | - key properties of functions that make them suitable for serverless environments:
1958 | - Functions in serverless computing are:
1959 | - free of side-effects
1960 | - ephemeral
1961 | - stateless
1962 | - which make them ideal for
1963 | - parallel execution and
1964 | - rapid scale up and scale down
1965 | - Functions are triggered by events
1966 | - Functions can call each other
1967 |
1968 | ## Workshop week8: OpenFaaS
1969 | ### Properties
1970 | - Functions are passed a request as an object in the language of choice and return a response as an object
1971 | - OpenFaaS \& container
1972 | - Open-source framework that uses Docker containers to deliver FaaS functionality
1973 | - role of container technologies and their relationship with functions:
1974 | - Every function in OpenFaaS is a Docker container, ensuring loose coupling between functions
1975 | - Function can be written in different languages and mixed freely
1976 | - OpenFaaS can use either Docker Swarm or Kubernetes to manage cluster of nodes on which functions run
1977 | - By using Docker containers as functions, OpenFaaS allows different languages and environments to be mixed freely, at the cost of **decreased performance**, as containers are inherently heavier than threads
1978 | - However, it is possible to reduce a container image to only a few MBs
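- For instance, a function in OpenFaaS's Python template is just a module exposing a `handle` function that receives the request body and returns the response (a minimal sketch):

```python
# handler.py — minimal OpenFaaS-style Python handler.
# The platform passes the request body in and sends the return value back.
def handle(req):
    """Echo the request back, upper-cased."""
    return req.upper()

print(handle("hello openfaas"))  # HELLO OPENFAAS
```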
1979 | ### Auto-scalability and OpenFaaS
1980 | - OpenFaaS can add more Docker containers when a function is called more often, and remove containers when the function is called less often
1981 | - The scaling-up (and down) of functions can be tied to memory or CPU utilization as well (currently only on Kubernetes-managed clusters though)
1982 |
1983 | ### past exam
1984 | - > [sample Q7] D) OpenFaaS is an open source framework that can be used to deliver serverless computing solutions. Discuss the role of container technologies such as Docker in OpenFaaS and their relationship with functions and how they might be used to support auto-scaling. [3]
1985 | - Every function in OpenFaaS is a Docker container, ensuring loose coupling between functions
1986 | - When the load increases, OpenFaaS add more container executing the same function.
1987 | - When the load decreases, OpenFaaS remove containers for the function is called less often.
1988 |
1989 | ## Week 9 - Big Data Analytics
1990 | 1. Why do we need it?
1991 | - There would not be much point in amassing vast amount of data without being able to analyse it, hence the blossoming of large-scale business intelligence and more complex machine learning algorithms.
1992 | - There is overlap among business intelligence, machine learning, statistics and data mining
1993 | - For clarity, just use the umbrella term: big data analytics
1994 | 2. What is it?
1995 | - There is a good deal of overlap and confusion among terms such as business intelligence, machine learning, statistics, and data mining. For the sake of clarity, we just use the more general term (big) data analytics
1996 | 3. Examples of Analytics
1997 | ||
1998 | |---|
1999 | |Full-text searching
2000 | |Aggregation of data
2001 | |Clustering
2002 | |Sentiment analysis
2003 | |Recommendations
2004 | 4. Challenges of Big Data Analytics
2005 | - A framework for analysing big data has to distribute both data and processing over many nodes, which implies:
2006 | |imply||
2007 | |---|---|
2008 | |Reading and writing distributed datasets
2009 | |Preserving data in the presence of failing data nodes
2010 | |Supporting the execution of MapReduce tasks
2011 | |Being fault-tolerant|a few failing compute nodes may slow down the processing, but not stop it
2012 | |Coordinating the execution of tasks across a cluster
2013 | 5. Tools for Analytics:
2014 | - Apache Hadoop
2015 | - Apache Spark
2016 | ### Apache Hadoop
2017 | 1. How does it work?
2018 | - Apache Hadoop started as a way to distribute files over a cluster and execute MapReduce tasks, but many tools have now been built on that foundation to add further functionality
2019 | 2. Components
2020 | - Hadoop Distributed File System (HDFS)
2021 | - Hadoop Resource Manager (YARN)
2022 | 3. Hadoop Distributed File System (HDFS)
2023 | - What is it?
2024 | - The core of Hadoop is a fault tolerant file system that has been explicitly designed to span many nodes
2025 | - HDFS blocks v.s. blocks
2026 | - HDFS blocks are much larger than blocks used by an ordinary file system (say, 4 KB versus 128MB)
2027 | - Why?
2028 | ||How achieve it?|
2029 | |---|---|
2030 | |Reduced need for memory to store information about where the blocks are|metadata
2031 | |More efficient use of the network|with a large block, a reduced number of network connections needs to be kept open
2032 | |Reduced need for seek operations on big files
2033 | |Efficient when most data of a block have to be processed
2034 | - HDFS Architecture
2035 | - A HDFS file is a collection of blocks stored in datanodes, with metadata (such as the position of those blocks) that is stored in namenodes
2037 | - The HDFS Shell
2038 | - Why we need it?
2039 | - Managing the files on a HDFS cluster cannot be done on the operating system shell
2040 | - hence a custom HDFS shell must be used.
2041 | - The HDFS file system shell replicates many of the usual commands (ls, rm, etc.), with some other commands dedicated to loading files from the operating system to the cluster (and back)
2042 | 4. The Hadoop Resource Manager (YARN)
2043 | - What is it/What does it do?
2044 | - YARN deals with Executing MapReduce jobs on a cluster
2045 | - It is composed of a central ***Resource Manager*** and
2046 | - Many ***Node Managers*** that reside on slave machines
2047 | - Every time a MapReduce job is scheduled for execution on a Hadoop cluster, YARN starts an **Application Master** that negotiates resources with the **Resource Manager** and starts Containers on the slave nodes
2048 | - Containers are the processes where the actual processing is done
2049 | 5. Programming on Hadoop
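- The MapReduce paradigm that Hadoop executes can be sketched in plain Python (illustrative only, not the Hadoop API): map emits (key, value) pairs, the framework groups values by key (the shuffle), and reduce aggregates each group.

```python
from collections import defaultdict

def map_fn(line):
    # "map" phase: emit a (word, 1) pair for every word in the line
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    # "reduce" phase: aggregate all values emitted for one key
    return (word, sum(counts))

def map_reduce(lines):
    grouped = defaultdict(list)
    for line in lines:                      # map over every input record
        for key, value in map_fn(line):
            grouped[key].append(value)      # shuffle: group values by key
    return dict(reduce_fn(k, v) for k, v in grouped.items())

print(map_reduce(["the cat", "the dog"]))  # {'the': 2, 'cat': 1, 'dog': 1}
```

In Hadoop the map and reduce phases run distributed across the cluster's nodes; this sketch only shows the data flow.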
2050 | ### Apache Spark
2051 | 1. Why Spark not Hadoop?/Spark v.s. Hadoop
2052 | - While Hadoop MapReduce works well, it is geared towards performing relatively simple jobs on large datasets.
2053 | - While the execution order of Hadoop MapReduce is fixed, the lazy evaluation of Spark allows the developer to stop worrying about it, and have the Spark optimizer take care of it.
2054 | - In addition, the driver program can be divided into steps that are easier to understand without sacrificing performance (as long as those steps are composed of transformations).
2055 | - However, when complex jobs are performed, we would like
2056 | - Caching data in memory
2057 | - Having finer-grained control on the execution of the jobs
2058 | - Spark was designed to
2059 | - reduce the latency inherent in the Hadoop approach for the execution of MapReduce jobs
2060 | - How?
2061 | - The transformations in the program use lazy evaluation, hence Spark has the possibility of optimizing the process
2062 | - Spark can operate within the Hadoop architecture, using YARN and Zookeeper to
2063 | - Manage computing resources
2064 | - Storing data on HDFS
2065 | - Spark has a tightly-coupled nature of its main components
2066 | - Spark has a cluster manager of its own, but it can work with other cluster managers, such as YARN or MESOS.
2067 | 2. Spark Architecture
2068 | - One of the strong points of Spark is the tightly-coupled nature of its main components
2070 | - Spark ships with a cluster manager of its own, but it can work with other cluster managers, such as YARN or MESOS.
2071 | 3. The Spark Shell
2072 | - allows users to send commands to the cluster interactively in either Scala or Python
2073 | - While the shell can be extremely useful, it prevents Spark from deploying all of its optimizations, leading to poor performance.
2074 | 4. Programming on Spark
2075 | - Lecture 09:: 00:34:24
2076 | 5. Spark Runtime Architecture
2077 | - Applications in Spark are composed of different components including
2078 | - Job
2079 | - The data processing that has to be performed on a dataset
2080 | - the overall processing that Spark is directed to perform by a driver program
2081 | - Task
2082 | - A single operation on a dataset
2083 | - a single transformation operating on a single partition of data on a single node
2084 | - Stage
2085 | - A set of tasks operating on a single partition
2086 | - Stage \& performance
2087 | - The fewer the number of stages, the faster the computation (shuffling data across the cluster is slow)
2088 | - Stage \& Job
2089 | - A job is composed of more than one stage when data are to be transferred across node
2090 | - Executors
2091 | - The processes in which tasks are executed
2092 | - Cluster Manager
2093 | - The process assigning tasks to executors
2094 | - Driver program
2095 | - The main logic of the application
2096 | - Spark application
2097 | - Driver program + Executor
2098 | - Spark Context
2099 | - The general configuration of the job
2100 | - The **deployment is set in the Spark Context**, which is also used to set the configuration of a Spark application, including the cluster it connects to in cluster mode.
2101 | - For instance, this hard-coded Spark Context directs the execution to run locally, using 2 threads (usually, it is set to the number of cores):
2102 | - sc = new SparkContext(new SparkConf().setMaster("local[2]"));
2103 | - This other hard-coded line directs the execution to a remote cluster:
2104 | - sc = new SparkContext(new SparkConf().setMaster("spark://192.168.1.12:6066"));
2105 | - Spark Contexts can also be used to **tune the execution** by setting the memory, or the number of executors to use.
2106 | - These different components can be arranged in **three** different deployment modes (below) across the cluster
2107 | - Spark Runtime Mode
2108 | - Local Mode
2109 | - In local mode, every Spark component runs within the same JVM. However, the Spark application can still run in parallel, as there may be more than one executor active
2110 | - When used?
2111 | - Good for developing and debugging
2112 | - Cluster Mode
2113 | - In cluster mode, every component, including the driver program, is executed on the cluster. Upon launching, the job can run autonomously.
2114 | - When used?
2115 | - This is the common way of running non-interactive Spark jobs.
2116 | - Client Mode
2117 | - The driver program talks directly to the executors on the worker nodes. Therefore, the machine hosting the driver program has to be connected to the cluster until job completion.
2118 | - When used?
2119 | - Client mode must be used when the applications are interactive, as happens in the Python or Scala Spark shells.
2120 | 5. Caching Intermediate Results
2121 | - rdd.persist(storageLevel) can be used to save an RDD in memory and/or on disk.
2122 | - The storageLevel can be tuned to a different mix of use of RAM or disk to store the RDD
2123 | - since RDDs are immutable,
2124 | - the result of the final transformation is cached, not the input RDD.
2125 | - In other words, when this statement is executed
2126 | - rddB = rddA.persist(DISK_ONLY)
2127 | - only rddB has been written to disk.
2128 | 6. Resilient Distributed Dataset (RDDs) (Central to Spark)
2129 | - What is it?
2130 | - the way data are stored in Spark during computation, and understanding them is crucial to writing programs in Spark:
2131 | ||What?|
2132 | |---|---|
2133 | |Resilient|data are stored redundantly, hence a failing node would not affect their integrity
2134 | |Distributed|data are split into chunks, and these chunks are sent to different nodes
2135 | |Dataset|a dataset is just a collection of objects, hence very generic
2136 | - Properties of RDDs
2137 | - RDDs are
2138 | |Property|What?|Benefit|
2139 | |---|---|---|
2140 | |immutable|once defined, they cannot be changed|simplifies parallel computations on them, and is consistent with the functional programming paradigm|
2141 | |transient|they are meant to be used only once, then discarded (but they can be cached, if it improves performance)||
2142 | |lazily-evaluated|the evaluation happens only when data cannot be kept in an RDD, as when the number of objects in an RDD has to be computed, or an RDD has to be written to a file (these are called *actions*), but not when an RDD is transformed into another RDD (these are called *transformations*)|allows Spark to optimize the process|
2143 | - The transformations in the program use lazy evaluation, hence Spark has the possibility of optimizing the process
2144 | - How to Build an RDD?
2145 | - created out of data stored elsewhere (HDFS, a local text file, a DBMS)
2146 | - created out of collections too, using the parallelize function
2147 | - RDD variable
2148 | - are just placeholders until the action is encountered. Remember that the Spark application is not just the driver program, but all the RDD processing that takes place on the cluster
2149 | - RDD Transformations
2150 | - rdd.filter(lambda) selects elements from an RDD
2151 | - rdd.distinct() returns an RDD without duplicated elements
2152 | - rdd.union(otherRdd) merges two RDDs
2153 | - rdd.intersection(otherRdd) returns elements common to both
2154 | - rdd.subtract(otherRdd) removes elements of otherRdd
2155 | - rdd.cartesian(otherRdd) returns the Cartesian product of both RDDs
2156 | - RDD Action
2157 | - rdd.collect() returns all elements in an RDD
2158 | - rdd.count() returns the number of elements in an RDD
2159 | - rdd.reduce(lambda) applies the function to all elements repeatedly, resulting in one result (say, the sum of all elements. Not to be confused with the reduceByKey transformation)
2160 | - rdd.foreach(lambda) applies lambda to all elements of an RDD
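- The distinction between lazy transformations and eager actions can be sketched in plain Python (a toy model only, not the real Spark API — `FakeRDD` and `parallelize` here are made up for illustration):

```python
class FakeRDD:
    def __init__(self, data_fn):
        self._data_fn = data_fn  # deferred computation, run only by actions

    # --- transformations: return a new FakeRDD, nothing is computed yet ---
    def filter(self, pred):
        return FakeRDD(lambda: [x for x in self._data_fn() if pred(x)])

    def distinct(self):
        return FakeRDD(lambda: list(dict.fromkeys(self._data_fn())))

    def union(self, other):
        return FakeRDD(lambda: self._data_fn() + other._data_fn())

    # --- actions: force the evaluation and return a plain value ---
    def collect(self):
        return self._data_fn()

    def count(self):
        return len(self._data_fn())

def parallelize(collection):
    # build an RDD out of an in-memory collection
    return FakeRDD(lambda: list(collection))

rdd = parallelize([1, 2, 2, 3, 4])
evens = rdd.filter(lambda x: x % 2 == 0).distinct()  # nothing evaluated yet
print(evens.collect())  # action triggers evaluation: [2, 4]
print(rdd.count())      # 5
```

Because transformations only record what to do, an optimizer (as in real Spark) gets a chance to rearrange the whole chain before any action forces evaluation.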
2161 |
2162 | ### past exam
2163 | - > [2014 Q5] B) Apache Hadoop is a software framework that enables processing of large data sets.
2164 | - > a. Explain the role of Hadoop Distributed File System (HDFS) in supporting the Apache Hadoop framework. [2]
2165 | - HDFS has blocks stored on data nodes, and a name node contains the metadata about where each block is stored
2166 | - HDFS is a fault tolerant file system that has been explicitly designed to span many nodes
2167 | - > b. Describe the process by which Apache Hadoop supports fault tolerant data processing. [2]
2168 | - HDFS has blocks stored on data nodes, and a name node contains the metadata about where each block is stored. If one of the nodes fails, the data is still available elsewhere in the system (load-balanced), and HDFS will try to rebalance itself.
2169 | - > [sample Q4] D) Describe the three different Apache SPARK runtime modes:
2170 | - > Local [1]
2171 | - The driver program and the executors are all hosted on the same computer (no need for a cluster manager).
2172 | - The Spark application is hosted on the same computer.
2173 | - > Cluster [1]
2174 | - The cluster manager, driver program and the executors are all hosted on the cluster.
2175 | - The cluster manager and Spark application are hosted on the cluster.
2176 | - > Client [1]
2177 | - The driver program is hosted on the same computer that is not part of the cluster, while the cluster manager and executors are hosted on the cluster.
2178 | - > [2017 Q2] B What is the Apache Hadoop Resilient Distributed Dataset (RDD) operation type that triggers RDD evaluations? Which operation type does not trigger RDD evaluations? [2]
2179 | - Spark's RDDs provide two kinds of operations: transformations and actions; only actions such as reduce or collect trigger the evaluation, so transformations do not trigger RDD evaluations.
2180 |
2181 | ## Week 10.1 – Security and Clouds
2182 | 1. Why is security so important?
2183 | - If systems (Grids/Clouds/outsourced infrastructure!) are not secure
2184 | - Large communities will not engage
2185 | - medical community, industry, financial community, etc.: they will only use their own internal resources, e.g. private clouds!
2186 | - Expensive to repeat some experiments
2187 | - Huge machines running large simulations for several years
2188 | - Legal and ethical issues possible to be violated with all sorts of consequences
2189 | - e.g. data protection act violations and fines incurred
2190 | - Amazon Web Services, Sydney
2191 | - Trust is easily lost and hard to re-establish
2192 | - What do we mean by security anyway?
2193 | - Secure from whom?
2194 | - From sys-admin?
2195 | - From rogue employee?
2196 | - Secure against what?
2197 | - Security is never black and white but is a grey landscape where the context determines the accuracy of how secure a system is
2198 | - e.g. secure as given by a set of security requirements
2199 | - security technology ≠ secure system
2200 | - Ultra secure system using 2048+ bit encryption technology, packet filtering firewalls, …
2201 | - on laptop in unlocked room
2202 | - on PC with password on “post-it” on screen/desk
2203 | - the challenge of peta/exa-scale computers and possibility for brute force cracking
2204 | 2. The Challenge of Security
2205 | - Grids and Clouds (IaaS) allow users to compile codes that do stuff on physical/virtual machines
2206 | - In the Grid world a rich blend of facilities co-existed (were accessible/integrated!) which had "issues" - preventing people from doing bad stuff
2207 | - Highly secure supercomputing facilities compromised by single user PCs/laptops
2208 | - Need security technologies that scales to meet wide variety of applications
2209 | - Using services for processing of patient data through to “needle in haystack” searching of physics experiments
2210 | - Should try to develop generic security solutions
2211 | - Avoid all application areas re-inventing their own (incompatible/inoperable) solutions
2212 | - Clouds allow scenarios that stretch inter-organisational security
2213 | - Policies that restrict access to and usage of resources based on pre-identified users, resources
2214 | - Groups/tenancy…
2215 | - But what if new resources added, new users added, old users go…?
2216 | - Over-subscription issues
2217 | - User management (per user, per team, per organisation, per country…)
2218 | - What if organisations decide to change policies governing access to and usage of resources, or bring their data back inside of their firewall?
2219 | - Really not replicated somewhere else?
2220 | - What if you share a tenancy with a noisy neighbour!
2221 | - I/O demanding applications
2222 | - You hopefully never experienced this, but early NeCTAR RC had performance issues!
2223 | - The multi-faceted challenges of ”life beyond the organisational firewall”?
2224 | 3. Technical Challenges of Security
2225 | - All are important but some applications/domains have more emphasis on concepts than others
2226 | - Key is to make all of this simple/transparent to users!
2228 | - Single sign-on
2229 | - What is it?
2230 |             - Log in once, but access many more resources, potentially provided by other providers
2231 |                 - e.g. when you log in to the University of Melbourne Cloud, you can also access the Amazon Cloud
2232 | - **The Grid model (and Shib model!) needed**
2233 | - **Currently not solved for Cloud-based IaaS**
2234 |             - has to be built by developers
2235 |             - Onus (responsibility) is on non-Cloud developers to define/support this
2236 | - Auditing
2237 | - What is it?
2238 |             - logging, intrusion detection, auditing of security in external computing facilities; logging the actions of each user
2239 |             - When bad things happen, we have a record
2240 | - **well established in theory and practice and for local systems**
2241 | - Less mature in Cloud environments (beyond the firewall!)
2242 | - Tools to support generation of diagnostic trails
2243 | - Across federations of Clouds?
2244 | - Log/keep all information?
2245 | - For how long?
2246 | - Problem/challenge
2247 |             - The records are distributed most of the time
2248 | - Solution
2249 |             - Use a blockchain-style ledger to make the logs tamper-evident (with encryption for confidentiality)
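The tamper-evidence idea behind a blockchain-style audit ledger can be sketched as a simple hash chain, where each record's hash covers the previous record (a toy illustration only, not a production ledger; the log entry fields are invented):

```python
import hashlib
import json

GENESIS = "0" * 64

def append_entry(chain, entry):
    # Each record's hash covers the previous record's hash, so editing
    # any earlier entry invalidates every hash that follows it.
    prev = chain[-1]["hash"] if chain else GENESIS
    digest = hashlib.sha256(
        json.dumps({"entry": entry, "prev": prev}, sort_keys=True).encode()
    ).hexdigest()
    chain.append({"entry": entry, "prev": prev, "hash": digest})

def verify_chain(chain):
    # Recompute every hash from scratch; any tampering breaks the chain.
    prev = GENESIS
    for record in chain:
        expected = hashlib.sha256(
            json.dumps({"entry": record["entry"], "prev": prev},
                       sort_keys=True).encode()
        ).hexdigest()
        if record["hash"] != expected or record["prev"] != prev:
            return False
        prev = record["hash"]
    return True

log = []
append_entry(log, {"user": "alice", "action": "delete", "target": "vm-42"})
append_entry(log, {"user": "bob", "action": "login", "target": "gateway"})
assert verify_chain(log)

log[0]["entry"]["user"] = "mallory"   # tamper with the first entry
assert not verify_chain(log)          # the whole chain is now invalid
```

A real distributed-ledger design adds signatures and replication across the federation; the chaining above is only the integrity core.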
2250 | - Deletion (and encryption!!!)
2251 |         - **Data deletion with no direct access to the hard disk** (it might cost a lot of money to guarantee deletion)
2252 | - Many tools and utilities don’t work!
2253 | - Scale of data
2254 |             - Securely deleting a few MB is easy enough; at Cloud scale it is not
2255 | - Liability
2256 |         - Contracts state who bears the risk when you put data in the Cloud
2257 | - Licensing
2258 | - Challenges with the Cloud delivery model (Where can jobs realistically run)
2259 | - Many license models
2260 | - Per user
2261 | - Per server
2262 | - Per organisation
2263 | - Floating licenses
2264 | - Fixed to machines
2265 | - Workflows
2266 |         - Many workflow tools for combining SoA services/data flows
2267 | - Taverna, Pegasus, Galaxy, Kepler, Nimrod, OMS, …
2268 | - **Many workflows models**
2269 | - Orchestration (**centralised** definition/enactment),
2270 | - Choreography (**decentralised**)
2271 | - Serious challenges of
2272 | - defining,
2273 | - enforcing,
2274 | - sharing,
2275 | - enacting
2276 | - security-oriented workflows
2277 | - The Ever Changing Technical/Legal Landscape
2278 |         - requirements and guarantees on Cloud usage keep changing
2279 |
2280 | - Authentication
2281 |         - What does it do?
2282 | - prove who you are
2283 | - What is it?
2284 | - Authentication is **the establishment and propagation of a user’s identity in the system**
2285 |             - e.g. so site X can check that user Y is attempting to gain access to its resources
2286 | - Note **does not check what user is allowed to do, only that we know (and can check!) who they are**
2287 | - **Masquerading always a danger** (and realistic possibility)
2288 | - Security guidance/balances
2289 | - Password selection
2290 | - 16 characters, upper/lower case and must include nonalphanumeric characters and be changed quarterly…!?!?!?!
2291 | - Treatment of certificates
2292 | - challenge:
2293 | - Local username/password?
2294 | - 100,000+ users that come and go (scalability)
2295 | - Centralised vs decentralised systems?
2296 | - More scalable solution needed
2297 | - Decentralised Authentication (Proof of Identity) thru Shibboleth
2299 |
2300 | - Supports Single-Sign On (in case you were unaware)
2301 |     - Service provider
2302 |         - the web site/journal in the picture
2303 |     - Identity provider
2304 |         - the home organisations in the federation, listed in this case
2306 |
2307 |     - How does the role for the AURIN project get into the UniMelb system?
2308 |     - How does the UniMelb system know a user is involved in the AURIN project?
2309 |     - How does the UniMelb system know which information to send to the service provider when it is required?
2310 |         - Only the attribute related to the project is sent, not the whole database
2311 |     - How does the site know what to do with this information when it gets it?
2312 |         - Certificates have to be trusted
2313 |     - If the site is a gateway to remote services, how do the privileges defined at UniMelb allow me to access them?
2314 |     - How can this be used to unlock remote services outside of UniMelb?
2315 |     - Only a few attributes are used for federated authentication/access control
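The selective attribute-release idea can be sketched as follows: the identity provider signs an assertion containing only the requested attribute, and the service provider verifies the signature against a key it trusts. This is a toy sketch: HMAC with a pre-shared key stands in for Shibboleth's real public-key/XML signatures, and all names and attributes are invented:

```python
import hashlib
import hmac
import json

FEDERATION_KEY = b"pre-shared-federation-key"   # hypothetical trust anchor

def release_attribute(user_record: dict, attribute: str) -> dict:
    # The IdP releases ONE signed attribute, not the whole user database.
    assertion = {"attribute": attribute, "value": user_record[attribute]}
    payload = json.dumps(assertion, sort_keys=True).encode()
    assertion["sig"] = hmac.new(FEDERATION_KEY, payload,
                                hashlib.sha256).hexdigest()
    return assertion

def service_provider_accepts(assertion: dict) -> bool:
    # The SP recomputes the signature; only federation members can sign.
    payload = json.dumps(
        {"attribute": assertion["attribute"], "value": assertion["value"]},
        sort_keys=True).encode()
    expected = hmac.new(FEDERATION_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(assertion["sig"], expected)

idp_record = {"name": "R. Sinnott", "affiliation": "staff", "project": "AURIN"}
assertion = release_attribute(idp_record, "project")  # one attribute released
assert service_provider_accepts(assertion)
```

The design point this illustrates: the service provider never sees the IdP's user database, only a signed claim about one pre-agreed attribute.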
2316 | - Public Key Infrastructures (PKI) underpins MANY systems
2317 | - What is it?
2318 |             - an arrangement that binds public keys to the identities of entities (people and organisations)
2319 |             - The binding is established through a process of registration and issuance of certificates at and by a Certification Authority (CA)
2320 |             - The PKI role that assures valid and correct registration is called a Registration Authority (RA). The RA is responsible for accepting requests for digital certificates and authenticating the entity making the request
2321 | - Based on public key cryptography
2322 | - Public Key Cryptography
2323 | - Also called Asymmetric Cryptography
2324 | - Two distinct keys
2325 | - One that must be kept private
2326 | - Private Key
2327 | - One that can be made public
2328 | - Public Key
2329 |         - The two keys are complementary, but it is essential that the private key cannot be derived from the public key
2330 | - With private keys can digitally sign messages, documents and validate them with associated public keys
2331 | - Check whether changed, useful for non-repudiation
2332 | - Public Key Cryptography simplifies key management
2333 |         - Don’t need to hold many keys for a long time
2334 |             - The longer keys are left in storage, the more likely they are to be compromised
2335 | - Instead use Public Keys for short time and then discard
2336 | - Public Keys can be freely distributed
2337 | - Only Private Key needs to be kept long term and kept securely
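The sign/verify mechanics can be shown with textbook RSA on deliberately tiny primes (a toy sketch only: real deployments use 2048+ bit keys, padding schemes, and a vetted library, never hand-rolled RSA):

```python
import hashlib

# Textbook RSA key generation with tiny primes (insecure, illustration only)
p, q = 61, 53
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

def sign(message: bytes) -> int:
    # Sign by "encrypting" the message digest with the PRIVATE key
    h = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(h, d, n)

def verify(message: bytes, signature: int) -> bool:
    # Anyone holding the PUBLIC key (e, n) can check the signature
    h = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(signature, e, n) == h

sig = sign(b"launch instance vm-42")
assert verify(b"launch instance vm-42", sig)   # valid: message unchanged
```

Because only the private-key holder could have produced `sig`, a valid signature also supports non-repudiation.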
2338 | - PKI and Cloud
2339 | - So what has this got to do with Cloud…?
2340 | - IaaS – key pair!
2341 | - Cloud interoperability begins with security!
2342 | - There is no single, ubiquitous CA, there are many
2343 | - There are many ways to prove your identity
2344 | - OpenId, FacebookId, Visa credit card for Amazon, …
2345 | - Degrees of trust
2346 | - But remember need for single sign-on
2347 | - Prove identity once and access distributed, autonomous resources!
2348 | - Public Key Certificates
2349 |     - (PKC & PKI) The mechanism connecting a public key to the user holding the corresponding private key is the Public Key Certificate
2350 | - Public key certificate contains public key and identifies the user with the corresponding private key
2351 | - Distinguished Name (DN): CN=Richard Sinnott; OU=Dept CIS; O=UniMelb; C=AU
2352 | - Not a new idea
2353 | - Business card
2354 | - My name, my association, contact details, …
2355 | - Can be distributed to people I want to exchange info with
2356 | - If include public key on it, then have basic certificate, but
2357 |             - has to be delivered in person (or no trust!); who says I work at Melbourne?; it could be a forgery; I might be an impostor; what if I move to Monash or my phone number changes?; who would have a 1024-bit key on a business card, …
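A Distinguished Name like the one above is just a list of attribute=value pairs, which a few lines can pull apart (a simplified sketch following the semicolon-separated form shown on the slide; real X.500/RFC 4514 DNs use commas and escaping rules this ignores):

```python
def parse_dn(dn: str) -> dict:
    # Split "CN=...; OU=...; O=...; C=..." into {attribute: value}.
    # Assumes no escaped separators inside values.
    return dict(part.strip().split("=", 1) for part in dn.split(";"))

dn = "CN=Richard Sinnott; OU=Dept CIS; O=UniMelb; C=AU"
fields = parse_dn(dn)
assert fields["CN"] == "Richard Sinnott"
assert fields["O"] == "UniMelb"
```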
2358 | - Public Key Certificates & Certification Authority
2359 | - Public Key Certificates issued by trusted "Certification Authority"
2360 | - Certification Authority
2361 |         - What is it?
2362 | - **Central component of PKI is Certification Authority (CA)**
2363 | - CA has numerous responsibilities
2364 | - **Policy and procedures**
2365 | - How to’s, do’s and don’ts of using certificates
2366 | - Processes that should be followed by users, organisations, service providers …(and consequence for violating them!)
2367 | - challenge:
2368 | - Issuing certificates
2369 | - Often need to delegate to local Registration Authority
2370 | - Prove who you are, e.g. with passport, student card
2371 | - Revoking certificates
2372 | - Certificate Revocation List (CRL) for expired/compromised certificates
2373 | - Storing, archiving
2374 | - Keeping track of existing certificates, various other information, …
2375 | - models
2377 | - Typical Simple CA
2378 | - Based on statically defined centralised CA with direct single hierarchy to users
2379 | - Typical scenario for getting a certificate
2380 | - steps:
2382 | - Authorisation
2383 | - What is it?
2384 | - Authorisation is concerned with controlling access to resources based on policy
2385 | - Can this user invoke this service, make use of this data?
2386 |             - Complementary to authentication: knowing it is this user, can we restrict/enforce what they can/cannot do?
2387 | - Many different approaches for authorisation
2388 |     |approach|e.g.|
2389 |     |---|---|
2390 |     |Group Based Access Control|your project VMs|
2391 |     |Role Based Access Control (RBAC)| |
2392 |     |Identity Based Access Control (IBAC)| |
2393 |     |Attribute Based Access Control (ABAC)| |
2394 | - Many Technologies
2395 |         - XACML, PERMIS, CAS, VOMS, AKENTI, SAML, WS-*
2396 | - typical model: RBAC
2397 | - Basic idea is to define:
2398 | - **roles** applicable to specific collaboration
2399 | - roles often hierarchical
2400 | - Role X ≥ Role Y ≥ Role Z
2401 | - X can do everything and more than Y who can do everything and more than Z
2402 | - **actions** allowed/not allowed for VO members
2403 | - **resources** comprising VO infrastructure (computers, data etc)
2404 | - A policy then consists of sets of these rules
2405 | - { Role x Action x Target }
2406 | - Can user with VO role X invoke service Y on resource Z?
2407 | - Policy itself can be represented in many ways,
2408 | - e.g. XML document, SAML, XACML, …
2409 | - Standards on when/where these used (PEP) and enforced (PDP)
2410 | - Policy engines consume this information to make access decisions
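The { Role x Action x Target } model with a role hierarchy can be sketched as a minimal policy decision function (role names, actions and targets here are invented for illustration):

```python
# Role hierarchy: admin >= researcher >= guest
ROLE_RANK = {"admin": 3, "researcher": 2, "guest": 1}

# Policy: a set of (role, action, target) rules
POLICY = {
    ("researcher", "run_job", "hpc_cluster"),
    ("guest", "read", "public_dataset"),
}

def is_allowed(role: str, action: str, target: str) -> bool:
    # A senior role inherits every permission granted to roles below it
    return any(
        action == a and target == t and ROLE_RANK[role] >= ROLE_RANK[r]
        for (r, a, t) in POLICY
    )

assert is_allowed("admin", "run_job", "hpc_cluster")      # inherited
assert is_allowed("researcher", "read", "public_dataset")
assert not is_allowed("guest", "run_job", "hpc_cluster")
```

In a real deployment this decision logic sits in a Policy Decision Point (PDP), is enforced at Policy Enforcement Points (PEPs), and the policy itself would be expressed in e.g. XACML rather than a Python set.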
2411 | - Authorisation and Clouds
2412 | - Authorisation typically applies to services/data deployed on Clouds, i.e. when they are running
2413 | - But not only…
2414 | - Who can install this patch, when can they do it, how many VMs will be affected if this happens…?
2415 | - Is this virtual image free of trojans, malware etc?
2416 | - Lots of tools to support this: Pakiti, Cfengine, Puppet, …
2417 | - Real challenge of software dependency management for complex systems
2418 | - Amazingly (?) most users/organisations do not patch!!!
2419 | - Side-effects, complexities, stopping jobs, restarting jobs etc
2420 | - What does it do?
2421 |         - Defining what users can do, and defining and enforcing the rules
2422 | - Each site will have different rules/regulations
2423 | - How it is achieved?
2424 | - Often realised through Virtual Organisations (VO)
2425 | - Collection of distributed resources shared by collection of users from one or more organizations typically to work on common research goal
2426 | - Provides conceptual framework for rules and regulations for resources to be offered/shared between VO institutions/members
2427 | - Different domains place greater/lesser emphasis on expression and enforcement of rules and regulations (policies)
2428 | - Should all be transparent to end users!
2429 | - Reflect needs and understanding of organisations involved!
2430 | - Identity Provider
2431 | - The place you got authenticated
2432 |
2433 | ### past exam
2434 | - > [2013 Q6] A) Explain what is meant by the following security terms:
2435 | - > single sign-on [1]
2436 | - is where you authenticate once then the identity provider will enable you to access set of multiple different services which can be hosted in different places
2437 | - > public key infrastructures [1]
2438 |         - Cloud computing is based on this
2439 |         - You have public/private key pairs: the public key can be held by anyone, but the private key is held only by you. Certificates establish the connection between your identity and your public key, and are issued by the Certification Authority. To get a certificate, you have to prove your identity.
2440 | - > certification authority [1]
2441 | - The certification authority is the authority who is responsible for issuing the certificate.
2442 | - > registration authority [1]
2443 |         - The individual in the organisation who is responsible for checking someone's identity before a certificate is issued
2444 | - > identity provider (IdP) [l]
2445 | - is the authentication system
2446 | - The place you got authenticated to prove your identity
2447 |             - e.g.: when you want to log in to AURIN, you are redirected to UniMelb authentication where you need to prove your identity
2448 | - > [2013 Q6] B) Discuss the challenges in supporting fine-grained security in Cloud environments. You may refer to the importance and/or role of (some of) the terms in part A) of this question. [5]
2449 |     - how Clouds do authentication
2450 |         - e.g. fine-grained access control (authorisation) and auditing. Confidentiality remains a problem: you put your data on a given server and have no idea where that server is
2451 |
2452 |     - Fine-grained security is not done well in the Cloud. We know how to do authentication to a certain degree, but fine-grained access control is something the Cloud doesn't generally provide, so you have to build it yourself.
2453 |
2454 | - authentication
2455 | - authorization
2456 | - accounting/auditing
2457 | - confidentiality
2458 | - trust
2459 | - > [2015 Q5] A) There are many open challenges in delivering secure Clouds. Describe some of the technical and non-technical issues that currently exist for development and delivery of security-oriented Clouds. [4]
2460 |     - technical issues:
2461 | - authorisation
2462 | - trust,
2463 | - trust the cloud provider that data is secured to be stored on that
2464 | - api,
2465 | - single sign-on,
2466 | - Login once, but can access many more resources that potentially provided by other providers
2467 | - **The Grid model (and Shib model!) needed**
2468 | - **Currently not solved for Cloud-based IaaS**
2469 |                 - Onus (responsibility) is on non-Cloud developers to define/support this; the Cloud provider can't do much to help
2470 | - certificate authority
2471 | - challenge:
2472 |                 - there isn't a centralised certificate authority for the Cloud
2473 | - Issuing certificates
2474 | - Often need to delegate to local Registration Authority
2475 | - Prove who you are, e.g. with passport, student card
2476 | - Revoking certificates
2477 | - Certificate Revocation List (CRL) for expired/compromised certificates
2478 | - Storing, archiving
2479 | - Keeping track of existing certificates, various other information
2480 |     - non-technical issues:
2481 |         - business issue: governments won't allow medical data to be stored on Clouds like AWS because it might be backed up somewhere else
2482 | - sensitive issue
2483 | - policy issue
2484 | - Liability
2485 | - Using contract to state the risk when put data here
2486 | - Licensing
2487 | - Challenges with the Cloud delivery model (Where can jobs realistically run)
2488 | - Many license models
2489 | - Per user
2490 | - Per server
2491 | - Per organisation
2492 | - Floating licenses
2493 | - Fixed to machines
2494 | - > [2014 Q2] A) b. Outline some of the practical challenges in supporting Cloud interoperability? [2]
2495 | - Security
2496 |         - You don't have single sign-on: logging in once to access a variety of Clouds for various reasons
2497 | - API themselves
2498 | - Cloud providers, especially public ones want to lock you in.
2499 | - They have different business models, different costs
2500 | - > [2014 Q6] A) The Internet2 Shibboleth technology as currently supported by the Australia Access Federation provides _federated authentication_ and _single sign-on_.
2501 | - > a. Explain what is meant by the italicized terms [2].
2502 | - federated authentication
2503 |         - is where you access a resource at one site while proving your identity somewhere else
2504 | - single sign-on
2505 | - is where you authenticate once then the identity provider will enable you to access set of multiple different services which can be hosted in different places
2506 | - > b. Explain the role of trust and public key infrastructures in supporting the Internet2 Shibboleth model. [2]
2507 | - trust
2508 |         - is a key part of any security system; Shibboleth is based on trust: all the organisations in the federation trust each other to authenticate their users
2509 | - public key infrastructures
2510 |         - all messages about where you are from and whether you authenticated are digitally signed. We don't trust anyone whose identity is not proven; it's only those in the federation, and they hold the keys used for the authentication.
2511 |             - e.g. I am from UniMelb and I am signing this message with my key, which you can then verify against the key you trust.
2512 | - > c. What are the advantages and disadvantages of the Shibboleth approach for security? [4]
2513 | - adv
2514 | - flexible when you doing single sign-on
2515 | - simple to use, access different service just by proving identity once
2516 | - disadv
2517 |         - all of the protocols/attributes are static
2518 |             - the assertion that Professor Sinnott has authenticated at UniMelb comes with a collection of pre-agreed attributes, e.g. that he is staff. If he joins a new project, that attribute would not yet be available in the UniMelb system; limited because it is static
2519 |         - not flexible, not dynamic
2520 | - > d. Why isn’t Shibboleth used to access Cloud-based systems more generally? [2]
2521 |     - related to trust: different Cloud providers require different attributes from those in Shibboleth
2522 |         - e.g.: Amazon requires your credit card info while UniMelb only requires student info.
2523 | - Static federation
2524 | - no single CA
2525 | - > [2015 Q5] B) The Internet2 Shibboleth technology as currently supported by the Australia Access Federation provides federated authentication.
2526 | - > a. Explain what is meant by this italicized term and discuss the advantages and disadvantages of the Shibboleth approach for security. [3]
2527 | - above
2528 | - > b. Why isn’t Shibboleth used to access Cloud-based systems more generally? [3]
2529 | - above
2530 |
--------------------------------------------------------------------------------
/docs/1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/1.png
--------------------------------------------------------------------------------
/docs/10.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/10.png
--------------------------------------------------------------------------------
/docs/11.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/11.jpg
--------------------------------------------------------------------------------
/docs/12.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/12.jpg
--------------------------------------------------------------------------------
/docs/13.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/13.jpg
--------------------------------------------------------------------------------
/docs/14.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/14.jpg
--------------------------------------------------------------------------------
/docs/15.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/15.png
--------------------------------------------------------------------------------
/docs/16.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/16.jpg
--------------------------------------------------------------------------------
/docs/17.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/17.jpg
--------------------------------------------------------------------------------
/docs/18.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/18.jpg
--------------------------------------------------------------------------------
/docs/19.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/19.jpg
--------------------------------------------------------------------------------
/docs/2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/2.jpg
--------------------------------------------------------------------------------
/docs/20.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/20.jpg
--------------------------------------------------------------------------------
/docs/21.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/21.jpg
--------------------------------------------------------------------------------
/docs/22.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/22.jpg
--------------------------------------------------------------------------------
/docs/23.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/23.png
--------------------------------------------------------------------------------
/docs/24.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/24.jpg
--------------------------------------------------------------------------------
/docs/25.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/25.jpg
--------------------------------------------------------------------------------
/docs/26.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/26.jpg
--------------------------------------------------------------------------------
/docs/27.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/27.jpg
--------------------------------------------------------------------------------
/docs/28.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/28.jpg
--------------------------------------------------------------------------------
/docs/29.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/29.jpg
--------------------------------------------------------------------------------
/docs/3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/3.jpg
--------------------------------------------------------------------------------
/docs/30.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/30.jpg
--------------------------------------------------------------------------------
/docs/31.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/31.png
--------------------------------------------------------------------------------
/docs/32.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/32.png
--------------------------------------------------------------------------------
/docs/33.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/33.jpg
--------------------------------------------------------------------------------
/docs/34.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/34.jpg
--------------------------------------------------------------------------------
/docs/4.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/4.jpg
--------------------------------------------------------------------------------
/docs/5.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/5.jpg
--------------------------------------------------------------------------------
/docs/6.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/6.jpg
--------------------------------------------------------------------------------
/docs/7.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/7.jpg
--------------------------------------------------------------------------------
/docs/8.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/8.jpg
--------------------------------------------------------------------------------
/docs/9.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/siriusctrl/COMP90024-CCC-notes/637e3f46cdf93c1666c231d76a0c41c0357a94fe/docs/9.jpg
--------------------------------------------------------------------------------