CS计算机代考程序代写 cache algorithm distributed system chain data structure scheme GOSSIPING
GOSSIPING
Distributed Systems (Hans‐Arno Jacobsen) 1
Pixabay.com
Gossiping in Distributed Systems
• Endless process of randomly choosing two nodes and have them exchange information
Seminal paper form 1987
• I.e., repeated probabilistic exchange of information between two nodes
• Information spreads within group of nodes
• A.k.a. epidemic algorithms where a disease spreads or infects nodes – let’s stick to gossip in this lecture
Distributed Systems (Hans‐Arno Jacobsen) 2
Node
Node
Node
Node
Node
Node
Clearly, there is redundancy!
Node
Distributed Systems (Hans‐Arno Jacobsen)
3
Node
Gossip in Distributed Systems
• Today, distributed systems grow to unprecedented scales
• Nodefailureisthenorm,nottheexception
• Near‐continuouschangeinnodesand
communication quality among nodes in system
• Gossipingisefficient,i.e.,spreadsquicklyand persistently (difficult to eradicate)
• Thus,especially,usefulinsituationswithlarge number of nodes that exhibit (high) churn
Distributed Systems (Hans‐Arno Jacobsen) 4
P
information
communication
• Number of hops each
• Nodes it knows
• Other
selected for
Generic Gossip Protocol
• “As with so many concepts in computer science, providing a precise characterization of gossiping solutions is difficult.” Kermarrec & Steen
• Model consists of a dynamically changing set of nodes, each of which regularly exchanges information with other nodes
Partial view of system
node
Cache
Q
node node
Main characteristics:
• Cache size
• Number of nodes
Send buffer
message travels
Distributed Systems (Hans‐Arno Jacobsen)
5
node
Generic Gossip Protocol
• “As with so many concepts in computer science, providing a precise characterization of gossiping solutions is difficult.” Kermarrec & Steen
• Model is a dynamically changing set of nodes, each of which regularly
exchanges information with other nodes • Realize via two communicating threads
• Active thread communicating exchange requests
• Passive thread accepting incoming exchange requests
Active thread (node P):
Passive thread (node Q):
(1) selectNode(&Q); (1)
(2) selectToSend(&bufs); (2)
(3) sendTo(Q, bufs); (4)
(5) receiveFrom(Q, &bufr);
(6) selectToKeep(cache, bufr);
(7) processData(cache);
(3) receiveFromAny(&P, &bufr); (4) selectToSend(&bufs);
(5) sendTo(P, bufs);
(6) selectToKeep(cache, bufr); (7) processData(cache)
Distributed Systems (Hans‐Arno Jacobsen)
6
• Node selection • Data exchange • Data processing
Gossip Framework
Distributed Systems (Hans‐Arno Jacobsen)
7
Pixabay.com
Node Selection
• A given node selects another node (nodes)
• Assume selection is done uniformly among set
of available nodes
• Not necessarily realistic assumption (approximate in practice)
• Performed by a node sampling service (a building block – a.k.a. peer sampling)
Distributed Systems (Hans‐Arno Jacobsen) 8
Data Exchange
• Nodes decide what data to exchange with each other
• Highly application‐dependent
• E.g., for node sampling, nodes exchange
references to known (recently active) nodes
• Network topology changes at each gossip exchange
Distributed Systems (Hans‐Arno Jacobsen) 9
Data Processing
• Specifies how nodes deal with received information
• Also, highly application‐dependent
• E.g., application discovers nearby nodes – Builds distributed data structures
– Clusters nodes according to mutual interests
Distributed Systems (Hans‐Arno Jacobsen) 10
Speed of Propagation
• Data is spread exponentially fast through system
• It take O(log N) rounds to reach all nodes
• N is number of nodes in system
• A round completes when every node has initiated a gossip exactly once
Distributed Systems (Hans‐Arno Jacobsen) 11
Information Dissemination Pattern
• Node selection: Each node P periodically chooses f ≥ 1 nodes Q1,…, Q4 uniformly at random from entire set of active nodes
• Data exchange: A message is selected from the local cache and copied from one node to another.
– Push model: P forwards a message to each Qi
– Pull model: Each Qi requests a message from P
– Hybrids, e.g., push a notification, pull the message
• Data processing: Store received message for a next iteration; pass it to a higher layer (application)
• Key parameters: Nodes store up to c messages, message forwarded up to t times, node selects f other nodes
Distributed Systems (Hans‐Arno Jacobsen) 12
propagation
Main Properties
• Simple–simpleexchangeofinformation
• Scalable–numberoftimesanodeneedsto gossip or number of targets it needs to gossip to is logarithmic in system size
• Reliable–duetoprobabilisticnature
– Leads to redundancy in target selection and
– Resilient to large number of failures
– Copes well with high churn in system (node and link failures)
• Redundancyimposesoverhead
Distributed Systems (Hans‐Arno Jacobsen) 13
Node Sampling Pattern (a.k.a. Peer Sampling)
• Node selection: Each node P periodically chooses a gossip target Q from its current set of neighbors (i.e., nodes it knows)
• Data exchange: Lists of node references • Data processing:
– Receiving node merges list of nodes received with its own list to compose a new list of neighbors
– Some nodes may need to be dropped from the new list due to cache size limitations
Distributed Systems (Hans‐Arno Jacobsen) 14
Topology Construction
• Node’s cache contains references to other nodes – partial view of system
• References thought to represent direct links in an induced overlay network among nodes
• Links could be directional if information can only flow in one direction (firewalls)
• Apply ranking function to references to select for caching
• E.g., based on utility of node (Spotify, BitTorrent),
availability of node etc.
• Form structured overlays based on applying a geometry‐ based proximity metric on the node identifier space
• Requires access to peer sampling service to guarantee whole system is eventually explored
Distributed Systems (Hans‐Arno Jacobsen) 15
Topology Construction Pattern
• Nodeselection:Setofnodesisrankedaccording to a given ranking function and gossip target is chosen at random among the nodes from the local cache
• Dataexchang:Listsofnodereferences
• Dataprocessing:
– Receiving node merges received list with its own list – Ranks elements according to given ranking function – Keeps highest ranked elements (up to size required)
Distributed Systems (Hans‐Arno Jacobsen) 16
Resource Management Pattern
• Node selection: Each node P periodically chooses a gossip target Q from its current set of neighbors
• Data exchange: Status information on other nodes (e.g., last reported alive message)
• Data processing: Receiving node merges the received information with its own status information on nodes, updating its view of other nodes
Distributed Systems (Hans‐Arno Jacobsen) 17
Resource Management Applications
• Build system monitoring services and failure detectors
• Essentially, monitor any system aspect
• Explicitvs.implicitfailuredetection
– Explicitly – nodes heartbeat each other to detect failures
– Implicitly – non‐responding nodes are dropped form cache and eventually “disappear” from view
• Aggregateresourcestatusinformationovertime
• Specialcaseofinformationdissemination
Distributed Systems (Hans‐Arno Jacobsen) 18
Computation Pattern
Information Aggregation in Large‐scale Distributed Systems
• Node selection: Each node periodically chooses one other node uniformly at random from the entire set of active nodes
• Data exchange: Application‐specific data elements are copied from one node to another
• Data processing: New data values are computed from exchanged information; are used in next gossip exchange
Distributed Systems (Hans‐Arno Jacobsen) 19
Observations
• Usecases:Internet,sensornetworks
• Compute:Sum,averages,min,max,etc.
• Typicalcharacteristics
– Centralized aggregation not possible
– Nodes change constantly
– Communication topology not globally known – Computing power may be limited
• Aggregateoverpartofthesystem
• Here,dataprocessingiskeyversusotherpatterns
Distributed Systems (Hans‐Arno Jacobsen) 20
Gossip in Practice
• Build and maintain distributed systems
• Used in Cassandra to disseminate meta‐data and failure
information
• Used in blockchains to disseminate transactions (Bitcoin and Ethereum et al.)
• Spotify (formerly used), BitTorrent, et al.
Pixabay.com
Distributed Systems (Hans‐Arno Jacobsen) 21
topology construction?
Self‐study Questions I
• Illustrate the exponential spread of information by analysing an example for increasing N (# of nodes)
• Write pseudo code to form a list, a ring, a tree, a graph topology of nodes via gossip
• Explore other topology construction based on geometry‐based proximity metrics of node identifiers
• Design a replication scheme based on gossip
• What other ranking functions could be used to select neighbours in
• How would the resulting topology look like?
• Along the lines of resource management, design resource allocation schemes where a set of nodes (possibly with specific properties) are allocated to a given application via gossiping
• How would you compute an average value across nodes via gossip?
• Why is the requirements of uniformly randomly selecting peers important?
Distributed Systems (Hans‐Arno Jacobsen) 22
Self‐study Questions II
• Draw out any topology of nodes and inject a message at a randomly chosen node, compare a broadcast (send to all neighbours versus a gossip (send to some neighbours):
– How many messages are required?
– How long does it take for all nodes to be up‐to‐date?
• Have each node in your topology maintain a data structure (e.g., counter, list, array, set, etc.), inject data structure updates at random nodes and propagate these updates via gossip:
– What is the net result?
– Does your replicated system converge at each replica to one and
the same state (data structure)?
Distributed Systems (Hans‐Arno Jacobsen) 23
Distributed Systems (Hans‐Arno Jacobsen) 24