[CST-2] Distributed systems

Martin Harper mcnh2@cam.ac.uk
Sat, 02 Jun 2001 18:12:45 +0100


Andrei Legostaev wrote:

> 
> There seems to be an obvious problem with all distributed updates:
> 
> How do we tell whether a failure is at the host or at the link?  If we can't
> tell then, for example, a host whose Ethernet connector fell out will think
> that "Everybody's Dead, Dave" and proceed to make updates to itself.

Can't you just form a statistical model of the network which makes it 
clear that it's much more likely for your internet cable to be broken 
than for the whole of the world to have been blown up? Or just hack 
special cases to choose the more likely case, if you want to be lazy.
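
A minimal sketch of that statistical argument, using Bayes' rule. The
function name, the two failure probabilities and the assumption that
remote hosts fail independently of each other are illustrative choices,
not measurements of any real network:

```python
def posterior_local_link_down(p_link: float, p_host: float, n_peers: int) -> float:
    """P(my own link is down | all n_peers are unreachable).

    Assumes 0 < p_link < 1 and 0 < p_host < 1, and that remote hosts
    fail independently of each other (an assumption, not a fact).
    """
    # If my link is down, every peer is unreachable with certainty.
    weight_link_down = p_link * 1.0
    # If my link is up, every one of the n peers must have failed.
    weight_link_up = (1.0 - p_link) * (p_host ** n_peers)
    return weight_link_down / (weight_link_down + weight_link_up)


if __name__ == "__main__":
    # With five silent peers, "my cable fell out" dominates.
    print(posterior_local_link_down(p_link=0.01, p_host=0.01, n_peers=5))
```

Even with a small prior on your own link failing, the posterior gets
very close to 1 as the number of silent peers grows, because the
alternative requires every peer to have failed at once.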

Depending on how helpful the routers between you and the target are 
feeling, if the UK/US link(s) go down then you should get "Destination 
Unreachable" ICMP messages, which might help disambiguate. 
If it's just the process that has crashed, rather than the entire host, 
then the host will still answer pings while the process's heartbeat 
stops, which tells the two cases apart. Of course, that's all 
internet based - on other networks there'll be different amounts of info 
(and hopefully you can find an Application layer platform which provides 
a common base across them all).
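
The reasoning above can be sketched as a small decision function. This
is only an illustration: the names `Diagnosis` and `diagnose` are
hypothetical, and the three booleans are assumed to come from probes
you run yourself (an application heartbeat, a ping to the host, and
whether any router sent back "Destination Unreachable"). It does no
networking of its own.

```python
from enum import Enum


class Diagnosis(Enum):
    ALIVE = "peer process is alive"
    PROCESS_CRASHED = "host is up but the process is not responding"
    PATH_DOWN = "a router reported the destination unreachable"
    UNKNOWN = "silence: host failure and link failure look identical"


def diagnose(heartbeat_ok: bool, ping_ok: bool, icmp_unreachable: bool) -> Diagnosis:
    """Combine probe results into a best guess about what failed."""
    if heartbeat_ok:
        # The application itself answered, so nothing has failed.
        return Diagnosis.ALIVE
    if ping_ok:
        # The host answers pings but the application heartbeat is silent.
        return Diagnosis.PROCESS_CRASHED
    if icmp_unreachable:
        # Some router on the path could not forward our packets.
        return Diagnosis.PATH_DOWN
    # No reply of any kind: the original ambiguity remains.
    return Diagnosis.UNKNOWN


if __name__ == "__main__":
    print(diagnose(heartbeat_ok=False, ping_ok=True, icmp_unreachable=False).name)
```

The last branch is the case from the original question: with no reply
at all, the probes alone cannot separate a dead host from a dead link,
and you are back to weighing likelihoods.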