In distributed systems, things will go wrong, so systems engineers should build mechanisms that prevent the failure of a single component from affecting the availability of the whole system. We should build reliable systems from unreliable components, but there are limits to how reliable a system can be.
One source of failure is the network. Networks are unreliable: packets can be dropped from full queues, or nodes may discard them because they are overloaded, and the sender usually cannot tell whether the request was lost, the reply was lost, or the remote node is just slow. A common way to cope with this is to use timeouts: if no response arrives within a maximum period of time, you stop waiting and treat the result as unknown.
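A minimal sketch of the timeout idea in Python, assuming the remote call is modeled as a coroutine. `fake_remote_call`, `call_with_timeout` and `Outcome` are hypothetical names used only for illustration. `asyncio.wait_for` stops waiting after `timeout_s` seconds, and the caller then reports the outcome as unknown rather than as a failure, because the request may still have been processed on the other side.

```python
import asyncio
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    # The request may or may not have been executed remotely; we only know
    # that no reply arrived in time.
    UNKNOWN = "unknown"


async def fake_remote_call(delay_s, value):
    """Hypothetical stand-in for a network request; delay_s models latency."""
    await asyncio.sleep(delay_s)
    return value


async def call_with_timeout(request, timeout_s):
    """Await `request`, giving up after timeout_s seconds."""
    try:
        result = await asyncio.wait_for(request, timeout=timeout_s)
        return Outcome.OK, result
    except asyncio.TimeoutError:
        # wait_for cancels the pending coroutine on timeout.
        return Outcome.UNKNOWN, None


async def main():
    print(await call_with_timeout(fake_remote_call(0.01, "pong"), timeout_s=0.5))
    print(await call_with_timeout(fake_remote_call(1.0, "pong"), timeout_s=0.05))


if __name__ == "__main__":
    asyncio.run(main())
```

Choosing the timeout is a trade-off: a short timeout detects failures quickly but may wrongly give up on a node that is merely slow, while a long timeout delays detection.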
Another potential problem is relying on system clocks. There are two types of clocks: monotonic clocks and time-of-day clocks.
Monotonic clocks are essentially counters that never move backward; their value is often the time elapsed since the machine booted. They may drift because of imprecise hardware, but their value is guaranteed never to decrease. Note that comparing monotonic clock values from different machines, or even from different CPUs, is meaningless, because the absolute value has no universal meaning; only the difference between two readings taken on the same machine is useful, for example to measure elapsed time or to enforce a timeout.
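A short Python sketch of the intended use: measuring elapsed time and tracking deadlines with `time.monotonic()`, whose absolute value has no defined reference point. The helpers `timed`, `deadline_after` and `expired` are hypothetical names for illustration. Using `time.time()` here would be risky, because it reads the time-of-day clock, which can jump when it is adjusted.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any) -> Tuple[Any, float]:
    """Run fn(*args) and return (result, elapsed seconds) via the monotonic clock."""
    start = time.monotonic()
    result = fn(*args)
    # Only the difference between two readings on the same machine is meaningful.
    elapsed = time.monotonic() - start
    return result, elapsed


def deadline_after(seconds: float) -> float:
    """Return a monotonic-clock deadline `seconds` from now (local machine only)."""
    return time.monotonic() + seconds


def expired(deadline: float) -> bool:
    """True once the monotonic clock has reached the deadline."""
    return time.monotonic() >= deadline


if __name__ == "__main__":
    result, elapsed = timed(sum, range(1_000_000))
    print(result, f"{elapsed:.6f}s")
```

A deadline produced this way must never be sent to another machine, since its value is meaningless there.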
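Time-of-day clocks, on the other hand, return the current date and time, so their values are meant to be comparable across machines; they are commonly synchronized through NTP. The accuracy of this synchronization is limited by the network latency between the machine and the NTP servers. Specialized hardware, such as GPS receivers or atomic clocks, can keep clocks more tightly synchronized. In addition, some services, such as Google's TrueTime API used by Spanner, return the current time as an interval [earliest, latest] rather than a single timestamp; when two intervals do not overlap, the order of the corresponding events is certain, which helps determine causality and the order of data.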
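The sketch below illustrates both points in Python. `ntp_offset_and_delay` uses the standard NTP on-wire formulas (RFC 5905) to estimate a clock's offset from one request/response exchange; the true offset lies within `offset ± delay / 2`, which is why network latency bounds the precision. `TimeInterval`, `interval_after_sync` and `definitely_before` are hypothetical helpers loosely modeled on the interval idea, not the actual TrueTime API, and `interval_after_sync` assumes no drift since the last synchronization.

```python
from dataclasses import dataclass
from typing import Tuple


def ntp_offset_and_delay(t0: float, t1: float, t2: float, t3: float) -> Tuple[float, float]:
    """Estimate clock offset and round-trip delay from one exchange.

    t0: client sends request (client clock)
    t1: server receives request (server clock)
    t2: server sends response (server clock)
    t3: client receives response (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    # The true offset lies within offset +/- delay / 2: slower networks widen it.
    return offset, delay


@dataclass(frozen=True)
class TimeInterval:
    """A timestamp reported as a confidence interval [earliest, latest]."""
    earliest: float
    latest: float


def interval_after_sync(local_now: float, offset: float, delay: float) -> TimeInterval:
    # Hypothetical helper: ignores clock drift accumulated since the last sync.
    corrected = local_now + offset
    return TimeInterval(corrected - delay / 2, corrected + delay / 2)


def definitely_before(a: TimeInterval, b: TimeInterval) -> bool:
    """True only if every instant in `a` precedes every instant in `b`."""
    # Overlapping intervals mean the order of the two events is uncertain.
    return a.latest < b.earliest
```

For example, with timestamps in milliseconds, `ntp_offset_and_delay(100_000, 105_100, 105_200, 100_400)` gives an offset of 4950.0 ms and a round-trip delay of 300 ms, so the clock is known only to within ±150 ms.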
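Imagine a resource that must be accessed exclusively, such as a file in shared storage. You can build a lock service so that only the client holding the lock may access the resource; the lock is typically a lease that expires after some time, so that a crashed client does not hold it forever. The problem with relying on this approach alone is that the lease might expire without the client realizing it, for example because the client was stalled by a long garbage-collection pause. The client may then keep writing to the resource while another client holds the lock, causing inconsistencies.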
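The failure can be reproduced with a small simulation. This Python sketch assumes a hypothetical lease-based `LeaseLockService` and a `NaiveStorage` that accepts every write; time is passed in explicitly as a simulated clock so the interleaving is deterministic. Even if client-1 checked its lease right before the pause, the pause could still fall between that check and the write.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LeaseLockService:
    """Hypothetical lock service granting time-limited leases (simulated clock)."""
    lease_duration: float
    holder: Optional[str] = None
    expires_at: float = 0.0

    def acquire(self, client: str, now: float) -> bool:
        # Grant the lock if nobody holds it or the current lease has expired.
        if self.holder is None or now >= self.expires_at:
            self.holder = client
            self.expires_at = now + self.lease_duration
            return True
        return False


@dataclass
class NaiveStorage:
    """Hypothetical storage that accepts every write without any lock check."""
    value: Optional[str] = None

    def write(self, data: str) -> None:
        self.value = data


def unsafe_scenario() -> Tuple[Optional[str], Optional[str]]:
    locks = LeaseLockService(lease_duration=10.0)
    storage = NaiveStorage()
    locks.acquire("client-1", now=0.0)   # lease valid until t=10
    # client-1 stalls (e.g. a long GC pause) from t=0 to t=15.
    locks.acquire("client-2", now=11.0)  # lease expired, so client-2 gets the lock
    storage.write("B")                   # client-2's legitimate write
    # client-1 resumes at t=15, still believing it holds the lock, and writes.
    storage.write("A")
    return storage.value, locks.holder
```

`unsafe_scenario()` ends with the stored value `"A"` even though `"client-2"` is the current lock holder: the stale write silently overwrote the legitimate one.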
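One way to solve this problem is to use fencing tokens: every time the lock service grants the lock, it also returns a number that increases monotonically with each grant. The client includes its token with every request to the resource, and the resource rejects any request carrying a token older than the highest one it has already seen.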
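A minimal Python sketch of fencing tokens, replaying the paused-client scenario with a simulated clock. `FencingLockService`, `FencedStorage` and `StaleTokenError` are hypothetical names. The key design choice is that the resource itself enforces the check: it remembers the highest token it has seen and rejects anything lower, so a client that does not know its lease expired cannot do damage.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FencingLockService:
    """Hypothetical lease-based lock service that hands out fencing tokens."""
    lease_duration: float
    holder: Optional[str] = None
    expires_at: float = 0.0
    last_token: int = 0

    def acquire(self, client: str, now: float) -> Optional[int]:
        if self.holder is None or now >= self.expires_at:
            self.holder = client
            self.expires_at = now + self.lease_duration
            self.last_token += 1  # strictly increases with every grant
            return self.last_token
        return None


class StaleTokenError(Exception):
    pass


@dataclass
class FencedStorage:
    """Hypothetical storage that rejects writes carrying an outdated token."""
    value: Optional[str] = None
    max_token_seen: int = 0

    def write(self, token: int, data: str) -> None:
        if token < self.max_token_seen:
            raise StaleTokenError(f"token {token} < {self.max_token_seen}")
        self.max_token_seen = token
        self.value = data


def fenced_scenario() -> Tuple[Optional[str], bool]:
    locks = FencingLockService(lease_duration=10.0)
    storage = FencedStorage()
    token1 = locks.acquire("client-1", now=0.0)   # token 1, lease until t=10
    # client-1 stalls from t=0 to t=15; its lease expires meanwhile.
    token2 = locks.acquire("client-2", now=11.0)  # token 2
    storage.write(token2, "B")
    try:
        storage.write(token1, "A")  # client-1 resumes with its stale token
        rejected = False
    except StaleTokenError:
        rejected = True
    return storage.value, rejected
```

Here `fenced_scenario()` returns `("B", True)`: client-1's late write is rejected and client-2's value survives.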