* Give some examples of the end-to-end argument in the RPC paper?
    - RPC has its own transport protocol, because it has to deal with reliability anyway and wants better performance
    - duplicate message suppression: duplicates are caused by client retries. A retried message may look different from the original to the data transmission system, so only the endpoints can recognize it as a duplicate.

**END-TO-END ARGUMENTS in SYSTEM DESIGN**
=========================================

# 0. Take away

Question: where to place functions among the modules of a distributed system?

Possible places:
- communication subsystem
- client

End-to-end argument:
- "The function in question can *completely* and *correctly* be implemented only with the knowledge and help of the application standing at the *endpoints* of the communication system"
- sometimes, an *incomplete version* of the function provided by the communication system may be useful as a *performance enhancement*

# 1. An example: Careful file transfer

- Transfer a file from host A to host B
- Threats:
    + disk error
    + transient memory error
    + message loss
    + message corruption
    + crashes
    + ...
- Question: how to make it reliable?
    + Alternative 1: enforce checking and retry at the lower levels of the communication system
        > duplicate copies
        > retry
        > error detection
        > crash recovery

      Goal: reduce the probability of each of the threats above

      But:
        > does not solve the problem completely
        > if all threats have low probability, may be uneconomical
    + Alternative 2: end-to-end check and retry
        > the file is stored with a checksum
        > when B receives the file, it calculates the checksum and sends it back to A
        > A compares it with the stored checksum: if they match, finish; otherwise retry

      ==> if failures are frequent, a lot of retries from the endpoints
- Take:
    + Alternative 1 does reduce the frequency of retries (hence may improve performance), but has no effect on the *inevitability* or *correctness* of the outcome
    + Alternative 2 is a must in order to achieve careful file transfer (whether or not Alternative 1 is applied)

# 2. Performance Aspect

- effort at lower levels
    + may have a significant effect on application performance
    + need not provide "perfect" reliability
    + is not a requirement for correctness, but an engineering trade-off
        > if the communication system is too unreliable, app performance suffers (because of too many retries)
        > if the chance of threats is low, adding extra mechanisms at lower levels may incur overhead (space, redundancy, extra checking)
- Then how, and when, to place a function at a lower level?
    + needs care
    + may or may not be efficient. Why?
        > the lower level is common to many apps, so apps that do not need the function still pay for it
        > the lower level may not have as much information as the higher level, so it cannot do the job as efficiently

# 3. Identifying the ends

- depends on the *application requirements*
    + real-time conversation vs. speech message system
        > the former: may drop some packets rather than retransmit them (because it is real time)
        > the latter: may care about packet ordering and duplicate suppression

# 4. More examples:

- VM ESX: semantic gaps
- Disco
- RPC: response is an *implicit* ack for a call
- Scheduler Activations
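The end-to-end check and retry from Alternative 2 in section 1 can be sketched roughly as below. This is only an illustration, not the paper's protocol: `unreliable_send`, `careful_transfer`, `corrupt_prob`, and `max_attempts` are made-up names, the "network" is a function that sometimes flips a bit, and the checksum that B sends back is assumed to arrive intact.

```python
import hashlib
import random


def checksum(data: bytes) -> str:
    """End-to-end checksum kept alongside the file at host A."""
    return hashlib.sha256(data).hexdigest()


def unreliable_send(data: bytes, rng: random.Random, corrupt_prob: float) -> bytes:
    """Hypothetical channel: flips one bit with probability corrupt_prob."""
    if data and rng.random() < corrupt_prob:
        corrupted = bytearray(data)
        corrupted[rng.randrange(len(corrupted))] ^= 0x01
        return bytes(corrupted)
    return data


def careful_transfer(data: bytes, rng: random.Random,
                     corrupt_prob: float = 0.3, max_attempts: int = 50):
    """Retry the whole transfer until B's checksum matches A's."""
    expected = checksum(data)                 # stored with the file at A
    for attempt in range(1, max_attempts + 1):
        received = unreliable_send(data, rng, corrupt_prob)  # A -> B
        reported = checksum(received)         # B recomputes, sends back to A
        if reported == expected:              # A compares the two checksums
            return received, attempt
    raise RuntimeError("transfer failed after max_attempts")


if __name__ == "__main__":
    copy, attempts = careful_transfer(b"some file contents", random.Random(0))
    print(len(copy), "bytes delivered after", attempts, "attempt(s)")
```

Lowering `corrupt_prob` (the role of Alternative 1) only changes how many attempts are needed; the comparison at A is what decides whether the copy is accepted.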
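A back-of-the-envelope calculation may help with the trade-off in section 2. It rests on assumptions that are not in the notes: the file is sent as $n$ packets, each packet is lost or corrupted independently with probability $q$, and any failure forces the endpoints to retry the whole file.

$$
P_{\text{ok}} = (1-q)^n, \qquad E[\text{attempts}] = \frac{1}{P_{\text{ok}}} = \frac{1}{(1-q)^n}
$$

With $n = 1000$ and $q = 0.001$, $P_{\text{ok}} \approx 0.37$, so about 2.7 attempts are expected. With $q = 0.01$, $P_{\text{ok}} \approx 4.3 \times 10^{-5}$, which is roughly 23,000 attempts. Under these assumptions, a modest improvement in lower-level reliability changes performance dramatically, while correctness still comes from the end-to-end check.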
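The duplicate suppression mentioned for RPC can be sketched at the server endpoint as follows. This is a minimal sketch under stated assumptions, not the RPC paper's actual mechanism: `DedupServer`, `client_id`, and `call_id` are hypothetical names, the client is assumed to reuse the same `call_id` when it retries, and the reply cache is never trimmed.

```python
from typing import Any, Callable, Dict, Tuple


class DedupServer:
    """Executes each (client_id, call_id) at most once and replays the reply."""

    def __init__(self, handler: Callable[[Any], Any]) -> None:
        self._handler = handler
        # Cached replies keyed by the application-level call identity.
        self._replies: Dict[Tuple[str, int], Any] = {}
        self.executions = 0  # how many times the handler really ran

    def handle(self, client_id: str, call_id: int, arg: Any) -> Any:
        key = (client_id, call_id)
        if key in self._replies:
            # A retry of a call already executed: resend the old reply.
            return self._replies[key]
        self.executions += 1
        result = self._handler(arg)
        self._replies[key] = result
        return result


if __name__ == "__main__":
    server = DedupServer(lambda x: x + 1)
    first = server.handle("client-1", 1, 41)
    retry = server.handle("client-1", 1, 41)  # same call, retried by client
    print(first, retry, server.executions)
```

The key is chosen by the application, so two packets that differ on the wire are still recognized as the same call.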