
Locality-Aware Request Distribution

Locality-Aware Request Distribution, Vivek Pai, Gaurav Banga, et al., ASPLOS-VIII

Reviews due Tuesday, 9/9.

Comments

* main contributions (as they claimed):
- an efficient locality-aware request distribution strategy
- a transparent TCP handoff protocol (for connecting back-ends directly to clients)
- an evaluation of the system using both a simulation and a prototype.


* summary:
- they have two implementations:
- simple implementation: each target has one server assigned to it. If the load of that server is above a threshold (T_high) and there is an idle server (with load below a threshold T_low), they reassign the target to the idle server. Also, if the load of the server assigned to a target rises above 2*T_high, they reassign the target to a new server even if no server's load is below T_low.
they also have a maximum load threshold, and if a new request would push the system above it, the front-end queues the request.

- LARD with replication: in this scenario they assign a set of servers to a target (instead of just one server). If the load on those servers is above a threshold, they add a lightly loaded server to the set; if the set hasn't changed in the last K seconds, they remove the most loaded server from it. (A sketch of both variants appears at the end of this summary.)

- they performed a simulation using real log files from web servers (at IBM and Rice University). They looked at the effect of cluster size, CPU speed, and the number of disks on the performance of LARD (and LARD/R).

- they compared the performance of their scheme to two state-of-the-art schemes (locality-based and weighted round-robin).

- they showed that in many scenarios they outperform these state-of-the-art schemes.

- they showed that increasing the number of nodes in the cluster improves their performance, and adding more CPU power also helps. But adding more disks doesn't (because of caching they don't have an I/O bottleneck, so extra disks don't help).
On the other hand, weighted round-robin doesn't benefit as much from more nodes or CPU power, but it does benefit from more disks.

- they also tested a scenario with a large number of requests for a few targets. This reduced LARD's advantage, but it was still at least as good as WRR.
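A minimal sketch of the dispatch policy described above, following the paper's pseudocode as summarized here; the threshold values and helper names are illustrative, and the basic (non-replicated) LARD is the special case where each target's server set holds one node:

```python
# Sketch of LARD/R dispatch; thresholds are example values, not the paper's exact settings.
import time

T_LOW, T_HIGH, K_SECONDS = 25, 65, 20

class Node:
    def __init__(self, name):
        self.name = name
        self.load = 0          # load = number of active connections (the paper's load proxy)

class LardRFrontEnd:
    def __init__(self, nodes):
        self.nodes = nodes
        self.server_set = {}   # target -> list of back-end nodes assigned to it
        self.last_change = {}  # target -> time its server set last changed

    def dispatch(self, target):
        now = time.time()
        nodes = self.server_set.setdefault(target, [])
        if not nodes:
            n = min(self.nodes, key=lambda x: x.load)     # first request: least loaded node
            nodes.append(n)
            self.last_change[target] = now
        else:
            n = min(nodes, key=lambda x: x.load)
            idle_exists = any(x.load < T_LOW for x in self.nodes)
            # Grow the set only on significant imbalance.
            if (n.load > T_HIGH and idle_exists) or n.load >= 2 * T_HIGH:
                p = min(self.nodes, key=lambda x: x.load)
                if p not in nodes:
                    nodes.append(p)
                    self.last_change[target] = now
                n = p
            # Shrink the set if it has been stable for K seconds.
            elif len(nodes) > 1 and now - self.last_change[target] > K_SECONDS:
                busiest = max(nodes, key=lambda x: x.load)
                if busiest is not n:
                    nodes.remove(busiest)
                    self.last_change[target] = now
        n.load += 1            # one more active connection; decremented when it closes
        return n
```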


* other points:
- they assumed that all the back-end servers have the same configuration and used the same thresholds for all of them. This could easily be addressed by using server-specific thresholds.
- the paper was published in 1998. Given the increase in CPU power, faster disks, and cheaper memory, I don't know whether fitting the working set in memory is still a real concern.


p.s. I saw a couple of people mentioning dynamic content. I don't know how much it was in use back in 1998. Also, it is probably possible to use the same strategy for dynamic content: for example, if we consider example.com/?req1 and example.com/?req2 as two separate targets (that need two separate files loaded into memory), their strategy can still work (a toy illustration follows).
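A toy illustration of that idea (the function name and URLs here are hypothetical, not from the paper):

```python
# Treat path + query string as the cache "target", so /?req1 and /?req2
# map to different back-end server sets.
from urllib.parse import urlsplit

def target_key(request_uri):
    parts = urlsplit(request_uri)
    return (parts.path, parts.query)   # "/?req1" and "/?req2" become distinct targets

assert target_key("/?req1") != target_key("/?req2")
```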

In this paper, the authors propose a content-based request distribution system, and evaluate the benefits of such a design to throughput, load balancing and scalability. The seminal contribution here is the idea of cache aggregation: locality-aware request distribution causes content to be cached in main memory only at one or a few nodes, allowing the working set size of the workload to scale to the total main memory cache size across all nodes in the back-end. The paper also discusses a TCP handoff protocol that allows seamless handoff of the connection from front-end to back-end, transparently to the client.
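A back-of-the-envelope illustration of the cache aggregation claim, with made-up numbers:

```python
# Under WRR every node tends to cache the same popular files, so the effective
# cache is roughly one node's memory; under LARD targets are partitioned, so the
# effective cache approaches the sum of all nodes' memories. Numbers are illustrative.
nodes, cache_per_node_mb = 8, 32
effective_cache_wrr  = cache_per_node_mb            # ~32 MB of distinct content
effective_cache_lard = nodes * cache_per_node_mb    # ~256 MB of distinct content
```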

The main contributions of this work are:
* The notion of content-aware and locality-aware request distribution as a simple way to achieve cache aggregation, avoiding the complexity of modeling and maintaining back-end cache state information at the front-end.
* Experimental and simulation results that show that cache aggregation improves performance (throughput, as well as delay) of the system, without sacrificing load-balancing and scalability.
* The idea of using the number of active connections as a rough proxy for load at the back-end nodes, rather than explicit load information like CPU utilization, which is harder to gather and update frequently.
* The need for front-end scheduling strategies to have both locality (cache hit ratio) as well as load-balancing as dual objectives to maximize throughput, as opposed to just the latter.
* Recognition of the fact that back-end nodes should ideally be CPU-bound rather than disk-bound, since CPU speeds improve faster than disk speeds, leading to improved scalability, both in terms of elasticity (adding more nodes) and scaling up in the long run (through better processors and/or disks).
* Efficient TCP handoff protocol to transparently (to the client) transfer a connection from the front-end to the back-end, using the front-end only as a forwarding point for light-weight incoming messages (mostly ACKs), while allowing the bulky response messages to bypass the front-end.
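The asymmetric data path this last point describes can be illustrated with a toy, user-level model; the real handoff lives inside the kernel's TCP stack, so this only shows which hops traverse the front-end, and all class and method names are made up for the illustration.

```python
class Client:
    def __init__(self):
        self.responses = []

    def receive(self, data):
        self.responses.append(data)

class BackEnd:
    def __init__(self, name):
        self.name = name

    def handle(self, request, client):
        # Response bypasses the front-end and goes straight to the client.
        client.receive(f"{self.name}: contents of {request}")

class FrontEnd:
    def __init__(self, pick_backend):
        self.pick_backend = pick_backend   # e.g. a LARD-style policy
        self.handed_off = {}               # connection -> back-end after handoff

    def accept(self, client, request):
        backend = self.pick_backend(request)
        self.handed_off[client] = backend  # TCP state handed to the back-end
        backend.handle(request, client)    # back-end replies directly to the client

    def forward(self, client, packet):
        # Later client packets (mostly ACKs) are cheaply forwarded to the back-end.
        return self.handed_off[client], packet

# Example: one client request dispatched through the front-end.
backend = BackEnd("node1")
fe = FrontEnd(pick_backend=lambda request: backend)
c = Client()
fe.accept(c, "/index.html")
assert c.responses == ["node1: contents of /index.html"]
```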

The major shortcoming of this design is that it makes the front-end both a single point of failure as well as a potential bottleneck. The front-end maintains a lot of per-target (content object), per-server (back-end node) and per-connection state, which grows as the system scales. Further, it is difficult to make the front-end scale out by adding more nodes, because this state needs to be shared between all the front-end nodes (for instance, all messages for a particular client connection must go to the same front-end node). While this may not have been an issue in 1998 (the authors do show scalability to 10 back-end nodes from a single front-end node), aspects of this design would certainly not be useful today.

However, it is clear that cache aggregation and locality-aware request distribution confers a lot of benefits towards scalability, resource utilization and performance, and is a highly relevant design concept even today (in systems like HDFS, for instance).

SUMMARY: The paper proposes a new strategy called LARD for routing incoming requests to a pool of back-end servers. It does this by inspecting the content request and using that information, combined with the size of the current work queue at each back-end node, to route the request. This has major performance advantages because it attempts to keep as much of the content as possible cached in RAM on each back-end node.

PROBLEM: A single server can handle only so many requests, so a common approach to scaling beyond that is to have a pool of servers (back-ends) which can all handle requests. Requests are then sent to individual servers, typically by weighted round robin (which can be done without looking at the content of the request itself). However, in the WRR approach, each back-end server must handle all content, and as such is unable to cache efficiently, leading to disk speed being the limiting factor in the ability to serve requests. Their approach attempts to break this barrier by increasing locality, keeping content cached in memory on each back-end server. The challenge is that the front-end needs to route requests in such a way that preserves this locality while still doing a good job of load balancing.

CONTRIBUTIONS: One contribution is LARD/R, a specific policy for routing requests to back-end nodes that allows sets of nodes to dynamically grow and shrink, where each set is responsible for serving one particular piece of content ("a target"). The policy takes two thresholds and attempts to keep the load of any given server between them while simultaneously satisfying the competing goal of keeping content local to nodes in a given set. The other contribution is a TCP handoff protocol, which is necessary when inspecting the incoming request and transparently handing it off to a server without the client having to be aware, and which allows the server to respond directly to the client. However, client-to-server communications still need to be forwarded manually, and they discuss how to do this quickly. They implement the TCP handoff protocol, along with their dispatch policy (LARD/R), in the kernel itself. In their design, the actual dispatch policy is independent of the TCP handoff protocol, which means the handoff protocol could be used with any dispatch policy that future developers create, making it an important and versatile contribution.

DISCUSSION: This was likely a great step in scaling web services. No extra communication is necessary between front-end and back-end, the client and server do not need any special API other than the standard TCP stack, and the front-end doesn't need to know anything about the specs of the back-end machines. As such, adding nodes to the back-end to ease load is trivial, and removing nodes (due to failure) is as well, since the target will just get remapped to a different node.

Since locality and load balance are competing goals, a decision must be made for when to reassign a target to a different back-end. Their solution is simple but does not take into account anything other than the number of outstanding connections. In more modern systems, the decision to reassign a target to a different back-end could take into account an estimate of various "cost metrics", e.g. the size of the target (since cache hits are desirable and large files displace more cache space).
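A hypothetical sketch of such a weighting (not something the paper does) might look like:

```python
# Weight each active connection by an estimated cost, e.g. the target's size,
# instead of counting all connections equally. Function and parameter names are illustrative.
def weighted_load(active_connections, size_of):
    # active_connections: iterable of target names; size_of: target -> size in bytes
    return sum(size_of.get(t, 1) for t in active_connections)
```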

Also, they don't define what their target scale is. Scaling the front-end is suggested by getting a faster CPU or SMP. This will only scale so far. To go further, one could use DNS round robin to a pool of front-ends, with each front-end controlling a separate set of back-end nodes. This will tend to evenly distribute load across the entire back-end without any communication between front-end nodes.

Summary:
-The paper by Pai et al. aims to present a locality-aware, load-balancing request distribution algorithm.
-The paper starts with background information about how the front-end node distributes requests to back-end nodes.
-The weighted round robin (WRR) algorithm distributes tasks to the least loaded node, which causes a lot of cache misses. The WRR algorithm has the worst performance, and only performs okay when there is a large number of small requests to a small set of targets.
-The locality-based (LB) algorithm decreases cache misses because similar requests go to the same back-end node. The LB algorithm leaves a higher percentage of nodes underutilized because each target is bound to a particular back-end node. The LB algorithm has intermediate performance.
-The locality-aware request distribution (LARD) algorithm handles clusters of requests better (than the locality-based algorithm) because the number of active connections helps evenly distribute the load.
-The LARD with replication (LARD/R) algorithm assigns several back-end nodes to hot targets for another degree of load balancing. LARD/R has the best performance, with the highest throughput, a low cache miss rate, and little time with nodes underutilized.
-The paper explores other properties of the workloads. Adding more processors and another disk at each back-end node will increase performance.
-The paper then explains the handshakes required in TCP connections, the experimental setup, and concludes with dynamic content as a future direction of research.
Positive Comments:
-By spreading content across the back-end nodes instead of letting every node serve every request as round robin does, the locality-based (LB) approach is more efficient than WRR due to an increased cache hit rate. Furthermore, scaling the system this way is more efficient because each node only has to work on its own portion of the working set.
-The paper handles the drawbacks of the locality-based algorithm (when many requests go to a small set of targets) by introducing LARD and using active connection counts. Using active connections and moving requests is another layer of load balancing, which increases throughput by using idle resources. This contribution further helps the scaling and load distribution of distributed systems.
Negative Comments:
-Although the paper has 3 sets of simulation results, these are not enough to cover most server configurations.
-The LB algorithm did not provide much of a technical solution, because adding active-connection counts (LARD) is a fairly natural step once an LB algorithm is introduced. Furthermore, the paper does not discuss the results for LB in depth, as it is overshadowed by the LARD algorithm.
-The paper only tests sensitivity to CPU and disk speed for the LARD algorithm. Even though more than 2 disks will not help LARD, they may help other algorithms, possibly allowing them to outperform LARD.

Summary
The authors of the paper propose a new content-based request distribution strategy to build a cluster which provides:
- query locality
- high node utilization
In addition to that, the authors propose a TCP handoff protocol which enables the back-end node to send the response directly to the client. The authors show that their strategy is better than the then widely used request distribution scheme, WRR, in terms of throughput, while providing more or less similar node utilization.

Problem
Clusters are distributed servers where a request passed to a front-end node is distributed to back-end nodes for processing. Distributing the requests without knowledge of their content is not efficient, as it does not take advantage of caching content in the servers. Considering the difference in speed between CPU and disk, exploiting caching would mean a great improvement in performance. In spite of this alluring prospect, many algorithms focus on balancing load but not query locality. The authors of the paper therefore provide a simple request distribution strategy which provides high node utilization and query locality.

Main contributions
They propose LARD, a content-based request distribution scheme. LARD is an early attempt at content-based routing in web server clusters. It tries to provide query locality by dynamically subdividing the server's working set. LARD assigns a dynamic set of servers to each file. When a request is received, the front-end checks whether any back-end server has already been designated for this file. If not, the least loaded server in the cluster is assigned to service requests for this file. Subsequent requests are directed to a target's assigned server, as long as that node is not overloaded. LARD reassigns targets only when there is a significant imbalance. LARD assumes that a single target does not cause imbalance by itself; in other words, it assumes that all targets have more or less the same request rate.

LARD/R is a more resilient version which removes the above assumption. It effectively inflates and deflates the server set for a file based on its request load.

The authors also provide a handoff protocol for TCP connections. Instead of having all packets go through the front-end node, the protocol requires only the lightweight incoming packets (mostly ACKs) to flow through it, while responses go directly from the back-end to the client.

Failures
1) LARD assumes that all servers are of similar capacity.
2) LARD assumes that all content has the same resource-consumption impact on the servers. They use only the number of connections as the criterion for node imbalance. This can complicate load sharing.
3) LARD works only for static content. The strategy is designed with the assumption that targets do not change; dynamic web content is not taken into consideration. Supporting it would complicate the content-based request distribution scheme with regard to cache synchronization.
4) LARD assumes the number of targets in question does not change. There is no provision for dynamically adding or removing files. Even if we consider a website built only on static web pages, it cannot be assumed that web page content and counts will not change over the course of operation.
5) There is only one front-end node in LARD. Considering that the front-end plays an active role in the TCP handoff protocol, it might become a bottleneck. In addition, the case where the front-end is not operational is not considered.

Relevance
Although LARD may not be effective in handling dynamic web content (which comprises most web pages today), it is one of the earlier strategies which showed how important and effective query locality is, and it paved the way for the more sophisticated load balancing algorithms used today.

Summary:
The paper proposes Locality-Aware Request Distribution (LARD) as an efficient way to handle and distribute requests in a cluster-based network. It specifically focuses on how content-based request distribution can improve performance by increasing locality in the main memory caches of the back-end nodes. The authors back their proposition with a set of simulations that show that their implementations (LARD and LARD/R) perform much better than the then state-of-the-art techniques such as weighted round robin.
Contributions:
• The whole idea of distributing requests based on content locality is interesting and intuitively promises better performance by benefiting from the low latency that locality provides.
• LARD not only provides high cache hit rates but also achieves load balancing simultaneously. This is a very good feature, since these two goals are normally in tension and would otherwise have required explicit effort to reconcile.
• A unique point involved in this content-based request distribution model is that certain requests can be directed to nodes that are specifically designed to service such tasks efficiently. For example if a graphic intensive request is received, then it would make a lot of sense to assign it preferentially to a node with GPU support.
• The authors come up with a protocol that allows handoff of an established connection from the front-end to a back-end node while making sure that this is transparent to the client and also fast. This is especially important considering the huge number of requests flowing into the front-end node.
• LARD with replication handles the case where the load for a target is larger than what a single node can handle; this doesn't require extra communication between front-end and back-end nodes and is also easy to recover from after a failure.
• It is interesting to note that the authors suggest that the performance improvement applies not just to static content but also to dynamically generated content, where load balancing is generally much more difficult since the load is not deterministic.
Applicability:
With the increasing scale of current networks, techniques such as LARD can significantly improve performance by being aware of the content accessed. Although it seems very challenging to use LARD in its vanilla flavor in today's networks due to their extraordinary scale, there is definitely a lot to take away from it, and it can serve as an add-on optimization to other implementations. At the time of the paper, this was undoubtedly a remarkable contribution, and I believe a lot of later implementations have been motivated by it.

Summary

The authors presented a request distribution technique that routes similar requests to the same server. For very skewed request distributions, where one server would otherwise always be overloaded, they propose another approach called LARD with replication (LARD/R).

Problem

The problem they were trying to solve is how, in a distributed system, to route requests among the servers to get good main memory and cache utilization while also achieving good load balancing. They envisioned that, with the increasing gap between disk speed and CPU speed, it is desirable for the workload to remain CPU-bound.

Contribution
* The number of active connections (a proxy for serving load) can be used as a metric for load. This is simple and can be computed dynamically, and their algorithm (especially LARD/R) uses it very neatly for load balancing.
* With LARD, using only general-purpose servers (without dedicated servers), they achieved very high throughput and utilization. This also makes the back-end servers easy to replace on failure.
* LARD/R can easily adapt to most load patterns in a near-optimal way.

Discussion

  • Scheduling load among servers is still a valuable problem in any distributed system. Google uses multiple servers across locations and DNS tricks to distribute traffic.
  • This paper proposed a very novel idea which achieved better resource utilization than Weighted Round Robin, the state-of-the-art technique of that time. Their solution is good for a small enterprise (around 10 computers), but scalability to even 100 back-end servers is an issue in this design (the front-end server will start to be a bottleneck).
  • TCP handoff is a really smart adaptation of TCP for their scheduling design -- it is fast and transparently transfers the connection to the back-end. However, what happens to an in-flight request if the server breaks down in the middle of service is not clear from the paper. Also, I am not aware of any place where it is being used now.
Summary
For cluster-based network servers, this work proposes a request distribution system that takes the content of the request into account in order to achieve high cache locality and efficient load balancing. The system modelled in this work consists of clients that send document requests to a front-end node in the server cluster, which in turn forwards each request to one of the back-end nodes that actually serves the document. The front-end node examines the content of the request and forwards it to the back-end node that has the highest likelihood of finding the content in its cache. Further, the front-end node ensures that the workload is evenly distributed among the back-end nodes, so that the load of those nodes does not rise above a higher threshold or fall below a lower one. Thus, unlike other request distribution mechanisms like Weighted Round Robin (WRR), which achieves only good load balancing, or the Locality Based (LB) mechanism, which achieves only good cache locality, the LARD system achieves both without sacrificing one for the other. Both the simulator modelling the server cluster and the prototype implementation of the system demonstrate the benefits offered by LARD when compared to the other state-of-the-art request distribution mechanisms that existed at the time of this publication.

    Contributions
The authors model the cluster through a detailed simulation that considers different workloads (the Rice University server traces and the IBM web server traces) as well as variations in CPU and disk speeds along with variations in request delays. This simulator can be used as a benchmark for request distribution mechanisms presented in the future.
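As a rough illustration of what such a trace-driven simulation involves (this is not the authors' simulator; all names are illustrative), one could replay a log of (URL, size) pairs through a dispatch policy and measure per-node LRU cache hit ratios:

```python
from collections import OrderedDict

class LruCache:
    def __init__(self, capacity_bytes):
        self.capacity, self.used, self.items = capacity_bytes, 0, OrderedDict()

    def access(self, target, size):
        if target in self.items:                   # cache hit
            self.items.move_to_end(target)
            return True
        while self.used + size > self.capacity and self.items:  # evict LRU entries
            _, old_size = self.items.popitem(last=False)
            self.used -= old_size
        self.items[target] = size                  # cache miss: bring the file in
        self.used += size
        return False

def replay(trace, dispatch, caches):
    """trace: iterable of (url, bytes); dispatch: url -> node id; caches: node id -> LruCache."""
    hits = total = 0
    for target, size in trace:
        node = dispatch(target)                    # e.g. WRR, LB hashing, or a LARD policy
        hits += caches[node].access(target, size)
        total += 1
    return hits / total if total else 0.0
```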

Since the front-end needs to examine the content of the request, it has to create a connection with the client. But this presents a problem when the document has to be served directly from the back-end, bypassing the front-end for efficiency. The authors solve this problem by devising a TCP handoff mechanism, which the front-end uses to transparently hand off the client connection to the back-end nodes.

    Problems
    The authors consider only a single front-end node in the cluster. They do not consider the scalability issues that might arise with the use of multiple front-end nodes to manage a very large cluster of back-end servers.

    The authors model only static documents and merely suggest that dynamic content will also benefit from this mechanism without offering any proofs.

    Relevance
The speed of CPUs increases at a rate much faster than disk speeds, and hence the importance of caching and locality increases. LARD is optimized to make use of cache locality in clusters, and hence the results and mechanisms remain relevant even in current-generation clusters. Further, adding additional nodes increases the performance of LARD, which is an excellent scalability characteristic and the aim of many current state-of-the-art clusters.

    Summary
    - The paper advocates the need for a content based request distribution strategy namely locality-aware request distribution (LARD) to improve load balancing and improve cache hit rates at back end nodes. A detailed analysis of the simulation was provided to highlight the advantages of the employed strategy over state of the art strategies of the era.

    Problems to solve
- One of the goals was achieving good load balancing without giving up locality: existing cluster schemes provided efficient load balancing, but since each back-end node could accept any request, frequent cache misses occurred when the working set grew in size.
- Another aim was improving locality. A good hashing function would be needed, one able to partition the namespace and the working set of requests nearly evenly among back-end nodes.
- The initial LARD implementation brought forth a problem: a single target could cause a back-end node to overload, motivating the need for a replicated implementation.
- In a content-based request distribution strategy, the front-end could become a bottleneck, since it now has to examine a request's content before handing it off to the back-end. This has to be done in a client-transparent manner.

    Contributions of the paper
- The paper presents a practical and efficient LARD strategy that achieves high cache hit rates and good load balancing. Each request is mapped to the back-end node assigned to its target and handed off to it, with lightly loaded nodes chosen when targets are (re)assigned. The load of a back-end node is defined as the number of active connections it is processing.
- An assumption in the LARD strategy above is that a single target will not overload a back-end node. So a replicated version was created where several nodes can be assigned to serve the same target, with an appropriate degree of replication per target.
- A trace-driven simulation that demonstrates the performance potential of locality-aware request distribution was provided. The simulation was driven by logs from web servers at Rice University and IBM. The analysis showed that LARD with replication is an improvement over the weighted round-robin (WRR) request distribution strategy in terms of throughput and cache hit rate. Even in situations which favor WRR, LARD showed equivalent performance.
- An efficient TCP handoff protocol was implemented. This protocol enables content-based request distribution while remaining transparent to the client.
- A performance evaluation of a prototype LARD server cluster incorporating the TCP handoff protocol and the LARD strategy was performed, which showed that LARD is better than WRR. The results also showed that the front-end can support up to 10 back-end nodes of similar CPU speed.

    Conclusion
- Overall the paper provides enough data to show that LARD performs better than, or equivalent to, the state-of-the-art WRR strategy. The paper deals with serving static content and argues that dynamically generated content can also benefit from the proposed strategy.

    Summary
This paper aims to achieve load balancing and high locality simultaneously in cluster-based network servers by inspecting each request before assigning it to a server. The authors propose the LARD and LARD/R algorithms to achieve the above goal while maintaining some stability. Also, a TCP handoff protocol is presented to hand off established TCP connections between front-end and back-end servers.

    Problem Description
Previous works consider only the load of back-end servers, regardless of the content requested. However, the authors noticed that cache hits increase system performance dramatically. Therefore, they try to balance two aspects, load balancing and high cache hit rate, which naturally conflict with each other. Another problem is how to hand off established connections from the front-end to the back-end, so that the front-end is not involved in handling outgoing data.

    Contribution
1. The biggest contribution comes from the authors noticing that increasing the servers' cache effectiveness is nontrivial for cluster-based network servers. They design an algorithm called LARD to achieve a high cache hit rate and load balance. In fact, this design also achieves some stability by preventing unnecessary target reassignment.
2. The second main contribution is the TCP connection handoff mechanism they introduce to deal with the problem of handing off established connections. This protocol is transparent to clients and to the server applications running on the back-end nodes.
3. Other small design tricks: for example, the front-end node limits the number of outstanding requests at the back-ends to reduce delay.

    Drawbacks
1. This problem is very similar to the traditional network utility problem (usually balancing network utility and fairness, or more aspects). The authors could probably formulate the problem in a similar way and find a parameter that tunes the system toward high load balancing or a high cache hit rate.
2. At that time, the requested files were probably small enough to be stored on a single back-end server. For today's networks, however, some files might need to be split into small chunks and cached on different back-ends. How to coordinate such requests is another problem in the big-data era.
3. If the volume of incoming requests is quite small, we may not need to achieve high load balancing; we could put several servers into sleep mode to save energy, since keeping all servers handling requests consumes more power.

    Applicability
Content-awareness is a very important aspect of the modern Internet. In fact, the emergence of Content Delivery Networks is one such important application in the real world. Service providers like Akamai, Amazon, AT&T and so on improve the service provided to users by using this kind of technique.

Summary: In this paper the authors discuss a new form of request distribution for cluster-based network servers. Specifically, the authors introduce a policy called locality-aware request distribution (LARD). LARD uses a front-end server to distribute web requests to back-end servers in a manner that exhibits high locality and good load-balancing characteristics. In principle these characteristics allow the effective size of main memory (the cumulative cache) to scale linearly with the number of servers, as opposed to weighted round robin (WRR).

Problem: The problem addressed is that WRR only takes back-end server load into consideration when distributing targets (requests) to servers. This policy fails to acknowledge the potential benefits of a scheme that optimizes for cache hits, and therefore the overall performance of WRR suffers greatly. Under this scheme any request can go to any server, which means the cache at each node must be large enough to cover the working set. In other words, the WRR policy does not fully exploit the distributed nature of the back-end servers with regard to file locality.

    Contributions: LARD exploits a simple mapping of targets to servers in an attempt to create high locality on each server. This is because subsequent targets, of a given type, map to the same server which therefore means a given server is only responsible (ideally) for a subset of the working set of requests. In effect this creates a pseudo-distributed main memory amongst the back-end servers. In addition the authors add small checks to the LARD algorithm to induce good load balancing. Their other important contribution is the TCP handoff mechanism which allows the front-end server to delegate a TCP connection with a client to a certain back-end server. The mechanism is implemented as a small module which keeps it transparent to clients and back-end server applications (i.e. both require no modification). The authors implement and test their system (LARD/R+TCP Handoff) with a rigorous simulation procedure and compare it against other current algorithms.

    Application to Real Systems: The principle that the locality of back-end servers should be exploited is of great importance and remains so today with regards to real world systems. There are a number of companies that attempt to provide solutions to these problems (e.g. Akamai). However, LARD itself is not applicable to the web server ecosystem of today and the serving of dynamic content; mainly media. Unfortunately, many simplifying assumptions were made in their implementation and assessments. For instance, their system was built around HTTP 1.0 which is largely deprecated in favor of HTTP 1.1 which they do not address in their paper. They also do not consider the dynamic content, or content created on the fly for a given user as this will not lend itself to a locality based algorithm.

Summary: Paper about building a system that takes into account the locality of the target data when distributing requests from a front-end server to back-end servers. The idea is that if one is able to take locality into consideration, the size of the working set that can be served via hits in main memory increases efficiently with the addition of more back-end servers, whereas with weighted round robin the increase in servable working set size is very modest. The authors evaluate their system via simulation on trace data and via a prototype.

Problem: It is better for requests to be served via hits in main memory. Furthermore, this becomes even more important when a distributed web-server architecture comes into play, because the scalability of that system will otherwise be bottlenecked by disk accesses. State-of-the-art techniques at that time used a weighted round robin strategy for distributing requests from a front-end server to back-end servers. With that strategy, main memory is not used effectively when the goal is to serve as much of the working set as possible from main memory: when you distribute requests in a round-robin manner, the set of target data present in the memory of the back-end servers is roughly the same across machines, and the working set that can be served is much smaller than the aggregate memory across all back-end machines.

Contributions: This paper presents a locality-aware request distribution algorithm that addresses this problem. The algorithm they implement finds an effective middle ground between distributing requests based on server load and based on the content of requests. They essentially partition the namespace of content so that the data in each back-end server's main memory cache does not overlap. Load balancing is achieved by having preset high and low utilization thresholds; if a back-end server gets too loaded, the request and subsequent requests for the same target are re-assigned to a less loaded node. Their final algorithm, LARD/R, allows multiple back-end servers to serve targets of the same sort. One big aspect of the system is that the front-end does not have to keep any complex state regarding the contents of the back-end servers' main memories; all it has to do is keep track of the back-ends it has assigned specific targets to. Their final contribution is their protocol for handing off TCP connections. Again, they choose a good level of abstraction that requires no changes to clients or server applications.

    Applicability: The idea of having content-aware distribution in order to increase the effectiveness of the working set of some system is an idea that will always be part of the conversation when it comes to the design of distributed systems. The specifics of LARD and their particular system, on the other hand, is probably not that applicable to today other than the good ideas that they put forth, which really is what is most important. Given the environment of today, the front-end server of their system would easily become a bottleneck with just servicing the requests and assigning them to back-end servers. At the time, however, this system is an obvious improvement to the state of the art and the requirements for implementing such a system did not seem to be that onerous.

    Summary:

In this paper the authors introduce locality-aware request distribution (LARD) – a specific strategy for content-based request distribution in which the front-end distributes incoming requests in a manner that achieves high locality in the back-ends’ main memory caches as well as efficient load balancing. We are also introduced to a novel TCP handoff protocol, which lets the front-end hand incoming requests off to the back-end in a manner that is transparent to the client. To substantiate their claim that LARD combines good load balancing with high locality, the authors present simulation results demonstrating significant performance improvements.

    Problem:

There was a trade-off between effective load balancing and high locality; for example, efficient load balancing can be achieved using weighted round-robin distribution, but it increases the likelihood of cache misses. On the other hand, a purely locality-based distribution strategy decreases the cache miss ratio but at the expense of poor load balancing. A secondary problem (that arose from the proposed solution to the above problem) was the requirement of a light-weight protocol that allows the front-end to hand off an established client connection to a back-end node in a manner that is transparent to clients and is efficient enough not to render the front-end a bottleneck.

    Contributions:

    ·         The paper presents a simple algorithm that achieves high cache hit rates and good load balancing.

·         Given the infancy of the TCP and HTTP protocols at the time, I think the TCP handoff protocol (a layer on top of TCP) was a simple and light-weight design that achieved the required transparency without needing changes on the client side, while running unmodified server applications on the back-end.

·         Using the number of active connections to estimate load was a novel approach.

    ·         Replication in LARD was a simple yet effective extension used to handle overloading.

    ·         Since their cache aggregation makes the system increasingly CPU bound as nodes are added to the system, LARD and LARD/R can capitalize on added CPU power. This was important since CPU speeds were expected to improve at a much faster rate than disk speeds. Also it was demonstrated that WRR could not benefit from such added CPU.

    Limitation:

·         Although the authors assume the front-end to be super-efficient in handling new requests, it could potentially become a bottleneck under a heavy request load. Although they claim the front-end can scale to meet increasing demand by upgrading to a faster CPU or by employing an SMP machine, no experimental results were provided to corroborate this claim.

·         In the TCP handoff protocol, it is not entirely clear how the back-end communicates with the client.

    Applicability:

Since the authors claim that the system scales effectively only up to about 10 back-end servers, it is not clear how it would perform in today's environment, where the demand for scaling is much higher. But this paper introduced a new scheduling and routing algorithm that was different from the prevalently used methodologies. I believe the basic idea of this paper is still used in some form or other in modern distributed systems. Hence the problem and solutions described in this paper are still relevant today.


Summary: This paper suggests a strategy called locality-aware request distribution (LARD) for the front-end to distribute content-based requests to the back-end servers. This strategy achieves both high cache hit rates and good load balancing across servers. In addition, it proposes an efficient TCP handoff protocol.

Problem: Cluster-based networks have front-end servers to distribute the requests and back-end servers to process them. Previous methods, such as weighted round-robin (WRR), do not utilize the locality of the requests. Given that the back-end servers' disk speed is the bottleneck of the whole system, we should utilize locality to improve the cache hit rates.

On the other side, naively distributing the requests with a hash map fails to balance the load of each server. So we need a strategy that simultaneously achieves a high cache hit rate and load balance.

    Contributions:
1. They give a simple and efficient LARD strategy that simultaneously achieves high cache hit rates and good load balancing. The strategy does not need any further communication between front-end and back-end and does not keep any statistics on request frequency. Therefore, LARD is practical in their setting. The simulation shows that LARD scales well.

2. They propose an efficient TCP handoff protocol to provide transparent connection handoff for TCP-based network services.

Applicability: Though the strategy is simple, the simulation is too simplified to prove it practical.
1. The number of open connections may become the bottleneck for efficiency as the number of back-end servers scales. Moreover, the differing connectivity of each back-end server should be considered.

2. The single front-end setting may be unrealistic given the increasing number of requests. A single front-end server is also vulnerable to single-node failure.

    Summary:
    The presented locality-aware request distribution (LARD) is one of a new class of content-based distribution strategies used by a front-end (FE) of a cluster to direct requests to one of several back-end (BE) nodes which handle requests.

    Problem:
    (1) The state of the art distribution strategy, Weighted Round Robin (WRR), targeted load balance.
    + Used only # of open conn to BE nodes to direct requests to lightly loaded nodes.
    + All nodes were kept busy (read: had queued requests)
    - |working set| > |single node cache size| => throughput became disk-bound
    (2) Locality-based distribution is presented as a naive solution with its own problems. Requests are partitioned by hashing; all requests for given target sent to same node.
    + BE caches are effectively aggregated. More nodes => throughput more CPU-bound.
    - Targets not requested uniformly, some nodes over-loaded while others remained idle.

    Solution:
    (1) Map targets to set of nodes "responsible" for that target. If a set is highly loaded already, add a new lightly loaded node to the set. These sets are highly variable and set sizes grow and shrink for each target as needed.
    (2) Because the FE must read the content of the request, a TCP connection handoff (more-so a low-level forwarding) was developed for passing the packets from FE to BE.

    Contributions:
    + Seemingly the first evaluation of a content-based distribution approach (and TCP handoff), with encouraging results for future work in the area.
    + A solution which maintains the pros of WRR (simple FE strategy/accounting), and LB (cache aggregation), in a relatively small package (few kernel modules, and no required changes to client or server app code).
    + Forward-looking solution, whose benefits grow as CPU improvements continue to outpace disks.

    Weaknesses:
    - Though the LARD logic accounts for only 10-20% of FE delay, this combined with the "hand-off" (future packet forwarding) limits the total request throughput and maximum number of fully utilized BE nodes. Could use one FE naively round robin-ing (not content-aware) to several LARD "middle-front"-ends (each with its own BE nodes).

    Overall:
    Still a very important problem, as low request latency is a big goal for many companies. To do this effectively, you need caches, and caches benefit from locality. I think the authors did great work both solving their problem and demonstrating the solution's merit.

    Summary:
The paper proposes a locality-aware request distribution (LARD) strategy which achieves high performance in cluster-based servers through a high cache hit rate and good load balancing.
A TCP handoff protocol has also been proposed to hand off an established TCP connection from the front-end server to the back-end while maintaining transparency to the user.

    Problem:
The paper addresses the problem of efficiently distributing the workload/requests in a cluster while maximizing utilization of all the resources. Earlier strategies such as Weighted Round Robin (WRR) achieve load balancing by sending targets to several nodes but fail to leverage repeated requests for the same content, leading to poor locality in the back-end memory caches. Similarly, simple hashing to distribute work achieves locality but leads to load imbalance.

    Contributions:
1. LARD: In the proposed strategy, the cluster consists of a single front-end node and multiple back-end nodes. The front-end node is content-aware and redirects incoming requests of the same type to the same back-end node until it is overloaded. This helps maximize memory cache performance while maintaining good load balancing. The proposed strategy is shown to be efficient through various experiments.

2. LARD with replication: For hot targets, the front-end maintains a mapping to a set of servers and assigns each request to the least loaded node in the set.

3. A TCP handoff protocol which is client-transparent and allows the front-end to hand off an established connection to the back-end node.

    4. Proposed system is able to achieve high scalability by effective locality-aware load distribution and efficient use of resources.

    Discussion:
1. Single front-end node = bottleneck.
The paper does not discuss the consequences or alternatives in case the front-end node fails. Also, I believe the number of back-end nodes that can be added for processing is limited by how many nodes the front-end can support.

2. The proposed strategy cannot be used as-is to serve dynamic web pages. Consider a scenario where a target is served by multiple back-end nodes, and one of the requests modifies it. All the other nodes would then return stale/inconsistent data.

    Applicability/Relevance:
    The concept introduced can be directly applied to efficiently distribute heavy, static content such as videos by Youtube, photographs by Instagram or scenarios where data needs to be served at a faster rate.

    Summary:
This paper addresses well-known distributed systems concerns such as scalability and high availability for cluster-based network servers. During the advent of web servers, load balancing was done using the weighted round robin method, which provides fairness by assigning requests in a round-robin fashion using load parameters. But it doesn't consider the locality of data, which can improve throughput by exploiting the memory hierarchy. Then again, considering only locality for request distribution will overload the same node. Hence LARD uses a front-end which distributes the load to the back-end based on the content of the request.
The basic LARD design assigns a single node per target; if that node is overloaded beyond twice the high threshold, or beyond the high threshold while a less loaded node exists, it associates the target with a less loaded server. The LARD/R model is a replicated version where a single target is served by more than one node, which balances the load when there is a burst of requests for a hot page; eventually, under-utilized nodes are removed from the target's list. They also come up with a client-transparent solution using a handoff protocol, so that the client doesn't know the request was handed to a back-end node.

    Contribution:

1. The idea of content-aware distribution of requests through a front-end to the back-end, which is used a lot in distributed systems.
2. It gives a solution that is the best of both worlds: locality-aware as well as balanced like round-robin.
3. It gives an efficient way to hand off the connection to the back-end without the clients knowing about it.

    Drawbacks:
1. Though the system effectively manages the load when there is a sudden burst of requests for hot pages, the front-end itself is a bottleneck in the design, since it has to distribute all the requests.
2. In the assumptions, they mention that they can monitor the connection status but don't mention anywhere how they do that.

    Conclusion:
The overall system is implemented using a simulator as well as a prototype, which helps them show that their idea is feasible and increases the performance of the web servers by a factor of 3-4 compared to the state of the art at that time.

    Summary :

    The authors present a content based request distribution policy named locality aware request distribution strategy (LARD) in which a front-end node receives a request and distributes it to a number of backend nodes based on both the content of the request and the load on the backend nodes. This is how they try to achieve good load balancing along with high cache hit rates in cluster based network servers.

    Contributions :

1. The idea of content-based distribution is a useful one. The LARD scheme proposed in the paper uses this idea and is more effective than the weighted round robin scheme, as LARD can achieve better cache hit rates at the back-end nodes.
2. LARD scales out quite well compared to a weighted round robin (WRR) scheme, as the effective cache size is much higher. This caching scheme seems relevant to present-day systems, where working set sizes are growing tremendously.
3. One other advantage of the basic LARD strategy over the locality-based strategy is that it does not use a static hash function to achieve locality. Instead, it dynamically partitions the namespace and re-assigns targets to other nodes when there is a significant load imbalance.
4. The authors have also considered the case in which a single back-end node might not be able to serve a single target and multiple back-end nodes are required. LARD with replication (LARD/R) serves this purpose. It consists of a one-to-many mapping from targets to the set of nodes that can serve each target. One other advantage of these strategies is that the front-end does not need to keep track of any back-end cache state.
5. The results for throughput, cache hit rate and underutilization time for LARD and LARD/R when simulated with two different traces are discussed. The results of the simulation show a significant speed-up over the existing techniques; it is quite promising that the throughput increases by a factor of 2 to 4.
6. Transparency has been ensured by introducing the TCP handoff protocol, which hands off connections between the front-end and the back-end nodes in a manner that is transparent to the client.

    Discussions:

1. A single front-end node is used for the setup. This node is a potential point of failure, and it is not discussed how failure of the front-end node would be handled.
2. The simulation model makes the idealized assumption that the front-end node and the network are fast enough, with effectively infinite network capacity. This is not a practical assumption.
3. The entire paper focuses on the LARD strategy for serving static content. Caching dynamic content, and the cache synchronization problems that arise from it, would be challenging.

The paper discusses some useful concepts on the effective use of content-based request distribution. However, the usage of these techniques is limited to servers that serve static content, whereas present-day web servers serve heavily dynamic content.

    Summary:
This paper talks about a locality-aware request distribution strategy to efficiently utilize the caches of back-end servers in cluster-based web servers. It also introduces a new TCP connection handoff protocol.

    Description:
The problem this paper is trying to solve is how to balance the load and utilize resources efficiently in cluster-based network servers. The idea is to bind requests to back-end servers based on both the load on the server and locality with respect to the requests. So, if a request is forwarded to a back-end server, then when the same request comes later it will again be forwarded to that server unless the server is overloaded. In this way, the same requests are served by the same back-end server, yielding more cache hits because of locality. The simulation results very clearly showed that LARD outperforms WRR because it utilizes the cache effectively. The authors also introduce a TCP handoff protocol which transparently transfers an open connection from the front-end to a back-end server. The front-end forwards the packets coming from clients to the back-end servers, but the back-end servers send packets directly to the clients.


    Contribution:
    - Distributing requests in a locality-aware fashion was a very nice improvement over WRR. The results clearly showed that this simple LARD technique gives substantial improvements. With LARD the system becomes compute-bound instead of disk-bound, which is better because CPU speed increases more quickly than disk speed over time (especially at the time this paper was published). Moreover, if the system is disk-bound, more and more requests have to wait in the pending list, which causes the backend servers to be swamped.
    - Another very nice feature I liked was determining the load on backend servers from the number of open connections, which the front-end can easily track because it hands each TCP connection off to a backend (a small sketch of this bookkeeping follows after this list). If there are many open connections, the backend server is running slowly and is most probably overloaded. This is a very simple approach compared to tracing load on the backend server itself.
    - Lastly, the TCP handoff technique was a neat strategy for transferring open connections. (However, the front-end server can become a failure point for the whole cluster. Even with more than one front-end server, if one fails, the connections handled by that server are lost and have to be re-established. In this way the system becomes stateful, which is not good in case of failures.)
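    Since the front-end performs every handoff and keeps forwarding the client's packets afterwards, it can maintain this load estimate with a single counter per node. Below is a minimal sketch of that bookkeeping; the class and method names are my own assumptions for illustration, not anything from the paper.

        # Hedged sketch (assumed names): back-end load approximated as the number
        # of connections the front-end has handed off and not yet seen closed.
        from collections import defaultdict

        class ConnectionLoadTracker:
            def __init__(self):
                self.active = defaultdict(int)   # node -> open handed-off connections

            def handed_off(self, node):
                self.active[node] += 1

            def closed(self, node):
                # The front-end keeps forwarding the client's packets after handoff,
                # so it can observe the connection closing and decrement the count.
                self.active[node] = max(0, self.active[node] - 1)

            def load(self, node):
                return self.active[node]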

    Applicability:
    I think the techniques described in this paper are still applicable in today's clusters. We have front-end web servers whose task is to serve web pages by gathering content from many backend stores. Every web company today tries to reduce its p99 latency, so it is crucial to use the caches of the backend servers efficiently, which is only possible when there is locality in the distribution of requests.

    Summary :
    The authors have devised LARD, a client-transparent request distribution strategy for a cluster of network servers, which takes into account both the load and the locality of information on each server, thereby exploiting the advantages of caching.

    Problem :
    1. Conventional request distribution strategies like weighted round robin don't take the locality of information into account. For example, a front-end server may redirect requests for the same object to two different back-end servers instead of the same server, which would have let the second request benefit from caching.
    2. Request distribution strategies that take only locality into consideration also don't perform well, since a server that starts serving hot content gets overloaded with requests.

    Contribution:
    1. Devised a strategy that takes both load and locality into account.
    2. LARD scales very well. As more machines are added to the cluster, the effective cache size increases due to locality-aware distribution. The same argument applies to secondary storage as well.
    3. This also allows some servers to be reserved for specific content. For instance, video content could be delivered by a machine employing a log-structured file system.
    4. LARD with replication allowed the hot content requests to be equally balanced between the cluster of servers. This would prevent one single node from being bottlenecked by requests for a hot page.
    5. The whole strategy was implemented in a client-transparent manner. There were no connection re-establishments between the client and the back end; the front end simply transfers the TCP connection to the back-end server through the TCP handoff protocol.

    Shortcoming:
    1. Since there is just one front end server, it could become a potential bottleneck.
    2. There is no clear mention as to what happens when the number of active connections goes beyond the total threshold. Will the front end stop accepting connections then?

    Applicability:
    Efficient content distribution would really improve the performance of content delivery networks which are a vital part of today’s internet. Though the authors don’t handle some scenarios like persistent HTTP connections, this request distribution strategy can prove to be highly efficient in the cloud.

    Problem: Cluster-based network servers consisted of front-end and back-end servers, but the front-end considered only load when determining which back-end server to send a request to. The issue here is that this does not fully utilize cache performance. Disk speeds were not keeping pace with network throughput, meaning a higher cache hit rate was of utmost importance.

    Solution: Locality Aware Request Distribution - a way of distributing requests from front-end nodes among the back-end nodes, increasing cache hits while still considering load balancing.

    Difficulty 1: Designing a request distribution strategy that both balances load and still maximizes cache hit rates

    Solution 1 (simple): partition the database name space across the nodes. When a request comes in, send it to the last node that handled that target, unless: a) that node has a high load and there is a server with a low load, or b) that node has a very high load. In either case, send the request to the node with the least load.
    (With replication): This is very similar to the simple version, except that multiple nodes can be serving the same document. When the serving nodes are overloaded, a new node is added to the serving set, and a node is removed after time has passed with no overload. New requests for that document are then distributed across the current set of servers serving it.
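    To make the simple policy concrete, here is a rough sketch of the decision logic as described above; the threshold values, names, and bookkeeping are my own illustrative assumptions, not the authors' pseudocode.

        # Hedged sketch of the simple LARD policy; thresholds are assumed values.
        T_LOW = 25     # below this a node counts as lightly loaded
        T_HIGH = 65    # above this a node counts as overloaded

        class SimpleLARD:
            def __init__(self, nodes):
                self.nodes = list(nodes)
                self.load = {n: 0 for n in nodes}   # approximated by active connections
                self.server_for = {}                # target -> assigned back-end node

            def least_loaded(self):
                return min(self.nodes, key=lambda n: self.load[n])

            def pick_node(self, target):
                node = self.server_for.get(target)
                if node is None:
                    # First request for this target: assign the least-loaded node.
                    node = self.least_loaded()
                    self.server_for[target] = node
                else:
                    candidate = self.least_loaded()
                    overloaded = (self.load[node] > T_HIGH
                                  and self.load[candidate] < T_LOW)
                    very_overloaded = self.load[node] > 2 * T_HIGH
                    if overloaded or very_overloaded:
                        # Move the target to relieve the overloaded node.
                        node = candidate
                        self.server_for[target] = node
                self.load[node] += 1                # connection handed off
                return node

            def connection_closed(self, node):
                self.load[node] -= 1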

    Simulation Testing: This method was tested on simulated clusters. The front-end was assumed to have no network bottlenecks (effectively infinite network speed), so any bottlenecks would show up in the back end. Real-world traces were used to give the distribution of requests and file sizes.
    Results were very good for LARD and LARD/R. They were compared against WRR, WRR with a global memory system, and a locality-based scheme with a global cache. LARD had higher throughput, a low percentage of missed requests, and high utilization.
    LARD was then compared against WRR for sensitivity to CPU and disk speed. Since WRR does not exploit the cache, it became disk-bound much more often, while LARD could use increased CPU power to handle more requests and thus became CPU-bound. Adding disks, however, only marginally improves LARD while greatly improving the disk-bound WRR.
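    The hit-rate side of this argument is easy to reproduce in miniature. The toy script below is not the authors' simulator; it just replays a synthetic trace against per-node LRU caches under round-robin versus a purely locality-based (hash-by-target) assignment, to show why locality aggregates the cluster's cache. All sizes and names are made up for illustration.

        # Toy illustration: cache hit rates for round-robin vs. hash-by-target
        # assignment over per-node LRU caches (not the paper's simulator).
        from collections import OrderedDict
        import random

        class LRUCache:
            def __init__(self, capacity):
                self.capacity = capacity
                self.items = OrderedDict()
            def access(self, key):
                hit = key in self.items
                if hit:
                    self.items.move_to_end(key)
                else:
                    self.items[key] = True
                    if len(self.items) > self.capacity:
                        self.items.popitem(last=False)
                return hit

        def hit_rate(trace, n_nodes, per_node_cache, assign):
            caches = [LRUCache(per_node_cache) for _ in range(n_nodes)]
            hits = sum(caches[assign(i, t)].access(t) for i, t in enumerate(trace))
            return hits / len(trace)

        random.seed(0)
        trace = ["/doc%d" % random.randint(0, 399) for _ in range(20000)]  # synthetic
        round_robin = lambda i, t: i % 4           # ignore content, rotate over 4 nodes
        by_target = lambda i, t: hash(t) % 4       # purely locality-based partition

        print("round-robin hit rate  :", hit_rate(trace, 4, 100, round_robin))
        print("hash-by-target hit rate:", hit_rate(trace, 4, 100, by_target))

    With 400 distinct targets and four caches of 100 entries each, the locality-based assignment effectively has the aggregate cache available, while round-robin duplicates content and misses far more often.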

    Difficulty 2: Allow front-end to hand off connection to back-end, in a way that is both transparent to the client and doesn't allow the front-end to be a bottleneck (since front-end now needs to read the contents of a request).

    Solution 2: TCP connection handoff. This protocol hands off an established connection from the front-end to the back-end, such that future communication from the client is forwarded to the back-end.
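    The real handoff lives inside the kernel, so any user-level code is only an analogy. The sketch below (the message format, function names, and stubbed state capture are all my assumptions) is meant to show the order of operations the protocol has to support, not its actual implementation.

        # Simplified user-space analogy of the TCP handoff sequence; the real
        # protocol runs in the kernel, and all names here are assumptions.
        from dataclasses import dataclass

        @dataclass
        class HandoffMessage:
            client_addr: tuple      # (ip, port) of the client
            tcp_state: dict         # sequence numbers, window, options to replicate
            buffered_request: bytes # the HTTP request the front-end already read

        def snapshot_tcp_state(conn):
            # Placeholder: in the real protocol the kernel exports the connection's
            # sequence numbers and options; user space cannot do this portably.
            return {}

        def front_end_handle(conn, client_addr, pick_backend, control_channels):
            # 1. The front-end completes the handshake and reads the request,
            #    because the distribution decision depends on the requested content.
            request = conn.recv(4096)
            parts = request.split()
            target = parts[1] if len(parts) > 1 else b"/"
            backend = pick_backend(target)

            # 2. The front-end ships the established connection's state to the
            #    chosen back-end, which recreates the socket without a new
            #    3-way handshake with the client.
            control_channels[backend].send(
                HandoffMessage(client_addr, snapshot_tcp_state(conn), request))

            # 3. Afterwards the front-end only forwards the client's packets (ACKs)
            #    to the back-end; responses go from the back-end straight to the client.
            return backend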

    Cluster Testing: LARD was tested on a small cluster consisting of a single frontend, a switch, and several back-end servers. The results were consistent with simulation, where LARD outperformed WRR in throughput.

    Conclusions: LARD seems to be a great enhancement over the previous options, mainly WRR. Of interest to me is increasing the size of the cluster where multiple front-end nodes are required, and thus state must be maintained between front-end servers. Future work included investigation into usage of LARD with dynamic content.

    Summary:
    Locality-Aware Request Distribution (LARD) is a load-balancing policy which takes into account the content being requested and the backend server load, while optimizing for cache hits on the backend servers. It is described, simulated, and tested against other state-of-the-art schemes such as the weighted round-robin (WRR) scheduling policy, with significant results.
    Contributions:
    There were two significant contributions presented in this paper. First, they described and gave simple pseudo-code for the LARD load-balancing policy itself. This policy dramatically improved cache-hit rates for the backend servers, while balancing load about as well as WRR. One reason the pseudo-code for the policy is simple is that it does not try to model backend cache eviction policies; the only metric the policy needs from the backends is their load, measured as the number of active connections. Even so, LARD was much more efficient in terms of throughput and node utilization than the aggressive locality-based global cache (LB/GC) strategy. Another nice property of locality-aware request distribution is that it scales horizontally at a near-linear rate across many backend machines. This is in contrast to WRR, which could only scale vertically, that is, by upgrading individual machines with more memory and multiple disks. LARD could literally add another node to the backend cluster and scale up accordingly, the only limit being front-end throughput. Adding replication, so that multiple backends could serve the same content via server sets, came with little additional complexity. In their simulations LARD could handle 3.9 to 4.5 times more requests than WRR, with 25% to 50% less delay in serving those requests due to cache hits. In real cluster tests on six nodes, LARD handled 2.5 times more requests than WRR. The second major contribution was an efficient, albeit low-level, kernel module for handing off TCP connections from the load-balancing frontend to the backend servers, transparently to the client. Replies could then be returned to the client directly from the backend, with minimal use of the frontend.
    Problems:
    Two problems they mentioned, but only partially solved in the paper, were HTTP 1.1 persistent connections and serving dynamic rather than only static content. HTTP 1.1 is ubiquitous today and obviously of real concern, but it seems they could solve it mostly on the frontend with some simple code modifications. Dynamic web content was first coming on the scene when this paper was written; today it is universal, and very few websites are static only. I am curious, yet skeptical, about how their locality claims would hold up on a modern web stack, since web requests and responses can have high variability in size. Another unmentioned problem was the custom TCP handoff kernel module. The TCP change is transparent to the client, but deploying it to production would require a kernel upgrade on every system implementing it, both the frontend and the backend machines. With cloud computing and virtualization today this may be easier than it was back in the nineties, but nevertheless getting this from the lab to production would take serious effort.
    Applicability:
    Locality-Aware Request Distribution is a highly significant contribution and very applicable to a modern web stack, even though it was developed sixteen years ago. Many web frameworks require load balancing and horizontal scaling, because simply scaling a machine vertically can only yield so much return. I see no reference to LARD in the Nginx or HAProxy documentation, but I imagine the main implementation issue for such software load balancers is the TCP handoff. I imagine they would see vast throughput and node-utilization improvements if the LARD policy were added. Overall, there is much to be learned about the importance of locality-aware infrastructure married to load balancing.

    Summary
    This paper introduces a locality-aware request distribution algorithm for assigning requests from the front-end to back-end servers, and a handoff protocol that is transparent to clients. The proposed algorithms focus on providing both load balance and a high cache hit rate.
    Problem
    Most cluster servers of that era disregarded the request contents or the type of service. Most algorithms could achieve good load balance across the back-end nodes but were weak on performance, specifically the cache hit rate. So the authors summarize two challenges. The first is to design a practical strategy that achieves load balancing and high cache hit rates on the back-end. The second is to provide a protocol that allows the front-end to hand off an established connection to the back-end while remaining transparent to clients.
    Contributions
    The authors give two algorithms. The first is called LARD. When a new request arrives, the front-end server assigns its target to a lightly loaded back-end server. That server keeps serving the target until its load reaches a threshold, at which point the target is reassigned to the currently least-loaded server. The second distribution algorithm improves on the first by having the front-end dynamically maintain a pool of servers for each target instead of a single server. The handoff protocol is another contribution of this paper. The obvious advantage is that it is transparent to the user; no modifications are needed on the client. An additional protocol layer handles all the request forwarding work.
    Discussion:
    The first distribution algorithm is fairly straightforward: some imbalance is tolerated in order to increase the cache hit rate, and once a node is overloaded, the target is reassigned to a less loaded one. This solution has a fatal problem: each target is served by only one node. Consider the extreme case where all requests access the same resource; then a single node serves all these requests no matter how targets are reassigned. The authors therefore propose an improved algorithm using replicas to deal with this problem. This method is really good: the node pool is maintained dynamically, which ensures a busier target gets more nodes, and if the target is not as popular as before, the node pool shrinks. From the simulation results, this algorithm works better than WRR in most cases and as well as WRR in the rest.
    However, these two algorithms still have some problems. First, they are not suitable for streaming data such as video sites, which have really bad locality. Second, the authors provide only two data traces and two modified traces for the simulation, which I think is not sufficient; more kinds of trace data should be used to verify the performance of the algorithm. Third, the authors use the number of active connections to measure the load of nodes, which seems not to be the best choice: it is quite possible that some frequently accessed targets are fairly simple to serve and would not overload the node. Fourth, as the number of requests increases, the dispatcher could become the bottleneck of the whole system; many requests would be queued at the dispatcher node waiting to be processed, and eventually all the resources in the dispatcher would be exhausted. Once the dispatcher is down, the whole system crashes.

    Summary :
    The paper presents the idea of content-based request distribution in cluster-based networks. It effectively combines the ideas of load balancing among back-end servers and locality-based request distribution. The paper also presents the idea of a dedicated front-end node which is responsible for distributing incoming requests to the corresponding backend node, transparently to the client, through a TCP handoff protocol. The back end then responds directly to the client over this connection.

    Problem :
    The problem the authors were trying to solve is this: cache miss rates are high when a protocol like Weighted Round Robin is employed, because it does not take locality into account even though it performs very good load balancing. Adding more nodes does not help scale out the cache, so many incoming requests will be disk-bound and scalability will be limited by disk performance. On the other hand, systems that consider only content-based distribution might end up with a single node handling all the requests, which is equally bad. This paper wanted to achieve the best of both worlds.

    Contributions/TakeAways:
    Content-based request distribution: The system reduced the number of cache misses by partitioning content among all the back ends and sending requests to them based on their content only.
    Content Replication with high frequency load: When a lot of requests involving a single page or such arrived, the system put more back end servers to service requests on the same page, which is replicated among them all, simultaneously. This way, the system gives higher priority to hot topics. I can totally imagine Twitter doing something similar to this especially on New Year’s eve when they know there is going to be a huge spike in the number of requests.
    Recovery from back-end failure: Making the front end manage no cache state helps the system recover from back-end node failures much more easily. All the front end needs to do is reassign the targets as if they had not been assigned before.
    Diverse datasets for simulation: The authors' simulations covered a wide set of data spanning all the cases in which they wanted to show their system performed better.

    Drawbacks:
    The system could eventually face the problem of the front end becoming a bottleneck under heavy request load. Although the authors mention scaling up the number of front-end servers, I think this might be a hard task given consistency and synchronization issues. Also, this system handles static requests really well, but I would love to know how this could be done for dynamic requests too, because it is difficult to track locality there.

    Relevance:
    The idea of request distribution is very much related to the problems that current systems face. I can see how the idea of a dedicated server for routing requests, from papers like this, could have been incorporated into recent systems. That said, I do not think content-based distribution is as necessary now because of the huge amount of memory available these days. Having said that, replicating and handling hot content is a great idea that companies like Twitter, Amazon, etc. can still use today.

    Locality-Aware Request Distribution in Cluster-based Network Servers

    Summary:
    Locality-Aware Request Distribution (LARD) is an improvement over the typical Weighted Round Robin (WRR) algorithm, which was prevalent at the time, achieved by taking both load and content into consideration.

    Problem:
    In 1998, when the LARD paper was published, the dominant load balancing strategy was Weighted Round Robin (WRR). WRR delivers efficient load balancing, but it was terrible for cache hit rates, since each request effectively goes to a server at random without regard for what is already cached on the backend servers. When the working set is larger than memory, operations become disk-bound, since servers inevitably end up with a large ratio of cache misses served from disk. In this scenario, the incremental throughput gained from adding additional machines is very poor.

    Contributions:
    1-Better memory scaling: LARD allows cache aggregation by making the system content-aware, so that additional machines increase the effective cache available to the working set (since any given machine only caches a subset of it). Exploiting locality this way improves cache hit ratios, which helps make operations more CPU-bound.

    2-Approximation of load: By measuring the number of currently active connections, LARD is able to estimate the approximate load. T_low and T_high thresholds were used to categorize the load and decide whether a given request should go to one server or another (or whether a server was so full that it should stop receiving new targets until it had worked through its current request queue).

    3-Load balance: While LARD picks a single destination machine per target, LARD/R uses a more complex algorithm that allows replication. Allowing replication is of course critical, since some files are so hot they can't be handled by a single server. (A rough sketch of this variant follows.)
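    A rough sketch of the replicated variant, as I read it: the thresholds, the idle timer, and the pick-within-set rule below are my paraphrase and assumed values, not the authors' pseudocode.

        # Hedged sketch of LARD with replication; thresholds, the timer, and the
        # choice of node within a server set are illustrative assumptions.
        import time

        T_LOW, T_HIGH, K_SECONDS = 25, 65, 20

        class LARDWithReplication:
            def __init__(self, nodes):
                self.nodes = list(nodes)
                self.load = {n: 0 for n in nodes}
                self.server_set = {}     # target -> list of nodes serving it
                self.last_change = {}    # target -> time the set last grew or shrank

            def pick_node(self, target, now=None):
                now = time.monotonic() if now is None else now
                servers = self.server_set.setdefault(target, [])
                if not servers:
                    servers.append(min(self.nodes, key=lambda n: self.load[n]))
                    self.last_change[target] = now
                node = min(servers, key=lambda n: self.load[n])

                # Grow the set when even its least-loaded member is overloaded and
                # a lightly loaded node exists elsewhere in the cluster.
                spare = min(self.nodes, key=lambda n: self.load[n])
                if (self.load[node] > T_HIGH and self.load[spare] < T_LOW
                        and spare not in servers):
                    servers.append(spare)
                    self.last_change[target] = now
                    node = spare

                # Shrink the set once it has been stable for a while, so extra
                # replicas drop out after the target cools down.
                if len(servers) > 1 and now - self.last_change[target] > K_SECONDS:
                    busiest = max(servers, key=lambda n: self.load[n])
                    if busiest is not node:
                        servers.remove(busiest)
                        self.last_change[target] = now

                self.load[node] += 1
                return node

            def connection_closed(self, node):
                self.load[node] -= 1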

    Applicability:
    LARD scaled one front-end server to 10 back-end nodes. It was unclear how much further this kind of design could scale because there were no provisions for front-end scaling. As a system grew beyond 10 back-end servers, memory issues seem likely since the front-end must maintain state information for a growing cluster. Bandwidth would also likely be a problem since the front-end must hand-shake with both clients and back-end machines, as well as forward information from back-end machines up to the clients (with the transparent TCP handoff). There's also the concern that the front-end server would become a single point of failure. GFS and HDFS are reminiscent of LARD-R in the sense that they also both have a single master machine, and experience has shown this kind of design (at least as it currently stands) is flawed. Maybe this system style will eventually come back into favor if someone can address the frailty of the front-end server.

    Summary:

    The authors present a scheme to distribute client requests among servers with the goals of achieving a high cache hit rate and an equitable distribution of workload. The proposed method is compared with other contemporary state-of-the-art methods using web log trace simulations, and a prototype cluster to demonstrate its superiority over existing schemes.

    Problem:

    Scale throughput of client requests with the number of servers. Many existing schemes like WRR and LB had scalability bottlenecks that only provided marginal improvements as the size of the cluster increased. WRR focused largely on even load distribution, while LB focused on cache affinity. Individually, neither of the schemes could produce reasonable performance across a diverse workload. The authors attempt to solve this problem by presenting a solution that takes both the relevant factors into account.

    Contributions:

    - Provides a scheme that scales throughput with number of servers added to the cluster.
    - They developed the TCP Handoff protocol that is transparent to the client for efficient delegation of clients.
    - They had the insight to move beyond load balancing and also focus on optimizing in-memory caching to avoid expensive disk reads. It's not a novel idea since it's a well known processor technique. However, it was good to see the broad applicability of the technique.
    - LARD/R is interesting because it alters the number of servers in response to the work load. Seems like a precursor to elasticity in cloud computing.
    - The evaluation section was good since it demonstrated the technique using two different methods: simulation and a prototype cluster.
    - At first glance the single front-end server looks like a scalability bottleneck as the number of back-end servers increases. However, this is not necessarily a hard limit on the number of back-end servers, as MapReduce's single-master architecture has demonstrated.

    Flaws:

    - A single front end server is a single point of failure.
    - Scalability bottleneck for number of acceptable connections.
    - Will not work well with persistent connections. Seems like one could just DoS the cluster by leaving persistent connections on the front end.
    - Relying on the number of open connections on back end seems like an incorrect way to evaluate workload, since the "weight" of each open connection is unknown. Using CPU utilization is also an unreliable metric since that is applicable to processor intensive tasks and not something with disk reads (like serving large videos). CPU spikes can also give a false impression of workload if it's sampled at an unfortunate time.
    - LARD/R is evaluated on an artificial workload to compare it with LARD. It would have been more convincing if it was evaluated on a real workload.
    - Authors seem to believe that it can benefit dynamic content serving, but their argument about caching server processes and primary data files is weak.

    Summary:
    This paper proposes a content-based request distribution scheme to improve the performance of cluster-based network servers, where a front-end server receives clients' requests and forwards them to one of the back-end servers. The authors present the locality-aware request distribution (LARD) strategy, which dynamically subdivides the server's working set among the back-end servers, achieving a high cache hit rate in the back-ends' memory and therefore high throughput, while also achieving good load balancing between the back-end servers.

    Problem:
    Cluster-based network servers distribute incoming requests among a set of back-end servers for processing to improve performance. To avoid overloading any one back-end server, most front-ends send requests in a weighted round-robin (WRR) fashion to balance load among the back-end servers. In this approach, each back-end server can expect to receive requests for any data in the entire working set. The problem arises when the working set exceeds the size of the main memory cache at a back-end server: this leads to frequent cache misses, which in turn lead to disk reads, resulting in bad performance and low throughput.

    Contributions:
    The major contribution of this paper is a strategy that provides a high cache hit rate for requests while simultaneously achieving load balancing. The authors use a hashing function over the requested data to identify which back-end server to dispatch a request to for processing. Over time, a back-end receives requests for the same data, which it can serve directly from its cache without disk reads, leading to high throughput. To achieve this, the authors maintain a mapping in the front-end between target data and back-end servers, which identifies the target back-end for a request and, together with the T_high and T_low load thresholds, yields a good load-balancing strategy. One important point about this approach is that it is independent of the local cache replacement policy of the back-end servers. Also, the absence of elaborate state in the front-end provides a simple recovery path for back-end failures. LARD also enables easy scaling of the server when the working set size increases, since a new back-end can be added to process requests for the additional data without changing the cache size or processing power of the other back-end servers.

    Another contribution is the TCP handoff protocol for transparently transferring a connection from the front-end to a back-end server. In the earlier WRR approach, the front-end could simply forward the connection to a back-end; it did not have to accept the request and inspect its content. With the LARD approach, the challenge is that the front-end has to accept the request and read its content to identify the back-end server to dispatch it to, while not becoming a bottleneck itself. For this purpose, the authors propose the TCP handoff protocol, which provides efficient connection forwarding and is transparent to both the client application and the back-end server.

    The authors conducted extensive tests using simulations to demonstrate that the LARD and LARD-with-replication strategies outperform the WRR approach in terms of throughput and cache hit rate. They also developed a prototype LARD-based server to confirm the results in a real environment.

    Limitations:
    The authors mention a hash function for identifying the target back-end servers, but without providing much detail about it.
    The authors state that one front-end can handle the forwarding load for about 10 back-end servers, which is quite low compared to current systems.

    Applicability:
    The authors' preference for making request processing CPU-intensive rather than disk-intensive is still relevant today: CPU speeds have improved much faster than disk speeds, so the importance of caching and locality remains. This paper brought the concept of cache locality together with server load balancing, both of which matter for Content Distribution Networks (CDNs), whose back-end servers are geographically distributed. But the LARD strategy, being applicable to static content only, cannot be used in the majority of current systems, which serve dynamic content.


    Summary:
    This paper looks into the problem of distributing requests among servers in a cluster. The authors show the problems of the then-current distribution methods and introduce a new request distribution method called LARD, which achieves good cache hit rates and also good load balance between the servers. The authors also discuss challenges such as efficient TCP handoff, and show through simulation experiments and a prototype cluster that LARD achieves load balance and a high cache hit rate.

    Problem:
    While the weighted round-robin scheme balances load well, it cannot use the server caches effectively, especially when the working set is much bigger than a single server's memory. A simple locality-based approach distributes requests to servers by inspecting the content being requested; this achieves high cache hit rates, and the effective cache becomes the sum of the caches of all servers, but it does not balance load well. This paper introduces a distribution mechanism that achieves both good locality and good load balancing. The paper also aims at a transparent and efficient handoff from the front-end to the back-end servers.

    Contributions:
    1. A new request distribution method, LARD that can achieve high cache hit ratio and good load balance simultaneously.
    2. A transparent and efficient TCP handoff mechanism for the front end servers.

    Positives:
    1. The two algorithms presented are easy to understand.
    2. I felt that the evaluation was one of the strongest aspects in this paper. The simulation shows that LARD works well in terms of throughput, load-balancing and cache hit rates for both IBM and Rice traces. Even for the most favorable workloads for WRR, LARD seems to be performing close to WRR.
    3. In addition to simulation, the authors also set up a real prototype cluster that uses LARD/R and compared the results with WRR in the same cluster. They show similar benefits as shown in simulation which is impressive.

    Discussion:
    1. It is not clear how dynamic content serving, which is common today, would fare with this approach.
    2. It is not clear how representative are the traces that the authors used.
    3. Fault tolerance of front end servers is ignored completely.
    4. It is not clear how to set T-low and T-high for heterogeneous servers.

    Relevance:
    The problem of request distribution as such is relevant today. For example, Windows Azure and EC2 have some form of request distribution. It looks like Azure uses RR, WRR and latency-based distribution rather than locality-based schemes, and EC2 does RR and 'least outstanding requests' routing. The reason, I believe, is that main memory sizes have increased so much that most workloads can be served completely from memory; in this setting, the effectiveness of LARD is not so pronounced. Hence, to me it looks like the request distribution problem is synonymous with load balancing today. Still, the ideas and results discussed in this paper are interesting and useful.

    Summary: This paper proposes the locality-aware request distribution (LARD) strategy, which simultaneously achieves load balancing and high cache hit rates on the back-ends of cluster servers, and an efficient, client-transparent TCP handoff protocol.

    Problem: A cluster-based network server has a front-end and many back-ends: the front-end distributes requests to back-end nodes, and the back-ends serve the requests. Round-robin distribution of requests can lead to high cache miss rates and low performance. A naive content-based strategy can improve hit rates in the back-ends' main memory caches, but it leaves the back-ends' load unbalanced. A policy that simultaneously achieves load balancing and high cache hit rates is needed, as is an efficient, client-transparent protocol for the front-end to hand off an established client connection to a back-end node.

    Contributions:
    LARD strategy: When the first request for a target reaches the front-end, the target is assigned to a lightly loaded back-end node, and a one-to-one mapping from targets to back-end nodes is maintained. Targets are moved from high-load to low-load back-ends to keep the load balanced, and the total number of active connections is limited. A target can also be mapped to a set of nodes; this replication strategy handles the case where a target's load is larger than a single node can serve.
    Simulation: A configurable web server cluster simulator evaluates request distribution strategies including WRR, LB, LARD, LARD/R, WRR/GMS and LB/GC. WRR has balanced load but suffers many cache misses. Both LB schemes achieve higher cache hit ratios but lose load balance. LARD and LARD/R result in good speedups.
    TCP handoff protocol: The protocol can create a TCP connection at the back-end without going through the TCP three-way handshake. The protocol is expected to yield scalable performance on SMP-based front-ends.

    Applicability: This paper considers both back-end locality and load balancing and combines them to maximize throughput. The policy is very simple but gives good experimental results. For the small request volumes of that era, this can probably work, but I think it also has several limitations. (1) What if the request volume is so large that even an SMP machine with a fast CPU cannot handle it as a front-end? In that case, we may need to extend the strategy to multiple layers of front-ends with multiple machines. (2) Is it enough to use the connection count as an estimate of workload? If every request queries the same amount of data it may be, but if the queries are skewed this estimate can be very poor. The estimate directly affects the server mapping, the target moving and so on. We might add a communication module between front-end and back-end nodes to report system resource usage.

    Summary:
    This paper presents a locality-aware request distribution scheme for cluster-based networks. Using this scheme, the load can be distributed across multiple backend servers depending on the content/service requested and the backend server load. By means of increased cache-hits and reduced disk access in the back end servers, this scheme achieves better performance.
    Problem:
    Prior to this work, most request distribution schemes were WRR-based, distributing load across back-end servers purely based on server load. This simple strategy balances load efficiently across the servers but is locality-unaware, which leads to many cache misses, especially when the working set on the back-end servers is larger than their cache size. These cache misses in turn lead to poor performance because of additional disk accesses. A naive request distribution scheme that takes only locality into account also performs poorly because of inefficient load balancing. This paper addresses the problem by formulating a policy (LARD) that combines these two extremes and does content-based as well as load-balanced distribution of incoming requests among multiple backend servers.
    Further, since the LARD distribution policy is content-based, it requires the connection to terminate at the front-end server (for inspection of the requested content) while the actual content is fetched from the back-end servers. This poses another problem: keeping the scheme transparent to clients so that no changes are required at the client side. To address this, the paper proposes a TCP handoff protocol to hand connections off from the front-end server to the back-end servers.
    Contributions:
    1. The paper introduces LARD, a locality-aware request distribution strategy. LARD attempts to assign all requests for the same content to the same back-end machine to increase cache hits. At the same time, if the node currently serving a piece of content is overloaded and there are idle nodes available to service requests, LARD offloads new requests to those idle nodes, preserving the load-balancing property of WRR.
    2. To keep this scheme transparent to the clients, LARD introduces a TCP handoff protocol that allows the front end to inspect the content requested and forward the connection to a back-end server.
    3. To demonstrate the effectiveness of LARD, the paper presents a trace-driven simulation of LARD and other existing distribution schemes on two real-world traces, providing clear descriptions of the factors that make LARD performant.
    Discussions:
    Though this paper presents a neat algorithm for content-based request distribution, the TCP hand-off protocol seemed a bit hacky. Some key aspects were not clear to me from the paper, such as how to handle failure of the front-end server (a single point of failure), how to serve dynamic content where the content behind a URL keeps changing, how to hand off HTTPS (secure) connections, and how to configure the LARD parameters (TL, TH) when the back-end servers are heterogeneous (differ in configuration).
    Relevance:
    The problem of request distribution in itself is still relevant today, more popularly known as load balancing. But I am not aware of any modern cluster systems that employ a LARD-like content-based distribution scheme for load balancing, which suggests that modern main memory sizes are good enough to handle traffic demands.

    Summary:
    In cluster-based web servers that serve content whose working set is larger than the capacity of an individual server's cache, there is an interesting problem of how best to optimize for cache performance while not compromising on balancing the load across the servers in the system. Through LARD, the authors propose a system that exploits the best of both worlds without compromising transparency to the end client.

    Problem:
    Schemes such as WRR, while providing good load balancing do not take into account cache locality. Scaling up the number of servers in the cluster does not scale out the effective cache available in the cluster. This means that a lot of incoming requests would be disk bound and scalability is bound to disk performance.

    On the other end of the spectrum are schemes like LB, which optimize for cache performance, but don’t take into account workloads that have strong affinity to certain target content. Hot content housed on a particular server would see an unusually large number of requests going to that server, while another server in the cluster could be idling.

    Contribution:
    A best of both worlds scheme LARD that looks to optimize for cache hits, while still accounting for a balanced load in the system.
    1. Content Based Request Distribution : Partitioning content across the backend servers, and having the front-end redirect to backend servers that house the content based on current load, after assessing if the current load would incur less of a penalty on the target server, as opposed to having a different node read in the content into cache from disk.
    2. Replication of ‘hot’ content: If a particular target content is popular across several incoming requests, rather than overloading one node, having this content slowly permeate to other nodes that will balance the incoming workload and spread it around instead of creating a bottleneck on the original node.
    3. TCP Handoff: Proxying through an already established connection to the chosen backend, rather than tearing down and re-establishing a connection to provide for both performance and transparency to a client.

    Flaws:
    Despite the effectiveness of the TCP handoff they seem to have achieved, I cannot see how the front end will not become a bottleneck, since it has to inspect the target content of all incoming requests. Not only does this mean all requests funnel through the front end, it also makes the front end a single point of failure in the system, especially since backend failures are accounted for elsewhere in the paper.

    Discussion:
    I’d be very interested in knowing how the front-end is aware of the working sets currently in the backend server cache. Given that the backends also run standard distributions of Linux, the page replacement policy would be local to each back end and a local process could very easily replace content that the front end assumes is still in cache. Also, post hand off, how does the front end know about connection termination to account for the current load on each server?

    Summary:
    The Locality-Aware Request Distribution (LARD) paper introduces a request distribution strategy that takes into account both the back-ends' cache locality and load balancing.

    Problem:
    The problem the authors are trying to solve is the large number of cache misses that occur, under the Weighted Round Robin request distribution strategy, when the size of the working set is much larger than the main memory cache of a single backend server.

    The solution they propose is to subdivide the server's working set among the back-ends while maintaining load balance at the same time.

    Contributions:

    • Increasing the effective cache size to be the sum of the node cache sizes by having a mapping between the targets and the backend servers
    • The approach of using both load balancing (as done with weighted round robin distribution) and locality-based strategies to achieve a better request distribution algorithm namely LARD
    • The introduction of LARD with replication which handles the case where a single target can cause a backend to be in an overloaded state
    • TCP connection handoff protocol which helps in client transparent connection handoff from the frontend to the backend
    • The use of the parameter “number of active connections” to measure the load on the backend servers was very important as it was used to evenly balance out the load among the backend servers
    • The process of making LARD and LARD/R CPU bound instead of being disk bound (as it was with WRR) because of cache aggregation
    Limitations:
    The frontend seems to be a single point of failure, although I could not think of a better way to distribute the requests among multiple backend servers without a front-end.

    Relevance:
    The authors have introduced the idea that a scheduling algorithm for request distribution should consider multiple objectives other than load balancing.

    Summary: In this paper the authors propose a request distribution strategy for web servers that both maintains cache locality and respects server loads. In addition, for their algorithm to work, they propose a TCP hand-off protocol that can forward an already-established TCP connection to another server, so that everything behind the front-end server remains transparent to the client.

    Problem: A single computer cannot serve websites with huge traffic, so we use clusters instead. A single front-end in the serving cluster is responsible for forwarding requests to one of the back-end servers. Previous methods maintained the load balance of the back-end servers, but those methods regard each back-end server as identical, and each is likely to receive requests about everything. This is bad for cache efficiency, because the resources needed to handle all possible requests are often too large to fit in main memory. Therefore we need a distribution strategy that takes the content of the requests into consideration and distributes similar requests to the same set of servers. In addition, since the strategy requires the front-end server to peek at the request before distributing it to any back-end server, a TCP connection hand-off protocol is also needed.

    Contributions: The main contribution of this paper is the proposal of the first content-aware request distribution algorithm. Content-aware request distribution is better than the usual round-robin-like distribution when the working set is larger than the main memory of a single machine, which is usually the case when a distributed serving cluster is used. More importantly, their algorithm also respects the load of each back-end server: even though each content target has an associated server (or set of servers), when load imbalance occurs on those servers, the front-end redirects requests to other, idle servers. According to the authors, their distribution algorithm performs two to four times better than a round-robin-like distribution algorithm, based on simulation results.

    Applicability: The simulation method used by the authors is so simplified that it is not clear whether their algorithm is applicable in practice. This is mainly due to the following concern: the algorithm requires the front-end to extract the target from the request, quickly and accurately. The authors claim this could be done by some sort of "hashing" but do not explain it in detail, and the simulation assumed the target was already given, so this part was not tested. It appears to me that this is in general a hard problem. The basic question is, how can we know the target of a request without trying to understand (process) the request? By hashing the URL we can achieve some level of target extraction, but that may not be good enough, because different targets can share the same URL, while similar targets can have very different URLs. It is especially hard to do quickly; otherwise there is no point in having dedicated back-end servers to process each request.
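    For static content the ambiguity is smaller than for dynamic content, since the requested path itself names the file. The sketch below shows the kind of URL-normalization-plus-hash step the paper might mean; the normalization rules and names are entirely my own guesses at the details, not the authors' scheme.

        # Hypothetical illustration of turning a request line into a target key;
        # the normalization rules here are assumptions, not the paper's method.
        from urllib.parse import urlsplit

        def target_key(request_line: str) -> str:
            # e.g. "GET /images/logo.gif?x=1 HTTP/1.0" -> "/images/logo.gif"
            parts = request_line.split()
            path = parts[1] if len(parts) >= 2 else "/"
            url = urlsplit(path)
            # For static files the path alone names the cached object; for dynamic
            # content one might fold the query string into the key as well.
            return url.path or "/"

        def node_for(target: str, n_nodes: int) -> int:
            # A fixed hash partition is the purely locality-based (LB) scheme; LARD
            # replaces this static mapping with its load-aware one.
            return hash(target) % n_nodes

        print(target_key("GET /images/logo.gif?x=1 HTTP/1.0"))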

    Summary:

    The paper presents a content-based locality aware request distribution technique for web workloads served by a cluster of servers; it is aimed at maximizing the overall cache usage of the cluster without compromising the proper balance of load among them. A front-end node is configured to run the LARD algorithm and perform client-transparent TCP hand-off to the chosen back-end server, which then performs a Direct Server Return of the response to the client.

    Problem:

    The problem of frequent cache misses occurs in Weighted Round Robin load balancing when the web workload is much bigger than the available main memory cache size (each back-end is likely to receive a request for any of the targets, so multiple servers end up with the same set of documents in their caches). Thus WRR does not significantly utilize the collective cache capacity of the cluster. LARD tries to address the problem in the following ways:

    • Achieve high cache hit rates and nominal load balancing simultaneously.
    • Ensure that the front-end is not a bottleneck and ensure client transparency.

    Contributions:

    • LARD algorithm ensures that effective cache size is the sum of the individual node cache sizes:
      1. This is achieved by maintaining a mapping between target->server/server set and unless the mapped server is overloaded, it processes the request thereby effectively utilizing the local caching.
      2. If the mapped server is overloaded, then the target is transferred to a lightly loaded server thereby maintaining efficient load balancing.
      3. The algorithm ensures that temporary overloading does not cause too many target re-assignments, by limiting the total number of connections handed to the back-end servers.
    • Servers are made CPU bound as LARD achieves cache aggregation.
    • A TCP hand-off mechanism is proposed which uses a forwarder to quickly forward all the ACKs received from the client (to avoid higher latency).

    Shortcomings:

    • A single front-end can still get quite overloaded and is a single point of failure. I feel that this algorithm would be more complicated to implement for a group of front-end nodes, as it would require synchronizing the target-to-server-set mapping information.
    • The paper does not say how exactly the front-end would handle specific types of connections, for example HTTPS connections; the front-end might have to be aware of application-layer details, which in turn could make it complicated and a bottleneck for the system.

    Applicability:

    Although the paper presents strong criticisms of the WRR algorithm, I feel LARD might not be applicable to present-day web servers, mainly because of the increase in memory availability over the past decade. I believe WRR is still widely deployed in many cloud services, and even with Content Distribution Networks, DNS-based load balancing seems to be in use. Apart from that, the majority of web content nowadays is dynamically generated, and caching dynamically generated content is a broader problem (whose effectiveness is still unknown), so these techniques might not work well for it.

    Summary: This paper investigates the potential of using locality to distribute requests from a front-end server to back-end servers in cluster-based network servers. Compared to the traditional distribution strategy, weighted round-robin request distribution, locality-aware request distribution can improve hit rates in the back-end servers. Locality-aware request distribution can also improve secondary storage scalability and make it feasible to specialize back-end servers, though these benefits are not explored in depth in this paper.

    Problem: The problem with the state-of-the-art weighted round-robin request distribution strategy is that only the load information of the backend servers is used to make decisions. This kind of distribution achieves the best load balance, but it is bad for the overall performance of the system: every back-end server works in isolation, so none of them can benefit from request locality, and the miss rate is high if the request space is larger than the local cache of any single back-end server. As the performance gap between CPUs and storage devices increases, the performance lost to underutilized locality also increases.

    Contributions: This paper proposes using locality when deciding how to distribute requests from the front-end server to the back-end servers. In order to inspect the content of the request, a TCP handoff mechanism was also implemented. The authors verified the performance of the strategy with simulations and tested a prototype system to confirm that the solution can be implemented with off-the-shelf hardware.

    Discussion: Generally speaking, the benefit of taking locality into account when distributing requests is obvious in theory, and the idea is useful in other contexts as well. However, the authors simplify the problem in this paper. For instance, not much detail is given about detecting the content of the request; such detail is important for using LARD on cluster-based network servers serving dynamic content. In addition, web servers nowadays are much more complicated than at the time this paper was written and need to support more complex protocols, e.g. HTTP 2.0. More effort is needed to use locality-aware request distribution in current web server architectures.

    Summary: LARD takes into account the locality of data, as well as existing load, on backend machines in order to decide where to forward a resource request. A TCP handoff protocol is also discussed, which to me seems like an implementation detail, and out of scope for distributing requests according to data locality.

    Problem: in existing schemes, such as Weighted Round Robin, requests are distributed equally (according to load) to the backend. These schemes do not take into account any specialization of the backends, but assume that all backends can service any request with the same efficacy. Although this may be true for some dynamic content, it makes less sense for static content that could already be cached in memory of some backend.

    Contributions: this system does not treat all nodes equally, but assumes that some are specialized -- some nodes are better at servicing certain requests than others. It looks at the system holistically, considering the way data is stored in memory pages on physical machines, and harnessing the speed of these pre-existing caching mechanisms to reduce request latencies. The knowledge of node specialization is placed on the frontend in this system; it does not require any complex modeling, just a mapping from resources to nodes. They assume the frontend has no overhead for forwarding requests, which I don't agree with, especially because the frontend is required to forward all the TCP packets to the backends even after having handed off the connection! They have both a simulation and a prototype, which are tested with real-world workloads -- it's nice to see how the scheme compares to others, namely WRR.

    Applicability: a lot of content on the WWW is dynamically generated, which means that this scheme for static content may not necessarily apply. It would have been interesting if the authors discussed how their scheme could be used for dynamic content -- maybe having some content hot in machines' caches would also speed up requests for dynamic content. This topic is important for Content Distribution Networks, which specialize in serving static content. An alternative is to use a reverse proxy, which is put in front of the system, and is thus transparent to the system. The implementation of LARD in a real world application may be cumbersome, especially if it has to adapt to an existing system for serving content. This is why a reverse proxy would be easier to deploy.

    This paper addresses the classic scalability problem of the Web and proposes a solution that distributes incoming requests to back-end nodes based on the content of the request.
    The LARD system consists of a front end and a number of back-end servers. Based on the content of the target request, the request is redirected to a back-end server. This technique has two benefits: a range of targets is processed by the same back-end server, maximizing that server's cache hits, while the load is simultaneously balanced by redistributing requests. LARD also ensures that the handoff between the front end and the back-end servers is completely transparent to the user. The redirection is based on a hashing algorithm that tries to partition the working-set database across the back-end nodes. This strategy is a rough approximation of the cache usage of a back-end server and may vary with different targets. To ensure that no particular back-end server is overloaded, the front-end server maintains a count of the TCP connections between the clients and each back-end server, and redirects a target to another node based on that count. I believe this might not be the optimal strategy, since there may not always be a direct correlation between the number of connections and the CPU and memory usage of the server (especially across static and dynamic web content). The system does take unequal distribution of target requests into consideration: using a cache in the front end, it tracks the occurrence of a particular target and dynamically assigns another back-end server for that target (replication). The authors also made sure that the parameters determining idle load and maximum load are picked to maximize throughput while maintaining an acceptable delay.
    Overall, LARD is a great system which would work very well for web servers serving static content; however, for dynamic content the algorithms need to be tweaked while keeping the concept of "locality-aware request distribution" the same.

    Summary:
    The authors argue that locality-aware request distribution (LARD) in cluster-based network servers is much better than a simple load-balanced approach. Their approach consists of content-based request distribution that also considers the load on the backend servers. They also design an efficient TCP handoff protocol between the front-end and back-end. They evaluate their system through simulations and by building a prototype, and show that their design attains significant improvements over the state of the art.

    The main contribution of the paper is to make the servers more CPU-bound than disk-bound by increasing the effective cache size of the cluster. In a traditional load balancing scheme, each node’s main memory cache needs to be the size of the working set for the service to be CPU bound. Using content based distribution, each server node is effectively responsible for caching only a part of the working set. This increases the cache hit percentage drastically making the servers more CPU-bound.
    The LARD algorithm also takes into account the load on each node. If a node has many requests pending to be serviced, then the time spent waiting in the queue might be more than the time for disk access. Thus, based on threshold values, the front-end of a LARD system also does load balancing by assigning requests in a non-locality based manner as well when required.
    Thus they provide an efficient algorithm for request distribution. Fault tolerance is also provided in LARD/R via replication. Their TCP handoff protocol is capable of transferring the connection from the front-end to the back-end without the client noticing. They do this through various techniques, such as avoiding the 3-way handshake, which is pretty neat. Linux added a 'tcp_repair' option to the network stack in version 3.5 to help checkpoint sockets, which does a very similar thing for very similar reasons. However, migrating a TCP connection is very involved; there is a lot of state that needs to be captured. The authors gloss over this topic without divulging much detail, which was disappointing.
    Also, the paper considers only static content; it does not address the performance of the system for serving dynamic content. Dynamic content would require code execution and disk access at the backend nodes. The authors say that dynamic content can also be cached, but the cited paper requires the applications to be modified and to handle caching themselves.
    Overall, the evaluation numbers indicate a steep performance increase in a LARD based cluster as expected for any practical workloads. They achieve better throughput, latency and resource utilization than state-of-the-art.
