
Interposed Request Routing for Scalable Network Storage

Interposed Request Routing for Scalable Network Storage by Darrell Anderson, Jeff Chase, and Amin Vahdat. In the Fourth Symposium on Operating Systems Design and Implementation (OSDI 2000).

Reviews due Thursday, 10/30.

Comments

Summary:
This paper describes the design of Slice, a storage system architecture, and explores interposed request routing within that architecture. Request routing is performed by a network element called the mu-proxy.

Problem:
The problem being solved is to design a storage system architecture that can provide scalable bandwidth and capacity.

Solution from the paper/Contribution:
The Slice architecture essentially presents clients with a virtual shared file volume. The mu-proxy, which resides on the client's network path to the storage service, intercepts requests from clients to this virtual volume. The proxy selects a target server by switching on the request type and the identity of the target file, name, or block. This spreads the request workload in a roughly balanced fashion across all servers. The mu-proxy does the switching by looking up a routing table to determine which server to switch to and rewriting the destination IP address and port number of the packets. It is a layer-5 component, so it runs above the transport protocol (TCP/UDP) and is hence associated with one of the two endpoints.
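To make that switching step concrete, here is a minimal sketch in Python of what a content-based lookup and destination rewrite could look like. The names (RoutingTable, SMALL_IO_THRESHOLD) and the hash-based server choice are illustrative assumptions, not the paper's in-kernel packet filter:

    import zlib

    SMALL_IO_THRESHOLD = 64 * 1024  # assumed cutoff: I/O at or beyond this offset is treated as bulk

    def stable_hash(key):
        # Stable across processes, unlike Python's built-in hash() of strings.
        return zlib.crc32(repr(key).encode())

    class RoutingTable:
        """Soft state mapping a request class plus file/name/block identity to a server address."""
        def __init__(self, directory_servers, small_file_servers, storage_nodes):
            self.directory_servers = directory_servers
            self.small_file_servers = small_file_servers
            self.storage_nodes = storage_nodes

        @staticmethod
        def pick(servers, key):
            # Deterministic choice, so every mu-proxy instance agrees without coordination.
            return servers[stable_hash(key) % len(servers)]

    def route(request, table):
        """Return the (ip, port) pair the mu-proxy rewrites into the packet's destination."""
        if request["op"] in ("lookup", "create", "mkdir", "remove", "rename"):
            return table.pick(table.directory_servers, request["name"])
        if request["op"] in ("read", "write") and request["offset"] >= SMALL_IO_THRESHOLD:
            # Bulk I/O bypasses the file managers and goes straight to a storage node.
            return table.pick(table.storage_nodes, (request["file_id"], request["offset"] // SMALL_IO_THRESHOLD))
        # Small-file I/O (and the initial segment of large files) goes to a small-file server.
        return table.pick(table.small_file_servers, request["file_id"])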
Their main contributions are:
- They split the storage service into directory servers, small-file servers, and bulk-I/O servers. This is a good design, as each request class can be optimized, through configuration and otherwise, for the workload it is responsible for. The directory servers enable efficient namespace operations, while the small-file servers aggregate multiple small requests into larger ones for the bulk-I/O part to process, making the system as a whole more efficient. This also enables more effective caching to improve performance. The servers are stateless and perform write-ahead logging to enable easy crash recovery.
- The mu-proxy as a network element that can load balance the requests among the servers is a good idea. Since it’s essentially a network node, packet drops are fine as the transport layer or application can handle retransmissions. Also great is that all state stored in mu-proxy is soft state.
- The architecture also allows different switching policies to be implemented on the mu-proxy to handle requests in different ways for different goals. All one needs to do is supply new logic for the switching table at the proxy. The paper explores two such policies, namely mkdir switching and name hashing.
- Providing a virtual shared volume to the clients, i.e., concealing the architecture and being transparent to the clients is also a good quality of this system.
My Key takeaway:
This paper uses fairly simple content-based routing to achieve its ends. That's a key takeaway for me. Also, 'separation of concerns' as exhibited in the Slice architecture can be a good rule to live by.

Problem:
- The paper discusses the architecture and implementation of the Slice prototype for scalable file management.

Summary:
- The paper introduces the Slice architecture, which is composed of the µproxy and network storage nodes and handles different requests using directory servers, striping policies, and small-file servers.
- The Slice prototype is scalable, delivers high bandwidth, and sustains a high request throughput.

Contributions:
- µproxy is a simple, small, and fast request switching filter that transports each request to the correct server or network storage array. µproxy is also fault tolerant, relying on other protocols to recover dropped packets.
- Network storage nodes allow cryptographic protection, which is useful if the network is insecure. The storage nodes also use redundancy for fault tolerance. This provides more protection, which is required when the clients or the network are untrusted.
- File managers are dataless and are used to provide caching and processing power to clients. This makes the Slice architecture more scalable because clients can benefit from more parallel reads and higher bandwidth as more storage nodes are added.

Confusing:
- I did not completely understand why Slice uses mirrored reading if it degrades performance. Mirroring is a fault tolerance scheme, but Slice uses mirroring so that "client µproxies [can] alternate between the two mirrors to balance the load, leaving some prefetched data unused." I think mirroring should either 1) increase read performance by enabling parallel reads, or 2) increase fault tolerance by reading both copies and comparing, which should neither alternate nor leave any prefetched data unused.

Learned:
- The Slice architecture optimizes for different loads and requests, making it very scalable because each kind of task can be routed to servers specialized for it.

Summary:
In this paper the authors describe a scalable storage system architecture named Slice. Slice exploits request routing, interposing a mu-proxy on I/O and file service traffic, and achieves high throughput and bandwidth.

Problem:
With the growing size of the internet, existing solutions (e.g., GFS, xFS, Zebra) were not scalable enough, and many of them fell short of the desired load balancing. Slice sets out to provide the key benefits of a shared-disk system even in the presence of untrusted clients. To do so, it provides a distributed storage layer on top of a fast LAN.

Contributions:

  • Unified view: provides a unified view of the storage server ensemble and supports large virtual volumes and directories.

  • mu-proxy: It is the central piece for request distribution, similar to the front end in LARD. It inspects packets and modifies contents such as the source/destination address, checksum, etc.
  • mu-proxy is easily replicated across multiple machines as it is almost stateless. It will not become a bottleneck as long as not all clients route their requests through the same mu-proxy machine.
  • Also, mu-proxy resides above the transport layer (as a layer-5 component) and can rely on the failure recovery of the layers beneath it (e.g., TCP), which allows it to have very low overhead.
  • Dividing servers by the type of request. Slice considers three classes of request forwarding: name requests, small-file requests, and bulk I/O requests. Bulk I/O requests are forwarded directly to the appropriate storage server for better bandwidth.

Confusion:
I am still confused about how the mu-proxy works and at which logical layer it resides, and about how it allows untrusted clients to operate while still providing the promised guarantees.

Learning:
Residing above the transport layer allows a protocol component to provide better resource distribution with low overhead while also inheriting the recovery guarantees of the lower layers. It is almost stateless and easily replicated.

SUMMARY: The paper describes the idea of a "microproxy" (uproxy from here on out...) which interposes between client and server, and their implementation called "Slice".

PROBLEM: Scaling storage can be difficult, as you ideally wish a disk to appear as an infinite resource in terms of both size and bandwidth. By virtualizing the underlying storage, you can add more disks to an existing system to address either of these issues, and doing so in a manner that is transparent to the client is what Slice addresses.

CONTRIBUTIONS: They show that creating an overlay storage service on top of commonly used services like NFSv3 is possible without affecting performance significantly. They show that the use of interposition agents can make transparent access to a system possible (i.e. the clients do not need to change to use the service or be aware of what is going on "behind the scenes") and that this can be done in the network layer instead of in the software stack itself. They also switch on the type of request being made (namespace operations, batching of small operations, and bulk I/O) to optimize each case.

DISCUSSION: What I found confusing was that I am unsure whether all the packet rewriting would work well on modern hardware, which does a lot of packet inspection for security purposes and may consider the uproxy agent to be misbehaving. What I learned was that scaling can be done not just "at" nodes but "between" nodes in a system by interposing at the network layer. I knew that many systems existed to do load balancing, but I never thought of them as being part of the network.

Summary:
This paper explores interposed request routing in Slice, a new storage system architecture for high-speed networks incorporating network-attached block storage. Slice interposes a request switching filter called the uproxy along each client's network path to the storage service, and traffic is distributed to different servers.

Problem:
A successful storage system architecture must scale to meet growing demands while placing a premium on administration costs. A simple request switching filter can virtualize a standard network-attached storage protocol, incorporating file services as well as raw device access, and obtain scalable bandwidth and capacity.

Contributions:
(1) The uproxy, as an IP packet filter, can redirect requests to servers based on the functional request class. The switching achieves a balanced distribution of file objects and requests across the servers and improves locality in the request stream.
(2) The high-volume I/O traffic is routed directly to the network storage array, bypassing the file managers. Small-file servers absorb and aggregate I/O operations on small files.
(3) The uproxy is decoupled from the client-side file system, providing compatibility with standard file system clients.
(4) Provides mkdir switching and name hashing as namespace routing policies.
(5) Uses a write-ahead log (WAL) for recovery (see the sketch below).
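As a rough illustration of the write-ahead logging idea, the following Python sketch logs each intended update durably before applying it, and recovers by replaying the log. The class and the JSON log format are assumptions for illustration, not Slice's on-disk layout:

    import json, os

    class FileManager:
        """Toy 'dataless' manager: metadata lives in memory, intentions go to a log first."""
        def __init__(self, log_path):
            self.log_path = log_path
            self.state = {}  # e.g., name-to-object or block-map entries, rebuilt on recovery

        def _apply(self, op):
            self.state[op["key"]] = op["value"]

        def update(self, key, value):
            op = {"key": key, "value": value}
            # 1. Durably log the intention before touching the in-memory state.
            with open(self.log_path, "a") as log:
                log.write(json.dumps(op) + "\n")
                log.flush()
                os.fsync(log.fileno())
            # 2. Only then apply it.
            self._apply(op)

        def recover(self):
            # Roll forward by replaying the log; replaying an op twice is harmless
            # because these updates are idempotent.
            self.state = {}
            if os.path.exists(self.log_path):
                with open(self.log_path) as log:
                    for line in log:
                        self._apply(json.loads(line))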

Learned:
The different functional classes are interesting. High-volume I/O and I/O for small objects are directed to different servers. In most systems, disk access can be the bottleneck, and efficient access patterns such as sequential access can provide considerable performance improvement. Separating the streams to different servers is a relatively easy and efficient approach.

confusion:
The uproxy can see a lot of information in the packet and rewrite some of it. Will this lead to security problems?

Summary:

In this paper the authors propose interposed request routing in Slice, a new storage system architecture for high-speed networks incorporating network-attached block storage. The μproxy, a request switching filter, is interposed along each client's network path to the storage service. The μproxy intercepts request traffic and distributes it across a server ensemble. The authors propose request routing schemes for I/O and file service traffic. The implementation details and performance benchmarking of Slice are also presented in this paper.

Problem:

The internet was growing rapidly, and hence the demand for large-scale storage services was also growing rapidly, especially for scalable computing, multimedia, and visualization. Therefore a successful storage system architecture must scale to meet this demand while keeping the cost to administer and upgrade the system comparatively low. At that time the bandwidth gap between storage area networks (SANs) and LANs was narrowing due to recent advances in LAN performance. Devising a distributed software layer to unify the decentralized storage resources is thus the key challenge this paper tries to address.

Contributions:

  • Distributing request traffic across a collection of storage and server elements that cooperate to present a uniform view of the shared file volume with scalable bandwidth and capacity using the μproxy.
  • Proposes the architecture of Slice protocol followed by its implementation
  • Using functional decomposition and data decomposition to manage the workload.
  • Proposes and evaluates request routing policies within the architecture. Two policies for transparent scaling of the name space of a unified file volume were presented (MKDIR Switching and Name Hashing).
  • The Slice prototype was evaluated using synthetic benchmarks including SPECsfs97 to demonstrate scalability and compatibility with the NFS V3 standard.

Learned:

It was interesting to learn how Slice manages different workloads intelligently and efficiently, distributing the load effectively to enable scaling.

Confusion:

Reconfiguration, especially for name hashing systems, was a little confusing to me.

Summary
The paper describes Slice, a new architecture for network storage that allows storage to scale out easily.

Problem
As the number of back-end nodes in a SAN increases, there has to be a way to unify the decentralized storage resources so that the capacity of the network backbone and storage can be utilized efficiently.

Contributions
1. Interposed Routing - A neat way of redirecting file system requests to specialized servers by interposing on them at an intermediate programmable switch using uProxy.

2. This allows the underlying file system to avoid additional complexity when extra nodes are added or when load (I/O requests) increases.

3. Functional Decomposition - uProxy redirects requests based on the type of request (name lookup, read/write) as well as on the size of I/O requests, indirectly acting as a load balancer and also reducing request latency.

4. Data Decomposition - Policies such as striping, mirroring, and file placement can be added and implemented in uProxy itself.

Comments
What I learnt: Adding complexity in a higher layer (such as various policies) to help the underlying layer to scale.

What I didn’t understand: Why did uProxy have to be a layer-5 protocol component?

Summary
The authors describe their motivation for, and the design and implementation of, Slice, a scalable network storage system. Slice is a set of loadable kernel modules on the file servers, along with a proxy that interposes on NFS client traffic and reroutes requests to the servers best suited to handle them. This decoupling of requests provides a way to scale each workload type.

Problem statement
Scalable data storage is the need of the day. As network speeds have significantly increased, it is now possible for network storage devices to have competitive speeds. Being able to build a network storage device from inexpensive components provides a cost-effective path to scalable data storage. Slice tries to build a cost-effective, scalable, decentralized data store with competitive speed.

Contributions
1) They interpose on a standard file system to build a highly scalable network storage.
2) They distributed responsibilities. They divided the functionality among servers and scale each function by adding more servers for it.
3) The client side is unmodified. If we need to scale for another workload, this can be done by changing the proxy and adding file servers with specific modules.
4) They try to balance load and capacity by distributing the namespace evenly across multiple servers.
5) Recovery is done through write-ahead logging by the coordinator and proxy and is independent of the clients, servers, and storage nodes.

Confusing
It is unclear how access control is implemented.

Learning
Scalability can be achieved by divide and conquer strategy.
Soft state and stateless machines are easily recoverable.

Summary
The paper discusses the implementation and performance of Slice, a storage system architecture that incorporates network-attached block storage.

Problem to solve
To build a storage system that scales to meet increasing storage requirements, mainly for web-based applications with a large client base over the internet. Traffic needs to be distributed among storage resources, which requires devising a distributed software layer that is able to unify the decentralized storage resources.

Contributions
Implementation of a request switching filter named uproxy as an IP packet filter, used to divert traffic based on the type and size of the request from the client.

Request switching is performed based on content. This provides a better distribution of files across the available servers. The uproxy filters out large-file I/O operations, which get serviced by the network storage arrays. Smaller files and namespace handling requests are processed by intermediate managers.

Namespace requests are distributed using one of two policies, mkdir switching or name hashing, each capable of distributing namespace load across servers with its own tradeoffs.

Use of write-ahead logging for recovery purposes.

Confusing points
For mkdir switching, how is the value of the probability p selected? Also, I didn’t understand what the metric is for determining whether a file is large or small.

Learnings
I learned how the system uses content-based distribution to spread the load effectively with the help of the underlying network layer.

Summary:
The paper discusses the architecture and implementation of Slice, a system for scalable NAS. The main feature is the µproxy which interposes on all requests from client to a virtual NFS server, rerouting requests among the storage nodes based on request function and target data.

Problem:
Growing data centers hosting web applications require large-scale storage services. Both highly scalable bandwidth and capacity are desired as the usage of these web applications grows.

Contributions:
- µproxy routing of requests based on function: high volume I/O to storage nodes, namespace operations to one set of file managers, and small I/O to another set. This allows specialized treatment of each type of request which may have different caching or data layout properties, as well as allowing the separate functions of the system to scale independently by adding resources where needed.
- µproxy can also distribute requests evenly across nodes for load balance, or use the data information in the request (which block, or object+offset) to stripe data deterministically across storage nodes (see the sketch after this list).
- Because µproxy is “part of the network”, individual request failures can be treated as dropped packets so correctness is not affected.
- File managers only maintain a cache, and are otherwise stateless. Important information is logged to storage nodes.
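A small sketch of what deterministic striping at the µproxy could look like; the block size, mirror count, and rotation scheme are assumptions for illustration, not the paper's parameters:

    import zlib

    BLOCK_SIZE = 64 * 1024   # assumed stripe unit
    MIRRORS = 2              # replicas per block when mirrored striping is enabled

    def stripe_targets(file_id, offset, storage_nodes, mirrored=False):
        """Map (object, offset) to the storage node(s) holding that block, with no per-file state."""
        block = offset // BLOCK_SIZE
        n = len(storage_nodes)
        primary = (zlib.crc32(str(file_id).encode()) + block) % n  # rotate the start node per file
        if not mirrored:
            return [storage_nodes[primary]]
        # Mirrored striping: place each replica on a distinct node.
        return [storage_nodes[(primary + i) % n] for i in range(min(MIRRORS, n))]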

Learned:
Both functional and data routing ideas together seems interesting. It provides the backend systems with just the requests they best know how to work with.

Confusing:
It seems this wouldn’t work well (particularly data striping) if the backend storage nodes change semi-frequently. The discussion of µproxy reconfiguration was short.

Summary:
Slice is a scalable storage system that achieves scalable network-attached block storage by interposing on request routing. The system deploys a request switching filter, the µproxy, to distribute load across different servers while maintaining a transparent view of the shared file system. Decomposition at the functional level (i.e., based on request class: small I/O, bulk I/O, and namespace requests) and at the data level helps distribute requests evenly and allows the different workloads to scale independently.

Problem:
Increasing storage demands need to be met by scaling the storage systems. Many systems interconnect storage devices and servers through a SAN. The idea is to come up with a way to unify these distributed storage nodes.

Contributions:
1. µproxy, a request routing filter to distribute requests based on their type, target objects and decomposition of request traffic.
2. High volume i/o is directly routed to storage nodes improving scalability.
3. Balanced distribution of requests and file objects across the server nodes, thereby improving locality.
4. mkdir switching and name hashing to distribute namespace load across servers.
5. Compatible with existing file clients.
6. Dataless file managers journal their updates in a write-ahead log, which helps in faster recovery.

Learning:
My main learning outcome is how effective routing based on request type, target objects, and decomposition helped in scaling a distributed storage system. The design was similar to LARD in many ways, which made it easy to understand.

Confusion:
The paper describes how µproxy can rewrite the source IP and destination IP and modify packets (something done at layer 3). How is it a layer-5 protocol (as mentioned in Section 2.1)?
How does the role of µproxy change when the network is insecure? If cryptography is implemented, I guess µproxy has to decrypt and re-encrypt the source address, destination address, etc. How much overhead does this incur, and how does it affect the overall latency and scalability of the system?

Summary:
Slice is a scalable distributed network storage service that exposes an object storage model to the clients. The architecture has a uproxy, an IP packet filter that routes each request based on its request type (functional decomposition) to the file managers, which in turn route it to the specific storage server where the block resides (data decomposition). The uproxy rewrites the destination address in the IP packet to point to the server chosen for the request type. The file managers store the block maps and other structures such as logical ID mappings to servers. Based on the request type, all namespace operations go to directory servers, small I/O is forwarded to the small-file servers, which coalesce small-file disk reads for efficiency, and large-file I/O is forwarded directly to the storage node.

Problem:
Many scalable storage systems combine name space operations and file operations in a single service, which does not scale well. Some systems separated either the bulk I/O operations or the name space operations, but not both. The shared-disk model, which layers the file system over network storage, keeps all the policies in the file system while the storage just provides block-level access. Slice addresses the problem of separating both the bulk I/O operations and the name space operations, using an IP packet filter called the uproxy where all the policy decisions are made.

Contributions:
1. It distributes the file service across various servers with the help of uproxy, based on request type.
2. Bulk I/O can access the storage directly.
3. Small-file servers aggregate I/O operations on small files and handle them.
4. Since all the changes are in the uproxy, client needs not be changed.
5. A fixed threshold offset is used to distinguish between small and large I/O operations.
6. Name space operations are scaled by mkdir switching and name hashing.

Confusing:
I didn't understand the fixed placement policy; by fixed placement, did they mean routing the request to the same file manager, or the block allocation policy in the storage nodes?

Learned:
This paper again applies separation of concerns to reduce the burden on the file system of maintaining all the operations.

Summary:
Slice is a storage system architecture designed for scalability and performance. The key idea of Slice is to inject an intermediate routing system between each client and the file storage so that the requests can be redirected to a collection of storage servers while providing a unified view.

Problem:
Previous solutions were not highly scalable, especially with rising internet usage, and they were not able to achieve the needed load balancing. Also, LAN performance had improved enough to be used as the storage infrastructure, but this required a distributed software layer on top of it.

Contributions:
• Slice represents a unified collection of storage server systems. It can support large virtual volumes and directories.
• Uproxy is the central piece of request distribution (just like the front end in LARD). It checks the packets and modifies the contents such as source/destination address, checksum etc.
• An important benefit of uproxy is that it resides within the network, so existing recovery techniques (such as TCP) can be relied upon. This also allows it to keep only soft state and hence have very low overhead.
• Slice considers 3 types of request forwarding: name requests, small file requests and bulk I/O requests. Bulk I/O requests are directly forwarded to the appropriate storage server for better bandwidth.
• Directory servers and small-file servers provide support for dedicated requests for faster performance but are backed up by the same backend storage nodes.
• Slice nodes are object-based, and requests are referenced by logical offsets in the objects. The uproxy converts incoming requests into this format accordingly.
• uproxy can be implemented as an IP filter and is lightweight since it only modifies a small portion of each packet, so the checksum recalculation is cheap (see the sketch after this list).
• The file managers use a write-ahead log for recovery.
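To illustrate why rewriting a few header fields is cheap, here is a sketch of incremental one's-complement checksum updating in the style of RFC 1624. This is an assumption about the kind of technique such a filter could use, not code from the paper:

    def ones_complement_add(a, b):
        s = a + b
        return (s & 0xFFFF) + (s >> 16)  # fold the carry back into the low 16 bits

    def update_checksum(old_checksum, old_field, new_field):
        """New 16-bit header checksum after one 16-bit field changes (RFC 1624: HC' = ~(~HC + ~m + m'))."""
        acc = ~old_checksum & 0xFFFF
        acc = ones_complement_add(acc, ~old_field & 0xFFFF)
        acc = ones_complement_add(acc, new_field & 0xFFFF)
        return ~acc & 0xFFFF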

Confusion:
I did not fully understand mkdir switching; it talks about probabilistically choosing the directory server in order to balance load. How would this help, and won't it rather tend to place the new directory on the same server as the parent directory?

Learning:
I learnt how scalable network storage could be built on LAN-based infrastructure. It’s interesting how Slice managed to use the benefits of the underlying network layer for good performance while also being a small, lightweight module that can be plugged in below the system’s IP layer.

Summary:
This paper proposes Slice, a unified file service for network storage with scalable bandwidth and capacity. It employs a proxy that interposes on NFS traffic and reroutes packets to the corresponding servers. The assignment of packets is based on workload: large files, small files, and directory operations. It scales using cheap LAN hardware.

Problem:
People need large-scale storage services with high bandwidth and capacity, built from cheap hardware and network connections, and presenting a transparent view of the whole file system. NFS with a Storage Area Network fails to achieve all of the above; one can instead use a LAN as the storage backplane and still retain scalability.

Contributions:

1. The idea of the uproxy, which is a proxy server. It reads the content of network packets and redirects them to appropriate back-end servers based on their workload: high-volume I/O (large files) to the storage array (which is optimized for large I/O operations), and small files and directory operations to the appropriate servers.

2. Load balancing: Slice redirects packets so that requests are assigned evenly to the servers (similar to how the front end in LARD builds a mapping to achieve load balancing).

3. Separation of the file service into data and "dataless" services, so the modification of each part is simpler and the recovery process is also simpler.

Learned:

The idea of separation and of adding a layer: separation of the file service from the "dataless" services, and the new uproxy layer on top of NFS, which provides cheap scalability to NFS clients.

Confused:

I am still concerned about the security of allowing a proxy to intercept and redirect the content of packets. If the uproxy is controlled by an attacker, who intercepts the packets to gather information and maliciously redirects them, does Slice have some protocol to prevent that?

Slice

Summary: Requests to a storage server are interposed on by the µproxy and handled separately based on the request type, by routing each to the particular file manager that handles those requests. Slice's µproxy is placed on the network path between the client and the storage service. The purpose of building this system is to let clients that use standard distributed file system protocols like NFS access Storage Area Network (SAN)-like functionality over the LAN.

Problem:
The growth of the internet over the past few years makes scalable storage systems necessary to handle large amounts of data. Also, advances in LAN technology can be used to build a scalable distributed storage system comparable with Storage Area Networks. In addition, while introducing this software layer for distributed storage, the clients shouldn't be aware of the internal details and should think they are still in touch with only one server. These requirements gave rise to a system like Slice, which performs request routing by interposing the µproxy between the client and the storage server.

Contributions:
1. The idea of providing functional decomposition by handling name space requests and small file operations separately for using the locality and caching was a good contribution.
2. The idea of handling bulk I/O separately, without going through the file managers, helped prevent unnecessary overhead on the file managers. Instead, bulk I/O accesses the storage nodes directly.
3. The idea of data decomposition using name routing, striping policy and file placement policy helped to place the data and request workloads to be balanced across all servers.
4. The volume partitioning, mkdir switching, and name hashing helped Slice distribute the load of a single file volume across multiple servers.
5. The aggregation of small I/Os is done to improve performance.

Things I didn't understand:
They say the µproxy lies at network layer 5, which I thought was the session layer, yet later they mention that the µproxy is placed below the IP layer.

Things I learnt:
I learnt that multiple types of requests can be handled differently to maximize the benefits for each type of request. I also learnt that it is possible to provide a SAN-like architecture without modifying the existing clients.

Summary:
Slice is a highly scalable network storage system architecture which provides NFSv3 clients a unified shared file volume (virtualization) with scalable bandwidth and capacity. The uproxy packet filter can reside in programmable switches, network adapters, or in the client or server’s interface. Excellent scaling is made possible with the aggregation of cheap LAN network components and virtualization of network storage.

Problem:
Increasing demand for massive-scale network storage, combined with the increase in popularity of web-based applications, created the need for a system that could satisfy both of these needs. By attaching more storage directly to the network, both bandwidth and capacity can be improved. Also, recent improvements in LANs (narrowing the bandwidth gap with SANs) made the new Slice system architecture possible.

Contributions:
Slice system design:
-With Slice, the key system component is the uproxy packet filter which sits between the NFSv3 client and the network directory servers, small files servers, and the network storage array.
-The uproxy packet filter redirects read, write, and name space requests according to system policies (e.g. file placement policy, striping policy, bulk I/O etc).
-Block-I/O Storage Nodes: Bulk I/O operations route directly to storage nodes. Static striping and mirrored striping are supported.
-Small File Servers: I/O requests below the 64KB threshold go here. Also, I/O operations on small files are absorbed and aggregated (i.e., batched) to improve scalability (a sketch of this batching follows this list).
-Directory servers handle name space operations such as lookup and create.
-Load Balancing: mkdir switching and name hashing both help distribute namespace load across servers.
-Failure Tolerance: intention logs can be replayed by coordinators (roll forward, rollback).
-The benchmark tests demonstrate scalability and performance
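A toy sketch of the small-file batching idea; the class name, callback, and flush threshold are illustrative assumptions, and the real small-file servers buffer and lay out data far more carefully:

    FLUSH_BYTES = 64 * 1024  # assumed threshold for issuing one aggregated write

    class SmallFileServer:
        def __init__(self, bulk_write):
            self.bulk_write = bulk_write   # callback that issues one large write to the storage nodes
            self.pending = []              # buffered (file_id, offset, data) small writes
            self.pending_bytes = 0

        def write(self, file_id, offset, data):
            self.pending.append((file_id, offset, data))
            self.pending_bytes += len(data)
            if self.pending_bytes >= FLUSH_BYTES:
                self.flush()

        def flush(self):
            # One aggregated write to the storage array instead of many small ones.
            if self.pending:
                self.bulk_write(self.pending)
                self.pending = []
                self.pending_bytes = 0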

What I Found Confusing:
In section 3.2, I didn’t understand aspects of the shared hash chain… Does the shared hash chain help resolve conflicting operations? Why is it important for it to be resident in memory?

What I Learned:
Transparent proxies at a low level in the network can provide content based routing, batching of I/O requests and other policies (like name hashing) to balance load, and increase system scalability.

Summary and Problem

This paper discusses the design and implementation of interposed request routing using the μproxy in the Slice system. This work tackles the problem of providing scalable bandwidth and capacity in storage area networks while presenting a uniform view of the volume to the clients. The paper introduces a component called the μproxy which, at a high level, interposes on the requests from the clients and routes them to appropriate servers.

Contributions

1. The design and implementation of the μproxy - The μproxy routes clients' requests by looking at the request type. Bulk I/O is routed directly to storage nodes. Small I/O and namespace operations are routed through file managers. The routing tables are compact. All of this state is soft, and packet drops and missing entries will not affect correctness, which is very similar to network switches.

2. File managers for small I/O and namespace operations - The file managers do not store any data themselves but take care of caching small files, buffering, and providing computation for cache lookups. They also help in crash recovery scenarios by logging their updates. They store data on the back-end storage array. They are otherwise stateless.

3. The paper also describes various routing policies that can be implemented in this architecture and their impact on system structure.

What I didn’t understand

I didn't understand the following: 1. It is not clear whether μproxy is violating the End-To-End principle of system design or not. 2. It is not clear how object stores provide cryptographic protection inherently which facilitates the "outside" placement of μproxy.

What I learned

I learned that properties like scalability can be achieved using very simple techniques like content based routing - especially for file system workloads. I also learned that it is possible to do so in completely client transparent manner.

The paper proposes an interposed request routing architecture in a network storage system which uses an intermediate packet filter between the client and the server that distributes the requests amongst the storage nodes according to the parameters of the request for achieving high scalability.

Contributions :

1. Content-based request switching that routes requests based on the file workload as well as the type of request and the files accessed.
2. Use of directory servers and small file servers to handle specific functions. Small file servers aggregate multiple requests on small files and thus handle efficient access of small objects.
3. Transparent to the client since the proxy rewrites the source and destination addresses of the packets in the request and the response accordingly.
4. For large files, it uses a mirrored striping strategy to replicate blocks on multiple storage nodes for fault tolerance and low latency.
5. Specific name space routing policies, namely mkdir switching and name hashing (a sketch of both follows this list). Mkdir switching uses a probability-based approach to redirect a mkdir request to a different server to achieve load balance. Name hashing distributes namespace operations based on a hash function of the file name and its parent directory.
6. Write Ahead Logging used by the file managers to recover from failure.
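A minimal sketch of the two namespace routing policies described in item 5 above; the switching probability and the choice of hash function are illustrative assumptions, not values from the paper:

    import hashlib
    import random

    MKDIR_SWITCH_PROBABILITY = 0.1  # assumed chance that a new directory moves to another server

    def mkdir_switching(parent_server, servers):
        """With small probability, place a new directory on a randomly chosen server."""
        if random.random() < MKDIR_SWITCH_PROBABILITY:
            return random.choice(servers)
        return parent_server  # otherwise keep locality with the parent directory

    def name_hashing(parent_dir_id, name, servers):
        """Route a name operation by hashing the (parent directory, name) pair."""
        digest = hashlib.sha1(f"{parent_dir_id}/{name}".encode()).digest()
        return servers[int.from_bytes(digest[:4], "big") % len(servers)]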

Confusion :
The section about the implementation of the small file servers was not clear to me.

Learning :
The idea of configuring a packet filter in the network of a network-attached storage system for effective request distribution and load balancing.

Summary,
In this paper the authors discuss a request interposing mechanism that intercepts client requests to a virtual storage server and transparently routes them to a set of physical servers. The system also interprets and changes the responses from the physical servers to make them look like responses from the virtual server.

Problem,
In this work the authors aim at addressing the problem of seamlessly scaling and reconfiguring the storage in a storage system transparent to the client using the system. The authors achieve this by presenting the storage system as a single virtual server to the client and then interpreting the client requests to this virtual server and routing them to physical servers.

Contributions,
The authors have designed uproxy to intercept and transparently alter request packets and route them to the physical servers, much like a NAT.
The ability to handle small-file requests and filesystem metadata requests separately from I/O-intensive requests on large files.
The ability to route requests with only the soft state available at the uproxy router component, without requiring time- and bandwidth-intensive bookkeeping.
Mechanisms like mkdir switching and name hashing to achieve better load distribution for name space related operations.

Learned,
The technique of categorizing the workload into different classes and handling them separately to improve efficiency.

Confused,
The router component works below the IP layer and doesn't maintain much state other than the routing table; how will it handle IP fragmentation? Does it mean that the client, the routing component, and the storage system should be within a single network boundary?

Summary
This paper explores interposed request routing in Slice, a new storage system architecture for high-speed networks incorporating network-attached block storage. Slice uses a switching filter, the uproxy, to route each request to the appropriate server and provide a unified shared file volume based on classical NFS.

Problem:
Demand for large-scale storage services is growing rapidly; a prominent factor driving this growth is the concentration of storage in data centers that host web-based applications.
LAN performance is much closer to SAN performance than before, which means the LAN has a better chance of serving as the storage backplane. In this case, a way to unify decentralized storage resources is necessary.

Contributions:
1. A new architecture for network storage based on interposed request routing is proposed. The request router, called the uproxy, functions as a request switching filter inside a programmable switch, network adapter, client-side module, or service interface. The uproxy distributes request traffic to directory servers, small-file servers, or network storage nodes directly.
2. The request routing scheme spreads the data and request workload in a balanced fashion across all servers. It separates small-file accesses from large-file ones so that metadata-related accesses can be faster.
3. Uses the write-ahead log to help with recovery.
4. Provides a unified view of a shared file volume to NFS clients.

Confused:
uproxy will modify the IP address in the network packets. Will this expose some security issues, like man-in-the-middle or DoS attacks?

Learning:
The thing I learned is that using proxies can help distributed file systems achieve a more balanced access load on each server, redirect requests to their destinations, and increase the performance of the system.

Summary:
This paper introduces a request router called the µproxy, which is part of a network storage service named Slice based on the NFS protocol. Combined with caching, the µproxy efficiently redirects requests from clients and responses from the storage and directory servers to decrease latency. It does this by rewriting the IP headers of the requests and responses it receives, so that the request router itself appears to be an NFS server or client.

Problems:
The problems the authors were trying to solve were in the context of scaling an NFS-like service for ever-growing demand. Much like Facebook’s “mcrouter”, this needs to be fast and contain as little state as possible so that it can serve as many nodes as possible downstream and upstream. Lastly, NFS is a simple design, so the µproxy should not add too much unneeded complexity in order to route these requests intelligently.

Contributions:
The contributions stated up front are quite clear:
• Interposed request routing for informed proxying by doing functional decomposition and data decomposition of traffic.
• Two policies for transparent scaling of the namespace which include:
o “mkdir switching” or storing directories on different servers regardless of the parent directory thereby simplifying load and storage balancing
o “name hashing”, which is similar to consistent hashing
• This request router is unique in that it doesn’t just proxy requests based on some packet header, but acts as an endpoint and rewrites packet headers. In doing so it acts as a man-in-the-middle of sorts. Since it can be trusted, this design would allow the µproxy to route encrypted requests running over SSL/TLS, etc.

Unclear:
It was unclear to me why this service needed to be separate from the client making requests. As stated the µproxy could run on the client host to route requests. The only benefit I could see is if adding an additional network hop cost was less than the overhead of proxying the request on a separate host or network device.

Learned:
I learned about “mkdir switching” and its usefulness in balancing storage evenly across servers, such that if a deeply nested path has several large files they aren’t all placed on a single server.

Summary: This paper presents a system built around a component called the uproxy, which intercepts the packets sent by NFS clients to create a virtual NFS server. The uproxy can reroute packets and distribute requests to make the file system scalable.

Problem: I don't know precisely how NFS works, but according to this paper, the original NFS failed to meet one or more of the following: (1) large-scale scalable storage, (2) cost-efficient hardware and architecture, (3) presenting the client a unified view of the entire file volume, (4) balancing request loads.

Contributions:
1. Implementing a proxy by intercepting NFS packets. By doing this, clients can keep using the standard NFS client without making any changes.

2. Having different servers handle different requests. They have a directory server that handles naming requests, a small file server that handles small file IOs, and a storage array that handles bulk IOs. Bulk IOs will be directly routed to the storage array without going to file managers.

3. Routing requests in a way that distributes them evenly across servers.

4. Recovery by making the file manager write operations to a write-ahead log.

Things I learned: How to organize a scalable file storage system. It has a directory server, a small file server, and a storage array.

Things that confused me: If a bulk IO happens, it will be directly routed to the storage array, bypassing the file manager. However, only the file manager keeps an operation log. How do we recover from bulk IO failures?

This paper introduces a new storage system architecture that provides a uniform view of a shared file system implemented using multiple servers. It does this by interposing on normal NFS traffic and routing packets in a content-aware manner.

Contributions:
1. The main contribution of this paper is to provide a unified file service just by intercepting packets and routing them according to their functional class. This allows a separate policy to be used for each class of request in the file system. For example, block I/O servers handle large-file I/Os, so they can be optimized for large read and write operations. This technique is quite similar to LARD, which also contains a table mapping URLs to back-end servers.
2. The uproxy is mostly stateless (it contains some mappings, but doesn't own them) and doesn't have to do anything if packets drop. This quite simple request switching technique allows many storage servers to be used to provide a unified file service.
3. Directory servers and small-file servers themselves use the network storage array (large-file stores) to store their data. The role of these servers is to provide separate computation for specific operations, aggregate small pieces of information into a large block, and then store it on some storage node. However, this also makes it difficult to synchronize multiple accesses to the same information.
4. All of these operations are transparent to the user.

Flaw: I find the atomicity and recovery policies in this architecture to be quite messy. They are heavily dependent on the routing policy used by the uproxy to route namespace operations (mkdir switching, name hashing).

Confusion: The authors mention that the uproxy is a layer-5 protocol component, which means it has to be able to read and rewrite network packets and extract information out of them. This means it should reside on nodes intelligent enough to do these tasks. I am skeptical whether network switches and routers would be able to do that. If not, the uproxy has to be put on some server node, which means that clients send their packets to these servers, which then forward them. Wouldn't this increase the round-trip time for namespace and small-file operations?

Learned: I learned that distributed system functionality can also be implemented at a low protocol level. You don't need programs running on specialized servers to distribute your data across many servers.

Summary:
This paper introduces a distributed file system, Slice, which manages block storage devices attached to a high-speed network to form a unified storage service. Slice provides compatibility with NFS clients through a request switching filter - called a uproxy - along each client’s network path to the storage service.

Problem:
- The LAN is now fast enough to be a storage backplane; we can attach storage devices to the LAN, but we need to devise a distributed software layer to unify the decentralized storage resources.
- The software layer needs to distribute the request traffic across a collection of storage and server elements to present a uniform view.
- Need to support different clients (or compatible with existing system).

Contributions:
- The idea of using the uproxy to intercept request traffic and distribute it across a server ensemble. The uproxy is actually a packet filter that runs as an intermediary between a client and the server ensemble; the important thing is that it preserves compatibility with NFS clients.
- The idea of separating the data service from the dataless services; such separation makes it easier to scale either of them.
- Implementing different servers to handle different workloads - directory servers handle name space operations, while small-file servers handle read and write operations on small files and the initial segments of large files. Slice also provides an interface to access storage directly for large-file reads and writes.

Learned:
This paper discusses many topics about distributed file systems, e.g., separating the name service from the block storage service, layering the file system functions above a network storage volume using a shared-disk model, distributing file objects and requests across servers in a balanced way, and improving locality in the request stream. The most impressive point for me is the uproxy implementation, which provides compatibility with NFS clients nicely.

Discussion:
What confuses me is the way the information about the servers is deployed to the uproxy. Is it possible to add/remove network storage nodes on the fly? If the uproxy maintains very detailed information about the other servers, it will be a big challenge to distribute that information to all the uproxies consistently, especially considering that failures might happen on some servers. The authors only mention that the uproxy may load new routing tables lazily from an external source. I am not clear about how much information needs to be maintained in every uproxy.
I am also confused about the placement of the uproxy. In theory we can place the uproxy at any point in the network path between the client and the server ensemble, but every uproxy can support only one client (in my understanding), so it’s not practical to place it in a network switch that is on the path of many NFS clients to the server ensemble, not to mention that it’s not so easy to install software in the switches. So, in my opinion, the best place for the uproxy is the same machine as the NFS client.

Summary
The Slice system provides a routing interface that enables existing distributed file systems to work on top of SANs. The Slice μproxy is analogous to LARD’s front end, in the sense that it intercepts packet traffic, inspects the content, and makes routing decisions based on the request. In doing so, the Slice system is able to effectively partition its requests into namespace-based operations, accesses to small files, and bulk I/O requests, and forward them to different servers, thereby improving request locality in the first two cases as well as avoiding extra hops for the bulk I/O requests.

Problem
SANs provide a large amount of block storage in the back end. To enable existing file system clients that speak known protocols such as NFS to work unchanged, and yet scale well is the challenge being addressed in this paper.

Contributions

  • μproxy that sits in between clients and the filesystem, routing requests to specific servers that handle different kinds of functionality in the filesystem. This not only allows the system to scale, but also to perform better because request locality is preserved.

  • The choice of placement of the μproxy on the network stack enables failures to be handled in exactly the same way as packet/request loss.

  • The related work describes that some research groups were already looking into separating file management from data access, and others were looking into separation based on I/O size; this system effectively unifies both of these ideas.

What was learned
Also learned from LARD: the ability to inspect incoming requests can enable specialized handling that both optimizes performance and provides the ability to scale.

What was confusing
The entire section 3.3 - Storage Service Structure

Summary:
In this paper the authors present Slice, a new architecture for scalable network-attached storage, using interposed request routing to provide a unified shared file volume with scalable bandwidth and capacity. Slice employs a proxy that intercepts and transforms request packets and routes them to the appropriate servers based on workload components (large files, small files, namespace/file attributes). The results from their prototype show that the architecture scales even when using a general-purpose LAN as the storage backplane.

Problem:
With the growth of the internet and the widespread use of web-based applications, the demand for large-scale storage services that are highly scalable in terms of bandwidth and capacity is growing rapidly. At the time, prevalent systems used a dedicated storage area network to enable incremental scaling of bandwidth and capacity by attaching more storage to the network. Here the authors provide a new architecture that also scales incrementally in terms of bandwidth and capacity but uses the general-purpose LAN as the interconnecting backplane.

Contributions:
-The main idea of the architecture is to use the uproxy to redirect requests to the appropriate servers, with high-volume I/O (large files) redirected directly to the storage servers.
-The flexibility to place the uproxy anywhere in the network, thanks to its limited soft-state requirements.
-Distributing traffic with different workload to different servers specialized for that workload.
-The idea of dealing with packet loss by using the available transport-layer protocol.
-The principle of "dataless" file managers to achieve a simple recovery process.
-Aggregation of small I/O for better performance.

Confusing:
I have a doubt regarding the namespace routing policies; the authors describe two policies, mkdir switching and name hashing. Are both policies implemented simultaneously, with the decision to choose between them based on whether the workload has large or small directories at runtime, or is only one configured during setup and used for all directory-size workloads?

Learned:
I learned that the interconnecting network, which is a part of the distributed system, can be used in the design for more than the simple low-level operation of delivering packets in order to build a scalable distributed system. Here, using the well-known concept of proxies in networks, the authors have built scalable network-attached storage.

Summary:
This paper introduces Slice, which interposes a proxy that can reroute network traffic. This allows virtualization of the resources accessed through the network and routed to by Slice.

Problems:
The rapid growth of networked computers, especially data centers, creates a need for easily scalable systems. Previously, Storage Area Networks were used to interconnect storage. But advances in LANs make them comparable to SANs, driving the need for a software layer to manage the distributed storage.

Contributions:
The idea behind Slice is very simple, and (as mentioned in the paper) not overly original - mediate the requests, and separate them out based on content. This allows for a virtualization of file system resources, as had already been done.

But more importantly is that it shows that this kind of system can be used even with untrusted clients. In addition, this method improves the usefulness of caches, by reducing contention.

Confusing:
I had a hard time following some of the security implications of this model.

Learned:
I did not know that it would be possible to write such a software level that can run on the switch. It was surprising that the proxy could run at wire speed levels.

Summary:
This paper presents a networked distributed storage system that is aware of the request workloads coming in over the network. This is achieved by a small proxy server (or a program on the client) that reads network traffic at line rate, rewrites packet headers based on policies defined by the distributed storage system, and forwards the packets to the appropriate servers. The paper then goes on to describe the implementation of the system and a short analysis using various homemade and industry-standard benchmarks.

Problem:
Since the goal is to provide a storage system over the network, there is the very real problem of scalability bottlenecks. For instance, as the number of clients grows, the network traffic and subsequent load at the distributed storage system's servers will increase proportionally. This will inevitably result in server overload, especially at servers with popular content. The paper seeks to solve the problem of load distribution to the pool of back-end storage servers by using context gleaned from the actual request type arriving in a packet on the wire (in a way similar to the LARD system).

Contributions:
- uproxy, which is essentially a proxy server (depending on where it is placed), that reads the contents of network packets and forwards them to the appropriate backend servers based on Slice policies. This is transparent to clients.
- Smart bootstrapping of TCP to handle packet drops between the client and server. This means uproxy does not need to store any hard state. It is free to drop packets as needed. The transport layer will handle reliability.
- The actual separation policies chosen for Slice lend themselves to good overall load balance, which translates into high throughput rates as the number of connecting clients begins to scale.
- The implementation and analysis of the system using in house and industry standard benchmarks.

Learned:
I enjoyed the clever design of the system to use already in place lower level technologies. It made a lot of sense to build a system that bootstrapped already existing components such as TCP to achieve certain goals.

Confused:
The section about name space operations was a bit unclear to me. I understand the high level ideas, but the details of implementation were lost on me on first read. I believe MKDIR switching simply balanced directories among servers while NAME HASHING balanced files within a directory between servers?

Summary: Paper presents a new way to route requests to a distributed storage system over a network. It does this by employing a proxy (called uproxy) that intercepts traffic and routes it to the appropriate servers (name space, small file, or directly to storage array). Their results show that the architecture scales and can be useful over cheap LAN.

Problem: As the internet becomes more popular and websites become more complex and need to serve more data, scalable storage becomes a necessity. Now that LANs are becoming faster and are a lot cheaper than current storage area networks, they become a feasible backbone for our storage needs. Since file systems have two main kinds of operations (namespace operations and file operations), we need a way to make these efficient on our modified system so it is both scalable and easy to administer. This should also be done in a way where the client can remain ignorant of what is going on.

Contribution: The main contribution is the new architecture they propose that uses a uproxy to redirect packets to the appropriate servers as well as the different policies that the uproxy uses.

The uproxy for this system can sit at any point in the architecture: in-line on routers, within the file system architecture itself, or on the clients themselves. It is designed such that a compromised uproxy can only damage that client's files, not everyone's.

The main function of the uproxy is to intercept filesystem commands (in NFS format) on the network and carry out their intent against the storage array. Before a request reaches the array itself, there is a layer of indirection that separates some of the functional aspects of the system. For instance, small-file writes and namespace operations are handled by two distinct classes of file servers. The uproxy therefore takes the functional operation of the client's command and directs it to the appropriate file server. Large file writes go directly to the array; small writes and directory operations are directed to their respective servers.

This separation of functionality allows different sets of operations to be optimized independently. For instance, having dedicated directory servers lets one implement different policies for namespace operations; the paper evaluates two, mkdir switching and name hashing. The important thing to note is that clients have no idea any of this is happening: the service just looks like a regular NFS server.
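
To make the two policies a bit more concrete, here is a minimal Python sketch of how I read them; the server names, hash function, and switching probability are placeholders of mine, and the paper's actual mechanism may differ in the details.

    import random
    import zlib

    DIRECTORY_SERVERS = ["ds0", "ds1", "ds2", "ds3"]   # assumed directory server names

    def name_hashing(parent_fhandle, name):
        # Name hashing: hash the (parent directory, name) pair, so every operation
        # on the same entry reaches the same server, while the entries of one large
        # directory spread across all directory servers.
        key = (parent_fhandle + "/" + name).encode()
        return DIRECTORY_SERVERS[zlib.crc32(key) % len(DIRECTORY_SERVERS)]

    def mkdir_switching(parent_server, switch_probability=0.1):
        # Mkdir switching: most operations stay with the server that owns the parent
        # directory (good locality); occasionally a new directory is "switched" to a
        # randomly chosen server so that subtrees spread out over time.
        if random.random() < switch_probability:
            return random.choice(DIRECTORY_SERVERS)
        return parent_server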

Confusion: Why would we want to use Name Hashing even when the workload consists of large directories? Wouldn't it be better to get as much server locality as possible for large directories and only hash probabilistically past that point?

Learned: A new way to structure a distributed file system. I thought it was very neat how the simple concept of a proxy allowed the functionality of different components to be separated and optimized. Little state is stored in clients or the uproxy, making for a robust system that clients can use while being none the wiser about what is going on behind the scenes.

Summary:
This paper describes the design and implementation of a request routing filter, uproxy, which is used in Slice, a storage architecture for high-speed networks with network-attached block storage.

Problem:
With the growth of the internet, many web applications are hosted in data centers and serve a large client population. The storage architecture needs to distribute request traffic across a collection of storage and server elements, which then cooperate to provide a uniform view of a shared file volume. It should also offer scalable bandwidth and capacity.

Contributions:
1. Introduces a request routing filter, uproxy, which routes incoming requests to a set of servers based on functionality: high-volume I/O goes directly to storage nodes, while small I/O and namespace operations are sent to specialized file managers. This is functional decomposition.
2. The routing mechanism spreads the data and request workload in a balanced way across all the servers, allowing each workload component to scale independently. This is data decomposition.
3. The protocol decouples the file managers from the block storage service in a manner that remains compatible with existing clients.
4. Recovery is implemented by making the file managers journal their operations in a write-ahead log (a rough sketch of this idea follows below).
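
As a rough illustration of point 4, here is a minimal write-ahead-logging sketch in Python; the log format, file name, and recovery rule are simplifications of mine, not the paper's actual journaling protocol.

    import json
    import os

    LOG_PATH = "file-manager.journal"        # hypothetical log location

    def _append(record):
        with open(LOG_PATH, "a") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())           # force the entry to stable storage

    def run_operation(op, args, apply_fn):
        _append({"op": op, "args": args, "state": "intent"})   # log before acting
        apply_fn(op, args)                   # placeholder for the real namespace update
        _append({"op": op, "args": args, "state": "done"})     # mark completion

    # On recovery, entries with an "intent" record but no matching "done" record
    # are replayed or rolled back before the file manager resumes service.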

What I learned:
I learnt how request routing protocols can be built when untrusted clients are present in a shared disk system.

What I found confusing:
I don't understand why the offset is used to classify I/O. In the API, we also specify the length that we read or write, so why not use that length? With length, a 1-byte read deep in a large file (at an offset greater than 64KB) would be considered a small I/O rather than a large one.
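
My reading is that the routing decision keys on the offset because data placement follows the same rule: the bytes of a file below the threshold live at the small-file servers, and the bytes beyond it live on the storage nodes, so a request has to be routed to wherever the bytes it touches are actually stored. A tiny sketch of that interpretation (the 64KB value and the names are mine):

    SMALL_FILE_THRESHOLD = 64 * 1024   # assumed threshold offset

    def classify(offset, length):
        # The decision depends on which part of the file the request touches,
        # not on how many bytes it asks for.
        if offset < SMALL_FILE_THRESHOLD:
            return "small-file server"   # the low bytes of every file live here
        return "storage node"            # bytes past the threshold live here

    # A 1-byte read at offset 1 MB goes to a storage node, because that is where
    # the byte is stored; sending it to a small-file server would miss the data.
    print(classify(offset=1 << 20, length=1))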

Summary: In this paper, the authors study how to build a storage system that uses the LAN as a "coordinator" of storage servers and file managers. The paper talks about optimizations for performance, how to add or remove file servers, and how to recover.

Problem: One key challenge is performance: making sure a diverse set of file operations run at reasonable speed. Second, when servers join or leave, we do not want to frequently reorganize the index and file mapping, or worse, copy files around. Third, some file operations need to be atomic, so recovery has to be addressed.

Contributions: The first contribution of this work is the idea that different workloads, with different data access patterns, should be handled differently. One observation is that sequential scans of large files can be served directly by the storage servers, bypassing the file managers. This observation is interesting and contributes to the performance improvement.

Load balance is also discussed; the solution is to use either hashing or random choices to avoid hotspots. For recovery, a full two-phase commit is used for a set of infrequent operations. The connection between a distributed storage system and classic distributed databases is interesting.


Confusing & Learned: One thing I am confused by is what exactly the word "interposed" means here. I can imagine multiple interpretations, but I remained confused while reading the paper.

Summary:

The paper describes how interposed request routing can be used in a network with NAS. The μproxy intercepts client-storage traffic, distributes it across a server ensemble, and provides request routing schemes for I/O and file service traffic. These techniques are used to provide a unified shared volume with scalable bandwidth and capacity.

Problem:

Creating a distributed software layer that unifies decentralized storage resources over a network and whose bandwidth and capacity can be scaled.

Contributions:

- Distinguishing between types of workloads and optimizing for each one separately.
- Bypassing file managers for bulk I/O for optimization.
- Creating an NFSv3 compliant prototype implementation.
- mkdir switching policy and name hashing policy. The usage of stochastic methods for file placement is something I haven't seen before.
- An object-based interface allows cryptographic protection on insecure networks.
- Request routing enhances locality in the request stream, improving cache effectiveness and reducing block contention among servers.
- Small I/O aggregation for better performance.
- Intention mechanism for atomicity.
- Logical id for routing table compression.

One thing I found confusing:

- In Name Hashing, why is the position in the directory tree, as given by the parent directory fhandle, part of the hash computation?
- The μproxy was implemented below the IP layer, so how was it able to modify the IP addresses in the packets (it shouldn't be aware of higher layers)?

One thing I learned from paper:

A legitimate use of a man-in-the-middle: the μproxy modifies outgoing requests and incoming responses.

Summary:

The paper presents the architecture and implementation of Slice, a distributed storage system that provides high-speed block storage by interposing on traffic and performing request routing. The μproxy (an IP switching filter) virtualizes the storage protocol by using different routing policies for functional and data decomposition of requests. The paper also presents an evaluation of the architecture based on standard benchmarks.

Problem:

  • Heavy usage of the Internet leads to heavy storage requirements for web applications; as demand increases, there is a risk of higher latency in serving those requests.
  • Nowadays most of the content on the web is dynamic, which adds computational requirements on top of storage.

These requirements necessitate load balancing among the distributed storage servers, which is the solution put forward by Slice.

Contributions:

  • The μproxy can be deployed anywhere in the network because it keeps only limited soft state, making the architecture very flexible.
  • Separating traffic to large files from small-file traffic and namespace operations ensures that caching is done effectively.
  • Leveraging transport-layer protocols to deal with packet loss was a clever way of using the underlying network infrastructure.
  • Supporting both object-based and sector-based storage nodes caters to different security needs, backing up the flexible placement of the μproxy.
  • Making the file managers stateless aids easy backup and recovery.
  • The use of logical server IDs aids both load balancing and compaction of the routing table (see the sketch after this list).
  • The algorithm for preserving atomicity using intention logs at the coordinators is simple and efficient.
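
To illustrate the logical-server-ID point, here is a tiny Python sketch of the general indirection idea; the table size, server names, and hash function are assumptions of mine rather than the paper's actual mechanism.

    import zlib

    NUM_LOGICAL_IDS = 16                      # assumed: small and fixed
    PHYSICAL_SERVERS = ["s0", "s1", "s2", "s3"]

    # The routing table maps a small set of logical IDs to physical servers,
    # instead of mapping every file or block individually.
    routing_table = {i: PHYSICAL_SERVERS[i % len(PHYSICAL_SERVERS)]
                     for i in range(NUM_LOGICAL_IDS)}

    def route(fhandle):
        logical_id = zlib.crc32(fhandle.encode()) % NUM_LOGICAL_IDS
        return routing_table[logical_id]

    # Rebalancing only touches entries of this 16-slot table; for example,
    # shifting one logical ID's load onto a newly added server:
    routing_table[3] = "s4"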

Unclear concept:

In Name Hashing, the part about serialization on a shared hash chain was not very clear to me. How can they ensure that conflicting operations always hash to a single server if they are going to do serialized hashing?

Learning:

I learned that the concept of proxies (well known in networking for offloading work from distributed servers) can be ported easily to a distributed storage setting. The distributed storage setting also seems to allow more flexible placement and configuration of proxies.

Summary:
This paper describes the design and implementation of a network-attached storage architecture called Slice. Slice consists of a set of cooperating storage servers and a packet filter called µproxy that distributes client requests based on the request type and the parameters supplied, thereby improving the locality of the request stream. Functional decomposition (bulk I/O, small I/O, namespace operations) and data decomposition (routing requests within each request class) together distribute requests evenly across servers, adapt to a dynamically changing set of servers, and allow each category of workload to scale independently.

Problem:
Data centers hosting many web-based applications deal with increasing storage requirements every day. A highly scalable architecture in terms of bandwidth and capacity is required as these storage servers are interconnected to serve a large volume of clients.

Contributions:

1. Functional decomposition provided by the packet filter helps each request type to scale independently by adding resources to its class of servers.

2. Data decomposition provided by the packet filter distributes the data and request workload in a balanced fashion across the storage nodes.

3. Bulk I/O is routed directly to the storage nodes, and policies for block placement, replication, etc. can be customized on a per-file basis.

4. Stateless file managers journal their updates in a write-ahead log, enabling faster recovery of a failed file manager.

5. The routing policy implementation is decoupled from the client-side file system, thereby aiding compatibility with existing clients.

6. Name hashing and mkdir switching help in scaling namespace operations.

7. Aggregation of smaller I/O operations before sending them to the storage nodes (a rough sketch follows below).
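
As a rough sketch of point 7, the aggregation idea might look something like the following; the class, buffer size, and callback are illustrative placeholders of mine, not the paper's actual small-file server code.

    BLOCK_SIZE = 64 * 1024                 # assumed flush granularity

    class SmallWriteBuffer:
        """Coalesce many small writes into one large request to the storage nodes."""

        def __init__(self, send_bulk_write):
            self.pending = []              # list of (offset, data) fragments
            self.bytes_buffered = 0
            self.send_bulk_write = send_bulk_write

        def write(self, offset, data):
            self.pending.append((offset, data))
            self.bytes_buffered += len(data)
            if self.bytes_buffered >= BLOCK_SIZE:
                self.flush()

        def flush(self):
            if self.pending:
                self.send_bulk_write(self.pending)   # one bulk I/O instead of many
                self.pending, self.bytes_buffered = [], 0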

One thing I learnt:
Using a storage-object architecture, as opposed to a sector-based one, prevents untrusted clients from modifying other clients' files, since storage objects provide cryptographic protection.

One thing I found confusing:
I/O requests are routed based on a threshold offset. If this is the case, a smaller I/O (in terms of size) could be treated as a large I/O request if it accesses a bigger offset. Shouldn't the threshold be in terms of the size of the I/O rather than the offset?

Summary: This paper proposes implementing a distributed filesystem in the network, instead of on the client or the backends. This allows the network to route requests toward specialized servers, or toward servers that hold the requested data.

Problem: Some distributed filesystems require the client to know details of the system, which forces it to do more computation and is not backwards-compatible with existing clients. Solutions that create a distributed backend are difficult to scale up, and routing requests to the appropriate backend machine can be cumbersome.
Contributions:


  • Instead of implementing the system in the backend, implement it in the network, which can make fast decisions based on packet contents.

  • Server specialization: designate some servers to handle certain requests, which can improve performance (and improve scalability).

  • Divide up requests: send requests for different types of data to different servers. This can be done based on where the data is physically stored, or to improve performance by having certain servers handle operations of different sizes.

  • Separate operations on data blocks from those on the namespace.

Confusing: the way integrity is achieved seems a little too simple -- is it enough to make some operations atomic? Doesn't that hurt performance too much?
Learned: that a distributed system can be built with parts of a computer system generally associated with lower-level operations (networking, routing). The network is an inherent part of a distributed system, so it makes sense for it to be used for routing.
