IP or Not IP?

The class is torn. The question was whether a network-attached storage system should use the standard IP suite for communication or a specialized high-performance protocol. Roughly 7 votes were pro IP and 5 against. Here is what people said.

IP (7 / 12 votes)

1 )
I see this as two questions:

1. To use NAP/NASDs or to use the server configuration. Because I know
that the server configuration works, is well supported, etc., this is
the obvious choice for a business that is going to make a large
investment and needs the resulting hardware to work.

2. Assuming that NAP/NASDs are going to be used, the question
becomes whether to use one of TCP/IP, UDP/IP, or xxx/IP, or to use
xxx without IP under it. One choice not involving IP might be xxx/SCSI
or some derivative of SCSI. From a business perspective, this choice is
harder, since it may be the case that any implementation will be
somewhat experimental. From a development perspective, I would choose
xxx/IP. I do not believe either TCP or UDP is the best possible
protocol for NAP/NASDs. But IP is so low-level that it does not place
any restrictions on what can be put on top of it.

2 )
I agree with some of the arguments for TCP in VISA. However, I feel that
more work needs to be done to extrapolate the real cost of IP, and this
cost is probably not entirely justifiable. I'm voting for a user-level,
TCP-friendly protocol.

3 )
I would go with IP over a house-brand protocol because IP (with TCP and
UDP) has been around a while, so it is well tested and proven, and it is
compatible across almost all networks, so there would be little to no
worry about compatibility with existing and future networks.

4 )
It depends.

This "magical number" we talked about in class isn't just related to
performance. For instance, some computing environments may be willing to
sacrifice more performance for more flexibility with configurations and
distances, while other environments may not need that flexibility and
prefer to maximize the bandwidth.

I'd prefer to stay on the fence with this issue, but if I were to take
sides for the sake of this vote, I would fall to the networking side.

5 )
Network attached disks really need to speak IP. On some sort of a private
medium, like a bus, we might investigate designing more efficient
protocols. But on a shared network, network disks should be both TCP
friendly and TCP accessible to allow integration with everything else.

6 )
Vote : For IP

A major motivation for using an already standardized, globally available
protocol like TCP/IP is the ease of deployment and the level of
interoperability possible. In a heterogeneous global environment, this is
likely to be a more important issue than the performance benefit we get
out of custom-made protocols, considering the ever-increasing network
bandwidth and processor speed.

7 )
Pro IP:

- use of naming and addressing (no need to develop another naming
and addressing system). In the case of NADs, where potentially every device
is on the net, naming and addressing is an important part of the system.

- it is not inconceivable that a well-tuned transport protocol on
top of IP will yield good performance. IP does not restrict the type of
transport protocol to be used on top. From the system development point of
view, even before development of such a high-performance transport layer is
finished, UDP and TCP might be used instead, and other parts of the distributed
system that use NADs will be able to develop in parallel, without waiting
for testing/debugging of the tuned layer.
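
The parallel-development point in the last bullet can be sketched in code. This is only an illustration under stated assumptions, not anything proposed in class: the names `Transport`, `UdpTransport`, `InMemoryTransport`, and `NetworkDisk`, the 512-byte block size, and the 4-byte block-number request format are all hypothetical. The idea is that the disk client depends only on a small transport interface, so a stock UDP transport can stand in until a tuned layer is ready.

```python
import socket
import struct
from typing import Protocol

BLOCK_SIZE = 512  # assumed block size, chosen only for illustration


class Transport(Protocol):
    """Anything that can send a request and return the reply."""

    def exchange(self, request: bytes) -> bytes: ...


class UdpTransport:
    """Stock transport over UDP/IP; a tuned layer could replace it later."""

    def __init__(self, host: str, port: int, timeout: float = 1.0):
        self._addr = (host, port)
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self._sock.settimeout(timeout)

    def exchange(self, request: bytes) -> bytes:
        self._sock.sendto(request, self._addr)
        reply, _ = self._sock.recvfrom(65535)
        return reply


class InMemoryTransport:
    """Test double: answers block requests from a local bytearray."""

    def __init__(self, disk: bytearray):
        self._disk = disk

    def exchange(self, request: bytes) -> bytes:
        # Hypothetical wire format: one big-endian 32-bit block number.
        (block_no,) = struct.unpack("!I", request)
        start = block_no * BLOCK_SIZE
        return bytes(self._disk[start:start + BLOCK_SIZE])


class NetworkDisk:
    """Client code written against Transport, not against TCP or UDP."""

    def __init__(self, transport: Transport):
        self._transport = transport

    def read_block(self, block_no: int) -> bytes:
        return self._transport.exchange(struct.pack("!I", block_no))
```

Code that uses `NetworkDisk` does not change when the transport underneath it is swapped, which is what lets the rest of the system be developed before the tuned layer is debugged.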

Not IP (5 / 12 votes)

1 )
IP provides advantages like scalability and wide availability at the cost
of performance and complexity. So, if scalability is important IP would be
a better choice.

In these network-attached storage systems, I don't think scalability will
be a major issue, and if IP is chosen, one of the fundamental objectives
of network-attached storage systems, improving performance, would be
compromised. So my vote is for "NOT IP" - rather, my vote is NOT for IP.

2 )
Not IP. Or at least not SCSI over IP. Obviously people want and need to have
their storage available on the network, but we don't need to have everything
go over the network. From the VISA paper, the only way they can keep up with
a SCSI disk is to soak the CPU - that's fine if you need to have that data on
the network. Secondly, why emulate SCSI? If you move up a layer you can avoid
all of the SCSI goo (though maybe just writing a general purpose block driver
is hard on SunOS...) - or you could go one higher and write a filesystem, and
have a bit better knowledge of what's going on (and of course have to write
all your own filesystem goo).

Bottom line: It takes a lot of CPU resources to get the same performance from
software as an already-existing and working piece of common hardware, and the
VISA paper doesn't make many claims as to what new features they get by putting
the disk farther down the wire. So I say forget it.

3 )
Not IP, because it is too slow.

In a LAN, it seems reasonable to use a specialized protocol because many of the
IP benefits are not needed.

4 )
I vote No IP.

The IP suite is a fairly mature protocol suite, and thus I don't expect to
see the performance improvements that would be needed to make it a reasonable
alternative for disk interfaces. Further, I think the WAN capabilities of
TCP/UDP/IP are wasted on this application. Even if, for some reason, you
wanted a disk array spread all over the planet, the security concerns of
putting a SCSI interface on the Internet are disturbing.

5 )
My vote: NOT IP SCSI.

My biggest problem with this paper is one that was brought up in class:
the paper does not compare the right things. Their stated goal was to
run VISA at a performance close to native, local SCSI (something like
80%). There are numerous advantages to locating storage components away
from the system, and most users can justify the cost of distributed file
systems. I would have much rather seen VISA's performance compared to
something like AFS or NFS, where someone has decided that the benefits
of non-local storage outweigh the disadvantages, and wants a system that
would provide the least slowdown compared to a local solution.

Additionally, it seems that one of the main advantages of attaching
disks directly to the network is that they are not bound to a machine
and can be shared among multiple clients. But this seems difficult
because each computer would still have its own local file system and
would need knowledge of the other clients' file systems in order to
access the appropriate blocks. In a traditional distributed FS, the
server attached to the disks allows the clients to use an abstract
(perhaps object-based) interface that the server would translate to
specific block requests. Allowing clients to use this simpler interface
reduces complexity and possibly network bandwidth (e.g. issue one request
to retrieve a file rather than issue SCSI requests to gather all the
fragmented blocks in the file).
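
The request-count argument above can be made concrete with a toy model. This is a rough sketch under assumptions that are not from the paper or the discussion: `ToyDisk`, `ToyFileServer`, `read_file_via_blocks`, and the example layout are all hypothetical, and each method call stands in for one network round trip.

```python
class ToyDisk:
    """Network-attached disk: every block read is one network request."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.network_requests = 0

    def read_block(self, block_no):
        self.network_requests += 1
        return self.blocks[block_no]


class ToyFileServer:
    """Traditional server: one request per file; block lookups stay local."""

    def __init__(self, blocks, layout):
        self._blocks = blocks
        self._layout = layout
        self.network_requests = 0

    def read_file(self, name):
        self.network_requests += 1
        # The server, not the client, knows which blocks make up the file.
        return b"".join(self._blocks[n] for n in self._layout[name])


def read_file_via_blocks(disk, layout, name):
    """Client-side assembly: the client must know the layout itself."""
    return b"".join(disk.read_block(n) for n in layout[name])


# Hypothetical example data: one file fragmented across three blocks.
EXAMPLE_BLOCKS = {0: b"ab", 3: b"ef", 5: b"zz", 7: b"cd"}
EXAMPLE_LAYOUT = {"notes.txt": [0, 7, 3]}
```

With the example data, reading `notes.txt` through `read_file_via_blocks` costs three requests and requires the client to hold the layout, while `ToyFileServer.read_file` costs one request and keeps the layout on the server. The model ignores request size and batching, so it only illustrates the shape of the trade-off.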