Object Communication System

The Object Communication System, or ocomm, is used as an interface for inter- and intra-process communication. The communication system draws on principles and experience from Mach and KeyKOS.

Compile Time Configuration

The following shore.def cpp defines are used to configure Object Comm statically, at compile time. Some of these options also require Makefile changes, because they add additional files to be compiled.

USE_OCOMM
OCOMM_USE_TCP
OCOMM_USE_UDP
OCOMM_USE_MYRI
OCOMM_USE_PVM
HACKED_PVM
OCOMM_USE_TCP_XXX
OCOMM_LARGE_PAYLOAD
OCOMM_LARGE_BOX

Run Time Configuration

The following environment variables allow Object Comm to be configured on a per-run basis.

OCOMM_COST_TCP
OCOMM_COST_UDP
OCOMM_COST_PVM
OCOMM_COST_MYRI
OCOMM_COST_MPI: To be described later.
SFILE_SELECT_READ
SFILE_SELECT_WRITE: To be described later.
SFILE_NDELAY: To be described later.
OCOMM_TCP_SNDBUF
OCOMM_TCP_RCVBUF: To be described later.
OCOMM_TCP: To be described later.
OCOMM_TCP_PS2: To be described later.
OCOMM_TCP_SLOW_RECV: Allow messages from the TCP transport to be delivered asynchronously. Normally, only one message is received off the wire at a time. With this option, messages may be delivered to the system in a different order than with synchronous delivery, which changes delivery semantics among multiple nodes somewhat. Enabling this option can produce a small performance gain, and may reduce the likelihood of triggering some intermittent Solaris 2.5.1 TCP bugs. It is not enabled by default because the interaction between "slow deliveries" and death notification has not been fully evaluated and tested.
OCOMM_MYRI: To be described later.
OCOMM_MPI: To be described later.

Bugs, Oddities, Things that will be changed

There are no known bugs in the Object Communications System.

Yeah! Right! There are some bugs and oddities that have crept in over time. They tend to stay in because they are difficult to fix, and I often end up working on the problem of the moment instead of the communications system!

Originally, if two processes contacted each other at the same time, two connections would exist between those systems. Everything worked correctly, but it was most bizarre. I added a conflict-resolution mechanism whereby the two systems negotiate which TCP connection to keep. Despite lots of testing and proofs, I recently discovered that the problem still seems to occur in some cases! There is a subtle bug in the code that I haven't yet spent any time tracking down.
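The text does not describe how the negotiation works. As an illustration only, and not necessarily what ocomm actually does, a common way to let both ends agree on which duplicate to keep, without exchanging extra messages, is a deterministic tie-break that each side applies identically. The sketch assumes every node has a unique, comparable identifier; `NodeId`, `Keep`, and `resolve_duplicate` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical node identifier (assumption): a real system might instead
// compare (IP address, port) pairs or an ID exchanged during the handshake.
using NodeId = std::uint64_t;

enum class Keep { Outgoing, Incoming };

// Called on `self` when it holds two connections to `peer`: one it initiated
// (outgoing) and one the peer initiated (incoming). Both nodes run the same
// rule, so they agree on the survivor: the connection initiated by the node
// with the smaller id is kept, and the other one is closed.
Keep resolve_duplicate(NodeId self, NodeId peer) {
    assert(self != peer);  // the rule is only symmetric if ids are unique
    return self < peer ? Keep::Outgoing : Keep::Incoming;
}
```

With nodes 1 and 2, node 1 keeps its outgoing connection and node 2 keeps its incoming one, so both keep the connection that node 1 initiated. In a real system, much of the difficulty is likely to lie in timing, for example a duplicate that arrives after one side has already decided, which this sketch does not handle.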
Curt Ellman mentioned that he was having problems when running a large Paradise configuration on one of the Midships Sun Ultra Enterprise boxes. The listen queue would apparently fill up, and the disk nodes trying to contact the scheduler would get a communications failure (dead endpoint) because they couldn't establish a connection with the master.
I haven't experienced this myself when doing similar tests, but my simulator doesn't suffer the overhead of having Paradise running. My guess is that the system is spending lots of time in Paradise and Shore, and not enough CPU time is left for the TcpControl thread that accepts TCP connections!
One possible workaround is to raise the threads that Object Comm starts to "above normal" priority, so they get a chance to run and process events. This resembles priority inversion: the idle thread will eventually run and schedule those threads when their events become ready ... but a thread scheduled this way may not have high enough priority to actually run!
Another possible workaround is to back off and retry connect() to another node for a while, in case something similar happens.
For now, Curt staggers startup, launching the disk nodes in gangs no larger than (roughly) the listen queue size, with a delay between gangs.
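The back-off-and-retry workaround can be sketched as a generic retry loop with exponential back-off. This is a minimal sketch, not Object Comm code: `retry_with_backoff` and its parameters are hypothetical, and the caller is assumed to supply `attempt` as a wrapper that performs one connect() and reports whether it succeeded.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Retry `attempt` until it succeeds or `max_attempts` tries have failed.
// The delay doubles after each failure, capped at `max_delay`, giving a busy
// peer time to drain its listen queue instead of declaring the endpoint dead
// on the first refused or timed-out connection.
bool retry_with_backoff(const std::function<bool()>& attempt,
                        int max_attempts,
                        std::chrono::milliseconds initial_delay,
                        std::chrono::milliseconds max_delay) {
    assert(max_attempts > 0);
    std::chrono::milliseconds delay = initial_delay;
    for (int i = 1; i <= max_attempts; ++i) {
        if (attempt()) {
            return true;
        }
        if (i == max_attempts) {
            break;  // no point sleeping after the final failure
        }
        std::this_thread::sleep_for(delay);
        delay = std::min<std::chrono::milliseconds>(delay * 2, max_delay);
    }
    return false;  // caller may now report a dead endpoint
}
```

POSIX leaves a socket's state unspecified after a failed connect(), so each call to `attempt` should create a fresh socket. Adding random jitter to each delay would further spread out many disk nodes that start at the same moment, which is exactly the situation described above.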

Last Modified: Tue Aug 26 14:13:44 CDT 1997

bolo (Josef Burger) <bolo@cs.wisc.edu>