Data Model and Semantics in Visual Information Management Systems

In the foregoing sections we have emphasized that VIM systems have much in common with databases, and need to be designed through a data model. In this section we discuss the characteristics of such a data model and the relationship between Computer Vision and the data model.

The role of a data model in database systems is to provide the user with a textual or visual language to express the properties of the data objects that are to be stored and retrieved using the system. The database language should allow users to define, update (insert, delete, modify) and search objects and properties. For Visual Information Management systems the data model assumes the additional role of specifying and computing different levels of abstraction from images and videos. Accordingly, the data model needs to satisfy the following properties:

We perceive the general VIM data model to be organized in layers: the representation layer, the image object layer, the domain object layer and the domain event layer. We present here a refined version of our original four-layer VIMSYS model. In each layer all data objects have a set of attributes and methods associated with them. The attributes too have their own representations, and are connected in a class attribute hierarchy. The relations, as we shall explain shortly, may be spatial, functional or semantic. Figure 2 illustrates the basic layered data model.

Figure 2: The Layered VIMSYS Data Model
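The layered organization described above can be illustrated with a minimal sketch. It is not the VIMSYS implementation: the names `Layer`, `DataObject`, `Relation`, `relate` and `related`, as well as the example objects and relation names, are hypothetical and chosen only to show data objects that carry attributes and methods and are linked by spatial, functional or semantic relations.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Dict, List


class Layer(Enum):
    """The four layers named in the text."""
    REPRESENTATION = 1
    IMAGE_OBJECT = 2
    DOMAIN_OBJECT = 3
    DOMAIN_EVENT = 4


RELATION_KINDS = {"spatial", "functional", "semantic"}


@dataclass
class Relation:
    kind: str             # one of RELATION_KINDS
    name: str             # hypothetical label, e.g. "segmented_from"
    target: "DataObject"  # the object this relation points to


@dataclass
class DataObject:
    """A data object in some layer, with attributes, methods and relations."""
    name: str
    layer: Layer
    attributes: Dict[str, Any] = field(default_factory=dict)
    methods: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    relations: List[Relation] = field(default_factory=list)

    def relate(self, kind: str, name: str, target: "DataObject") -> None:
        """Attach a typed relation from this object to another object."""
        if kind not in RELATION_KINDS:
            raise ValueError(f"unknown relation kind: {kind}")
        self.relations.append(Relation(kind, name, target))

    def related(self, kind: str) -> List["DataObject"]:
        """Return all objects reachable through relations of the given kind."""
        return [r.target for r in self.relations if r.kind == kind]


# Hypothetical example: a domain object linked down to an image region,
# which is in turn linked to the stored image representation.
image = DataObject("scan_001", Layer.REPRESENTATION, {"color_space": "RGB"})
region = DataObject("region_7", Layer.IMAGE_OBJECT, {"area_px": 412})
heart = DataObject("heart", Layer.DOMAIN_OBJECT)
region.relate("functional", "segmented_from", image)
heart.relate("semantic", "depicted_by", region)
```

In this sketch a query posed at the domain object layer can follow relations downward until it reaches the stored representation. A fuller model would also need the class attribute hierarchy mentioned above, which is omitted here.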

  1. Representation Layer: The representation layer contains the image matrix and any transformation that results in an alternate but complete representation of the image. For example, an image originally received as an RGB matrix and its LUV conversion required for color processing are both members of the representation layer. Similarly, if a raster-scanned image of a line drawing is converted to a vector format, the latter belongs to this layer. By itself this layer offers little for processing user queries: it can answer only queries that request pixel-based image information. Its value, however, lies in giving the system designer an explicit handle to define, maintain and interconvert between the representations used by the other layers of the data model. Since image transformation is part of the layer's intended functionality, the system designer has the option to apply knowledge about the image model through meaningful transformations. For example, a class of transformations can be defined to enhance the input image under user- or designer-selected noise models. Implemented this way, the transformation becomes part of the data model: upon insertion into the database, every image goes through the enhancement routine, and only the enhanced image is used for all downstream computations. Similarly, corrections for the gamma factor or for specular reflection, if known a priori, can be made in this layer. In the case of videos, the representation layer functionality can get more complex. Key frame compositing, which converts a temporal sequence of frames covering a large spatial area into a single large image showing the entire spatial coverage, can be placed in this layer. If key frames are instead extracted from the sequence, the input stream is first directed to the segmentation layer, and the extracted key frame(s) are transmitted to the representation layer for storage and the usual static content analysis.
  2. Image Object Layer: This layer has two sublayers - the segmentation sublayer and the feature sublayer.
  3. Domain Object Layer: A domain object is a user-defined entity representing a physical object or a concept that can be translated in terms of one or more features in the lower layers. Thus, a concept like ``sunset'' or an object like ``heart'' in a medical image are both domain objects. The domain object layer is analogous to a conceptual schema as defined in database systems. It consists of three components. It is a graph that relates an object to its attributes and to other objects through different relationships. Many of these relationships are semantic, meaning that they cannot be inferred by any computation on the image or video, but have to be told to the system by the designer. An important category of relationship is classification: B is a subclass of A when, from all instances of A in a database, the subset satisfying some condition is labeled as B. For visual information, the condition can be tested and the label assigned automatically if sufficient domain knowledge is built into the system. For example, a heart in systole and a heart in diastole have widely different shapes and occupy different spatial extents, but if their semantic similarity is captured by a classification hierarchy, the system should be able to search correctly for either the heart or any of its individual subclasses. For the heart, and for other objects that go through a finite range of variations, the domain knowledge can be encoded in terms of a set of basis templates. A domain object can be expressed as a direct mapping to one of these template categories (the visual thesaurus approach). Alternatively, the templates can be treated as basis vectors and an instance can be expressed as their linear combination (the eigenimage approach). Yet another way to specify domain knowledge is by a set of rules that relate domain objects to image objects.
For example, in a database of MRI images of the brain, a rule can take the following form: a segmented object with a given shape, situated in a bounding box of (…) of a normalized T2 image and having a positive local contrast of about …, together with a segmented object in the same location of a T1 image, having a similar shape and a negative local contrast of about …, can be mapped to the domain object ``gray matter''. Such specification of domain objects in terms of their image properties is not new to Computer Vision. But this model provides generic and explicit methods to specify such knowledge to a retrieval system, and users can use it for their application-specific data definitions.
  4. Domain Event Layer: The purpose of the domain event layer is to allow ``events'' computed from image sequences or videos to be queriable entities. These events can result from pure motion (e.g., when the velocity of the centroid of a segmented object exceeds 20 pixels/frame), from spatial interactions (e.g., when two object centroids come to within about 5 pixels of each other), from spatio-temporal interactions (e.g., when an object is approaching a given region in space and is less than 10 pixels away from it), or from appearance, disappearance or morphing (e.g., when a ballet dancer transitions from one move to another). Domain events also include events that are not instantaneous but occur over a period of time (e.g., a tumor that has grown beyond 15 pixels in diameter over a sequence of six images acquired monthly). In order to manage domain events, a VIM system needs not only an event detection mechanism but also an event organization mechanism, such as a temporal data structure that allows detected events of different types and time granularities to be maintained and searched. We are currently working on effective methods of mapping domain events to motion segmentation results from Computer Vision.
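The motion and spatial-interaction events described for the domain event layer can be sketched as follows. This is only an illustration under stated assumptions, not the authors' detection method: each object is assumed to be already segmented and tracked, so that it is given as one centroid per frame, and the names `Event`, `detect_fast_motion`, `detect_proximity` and `build_event_index` are hypothetical. The thresholds of 20 pixels/frame and 5 pixels are taken from the examples in the text.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Track = List[Point]  # assumed input: one centroid (x, y) per frame


@dataclass(frozen=True)
class Event:
    kind: str                 # e.g. "fast_motion" or "proximity"
    frame: int                # frame index at which the event holds
    objects: Tuple[str, ...]  # names of the objects involved


def detect_fast_motion(name: str, track: Track,
                       max_speed: float = 20.0) -> List[Event]:
    """Emit an event whenever the centroid moves more than max_speed
    pixels between two consecutive frames."""
    events = []
    for frame in range(1, len(track)):
        speed = math.dist(track[frame - 1], track[frame])
        if speed > max_speed:
            events.append(Event("fast_motion", frame, (name,)))
    return events


def detect_proximity(name_a: str, track_a: Track,
                     name_b: str, track_b: Track,
                     max_distance: float = 5.0) -> List[Event]:
    """Emit an event for every frame in which the two centroids are
    within max_distance pixels of each other."""
    events = []
    for frame, (pa, pb) in enumerate(zip(track_a, track_b)):
        if math.dist(pa, pb) <= max_distance:
            events.append(Event("proximity", frame, (name_a, name_b)))
    return events


def build_event_index(events: List[Event]) -> Dict[str, List[Event]]:
    """Group events by kind, each group ordered by frame, as a very
    simple stand-in for a temporal event organization structure."""
    index: Dict[str, List[Event]] = {}
    for event in sorted(events, key=lambda e: e.frame):
        index.setdefault(event.kind, []).append(event)
    return index


# Hypothetical centroid tracks for two segmented objects.
track_a = [(0, 0), (3, 4), (30, 40), (31, 41)]
track_b = [(100, 100), (50, 50), (32, 43), (31, 44)]

events = (detect_fast_motion("a", track_a)
          + detect_proximity("a", track_a, "b", track_b))
index = build_event_index(events)
```

Grouping events by kind and ordering them by frame makes them queriable in the sense used above. Events that extend over a period of time, such as gradual tumor growth, would additionally need interval endpoints and support for different time granularities, which this sketch does not attempt.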