Project continuation proposal "The Holodeck"

For course CompSci 774 Topics in Computational Linguistics

Kevin Caswick, Oguz Yetkin, Daan Wierstra

 

The ultimate goal of this project is to integrate virtual reality, natural language, and artificial intelligence by means of a deep-structure semantic network. The system will allow the user to alter the world interactively, either with the provided interactive hardware (e.g., wand or dataglove) or by specifying his wishes in natural language. Altering the world means adding either semantic facts or semantic rules from which further semantic facts are logically inferred. Semantic facts are either concrete specifications about virtual reality objects (position, orientation, and such) or high-level statements that require a logical inference system to resolve physical ambiguity. These specifications are then executed in an interactive virtual reality environment of reasonable graphical sophistication (we used the Immersadesk located at the MAF as our VR system) and physical realism (Newtonian physics).

 

So far we have created a 3-D virtual reality system similar to a "holodeck". The user starts out in an empty virtual room, with only the walls visible. The system then enables the user to add objects interactively by typing simple high-level commands into the text interpreter: for example, he can add a table, a vase, a cow, and a bird. The requested objects appear (they are read from standard 3-D geometry files). Once present in the virtual reality environment, objects can be manipulated by typing more semantic text-based commands (for example, by typing that the vase is red and is on top of the table). Our system also allows the user to manipulate objects with the Immersadesk wand (essentially a 3-D mouse with a built-in joystick), making it possible to move, rotate, and scale objects interactively. In this way, an entire virtual world can be created.

 

The text-based command interpreter allows the user to enter commands that look like "cow1 red", "bird1 flies {flies to chair}" or "vase on table". These commands (called triples) are our basic means of storing and processing semantic information. The triples are in the form generated by an already existing natural language parser, and together they form an interconnected, deep-structure semantic network capable of representing the semantics both of English and of the virtual reality environment we created. Triples can be rules as well: for example, we are able to represent the rule "If there exists some cow X, and there exists some red object Y in the environment, then cow X walks to object Y." in the notation:

X isa cow, Y red > X walks {walks to Y}

The user can currently type in rules like this interactively. Eventually, the rules will be generated by the natural language parser from their English counterparts. Certain combinations of rules and facts can lead to complex chains of logical inference by the system.
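
To make the internal representation concrete, here is a minimal Java sketch of how such triples and rules could be stored (the class and field names are illustrative only, not taken from our actual implementation):

import java.util.List;

// Illustrative sketch: a triple is a (subject, predicate, argument) unit.
public class Triple {
    String subject;    // e.g. "cow1", or a variable such as "X"
    String predicate;  // e.g. "red", "isa", "walks"
    String argument;   // optional slot, e.g. "{walks to Y}"; null if absent

    public Triple(String subject, String predicate, String argument) {
        this.subject = subject;
        this.predicate = predicate;
        this.argument = argument;
    }

    // By convention in this sketch, single upper-case letters are variables.
    static boolean isVariable(String term) {
        return term.length() == 1 && Character.isUpperCase(term.charAt(0));
    }
}

// A rule pairs condition triples with conclusion triples, so that
// "X isa cow, Y red > X walks {walks to Y}" becomes:
//   conditions:  (X, isa, cow), (Y, red, -)
//   conclusions: (X, walks, {walks to Y})
class Rule {
    List<Triple> conditions;
    List<Triple> conclusions;
}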

 

Our system is set up in a very modular way. There are three independent modules, each designed and implemented by one of the participants in the project: the graphics system (SMVR, or "Simple Minded Virtual Reality") that runs on the Immersadesk; the semantic database, which commands the graphics system and keeps track of the properties of the VR objects, such as position, orientation, and color; and the semantic rule system, which handles logical inference and the processing of user-specified rules.

SMVR is a simple, language-independent 3-D graphics protocol that abstracts both the virtual reality system and the graphics libraries from the rest of the system. It runs as a server that takes and executes text-based world-manipulation commands from the semantic database. For example, other portions of the program simply issue commands like "load vase.obj x y z" in order to display the contents of the geometry file vase.obj at location x,y,z. SMVR also handles head-tracking, object manipulation via the wand, and the saving of scenes. It can poll the current state of the world and report that information to the other portions of the program when requested.
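
For illustration, the semantic database could drive SMVR as in the following sketch; we assume a plain TCP text connection here, and the host name and port number are made up (only the "load" command itself is taken from the protocol described above):

import java.io.PrintWriter;
import java.net.Socket;

// Illustrative client for the SMVR text protocol. The TCP transport and
// the host/port values are assumptions of this sketch.
public class SmvrClient {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("localhost", 9000);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
            // Display the contents of vase.obj at position (1.0, 0.0, 2.5).
            out.println("load vase.obj 1.0 0.0 2.5");
        }
    }
}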

 

The semantic database and the rule system are both components of one program, written in Java. This part of the project stores the semantic network and the physical properties of the virtual reality system. It executes animation semantics (such as "cow1 flies {flies to table}") and keeps track of the locations and orientations of all objects. The semantics stored here give rise to display commands (sent to SMVR for rendering). The rule system is implemented here as well: after each "cycle" the rules are checked for "matches", cases in which rules should be executed based on the semantic facts in the database. If a rule is executed, semantic facts are added or deleted. For example, if a rule states that all cows chase objects that are red, and a red bird is added to the scene, this will trigger the rule and cause the cow to start chasing the bird.
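
The heart of this cycle is the matching step. The following simplified Java sketch shows the idea: each condition pattern of a rule is matched against the stored facts, with variable bindings shared across conditions (facts are flattened to three-element string arrays for brevity, with absent argument slots written as the empty string; our actual implementation is more involved):

import java.util.*;

// Simplified sketch of rule matching over (subject, predicate, argument)
// facts, with variables written as single upper-case letters.
public class Matcher {

    static boolean isVariable(String term) {
        return term.length() == 1 && Character.isUpperCase(term.charAt(0));
    }

    // Unify one condition pattern with one fact, extending the bindings.
    static boolean match(String[] pattern, String[] fact,
                         Map<String, String> bindings) {
        for (int i = 0; i < 3; i++) {
            String p = pattern[i];
            if (isVariable(p)) {
                String bound = bindings.get(p);
                if (bound == null) bindings.put(p, fact[i]);
                else if (!bound.equals(fact[i])) return false;
            } else if (!p.equals(fact[i])) {
                return false;
            }
        }
        return true;
    }

    // Depth-first search for bindings that satisfy all conditions; when it
    // succeeds, the rule "matches" and its conclusions can be asserted.
    static boolean solve(List<String[]> conditions, int next,
                         List<String[]> facts, Map<String, String> bindings) {
        if (next == conditions.size()) return true;
        for (String[] fact : facts) {
            Map<String, String> trial = new HashMap<>(bindings);
            if (match(conditions.get(next), fact, trial)
                    && solve(conditions, next + 1, facts, trial)) {
                bindings.putAll(trial);
                return true;
            }
        }
        return false;
    }
}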

 

Our Plan

We want to make this system into a fully operational "holodeck" in which users can perform Computer Aided Design activities using both the wand and natural language. The graphics must be improved, physics models must be implemented, and our English text parser and generator must be hooked up to the VR system. We also intend to integrate an existing voice recognition program with our system for easier natural language input.

There are a number of different things we want to either add or improve:

· This will be a CAD program that takes in English and outputs English. Specifically, we can start with an English description of a scene, have our system render the scene, have the user manipulate it, and save it as an English file again. For example, the "creative interior decorator" would say

"There is a blue couch in some sort of room. A green bird circles the couch, and a vase is on top of a coffee table."

Then he would grab the wand, move things exactly where he needs them, and use an interactive lathe to create a new vase. He would then say "computer, save design".

The design would be saved as:

"There is a blue couch 2 feet below glass manifold. A green bird circles the couch. There is a blue-green (rgb 1,0.5, 0) coffee table 1 foot in front of the couch. There is a vase (specified by new_vase23.obj) on the coffee table at coordinates (2.1, 2.3, 5.5)."

Since we store the semantics in a logical, not English-dependent, format, we could hook up our semantics to the grammars of other languages as well. The grammar for text parsing and text generation is defined as data: it is not in the program code. This gives us maximum generality and flexibility in extending the language system to more sophisticated grammars, and it even lets us use other languages: one user could command the system in English, while another user could use Dutch instead.
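
As a toy illustration of grammar-as-data (the template strings and table layout below are invented for this example, not our actual grammar files), generation could look up per-language surface templates by predicate:

import java.util.HashMap;
import java.util.Map;

// Toy sketch: surface templates live in data tables, one per language,
// so the same semantic triple can be verbalized in English or in Dutch.
public class Generate {
    static Map<String, String> english = new HashMap<>();
    static Map<String, String> dutch = new HashMap<>();
    static {
        english.put("red", "The %s is red.");
        dutch.put("red", "De %s is rood.");
    }

    static String realize(Map<String, String> grammar,
                          String subject, String predicate) {
        return String.format(grammar.get(predicate), subject);
    }

    public static void main(String[] args) {
        System.out.println(realize(english, "cow", "red"));  // The cow is red.
        System.out.println(realize(dutch, "koe", "red"));    // De koe is rood.
    }
}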

· The graphics will be improved. Textures will be added, the smoothness of the animation will be improved, and the manipulation of objects by the user will be enhanced and made easier. We will also add support for the interactive creation of new 3-D geometry files. We will make a PC-compatible version of the program, so that we can use it on machines other than the Immersadesk.

· We will simulate an interactive physical environment, with collision detection routines and simple laws of physics. We could implement simplified Newtonian physics alongside other physics systems (CartoonPhysics?), so that we have maximum flexibility in object behavior. Objects could then behave according to Newtonian physics or according to another physics model, depending on their interactively specified features.
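
One way to keep physics models interchangeable is to let each object delegate its motion to a pluggable model, as in the Java sketch below (the interface and class names are hypothetical):

// Hypothetical sketch: objects delegate motion to a pluggable model, so
// Newtonian and "cartoon" behavior can coexist in one scene.
interface PhysicsModel {
    void step(ObjectState s, double dt);  // advance one time step of dt seconds
}

class ObjectState {
    double x, y, z;     // position
    double vx, vy, vz;  // velocity
}

class NewtonianPhysics implements PhysicsModel {
    static final double G = -9.81;  // gravitational acceleration along y

    public void step(ObjectState s, double dt) {
        s.vy += G * dt;   // gravity changes the vertical velocity
        s.x += s.vx * dt;
        s.y += s.vy * dt;
        s.z += s.vz * dt;
    }
}

class CartoonPhysics implements PhysicsModel {
    public void step(ObjectState s, double dt) {
        // Cartoon rule: no gravity at all; objects simply glide along.
        s.x += s.vx * dt;
        s.y += s.vy * dt;
        s.z += s.vz * dt;
    }
}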

· The user will be able to ask the system questions. For example, he can ask "What color is the cow on the big table?" and get back "The cow is red." More interesting questions will be possible as well: e.g., "Why is the cow on the table red?" could lead to the response "The cow on the table is red because it is near a table", given a rule stating that cows near tables become red: "X isa cow, X near Y, Y isa table > X red".

· The rule system will be improved and made more efficient. The rules will be "triggered" by events happening in the virtual reality environment. For example, if a rule states that cows near tables become red, and a cow is moving towards a table, the rule system will change that cow’s color to red as soon as it comes near the table. Changes in the environment caused by rules will produce text output as well, so in this case the computer would say "The cow moving to the table is now red because it is near the table." Note that this makes our system more powerful than a CAD program: with different rules, for example, the system can be used for the interactive rendering of printed stories.

· The user will be able to enter rules in English. This would allow designers to "program" their simulations on the fly in English. For example,

"If the rod is only 0.03 mm smaller than the hole when I stick it in, sound

a tolerance warning"

"Cows near tables become red"

"Birds cannot fly"

"If there is some red object and there is a cow that does not move, the cow walks to that red object."

· The English input should be hooked up to a voice recognition system.