Data Integration: Cimple is an attempt at best-effort data integration at community scale: we first apply the best available automatic techniques to extract and integrate the data, then leverage human effort (from the community builders and the users) to improve the extraction and integration process. It can thus be viewed as an example of a self-improving automatic data integration system. In addition, it would be interesting to consider whether the problem of building a DBLife-like system for the database community can be cast as a data integration challenge (and benchmark).
AI: The AI community has developed numerous sophisticated solutions to individual problems in the CIM process, such as information extraction, entity matching, and relationship discovery. The focus has largely been on improving accuracy. Cimple also attempts to develop more accurate "blackbox" solutions in the CIM context, but it places major emphasis on studying how these "blackboxes" can be composed effectively to handle the entire CIM process. The goal of composition is to maximize both accuracy and efficiency (since scalability is a major problem). To compose "blackboxes" effectively, we often take cues from machine learning techniques as well as from relational optimization technologies.
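As a rough illustration of what taking a cue from relational optimization could look like when composing "blackboxes", the sketch below orders filter-style blackboxes with the classic predicate-ordering heuristic (run them in increasing order of cost / (1 - selectivity)), so that cheap, highly selective steps shrink the input before expensive ones run. This is not Cimple's actual method; the Filter class, the cost and selectivity figures, and the sample documents are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Filter:
    """A hypothetical 'blackbox' that keeps or drops a document."""
    name: str
    keep: Callable[[str], bool]
    cost: float         # assumed average cost per document
    selectivity: float  # assumed fraction of documents that pass


def order_filters(filters: Iterable[Filter]) -> List[Filter]:
    # Predicate-ordering heuristic borrowed from relational optimization:
    # sort by cost / (1 - selectivity), smallest first.
    return sorted(filters, key=lambda f: f.cost / max(1e-9, 1.0 - f.selectivity))


def run_pipeline(docs: Iterable[str], filters: Iterable[Filter]) -> Tuple[List[str], float, List[str]]:
    plan = order_filters(filters)
    survivors = list(docs)
    work = 0.0
    for f in plan:
        work += f.cost * len(survivors)  # each filter pays for every remaining doc
        survivors = [d for d in survivors if f.keep(d)]
    return survivors, work, [f.name for f in plan]


# Illustrative inputs (all assumptions, not data from the project).
docs = [
    "SIGMOD 2007 call for papers on data management",
    "Talk by Jane Doe on data integration",
    "Cafeteria menu for Monday",
    "Jane Doe's vacation photos",
]
filters = [
    # Stand-in for an expensive person-mention extractor.
    Filter("person", lambda d: "Jane Doe" in d, cost=10.0, selectivity=0.5),
    # Stand-in for a cheap keyword pre-filter.
    Filter("keyword", lambda d: "data" in d.lower(), cost=1.0, selectivity=0.3),
]

survivors, work, plan = run_pipeline(docs, filters)
print(plan, survivors, work)
```

Here the cheap keyword filter is scheduled first even though it is listed second, so the expensive filter only sees the documents that survive it. In a real setting the cost and selectivity estimates would have to be measured or learned rather than hard-coded.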
Web: Cimple can be viewed as building technologies for vertical portals, but at the semantic (e.g., entity-relationship) level. It moves toward a vision of the Web in which numerous such community portals exist, each maintained efficiently with minimal human effort, and in which Web search moves to the next level by exploiting the structured data at these portals. Cimple also studies how to get community members to collectively help build and maintain such portals. (Industrial efforts in this direction can be seen, e.g., in My Web 2.0 at Yahoo! and in Google Base.)
Semantic Web: The vision of the Semantic Web is to have users mark up data on the Web so that it can be exploited more effectively by automated means. Cimple studies how this can be done in the context of communities: how we can "bootstrap" a portion of the Semantic Web by first marking up data automatically, then using services built over this initial markup to entice the user base to mark up more data, which in turn improves the services provided.