» Pagerank Computation and Keyword Search on Distributed Systems and P2P Networks

| Sorted by Date | Classified by Publication Type | Classified by Research Category |

Karthikeyan Sankaralingam, Madhulika Yalamanchi, Simha Sethumadhavan, and James C. Browne. Pagerank Computation and Keyword Search on Distributed Systems and P2P Networks. Journal of Grid Computing, 1(3):291-307, 2003.

Download

[PDF] [HTML]

Abstract

This paper presents a fully distributed computation for Google'spagerank algorithm. The computation is based on solution of the matrixequation defining pageranks by a distributed implementation ofasynchronous iteration. Pageranks for the documents stored on a webserver or on a host in a peer-to-peer network are computed in placeand stored with the documents. The matrix is never assembled and nocrawls of the web are required. Continuously accurate pageranks areenabled by incremental computation of pageranks for documents as theyare inserted onto a network storage host and incremental recomputationof pageranks when documents are deleted. Intrahost and intradomaindominance of document link structure is naturally exploited by thedistributed asynchronous iteration algorithm.Three implementations: (i) a simulation which was previously reported,(ii) an implementation of the algorithm in a peer-to-peercomputational system and (iii) an embedding of the computation in webservers, are described. Application of the three implementations tothree different workloads, two constructed following power law networkmodels for link distributions and one derived from the Governmentdocument database are reported. Convergence for computation of acomplete set of pageranks is rapid: 1% accuracy in 10 or fewermessages per document. Incremental computation of pageranks resultingfrom addition or deletion of documents also converges rapidly, usuallyrequiring 10 or fewer messages per document.Coupling locally stored pageranks with the documents in a peer-to-peernetwork dramatically diminishes the volume of data which must betransmitted to satisfy keyword searches in peer-to-peer networks. Theweb server implementation shows that the distributed algorithm can beused to enable web servers to compute pageranks for the documents theystore and thus potentially enable effective keyword searches for thedocuments stored on the web servers of intranets by utilizing unusedprocessing power of the web servers.

Additional Information

This is a test of the extra info broadcasting system.

BibTeX

 @Article{grid2003,
   author = "Karthikeyan Sankaralingam and Madhulika Yalamanchi and Simha Sethumadhavan and James C. Browne",
   title =     "{Pagerank Computation and Keyword Search on Distributed Systems and P2P Networks}",
   journal =      "Journal of Grid Computing",
   volume = "1",
   number = "3",
   year =         {2003},
   pages = "291-307"
   abstract = {
 This paper presents a fully distributed computation for Google's
 pagerank algorithm. The computation is based on solution of the matrix
 equation defining pageranks by a distributed implementation of
 asynchronous iteration. Pageranks for the documents stored on a web
 server or on a host in a peer-to-peer network are computed in place
 and stored with the documents. The matrix is never assembled and no
 crawls of the web are required. Continuously accurate pageranks are
 enabled by incremental computation of pageranks for documents as they
 are inserted onto a network storage host and incremental recomputation
 of pageranks when documents are deleted. Intrahost and intradomain
 dominance of document link structure is naturally exploited by the
 distributed asynchronous iteration algorithm.
 Three implementations: (i) a simulation which was previously reported,
 (ii) an implementation of the algorithm in a peer-to-peer
 computational system and (iii) an embedding of the computation in web
 servers, are described. Application of the three implementations to
 three different workloads, two constructed following power law network
 models for link distributions and one derived from the Government
 document database are reported. Convergence for computation of a
 complete set of pageranks is rapid: 1\% accuracy in 10 or fewer
 messages per document. Incremental computation of pageranks resulting
 from addition or deletion of documents also converges rapidly, usually
 requiring 10 or fewer messages per document.
 Coupling locally stored pageranks with the documents in a peer-to-peer
 network dramatically diminishes the volume of data which must be
 transmitted to satisfy keyword searches in peer-to-peer networks.  The
 web server implementation shows that the distributed algorithm can be
 used to enable web servers to compute pageranks for the documents they
 store and thus potentially enable effective keyword searches for the
 documents stored on the web servers of intranets by utilizing unused
 processing power of the web servers.  },
   bib_dl_pdf = "http://www.cs.wisc.edu/~karu/docs/papers/utexas/pagerank-journal-version.pdf",
   bib_dl="http://springerlink.metapress.com/content/x9841h68u4161x3v/?p=0019c14ae2c34132ab2bc615fcb16ad8&pi=4",
   bib_pubtype = {Invited Paper,Journal},
   bib_rescat = {Other},
   bib_extra_info = {This is a test of the extra info broadcasting system.}
 }

Generated by bib.pl (written by Patrick Riley ) on Sat Jul 15, 2017 14:55:44 time=1207019082


Page last modified on October 18, 2017