Abstract
Peer-to-peer (P2P) databases are becoming prevalent on the Internet for the distribution and sharing of documents, applications, and other digital media. Answering large-scale ad hoc analysis queries, such as aggregation queries, on these databases poses unique challenges. Exact solutions can be time-consuming and difficult to implement, given the distributed and dynamic nature of P2P databases. In this paper, we present novel sampling-based techniques for approximate answering of ad hoc aggregation queries in such databases. Computing a high-quality random sample of the database efficiently in a P2P environment is complicated by several factors: the data is distributed (usually in uneven quantities) across many peers; within each peer, the data is often highly correlated; and even collecting a random sample of the peers is difficult to accomplish. To counter these problems, the proposed approach is based on random walks of the P2P graph, as well as block-level sampling techniques.
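The combination of a random walk over peers and sampling within each peer can be illustrated with a minimal sketch. It is not the paper's algorithm as published: the function names, the parameters (`jump`, `block_size`), the in-memory `graph` and `data` dictionaries, and the assumption that the total number of edges is known are all hypothetical choices made for illustration. The sketch relies on one standard fact: a random walk on a connected, non-bipartite undirected graph visits a peer with long-run probability proportional to its degree, so each visited peer's local count is reweighted by the inverse of that probability. A single contiguous block of tuples stands in for block-level sampling.

```python
import random


def random_walk_peers(graph, start, num_peers, jump, rng):
    """Walk the overlay graph, keeping the peer reached after every `jump` hops."""
    selected = []
    current = start
    while len(selected) < num_peers:
        for _ in range(jump):
            # Assumes every peer has at least one neighbor.
            current = rng.choice(graph[current])
        selected.append(current)
    return selected


def estimate_count(graph, data, predicate, start,
                   num_peers=50, jump=10, block_size=20, seed=0):
    """Estimate how many tuples in the whole network satisfy `predicate`."""
    rng = random.Random(seed)
    # Sum of degrees equals 2|E| in an undirected graph (assumed known here).
    two_e = sum(len(neighbors) for neighbors in graph.values())
    total = 0.0
    for peer in random_walk_peers(graph, start, num_peers, jump, rng):
        rows = data[peer]
        local = 0.0
        if rows:
            # Read one contiguous block and scale up to the peer's full size.
            k = min(block_size, len(rows))
            first = rng.randrange(len(rows) - k + 1)
            block = rows[first:first + k]
            local = sum(1 for row in block if predicate(row)) * len(rows) / k
        # Visit probability is deg(peer) / 2|E|; weight by its inverse.
        total += local * two_e / len(graph[peer])
    return total / num_peers


if __name__ == "__main__":
    # Hypothetical ring of five peers holding four tuples each.
    ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
    tuples = {i: [1, 2, 3, 4] for i in range(5)}
    print(estimate_count(ring, tuples, lambda row: row > 2, start=0))
```

The `jump` parameter skips intermediate peers so that consecutive selections are less correlated. In a real deployment the walk would be carried out by forwarding messages between peers rather than by reading a shared dictionary.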