Web Search and OAI Protocol (reading notes)
Once again, advances in computer software and applications have enabled companies
like Microsoft, Yahoo, and Google to comb through the vastness of the internet and
improve the efficiency of their services. I was particularly impressed by the
capabilities of Web-scale crawlers. If what the article explains is only a plain
description of average crawler technology, I can only imagine how powerful Google's
web crawlers are. It was also interesting to see how spam pages keep adapting to the
changing ways in which crawlers detect them. The whole picture reminds me of the
Matrix movies.
Web Search Engines, Pt. 2
The second part of the series examines how algorithms and data
structures index 400 terabytes of Web page text and deliver the best results in
response to hundreds of millions of queries each day. The capabilities of these
indexers are just as impressive as the crawlers they interact with, in terms of
the magnitude of work they do. One point in the article that struck me was how
impractical some of the tasks are at face value, and how workable solutions were
nevertheless found. For instance, the author describes the PageRank computation and
the impracticality of directly processing matrices of rank 20 billion. However, I
wonder whether the solutions researchers arrived at are genuine ways around the
problem or short-term workarounds that leave the underlying computational
impracticality untouched. Do these special indexing tricks sacrifice a complete
solution for the sake of a faster response when a query is submitted? This leads to
the question of whether search engine companies such as Google, Yahoo, and Microsoft
are really delivering on their promise of reliable results in the fastest possible
time.
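If I understand the article correctly, the way around the rank-20-billion matrix is
not to build the matrix at all: PageRank can be computed by repeated passes (power
iteration) over the sparse link graph, so each pass only touches the links that
actually exist. Below is a minimal sketch of that idea in Python; it is my own
illustration with made-up page names, not code from the article.

# Minimal PageRank by power iteration (illustrative sketch, not the article's code).
# The full N-by-N link matrix is never built; only each page's outgoing-link list
# is stored, so one pass costs time proportional to the number of links, not N^2.

def pagerank(out_links, damping=0.85, iterations=50):
    """out_links: dict mapping each page to the list of pages it links to."""
    pages = list(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}            # start from a uniform distribution

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, links in out_links.items():
            if links:
                share = damping * rank[page] / len(links)
                for target in links:
                    new_rank[target] += share     # spread rank along outgoing links
            else:
                for p in pages:                   # dangling page: spread rank evenly
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# Toy example with three hypothetical pages linking to one another.
print(pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]}))

Whether this counts as a real solution or a workaround is exactly the question I
raised above: the iteration is stopped after a fixed number of passes, so the
result is an approximation that is "good enough" rather than an exact answer.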
The Deep Web: Surfacing Hidden Value