GeoSearch: A Geographically-Aware Search Engine


System Overview

As a proof of concept, GeoSearch searches news articles from more than 300 online newspapers based in the United States. Offline, GeoSearch estimates the geographical scope of each newspaper from the distribution of the hyperlinks that point to it.

Then, for a query consisting of a list of keywords (e.g., [startups business]) and the US ZIP code of the user's location (e.g., 94043), GeoSearch:

  1. Uses just the keywords to rank the newspaper articles using a standard, off-the-shelf text search engine called Swish.
  2. Filters out all pages coming from newspapers whose geographical scope does not include the user's specified ZIP code.
  3. Recomputes the score for each surviving page and returns the pages ranked in the resulting order. A page's new score is a combination of the Swish-generated score for the page and a score related to the geographical scope of the page (see the VLDB '00 paper).
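
The three query-time steps above can be sketched as follows. This is only an illustration under stated assumptions, not the GeoSearch implementation: the `Article` type, the `geo_rank` function, the representation of a scope as a set of ZIP codes, and the weighted-sum combination with its `weight` parameter are all hypothetical. The page does not specify how the two scores are combined; the actual formula is in the VLDB '00 paper. The text scores are assumed to have been produced already by the text search engine (step 1).

```python
from dataclasses import dataclass


@dataclass
class Article:
    url: str
    newspaper: str
    text_score: float  # keyword-only score from the text search engine (step 1)


def geo_rank(articles, scopes, geo_scores, user_zip, weight=0.5):
    """Filter and re-rank articles for a user located at `user_zip`.

    scopes:     newspaper -> set of ZIP codes in its estimated scope (assumed form)
    geo_scores: newspaper -> scope-related score in [0, 1] (assumed form)
    weight:     hypothetical mixing weight; the real combination may differ
    """
    scored = []
    for article in articles:
        # Step 2: drop articles whose newspaper scope excludes the user's ZIP code.
        if user_zip not in scopes.get(article.newspaper, set()):
            continue
        # Step 3: combine the text score with the scope-related score.
        # A weighted sum is an assumption made for this sketch.
        geo = geo_scores.get(article.newspaper, 0.0)
        combined = weight * article.text_score + (1 - weight) * geo
        scored.append((combined, article))
    # Return the surviving articles, highest combined score first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article for _, article in scored]


if __name__ == "__main__":
    articles = [
        Article("national/story", "National Daily", 0.9),
        Article("local/story", "Local Gazette", 0.8),
        Article("faraway/story", "Faraway Times", 1.0),
    ]
    scopes = {
        "National Daily": {"94043", "10001"},
        "Local Gazette": {"94043"},
        "Faraway Times": {"10001"},
    }
    geo_scores = {"National Daily": 0.2, "Local Gazette": 0.9}
    for article in geo_rank(articles, scopes, geo_scores, "94043"):
        print(article.url)
```

With these made-up inputs, the article from the newspaper whose scope excludes 94043 is filtered out even though it has the highest text score, and the local article outranks the national one once the scope-related score is mixed in.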

Example Geographical Scopes


Papers


People

and, at an early stage of the project, Orkut Buyukkokten, Junghoo Cho, and Hector Garcia-Molina.


This material is based upon work supported by the National Science Foundation under Grants No. 9733880 and 9619124. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


Luis Gravano
gravano@cs.columbia.edu