GeoSearch: A Geographically-Aware Search Engine
System Overview
As a proof of concept, GeoSearch searches news articles from more
than 300 online newspapers based in the United States. Offline,
GeoSearch estimates the geographical scope of each newspaper based on
the distribution of hyperlinks to it. For example:
- The geographical scope of The New York Times is
automatically estimated to be the entire United States (see
below), which intuitively indicates that this newspaper
is generally relevant to users across the country.
- In contrast, the geographical scope of The Stanford Daily
is automatically estimated to be mostly the Palo Alto,
California area (see below), which intuitively indicates
that this newspaper is generally not relevant to users,
say, in New York City.
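The idea of deriving a scope from the distribution of incoming links can be illustrated with a rough sketch. This is not the estimation method from the VLDB '00 paper. It is a simplified stand-in that assumes each linking page has already been mapped to a region, and it includes a region in the scope when a large enough fraction of that region's pages link to the newspaper. The function name, the region labels, the counts, and the threshold are all hypothetical.

```python
from collections import Counter


def estimate_scope(link_origins, pages_per_region, min_fraction=0.03):
    """Return the regions whose share of linking pages reaches min_fraction.

    link_origins: one region label per page that links to the newspaper.
    pages_per_region: region label -> number of known pages in that region.
    min_fraction: hypothetical cutoff, not a value taken from the paper.
    """
    links = Counter(link_origins)
    scope = set()
    for region, total in pages_per_region.items():
        # A region counts only if enough of its pages link to the site.
        if total > 0 and links[region] / total >= min_fraction:
            scope.add(region)
    return scope


# Made-up counts for illustration only.
pages_per_region = {"CA": 100, "NY": 100, "TX": 100}
national_links = ["CA"] * 5 + ["NY"] * 6 + ["TX"] * 4  # links from everywhere
local_links = ["CA"] * 8 + ["TX"] * 1                  # links mostly from CA

national_scope = estimate_scope(national_links, pages_per_region)
local_scope = estimate_scope(local_links, pages_per_region)
```

With these made-up counts, the first newspaper receives links at a similar rate from every region and gets a scope covering all three, while the second is linked almost only from one region and gets a scope limited to it.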
Then, for a query consisting of a list of keywords (e.g., [startups
business]) and the US ZIP code of the user's location (e.g.,
94043), GeoSearch:
- Uses just the keywords to rank the newspaper articles
using a standard, off-the-shelf text search engine called
Swish.
- Filters out all pages coming from newspapers whose
geographical scope does not include the user's specified
ZIP code.
- Recomputes the score for each surviving page and returns
the pages ranked in the resulting order. A page's new
score is a combination of the Swish-generated score for
the page and a score related to the geographical scope of
the page (see the VLDB '00 paper).
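The three query-time steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the GeoSearch implementation: the text scores are assumed to come from the text search engine already, each newspaper's scope is represented as a hypothetical mapping from ZIP code to a geographical score in [0, 1], and the weighted sum used to combine the two scores is an assumption (the actual combination is described in the VLDB '00 paper). All names and numbers are made up.

```python
def geo_rank(text_hits, scopes, user_zip, geo_weight=0.5):
    """Filter and re-rank text search hits by geographical scope.

    text_hits: list of (article_id, newspaper, text_score) tuples,
        assumed to be the output of the keyword-only search (step 1).
    scopes: newspaper -> {zip_code: geo_score}; a ZIP code missing from
        the inner dict means the scope does not include it.
    geo_weight: hypothetical mixing weight for the geographical score.
    """
    ranked = []
    for article_id, newspaper, text_score in text_hits:
        geo_score = scopes.get(newspaper, {}).get(user_zip)
        if geo_score is None:
            continue  # step 2: scope does not include the user's ZIP code
        # Step 3: combine both scores (assumed linear combination).
        combined = (1 - geo_weight) * text_score + geo_weight * geo_score
        ranked.append((article_id, combined))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Made-up hits and scopes for a user in ZIP code 94043.
hits = [
    ("a1", "National Paper", 0.8),
    ("a2", "Distant Campus Paper", 0.9),
    ("a3", "Local Paper", 0.6),
]
scopes = {
    "National Paper": {"94043": 0.5, "10027": 0.5},
    "Distant Campus Paper": {"10027": 1.0},
    "Local Paper": {"94043": 1.0},
}
results = geo_rank(hits, scopes, "94043")
```

In this example the article with the highest text score is dropped because its newspaper's scope does not cover the user's ZIP code, and the local article ends up ranked above the national one once the geographical score is mixed in.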
Example Geographical Scopes
- Some newspaper geographical scopes, derived automatically
from the distribution of hyperlinks to the newspaper
homepages:
- Geographical scope of all 300+
newspapers we indexed (might take a minute or two to
display)
Papers
- Categorizing
Web Queries According to Geographical Locality,
L. Gravano, V. Hatzivassiloglou, and R. Lichtenstein,
in Proc. of the 12th ACM Conference on Information
and Knowledge Management (CIKM 2003), 2003.
- Computing Geographical Scopes of Web
Resources, J. Ding,
L. Gravano, and N. Shivakumar, in Proc. of the 26th
International Conference on Very Large Data Bases (VLDB '00),
2000.
- Exploiting Geographical Location
Information of Web Pages, O. Buyukkokten, J. Cho, H. Garcia-Molina,
L. Gravano, and N. Shivakumar, in Proc. of the ACM
SIGMOD Workshop on the Web and Databases (WebDB '99), 1999.
People
and, at an early stage of the project, Orkut
Buyukkokten, Junghoo
Cho, and Hector Garcia-Molina.
This material is based upon work supported
by the National Science Foundation under Grant Nos. 9733880 and
9619124. Any opinions, findings, and conclusions or
recommendations expressed in this material are those of the
author(s) and do not necessarily reflect the views of the
National Science Foundation.
Luis Gravano
gravano@cs.columbia.edu