Monday, May 25, 2009

Searching on the web: the new breed of search engines

There has been a lot of talk recently (on the web and elsewhere) about the next generation of "smarter" search engines. Below are examples of search engines that have recently gained coverage for their ability to (1) structure and present data pulled from the web, (2) provide semantic filtering of information by quality, (3) search structured data on the web, (4) search the 'real-time' web or (5) search the 'deep web':

(1) Structure and present data pulled from the web

Wolfram Alpha
'We aim to collect and curate all objective data; implement every known model, method, and algorithm; and make it possible to compute whatever can be computed about anything. Our goal is to build on the achievements of science and other systematizations of knowledge to provide a single source that can be relied on by everyone for definitive answers to factual queries.' (Wolfram, 2009)

Google Squared
'Google Squared doesn't find webpages about your topic — instead, it automatically fetches and organizes facts from across the Internet.' (Google, 2009) It extracts data from relevant webpages and presents them in squared frames on a results page.

Bing is built to 'go beyond today's search experience' by recognising content and adapting to your query types, providing results which are "decision driven". According to the company: "we set out to create a new type of search experience with improvements in three key areas: (1) Delivering great search results and one-click access to relevant information, (2) Creating a more organized search experience, (3) Simplifying tasks and providing tools that enable insight about key decisions." (Microsoft, 2009)

SenseBot
'SenseBot delivers a summary in response to your search query instead of a collection of links to Web pages. SenseBot parses results from the Web and prepares a text summary of them. The summary serves as a digest on the topic of your query, blending together the most significant and relevant aspects of the search results. The summary itself becomes the main result of your search...Sensebot attempts to understand what the result pages are about. It uses text mining to parse Web pages and identify their key semantic concepts. It then performs multidocument summarization of content to produce a coherent summary' (Sensebot, 2008)
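To give a feel for what 'multidocument summarization' can mean in practice, here is a minimal sketch of one very simple approach: score each sentence by how often its words occur across all the result pages, then keep the top few. This is not SenseBot's actual method, which is not described in detail here; the stop-word list, function names and scoring rule are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
              "for", "on", "it", "that", "by"}


def tokenize(text):
    """Lowercase the text and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOP_WORDS]


def summarize(documents, max_sentences=2):
    """Return the highest-scoring sentences across all documents."""
    # Split every document into sentences, keeping their original order.
    sentences = []
    for doc in documents:
        parts = re.split(r"(?<=[.!?])\s+", doc)
        sentences.extend(s.strip() for s in parts if s.strip())

    # Word frequencies over the whole collection stand in for "key concepts".
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentences by score, keep the best, then restore reading order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Given a handful of pages about the same topic, sentences that share the collection's most frequent words float to the top, while off-topic sentences drop out. A real system would need far more than word counts to produce a 'coherent' summary.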

(2) Provide more semantic filtering of information by quality

Hakia
'Hakia’s semantic technology provides a new search experience that is focused on quality, not popularity. hakia’s quality search results satisfy three criteria simultaneously: They (1) come from credible Web sites recommended by librarians, (2) represent the most recent information available, and (3) remain absolutely relevant to the query' (Hakia, 2009)

(3) Search structured data on the web

SWSE
'There is already a lot of data out there which conforms to the proposed SW standards (e.g. RDF and OWL). Small vertical vocabularies and ontologies have emerged, and the community of people using these is growing daily. People publish descriptions about themselves using FOAF (Friend of a Friend), news providers publish newsfeeds in RSS (RDF Site Summary), and pictures are being annotated using various RDF vocabularies. [SWSE is a] service which continuously explores and indexes the Semantic Web and provides an easy-to-use interface through which users can find the data they are looking for. We are therefore developing a Semantic Web Search Engine' (SWSE, 2009)

Swoogle
'Swoogle is a search engine for the Semantic Web on the Web. Swoogle crawl the World Wide Web for a special class of web documents called Semantic Web documents, which are written in RDF' (Swoogle, 2007)
Similar offering is;
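
As a rough illustration of what searching structured data involves, the sketch below stores a few RDF-style statements as (subject, predicate, object) triples and answers a question by pattern matching rather than keyword lookup. The terms foaf:Person, foaf:name and foaf:knows come from the FOAF vocabulary; the ex: identifiers, the sample data and both function names are invented for this example, and this is not how SWSE or Swoogle are implemented.

```python
# Each statement is a (subject, predicate, object) triple.
# The ex: resources below are made-up sample data.
TRIPLES = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:name", "Bob"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def names_of_people_known_by(triples, person):
    """Follow foaf:knows links from `person`, then look up each foaf:name."""
    names = []
    for _, _, friend in match(triples, subject=person, predicate="foaf:knows"):
        for _, _, name in match(triples, subject=friend, predicate="foaf:name"):
            names.append(name)
    return names
```

Calling names_of_people_known_by(TRIPLES, "ex:alice") follows the foaf:knows link to ex:bob and returns his name. The point is that the query works on the meaning of the links between things, which a plain keyword search over web pages cannot do.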

(4) Search the 'real-time' web

OneRiot
'OneRiot crawls the links people share on Twitter, Digg and other social sharing services, then indexes the content on those pages in seconds. The end result is a search experience that allows users to find the freshest, most socially-relevant content from across the realtime web....we index our search results according to their current relevance and popularity' (Oneriot, 2009)

Scoopler
'Scoopler is a real-time search engine. We aggregate and organize content being shared on the internet as it happens, like eye-witness reports of breaking news, photos and videos from big events, and links to the hottest memes of the day. We do this by constantly indexing live updates from services including Twitter, Flickr, Digg, Delicious and more.' (Scoopler, 2009)

Collecta
'Collecta monitors the update streams of news sites, popular blogs and social media, and Flickr, so we can show you results as they happen' (Collecta, 2009)
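
None of these services publish their ranking formulas, but the general idea of ordering results by 'current relevance and popularity' can be sketched with a simple time-decayed score: the number of shares, halved for every hour of age. The half-life, the field names ('url', 'shares', 'posted_at') and both functions are assumptions made for illustration only.

```python
import math


def freshness_score(share_count, age_seconds, half_life_seconds=3600.0):
    """Popularity discounted by age: the score halves every half-life."""
    return share_count * math.pow(0.5, age_seconds / half_life_seconds)


def rank(items, now):
    """Sort items (dicts with 'url', 'shares', 'posted_at') best first.

    'posted_at' and 'now' are timestamps in seconds.
    """
    return sorted(
        items,
        key=lambda item: freshness_score(item["shares"],
                                         now - item["posted_at"]),
        reverse=True,
    )
```

With a one-hour half-life, a link shared 40 times just now outranks one shared 100 times two hours ago, which is the kind of trade-off between freshness and popularity that a real-time engine has to make.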

(5) Search the 'deep web'

DeepDyve
'The DeepDyve research engine uses proprietary search and indexing technology to cull rich, relevant content from thousands of journals, millions of documents, and billions of untapped Deep Web pages.' 'Researchers, students, technical professionals, business users, and other information consumers can access a wealth of untapped information that resides on the "Deep Web" – the vast majority of the Internet that is not indexed by traditional, consumer-based search engines. The DeepDyve research engine unlocks this in-depth, professional content and returns results that are not cluttered by opinion sites and irrelevant content.... The KeyPhrase™ algorithm, applies indexing techniques from the field of genomics. The algorithm matches patterns and symbols on a scale that traditional search engines cannot match, and it is perfectly suited for complex data found on the Deep Web' (Deepdyve, 2009)

Copyright © 2006-2008 Shane McLoughlin. This article may not be resold or redistributed without prior written permission.


i26031966 said...

"Wolfram|Alpha (also written WolframAlpha and Wolfram Alpha) is an answer-engine ... It is an online service that answers factual queries directly by computing the answer from structured data, rather than providing a list of documents or web pages that might contain the answer as a search engine might."
It isn't a search engine!!!

shane mc loughlin said...

It may well be an 'answer engine' (however badly for the moment), but it is also a search engine!!

Search Engine

- a computer program that retrieves documents or files or data from a database or from a computer network (especially from the internet)

- search engine noun [C]
a computer program which finds information on the Internet by looking for words which you have typed in

- computer software used to search data (as text or a database) for specified information; also: a site on the World Wide Web that uses such software to locate key words in other sites.

shane mc loughlin said...

One further point: calling it an 'answer engine' can be dubious on epistemological grounds. The information it displays is often outdated 'facts', and even alleged 'facts' can be nothing more than the best or most popular explanation or approximation.