
Monday, May 25, 2009

Searching on the web: the new breed of search engines

There has been a lot of talk recently (on the web and elsewhere) about the next generation of "smarter" search engines. Below are examples of search engines which have recently gained coverage for their ability to either (1) structure and present data pulled from the web, (2) provide more semantic filtering of information by quality, (3) search structured data on the web, (4) search the 'real-time' web or (5) search the 'deep web':

(1) Structure and present data pulled from the web

Wolfram Alpha
'We aim to collect and curate all objective data; implement every known model, method, and algorithm; and make it possible to compute whatever can be computed about anything. Our goal is to build on the achievements of science and other systematizations of knowledge to provide a single source that can be relied on by everyone for definitive answers to factual queries.' (Wolfram, 2009) http://www.wolframalpha.com/

Google Squared
'Google Squared doesn't find webpages about your topic — instead, it automatically fetches and organizes facts from across the Internet.' (Google, 2009) It extracts data from relevant webpages and presents them in squared frames on a results page.
http://squared.google.com/

Bing
Bing is built to 'go beyond today's search experience' through recognising content and adapting to your query types, providing results which are "decision driven". According to the company: "we set out to create a new type of search experience with improvements in three key areas: (1) Delivering great search results and one-click access to relevant information, (2) Creating a more organized search experience, (3) Simplifying tasks and providing tools that enable insight about key decisions." (Microsoft, 2009)
http://www.bing.com/
http://www.discoverbing.com/


Sensebot
'SenseBot delivers a summary in response to your search query instead of a collection of links to Web pages. SenseBot parses results from the Web and prepares a text summary of them. The summary serves as a digest on the topic of your query, blending together the most significant and relevant aspects of the search results. The summary itself becomes the main result of your search...Sensebot attempts to understand what the result pages are about. It uses text mining to parse Web pages and identify their key semantic concepts. It then performs multidocument summarization of content to produce a coherent summary' (Sensebot, 2008)
http://www.sensebot.net/

(2) Provide more semantic filtering of information by quality

Hakia
'Hakia’s semantic technology provides a new search experience that is focused on quality, not popularity. hakia’s quality search results satisfy three criteria simultaneously: They (1) come from credible Web sites recommended by librarians, (2) represent the most recent information available, and (3) remain absolutely relevant to the query' (Hakia, 2009)
http://www.hakia.com/

(3) Search structured data on the web

SWSE
'There is already a lot of data out there which conforms to the proposed SW standards (e.g. RDF and OWL). Small vertical vocabularies and ontologies have emerged, and the community of people using these is growing daily. People publish descriptions about themselves using FOAF (Friend of a Friend), news providers publish newsfeeds in RSS (RDF Site Summary), and pictures are being annotated using various RDF vocabularies. [SWSE is a] service which continuously explores and indexes the Semantic Web and provides an easy-to-use interface through which users can find the data they are looking for. We are therefore developing a Semantic Web Search Engine' (SWSE, 2009)
http://swse.deri.org/
Swoogle
'Swoogle is a search engine for the Semantic Web on the Web. Swoogle crawls the World Wide Web for a special class of web documents called Semantic Web documents, which are written in RDF' (Swoogle, 2007)
http://swoogle.umbc.edu/
A similar offering is:
http://watson.kmi.open.ac.uk/WatsonWUI/
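
The data these engines crawl is ordinary RDF published on the web, such as the FOAF profiles mentioned above. As a rough, illustrative sketch (not part of SWSE, Swoogle or Watson; it simply assumes Python with the rdflib library and a made-up FOAF file address), a client might load and query such a description like this:

from rdflib import Graph, Namespace

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

g = Graph()
# Hypothetical address of an RDF/XML FOAF file describing a person
g.parse("http://example.org/people/alice/foaf.rdf", format="xml")

# Print every foaf:name in the file and the names of the people they know
for person in g.subjects(predicate=FOAF.name):
    name = g.value(person, FOAF.name)
    friends = [g.value(f, FOAF.name) for f in g.objects(person, FOAF.knows)]
    print(name, "knows:", [str(f) for f in friends if f])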


(4) Search the 'real-time' web

OneRiot
'OneRiot crawls the links people share on Twitter, Digg and other social sharing services, then indexes the content on those pages in seconds. The end result is a search experience that allows users to find the freshest, most socially-relevant content from across the realtime web....we index our search results according to their current relevance and popularity' (Oneriot, 2009)
http://www.oneriot.com/
Scoopler
'Scoopler is a real-time search engine. We aggregate and organize content being shared on the internet as it happens, like eye-witness reports of breaking news, photos and videos from big events, and links to the hottest memes of the day. We do this by constantly indexing live updates from services including Twitter, Flickr, Digg, Delicious and more.' (Scoopler, 2009)
http://www.scoopler.com/
Collecta
'Collecta monitors the update streams of news sites, popular blogs and social media, and Flickr, so we can show you results as they happen' (Collecta, 2009).
http://www.collecta.com/

(5) Search the 'deep web'

DeepDyve
'The DeepDyve research engine uses proprietary search and indexing technology to cull rich, relevant content from thousands of journals, millions of documents, and billions of untapped Deep Web pages.' 'Researchers, students, technical professionals, business users, and other information consumers can access a wealth of untapped information that resides on the "Deep Web" – the vast majority of the Internet that is not indexed by traditional, consumer-based search engines. The DeepDyve research engine unlocks this in-depth, professional content and returns results that are not cluttered by opinion sites and irrelevant content.... The KeyPhrase™ algorithm applies indexing techniques from the field of genomics. The algorithm matches patterns and symbols on a scale that traditional search engines cannot match, and it is perfectly suited for complex data found on the Deep Web' (DeepDyve, 2009)
http://www.deepdyve.com/

Copyright © 2006-2008 Shane McLoughlin. This article may not be resold or redistributed without prior written permission.

Friday, March 13, 2009

Twitter and its data free-for-all...

The rise of Twitter
Twitter is expanding, and expanding fast. A flurry of news coverage and hype about the product, particularly in the last three months, has seen users flock to the service. Twitter is seen to offer enormous potential: information can be filtered by content, location, keyword and so on, opening up new ways of using data online in real time. This is in tandem with the numerous benefits of openness discussed below. However, Twitter still has some way to go. It has yet to come to terms with its own potential and how those possibilities should be steered and constrained. The service recently made some small additions to its site, with a 'trend' and 'search' facility added, but the sophistication of its privacy and account settings is still limited. It has yet to put more control back in users' hands with regard to how their data is used and by whom. At present, it is an all-or-nothing affair: you're "open" or you're "private". This raises the following questions: should account holders have more control over their data? If so, why? Is openness itself constraining what people will say? Finally, if users have more control, will this stifle the success of the service?

Why openness?
The Twitter model is built largely around individuals posting short 140-character status updates, replies or retweets on any topic imaginable. Individuals can find and follow any other user on the service, from friends to people with common interests to celebrities. The great thing about Twitter is its 'openness'. Most individuals choose to keep their profile public to ensure that they can be found by like-minded individuals, or that ongoing conversations can be picked up by interested parties. It means individuals have the feeling that someone out there is listening, even if it is just the possibility of feeling part of something. It is a forum for expression of the mind, even if that expression is mundane. It is also a means to 'contribute' one's time, knowledge and experience, and is thus an avenue of 'meaning' for individuals.

Openness ensures that those with something to offer others can more easily be heard. It engenders the possibility of more connection, collaboration, relationships and even community formation 'without' boundaries. By focusing on the content of messages and less on the full personality, it provides a different kind of social formation. The loud, influential and dominant personality may not make for interesting dialogue, and too many annoying tweets from a user and one can easily unfollow with a click of the mouse. This levels the playing field for users in many respects, as well as increasing the possibility of connection based on interest rather than persuasion. However, not everyone wishes for this openness. There is the option to set your profile 'private' in order to restrict your information to only those whom you have allowed to follow you.

Interpreting your past online
Full openness has its price, though. Twitter first launched in March 2006, and since then an archive of user data has slowly been amassing for all to access. Hundreds of your messages may (or may not) be carefully vetted by you, but one thoughtless Twitter update may be enough to get you into trouble at any point in the future. This may be nothing more than friends misinterpreting and taking offence at an update. But it could be something more: recently a US police officer had his status updates on Facebook and MySpace used against him in a gun trial, contributing to the accused's acquittal. What was interesting about this case is how status updates became utilised and, crucially, 'interpreted' by the jury. This highlights how information may be interpreted and placed into multiple contexts by whoever reads it. Employers, even potential collaborators, may selectively choose just one suspect Twitter update among hundreds as 'proof' of character, or misinterpret one's online ego as holistically representative of the individual. Twitter means your online past and identity will always be there, waiting to be interpreted and analysed.

Analyse this!
You may think that, with hundreds of recorded messages, it would be too cumbersome for anyone to want to trawl through your past data. But with Twitter, software by third parties is springing up to offer just that: Twitter Analyzer is just one of the free online applications that allows you to analyse the data of "any" Twitter user with an open account (hence the majority of Twitter users). The bounds of what can currently be achieved with Twitter Analyzer are limited, but it opens numerous possibilities. Beyond harmless apps like Twitscoop, which scrape status updates in order to form Twitter 'trending topics' and 'buzz words', your data can be analysed in isolation or in tandem with others, in any number of ways, for any number of purposes, and by ANYONE. Twitter apps may emerge (if they don't already exist) to 'profile' individuals: to elucidate personality, truth and inconsistency, track record, literacy, interests and so on. This is alongside the likely emergence of targeted advertising and data mining of information in order to make Twitter a viable business model.
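
To make the point concrete, here is a purely illustrative few lines of Python (the sample updates are invented, and nothing here is taken from Twitter Analyzer or any real service) showing how trivially anyone could build a crude 'interest profile' from a public account's updates once they are in hand:

from collections import Counter
import re

# Invented sample of one user's public status updates
tweets = [
    "Off to the conference in Galway, slides still not finished...",
    "Great talk on the semantic web today #data",
    "Anyone know a good book on information retrieval?",
]

# Crude profile: the most frequent non-trivial words across all updates
words = Counter(
    w for t in tweets
    for w in re.findall(r"[a-z']+", t.lower())
    if len(w) > 3
)
print(words.most_common(5))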

Openness on whose terms?
At present Twitter has a very lax attitude to its data. If your profile is public, your data is a free-for-all. If it's private, it's between you, your vetted followers and Twitter. This means that Twitter's so-called openness may not be so open. People are constantly vetting and reflecting on what information they post on Twitter. They may do it out of shyness, cautiousness, personal branding, or foresight. Twitter is open for many, but not too open; its very openness curtails what dialogue does occur online. As users become aware of the ways in which their data can be used, this may further curtail individual expression. So should Twitter not increase the range of choices with regard to 'openness' and 'privacy'? What I would like to see is users having the choice to make their archive of data private. For instance, what if only your recent updates were set as public? What if Twitter made it difficult for those updates to be scraped by third-party offerings? What if you could make replies visible only to those you follow? What if you could automatically make messages with certain 'keywords' private? What if you could make certain messages time-sensitive, becoming private after a set period? What if you could make some status updates private to yourself? Thus, the bounds of privacy could be opened up. Would it constrain the service's success, however? I do not believe so: if too much openness is stifling expression and conversation on Twitter, then increasing the scope of choice between openness and privacy, and doing it in an unobtrusive way, would perhaps increase use of the service. This choice may be the business model Twitter hopes for...
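
To be clear about what such settings might look like, here is a purely hypothetical sketch in Python; nothing like it exists in Twitter's actual account settings, and the keywords, time window and function names are all invented for illustration:

from datetime import datetime, timedelta

PRIVATE_KEYWORDS = {"family", "medical"}   # keywords the user chooses to keep private
PUBLIC_WINDOW = timedelta(days=30)         # only 'recent' updates stay public

def is_public(text, posted_at, now):
    """Decide whether a single status update should remain publicly visible."""
    if now - posted_at > PUBLIC_WINDOW:
        return False                       # older updates drop out of the public archive
    if any(k in text.lower() for k in PRIVATE_KEYWORDS):
        return False                       # keyword-flagged updates stay private
    return True

print(is_public("Lovely day in Galway", datetime(2009, 3, 1), datetime(2009, 3, 13)))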


Copyright © 2009 Shane McLoughlin. This article may not be resold or redistributed without prior written permission.