Search 3.0
Search 3.0_Introduction
The contest has ended! Congratulations, winners!
Ever tried to Google the answer to a homework question?
Ever tried asking the question "I want to see a horror movie and then eat at a good
Italian restaurant. What are my options?"
Ever thought that present day search engines are not enough?
The day when you can use the web as your personal assistant is closer than you think.
In the journey to this end, the first stop is to extract information of
interest from websites and store it for later use.
This is exactly what you would be expected to do in this event.
The prerequisites? Basic knowledge of programming, and a working brain.
Web 3.0 is coming... are you ready?
The Semantic Web
An overview
The web as we know it today does not carry semantic meaning that can be interpreted by browsers, or even by search engines. For instance, I can't usefully search for a phrase like "Computer peripherals for under Rs. 1000". That is because, as far as the search engine (say Google) is concerned, the price is just another text entry. Only if the search engine can interpret the entry Rs. 1000 as a price can it understand the word "under", and only then can it present the results that you actually need.
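To make the idea concrete, here is a minimal sketch in Python of what "interpreting" such a query could look like. The pattern, the function names (parse_query, within_budget) and the output layout are assumptions made up for this illustration; a real system would need to handle far more phrasings and currencies.

```python
import re

# Hypothetical pattern: "under"/"below" followed by a rupee amount such as "Rs. 1000".
PRICE_LIMIT = re.compile(r"\b(under|below)\s+Rs\.?\s*(\d[\d,]*)", re.IGNORECASE)

def parse_query(query):
    """Split a query into free-text keywords and an optional maximum price."""
    match = PRICE_LIMIT.search(query)
    if match is None:
        return {"keywords": query.strip(), "max_price": None}
    max_price = int(match.group(2).replace(",", ""))
    keywords = query[:match.start()].strip()
    # Drop a trailing connective such as "for" left over before the price phrase.
    keywords = re.sub(r"\s+for$", "", keywords)
    return {"keywords": keywords, "max_price": max_price}

def within_budget(price_text, max_price):
    """Read a page's price text (e.g. 'Rs 850') as a number and compare it."""
    digits = re.search(r"\d[\d,]*", price_text)
    return digits is not None and int(digits.group().replace(",", "")) <= max_price

result = parse_query("Computer peripherals for under Rs. 1000")
```

Once the price is a number rather than a string, "under" becomes an ordinary comparison, which is the whole point of attaching meaning to the value.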
The future
There are several standards emerging to attach semantic information to web pages. One such method is to use microformats to mark certain portions of a page as an event invitation, a location, a contact and so on.
Since it is a standard, one can easily parse it and also understand the value described.
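As a rough illustration of why a standard makes parsing easy, the sketch below reads a contact marked up with hCard-style class names ("vcard", "fn", "tel") using Python's built-in html.parser module. The sample markup, the person and the phone number are invented for this example, and the parser only handles simple, non-nested properties.

```python
from html.parser import HTMLParser

# Illustrative snippet using hCard class names; the contact details are made up.
SAMPLE = """
<div class="vcard">
  <span class="fn">Asha Rao</span>
  <span class="tel">044-5550100</span>
</div>
"""

class HCardParser(HTMLParser):
    """Collect the text of elements whose class is a known hCard property."""
    PROPERTIES = {"fn", "tel", "org"}

    def __init__(self):
        super().__init__()
        self.current = None   # property whose text is currently being read
        self.card = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.PROPERTIES:
                self.current = name

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property.
        self.current = None

    def handle_data(self, data):
        if self.current and data.strip():
            self.card[self.current] = data.strip()

parser = HCardParser()
parser.feed(SAMPLE)
parser.close()
card = parser.card
```

Because the class names are agreed upon in advance, the same few lines would work on any page that follows the convention, without knowing anything else about its layout.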
The present
Unfortunately, as it stands now, the web does not carry semantic information explicitly attached to the values.
Notice the word "explicitly". It is possible to infer the meaning of the text as long as you have already understood the structure of the page.
That is where this event comes in. Process a present-day HTML web page [non-semantic, purely presentational] to generate useful information [purely semantic]. If you do that, you can claim to be better than Google (at least for those web sites [:)]).
Search 3.0_General Information
Tutorials
Want to know more about the event? Does the problem statement seem too ambiguous, attemptable only by the CS Supremo? Don't worry, by the time you are done with this tutorial, you will be able to attempt the problems with confidence that they can, indeed, be solved.
The first thing that may appear daunting is the task of actually parsing the web page: how will I get the HTML page into a form that can be handled easily? For that, you may want to look at HTML parsers for the language of your choice. For example, Java has a convenient library called JTidy, which can give you a DOM tree from the HTML source.
Don't want to go to the trouble of looking at that wiki? Well, a DOM tree is just the representation of an HTML document as a tree structure: the nodes of the tree are the tags, and the leaves are the textual content.
<html> <title> Hello, world! </title> <body> This is the hello world page! </body> </html>
Here html would be the root, with the title and body tags as its children. The text nodes (called #text) are the children of the title and body nodes.
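If you would like to see such a tree for yourself, the following is a small sketch that builds one for the hello world page using Python's built-in html.parser module. The Node and TreeBuilder classes and the dump helper are names invented for this illustration, and the builder assumes well-formed input in which every tag is closed.

```python
from html.parser import HTMLParser

class Node:
    def __init__(self, name, text=None):
        self.name = name        # tag name, or "#text" for a text leaf
        self.text = text
        self.children = []

class TreeBuilder(HTMLParser):
    """Build a simple tree; assumes every opening tag has a matching close."""
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]   # path from the root to the open element

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        # Whitespace between tags is ignored; real text becomes a #text leaf.
        if data.strip():
            self.stack[-1].children.append(Node("#text", data.strip()))

def dump(node, depth=0):
    """Return the tree as indented lines, one node per line."""
    label = node.name if node.text is None else f'{node.name}: "{node.text}"'
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines

builder = TreeBuilder()
builder.feed("<html> <title> Hello, world! </title> <body> This is the hello world page! </body> </html>")
builder.close()
print("\n".join(dump(builder.root)))
```

Real pages are rarely this tidy (unclosed tags, for instance), which is why a forgiving parser library is usually a better choice than a hand-rolled one like this.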
Gee, how does that help me find where to look for the information that the problem statement asks for, you ask?
You can see that the price is always to the right of Buy It Now price and so on. So, if we now turn our attention back to our DOM tree, we just need to visit the correct node [how?] and, voila! You have the desired data!
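One possible way to reach the right spot, sketched in Python below, is to find the text leaf that holds the label and take the text leaf that follows it in document order. The listing markup, the value_after helper and the prices are all invented for this illustration; on a real site you would first inspect the page to see how the label and the value are actually laid out.

```python
from html.parser import HTMLParser

# Made-up listing markup; a real page's structure would need to be inspected first.
PAGE = """
<table>
  <tr><td>Buy It Now price:</td><td>Rs. 450</td></tr>
  <tr><td>Shipping:</td><td>Rs. 50</td></tr>
</table>
"""

class TextCollector(HTMLParser):
    """Collect the non-empty text leaves of the page in document order."""
    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        if data.strip():
            self.texts.append(data.strip())

def value_after(label, html):
    """Return the text leaf that follows the first leaf containing `label`."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    for i, text in enumerate(collector.texts[:-1]):
        if label in text:
            return collector.texts[i + 1]
    return None

price = value_after("Buy It Now price", PAGE)
```

This shortcut flattens the tree into its leaves instead of walking it node by node, which is enough when the value sits right next to its label; more tangled layouts would need a proper tree walk.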
Too easy, ain't it?
... ... (wait for it...) (wait for it....) ... ... ...Ah yes, light bulb moment! What if the entries are not clear-cut? What if I want a box of 12 knives, but the web page only sells individual knives?
Oh! That's where you get to use your
Still haven't got the hang of this? Go through the presentation and the videos to firm up your footing.
Also, please keep in mind that the problems are graded in difficulty, and the ones at the beginning will bring you up to speed on the skills you will need to solve the later problems.
Quick tutorial: Search 3.0 in 3 minutes
Video Tutorials
Tutorial 1
Tutorial 2
Tutorial 3
Tutorial 4
Search 3.0_Event Format
- The event is an individual online event.
- Participants must register on the contest portal. Registration is now closed.
- The contest will run for two weeks before Shaastra, from 20th September to 3rd October 2009.
- The problem statements will be uploaded on the contest portal.
- Participants can submit their solutions and these will be evaluated online by our judge. Solutions could be programs in any of various supported programming languages (C, C++, Java, Perl, Python) or data (text, xml, html data) depending on the problem statement.
- Scores will be based on the closeness of the solution provided to the optimum. These will be available as soon as the solution is submitted.
- The complete details of rules and scoring are available on the contest portal here.
Search 3.0_Rules and Regulations
- Contest is open to students only.
- The event is individual. One person can't register more than once.
- Languages allowed: C/C++/Java/Perl/Python.
- The constraints on time, memory, etc. shall be specified separately for each problem.
- For all the problems, unless otherwise specified, the input should be read from stdin and the output should be written to stdout.
- Maximum of 20 submissions will be allowed for a problem.
- No system calls allowed.
- You are advised to write compiler-independent programs. Libraries available in the judge will be updated before the contest.
- Any sort of malicious activity will lead to immediate disqualification of the user in question.
- Admins have the authority to suspend/disqualify any user at any stage of the contest, if they suspect any malpractices/plagiarism.
- Ties in total number of points will be resolved based on the time of the last correct submission. If a tie still remains, we will break it on the basis of total number of submissions.
Search 3.0_FAQ
What can I win?
- I prize -- Rs 12,000
- II prize -- Rs 8,000
- III prize -- Rs 5,000
Consolation certificates will be given to 5 participants.
What are the pre-requisites?
Basic knowledge of programming (C/C++) is sufficient. Knowledge of natural language processing and information extraction will help.
Do we need to register for the event?
The event registration is now closed.
Is this a team event?
Participation is individual.
Who is eligible to participate?
The contest is open to students only. Students from any university in India or abroad are eligible to participate.
Where do I start!?
Please consult the tutorial page. Also, consult our contest portal for the latest updates and details.
For more information and updates, visit our contest portal.
Thank you for participating in Search 3.0, Shaastra 2009.
The results of the contest are as follows. Congratulations, prize winners!
- First Prize: Prathab - 12,000 INR
- Second Prize: Vimalkumar - 8,000 INR
- Third Prize: Ramesh Kumar P - 5,000 INR
Consolation prizes: (certificates)
Please notify us if there are any inconsistencies.
The winners will be notified through email regarding the procedure for collecting prize money.




















