Get started with B.Tech Project

Courses Info

The aim of this project is to build a prototype of a large-scale search engine that works on millions of Wikipedia pages (an XML dump of more than 44 GB) and retrieves the top 10 Wikipedia documents most relevant to the input query. The search engine takes as input the Wikipedia corpus in XML format, which is available at wikipedia.org. It then indexes millions of Wikipedia pages involving a comparable number of distinct terms. The final index is around 7-10 GB. Given a query, it uses this index to retrieve a ranked list of relevant documents and their titles.
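
To make the index-then-query flow concrete, here is a minimal in-memory sketch in Java. It is an illustration under stated assumptions, not the project's actual implementation: the class name `TinyIndex`, its methods, and the simple TF-IDF scoring are all hypothetical, and the XML parsing step is left out. An index for a 44 GB corpus would also have to be built in blocks and stored on disk rather than held in a `HashMap`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration: names and scoring are assumptions, not the course's code.
class TinyIndex {
    // Inverted index: term -> (document title -> term frequency in that document)
    private final Map<String, Map<String, Integer>> postings = new HashMap<>();
    private int docCount = 0;

    // Lowercase the text and split on anything that is not an ASCII letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index one page; the title is used as the document identifier.
    void addDocument(String title, String body) {
        docCount++;
        for (String term : tokenize(title + " " + body)) {
            postings.computeIfAbsent(term, k -> new HashMap<>())
                    .merge(title, 1, Integer::sum);
        }
    }

    // Return up to k titles ranked by summed TF-IDF score, best first.
    List<String> search(String query, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (String term : tokenize(query)) {
            Map<String, Integer> docs = postings.get(term);
            if (docs == null) {
                continue; // term does not occur in any indexed page
            }
            // Rarer terms get a larger weight.
            double idf = Math.log(1.0 + (double) docCount / docs.size());
            for (Map.Entry<String, Integer> e : docs.entrySet()) {
                scores.merge(e.getKey(), e.getValue() * idf, Double::sum);
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        // Sort by descending score; break ties alphabetically for stable output.
        ranked.sort((a, b) -> {
            int c = Double.compare(scores.get(b), scores.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        return ranked.subList(0, Math.min(k, ranked.size()));
    }
}
```

Calling `search(query, 10)` on such an index corresponds to the top-10 retrieval described above. The ranking function is the part most likely to differ in a real system.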

Domains

Information Retrieval, Search engines

Technologies Needed

  • Git
  • Java
  • JavaScript
  • HTML and CSS
  • Tomcat
  • Project Structure

    What we provide
    • Videos on the required technologies by Ravindrababu Ravula
    • Project Implementation Videos by Ravindrababu Ravula
    • Presentation Slides
    • Assignments with solutions
    • 12 weeks of expert guidance
    • Assistance with completing the documentation