rystecher/Crawler
Crawls the web for links and info, written in Python. Run `print crawl_web(url)` at the interpreter and the function returns a structure named `index`, which maps keywords to the URLs associated with them.

I'm still working out some bugs, so for now the crawl always starts from my website; in the future you'll be able to crawl from any URL.

Current problem: the crawl goes on forever, since it manages to find a new link on every page it visits. As a stopgap, I've limited the program to finding 5 links and then stopping. (Apparently Google found that, on average, for every link you find on a website, two more links turn up when you follow it.)
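The behavior described above might be sketched roughly like this (hypothetical helper names and a `fetch` parameter are my assumptions, not the repo's actual code; the crawl stops after a fixed page budget, which is one way to tame the "goes on forever" problem):

```python
import re

def get_links(html):
    # Rough regex extraction of absolute href targets (an assumption,
    # not the repo's real parser).
    return re.findall(r'href="(http[^"]+)"', html)

def add_to_index(index, keyword, url):
    # Map each keyword to the list of URLs it appears on.
    index.setdefault(keyword, [])
    if url not in index[keyword]:
        index[keyword].append(url)

def add_page_to_index(index, url, html):
    # Crudely strip tags, then index every remaining word.
    text = re.sub(r'<[^>]+>', ' ', html)
    for word in text.split():
        add_to_index(index, word.lower(), url)

def crawl_web(seed, fetch, max_pages=5):
    # Breadth-first crawl from `seed`, stopping after `max_pages`
    # pages so the frontier can't grow forever.
    to_crawl, crawled, index = [seed], set(), {}
    while to_crawl and len(crawled) < max_pages:
        url = to_crawl.pop(0)
        if url in crawled:
            continue
        html = fetch(url)
        add_page_to_index(index, url, html)
        to_crawl.extend(get_links(html))
        crawled.add(url)
    return index
```

Passing the fetcher in as a function keeps the crawler testable without network access; in real use it would be something like `lambda u: urllib2.urlopen(u).read()`.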