
Here I scrape information from certain webpages, with the aim of applying machine learning algorithms to it later.


gtm2122/webscraping


The script check.py finds the names of the customers of a company (AdRoll in this case).

The script features.py scrapes data from mattermark.com for each of those customers, along with the products each customer uses, and stores the results as a dictionary in adroll_customers.txt.
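As a rough illustration of the storage step, the sketch below writes a dictionary to a text file and reads it back. It is not the code in features.py: the JSON serialization, the helper names `save_features` and `load_features`, and the sample keys are all assumptions made for the example, since the README does not specify how the dictionary is written.

```python
import json


def save_features(features, path):
    """Write a dictionary of scraped features to a text file as JSON.

    JSON is an assumption here; the real script may use another format.
    """
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(features, fh, indent=2, sort_keys=True)


def load_features(path):
    """Read the dictionary back from the text file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Hypothetical record: the company name and field names are made up.
example = {
    "Example Co": {"products": ["Product A", "Product B"]},
}
```

Calling `save_features(example, "adroll_customers.txt")` followed by `load_features("adroll_customers.txt")` should return an equal dictionary, which keeps the file usable as input for later machine learning steps.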

abc.txt has the names of all the customers. If you look carefully, you will see that only 120 of the companies are unique and that they repeat periodically. This is because the website needs to give your IP permission to view all the customers.
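One way to check the repetition described above is to count the distinct names and look for the smallest period at which the list repeats itself. This is a minimal sketch that assumes abc.txt holds one customer name per line; the helpers `load_names` and `summarize_names` are hypothetical and not part of the repository.

```python
def load_names(path):
    """Read names from a file, assuming one name per line."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def summarize_names(names):
    """Return (total, unique_count, period) for a list of names.

    period is the smallest p such that every name equals the name p
    positions earlier, with at least one full repetition required.
    It is None when the list does not repeat in that way.
    """
    cleaned = [n.strip() for n in names if n.strip()]
    total = len(cleaned)
    unique_count = len(set(cleaned))
    period = None
    # Only consider periods short enough for one full repeat to fit.
    for p in range(1, total // 2 + 1):
        if all(cleaned[i] == cleaned[i - p] for i in range(p, total)):
            period = p
            break
    return total, unique_count, period


# Small in-memory sample standing in for the contents of abc.txt.
sample = ["a", "b", "c", "a", "b", "c", "a"]
total, unique_count, period = summarize_names(sample)
```

Running `summarize_names(load_names("abc.txt"))` on the real file would show whether the number of unique names and the repetition period match the 120 companies mentioned above.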
