Working with ETTU
Mikhail Gorbunov edited this page Dec 2, 2019 · 3 revisions
📜 Python 3 is required, along with the bs4 (BeautifulSoup) and requests packages.
Here is the parsing script:
import json

from bs4 import BeautifulSoup
import requests

BASE_URL = 'http://m.ettu.ru/'
# Index characters; each one has its own station listing page.
STATIONS = '1 4 7 А Б В Г Д Е Ж З И К Л М Н О П Р С Т У Ф Х Ц Ч Ш Щ Э Ю Я'.split(' ')
BASE_STATION_APPENDANT = 'stations'

trams = {}
trolleys = {}


def save(title, link):
    # `state` is the module-level section heading set in the loop below.
    if state == 'Трамваи':  # "Trams"
        trams[title] = link
    elif state == 'Троллейбусы':  # "Trolleybuses"
        trolleys[title] = link
    else:
        print('WARN state=', state)


def make_abs(link):
    if link is None:
        print('WARN, Link is none!')
        return link
    # Plain concatenation: an href that starts with '/' yields a double slash.
    return BASE_URL + link


for letter in STATIONS:
    state = 'Системное'  # "System": placeholder until an <h3> names trams or trolleybuses
    page = requests.get(BASE_URL + BASE_STATION_APPENDANT + '/' + letter)
    page_dec = page.content.decode("utf-8")
    page_rdy = page_dec.split('\n')
    # The page is scanned line by line, so each tag is expected on its own line.
    for line in page_rdy:
        if '<h3>' in line:
            soup = BeautifulSoup(line, 'html.parser')
            title = soup.find('h3').string
            state = title
        if '<a' in line:
            soup = BeautifulSoup(line, 'html.parser')
            tag = soup.find('a')
            title = tag.string
            link = tag.get('href')
            save(title, make_abs(link))

with open('ettuTrams.json', 'w', encoding='utf8') as file:
    json.dump(trams, file, ensure_ascii=False, indent=1)
with open('ettuTrolleys.json', 'w', encoding='utf8') as file:
    json.dump(trolleys, file, ensure_ascii=False, indent=1)
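The script relies on every `<h3>` and `<a>` tag sitting on its own line of the page source. As a rough sketch of the same heading-then-links grouping without that assumption, the snippet below swaps in the standard-library `html.parser` module instead of bs4 and runs on an inline sample. `StationParser`, `SAMPLE`, and the second station entry are hypothetical names and data made up for illustration; the real markup of m.ettu.ru may differ.

```python
from html.parser import HTMLParser


class StationParser(HTMLParser):
    """Collects {section heading: {station title: href}} from a stations page."""

    def __init__(self):
        super().__init__()
        self.sections = {}
        self.state = None   # text of the most recent <h3>
        self._tag = None    # tag whose text is being collected ('h3' or 'a')
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ('h3', 'a'):
            self._tag = tag
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = ''.join(self._text).strip()
        if tag == 'h3':
            self.state = text
        elif self.state is not None:
            # Links that appear before any heading are skipped.
            self.sections.setdefault(self.state, {})[text] = self._href
        self._tag = None


# Hypothetical markup for illustration, not copied from m.ettu.ru.
SAMPLE = """
<h3>Трамваи</h3>
<a href="/station/1168">1-й км (на Пионерскую)</a>
<h3>Троллейбусы</h3>
<a href="/station/9999">Пример</a>
"""

parser = StationParser()
parser.feed(SAMPLE)
parser.close()
sections = parser.sections
```

Here `sections` maps each heading to its own dictionary, so the tram and trolleybus entries could be written to separate files in the same way the script does.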
📜 The script writes two JSON files to the current working directory ($PWD): ettuTrams.json and ettuTrolleys.json.
Output format:
$STATION_NAME:$STATION_URL
Where:
STATION_NAME is the station name, used as the JSON key
STATION_URL is the absolute URL of that station's page, used as the value
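Since each file is a flat object mapping names to URLs, reading it back takes only the standard `json` module. The helpers below are a minimal sketch; `load_stations` and `find_stations` are hypothetical names that do not appear in the script.

```python
import json


def load_stations(path):
    """Read one of the generated files into a {station name: URL} dict."""
    with open(path, encoding='utf8') as file:
        return json.load(file)


def find_stations(stations, fragment):
    """Return the entries whose name contains `fragment`, ignoring case."""
    needle = fragment.lower()
    return {name: url for name, url in stations.items()
            if needle in name.lower()}
```

For instance, `find_stations(load_stations('ettuTrams.json'), '1-й км')` would return every tram station whose name contains that fragment, assuming the script has already been run in the same directory.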
An example:
{
 "1-й км (на Пионерскую)": "http://m.ettu.ru//station/1168",
 "1-й км (на Техучилище)": "http://m.ettu.ru//station/1169"
}
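The double slash in these URLs comes from `make_abs`, which concatenates `BASE_URL` (ending in `/`) with an href that itself starts with `/`. If single-slash URLs are preferred, one possible variant is to join the parts with `urllib.parse.urljoin` from the standard library. This is a sketch of an alternative, not what the script currently does.

```python
from urllib.parse import urljoin

BASE_URL = 'http://m.ettu.ru/'


def make_abs(link):
    """Return an absolute URL for an href, or None if the href is missing."""
    if link is None:
        return None
    # urljoin resolves the href against the base, so a leading '/' on the
    # href does not produce a second slash after the host name.
    return urljoin(BASE_URL, link)


print(make_abs('/station/1168'))  # prints http://m.ettu.ru/station/1168
```

With this variant the generated files would contain `http://m.ettu.ru/station/1168` rather than the form shown in the example.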
mm-oop-2019