
Working with ETTU


Parsing ETTU website

Here is the parsing script:

📜 Python 3 is required, as well as the bs4 and requests packages (pip install beautifulsoup4 requests).

import json

from bs4 import BeautifulSoup
import requests

BASE_URL = 'http://m.ettu.ru/'
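# First characters of station names; the mobile site groups stations by them.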
STATIONS = '1 4 7 А Б В Г Д Е Ж З И К Л М Н О П Р С Т У Ф Х Ц Ч Ш Щ Э Ю Я'.split(' ')
BASE_STATION_APPENDANT = 'stations'

# Station title -> absolute station URL.
trams = {}
trolleys = {}


def save(title, link):
    # 'state' is the last <h3> heading seen on the page:
    # 'Трамваи' ('Trams') or 'Троллейбусы' ('Trolleybuses').
    if state == 'Трамваи':
        trams[title] = link
    elif state == 'Троллейбусы':
        trolleys[title] = link
    else:
        print('WARN: unexpected state =', state)


def make_abs(link):
    # Hrefs on the site start with '/', so strip it to avoid
    # a double slash after BASE_URL.
    if link is None:
        print('WARN: link is None!')
        return link
    return BASE_URL + link.lstrip('/')


for letter in STATIONS:

    state = 'Системное'  # 'system' sentinel; overwritten by the first <h3> heading

    page = requests.get(BASE_URL + BASE_STATION_APPENDANT + '/' + letter)
    page_dec = page.content.decode("utf-8")
    page_rdy = page_dec.split('\n')
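    # The mobile pages appear to put each heading and link on its own
    # line, which is what makes this line-by-line scan workable.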

    for line in page_rdy:
        if '<h3>' in line:
            soup = BeautifulSoup(line, 'html.parser')
            # The heading text itself becomes the new state.
            state = soup.find('h3').string

        if '<a' in line:
            soup = BeautifulSoup(line, 'html.parser')
            tag = soup.find('a')
            title = tag.string
            link = tag.get('href')
            save(title, make_abs(link))

with open('ettuTrams.json', 'w', encoding='utf8') as file:
    json.dump(trams, file, ensure_ascii=False, indent=1)

with open('ettuTrolleys.json', 'w', encoding='utf8') as file:
    json.dump(trolleys, file, ensure_ascii=False, indent=1)


📜 The script writes two JSON files, ettuTrams.json and ettuTrolleys.json, into the current working directory.

Output format:

   "$STATION_NAME": "$STATION_URL"

Where:

STATION_NAME — the station name
STATION_URL — the absolute URL of the station page

An example:

{
   "1-й км (на Пионерскую)": "http://m.ettu.ru//station/1168",
   "1-й км (на Техучилище)": "http://m.ettu.ru//station/1169"
}
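
To sanity-check the result, you can load one of the generated files and fetch a station page. A minimal sketch, assuming the parsing script above has already been run in the same directory:

import json

import requests

# Load the station index produced by the parsing script.
with open('ettuTrams.json', encoding='utf8') as file:
    trams = json.load(file)

# Pick an arbitrary station and fetch whatever HTML
# m.ettu.ru serves for it.
name, url = next(iter(trams.items()))
page = requests.get(url)
print(name, '->', url, '| HTTP', page.status_code)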
