# Celcat Calendar Scraper 📆

An asynchronous Python library for scraping Celcat calendar systems.

## Installation 🚀

```sh
pip install celcat-scraper
```

## Features 🌟

* Event attribute filtering 🔎
* Async/await support for better performance 🔀
* Rate limiting with adaptive backoff ⏳
* Optional caching support 💾
* Optional reusable aiohttp session ♻️
* Automatic session management 🍪
* Batch processing of events 📦
* Error handling and retries 🚨

## Usage ⚙️

Basic example of retrieving calendar events:

```python
import asyncio
from datetime import date, timedelta

from celcat_scraper import CelcatConfig, CelcatScraperAsync


async def main():
    # Configure connection details
    config = CelcatConfig(
        url="https://university.com/calendar",
        username="your_username",
        password="your_password",
        include_holidays=True,
    )

    # Create scraper instance and get events
    async with CelcatScraperAsync(config) as scraper:
        start_date = date.today()
        end_date = start_date + timedelta(days=30)

        # Recommended: store events locally to reduce the amount of requests
        file_path = "store.json"
        events = scraper.deserialize_events(file_path)

        events = await scraper.get_calendar_events(
            start_date, end_date, previous_events=events
        )

        for event in events:
            print(f"Event {event['id']}")
            print(f"Course: {event['category']} - {event['course']}")
            print(f"Time: {event['start']} to {event['end']}")
            print(f"Location: {', '.join(event['rooms'])} at {', '.join(event['sites'])} - {event['department']}")
            print(f"Professors: {', '.join(event['professors'])}")
            print("---")

        # Save events for a future refresh
        scraper.serialize_events(events, file_path)


if __name__ == "__main__":
    asyncio.run(main())
```

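The events returned by `get_calendar_events` are plain dictionaries, so they are easy to post-process. As an illustrative sketch (not part of the library's API), here is one way to group a fetched list into a per-day timetable; the field names match the example above, and the stand-in events are hypothetical:

```python
from collections import defaultdict
from datetime import datetime


def group_by_day(events):
    """Group event dicts by the calendar day of their start time."""
    days = defaultdict(list)
    for event in events:
        days[event["start"].date()].append(event)
    # Sort each day's events chronologically
    return {
        day: sorted(evts, key=lambda e: e["start"])
        for day, evts in sorted(days.items())
    }


# Minimal stand-in events, using the fields shown in the example above
events = [
    {"id": 1, "course": "Maths", "start": datetime(2024, 3, 4, 10, 0)},
    {"id": 2, "course": "English", "start": datetime(2024, 3, 4, 8, 0)},
    {"id": 3, "course": "Physics", "start": datetime(2024, 3, 5, 9, 0)},
]

for day, day_events in group_by_day(events).items():
    print(day, [e["course"] for e in day_events])
```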
## Filtering 🔍

Celcat calendar data is often messy and needs to be processed before it can be used.
For example, the same course may appear under several different names across events.
Filtering standardizes these attributes.

### Usage ⚙️

> ℹ️ **Info**: Each filter argument is optional. When `course_strip_redundant` is enabled, supplying `course_remembered_strips` is recommended.

> ⚠️ **Warning**: Disabling filters requires resetting your previously stored events and refetching to undo their effects.

```python
import asyncio
import json
from datetime import date, timedelta

from celcat_scraper import CelcatConfig, CelcatFilterConfig, CelcatScraperAsync


async def main():
    # Load remembered_strips from a file
    try:
        with open("remembered_strips.json", "r") as f:
            remembered_strips = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        remembered_strips = []

    # Create a mapping of manual course replacements
    course_replacements = {"English - S2": "English", "Mathematics": "Maths"}

    # Configure a filter
    celcat_filter = CelcatFilterConfig(
        course_title=True,
        course_strip_modules=True,
        course_strip_category=True,
        course_strip_punctuation=True,
        course_group_similar=True,
        course_strip_redundant=True,
        course_remembered_strips=remembered_strips,
        course_replacements=course_replacements,
        professors_title=True,
        rooms_title=True,
        rooms_strip_after_number=False,
        sites_title=True,
    )

    config = CelcatConfig(
        url="https://university.com/calendar",
        username="your_username",
        password="your_password",
        include_holidays=True,
        # Pass the filter as an argument
        custom_filter=celcat_filter,
    )

    async with CelcatScraperAsync(config) as scraper:
        start_date = date.today()
        end_date = start_date + timedelta(days=30)

        file_path = "store.json"
        events = scraper.deserialize_events(file_path)
        events = await scraper.get_calendar_events(
            start_date, end_date, previous_events=events
        )

        scraper.serialize_events(events, file_path)

        # Save the updated remembered_strips back to file
        with open("remembered_strips.json", "w") as f:
            json.dump(scraper.filter_config.course_remembered_strips, f)


if __name__ == "__main__":
    asyncio.run(main())
```
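To make the filter options concrete, here is a standalone sketch (not the library's actual implementation) of how the `course_replacements` and `course_title` steps conceptually behave: a manual replacement wins outright, otherwise the name is title-cased:

```python
def apply_course_filters(course, replacements, title=True):
    """Sketch of two filter steps: manual replacement, then title-casing."""
    # Manual replacements take priority over automatic normalization
    if course in replacements:
        return replacements[course]
    return course.title() if title else course


replacements = {"English - S2": "English", "Mathematics": "Maths"}
print(apply_course_filters("Mathematics", replacements))       # Maths
print(apply_course_filters("advanced physics", replacements))  # Advanced Physics
```

With this behavior, every event referring to the same course converges on a single standardized name, which is what makes the local store refreshable across fetches.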