🚀 Automating Routine Tasks with Python and GitHub Actions
I'm excited to share how I recently built an automated workflow to streamline data tracking for NPTEL courses.
🎯 The Goal
Eliminate the manual effort required to monitor course metrics like "Learners Enrolled" and "Exam Registrations" and keep the data up-to-date without constant oversight.
💡 What I Built
A fully automated data collection and update system that:
1. Scrapes NPTEL course webpages to extract critical information.
2. Updates an Excel file (data.xlsx) with the latest data.
3. Runs entirely on GitHub Actions, requiring zero manual intervention after setup.
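The scraping step above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the real NPTEL page layout differs, so the HTML structure, the `extract_metric` helper, and the "label: value" format are all assumptions.

```python
from bs4 import BeautifulSoup


def extract_metric(html: str, label: str) -> int:
    """Find a labelled metric (e.g. 'Learners Enrolled') in the page
    and return its numeric value."""
    soup = BeautifulSoup(html, "html.parser")
    # Locate the text node containing the label, then parse the number
    # that follows it (the "label: value" layout is an assumption).
    node = soup.find(string=lambda s: s and label in s)
    if node is None:
        raise ValueError(f"Metric '{label}' not found")
    value = node.split(":")[-1]
    return int(value.replace(",", "").strip())


# Illustrative markup standing in for a fetched NPTEL course page:
sample = "<div><span>Learners Enrolled: 12,345</span></div>"
print(extract_metric(sample, "Learners Enrolled"))  # 12345
```

In practice the HTML would come from a `requests.get(...)` call against the course URL rather than a hard-coded string.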
🛠️ Tech Highlights
Here's a peek into the tools and technologies that made it all possible:
1. Python: For web scraping and data processing using BeautifulSoup and pandas.
2. GitHub Actions: To schedule daily runs at 6:00 AM IST and trigger workflows automatically in the cloud.
3. Excel Automation: Leveraging pandas for seamless file handling and updates.
4. Version Control: Ensured code reliability and traceability through GitHub.
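The pandas-based Excel update can be sketched like this. The column names and the `update_workbook` helper are illustrative assumptions; only the `data.xlsx` filename comes from the project.

```python
from datetime import date
from pathlib import Path

import pandas as pd


def update_workbook(path: str, metrics: dict) -> pd.DataFrame:
    """Append today's metrics as a new row in the Excel file,
    creating the file on the first run. Column names are illustrative."""
    row = {"Date": date.today().isoformat(), **metrics}
    file = Path(path)
    if file.exists():
        # Keep history: load the existing sheet and append the new row.
        df = pd.concat([pd.read_excel(file), pd.DataFrame([row])],
                       ignore_index=True)
    else:
        df = pd.DataFrame([row])
    df.to_excel(file, index=False)  # needs the openpyxl engine installed
    return df


df = update_workbook("data.xlsx", {"Learners Enrolled": 12345,
                                   "Exam Registrations": 678})
print(df.tail(1))
```

Appending rather than overwriting keeps a daily history of the metrics, which makes trends visible for free.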
⚙️ How It Works
1. Daily Schedule: The workflow is triggered every day at 6:00 AM IST to fetch the latest data.
2. Cloud-Hosted Automation: Thanks to GitHub Actions, the system runs without any local hardware requirements.
3. Manual Trigger Option: Users can run the workflow on demand for immediate updates.
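A GitHub Actions workflow covering all three points above might look like the sketch below. The script name `scrape.py`, the Python version, and the commit step are assumptions; the cron expression is the real translation of 6:00 AM IST (UTC+5:30) into UTC, and `workflow_dispatch` provides the manual trigger.

```yaml
name: Update NPTEL data
on:
  schedule:
    - cron: "30 0 * * *"   # 00:30 UTC = 6:00 AM IST
  workflow_dispatch:        # allows on-demand manual runs
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install requests beautifulsoup4 pandas openpyxl
      - run: python scrape.py   # script name is an assumption
      - run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add data.xlsx
          git commit -m "Update course data" || echo "No changes to commit"
          git push
```

Note that GitHub Actions cron schedules are always interpreted in UTC, so the IST time must be converted when writing the expression.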
📚 What I Learned
1. Advanced web scraping techniques to handle dynamic content.
2. Scheduling and managing workflows with cron in GitHub Actions.
3. Efficient data handling and file updates with Python pandas.
4. Problem-solving for cloud-based task execution and optimization.
GitHub Repo Link