This repository contains a collection of AI and data science projects focusing on Generative AI learning and PySpark data processing. The repository is organized into two main submodules, each containing specific learning materials and project implementations.
AI-Data-Projects/
├── GenAI-Learning/ # Generative AI learning materials
└── PySpark-Jupyter-lab/ # PySpark data processing projects
Repository: GenAI_Apex
A comprehensive learning journey through Generative AI concepts and implementations.
Topics Covered:
- Day 1: Fundamentals of GenAI, Large Language Models (LLM), Natural Language Processing (NLP), and AI tools
- Day 2: Large data handling, text generation, and image-to-text generation
Contents:
- day2 (1).ipynb - Advanced GenAI concepts and implementations
- openai_day2.ipynb - OpenAI API integration and usage examples
Repository: PySpark
Data processing and analytics projects using Apache Spark with Python (PySpark).
Projects Include:
- Car Sales Dashboard (Car Sales Dashboard.ipynb) - Interactive dashboard for automotive sales analysis
- ETL Pipeline (ETL_for_SQL_sales_data.ipynb) - Extract, Transform, Load operations for sales data
- Spark SQL Connection (Pysparksql_Connection.ipynb) - Database connectivity and SQL operations with PySpark
- Retail Sales Analysis (Retail Sales Data Analysis.html) - Comprehensive retail data analysis report
Technologies Used:
- Python - Primary programming language
- Jupyter Notebooks - Interactive development environment
- PySpark - Large-scale data processing
- Apache Spark - Distributed computing framework
- OpenAI API - Generative AI capabilities
- SQL - Data querying and manipulation
Prerequisites for GenAI-Learning:
- Python 3.8+
- Jupyter Notebook
- OpenAI API key
- Required Python packages:
    pip install openai jupyter pandas numpy matplotlib
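The GenAI notebooks need the OpenAI API key at run time. Below is a minimal sketch of one way to supply it, assuming the key is exported in the `OPENAI_API_KEY` environment variable (the variable the `openai` package reads by default). The helper name `load_openai_api_key` is hypothetical and not part of this repository.

```python
import os


def load_openai_api_key(env=None):
    """Return the OpenAI API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before starting Jupyter."
        )
    return key
```

Calling this in the first notebook cell surfaces a missing key immediately, and reading the key from the environment keeps it out of the committed notebooks.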
Prerequisites for PySpark-Jupyter-lab:
- Python 3.8+
- Apache Spark
- Jupyter Notebook
- Required Python packages:
    pip install pyspark jupyter pandas numpy matplotlib seaborn
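To confirm that the packages from the pip command above are importable before opening a notebook, a small check like the following may help. It is a sketch only: the function name `missing_packages` is hypothetical, and `jupyter` is left out of the list because it is normally used as a command-line tool rather than imported.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Top-level package names taken from the PySpark pip command above.
    required = ["pyspark", "pandas", "numpy", "matplotlib", "seaborn"]
    missing = missing_packages(required)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

`importlib.util.find_spec` only locates a package without importing it, so the check stays fast even for heavy dependencies such as `pyspark`.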
- Clone the repository with submodules:
    git clone --recursive https://github.com/your-username/AI-Data-Projects.git
    cd AI-Data-Projects
- If you've already cloned without submodules:
    git submodule update --init --recursive
- Navigate to specific projects (each command sequence starts from the repository root):
    # For GenAI learning
    cd GenAI-Learning
    jupyter notebook

    # For PySpark projects
    cd PySpark-Jupyter-lab
    jupyter notebook
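A clone made without `--recursive` leaves each submodule as an empty directory. The sketch below shows one way to detect that, using the two directory names listed in this README. The function name `uninitialized_submodules` is hypothetical, and "missing or empty" is a heuristic, not a check of Git's own submodule state.

```python
from pathlib import Path

# Submodule directory names as listed in this README.
SUBMODULES = ("GenAI-Learning", "PySpark-Jupyter-lab")


def uninitialized_submodules(repo_root):
    """Return submodule directories that are missing or empty under `repo_root`.

    A submodule that was never initialized is left as an empty directory
    by a plain `git clone`, so "missing or empty" is used as the heuristic.
    """
    root = Path(repo_root)
    pending = []
    for name in SUBMODULES:
        path = root / name
        if not path.is_dir() or not any(path.iterdir()):
            pending.append(name)
    return pending
```

If the returned list is non-empty, running `git submodule update --init --recursive` from the repository root should populate the listed directories.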
GenAI-Learning:
- Implementation of text generation models
- Image-to-text conversion techniques
- Large dataset handling strategies
- OpenAI API integration examples
PySpark-Jupyter-lab:
- Real-time data processing pipelines
- Interactive sales dashboards
- ETL workflow implementations
- SQL integration with Spark
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is open source and available under the MIT License.
- GenAI-Learning: GenAI_Apex Repository
- PySpark-Jupyter-lab: PySpark Repository
For questions or collaboration opportunities, please feel free to reach out through the repository issues or discussions.
Happy Learning and Coding! 🚀