Skip to content

damodara2006/AI-Data-Projects

Repository files navigation

AI Data Projects

This repository contains a collection of AI and data science projects focusing on Generative AI learning and PySpark data processing. The repository is organized into two main submodules, each containing specific learning materials and project implementations.

📁 Repository Structure

AI-Data-Projects/
├── GenAI-Learning/          # Generative AI learning materials
└── PySpark-Jupyter-lab/     # PySpark data processing projects

🚀 Projects Overview

1. GenAI-Learning

Repository: GenAI_Apex

A comprehensive learning journey through Generative AI concepts and implementations.

Topics Covered:

  • Day 1: Fundamentals of GenAI, Large Language Models (LLM), Natural Language Processing (NLP), and AI tools
  • Day 2: Large data handling, text generation, and image-to-text generation

Contents:

  • day2 (1).ipynb - Advanced GenAI concepts and implementations
  • openai_day2.ipynb - OpenAI API integration and usage examples

2. PySpark-Jupyter-lab

Repository: PySpark

Data processing and analytics projects using Apache Spark with Python (PySpark).

Projects Include:

  • Car Sales Dashboard (Car Sales Dashboard.ipynb) - Interactive dashboard for automotive sales analysis
  • ETL Pipeline (ETL_for_SQL_sales_data.ipynb) - Extract, Transform, Load operations for sales data
  • Spark SQL Connection (Pysparksql_Connection.ipynb) - Database connectivity and SQL operations with PySpark
  • Retail Sales Analysis (Retail Sales Data Analysis.html) - Comprehensive retail data analysis report

🛠️ Technologies Used

  • Python - Primary programming language
  • Jupyter Notebooks - Interactive development environment
  • PySpark - Large-scale data processing
  • Apache Spark - Distributed computing framework
  • OpenAI API - Generative AI capabilities
  • SQL - Data querying and manipulation

📋 Prerequisites

For GenAI-Learning:

  • Python 3.8+
  • Jupyter Notebook
  • OpenAI API key
  • Required Python packages:
    pip install openai jupyter pandas numpy matplotlib

For PySpark-Jupyter-lab:

  • Python 3.8+
  • Apache Spark
  • Jupyter Notebook
  • Required Python packages:
    pip install pyspark jupyter pandas numpy matplotlib seaborn

🚀 Getting Started

  1. Clone the repository with submodules:

    git clone --recursive https://github.com/your-username/AI-Data-Projects.git
    cd AI-Data-Projects
  2. If you've already cloned without submodules:

    git submodule update --init --recursive
  3. Navigate to specific projects:

    # For GenAI learning
    cd GenAI-Learning
    jupyter notebook
    
    # For PySpark projects
    cd PySpark-Jupyter-lab
    jupyter notebook

📊 Project Highlights

GenAI Learning Journey

  • Implementation of text generation models
  • Image-to-text conversion techniques
  • Large dataset handling strategies
  • OpenAI API integration examples

PySpark Data Analytics

  • Real-time data processing pipelines
  • Interactive sales dashboards
  • ETL workflow implementations
  • SQL integration with Spark

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📝 License

This project is open source and available under the MIT License.

🔗 Submodule Repositories

📞 Contact

For questions or collaboration opportunities, please feel free to reach out through the repository issues or discussions.


Happy Learning and Coding! 🚀

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published