UPSTREAM PR #16985: Add circular tiling support to conv2d and pad, for Vulkan, CUDA, and CPU (used for making seamless textures) #67
Conversation
Access the complete analysis in the LOCI Dashboard

Performance Analysis Summary: LLaMA.cpp Critical Functions

Critical Function Performance Changes

Primary Impact: Convolution Operations
Function:

Secondary Impact: Memory Management
Function:

KPI Impact Analysis

1. Tokens Per Second Impact
Reference Baseline: 7% reduction in tokens/second when
Direct Impact Functions:
Assessment: Minimal direct impact on tokens/second, as primary inference functions remain unchanged. Convolution degradation affects specific model architectures using depthwise convolutions.

2. Power Consumption Impact
Binary-Level Changes:
Total System Impact: <0.1% across all binaries, indicating minimal power consumption changes.

3. Quantization Efficiency
No Impact Detected: Analysis shows no changes to quantization-related functions:

4. Memory Usage Impact
Affected Functions:
Memory Management Functions: No changes detected in core memory functions:

5. Batch Processing Impact
No Direct Impact: Core batch processing functions show no performance changes:

Root Cause Analysis

Convolution Performance Degradation
Primary Causes:

Memory Allocation Bottleneck
Contributing Factors:

Action Items for Performance Optimization

Immediate Code-Level Optimizations

Build System Optimizations

Memory Management Improvements

Performance Impact Assessment
Overall System Impact: The changes introduce localized performance degradation in convolution operations without affecting core inference pipeline functions. The 64% increase in convolution response time primarily impacts models using depthwise convolutions, while standard transformer inference remains unaffected.
Critical Path Analysis: Core LLaMA.cpp inference functions (
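For anyone who wants to sanity-check the reported convolution regression locally, a generic micro-benchmark harness along the lines of the sketch below can time the kernel in isolation before and after the change. This is only a sketch; `conv_under_test` is a placeholder hook, not a real ggml/llama.cpp entry point.

```c
// Sketch of a timing harness for comparing a kernel before/after a change.
// `conv_under_test` is a placeholder workload, not a real ggml/llama.cpp function.
#include <stdio.h>
#include <time.h>

static void conv_under_test(float *dst, const float *src, int n) {
    // Stand-in for the convolution being measured.
    for (int i = 0; i < n; ++i) dst[i] = src[i] * 0.5f;
}

static double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e3 + ts.tv_nsec * 1e-6;
}

int main(void) {
    enum { N = 1 << 20, REPS = 100 };
    static float src[N], dst[N];
    for (int i = 0; i < N; ++i) src[i] = (float)i;

    conv_under_test(dst, src, N);   // warm-up run
    double t0 = now_ms();
    for (int r = 0; r < REPS; ++r) conv_under_test(dst, src, N);
    double t1 = now_ms();
    printf("avg per call: %.3f ms\n", (t1 - t0) / REPS);
    return 0;
}
```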
Mirrored from ggml-org/llama.cpp#16985
This adds extra functions that have equivalent signatures to the non-circular versions (I considered modifying the existing ones, but didn't want to break existing code). Instead of padding with zeros, they act "on a torus" and loop x and y around.
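To illustrate what "on a torus" means here, the sketch below contrasts zero padding with wrap-around indexing. It is only an illustration of the idea, not the actual ggml kernels; all names in it are made up.

```c
// Sketch only: "torus" (circular) indexing vs. zero padding.
// None of these names are part of the ggml API.
#include <stdio.h>

// Wrap an index into [0, n) so reads past one edge come from the opposite edge.
static int wrap(int i, int n) {
    int m = i % n;
    return m < 0 ? m + n : m;
}

// Zero padding: out-of-range taps contribute 0.
static float read_zero_pad(const float *img, int w, int h, int x, int y) {
    if (x < 0 || x >= w || y < 0 || y >= h) return 0.0f;
    return img[y * w + x];
}

// Circular padding: out-of-range taps wrap around, as if the image tiled a torus.
static float read_circular(const float *img, int w, int h, int x, int y) {
    return img[wrap(y, h) * w + wrap(x, w)];
}

int main(void) {
    const int w = 4, h = 3;
    const float img[12] = {
         1,  2,  3,  4,
         5,  6,  7,  8,
         9, 10, 11, 12,
    };
    // A tap just left of the image: zero pad gives 0, circular gives the right edge.
    printf("zero pad: %.0f\n", read_zero_pad(img, w, h, -1, 0));  // 0
    printf("circular: %.0f\n", read_circular(img, w, h, -1, 0));  // 4
    return 0;
}
```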
I implemented this for CUDA, CPU, and Vulkan, as those are the primary backends people use in KoboldCpp/Stable Diffusion Cpp to generate images. For other backends, it'll fall back to non-circular.
This can be used to make seamless textures; see leejet/stable-diffusion.cpp#914 for an example and the changes needed on the image-generation side. For some models (Stable Diffusion) simply calling the circular functions is sufficient; for other models (Qwen Image) you also need to modify the RoPE embeddings slightly so they loop cleanly.
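The RoPE tweak lives on the image-generation side (see the linked PR), but the rough idea is to make the rotation angles periodic in the tile size, so that position W lands back on position 0. Below is a minimal sketch of that idea with hypothetical names and a standard RoPE-style frequency schedule; it is not the actual change.

```c
// Sketch only: periodic RoPE-style angles so positions loop cleanly after `period` steps.
// Names are hypothetical; frequencies are rounded to whole cycles per period.
#include <math.h>
#include <stdio.h>

static const float TWO_PI = 6.28318530717958647692f;

// Standard RoPE-style angle for pair k: pos * theta_k, with theta_k = base^(-2k/d).
static float rope_angle(int pos, int k, int d, float base) {
    return pos * powf(base, -2.0f * k / d);
}

// Periodic variant: round each frequency to an integer number of cycles per period,
// so angle(pos + period) == angle(pos) modulo 2*pi and the embedding tiles seamlessly.
// (Very low frequencies may round to 0 cycles, which is the price of exact periodicity.)
static float rope_angle_periodic(int pos, int k, int d, float base, int period) {
    float cycles = roundf(period * powf(base, -2.0f * k / d) / TWO_PI);
    return TWO_PI * cycles * pos / period;
}

int main(void) {
    const int d = 64, period = 32;
    float a0 = rope_angle_periodic(0, 3, d, 10000.0f, period);
    float aW = rope_angle_periodic(period, 3, d, 10000.0f, period);
    // The difference is an exact multiple of 2*pi, so cos/sin pairs match at 0 and `period`.
    printf("delta / 2pi = %f\n", (aW - a0) / TWO_PI);
    (void)rope_angle; // unused here; shown only for comparison with the standard schedule
    return 0;
}
```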
I ran the CI tests and added tests for these, but I'm happy to answer any questions or modify things as needed.