during my own testing and people's feedbacks it seems that some kernels on M1 has precision issues in RoPE. Likely due to sin/cos. https://github.com/skyzh/tiny-llm/issues/27