PyTorch MPS on Apple Silicon Just Hit 8.6 TFLOPS: Your AI Training Is Getting a Massive Boost
As an AI systems engineer who’s built and trained models on everything from cloud infrastructure to Apple Silicon, I’ve seen the evolution of on-device AI performance firsthand. Today, I’m thrilled to share that PyTorch’s Metal Performance Shaders (MPS) backend on modern Apple Silicon (M1/M2/M3) has achieved 8.6 TFLOPS
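
If you want to sanity-check a throughput figure like this on your own machine, here is a minimal sketch of one way to do it. It assumes the number comes from dense FP32 matrix multiplication, which this post does not actually specify, and it counts a square matmul as 2·n³ floating-point operations. The helper names `matmul_tflops` and `benchmark_matmul`, the matrix size, and the iteration counts are all illustrative choices of mine, not part of PyTorch.

```python
import time


def matmul_tflops(n, seconds, iters=1):
    """Convert a timing into TFLOPS, counting an n x n matmul as 2 * n**3 FLOPs."""
    return 2 * n**3 * iters / seconds / 1e12


def benchmark_matmul(n=4096, iters=20, warmup=3):
    """Time repeated FP32 matmuls on MPS if available, otherwise on CPU.

    Hypothetical helper: sizes and iteration counts are assumptions.
    """
    import torch  # imported lazily so the arithmetic helper works without PyTorch

    device = "mps" if torch.backends.mps.is_available() else "cpu"
    a = torch.randn(n, n, device=device)  # float32 by default
    b = torch.randn(n, n, device=device)

    def sync():
        # MPS kernels run asynchronously, so wait for them before reading the clock.
        if device == "mps":
            torch.mps.synchronize()

    for _ in range(warmup):  # warm-up runs are excluded from the timing
        a @ b
    sync()

    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    sync()
    elapsed = time.perf_counter() - start

    return device, matmul_tflops(n, elapsed, iters)


# Example call (requires PyTorch):
# device, tflops = benchmark_matmul()
# print(f"{device}: {tflops:.2f} TFLOPS")
```

As a rough guide under the same assumption, a single 4096×4096 matmul is about 137 billion floating-point operations, so sustaining 8.6 TFLOPS would mean finishing each one in roughly 16 ms. Results will vary with chip, matrix size, dtype, and thermal state.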