
Cactus Compute distilled Gemini into a 26M tool-calling model. The trick: no feed-forward layers.
Needle is a 26M-parameter function-calling model distilled from Gemini 3.1 Flash-Lite. Its Simple Attention Network architecture drops the MLP (feed-forward) layers and runs prefill at 6,000 tok/s on edge silicon.
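To give a rough sense of why removing feed-forward layers shrinks a model, here is a minimal sketch that counts weights in a textbook transformer block with and without its MLP. It assumes the conventional layout (four d×d attention projections, an MLP with 4× expansion, biases and norms ignored). The function names and the width of 512 are hypothetical and chosen for illustration; none of this is taken from Needle's actual configuration.

```python
# Illustrative arithmetic only: helper names and sizes are assumptions,
# not Needle's real architecture or hyperparameters.

def attention_params(d_model: int) -> int:
    """Weights in one self-attention sublayer: Q, K, V and output projections."""
    return 4 * d_model * d_model


def mlp_params(d_model: int, expansion: int = 4) -> int:
    """Weights in one feed-forward sublayer: up-projection plus down-projection."""
    return 2 * expansion * d_model * d_model


def block_params(d_model: int, with_mlp: bool, expansion: int = 4) -> int:
    """Weights in one block, optionally including the feed-forward sublayer."""
    total = attention_params(d_model)
    if with_mlp:
        total += mlp_params(d_model, expansion)
    return total


if __name__ == "__main__":
    d = 512  # assumed width, for illustration only
    full = block_params(d, with_mlp=True)
    attn_only = block_params(d, with_mlp=False)
    print(f"standard block:       {full:,} weights")
    print(f"attention-only block: {attn_only:,} weights")
    print(f"share removed:        {1 - attn_only / full:.0%}")
```

Under these assumptions the MLP holds about two-thirds of a block's weights, so an attention-only block is roughly a third the size of a standard one at the same width.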