
Claude Opus 4.7 is here, and the long-context benchmarks got worse
Anthropic's Opus 4.7 is state-of-the-art on SWE-bench and CursorBench, but independent tests show regressions on long-context retrieval and thematic reasoning.
2 articles mention Claude Mythos Preview on devtake.dev.

OpenAI's new cybersecurity-tuned model can reverse-engineer binaries and analyze malware. It's restricted to verified defenders through the Trusted Access program.