MiniMax open-sources a new evaluation set for Coding Agents
On January 14th, MiniMax announced the open-source release of OctoCodingBench, a new benchmark evaluation set for Coding Agents. MiniMax stated that it has used the benchmark to run extensive evaluations of existing open-source and closed-source models and reported several insightful findings: all models achieve check-level accuracy above 80%, yet instance-level success rates fall between only 10% and 30%; for most models, instruction-compliance ability gradually degrades as the number of interaction rounds increases; models generally still fall short of production requirements, with process compliance remaining a blind spot; and open-source models are rapidly catching up with closed-source models.
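The gap between high check-level accuracy and low instance-level success is less surprising than it first appears: if an instance only counts as successful when every one of its checks passes, small per-check error rates compound. The minimal sketch below illustrates this under assumed, hypothetical numbers (85% per-check accuracy, 8 independent checks per instance); these figures and the independence assumption are for illustration only and are not taken from the MiniMax report.

```python
# Illustration only: why >80% check-level accuracy can coincide with a
# 10-30% instance-level success rate, assuming an instance succeeds only
# if all of its checks pass and checks are independent.
check_accuracy = 0.85        # assumed per-check pass rate (hypothetical)
checks_per_instance = 8      # assumed number of checks per instance (hypothetical)

instance_success = check_accuracy ** checks_per_instance
print(f"Estimated instance-level success: {instance_success:.1%}")
# With 8 checks at 85% each, roughly 27% of instances pass end to end.
```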