CodeMMLU Leaderboard

A Multi-Task Benchmark for Assessing Code Understanding Capabilities of CodeLLMs



πŸ“ Notes

  1. All models are evaluated on the CodeMMLU benchmark.
  2. Models are ranked by accuracy, computed with greedy decoding (a minimal scoring sketch follows these notes).
  3. "Size" is the number of model parameters activated during inference.

🤗 More Leaderboards

In addition to the CodeMMLU leaderboard, we recommend assessing LLM coding ability comprehensively through a diverse set of benchmarks and leaderboards, such as:

🙏 Acknowledgements

  • We thank the EvalPlus and BigCode teams for providing the leaderboard template.