🏆 CoIR: Code Information Retrieval Benchmark 🏆

This leaderboard evaluates various models on different Code Retrieval tasks.

Performance of Various Models
| Rank | Model | Model Size (Million Parameters) | Apps | CosQA | Synthetic Text2SQL | CodeSearchNet | CodeSearchNet-CCR | CodeTrans-Contest | CodeTrans-DL | StackOverFlow QA | CodeFeedBack-ST | CodeFeedBack-MT | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Voyage-Code-002 | - | 26.52 | 29.79 | 69.26 | 81.79 | 73.45 | 72.77 | 27.28 | 87.68 | 65.35 | 28.74 | 56.26 |
| 2 | E5-Mistral | 7000 | 21.33 | 31.27 | 65.98 | 54.25 | 65.27 | 82.55 | 33.24 | 91.54 | 72.71 | 33.65 | 55.18 |
| 3 | E5-Base-v2 | 110 | 11.52 | 32.59 | 52.31 | 67.99 | 56.87 | 62.50 | 21.87 | 86.86 | 74.52 | 41.99 | 50.90 |
| 4 | OpenAI-Ada-002 | - | 8.70 | 28.88 | 58.32 | 74.21 | 69.13 | 53.34 | 26.04 | 72.40 | 47.12 | 17.74 | 45.59 |
| 5 | BGE-Base-en-v1.5 | 110 | 4.05 | 32.76 | 45.59 | 69.60 | 45.56 | 38.50 | 21.71 | 73.55 | 64.99 | 31.42 | 42.77 |
| 6 | BGE-M3 | 567 | 7.37 | 22.73 | 48.76 | 43.23 | 47.55 | 47.86 | 31.16 | 61.04 | 49.94 | 33.46 | 39.31 |
| 7 | UniXcoder | 123 | 1.36 | 25.14 | 50.45 | 60.20 | 58.36 | 41.82 | 31.03 | 44.67 | 36.02 | 24.21 | 37.33 |
| 8 | GTE-Base-en-v1.5 | 110 | 3.24 | 30.24 | 46.19 | 43.35 | 35.50 | 33.81 | 28.80 | 62.71 | 55.19 | 28.48 | 36.75 |
| 9 | Contriever | 110 | 5.14 | 14.21 | 45.46 | 34.72 | 35.74 | 44.16 | 24.21 | 66.05 | 55.11 | 39.23 | 36.40 |

📝 Notes

  1. Most models use mean pooling, which may lead to some performance differences compared with CLS pooling. If you obtain better results, you are welcome to submit them so the table can be updated.
  2. Similar to BEIR and MTEB, you should submit the evaluation results of a single model across all datasets at once. Incremental submissions that optimize results for individual datasets are not accepted.
  3. Click on the model name to see detailed information about the model.
  4. Models are ranked by their average score across all ten datasets (the Avg column); a minimal sketch of this computation is shown after these notes.
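
As an illustration of note 4, the sketch below shows how the Avg column and the ranking can be reproduced from per-dataset scores. The score values are copied from the first two rows of the table above; the variable names and the rounding to two decimals are assumptions for illustration, not the leaderboard's actual scoring code.

```python
# Minimal sketch: reproduce the Avg column and the ranking from per-dataset scores.
# Scores are copied from the first two rows of the table above; helper names and
# rounding behaviour are assumptions, not the leaderboard's own implementation.

from statistics import mean

scores = {
    "Voyage-Code-002": [26.52, 29.79, 69.26, 81.79, 73.45,
                        72.77, 27.28, 87.68, 65.35, 28.74],
    "E5-Mistral":      [21.33, 31.27, 65.98, 54.25, 65.27,
                        82.55, 33.24, 91.54, 72.71, 33.65],
}

# Average over all ten datasets, rounded to two decimals as in the table.
averages = {model: round(mean(vals), 2) for model, vals in scores.items()}

# Rank models by descending average score.
ranking = sorted(averages.items(), key=lambda kv: kv[1], reverse=True)

for rank, (model, avg) in enumerate(ranking, start=1):
    print(f"{rank}. {model}: {avg}")
# 1. Voyage-Code-002: 56.26
# 2. E5-Mistral: 55.18
```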

🤗 More Information

For more detailed evaluations and benchmarks, visit the following resources: