🏆 CoIR: Code Information Retrieval Benchmark 🏆
This leaderboard evaluates retrieval models across the ten code retrieval tasks listed below; a minimal dense-retrieval sketch follows the table.
| Rank | Model | Model Size (Million Parameters) | Apps | CosQA | Synthetic Text2sql | CodeSearchNet | CodeSearchNet-CCR | CodeTrans-Contest | CodeTrans-DL | StackOverFlow QA | CodeFeedBack-ST | CodeFeedBack-MT | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Voyage-Code-002 | - | 26.52 | 29.79 | 69.26 | 81.79 | 73.45 | 72.77 | 27.28 | 87.68 | 65.35 | 28.74 | 56.26 |
| 2 | E5-Mistral | 7000 | 21.33 | 31.27 | 65.98 | 54.25 | 65.27 | 82.55 | 33.24 | 91.54 | 72.71 | 33.65 | 55.18 |
| 3 | E5-Base-v2 | 110 | 11.52 | 32.59 | 52.31 | 67.99 | 56.87 | 62.50 | 21.87 | 86.86 | 74.52 | 41.99 | 50.90 |
| 4 | OpenAI-Ada-002 | - | 8.70 | 28.88 | 58.32 | 74.21 | 69.13 | 53.34 | 26.04 | 72.40 | 47.12 | 17.74 | 45.59 |
| 5 | BGE-Base-en-v1.5 | 110 | 4.05 | 32.76 | 45.59 | 69.60 | 45.56 | 38.50 | 21.71 | 73.55 | 64.99 | 31.42 | 42.77 |
| 6 | BGE-M3 | 567 | 7.37 | 22.73 | 48.76 | 43.23 | 47.55 | 47.86 | 31.16 | 61.04 | 49.94 | 33.46 | 39.31 |
| 7 | UniXcoder | 123 | 1.36 | 25.14 | 50.45 | 60.20 | 58.36 | 41.82 | 31.03 | 44.67 | 36.02 | 24.21 | 37.33 |
| 8 | GTE-Base-en-v1.5 | 110 | 3.24 | 30.24 | 46.19 | 43.35 | 35.50 | 33.81 | 28.80 | 62.71 | 55.19 | 28.48 | 36.75 |
| 9 | Contriever | 110 | 5.14 | 14.21 | 45.46 | 34.72 | 35.74 | 44.16 | 24.21 | 66.05 | 55.11 | 39.23 | 36.40 |
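For orientation, here is a minimal sketch of the core dense-retrieval step these tasks measure: embed a natural-language query and a pool of code snippets, then rank the snippets by cosine similarity. It uses `sentence-transformers` with the `intfloat/e5-base-v2` checkpoint (our assumption for the E5-Base-v2 row above, including E5's `query:`/`passage:` prefixes); it is not the benchmark's official evaluation harness.

```python
from sentence_transformers import SentenceTransformer, util

# Assumed public checkpoint for the E5-Base-v2 row; swap in any embedding model.
model = SentenceTransformer("intfloat/e5-base-v2")

# E5 models expect these instruction prefixes on queries and passages.
query = "query: how do I reverse a list in Python?"
corpus = [
    "passage: def reverse_list(xs):\n    return xs[::-1]",
    "passage: def sum_list(xs):\n    return sum(xs)",
    "passage: SELECT name FROM users ORDER BY created_at DESC;",
]

# Encode and L2-normalize so cosine similarity reduces to a dot product.
q_emb = model.encode(query, normalize_embeddings=True)
c_emb = model.encode(corpus, normalize_embeddings=True)

# Rank candidate snippets by similarity to the query (highest first).
scores = util.cos_sim(q_emb, c_emb)[0]
for idx in scores.argsort(descending=True):
    i = int(idx)
    print(f"{float(scores[i]):.3f}  {corpus[i][:40]!r}")
```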
📝 Notes
- Most results are obtained with mean pooling, which may show some performance differences compared with CLS pooling (see the pooling sketch after these notes). If you obtain better results, you can submit them to update the table.
- As with BEIR and MTEB, submit a single model's evaluation results on all datasets at once; incremental submissions that optimize results for individual datasets are not accepted.
- Click on the model name to see detailed information about the model.
- Models are ranked by their average score (Avg) across all tasks.
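To make the pooling note concrete, below is a minimal sketch of mean pooling versus CLS pooling over an encoder's last hidden states, using Hugging Face `transformers`. The checkpoint name is only an example, and this is not the leaderboard's submission code.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Example checkpoint only; any BERT-style encoder from the table works the same way.
name = "intfloat/e5-base-v2"
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)

def embed(texts, pooling="mean"):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state      # (batch, seq_len, dim)
    if pooling == "cls":
        return hidden[:, 0]                              # first ([CLS]) token only
    mask = batch["attention_mask"].unsqueeze(-1).float() # ignore padding tokens
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean over real tokens

mean_emb = embed(["def add(a, b): return a + b"], pooling="mean")
cls_emb = embed(["def add(a, b): return a + b"], pooling="cls")
print(mean_emb.shape, cls_emb.shape)  # e.g. torch.Size([1, 768]) for a base-size encoder
```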
🤗 More Information
For more detailed evaluations and benchmarks, visit the following resources: