🏆 CoIR: Code Information Retrieval Benchmark 🏆
This leaderboard evaluates retrieval models across ten code retrieval tasks.
| Rank | Model | Model Size (Million Parameters) | Apps | CosQA | Synthetic Text2sql | CodeSearchNet | CodeSearchNet-CCR | CodeTrans-Contest | CodeTrans-DL | StackOverFlow QA | CodeFeedBack-ST | CodeFeedBack-MT | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Salesforce/SFR-Embedding-Code-2B_R | 2000 | 74.99 | 36.31 | 59.00 | 73.50 | 85.77 | 86.63 | 33.17 | 90.54 | 81.15 | 53.08 | 67.41 |
| 2 | CodeSage-large-v2 | 1300 | 50.45 | 32.73 | 59.78 | 94.26 | 78.09 | 85.27 | 33.29 | 79.41 | 71.32 | 57.16 | 64.18 |
| 3 | Salesforce/SFR-Embedding-Code-400M_R | 400 | 48.57 | 34.05 | 58.96 | 72.53 | 80.15 | 75.67 | 34.85 | 89.51 | 78.87 | 45.75 | 61.89 |
| 4 | CodeSage-large | 1300 | 34.16 | 28.59 | 57.90 | 90.58 | 84.36 | 80.10 | 33.45 | 79.46 | 66.06 | 55.72 | 61.04 |
| 5 | Voyage-Code-002 | - | 26.52 | 29.79 | 69.26 | 81.79 | 73.45 | 72.77 | 27.28 | 87.68 | 65.35 | 28.74 | 56.26 |
| 6 | E5-Mistral | 7000 | 21.33 | 31.27 | 65.98 | 54.25 | 65.27 | 82.55 | 33.24 | 91.54 | 72.71 | 33.65 | 55.18 |
| 7 | E5-Base-v2 | 110 | 11.52 | 32.59 | 52.31 | 67.99 | 56.87 | 62.50 | 21.87 | 86.86 | 74.52 | 41.99 | 50.90 |
| 8 | OpenAI-Ada-002 | - | 8.70 | 28.88 | 58.32 | 74.21 | 69.13 | 53.34 | 26.04 | 72.40 | 47.12 | 17.74 | 45.59 |
| 9 | BGE-Base-en-v1.5 | 110 | 4.05 | 32.76 | 45.59 | 69.60 | 45.56 | 38.50 | 21.71 | 73.55 | 64.99 | 31.42 | 42.77 |
| 10 | BGE-M3 | 567 | 7.37 | 22.73 | 48.76 | 43.23 | 47.55 | 47.86 | 31.16 | 61.04 | 49.94 | 33.46 | 39.31 |
| 11 | UniXcoder | 123 | 1.36 | 25.14 | 50.45 | 60.20 | 58.36 | 41.82 | 31.03 | 44.67 | 36.02 | 24.21 | 37.33 |
| 12 | GTE-Base-en-v1.5 | 110 | 3.24 | 30.24 | 46.19 | 43.35 | 35.50 | 33.81 | 28.80 | 62.71 | 55.19 | 28.48 | 36.75 |
| 13 | Contriever | 110 | 5.14 | 14.21 | 45.46 | 34.72 | 35.74 | 44.16 | 24.21 | 66.05 | 55.11 | 39.23 | 36.40 |
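The Avg column is the unweighted mean of the ten per-task scores, and models are ranked by this value. A minimal check reproducing the rank-1 row, with the values copied from the table above:

```python
# Reproduce the Avg column for the rank-1 row (scores copied from the table above).
scores = [74.99, 36.31, 59.00, 73.50, 85.77, 86.63, 33.17, 90.54, 81.15, 53.08]
avg = sum(scores) / len(scores)
print(round(avg, 2))  # 67.41
```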
📝 Notes
- Most results were produced with mean pooling, which can perform differently from CLS pooling; a minimal mean-pooling sketch is shown after this list. If you obtain better results, you can submit them to update the table.
- As with BEIR and MTEB, submit a single model's evaluation results across all datasets at once. Incremental submissions that optimize results for individual datasets are not accepted.
- Click on the model name to see detailed information about the model.
- Models are ranked by their average score across the ten tasks (the Avg column); see the worked example beneath the table.
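For context, mean pooling averages the token embeddings of a text (ignoring padding) to obtain a single vector, whereas CLS pooling takes only the first token's embedding. A minimal sketch using the Hugging Face transformers library, with E5-Base-v2 from the table as an example checkpoint (the `query:` / `passage:` prefixes follow E5's convention; other models may need different preprocessing):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Example checkpoint only; any encoder from the table that exposes a
# Hugging Face model would work the same way.
name = "intfloat/e5-base-v2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state         # [batch, seq_len, dim]
    mask = batch["attention_mask"].unsqueeze(-1).float()  # [batch, seq_len, 1]
    # Mean pooling: average token embeddings while ignoring padding positions.
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

query = embed(["query: how to reverse a list in python"])
code = embed(["passage: def reverse(xs):\n    return xs[::-1]"])
print(torch.nn.functional.cosine_similarity(query, code))
```

CLS pooling would instead take `hidden[:, 0]` as the text vector, which is why results for the same checkpoint can differ depending on the pooling strategy used.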
🤗 More Information
For more detailed evaluations and benchmarks, visit the following resources: