🏆 CoIR: Code Information Retrieval Benchmark 🏆

This leaderboard evaluates various models on different Code Retrieval tasks.

Performance of Various Models
| Rank | Model | Model Size (Million Parameters) | Apps | CosQA | Synthetic Text2SQL | CodeSearchNet | CodeSearchNet-CCR | CodeTrans-Contest | CodeTrans-DL | StackOverFlow QA | CodeFeedBack-ST | CodeFeedBack-MT | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Voyage-Code-002 | - | 26.52 | 29.79 | 69.26 | 81.79 | 73.45 | 72.77 | 27.28 | 87.68 | 65.35 | 28.74 | 56.26 |
| 2 | E5-Mistral | 7000 | 21.33 | 31.27 | 65.98 | 54.25 | 65.27 | 82.55 | 33.24 | 91.54 | 72.71 | 33.65 | 55.18 |
| 3 | E5-Base-v2 | 110 | 11.52 | 32.59 | 52.31 | 67.99 | 56.87 | 62.50 | 21.87 | 86.86 | 74.52 | 41.99 | 50.90 |
| 4 | OpenAI-Ada-002 | - | 8.70 | 28.88 | 58.32 | 74.21 | 69.13 | 53.34 | 26.04 | 72.40 | 47.12 | 17.74 | 45.59 |
| 5 | BGE-Base-en-v1.5 | 110 | 4.05 | 32.76 | 45.59 | 69.60 | 45.56 | 38.50 | 21.71 | 73.55 | 64.99 | 31.42 | 42.77 |
| 6 | BGE-M3 | 567 | 7.37 | 22.73 | 48.76 | 43.23 | 47.55 | 47.86 | 31.16 | 61.04 | 49.94 | 33.46 | 39.31 |
| 7 | UniXcoder | 123 | 1.36 | 25.14 | 50.45 | 60.20 | 58.36 | 41.82 | 31.03 | 44.67 | 36.02 | 24.21 | 37.33 |
| 8 | GTE-Base-en-v1.5 | 110 | 3.24 | 30.24 | 46.19 | 43.35 | 35.50 | 33.81 | 28.80 | 62.71 | 55.19 | 28.48 | 36.75 |
| 9 | Contriever | 110 | 5.14 | 14.21 | 45.46 | 34.72 | 35.74 | 44.16 | 24.21 | 66.05 | 55.11 | 39.23 | 36.40 |

📝 Notes

  1. Most models use mean pooling, which may lead to some performance differences compared with CLS pooling. If you obtain better results, you are welcome to submit them so the table can be updated.
  2. Similar to BEIR and MTEB, you should submit the evaluation results of a single model across all datasets at once. Incremental submissions that optimize results for individual datasets are not accepted.
  3. Click on the model name to see detailed information about the model.
  4. Models are ranked by their average score across all ten datasets (the Avg column); a minimal sketch of this computation is shown after these notes.
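
As an illustration of note 4, the sketch below shows how the Avg column and the ranking can be reproduced from per-dataset scores. The score values are copied from the first two rows of the table above; the variable names and the rounding to two decimals are assumptions for illustration, not the leaderboard's actual scoring code.

```python
# Minimal sketch: reproduce the Avg column and the ranking from per-dataset scores.
# Scores are copied from the first two rows of the table above; helper names and
# rounding behaviour are assumptions, not the leaderboard's own implementation.

from statistics import mean

scores = {
    "Voyage-Code-002": [26.52, 29.79, 69.26, 81.79, 73.45,
                        72.77, 27.28, 87.68, 65.35, 28.74],
    "E5-Mistral":      [21.33, 31.27, 65.98, 54.25, 65.27,
                        82.55, 33.24, 91.54, 72.71, 33.65],
}

# Average over all ten datasets, rounded to two decimals as in the table.
averages = {model: round(mean(vals), 2) for model, vals in scores.items()}

# Rank models by descending average score.
ranking = sorted(averages.items(), key=lambda kv: kv[1], reverse=True)

for rank, (model, avg) in enumerate(ranking, start=1):
    print(f"{rank}. {model}: {avg}")
# 1. Voyage-Code-002: 56.26
# 2. E5-Mistral: 55.18
```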

🤗 More Information

For more detailed evaluations and benchmarks, visit the following resources: