Community Blog & Articles

Community Articles
leaderboard, research, collaboration

BigCodeBench: The Next Generation of HumanEval

  • +5
52
June 18, 2024
leaderboard, research, collaboration

Launching the Artificial Analysis Text to Image Leaderboard & Arena

16
June 6, 2024
nlp, research, leaderboard

CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models

  • +12
23
May 24, 2024
nlp, research, leaderboard

Introducing the Open Arabic LLM Leaderboard

101
May 14, 2024
nlp, research, leaderboard

Introducing the Open Leaderboard for Hebrew LLMs!

53
May 5, 2024
leaderboard, research, collaboration

Bringing the Artificial Analysis LLM Performance Leaderboard to Hugging Face

14
May 3, 2024
evaluation, collaboration, research

Improving Prompt Consistency with Structured Generations

66
April 30, 2024
leaderboard, research, collaboration

Introducing the Open Chain of Thought Leaderboard

37
April 23, 2024
leaderboard, collaboration, research

The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare

188
April 19, 2024
leaderboard, research, collaboration

Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs

  • +3
16
April 16, 2024
leaderboard, arena, collaboration

Introducing the Chatbot Guardrails Arena

6
March 21, 2024
leaderboard, collaboration, research

Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes?

  • +1
4
March 5, 2024
leaderboard, arena, collaboration

TTS Arena: Benchmarking Text-to-Speech Models in the Wild

  • +3
72
February 27, 2024
leaderboard, guide, collaboration

Introducing the Red-Teaming Resistance Leaderboard

13
February 23, 2024