Community Blog & Articles

Community Articles

Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand

TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval

Norm-Preserving Biprojected Abliteration

Ellora: Enhancing LLMs with LoRA - Standardized Recipes for Capability Enhancement

An Edge-First Generalized LLM LoRA Fine-Tuning Framework for Heterogeneous GPUs

Uncensor any LLM with abliteration

Code a simple RAG from scratch

Building Jobly: Semantic Job Matching with RAG and Vector Embeddings

AI Energy Score v2: Refreshed Leaderboard, now with Reasoning 🧠

DeepFabric: Generate, Train and Evaluate with Datasets curated for Model Behavior Training.

Gemini-3 Benchmarkathon

Building and evaluating Multimodal Rerankers

Engineering Notes: Training a LoRA for Z-Image Turbo with the Ostris AI Toolkit

Mastering Tensor Dimensions in Transformers

KV Caching Explained: Optimizing Transformer Inference Efficiency

Curating datasets directly on the Hub

A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons

From GRPO to DAPO and GSPO: What, Why, and How

GSMA Open-Telco LLM Benchmarks 2.0: The first dedicated LLM Evaluation for Telecoms

ViDoRe V3: a comprehensive evaluation of retrieval for enterprise use-cases

BigCodeBench: The Next Generation of HumanEval

Launching the Artificial Analysis Text to Image Leaderboard & Arena

CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models

Introducing the Open Arabic LLM Leaderboard

Introducing the Open Leaderboard for Hebrew LLMs!

Bringing the Artificial Analysis LLM Performance Leaderboard to Hugging Face

Improving Prompt Consistency with Structured Generations

Introducing the Open Chain of Thought Leaderboard

The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare

Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs

Introducing the Chatbot Guardrails Arena

Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes?

TTS Arena: Benchmarking Text-to-Speech Models in the Wild

February 27, 2024

Introducing the Red-Teaming Resistance Leaderboard

February 23, 2024