Omartificial-Intelligence-Space's Collections

Huggingface FineWeb2 Arabic Dataset Portions

updated Nov 28, 2025

A collection of Arabic text datasets sourced from the FineWeb2 project, covering Modern Standard Arabic (MSA) and several Arabic dialects.

  • HuggingFaceFW/fineweb-2

    Viewer • Updated Oct 27, 2025 • 4.48B • 62.9k • 710

    Note: This is the original FineWeb2 repo, which covers 1000s of languages. Find the Arabic portions below.


  • Omartificial-Intelligence-Space/FineWeb2-MSA

    Viewer • Updated Dec 15, 2024 • 907M • 1.21k • 1

  • Omartificial-Intelligence-Space/FineWeb2-Egyptian-Arabic

    Viewer • Updated Dec 12, 2024 • 23.9M • 99 • 2

  • Omartificial-Intelligence-Space/FineWeb2-Moroccan-Arabic

    Viewer • Updated Dec 12, 2024 • 69.6M • 415 • 2

  • Omartificial-Intelligence-Space/FineWeb2-North-Levantine-Arabic

    Viewer • Updated Dec 12, 2024 • 223k • 65 • 2

  • Omartificial-Intelligence-Space/FineWeb2-Najdi-Arabic

    Viewer • Updated Dec 12, 2024 • 48.4M • 128 • 3
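
The portions listed above can in principle be read with the Hugging Face `datasets` library. The following is a minimal sketch, not an official loader: the repo IDs are copied from this listing, while the short keys, the `stream_portion` helper, and the existence of a `train` split are assumptions that should be checked against each dataset card.

```python
# Repo IDs copied from the collection listing; the short keys are hypothetical labels.
ARABIC_PORTIONS = {
    "msa": "Omartificial-Intelligence-Space/FineWeb2-MSA",
    "egyptian": "Omartificial-Intelligence-Space/FineWeb2-Egyptian-Arabic",
    "moroccan": "Omartificial-Intelligence-Space/FineWeb2-Moroccan-Arabic",
    "north_levantine": "Omartificial-Intelligence-Space/FineWeb2-North-Levantine-Arabic",
    "najdi": "Omartificial-Intelligence-Space/FineWeb2-Najdi-Arabic",
}


def stream_portion(name: str, split: str = "train"):
    """Return a streaming iterable over one portion.

    Assumes the repo exposes a split called `train`; check the dataset card.
    Streaming avoids downloading a whole portion before reading the first record.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(ARABIC_PORTIONS[name], split=split, streaming=True)
```

With `datasets` installed and network access, `next(iter(stream_portion("north_levantine")))` would fetch a single record from the smallest portion, which is a cheap way to inspect the column names before committing to a larger one.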