faisalq/SaudiBERT

@@ -1,10 +1,5 @@
 ---
 license: cc-by-nc-4.0
-language:
-- ar
----
----
 language:
   - ar
 tags:
@@ -17,6 +12,10 @@ widget:
 ---
 **SaudiBERT** is the first pre-trained large language model focused exclusively on Saudi dialect text. The model was pre-trained on two large-scale corpora: the Saudi Tweets Mega Corpus (STMC), which contains more than 141 million tweets, and the Saudi Forum Corpus, which includes more than 70 million sentences collected from various Saudi online forums. Together, the two corpora comprise **26.3 GB of text**. The code and results are available in the [SaudiBERT GitHub repository](https://github.com/FaisalQarah/SaudiBERT).

 ---
 license: cc-by-nc-4.0
 language:
   - ar
 tags:
 ---
+---
+---
 **SaudiBERT** is the first pre-trained large language model focused exclusively on Saudi dialect text. The model was pre-trained on two large-scale corpora: the Saudi Tweets Mega Corpus (STMC), which contains more than 141 million tweets, and the Saudi Forum Corpus, which includes more than 70 million sentences collected from various Saudi online forums. Together, the two corpora comprise **26.3 GB of text**. The code and results are available in the [SaudiBERT GitHub repository](https://github.com/FaisalQarah/SaudiBERT).