---
pipeline_tag: text-generation
inference: false
license: apache-2.0
library_name: transformers
tags:
- language
- aquif
- gpt2
- text-generation-inference
- math
- coding
- small
language:
- en
datasets:
- tatsu-lab/alpaca
- databricks/databricks-dolly-15k
- OpenAssistant/oasst1
---

# aquif-neo-2-345m-c1

This is the first checkpoint of the `aquif-neo-2-345m` model, a next-generation language model developed by aquif AI. The checkpoint is fine-tuned on a diverse dataset spanning conversational, code, and math data, and serves as the initial step in a 5-checkpoint training process designed to produce a versatile and capable model.

## Model Details

**Base Model**: gpt2-medium\
**Method**: LoRA (Low-Rank Adaptation)\
**Parameter Count**: 355 million

## Training Information

This checkpoint was trained as the first stage of a multi-checkpoint process. Training was performed with a network-resilient script that includes fallback mechanisms for data loading and model initialization.

**Checkpoint Number**: 1/5\
**Hardware**: Google Colab T4 GPU\
**Training Duration**: approximately 2.5 hours for this checkpoint\
**Training Framework**: PyTorch, Hugging Face Transformers, PEFT, bitsandbytes, TRL\
**Quantization**: 8-bit

## LoRA Configuration

r=8\
lora_alpha=16\
target_modules: ["q_attn", "c_attn", "c_proj", "c_fc", "attn.c_attn", "attn.c_proj", "mlp.c_fc", "mlp.c_proj"]\
lora_dropout=0.05\
bias="none"\
task_type="CAUSAL_LM"

## Training Arguments

per_device_train_batch_size=2\
gradient_accumulation_steps=16\
num_train_epochs=1 (for this checkpoint)\
learning_rate=1e-5\
max_steps=400

*Optimized for 8-bit training.* A sketch of how this configuration could be assembled with PEFT and TRL is included at the end of this card.

## Training Loss Data

The following table shows the training loss recorded during the training of this checkpoint:

| Step | Training Loss |
|------|---------------|
| 20   | 3.4444 |
| 40   | 3.4754 |
| 60   | 3.4954 |
| 80   | 3.4213 |
| 100  | 3.3338 |
| 120  | 3.1749 |
| 140  | 3.2208 |
| 160  | 3.0503 |
| 180  | 2.9293 |
| 200  | 2.8377 |
| 220  | 2.8094 |
| 240  | 2.7225 |
| 260  | 2.6260 |
| 280  | 2.7452 |
| 300  | 2.6614 |
| 320  | 2.5056 |
| 340  | 2.5391 |
| 360  | 2.5115 |
| 380  | 2.4892 |
| 400  | 2.5117 |

*Note: Training loss measures how well the model fits the training data; the generally decreasing trend above suggests the model is learning.*

## Intended Use

This checkpoint is an intermediate model in the development of the full `aquif-neo-2`. It is not intended for production use; it serves as the foundation for subsequent fine-tuning checkpoints that focus on specific domains and tasks.

## How to Load the Model

You can load this model with the Hugging Face `transformers` library:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "aquiffoo/aquif-neo-2-345m-c1"

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```

A short text-generation example is included at the end of this card.

## Future Checkpoints

This is the first of 5 planned checkpoints. Future checkpoints will continue to fine-tune the model on additional data to improve its capabilities across various domains.

**License**: Apache 2.0
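
## Example Generation

The snippet below extends the loading example with a minimal text-generation call. The prompt and the sampling settings (`max_new_tokens`, `temperature`, `top_p`) are illustrative placeholders, not a recommended configuration for this checkpoint.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "aquiffoo/aquif-neo-2-345m-c1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Illustrative prompt; this checkpoint does not document a fixed prompt template.
prompt = "Explain what a LoRA adapter is in one sentence."

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no pad token by default
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Since this is an intermediate checkpoint, outputs may be inconsistent, and sampling parameters will usually need tuning per use case.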
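
## Fine-Tuning Setup (Sketch)

For reference, here is a minimal sketch of how the LoRA and training configuration listed above could be assembled with PEFT, bitsandbytes, and TRL. It is not the training script used for this checkpoint: the output directory, logging interval, and dataset handling (only `tatsu-lab/alpaca` is loaded here, while the actual run also used `databricks-dolly-15k` and `oasst1`) are assumptions, and the `SFTTrainer` keyword arguments follow an older TRL API; recent TRL releases move several of these options into `SFTConfig`.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig, prepare_model_for_kbit_training
from trl import SFTTrainer

base_model = "gpt2-medium"

tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

# Load the base model in 8-bit, as described in the training information above.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA settings taken from the configuration listed above.
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_attn", "c_attn", "c_proj", "c_fc",
                    "attn.c_attn", "attn.c_proj", "mlp.c_fc", "mlp.c_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Training arguments taken from the card; output_dir and logging_steps are assumed.
training_args = TrainingArguments(
    output_dir="aquif-neo-2-345m-c1",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    learning_rate=1e-5,
    max_steps=400,
    logging_steps=20,  # matches the 20-step interval in the loss table
)

# Simplified data loading; the actual script also mixed in dolly-15k and oasst1.
dataset = load_dataset("tatsu-lab/alpaca", split="train")

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",  # the alpaca dataset provides a formatted "text" column
    tokenizer=tokenizer,
)
trainer.train()
```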