ubergarm committed
Commit 1fe77c2 · 1 Parent(s): fb6c7d9

add usage example with new `-ger`

Files changed (1)
  1. README.md +34 -1
README.md CHANGED
@@ -109,7 +109,40 @@ numactl -N ${SOCKET} -m ${SOCKET} \
 
  ## Quick Start
  ```bash
- echo TODO
+ # Clone and checkout
+ $ git clone https://github.com/ikawrakow/ik_llama.cpp
+ $ cd ik_llama.cpp
+
+ # Build for hybrid CPU+CUDA
+ $ cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON
+ $ cmake --build build --config Release -j $(nproc)
+
+ # CPU-Only Inference
+ # `-ger` is still fresh:
+ # https://github.com/ikawrakow/ik_llama.cpp/pull/836
+ # Omit numactl and `--numa ...` if you have only a single NUMA node
+ # Set batches/threads/kv cache as desired
+ # NOTE: for now, multiple slots (e.g. `--parallel 2`) may cause an error when you cancel a generation and then start a new one
+ SOCKET=0
+ numactl -N "$SOCKET" -m "$SOCKET" \
+ ./build/bin/llama-server \
+ --model "$model" \
+ --alias ubergarm/Ling-1T-GGUF \
+ --ctx-size 65536 \
+ -fa -fmoe -ger \
+ -ctk q8_0 -ctv q8_0 \
+ -ub 4096 -b 4096 \
+ --parallel 1 \
+ --threads 128 \
+ --threads-batch 192 \
+ --numa numactl \
+ --host 127.0.0.1 \
+ --port 8080 \
+ --no-mmap \
+ --no-display-prompt
+
+ # Optional: add this flag once after downloading to confirm the files are good
+ --validate-quants
  ```
 
  ## References
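
The Quick Start comment says to omit `numactl` and `--numa ...` on a host with a single NUMA node. A minimal sketch of how that check might be scripted follows, assuming a Linux host that exposes NUMA nodes under `/sys/devices/system/node`. The helper names `count_numa_nodes` and `numa_prefix` are hypothetical and are not part of ik_llama.cpp or this README.

```bash
# Hypothetical helpers for illustration only; not part of ik_llama.cpp.

# Count NUMA nodes by counting nodeN directories under sysfs (Linux only).
# An optional first argument overrides the sysfs path, e.g. for testing.
count_numa_nodes() (
    base="${1:-/sys/devices/system/node}"
    count=0
    for d in "$base"/node[0-9]*; do
        if [ -d "$d" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
)

# Print the numactl prefix for a given node count and socket number.
# Prints nothing when there is at most one node, so the prefix can be dropped.
numa_prefix() (
    nodes="$1"
    socket="${2:-0}"
    if [ "$nodes" -gt 1 ]; then
        echo "numactl -N $socket -m $socket"
    fi
)

# Show what would be prepended to the llama-server command on this host.
echo "NUMA nodes: $(count_numa_nodes)"
echo "prefix: $(numa_prefix "$(count_numa_nodes)" 0)"
```

When the printed prefix is empty, the same condition applies to the `--numa numactl` flag in the `llama-server` command above: drop both together.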