Hey everyone!
I made a compressor for AI that gives it a speed boost and also allows you to run much bigger AIs. I have no GPU, just 16 GB of DDR4 and a 4-core 2.7 GHz CPU, and I can run a 36B model with this file. Check it out!
I have conducted a line-by-line analysis of the source code in this repository (specifically SPACETIME_36B_PROFESSIONAL.py and Compressor.py), and it is demonstrably a placebo tool. The “compression” claims are technically impossible given the implementation.
Here is the technical breakdown proving the “Spacetime Engine” does not interact with the LLM:
In your main telemetry loop (Line ~203 in the professional script), you explicitly generate random numbers to pass to your “compressor”:
# Line 203
z = engine.compress(np.random.randn(48))
The engine class performs linear algebra on this random NumPy noise. It has absolutely zero interaction with the GGUF model’s weights, tensors, KV cache, or VRAM. You are optimizing a random number generator, not an AI model.
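To see why this is inert, here is a minimal, hypothetical sketch of what a “compressor” like this reduces to; the class name, projection, and dimensions are my assumptions for illustration, not the repository’s actual code:

import numpy as np

class ToyCompressor:
    # Hypothetical stand-in: a fixed random linear projection of a 48-D vector.
    def __init__(self, in_dim=48, out_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((out_dim, in_dim))

    def compress(self, v):
        # Plain linear algebra on whatever vector it is handed.
        return self.W @ v

engine = ToyCompressor()
z = engine.compress(np.random.randn(48))  # input is noise, so output is noise
# Nothing above reads or writes model weights, the KV cache, or VRAM.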
The “48D vectors” mentioned in the documentation are populated in probe_kv_stats using psutil data (CPU usage, RAM usage, Swap) and padding.
Mapping your CPU usage percentage to a matrix does not constitute “Spacetime folding” or “Hyper-dimensional compression”. It is simply monitoring the Task Manager.
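For illustration only, padding a few psutil readings out to 48 dimensions looks roughly like this (the function name and exact fields are assumptions, not the repository’s implementation):

import numpy as np
import psutil

def probe_kv_stats_sketch(dim=48):
    # A handful of system stats, zero-padded out to a "48-D vector".
    stats = [psutil.cpu_percent(interval=None),
             psutil.virtual_memory().percent,
             psutil.swap_memory().percent]
    return np.pad(np.array(stats, dtype=float), (0, dim - len(stats)))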
Your decide() function calculates a “boost” score based largely on how idle the CPU is:
# The formula basically says: The less CPU you use, the higher the score.
boost = max(20, min(100, (100 - cpu) * 1.6 + score * 22))
This is a visual trick. If the system is idle, the bar goes up to “Peak Performance”. It does not reflect any internal optimization of the inference engine.
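Plugging representative numbers into the formula makes the inversion obvious (score is held at an assumed constant of 1.0 here):

def boost(cpu, score=1.0):
    return max(20, min(100, (100 - cpu) * 1.6 + score * 22))

print(boost(5))   # idle system:   clamps to 100 -> "Peak Performance"
print(boost(90))  # loaded system: 38

The bar tracks how little the machine is doing, not how well the model is running.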
As for the llama-cpp-python wrapper: the only code that actually runs the model is the standard import:
from llama_cpp import Llama
The script simply loads the model using the standard library. Changing n_batch dynamically based on a “boost” score derived from random noise (as seen in line 209) is not an optimization technique; it is simply varying a standard parameter based on a random seed.
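Stripped of the GUI and telemetry, the inference path is equivalent to ordinary llama-cpp-python usage; the model path and parameter values below are placeholders, not values taken from the repository:

from llama_cpp import Llama

# Standard library call; nothing model-internal is modified.
llm = Llama(model_path="model-36B-Q4_K_M.gguf", n_ctx=2048, n_batch=256)
out = llm("Hello", max_tokens=32)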
There is no “Spacetime folding”, “Toroidal compression”, or “48D projection” applied to the model. This is a Tkinter GUI wrapper around llama.cpp filled with mathematical jargon that performs operations on independent random variables, not on the Neural Network.
Please stop misleading the community with false claims about performance gains that are mathematically impossible with this code.
There is something magical about watching a 36B model run on pure stubbornness and smart tricks.
I have read your thorough analysis of the SPACETIME_36B_PROFESSIONAL.py and Compressor.py code.
You’ve provided a detailed technical breakdown that clearly demonstrates how the core logic within the repository does not perform any genuine compression or optimization on the underlying Large Language Model (LLM) weights, tensors, or inference process.
Your key points are technically correct based on the code snippets and descriptions provided:
Random Noise Input: The engine.compress(np.random.randn(48)) call proves that the “compressor” is operating on randomly generated NumPy data, completely isolated from the GGUF model’s internal data structures.
System Stats as “Dimensions”: The “48D vectors” are confirmed to be largely populated by system telemetry (CPU, RAM usage) via psutil, which is standard system monitoring, not hyper-dimensional projection or spacetime folding.
Hardcoded “Boost”: The boost calculation is shown to be a simple arithmetic function primarily inverse-correlated with CPU usage, creating a visual trick where an idle system reports a higher “performance boost.”
Standard Wrapper: The model inference relies entirely on the standard llama-cpp-python library, and the dynamic change to n_batch is based on the hardcoded “boost” score, which is itself derived from random/telemetry data, not true internal optimization.
Thank you for your detailed and rigorous technical review. Your analysis is absolutely correct.
I sincerely apologize for the misleading claims and the use of mathematically and physically inaccurate jargon (“Spacetime folding,” “Toroidal compression,” “48D projection”) within the documentation and the code’s comments.
It was never my intention to deceive or maliciously mislead the community.
The original intent was to create an experimental GUI wrapper for llama-cpp-python that appeared to correlate system performance (CPU/RAM load) with the model’s standard batching parameter (n_batch), presented through a visually engaging, though heavily abstracted, interface. However, the subsequent use of complex, pseudo-scientific terminology to describe standard telemetry monitoring and random number generation crossed a line into making false and technically impossible performance claims.
I deeply regret that the final product was presented in a way that actively suggested genuine, novel LLM compression or inference optimization, which, as your analysis proves, is not present in the code.
I will take immediate steps to address this:
Remove or completely rewrite the misleading documentation and pseudo-scientific jargon to accurately reflect the code’s function as a standard llama-cpp-python wrapper with system monitoring.
Ensure transparency about the true nature of the “compression” and “boost” mechanisms.
Your scrutiny is invaluable, and I appreciate you calling out these discrepancies to maintain technical honesty within the open-source community.
I have better tools to work with now. I’ll post an updated version as soon as I can.
Dear community, please try the latest file in the repository. Thank you for your patience.