litert-community/FastVLM-0.5B
Main Model Card: apple/FastVLM-0.5B
This model card provides FastVLM-0.5B converted to LiteRT, ready for on-device use, subject to license.
FastVLM was introduced in FastVLM: Efficient Vision Encoding for Vision Language Models (CVPR 2025). The model demonstrates a strong improvement in time-to-first-token (TTFT) while maintaining performance, making it suitable for edge-device deployment.
The model is supported on CPU, GPU, and Qualcomm NPUs. For details on the Qualcomm integration, see this blog post.
Disclaimer: This LiteRT conversion of the model is licensed under the Apple Machine Learning Research Model License Agreement. The model is converted and quantized from the PyTorch weights into the LiteRT/TensorFlow Lite format (no retraining or further customization).
How to Use
Android (Google AI Edge Gallery)
You can either install Google AI Edge Gallery through the Open Beta in the Play Store or install the APK from GitHub.
To build the demo app from source, please follow the instructions from the GitHub repository.
Android (LiteRT-LM)
1. Add the dependency
Make sure you have the necessary dependency in your Gradle file.
dependencies {
    implementation("com.google.ai.edge.litertlm:litertlm:<LATEST_VERSION>")
}
2. Inference with the LiteRT-LM API
import com.google.ai.edge.litertlm.*

suspend fun main() {
  Engine.setNativeMinLogSeverity(LogSeverity.ERROR) // hide logs for a TUI app

  val engineConfig = EngineConfig(
      modelPath = "/path/to/your/model.litertlm", // Replace with model path
      backend = Backend.CPU, // Or Backend.GPU
      visionBackend = Backend.GPU,
  )

  // See the Content class for other variants.
  val multiModalMessage = Message.of(
      Content.ImageFile("/path/to/image"),
      Content.Text("Describe this image."),
  )

  Engine(engineConfig).use { engine ->
    engine.initialize()
    engine.createConversation().use { conversation ->
      // Send the multimodal message first and stream the response.
      conversation.sendMessageAsync(multiModalMessage).collect { print(it) }
      // Then drop into an interactive text loop.
      while (true) {
        print("\n>>> ")
        conversation.sendMessageAsync(Message.of(readln())).collect { print(it) }
      }
    }
  }
}
Try running this model on the NPU by using the corresponding .litertlm file and setting your EngineConfig’s backend and visionBackend to NPU. To check whether your phone’s NPU is supported, see this guide.
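If the NPU-specific model file is available locally, a minimal configuration sketch could look like the following (the model path below is a placeholder, not the actual file name):

val npuEngineConfig = EngineConfig(
    modelPath = "/path/to/your/model_npu.litertlm", // placeholder for the NPU .litertlm file
    backend = Backend.NPU,
    visionBackend = Backend.NPU,
)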
Desktop
For desktop applications, C++ is currently the recommended path. See the following code sample.
// Create the engine with the appropriate multimodality backends.
// model_assets and conversation are assumed to be created earlier (omitted here).
auto engine_settings = EngineSettings::CreateDefault(
    model_assets,
    /*backend=*/litert::lm::Backend::CPU,
    /*vision_backend=*/litert::lm::Backend::GPU);

// Send a message with image data to the model.
absl::StatusOr<Message> model_message = (*conversation)->SendMessage(
    JsonMessage{
        {"role", "user"},
        {"content", {  // The content field is an array of parts.
            {{"type", "text"}, {"text", "Describe the following image: "}},
            {{"type", "image"}, {"path", "/file/path/to/image.jpg"}}
        }},
    });
CHECK_OK(model_message);

// Print the model response.
std::cout << *model_message << std::endl;
Performance
Android
Benchmarked on Xiaomi 17 Pro Max.
| Backend | Quantization scheme | Context length | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | Memory (RSS in MB) | Model size (MB) | Model File |
|---|---|---|---|---|---|---|---|---|
| GPU | dynamic_int8 | 1280 | 2,220 | 64 | 0.55 | 1766 | 1103 | |
| NPU | dynamic_int8 | 1280 | 11,272 | 106 | 0.12 | 925 | 899 | |
Notes:
- Model size: measured as the size of the model file on disk.
- TTFT includes the encoding time for one image plus the corresponding text prompt; a sketch of how this can be observed in an app follows these notes.
- The benchmark is run with the cache enabled and initialized; latency and memory usage may differ during the first run.
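For intuition only, here is a minimal sketch of how TTFT and decode throughput could be observed on device, reusing conversation and multiModalMessage from the Kotlin example above. It counts streamed chunks rather than tokens, which is an assumption and not the methodology used to produce the table.

// Hypothetical measurement sketch (not the benchmark harness used above).
// Assumes `conversation` and `multiModalMessage` from the earlier Kotlin example,
// and that more than one chunk is streamed back.
val start = System.nanoTime()
var firstChunkAt = 0L
var chunks = 0
conversation.sendMessageAsync(multiModalMessage).collect {
    if (chunks == 0) firstChunkAt = System.nanoTime()
    chunks++
}
val end = System.nanoTime()
println("TTFT ≈ ${(firstChunkAt - start) / 1e9} s")
println("Decode ≈ ${chunks / ((end - firstChunkAt) / 1e9)} chunks/s")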