Forum Discussion
I built a Python runtime that loads precompiled MLIR artifacts from a closed-source compiler
I’ve been building Fluno, a closed-source compiler/runtime experiment for extracting selected hot regions from Python/PyTorch-style continuous inference loops and running them as precompiled native artifacts.
The public repo is not the compiler. It is the audit/runtime surface:
- a Python package ("fluno_runtime") that loads precompiled artifacts
- manifest/schema/hash/expiry validation before dynamic library loading
- a Windows x86_64 live artifact package
- benchmark docs and claim boundaries
- a public package structure that ships no compiler internals
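To make "validate before loading" concrete, here is a minimal Python sketch of a fail-closed loader. It is not the fluno_runtime API. The manifest layout (`schema_version`, `artifact`, `sha256`, `expires_at`), the `ArtifactValidationError` class and the `validate_manifest`/`load_artifact` names are all hypothetical. Every check raises before `ctypes.CDLL` is reached, so nothing is loaded if any of these fail: a missing key, an unknown schema version, an expired timestamp, a path outside the manifest directory, or a hash mismatch.

```python
import ctypes
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Hypothetical manifest layout, NOT the real fluno_runtime format:
# {"schema_version": 1, "artifact": "hot_region.dll",
#  "sha256": "<hex digest>", "expires_at": "2030-01-01T00:00:00+00:00"}
SUPPORTED_SCHEMA_VERSIONS = {1}
REQUIRED_KEYS = {"schema_version", "artifact", "sha256", "expires_at"}


class ArtifactValidationError(Exception):
    """Raised when any check fails; the library is never loaded."""


def validate_manifest(manifest_path: Path, now: Optional[datetime] = None) -> Path:
    """Return the resolved artifact path only if every check passes."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    if not isinstance(manifest, dict):
        raise ArtifactValidationError("manifest must be a JSON object")

    # Schema: required keys present and a known version.
    missing = REQUIRED_KEYS - set(manifest)
    if missing:
        raise ArtifactValidationError(f"missing keys: {sorted(missing)}")
    if manifest["schema_version"] not in SUPPORTED_SCHEMA_VERSIONS:
        raise ArtifactValidationError("unsupported schema_version")

    # Expiry: require an explicit UTC offset so the comparison is unambiguous.
    expires_at = datetime.fromisoformat(str(manifest["expires_at"]))
    if expires_at.tzinfo is None:
        raise ArtifactValidationError("expires_at must include a UTC offset")
    if (now or datetime.now(timezone.utc)) >= expires_at:
        raise ArtifactValidationError("artifact has expired")

    # Location: the artifact must be an existing file under the manifest's directory.
    base = manifest_path.parent.resolve()
    artifact = (base / str(manifest["artifact"])).resolve()
    if base not in artifact.parents or not artifact.is_file():
        raise ArtifactValidationError("artifact missing or outside manifest directory")

    # Integrity: the SHA-256 of the file on disk must match the manifest.
    digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
    if digest != str(manifest["sha256"]).lower():
        raise ArtifactValidationError("sha256 mismatch")
    return artifact


def load_artifact(manifest_path: Path) -> ctypes.CDLL:
    """Fail-closed: ctypes.CDLL is only reached after validation succeeds."""
    return ctypes.CDLL(str(validate_manifest(manifest_path)))
```

One caveat with this shape: the file is hashed and then reopened by path. A production loader would also need to guard against the file being replaced between those two steps, for example by copying it into a private location and hashing and loading that copy.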
The current L-size continuous inference benchmark shows:
- PyTorch optimized repeated: 84.673 ms
- Fluno "hot_vector_repeated": 4.061 ms
- Fluno "hot_run_repeated": 7.245 ms
- max absolute error: 0.0 within the published 11-element "partial_summary_vector" scope
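The headline ratios can be re-derived from the published timings. The small Python sketch below does that arithmetic. It also spells out the usual definition of maximum absolute error as the largest element-wise |reference - candidate|; that definition is assumed here, since the post does not state how the metric is computed. The function names are illustrative only.

```python
import math

# Published L-size timings from the post, in milliseconds.
timings_ms = {
    "pytorch_optimized_repeated": 84.673,
    "fluno_hot_vector_repeated": 4.061,
    "fluno_hot_run_repeated": 7.245,
}


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def max_abs_error(reference, candidate) -> float:
    """Largest element-wise |reference[i] - candidate[i]|; NaN if any difference is NaN."""
    if len(reference) != len(candidate):
        raise ValueError("vectors must have the same length")
    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    # NaN compares false against everything, so max() alone could hide it.
    if any(math.isnan(d) for d in diffs):
        return math.nan
    return max(diffs, default=0.0)


baseline = timings_ms["pytorch_optimized_repeated"]
for name in ("fluno_hot_vector_repeated", "fluno_hot_run_repeated"):
    print(f"{name}: {speedup(baseline, timings_ms[name]):.2f}x")
```

Running it prints 20.85x for `hot_vector_repeated` and 11.69x for `hot_run_repeated`. The explicit NaN check matters because a NaN difference can otherwise be reported as 0.0 by `max`, which would make a broken artifact look exact.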
Important limitation: Fluno does not currently beat the handwritten Rust/C++ references on this row. The point of the current public release is not “faster than C++”; it is to show a Python-callable artifact runtime boundary with fail-closed validation and native-class latency.
4 Replies
- Jeffrey148 (Brass Contributor)
If your compiler is closed-source but the runtime is open, someone could extract the MLIR-to-native part by debugging the runtime’s load behavior. That’s fine: you’re already accepting that by making the loader public.
- Emanuelom (Brass Contributor)
By releasing the runtime and benchmarks first, you are establishing credibility on the hardest part of the problem: correctness and safety. The 0.0 max absolute error claim is more important than the roughly 20x speedup. It shows that, within the published scope, the artifact is not drifting numerically from the reference.
- Proker (Copper Contributor)
Thank you for sharing this detailed overview of Fluno and its current capabilities. It sounds like a promising approach for integrating precompiled MLIR artifacts into a Python environment with strong validation and minimal runtime overhead.
- Zoncey (Copper Contributor)
Most Python acceleration projects (Numba, TorchScript, Cython) focus on code transformation, whether ahead of time or just in time. You write Python, they turn it into something faster, and you trust that the generated code is correct.