Hi HN,
Over the past few months, I've been building `dsc`, a tensor library from scratch in C++/CUDA. My main focus has been on getting the basics right, prioritizing a clean API, simplicity, and clear observability for running small LLMs locally.
The key features are:
- C++ core with CUDA support written from scratch.
- A familiar, PyTorch-like Python API.
- Runs real models: it's complete enough to load a model like Qwen from HuggingFace and run inference on both CUDA and CPU with a single line change[1].
- Simple, built-in observability for both Python and C++.
Next on the roadmap is BF16 support, followed by visualization for GPU workloads.
The project is still early and I would be incredibly grateful for any feedback, code reviews, or questions from the HN community!
GitHub Repo: https://github.com/nirw4nna/dsc
[1]: https://github.com/nirw4nna/dsc/blob/main/examples/models/qw...
Comments URL: https://news.ycombinator.com/item?id=44310678
Points: 37
# Comments: 2