Fastgen – SOTA LLM inference in 3k lines of Python | Hacker News

Fastgen – SOTA LLM inference in 3k lines of Python (github.com/facebookresearch)
3 points by mpu 10 months ago | 1 comment

mpu 10 months ago

We just released a tiny (~3kloc) Python library that implements state-of-the-art inference algorithms on GPU and provides performance similar to vLLM. We believe it's a great learning vehicle for inference techniques and the code is quite easy to hack on!
