vllm

Language: Python

A high-throughput and memory-efficient inference and serving engine for LLMs

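As a quick illustration of the engine described above, here is a minimal offline-inference sketch using vLLM's Python API, following the project's quickstart pattern; it assumes `vllm` is installed, and the model name and prompts are placeholders:

```python
# Minimal offline batch inference with vLLM.
from vllm import LLM, SamplingParams

# Prompts and sampling settings; values here are illustrative.
prompts = [
    "Hello, my name is",
    "The capital of France is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# The model name is a placeholder; any supported Hugging Face model works.
llm = LLM(model="facebook/opt-125m")

# generate() batches the prompts and returns one RequestOutput per prompt.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```

For the serving use case mentioned in the description, recent versions also ship an OpenAI-compatible HTTP server started via the `vllm serve <model>` CLI.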
⚠️ Further information (our curation) is coming soon.

Repo: https://github.com/vllm-project/vllm

🌟 24.4k stars · 3.5k forks

Topics: amd, cuda, gpt, inference, inferentia, llama, llm, llm-serving, llmops, mlops, model-serving, pytorch, rocm, tpu, trainium, transformer, xpu