vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Link to repo: https://github.com/vllm-project/vllm
🌟 24.4k stars · ⑂ 3.5k forks
Tags: amd, cuda, gpt, inference, inferentia, llama, llm, llm-serving, llmops, mlops, model-serving, pytorch, rocm, tpu, trainium, transformer, xpu
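Below is a minimal sketch of offline batch inference with vLLM's Python API, following the pattern in the project's quickstart; the model name is an arbitrary small example and can be swapped for any Hugging Face model vLLM supports.

```python
from vllm import LLM, SamplingParams

# Prompts are processed as a batch; vLLM schedules them with
# continuous batching and PagedAttention for high throughput.
prompts = [
    "Hello, my name is",
    "The capital of France is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Model name is an example; any supported Hugging Face model works.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"Prompt: {output.prompt!r} -> {output.outputs[0].text!r}")
```

For online serving, vLLM also ships an OpenAI-compatible HTTP server (`vllm serve <model-name>`) backed by the same engine.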