Language Breakdown
Lines of code distribution across 2 owned repositories
I-Shaped Developer
I-shaped Specialist: deep expertise in Python
Collaboration Network
Global Impact visualization
Repos
17
PRs
0
Growth
+18%
Top Collaborators
No collaborator data yet.
Coding Streak
Contribution activity over the past year
Darragh Hanley
@darraghdog
Peng Zhang
@AniZpZ
Yuxi Chi
@cherichy
ZZK
@MARD1NO
Barry Kang
@Barry-Delaney
Top Repositories
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
CUDA Templates for Linear Algebra Subroutines
FlashInfer: Kernel Library for LLM Serving
The Triton TensorRT-LLM Backend
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Code for QuaRot, an end-to-end 4-bit inference of large language models.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Open Source Impact
Contributions to external projects