The TensorRT Laboratory (trtlab) is a general-purpose set of tools for building custom inference applications and services.
Triton is NVIDIA's professional-grade production inference server.
This project is broken into five primary components:
- `memory` is based on foonathan/memory. The `memory` module was designed for writing custom allocators for both host and GPU memory; several custom allocators are included.
- `core` contains host/CPU-side tools for common components such as thread pools, resource pools, and userspace threading based on Boost fibers.
- `cuda` extends `memory` with a new memory_type for CUDA device memory. All custom allocators in `memory` can be used with `device_memory`, `device_managed_memory`, or `host_pinned_memory` (a minimal sketch of the corresponding CUDA runtime calls follows this list).
- `nvrpc` is an abstraction layer for building asynchronous microservices. The current implementation is based on gRPC.
- `tensorrt` provides an opinionated runtime built on the TensorRT API.
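To make the three CUDA memory types concrete, here is a minimal sketch using the plain CUDA runtime API. This is not the trtlab allocator API, only the underlying allocation calls that `device_memory`, `device_managed_memory`, and `host_pinned_memory` conceptually correspond to.

```cpp
// Minimal sketch: the three CUDA memory categories, allocated with the raw
// CUDA runtime API. This is NOT the trtlab allocator API; trtlab's custom
// allocators in `memory`/`cuda` layer on top of allocations like these.
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    constexpr size_t bytes = 1 << 20;  // 1 MiB per allocation

    void* device  = nullptr;  // corresponds conceptually to device_memory
    void* managed = nullptr;  // corresponds conceptually to device_managed_memory
    void* pinned  = nullptr;  // corresponds conceptually to host_pinned_memory

    cudaMalloc(&device, bytes);          // GPU-only memory
    cudaMallocManaged(&managed, bytes);  // unified memory, migrates between host and device
    cudaMallocHost(&pinned, bytes);      // page-locked host memory for fast H2D/D2H copies

    std::printf("allocated %zu bytes in each memory type\n", bytes);

    cudaFree(device);
    cudaFree(managed);
    cudaFreeHost(pinned);
    return 0;
}
```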
The easiest way to manage the external NVIDIA dependencies is to use the containers hosted on NGC. For bare-metal installs, use the Dockerfile as a reference for which NVIDIA libraries to install.
```bash
docker build -t trtlab .
```
For development, the following commands first build the base image and then mount the host source tree into a running container.
```bash
docker build -t trtlab:dev --target base .
docker run --rm -ti --gpus=all -v $PWD:/work --workdir=/work --net=host trtlab:dev bash
```
This project is released under the BSD 3-clause license.
- Please let us know by filing a new issue
- You can contribute by opening a pull request
Pull requests with changes of 10 lines or more will require a Contributor License Agreement.