TensorRT : High-performance deep learning inference

안녕하세요, 메이아이의 ML Engineer 정재민입니다.

오늘은 딥러닝 모델을 가속화하는 TensorRT에 대해 소개하려고 합니다. 본문으로 들어가기에 앞서, 아래 링크에 접속한 후에 사용하고자 하는 TensorRT의 버전을 클릭하면 필요한 cuda 버전을 찾으실 수 있음을 알립니다.

Release Notes :: NVIDIA Deep Learning TensorRT Documentation

NVIDIA TensorRT is a C++ library that facilitates high performance inference on NVIDIA GPUs. It is designed to work in connection with deep learning frameworks that are commonly used for training. TensorRT focuses specifically on running an already trained network quickly and efficiently on a GPU fo…

TensorRT이란?

TensorRT는 엔비디아에서 개발한 추론에 최적화되어있는 SDK입니다. 자신이 개발한 모델을 TensorRT로 추론해 보면 엄청난 속도 향상을 가져올 수 있습니다. 그럼 어떻게 TensorRT는 빠르게 추론이 가능할까요? 추론하고자 하는 모델에 대해 TensorRT는 아래와 같이 작업합니다.

Elimination of layers whose outputs are not used.
Elimination of operations which are equivalent to no-op
The fusion of convolution, bias and ReLU operations
Aggregation of operations with sufficiently similar parameters and the same source tensor (for example, the 1x1 convolutions in GoogleNet v5’s inception module)
Merging of concatenation layers by directing layer outputs to the correct eventual destination.

TensorRT를 설치하기 이전에 Pytorch 버전과 관련된 이슈가 많은데요. 저는 Pytorch 1.3 버전 이상일 경우는 TensorRT 7 버전, Pytorch 1.2 버전 이하일 경우는 TensorRT 6 버전을 사용하시기를 추천합니다. 참고로 저는 Pytorch 1.3 / TensorRT 7을 사용합니다. NGC에 TensorRT 나 Pytorch 도커 이미지를 사용하는 것을 권장합니다.