News

NVIDIA's TensorRT-LLM now supports encoder-decoder models with in-flight batching, offering optimized inference for AI applications. Discover the enhancements for generative AI on NVIDIA GPUs.
In order to overcome the drawback of decoder-only LLMs for text embedding, a team of researchers from Mila, McGill University, ServiceNow Research, and Facebook CIFAR AI Chair has proposed LLM2Vec, a ...
We propose to utilize an instruction-tuned large language model (LLM) for guiding the text generation process in automatic speech recognition (ASR). Modern LLMs are adept at performing various text ...