
Megatron was used to help train BLOOM (BigScience Large Open-science Open-access Multilingual Language Model), which was released on July 12 with support for 46 human languages and 13 programming languages.

“People are using it to efficiently train large models of up to a trillion parameters; these large language models run on clusters of GPUs,” Kapasi said.

As a framework, NeMo Megatron is a “top-to-bottom” stack, according to Kapasi, meaning it includes GPU-accelerated machine learning libraries as well as hardware and networking optimizations for cluster deployments. At the foundational layer, Kapasi explained, NeMo Megatron is built on top of the open-source PyTorch machine learning framework.

“Our stack is specifically optimized for Nvidia DGX SuperPODs, but the stack also works well on cloud systems,” Kapasi said.

Large language models aren’t just for large research organizations; they are also finding a home within enterprises. Kapasi commented that enterprises may want to take a pretrained model and then adapt it for their own use cases. Common enterprise deployments include chatbots as well as question-and-answer services.
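To make that adaptation step concrete, here is a minimal sketch of fine-tuning a small pretrained causal language model on a few domain question-and-answer examples. It uses the Hugging Face transformers library with a placeholder base model (“gpt2”) and toy data, all of which are assumptions for illustration rather than part of Nvidia’s NeMo Megatron stack.

```python
# Minimal sketch: adapt a pretrained causal LM to a narrow Q&A domain.
# The base model ("gpt2") and the two training examples are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # an enterprise would substitute its own pretrained checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Toy domain data; a real deployment would use a curated corpus.
examples = [
    "Q: How do I reset my password? A: Use the self-service portal.",
    "Q: What are your support hours? A: Weekdays, 9 a.m. to 5 p.m.",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()

for text in examples:
    batch = tokenizer(text, return_tensors="pt")
    # For causal-LM fine-tuning, the labels are simply the input ids.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

In practice the same loop would run for many passes over far more data, distributed across the kind of GPU clusters described above.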

It’s not Energon making Megatron faster, it’s math

The fictional Megatron is powered by a substance known as “Energon,” but when it comes to Nvidia’s Megatron, it’s mostly math. That math, and the way compute, memory and process parallelization are handled, is now being improved in Megatron to make model training much faster.

“Basically, the main impact of these new features is that you can train larger models more efficiently, and the way they do that is by both reducing the amount of memory required during the training process and reducing the amount of computation required,” Kapasi said.

One of the new features is a technique called selective activation recomputation. Kapasi explained that within an AI transformer, there is a need to maintain process state in memory. For various reasons, some pieces of that state take up a disproportionately large amount of memory, yet require only a small fraction of the overall compute resources to regenerate. What Nvidia has now figured out is how to better choose which items to recompute as needed, rather than keeping them in memory continuously, providing better overall efficiency.
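The general idea can be sketched with PyTorch’s generic activation-checkpointing utility. The toy block below recomputes only the feed-forward sublayer, whose intermediate activations are large but cheap to regenerate; it is an illustration of the principle, not Nvidia’s Megatron implementation, and the layer sizes are arbitrary assumptions.

```python
# Sketch of selective activation recomputation: keep most activations in
# memory, but recompute the memory-heavy, cheap-to-regenerate MLP part.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    def __init__(self, d_model=1024, n_heads=16):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # The 4x expansion produces large intermediate activations that are
        # inexpensive to recompute relative to their memory footprint.
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out  # attention activations are stored as usual
        # The MLP's intermediates are discarded after the forward pass and
        # recomputed on demand during backward.
        x = x + checkpoint(self.mlp, self.norm2(x))
        return x

block = Block()
y = block(torch.randn(2, 128, 1024, requires_grad=True))
y.sum().backward()
```

The trade-off is a small amount of extra compute in the backward pass in exchange for not holding the largest intermediate tensors in memory for the entire forward pass.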

The other new feature that helps to accelerate Megatron is called sequence parallelism. With very large LLMs, all of the parameters cannot fit on a single GPU, so they are distributed across multiple GPUs using various parallel-processing techniques. Kapasi explained that the new sequence parallelism approach is more optimized than prior approaches, requiring less compute and memory. “These new improvements are not some fancy memory allocation system,” Kapasi said.
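As a rough, single-process illustration of why splitting work along the sequence dimension saves memory, the sketch below simulates the idea with ordinary tensor operations. It is a conceptual toy under assumed shapes, not Megatron’s distributed implementation; a real system would place each shard on a different GPU and use collective communication (such as all-gather) between them.

```python
# Conceptual sketch of sequence parallelism: token-wise operations such as
# LayerNorm do not mix information across positions, so the sequence can be
# split across devices and each device stores only its own slice.
import torch
import torch.nn as nn

seq_len, d_model, world_size = 2048, 1024, 4   # assumed sizes
x = torch.randn(1, seq_len, d_model)           # full activation tensor
norm = nn.LayerNorm(d_model)

# Split along the sequence dimension: each simulated "GPU" holds
# seq_len / world_size tokens, shrinking per-device activation memory.
shards = torch.chunk(x, world_size, dim=1)

# Each rank applies the token-wise op only to its own shard.
local_outputs = [norm(shard) for shard in shards]

# Before an operation that mixes tokens (e.g. attention), the shards are
# gathered back together; in a real system this is an all-gather collective.
full_output = torch.cat(local_outputs, dim=1)

assert torch.allclose(full_output, norm(x), atol=1e-6)
```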
