LPU inference engine wins funding recognition! Groq, challenging NVIDIA, sees its valuation nearly triple in a year.
Groq, a powerful challenger to NVIDIA's AI chips, raises more funding than expected at a valuation as high as $6.9 billion.
AI chip startup Groq confirmed on Wednesday local time that it was valued at approximately $6.9 billion after a new financing round in which it raised $750 million. The company is one of the biggest competitors of "AI chip king" NVIDIA Corporation (NVDA.US); among NVIDIA's AI chip rivals, it trails only the American chip giants Broadcom Inc. and AMD in market size.
The latest figures are significantly higher than the rumors that circulated in July, when many media reports put the round at about $600 million at a valuation close to $6 billion.
Like NVIDIA's largest revenue and profit contributor, the data center business, Groq focuses on selling the most essential AI computing infrastructure, AI chip clusters, to major data centers and enterprise platforms. The company was valued at $2.8 billion when it raised $640 million in August 2024, so the latest round has more than doubled the startup's valuation in just over a year.
PitchBook data show that Groq has raised over $3 billion in financing so far this year, a scale comparable to AI super-unicorns such as Anthropic.
From a technological perspective, Groq's product is a custom AI ASIC designed for inference scenarios rather than a general-purpose GPU. The company packages its systems as GroqCard, GroqNode, and GroqRack, explicitly positioning them as custom inference ASICs.
Groq is a force to be reckoned with in global capital markets primarily because it is dedicated to breaking NVIDIA's dominant hold on AI computing infrastructure; NVIDIA commands up to 90% of the AI chip market.
The chip Groq develops is not the usual AI GPU that powers AI training and inference systems. Groq instead calls it an LPU (Language Processing Unit) and brands its hardware an "inference engine": a dedicated high-performance computer optimized for running large AI models with extreme efficiency. Technologically, Groq follows a path similar to Broadcom's AI ASICs and Google's TPU.
Groq's products target developers and enterprises, offered both as cloud computing services and as hardware clusters for on-premises deployment. Its on-premises offering centers on AI server racks with an integrated hardware/software stack. Both the cloud service and the local hardware can run current versions of popular large AI models from Meta, DeepSeek, Alibaba (Qwen), Mistral, Google, and OpenAI. Groq claims its LPU products maintain or even improve the efficiency of running large AI models while significantly cutting costs compared with mainstream alternatives.
Groq founder Jonathan Ross is a "super tech expert" in the AI chip field. Ross previously worked in Google's chip development group on the full development of the Tensor Processing Unit (TPU), a specialized high-performance processor Google designed for heavy AI computing workloads.
Google's TPU was released in 2016, the same year Groq was founded. TPU clusters have become a core source of AI training and inference compute on Google Cloud, second only to NVIDIA's AI GPU clusters in scale within Google's data centers.
Recently, Google revealed the latest details of its Ironwood TPU (its seventh-generation TPU), showing significant performance gains: compared with TPU v5, Ironwood delivers roughly 10x the peak FLOPS and a 5.6x efficiency improvement, and its single-chip compute is more than 16x that of the TPU v4 Google released in 2022.
Performance comparisons show that Ironwood's efficiency of 4.2 TFLOPS per watt is only slightly below the 4.5 TFLOPS per watt of NVIDIA's B200/B300 GPUs. JPMorgan commented that these figures highlight how rapidly advanced dedicated AI ASICs are closing the performance gap with leading AI GPUs, pushing major cloud computing providers to invest more in more cost-effective custom ASIC projects.
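As a rough sanity check on what those figures imply, the relative gap can be computed directly (efficiency numbers taken from the comparison above):

```python
# Efficiency figures quoted above, in peak TFLOPS per watt.
ironwood_eff = 4.2   # Google Ironwood TPU
blackwell_eff = 4.5  # NVIDIA B200/B300 GPU

# Relative gap between the dedicated ASIC and the leading GPU.
gap = (blackwell_eff - ironwood_eff) / blackwell_eff
print(f"Ironwood reaches {ironwood_eff / blackwell_eff:.0%} of Blackwell's "
      f"efficiency, a gap of only {gap:.0%}")
```

A single-digit-percent gap is what underpins the "rapid narrowing" framing: a few chip generations ago, dedicated ASICs trailed leading GPUs by far wider margins on this metric.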
Groq currently provides compute cluster support for the AI applications of over 2 million developers, up from roughly 350,000 a year earlier, when the company was interviewed by TechCrunch.
It is understood that the new round was led by investment firm Disruptive, with global asset management giant BlackRock, Neuberger Berman, Deutsche Telekom Capital Partners, and other investors participating. Existing investors including Samsung, Cisco Systems, Inc. (Cisco), D1, and Altimeter also joined the round.
The LPU: purpose-built for AI inference
Groq's LPU is a dedicated accelerator designed for inference, especially large language model (LLM) inference. Its core is Groq's independently developed TSP (Tensor Streaming Processor) architecture: it uses static, predictable data-flow paths instead of the traditional "thread/core/cache" paradigm of AI GPUs, emphasizing low latency, stable latency, and high throughput.
Groq's LPU chip features a large on-chip SRAM (about 220 MB), ultra-high on-chip bandwidth (official figures cite up to 80 TB/s), and a compiler that explicitly schedules operators and data movement in time and space, minimizing reliance on "reactive" hardware components such as caches, arbitration, and replay mechanisms.
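The compile-time scheduling idea can be illustrated with a toy sketch: if every operation's start cycle is fixed ahead of time, total latency is known before the program ever runs, with no runtime arbitration or cache behavior to model. The operation names and cycle costs below are invented purely for illustration, not Groq's actual instruction set:

```python
# Toy illustration of compiler-scheduled ("static") execution: each
# operation is assigned a fixed start cycle at compile time, so the
# run is fully deterministic. Op names and cycle costs are invented.

def compile_schedule(ops):
    """Assign each op a fixed start cycle, back-to-back in program order."""
    schedule, cycle = [], 0
    for name, cost in ops:
        schedule.append((cycle, name))
        cycle += cost
    return schedule, cycle  # total latency is known before execution

ops = [("load_weights", 4), ("matmul", 10), ("activation", 2), ("store", 3)]
schedule, total = compile_schedule(ops)

for start, name in schedule:
    print(f"cycle {start:3d}: {name}")
print(f"total latency: {total} cycles (known at compile time)")
```

On a cache-based GPU, by contrast, the effective latency of each step depends on runtime events (cache hits, memory arbitration, warp scheduling), which is why GPU latency is statistical where the LPU's is deterministic.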
Through the TSP's streaming, static, predictable scheduling and SRAM-fed compute, the LPU delivers lower latency, more stable throughput, and potentially higher energy and delivery efficiency than AI GPU clusters in low-batch or zero-batch LLM inference. For large-model training, dynamic workloads, and ecosystem completeness, however, general-purpose GPU clusters retain systematic advantages.
It is worth noting that for comprehensive AI workloads centered on large-model training and ultra-high-batch throughput, NVIDIA's AI GPU ecosystem (CUDA, high-bandwidth memory, NVLink, etc.) still holds an across-the-board advantage. Groq's edge lies mainly in interactive, real-time, low-latency LLM inference workloads.
Especially when batch sizes are small (even batch=1), the LPU does not need to "fill up batches" to run well: it achieves higher tokens per second per chip with lower scheduling overhead, meeting interactive products' demands for fast responses. Groq's LPU feeds compute directly from large on-chip SRAM, with official figures citing on-chip bandwidth of up to 80 TB/s, whereas GPUs must make frequent trips to external HBM; cutting these compute-memory round trips improves the efficiency of running large AI models. The LPU's deterministic execution also yields a smoother power curve, and combined with the streamlined data path, per-token inference energy consumption comes in lower than on common GPUs.
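The round-trip argument can be made concrete with a simple bandwidth-bound estimate: at batch=1, decoding one token requires streaming roughly all model weights through the compute units once, so per-token latency is bounded below by weight bytes divided by memory bandwidth. The model size and HBM figure below are illustrative assumptions, not vendor specifications; 80 TB/s is the on-chip bandwidth Groq cites above:

```python
# Lower-bound estimate of per-token decode latency when inference is
# memory-bandwidth bound (batch=1, all weights streamed once per token).
# The 8B-parameter FP16 model is an illustrative assumption; 80 TB/s is
# Groq's cited on-chip SRAM bandwidth, and 3.35 TB/s is a typical
# single-GPU HBM3 figure (assumption, not a vendor spec).

PARAMS = 8e9          # model parameters (assumed)
BYTES_PER_PARAM = 2   # FP16/BF16 weights

weight_bytes = PARAMS * BYTES_PER_PARAM

for name, bw_tbs in [("on-chip SRAM (Groq figure)", 80.0),
                     ("single-GPU HBM (typical)", 3.35)]:
    latency_s = weight_bytes / (bw_tbs * 1e12)
    print(f"{name:28s}: >= {latency_s * 1e3:6.2f} ms/token "
          f"(~{1 / latency_s:,.0f} tokens/s upper bound)")
```

In practice a 220 MB SRAM cannot hold an 8B-parameter model, so Groq pipelines weights across many LPU chips; the point of the estimate is only that at batch=1, bandwidth rather than FLOPS sets the latency floor, which is exactly the regime where on-chip SRAM pays off.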
Therefore, although AI ASICs cannot fully replace NVIDIA at scale, their market share is bound to expand beyond today's situation, in which NVIDIA's AI GPUs dominate. For standard inference and some training workloads (especially ongoing fine-tuning), the per-unit throughput cost and energy efficiency of custom AI ASIC solutions are significantly better than pure GPU solutions; for rapid exploration, frontier large-model training, and trial-and-error with new multimodal operators, NVIDIA's AI GPUs remain dominant. In current AI engineering practice, tech giants are therefore increasingly inclined toward a hybrid architecture of "ASICs for routine tasks, GPUs for exploration peaks and new model development" to minimize total cost of ownership (TCO).
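The hybrid-architecture logic reduces to a blended cost-per-token calculation. All dollar figures and the workload split below are invented placeholders, chosen only to show the structure of the comparison, not actual Groq or NVIDIA pricing:

```python
# Toy TCO comparison: costs per million tokens are invented placeholders
# illustrating why a routine/exploratory split lowers blended cost.
asic_cost = 0.20   # $/M tokens on a custom inference ASIC (assumed)
gpu_cost = 0.50    # $/M tokens on general-purpose GPUs (assumed)

routine_share = 0.8  # fraction of workload that is steady and well-understood

# GPU-only fleet vs. hybrid (routine work on ASIC, exploratory work on GPU).
gpu_only = gpu_cost
hybrid = routine_share * asic_cost + (1 - routine_share) * gpu_cost

print(f"GPU-only: ${gpu_only:.2f}/M tokens")
print(f"Hybrid  : ${hybrid:.2f}/M tokens "
      f"({1 - hybrid / gpu_only:.0%} cheaper)")
```

The larger the routine share and the wider the ASIC's cost advantage, the more the blended cost approaches the ASIC's, which is why the hybrid split tracks how much of a giant's workload has stabilized.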