The Technoeconomic Pillars of Foundry
Silicon Statesmanship #2: Tracking the intersection of leading-edge semis and industrial policy
Jerry Sanders, CEO of AMD, once proudly proclaimed “Real men have fabs.” While he meant that chip design companies should own their own manufacturing, perhaps we should restate this as real superpowers have hard tech. Now, in April 2024, the major pieces are in place: Intel and TSMC both have “memorandums of understanding” defining the government funding to spur leading edge production. Samsung appears soon to follow. The table is set for the US’ first foray into industrial policy in generations.
My goal with this series is to educate readers about how foundries work, such that they can assess the effectiveness of semiconductor manufacturing industrial policy. Specifically, I am talking about the leading edge of innovation, where all the goodness of smartphones, AI, and the cutting edge of computing goes. The sanctions on China were designed to ensure Western countries preserve a several year advantage in this particular domain over what China can produce on their own.
Pat Gelsinger’s relentless execution has made Intel a formidable presence in this market, Samsung is reeling but still investing, and TSMC continues to dominate the foundry space with their extremely public, extremely important differentiator: they are a pure-play foundry.
What You Will Learn
In Part 1, I set up the history that makes this topic so urgent. Part 2 offers a deep dive on what foundries do and why. Like the fundamental forces of nature, these are the fundamental topics that shape every conversation in semis. These topics are quick to grasp at a high-level, but most of the nuance is left to professional conversations. In this post, we will open up the hood on a few of these topics and walk through the mechanisms, math, and connective tissue of the tech and business, including:
The value proposition
Revenue generation
Scale
Yield
How customers choose foundries
Again, I worked at Intel on the business development (BD) team for their custom foundry as late as 2017. That experience informs my perspective, but I will not get into its history or the specifics of my work there.
The Basics of a Foundry
The defining feature of a semiconductor foundry is that it only manufactures semiconductors; it does not design them. It is the Easy Bake Oven, not the goop you put inside of it. This stands in contrast to the business model of Intel and Samsung, who design, market, and manufacture their own chips, a model known as integrated device manufacturing (IDM).
An analogy from Steve Blank:
TSMC was the first pure-play fab. Morris Chang (founder) described the inspiration for the foundry as follows:
When I was at TI and General Instrument, I saw a lot of IC [Integrated Circuit] designers wanting to leave and set up their own business, but the only thing, or the biggest thing that stopped them from leaving those companies was that they couldn’t raise enough money to form their own company. Because at that time, it was thought that every company needed manufacturing, needed wafer manufacturing, and that was the most capital-intensive part of a semiconductor company, of an IC company. And I saw all those people wanting to leave, but being stopped by the lack of ability to raise a lot of money to build a wafer fab. So I thought that maybe TSMC, a pure-play foundry, could remedy that. And as a result of us being able to remedy that then those designers would successfully form their own companies, and they will become our customers, and they will constitute a stable and growing market for us. -Oral History Interview: Morris Chang | SEMI
Ironically, in the late 1980s, the belief ran that you had to vertically integrate your manufacturing to succeed. Morris Chang’s core insight argued that you could create much more value by separating the two, unleashing innovative design engineers to work on the design side while the manufacturing engineers focused on general-use manufacturing.
Manufacturing is simply a culturally different domain from design. The hard part of design comes from insight and creative vision, which are often non-linear.
The culture and art form of manufacturing revolves around paranoia. All the little things matter. A dust particle is your enemy. Creative divergence is The Dark Force. Manufacturers follow rules and structures beautifully - robotically beautifully. The culture of design is taste, novelty, and usefulness. The culture of manufacturing is optimization. Of course, the two are deeply interlinked and more cooperative than antagonistic—design shows manufacturing what is desirable, manufacturing shows design what is possible and economical at different scales.
The other big issue is capital: HUGE sums for manufacturing plants versus relatively tiny sums for design companies. Fabless semiconductor design companies need fabulously little capex.
The venture capital industry is arguably the most opportunistic group of investors in the world, especially in Silicon Valley. They want very small amounts of capital invested which can subsequently return 500, 1,000, 5,000 times that very small amount of capital. If they must invest LARGE sums to properly capitalize a company (i.e., not $5 million or $10 million but $1 billion+), then it becomes geometrically harder to return huge multiples of that capital down the line. So smaller is, again, better.
In the case of a foundry, the capex numbers are massive. Billions. Not millions. The cost is now so high that only TSMC, Intel, and Samsung are investing commercially in leading-edge, and even the IDMs (Intel and Samsung) have shifted their business model to try to capture more foundry business to support their need for scale to keep playing. Chips happen to have the unique properties that make this possible—if you have the right value proposition.
I. The Value Proposition
How does a foundry’s customer make money?
To understand how manufacturers differentiate their tech, we are going to take a detour into what shapes their customers’ behavior, i.e. how a chip designer makes a profit. We’re talking about Apple, Qualcomm, and NVIDIA here: companies that design chips but don’t manufacture them. Because they don’t have fabs, they are often just called fabless designers. Chips are mostly commodities. Chips physically fit into a socket, so sales teams hawking chips refer to selling chips as “socket hunts”. Their potential customers have a need for a chip and they’re evaluating bids from multiple “commodity” chip sellers.
Winning a socket hunt is a process analogous to every other b2b transaction, wherein the seller has to be better, faster, or cheaper to win. That framing (better, faster, cheaper) is strategically ambiguous. There are critical questions about what better means, faster at doing what, etc., which depend entirely on what you’re trying to make. Often there are other constraints, like power or size, that determine whether a chip works best for the kind of device it is serving. While all chips are made with transistors, the chips needed to train an artificial intelligence model are different from the ones needed to control the clock in your microwave. They are different hardware applications.
What are hardware applications?
The job of chips is to make software applications … happen. It’s great that a given chip can process lots of data quickly and accurately - but just processing data is like chewing food without swallowing. So at the verrrry end of the day, the goal here is to render applications so that they satisfy the user. That is… they use applications like Word or Excel or games or zoom or photo editing stuff… in a seamless manner.
But in this sense, we are talking about hardware applications. The job of a hardware application is to optimally use transistors to form circuits which maximize the performance of those software applications. Here are a few examples:
Central processing units (CPUs) are great for running lots of different kinds of programs in serial execution i.e. where one computation has to finish before another. Intel is probably the biggest brand still focused on CPUs.
Graphics processing units (GPUs) are great for running simple programs across huge datasets. Critically, running one computation doesn’t require solving the others first, so it can quickly parallelize the data.
Application-specific integrated circuits (ASICs) are great if you only ever want to run one algorithm, so you want every transistor on the chip dedicated to doing one thing. As a consumer, it’s unlikely you have encountered these, but they are more common in networking and telecom.
Often the optimal solutions within an application category look quite similar, e.g. lots of processors for mobile phones have the same basic components, with performance and price differing in degrees. Of course, many companies have been born by making something new and different. The differences between applications are much bigger than the differences inside an application category.
AI is hot. How does AI fit into this?
Sidebar on AI. Artificial intelligence (AI) is rightfully getting a lot of buzz. AI is a category of software applications. An AI has to be trained. After it has been trained, it is an AI model. AI models have recently demonstrated capabilities in some fields above what humans have been able to achieve with their own algorithms. Nation states are racing to understand how this capability can be exploited and how it undermines their security. And they are at least notionally “dangerous” (if we’re lucky, they’ll come for the lawyers first). Hence all the paranoia around export laws with scale volume “AI-driven” chip designs and our good friends in China.
An artificial intelligence model requires massive levels of computation to “learn” from massive datasets. This training process is limited by hardware. It can take literally years unless you can get better hardware, which can reduce the process time to months, weeks, etc. depending on a number of variables. This friction lives at the heart of the export restrictions on selling these technologies to China. The US wants to maintain a several-year advantage in these categories, so any hardware that might enable China to build better AI now cannot be sold there without a license from the Feds, and they aren’t going to give one out. (China is quite good at extra-legally accessing technology anyway; we just don’t want to let that happen without a modicum of friction.)
When training an AI model, the trainers (designers) want chips that prioritize absolute performance and efficiency over low cost. The aforementioned microwave manufacturer that just wants a chip for the clock only cares about performance as much as if it can keep the time and then wants it to be as cheap as possible. All transistors, but very different application spaces. AI is expected to be a key beneficiary of leading-edge silicon going forward, and we’ll soon cover why they can get so expensive.
How do foundries develop transistors to match fabless design customer’s needs?
Ultimately, most chips get boiled down to a benchmark level of performance and a cost per computation. Their value proposition comprises three elements:
The benchmark level of performance for that application (said another way, in car terms, ‘what is your 0-60 mph time?’)
The cost to achieve that performance is competitive i.e. the juice is worth the squeeze
The “performance envelope” i.e. how much power does it use, how big is it, how much heat does it generate, etc.
Performance is always relative: you are always being compared to something else. Here, ‘else’ is last year’s chip or the super cheap chips from 5 years ago. They all work; they just work less quickly and/or less elegantly, at the prices and performance of older chip architectures. The other big ‘else’ is your competitors, who are also trying to win that socket.
The value equation must work to win a socket. As in other markets, some chip design companies will go for absolute performance (the fastest processing chip, albeit for a higher per-something price) while others will try to be value leaders. Ferrari vs. Lexus. In practice, most companies pursue a portfolio strategy: a flagship high-end product plus more, lower-performing products that give customers the flexibility to meet different needs. For example, Apple will make a new chip, but keep selling the old chip in the old design as their value play. Outside semiconductors, a similar strategy is employed by Ford in the best-selling F-Series line, leading with the $80,000 Ford F-150 Raptor R (the one my wife wants to buy despite never spending more than $27,000 on a car, having no interest in trucks, and working from home) and the $34,000 Ford F-150 that Ford sells in high volume to small businesses. Marketing!
Setting aside my wife’s love of military-grade aluminum and back to transistors, many applications benefit from simply having more transistors. As we covered in the first post, Moore’s Law is the process of being able to draw transistors smaller and smaller, such that you can fit more transistors in the same amount of physical space. If your application benefits from more transistors and you can sell more transistors at the same (or cheaper) price than your competitors, your value prop will look amazing because you will meet (or exceed) the performance profile your customers want and your performance per dollar will be great.
This dynamic plays out all the time in gaming-oriented devices, notably central processing units (CPUs) and graphics processing units (GPUs). In GPUs, NVIDIA is always pushing the high-end to perform even better, while charging a premium for it. AMD, their chief competitor in gaming GPUs, cannot compete with the highest-performing GPUs, but they can sell ones that perform at 85% of the leading performance while being 60% of the cost. That’s a strong value prop for many buyers. This competitive dynamic repeats in many application categories, where chip designers seek the right mix of performance and cost. Competition is what drives Moore’s Law and is the mechanism by which higher transistor counts translate into more and better electronics throughout our lives.
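To make the GPU example concrete, here is a minimal Python sketch of performance per dollar. The 85% and 60% figures come from the paragraph above; the function name and the choice to normalize the leader to 1.0 are assumptions made purely for illustration.

```python
def perf_per_dollar(relative_performance: float, relative_cost: float) -> float:
    """Performance delivered per unit of cost, both measured relative to the leader."""
    return relative_performance / relative_cost


# The leader is the reference point: 100% of the performance at 100% of the cost.
leader = perf_per_dollar(1.00, 1.00)

# The challenger from the text: 85% of the performance at 60% of the cost.
challenger = perf_per_dollar(0.85, 0.60)

# How much more performance per dollar the challenger offers than the leader.
advantage = challenger / leader - 1.0

print(f"Leader:     {leader:.2f} performance per dollar")
print(f"Challenger: {challenger:.2f} performance per dollar")
print(f"Challenger advantage: {advantage:.0%}")
```

Under these numbers the challenger gives up 15% of the absolute performance but delivers roughly 40% more performance per dollar, which is the value-leader position in a nutshell.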
As a manufacturer, the foundry faces similar choices:
The foundry can make new nodes, keep setting new performance benchmarks, and keep drawing transistors smaller and smaller. This translates to big profits in the long run (if not necessarily the short run), as the companies with the highest value applications will seek out any hardware advantage they can get and are willing to pay for that level of performance.
A foundry can also cost-optimize existing manufacturing lines (iterating on a current node or adding another flavor to a node to target different applications) to get the performance per cost just right to maintain its attractiveness.
All of this is ultimately abstracted in the industry to PPA: power, performance, area. Power indicates if it fits in your performance envelope. Performance is usually how fast the transistors can toggle on-and-off in some application-specific mode. Area is how much space it takes up. Area for a long time has been a proxy for cost. Remember, smaller transistors are generally better, so you want density to go up while area goes down—that gives you physically smaller chips, so you can fit more on a wafer, and you get more product for the same cost per wafer. Sometimes you’ll hear this as PPAC now, breaking out area and cost, because smaller transistors are not necessarily cheaper going forward.
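The link between area and cost can be sketched with a common rule-of-thumb estimate for gross die per wafer: wafer area divided by die area, minus a correction for partial dies lost at the round edge. This is an approximation, not any foundry's actual method, and the two die areas below are hypothetical values chosen only to show the effect of shrinking a design.

```python
import math

WAFER_DIAMETER_MM = 300.0  # a 12-inch wafer


def gross_dies_per_wafer(die_area_mm2: float,
                         wafer_diameter_mm: float = WAFER_DIAMETER_MM) -> int:
    """Rule-of-thumb count of whole dies on a round wafer (before yield losses)."""
    radius = wafer_diameter_mm / 2.0
    # Dies that would fit if the wafer area could be tiled perfectly.
    area_term = math.pi * radius ** 2 / die_area_mm2
    # Approximate number of partial dies lost around the circular edge.
    edge_loss_term = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return math.floor(area_term - edge_loss_term)


# Hypothetical die sizes: the same design before and after halving its area.
for die_area in (100.0, 50.0):
    dies = gross_dies_per_wafer(die_area)
    print(f"{die_area:5.0f} mm^2 die -> about {dies} dies per wafer")
```

In this estimate, halving the die area slightly more than doubles the die count, because smaller dies also waste less of the wafer's edge. At a fixed price per wafer, that is the sense in which area has long served as a proxy for cost.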
II. Revenue Generation
Most foundry revenue comes from selling wafers. Customers want those wafers cut into chips, but for reasons we’ll get into later, the smallest unit of sale is usually a wafer - think: you don’t buy one egg at Safeway, let alone Costco.
The two critical numbers that define your profit contribution metrics are the number of wafers (scale) produced and revenue per wafer. To get those numbers, a foundry will get a sense for the number of customers who might be interested in the new process technology, smoke some ayahuasca to learn what volumes and prices might be for the next decade, and place a bet on whether it will deliver a greater cash-on-cash return than other places they could invest that $40B (or if Tim Apple tells you to do it, or if the Taiwanese government tells you to do it so they can tap into some Dark Brandon Energy, etc.). Moore’s Law means this bet has historically paid out, as long as the tech actually works.
III. Scale
Let’s set up an example foundry on paper to show why spending $40B on an Arizona fab is still attractive to TSMC. They are doing this enormous build today, often building several at a time. Of course, spending $40B reveals a belief that the investor is going to get a return greater than $40B on that factory.
How do foundries measure scale?
Embedded in that $40B is an assumption that this fab is going to be a high-volume fab, likely targeting 50,000 to 100,000 wafer starts per month. Think about a wafer start as making a pancake at Denny’s when there is a union strike. The pancake goes on the grill Tuesday at noon and you get served Thursday night. Wafers take ages to make at scale. They then get cut up into chips, but the whole process takes so long that the industry standard metric is wafer starts: if you know the number of horses at the starting line, you can impute the somewhat smaller number that finish once yield losses take their toll at the end.
It’s not practical to scale up or down a fab after you’ve built it, so you’re making this decision early. Everything in semiconductors is about confidence and reducing risk of not getting a return on the massive investment. As for why 100,000, I don’t have a firm answer but I assume that number has become something of an optimization point based around the physical constraints of the number of tools required as well as the economics of the newer plants. Smaller plants are available at 45,000 or so, but those tend to be for older nodes where costs and revenues are smaller. And I return to the fact that this is entirely a scale business - like other things in dating and kaiju, larger is better and the higher volumes then amortize the initial high fixed costs of building the plant and the capital costs much more efficiently.
How do new nodes affect the scale of a company’s operation?
TSMC’s average revenue per wafer in 2021 was about $4,000, spread across 291 distinct process technologies. A process technology is the size and scale you draw transistors at, so they can draw transistors for money 291 different ways. Pricing is always closely held, but analysts and industry insiders can often back it out. At launch in late 2022, TSMC’s 4nm node was reportedly going for $17,000 per wafer. At the time, about half of TSMC’s revenue came from technology released in the prior 5 years. The $17,000 wafer delivered the smallest transistors in the world; the chips coming out of this system deliver massive value to the end customer, which the wholesaler (or retailer, in Apple’s case) can then upcharge consumers for. The disparity between the $17,000 for a new process technology and the average revenue per wafer of $4,000 illustrates how quickly older technologies become commoditized, how expensive new process technologies are, and how a differentiated technology is the way to earn outsized profits. The horrible truth is that scale and revenue per wafer are highly correlated: those who execute well in delivering process technologies that are better, sooner than others, get the demand from customers. In effect, they leverage their scale advantage in a kind of “winner take all” business map. Note that TSMC’s market capitalization, even after the market declines of 2022 and with China sniffing around their borders, is $641 billion.
How do the economics of new nodes work financially?
I promised numbers, so let’s look at the first 3 years or so of a new technology. Today (T-3), you place your bet for $40B. You won’t see profit, let alone real revenue, for 3 years (T1). You’re designing your factory for 100,000 wafer starts per month, but if it’s brand new, it definitely will not be running at peak capacity right away. Instead, you’re going to scale up to 100,000 wafers per month as your team gets more comfortable. The plant managers decide to start at 50% utilization, i.e. they are only going to produce 50,000 wafers per month to start and slowly scale up to >90%. Both numbers are based on how TSMC operates their tech over time, often not starting production until they can achieve 50% utilization and topping out around 90% after it has reached maturity in a few years. Each process technology is different, requiring new tools, and the utilization is often a function of how quickly you can master those tools.
Simple financial model: I have emphasized the marginal revenue and cost per wafers. The plant’s wafer costs are pretty much flat, meaning that wafer 1 costs the same as wafer 10,000 in the volumes the plan calls for. And for now, just completely ignore depreciation, taxes, and the cost of providing childcare to operators and technicians.
Fundamentally, the plant is buying processed silicon sand for ~$500 and selling it (after a ton of very high IQ work) for ~$17,000. The business on paper is beautiful. It’s almost software, or something analogous to a CD or DVD stamping business. In practice, TSMC makes about a 50% gross margin across all their lines. They define gross profit margins as revenues minus the costs of revenue which includes costs of equipment, labor, and materials directly used in manufacturing, excluding operating costs like R&D, G&A, and marketing. I recall they seek about 80% contribution margin on new process technologies in their first year of production, so cost per wafer is likely closer to $3,400 per wafer with a floor at process maturity less than $2,000 per wafer (averages 50% margin on wafer revenue of $4,000).
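The cost figures above fall straight out of the margins. A minimal sketch, assuming margin is simply (revenue minus cost) divided by revenue; the function name is illustrative and the distinction between contribution margin and gross margin is deliberately glossed over:

```python
def cost_per_wafer(revenue_per_wafer: float, margin: float) -> float:
    """Cost implied by a revenue figure and a margin expressed as a fraction of revenue."""
    return revenue_per_wafer * (1.0 - margin)


# New node: $17,000 per wafer at a recalled ~80% contribution margin.
new_node_cost = cost_per_wafer(17_000, 0.80)

# Company-wide average: $4,000 per wafer at about a 50% gross margin.
mature_cost = cost_per_wafer(4_000, 0.50)

print(f"Implied cost per wafer, new node:     ${new_node_cost:,.0f}")
print(f"Implied cost per wafer, mature lines: ${mature_cost:,.0f}")
```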
TSMC will have had a couple years to reap some big profits. At the high-end, Apple, NVIDIA, and others want to pay for performance as soon as they can get it. TSMC will sell as much as they can before competing foundries at Samsung and Intel bring their tech online in 3-4 years, when prices for similarly dense, powerful and light $17,000 style wafers … will decline (note how prices decline in T3 and T4). As for how low TSMC’s prices will decline…I have no idea. If their average revenue per wafer is $4,000 but the newer stuff is inherently more expensive with less competition, maybe the prices stay higher longer? I left it at $10,000 after T5 and I will leave it to the reader to make their own scenarios for higher or lower.
If TSMC is good and lucky, they will be close to recovering their cash investment in year 4. Here’s a guess at how that simple foundry might work out to T10. While I modeled the loaded marginal cost per wafer as flat, in practice, their buildings are on a 10–20-year depreciation schedule while the machinery and equipment is on a 5-year schedule. There would be some savings to loaded marginal cost per wafer after T5 as a result.
At $17,000 per wafer and a cost of $3,400 per wafer, that’s 2.9 million wafers required to recover a $40B investment, ignoring all the exogenous stuff. Just to frame these volumes: TSMC manufactured about 14.2 million wafers total in 2022, representing an estimated 24% market share of 59 million units in the whole market.
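Here is a rough sketch of that break-even arithmetic, plus a toy payback calculation for the ramp. The $40B, $17,000, $3,400, 100,000 wafers per month, the 50% to 90% utilization range, and the $10,000 price at T5 come from the text. The year-by-year utilization path and the T3 and T4 prices are placeholder assumptions, not figures from the model described above, so treat the payback year as an illustration of the mechanics only.

```python
import math
from typing import Optional, Sequence

INVESTMENT_USD = 40e9
COST_PER_WAFER_USD = 3_400
LAUNCH_PRICE_USD = 17_000
CAPACITY_WAFERS_PER_MONTH = 100_000


def break_even_wafers(investment: float, price: float, cost: float) -> int:
    """Wafers needed for cumulative (price - cost) to cover the investment."""
    return math.ceil(investment / (price - cost))


def payback_year(investment: float,
                 capacity_per_month: float,
                 utilization_by_year: Sequence[float],
                 price_by_year: Sequence[float],
                 cost: float) -> Optional[int]:
    """First production year in which cumulative contribution covers the investment."""
    cumulative = 0.0
    for year, (utilization, price) in enumerate(
            zip(utilization_by_year, price_by_year), start=1):
        wafers = capacity_per_month * 12 * utilization
        cumulative += wafers * (price - cost)
        if cumulative >= investment:
            return year
    return None  # not recovered within the modeled horizon


# ASSUMPTION: utilization ramps from 50% to 90% over three years.
ASSUMED_UTILIZATION = [0.50, 0.70, 0.90, 0.90, 0.90]
# ASSUMPTION: T3 and T4 prices are invented; only T1/T2 and T5 come from the text.
ASSUMED_PRICES_USD = [17_000, 17_000, 14_000, 12_000, 10_000]

wafers_needed = break_even_wafers(INVESTMENT_USD, LAUNCH_PRICE_USD, COST_PER_WAFER_USD)
year = payback_year(INVESTMENT_USD, CAPACITY_WAFERS_PER_MONTH,
                    ASSUMED_UTILIZATION, ASSUMED_PRICES_USD, COST_PER_WAFER_USD)

print(f"Break-even at launch pricing: {wafers_needed:,} wafers")
print(f"Payback under the assumed ramp: production year {year}")
```

Changing the assumed prices or the ramp moves the payback year quickly, which is the point: the bet is very sensitive to how fast utilization climbs and how long launch pricing holds.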
Beyond just recovering their capital, TSMC also has to plan for the future. Moore’s Law marches on, and TSMC’s customers expect them to lead the way. By T1, TSMC will have already had to invest in building a fab for their next-smallest process technology. By T3, they’ll just be recovering their investment and will start on the generation beyond that. The margin for error is slim.
Where does wafer demand for a new technology node come from? Why isn’t everyone doing this?
When you make a new process technology, some of your customers will choose to stop making the stuff they were making before. The new entrant elbows out the margin for the makers of that product who are less-than-awesome at doing so. For TSMC to make a new production technology, some of the expected 2.9 million wafers will come from the 14.2 million wafers they already make. Some of the volume will be all-new, though, which will only add to total wafer production across all lines. If it were all new wafers, the new production technology would represent a 2% increase to TSMC’s total capacity. Now, this 2% might not sound like much, but historically, making this transition has been enough to scare away all but 3 players: TSMC, Intel, and Samsung. Assuming they could even get the capital to attempt it, the others are terrified because there’s so much that can go wrong, including:
Timelines could slip. Delays in construction, obtaining requisite tools, and developing expertise with those tools to reach performance milestones can all increase the risk that you will never recover the investment.
Your competitor could master the new technology faster than expected, reducing the revenue potential of the new investment.
The customer bringing you that early version decides they don’t want to buy as many and your volumes crater (or their customers get tired of your incremental updates and decide they can wait longer for a new one). In fact, having that first customer (your anchor) is essential to attaining the scale you need to ever recognize a profit on your fab. There are only about 4-5 anchor customers in the world, with Apple the clear #1 and NVIDIA a bitter #2.
Note that China has several aspiring anchor customers, like Huawei. At least, these were anchors in a globalized economy, where they have access to the best technology and Western markets (and steal and subvert and…) It’s a matter of debate if they’ll remain anchors, though they are now shoving any demand they can spare to the aspiring Chinese foundries.
From a traditional business perspective, think about the grocery store’s choice of selling Diet Coke for $7 a 6-pack where they make 20 cents a unit versus selling capers for $15 where they make $9. They’ll choose Diet Coke every time because DC’s volumes are MASSIVE relative to caper bottle sales. However… If somehow the market price of that Diet Coke dropped 45 cents, the grocery store is now losing 25 cents a pack for the privilege of selling it. So it’s … risky.
What does it look like if this bet goes wrong?
To give a sense of how this looks when things go sideways, let’s visit the other end of the talent pool: GlobalFoundries (GF). Until 2008, GF was the manufacturing arm of AMD, who was designing, marketing, and manufacturing their own products. I have emphasized trust several times, and TSMC emphasizes it in their own annual report. GF exists because Intel (allegedly) committed some…light…anti-competitive behavior, significantly undermining AMD’s ability to generate revenue and the scale necessary to keep investing in competitive technologies. (Note that I did not join Intel until 2011, so this was recent-ish history when I was working on chip design strategy there, but I had no knowledge of or visibility into it.) The FTC would ultimately settle with Intel, but it seems the damage was done. In such a competitive environment, it’s not hard to consider it a small price to pay to cripple your most significant threat. AMD needed the manufacturing assets off its books, and its manufacturing arm would have to stand on its own. GF was born. The consequences for Intel are still playing out, as fabless companies know Intel can be ruthless towards competitors.
GF would continue to introduce new process technology for a few years, until badly dropping the ball with Apple. In 2014, GF “bought” IBM’s fabs (IBM wanted these assets off their books so badly they gave GF an exclusive contract to continue production of IBM chips plus $1.5B to take them). In 2016, to try to keep up with Intel and others, GF licensed the latest technology from Samsung. By 2018, GF announced they wouldn’t make any smaller process technologies and would instead continue to optimize and refine the larger technologies. As of 2024, their smallest process offering is 12nm.
Bigger is better. Scale wins. But we’ve yet to introduce the most important metric—the only metric those in the silicon mines really talk about: yield.
IV. Yield
What is wafer yield? Why is it so important to semi manufacturing?
Moore’s Law is all about being really good at drawing small transistors. Research labs can in fact draw transistors at much smaller scales than what’s produced in volume. The research labs are not very good at it, so only a few of the ones they draw actually work. These research projects are important technical demonstrations that illustrate potential technology paths for manufacturers, but the research only matters if you can do it at scale. Things that work in a lab under non-commercial and extremely controlled conditions don’t necessarily translate into commercial production facilities that make chips profitably.
When a factory produces chips, it makes them in the form of giant 12” diameter wafers which are then cut into, say, 100 die which are (usually) packaged into 100 chips. Said another way, one wafer produces 100 chips. And many of those chips are broken or don’t work as expected. One dust particle ruined that production run. One burst of electrons from the solar flare changed the magnet system on the flux capacitor and other chips didn’t work. Fifty other things can go wrong here. Lots of accidents happen when flying at these speeds. So when you produce a wafer that would notionally carry 100 chips, the actual useful yield from that wafer is… less. Generally, the smaller the transistors, the higher the error rate.
Yield is defined as the number of functional chips produced in a given manufacturing run, divided by the total number of chips that were attempted during that run. Previously, I stated that the foundry charges by the wafer, not the chip. The foundry’s customer defines what functional means and, critically, has most of the levers to change the design to improve both the number of functional chips and how well those chips function. Performance is rarely binary but continuous. Again, these companies are selling a portfolio, so even though a chip might not perform like a $300 chip, it’s still saleable as a $200 chip. Therefore, the foundry’s customer carries the risk of poor yields.
Ultimately, it’s a partnership to do your best and make the economics work. It is not uncommon for the first attempt to be zero-yield—i.e. no working chips. Learn and repeat. Sometimes, customers want to push the yield risk onto the foundry by only paying for good die. Foundries will accommodate customer requests as much as they can, and there are no good data on terms of agreements, but most major deals I am aware of are for wafers.
How does wafer yield affect semiconductor manufacturing economics?
Let’s add yield to our calculation. For their high-end iPhone chip, the A16 Bionic chip, TSMC can fit about 492 chips per 12” wafer. Let’s start with a processing run where everything runs perfectly, and the fab plant produces a wafer run with 100% yield and TSMC recognizes $17,000 per wafer of revenue from Apple:
The cost works out to $35 per chip. What happens if we plug in a realistic first year yield?
In that first year, the customer is paying the same amount to the foundry per wafer but only getting half as many “good” chips, so the cost per chip is essentially double. Recall that the value proposition covers three things:
The benchmark level of performance for that application (said another way, in car terms, ‘what is your 0-60 mph time?’)
Whether the cost to achieve that performance is competitive, i.e. the juice is worth the squeeze.
The “performance envelope,” i.e. how much power does it use, how big is it, how much heat does it generate, etc.
Poor yields increase the cost per functional chip. The poorer the yield, the worse the value proposition gets for a foundry’s customers as well as for the foundry. This is predicted and hedged as much as possible, but the newer the process technology, the higher the uncertainty around yields.
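The arithmetic behind the A16 example can be sketched in a few lines. The 492 chips per wafer and the $17,000 wafer price come from the figures above; the function name is my own, and the 50% case stands in for the "half as many good chips" first-year scenario.

```python
GROSS_DIE_PER_WAFER = 492   # A16 chips per 12-inch wafer, from the text
WAFER_PRICE_USD = 17_000    # revenue TSMC recognizes per wafer, from the text


def cost_per_good_chip(wafer_price, gross_die, yield_fraction):
    """Customer's cost per functional chip when paying by the wafer."""
    if not 0 < yield_fraction <= 1:
        raise ValueError("yield_fraction must be in (0, 1]")
    return wafer_price / (gross_die * yield_fraction)


for y in (1.0, 0.5):
    cost = cost_per_good_chip(WAFER_PRICE_USD, GROSS_DIE_PER_WAFER, y)
    print(f"yield {y:.0%}: ${cost:.2f} per good chip")
# yield 100%: $34.55 per good chip
# yield 50%: $69.11 per good chip
```

Because yield sits in the denominator, the cost per good chip scales as 1/yield: halving the yield doubles the cost, and the penalty gets steeper the lower yields go.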
That first production period is known as risk production, where a foundry is still learning what works and what doesn’t. Reportedly, Apple’s rather basic requirement is >50% of the chips have to work. Foundry customers can only bear so much cost increase before it’s uneconomical to sell, so they limit production while yields are poor. Apple and TSMC work together to relentlessly maximize yields. Errors are tracked down with brute force and complex diagnostic tools and by run number 30, yields are over, say, 40%. Lather, rinse, repeat and after 8 months, yields on this new cutting-edge wafer production facility can kiss 80%.
As yields go up, it becomes more economical for more customers to come on board. Ideally, the fab manufacturer is simultaneously increasing max wafer starts per month, improving yield, and controlling the price per wafer. Even better, if your competitors are yielding poorly, you can keep your rates high and recognize more profits longer.
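Here is a rough sketch of how those three levers combine over a ramp. The chips per wafer and wafer price are the A16 figures used above, and the yield steps echo the rough 40%, 50%, and 80% milestones mentioned earlier. The wafer starts per month are purely hypothetical, chosen only to show the shape of the curve.

```python
GROSS_DIE_PER_WAFER = 492   # from the A16 example above
WAFER_PRICE_USD = 17_000    # from the A16 example above


def monthly_output(wafer_starts, yield_fraction):
    """Return (good chips, foundry revenue in USD) for one month of production."""
    good_chips = round(wafer_starts * GROSS_DIE_PER_WAFER * yield_fraction)
    revenue = wafer_starts * WAFER_PRICE_USD  # foundry is paid per wafer, not per good chip
    return good_chips, revenue


# (stage, wafer starts per month [hypothetical], yield [rough figures from the text])
RAMP = [
    ("early risk production", 5_000, 0.40),
    ("yield near the 50% bar", 20_000, 0.50),
    ("mature, about 8 months in", 40_000, 0.80),
]

for stage, starts, y in RAMP:
    chips, revenue = monthly_output(starts, y)
    print(f"{stage}: {chips:,} good chips, ${revenue:,} foundry revenue, "
          f"${revenue / chips:.2f} per good chip")
```

With the wafer price held flat, foundry revenue grows only with wafer starts, while the customer's supply of good chips grows with both wafer starts and yield. That is why improving yield pulls in more customers even when the rate card does not change.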
What is required to improve yields?
Answering this question in detail would require several graduate degrees that I do not have. The short version is:
The best photolithography tools, which today come from ASML in their Extreme Ultraviolet (EUV) product lineup. Today, these products produce wafers the fastest at the highest scale with the highest yields at the smallest transistor size.
China was explicitly barred from using these tools, which essentially limits the commercial viability of the products it can make. Yet China has continued making smaller transistors. How? It uses older technology (Deep Ultraviolet, aka DUV) that’s slower, more defect-prone, etc. The yield is reportedly pretty good, but the approach is inherently more expensive and will have practical limits. Good enough if you just want better transistors for the military, though.
Tribal knowledge about how to use the tools (Morris Chang loves talking about the importance of TSMC’s seasoned operators and technicians, with extremely low employee churn rates)
Good process design kits (PDKs): software tools given to chip designers that describe the “rules” of manufacturing for a new technology before it exists.
How has yield shaped the foundry market?
In the lead-up to the CHIPS Act, US semiconductor manufacturers struggled with yield. Intel had a disastrous attempt at yielding its 7 nm process technology (called 10 nm at the time and since rebranded as Intel 7). GlobalFoundries (GF) was so bad at yielding new process technologies that Apple had to take all its iPhone 7 volume to TSMC. Of course, GF gave up on harder process technologies as a result.
Again, this is a business about confidence and reducing risk. When companies have stumbled this poorly, would you bet your company on their success?
V. How Customers Choose Foundries
I have touched on how hard it is to produce chips profitably, at scale, and with high yields, and why fabs must win volume to survive and profit. Here is what chip designers care about in a foundry partner:
Technology development: Technology development (TD) derives from scientists and engineers planning their design or ‘draw’ of transistors. Success in technology development is about:
Transistor performance: There are different kinds of transistors for different kinds of hardware applications. TD creates essentially catalogs of different kinds of transistors with different performance characteristics that map to the application categories they’re interested in.
Yield: This is about setting rules that the customers—the chip designers—can easily implement to achieve good yields, fast.
Timeline: Can this process be implemented at scale on the requisite timeline?
The job of the brilliant PhDs working in technology development is to create the blueprint for customers to get an appealing value proposition that the manufacturing guys are confident they can build. If they create an enticing value proposition for customers, but it’s so error-prone that manufacturing can’t build it, what was the point (👀 IBM)? Getting technology development right is so critical to performance that even companies that don’t manufacture their own products employ their own teams to teach fabs how to optimize the supplier’s process technology for their chips.
Manufacturing capability: How many wafer starts per month are available? When will it be online? Where is production occurring geographically: is it close to the rest of the customer’s value chain? Is it in a politically sensitive region? A foundry’s manufacturing discipline in both controlling the production environment and tracking the process over time can impact your ability to keep yields high.
Ecosystem: There is a massive ecosystem of IP and electronic design automation (EDA) tools to help fabless chip designers get to market quickly. The best comparison I can draw here is to the tier suppliers in the automotive industry. You don’t need to reinvent the tire to make a new car; you just need to pick which kind of wheel and tire best matches your car’s use case. The EDA tools help you do that and integrate it into your design and into the foundry’s design rules.
Trustworthy customer service: A foundry is ultimately in the service business. A fabless design company is handing them the keys to the castle, including all their IP and their roadmap, trusting that the foundry will deliver functional wafers years down the line that they can monetize. While technology is important, building semiconductors is a game of trust and confidence. Can you and will you do what you say you’re going to do? Pushing Moore’s Law forward is hard enough (and it may be broken, but who really cares? Moore’s Law just meant that chips would get better, faster, and cheaper for a long time, and they did).
A pure-play foundry is the only one that can definitively answer that they will always put customers first. An IDM, by contrast, has its own products to consider. Any IDM asking a fabless designer to let them manufacture their products is effectively asking them to put the life of their company in the hands of someone with all the resources to compete with them. Worse, the IDM could simply not prioritize their wafers. This fear is often more theoretical than real, as IDMs are often looking to sell capacity to areas they don’t want to invest design resources into… though of course if that market gets attractive enough someday? At an IDM, customer #1 is always the internal customer. At a pure-play foundry, customer #1 is whoever has the most resources.
Ok, that’s the crash course in leading-edge foundry. When it all goes wrong, you get GlobalFoundries. In Part 3, we’ll use this framework to analyze what American taxpayers are actually subsidizing and how to assess the success of the industrial policy going forward.