I did a double-take last week when Micron Technology, one of the world’s largest memory chip makers, acquired artificial intelligence hardware and software startup Fwdnxt.
The move could be very interesting. If it bears fruit, Fwdnxt could bring Micron into direct competition with partners such as Intel and Nvidia, as Micron believes that memory and AI computing are converging into the same architecture.
But it’s no accident that one of the people at Micron in charge of this project is Steve Pawlowski, a former Intel chip architect who holds dozens of patents. Pawlowski is now vice president of advanced computing solutions at Micron.
When combined with Micron’s memory chips, Fwdnxt (pronounced “forward next”) will enable Micron to explore deep learning AI solutions required for data analytics, particularly with internet of things and edge computing. Maybe it will make AI-based memory chips, or maybe memory chips that include AI.
Boise-based Micron is doing this work, said Sanjay Mehrotra, CEO of Micron, because “the compute architectures of yesterday are not suitable for tomorrow … In the long run, we think compute is best done in memory.” I interviewed Pawlowski at the Micron Insights event last week in San Francisco.
Here’s an edited transcript of our interview.

Above: Steve Pawlowski of Micron and Abhishek Chaurasia of Fwdnxt.
Steve Pawlowski: When I left Intel in 2014, I came to Micron and they said, “What do you want to do?” I said, “I’m convinced that the convergence of compute and memory is necessary for performance efficiency and lower latency. You’re a memory company. You have the technology. DRAM is going to be around a while. I’d like to work on that.” They said, “OK.”
I have a small team that’s focusing on finding problems where compute and memory — we can start testing the concept, start getting concepts into products, but not increase the cost. One of the things I knew at Intel, I’ll never forget the story — we used to have math coprocessors. 80287, 80387. We made an obscene amount of money on the 387. We had this bright idea that we could do it faster and better if we integrated the coprocessor inside the 486. We did, and all of a sudden we didn’t have enough of a footprint. The people who didn’t need it said, “You’re not charging me for that die area,” and the people that did need it said, “You’re going to pay me the same as everyone else because I’m a favored customer.” Effectively that whole business went to zero.
The key learning there is that you can’t add more complexity and cost and expect people to pay for it right out of the chute. Not until there’s a significant majority that gets real value out of it. What we’re focusing on is finding the key things where people can get value out of it today, and then just see if you can expand that bubble over time. I look at it as an eight to 10 year journey. At the end of those years I may look back and realize I wasted them. Or I could look back and say, “Wow, we may not have gotten here, but we did OK.”
VentureBeat: That sparks a lot of imagination as to what could result from this, but are there some specific things that you would drop hints about?
Pawlowski: The one thing, and you’ve heard a lot about it here, is AI at the edge. The reason that we focus there is there isn’t an incumbent programming model or an incumbent architecture where you’re fighting a market battle. Everybody’s fighting to get into the same stall, so to speak. There’s an opportunity to go do something there. People don’t look at you and say, “Micron is a memory company. Why are you talking about this?” They look at it like — we have this capability in an FPGA with our high-performance memory and an architecture that maps on an FPGA. We take care of all the abstractions so you don’t have to become a VHDL programmer. Would you be willing to start working on problems with your data sets?
The interesting thing is, I haven’t really had to go push that. We’ve been showing up at FPGA conferences and things like that. Mainly government agencies have come and said, “We have a problem here. We’d like to kick the tires on this a bit more.” The problem with the government is they get excited early, but if you ever want to do something it takes so long. Procurement cycles are long. Contracts are long, and everything else.
We decided to look at the general market. There was an automotive company that came and said, “We’re not level five, but we can certainly get level three, level four autonomous vehicles where we want to be able to use the network to tell us what’s going on. This looks intriguing. Are you willing to work with us?” A lot of people inside said, “Why are they interested in working with you?” It’s because I don’t come in and tell them what they need to do. I say, “Here’s what we’ve got. What can we do for you?” They say, “OK, you’re willing to listen to us. Here’s our problem.”
I learned that lesson, believe it or not, in 2005, when AMD was coming out with Opteron. We were still pushing seven-gig processors, 33-stage pipelines, and nobody was going there. We went to Wall Street, and it’s one of those moments where you want to crawl in a shell, because they really lit in. But I said, “Can you give us another chance? Can we sit down and understand your workloads, work with you, and I’ll take that back and we can build better products?” And we did.
We turned a lot of them — UBS, I remember an op-ed they wrote where they said, “You may not build the biggest chip or the best chip, but you came and understood my problem.” It was really understanding the customer and their problem and what you can do. If you do it and it doesn’t help them, hey, you learned something.

Above: Micron
VentureBeat: As far as narrowing it down, is it coming up with a new kind of memory, or is it figuring out where the processing is done?
Pawlowski: The answer is yes. But it’s really understanding the dynamic. By the way, it depends on the model. I was just talking to someone down there about how some language models need 100 gigabytes for the parameters. When you see someone who says, “Hey, I’ve got two gigabytes, four gigabytes,” that’ll fit most models, but not all of them. The models are really evolving.
It depends on the latency of your solution, too. I don’t know if you saw the OHSU video down there where the lady had breast cancer. They need lots of data, because they want to put all the electron microscopy images together and build a 3D convolutional model, a 3D representation of the tumor. They don’t have enough time to go across, because they want actionable insight in a day or even an hour. The work we’re doing with CERN, we need the data now. We have to make decisions in microseconds. Is this something interesting or do we drop it on the floor?
Different solutions require different types of memory. What we’re learning is — the one thing I always liked about Intel, I knew what the instructions were from the program. I knew how they got executed in the machine and went out to the system. When I came to Micron, the only thing I saw were addresses and commands. Read/write command and an address. I had no understanding of — is this thing copying 15 different things to different elements here, or overwriting, or what? Having the company we’ve been working with and acquired in June — that architecture allows us to build these algorithms, run them, and see how the entire impact is.

Above: Micron 7300 SSDs
Our first goal is, what can we do in the memory storage to actually improve the time to a solution? We can always build higher bandwidth, but that may not necessarily be what’s going to get you there. Are there things you can do like scatter tensor arrays? If we could build a buffer that brought in a matrix and allowed us to shift the matrix over in one fell swoop, rather than just having the thing go and search for it, there’s potentially a big benefit there.
Eventually we’re also looking at — most of these are multiply and accumulate architectures, and very simple ones. They’re just replicated thousands of times. You can actually build a pretty good multiply and accumulate in a memory device once the transistors get just a little better. Eventually, can you take that architecture and then put it in a memory device itself? That’s the long-term vision.
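To make the multiply-and-accumulate idea concrete, here is a minimal Python sketch. It is an illustration only, not Micron's or Fwdnxt's design; the names `mac`, `dot`, and `layer` are hypothetical. It shows how a neural network layer reduces to the same simple operation repeated many times, which is why the unit lends itself to replication.

```python
def mac(acc, a, b):
    """One multiply-and-accumulate step: add the product a * b to acc."""
    return acc + a * b


def dot(weights, inputs):
    """A dot product is a chain of MAC steps over paired values."""
    acc = 0
    for w, x in zip(weights, inputs):
        acc = mac(acc, w, x)
    return acc


def layer(weight_rows, inputs):
    """One dot product per output value.

    In hardware each row could go to its own MAC unit and run in parallel;
    this sketch simply runs them one after another.
    """
    return [dot(row, inputs) for row in weight_rows]


print(layer([[1, 2, 3], [4, 5, 6]], [1, 0, 2]))  # prints [7, 16]
```

Each output value depends only on its own row of weights, so the rows can be computed independently of one another.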
What I want to do is, whatever we do, we build a programming infrastructure and a paradigm so that people don’t have to rewrite their code every time they go through a migration. In my mind, that was Intel’s great success. When we did 386, there was no 32-bit software. But it sure ran the 16-bit code really well. People bought it for that. You got a number of platforms out there, and then people said, “OK, now we’ll go and optimize for 32 bits.” When 486 came out six to eight years later, there was software to take advantage of it and it became a machine that never looked back.
Start with memory first, storage first, what we can do there. Then we’ll see what can actually migrate over time. The answer may be nothing. The answer could be everything. I think it’s somewhere in the middle. It just depends on where you move the needle.
VentureBeat: What makes this hard enough that it might be a 10-year project?
Pawlowski: What’s hard is it will be a 10-year project. Coming up with a programming paradigm and the community starts using it, so you can start bringing the programming community along — I don’t know if you know Steve Wallach, [from Tracy Kidder’s book] The Soul of a New Machine, years ago. He just retired, but he was working for me for a while. Every one-on-one we had, he said, “If I ever teach you anything, it’s this. The thing that’s easiest to program is the solution that wins. Every time.” The bottom line is, you have to bring over the programming community. You can’t just go do fancy hardware and leave them behind, because they won’t touch it. That’s the hard problem.
We’re not a software company yet. Intel wasn’t a software company when I started. They’re like Nvidia. They have a number of software engineers that in some cases exceed their hardware engineers. You just don’t see them.

Above: Micron is moving into AI.
VentureBeat: You acquired one piece here with the Fwdnxt folks. Was it a pretty comprehensive piece, or do you need more? Do you still need to find a lot of partnerships?
Pawlowski: We’re going to need a lot of partnerships and data scientists. They have an inference engine architecture they’ve developed over five years, 10 years, 12 years. Different companies and different academic settings. The guy who founded it was a professor at Purdue. They’ve been optimizing that architecture. They have a fairly good compiler that takes an Open Neural Network Exchange [ONNX] frontend and then maps it down to their hardware.
What I need are data scientists. I need applications. I also think we’re going to need a dynamic runtime/scheduler. If you really have this model of — if I wrote a network on hardware today, on an Intel processor, three years from now you could still run that same program. Everything is abstracted through the instruction set. What I want to do here is abstract the network, which means we’re going to need some type of dynamic runtime. That’s going to say, “OK, this thing has 8,000 multiply and accumulate units. This has 1,000. I can spread that thing out a little farther. Oh, these 150 units died. I don’t want to schedule anything on those, but I still want to be able to use the part.”
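The kind of dynamic runtime described here can be sketched in a few lines of Python. This is a toy under stated assumptions, not the scheduler Micron or Fwdnxt uses: the function `schedule`, its parameters, and the round-robin policy are all hypothetical. The point it illustrates is that the same workload description can be spread over however many multiply-and-accumulate units a part has, while skipping units known to be dead.

```python
def schedule(num_tasks, num_units, dead=frozenset()):
    """Assign task ids to healthy unit ids in round-robin order.

    num_units is the total unit count on the part; dead is the set of unit
    ids that failed test (an assumed input, e.g. from a manufacturing map).
    """
    healthy = [u for u in range(num_units) if u not in dead]
    if not healthy:
        raise RuntimeError("no healthy units left to schedule on")
    plan = {u: [] for u in healthy}
    for task in range(num_tasks):
        plan[healthy[task % len(healthy)]].append(task)
    return plan


# Same workload, different parts: the caller never changes the network.
small = schedule(6, 4, dead={1})
print(small)  # prints {0: [0, 3], 2: [1, 4], 3: [2, 5]}

big = schedule(16000, 8000, dead=set(range(150)))
print(len(big))  # prints 7850
```

A real runtime would also weigh data locality and memory bandwidth. Round-robin is used here only because it is the simplest policy that keeps work off the dead units.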
There’s a couple of entities out there that have been looking at solving the dynamic runtime problem that I think is going to be pretty important. Especially — I’ve heard estimates. The guy who used to run Litho at Intel, I ran into him a year ago in the airport. He said that they believe that when they get to sub-5nm, they’re looking at 30% of the devices are going to be out of spec at manufacture.
VentureBeat: You mean defective, or — ?
Pawlowski: Just out of spec. We guardrail the crap out of things, assuming that it’s going to have a seven-year lifetime and things are going to degrade. Well, in this particular case, you can’t even guard that. It’s not working even to the spec within the guardrail.
I found a paper that was done by some people in Brazil that showed that if you have — assume you can do 512 cores. You get 20% degradation. The overall degradation from peak performance is about 4%. A 32-core chip is dead. A 64-core chip, only one of the cores is active. They’re just assuming a random distribution with these values. Having that dynamic runtime for these large-scale applications, if we go to finer geometries than seven, is going to be something that’s equally important.
Memory systems have done redundancy for years. They test. If a block is bad, they’ll swap in a redundant block. If there are more bad blocks than redundant blocks, OK, this becomes a keychain.
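The block-swap scheme described above can be sketched as follows. This is a simplified illustration, not any vendor's actual repair flow; `remap_blocks` and `resolve` are hypothetical names, and spare blocks are assumed to sit at addresses just past the normal ones.

```python
def remap_blocks(num_blocks, bad_blocks, num_spares):
    """Map each bad block to a spare, or return None if spares run out.

    Spares are assumed to occupy ids num_blocks .. num_blocks + num_spares - 1.
    """
    bad = sorted(set(bad_blocks))
    if len(bad) > num_spares:
        return None  # more bad blocks than spares: the part cannot be repaired
    return {block: num_blocks + i for i, block in enumerate(bad)}


def resolve(block, remap):
    """Translate a requested block id to the block that actually serves it."""
    return remap.get(block, block)


remap = remap_blocks(8, [2, 5], num_spares=2)
print(remap)              # prints {2: 8, 5: 9}
print(resolve(2, remap))  # prints 8
print(resolve(3, remap))  # prints 3
print(remap_blocks(8, [1, 2, 3], num_spares=2))  # prints None
```

The `None` result corresponds to the "keychain" outcome: once the bad blocks outnumber the spares, the device cannot be shipped as working memory.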
VentureBeat: Does this signal a lot of competition with the likes of Intel and Nvidia?
Pawlowski: It’s going to be more cooperative. It’s hard to compete with Intel and Nvidia in the datacenter. Nvidia has the training locked up. Even when people come in with new solutions — at least one startup told me that the hyperscalers told them, “It’s so hard to move our training algorithms from the GPU. It’s doing so well. They’re still giving us performance gains. Don’t spend your time on this.” And the last I heard, the last statistic I heard, was a significant portion of the inference was still run on Xeon.
We’ve been focusing — if we’re going to do anything in the datacenter, it’s to help our customers like Nvidia and Intel. But if there’s any innovation that can occur from a memory storage point of view, let’s look at it out on the edge. That’s where we’ll get the greatest efficiencies and economies of scale.

Above: Sanjay Mehrotra, CEO of Micron, at Micron Insights.
VentureBeat: Has the Moore’s law part been OK? Are you on schedule?
Pawlowski: It’s been a challenge, but that hasn’t stopped us from being able to continue to scale. Quite honestly, I had to live the Moore’s law thing forever. Thou shalt not say anything bad about Moore’s law! That was the eleventh commandment. When people ask me — it was the slowing and stopping of Dennard scaling that really forced the innovation. Now, we may not get double the transistors every two years. Maybe every three or four years. But we’ll grow in the third dimension. That really hasn’t stopped us. It’s just a question of what’s the most economical way to do it. Engineers find really creative solutions to hard problems.
VentureBeat: Intel reinforced today that they’re going to do 7nm in 2021 with the graphics chip. They seem to be back on schedule.
Pawlowski: I hope so. When I left, and that was only five years ago — it was amazing how fast that four-year lead evaporated.
VentureBeat: In that sense, it seems like the whole industry is moving in lockstep, then.
Pawlowski: I think the industry is continuing on their treadmill: “We can still see a path to scaling.” Like I say, I don’t know if it’s the aggressive two years of keeping up with Moore’s prediction, but think of where we’ve come in terms of capability because of Moore’s law over the last 40 years. It’s just incredible. I still think there’s scaling to be had. We’ll take advantage of it just like anybody else.
VentureBeat: The Fwdnxt deal, as far as what it gives you — is it more on the software side, or is there chipmaking talent there as well?
Pawlowski: Not really chipmaking talent, no. They have architecture talent. They have hardware architecture talent. They’ve translated FPGA. In terms of being able to take that and make an ASIC and do the frontend and backend, that’s not been their expertise, but now they’re in the place to do that. They bring the software and the architecture, and the architecture of not only the hardware, but the architecture knowledge of convolutional neural networks and how they can — if somebody presents them with a problem, how they can tune that network and then use their data to train it to get the accuracy levels they’re looking for. Once they achieve the kind of accuracy they want, they map that trained algorithm onto the FPGA to do the classification side.
VentureBeat: So it does give you a variety of options as to what you want to do.
Pawlowski: It does. I’m totally looking at it as — we’re learning so much about how these things interact and how these different networks are evolving. The nice thing is, I can put a network with a million parameters on it — I can put a 100-gig network parameter. It runs slower, but I would be able to understand how those large networks are going to evolve and what we would do.
I was on the panel talking about how we’ve been doing some work with CERN. Just what we’ve learned in the prototyping we’ve been doing with them is phenomenal. They’re throwing data at it at such a fast rate, and they need insights so quickly. They’re not about — accuracy is good, but they don’t need something like a cancer patient, where you need 99.999% accuracy. They’re asking, “Is this 70 or 80% good? Is this something interesting? No? Throw it out. We have more stuff coming at us. Eventually we’ll get something that hits that threshold and that’ll be interesting.” They’re getting 40 million collisions a second.
VentureBeat: What’s your general description of the problems this can solve?
Pawlowski: Both of these problems, the health care and CERN, basically they’re taking 2D sensor images and constructing a 3D model. On the CERN one, particles collide and they create a shower of other particles. What they want to do is quickly take the measurements of those particles and say, “Do all the energies add up?” If the energy was X and you get Y, which was less than X, then there’s some energy that wasn’t accounted for, and that’s interesting science, because the law of conservation of energy says nothing should have been created or destroyed. Once they do that, they want to be able to take different images and construct a 3D model of what that decay looked like, because it doesn’t all show in the same 2D image.
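The energy check described here comes down to simple arithmetic, sketched below in Python. It is a toy illustration only: real detector pipelines are far more involved, and the function name `missing_energy`, the `tolerance` parameter, and the numbers are all assumptions made for the example.

```python
def missing_energy(initial_energy, measured_energies, tolerance=0.0):
    """Return the energy not accounted for by the measured particles.

    Units are arbitrary but must match. A deficit at or below tolerance
    (an assumed allowance for measurement error) is reported as 0.0.
    """
    deficit = initial_energy - sum(measured_energies)
    return deficit if deficit > tolerance else 0.0


# Collision energy X = 100.0; the detected particles add up to Y = 95.0.
print(missing_energy(100.0, [40.0, 35.0, 20.0]))  # prints 5.0

# Everything adds up, so there is nothing unexplained to look at.
print(missing_energy(100.0, [60.0, 40.0]))  # prints 0.0
```

A nonzero result is the "interesting" case: some energy left the collision without being measured, which flags the event for a closer look.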

Above: Micron has acquired Fwdnxt to build AI solutions integrated with memory.
Visualizing tumors takes many, many 2D images, X-rays, and whatnot, and creates a 3D volumetric model. We’re using the same 3D convolutional neural network styles. They’re different networks, because they have different inner layers that do different things, but we’re taking those and solving a similar problem of creating a 3D representation.
VentureBeat: I don’t know if you’ve heard of a company called MediView. They come out of the Cleveland Clinic, and they just raised $4.5 million in venture capital. They take an MRI of the patient’s body, where everything is inside, and then they put the data into a Microsoft HoloLens. The doctor can then visualize it all in 3D. You put the scalpel into the patient and he sees it going in through the HoloLens, that it’s going where he wants it to. He’d never otherwise have that view inside. Before, he had to guess based on all these 2D screens he’s looking at.
Pawlowski: That’s fantastic. Years ago, the head of surgery at Oregon Health Sciences University said, “You need to come here.” I was at Intel at the time. They actually scrubbed us up, took us into a double hernia surgery, and he said, “I want to show you how we do surgery now.” They were scoping everything. It wasn’t like this person was entirely laid open, because otherwise I would have not gone into that surgery. We finished and he said, “Now, let me show you how we train our surgeons.” It was like the Stone Age in terms of the tools. This would be a perfect teaching tool.