A $25 million grant will help Stanford Libraries preserve Silicon Valley Archives

Above: The Harold Hohbach and Marilyn Hohbach Foundation donated $25 million to the Stanford Libraries.

Image Credit: Hohbach family

VentureBeat: There’s so much minutiae out there to wade through to find some of the gems. I wonder what you consider to be more valuable?

Lowood: Leslie wrote a small article about the one I’m about to mention. We’ve gotten giant collections and small collections. We got the Apple collection in 1997. We got a lot of publicity around that. People contacted us who had kept things. After we got the big collection from Apple, we got smaller collections from people who’d been associated with them. The smallest collection we got — it fits in one folder — consists entirely of the Apple I manual and two small four-by-six tear-off sheets with notes on them. The notes were from the owners of this business.

What had happened was, Steve Jobs knew that he needed to print a manual and do some other things around the Apple I. It was some advertising, the one-sheets and things like that. He talked to Regis McKenna, and Regis said, “I know a guy who can help you print those things up. Send them to this business.” In walks Steve Jobs, who’s around 20 years old at this time (barefoot, not the best hygiene probably, all of that), and he tells the guy he wants him to do this printing job. “We’re building computers in my parents’ garage.”

The guy behind the desk there is probably wondering what’s going on here. He writes a couple of pages of notes for his partner to describe this meeting. Things like, “These guys seem flaky. Look out.” It’s a really great document. It takes just two minutes to read it, this guy’s scrawled notes about the meeting. The beauty of it is he kept stuff. Talking about minutiae, how often does a business transaction like that, the notes from it, result in something that would be interesting to historians decades later? Fortunately, for some reason, he kept them.

When we got the Apple collection he gave us these materials, this one folder. It’s wonderful, because this is probably the most frequently paged thing for classes on Silicon Valley or the history of information technology. We always bring it out whenever students come into special collections to look at the archival resources. It just epitomizes how a historical document becomes a kind of time machine. You have your insight into Steve Jobs before he became Steve Jobs, from someone who had no clue who this guy was. It’s minutiae, but it’s transformed by — those people in the room at the time had no idea what was going to happen. Now we can enjoy it and use it.

It’s tricky. It’s hard to know. Experience helps you a lot. Doing historical work helps you a lot in figuring out what kinds of things are useful. You know this from your books. You’ve probably looked at things where, at the time they were produced, nobody had any idea a writer would be using them later to write about the Xbox or something like that. But people do keep that stuff. We exist to preserve it, so that writers, historians, and others can use it.

VentureBeat: How do you handle things like — I don’t know what you’d call them, but sensitive documents, things that people at some point considered private, but maybe historically now are very interesting? Some people might give you the whole pile and not realize there’s some private stuff in there.

Lowood: That’s always been a concern, but it’s become a bigger problem with the shift to digital. When we used to get boxes of documents, exclusively boxes of documents, you could go through them and review them pretty quickly. A folder might make you curious if it said “Trust documents” or something like that. Or something about somebody’s behavior issue. You’d immediately think it might be something personal. In our agreements with people, we have terms in there to raise this issue, make sure that people have reviewed what they give us; we also reserve the right to review a collection for private materials and then come back to them.

The problem with digital is, if you think about what’s on your computers and in your email and all of that, it would take you forever to go through everything. More than likely, and this is what generally happens, people just give us everything, without the review that we would have expected with paper materials.

I mentioned that email program we developed here. The main driver for that initially was to be able to find personal information in documents. We were spinning off another project we’d done here where we incorporated forensic technology into the library, which does things like that. It tells you where there are credit card numbers and Social Security numbers. It uses keywords to find potentially personal information. We applied that in the way that we initially designed this email program and in how we give researchers access to people’s email without showing them everything initially. We just show them things like headers and entities we’ve extracted, like companies and universities. That filters access to those collections a bit.
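
To make the kind of scan described above concrete, here is a minimal sketch in Python. It is not the library's actual tool. The patterns, the keyword list, and the function names (`luhn_ok`, `flag_personal_info`) are all assumptions made for illustration, and a real forensic scanner would use far more thorough rules.

```python
import re

# Illustrative patterns only (assumptions, not the library's real rules).
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
# Hypothetical keyword list hinting at personal material.
KEYWORDS = ("password", "diagnosis", "salary")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def flag_personal_info(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs for potentially personal data."""
    findings = []
    for m in SSN_RE.finditer(text):
        findings.append(("ssn", m.group()))
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        # The checksum filters out most digit runs that are not card numbers.
        if luhn_ok(digits):
            findings.append(("card", m.group()))
    lowered = text.lower()
    for kw in KEYWORDS:
        if kw in lowered:
            findings.append(("keyword", kw))
    return findings


if __name__ == "__main__":
    sample = "SSN 123-45-6789, card 4111 1111 1111 1111, password reset"
    for kind, value in flag_personal_info(sample):
        print(kind, value)
```

A scan like this only flags candidates. Someone still has to review the flagged messages before deciding what researchers can see.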

It’s one of our biggest concerns with digital materials, not exposing identity information or personal information because people haven’t had the time to review what they give us. I’d say we’re not 100 percent where we want to be in terms of our solution, but it’s something we’re very attentive to. We’re actively developing better approaches to being able to identify personal information in collections.

VentureBeat: It looks like you might need some AI to help you out here.

Lowood: We have an AI position in the library. It’s for things like that, of course, but also — the same problem comes up when, say, we get a collection of 100,000 digital photographs. Just what’s on somebody’s hard drive. We’re also looking at AI to help us with description of images, and other things too. Our AI researcher isn’t in the CS department, but they’re here working in the library. In fact, there’s a series of talks running right now around AI that’s sponsored by the library.

Above: Second Life

Image Credit: Linden Lab

VentureBeat: If somebody wanted to archive something like — here’s my home in Second Life, say. Is that possible? Is that somehow capturable?

Lowood: On the Preserving Virtual Worlds project, we spent a couple of years investigating that one. Generally, there are things we could do. Second Life was a particularly difficult problem, and the reason was a good thing they did. If you recall, in Second Life you have IP rights over the things you create. That means Linden Lab couldn’t give us permission to copy things. We had to develop a methodology, which we did, for capturing everything on an island, so we could show where people lived in Second Life.

The thing is, when you do that, even on our own island, the Stanford Libraries island, a huge percentage of the stuff is primitives that were developed by somebody else. If you want a table, you get a table that somebody else has created, or that they’re even selling in Second Life for a small amount. It turns out that so many of the objects on your island aren’t owned by you. That’s in the metadata for those objects. Now you have the issue of needing to get permission to extract those things. In a virtual world like Second Life that’s built on anonymity, how do we get in touch with a real person?

We developed a method of contacting the account listed in the metadata and directing them to a form on the web that we created. We got about a 10 or 15 percent response rate. Many accounts were dormant. People thought we were hackers and we wanted to copy their stuff. This is all written up. The report is online, from about eight or 10 years ago now, about what we did. We had full cooperation from Linden Lab, but there just wasn’t any easy way around the fact that the creators of these objects in Second Life were the IP owners. We weren’t ready to grab things without permission.

Above: The Matrix

Image Credit: via Warner Bros.

VentureBeat: I guess you have to prepare for the arrival of the Matrix. The metaverse is coming at some point. How are you going to archive that?

Lowood: There’s a collection at the Internet Archive that has some of what we’ve done, which is video recordings. We’ve done that with Second Life. It’s funny that you mention the Matrix, because when The Matrix Online, the virtual world that they made, went under, we got videos and photos of that particular world in its last phases. We’ve done that with a couple of games. The Sims Online, which became EA-Land, is another world we did that with. We did a video capture of the last hour of EA-Land, which has actually become a really popular document for people to look at.

Video is my answer, then, just like in the real world. Screenshots, video captures, tours, things like that. It’s a very difficult thing to document a world in its entirety, and the same is true of virtual worlds as well.
