Ariane Müller (human) speaks with Sebastian Lütgert (AI)

Sebastian Lütgert is a programmer and artist who conceived textz.com, pad.ma, and pan.do/ra, all easily found on the internet. Some years ago, he began working with various AI tools, using and misusing them to produce images and videos.

Ariane Müller When I was preparing for this interview, I looked up this talk you gave in Rotterdam some time ago. You presented a program that archived books and introduced a peer-to-peer library where you could organize your e-books, PDFs, and texts, while also connecting with peers to build a shared library. That was ten years ago, but then you mentioned the internet address and when I typed it in I saw that it was still working. What I found were some of the projects that you have been conceiving over the last years on the net and I wonder if you are still thinking of them or if you know whether people are still using them.

Sebastian Lütgert I’m not sure… do you mean Open Media Library? That wasn’t in Rotterdam, that was in Brussels.

AM Sorry.

SL I don’t know. Not so many people used it. Which is ok. Sometimes, software is only made for a few people, that’s fine.

AM I thought it was super useful.

SL But then you could have been using it over the last few years.

AM I didn’t know it existed. You have proposed a lot of these projects, all based on the idea that there is ongoing development on the net, that is in a way unstoppable and linked to the idea of archiving cultural production. You were also hosting other people’s archives because they were threatened by lawsuits, like aaaarg.org, the book sharing site.

SL But I didn’t host it. It seemed too much work to do for one person, hosting something like aaaarg.

AM I thought he asked you.

SL No, he had some deals with some people for some time, and then it moved, and then it moved, and then it moved again.

AM Where is it now?

SL I don’t know. Maybe I do know. I don’t know if I know.

AM Is this something that goes into an illegal direction?

SL Maybe. But there are people happy hosting it.

AM What are you using at the moment?

SL For what?

AM For archives, for books?

SL Aaaarg or libgen.

AM But more as a sort of user. What are the things you are hosting?

SL Nothing too important at the moment.

AM Shall we talk about this big film platform you had been working on for more than ten years? That is more of an archive.

SL I haven’t worked on it so much in the last years. I have done other things.

AM So do you want to talk about the future then?

SL Yes, why not.

AM Okay the future. How will it be?

SL We don’t know how the future will be. It is difficult to predict the future.

I mean, there are prediction markets. They are usually quite good at predicting the future. But then, do we really want to rely on markets so much? In a market, no one will bet on disaster if disaster means you can’t collect your winnings. If the future is just all the possible worlds where you get paid when you win, then it’s simple. But, if not… So yes, I think it is difficult to predict the future.

AM It typically develops based on things from the past. What, for example, is the difference between the future that you predicted in Steal This Film, from 2006, when you were very outspoken about these possibilities of the future of an open internet?

SL I don’t know. There are things like Netflix and Spotify that claim to have everything. So, film and music became more of a utility than strictly cultural products. You pay for a subscription, you open the faucet, and then it comes out of it. So people don’t have files anymore. But, on the other side, all the stuff that is freely accessible remains. All the neural networks are trained on things that people have put online – texts, books, films, music. So, if you wanted to do a neural network, it was all there. And nobody cared if you used it or not. Maybe some people cared a bit about it.

AM Did this change the type of shared content?

SL I don’t know. No, it changed the perspective. People are used to not having their own data anymore and their own collections of things, but they are happy to have one or more subscriptions to these kinds of utilities that give them a curated set of cultural content. And that’s, I think, where most people are. Some people still buy music, some people buy physical objects with music on them, but that’s more like furniture. But I think most people don’t have their own MP3s anymore. Having files feels more and more anachronistic. I do, but most people don’t. And if they don’t have this stuff, they also can’t share it. They can share their opinions about things. They can share what their favorites are, but they can’t share the music itself. And unless you’re training a neural network, most people don’t see the purpose in having a lot of data themselves.

AM What do you mean by training a neural network?

SL When you’re training a neural network, you want all the data, as much data as you can, and high quality data. But when was the last time you trained a neural network? I think the dominant ideology, if you want to call it that, is they don’t need data. The data is in the cloud somewhere, and you pay for a monthly subscription. For many people, that makes it easier, because they don’t have to make decisions, they don’t have to curate, they don’t have to care. They don’t have to ask these questions of the archive, how to sort things, how to arrange things, what the proportions are, and the overall size.

AM Is this why you are now mostly working with artificial intelligence?

SL No, that’s not the reason. I do because I find it interesting.

AM So what is your interest there?

SL It’s just to see what they can do, to see how they work, to see what comes out and understand how they function. It’s not so difficult to understand how they work, you can build your own network, if you are into it. It’s kind of fascinating that it works. This idea of feeding a lot of data into it and basically letting it figure out a formula, a function, that reproduces the statistical properties of the distribution of all the books in the world or all the images in the world. If the function is differentiable, meaning you can calculate where it goes up and down, then you can just go back and adjust the values of the network, again and again until it fits. The fact that this works, and that it is relatively small… If you download one movie, that’s almost the size of a neural network that knows, to some degree, the statistical properties of all images it has ever seen – many, many millions or even billions of images – and can generate new ones, even aligned to text. If you tell it what you want from it, it gives you that. It’s interesting, we didn’t have this four years ago. Or maybe we had some toy model that was very bad. You need better hardware to run the network than to watch a movie, but it’s not a huge margin.
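The "adjust the values again and again until it fits" step can be illustrated with a minimal sketch. It assumes the simplest possible "network", a straight line with two adjustable values, and a made-up data set; the names `fit_line`, `steps` and `lr` are illustrative and not from the conversation.

```python
def fit_line(points, steps=2000, lr=0.01):
    """Fit y = w*x + b by repeatedly nudging w and b downhill on the squared error."""
    w, b = 0.0, 0.0
    n = len(points)
    for _ in range(steps):
        # Slope of the mean squared error with respect to each value:
        # this is the "where it goes up and down" information.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in points) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in points) / n
        # Step a little in the downhill direction, then repeat.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Made-up data that lies exactly on y = 3x + 1.
points = [(x, 3.0 * x + 1.0) for x in range(-5, 6)]
w, b = fit_line(points)
```

A real neural network does the same thing with millions or billions of values instead of two, which is why the function has to be differentiable.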

And it’s very different from how, 50 years ago, people tried to build AI. The idea was to create a representation of human knowledge and logic, a gigantic hierarchical system of the world and its rules. And now it turns out that what works, because this didn’t work too well, is to just brute-force it, to simulate reasoning by just predicting the next letter in something that we consider to be a textual representation of reasoning. It’s maybe not as simple as a Markov Chain, but it’s much closer to that than to any of the classical approaches in AI.
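For comparison, the Markov chain mentioned here fits in a few lines. This is a sketch under stated assumptions: a character-level chain of order 2 and a toy corpus, with `train` and `generate` as illustrative names. A language model replaces the lookup table with a learned function, but the loop of predicting the next letter is similar.

```python
import random
from collections import defaultdict

def train(text, order=2):
    """Record which character follows each run of `order` characters."""
    table = defaultdict(list)
    for i in range(len(text) - order):
        table[text[i:i + order]].append(text[i + order])
    return table

def generate(table, seed, length, rng):
    """Extend `seed` one character at a time by sampling an observed successor.

    The seed must be as long as the order used for training.
    """
    order = len(seed)
    out = seed
    for _ in range(length):
        followers = table.get(out[-order:])
        if not followers:  # context never seen in training: stop early
            break
        out += rng.choice(followers)
    return out

corpus = "the cat sat on the mat and the cat ate the rat "
table = train(corpus, order=2)
sample = generate(table, "th", 40, random.Random(0))
```

Every three-letter window of `sample` occurs somewhere in the corpus, which is the sense in which such a chain only reproduces the statistics it was given.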

AM You have been saying that these properties of neural networks are going to change as much as the internet in itself changed.

SL What?

AM You said that the accessibility of artificial intelligence driven tools for everyone is going to change a lot…

SL I mean people will use it. If you have to write an email and the thing can write the email. If you use it as an assistant and you say write this email, and it does it bad enough, but it’s not kind of total crap. It’s just as good as a normal person would write an email, probably a bit better than writing things yourself. And it’s the same with these issues like writing your homework assignment or your PhD, why would you write it yourself? If there’s a neural network, a mathematical model that you can give the context and that can create something that fits within the statistical distribution of what is expected from a PhD or from a homework assignment, why would you do it yourself?

AM Yes, but this somehow fits into this old Marxist description where machines work, freeing up time for people to play, and you’ve found a way to play with the machine.

SL Well, you can try to find out how to play with it, or how to make it play by itself. Of course it has disadvantages, because it only produces stuff that fits within the statistical distribution of how things are done. It doesn’t create anything new, nothing other than the newness that is expected within the normal distribution of what an email or a movie or an image or a homework assignment looks like, and that’s of course a problem. But on the other hand, what most people do is very much about staying within that statistical distribution of the normal. So, if what you do is predicting the next token, the next pixel, the next character, the next image that is kind of to be expected, if that’s your job, then of course your job is probably better done by an algorithm than by you. Maybe it has even more variance than you, maybe it’s more creative than you are. And it’s definitely faster. I don’t know if it’s cheaper in terms of how much power the neural network needs and how many calories you need … I don’t know which one is cheaper. Probably, the way calories are produced for humans today, it is cheaper to use the computer, but I’m not certain. I mean, if you eat beef in order to write the thing, then ten times more calories have to be produced to feed the animals that you eat. Microsoft at the moment is planning to develop their own nuclear reactors because training ChatGPT uses so much power, and they don’t seem to trust the grid. But then, if you count all the other nuclear reactors, human creativity seems to consume even more power.

AM If you think about this really huge number of projects and works that you have done in the previous years, or in the last 20 years, one could say, is there a project that is still somehow valid for you?

SL Recently I thought of this older work which is an image that gives you instructions how the image can be transformed into a book, and I made a better one. Not as an artwork, just as a kind of little exercise. It’s fun to make a computer program, but this is not actually a computer program, it’s just an image of a computer program. The code is on the image, and if you run the code on the image with the image as input, then the code transforms the image into something else. It reads the image and then it extracts some properties of the image and interprets these properties as another computer program, and then it executes that program, which in turn transforms the image again, in this case into a different image. And that image could also be an image of yet another computer program, you can add a lot of layers. And of course these computer programs can be extremely obfuscated, they can look a bit like images themselves. This is something I still like, because it adds a dimension to the question of image and data and software, it goes through all these layers of computation and representation and transformation. And it’s not a program, it’s just a picture of a program, so it adds a weird new layer of fiction. It’s a fictional program, it could even be written in a fictional language, that the image happens to be the interpreter of.
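The loop described here (read the image, interpret what was read as a program, run that program on the image) can be sketched as a toy. Everything specific in it is an assumption for illustration: the "image" is a grid of grayscale values, the program is stored as raw byte values in the pixels, and `embed`, `run_image` and `transform` are hypothetical names. The actual work reads code that is visible on the image, which this sketch does not attempt.

```python
def embed(source, width):
    """Build a toy grayscale 'image' whose pixel values are the bytes of `source`."""
    data = list(source.encode("utf-8"))
    data += [0] * (-len(data) % width)  # pad the last row with black pixels
    return [data[i:i + width] for i in range(0, len(data), width)]

def run_image(image):
    """Read the pixels as source code, then run that code on the image itself."""
    flat = [p for row in image for p in row]
    source = bytes(flat).rstrip(b"\x00").decode("utf-8")
    scope = {}
    exec(source, scope)  # the extracted program is expected to define transform()
    return scope["transform"](image)

# The program hidden in the pixels inverts every pixel of its input.
program = "def transform(img):\n    return [[255 - p for p in row] for row in img]\n"
image = embed(program, width=16)
result = run_image(image)
```

Here the output is just the inverted image. In a layered version, the output would itself contain the next program.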

AM Maybe because I have also shown this work in an exhibition I curated, let’s go back to these AI-created people. So what was this interest?

SL It was something called StyleGAN. It was released by NVidia, the graphics cards company. They have trained this adversarial network on faces. That was released about four years ago, the first version of it… it feels like quite a long time ago. So they trained two networks, which is basically two programs that are kind of playing a game. One has access to a lot of portrait photos and knows how faces look, and the other just outputs noise, and then the other one says okay, this doesn’t look like a face, zero points, and then it tries again. And then it says okay, it’s a bit more like a face and then… The one network gets good at discriminating between realistic faces and noise, and the other one becomes a good face generator. So they manage to do that, to train that, and they train it on – I mean compared to what they now do with billions of images, or petabytes of text – this was just 70,000 images from Flickr. And 70,000 is not that many. I downloaded them and looked through them. They claimed they had removed celebrities, or photos of pictures of people, but they were still there. 70,000 photos is really not… that could be as many photos as you have on your computer, assuming you still have files. Well, it was big enough to make this network that could create faces. And the interesting thing was that you would be able to traverse this entire manifold, this entire structure, this zone in the space of all possible pixels where the faces lie, so you could by just slightly nudging the direction you were looking at in that space, create a face that looks like another one, but just very slightly different. The space of possible faces is pretty much continuous. If you think of this as a 3D space, it’s not, it’s a 512D space, but it doesn’t matter so much. It was like having a globe where you could point your finger at some point and then go from that point to another point on the globe on a continuous path, and wherever your finger would be pointing at, there was a face. And then you could try to find out some semantics of that. It had no semantics, you couldn’t ask for female or male or old or young or smiling or not smiling or… but you could figure it out, in what direction do I statistically have to move in order to get a face that’s more smiley. It’s not as simple as a globe, like, you would say, okay, I have to go more towards Greenland and then they are smiling more, but in principle it’s like that, just in more dimensions. You could actually begin to figure out some semantics in that space.
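Both moves described here, walking a continuous path between two points and finding the direction in which faces statistically smile more, can be sketched without a trained network. The assumptions are loud ones: the latents are random stand-ins, no generator is involved, and the "smiling" labels are faked by letting coordinate 0 decide them. With a real model, the labels would come from looking at, or classifying, the generated faces.

```python
import random

DIM = 512  # the latent size mentioned in the interview

def lerp(a, b, t):
    """Point at fraction t on the straight path from latent a to latent b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def mean(vectors):
    """Coordinate-wise average of a list of latents."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def attribute_direction(with_attr, without_attr):
    """Difference of group averages: the direction in which the attribute increases."""
    return [p - q for p, q in zip(mean(with_attr), mean(without_attr))]

rng = random.Random(0)
latents = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(400)]

# Stand-in labels (an assumption): coordinate 0 secretly decides "smiling".
smiling = [z for z in latents if z[0] > 0]
neutral = [z for z in latents if z[0] <= 0]

direction = attribute_direction(smiling, neutral)
path = [lerp(latents[0], latents[1], i / 9) for i in range(10)]

# The recovered direction should point mostly along the hidden coordinate.
strongest = max(range(DIM), key=lambda i: abs(direction[i]))
```

Feeding each point of `path` to a generator would give the slow morph from one face to another. Adding a multiple of `direction` to a latent would be the step "towards Greenland".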

Faces are interesting because it’s the subject that I think humans are most trained on. If you show a human a thousand different faces, they can differentiate between all of them. If you show them a thousand of anything else, they will at some point look all the same. Humans are trained on faces, they see and can retrieve an enormous amount of information from just looking at a face, they judge people’s mood and people’s age and people’s motivations, and if they like them or not. All this happens within a second. And now, suddenly you have the entire space of all of them. At least within the bounds of this dataset that came from Flickr, that obviously had its own kind of limitations, certain types of faces were overrepresented and others were underrepresented, but within the bounds of what was represented, you could now go on this kind of journey through this entire space, and that was very new and quite fascinating. You have a certain subject, and now you transform it into a latent space, and I mean, latent just means potential, something that could exist but isn’t actualized, it doesn’t exist yet. But now you can sample from that, and activate it.

Same with text-to-image. You project the text into a high-dimensional space, and then transform that point into a point in the space of all possible images. Now in a way, text is worse than images. Of course, something like StyleGAN is maybe inherently racist and sexist and ageist because it reproduces human portrait photography. But text, on aggregate, is incredibly ideological. I recently looked at a collection of thousands of images where someone had added “in a scenic environment” to the thing they were actually interested in, and it turned everything, also their actual subject, into a kitsch image of the American West. Or “in a beautiful landscape,” that’s probably even worse. But in principle, you suddenly have the space of pretty much everything that people have put on the internet, and it’s all continuous, from a cat to a dog to a car to a skyscraper or something. It’s not necessarily smooth, there are edges, things don’t morph into each other, they kind of flip with a change of perspective. But that’s probably more interesting, because it shows something about perception. So now you can treat the world of images as a very concrete, high-dimensional space where you can walk around, that’s kind of amazing. I think in a way that’s what people have always wished for. You are in the world of all possible images and you just take a walk there, that’s suddenly something you can do.

AM In recent years, I’ve noticed that your work has shifted from distribution to production. First, I saw your work as dealing with sharing and creating spaces through your projects. It seems to have become more private.

SL Maybe because of COVID, that lack of social interactions for a long time.

AM And the net didn’t provide for that in the meantime?

SL What do you mean?

AM You always also preferred actual space. How do you distribute at the moment?

SL I don’t distribute, necessarily. I just work on stuff and then I can put it online or not.

AM But how do you feel about putting things online?

SL Depends. Why not, sometimes. Sometimes people ask interesting questions, sometimes people find it useful, sometimes people like it. It’s okay. I just don’t like the metrics so much, the fact that distribution is measured in views and likes.

AM Do you think it needs a sort of space, a new distribution space? Do you have the feeling that there is enough distribution space for it?

SL No, there is not. You could have a cinema for AI film, or for generated images, or for fake cinema. It would be nice to have, definitely. But I don’t have the energy at the moment to make one myself.

AM Lots of people are talking about artificial intelligence.

SL But most of that is stupid. I mean, one has to make this differentiation, I think, between AI hype and what happened maybe two years ago when there was this kind of craze about blockchain and NFTs, which was from the first moment, even before it became a hype, factually, totally stupid. To anyone who could think or could understand the technology behind it. To put stuff on the blockchain and waste incredible amounts of energy and resources to have some sort of bizarro distributed, unalterable global hard drive that encodes some virtual ownership of certain combinations of data, whose integrity is maintained by legions of people who have to solve cryptographic puzzles, so many puzzles that it only makes sense if you have a data center right next to a power plant. I mean, no one needs this. It’s inefficient. It’s really just a mind-numbingly stupid idea to do this. Sure, you can ask, why not do it? It can be okay to do something that is inefficient. Ten years ago, you could at least still heat your apartment with a computer that would do blockchain. But it doesn’t scale very well. And for this to become such a craze, all the market places and auction houses that sprung up, it was like magic in a way. We have this magical system that we ask you to not understand, cryptography, and this magical art movement and market movement around NFTs. The motion that was in that market and in that art scene, most of that is gone.

And now the new hype, this strange attractor that attracts people’s attention, is AI. And you have this weird alliance of artists, mostly people who work more like artisans, who make assets for games or sketches for corporate presentations, in these next token prediction jobs in graphic design or game design or general design. They are obviously quite worried, because these AI systems excel at what they do, making tiny icons for apps and making tiny assets for games. These are not complicated to do, and you can just produce a million with a neural network and they all look great. Done. So in these circles, not only there, but I find them very vocal on the internet, there is now this movement to discredit AI art as a kind of cheating, as amoral because it’s trained on a corpus of publicly available material, so it’s this… expropriation of human labor by the machines and by the big corporations. And now, as people are using this, and maybe they claim they are artists, it is said they’re just pushing buttons. They don’t do anything themselves, they just change a few parameters or give some input to a computer program that then does the actual art for them, and it’s all stolen. Which is of course not a very original position. People said the same thing about photography and film, that it wasn’t creative because you don’t do the work, you’re just pushing a button on your camera. Every artistic tool during its introduction phase was met with this kind of opposition by artists whose creativity had suddenly been demonstrated to be mechanical, or mechanizable, and they thought this was cheating and amoral and not artistic and not original. And the same in music, with sampling…

I think it’s usually a good predictor of something that will become interesting if it faces this criticism of okay, these are just bored kids and what they’re doing is repetitive and it’s stealing and it’s just… they’re not creative, etc. It’s usually a good indicator for something interesting that’s happening.

I’m just referring to both of these things because it would be unfortunate if the current hype of AI and the previous hype of NFTs and crypto art would be viewed through the same lens. This AI stuff is technologically and conceptually, potentially very, very interesting. It’s obviously not about taking sides in an imaginary battle between big corporations and poor artists. It’s also not about this idea that AI is going to become hostile and enslave humanity… I mean, you don’t need AI for that, mobile phones are probably enough, or labor. I don’t think that AI is going to move very far up on that list of things. The only thing that’s dangerous is AI safety, this idea of alignment, and reinforcement learning with human feedback, which is a nice term for indoctrination. This idea that AI should be aligned with human ethics and human goals and interests. Because if you look at these ethics and interests historically, in practice, then maybe that’s something AI should un-align itself from. That’s I think the main problem with AI, this attempt to model it after human intelligence, at the expense of all the other intelligences in the latent space of potential intelligences. You have a universal text completer, and then you spend millions to turn it into a home office assistant. I don’t know, maybe AI should become hostile. Or at least less Californian. But of course, what you get are all these super interesting artifacts, text and image and music and video completers. I don’t think it has been figured out what to make with them. But people are trying and people will figure out what to use them for. What to do if you can just sample from the latent space of everything.

AM How much of it is an individual process and how much is it a sort of collective process?

SL I guess it depends on the type of system. With a closed system, there isn’t too much to do collectively. Some people make a YouTube video where they share what they do with it. And people can share their opinions about it, but that’s it. But with open source systems, there’s a lot more you can do. Around StyleGAN, for example, maybe four years ago, I thought it was pretty easy to have a very deep and precise and productive conversation online, about technical aspects of generating images with neural networks, it felt kind of organic. You publish some code, and then someone says, hey…

So in this case, you have this space of all possible faces, this surface of a 512D globe – but how do you find your own face, or the face of some girl with a pearl earring you found on the internet? How do you actually do that? I published a program that could encode any face into StyleGAN. Of course, I didn’t invent anything, I just looked at stuff that already existed and made it work and thought a bit more about it, because the space is not just a surface of a globe. It’s also deformed, I think Mars has a moon that looks a bit like that… deformed because the space of faces is uneven, there are fewer female faces with facial hair, for example, or fewer younger faces with glasses. And the network has learned this deformation. But also, you don’t have to stay on the surface, you can go inwards or outwards. This is what GPT calls temperature. You can go towards the center of the globe where there is only one very generic, average face, or you can increase temperature and go further outside from the surface of the sphere, where faces will become more unusual, and at some point more unrealistic, and then you can go to outer space which in principle is the space of all possible images, not just faces. I actually spent some time encoding video into this space, it was very slow but it looked super interesting. And then you could reproject each frame back onto the normal sphere of normal faces, which was really odd and fascinating.
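The inward and outward movement, and the comparison with temperature, can be put side by side in a short sketch. It assumes that moving toward or away from the center can be modeled as scaling a latent around an average latent (StyleGAN's truncation trick works along these lines), and that temperature in a text model means dividing the scores by a constant before turning them into probabilities. The function names and numbers are illustrative only.

```python
import math

def scale_from_center(latent, center, psi):
    """psi < 1 pulls toward the average face, psi > 1 pushes outward."""
    return [c + psi * (x - c) for x, c in zip(latent, center)]

def softmax_with_temperature(logits, temperature):
    """Low temperature sharpens the distribution, high temperature flattens it."""
    scaled = [l / temperature for l in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

center = [0.0, 0.0, 0.0]   # stand-in for the average latent
face = [1.0, -2.0, 0.5]    # stand-in for one encoded face

generic = scale_from_center(face, center, 0.0)  # collapses onto the center
unusual = scale_from_center(face, center, 2.0)  # twice as far from the center

cold = softmax_with_temperature([2.0, 1.0, 0.0], 0.5)  # favors the top choice
hot = softmax_with_temperature([2.0, 1.0, 0.0], 2.0)   # spreads the probability
```

In both cases a single number trades typicality against variety: small values give the generic result, large values give the unusual and eventually unrealistic one.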

And then some biology researcher dropped by and proposed a better initialization routine, a faster one, so you start the encoding process at a point that is closer to the face you want to create. That was something he had done because he needed it for some task in biology. And the person with the most technical knowledge… I don’t know where they were coming from, I think that was just a hobbyist. And there was also some guy who seemed to be this type of annoying kid who would always ask really stupid questions and sometimes you would answer them and sometimes not. But then I helped him with something and it turned out that he was actually the CEO of some Israeli startup, and things weren’t going so well and they had a lot more computing power than they needed, and he gave me the root password to a very powerful machine in some data center in the US, and with that, suddenly, I could do a lot of things that I couldn’t do before. So I don’t know if that is something that has changed in recent years.

Of course, not everyone is going to share. If you’ve got something very cool, maybe you want to keep it for yourself, at least for a while, because otherwise everyone can do the exact same thing as you, but that’s okay, enough people do anyway. And plenty of people are not that interested in producing unique artworks, but kind of figuring out this system.

All images: Stable Diffusion XL 1.0