WinterWerk

Bruce Sterling on the Art of Text-to-Image Generative AI

Bruce Sterling (Chairman Bruce) gave, as always, a compelling lecture at AI for All, a conference in Eindhoven.

my notes follow the video.

what if a computer started writing your novels?
but we really like computers
we had no fear of them
why don’t we just imitate a computer

Dartmouth conference in 1956
thinking machines
take it seriously
invent machines that can really think
imperialistic war on metaphysics, psychology
are there rules about intelligence?

and then, these are the visual guys
these are actual text-to-image generation outfits here
platforms, companies, start-ups, most of them young,
some of them younger than this February
but just like a small army of these guys
coming out of the lab and schools
in the garages
dropping out of companies
scaring up venture capital
It’s a wild scene

look them up on the Internet
chase them down
follow them on social media
read their white papers

so what are they doing?

i’m going to talk about artwork

because I have a problem here

you could do fantastic unearthly landscapes that look like black and white photography

fantasy landscapes
odd-looking 3D gamer set stuff
just put in the word “utopia”
it will build you utopias
no two will be the same

there is just an endless supply
basically infinite
i mean, it’s not infinite because all these images are JPEGs
no, not really infinite
paintings are not really photographs
they’re conjugations of JPEGs
so what you’re seeing is like 256 by 256 JPEGs
and, you know
statistically
there’s only so many ways you could vary the color
in a grid of 256 by 256
but these systems know how to do that
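
A back-of-the-envelope check of the "not really infinite" point, as a minimal sketch. It assumes a raw 256 by 256 grid with three 8-bit colour channels per pixel and ignores JPEG compression entirely; the talk gives neither the channel count nor the bit depth, so both are assumptions.

```python
import math

WIDTH = HEIGHT = 256      # grid size mentioned in the talk
CHANNELS = 3              # assumption: red, green, blue
BITS_PER_CHANNEL = 8      # assumption: 8-bit colour depth

# Every bit of the pixel grid can be 0 or 1 independently.
bits_per_image = WIDTH * HEIGHT * CHANNELS * BITS_PER_CHANNEL

# The count of distinct images is 2 ** bits_per_image. Rather than print
# that number, work out how many decimal digits it has.
decimal_digits = math.floor(bits_per_image * math.log10(2)) + 1

print(f"bits per image: {bits_per_image}")
print(f"distinct images: a number with {decimal_digits} decimal digits")
```

Under these assumptions the count is finite but has several hundred thousand decimal digits, which is why "basically infinite" is a fair shorthand.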

Refik
he will give you all the artificial intelligence that you can eat
on time and under budget
and the public loves it flat out

let’s explain how all this works technically

what you see here is a Google Tensor Processing Unit (TPU) chip
it happens to be version three
already obsolete

what I don’t have here are nice, homely literary metaphors like ‘cyberspace’

it’s like you got all these wires
and all these protocols
and all these messages flying around at random
from node to node
you can understand the routing systems
and like the naming system and so forth
or just say ‘cyberspace’
it’s a metaphor because there really is no cyberspace
all there is are wires
storage units
built on top of web browsers
colossal stacks interacting

so, what’s a generator? How does it generate?

the mechanics

pretty refined
but most of it’s rubbish
nonsense
like throwing things out the window
like an impatient editor getting rid of bad paintings
like an angry gallerist
statistically comparing images to a database it has of successful paintings

this one’s obviously chaos
that one might pass
then it’ll select a few
out of a great many which have been generated
it’ll actually edit it down to just a few and sort of print them
or turn them into actual JPEGs
present them to the viewer
it is astonishingly complicated
amazing that such a whopper-jawed thing
works at all
where it came from is not text-to-image generators
but image-to-text recognition
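
The generate-a-lot, throw-most-away loop described above can be sketched as a toy. This is not how any real generator is implemented: the "images" are short lists of random numbers, and `generate_candidates`, `similarity`, `select_best` and the `reference` vector standing in for the "database of successful paintings" are all hypothetical names made up for illustration.

```python
import random

def generate_candidates(n, size, rng):
    """Produce n random 'images' as flat lists of pixel values in [0, 1]."""
    return [[rng.random() for _ in range(size)] for _ in range(n)]

def similarity(image, reference):
    """Higher is better: negative mean squared difference from the reference."""
    return -sum((a - b) ** 2 for a, b in zip(image, reference)) / len(reference)

def select_best(candidates, reference, keep):
    """Rank all candidates against the reference and keep only the top few."""
    ranked = sorted(candidates, key=lambda img: similarity(img, reference), reverse=True)
    return ranked[:keep]

rng = random.Random(0)               # fixed seed so the run is repeatable
reference = [0.5] * 16               # hypothetical stand-in for 'successful paintings'
candidates = generate_candidates(100, 16, rng)
best = select_best(candidates, reference, keep=3)

print(f"generated {len(candidates)}, kept {len(best)}")
```

Most of the hundred candidates are "obviously chaos" by this score; only the three closest to the reference are presented.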

what happened here?

they were looking for your face
or tried to identify consumer items
basic surveillance capitalism procedures
and then one of the engineers said
okay
we can look at a photograph
our machine will name what’s in it
what happens if we just give it the name
and ask it to produce the photograph?

literally turning the box upside down
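
A toy sketch of "turning the box upside down", under loud assumptions: the recognizer here is a nearest-template matcher over four-pixel "images", and the generator is plain hill climbing on the recognizer's own score. The templates, labels and function names are hypothetical, and real systems do not work this way; the point is only that a machine that scores how well an image fits a name can be run backwards to search for an image that fits.

```python
import random

# Hypothetical label templates: four 'pixels' each, values in [0, 1].
TEMPLATES = {
    "vase": [0.9, 0.1, 0.9, 0.1],
    "glass": [0.1, 0.9, 0.1, 0.9],
}

def score(image, label):
    """How well the image fits the label (negative squared distance)."""
    return -sum((p - t) ** 2 for p, t in zip(image, TEMPLATES[label]))

def recognize(image):
    """Image in, name out: pick the best-scoring label."""
    return max(TEMPLATES, key=lambda label: score(image, label))

def generate(label, steps=2000, seed=0):
    """Name in, image out: start from noise and keep any random pixel
    tweak that makes the recognizer's score for the label go up."""
    rng = random.Random(seed)
    image = [rng.random() for _ in range(4)]
    for _ in range(steps):
        candidate = list(image)
        i = rng.randrange(len(candidate))
        nudged = candidate[i] + rng.uniform(-0.1, 0.1)
        candidate[i] = min(1.0, max(0.0, nudged))  # clamp to valid pixel range
        if score(candidate, label) > score(image, label):
            image = candidate
    return image

picture = generate("vase")
print(recognize(picture))
```

The generator never "knows" what a vase is; it only climbs toward whatever the recognizer scores highly for that word.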

but this is really a crude whopper-jawed thing here
i mean
it’s literally as if I’d
like
turned a recycling machine on its ear
and put in broken glass and get out Greek vases

what are its innate characteristics?

I mean
what is the grain of the material there?
what is it good at doing
and what is it bad at doing?
and if you were an art director or a museum curator
how would you judge what was like
a good output
and like just the stuff that’s like every day
and there’s 250 million of them

here is the ‘healthy boy eating broken glass for breakfast’

but since this machine lacks any common sense understanding
it doesn’t know what glass is
It doesn’t know what a boy is
doesn’t know what breakfast is
It’s the very opposite of an Isaac Asimov robot
no idea about possible harm
If you look at this,

where’s the ethics?

where are the rules?
where are the decisions?
where’s the common sense?
there’s not a trace of them
not one trace.

these AIs don’t have ears
they don’t have photographs
they don’t have paintings
they have a statistical relationship
between text and
clumps of JPEG pixels

I heard early on from users trying to put their prompts into these machines that they weren’t very good at hands

It’s like, why are they not good at hands?

you know, a hand is one of the most common things on the Internet
there are millions and millions of pictures of them

What if I ask to imitate a human drawing a hand?

for the machines that we built,
this is their realism

this is what they actually ‘see’
when they are comparing the word ‘hand’
to the most probable JPEGs of hands

and if you think of hands
and how fluid they are…
We don’t even have a vocabulary
for all the positions we can make with our hands
we’re used to them,
but we don’t talk about them very effectively

this vocabulary is not in the database
because people never described them with enough fidelity, for them to be accurately rendered by a probabilistic engine.

Eventually, they’ll crack the hand problem.

when these systems are more refined they won’t make these elementary errors

but they’re not errors

instead
I want to see what it can say
about things that humans can’t draw

what will it produce
if I ask it to draw something
that is beyond human capacity to draw?

for instance,
the unimaginable.
but the unimaginable is an oxymoron, right?
i mean, you can’t draw something you can’t imagine.
this thing will draw the unimaginable in a hot second

the undreamed-of
stable diffusion doesn’t care
it’s perfectly happy

the obsessive compulsion

the lysergic hallucination.
people have an amber proper
about going insane
computers aren’t supposed to be able to do that
it has no trouble whatsoever with psychedelics
it can spin it out by the square kilometre

The unthinkable.
that is, images of humans being unable to think the unthinkable
it is never able to think
it will always come up with some answer

it’ll regurgitate that as long as the current is flowing through it,
just indefinitely
we’ll never have screens big enough to show it all

it will never get tired of generating pastiches like this,
on any scale,
at any fidelity
quickly, cheaply.
and without ever making any common sense
without ever getting tired
it will grind these probabilistic connections
and spew this stuff out

there is zero creative effort in this
it does take a lot of voltage.

Max Ernst from the 1930s

first
he went out with his canvases
and rubbed pencils on them in order to get suggestive forms
and then he would paint over them
and then later he decided he’d just take the paint itself
and toss it onto the canvas
stomp on it
and then open it up like a Rorschach blot
and paint over it

he got this smashed-up paint with not random, but suggestive kinds of imagery

they are nearly 100 years old now, and they never look like anything else

This is Meret Oppenheim’s
Breakfast in Fur
which will never be looked at the same

show this to anyone who is unaware of this famous artwork
almost a hundred years old
they will immediately conclude that it was generated
they will never look at it again and think
what a cool, surreal thing

it’s like she took a teacup
and wrapped it up in gazelle fur
and look, she even wrapped up the spoon
and you know what?
you can’t even drink out of that teacup
think of putting tea in there
picking it up and feeling that fur in your mouth
Ooh, ooh.
what a surrealist frisson
Boy, that’s super weird
such an artist, this Meret Oppenheim
such a form of human expression.

We may have opened Pandora’s box
and slammed the gate on our heritage

it’s beautiful, a beauty which is not of this world

a beauty that cannot be judged by the standards of beauty that we had earlier

there’s a quote by Simone Weil:
“the beauty of the world is the mouth of a labyrinth”
which is a warning:
if you’re interested in aesthetics,
you have to curate stuff
or happen to be an art director of a festival (like I am)
you can’t just pick the pretty ones
The beauty of the world is the mouth of a labyrinth.

once you start taking aesthetics seriously,
you enter metaphysics

that’s what beauty is
the beauty of the world is the mouth of a labyrinth

this is beauty, which is not of this world

and cannot be judged by the standards of beauty that we had earlier
but I know that this labyrinth is my doom
I don’t know how long I’m going to have to put up with this
i’ve been in the labyrinth of artificial intelligence since I first heard of it

i’m not too surprised that there’s suddenly a whole host of labyrinths
thousands of them
i don’t mind
i know it’s trouble,
but it’s kind of a good trouble
i don’t mind living there
i’ll build a house in the labyrinth
i’ll put a museum in it
you’re not going to stop me.
i’m happy to accept the challenge
i hope you’ll have a look at it.


Q&A

bizarre masters of art history

Frankenstein pastiche of erasing historical boundaries
Hokusai as sci-fi cyborgs
no human feelings about violations of historical continuity

one of the most powerful effects it’ll have on creators

music sampling
reimagining into contemporary pop hits
these things are doing for imagery what synthesizers did for music

what new skills do artists get

what do you need to be a good prompt engineer?
schools of thought:
if you don’t write the code you don’t understand it
like telling an author that he needs to write his own word processor to write his novel

things it can do that humans can’t do
art that is radically post human
how radically?
what are you trying to achieve?
what’s the most post post human?

the competition winners

it’s unethical to claim you did something you didn’t do

people used to think that photography
wasn’t fine art
and cinema was dismissed as compared to theater
this person’s not an actor!

these arguments aren’t going away

and not slowing things at all

think paying musicians whose work is sampled