I talked to Nvidia’s AI NPC: it’s impressive, uncanny, an ethical nightmare, and inevitably here to stay – like it or not

By admin | Apr 27, 2024

Usually, when you go to see one of the big hardware powerhouses, the presentations are primarily focused on… well, you know – hardware. That’s been shifting over the years, however, as specialized software becomes an increasingly vital part of the computer hardware battleground.

Nvidia has led much of this charge. When you buy an RTX-series GPU, you’re not just buying it for the raw rasterization power – that is to say, how many pixels and frames it can push with the maximum visual bells and whistles enabled. You’re also buying it for specific software implementations that aid in the presentation of your games – stuff like DLSS and Frame Generation, which can boost frame rates without sacrificing visual splendor, or Reflex, which aims to reduce input lag.

All of this is of course part of a greater graphics arms race. When one feature gets added by one manufacturer, the other fires back with an equivalent or better alternative. As this race deepens, Nvidia sees a chance to synchronize gaming with its other key interest – Artificial Intelligence.

The biggest example of how various branches of AI could combine to create something very real and very significant for gaming is Nvidia ACE – a system that basically generates AI-powered NPCs. Usually, when you speak to an NPC in a game, they’re written by a person, performed by a person, and thereby have a limited set of phrases and voice prompts. Nvidia dares ask: what if an AI wrote an NPC on the fly?

The natural response, of course, is to roll your eyes. Then comes the annoyance: fine, yeah, but what about the bloody people who work on bringing these characters to life? There is no art without a human – or at least living and breathing – artist. Of that I am sure. But, let’s push on for a second and ask – how good is this, anyway? Are those jobs under threat in favor of a GPU-enhanced AI?

Nvidia’s response would be to say that there’s still artistry needed to create these characters. Part-way through a hands-on demo of the AI chatter, company reps seem to pre-empt this criticism by pulling up part of the back-end of Inworld Engine, the tech that powers the demo. In the back-end, we can see a detailed profile of one of the characters you’re able to converse with. This is at once a demonstration of the artistry still involved and of the versatility of the AI.

Is this, really, the future? | Watch on YouTube

Basically, you drop in information about the character in question. This can be a few lines summarizing them, or a really detailed description of their life history, outlook, likes, dislikes, loves, relatives – whatever you want. With this information defined by a human, the rest is handed off to the AI, which pulls from that profile as well as more general knowledge it has about the world of the game in order to formulate responses. Those responses are then delivered back to the player via a text-to-speech setup.
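To make that flow concrete, here is a minimal sketch of the profile-driven pipeline as described above – a human-authored character profile is assembled into context for a language model, whose reply would then be voiced by text-to-speech. All names here (`CharacterProfile`, `build_prompt`) are hypothetical illustrations, not Nvidia’s or Inworld’s actual APIs.

```python
from dataclasses import dataclass, field

# Hypothetical sketch only: these classes and functions do not come from
# Nvidia ACE or the Inworld Engine; they just illustrate the described flow.

@dataclass
class CharacterProfile:
    name: str
    summary: str                        # a few lines, or a detailed life history
    likes: list = field(default_factory=list)
    dislikes: list = field(default_factory=list)
    world_knowledge: str = ""           # shared lore about the game world

def build_prompt(profile: CharacterProfile, player_utterance: str) -> str:
    """Assemble the context handed to the language model for one exchange."""
    return (
        f"You are {profile.name}. {profile.summary}\n"
        f"Likes: {', '.join(profile.likes)}. "
        f"Dislikes: {', '.join(profile.dislikes)}.\n"
        f"World facts: {profile.world_knowledge}\n"
        f"Player says: {player_utterance}\n"
        f"Reply in character:"
    )

# An illustrative profile loosely based on the demo's bellboy.
tae = CharacterProfile(
    name="Tae",
    summary=("A hotel bellboy who dreams of opening his own bar "
             "and has invented a signature cocktail."),
    likes=["mixology", "chatting with guests"],
    world_knowledge="A big tech conference is being held at the hotel.",
)

prompt = build_prompt(tae, "What do you do for fun?")
# The assembled prompt would go to the language model; the model's reply
# would then be handed to a text-to-speech stage for audio delivery.
```

The key design point is that everything the model can draw on sits in the human-written profile plus shared world knowledge – which is also why, in the demo, characters can’t answer questions their biography doesn’t cover.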

However I feel about all this ethically, I won’t lie to you about the end result: as an initial tech demo, it’s bloody impressive. The demo has us at a GDC-like tech conference, trying to track down a specific character. In a hotel lobby, there are three folk to converse with – a bellboy, the check-in clerk, and one of the conference guests, a high-flying tech CEO type. Each has their own personality – and their own ‘mission’, so to speak, in terms of what they need to impart to the player.

Nvidia’s reps talk a lot about ‘guardrails’ in this context. The free-form nature of chatting to an AI using your actual voice means you could meander off – and so the AI is heavily gated in terms of what it’ll disclose, how it’ll say it, and so on. I mention the real-world case of airline customer support chatbots that, after being given the run-around by meandering customers, promised refunds that technically broke company policy; could you fool an NPC into giving up information it’s not supposed to with a sufficiently slippery tongue? Nvidia’s reps say no, if the characters are well-defined. The same applies for ensuring characters don’t come out with something offensive or inappropriate.
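A crude way to picture a guardrail is as a filter that sits between the language model and the text-to-speech stage. Nvidia describes gating behaviour through well-defined character constraints rather than anything this simple; the keyword check below is purely a toy stand-in to show where such a check would sit in the pipeline.

```python
# Toy illustration of a 'guardrail' stage: not Nvidia's actual approach,
# which relies on well-defined character constraints rather than keywords.

BLOCKED_TOPICS = {"refund", "bomb threat"}   # illustrative list, invented here

FALLBACK = "I'm sorry, I can't help with that."

def guarded_reply(generated_text: str) -> str:
    """Pass the model's line through unless it strays onto a blocked topic."""
    lowered = generated_text.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return FALLBACK
    return generated_text
```

In a real system the filtering would happen before (and during) generation, not just after it – but the principle is the same: the character simply has no route to saying the forbidden thing out loud.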

The result is a weirdly stilted conversation, but one that nevertheless flows naturally – and the fact that you’ll get different dialogue even asking the same question multiple times is deeply interesting. What you begin to notice is the AI latching hard on to specific elements of a character’s biography. Bellboy Tae has dreams of owning his own bar and has his own signature cocktail – and he’d mention it at just about any opportunity he could. As I peppered him with questions about his life, his feelings, and my mission in the demo, he’d always find a way to loop back around to cocktails. But when pressed for ingredients, he couldn’t give them – presumably that wasn’t in his biography, or the AI was programmed not to give more explicit instructions on boozing, lest an end-user need their stomach pumped.

Tae is one of three AI-driven NPCs you can get to know in the demo. Like me, he’s obsessed with booze | Image credit: Nvidia

Sometimes the bounds of gameplay design and the AI rub up against each other, the friction creating a sort of strange uncanniness. At one point you’re encouraged to invent a reason why the tech conference keynote is to be delayed in order to progress the narrative. This could be as simple as an electrical problem, or someone running late, or whatever else. But I can’t help myself: I tell CEO Diego that there’s been a bomb threat. He makes a panicked bee-line to the hotel receptionist to check if that’s true – but his dialogue with her is calm, and merely references the fact the keynote might be delayed rather than the fact we all might be blown up at any moment. The reason for that is simple: Diego’s dialogue with me is generated by AI; but his chatter with other NPCs is pre-scripted, and thus not as reactive.

But it’s undeniably impressive as a piece of technology. Using my real voice to talk to the characters, every one of them understood my regional UK-accented drawl, and every single interaction was unique, teasing a little more out of each character with every exchange. The scope of it is limited, but once you understand the intended scope and step back – yep, it all just works. It’s a bit ropey right now, but as a proof-of-concept, everything you need is there.

If we separate the tech from the art for a moment – one can absolutely see how this is the future. One can also see, however, how it has about a million miles to go in terms of the human factor. There’s a missing element of expressiveness, of emotional intelligence, of fun. There’s a missing soul – and not just in the slightly robotic text-to-speech delivery, but in all of it.

With that said, though, I can imagine it fitting in. On a TV series-inspired kick I’ve been playing Fallout 4 again, for instance – and I could imagine emergent AI quest design, handed out by AI-powered NPCs, existing alongside the curated, human-written main quests and side quests. You could theoretically have an endless supply of procedurally-generated quests with ‘fully voiced’ AI quest-givers. Bethesda already has these, actually – they’re called ‘radiant quests’ – but you can easily imagine how the AI NPCs could be used to enhance them.

A Power Armor location in Clarksburg in Fallout 76.
Could Fallout 76’s launch have been improved with AI-powered NPCs… instead of NO NPCs? | Image credit: VG247/Bethesda

I want to be pragmatic about this: AI is potentially highly dangerous to the art of game making and to the artists who give us many aspects of games. But also let’s be real: it is inevitably part of the future. Just a few minutes with this demo drives that home. Developers and publishers are going to want to use this technology.

So, as with any technology, the question is in how it’s used. Even with characters uncannily looping around, even with last-generation lip flaps that obviously can’t sync realistically to words formed only a fraction of a second ago, even with occasional processing lag and the odd audio stutter… it’s all very, very impressive. I can think of about a hundred different potential uses of this sort of technology: and not all of them are evil.

And, in a sense, I’m relieved. Nvidia ACE’s AI-powered NPCs are impressive… but they don’t feel real. Like anything in gaming, they’re all about smoke and mirrors. What it offers is a trade when compared to traditional games writing. In a game like Mass Effect or Fallout, what NPCs can say is relatively shallow, limited to just a handful of dialogue options and answers in each interaction – but each is painstakingly crafted and beautifully pitched in the way only a human can.

This trades that for breadth – handing you an NPC that you can potentially talk to in a variety of different ways for an age. These characters still feel a little alive – but they lack that human touch, even if they’re still far more personable than those idiotic refund-granting chatbots. For now, this means the jobs of artists are safe. The question now becomes how this technology might collaboratively intersect with their work – helping them, rather than replacing them. That’s a future I could buy into.

