
About Vending-Bench

Recently, I read an AI/NLP paper by Andon Labs called “Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents.” Here’s the press release version, and here’s the actual paper itself. The idea is that we are hearing a lot about autonomous AI agents lately, so why not create a benchmark that lets us compare how different agents handle a simple autonomous task? In this case, the task is running a vending machine: finding good products, ordering them, stocking and restocking the machine, and trying to make as much money as possible. On their press release page, you can try playing the game yourself for a couple of turns. It’s basically a more realistic version of those old browser games where you run an ice cream truck or a pizza parlor.

The paper is definitely worth the time it takes to read the whole thing, but the basic result is that the agents are generally coherent and able to keep the business running, but when failure occurs, it is catastrophic. (This is probably not surprising.)

Fun Failures

(interesting examples of catastrophic failures)

Results, Memory, Context

(the results of the paper, paraphrased)

The Medium Article and Consciousness

(an off-topic tirade about whether chatbots are alive)

Conclusions

I think I went a bit off-topic here, haha. If you’re interested in the Vending-Bench concept, Andon Labs also ran an experiment with Anthropic in which Anthropic’s AI, Claude, ran a real vending machine inside Anthropic’s offices. It’s a good read, and it covers many of the same topics as the original paper, but in more of a blog post format.

I don’t think we’re close to having these tools do jobs as well as humans do. My fear, instead, is that they will be able to replace a person in an office job up to a certain point, and the people in charge will see that, say “good enough,” and give the LLMs control they shouldn’t have.

Fun Fact

Jul. 11th, 2025 09:27 pm

I just have to share this comment, which I saw under a Danny Gonzalez video about AI on Facebook. The comment is so deeply wrong, yet so confident. It haunts me. I must spread it to you.

Fun fact, the first extensive use of ai technology was the Turing Machine, which was created to tell whether or not it was talking to a human or a machine. The fact that ai now cannot tell if it's a person or another ai is very strange and very scary 🫠

I recently decided I had to upgrade my iPhone 8. I’ve been making a slow, steady attempt to rein in my technology usage. Within the last six months I’ve deleted Twitter (the app, then my account), Instagram (just the app), and YouTube (just the app…I still use YouTube in the browser. It’s a work in progress). This has been extremely good overall. When I decided to upgrade, I had an opportunity to exit the Apple ecosystem and possibly exit the smartphone altogether.

When I explore “anti-technology” discussions, I often see the sentiment “the phone itself isn’t the problem; the problem is how we use the phone.” I disagree. There are definitely levels of phone usage (“scrolling apps” like Twitter, Pinterest, or TikTok are way different from Email, Messages, or Kindle), but absolving the phone is wrong. The screen itself is addictive. A smartphone without any apps would naturally be less addictive than one with apps, but a smartphone with just Email and Photos is still addictive and coaxes the user to keep picking it up. I’ve found myself mindlessly refreshing Email. EMAIL!

I think this effect exists with all technology to some extent. I remember being a child and being so interested in my mom’s Nokia 3120 that I constantly wanted to use it. Not even to play games; I just wanted to mess with it and change the ringtones and stuff. I don’t think this is unique to me or to a certain subset of children; I think these little pieces of technology are obsession-building for us. Screens, in general, are very distracting. It’s a common problem that people can’t focus in a room with a TV on, even if the program doesn’t interest them.

The beautiful screen is the killer. I firmly believe that owning a Nokia 3120 is more positive than negative in a person’s life; I do not believe this about owning an iPhone.

And I have one! I have a damn iPhone!

So I considered getting a “dumb phone” or “feature phone,” basically a phone that doesn’t have all the features of a current-gen smartphone. These often come in a different form factor, too (no touch screen, T9 keyboard, etc.). The king of feature phone reviews is Jose Briones; if you’re interested in shopping around for one, he’s who I’d go to first to get a sense of what’s available.

If you want to downgrade, and assuming the goal is to minimize what your phone can do, I’d start by making a list of “required” apps you wouldn’t want to live without. For me, it looks like this:

  • Calling and Texting. When I think about it, the main reason I’d carry a cell phone in the first place is so that my girlfriend can call me when I’m out of the house. I think most people would agree that having a device that allows the people you live with to contact you when you’re out is extremely useful. I don’t really need all my other contacts to be able to contact me at any time, though.
  • Navigation. I have lived without this as an adult, during the year I lived in Japan without cell service. It’s actually not that bad, but it requires a lot of pre-planning before I go places. There are many situations where I feel much more comfortable having a navigation app to help me get back home, and I think navigation apps are an overall boon to modern life. Still, I would like to lessen my reliance on them (or at least my constant use of them, even when I don’t really need them). If you’re like me, I’d recommend trying a few days without navigation apps. Pre-plan your routes and don’t look at maps when you’re out. You might enjoy it!
  • Rideshare services. I hate to say this, but I do actually think these are basically irreplaceable. Ergh, I don’t know. They’re not irreplaceable. But there are those few-and-far-between times when they make the difference between an irritating day and the worst day of your year so far. I’m pro-taxi, but realistically there are times when a taxi is significantly less useful than a rideshare app. Obviously people (and I) lived without them in the modern world. I mean, I probably take like three rideshares a year total, so I’m definitely not a brand ambassador. My final thought on the subject: I don’t always need access to Uber, but when I’m in an unknown location, I want access to it.

When you figure out what features you actually need in a phone, you can start considering the models that will work for you. Some are super simple and only have a few apps; others are basically full smartphones that just add a bit of friction to everything.
