LibShortcut

GPT Vision

by VGC_ v0.6.9
iOS 16+
Requires
AI
Category
Jun 2026
Updated

Ever wanted to ask ChatGPT a question about a photo without opening the app and paying for Plus? That’s exactly what this shortcut is for: you hand it one or more images, type a question, and it sends them straight to OpenAI’s vision models for an answer.

What it actually does

The shortcut takes a photo from your library, the camera, or the Share Sheet, then passes it to the OpenAI API along with whatever prompt you give it. Recent versions handle multiple images in a single request, so you can ask it to compare two screenshots or summarize a few pages of a document at once. The reply comes back as plain text on your phone, and there’s an option to keep a running log of every conversation in iCloud if you want a record.

Apps you need

Getting it set up

  1. Tap Add Shortcut on this page, then confirm Add Shortcut in the preview.
  2. Generate an API key from your OpenAI account if you don’t have one. OpenAI’s help center walks through it.
  3. Run the shortcut once. It’ll prompt you to paste the key during first-time setup.
  4. Pick a default model when asked, or leave it on the suggested one.

First-run setup

The first time you run it, the shortcut asks for your API key and stores it for later runs, so you only paste it once. You’ll also choose a model from the built-in list. As of the latest release that list spans the lighter, cheaper options up through the newer flagship models, with gpt-5.5 set as the default. Pick a smaller model when you just need a quick caption and the bigger one when accuracy matters.

Running the shortcut

Trigger it however suits the moment. Run it from the Home Screen for a photo you’ve already got, or fire it from the Share Sheet while you’re looking at an image in another app. Type your question when prompted (“What’s this plant?”, “Read the text in this receipt”, “Is this email a scam?”), and the answer appears once the API responds. If you enabled logging, each exchange lands in iCloud under a GPT Vision conversation folder, which is handy when you want to revisit an answer later.

Tips

A couple of things worth knowing. The API charges per request rather than a flat monthly fee, and images are billed by token count based on resolution and detail, so a quick low-detail question costs very little while a high-resolution scan costs more. On the input side, the vision endpoint accepts the common formats you’d expect, including JPEG, PNG, WebP, and non-animated GIF, which covers just about anything in your camera roll.

FAQ

Do I really have to pay OpenAI to use this?

Yes, but it’s pay-as-you-go, not a subscription. You’re billed only for the requests you make, and a single image question is usually a fraction of a cent. For light use it works out far cheaper than a ChatGPT Plus plan.

Can I send more than one picture at a time?

You can. Newer versions added support for batching several images into one request, which is useful for comparing shots or feeding it a multi-page document.

Where do my answers get saved?

Only if you turn logging on. When enabled, the shortcut writes each conversation to a folder in iCloud Drive, and a toggle added in v0.6.9 lets you switch that off entirely if you’d rather keep nothing.

You might also like