
Creating A Caption Generator Service
Earlier this year, Jotform held a hack week to improve the platform’s accessibility for people with disabilities. During the week, entire developer teams focused on this topic. My team created a service that generates alt text describing images for visually impaired users.
Writing alt text by hand is hard when you have a lot of images. So why not have a service that generates alt text for a given image? Here’s how we built a caption generator service to meet this need.
Requirements for building a caption generator
First, we need a basic endpoint that accepts an image parameter. This parameter can be either an image URL or a base64-encoded image.
Once we have a valid parameter, we need to run a prediction on the image and generate a caption. To make predictions, I needed to choose an open-source model. BLIP, an image-captioning model, is a good match.
I chose Replicate, which lets you run open-source models through a cloud API.
Start with creating an endpoint

Replicate lets you make predictions over plain HTTP, but it also provides a Python client library. For this reason, I looked for a Python web framework and found a few common options: Flask, FastAPI, and Django.
For my purposes, I wanted to create a basic API with documentation, so FastAPI was a good match.
I defined a single endpoint at the base route (/). It passes the image to the replicate package and returns the result.
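A minimal sketch of what such an endpoint could look like is shown here. It is not the original code: the names generate_caption, create_app, CaptionRequest, and the BLIP_MODEL environment variable are assumptions made for illustration, and the full model identifier (including its version) has to be copied from the model's page on Replicate.

```python
import os
from typing import Callable

# Assumption: the full "owner/model:version" identifier is copied from the
# model's page on Replicate and supplied through an environment variable.
# The fallback without a version is only a placeholder and may not resolve.
BLIP_MODEL = os.environ.get("BLIP_MODEL", "salesforce/blip")


def generate_caption(image: str, run: Callable[..., object]) -> str:
    """Pass the image (URL or base64 data URI) to the model and return its output."""
    output = run(BLIP_MODEL, input={"image": image})
    return str(output)


def create_app():
    """Build the FastAPI app.

    Imports are local so generate_caption can be used and tested
    without the web dependencies installed.
    """
    import replicate  # reads REPLICATE_API_TOKEN from the environment
    from fastapi import FastAPI
    from pydantic import BaseModel

    class CaptionRequest(BaseModel):
        image: str  # image URL or base64-encoded image

    app = FastAPI(title="Caption Generator")

    @app.post("/")
    def caption(request: CaptionRequest):
        # Forward the image to Replicate and return the result as-is.
        return generate_caption(request.image, replicate.run)

    return app
```

Assuming the file is saved as main.py, it can be served locally with uvicorn --factory main:create_app, and FastAPI exposes the interactive docs at /docs by default.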
How to install replicate?
To install the Replicate Python client, run:
pip install replicate
I didn’t need to parse or validate the image field myself because the replicate package validates it automatically and also accepts base64-encoded input.
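If you would rather reject obviously malformed input before spending a prediction on it, a small optional check could look like the sketch below. It is not part of the original service; classify_image_input is a hypothetical helper that only distinguishes http(s) URLs from base64 payloads (raw or as a data URI).

```python
import base64
import binascii
from urllib.parse import urlparse


def classify_image_input(image: str) -> str:
    """Return "url" or "base64", or raise ValueError for anything else."""
    parsed = urlparse(image)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"

    # Accept both raw base64 and data URIs such as "data:image/png;base64,...".
    payload = image
    if image.startswith("data:"):
        header, sep, payload = image.partition(",")
        if not sep or not header.endswith(";base64"):
            raise ValueError("data URI must be base64-encoded")

    try:
        decoded = base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("image is neither a URL nor valid base64") from exc
    if not decoded:
        raise ValueError("image is empty")
    return "base64"
```

The check only looks at the shape of the input. It does not confirm that the URL is reachable or that the decoded bytes are actually an image.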
For a better understanding of FastAPI, check out the documentation.
FastAPI Features
One of the best features of FastAPI is its automatic interactive documentation, which uses Swagger UI.
FastAPI is based on open standards:
- OpenAPI for API creation, including declarations of path operations, parameters, request bodies, security, etc.
- Automatic data model documentation with JSON Schema (as OpenAPI itself is based on JSON Schema).
- Designed around these standards after a meticulous study, instead of as an afterthought layer on top.
- This also allows using automatic client code generation in many languages.
You can check a real example here.
Let’s deploy the Python app to Vercel

We have the endpoint, but for now it only works in local development. That is not enough; we need to serve it publicly. For this, I chose Vercel. It has many advantages, and the biggest one for me is that deploying a Node.js or Python app is very easy.
Just create a vercel.json file that tells Vercel how to build and route the app. That is it.
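The author's exact configuration is not reproduced here. As a rough sketch, a commonly used vercel.json for a FastAPI app looks like the following, assuming the FastAPI instance is exposed as a module-level app in a file named main.py (both names are assumptions) and that a requirements.txt lists fastapi and replicate:

```json
{
  "builds": [
    { "src": "main.py", "use": "@vercel/python" }
  ],
  "routes": [
    { "src": "/(.*)", "dest": "main.py" }
  ]
}
```

The builds entry tells Vercel to build main.py with its Python runtime, and the routes entry forwards every incoming path to that file so FastAPI can do its own routing.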
For a better understanding of deploying a FastAPI App in Vercel, please follow this article.
Using the caption generator service
Here’s a summary of what we’ve implemented so far:
- Git repository
- Endpoint (POST with image param)
- Documentation
Here is our test image.
Example curl request:
curl -X POST \
'https://ai-caption-generator.vercel.app/' \
--header 'Accept: */*' \
--header 'Content-Type: application/json' \
--data-raw '{
"image": "https://miro.medium.com/v2/resize:fit:1400/format:webp/1*jfdwtvU6V6g99q3G7gq7dQ.png"
}'
The response is: “Caption: a black and white photo of the word medium”
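The same request can be made from Python. The sketch below uses only the standard library and mirrors the curl example above. The function names are hypothetical, and it assumes the service is still reachable at that URL.

```python
import json
import urllib.request

# Service URL taken from the curl example above.
SERVICE_URL = "https://ai-caption-generator.vercel.app/"


def build_caption_request(image: str, url: str = SERVICE_URL) -> urllib.request.Request:
    """Build the same POST request as the curl example."""
    body = json.dumps({"image": image}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "*/*"},
        method="POST",
    )


def fetch_caption(image: str, timeout: float = 60.0) -> str:
    """Send the request and return the decoded response body."""
    request = build_caption_request(image)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_caption with an image URL or a base64-encoded image returns the raw response body, so any further parsing of the caption text is left to the caller.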
Yay! The service works and is very easy to use.