Creating A Caption Generator Service

Atakan Demircioğlu
Jotform Tech

Earlier this year, Jotform held a hack week to improve the platform’s accessibility for people with disabilities. During the week, entire developer teams worked on this topic. My team created a service that generates alt text describing images for visually impaired users.

It’s hard to come up with alt tags when you have a lot of images. So why don’t we have a service that will generate alt text for a given image? Here’s how we built a caption generator service to meet this need.

Requirements for building a caption generator

First, we need a basic endpoint that accepts an image parameter. This parameter can be either an image URL or a base64-encoded image.
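To make the base64 option concrete, here is a minimal sketch of how a caller might turn a local image into a base64 string before sending it. It assumes the base64 form is a data URI (`data:<mime>;base64,...`), which is a common convention but is not confirmed by the service itself; the function names are hypothetical.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URI (assumed input format)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_uri(path: str) -> str:
    """Read an image file and encode it, guessing the MIME type from its name."""
    mime_type, _ = mimetypes.guess_type(path)
    # Fall back to a generic type when the extension is unknown.
    return to_data_uri(Path(path).read_bytes(), mime_type or "application/octet-stream")
```

The resulting string can be passed as the image parameter in place of a URL.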

After getting the correct parameters, we need to run a prediction on the image and create a caption. For making predictions, I needed to choose an open-source model. BLIP, an open-source vision-language model from Salesforce, is a good match.

I chose Replicate, which runs open-source models behind a cloud API.
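As a rough illustration of what "a cloud API" means here, the sketch below builds a prediction request for Replicate's HTTP API using only the standard library. The `image` input name and the idea of passing a model version ID are assumptions about the BLIP model page, and `build_prediction_request` is a hypothetical helper, not part of any library.

```python
import json
import os
import urllib.request

REPLICATE_API_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(image: str, version: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts a prediction."""
    # Assumption: the model takes its picture under the input name "image".
    body = json.dumps({"version": version, "input": {"image": image}}).encode("utf-8")
    return urllib.request.Request(
        REPLICATE_API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Both environment variable names below are placeholders for this sketch.
    request = build_prediction_request(
        "https://example.com/photo.png",
        os.environ["BLIP_VERSION_ID"],
        os.environ["REPLICATE_API_TOKEN"],
    )
    with urllib.request.urlopen(request) as response:
        print(json.load(response))
```

The HTTP response describes a prediction that may still be running, so a real client would also need to poll for the result. That extra work is one reason to prefer a client library.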

Start with creating an endpoint

Replicate lets you make predictions over HTTP, but it also has a Python client library. For this reason, I looked for a Python web framework and found a few basic options: Flask, FastAPI, and Django.

For my purposes, I wanted to create a basic API with documentation. FastAPI is a good match for me.

I defined a single endpoint at the base route. It passes the image to the replicate package and returns the result.

How to install Replicate

To install the Replicate Python client, run:

pip install replicate

I didn’t need to parse or validate the image field myself because the replicate package validates its input automatically and also accepts base64-encoded images.
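The endpoint described above could look roughly like the following. This is a minimal sketch, not the original code: the model reference is a placeholder to be copied from the model's Replicate page, the `BLIP_MODEL_REF` variable and the function names are hypothetical, and the `image` input name is an assumption. The web imports are deferred into `create_app` so the captioning helper can be tried without FastAPI installed.

```python
import os
from typing import Callable

# Placeholder reference: replace <version-id> with the version shown on Replicate.
MODEL_REF = os.environ.get("BLIP_MODEL_REF", "salesforce/blip:<version-id>")


def generate_caption(image: str, run_model: Callable[..., object]) -> str:
    """Pass the image (URL or base64 string) to the model and return its text output."""
    output = run_model(MODEL_REF, input={"image": image})
    return str(output)


def create_app():
    """Build the FastAPI app with a single POST endpoint at the base route."""
    # Imported here so generate_caption stays usable without the web stack.
    import replicate
    from fastapi import Body, FastAPI

    app = FastAPI(title="Caption Generator")

    @app.post("/")
    def caption(image: str = Body(..., embed=True)) -> str:
        # embed=True makes the endpoint expect a JSON body like {"image": "..."}.
        return generate_caption(image, replicate.run)

    return app
```

To serve it, expose `app = create_app()` at module level and run it with an ASGI server such as uvicorn. The replicate client reads its credentials from the `REPLICATE_API_TOKEN` environment variable.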

For a better understanding of FastAPI, check out the documentation.

FastAPI Features

One of the best features of FastAPI is its automatic documentation, which is served through Swagger UI.

FastAPI is based on open standards:

  • OpenAPI for API creation, including declarations of path operations, parameters, request bodies, security, etc.
  • Automatic data model documentation with JSON Schema (as OpenAPI itself is based on JSON Schema).

It was designed around these standards after a meticulous study, rather than adding them as an afterthought layer on top.

This also allows using automatic client code generation in many languages.

You can check a real example here.

Let’s deploy the Python app on Vercel

We have the endpoint, but for now it only works in local development. That is not enough; we need to serve it publicly. For this I chose Vercel. It has many advantages, and the biggest one for me is how easy it is to deploy a Node.js or Python app.

Just create a vercel.json file and add the deployment configuration to it. That is it.
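For reference, a vercel.json for a FastAPI app commonly looks something like the sketch below. This is not the original file from this project: it assumes the FastAPI `app` object lives in a file named `main.py` at the project root, and that a `requirements.txt` listing the dependencies sits next to it.

```json
{
  "builds": [
    { "src": "main.py", "use": "@vercel/python" }
  ],
  "routes": [
    { "src": "/(.*)", "dest": "main.py" }
  ]
}
```

The `builds` entry tells Vercel to run `main.py` with its Python runtime, and the `routes` entry forwards every path to that app so FastAPI can do its own routing.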

For a better understanding of deploying a FastAPI app on Vercel, please follow this article.

Using the caption generator service

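Here’s a summary of what we’ve implemented so far: a FastAPI endpoint that accepts an image URL or a base64 image, passes it to BLIP on Replicate, and returns the generated caption, all deployed on Vercel.

Our test image is the one at the URL in the request below.

Example curl request: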

curl -X POST \
'https://ai-caption-generator.vercel.app/' \
--header 'Accept: */*' \
--header 'Content-Type: application/json' \
--data-raw '{
"image": "https://miro.medium.com/v2/resize:fit:1400/format:webp/1*jfdwtvU6V6g99q3G7gq7dQ.png"
}'

The response is: “Caption: a black and white photo of the word medium”
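The same request can be made from Python. The sketch below mirrors the curl call using only the standard library. The helper names are hypothetical, and it returns the response body as raw text because the exact response encoding is not specified here.

```python
import json
import urllib.request

SERVICE_URL = "https://ai-caption-generator.vercel.app/"


def build_caption_request(image: str, url: str = SERVICE_URL) -> urllib.request.Request:
    """Build the same POST request as the curl example above."""
    body = json.dumps({"image": image}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Accept": "*/*", "Content-Type": "application/json"},
        method="POST",
    )


def strip_caption_prefix(text: str) -> str:
    """Drop the leading "Caption: " label seen in the example response."""
    return text.removeprefix("Caption: ").strip()


def fetch_caption(image: str, url: str = SERVICE_URL, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_caption_request(image, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

If the body turns out to be JSON-encoded, decode it with `json.loads` before calling `strip_caption_prefix`, which is handy when the text goes straight into an `alt` attribute.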

Yay! The service works and is very easy to use.
