A Fullstack Text-to-Image Generative AI App You Can Build in an Afternoon
As a fun project for a hot summer weekend, I built a minimal fullstack text-to-image Generative AI web app with the Stable Diffusion model (deployed through Amazon SageMaker JumpStart), FastAPI for the web backend, and React for the frontend.
Let me walk you through it step by step 😆.
Deploy a Model Endpoint
To deploy the Stable Diffusion model, I took a shortcut and used SageMaker JumpStart.
First, find Stable Diffusion 2.1 base pre-trained on LAION-5B under SageMaker JumpStart’s “Foundation Models: Image Generation” ML task.
It can be deployed with one click, so let’s do it!
When the endpoint is up and in service, you can click “Open notebook” to see and play around with the sample code. The notebook opens in SageMaker Studio, but if you’re more comfortable working in your local VS Code and Jupyter server (like me), just set up your AWS credentials locally and download the notebook.
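If you go the local route, a quick way to confirm that your credentials are picked up is to ask AWS STS who you are. This is a minimal sanity check, assuming boto3 is installed and a default profile or environment variables are configured:
# Quick check that boto3 finds your local AWS credentials
import boto3
print(boto3.client("sts").get_caller_identity()["Account"])  # prints your account ID if all is well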
The sample notebook provides three key helper functions that let you send a prompt to the Stable Diffusion model and get back the generated image. Just use them and swap in your own prompt to have some fun:
# response = query_endpoint("cottage in impressionist style")
response = query_endpoint("a cat astronaut fighting aliens in space, realistic, high res")
img, prmpt = parse_response(response)
# Display hallucinated image
display_image(img,prmpt)
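Under the hood, these helpers are thin wrappers around the SageMaker runtime API. Here is a rough sketch of what the first two do, with a placeholder endpoint name and the content type and response keys I saw in the sample notebook (generated_image and prompt); your notebook may use slightly different names, so treat this as an approximation rather than the exact code:
# Rough sketch of the notebook helpers (endpoint name is a placeholder; keys may differ in your notebook)
import json
import boto3

ENDPOINT_NAME = "jumpstart-dft-stable-diffusion-v2-1-base"  # replace with your deployed endpoint's name

def query_endpoint(text):
    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/x-text",  # the sample sends the prompt as plain text
        Body=text.encode("utf-8"),
        Accept="application/json",
    )
    return response

def parse_response(query_response):
    payload = json.loads(query_response["Body"].read())
    # the sample returns the pixels under "generated_image" and echoes back the prompt
    return payload["generated_image"], payload["prompt"]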
Build the API
A Fast and Minimal Start
Now that the model endpoint works as intended, let’s build a minimal web API to forward prompts that our web app frontend will send.
Since it’s a minimal project, in the project root I just created one folder for the API (/api) and one for the UI (/ui).
For the API, I’ve been wanting to try out FastAPI for a long time, since it comes with a Swagger UI that documents your API routes out of the box. Let’s set it up following the official guide:
# Install fastapi as well as the ASGI server Uvicorn
$ pip3 install fastapi
$ pip3 install "uvicorn[standard]"
Then create a main.py file in our /api folder:
from typing import Union
from fastapi import FastAPI
app = FastAPI()
@app.get("/")
def read_root():
return {"Hello": "World"}
@app.get("/items/{item_id}")
def read_item(item_id: int, q: Union[str, None] = None):
return {"item_id": item_id, "q": q}
Run the uvicorn dev server in the /api folder:
$ uvicorn main:app --reload
As promised, the API is running at http://127.0.0.1:8000, and your API Swagger doc UI is accessible at http://127.0.0.1:8000/docs, instantly 🚀!
Connect API to the Model
Now it’s time to hook up our API and the model endpoint. Let’s add a route and its handler in main.py, below the example routes:
# api/main.py
import utils  # the notebook helper functions, packed into api/utils.py (see below)
@app.get("/generate-image")
def generate_image(prompt: str):
image, prmpt = utils.parse_response(utils.query_endpoint(prompt))
print(image)
return {"out": "yeah"}
To make it work, I took the example functions query_endpoint() and parse_response() from SageMaker JumpStart’s sample notebook and packed them into a utils.py file, then imported and used them in the route handler. Since I didn’t know in what format the generated image would be sent back, I just printed it to the console and returned a dummy JSON response that shouts out “yeah”.
After saving the main.py file, we can see that the new route /generate-image appears in the Swagger doc UI automatically. Since our handler expects a string parameter prompt, Swagger conveniently provides a text input for us to try it out!
When I typed the prompt “A unicorn astronaut” and hit “Execute”, the model did send back the generated image. But what does it look like? Well, it’s an array of RGB channel values of each pixel in the image! Not something for our web app to show yet, but it’s an image!
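If you’re curious about the structure, a quick peek with NumPy shows it’s a nested list of RGB triplets, one per pixel. A small sketch (the 512×512 output size is an assumption based on the model’s default; yours may differ):
# Peek at the structure of the pixel array returned by the endpoint
import numpy as np
arr = np.array(image, dtype=np.uint8)  # `image` is the pixel array from parse_response()
print(arr.shape)  # e.g. (512, 512, 3): height x width x RGB channels
print(arr[0][0])  # the RGB values of the top-left pixel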
Process and Save the Image
To turn the magic pixel numbers into an image that a human can see, I used NumPy and PIL (Pillow). For a quick test (and since we haven’t built our frontend yet), I just added another utility function that converts the pixels into a NumPy array and then into an Image object, which is then saved to disk.
# api/utils.py
from PIL import Image
import numpy as np
# ...
def save_image(pixels):
arr = np.array(pixels, dtype=np.uint8)
img = Image.fromarray(arr)
img.save("new.png")
In our /generate-image route’s handler, we can now just pass the image pixel array from the response to this new utility function, and send back the prompt as the API response for now.
# api/main.py
# ...
@app.get("/generate-image")
def generate_image(prompt: str):
image, prmpt = utils.parse_response(utils.query_endpoint(prompt))
utils.save_image(image)
return {"prompt": prmpt}
Let’s test it out in the Swagger UI with a new prompt “A unicorn astronaut in space, full body from side”. And here is what I found in my api folder: Stable Diffusion did give me a unicorn in space!
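If you’d rather test from a script than from the Swagger UI, a few lines of Python do the same thing. A sketch, assuming the requests package is installed and the dev server is running on port 8000:
# Hit the local API directly, without the Swagger UI
import requests
resp = requests.get(
    "http://127.0.0.1:8000/generate-image",
    params={"prompt": "A unicorn astronaut in space, full body from side"},
)
print(resp.status_code, resp.json())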
Support CORS
Before we get to the frontend, there’s one last thing we need to do in the API: support CORS, so that the browser will allow our frontend app to make AJAX calls to our API server.
The simplest way to do this is to add FastAPI’s CORSMiddleware to our API app:
# api/main.py
from fastapi.middleware.cors import CORSMiddleware
# Support CORS
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
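The wildcard origin is the laziest option and fine for local tinkering. If you’d rather be explicit, you could allow only the Vite dev server’s origin (port 5173 by default) instead, something like:
# Stricter alternative: allow only the Vite dev server origin
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173", "http://127.0.0.1:5173"],
    allow_methods=["GET"],
    allow_headers=["*"],
)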
Build the UI
Now it’s time for the face of our minimal app!
Scaffolding
I haven’t touched React for years but heard that it’s still the go-to frontend framework. This time, though, instead of the classic Create React App (CRA), I’m going with the lighter and faster alternative, Vite.
For UI components, I tried Chakra following its nice guide for working with Vite.
$ npm i @chakra-ui/react @emotion/react @emotion/styled framer-motion
After installation, just sneak the <ChakraProvider> tags into the React root:
// ui/src/main.tsx
// React, ReactDOM and App are already imported by the Vite template
import React from "react";
import ReactDOM from "react-dom/client";
import { ChakraProvider } from "@chakra-ui/react";
import App from "./App";
ReactDOM.createRoot(document.getElementById("root")!).render(
<React.StrictMode>
<ChakraProvider>
<App />
</ChakraProvider>
</React.StrictMode>
);
A Minimal UI
Our web app does one simple thing for the user: it takes in a text prompt and shows the generated image. So the minimum we need here are three UI components:
- a text input to enter the prompt
- a submit button to send the prompt
- an image component to render the generated image
With Chakra UI’s ready-made components Input, Button, and Image, it’s a piece of cake to make a nice-looking UI. For layout I used a Container and an InputGroup, then tweaked some margins here and there. Of course, we also need a nice Heading just for completeness.
// ui/src/App.tsx
// ...
return (
<Container maxWidth={"2xl"} marginTop={30} centerContent>
<Heading margin={8}>Image Generator</Heading>
<InputGroup>
<Input
pr="4.5rem"
value={prompt}
placeholder="Enter your prompt"
onChange={onInputChange}
/>
<InputRightElement width={"6rem"}>
<Button onClick={onButtonClick} isDisabled={isLoading}>
Generate!
</Button>
</InputRightElement>
</InputGroup>
{imgSrc ? (
<Box boxSize={"l"} marginTop={5}>
<Image src="https://picsum.photos/640" />
</Box>
) : null}
</Container>
);
To tweak the image size and position I used the dummy image service https://picsum.photos/, asking it for a 640px-wide dynamic image. And here is everything put together:
UI States, Event Handlers and the API Call
Now it’s time to make our UI work! There are three pieces of dynamic state in our minimal UI:
- the prompt text, which is bound to the text input’s value
- the imgSrc, a piece of data that should come from the backend, which also dictates whether to render the Image UI component at all
- an isLoading state for a better user experience. When it’s set to true, I want to disable the submit button and show a loader, so that the user knows our app is working hard to get that image generated!
// ui/src/App.tsx
import { useState } from "react";
import type { ChangeEvent } from "react";
function App() {
// states
const [prompt, setPrompt] = useState("");
const [isLoading, setIsLoading] = useState(false);
const [imgSrc, setImgSrc] = useState("https://picsum.photos/640"); // we'll replace the initial value to null later so that the image component is not rendered when there is no imgSrc
//...
}
As for event handlers, we care about two events in our UI:
- the change event of the text input, which updates the prompt state every time the user changes anything in the text input
- the click event of the submit button, which triggers the API call and also updates the isLoading state accordingly
For the former, we define the onInputChange event handler as follows:
// ui/src/App.tsx
const onInputChange = (e: ChangeEvent<HTMLInputElement>) => {
setPrompt(e.target.value);
};
For the latter, we define the onButtonClick event handler. It’s async since it needs to call the API and wait for the response. I also wrapped the API call with some error handling and added logic to update the isLoading state at the right times:
// ui/src/App.tsx
const onButtonClick = async () => {
setIsLoading(true);
try {
const response = await (
await fetch(`http://127.0.0.1:8000/generate-image?prompt=${prompt}`)
).json();
console.log(response);
} catch (err: any) {
// catch any runtime error
console.log(err.message);
} finally {
setIsLoading(false);
}
};
And to show the user when we’re loading, let’s add a spinner below the text input:
// ui/src/App.tsx
// ...
</InputGroup>
{isLoading ? (
<Spinner
thickness="4px"
speed="0.65s"
emptyColor="gray.200"
color="blue.500"
size="xl"
marginTop={6}
/>
) : imgSrc ? (
<Box boxSize={"l"} marginTop={5}>
{/* <Image src={`data:image/png;base64,${imageData}`} /> */}
<Image src={imgSrc} />
</Box>
) : null}
</Container>
At this stage, submitting a prompt at the UI (clicking the “Generate!” button) actually calls the API and triggers the image generation. However, our API is not yet sending the image back in the response. Instead, it’s saving the image for itself!
So let’s fix that.
Backend and Frontend Integration
This is the final missing piece. We need to do something both at the backend and at the frontend, so that our user can finally see the image generated with their prompt.
Send Back the Image in the API Response
To get the image out of the API, we’ll send the image data as a base64 encoded string. Our API response JSON would then look like this:
{
"prompt": prmpt, // the prompt string from user
"img_base64": img_str // the generated image as a base64 data string
}
I’d still want to save the generated image somewhere on the server, so I rearranged the utility functions as follows:
First, a general utility function to convert pixel arrays from the Stable Diffusion model into a PIL Image:
# utils.py
def pixel_to_image(pixel_array):
arr = np.array(pixel_array, dtype=np.uint8)
img = Image.fromarray(arr)
return img
Second, a function to save the image in the generated_images folder on the server:
# utils.py
def save_image(img, filePath="generated_images/new.png"):
img.save(filePath)
Third, a function to convert a PIL Image to a base64 encoded image data string, which we can send in our response for the frontend to show:
# utils.py
from io import BytesIO
import base64

def image_to_base64_str(img, format="PNG"):
    buffered = BytesIO()
    img.save(buffered, format=format)
    # decode the base64 bytes to a plain str so it drops straight into the JSON response
    img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
    return img_str
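As a quick sanity check, you can decode that string back into an image and confirm the round trip works. A small sketch, assuming an image has already been saved to the default generated_images/new.png path:
# Sanity check: decode the base64 string back into a PIL Image
from io import BytesIO
import base64
from PIL import Image

original = Image.open("generated_images/new.png")
roundtrip = Image.open(BytesIO(base64.b64decode(image_to_base64_str(original))))
roundtrip.show()  # should look identical to the original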
Now with these utility functions, our API route handler can do its job:
# main.py
import datetime  # used to timestamp the saved image file names
import utils
@app.get("/generate-image")
def generate_image(prompt: str):
pixel_array, prmpt = utils.parse_response(utils.query_endpoint(prompt))
image = utils.pixel_to_image(pixel_array)
utils.save_image(
image, filePath=f"generated_images/{str(datetime.datetime.now())}.png"
)
img_str = utils.image_to_base64_str(image)
return {"prompt": prmpt, "img_base64": img_str}
In order not to overwrite the new.png file every time, I used the timestamp at which the image is saved as the image name.
Show the Image in Frontend
Now that our API sends back the image data, let’s update the frontend’s API handling logic to extract it and feed it into the imgSrc state:
// App.tsx
const onButtonClick = async () => {
setIsLoading(true);
try {
const response = await (
await fetch(`http://127.0.0.1:8000/generate-image?prompt=${prompt}`)
).json();
console.log(response);
const lastPrompt = response["prompt"];
const imgBase64 = response["img_base64"];
setImgSrc(`data:image/png;base64, ${imgBase64}`);
} catch (err: any) {
// catch any runtime error
console.log(err.message);
} finally {
setIsLoading(false);
}
};
Using the base64 image data string as the src of an HTML <img> tag not only works just as well as using an image URI, it also saves the user one network round trip to fetch the image 😌.
Now let’s prompt the model to generate a llama family image in the wild.
Voila! 🦙
Disclaimer: for simplicity I put all the API routes and handlers in main.py, and all the UI components and logic in App.tsx. For any production-grade project that you need to maintain, you would want to organize and modularize your code properly.
Conclusion
Here I walked you through how to build a minimal text-to-image generative AI web app, composed of a Stable Diffusion model deployed through SageMaker JumpStart, a Python API backend built with FastAPI, and a frontend web app built with React (Vite + Chakra UI). It’s pretty nice to get a fullstack generative AI web app up and running this quickly and easily, isn’t it?
You can find all the application code in my GitHub repo.
However, both the backend and the frontend are just running on local dev servers. What’s more, the one-click-deployed model endpoint is billed by the second whether you call it or not. So just delete the model endpoint for now, and in future articles I’d like to show you how to:
- build and release the frontend (not sure where to, yet)
- package and release the local API as either a containerized application or a serverless backend with AWS Lambda and Amazon API Gateway
- deploy a serverless model endpoint that only charges you when you actually use it
Acknowledgement
This project was inspired by this amazing YouTube video. Salute to the brilliant creator Nicholas Renotte!