Using Gemini, “Google’s Most Capable AI Model”, with Golang

Google recently launched Gemini, its AI model built to outperform every other large language model (LLM). Many impressive demo videos have shown off Gemini's capabilities, and Google has released Gemini Pro, which developers can already use.

Gemini Pro has many promising features, so let’s put it to work. We will integrate gemini-pro-vision (which Google describes as its best model for handling images) with Golang.

What Are We Going to Do?

To test Gemini's capabilities, we set it a task: we give it three images and ask it to create a story from them. It should first identify what each image shows and then write the story.

We have three photos: the first shows Earth, the second Narendra Modi, and the third Donald Trump. These are the images we will ask Gemini to turn into a story.

Getting Started with Golang and Google Gemini AI

To use Google Gemini with Golang, you first need an API key from Google AI Studio. Once you have one, you are ready to start.

Make sure Go is installed on your system, then create an empty directory, change into it, and initialize a module there with go mod init followed by a module name of your choice.

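To add the Google AI SDK to the Go application, run go get github.com/google/generative-ai-go. The detailed documentation for the genai package is available at https://pkg.go.dev/github.com/google/generative-ai-go/genai

Add your API key to a .env file in the project directory (for example, API_KEY=your-api-key). To load the .env file, we use godotenv (https://github.com/joho/godotenv), which you can add with go get github.com/joho/godotenv. Once the code is written, running go mod tidy fetches any remaining dependencies, such as google.golang.org/api for the option package.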

// Load environment variables from the .env file
err := godotenv.Load()
if err != nil {
    log.Fatal("Error loading .env file")
}
// Retrieve the API key and fail fast if it is missing
apiKey := os.Getenv("API_KEY")
if apiKey == "" {
    log.Fatal("API_KEY is not set")
}
// Initialize the Gemini client
ctx := context.Background()
client, err := genai.NewClient(ctx, option.WithAPIKey(apiKey))
if err != nil {
    log.Fatal(err)
}
defer client.Close()
// Select the Gemini Pro Vision model
model := client.GenerativeModel("gemini-pro-vision")

After loading the environment variables and retrieving the API key, we initialize the Gemini client and connect it to the gemini-pro-vision model.

Crafting a Stimulating Prompt

// Read the three images from disk
img1, err := os.ReadFile("images/earth.jpeg")
if err != nil {
    log.Fatalf("Error loading img1: %v", err)
}
img2, err := os.ReadFile("images/modi.jpeg")
if err != nil {
    log.Fatalf("Error loading img2: %v", err)
}
img3, err := os.ReadFile("images/trump.jpeg")
if err != nil {
    log.Fatalf("Error loading img3: %v", err)
}
// Combine the three images and a text instruction into one multimodal prompt
prompt := []genai.Part{
    genai.ImageData("jpeg", img1),
    genai.ImageData("jpeg", img2),
    genai.ImageData("jpeg", img3),
    genai.Text("First identify the pictures then, Write a story out of it "),
}

To build the prompt, we first load the images and then add them to the prompt for Gemini. Because Gemini is multimodal, a single prompt can contain both images and text.

Now we just have to generate the response from Gemini:
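The code above passes a hard-coded "jpeg" to genai.ImageData for every image. If your images might be PNG or another format, a small helper can detect the format from the file's leading bytes with the standard library's http.DetectContentType. The helper name imageFormat is hypothetical, and the sketch assumes, as we understand the SDK, that genai.ImageData turns its format argument into an image/<format> MIME type.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// imageFormat is a hypothetical helper that sniffs the leading bytes of an
// image and returns a short format name such as "jpeg" or "png", suitable
// for the first argument of genai.ImageData.
func imageFormat(data []byte) (string, error) {
	// http.DetectContentType inspects at most the first 512 bytes and always
	// returns some MIME type (falling back to application/octet-stream).
	mimeType := http.DetectContentType(data)
	if !strings.HasPrefix(mimeType, "image/") {
		return "", fmt.Errorf("not a recognized image: detected %q", mimeType)
	}
	return strings.TrimPrefix(mimeType, "image/"), nil
}

func main() {
	samples := []struct {
		name string
		data []byte
	}{
		{"jpeg header", []byte{0xFF, 0xD8, 0xFF, 0xE0}},
		{"png header", []byte("\x89PNG\r\n\x1a\n")},
		{"plain text", []byte("hello")},
	}
	for _, s := range samples {
		format, err := imageFormat(s.data)
		if err != nil {
			fmt.Printf("%s: error: %v\n", s.name, err)
			continue
		}
		fmt.Printf("%s: %s\n", s.name, format)
	}
}
```

In the program, you would call format, err := imageFormat(img1) after reading each file and then use genai.ImageData(format, img1) in the prompt, so a PNG is no longer mislabelled as JPEG.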

resp, err := model.GenerateContent(ctx, prompt...)
if err != nil {
    log.Fatal(err)
}

Parsing and Printing the Generated Response

To inspect what Gemini returned, we first marshal the response to indented JSON and print it:

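marshalResponse, err := json.MarshalIndent(resp, "", "  ")
if err != nil {
    log.Fatal(err)
}
// Convert the []byte to a string; printing the slice directly would show raw byte values
fmt.Println(string(marshalResponse))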

If you run the code now, it prints the entire response as JSON, with metadata alongside the generated text. To print only the story, we need to pull out a specific part of the response. After analysing the JSON, we defined a few structs that mirror its shape (a list of Candidates, each holding Content made up of Parts) and used them to extract just the text.
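The decoding step can be tried in isolation with a self-contained sketch that runs against a small, hand-written JSON document shaped like the marshalled response. The sample JSON is illustrative, not real Gemini output, and it assumes text parts marshal as plain JSON strings, which is what the []string Parts field relies on. Two details differ from a naive version: the struct tags are quoted (json:"Parts"), since an unquoted tag like json:Parts is not valid tag syntax and is silently ignored, and Candidates is a plain slice, so a response with no candidates cannot trigger a nil-pointer dereference. The helper extractText is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// These types mirror only the fields we need; encoding/json ignores
// any other fields present in the JSON.
type Content struct {
	Parts []string `json:"Parts"`
	Role  string   `json:"Role"`
}

type Candidates struct {
	Content *Content `json:"Content"`
}

type ContentResponse struct {
	Candidates []Candidates `json:"Candidates"`
}

// extractText is a hypothetical helper that concatenates every text part
// of every candidate in a marshalled response.
func extractText(raw []byte) (string, error) {
	var resp ContentResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return "", err
	}
	var sb strings.Builder
	for _, cand := range resp.Candidates {
		if cand.Content == nil {
			continue // a candidate without content contributes nothing
		}
		for _, part := range cand.Content.Parts {
			sb.WriteString(part)
		}
	}
	return sb.String(), nil
}

func main() {
	// Illustrative sample only; extra fields such as FinishReason are ignored.
	sample := []byte(`{
  "Candidates": [
    {
      "Index": 0,
      "Content": {"Parts": ["Once upon a time, ", "there was a planet."], "Role": "model"},
      "FinishReason": 1
    }
  ]
}`)
	text, err := extractText(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(text) // prints: Once upon a time, there was a planet.
}
```

If you prefer to skip the JSON round trip entirely, you can range over resp.Candidates directly and type-assert each element of cand.Content.Parts to genai.Text.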

Here is the full code:

package main

import (
    "context"
    "encoding/json"
    "fmt"
    "log"
    "os"

    "github.com/google/generative-ai-go/genai"
    "github.com/joho/godotenv"
    "google.golang.org/api/option"
)

// Content, Candidates and ContentResponse mirror the parts of the
// marshalled response we care about; all other fields are ignored.
type Content struct {
    Parts []string `json:"Parts"`
    Role  string   `json:"Role"`
}

type Candidates struct {
    Content *Content `json:"Content"`
}

type ContentResponse struct {
    Candidates []Candidates `json:"Candidates"`
}

func main() {
    // Load the .env file into the environment
    if err := godotenv.Load(); err != nil {
        log.Fatal("Error loading .env file")
    }
    apiKey := os.Getenv("API_KEY")
    if apiKey == "" {
        log.Fatal("API_KEY is not set")
    }

    // Initialize the Gemini client and select the vision model
    ctx := context.Background()
    client, err := genai.NewClient(ctx, option.WithAPIKey(apiKey))
    if err != nil {
        log.Fatal(err)
    }
    defer client.Close()
    model := client.GenerativeModel("gemini-pro-vision")

    // Read the three images from disk
    img1, err := os.ReadFile("images/earth.jpeg")
    if err != nil {
        log.Fatalf("Error loading img1: %v", err)
    }
    img2, err := os.ReadFile("images/modi.jpeg")
    if err != nil {
        log.Fatalf("Error loading img2: %v", err)
    }
    img3, err := os.ReadFile("images/trump.jpeg")
    if err != nil {
        log.Fatalf("Error loading img3: %v", err)
    }

    // Combine the images and the text instruction into one multimodal prompt
    prompt := []genai.Part{
        genai.ImageData("jpeg", img1),
        genai.ImageData("jpeg", img2),
        genai.ImageData("jpeg", img3),
        genai.Text("First identify the pictures then, Write a story out of it "),
    }
    resp, err := model.GenerateContent(ctx, prompt...)
    if err != nil {
        log.Fatal(err)
    }

    // Marshal the response to JSON and print it to inspect its structure
    marshalResponse, err := json.MarshalIndent(resp, "", "  ")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(string(marshalResponse))

    // Decode only the fields we need and print the generated text
    var generateResponse ContentResponse
    if err := json.Unmarshal(marshalResponse, &generateResponse); err != nil {
        log.Fatal(err)
    }
    for _, cad := range generateResponse.Candidates {
        if cad.Content != nil {
            for _, part := range cad.Content.Parts {
                fmt.Print(part)
            }
        }
    }
    fmt.Println()
}

Results

Finally, we can see the results:

The picture of the earth is a symbol of our planet and home. It is a reminder that we are all connected and that we need to take care of our planet. In the picture, Narendra Modi is the Prime Minister of India. He is a symbol of hope and change for many people in India. He is working to improve the lives of people in India and to make India a more prosperous country. The picture of Donald Trump is the President of the United States. He is a symbol of power and authority. He is working to make America great again. The story of these three pictures is a story of hope, change, and power. It is a story about the power of one person to make a difference in the world. It is a story about the power of hope to overcome adversity. It is a story about the power of change to make the world a better place.

Nice, right? Gemini identified all three pictures and wove them into a short story, just as we asked.

Try it yourself and experiment with the capabilities of the model.

Conclusion

We have explored some of Gemini's capabilities, and we think it performs well. By combining image data with a text prompt, the code shows how to generate creative content from Go, which opens up possibilities for storytelling and other creative applications of generative AI. If you want to check out the source code, refer here.

For more informative blogs and practical advice, visit CloudZenia, a hub for your tech solutions.

Jan 09, 2024