
From PaLM to Gemini with Flutter’s new AI Dart SDK

Sylvia Dieckmann


In September 2023 I built a small Flutter app to demonstrate how to integrate the PaLM API into a mobile app. On the surface, the app, WineSnob, generated whimsical but not always accurate wine-tasting notes. Under the hood, it was built with Flutter and used a REST API to prompt the PaLM model.

At the time, PaLM was Google’s largest and most capable large language model (LLM), and while it didn’t yet offer dedicated support for Flutter and Dart, the REST API was available and could be used from any platform. Interacting with a backend via asynchronous HTTP requests is a common task for Flutter engineers, so in my presentations I typically spent more time analyzing the quality of the generated tasting notes than explaining the intricacies of the technical implementation.

Since the publication of the original WineSnob project, Google has released several new products in the AI space and renamed a few more. The PaLM foundation model has been replaced by the Gemini family of foundation models. Gemini Pro Vision supports multi-modal input and can handle a combination of text, image, and video prompts. Gemini 1.5 has a mind-blowing 1M-token context window and can process huge documents such as entire books or videos. Imagen 2 will soon be available for image generation after some rework. Finally, some existing products got rebranded: MakerSuite is now Google AI Studio and Bard has become Gemini. (Yes, that one is confusing …)

But most importantly for my project, Gemini now comes with a dedicated Google AI Dart SDK. No more REST API calls with prompt parameters packaged as a JSON payload. Instead, we can now interact with the model directly from our Dart code, using type-checked initializers to set up the model, inputs, and responses. This cuts down on common errors such as mistyped model parameters or mismatched braces in the JSON payload, and it also makes multi-modal input much more feasible.

It will take me a while to explore all the new developments properly and I am planning to break my findings into several smaller posts. But today I want to start with the obvious first challenge:

Replace REST API Model Call with the New Google AI Dart SDK

Challenge: Refactor the WineSnob app to use google_generative_ai, the new AI Dart SDK. Stick with text-only input for now (keep multi-modal input for a later post) and don’t stress about cleaning up past feature creep.

Step 0: Update all packages to the latest versions

This step isn’t strictly necessary but it’s good practice to start any work on a legacy project by updating pubspec.yaml with the latest dependencies.

flutter pub outdated
flutter pub upgrade --major-versions --tighten [--dry-run]

Step 1: Add google_generative_ai to the code base

This is as simple as adding the new package to your pubspec.yaml. No further configuration is required.

flutter pub add google_generative_ai
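
With the package in place, the classes used in the following steps (GenerativeModel, GenerationConfig, Content, and so on) become available through a single import:

import 'package:google_generative_ai/google_generative_ai.dart';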

Step 2: Replace PalmRepository with ModelRepository

All of the code concerning the REST API calls was bundled in a class called PalmRepository, which had to be renamed and refactored to be less dependent on a single model.

class ModelRepository {
  ModelRepository({required this.apiKey, required this.modelName}) {
    // initialize model (see Step 3)
  }

  final String apiKey;
  final String modelName;
  late GenerativeModel model;

  Future<List<String>> fetchResults(String description, Prompt prompt) async {
    // query model (see Step 4)
  }
}

By the way, the parameter prompt does not represent the actual model input but rather a prompt context with metadata about the current configuration. Some cleanup is needed here.
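
For orientation, here is a minimal sketch of what such a prompt context could look like. The class name Prompt matches the app, but the fields shown are purely illustrative assumptions, not the actual WineSnob implementation.

// Hypothetical sketch only — the real Prompt class in WineSnob looks different.
class Prompt {
  Prompt({required this.title, required this.template});

  final String title;    // e.g. a label shown in the UI (assumption)
  final String template; // prompt text the wine description is merged into (assumption)
}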

Step 3: Initialize the model

This is done in the constructor of ModelRepository. At a minimum, the initialization needs the model name and a valid API key.

Since I am restricting myself to text-only input in this round I’ll be initializing the model with gemini-pro, which is a shortcut for gemini-1.0-pro.

const GEMINI_PRO_VISION = 'gemini-pro-vision';
const GEMINI_PRO = 'gemini-pro';

// constructor
ModelRepository({required this.apiKey, required this.modelName}) {
  model = GenerativeModel(
    model: modelName,
    apiKey: apiKey,
    // the rest is optional
    safetySettings: [],
    generationConfig: GenerationConfig(
      // for now gemini-pro seems to support only one candidate
      candidateCount: 1,
      temperature: 0.7,
      maxOutputTokens: 1024,
    ),
  );
}

In this step, you will really appreciate the new Dart SDK. Previously, all parameters had to be specified in JSON without semantic or type checks. As a result, typos and other mistakes were common. With the new SDK, all parameters are members of two configuration objects (a GenerationConfig and a list of SafetySetting objects), greatly cutting down on errors.
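
As an example, restricting a single harm category is just one typed constructor call. This is a sketch; the enum values are taken from my reading of the SDK and are worth double-checking against the package docs.

// Sketch: block only content flagged as high-probability harassment.
// Assumption: HarmBlockThreshold.high corresponds to "block only high" in the REST API.
final safetySettings = [
  SafetySetting(HarmCategory.harassment, HarmBlockThreshold.high),
];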

Step 4: Talk to the model

In this step, we are compiling the prompt. Since we support text-only input for now, this can be done with the static convenience method Content.text(prompt).

Note that generateContent requires a list of Content objects. This will become relevant as we move to multi-modal input but for now, the prompt list contains just one Content object with a TextPart component.
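
As far as I can tell from the SDK source, Content.text() is just shorthand for wrapping the string in a user-role Content with a single TextPart, so the two forms below should be roughly equivalent (treat the expanded version as an assumption based on my reading, not official guidance).

// Shorthand used in fetchResults below:
final content = [Content.text(textPrompt)];

// Roughly what it expands to:
final contentExpanded = [
  Content('user', [TextPart(textPrompt)]),
];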

Future<List<String>> fetchResults(String description, Prompt prompt) async {
  // Make sure the input string doesn't contain newlines or double quotes
  final sanitized =
      description.replaceAll("\n", " ").replaceAll("\"", "'").trim();

  // TODO: the context should come from prompt
  final textPrompt =
      "Write tasting notes for the $sanitized. The tasting notes should be "
      "in the style of a wine critic and should mention the wine style, taste, "
      "and production process. Keep the result to one paragraph.";

  try {
    // Convenience method for single-shot text-only prompts
    final content = [Content.text(textPrompt)];
    final response = await model.generateContent(content);
    // Result parsing will get more complex with multiple candidates
    return [response.text ?? 'no result'];
  } catch (error) {
    throw Exception('Error on model.generateContent: $error');
  }
}

The actual interaction with Gemini is hidden behind the async call model.generateContent(content). The SDK then parses the result and exposes a single response as a string via response.text.

By the way, result parsing by the AI Dart SDK still seems a bit unfinished and I expect to see some updates to the SDK once the model allows querying for multiple candidates. But all the code is public, and as a developer I can always decide to ignore the convenience getter text and parse the raw response myself.
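
A rough sketch of what that manual parsing could look like, based on my understanding of the response structure (a GenerateContentResponse holds a list of Candidate objects, each carrying a Content made up of Parts); treat the details as assumptions rather than settled API:

// Sketch: collect the text of every candidate instead of relying on response.text.
// Assumes candidate.content.parts is a list of Parts, some of which are TextParts.
List<String> parseCandidates(GenerateContentResponse response) {
  return response.candidates
      .map((candidate) => candidate.content.parts
          .whereType<TextPart>()
          .map((part) => part.text)
          .join())
      .toList();
}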

Source Code

The WineSnob repo is public and you can find all changes discussed in this article in a single commit.

You can play with the latest iteration of the WineSnob app here.

Conclusions

Switching my app from the legacy PaLM API, accessed via REST calls, to Gemini via native SDK calls proved simple, and I love the new SDK. Any challenges were entirely due to my earlier design choices and some feature creep in the WineSnob app.

The new SDK is a significant improvement over accessing the models via a REST API and JSON-formatted parameters. Even for the simplest single-shot text prompt, the SDK helps cut down on annoying typos and formatting errors. And I haven’t even touched on more complex prompts (multi-shot, multi-modal, …).

Concerning the quality of the generated responses, I did not see a huge difference between the PaLM and Gemini models. However, this is likely because this experiment was never designed to test AI models but rather to demonstrate the technical aspects of integrating AI into a Flutter app.

Up Next: Multimodal Prompts

In my next article, I will take a closer look at some of the components of the new generative AI SDK, most notably Content(), GenerateContentResponse(), and Candidate().

My goal this time will be to add multi-modal capabilities to the WineSnob. Could adding a snapshot of your wine glass with a backdrop of the grapes in the vineyard avoid some of the hallucinations? Maybe multimodal inputs won’t lead to better tasting notes but they sure would make the app more interesting 😎
