More models, prompt engineering, and enhanced bulk testing and evaluation
We are thrilled to announce the February update of the AI Toolkit VS Code extension - a major milestone in empowering AI engineers and developers. This version introduces support for the latest popular AI models, improved tools for prompt engineering, and expanded capabilities for bulk testing and evaluation.
Here’s a closer look at the new additions in the AI Toolkit February release:
More AI Models at Your Fingertips
Our latest release brings support for additional AI models to suit a wide range of applications. You can now explore the following new models directly from the toolkit:
- DeepSeek-R1 Model: The latest open-source model that has drawn broad attention in the AI community.
- GitHub-hosted o3-mini Model: OpenAI's most recent reasoning model for efficient AI experiments.
- Google Gemini 2.0 Models: Advanced models built for high-complexity tasks and outputs.
- Anthropic Claude 3.5 Haiku Model: Anthropic’s fastest model, delivering advanced coding, tool use, and reasoning.
Introducing Prompt Builder
Prompt engineering is foundational to creating powerful AI interactions, and we hope the new Prompt Builder makes the process a little easier. It lets you:
- Create, Edit, and Test Prompts: Tailor prompts in an intuitive and user-friendly way.
- AI-Generated Prompts: Simply describe your project idea in natural language and let the AI-powered feature generate prompts for you to experiment with.
- Structured Output Support: Design prompts to deliver outputs in a structured, predictable format.
- Code Generation for Prompt Interactions: After experimenting with models and prompts, jump straight into coding with automatically generated, ready-to-use Python code.
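To make the structured-output idea concrete, here is a minimal sketch of the kind of request a prompt tool can produce. It is illustrative only: the model name and field names are assumptions, and the `response_format` shape follows the common OpenAI-style JSON-schema convention rather than Prompt Builder's exact generated code.

```python
import json

# Illustrative JSON schema constraining the model's output shape.
response_schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
        "confidence": {"type": "number"},
    },
    "required": ["sentiment", "confidence"],
}

# A prompt plus the schema, assembled into a chat-completion-style request body.
request_body = {
    "model": "gpt-4o-mini",  # hypothetical model name for illustration
    "messages": [
        {"role": "system", "content": "Classify the sentiment of the user's text."},
        {"role": "user", "content": "The new release is fantastic!"},
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "sentiment_result", "schema": response_schema},
    },
}

print(json.dumps(request_body, indent=2))
```

Because the schema marks both fields as required, a conforming response can be parsed and validated mechanically instead of with ad hoc string handling.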
Playground Improvements
Key updates to test and debug AI outputs in the Model Playground include:
- Refined DeepSeek-R1 Thought UI: A polished interface for exploring the model's thought process.
- Enhanced Markdown and LaTeX Rendering: Clean and reliable rendering of complex model outputs such as Markdown documents and LaTeX equations.
Bulk Run Upgrades
The Bulk Run feature enables testing multiple input-output scenarios with different models. The February update added the following capabilities:
- AI-powered Dataset Generation: Need to curate a dataset for evaluation? Let the new Dataset Generation feature do it for you, whether you bring a sample dataset or start from scratch. The current version isn't perfect, so please try it out and share your feedback as we improve the experience.
- Structured Output Support: Manage pattern-based or hierarchical data outputs with ease.
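For a sense of what a bulk-run input might look like, here is a small sketch that builds a JSONL dataset of input/expected-output pairs. The field names (`query`, `expected`) are assumptions for illustration, not AI Toolkit's exact dataset schema.

```python
import json

# Hypothetical rows a bulk test run could iterate over: one scenario per line.
rows = [
    {"query": "What is 2 + 2?", "expected": "4"},
    {"query": "Name the capital of France.", "expected": "Paris"},
]

# Serialize to JSONL, a common format for row-oriented evaluation datasets.
jsonl = "\n".join(json.dumps(row) for row in rows)
print(jsonl)
```

Each line is an independent JSON object, so datasets like this are easy to append to, diff, and stream through a runner one scenario at a time.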
Evaluation now supports custom evaluators
The February update lets you use custom evaluators to refine your AI evaluations. You can:
- Incorporate custom evaluation logic using Python code.
- Leverage LLM-driven evaluation using tailored Prompt Logic.
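As a sketch of the first option, here is a small code-based evaluator. The function name and signature are assumptions made for illustration; they show the kind of logic a custom Python evaluator can hold, not the toolkit's exact evaluator interface.

```python
def keyword_coverage(response: str, expected_keywords: list[str]) -> float:
    """Score a model response by the fraction of expected keywords it mentions."""
    if not expected_keywords:
        return 0.0
    text = response.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords)


# Example: all three keywords appear, so the response scores 1.0.
score = keyword_coverage(
    "Paris is the capital and largest city of France.",
    ["Paris", "capital", "France"],
)
print(score)  # 1.0
```

Deterministic checks like this complement the LLM-driven option: they are cheap and reproducible, while prompt-based evaluators can judge qualities that simple string matching cannot.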
Share your feedback
With the additions in the February update, AI Toolkit continues to evolve, putting greater control and creativity in your hands.
Get started with the latest version, share your feedback, and let us know how these new features help you in your AI development journey. As always, we’re here to listen, collaborate, and grow alongside our amazing user community.
Thank you for being a part of this journey—let’s build the future of AI together!