Changelog

New features, improvements, and fixes shipped to PromptCask.

Smart Collections, Scheduled Evaluations & More

Feature

This release introduces Smart Collections, Scheduled Evaluations, and a set of smaller improvements that make it easier for teams to organize, test, and iterate on their prompt libraries at scale.

Smart Collections

Manually organizing prompts into folders works for small teams, but as your library grows past a few hundred prompts it becomes unwieldy. Smart Collections solve this by letting you define dynamic groups based on rules — filter by tags, model, creator, status, last-run date, or any combination. Prompts are automatically added or removed as they match your criteria.

For example, you can create a "Production GPT-4o Prompts" collection that always reflects the current set of active prompts targeting GPT-4o, without anyone needing to manually curate it. Collections update in real time as prompts are published, archived, or re-tagged.
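As a rough illustration of how rule-based membership like this can work, here is a minimal Python sketch. The `Prompt` and `Rule` types, their field names, and the all-rules-must-match behavior are assumptions made for the example, not PromptCask's actual data model or rule engine.

```python
from dataclasses import dataclass, field


@dataclass
class Prompt:
    # Hypothetical prompt record; field names are illustrative only.
    name: str
    model: str
    status: str
    tags: set[str] = field(default_factory=set)


@dataclass
class Rule:
    # Matches when the named attribute equals the expected value.
    # For set-valued attributes such as tags, matches when the value is present.
    attribute: str
    expected: str

    def matches(self, prompt: Prompt) -> bool:
        value = getattr(prompt, self.attribute)
        if isinstance(value, set):
            return self.expected in value
        return value == self.expected


def collection_members(prompts: list[Prompt], rules: list[Rule]) -> list[Prompt]:
    # Assumption: a prompt belongs to the collection only if every rule matches.
    return [p for p in prompts if all(r.matches(p) for r in rules)]


prompts = [
    Prompt("support-reply", "gpt-4o", "active", {"production"}),
    Prompt("draft-summary", "gpt-4o", "archived", {"production"}),
    Prompt("triage", "other-model", "active", {"production"}),
]
rules = [Rule("model", "gpt-4o"), Rule("status", "active"), Rule("tags", "production")]
members = collection_members(prompts, rules)
```

Because membership is recomputed from the rules, archiving or re-tagging a prompt changes the result without anyone editing the collection itself.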

Scheduled Evaluations

You can now schedule evaluation runs on any collection or individual prompt. Set a daily, weekly, or custom cron schedule, choose your evaluation criteria (accuracy, tone, format adherence, latency budget), and PromptCask will automatically run your prompts against a test dataset and surface any regressions.

Results are tracked on a per-version timeline so you can see exactly when a model update or prompt edit caused a quality shift. Alerts are sent to Slack, email, or your webhook of choice when scores drop below your defined thresholds.
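The threshold check behind an alert can be sketched in a few lines of Python. The criterion names, the scores, and the `find_regressions` helper are hypothetical, and the sketch assumes every criterion is scored so that higher is better (a latency budget would need the comparison reversed).

```python
def find_regressions(
    scores: dict[str, float], thresholds: dict[str, float]
) -> dict[str, tuple[float, float]]:
    """Return {criterion: (score, threshold)} for each score below its threshold."""
    return {
        name: (scores[name], minimum)
        for name, minimum in thresholds.items()
        if name in scores and scores[name] < minimum
    }


# Hypothetical scores from one scheduled run, and the thresholds set for it.
latest_scores = {"accuracy": 0.91, "tone": 0.78, "format_adherence": 0.99}
thresholds = {"accuracy": 0.90, "tone": 0.85, "format_adherence": 0.95}

regressions = find_regressions(latest_scores, thresholds)
for criterion, (score, minimum) in regressions.items():
    print(f"{criterion}: {score:.2f} is below the threshold of {minimum:.2f}")
```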

Other Improvements

  • Bulk tag editor: Select multiple prompts and add, remove, or replace tags in one action.
  • Version comparison view: Side-by-side diff of any two prompt versions with highlighted changes.
  • Workspace activity feed: A real-time stream of edits, publishes, and evaluation results across your workspace — great for keeping distributed teams in sync.

Approval Workflows, Guardrail Templates & SLA Tracking

Feature

This release focuses on governance and reliability features that enterprise teams have been requesting. Whether you are managing prompts for customer-facing applications or internal tools, these additions give you the controls needed to ship with confidence.

Approval Workflows

Prompts can now go through a configurable approval flow before being published to production. You define the stages — for example, peer review then manager sign-off — and assign reviewers per stage. Reviewers receive a notification with a rendered preview of the prompt, a diff from the previous version, and one-click approve or request-changes actions.

Approval workflows are optional per workspace and can be scoped to specific collections, so your experimental prompts can move fast while production-critical ones get the oversight they need.

Guardrail Templates

We have added a library of pre-built guardrail templates that you can attach to any prompt. Guardrails run automatically on every execution and flag outputs that violate your rules. The initial template library includes:

  • PII Detection: Flags outputs containing names, emails, phone numbers, or addresses.
  • Tone Enforcement: Ensures outputs match your specified tone profile (professional, casual, empathetic, etc.).
  • Factual Grounding: Checks that claims in the output are supported by the provided context.
  • Length Limits: Enforces minimum and maximum token counts on responses.
  • Brand Compliance: Validates outputs against your brand voice guidelines and terminology lists.

You can also write custom guardrails using our expression language or by providing a judge prompt that evaluates outputs.
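To give a feel for what a guardrail check does, here is a small Python sketch of two of the ideas above: a length limit and a simple email detector. It is not PromptCask's expression language or template implementation. The function names are made up, tokens are approximated by whitespace-separated words, and the email pattern is deliberately simple.

```python
import re

# Simplified email pattern for illustration; real PII detection covers far more cases.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def check_length(output: str, min_tokens: int, max_tokens: int) -> bool:
    # Assumption: whitespace-separated words stand in for model tokens.
    token_count = len(output.split())
    return min_tokens <= token_count <= max_tokens


def check_no_email(output: str) -> bool:
    return EMAIL_PATTERN.search(output) is None


def run_guardrails(output: str) -> list[str]:
    """Return the names of the guardrails that the output violates."""
    checks = {
        "length_limits": check_length(output, min_tokens=3, max_tokens=50),
        "pii_email": check_no_email(output),
    }
    return [name for name, passed in checks.items() if not passed]


violations = run_guardrails("Please contact jane@example.com for help")
```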

SLA Tracking Dashboard

The new SLA dashboard gives you a real-time view of prompt performance against your defined service level objectives. Track p50, p95, and p99 latency, success rate, and guardrail pass rate across all your prompts. Set targets per prompt or collection and receive alerts when you are trending toward a breach.

The dashboard includes a 30-day rolling view with drill-down into individual execution logs, making it straightforward to investigate incidents and identify root causes.
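For readers who want to reproduce these metrics from their own execution logs, the following Python sketch computes latency percentiles with the nearest-rank method, plus a success rate. The sample data and function names are hypothetical, and the dashboard may use a different percentile definition (for example, interpolation).

```python
import math


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(values)
    rank = max(math.ceil(pct * len(ordered) / 100), 1)
    return ordered[rank - 1]


def sla_summary(latencies_ms: list[float], successes: list[bool]) -> dict[str, float]:
    return {
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
        "success_rate": sum(successes) / len(successes),
    }


# Hypothetical log: 100 executions with latencies of 1..100 ms, one of which failed.
latencies = [float(ms) for ms in range(1, 101)]
outcomes = [True] * 99 + [False]
summary = sla_summary(latencies, outcomes)
```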

Performance Improvements: Faster Search, Leaner Payloads & Dashboard Speedups

Improvement

This release is all about speed. We profiled the most common workflows across our customer base and identified several areas where we could significantly reduce latency and improve the overall experience.

Semantic Search: Up to 3x Faster

Our semantic search endpoint now returns results up to three times faster than before. We migrated our vector index from a single-region deployment to a distributed architecture that keeps embeddings closer to your query origin. For workspaces with fewer than 10,000 items, most queries now resolve in under 100 milliseconds.

We also improved relevance ranking by incorporating recency signals and usage frequency alongside cosine similarity. Prompts that your team uses often and that were recently updated will surface higher in results, even if a less-used prompt has a marginally closer embedding match.
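One way to picture this kind of blended ranking is a weighted sum of the three signals. The Python sketch below is an assumption-heavy illustration: the weights, the 30-day half-life, the usage saturation point, and all function names are invented for the example and are not the production ranking formula.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recency_signal(days_since_update: float, half_life_days: float = 30.0) -> float:
    # Exponential decay: 1.0 for a prompt updated today, 0.5 after one half-life.
    return 0.5 ** (days_since_update / half_life_days)


def usage_signal(run_count: int, saturation: int = 100) -> float:
    # Log-scaled so heavy use helps, but with diminishing returns; capped at 1.0.
    return min(math.log1p(run_count) / math.log1p(saturation), 1.0)


def relevance(similarity: float, days_since_update: float, run_count: int) -> float:
    # Hypothetical weights; similarity still dominates the score.
    return (
        0.70 * similarity
        + 0.15 * recency_signal(days_since_update)
        + 0.15 * usage_signal(run_count)
    )


# A frequently used, freshly updated prompt versus a stale one with a slightly closer match.
popular = relevance(similarity=0.80, days_since_update=0, run_count=100)
stale = relevance(similarity=0.82, days_since_update=90, run_count=0)
```

With these illustrative weights, the popular prompt outranks the stale one even though its embedding match is marginally weaker.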

API Response Payload Optimization

All list endpoints now support a fields query parameter that lets you request only the attributes you need. For integrations that only need prompt IDs and names — for example, populating a dropdown in your application — this can reduce payload sizes by over 80 percent. The full response shape remains the default, so this is a non-breaking change.

We also switched our JSON serialization library, shaving roughly 15 percent off response generation time for large result sets.

Dashboard Rendering

The prompt editor and execution log views now use virtualized rendering for long lists. Workspaces with thousands of prompts will see initial load times drop from several seconds to under 500 milliseconds. Scrolling through large execution log tables is now smooth even on lower-powered devices.
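The core of virtualized rendering is a small piece of arithmetic: given the scroll position, work out which rows are on screen and render only those, plus a small buffer. The Python sketch below shows that calculation under the assumption of fixed-height rows; the function and parameter names are illustrative and this is not the dashboard's actual code.

```python
import math


def visible_range(
    scroll_top: int,
    viewport_height: int,
    row_height: int,
    total_rows: int,
    overscan: int = 3,
) -> tuple[int, int]:
    """Return (start, end) row indices to render; end is exclusive.

    overscan adds a few extra rows above and below the viewport so that
    fast scrolling does not reveal blank space.
    """
    first_visible = scroll_top // row_height
    end_visible = math.ceil((scroll_top + viewport_height) / row_height)
    start = max(first_visible - overscan, 0)
    end = min(end_visible + overscan, total_rows)
    return start, end


# A 600 px viewport over 5,000 rows of 30 px each renders about two dozen rows, not 5,000.
at_top = visible_range(scroll_top=0, viewport_height=600, row_height=30, total_rows=5000)
mid_list = visible_range(scroll_top=3000, viewport_height=600, row_height=30, total_rows=5000)
```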

Infrastructure

  • Upgraded to PostgreSQL 16 with improved parallel query execution for complex analytics queries.
  • Moved static assets to a global CDN edge network, reducing asset load times by 40 percent for users outside North America.
  • Reduced cold start times for serverless API functions by 60 percent through dependency tree optimization.

Bug Fixes: Editor Stability, Auth Edge Cases & Export Reliability

Fix

This release addresses a collection of bugs reported by the community and surfaced through our internal monitoring. We appreciate everyone who took the time to file detailed reports — it makes a real difference in how quickly we can track down and resolve issues.

Editor Fixes

  • Cursor jump on paste: Fixed an issue where pasting text into the prompt editor would occasionally cause the cursor to jump to the beginning of the document. This was caused by a race condition in the Tiptap synchronization layer where the selection state was being reset before the paste transaction completed. The fix ensures selection updates are deferred until after the content transaction is fully applied.

  • Variable autocomplete dismissal: The {{variable}} autocomplete popup would sometimes remain visible after pressing Escape if the user had scrolled the editor while the popup was open. The popup position is now recalculated on scroll and correctly dismissed on all close triggers.

  • Undo across variable boundaries: Using Ctrl+Z to undo edits that spanned a variable placeholder would sometimes corrupt the placeholder syntax, leaving orphaned {{ or }} tokens. The undo history now treats variable placeholders as atomic units.
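To illustrate what treating a placeholder as an atomic unit means, the Python sketch below splits prompt text into tokens so that a {{variable}} is always handled whole. It is a conceptual example only: the real editor is built on Tiptap and its undo history works differently, and the pattern and function names here are assumptions.

```python
import re

# A placeholder is "{{", a variable name, then "}}", with optional inner whitespace.
PLACEHOLDER = re.compile(r"(\{\{\s*\w+\s*\}\})")


def tokenize(text: str) -> list[str]:
    """Split text into plain-text runs and whole placeholder tokens."""
    return [part for part in PLACEHOLDER.split(text) if part]


def remove_last_unit(text: str) -> str:
    """Remove one unit from the end: a whole placeholder, or a single character."""
    tokens = tokenize(text)
    if not tokens:
        return text
    if PLACEHOLDER.fullmatch(tokens[-1]):
        # Drop the placeholder as a whole so no orphaned "{{" or "}}" remains.
        return "".join(tokens[:-1])
    return text[:-1]


tokens = tokenize("Hello {{name}}, welcome")
trimmed = remove_last_unit("Hello {{name}}")
```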

Authentication Edge Cases

  • Session refresh during long edits: Users who kept the editor open for extended periods (over one hour) would occasionally see a "Session expired" error on save. The token refresh logic now proactively renews sessions 10 minutes before expiry rather than waiting for an API call to fail.

  • SSO redirect loop on mobile Safari: Fixed an infinite redirect loop that occurred when signing in via SAML SSO on mobile Safari with Intelligent Tracking Prevention enabled. The fix adds explicit SameSite cookie attributes and uses a server-side relay page for the SSO callback.
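The proactive session renewal described in the first item above comes down to a simple time comparison. Here is a minimal Python sketch; the 10-minute margin comes from the fix, while the function name and the example timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Renew once the session is within 10 minutes of expiring.
REFRESH_MARGIN = timedelta(minutes=10)


def should_refresh(expires_at: datetime, now: datetime) -> bool:
    """True once the current time is inside the refresh margin before expiry."""
    return now >= expires_at - REFRESH_MARGIN


expires_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
early = should_refresh(expires_at, datetime(2024, 1, 1, 11, 49, tzinfo=timezone.utc))
due = should_refresh(expires_at, datetime(2024, 1, 1, 11, 50, tzinfo=timezone.utc))
```

A client would typically run a check like this on a timer, so the renewal happens even while the user is typing and no API calls are being made.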

Export and Import

  • CSV export encoding: Exporting prompts to CSV now correctly handles Unicode characters in all fields. Previously, prompts containing emoji or non-Latin scripts would produce garbled output in some spreadsheet applications. We switched to UTF-8 with BOM encoding, which is recognized by Excel, Google Sheets, and Numbers.

  • JSON import validation: The bulk import endpoint now provides specific, actionable error messages when a JSON file contains invalid entries, rather than rejecting the entire file with a generic error. Each invalid entry is reported with its line number and the specific validation failure.
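If you generate similar CSV files in your own tooling, the UTF-8 with BOM approach from the export fix above is easy to reproduce. In Python, the standard `utf-8-sig` codec prepends the byte order mark on encode and strips it on decode. The column names and sample rows below are made up for illustration.

```python
import csv
import io


def export_prompts_csv(rows: list[tuple[str, str]]) -> bytes:
    """Serialize rows to CSV bytes encoded as UTF-8 with a leading BOM."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(["name", "text"])  # hypothetical column names
    writer.writerows(rows)
    # "utf-8-sig" writes the BOM bytes EF BB BF before the content.
    return buffer.getvalue().encode("utf-8-sig")


data = export_prompts_csv([
    ("greeting", "こんにちは, {{name}} 👋"),
    ("farewell", "Au revoir, à bientôt"),
])

# Reading the file back with the same codec drops the BOM again.
round_tripped = list(csv.reader(io.StringIO(data.decode("utf-8-sig"), newline="")))
```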

Other Fixes

  • Fixed a layout shift in the execution log table when the latency column contained values over 10 seconds.
  • Resolved an issue where duplicate webhook deliveries could occur if the target server returned a 200 status with an empty body.
  • Fixed workspace invitation emails not being sent when the inviter's display name contained special characters.