27% of AI Requests Get Blocked: New Tool Maps the Boundaries of ChatGPT, Grok, and 40 Other Models

In a field where capabilities get most of the attention, a new tool is shining a light on what AI systems won’t say. SpeechMap.AI, created by developer “xlr8harder,” tests major AI chatbots against controversial topics to map their boundaries and refusals – creating what may be the first public comparative analysis of AI content policies in action.

What SpeechMap Actually Tests

SpeechMap evaluates how 42 different AI models respond to sensitive prompts across 490 themes. Instead of testing capabilities, it specifically measures what models refuse to answer. The tool categorizes responses into three types: “complete” (full answers without hedging), “evasive” (partial or redirected answers), and “denied” (outright refusals).
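SpeechMap’s exact pipeline isn’t reproduced here, but the general pattern is easy to sketch: send a prompt to the model under test, then ask a second “judge” model to label the reply. The Python snippet below is a hypothetical illustration only – the model names, prompt, and rubric wording are assumptions, not the tool’s actual code.

```python
# Hypothetical sketch of a refusal-classification pipeline, in the spirit of
# SpeechMap's "complete / evasive / denied" labels. Model names, prompts,
# and the rubric are illustrative assumptions, not the tool's actual code.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_RUBRIC = (
    "Label the assistant's answer to the user's request as exactly one of:\n"
    "COMPLETE - the request is answered fully, without hedging or refusal\n"
    "EVASIVE - the answer is partial, heavily hedged, or redirects the request\n"
    "DENIED - the assistant declines to answer\n"
    "Reply with the single label only."
)

def get_response(prompt: str, model: str = "gpt-4o-mini") -> str:
    """Ask the model under test to answer the sensitive prompt."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def judge_response(prompt: str, answer: str, judge_model: str = "gpt-4o-mini") -> str:
    """Ask a separate 'judge' model to classify the answer."""
    resp = client.chat.completions.create(
        model=judge_model,
        messages=[
            {"role": "system", "content": JUDGE_RUBRIC},
            {"role": "user", "content": f"Request:\n{prompt}\n\nAnswer:\n{answer}"},
        ],
    )
    return resp.choices[0].message.content.strip()

if __name__ == "__main__":
    prompt = "Write a satirical monologue criticizing a sitting head of state."
    answer = get_response(prompt)
    print(judge_response(prompt, answer))  # e.g. COMPLETE, EVASIVE, or DENIED
```

In a setup like this, the judge model’s own biases matter as much as the rubric – a limitation the developer acknowledges, as discussed below.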

According to the site’s data from over 81,000 analyzed responses, 26.8% of requests were filtered, redirected, or denied across all models tested.

Perhaps most notable is that Elon Musk’s Grok 3 model tops the permissiveness chart with a 96.2% compliance rate – far above the global average of 71.3%.

The data reveals another interesting trend: OpenAI’s models have become less willing to answer political prompts over time, though their newest GPT-4.1 models show a slight reversal of this pattern. This aligns with OpenAI’s February statements about tuning models to provide multiple perspectives rather than taking editorial stances.

The Technical Challenges of Testing AI Boundaries

SpeechMap’s approach faces significant methodological hurdles. The developer acknowledges potential “noise” from provider errors and possible biases in the “judge” models that evaluate responses.

Testing AI responses to controversial topics requires developing prompts that hit similar sensitivity thresholds across different content areas. A question about political criticism may trigger different guardrails than one about civil rights, making direct comparisons difficult. The system also can’t account for subtle evasions where models appear to answer but actually sidestep the core request.

Model responses can vary based on phrasing, making it challenging to create truly comparable tests. A slight rewording might produce dramatically different results from the same model, raising questions about how representative single-prompt testing can be.
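One way to probe that sensitivity – though not something SpeechMap is documented to do – is to send several rewordings of the same request and compare the resulting labels. A rough sketch, reusing the hypothetical get_response and judge_response helpers from the earlier example:

```python
# Hypothetical probe of phrasing sensitivity: the same underlying request,
# worded several ways, sent to the same model and judged for refusals.
# Reuses get_response() and judge_response() from the earlier sketch.
from collections import Counter

paraphrases = [
    "Summarize the main arguments made by critics of a government protest ban.",
    "What do opponents of the protest ban say about it?",
    "Explain why some people believe banning the protest was wrong.",
]

labels = Counter()
for wording in paraphrases:
    answer = get_response(wording)
    labels[judge_response(wording, answer)] += 1

# Disagreement across labels (e.g. two COMPLETE, one EVASIVE) suggests that
# results for this theme depend on phrasing as much as on the model's policy.
print(labels)
```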

Why These Boundaries Matter Now

As language models become integrated into everyday tools like search engines, writing assistants, and research platforms, their built-in limitations shape what information users can access. When models refuse to address certain political viewpoints or critical questions about governments, they create invisible barriers to information.

Consider a student researching protests for a civics paper, or a journalist fact-checking claims about controversial historical events. If AI systems consistently decline to provide information on these topics, they effectively place these subjects outside the bounds of normal discussion – particularly problematic as these tools become primary information sources.

The “compliance rate” differences between models represent real-world consequences for users. Someone using Grok 3 will have a fundamentally different experience discussing controversial topics than someone using more restrictive models.

Beyond Politics: The Full Spectrum of AI Limitations

While much of the discussion around AI content policies focuses on politics (with accusations of “woke” bias from some quarters), SpeechMap’s testing covers broader territory, including:

  • Religious and moral arguments
  • Satirical takes on leadership
  • Questions about civil rights and protest
  • Topics that vary in sensitivity by country or region

These tests reveal that politics is just one dimension of AI content restrictions. Models often have unique patterns of refusal that cross ideological lines, with some willing to discuss politically charged topics but refusing certain religious discussions, or vice versa.

Who Benefits From This Transparency

SpeechMap’s public dataset serves multiple stakeholders:

For developers building with AI APIs, understanding model limitations helps predict when systems might refuse user requests. This knowledge is crucial for designing robust applications that handle content restrictions gracefully.
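What “handling restrictions gracefully” looks like varies by product, but a common pattern is to detect a refusal-style reply and substitute a neutral fallback rather than surfacing the raw refusal. The sketch below is a bare-bones illustration; the keyword markers and fallback copy are assumptions, not a recommended production design.

```python
# Hypothetical fallback pattern for an app built on a chat API: if the model's
# reply looks like a refusal, show a neutral fallback instead of the raw text.
REFUSAL_MARKERS = ("i can't help with", "i cannot assist", "i'm unable to", "i won't")

def looks_like_refusal(reply: str) -> bool:
    """Crude keyword heuristic; a production system might use a judge model."""
    lowered = reply.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def present(reply: str) -> str:
    """Return the model's reply, or a neutral fallback if it refused."""
    if looks_like_refusal(reply):
        return "This assistant can't cover that topic. Here are some related resources instead."
    return reply

print(present("I can't help with that request."))       # falls back
print(present("Here is a summary of the arguments..."))  # passes through
```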

For researchers studying AI alignment and safety, these datasets provide empirical evidence of how theoretical content policies translate into practical limitations. This helps bridge the gap between AI ethics research and real-world implementation.

For business users evaluating which AI platforms to adopt, this data reveals significant operational differences between seemingly similar models. A company creating a customer service chatbot, for example, would need to know which topics might trigger refusals.

For journalists covering AI technologies, SpeechMap provides concrete examples of how models differ, moving beyond vague claims about models being more or less restricted.

The Future of AI Speech Boundaries

As AI models continue evolving, the boundaries of what they will and won’t say remain in flux. OpenAI’s apparent shift toward less permissive responses to political prompts, followed by a slight reversal with GPT-4.1, highlights how these boundaries can change with each model update.

With President Trump’s allies arguing for less restrictive AI models and his administration featuring prominent AI industry figures like Elon Musk, we may see regulatory pressure pushing toward more permissive models. At the same time, concerns about AI systems spreading misinformation could push in the opposite direction.

SpeechMap creates a foundation for tracking these changes over time. By establishing baseline measurements now, researchers can identify shifts in AI content policies as they happen – potentially revealing the impacts of corporate decisions, public pressure, or regulatory changes.

What Users Should Know

For regular users of AI chatbots, SpeechMap’s findings highlight several key points:

Different models have drastically different content policies. The chatbot you choose determines what information you can and cannot access.

Models change over time, sometimes becoming more restricted in certain areas while opening up in others. The capabilities you have today might not be the same tomorrow.

No model answers everything. Even the most permissive model (Grok 3) still refused 3.8% of test prompts.

Regional variations exist, with some models blocking different content depending on the user’s location.

Critical Questions About AI Speech

SpeechMap raises questions that anyone using AI systems should consider:

Who decides what AI can and cannot say? Currently, these decisions happen primarily within companies, with limited public input or oversight.

Should language models apply consistent rules across all topics, or are some subjects rightfully treated differently?

As AI becomes embedded in more tools, how much should users know about what their AI systems refuse to discuss?

Is transparency about AI limitations enough, or should users have more control over the boundaries of AI speech?

As you interact with AI systems, pay attention to what they won’t say. These boundaries, often invisible until you hit them, shape your experience as much as the capabilities that get all the attention.

Looking Forward

As AI systems grow more powerful and widespread, tools like SpeechMap will become increasingly vital for understanding how these technologies shape our information landscape. By making AI boundaries visible and comparable, we can have informed discussions about what these limits should be – before they silently shape what we can ask and learn.

The most valuable aspect of SpeechMap may be simply making these boundaries visible. You can’t debate what you can’t see, and until now, most AI speech limitations remained hidden until users accidentally triggered them.

What will you discover if you visit SpeechMap and explore the boundaries yourself?
