GAO Warns Protesters of AI Sanctions


By David Timm of Burr & Forman LLP

A Warning

Protesters beware—the Government Accountability Office (GAO) wants you to know they can and will sanction misuse of Generative Artificial Intelligence (GenAI or AI). In the ninth decision1 of its kind since May 2025, GAO dismissed a protest as a sanction for GenAI misuse. Specifically, GAO said, “many of the protester’s citations are simply non-existent or fabricated.”2 In each GAO decision so far, the protesters have filed without legal counsel.3 GAO dismissed the first eight protests for reasons other than misuse of AI, but the ninth time was the charm.

Contractors weighing whether to use AI to craft their protests should consider this a serious warning. GAO maintains an “inherent right to dismiss any protest and to impose sanctions” if the protester uses AI to “undermine the integrity and effectiveness of our process.”4 On top of GenAI’s fundamental problems, there are also concerns about the discoverability of AI chats in lawsuits and the inadvertent disclosure of seemingly private GenAI conversations.5

Popular GenAI systems are built on Large Language Models (LLMs) that essentially predict the next word based on patterns in enormous data sets. These remarkable systems are very good at many things. But in the legal domain, they suffer from at least three stubborn problems: (1) factual hallucination; (2) sycophancy; and (3) overconfidence. In other words, AI is different from most lawyers: it seldom says “well, it depends.” More importantly, it has a unique tendency to confidently make up information out of whole cloth. It is this characteristic that is leading to a deluge of sanctions from various courts and now GAO. AI will quickly give you something that looks good and sounds like what you want. The second you dig in? You realize the AI says a protest was sustained when it was denied. Or it (inexplicably) cites the wrong case for a real protest. Or it fabricates information entirely when it cannot find the support it needs to make you happy.
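To make the next-word-prediction point concrete, here is a deliberately tiny sketch in Python (my own illustration, not a description of how any commercial system is actually built): a word-counting “model” that always emits the statistically likeliest next word, whether or not the resulting sentence happens to be true.

# A toy bigram "language model" for illustration only. It has no concept of
# truth; it simply emits whichever word most often followed the previous word
# in its tiny, hypothetical training text.
from collections import Counter, defaultdict

training_text = (
    "the protest was denied . the protest was dismissed . "
    "the protest was sustained in part ."
).split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for prev, nxt in zip(training_text, training_text[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the statistically most likely next word; guess if unseen."""
    if word in following:
        return following[word].most_common(1)[0][0]
    return "denied"  # even with nothing to go on, it still answers

sentence = ["the"]
for _ in range(4):
    sentence.append(predict_next(sentence[-1]))
print(" ".join(sentence))  # prints "the protest was denied ." regardless of the facts

Scale that idea up by billions of parameters and you get fluent, confident prose; nowhere in the mechanism is there a built-in check for whether the output is accurate.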

In the bid protest context, unhappy bidders draft a letter outlining the factual and legal grounds for their position. This includes supporting law, such as the Federal Acquisition Regulation, and legal precedent. It goes without saying that the citations submitted with any protest must be correct in form and must support the position the protester is asserting. When a lawyer submits a protest on behalf of a client, the lawyer is bound by the rules of professional conduct. That lawyer must take the time to ensure that what he or she is submitting is true. In any event, the protester or their lawyer must make sure they are not misleading GAO by providing false information, including citations. To do otherwise risks not only the success of the protest, but also sanctions and reputational damage.

Hallucination

First, AI has a strange and unique tendency to invent information. Suppose you’re talking with a lawyer and ask them an obscure legal question. They are unlikely to instantly respond with a perfectly plausible but entirely made-up reference to a case. Yet this is exactly what AI does on a regular basis. In my own practice, clients are now regularly sending me AI-generated documents. In each case, I find hallucinated case law, riddled with factual distortions and inaccurate summaries. From my perspective, the benefit of clients doing work with AI before engaging me is outweighed by the time wasted untangling what the AI gets right from what it gets terribly wrong.

I commonly hear the refrain: “But even if there are minor problems, AI today is as bad as it will ever be.” The implication is that even if there are some kinks to work out, the problems will be resolved in short order. After all, technology improves exponentially, right? Not true. The newer and more complicated reasoning models, which have become ubiquitous, hallucinate significantly more than older models.6 This problem, in other words, has only gotten worse, and it shows no signs of improving. Some experts believe this is a fundamentally intractable problem with LLMs: “These systems use mathematical probabilities to guess the best response. . . [and] ‘[d]espite our best efforts, they will always hallucinate.’”7 OpenAI’s whitepaper, Why Language Models Hallucinate, concluded that hallucinations are inevitable, at least with current models.8

One underreported issue is that hallucinations do not occur only in the form of fabricated or inaccurate legal citations. Any fact or proposition can be hallucinated. The reason we hear about citations is that they are objective and verifiable. But rest assured, hallucinations are happening at every level of GenAI output, including basic facts and numbers.

This is bad news for laymen who lack domain-specific expertise because they are unlikely to detect when AI spits out something that smells fishy. This leads directly into the next problem, which is sycophancy.

Sycophancy

Second, people intuitively understand what happens when you surround yourself with sycophants or “yes-men”: you lose a critical ability to detect flaws in your own logic. AI is the ultimate sycophant, designed to flatter and please the user. If you don’t believe me, go ask your AI what your chat history says about you.9 If it says something critical about you, I’ll eat my hat.

OpenAI and other LLM developers admit that this is a problem.10 In April 2025, OpenAI wrote that after an update, “GPT‑4o skewed towards responses that were overly supportive but disingenuous.” The issue caused the company to roll back the update and commit to building in more “guardrails to increase honesty and transparency.” Tweaking model behavior on the back end is all well and good, but right now baseline levels of sycophancy are very high. Moreover, it is in these companies’ commercial interest to make LLMs sycophantic so that users like their AI and want to use it more.

Even with advanced prompting, experienced users are likely to get high levels of agreement from their chatbot with little critical pushback. And prompting and back-end fiddling are only marginally useful. These LLMs are so large and complex that they are difficult to direct. As a beta tester for legal GenAI tools, I’ve experienced serious failures with products that I brought to the platforms’ technical teams. They took my feedback but were unable to promise any improvements or real changes because these systems are a black box. To illustrate this point, consider a conversation I had with a C-Level executive a couple of months ago. No matter what sort of back-end prompting they tried, their AI wouldn’t stop telling its users “bless your heart.”

If you ask a lawyer about your odds for success on a case, they are likely to hedge because there is inherent uncertainty in litigation and there are many contingent facts, both known and unknown, that can intervene to change the outcome. A reasoned response from a lawyer to a client is based on that lawyer’s experience. A lawyer will tell a client what the client needs to hear. While a lawyer may want to please their client, the lawyer will (or at least should) give a thoughtful response, rather than a response that simply confirms what the client wants to hear. 

Understandably, clients may prefer to hear that they are totally right and will easily win their case. For better or worse, GenAI will seldom question underlying assumptions. This sycophancy is part of what likely contributes to AI hallucination. You want a case to support your argument—but there is none. What is a “yes-man” to do? Well, make it up, of course!

Overconfidence

Third, an underlying limitation of GenAI is that it expresses extremely high confidence in its answers, even under direct questioning. I’ve had exchanges with GenAI where it gave me a suspect citation. After searching for the case (which looked amazing and supported my viewpoint perfectly), I concluded it was fake. If I challenge the AI and ask where it found the case, the response tends to go one of two ways. In some cases, the AI will repeatedly insist, with absolute confidence, that the case is real. This doubling down is problematic for obvious reasons, but it is even worse when the user is a layman (or a lawyer unfamiliar with the inherent issues of GenAI) who can’t or won’t check the AI’s accuracy. The other response relates back to sycophancy: the AI becomes incredibly apologetic, admits that it made up the citation, and tries to account for its mistake. In some cases, that means searching for a substitute citation, which often leads right back to square one: another fake case.

Here’s the issue: a word-probability machine does not have the ability to generate confidence intervals.11 GenAI is not currently capable of reliable self-reflection, uncertainty, or doubt. Moreover, these LLMs are trained to give an answer even if they are uncertain. Think of a test where guessing is not penalized: it is a basic fact that LLMs are trained to guess when uncertain.12 Guessing is therefore a feature, not a bug.
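To illustrate the test-taking incentive with a quick, hypothetical calculation (the numbers are mine, not drawn from the cited paper): on a four-option multiple-choice question where a blank answer and a wrong answer both score zero, guessing always has a higher expected score than abstaining.

# Hypothetical scoring illustration: when wrong answers and "I don't know"
# both earn zero, a system optimized for the score should always guess.
num_choices = 4                      # a four-option multiple-choice question
p_correct_guess = 1 / num_choices    # a random guess is right 25% of the time

expected_if_guessing = p_correct_guess * 1 + (1 - p_correct_guess) * 0   # 0.25
expected_if_abstaining = 0.0         # a blank answer can never earn credit

print(f"Expected score if guessing:   {expected_if_guessing:.2f}")
print(f"Expected score if abstaining: {expected_if_abstaining:.2f}")

Under that kind of scoring, “always answer” is the optimal policy, which is exactly the behavior users then experience as unwarranted confidence.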

Further, these systems do not arrive at answers the same way humans do. It is very easy to anthropomorphize this technology when it presents as an intelligent mind, and even easier when it tells you exactly what you want to hear. I understand that it can be maddening to hear a lawyer answer “it depends” to a seemingly straightforward question. But unfortunately, the answers AI gives are often the equivalent of empty calories: all form and no substance.

Privacy

Fourth, on top of these three intractable problems are privacy, confidentiality, and related concerns. On the legal front, OpenAI CEO Sam Altman has made it clear that personal legal details should not be shared with ChatGPT because “[i]f you go talk to ChatGPT about your most sensitive stuff and then there’s a lawsuit, we could be required to produce that.”13 Altman was referring to a court order in the copyright dispute between The New York Times and OpenAI that prevents OpenAI from deleting chats.14 Unlike communications with a lawyer, conversations with a chatbot are protected by no established legal privilege. Personally, I am interested in issuing a discovery request for “all relevant conversations with GenAI.” I do not see how an opposing party could justify withholding this information, and even if they deleted it,15 OpenAI is telling us that these records now exist in the cloud and could be reached by subpoena.

A more recent concern arose when Fast Company broke the story that seemingly private conversations with ChatGPT were searchable after being indexed by Google.16 Users had the option to “share” a link to a conversation by various means, including by text message. When a user created such a link, even to share with a single other person, the link was publicly viewable. As a result, it was indexed by Google and searchable by anyone on the web. OpenAI has since moved aggressively to change the feature and scrub the contents from Google.17 Although this “breach” has seemingly been resolved, many of these conversations have been preserved indefinitely on websites like Archive.org.18

Both of these stories make me think that contractors will use GenAI without understanding the confidentiality, privacy, and privilege risks. Even if you have hired a lawyer, disclosing information to a third party like OpenAI could waive attorney-client privilege. More importantly, these incidents are probably just the start of data-related breaches.

What Next at GAO?

For the record, I am optimistic about the use of AI despite the criticisms outlined here. And notably, I do not believe that hiring a lawyer is a silver bullet for these concerns. There are countless cases of lawyers being sanctioned for misuse of AI.19 In fact, more than one judge has now had to withdraw orders based on misuse of AI. But at least in the bid protest context, hiring a lawyer could be beneficial, especially one with significant government contracts experience. At a minimum, regardless of whether you file yourself or with a lawyer, you need to verify any AI outputs in your protest.

I’ve been tracking these decisions since the first one came out in May. While most of these protests were probably doomed regardless, one might have had a chance with a lawyer on the team.20 Another intriguing question is whether the protester would even have filed without access to AI. Perhaps, without AI, the company never would have filed the protest in the first place, or perhaps it would have hired a lawyer to make the argument. But because it had access to, and (unwarranted) confidence in, AI, it decided to protest without one.

In any case, reliance on AI is becoming much more normalized. In the latest decision, GAO was scathing in dismissing the protest as a sanction for AI misuse. The protester’s conduct was particularly egregious because it had been warned twice in prior decisions. Although the protester did not admit to misusing AI, GAO reasoned that the outcome (inaccurate or fabricated citations and facts) was the same. GAO said the protester’s “actions in these recent protests demonstrate its reckless disregard in advancing non-existent citations or decisions in its protest submissions.”21 I could not agree more.

The problems with relying on AI will persist, so there are likely to be an increasing number of decisions in which AI misuse is an issue. I am aware of multiple instances of AI misuse at GAO where no public decision was issued, and I’m confident there are other examples that are not public. Expect intervenors and agency counsel to move, as standard operating procedure, to dismiss protests based solely on inaccurate citations. The prospect of dismissal as a sanction has only sharpened the incentive to find inaccuracies.

Conclusion

In an attempt to save cost and time, protesters are increasingly relying on AI rather than hiring legal counsel. There is no shortcut, and rolling the dice on an overreliance on AI can create significant problems. You may have an excellent case. Do not let the issues set forth in this article torpedo what might otherwise be a good, clean win.

GAO does not like these false citations. And GAO is right. Tracking down false citations is a waste of time for everyone involved. Agency counsel are getting up to speed on this issue and are aggressively moving to dismiss protests based on AI misuse. Ultimately, protesters care about whether they can win their protest. If GAO sees these sorts of issues in your protest, it is likely to downgrade your credibility. It does not matter whether GAO explicitly dismisses your protest because it contains hallucinated case law or simply treats the remainder of your argument with more skepticism. The effect is the same: your chances of winning get worse.

David Timm is an Associate in the Washington, DC office of Burr & Forman and a member of the firm’s Construction & Project Development practice group. He represents contractors and companies in complex disputes, claims, and bid protests involving federal, state, and local government contracts. He can be reached at [email protected] or 771.232.1696.

End Notes

1 Raven Investigations & Security Consulting, LLC B-423447 (May 7, 2025); Oready, LLC B-423524 (June 5, 2025)(unpublished); Assessment and Training Solutions Consulting Corporation B-423398 (June 27, 2025); Wright Brothers Aero, Inc. B-423326.2 (July 7, 2025); BioneX, LLC B-423630 (July 25, 2025); Oready, LLC B-423524.2 (August 13, 2025); Helgen Indus. d/b/a DeSantis Gunhide B-423635 (August 26, 2025); IBS Government Services, Inc., B-423583 (August 29, 2025); Oready, LLC B-423649 (September 25, 2025).

2 Oready, LLC B-423649 (September 25, 2025).

3 To date, I am not aware of any AI hallucination discussion at the Court of Federal Claims (“COFC”) in the bid protest context. However, COFC has discussed AI misuse in an unusual $1B slavery reparations claim. See Sanders v. United States, No. 24-cv-1301 (March 31, 2025)(“The cases referenced by Plaintiff have the hallmarks of cases generated by AI found in other courts.”).

4 Raven Investigations & Security Consulting, LLC B-423447 (May 7, 2025).

5 Schwartz, E. H. (2025, August 1). OpenAI pulls chat sharing tool after Google search privacy scare. TechRadar. https://www.techradar.com/ai-platforms-assistants/chatgpt/openai-pulls-chat-sharing-tool-after-google-search-privacy-scare; Al-Sibai, N. (2025, July 28). If you’ve asked ChatGPT a legal question, you may have accidentally doomed yourself in court. Futurism. https://futurism.com/chatgpt-legal-questions-court

6 Yao, Z., Liu, Y., Chen, Y., Chen, J., Fang, J., Hou, L., Li, J., & Chua, T.-S. (2025). Are reasoning models more prone to hallucination? arXiv preprint arXiv:2505.23646 (“LRMs [are] consistently improving their ability to solve formal tasks, [but] bring inconsistent effects in terms of hallucination on fact-seeking tasks.”); see also Metz, C., & Weise, K. (2025, May 5). AI is getting smarter, but hallucinations are getting worse. The New York Times; see also Zeff, M. (2025, April 18). OpenAI’s new reasoning AI models hallucinate more. TechCrunch. https://techcrunch.com/2025/04/18/openais-new-reasoning-ai-models-hallucinate-more/ (“According to OpenAI’s internal tests, o3 and o4-mini. . . hallucinate more often than the company’s previous reasoning models. . .”)(Emphasis in original).

7 Metz, C., & Weise, K. (2025, May 5). AI is getting smarter, but hallucinations are getting worse. The New York Times. https://www.nytimes.com/2025/05/05/technology/ai-hallucinations-chatgpt-google.html (Emphasis added).

8 Kalai, A. T., Nachum, O., Vempala, S. S., & Zhang, E. (2025, September 4). Why language models hallucinate. arXiv. https://arxiv.org/pdf/2509.04664

9 ChatGPT says that I am “ambitious, reflective, strategic, and intentional. You pursue excellence while staying thoughtful about relationships, systems, and the broader impact of your actions. You value progress—both personal and societal—and you invest in it with care.”

10 OpenAI. (2025, April 29). Sycophancy in GPT-4o: What happened and what we’re doing about it. OpenAI Blog. https://openai.com/index/sycophancy-in-gpt-4o/

11 Lewis, P. (2025, June 17). Can we trust generative AI to know and tell us when it doesn’t know the answer? Ontario Tech University News. https://news.ontariotechu.ca/archives/2025/06/can-we-trust-generative-ai-to-know-and-tell-us-when-it-doesnt-know-the-answer.php (“research shows that AI systems are often overconfident in what they tell us, and are not able to judge their own ability very well.”).

12 Kalai, A. T., Nachum, O., Vempala, S. S., & Zhang, E. (2025, September 4). Why language models hallucinate. arXiv. https://arxiv.org/pdf/2509.04664

13 Von, T. (Host). (2025, July 24). Sam Altman [Podcast episode 599]. In This Past Weekend w/ Theo Von. https://open.spotify.com/episode/272maKnMzjm0Sb4bDqzZ2y

14 Werth, T. B. (2025, June 5). Court orders OpenAI to save all ChatGPT chats. Mashable. https://mashable.com/article/court-orders-openai-to-save-all-chatgpt-chats

15 Deleting discoverable evidence can result in sanctions for spoliation of evidence.

16 Stokel-Walker, C. (2025, July 31). Exclusive: Google is indexing ChatGPT conversations, potentially exposing sensitive user data. Fast Company. https://www.fastcompany.com/91376687/google-indexing-chatgpt-conversations

17 Shanklin, W. (2025, Aug. 1). OpenAI is removing ChatGPT conversations from Google. Engadget. https://www.engadget.com/ai/openai-is-removing-chatgpt-conversations-from-google-194735704.html

18 Tech Desk. (2025, Aug. 3). ChatGPT Privacy Breach Exposes User Chats via Wayback Machine: Urgent Risks Revealed. ZoomBangla. https://inews.zoombangla.com/chatgpt-privacy-breach-wayback-machine-exposure/

19 Patrice, J. (2025, July 23). Partner Who Wrote About AI Ethics, Fired For Citing Fake AI Cases. Above the Law. https://abovethelaw.com/2025/07/partner-who-wrote-about-ai-ethics-fired-for-citing-fake-ai-cases/

20 BioneX argued that “FAR clause 52.219-4 operates by default.” The regulation includes language stating that the clause “shall” be inserted into solicitations like this one, but it was left out. GAO said that the dismissal was based on failure to provide a “detailed statement of the legal and factual grounds for the protest.”

21 Oready, LLC B-423649 (September 25, 2025)(Noting, “the material aggravating factor that the protester’s misconduct in submitting non-existent citations or decisions was not a single occurrence, but, rather, included a multitude of such problematic citations and decisions across multiple protests. Most troubling, the misconduct continued after GAO explicitly warned the protester of the potential for sanctions if such misconduct persisted.”).
