How the US Military Might Use Generative AI Chatbots in Targeting Decisions — A Factual Summary

On background, a Defense Department official described to MIT Technology Review a potential workflow in which generative AI chatbots are used to analyze and prioritize lists of candidate targets. According to the official, a list of possible targets could be fed into a generative AI system; operators might then ask the system to evaluate and rank those targets while accounting for operational factors such as aircraft locations. The official said humans would be responsible for vetting and approving any AI-produced recommendations, and framed the description as an illustrative example rather than a confirmation of specific current practice.

What was described

  • Inputs and outputs: A list of candidate targets could be provided to a generative AI system, which would analyze the items and return a prioritized ranking of which targets to strike first, based on available contextual data (a minimal sketch of this workflow appears after this list).
  • Human in the loop: The official emphasized that humans would check and evaluate the system’s recommendations before actions were taken.
  • Possible vendor models: The article notes that OpenAI’s ChatGPT and xAI’s Grok could theoretically be used in such scenarios because both companies have reached agreements with the Pentagon for classified use. Other reporting has indicated Anthropic’s Claude has been integrated into some military systems.
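
As a minimal sketch of the workflow these bullets describe — assuming a hypothetical `query_model` function in place of any real vendor endpoint, and with all names and fields invented rather than drawn from the reporting — the loop might look like:

```python
import json


def query_model(prompt: str) -> str:
    """Hypothetical stand-in for an approved generative-model endpoint.

    No specific vendor API is implied; a real deployment would replace this
    with whatever client the accredited environment provides.
    """
    raise NotImplementedError


def rank_targets(candidates: list[dict], context: dict) -> list[dict]:
    """Serialize candidate targets and operational context (e.g. aircraft
    locations) into a prompt, then parse the model's prioritized ordering."""
    prompt = (
        "Given the candidate targets and operational context below, "
        "return only a JSON array of target ids ordered by priority.\n"
        f"Targets: {json.dumps(candidates)}\n"
        f"Context: {json.dumps(context)}"
    )
    ordered_ids = json.loads(query_model(prompt))
    by_id = {c["id"]: c for c in candidates}
    # Drop any ids the model invented: they correspond to no real candidate.
    return [by_id[i] for i in ordered_ids if i in by_id]


def human_vet(recommendations: list[dict]) -> list[dict]:
    """Explicit human approval gate, reflecting the official's framing that
    humans vet and approve anything the model recommends."""
    approved = []
    for rec in recommendations:
        if input(f"Approve {rec['id']}? [y/N] ").strip().lower() == "y":
            approved.append(rec)
    return approved
```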

Context and technological lineage

  • Maven and older AI programs: Since at least 2017, the Pentagon’s Maven initiative has used computer-vision and other non-generative AI techniques to process imagery and highlight potential targets on map-based dashboards, enabling human analysts to inspect and vet automated flags.
  • Generative AI as a conversational layer: The official’s remarks suggest generative large language models might be layered on top of existing data-processing systems to provide conversational access, summarization, and prioritization of candidate targets (see the sketch below).
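
One way to picture that layering — again using a hypothetical `query_model` stub and invented detection fields — is a model that sits above the existing pipeline’s structured output and only mediates access to it:

```python
import json


def query_model(prompt: str) -> str:
    """Hypothetical model-endpoint stub, as in the earlier sketch."""
    raise NotImplementedError


def ask_about_detections(detections: list[dict], question: str) -> str:
    """Expose structured detections from an existing (Maven-style) pipeline
    through a conversational interface. The generative model adds
    summarization and natural-language access; it does not replace the
    underlying computer-vision processing."""
    prompt = (
        "Answer the analyst's question using only the detection records below.\n"
        f"Detections: {json.dumps(detections)}\n"
        f"Question: {question}"
    )
    return query_model(prompt)
```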

Key technical and operational distinctions

  • Model differences: Generative large language models (LLMs) produce synthesized text and conversational outputs, while earlier systems like Maven produced more directly interpretable artifacts (images, map markers). LLM outputs can be easier to query but are typically harder to verify against raw data.
  • Verification needs: Because generative outputs synthesize information, human reviewers must cross-check underlying evidence; this verification requirement may limit the net time savings offered by automation unless verification processes are also improved (one mechanical pre-check is sketched below).
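
To make the cross-checking requirement concrete, here is one mechanical pre-check a review tool might run before a human reads the model’s narrative: confirm that every evidence id the output cites actually exists in the raw data store. The function and field names are illustrative, not drawn from any described system.

```python
def missing_evidence(recommendation: dict, evidence_store: dict) -> list[str]:
    """Return cited evidence ids that do not exist in the raw data store.

    A non-empty result flags a recommendation whose narrative cannot be
    traced back to source records and so needs closer human review."""
    cited = recommendation.get("evidence_ids", [])
    return [eid for eid in cited if eid not in evidence_store]


# Example: the model cited "img-204", which the store does not contain.
store = {"img-101": "...", "sig-317": "..."}
rec = {"id": "T-7", "evidence_ids": ["img-101", "img-204"]}
assert missing_evidence(rec, store) == ["img-204"]
```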

Relevant developments and scrutiny

  • Approved models and procurement: Anthropic’s Claude has been approved for some classified Pentagon uses and has been reported as integrated into some military systems; OpenAI and xAI have also reached agreements to provide models for classified environments. The Pentagon limits which generative models are approved for classified use.
  • Public scrutiny and incidents: The military’s use of AI tools is under increased scrutiny following a deadly strike on a girls’ school in Iran. Multiple outlets have reported that the strike involved a US missile and investigations are ongoing; reporting has suggested outdated targeting data may have been a contributing factor.
  • Policy and vendor disputes: The article notes tensions between the Pentagon and some AI vendors—Anthropic was designated a supply-chain risk by the Defense Department after disagreements over usage restrictions and is contesting that designation in court.

What is confirmed versus illustrative

  • Confirmed by reporting: The Pentagon is experimenting with and fielding AI tools; a small set of generative models has been approved for classified use; generative AI could be used as a conversational layer to help prioritize targets; humans are presented as the final check on recommendations.
  • Not confirmed or unresolved: The on-background official did not confirm whether the specific targeting workflow described is currently in operational use. Independent public verification of the exact role and degree of autonomy of generative models in live targeting decisions is limited.

Practical implications (factual)

  • Trade-offs: Generative chatbots can accelerate retrieval and synthesis of information, but their outputs require human validation, which affects expected efficiency gains.
  • Auditability and traceability: Deploying LLMs in targeting workflows increases the need for logging and mechanisms that allow human reviewers to trace recommendations back to raw evidence (one minimal pattern is sketched after this list).
  • Procurement and policy constraints: Model availability and permitted use cases are shaped by vendor agreements, Pentagon approvals, supply-chain assessments, and broader policy and political considerations.
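
As a sketch of what such traceability mechanisms could look like — not a description of any fielded system — an append-only, hash-chained log can tie each recommendation to the prompt that produced it, the evidence it cited, and the reviewer’s decision:

```python
import hashlib
import json
import time


def append_log_entry(log: list[dict], recommendation_id: str, prompt: str,
                     evidence_ids: list[str], reviewer: str,
                     decision: str) -> dict:
    """Append a tamper-evident record linking a model recommendation to its
    prompt, the evidence it cited, and the human reviewer's decision."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    entry = {
        "timestamp": time.time(),
        "recommendation_id": recommendation_id,
        # Hash the prompt rather than storing it, in case it is sensitive.
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "evidence_ids": evidence_ids,
        "reviewer": reviewer,
        "decision": decision,
        "prev_hash": prev_hash,  # chains entries so later edits are detectable
    }
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    log.append(entry)
    return entry


log: list[dict] = []
append_log_entry(log, "T-7", "rank these targets ...",
                 ["img-101", "sig-317"], "analyst-42", "approved")
```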

Conclusion

Reporting based on an on-background Defense Department official indicates that generative AI chatbots are being considered as a conversational analysis layer to help prioritize targeting options, with humans described as retaining final vetting responsibility. The broader context includes the Pentagon’s earlier use of AI for imagery analysis, recent approvals of a small set of generative models for classified use, ongoing vendor disputes, and heightened public scrutiny following a lethal strike. The exact operational role of generative models in targeting decisions remains partially unverified in the public record and should be assessed against official procurement documents, policy statements, and investigation findings where available.
