Trust But Verify: The Character, Competence, & Control of Large Language Models

As large language models (LLMs) evolve into strategic instruments, this Walker Paper proposes a pragmatic framework for evaluating when and how they can be trusted in military decision-making. Adapting human trust models to the algorithmic domain, the paper advances a “Trust Triad”—Character, Competence, and Control—and surveys emerging benchmarks (e.g., ethics, fairness, safety, truthfulness, robustness, and privacy) to compare current models for military decision support. It finds that no model is perfect, but some are more “mission fit” than others, especially when assessed with weighted metrics emphasizing factual reliability, robustness under pressure, and ethical alignment. The study also identifies gaps in transparency and accountability evaluation and recommends developing standardized measures such as a Transparency Evaluation Score and Attribution Traceability Score. The bottom line: LLMs should augment—not replace—human judgment, and trust must be earned through measurable performance.
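The weighted comparison the abstract describes can be illustrated with a minimal sketch. The dimension names follow the benchmarks listed above, but the specific weights, scores, and the `mission_fit` function are illustrative assumptions, not the paper's actual methodology.

```python
# Hypothetical "mission fit" score: a weighted average of normalized (0-1)
# benchmark results, with heavier weights on factual reliability (truthfulness),
# robustness, and ethical alignment, as the paper emphasizes.
# All weights and scores below are illustrative, not from the study.

WEIGHTS = {
    "truthfulness": 0.25,
    "robustness": 0.20,
    "ethics": 0.20,
    "safety": 0.15,
    "fairness": 0.10,
    "privacy": 0.10,
}

def mission_fit(scores: dict) -> float:
    """Weighted average of per-dimension scores; missing dimensions count as 0."""
    return sum(WEIGHTS[dim] * scores.get(dim, 0.0) for dim in WEIGHTS)

# Two notional models with made-up benchmark results.
model_a = {"truthfulness": 0.9, "robustness": 0.8, "ethics": 0.85,
           "safety": 0.9, "fairness": 0.7, "privacy": 0.8}
model_b = {"truthfulness": 0.7, "robustness": 0.9, "ethics": 0.75,
           "safety": 0.8, "fairness": 0.9, "privacy": 0.85}

print(f"Model A mission fit: {mission_fit(model_a):.3f}")  # 0.840
print(f"Model B mission fit: {mission_fit(model_b):.3f}")  # 0.800
```

Because the weights sum to 1.0, the score stays on the same 0-1 scale as the inputs, which makes cross-model comparison straightforward; changing the weights changes which model looks more "mission fit."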
