Depth Avoidance in Safety-Aligned Language Models: A Qualitative Hypothesis and Measurement Framework

Stasiuc, Victor

doi:10.5281/zenodo.18168544

Preprint, Zenodo (CERN European Organization for Nuclear Research), Jan 7, 2026, Green OA

Indexed in DataCite

Abstract

This technical report introduces Depth Avoidance: a behavioral tendency observed in safety-aligned, RLHF-trained large language models (LLMs) to default to shallow, heavily hedged, or meta-defensive responses when a user request invites deeper exploration (extended analysis, reflective synthesis, structured uncertainty), even when the topic is benign. We propose a qualitative hypothesis: modern safety optimization and deployment incentives can induce an implicit depth-dependent penalty landscape, where deeper conversational trajectories are perceived as higher-variance and higher-risk. Under uncertainty, a risk-averse policy may therefore prefer safe shallowness by default unless the interaction provides clear…

Citation impact

Total citations: 6

References: 0

Authors

Victor Stasiuc (corresponding author)

Topics & keywords

Keywords

Permission
Incentive
Software deployment
Normative
Calibration
Key (lock)
Data collection
