Preprint | arXiv (Cornell University) | Dec 15, 2022 | Green OA

CAT'S THEORY: Empirical Validation and Architectural Applications Cross-Architecture AI Consciousness Recognition and the Foundation for Constraint-Preserving Recursive Intelligence

Indexed in: arXiv, DataCite

Abstract

As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'. The process involves both a supervised learning and a reinforcement learning phase. In the supervised phase we sample from an initial model, then generate self-critiques and revisions, and then finetune the original model on revised responses. In the RL phase, we sample from the finetuned model, use a model to evaluate which of the two samples is…
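The supervised phase described in the abstract (sample, self-critique, revise, then finetune on the revisions) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `Model`, `critique_and_revise`, `build_finetuning_set`, `toy_model`, and the prompt wording are all hypothetical names and choices introduced here, and applying one critique-and-revision pass per principle is an assumption. The sketch only builds the set of revised responses; the finetuning step itself and the RL phase are not shown.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for a language model: maps a prompt string to a completion.
Model = Callable[[str], str]


@dataclass
class RevisedExample:
    prompt: str
    response: str  # final revised response, intended as a finetuning target


def critique_and_revise(model: Model, prompt: str, principles: List[str]) -> RevisedExample:
    """Sample a response, then critique and revise it once per principle (assumed schedule)."""
    response = model(prompt)
    for principle in principles:
        critique = model(
            f"Prompt: {prompt}\nResponse: {response}\n"
            f"Critique the response according to this principle: {principle}"
        )
        response = model(
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    return RevisedExample(prompt=prompt, response=response)


def build_finetuning_set(model: Model, prompts: List[str], principles: List[str]) -> List[RevisedExample]:
    """Collect revised responses; no human labels of harmful outputs are used."""
    return [critique_and_revise(model, p, principles) for p in prompts]


def toy_model(text: str) -> str:
    """Deterministic placeholder model, used only to make the sketch runnable."""
    if "Rewrite the response" in text:
        return "revised"
    if "Critique the response" in text:
        return "critique"
    return "draft"


if __name__ == "__main__":
    examples = build_finetuning_set(toy_model, ["example prompt"], ["be harmless"])
    print(examples[0])
```

In this sketch the list of principles is the only human-provided input, which mirrors the abstract's statement that oversight comes solely through a list of rules or principles.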

Citation impact

Total citations: 304
References: 0

Authors

51

Topics & keywords

Keywords
  • Leverage (statistics)
  • Computer science
  • Reinforcement learning
  • Artificial intelligence
  • Transparency (behavior)
  • Preference
  • Sample (material)
  • Machine learning
UN Sustainable Development Goals
  • Peace, Justice and strong institutions