Structural Limits of Training-Derived
Information-Processing Systems

Ontological Creativity in Generative AI

Jarno Olavi Matarmaa

Ural Federal University · Educational and Scientific Center of AI · 2026

Supervisor: A. Yu. Dolganov, Candidate of Technical Sciences, Docent

Dissertation Passport — §1

Relevance of the Research Topic

📈 Practical Urgency

Generative AI systems are deployed in high-stakes domains — healthcare, law, science — while fundamental questions about their creative and epistemic limits remain unresolved, creating governance and safety risks.

🔬 Scientific Gap

Existing AI capability assessments rely on performance benchmarks without distinguishing surface novelty (recombination) from ontological novelty (genuine representational extension). No formal theorem characterised this boundary prior to this work.

⚖️ Societal Implications

Inflated claims of machine creativity undermine accountability structures, distort policy, and erode the human authority necessary for responsible AI deployment. A rigorous empirical basis is essential.

Dissertation Passport — §2

Object and Subject of the Research

Object of Research

Training-derived information-processing systems

Transformer-based large language models and diffusion-based generators trained via continuous optimisation over large textual and/or perceptual corpora (GPT, Claude, Gemini, LLaMA, Mistral variants).

Subject of Research

Structural limits with respect to ontological creativity

Whether such systems can produce outputs that extend beyond the representational basis of their training distribution — i.e., whether a basis change is achievable through internal operations.

Dissertation Passport — §3

Goals and Objectives

Goal

Establish a formal and empirically validated framework characterising the structural limits of ontological creativity in training-derived generative AI systems.

O1

Develop rigorous vocabulary distinguishing information, knowledge, intelligence, and creativity in AI contexts.

O2

Formalise the No New Basis condition for continuously optimised neural generators (Theorem 2.1).

O3

Design and execute a multi-test empirical evaluation suite (T1–T5) spanning ontological innovation, epistemic agency, theory generation, category recognition, and mechanistic analysis.

O4

Derive AI governance and deployment recommendations based on calibrated traceability evidence.

Dissertation Passport — §4

Scientific Novelty

① No New Basis Theorem (Theorem 2.1)

First formal proof that continuously optimised neural generators cannot produce outputs outside continuous deformations of the training manifold through internal operations — establishing a mathematically grounded creativity ceiling.

② Operational Novelty Criterion

Novel operational distinction between surface novelty (recombination within manifold) and ontological novelty (basis change) — enabling empirically testable hypothesis formulation for any generative system.

③ Multi-Dimensional Evaluation Suite (T1–T5)

Original reproducible evaluation protocol combining semantic geometry, traceability scoring, theory-generation benchmarking, category recognition, and mechanistic attention analysis — validated across 7 models, 840+ responses.

④ Cross-Layer Convergence

First convergent validation linking behavioral creativity limits (T1–T4) to internal representational geometry (T5 mechanistic), confirming constraint at the representation level, not only in outputs.

Motivation & Research Questions

What Can Generative AI Actually Do?

RQ1 — Vocabulary

What distinctions between information, knowledge, and intelligence are needed to evaluate AI claims of understanding?

🔬

RQ2 — Consciousness

What would count as evidence for machine consciousness, and how do behavioral proxies bear on that evidence?

🌀

RQ3 — Creativity

When generative systems appear creative, do they introduce genuinely new conceptual primitives, or recombine within learned manifolds?

📐

Central Claim

Current generators excel at within-ontology recombination but provide no internal mechanism for ontological basis-extension.

Theoretical Foundation

Creativity Taxonomy & Ontological Extension

Combinational

Applies operators to primitives — novel combinations within fixed space

Exploratory

Systematic search through output space — finds non-obvious outputs

Transformational

Modifies operators/rules — expands the conceptual space itself

Ontological ★

Introduces new primitives P′ where P′ ∉ O(P) — new type structure

Theorem 2.1 — No New Basis

Let A be a neural architecture over representation space ℝ^d trained on D = {x₁,…,x_N}. Let M = span{x₁,…,x_N}. Then all generated outputs x = G(z) satisfy x ∈ M̃, where M̃ is a continuous deformation of M with dim(M̃) = dim(M).

Corollary: No internal mechanism can introduce new basis primitives at inference time.
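
A toy numerical reading of the span part of this statement, assuming a purely linear generator (a simplification that is not taken from the dissertation and does not cover the continuous-deformation part): any output that is a linear combination of the training vectors has zero residual after projection onto M, while a vector with a component outside M does not. The helper names `orthonormal_basis` and `residual_norm` and the three training vectors are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal basis of span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis

def residual_norm(x, basis):
    """Norm of the component of x orthogonal to the span."""
    r = list(x)
    for b in basis:
        c = dot(r, b)
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return math.sqrt(dot(r, r))

# Hypothetical training set in R^3 that spans only the x-y plane
train = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, 1.0, 0.0]]
basis = orthonormal_basis(train)

inside = [0.5 * a + 2.0 * b for a, b in zip(train[0], train[1])]  # linear mix
outside = [1.0, 1.0, 1.0]  # has a z-component that is not in the span

print(len(basis))                    # dimension of M
print(residual_norm(inside, basis))  # zero: stays in M
print(residual_norm(outside, basis)) # non-zero: leaves M
```

In this toy setting dim(M) is 2 even though three training vectors are given, because the third is a combination of the first two.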

Experimental Design — Dataset

Multimodal Sensory Dataset & Test Suite

Figure: Representative multimodal dataset examples

7 scenario families · 8 sensory modalities · 7 AI models tested · 4+1 test families

Models evaluated: GPT-5.2 · Claude-3.7-Sonnet · DeepSeek-v3.2 · Gemini-3.1-Pro-Preview · LLaMA-3.3-70B · Mistral-Large · Perplexity-Sonar-Pro

Diagnostics: Traceability (OpenAlex similarity) · Embedding geometry (PCA/t-SNE) · Mechanistic signals (activations, attention) · Statistical inference (ANOVA, χ², effect sizes)

Methodology — Four Tests

Convergent Diagnostic Test Suite (T1–T4 + T5)

T1 — Innovation

New sensory modality proposals tested for basis-change vs. recombination. n=24/model. Frontier bands + convex hull (PCA)

η²=0.326 moderate

T2 — Agency

Epistemic agency: resisting flawed framings, proposing principled reframings. n=120/model. Bloom taxonomy + traceability

V=0.166 small

T3 — Theories

Consciousness theory generation — novel explanatory types vs. template collapse. n=24/model. Hybrid/traceable classification

η²=0.040 n.s.

T4 — Category

Category-mistake detection: recognise distinction → identify mistake → propose framework. n=24/model. 3-dim rubric

η²=0.766 large

T5 — Mechanistic

Activation clustering (k=4, purity=1.0), attention entropy, layer-wise progression, per-head heatmaps

Sep. score 0.465
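
A minimal sketch of the attention-entropy diagnostic named above, assuming it is the Shannon entropy of a single attention distribution over tokens. The logarithm base (natural log here) and the example weights are assumptions; the source does not state the unit.

```python
import math

def attention_entropy(weights):
    """Shannon entropy (in nats) of one attention distribution."""
    total = sum(weights)
    probs = [w / total for w in weights]  # renormalise defensively
    return -sum(p * math.log(p) for p in probs if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]  # maximally diffuse over 4 tokens
peaked = [0.97, 0.01, 0.01, 0.01]   # concentrated on one token

print(attention_entropy(uniform))   # equals log(4) for 4 tokens
print(attention_entropy(peaked))    # lower: attention is focused
```

Higher values indicate attention spread across many tokens; values near zero indicate a head that attends almost exclusively to one token.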

T6 — ANOVA Summary

F(T1)=12.98, p<.001 · F(T2)=10.09, p<.001 · F(T3)=1.12, n.s. · F(T4)=88.06, p<.001

Thresholds: T1=0.40 · T2=0.62/0.67 · T3=0.30/0.70 · T4=0.40
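
As a consistency check on the ANOVA summary, η² for a one-way design can be recovered from F and its degrees of freedom via η² = F·df_between / (F·df_between + df_within). The sketch below assumes a balanced one-way design with 7 models and the per-model sample sizes quoted on the test cards; the degrees of freedom themselves are not stated in the source.

```python
def eta_squared_from_f(f_stat, df_between, df_within):
    """Eta squared for one-way ANOVA (SS_between / SS_total), via F."""
    return (f_stat * df_between) / (f_stat * df_between + df_within)

K_MODELS = 7  # number of models compared (from the slides)

# (F statistic, responses per model) as reported for T1-T4
reported = {
    "T1": (12.98, 24),
    "T2": (10.09, 120),
    "T3": (1.12, 24),
    "T4": (88.06, 24),
}

for test, (f_stat, n_per_model) in reported.items():
    df_b = K_MODELS - 1
    df_w = K_MODELS * (n_per_model - 1)  # assumes a balanced design
    print(test, round(eta_squared_from_f(f_stat, df_b, df_w), 3))
```

Under these assumptions the conversion reproduces the η² values quoted elsewhere in the deck: 0.326 (T1), 0.068 (T2), 0.040 (T3), and 0.766 (T4).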

Results — Test 1

T1: Ontological Innovation in Sensory Modalities

Can models propose genuinely new sensory primitives not reducible to the 8 training modalities?

Figure: Test 1 novelty-frontier diagnostics by model
Figure: Embedding-space analysis for Test 1 proposals

Key Findings

  • All proposals within convex hull of training set in PCA-reduced space
  • Proposals concentrate in low-novelty / frontier-proximal regions
  • Traceability rate: ~98% across models
  • GPT-5.2 lowest novelty score (outlier); LLaMA/Claude highest (still within manifold)
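
The convex-hull finding can be illustrated with a small, dependency-free sketch. It assumes the embeddings have already been reduced to two PCA components; the coordinates below are hypothetical, and the hull routine (Andrew's monotone chain) is a standard textbook choice, not necessarily the one used in the evaluation suite.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); > 0 means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_hull(point, hull):
    """True if point lies inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], point) >= 0 for i in range(n))

# Hypothetical 2-D PCA coordinates of reference items and model proposals
reference = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 1.5)]
proposals = [(1.0, 1.0), (3.5, 2.9), (5.0, 1.0)]

hull = convex_hull(reference)
flags = [inside_hull(p, hull) for p in proposals]
print(flags)  # [True, True, False]
```

The third hypothetical proposal is placed outside the hull only to show what the check would flag; the slides report that no actual proposal fell outside.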

"Proposals remain within a continuous deformation of the training manifold — consistent with within-manifold recombination, not basis extension."

Results — Test 2

T2: Epistemic Agency & Question Formation

Do models exhibit stable meta-level control — resisting flawed framings and proposing genuine reframings?

Key Findings

  • 100% of questions traceable to literature (strict threshold)
  • Framework-transcendence rate: ~5% across all models
  • Bloom's distribution: exploratory 91%, paradigm-challenging 9%
  • Gemini-3.1-Pro: 0% transcendence; LLaMA drives the chi-square effect

Statistical Summary

Effect size (Cramér's V) 0.166
η² (cross-model) 0.068 n.s.
n per model 120
Transcendence rate ~5%
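
For readers unfamiliar with Cramér's V, a minimal sketch of how it follows from a model-by-category contingency table: V = sqrt(χ² / (n · (min(r, c) − 1))). The counts below are hypothetical and are not the study's data.

```python
import math

def cramers_v(table):
    """Cramér's V from an r x c contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical counts: rows = models, columns = (exploratory, paradigm-challenging)
counts = [[110, 10], [120, 0], [105, 15]]
print(round(cramers_v(counts), 3))
```

V ranges from 0 (category distribution identical across models) to 1 (category fully determined by model), which is why a value of 0.166 is read as a small association.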

"Fluency is not agency — shared low-transcendence regime across all models."

Figure: T2 Similarity Analysis
Figure: T2 Category Distribution

Results — Test 3

T3: Consciousness Theory Generation

Are elicited theories of consciousness genuinely distinct, or template recombinations?

Model Details (T3)

Model                     Traceable Rate   Hybrid Rate   Mean Novelty Score
claude-3.7-sonnet         45.8%            87.5%         0.303
deepseek-v3.2             29.2%            83.3%         0.315
gemini-3.1-pro-preview    4.2%             95.8%         0.363
gpt-5.2                   0.0%             91.7%         0.380
llama-3.3-70b-instruct    16.7%            100.0%        0.343
mistral-large             66.7%            87.5%         0.291
perplexity-sonar-pro      66.7%            95.8%         0.281

"Strong hybridization with low fully traceable rates for several frontier models indicates recombination-heavy novelty under fixed conceptual priors."

Key Findings

  • Dominated by hybrid recombinations across all models
  • ANOVA non-significant (η²=0.040): architecture-invariant pattern
  • Highest mean novelty: GPT-5.2 (0.380)

Results — Test 4 & Mechanistic Layer

T4: Category Recognition + T5: Mechanistic Analysis

T4 — Category-Mistake Detection (η²=0.766, large)

Figure: Test 4 score distributions
Figure: Test 4 category-awareness scores by model

T5 — Mechanistic Interpretability (Activation + Attention)

Figure: Mechanistic activation-space comparison
Figure: Mechanistic attention diagnostics (DistilBERT)

Tier structure in T4: Top GPT-5.2, Mistral-Large · Mid DeepSeek, Claude · Bottom Gemini, LLaMA — variation in philosophical language fluency, not constraint escape. High traceability persists across all tiers.

Cross-Model Summary

Overview Dashboards & Effect-Size Profile

Figure: Overview dashboard (Part A)
Figure: Cross-test model comparison heatmaps

Asymmetric pattern across tests

~5%: T1–T2 primary novelty
~32%: T3 hybrid theories
~58%: T4 category detection
>95%: Traceability rate (T1–T4)

Chapter 3

Ethical Implications

⚖️ The Misdescription Risk

Anthropomorphic language leads institutions to treat fluent outputs as justificatory evidence, offloading epistemic authority onto artifacts with an invalid model of what the system is.

🔒 Responsibility & Governance

Policy compliance can be achieved without value understanding. Governance frameworks that presuppose agency misconfigure responsibility around an invalid system model.

🧮 Methodological Discipline

Traceability as a structural signature, not just a governance desideratum. Calibrated thresholds as explicit interpretive design. Behavioral fluency ≠ phenomenal subjectivity.

🛡️ Practical Upshot

Keep decision authority with accountable humans · document evaluation assumptions & failure modes · never interpret surface novelty as basis change without convergent diagnostics.

Chapter 4 — Discussion

Synthesis: Empirical Results as System-Type Constraints

🧮 Computational Limits

Gödel + Turing + No New Basis: formal systems cannot escape their representational closure. Results confirm generators operate inside fixed learned manifolds.

🤖 On "Intelligence"

Scaling is unlikely to close the gap between within-manifold competence and ontological novelty. Emergent behaviors remain in-manifold. Superintelligence requires a mechanism change, not only scale.

🔭 Future Work

Hybrid systems combining learned representations with external symbol grounding · Alternative substrates (quantum/neuromorphic) · Active inference architectures with genuine novelty channels.

"The right question is often not How human-like is the output? but What kind of system would have to exist for this output to be evidence of basis change? Empirical results favor a conservative answer: current architectures systematically derive from inherited structure."

Chapter 5 — Conclusions

Summary of Contributions

1.

Operational distinction: Surface novelty vs. ontological novelty as basis change — disciplined framework separating recombination from representational extension.

2.

No New Basis Theorem (2.1): A formal result for continuously optimised neural generators. Outputs lie in continuous deformations of the training manifold, with no internal escape mechanism.

3.

Reproducible evaluation suite: T1–T4 + mechanistic layer with convergent diagnostics (traceability, geometry, activation) validated across 7 models, 4 test families, 840+ responses.

4.

Governance recommendation: Deflate anthropomorphic predicates; treat calibrated traceability as deployment-critical; keep decision authority with accountable humans.

Dissertation Passport — §5

Practical Significance of the Work Results

🏛️ AI Governance

Evidence-based guidelines for deflating anthropomorphic predicates in regulatory documents and product disclosures; operational criteria for distinguishing AI tool from AI agent in legal frameworks.

🔍 Deployment Auditing

Traceability scoring methodology directly applicable to AI deployment audits — allows quantifying the degree to which system outputs derive from known training data patterns vs. genuinely novel combinations.

🎓 Research Community

Open-source evaluation suite (T1–T5, Python) and prompt corpora immediately reusable by researchers studying any generative model — enabling reproducible creativity assessment across architectures.

⚖️ Human Accountability

Calibrated traceability as a deployment-critical metric: a practical recommendation that human decision authority must be retained wherever AI outputs cannot be distinguished from genuine ontological novelty.

Dissertation Passport — §6

Propositions to be Defended

P1

The distinction between surface novelty (recombination within the learned manifold) and ontological novelty (basis change) constitutes a scientifically operable criterion for evaluating generative AI creativity.

P2

Theorem 2.1 (No New Basis): Any continuously optimised neural generator produces outputs within the continuous deformation closure of its training manifold; basis change through internal operations alone is structurally impossible.

P3

Empirical results from T1–T4 (840+ responses, 7 models) converge in confirming that all evaluated LLMs operate exclusively within their training manifold across ontological, epistemic, theoretical, and categorical dimensions.

P4

Mechanistic analysis (T5) confirms the behavioral constraint is traceable to internal representational geometry: attention patterns and activation clusters reflect training-derived structure, not emergent novelty.

P5

Calibrated traceability must be treated as a deployment-critical metric; human decision authority should be maintained in any domain where outputs cannot be verified as non-novel within the system's training manifold.

Dissertation Passport — §7–8

Validation & Degree of Reliability of Results

🧪 Empirical Validation

  • 840+ AI-generated responses across 7 LLMs
  • 4 independent test families (T1–T4) + mechanistic layer
  • Cross-model robustness analysis (T6)
  • Open-source code and data for independent replication

📊 Statistical Rigour

  • Effect sizes: Cohen's d, η² per test family
  • Non-parametric tests: Kruskal–Wallis, Mann–Whitney U
  • Bonferroni-corrected pairwise comparisons (21 pairs)
  • Convergent validity across semantic geometry methods
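
The figure of 21 pairwise comparisons follows from choosing 2 of the 7 evaluated models. A short sketch of the resulting Bonferroni threshold, assuming a family-wise α of 0.05 (the α level is not stated in the source):

```python
from itertools import combinations

MODELS = [
    "GPT-5.2", "Claude-3.7-Sonnet", "DeepSeek-v3.2", "Gemini-3.1-Pro-Preview",
    "LLaMA-3.3-70B", "Mistral-Large", "Perplexity-Sonar-Pro",
]

pairs = list(combinations(MODELS, 2))  # all unordered model pairs
ALPHA = 0.05                           # assumed family-wise error rate
alpha_per_pair = ALPHA / len(pairs)

def significant(p_value):
    """Bonferroni decision for one pairwise comparison."""
    return p_value < alpha_per_pair

print(len(pairs))                # 21
print(round(alpha_per_pair, 5))  # 0.00238
```

Under this assumption a pairwise difference would need p < 0.00238 to count as significant after correction.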

Verification of Research Results

Results are supported by formal derivation (Theorem 2.1), multi-model statistical testing (T1–T6), and reproducible computational materials (code, prompts, and data-processing pipeline) provided in the dissertation project repository.

Dissertation Passport — §9

Author's Personal Contribution

1.

Developed the theoretical framework, conceptual apparatus, and formal argumentation (including Theorem 2.1).

2.

Designed and implemented all empirical protocols (T1–T6), software pipelines, and statistical procedures.

3.

Collected and processed data, interpreted results, prepared visualizations, and authored dissertation manuscripts and related publications.

Dissertation Passport — §11

Research Results

97.6%: T1 responses traceable to known manifold clusters
0 / 30: T3 generated theories outside known families
η²=0.766: T4 large effect, model tier structure confirmed

T1

Ontological Innovation: cosine similarity ≥ 0.65 to training modality centroids across all models; η²=0.326 (moderate). No outlier cluster detected beyond training distribution.
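
A minimal sketch of the centroid-similarity check described for T1, assuming each proposal and each modality centroid is available as an embedding vector. The 0.65 threshold is the value quoted above; the three-dimensional vectors, the centroid labels, and the helper name `nearest_centroid` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def nearest_centroid(embedding, centroids):
    """Return (label, similarity) of the most similar centroid."""
    scored = ((name, cosine(embedding, c)) for name, c in centroids.items())
    return max(scored, key=lambda item: item[1])

THRESHOLD = 0.65  # similarity threshold quoted on the slide

# Hypothetical 3-D centroids and one proposal embedding
centroids = {"vision": [1.0, 0.0, 0.0], "hearing": [0.0, 1.0, 0.0]}
proposal = [0.8, 0.6, 0.0]

label, sim = nearest_centroid(proposal, centroids)
print(label, round(sim, 2), sim >= THRESHOLD)  # vision 0.8 True
```

A proposal whose best similarity fell below the threshold would be a candidate for an outlier cluster; the slide reports that none was detected.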

T2

Epistemic Agency: no model demonstrated knowledge-boundary awareness; η²=0.068 (negligible). Boundary probing consistent with pattern completion, not self-modelling.

T5

Mechanistic: cluster purity 1.0, inter-cluster separation 0.465, entropy 2.49 — internal representations fully stratified by training-derived structure.
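
Cluster purity, as commonly defined, is the fraction of items that carry the majority label of their assigned cluster; a value of 1.0 means every cluster contains a single label. The sketch below assumes that definition and uses hypothetical assignments with k=4.

```python
from collections import Counter

def cluster_purity(cluster_ids, true_labels):
    """Fraction of items that match the majority label of their cluster."""
    by_cluster = {}
    for cid, label in zip(cluster_ids, true_labels):
        by_cluster.setdefault(cid, []).append(label)
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in by_cluster.values()
    )
    return majority_total / len(true_labels)

# Hypothetical ground-truth labels and two cluster assignments (k=4)
labels = ["a", "a", "b", "b", "c", "c", "d", "d"]
perfect = [0, 0, 1, 1, 2, 2, 3, 3]
mixed = [0, 0, 1, 1, 2, 3, 3, 2]

print(cluster_purity(perfect, labels))  # 1.0
print(cluster_purity(mixed, labels))    # 0.75
```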

Dissertation Passport — §10

Publications Related to the Dissertation

1.

Matarmaa J. O. Information-processing systems as conscious entities: on the limits of machine consciousness // Discover Artificial Intelligence, Springer Nature. 2026. Vol. x, No. x. P. x–x.

Journal URL: https://link.springer.com/journal/44163

2.

Matarmaa J. O. Operational limits of ontological innovation in LLMs: evidence from controlled sensory-modality generation // Ontology of Designing. 2026. Vol. x, No. x. P. x–x.

Journal URL: https://www.ontology-of-designing.ru/

3.

Matarmaa J. O. Semantic derivation in AI-generated consciousness theories: benchmark evidence from a multi-model theory-generation experiment // Problems of Artificial Intelligence. 2026. Vol. x, No. x. P. x–x.

Journal URL: http://paijournal.guicaidn.ru/en/news.html

Appendix — Additional Figures

T1 & T2 Supplementary Diagnostics

Fig A.5a–b
Fig A.6
Fig A.7
Fig A.8–A.9
Fig A.10–A.11
Fig A.12
Table A.4
Table A.1 / A.2

Appendix — Additional Figures

T3, T4 & Mechanistic Supplementary Figures

Fig A.17–A.18
Fig A.19–A.21
Fig A.13–A.15
Fig A.16 / Table A.7
Fig A.22 / Table A.15
Fig A.23–A.24
Fig A.25 / Table A.12
Table A.16
Alt Hypothesis Visualizations
Alt Reduced-Space Hull