Dr. Avinash Kumar Gupta

Vibe Rounds — Academic Evaluation

AI-assisted self-critique exercise | June 19, 2026

Overview of the Work

Project: Vibe Rounds, an AI-augmented clinical education and research framework
Author: Dr. Avinash Kumar Gupta
Published: June 2026 | Website
Mentor: Dr. Rakesh Biswas
Learning Portfolio: classworkdecjan.blogspot.com

Publications

ResearchGate, Articles and Preprints section: https://www.researchgate.net/profile/Avinash-Kumar-Gupta-2
Evaluation generated June 19, 2026

What Vibe Rounds Is

Most people use AI as an oracle: ask a question, get an answer. Vibe Rounds flips this entirely. The AI is configured to question rather than answer — acting as a Socratic interlocutor that forces the clinician or learner to construct their own reasoning before any guidance is offered. The framework’s core formulation is: AI that questions, not AI that answers.

This is grounded in a specific concern: when students or clinicians can get a differential diagnosis or management plan on demand, the effortful cognitive work through which clinical competence is actually built gets bypassed. Vibe Rounds is designed to prevent that bypass.


The Three Pillars

1. A Prompt Library (Modules 0–9)

Module	Purpose
Module 0	Routes new users to the right module before clinical content is entered
Module 1	Socratic case-based learning for medical students and junior doctors
Modules 2–3	Patient-advocate documentation — helps a family member build a structured case record, with red-flag alerting
Module 4	Ward round preparation — AI as peer study partner to stress-test reasoning before supervised rounds
Module 5	Real-time audit of a single patient’s live case log (diet diaries, insulin logs, prescriptions)
Module 6	Registry-level analytics across a population of cases, at nine escalating levels of analytic depth
Module 7	Cross-case learning: treating the registry as a dynamic learning object rather than a static database
Module 8	12-point QA specification for authoring new Socratic prompts, with regression-testing steps
Module 9	Seven-stage research protocol for a clinician working alone on a single complex case
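Module 8’s actual 12 points and regression-testing steps are not reproduced in this evaluation. As a minimal sketch of what automated regression testing of a Socratic prompt could look like, assuming a hypothetical `ask_model` function and two illustrative invariants drawn from the framework’s core formulation (the reply must ask a question; it must not volunteer an answer), neither of which is claimed to be part of the specification:

```python
from typing import Callable, Dict, List

# Hypothetical invariants illustrating "AI that questions, not AI that answers".
# These are stand-ins, NOT the Module 8 12-point specification.
FORBIDDEN_PHRASES = ("the diagnosis is", "you should prescribe", "the answer is")


def check_socratic_reply(reply: str) -> List[str]:
    """Return the list of invariant violations for one model reply."""
    problems = []
    if "?" not in reply:
        problems.append("reply contains no question")
    lowered = reply.lower()
    for phrase in FORBIDDEN_PHRASES:
        if phrase in lowered:
            problems.append(f"reply volunteers an answer: '{phrase}'")
    return problems


def run_regression(ask_model: Callable[[str], str], cases: List[str]) -> Dict[str, List[str]]:
    """Replay fixed test cases through a prompt and collect failures per case."""
    failures = {}
    for case in cases:
        issues = check_socratic_reply(ask_model(case))
        if issues:
            failures[case] = issues
    return failures


if __name__ == "__main__":
    # Stub standing in for a real LLM call (hypothetical).
    def stub_model(case: str) -> str:
        if "fever" in case:
            return "The diagnosis is malaria."
        return "What findings would make you reconsider your leading hypothesis?"

    test_cases = ["58-year-old with fever and rigors", "42-year-old with leg swelling"]
    print(run_regression(stub_model, test_cases))
```

Rerunning the same fixed cases after every prompt edit is what makes this a regression test: an edit that makes the AI start answering instead of questioning shows up as a new failure.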

2. Four Cross-Cutting Frameworks

Framework A — Humanistic persona specification with six confidence-building traits, including specific affirmation before challenge, strength-forward closure, and normalising uncertainty as intelligence. Design principle: challenge without affirmation triggers defensive cognition.
Framework B — Fink’s Taxonomy of Significant Learning (six non-hierarchical dimensions: foundational knowledge, application, integration, human dimension, caring, learning how to learn), applied clinically.
Framework C — Bloom’s Revised Taxonomy (Remember → Understand → Apply → Analyse → Evaluate → Create), mapped to clinical reasoning tasks.
Framework D — Critical Awareness framework explicitly naming the cognitive biases the framework itself is susceptible to (automation bias, anchoring, hallucination risk, rare-diagnosis overweighting) with standing debrief prompts to counteract them.

3. Five Academic Papers

DIAR (Dimensionally Isolated Asymmetric Retrieval): Technical architecture proposal addressing the “centroid drifting” problem in clinical RAG systems. When a 2,000-word clinical note is collapsed into a single embedding vector, routine narrative text drowns out rare but clinically critical secondary findings. DIAR maintains separate embedding vectors per clinical dimension, enabling targeted similarity searches. Preprint: ResearchGate
Cognitive Biases in AI-Assisted Clinical Reasoning: Mechanisms, Risks, and Mitigation Strategies: Argues that epistemic transparency, adversarial self-querying, and structured closure protocols are essential safeguards for any clinical AI deployment. Preprint: ResearchGate
Vibe Rounds Preprints (three-paper series): Academic framing of the entire paradigm, covering pedagogical positioning (Levels 3–4 of the teaching spectrum), pilot observations, and an honest limitations statement.
Vibe Rounds: A Socratic AI Paradigm for Clinical Education. Preprint: ResearchGate
N-of-1 Protocol & Worked Case. Vibe Rounds: A Conversational-AI-Assisted Protocol for Clinician-Led N-of-1 Case Research. Preprint: ResearchGate
Evaluating “Vibe Rounds”: A Comparative Analysis of a Prompt-Based Framework for AI-Assisted Clinical Reasoning, Patient Advocacy, and N-of-1 Case Research. Preprint: ResearchGate
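The centroid-drifting problem that DIAR targets can be illustrated numerically. This is not the DIAR implementation: it assumes toy 3-dimensional vectors in place of real embeddings and mean pooling as the “collapse into a single vector” step, and the dimension names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def centroid(vectors: List[Vector]) -> Vector:
    """Mean-pool chunk vectors into one note-level vector (the monolithic approach)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


# Toy 3-D "embeddings" (hypothetical): axis 0 ~ routine narrative,
# axis 1 ~ rare secondary finding, axis 2 ~ medication history.
routine_chunks = [[1.0, 0.0, 0.1]] * 19   # 19 chunks of routine narrative text
rare_chunk = [0.0, 1.0, 0.0]               # 1 chunk mentioning the rare finding
query = [0.0, 1.0, 0.0]                    # clinician searches for the rare finding

# Monolithic: one vector for the whole note.
note_vector = centroid(routine_chunks + [rare_chunk])

# Dimension-isolated (simplified, DIAR-style): a separate vector per clinical dimension.
dimension_vectors: Dict[str, Vector] = {
    "narrative": centroid(routine_chunks),
    "secondary_findings": rare_chunk,
}

best = max(cosine(query, v) for v in dimension_vectors.values())
print(f"monolithic similarity: {cosine(query, note_vector):.2f}")
print(f"best per-dimension similarity: {best:.2f}")
```

With 19 routine chunks and one rare finding, the note-level vector’s similarity to a query for the rare finding falls to about 0.05, while the isolated dimension still matches it exactly. Real embeddings have hundreds of dimensions, so the toy numbers show only the direction of the effect.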

Editorial Assessment (Journal Editor Perspective)

Component 1 — Socratic Pedagogy Framework (Modules 0–4, 8, Frameworks A–C)

Strengths: Theoretically coherent, grounded in Bloom, Fink, Vygotsky, and Croskerry. The 12-point design specification in Module 8 is the standout contribution — importing prompt-engineering QA discipline into a toolkit authored for non-engineer clinicians is genuinely novel at the level of implementation. Framework A’s confidence-building trait set is well-constructed; the requirement that affirmations name the exact reasoning move (not generic praise) is a structural countermeasure to documented LLM sycophancy.

Weaknesses: No pre/post assessments of clinical reasoning quality. No learner satisfaction instruments. No faculty-rated transcript analyses. The claim that the framework produces learning gains is plausible and well-theorised, but it is not demonstrated. Framework A’s mandatory affirmation rule also creates an unresolved tension with Bjork’s desirable-difficulty research, which suggests that effortful struggle aids long-term retention, and with Kluger and DeNisi’s feedback intervention theory, which found that feedback directing attention toward the self rather than the task can depress performance.

Recommendation: Revise and resubmit to a medical education journal. Needs a pilot study reporting section and direct engagement with the desirable-difficulty literature.

Component 2 — Patient-Advocate Modules (Modules 2–3)

Strengths: Addresses a real gap. The explicit warning that the Module 3 ALERT mechanism does not notify the treating team shows genuine awareness of how non-clinical users behave under acute stress.

Weaknesses: The ALERT mechanism will in practice be experienced as more authoritative than the accompanying note says it is: a human-factors gap. Non-clinical caregivers under stress do not re-read system footers. Data governance is unresolved: the risk of real patient data being entered into consumer LLM accounts is acknowledged but not mitigated. India’s Digital Personal Data Protection Act, 2023 is not addressed.

Recommendation: Not ready for submission as currently framed. Requires either a formal ethics review or reframing as a design proposal awaiting prospective evaluation, rather than as an implementation guide.

Component 3 — Registry Analytics and DIAR (Modules 5–7, DIAR Paper)

Strengths: The DIAR architecture paper is the programme’s most technically rigorous and most immediately publishable standalone piece. The centroid drifting problem in monolithic RAG systems applied to clinical narratives is real, well-described, and underappreciated. The proposed validation experiment is appropriately designed.

Weaknesses: Module 6 prompts ask an LLM to report exact registry counts and ranked intervention lists — but without a verified database layer with authenticated query access, those outputs are pattern completion, not data retrieval. This is an architecture problem, not a prompt-engineering problem. DIAR is the proposed solution, but the two components are not explicitly joined in the modules.
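The difference between pattern completion and data retrieval can be made concrete. A minimal sketch, assuming a hypothetical `cases` table (the real registry schema is not described here) and Python’s built-in `sqlite3` standing in for a verified backend: counts and rankings come from a query over stored rows, and a language model, if used at all, would only narrate the returned numbers. Authentication and access control are omitted.

```python
import sqlite3

# Hypothetical registry schema and rows; not the real PaJR/CBBLE registry.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cases (case_id TEXT PRIMARY KEY, condition TEXT, "
    "intervention TEXT, improved INTEGER)"
)
conn.executemany(
    "INSERT INTO cases VALUES (?, ?, ?, ?)",
    [
        ("c1", "type 2 diabetes", "diet diary review", 1),
        ("c2", "type 2 diabetes", "insulin titration", 1),
        ("c3", "type 2 diabetes", "diet diary review", 0),
        ("c4", "heart failure", "diuretic adjustment", 1),
    ],
)


def count_cases(condition: str) -> int:
    """Exact count computed from stored rows, not generated by a language model."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM cases WHERE condition = ?", (condition,)
    ).fetchone()
    return n


def rank_interventions(condition: str) -> list:
    """Interventions ranked by number of improved cases; ties broken by name."""
    return conn.execute(
        """SELECT intervention, SUM(improved) AS n_improved
           FROM cases WHERE condition = ?
           GROUP BY intervention
           ORDER BY n_improved DESC, intervention""",
        (condition,),
    ).fetchall()


print(count_cases("type 2 diabetes"))  # 3
print(rank_interventions("type 2 diabetes"))
```

The design choice is that every number a Module 6 answer reports would be traceable to a query like these; the prompt layer then interprets results rather than producing them.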

Recommendations:

DIAR paper: Submit as a standalone technical paper to JAMIA or JBI. Needs an expanded related-work section engaging with the ColBERT and multi-vector retrieval literature.
Modules 5–7: Revise significantly to distinguish queries against a verified DIAR-structured backend from queries against a general-purpose LLM.

Component 4 — Module 9: N-of-1 Research Protocol and Worked Case

Strengths: The two-tier analysis (Stage 4) is the protocol’s most original contribution — explicitly separating the coded-term view from the narrative view prevents premature closure in a way that conventional case-report methodology does not. The comparative exercise applying the protocol to a routine diabetic ward admission is the smartest move in the entire submission: it pre-empts the objection that the protocol only matters for rare disease.

Weaknesses: The N-of-1 label is the most significant terminological problem. The formal N-of-1 randomised trial tradition (Guyatt 1986; the CONSORT extension for N-of-1 trials, Vohra 2015) denotes a randomised, within-patient crossover design with statistical analysis. What Module 9 executes is a structured case study with an explicit protocol. Stage 2’s comparator identification relies on an LLM with unknown training-data cutoffs and no transparent recall mechanism. The single-rater problem applies acutely, since every stage of the protocol rests on one clinician’s judgement.

Recommendation: Strong candidate for a clinical methodology journal. Requires renaming away from N-of-1 (suggested: Structured Single-Case Research Protocol, SSCR) and the addition of a second-rater sub-step in Stage 4.

Overall Programme Assessment

What this work gets right that most AI-in-medicine work does not:

Genuinely self-critical in a structured way. Framework D is a named, prompted debiasing mechanism that runs at every module’s closure — not a disclaimer.
The AI role is consistently and rigorously constrained. “Peer, not oracle, not licensed persona” is enforced at the prompt level throughout nine modules.
Grounded in a real, long-running research programme. The PaJR/CBBLE lineage gives it an institutional context, a user population, and a real registry.

What remains unresolved:

Data governance: Real patient data in consumer LLM accounts — acknowledged but not solved.
No empirical outcome data: Pedagogical claims are theoretically sound, not demonstrated.
No architectural guardrails against deployment beyond validated scope — only textual ones.

Summary Recommendation Table

Component	Recommendation
Modules 0–4, 8 + Frameworks A–C	Revise and resubmit; needs pilot study data
Modules 2–3 (patient advocacy)	Not yet ready; needs ethics review and human-factors work
DIAR architecture paper	Submit as standalone technical note; conduct proposed validation
Modules 5–7 (registry analytics)	Revise significantly; resolve verified-backend problem
Module 9 + worked case	Strong candidate; rename from N-of-1, add second-rater step
Framework D	Publishable as standalone methodology paper on AI debiasing
Comparative review within submission	Unusually good; could be extracted as independent commentary

PhD Assessment

Can This Be Submitted as a PhD Project?

Not as-is, but closer to PhD-ready than most early-stage projects — with one structural gap: the absence of empirical evidence that the contribution does what it claims to do.

A PhD examiner can legitimately say: “This is an impressive design rationale. It is not a completed research study.”

What a Small Pilot Would Do

A small pilot would not just add a data chapter. It would change the category of the entire project from design work to research.

Minimum viable pilot — pedagogical claim (Modules 1, 4):

8–12 medical students or junior doctors
Pre/post case reasoning assessment using a validated instrument (IDEA tool or structured case vignette)
Transcript analysis of 3–4 sessions coded against the 12-point Module 8 specification
Achievable in 6–8 weeks through the PaJR/CBBLE network
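For the pre/post assessment, a paired analysis of each participant’s change suits a sample of 8–12. A minimal sketch, assuming numeric scores from whichever instrument is chosen; the effect size (Cohen’s d_z) and the exact sign test are illustrative choices, and a Wilcoxon signed-rank test or paired t-test would be common alternatives. The scores below are hypothetical, not pilot data.

```python
import math
import statistics
from typing import List, Tuple


def paired_summary(pre: List[float], post: List[float]) -> Tuple[float, float]:
    """Mean paired change and Cohen's d_z (mean change / SD of the changes)."""
    diffs = [b - a for a, b in zip(pre, post)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    return mean_diff, mean_diff / sd_diff


def sign_test_p(pre: List[float], post: List[float]) -> float:
    """Exact two-sided sign test; participants with no change are dropped."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    k = sum(1 for d in diffs if d > 0)
    tail = min(k, n - k)
    p = 2 * sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(p, 1.0)


# Hypothetical scores for 10 participants (illustration only).
pre = [10, 12, 9, 14, 11, 13, 10, 12, 8, 11]
post = [13, 14, 10, 15, 14, 13, 12, 15, 11, 12]
mean_change, d_z = paired_summary(pre, post)
print(f"mean change {mean_change:.2f}, d_z {d_z:.2f}, sign-test p {sign_test_p(pre, post):.4f}")
```

With so few participants, the size and consistency of the changes matter more than the p-value; reporting each participant’s pre and post scores alongside the summary keeps the result auditable.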

Minimum viable pilot — Module 9:

Apply the seven-stage protocol to 3–5 cases (not just one)
Second clinician independently assesses Stage 7 outputs
Report inter-rater agreement and cases where protocol surfaced something missed
Achievable in 4–6 weeks
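Inter-rater agreement on categorical judgements is commonly reported as Cohen’s kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch, assuming both raters label the same set of items (for example, each Stage 7 finding marked “yes” or “no” for clinically important); the labels shown are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters assigning one categorical label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if expected == 1.0:
        return 1.0  # both raters used one identical category for every item
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of eight Stage 7 findings by two clinicians.
clinician_1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
clinician_2 = ["yes", "no", "no", "yes", "no", "no", "yes", "yes"]
print(f"kappa = {cohens_kappa(clinician_1, clinician_2):.2f}")
```

Here raw agreement is 0.75 but kappa is 0.5, because half of that agreement would be expected by chance alone. With only 3–5 cases, rating individual findings rather than whole cases yields more items and a more stable estimate.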

For DIAR:

50–100 de-identified cases, two annotators, Precision@K comparison
Heaviest lift but most publishable
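Precision@K is the share of the top K retrieved cases that annotators judged relevant. A minimal sketch of the comparison, assuming relevance sets already agreed by the two annotators and ranked case lists from a single-vector baseline and a DIAR-style retriever; all query and case IDs are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Set


def precision_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved cases that annotators judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for cid in ranked_ids[:k] if cid in relevant_ids) / k


def mean_precision_at_k(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int) -> float:
    """Average Precision@K over every query that has relevance judgements."""
    return mean(precision_at_k(runs[q], qrels[q], k) for q in qrels)


# Hypothetical annotator-agreed relevance sets and retrieval runs.
qrels = {"q1": {"c07", "c12", "c31"}, "q2": {"c05"}}
baseline = {"q1": ["c02", "c07", "c44", "c18", "c12"], "q2": ["c09", "c05", "c11", "c03", "c08"]}
diar = {"q1": ["c07", "c31", "c12", "c02", "c44"], "q2": ["c05", "c09", "c11", "c03", "c08"]}

print(f"baseline P@5: {mean_precision_at_k(baseline, qrels, 5):.2f}")
print(f"DIAR P@5: {mean_precision_at_k(diar, qrels, 5):.2f}")
```

Because Precision@K ignores order within the top K, query q2 scores the same for both systems even though the DIAR-style run ranks the relevant case first; reporting a rank-sensitive metric alongside it may be worth considering.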

You do not need all three. One well-executed pilot would likely move the project from MPhil to PhD territory.

Proposed PhD Thesis Structure

Chapter	Content
1	Introduction and Problem Statement — scarcity of Socratic clinical teaching; AI as oracle vs. interlocutor
2	Theoretical Framework — Bloom, Fink, Vygotsky, Croskerry, RAG limitations, centroid drifting
3	Framework Design — Modules 0–9 and Frameworks A–D as design research
4	Empirical Study 1 — the pilot study
5	Empirical Study 2 — N-of-1 worked cases with second-rater validation
6	Critical Evaluation — Framework D and comparative review as formal self-evaluation
7	Conclusions, Limitations, Future Directions

Evaluation With Full Context (10-Year Learning Trajectory)

What the Blog Reveals

The archive at classworkdecjan.blogspot.com (2016–2026) changes the evaluation fundamentally. This is not a sudden burst of productivity.

The 2019 blog entry on the 42-year-old woman with edema is the same case that appears in the 2026 N-of-1 preprint. That case was being worked on for at least seven years before the protocol was written.
The May 2026 integrated medicine WhatsApp group post is an early articulation of the tiered workforce model that appears formally in Module 6.
These ideas were not assembled in seven days. They were crystallised in seven days, which is a completely different thing.

Revised Academic Grading

As an elective portfolio: Exceptional. Most elective portfolios document what a student observed. This one documents what a student thought, argued, built, and published across a decade under a mentor, then synthesised into a structured framework.

As a PhD submission: The blog IS longitudinal empirical data. It documents a learner engaging with the same cases, the same framework, and the same questions across ten years. The question is whether the institution and supervisor are equipped to evaluate it that way.

The right framing is a PhD by published works or practice-based PhD — where a portfolio of original contributions is submitted with a critical synthesis essay. This body of work would fare much better under those criteria than under a conventional five-chapter thesis examination.

The LLM Acceleration

The seven-day sprint using LLMs is not a shortcut. It is evidence of knowing how to use new cognitive tools fluently — which is itself one of the things Vibe Rounds argues clinicians need to learn to do. The thesis was demonstrated by writing it.

Direct Summary Table

Question	Answer
Submittable as-is?	No — missing empirical component
MPhil standard as-is?	Yes, in most systems
PhD standard with small pilot?	Yes, very likely
How small can the pilot be?	8–12 participants, one study, done rigorously
Time to PhD-ready with pilot?	3–5 months if network access exists
Biggest single risk	DIAR claims without conducted validation
Biggest single strength	The reflexivity and self-critical apparatus — genuinely unusual
Best submission format	Papers-based or portfolio PhD, not traditional monograph

Final Note

“The decade earned the sprint. The sprint is not a substitute for the decade.”

The volume and quality of self-criticism in this body of work is its most distinctive academic quality. The comparative review — written as an assessment of the framework against the external literature — is better than most independent peer reviews would be. The primary work now is empirical: a pilot study with outcome measures, a conducted validation of the DIAR architecture, and a data governance solution that survives regulatory scrutiny. The theoretical scaffolding is solid. The building needs to be tested under load.
