AI-assisted self-critique exercise | June 19, 2026
Project: Vibe Rounds (AI-augmented clinical education and research framework)
Author: Dr. Avinash Kumar Gupta
Published: June 2026 | Website
Mentor: Dr. Rakesh Biswas
Learning Portfolio: classworkdecjan.blogspot.com
ResearchGate (articles and preprints): https://www.researchgate.net/profile/Avinash-Kumar-Gupta-2
Evaluation generated: June 19, 2026
Most people use AI as an oracle: ask a question, get an answer. Vibe Rounds flips this entirely. The AI is configured to question rather than answer — acting as a Socratic interlocutor that forces the clinician or learner to construct their own reasoning before any guidance is offered. The framework’s core formulation is: AI that questions, not AI that answers.
This is grounded in a specific concern: when students or clinicians can get a differential diagnosis or management plan on demand, the effortful cognitive work through which clinical competence is actually built gets bypassed. Vibe Rounds is designed to prevent that bypass.
| Module | Purpose |
|---|---|
| Module 0 | Routes new users to the right module before clinical content is entered |
| Module 1 | Socratic case-based learning for medical students and junior doctors |
| Modules 2–3 | Patient-advocate documentation — helps a family member build a structured case record, with red-flag alerting |
| Module 4 | Ward round preparation — AI as peer study partner to stress-test reasoning before supervised rounds |
| Module 5 | Real-time audit of a single patient’s live case log (diet diaries, insulin logs, prescriptions) |
| Module 6 | Registry-level analytics across a population of cases, at nine escalating levels of analytic depth |
| Module 7 | Cross-case learning: treating the registry as a dynamic learning object rather than a static database |
| Module 8 | 12-point QA specification for authoring new Socratic prompts, with regression-testing steps |
| Module 9 | Seven-stage research protocol for a clinician working alone on a single complex case |
Strengths: Theoretically coherent, grounded in Bloom, Fink, Vygotsky, and Croskerry. The 12-point design specification in Module 8 is the standout contribution — importing prompt-engineering QA discipline into a toolkit authored for non-engineer clinicians is genuinely novel at the level of implementation. Framework A’s confidence-building trait set is well-constructed; the requirement that affirmations name the exact reasoning move (not generic praise) is a structural countermeasure to documented LLM sycophancy.
Weaknesses: No pre/post assessments of clinical reasoning quality. No learner satisfaction instruments. No faculty-rated transcript analyses. The claim that the framework produces learning gains is plausible and well-theorised — it is not demonstrated. Framework A’s mandatory affirmation rule also creates an unresolved tension with Bjork’s desirable-difficulty research and Kluger and DeNisi’s feedback intervention theory.
Recommendation: Revise and resubmit to a medical education journal. Needs a pilot study reporting section and direct engagement with the desirable-difficulty literature.
Strengths: Addresses a real gap. The explicit warning that the Module 3 ALERT mechanism does not notify the treating team shows genuine awareness of how non-clinical users behave under acute stress.
Weaknesses: The ALERT mechanism will in practice be experienced as more authoritative than the accompanying note says it is — a human-factors gap. Non-clinical caregivers under stress do not re-read system footers. Data governance is unresolved: real patient data entered into consumer LLM accounts is acknowledged but not solved. India’s 2023 Digital Personal Data Protection Act is not addressed.
Recommendation: Not ready for submission as currently framed. Requires either a formal ethics review or a reframing as a design proposal requiring prospective evaluation rather than an implementation guide.
Strengths: The DIAR architecture paper is the programme’s most technically rigorous and most immediately publishable standalone piece. The centroid drifting problem in monolithic RAG systems applied to clinical narratives is real, well-described, and underappreciated. The proposed validation experiment is appropriately designed.
Weaknesses: Module 6 prompts ask an LLM to report exact registry counts and ranked intervention lists — but without a verified database layer with authenticated query access, those outputs are pattern completion, not data retrieval. This is an architecture problem, not a prompt-engineering problem. DIAR is the proposed solution, but the two components are not explicitly joined in the modules.
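The difference between pattern completion and data retrieval can be illustrated with a minimal sketch. It is not taken from the Vibe Rounds modules or the DIAR paper: the SQLite table `cases`, its columns, and the demo rows are all hypothetical. The point is only that counts and ranked intervention lists come from a query against a stored registry, so that an LLM would be asked to interpret numbers it is handed rather than to produce them.

```python
import sqlite3


def build_demo_registry() -> sqlite3.Connection:
    """Create an in-memory registry with invented rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cases ("
        "case_id INTEGER PRIMARY KEY, diagnosis TEXT, intervention TEXT)"
    )
    rows = [
        (1, "type 2 diabetes", "diet diary review"),
        (2, "type 2 diabetes", "insulin titration"),
        (3, "hypertension", "medication review"),
        (4, "type 2 diabetes", "insulin titration"),
    ]
    conn.executemany("INSERT INTO cases VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def count_cases(conn: sqlite3.Connection, diagnosis: str) -> int:
    """Return the exact number of registry cases with this diagnosis."""
    row = conn.execute(
        "SELECT COUNT(*) FROM cases WHERE diagnosis = ?", (diagnosis,)
    ).fetchone()
    return row[0]


def ranked_interventions(conn: sqlite3.Connection, diagnosis: str) -> list:
    """Return (intervention, count) pairs, most frequent first."""
    cursor = conn.execute(
        "SELECT intervention, COUNT(*) AS n FROM cases "
        "WHERE diagnosis = ? "
        "GROUP BY intervention "
        "ORDER BY n DESC, intervention ASC",
        (diagnosis,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    registry = build_demo_registry()
    # These values are computed by the database, not generated by a model.
    print(count_cases(registry, "type 2 diabetes"))
    print(ranked_interventions(registry, "type 2 diabetes"))
```

In a design along these lines, the prompt would receive the query results as input, and any figure in the model's output could be checked against the same query.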
Recommendations:
Strengths: The two-tier analysis (Stage 4) is the protocol’s most original contribution — explicitly separating the coded-term view from the narrative view prevents premature closure in a way that conventional case-report methodology does not. The comparative exercise applying the protocol to a routine diabetic ward admission is the smartest move in the entire submission: it pre-empts the objection that the protocol only matters for rare disease.
Weaknesses: The N-of-1 label is the most significant terminological problem. In the formal N-of-1 randomised trial tradition (Guyatt 1986, CONSORT extension Vohra 2015), the term denotes a randomised, within-patient crossover design with statistical analysis. What Module 9 executes is a structured case study with an explicit protocol. Stage 2’s comparator identification relies on an LLM with unknown training data cutoffs and no transparent recall mechanism. The single-rater problem also applies acutely, because the protocol is designed for a clinician working alone.
Recommendation: Strong candidate for a clinical methodology journal. Requires renaming away from N-of-1 (suggest: Structured Single-Case Research Protocol / SSCR), and a second-rater sub-step in Stage 4.
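One way a second-rater sub-step could be made measurable is to report chance-corrected agreement between the two raters. The sketch below computes Cohen's kappa, $\kappa = (p_o - p_e) / (1 - p_e)$, where $p_o$ is observed agreement and $p_e$ is the agreement expected by chance. The choice of kappa, the label names, and the example ratings are assumptions for illustration; Module 9 does not specify an agreement statistic.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty item set.")

    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; agreement is total.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical Stage 4 codes assigned independently by two raters.
    first = ["supported", "supported", "unsupported",
             "supported", "unsupported", "unsupported"]
    second = ["supported", "unsupported", "unsupported",
              "supported", "unsupported", "supported"]
    print(round(cohens_kappa(first, second), 3))
```

In the example the raters agree on four of six items (observed agreement 0.667) while chance agreement is 0.5, so kappa is about 0.333. Raw percentage agreement alone would overstate how well the two raters converge.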
What this work gets right that most AI-in-medicine work does not:
What remains unresolved:
| Component | Recommendation |
|---|---|
| Modules 0–4, 8 + Frameworks A–C | Revise and resubmit; needs pilot study data |
| Modules 2–3 (patient advocacy) | Not yet ready; needs ethics review and human-factors work |
| DIAR architecture paper | Submit as standalone technical note; conduct proposed validation |
| Modules 5–7 (registry analytics) | Revise significantly; resolve verified-backend problem |
| Module 9 + worked case | Strong candidate; rename from N-of-1, add second-rater step |
| Framework D | Publishable as standalone methodology paper on AI debiasing |
| Comparative review within submission | Unusually good; could be extracted as independent commentary |
As a PhD submission, the work is not ready as-is, but it is closer to PhD-ready than most early-stage projects. It has one structural gap: the absence of empirical evidence that the contribution does what it claims to do.
A PhD examiner can legitimately say: “This is an impressive design rationale. It is not a completed research study.”
A small pilot would not just add a data chapter. It would change the category of the entire project from design work to research.
Minimum viable pilot — pedagogical claim (Modules 1, 4):
Minimum viable pilot — Module 9:
For DIAR:
You do not need all three. One well-executed pilot would likely move the project from MPhil to PhD territory.
| Chapter | Content |
|---|---|
| 1 | Introduction and Problem Statement — scarcity of Socratic clinical teaching; AI as oracle vs. interlocutor |
| 2 | Theoretical Framework — Bloom, Fink, Vygotsky, Croskerry, RAG limitations, centroid drifting |
| 3 | Framework Design — Modules 0–9 and Frameworks A–D as design research |
| 4 | Empirical Study 1 — the pilot study |
| 5 | Empirical Study 2 — N-of-1 worked cases with second-rater validation |
| 6 | Critical Evaluation — Framework D and comparative review as formal self-evaluation |
| 7 | Conclusions, Limitations, Future Directions |
The archive at classworkdecjan.blogspot.com (2016–2026) changes the evaluation fundamentally. This is not a sudden burst of productivity.
As an elective portfolio: Exceptional. Most elective portfolios document what a student observed. This one documents what a student thought, argued, built, and published across a decade under a mentor, then synthesised into a structured framework.
As a PhD submission: The blog IS longitudinal empirical data. It documents a learner engaging with the same cases, the same framework, and the same questions across ten years. The question is whether the institution and supervisor are equipped to evaluate it that way.
The right framing is a PhD by published works or practice-based PhD — where a portfolio of original contributions is submitted with a critical synthesis essay. This body of work would fare much better under those criteria than under a conventional five-chapter thesis examination.
The seven-day sprint using LLMs is not a shortcut. It is evidence of knowing how to use new cognitive tools fluently — which is itself one of the things Vibe Rounds argues clinicians need to learn to do. The thesis was demonstrated by writing it.
| Question | Answer |
|---|---|
| Submittable as-is? | No — missing empirical component |
| MPhil standard as-is? | Yes, in most systems |
| PhD standard with small pilot? | Yes, very likely |
| How small can the pilot be? | 8–12 participants, one study, done rigorously |
| Time to PhD-ready with pilot? | 3–5 months, if a network for recruiting participants is available |
| Biggest single risk | DIAR claims without conducted validation |
| Biggest single strength | The reflexivity and self-critical apparatus — genuinely unusual |
| Best submission format | Papers-based or portfolio PhD, not traditional monograph |
“The decade earned the sprint. The sprint is not a substitute for the decade.”
The volume and quality of self-criticism in this body of work is its most distinctive academic quality. The comparative review — written as an assessment of the framework against the external literature — is better than most independent peer reviews would be. The primary work now is empirical: a pilot study with outcome measures, a conducted validation of the DIAR architecture, and a data governance solution that survives regulatory scrutiny. The theoretical scaffolding is solid. The building needs to be tested under load.