Glossary

A

Agent: An AI system set up to take actions toward a goal, not just answer a question. An agent can plan steps, use tools, check the result, and try again, often with little or no input between steps. This is the noun behind agentic AI. The distinction that matters for care and for governance is action: a tool that advises leaves the decision and the act with a person, while an agent does the act, which changes who is accountable when something goes wrong and what oversight the system needs.
Agentic AI: AI that does not just produce an output you act on, but takes actions on its own: it sets a goal, plans steps, uses tools, and adapts, with a human reviewing at checkpoints rather than every step.
AI harness: The scaffolding wrapped around an AI model that turns it into something able to do real work. A model on its own just predicts the next piece of text. The harness is everything around it: the tools it can call, the loop that lets it take an action and check the result before acting again, the memory and context it works from, and the guardrails and permissions that limit what it can touch. Think of the model as the engine and the harness as the rest of the car. This matters for safety, because the limits on what an agent is allowed to do often live in the harness, not in the model itself. The word is also used in AI evaluation to mean a test harness, the standardized setup for running a model through benchmarks, but in agentic AI it means the orchestration layer described here.
AI penalty: The finding that disagreeing with an AI recommendation, even when you are clinically right, raises your liability in the eyes of a lay jury, while agreeing with it gives you no protection. A one-way ratchet.
Alert fatigue: When a tool fires so many low-value alerts that clinicians start tuning them all out, including the important ones. A common reason a technically accurate model fails in practice.
Ambient AI scribe: AI that listens to a clinical encounter and drafts the documentation automatically, cutting the time a clinician spends writing notes.
Anchoring: When an early piece of information, such as an AI's suggested diagnosis, pulls your thinking toward it and makes you less likely to consider alternatives.
Artificial general intelligence (AGI): A system that transfers mastery learned in one domain across all others with no task-specific training, matching or surpassing human ability across virtually all cognitive tasks. The goal large tech companies and countries are racing toward. Whether it has already arrived is actively debated: some classify today's leading models as early or "emerging" AGI, while others hold that true general intelligence is still ahead.
Artificial super intelligence: The most speculative level of AI: capability beyond artificial general intelligence that, by definition, exceeds what humans can imagine. Would emerge if AGI grew past anything humanity could achieve on its own.
Attention mechanism: The core component of a transformer that lets the model weigh how much every part of the input matters to every other part. It builds an attention filter from the inputs and uses it to give some words more weight than others when forming a representation.
AUROC: Area under the ROC curve, a common measure of how well a model separates two outcomes (for example disease vs no disease). 1.0 is perfect, 0.5 is a coin toss. A high AUROC does not guarantee the model is useful in a real clinic.
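One way to make the AUROC number concrete: it equals the chance that a randomly picked patient with the disease gets a higher score than a randomly picked patient without it. The sketch below computes it that way on made-up labels and scores; the function name and the data are illustrative assumptions, not from any particular tool.

```python
def auroc(labels, scores):
    """Share of (positive, negative) pairs where the positive case scores higher.

    Ties count as half a win. labels are 1 (has the condition) or 0 (does not).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: three patients with the disease, three without.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(round(auroc(labels, scores), 3))  # 8 of 9 pairs are ranked correctly
```

Note that the calculation only looks at ranking. It says nothing about whether the scores themselves are trustworthy as probabilities, or about where the alert cut-off sits.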
Automation bias: The tendency to over-trust an automated system and stop questioning it. In clinical AI it means accepting a model's output without applying your own judgment.

B

BERT: A well-known encoder-only (autoencoding) transformer, pre-trained with masked language modeling so it reads context bidirectionally. Used for understanding tasks like classification. In healthcare, the BEHRT adaptation was trained on sequential diagnosis codes to predict future diagnoses.
Broad AI: A system that could handle any task at a human level, not just one. Also called strong AI or general AI. The counterpart to narrow AI, which covers the task-specific tools running everywhere today. Whether broad AI has arrived yet is debated: some researchers count today's leading models as early examples, others say true general capability is still ahead.

C

Calibration: Whether a model's stated confidence matches reality. If a well-calibrated model says "70% risk" for a group of patients, about 70% of them turn out to have the outcome. A model can discriminate well and still be poorly calibrated, so its numbers mislead.
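A simple calibration check groups predictions into bins and compares the average predicted risk in each bin with the share of patients who actually had the outcome. The sketch below is a minimal version under assumed choices (equal-width bins, made-up data, a hypothetical function name).

```python
def calibration_table(probs, outcomes, n_bins=5):
    """Return (mean predicted risk, observed rate, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # keep p == 1.0 in the last bin
        bins[idx].append((p, y))
    table = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            table.append((mean_pred, observed, len(members)))
    return table


# Well calibrated: ten patients told "70% risk", seven have the outcome.
good = calibration_table([0.7] * 10, [1] * 7 + [0] * 3)

# Poorly calibrated: ten patients told "90% risk", only five have the outcome.
bad = calibration_table([0.9] * 10, [1] * 5 + [0] * 5)

for mean_pred, observed, count in good + bad:
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={count}")
```

A large gap between the predicted and observed columns is what "its numbers mislead" looks like in practice.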
Computer-aided detection: Older imaging software that marks areas of interest for a clinician to confirm. Branded CAD in the 2000s partly to avoid the word AI after earlier hype. Same basic idea as today's detection-assist tools.
Context window: The amount of text a model can hold in mind at once, counting both what you give it and what it writes back. It is measured in tokens. Once a conversation or document runs past the limit, the model starts losing track of the earliest parts. This explains a common frustration, that a tool seems to forget something said earlier, and it has practical consequences in healthcare: a long record or a lengthy guideline may not fit in one window, so what the model can reason over at any moment is bounded. Larger context windows cost more to run, which is part of why capability and price move together.
Convolutional neural network (CNN): A neural network built for spatial data like images, where each layer looks at a small neighbourhood of the input and global context only builds up in the deeper layers. CNNs are strong on local patterns but weaker at long-range relationships, which is part of why vision transformers are used as an alternative.

D

Decoder: The part of a transformer that generates an output sequence one token at a time from the encoder's representation and the tokens it has produced so far. Its final layer scores every word in the vocabulary. In the simplest case the highest-scoring word becomes the next one generated; in practice the next word is often sampled from those scores instead.
Decoder-only model: A transformer that uses only the decoder, also called an autoregressive model. Built to generate text by predicting the next token in a sequence. The GPT models are the famous example.
Deep learning: A kind of machine learning that uses layered neural networks to learn patterns from large amounts of data, such as millions of medical images. It is what made image-based AI like diabetic retinopathy screening work.
Deskilling: The gradual loss of a clinician's own skill when a tool does the task for them often enough that the underlying ability fades.
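Diffusion model: A generative AI foundation model that creates new images and video. It is trained on existing visual content by learning to remove noise that has been added to it, then generates new content by starting from random noise and removing it step by step. A visual counterpart to the text-based large language models.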
Distillation: A technique for training a smaller, cheaper model to copy the behavior of a larger, more capable one, so you get much of the quality at a fraction of the running cost. The large model acts as a teacher and the small model as a student. This is increasingly how the AI tools people actually use are built, because the biggest models are expensive to run at scale. For a healthcare audience the point is that a small fast model in a product may be a distilled version of a frontier model, and it can carry both the strengths and the blind spots of its teacher.

E

Electronic health record: The digital record of a patient's care. The push to adopt EHRs created the large digital datasets that machine learning in healthcare runs on.
Encoder: The part of a transformer that reads an input sequence and turns it into a contextual representation, with one vector per token. It stacks several layers (six in the original transformer, more in many later models), each combining multi-head self-attention with a feed-forward network, to capture the context of the whole input.
Encoder-only model: A transformer that uses only the encoder, also called an autoencoding model. Built for understanding context, such as classification and sentiment analysis. BERT is the classic example, pre-trained with masked language modeling and fine-tuned by adding classifier layers.
Expert system: An early form of AI built from hand-coded rules, like the 1970s systems MYCIN and INTERNIST. Powerful in narrow tests but brittle: it cannot handle anything it was not explicitly programmed for.
External validation: Testing a model on data from a different hospital, region, or population than it was trained on. It is the real test of whether a model generalizes; performance almost always drops compared to internal testing.

F

False discovery rate: Of everyone an AI flags as high risk, the fraction who turn out not to have the condition. It tells you how much weight a flag can actually bear.
False negative: When an AI tool misses something that is actually a problem. False negatives are the dangerous ones in healthcare, because the thing the model failed to flag still needs attention.
False positive: When an AI tool flags something as a problem that turns out not to be one. A high false positive rate means the tool cries wolf often, which can wear down the clinicians acting on its alerts.
Federated learning: A way to train one AI model across several hospitals or sites without moving any patient data out of where it lives. Instead of pooling everyone's records in one place, the model is sent to each site, learns locally, and only the model updates are shared back and combined. It lets AI learn from large, diverse data while the records stay put, which is why it is attractive for privacy-sensitive healthcare data.
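The "shared back and combined" step can be sketched in a few lines. The version below uses federated averaging, one common way to combine updates, where each site's model weights count in proportion to how many records it trained on. The function name, the tiny two-parameter "model", and the site sizes are all illustrative assumptions.

```python
def federated_average(site_weights, site_sizes):
    """Combine per-site model weights into one model, weighted by site size.

    site_weights: one list of parameters per site (all the same length).
    site_sizes:   number of local training records at each site.
    Only these numbers leave the sites; the patient records never do.
    """
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical hospitals, each returning a two-parameter model update.
hospital_a = [1.0, 2.0]   # trained locally on 100 records
hospital_b = [3.0, 4.0]   # trained locally on 300 records
combined = federated_average([hospital_a, hospital_b], [100, 300])
print(combined)  # pulled toward the larger site
```

Real deployments repeat this round many times and usually add protections on the updates themselves, since model updates can still leak information.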
Feed-forward network: A fully connected network inside each transformer layer that applies further transformations to the attention output, letting the model capture more complex relationships between tokens. It sits alongside the attention sublayer in both encoder and decoder.
Fine-tuning: The second training stage, where a pre-trained model is further trained on a smaller, task-specific dataset. Because the model already carries general knowledge, fine-tuning converges faster and needs far less labelled data to reach strong performance.
Foundation model: A large model trained broadly on huge datasets that can then be adapted to many tasks, rather than built for one. In imaging, a foundation model can handle several modalities instead of one narrow job.

G

General-purpose AI: An AI model trained on broad data that can handle many different tasks, like ChatGPT or other large language models. Regulators including the EU AI Act use this term (GPAI) for foundation models. General-purpose AI is broad in capability, but it is still not broad AI or AGI: it does not truly understand or transfer skill across any domain the way a person does. It sits on the narrow side of the narrow-vs-broad split.
Generative AI: AI that produces new content (text, images, audio) rather than just classifying or scoring existing data. Large language models are the most familiar example.
GPT: A well-known family of decoder-only (autoregressive) transformers, pre-trained by predicting the next word in a sequence and used to generate text. Fine-tuned to a domain through transfer learning.
Grounding: Tying a model's answer to a trusted source, such as a specific document, record, or database, so it is based on real information rather than generated from memory. Grounding is the main defense against hallucination: instead of letting the model produce a plausible-sounding answer on its own, you make it draw from and point to source material, often using retrieval-augmented generation. For clinicians and the people who buy these tools, a grounded answer that cites where it came from can be checked, which is what makes it safer to act on than an ungrounded one.
Guardrails: The concrete limits built around an AI system to stop it doing things it should not. Guardrails can block certain outputs, refuse certain requests, keep the system from acting outside an approved scope, or force a human check before anything consequential happens. When someone asks "does it have guardrails?" they are really asking what the system is technically prevented from doing, not whether it has been told to behave. For clinicians and the leaders accountable for a tool, this is the difference between a promise and an actual constraint: ask where the guardrail lives and what it blocks, because a guardrail that is only a line in a policy is not a guardrail.
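To show the difference between a promise and a constraint, here is a minimal sketch of a guardrail that lives in the harness: every tool call passes through one function that refuses anything not on an allowlist. All names here (`ALLOWED_TOOLS`, `run_tool`, the two stub tools) are hypothetical and exist only for illustration.

```python
# Hypothetical allowlist: the only tools this deployment may call.
ALLOWED_TOOLS = {"lookup_lab_result"}


def lookup_lab_result(patient_id):
    """Stub read-only tool (illustrative only)."""
    return f"(stub) latest result for {patient_id}"


def place_order(patient_id):
    """Stub tool with real-world consequences (illustrative only)."""
    return f"(stub) order placed for {patient_id}"


TOOLS = {"lookup_lab_result": lookup_lab_result, "place_order": place_order}


def run_tool(name, **kwargs):
    """Harness-level guardrail: block any tool that is not on the allowlist."""
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool '{name}' is not permitted")
    return TOOLS[name](**kwargs)


print(run_tool("lookup_lab_result", patient_id="demo-1"))
try:
    run_tool("place_order", patient_id="demo-1")
except PermissionError as err:
    print("blocked:", err)
```

The model can ask for `place_order` as often as it likes; the request fails in code, whatever the model was told or chose to do.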

H

Hallucination: When an AI generates content that is fluent and plausible but wrong or never happened, such as a measurement that was never taken or a finding never discussed. In clinical documents this is a patient-safety issue, not a quirk.
Human in the loop: A setup where a person reviews or approves an AI output before it takes effect. It only adds safety when the human is genuinely engaged with the decision, not just clicking approve.
Human on the loop: An oversight model where a clinician reviews AI work at defined checkpoints rather than approving every single step. It is the shift agentic systems make away from human-in-the-loop, without removing the human.

I

Inference: What happens when a trained model is actually used, taking an input and producing an output. It is the counterpart to training: training is how the model is built, inference is every time it is run afterward. The word comes up around cost and speed, because each use consumes computing power, and a tool that is cheap to use once can be expensive across a whole health system. For clinical use the practical point is that the model is not learning from your case during inference unless it has been specifically set up to, so it is applying what it already learned, not adapting to you in the moment.
Informed consent: A patient's right to understand and agree to their care, including how a recommendation was reached. With AI involved, this is starting to include knowing that AI played a part and what its role was.

J

J-curve: The shape of technology adoption where costs come first and returns arrive later. Organizations that give up during the early dip pay for the learning without ever collecting the reward.

L

Large language model: An AI trained on huge amounts of text that can read and write in plain language. ChatGPT is one. In healthcare they draft notes, summarize, and answer questions, but can be confidently wrong on facts.

M

Machine learning: A way of building software where, instead of writing rules by hand, you show the system many examples and let it find the patterns itself. Most modern clinical AI is machine learning.
Masked attention: Attention in the decoder where each word may only score against itself and the words before it; the scores for later words are set to negative infinity, so their weights become zero after softmax. This forces the model to predict each token using only earlier ones, never future ones, which is what makes left-to-right text generation work.
Masked language modeling: A self-supervised pre-training task where part of the input tokens are hidden and the model is trained to predict them from the surrounding unmasked tokens. Because it can use context on both sides, it produces bidirectional encoders. It is the task BERT is pre-trained with.
Model Context Protocol (MCP): A shared standard for connecting AI models to the tools, data, and systems they need to do work, so each connection does not have to be built from scratch. You can think of it as a common plug that lets a model and an outside system talk to each other. The term is appearing more in agentic AI conversations because it is becoming a common way to wire models into real software. For decision-makers the relevant point is governance: a standard connection still needs the same scrutiny over what data flows through it and what the model is permitted to do once connected.
Model drift: When an AI model's real-world performance degrades because the patients it now sees no longer match the data it was trained on. It is why models need monitoring and revalidation, not just a one-time launch.
Multi-head attention: Running several attention layers in parallel so each head can form a different filter and focus on a different part of the input. The heads' outputs are concatenated and passed through a linear layer to combine them.
Multimodal AI: AI that takes in more than one type of data at once, for example imaging plus lab values plus clinical notes, and integrates them into a single prediction, closer to how a clinician actually reasons.

N

Narrow AI: Every AI system built so far, past and present, from single-task tools to general-purpose models like ChatGPT. Even models that handle many tasks are still considered narrow by many researchers, because they are built for specific tasks rather than matching human general intelligence. Also called weak AI. The counterpart to broad AI.
Natural language processing: The branch of AI that works with human language: reading clinical notes, transcribing speech, pulling structured meaning out of free text.
Neural network: The layered mathematical structure deep learning uses to find patterns. Loosely inspired by how neurons connect, though it does not actually work like a brain.

O

Operating threshold: The cut-off that decides how readily a predictive tool flags a case. Lower it and you catch more, at the cost of more false alarms; raise it and you cut false alarms, at the cost of missed cases. Where it sits is a clinical decision, not a fixed property of the tool.
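The trade-off is easy to see by running the same scores through different cut-offs. The sketch below uses eight made-up patients and a hypothetical helper, `flag_counts`; nothing about the model changes between runs, only the threshold.

```python
def flag_counts(scores, labels, threshold):
    """Count caught cases, false alarms, and missed cases at a given cut-off."""
    caught = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    false_alarms = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    missed = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return {"caught": caught, "false_alarms": false_alarms, "missed": missed}


# Made-up risk scores; label 1 means the patient truly has the condition.
scores = [0.95, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

for threshold in (0.25, 0.50, 0.70):
    print(threshold, flag_counts(scores, labels, threshold))
```

At 0.25 every true case is caught but two patients are flagged needlessly; at 0.70 there are no false alarms but two true cases are missed.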

P

PACS: Picture archiving and communication system: the digital storage hospitals use for medical images. Because radiology went digital early through PACS, it had the labeled image archives that made imaging AI possible first.
Population stability index: A number that measures how much the population a model sees has shifted away from the population it was built on. A higher value means more drift. It is one of the concrete triggers proposed for when a deployed AI tool needs re-checking: a population stability index above 0.2 is a common threshold for "investigate this". A practical early-warning signal that the data environment around a model has changed.
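For readers who want to see the arithmetic, the index is usually computed by sorting a variable into bins, taking the share of patients in each bin at build time and today, and summing (current share minus baseline share) times the natural log of their ratio. The sketch below assumes the bins are already chosen; the function name, the small floor that avoids dividing by zero, and the example shares are illustrative assumptions.

```python
import math


def population_stability_index(baseline_shares, current_shares, floor=1e-6):
    """PSI = sum over bins of (current - baseline) * ln(current / baseline).

    Both inputs are lists of proportions over the same bins (each sums to 1).
    floor is an assumed safeguard so an empty bin does not cause log(0).
    """
    if len(baseline_shares) != len(current_shares):
        raise ValueError("both distributions must use the same bins")
    psi = 0.0
    for b, c in zip(baseline_shares, current_shares):
        b = max(b, floor)
        c = max(c, floor)
        psi += (c - b) * math.log(c / b)
    return psi


# Hypothetical age-band shares when the model was built vs. this month.
baseline = [0.25, 0.25, 0.25, 0.25]
current = [0.10, 0.20, 0.30, 0.40]

psi = population_stability_index(baseline, current)
print(round(psi, 3), "investigate" if psi > 0.2 else "stable")
```

With these made-up shares the index lands just above the 0.2 mark, so the shift toward older patients would trigger a closer look.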
Position encoding: Extra numbers added to each word so a transformer knows the word order. A transformer reads all words at once, which is fast but loses order, and order changes meaning. Position encodings, built in the original transformer from sine and cosine waves of different frequencies, give every position a distinct signature that is added to the word embedding.
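A minimal sketch of the sinusoidal scheme from the original transformer: even-numbered dimensions use a sine wave, odd-numbered ones a cosine, with slower waves in the later dimensions, and the result is summed element by element with the word embedding. The function names and the four-dimensional toy embedding are assumptions for illustration; many newer models learn their position encodings instead.

```python
import math


def position_encoding(position, d_model):
    """Sinusoidal encoding for one position, as a list of d_model numbers."""
    encoding = []
    for i in range(d_model):
        # Each pair of dimensions (2k, 2k+1) shares one wavelength.
        angle = position / (10000 ** ((2 * (i // 2)) / d_model))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding


def add_position(embedding, position):
    """Add the position signature to a word embedding, element by element."""
    encoding = position_encoding(position, len(embedding))
    return [x + p for x, p in zip(embedding, encoding)]


word = [0.5, 0.5, 0.5, 0.5]          # toy embedding for one word
print(add_position(word, 0))         # the word at the start of the sentence
print(add_position(word, 3))         # the same word three places later
```

The same word gets a different vector at each position, which is how order survives even though the words are read all at once.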
Positive predictive value: Of the patients a model flags as positive, the share who actually have the condition. It depends on how common the condition is, so the same model has different predictive value in different populations.
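The dependence on prevalence follows directly from the definitions. The sketch below holds sensitivity and specificity fixed at 90% (made-up figures) and changes only how common the condition is; the function name is an illustrative assumption.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of flagged patients who truly have the condition."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Same hypothetical model, two different populations.
specialist_clinic = positive_predictive_value(0.90, 0.90, prevalence=0.50)
general_screening = positive_predictive_value(0.90, 0.90, prevalence=0.01)

print(round(specialist_clinic, 2))  # about 0.90
print(round(general_screening, 2))  # about 0.08
```

In the screening population most flags are false alarms even though the model itself has not changed. The false discovery rate is simply one minus this value.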
Post-market surveillance: Watching a tool after it is approved and deployed, to catch problems that only show up in real-world use. For clinical AI it means tracking whether a live tool is still accurate, still processing on time, and still being used as intended, because regulatory approval at launch does not guarantee ongoing performance. Increasingly treated as an obligation rather than best-effort. Closely related to post-deployment monitoring.
Pre-training: The first training stage, where a model learns general representations from a large dataset (often unlabelled, via self-supervision) before being adapted to a specific task. Pre-training ideally uses data from the same domain as the eventual task.
Predictive AI: AI that scores, classifies, or flags from data it was trained on: a CAD tool, a risk score, a triage algorithm. You direct it by selection and configuration, picking the right tool and setting it correctly, rather than by typing an instruction. Distinct from generative AI, which produces new content from a prompt.
Prompt: The instruction or question you give an AI model to get a response. A prompt can be a short request or several paragraphs of context. The model has no goal of its own; it reacts to what it is given, so the wording, the detail, and the framing of the prompt shape the answer you get back. In practice this means two clinicians asking the same model the same clinical question in different words can get meaningfully different answers, which is why how a prompt is written is part of how safely a tool is used.
Prompt engineering: The practice of writing and refining prompts so a model gives more reliable, useful answers. It covers giving clear instructions, supplying the right context, showing examples, and asking for a specific format. For a clinical or decision-making audience the point is not that this is a special skill, but that the quality of an AI tool in real use depends partly on how the prompts behind it are built, and that a vendor demo with carefully engineered prompts may not reflect what a busy user types at 2am.

Q

Query, key, value: The three vectors an attention layer produces for each token from three linear layers. The query and key are multiplied to form an attention filter (a matrix of scores for how strongly each word relates to every other), and that filter then weights the values. The recipe is scaled dot-product attention: multiply query and key, divide by the square root of the key dimension, apply softmax, then multiply by the values.
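The recipe can be written out in plain Python. This is a minimal sketch with tiny hand-made matrices and no learned linear layers; the scaling uses the key dimension, as in the original transformer, and all function names are illustrative assumptions.

```python
import math


def softmax(row):
    """Turn raw scores into weights that sum to one."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def scaled_dot_product_attention(q, k, v):
    """Return (output, weights) for queries q, keys k, values v (one row per token)."""
    d_k = len(k[0])                                   # key dimension
    k_transposed = [list(col) for col in zip(*k)]
    scores = matmul(q, k_transposed)                  # query times key
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]        # the attention filter
    return matmul(weights, v), weights                # filter applied to values


# Two toy tokens with two-dimensional queries, keys, and values.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]

output, weights = scaled_dot_product_attention(q, k, v)
print(weights)  # each token attends mostly to itself here
print(output)   # each row is a weighted mix of the value rows
```

Each row of `weights` sums to one, so every output row is a blend of the value vectors in proportion to how strongly that token's query matched each key.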
Quintuple Aim: A checklist of what "value" means in healthcare: better patient outcomes, improved population health, lower cost, better clinician well-being, and greater health equity. It extends the older Triple and Quadruple Aims by adding equity. For AI, it is the test of whether a tool actually helped: if a deployed tool is not moving at least one of the five, the value case is just words.

R

Reasoning model: A type of AI model that works through a problem in steps before giving a final answer, rather than responding in one shot. This step-by-step working is often called chain-of-thought. Reasoning models tend to do better on harder problems that need several linked judgments, which is why they have become a distinct category. The trade-offs matter for healthcare use: they are usually slower and cost more to run, and producing visible reasoning is not the same as being correct, so the steps should be treated as a model showing its work, not as a guarantee.
Recursive self-improvement (RSI): The idea that an AI system can take over part of the work of building the next, better AI, instead of only helping a human researcher. The research loop is propose an idea, build it, run the experiment, check the result, learn from it, pick what to try next. RSI is when machines start running stages of that loop and the output of one generation helps create a stronger next one, with less human involvement each time. It is worth being precise, because the phrase "AI that builds AI" oversells where things actually are. Today RSI mostly means automated engineering and automated research on things that are easy to measure, like making model training faster or cheaper, not AI inventing a brand new model on its own. It is also different from a self-improving agent, which improves its own prompts, tools, and memory but does not rebuild the model. For people accountable for AI in healthcare the relevant shift is about oversight: as more of the loop moves from human hands to machine hands, the human job moves toward setting goals, validating results, and governing the process, which is exactly where the hard questions sit.
Red teaming: Deliberately attacking or stress-testing an AI system to find where it fails before it is deployed, by trying to make it produce harmful, wrong, or out-of-bounds output. The name comes from security, where a "red team" plays the adversary. It is becoming a standard expectation in AI safety and is now appearing in procurement and governance conversations. For decision-makers it is a fair question to ask of any healthcare AI vendor: was this system red-teamed, by whom, and what did they find, because a tool that has never been tested against misuse has unknown failure modes.
Reinforcement learning: Machine learning that learns the best actions by trial and error in its environment, guided by rewards. A robot vacuum learns the layout of a home this way, memorising every misstep to improve the next run.
Retrieval-augmented generation: A method that grounds an AI's answers in specific, vetted documents instead of its general memory, which reduces hallucination. Often shortened to RAG.
ROI: Return on investment: what you get back relative to what you spent. For healthcare AI this includes clinical, workforce, and financial returns, not just cost savings.

S

Search and planning algorithms: A kind of symbolic AI that follows fixed logic to explore possible paths toward a goal, without learning from past attempts. Amazon's warehouse robots use search algorithms to find the most efficient route between items.
Self-attention: Attention where the query, key, and value all come from the same input, so a sequence attends to itself. It is how a transformer works out the contextual relationships among the words within one sequence.
Self-supervised learning: Training a model on large amounts of unlabelled data by having it predict parts of its own input, so no expensive expert annotations are needed. It is the usual first stage for transformers, producing general representations that a later fine-tuning step adapts to a specific task.
Sensitivity: Of the patients who truly have a condition, the share the model correctly flags. High sensitivity means it misses few real cases.
Sequence-to-sequence model: A transformer that uses both encoder and decoder, also called an encoder-decoder model. Built to turn one sequence into another, for tasks like translation, summarisation, and speech recognition. T5 and BART are the well-known examples.
Simple rule-based systems: The most basic kind of symbolic AI: a system that automates a single decision using basic if-then logic. A light-sensing plug that switches on outdoor lights at sunset, or a basic programmable thermostat. Very limited, very good at the one thing it does.
Softmax: A function that turns a set of raw scores into a probability distribution that sums to one. Transformers use it inside attention to normalise the filter, and at the final layer to turn vocabulary scores into the probability of each possible next word.
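Softmax is short enough to write out in full: exponentiate each score, then divide by the total. The sketch below applies it to a made-up three-word vocabulary to show the final-layer use; the words and scores are illustrative assumptions.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to one."""
    peak = max(scores)  # subtracting the largest score avoids overflow
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three candidate next words.
vocabulary = ["fever", "cough", "rash"]
raw_scores = [2.0, 1.0, 0.1]

probabilities = softmax(raw_scores)
most_likely = vocabulary[probabilities.index(max(probabilities))]

for word, p in zip(vocabulary, probabilities):
    print(f"{word}: {p:.2f}")
print("most likely next word:", most_likely)
```

The ordering of the scores is preserved, but the gaps are stretched: a modest lead in raw score becomes a clear lead in probability.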
Software as a medical device: Software that is regulated as a medical device because its intended use is to diagnose, treat, prevent, or inform clinical decisions. What the software claims to do often matters more for regulation than how it is built.
Specificity: Of the patients who do not have a condition, the share the model correctly clears. High specificity means few false alarms.
Standard of care: The level of care a reasonably competent clinician would provide in a given situation. It is the legal benchmark used to judge negligence, and it shifts over time as technology and practice change.
Supervised learning: Machine learning that learns patterns from data humans have already labelled, then makes decisions on new data. The phone face-recognition setup is a clean example: tilting your face labels each frame, and the phone uses that labelled data to recognise you later.
Swin transformer: A hierarchical vision transformer that processes images in shifted local windows, making it efficient and well suited to dense tasks like medical image segmentation. Swin UNETR pairs a Swin encoder with a UNet-style decoder for 3D segmentation.
Symbolic AI: AI that follows explicit rules coded in advance for every anticipated outcome, without learning. Also called rule-based AI. Splits into simple rule-based systems, expert systems, and search and planning algorithms. The counterpart to machine learning inside narrow AI.
System prompt: A fixed set of instructions that shapes how an AI model behaves before the user types anything. It sets the role, the rules, the tone, and often the safety limits the model should follow in every conversation. Two organizations using the same underlying model can get very different behavior because their system prompts differ. For people accountable for a deployed tool this matters: a lot of what makes a healthcare AI product safe or unsafe is decided in the system prompt and the guardrails, not in the model itself, so it is fair to ask to see it.

T

Token: A small chunk of an input sequence, often a word or part of a word, that a transformer processes as one unit. The input is broken into tokens, each is turned into numbers, and attention then works out how the tokens relate.
Tool use: When an AI model does more than produce text by calling on outside tools, such as searching a database, looking up a record, running a calculation, or sending a request to another system. Tool use is what lets a model act in the real world rather than just describe it, and it is the core of how agents get things done. For a healthcare audience this is where capability and risk both rise sharply: the moment a model can reach into a real system, what it is allowed to touch, and what it is blocked from touching, becomes a safety question, not just a feature.
Traditional machine learning: Machine learning that uses classical statistical methods and human-selected features on structured data (tables, spreadsheets, databases) to find patterns, classify, and predict. Techniques include clustering and k-nearest-neighbour. Distinct from deep learning, which handles unstructured data.
Training data: The examples an AI model learns from. A model can only recognize patterns that were present in its training data, which is why gaps and bias in that data carry straight into the model.
Transfer learning: Reusing what a model learned on one task as the starting point for another, by loading the pre-trained weights into a downstream model. It is how the knowledge from a large pre-training run is carried into a specific clinical task with limited labelled data.
Transformer model: A neural network architecture that reads a whole sequence at once instead of one step at a time. It uses attention to weigh how much every part of the input matters to every other part, which lets it train in parallel and handle long-range context that older recurrent networks lost track of. Transformers are the architecture behind modern large language models and most current medical-imaging foundation models.

U

Unstructured data: Information that does not fit neatly into rows and columns: free-text notes, images, waveforms, audio. A commonly cited estimate is that roughly 80% of healthcare data is unstructured, and it is both the richest and the hardest for AI to use.
Unsupervised learning: Machine learning that finds patterns in data with no human labels, forming its own groupings. Used in bank fraud detection and the Netflix recommendation system, which sorts viewers from what they watch, finish, and skip.

V

Validated population: The patient group a tool was actually tested on. Performance inside that group is evidenced; performance outside it is unknown, not assumed. Checking whether the patient in front of you fits the validated population is part of using any AI tool well.
Vision transformer: A transformer applied to images rather than text, used as an alternative to convolutional networks (CNNs). It splits an image into patches and uses attention to relate them, borrowing the transformer advantages of scaling to large data and training end to end without hand-designed layers. Vision transformers are increasingly common in radiology and pathology AI.

W

Word embedding: A fixed-length vector of trainable numbers that represents a word so a computer can work with it. Related words end up with similar values in some dimensions: apple sits near orange, both share something with juice, while airplane sits far from all of them.
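"Near" and "far" are usually measured with cosine similarity, which compares the direction of two vectors. The sketch below uses hand-made three-number vectors purely for illustration; real embeddings are learned during training and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """1.0 means the vectors point the same way; near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings (not taken from any trained model).
embeddings = {
    "apple":    [0.9, 0.8, 0.1],
    "orange":   [0.8, 0.9, 0.1],
    "airplane": [0.1, 0.0, 0.9],
}

print(round(cosine_similarity(embeddings["apple"], embeddings["orange"]), 2))
print(round(cosine_similarity(embeddings["apple"], embeddings["airplane"]), 2))
```

The two fruits score close to 1, while apple and airplane score much lower, which is the numeric version of "sits near" and "sits far".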
World model: An AI system that builds an internal model of how a situation works and how it changes over time, so it can predict what happens next and plan actions, rather than only matching patterns in text or images. The idea is to give a system something closer to a working understanding of an environment, the way a person carries a rough mental map of a room. World models are central to robotics, simulation, and physical AI, where a system has to anticipate consequences before it acts. For a healthcare audience this is still mostly a frontier and research topic rather than everyday clinical software, but it is worth recognizing, because it is the direction behind a lot of the robotics and embodied-AI work that is starting to reach into care settings.

Z

Zero-shot: When a model handles a task it was never specifically trained for, using only its general training. A zero-shot imaging model can flag a finding without task-specific examples for that exact use case.
