Appendix A — Glossary
This glossary provides definitions for key terms in neuroscience, artificial intelligence, and their intersection. Chapter references are provided where relevant.
A
Action Potential: A rapid rise and fall in membrane potential that propagates along axons to transmit information between neurons. The fundamental unit of neural communication. See Chapter 2.
Activation Function: Mathematical function that determines the output of a neural network node, introducing nonlinearity (e.g., ReLU, sigmoid, tanh). See Chapter 12.
AIC (Akaike Information Criterion): Model selection criterion that balances goodness of fit with model complexity: AIC = 2k - 2ln(L), where k is the number of estimated parameters and L is the maximized likelihood. Lower values indicate better models among candidates fit to the same data. See Chapter 10.
Astrocyte: Star-shaped glial cell that supports neuronal function, helps maintain the blood-brain barrier, and modulates synaptic transmission.
Attention Mechanism: Computational technique that allows models to focus on relevant parts of input data by computing weighted combinations. Central to Transformer architectures. See Chapter 14.
Autoencoder: Neural network trained to reconstruct its input through a bottleneck layer, used for dimensionality reduction and feature learning. See Chapter 13.
Axon: Long projection of a neuron that conducts electrical impulses (action potentials) away from the cell body to other neurons.
B
Backpropagation: Algorithm for training neural networks by computing gradients of the loss function with respect to weights using the chain rule. See Chapter 13.
Basal Ganglia: Group of subcortical nuclei involved in motor control, habit formation, reward processing, and action selection. See Chapter 5.
Batch Normalization: Technique to stabilize and accelerate neural network training by normalizing layer inputs across mini-batches.
Bayesian Brain Hypothesis: Theory that the brain represents knowledge as probability distributions and uses Bayesian inference to update beliefs. See Chapter 11.
Bayesian Inference: Statistical method for updating beliefs about hypotheses using Bayes’ rule: P(H|D) = P(D|H)P(H)/P(D). See Chapter 11.
BCM Rule: Bienenstock-Cooper-Munro learning rule with a sliding threshold for synaptic plasticity stability. See Chapter 6.
BERT (Bidirectional Encoder Representations from Transformers): Influential pre-trained language model using bidirectional context. See Chapter 15.
Bias-Variance Tradeoff: Fundamental tradeoff in model fitting where reducing bias (underfitting) may increase variance (overfitting), and vice versa. See Chapter 10.
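BIC (Bayesian Information Criterion): Model selection criterion similar to AIC but with a complexity penalty that grows with sample size: BIC = k·ln(n) - 2ln(L), where k is the number of parameters, n is the number of observations, and L is the maximized likelihood. The penalty exceeds that of AIC whenever ln(n) > 2. See Chapter 10.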
Blood-Brain Barrier: Selective barrier preventing most substances in blood from entering the brain, protecting neural tissue.
Brain-Computer Interface (BCI): System enabling direct communication between brain activity and external devices. See Chapter 20.
Broca’s Area: Brain region in frontal lobe crucial for speech production and motor aspects of language.
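The AIC and BIC formulas given in this glossary can be checked with a few lines of arithmetic. The sketch below is illustrative only: the log-likelihoods and parameter counts are made-up numbers for two hypothetical models fit to the same 100 observations, and the function names are not taken from any library.

```python
import math

def aic(log_likelihood: float, k: int) -> float:
    """AIC = 2k - 2 ln(L), where log_likelihood is ln(L) at its maximum."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood: float, k: int, n: int) -> float:
    """BIC = k ln(n) - 2 ln(L), where n is the number of observations."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical fits (made-up values, for illustration only).
n_obs = 100
aic_simple = aic(-120.0, k=2)    # 244.0
aic_complex = aic(-118.5, k=5)   # 247.0
bic_simple = bic(-120.0, k=2, n=n_obs)
bic_complex = bic(-118.5, k=5, n=n_obs)

# The model with the lower score is preferred under each criterion.
preferred_by_aic = "simple" if aic_simple < aic_complex else "complex"
preferred_by_bic = "simple" if bic_simple < bic_complex else "complex"
```

In this made-up case both criteria prefer the simpler model, and BIC separates the two models more strongly because ln(100) is larger than 2.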
C
Catastrophic Forgetting: Phenomenon where neural networks forget previously learned tasks when training on new tasks. A key challenge in continual learning. See Chapter 26.
Causal Graph: Directed graph representing causal relationships between variables, where edges indicate direct causal influence. See Chapter 9.
Cerebellum: Brain structure important for motor control, coordination, balance, and motor learning.
CLIP (Contrastive Language-Image Pre-training): Multimodal model learning joint embeddings of images and text. See Chapter 16.
CNN (Convolutional Neural Network): Neural network architecture with convolutional layers for processing grid-like data such as images. Inspired by the visual cortex. See Chapters 4, 13.
Confounding Variable: Variable that influences both the treatment and outcome, creating spurious correlations. See Chapter 9.
Cortical Column: Vertical organization of neurons in cerebral cortex sharing similar response properties and forming functional units.
Cross-Entropy: Loss function commonly used for classification tasks, measuring difference between predicted and true probability distributions.
Cross-Validation: Model evaluation technique partitioning data into training and validation sets to assess generalization. See Chapter 10.
D
DAG (Directed Acyclic Graph): Graph with directed edges and no cycles, used to represent causal structures. See Chapter 9.
Default Mode Network (DMN): Large-scale brain network active during rest and internally-directed cognition. See Chapter 5.
Dendrite: Tree-like extension of a neuron that receives synaptic inputs and integrates them, performing local computations along the way.
Diffusion Model: Generative model that learns to reverse a gradual noising process to generate new samples. See Chapter 16.
Divisive Normalization: Canonical neural computation where responses are normalized by summed activity of a pool of neurons.
Do-Calculus: Mathematical framework by Judea Pearl for reasoning about interventions in causal systems. See Chapter 9.
Dopamine: Neurotransmitter involved in reward prediction, motivation, motor control, and learning. Phasic dopamine activity is widely interpreted as signaling reward prediction errors. See Chapters 5, 12.
Dropout: Regularization technique that randomly deactivates neurons during training to prevent overfitting. See Chapter 13.
Drift-Diffusion Model: Model of perceptual decision-making where evidence accumulates until reaching a threshold. See Chapter 11.
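The evidence-accumulation idea in the drift-diffusion entry above can be sketched as a simple simulation. This is a minimal illustration with assumed parameter names and values (drift, threshold, noise_sd, dt, max_time), a symmetric pair of bounds, and a plain Euler step; it is not a fitted model of any dataset.

```python
import random

def simulate_ddm(drift, threshold, noise_sd=1.0, dt=0.001, max_time=10.0, rng=None):
    """Accumulate noisy evidence until it crosses +threshold or -threshold.

    Returns (choice, decision_time), where choice is +1, -1, or 0 if no
    bound was reached before max_time.
    """
    if rng is None:
        rng = random.Random()
    evidence, t = 0.0, 0.0
    while t < max_time:
        # Euler-Maruyama step: deterministic drift plus Gaussian noise.
        evidence += drift * dt + noise_sd * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
        if evidence >= threshold:
            return 1, t
        if evidence <= -threshold:
            return -1, t
    return 0, t

# One simulated trial with a seeded generator for reproducibility.
choice, decision_time = simulate_ddm(drift=1.0, threshold=1.0, rng=random.Random(42))
```

A positive drift makes the upper bound the more likely outcome, and raising the threshold trades speed for accuracy.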
E
Echo State Network: Recurrent network with fixed random connections and trained readout layer, used in reservoir computing.
Efficient Coding Hypothesis: Theory that neural systems maximize information transmission while minimizing resources. See Chapter 7.
Elastic Weight Consolidation (EWC): Continual learning method that protects important weights from large changes. See Chapter 26.
Embedding: Dense vector representation of discrete objects like words or categories in a continuous space. See Chapter 14.
Entropy: Measure of uncertainty or information content in a probability distribution. See Chapter 7.
Entorhinal Cortex: Brain region containing grid cells and serving as interface between hippocampus and neocortex. See Chapter 3.
Executive Control Network (ECN): Brain network involved in attention, working memory, and cognitive control. See Chapter 5.
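The entropy entry above can be made concrete for discrete distributions. The sketch below assumes Shannon entropy measured in bits (base-2 logarithm) and treats zero-probability outcomes as contributing nothing; the function name is illustrative.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy H = -sum(p * log2(p)) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

fair_coin = entropy_bits([0.5, 0.5])           # 1.0 bit
biased_coin = entropy_bits([0.9, 0.1])         # about 0.469 bits
four_way = entropy_bits([0.25, 0.25, 0.25, 0.25])  # 2.0 bits
```

The biased coin has lower entropy than the fair one because its outcome is more predictable.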
F
Few-Shot Learning: Learning from very few examples, often using meta-learning or transfer learning. See Chapter 15.
Fine-Tuning: Adapting a pre-trained model to a specific task with additional training on task-specific data. See Chapter 15.
fMRI (Functional Magnetic Resonance Imaging): Brain imaging technique measuring blood oxygen level changes related to neural activity.
Free Energy Principle: Theory that the brain minimizes variational free energy, an upper bound on surprise that is closely related to prediction error, through both perception and action. See Chapter 11.
G
GABA (Gamma-Aminobutyric Acid): Primary inhibitory neurotransmitter in the brain, reducing neural excitability.
GAN (Generative Adversarial Network): Architecture with generator and discriminator networks competing in a minimax game.
Generalized Linear Model (GLM): Flexible framework extending linear regression to non-Gaussian data using link functions. See Chapter 10.
GLU (Gated Linear Unit): Activation function using gating mechanism, common in modern transformers.
Glutamate: Primary excitatory neurotransmitter in the brain, essential for synaptic plasticity.
GPT (Generative Pre-trained Transformer): Family of autoregressive language models trained on next-token prediction. See Chapter 15.
Gradient Descent: Optimization algorithm iteratively moving in direction of steepest decrease of loss function. See Chapter 13.
Grid Cells: Neurons in entorhinal cortex that fire in hexagonal spatial patterns, forming a metric for navigation. See Chapter 3.
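The gradient descent entry above reduces to a one-line update rule. The following is a minimal sketch on a one-dimensional quadratic loss chosen purely for illustration; the learning rate, step count, and function names are assumptions.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient: w <- w - learning_rate * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# Illustrative loss(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)  # approaches 3
```

With this loss, each step shrinks the distance to the minimum by a constant factor, so a learning rate that is too large (here, above 1.0) would make the iterates diverge instead.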
H
Hallucination (AI): When generative AI models produce plausible but factually incorrect information. See Chapter 15.
Hebbian Learning: Principle summarized as “cells that fire together wire together”: correlated activity strengthens synaptic connections. See Chapter 6.
Hippocampus: Brain structure crucial for episodic memory formation, spatial navigation, and pattern separation. See Chapter 3.
Hodgkin-Huxley Model: Detailed mathematical model of ionic mechanisms underlying action potential generation. See Chapter 2.
Hyperparameter: Parameter that controls the learning process (e.g., learning rate) rather than being learned from data.
I
Information Theory: Mathematical framework for quantifying information, uncertainty, and communication. See Chapter 7.
Instrumental Variable: Variable used in causal inference that affects the treatment, influences the outcome only through the treatment, and is independent of unobserved confounders. See Chapter 9.
Integrate-and-Fire Model: Simplified neuron model where membrane potential integrates inputs until threshold triggers spike. See Chapter 2.
Intervention: Act of forcing a variable to a specific value to determine causal effects, denoted do(X). See Chapter 9.
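The integrate-and-fire entry above can be illustrated with the common leaky variant, in which the membrane potential also decays toward rest. This is a sketch under assumed parameter values (millivolts, milliseconds, megaohms, nanoamperes) with forward-Euler integration; none of the numbers come from the chapters.

```python
def simulate_lif(input_current, dt=0.1, tau=10.0, v_rest=-65.0,
                 v_thresh=-50.0, v_reset=-65.0, resistance=10.0):
    """Leaky integrate-and-fire neuron; returns spike times in ms.

    input_current: sequence of injected currents (nA), one per time step.
    """
    v = v_rest
    spike_times = []
    for step, current in enumerate(input_current):
        # Leak toward rest plus input drive, integrated with a forward Euler step.
        dv = (-(v - v_rest) + resistance * current) / tau
        v += dv * dt
        if v >= v_thresh:
            spike_times.append(step * dt)
            v = v_reset  # reset after the spike
    return spike_times

# 100 ms of constant input at two illustrative current levels.
strong = simulate_lif([2.0] * 1000)  # steady state above threshold: spikes
weak = simulate_lif([1.0] * 1000)    # steady state below threshold: no spikes
```

With these assumed values, the weaker current drives the membrane only to -55 mV, below the -50 mV threshold, so the neuron stays silent.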
K
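KL Divergence (Kullback-Leibler): Asymmetric measure of how a probability distribution P differs from a reference distribution Q: D_KL(P‖Q) = Σ P(x) log(P(x)/Q(x)). It is non-negative and zero only when P and Q are identical. See Chapter 7.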
k-Winners-Take-All: Competitive mechanism where k strongest neurons remain active while others are suppressed. See Chapter 6.
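The KL divergence entry in this section can be computed directly for discrete distributions. The sketch below assumes base-2 logarithms (bits) and assumes the second distribution assigns non-zero probability wherever the first does; the function name is illustrative.

```python
import math

def kl_divergence_bits(p, q):
    """D_KL(P || Q) = sum(p * log2(p / q)) over outcomes with p > 0.

    Assumes q[i] > 0 wherever p[i] > 0; otherwise the divergence is infinite.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
forward = kl_divergence_bits(p, q)   # about 0.737 bits
reverse = kl_divergence_bits(q, p)   # about 0.531 bits
```

The two directions give different values, which is why KL divergence is not a distance metric.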
L
L1 Regularization (Lasso): Penalty on sum of absolute values of weights, promoting sparsity and feature selection. See Chapter 10.
L2 Regularization (Ridge): Penalty on sum of squared weights, encouraging small but non-zero weights. See Chapter 10.
Lateral Inhibition: Process where activated neurons suppress neighboring neurons, enhancing contrast.
Likelihood: Probability of observed data given a model or hypothesis: P(D|H). See Chapter 11.
LoRA (Low-Rank Adaptation): Parameter-efficient fine-tuning method using low-rank matrix decompositions. See Chapter 15.
Loss Function: Mathematical function measuring difference between model predictions and target values. See Chapter 13.
LSTM (Long Short-Term Memory): RNN architecture with gating mechanisms designed to handle long-term dependencies. See Chapter 14.
LTP (Long-Term Potentiation): Persistent strengthening of synapses based on recent patterns of activity, widely regarded as a cellular basis of learning and memory.
M
MAP (Maximum A Posteriori): Bayesian estimate selecting the most probable parameter value given data and prior. See Chapter 11.
Markov Decision Process (MDP): Mathematical framework for sequential decision-making under uncertainty.
Maximum Likelihood Estimation (MLE): Method finding parameters that maximize probability of observed data. See Chapter 10.
Meta-Learning: Learning to learn, where models improve their learning process through experience across tasks. See Chapter 5.
Motion Energy Model: Computational model of motion detection using spatiotemporal filters.
Multi-Head Attention: Parallel attention mechanisms in transformer architectures allowing different representation subspaces. See Chapter 14.
Multisensory Integration: Process of combining information from different sensory modalities. Often follows Bayesian cue integration. See Chapter 11.
Mutual Information: Measure of information shared between two variables. See Chapter 7.
Myelin: Fatty substance insulating axons, speeding neural transmission through saltatory conduction.
N
Neocognitron: Early convolutional neural network architecture inspired by visual cortex, precursor to modern CNNs. See Chapter 4.
Neural Coding: How neurons represent and transmit information through patterns of action potentials. See Chapter 7.
Neuromorphic Computing: Hardware implementations mimicking neural computation principles for energy efficiency. See Chapter 21.
Neuromodulation: Process where neurotransmitters modify neural circuit properties, affecting learning and attention. See Chapter 6.
Neurotransmitter: Chemical messenger that transmits signals across synapses between neurons.
NMDA Receptor: Glutamate receptor crucial for synaptic plasticity. It opens only when glutamate is bound and the postsynaptic membrane is depolarized, which relieves its magnesium block.
O
Optogenetics: Technique using light to control genetically modified neurons expressing light-sensitive proteins. See Chapter 6.
Overfitting: When a model performs well on training data but poorly on new data due to learning noise. See Chapter 10.
P
Perceptron: Early artificial neural network model capable of linear classification with error-correction learning. See Chapter 1.
Place Cells: Neurons in hippocampus that fire when an animal is in specific spatial locations. See Chapter 3.
Plasticity: Brain’s ability to reorganize and form new neural connections in response to experience. See Chapter 6.
Poisson GLM: Generalized linear model with Poisson distribution, commonly used for spike count data. See Chapter 10.
Population Coding: Information representation through combined activity of neural populations rather than single neurons.
Posterior Distribution: Updated beliefs about parameters after observing data: P(H|D). See Chapter 11.
Predictive Coding: Theory that the brain maintains hierarchical models, sending predictions down the hierarchy and propagating prediction errors upward. See Chapters 4, 11.
Prefrontal Cortex: Brain region involved in executive functions, planning, decision-making, and working memory.
Prior Distribution: Initial beliefs about parameters before observing data: P(H). See Chapter 11.
Prompt Engineering: Crafting input text to elicit desired behavior from language models. See Chapter 15.
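The prior, likelihood, and posterior entries in this glossary combine through Bayes' rule, P(H|D) = P(D|H)P(H)/P(D). The following sketch applies the rule to two hypothetical hypotheses about a coin after observing a single head; the hypothesis names and probabilities are invented for illustration.

```python
def posterior(prior, likelihood):
    """Return P(H|D) for each hypothesis H, given P(H) and P(D|H)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(D), the normalizing constant
    return {h: value / evidence for h, value in unnormalized.items()}

# Hypothetical example: is the coin fair, or biased toward heads?
prior = {"fair": 0.5, "biased": 0.5}
likelihood_of_heads = {"fair": 0.5, "biased": 0.8}

post = posterior(prior, likelihood_of_heads)  # P(biased | heads) is about 0.615
map_estimate = max(post, key=post.get)        # most probable hypothesis
```

Selecting the hypothesis with the highest posterior probability, as in the last line, is the discrete form of the MAP estimate defined under M.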
R
Rate Coding: Neural code where information is carried by the firing rate (spikes per second) of neurons.
Receptive Field: Region of sensory space that influences a neuron’s firing. Can be characterized by preferred stimuli. See Chapter 4.
Regularization: Techniques to prevent overfitting by constraining model complexity (e.g., L1, L2, dropout). See Chapter 10.
Reichardt Detector: Model of elementary motion detection through delayed correlation.
Reinforcement Learning: Learning paradigm where agents learn through trial-and-error using reward signals. See Chapter 12.
ReLU (Rectified Linear Unit): Common activation function f(x) = max(0, x), introducing nonlinearity with sparse activations.
Reservoir Computing: Framework using fixed random recurrent networks with only the readout layer trained.
ResNet (Residual Network): Deep architecture with skip connections allowing training of very deep networks. See Chapter 13.
Retina: Light-sensitive tissue at back of eye containing photoreceptors (rods and cones) and performing early visual processing.
RLHF (Reinforcement Learning from Human Feedback): Training method aligning AI systems with human preferences. See Chapter 15.
RNN (Recurrent Neural Network): Network with connections forming cycles, enabling processing of sequential data. See Chapter 14.
S
Salience Network (SN): Brain network detecting and orienting attention to salient internal and external stimuli. See Chapter 5.
Self-Attention: Mechanism allowing each element in a sequence to attend to all other elements. Core of Transformers. See Chapter 14.
Serotonin: Neurotransmitter involved in mood regulation, sleep, appetite, and emotional processing.
SGD (Stochastic Gradient Descent): Optimization algorithm using random samples (mini-batches) to compute gradients. See Chapter 13.
Softmax: Function converting a vector of logits x into a probability distribution: softmax(x)_i = exp(x_i) / Σ_j exp(x_j).
Sparse Coding: Representation where few neurons are active at once, maximizing information efficiency. See Chapter 4.
Spike: Action potential, a brief electrical pulse generated and propagated by neurons.
Spike-Timing-Dependent Plasticity (STDP): Learning rule where synapse strength changes based on precise timing of pre- and post-synaptic spikes. See Chapter 6.
Supervised Learning: Learning paradigm with labeled training examples mapping inputs to outputs. See Chapter 12.
Synapse: Junction between neurons where information is transmitted chemically or electrically.
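The softmax entry in this section pairs naturally with the cross-entropy loss defined under C. The sketch below uses plain Python lists and subtracts the maximum logit before exponentiating, a common trick to avoid overflow; the function names and example logits are illustrative.

```python
import math

def softmax(logits):
    """Convert logits to probabilities: exp(x_i) / sum_j exp(x_j)."""
    shift = max(logits)  # subtracting the max leaves the result unchanged
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probabilities, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probabilities[target_index])

probs = softmax([2.0, 1.0, 0.1])               # about [0.659, 0.242, 0.099]
loss = cross_entropy(probs, target_index=0)    # about 0.417
```

The loss is small when the model puts high probability on the correct class and grows without bound as that probability approaches zero.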
T
Temporal Coding: Information carried by precise timing of spikes rather than just firing rates.
Temporal Difference (TD) Learning: Reinforcement learning method learning from differences between successive predictions. See Chapter 12.
Thalamus: Brain structure relaying sensory and motor signals to cerebral cortex, acting as information hub.
Tokenization: Process of converting text into discrete units (tokens) for processing by language models. See Chapter 15.
Transfer Learning: Using knowledge from one task to improve performance on another, related task. See Chapter 15.
Transformer: Architecture built on self-attention mechanisms, now dominant in NLP and widely used in vision. See Chapters 14, 15.
Tuning Curve: Function describing a neuron’s response as a function of stimulus parameters (e.g., orientation tuning). See Chapter 4.
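The temporal difference learning entry in this section can be illustrated with the tabular TD(0) update, V(s) ← V(s) + α[r + γV(s') − V(s)]. The two-state chain, the reward values, and the parameter names alpha and gamma below are assumptions chosen to keep the example small.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update in place and return the TD error."""
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error

# Hypothetical chain: A -> B (reward 0), then B -> end (reward 1).
values = {"A": 0.0, "B": 0.0, "end": 0.0}
for _ in range(500):
    td0_update(values, "A", reward=0.0, next_state="B")
    td0_update(values, "B", reward=1.0, next_state="end")

# values["B"] approaches 1.0 and values["A"] approaches gamma * 1.0 = 0.9
```

The TD error returned by the update plays the role of the reward prediction error mentioned in the dopamine entry.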
U
Underfitting: When a model is too simple to capture the underlying pattern in data, resulting in poor performance.
Unsupervised Learning: Learning paradigm discovering patterns in data without labeled examples. See Chapter 12.
Utility Function: Function quantifying value or desirability of outcomes in decision-making. See Chapter 11.
V
V1 (Primary Visual Cortex): First cortical area processing visual information, containing orientation-selective cells. See Chapter 4.
VAE (Variational Autoencoder): Generative model learning probabilistic latent representations with encoder-decoder structure.
Vanishing Gradient Problem: Difficulty training deep networks when gradients become very small, preventing learning. See Chapter 14.
Ventral Stream: Visual processing pathway from V1 to temporal cortex for object recognition (“what” pathway). See Chapter 4.
Vision Transformer (ViT): Transformer architecture adapted for image processing using patch-based inputs. See Chapter 16.
W
Weight Decay: Regularization technique that shrinks weights toward zero at each update. Equivalent to L2 regularization under plain SGD, but not in general under adaptive optimizers such as Adam.
Wernicke’s Area: Brain region important for language comprehension and semantic processing.
Winner-Take-All (WTA): Competitive mechanism where the strongest input suppresses all others, creating sparse representations.
Working Memory: Cognitive system for temporarily holding and manipulating information, supported by prefrontal cortex. See Chapter 5.
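The winner-take-all entry in this section, and its k-winners generalization defined under K, can be sketched as a simple selection over a list of activations. This is a non-biological illustration: it zeroes the losers outright rather than modeling inhibition dynamics, and it breaks ties in favor of the earlier unit, which is an arbitrary choice.

```python
def k_winners_take_all(activations, k):
    """Keep the k largest activations and set all others to zero."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # Indices sorted by activation, largest first; ties keep their original order.
    ranked = sorted(range(len(activations)),
                    key=lambda i: activations[i], reverse=True)
    winners = set(ranked[:k])
    return [a if i in winners else 0.0 for i, a in enumerate(activations)]

activity = [0.2, 0.9, 0.5, 0.7]
top_two = k_winners_take_all(activity, k=2)   # [0.0, 0.9, 0.0, 0.7]
top_one = k_winners_take_all(activity, k=1)   # [0.0, 0.9, 0.0, 0.0]
```

Setting k to 1 recovers plain winner-take-all, while larger k yields a sparse code with a fixed number of active units.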
Z
Zero-Shot Learning: Performing tasks without any task-specific training examples, using knowledge from pre-training. See Chapter 15.