Appendix A — Glossary

This glossary provides definitions for key terms in neuroscience, artificial intelligence, and their intersection. Chapter references are provided where relevant.

A

Action Potential: A rapid rise and fall in membrane potential that propagates along axons to transmit information between neurons. The fundamental unit of neural communication. See Chapter 2.

Activation Function: Mathematical function that determines the output of a neural network node, introducing nonlinearity (e.g., ReLU, sigmoid, tanh). See Chapter 12.

AIC (Akaike Information Criterion): Model selection criterion that balances goodness of fit against model complexity: AIC = 2k - 2ln(L), where k is the number of free parameters and L is the maximized likelihood. Lower values indicate the preferred model. See Chapter 10.
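
A minimal sketch of the AIC computation. It assumes the maximized log-likelihood ln(L) of each fitted model is already known. The function name `aic` and the numbers below are illustrative only.

```python
def aic(log_likelihood: float, k: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 ln(L).

    log_likelihood is the maximized log-likelihood ln(L); k is the number of
    free parameters in the model.
    """
    return 2 * k - 2 * log_likelihood


# Hypothetical fits: the larger model gains 1.5 log-likelihood units
# but uses 3 extra parameters.
aic_small = aic(log_likelihood=-120.0, k=3)  # 246.0
aic_large = aic(log_likelihood=-118.5, k=6)  # 249.0
```

Here the smaller model has the lower AIC and would be preferred, because its slightly worse fit is outweighed by the complexity penalty.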

Astrocyte: Star-shaped glial cell that supports neuronal function, helps maintain the blood-brain barrier, and modulates synaptic transmission.

Attention Mechanism: Computational technique that allows models to focus on relevant parts of input data by computing weighted combinations. Central to Transformer architectures. See Chapter 14.

Autoencoder: Neural network trained to reconstruct its input through a bottleneck layer, used for dimensionality reduction and feature learning. See Chapter 13.

Axon: Long projection of a neuron that conducts electrical impulses (action potentials) away from the cell body to other neurons.

B

Backpropagation: Algorithm for training neural networks by computing gradients of the loss function with respect to weights using the chain rule. See Chapter 13.

Basal Ganglia: Group of subcortical nuclei involved in motor control, habit formation, reward processing, and action selection. See Chapter 5.

Batch Normalization: Technique to stabilize and accelerate neural network training by normalizing layer inputs across mini-batches.

Bayesian Brain Hypothesis: Theory that the brain represents knowledge as probability distributions and uses Bayesian inference to update beliefs. See Chapter 11.

Bayesian Inference: Statistical method for updating beliefs about hypotheses using Bayes’ rule: P(H|D) = P(D|H)P(H)/P(D). See Chapter 11.
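
For a finite set of hypotheses, Bayes’ rule can be computed directly, with P(D) obtained by summing P(D|H)P(H) over all hypotheses. The sketch below assumes a toy coin example (fair versus biased toward heads with P(heads) = 0.8). The function name and numbers are illustrative.

```python
def posterior(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    """Apply Bayes' rule over a discrete set of hypotheses H given data D."""
    # Unnormalized posterior: P(D|H) * P(H) for each hypothesis
    joint = {h: likelihood[h] * prior[h] for h in prior}
    # Evidence P(D) = sum over H of P(D|H) P(H)
    evidence = sum(joint.values())
    return {h: p / evidence for h, p in joint.items()}


# Observing a single head (D) under two hypotheses about the coin.
prior = {"fair": 0.5, "biased": 0.5}
likelihood_heads = {"fair": 0.5, "biased": 0.8}
post = posterior(prior, likelihood_heads)  # {'fair': ~0.385, 'biased': ~0.615}
```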

BCM Rule: Bienenstock-Cooper-Munro learning rule in which the threshold separating synaptic potentiation from depression slides with the neuron’s recent average activity, keeping plasticity stable. See Chapter 6.
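
A minimal discrete-time sketch of the BCM rule for a single linear neuron. It assumes the common form Δw_i = η x_i y (y - θ), with the threshold θ relaxing toward y². The learning rate η, the threshold time constant, and the function name are illustrative choices, not values from the text.

```python
def bcm_step(w: list[float], x: list[float], theta: float,
             eta: float = 0.01, tau: float = 0.1) -> tuple[list[float], float]:
    """One discrete-time BCM update; returns (new_weights, new_threshold)."""
    # Postsynaptic rate of a linear neuron
    y = sum(wi * xi for wi, xi in zip(w, x))
    # Potentiation when y > theta, depression when 0 < y < theta
    w_new = [wi + eta * xi * y * (y - theta) for wi, xi in zip(w, x)]
    # Sliding threshold: relax toward y**2, a running average of squared activity
    theta_new = theta + tau * (y ** 2 - theta)
    return w_new, theta_new
```

Because θ rises when the neuron is highly active, sustained high activity turns potentiation into depression, which keeps the weights from growing without bound.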

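BERT (Bidirectional Encoder Representations from Transformers): Influential pre-trained language model built from Transformer encoders and trained with masked language modeling, so that each token’s representation draws on context from both directions. See Chapter 15.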

Bias-Variance Tradeoff: Fundamental tradeoff in model fitting in which making a model more flexible reduces bias (the source of underfitting) but tends to increase variance (the source of overfitting), and vice versa. See Chapter 10.

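BIC (Bayesian Information Criterion): Model selection criterion similar to AIC but with a stronger penalty for complexity once n ≥ 8 (where ln(n) > 2): BIC = k·ln(n) - 2ln(L), where k is the number of parameters, n the number of observations, and L the maximized likelihood. See Chapter 10.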

Blood-Brain Barrier: Selective barrier preventing most substances in blood from entering the brain, protecting neural tissue.

Brain-Computer Interface (BCI): System enabling direct communication between brain activity and external devices. See Chapter 20.

Broca’s Area: Region of the frontal lobe, typically in the left hemisphere, crucial for speech production and the motor aspects of language.

C

Catastrophic Forgetting: Phenomenon where neural networks forget previously learned tasks when training on new tasks. A key challenge in continual learning. See Chapter 26.

Causal Graph: Directed graph representing causal relationships between variables, where edges indicate direct causal influence. See Chapter 9.

Cerebellum: Brain structure important for motor control, coordination, balance, and motor learning.

CLIP (Contrastive Language-Image Pre-training): Multimodal model learning joint embeddings of images and text. See Chapter 16.

CNN (Convolutional Neural Network): Neural network architecture with convolutional layers for processing grid-like data such as images. Inspired by the visual cortex. See Chapters 4, 13.

Confounding Variable: Variable that influences both the treatment and outcome, creating spurious correlations. See Chapter 9.

Cortical Column: Vertical organization of neurons in cerebral cortex sharing similar response properties and forming functional units.

Cross-Entropy: Loss function commonly used for classification, measuring the mismatch between the true distribution p and the predicted distribution q: H(p, q) = -Σ p(x) log q(x).

Cross-Validation: Model evaluation technique that repeatedly partitions data into training and validation sets (e.g., k-fold) to estimate how well a model generalizes. See Chapter 10.

D

DAG (Directed Acyclic Graph): Graph with directed edges and no cycles, used to represent causal structures. See Chapter 9.

Default Mode Network (DMN): Large-scale brain network active during rest and internally-directed cognition. See Chapter 5.

Dendrite: Tree-like branch of a neuron that receives and integrates synaptic inputs, in some cases performing local nonlinear computations.

Diffusion Model: Generative model that learns to reverse a gradual noising process to generate new samples. See Chapter 16.

Divisive Normalization: Canonical neural computation in which each neuron’s response is divided by the summed activity of a pool of neurons.
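
A minimal sketch, assuming the widely used form R_i = γ·d_i^n / (σ^n + Σ_j d_j^n), where d_i is each neuron’s non-negative input drive, σ a semi-saturation constant, and n an exponent. Parameter names and defaults are illustrative.

```python
def divisive_normalization(drives: list[float], sigma: float = 1.0,
                           n: float = 2.0, gain: float = 1.0) -> list[float]:
    """Normalize each (non-negative) drive by the pooled activity of all units."""
    powered = [d ** n for d in drives]
    # Shared denominator: semi-saturation constant plus summed pool activity
    pool = sigma ** n + sum(powered)
    return [gain * p / pool for p in powered]
```

Doubling every drive does not double the outputs: with these defaults, drives of [1, 1] give 1/3 each and drives of [2, 2] give 4/9 each, because the shared denominator grows as well.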

Do-Calculus: Mathematical framework by Judea Pearl for reasoning about interventions in causal systems. See Chapter 9.

Dopamine: Neurotransmitter involved in reward, motivation, motor control, and learning. Phasic dopamine responses are widely thought to signal reward prediction errors. See Chapters 5, 12.

Dropout: Regularization technique that randomly deactivates neurons during training to prevent overfitting. See Chapter 13.

Drift-Diffusion Model: Model of two-alternative perceptual decision-making in which noisy evidence accumulates until it reaches one of two decision thresholds. See Chapter 11.
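
A minimal simulation sketch, assuming a symmetric two-boundary process integrated with the Euler-Maruyama method. Evidence x starts at 0, drifts at rate `drift`, receives Gaussian noise, and a choice is made when x reaches +threshold or -threshold. All parameter names and defaults are illustrative.

```python
import math
import random


def simulate_ddm(drift: float, threshold: float = 1.0, noise: float = 1.0,
                 dt: float = 0.001, max_steps: int = 100_000,
                 seed: int = 0) -> tuple[int, float]:
    """Return (choice, decision_time).

    choice is +1 (upper bound), -1 (lower bound), or 0 if neither bound
    is reached within max_steps.
    """
    rng = random.Random(seed)
    x = 0.0  # accumulated evidence, starting midway between the bounds
    for step in range(1, max_steps + 1):
        # Deterministic drift plus Gaussian noise scaled by sqrt(dt)
        x += drift * dt + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if x >= threshold:
            return 1, step * dt
        if x <= -threshold:
            return -1, step * dt
    return 0, max_steps * dt
```

Larger drift (stronger evidence) yields faster and more accurate choices, while raising the threshold trades speed for accuracy.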

E

Echo State Network: Recurrent network with fixed random recurrent connections and a trained linear readout layer; a form of reservoir computing.

Efficient Coding Hypothesis: Theory that neural systems maximize information transmission while minimizing resources. See Chapter 7.

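Elastic Weight Consolidation (EWC): Continual learning method that protects weights important for earlier tasks from large changes by adding a quadratic penalty weighted by each weight’s estimated importance (Fisher information). See Chapter 26.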
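
The EWC penalty is usually written (λ/2) Σ_i F_i (θ_i - θ*_i)², where θ* are the weights learned on the previous task and F_i is the diagonal Fisher information estimating how important weight i was for it. A minimal sketch of just that penalty term follows, with illustrative names. In practice it is added to the loss of the new task.

```python
def ewc_penalty(params: list[float], old_params: list[float],
                fisher: list[float], lam: float = 1.0) -> float:
    """Quadratic EWC penalty (lam / 2) * sum_i F_i * (theta_i - theta*_i)**2."""
    return 0.5 * lam * sum(
        f * (p - p_old) ** 2 for p, p_old, f in zip(params, old_params, fisher)
    )
```

Moving a weight whose Fisher information is zero costs nothing, while moving an important weight is penalized quadratically.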

Embedding: Dense vector representation of discrete objects like words or categories in a continuous space. See Chapter 14.

Entropy: Measure of uncertainty or information content in a probability distribution. See Chapter 7.
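
Shannon entropy of a discrete distribution is H(p) = -Σ p_i log p_i. Below is a minimal sketch in bits (base 2), treating 0 log 0 as 0 by convention. The function name is illustrative.

```python
import math


def entropy(p: list[float], base: float = 2.0) -> float:
    """Shannon entropy H(p) = -sum_i p_i log p_i; zero-probability terms contribute 0."""
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)
```

A fair coin gives 1 bit, a uniform choice among four outcomes gives 2 bits, and a certain outcome gives 0.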

Entorhinal Cortex: Brain region containing grid cells that serves as the main interface between the hippocampus and neocortex. See Chapter 3.

Executive Control Network (ECN): Brain network involved in attention, working memory, and cognitive control. See Chapter 5.

F

Few-Shot Learning: Learning from very few examples, often using meta-learning or transfer learning. See Chapter 15.

Fine-Tuning: Adapting a pre-trained model to a specific task with additional training on task-specific data. See Chapter 15.

fMRI (Functional Magnetic Resonance Imaging): Brain imaging technique that measures changes in blood oxygenation (the BOLD signal) as an indirect proxy for neural activity.

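Free Energy Principle: Theory that the brain minimizes variational free energy through perception and action. Free energy is an upper bound on surprise and, under common simplifying assumptions, amounts to precision-weighted prediction error. See Chapter 11.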

G

GABA (Gamma-Aminobutyric Acid): Primary inhibitory neurotransmitter in the brain, reducing neural excitability.

GAN (Generative Adversarial Network): Architecture with generator and discriminator networks competing in a minimax game.

Generalized Linear Model (GLM): Flexible framework extending linear regression to non-Gaussian data using link functions. See Chapter 10.

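GLU (Gated Linear Unit): Layer that multiplies one linear projection of its input elementwise by a sigmoid gate computed from a second projection: GLU(x) = (xW + b) ⊗ σ(xV + c). Variants such as SwiGLU are common in the feed-forward blocks of modern Transformers.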

Glutamate: Primary excitatory neurotransmitter in the brain, essential for synaptic plasticity.

GPT (Generative Pre-trained Transformer): Family of autoregressive language models trained on next-token prediction. See Chapter 15.

Gradient Descent: Optimization algorithm that iteratively steps in the direction of steepest decrease of the loss function, i.e., along the negative gradient. See Chapter 13.
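
A minimal sketch of plain (full-batch) gradient descent on a one-dimensional function. It assumes the gradient is supplied as a Python function. The quadratic objective and learning rate are illustrative.

```python
from typing import Callable


def gradient_descent(grad: Callable[[float], float], x0: float,
                     lr: float = 0.1, steps: int = 100) -> float:
    """Repeatedly step against the gradient from the starting point x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # move downhill by lr times the local slope
    return x


# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Here each step shrinks the distance to the minimum by a factor of 1 - 2·lr = 0.8. A learning rate above 1.0 would make this particular example diverge.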

Grid Cells: Neurons in entorhinal cortex that fire in hexagonal spatial patterns, forming a metric for navigation. See Chapter 3.

H

Hallucination (AI): When generative AI models produce plausible but factually incorrect information. See Chapter 15.

Hebbian Learning: Principle summarized as “cells that fire together wire together”: correlated pre- and postsynaptic activity strengthens the synaptic connection between them. See Chapter 6.
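
A minimal sketch of the plain Hebbian rule Δw_i = η x_i y for a single linear neuron with y = Σ w_i x_i. The learning rate and names are illustrative.

```python
def hebbian_update(w: list[float], x: list[float], eta: float = 0.1) -> list[float]:
    """Strengthen each weight in proportion to presynaptic x_i times postsynaptic y."""
    y = sum(wi * xi for wi, xi in zip(w, x))  # linear postsynaptic activity
    return [wi + eta * xi * y for wi, xi in zip(w, x)]


w = hebbian_update([0.5, 0.5], [1.0, 0.0])  # only the active input's weight grows
```

Nothing in this rule limits weight growth, so plain Hebbian learning is unstable on its own. Rules such as BCM add a sliding threshold to counter this.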

Hippocampus: Brain structure crucial for episodic memory formation, spatial navigation, and pattern separation. See Chapter 3.

Hodgkin-Huxley Model: Detailed mathematical model of ionic mechanisms underlying action potential generation. See Chapter 2.

Hyperparameter: Parameter that controls the learning process (e.g., learning rate) rather than being learned from data.

I

Information Theory: Mathematical framework for quantifying information, uncertainty, and communication. See Chapter 7.

Instrumental Variable: Variable used in causal inference that affects the treatment, influences the outcome only through the treatment, and is independent of unmeasured confounders. See Chapter 9.

Integrate-and-Fire Model: Simplified neuron model in which the membrane potential integrates inputs until it reaches a threshold, triggering a spike followed by a reset. See Chapter 2.
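
A minimal forward-Euler sketch of the leaky integrate-and-fire variant, τ_m dV/dt = -(V - V_rest) + R_m I, with a spike and reset whenever V crosses threshold. The parameter values (20 ms membrane time constant, -70 mV rest, -50 mV threshold, 10 MΩ resistance) are typical textbook-style assumptions, not values from the text.

```python
def lif_spike_times(current: float, t_max: float = 0.1, dt: float = 1e-4,
                    tau_m: float = 0.02, v_rest: float = -0.070,
                    v_thresh: float = -0.050, v_reset: float = -0.070,
                    r_m: float = 1e7) -> list[float]:
    """Spike times (s) of a leaky integrate-and-fire neuron under constant current (A)."""
    v = v_rest
    spikes = []
    for step in range(round(t_max / dt)):
        # Forward Euler on tau_m dV/dt = -(V - V_rest) + R_m * I
        v += dt / tau_m * (-(v - v_rest) + r_m * current)
        if v >= v_thresh:
            spikes.append((step + 1) * dt)
            v = v_reset  # reset after the spike
    return spikes
```

With these values a 1 nA input settles at -60 mV and never fires, while a 3 nA input drives the neuron past threshold roughly every 22 ms.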

Intervention: Act of forcing a variable to a specific value to determine causal effects, denoted do(X). See Chapter 9.

K

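KL Divergence (Kullback-Leibler): Asymmetric measure of how a probability distribution P differs from a reference distribution Q: D_KL(P||Q) = Σ P(x) log(P(x)/Q(x)). It is not a true distance metric. See Chapter 7.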
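
A minimal sketch for discrete distributions, computing Σ_i p_i ln(p_i / q_i) in nats. It assumes q_i > 0 wherever p_i > 0, since otherwise the divergence is infinite. The asymmetry is easy to see numerically.

```python
import math


def kl_divergence(p: list[float], q: list[float]) -> float:
    """D_KL(P || Q) in nats; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


forward = kl_divergence([0.9, 0.1], [0.5, 0.5])  # ≈ 0.368
reverse = kl_divergence([0.5, 0.5], [0.9, 0.1])  # ≈ 0.511
```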

k-Winners-Take-All: Competitive mechanism where k strongest neurons remain active while others are suppressed. See Chapter 6.
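
A minimal sketch of k-winners-take-all applied to a vector of activations, with ties broken in favor of the earlier index. The function name is illustrative.

```python
def k_winners_take_all(activity: list[float], k: int) -> list[float]:
    """Keep the k largest activations and set all others to zero."""
    if k <= 0:
        return [0.0] * len(activity)
    # Indices of the k strongest units (stable sort: earlier index wins ties)
    ranked = sorted(range(len(activity)), key=lambda i: activity[i], reverse=True)
    winners = set(ranked[:k])
    return [a if i in winners else 0.0 for i, a in enumerate(activity)]
```

Setting k = 1 gives the winner-take-all case.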

L

L1 Regularization (Lasso): Penalty on sum of absolute values of weights, promoting sparsity and feature selection. See Chapter 10.

L2 Regularization (Ridge): Penalty on sum of squared weights, encouraging small but non-zero weights. See Chapter 10.

Lateral Inhibition: Process where activated neurons suppress neighboring neurons, enhancing contrast.

Likelihood: Probability of observed data given a model or hypothesis: P(D|H). See Chapter 11.

LoRA (Low-Rank Adaptation): Parameter-efficient fine-tuning method using low-rank matrix decompositions. See Chapter 15.

Loss Function: Mathematical function measuring difference between model predictions and target values. See Chapter 13.

LSTM (Long Short-Term Memory): RNN architecture with gating mechanisms designed to handle long-term dependencies. See Chapter 14.

LTP (Long-Term Potentiation): Persistent strengthening of synapses following recent patterns of high or correlated activity; widely considered a cellular basis of learning and memory.

M

MAP (Maximum A Posteriori): Bayesian estimate selecting the most probable parameter value given data and prior. See Chapter 11.

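Markov Decision Process (MDP): Mathematical framework for sequential decision-making under uncertainty, defined by states, actions, transition probabilities, and rewards, where the next state depends only on the current state and action.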

Maximum Likelihood Estimation (MLE): Method finding parameters that maximize probability of observed data. See Chapter 10.

Meta-Learning: Learning to learn, where models improve their learning process through experience across tasks. See Chapter 5.

Motion Energy Model: Computational model of motion detection using spatiotemporal filters.

Multi-Head Attention: Several attention mechanisms run in parallel within a Transformer layer, each with its own learned projections, allowing the model to attend to information from different representation subspaces. See Chapter 14.

Multisensory Integration: Process of combining information from different sensory modalities. Often approximates Bayes-optimal cue integration, in which each cue is weighted by its reliability. See Chapter 11.
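
Under the standard assumption of independent Gaussian noise on each cue, Bayes-optimal integration weights each cue by its reliability (inverse variance), and the combined estimate has lower variance than any single cue. Below is a minimal sketch with hypothetical visual and auditory location estimates.

```python
def combine_cues(means: list[float], variances: list[float]) -> tuple[float, float]:
    """Reliability-weighted cue combination; returns (combined_mean, combined_variance)."""
    precisions = [1.0 / v for v in variances]  # reliability = 1 / variance
    total = sum(precisions)
    mean = sum(p * m for p, m in zip(precisions, means)) / total
    return mean, 1.0 / total


# Hypothetical cues: vision says 10.0 (variance 1.0), audition says 14.0 (variance 4.0).
mean, var = combine_cues([10.0, 14.0], [1.0, 4.0])  # (10.8, 0.8)
```

The estimate is pulled toward the more reliable visual cue, and its variance (0.8) is below that of either cue alone.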

Mutual Information: Measure of information shared between two variables. See Chapter 7.

Myelin: Fatty substance insulating axons, speeding neural transmission through saltatory conduction.

N

Neocognitron: Early convolutional neural network architecture inspired by visual cortex, precursor to modern CNNs. See Chapter 4.

Neural Coding: How neurons represent and transmit information through patterns of action potentials. See Chapter 7.

Neuromorphic Computing: Hardware implementations mimicking neural computation principles for energy efficiency. See Chapter 21.

Neuromodulation: Process by which neuromodulators such as dopamine, serotonin, and acetylcholine alter the properties of neural circuits, affecting learning and attention. See Chapter 6.

Neurotransmitter: Chemical messenger that transmits signals across synapses between neurons.

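NMDA Receptor: Glutamate receptor crucial for synaptic plasticity. It conducts current only when glutamate is bound and the postsynaptic membrane is depolarized enough to relieve its magnesium block, which makes it a detector of coincident pre- and postsynaptic activity.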

O

Optogenetics: Technique using light to control genetically modified neurons expressing light-sensitive proteins. See Chapter 6.

Overfitting: When a model performs well on training data but poorly on new data due to learning noise. See Chapter 10.

P

Perceptron: Early artificial neural network model capable of linear classification with error-correction learning. See Chapter 1.
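
A minimal sketch of perceptron error-correction learning with labels in {-1, +1}. Whenever an example is misclassified, the weights and bias are nudged toward it. Function names, the epoch limit, and the learning rate are illustrative.

```python
def predict(w: list[float], b: float, x: list[float]) -> int:
    """Threshold unit: +1 if w·x + b > 0, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def train_perceptron(samples: list[list[float]], labels: list[int],
                     epochs: int = 20, lr: float = 1.0) -> tuple[list[float], float]:
    """Error-correction learning; stops early once every example is correct."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            if predict(w, b, x) != y:
                # Move the decision boundary toward the misclassified example
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:
            break
    return w, b


# Logical AND is linearly separable, so the rule converges.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [-1, -1, -1, 1]
w, b = train_perceptron(X, Y)
```

XOR, by contrast, is not linearly separable, and no choice of w and b classifies it correctly.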

Place Cells: Neurons in the hippocampus that fire when an animal is in a specific location in its environment. See Chapter 3.

Plasticity: Brain’s ability to reorganize and form new neural connections in response to experience. See Chapter 6.

Poisson GLM: Generalized linear model with Poisson distribution, commonly used for spike count data. See Chapter 10.

Population Coding: Information representation through combined activity of neural populations rather than single neurons.

Posterior Distribution: Updated beliefs about parameters after observing data: P(H|D). See Chapter 11.

Predictive Coding: Theory that the brain maintains a hierarchical generative model in which higher levels send predictions downward and lower levels send prediction errors upward. See Chapters 4, 11.

Prefrontal Cortex: Brain region involved in executive functions, planning, decision-making, and working memory.

Prior Distribution: Initial beliefs about parameters before observing data: P(H). See Chapter 11.

Prompt Engineering: Crafting input text to elicit desired behavior from language models. See Chapter 15.

R

Rate Coding: Neural code where information is carried by the firing rate (spikes per second) of neurons.

Receptive Field: Region of sensory space that influences a neuron’s firing. Can be characterized by preferred stimuli. See Chapter 4.

Regularization: Techniques to prevent overfitting by constraining model complexity (e.g., L1, L2, dropout). See Chapter 10.

Reichardt Detector: Model of elementary motion detection through delayed correlation.
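
A minimal discrete-time sketch of an opponent Reichardt correlator. It assumes two neighboring photoreceptor signals sampled at equal time steps and a delay of a whole number of samples. The names are illustrative.

```python
def reichardt_response(left: list[float], right: list[float], delay: int = 1) -> float:
    """Summed opponent output: positive for left-to-right motion, negative for the reverse."""
    total = 0.0
    for t in range(delay, len(left)):
        # Each half-detector multiplies one receptor's delayed signal
        # with its neighbor's current signal.
        rightward = left[t - delay] * right[t]
        leftward = right[t - delay] * left[t]
        total += rightward - leftward
    return total


# A bright spot passing the left receptor at t=1, then the right receptor at t=2.
rightward_motion = reichardt_response([0, 1, 0, 0], [0, 0, 1, 0])  # 1.0
```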

Reinforcement Learning: Learning paradigm where agents learn through trial-and-error using reward signals. See Chapter 12.

ReLU (Rectified Linear Unit): Common activation function f(x) = max(0, x), introducing nonlinearity with sparse activations.

Reservoir Computing: Framework using fixed random recurrent networks with only the readout layer trained.

ResNet (Residual Network): Deep architecture with skip connections allowing training of very deep networks. See Chapter 13.

Retina: Light-sensitive tissue at back of eye containing photoreceptors (rods and cones) and performing early visual processing.

RLHF (Reinforcement Learning from Human Feedback): Training method aligning AI systems with human preferences. See Chapter 15.

RNN (Recurrent Neural Network): Network with connections forming cycles, enabling processing of sequential data. See Chapter 14.

S

Salience Network (SN): Brain network detecting and orienting attention to salient internal and external stimuli. See Chapter 5.

Self-Attention: Mechanism allowing each element in a sequence to attend to all other elements. Core of Transformers. See Chapter 14.
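
A minimal single-head sketch of scaled dot-product self-attention in plain Python, computing softmax(QK^T / √d_k)V with Q = XW_Q, K = XW_K, V = XW_V. It assumes small list-of-lists matrices and omits masking, multiple heads, and batching. The helper names are illustrative.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over a sequence x of shape (n, d_model)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    out = []
    for qi in q:
        # Score this position's query against every key, scaled by sqrt(d_k)
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the value vectors
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out
```

With identity projections and one-hot inputs, each output row equals that position’s attention weights, and each position attends most strongly to itself.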

Serotonin: Neurotransmitter involved in mood regulation, sleep, appetite, and emotional processing.

SGD (Stochastic Gradient Descent): Optimization algorithm using random samples (mini-batches) to compute gradients. See Chapter 13.

Softmax: Function converting a vector of logits x into a probability distribution: softmax(x)_i = exp(x_i) / Σ_j exp(x_j).
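
A minimal, numerically stable sketch. Subtracting the largest logit before exponentiating leaves the result unchanged, because the common factor cancels in the ratio, but it prevents overflow for large logits.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Map logits to probabilities that are positive and sum to 1."""
    m = max(logits)  # shift so the largest exponent is exp(0) = 1
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```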

Sparse Coding: Representation where few neurons are active at once, maximizing information efficiency. See Chapter 4.

Spike: Action potential, a brief electrical pulse generated and propagated by neurons.

Spike-Timing-Dependent Plasticity (STDP): Learning rule where synapse strength changes based on precise timing of pre- and post-synaptic spikes. See Chapter 6.
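
A minimal sketch of the common pair-based exponential STDP window, where Δt = t_post - t_pre. Pre-before-post pairings (Δt > 0) potentiate, post-before-pre pairings (Δt < 0) depress, and both effects decay exponentially with |Δt|. The amplitudes and 20 ms time constants are illustrative assumptions.

```python
import math


def stdp_weight_change(delta_t: float, a_plus: float = 0.01, a_minus: float = 0.012,
                       tau_plus: float = 0.020, tau_minus: float = 0.020) -> float:
    """Weight change for one spike pair; delta_t = t_post - t_pre in seconds."""
    if delta_t > 0:  # pre fired before post: potentiation
        return a_plus * math.exp(-delta_t / tau_plus)
    if delta_t < 0:  # post fired before pre: depression
        return -a_minus * math.exp(delta_t / tau_minus)
    return 0.0
```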

Supervised Learning: Learning paradigm with labeled training examples mapping inputs to outputs. See Chapter 12.

Synapse: Junction between neurons where information is transmitted chemically or electrically.

T

Temporal Coding: Information carried by precise timing of spikes rather than just firing rates.

Temporal Difference (TD) Learning: Reinforcement learning method learning from differences between successive predictions. See Chapter 12.

Thalamus: Brain structure relaying sensory and motor signals to cerebral cortex, acting as information hub.

Tokenization: Process of converting text into discrete units (tokens) for processing by language models. See Chapter 15.

Transfer Learning: Using knowledge from one task to improve performance on another, related task. See Chapter 15.

Transformer: Architecture built on self-attention mechanisms; now dominant in NLP and widely used in vision. See Chapters 14, 15.

Tuning Curve: A neuron’s average response plotted against a stimulus parameter (e.g., orientation tuning). See Chapter 4.

U

Underfitting: When a model is too simple to capture the underlying pattern in data, resulting in poor performance.

Unsupervised Learning: Learning paradigm discovering patterns in data without labeled examples. See Chapter 12.

Utility Function: Function quantifying value or desirability of outcomes in decision-making. See Chapter 11.

V

V1 (Primary Visual Cortex): First cortical area processing visual information, containing orientation-selective cells. See Chapter 4.

VAE (Variational Autoencoder): Generative model with an encoder-decoder structure that learns probabilistic latent representations by maximizing a variational lower bound on the data likelihood.

Vanishing Gradient Problem: Difficulty training deep networks when gradients become very small, preventing learning. See Chapter 14.

Ventral Stream: Visual processing pathway from V1 to temporal cortex for object recognition (“what” pathway). See Chapter 4.

Vision Transformer (ViT): Transformer architecture adapted for image processing using patch-based inputs. See Chapter 16.

W

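Weight Decay: Regularization technique that shrinks weights toward zero at each update. For plain SGD it is equivalent to L2 regularization, but not for adaptive optimizers such as Adam, which motivated decoupled variants like AdamW.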

Wernicke’s Area: Brain region important for language comprehension and semantic processing.

Winner-Take-All (WTA): Competitive mechanism where the strongest input suppresses all others, creating sparse representations.

Working Memory: Cognitive system for temporarily holding and manipulating information, supported by prefrontal cortex. See Chapter 5.

Z

Zero-Shot Learning: Performing tasks without any task-specific training examples, using knowledge from pre-training. See Chapter 15.