Skip to content
Param Desai
Back to Writing
Explainer 8 min read December 11, 2025

What LIME actually explains — and what it hides

LIME generates a local approximation around one prediction. It does not explain the model globally. Understanding the difference matters enormously for security and medical AI applications.

For: Technically literate non-specialists, ML practitioners, CS grad students
explainable ai security

LIME appears in four of my seven papers. It is the most used tool in my research portfolio. I want to explain exactly what it does — and exactly what it does not do — because the gap between those two things is where the most interesting research questions live.

The core idea

LIME stands for Local Interpretable Model-Agnostic Explanations. The key word is local.

When a machine learning model makes a prediction, LIME answers a specific question: “For this particular input, which features pushed the model toward this output?”

It does not answer: “Which features are most important to the model in general?”

Those are different questions. LIME answers the first one. The second question requires different tools (SHAP with proper baselines, integrated gradients, global sensitivity analysis). Conflating the two is one of the most common misuses of LIME in applied ML.

How LIME actually works

LIME generates a local approximation around a specific prediction by treating the model as a black box and constructing a simpler (linear) model that mimics the black box’s behavior in the neighborhood of the input.

Concretely, for a given input sample x:

  1. Generate perturbed samples in the neighborhood of x — slightly modified versions of x that vary each feature individually
  2. Query the black box with each perturbed sample to get its predictions
  3. Fit a weighted linear model on those perturbed samples, where samples closer to x get higher weight
  4. Read the coefficients of the linear model as the feature importances for this prediction

The output is a list of features with signed importances: “feature A pushed the prediction toward class 1 by +0.3, feature B pushed toward class 0 by -0.2,” etc.

This is a local linear approximation. It is valid in the neighborhood of x. It is not guaranteed to be valid anywhere else.

What “local” means in practice

Consider a nonlinear model with a complex decision boundary. In the neighborhood of a specific point, the boundary might be approximately linear — which is why LIME’s linear approximation is useful there. But a different point, even nearby, might sit in a different part of the decision boundary where the approximation would look completely different.

This means two very similar inputs can generate very different LIME explanations — not because the model is inconsistent, but because the local geometry of the decision surface changes between those two points.

In the CML’25 paper, we saw this directly. Two network traffic instances that were both classified as the same attack type generated LIME explanations with different top features. This is not a LIME failure — it is LIME correctly characterizing that the model uses different features for different instances within the same class.

For a security analyst reading IDS alerts, this creates a practical challenge: you cannot build a general rule from LIME outputs. Each explanation is specific to one classification. Two alerts with the same class label may have arrived there via different feature paths.

Where LIME is strong

Debugging model behavior on specific instances. If a model misclassifies an input, LIME shows which features were responsible. This makes the failure mode legible without requiring access to the model internals.

Auditing for spurious correlations. If a feature that should not be predictive shows up with high LIME importance for a class of predictions, that is evidence of a spurious correlation in the training data. In the CML’25 paper, this is exactly what the misclassified instances revealed.

Forensic identification of anomalous inputs. In Q-PhishNet, LIME was used to identify which training samples had anomalously high feature importances — pointing to adversarially poisoned data. This is an inversion of LIME’s typical use: instead of “explain this prediction to a user,” we used it as “find which training samples generated statistically abnormal explanations.”

Where LIME is weak

Global conclusions from local evidence. A single LIME explanation tells you nothing about the model’s general behavior. One hundred LIME explanations tell you about the model’s behavior on one hundred specific inputs. Aggregating LIME explanations to make global claims requires caution — the aggregation method matters, and the conclusions are only as valid as the sample is representative.

High-dimensional inputs with correlated features. LIME’s perturbation strategy assumes features are approximately independent. When features are strongly correlated (e.g., two network traffic features that always move together), perturbing one without the other creates samples that the model has never seen during training. The explanations it generates for those artificial samples may not reflect the model’s actual behavior.

Faithful approximation for complex neural networks. LIME’s linear approximation is a coarser fit for a highly nonlinear decision surface. For Random Forest (piecewise linear decision boundaries), LIME’s approximation is more faithful than for a deep neural network. This was one of the reasons RF was chosen over a neural network in the HealthCom’25 paper: in a clinical context, the explainability has to be trustworthy, not just present.

The stability problem

LIME is stochastic — it generates different perturbed samples each time it runs. For the same input, two LIME runs can produce different feature importances. The variance depends on the model complexity and the neighborhood size.

For a security application, unstable explanations are a problem: an analyst who runs LIME twice and gets different results for the same alert cannot act confidently on either explanation.

Stabilising LIME (by averaging over multiple runs, or by deterministically sampling the perturbation space) is possible but adds computational cost. In the papers in this portfolio, we used fixed random seeds for LIME runs during evaluation — which produces reproducible results for the reported experiments but does not solve the stability problem in a deployed system with streaming inputs.

Why LIME appears in security research specifically

Security applications have an unusual property: the consequences of a wrong decision are asymmetric. In an IDS context, a missed attack (false negative) is far more costly than a false alarm (false positive). In a clinical context, a missed TB diagnosis is more costly than an unnecessary follow-up X-ray.

LIME is valuable in these contexts not because it improves accuracy, but because it makes the error structure visible. A model with 99.7% accuracy and no explanations has an invisible 0.3% error rate. The same model with LIME has a 0.3% error rate with legible failure modes — which is actually fixable.

The papers in this portfolio treat XAI as a tool for making AI systems trustworthy, not just accurate. LIME is the most tractable tool for that purpose on tabular and image data. Understanding what it can and cannot do is a prerequisite for using it correctly.


LIME appears in the papers CML’25, ICSC’25, HealthCom’25, and GlobeCom’25. Each paper uses it differently — the GlobeCom’25 publication covers the forensic inversion that is the most unusual application.