pyvene.models.interventions

Classes

AdditionIntervention(**kwargs)

Intervention on the original representations by activation addition.

AutoencoderIntervention(**kwargs)

Intervene in the latent space of an autoencoder.

BasisAgnosticIntervention(**kwargs)

Intervention that modifies its basis in an uncontrolled manner.

BoundlessRotatedSpaceIntervention(**kwargs)

Intervention in the rotated space with a boundary mask.

CollectIntervention(**kwargs)

Collect activations.

ConstantSourceIntervention(**kwargs)

Constant source.

DistributedRepresentationIntervention(**kwargs)

Distributed representation.

Intervention(**kwargs)

Intervention on the original representations.

InterventionOutput([output, latent])

Output of the IntervenableModel, including original outputs, intervened outputs, and collected activations.

JumpReLUAutoencoderIntervention(**kwargs)

Interchange intervention on the latent subspaces of a JumpReLU sparse autoencoder (SAE).

LocalistRepresentationIntervention(**kwargs)

Localist representation.

LowRankRotatedSpaceIntervention(**kwargs)

Intervention in a low-rank rotated space.

NoiseIntervention(**kwargs)

Noise intervention.

PCARotatedSpaceIntervention(**kwargs)

Intervention in the PCA space.

RotatedSpaceIntervention(**kwargs)

Intervention in the rotated space.

SharedWeightsTrainableIntervention(**kwargs)

Intervention on the original representations.

SigmoidMaskIntervention(**kwargs)

Intervention in the original basis with a binary mask.

SigmoidMaskRotatedSpaceIntervention(**kwargs)

Intervention in the rotated space with a boundary mask.

SkipIntervention(**kwargs)

Skip the current intervening layer's computation in the hook function.

SourcelessIntervention(**kwargs)

No source.

SubtractionIntervention(**kwargs)

Intervention on the original representations by activation subtraction.

TrainableIntervention(**kwargs)

Intervention on the original representations.

VanillaIntervention(**kwargs)

Intervention on the original representations.

ZeroIntervention(**kwargs)

Zero-out activations.
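The element-wise interventions in this list (VanillaIntervention, AdditionIntervention, SubtractionIntervention, ZeroIntervention and CollectIntervention) can be pictured as operations on a base activation and an optional source activation. The following is a minimal plain-Python sketch of those operations, not pyvene's implementation: it assumes representations are flat lists of floats and that an optional list of dimension indices (a hypothetical `subspace` argument) selects where to intervene. The function names are illustrative only; the pyvene classes operate on PyTorch tensors inside model hooks.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def _dims(size: int, subspace: Optional[Sequence[int]]) -> Sequence[int]:
    # Use every dimension unless an explicit list of indices is given.
    return range(size) if subspace is None else subspace


def interchange(base: Vector, source: Vector,
                subspace: Optional[Sequence[int]] = None) -> Vector:
    """Vanilla interchange: overwrite the chosen base dimensions with source values."""
    out = list(base)
    for i in _dims(len(base), subspace):
        out[i] = source[i]
    return out


def add(base: Vector, source: Vector,
        subspace: Optional[Sequence[int]] = None) -> Vector:
    """Activation addition: add source values to the chosen base dimensions."""
    out = list(base)
    for i in _dims(len(base), subspace):
        out[i] = base[i] + source[i]
    return out


def subtract(base: Vector, source: Vector,
             subspace: Optional[Sequence[int]] = None) -> Vector:
    """Activation subtraction: subtract source values from the chosen base dimensions."""
    out = list(base)
    for i in _dims(len(base), subspace):
        out[i] = base[i] - source[i]
    return out


def zero_out(base: Vector, subspace: Optional[Sequence[int]] = None) -> Vector:
    """Zero intervention: set the chosen base dimensions to 0.0 (no source needed)."""
    out = list(base)
    for i in _dims(len(base), subspace):
        out[i] = 0.0
    return out


def collect(base: Vector, store: List[Vector]) -> Vector:
    """Collect intervention: record a copy of the activation, pass it through unchanged."""
    store.append(list(base))
    return base


base, source = [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]
print(interchange(base, source, [0, 2]))  # [10.0, 2.0, 30.0]
print(add(base, source))                  # [11.0, 22.0, 33.0]
print(subtract(base, source, [1]))        # [1.0, -18.0, 3.0]
print(zero_out(base, [1]))                # [1.0, 0.0, 3.0]
```

Each function returns a new list and leaves `base` untouched; `collect` is the only one that returns its input as-is, since its job is to record activations rather than change them.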
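RotatedSpaceIntervention and LowRankRotatedSpaceIntervention are described as interventions in a rotated space. One common reading of that phrase, sketched below as an assumption rather than as pyvene's code, is: rotate base and source with an orthogonal matrix, interchange selected rotated coordinates, then rotate back with the transpose. A fixed 2x2 rotation stands in for what would normally be a learned matrix; `rotation_2d` and `rotated_interchange` are hypothetical names.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    # Plain matrix-vector product.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def rotation_2d(theta: float) -> Matrix:
    # A 2x2 orthogonal matrix; stands in for a learned rotation.
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def rotated_interchange(base: Vector, source: Vector,
                        rotation: Matrix, subspace: Sequence[int]) -> Vector:
    """Rotate both inputs, swap the chosen rotated coordinates, rotate back."""
    rotated_base = matvec(rotation, base)
    rotated_source = matvec(rotation, source)
    for i in subspace:
        rotated_base[i] = rotated_source[i]
    # For an orthogonal matrix the transpose is the inverse, so this undoes the rotation.
    return matvec(transpose(rotation), rotated_base)


base, source = [1.0, 2.0], [3.0, 4.0]
print(rotated_interchange(base, source, rotation_2d(0.0), [0]))
print(rotated_interchange(base, source, rotation_2d(math.pi / 4), [0]))
print(rotated_interchange(base, source, rotation_2d(math.pi / 4), [1]))
```

With a zero angle this reduces to a plain interchange of dimension 0 and returns [3.0, 2.0]. At 45 degrees, base [1, 2] and source [3, 4] differ only along rotated coordinate 1, so swapping coordinate 0 returns the base (up to rounding) and swapping coordinate 1 returns the source: the rotation determines which direction an interchange actually moves.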
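SigmoidMaskIntervention is described as working in the original basis with a binary mask. A hard 0/1 mask is not differentiable, so a common trainable substitute, assumed here, is a sigmoid of per-dimension logits divided by a temperature. The sketch below shows that relaxation in plain Python; `mask_logits` and `temperature` are hypothetical parameter names, not pyvene's signature.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_mask_intervention(base: Vector, source: Vector,
                              mask_logits: Vector, temperature: float) -> Vector:
    """Blend base and source per dimension with a relaxed (sigmoid) binary mask.

    A mask near 1 takes the source value, a mask near 0 keeps the base value.
    A lower temperature pushes the mask closer to a hard 0/1 choice.
    """
    out = []
    for b, s, logit in zip(base, source, mask_logits):
        m = sigmoid(logit / temperature)
        out.append((1.0 - m) * b + m * s)
    return out


print(sigmoid_mask_intervention([1.0, 1.0, 1.0], [3.0, 3.0, 3.0],
                                [10.0, -10.0, 0.0], temperature=0.1))
# [3.0, 1.0, 2.0] (up to floating-point rounding)
```

A logit of 10 at temperature 0.1 gives a mask of almost exactly 1 (take the source), -10 gives almost 0 (keep the base), and 0 gives an even 50/50 blend.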
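JumpReLUAutoencoderIntervention is described as an interchange intervention on the latent subspaces of a JumpReLU SAE. A JumpReLU activation keeps a pre-activation z only when it exceeds a per-feature threshold theta, i.e. JumpReLU(z) = z * H(z - theta) with H the Heaviside step. The toy sketch below illustrates the general technique under that assumption and is not pyvene's code: it encodes base and source, copies selected latent features from source into base, and decodes. The `ToySAE` class and its hand-set identity weights are hypothetical; a real SAE's weights are learned.

```python
from dataclasses import dataclass
from typing import List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    # Plain matrix-vector product.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


@dataclass
class ToySAE:
    w_enc: Matrix      # latent_dim x model_dim
    b_enc: Vector      # latent_dim
    threshold: Vector  # one JumpReLU threshold per latent feature
    w_dec: Matrix      # model_dim x latent_dim
    b_dec: Vector      # model_dim

    def encode(self, x: Vector) -> Vector:
        pre = [p + b for p, b in zip(matvec(self.w_enc, x), self.b_enc)]
        # JumpReLU: pass z through if it exceeds its threshold, otherwise output 0.
        return [z if z > t else 0.0 for z, t in zip(pre, self.threshold)]

    def decode(self, latent: Vector) -> Vector:
        return [p + b for p, b in zip(matvec(self.w_dec, latent), self.b_dec)]


def latent_interchange(base: Vector, source: Vector, sae: ToySAE,
                       features: Sequence[int]) -> Vector:
    """Copy the chosen latent features from source into base, then decode."""
    latent = sae.encode(base)
    source_latent = sae.encode(source)
    for i in features:
        latent[i] = source_latent[i]
    return sae.decode(latent)


identity = [[1.0, 0.0], [0.0, 1.0]]
sae = ToySAE(w_enc=identity, b_enc=[0.0, 0.0], threshold=[0.5, 0.5],
             w_dec=identity, b_dec=[0.0, 0.0])
print(latent_interchange([1.0, 0.2], [0.1, 2.0], sae, [1]))  # [1.0, 2.0]
```

Because the base's second coordinate (0.2) falls below its threshold, even an interchange with no features swapped decodes the base as [1.0, 0.0] rather than [1.0, 0.2]. This sketch decodes only the edited latent code and ignores the SAE's reconstruction error; it makes no claim about how pyvene's class treats that error.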