pyvene.models.backpack_gpt2.modelings_backpack_gpt2.BackpackGPT2PreTrainedModel#
- class BackpackGPT2PreTrainedModel(*inputs, **kwargs)[source]#
Bases: GPT2PreTrainedModel
An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.
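This base class supplies only the initialization and checkpoint-loading plumbing, so in practice you instantiate one of its concrete subclasses. A minimal loading sketch, assuming the BackpackGPT2LMHeadModel subclass defined in this same module and the stanfordnlp/backpack-gpt2 checkpoint (both are assumptions to verify against your pyvene version):

```python
# Minimal loading sketch. Assumptions: BackpackGPT2LMHeadModel is the concrete
# subclass shipped alongside this base class, and stanfordnlp/backpack-gpt2 is
# a compatible Hub checkpoint.
import torch
from transformers import GPT2Tokenizer
from pyvene.models.backpack_gpt2.modelings_backpack_gpt2 import (
    BackpackGPT2LMHeadModel,
)

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = BackpackGPT2LMHeadModel.from_pretrained("stanfordnlp/backpack-gpt2")
model.eval()  # inherited torch.nn.Module method, listed under Methods below

input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    outputs = model(input_ids)  # exact forward signature may differ; check the module
```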
Methods
__init__(*inputs, **kwargs)
active_adapter()
active_adapters(): If you are not familiar with adapters and PEFT methods, we invite you to read more about them on the PEFT official documentation: https://huggingface.co/docs/peft
add_adapter(adapter_config[, adapter_name]): If you are not familiar with adapters and PEFT methods, we invite you to read more about them on the PEFT official documentation: https://huggingface.co/docs/peft
add_memory_hooks(): Add a memory hook before and after each sub-module forward pass to record increase in memory consumption.
add_model_tags(tags): Add custom tags into the model that gets pushed to the Hugging Face Hub.
add_module(name, module): Add a child module to the current module.
apply(fn): Apply fn recursively to every submodule (as returned by .children()) as well as self.
bfloat16(): Casts all floating point parameters and buffers to bfloat16 datatype.
buffers([recurse]): Return an iterator over module buffers.
can_generate(): Returns whether this model can generate sequences with .generate() from the GenerationMixin.
children(): Return an iterator over immediate children modules.
compile(*args, **kwargs): Compile this Module's forward using torch.compile().
cpu(): Move all model parameters and buffers to the CPU.
create_extended_attention_mask_for_decoder(...)
cuda([device]): Move all model parameters and buffers to the GPU.
delete_adapter(adapter_names): Delete an adapter's LoRA layers from the underlying model.
dequantize(): Potentially dequantize the model in case it has been quantized by a quantization method that supports dequantization.
disable_adapters(): If you are not familiar with adapters and PEFT methods, we invite you to read more about them on the PEFT official documentation: https://huggingface.co/docs/peft
disable_input_require_grads(): Removes the _require_grads_hook.
double(): Casts all floating point parameters and buffers to double datatype.
enable_adapters(): If you are not familiar with adapters and PEFT methods, we invite you to read more about them on the PEFT official documentation: https://huggingface.co/docs/peft
enable_input_require_grads(): Enables the gradients for the input embeddings.
estimate_tokens(input_dict): Helper function to estimate the total number of tokens from the model inputs.
eval(): Set the module in evaluation mode.
extra_repr(): Return the extra representation of the module.
float(*args): Casts all floating point parameters and buffers to float datatype.
floating_point_ops(input_dict[, ...]): Get number of (optionally, non-embeddings) floating-point operations for the forward and backward passes of a batch with this transformer model.
forward(*input): Define the computation performed at every call.
from_pretrained(...[, config, cache_dir, ...]): Instantiate a pretrained pytorch model from a pre-trained model configuration.
get_adapter_state_dict([adapter_name, ...]): If you are not familiar with adapters and PEFT methods, we invite you to read more about them on the PEFT official documentation: https://huggingface.co/docs/peft
get_buffer(target): Return the buffer given by target if it exists, otherwise throw an error.
get_compiled_call(compile_config): Return a torch.compile'd version of self.__call__.
get_correct_attn_implementation(...[, ...])
get_decoder(): Best-effort lookup of the decoder module.
get_extended_attention_mask(attention_mask, ...): Makes broadcastable attention and causal masks so that future and masked tokens are ignored.
get_extra_state(): Return any extra state to include in the module's state_dict.
get_head_mask(head_mask, num_hidden_layers): Prepare the head mask if needed.
get_init_context(is_quantized, ...)
get_input_embeddings(): Returns the model's input embeddings.
get_memory_footprint([return_buffers]): Get the memory footprint of a model.
get_output_embeddings()
get_parameter(target): Return the parameter given by target if it exists, otherwise throw an error.
get_parameter_or_buffer(target): Return the parameter or buffer given by target if it exists, otherwise throw an error.
get_position_embeddings()
get_submodule(target): Return the submodule given by target if it exists, otherwise throw an error.
gradient_checkpointing_disable(): Deactivates gradient checkpointing for the current model.
gradient_checkpointing_enable([...]): Activates gradient checkpointing for the current model.
half(*args): Casts all floating point parameters and buffers to half datatype.
init_weights(): If needed prunes and maybe initializes weights.
initialize_weights(): This is equivalent to calling self.apply(self._initialize_weights), but correctly handles composite models.
invert_attention_mask(encoder_attention_mask): Invert an attention mask (e.g., switches 0. and 1.).
ipu([device]): Move all model parameters and buffers to the IPU.
is_backend_compatible()
load_adapter([peft_model_id, adapter_name, ...]): Load adapter weights from file or remote Hub folder.
load_state_dict(state_dict[, strict, assign]): Copy parameters and buffers from state_dict into this module and its descendants.
load_tf_weights(config, gpt2_checkpoint_path): Load TF checkpoints in a PyTorch model.
modules(): Return an iterator over all modules in the network.
mtia([device]): Move all model parameters and buffers to the MTIA.
named_buffers([prefix, recurse, ...]): Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
named_children(): Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
named_modules([memo, prefix, remove_duplicate]): Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
named_parameters([prefix, recurse, ...]): Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
num_parameters([only_trainable, ...]): Get number of (optionally, trainable or non-embeddings) parameters in the module.
parameters([recurse]): Return an iterator over module parameters.
post_init(): A method executed at the end of each Transformer model initialization, to execute code that needs the model's modules properly initialized (such as weight initialization).
prune_heads(heads_to_prune): Prunes heads of the base model.
push_to_hub(repo_id[, use_temp_dir, ...]): Upload the model file to the 🤗 Model Hub.
register_backward_hook(hook): Register a backward hook on the module.
register_buffer(name, tensor[, persistent]): Add a buffer to the module.
register_for_auto_class([auto_class]): Register this class with a given auto class.
register_forward_hook(hook, *[, prepend, ...]): Register a forward hook on the module.
register_forward_pre_hook(hook, *[, ...]): Register a forward pre-hook on the module.
register_full_backward_hook(hook[, prepend]): Register a backward hook on the module.
register_full_backward_pre_hook(hook[, prepend]): Register a backward pre-hook on the module.
register_load_state_dict_post_hook(hook): Register a post-hook to be run after module's load_state_dict() is called.
register_load_state_dict_pre_hook(hook): Register a pre-hook to be run before module's load_state_dict() is called.
register_module(name, module): Alias for add_module().
register_parameter(name, param): Add a parameter to the module.
register_state_dict_post_hook(hook): Register a post-hook for the state_dict() method.
register_state_dict_pre_hook(hook): Register a pre-hook for the state_dict() method.
requires_grad_([requires_grad]): Change if autograd should record operations on parameters in this module.
reset_memory_hooks_state(): Reset the mem_rss_diff attribute of each module (see [~modeling_utils.ModuleUtilsMixin.add_memory_hooks]).
resize_position_embeddings(...)
resize_token_embeddings([new_num_tokens, ...]): Resizes input token embeddings matrix of the model if new_num_tokens != config.vocab_size.
retrieve_modules_from_names(names[, ...])
reverse_bettertransformer(): Reverts the transformation from [~PreTrainedModel.to_bettertransformer] so that the original modeling is used, for example in order to save the model.
save_pretrained(save_directory[, ...]): Save a model and its configuration file to a directory, so that it can be re-loaded using the [~PreTrainedModel.from_pretrained] class method.
set_adapter(adapter_name): If you are not familiar with adapters and PEFT methods, we invite you to read more about them on the PEFT official documentation: https://huggingface.co/docs/peft
set_attn_implementation(attn_implementation): Set the requested attn_implementation for this model.
set_decoder(decoder): Symmetric setter.
set_extra_state(state): Set extra state contained in the loaded state_dict.
set_input_embeddings(value): Fallback setter that handles ~70% of models in the codebase.
set_output_embeddings(new_embeddings): Sets the model's output embedding, defaulting to setting new_embeddings to lm_head.
set_submodule(target, module[, strict]): Set the submodule given by target if it exists, otherwise throw an error.
share_memory(): See torch.Tensor.share_memory_().
state_dict(*args[, destination, prefix, ...]): Return a dictionary containing references to the whole state of the module.
tie_embeddings_and_encoder_decoder(): If set in the config, tie the weights between the input embeddings and the output embeddings, and the encoder and decoder.
tie_weights(): Recursively (for all submodels) tie all the weights of the model.
to(*args, **kwargs): Move and/or cast the parameters and buffers.
to_bettertransformer(): Converts the model to use [PyTorch's native attention implementation](https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html), integrated into Transformers through the [Optimum library](https://huggingface.co/docs/optimum/bettertransformer/overview).
to_empty(*, device[, recurse]): Move the parameters and buffers to the specified device without copying storage.
train([mode]): Set the module in training mode.
type(dst_type): Casts all parameters and buffers to dst_type.
warn_if_padding_and_no_attention_mask(...): Shows a one-time warning if the input_ids appear to contain padding and no attention mask was given.
xpu([device]): Move all model parameters and buffers to the XPU.
zero_grad([set_to_none]): Reset gradients of all model parameters.
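Most of the methods above are inherited unchanged from transformers.PreTrainedModel and torch.nn.Module, so the usual introspection and persistence utilities work on a Backpack model exactly as on any other GPT-2 variant. A short sketch, continuing from the loading example above (the save directory is arbitrary):

```python
# Common inherited utilities, applied to the `model` loaded earlier.
n_trainable = model.num_parameters(only_trainable=True)  # trainable parameter count
footprint = model.get_memory_footprint()                 # size in bytes, incl. buffers

# torch.nn.Module iteration: list submodule names, e.g. to choose hook targets.
for name, submodule in model.named_modules():
    print(name, type(submodule).__name__)

# Round-trip through the PreTrainedModel save/load interface.
model.save_pretrained("./backpack-gpt2-local")
reloaded = BackpackGPT2LMHeadModel.from_pretrained("./backpack-gpt2-local")
```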
Attributes
T_destination
base_model (torch.nn.Module): The main body of the model.
base_model_prefix
call_super_init
can_record_outputs: Maps output names (e.g., "attentions", "hidden_states")
device (torch.device): The device on which the module is (assuming that all the module parameters are on the same device).
dtype (torch.dtype): The dtype of the module (assuming that all the module parameters have the same dtype).
dummy_inputs (dict[str, torch.Tensor]): Dummy inputs to do a forward pass in the network.
dump_patches
framework (str): Identifies that this is a PyTorch model.
is_gradient_checkpointing: Whether gradient checkpointing is activated for this model or not.
is_parallelizable
loss_function
main_input_name
model_tags
pp_plan
supports_gradient_checkpointing
supports_pp_plan
supports_tp_plan: Returns whether the model has a tensor parallelism plan.
tp_plan: The full tp plan for the model's modules.
tp_size: Returns the model's tensor parallelism degree.
config
training
- config_class#
alias of BackpackGPT2Config
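Since config_class aliases BackpackGPT2Config, from_pretrained knows which configuration type to parse, and an untrained model can be built straight from a config. A sketch, assuming BackpackGP2Config's constructor accepts default arguments the way its GPT2Config parent does:

```python
# Building a randomly initialized model from its config. Assumption: default
# construction of BackpackGPT2Config works, as it does for GPT2Config.
from pyvene.models.backpack_gpt2.modelings_backpack_gpt2 import (
    BackpackGPT2Config,
    BackpackGPT2LMHeadModel,
)

config = BackpackGPT2Config()               # default hyperparameters
model = BackpackGPT2LMHeadModel(config)     # weights drawn by _init_weights
assert model.config_class is BackpackGPT2Config
print(model.device, model.dtype)            # attributes documented above
```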