att_viz package

Submodules

att_viz.attention_aggregation_method module

class att_viz.attention_aggregation_method.AttentionAggregationMethod(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Represents the possible attention aggregation methods. The supported aggregation methods are:

  • NONE: no dimensions are collapsed

  • HEADWISE_AVERAGING: head dimension is collapsed

HEADWISE_AVERAGING = 2

Represents the headwise averaging aggregation method - the head dimension of the attention matrix collapses, while the layer dimension is kept.

NONE = 1

Represents the empty aggregation method - all layer and head dimensions of the attention matrix are kept.
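A minimal usage sketch, referencing only the members documented above:

    from att_viz.attention_aggregation_method import AttentionAggregationMethod

    # Keep every layer and head of the attention matrix:
    aggr = AttentionAggregationMethod.NONE

    # Or average over heads, keeping one matrix per layer:
    aggr = AttentionAggregationMethod.HEADWISE_AVERAGING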

att_viz.attention_matrix module

class att_viz.attention_matrix.AttentionMatrix(attention_matrix)

Bases: object

Represents a self-attention matrix, recording the number of layers and heads, as well as whether it has been formatted for visualization or not.

format(aggr_method: AttentionAggregationMethod, zero_first_attention: bool) → None

Formats the wrapped attention matrix for HTML visualization, aggregating it based on the specified aggregation method.

Parameters:
  • aggr_method – the aggregation method of the attention matrix. See AttentionAggregationMethod.

  • zero_first_attention – whether to ignore self-attention values towards the first token.
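A minimal formatting sketch. The exact structure the AttentionMatrix constructor expects is not documented here, so raw_attentions below is a hypothetical stand-in for the per-layer, per-head attention weights produced during generation:

    from att_viz.attention_aggregation_method import AttentionAggregationMethod
    from att_viz.attention_matrix import AttentionMatrix

    # raw_attentions: hypothetical per-layer, per-head self-attention weights,
    # e.g. collected during generation (see SelfAttentionModel.generate_text).
    attn = AttentionMatrix(raw_attentions)

    # Average over heads and ignore attention towards the first token.
    attn.format(
        aggr_method=AttentionAggregationMethod.HEADWISE_AVERAGING,
        zero_first_attention=True,
    )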

att_viz.renderer module

class att_viz.renderer.RenderConfig(y_margin: float = 30, line_length: float = 770, num_chars_block: float = 11, token_width: float = 110, min_token_width: float = 20, token_height: float = 22.5, x_margin: float = 20, matrix_width: float = 115)

Bases: object

Rendering configuration class specifying layout preferences for the JavaScript visualization.
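Every field has a default, so a configuration only needs the overrides of interest; a minimal sketch (the units are presumably pixels, though this is not stated):

    from att_viz.renderer import RenderConfig

    # Default layout, except for a longer line and narrower tokens.
    config = RenderConfig(line_length=900, token_width=90)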

class att_viz.renderer.Renderer(render_config: RenderConfig, aggregation_method: AttentionAggregationMethod = AttentionAggregationMethod.NONE)

Bases: object

Rendering class for visualizing self-attention matrices.

create_token_info(tokens: list[str]) → tuple[list[tuple[int, int, int, int]], float]

Used for JavaScript visualization. Computes the (x, y) coordinates for a list of tokens.

Parameters:

tokens – an array of tokens

Returns:

a pair (res, dy) where res contains the (x, y) coordinates of the given tokens, and dy is the total height of the computed token sequence.
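A minimal sketch of calling the layout helper directly; the token strings are arbitrary examples:

    from att_viz.renderer import RenderConfig, Renderer

    renderer = Renderer(RenderConfig())
    res, dy = renderer.create_token_info(["Hello", ",", " world", "!"])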

render(tokens: list[str], prompt_length: int, attention_matrix: AttentionMatrix, prettify_tokens: bool = True, render_in_chunks: bool = True) → None

Creates and saves one or more interactive HTML visualizations of the given attention matrix.

Parameters:
  • tokens – the list of tokens of the prompt and model completion

  • prompt_length – the length of the prompt in tokens

  • attention_matrix – a formatted AttentionMatrix (see AttentionMatrix.format)

  • prettify_tokens – indicates whether to remove special characters from tokens, e.g. Ġ (default True)

  • render_in_chunks – indicates whether to render the visualization in chunks (default True)
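A sketch of the full generate-and-render pipeline, assuming "gpt2" as a placeholder model name and following the annotated return type of SelfAttentionModel.generate_text:

    from att_viz.attention_aggregation_method import AttentionAggregationMethod
    from att_viz.renderer import RenderConfig, Renderer
    from att_viz.self_attention_model import SelfAttentionModel

    method = AttentionAggregationMethod.HEADWISE_AVERAGING

    model = SelfAttentionModel("gpt2")  # hypothetical model name
    tokens, attention_matrix, prompt_length = model.generate_text("Hello!")

    # render expects a formatted matrix (see AttentionMatrix.format).
    attention_matrix.format(aggr_method=method, zero_first_attention=True)

    renderer = Renderer(RenderConfig(), aggregation_method=method)
    renderer.render(tokens, prompt_length, attention_matrix)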

att_viz.self_attention_model module

class att_viz.self_attention_model.SelfAttentionModel(model_name_or_directory: str)

Bases: object

Wrapper for a self-attention model.

Loads and stores the model and its corresponding tokenizer.

generate_text(prompt: str, max_new_tokens: int = 512, save_prefix: str | None = None, prompt_template: str | None = 'user\n{p}<|endoftext|>\nassistant\n') → tuple[list[str], AttentionMatrix, int]

Generates text and returns the completion, attention matrix, and prompt length (in tokens).

Parameters:
  • prompt – the prompt to use for text generation

  • max_new_tokens – the maximum number of tokens to be generated

  • save_prefix – the prefix to use if saving the computation results (default None)

  • prompt_template – the prompt template to use for text generation (default 'user\n{p}<|endoftext|>\nassistant\n')

Returns:

the generated completion as a list of tokens, the attention matrix, and the prompt length (in tokens)
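A focused sketch of the call, unpacking the three values of the annotated return type; model name and save prefix are hypothetical:

    from att_viz.self_attention_model import SelfAttentionModel

    model = SelfAttentionModel("gpt2")  # hypothetical model name
    tokens, attention_matrix, prompt_length = model.generate_text(
        "Explain self-attention in one sentence.",
        max_new_tokens=128,
        save_prefix="results/demo",  # hypothetical prefix
    )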

load_model(model_name_or_directory: str) → tuple[AutoModelForCausalLM, AutoTokenizer]

Loads and returns a HuggingFace pretrained model and the corresponding tokenizer.

Parameters:

model_name_or_directory – the name of the model to load, or alternatively the directory from which to load the model

Returns:

the loaded model and tokenizer
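Since the constructor already loads and stores the model, calling load_model directly is presumably only needed to obtain the underlying HuggingFace objects or to reload a checkpoint; a minimal sketch with a placeholder model name:

    from att_viz.self_attention_model import SelfAttentionModel

    model = SelfAttentionModel("gpt2")  # hypothetical model name
    hf_model, hf_tokenizer = model.load_model("gpt2")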

att_viz.utils module

class att_viz.utils.Experiment(model: SelfAttentionModel, renderer: Renderer)

Bases: object

A simple attention visualization experiment.

basic_experiment(prompt: str, aggr_method: AttentionAggregationMethod) → None

A simple text generation experiment, in which the resulting prompt-completion pair is visualized with self-attention information in HTML format.

Parameters:
  • prompt – the prompt to use for text generation

  • aggr_method – the aggregation method of the attention matrix. See AttentionAggregationMethod.
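A sketch wiring the pieces together, with a placeholder model name:

    from att_viz.attention_aggregation_method import AttentionAggregationMethod
    from att_viz.renderer import RenderConfig, Renderer
    from att_viz.self_attention_model import SelfAttentionModel
    from att_viz.utils import Experiment

    experiment = Experiment(
        model=SelfAttentionModel("gpt2"),  # hypothetical model name
        renderer=Renderer(RenderConfig()),
    )
    experiment.basic_experiment(
        "What does attention attend to?",
        aggr_method=AttentionAggregationMethod.NONE,
    )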

att_viz.utils.process_saved_completions(render_config: RenderConfig, aggregation_method: AttentionAggregationMethod, save_prefixes: list[str]) → None

Render inference results obtained using save_completions.

Parameters:
  • render_config – the rendering configuration. See RenderConfig.

  • aggregation_method – the aggregation method of the attention matrix. See AttentionAggregationMethod.

  • save_prefixes – the list of save prefixes that have been used for storing inference results

att_viz.utils.save_completions(model_name_or_directory: str, prompts: list[str], save_prefixes: list[str]) → None

Load a self-attention model and do inference for the given prompts.

Parameters:
  • model_name_or_directory – the name of the model to load, or alternatively the directory from which to load the model

  • prompts – the list of prompts to use for text generation

  • save_prefixes – the list of save prefixes to use for storing inference results - should have the same length as prompts
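A sketch of the two-step workflow these functions support, with hypothetical prompts, prefixes, and model name; save_prefixes must have the same length as prompts:

    from att_viz.attention_aggregation_method import AttentionAggregationMethod
    from att_viz.renderer import RenderConfig
    from att_viz.utils import process_saved_completions, save_completions

    prompts = ["What is attention?", "Summarize self-attention."]
    prefixes = ["results/q1", "results/q2"]  # one prefix per prompt

    # Step 1: run inference once and persist the results.
    save_completions("gpt2", prompts, prefixes)  # hypothetical model name

    # Step 2: render the stored completions, e.g. in a later session.
    process_saved_completions(
        RenderConfig(),
        AttentionAggregationMethod.HEADWISE_AVERAGING,
        prefixes,
    )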

Module contents