Models

API documentation for markovclick.models.

Models module which holds MarkovClickstream model.

class markovclick.models.MarkovClickstream(clickstream_list: list = None, prefixed=True)[source]

Builds a Markov chain from input clickstreams.

Parameters: clickstream_list (list) – List of clickstream data. Each page should be encoded as a string prefixed by a letter, e.g. ‘P1’.
calc_prob_all_routes_to(clickstream: list, end_page: str, clicks: int, cartesian_product=True)[source]

Given an input sequence of page clicks, calculates the probability of reaching the specified end page after the specified number of clicks, taken across all possible routes.

Parameters:
  • clickstream (list) – List (sequence) of states
  • end_page (str) – Desired end page to calculate the probability towards
  • clicks (int) – Number of clicks (transitions) to make after the input sequence, before reaching the end page.
Returns:

Probability

Return type:

float
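The route-summing idea can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the function name prob_all_routes_to, the nested-dict probability table prob, and the choice to start from the last page of the input sequence are all hypothetical.

```python
from itertools import product


def prob_all_routes_to(clickstream, end_page, clicks, prob):
    """Sum the probability of every route from the last clicked page to end_page.

    `prob` is an assumed nested dict: prob[src][dst] is the transition probability.
    `clicks` is the number of intermediate pages visited before end_page.
    """
    pages = list(prob)
    start = clickstream[-1]  # assumption: only the last page of the input matters
    total = 0.0
    # Enumerate every possible sequence of intermediate pages (repetitions allowed).
    for middle in product(pages, repeat=clicks):
        route = [start, *middle, end_page]
        p = 1.0
        for src, dst in zip(route, route[1:]):
            p *= prob[src].get(dst, 0.0)
        total += p
    return total


# Hypothetical three-page transition table; each row sums to 1.
prob = {
    "P1": {"P1": 0.0, "P2": 0.5, "P3": 0.5},
    "P2": {"P1": 1.0, "P2": 0.0, "P3": 0.0},
    "P3": {"P1": 0.5, "P2": 0.5, "P3": 0.0},
}
one_click = prob_all_routes_to(["P2", "P1"], "P2", 1, prob)
direct = prob_all_routes_to(["P2", "P1"], "P2", 0, prob)
```

With one intermediate click, the only route with non-zero probability in this table is P1 to P3 to P2; with zero intermediate clicks the result is just the direct P1 to P2 transition.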

calc_prob_to_page(clickstream: list, verbose=True) → float[source]

Calculates the probability for a sequence of clicks (clickstream) taking place.

Parameters:
  • clickstream (list) – Sequence of clicks (pages) for which to calculate the probability of occurring.
  • verbose (bool, optional) – Defaults to True. Specifies whether the output is also printed to the terminal, or only returned.
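In a first-order Markov chain, the probability of a click sequence is the product of the transition probabilities between consecutive pages. The sketch below shows that calculation only; sequence_probability and the nested-dict table prob are hypothetical names, and it does not reproduce the library's internals or its verbose output.

```python
def sequence_probability(clickstream, prob):
    """Multiply transition probabilities along consecutive pairs of pages.

    `prob` is an assumed nested dict: prob[src][dst] is the transition probability.
    Missing entries are treated as probability 0.
    """
    p = 1.0
    for src, dst in zip(clickstream, clickstream[1:]):
        p *= prob.get(src, {}).get(dst, 0.0)
    return p


# Hypothetical transition table.
prob = {
    "P1": {"P2": 0.5, "P3": 0.5},
    "P2": {"P1": 1.0},
    "P3": {"P1": 0.5, "P2": 0.5},
}
p_seq = sequence_probability(["P1", "P3", "P2", "P1"], prob)
```

Here the sequence P1, P3, P2, P1 multiplies 0.5, 0.5 and 1.0.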
calculate_pagerank(max_nodes: int = 2, pr_kwargs: dict = {}) → Tuple[networkx.classes.digraph.DiGraph, dict][source]

Calculates the Google PageRank for each of the pages in the Markov chain.

Converts the Markov chain into a directed graph using networkx, and uses its built-in functions to calculate the PageRank score for each page, represented as a node in the graph.

Parameters:
Returns:

networkx DiGraph object, and the associated PageRank scores for each page (node in the DiGraph).

Return type:

Tuple[nx.DiGraph, dict]
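To illustrate what a PageRank score over a transition table means, here is a dependency-free power-iteration sketch. It is a stand-in for networkx's own PageRank routine, not the code this method runs; the function name pagerank_scores, the damping value of 0.85, the fixed iteration count, and the nested-dict table are assumptions made for the example.

```python
def pagerank_scores(prob, damping=0.85, iterations=100):
    """Approximate PageRank by power iteration over weighted transitions.

    `prob` is an assumed nested dict: prob[src][dst] is the edge weight.
    Every page must appear as a key of `prob`.
    """
    pages = list(prob)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the same baseline share from random jumps.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for src in pages:
            total = sum(prob[src].values())
            if total == 0:
                # Dangling page: spread its rank evenly over all pages.
                for dst in pages:
                    new_rank[dst] += damping * rank[src] / n
            else:
                for dst, weight in prob[src].items():
                    new_rank[dst] += damping * rank[src] * weight / total
        rank = new_rank
    return rank


# Hypothetical three-page transition table.
prob = {
    "P1": {"P2": 0.5, "P3": 0.5},
    "P2": {"P1": 1.0},
    "P3": {"P1": 0.5, "P2": 0.5},
}
scores = pagerank_scores(prob)
```

In this table P1 receives the most incoming probability mass and P3 the least, so the scores rank the pages in that order.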

static cartesian_product(iterable, repeats=1)[source]

Wraps Python’s itertools.product() function to return a list of lists, rather than tuples.

Parameters:
  • iterable (list) – List of iterables to assemble Cartesian product from
  • repeats (int) – Number of elements in each list of the Cartesian product
Returns:

List of lists of Cartesian product
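A minimal sketch of the list-of-lists behaviour described above, assuming the wrapper passes repeats through as the repeat argument of itertools.product(). The name cartesian_product_lists is hypothetical and the library's own code may differ.

```python
from itertools import product


def cartesian_product_lists(iterable, repeats=1):
    """Return the Cartesian product as a list of mutable lists."""
    return [list(combo) for combo in product(iterable, repeat=repeats)]


routes = cartesian_product_lists(["P1", "P2"], repeats=2)
# routes == [['P1', 'P1'], ['P1', 'P2'], ['P2', 'P1'], ['P2', 'P2']]
```

Because each element is a list, a caller can append or prepend pages to a route in place, which tuples would not allow.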

compute_prob_matrix()[source]

Computes the probability matrix for the input clickstream.

count_matrix

Sets attribute to access the count matrix

get_unique_pages(prefixed=True)[source]

Retrieves all the unique pages within the provided list of clickstreams.

initialise_count_matrix()[source]

Initialises an empty count matrix.

static normalise_row(row)[source]

Normalises a row of the count matrix to produce probabilities.

To be used when iterating over rows of self.count_matrix. Each normalised row sums to 1.

Parameters: row – Each row within the numpy matrix to act upon.
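Row normalisation divides each count by the row total. The sketch below uses plain lists rather than numpy, and its handling of an all-zero row (returning zeros) is an assumption, since the behaviour for pages with no outgoing clicks is not documented here. The name normalise_counts is hypothetical.

```python
def normalise_counts(row):
    """Convert a row of transition counts into transition probabilities."""
    total = sum(row)
    if total == 0:
        # Assumption: a page with no outgoing transitions keeps an all-zero row.
        return [0.0 for _ in row]
    return [value / total for value in row]


probabilities = normalise_counts([2, 1, 1])
# probabilities == [0.5, 0.25, 0.25]
```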
static permutations(iterable, r=None)[source]

Modification of itertools.permutations() function to yield a mutable list rather than an immutable tuple.

Unlike the Cartesian product, this does not return a sequence with repetitions in it.
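The difference from the Cartesian product can be seen in a small sketch: a generator that wraps itertools.permutations() and yields lists. The name permutations_as_lists is hypothetical, and this is an illustration rather than the library's code.

```python
from itertools import permutations


def permutations_as_lists(iterable, r=None):
    """Yield each r-length permutation as a mutable list."""
    for perm in permutations(iterable, r):
        yield list(perm)


pairs = list(permutations_as_lists(["P1", "P2", "P3"], 2))
# pairs == [['P1', 'P2'], ['P1', 'P3'], ['P2', 'P1'],
#           ['P2', 'P3'], ['P3', 'P1'], ['P3', 'P2']]
```

No pair repeats a page, whereas a Cartesian product of the same three pages taken twice would also include ['P1', 'P1'], ['P2', 'P2'] and ['P3', 'P3'].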

populate_count_matrix()[source]

Assembles a matrix of counts of transitions from each possible state, to every other possible state.
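Counting transitions amounts to walking each clickstream pairwise and incrementing the cell for (current page, next page). The sketch below uses nested lists instead of a numpy matrix; the function name count_transitions and the sorted page ordering are assumptions made for illustration.

```python
def count_transitions(clickstreams):
    """Count transitions between consecutive pages across all clickstreams."""
    # Sorted unique pages give a stable row and column order (lexical, so 'P10' < 'P2').
    pages = sorted({page for stream in clickstreams for page in stream})
    index = {page: i for i, page in enumerate(pages)}
    counts = [[0] * len(pages) for _ in pages]
    for stream in clickstreams:
        # Pair each page with the page clicked immediately after it.
        for src, dst in zip(stream, stream[1:]):
            counts[index[src]][index[dst]] += 1
    return pages, counts


pages, counts = count_transitions([["P1", "P2", "P1"], ["P2", "P1", "P3"]])
# pages  == ['P1', 'P2', 'P3']
# counts == [[0, 1, 1], [2, 0, 0], [0, 0, 0]]
```

Rows are source pages and columns are destination pages, so the 2 in the second row records the two P2 to P1 transitions.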

prob_matrix

Sets attribute to access the probability matrix