Models

API documentation for markovclick.models.

Models module which holds MarkovClickstream model.

class markovclick.models.MarkovClickstream(clickstream_list: list = None, prefixed=True)

Builds a Markov chain from input clickstreams.

Parameters: clickstream_list (list) – List of clickstream data. Each page should be encoded as a string prefixed by a letter, e.g. ‘P1’.

calc_prob_all_routes_to(clickstream: list, end_page: str, clicks: int, cartesian_product=True)

Calculates the probability of reaching the specified end page from an input sequence of page clicks, with the specified number of transitions made before the end page.

Parameters:

clickstream (list) – List (sequence) of states.
end_page (str) – Desired end state to calculate the probability towards.
clicks (int) – Number of transitions to make after the input sequence, before reaching the end state.

Returns:

Probability

Return type:

float
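
To illustrate the idea, the sketch below sums the probability of every route from a starting page to an end page through a fixed number of intermediate pages. It is a minimal stand-alone sketch, not the library's implementation: the nested-dict `prob` table, the helper names `path_prob` and `prob_all_routes`, and the reading of `clicks` as the number of intermediate pages are all assumptions made for the example.

```python
from itertools import product

# Hypothetical transition probabilities (assumed for illustration):
# prob[a][b] is the probability that the next page is b, given the current page a.
prob = {
    "P1": {"P1": 0.0, "P2": 0.5, "P3": 0.5},
    "P2": {"P1": 0.5, "P2": 0.0, "P3": 0.5},
    "P3": {"P1": 1.0, "P2": 0.0, "P3": 0.0},
}


def path_prob(path):
    """Multiply the transition probabilities along consecutive pages of a path."""
    p = 1.0
    for current, nxt in zip(path, path[1:]):
        p *= prob[current][nxt]
    return p


def prob_all_routes(start_page, end_page, clicks):
    """Sum the probabilities of all routes with `clicks` intermediate pages."""
    pages = list(prob)
    total = 0.0
    # Enumerate every possible choice of intermediate pages (repetitions allowed).
    for middle in product(pages, repeat=clicks):
        total += path_prob([start_page, *middle, end_page])
    return total
```

Because every row of the table sums to 1, the values of `prob_all_routes("P1", end, 1)` over all end pages also sum to 1, which is a useful sanity check.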

calc_prob_to_page(clickstream: list, verbose=True) → float

Calculates the probability of a sequence of clicks (clickstream) taking place.

Parameters:

clickstream (list) – Sequence of clicks (pages) for which to calculate the probability of occurring.
verbose (bool, optional) – Defaults to True. Specifies whether the output is printed to the terminal or simply returned.
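
In a first-order Markov chain, the probability of a sequence is the product of the transition probabilities between consecutive pages. The following is a minimal sketch of that calculation, not the library's code: the `prob` table and the function name `sequence_probability` are hypothetical, and treating unseen transitions as probability 0 is an assumption.

```python
from math import prod

# Hypothetical transition probabilities, assumed for illustration only.
prob = {
    ("P1", "P2"): 0.5,
    ("P2", "P3"): 0.4,
    ("P3", "P1"): 1.0,
}


def sequence_probability(clickstream):
    """Product of the transition probabilities between consecutive pages."""
    steps = zip(clickstream, clickstream[1:])
    # Transitions missing from the table are treated as probability 0.
    return prod(prob.get(step, 0.0) for step in steps)
```

For the table above, `sequence_probability(["P1", "P2", "P3"])` multiplies 0.5 by 0.4.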

calculate_pagerank(max_nodes: int = 2, pr_kwargs: dict = {}) → Tuple[networkx.classes.digraph.DiGraph, dict]

Calculates the Google PageRank for each of the pages in the Markov chain.

Converts the Markov chain into a directed graph using networkx, and uses its built in functions to calculate the PageRank score for each page represented as a node in the graph.

Parameters:

max_nodes (int) – (Optional, defaults to 2.) Specifies the number of edges (pages) to add to the digraph in order of most probable transition.
pr_kwargs (dict) – (Optional, defaults to empty dictionary.) Dictionary of arguments to provide to the networkx function for calculating PageRank. Refer to https://networkx.github.io/documentation/networkx-1.10/reference/generated/networkx.algorithms.link_analysis.pagerank_alg.pagerank.html for more details.

Returns:

networkx DiGraph object, and the associated PageRank scores for each page (node in the DiGraph).

Return type:

Tuple[nx.DiGraph, dict]
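
As a rough illustration of the edge selection that max_nodes describes, the sketch below keeps only the most probable outgoing transitions of each page. It is a sketch under stated assumptions, not the library's implementation: the `prob` table and the function name `top_edges` are hypothetical, and applying the limit per source page is one possible reading of the parameter.

```python
# Hypothetical transition probabilities, assumed for illustration only.
prob = {
    "P1": {"P2": 0.6, "P3": 0.3, "P4": 0.1},
    "P2": {"P1": 0.2, "P3": 0.8},
}


def top_edges(prob, max_nodes=2):
    """Return (source, target, probability) for the most probable transitions of each page."""
    edges = []
    for source, row in prob.items():
        # Sort this page's outgoing transitions from most to least probable.
        ranked = sorted(row.items(), key=lambda item: item[1], reverse=True)
        edges.extend((source, target, p) for target, p in ranked[:max_nodes])
    return edges
```

Tuples of this shape could then be passed to networkx's `DiGraph.add_weighted_edges_from()`, and the resulting graph to `networkx.pagerank()`.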

static cartesian_product(iterable, repeats=1)

Modifies Python’s itertools.product() function to return a list of lists, rather than a list of tuples.

Parameters:

iterable (list) – List of iterables to assemble Cartesian product from
repeats (int) – Number of elements in each list of the Cartesian product

Returns: List of lists of Cartesian product
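
A minimal sketch of this kind of wrapper is shown below. The function name `cartesian_product_lists` is hypothetical, and taking the product of the input with itself `repeats` times is an assumption about how the arguments are used.

```python
from itertools import product


def cartesian_product_lists(iterable, repeats=1):
    """Cartesian product of `iterable` with itself, as a list of lists."""
    # itertools.product yields tuples; convert each one to a mutable list.
    return [list(combo) for combo in product(iterable, repeat=repeats)]
```

For example, `cartesian_product_lists(["P1", "P2"], repeats=2)` produces four two-element lists, including the repeated pairs `["P1", "P1"]` and `["P2", "P2"]`.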

compute_prob_matrix(): Computes the probability matrix for the input clickstream.

count_matrix: Sets attribute to access the count matrix.

get_unique_pages(prefixed=True): Retrieves all the unique pages within the provided list of clickstreams.

initialise_count_matrix(): Initialises an empty count matrix.

static normalise_row(row)

Normalises a row of the count matrix to produce probabilities.

To be used when iterating over rows of self.count_matrix. After normalisation, each row sums to 1.

Parameters: row – Each row within numpy matrix to act upon.
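
The normalisation step amounts to dividing each count by the row total. The sketch below uses plain Python lists rather than a numpy row so that it stays self-contained. The function name `normalise_counts` is hypothetical, and returning zeros for a row with no observed transitions is an assumption, since the behaviour for that case is not described here.

```python
def normalise_counts(row):
    """Divide each count in a row by the row total, so the result sums to 1."""
    total = sum(row)
    if total == 0:
        # Assumed handling: a page with no outgoing transitions keeps an all-zero row.
        return [0.0 for _ in row]
    return [count / total for count in row]
```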

static permutations(iterable, r=None)

Modification of itertools.permutations() function to yield a mutable list rather than an immutable tuple.

Unlike the Cartesian product, this does not return a sequence with repetitions in it.
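
A wrapper of this kind might look like the sketch below. The name `permutations_as_lists` is hypothetical and the body is an illustration of the idea, not the library's code.

```python
from itertools import permutations


def permutations_as_lists(iterable, r=None):
    """Yield each r-length permutation of `iterable` as a mutable list."""
    for perm in permutations(iterable, r):
        yield list(perm)
```

With `["P1", "P2", "P3"]` and `r=2`, this yields six lists, none of which contains the same page twice.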

populate_count_matrix(): Assembles a matrix of counts of transitions from each possible state, to every other possible state.
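
Counting transitions can be sketched as follows, using nested lists in place of a numpy matrix. The function name `count_transitions` and the lexicographic ordering of pages are assumptions made for the example.

```python
def count_transitions(clickstreams):
    """Count transitions between consecutive pages across all clickstreams."""
    # Assumed ordering: pages sorted as strings (so 'P10' would sort before 'P2').
    pages = sorted({page for stream in clickstreams for page in stream})
    index = {page: i for i, page in enumerate(pages)}
    counts = [[0] * len(pages) for _ in pages]
    for stream in clickstreams:
        for current, nxt in zip(stream, stream[1:]):
            # Row is the page being left, column is the page being entered.
            counts[index[current]][index[nxt]] += 1
    return pages, counts
```

Each row of the result can then be normalised to give the transition probabilities out of that page.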

prob_matrix: Sets attribute to access the probability matrix.