Models
API documentation for markovclick.models.
Models module, which holds the MarkovClickstream model.
class markovclick.models.MarkovClickstream(clickstream_list: list = None, prefixed=True)
Builds a Markov chain from input clickstreams.
Parameters: clickstream_list (list) – List of clickstream data. Each page should be encoded as a string prefixed by a letter, e.g. ‘P1’.
calc_prob_all_routes_to(clickstream: list, end_page: str, clicks: int, cartesian_product=True)
Calculates the probability, given an input sequence of page clicks, of reaching the specified end state with the specified number of transitions before the end state.
Returns: Probability
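As an illustration of the idea, the sketch below sums the probability of every route from a starting page to an end page that passes through a fixed number of intermediate pages. It is one plausible reading of the description, not the library's implementation: the `probs` table, the `prob_all_routes` helper, and the choice to start from a single page are all assumptions made for the example.

```python
from itertools import product

# Hypothetical transition probabilities, keyed as probs[src][dst].
probs = {
    "P1": {"P2": 0.5, "P3": 0.5},
    "P2": {"P1": 0.25, "P3": 0.75},
    "P3": {"P1": 1.0},
}


def prob_all_routes(start_page, end_page, clicks):
    """Sum the probabilities of all routes with `clicks` intermediate pages."""
    pages = list(probs)
    total = 0.0
    # Enumerate every possible sequence of intermediate pages (with repeats).
    for middle in product(pages, repeat=clicks):
        route = [start_page, *middle, end_page]
        route_probability = 1.0
        for src, dst in zip(route, route[1:]):
            # A transition absent from the table is treated as probability 0.
            route_probability *= probs[src].get(dst, 0.0)
        total += route_probability
    return total


direct = prob_all_routes("P1", "P3", clicks=0)
via_one_page = prob_all_routes("P1", "P3", clicks=1)
```

Enumerating intermediate pages with a Cartesian product allows a page to be revisited along a route, which appears to be what the `cartesian_product=True` default selects.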
calc_prob_to_page(clickstream: list, verbose=True) → float
Calculates the probability of a sequence of clicks (clickstream) taking place.
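In a first-order Markov chain, the probability of a click sequence is the product of the transition probabilities between consecutive pages. The sketch below shows that calculation on a hypothetical `probs` table; the `sequence_probability` helper is illustrative only, and it assumes the probability of starting on the first page is not included.

```python
# Hypothetical transition probabilities, keyed as probs[src][dst].
probs = {
    "P1": {"P2": 0.5, "P3": 0.5},
    "P2": {"P1": 0.25, "P3": 0.75},
    "P3": {"P1": 1.0},
}


def sequence_probability(clickstream):
    """Multiply the transition probabilities along consecutive page pairs."""
    probability = 1.0
    for src, dst in zip(clickstream, clickstream[1:]):
        # A transition absent from the table is treated as probability 0.
        probability *= probs.get(src, {}).get(dst, 0.0)
    return probability


likely = sequence_probability(["P1", "P2", "P3"])
impossible = sequence_probability(["P3", "P2"])
```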
calculate_pagerank(max_nodes: int = 2, pr_kwargs: dict = {}) → Tuple[networkx.classes.digraph.DiGraph, dict]
Calculates the Google PageRank for each of the pages in the Markov chain.
Converts the Markov chain into a directed graph using networkx, and uses its built-in functions to calculate the PageRank score for each page, represented as a node in the graph.
Parameters:
- max_nodes (int) – (Optional, defaults to 2.) Specifies the number of edges (pages) to add to the digraph, in order of most probable transition.
- pr_kwargs (dict) – (Optional, defaults to an empty dictionary.) Dictionary of arguments to provide to the networkx function for calculating PageRank. Refer to https://networkx.github.io/documentation/networkx-1.10/reference/generated/networkx.algorithms.link_analysis.pagerank_alg.pagerank.html for more details.
Returns: networkx DiGraph object, and associated PageRank scores for each page (node in DiGraph).
Return type: Tuple[nx.DiGraph, dict]
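The method delegates the scoring to networkx. To show what that step computes without requiring the dependency, the sketch below prunes each page to its most probable outgoing transitions and then runs a plain power-iteration PageRank in pure Python. This is a stand-in for the networkx call, not the library's code. The `probs` table, the `top_edges` and `pagerank` helpers, and the reading of `max_nodes` as "edges kept per page" are assumptions.

```python
# Hypothetical transition probabilities, keyed as probs[src][dst].
probs = {
    "P1": {"P2": 0.6, "P3": 0.3, "P4": 0.1},
    "P2": {"P1": 0.5, "P3": 0.5},
    "P3": {"P1": 1.0},
    "P4": {"P1": 1.0},
}


def top_edges(probs, max_nodes=2):
    """Keep only the max_nodes most probable outgoing transitions per page."""
    edges = {}
    for src, row in probs.items():
        ranked = sorted(row.items(), key=lambda item: item[1], reverse=True)
        edges[src] = dict(ranked[:max_nodes])
    return edges


def pagerank(edges, damping=0.85, iterations=100):
    """Weighted power-iteration PageRank; assumes every page has an out-edge."""
    pages = list(edges)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for src, row in edges.items():
            weight_sum = sum(row.values())
            for dst, weight in row.items():
                # Share the source's rank in proportion to the edge weight.
                new_rank[dst] += damping * rank[src] * weight / weight_sum
        rank = new_rank
    return rank


edges = top_edges(probs, max_nodes=2)
scores = pagerank(edges)
```

With `max_nodes=2`, the least probable transition out of `P1` is dropped, so `P4` receives no incoming rank and keeps only the baseline share.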
static cartesian_product(iterable, repeats=1)
Wraps Python’s itertools.product() function to return a list of lists, rather than an iterator of tuples.
Returns: List of lists of Cartesian product
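A minimal sketch of such a wrapper is shown below. It follows the signature above, but the body is an assumption about how the conversion could be written rather than a copy of the library's source.

```python
from itertools import product


def cartesian_product(iterable, repeats=1):
    """Return the Cartesian product as a list of mutable lists."""
    return [list(combo) for combo in product(iterable, repeat=repeats)]


pairs = cartesian_product(["P1", "P2"], repeats=2)
```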
count_matrix
Sets attribute to access the count matrix.
get_unique_pages(prefixed=True)
Retrieves all the unique pages within the provided list of clickstreams.
static normalise_row(row)
Normalises each row in the count matrix to produce probabilities.
To be used when iterating over rows of self.count_matrix. After normalisation, each row sums to 1.
Parameters: row – Each row within numpy matrix to act upon.
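Row normalisation divides each count by the row total. The sketch below uses a plain Python list in place of a numpy row so that it runs without dependencies; the handling of an all-zero row is an assumption, since the documentation does not say what happens in that case.

```python
def normalise_row(row):
    """Divide each count by the row total so the row sums to 1."""
    total = sum(row)
    if total == 0:
        # Assumption: leave an all-zero row unchanged to avoid dividing by 0.
        return list(row)
    return [count / total for count in row]


probabilities = normalise_row([2, 1, 1])
```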
static permutations(iterable, r=None)
Modification of itertools.permutations() function to yield a mutable list rather than an immutable tuple.
Unlike the Cartesian product, this does not return a sequence with repetitions in it.
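A generator along these lines could look like the sketch below. The signature matches the one documented above, while the body is an assumed implementation written for illustration.

```python
from itertools import permutations as tuple_permutations


def permutations(iterable, r=None):
    """Yield each permutation as a mutable list instead of a tuple."""
    for perm in tuple_permutations(iterable, r):
        yield list(perm)


routes = list(permutations(["P1", "P2", "P3"], 2))
```

No route in the result visits the same page twice, whereas a Cartesian product over the same pages would also include entries such as `["P1", "P1"]`.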
populate_count_matrix()
Assembles a matrix of counts of transitions from each possible state to every other possible state.
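The counting step can be sketched as follows, using nested lists in place of a numpy matrix. The sample `clickstreams` and the alphabetical page ordering are assumptions made for the example and may differ from how the library indexes pages.

```python
# Hypothetical input: each inner list is one user's sequence of page views.
clickstreams = [
    ["P1", "P2", "P3"],
    ["P1", "P3", "P1"],
    ["P2", "P3"],
]

# Assign each unique page a row/column index (sorted for a stable order).
pages = sorted({page for stream in clickstreams for page in stream})
index = {page: i for i, page in enumerate(pages)}

# counts[i][j] is the number of observed transitions from pages[i] to pages[j].
counts = [[0] * len(pages) for _ in pages]
for stream in clickstreams:
    for src, dst in zip(stream, stream[1:]):
        counts[index[src]][index[dst]] += 1
```

Dividing each row of `counts` by its total would then give the transition probabilities.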
prob_matrix
Sets attribute to access the probability matrix.