Measures and Metrics

Centrality measures

class semanticlayertools.metric.centralities.CalculateCentralities(clusterFile: str, graphdataPath: str, metadataPath: str, outPath: str, timerange: tuple = (1945, 2004), numberProc: int = 0, debug: bool = False)

Calculate centralities for networks.
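A minimal instantiation sketch; all paths and the time range below are placeholders chosen for illustration, and the comment on numberProc is an assumption based on the parameter name:

    from semanticlayertools.metric.centralities import CalculateCentralities

    centralities = CalculateCentralities(
        clusterFile="./clusters/clusters.csv",  # placeholder: file with previously found clusters
        graphdataPath="./graphs/",              # placeholder: directory with yearly network data
        metadataPath="./metadata/",             # placeholder: directory with yearly metadata
        outPath="./centralities/",              # placeholder: directory for the results
        timerange=(1980, 2000),                 # restrict the calculation to these years
        numberProc=4,                           # assumed to set the number of worker processes
    )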

run(centrality: str = 'all', useGC: bool = True, calculateClusters: bool = True)

Run calculation based on Pajek (useGC=True) or NCol (useGC=False) network data.

For the centrality argument, choose “all” or one of the following: “authority”, “betweenness”, “closeness”, “degree”. Centralities are normalized by the maximal value per year, where applicable.

Note that closeness centrality uses normal (linear) binning, while all other centralities use logarithmic binning.
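As a sketch, reusing the instance from the example above (cluster metadata is assumed to have been gathered with setupClusterData, described below):

    # Calculate betweenness centrality on Pajek network data and
    # also calculate cluster-level results.
    centralities.run(centrality="betweenness", useGC=True, calculateClusters=True)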

runparallel(centrality: str, useGC: bool = True, calculateClusters: bool = True)

Run the centrality calculation in parallel, based on Pajek (useGC=True) or NCol (useGC=False) network data.

For the centrality argument, choose “all” or one of the following: “authority”, “betweenness”, “closeness”, “degree”. Centralities are normalized by the maximal value per year, where applicable.

Note that closeness centrality uses normal (linear) binning, while all other centralities use logarithmic binning.
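A corresponding sketch for the parallel variant, again reusing the instance from above:

    # Calculate all centralities in parallel on NCol network data,
    # skipping the cluster-level calculation.
    centralities.runparallel(centrality="all", useGC=False, calculateClusters=False)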

setupClusterData(minClusterSize: int = 1000, idcolumn: str = 'nodeID')

Initial gathering of metadata for previously found time clusters.

Set minClusterSize to restrict the analysis to clusters of at least this size.

For every file in the metadata path, this calls _mergeData if the year found in the filename falls within the given time range.

This step only needs to be run once; the generated cluster metadata can be reused afterwards.
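A sketch of the typical order of calls, with illustrative parameter values:

    # Gather cluster metadata once; clusters with fewer than 500 nodes are ignored.
    centralities.setupClusterData(minClusterSize=500, idcolumn="nodeID")

    # The gathered metadata can then be reused by subsequent centrality runs.
    centralities.run(centrality="all")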

Linguistic measures

semanticlayertools.metric.linguistic.corpusKDL(corpusPath, direction='post', yearColumn='date', languageModel='en_core_web_lg', maxLength=2000000, debug=False)

Calculate the Kullback-Leibler divergence between the current and the next (direction = “post”) or the previous (direction = “pre”) time slice of the corpus.
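A minimal usage sketch; the corpus path is a placeholder, and the assignment of the result is illustrative only:

    from semanticlayertools.metric.linguistic import corpusKDL

    # Compare each time slice with the following one ("post" direction).
    kld = corpusKDL(
        corpusPath="./corpus/",          # placeholder: directory with the time-sliced corpus
        direction="post",
        yearColumn="date",
        languageModel="en_core_web_lg",  # name of a spaCy language model
    )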

semanticlayertools.metric.linguistic.corpusSurprise(corpusPath, yearColumn='date', languageModel='en_core_web_lg', maxLength=2000000, debug=False)

Calculate surprise for each time slice of the corpus.
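A corresponding sketch for the surprise measure, again with a placeholder corpus path and an illustrative result assignment:

    from semanticlayertools.metric.linguistic import corpusSurprise

    # Calculate surprise for each time slice of the corpus.
    surprise = corpusSurprise(
        corpusPath="./corpus/",          # placeholder: directory with the time-sliced corpus
        yearColumn="date",
        languageModel="en_core_web_lg",  # name of a spaCy language model
    )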