Measures and Metrics

Centrality measures

class semanticlayertools.metric.centralities.CalculateCentralities(clusterFile: str, graphdataPath: str, metadataPath: str, outPath: str, timerange: tuple = (1945, 2004), numberProc: int = 0, debug: bool = False)

Calculate centralities for networks.
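A minimal instantiation sketch; all paths and the time range below are placeholders chosen for illustration, and the comment on numberProc is an assumption based on the parameter name:

    from semanticlayertools.metric.centralities import CalculateCentralities

    centralities = CalculateCentralities(
        clusterFile="./clusters/clusters.csv",  # placeholder: file with previously found clusters
        graphdataPath="./graphs/",              # placeholder: directory with yearly network data
        metadataPath="./metadata/",             # placeholder: directory with yearly metadata
        outPath="./centralities/",              # placeholder: directory for the results
        timerange=(1980, 2000),                 # restrict the calculation to these years
        numberProc=4,                           # assumed to set the number of worker processes
    )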

run(centrality: str = 'all', useGC: bool = True, calculateClusters: bool = True)

Run calculation based on Pajek (useGC=True) or NCol (useGC=False) network data.

For the centrality argument, choose “all” or one of the following: “authority”, “betweenness”, “closeness”, “degree”. Centralities are normalized by the maximal value per year, where applicable.

Note that closeness centrality uses normal (linear) binning, while all other centralities use logarithmic binning.
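As a sketch, reusing the instance from the example above (cluster metadata is assumed to have been gathered with setupClusterData, described below):

    # Calculate betweenness centrality on Pajek network data and
    # also calculate cluster-level results.
    centralities.run(centrality="betweenness", useGC=True, calculateClusters=True)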

runparallel(centrality: str, useGC: bool = True, calculateClusters: bool = True)

Run the centrality calculation in parallel, based on Pajek (useGC=True) or NCol (useGC=False) network data.

For the centrality argument, choose “all” or one of the following: “authority”, “betweenness”, “closeness”, “degree”. Centralities are normalized by the maximal value per year, where applicable.

Note that closeness centrality uses normal (linear) binning, while all other centralities use logarithmic binning.
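A corresponding sketch for the parallel variant, again reusing the instance from above:

    # Calculate all centralities in parallel on NCol network data,
    # skipping the cluster-level calculation.
    centralities.runparallel(centrality="all", useGC=False, calculateClusters=False)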

setupClusterData(minClusterSize: int = 1000, idcolumn: str = 'nodeID')

Initial gathering of metadata for previously found time clusters.

Set minClusterSize to restrict the analysis to clusters of at least this size.

For every file in the metadata path, this calls _mergeData if the year found in the filename falls within the given time range.

This step only needs to be run once; the generated cluster metadata can be reused afterwards.
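A sketch of the typical order of calls, with illustrative parameter values:

    # Gather cluster metadata once; clusters with fewer than 500 nodes are ignored.
    centralities.setupClusterData(minClusterSize=500, idcolumn="nodeID")

    # The gathered metadata can then be reused by subsequent centrality runs.
    centralities.run(centrality="all")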

Linguistic measures

semanticlayertools.metric.linguistic.corpusKDL(corpusPath, direction='post', yearColumn='date', languageModel='en_core_web_lg', maxLength=2000000, debug=False)

Calculate the Kullback-Leibler divergence between the current and the next (direction = “post”) or the previous (direction = “pre”) time slice of the corpus.
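A minimal usage sketch; the corpus path is a placeholder, and the assignment of the result is illustrative only:

    from semanticlayertools.metric.linguistic import corpusKDL

    # Compare each time slice with the following one ("post" direction).
    kld = corpusKDL(
        corpusPath="./corpus/",          # placeholder: directory with the time-sliced corpus
        direction="post",
        yearColumn="date",
        languageModel="en_core_web_lg",  # name of a spaCy language model
    )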

semanticlayertools.metric.linguistic.corpusSurprise(corpusPath, yearColumn='date', languageModel='en_core_web_lg', maxLength=2000000, debug=False)

Calculate surprise for each time slice of the corpus.
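A corresponding sketch for the surprise measure, again with a placeholder corpus path and an illustrative result assignment:

    from semanticlayertools.metric.linguistic import corpusSurprise

    # Calculate surprise for each time slice of the corpus.
    surprise = corpusSurprise(
        corpusPath="./corpus/",          # placeholder: directory with the time-sliced corpus
        yearColumn="date",
        languageModel="en_core_web_lg",  # name of a spaCy language model
    )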