Measures and Metrics
Centrality measures
- class semanticlayertools.metric.centralities.CalculateCentralities(clusterFile: str, graphdataPath: str, metadataPath: str, outPath: str, timerange: tuple = (1945, 2004), numberProc: int = 0, debug: bool = False)
Calculate centralities for networks.
- run(centrality: str = 'all', useGC: bool = True, calculateClusters: bool = True)
Run calculation based on Pajek (useGC=True) or NCol (useGC=False) network data.
For centralities choose “all” or one of the following: “authority”, “betweenness”, “closeness”, “degree”. Centralities are normalized by the maximal value per year, where applicable.
Note that closeness centrality uses normal (non-logarithmic) binning, while all other centralities use logarithmic binning.
- runparallel(centrality: str, useGC: bool = True, calculateClusters: bool = True)
Run parallel centrality calculation based on Pajek (useGC=True) or NCol (useGC=False) network data.
For centralities choose “all” or one of the following: “authority”, “betweenness”, “closeness”, “degree”. Centralities are normalized by the maximal value per year, where applicable.
Note that closeness centrality uses normal (non-logarithmic) binning, while all other centralities use logarithmic binning.
- setupClusterData(minClusterSize: int = 1000, idcolumn: str = 'nodeID')
Initial gathering of metadata for previously found time clusters.
Set minClusterSize to limit clusters considered for analysis.
For each file in the metadata path, this calls _mergeData if the year found in the filename falls within the timerange bounds.
This step needs to be run only once; the generated cluster metadata can then be reused.
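The normalization and binning described for run and runparallel can be illustrated with a small standalone sketch. It is not the library's implementation: the helper names, the bin counts, the lower log bound of 1e-5, and the sample degree values are all assumptions chosen for illustration. It only shows what "normalized by the maximal value per year" and "normal versus logarithmic binning" can look like in practice.

```python
import math


def normalize_by_max(values):
    """Divide every value by the largest one, so results lie in [0, 1]."""
    peak = max(values)
    if peak == 0:
        return [0.0 for _ in values]
    return [v / peak for v in values]


def linear_edges(n_bins, low=0.0, high=1.0):
    """Evenly spaced bin edges between low and high."""
    step = (high - low) / n_bins
    return [low + i * step for i in range(n_bins + 1)]


def log_edges(n_bins, low=1e-5, high=1.0):
    """Bin edges evenly spaced in log10, with 0 prepended to catch zeros."""
    lo, hi = math.log10(low), math.log10(high)
    step = (hi - lo) / n_bins
    return [0.0] + [10 ** (lo + i * step) for i in range(n_bins + 1)]


def histogram(values, edges):
    """Count values per bin; the last bin also includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


# Hypothetical degree values for the nodes of one year's network.
degrees = [1, 1, 3, 5, 90, 200]
normalized = normalize_by_max(degrees)

linear_counts = histogram(normalized, linear_edges(4))
log_counts = histogram(normalized, log_edges(5))
print(linear_counts)
print(log_counts)
```

With skewed values like these, evenly spaced bins put most nodes into the first bin, while logarithmic bins spread the small values over several bins. That is a common reason to prefer logarithmic binning for heavy-tailed measures such as degree.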
Linguistic measures
- semanticlayertools.metric.linguistic.corpusKDL(corpusPath, direction='post', yearColumn='date', languageModel='en_core_web_lg', maxLength=2000000, debug=False)
Calculate the Kullback-Leibler divergence between the current and the next (direction = “post”) or previous (direction = “pre”) time slice of the corpus.
- semanticlayertools.metric.linguistic.corpusSurprise(corpusPath, yearColumn='date', languageModel='en_core_web_lg', maxLength=2000000, debug=False)
Calculate surprise for time slices of corpus.
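As a rough illustration of the quantity corpusKDL refers to, the following sketch computes a Kullback-Leibler divergence between the word distributions of two time slices. It is a minimal standalone example, not the library's code: the helper names, whitespace tokenization, add-one smoothing, the base-2 logarithm, and the order of the two arguments are assumptions, and the sample slices are made up.

```python
import math
from collections import Counter


def word_distribution(tokens, vocabulary, smoothing=1.0):
    """Relative word frequencies over a shared vocabulary, with additive smoothing."""
    counts = Counter(tokens)
    total = len(tokens) + smoothing * len(vocabulary)
    return {w: (counts[w] + smoothing) / total for w in vocabulary}


def kl_divergence(p, q):
    """D_KL(P || Q) in bits, for two distributions over the same keys."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p if p[w] > 0)


# Hypothetical, already tokenized time slices of a corpus.
slice_current = "atom energy atom reactor".split()
slice_next = "atom energy fusion fusion".split()
vocab = sorted(set(slice_current) | set(slice_next))

p = word_distribution(slice_current, vocab)
q = word_distribution(slice_next, vocab)

kl_post = kl_divergence(p, q)  # current slice compared with the next one
kl_self = kl_divergence(p, p)  # a slice compared with itself
print(kl_post, kl_self)
```

Smoothing over a shared vocabulary keeps every probability in q positive, so the divergence stays finite even when a word occurs in only one of the two slices.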