Utility functions for visualizations
Some of these methods require installing the package with the extra requirements for text embedding and clustering:
pip install semanticlayertools[embeddml]
Representing temporal cluster evolution with a streamgraph
This utility function supports the visualization of calculated temporal clusters. The adjustable parameters are the smoothing (bool) and the minimal cluster size to consider (default: 1000).
streamgraph(file, smooth, minClusterSize)
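The effect of the two parameters can be illustrated with a small, self-contained sketch. It is not the package's implementation: the input format (a list of `(year, cluster)` pairs), the helper names `cluster_sizes_per_year` and `moving_average`, the reading of cluster size as the total number of records in a cluster, and the choice of a moving average for smoothing are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def cluster_sizes_per_year(records, min_cluster_size=1000):
    """Count records per cluster and year, keeping only large clusters.

    `records` is a hypothetical list of (year, cluster_id) pairs.
    A cluster is kept if its total size over all years reaches
    `min_cluster_size` (an assumed reading of the parameter).
    """
    totals = Counter(cluster for _, cluster in records)
    sizes = defaultdict(Counter)
    for year, cluster in records:
        if totals[cluster] >= min_cluster_size:
            sizes[cluster][year] += 1
    years = sorted({year for year, _ in records})
    # One series per cluster, one value per year, zero where absent.
    return years, {c: [counts[y] for y in years] for c, counts in sizes.items()}


def moving_average(values, window=3):
    """Smooth a series with a centered moving average (edges use fewer points)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Toy data: cluster "A" has 12 records, cluster "B" only 2.
records = [(2000, "A")] * 3 + [(2001, "A")] * 5 + [(2002, "A")] * 4 + [(2001, "B")] * 2
years, sizes = cluster_sizes_per_year(records, min_cluster_size=10)
print(years)                        # [2000, 2001, 2002]
print(sizes)                        # {'A': [3, 5, 4]}
print(moving_average(sizes["A"]))   # [4.0, 4.0, 4.5]
```

Series prepared this way, one per cluster over a shared time axis, are the kind of input a stacked streamgraph plot expects.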
Embedding a text corpus in 2 dimensions
Visualizes a corpus in 2D by embedding a text column using the SentenceTransformer approach of SBERT and UMAP. This method is time-consuming!
embeddedTextPlotting(infolderpath, columnName, outpath, umapNeighors)
Clustering texts using SentenceEmbedding
Similar to the above method, but extended to help find large-scale structures in a given text corpus. Like topic modelling, it groups related texts, and it additionally makes use of HDBSCAN clustering. It reuses a previously generated embedding of the corpus.
embeddedTextClustering(
    infolderpath, columnName, embeddingspath, outpath,
    umapNeighors, umapComponents, hdbscanMinCluster
)
Generate citation and reference tree graph
Using the Dimensions AI dataset, this routine generates a structure starting from a source publication that represents its references and their references, as well as its citations and their citations. Visualizations of this structure show academic roots and conduits and can display disciplinary pathways.
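The two-level structure described above can be sketched without access to the Dimensions API. The following is only an illustration under stated assumptions: the toy `REFERENCES` mapping, the function name `build_tree`, the output layout, and the link labels `ref_l1`, `ref_l2` and `cite_l2` are hypothetical (only `cite_l1` appears in the documented signatures) and do not reflect the package's internal data model.

```python
import json

# Hypothetical toy data: each paper maps to the papers it references.
REFERENCES = {
    "source": ["r1", "r2"],
    "r1": ["r3"],
    "c1": ["source"],
    "c2": ["c1"],
}


def build_tree(source, references):
    """Collect two levels of references and citations around `source`."""
    # Invert the mapping to look up which papers cite a given paper.
    cited_by = {}
    for paper, refs in references.items():
        for ref in refs:
            cited_by.setdefault(ref, []).append(paper)

    links = []
    # References of the source, then references of those references.
    for ref in references.get(source, []):
        links.append({"source": source, "target": ref, "type": "ref_l1"})
        for ref2 in references.get(ref, []):
            links.append({"source": ref, "target": ref2, "type": "ref_l2"})
    # Citations of the source, then citations of those citations.
    for cit in cited_by.get(source, []):
        links.append({"source": cit, "target": source, "type": "cite_l1"})
        for cit2 in cited_by.get(cit, []):
            links.append({"source": cit2, "target": cit, "type": "cite_l2"})

    nodes = sorted(
        {source} | {l["source"] for l in links} | {l["target"] for l in links}
    )
    return {"nodes": nodes, "edges": links}


tree = build_tree("source", REFERENCES)
print(json.dumps(tree, indent=2))
```

A nodes-and-edges dictionary of this shape serializes directly to JSON, which is a common input format for network visualizations.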
- class semanticlayertools.visual.citationnet.GenerateTree(verbose: bool = False, api_key='')
Generate tree for citationnet visualization.
For a given input document, its references and citations are evaluated. In a second step, citations of citations and references of references are extracted. This information is used to generate a tree like network for visualization.
- _cleanTitleString(row)
Clean non-JSON characters from titles.
Removes newline characters, double backslashes, and double quotes (").
- _editDF(inputdf, dftype='cite_l1', level2List=None)
Return reformatted dataframe.
- _formatFOR(row)
Format existing FOR codes.
Each publication has a total value of one. Only the first-level parts of codes are counted. If no FOR code exists, return ‘00:1’.
Example: “02, 0201, 0204, 06” yields “02:0.75;06:0.25”
- _getMissing(idlist)
Get metadata for second level reference nodes.
- generateNetworkFiles(outfolder)
Generates JSON with node and edge lists.
- query(startDoi='', citationLimit=100)
- returnLinks()
Return all links as dataframe.
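The weighting rule described for `_formatFOR` can be sketched as follows. This is a reconstruction from the description alone, not the method's actual code: the function name `format_for_codes`, the comma-separated string input, the assumption that the first-level part is the first two digits of a code, and the rounding to two decimals are all illustrative choices.

```python
from collections import Counter


def format_for_codes(codes: str) -> str:
    """Weight first-level FOR codes so that each publication sums to one."""
    # Keep only the first two digits (assumed first-level part) of every code.
    first_level = [c.strip()[:2] for c in codes.split(",") if c.strip()]
    if not first_level:
        # No FOR code available: fall back to the placeholder code.
        return "00:1"
    counts = Counter(first_level)
    total = len(first_level)
    return ";".join(
        f"{code}:{round(n / total, 2):g}" for code, n in sorted(counts.items())
    )


print(format_for_codes("02, 0201, 0204, 06"))  # 02:0.75;06:0.25
print(format_for_codes(""))                    # 00:1
```

Three of the four codes in the first call fall under division 02 and one under 06, so the weights are 3/4 and 1/4.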
Plotting routines for 3D graphs and streamgraphs
A 3D routine generates multiplex or multilayer network plots from sets of dataframes. It uses edge bundling for clearer visuals and allows cluster colors to be set manually.
Another routine creates 3D graphs for clustered centrality measures.
To compare the time clusters found, a third routine plots streamgraphs of the cluster sizes across time.