After generating the intermediary data and populating the database, the pipeline proceeds with the transformation of the canonical graph G into its corresponding line graph L(G). This is achieved through a dedicated function that systematically removes all auxiliary information previously incorporated into G. The nodes of L(G) are then stored in the database, where each entry corresponds to a gReaction along with its associated reactions.
Subsequently, the pipeline performs a comprehensive analysis of each canonical L(G) by computing key graph properties, including radius and diameter. Additionally, various node-specific metrics are calculated, such as connectivity, number of triangles, eccentricity, degree, betweenness, clustering coefficient, closeness coefficient, eigenvector score, and hub score. All computed results are systematically stored in the database for further analysis.
Following this, the pipeline identifies Articulation Points (APs) within L(G), marking them accordingly in the database. Nodes classified as APs are assigned a value of “1”, while non-APs (nAPs) receive a value of “0”.
Finally, to complete the data processing workflow, the pipeline evaluates each canonical pathway in conjunction with available organism-specific pathways. It systematically verifies whether the nodes present in L(G) are also included in the organism-specific pathway, ensuring an accurate mapping of species-specific metabolic variations.
