Title: | Calculates Basic Network Measures Commonly Used in Network Medicine |
---|---|
Description: | Calculates network measures commonly used in Network Medicine. Measures such as the Largest Connected Component, the Relative Largest Connected Component, Proximity and Separation are calculated along with their statistical significance. Significance can be computed both using a degree-preserving randomization and non-degree preserving. |
Authors: | Deisy Morselli Gysi [aut, cre] |
Maintainer: | Deisy Morselli Gysi <[email protected]> |
License: | GPL-2 |
Version: | 1.0.1 |
Built: | 2024-11-14 04:29:34 UTC |
Source: | https://github.com/cran/NetSci |
Calculates the average proximity from a set of targets to a set of source nodes. It is calculate using a degree preserving randomization. It is calculated as described in Guney, E. et al (2016) <doi.org:10.1038/ncomms10331>
avr_proximity_multiple_target_sets( set, G, ST, source, N = 1000, bins = 100, min_per_bin = 20, weighted = FALSE )
avr_proximity_multiple_target_sets( set, G, ST, source, N = 1000, bins = 100, min_per_bin = 20, weighted = FALSE )
set |
Name of the sets you have targets for. (In a drug-target setup, those would be the drugs of interest). |
G |
The original graph (often an interactome). |
ST |
Set-Target data. It is a data.frame with two columns. ID and Target. |
source |
The source nodes (disease genes). |
N |
Number of randomizations. |
bins |
the number os bins for the degree preserving randomization. |
min_per_bin |
the minimum size of each bin. |
weighted |
consider a weighted graph? TRUE/FALSE |
proximity and its significance based on the degree preserving randomization.
set.seed(666) net = data.frame( Node.1 = sample(LETTERS[1:15], 15, replace = TRUE), Node.2 = sample(LETTERS[1:10], 15, replace = TRUE)) net$value = 1 net = CoDiNA::OrderNames(net) net = unique(net) net$weight = runif(nrow(net)) g <- igraph::graph_from_data_frame(net, directed = FALSE ) S = c("N", "A", "F", "I") T1 = data.frame(ID = "T1", Target = c("H", "M")) T2 = data.frame(ID = "T2", Target = c("G", "O")) avr_proximity_multiple_target_sets(set = c('T1', 'T2'), G = g, source = S, ST = rbind(T1,T2), bins = 1, min_per_bin = 2) # In a weighted graph # avr_proximity_multiple_target_sets(set = c('T1', 'T2'), # G = g, # source = S, # ST = rbind(T1,T2), # bins = 1, # min_per_bin = 2, # weighted = TRUE)
set.seed(666) net = data.frame( Node.1 = sample(LETTERS[1:15], 15, replace = TRUE), Node.2 = sample(LETTERS[1:10], 15, replace = TRUE)) net$value = 1 net = CoDiNA::OrderNames(net) net = unique(net) net$weight = runif(nrow(net)) g <- igraph::graph_from_data_frame(net, directed = FALSE ) S = c("N", "A", "F", "I") T1 = data.frame(ID = "T1", Target = c("H", "M")) T2 = data.frame(ID = "T2", Target = c("G", "O")) avr_proximity_multiple_target_sets(set = c('T1', 'T2'), G = g, source = S, ST = rbind(T1,T2), bins = 1, min_per_bin = 2) # In a weighted graph # avr_proximity_multiple_target_sets(set = c('T1', 'T2'), # G = g, # source = S, # ST = rbind(T1,T2), # bins = 1, # min_per_bin = 2, # weighted = TRUE)
Extract LCC from a graph
extract_LCC(g)
extract_LCC(g)
g |
is the graph you want to extract the largest connected component |
a graph (from igraph) with only the largest connected component
set.seed(12) x = data.frame(n1 = sample(LETTERS[1:5]), n2 = sample(LETTERS[1:20])) g = igraph::graph_from_data_frame(x, directed = FALSE) g = igraph::simplify(g) LCC = extract_LCC(g)
set.seed(12) x = data.frame(n1 = sample(LETTERS[1:5]), n2 = sample(LETTERS[1:20])) g = igraph::graph_from_data_frame(x, directed = FALSE) g = igraph::simplify(g) LCC = extract_LCC(g)
Plots the histogram to evaluate the significance of the Largest Connected Component (LCC).
Histogram_LCC(LCC_L, Name = NULL)
Histogram_LCC(LCC_L, Name = NULL)
LCC_L |
an output from the function LCC_Significance or LCC_Bipartide |
Name |
title of the plot |
An Histogram of the simulated LCC, and a red line of the actual LCC.
set.seed(666) net = data.frame( Node.1 = sample(LETTERS[1:15], 15, replace = TRUE), Node.2 = sample(LETTERS[1:10], 15, replace = TRUE)) net$value = 1 net = CoDiNA::OrderNames(net) net = unique(net) g <- igraph::graph_from_data_frame(net, directed = FALSE ) targets = c("N", "A", "I", "F") LCC_Out = LCC_Significance(N = 1000, Targets = targets, G = g, bins = 5, min_per_bin = 2) # in a real interactome, please use the default Histogram_LCC(LCC_Out, "Example")
set.seed(666) net = data.frame( Node.1 = sample(LETTERS[1:15], 15, replace = TRUE), Node.2 = sample(LETTERS[1:10], 15, replace = TRUE)) net$value = 1 net = CoDiNA::OrderNames(net) net = unique(net) g <- igraph::graph_from_data_frame(net, directed = FALSE ) targets = c("N", "A", "I", "F") LCC_Out = LCC_Significance(N = 1000, Targets = targets, G = g, bins = 5, min_per_bin = 2) # in a real interactome, please use the default Histogram_LCC(LCC_Out, "Example")
Calculates the significance of an overlap of two sets using an hypergeometric test. It is a wrapper of the 'phyper' function.
Hypergeometric.test( success, universe_success, universe_failure, size_collected, lower.tail = FALSE )
Hypergeometric.test( success, universe_success, universe_failure, size_collected, lower.tail = FALSE )
success |
Is the number of elements in the overlap of the sets. |
universe_success |
Is the number of elements of the set of interest. |
universe_failure |
Is the number of elements of the set of the other set. |
size_collected |
The total of elements in the universe |
lower.tail |
Should the test be calculated on the lower tail? (Hypothesis test is lower than) |
the p-value for the hypergeometric test.
require(magrittr) s = 10; S = 15; f = 10; T = 30 Hypergeometric.test(success = s, universe_success = S, universe_failure = f, size_collected = T )
require(magrittr) s = 10; S = 15; f = 10; T = 30 Hypergeometric.test(success = s, universe_success = S, universe_failure = f, size_collected = T )
Calculates the Jaccard index between different sets.
Jaccard(Data)
Jaccard(Data)
Data |
A data.frame with 2 columns. The first refers to the set and the second the elements |
a data.frame with the set names and their Jaccard index
set.seed(123) Data = data.frame(Class = sample(c("X", "Y", "Z"), replace = TRUE, size = 50), Element = sample(LETTERS[1:15], replace = TRUE, size = 50)) Data = unique(Data) Jaccard(Data)
set.seed(123) Data = data.frame(Class = sample(c("X", "Y", "Z"), replace = TRUE, size = 50), Element = sample(LETTERS[1:15], replace = TRUE, size = 50)) Data = unique(Data) Jaccard(Data)
Calculates the Largest Connected Component (LCC) from a given graph, and calculates its significance using a degree preserving approach. Menche, J., et al (2015) <doi.org:10.1126/science.1065103>
LCC_Significance( N = N, Targets = Targets, G, bins = 100, hypothesis = "greater", min_per_bin = 20 )
LCC_Significance( N = N, Targets = Targets, G, bins = 100, hypothesis = "greater", min_per_bin = 20 )
N |
Number of randomizations. |
Targets |
Name of the nodes that the subgraph will focus on - Those are the nodes you want to know whether if forms an LCC. |
G |
The graph of interest (often, in NetMed it is an interactome - PPI). |
bins |
the number os bins for the degree preserving randomization. When bins = 1, assumes a uniform distribution for nodes. |
hypothesis |
are you expecting an LCC greater or smaller than the average? |
min_per_bin |
the minimum size of each bin. |
a list with the LCC - $LCCZ all values from the randomizations - $mean the average LCC of the randomizations - $sd the sd LCC of the randomizations - $Z The score - $LCC the LCC of the given targets - $emp_p the empirical p-value for the LCC - $rLCC the relative LCC
set.seed(666) net = data.frame( Node.1 = sample(LETTERS[1:15], 15, replace = TRUE), Node.2 = sample(LETTERS[1:10], 15, replace = TRUE)) net$value = 1 net = CoDiNA::OrderNames(net) net = unique(net) g <- igraph::graph_from_data_frame(net, directed = FALSE ) plot(g) targets = c("I", "H", "F", "E") LCC_Significance(N = 100, Targets = targets, G = g, bins = 1, min_per_bin = 2)
set.seed(666) net = data.frame( Node.1 = sample(LETTERS[1:15], 15, replace = TRUE), Node.2 = sample(LETTERS[1:10], 15, replace = TRUE)) net$value = 1 net = CoDiNA::OrderNames(net) net = unique(net) g <- igraph::graph_from_data_frame(net, directed = FALSE ) plot(g) targets = c("I", "H", "F", "E") LCC_Significance(N = 100, Targets = targets, G = g, bins = 1, min_per_bin = 2)
Calculates the proximity (average or closest) from source to targets.
proximity_average(G, source, targets)
proximity_average(G, source, targets)
G |
The original graph (often an interactome). |
source |
nodes from the network (in a drug repurpusing set-up those are the disease genes) |
targets |
targets in the network (in a drug repurpusing set-up those are the drug-targets) |
the proximity value for the source-targets
#' set.seed(666) net = data.frame( Node.1 = sample(LETTERS[1:15], 15, replace = TRUE), Node.2 = sample(LETTERS[1:10], 15, replace = TRUE)) net$value = 1 net = CoDiNA::OrderNames(net) net = unique(net) g <- igraph::graph_from_data_frame(net, directed = FALSE ) T = c("G", "A", "D") S = c("C", "M") proximity_average(g, source = S, targets = T)
#' set.seed(666) net = data.frame( Node.1 = sample(LETTERS[1:15], 15, replace = TRUE), Node.2 = sample(LETTERS[1:10], 15, replace = TRUE)) net$value = 1 net = CoDiNA::OrderNames(net) net = unique(net) g <- igraph::graph_from_data_frame(net, directed = FALSE ) T = c("G", "A", "D") S = c("C", "M") proximity_average(g, source = S, targets = T)
Calculates the weighted average proximity from source to targets.
proximity_average_weighted(G, source, targets)
proximity_average_weighted(G, source, targets)
G |
The original graph (often a weighted interactome). |
source |
nodes from the network (in a drug repurpusing set-up those are the disease genes) |
targets |
targets in the network (in a drug repurpusing set-up those are the drug-targets) |
the proximity value for the source-targets
set.seed(666) net = data.frame( Node.1 = sample(LETTERS[1:15], 15, replace = TRUE), Node.2 = sample(LETTERS[1:10], 15, replace = TRUE)) net$value = 1 net = CoDiNA::OrderNames(net) net = unique(net) net$weight = runif(nrow(net)) g <- igraph::graph_from_data_frame(net, directed = FALSE ) T = c("G", "A", "D") S = c("C", "M") proximity_average_weighted(g, source = S, targets = T)
set.seed(666) net = data.frame( Node.1 = sample(LETTERS[1:15], 15, replace = TRUE), Node.2 = sample(LETTERS[1:10], 15, replace = TRUE)) net$value = 1 net = CoDiNA::OrderNames(net) net = unique(net) net$weight = runif(nrow(net)) g <- igraph::graph_from_data_frame(net, directed = FALSE ) T = c("G", "A", "D") S = c("C", "M") proximity_average_weighted(g, source = S, targets = T)
Calculates the proximity (average or closest) from source to targets.
proximity_close(G, source, targets)
proximity_close(G, source, targets)
G |
The original graph (often an interactome). |
source |
nodes from the network (in a drug repurpusing set-up those are the disease genes) |
targets |
targets in the network (in a drug repurpusing set-up those are the drug-targets) |
the proximity value for the source-targets
set.seed(666) net = data.frame( Node.1 = sample(LETTERS[1:15], 15, replace = TRUE), Node.2 = sample(LETTERS[1:10], 15, replace = TRUE)) net$value = 1 net = CoDiNA::OrderNames(net) net = unique(net) g <- igraph::graph_from_data_frame(net, directed = FALSE ) T = c("G", "A", "D") S = c("C", "M") proximity_close(g, source = S, targets = T)
set.seed(666) net = data.frame( Node.1 = sample(LETTERS[1:15], 15, replace = TRUE), Node.2 = sample(LETTERS[1:10], 15, replace = TRUE)) net$value = 1 net = CoDiNA::OrderNames(net) net = unique(net) g <- igraph::graph_from_data_frame(net, directed = FALSE ) T = c("G", "A", "D") S = c("C", "M") proximity_close(g, source = S, targets = T)
Calculates the separation of two set of targets on a network. Often used to measure separation of disease modules in a interactome. Separation is calculated as in Menche, J. et al (2015) <doi:10.1126/science.1257601>.
separation(G, ST)
separation(G, ST)
G |
The original graph (often an interactome). |
ST |
Set-Target data. It is a data.frame with two columns. ID and Target. |
the separation and distance of modules.
require(magrittr) set.seed(12) x = data.frame(n1 = sample(LETTERS[1:5]), n2 = sample(LETTERS[1:20])) D1 = data.frame(gene = c("H", "I", "S", "N", "A"), disease = "D1") D2 = data.frame(gene = c("E", "C", "R" , "J", "Q", "O"), disease = "D2") D3 = data.frame(gene = c("E", "G", "T", "P"), disease = "D3") D4 = data.frame(gene = c("A", "B", "E"), disease = "D4") Diseases = rbind(D1, D2, D3, D4) Diseases %<>% dplyr::select(disease, gene) g = igraph::graph_from_data_frame(x, directed = FALSE) g = igraph::simplify(g) separation(G = g, ST = Diseases)
require(magrittr) set.seed(12) x = data.frame(n1 = sample(LETTERS[1:5]), n2 = sample(LETTERS[1:20])) D1 = data.frame(gene = c("H", "I", "S", "N", "A"), disease = "D1") D2 = data.frame(gene = c("E", "C", "R" , "J", "Q", "O"), disease = "D2") D3 = data.frame(gene = c("E", "G", "T", "P"), disease = "D3") D4 = data.frame(gene = c("A", "B", "E"), disease = "D4") Diseases = rbind(D1, D2, D3, D4) Diseases %<>% dplyr::select(disease, gene) g = igraph::graph_from_data_frame(x, directed = FALSE) g = igraph::simplify(g) separation(G = g, ST = Diseases)
Calculates the separation of two set of targets on a network and assigns a p-value to it. Often used to measure separation of disease modules in a interactome. Separation is calculated as in Menche, J. et al (2015) <doi:10.1126/science.1257601>. p-values are calculates based on the permutation of nodes, you can set the full network to be in the set for permutation or can select the ones you include as input.
separation_Significance(G, ST, Threads = 2, N = 1000, correct_by_target = TRUE)
separation_Significance(G, ST, Threads = 2, N = 1000, correct_by_target = TRUE)
G |
The original graph (often an interactome / PPI). |
ST |
Set-Target data. It is a data.frame with two columns. ID and Target. |
Threads |
How many threads you'd like to use (for parallel computation). |
N |
default to 1000. The number of permutations |
correct_by_target |
TRUE by default. If you want to use the set of targets for the permutation or the full network. |
the separation and distance of modules and its p-value.
set.seed(12) require(magrittr) x = data.frame(n1 = sample(LETTERS[1:5]), n2 = sample(LETTERS[1:20])) D1 = data.frame(gene = c("H", "I", "S", "N", "A"), disease = "D1") D2 = data.frame(gene = c("E", "C", "R" , "J", "Q", "O"), disease = "D2") D3 = data.frame(gene = c("E", "G", "T", "P"), disease = "D3") D4 = data.frame(gene = c("A", "B", "E"), disease = "D4") D5 = data.frame(gene = c("D", "F", "L"), disease = "D5") D6 = data.frame(gene = c("D", "F", "K"), disease = "D6") D7 = data.frame(gene = c("A", "B" ,"F", "K"), disease = "D7") Diseases = rbind(D1, D2, D3, D4, D5, D6, D7) Diseases %<>% dplyr::select(disease, gene) g = igraph::graph_from_data_frame(x, directed = FALSE) g = igraph::simplify(g) separation_Significance(G = g, ST = Diseases, correct_by_target = FALSE, Threads = 2)
set.seed(12) require(magrittr) x = data.frame(n1 = sample(LETTERS[1:5]), n2 = sample(LETTERS[1:20])) D1 = data.frame(gene = c("H", "I", "S", "N", "A"), disease = "D1") D2 = data.frame(gene = c("E", "C", "R" , "J", "Q", "O"), disease = "D2") D3 = data.frame(gene = c("E", "G", "T", "P"), disease = "D3") D4 = data.frame(gene = c("A", "B", "E"), disease = "D4") D5 = data.frame(gene = c("D", "F", "L"), disease = "D5") D6 = data.frame(gene = c("D", "F", "K"), disease = "D6") D7 = data.frame(gene = c("A", "B" ,"F", "K"), disease = "D7") Diseases = rbind(D1, D2, D3, D4, D5, D6, D7) Diseases %<>% dplyr::select(disease, gene) g = igraph::graph_from_data_frame(x, directed = FALSE) g = igraph::simplify(g) separation_Significance(G = g, ST = Diseases, correct_by_target = FALSE, Threads = 2)