Title: | Implementation of Fused MGM to Infer 2-Class Networks |
---|---|
Description: | Implementation of the fused Markov graphical model (FMGM; Park and Won, 2022). The functions cover building mixed graphical model (MGM) objects from data, inferring networks with FMGM, choosing penalization parameters by stable edge-specific penalty selection (StEPS), and visualizing the results. For details, please refer to Park and Won (2022) <doi:10.48550/arXiv.2208.14959>. |
Authors: | Jaehyun Park [aut, cre, cph] |
Maintainer: | Jaehyun Park <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.2 |
Built: | 2025-02-15 03:39:52 UTC |
Source: | https://github.com/cran/fusedMGM |
A dataset containing 50 numeric and 50 categorical variables. Includes 250 observations in each group.
data_all
## 'data_all' A data frame with 500 rows and 100 columns.
A dataset containing 4 numeric and 6 categorical variables. Includes 250 observations in each group.
data_mini
## 'data_mini' A data frame with 500 rows and 10 columns.
Infers networks from 2-class mixed data
FMGM_mc( data, ind_disc, group, t = 1, L = NULL, eta = 2, lambda_intra, lambda_intra_prior = NULL, lambda_inter, with_prior = FALSE, prior_list = NULL, converge_by_edge = TRUE, tol_edge = 3, tol_mgm = 1e-05, tol_g = 1e-05, tol_fpa = 1e-12, maxit = 1e+06, polish = TRUE, tol_polish = 1e-12, cores = parallel::detectCores(), verbose = FALSE )
data |
Data frame with rows as observations and columns as variables |
ind_disc |
Indices of discrete variables |
group |
Group indices, must be provided with the observation names |
t |
Numeric. Initial value of the coefficient that reflects the 2 previous iterations in the fast proximal gradient method. Default: 1 |
L |
Numeric. Initial guess of Lipschitz constant. Default: missing (use backtracking) |
eta |
Numeric. Multiplier for L in backtracking. Default: 2 |
lambda_intra |
Vector of 3 numeric values. Penalization parameters for network edge weights |
lambda_intra_prior |
Vector of 3 numeric values. Penalization parameters for network edge weights, applied to the edges with prior information |
lambda_inter |
Vector of 3 numeric values. Penalization parameters for network edge weight differences |
with_prior |
Logical. Is prior information provided? Default: FALSE |
prior_list |
List of prior information. Each element must be a 3-column data frame, with the 1st and the 2nd columns being variable names and the 3rd column being prior confidence (0,1) |
converge_by_edge |
Logical. If TRUE, convergence is judged by the network edges showing no differences after an iteration. If FALSE, the root mean square difference (RMSD) of edge weights is used. Default: TRUE |
tol_edge |
Integer. Number of consecutive converged iterations required to stop the iteration. Default: 3 |
tol_mgm |
Numeric. Cutoff of network edge RMSD for convergence. Default: 1e-05 |
tol_g |
Numeric. Cutoff of iterations in prox-grad map calculation. Default: 1e-05 |
tol_fpa |
Numeric. Cutoff for fixed-point approach. Default: 1e-12 |
maxit |
Integer. Maximum number of iterations in fixed-point approach. Default: 1000000 |
polish |
Logical. Should the edges with weights below the cutoff be discarded? Default: TRUE |
tol_polish |
Numeric. Cutoff of polishing the resulting network. Default: 1e-12 |
cores |
Integer. Number of cores to use for multi-core computation. Default: maximum number of available cores |
verbose |
Logical. If TRUE, the procedures are reported in real time. Default: FALSE |
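The shape described for prior_list can be illustrated with a small base R sketch. The variable names (X1, Y1, ...), the column names, and the two "sources" are hypothetical placeholders, and the check assumes that confidences lie between 0 and 1 inclusive; the package may validate its input differently.

```r
# Hypothetical prior sources; variable names must match columns of your data.
prior_pathway <- data.frame(
  var1 = c("X1", "X2", "Y1"),
  var2 = c("X2", "Y1", "Y3"),
  confidence = c(0.9, 0.6, 0.75),
  stringsAsFactors = FALSE
)
prior_literature <- data.frame(
  var1 = c("X1", "X4"),
  var2 = c("Y2", "Y1"),
  confidence = c(0.8, 0.55),
  stringsAsFactors = FALSE
)

# One list element per source of prior information
prior_list <- list(prior_pathway, prior_literature)

# Sanity check of the documented shape (3 columns, confidence in [0, 1];
# the inclusive bounds are an assumption of this sketch)
check_prior <- function(p) {
  is.data.frame(p) && ncol(p) == 3 && all(p[[3]] >= 0 & p[[3]] <= 1)
}
prior_ok <- vapply(prior_list, check_prior, logical(1))
```

Such a list would then be supplied together with with_prior = TRUE and a lambda_intra_prior vector, as listed in the arguments above.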
If the value of the Lipschitz constant, L, is not provided, backtracking is performed.
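To make the roles of t, L and eta concrete, here is a generic fast proximal gradient (FISTA-style) loop with backtracking, applied to a plain lasso problem. It is a minimal sketch and not the package's implementation: the objective, the function name fista_lasso, the starting guess L = 1, and the small numerical slack in the backtracking test are all assumptions made for illustration.

```r
# Soft-thresholding: proximal operator of the l1 penalty
soft <- function(x, k) sign(x) * pmax(abs(x) - k, 0)

# Hypothetical helper: lasso via fast proximal gradient with backtracking
fista_lasso <- function(A, y, lambda, L = 1, eta = 2, t = 1,
                        maxit = 500, tol = 1e-8) {
  f <- function(b) 0.5 * sum((A %*% b - y)^2)        # smooth part
  grad <- function(b) drop(crossprod(A, A %*% b - y))
  b <- z <- rep(0, ncol(A))
  for (it in seq_len(maxit)) {
    g <- grad(z)
    repeat {
      b_new <- soft(z - g / L, lambda / L)
      d <- b_new - z
      rhs <- f(z) + sum(g * d) + (L / 2) * sum(d^2)  # quadratic upper bound
      # Accept the step if the bound holds (with slack for round-off)
      if (f(b_new) <= rhs + 1e-12 * max(1, abs(rhs))) break
      L <- L * eta                                   # otherwise enlarge L
    }
    t_new <- (1 + sqrt(1 + 4 * t^2)) / 2
    z <- b_new + ((t - 1) / t_new) * (b_new - b)     # uses 2 previous iterates
    converged <- sqrt(mean((b_new - b)^2)) < tol     # RMSD of the update
    b <- b_new
    t <- t_new
    if (converged) break
  }
  list(coef = b, L = L, iterations = it)
}

set.seed(1)
A <- matrix(rnorm(200 * 5), 200, 5)
y <- drop(A %*% c(2, 0, 0, -1, 0)) + rnorm(200, sd = 0.1)
fit <- fista_lasso(A, y, lambda = 20)
```

In this sketch L only ever grows, by a factor of eta each time the quadratic upper bound fails, which is the usual reason for starting from a small guess when no Lipschitz constant is known.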
The resulting networks, in the form of a list of MGMs
chk <- tolower(Sys.getenv("_R_CHECK_LIMIT_CORES_", ""))
if (Sys.info()['sysname'] != 'Linux') {
  cores <- 1L
} else if (nzchar(chk) && (chk != "false")) {
  cores <- 2L
} else {
  cores <- parallel::detectCores() - 1
}

## Not run:
data(data_all)  # Example 500-by-100 simulation data
data(ind_disc)
group <- rep(c(1,2), each=250)
names(group) <- rownames(data_all)
res_FMGM <- FMGM_mc(data_all, ind_disc, group,
                    lambda_intra=c(0.2,0.15,0.1), lambda_inter=c(0.2,0.15,0.1),
                    cores=cores, verbose=TRUE)
## End(Not run)

data(data_mini)  # Minimal example 500-by-10 simulation data
data(ind_disc_mini)
group <- rep(c(1,2), each=250)
names(group) <- rownames(data_mini)
res_FMGM_mini <- FMGM_mc(data_mini, ind_disc_mini, group,
                         lambda_intra=c(0.2,0.15,0.1), lambda_inter=c(0.2,0.15,0.1),
                         cores=cores, verbose=TRUE)
This function is based on the base R function 'heatmap'.
FMGM_plot( MGM_list, sortby = "diff", highlight = c(), tol_polish = 1e-12, tol_plot = 0.01, sideColor = FALSE, distfun = dist, hclustfun = hclust, reorderfun = function(d, w) reorder(d, w), margins = c(2.5, 2.5), cexRow = 0.1 + 0.5/log10(n), cexCol = cexRow, main = NULL, xlab = NULL, ylab = NULL, verbose = getOption("verbose") )
MGM_list |
A list of graphs from 2 groups. Usually a result of FMGM main function. |
sortby |
Determines the criterion used for sorting and for the dendrograms. Either 1, 2, or "diff" (default). |
highlight |
A vector of variable names or indices to highlight |
tol_polish |
A threshold for the network edge presence |
tol_plot |
Only network edges above this value will be displayed on the heatmap |
sideColor |
A named vector determining the sidebar colors. Set NULL to make the colors based on the variable types (discrete/continuous). Default: FALSE (no sidebars) |
distfun |
A function for the distances between rows/columns |
hclustfun |
A function for hierarchical clustering |
reorderfun |
A function of dendrogram and weights for reordering |
margins |
A numeric vector of 2 numbers for row & column name margins |
cexRow |
A visual parameter cex for row axis labeling |
cexCol |
A visual parameter cex for column axis labeling, defaults to the same value as cexRow |
main |
Main title, default to none |
xlab |
X-axis title, default to none |
ylab |
Y-axis title, default to none |
verbose |
Logical. Should plotting information be printed? |
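The interplay of sortby = "diff" and tol_plot can be pictured with two toy weight matrices. This is only a sketch under assumptions: it treats "diff" as the absolute difference between the two group networks and masks entries at or below tol_plot, which may not match the function's internal steps. The variable names and weights are made up.

```r
vars <- c("X1", "X2", "Y1", "Y2")
w1 <- matrix(0, 4, 4, dimnames = list(vars, vars))  # toy network, group 1
w2 <- w1                                            # toy network, group 2
w1["X1", "X2"] <- w1["X2", "X1"] <- 0.40            # shared edge
w2["X1", "X2"] <- w2["X2", "X1"] <- 0.40
w1["X1", "Y1"] <- w1["Y1", "X1"] <- 0.30            # edge only in group 1
w2["X2", "Y2"] <- w2["Y2", "X2"] <- 0.005           # very weak edge in group 2

tol_plot <- 0.01
w_diff <- abs(w1 - w2)            # assumed meaning of "diff"
shown <- w_diff
shown[shown <= tol_plot] <- 0     # entries too small to display

# Ordering by hierarchical clustering, as the base heatmap function does
ord <- hclust(dist(w_diff))$order
shown <- shown[ord, ord]
```

Here the shared X1-X2 edge vanishes from the difference view, and the 0.005 edge falls under tol_plot, so only the X1-Y1 difference remains visible.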
None
chk <- tolower(Sys.getenv("_R_CHECK_LIMIT_CORES_", ""))
if (Sys.info()['sysname'] != 'Linux') {
  cores <- 1L
} else if (nzchar(chk) && (chk != "false")) {
  cores <- 2L
} else {
  cores <- parallel::detectCores() - 1
}

## Not run:
data(data_all)  # Example 500-by-100 simulation data
data(ind_disc)
group <- rep(c(1,2), each=250)
names(group) <- seq(500)
res_FMGM <- FMGM_mc(data_all, ind_disc, group,
                    lambda_intra=c(0.2,0.15,0.1), lambda_inter=c(0.2,0.15,0.1),
                    cores=cores, verbose=TRUE)
FMGM_plot(res_FMGM)
## End(Not run)

data(data_mini)  # Minimal example 500-by-10 simulation data
data(ind_disc_mini)
group <- rep(c(1,2), each=250)
names(group) <- rownames(data_mini)
res_FMGM_mini <- FMGM_mc(data_mini, ind_disc_mini, group,
                         lambda_intra=c(0.2,0.15,0.1), lambda_inter=c(0.2,0.15,0.1),
                         cores=cores, verbose=TRUE)
FMGM_plot(res_FMGM_mini)
Going from large to small candidate values, the edge inference instabilities are calculated from subsamples. The smallest values whose instabilities stay under the cutoff are chosen. See Sedgewick et al. (2016) for more details.
FMGM_StEPS( data, ind_disc, group, lambda_list, with_prior = FALSE, prior_list = NULL, N = 20, b = NULL, gamma = 0.05, perm = 10000, eps = 0.05, tol_polish = 1e-12, ..., cores = parallel::detectCores(), verbose = FALSE )
data |
Data frame with rows as observations and columns as variables |
ind_disc |
Indices of discrete variables |
group |
Group indices, must be provided with the observation names |
lambda_list |
Numeric vector. Penalization parameter candidates |
with_prior |
Logical. Is prior information provided? Default: FALSE |
prior_list |
List of prior information. Each element must be a 3-column data frame, with the 1st and the 2nd columns being variable names and the 3rd column being prior confidence (0,1) |
N |
Integer. Number of subsamples to use. Default: 20 |
b |
Integer. Number of observations in each subsample. Default: ceiling(10*sqrt(number of total observations)) |
gamma |
Numeric. Instability cutoff. Default: 0.05 |
perm |
Integer. Number of permutations to normalize the prior confidence. Default: 10000 |
eps |
Numeric. Pseudocount to calculate the likelihood of edge detection. Default: 0.05 |
tol_polish |
Numeric. Cutoff of polishing the resulting network. Default: 1e-12 |
... |
Other arguments sent to fast proximal gradient method |
cores |
Integer. Number of cores to use for multi-core computation. Default: maximum number of available cores |
verbose |
Logical. If TRUE, the procedures are reported in real time. Default: FALSE |
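The selection rule described for this function (large to small candidates, instability from subsamples, smallest candidate under the cutoff) can be sketched in base R. This is a simplified stand-in, not the package's procedure: a thresholded correlation replaces the FMGM fit, a single penalty is chosen instead of edge-type-specific ones, and the instability measure 2 * theta * (1 - theta) averaged over edges is an assumption borrowed from stability-based selection methods. The names detect_edges and lambda_chosen are hypothetical.

```r
set.seed(42)
n <- 500
x <- matrix(rnorm(n * 6), n, 6)
x[, 2] <- x[, 1] + rnorm(n, sd = 0.5)   # one strong dependency

N <- 20                                 # number of subsamples
b <- ceiling(10 * sqrt(n))              # documented default subsample size
gamma <- 0.05                           # instability cutoff
lambda_list <- sort(2^seq(log2(.08), log2(.32), length.out = 7),
                    decreasing = TRUE)

# Stand-in for network inference: an edge is "detected" if |cor| > lambda
detect_edges <- function(d, lambda) {
  r <- abs(cor(d))
  r[upper.tri(r)] > lambda
}

subsamples <- replicate(N, sample(n, b), simplify = FALSE)
n_edges <- choose(ncol(x), 2)

instability <- vapply(lambda_list, function(lambda) {
  hits <- vapply(subsamples, function(idx) detect_edges(x[idx, ], lambda),
                 logical(n_edges))
  theta <- rowMeans(hits)               # detection frequency per edge
  mean(2 * theta * (1 - theta))
}, numeric(1))

# Walk from large to small lambda; keep candidates until the cutoff is crossed
ok <- cummax(instability) <= gamma
lambda_chosen <- if (any(ok)) min(lambda_list[ok]) else NA_real_
```

Using cummax makes the rule monotone: once a larger candidate has exceeded gamma, smaller ones are no longer eligible even if their own instability dips below the cutoff.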
The resulting networks, in the form of a list of MGMs
chk <- tolower(Sys.getenv("_R_CHECK_LIMIT_CORES_", ""))
if (Sys.info()['sysname'] != 'Linux') {
  cores <- 1L
} else if (nzchar(chk) && (chk != "false")) {
  cores <- 2L
} else {
  cores <- parallel::detectCores() - 1
}

## Not run:
data(data_all)  # Example 500-by-100 simulation data
data(ind_disc)
group <- rep(c(1,2), each=250)
names(group) <- rownames(data_all)
lambda_list <- 2^seq(log2(.08), log2(.32), length.out=7)
lambda_list <- sort(lambda_list, decreasing=TRUE)
res_steps <- FMGM_StEPS(data_all, ind_disc, group,
                        lambda_list=lambda_list, cores=cores, verbose=TRUE)

data(data_mini)  # Minimal example 500-by-10 simulation data
data(ind_disc_mini)
group <- rep(c(1,2), each=250)
names(group) <- rownames(data_mini)
lambda_list <- 2^seq(log2(.08), log2(.32), length.out=7)
lambda_list <- sort(lambda_list, decreasing=TRUE)
res_steps_mini <- FMGM_StEPS(data_mini, ind_disc_mini, group,
                             lambda_list=lambda_list, cores=cores, verbose=TRUE)
## End(Not run)
A vector indicating which columns in 'data_all' have categorical variables
ind_disc
## 'ind_disc' A 50-length vector with discrete variable indices.
A vector indicating which columns in 'data_mini' have categorical variables
ind_disc_mini
## 'ind_disc_mini' A 6-length vector with discrete variable indices.
Make MGM lists from input data
make_MGM_list(X, Y, group)
X |
data frame or matrix of continuous variables (row: observation, column: variable) |
Y |
data frame or matrix of discrete variables (row: observation, column: variable) |
group |
group variable vector, with the sample names |
A list of MGM objects. The length is equal to the number of unique groups.
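A short sketch of how the inputs X, Y and group might be prepared from a single data frame and a vector of discrete column indices. The toy data, the object names (toy, ind_disc_toy) and the integer coding of the discrete columns are assumptions; the call to make_MGM_list is left as a comment because it requires the package to be loaded.

```r
set.seed(7)
toy <- data.frame(
  a = rnorm(8), b = rnorm(8),             # continuous variables
  c = sample(1:3, 8, replace = TRUE),     # discrete variables
  d = sample(0:1, 8, replace = TRUE)
)
rownames(toy) <- paste0("obs", seq_len(nrow(toy)))
ind_disc_toy <- c(3, 4)                   # columns holding discrete variables

# Split into continuous (X) and discrete (Y) parts, keeping row names
X <- toy[, -ind_disc_toy, drop = FALSE]
Y <- toy[, ind_disc_toy, drop = FALSE]

# Group vector named by observation, as the arguments require
group <- rep(c(1, 2), each = 4)
names(group) <- rownames(toy)

# With the package loaded, the pieces would be passed as documented:
# MGM_list <- make_MGM_list(X, Y, group)
```

Using drop = FALSE keeps X and Y as data frames even when only one column of a kind is present, so the row names that link observations to the group vector are preserved.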
Defining S3 object "MGM"
MGM(X, Y, g)
X |
data frame or matrix of continuous variables (row: observation, column: variable) |
Y |
data frame or matrix of discrete variables (row: observation, column: variable) |
g |
group index, needed for temporary files |
An S3 'MGM' object, containing data, network parameters, and the 1st derivatives