Introduction to flagr package

Siliang Zhang

2017-12-19

flagr-package for FLaG models R

About installing flagr package on Yeti:

Before installing flagr package, make sure you have installed the following

Dependent packages: Rcpp, RcppProgress and RcppArmadillo

For installing RcppArmadillo, see the instructions below

  1. After you login to Yeti, to get the newest RcppArmadillo package: $ module load gcc/4.9.1

  2. then enter R, install the package by typing: > install.packages(“RcppArmadillo”)

  3. quit R environment by typing : > quit()

  4. copy the flagr source package “flagr_0.3.2.tar.gz” to your home directory, and type command: $ R CMD INSTALL flagr_0.3.2.tar.gz

Model Specification

In this section we illustrate severals model our package will use, all the model is under 0/1 condition by default. We offer functions to transform between 0/1 and +1/-1 condition.

1. MIRT model(with normal prior)

We use the mirt function in the mirt package to fit a simulated data.

rm(list = ls())
library(flagr)
library(mirt)

# This data is generated from mirt model with multinormal prior(J=30,K=3,N=10000)
load("~/Github/flagr-github/flagr/Medium-scale-poscor.RData")
cmodel <- mirt.model('F1=1-10,
                     F2=11-20,
                     F3=21-30,
COV = F1*F2*F3')
system.time(result_bi <- mirt(response, cmodel,method = "EM"))
theta_est <- fscores(result_bi)

2. confirmatory low-rank Ising model

In this section, we show how to use

cMIRT(Y, Q, A0 = NA, b0 = NA, Sigma0 = NA, print_proc = T, accuracy = NA, min_step = 5, max_step = 1000)

function in our package to model a simulated data, be noted that our model uses a different prior as usual mirt model uses multivariate normal prior, we’ll show a example code and give a comparison between our function and mirt function.

# We use the previous data generated from mirt model
Q <- A>0
K <- ncol(A)
system.time(result22 <- confirmIRT22(A0, b0, Sigma0, response, Q, accuracy = .1))
# calculate the posterior mean of theta
theta_est22 <- t(result22$Sigma%*%t(result22$A)%*%t(response))

#calculate Kendall's tau correlation between two estimated posterior mean of theta
for(i in 1:K){
  cat(cor(theta_est[,i],theta_est22[,i],method = "kendall"),"\n")
}


# Another example, in this example, response data is simulated multivariate IRT model with our spcial prior
# J=60, K=3, N=10000

rm(list = ls())
load("~/Github/flagr-github/flagr/confirmIsing_exam.RData")

system.time(result22 <- confirmIRT22(A0, b0, Sigma0, response, Q, accuracy = 3))
# calculate the posterior mean of theta
theta_est22 <- t(result22$Sigma%*%t(result22$A)%*%t(response))


cmodel <- mirt.model('F1=1-10,
                     F2=11-20,
                     F3=21-30,
COV = F1*F2*F3')
system.time(result_bi <- mirt(response, cmodel,method = "EM"))
system.time(result_bi <- mirt(response, cmodel,method = "MHRM"))


# Third example, data is simulated from mirt model with multivariate normal prior(J=30,K=3,)

Speed comparison:

K=2,J=20 K=3,J=30 K=3, J=60 K=4,J=80
confirmIRT22 1.829s 4.923s 31.007s 37.331s
mirt(method=“EM”) 7.132s 41.600s 297.545s * 555.753s *
mirt(method=“MHRM”) 193.078s * *

*: the mirt function stopped with the warning message: Log-likelihood was decreasing near the ML solution. EM method may be unstable.

As we can see, when scale go large, the mirt package can not handle very well, it’s result is unreliable anymore.

2.1 confirmatory low-rank Ising model with covariates

In out flagr package, we use function

confirmIRT_cov <- function(A0, b0, beta0, response, Z, Q, print_proc = T, accuracy = 1, min_step = 5, max_step = 1000)

to estimate confrimatory low-rank Ising model with covariates.

beta0 is the initial value of coefficient, Z is a matrix for covariates.

3. exploratory low-rank Ising model

In our flagr package, we use function

explorIRT(A0, b0, response, print_proc = T, accuracy = 0.01, min_step = 5, max_step = 1000)

to estimate parameters, the following is an example:

rm(list = ls())

# This data is generated from mirt model with our special prior(J=20,K=2,N=1000)
load("~/Github/flagr-github/flagr/explorIsing_exam.RData")
system.time(result <- explorIRT(A0, rep(0,J), response, accuracy = 0.01))

One advantage of our package is that we can handle large scale parameters estimating problem in a fast speed like the following one:

rm(list = ls())

#This data is generated from mirt model with our prior(J=100,K=5,N=1000)
load("~/Github/flagr-github/flagr/explorIsing_exam2.RData")

system.time(result <- explorIRT(A0, b0, response, accuracy = 1))

# Even we start from a far initial value
system.time(result <- explorIRT(A0, rep(0,J), response, accuracy = 1))

Our package can also handle the exploratory low-rank Ising model with covariates information, we illustrate the usage of

explorIRT_cov <- function(A0, b0, beta0, response, Z, print_proc = T, accuracy = 1, min_step = 5, max_step = 1000)

by the following example:

J = 20;
N = 1000;
K = 2;

x <- 0.5;
A <- matrix(0, J, K);
A[1:10, 1] <- x;
A[11:20,2] <- x;
S <- matrix(0, J, J);
diag(S) <- -2.5;
Sigma <- matrix(c(1,0,0,1),2)
Z <- matrix(rnorm(N), N, 1)
beta <- matrix(c(rep(0,J/2),rep(1,J/2)),1,J)

system.time(response1 <- sample_flag_cpp1(A, S, beta, Z, N, burn = 200))
beta_n <- nrow(beta)
A0 <- matrix(0, J, K);
A0[1:10, 1] <- x + rnorm(10);
A0[11:20,2] <- x + rnorm(10);
b0 <- rep(0, J)
beta0 <- matrix(0,beta_n,J)

system.time(result1 <- explorIRT_cov(A0, b0, beta0, response1, Z, accuracy = 0.01))

We can give variance estimation by the following command:

result.var <- IRT.var(result1, response1, Z)
result.sd <- sqrt(diag(result.var))

4. confirmatory FLaG model

We can use function

confirmFLaG <- function(A0, S0, Sigma0, Q, gamma, response, stepA = 5, stepS = 5, stepB = 5, discA = 0.5, discS = 0.5, discB = 0.5, tol = 1e-7, print_proc = T)

to model data using confirmatory FLaG model

rm(list = ls())
# This data is generated from FLaG model(J=20,K=2,N=10000)
load("~/Github/flagr-github/flagr/confirmFLaG_exam1.RData")
result22 <- confirmFLaG(A0, S0, diag(2), Q, gamma, response)

Specificly, if the support of graph is known, we can add the information supp as a parameter of the function

confirmFLaG_supp <- function(A0, S0, Sigma0, Q, supp, response, stepA = 1, stepS = 1, stepB = 1, discA = 0.5, discS = 0.5, discB = 0.5, tol = 1e-7, print_proc = T)

to make the estimation more accurate.

result_supp22<- confirmFLaG_supp(A0, S0, diag(2), Q, supp, response,tol = 1e-7)

5. exploratory FLaG model

We can use function

ADMM5(L0, S0, response, delta, gamma, lambda = 5, tol = 1e-5)

to model data using exploratory FLaG model, see the following example:

J = 20;
N = 1000;
K = 2;

x <- 0.5;
A <- matrix(0, J, K);
A[1:10, 1] <- x;
A[11:20,2] <- x;
L <- A%*% t(A);

S <- matrix(0, J, J);
for(i in seq(1, J-1, by = 2)){
  S[i,i+1] <- 1;
  S[i+1,i] <- 1;
}
diag(S) <- -3.5;
system.time(response <- sample_flag_cpp(A, S, N, burn = 200))
colMeans(response)
L0 = matrix(0, J, J);
S0 = matrix(0, J, J);

delta = 0.06;
gamma = 0.03;
system.time(result <- ADMM5(L0, S0, response, delta, gamma))
result_A <- loadings(result)
varimax(result_A)[["loadings"]][[1]]

Performance on large sample size

# Let's start with small scale problem(J=300,K=1, N=1000000)
J = 300;
N = 1e6;
K = 1;

x <- 0.5;

A <- matrix(x, J, K)
S <- matrix(0, J, J);

diag(S) <- -3;
system.time(response <- sample_flag_cpp(A, S, N, burn = 200))
colMeans(response)

########### 
J=300
N=1e6
K=1

A <- matrix(1,J,K)
response <- matrix(0,N,J)
theta <- matrix(rnorm(N,0,1),N,1)
b <- 0
for(i in 1:N){
  for(j in 1:J){
    p <- sum(A[j,] * theta[i,]) + b
    response[i,j] <- rbinom(1, 1, 1/(1+exp(-p)))
  }
}

Vignette Info

Note the various macros within the vignette section of the metadata block above. These are required in order to instruct R how to build the vignette. Note that you should change the title field and the \VignetteIndexEntry to match the title of your vignette.

Styles

The html_vignette template includes a basic CSS theme. To override this theme you can specify your own CSS in the document metadata as follows:

output: 
  rmarkdown::html_vignette:
    css: mystyles.css

Figures

The figure sizes have been customised so that you can easily put two images side-by-side.

plot(1:10)
plot(10:1)

You can enable figure captions by fig_caption: yes in YAML:

output:
  rmarkdown::html_vignette:
    fig_caption: yes

Then you can use the chunk option fig.cap = "Your figure caption." in knitr.

More Examples

You can write math expressions, e.g. \(Y = X\beta + \epsilon\), footnotes1, and tables, e.g. using knitr::kable().

mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4

Also a quote using >:

“He who gives up [code] safety for [code] speed deserves neither.” (via)

Simulations about variance estimation

Using the sandwich estimator to estimate variance of loadings A and descrimination b:

  1. Simulate data from mirt model with independent theta:
J = 20;
N = 1000;
K = 2;

x <- 1.5;
A <- matrix(0, J, K);
A[1:10, 1] <- x;
A[11:20,2] <- x;

b <- rep(0, J)
Q <- A>0
Sigma <- matrix(c(1,0,0,1),2)

A
##       [,1] [,2]
##  [1,]  1.5  0.0
##  [2,]  1.5  0.0
##  [3,]  1.5  0.0
##  [4,]  1.5  0.0
##  [5,]  1.5  0.0
##  [6,]  1.5  0.0
##  [7,]  1.5  0.0
##  [8,]  1.5  0.0
##  [9,]  1.5  0.0
## [10,]  1.5  0.0
## [11,]  0.0  1.5
## [12,]  0.0  1.5
## [13,]  0.0  1.5
## [14,]  0.0  1.5
## [15,]  0.0  1.5
## [16,]  0.0  1.5
## [17,]  0.0  1.5
## [18,]  0.0  1.5
## [19,]  0.0  1.5
## [20,]  0.0  1.5
b
##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Sigma
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1

results under 100 simulations:

load("variance_conf_ind.RData")
average_res = colMeans(res_point)
A=matrix(0,20,2)
A[1:10,1]=average_res[1:10]
A[11:20,2]=average_res[11:20]
A
##            [,1]      [,2]
##  [1,] 0.3197856 0.0000000
##  [2,] 0.3215668 0.0000000
##  [3,] 0.3190496 0.0000000
##  [4,] 0.3210873 0.0000000
##  [5,] 0.3223154 0.0000000
##  [6,] 0.3251583 0.0000000
##  [7,] 0.3264015 0.0000000
##  [8,] 0.3203971 0.0000000
##  [9,] 0.3234195 0.0000000
## [10,] 0.3272770 0.0000000
## [11,] 0.0000000 0.3256401
## [12,] 0.0000000 0.3258470
## [13,] 0.0000000 0.3212222
## [14,] 0.0000000 0.3210236
## [15,] 0.0000000 0.3209790
## [16,] 0.0000000 0.3198053
## [17,] 0.0000000 0.3201307
## [18,] 0.0000000 0.3213765
## [19,] 0.0000000 0.3218453
## [20,] 0.0000000 0.3236067
b=average_res[21:40]
b
##  [1]  0.0060823391  0.0034769598  0.0011617287  0.0027049749  0.0037400256
##  [6] -0.0082606372 -0.0001975696  0.0004260730 -0.0043843911  0.0067771553
## [11] -0.0043611927 -0.0042773736  0.0039032161  0.0120867333  0.0082584887
## [16] -0.0035608326 -0.0023652364 -0.0073793718  0.0065301187 -0.0100547650
SE SEE
0.0234219 0.0246073
0.0252923 0.0245810
0.0245263 0.0244840
0.0250991 0.0245905
0.0240166 0.0247542
0.0244815 0.0249228
0.0253028 0.0249089
0.0251527 0.0245719
0.0249885 0.0248041
0.0259739 0.0249775
0.0288664 0.0249329
0.0246151 0.0248858
0.0226919 0.0247120
0.0235724 0.0246705
0.0257544 0.0246624
0.0291460 0.0245736
0.0237269 0.0247957
0.0222439 0.0247639
0.0288633 0.0248576
0.0242742 0.0248165
0.0656738 0.0724263
0.0708166 0.0725486
0.0765891 0.0724397
0.0687344 0.0725231
0.0682073 0.0725347
0.0681675 0.0726526
0.0759917 0.0727231
0.0744917 0.0725017
0.0698336 0.0726103
0.0688569 0.0727634
0.0757150 0.0726038
0.0676043 0.0726280
0.0735290 0.0724435
0.0756669 0.0724471
0.0797230 0.0724501
0.0725141 0.0723966
0.0618378 0.0723328
0.0781103 0.0724510
0.0682416 0.0724257
0.0749746 0.0725300
  1. Simulate data from mirt model with correlated theta:
J = 20;
N = 1000;
K = 2;

x <- 1.5;
A <- matrix(0, J, K);
A[1:10, 1] <- x;
A[11:20,2] <- x;

b <- rep(0, J)
Q <- A>0
Sigma <- matrix(c(1,0.3,0.3,1),2)

A
##       [,1] [,2]
##  [1,]  1.5  0.0
##  [2,]  1.5  0.0
##  [3,]  1.5  0.0
##  [4,]  1.5  0.0
##  [5,]  1.5  0.0
##  [6,]  1.5  0.0
##  [7,]  1.5  0.0
##  [8,]  1.5  0.0
##  [9,]  1.5  0.0
## [10,]  1.5  0.0
## [11,]  0.0  1.5
## [12,]  0.0  1.5
## [13,]  0.0  1.5
## [14,]  0.0  1.5
## [15,]  0.0  1.5
## [16,]  0.0  1.5
## [17,]  0.0  1.5
## [18,]  0.0  1.5
## [19,]  0.0  1.5
## [20,]  0.0  1.5
b
##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Sigma
##      [,1] [,2]
## [1,]  1.0  0.3
## [2,]  0.3  1.0

results under 100 simulations:

load("variance_conf_cor.RData")
average_res = colMeans(res_point)
A=matrix(0,20,2)
A[1:10,1]=average_res[1:10]
A[11:20,2]=average_res[11:20]
A
##            [,1]      [,2]
##  [1,] 0.3231537 0.0000000
##  [2,] 0.3188977 0.0000000
##  [3,] 0.3180814 0.0000000
##  [4,] 0.3174558 0.0000000
##  [5,] 0.3181322 0.0000000
##  [6,] 0.3195720 0.0000000
##  [7,] 0.3216282 0.0000000
##  [8,] 0.3181760 0.0000000
##  [9,] 0.3199462 0.0000000
## [10,] 0.3207602 0.0000000
## [11,] 0.0000000 0.3196714
## [12,] 0.0000000 0.3177176
## [13,] 0.0000000 0.3221163
## [14,] 0.0000000 0.3163536
## [15,] 0.0000000 0.3232995
## [16,] 0.0000000 0.3230148
## [17,] 0.0000000 0.3214805
## [18,] 0.0000000 0.3188889
## [19,] 0.0000000 0.3153299
## [20,] 0.0000000 0.3185675
b=average_res[21:40]
b
##  [1] -0.009009077  0.003921514  0.007357218  0.004927920  0.001050645
##  [6] -0.006365575 -0.002099554 -0.001802810  0.001247737  0.010514934
## [11]  0.003393165 -0.006382762  0.000094245 -0.005147718  0.001771069
## [16] -0.001534055 -0.007747437 -0.004909551  0.002292500 -0.006232389
SE SEE
0.0256555 0.0256911
0.0255997 0.0254838
0.0245582 0.0253566
0.0244937 0.0254540
0.0268723 0.0254660
0.0252039 0.0256531
0.0245700 0.0256068
0.0260349 0.0255406
0.0254411 0.0255923
0.0248871 0.0255178
0.0221761 0.0239484
0.0244666 0.0238029
0.0277688 0.0240292
0.0233263 0.0237430
0.0270260 0.0240584
0.0236524 0.0240483
0.0230023 0.0239832
0.0224501 0.0238005
0.0236259 0.0236726
0.0230402 0.0238611
0.0692641 0.0733547
0.0701528 0.0731796
0.0628415 0.0731442
0.0754760 0.0730990
0.0744631 0.0731637
0.0718771 0.0732168
0.0670819 0.0733380
0.0800385 0.0731766
0.0729246 0.0732374
0.0755642 0.0733428
0.0793707 0.0719600
0.0730091 0.0718903
0.0719055 0.0720433
0.0694591 0.0718479
0.0766842 0.0721292
0.0765639 0.0720976
0.0721641 0.0720169
0.0771268 0.0719439
0.0811696 0.0717857
0.0683278 0.0719017

  1. A footnote here.