# br As an example of non

As an example of non-weighted TRNs, we considered the TRRUST database [22] (http://www.grnpedia.org/trrust),

to be used as data source of iMaxDriver for non-weighted networks (iMaxDriverN). The TRRUST has been constructed using text mining methods, and then, by manual curation. We retrieved a human TRN from TRRUST v2 which includes with 9,396 TF-gene and TF-TF regulatory interactions.

We obtained GE data for three specific cancer types from GEO database (Table 1). The GE values were taken from the curated GEO dataset (GDS). In cases where only the .CEL files were available, we extracted the CL316 , 243 values using RMA method implemented in Affy package in R [23]. In each of the selected GEO datasets, GE value

ACCEPTED MANUSCRIPT

is reported for both the tumor and for its adjacent normal tissue. In the preprocessing step, first we removed rows with missing gene ID and then combined gene expression values which their gene ID was synonym by averaging

gene expression value of respective columns, afterwards we normalized the GE value such that by dividing each value by maximum the value of GE in the relevant tissue type (tumor or normal).

Table 1. The list of GE datasets used in the present study

GEO ID
Cancer type
Number of entries

In the next step, the collected data were preprocessed to be prepared for IM algorithm. The linear threshold approach in IM problem requires a threshold value bound to each gene (node) which stored in list . For each gene , we assigned a threshold value as following:

where is the GE value of in the normal tissue and is its GE value in the corresponding adjacent tumor

tissue. Since all of the expression values have been normalized before, . In the next step, the edge

weights should also be normalized such that the constraint in equation (1) is satisfied. In the iMaxDriverW the edge confidence values are provided, and therefore, we exploited them to compute the required edge weights by normalizing these values. More specifically, the normalization is done by division of edge weight by the maximum weight of all edge that their destination is same as destination of it:

When the edge weights are unavailable i.e. original gene regulatory network is presented as a non-weighted graph, one may assign random numbers to put weights on the edges, and then, normalize these weights such that equation
(1) is satisfied. The equation (4) further is applicable for iMaxDriverN, but since the random weight values are

ACCEPTED MANUSCRIPT

independent, the normalization for iMaxDriverN can be done simpler by division of random edge weight by the

sum of weights of the incoming edges to node :

Indeed, practically both equations generate normalized random values for input of the algorithm, therefore the results will not be changed by using equation (4) or (5). The complete process of the network weighting is shown in the Fig. 3.

Figure 3. An example of the network weighting procedure based on a given TRN. (a) For each gene , the threshold value is computed; (b) In case of weighted TRNs, the weights are normalized, to satisfy equation (1). (c) in case of non-weighted TRNs, the edge weights are random generated, and then normalized, to satisfy equation (1).

ACCEPTED MANUSCRIPT

2.2 The iMaxDriver Algorithm

As mentioned above, the IM approach tries to find a minimal subset of initially active nodes with maximum influence. In the classic algorithm, one should test all possible subsets of nodes (i.e., 1-member subsets, 2-member subsets, etc.) as the seed subset. In the present work, we used a modified version of IM which considers only a single active node as the seed. We developed a multi-threaded implementation of the algorithm which could be run

on multi-core processors. The implementation of iMaxDriverW and iMaxDriverN algorithms, is available from https://github.com/majid-rh/iMaxDriver.

The network weighting and preprocessing steps are different based on the type of input network. In the case of

iMaxDriverW that edge weights are non-random and are based on the provided interaction confidence values, the

weights are defined, and therefore, the algorithm is run once (Algorithm 1). Conversely, in the case of iMaxDriverN that network is non-weighted, since the random values can affect final results, we repeated the experiment 100 times and computed the average of all coverage values, to alleviate the effect of random weight assignments (Algorithm 2). The FindCoverage function is the main procedure used by Algorithm 1 and Algorithm 2 for computing coverage (Algorithm 3), representing the modified version of the linear threshold approach.