diffeRenTES: An R package for computing cell differentiation trees from Boolean networks

The analysis of possible cell differentiation scenarios of gene regulatory network models can represent an invaluable tool for testing existing hypotheses and developing new biological insights. For this reason, diffeRenTES , an R package that implements the abstractions of a prominent mathematical model of cell differentiation, is presented. diffeRenTES computes tree-like structures describing the progression of the cell differentiation process, starting from Boolean models of gene regulatory networks. Additionally, it offers the possibility to perform robustness analyses of Boolean networks.


Introduction
Dynamical systems have been successfully employed to investigate and reproduce biological phenomena.Among these, Boolean networks (BNs) [1,2], thanks to their simplicity combined with the rich bouquet of complex behaviors they can exhibit, have proved capable of reproducing relevant properties and dynamics of cell differentiation [2][3][4][5][6][7][8].In these works, the focus of the analysis is on the identification of the attractors -portions of the state space of a dynamical system that attract distinct trajectories -which can express gene expression patterns similar to those found in real cells.However, the description of the relations and constraints among the attractors [9][10][11][12], which ultimately determines the whole differentiation process, is often neglected or, if present, tailored to the real case at hand [13,14], and thus not generalizable.
Recently, a powerful abstract mathematical model able to reproduce the differentiation process of a cell modeled as a noisy Boolean The code (and data) in this article has been certified as Reproducible by Code Ocean: (https://codeocean.com/).More information on the Reproducibility Badge Initiative is available at https://www.elsevier.com/physical-sciences-and-engineering/computer-science/journals.E-mail address: m.braccini@unibo.it.
network subject to progressively decreasing noise has been presented [15][16][17].This model, based on the Threshold Ergodic Set (TES) concept, proved to be able to replicate the main properties of biological differentiation, namely: different degrees of differentiation, stochastic and deterministic differentiation and induced pluripotency, just to cite a few.
Here diffeRenTES is presented, an R package that implements the abstractions of the above-mentioned differentiation model.It offers the possibility to compute the typical tree-like structures describing the global picture of the all possible differentiation pathways that a single cell can experience, starting from the Boolean networks and the relative synchronous attractors objects obtained through the wellknown ''BoolNet'' R package [18].The returned differentiation tree describes the process that, in principle, goes from the totipotent cells to the fully differentiated ones, where the switch among the different

Software description
The diffeRenTES package is written using the R statistical software environment [19].It is freely available on the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=diffeRenTES and can be installed using the following commands: The functions provided by the package together with the model abstractions they implement are described below.

Attractor transition matrix
As stated in the introduction, the cell differentiation model implemented by diffeRenTES is based on the asymptotic dynamics of BNs subject to noise.The kind of intracellular noise introduced in this model is a transient logic negation of a node's state, after which the dynamics of the Boolean network returns to follow the synchronous and deterministic updating scheme.In this way, we can observe jumps from one attractor to another.The Attractor Transition Matrix (ATM) gives us an estimate of the probabilities of the observed transitions between attractors, in the form of a square matrix of dimension , where  is the number of attractors.It can also represent a robustness measure of the system with respect to a random flip of an arbitrary state [8].
To compute the ATM, the diffeRenTES package requires the previously computed set of attractors of a Boolean network.So, after having generated a Boolean network and having computed its synchronous attractors by means of the BoolNet package, the ATM can be computed by using the getATM() function.
getATM() accepts the following arguments: • net -a Boolean network loaded or generated by using the ''BoolNet'' package.• synchronous_attractors -the attractors of the Boolean network, obtained with the ''BoolNet'' package by simulating its dynamics with the synchronous update scheme.• MAX_STEPS_TO_FIND_ATTRACTORS -maximum number of steps after which the search for the attractors stops.Its default value is 1000.
In the named list obtained as a result of calling the getATM() function we can find the computed ATM structure and the number of the lost flips, i.e., the number of perturbations that have not reached another attractor within the maximum number of steps.

Threshold ergodic set
At the top of the ATM it is possible to calculate the set of Threshold Ergodic Sets (TESs), that is, the abstractions that model the different cell types.A TES is a set of attractors that traps the long-term dynamics of the BN, under the hypothesis that attractor transitions with a probability less than a given threshold are not possible.Formally, a TES is a strongly connected component with no outgoing arcs of the resulting directed graph composed of nodes representing the attractors and arcs summarizing the transitions among them.
So, with low threshold values only the transitions having low probabilities are removed.This case models the condition of pluripotent cells since the BN dynamics can freely wander among the attractors.Instead, high threshold values cause the removal of even the most likely transitions, thus representing the case of mature differentiated cells, ideally trapped in one or few attractors.In short, by varying the threshold, which scales as the reciprocal of the noise, the TESs are computed and the differentiation process unfolds.
diffeRenTES computes the complete set of TESs of a BN by following this procedure: first, the probabilities reported in the ATM are arranged in ascending order, then they are picked progressively from lowest to highest and used as thresholds values for computing the TESs, thresholds progression also determines the direction of the differentiation process.
The function for computing the cell types (i.e., the TESs) is getT-ESs(), which accepts the following argument: • ATM -the Attractor Transition Matrix structure as returned from the getATM() function.
From the named list obtained as output, we can extract the computed cell types, which eventually determine the different differentiation stages, and the noise thresholds at which they emerged.

Cell differentiation process
The software, diffeRenTES, provides the ability to produce a visual representation of the main characteristics of the differentiation process that a Boolean network is capable of expressing according to the cell differentiation model mentioned above.It arranges the landscape of cell types (i.e., TESs) that emerge as a result of the application of the noise thresholds to the ATM structure into a tree structure.The levels of the tree structure describe the different degrees of cell differentiation; while their progression, from top to bottom, represents the (pseudo-)temporal evolution of cell types imposed by the progressive decrease in noise.
The resulting differentiation tree (a forest if multiple TESs are present at the root level) can be saved to the file system as an SVG image using the function saveDifferentiationTreeToFile().
• filename -the filename used for the image representing the differentiation tree.
Fig. 1 shows an example of a differentiation process computed using the presented package.

Illustrative example
In this section, a complete and working usage example is reported.For a complete user's guide, see the library documentation.
In the code snippet shown in listing 1, we can see how first a random Boolean network is created using the BoolNet package.The network model, as will be explained later in Section 4, does not have to be random, but can be read from a file using the loadSBML() function of the BoolNet package.Next, the network attractors are computed.At this point, the proposed software package comes into play, which through the getATM() function calculates the Attractor Transition Matrix.Then, passing the latter as the actual argument of the getTESs() function, we obtain the abstractions that in the differentiation model represent the various cell types expressed by the gene regulatory network model of interest.Finally, with the function saveDifferentiationTreeToFile(), the cell types (i.e., TES abstractions) are visually organized into a typical differentiation tree, respecting their mutual relationships.

# A t t r a c t o r s c o m p u t a t i o n v i a B o o l N e t p a c k a g e synch _ a t t r a c t o r s <− BoolNet : : g e t A t t r a c t o r s ( n e t ) # A t t r a c t o r s T r a n s i t i o n M a t r i x c o m p u t a t i o n
ATM <− getATM ( net , synch _ a t t r a c t o r s ) Fig. 1.The image shows an example of a differentiation tree that can be obtained using the diffeRenTES package.Each node represents a TES with the additional detail of the attractors that compose it.Instead, labels on the arcs indicate the threshold value at which TESs at the previous level split or reduce to produce the ones in the subsequent level.

# S a v i n g t h e d i f f e r e n t i a t i o n t r e e i n " example .
s v g " s a v e D i f f e r e n t i a t i o n T r e e T o F i l e ( TESs , " example .svg " )

Impact
Given the increasing availability of real gene expression data, such as single-cell data, it becomes of paramount importance to formulate models that can accommodate the dynamics these data reveal of the biological phenomenon they represent.Software tools can enable and accelerate the process from model construction to the stage of hypothe-sis formulation and subsequent testing in wet lab, and finally to model revision, if necessary.
The proposed software package, diffeRenTES, addresses the issue of modeling cell differentiation by allowing computational biologists the possibility of obtaining the differentiation dynamics that a Boolean network is capable of expressing.A far from exhaustive list of possible applications and scientific research that this software can enable includes: (i) the simulation of the differentiation scenarios of existing Boolean models of genetic regulatory networks taken from online datasets, such as Cell Collective [4]; (ii) the possibility of verifying models predictions against real data, e.g. using it in combination with Bioconductor [20,21] and (iii), following the ''ensemble approach'' proposed by Kauffman [3,22], the identification of the generic properties of gene regulatory networks able to match the statistical features of the specific differentiation process under consideration, similar to what is done in [23][24][25].
The intermediate results that can be obtained from the functions in this package are also noteworthy.Indeed, as already stated in Section 2.1, the ATM structure computed with the getATM() function can be used for performing the robustness analysis of gene regulatory network models, as done in the works [8,26,27].
To date, an initial implementation of the mathematical differentiation model on which diffeRenTES is based can be found in the CABeRNET software [28], as a Cytoscape application [29].diffeRenTES makes interoperability one of its main strengths.Indeed, since it is developed with R, a programming language widely used in multidisciplinary research fields and particularly in the biology community, it can be easily integrated with existing pipelines developed in both R and python, by taking advantage of one of the many software bridges that allow the combined use of these two programming languages in mixed work environments, e.g., reticulate [30] and rpy2.This helps to make the world of biological modeling increasingly accessible even to scientists without a background in computer science.

Conclusion and future development
Understanding cell differentiation and the implications of its dysregulations on biological organisms are very challenging tasks.Modeling approaches can mitigate these difficulties.The proposed diffeRenTES package implements the abstractions of a prominent mathematical model of cell differentiation.In particular, it allows to compute the typical differentiation tree describing the whole set of differentiation scenarios that a cell, modeled as a noisy Boolean network, is able to express.Moreover, it provides the possibility to analyze the robustness of the attractors of a Boolean network.
The capabilities offered by the proposed software, together with their ease of use and widespread adoption of the R language, can help advance the understanding of the processes underpinning biological cell differentiation.
Future development of this package will be devoted primarily to the integration of automatic design techniques for the production of Boolean networks capable of giving rise to the differentiation dynamics of the real biological system under investigation; examples of these techniques can be found in [31].In addition, another direction of development will involve the implementation of other models of cell differentiation, again based on Boolean models of gene regulatory networks.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Listing 1 :
Code example l i b r a r y ( diffeRenTES ) l i b r a r y ( BoolNet )