direct link in between, cytoHubba provides a shortest
path detection tool (“display the shortest path” on the
display option). All node pairs in a network that are connectible
but not directly connected, retrieved either by ID search or
by top-ranked topological feature score, are connected
by dotted lines labeled with the smallest number of edges
(the shortest path) needed to make this link. The stepping-stone
nodes and edges composing the shortest path are
expanded by a mouse right-click. Compared
with another Cytoscape plugin, ShortestPath, which
sketches the path between two nodes [5], cytoHubba
fetches the shortest paths among a group of nodes. This
abstract view shows the distances among essential
nodes.
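The group-wise shortest-path view can be illustrated with a minimal sketch. This is not cytoHubba's implementation: it assumes an unweighted, undirected network stored as an adjacency dictionary, and the function names `bfs_distances` and `group_shortest_paths` as well as the toy network are hypothetical. For every pair of selected nodes that is connectible but not directly linked, it reports the smallest number of edges between them.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Return the edge-count distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def group_shortest_paths(adj, selected):
    """Map each connectible, not directly linked pair to its shortest path length."""
    distances = {node: bfs_distances(adj, node) for node in selected}
    result = {}
    for a, b in combinations(selected, 2):
        d = distances[a].get(b)
        if d is not None and d > 1:  # skip unreachable and directly linked pairs
            result[(a, b)] = d
    return result


# Toy network (hypothetical): a chain A-B-C-D plus an isolated node E.
toy = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": [],
}
print(group_shortest_paths(toy, ["A", "B", "D", "E"]))
```

In this toy case only the pairs (A, D) and (B, D) are reported, because A and B are directly linked and E cannot be reached from any other node.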
The performance
Studies of protein-protein interactions become more
powerful as interactome coverage increases. However,
the complexity of the network also increases,
which hampers computation tasks. After optimization
of the programs, cytoHubba is able to complete all
eleven analyses of a small network (e.g. 330 nodes, 360
edges), a middle-sized one (7,600 nodes, 20,000 edges) and
a large one (11,500 nodes, 33,600 edges) in a few seconds,
around 30 seconds and a few minutes, respectively, on a
common desktop/notebook (Cytoscape version 2.6.x /
2.7.x / 2.8.x on the Windows 7/8 platform; hardware spec:
Intel i7, 8 GB of RAM). CytoHubba has been updated
several times since 2009 (from v1.0 to v1.6). It is freely
accessible in the Cytoscape App Store
(http://apps.cytoscape.org/apps/cytohubba). The cumulative number
of downloads is around 6,500
(http://chianti.ucsd.edu/cyto_web/plugins/plugindownloadstatistics.php,
statistics as of May 2014). It is widely used to analyze cancer
metabolic networks [6], innate immune networks [7], complex
biofilm communities [8] and so on.
Validation by predicting yeast essential proteins
We use cytoHubba to score all proteins in the yeast protein
interaction network by the eleven methods. The DIP
database (http://dip.doe-mbi.ucla.edu, version: 20140117)
is composed of 4,908 proteins and 21,732 interactions
after removing self-interactions and redundant records.
The essential protein lists are collected from the Saccharomyces
Genome Deletion Project (SGDP) and the Saccharomyces
Genome Database (SGD). There are 1,122 and
1,280 proteins defined as essential proteins by SGDP and
SGD, respectively. We use the union set (1,297 proteins)
for verifying the performance of the predictions.
The statistics of the yeast PPIs are shown in Table 1. Twenty-three
percent (=1148/4908) of the proteins in this network
are defined as essential proteins in the dataset. We call a
node high-degree if the number of its neighbors is greater
than a threshold; otherwise we call it a low-degree node,
where the threshold $t$ is the maximum integer such that
$2 \times \sum_{v \in V,\ \mathrm{Deg}(v) > t} \mathrm{Deg}(v) > \sum_{v \in V} \mathrm{Deg}(v)$.
The threshold
of the PPI network (DIP 20140117) used in this paper is 21.
As shown in Table 1, there are 4,396 proteins in the low-degree
category and 512 proteins in the high-degree category, of which
908 proteins and 214 proteins, respectively, are essential proteins.
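The threshold definition can be sketched in a few lines of code. The function below is an illustration only, not the authors' program: the name `degree_threshold` and the toy degree list are hypothetical. It relies on the fact that the summed degree of nodes with Deg(v) > t can only shrink as t grows, so the scan stops at the first t that violates the inequality.

```python
def degree_threshold(degrees):
    """Return the largest integer t with 2 * sum(Deg(v) for Deg(v) > t) > sum(Deg(v))."""
    total = sum(degrees)
    if total <= 0:
        raise ValueError("the network needs at least one edge")
    best = None
    for t in range(max(degrees) + 1):
        high_sum = sum(d for d in degrees if d > t)
        if 2 * high_sum > total:
            best = t
        else:
            break  # high_sum never grows with t, so no larger t can qualify
    return best


# Toy degree sequence (hypothetical, not the DIP network).
toy_degrees = [1, 1, 1, 2, 2, 3, 5, 7]
t = degree_threshold(toy_degrees)
high = [d for d in toy_degrees if d > t]
low = [d for d in toy_degrees if d <= t]
print(t, high, low)
```

For the toy sequence the threshold is 4, so the nodes with degree 5 and 7 fall into the high-degree category and the remaining six nodes into the low-degree category.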
Table 2 shows the performance (precision of prediction)
in predicting essential proteins among the top x ranked nodes
identified by each method. For most methods, the precision
decreases as the number of selected nodes increases.
In addition, local-based methods perform better than
global-based methods in discovering yeast essential proteins. To
further understand the network features preferred
by different methods, we compare the number
of proteins in common among the top 100 ranked proteins of any
two scoring methods (Table 3). The top 100 ranked list
of Closeness is almost identical to that of Radiality
(99%), indicating that the network features detected by
these two methods are very similar. MCC shares few
proteins with the other methods (less than 30%).
The top 100 ranked proteins suggested by DMNC do
not appear in the other methods’ lists except that of MCC (30%),
which means this method detects different features from the
other ten methods.
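The two comparisons above, precision among the top x ranked nodes and the overlap between two top-ranked lists, can be sketched as follows. This is a minimal illustration, not the evaluation code used in this study: the function names, the tie-breaking rule (by node ID) and the toy scores are all assumptions.

```python
def top_x(scores, x):
    """Return the x highest-scoring node IDs (ties broken by node ID, an assumption)."""
    ranked = sorted(scores, key=lambda node: (-scores[node], node))
    return ranked[:x]


def precision_at_x(scores, essential, x):
    """Fraction of the top x ranked nodes that are in the essential set."""
    top = top_x(scores, x)
    if not top:
        raise ValueError("x must be positive and scores must not be empty")
    return sum(1 for node in top if node in essential) / len(top)


def top_overlap(scores_a, scores_b, x):
    """Fraction of nodes shared by the top x lists of two scoring methods."""
    if x <= 0:
        raise ValueError("x must be positive")
    shared = set(top_x(scores_a, x)) & set(top_x(scores_b, x))
    return len(shared) / x


# Toy scores from two hypothetical methods and a toy essential set.
method_a = {"p1": 9, "p2": 7, "p3": 5, "p4": 3, "p5": 1}
method_b = {"p1": 2, "p2": 8, "p3": 1, "p4": 9, "p5": 4}
essential = {"p1", "p3", "p4"}

print(precision_at_x(method_a, essential, 2))
print(precision_at_x(method_a, essential, 4))
print(top_overlap(method_a, method_b, 2))
```

With real data, `scores` would hold one topological score per protein and `essential` the union set of essential proteins; each entry of an overlap table then corresponds to one call of `top_overlap` with x = 100.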
As shown in Table 1, in the yeast PPIs, 21% of the proteins
in the low-degree category and 42% of the proteins in the
high-degree category are essential proteins.
Accordingly, if we pick a protein randomly
from the high-degree pool, the probability that an essential
protein is chosen is 0.42, whereas it is 0.21 for the
low-degree pool. Table 4 lists the number
of essential proteins found among the low-degree proteins
in the top x list. For example, among the top 30 proteins
ranked by DMNC, 29 proteins are in the low-degree
category, of which 21 are essential
proteins. Methods other than DMNC, MCC and EcCentricity
tend to assign higher scores to a node when it has
more neighbors, so almost no low-degree proteins
are found in their top x lists. In other words, these
methods cannot find low-degree essential proteins.
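The arithmetic behind this comparison can be checked directly from the counts quoted above. The snippet below is only a worked calculation; the variable names are hypothetical and every number is taken from Table 1 and the DMNC example in the text.

```python
# Counts from Table 1 (DIP 20140117).
low_total, low_essential = 4396, 908
high_total, high_essential = 512, 214

# Chance of drawing an essential protein at random from each pool.
p_low = low_essential / low_total
p_high = high_essential / high_total

# DMNC example: 29 low-degree proteins in the top 30, 21 of them essential.
dmnc_low_precision = 21 / 29

print(round(p_low, 2), round(p_high, 2), round(dmnc_low_precision, 2))
```

The fraction of essential proteins among the low-degree proteins picked by DMNC (about 0.72) is well above the 0.21 expected from a random draw in the low-degree pool.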
Conclusions
In this study, we implement our network scoring methods,
MCC, MNC and DMNC, and eight other popular
Table 1 Statistics of yeast PPIs used in this study (DIP
database, 20140117 released set), in the aspects of
degree and essentiality

                          Total        Low-degree   High-degree
The number of proteins    4908         4396         512
Essential proteins (%)    1148 (23%)   908 (21%)    214 (42%)
Chin et al. BMC Systems Biology 2014, 8(Suppl 4):S11
http://www.biomedcentral.com/1752-0509/8/S4/S11