
Biclustering


Text clustering and word clustering can be performed simultaneously: clustering the texts according to the similarity of their feature words eventually clusters the feature words as well. This is called co-clustering. Co-clustering has two advantages. First, clustering the texts based on word clusters greatly reduces the dimensionality of the clustering and makes it more appropriate to measure the distance between texts. Second, it mines more useful information: the resulting text clusters and word clusters correspond to one another, and this correspondence can be used to describe the types of texts and words. At the same time, the word-clustering results can be reused for text mining and information retrieval.
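The dimensionality-reduction advantage described above can be sketched in a few lines. This is a toy illustration with hypothetical data and a word clustering that is simply assumed as given, not produced by any particular co-clustering algorithm:

```python
# Toy illustration: representing documents over word clusters instead of
# individual words reduces dimensionality, as the text argues.
# The data and the word-cluster assignment below are hypothetical.

# 4 documents x 6 words (term counts)
docs = [
    [2, 1, 0, 0, 0, 1],
    [1, 2, 0, 0, 1, 0],
    [0, 0, 3, 1, 0, 0],
    [0, 0, 1, 2, 0, 0],
]
# Assume the 6 words have already been grouped into 2 word clusters.
word_clusters = [0, 0, 1, 1, 0, 0]

def project(doc):
    """Sum each document's counts within each word cluster."""
    k = max(word_clusters) + 1
    out = [0] * k
    for count, cluster in zip(doc, word_clusters):
        out[cluster] += count
    return out

projected = [project(d) for d in docs]
print(projected)  # 6-dimensional documents become 2-dimensional
```

Distances between the projected documents now compare usage of whole word groups rather than individual words, which is the robustness property the paragraph above points to.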
Instead of explicitly clustering rows and columns alternately, one can consider higher-order occurrences of words, inherently taking into account the documents in which they occur. The similarity between two words is then calculated based on the documents in which they occur and also on the documents in which "similar" words occur. The idea is that two documents about the same topic do not necessarily use the same set of words to describe it, but rather a subset of those words together with other similar words that are characteristic of the topic.

There are many Biclustering algorithms, including: block clustering, CTWC (Coupled Two-Way Clustering), ITWC (Interrelated Two-Way Clustering), δ-bicluster, δ-pCluster, δ-pattern, FLOC, OPC, the Plaid Model, OPSMs (order-preserving submatrices), Gibbs, SAMBA (Statistical-Algorithmic Method for Bicluster Analysis), the Robust Biclustering Algorithm (RoBA), Crossing Minimization, cMonkey, PRMs, DCC, LEB (Localize and Extract Biclusters), QUBIC (QUalitative BIClustering), BCCA (Bi-Correlation Clustering Algorithm), BIMAX, ISA, and FABIA (Factor Analysis for Bicluster Acquisition).

The approximate patterns in CCC-Biclustering algorithms allow a given number of errors per gene, relative to an expression profile representing the expression pattern in the Bicluster. The e-CCC-Biclustering algorithm uses approximate expressions to find and report all maximal e-CCC-Biclusters from a discretized matrix A, using efficient string-processing techniques.

The loss is measured as the Kullback–Leibler distance (KL-distance) between P and Q, where P represents the distribution of files and feature words before Biclustering and Q is the distribution after Biclustering. KL-distance measures the difference between two random distributions: KL = 0 when the two distributions are the same, and KL increases as the difference increases. Thus, the aim of the algorithm was to find the minimum KL-distance between P and Q.
Information-theoretic algorithms assign each row to a cluster of documents and each column to a cluster of words such that the mutual information between the row and column clusterings is maximized. Matrix-based methods focus on the decomposition of matrices into blocks such that the error between the original matrix and the matrices regenerated from the decomposition is minimized.
For Biclusters with coherent values on rows and columns, an overall improvement over the algorithms for Biclusters with constant values on rows or on columns should be considered. Such an algorithm may involve an analysis of variance between groups, using the co-variance between both rows and columns. In Cheng
In text databases, for a document collection defined by a document-by-term matrix D (of size m by n, where m is the number of documents and n the number of terms), the cover-coefficient-based clustering methodology yields the same number of clusters both for documents and for terms (words) using a double-stage probability experiment.
Text clustering can address the high-dimensional sparsity problem, which means clustering the texts and the words at the same time. When clustering texts, we need to consider not only the word information but also the information about the word clusters composed of those words. Then, according to the similarity of
Other methods include runibic and the recently proposed hybrid method EBIC (evolutionary-based Biclustering), which was shown to detect multiple patterns with very high accuracy. More recently, IMMD-CC was proposed, developed from the iterative complexity-reduction concept.
Unlike constant-value Biclusters, these types of Biclusters cannot be evaluated based solely on the variance of their values. To finish the identification, the columns and rows should first be normalized. There are, however, other algorithms without a normalization step that can find Biclusters whose rows and columns require different approaches.
is used to compute constant Biclusters; hence, a perfect Bicluster may be equivalently defined as a matrix with a variance of zero. To prevent the partitioning of the data matrix into Biclusters with only one row and one column, Hartigan assumes that there are, for example, K Biclusters within the data matrix; when the data matrix is partitioned into K Biclusters, the algorithm ends.
When a Biclustering algorithm tries to find a constant-value Bicluster, it reorders the rows and columns of the matrix to group together similar rows and columns, eventually grouping Biclusters with similar values. This method is sufficient when the data are normalized. A perfect constant Bicluster is a matrix (I, J) in which all values a(i,j) are equal to a given constant μ.
The maximum-size Bicluster is equivalent to the maximum edge biclique in the bipartite graph. In the complex case, the elements of matrix A are used to compute the quality of a given Bicluster and to solve a more restricted version of the problem. This requires either large computational effort or the use of lossy heuristics to short-circuit the calculation.
The complexity of the Biclustering problem depends on the exact problem formulation, and particularly on the merit function used to evaluate the quality of a given Bicluster. However, the most interesting variants of this problem are NP-complete.
In 2001 and 2003, I. S. Dhillon published two algorithms applying Biclustering to files and words. One version was based on bipartite spectral graph partitioning; the other was based on information theory. Dhillon assumed that the loss of mutual information during Biclustering was equal to the Kullback–Leibler distance between P and Q.
allow the exclusion of hard-to-reconcile columns/conditions. Not all of the available algorithms are deterministic, and the analyst must pay attention to the degree to which results represent stable minima.
and Church's theorem, a Bicluster is defined as a subset of rows and columns with almost the same score; the similarity score is used to measure the coherence of the rows and columns.
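Cheng and Church's coherence score is the mean squared residue (MSR). A minimal sketch, on hypothetical data: for a submatrix, the residue of each entry subtracts its row mean and column mean and adds back the overall mean, and the MSR averages the squared residues, so a perfectly coherent additive Bicluster scores 0:

```python
# Sketch of the mean squared residue (MSR) coherence score:
# residue(i, j) = a[i][j] - rowmean(i) - colmean(j) + overall mean,
# and MSR is the mean of the squared residues.

def mean_squared_residue(a):
    n_rows, n_cols = len(a), len(a[0])
    row_mean = [sum(row) / n_cols for row in a]
    col_mean = [sum(row[j] for row in a) / n_rows for j in range(n_cols)]
    total = sum(row_mean) / n_rows
    return sum(
        (a[i][j] - row_mean[i] - col_mean[j] + total) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    ) / (n_rows * n_cols)

# Each row is a shift of the others (additive coherence), so MSR == 0.
print(mean_squared_residue([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 0.0
# An incoherent submatrix scores above zero.
print(mean_squared_residue([[1, 0], [0, 1]]))  # 0.25
```

Cheng and Church's algorithm greedily deletes and adds rows/columns to keep this score below a user-supplied threshold δ; the scoring function itself is the part sketched here.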
voting amongst them to decide the best result. Another way is to analyze the quality of shifting and scaling patterns in the Biclusters. Biclustering has been used in the domain of text mining (or classification), where it is popularly known as co-clustering.
Biclustering algorithms have also been proposed and used in other application fields under the names co-clustering, bi-dimensional clustering, and subspace clustering.
matrix). The Biclustering algorithm generates Biclusters. A Bicluster is a subset of rows which exhibit similar behavior across a subset of columns, or vice versa.
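This definition can be made concrete in a few lines: a Bicluster is just the submatrix selected by a row subset and a column subset, and (following the constant-value discussion elsewhere in the article) a perfect constant Bicluster is one whose variance is zero. The data below are hypothetical:

```python
# Minimal sketch: a Bicluster as a row subset and column subset of a data
# matrix, scored by variance (zero for a perfect constant Bicluster).

def bicluster(matrix, rows, cols):
    """Extract the submatrix selected by the given row/column subsets."""
    return [[matrix[i][j] for j in cols] for i in rows]

def variance(sub):
    values = [v for row in sub for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

data = [
    [5, 1, 5, 9],
    [2, 7, 3, 8],
    [5, 4, 5, 6],
]
sub = bicluster(data, rows=[0, 2], cols=[0, 2])
print(sub)            # [[5, 5], [5, 5]]
print(variance(sub))  # 0.0 -- a perfect constant Bicluster
```

Note that the selected rows and columns need not be contiguous in the original matrix, which is why Biclustering algorithms conceptually reorder rows and columns to bring a Bicluster together.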
To cluster more than two types of objects, in 2005, Bekkerman expanded the mutual information in Dhillon's theorem from a single pair into multiple pairs.
Their method (known as χ-Sim, for cross similarity) is based on finding document-document similarity and word-word similarity, and then using classical clustering methods such as hierarchical clustering.
in 1972. The term "Biclustering" was later used and refined by Boris G. Mirkin. The algorithm was not generalized until 2000, when Y. Cheng and George M. Church
Algorithms are then applied to discover blocks in D that correspond to a group of documents (rows) characterized by a group of words (columns).
More recently, Bisson and Hussain proposed a new approach that uses the similarity between words and the similarity between documents to co-cluster the matrix.
These algorithms find and report all maximal Biclusters with coherent and contiguous columns and perfect or approximate expression patterns, in time linear or polynomial in the size of the time-series gene expression matrix.
In 2004, Arindam Banerjee used a weighted Bregman distance instead of KL-distance to design a Biclustering algorithm that was suitable for any kind of matrix, unlike the KL-distance-based algorithm.
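As background for this distance substitution: KL-distance is one member of the family of Bregman divergences, D_phi(x, y) = phi(x) − phi(y) − phi′(y)·(x − y), which also contains squared Euclidean distance — that generality is what lets the approach handle any kind of matrix. A minimal scalar sketch (a general mathematical fact, not code from the cited work):

```python
# Bregman divergence D_phi(x, y) = phi(x) - phi(y) - phi'(y) * (x - y).
# phi(x) = x log x yields generalized KL; phi(x) = x^2 yields squared
# Euclidean distance.
import math

def bregman(phi, dphi, x, y):
    return phi(x) - phi(y) - dphi(y) * (x - y)

# phi(x) = x log x  ->  generalized KL: x log(x/y) - x + y
kl = bregman(lambda v: v * math.log(v), lambda v: math.log(v) + 1, 2.0, 1.0)
print(math.isclose(kl, 2 * math.log(2) - 1))  # True

# phi(x) = x^2  ->  squared Euclidean: (x - y)^2
sq = bregman(lambda v: v * v, lambda v: 2 * v, 2.0, 1.0)
print(sq)  # 1.0
```

Swapping the convex generator phi swaps the loss being minimized while the overall co-clustering scheme stays the same.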
There is an ongoing debate about how to judge the results of these methods, as Biclustering allows overlap between clusters and some
This efficiency is obtained by manipulating a discretized version of the original expression matrix and using efficient string-processing techniques based on suffix trees.
where t is the number of non-zero entries in D. Note that in D each row and each column must contain at least one non-zero element.
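The estimate referred to here — number of clusters ≈ (m × n) / t, with t the number of non-zero entries of the document-by-term matrix D — can be sketched on a small hypothetical matrix:

```python
# Cover-coefficient cluster-count estimate: (m * n) / t, where t is the
# number of non-zero entries of D. The matrix below is hypothetical; note
# that every row and every column contains at least one non-zero element,
# as the text requires.

D = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
]
m, n = len(D), len(D[0])
t = sum(v != 0 for row in D for v in row)
estimated_clusters = (m * n) / t
print(estimated_clusters)  # 24 cells, 8 non-zeros -> 3.0
```

The sparser D is (smaller t), the more clusters the heuristic predicts, which matches the intuition that documents sharing few terms split into more groups.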
makes it difficult to spot errors in the results. One approach is to utilize multiple Biclustering algorithms, with the majority or
proposed a biclustering algorithm based on the mean squared residue score (MSR) and applied it to biological gene expression data.
According to the cover-coefficient concept, the number of clusters can also be roughly estimated by the formula (m × n) / t.
Several approaches have been proposed based on the information content of the resulting blocks: matrix-based approaches, such as SVD and BVD, and graph-based approaches.
Taking these higher-order similarities brings the structure of the whole corpus into consideration, with the result of generating a better clustering of the documents and words.
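The higher-order idea can be sketched with plain matrix products (a simplified illustration with hypothetical data — not the exact χ-Sim normalization): document similarity is computed through word similarity, so two documents sharing no words can still come out similar if their words co-occur elsewhere in the corpus.

```python
# Simplified higher-order similarity: word_sim = D^T D (co-occurrence),
# then doc_sim = D * word_sim * D^T. Not the normalized chi-Sim measure.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# 3 documents x 4 words; documents 0 and 2 share no words directly.
D = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
word_sim = matmul(transpose(D), D)                    # word-word co-occurrence
doc_sim = matmul(matmul(D, word_sim), transpose(D))   # higher-order doc similarity

print(matmul(D, transpose(D))[0][2])  # 0 -- first-order: no shared words
print(doc_sim[0][2])                  # 1 -- linked through document 1's words
```

Document 1 bridges documents 0 and 2, which is exactly the "two documents on the same topic may use different but related words" argument made above.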
Some recent algorithms have attempted to include additional support for Biclustering rectangular matrices in the form of other datatypes, including cMonkey.
IMMD-CC is able to identify co-cluster centroids from the highly sparse transformation obtained by iterative multi-mode discretization.
Graph-based methods tend to minimize the cuts between the clusters: given two groups of documents d1 and d2, the number of cuts can be measured as the number of words that occur in documents of both groups.

Text corpora are represented in vectorial form as a matrix D whose rows denote the documents and whose columns denote the words in the dictionary; the matrix element Dij denotes the occurrence of word j in document i.

FABIA utilizes well-understood model selection techniques, such as variational approaches, and applies the Bayesian framework.
In the simple case, where each entry of the binary matrix A is either 0 or 1, a Bicluster is equal to a biclique in the corresponding bipartite graph.

In contrast to other approaches, FABIA is a multiplicative model that assumes realistic non-Gaussian signal distributions with heavy tails. Its generative framework allows FABIA to determine the information content of each Bicluster and to separate spurious Biclusters from true Biclusters.

The relationship between these cluster models and other types of clustering, such as correlation clustering, has been discussed in the literature.

Given the known importance of discovering local patterns in time-series data, recent proposals have addressed the Biclustering problem in the specific case of time-series gene expression data. In this case, the interesting Biclusters can be restricted to those with contiguous columns. This restriction leads to a tractable problem and enables the development of efficient exhaustive enumeration algorithms such as CCC-Biclustering and e-CCC-Biclustering.

In real data, the entries of a constant Bicluster may be represented in the form n(i,j) + μ, where n(i,j) denotes the noise.

Illustrative Bicluster types (from the original figures): (a) constant values; (b) constant values on rows; (c) constant values on columns; (d) coherent values (additive); (e) coherent values (multiplicative).

Software: FABIA (Factor Analysis for Bicluster Acquisition) is available as an R package.


Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.