Factorial code

Data representation for machine learning

Most real world data sets consist of data vectors whose individual components are not statistically independent. In other words, knowing the value of one element provides information about the values of other elements in the same data vector. When this occurs, it can be desirable to create a factorial code of the data, i.e., a new vector-valued representation of each data vector such that it is uniquely encoded by the resulting code vector (loss-free coding) while the code components are statistically independent.

Later supervised learning usually works much better when the raw input data is first translated into such a factorial code. For example, suppose the final goal is to classify images with highly redundant pixels. A naive Bayes classifier will assume the pixels are statistically independent random variables and will therefore fail to produce good results. If the data are first encoded in a factorial way, however, the naive Bayes classifier achieves its optimal performance (compare Schmidhuber et al. 1996).
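
As a small illustration of this point (not an example from the cited papers), the Python sketch below builds a toy data set in which one binary "pixel" is an exact copy of another. A naive Bayes classifier double-counts the duplicated evidence and becomes badly over-confident, while a representation that keeps only the independent component recovers the correct posterior. The toy distribution and the helper naive_bayes_posterior are assumptions made purely for this example.

import numpy as np

rng = np.random.default_rng(0)

# Toy generative model (assumed for this illustration): class y is 0 or 1
# with equal probability, pixel x1 is noisy evidence about y, and pixel x2
# is an exact copy of x1, i.e. fully redundant.
n = 100_000
y = rng.integers(0, 2, size=n)
p_x1 = np.where(y == 1, 0.8, 0.2)            # P(x1 = 1 | y)
x1 = (rng.random(n) < p_x1).astype(int)
x2 = x1.copy()                               # redundant pixel

def naive_bayes_posterior(features, labels, query):
    # P(y = 1 | query) under the (possibly wrong) assumption that the
    # feature columns are conditionally independent given the class.
    post = {0: np.mean(labels == 0), 1: np.mean(labels == 1)}
    for c in (0, 1):
        for j, v in enumerate(query):
            col = features[labels == c, j]
            post[c] *= np.mean(col == v)
    return post[1] / (post[0] + post[1])

X_raw = np.column_stack([x1, x2])   # redundant "pixel" representation
X_fact = np.column_stack([x1])      # factorial representation: the duplicate
                                    # pixel carries no additional information

print("naive Bayes, raw pixels    :", naive_bayes_posterior(X_raw, y, (1, 1)))
print("naive Bayes, factorial code:", naive_bayes_posterior(X_fact, y, (1,)))
print("true posterior P(y=1|x1=1) :", 0.8 / (0.8 + 0.2))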

To create factorial codes, Horace Barlow and co-workers suggested minimizing the sum of the bit entropies of the code components of binary codes (1989). Jürgen Schmidhuber (1992) re-formulated the problem in terms of predictors and binary feature detectors, each receiving the raw data as input. For each detector there is a predictor that sees the other detectors and learns to predict the output of its own detector in response to the various input vectors or images, while each detector uses a machine learning algorithm to become as unpredictable as possible. The global optimum of this objective function corresponds to a factorial code represented in a distributed fashion across the outputs of the feature detectors.
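
The Python sketch below is only meant to make Barlow's criterion concrete; it is not code from the cited papers, and Schmidhuber's predictability minimization is an adaptive procedure for finding such codes rather than the direct evaluation shown here. It compares two loss-free binary codes of the same two-attribute source: code_a keeps the (independent) attributes themselves, while code_b mixes them. The sum of per-component bit entropies is minimized by the factorial code, where it equals the joint entropy of the source. The source distribution and the names code_a and code_b are assumptions for the example.

import numpy as np

def bit_entropy(p):
    # Entropy in bits of a Bernoulli(p) code component.
    if p in (0.0, 1.0):
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

# Toy source (assumed for this illustration): two independent binary
# attributes a and b with P(a=1)=0.7 and P(b=1)=0.4.
p_a, p_b = 0.7, 0.4
joint = {(a, b): (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
         for a in (0, 1) for b in (0, 1)}

def sum_of_bit_entropies(code):
    # Barlow's criterion for a loss-free binary code: the sum of the
    # entropies of the individual code components.
    total = 0.0
    for j in range(2):
        p1 = sum(prob for sym, prob in joint.items() if code[sym][j] == 1)
        total += bit_entropy(p1)
    return total

# code_a keeps the independent attributes -> a factorial code.
code_a = {(a, b): (a, b) for a in (0, 1) for b in (0, 1)}
# code_b is also loss-free but its second bit is a XOR b, which makes the
# two code components statistically dependent.
code_b = {(a, b): (a, a ^ b) for a in (0, 1) for b in (0, 1)}

joint_entropy = -sum(p * np.log2(p) for p in joint.values())
print("joint entropy (lower bound)  :", joint_entropy)
print("sum of bit entropies, code_a :", sum_of_bit_entropies(code_a))
print("sum of bit entropies, code_b :", sum_of_bit_entropies(code_b))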

Painsky, Rosset and Feder (2016, 2017) further studied this problem in the context of independent component analysis over finite alphabet sizes. Through a series of theorems they show that the factorial coding problem can be solved accurately with a branch and bound search tree algorithm, or tightly approximated with a series of linear problems. In addition, they introduce a simple transformation (namely, order permutation) which provides a greedy yet very effective approximation of the optimal solution. Practically, they show that with a careful implementation the favorable properties of the order permutation can be achieved at an asymptotically optimal computational complexity. Importantly, they provide theoretical guarantees, showing that while not every random vector can be efficiently decomposed into independent components, the majority of vectors do decompose very well (that is, with a small constant cost) as the dimension increases. In addition, they demonstrate the use of factorial codes for data compression in multiple setups (2017).
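
One standard way to quantify the "cost" of an imperfect decomposition (a generic information-theoretic measure, not necessarily the exact objective of the cited papers) is the gap between the sum of the marginal entropies of the code components and the joint entropy of the source; the gap is zero exactly when the code is factorial. The Python sketch below, with the assumed helper names entropy and decomposition_cost, evaluates this gap for a factorial and a non-factorial pair of binary components.

import numpy as np

def entropy(p):
    # Shannon entropy in bits of a probability vector.
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def decomposition_cost(joint):
    # Sum of marginal entropies minus the joint entropy of a discrete joint
    # distribution given as an n-dimensional probability array. The value
    # is zero iff the components are statistically independent (factorial).
    marginal_sum = 0.0
    for axis in range(joint.ndim):
        other = tuple(a for a in range(joint.ndim) if a != axis)
        marginal_sum += entropy(joint.sum(axis=other).ravel())
    return marginal_sum - entropy(joint.ravel())

# Independent pair of binary components: the cost is (numerically) zero.
independent = np.outer([0.7, 0.3], [0.4, 0.6])
# Strongly dependent pair: the two components are always equal.
dependent = np.array([[0.5, 0.0], [0.0, 0.5]])

print("cost of factorial code :", decomposition_cost(independent))
print("cost of dependent code :", decomposition_cost(dependent))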

See also

Blind signal separation (BSS)
Principal component analysis (PCA)
Factor analysis
Unsupervised learning
Image processing
Signal processing

References

H. Barlow, T. P. Kaushal and G. J. Mitchison. Finding minimum entropy codes. Neural Computation, 1:412-423, 1989.
J. Schmidhuber. Learning factorial codes by predictability minimization. Neural Computation, 4(6):863-879, 1992.
J. Schmidhuber, M. Eldracher and B. Foltin. Semilinear predictability minimization produces well-known feature detectors. Neural Computation, 8(4):773-786, 1996.
A. Painsky, S. Rosset and M. Feder. Generalized independent component analysis over finite alphabets. IEEE Transactions on Information Theory, 62(2):1038-1053, 2016.
A. Painsky, S. Rosset and M. Feder. Large Alphabet Source Coding using Independent Component Analysis. IEEE Transactions on Information Theory, 63(10):6514-6529, 2017.