Representing knowledge as a partitioning on a single set of words

Imagine a tessellation made from tiles shaped like equilateral triangles. We could keep adding tiles beside each other indefinitely. We would like to use such a tessellation to model the following properties of language. We want tiles of different shapes to fit together in the tessellation, with different tiles containing different information in the form of sentences. This will be useful in a number of ways. The process of communication will be equivalent to selecting a particular tile from this space. So could the process of acquiring commands to give to an agent responsible for taking actions. To select a tile, we may choose the Nth term of the tessellation according to some sorting algorithm that orders this space. We will also depart from having these tiles tessellate a Euclidean space and use a non-Euclidean manifold instead, so that the shapes fit as we would like.

In our tessellation procedure, we may begin with one sentence and add its corresponding shape. Upon adding the second shape/sentence, we have to consider the case where the two share some of the same words. In this case, we deform the initially flat manifold in the following way. The second sentence's tile will include part of the first sentence's tile, just as a tessellation of rectangles, if set up right, can be viewed as two layers of tiling: one of squares (formed by joining appropriate rectangles together) superimposed on the one of the rectangles themselves. The manifold we are creating has a notion or measure of dimensionality associated with its complexity, measured by how much the tiles encroach on each other. This measure will be crucial to how the model learns. We will call the dimensionality of the structure we are creating Gamma.
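To make the idea of tiles encroaching on each other concrete, here is a minimal sketch. It treats each sentence as a tile made of its words and counts the words that pairs of tiles share. The post does not give a numerical definition of Gamma, so the function `toy_gamma` is only a hypothetical stand-in for illustration, and the names `tokenize` and `shared_words` are likewise assumptions.

```python
from itertools import combinations

def tokenize(sentence):
    """Lowercase a sentence and split it into a set of words (a 'tile')."""
    return frozenset(sentence.lower().replace(".", "").split())

def shared_words(tiles):
    """Map each pair of tile indices to the words the two tiles share."""
    overlaps = {}
    for (i, a), (j, b) in combinations(enumerate(tiles), 2):
        common = a & b
        if common:
            overlaps[(i, j)] = common
    return overlaps

def toy_gamma(tiles):
    """Hypothetical stand-in for Gamma (an assumption, not the post's definition):
    the total number of shared words summed over all pairs of tiles."""
    return sum(len(words) for words in shared_words(tiles).values())

sentences = ["The dog is hungry", "I must feed the dog", "It is alive"]
tiles = [tokenize(s) for s in sentences]

print(shared_words(tiles))
print(toy_gamma(tiles))
```

Adding a sentence that reuses existing words raises this toy score, which mirrors the way a new tile encroaches on tiles already in the tessellation.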
To visualize what is going on: if two sentences/tiles contain the word "Dog", it is as if we have taken both tiles and found a way to deform them so that they share that part in the overall tessellation. Individual words then form the basis of the deformation procedure, and each new modification by a unique word, of two or more sentences/tiles within the tessellation, adjusts the value of Gamma of the overall tessellation in a unique way. What we in fact have in our system is the set of English words. Each word is related to different values of Gamma in different ways, and different values of Gamma partition this set differently. This partition is the tessellation. Each tessellation is associated with a set of sentences. This allows us to store an arbitrarily large document in a much reduced space, as we only need the set of English words and a value for Gamma to store it.

But we can do much more with this. In a conversational interface we could return a small "document" at each response. To train the system for this function, we need to have a model of logic built into the database. When a sentence is parsed into the database, all the sentences related to and dependent on the information contained in that sentence should be adjusted automatically by the system. Upon learning that "the cat is hungry", the sentences "I must feed it" and "it is alive", along with many others, should all come into existence in the database. As mentioned, we have tiles within tiles, representing words within sentences, giving Gamma its value. Changing Gamma will change the tessellation, adding some sentences and removing others.

The problem we now have to deal with is that the changes in Gamma required for learning new facts, as the system stands, are nonlinear and full of jump discontinuities. But what we do know is that these jumps are part of a pattern, and we only need to learn the required pattern and extrapolate during operation. That is all established material. Finding a pattern in Gamma, however, removes the need to learn the pattern as it actually appears, spread across multiple words found in multiple sentences in multiple contexts, and this limits our search considerably.
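The statement that a set of sentences partitions a single set of words can be illustrated with a small sketch. Here, two words fall into the same block when they appear in exactly the same sentences. This grouping rule is an assumption chosen for illustration and is not a definition given in the post; the function name `partition_vocabulary` and the tiny vocabulary are also hypothetical.

```python
from collections import defaultdict

def partition_vocabulary(vocabulary, sentences):
    """Partition the vocabulary by sentence membership.

    Each word gets a signature: the set of indices of the sentences that
    contain it. Words with the same signature form one block. This rule is
    an illustrative assumption, not the post's definition.
    """
    tiles = [set(s.lower().split()) for s in sentences]
    blocks = defaultdict(set)
    for word in vocabulary:
        signature = frozenset(i for i, tile in enumerate(tiles) if word in tile)
        blocks[signature].add(word)
    return dict(blocks)

vocabulary = {"the", "cat", "is", "hungry", "alive", "it", "dog"}
sentences = ["the cat is hungry", "it is alive"]
blocks = partition_vocabulary(vocabulary, sentences)

for signature, words in blocks.items():
    print(sorted(signature), sorted(words))
```

Changing the set of sentences changes the signatures and therefore the partition, which is the sense in which a different tessellation corresponds to a different partition of the same word set.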
