Some Tally Concepts

This guide outlines the idea behind Tally, the flow of elements, and the parameters available in Tally. It goes over the tallying schema and how distinction and relation work.

The Tallying Schema

This is the schema passed when initializing a cluster/table for a particular dataset. You may modify this schema as necessary. Queries are generated automatically based on the flags that are present.

By default, the total count and the sum of values are tracked for each attribute. Attributes flagged as distinct instead keep each individual value.

Keep in mind that the totals change every day. You may use our save state option to take snapshots of your aggregates over time, or you may read your set every day and keep track of state on your own.

[
    {
        "attrName": "City",
        "type": "string",
        "isDistinct": true
    },
    {
        "attrName": "Temperature",
        "type": "integer",
        "isDistinct": false
    },
    {
        "attrName": "Forecast",
        "type": "string",
        "isDistinct": true,
        "relate": ["City"]
    }
]
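
The default tracking described above (a count and a sum for attributes that are not flagged as distinct, such as `Temperature`) can be pictured with a small Python sketch. The `RunningTotal` class and the sample readings are hypothetical and only illustrate the idea; they are not part of Tally.

```python
from dataclasses import dataclass


@dataclass
class RunningTotal:
    """Hypothetical running aggregate for a non-distinct numeric attribute."""

    count: int = 0
    total: int = 0

    def add(self, value: int) -> None:
        # Only the aggregate is updated; the individual value is not kept.
        self.count += 1
        self.total += value


temperature = RunningTotal()
for reading in [61, 58, 64]:  # made-up sample readings
    temperature.add(reading)

print(temperature.count, temperature.total)
```

Because only the count and sum are held, the individual readings cannot be recovered later, which is the trade-off against flagging an attribute as distinct.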

Breaking it Down:

Attribute Name	Meaning
`attrName`	Identifies the "column" or attribute that is present in the schema
`type`	The type of the actual data value
`isDistinct`	The isDistinct flag lets us keep track of individual elements when necessary. If elements repeat, the number of times each one repeats is also kept. Read below for more details
`relate`	An array of attribute names for which to keep a running aggregate of the `count` of one attribute in relation to another. Useful for metrics such as the number of sunny days in Seattle
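
Before passing a schema along, it can help to sanity-check it locally. The following Python sketch is one way to do that. The `check_schema` helper is hypothetical, and the rules it enforces (the three keys being required, and every `relate` entry naming a declared attribute) are assumptions made for illustration rather than documented Tally behavior.

```python
import json

SCHEMA_JSON = """
[
    {"attrName": "City", "type": "string", "isDistinct": true},
    {"attrName": "Temperature", "type": "integer", "isDistinct": false},
    {"attrName": "Forecast", "type": "string", "isDistinct": true, "relate": ["City"]}
]
"""


def check_schema(schema):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    names = {attr.get("attrName") for attr in schema}
    for attr in schema:
        name = attr.get("attrName")
        # Assumption: every attribute carries these three keys.
        for key in ("attrName", "type", "isDistinct"):
            if key not in attr:
                problems.append(f"{name}: missing '{key}'")
        # Assumption: a relation must point at an attribute declared in the schema.
        for target in attr.get("relate", []):
            if target not in names:
                problems.append(f"{name}: relates to unknown attribute '{target}'")
    return problems


schema = json.loads(SCHEMA_JSON)
print(check_schema(schema))
```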

Note:

In the future, we will expand relate so that it can hold aggregate functions other than counting. For example, a relation might hold a set of integers or an average instead of a count, e.g. the average temperature in a relation over time.

How Distinction Works

When the isDistinct flag is used, we keep track of the values in an array like so:

[
    {
        "Value": "Seattle",
        "Count": 11
    },
    {
        "Value": "Dallas",
        "Count": 11
    }
]

Distinct values are indexed to minimize runtime and to keep track of each distinct value and its count.
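
As a rough illustration, the Value/Count array above can be reproduced in Python with a hash-based counter. The `tally_distinct` function is a hypothetical name, and using `collections.Counter` as the index is an assumption about one reasonable approach, not a description of how Tally is implemented.

```python
from collections import Counter


def tally_distinct(values):
    """Count how often each distinct value occurs, in first-seen order."""
    counts = Counter(values)  # dict-backed, so each lookup and update is O(1) on average
    return [{"Value": value, "Count": count} for value, count in counts.items()]


cities = ["Seattle"] * 11 + ["Dallas"] * 11  # made-up sample data
print(tally_distinct(cities))
```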

How Relation Works

Relating one attribute to another means that counts for the attribute are kept in reference to the other. Of course, the more relations you define, the higher the cost. Remember that data is not permanent: once data leaves your table, it can no longer be related to other data points. This is how a relation will be stored for your retrieval later:

[
    {
        "Value": "Sunny",
        "Related": {
            "Seattle": 3,
            "Dallas": 9
        }
    },
    {
        "Value": "Cloudy",
        "Related": {
            "Seattle": 8,
            "Dallas": 2
        }
    }
]

Relations are useful when comparing datasets.
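
To make the stored shape more concrete, here is a minimal Python sketch that builds a Value/Related structure from raw rows. The `tally_relation` function, the row layout, and the sample rows are all hypothetical; the sketch only mirrors the output format shown above and says nothing about how Tally computes it internally.

```python
from collections import defaultdict


def tally_relation(rows, attr, related_attr):
    """For each value of attr, count how often each value of related_attr occurs with it."""
    related = defaultdict(lambda: defaultdict(int))
    for row in rows:
        related[row[attr]][row[related_attr]] += 1
    return [
        {"Value": value, "Related": dict(counts)}
        for value, counts in related.items()
    ]


rows = [  # made-up sample rows
    {"City": "Seattle", "Forecast": "Cloudy"},
    {"City": "Dallas", "Forecast": "Sunny"},
    {"City": "Seattle", "Forecast": "Sunny"},
    {"City": "Dallas", "Forecast": "Sunny"},
]
print(tally_relation(rows, "Forecast", "City"))
```

Note that the counts can only be built while the rows are still available, which matches the caveat above about data leaving the table.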