Text 850, 221 lines
Written 2004-09-26 12:31:46 by Ellen K. (1:379/45)
Comment on text 849 by Gary Britt (1:379/45)
Subject: Re: need algorithm
==========================
From: Ellen K. <72322.enno.esspeeayem.1016@compuserve.com>
No formula. You construct the dimensions as part of creating the data
warehouse; the only important thing about the key is that it's numeric. So
you can just make the key column an "identity" column, which means the
database automatically assigns the numbers in numerical order as you
populate the dimension. What I did for all the dimensions except the
market basket (unique combinations of items on sales transactions) was to
programmatically create each dimension without a key, then drop the
results into the respective real dimension table, which assigned the key
values automatically. (The market basket one will be created from the
actual sales transactions as time goes by, because prepopulating every
possible combination would result in a row count higher than I think any
RDBMS can go -- the formula for the number of unique combinations of a set
of N items is 2^N, so 31 items already gets you to over 2 billion, whereas
the real-life number of rows can never exceed the number of sales
transactions.)
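That 2^N growth is easy to verify with a quick sketch (pure arithmetic,
nothing warehouse-specific; the function name is my own):

```python
def basket_count(n_items: int, include_empty: bool = False) -> int:
    """Number of unique combinations (subsets) of n_items distinct items."""
    count = 2 ** n_items
    return count if include_empty else count - 1

# 31 distinct items already yields over 2 billion possible baskets,
# which is why prepopulating the market-basket dimension is hopeless.
print(basket_count(31))  # 2147483647
```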
The data cube is not made up of spreadsheets, although that would be a
good way to think of it if the cube only had 3 dimensions. Then it would
be like a workbook of related spreadsheets, e.g. one row per item, one
column per time period, one spreadsheet per store, where all the rows and
columns were the same. In fact the Lotus workbooks were a
three-dimensional array: you could drill through the same cell on all the
pages. An Excel workbook, by contrast, is a collection of unrelated
two-dimensional arrays -- you can still drill through, but you have to do
it yourself, whereas in Lotus it was built in.
A cube under the hood is a multi-dimensional array, typically with many
more than 3 dimensions -- which most people (myself included) can't
visualize. (Which is why it is a Good Thing that there is a special
language for creating and dealing with cubes; trying to do it low-level
would probably result in screwing it up.) In fact I remember reading an
article one time by somebody who said that if you're debugging someone
else's code and it's anything other than a data warehouse, an array with
more than 3 dimensions probably means the person didn't understand what
they were doing! I had to take over something from another guy one time
where he had created a 5-dimensional array... I figured out what he
really meant was a 2-dimensional array with 5 slots in the second
dimension and made him redo it.
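A minimal sketch of that mixup, with made-up sizes: what distinguishes a
genuine 5-dimensional array from a 2-dimensional array that merely has 5
slots in its second dimension is nesting depth, not element count.

```python
def ndim(a) -> int:
    """Nesting depth of a uniformly nested list (its number of dimensions)."""
    depth = 0
    while isinstance(a, list):
        depth += 1
        a = a[0]
    return depth

# What he actually wrote: a structure nested five levels deep.
five_d = [[[[[0] * 2] * 2] * 2] * 2 for _ in range(10)]

# What the data really meant: one row per record, five fields per row.
two_d = [[0] * 5 for _ in range(10)]

print(ndim(five_d), ndim(two_d))  # 5 2
```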
On Sun, 26 Sep 2004 14:59:42 -0400, "Gary Britt" <garyb@nospamforme.com> wrote
in message <41570fb4@w3.nls.net>:
>OK thanks for the additional information. Do you use a formula of some kind
>to determine the key value for any particular row of info? (I understand
>that the formula or whatever would have to be designed with the particular
>kind of info in mind) Or is the key value basically a reference to a
>spreadsheet within the data cube?
>
>Thanks,
>
>Gary
>
>"Ellen K." <72322.enno.esspeeayem.1016@compuserve.com> wrote in message
>news:beocl05t5o3rj1d57rmja7ijo0a23o255o@4ax.com...
>> Let's stick with the time dimension. You are scrolling through the
>> month's sales and decide you want to see them by day of week, the
>> database already knows how to show you that without you having to
>> formulate a query to get there. Or you are looking at a report of
>> sales by store by month and November looks crummy and you want to know
>> why so you drill down to sales by week -- again, the database already
>> knows this, you don't need a separate query.
>>
>> How the numeric key value helps is twofold -- first of all that one key
>> value already "knows" the day, week, month, year, etc. Secondly,
>> numeric values are processed much faster than textual ones.
>>
>>
>> On Fri, 24 Sep 2004 20:13:04 -0400, "Gary Britt" <garyb@nospamforme.com>
>> wrote in message <4154b610@w3.nls.net>:
>>
>> >Glad you got it figured out.
>> >
>> >??
>> >> When the
>> >> transactional data are transformed the numeric key value for each
>> >> dimension applicable to each measure is substituted for the actual
>> >> value or values it represents. The point of all this is to make online
>> >> analysis really fast.
>> >??
>> >
>> >It seems like you are saying you are building tables that make
>> >consolidation of data in various ways a bit easier; like pulling up
>> >sales of a particular product by week, month, year to date, etc. Is
>> >that correct?
>> >
>> >I'm not clear on how the dimension value applicable to each measure,
>> >which is substituted for the actual value/values, helps in this
>> >process.
>> >
>> >Just curious,
>> >
>> >Gary
>> >
>> >
>> >"Ellen K." <72322.enno.esspeeayem.1016@compuserve.com> wrote in message
>> >news:tp19l0dt8vi73kfq3cj27s370gf5hku92k@4ax.com...
>> >> This is for a data warehouse, not a transactional database. The
>> >> structure is completely different.
>> >>
>> >> A data warehouse holds two kinds of information, measures and
>> >> dimensions. Measures are the information we want to know about, and
>> >> dimensions are what we want to know about it. So for example,
>> >> sales is a measure, product is a dimension.
>> >>
>> >> The dimension tables have one row per possible atomic level of
>> >> information. So a time dimension (the easiest to understand) has one
>> >> row per calendar date, and columns for month and year (date-month-year
>> >> would be the granularity hierarchy) and also day-of-week and maybe
>> >> week-of-year and maybe a "holiday" flag (non-hierarchical attributes).
>> >> Each row in the dimension has a numeric key.
>> >>
>> >> Every however-often (mine will be done nightly), data are extracted
>> >> from the transactional databases, transformed to the OLAP (online
>> >> analytical processing) format, and loaded into the data warehouse
>> >> tables. When the transactional data are transformed the numeric key
>> >> value for each dimension applicable to each measure is substituted
>> >> for the actual value or values it represents. The point of all this
>> >> is to make online analysis really fast.
>> >>
>> >> In the data warehouse database usually "cubes" are created. Under the
>> >> hood these are predefined multi-dimensional arrays. Then the front end
>> >> allows the user to automatically drill down and/or roll up
>> >> interactively. If you build the data warehouse in SQL Server, users
>> >> can point their Excel at a cube directly and automatically be able to
>> >> create their own pivot tables off it.
>> >>
>> >> On Fri, 24 Sep 2004 11:42:38 -0400, "Gary Britt"
>> >> <garyb@nospamforme.com> wrote in message <41543e68$1@w3.nls.net>:
>> >>
>> >> >Ellen, can you tell me the purpose of the table you are trying to
>> >> >build. Are you trying to just have a lookup table that validates if
>> >> >information entered is within an acceptable range? In other words
>> >> >can you describe what is the basic transaction for which this table
>> >> >you are constructing will be used?
>> >> >
>> >> >Thanks,
>> >> >
>> >> >Gary
>> >> >
>> >> >"Ellen K" <Ellen.K@harborwebs.com> wrote in message
>> >> >news:908654.36bcab@harborwebs.com...
>> >> >> I need a mathematical algorithm as follows:
>> >> >>
>> >> >> For the risk dimension of my data warehouse I need one row for
>> >> >> every possible combination of down payment amount, down payment
>> >> >> percentage, and term in months. I created a working table by
>> >> >> means of a Cartesian product. Then I deleted all the rows where
>> >> >> down payment was 0 but down payment percentage was not 0, and
>> >> >> also deleted all the rows where down payment percentage was 0 but
>> >> >> down payment dollars was not 0. I still have close to 13 million
>> >> >> rows and would just as soon reduce the size further by removing
>> >> >> rows representing other impossible combinations... so I need an
>> >> >> algorithm for defining the impossible combinations. For example,
>> >> >> if $5000 is the highest sales price for a single transaction,
>> >> >> then $5000 can't be, say, a 1% down payment. Note that there is
>> >> >> an additional complication in that down payment percentage is
>> >> >> calculated against the sales price, not against the total
>> >> >> transaction, which is comprised of sales price + shipping +
>> >> >> interest + sometimes sales tax, so that a $1000 sales price might
>> >> >> be an $1100 total transaction... in which case a $200 down
>> >> >> payment would be considered to be 20%, and if they pay cash,
>> >> >> their "down payment" is 110%.
>> >> >>
>> >> >> So far I think I see that "impossible" only applies to the higher
>> >> >> dollar down payments, IOW a $100 "down payment" could
>> >> >> theoretically be 100% of a $100 sale. It seems like the algorithm
>> >> >> must be some kind of percentage calculation of down payment
>> >> >> amount vs the maximum sales price... ?
>> >> >>
>> >> >> Any light anyone can shed will be greatly appreciated. :)
>> >> >
>> >>
>> >
>>
>
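P.S. For what it's worth, since percentage is defined against the sales
price, the "impossible combination" test from the original question
reduces to one inequality: a row's implied sales price is
down_dollars / (down_pct / 100), and any row whose implied price exceeds
the known maximum single-transaction price can be dropped. A sketch, with
a made-up maximum:

```python
MAX_SALES_PRICE = 5000  # illustrative; substitute the real maximum

def is_possible(down_dollars: float, down_pct: float) -> bool:
    """A (down payment, percentage) pair is possible only if the sales
    price it implies stays within the maximum single-transaction price.
    The percentage may exceed 100 (cash deals cover shipping, etc.)."""
    if down_pct == 0:
        return down_dollars == 0  # the pairwise-zero rule already applied
    implied_sales_price = down_dollars / (down_pct / 100.0)
    return implied_sales_price <= MAX_SALES_PRICE

print(is_possible(5000, 1))    # False: would imply a $500,000 sale
print(is_possible(100, 100))   # True: $100 down on a $100 sale
print(is_possible(1100, 110))  # True: the all-cash $1000-sale example
```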
--- BBBS/NT v4.01 Flag-5
* Origin: Barktopia BBS Site http://HarborWebs.com:8081 (1:379/45)