Posts
Running totals using windowed functions
[Category: Programming and Databases] [link] [Date: 2013-08-14 17:39:10]
Many modern databases support windowed aggregate functions. One of the most useful applications of these are computing running totals for either entire data sets or by a sub-grouping. On older database systems, one was reduced to while loops, cursors, CTE's, or perhaps very inefficient nested select queries to generate running totals. Thankfully, there are better options now. The below example is in PostgreSQL (supported 8.4+). A similar strategy works in Microsoft SQL but only in 2012+.
-- create data drop table if exists pricedata; create temp table pricedata (category text,item text,price float8); insert into pricedata (category,item,price) values ('Fishing','Tackle Box',20) ,('Camping','Sleeping Bag',50) ,('Baseball','Bat',40) ,('Fishing','Rod',100) ,('Fishing','Bait',5) ,('Camping','Stove',30); -- use windowed aggregate function with over () select category,item ,sum(price) over (partition by category order by item) totalprice from pricedata;
We get a logical result of:
category item totalprice Baseball Bat 40 Camping Sleeping Bag 50 Camping Stove 80 Fishing Bait 5 Fishing Rod 105 Fishing Tackle Box 125
It's fast, convenient, and pretty powerful.
After listening to the song "DEA" by The American Dollar on repeat for the last few weeks, I noticed that it would fit very well as music to one of my favorite montage videos of all time: a BBC Planet Earth montage promotional video that originally features "Hoppipolla" by Sigur Ros. I hadn't done any video editing in quite some time and enjoyed the opportunity to combine two of my favorite things. After extracting the clips and a few evenings of editing, I was satisfied with the result. I did, however, run out of clips near the end (I was mostly shortening them from their original length).
Using a recursive CTE to traverse a simple directed graph
[Category: Programming and Databases] [link] [Date: 2013-07-26 18:28:53]
Many modern databases support the use of Common Table Expressions (CTE). Numerous times these have saved me the effort of writing code outside of the database when attempting to do basic routing, tree walking, or graph traversal. I, personally, find the CTE syntax somewhat difficult to follow. There are numerous examples online already showing CTE usage, but I thought I would throw my hat into the ring with basic example showing an interaction with a simple directed graph in PostgreSQL.
Suppose there is a directed graph, like the one seen below, and we want to walk the path from the node with id=3 to the node where id=0. The representation of the simple directed graph as a table gives us each id and a parentid.
We write a CTE like below
-- create data drop table if exists relations; create temp table relations (id int4, parentid int4); insert into relations (id,parentid) values (0,-1), (1,0), (2,1), (3,2), (4,2), (5,0); -- walk graph with recursive link(id,parentid) as ( select id,parentid from relations where id=3 union all select r.id,r.parentid from relations r, link l where l.parentid=r.id and r.parentid not in (-1) ) select id,parentid from link;
When the above code sample is run, it returns
id parentid 3 2 2 1 1 0
So how does this wizardry work? Notice the union all statement within the recursive statement. The part before the union all is the "anchor". This defines the starting point for the CTE. Think of it like the first record (we start at id=3 with parentid=2). If this query returns more than one record, each record will be evaluated recursively, and it may be necessary to include another field to be able to separate the resulting records. The lower query is where the magic happens. The statement should call itself again.
To invoke the recursion, simply call the recursive statement. This is what the last select statement does. In this case "link" is the name of the recursive statement. Make sure that the recursive statement eventually returns no records, or else it will run forever. If you have a non-directed graph with loops, you will need to store visited locations to avoid such problems.