I've been pondering recently the idea of a library of statistical and heuristic functions that run inside MongoDB using the aggregation pipeline. After all, if we can avoid pulling data out of the database, that should help performance. As a little experiment, here is the correlation coefficient of two fields using Pearson's rho. It's broken down into individual variables to make it easier to read than one huge piece of JavaScript. That's usually the best way to write pipelines.
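For reference, the pipeline that follows appears to evaluate the usual computational form of Pearson's correlation coefficient, built from running sums. The symbols here are my own shorthand rather than anything in the code: $n$ corresponds to `count`, $\sum x^2$ to `sumxsquared`, and $(\sum x)^2$ to `sumx_squared`.

$$
\rho = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - \left(\sum x\right)^2\right)\left(n\sum y^2 - \left(\sum y\right)^2\right)}}
$$

The numerator is `partone`, the two bracketed factors are `subparttwo` and `subpartthree`, and the whole denominator is `parttwo`.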

**// Pearson's rho as a pipeline**

**testdata = [{x:1,y:2},**
**{x:2,y:3},**
**{x:3,y:6},**
**{x:4,y:8}];**
**db = db.getSiblingDB("stats");**
**db.pearsons.drop();**
**db.pearsons.insertMany(testdata);**

**x = "$x";**
**y = "$y";**

**// This is a pipeline stage**
**sumcolumns = { $group : { _id: true,**
**count: { $sum: 1 },**
**sumx : { $sum : x },**
**sumy : { $sum : y },**
**sumxsquared : { $sum : { $multiply : [x,x] } },**
**sumysquared : { $sum : { $multiply : [y,y] } },**
**sumxy : { $sum : { $multiply : [x,y] } }**
**}};**

**// This is building a pipeline stage from objects**
**multiply_sumx_sumy = { $multiply : ["$sumx","$sumy"] };**
**multiply_sumxy_count = { $multiply : ["$sumxy","$count"] };**
**partone = { $subtract : [ multiply_sumxy_count, multiply_sumx_sumy ] };**

**multiply_sumxsquared_count = { $multiply : ["$sumxsquared","$count"] };**
**sumx_squared = { $multiply : ["$sumx","$sumx"] };**
**subparttwo = { $subtract : [ multiply_sumxsquared_count, sumx_squared ] };**

**multiply_sumysquared_count = { $multiply : ["$sumysquared","$count"] };**
**sumy_squared = { $multiply : ["$sumy","$sumy"] };**
**subpartthree = { $subtract : [ multiply_sumysquared_count, sumy_squared ] };**

**parttwo = { $sqrt : {$multiply : [ subparttwo,subpartthree ]}}**

**// Glue it all together**
**rho = { $project : { rho: { $divide : [partone,parttwo] } } };**

**pipeline = [sumcolumns,rho];**
**db.pearsons.aggregate(pipeline);**
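
As a sanity check on the arithmetic, here is a minimal sketch of the same calculation in plain JavaScript, with no MongoDB involved. The function name `pearsonRho` is my own invention for illustration; the variable names inside it mirror the pipeline's so the two can be compared step by step.

```javascript
// Plain JavaScript cross-check of the pipeline's arithmetic.
// pearsonRho is a hypothetical helper, not part of any MongoDB API.
function pearsonRho(docs) {
  // Accumulate the same sums that the $group stage collects.
  const s = { count: 0, sumx: 0, sumy: 0, sumxsquared: 0, sumysquared: 0, sumxy: 0 };
  for (const { x, y } of docs) {
    s.count += 1;
    s.sumx += x;
    s.sumy += y;
    s.sumxsquared += x * x;
    s.sumysquared += y * y;
    s.sumxy += x * y;
  }

  // Numerator: count * sum(xy) - sum(x) * sum(y)
  const partone = s.sumxy * s.count - s.sumx * s.sumy;

  // Denominator factors: count * sum(x^2) - (sum x)^2, and the same for y
  const subparttwo = s.sumxsquared * s.count - s.sumx * s.sumx;
  const subpartthree = s.sumysquared * s.count - s.sumy * s.sumy;
  const parttwo = Math.sqrt(subparttwo * subpartthree);

  // Like the pipeline, this does not guard against a zero denominator
  // (for example, when every x or every y is identical).
  return partone / parttwo;
}

const testdata = [{ x: 1, y: 2 }, { x: 2, y: 3 }, { x: 3, y: 6 }, { x: 4, y: 8 }];
console.log(pearsonRho(testdata)); // roughly 0.98 for this data
```

If the pipeline is doing what it should, the `rho` field it returns for the same four documents should agree with this value, up to floating-point rounding.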