
Why is my mongo query so slow?

Why's my MongoDB query so slow? I got my geospatial collection set up -- I'm running some really great queries, making sure the locations I'm pulling aren't sitting in any sort of cache, and I'm blown away by how fast the data comes back.

The problem is: when I query the collection to pull up the requisite lon/lat data by name -- city & state, or city & country -- the query seems to take seconds to complete!

I set up the collection correctly... I indexed the crap out of all my columns... A week or two ago I was at MongoSV 2011 in Santa Clara and learned some really cool stuff about queries, indexing, and performance management, so let's dig out the notes and see where I went wrong. Because I strongly doubt that the problem is in Mongo; rather, as we used to say in technical support, this is a PEBKAC issue...

The first thing I want to do is run an explain against my query so I can see Mongo's query plan. This should give me a starting point for figuring out what went wrong.

> db.geodata_geo.find({ cityName : "Anniston", stateName : "Alabama" }).explain();

By adding the trailing .explain(), I'm asking MongoDB to return the query plan and execution stats rather than the normal result set. I hit enter to launch the explain() and get back the following output:

> db.geodata_geo.find({ cityName : "Anniston", stateName : "Alabama" }).explain();
{
    "cursor" : "BasicCursor",
    "nscanned" : 3691723,
    "nscannedObjects" : 3691723,
    "n" : 1,
    "millis" : 2269,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {

    }
}

The important fields in the output above are cursor, nscanned, nscannedObjects, millis, and indexOnly. What this output is telling me is that I'm using a "BasicCursor" for my search cursor -- which indicates that, yes, I am doing a table scan on the collection. So, already I know my query is not optimal. But, wait! More good news...

The value for nscanned and nscannedObjects is the same: 3,691,723 -- which, not coincidentally, is exactly the cardinality of the collection. This is the number of documents scanned to satisfy the query which, given its value, confirms that I am doing a full table scan.
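(A quick sanity check, if you want to confirm that the scanned count really does line up with the whole collection:)

> db.geodata_geo.count();
3691723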

millis tells me the number of milliseconds the query took: 2,269, or 2.269 seconds -- way too slow for my back-end methods serving a REST API. Unacceptable.

And then we get to the tell: indexOnly tells me whether the query could have been resolved entirely by an (existing) covering index. Seeing the value false here -- on top of that BasicCursor -- tells me that the collection has no index that helps with the columns I am searching against.
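(For what it's worth, indexOnly only flips to true for a covered query -- one where both the filter and the returned fields live entirely in an index. Once a cityName index exists, a hypothetical projection-only query like this one could be answered without touching the documents at all:)

> db.geodata_geo.find({ cityName : "Anniston" }, { cityName : 1, _id : 0 }).explain();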

What?!?  I know I indexed this collection...

So, I run db.geodata_geo.getIndexes() to dump my indexes and ... I ... don't see my name columns indexed. Oh, I remembered to index the ID and Code columns...but somehow, indexing the Name columns completely slipped past my lower brain-pan.
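For reference, that dump looks roughly like the following -- the database name and the ID/Code index names below are placeholders, not the literal output; the point is that nothing covering cityName or stateName shows up in the list:

> db.geodata_geo.getIndexes();
[
    { "v" : 1, "key" : { "_id" : 1 }, "ns" : "geo.geodata_geo", "name" : "_id_" },
    { "v" : 1, "key" : { "geonameId" : 1 }, "ns" : "geo.geodata_geo", "name" : "geonameId_1" },
    { "v" : 1, "key" : { "countryCode" : 1 }, "ns" : "geo.geodata_geo", "name" : "countryCode_1" }
]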

I add these indexes to my collection:

> db.geodata_geo.ensureIndex({ cityName : 1 });
> db.geodata_geo.ensureIndex({ stateName : 1 });

And then I rerun the explain and see the following output:

> db.geodata_geo.find({ cityName : "Anniston", stateName : "Alabama" }).explain();
{
    "cursor" : "BtreeCursor cityName_1",
    "nscanned" : 2,
    "nscannedObjects" : 2,
    "n" : 1,
    "millis" : 101,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
        "cityName" : [
            [
                "Anniston",
                "Anniston"
            ]
        ]
    }
}

Instead of BasicCursor, I see BtreeCursor cityName_1, which gives me a happy. I also see that the nscanned and nscannedObjects values are now far more realistic... seriously: 2 is a LOT better than 3.6-million-something, right? Another happy for me!

I score the third happy when I see that the millis has dropped down to 101:  0.101 seconds to execute this search/query!  Not jaw-dropping, I agree -- but acceptable considering that everything is running off my laptop...I know production times will be much, much lower.
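One thing worth noting (not something I ran above, just a sketch): the plan only uses the cityName_1 index and then filters the handful of matches on stateName afterward. If this query pattern stays hot, a compound index on both fields would let a single index satisfy the whole predicate:

> db.geodata_geo.ensureIndex({ cityName : 1, stateName : 1 });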

 

In the end, I learned that a simple tool like .explain() can tell me where my attention is needed when it comes to optimizing and fixing even simple, seemingly innocent queries. Knowing what you're looking at, and what you're looking for, is pretty much the thick end of the baseball bat when it comes to crushing one out of the park.

I hope this helps!

 

Reference Link:  Explain

O'Reilly...and my timestamp index problem...

Just a quick jot - I'm headed to the O'Reilly MySQL conference, April 12-14 in Santa Clara, for three joyous days filled with database geekiness. There's a lot of diversity in my schedule for the conference - I'm focusing mainly on performance tuning for MySQL, cloud deployment, replication and sharding, and a shallow dip into NoSQL strategies.

Here's the current problem I'm working on:

For any given table (I don't believe the DDL matters), say you have a timestamp field which is indexed and referenced in a query. The table is InnoDB, so the index on the timestamp field can only be a B-tree index, which means the index is stored in ascending order.

In your query, the conditional reads something to the effect of WHERE timestamp_col > NOW() - 5 minutes (it's pseudo-code, ok?) -- in other words, only rows from the last five minutes.
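For concreteness, here's roughly the shape of what I'm describing -- the table and column names are invented for illustration:

-- illustrative only: a hypothetical InnoDB table with a plain B-tree index on the timestamp column
ALTER TABLE events ADD INDEX idx_created_at (created_at);

-- the query: only rows from the last five minutes should need to be examined
SELECT *
  FROM events
 WHERE created_at > NOW() - INTERVAL 5 MINUTE;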

What I expect to happen is that the query calculates the boundary value for the conditional and then does a range scan on the B-tree index, examining only the index entries (rows) whose timestamp exceeds that calculated value.

What I'm seeing (in the slow-query log, among other places) is that the query executes as an index scan, but it scans all the rows of the index, essentially doing a full table scan.

The query is built and stored in a variable that gets passed to mysql_query() -- so the parser is determining the NOW() - 5 minutes value and passing it off as a string constant...

If anyone has any suggestions, I'd be most happy to hear them. This lil' kitteh turd has filled my slow-query litter box....