mongodb

mongod refuses to start - no errors in log file

"What I was seeing was that, when I started the demon, I would get a success message (mongod start/running, process xxxxxx) but there would be no mongod process running.  Additionally, even though I have my (development) logging set to the highest levels, not one error message appeared in the log to provide a hint as to why the service failed to start."

Can't Connect to mongo 28017 from remote host...

Admittedly, it's been a long, long time since I've had to do a fresh install of mongodb. I am in the process of setting up a couple of mongo servers behind my firewall to use for cluster testing. Ancient PCs, AMD Athlon class; one even has a floppy drive installed. Anyway, once I had the OS installed (Ubuntu 12.10 server) and all the various packages, including mongodb, added to the system, I wanted to access mongodb from another machine on my network, but for the life of me I couldn't connect to the default HTTP console port of 28017.

Oh, I could connect from localhost using wget without a problem.

netstat -a | grep -i listen

Showed port *:28017 in listen mode so no problem there...

I even added the port via iptables to the firewall rules:

iptables -A INPUT -p tcp -d 0/0 -s 0/0 --dport 28017 -j ACCEPT

But I still couldn't connect.

I started to browse the /etc/mongodb.conf file looking for a configuration setting that might be preventing me from accessing remotely, and there it was:

bind_ip = 127.0.0.1

Since I don't have concerns about security on my private network, I commented out this line and restarted mongo services.

(Side note: you don't want to do this on a production server. bind_ip takes a comma-separated list of the server's own interface addresses and controls where mongod listens -- it does not whitelist remote hosts -- so on production, bind only the interfaces you intend to expose and use firewall rules to control which remote hosts may connect.)
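For example, here's a sketch of what the production variant might look like in /etc/mongodb.conf (the 192.168.1.50 address is a made-up placeholder for your server's actual LAN interface):

bind_ip = 127.0.0.1,192.168.1.50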

Worked!  Full access from within my network to mongod!

Hope this helps!

Quick Update...

Sorry this blog has been inactive for so long, but I've been really, really busy with work and with my move to Puerto Nuevo, Mexico, in northern Baja California. I am thinking about putting together a series of posts that detail how to set up a data-processing stack, in PHP, for mongodb that allows you to dynamically generate all CRUD queries via the class stack.

The front-end interface to this stack is through RabbitMQ -- also written in PHP -- which eliminates Apache from the LAMP stack and no longer requires a REST interface for transferring data requests to and from the store.

The stack includes services such as auditing, registration for public-facing requests, memcached and membase support, error-logging, and internal checks on requests that prevent things like query generation that would result in full-table scans, or any searches on un-indexed columns, within either mongodb or mysql. (I think I still remember how to code for mysql... :) )

Anyway, this project has been all-consuming for me for the past year, and the concept of generalizing the stack for instructional purposes has been rattling around in my can, looking for a way out, for quite some time. It's not like there's a plethora of PHP-based RabbitMQ tutorials out there, either.

So, that's the happs.  Now that things are settling down a bit, I'll try to get more information out.

Thank you for checking-in!

Renaming mongodb Columns

Today I was putzing around in the geo-spatial collection when I noticed that I had an unhappy over one of the column names within the collection.

In the mySQL world, changing a column name is pretty straightforward courtesy of the ALTER TABLE command.
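(Something along these lines -- the table name and column type here are illustrative, not from a real schema:)

ALTER TABLE geodata CHANGE CountryID countryID INT;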

Mongo...not so much...

<BEGIN_UNRELATED_SIDE_RANT>

The Mongo documentation is normally the first place most of us go when we're looking for help in using our favorite noSQL database.

Why?

Well...because that's usually where Google directs us to go and also because there just isn't a whole lot of documentation out there on the subject to begin with.

The mongo (10gen) documentation is pretty good.  It's not, however, excellent.  And I can articulate the reason why.

It's pretty easy to identify documentation written by engineers as opposed to documentation written by everyone else (on the planet).  And not because of technical content or the (ab)use of really big and impressive-sounding jargon.

No - it's because most engineering-authored documents are written using a solution-based voice instead of a problem-based voice.

Think about it: when I have to go to the man-page for help, it's because I have a problem. If I had a solution, I would be writing a blog post. But since I have a problem, I need the man-pages, online docs, whatever, to help me figure out a solution.

Engineering documents are written from a solution perspective: the document assumes you possess some bit of arcane lore (which is probably exactly the bit you're missing, and the very thing that caused your trip to the documentation vault), and everything explained within the document hinges on this piece of knowledge that the author, albeit with the finest of intentions, assumes is already firmly in your mental possession.

And that's why I usually don't like 10gen's documentation.  But, like I said earlier, it's the only game in (Google)town.

<END_UNRELATED_SIDE_RANT>

In mongo, to change the name of a column within a collection, you first have to be on a release of mongodb 1.7.2 or later. Since most of us bleeding-edge, early-adopter types are all on 2.x versions, this shouldn't be an issue.

This page from 10gen is the update page and, within it, talks about the $rename modifier to the update command. What the section doesn't say, because it assumes you want to update records and not schema, is how to apply a change to all of the records in your collection.

In my case, I have a column name which I fat-fingered right out of its camel-case: CountryID instead of countryID. (And, yes, OCD-peeps, I know that it's not strictly camelCase, thank-you!) I want to spin through all 3.7 million rows in my collection and rename this column...

[codesyntax lang="javascript" lines="no"]


> db.geodata_geo.update( {} , { $rename : { 'CountryID' : 'countryID' }}, false, true );

[/codesyntax]

So what we have here is the update command to the collection (geodata_geo) and four parameters:

  1. {} -- the empty query document (this is what's missing from the 10gen doc): it matches every record in the collection
  2. { $rename : { 'CountryID' : 'countryID' } } -- the modifier to the update command which, in this case, replaces 'CountryID' with 'countryID'
  3. false -- the upsert option: do not insert a new record if no match is found (we only want to rename existing records)
  4. true -- the multi option: apply the command to all matching records since, by default, update() quits after updating the first record

And I run this command and mongo goes off (whirr...whirr... I have two-node replication...) and renames the column in my collection!
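(A quick sanity check -- my own suggestion here, not part of the original run -- to confirm the rename took across all documents:)

[codesyntax lang="javascript" lines="no"]

> db.geodata_geo.count({ CountryID : { $exists : true } });  // expect 0 -- no documents still carry the old name
> db.geodata_geo.count({ countryID : { $exists : true } });  // expect the full 3.7-million document count

[/codesyntax]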

What it didn't do was update my index. 

So, after my column-renaming completed, I needed to drop the index(es) that had 'CountryID' as members and re-index the collection to reflect the new column name.
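Here's roughly what that clean-up looked like (assuming the stale index was a simple ascending index on the old column name -- adjust to match your actual index definitions):

[codesyntax lang="javascript" lines="no"]

> db.geodata_geo.dropIndex({ CountryID : 1 });    // drop the index built against the old column name
> db.geodata_geo.ensureIndex({ countryID : 1 });  // rebuild it against the renamed column

[/codesyntax]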

Executing getIndexes() confirmed that my mongo world was back in its correct orbit and life, once again, was good.

Why is my mongo query so slow?

Why's my mongodb query so slow? I got my geospatial collection set up -- I am running some really great queries, making sure that the locations I am pulling aren't in any sort of cache, and I am just blown away by how fast data is being returned.

The problem is:  when I query the collection to pull up the requisite lon/lat data by name:  city & state, or city & country, the query seems to take seconds to complete!

I set up the table correctly...I indexed the crap out of all my columns...a week or two ago, I was at mongoSV 2011 in Santa Clara and learned some really cool stuff about queries, indexing, and performance management, so let's dig out the notes and see where I went wrong. Because I strongly doubt that the problem is in mongo; rather, as we used to say in technical support, this is a PEBKAC issue...

The first thing I want to do is run an explain against my query so I can see mongo's query plan for it. This should provide me with a starting point for figuring out what went wrong.

> db.geodata_geo.find({ cityName : "Anniston", stateName : "Alabama" }).explain();

By adding the trailing function .explain(), I'm requesting that mongoDB return the query plan instead of the matching documents. I hit enter to launch the explain() and get back the following output:

> db.geodata_geo.find({ cityName : "Anniston", stateName : "Alabama" }).explain();
{
    "cursor" : "BasicCursor",
    "nscanned" : 3691723,
    "nscannedObjects" : 3691723,
    "n" : 1,
    "millis" : 2269,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {

    }
}

The important fields in the query output (above) are cursor, nscanned, nscannedObjects, millis, and indexOnly. What this output is telling me is that I'm using a "BasicCursor" for my search cursor -- which indicates that, yes, I am doing a table-scan on the collection. So, already I know my query is not optimal. But, wait! More good news...

The values for nscanned and nscannedObjects are the same: 3,691,723 -- which, not coincidentally, is also the cardinality of the collection. This number is the count of documents scanned to satisfy the query and, given its value, confirms that I am doing a full table scan.

millis tells me the number of milliseconds the query took: 2.269 seconds -- way too slow for my back-end methods() serving a REST API. Unacceptable.

And then we get to the tell: indexOnly tells me whether the query could have been resolved by an (existing) covering index. Seeing the value false here, on top of the BasicCursor, tells me that the query isn't touching an index on the columns I am scanning against.

What?!?  I know I indexed this collection...

So, I run db.geodata_geo.getIndexes() to dump my indexes and ... I ... don't see my name columns indexed. Oh, I remembered to index the ID and Code columns...but somehow, indexing the Name columns completely slipped past my lower brain-pan.

I add these indexes to my collection:

> db.geodata_geo.ensureIndex({ cityName : 1 });
> db.geodata_geo.ensureIndex({ stateName : 1 });

And then I rerun the query plan and see the following output:

> db.geodata_geo.find({ cityName : "Anniston", stateName : "Alabama" }).explain();
{
    "cursor" : "BtreeCursor cityName_1",
    "nscanned" : 2,
    "nscannedObjects" : 2,
    "n" : 1,
    "millis" : 101,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
        "cityName" : [
            [
                "Anniston",
                "Anniston"
            ]
        ]
    }
}

Instead of BasicCursor, I see BtreeCursor, which gives me a happy. I also see that the nscanned and nscannedObjects values are now far more realistic...seriously: 2 is a LOT better than 3.6-million-something, right? Another happy for me!

I score the third happy when I see that millis has dropped to 101: 0.101 seconds to execute this search/query! Not jaw-dropping, I agree -- but acceptable considering that everything is running off my laptop...I know production times will be much, much lower.
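(One more tweak worth mentioning -- an aside I didn't actually run above: since the query filters on both cityName and stateName, a single compound index would let mongo resolve the whole predicate from one index, instead of settling for cityName_1 alone as the explain output shows it did.)

[codesyntax lang="javascript" lines="no"]

> db.geodata_geo.ensureIndex({ cityName : 1, stateName : 1 });  // one compound index serving both predicates

[/codesyntax]

And if you then restrict the projection to just the indexed fields (excluding _id), explain() should even report indexOnly : true -- a fully covered query.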


In the end, I learned that a simple tool like .explain() can tell me where my attention is needed when it comes to optimizing and fixing even simple, seemingly innocent queries. Knowing what you're looking at, and what you're looking for, is pretty much the thick end of the baseball bat when it comes to crushing one out of the park.

I hope this helps!


Reference Link:  Explain