Skip to content

Instantly share code, notes, and snippets.

@JoshReedSchramm
Last active December 26, 2015 11:39
Show Gist options
  • Save JoshReedSchramm/7145255 to your computer and use it in GitHub Desktop.
Save JoshReedSchramm/7145255 to your computer and use it in GitHub Desktop.
Mongo index issue I can't figure out
I have a pretty complicated query. It looks like this -
db.articles.find({"online_product_id": ObjectId("4e8910b27fd0dc66ef000006"), "draft": { "$ne":true }, "approval_status":"approved", "publish_time": { "$lte":ISODate("2013-10-24T16:45:35.000Z")}, "$or": [{"section":{"$in":["weekender"]}}, {"labels.identifier":{"$in":["weekender", "news local"]}}]}).sort({"publish_time":-1,"rank":1}).hint("article_lookup")
running that on my 2million record articles collection takes about 2 seconds. I added the following index:
{
"v" : 1,
"key" : {
"online_product_id" : 1,
"section" : 1,
"labels.identifier" : 1,
"draft" : 1,
"approval_status" : 1,
"publish_time" : -1
},
"ns" : "articlesite_development.articles",
"name" : "article_lookup"
}
even with that index it's pretty slow but if i get rid of the sort and run this instead:
db.articles.find({"online_product_id": ObjectId("4e8910b27fd0dc66ef000006"), "draft": { "$ne":true }, "approval_status":"approved", "publish_time": { "$lte":ISODate("2013-10-24T16:45:35.000Z")}, "$or": [{"section":{"$in":["weekender"]}}, {"labels.identifier":{"$in":["weekender", "news local"]}}]}).hint("article_lookup")
it's really fast < 300ms.
If i get rid of the second clause of the $OR but leave the sort and instead run this:
db.articles.find({"online_product_id": ObjectId("4e8910b27fd0dc66ef000006"), "draft": { "$ne":true }, "approval_status":"approved", "publish_time": { "$lte":ISODate("2013-10-24T16:45:35.000Z")}, "$or": [{"section":{"$in":["weekender"]}}]}).hint("article_lookup")
It's also really fast. Same results if i switch which clause of the or i include and put back the labels.identifier but eliminate the section.
So 2 questions.
1) Why does the sort break my index?
2) If you noticed I'm explicitley using hint() in the queries. That's because without it mongo doesn't use the right index for the query. Any idea why it's not picking the right index?
Thanks to anyone who sees this!
EDIT: Posting explains for the slow query and one of the quicker ones.
EXPLAIN FOR - db.articles.find({"online_product_id": ObjectId("4e8910b27fd0dc66ef000006"), "draft": { "$ne":true }, "approval_status":"approved", "publish_time": { "$lte":ISODate("2013-10-24T16:45:35.000Z")}, "$or": [{"section":{"$in":["weekender"]}}, {"labels.identifier":{"$in":["weekender", "news local"]}}]}).sort({"publish_time":-1,"rank":1}).hint("article_lookup")
{
"cursor" : "BtreeCursor article_lookup multi",
"isMultiKey" : true,
"n" : 254,
"nscannedObjects" : 195182,
"nscanned" : 195188,
"nscannedObjectsAllPlans" : 195182,
"nscannedAllPlans" : 195188,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 1682,
"indexBounds" : {
"online_product_id" : [
[
ObjectId("4e8910b27fd0dc66ef000006"),
ObjectId("4e8910b27fd0dc66ef000006")
]
],
"section" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"labels.identifier" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"draft" : [
[
{
"$minElement" : 1
},
true
],
[
true,
{
"$maxElement" : 1
}
]
],
"approval_status" : [
[
"approved",
"approved"
]
],
"publish_time" : [
[
ISODate("2013-10-24T16:45:35Z"),
true
]
]
},
"server" : "Joshs-MacBook-Pro.local:27017"
}
EXPLAIN FOR - db.articles.find({"online_product_id": ObjectId("4e8910b27fd0dc66ef000006"), "draft": { "$ne":true }, "approval_status":"approved", "publish_time": { "$lte":ISODate("2013-10-24T16:45:35.000Z")}, "$or": [{"section":{"$in":["weekender"]}}]}).hint("article_lookup")
{
"clauses" : [
{
"cursor" : "BtreeCursor article_lookup multi",
"isMultiKey" : true,
"n" : 251,
"nscannedObjects" : 265,
"nscanned" : 265,
"nscannedObjectsAllPlans" : 265,
"nscannedAllPlans" : 265,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 3,
"indexBounds" : {
"online_product_id" : [
[
ObjectId("4e8910b27fd0dc66ef000006"),
ObjectId("4e8910b27fd0dc66ef000006")
]
],
"section" : [
[
"weekender",
"weekender"
]
],
"labels.identifier" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"draft" : [
[
{
"$minElement" : 1
},
true
],
[
true,
{
"$maxElement" : 1
}
]
],
"approval_status" : [
[
"approved",
"approved"
]
],
"publish_time" : [
[
ISODate("2013-10-24T16:45:35Z"),
true
]
]
}
},
{
"cursor" : "BtreeCursor article_lookup multi",
"isMultiKey" : true,
"n" : 3,
"nscannedObjects" : 4,
"nscanned" : 50,
"nscannedObjectsAllPlans" : 4,
"nscannedAllPlans" : 50,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"online_product_id" : [
[
ObjectId("4e8910b27fd0dc66ef000006"),
ObjectId("4e8910b27fd0dc66ef000006")
]
],
"section" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"labels.identifier" : [
[
"news local",
"news local"
],
[
"weekender",
"weekender"
]
],
"draft" : [
[
{
"$minElement" : 1
},
true
],
[
true,
{
"$maxElement" : 1
}
]
],
"approval_status" : [
[
"approved",
"approved"
]
],
"publish_time" : [
[
ISODate("2013-10-24T16:45:35Z"),
true
]
]
}
}
],
"n" : 254,
"nscannedObjects" : 269,
"nscanned" : 315,
"nscannedObjectsAllPlans" : 269,
"nscannedAllPlans" : 315,
"millis" : 4,
"server" : "Joshs-MacBook-Pro.local:27017"
}
I also forgot that I tried it with one or clause removed and the sort added back in, that was fast.
db.articles.find({"online_product_id": ObjectId("4e8910b27fd0dc66ef000006"), "draft": { "$ne":true }, "approval_status":"approved", "publish_time": { "$lte":ISODate("2013-10-24T16:45:35.000Z")}, "$or": [{"section":{"$in":["weekender"]}}]}).sort({"publish_time":-1,"rank":1}).hint("article_lookup").explain()
{
"cursor" : "BtreeCursor article_lookup multi",
"isMultiKey" : true,
"n" : 251,
"nscannedObjects" : 265,
"nscanned" : 265,
"nscannedObjectsAllPlans" : 265,
"nscannedAllPlans" : 265,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 7,
"indexBounds" : {
"online_product_id" : [
[
ObjectId("4e8910b27fd0dc66ef000006"),
ObjectId("4e8910b27fd0dc66ef000006")
]
],
"section" : [
[
"weekender",
"weekender"
]
],
"labels.identifier" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"draft" : [
[
{
"$minElement" : 1
},
true
],
[
true,
{
"$maxElement" : 1
}
]
],
"approval_status" : [
[
"approved",
"approved"
]
],
"publish_time" : [
[
ISODate("2013-10-24T16:45:35Z"),
true
]
]
},
"server" : "Joshs-MacBook-Pro.local:27017"
}
@esedor
Copy link

esedor commented Oct 24, 2013

Hi Josh,

The explain plans for all of these options should confirm that the index bounds aren't being properly applied when you use the $or and the $sort together. You will see more uses of $minElement,$maxElement in the indexBounds section than you will for either of the faster versions. This is probably due to https://jira.mongodb.org/browse/SERVER-1205.

Chances are it avoids the "right" index without the hint because when the query optimizer runs its race to determine which index to use, the "right" index is being affected by SERVER-1205 and just doesn't return as fast as other options, but I'd need to see the explains to be sure.

Sincerely,
Eric@MongoLab

@JoshReedSchramm
Copy link
Author

Hey Eric,

Thanks. I saw that bug in jira and suspected that may be it but was hoping I could find a work around.

Do you know if it's possible to union to results or something to achieve a similar situation? My main problem is that OR clause reflects a migration is strategy. So old data has the section field set, new records are using a label system so either I need to migrate the old records which could take a little while to run, or i need to have the backward compatibility in the query.

If there's no union or anything I'll just have to figure out a migration strategy, was trying to avoid it though.

@esedor
Copy link

esedor commented Oct 24, 2013

Thanks for the additional info, yes, it is definitely the case that indexBounds are no good for the first query, and easy to see--from the differences in the explained indexBounds between the three queries--why the index would not be chosen without a hint.

As far as workarounds, I am sad to say I don't have a clever out handy. I assume all my suggestions there would be variants of what you're already thinking about. At a high level, for the query to be performant:

Either the sort must be removed (or performed client-side) and you should use two indexes ($ors can use more than one):

{
"online_product_id" : 1,
"section" : 1,
"draft" : 1,
"approval_status" : 1,
"publish_time" : -1
}
+
{
"online_product_id" : 1,
"labels.identifier" : 1,
"draft" : 1,
"approval_status" : 1,
"publish_time" : -1
}

Or the $or must be removed, either by completing the migration you mention or by performing some other trick of data.

@JoshReedSchramm
Copy link
Author

bummer. thanks for the help though. i'll try looking at 2 indexes and if that doesn't work i'll investigate a migration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment