[Repost] MongoDB indexes

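Source: http://blog.xuite.net/flyingidea/blog/68050501
MongoDB lets you build indexes to speed up later searches. When a query searches on an indexed field, MongoDB does not scan the documents themselves; it looks the values up in the B-tree index that has already been built.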

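Following from that definition, MongoDB does not need to read the documents at all (a covered query, served by a covered index) when both of the following conditions hold: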

1. The query filters only on fields that are in the index.

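2. The query returns only fields that are in that same index (so _id has to be excluded from the result unless it is part of the index).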
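The two conditions can be modelled in a few lines of plain JavaScript. This is only a sketch of the rule, not MongoDB's query planner: isCoveredQuery and the field names are hypothetical, chosen for illustration.

```javascript
// Hypothetical helper (not a MongoDB API): models the two covered-query rules.
// indexFields:    fields contained in the index
// filterFields:   fields the query filters on
// returnedFields: fields the query asks to get back
function isCoveredQuery(indexFields, filterFields, returnedFields) {
  const indexed = new Set(indexFields);
  // Rule 1: every filter field must be in the index.
  const filterOk = filterFields.every((f) => indexed.has(f));
  // Rule 2: every returned field must be in the same index.
  const projectionOk = returnedFields.every((f) => indexed.has(f));
  return filterOk && projectionOk;
}

const index = ["type", "item"];
console.log(isCoveredQuery(index, ["type"], ["item"]));        // true
console.log(isCoveredQuery(index, ["type"], ["item", "_id"])); // false: _id is not in the index
console.log(isCoveredQuery(index, ["price"], ["item"]));       // false: filter on an unindexed field
```

In the shell, the second case is why a covered query normally has to exclude _id explicitly in its projection.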

An index entry in the tree is updated in the following cases:

1. The indexed value changes.

2. The document (record) that the index entry points to has to be moved to newly allocated space because of an update.

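Note that the fully qualified name of an index (database name, collection name, and index name together) cannot exceed 128 characters.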

By default the _id field is indexed from the start, and _id also serves as the primary key. If a document is inserted without an _id value, the system assigns it a 12-byte value of type ObjectId.

Creating secondary indexes: a tour of the index types

Besides the primary key, we can define our own secondary indexes with ensureIndex() (deprecated in favor of createIndex() since MongoDB 3.0). For example:

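db.collection.ensureIndex({ "field": 1 })

The 1 in "field": 1 means ascending order; "field": -1 stores the keys in descending order. The direction can affect performance when results have to be sorted. This matters mainly for compound indexes, since a single-field index can be traversed in either direction.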

db.collection.ensureIndex({ "product.quantity": 1 })

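This is called an index on embedded fields. One thing worth mentioning: although the index can be created, my tests with explain() showed that queries on it could not be covered by the index. I am not sure whether that is because my MongoDB is version 2.0.4.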

db.collection.ensureIndex({ "product": 1, "quantity": 1 })

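This is called a compound index. Note that a compound index cannot include a hashed index field. It can also contain at most one multikey field (introduced below); in this example, at most one of product and quantity may be an array.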

Now suppose we create the index like this:

db.products.ensureIndex( { "item": 1, "location": 1, "stock": 1 } )

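Then the following queries are not supported by the index, because none of them filters on its leading field, item:

db.products.find( { "location": value, "stock": value } )

db.products.find( { "stock": value } )

db.products.find( { "location": value } )

The following query can use the index only through its item prefix; the stock condition is not narrowed by the index, so it is served less efficiently than a query on item and location:

db.products.find( { "item": value, "stock": value } )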

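See the pattern? The order of the fields in an index matters: a query benefits from the index only if it includes a leading prefix of the indexed fields.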
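The prefix rule can be sketched as a small function that counts how many leading index fields a query's filter matches. usablePrefixLength is a hypothetical helper written for this illustration, not part of MongoDB; a result of 0 means the index gives the query nothing to start from.

```javascript
// Hypothetical helper (not a MongoDB API): counts how many leading fields of a
// compound index are present in a query's filter.
function usablePrefixLength(indexFields, queryFields) {
  const queried = new Set(queryFields);
  let n = 0;
  // Walk the index fields in order and stop at the first one the query lacks.
  while (n < indexFields.length && queried.has(indexFields[n])) {
    n += 1;
  }
  return n;
}

const idx = ["item", "location", "stock"];
console.log(usablePrefixLength(idx, ["item", "location", "stock"])); // 3
console.log(usablePrefixLength(idx, ["item", "location"]));          // 2
console.log(usablePrefixLength(idx, ["item", "stock"]));             // 1: only the item prefix helps
console.log(usablePrefixLength(idx, ["location", "stock"]));         // 0
console.log(usablePrefixLength(idx, ["stock"]));                     // 0
```

The item and stock case is the interesting one: the index can narrow the search by item, but every entry for that item still has to be checked against the stock condition.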

And if we create a compound index like this:

db.products.ensureIndex( { "item": 1, "location": -1 } )

then the index is ordered by item in ascending order first, and entries with the same item value are then ordered by location in descending order.
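To see what that ordering looks like, here is a minimal sketch that sorts a few sample keys the same way: item ascending, then location descending. The sample values are made up, and the comparator only imitates the ordering for plain strings; it is not how MongoDB stores its B-tree.

```javascript
// Made-up sample keys for illustration.
const docs = [
  { item: "pen", location: "A" },
  { item: "ink", location: "B" },
  { item: "pen", location: "C" },
  { item: "ink", location: "D" },
];

// Comparator for the key pattern { item: 1, location: -1 }.
function compareKeys(a, b) {
  if (a.item !== b.item) return a.item < b.item ? -1 : 1;                 // item ascending
  if (a.location !== b.location) return a.location > b.location ? -1 : 1; // location descending
  return 0;
}

const ordered = [...docs].sort(compareKeys);
console.log(ordered.map((d) => `${d.item}/${d.location}`).join(" "));
// ink/D ink/B pen/C pen/A
```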

We can even index an array. Take the following document:

{ "_id" : ObjectId("..."),
"tags" : [ "weather", "hot", "record", "april" ] }

db.collection.ensureIndex({ "tags": 1 })

This is called a multikey index; each value in the tags array gets its own index entry.
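A simplified model of that behaviour: one index entry per array element, each pointing back at the same document. indexEntries is a hypothetical helper, and the numeric _id is a stand-in for a real ObjectId; details such as how MongoDB handles duplicate or empty arrays are left out.

```javascript
// Hypothetical helper (not a MongoDB API): lists the index entries a document
// would contribute for one field. An array yields one entry per element.
function indexEntries(doc, field) {
  const value = doc[field];
  const keys = Array.isArray(value) ? value : [value];
  return keys.map((key) => ({ key, id: doc._id }));
}

// _id is a plain number here only to keep the sketch self-contained.
const doc = { _id: 1, tags: ["weather", "hot", "record", "april"] };
const entries = indexEntries(doc, "tags");

console.log(entries.length);             // 4
console.log(entries.map((e) => e.key));  // [ 'weather', 'hot', 'record', 'april' ]
```

Because every element is indexed, a query such as { tags: "hot" } can find the document through the index even though tags holds four values.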

Suppose we have the following document:

{"_id": ObjectId(...),
"name": "John Doe",
"address": {
        "street": "Main",
        "zipcode": 53511,
        "state": "WI"
        }
}

db.collection.ensureIndex({ "address": 1 })

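This is called an index on sub-documents. In that case both of the following kinds of query use the index (the examples are written against a factories collection whose indexed sub-document field is metro):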

db.factories.find( { metro: { city: "New York", state: "NY" } } );

db.factories.find( { metro: { $gte : { city: "New York" } } } );

db.collection.ensureIndex({ "product": "hashed" })

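This is one of the features added in MongoDB 2.4: hashed indexes, which hash the entire value of the product field. Note that a hashed index can also be built on a sub-document, in which case the whole sub-document is hashed, but it cannot be combined with a compound index or a multikey index.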

Hashed indexes support equality queries but not range (greater-than or less-than) comparisons.

db.collection.ensureIndex({ "place": "2d" })

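Geospatial indexes. The benefit of a geospatial index is that you can search by location. Typically place is an array holding a coordinate pair such as [ x, y ]. The index type can be "2d", "2dsphere" (supported from version 2.4), or "geoHaystack". Once the index is built you can use operators and commands such as $near or geoNear. See the MongoDB geospatial index documentation for more detail.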

db.places.ensureIndex( { pos : "geoHaystack", type : 1 } , { bucketSize : 1 } )

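The strength of a geoHaystack index is fast searching over small areas. The bucketSize parameter controls how finely locations are grouped into buckets in the index. See the MongoDB documentation for more detail.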

Finally, a feature that MongoDB added in 2.4: text indexes.

To use text indexes, first enable text search by running the following in the mongo shell:

db.adminCommand( { setParameter : 1, textSearchEnabled : true } )

or start mongod from a terminal with the following parameter:

mongod --setParameter textSearchEnabled=true

Create the index like this:

db.collection.ensureIndex({ "place": "text" })

Once the index is built, you can run text searches:

db.collection.runCommand( "text", { search : "Australian" } )

Note: a collection can have only one text index, text indexes take up a fairly large amount of space, and the text indexes in MongoDB 2.4 do not support Chinese.

Index attributes

db.collection.ensureIndex({ "field": 1 },{unique : true})

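This is called a unique index: the user must make sure the values of field never collide. The unique option defaults to false, except for _id. If a document has no value for the field, the index records it as null, so only one such document is allowed in the whole collection; a second null would violate uniqueness.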

With a compound index, it is enough that the combination of all indexed field values differs from every other document's. For example:

db.collection.ensureIndex({ "field": 1 ,"value" : 1 },{unique : true})

Then the documents

({"_id": ObjectId(...),"field": "name", "value": "lastname"},
{"_id": ObjectId(...),"field": "name", "value": "middle"})

are allowed, because the combination of field and value differs.
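The uniqueness check on a compound key can be sketched as follows. findDuplicateKey is a hypothetical helper, not a MongoDB API; it treats a missing field as null, which is also why two documents that both lack the indexed fields collide.

```javascript
// Hypothetical helper (not a MongoDB API): returns the first compound key that
// appears twice, or null when every key combination is unique.
function findDuplicateKey(docs, fields) {
  const seen = new Set();
  for (const doc of docs) {
    // A missing field is recorded as null, as a unique index would do.
    const key = JSON.stringify(fields.map((f) => (f in doc ? doc[f] : null)));
    if (seen.has(key)) return key;
    seen.add(key);
  }
  return null;
}

const allowed = [
  { field: "name", value: "lastname" },
  { field: "name", value: "middle" },
];
console.log(findDuplicateKey(allowed, ["field", "value"])); // null: the combinations differ

const rejected = [...allowed, { field: "name", value: "middle" }];
console.log(findDuplicateKey(rejected, ["field", "value"])); // ["name","middle"]

// Two documents without the indexed fields both map to [null,null].
console.log(findDuplicateKey([{ a: 1 }, { a: 2 }], ["field", "value"])); // [null,null]
```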

A unique index also cannot be combined with a hashed index.

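If you need a unique index on a field that already contains duplicate values, use the dropDups option. It indexes only the first occurrence of each value and skips later duplicates. The command is: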

db.collection.ensureIndex({ "field": 1 },{unique : true , dropDups: true })

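Be very careful here: dropDups deletes the documents that were left out of the index; it does not merely ignore them. (The option was removed in MongoDB 3.0.)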

db.collection.ensureIndex({ "field": 1 },{sparse: true})

Sparse index. By default (sparse is false), a document that lacks the field still gets an index entry with a null value; a sparse index contains entries only for documents that actually have the field.

Note that queries that filter or sort through a sparse index may therefore return results that do not cover every document in the collection.
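The difference between the two modes, and why results can come back incomplete, shows up in a small model. buildIndex is a hypothetical helper, and the documents are made up for the example.

```javascript
// Hypothetical helper (not a MongoDB API): builds the list of index entries
// for one field, with or without the sparse option.
function buildIndex(docs, field, { sparse = false } = {}) {
  const entries = [];
  for (const doc of docs) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (!hasField && sparse) continue; // sparse: skip documents without the field
    entries.push({ key: hasField ? doc[field] : null, id: doc._id });
  }
  return entries;
}

// Made-up sample documents; the second one has no "field".
const sample = [
  { _id: 1, field: "a" },
  { _id: 2 },
  { _id: 3, field: "b" },
];

console.log(buildIndex(sample, "field").map((e) => e.id));                   // [ 1, 2, 3 ]
console.log(buildIndex(sample, "field", { sparse: true }).map((e) => e.id)); // [ 1, 3 ]
```

A sort that walks the sparse index only ever sees documents 1 and 3, which is the incomplete-result effect described above.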

db.collection.ensureIndex({ "field": 1 },{background: true})

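By default, building an index blocks all other operations on the database that holds the collection. The advantage of the background option is that you can keep using MongoDB while the index is being built.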

MongoDB can also set a TTL (time to live) on documents. The syntax is:

db.log.events.ensureIndex( { "status": 1 }, { expireAfterSeconds: 3600 } )

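The 3600 is in seconds: once more than an hour has passed since the time stored in status, a background task deletes the document. TTL indexes have the following restrictions: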

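  1. The indexed field (status in the example above) must hold values of the date type.

  2. Compound indexes are not supported.

  3. If the field is an array, only the earliest date is used (for example, with [2012/03/03, 2013/03/03] only the 2012 value is compared against the expiry threshold).

  4. The field must not already have another index on it.

  5. It cannot be created on a capped collection, because documents cannot be removed from a capped collection.

  6. The collection's usePowerOf2Sizes flag is set to true. This uses more space but reduces fragmentation in the database.

  7. The TTL background thread runs only on the primary; secondaries delete documents by replicating the deletions from the primary.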
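The expiry test itself, including the array rule from restriction 3, can be sketched like this. isExpired is a hypothetical helper and the timestamps are invented; the real deletion is done by a periodic background task, so a document is not removed at the exact moment it expires.

```javascript
// Hypothetical helper (not a MongoDB API): decides whether a document is past
// its TTL. For an array of dates, the earliest date is the one that counts.
function isExpired(doc, field, expireAfterSeconds, now) {
  const value = doc[field];
  const dates = (Array.isArray(value) ? value : [value]).filter(
    (v) => v instanceof Date
  );
  if (dates.length === 0) return false; // no date value: the document never expires
  const earliest = Math.min(...dates.map((d) => d.getTime()));
  return now.getTime() - earliest > expireAfterSeconds * 1000;
}

// Invented timestamps for the example.
const now = new Date("2013-03-03T02:00:00Z");
const oldDoc = { status: new Date("2013-03-03T00:30:00Z") }; // 90 minutes old
const newDoc = { status: new Date("2013-03-03T01:30:00Z") }; // 30 minutes old
const arrayDoc = {
  status: [new Date("2012-03-03T00:00:00Z"), new Date("2013-03-03T01:59:00Z")],
};

console.log(isExpired(oldDoc, "status", 3600, now));   // true
console.log(isExpired(newDoc, "status", 3600, now));   // false
console.log(isExpired(arrayDoc, "status", 3600, now)); // true: the 2012 date decides
```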

Notes and reminders

．A collection cannot have more than 64 indexes.

．An index name, including its namespace, must be at most 128 characters long.

．When a query contains several $or clauses, the clauses can execute in parallel and each can use a different index. Geospatial queries, however, do not support $or.

．Remember that indexes are updated every time a document is inserted or deleted, so if your database is write-heavy you need to choose the indexed fields with extra care.

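．Version 2.4 can build several indexes in the background at the same time.
From 2.2 up to 2.4, only one index at a time can be built in the background. (You may have many index builds going, but only one of them can be a background build.)
Before 2.2, a single mongod can build only one index at a time.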

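．Even though the build runs in the background, the connection (or session) that started it still has to wait until the build finishes before it can disconnect.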

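．A background index build not only slows down any index builds running in the foreground, it also takes longer than a foreground build; if the index is larger than the available RAM, it becomes much slower still.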

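．Once the primary of a replica set finishes building an index, it tells all the secondaries to build it too. If the index is large, it is better to restart one secondary in standalone mode, build the index there, rejoin it to the replica set, and repeat for each remaining secondary. See the MongoDB documentation on building indexes on replica sets.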

．Queries that run while an index is being built will not use that index, and you cannot run collection-level commands during the build (e.g. dropping the collection, compact, repairDatabase).

．If the instance is in recovering mode, index builds always run in the foreground.

．To check the indexes on a collection, use getIndexes().
