Presentation: Intro to MongoDB - The Next-Generation Database, for Silicon Valley Perl, 2016-07-14

This is the presentation material for the talk on "Intro to MongoDB - The Next-Generation Database" at Silicon Valley Perl, 2016-07-14. TWiki founder Peter Thoeny prepared this talk for developers who want to learn about NoSQL databases and how they compare to RDBMS.

    Copyright © 2016 by TWiki.org. This presentation may be reproduced as long as the copyright notice is retained and a link is provided back to http://twiki.org/.    

Slide 1: Intro to MongoDB - The Next-Generation Database

mysql-vs-mongodb-900.png #Ref12

Presentation for Silicon Valley Perl, 2016-07-14

-- Peter Thoeny - @PeterThoeny - peter09[at]thoeny.org - TWiki.org

Slide 2: About Peter Thoeny

Slide 3: RDBMS - Relational Model

mysql-workbench-700.jpg & eclipse-for-db.jpg

Slide 4: NoSQL - What NoSQL?

  • "A NoSQL (originally referring to "non SQL" or "non relational") database provides a mechanism for storage and retrieval of data which is modeled in means other than the tabular relations used in relational databases." ~ Wikipedia

nosql-dbs.jpg

Slide 5: RDBMS - Relational Record

relational-record.png
  • Two-dimensional storage
  • A field contains a single value
  • Query on any field
  • Very structured schema
  • Poor data locality requires many tables, joins & indexes

Slide 6: RDBMS App - Maintenance & Changes

  • Need to make changes in three places
rdbms-schema-changes.png #Ref6

Slide 7: NoSQL App - Maintenance & Changes

  • Developers can be more productive (at least that's the goal)
nosql-dev-model.png #Ref6

Slide 8: NoSQL App - Maintenance & Changes

  • Developers can be more productive (at least that's the goal)
nosql-dev-model-2.png #Ref6

Slide 9: Terminology: Spreadsheet, RDBMS, NoSQL DB

Spreadsheet  | RDBMS           | NoSQL DB
-------------+-----------------+------------------
file system  | database server | database server
spreadsheet  | database        | database
sheet, tab   | table           | collection
row          | tuple, record   | document
cell         | field           | key/value
N/A          | key             | ID
N/A          | foreign key     | reference
N/A          | join            | embedded document
N/A          | index           | index
scripting    | SQL             | JSON query
  • RDBMS: Relational Database Management System
  • NoSQL: No SQL (Structured Query Language)

Slide 10: NoSQL - Document Model

mongodb-document.png
  • N-dimensional storage
  • A field can contain many values & embedded values - JSON object
  • Query on any field & any level
  • No schema / flexible schema
  • Optimal data locality requires fewer indexes and provides better performance

Slide 11: NoSQL Flavors

  1. Key-value stores
    • Hash table with unique keys, each with a value
    • Simple & possibly limiting
    • Examples: Tokyo Cabinet/Tyrant, Redis, Voldemort, Oracle BDB, Amazon SimpleDB, Riak
  2. Column family stores
    • Big table where keys point to multiple columns
    • Can process very large amounts of data distributed over many machines
    • Examples: Cassandra, HBase

Slide 12: NoSQL Flavors - Continued

  3. Document stores
    • Similar to key-value stores, but allow nested values associated with each key
    • Deep structure in an app can be stored "as is" - JSON
    • Efficient database queries
    • Examples: CouchDB, MongoDB
  4. Graph databases
    • Flexible graph model, consisting of elements interconnected with a finite number of relations between them
    • Example data types: Social relations, public transport links, road maps, network topologies
    • Examples: Neo4J, InfoGrid, Infinite Graph

Slide 13: MongoDB at a Glance

mongodb-logo-600.png
  • MongoDB is a document database/store
  • Developed by MongoDB Inc.
  • Software is free and open-source (GNU Affero General Public License & Apache License)
  • Most popular document store
  • MongoDB stores JSON-like documents, in a format called BSON (Binary JSON)

Slide 14: JSON: JavaScript Object Notation

{
  "color": "#333333",
  "menu": [
    {
      "id": "menuFile",
      "value": "File",
      "items": [
        { "value": "New", "action": "fileCreate()" },
        { "value": "Open", "action": "fileOpen()" },
        { "value": "Close", "action": "fileClose()" }
      ]
    },
    {
      "id": "menuEdit",
      "value": "Edit",
      "items": [
        { "value": "Cut", "action": "editCut()" },
        { "value": "Copy", "action": "editCopy()" },
        { "value": "Paste", "action": "editPaste()" }
      ]
    }
  ],
  "index": 80
}
  • Lightweight data-interchange format
  • Easy for humans to read/write and easy for machines to parse/generate
  • Subset of JavaScript, Standard ECMA-262 3rd Edition
  • JSON object: Just data, e.g. for Perl coders an object without methods
  • Two structures:
    • Collection of name/value pairs - hash in Perl
    • Ordered list of values - array in Perl
    • Value is: String, number, object, array, true, false, or null
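
A small sketch of the two structures in practice, assuming Node.js: it parses a shortened version of the menu example into plain data and serializes it back. Nothing here is MongoDB specific, and the `describe` property is made up only to show that methods are not part of JSON.

```javascript
// A shortened version of the menu example, as JSON text.
const text = '{ "color": "#333333", "menu": [ { "id": "menuFile", ' +
             '"items": [ { "value": "New" }, { "value": "Open" } ] } ], "index": 80 }';

// JSON.parse turns the text into plain objects (name/value pairs)
// and arrays (ordered lists of values).
const data = JSON.parse(text);
const firstItem = data.menu[0].items[0].value;   // object -> array -> object
console.log(firstItem);

// JSON.stringify goes the other way. A function-valued property
// (made up for this example) is not data and is left out of the text.
data.describe = function () { return "not data"; };
const roundTrip = JSON.stringify(data);
console.log(roundTrip.includes("describe"));
```

The first line printed is the value of the first menu item; the second shows whether the method survived serialization.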

Slide 15: MongoDB - BSON

mongodb-for-dbas__bson.png
  • Ordered list of elements
  • Each element consists of a field name, a type, and a value
  • Field names are strings
  • Types include:
    • string
    • integer (32- or 64-bit)
    • double (64-bit IEEE 754 floating point number)
    • date (integer number of milliseconds since the Unix epoch)
    • byte array (binary data)
    • boolean (true and false)
    • null
    • BSON object
    • BSON array
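
To make the element layout concrete, here is a minimal encoder sketch, assuming Node.js and covering only two of the types listed above (string and 32-bit integer). The function name `encodeBson` is hypothetical; real drivers ship their own complete BSON implementations. The type codes 0x02 (string) and 0x10 (32-bit integer) are taken from the BSON specification.

```javascript
// Minimal BSON encoder sketch: string and 32-bit integer values only.
function encodeBson(doc) {
  const parts = [];
  for (const [name, value] of Object.entries(doc)) {
    const nameBytes = Buffer.from(name + "\0", "utf8");   // field name as C string
    if (typeof value === "string") {
      const strBytes = Buffer.from(value + "\0", "utf8");
      const len = Buffer.alloc(4);
      len.writeInt32LE(strBytes.length);                   // length includes trailing NUL
      parts.push(Buffer.from([0x02]), nameBytes, len, strBytes);
    } else if (Number.isInteger(value) && value >= -(2 ** 31) && value < 2 ** 31) {
      const num = Buffer.alloc(4);
      num.writeInt32LE(value);                             // little-endian int32
      parts.push(Buffer.from([0x10]), nameBytes, num);
    } else {
      throw new TypeError("type not covered by this sketch: " + name);
    }
  }
  const body = Buffer.concat(parts);
  const size = Buffer.alloc(4);
  size.writeInt32LE(body.length + 5);                      // 4 size bytes + body + final NUL
  return Buffer.concat([size, body, Buffer.from([0x00])]);
}

const bytes = encodeBson({ hello: "world" });
console.log(bytes.length, bytes.toString("hex"));
```

Each element is written as type byte, field name, then value, which is why a reader can skip over fields it does not need without parsing them.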

Slide 16: MongoDB Design Philosophy & Scalability

  • RDBMS: Scale an application by going vertical, or "buy a bigger box"

  • MongoDB: Scales horizontally, you just "buy more boxes"

Slide 17: MongoDB Design Philosophy & Features

  • Sacrifice some features to keep things manageable and fast
  • Non-linear feature/performance curve
  • 80/20 rule

Slide 18: MongoDB Installation

  • Red Hat 7 / CentOS 7 specific, many other OSes supported
  • 32-bit and 64-bit builds available; 64-bit strongly recommended
  • vi /etc/yum.repos.d/mongodb.repo
    [mongodb]
    name=MongoDB Repository
    baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
    gpgcheck=0
    enabled=1
  • yum install mongodb-org mongodb-org-server
  • systemctl start mongod     # start MongoDB daemon
  • systemctl enable mongod    # enable automatic restart on OS boot
  • systemctl status mongod    # check MongoDB service status
  • mongostat                  # summary list of status statistics

Slide 19: MongoDB Shell - Command Line Tool

[pthoeny@twiki ~]$ mongo -u dbUser -p *****
MongoDB shell version: 2.6.12
connecting to: restaurants
> use test
switched to db test
> show collections
helloworld
mapReduceCities
restaurants
system.indexes
system.users
tests
users
zips
> db.zips.find({ "pop": { $gt: 100000 } }, { "city": 1, "pop": 1, "state": 1 })
{ "_id" : "10021", "city" : "NEW YORK", "pop" : 106564, "state" : "NY" }
{ "_id" : "10025", "city" : "NEW YORK", "pop" : 100027, "state" : "NY" }
{ "_id" : "11226", "city" : "BROOKLYN", "pop" : 111396, "state" : "NY" }
{ "_id" : "60623", "city" : "CHICAGO", "pop" : 112047, "state" : "IL" }
> exit
bye
[pthoeny@twiki ~]$ 

Slide 20: MongoDB GUI

robomongo-screen.png
  • Two popular GUIs

  • Demo

Slide 21: MongoDB Basic Administration

  • Users and roles:
    • By default, no authentication is needed for connections from localhost, and port 27017 is closed to the outside
    • Manage users and roles

Slide 22: MongoDB JavaScript Operation: Find

  • db.someCollection.find( query, projection )
  • query - what to find
    - example: { "state" : "NY", "pop": { $gt: 100000 } }
  • projection - what to return
    - example: { "city": 1, "pop": 1, "state": 1 }
  • Doc: db.collection.find()
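
How query and projection work together can be sketched in plain JavaScript without a server. This is an in-memory imitation, not MongoDB's implementation: the helpers `matches`, `project` and `find` are hypothetical and support only equality, `$gt`, and inclusion projections. The sample documents are the ones shown in the shell session and the insert example.

```javascript
// Sample documents from the zips collection shown on other slides.
const zips = [
  { _id: "10021", city: "NEW YORK", pop: 106564, state: "NY" },
  { _id: "10025", city: "NEW YORK", pop: 100027, state: "NY" },
  { _id: "11226", city: "BROOKLYN", pop: 111396, state: "NY" },
  { _id: "60623", city: "CHICAGO", pop: 112047, state: "IL" },
  { _id: "95014", city: "CUPERTINO", pop: 60189, state: "CA" }
];

// Hypothetical helper: true if doc satisfies every condition in query.
// Supports plain equality and the $gt operator only.
function matches(doc, query) {
  return Object.keys(query).every(function (field) {
    const cond = query[field];
    if (cond !== null && typeof cond === "object" && "$gt" in cond) {
      return doc[field] > cond.$gt;
    }
    return doc[field] === cond;
  });
}

// Hypothetical helper: keep _id plus the fields flagged with 1.
function project(doc, projection) {
  const out = { _id: doc._id };
  Object.keys(projection).forEach(function (field) {
    if (projection[field] === 1 && field in doc) {
      out[field] = doc[field];
    }
  });
  return out;
}

function find(docs, query, projection) {
  return docs.filter(function (doc) { return matches(doc, query); })
             .map(function (doc) { return project(doc, projection); });
}

const result = find(zips, { state: "NY", pop: { $gt: 100000 } }, { city: 1, pop: 1 });
console.log(result);
```

The query decides which documents come back; the projection decides which fields of each document come back, with `_id` included unless excluded.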

Slide 23: MongoDB JavaScript Operation: Insert

  • db.someCollection.insert( document, options )
  • document - document or array of documents to insert
    - example: { "_id": "95014", "city": "CUPERTINO", "pop": 60189, "state": "CA" }
  • options - options
    - example: { writeConcern: { w: 1, j: true, wtimeout: 1000 }, ordered: true }
  • Primary key _id automatically assigned if missing
    - example: "_id" : ObjectId("578739d3bafa7bf9abde41a4")
  • Doc: db.collection.insert()
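
An automatically assigned ObjectId is not random: its first 4 bytes are a timestamp in seconds since the Unix epoch. The sketch below, assuming Node.js, decodes the creation time from the example ID above using plain string handling; in the mongo shell, `ObjectId(...).getTimestamp()` gives the same information.

```javascript
// The ObjectId value from the example above, as a 24-character hex string.
const idHex = "578739d3bafa7bf9abde41a4";

// The first 4 bytes (8 hex digits) are a big-endian count of
// seconds since the Unix epoch.
const seconds = parseInt(idHex.slice(0, 8), 16);
const created = new Date(seconds * 1000);

console.log(created.toISOString());   // 2016-07-14T07:05:55.000Z
```

Because the timestamp comes first, sorting by `_id` roughly sorts documents by creation time.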

Slide 24: MongoDB More JavaScript Operations

  • db.auth() - if running in secure mode, authenticate the user
  • db.collection.update() - update an existing document in the collection
  • db.collection.save() - insert either a new document or update an existing document in the collection
  • db.collection.remove() - delete documents from the collection
  • db.collection.drop() - drops or removes completely the collection
  • db.collection.createIndex() - create a new index on the collection if the index does not exist
  • Doc: Basic shell JavaScript operations

Slide 25: MongoDB Query Operators

  • Query selectors:
    • Comparison: $eq, $gt, $gte, $lt, $lte, $ne, $in, $nin
      - example: { "zip": { $in: [ 95014, 95015, 95025, 95036 ] } }
    • Logical: $or, $and, $not, $nor
    • Element: $exists, $type
    • Evaluation: $mod, $regex, $text, $where
      - example: { "name": { $regex: /acme.*corp/, $options: 'i' } }
    • Geospatial: $geoWithin, $geoIntersects, $near, $nearSphere
    • Array: $all, $elemMatch, $size
  • Doc: Query and projection operators
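
One way to picture how selectors combine is an operator table applied field by field. The sketch below is an in-memory imitation in plain JavaScript covering a handful of the operators above; the names `operators`, `testField` and `matchesQuery` are hypothetical, and MongoDB's real comparison rules (for example across types) are more involved.

```javascript
// Hypothetical operator table covering a few of the selectors listed above.
const operators = {
  $eq:  function (value, arg) { return value === arg; },
  $ne:  function (value, arg) { return value !== arg; },
  $gt:  function (value, arg) { return value > arg; },
  $gte: function (value, arg) { return value >= arg; },
  $lt:  function (value, arg) { return value < arg; },
  $lte: function (value, arg) { return value <= arg; },
  $in:  function (value, arg) { return arg.indexOf(value) !== -1; },
  $nin: function (value, arg) { return arg.indexOf(value) === -1; },
  $exists: function (value, arg) { return (value !== undefined) === arg; }
};

// True if the field value passes every operator in the condition object.
function testField(value, condition) {
  return Object.keys(condition).every(function (op) {
    if (!operators[op]) {
      throw new Error("operator not covered by this sketch: " + op);
    }
    return operators[op](value, condition[op]);
  });
}

// Top-level fields are combined with an implicit AND.
function matchesQuery(doc, query) {
  return Object.keys(query).every(function (field) {
    return testField(doc[field], query[field]);
  });
}

// Made-up sample document.
const doc = { name: "Acme Corp", zip: 95014, pop: 60189 };
console.log(matchesQuery(doc, { zip: { $in: [95014, 95015, 95025, 95036] } }));
console.log(matchesQuery(doc, { pop: { $gte: 50000, $lt: 60000 } }));
console.log(matchesQuery(doc, { fax: { $exists: false } }));
```

Several operators on one field, as in the second query, must all hold for the document to match.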

Slide 26: MongoDB Schema Change - Example

schema-change-example.png #Ref6

Slide 27: MongoDB Schema Change - Apply Change

  • Task: Convert object value to array of objects
  • Command on mongo command line tool or GUI:
    db.getCollection('contacts').find({})
      .forEach(function (doc) {
        var addrArray = [ doc.address ];
        delete doc.address;
        doc.addresses = addrArray;
        db.contacts.save(doc);
    })
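
The same conversion can be tried out on plain objects before touching a real collection. This is a sketch under assumptions: the sample contacts are made up, and the guard that skips documents without an `address` field (or with an existing `addresses` array) is an addition that makes the step safe to run more than once.

```javascript
// Hypothetical in-memory stand-in for the contacts collection.
const contacts = [
  { name: "Peter Thoeny", address: { city: "Cupertino", zip: "95014" } },
  { name: "Already Migrated", addresses: [ { city: "New York", zip: "10021" } ] },
  { name: "No Address" }
];

// Convert the single address object into an addresses array.
// Documents that are already migrated, or that have no address,
// are left alone, so repeated runs give the same result.
function migrateContact(doc) {
  if (doc.address !== undefined && doc.addresses === undefined) {
    doc.addresses = [ doc.address ];
    delete doc.address;
  }
  return doc;
}

contacts.forEach(migrateContact);
contacts.forEach(migrateContact);   // second pass changes nothing
console.log(JSON.stringify(contacts, null, 2));
```

Because the schema is flexible, old-format and new-format documents can coexist in the collection while such a migration is in progress.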

Slide 28: MongoDB Data Modeling - Example Contact

data-model-address-book.png #Ref6

Slide 29: MongoDB Data Modeling - Example Document

{
    "name": "Peter Thoeny",
    "company": "TWiki.org",
    "title": "Founder & CTO",
    "twitter": {
        "account": "peterthoeny", "name": "Peter Thoeny", "url": "http://thoeny.org/"
    },
    "addresses": [
        { "type": "home", "street": "1 Privacy Way", "city": "Cupertino",
                          "zip": "95014", "state": "CA", "country": "USA" },
        { "type": "work", "street": "1 Stealth Mode", "city": "Cupertino",
                          "zip": "95014", "state": "CA", "country": "USA" }
    ],
    "phones": [
        { "type": "home", "number": "1-408-555-1212" },
        { "type": "work", "number": "1-408-555-1213" }
    ],
    "emails": [
        { "type": "home", "email": "peter09@thoeny.org" },
        { "type": "work", "email": "peter09@twiki.org" }
    ],
    "groups": [
        "57786b9f4dd6feee06abbb03",
        "57786b9f4dd6feee06abbb08"
    ]
}

Slide 30: MongoDB Data Modeling - One-to-One Recommendations

  • One-to-One: Use embedded document
  • Example:
    { ...,
      "twitter": {
        "account": "twiki", "name": "TWiki", "url": "http://twiki.org/"
      },
      ...
    }
  • Query:
    db.contacts.find({ "twitter.account": "twiki" })

Slide 31: MongoDB Data Modeling - One-to-Many Recommendations

  • One-to-Many: Use embedded array of objects
  • Example:
    { ..., "addresses": [
        {"type": "work", "street": "", "city": "Cupertino",
        "zip": "95014" }, {...}
      ], ...
    }
  • Query:
    db.contacts.find({ "addresses.city": "Cupertino" })
  • Query, returning only matched array element:
    db.contacts.find({ "addresses.city": "Cupertino" },
                     { "addresses": { $elemMatch: { "city": "Cupertino" } } })
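
Dot notation reaches into embedded documents and, when it meets an array, into every element. The sketch below imitates that behavior in plain JavaScript; `valuesAt`, `matchesPath` and `firstElemMatch` are hypothetical helpers, and the New York work address is made up so the two array elements differ.

```javascript
const contact = {
  name: "Peter Thoeny",
  twitter: { account: "peterthoeny" },
  addresses: [
    { type: "home", city: "Cupertino", zip: "95014" },
    { type: "work", city: "New York", zip: "10021" }   // made-up address
  ]
};

// Hypothetical helper: collect all values reachable via a dotted path.
// When a step hits an array, the path continues into every element.
function valuesAt(value, path) {
  if (Array.isArray(value)) {
    return value.reduce(function (acc, item) {
      return acc.concat(valuesAt(item, path));
    }, []);
  }
  if (path.length === 0) {
    return [ value ];
  }
  if (value === null || typeof value !== "object") {
    return [];
  }
  return valuesAt(value[path[0]], path.slice(1));
}

// A dotted query matches if any reachable value equals the expected one.
function matchesPath(doc, dottedPath, expected) {
  return valuesAt(doc, dottedPath.split(".")).indexOf(expected) !== -1;
}

// Like an $elemMatch projection: return only the first matching element.
function firstElemMatch(array, field, expected) {
  const hit = array.filter(function (item) { return item[field] === expected; })[0];
  return hit === undefined ? [] : [ hit ];
}

console.log(matchesPath(contact, "twitter.account", "peterthoeny"));
console.log(matchesPath(contact, "addresses.city", "New York"));
console.log(firstElemMatch(contact.addresses, "city", "New York"));
```

The query part selects whole documents; the `$elemMatch`-style projection then trims the array so only the element of interest is returned.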

Slide 32: MongoDB Data Modeling - Many-to-Many Recommendations

  • Many-to-Many: Use embedded array of references
  • Join table
  • Example with ID only:
    { ..., "groups": [
        "57786b9f4dd6feee06abbb03", "57786b9f4dd6feee06abbb08"
      ], ... }
  • Example with cached name:
    { ..., "groups": [
        { "id": "57786b9f4dd6feee06abbb03", "name": "all-employees" },
        { "id": "57786b9f4dd6feee06abbb08", "name": "all-USA" }
      ], ... }
  • Query:
    db.contacts.find({ "groups.name": "all-USA" })
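
With ID-only references, the application resolves the references itself. Here is a sketch of that application-side join in plain JavaScript, assuming a separate groups collection exists; the `groups` array, the second contact, and the helper names are hypothetical.

```javascript
// Hypothetical groups collection, using the IDs and names from the examples above.
const groups = [
  { _id: "57786b9f4dd6feee06abbb03", name: "all-employees" },
  { _id: "57786b9f4dd6feee06abbb08", name: "all-USA" }
];

const contacts = [
  { name: "Peter Thoeny",
    groups: [ "57786b9f4dd6feee06abbb03", "57786b9f4dd6feee06abbb08" ] },
  { name: "Jane Example",                        // made-up contact
    groups: [ "57786b9f4dd6feee06abbb03" ] }
];

// Build an ID -> group lookup once, then resolve each reference.
function resolveGroups(contact, allGroups) {
  const byId = {};
  allGroups.forEach(function (g) { byId[g._id] = g; });
  return contact.groups
    .map(function (id) { return byId[id]; })
    .filter(function (g) { return g !== undefined; })   // skip dangling references
    .map(function (g) { return g.name; });
}

// Reverse direction: all contacts that reference a given group ID.
function membersOf(groupId, allContacts) {
  return allContacts
    .filter(function (c) { return c.groups.indexOf(groupId) !== -1; })
    .map(function (c) { return c.name; });
}

console.log(resolveGroups(contacts[0], groups));
console.log(membersOf("57786b9f4dd6feee06abbb08", contacts));
```

Caching the group name inside the contact avoids the lookup at read time, at the cost of updating every contact when a group is renamed.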

Slide 33: MongoDB Performance & Indexing

  • Small size: BSON data sent to/from client
    • Driver is responsible for translating BSON into a programming-language-specific representation
  • Indexing:
  • Prepare data in format needed:
    • Create additional collections via map-reduce

Slide 34: MongoDB Sharding - Distributed Data

Sharding-in-MongoDB.png
  • Method for distributing data across multiple machines - horizontal scaling
  • Goals:
    • Deployments with very large data sets
    • High throughput operations
  • Tag aware sharding
  • Doc: MongoDB sharding
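
The idea of distributing data by shard key can be sketched as a lookup in a table of key ranges. This is a simplified illustration in plain JavaScript, not how MongoDB's router is implemented: the chunk boundaries, shard names, and the `shardFor` helper are all made up, and the ZIP code string is assumed to be the shard key.

```javascript
// Hypothetical chunk table: each chunk covers shard-key values in
// [min, max) and lives on one shard. max === null means "no upper bound".
const chunks = [
  { min: "00000", max: "40000", shard: "shardA" },
  { min: "40000", max: "80000", shard: "shardB" },
  { min: "80000", max: null,    shard: "shardC" }
];

// Route a document to the shard whose chunk range contains its key value.
function shardFor(doc, shardKey) {
  const value = doc[shardKey];
  const chunk = chunks.filter(function (c) {
    return value >= c.min && (c.max === null || value < c.max);
  })[0];
  if (!chunk) {
    throw new RangeError("no chunk covers shard key value " + value);
  }
  return chunk.shard;
}

console.log(shardFor({ _id: "10021", city: "NEW YORK" }, "_id"));
console.log(shardFor({ _id: "60623", city: "CHICAGO" }, "_id"));
console.log(shardFor({ _id: "95014", city: "CUPERTINO" }, "_id"));
```

A query that includes the shard key can be sent to a single shard in the same way; a query without it has to be sent to all shards.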

Slide 35: MongoDB Replica-Sets

replica-set-primary-with-two-secondaries.png
replica-set-primary-with-secondary-and-arbiter.png
  • Replica set: A group of mongod processes that maintain the same data set
  • Goals: Redundancy and high availability
  • One member is the primary; if it becomes unavailable, a secondary can step up to become primary
  • Doc: MongoDB replication

Slide 36: MongoDB Multi-Site Deployment

replica-set-three-data-centers.png
replica-set-three-data-centers-priority.png
  • Distributed replica set: A group of mongod processes across multiple data centers that maintain the same data set
  • Goals: Redundancy, fault tolerance, performance
  • Electability of Members: Fail-over with priority
  • Doc: Replica sets distributed across data centers
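
Fail-over with priority can be pictured with a much simplified election rule. The sketch below is an illustration only: real elections also take votes, replication state and other factors into account. The member list, host names, and the `electPrimary` helper are hypothetical.

```javascript
// Hypothetical replica set spread over three data centers.
// priority 0 means the member can never become primary.
const members = [
  { host: "dc1-a", priority: 3, healthy: true },
  { host: "dc1-b", priority: 2, healthy: true },
  { host: "dc2-a", priority: 1, healthy: true },
  { host: "dc2-b", priority: 1, healthy: true },
  { host: "dc3-a", priority: 0, healthy: true }
];

// Simplified rule: a primary needs a majority of reachable members;
// among those, the healthy member with the highest non-zero priority wins.
function electPrimary(set) {
  const healthy = set.filter(function (m) { return m.healthy; });
  if (healthy.length * 2 <= set.length) {
    return null;   // no majority, so no primary can be elected
  }
  const candidates = healthy.filter(function (m) { return m.priority > 0; });
  candidates.sort(function (a, b) { return b.priority - a.priority; });
  return candidates.length > 0 ? candidates[0].host : null;
}

console.log(electPrimary(members));   // all members healthy
members[0].healthy = false;           // simulate losing the preferred member
console.log(electPrimary(members));
```

Giving the members in the main data center a higher priority keeps the primary there as long as one of them is available.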

Slide 37: Transactions in MongoDB

two-phase-commit.png
  • No built-in transaction handling
  • Updating a single document is atomic, even for a deeply nested document
  • Use the $isolated operator to update multiple documents within the same collection
    • But: $isolated does not work with sharded clusters
  • Transaction-like behavior by implementing a two-phase commit
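
The two-phase commit pattern can be walked through on plain objects. This is a sketch under assumptions, in plain JavaScript: the accounts, the transaction document, and the helper names are made up, and each helper call stands in for one single-document update, which is the only atomic unit available.

```javascript
// Hypothetical in-memory stand-ins for an accounts and a transactions collection.
const accounts = {
  A: { balance: 1000, pendingTransactions: [] },
  B: { balance: 1000, pendingTransactions: [] }
};
const txn = { _id: 1, source: "A", destination: "B", value: 100, state: "initial" };

// Apply one side of the transfer, at most once per transaction ID.
// Each call changes a single account, like a single-document update.
function applyTo(account, txnId, amount) {
  if (account.pendingTransactions.indexOf(txnId) === -1) {
    account.balance += amount;
    account.pendingTransactions.push(txnId);
  }
}

// Remove the pending marker once both sides have been applied.
function clearPending(account, txnId) {
  account.pendingTransactions = account.pendingTransactions.filter(function (id) {
    return id !== txnId;
  });
}

function runTransfer(t) {
  t.state = "pending";                              // phase 1: mark as started
  applyTo(accounts[t.source], t._id, -t.value);
  applyTo(accounts[t.destination], t._id, t.value);
  t.state = "applied";                              // both sides updated
  clearPending(accounts[t.source], t._id);          // phase 2: clean up markers
  clearPending(accounts[t.destination], t._id);
  t.state = "done";
}

runTransfer(txn);
console.log(accounts.A.balance, accounts.B.balance, txn.state);
```

The state field and the pending markers are what allow a recovery job to find a half-finished transfer and either complete it or roll it back; because `applyTo` checks the marker, repeating a step does not apply the amount twice.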

Slide 38: Map-Reduce

  • Goal: Condense large volumes of data into useful aggregated results
  • Advantages:
    • Easy to understand, especially for programmers not familiar with SQL
    • Easy to distribute map and reduce calculation over many servers
  • Database command:
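    db.exampleCollection.mapReduce(
        function() {                  // map: called once per document, available as "this"
            emit(key, value);         // placeholders: key and value come from document data
        },
        function(key, values) {       // reduce: called per key with the emitted values
            return reducedValue;      // placeholder: calculated from values
        },
        { "out": "exampleResult" }
    )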

Slide 39: Map-Reduce: Example

map-reduce.png

Slide 40: Map-Reduce: Demo

var mapCities = function() {
    // Emit the same shape that reduceCities returns: { city: [ zip, ... ] }
    var cityZips = {};
    if(this.city) {
        cityZips[this.city] = [ this._id ];
    }
    emit(this.state, cityZips);
};

var reduceCities = function(state, objArr) {
    // objArr may mix map output and earlier reduce output, so merge by city
    var obj = {};
    for(var i = 0; i < objArr.length; i++) {
        var cityZips = objArr[i];
        for(var city in cityZips) {
            if(!obj[city]) {
                obj[city] = [];
            }
            obj[city] = obj[city].concat(cityZips[city]);
        }
    }
    return obj;
};

db.zips.mapReduce(
    mapCities,
    reduceCities,
    { "out": "mapReduceCities" }
)
  • Input: zips collection; sample doc:
    {   "_id" : "90001",
        "city" : "LOS ANGELES",
        "state" : "CA"
    }
  • Task: For each state, list cities with ZIP(s)
  • How:
    • Map: Return state as key, and object with zip and city as value
    • Reduce: For a state, compose list of cities with ZIPs
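
The map and reduce steps for this task can be tried out without a server. The following is a plain-JavaScript simulation, assuming Node.js: the `mapReduce` driver and the function names are hypothetical, the sample documents are illustrative, and the map value is given the same shape as the reduce result so that reduce can safely be applied to its own output.

```javascript
// A few documents shaped like the zips sample above (values are illustrative).
const zips = [
  { _id: "90001", city: "LOS ANGELES", state: "CA" },
  { _id: "90002", city: "LOS ANGELES", state: "CA" },
  { _id: "95014", city: "CUPERTINO", state: "CA" },
  { _id: "10021", city: "NEW YORK", state: "NY" }
];

// Map step: one (key, value) pair per document.
// The value already has the shape reduce returns: { city: [ zip, ... ] }.
function mapCity(doc) {
  const value = {};
  value[doc.city] = [ doc._id ];
  return { key: doc.state, value: value };
}

// Reduce step: merge any number of { city: [ zips ] } objects into one.
function reduceCities(state, values) {
  const merged = {};
  values.forEach(function (cityZips) {
    Object.keys(cityZips).forEach(function (city) {
      merged[city] = (merged[city] || []).concat(cityZips[city]);
    });
  });
  return merged;
}

// Hypothetical driver: group mapped values by key, then reduce each group.
function mapReduce(docs, mapFn, reduceFn) {
  const groups = {};
  docs.map(mapFn).forEach(function (pair) {
    (groups[pair.key] = groups[pair.key] || []).push(pair.value);
  });
  const out = {};
  Object.keys(groups).forEach(function (key) {
    out[key] = reduceFn(key, groups[key]);
  });
  return out;
}

const result = mapReduce(zips, mapCity, reduceCities);
console.log(JSON.stringify(result, null, 2));
```

Keeping the map value and the reduce result in the same shape matters because the reduce step may run several times for one key, each time receiving a mix of mapped values and earlier reduce results.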

Slide 41: Map-Reduce: Compare to SQL

mysql-vs-mongodb.png
Source: InfoGraphic: Migrating from SQL to MapReduce with MongoDB #Ref12

Slide 42: MongoDB in Perl

  • Mango:
    • Pure-Perl, non-blocking, asynchronous driver for MongoDB
    • Designed to work with the Mojolicious web framework
  • Meerkat:
    • Manage MongoDB documents as Moose objects
    • Designed for atomic operations that keep client-side objects in sync with the database
  • MongoDB::Simple:
    • Basic object-to-document mapping system with few dependencies

Slide 43: References

  1. https://en.wikipedia.org/wiki/NoSQL - NoSQL on Wikipedia
  2. https://en.wikipedia.org/wiki/MongoDB - MongoDB on Wikipedia
  3. https://www.mongodb.com/ - MongoDB Inc.
  4. https://docs.mongodb.com/ - MongoDB documentation
  5. http://www.liquidweb.com/kb/how-to-install-mongodb-on-centos-7/ - how to install MongoDB on CentOS 7
  6. http://www.slideshare.net/mongodb/jakes-schema-design-houston-mug-20150311 - MongoDB schema design
  7. http://json.org/ - JSON
  8. https://en.wikipedia.org/wiki/BSON - BSON
  9. https://docs.mongodb.com/manual/applications/indexes/ - MongoDB Indexing Strategies
  10. http://crocodillon.com/blog/mongodb-for-dbas-introduction - MongoDB for DBAs: Introduction
  11. https://rickosborne.org/blog/2010/02/playing-around-with-mongodb-and-mapreduce-functions/ - MongoDB and MapReduce
  12. https://rickosborne.org/blog/2010/02/infographic-migrating-from-sql-to-mapreduce-with-mongodb/ - InfoGraphic MapReduce

Slide 44: BACKUP SLIDES

Slide 45: What is TWiki?

twiki-logo-200x72.png
  • TWiki is a wiki engine and wiki application platform, established in 1998
  • TWiki is specifically built for the workplace
  • Large number of TWiki Extensions: 200+ actively maintained extensions
  • Open Source software (GPL) with active community, hosted at http://TWiki.org/
  • 2,000+ downloads per month, 600,000 total downloads, an estimated 50,000+ installations in 130+ countries
  • Est. $27M of human capital invested (ref. Open Hub)
  • SourceForge 2009 "Best Enterprise Project" Finalist (among 230,000 open source projects)

Slide 46: TWiki Open Source Community

Slide 47: TWiki I/O Architecture

twiki-io-architecture.png

See also: What is TWiki, TWiki presentation, public TWiki sites, TWiki screenshots, TWiki.org Blog

-- Author: Peter Thoeny - 2016-07-14

Topic revision: r11 - 2016-07-20 - PeterThoeny
 
Copyright © 1999-2016 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.