Storage and databases for big data
Published in Jun Deng, Lei Xing, Big Data in Radiation Oncology, 2019
Tomas Skripcak, Uwe Just, Ida Schönfeld, Esther G.C. Troost, Mechthild Krause
MongoDB is one of the most popular JSON-oriented document databases released under an open-source license. Internally, it uses a binary-encoded format called BSON. The BSON format extends the JSON specification with low-level representations of additional data types, such as dates, and allows document parsing operations to be much more efficient. MongoDB utilizes JavaScript as a special-purpose query language that can be used to perform ad hoc queries on document collections and to execute user-defined functions on the server side. It is known for its developer friendliness, which allows quick entry into the world of big data technologies; however, it does not currently provide the best robustness and scalability. Furthermore, the utilization of JavaScript makes MongoDB vulnerable to the NoSQL injection attacks discussed in the following section. Even considering these flaws, MongoDB can be a good candidate for big data prototyping scenarios. However, in its present state, it is not the best choice for a sustainable production big data infrastructure in translational radiation oncology research.
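As a rough illustration of the document model and ad hoc querying described above (not taken from the chapter), the following PyMongo sketch inserts a document with a native date field and runs a simple range query; the connection string, database, collection, and field names are all assumptions.

```python
# A minimal PyMongo sketch, assuming a local MongoDB instance and a
# hypothetical "patients" collection; names and fields are illustrative only.
from datetime import datetime
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
patients = client["oncology"]["patients"]

# Documents are plain dictionaries; datetime values are stored as native
# BSON dates rather than strings.
patients.insert_one({
    "patient_id": "P-001",
    "diagnosis": "NSCLC",
    "enrolled": datetime(2018, 5, 14),
})

# Ad hoc query: all patients enrolled after a given date.
for doc in patients.find({"enrolled": {"$gt": datetime(2018, 1, 1)}}):
    print(doc["patient_id"], doc["enrolled"])
```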
Performance of serial and parallel processing on online sentiment analysis of fast food restaurants
Published in Shin-ya Nishizaki, Masayuki Numao, Jaime Caro, Merlin Teodosia Suarez, Theory and Practice of Computation, 2019
B. Quijano, M.R. Nabus, L.L. Figueroa
MongoDB is a document-oriented DBMS that manages “documents”, i.e., sets of key-value or key-array pairs that can be semi-structured, unlike records in relational DBMSs. Its documents are stored in the Binary JavaScript Object Notation (BSON) format (MongoDB Inc., 2017), which allows it to store JSON-formatted data as is, such as the raw data streams provided by the Twitter APIs (Twitter Inc., 2017). Previous studies, such as those by Oliveira et al. (2013) and Bing et al. (2014), have used MongoDB to store Twitter data for sentiment analysis and for correlating Twitter sentiment with stock prices.
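A brief sketch of what storing the JSON format “as is” can look like in practice, using PyMongo; the payload below is a fabricated stand-in for a tweet, and the database and collection names are assumptions.

```python
# Hedged sketch: persisting a raw JSON tweet payload without reshaping it.
# The payload, database, and collection names are illustrative assumptions.
import json
from pymongo import MongoClient

tweets = MongoClient("mongodb://localhost:27017")["sentiment"]["tweets"]

raw = '{"id_str": "123", "text": "The fries were great!", ' \
      '"user": {"screen_name": "foodie"}, "entities": {"hashtags": []}}'
tweets.insert_one(json.loads(raw))  # nested objects and arrays are stored as is

# Later, a nested field can be queried with dot notation.
print(tweets.count_documents({"user.screen_name": "foodie"}))
```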
Eye forms – Online form invigilator
Published in Sangeeta Jadhav, Rahul Desai, Ashwini Sapkal, Application of Communication Computational Intelligence and Learning, 2022
Shardul Nimbalkar, Mehul Khandhadiya, Shashank Tapas, Saurabh Khandagale, Yash Jain, Vrushali K. Bongirwar
The heart of any web application that involves user interaction is user data. An organized collection of such data is called a database. The database used here is MongoDB, a NoSQL database that stores data in the form of documents. The basic unit of a MongoDB database is a collection, which in turn consists of documents whose data are stored as key-value pairs [9].
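A tiny PyMongo sketch of the document/collection idea, assuming a hypothetical collection of submitted form responses; all names are illustrative and not taken from the paper.

```python
# Minimal sketch: one collection, one document made of key-value pairs.
# Database, collection, and field names are assumptions for illustration.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["eyeforms"]
responses = db["responses"]  # a collection holds documents

responses.insert_one({       # a document is a set of key-value pairs
    "form_id": "quiz-42",
    "student": "user_17",
    "answers": {"q1": "B", "q2": "D"},
    "flagged_events": 0,
})

print(responses.find_one({"form_id": "quiz-42"}))
```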
Are NoSQL Databases Affected by Schema?
Published in IETE Journal of Research, 2023
Neha Bansal, Shelly Sachdeva, Lalit K. Awasthi
MongoDB stores data in a binary format called BSON, which stands for “Binary JSON.” BSON is a flexible, efficient, easy-to-parse binary representation of JSON-like documents. In MongoDB, data is stored in collections, analogous to tables in a relational database. Each document in a collection is a BSON object that can contain multiple fields, including nested objects and arrays. Documents are not required to have a fixed schema, so different documents in the same collection can have different fields. MongoDB uses a flexible indexing system to optimise query performance. By default, every collection has a primary index on the _id field, a unique identifier for each document. MongoDB also supports a wide range of secondary indexes, including indexes on nested fields and multi-key indexes on arrays. MongoDB does not support joins; instead, it offers embedding or referencing of related data. The logical representation of MongoDB is given in Figure 1.
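As a rough PyMongo illustration of the indexing and embed-versus-reference choices described above (collection and field names are assumptions, not taken from the article):

```python
# Sketch of secondary/multi-key indexes and of embedding versus referencing.
# All database, collection, and field names are illustrative assumptions.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["shop"]

# Embedded design: order lines live inside the order document itself.
db.orders.insert_one({
    "customer": {"name": "Ada", "city": "Pune"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
})

# Secondary index on a nested field, and a multi-key index on an array field.
db.orders.create_index("customer.city")
db.orders.create_index("items.sku")  # multi-key: one index entry per array element

# Referenced design: store only the customer's _id and "join" in application code.
cust_id = db.customers.insert_one({"name": "Ada", "city": "Pune"}).inserted_id
db.orders_ref.insert_one({"customer_id": cust_id, "items": [{"sku": "A1", "qty": 2}]})
order = db.orders_ref.find_one({"customer_id": cust_id})
customer = db.customers.find_one({"_id": order["customer_id"]})  # extra query instead of a join
```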
Weighted holoentropy-based features with optimised deep belief network for automatic sentiment analysis: reviewing product tweets
Published in Journal of Experimental & Theoretical Artificial Intelligence, 2023
Hema Krishnan, M. Sudheep Elayidom, T. Santhanakrishnan
MongoDB (Sankarapandi et al., 2015) is a document-oriented database designed to offer high performance, easy scalability, and high availability. MongoDB is organized around the concepts of documents and collections. The database is the physical container for collections, and a single MongoDB deployment can host numerous databases. A sharded MongoDB deployment comprises three major components: shard nodes, routing services (mongos), and configuration servers. Every MongoDB cluster contains one or more shards, and the shard nodes jointly store the actual data of the database. The data in each shard are held by either a single node or a replica set, and these nodes serve read and write queries for the shard. MongoDB supports dynamic queries on documents through a document-oriented query language that is nearly as powerful as SQL, with the data stored in the form of JSON-style documents. The design of the MongoDB model is shown in Figure 1.
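A hedged sketch of how these components appear from a client's perspective: the driver connects to a mongos router, and sharding is enabled through standard admin commands; the host, database, collection, and shard-key names below are assumptions.

```python
# Illustrative sketch only: connect through mongos and shard a collection.
# Host, database, collection, and shard-key names are assumptions.
from pymongo import MongoClient

# Clients talk to the mongos routing service rather than to a shard directly.
client = MongoClient("mongodb://mongos-host:27017")

# Enable sharding for a database and shard a collection on a hashed key;
# the configuration servers track which data chunks live on which shard.
client.admin.command("enableSharding", "tweets_db")
client.admin.command("shardCollection", "tweets_db.tweets",
                     key={"tweet_id": "hashed"})

# Ordinary JSON-style documents and queries are routed to the relevant shards.
db = client["tweets_db"]
db.tweets.insert_one({"tweet_id": 1, "text": "great burger", "lang": "en"})
print(db.tweets.find_one({"lang": "en"}))
```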
Integrating memory-mapping and N-dimensional hash function for fast and efficient grid-based climate data query
Published in Annals of GIS, 2021
Mengchao Xu, Liang Zhao, Ruixin Yang, Jingchao Yang, Dexuan Sha, Chaowei Yang
To compare the performance of LotDB with other databases, three popular databases are chosen: PostgreSQL 9.3 (relational database), MongoDB 4.0.4 (NoSQL database), and SciDB 18.1 (array database). PostgreSQL is an open-source object-relational database management system; it has been in development for about 30 years and has maintained very stable performance across different domains. MongoDB is a document-oriented NoSQL database system that follows a schema-free design and is one of the most popular databases for modern applications. SciDB, as mentioned in the previous section, is a high-performance array database designed specifically for storing and querying scientific datasets. All these databases are installed in standalone mode on individual servers with the same hardware configuration: Intel Xeon CPU X5660 @ 2.8 GHz × 24 with 24 GB RAM and a 7200 rpm HDD, running CentOS 6.6 or Ubuntu 14.04. Data were uploaded to each database, and pre-processing was done for databases that do not support the NetCDF format. The databases are evaluated in the following aspects: (1) data uploading and pre-processing time, (2) data storage consumption, and (3) spatiotemporal query run-time. Different spatiotemporal queries are designed to evaluate the performance of the selected databases (Table 1) for the year 2017, with a raw data size of 3.45 GB. In addition to the 3.45 GB dataset, raw data sizes of 10 MB, 100 MB, 1 GB, and 10 GB were chosen to evaluate data storage consumption in the different databases. The number of grid points is the estimated number of array cells to be retrieved for the corresponding query. The query run-time refers to the elapsed real time, or wall-clock time, in this experiment.
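As a rough sketch of how a wall-clock timing of such a spatiotemporal query could be taken against MongoDB (the one-document-per-grid-point layout and all names below are assumptions, not the schema used in the paper):

```python
# Hedged sketch: timing a spatiotemporal range query with wall-clock time.
# The collection layout (one document per grid point with lat/lon/time/value)
# and all names are assumptions, not the paper's actual schema.
import time
from pymongo import MongoClient

grid = MongoClient("mongodb://localhost:27017")["climate"]["grid_2017"]

# Spatiotemporal query: a latitude/longitude box over a one-month window.
query = {
    "lat":  {"$gte": 30.0, "$lte": 45.0},
    "lon":  {"$gte": -110.0, "$lte": -90.0},
    "time": {"$gte": "2017-07-01", "$lt": "2017-08-01"},
}

start = time.perf_counter()
n_cells = sum(1 for _ in grid.find(query))  # force retrieval of all matching cells
elapsed = time.perf_counter() - start
print(f"{n_cells} grid points retrieved in {elapsed:.2f} s (wall-clock)")
```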