Software and Technology Standards as Tools
Published in Jim Goodell, Janet Kolodner, Learning Engineering Toolkit, 2023
Jim Goodell, Andrew J. Hampton, Richard Tong, Sae Schatz
Data architectures address the structure of data and the capabilities of associated data platforms. Data platforms are the various components required to acquire, store, prepare, deliver, and manage your data (along with the requisite security).10 Most data platforms include one or more databases, including relational databases (for example, built using SQL) and/or non-relational or NoSQL databases (such as object databases, graph databases, document stores, key/value stores, triple/quad stores, and hybrid platforms). The different database types each have strengths and weaknesses. For instance, SQL databases need a predefined schema and structured data, while NoSQL databases can handle dynamic schemas and unstructured data. The different database types also scale and perform differently, depending on how they’re accessed and how data are structured within them.11 For example, object databases are a convenient choice for applications built using object query languages, and they can handle complex relationships between objects. Meanwhile, graph databases excel at managing highly connected data and complex queries where the relationships between data elements are as important as the data elements themselves.
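The schema contrast described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module for the relational side and a plain dictionary as a stand-in for a schemaless document store; the table name, column names, and sample records are hypothetical and not taken from the source.

```python
import sqlite3

# Relational side: the schema is declared before any row can be stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE learner (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO learner (id, name) VALUES (?, ?)", (1, "Ada"))

# Writing to a column the schema does not define is rejected by the database.
try:
    conn.execute(
        "INSERT INTO learner (id, name, badge) VALUES (?, ?, ?)",
        (2, "Lin", "gold"),
    )
    schema_enforced = False
except sqlite3.OperationalError:
    schema_enforced = True

rows = conn.execute("SELECT id, name FROM learner").fetchall()

# Schemaless side (assumption: a dict stands in for a document store).
# Each record may carry different fields, including unstructured text.
documents = {}
documents["learner:1"] = {"name": "Ada"}
documents["learner:2"] = {"name": "Lin", "badge": "gold", "notes": ["free text"]}
```

The relational table refuses the unexpected `badge` field, while the dictionary accepts records of different shapes; the cost is that nothing checks those records for consistency.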
It's All about the Data
Published in James Luke, David Porter, Padmanabhan Santhanam, Beyond Algorithms, 2022
James Luke, David Porter, Padmanabhan Santhanam
The format of the data collected can vary widely depending on the source and the intended purpose. Structured data refers to the format of the data that facilitates queries against it in a pre-designed and engineered way. There are three mainstream structured data representations: Traditionally, enterprise data used for business analytics has been in the form of tables (e.g. Microsoft Excel spreadsheets) with columns representing attributes (or features) and rows representing observations. Relational Databases (e.g. IBM DB2, Oracle, etc.) that have schemas and linking tables are used for complex large-scale data and high-performance needs. Structured Query Language (SQL) allows queries against relational databases. Semantic Web is a knowledge representation promoted by the World Wide Web Consortium and Tim Berners-Lee to make the internet data machine readable. It consists of a Resource Description Framework (RDF), Web Ontology Language (OWL) and Extensible Markup Language (XML). SPARQL is the language to perform ‘semantic’ queries. Recently, graph databases (e.g. Neo4j, JanusGraph, etc.) that use nodes and edges to represent relationships between entities have gained wider acceptance. Cypher and Gremlin are examples of query languages for graph databases.
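To make the idea of a 'semantic' query concrete, here is a minimal sketch of subject-predicate-object triples with wildcard pattern matching, written in plain Python. It is not RDF or SPARQL itself; the predicate names and the `match` helper are hypothetical and only mimic the shape of a triple-pattern query.

```python
# Each fact is a (subject, predicate, object) triple, the basic shape of RDF data.
# The facts restate the excerpt; the predicate names are illustrative assumptions.
triples = [
    ("W3C", "promotes", "SemanticWeb"),
    ("TimBernersLee", "promotes", "SemanticWeb"),
    ("SemanticWeb", "includes", "RDF"),
    ("SemanticWeb", "includes", "OWL"),
    ("SemanticWeb", "includes", "XML"),
]


def match(pattern, facts):
    """Return the facts that fit a pattern; None in the pattern is a wildcard."""
    return [
        fact
        for fact in facts
        if all(p is None or p == f for p, f in zip(pattern, fact))
    ]


# "What does the Semantic Web include?" expressed as a triple pattern.
included = sorted(obj for _, _, obj in match(("SemanticWeb", "includes", None), triples))

# "Who promotes the Semantic Web?" uses a wildcard in the subject position.
promoters = sorted(subj for subj, _, _ in match((None, "promotes", "SemanticWeb"), triples))
```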
Big Data Computing
Published in Vivek Kale, Parallel Computing Architectures and APIs, 2019
NoSQL databases have been classified into four subcategories: Column family stores: An extension of the key–value architecture with columns and column families; the overall goal was to process distributed data over a pool of infrastructure, for example, HBase and Cassandra. Key–value pairs: This model is implemented using a hash table where there is a unique key and a pointer to a particular item of data, creating a key–value pair, for example, Voldemort. Document databases: This class of databases is modeled after Lotus Notes and is similar to key–value stores. The data is stored as a document and is represented in JSON or XML formats. The biggest design feature is the flexibility to list multiple levels of key–value pairs, for example, Riak and CouchDB. Graph databases: Based on graph theory, this class of database supports scalability across a cluster of machines. The complexity of representation for extremely complex sets of documents is evolving, for example, Neo4j.
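The difference between a key–value pair and a document with multiple levels of key–value pairs can be sketched with plain Python dictionaries. This is only an in-memory illustration, not the API of any of the products named above; the keys, the sample order, and the `get_path` helper are hypothetical.

```python
import json

# Key-value store: a hash table maps a unique key to one opaque value.
kv_store = {}
kv_store["session:42"] = b"opaque-bytes"

# Document store: the value is itself a structure of nested key-value pairs
# that can be serialized to JSON.
doc_store = {}
doc_store["order:7"] = {
    "customer": {"name": "Ada", "address": {"city": "Oslo"}},
    "items": [{"sku": "A1", "qty": 2}],
}


def get_path(document, path):
    """Follow a sequence of keys down through nested dictionaries."""
    current = document
    for key in path:
        current = current[key]
    return current


# The store can look inside a document, which an opaque value does not allow.
city = get_path(doc_store["order:7"], ["customer", "address", "city"])
serialized = json.dumps(doc_store["order:7"], sort_keys=True)
```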
Analysing the past to prepare for the future: Writing a literature review a roadmap for release 2.0
Published in Journal of Decision Systems, 2020
Richard T. Watson, Jane Webster
A graph is composed of nodes and edges. In the domain of literature reviewing, an element of interest (e.g. a concept or process) is a node and a relationship between a pair of elements is an edge. A labelled graph properties database allows nodes and relationships to have properties (Negro, 2018; Robinson et al., 2013). A property of an element might be a concept’s name (e.g. information asymmetry) and its type (e.g. a concept). The property of an edge relationship could be a descriptor of the relationship, such as ‘precedes’ in the case of a process diagram or ‘causes’ for a causal model. Another property could indicate the nature of a relationship, such as causal or temporal. Nodes can also have one or more labels, which are used to group nodes together and indicate one or more roles. Thus, all elements of the same type (e.g. processes) could be so labelled to group them. We suggest that a graph description language (GDL) could help in this regard in defining elements and nomological relationship maps. A graph query language (GQL) is used to query a graph database and provides features similar to SQL for the relational model. ISO is working on specifying a standard GQL based on openCypher and similar languages.3 In this article, we use openCypher (or simply Cypher), which is currently the most widely adopted open query language for graph databases.4 Cypher can be used to define and manipulate property graphs (Appendix A). We now consider relationship maps and their descriptions.
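The property graph model described above (nodes with labels and properties, edges with a type and properties) can be sketched in a few lines of Python. This is an in-memory illustration rather than Cypher or any graph database API; the class names, the helper functions, and the sample nodes other than "information asymmetry" are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    labels: set = field(default_factory=set)        # group nodes by role or type
    properties: dict = field(default_factory=dict)  # e.g. a concept's name


@dataclass
class Edge:
    source: str
    target: str
    rel_type: str                                   # e.g. 'causes' or 'precedes'
    properties: dict = field(default_factory=dict)  # e.g. nature of the relationship


nodes = {
    "n1": Node("n1", {"Concept"}, {"name": "information asymmetry"}),
    "n2": Node("n2", {"Concept"}, {"name": "adverse selection"}),
    "n3": Node("n3", {"Process"}, {"name": "screening"}),
}
edges = [Edge("n1", "n2", "causes", {"nature": "causal"})]


def nodes_with_label(label):
    """Names of all nodes carrying a label, similar to matching on a node label."""
    return sorted(n.properties["name"] for n in nodes.values() if label in n.labels)


def related(source_id, rel_type):
    """Names of nodes reached from source_id over edges of the given type."""
    return [
        nodes[e.target].properties["name"]
        for e in edges
        if e.source == source_id and e.rel_type == rel_type
    ]
```

Calling `nodes_with_label("Concept")` groups elements of the same type, and `related("n1", "causes")` follows a causal relationship from one concept to the next.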
Transit performance assessment based on graph analytics
Published in Transportmetrica A: Transport Science, 2019
Ikechukwu Derek Maduako, Monica Wachowicz, Trevor Hanson
The graph data model was implemented in the Neo4j graph database management system. Neo4j is currently the most popular native graph database and is widely used for graph data management and analytics. The query language in Neo4j is called Cypher, which can be used to implement ad-hoc queries, graph algorithms and User Defined Functions (UDFs). Graph metrics such as PageRank, degree centralities and other user-defined functions can be encoded within the Cypher queries. The computation of the various graph metrics used in this case study was encoded in Cypher through a pipeline of query statements, as shown in the PageRank example in Table 3. The two weeks of AVL data stream generated 4.5 million nodes and edges in the Neo4j database.
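For readers unfamiliar with the metrics named above, the following is a minimal pure-Python sketch of out-degree and iterative PageRank on a toy directed graph. It is not the authors' Cypher pipeline and does not use Neo4j; the stop names, the damping factor of 0.85, and the fixed iteration count are assumptions chosen for illustration.

```python
def out_degree(graph):
    """Number of outgoing edges per node."""
    return {node: len(targets) for node, targets in graph.items()}


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over an adjacency-list graph."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                # A node passes its rank equally to the nodes it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing edges spreads its rank over all nodes.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical toy network: keys are stops, values are directly reachable stops.
stops = {"A": ["B"], "B": ["C"], "C": ["A", "B"], "D": ["C"]}
ranks = pagerank(stops)
degrees = out_degree(stops)
```

In this toy network, stop D has no incoming edges, so it ends up with the lowest rank, while the ranks of all stops sum to one.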
Enhanced adaptive partitioning in a distributed graph database
Published in Journal of Information and Telecommunication, 2021
Lucie Svitáková, Michal Valenta, Jaroslav Pokorný
In recent times, the fast-growing amount of data of various types has drawn attention to NoSQL databases. The high volume of data forces NoSQL systems to distribute the data across several cluster nodes. Data distribution is a nontrivial problem that requires dedicated management. Particularly difficult is the distribution of data in graph databases, since the data often represent a large, highly connected graph. Dividing it into subgraphs leaves an immense number of edges connecting elements on different cluster nodes, which introduces high communication overhead when working with the data.
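The communication overhead mentioned above is often measured by the number of edges whose endpoints land on different cluster nodes (the edge cut). The sketch below counts cut edges for two assignments of a small hypothetical graph; it illustrates the metric only and is not the partitioning method of the article.

```python
def cut_edges(edges, assignment):
    """Edges whose two endpoints are stored on different cluster nodes."""
    return [(u, v) for u, v in edges if assignment[u] != assignment[v]]


# Hypothetical graph: two triangles (a-b-c and d-e-f) joined by one edge c-d.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("c", "d"),
    ("d", "e"), ("e", "f"), ("d", "f"),
]

# Assignment that keeps each triangle on one cluster node.
by_cluster = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

# Assignment that ignores the graph structure and alternates cluster nodes.
alternating = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}

good_cut = len(cut_edges(edges, by_cluster))
poor_cut = len(cut_edges(edges, alternating))
```

With the structure-aware assignment only the bridging edge crosses cluster nodes, whereas the alternating assignment cuts most edges, so far more traversals would require network communication.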