VoltDB updates database with Web links

Responding to a growing user base of online service providers, VoltDB has outfitted its namesake in-memory database management system with additional tools to communicate with other technologies usually found in a Web applications stack.

Responding to a growing user base of online service providers, VoltDB has outfitted its namesake in-memory database management system with additional tools to communicate with other technologies usually found in a Web applications stack.

DSPL Tools

DSPL Tools is a small suite of command-line utilities designed to help generate, organize, and validate DSPL datasets. The suite currently includes the following components:

  • DSPL Check: Checks a dataset against a variety of criteria including adherence to the official DSPL schema, consistency of internal references, and CSV layout.
  • DSPL Gen: Generates a simple, DSPL dataset “template” from an input CSV file

This software is released under a BSD license; the full source code is available for browsing and download on the DSPL open source site. Release notes are provided in the DSPL Tools README file.


DSPL Developer Guide

DSPL stands for Dataset Publishing Language. It is a representation format for both the metadata (information about the dataset, such as its name and provider, as well as the concepts it contains and displays) and actual data of datasets. Datasets described in this format can be imported into the Google Public Data Explorer, a tool that allows for rich, visual exploration of the data.

Note: To upload data to Google Public Data using the Public Data upload tool, you must have a Google Account.

This document is intended for data owners who want their content to be available in the Public Data Explorer. It goes beyond the Tutorial by diving deeper into the details of the DSPL schema and supported features. Only a basic familiarity of XML is assumed, although knowledge of relational databases is also useful.

Although not a requirement, we suggest reading through the Tutorial, which is shorter and easier to digest, before looking at this document.

An identifying relationship

An identifying relationship “describes a situation in which the existence of a row in the child table depends on a row in the parent table.”

“if a child identifies its parent, it is an identifying relationship.”

The technical definition of an identifying relationship is that a child’s foreign key is part of its primary key.

CREATE TABLE AuthoredBook (
  author_id INT NOT NULL,
  book_id INT NOT NULL,
  PRIMARY KEY (author_id, book_id),
  FOREIGN KEY (author_id) REFERENCES Authors(author_id),
  FOREIGN KEY (book_id) REFERENCES Books(book_id)
);

See? book_id is a foreign key, but it’s also one of the columns in the primary key. So this table has an identifying relationship with the referenced table Books. Likewise it has an identifying relationship with Authors.

A comment on a YouTube video has an identifying relationship with the respective video. The video_id should be part of the primary key of the Comments table.

CREATE TABLE Comments (
  video_id INT NOT NULL,
  user_id INT NOT NULL,
  comment_dt DATETIME NOT NULL,
  PRIMARY KEY (video_id, user_id, comment_dt),
  FOREIGN KEY (video_id) REFERENCES Videos(video_id),
  FOREIGN KEY (user_id) REFERENCES Users(user_id)
);

It may be hard to understand this because it’s such common practice these days to use only a serial surrogate key instead of a compound primary key:

CREATE TABLE Comments (
  comment_id SERIAL PRIMARY KEY,
  video_id INT NOT NULL,
  user_id INT NOT NULL,
  comment_dt DATETIME NOT NULL,
  FOREIGN KEY (video_id) REFERENCES Videos(video_id),
  FOREIGN KEY (user_id) REFERENCES Users(user_id)
);

This can obscure cases where the tables have an identifying relationship.

I would not consider SSN to represent an identifying relationship. Some people exist but do not have an SSN. Other people may file to get a new SSN. So the SSN is really just an attribute, not part of the person’s primary key.

You can take a look at MySQL Manual, explaining how to add Foreign Keys on MySQL Workbench as well.

Adding Foreign Key Relationships Using an EER Diagram

The vertical toolbar on the left side of an EER Diagram has six foreign key tools:

  • one-to-one non-identifying relationship
  • one-to-many non-identifying relationship
  • one-to-one identifying relationship
  • one-to-many identifying relationship
  • many-to-many identifying relationship
  • Place a Relationship Using Existing Columns

An identifying relationship is one where the child table cannot be uniquely identified without its parent. Typically this occurs where an intermediary table is created to resolve a many-to-many relationship. In such cases, the primary key is usually a composite key made up of the primary keys from the two original tables. An identifying relationship is indicated by a solid line between the tables and a nonidentifying relationship is indicated by a broken line.

Create or drag and drop the tables that you wish to connect. Ensure that there is a primary key in the table that will be on the “one” side of the relationship. Click on the appropriate tool for the type of relationship you wish to create. If you are creating a one-to-many relationship, first click the table that is on the “many” side of the relationship, then on the table containing the referenced key. This creates a column in the table on the many side of the relationship. The default name of this column is table_name_key_name where the table name and the key name both refer to the table containing the referenced key.

When the many-to-many tool is active, double-clicking a table creates an associative table with a many-to-many relationship. For this tool to function there must be a primary key defined in the initial table.

Use the Model menu, Menu Options menu item to set a project-specific default name for the foreign key column (see Section 8.5.1.5.4, “The Relationship Notation Submenu”). To change the global default, see Section 6.4.5, “The Model Tab”.

To edit the properties of a foreign key, double-click anywhere on the connection line that joins the two tables. This opens the relationship editor.

Mousing over a relationship connector highlights the connector and the related keys as shown in the following figure. The film and the film_actor tables are related on the film_id field and these fields are highlighted in both tables. Since the film_id field is part of the primary key in the film_actor table, a solid line is used for the connector between the two tables.

PostgreSQL

PostgreSQL, often simply Postgres, is an object-relational database management system (ORDBMS) available for many platforms including Linux, FreeBSD, Solaris, Microsoft Windows and Mac OS X.[4] It is released under the PostgreSQL License, which is an MIT-style license, and is thus free and open source software. PostgreSQL is developed by the PostgreSQL Global Development Group, consisting of a handful of volunteers employed and supervised by companies such as Red Hat and EnterpriseDB.[5] It implements the majority of the SQL:2008 standard,[6] is ACID-compliant, is fully transactional (including all DDL statements), has extensible data types, operators, index methods, functions, aggregates, procedural languages, and has a large number of extensions written by third parties.

The vast majority of Linux distributions have PostgreSQL available in supplied packages. Mac OS X, starting with Lion, has PostgreSQL server as its standard default database in the server edition,[7][8] and PostgreSQL client tools in the desktop edition.

PostgreSQL has bindings for many programming languages such as C, C++, Python, Java, PHP, Ruby… It can be used to power anything from simple web applications to massive databases with millions of records.

 

Client Installation

If you only wish to connect to a PostgreSQL server, do not install the whole PostgreSQL package, but install the PostgreSQL client instead. To do this, use the following command

 

 sudo apt-get install postgresql-client

you then connect to the server with the following command

 

 psql -h server.domain.org database user

 

After you inserted the password you access PostgreSQL with line commands. You may for instance insert the following

 

 SELECT * FROM table WHERE 1;

You exit the connection with

 q

 

Installing PostgreSQL Database

  1. Install PostgreSQL using the apt-get command in gnome-terminal:
    1
    sudo apt-get install postgresql libpq-dev
  2. After installation is complete, change user to the PostgreSQL user:
    1
    sudo su - postgres
  3. You are now working as the PostgreSQL user. Now, let’s change your database password to be more robust. In this example, I’m setting the password as “s0meth1ng”:
    1
    2
    3
    4
    ~$: psql -d postgres -U postgres
    psql (9.1.3) Type "help" for help.
    postgres=# alter user postgres with password 's0meth1ng'; ALTER ROLE
    postgres=# q
  4. Restart the PostgreSQL database to let the changes take effect:
    1
    sudo /etc/init.d/postgresql restart

Installing and Setting Up pgAdmin III

  1. Install pgAdmin III using the apt-get command in gnome-terminal:
    1
    sudo apt-get install pgadmin3
  2. Once installed, you can launch pgAdmin III by quick-launching it in Alt-F2, then typing pgadmin3.
  3. Now, let’s add a new PostgreSQL database server to the list of servers. Go to File > Add Server, and enter the details as the following screenshot:
    pgAdmin III Server Configuration
  4. Once that is done, you’ll now see your new server at the list of servers on the left pane. Go ahead, and create your database. Have fun!
    List of Database Servers in pgAdmin III

 

Administration

pgAdmin III is a handy GUI for PostgreSQL, it is essential to beginners. To install it, type at the command line:

 

 sudo apt-get install pgadmin3

You may also use the Synaptic package manager from the System>Administration menu to install these packages.

 

Basic Server Setup

To start off, we need to change the PostgreSQL postgres user password; we will not be able to access the server otherwise. As the “postgres” Linux user, we will execute the psql command.

In a terminal, type:

 

sudo -u postgres psql postgres

Set a password for the “postgres” database role using the command:

password postgres

and give your password when prompted. The password text will be hidden from the console for security purposes.

Type Control+D to exit the posgreSQL prompt.

 

Create database

To create the first database, which we will call “mydb”, simply type:

 

 sudo -u postgres createdb mydb

 

Install Server Instrumentation for Postgresql 8.4 or 9.1

To install Server Instrumentation, you must install postgresql-contrib:

 

 sudo apt-get install postgresql-contrib

For “”Postgresql 9.1″”+ install the adminpack “extension”:

 

 sudo -u postgres psql
 CREATE EXTENSION adminpack;

 

Alternative Server Setup

If you don’t intend to connect to the database from other machines, this alternative setup may be simpler.

By default in Ubuntu, Postgresql is configured to use ‘ident sameuser’ authentication for any connections from the same machine. Check out the excellent Postgresql documentation for more information, but essentially this means that if your Ubuntu username is ‘foo’ and you add ‘foo’ as a Postgresql user then you can connect to the database without requiring a password.

Since the only user who can connect to a fresh install is the postgres user, here is how to create yourself a database account (which is in this case also a database superuser) with the same name as your login name and then create a password for the user:

 sudo -u postgres createuser --superuser $USER
 sudo -u postgres psql

 

 postgres=# password $USER

Client programs, by default, connect to the local host using your Ubuntu login name and expect to find a database with that name too. So to make things REALLY easy, use your new superuser privileges granted above to create a database with the same name as your login name:

 createdb $USER

Connecting to your own database to try out some SQL should now be as easy as:

 psql

To create a database with a user that have full rights on the database, use the following command:

 

sudo -u postgres createuser -D -A -P myuser
sudo -u postgres createdb -O myuser mydb

The first command line creates the user with no database creation rights (-D) with no add user rights -A) and will prompt you for entering a password (-P). The second command line create the database ‘mydb with ‘myuser‘ as owner.

This little example will probably suit most of your needs. For more details, please refer to the corresponding man pages or the online documentation.

$ psql -d postgres
postgres=# create role app_name login createdb;
postgres=# q

SQL Dump

The idea behind this dump method is to generate a text file with SQL commands that, when fed back to the server, will recreate the database in the same state as it was at the time of the dump. PostgreSQL provides the utility program pg_dump for this purpose. The basic usage of this command is:

pg_dump dbname > outfile

As you see, pg_dump writes its result to the standard output. We will see below how this can be useful.

pg_dump is a regular PostgreSQL client application (albeit a particularly clever one). This means that you can perform this backup procedure from any remote host that has access to the database. But remember that pg_dump does not operate with special permissions. In particular, it must have read access to all tables that you want to back up, so in practice you almost always have to run it as a database superuser.

To specify which database server pg_dump should contact, use the command line options -h host and -p port. The default host is the local host or whatever your PGHOST environment variable specifies. Similarly, the default port is indicated by the PGPORT environment variable or, failing that, by the compiled-in default. (Conveniently, the server will normally have the same compiled-in default.)

Like any other PostgreSQL client application, pg_dump will by default connect with the database user name that is equal to the current operating system user name. To override this, either specify the -U option or set the environment variable PGUSER. Remember that pg_dump connections are subject to the normal client authentication mechanisms (which are described in Chapter 19).

An important advantage of pg_dump over the other backup methods described later is that pg_dump’s output can generally be re-loaded into newer versions of PostgreSQL, whereas file-level backups and continuous archiving are both extremely server-version-specific. pg_dump is also the only method that will work when transferring a database to a different machine architecture, such as going from a 32-bit to a 64-bit server.

Dumps created by pg_dump are internally consistent, meaning, the dump represents a snapshot of the database at the time pg_dump began running. pg_dump does not block other operations on the database while it is working. (Exceptions are those operations that need to operate with an exclusive lock, such as most forms of ALTER TABLE.)

Important: If your database schema relies on OIDs (for instance, as foreign keys) you must instruct pg_dump to dump the OIDs as well. To do this, use the -o command-line option.

24.1.1. Restoring the Dump

The text files created by pg_dump are intended to be read in by the psql program. The general command form to restore a dump is

psql dbname < infile

where infile is the file output by the pg_dump command. The database dbname will not be created by this command, so you must create it yourself from template0 before executing psql (e.g., with createdb -T template0 dbname). psql supports options similar to pg_dump for specifying the database server to connect to and the user name to use. See the psql reference page for more information.

Before restoring an SQL dump, all the users who own objects or were granted permissions on objects in the dumped database must already exist. If they do not, the restore will fail to recreate the objects with the original ownership and/or permissions. (Sometimes this is what you want, but usually it is not.)

By default, the psql script will continue to execute after an SQL error is encountered. You might wish to run psql with the ON_ERROR_STOP variable set to alter that behavior and have psql exit with an exit status of 3 if an SQL error occurs:

psql --set ON_ERROR_STOP=on dbname < infile

Either way, you will only have a partially restored database. Alternatively, you can specify that the whole dump should be restored as a single transaction, so the restore is either fully completed or fully rolled back. This mode can be specified by passing the -1 or --single-transaction command-line options to psql. When using this mode, be aware that even a minor error can rollback a restore that has already run for many hours. However, that might still be preferable to manually cleaning up a complex database after a partially restored dump.

The ability of pg_dump and psql to write to or read from pipes makes it possible to dump a database directly from one server to another, for example:

pg_dump -h host1 dbname | psql -h host2 dbname

Important: The dumps produced by pg_dump are relative to template0. This means that any languages, procedures, etc. added via template1 will also be dumped by pg_dump. As a result, when restoring, if you are using a customized template1, you must create the empty database from template0, as in the example above.

After restoring a backup, it is wise to run ANALYZE on each database so the query optimizer has useful statistics; see Section 23.1.3 and Section 23.1.5 for more information. For more advice on how to load large amounts of data into PostgreSQL efficiently, refer to Section 14.4.

24.1.2. Using pg_dumpall

pg_dump dumps only a single database at a time, and it does not dump information about roles or tablespaces (because those are cluster-wide rather than per-database). To support convenient dumping of the entire contents of a database cluster, the pg_dumpall program is provided. pg_dumpall backs up each database in a given cluster, and also preserves cluster-wide data such as role and tablespace definitions. The basic usage of this command is:

pg_dumpall > outfile

The resulting dump can be restored with psql:

psql -f infile postgres

(Actually, you can specify any existing database name to start from, but if you are loading into an empty cluster then postgres should usually be used.) It is always necessary to have database superuser access when restoring a pg_dumpall dump, as that is required to restore the role and tablespace information. If you use tablespaces, make sure that the tablespace paths in the dump are appropriate for the new installation.

pg_dumpall works by emitting commands to re-create roles, tablespaces, and empty databases, then invoking pg_dump for each database. This means that while each database will be internally consistent, the snapshots of different databases might not be exactly in-sync.

24.1.3. Handling Large Databases

Some operating systems have maximum file size limits that cause problems when creating large pg_dump output files. Fortunately, pg_dump can write to the standard output, so you can use standard Unix tools to work around this potential problem. There are several possible methods:

Use compressed dumps. You can use your favorite compression program, for example gzip:

pg_dump dbname | gzip > filename.gz

Reload with:

gunzip -c filename.gz | psql dbname

or:

cat filename.gz | gunzip | psql dbname

Use split. The split command allows you to split the output into smaller files that are acceptable in size to the underlying file system. For example, to make chunks of 1 megabyte:

pg_dump dbname | split -b 1m - filename

Reload with:

cat filename* | psql dbname

Use pg_dump’s custom dump format. If PostgreSQL was built on a system with the zlib compression library installed, the custom dump format will compress data as it writes it to the output file. This will produce dump file sizes similar to using gzip, but it has the added advantage that tables can be restored selectively. The following command dumps a database using the custom dump format:

pg_dump -Fc dbname > filename

A custom-format dump is not a script for psql, but instead must be restored with pg_restore, for example:

pg_restore -d dbname filename

See the pg_dump and pg_restore reference pages for details.

For very large databases, you might need to combine split with one of the other two approaches.

psql -d myDataBase -a -f myInsertFile

Have three choices to supply a password:

  1. set the PGPASSWORD environment variable. For details see the manual: http://www.postgresql.org/docs/current/static/libpq-envars.html
  2. use a .pgpass file to store the password. For details see the manual: http://www.postgresql.org/docs/current/static/libpq-pgpass.html
  3. use “trust authentication” for that specific user: http://www.postgresql.org/docs/current/static/auth-methods.html#AUTH-TRUST

DbVisualizer is a Java Swing app which can generate relation graphs from any JDBC source (including PostgreSQL). I’ve found the best way to view the generated graph (after you have found a layout you like) is to print it to PDF and use Preview.app to view the result. The built-in view is somewhat lacking.

There are also a few Graphviz options around including AutoDoc (example output). This may be a better option if you are automating the documentation generation. With a bit of work you can style the output quite a bit.

double-click on the DB/schema in the navigator pane and then select the “References” tab. If you select the whole DB it’ll give you all of the system tables too — you can filter out tables that you don’t want by selecting the “Specified Tables” option.

You can also try out SQL*Power Architect

To download it without the registration go directly to the project page on google code:

http://code.google.com/p/power-architect/

psql

Name

psql —  PostgreSQL interactive terminal

Synopsis

psql [option…] [dbname [username]]

Description

psql is a terminal-based front-end to PostgreSQL. It enables you to type in queries interactively, issue them to PostgreSQL, and see the query results. Alternatively, input can be from a file. In addition, it provides a number of meta-commands and various shell-like features to facilitate writing scripts and automating a wide variety of tasks.

Options

-a
--echo-all
Print all input lines to standard output as they are read. This is more useful for script processing than interactive mode. This is equivalent to setting the variable ECHO to all.

-A
--no-align
Switches to unaligned output mode. (The default output mode is otherwise aligned.)

-c command
--command command
Specifies that psql is to execute one command string, command, and then exit. This is useful in shell scripts. Start-up files (psqlrc and ~/.psqlrc) are ignored with this option.

command must be either a command string that is completely parsable by the server (i.e., it contains no psql-specific features), or a single backslash command. Thus you cannot mix SQL and psql meta-commands with this option. To achieve that, you could pipe the string into psql, like this: echo 'x \ SELECT * FROM foo;' | psql. (\ is the separator meta-command.)

If the command string contains multiple SQL commands, they are processed in a single transaction, unless there are explicit BEGIN/COMMIT commands included in the string to divide it into multiple transactions. This is different from the behavior when the same string is fed to psql’s standard input.

-d dbname
--dbname dbname
Specifies the name of the database to connect to. This is equivalent to specifying dbname as the first non-option argument on the command line.

If this parameter contains an = sign, it is treated as a conninfo string. See Section 31.1 for more information.

-e

JSON

JSON (/ˈsɒn/ JAY-sawn, /ˈsən/ JAY-sun), or JavaScript Object Notation, is a text-based open standard designed for human-readable data interchange. It is derived from the JavaScript scripting language for representing simple data structures and associative arrays, called objects. Despite its relationship to JavaScript, it is language-independent, with parsers available for many languages.

The JSON format was originally specified by Douglas Crockford, and is described in RFC 4627. The official Internet media type for JSON is application/json. The JSON filename extension is .json.

The JSON format is often used for serializing and transmitting structured data over a network connection. It is used primarily to transmit data between a server and web application, serving as an alternative to XML.

JSON’s basic types are:

  • Number (double precision floating-point format in JavaScript, generally depends on implementation)
  • String (double-quoted Unicode, with backslash escaping)
  • Boolean (true or false)
  • Array (an ordered sequence of values, comma-separated and enclosed in square brackets; the values do not need to be of the same type)
  • Object (an unordered collection of key:value pairs with the ‘:’ character separating the key and the value, comma-separated and enclosed in curly braces; the keys must be strings and should be distinct from each other)
  • null (empty)

Non-significant white space may be added freely around the “structural characters” (i.e. brackets “{ } [ ]”, colons “:” and commas “,”).

The following example shows the JSON representation of an object that describes a person. The object has string fields for first name and last name, a number field for age, an object representing the person’s address and an array of phone number objects.

{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": 10021
    },
    "phoneNumbers": [
        {
            "type": "home",
            "number": "212 555-1234"
        },
        {
            "type": "fax",
            "number": "646 555-4567"
        }
    ]
}

One potential pitfall of the free-form nature of JSON comes from the ability to write numbers as either numeric literals or quoted strings. For example, ZIP Codes in the northeastern U.S. begin with zeroes (for example, 07728 for Freehold, New Jersey). If written with quotes by one programmer but not by another, the leading zero could be dropped when exchanged between systems, when searched for within the same system, or when printed. In addition, postal codes in the U.S. are numbers but other countries use letters as well. This is a type of problem that the use of a JSON Schema (see below) is intended to reduce.

Since JSON is almost a subset of JavaScript, it is possible, but not recommended,[7] to parse most JSON text into an object by invoking JavaScript’s eval() function. For example, if the above JSON data is contained within a JavaScript string variable contact, one could use it to create the JavaScript object p as follows:

 var p = eval("(" + contact + ")");

The contact variable must be wrapped in parentheses to avoid an ambiguity in JavaScript’s syntax.[8]

The recommended way, however, is to use a JSON parser. Unless a client absolutely trusts the source of the text, or must parse and accept text that is not strictly JSON-compliant, one should avoid eval(). A correctly implemented JSON parser accepts only valid JSON, preventing potentially malicious code from being executed inadvertently.

 var p = JSON.parse(contact);

Browsers, such as Firefox 4 and Internet Explorer 8, include special features for parsing JSON. As native browser support is more efficient and secure than eval(), native JSON support is included in the recently-released Edition 5 of the ECMAScript standard.[9]

jQuery library wrap JSON object in function constructor and execute it immediately if JSON.parse is not present. This avoid using eval in the code.

 var p = new Function('return ' + contact ';')();

Despite the widespread belief that JSON is a JavaScript subset, this is not the case. Specifically, JSON allows the Unicode line terminators U+2028 line separator and U+2029 paragraph separator to appear unescaped in quoted strings, while JavaScript does not.[10] This is a consequence of JSON disallowing only “control characters”. This subtlety is important when generating JSONP.