Data explorer


If you do not live in New York City or did not attend Data Gotham, do not worry: nearly all the videos and talks are posted on the Data Gotham 2013 YouTube page.

Logan Symposium: Google Public Data Explorer, from Berkeley Graduate School of Journalism

4th Annual Logan Symposium on Investigative Reporting


Uploaded on Jun 2, 2010


Using Google’s new Public Data Explorer tool, Ola Rosling demonstrates the effectiveness of visualizing datasets. Looking toward the next political election, Rosling hopes voters will use the tool to answer questions like: How was the money spent? Where are the biggest problems?


Ola Rosling of Google Public Data gives a presentation titled “Google Public Data Explorer” at the Berkeley Graduate School of Journalism. This program was recorded on April 18, 2010.

Ola Rosling co-founded the Gapminder Foundation and led the development of Trendalyzer, software that converts time series statistics into animated, interactive, and comprehensible graphics. The aim of his work is to promote a fact-based world view through increased use and understanding of freely accessible public data.

In March 2007, Google acquired the Trendalyzer software. At Google, Rosling and his team are now scaling up their tools and making them freely available for any individual or organization to use for analyzing and visualizing data.

Amazon Prime Air

Amazon Prime Air is a conceptual drone-based delivery system currently in development by Amazon.

On December 1, 2013, Amazon CEO Jeff Bezos revealed plans for Amazon Prime Air in an interview on 60 Minutes. Amazon Prime Air will use multirotor Miniature Unmanned Air Vehicle (Miniature UAV, otherwise known as drone) technology to autonomously fly individual packages to customers’ doorsteps within 30 minutes of ordering.[1] To qualify for 30-minute delivery, the order must weigh less than five pounds (2.27 kg), must be small enough to fit in the cargo box that the craft will carry, and must have a delivery location within a ten-mile radius of a participating Amazon order fulfillment center.[1] 86% of packages sold by Amazon fit the weight qualification of the program.
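The three qualifying conditions above can be written as a single check. The following is only a sketch: the function and constant names are made up for illustration, and nothing here reflects how Amazon actually implements the rule.

```python
# Hypothetical names; the thresholds are the ones quoted in the text above.
MAX_WEIGHT_LB = 5.0    # the order must weigh less than five pounds
MAX_RADIUS_MI = 10.0   # delivery location within ten miles of a fulfillment center
LB_TO_KG = 0.45359237  # exact definition of the avoirdupois pound in kilograms


def lb_to_kg(pounds: float) -> float:
    """Convert pounds to kilograms."""
    return pounds * LB_TO_KG


def qualifies_for_30_minute_delivery(weight_lb: float,
                                     distance_mi: float,
                                     fits_cargo_box: bool) -> bool:
    """Return True only if all three stated conditions hold."""
    return (weight_lb < MAX_WEIGHT_LB
            and distance_mi <= MAX_RADIUS_MI
            and fits_cargo_box)


if __name__ == "__main__":
    print(qualifies_for_30_minute_delivery(3.2, 7.5, True))
    print(qualifies_for_30_minute_delivery(5.0, 7.5, True))
```

An order of exactly five pounds is rejected here because the text says "less than five pounds"; whether the real limit is inclusive is not stated.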


Presently, the biggest hurdle facing Amazon Prime Air is that commercial use of UAV technology is not yet legal in the United States.[2] In the FAA Modernization and Reform Act of 2012, Congress issued the Federal Aviation Administration a deadline of September 30, 2015 to accomplish a “safe integration of civil unmanned aircraft systems into the national airspace system.”[3]

In March 2015 the US Federal Aviation Administration (FAA) granted Amazon permission to begin US testing of a prototype. The company responded by claiming that the vehicle cleared for use was obsolete. In April 2015, the agency allowed the company to begin testing its current models. In the interim, the company had begun testing at a secret Canadian site 2,000 ft (610 m) from the US border.[4]

The agency mandated that Amazon’s drones fly no higher than 400 ft (122 m), no faster than 100 mph (161 km/h), and remain within the pilot’s line of sight. These rules are consistent with a proposed set of FAA guidelines. Ultimately, Amazon hopes to operate in a slice of airspace above 200 ft (61 m) and beneath 500 ft (152 m), with 500 ft being where general aviation begins. It plans to fly drones weighing a maximum of 55 lb (25 kg) within a 10 mi (16 km) radius of its warehouses, at speeds of up to 50 mph (80.5 km/h) with packages weighing up to 5 lb (2.27 kg) in tow.[5]

Public concerns

Public concerns regarding this technology include public safety, privacy, and package security issues.[2] Amazon states that “Safety will be our top priority, and our vehicles will be built with multiple redundancies and designed to commercial aviation standards.”[6] However, while privacy and security remain concerns, the FAA’s recently proposed rules for small UAS operations and certifications cover only technical and functional aspects.[7]

The fact that the drone’s navigational airspace exists below 500 feet is a big step toward safety management.[8]


The drones’ constant connection to the internet raises concerns over personal privacy. The primary purpose of the connection will be to manage flight controls and communication between drones.[9] However, the extent of Amazon’s data collection from the drones is unclear.[10] Some proposed data inputs include automated object detection, GPS surveillance, gigapixel cameras, and enhanced image resolution.[11] Because of this, Amazon’s operating center will collect unknown amounts of information, both intentionally and unintentionally, throughout the delivery process. Neither Amazon nor the FAA has formed a clear policy on the management of this data.

Types of Information System

Why are there different types of Information System?

In the early days of computing, each time an information system was needed it was ‘tailor made’, built as a one-off solution for a particular problem. However, it soon became apparent that many of the problems information systems set out to solve shared certain characteristics. Consequently, people attempted to build a single system that would solve a whole range of similar problems. They soon realized that to do this, it was first necessary to define how and where the information system would be used and why it was needed. It was then that the search for a way to classify information systems accurately began.

DSPL Tools

DSPL Tools is a small suite of command-line utilities designed to help generate, organize, and validate DSPL datasets. The suite currently includes the following components:

  • DSPL Check: Checks a dataset against a variety of criteria including adherence to the official DSPL schema, consistency of internal references, and CSV layout.
  • DSPL Gen: Generates a simple DSPL dataset “template” from an input CSV file.

This software is released under a BSD license; the full source code is available for browsing and download on the DSPL open source site. Release notes are provided in the DSPL Tools README file.

DSPL Developer Guide

DSPL stands for Dataset Publishing Language. It is a representation format for both the metadata (information about the dataset, such as its name and provider, as well as the concepts it contains and displays) and actual data of datasets. Datasets described in this format can be imported into the Google Public Data Explorer, a tool that allows for rich, visual exploration of the data.

Note: To upload data to Google Public Data using the Public Data upload tool, you must have a Google Account.

This document is intended for data owners who want their content to be available in the Public Data Explorer. It goes beyond the Tutorial by diving deeper into the details of the DSPL schema and supported features. Only a basic familiarity with XML is assumed, although knowledge of relational databases is also useful.

Although not a requirement, we suggest reading through the Tutorial, which is shorter and easier to digest, before looking at this document.

An identifying relationship

An identifying relationship “describes a situation in which the existence of a row in the child table depends on a row in the parent table.”

“if a child identifies its parent, it is an identifying relationship.”

The technical definition of an identifying relationship is that a child’s foreign key is part of its primary key.

CREATE TABLE AuthoredBook (
  author_id INT NOT NULL,
  book_id INT NOT NULL,
  PRIMARY KEY (author_id, book_id),
  FOREIGN KEY (author_id) REFERENCES Authors(author_id),
  FOREIGN KEY (book_id) REFERENCES Books(book_id)
);

See? book_id is a foreign key, but it’s also one of the columns in the primary key. So this table has an identifying relationship with the referenced table Books. Likewise it has an identifying relationship with Authors.
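The technical definition (a foreign key that is part of the primary key) can be checked mechanically. Here is a minimal sketch using Python's built-in sqlite3 module and SQLite's PRAGMA table_info and PRAGMA foreign_key_list. The helper name identifying_parents is made up for this example, the minimal Authors and Books parent tables are assumptions, and SQLite stands in for whatever database you actually use.

```python
import sqlite3


def identifying_parents(conn, child_table):
    """Return the parent tables whose foreign key columns all lie
    inside the child's primary key (an identifying relationship)."""
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk);
    # pk is greater than 0 for columns that belong to the primary key.
    # The table name is interpolated directly, so pass trusted names only.
    pk_cols = {row[1]
               for row in conn.execute(f"PRAGMA table_info({child_table})")
               if row[5] > 0}
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...).
    fk_cols = {}
    for row in conn.execute(f"PRAGMA foreign_key_list({child_table})"):
        fk_cols.setdefault((row[0], row[2]), []).append(row[3])
    return sorted({parent
                   for (_fk_id, parent), cols in fk_cols.items()
                   if set(cols) <= pk_cols})


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Authors (author_id INTEGER PRIMARY KEY);
CREATE TABLE Books (book_id INTEGER PRIMARY KEY);
CREATE TABLE AuthoredBook (
  author_id INT NOT NULL,
  book_id INT NOT NULL,
  PRIMARY KEY (author_id, book_id),
  FOREIGN KEY (author_id) REFERENCES Authors(author_id),
  FOREIGN KEY (book_id) REFERENCES Books(book_id)
);
""")

parents = identifying_parents(conn, "AuthoredBook")
print(parents)
```

Both Authors and Books are reported, because each foreign key column of AuthoredBook is also a primary key column.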

A comment on a YouTube video has an identifying relationship with the respective video. The video_id should be part of the primary key of the Comments table.

CREATE TABLE Comments (
  video_id INT NOT NULL,
  user_id INT NOT NULL,
  comment_dt DATETIME NOT NULL,
  PRIMARY KEY (video_id, user_id, comment_dt),
  FOREIGN KEY (video_id) REFERENCES Videos(video_id),
  FOREIGN KEY (user_id) REFERENCES Users(user_id)
);
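To see what this compound key buys you, here is a small sketch that runs the Comments definition in an in-memory SQLite database. The Videos and Users tables are reduced to a single column and the helper try_insert is invented for the example; note also that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE Videos (video_id INTEGER PRIMARY KEY);
CREATE TABLE Users (user_id INTEGER PRIMARY KEY);
CREATE TABLE Comments (
  video_id INT NOT NULL,
  user_id INT NOT NULL,
  comment_dt DATETIME NOT NULL,
  PRIMARY KEY (video_id, user_id, comment_dt),
  FOREIGN KEY (video_id) REFERENCES Videos(video_id),
  FOREIGN KEY (user_id) REFERENCES Users(user_id)
);
INSERT INTO Videos VALUES (1);
INSERT INTO Users VALUES (10);
""")


def try_insert(video_id, user_id, comment_dt):
    """Insert one comment; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Comments VALUES (?, ?, ?)",
                         (video_id, user_id, comment_dt))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(1, 10, "2013-12-01 09:00:00"),   # valid comment
    try_insert(1, 10, "2013-12-01 09:05:00"),   # same user and video, later time
    try_insert(1, 10, "2013-12-01 09:00:00"),   # exact duplicate of the first key
    try_insert(99, 10, "2013-12-01 09:00:00"),  # video 99 does not exist
]
print(results)
```

The last insert fails because a comment cannot exist without its video, which is the dependency the identifying relationship describes.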

It may be hard to understand this because it’s such common practice these days to use only a serial surrogate key instead of a compound primary key:

CREATE TABLE Comments (
  comment_id SERIAL PRIMARY KEY,
  video_id INT NOT NULL,
  user_id INT NOT NULL,
  comment_dt DATETIME NOT NULL,
  FOREIGN KEY (video_id) REFERENCES Videos(video_id),
  FOREIGN KEY (user_id) REFERENCES Users(user_id)
);

This can obscure cases where the tables have an identifying relationship.
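One practical consequence, sketched below in SQLite: with only a surrogate key, nothing stops the same (video, user, time) combination from being stored twice. Adding a UNIQUE constraint on the natural key is one common way to keep that rule; it is a suggestion added here, not something the answer above prescribes. SQLite's INTEGER PRIMARY KEY stands in for SERIAL, the foreign keys are left out for brevity, and the helper name is made up.

```python
import sqlite3

SURROGATE_ONLY = """
CREATE TABLE Comments (
  comment_id INTEGER PRIMARY KEY,  -- stand-in for SERIAL
  video_id INT NOT NULL,
  user_id INT NOT NULL,
  comment_dt DATETIME NOT NULL
)"""

SURROGATE_WITH_UNIQUE = """
CREATE TABLE Comments (
  comment_id INTEGER PRIMARY KEY,
  video_id INT NOT NULL,
  user_id INT NOT NULL,
  comment_dt DATETIME NOT NULL,
  UNIQUE (video_id, user_id, comment_dt)  -- keeps the natural key enforced
)"""


def rows_stored_after_duplicate(create_sql):
    """Insert the same comment twice and count how many rows were accepted."""
    conn = sqlite3.connect(":memory:")
    conn.execute(create_sql)
    row = (1, 10, "2013-12-01 09:00:00")
    accepted = 0
    for _ in range(2):
        try:
            with conn:
                conn.execute(
                    "INSERT INTO Comments (video_id, user_id, comment_dt) "
                    "VALUES (?, ?, ?)", row)
            accepted += 1
        except sqlite3.IntegrityError:
            pass  # the duplicate was rejected
    return accepted


print(rows_stored_after_duplicate(SURROGATE_ONLY))
print(rows_stored_after_duplicate(SURROGATE_WITH_UNIQUE))
```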

I would not consider SSN to represent an identifying relationship. Some people exist but do not have an SSN. Other people may file to get a new SSN. So the SSN is really just an attribute, not part of the person’s primary key.

You can also look at the MySQL manual, which explains how to add foreign keys in MySQL Workbench.

Adding Foreign Key Relationships Using an EER Diagram

The vertical toolbar on the left side of an EER Diagram has six foreign key tools:

  • one-to-one non-identifying relationship
  • one-to-many non-identifying relationship
  • one-to-one identifying relationship
  • one-to-many identifying relationship
  • many-to-many identifying relationship
  • Place a Relationship Using Existing Columns

An identifying relationship is one where the child table cannot be uniquely identified without its parent. Typically this occurs where an intermediary table is created to resolve a many-to-many relationship. In such cases, the primary key is usually a composite key made up of the primary keys from the two original tables. An identifying relationship is indicated by a solid line between the tables and a non-identifying relationship is indicated by a broken line.

Create or drag and drop the tables that you wish to connect. Ensure that there is a primary key in the table that will be on the “one” side of the relationship. Click on the appropriate tool for the type of relationship you wish to create. If you are creating a one-to-many relationship, first click the table that is on the “many” side of the relationship, then on the table containing the referenced key. This creates a column in the table on the many side of the relationship. The default name of this column is table_name_key_name where the table name and the key name both refer to the table containing the referenced key.
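As a quick illustration of the default naming rule just described, the following sketch builds the column name from the referenced table and its key. The function name is hypothetical and this is not Workbench code, only the stated pattern written out.

```python
def default_fk_column_name(table_name: str, key_name: str) -> str:
    """Build table_name_key_name, where both parts refer to the
    table that contains the referenced key."""
    return f"{table_name}_{key_name}"


# A foreign key referencing film.film_id would get this default name.
print(default_fk_column_name("film", "film_id"))
```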

When the many-to-many tool is active, double-clicking a table creates an associative table with a many-to-many relationship. For this tool to function there must be a primary key defined in the initial table.

Use the Model menu, Menu Options menu item to set a project-specific default name for the foreign key column (see Section, “The Relationship Notation Submenu”). To change the global default, see Section 6.4.5, “The Model Tab”.

To edit the properties of a foreign key, double-click anywhere on the connection line that joins the two tables. This opens the relationship editor.

Mousing over a relationship connector highlights the connector and the related keys as shown in the following figure. The film and the film_actor tables are related on the film_id field and these fields are highlighted in both tables. Since the film_id field is part of the primary key in the film_actor table, a solid line is used for the connector between the two tables.
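The film and film_actor arrangement can be tried out directly. The sketch below assumes a minimal actor table and made-up sample rows (neither appears in the text above) and uses SQLite rather than MySQL; it shows film_actor resolving the many-to-many relationship with a composite primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Associative table: its primary key is made of the two parents' keys.
CREATE TABLE film_actor (
  actor_id INT NOT NULL,
  film_id INT NOT NULL,
  PRIMARY KEY (actor_id, film_id),
  FOREIGN KEY (actor_id) REFERENCES actor(actor_id),
  FOREIGN KEY (film_id) REFERENCES film(film_id)
);
INSERT INTO film VALUES (1, 'Film A'), (2, 'Film B');
INSERT INTO actor VALUES (1, 'Actor X'), (2, 'Actor Y');
INSERT INTO film_actor VALUES (1, 1), (1, 2), (2, 1);
""")


def actors_in(film_id):
    """List the names of the actors linked to a film, alphabetically."""
    rows = conn.execute(
        "SELECT a.name FROM actor a "
        "JOIN film_actor fa ON fa.actor_id = a.actor_id "
        "WHERE fa.film_id = ? ORDER BY a.name", (film_id,))
    return [name for (name,) in rows]


def link_again(actor_id, film_id):
    """Try to store an existing pair a second time."""
    try:
        with conn:
            conn.execute("INSERT INTO film_actor VALUES (?, ?)",
                         (actor_id, film_id))
        return True
    except sqlite3.IntegrityError:
        return False  # the composite primary key rejects the repeat


print(actors_in(1), actors_in(2), link_again(1, 1))
```

One actor can appear in many films and one film can have many actors, but each (actor, film) pair can be recorded only once.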