The OPI design was discovered by applying research and theory from the fields of psychology, linguistics, computer science, and mathematics to a careful analysis of privacy notices. The resulting ontology has been validated across 100 privacy notices and over 4,000 information type phrases. Below, we review the OPI design and the theoretical foundation used to realize it.

Concepts in the OPI are organized into two broad categories:

Concepts of Physical Things: this category includes sub-categories for entities, events, actions, and agents, such as webpages, site visits, and users.

Concepts of Information Types: this category is divided into two sub-categories:
  • basic categories, or types of information named by privacy notice authors (e.g., user information), and
  • ad hoc categories that describe information about physical things (e.g., information about user).

Basic Level Categories

Basic level categories are "information-rich bundles" of things perceived by humans that are grouped by shared "perceptual and functional attributes that form natural discontinuities" [1]. By "discontinuities" we mean these attributes are prominent among the members of the group, and thus distinguish this group from other groups. The basic level category for furniture, for example, distinguishes the attributes of chairs, tables and desks from the category of vehicles.

In the OPI, all concepts of physical things and some concepts of information types are organized into basic level categories. We primarily rely on policy authors to identify basic level categories. Many policies refer to the user, which serves as a basic level category that includes account user and visiting user as sub-categories. In addition, policies refer to user information as a category of information about various types of users. User information may include other information types, such as a user's name, alias, e-mail address, and so on.
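The sub-category relationships described above can be sketched as a small directed graph. This is only an illustration: the `Ontology` class, its method names, and the choice to model "user information may include other information types" as a sub-category relation are assumptions made for this example, not the OPI's actual representation.

```python
from collections import defaultdict


class Ontology:
    """Hypothetical in-memory store of sub-category relations (not the OPI format)."""

    def __init__(self):
        # Maps each concept to the set of its direct parent categories.
        self._parents = defaultdict(set)

    def add_subcategory(self, child, parent):
        """Record that `child` is a sub-category of `parent`."""
        self._parents[child].add(parent)

    def ancestors(self, concept):
        """Return every category reachable by following parent links."""
        seen = set()
        stack = [concept]
        while stack:
            current = stack.pop()
            for parent in self._parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def is_a(self, child, parent):
        """True if `child` equals `parent` or is a direct or indirect sub-category."""
        return child == parent or parent in self.ancestors(child)


opi = Ontology()

# Concepts of physical things: "user" is the basic level category.
opi.add_subcategory("account user", "user")
opi.add_subcategory("visiting user", "user")

# Concepts of information types (assumed here to be sub-categories
# of "user information").
for info_type in ("user's name", "alias", "e-mail address"):
    opi.add_subcategory(info_type, "user information")

print(opi.is_a("account user", "user"))
print(opi.is_a("user", "account user"))
```

Keeping only direct parent links and computing ancestors on demand keeps the sketch short; a real ontology would more likely be stored in a dedicated ontology language and queried with a reasoner.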

Ad Hoc Categories

Ad hoc categories are created by people for a specific purpose or goal and, unlike basic level categories, members of an ad hoc category are not grouped by shared attributes. For example, the ad hoc category "things to sell at a garage sale" organizes its membership around the purpose "to sell at a garage sale." Ad hoc categories further differ from basic level categories in that they do not exhibit the same graded structure, and do not have the same well-established representation in memory [2].

In the OPI, some concepts of information types are ad hoc categories. These categories are described as "information about [thing]" where [thing] is a concept of a physical thing. When an ad hoc category among the concepts of information types has a synonymous basic level category, the two categories are equivalent. For example, the ad hoc category "information about a user" is equivalent to the basic level category user information; both describe a user.
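As a rough sketch of the equivalence rule, the snippet below pairs each ad hoc category with a basic level category when one exists. The function names are hypothetical, and treating "[thing] information" as the synonymous name for "information about [thing]" is a simplifying assumption for this example; the text does not specify how synonymy is established.

```python
def ad_hoc_category(thing):
    """Name the ad hoc category for a physical thing, e.g. 'information about user'."""
    return f"information about {thing}"


def find_equivalences(physical_things, basic_information_types):
    """Map each ad hoc category to its synonymous basic level category, if any.

    Assumption: the basic level category for a thing is named
    '<thing> information'. This naming heuristic is illustrative only.
    """
    equivalences = {}
    for thing in physical_things:
        candidate = f"{thing} information"
        if candidate in basic_information_types:
            equivalences[ad_hoc_category(thing)] = candidate
    return equivalences


# "user information" is named by policy authors, so it is a basic level category;
# no such named category is assumed to exist for "webpage".
pairs = find_equivalences(
    physical_things=["user", "webpage"],
    basic_information_types={"user information"},
)
print(pairs)
```

Under these assumptions, "information about webpage" remains an ad hoc category with no equivalent, while "information about user" is linked to user information.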

Ontology Construction

The OPI was constructed using a multistep process that began with crowdworkers annotating information types [3, 4, 5] in over 100 privacy notices. The policies were selected to represent over 20 different domains and service types, including companies that provide frontend and backend services, and that have both online and brick-and-mortar stores. The domains covered include health, finance, shopping, social media, telecommunications, and travel, among others.

Inferred Types

In addition to information types found in privacy policies, information types may be inferred using one of three methodologies:



