The simplest and most intuitive approach to authoring semantic content that occurs to me is to manage resources through a directory style view, as one does bookmarks or email. Rather than navigating solely by file system directories and files, however, the directories in this case would be things like RDF classes and properties and the files would be resources.
When a new resource is added, a URL is entered as its unique identifier and a label (or labels, in multilingual mode, using xml:lang) is entered, as with bookmarks, for ease of human readability. (Later, when the manager is more advanced, RDF classes will be offered for use as templates for the new resource, and there will be an option to mark the new resource as a property.) Subsequently, statements may be made about the resource by adding properties to it. Known RDF properties (either general, such as the Dublin Core, or class specific, if the resource has a class) are offered by their labels (with full URL on mouseover) and sorted by namespace for selection, or a new one can be created. Once a property is selected, known resources are offered or a new resource or literal may be entered as its value. Thus a full statement can be entered into the manager.
The manager will offer three other features of primary interest:
RDF Schema and Instance Storage, Querying, and Versioning
RDF Triples are stored in a set of three tables. This design was based on suggestions found in the work of Sergey Melnik, the mailing list of the W3C's RDF Interest Group, R.V. Guha, Matt Biddulph, Libby Miller and Dan Brickley, who have all been investigating how to store and query triples against standard RDBMSs.
The current design is implemented in PostgreSQL thusly:
However, there is nothing PostgreSQL-centric about the design. It can be instantiated easily in any other RDBMS and can be interfaced with using plain SQL. Query languages such as SquishQL, RQL, RDQL, RDFQL, and others in the rdfdb style can presumably interface to this design with minimal design changes.
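The original PostgreSQL schema is not reproduced here, so the following is only a minimal sketch of one possible three-table layout (resources, literals, and statements), written against Python's sqlite3 module to illustrate that nothing beyond plain SQL is needed. All table names, column names, and helper functions are assumptions made for illustration.

```python
import sqlite3

# Hypothetical three-table layout; names are assumptions, not the original schema.
SCHEMA = """
CREATE TABLE resources (id INTEGER PRIMARY KEY, uri TEXT UNIQUE NOT NULL);
CREATE TABLE literals (id INTEGER PRIMARY KEY, value TEXT NOT NULL, lang TEXT);
CREATE TABLE statements (
    subject INTEGER NOT NULL REFERENCES resources(id),
    predicate INTEGER NOT NULL REFERENCES resources(id),
    object_resource INTEGER REFERENCES resources(id),
    object_literal INTEGER REFERENCES literals(id),
    -- exactly one of the two object columns must be set
    CHECK ((object_resource IS NULL) <> (object_literal IS NULL))
);
"""

def resource_id(db, uri):
    """Return the id for a URI, inserting the resource if it is new."""
    db.execute("INSERT OR IGNORE INTO resources (uri) VALUES (?)", (uri,))
    return db.execute("SELECT id FROM resources WHERE uri = ?", (uri,)).fetchone()[0]

def add_literal_statement(db, subject, predicate, value, lang=None):
    """Store one triple whose object is a literal."""
    cur = db.execute("INSERT INTO literals (value, lang) VALUES (?, ?)", (value, lang))
    literal_id = cur.lastrowid
    db.execute(
        "INSERT INTO statements (subject, predicate, object_literal) VALUES (?, ?, ?)",
        (resource_id(db, subject), resource_id(db, predicate), literal_id),
    )

def literal_values(db, subject, predicate):
    """Query all literal values of a property using plain SQL joins."""
    rows = db.execute(
        """SELECT l.value FROM statements s
           JOIN resources rs ON rs.id = s.subject
           JOIN resources rp ON rp.id = s.predicate
           JOIN literals l ON l.id = s.object_literal
           WHERE rs.uri = ? AND rp.uri = ? ORDER BY l.id""",
        (subject, predicate),
    )
    return [row[0] for row in rows]

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
add_literal_statement(db, "http://example.org/page", DC_TITLE, "First step", "en")
```

Keeping URIs and literals in their own tables means each long string is stored once and statements reduce to rows of integer keys, which is the usual motivation for this kind of split.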
URIs in this context shall be considered to be valid according to RFC 2396. URI recognition and validation algorithms will be based on this assumption.
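As a starting point for recognition, the component-splitting regular expression given in Appendix B of RFC 2396 can be used directly. The sketch below wraps it in two hypothetical helpers, split_uri and looks_like_absolute_uri. The second is a deliberately loose check (it tests only the scheme syntax) and is an assumption about what a first pass might look like, not a full validator.

```python
import re

# Component-splitting expression from RFC 2396, Appendix B.
URI_PARTS = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")
# scheme = alpha *( alpha | digit | "+" | "-" | "." )  (RFC 2396, section 3.1)
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*$")

def split_uri(uri):
    """Split a URI reference into its five components (None when absent)."""
    m = URI_PARTS.match(uri)  # this pattern matches any string
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

def looks_like_absolute_uri(uri):
    """Loose check: the reference has a syntactically valid scheme."""
    scheme = split_uri(uri)["scheme"]
    return scheme is not None and SCHEME.match(scheme) is not None
```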
Resource Types
Resource Types should be defined through a Schema Manager which authors RDFS through a familiar folders, files, and forms tool. Each resource type could have an index page (which describes the type and browses all of the resources of that type), and a management page (where the schema can be edited). These pages should be standard locations both for users and other programs.
Each resource type should have independent privileging per-group and per-user. Custom privileges could also be modelled for various management areas (see privileging and customization).
New Resource Types
Bookmarks
Contacts: Anyone can grab or feed another user's public (or privileged) info as a contact. Annotating other users with creator-marked metadata is encouraged. Anyone can create a new, non-active user as a contact (how is this reconciled with an active user with the same email address?).
FAQ: FAQs could have Question and Answer pairs which publish into a CSSable template just like blog posts do. They could be categorized (even faceted).
Polls: Anonymous voting. Standard reports with graphs, percentile, and full count, sortable, filterable and searchable. Includeable and syndicated at standard URLs. Link in feed.
Quotes: Semantic author linking. Offer bibliographic references; ISBN, link to line or chapter on Project Gutenberg or equivalent if available. Faceted/thesaurus/semantic directory management.
News Page: Serialized articles and stories publish to CSSable template, blog-style. Sharable, faceted topics and sections for categorization. Could also receive aggregation mixed in. Specialized workFlow.
Time, Address, and Name Format templates. Blog-style templates.
Notification templates.
Resource Negotiation
In his Design Issues document about Generic Resources, Tim Berners-Lee defines a "resource" thusly:
A "resource" is a conceptual entity (a little like a Platonic ideal). When represented electronically, a resource may be of the kind which corresponds to only one possible bit stream representation. An example is the text version of an Internet RFC. That never changes. It will always have the same checksum.
He goes on to give a suggestion about using RDF to model these relationships.
The main trick in this area is generally called Content Negotiation. A more advanced version of this has been proposed by RFC 2295, Transparent Content Negotiation in HTTP (TCN), and RFC 2296, HTTP Remote Variant Selection Algorithm -- RVSA/1.0.
Standard variance techniques are to be supported, where applicable. The .var type map file/Multiviews approach, using the HTTP headers URI, Content-Type, Content-Language, Content-Encoding, Content-Length, and Description, has yet to take into account time- and semantic-variants. While TCN's new catch-all header Accept-Features (more accurate: Feature-Set-Info) seems prepared to handle these (new?) variant types, even it is limited to static content negotiation.
As more and more dimensions of content variation emerge, it becomes increasingly unwieldy to maintain vast numbers of variants and their metadata in the filesystem. Dynamic content, where the varying content itself (and metadata) is either stored in an encapsulating XML file, a database, or even contextually generated by server scripts, allows greater flexibility of both management and request-time transformation. This necessitates Dynamic Content Negotiation, server-scripted variant selection algorithms that respond not only according to client Accept-* headers, but also to ReSTful (querystring, form, and cookie) parameters to extend the means of negotiation.
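A server-scripted selection along the language dimension might look like the following minimal sketch. It assumes a hypothetical querystring parameter named lang that overrides the Accept-Language header, and it falls back leniently from a regional tag such as fr-CA to a bare fr variant. Both choices are assumptions for illustration, not behaviour defined by the HTTP or TCN specifications.

```python
def parse_accept(header):
    """Parse an Accept-* header into (value, q) pairs, highest q first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # unreadable quality value: treat as refused
        prefs.append((parts[0].lower(), q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(prefs, key=lambda pref: -pref[1])

def choose_language(available, accept_language="", query=None):
    """Pick a language variant; the first entry of `available` is the default."""
    requested = (query or {}).get("lang")  # hypothetical override parameter
    if requested and requested.lower() in available:
        return requested.lower()
    for lang, q in parse_accept(accept_language):
        if q <= 0:
            continue
        if lang == "*":
            return available[0]
        if lang in available:
            return lang
        prefix = lang.split("-")[0]  # lenient fallback, e.g. fr-ca -> fr
        if prefix in available:
            return prefix
    return available[0]
```

The same shape could be repeated for other dimensions (representation, time, semantics), with the explicit request parameters always taking precedence over the client headers.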
New File/Database Resource Manager
The listing for a given directory will now come out of the metadata statements made about resources in that directory, rather than directly from a scan of the directory in the file system. The metadata about file resources will be kept up to date through a combination of scanning the directory upon load, update upon check-in/check-out, and other maintenance operations such as import, backup, and syndication. It is vital that the user be presented with a seamless view of the namespace as it covers both file system resources and database resources.
Whenever a new directory or file is discovered (new in this case meaning a resource with no statements yet recorded in the database), the application will create a baseline set of statements that describe the resource, all sharing the url of the resource as subject, and setting values for properties such as name, type, creation and last modified dates, etc. This RDF instance will be used as the basis for further management statements to allow the user to assign settings, set up privileged relationships, serialize, syndicate, and reuse the resource.
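A minimal sketch of generating that baseline follows. The property names (name, type, size, modified) are placeholders rather than a chosen vocabulary, and creation date is left out because the file system does not report it portably. Each statement is a plain (subject, property, value) tuple sharing the resource's URL as subject.

```python
import mimetypes
import os
from datetime import datetime, timezone

def baseline_statements(path, url):
    """Return (subject, property, value) triples describing a file or directory."""
    info = os.stat(path)
    if os.path.isdir(path):
        kind = "directory"
    else:
        # guess_type returns (type, encoding); fall back to a generic type
        kind = mimetypes.guess_type(path)[0] or "application/octet-stream"
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return [
        (url, "name", os.path.basename(path.rstrip(os.sep))),
        (url, "type", kind),
        (url, "size", info.st_size),
        (url, "modified", modified.isoformat()),
    ]
```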
The manager will also be the tool used to create database resources, where an RDF instance stored as triples in the database includes not only the metadata describing a resource, but also the actual content of the resource as a literal. Even file resources may have their content duplicated in the database, so as to be searchable using high-speed SQL queries.
Resources should at least be kept both in the file system and in the db, if not only in the db for the sake of full-text searching via SQL. If the resource is only in the file system, it cannot be queried.
Some display techniques picked up from CVS:
Moving files (and directories) singly or as a bulk operation is very important. The Shop module has the idea of a 'pallet' which you put items into and then move or alias them into other categories. Likewise the Pages module could have a 'binder' (or some more intuitive metaphor), an ersatz clipboard that allows the user to 'take' a group of pages and/or directories, navigate to another directory and either move or copy (or alias) them to that location.
Other interesting things to see about a resource are size (in k or in bytes) and number of lines (for at a glance change detection). Also users who have edited it, last modified date, creation date, etc.
The user's current privilege should be displayed for each resource (even the icon/entry could reflect restricted access by being greyed-out or similar).
A style option for this and all manager views could be alternating row colors for readability. This could be implemented in the background as CSS classes. They could be styled identically for the current look, or differently for the alternating look.
Resource Locator
The technique for locating database resources picks up where normal Content Negotiation leaves off. By inserting a custom handler for the HTTP status code 404 File Not Found, in this case a server-side script called ResourceLocator.html, the application is able to take the requested url and retrieve any statements made about it, such as what versions are available. From there, it can compare against the usual content negotiation headers and any additional ReSTful restrictions and return the appropriate variant resource, which may originate in the RDF itself, elsewhere in the filesystem, or elsewhere on the net.
If no statements are found about the requested resource, the name can be analyzed for typos and casing (see mod_speling) and for otherwise similar names both in the db and the fs.
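The name analysis could start as simply as the sketch below, which uses Python's difflib as a stand-in (this is not mod_speling's algorithm). The function name suggest_names and the 0.8 similarity cutoff are assumptions. The known names would come from both the database and the file system.

```python
import difflib

def suggest_names(requested, known_names, limit=3):
    """Suggest known resource names that match ignoring case, or nearly match."""
    lowered = {name.lower(): name for name in known_names}
    wanted = requested.lower()
    if wanted in lowered:
        # same name apart from casing
        return [lowered[wanted]]
    close = difflib.get_close_matches(wanted, list(lowered), n=limit, cutoff=0.8)
    return [lowered[name] for name in close]
```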
And ultimately, if no resource matches by any stretch of the imagination, the application can present a management page that offers to create the requested resource, asking for appropriate title, description, and other metadata. If no content is given, the statements can be recorded about the nonexistent resource (or empty node?). If the user is a guest (that is, has not logged in as a registered user), or has logged in but lacks the privilege to create the resource (or metadata), the submission is marked for moderation and is not published by the site unless approved by someone with appropriate privilege. Of course, the submission will be immediately available to the submitting user at least as a page in a "scrapbook" style view of recent activity.
Every resource, when browsed, should offer (by author/owner preference/privilege, of course) all its metadata, probably discreetly in the footer, with links to the appropriate variant resources along each dimension, as well as authorship, interesting dates, commentary, etc.
Resource Structuring
Some structuring could take place inside of each resource's content. If there are anchors assigned to various elements, those anchors could be recognized and leveraged by server-side analysis. Statements could be made about these fragments. If CSS usage of ids conflicts, perhaps a new sky:Ref or sky:Anc or just sky:Id attribute could be introduced. Standard structures that could be generated as part of a view of such a resource include:
These structures could be offered as feeds in RSS or OPML, to give an abstract view of the resource's contents.
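The server-side analysis of anchors could begin with something like this sketch, which collects every element that carries an id along with its text, using Python's standard html.parser. It is deliberately simplified: it does not descend into nested identified elements, and an identified void element (such as an img) would need extra handling. The class and function names are illustrative.

```python
from html.parser import HTMLParser

class AnchorOutline(HTMLParser):
    """Collect (fragment id, tag, text) for elements carrying an id attribute."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._open = None  # [id, tag, text parts] of the element being read

    def handle_starttag(self, tag, attrs):
        ident = dict(attrs).get("id")
        if ident and self._open is None:
            self._open = [ident, tag, []]

    def handle_data(self, data):
        if self._open is not None:
            self._open[2].append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[1]:
            ident, name, parts = self._open
            # collapse runs of whitespace in the collected text
            self.entries.append((ident, name, " ".join("".join(parts).split())))
            self._open = None

def outline(html):
    """Return the list of identified fragments found in an HTML string."""
    parser = AnchorOutline()
    parser.feed(html)
    parser.close()
    return parser.entries
```

Each entry gives a fragment identifier about which statements could be made, and the list as a whole is the raw material for an outline feed.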
All public metadata [c|sh]ould also be generated into the resource's head as meta tags.
Resource Settings
Auto-destruct: any resource could be scheduleable for auto-deletion as of a date/time.
Request/Publication restrictions: Requests could be rejected from anywhere off-site, specific IP groups or domains, or even specific Template Resources ("I won't let that rag publish my article"). Conversely, a resource that is aggregating or otherwise accepting submissions from the public could reject anything from a specific address or even individual authors.
Tag Restriction: Any resource (or metadata field of its Type) could define tag restrictions to be validated against upon data submission. This is just a list of tags not to be allowed, in case there should be a reason not to.
Tentative Resources
Allow 'submissions': tentative edits to any resource, marked as tentative and not-to-be-published, for the admin/owner to edit, schedule, annotate, merge, validate, or reject as is seen fit. Tentative resources could also be created where no prior resource exists, to fall under the domain of the directory owner. There could be notification settings associated with submission arrival. Site and directory settings could also allow submissions only from particular groups or users, or not at all.
Resource Management
There probably ought to be a flag (or workFlow setting) that determines whether a given resource will be published upon request or not, like enable/disable in Shop (perhaps a preference could cause workFlow to inhibit publishing unless page status is Publish).
Perhaps Fragment Resources could have a setting to keep them from serving into inappropriate contexts. Either they could only publish to Template Resource requests, or restrict based on user agent ip/domain.
Metadata Management
Every resource should have a title, a description, and an abstract. Full Dublin Core, whenever possible.
Resources, esp. Page Series and Pages, should be categorizable from their Metadata Management pages.
Time Variance
The following references have been invaluable in standardizing the storage, comparison, and display of date/time data:
ISO 8601 Date/Time Representations
GNU Date Input Formats
W3C XML Schema Date Time Format
W3C XML Schema Duration Format
PHP strtotime function
Also see:
ISO 8601 Description
Internet Calendaring and Scheduling Core Object Specification (iCalendar)
A resource could be scheduled according to a rule rather than at a concrete date/time (moment?). For instance, a variant could be scheduled for any Monday while another variant could be scheduled for any Tuesday, etc.
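A weekday rule of this kind can be sketched as a lookup from the weekday of the request time to a variant. The rule table and the variant names below are hypothetical.

```python
from datetime import datetime

# Hypothetical rule table: weekday number (Monday is 0) -> variant name.
WEEKDAY_RULES = {0: "monday-variant", 1: "tuesday-variant"}

def scheduled_variant(moment, rules=WEEKDAY_RULES, default="default-variant"):
    """Return the variant scheduled for the weekday of `moment`."""
    return rules.get(moment.weekday(), default)
```

For example, scheduled_variant(datetime(2024, 1, 1)) selects the Monday variant, since 1 January 2024 fell on a Monday.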
Language Variance
Include a Content-Language header in each Language Variant page as it is created.
A full locale code list as found in any browser is required, so that variants for any locale may be constructed, independently of whether a string set has been created for each of them. This may require a separate structure from Locales, depending on how AdminLabels handles the creation of new labels.
Representation Variance
Semantic Variance
Directory Resources
Directories must be treated as resources, with standard metadata kept such as title and description. Also vital for each directory is per-group and per-user privileging.
skyWriter
A keystroke like control-s should be fielded by javascript to execute the Save command, just like clicking the link.
Inclusion/Reuse of Resources
One resource can be included in another using what we call a hyperinc.
When managing a Template Resource (a resource that includes other resources via sky:Inc), the included resources should be likewise manageable either via bulk operations or links to the same management interface for the included resources (especially skyWriter should link to skywriting each included resource). Conversely, a Fragment Resource (a resource so included) should provide the same operations and management links (again, esp. skyWriter). Keeping track of the contexts in which a Fragment Resource is being used is key. Forking the content of the resource is tantamount to moving a resource in complexity, as some of the references to the original resource will have to change to point at the forked version. But offering an index to the including Template Resources makes the job a little easier. There is a question here about whether this kind of forking can happen within a Series, as a semantic mark between variants, or whether a new Series is always required.
Blog posts, as viewed by anyone aggregating them or being notified about them (say, an administrator), could be reposted (riposted?) to another blog at a click. The repost interface could offer to crosspost, add content, post as text (quote) for annotation or post as include (hyperinc, sky:Inc) for reuse.
Search and Replace
When browsing a resource, a special search mode could be made available that reloads the page, highlighting the search matches, offering an index to results at the top (each could be dynamically anchored with a result index), and a prev and next link could be appended before and after each result (probably a tiny arrow gif).
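The highlighting pass might be sketched as follows for plain-text content (running it over marked-up content would need an HTML-aware pass). The result-N anchor ids and the match class name are assumptions. An index of results and the prev/next links could then be built from the returned count.

```python
import html
import re

def highlight(text, term):
    """Wrap each case-insensitive match of `term` in an anchored span.

    Returns (marked-up text, number of matches).
    """
    if not term:
        return html.escape(text), 0
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    pieces, last, count = [], 0, 0
    for match in pattern.finditer(text):
        count += 1
        pieces.append(html.escape(text[last:match.start()]))
        pieces.append(
            '<a id="result-%d"></a><span class="match">%s</span>'
            % (count, html.escape(match.group(0)))
        )
        last = match.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces), count
```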
Thesaurus-like alternates should be offered via a see also section, including preferred terms, broader terms, narrower terms, alternate terms, related terms, opposite terms, etc.
Regular expressions should also be supported.
User search in particular should expand to make users searchable by any piece of metadata available, including URL, registration, etc. These searches should not be case sensitive unless the user requests that mode. Reports should offer sorting/filtering by latest logins/registrations. Results and reports could also display whether the user is online.
Results for all resource types should be filterable and sortable by most recent activity, popularity (hit count or rating), creation date/modification date, etc. Each resource type could offer a contextual faceted directory view and faceted thesaurus matching for criteria for improved searching.
workFlow
workFlow is a system of resources that allow management processes to be modelled and tracked. Flow charts can be generated, even published and imported in a variety of XML formats for interchange with other tools.
It's also very important for other resource types to be integrated with workFlow. Pages is a good example, where only pages that are promoted to publish should be visible. The actual operation of other resources should depend on their workFlow state. Conversely, workFlow Tasks, Steps, and Phases should be able to be contingent on the filling out of a Form, the confirmation of a Payment in the Shop, or other resource actions.
Scheduled Operations
A cron setting calls a script like pulse.html every five minutes, which in turn checks the database for any scheduled operations that need to be performed.
Reminders, subscriptions, newsletters, notifications, upgrades/syncs, backups, aggregation, deployment, etc.
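What the pulse script does on each call can be sketched as below, here in Python against sqlite3 rather than as the actual pulse.html. The scheduled_operations table, its columns, and the handler mapping are all assumptions. Timestamps are stored as ISO 8601 strings so that they compare correctly as text.

```python
import sqlite3

def pulse(db, now, handlers):
    """Run every operation that is due and not yet done; return the kinds run."""
    due = db.execute(
        "SELECT id, kind FROM scheduled_operations "
        "WHERE done_at IS NULL AND due_at <= ? ORDER BY due_at, id",
        (now,),
    ).fetchall()
    ran = []
    for op_id, kind in due:
        handler = handlers.get(kind)
        if handler is None:
            continue  # unknown kind: leave it pending
        handler()
        db.execute(
            "UPDATE scheduled_operations SET done_at = ? WHERE id = ?", (now, op_id)
        )
        ran.append(kind)
    db.commit()
    return ran

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE scheduled_operations "
    "(id INTEGER PRIMARY KEY, kind TEXT, due_at TEXT, done_at TEXT)"
)
db.executemany(
    "INSERT INTO scheduled_operations (kind, due_at) VALUES (?, ?)",
    [("backup", "2024-01-01T00:00:00"), ("newsletter", "2024-01-02T00:00:00")],
)
```

Marking each operation as done at the moment it runs keeps a second pulse within the same five-minute window from repeating it.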
Installation should install the cron job. A regularly visited page such as the login page should (offline from the user) check whether the job is still in place and reestablish it if necessary. Here should be added how to check it and how to set it.
Notification
For series edits/accesses/changeovers, a possible form design follows.
I am interested in being notified when:
Notifications need management, privileging, logging, etc. just like any other scheduled server action.
Different types of notification may need templating, for boilerplate insertion, styling, etc.
These templates could be a resource type.
skyMail
Offer attachments to be managed as related resources.
Offer multiple selection and bulk operations such as move and delete.
Persistent Archive
The persistent archive should offer a toggle for chrono sorting asc or desc, sort by author, by title, and a toggle for descriptions.
Personal Info
A preference could either conceal email from the interface altogether (though still allow mail to be sent to that user at that address without showing it), or display it in a spam-safe way, so that "dtd@skybuilders.com" would appear as "dtd at skybuilders dot com", a trivial transform.
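The transform is indeed small. A sketch (the function name spam_safe is hypothetical) that leaves anything without an @ untouched:

```python
def spam_safe(address):
    """Render an email address in a form that is harder to harvest."""
    local, _, domain = address.partition("@")
    if not domain:
        return address  # not an address: nothing to transform
    return local + " at " + domain.replace(".", " dot ")
```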
Other interesting personal info that could be kept:
Personal skyPage
The personal skyPage should be a serialized Template Resource that includes various feed resources. It should be autogenerated as part of the user's home directory. The key resources to include are personal info feeds such as name, email, phones, addresses, settings, and privs, each with a link to edit the info displayed. Other useful resources are activity reports like recently edited (see activity log) pages, events, tasks, email, latest comments made by the user, latest comments on user's resources, etc. All of these feeds should be customizable (via flat HTML stylable with CSS or DHTML by preference) and link through to managing the displayed info. They should also be available to be included in any resource.
The same page could be available to other users, even guests, restricting itself to info the user has marked as public (see privacy control).
Community/Site Metadata
Site slogan.
Formats. Email errors to support. Messages, ads, or copyright info to appear in footer. These could be channeled from any resource/feed.
User Registration
Upon submission, a user's name must be analyzed. If the surname is missing, split given name on space if any, or use given as surname.
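One reading of this rule is sketched below. Two points are assumptions, since the rule leaves them open: the split happens at the last space, and a single-word given name is moved (not copied) into the surname.

```python
def normalize_name(given, surname=""):
    """Return (given, surname), deriving the surname when it is missing."""
    given, surname = given.strip(), surname.strip()
    if surname:
        return given, surname
    if " " in given:
        # assumption: the final word is the surname
        first, _, last = given.rpartition(" ")
        return first.strip(), last
    # assumption: a lone name is treated as the surname
    return "", given
```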
Explore everywhere names are inserted or updated.
Standard API
Resources should have a standard API for editing, managing, and commenting.
Likewise for privileging.
Privileging
Privileges can be module wide, based on a type of resource, on a series of resources, or on an individual resource.
Some privileges could be marked as dependent; the privilege for one user would be contingent on the privilege of the user who assigned it. The assignment of privileges should be logged and auditable. It could have notifications associated with it, etc. The report could be viewed on a timeline.
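The dependent case can be sketched as a walk up the chain of assigners. The grant record shape (active, dependent, assigned_by) is an assumption for illustration, and a cycle of mutually dependent grants is treated as no privilege.

```python
def has_privilege(user, grants, _seen=None):
    """Check a privilege, following dependent grants back to their assigners.

    `grants` maps user -> {"active": bool, "dependent": bool, "assigned_by": user}.
    """
    seen = _seen or set()
    grant = grants.get(user)
    if grant is None or not grant["active"] or user in seen:
        return False
    if not grant["dependent"]:
        return True
    # a dependent grant holds only while the assigner's own grant holds
    return has_privilege(grant["assigned_by"], grants, seen | {user})
```

Revoking the assigner's grant then implicitly suspends every grant that depends on it, without touching those records.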
UI
The entire interface should be generated as XHTML with full CSS and DHTML.
An optional widget could feed the list of users currently online (which should be available as an includable resource and syndicated).
An optional widget could feed the list of 10 latest pages to go live, or the 10 about to go live (which should be available as an includable resource and syndicated), as a sanity check for publishers (restrictable to the resources you own, are associated with, or all).
Coding Practices
New table names are needed for storing XML nodes and RDF triples.
Errors could be caught and emailed to support, based on a site setting.
Specific Fixes
Change Password should have links back to other personal info pages (and User Manager, when applicable).
Activity Log
All activity (every insert or update) should be logged against the resource being modified. This provides an audit trail, reversion history, statistics, etc. Actions could be modelled as properties. Activity reports (avail. as rss/xml/rdf feeds and as sharable, configurable, includable resources like blocks) should be filterable/sortable by resource type, action, user, date, etc.
Referer Log
Track referers per resource. Offer referer management from resource management page. Content could offer separate versions per referer (referer becomes a semantic dimension). Footer should link to referer list; the list should be syndicated and includable.
Statistics
Stats may be displayed as graphs. Measurements may be given both as percentile and as total number. Most are based on resource hit count. Interesting statistics include:
Customization
Form element font size and border are stylable using CSS.
Each user could override any of the standard icons or images as part of a custom theme.
Server pages could have CSS pages assigned as settings, overrideable per-group or per-user.
Themes? CSS groups? Formats?
Formats/Internationalization/I18n
Date/Time, Address, Phone, and Name Formats. Default, site-wide, per-group, and per-user settings. Theme-based? By locale? By dialect?
Setting could dictate what template is used. Template could be blog-style, with skyTags for positioning of known data elements. Templates could be a new resource type.
A full language list for resource localization with flags for label support and babelfish/translation support.
Rankings and Ratings
Resources could be rated semantically; given a score based on how accurate they are according to an arbitrary semantic (useful, informative, funny, on-topic, etc.).
New Tools
A skyLink (or new term) ought to be a new way of linking to a resource that checks for the real resource at the given url, returns it if available, returns a (well-identified) cached version if not.
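The fallback logic of such a link could be sketched as follows. The fetch callable and the cache mapping are injected and hypothetical. The returned source label is what would let the page identify a cached copy clearly.

```python
def resolve_skylink(url, fetch, cache):
    """Return (content, source), where source is "live" or "cache".

    `fetch` is any callable that returns the content for a URL or raises
    OSError when the resource is unavailable; `cache` is a dict-like store.
    """
    try:
        content = fetch(url)
    except OSError:
        if url in cache:
            return cache[url], "cache"
        raise  # neither live nor cached: let the caller report it
    cache[url] = content  # refresh the cached copy on every successful fetch
    return content, "live"
```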
A skyImage ought to be a new standard resource type that offers a link to variants by resolution, color space, and subject (other pictures of the same or similar things, for instance).
Publish a snapshot of a directory structure to a deployment server at the click of a button as static files.
A view source link could appear on every server page and resource. In fact, this could be a set of links: one to the client-side, generated HTML source (which may have to display a choice of links, one for the called page, and one for each include) and one to the server-side, PHP generator source (which in turn may have to display a choice of links, one for the actual script called, and one each for any includes). There should be a standard API for getting a server page/resource to return as unrendered text source rather than HTML (a MIME type setting, perhaps), and as bare script source rather than preprocessed. Note that apparently (according to Jon Udell) view-source: works as a protocol in the browser. Would this also work in a link? Apparently not.
A simple, non-javascript based date input wouldn't have day names or time zone selection or dynamic dates per month (the features that require scripting). Would this lightweight version be preferable anywhere?
Memes
Metalog
Metapad
skyLog
skyPad