Building a Context around ASKOSI...

The design of an Information System starts from its user: a person, with her/his skills, capabilities, limitations, preferences ...
Documents and other electronic resources are designed to convey a discourse toward a learner.
They are sequences of interrelated information forming a "story": the best scientific articles read like novels.
Documents can come from many different sources (Institutions, Internet, Repositories, etc.).
Most of these sources are identified by a Community of Practice and selected as more or less trusted references.

Metadata (data about data) is therefore very important to know who produced the document, in which circumstances, for which purposes, etc.

This is an important way to make explicit the "context" of a document and to potentially relate it to something the user already knows (topics, authors, institutions, journals, places, etc.).

A Community of Practice (CoP) groups people with common aims or methods. Nowadays, a CoP depends heavily on IT tools to gather and circulate information between members.
Many participants are contributing new documents and electronic knowledge resources to the CoP. The more you learn, the more you author and vice-versa.
The two most popular Internet applications are:
  1. communicating with peers (e-mail, Facebook and other social media) and the CoP,
  2. searching for documents, for data but also for elements of information within the conversations with peers.
Searching starts from words identifying the desired subjects. In the past, Google offered only a very simple box where the user could type the words. Now, Google Search does not look more complicated, but it also makes suggestions based on frequent word combinations found in the indexed documents.

Queries can also be targeted to specific sources (documents, wikis, videos, etc.) and/or metadata fields (author, title, subject, abstract, etc.): users may write "search equations" indicating the index to be used and the boolean combinations desired.

The searched terms are then compared with the terms appearing in the documents.
The words in the documents are extracted to create "indexes": those indexes record, for each word, the documents in which it appears.
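The indexing and Boolean matching described above can be sketched as follows. This is a minimal illustration, not the way ASKOSI or any particular search engine is implemented: the three sample documents and the function names `build_index`, `search_all` and `search_any` are assumptions made up for the example.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            word = token.strip('.,;:!?"()')  # drop surrounding punctuation
            if word:
                index[word].add(doc_id)
    return index

def search_all(index, words):
    """Boolean AND: documents containing every word."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()

def search_any(index, words):
    """Boolean OR: documents containing at least one word."""
    result = set()
    for w in words:
        result |= index.get(w.lower(), set())
    return result

# Hypothetical sample documents, invented for this sketch.
documents = {
    "doc1": "The boat left the harbour at dawn.",
    "doc2": "A yacht is a boat used for leisure.",
    "doc3": "The canoe crossed the river.",
}
index = build_index(documents)

print(search_all(index, ["boat", "yacht"]))  # documents with both words
print(search_any(index, ["boat", "canoe"]))  # documents with either word
```

Note that a query on "boat" alone misses the canoe document: the index only knows words, not what they mean.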
For a search for a given concept:
  1. the precision of the result depends on the precision of the term within the subject domains of the system: "boat" may be precise enough for many scientific disciplines but certainly not for history, geography or oceanography.
  2. the exhaustivity (completeness) of the result requires the user to list all the possible synonyms or translations for the desired concept(s), which is a very cumbersome operation.
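In standard information-retrieval terms, these two qualities are usually measured as precision (the share of retrieved documents that are relevant) and recall (the share of relevant documents that are retrieved). The sketch below computes both for a narrow query and for a query expanded with synonyms; the document ids and the two result sets are hypothetical values chosen only to show the arithmetic.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical data: four documents are truly about the desired concept.
relevant = {"doc1", "doc2", "doc3", "doc4"}

# Assumed result of a query on the single word "boat".
retrieved_single = {"doc1", "doc2", "doc9"}

# Assumed result once the user has added synonyms and translations by hand.
retrieved_expanded = {"doc1", "doc2", "doc3", "doc4", "doc9"}

p1, r1 = precision_recall(retrieved_single, relevant)
p2, r2 = precision_recall(retrieved_expanded, relevant)
print(f"single term: precision={p1:.2f} recall={r1:.2f}")
print(f"expanded:    precision={p2:.2f} recall={r2:.2f}")
```

In this made-up case the expanded query finds every relevant document, but only because the user took the trouble to enumerate the alternative terms.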
Documents are for direct human consumption. There is also raw data from different operational management systems: this data must also be related to a context to be understandable (meaning of the fields, meaning of the different codes within those fields, origin of the data, etc.). Metadata (context) is therefore very important for raw data too.
Data can come from internal systems or from external systems. It often has to be managed in real time. Data can be formatted and contextualized for immediate re-use by external applications (standards of the Semantic Web: RDF, SPARQL) or may need to be processed for aggregation and visualization (starting from SQL databases or CSV files, for instance).
Data management and data visualization tools allow users to interpret the data (within its context), to assess new facts, to take decisions and make transactions.
When filtered, aggregated, processed and visualized, Data becomes a document.
By her/his activity with an operational application or using a data management tool, the user produces new datasets.
The W3C SKOS standard defines an RDF vocabulary to describe concept schemes, for instance Substances, Symptoms, Places, Animals, Plants, etc.
The terms (appearing in documents or queried by users) can therefore be grouped and linked to the concept they represent.
A concept is a symbolic entity that exists in the mind of each user but that exists also as a "code" manageable by computers. The code of a Concept therefore represents it in all indexes and data management applications.
Documents can be retrieved by Concept. A good "Concept search" therefore ensures easy exhaustivity (completeness):
  1. Whatever the term (synonym or translation) used in a query, the concept is retrieved and then all the documents or data linked to that concept.
  2. With hierarchies going from generic concepts to specific ones, searches on a generic term (like "boat") can retrieve documents linked to specific ones (like "yacht", "canoe" or "steamer").
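The two properties above can be sketched with a small in-memory structure. This is an illustrative assumption, not ASKOSI's actual data model or API: the `Concept` class, the codes `C1` to `C3`, the labels and the document links are all invented for the example. The `labels` and `narrower` fields loosely mirror the SKOS notions of labels and narrower concepts.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    code: str                  # machine-manageable identifier
    labels: set                # synonyms and translations
    narrower: list = field(default_factory=list)  # codes of more specific concepts

# Hypothetical concept scheme.
concepts = {
    "C1": Concept("C1", {"boat", "bateau", "vessel"}, narrower=["C2", "C3"]),
    "C2": Concept("C2", {"yacht"}),
    "C3": Concept("C3", {"canoe"}),
}

# Hypothetical index: documents are linked to concept codes, not to words.
doc_index = {"C1": {"doc1"}, "C2": {"doc2"}, "C3": {"doc3"}}

# Reverse lookup from any label to the code of its concept.
label_to_code = {label: c.code for c in concepts.values() for label in c.labels}

def expand(code):
    """Return the code plus all transitively narrower codes."""
    codes = [code]
    for child in concepts[code].narrower:
        codes.extend(expand(child))
    return codes

def concept_search(term):
    """Resolve a term to its concept, then collect documents down the hierarchy."""
    code = label_to_code.get(term.lower())
    if code is None:
        return set()
    docs = set()
    for c in expand(code):
        docs |= doc_index.get(c, set())
    return docs

print(concept_search("bateau"))  # generic concept reached through a translation
print(concept_search("yacht"))   # specific concept only
```

Searching on any label of the generic concept returns the documents of its specific concepts as well, without the user having to list them.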
An important feature of ASKOSI is to list, for each concept, which sources can provide linked documents. The RDF vocabulary VoID is designed to publish this linking information.
Concept codes have an even more important relation with data: data cannot be interpreted by humans without the terms associated with each concept.

An important feature of ASKOSI is to list, for each concept, which data sources can provide linked information. Here also, the RDF vocabulary VoID is designed to publish information about linked data.

Data is made of records, each record being made of fields, fields being coded using Concepts. Records are also often linked together: one data record (for instance a Manufacturer) can be linked to many others (Products made by this Manufacturer).
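A minimal sketch of such records, assuming a Manufacturer/Product pair as in the example above. The concept codes (`PLC:BE`, `SUB:001`, `SUB:002`), the label table and all field names are hypothetical; the point is only that a coded field stays opaque to a human reader until the term associated with its concept is looked up.

```python
from dataclasses import dataclass

# Hypothetical terms for two concept codes; "SUB:002" is deliberately missing.
concept_labels = {"SUB:001": "Aspirin", "PLC:BE": "Belgium"}

@dataclass
class Manufacturer:
    id: int
    name: str
    country: str          # field coded with a concept code

@dataclass
class Product:
    id: int
    manufacturer_id: int  # link to one Manufacturer record
    substance: str        # field coded with a concept code

manufacturers = {1: Manufacturer(1, "Acme Pharma", "PLC:BE")}
products = [Product(10, 1, "SUB:001"), Product(11, 1, "SUB:002")]

def products_of(manufacturer_id):
    """Follow the one-to-many link from a manufacturer to its products."""
    return [p for p in products if p.manufacturer_id == manufacturer_id]

def describe(product):
    """Render a record for humans, replacing codes by terms when known."""
    maker = manufacturers[product.manufacturer_id]
    substance = concept_labels.get(product.substance, product.substance)
    country = concept_labels.get(maker.country, maker.country)
    return f"{substance} made by {maker.name} ({country})"

for p in products_of(1):
    print(describe(p))
```

The second product prints with its raw code because no term is known for it, which is exactly the situation where the data cannot be interpreted by a human.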
Unforeseen or new documents and data sources appear more and more often with the gradual worldwide integration of institutions and organisations.
Terms appearing in those documents or data need to be discovered and integrated.
Those terms can be mapped to existing or new Concepts. The new document or data source can then be indexed and integrated within ASKOSI. The user therefore: