Semantics for Dummies PDF

January 22, 2021 – 10:43 am

Beyond the PDF: when should we add semantics

The BTPDF (Beyond the PDF) site has a lively debate on adding semantics to scientific publications, and this is a snapshot of some of my own contributions. The site (and the meeting) are, I think, open to anyone. The idea is that people will offer ideas and materials to support the meeting.

One discipline I think we should adopt before BTPDF is that we should all read a variety of papers in different fields. I suspect the majority of attendees will be bioscientists, because they have an excellent record of knowing they need semantics and developing it. I don't think we can become polymaths, and if we try to solve all disciplines simultaneously we will get nowhere. The problems of a clinical trial are completely different from those of string theory.

Here is my zeroth law of semantic enhancement (not well phrased, but it was late at night). Readers of this blog will know how easy it is to corrupt information.

All discipline-independent syntactic problems are soluble and must be solved

By this I mean that characters must be expressed as characters with an encoding (and not as pixel images). All images with text should have the text machine-processable. All graphs should be accompanied by the raw data that created them (e.g. CSV). All numeric quantities should have units. All maths should use MathML. All chemistry should be in CML. Geolocations should use KML; maps should use polygons. All line graphics should contain scalable vectors. None of this is rocket science - it's purely a question of will. The temperature is not 278, it is 278 K.

If we do not solve the zeroth law there is little point in aiming for anything higher, because failure to obey the law corrupts the information irretrievably.
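As a rough illustration of two of these rules (numbers carry their units, and a graph ships with its raw data as CSV), here is a minimal Python sketch. The `Quantity` class, the `to_csv` helper, the `name [unit]` header convention and the second data point are all my own assumptions for illustration, not an established standard.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A number that is never stored without its unit (hypothetical helper)."""
    value: float
    unit: str

    def __str__(self) -> str:
        return f"{self.value:g} {self.unit}"


def to_csv(rows, columns):
    """Serialise raw data points, naming the unit in every column header.

    The "name [unit]" header style is an assumed convention.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([f"{name} [{unit}]" for name, unit in columns])
    writer.writerows(rows)
    return buffer.getvalue()


# The temperature is not 278, it is 278 K.
temperature = Quantity(278, "K")
print(temperature)

# Raw data that would accompany a plotted graph (illustrative values only).
raw = to_csv(
    rows=[(0, 278.0), (60, 279.5)],
    columns=[("time", "s"), ("temperature", "K")],
)
print(raw)
```

The point of the design is only that the unit travels with the number at every step, so a later reader never has to guess it.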

And some types of enhancement.

There are three main places where semantics can be created:

(a) by machines at the time of machine authoring. Where this can be done it is undoubtedly the highest quality, as the machine defines the semantics and the consistency. An increasing amount of authoring is now done by simulations or instruments, and there is absolutely no reason why the semantics should not be preserved. It is simply laziness to discard machine-produced annotations such as units, errors, etc.

(b) by humans. The earlier this is done the better. The person best placed to annotate information is the person who created it, although this must be modulated with experience. Semantic information added later may involve guessing (units, errors, conditions, etc.). Annotation at the time of conventional publication is almost certain to involve uncertainty. One consequence is that we should develop semantic notebooks before we put effort into late authoring tools. Even more of a problem is annotation introduced by technical editors in publishing houses: because they were not involved in the science, their annotations are likely to involve errors and misconceptions.
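To make point (a) concrete, here is a minimal Python sketch of an instrument or simulation exporting a reading with its unit and error attached, rather than a bare number. The `Reading` class, its field names, the JSON layout and the sample values are assumptions made up for this example; a real instrument would define its own schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Reading:
    """One machine-produced measurement (hypothetical schema)."""
    quantity: str
    value: float
    unit: str
    uncertainty: float  # same unit as value


def export_reading(reading: Reading) -> str:
    """Write every annotation the machine already knows; discard nothing."""
    return json.dumps(asdict(reading), sort_keys=True)


# Illustrative values only.
record = export_reading(Reading("temperature", 278.0, "K", 0.5))
print(record)

# A later reader recovers the unit and error without guessing.
restored = Reading(**json.loads(record))
print(restored.value, restored.unit, "+/-", restored.uncertainty)
```

Because the annotations are written at the moment of creation, nobody downstream has to reconstruct them by hand, which is exactly the guessing that later annotation invites.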

Source: blogs.ch.cam.ac.uk
