Most multimedia objects are spatio-temporal simulacra of the real world. This supports our view that the next grand challenge for our community will be understanding and formally modeling the flow of life around us, over many modalities and scales. As technology advances, these simulacra will evolve as well, becoming more detailed and revealing more about the nature of reality.
Currently, IoT is the state-of-the-art organizational approach for constructing complex representations of the flow of life around us. Various sensors, perhaps pervasive and working collectively, will broadcast to us representations of real events in real time. It will be our task to continuously extract the semantics of these representations and, where appropriate, to react by injecting response actions into the mix to ensure some desired outcome.
Pragmatics studies context and how it affects meaning, and context is usually culturally, socially, and historically based. For example, pragmatics would encompass the speaker’s intent, body language, and penchant for sarcasm, as well as other signs, usually culturally based, such as the speaker’s type of clothing, which could influence a statement’s meaning. Generic signal/sensor-based retrieval should likewise use syntactic, semantic, and pragmatic approaches. If we are to understand and model the flow of life around us, this will be a necessity.
Our community has successfully developed various approaches to decode the syntax and semantics of multimedia artifacts. The development of techniques that use contextual information is in its infancy, however. With the expansion of the data horizon, through the ever-increasing use of metadata, we can certainly put all media on a more equal footing.
The NLP community has its own set of approaches in semantics and pragmatics. Natural language is certainly an excellent exemplar of multimedia, and the use of audio and text features has played a part in the development of our field.
However, if we are to develop more unified approaches to modeling the flow of life around us, both of our communities can certainly benefit by examining in detail what the other can offer. Many of our approaches overlap, but many others differ. Certainly, research from the NLP community, such as word2vec, can benefit the multimedia community.
Now is the perfect time to actively promote this cross-fertilization of our ideas to solve some very hard and important problems.