March 28-30, 2019  ·  San Jose, California


In the early days of our community, when the idea of intelligently managing visual information was becoming quite exciting, text was downplayed, and audio was largely ignored as belonging to the purview of speech processing, an already established field of study. We strongly believe that now is the time to continue formalizing the more general field of study we call sensor-based data management, and to bring the multimedia and natural language processing (NLP) communities closer together.

Most multimedia objects are spatio-temporal simulacrums of the real world. This supports our view that the next grand challenge for our community will be understanding and formally modeling the flow of life around us, over many modalities and scales. As technology advances, the nature of these simulacrums will evolve as well, becoming more detailed and revealing to us more information concerning the nature of reality.

Linguistics is commonly represented as a hierarchy of layers, each building on the layers beneath it. These layers are phonetics, phonology, morphology, syntax, semantics, and pragmatics. The multimedia community is well represented in the first five areas; work on pragmatics, however, is still in its infancy.

Pragmatics studies context and how it affects meaning. Context is sometimes culturally, socially, or historically based. For example, pragmatics would encompass the speaker's intent, body language, and penchant for sarcasm, as well as other signs, usually culturally based, such as the speaker's type of clothing, which could influence a statement's meaning. Generic signal- and sensor-based retrieval should likewise draw on syntactic, semantic, and pragmatic approaches. If we are to understand and model the flow of life around us, this will be a necessity.

Our community has successfully developed various approaches to decoding the syntax and semantics of these artifacts, or at least their dominant semantics, since image snippets (bags of visual words) are more polysemous than text. The application of context in its various forms, however, is far less developed.

The NLP community has its own set of approaches in semantics and pragmatics. Natural language is certainly an excellent exemplar of multimedia, and the use of audio and text features has played a part in the development of our field.

However, if we are to develop more unified approaches to modeling the flow of life around us, both of our communities can benefit from examining in detail what the other has to offer. Many approaches are similar, but many arise in one community before percolating to the other. For example, NLP research on word embeddings should benefit the multimedia community.

After a successful first workshop in Miami, we intend to continue the tradition with this second workshop. Now is the perfect time to actively promote this cross-fertilization of ideas to solve some very hard and important problems.