escience graph

Enabling Data Transport between Web Services

Despite numerous benefits, many Web Services (WS) face problems with respect to data transport, either because SOAP doesn’t offer a scalable way of transporting large datasets or because orchestration workflows (WF) don’t move data around efficiently. In this paper we address both problems with the development of the ProxyWS: a WS that uses the protocols offered by the Virtual Resource System (VRS) to enable other WS to transfer and access large datasets without modifying either the WS or the underlying environment.

There is currently an abundance of deployed (legacy) WS using SOAP that fail to produce, access, and return large datasets. Moreover, orchestration WF cause WS to pass messages containing data back through the WF engine. To address these problems we introduce the ProxyWS: a WS that is able to access data from remote resources (GridFTP, LFC, etc.), thanks to the VRS, and also to transport larger data produced by WS, both legacy and new. For the ProxyWS to be able to provide larger data transfers to legacy WS, it has to be deployed in the same Axis-based container as the legacy WS, just like a normal WS. This enables clients to make proxy calls to the ProxyWS instead of the legacy WS. As a result, the ProxyWS returns a SOAP message containing a URI referring to the data location rather than the data itself. For new implementations the ProxyWS is used as an API that can create data streams from remote data resources and from other WS using the ProxyWS. This approach proved to be the most scalable, since WS can process data as they are generated by the producing WS. Thus, with the introduction of the ProxyWS we are able to provide a separate channel for data transfers that allows for more scalable SOA-based applications.
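The proxy-call idea described above (return a URI to the data instead of the data itself) can be illustrated with a minimal Python sketch. This is not the ProxyWS implementation, which is deployed in an Axis-based container and relies on the VRS. All names here (`legacy_service`, `proxy_call`, `fetch`, the reply keys) are hypothetical, and a local `file://` URI stands in for a remote data location.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def legacy_service(n_records: int) -> bytes:
    """Hypothetical stand-in for a legacy WS that builds its whole result in memory."""
    return b"".join(b"record %d\n" % i for i in range(n_records))


def proxy_call(service, *args, store_dir: Path) -> dict:
    """Call the wrapped service, store its payload out of band, return only a reference."""
    payload = service(*args)
    with tempfile.NamedTemporaryFile(dir=store_dir, suffix=".dat", delete=False) as fh:
        fh.write(payload)
        path = Path(fh.name).resolve()
    # The small reply plays the role of the SOAP message: it carries a URI, not the data.
    return {"data_uri": path.as_uri(), "size": len(payload)}


def fetch(reply: dict, chunk_size: int = 64 * 1024):
    """Client side: follow the URI and read the data in chunks over the separate channel."""
    path = Path(url2pathname(urlparse(reply["data_uri"]).path))
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            yield chunk


store = Path(tempfile.mkdtemp())  # assumed local store; a real setup would use a remote resource
reply = proxy_call(legacy_service, 1000, store_dir=store)
data = b"".join(fetch(reply))
```

The reply stays small regardless of the payload size, which is the property that keeps large data out of the control messages and away from the WF engine.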

Many different approaches have been introduced in an attempt to address the problems mentioned earlier. Examples of these include Styx Grid Services, Data Proxy Web services for Taverna, and Flex-SwA. Some noteworthy features of these approaches are direct streaming between WS, the use of alternative protocols for data transport, and larger data delivery to legacy WS. However, each of these examples addresses only one part of the problem and, furthermore, does not include any means of allowing access to remote data resources. Leveraging these existing proposals and combining them with the VRS, we implemented the ProxyWS. To validate it, we have tested its performance using two data-intensive WF. The first is a distributed indexing application that uses a set of WS to speed up the indexing of a large set of documents, while the second relies on the index created by the first to retrieve the results of a query and recognize the protein names they contain. With the use of the ProxyWS we are able to retrieve data from remote locations (8.4 GB of documents for indexing), as well as to obtain more results for a query (8300 documents using the ProxyWS versus 1100 using SOAP).

We have presented the ProxyWS, which may be used to support large data transfers for legacy and new WS. We have verified its ability to deliver large datasets on two real-life tasks: indexing using WS in a distributed environment and annotating documents from an index. From our experiments we have found that the ProxyWS is able to facilitate data transports where normal SOAP messages would have failed. We have also demonstrated that with the use of the ProxyWS legacy WS can scale further, by avoiding data delivery via SOAP and by delivering data directly from the producing to the consuming WS.

  • [PDF] S. Koulouzis, E. Meij, and A. Belloum, “Enabling large data transfers between web services,” in 5th EGEE User Forum, 2010.
    [Bibtex]
    @inproceedings{EGEE:2010:koulouzis,
    Author = {Koulouzis, S. and Meij, E. and Belloum, A.},
    Booktitle = {5th EGEE User Forum},
    Date-Added = {2011-10-20 10:00:08 +0200},
    Date-Modified = {2011-10-20 10:00:08 +0200},
    Title = {Enabling Large Data Transfers Between Web Services},
    Year = {2010}}
Annals of Information Systems

Semantic disclosure in an e-Science environment

The Virtual Laboratory for e-Science (VL-e) project serves as a backdrop for the ideas described in this chapter. VL-e is a project with academic and industrial partners where e-science has been applied to several domains of scientific research. Adaptive Information Disclosure (AID), a subprogram within VL-e, is a multi-disciplinary group that concentrates expertise in information extraction, machine learning, and Semantic Web – a powerful combination of technologies that can be used to extract and store knowledge in a Semantic Web framework. In this chapter, the authors explain what “semantic disclosure” means and how it is essential to knowledge sharing in e-Science. The authors describe several Semantic Web applications and how they were built using components of the AIDA Toolkit (AID Application Toolkit). The lessons learned and the future of e-Science are also discussed.

  • [PDF] M. S. Marshall, M. Roos, E. Meij, S. Katrenko, W. R. van Hage, and P. W. Adriaans, “Semantic disclosure in an e-science environment,” in Semantic e-Science (Springer Annals of Information Systems AoIS), 2009.
    [Bibtex]
    @inproceedings{AIS:2009:marshall,
    Author = {Marshall, M.S. and Roos, M. and Meij, E. and Katrenko, S. and van Hage, W.R. and Adriaans, P.W.},
    Booktitle = {Semantic e-Science (Springer Annals of Information Systems AoIS)},
    Date-Added = {2011-10-16 15:03:17 +0200},
    Date-Modified = {2012-10-28 17:21:26 +0000},
    Publisher = {Springer},
    Series = {Annals of Information Systems},
    Title = {Semantic disclosure in an e-Science environment},
    Volume = {11},
    Year = {2009}}

Enabling Data Transport between Web Services through alternative protocols and Streaming

As web services gain acceptance in the e-Science community, some of their shortcomings have begun to appear. A significant challenge is to find reliable and efficient methods to transfer large data between web services. This paper describes the problem of scalable data transport between web services and proposes a solution: a modular Server/Client library that uses SOAP as a control channel while the actual data transport is handled by various protocol implementations, together with a simple API that developers can use for data-intensive applications. Apart from file transport, the proposed approach offers direct data streaming between web services, which could reduce workflow execution time by creating a data pipeline between web services. Finally, the performance and usability of this library are evaluated using the indexing application that the Adaptive Information Disclosure Application (AIDA) Toolkit offers as a Web Service.
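The data pipeline mentioned above can be illustrated with a small Python sketch in which a consumer starts working on each item as soon as the producer emits it, instead of waiting for the full dataset. This is only an analogy built on generators, not the library's API: the function names, the sample documents, and the `events` log are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator

events: list[str] = []  # records the order of work, to make the interleaving visible


def produce_documents(n: int) -> Iterator[str]:
    """Hypothetical producing service: emits documents one at a time."""
    for i in range(n):
        events.append(f"produced {i}")
        yield f"document {i} mentions protein p{i % 3}"


def tokenize(docs: Iterable[str]) -> Iterator[list[str]]:
    """Intermediate stage: transforms each document as it arrives."""
    for doc in docs:
        yield doc.split()


def index(token_stream: Iterable[list[str]]) -> Counter:
    """Hypothetical consuming service: updates term counts per incoming document."""
    counts: Counter = Counter()
    for i, tokens in enumerate(token_stream):
        counts.update(tokens)
        events.append(f"indexed {i}")
    return counts


# Stages are chained lazily; no stage holds the whole dataset in memory.
term_counts = index(tokenize(produce_documents(6)))
```

After the run, `events` alternates between "produced" and "indexed" entries, showing that indexing of the first document happens before the second one is produced. In a batch design, all "produced" entries would come first.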

  • [PDF] S. Koulouzis, E. Meij, M. S. Marshall, and A. Belloum, “Enabling data transport between web services through alternative protocols and streaming,” in 4th IEEE International Conference on e-Science, 2008.
    [Bibtex]
    @inproceedings{IEEE:2008:koulouzis,
    Author = {Koulouzis, S. and Meij, E. and Marshall, M.S. and Belloum, A.},
    Booktitle = {4th IEEE International Conference on e-Science},
    Date-Added = {2011-10-16 10:35:31 +0200},
    Date-Modified = {2011-10-16 10:35:31 +0200},
    Title = {Enabling Data Transport between Web Services through alternative protocols and Streaming},
    Year = {2008}}