Usability is a primary concern when designing any piece of software. But there are deeper issues in the design and architecture of systems which affect their long-term usefulness. Most casual users will not notice these issues, but they affect those who use systems in more advanced ways, and they affect the long-term viability of systems. Examples of issues include:
Recent decades have seen a number of important ideas develop in the disciplines around computer science, ranging from the Object Paradigm, through portable languages such as Java, to component designs and web services over the internet. Alongside these have been proposals for design methodologies, leading to the Unified Modelling Language (UML).
How have our current statistical and survey systems taken up these ideas, and how well does their architecture support various requirements for flexibility and extensibility?
Algorithmic extension: The objective here is to extend the base functionality (command language) of a system by adding new operations (commands or functions). Some systems have built-in programming languages that support the idea of procedures; S/R, SAS and Stata are examples of this. Other systems are built on top of a more general-purpose programming language, which is accessible to competent users: the MS Access database and the SPSS Dimensions Data Model are examples of this, both built using (close relatives of) Visual Basic. Another approach is to provide a mechanism to call procedures external to the system. Here the emphasis is on the interface used, not the programming language of the external component. S/R supports this approach through the library concept.
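As an illustration of algorithmic extension, here is a minimal Python sketch of a command registry to which a user can add operations without touching the core. It does not represent how any of the systems named above are implemented; the names `COMMANDS`, `command` and `run` are hypothetical and chosen only for this example.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping command names to their implementations.
COMMANDS: Dict[str, Callable[[List[float]], float]] = {}


def command(name: str):
    """Decorator that registers a function under a command name."""
    def register(func: Callable[[List[float]], float]):
        COMMANDS[name] = func
        return func
    return register


# A command shipped with the (imaginary) base system.
@command("mean")
def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# A user-supplied extension: it uses the same mechanism as the built-in
# command, so the core dispatcher below needs no changes.
@command("range")
def value_range(values: List[float]) -> float:
    return max(values) - min(values)


def run(name: str, values: List[float]) -> float:
    """Dispatch a command by name, as a command language interpreter might."""
    if name not in COMMANDS:
        raise KeyError(f"unknown command: {name}")
    return COMMANDS[name](values)
```

Calling `run("range", [1.0, 4.0, 2.0])` dispatches to the user-added function in exactly the same way as a built-in command, which is the essential property of this style of extension.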
Interface extension: Where a system uses a graphical user interface (generally in the form of menus and click functionality) integrating new features is more complex. Some systems are designed to support plug-in components that extend the interface, but this functionality is not common in statistical systems.
Interface construction: In some contexts a generic user interface is not sufficient, because the users are more concerned with their application domain than with statistical functionality as such. An obvious example is in the area of dissemination of statistical results. Then a set of statistical components is needed that can be brought together in ways that form an appropriate application. This is usually done through a web browsing interface, using templates (as in a standard web content management system) and components that are specific to statistical use. The statistical dissemination system Nesstar is an example of a system that specifically supports this style of use. The R system can also be used as a server to perform statistical calculations or display statistical results in response to commands from some other application. The general idea of Web Services covers the standardised approach to the remote invocation of specialised processing capabilities and the return of the results (generally using XML messages within the SOAP protocol).
Extensible data structures: XML is a language for holding complex data structures, and it is supported by a wealth of tools for exchange, manipulation and transformation. A number of applications have been built using XML as their basic data structure, such as the survey questionnaire system QEDML. XML is essentially extensible, so that additional components can easily be inserted, but how are the semantics of these additions communicated to and used by the processing applications?
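The question about extension semantics can be made concrete with a small Python sketch. The document format below is invented for illustration (it is not QEDML or any real standard), and the namespace URI, element names and `KNOWN_TAGS` set are all assumptions. The sketch shows one possible policy: process the elements the application understands and report, rather than reject, the ones it does not.

```python
import xml.etree.ElementTree as ET

# Hypothetical questionnaire fragment. The ext:routing element stands in
# for an addition that this processing application was not written to handle.
DOC = """
<questionnaire xmlns:ext="http://example.org/hypothetical-extension">
  <question id="q1">
    <text>How old are you?</text>
    <ext:routing target="q3"/>
  </question>
  <question id="q2">
    <text>Where do you live?</text>
  </question>
</questionnaire>
"""

# Element names whose meaning this (imaginary) application knows.
KNOWN_TAGS = {"text"}


def read_questions(xml_text: str):
    """Parse questions, keeping known content and listing unrecognised tags."""
    root = ET.fromstring(xml_text)
    questions = []
    for q in root.findall("question"):
        known = {}
        unrecognised = []
        for child in q:
            if child.tag in KNOWN_TAGS:
                known[child.tag] = child.text
            else:
                # Namespaced tags appear as "{namespace-uri}localname".
                unrecognised.append(child.tag)
        questions.append({"id": q.get("id"), **known, "unrecognised": unrecognised})
    return questions
```

The parser accepts the extended document without difficulty, but the application can do no more than note that `ext:routing` exists. What the element means still has to be communicated by some other route, such as a schema, a metadata standard or documentation.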
System upgrading: The environments available to us for the construction and implementation of systems have changed radically over the last three decades, as have the conceptual models that we use to think about computing processes. We can expect continuing change in the future. Is it possible to design systems now so that they are readily adapted to new computing environments? Does compilation to a language-independent intermediate form (an old idea, now being promoted by Microsoft) help with migration across hardware? Does it mean that we can retain components written in languages that are no longer fashionable (Fortran, Cobol; will C go the same way?) and mix these with further development in new languages? Does the advent of compilers that can combine different languages make this irrelevant (given sufficient modularity), or are there penalties for a mixed-language approach? What principles and features are essential to design a system with a long life?
The session will present some of these ideas (including the Object Paradigm and UML) and developers of statistical and survey processing systems will discuss the current and future architecture of their systems.

| Attendee category | Fee |
|---|---|
| CStat | £27 (applies to MIS, FIS and GradStat) |
| RSS Fellow/ASC member | £30 |
| Others | £40 |

| Time | Item | Contributor |
|---|---|---|
| 10:00 | Registration & Coffee | |
| 10:30 | Introduction and Overview | Andrew Westlake, Survey & Statistical Computing |
| 11:00 | R: Open and Extensible Software for Statistical Data Analysis. R is a language and environment for statistical computing and graphics. The core of R is a Scheme-based interpreter for the S language, which is often the vehicle of choice for research in statistical methodology. R provides an Open Source route to participation in that activity, and is probably the fastest growing statistical software project at the moment. In this presentation I will discuss the basic architecture of R with special emphasis on portability, user extensibility and interfaces to other programming environments. The main parts of "base R" are the S interpreter, facilities for object-oriented programming (also known as S3 and S4), a graphics engine with support for multiple device drivers, and functions for important statistical techniques like classic hypothesis tests, linear models or standard plots. Most of R is written either in the S language itself, or in (ISO versions of) C and FORTRAN, and hence can be run on all major computing platforms (Linux, Mac, Unix, Windows). A unique feature of R is its packaging system, which makes it easy to distribute contributed third-party extensions over the Internet. Most users think of R packages as collections of R functions with help pages, but packages are really heavily standardized units for extending R. Examples of non-function packages are packages containing data sets or informational material, e.g., to accompany a book. Standardization imposes some additional load on package authors, which is more than outweighed by ease of installation and usage on a wide variety of computing platforms. | Friedrich Leisch, TU Vienna |
| 11:45 | Evolution or Intelligent Design? An objective view of software development. This talk discusses the pragmatic aspects of devising and maintaining software architectures for survey systems over protracted time periods. It is illustrated with examples from Snap survey software. At the core of Snap is an underlying architecture that was established almost 20 years ago. Since then, user requirements and expectations have changed along with the operating environment. Amongst the software infrastructure technologies that have become ubiquitous over that period are the Internet, intranets, Windows, Linux, web and email. Software technologies introduced include UML, XML, Triple-s, SQL, C++, Java and various derivatives. User requirements include web interviewing, automated response management, scanning, mobile interviewing, kiosks and live, online results. These changes have to be incorporated in an evolutionary way that doesn't alienate existing users. The presentation highlights how the object-oriented metaphor was used in the original design of Snap and how it continues to be an essential part of continuing refactoring processes. Also discussed will be a twist on the regular object metaphor which is at the heart of plans for an entirely new core of Snap currently under development. | Steve Jenkins, Snap Surveys |
| 12:30 | Lunch | |
| 1:45 | Model Representation in WinBUGS: The Virtual Graph. In this talk I will discuss some of the architecture underlying the WinBUGS framework for Bayesian statistical modelling. The software allows Markov chain Monte Carlo analysis of a very wide class of statistical models known as *Directed Acyclic Graphs* (DAGs). Its generality arises from our exploitation of the conceptual similarities between graphical modelling theory and object-oriented programming. Indeed, the user-specified model is stored internally as an object-oriented version of the implied graph, with 'objects' representing graphical nodes and 'pointers' linking related nodes together. This *Virtual Graphical Model* (VGM) is then used as the basis for abstract computation via the very general Gibbs sampling algorithm. In the talk I will discuss various key design features of the VGM that make all of this possible, as well as message-passing mechanisms that we have designed to improve efficiency in complex problems. I will also discuss the current limitations of the software and how the VGM might evolve to address them. | David Lunn, Imperial College London |
| 2:30 | Battling entropy: the development of the MLwiN statistical modelling package. In this paper we discuss the development trajectory of the MLwiN statistical software over the last fifteen years, where we are now, and future plans. The software has grown in a piecemeal, organic fashion, which is a consequence of it being a tool developed by researchers rather than professional software developers. Over recent years the limitations of this approach have become very clear. The lack of a coherent underlying architecture, "spaghetti" code and poor documentation of code have made the MLwiN software difficult to extend. Poor testing procedures have also affected software reliability. Attempts in recent years at refactoring and software modelling procedures using UML have not been a success. Recent attempts at improved software testing and better communication between user support and programmers have led to greater software reliability. We have recently begun working with software engineers and technical systems designers with experience of commercial and industrial systems design, implementation and management, in an attempt to bring some order into our software development process. However, we envisage this process of refactoring the code and transforming our software development processes taking at least three years, because this longer-term work has to be fitted in around pressing short-term requirements. | Jon Rasbash, Bristol University |
| 3:00 | Developing GenStat into the 21st century. The foresight of the designers of the internal structure of the original GenStat (c. 1970) and its first major update to GenStat 5 (1985) was demonstrated by their use of standardized data structures, workspace conventions, algorithmic interfaces and output utilities. GenStat was also one of the first statistical systems to provide a programming environment for users to develop, evaluate and disseminate new methods. We will describe how these concepts have evolved over the lifetime of GenStat in parallel with the many enhancements in the general computing environment: from batch use to interactivity, from command languages to menus, from mainframe to workstation to PC, and so on. We will also discuss the challenges and opportunities that are opening up at the start of the 21st century, and how we plan to exploit them within GenStat. | Roger Payne & Simon Harding, VSN International |
| 3:45 | Tea | |
| 4:15 | Using XML to encode Questionnaire Designs: Existing Standards & Technical Implementation Issues. This paper will examine the existing standards and technical implementation issues associated with the encoding of questionnaire designs using XML. In particular, it will focus on questionnaire designs in the practical context of the operational requirements for the creation, fielding, and respondent data collection of market research surveys. Where relevant, the more specialized requirements of questionnaire design for survey data repositories and educational assessment testing will also be discussed. The paper will use the QEDML standard (Questionnaire Editing & Deployment Mark-up Language; www.qedml.com.au) as the basis for a technical comparison. | Philip Cookson & Jason Sobell, Philology Pty Ltd |
| 5:00 | Building distributed statistical services using XML and RDF. The presentation will focus on the underlying architecture of the Nesstar system and its use of XML and RDF to provide distributed access to statistical resources. The role of metadata standards, like SDMX and DDI, for this type of system will also be emphasised. | Jostein Ryssevik |
| 5:45 | General Discussion | |
| 6:00 | Close (Discussion continued in Artillery Arms) | |