NetEpi Collection and NetEpi Analysis: Open-Source Solutions to Some Pressing Data Management and Data Analysis Problems in Public Health Practice

Public health is an area of endeavour concerned with protecting and improving the health of the entire population, or of sub-groups within it, rather than with the care of individual patients. The global SARS epidemic in 2003 reminded everyone that communicable diseases are still a threat, and there is now real concern over the threat posed by avian influenza and future human pandemic influenza viruses. The first part of this presentation will describe NetEpi Collection, a free, open-source, Web-based disease outbreak data management tool written in Python, development of which commenced in 2003 during the SARS epidemic. Its strengths, limitations and future needs will be discussed. In particular, the need for multi-master distributed database features that are robust in the face of slow and unreliable networks and frequent network partitions will be explored, as will the problem of semantic encoding and metadata management during rapidly evolving outbreaks and epidemics. The distributed task management functions, involving thousands of health care workers, that are needed to deal effectively with an influenza pandemic will also be described. The second part of the presentation will describe NetEpi Analysis, a tool for interactive exploratory data analysis of large population health data sets ("large" meaning in the range of 10 to 100 million records). It is written primarily in Python, NumPy and R, and is also available under an open-source license. The simple but somewhat novel data reduction and summarisation approach used, involving an object-oriented implementation of fast set operations on sorted inverse ordinal mappings of vertically-disaggregated dataset columns, will be described, and the strengths and weaknesses of this approach will be explored. Future directions, including the use of parallel computing, to which our approach lends itself, will be discussed. Both tools will be briefly demonstrated.
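
The summarisation approach mentioned above can be illustrated with a minimal sketch. It is not taken from NetEpi Analysis itself: the function names (build_inverted_index, crosstab) and the toy data are hypothetical, and the only assumption is that each column is stored separately and mapped from each distinct value to a sorted array of the row numbers holding that value. Counts for combinations of values then reduce to set intersections on those sorted arrays.

```python
import numpy as np


def build_inverted_index(column):
    """Map each distinct value in a column to a sorted array of row numbers.

    Hypothetical helper for illustration; not the NetEpi Analysis API.
    """
    values = np.asarray(column)
    # A stable sort groups equal values together and keeps the row numbers
    # within each group in ascending order.
    order = np.argsort(values, kind="stable")
    uniq, starts = np.unique(values[order], return_index=True)
    groups = np.split(order, starts[1:])
    return dict(zip(uniq.tolist(), groups))


def crosstab(index_a, index_b):
    """Count rows for every pair of values by intersecting row-number sets."""
    counts = {}
    for val_a, rows_a in index_a.items():
        for val_b, rows_b in index_b.items():
            # Row numbers are unique within each array, so NumPy can skip
            # its own de-duplication step.
            common = np.intersect1d(rows_a, rows_b, assume_unique=True)
            counts[(val_a, val_b)] = int(common.size)
    return counts


# Toy data set with two columns stored separately (made up for this sketch).
sex = ["F", "M", "F", "F", "M", "M"]
age_group = [1, 2, 2, 1, 2, 1]

sex_index = build_inverted_index(sex)
age_index = build_inverted_index(age_group)
table = crosstab(sex_index, age_index)
print(table)
```

Because each column's index is built and stored independently, the intersections for different value pairs do not depend on one another, which is one reason an approach of this kind lends itself to parallel computing.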


Keywords: public health, disease outbreaks, web applications, data analysis, Python, PostgreSQL, R
Stream: Python
Presentation Type: 30 minute Presentation in English
Paper: A paper has not yet been submitted.


Dr Tim Churches

Medical epidemiologist, Population Health Division, New South Wales Department of Health
Sydney, NSW, Australia

Tim Churches is a medical practitioner who worked in general practice and geriatrics before training in epidemiology and switching to public health practice. He has been responsible for the conception, design and implementation of many epidemiological and population health information systems, including systems used in the NSW cancer registry, national outcomes monitoring databases for both diabetes and intensive care, a NSW population health data warehouse, a near real-time emergency department surveillance system and communicable disease monitoring and outbreak investigation systems.

Dr James Farrow

Farrow-Norris Pty Ltd and School of IT, University of Sydney
Sydney, NSW, Australia

James Farrow is a computer scientist who teaches and undertakes research at the School of IT, University of Sydney and is also a principal in his own IT consulting business.

Ref: OS7P0061