COSC430 XML Database Assignment 2017

The task is to take some World Development Indicator data, downloaded from the World Bank as *.csv files, load the information into an XML database, and run some queries against it.
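As a rough illustration of the first stage of that pipeline (CSV in, XML out), here is a minimal Python sketch. It is not the required representation and not a recommended design for the World Bank files: the function name csv_to_xml, the element names table, row and field, and the sample column names are all hypothetical choices made for the example. It stores each column header in a name attribute rather than using it as an element name, on the assumption that some headers (for instance ones containing spaces or starting with a digit) would not be legal XML names, and it skips empty cells.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="table", row_tag="row"):
    """Convert CSV text with a header line into an XML string.

    Hypothetical layout: one <row> per CSV record, one
    <field name="header">value</field> per non-empty cell.
    """
    root = ET.Element(root_tag)
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = ET.SubElement(root, row_tag)
        for column, value in record.items():
            # Skip surplus cells (key None) and missing or empty values.
            if column is None or not value:
                continue
            field = ET.SubElement(row, "field", name=column)
            field.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up sample data; the real files have their own columns.
    sample = "Country Code,Indicator Code,1960,1961\nABC,X.Y.Z,,1.5\n"
    print(csv_to_xml(sample))
```

A program of this kind only reads text and writes text, so it can run on any machine and has no connection to the database. Whether a flat row-and-field layout suits the queries you want to ask is a separate design question.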

The material can be found in
~ok/COSC430/xml.d on the shared file system and in a subdirectory of it. These directories are all readable and searchable by everyone.

BDB-XML.tar
the source code for Oracle Berkeley DB XML. You can install this yourselves. You may also be able to run
~ok/COSC430/xml.d/dbxml-6.0.18/install/bin/dbxml
directly. Please try this. I have asked cshelp to install it for you, but this copy may already work.
BerkeleyDBXML-Intro.pdf
this is the tutorial for using Berkeley DB XML. It is the very document from which I learned the command line interface of this database.
COSC430-XML-2017.pdf
the slides for the XML databases lecture.
WDI_csv/
the subdirectory with the World Bank data.

The parent of xml.d, ~ok/COSC430/, includes

XPath-1.0.pdf
the first W3C standard for XPath, now superseded, but a much better place to start trying to understand XPath than the current standards.
XQuery-1.0.pdf
the first W3C standard for XQuery, now superseded, but a much better place to start trying to understand XQuery than the current standards.

The Register has a useful tutorial on Berkeley DB XML.

What to do

  1. dbxml should be available in your VMs now. Check that this is so.
  2. Ensure that you can reach the data files in
    ~ok/COSC430/xml.d/WDI_csv/*
  3. Start dbxml and create a container using createContainer. Load WDI_Country.xml into it using putDocument (you will need to give the document a name; I suggest 'country'), then load WDI_Series.xml into it using putDocument again (I suggest the name 'series'). Quit dbxml. (The command for that is quit.) You can use help cmd to get help about a command called cmd.
  4. Start dbxml again, open the container using openContainer, and check that the data are still there. Using just the data in the country document, how many countries in each Region belong to each Income Group? (Show your XQuery.)
  5. Read through WDI_Data.txt to get an idea of what is in WDI_Data.csv, which is actually much simpler than the Country and Series data. WDI_Data.txt also has some notes about how this information might be represented in XML.
  6. Think about the nature of the information in WDI_Data.csv and some queries you might want to ask, then choose or design a way to represent the information you need from this file as XML. Write up to a page giving your decision and your reasons for it.
  7. Convert the file to your chosen representation. You may use any programming language you like for converting CSV to XML. The conversion program must not be linked with the database in any way, shape, or form. The conversion need not even be done on the same machine as the one you run dbxml on. You may submit your source code if you wish, and it will be inspected, but it will not be marked.
  8. Devise three questions involving these three XML files, express them in XQuery, give them to dbxml, and show the results. (So the questions had better have short results...) The queries are expected to use more of XQuery than just XPath; in particular, at least two of them should involve joining data from two or more documents.
  9. Submit a report with