Zhiyi Huang and Paul Werstein
There are many problems that cannot be solved in reasonable time without the use of supercomputers. An alternative is to use a group of standard off-the-shelf personal computers to form a powerful cluster computer.
We have a number of projects including:
Nathan Rountree and Ian McDonald
We often get large amounts of data in which each object falls into a particular group depending on certain features. For instance, a particular latitude and longitude may be associated with land rather than sea, or with high oxygen content rather than low. It is often useful to build models that condense this pattern into a brief but salient piece of "knowledge": for example, a rule expressing the relationship between levels of bacteria and the diagnosis of a disease. Developing that knowledge can be very difficult, especially when there is a lot of data (or there are a lot of features). Some methods are more accurate than others; that is, they model the relationship between the features and the predicted group with a greater chance of predicting the correct group. Data mining projects aim to make the process of generating new knowledge from data faster and more accurate, and to apply it to new fields of knowledge.
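To make the idea of condensing data into a rule concrete, here is a minimal sketch of one of the simplest rule learners, a single-threshold rule (a "decision stump"), which picks the cut-off on one numeric feature that best separates two groups. It is illustrative only: the function name `learn_threshold_rule` and the bacteria/diagnosis numbers are hypothetical, and this is not presented as the method used in these projects.

```python
from typing import List, Tuple


def learn_threshold_rule(values: List[float], labels: List[bool]) -> Tuple[float, float]:
    """Learn the rule 'predict True iff value >= threshold'.

    Returns (threshold, training_accuracy). Hypothetical helper for illustration.
    """
    best_t, best_acc = float("inf"), -1.0
    # Try every observed value as a threshold, plus +inf (always predict False).
    for t in sorted(set(values)) + [float("inf")]:
        correct = sum((v >= t) == y for v, y in zip(values, labels))
        acc = correct / len(values)
        if acc > best_acc:  # keep the first (lowest) threshold with the best accuracy
            best_t, best_acc = t, acc
    return best_t, best_acc


if __name__ == "__main__":
    # Hypothetical data: bacteria counts and whether the disease was diagnosed.
    bacteria = [5, 12, 30, 45, 60, 80]
    diagnosed = [False, False, False, True, True, True]
    t, acc = learn_threshold_rule(bacteria, diagnosed)
    print(f"IF bacteria >= {t} THEN disease (training accuracy {acc:.2f})")
```

Real data rarely separates this cleanly, and with many features a learner must search over features as well as thresholds, which is where the cost and accuracy trade-offs described above come in.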
Current and previous projects include:
Paul Werstein and Ian McDonald
There are many problems that are most conveniently solved by storing (and retrieving) data in (and from) a relational database. To give fast access to your data, the database keeps an index on some attributes of the data, allowing fast access by student identification number, by name, or perhaps by some other feature. However, sometimes we need to retrieve data by more than one feature at once, e.g. by latitude, longitude, and time together. The problem is that once data has been retrieved by longitude, the resulting dataset may still be very large, and it has no index on the remaining features.
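One well-known family of structures that indexes several attributes at once is the k-d tree, which splits on a different attribute at each level so that a single range query can prune on latitude, longitude, and time together. The sketch below is a minimal in-memory illustration under that assumption; the names `KDNode`, `build_kdtree`, and `range_query` and the sample readings are hypothetical, and it is not the indexing system developed in these projects.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]  # e.g. (latitude, longitude, time)


@dataclass
class KDNode:
    point: Point
    axis: int                  # attribute this node splits on
    left: Optional["KDNode"]   # points with point[axis] <= this node's value
    right: Optional["KDNode"]  # points with point[axis] >= this node's value


def build_kdtree(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a balanced k-d tree by splitting at the median, cycling through attributes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(pts[mid], axis,
                  build_kdtree(pts[:mid], depth + 1),
                  build_kdtree(pts[mid + 1:], depth + 1))


def range_query(node: Optional[KDNode], lo: Point, hi: Point,
                out: Optional[List[Point]] = None) -> List[Point]:
    """Return all points p with lo[i] <= p[i] <= hi[i] for every attribute i."""
    if out is None:
        out = []
    if node is None:
        return out
    p, a = node.point, node.axis
    if all(l <= x <= h for x, l, h in zip(p, lo, hi)):
        out.append(p)
    # Only descend into subtrees that can overlap the query box on this node's attribute.
    if lo[a] <= p[a]:
        range_query(node.left, lo, hi, out)
    if hi[a] >= p[a]:
        range_query(node.right, lo, hi, out)
    return out


# Hypothetical (latitude, longitude, time) readings.
readings = [(-45.9, 170.5, 1), (-45.8, 170.6, 5), (-36.8, 174.7, 3),
            (-41.3, 174.8, 2), (-45.9, 170.4, 9), (-43.5, 172.6, 4)]

if __name__ == "__main__":
    tree = build_kdtree(readings)
    hits = range_query(tree, (-46.0, 170.0, 0), (-45.0, 171.0, 6))
    print(sorted(hits))  # [(-45.9, 170.5, 1), (-45.8, 170.6, 5)]
```

Because each level splits on a different attribute, the query narrows the search on all three features together instead of first retrieving everything in a longitude band and then scanning it. Disk-based databases typically use related structures such as R-trees or grid files for the same purpose.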
Our experiments have shown that standard commercial and non-commercial databases cannot cope with certain quite modest problems without some new kind of indexing system. Even purpose-built databases benefit from data structures that index more than one attribute at a time. Current projects in this area include: