
Ioannis Vlahavas

Aristidis Likas

Georgios Paliouras

Department of Informatics, Aristotle University of Thessaloniki

Department of Computer Science, University of Ioannina

Institute of Informatics and Telecommunications, National Centre for Scientific Research “Demokritos”


Machine Learning is a subfield of Artificial Intelligence concerned with algorithms and techniques that allow computer systems to “learn from experience” in order to solve artificial intelligence problems successfully. Experience is usually provided in the form of problem-specific “examples” (organized in “datasets”) that allow the learning system to discover new knowledge and improve its performance on a particular task. If the training examples are not available at the beginning of the learning process but are instead collected during training, we have the case of “on-line” or incremental learning. If the system is already provided with some knowledge about the domain and/or the task, we have the case of knowledge refinement or analytical learning.

Machine Learning problems are generally distinguished into three main categories depending on the nature of the datasets: In supervised learning, we are given a set of labeled examples and the aim is to discover the knowledge required for labeling new examples. Typical tasks that fall within the paradigm of supervised learning are classification, where a class label is predicted for each input example, and regression, where a numerical value is predicted for each input example. In unsupervised learning we are given a set of unlabeled examples and the aim is to identify the underlying structure of the data and extract it in the form of actionable knowledge. Typical unsupervised learning tasks are clustering, where the goal is to identify commonalities among the examples and form interesting clusters, and model estimation, where the knowledge model that generated the data is sought, e.g. in the form of probability density functions. In semi-supervised learning only partial supervision is provided to the learning machine. Partial supervision can take the form of partial labeling of the examples, i.e., providing a set of labeled and a set of unlabeled examples. In particular, semi-supervised learning methods that involve the user in exploring the set of unlabeled examples are called active learning methods, while those that learn from data labeled with the same label are often called one-class or descriptive learning methods. Another form of partial supervision appears in reinforcement learning, where a label (reward/penalty) is provided for a complete sequence of actions and needs to be distributed to the individual actions, in order to let the system learn how to select actions. Reinforcement learning is particularly useful for control problems, games and sequential decision making.
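The contrast between the first two paradigms can be made concrete with a toy sketch (plain NumPy, purely illustrative): a nearest-centroid classifier uses the class labels, whereas k-means clustering must recover the two groups from the data alone.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated 2-D groups of examples.
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(3.0, 0.3, (20, 2))])
y = np.array([0] * 20 + [1] * 20)   # class labels (known only in the supervised case)

# Supervised: a nearest-centroid classifier learned from the labeled examples.
centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])
def classify(x):
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

# Unsupervised: 2-means clustering ignores y and recovers the group structure.
centers = X[[0, -1]].copy()         # crude deterministic initialization
for _ in range(10):
    assign = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([X[assign == c].mean(axis=0) for c in (0, 1)])
```

On such well-separated data the clustering recovers the labeling up to a permutation of cluster identities, which is the best any unsupervised method can promise.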

Machine Learning problems are also distinguished according to the format of the training data. In most cases the data take the form of propositional assertions or, equivalently, feature vectors in the space of example attributes. In such cases, the knowledge to be discovered is of similar expressive power, e.g. propositional rules, decision trees, etc. On the other hand, there are cases where we want to learn from relational data, e.g. first-order predicates or relational databases. In such cases the discovered knowledge is also necessarily of higher expressive power, e.g. logic programs. Inductive logic programming and statistical relational learning are two very active approaches to relational learning. In between first-order predicates and propositional assertions, there is a range of structured data for which specialized and efficient learning methods have been developed: (a) sequential data, such as character strings, have given rise to a variety of sequence learning methods, which try to identify interesting sequential patterns, such as grammatical rules that could have generated the data; (b) graph data, i.e. data related by a single type of binary relation, and tree data have also led to specialized graph and tree learning methods.

In addition to the representation of the training examples, machine learning methods need to deal with the quality of the data provided. Most of the work in this area has focused on the manipulation of the attribute or feature space, i.e. the characteristics that are chosen for describing the examples. The effort to reduce the set of descriptive features to the subset of useful ones has led to research in dimensionality reduction and feature selection. A variety of methods have been developed for this purpose, differing significantly between supervised and unsupervised learning problems. Apart from manipulating the feature space, work has been done on identifying and reducing the noise in the data and on undersampling or oversampling the data, in particular when the given sample is considered to misrepresent the true distribution of examples.
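As a concrete instance of dimensionality reduction (generic principal component analysis, not any specific method surveyed in this chapter), the following sketch projects the data onto the directions of highest variance:

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto the k leading principal components."""
    Xc = X - X.mean(axis=0)                      # center the data
    # Eigendecomposition of the covariance matrix (symmetric, so use eigh).
    cov = Xc.T @ Xc / (len(Xc) - 1)
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:k]           # indices of the top-k eigenvalues
    return Xc @ vecs[:, order]

rng = np.random.default_rng(1)
t = rng.normal(size=100)
# 3-D data that really lives on a 1-D line, plus small noise.
X = np.column_stack([t, 2 * t, -t]) + rng.normal(0, 0.01, (100, 3))
Z = pca_reduce(X, 1)        # one feature suffices to describe this data
```

Feature selection, in contrast, keeps a subset of the original attributes rather than forming linear combinations of them.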

A great variety of models have been developed to tackle the learning problems. A well-studied category concerns neural networks, which are non-linear models inspired by the way biological systems process information and learn. The most popular neural architectures are the feedforward neural networks, such as Multilayer Perceptrons (MLPs) and Radial Basis Function (RBF) networks, which have been successfully used in various learning problems. More recently, kernel models have emerged that provide state-of-the-art performance in many supervised and unsupervised learning tasks.

In this chapter we aim to provide a summary of the recent activity of researchers in Greece in the areas of “Machine Learning – Neural Networks”. This summary covers a wide range of research work; however, it is by no means a complete description, since it contains information provided only by the research groups who responded to our invitation and submitted their contribution. The chapter also excludes people who are now active outside Greece and includes the past activity of people who are now in Greece. The chapter organization is group-based and, for each research group, a brief description of important proposed methods is provided along with an indication of the machine learning topic in which each method falls (e.g. supervised learning - neural networks). All references are given at the end of the chapter in alphabetical order. Finally, it should be mentioned that this chapter focuses on general “machine learning – neural network” methods and algorithms and does not cover straightforward applications of such algorithms to specific problem domains.

Computational Intelligence Laboratory (CIL),
Institute of Informatics and Telecommunications
National Centre for Scientific Research “Demokritos”
Contact person: Stavros Perantonis

Supervised Learning, Neural Networks

Novel supervised learning algorithms for feedforward networks have been proposed that incorporate additional knowledge into the learning rule using constrained optimization [Per00a]. This has been shown to lead to the formulation of efficient Constrained Learning Algorithms with accelerated learning properties [Kar95, Per95, Amp02, Amp01, Per00b, Amp99]. The additional knowledge can be encoded in the form of objectives, leading to single- or multi-objective optimization criteria that have to be satisfied simultaneously with the demand for a long-term decrease of the cost function.

In this approach, an optimization problem is formulated at each epoch of the learning process. The additional information incorporated into the algorithm can be either of a general nature (related to the exploitation of certain features of the cost function landscape) or problem-specific (exploiting specific characteristics of the application whose solution is sought by training a feedforward neural network). The family of proposed algorithms comprises both first-order algorithms involving only gradient information [Kar95, Per95] and second-order algorithms taking into account additional Hessian information [Amp02].

These algorithms have been widely cited in the literature and used by other authors in many different applications, such as eye and face detection, gender recognition, dynamics identification and video quality estimation [Amp98]. Another successful application of the constrained learning algorithms that has gained momentum in recent years is numerical polynomial factorization and root finding.

Feature Extraction

In [Pet04] a general framework is proposed for feature generation in pattern recognition problems that takes class information into account. This framework unifies state-of-the-art feature extraction methods, like linear discriminant analysis, heteroscedastic discriminant analysis, maximization of mutual information, as well as neural network based methods [Per99], under a general information theoretic framework, and serves as a springboard for the development of new efficient supervised feature generation algorithms.

Intelligent Systems Laboratory
School of Electrical and Computer Engineering
National Technical University of Athens
Contact Person: Andreas Stafylopatis

Research activities of the Intelligent Systems Laboratory in the area of “Learning - Neural Networks” include the following topics: neural networks, ensemble methods and meta-learning, memory-based learning, clustering, self-organization, feature selection etc.

Ensemble methods

Ensemble Classification: In [Fro03] a multi-net classification method is proposed. The very good performance of the proposed system is mainly due to the combination of supervised and unsupervised learning methods, the ability of the sub-classifiers to solve difficult tasks and, finally, the balance between sub-task simplification and decision-making efficiency. Moreover, inspired by a modular way of reasoning, a subsethood-product fuzzy neural classifier has been developed [Per08], with a novel dynamic architecture involving a main module and a number of submodules. The CART algorithm is employed as a fast preprocessing step for structure identification, which divides the input space into high-certainty and low-certainty regions, each representing a primary fuzzy rule. These primary fuzzy rules use a minimum set of attributes and are mapped onto the main neuro-fuzzy module. The patterns belonging to a low-certainty primary rule are further split into a subset of secondary rules that use an extended set of attributes. Each such rule subset is mapped onto an expert submodule, which is activated only when a pattern falls into the respective low-certainty region. This dynamic resource-allocating model is optimized through a supervised learning procedure.

Ensemble Clustering: Exploiting the ‘ensemble’ idea of constructing a complex model from simple ones, ensemble clustering algorithms have been developed. The key feature of the proposed multi-clustering method [Fro04, Fro05] is the ability to partition a set of data points into the optimal number of clusters, which are not constrained to be hyper-spherical.

Memory-Based Learning

A memory-based learning methodology for classification has been proposed in [Pat07] that relies on the main idea of the k-nearest neighbors algorithm. In the proposed approach, given an unclassified pattern, a set of neighboring patterns is found, not necessarily using all input feature dimensions. In addition, a novel weighting scheme for the memory base is proposed: using the self-organizing map model, dynamic weights of the memory-base patterns are produced during the execution of the algorithm.

Weight Learning in Connectionist Fuzzy Logic Programs

Fuzzy logic programs are a useful framework for imperfect knowledge representation and reasoning using the formalism of logic programming. Nevertheless, there is a need for modeling the adaptation of fuzzy logic programs, so that machine learning techniques can be applied. Weighted fuzzy logic programs bring fuzzy logic programs and connectionist models closer together by associating a significance weight with each atom in the body of a fuzzy rule: by exploiting the existence of the weights, it is possible to construct a connectionist model that reflects the exact structure of a weighted fuzzy logic program [Cho08]. Based on the connectionist representation, the weight adaptation problem is first defined as the task of adapting the weights of the rules of a weighted fuzzy logic program so that they best fit a set of training data, and then a subgradient descent learning algorithm is proposed that obtains an approximate solution to the weight adaptation problem.
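A minimal illustration of the weight-adaptation idea, under simplifying assumptions (a single rule whose body combines atom truth values by a weighted minimum; [Cho08] defines the actual semantics and the full algorithm): because the minimum is non-differentiable, a subgradient rather than a gradient is used.

```python
import numpy as np

def body_value(w, a):
    """Truth value of a rule body: weighted minimum of the atom truth values
    (an assumed t-norm-style combination, for illustration only)."""
    return float(np.min(np.clip(w * a, 0.0, 1.0)))

def subgrad_step(w, a, target, lr=0.1):
    """One subgradient-descent step on the squared error of a single rule."""
    v = body_value(w, a)
    j = int(np.argmin(np.clip(w * a, 0.0, 1.0)))   # atom attaining the minimum
    g = np.zeros_like(w)
    if 0.0 < w[j] * a[j] < 1.0:                    # inside the clip: subgradient is a_j
        g[j] = a[j]
    return np.clip(w - lr * 2.0 * (v - target) * g, 0.0, 1.0)

# Adapt the atom weights so the rule output fits an observed truth value of 0.6.
a = np.array([0.9, 0.7])        # truth values of the body atoms
w = np.array([0.5, 0.5])        # significance weights to be learned
for _ in range(200):
    w = subgrad_step(w, a, target=0.6)
```

Only the weight of the atom currently attaining the minimum receives a nonzero update, which is exactly where the subgradient of the min operator is concentrated.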

Image, Video and Multimedia Systems Lab
School of Electrical and Computer Engineering,
National Technical University of Athens
Contact person: Stefanos Kollias

The research work of this group that is related to learning and neural networks is focused on supervised neural network training, on-line training and hybrid fuzzy-neural network training with applications to semantic multimedia analysis, coding, search and retrieval, human computer interaction and multimodal emotion/affective recognition.

Supervised Learning, Neural Networks

In [Kol88] the Levenberg-Marquardt algorithm has been proposed for supervised training of feedforward neural networks. This method is very effective compared to typical gradient descent and has been successfully used in many applications. It is now in wide use, since it is included in the MATLAB Neural Network Toolbox.
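For illustration, a generic textbook Levenberg-Marquardt loop (not the specific neural-network formulation of [Kol88]) applied to a small nonlinear least-squares fit; the damping parameter interpolates between gradient descent and Gauss-Newton:

```python
import numpy as np

def levenberg_marquardt(residual, p0, steps=50, mu=1e-2):
    """Minimal Levenberg-Marquardt loop with a finite-difference Jacobian."""
    p = np.asarray(p0, dtype=float)
    for _ in range(steps):
        r = residual(p)
        J = np.empty((len(r), len(p)))           # Jacobian of the residuals
        for j in range(len(p)):
            dp = np.zeros_like(p)
            dp[j] = 1e-6
            J[:, j] = (residual(p + dp) - r) / 1e-6
        # Damped Gauss-Newton step: (J^T J + mu I) delta = -J^T r.
        delta = np.linalg.solve(J.T @ J + mu * np.eye(len(p)), -J.T @ r)
        if np.sum(residual(p + delta) ** 2) < np.sum(r ** 2):
            p, mu = p + delta, mu * 0.5          # accept: trust the model more
        else:
            mu *= 2.0                            # reject: increase the damping
    return p

# Recover the parameters of y = a * exp(b * x) from noiseless samples.
x = np.linspace(0.0, 1.0, 30)
y = 2.0 * np.exp(1.5 * x)
p = levenberg_marquardt(lambda q: q[0] * np.exp(q[1] * x) - y, [1.0, 1.0])
```

In network training the residual vector consists of the per-example output errors and the parameters are the weights, which is why the method suits small and medium-sized feedforward networks.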

In [Del94] third-order neural networks are proposed for affine invariant recognition of images, based on third-order signal (image) statistics (cumulants). Also, in [Kol96] multiresolution neural networks are proposed for image recognition, based on wavelet decomposition and training of the different scales through a constructive approach.

Context-adaptable supervised neural networks are presented in [Dou00] along with on-line training algorithms that have been successfully used in image analysis and other applications.

Adaptive neurofuzzy models and learning algorithms for semantic multimedia analysis are presented in [Sta05, Ath07, Sto06] while in [Ioa05, Car08] such hybrid models are trained for emotion recognition.

Blind Signal Processing and Machine Learning group
Department of Informatics
T.E.I. of Thessaloniki
Contact person: Konstantinos Diamantaras


The Blind Signal Processing and Machine Learning group at the Technological Education Institute of Thessaloniki is active in research in the area of machine learning methods and algorithms, with applications in blind signal processing, image/video content-based retrieval, text categorization, etc. Prof. K. Diamantaras was the chairman of the 2007 International Machine Learning for Signal Processing Workshop, held in Thessaloniki, Greece, in August 2007. He is also the author of the book Artificial Neural Networks (in Greek) [Dia07a].
Blind Signal Processing
The term blind in blind signal processing (BSP) refers to problems involving a system where neither the inputs nor the transfer function are available, yet we seek a solution using only the output information. There are many facets of blind signal processing, depending on whether we aim at reconstructing the sources (blind source separation - BSS) or at estimating the system (blind identification - BI). Novel methods for both BSS and BI problems have been developed (a) using second order statistics [Dia08b][Dia07b][Dia06c][Dia06e][Dia06f] or (b) using geometric characteristics of the data cloud [Kof07][Dia06a][Dia06b][Dia06d][Pap06][Dia05c][Dia04a][Dia04b]. The blind separation of images from their mixed reflections has also been treated in [Dia05a].
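To give a flavor of second-order-statistics BSS (a classical AMUSE-style procedure, not the specific methods cited above), sources with distinct temporal structure can be separated by whitening the mixtures and then diagonalizing a time-lagged covariance matrix:

```python
import numpy as np

def amuse(X, lag=1):
    """Second-order BSS in the style of the classical AMUSE procedure."""
    X = X - X.mean(axis=1, keepdims=True)
    # Whiten the mixtures using the zero-lag covariance matrix.
    d, E = np.linalg.eigh(np.cov(X))
    Z = (E @ np.diag(d ** -0.5) @ E.T) @ X
    # Diagonalize a symmetrized time-lagged covariance of the whitened data.
    C = Z[:, lag:] @ Z[:, :-lag].T / (Z.shape[1] - lag)
    _, U = np.linalg.eigh((C + C.T) / 2.0)
    return U.T @ Z          # estimated sources, up to order, sign and scale

# Two sources with distinct temporal structure, mixed by an unknown matrix.
t = np.arange(2000)
S = np.vstack([np.sin(0.05 * t), np.sign(np.sin(0.017 * t))])
A = np.array([[1.0, 0.6], [0.4, 1.0]])      # mixing matrix (unknown to amuse)
Y = amuse(A @ S)
```

The method succeeds only when the sources have distinct autocorrelations at the chosen lag, which is the standard identifiability condition for second-order BSS.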

The parallel implementation of neural and/or batch BSP methods has recently been investigated as well. In [Marg07] the parallel implementation of the natural gradient method for blind source separation (BSS) using the MPI library on a cluster is described. Preliminary results show that this line of research may be promising in the future, especially for batch methods.
Unsupervised Learning, Feature extraction
One of the major lines of research for the group is the application of Principal Component Analysis (PCA) neural models, or models implementing extensions of PCA such as Oriented PCA (OPCA), in problems involving blind signal reconstruction or identification. It is shown that these second order unsupervised methods can be used effectively for solving the blind source separation problem [Dia08a].
Supervised learning, Kernel methods
Kernel methods and SVMs have been used in the content-based video retrieval (CBVR) problem [Zam08][Zam07] with promising results. The main advantages of these techniques are their speed and their capability of handling the large pattern dimensions essential for this sort of application. A novel and fast supervised classification method is proposed in [Dia05b] based on the description of the solution in weight space. This approach is compared against SVMs with various kernels on a particular but very important text classification problem called Named Entity Recognition [Mic06][Mic05]. Named entities (NE) are words or phrases corresponding to names (of people or locations), titles (of movies, songs, books), etc., and they are very important for indexing large databases, since most internet queries involve named entities.


National Observatory of Athens
Institute of Space Applications and Remote Sensing
Contact person: Konstantinos Koutroumbas

Neural Networks, Hamming MaxNet
A common problem in several applications (e.g. data mining) is the selection of the maximum among the k (positive) numbers of a set S. Among the best known recurrent techniques for this problem is the so-called Hamming MaxNet (HMN), a fully connected recurrent neural network consisting of k nodes, each corresponding to a number of the set. Asynchronous modes of operation of HMN, as well as a fully detailed analysis of its convergence, are provided in [Kou94], where it is proved that after a finite number of iterations the nodes of the HMN initialized with a less-than-maximum value stabilize to 0. In addition, (a) if S contains a single maximum, the corresponding node stabilizes to a positive number after a finite number of iterations, while (b) if S contains more than one maximum, the corresponding nodes tend asymptotically to 0. In [Kou05a] a generalized version of HMN, called GHMN, is introduced and a fully detailed convergence analysis of GHMN is also provided.
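For intuition, the classical synchronous MaxNet update (a simplified relative of the asynchronous schemes analyzed in [Kou94]) can be sketched as follows; each node is inhibited by the sum of the activations of all the other nodes:

```python
import numpy as np

def maxnet(values, eps=None, steps=500):
    """Classical synchronous MaxNet: mutual inhibition until one node survives."""
    x = np.asarray(values, dtype=float).copy()
    eps = eps if eps is not None else 1.0 / len(x)    # needs eps < 1/(k-1)
    for _ in range(steps):
        x = np.maximum(0.0, x - eps * (x.sum() - x))  # inhibition by the others
        if np.count_nonzero(x) <= 1:
            break
    return x

x = maxnet([0.3, 0.9, 0.5, 0.7])   # the winner is the node holding 0.9
```

With a unique maximum the winner's activation stabilizes at a positive value after finitely many steps; with tied maxima the tied nodes decay toward zero only asymptotically, in line with the behavior described above.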
In [Kou04] two other recurrent methods for the identification of the maximum of S are discussed. The aim of the first one, called TAA (Threshold Adjustment Algorithm), is the determination of a threshold T that is less than the maximum and greater than all the remaining values of S. T is determined via an iterative scheme, after a finite number of iterations. In the second method, called SIA (Self Inhibitive Algorithm), a (positive) constant threshold T less than the maximum value of S is defined. It is proved that, after a finite number of iterations, all sequences stabilize to 0 except those that correspond to the maxima of S, which converge to a positive value. Neural network implementations are provided for both TAA and SIA. In [Kou05b] a method is presented, suitable for cases where the domain of the elements of S is discrete, that determines a threshold T greater than all elements of S except the maxima.


Intelligent Systems & Robotics Laboratory
Department of Production Engineering & Management
Technical University of Crete
Contact person: Nikos Vlassis

Unsupervised learning
Dirichlet process mixture models: Dirichlet Process (DP) mixture models are promising candidates for clustering applications where the number of clusters is not known a priori. A class of deterministic accelerated DP mixture models has been proposed that can routinely handle millions of data-cases. The speedup is achieved by incorporating kd-trees into a variational Bayesian algorithm for DP mixtures in the stick-breaking representation [Kur07].
Gossip-based (distributed) EM: The EM algorithm for Gaussian mixtures can be implemented in a decentralized way (assuming that the data are distributed over a number of processor units) as follows: in the M-step, random pairs of units repeatedly exchange their local parameter estimates and combine them by (weighted) averaging. Theoretical and experimental evidence is provided that, under such a gossip-based protocol, nodes converge exponentially fast to the correct estimates in each M-step of the EM algorithm [Kow05].
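The gossip protocol at the heart of this scheme can be sketched in a few lines (a generic pairwise-averaging sketch, not the full distributed EM of [Kow05]): the average of the local estimates is preserved exactly at every exchange, while their spread shrinks exponentially.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
estimates = rng.normal(10.0, 5.0, n)   # each node's local parameter estimate
target = estimates.mean()              # the centralized ("correct") value

# Gossip: random pairs repeatedly replace their estimates by the pair average.
# Every exchange preserves the global average while shrinking the spread.
for _ in range(2000):
    i, j = rng.choice(n, size=2, replace=False)
    estimates[i] = estimates[j] = (estimates[i] + estimates[j]) / 2.0
```

In the distributed EM setting the scalar here would be replaced by each node's local sufficient statistics, averaged in the same pairwise fashion during the M-step.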
Accelerated mixture learning: Standard mixture learning algorithms like EM and k-means are slow for large datasets. For k-means there exists an accelerated version that uses a kd-tree and is exact. A similar approximate technique exists for EM but with no convergence guarantees. A variational approximation to the EM algorithm for Gaussian mixtures has been proposed which results in a provably convergent scheme with speedups that are at least linear with the sample size [Ver06].

Greedy mixture learning: Greedy mixture learning is the fitting of the parameters of a mixture probability density function by successive component allocation. In [Vla02, Ver03] greedy versions of EM are proposed that yield good solutions and require no parameter initialization.
Reinforcement learning
Bayesian Reinforcement Learning: A Bayesian RL algorithm (“BEETLE”) has been proposed for effective online learning, which is computationally efficient while minimizing the amount of exploration. A Bayesian model-based approach has been followed, framing RL as a partially observable Markov decision process. The two main contributions are the analytical derivation that the optimal value function is the upper envelope of a set of multivariate polynomials, and an efficient point-based value iteration algorithm that exploits this simple parametrization [Pou06].
Approximate value iteration in POMDPs: Partially observable Markov decision processes (POMDPs) provide a rich mathematical framework for agent decision making under uncertainty, however, solving a POMDP exactly is an intractable problem. The Perseus algorithm has been developed that is a randomized version of approximate point-based value iteration and provides good tradeoffs in terms of computation time and solution quality [Spa05, Por06].

Multiagent Reinforcement Learning: The problem of coordination and learning in a large group of agents can be simplified by decomposing the global payoff function into a sum of local terms, also known as a coordination graph (CG). A novel algorithm is presented in [Kok06] for collaborative multiagent reinforcement learning on a CG that is analogous to belief propagation in Bayesian networks.


Signal Processing and Pattern Recognition group
Department of Informatics and Telecommunications
University of Athens

Contact person: Sergios Theodoridis

The Signal Processing and Pattern Recognition group is heavily involved in research on the theory of kernel methods and hidden Markov modeling, and on applications with an emphasis on audio characterization/retrieval and medical image analysis/retrieval. The group has received four best paper awards for publications in this area, including the 2009 Outstanding Paper Award of the IEEE Transactions on Neural Networks [Mav06].
Supervised Learning, kernel methods

A novel online time-adaptive algorithm for classification in Reproducing Kernel Hilbert Spaces (RKHS) is proposed in [Sla08] by exploiting projection-based adaptive filtering tools. The paper brings powerful convex analytic and set theoretic estimation arguments into machine learning by revisiting standard kernel-based classification as the problem of finding a point that belongs to a closed halfspace (a special closed convex set) in an RKHS. The proposed algorithmic solution generalizes a number of well-known projection-based adaptive filtering algorithms, such as the classical Normalized Least Mean Squares (NLMS) algorithm and the Affine Projection Algorithm (APA). Under mild conditions, the generated sequence of estimates enjoys monotone approximation, strong convergence, asymptotic optimality, and a characterization of the limit point. Furthermore, it is shown that an additional convex constraint on the norm of the classifier naturally leads to an online sparsification, of linear complexity, of the resulting kernel series expansion.
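For reference, the classical NLMS recursion that the proposed RKHS algorithm generalizes can be sketched in its linear form (a textbook sketch, not the algorithm of [Sla08]); each update is a relaxed projection of the current estimate onto the hyperplane defined by the newest sample:

```python
import numpy as np

def nlms(X, d, mu=0.5, eps=1e-8):
    """Classical NLMS: each update is a relaxed projection of the current
    estimate onto the hyperplane defined by the newest (x, y) sample."""
    w = np.zeros(X.shape[1])
    for x, y in zip(X, d):
        e = y - w @ x                          # a-priori error on the new sample
        w = w + mu * e * x / (x @ x + eps)     # normalized gradient correction
    return w

rng = np.random.default_rng(3)
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(500, 3))
d = X @ w_true                                 # noiseless desired responses
w = nlms(X, d)
```

The kernel-based generalization replaces the inner products with kernel evaluations and the hyperplanes with halfspaces in the RKHS, while the projection-based structure of the update survives intact.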
The geometric framework for the support vector machine (SVM) classification problem provides an intuitive ground for the understanding and the application of geometric optimization algorithms, leading to practical solutions of real-world classification problems. In [Mav06], the notion of the “reduced convex hull” is employed and supported by a set of new theoretical results. These results allow existing geometric algorithms to be directly and practically applied to solve not only separable but also non-separable classification problems, both accurately and efficiently. As a practical application of the new theoretical results, known geometric algorithms have been transformed accordingly to solve non-separable problems successfully.

Hidden Markov Models
A novel extension of the variable duration Hidden Markov Model is presented in [Pik06] that is capable of classifying musical patterns, extracted from raw audio data, into a set of predefined classes. The key novelty is that the HMM structure is modified so that a “context”-sensitive cost is embedded for each node, which offers robustness to the errors usually encountered in a music recognition task. Although the method is developed in the framework of music recognition, it is of a more generic nature and can be used in tasks that are similar from a modeling point of view, such as bioinformatics.


Information Processing and Analysis Group
Department of Computer Science

University of Ioannina
Contact persons: Aristidis Likas, Konstantinos Blekas


Supervised Learning

Classification: Development of training algorithms for classification mixture models with shared kernels [Tit01, Tit02, Tit03]. This model, called the Probabilistic RBF (PRBF) network, constitutes an extension of the traditional statistical approach to classification, in the sense that it allows the sharing of Gaussian components among classes. Several EM-like training algorithms have been developed that can be used to adjust not only the kernel parameters, but also the degree of sharing of the Gaussian components. Also, in [Con06b] a deterministic incremental algorithm is proposed for PRBF training that appears to be competitive with SVM classifiers.

Unsupervised learning

Bayesian learning of Gaussian mixture models: A Bayesian approach has been proposed for unsupervised training of Gaussian mixture models [Con07]. This incremental algorithm is based on the principles of variational inference and provides a solution to the difficult problem of estimating the number of Gaussian components.
Global k-means clustering: The global k-means algorithm has been proposed for finding near optimal clustering solutions [Lik03]. The global k-means is an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure and provides solutions that are very close to optimal. The method is deterministic, does not depend on initial values and does not contain any empirically adjustable parameter. An extension of the method (called global kernel k-means) has also been proposed to solve clustering problems in the kernel space [Tzo08].
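A minimal sketch of the incremental scheme (illustrative only; this is the exhaustive variant, without the fast approximations also discussed in [Lik03]): each new center is tried at every data point, and the best resulting k-means solution is kept.

```python
import numpy as np

def kmeans(X, centers, iters=20):
    """Plain k-means refinement from the given initial centers."""
    for _ in range(iters):
        assign = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[assign == c].mean(axis=0) if (assign == c).any()
                            else centers[c] for c in range(len(centers))])
    cost = ((X - centers[assign]) ** 2).sum()
    return centers, cost

def global_kmeans(X, K):
    """Incremental global k-means: grow the solution one center at a time,
    trying every data point as the initial position of the new center."""
    centers = X.mean(axis=0, keepdims=True)      # the optimal 1-means solution
    for _ in range(2, K + 1):
        best = None
        for xi in X:                             # deterministic global search
            cand, cost = kmeans(X, np.vstack([centers, xi]))
            if best is None or cost < best[1]:
                best = (cand, cost)
        centers = best[0]
    return centers

# Three well-separated Gaussian blobs; no random restarts are needed.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(m, 0.2, (30, 2)) for m in (0.0, 4.0, 8.0)])
centers = global_kmeans(X, 3)
```

The exhaustive search runs k-means once per data point at each stage, which is the price paid for removing all dependence on random initialization.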

Newtonian Clustering: The Newtonian Clustering approach has been proposed for estimating the number of clusters in a dataset [Ble07a]. A dynamical procedure is applied to the data points in order to shrink and separate possibly overlapping clusters. Newton's equations of motion are employed to concentrate the data points around their cluster centers, using an attractive potential. During this process, important information is gathered concerning the spread of each cluster. Global optimization is then used to retrieve the positions of the maxima that correspond to the locations of the cluster centers. Further refinement is achieved by applying the EM algorithm to a suitably initialized Gaussian mixture model.
Split-Merge mixture learning: A new incremental method, SMILE, is proposed for learning a mixture model based on successive split and merge operations [Ble07b]. The method starts with a two-component mixture and performs split-merge steps to improve the current solution by maximizing the log-likelihood. SMILE aims at eliminating a significant drawback of the EM algorithm, namely its strong dependence on the initialization of the model parameters.
Greedy multinomial mixture modeling: A new greedy algorithm is proposed for discovering motifs in categorical sequences, based on learning a mixture-of-motifs model through likelihood maximization [Ble03]. The approach sequentially adds a new motif to the mixture model by performing a combined scheme of global and local search for appropriately initializing its parameters. In addition, a hierarchical clustering procedure is proposed based on kd-trees, which results in partitioning the large dataset (containing all possible substrings) into a remarkably smaller number of candidate motif models used for global searching.

MAP-MRF mixture modeling: A spatially constrained Gaussian mixture model is proposed for computer vision problems [Ble05, Ble08]. Using a Markov Random Field (MRF), a biased prior term over pixels is introduced that captures spatial location information. An inherent difficulty with this formulation is that the M-step of the EM algorithm cannot be implemented straightforwardly using closed-form expressions. A new methodology is proposed based on an efficient projection method. The approach has been successfully applied to image segmentation [Ble05] and to clustering spatiotemporal data [Ble08].

Feature Selection

Unsupervised feature selection in mixture models: A Bayesian approach has been proposed for determining irrelevant features in unsupervised training of Gaussian mixture models [Con06a]. Learning is based on variational inference and the method is also capable of determining the number of mixture components.


Computational Intelligence Laboratory (CILab)

Department of Mathematics, University of Patras
Contact person: Michael N. Vrahatis

Supervised Learning, Neural Networks

A mathematical framework for the convergence analysis of the well-known Quickprop method has been proposed, together with a modification of this method that exhibits improved convergence speed and stability while alleviating the need for heuristic learning parameters [Vra00a]. Furthermore, nonmonotone methods for feedforward Artificial Neural Network training have been presented that allow the value of the error function to increase for some iterations, as long as an Armijo-type criterion is satisfied with respect to the maximum error-function value of a number of previous steps [Pla02a]. Also, in [Mag99] a novel generalized theoretical result is presented that underpins the development of globally convergent first-order batch training algorithms employing local learning rates. This result makes it possible to equip any training algorithm with a robust strategy for adapting the search direction to a descent one, accelerating and securing the convergence of the algorithm [Mag01a, Pla01]. Furthermore, new globally convergent training schemes based on the resilient propagation algorithm, as well as novel sign-based learning algorithms, have been proposed [Ana06, Ana05a, Ana05b, Mag06a]. In addition, a class of training algorithms with adaptive stepsize, together with its convergence theory, has been developed [Vra00b, Vra03]. Also, new probabilistic neural networks have been proposed in [Geo06].

As a next step, new algorithms have been developed for on-line learning with adaptive step size, which is an important issue in real-life applications [Mag01, Mag04, Mag01b, Mag99, Mag97]. Efficient mathematical techniques, named Stretching and Deflection, were employed to alleviate local minima in ANN training [Par01]. The geometry of the error surface has been studied in an attempt to visualize and provide qualitative measures for the computational cost, convergence speed and sensitivity of learning methods [And97].
Training neural networks equipped with threshold activation functions and integer weight values has also been proposed. Such networks are better suited for hardware implementation than those with real-valued weights. New Evolutionary Algorithms were applied to achieve this by confining the search to a narrow band of integers. These algorithms have been designed keeping in mind that the resulting integer weights require fewer bits to be stored and that the digital arithmetic operations between them are easier to implement in hardware. Another advantage of the proposed algorithms is that they are capable of continuing the training process “on-chip”, if needed [Pla06b, Pla02b]. In addition, distributed neural network learning methods have been developed in order to speed up the learning process [Pla06a].

Also, novel stochastic training methods have been proposed for Spiking Neural Networks [Pav05], as well as for training Fuzzy Cognitive Maps [Pap05]. In the latter case the learning algorithms are based on the principles of swarm intelligence.

Artificial Nonmonotonic Neural Networks

Hybrid learning systems capable of nonmonotonic reasoning, named Artificial Nonmonotonic Neural Networks (ANNNs), have been introduced in [Bou01]. ANNNs are hybrid systems that use inheritance networks, as a nonmonotonic multiple-inheritance knowledge representation scheme for the domain knowledge, and Artificial Neural Networks as a learning mechanism. The latter are supported by a training method that fits this approach well and is applied by changing selected weights at each epoch.

Unsupervised Learning

Clustering: The k-windows algorithm, introduced in [Vra02], combines notions from computational geometry and statistical density estimation and manages to produce high-quality clusterings in a time-efficient manner. Additionally, it avoids the need for prior knowledge of the number of clusters, which is a central issue in cluster analysis. The potential of k-windows has been investigated on dynamically changing data sets, and a theoretical framework has been proposed [Tas05b]. Also, k-windows can be easily parallelized and applied to distributed databases, as well as to distributed computational systems [Ale04, Tas04]. A first step towards the development of unsupervised clustering algorithms capable of identifying clusters within clusters is taken by incorporating information about the fractal dimension into the k-windows algorithm [Tas06b]. Furthermore, in order to enhance the performance of evolutionary clustering algorithms, the laboratory has developed a new density function, named the Window Density Function (WDF) [Tas05a].
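The window-movement intuition behind k-windows can be sketched as follows. The initialization, the fixed enlargement factor and all parameter names below are illustrative assumptions, not the published movement and enlargement rules of [Vra02]:

```python
import numpy as np

def k_windows_sketch(X, k, init_size=1.0, enlarge=1.2, n_iter=10):
    """Toy sketch of the k-windows idea: axis-aligned windows are repeatedly
    moved to the mean of the points they capture and then enlarged. This is
    an illustrative simplification, not the published algorithm."""
    idx = np.linspace(0, len(X) - 1, k).astype(int)
    centers = X[idx].astype(float)
    half = np.full((k, X.shape[1]), init_size / 2.0)
    for _ in range(n_iter):
        for j in range(k):
            inside = np.all(np.abs(X - centers[j]) <= half[j], axis=1)
            if inside.any():
                centers[j] = X[inside].mean(axis=0)   # movement step
            half[j] *= enlarge                        # enlargement step
    # label each point by its nearest window centre
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

# two compact groups of points; one window should settle on each
X = np.vstack([np.zeros((50, 2)), np.full((50, 2), 10.0)])
labels = k_windows_sketch(X, k=2)
```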

Feature Space Manipulation

In [Tas06a] an approach has been proposed that is capable of automatically identifying local regions from which specific features can be derived, based on a local PCA technique. Thus the search space is reduced and the descriptive potential of the derived data representation is maximized.


Management and Decision Engineering Laboratory (MDE-Lab)
Department of Financial and Management Engineering
University of the Aegean

Contact person: Georgios Dounias

Supervised Learning

A prototype methodology for the construction of Evolutionary Neural Logic Networks (ENLN) through indirect encoding and genetic programming principles is presented in [Tsa04a], [Tsa04b]. Specifically, an evolutionary system is proposed that uses genetic programming advances and produces neural logic network structures that can be arbitrarily connected and are easily interpretable into expert rules. The genetic programming process is guided by a context-free grammar, and the neural logic networks are encoded indirectly into the genetic programming individuals. The ENLN methodology is the first of its kind in the literature. Several successful applications of the methodology are presented, from areas as diverse as engineering, medicine, management and finance.

Novel implementations of grammar-guided genetic programming methodologies for data classification and rule induction are presented in [Tsa06a]. The work demonstrates how efficient neural logic networks can be generated with the aid of genetic programming methods trained adaptively through an innovative scheme. The proposed adaptive training scheme of the genetic programming mechanism leads to the generation of high-diversity solutions and small-sized individuals. The overall methodology is advantageous in offering both accurate and interpretable results in the form of expert rules. Moreover, a sensitivity analysis study is provided within the article, comparing the performance of the proposed evolutionary neural logic networks methodology with well-known competitive inductive machine learning approaches. Indicative credit management applications are also demonstrated in the paper.
The combination of fuzzy rule-based systems and genetic programming for the production of generalized decision rules is proposed in [Tsa04], [Tsa06b]. In particular, [Tsa06b] demonstrates the efficient use of hybrid intelligent systems for solving the classification problem of splice-junction gene sequences. The aim of this work is to obtain classification schemes able to recognize, given a sequence of DNA, the boundaries between exons and introns. Indicative classification results are presented and discussed in detail in terms of both classification accuracy and solution interpretability.
In [Tho06], [Tho07] innovative approaches for the design of semiparametric financial forecasting (NN-GARCH) models are proposed that combine intelligent learning techniques based on neural networks with statistical-econometric GARCH models of volatility. The proposed approaches can accommodate most of the stylized facts reported about financial prices or rates of return, such as nonlinear correlations, asymmetric GARCH effects and non-Gaussian errors. By jointly modeling the conditional mean and volatility of the data-generating process, the scope of NNs is extended from function approximation to density forecasting tasks, and the construction of neural network models under the special statistical features of financial and economic data is also reconsidered.

Feature Selection

In [Mari07], [Mari08], data classification based on the solution of the Feature Subset Selection problem is presented, using a hybrid scheme that combines nearest neighbor search with a variety of intelligent and nature-inspired techniques, such as genetic algorithms, ant colony optimization, swarm intelligence and honey-bee optimization. The proposed methodologies are primarily applied to the well-known medical problem of pap-smear cell image classification.
Feature selection in large databases and complex data sets using Inductive Machine Learning is attempted in [Dou01]. The work faces the complex problem of modeling frequency-analysis outcomes from vibration tests of household appliances. The aim is to find critical frequencies that identify possible faults or operation failures of the devices under test. Inductive decision trees are used for the feature selection process, in the sense that the higher a feature is placed in the decision tree hierarchy (i.e., the closer to the root of the tree), the more appropriate it appears for selection as a target feature.


Artificial Intelligence Group
Wire Communications Laboratory
Electrical & Computer Engineering Department, University of Patras

Contact persons:
Nikos Fakotakis
Kyriakos Sgarbas

Learning from sequential and structured data

Recurrent Neural Networks [Gan08, Gan07, Gan04, Gan03].
In [Gan03, Gan04, Gan07, Gan08] the authors study several hybrid PNN-RNN (Probabilistic Neural Network – Recurrent Neural Network) architectures, which combine the positive characteristics of feed-forward and recurrent neural networks, as well as the advantages of the generative and discriminative classification approaches. These hybrid architectures are sensitive to correlation among subsequent inputs and are thus capable of learning from sequential and structured data. In the experimental validation, this functionality is exploited to capture the inter-frame correlations among the feature vectors computed for successive speech frames, which leads to improved modelling and thus to improved overall classification performance. Specifically, in [Gan04] the authors update the locally recurrent PNN (LR PNN) architecture, introduced in earlier work [Gan03], as well as its training. Subsequently, in [Gan07] a generalization of the locally recurrent global-feedforward PNN architecture, referred to as GLR PNN, is studied. It is obtained by adding time-lagged values of the inputs to the recurrent-layer linkage of the LR PNN. The fully connected recurrent layer of the GLR PNN enables it to learn from the training data in a more efficient manner. In a recent work [Gan08] the partially connected LR PNN architecture was introduced. It allows the recurrent-layer linkage to be optimized with respect to the underlying structure of the training data, resulting in local neighborhoods of interconnected recurrent neurons in the recurrent layer. This capability offers an additional degree of freedom and enhanced flexibility in building precise models from the training data. It also facilitates the training process and reduces the computational demands compared to the fully connected LR PNN.

Plan and strategy learning [Sga95, Mar04a, Mar07].

Evolution of evaluation functions for strategy games. In [Sga95] the use of genetic algorithms is demonstrated in game-playing instead of the heuristic evaluation functions that are commonly used in that field. The paper shows how a genetic algorithm learns by playing the game by itself and how it develops an acceptable strategy without deep searching in the problem state-space.

A statistical framework for extracting medical domain knowledge from heterogeneous corpora. In [Mar04a] a statistical data mining framework is used in conjunction with a natural language understanding agent to assist doctors in diagnosing several cases of pneumonia. The presented system automatically encodes a semantic representation of a user's query using Bayesian networks. The evaluation of the platform was performed against the existing natural language understanding module of DIKTIS, the architecture of which is based on manually embedded domain knowledge.

Domain Knowledge Acquisition

In [Mar07], the issue of acquiring domain knowledge from dialogue data is addressed by applying a novel Bayesian network framework to a meteorological application. In order to extract the domain concepts, annotation is performed manually; building on that phase, Bayesian networks use this knowledge to create semantic relationships among domain concepts from a limited set of training examples. The dialogue engine exploits the extracted Bayesian networks as an inference engine to identify the user plan and further support mixed-initiative interaction with the user. In order to improve recognition accuracy, the paper introduces a novel approach that uses semantic similarity estimators in an attempt to identify keywords that are not present in the training data set. The framework can be incorporated into other dialogue applications with minimum effort.

Feature Selection

In [Mar04b, Mar06, Mar08] an algorithm is presented that augments a given set (training or test) with extra data that increases the classification performance of a supervised learning algorithm. The algorithm aims to go beyond improvement methods based on second-order statistics, such as the covariance, and to allow methods based on the distance between examples to exploit the mutual relation or dependency between the features. Every data set contains different values for each feature; examining each value separately and independently yields no particular knowledge about its relation to the others (if any exists). When a Bayesian network is trained from such a data set, knowledge concerning the distribution of the features and the relations that exist between them is encoded inside it. The goal is to channel this knowledge to the output in the form of probabilities, which represent the belief of the data itself about the worthiness of each value separately.


DB-NET group
Dept of Informatics, Athens University of Economics and Business (AUEB)
Contact person: Michalis Vazirgiannis

The DB-NET group in AUEB has been working on topics related to clustering and unsupervised learning and more specifically on dimensionality reduction for unsupervised learning as well as on constraint based semi-supervised learning.

Dimensionality Reduction
Unsupervised Feature Selection for Spectral Methods: In [Mav07] a feature selection approach for spectral methods is proposed. It is shown that the property of stability (i.e., low-variance solutions) can be achieved by removing the features that contribute most to the instability of the output. This is based on the fact that several spectral clustering methods can easily employ a feature-feature matrix instead of an instance-instance matrix, thus relating the properties of the solution to the properties of the feature-feature matrix. The method is almost directly applicable to several spectral approaches, such as Principal Components Analysis (PCA), Latent Semantic Indexing (LSI), Spectral Clustering, Spectral Ordering, as well as the spectral solution of the K-Means algorithm. Bootstrapping and Matrix Perturbation Theory are employed both for defining the algorithms (which are based on greedy search) and for guaranteeing the stability property of the output.
Distributed Dimensionality Reduction for Clustering: Distributed Knowledge Discovery (DKD) has emerged as one of the most challenging tasks in large-scale distributed data management. Distributed dimensionality reduction (DDR) algorithms are then necessary to decrease the representation costs and to reveal potentially interesting or hidden structure in the data. In [Mag06b], a DDR algorithm called K-Landmarks is presented, aiming to retain clustering quality in the projected space. K-Landmarks first selects an aggregator node that picks k points (henceforth called landmark points) from the whole dataset of cardinality d, and projects them from R^n to R^k based on FastMap. The projections of the remaining d − k points are computed by requiring the preservation of distances, meaning that each projected point must preserve its distances to all landmark points in both the original and the projection space. [Mag06b] also presents the formal description of the algorithm, its geometric interpretation, a proof of convergence and extensive clustering experiments on various UCI datasets.
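The distance-preservation step can be sketched as a small least-squares problem. The multilateration formulation below is a standard device used here for illustration; the published K-Landmarks update may differ in detail:

```python
import numpy as np

def place_point(landmarks_proj, dists):
    """Place a point y in the projection space so that its distances to the
    already projected landmarks match its original-space distances `dists`,
    in the spirit of K-Landmarks [Mag06b]. Subtracting the first
    squared-distance equation from the rest yields a linear least-squares
    system (an illustrative multilateration device)."""
    L = np.asarray(landmarks_proj, dtype=float)
    d = np.asarray(dists, dtype=float)
    A = 2.0 * (L[1:] - L[0])
    b = (d[0] ** 2 - d[1:] ** 2) + (L[1:] ** 2).sum(axis=1) - (L[0] ** 2).sum()
    y, *_ = np.linalg.lstsq(A, b, rcond=None)
    return y

# recover a point from its exact distances to three landmarks in the plane
landmarks = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
target = np.array([0.25, 0.5])
dists = np.linalg.norm(landmarks - target, axis=1)
y = place_point(landmarks, dists)
```

When the distances are exactly realizable the system recovers the point; otherwise the least-squares solution gives the best-fitting placement.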
Semi-supervised learning
In [Hal08] a semi-supervised framework is presented for learning the weighted Euclidean subspace where the best clustering can be achieved. The approach capitalizes on: (i) user constraints; and (ii) the quality of intermediate clustering results in terms of their structural properties. The proposed framework uses the clustering algorithm and the validity measure as its parameters. Novel algorithms are presented for learning and tuning the weights of contributing dimensions and defining the “best” clustering obtained by satisfying user constraints.


Machine Learning - Data Mining & Knowledge Discovery (ML-DM/KD) /
Biomedical Informatics Laboratory (BMI) at FORTH-ICS
Institute of Computer Science (ICS),
Foundation of Research & Technology – Hellas (FORTH)

Computer Science Department, University of Crete;
Department of Production Engineering and Management, Technical University of Crete
Contact persons:
George Potamias, Vassilis Moustakis

Ioannis Tsamardinos


The R&D activities and efforts of FORTH-ICS’s ML-DM/KD group aim towards the utilization of intelligent data analysis and knowledge discovery in various disciplines, focusing on: the design and development of novel and prototypical ML-DM/KDD methods, techniques, algorithms, tools and systems, as well as the design and development of principled theories for inducing causal models and relations from observational data.

Supervised Learning

In [Mou98] research is reported on the assessment of Boolean minimization in symbolic empirical learning. Training examples are viewed as logical expressions, and Boolean Minimization (BM) heuristics are implemented to optimize the input and to learn symbolic knowledge rules. The work on BM learning is based on a system called BML, which includes three components: a preprocessing, a BM, and a post-processing component. The system incorporates Espresso-II, a popular system in very-large-scale integration design. The preprocessing and post-processing components include utilities that support the preparation of training examples on the one hand and the assessment of the learned output on the other. BML is tested on 10 different domains and its performance is compared with C4.5, AQ15, NewId, and CN2 using classification accuracy and rule quality statistics.
In [Pot99] a novel concept learning algorithm is presented, named MICSL: Multiple Iterative Constraint Satisfaction based Learning. The algorithm utilizes mathematical programming and constraint satisfaction techniques towards the uniform representation and management of both data and background knowledge, offering a flexible learning framework and respective services. The representation flexibility of MICSL rests on a method that transforms propositional cases, represented as propositional clauses, into constraint equivalents. The theoretical background as well as the validity of the transformation process are analyzed and studied. Following a ‘general-to-specific’ generalization strategy, the algorithm iterates on multiple calls of a constraint satisfaction process. The outcome is a consistent set of rules, each of which composes a minimal model of the given set of cases. Theoretical results relating the solutions of a constraint satisfaction process to the minimal models of a set of cases are stated and proved. The performance of the algorithm on some real-world benchmark domains is assessed and compared with widely used machine learning systems, such as C4.5 and CN2. Issues related to the algorithm’s complexity are also raised and discussed.
In [Pot04c] a novel graph-theoretical approach and its application on patterning brain developmental (time-series) events are reported. The biosynthetic activity of an individual brain nucleus is represented as a time-series object, and clustering of time-series contributes to the problem of inducing indicative patterns of brain developmental events and forming respective PS chronological maps. Clustering analysis of protein-synthesis (PS) chronological maps, in comparison with epigenetic influences of alpha2 adrenoceptors treatment, reveals relationships between distantly located brain structures. Clustering is performed with a novel graph theoretic clustering approach (GTC). The approach is based on the weighted graph arrangement of the input objects and the iterative partitioning of the corresponding minimum spanning tree. The final result is a hierarchical clustering-tree organization of the input objects. Application of GTC on the PS patterns in developing brain revealed five main clusters that correspond to respective brain development indicative profiles. The induced profiles confirm experimental findings, and provide evidence for further experimental studies.
Association Rule Mining

In [Pot04b] a seamless clinical-data integration and intelligent processing environment, the HealthObs system, is presented. HealthObs, and in particular its knowledge discovery component, aims towards the discovery of interesting associations from distributed and heterogeneous clinical information systems, and contributes to the semantic integration of patient health data stored across multiple sources. The system incorporates Association Rule Mining (ARM) operations, which operate on top of XML documents. A real-world case study, based on mining patient records across a region, demonstrates the effectiveness, efficiency and reliability of the proposed approach and system. HealthObs was also applied to the identification of genotype-phenotype associations and the devising of a genetic susceptibility index (GSI) [Mou07].

Feature Selection and Causal Discovery

In [Pot04a] a system for the analysis and interpretation of gene-expression profiles, and for the identification of respective molecular or gene markers, is presented. The problem is challenging because of the huge number of genes (thousands to tens of thousands) and the small number of samples (about 50 to 100 cases). In this paper a novel gene-selection methodology, based on the discretization of the continuous gene-expression values, is presented. With a specially devised gene-ranking metric, the strength of each gene with respect to its power to discriminate between sample categories is measured. Then, a greedy feature-elimination algorithm is applied to the rank-ordered genes to form the final set of selected genes. Unseen samples are classified according to a specially devised prediction/matching metric. The methodology was applied to a number of real-world gene-expression studies, yielding very good results. A revised version of the presented feature-selection approach has also been appropriately customized and applied to the problem of promoter/gene recognition, producing very good results [Pot06].
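The discretize, rank and eliminate pipeline can be sketched as follows. The median-split discretization and the discrimination score below are illustrative stand-ins for the specially devised metrics of the paper:

```python
import numpy as np

def rank_and_select(X, y, n_keep=1):
    """Illustrative sketch of a gene-selection pipeline in the spirit of
    [Pot04a]: discretize expression values, score each gene by how cleanly
    its binarized expression separates the two classes, and greedily
    eliminate the weakest genes until `n_keep` remain."""
    B = X > np.median(X, axis=0)                     # discretized expression
    score = np.abs(B[y == 0].mean(axis=0) - B[y == 1].mean(axis=0))
    keep = list(np.argsort(score))                   # weakest genes first
    while len(keep) > n_keep:
        keep.pop(0)                                  # greedy elimination
    return sorted(keep)

# gene 0 separates the two classes perfectly; the rest are constant
X = np.array([[0, 1, 1], [0, 1, 1], [0, 1, 1],
              [9, 1, 1], [9, 1, 1], [9, 1, 1]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
selected = rank_and_select(X, y, n_keep=1)
```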
In [Tsa03a], it is theoretically shown that, under certain broad and typical conditions, the minimal-size set of variables with the optimal prediction performance is the Markov Blanket of the target variable of interest. The Markov Blanket of a variable T is a minimal-size set conditioned on which T becomes probabilistically independent of every other subset of variables. It coincides with the set of parents, children and spouses of T in any Bayesian Network faithfully capturing the distribution of the data. Time- and sample-efficient algorithms for identifying the Markov Blanket of T have been developed that perform very well compared against the state of the art in variable selection [Tsa03b, Ali08a, Ali08b]. There is a strong connection of Bayesian Networks to causal models and causal induction from observational data: under certain conditions an edge A→B in the network corresponds to a causal relation between A and B. Thus, the above theoretical results and algorithmic advances have a direct implication for causal discovery. To assess the quality of learning such relations, an algorithm providing a theoretical bound on the False Discovery Rate is developed in [Tsa08].
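The characterization of the Markov Blanket as parents, children and spouses can be illustrated directly on a known network structure (the network in the example below is made up for illustration):

```python
def markov_blanket(parents, t):
    """The Markov Blanket of t in a Bayesian network: its parents, its
    children, and its children's other parents (spouses), as stated in
    [Tsa03a]. `parents` maps each node to the set of its parents."""
    children = {v for v, ps in parents.items() if t in ps}
    spouses = {p for c in children for p in parents[c]} - {t}
    return parents.get(t, set()) | children | spouses

# T has parent A, child C, and C's other parent S; B is outside the blanket
net = {"T": {"A"}, "C": {"T", "S"}, "A": set(), "S": set(), "B": {"A"}}
mb = markov_blanket(net, "T")
```

Conditioning on {A, C, S} renders T independent of B in this example, which is exactly why the blanket is the minimal optimal feature set under faithfulness.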



Artificial Intelligence Laboratory
Department of Information and Communication Systems Engineering
University of the Aegean
Contact person: Efstathios Stamatatos

Supervised Learning

Class imbalance problem. The class imbalance problem in textual data, a crucial factor in tasks such as authorship attribution, is dealt with. To handle this problem in multi-class corpora, techniques based on text sampling have been proposed that segment the training texts into text samples according to the size of the class, thus producing a fairer classification model [Sta06b, Sta08]. Hence, minority classes can be segmented into many short samples and majority classes into fewer, longer samples.
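The sampling idea can be sketched as follows. The choice of whitespace words as the segmentation unit and the `target_samples` parameter are illustrative assumptions, not the published segmentation rule:

```python
def balance_by_sampling(texts_by_class, target_samples=10):
    """Sketch of class-size-aware text sampling in the spirit of
    [Sta06b, Sta08]: each class's training text is segmented into roughly
    `target_samples` samples, so a minority class yields many short samples
    and a majority class the same number of longer ones."""
    samples = {}
    for label, text in texts_by_class.items():
        words = text.split()
        size = max(1, len(words) // target_samples)
        samples[label] = [" ".join(words[i:i + size])
                          for i in range(0, len(words), size)]
    return samples

corpus = {"minority": " ".join(["w"] * 20), "majority": " ".join(["w"] * 100)}
samples = balance_by_sampling(corpus, target_samples=10)
```

Both classes end up with the same number of training samples, which is the fairness effect the published technique aims at.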
Ensemble classifiers. Work is performed on ensemble models taking into account the properties of specific ML problems. In particular, to identify the most likely pianist given a set of performances of the same piece by a number of skilled candidate pianists, an ensemble of simple classifiers derived by subsampling both the training set and the input features has been proposed [Sta05]. Experiments under inter-piece conditions, a difficult musical task, display a high level of accuracy. Moreover, an ensemble of classifiers based on feature-set subspacing has been applied to authorship attribution problems [Sta06a]. A variation of cross-validated committees applied to the feature set provides excellent results on authorship attribution benchmark corpora in comparison to support vector machines.

Feature Selection

Work is performed on feature selection methods suitable for textual data. In more detail, a feature selection algorithm for character n-gram features has been proposed [Hou06] that is able to extract character n-grams of variable length from an initial pool of fixed-length n-grams. This feature selection approach is based on frequency information rather than the discriminatory power of each individual feature, and has been applied successfully to tasks such as authorship attribution [Hou06], spam detection [Kan07], and webpage genre identification [Kan07b].
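A frequency-based extraction of variable-length n-grams can be sketched as follows. The `keep_ratio` dominance rule is an assumed criterion for illustration, not the published criterion of [Hou06]:

```python
from collections import Counter

def variable_ngrams(text, n=3, top=20, keep_ratio=0.8):
    """Illustrative sketch of frequency-based variable-length character
    n-gram selection: start from the most frequent fixed-length n-grams and
    replace one by its (n+1)-gram extension when the extension retains most
    of its frequency."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    longer = Counter(text[i:i + n + 1] for i in range(len(text) - n))
    selected = []
    for g, c in grams.most_common(top):
        ext = max((e for e in longer if e.startswith(g)),
                  key=lambda e: longer[e], default=None)
        if ext is not None and longer[ext] >= keep_ratio * c:
            selected.append(ext)      # dominant extension absorbs the n-gram
        else:
            selected.append(g)
    return selected

selected = variable_ngrams("aaab aaab aaab")
```

Here the 3-gram "aaa" is always followed by "b", so its 4-gram extension "aaab" is kept instead.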


School of Science and Technology, Hellenic Open University
Contact person: Dimitris Kalles

Supervised Learning

Incremental induction. Work towards estimating a minimum number of steps for which it is guaranteed that, regardless of the order of the new training instances, a decision tree will be left intact, thus saving on computations [Kal96]. Subsequently, this guaranteed behavior was relaxed in order to estimate how one might stay close to the target tree (to be learned), by capitalizing on the observation that changes in the tree fringe tend to be frequent and to cancel each other out [Kal00].
Batch induction. Work towards substituting the conventional use of heuristics with evolution of populations of decision trees by genetic algorithms. While relatively more expensive, the resulting trees are usually a lot smaller [Pap01].
Learning from sequential and structured data
Game strategy learning. Work towards a game playing and learning mechanism for a new board game, based on reinforcement learning and neural networks [Kal01], the latter approximating the playing policy. Further work focused on estimating experimentally the right amount of expert involvement in supplying examples of good playing tactics so as to guide the learning mechanism [Kal02]. Expert involvement is either by a human or by a minimax player [Kal07, Kal08a]. Along that direction, game interestingness metrics were developed that discriminate between learning and non-learning sessions [Kal08b].


Machine Learning & Knowledge Discovery Group
Programming Languages and Software Engineering Lab
Department of Informatics Aristotle University of Thessaloniki
Contact persons:

Ioannis Vlahavas, Grigorios Tsoumakas

Supervised Learning

Multi-label classification. Multilabel classification methods are increasingly required by modern applications, such as protein function classification, music categorization and semantic scene classification. An overview and taxonomy of multilabel classification methods are presented in [Tso07a]. A new algorithm for multilabel classification, called RAKEL (random k-labelsets), is presented in [Tso07b]. RAKEL produces an ensemble of multilabel classifiers by learning from different random subsets of the set of labels. RAKEL and several other multilabel classification methods have been implemented within an open-source software package called MULAN, which is used by several researchers and practitioners all over the world. MULAN, along with an active bibliography on multilabel classification and many multilabel datasets, is freely available online.
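The core of the RAKEL idea can be sketched in a few lines. The trivial majority-labelset base learner below (which ignores the input features) is an illustrative stand-in for a real classifier, since the point of the sketch is the labelset-ensemble structure:

```python
import random
from collections import Counter, defaultdict

def rakel_sketch(Y, k=2, m=3, seed=0):
    """Minimal sketch of RAKEL [Tso07b]: draw m random k-labelsets, train
    one label-powerset model per labelset, and combine the models' per-label
    votes. The base learner simply memorizes the majority labelset
    restricted to its subset (an illustrative stand-in)."""
    rng = random.Random(seed)
    labels = sorted({l for ys in Y for l in ys})
    models = []
    for _ in range(m):
        subset = set(rng.sample(labels, k))
        # label-powerset target: each example's labelset restricted to `subset`
        majority = Counter(frozenset(ys & subset) for ys in Y).most_common(1)[0][0]
        models.append((subset, majority))
    def predict():
        votes, seen = defaultdict(int), defaultdict(int)
        for subset, best in models:
            for l in subset:
                seen[l] += 1
                votes[l] += int(l in best)
        # a label is predicted when it wins more than half of its votes
        return {l for l in votes if votes[l] > 0.5 * seen[l]}
    return predict

# labels "a" and "b" co-occur in most training labelsets
predict = rakel_sketch([{"a", "b"}, {"a", "b"}, {"a", "b"}, {"a"}])
```

In the real algorithm each labelset model is a trained single-label classifier over the powerset of its k labels; only the voting scheme is shown faithfully here.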

Ensemble pruning. Ensemble pruning deals with the reduction of the ensemble size prior to combining the members of the ensemble. A method for pruning ensembles of heterogeneous classifiers, called Selective Fusion, was presented in [Tso04, Tso05]. It is based on a group of statistical procedures called multiple comparisons procedures. Selective Fusion is simple and fast and leads to very competitive results in terms of predictive performance. In addition, a slower but more effective method for pruning ensembles of either homogeneous or heterogeneous classifiers was proposed in [Par06], based on Reinforcement Learning. Finally, a new diversity measure for guiding a greedy search in the space of subensembles was introduced in [Par08].
Concept drift. An interesting problem that occurs in the task of data stream classification is concept drift (the sudden or gradual change of concept of the target class). In [Kat08a] a framework that deals with concept drift in dynamic feature spaces is proposed. A basic implementation of the framework is used in an adaptive personalized news reader, called PersoNews. In [Kat08b] the problem of recurring contexts in concept drifting data streams is tackled by introducing a framework that utilizes a stream clustering algorithm in order to discover and organize batches of data into concepts. A different classifier for each concept is subsequently trained and applied.
Learning from sequential and structured data
The prediction of functional sites, such as the translation initiation site (TIS), the transcription start site, and the splice sites in biological sequences is an important issue in biological research. The incorporation of a new set of features for representing biological sequences was proposed in [Tza05], in order to improve the prediction accuracy of the classifiers. The use of multiple classifier systems, such as classifier selection [Tza06b] and simple and weighted voting [Tza06a], has also been proposed for improving prediction accuracy. Finally, MANTIS, a modular prediction methodology consisting of three major decision components, was presented in [Tza06a]. Each component corresponds to a different biological aspect of the problem, and the components are combined into a meta-classification system using stacked generalization.
Reinforcement learning

Multi-agent reinforcement learning. Reinforcement Learning comprises an attractive solution to the problem of coordinating a group of agents, due to its robustness for learning in uncertain environments. A Reinforcement Learning algorithm for coordinating a group of agents that share a common goal was proposed in [Par07]. The algorithm uses strategies, defined as the coordinated actions that must be learned by the agents. Then, through a voting process, the decisions of the agents are combined in order to follow a common strategy.



Software and Knowledge Engineering Laboratory (SKEL)
Institute of Informatics and Telecommunications

National Centre for Scientific Research “Demokritos”
Contact person: Georgios Paliouras

This section presents work of SKEL researchers in the area of machine learning. Some of this work was performed before the researchers joined the Lab, but is not reported in other sections of this chapter. The main focus of machine learning research in SKEL is on learning complex and expressive knowledge structures, such as ontologies, grammars and graphs, from simple or insufficient data.

Supervised Learning
Computational complexity of supervised learning methods. In [Pal95] the worst-case complexity of widely-used classifier learning methods is shown to be over-quadratic when numeric features are used. Experiments with artificial data confirm this result, while experiments with real data indicate that the average-case performance is closer to linear.
Unsupervised Learning
Learning the subsumption hierarchy of ontologies. In [Zav07], a topic modeling method is proposed for discovering ontology concepts and the subsumption relations that hold among them. The subsumption hierarchy is based on the conditional independence achieved between a pair of latent topics by the presence of a third one, their parent. Further to this work, in [Zav08] a method is proposed for evaluating ontologies against a manually constructed gold standard. In contrast to most existing approaches, which use name matching, the proposed method compares the concepts of the two ontologies at the level of their instance distributions. Thus, it facilitates evaluation even when the concepts have not been named.

Using feature graphs to produce overlapping clusters. In [Pal00], a graph-based clustering method is presented and applied to the domain of user modeling. The method constructs a graph of connected features, e.g. the interests of users, and clusters them into maximal cliques. Overlapping clusters, which are desirable in various applications, are produced in this manner. In more recent work, e.g. [Pie05], this approach is extended to the construction of hierarchical models, such as personalized Web directories.
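Maximal-clique enumeration is the step that yields overlapping clusters, since a node may belong to several maximal cliques at once. A minimal sketch, using the classic Bron-Kerbosch algorithm on an invented graph of user interests (the algorithm choice and the data are assumptions, not details of [Pal00]):

```python
def maximal_cliques(adj):
    """Bron-Kerbosch enumeration of all maximal cliques in an undirected
    graph; adj maps each node to the set of its neighbours."""
    cliques = []
    def bk(r, p, x):
        if not p and not x:
            cliques.append(r)     # r cannot be extended: a maximal clique
            return
        for v in list(p):
            bk(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)
    bk(set(), set(adj), set())
    return cliques

# toy feature graph: interests that co-occur among users (hypothetical)
adj = {
    "football": {"basketball", "tennis"},
    "basketball": {"football", "tennis"},
    "tennis": {"football", "basketball", "chess"},
    "chess": {"tennis", "go"},
    "go": {"chess"},
}
clusters = maximal_cliques(adj)
# "tennis" belongs to two cliques, so the clusters overlap
```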
Semi-supervised and knowledge-driven learning
Bootstrapping knowledge-level learning with syntactic learning. An iterative methodology is proposed in [Pal05] that couples a syntactic information extraction parser with a semantic model (ontology). The ontology is refined with the use of the extracted information and is used in turn to learn to extract new information from text and multimedia documents.
Assigning blame to knowledge model components. In [Apo07], a method is presented that back-propagates errors through a logic model. The model is assumed to be described in fuzzy description logics and receives input from low-level trainable classifiers. The back-propagation of the error makes it possible to derive input-output data pairs for training the lower-layer classifiers. In a similar vein, an error back-propagation method is presented, among others, in [Pal97]. The proposed method is used to assign blame to the temporal parameters of a temporal logic model. It uses the given knowledge model and a minimal-damage bias, in order to achieve knowledge refinement even when the training data is limited or incomplete (partial supervision).
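Back-propagating blame through a logical connective can be illustrated with a product t-norm conjunction, a common choice of fuzzy AND (whether [Apo07] uses this particular t-norm is not stated here, so treat it as an assumption). Differentiating the rule output with respect to each classifier confidence yields one training signal per low-level classifier.

```python
def product_and(c1, c2):
    # product t-norm: fuzzy conjunction of two classifier confidences
    return c1 * c2

def blame(c1, c2, target):
    """Back-propagate the output error of the fuzzy rule through the
    conjunction, yielding a training signal for each low-level classifier."""
    err = product_and(c1, c2) - target
    # chain rule through y = c1 * c2: dE/dc1 = err * c2, dE/dc2 = err * c1
    return err * c2, err * c1

# the rule under-fires because the second classifier is unsure, so the
# larger corrective signal flows to that classifier
g1, g2 = blame(0.9, 0.2, 1.0)
```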

Learning from heterogeneous labeled and unlabeled data. In [Tro06], TPN2 is presented, a classifier training method that bootstraps positive-only learning with fully-supervised learning, in order to make the most of labeled and unlabeled data, under the assumption that the two are drawn from significantly different distributions. Furthermore, the unlabeled data are themselves separated into subsets that are assumed to be drawn from multiple distributions. TPN2 trains a different classifier for each subset, making use of all unlabeled data each time. The method participated in a spam filtering competition consisting of two different tasks and achieved runner-up performance in both.
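The positive-only bootstrap step can be given a heavily simplified flavor on one-dimensional toy data: treat all unlabeled examples as negative at first, score them against the positive class, and keep only the most negative-looking ones as reliable negatives for the supervised stage. The centroid scoring below is an illustration only, not the classifier used by TPN2.

```python
def reliable_negatives(positive, unlabeled):
    """First bootstrap step (simplified sketch): treat all unlabeled
    examples as negative, score them, and keep the most negative-looking
    half as reliable negatives for fully-supervised training."""
    pos_c = sum(positive) / len(positive)
    neg_c = sum(unlabeled) / len(unlabeled)   # crude first-pass negative model
    # low score = close to the positive centroid, far from the negative one
    scored = sorted(unlabeled, key=lambda x: abs(x - pos_c) - abs(x - neg_c))
    return scored[len(scored) // 2:]

pos = [0.9, 1.0, 1.1]                  # labeled positives (1-D toy data)
unl = [0.95, 1.05, 3.0, 3.2, 2.8]      # unlabeled: hidden positives + negatives
neg = reliable_negatives(pos, unl)     # the hidden positives are filtered out
```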
Incremental refinement of learned models. In [Apo03], a new method is presented that refines incrementally a Radial Basis Function Neural Network, without the need for re-training. The method generates a new RBFNN for each incoming training example, by simply adding a node to the existing network. It then measures the divergence of the new network from the old one and uses it to train the old network towards the new one that incorporates the new data.
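The node-addition step can be sketched as follows: each incoming example contributes one new Gaussian basis function whose weight absorbs the current residual, so the updated network fits the new example exactly without re-training. The fixed width and the one-dimensional setting are simplifying assumptions, and the divergence-driven training of the old network toward the new one described in [Apo03] is omitted.

```python
import math

class RBFNet:
    def __init__(self, width=1.0):
        self.nodes = []                    # list of (center, weight) pairs
        self.width = width

    def predict(self, x):
        return sum(w * math.exp(-((x - c) ** 2) / (2 * self.width ** 2))
                   for c, w in self.nodes)

    def add_node(self, x, y):
        # new basis function centred on the incoming example; its weight
        # absorbs the current residual, so predict(x) == y afterwards
        self.nodes.append((x, y - self.predict(x)))

net = RBFNet()
for x, y in [(0.0, 1.0), (2.0, 0.5), (4.0, -0.2)]:
    net.add_node(x, y)                     # one node per example, no re-training
```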
Parallelizing inductive logic programming. [Kon03] presents a parallel version of the Aleph ILP system that uses the Message Passing Interface (MPI) to distribute the training set and evaluate hypotheses in parallel. The method is empirically evaluated on large artificially-constructed datasets, where the computational bottleneck in hypotheses evaluation varies between background knowledge complexity and training set volume.
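The data-parallel evaluation scheme can be imitated without MPI: partition the training examples into shards, score a candidate hypothesis on each shard concurrently, and reduce the partial counts. In this sketch threads stand in for MPI ranks, and the coverage predicate is a toy stand-in for ILP coverage testing.

```python
from concurrent.futures import ThreadPoolExecutor

# toy training set, partitioned across 4 "ranks" as MPI would scatter it
examples = list(range(1000))
shards = [examples[i::4] for i in range(4)]

def covers(hypothesis, example):
    # stand-in for testing whether a clause covers an example
    return hypothesis(example)

def local_score(hypothesis, shard):
    # each rank counts coverage on its own shard of the training set
    return sum(covers(hypothesis, e) for e in shard)

hypothesis = lambda e: e % 3 == 0        # hypothetical candidate clause

with ThreadPoolExecutor(max_workers=4) as pool:
    partial = pool.map(local_score, [hypothesis] * 4, shards)
total = sum(partial)                      # reduce, as MPI_Reduce would
```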
Learning from sequential and structured data

Stacking beyond classification. In [Sig05] and [Sig04], the problem of combining base learners that are not classifiers is studied within the framework of stacked generalization. In particular, a method is presented for transforming the results of sequential recognizers, such as information extraction systems, into feature vectors that can be handled by a meta-level classifier. Experimental results show that this method can significantly boost the performance of base-level recognition systems.
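The transformation can be sketched as follows: every candidate span extracted by any base recognizer becomes one meta-level instance, with one binary feature per recognizer indicating whether it predicted that span. The recognizer names and spans below are hypothetical, and a trained meta-classifier would replace the majority vote used here.

```python
# each base recognizer returns the text spans it extracted (toy output)
base_outputs = {
    "rec_A": {(0, 5), (10, 14)},
    "rec_B": {(0, 5)},
    "rec_C": {(0, 5), (20, 25)},
}

def meta_features(span):
    # one binary feature per base recognizer: did it predict this span?
    return [int(span in spans) for spans in base_outputs.values()]

# every span proposed by at least one recognizer is a meta-level instance
candidates = set().union(*base_outputs.values())
vectors = {span: meta_features(span) for span in sorted(candidates)}

# a meta-level classifier would be trained on these vectors; a simple
# majority vote stands in for it here
accepted = [s for s, v in vectors.items() if sum(v) >= 2]
```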
Transforming multidimensional data to sequences for clustering. In [Vog07], a method is proposed to transform multidimensional data, such as microarray data, into one-dimensional signals, using space-filling curves and wavelet-based denoising. In this manner, clustering of the data becomes computationally feasible. Initial experiments show that the method performs comparably to fuzzy c-means clustering and greedy EM.
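One common space-filling curve, the Z-order (Morton) curve, gives a flavor of the transformation; whether [Vog07] uses this particular curve is an assumption. The point is only that interleaving coordinate bits tends to map nearby grid cells to nearby positions in a one-dimensional ordering, which can then be treated as a signal.

```python
def morton_key(x, y, bits=8):
    """Interleave the bits of (x, y) to place a 2-D grid point on the
    Z-order space-filling curve."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # x bits on even positions
        key |= ((y >> i) & 1) << (2 * i + 1)   # y bits on odd positions
    return key

# order 2-D grid points along the curve, yielding a 1-D sequence that
# could then be denoised and clustered as a signal
points = [(3, 1), (0, 0), (1, 1), (2, 2)]
signal = sorted(points, key=lambda p: morton_key(*p))
```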
Efficient induction of context-free grammars. In [Pet04a], a method is presented for inducing context-free grammars, based on the principle of minimum description length. Particular emphasis is given to the computational efficiency of the method, in order to make it applicable to large training sets. The method is improved further in [Pet04b], through the use of genetic search, which allows the introduction of new search operators and thus faster convergence to suitable grammars. The method is used in recent work [Pet08] to learn grammars that extract relations from text.
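The minimum-description-length principle behind such methods can be illustrated with a toy comparison: a compact recursive grammar for the language a^n b^n costs a few bits for its rules plus some bits per derivation, while a grammar that simply memorizes each sentence grows with the data. The bit costs below are crude hypothetical estimates, not the encoding used by e-GRIDS.

```python
def grammar_bits(rules, symbol_bits=4):
    # cost of encoding the grammar itself: the left-hand side plus every
    # symbol on the right-hand side of each rule
    return sum((1 + len(rhs)) * symbol_bits for _, rhs in rules)

def data_bits(num_sentences, choices_per_sentence):
    # cost of encoding the data given the grammar: one bit per binary
    # derivation choice (a crude stand-in for the real encoding)
    return num_sentences * choices_per_sentence * 1.0

# compact recursive grammar for a^n b^n vs. one flat rule per sentence
compact = [("S", ["a", "S", "b"]), ("S", ["a", "b"])]
flat = [("S", ["a"] * n + ["b"] * n) for n in range(1, 21)]

mdl_compact = grammar_bits(compact) + data_bits(20, 5)
mdl_flat = grammar_bits(flat) + data_bits(20, 0)
# the compact grammar gives the shorter total description of the data
```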



[Ale04] P. Alevizos, D.K. Tasoulis and M.N. Vrahatis, “Parallelizing the unsupervised k-windows clustering algorithm”, In R. Wyrzykowski, editor, Lecture Notes in Computer Science, Springer-Verlag, vol. 3019, pp. 225-232, 2004.
[Ali08a] C.F. Aliferis, A. Statnikov, I. Tsamardinos, M. Subramani and X.D. Koutsoukos, “Local Causal and Markov Blanket Induction for Causal Discovery and Feature Selection for Classification. Part I: Algorithms and Empirical Evaluation”, (to appear) Journal of Machine Learning Research.
[Ali08b] C.F. Aliferis, A. Statnikov, I. Tsamardinos, M. Subramani and X.D. Koutsoukos, “Local Causal and Markov Blanket Induction for Causal Discovery and Feature Selection for Classification. Part II: Analysis and Extensions”, (to appear) Journal of Machine Learning Research

[Amp98] S.J. Perantonis, N. Ampazis, S. Varoufakis and G. Antoniou, “Constrained learning in neural networks: Application to stable factorization of 2-D polynomials”, Neural Processing Letters, vol. 7, no. 1, pp. 5-14, 1998.
[Amp99] N. Ampazis, S.J. Perantonis and J.G. Taylor, “Dynamics of multilayer networks in the vicinity of temporary minima”, Neural Networks, vol. 12, no. 1, pp. 43-58, 1999.
[Amp01] N. Ampazis, S.J. Perantonis and J.G. Taylor, “A dynamical model for the analysis and acceleration of learning in feedforward networks”, Neural Networks, vol. 14, no. 8, pp. 1075-1088, 2001.
[Amp02] N. Ampazis and S.J. Perantonis, “Two highly efficient second-order algorithms for training feedforward networks”, IEEE Transactions on Neural Networks, vol. 13, no. 5, pp. 1064-1074, 2002.
[Ana05a] A.D. Anastasiadis, G.D. Magoulas and M.N. Vrahatis, “New globally convergent training scheme based on the resilient propagation algorithm”, Neurocomputing, vol. 64, pp. 253-270, 2005.

[Ana05b] A.D. Anastasiadis, G.D. Magoulas and M.N. Vrahatis, “Sign-based learning schemes for pattern classification”, Pattern Recognition Letters, vol. 26, pp. 1926-1936, 2005.
[Ana06] A.D. Anastasiadis, G.D. Magoulas and M.N. Vrahatis, “Improved sign-based learning algorithm derived by the composite nonlinear Jacobi process”, Journal of Computational and Applied Mathematics, vol. 191, pp. 166-178, 2006.
[And97] G.S. Androulakis, G.D. Magoulas and M.N. Vrahatis, “Geometry of learning: visualizing the performance of neural network supervised training methods”, Nonlinear Analysis: Theory, Methods and Applications, vol. 30, pp. 4539-4544, 1997.
[Apo03] G. Apostolikas and S. Tzafestas, “On-line RBFNN Based Identification of Rapidly Time-Varying Nonlinear Systems with Optimal Structure-Adaptation,” Mathematics and Computers in Simulation, Vol. 63, Issue 1, pp. 1-13, 2003.
[Apo07] G. Apostolikas and S. Konstantopoulos, “Error Back-propagation in Multi-valued Logic Systems,” Proceedings of 7th Intl. Conf. on Computational Intelligence and Multimedia Applications (ICCIMA), Sivakasi, India, IEEE CS Press, 2007.

[Ath07] Th. Athanasiadis, Ph. Mylonas, Y. Avrithis and S. Kollias, "Semantic Image Segmentation and Object Labeling", IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 3, pp. 298 - 312, 2007.
[Ble03] K. Blekas, D. Fotiadis and A. Likas, “Greedy Mixture Learning for Multiple Motif Discovery in Biological Sequences”, Bioinformatics, vol. 19, no. 5, pp. 607-617, 2003.
[Ble05] K. Blekas, A. Likas, N.P. Galatsanos and I. Lagaris, “A Spatially-Constrained Mixture Model for Image Segmentation”, IEEE Trans. on Neural Networks, vol. 16 (2), pp. 494-498, 2005.
[Ble07a] K. Blekas and I.E. Lagaris, “Newtonian Clustering: An Approach based on Molecular Dynamics and Global Optimization”, Pattern Recognition, vol. 40, no. 6, pp. 1734-1744, 2007.

[Ble07b] K. Blekas and I.E. Lagaris, “Split-Merge Incremental Learning (SMILE) of Mixture Models”, Int. Conf. on Artificial Neural Networks (ICANN), Lecture Notes on Art. Neural Networks, vol.4669, pp.291-300, 2007.
[Ble08] K. Blekas, C. Nikou, N. Galatsanos and N. Tsekos, “A regression mixture model with spatial constraints for clustering spatiotemporal data”, Int. Journal on Artificial Intelligence Tools, 2008.
[Bou01] B. Boutsinas and M.N. Vrahatis, “Artificial nonmonotonic neural networks”, Artificial Intelligence, vol. 132, no. 1, pp. 1-38, 2001.

[Car08] G. Caridakis, K. Karpouzis and S. Kollias, “User and Context Adaptive Emotion Recognition”, Neurocomputing, 2008.
[Cho08] A. Chortaras, G. Stamou and A. Stafylopatis, Connectionist Weighted Fuzzy Logic Programs, Neurocomputing, 2008.
[Con06a] C. Constantinopoulos, M. Titsias and A. Likas, "Bayesian Feature and Model Selection for Gaussian Mixture Models", IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 28, no. 6, pp. 1013-1018, June 2006.
[Con06b] C. Constantinopoulos and A. Likas, "An Incremental Training Method for the Probabilistic RBF Network", IEEE Trans. on Neural Networks, vol. 17, no.4, pp. 966-974, July 2006.

[Con07] C. Constantinopoulos and A. Likas, "Unsupervised Learning of Gaussian Mixtures Based on Variational Component Splitting", IEEE Trans. on Neural Networks, vol. 18, no. 3, pp. 745-755, 2007.
[Del94] A. Delopoulos, A. Tirakis and S. Kollias, “Triple-correlation Neural Networks and their usage in Invariant Image Classification”, IEEE Transactions on Neural Networks, vol. 5, no 3, pp. 392-409, 1994.
[Dia04a] K.I. Diamantaras and Th. Papadimitriou, “Blind Deconvolution of SISO Systems with Binary Source based on Recursive Channel Shortening”, in Fifth Int. Conference on Independent Component Analysis and Blind Signal Separation (ICA2004), Lecture Notes in Computer Science, Vol. 3195, C.G. Puntonet and A. Prieto, Eds., Granada, Spain, pp. 548-553, Springer, 2004.
[Dia04b] K.I. Diamantaras and Th. Papadimitriou, “MIMO Blind Deconvolution Using Subspace-based Filter Deflation”, in Proc. 2004 IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP-2004), Montreal, 2004.

[Dia05a] K. I. Diamantaras and Th. Papadimitriou, “Blind Separation of Reflections Using the Image Mixtures Ratio”, in Proc. IEEE Int. Conf. Image Processing (ICIP-2005), vol. II, pp. 1034-1037, Genova, Italy, 2005.
[Dia05b] K.I. Diamantaras, I. Michailidis and S. Vasiliadis, “A Very Fast and Efficient Linear Classification Algorithm”, in Proc. IEEE Int. Workshop on Machine Learning for Signal Processing (MLSP-2005), Mystic, CT, USA, 2005.
[Dia05c] K.I. Diamantaras and Th. Papadimitriou, “Blind Deconvolution of Multi-Input Single-Output Systems with Binary Sources”, in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP-2005), vol. III, pp. 549-552, Philadelphia, PA, USA, 2005.
[Dia06a] K.I. Diamantaras, “Blind Signal Processing Based on Data Geometric Properties”, in New Directions in Statistical Signal Processing: From Systems to Brain, S. Haykin, J. Principe, T. Sejnowski, and J. McWhirter (eds), MIT Press, 2006.

[Dia06b] K.I. Diamantaras and Th. Papadimitriou, “Blind Deconvolution of Multi-Input Single-Output Systems with Binary Sources”, IEEE Transactions on Signal Processing, vol. 54, no. 10, pp. 3720-3731, October 2006.
[Dia06c] K. I. Diamantaras and Th. Papadimitriou, “Subspace-based Channel Shortening for the Blind Separation of Convolutive Mixtures”, IEEE Transactions on Signal Processing, vol. 54, no. 10, pp. 3669-3677, October 2006.
[Dia06d] K. I. Diamantaras, “A Clustering Approach for the Blind Separation of Multiple Finite Alphabet Sequences from a Single Linear Mixture”, Signal Processing, vol. 86, no. 4, pp. 877-891, 2006.
[Dia06e] K.I. Diamantaras and Th. Papadimitriou, “Blind Multichannel Deconvolution Using Subspace-Based Single Delay Channel Deflation”, 12th IEEE Digital Signal Processing Workshop (DSP'2006), Wyoming, USA, 2006.

[Dia06f] K.I. Diamantaras, Th. Papadimitriou, and E. Kotsialos, “A Channel Deflation Approach for the Blind Deconvolution of a Complex FIR Channel With Real Input”, in Proc. 14th European Signal Processing Conference (EUSIPCO 2006), Florence, Italy, 2006.
[Dia07a] K.I. Diamantaras, Artificial Neural Networks (book in Greek), Κλειδάριθμος, 2007.
[Dia07b] K.I. Diamantaras and Th. Papadimitriou, “Analytical Solution of the Blind identification Problem for Multichannel FIR Systems Based on Second Order Statistics”, in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP-2007), vol. III, pp. 749-752, Honolulu, Hawaii, USA, 2007.
[Dia08a] K.I. Diamantaras and Th. Papadimitriou, “Applying PCA Neural Models for the Blind Separation of Signals”, Neurocomputing, 2008.

[Dia08b] K.I. Diamantaras and Th. Papadimitriou, “An Efficient Subspace Method for the Blind Identification of Multichannel FIR Systems”, IEEE Trans. Signal Processing, 2008.
[Dou00] N. Doulamis, A. Doulamis and S. Kollias, “On-line Retrainable Neural Networks: Improving the performance of Neural Networks in Image Analysis problems”, IEEE Transactions on Neural Networks, vol. 11, no.1, pp.1-20, January 2000.
[Dou01] G. Dounias, G. Tselentis and V.S. Moustakis, “Feature Extraction for Quality Control in a Production Line Using Inductive Machine Learning”, Journal of Integrated Computer Aided Engineering, vol. 8, no. 4, pp. 325-336, 2001.
[Fro03] D. Frossyniotis, A. Stafylopatis and A. Likas, “A Divide-and-conquer Method for Multi-net Classifiers”, Pattern Analysis & Applications, vol. 6, no. 1, pp 32-40, 2003.

[Fro04] D. Frossyniotis, A. Likas and A. Stafylopatis, “A Clustering Method Based on Boosting”, Pattern Recognition Letters, vol. 25, no. 6, pp 641-654, 2004.
[Fro05] D. Frossyniotis, Ch. Pateritsas and A. Stafylopatis, “A multi- clustering fusion scheme for data partitioning”, International Journal of Neural Systems, vol. 15, no. 5, 391-401, 2005.
[Gan03] T. Ganchev, D.K. Tasoulis, M.N. Vrahatis and N. Fakotakis, “Locally Recurrent Probabilistic Neural Network for Text-Independent Speaker Verification”, Proc. of the InterSpeech-2003, vol. 3, pp.1673-1676, Geneva, Switzerland, 2003.
[Gan04] T. Ganchev, D.K. Tasoulis, M.N. Vrahatis and N. Fakotakis, “Locally Recurrent Probabilistic Neural Networks with Application to Speaker Verification”, GESTS International Transaction on Speech Science and Engineering, ISSN 1738-4737, vol.1, no.2, pp. 1-13, December 2004.
[Gan07] T. Ganchev, D.K. Tasoulis, M.N. Vrahatis and N. Fakotakis, “Generalized Locally Recurrent Probabilistic Neural Networks with application to Text-Independent Speaker Verification”, Neurocomputing, vol. 70, no. 7-9, pp. 1424-1438, 2007.

[Gan08] T. Ganchev, K.E. Parsopoulos, M.N. Vrahatis and N. Fakotakis, “Partially Connected Locally Recurrent Probabilistic Neural Networks”, In Recurrent Neural Networks, ISBN 978-3-902613-28-8, 2008.
[Geo06] V.L. Georgiou, N.G. Pavlidis, K.E. Parsopoulos, P. Alevizos and M.N. Vrahatis, “New self-adaptive probabilistic neural networks in bioinformatic and medical tasks”, International Journal on Artificial Intelligence Tools, vol. 5, no. 3, pp. 371-396, 2006.
[Hal08] M. Halkidi, D. Gunopulos, M. Vazirgiannis, N. Kumar and C. Domeniconi, “A clustering framework based on subjective and objective validity criteria”, TKDD 1(4), 2008.
[Hou06] J. Houvardas and E. Stamatatos. “N-gram Feature Selection for Authorship Identification,” In J. Euzenat, and J. Domingue (Eds.) Proc. of the 12th Int. Conf. on Artificial Intelligence: Methodology, Systems, Applications (AIMSA'06), LNCS 4183, pp. 77-86, 2006.

[Ioa05] S. Ioannou, A. Raouzaiou, V. Tzouvaras, T. Mailis, K. Karpouzis and S. Kollias, "Emotion recognition through facial expression analysis based on a neurofuzzy network", Neural Networks, vol. 18, no. 4, pp. 423-435, 2005.
[Kal96] D. Kalles and D.T. Morris, “Efficient Incremental Induction of Decision Trees”, Machine Learning, vol. 24, no. 3, pp. 231 - 243, 1996.
[Kal00] D. Kalles and A. Papagelis, “Stable Decision Trees: Using Local Anarchy for Efficient Incremental Learning”, Int. Journal on Artificial Intelligence Tools, vol. 9, no. 1, pp. 79 - 95, 2000.
[Kal01] D. Kalles and P. Kanellopoulos, “On Verifying Game Designs and Playing Strategies using Reinforcement Learning”, ACM Symposium on Applied Computing, special track on Artificial Intelligence and Computation Logic, Las Vegas, 2001.

[Kal02] D. Kalles and E. Ntoutsi, “Interactive Verification of Game Design and Playing Strategies”, IEEE International Conference on Tools with Artificial Intelligence, Washington D.C., 2002.
[Kal07] D. Kalles, “Measuring Expert Impact on Learning how to Play a Board Game”, 4th IFIP Conference on Artificial Intelligence Applications and Innovations, Athens, Greece, 2007.
[Kal08a] D. Kalles and P. Kanellopoulos, “A Minimax Tutor for Learning to Play a Board Game”, Workshop on AI in Games, 18th European Conference on Artificial Intelligence, Patras, Greece, 2008.
[Kal08b] D. Kalles, “Player Co-modeling in a Strategy Board Game: Discovering how to Play Fast”, Cybernetics and Systems, vol. 39, no. 1, pp. 1-18, 2008.

[Kan07a] I. Kanaris, K. Kanaris, I. Houvardas and E. Stamatatos, “Words vs. Character N-grams for Anti-spam Filtering,” Int. Journal on Artificial Intelligence Tools, vol. 16, no. 6, pp. 1047-1067, 2007.
[Kan07b] I. Kanaris and E. Stamatatos, “Webpage Genre Identification Using Variable-length Character n-grams,” In Proc. of the 19th IEEE Int. Conf. on Tools with Artificial Intelligence, vol. 2, pp. 3-10, 2007.
[Kar95] D.A. Karras and S.J. Perantonis, “An Efficient constrained training algorithm for feedforward networks”, IEEE Transactions on Neural Networks, vol. 6, no. 6, 1420-1434, 1995.
[Kat08a] I. Katakis, G. Tsoumakas, E. Banos, N. Bassiliades, I. Vlahavas, “An Adaptive Personalized News Dissemination System”, Journal of Intelligent Information Systems, Springer, 2008.

[Kat08b] I. Katakis, G. Tsoumakas and I. Vlahavas, “An Ensemble of Classifiers for Coping with Recurring Contexts in Data Streams”, Proc. 18th European Conference on Artificial Intelligence (ECAI’08), Patras, Greece, 2008.
[Kol88] S. Kollias and D. Anastassiou, “An adaptive least squares algorithm for efficient training of artificial neural networks”, IEEE Trans. on Circuits and Systems, vol. 36, no. 8, 1988.
[Kol96] S. Kollias, “Multiresolution neural networks and invariant image recognition”, Neurocomputing, vol. 12, pp. 35-57, 1996.
[Kof07] N. Kofidis, K. Diamantaras, A. Margaris and M. Roumeliotis, “Blind System Identification: Instantaneous Mixtures of Binary Sources”, International Journal of Computer Mathematics, 2007
[Kou94] K. Koutroumbas and N. Kalouptsidis, “Qualitative analysis of the parallel and asynchronous modes of the Hamming network”, IEEE Transactions on Neural Networks, vol. 5, no. 3, pp. 380-391, May 1994.

[Kou04] K. Koutroumbas, “Recurrent Algorithms for selecting the maximum input”, Neural Processing Letters, vol. 20, no. 3, pp. 179-197, 2004.
[Kou05a] K. Koutroumbas and N. Kalouptsidis, “Generalized Hamming networks and applications”, Neural Networks, vol. 18, no. 7, pp. 896-913, 2005.
[Kou05b] K. Koutroumbas, “CO-MAX: A cooperative method for determining the position of the maxima”, Neural Processing Letters, vol. 22, no 2, pp. 205-221, 2005.
[Kok06] J.R. Kok and N. Vlassis, “Collaborative Multiagent Reinforcement Learning by Payoff Propagation”, Journal of Machine Learning Research, vol. 7, pp. 1789-1828, 2006.
[Kon03] S. Konstantopoulos, “A Data-Parallel Version of Aleph”, Proceedings of the Workshop on Parallel and Distributed Computing for Machine Learning at the Joint European Conference on Machine Learning and on Principles and Practices of Knowledge Discovery in Databases (ECML/PKDD), Dubrovnik, Croatia, 2003.

[Kow05] W. Kowalczyk and N. Vlassis, “Newscast EM”, In Advances in Neural Information Processing Systems 17. MIT Press, 2005.
[Kur07] K. Kurihara, M. Welling and N. Vlassis, “Accelerated variational Dirichlet process mixtures”, In Advances in Neural Information Processing Systems 19. MIT Press, 2007.
[Lik03] A. Likas, N. Vlassis and J. Verbeek, "The Global K-means Clustering Algorithm", Pattern Recognition, vol. 36, pp. 451-461, 2003.
[Mag97] G.D. Magoulas, M.N. Vrahatis and G.S. Androulakis, “Effective backpropagation training with variable stepsize”, Neural Networks, vol. 10, no. 1, pp. 69-82, 1997.

[Mag99] G.D. Magoulas, M.N. Vrahatis and G.S. Androulakis, “Improving the convergence of the backpropagation algorithm using learning rate adaptation methods”, Neural Computation, vol. 11, no. 7, pp. 1769-1796, 1999.
[Mag01] G.D. Magoulas, V.P. Plagianakos, G.S. Androulakis and M.N. Vrahatis, “A framework for the development of globally convergent adaptive learning rate algorithms”, International Journal of Computer Research, vol. 10, no. 1, pp. 1-10, 2001.
[Mag01a] G.D. Magoulas, V.P. Plagianakos and M.N. Vrahatis, “Adaptive stepsize algorithms for on-line training of neural networks”, Nonlinear Analysis: Theory, Methods and Applications, vol. 47, pp. 3425-3430, 2001.
[Mag01b] G.D. Magoulas, V.P. Plagianakos and M.N. Vrahatis, “Improved Neural Network-based Interpretation of Colonoscopy Images Through On-line Learning and Evolution”, In Proc. European Symposium on Intelligent Technologies (EUNITE 2001), Tenerife, Spain, pp. 402-407, 2001.
[Mag02] G.D. Magoulas, V.P. Plagianakos and M.N. Vrahatis, “Globally convergent algorithms with local learning rates”, IEEE Transactions on Neural Networks, vol. 13, no. 3, pp. 774 -779, 2002.

[Mag04] G.D. Magoulas, V.P. Plagianakos and M.N. Vrahatis, “Neural network-based colonoscopic diagnosis using on-line learning and differential evolution ”, Applied Soft Computing, vol. 4, no. 4, pp. 369-379, 2004.
[Mag06a] G.D. Magoulas and M.N. Vrahatis, “Adaptive algorithms for neural network supervised learning: A deterministic optimization approach”, International Journal of Bifurcation and Chaos, vol. 16, pp.1929-1950, 2006.
[Mag06b] P. Magdalinos, C. Doulkeridis, M. Vazirgiannis, "K-Landmarks: Distributed Dimensionality Reduction for Clustering Quality Maintenance", In Proc. ECML-PKDD 2006, Berlin, Germany, 2006.
[Mar04a] M. Maragoudakis, A. Thanopoulos, K. Sgarbas, N. Fakotakis, “Domain Knowledge Acquisition and Plan Recognition by Probabilistic Reasoning,” International Journal on Artificial Intelligence Tools, Special Issue on AI Techniques in Web-Based Educational Systems, vol. 13, no. 2, pp. 333-365, 2004.

[Mar04b] M. Maragoudakis, T. Ganchev, N. Fakotakis and G. Kokkinakis, “Bayesian Reinforcement for a Probabilistic Neural Net Part-of-Speech Tagger”, Proc. of the 7th Int. Conf. on Text, Speech and Dialogue (TSD), Lecture Notes in Artificial Intelligence (LNAI), pp. 137-145, Brno, Czech Republic, 2004.
[Mar06] M. Maragoudakis, N. Fakotakis, “Bayesian Feature Construction,” In Proc of the 4th Hellenic Conference on Artificial Intelligence, Heraklion, Crete, 2006.
[Mar07] M. Maragoudakis, A. Thanopoulos, N. Fakotakis, “Meteobayes: Effective Plan Recognition in a Weather Dialogue System”, IEEE Intelligent Systems, Vol. 22, No 1, pp. 66-78, 2007.
[Mar08] M. Maragoudakis, N. Fakotakis, “Bayesian Feature Construction for the Improvement of Classification Performance,” IEEE Transactions on Knowledge and Data Engineering (to appear).

[Marg07] A.I. Margaris and K.I. Diamantaras, “A parallel implementation of the Natural Gradient BSS method using MPI”, 2nd Int. Conference on Experiments/Process/System Modeling/Simulation & Optimization (2nd IC-EpsMsO), Athens, 2007.
[Mari07] Y. Marinakis, M. Marinaki and G. Dounias, “Particle Swarm Optimization for Pap-Smear Cell Classification”, Expert Systems with Applications, 2007.
[Mari08] Y. Marinakis and G. Dounias, “Nature Inspired Intelligence in Medicine: Ant Colony Optimization for Pap-Smear Diagnosis”, Int. Journal on Artificial Intelligence Tools, vol. 17, no. 2, pp. 279-301, 2008.
[Mav06] M. E. Mavroforakis and S. Theodoridis, “A Geometric Approach to Support Vector Machine (SVM) Classification,”, IEEE Transactions on Neural Networks, vol. 17, no. 3, pp. 671-682, 2006.

[Mav07] D. Mavroeidis and M. Vazirgiannis, “Stability based Sparse LSI/PCA: Incorporating Feature Selection in LSI/PCA”, in Proc. of the 18th European Conference on Machine Learning, volume 4701 of Lecture Notes in Computer Science, Springer-Verlag 2007.

[Mic05] I. Michailidis, K.I. Diamantaras and S. Vasileiadis, “Greek Named Entity Recognition using Support Vector Machines”, in Proc. 7th Int Conference in Greek Linguistics (ICGL), York, UK, 2005.
[Mic06] I. Michailidis, K.I. Diamantaras, S. Vasileiadis and Y. Frere, “Greek Named Entities Recognition using Support Vector Machines, Maximum Entropy Models and Onetime”, in Proc. 5th Int. Conference on Language Resources and Evaluation (LREC), pp. 47-52, Genova, Italy, 2006.

[Mou98] V. Moustakis, M. Blazadonakis, M. Marazakis and G. Potamias, “Assessment of Boolean minimization in symbolic empirical learning.” Applied Artificial Intelligence, vol. 12, no. 4, pp. 329-342, 1998.
[Mou07] V. Moustakis, M.L. Laine, L. Koumakis, G. Potamias, L. Zampetakis and B.G. Loos, “Modelling Genetic Susceptibility: a Case Study in Periodontitis”, IDAMAP-2007: Intelligent Data Analysis in bioMedicine And Pharmacology, AIME 07 workshop, pp. 59-64, Amsterdam, The Netherlands, 2007.
[Pal95] G. Paliouras and D.S. Brée, “The Effect of Numeric Features on the Scalability of Inductive Learning,” Proceedings of the European Conference on Machine Learning (ECML), Lecture Notes in Artificial Intelligence, no. 912, pp. 218-231, Springer-Verlag, 1995.

[Pal97] G. Paliouras, Refinement of Temporal Constraints in an Event Recognition System using Small Datasets. Ph.D. Thesis, Department of Computer Science, University of Manchester, UK, 1997.

[Pal00] G. Paliouras, C. Papatheodorou, V. Karkaletsis and C.D. Spyropoulos, “Clustering the Users of Large Web Sites into Communities”, Proc of the International Conference on Machine Learning (ICML 2000), pp. 719-726, Stanford, California, 2000.
[Pal05] G. Paliouras, “On the Need to Bootstrap Ontology Learning with Extraction Grammar Learning,” In Proceedings of the International Conference on Conceptual Structures (ICCS), Kassel, Germany, July, Lecture Notes in Artificial Intelligence, no. 3596, pp. 119-135, Springer Verlag, 2005, (Invited talk at the conference).

[Pap01] A. Papagelis and D. Kalles. “Breeding Decision Trees Using Evolutionary Techniques”, In Proc. International Conference on Machine Learning, Williamstown, Massachusetts, June-July 2001.
[Pap04] Th. Papadimitriou, K.I. Diamantaras, M.G. Strintzis and M. Roumeliotis, “Video Scene Segmentation using Spatial Contours and 3-D Robust Motion Estimation”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 4, April 2004.
[Pap05] E.I. Papageorgiou, K.E. Parsopoulos, C.D. Stylios, P.P. Groumpos and M.N. Vrahatis, “Fuzzy cognitive maps learning using particle swarm optimization”, Journal of Intelligent Information Systems, vol. 25, no. 1, pp. 95-121, 2005.

[Pap06] Th. Papadimitriou and K.I. Diamantaras, “Channel Shortening Of Multi-Input Multi-Output Convolutive Systems With Binary Sources”, in Proc. 2006 IEEE Int. Workshop on Machine Learning for Signal Processing (MLSP-2006), pp. 271-276, Maynooth, Ireland, 2006.
[Par01] K.E. Parsopoulos, V.P. Plagianakos, G.D. Magoulas and M.N. Vrahatis, “Objective function “stretching” to alleviate convergence to local minima”, Nonlinear Analysis: Theory, Methods and Applications, vol. 47, pp. 3419-3424, 2001.
[Par06] I. Partalas, G. Tsoumakas, I. Katakis and I. Vlahavas. "Ensemble Pruning using Reinforcement Learning ", Proc. 4th Hellenic Conference on Artificial Intelligence (SETN 2006), LNAI 3955, pp 301-310, Heraklion, Greece, 2006.

[Par07] I. Partalas, I. Feneris and I. Vlahavas, “Multi-Agent Reinforcement Learning using Strategies and Voting”, 19th IEEE International Conference on Tools with Artificial Intelligence, pp. 318-324, Patras, Greece, 2007.
[Par08] I. Partalas, G. Tsoumakas and I. Vlahavas. “Focused Ensemble Selection: A Diversity-Based Method for Greedy Ensemble Selection”, Proc. 18th European Conference on Artificial Intelligence (ECAI’08), Patras, Greece, 21-25, 2008.
[Pat07] Ch. Pateritsas and A. Stafylopatis, “Memory-based Classification with Dynamic Feature Selection using Self-Organizing Maps for Pattern Evaluation”, International Journal on Artificial Intelligence Tools, vol. 16, no 5, pp 875-899, 2007.

[Pav05] N.G. Pavlidis, D.K. Tasoulis, V.P. Plagianakos, G. Nikiforidis and M.N. Vrahatis, “Spiking neural network training using evolutionary algorithms”, In: International Joint Conference on Neural Networks (IJCNN 2005), pp. 2190-2194, 2005.
[Per95] S.J. Perantonis and D.A. Karras, “An efficient constrained learning algorithm with momentum acceleration”, Neural Networks, vol. 8, no. 2, pp. 237-249, 1995.
[Per99] S.J. Perantonis and V. Virvilis, “Input feature extraction for multilayered perceptrons using supervised principal components analysis”, Neural Processing Letters, vol. 10, no. 3, pp. 243-252, 1999.
[Per00a] S.J. Perantonis, N. Ampazis and V. Virvilis, “A learning framework for neural networks using constrained optimization methods”. Annals of Operations Research , vol. 99, pp. 385-401, 2000.

[Per00b] S.J. Perantonis and V. Virvilis, “Efficient perceptron learning using constrained steepest descent”, Neural Networks, vol. 13, no. 3, pp. 351-364, 2000.
[Pet04] S. Petridis and S.J. Perantonis, “On the relation between discriminant analysis and mutual information for supervised linear feature extraction”, Pattern Recognition, vol. 37, no. 5, pp. 857-874, 2004.
[Per08] M. Pertselakis and A. Stafylopatis, “Dynamic Modular Fuzzy Neural Classifier with Tree-based Structure Identification (MoDFuNC)”, Neurocomputing, vol. 71, no. 4-6, pp. 801-812, 2008.
[Pet04a] G. Petasis, G. Paliouras, V. Karkaletsis, C. Halatsis, and C.D. Spyropoulos, “e-GRIDS: Computationally Efficient Grammatical Inference from Positive Examples,” GRAMMARS, vol.7, pp. 69-110, 2004.

[Pet04b] G. Petasis, G. Paliouras, C. D. Spyropoulos and C. Halatsis, “eg-GRIDS: Context-Free Grammatical Inference from Positive Examples using Genetic Search,” In Proceedings of the 7th International Colloquium on Grammatical Inference (ICGI), Lecture Notes in Artificial Intelligence, no. 3264, pp. 223-234, Springer Verlag, 2004.
[Pet08] G. Petasis, V. Karkaletsis, G. Paliouras and C. D. Spyropoulos, “Learning context-free grammars to extract relations from text,” In Proceedings of the European Conference on Artificial Intelligence (ECAI), Patras, Greece, 2008.
[Pie05] D. Pierrakos and G. Paliouras, “Exploiting Probabilistic Latent Information for the Construction of Community Web Directories,” In Proceedings of the International User Modelling Conference (UM), Edinburgh, UK, July, Lecture Notes in Artificial Intelligence, no. 3538, pp. 89-98, Springer Verlag, 2005, [Best Student Paper Award].

[Pik06] A. Pikrakis, S. Theodoridis and D. Kamarotos, “Recognition of musical patterns using hidden Markov models”, IEEE Transactions on Audio, Speech and Language Processing, vol.14, no. 5, pp. 1795-1807, 2006.
[Pla01] V.P. Plagianakos, G.D. Magoulas, and M.N. Vrahatis, “Learning in multilayer perceptrons using global optimization strategies”, Nonlinear Analysis: Theory, Methods and Applications, vol. 47, pp. 3431-3436, 2001.
[Pla02a] V.P. Plagianakos, G.D. Magoulas, and M.N. Vrahatis, “Deterministic nonmonotone strategies for effective training of multilayer perceptrons”, IEEE Transactions on Neural Networks, vol. 13, no. 6, pp. 1268-1284, 2002.
[Pla02b] V.P. Plagianakos and M.N. Vrahatis, “Parallel Evolutionary Trained Algorithms for ‘Hardware Friendly’ Neural Networks”, Natural Computing, vol. 1, no. 2-3, pp. 307-322, 2002.

[Pla06a] V.P. Plagianakos, G.D. Magoulas, and M.N. Vrahatis, “Distributed Computing Methodology for Training Neural Networks in an Image-guided Diagnostic Application”, Computer Methods and Programs in Biomedicine, vol. 81, no. 3, pp. 228-235, 2006.
[Pla06b] V.P. Plagianakos, G.D. Magoulas, and M.N. Vrahatis, “Evolutionary training of hardware realizable multilayer perceptrons”, Neural Computing & Applications, vol. 15, no. 1, pp. 33-40, 2006.
[Por06] J.M. Porta, N. Vlassis, M.T.J. Spaan and P. Poupart, “Point-based value iteration for continuous POMDPs”, Journal of Machine Learning Research, vol. 7, pp. 2329-2367, 2006.
[Pot99] G. Potamias, “MICSL: Multiple Iterative Constraint Satisfaction based Learning”, Intelligent Data Analysis, vol. 3, no. 4, pp. 245-265, 1999.

[Pot04a] G. Potamias, L. Koumakis and V. Moustakis, “Gene Selection via Discretized Gene-Expression Profiles and Greedy Feature-Elimination”, Lecture Notes in Artificial Intelligence (LNAI), vol. 3025, pp. 256-266, 2004.
[Pot04b] G. Potamias, L. Koumakis and V. Moustakis, “Mining XML Clinical Data: The HealthObs System”, Ingénierie des systèmes d'information, vol. 10, no. 1, pp. 59-79, 2004.
[Pot04c] G. Potamias and C. Dermon, “Protein Synthesis Profiling in the Developing Brain: A Graph Theoretic Clustering Approach”, Computer Methods and Programs in Biomedicine, vol. 76, no. 2, pp. 115-129, 2004.
[Pot06] G. Potamias and A. Kanterakis, “Feature Selection for the Promoter Recognition and Prediction Problem”, International Journal of Data Warehousing and Mining, vol. 3, no. 3, pp. 60-78, 2006.

[Pou06] P. Poupart, N. Vlassis, J. Hoey, and K. Regan, “An analytic solution to discrete Bayesian reinforcement learning”, In Proc. 23rd Int. Conf. on Machine Learning, Pittsburgh, USA, 2006.
[Sga95] K. Sgarbas, N. Fakotakis and G. Kokkinakis, “Genetically Evolved Strategies. Winning by Selective Processing of the Chromosome Pool”, IEEE Potentials, vol. 14, no. 1, pp. 36-40, 1995.
[Sig04] G. Sigletos, G. Paliouras, C. D. Spyropoulos and P. Stamatopoulos, “Stacked generalization for information extraction,” In Proceedings of the European Conference on Artificial Intelligence (ECAI), pp. 549–553, Valencia, Spain, 2004.
[Sig05] G. Sigletos, G. Paliouras, C.D. Spyropoulos and M. Hatzopoulos, “Combining Information Extraction Systems Using Voting and Stacked Generalization,” Journal of Machine Learning Research, vol.6, pp. 1751-1782, 2005.

[Sla08] K. Slavakis, S. Theodoridis and I. Yamada, “Online classification using kernels and projection-based adaptive algorithms”, IEEE Transactions on Signal Processing, 2008.
[Spa05] M.T.J. Spaan and N. Vlassis, “Perseus: randomized point-based value iteration for POMDPs”, Journal of Artificial Intelligence Research, vol. 24, pp. 195-220, 2005.
[Sta05] G. Stamou and S. Kollias, “Multimedia Content and the Semantic Web”, Wiley, 2005.
[Sto06] G. Stoilos, N. Simou, G. Stamou and S. Kollias, “Uncertainty and the semantic web”, IEEE Intelligent Systems, vol. 21, no. 5, pp. 84-87, 2006.

[Sta05] E. Stamatatos, and G. Widmer. “Automatic Identification of Music Performers with Learning Ensembles.” Artificial Intelligence, vol. 165, no. 1, pp. 37-56, Elsevier, 2005.
[Sta06a] E. Stamatatos, “Authorship Attribution Based on Feature Set Subspacing Ensembles,” Int. Journal on Artificial Intelligence Tools, vol. 15, no. 5, pp. 823-838, World Scientific, 2006.
[Sta06b] E. Stamatatos, “Text Sampling and Re-Sampling for Imbalanced Author Identification Cases”, In Proc. of the 17th European Conference on Artificial Intelligence (ECAI'06), 2006.

[Sta08] E. Stamatatos, “Author Identification: Using Text Sampling to Handle the Class Imbalance Problem,” Information Processing and Management, vol. 44, no. 2, pp. 790-799, Elsevier, 2008.
[Tas04] D.K. Tasoulis and M.N. Vrahatis, “Unsupervised distributed clustering”, In IASTED International Conference on Parallel and Distributed Computing and Networks, pp. 347-351, Innsbruck, 2004.
[Tas05a] D.K. Tasoulis and M.N. Vrahatis, “The new window density function for efficient evolutionary unsupervised clustering”, IEEE Congress on Evolutionary Computation, CEC 2005, vol. 3, pp. 2388-2394, 2005.
[Tas05b] D.K. Tasoulis and M.N. Vrahatis, “Unsupervised clustering on dynamic databases”, Pattern Recognition Letters, vol. 26, no. 13, pp. 2116-2127, 2005.

[Tas06a] D.K. Tasoulis, D. Zeimpekis, E. Gallopoulos and M.N. Vrahatis, “Oriented k-windows: A PCA driven clustering method”, 2nd Workshop on Algorithmic Techniques for Data Mining, In: Advances in Web Intelligence and Data Mining, Series: Studies in Computational Intelligence, vol. 23, pp. 319-328, 2006.
[Tas06b] D.K. Tasoulis and M.N. Vrahatis, “Unsupervised Clustering Using Fractal Dimension”, International Journal of Bifurcation and Chaos, vol. 16, no. 7, pp. 2073-2079, 2006.
[Tho06] N.S. Thomaidis, V. Tzastoudis and G. Dounias, “A comparison of neural network model-selection strategies for the pricing of S&P 500 stock index options”, Int. Journal of Artificial Intelligence Tools, pp. 1093-1113, 2006.

[Tho07] N. Thomaidis, “New Trends in Financial Engineering: Combining Stochastic and Computational Intelligent Methodologies”, PhD Thesis, University of the Aegean, Department of Financial and Management Engineering, 2007.
[Tit01] M. Titsias and A. Likas, “Shared Kernel Models for Class Conditional Density Estimation”, IEEE Trans. on Neural Networks, vol. 12, no. 5, pp. 987-997, 2001.
[Tit02] M. Titsias and A. Likas, “Mixture of Experts Classification Using a Hierarchical Mixture Model”, Neural Computation, vol. 14, no. 9, pp. 2221-2244, 2002.

[Tit03] M. Titsias and A. Likas, “Class Conditional Density Estimation Using Mixtures with Constrained Component Sharing”, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 25, no. 7, pp. 924-928, 2003.
[Tsa04a] A. Tsakonas, G. Dounias, V. Aggelis, I. Karkazis, “An Evolutionary System for Neural Logic Networks using Genetic Programming and Indirect Encoding,” Journal of Applied Logic Special Issue on Neural-Symbolic Systems, vol. 2, no.3, pp. 349-379, 2004.
[Tsa04b] A. Tsakonas, “Evolutionary Neural Logic Networks”, PhD Thesis (in Greek), 2004.
[Tsa06a] A. Tsakonas and G. Dounias, “Evolving Neural-Symbolic Systems Guided by Adaptive Training Schemes: Applications in Finance,” Applied Artificial Intelligence Journal, vol. 21, no. 7, pp. 681-706, 2006.

[Tsa06b] A. Tsakonas, T. Tsiligianni and G. Dounias, “Evolutionary Neural Logic Networks in Splice-Junction Gene Sequences Classification”, Int. Journal of Artificial Intelligence Tools, vol. 15, no. 2, pp. 287-307, 2006.
[Tsa03a] I. Tsamardinos and C.F. Aliferis, “Towards Principled Feature Selection: Relevancy, Filters, and Wrappers”, Ninth International Workshop on Artificial Intelligence and Statistics (AI&Stats 2003), Key West, Florida, USA, 2003.
[Tsa03b] I. Tsamardinos, C.F. Aliferis and A. Statnikov, “Time and Sample Efficient Discovery of Markov Blankets and Direct Causal Relations”, in Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2003), pp. 673-678, 2003.

[Tsa08] I. Tsamardinos and L.E. Brown, “Bounding the False Discovery Rate in Local Bayesian Network Learning”, 23rd AAAI Conference on Artificial Intelligence, 2008 (AAAI-2008).
[Tso04] G. Tsoumakas, I. Katakis and I. Vlahavas, “Effective Voting of Heterogeneous Classifiers”, Proc. 15th European Conference on Machine Learning (ECML 2004), Springer-Verlag, LNAI 3201, pp. 465-476, Pisa, Italy, 2004.
[Tso05] G. Tsoumakas, L. Angelis and I. Vlahavas, “Selective Fusion of Heterogeneous Classifiers”, Intelligent Data Analysis, vol. 9, no. 6, pp. 511-525, IOS Press, 2005.

[Tso07a] G. Tsoumakas and I. Katakis, “Multi-Label Classification: An Overview”, International Journal of Data Warehousing and Mining, vol. 3, no. 3, pp. 1-13, 2007.
[Tso07b] G. Tsoumakas and I. Vlahavas, “Random k-Labelsets: An Ensemble Method for Multilabel Classification”, Proc. 18th European Conference on Machine Learning (ECML 2007), pp. 406-417, Warsaw, Poland, 2007.
[Tro06] N. Trogkanis and G. Paliouras, “TPN2: Using positive-only learning to deal with the heterogeneity of labeled and unlabeled data”, In Proceedings of the Discovery Challenge at the Joint European Conference on Machine Learning and on Principles and Practices of Knowledge Discovery in Databases (ECML/PKDD), pp. 63-74, Berlin, Germany, 2006 [Runner-up participation in both tasks of the challenge].

[Tza05] G. Tzanis, C. Berberidis, A. Alexandridou and I. Vlahavas, “Improving the Accuracy of Classifiers for the Prediction of Translation Initiation Sites in Genomic Sequences”, In Proc. of the 10th Panhellenic Conference on Informatics (PCI'2005), pp. 426-436, 2005.
[Tza06a] G. Tzanis, C. Berberidis and I. Vlahavas, “A Novel Data Mining Approach for the Accurate Prediction of Translation Initiation Sites”, In Proc. of the 7th Int. Symposium on Biological and Medical Data Analysis, Thessaloniki, Greece, 2006.
[Tza06b] G. Tzanis and I. Vlahavas, “Prediction of Translation Initiation Sites Using Classifier Selection”, In Proc. of the 4th Hellenic Conference on Artificial Intelligence (SETN'06), pp. 367-377, Heraklion, Crete, Greece, 2006.

[Tza07] G. Tzanis, C. Berberidis and I. Vlahavas, “MANTIS: A Data Mining Methodology for Effective Translation Initiation Site Prediction”, In Proc. of the 29th Annual Int. Conference of the IEEE Engineering in Medicine and Biology Society, 2007.
[Tzo08] G. Tzortzis and A. Likas, “The Global Kernel K-Means Clustering Algorithm”, Proc. Int. Joint Conference on Neural Networks, Hong-Kong, 2008.
[Ver03] J.J. Verbeek, N. Vlassis and B.J.A. Kröse, “Efficient greedy learning of Gaussian mixture models”, Neural Computation, vol. 15, no. 2, pp. 469-485, 2003.

[Ver06] J.J. Verbeek, J.R.J. Nunnink and N. Vlassis, “Accelerated EM-based clustering of large datasets”, Data Mining and Knowledge Discovery, vol. 13, no. 3, pp. 291-307, 2006.
[Vla02] N. Vlassis and A. Likas, “A greedy EM algorithm for Gaussian mixture learning”, Neural Processing Letters, vol. 15, no. 1, pp. 77-87, 2002.
[Vog07] D. Vogiatzis and N. Tsapatsoulis, “Clustering Microarray Data with Space Filling Curves,” In Proceedings of the International Workshop on Fuzzy Logic and Applications (WILF), pp. 529-536, Genoa, Italy, 2007.
[Vra00a] M.N. Vrahatis, G.D. Magoulas and V.P. Plagianakos, “Globally convergent modification of the Qprop method”, Neural Processing Letters, vol. 12, no. 2, pp. 159-170, 2000.

[Vra00b] M.N. Vrahatis, G.S. Androulakis, J.N. Lambrinos and G.D. Magoulas, “A class of gradient unconstrained minimization algorithms with adaptive stepsize”, Journal of Computational and Applied Mathematics, vol. 114, pp. 367-386, 2000.
[Vra02] M.N. Vrahatis, B. Boutsinas, P. Alevizos and G. Pavlides, “The new k-windows algorithm for improving the k-means clustering algorithm”, Journal of Complexity, vol. 18, pp. 375-391, 2002.
[Vra03] M.N. Vrahatis, G.D. Magoulas and V.P. Plagianakos, “From linear to nonlinear iterative methods”, Applied Numerical Mathematics, vol. 45, pp. 59-77, 2003.

[Zam07] M. Zampoglou, Th. Papadimitriou and K.I. Diamantaras, “Support Vector Machines Content-Based Video Retrieval based solely on Motion Information”, in Proc. 17th IEEE Int. Workshop on Machine Learning for Signal Processing (MLSP-2007), Thessaloniki, Greece, 2007.
[Zam08] M. Zampoglou, Th. Papadimitriou and K.I. Diamantaras, “Integrating Motion and Color for Content Based Video Classification”, in Proc. IAPR Workshop on Cognitive Information Processing (CIP-2008), Santorini, Greece, 2008.
[Zav07] E. Zavitsanos, G. Paliouras, G. Vouros and S. Petridis, “Discovering Subsumption Hierarchies of Ontology Concepts from Text Corpora,” In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (WI), Silicon Valley, USA, 2007.

[Zav08] E. Zavitsanos, G. Paliouras and G. Vouros, “A Distributional Approach to Evaluating Ontology Learning Methods Using a Gold Standard,” In Proceedings of the 3rd Workshop on Ontology Learning and Population (OLP) at the European Conference on Artificial Intelligence (ECAI), Patras, Greece, 2008.