Natural Language Processing
The following account attempts to present in a concise, yet informative, manner the achievements of Greek research and development work in the area of Natural Language Processing and neighboring scientific and application fields.
Apart from the intricacies of language as a communication medium in general, natural language processing for a less-widely spoken language like Greek poses its own challenges. Notably, while some of the research and development work reported has been based on English data sets, the majority of the research endeavour has focused on Greek, trying to model language phenomena on the one hand and develop useful applications on the other. This is reflected in the high number of research groups and researchers trying to attack language processing problems from the morphographemic and phonetic level to technological solutions for access to information and content.
The sections that follow give concise descriptions of the activities of individual groups. The data presented here were gathered through structured questionnaires addressed to the members of the Hellenic Artificial Intelligence Association and the Hellenic Language Technologies List. This account is not exhaustive and is continually updated.
The following NLP areas have received considerable attention from the Greek research community:
1. language processing tools
1.1. morphological - syntactic - semantic - discourse analysers
1.2. keyword(s) and term extraction
1.3. named entity recognition
1.4. automatic linguistic knowledge elicitation techniques
1.5. textual entailment and paraphrasing
2. authoring tools
2.1. intelligent authoring aids (spelling/grammar, thesauri)
2.2. controlled languages and applications
3. machine translation
3.1. computer-aided translation (bilingual terminology elicitation, translation memories)
3.2. fully automatic machine translation (rule or statistics based)
4. speech processing
4.1. speech recognition
4.2. speech synthesis
4.3. speech-based dialog processing
4.4. speech to speech translation
5. information retrieval and extraction
5.1. language-aware information retrieval (from term conflation and stemming to term extraction, thesaurus-based or conceptual indexing, latent semantics indexing)
5.2. cross-lingual information retrieval
5.3. information extraction (incl. all component technologies e.g., pattern-based syntactico-semantic analysis, coreference resolution, event recognition)
5.4. text/document classification
5.5. text/document summarisation (mono & multi)
5.6. question answering
6. multimedia information processing (MIP)
6.1. image/video processing for MIP
6.2. speech processing for MIP
6.3. language processing for MIP
6.4. information retrieval and extraction applications related to MIP
7. natural language generation
Speech Group, Department of Informatics and Telecommunications, University of Athens
Panepistimiopolis, Ilissia, GR-15784, Athens, Greece, http://speech.di.uoa.gr/
Speech synthesis: Prosody modeling and prediction for high-quality text-to-speech (TtS) synthesis. Emotional speech synthesis. Document-to-Speech synthesis (DtS), which incorporates vocalization of: a) visual document elements, such as typesetting (bold, italics, underline) and font attributes (size, type, color and background color), b) non-visual document elements (chapter, section, subsection, title, subtitle, paragraph, header, footer, caption) and c) document structures such as mathematical equations, complex data tables and lists. Singing voice synthesis (including chant synthesis). MBROLA Greek voices. Greek prosodic corpora (ToBI-enriched). The multilingual/polyglot TtS and DtS platform DEMOSTHeNES (http://demosthenes.di.uoa.gr). Voice Output Communication Aids and Augmentative and Alternative Communication (AAC) systems for the speech-disabled. Talking books.
Speech Recognition: Acoustic models and speech segmentation for the Greek language. Music and musical instrument recognition. Speech and text annotated corpora.
Spoken dialog technology: Spoken-dialog-based access to documents (e.g. newspapers, books, magazines). Location-based voice dialog services. User modeling and design-for-all in voice interactive systems. Speech-only user interfaces for e-services (e.g. e-government, e-learning).
Language processing tools: Text normalization of non-standard words. Automatic semantic labeling of text formatting. Morphological analysis for Greek (powered by a 1.7G lexicon). Prosodic feature annotation of text corpora. Repair of ill-formed or telegraphic language into well-formed sentences. Translation from non-orthographic to natural language in AAC.
Recent research projects:
- RHETOR (2006-2008): Spoken dialogue interface using acoustic rendering of visual document meta-information for a design-for-all access to documents.
- GR-Prosody (2002-2005): Synthetic speech improvement using high-level prosodic feature annotation.
- M-PIRO (2000-2003): Multilingual Personalised Information Objects.
- AOIDOS (2005-2008): Digital Analysis and Synthesis of Chant and Operatic Voices.
Recent Indicative References
Xydas, G. and Kouroupetroglou, G.: "Tone-group F0 selection for modelling focus prominence in small-footprint speech synthesis", Speech Communication, Vol. 48(9), 2006, pp 1057-1078
Calder, J., A. C. Melengoglou, C. Callaway, E. Not, F. Pianesi, I. Androutsopoulos, C. Spyropoulos, G. Xydas, G. Kouroupetroglou and M. Roussou: “Multilingual Personalized Information Objects”, chapter in the book: Multimodal Intelligent Information Presentation, O. Stock and M. Zancanaro (Eds), "Text, Speech and Language Technology" Series, Springer, 2005, pp 177-202
Xydas G., Spiliotopoulos D. and Kouroupetroglou G.: "Modelling Improved Prosody Generation from High-Level Linguistically Annotated Corpora", IEICE Trans. Information and Systems, Special Issue on Corpus-Based Speech Technologies, Vol. E88D(3), 2005, pp 510-518
Fellbaum, K. and G. Kouroupetroglou: "Principles of Electronic Speech Processing with Applications for People with Disabilities", Technology and Disability, 2008
Tsonos, D., G. Xydas and G. Kouroupetroglou: “A Methodology for Reader's Emotional State Extraction to Augment Expressions in Speech Synthesis”, Proc. 19th IEEE Int. Conf. on Tools with Artificial Intelligence (ICTAI 2007), Patras, Greece, 29-31 Oct. 2007, Vol. II, pp 218-225
Xydas, G., G. Karberis and G. Kouroupetroglou: "Text Normalization for the Pronunciation of Non-Standard Words in an Inflected Language", Lecture Notes in Artificial Intelligence (LNAI) Vol. 3025, pp 390-399, 2004
Kouroupetroglou, G., D. Delviniotis and G. Chryssochoidis: "DAMASKINOS: The Model Tagged Acoustic Corpus of Byzantine Ecclesiastic Chant Voices", Proc. of the Conf. ACOUSTICS 2006, 18-19 Sept. 2006, Heraclion, Greece, pp. 68-76
Fourli-Kartsouni, F., K. Slavakis, G. Kouroupetroglou and S. Theodoridis: "A Bayesian Network Approach to Semantic Labeling of Text Formatting in XML Corpora of Documents", Lecture Notes in Computer Science (LNCS) Vol. 4556, pp 299-308, 2007
Xydas, G., P. Zervas, G. Kouroupetroglou, N. Fakotakis and G. Kokkinakis: “Tree-Based Prediction of Prosodic Phrase Breaks on top of Shallow Textual Features”, Proc., 9th Conference on Speech Communication and Technology (INTERSPEECH 2005), Lisbon, 4-8 Sept. 2005, pp 3237-3240
The members of the group have over 30 recent NLP publications mainly in the fields of Question Answering, Information and Knowledge Extraction, Multimedia Information Processing and Lexicons.
SELECTED QUESTION ANSWERING PUBLICATIONS
- Kontos, J. (1992). ARISTA: Knowledge Engineering with Scientific Texts. Information and Software Technology, Vol. 34, No. 9.
- Kontos, J., Malagardi, I. (1998). Question Answering and Information Extraction from Texts. EURISCON ’98 Third European Robotics, Intelligent Systems & Control Conference, Athens. Published in Conference Proceedings “Advances in Intelligent Systems: Concepts, Tools and Applications” (Kluwer), ch. 11.
- Kontos, J., Malagardi, I. (1999). Information Extraction and Knowledge Acquisition from Texts using Bilingual Question-Answering. Journal of Intelligent and Robotic Systems. Kluwer Academic Publishers.
- Kontos, J., Malagardi, I., Peros, J. (2005a). Question Answering and Rhetoric Analysis of Biomedical Texts in the AROMA System. Proc. 7th Hellenic European Research on Computer Mathematics & its Applications Conference, Athens.
- Kontos, J., Malagardi, I., Lekakis, J., Peros, J. (2005b). Grammars for Question Answering Systems based on Intelligent Text Mining in Biomedicine. HERMIS International Journal of Computers Mathematics and its Applications, Vol. 8.
- Kontos, J., Malagardi, I. (2006). “Question Answering from Procedural Semantics to Model Discovery”. Encyclopaedia of Human Computer Interaction, Idea Group.
- Kontos, J., Armaos, J. (2007). Metacognitive Question Answering from Euclid’s Elements Text. HERCMA 2007, 8th Hellenic European Research on Computer Mathematics & its Applications Conference, Athens.
SELECTED INFORMATION AND KNOWLEDGE EXTRACTION PUBLICATIONS
- Kontos, J. (1985). Natural Language Processing of Scientific and Technical Data, Knowledge and Text Bases. (Invited paper) Proceedings of the EEC ARTINT.
- Kontos, J. (1991). On the Acquisition of Causal Knowledge from Scientific Texts with Attribute Grammars. Expert Systems & Information Management, Vol. 4, No. 1.
- Kontos, J., Malagardi, I. (1998). Information and Knowledge Extraction from Medical Texts. Health Telematics Education Conference, Athens.
- Malagardi, I., Kontos, J. (2001a). Information Extraction from Ancient Hellenic Texts using Automatic Knowledge Acquisition from Lexical Definitions. J. Neural, Parallel and Scientific Computations, Vol. 9.
- Kontos, J., Malagardi, I. (2001b). A Search Algorithm for Knowledge Acquisition from Texts. HERCMA 2001, Proceedings of 5th Hellenic European Research on Computer Mathematics & its Applications Conference, Athens.
- Kontos, J., Elmaoglou, A., Malagardi, I. (2002). ARISTA Causal Knowledge Discovery from Texts. Proceedings of the 5th International Conference DS 2002, Luebeck, Germany, November 2002, Springer Verlag.
- Kontos, J., Malagardi, I., Peros, J. (2003). “The AROMA System for Intelligent Text Mining”. HERMIS International Journal of Computers Mathematics and its Applications, Vol. 4.
Dept. of Information and Communication Systems Eng.
University of the Aegean
Area of interest: 5.4. text/document classification
The natural language processing research of the AI Lab of the Dept. of Information and Communication Systems Eng., University of the Aegean, focuses on text categorization problems. In particular, a significant part of this research deals with authorship attribution. A number of studies examine the significance of different kinds of features for this task, including word-based features, character n-gram features and syntax-based features. Moreover, methods specifically designed for authorship attribution have been proposed, including an ensemble classification model and an n-gram feature selection method, while special attention has been paid to dealing with limited and imbalanced training corpora [5,6], a very common condition in this problem.
Another text categorization task in which this group is active is text genre detection. This research has focused mainly on the significance of specific textual features, namely common word features and character n-gram features, for recognizing the genre of both raw text and web pages. Finally, the group also addresses spam filtering: a character n-gram model has been proposed for this task, and a detailed comparison with a similar word-based model revealed the potential of character n-gram features under high-cost scenarios.
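To make "character n-gram features" concrete, the following is a minimal sketch, not the group's published method: it assumes fixed-length character trigrams (n=3), represents each candidate author by the n-gram counts of their training text, and attributes a new text to the author with the highest cosine similarity. The function names `char_ngrams`, `cosine` and `attribute` are hypothetical.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram frequency vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def attribute(text: str, author_samples: dict, n: int = 3) -> str:
    """Return the candidate author whose training-text profile is most similar."""
    query = char_ngrams(text, n)
    profiles = {author: char_ngrams(sample, n) for author, sample in author_samples.items()}
    return max(profiles, key=lambda author: cosine(query, profiles[author]))


samples = {"A": "the cat sat on the mat", "B": "zebras quickly jumped over foxes"}
print(attribute("the cat on the mat", samples))  # A
```

Real training texts are far longer, and the work cited above also explores variable-length n-grams and feature selection, both of which this sketch omits.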
- Houvardas, J. and E. Stamatatos (2006). N-gram Feature Selection for Authorship Identification, In J. Euzenat, and J. Domingue (Eds.) Proc. of the 12th Int. Conf. on Artificial Intelligence: Methodology, Systems, Applications (AIMSA'06), LNCS 4183, pp. 77-86.
- Kanaris, I., K. Kanaris, I. Houvardas, and E. Stamatatos (2007). Words vs. Character N-grams for Anti-spam Filtering, Int. Journal on Artificial Intelligence Tools, 16(6), pp. 1047-1067, World Scientific.
- Kanaris, I. and E. Stamatatos (2007). Webpage Genre Identification Using Variable-length Character n-grams, In Proc. of the 19th IEEE Int. Conf. on Tools with Artificial Intelligence, v.2, pp. 3-10.
- Stamatatos, E. (2006). Authorship Attribution Based on Feature Set Subspacing Ensembles, Int. Journal on Artificial Intelligence Tools, 15(5), pp. 823-838, World Scientific.
- Stamatatos, E. (2007). Author Identification Using Imbalanced and Limited Training Texts, In Proc. of the 4th International Workshop on Text-based Information Retrieval, DEXA 2007 Workshops, pp. 237-241.
- Stamatatos, E. (2008). Author Identification: Using Text Sampling to Handle the Class Imbalance Problem, Information Processing and Management, 44(2), pp. 790-799, Elsevier.
- Stamatatos, E., N. Fakotakis, and G. Kokkinakis (2000). Automatic Text Categorization in Terms of Genre and Author. Computational Linguistics, 26(4), pp. 461-485, MIT Press.
- Stamatatos, E., N. Fakotakis, and G. Kokkinakis (2000). Text Genre Detection Using Common Word Frequencies. In Proc. of the 18th Int. Conf. on Computational Linguistics (COLING2000), pp. 808-814.
Institute of Informatics and Telecommunications -
Software and Knowledge Engineering Laboratory (SKEL)
National Centre for Scientific Research (NCSR) “Demokritos”
NLP research at NCSR started in the late ’70s with two pioneering researchers in the area, J. Kontos and M. Katzouraki. Particularly notable are the work of Kontos and his colleagues on knowledge acquisition from text [33, 34] and the work of Katzouraki and her colleagues on natural language interfaces [35, 36]. This work laid the foundation on which SKEL has built and evolved since its establishment by C.D. Spyropoulos nearly 20 years ago.
NLP was, and still is, a focal research area for SKEL, which since its establishment has aimed at developing technologies that enable the efficient, cost-effective and user-adaptive management and presentation of information. Key research areas for SKEL include, on the one hand, the processing of textual content to extract useful information and acquire domain knowledge and, on the other hand, the presentation of extracted content and acquired knowledge through user-adaptive natural language descriptions. SKEL's NLP research is presented below with indicative references. Although emphasis is given to recent work, a few indicative references to work performed in SKEL during the ’90s are also included.
Language processing tools:
SKEL's strategy emphasises the development of the infrastructure needed to support research activities and the development of systems and services. This effort has resulted in a variety of language processing tools and resources, some of which are available to, and used by, the international research community:
- The multilingual, cross-platform, general-purpose text engineering environment Ellogon (http://www.ellogon.org/), distributed as open-source software under the LGPL since 2004.
- Annotation tools that facilitate the semantic annotation of textual content.
- Morphological analysis tools and resources. The Greek morphological lexicon of SKEL is used, under license, by research groups in Europe.
- Techniques for syntactic parsing and language identification.
In the area of authoring tools, SKEL work has involved the development of:
- a tool named Eleon for authoring natural language generation applications [5, 6]. Eleon enables the creation of domain ontologies or the import of existing ones, their maintenance and their enrichment with language-related (lexicon, grammar) and user-related features, as well as the preview of the generated texts. Eleon provides an interface that can be used by different generation engines and is currently distributed as open-source software.
- a controlled language checker for Greek [2, 7] exploiting the Ellogon platform.
Information retrieval and extraction:
This activity exploits and develops methods, techniques and tools not only from NLP but also from machine learning and knowledge engineering. Emphasis is placed on technology that facilitates the development of new applications as well as their customization to new domains and languages. In the context of this activity, the spin-off company i-sieve technologies ltd. was set up to commercially exploit the in-house technology for online content analysis (http://www.i-sieve.com). SKEL work in the sub-areas of information retrieval and extraction is summarized below:
- information filtering: [8, 9] present the application of memory-based learning to anti-spam filtering, a cost-sensitive application of text categorization that attempts to automatically identify unsolicited commercial e-mail messages;
- named entity recognition: rule-based, machine-learning-based and hybrid approaches to named entity recognition, focusing on porting to new languages and domains, are presented in [10, 11, 12];
- relation extraction: a novel relation extraction method has been presented, based on grammatical inference, in which the text connecting named entities in an annotated corpus is used to infer a context-free grammar; further work examines how linguistic processing of textual data can improve the role extraction performance of HMMs;
- ontology learning from textual content: an ontology learning method has been presented that identifies concepts in text corpora and organizes them into a subsumption hierarchy without presupposing a seed ontology, together with a method for building a formally defined ontology from domain-specific corpora using machine learning and information extraction techniques;
- wrapper induction: [17, 18] investigate the effectiveness of voting and stacked generalization methods in web information extraction;
- web content extraction and labeling: work includes a system for web content labeling that exploits web content collection and information extraction techniques, and a software platform for web information extraction that can be customized to different domains, languages and users' interests;
- sentiment analysis: a sentiment analysis methodology based on word sense disambiguation techniques has been examined, which first detects the subjectivity of word senses and then assigns polarity;
- word sense disambiguation: the automatic construction, through machine learning, of disambiguation rules for all the content words of a corpus has been examined;
- document summarization: work has investigated the summarization of evolving events described in corpora from multiple sources, and a novel automatic method for evaluating summarization systems has been presented.
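The information filtering item above mentions memory-based learning for cost-sensitive anti-spam filtering. As a hedged illustration only, the sketch below stores training messages as they are and classifies a new message by its k most similar stored neighbours. The whitespace tokenizer, the shared-feature similarity and the cost ratio `lam` (blocking a legitimate message treated as `lam` times worse than letting spam through) are assumptions of this sketch, not details taken from [8, 9].

```python
def features(message: str) -> set:
    """Binary bag-of-words features: the set of lowercased whitespace tokens."""
    return set(message.lower().split())


def knn_is_spam(message: str, memory: list, k: int = 3, lam: float = 9.0) -> bool:
    """Memory-based (k-nearest-neighbour) spam decision with a cost-sensitive threshold.

    memory: list of (feature_set, is_spam) pairs stored as-is at training time.
    lam: assumed cost ratio; a message is blocked only if
         P(spam) / P(legit) > lam, i.e. P(spam) > lam / (1 + lam).
    """
    x = features(message)
    # Rank stored messages by the number of features they share with x.
    neighbours = sorted(memory, key=lambda item: len(x & item[0]), reverse=True)[:k]
    if not neighbours:
        return False
    # Estimate P(spam) as the fraction of spam messages among the neighbours.
    p_spam = sum(1 for _, is_spam in neighbours if is_spam) / len(neighbours)
    return p_spam > lam / (1 + lam)


memory = [
    (features("win money now"), True),
    (features("win free money"), True),
    (features("free money win big"), True),
    (features("meeting at noon"), False),
    (features("project meeting notes"), False),
]
print(knn_is_spam("win free money now", memory))       # True
print(knn_is_spam("project meeting at noon", memory))  # False
```

With `lam = 9` and `k = 3` the threshold is 0.9, so all three neighbours must be spam before a message is blocked; raising the cost ratio makes the filter more conservative.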
Multimedia information processing:
In this area, SKEL researchers have worked mainly in collaboration with the Computational Intelligence Lab (CIL) at NCSR, drawing on CIL's expertise in image and video analysis:
- evolution of multimedia ontologies: an ontology evolution methodology was developed in the context of the BOEMIE project; it employs reasoning to interpret multimedia resources in order to evolve the ontology, either by population with new instances or by enrichment with new concepts and semantic relations;
- semantic segmentation of web pages: a new learning approach first partitions a web page into blocks based on its visual layout and then partitions these blocks further based on the appearance of specific types of entities and a number of simple heuristics;
- text area identification in web images: a web image processing algorithm locates text areas and prepares them for OCR processing; this methodology has been fully integrated with an OCR engine and with an information extraction system;
- filtering of multimedia web content: a method for automatically detecting pornographic content on the Web combines techniques from language engineering and image analysis within a machine learning framework.
Towards technology for user-friendly information access, SKEL researchers work on natural language interaction, continuing the work on natural language interfaces to databases started at NCSR more than 20 years ago [35, 36]. More specifically, SKEL work in this area involves:
- exploitation of personality modelling in dialogue management: personality modelling is treated as the inference of emotion and affect parameters, which are used by the dialogue management and user interaction components of a robot architecture; this work examines how the inferred parameters modulate multiple user interface modalities, such as speech and facial expressions;
- spoken dialogue interaction: work includes the dialogue system of a robot developed to serve as a museum guide, which interacts with human visitors in natural language, receiving instructions and providing information about the exhibits, as well as work on human-robot spoken dialogue interaction in the Hygeiorobot project for the development of a mobile robotic assistant for hospitals;
- natural language interfaces: a methodology has been presented for creating a language-independent knowledge base (KB) that can be used to develop multilingual and user-tailored interfaces.
Institute for Language and Speech Processing (ILSP)
“Athena” Research and Innovation Centre in Information, Communication and Knowledge Technologies
URL: http://www.ilsp.gr
The Institute for Language and Speech Processing (ILSP) was founded in 1991 under the auspices of the General Secretariat for Research and Technology of the Ministry of Development, aiming to be a centre of excellence in basic and applied R&D in human language technologies (Natural Language Processing, Speech and Sound Processing) and their application in a wide range of areas, including e-learning, cultural heritage, media processing and cognitive systems. ILSP capitalised on R&D work performed since 1984 in the framework of the European Commission's Eurotra project and at the Digital Signal Processing Lab of the National Technical University of Athens. Since 2003, ILSP has been part of the “Athena” Research and Innovation Centre in Information, Communication and Knowledge Technologies as one of its founding institutes.
ILSP / R.C. "Athena" develops technologies on the following axes:
In the following account of ILSP activity, we focus explicitly on the areas traditionally regarded as Artificial Intelligence research; for the rest (e.g. language resources proper, e-learning, assistive technologies for disabled persons, cultural informatics), we refer the reader to the institute’s website.
Language Processing Tools
Recognising the need for a solid infrastructure of language resources and processing tools for Greek, ILSP initiated the development of the Hellenic National Corpus (http://hnc.ilsp.gr/) in 1991. As large NLP applications usually build on pipelines of modules that analyse textual content at different linguistic levels, ILSP researchers have developed and integrated a number of such analysers for Greek, including a transformation-based POS tagger and an FST-based shallow syntactic parser. Manually annotated corpora at the levels of syntax and event semantics are used for training and testing dependency parsers and event recognizers.
The Electronic Lexicography and Language Resources Department of ILSP has focused, among other things, on intelligent authoring aids, developing a number of advanced spelling and grammar checkers optimised for Greek. Symphonia (http://www.ilsp.gr/correct_eng.html), ILSP’s commercial language checker, incorporates special logic (contextual rules at the syntax level and other non-language-specific algorithms) as well as language resources (morphological lexicon, etc.) that exploit the characteristics of Greek, thus achieving higher performance and better usability than conventional checkers for Greek. In parallel, the Machine Translation Department has developed a controlled language environment for certain sublanguages of Modern Greek and produced the necessary lexica and grammar specifications.
ILSP has catered for multilinguality from its very early stages, elaborating a number of techniques for the automatic elicitation of structured multilingual resources from multilingual parallel or comparable data at various levels [9, 10, 11, 12], and also tackling issues of lexical polysemy in a translation context. Translation technology proper was initially addressed by developing an intelligent translation memory environment [14, 15], integrating efficient text matching techniques for translation example retrieval, and analogical modeling in an attempt to further automate translation synthesis [17, 18]. Turning to fully automatic machine translation in recent years, and capitalizing on its researchers’ know-how, ILSP set up a spin-off company, Cognitron Knowledge Technologies Ltd (www.cognitron.gr), to offer translation and multilinguality-oriented services. In parallel, alternative corpus-based machine translation approaches are being investigated that rely only on monolingual corpora of the target language and deploy pattern-matching, statistical and rule-based techniques [20, 21].
Research and development work in the Voice and Sound Technology Department of ILSP involves speech recognition and synthesis, as well as a wide range of applications integrating these processing engines. Speech recognition work involves emotion recognition with application to human-computer interaction, incorporation of speech recognition into Intelligent Assistive Reading Systems for school-aged dyslexic readers, speech recognition interfaces for smart home applications, speech recognition interfaces to improve communication for hard-of-hearing and deaf people, and speech recognition in adverse noise environments. In addition, commercial speech recognition services are offered by the spin-off company Voice-In (www.voice-in.gr).
Speech synthesis work involves methods and techniques that optimize the speed and quality of speech synthesis systems to meet the computational constraints of low-capacity embedded devices. These efforts have led to a scaled-down engine that fits the processing power of mobile phones while preserving most of the voice’s quality and naturalness; this engine is now integrated into a commercial service offered by the leading mobile operator in Greece. Research interests also include alternative TtS methods such as synthesis based on Hidden Markov Models and formant synthesis. Recently, ILSP researchers presented the first HMM-based speech synthesizer for Greek, and they are also involved in research on corpus-based expressive/emotional TtS, aiming at synthetic voices with richer and more vivid prosodic patterns. A parallel activity has been the exploitation of TtS in multimodal interfaces and assistive tools, aiming to make human-information interaction more natural, intuitive and accessible. R&D efforts have recently led to the creation of a spin-off company, Innoetics (www.innoetics.gr), offering premium-quality, near-natural speech synthesis for Greek based on unit selection concatenative synthesis techniques.
The Voice and Sound Technology Department has also been active in speech-based dialog processing, using directed dialogs and yes/no questions to achieve the highest possible recognition rate across a very broad and varied user group.
Information Retrieval and Extraction
Building on its solid language technology infrastructure, the Natural Language and Knowledge Extraction Department has developed a range of tools and systems for language-aware information retrieval. In this realm, a term extraction module has been developed to allow for more efficient term based document indexing. Term extraction is implemented as a hybrid process comprising a term pattern grammar based on finite-state technology, and a statistical filter, used for the removal of grammar-extracted terms lacking statistical evidence [31, 32, 33]. A version of an IR system is currently being used at the National Publishing Office of Greece (www.et.gr).
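The hybrid term extraction process described above (a finite-state term pattern grammar followed by a statistical filter) can be sketched as follows. This is a minimal illustration under stated assumptions, not ILSP's module: the tag set, the `A*N+` pattern (adjectives followed by nouns) and the raw-frequency threshold `min_freq` standing in for the statistical filter are all hypothetical choices. Each POS tag is mapped to one character so that a regular expression, itself a finite-state pattern, can run over the tag sequence.

```python
import re
from collections import Counter

# Hypothetical tag set: each POS tag becomes one character; others map to "O".
TAG_CODES = {"ADJ": "A", "NOUN": "N"}
# Assumed term pattern: zero or more adjectives followed by one or more nouns.
TERM_PATTERN = re.compile(r"A*N+")


def candidate_terms(tagged_sentence):
    """Yield word sequences whose tag sequence matches TERM_PATTERN."""
    codes = "".join(TAG_CODES.get(tag, "O") for _, tag in tagged_sentence)
    for match in TERM_PATTERN.finditer(codes):
        span = tagged_sentence[match.start():match.end()]
        yield " ".join(word.lower() for word, _ in span)


def extract_terms(tagged_corpus, min_freq=2):
    """Grammar step, then a frequency filter standing in for the statistical filter."""
    counts = Counter(term for sentence in tagged_corpus for term in candidate_terms(sentence))
    return {term: count for term, count in counts.items() if count >= min_freq}


corpus = [
    [("The", "DET"), ("supreme", "ADJ"), ("court", "NOUN"), ("ruled", "VERB")],
    [("A", "DET"), ("supreme", "ADJ"), ("court", "NOUN"), ("decision", "NOUN")],
    [("The", "DET"), ("supreme", "ADJ"), ("court", "NOUN"), ("met", "VERB")],
]
print(extract_terms(corpus))  # {'supreme court': 2}
```

Practical term extractors usually replace raw frequency with a measure such as C-value and handle nested terms (e.g. "supreme court" inside "supreme court decision"); the text does not state which statistic ILSP uses.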
Going one step further towards extracting structured information from unstructured textual data, ILSP has been developing Named Entity Recognition and Classification (NERC) tools and language resources since 2000, initially implementing a rule-based approach to NERC for Greek. In a recent implementation, we have adopted a single-level maximum entropy approach that is language- and domain-independent. The system incorporates a more sophisticated classification schema, compliant with the ACE (Automatic Content Extraction) schema, and was developed in the framework of the MUSE and Reveal-This (http://www.reveal-this.org) projects.
Moving to content technologies, the Natural Language and Knowledge Extraction Department has addressed text classification in high-dimensional spaces by applying linear weight-updating classifiers, which have been extensively studied in machine learning. This work is based on the Winnow family of algorithms, which are simple to implement and efficient in terms of computation time and storage requirements.
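For readers unfamiliar with Winnow, here is a hedged sketch of a Winnow2-style learner on sparse binary features: weights start at 1, a document is labelled positive when the summed weight of its active features reaches a threshold, and weights are updated multiplicatively only on mistakes. The promotion factor `alpha = 2` and the threshold `n_features / 2` are common textbook settings assumed here; the text does not give the parameters ILSP used.

```python
def predict(weights, theta, active):
    """Linear threshold unit over the active (value 1) features."""
    return 1 if sum(weights[i] for i in active) >= theta else 0


def train_winnow(examples, n_features, alpha=2.0, epochs=10):
    """Winnow2-style training on sparse binary feature vectors.

    examples: list of (active_feature_indices, label) pairs, label in {0, 1}.
    All weights start at 1.0 and the threshold is n_features / 2 (an assumed,
    commonly used setting).
    """
    weights = [1.0] * n_features
    theta = n_features / 2
    for _ in range(epochs):
        mistakes = 0
        for active, label in examples:
            if predict(weights, theta, active) == label:
                continue  # Winnow only updates on mistakes
            mistakes += 1
            # Promote active weights after a missed positive, demote after a false positive.
            factor = alpha if label == 1 else 1 / alpha
            for i in active:
                weights[i] *= factor
        if mistakes == 0:
            break
    return weights, theta


examples = [({0}, 1), ({0, 1}, 1), ({1, 2}, 0), ({3}, 0), ({0, 4}, 1), ({2, 5}, 0), ({1, 2, 3}, 0)]
weights, theta = train_winnow(examples, n_features=6)
print(weights)  # [4.0, 0.5, 0.5, 0.5, 1.0, 1.0]
```

The multiplicative updates are what make this family attractive for text: the number of mistakes grows only logarithmically with the number of irrelevant features, and only the weights of features present in a document are touched.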
Turning to content summarization, the department’s textual summarization component provides extract-based, single- and/or multi-document summaries. For each sentence in the text(s), a score indicating the sentence’s salience is calculated as a weighted sum of several features. The top-ranked sentences are then presented in their original order to form the final extract. The currently selected features [38, 40, 41] include the location of the sentence in the text, the sentence length and linguistic properties (inclusion and importance of centroids, terms, named entities and facts).
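The scoring scheme just described (a weighted sum of sentence features, with the top-ranked sentences restored to document order) can be sketched as below. Only the location and length features are modelled; their exact formulas, the weights and `k` are illustrative assumptions, and the linguistic features (centroids, terms, named entities, facts) are left out.

```python
def summarize(sentences, feature_fns, weights, k=2):
    """Score each sentence as a weighted sum of feature values, keep the k
    highest-scoring sentences and return them in their original order."""
    total = len(sentences)
    scored = []
    for position, sentence in enumerate(sentences):
        score = sum(w * f(sentence, position, total) for f, w in zip(feature_fns, weights))
        scored.append((score, position))
    best = sorted(scored, reverse=True)[:k]
    return [sentences[position] for position in sorted(p for _, p in best)]


def location(sentence, position, total):
    """Earlier sentences score higher: 1.0 for the first, decreasing linearly."""
    return 1.0 - position / total


def length(sentence, position, total):
    """Longer sentences score higher, capped at 20 tokens."""
    return min(len(sentence.split()), 20) / 20


doc = [
    "Athens hosted the summit on Monday with leaders from ten countries.",
    "It rained.",
    "The leaders agreed on a joint statement about energy policy and trade.",
    "Ok.",
]
print(summarize(doc, [location, length], [0.5, 0.5], k=2))  # first and third sentences
```

Sorting the selected positions before output is the step that keeps the extract in its original order rather than in score order.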
Recently, the Natural Language and Knowledge Extraction Department has been working on the related fields of textual entailment and paraphrasing, with a view to building tools that recognize when two text segments, even if distinct in form, overlap semantically. To this end, we maintain a manually compiled resource of unidirectional and bidirectional lexical paraphrases that we use in text simplification and text condensation applications. Reported work also includes the creation and annotation of the Greek Textual Entailment Corpus (GTEC) and a proposed methodology for the automatic recognition of textual entailment in Greek texts.
In parallel to these activities, the Machine Translation Department has been active in ontology development [44, 45, 46], and work has been reported on the documentation of artifacts. In addition, the group is developing a lexicon organized in terms of taxonomies that are both concept- and grammar-based, following the track of lexicographic works such as Roget’s Thesaurus and of modern electronic lexica such as WordNet and VerbNet.
Multimedia Information Processing
Extending automated information processing from text to multimedia, ILSP has initiated pioneering research in multimedia information extraction, with a special focus on automatic metadata extraction and fusion as well as on automatic monolingual and multilingual subtitling. The IST-CIMWOS project aimed at content-based indexing, archiving, retrieval and on-demand delivery of audiovisual content, deploying state-of-the-art algorithms for text, speech and image processing [47, 48, 49]. Taking a step further, the IST-Reveal-This project (www.reveal-this.org) focused on cross-media fusion of unimedia indexical data, which it used to develop cross-media categorisation, cross-media summarisation and cross-lingual translation functionalities in a search and retrieval setting [50, 51, 52]. R&D results in automatic metadata extraction from multimedia content led to the launch of a spin-off company, Qualia (www.qualia.gr), offering media monitoring services.
In the applied field of automatic subtitling, ILSP has designed and implemented systems for monolingual and multilingual subtitling of TV programmes by integrating speech recognition, text condensation and machine translation technologies.
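The integration of the three technologies into a subtitling chain can be sketched as a composition of stages. This is a hypothetical illustration, not ILSP's system, and it rests on several assumptions. The stage order is taken as speech recognition, then condensation, then translation. A plain transcript string stands in for recognizer output, and condensation is reduced to toy phrase-replacement rules. The 37-character, two-line subtitle limits are likewise assumed, as are the names `condense` and `make_subtitles`.

```python
import textwrap
from typing import Callable, Dict, List


def condense(text: str, rules: Dict[str, str]) -> str:
    """Toy condensation: replace verbose phrases with shorter equivalents."""
    for verbose, short in rules.items():
        text = text.replace(verbose, short)
    return text


def make_subtitles(
    transcript: str,
    condense_fn: Callable[[str], str],
    translate_fn: Callable[[str], str],
    max_chars: int = 37,
    max_lines: int = 2,
) -> List[List[str]]:
    """Condense, translate, then split the text into subtitle frames.

    Each frame holds at most `max_lines` lines of at most `max_chars` characters.
    """
    text = translate_fn(condense_fn(transcript))
    lines = textwrap.wrap(text, width=max_chars)
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]


# The transcript stands in for speech recognizer output.
transcript = "at this point in time the minister said that the budget will be approved"
rules = {"at this point in time": "now"}
frames = make_subtitles(
    transcript,
    condense_fn=lambda s: condense(s, rules),
    translate_fn=lambda s: s,  # identity for monolingual subtitling
)
print(frames)
# [['now the minister said that the budget', 'will be approved']]
```

Passing the condensation and translation stages as callables keeps the chain the same for monolingual and multilingual output. Only `translate_fn` changes when a machine translation component is plugged in.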
Extending know-how in multimedia information processing further and taking a turn from content to human-human and human-machine communication processing, researchers of the Natural Language and Knowledge Extraction Department have initiated the Poeticon project (www.poeticon.eu). The project aims to combine symbolic meaning representations with sensorimotor representations in an attempt to explore the integration mechanics of human cognition and ways to reproduce it in intelligent agents.
Patission 76, GR-104 34 Athens, Greece
Contact: Ion Androutsopoulos
The group's main research interests are currently:
D. Vogiatzis, D. Galanis, V. Karkaletsis, I. Androutsopoulos and C.D. Spyropoulos, "A Conversant Robotic Guide to Art Collections". Proceedings of the 2nd Workshop on Language Technology for Cultural Heritage Data, Language Resources and Evaluation Conference (LREC 2008), Marrakech, Morocco, 2008.
J. Oberlander, G. Karakatsiotis, A. Isard and I. Androutsopoulos, "Building an Adaptive Museum Gallery in Second Life". Proceedings of Museums and the Web, Montreal, Quebec, Canada, 2008.
P. Malakasiotis and I. Androutsopoulos, "Learning Textual Entailment using SVMs and String Similarity Measures." Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, 45th Annual Meeting of the Association for Computational Linguistics (ACL 2007), Prague, Czech Republic, pp. 42-47, 2007.
D. Galanis and I. Androutsopoulos, "Generating Multilingual Descriptions from Linguistically Annotated OWL Ontologies: the NaturalOWL System". Proceedings of the 11th European Workshop on Natural Language Generation (ENLG 2007), Schloss Dagstuhl, Germany, pp. 143-146, 2007.
D.K. Vassilakis, I. Androutsopoulos and E.F. Mageirou, "A Game-Theoretic Investigation of the Effect of Human Interactive Proofs on Spam E-mail". Proceedings of the 4th Conference on Email and Anti-Spam (CEAS 2007), Mountain View, CA, USA, 2007.
G. Tsatsaronis, M. Vazirgiannis and I. Androutsopoulos, "Word Sense Disambiguation with Spreading Activation Networks Generated from Thesauri". Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI 2007), Hyderabad, India, pp. 1725-1730, 2007.
G. Lucarelli, X. Vasilakos and I. Androutsopoulos, "Named Entity Recognition in Greek Texts with an Ensemble of SVMs and Active Learning". International Journal on Artificial Intelligence Tools, 16(6):1015-1045, World Scientific, 2007.
I. Androutsopoulos, J. Oberlander and V. Karkaletsis, "Source Authoring for Multilingual Generation of Personalised Object Descriptions". Natural Language Engineering, 13(3):191-233, Cambridge University Press, 2007.
V. Metsis, I. Androutsopoulos and G. Paliouras, "Spam Filtering with Naive Bayes -- Which Naive Bayes?". Proceedings of the 3rd Conference on Email and Anti-Spam (CEAS 2006), Mountain View, CA, USA, 2006.
G. Lucarelli and I. Androutsopoulos, "A Greek Named-Entity Recognizer that Uses Support Vector Machines and Active Learning". Proceedings of the 4th Hellenic Conference on Artificial Intelligence (SETN 2006), Heraklion, Crete, Greece, 2006.
J. Calder, A.C. Melengoglou, C. Callaway, E. Not, F. Pianesi, I. Androutsopoulos, C.D. Spyropoulos, G. Xydas, G. Kouroupetroglou and M. Roussou, "Multilingual Personalized Information Objects". In O. Stock and M. Zancanaro (Eds.), Multimodal Intelligent Information Presentation, pp. 177-201, Springer, 2005.
I. Androutsopoulos, E.F. Magirou and D.K. Vassilakis, "A Game Theoretic Model of Spam E-Mailing". Proceedings of the 2nd Conference on Email and Anti-Spam (CEAS 2005), Stanford University, CA, USA, 2005.
I. Androutsopoulos and D. Galanis, "A Practically Unsupervised Learning Method to Identify Single-Snippet Answers to Definition Questions on the Web". Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP 2005), Vancouver, Canada, pp. 323-330, 2005.
I. Androutsopoulos, S. Kallonis and V. Karkaletsis, "Exploiting OWL Ontologies in the Multilingual Generation of Object Descriptions". Proceedings of the 10th European Workshop on Natural Language Generation (ENLG 2005), Aberdeen, U.K., pp. 150-155, 2005.
E. Michelakis, I. Androutsopoulos, G. Paliouras, G. Sakkis and P. Stamatopoulos, "Filtron: A Learning-Based Anti-Spam Filter". Proceedings of the 1st Conference on Email and Anti-Spam (CEAS 2004), Mountain View, CA, USA, 2004.
S. Miliaraki and I. Androutsopoulos, "Learning to Identify Single-Snippet Answers to Definition Questions". Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), Geneva, Switzerland, pp. 1360-1366, 2004.
I. Androutsopoulos and M. Aretoulaki, "Natural Language Interaction". In R. Mitkov (Ed.), Handbook of Computational Linguistics, chapter 35, pp. 629-649, Oxford University Press, 2003.
A. Isard, J. Oberlander, I. Androutsopoulos and C. Matheson, "Speaking the Users' Languages". IEEE Intelligent Systems, 18(1):40-45, 2003.
G. Sakkis, I. Androutsopoulos, G. Paliouras, V. Karkaletsis, C.D. Spyropoulos and P. Stamatopoulos, "A Memory-Based Approach to Anti-Spam Filtering for Mailing Lists". Information Retrieval, 6(1):49-73, Kluwer, 2003.
A. Dimitromanolaki and I. Androutsopoulos, "Learning to Order Facts for Discourse Planning in Natural Language Generation". Proceedings of the 9th European Workshop on Natural Language Generation, 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2003), Budapest, Hungary, pp. 23-30, 2003.
I. Androutsopoulos, Exploring Time, Tense and Aspect in Natural Language Database Interfaces, John Benjamins, 2002.
Recent research projects:
INDIGO: Interaction with mobile robots that have personalities and support multimodal dialogues (2007-09). European FP6-IST project. http://www.ics.forth.gr/indigo/ (areas 5.6 and 7).
XENIOS (2006-07): Human-robot interaction using speech processing, natural language generation, and computer vision. National project. http://www.ics.forth.gr/xenios/ (area 7).
"Combined research in the areas of information retrieval, natural language processing, and user modeling aiming at the development of advanced search engines for document collections" (2006-08). National project (area 1.5).
Artificial Intelligence Group (AIG)
Wire Communications Laboratory, Electrical & Computer Engineering Department, University of Patras, GR-26500, Rion-Patras, Greece
Nikos Fakotakis, Professor, Director of WCL and Head of AIG
Fields of Activity: Language processing tools, Information retrieval and extraction, Speech processing, Natural language generation, Engineering and industry, Telecommunications, Mathematical foundations, Speech, Language and Dialogue Applications
Abstract: AIG (Artificial Intelligence Group) is one of the seven research groups of the Wire Communications Laboratory (WCL, established in 1967) in the Department of Electrical and Computer Engineering, University of Patras, Greece. The unit has more than 30 years of continuous research activity in Speech & Language Technology and Artificial Intelligence. During this period, it has produced over 300 scientific publications and over 20 PhD dissertations, and it has developed various resources (databases, research tools, proof-of-concept prototypes, etc.). It has also participated, or is currently participating, as a partner or coordinator in more than 30 RTD projects.
AIG has been developing speech, language and dialogue understanding systems for telecommunication and industrial applications, and a significant part of its research also focuses on theoretical and mathematical models of artificial intelligence methods and algorithms. In detail, AIG expertise can be summarized in three areas:
Some indicative recent references include: morphological, syntactic, semantic and discourse analyzers [1-4]; keyword(s) and term extraction [5]; automatic linguistic knowledge elicitation techniques [6-7]; intelligent authoring aids [8]; speech recognition [9-10]; speaker recognition [11]; speech synthesis [12]; speech-based dialog processing [13-14]; information extraction [15-17].
For an extensive list of the more than 300 publications by AIG researchers, see http://www.wcl.ee.upatras.gr/ai/Fakotakis/index.asp
[1] K. Sgarbas, N. Fakotakis, G. Kokkinakis, "A PC-KIMMO-Based Morphological Description of Modern Greek", Literary and Linguistic Computing, Oxford University Press, Vol.10, No.3, pp.189-201, September 1995.
[2] K. Kermanidis, K. Sgarbas, N. Fakotakis, G. Kokkinakis, "A PC-PATR-Based Syntactic Description of Modern Greek", Literary and Linguistic Computing, Oxford University Press, Vol.15, No.3, pp.291-312, 2000.
[3] K. Sgarbas, N. Fakotakis and G. Kokkinakis, "TEMPO: A Temporal Sub-parser for Modern Greek", International Journal on Artificial Intelligence Tools, World Scientific, Vol.7, No.1, pp.103-120, 1998.
[4] K. Sgarbas, N. Fakotakis, G. Kokkinakis, "A Straightforward Approach to Morphological Analysis and Synthesis", Proc. COMLEX 2000, Workshop on Computational Lexicography and Multimedia Dictionaries, pp.31-34, Kato Achaia, Greece, 22-23 September, 2000.
[5] D.P. Lyras, K.N. Sgarbas, N. Fakotakis, "Using the Levenshtein Edit Distance for Automatic Lemmatization: A Case Study for Modern Greek and English", Proc. ICTAI 2007, 19th IEEE International Conference on Tools with Artificial Intelligence, pp.428-435, Patras, Greece, 29-31 October, 2007.
[6] D.P. Lyras, K.N. Sgarbas, N.D. Fakotakis, “Learning Greek Phonetic Rules using Decision-Tree Based Models”, Proc. ICEIS 2007, 9th International Conference on Enterprise Information Systems: Artificial Intelligence and Decision Support Systems, pp.424-427, Funchal, Madeira, Portugal, 12-16 June 2007.
[7] K. Sgarbas, G. Londos, N. Fakotakis, G. Kokkinakis, "The WATCHER Project: Building An Agent for Automatic Extraction of Language Resources from the Internet", Literary and Linguistic Computing, Vol.18, No.4, pp.449-464, 2003.
[8] K. Sgarbas, N. Fakotakis, G. Kokkinakis, "A PC-KIMMO-Based Bi-directional Graphemic/Phonetic Converter for Modern Greek", Literary and Linguistic Computing, Oxford University Press, Vol.13, No.2, pp.65-75, 1998.
[9] K. Georgila, K. Sgarbas, N. Fakotakis, G. Kokkinakis, "Fast Very Large Vocabulary Recognition Based on Compact DAWG-Structured Language Models", Proc. ICSLP 2000, 6th International Conference on Spoken Language Processing, pp.987-990, Beijing, China, 16-20 October, 2000.
[10] I. Mporas, T. Ganchev, M. Siafarikas, N. Fakotakis: Comparison of Speech Features on the Speech Recognition Task, Journal of Computer Science, ISSN 1549-3636, Science Publications New York, USA, Vol.3, No.8, 2007, pp.608-616.
[11] T. Ganchev, I. Potamitis, N. Fakotakis, G. Kokkinakis: Text-Independent Speaker Verification for Real Fast-Varying Noisy Environments, International Journal of Speech Technology, ISSN 1381-2416, Kluwer Academic Publishers, Vol.7, No. 4, October 2004, pp. 281-292.
[12] A. Lazaridis, P. Zervas, G. Kokkinakis, "Segmental duration modeling for Greek Speech Synthesis", In Proceedings of the 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), Patras, Greece, 2007, pp. 518-521.
[13] K. Georgila, K. Sgarbas, A. Tsopanoglou, N. Fakotakis, G. Kokkinakis, "A Speech-based Human-Computer Interaction System for Automating Directory Assistance Services", International Journal of Speech Technology, Special Issue on Speech and Human Computer Interaction, Kluwer Academic Publishers, Vol. 6, pp. 145-159, 2003.
[14] T. Giannakopoulos, N.-A. Tatlas, T. Ganchev, I. Potamitis: A Practical, Real-Time Speech-Driven Home Automation Front-end, IEEE Transactions on Consumer Electronics, Vol.51, No.2, pp. 514-523, May 2005.
[15] I. Mporas, D. P. Lyras, K. N. Sgarbas, N. Fakotakis “Detection of Dialogue Acts using Perplexity-based Word Clustering”, in V. Matousek and P. Mautner (eds.), "Text, Speech and Dialogue", Proc. TSD 2007, 10th International Conference on Text, Speech and Dialogue, LNAI n.4629, pp.638-643, Springer, Pilsen, Czech Republic, 3-7 September 2007.
[16] M. Maragoudakis, A. Thanopoulos, K. Sgarbas, N. Fakotakis, "Domain Knowledge Acquisition and Plan Recognition by Probabilistic Reasoning", International Journal on Artificial Intelligence Tools, Special Issue on AI Techniques in Web-Based Educational Systems, Vol.13, No.2, pp.333-365, 2004.
[17] M. Maragoudakis, A. Thanopoulos, N. Fakotakis, “Meteobayes: Effective Plan Recognition in a Weather Dialogue System”, IEEE Intelligent Systems, Vol. 22, No 1, pp. 66-78, 2007.
AIG’s current (2008) research personnel include:
Dr. Nikos Fakotakis
Professor, Director of WCL
Artificial Intelligence, Speech Recognition & Understanding, Speaker Recognition & Identification
Dr. Kyriakos Sgarbas
Symbolic Artificial Intelligence, Natural Language Processing, Quantum Artificial Intelligence
Dr. Todor Ganchev
Speech Technology, Bioacoustics, Neural Networks
Dr. Otilia Kocsis
Multimodal Dialogue Systems, User Modeling
Quantum Artificial Intelligence
Spoken Dialogue Systems & Applications
Emotion Recognition, Spoken Dialogue Systems
Machine Learning, Artificial Intelligence
Voice Conversion Systems, Text-to-Speech Synthesis
Machine Learning, Logic Deduction, Data Mining, Information Retrieval
Speech Recognition, Language Identification
Semantic Interpretation of Sounds, Pattern Recognition, Machine Learning
Game Playing, Artificial Intelligence
Speech Parameterization, Wavelets
Language and Information Processing Group
Dept. of Informatics, Technological Educational Institution of Athens
Address: Ag. Spyridona, GR-122 10 Egaleo, Greece
The research activities of the group mainly concern, but are not limited to, the following areas of NLP:
Language Processing (Morphological, Syntactic, Semantic and Discourse Analysis)
1. Galiotou E. and Ralli A. “Morpho-phonological Modelling in Natural Language Processing”, International Journal of Computational Intelligence, vol. 1, no. 3, 2004, pp. 179-182.
2. Grigoriadou, M., Kornilakis, H., Galiotou E., Stamou S. and Papakitsos, E. “The Software Infrastructure for the Development and Validation of the Greek WordNet”, Romanian Journal of Information, Science and Technology, vol. 7, no. 1-2, 2004, pp. 89-105.
3. Galiotou, E. “An SDRT Approach to the Temporal Structure of Modern Greek Narrative Texts”, Literary and Linguistic Computing, vol. 17, no. 4, 2002, pp. 457-474.
4. Galiotou E. and Ligozat G. “A Representational Scheme for Temporal and Causal Information Processing”, Proceedings of the LREC Workshop on Annotation Standards for Temporal Information in Natural Language, Las Palmas, Canary Islands, 2002, pp. 22-26.
Greek and European funded projects:
- BALKANET: Design and Development of a Multilingual Balkan WordNet (IST-2000-29388) (2001-2004)
- DIALEXICO: Development of a Lexicological Data Base for the Greek Language (EPET II - GSRT) (1999-2001)
Language-aware information retrieval
Cross-lingual information retrieval
Greek and European funded projects:
- PA_CO_CLIR: Parallel, Content Based Cross Language Information Retrieval (“Archimedes” - GSRT) (2004-2006)
Historical Document Processing
Greek and European funded projects:
- POLYTIMO: An information system for the processing and management of, and access to, the content of valuable collections of historical books and manuscripts (“Information Society” - GSRT) (2006-2008)
NEUROLINGO LP - Language Technology applications
20-22 Renieri St., GR-111 43 Athens
NEUROLINGO LP is a Greek company specialising in NLP applications for Modern Greek, founded in 2005 by a team of language technology experts. We develop language resources (e.g. dictionaries, thesauri, ontologies), language tools (e.g. authoring/proofing tools and a lexicographic environment for creating and authoring user dictionaries), and software systems and infrastructure for text mining and information extraction. More specifically:
- P. Fragkou, G. Petasis, A. Theodorakos, V. Karkaletsis, C.D. Spyropoulos, BOEMIE Ontology-Based Text Annotation Tool, 6th International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco, 2008.
- G. Petasis, V. Karkaletsis, D. Farmakiotou, I. Androutsopoulos, and C.D. Spyropoulos. "A Greek Morphological Lexicon and its Exploitation by Natural Language Processing Applications". Lecture Notes in Computer Science (LNCS), vol. 2563, Advances in Informatics - Post-proceedings of the 8th Panhellenic Conference in Informatics. Vol. editors: Yannis Manolopoulos, Skevos Evripidou, Antonis Kakas, pp. 401-419, 2003.
- V. Rentoumi and S. Konstantopoulos, “Heuristic Disambiguation of Deverbal Nominals in Greek”. In: Zygmunt Vetulani (ed.) Human Language Technologies as a Challenge for Computer Science and Linguistics: Proceedings of 3rd Language and Technology Conference, 5-7 October 2007, Poznań, Poland.
- S. Konstantopoulos, What's in a Name? In: Petya Osenova et al (eds.) Proceedings of Computational Phonology Workshop, 6th Intl. Conf. on Recent Advances in NLP (RANLP 07), Borovets, Bulgaria, 2007.
- I. Androutsopoulos, J. Oberlander and V. Karkaletsis, "Source Authoring for Multilingual Generation of Personalised Object Descriptions". Natural Language Engineering, 13(3):191-233, Cambridge University Press, 2007.
- D. Bilidas, M. Theologou, V. Karkaletsis, “Enriching OWL Ontologies with Linguistic and User-related Annotations: the ELEON system”, Proceedings of the IEEE International Conference on Tools with Artificial Intelligence (ICTAI-2007), Patras, Greece, October 29-31, 2007.
- M. Vassiliou, S. Markantonatou, Y. Maistros, and V. Karkaletsis. “Evaluating Specifications for Controlled Greek”, In Proceedings of the EAMT/CLAW 2003 Conference on “Controlled Language Translation”, pp. 185-193, Dublin, Ireland, May 15-17, 2003.
- G. Sakkis, I. Androutsopoulos, G. Paliouras, V. Karkaletsis, C.D. Spyropoulos and P. Stamatopoulos, “A Memory-Based Approach to Anti-Spam Filtering for Mailing Lists,” Information Retrieval Journal, 6(1): 49-73, 2003.
- G. Sakkis, I. Androutsopoulos, G. Paliouras, V. Karkaletsis, C.D. Spyropoulos and P. Stamatopoulos, “Stacking classifiers for anti-spam filtering of e-mail.” Proceedings of the International Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 44-50, Carnegie Mellon University, 2001.
- G. Petasis, V. Karkaletsis, C. Grover, B. Hachey, M.T. Pazienza, M. Vindigni, J. Coch, "Adaptive, Multilingual Named Entity Recognition in Web Pages" In Proceedings of the European Conference in Artificial Intelligence (ECAI), pp. 1073-1074, Valencia, Spain, 2004.
- G. Petasis, F. Vichot, F. Wolinski, G. Paliouras, V. Karkaletsis, and C.D. Spyropoulos, "Using Machine Learning to Maintain Rule-based Named-Entity Recognition and Classification Systems". Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pp. 426-433, Toulouse, 2001.
- G. Petasis, A. Cucchiarelli, P. Velardi, G. Paliouras, V. Karkaletsis, and C.D. Spyropoulos, “Automatic adaptation of proper noun dictionaries through co-operation of machine learning and probabilistic methods”. Proceedings of the 23rd ACM SIGIR Conference on R&D in IR (SIGIR), pp. 128-135, Athens, Greece, 2000.
- G. Petasis, V. Karkaletsis, G. Paliouras and C.D. Spyropoulos, Learning context-free grammars to extract relations from text, Proceedings of the 18th European Conference on Artificial Intelligence (ECAI08), Patras, Greece, July 2008.
- G. Sigletos, G. Paliouras, V. Karkaletsis, “Role Identification From Free Text Using Hidden Markov Models,” Proceedings of the Panhellenic Conference in Artificial Intelligence (SETN), Lecture Notes in Artificial Intelligence, Springer Verlag, n. 2308, pp. 167-178, 2002.
- E. Zavitsanos, G. Paliouras, G. Vouros and S. Petridis, “Discovering Subsumption Hierarchies of Ontology Concepts from Text Corpora,” In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (WI), Silicon Valley, USA, November 2-5, 2007.
- A. Valarakos, V. Karkaletsis, D. Alexopoulou, E. Papadimitriou, C.D. Spyropoulos, and G. Vouros. "Building an Allergens Ontology and Maintaining it using Machine Learning Techniques". Computers in Biology and Medicine Journal (CBM), 36 (10): 1155-1184, 2006.
- G. Sigletos, G. Paliouras, C.D. Spyropoulos and M. Hatzopoulos, "Combining Information Extraction Systems Using Voting and Stacked Generalization," Journal of Machine Learning Research, v.6, pp. 1751-1782, 2005.
- G. Sigletos, G. Paliouras, C. D. Spyropoulos, P. Stamatopoulos. “Meta-learning beyond classification: A framework for information extraction from the Web,” In Berendt et al. (Eds.), “Web Mining: From Web to Semantic Web”, Lecture Notes in Computer Science, n. 3209, pp. 97 – 112, Springer Verlag, 2004.
- V. Karkaletsis, P. Karampiperis, K. Stamatakis, M. Labský, M. Růžička, V. Svátek, E. Amigó Cabrera, M. Pöllä, M.A. Mayer, A. Leis, D.V. Gonzales, Automating Accreditation of Medical Web Content, Proceedings of the 5th Prestigious Applications of Intelligent Systems (PAIS08), in the context of ECAI08, Patras, Greece, July 2008.
- V. Karkaletsis, C.D. Spyropoulos, C. Grover, M.T. Pazienza, J. Coch, D. Souflis, “A Platform for Cross-lingual, Domain and User Adaptive Web Information Extraction” In Proceedings of the European Conference in Artificial Intelligence (ECAI), pp. 725 - 729, Valencia, Spain, 2004.
- V. Rentoumi, V. Karkaletsis, G. A. Vouros, and A. Mozer, “Sentiment analysis exploring metaphorical and idiomatic senses: a word sense disambiguation approach”, Proceedings of the Workshop on Computational Aspects of Affectual and Emotional Interaction (CAFFEi-2008), in the context of ECAI-2008, July 2008, Patras, Greece.
- G. Paliouras, V. Karkaletsis and C.D. Spyropoulos, "Learning Rules for Large Vocabulary Word Sense Disambiguation", Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI '99), v. 2, pp. 674-679, August, 1999.
- S. Afantenos, V. Karkaletsis, P. Stamatopoulos, C. Halatsis, "Using Synchronic and Diachronic Relations for Summarizing Multiple Documents describing Evolving Events", Journal of Intelligent Information Systems (JIIS), 30(3): 183-226, Springer, 2008.
- G. Giannakopoulos, V. Karkaletsis, G. Vouros, P. Stamatopoulos, “The AutoSummENG Method”, NCSR Technical Report, April 2008.
- S. Castano, S. Espinosa, A. Ferrara, V. Karkaletsis, A. Kaya, S. Melzer, R. Möller, S. Montanelli, G. Petasis, "Ontology Dynamics with Multimedia Information: The BOEMIE Evolution Methodology", Proceedings of the International Workshop on Ontology Dynamics (IWOD-07), in the context of the 4th European Semantic Web Conference (ESWC-07), Innsbruck, Austria, June 7, 2007, pp. 41-54.
- G. Petasis, P. Fragkou, A. Theodorakos, V. Karkaletsis, C.D. Spyropoulos, Segmenting HTML pages using visual and semantic information, The 4th Web as Corpus Workshop: Can we do better than Google?, in the context of LREC-2008.
- B. Gatos, S. J. Perantonis, V. Maragos, V. Karkaletsis and G. Petasis, "Text Area Identification in Web Images", In Proceedings of the Panhellenic Conference in Artificial Intelligence (SETN), Lecture Notes in Artificial Intelligence, n. 3025, pp. 82- 92, Springer Verlag, 2004.
- K.V. Chandrinos, I. Androutsopoulos, G. Paliouras and C.D. Spyropoulos, "Automatic Web Rating: Filtering Obscene Content on the Web". Proceedings of the 4th European Conference on Research and Advanced Technology for Digital Libraries (ECDL), Lisbon, Portugal, Lecture Notes in Computer Science, n. 1923, pp. 403-406, Springer-Verlag, 2000.
- S. Konstantopoulos, V. Karkaletsis, and C. Matheson, “Robot Personality: Representation and Externalization”, Proceedings of the Workshop on Computational Aspects of Affectual and Emotional Interaction (CAFFEi-08), in the context of ECAI-08, Patras, July 2008.
- D. Vogiatzis, D. Galanis, V. Karkaletsis, I. Androutsopoulos and C.D. Spyropoulos, "A Conversant Robotic Guide to Art Collections". Proceedings of the 2nd Workshop on Language Technology for Cultural Heritage Data, Language Resources and Evaluation Conference (LREC 2008), Marrakech, Morocco, 2008.
- D. Spiliotopoulos, I. Androutsopoulos and C.D. Spyropoulos, "Human-Robot Interaction Based on Spoken Natural Language Dialogue". European Workshop on Service and Humanoid Robots (Servicerob 2001), pp. 123-128, Santorini, Greece, 2001.
- V. Karkaletsis, C.D. Spyropoulos, and G. Vouros, “A Knowledge-Based Methodology for Supporting Multilingual and User-tailored Interfaces”. Interacting with Computers: The Interdisciplinary Journal of Human Computer Interaction, 9: 311-333, 1998.
- Kontos, J. and Cavouras, J.C. Knowledge Acquisition from Technical Texts Using Attribute Grammars. The Computer Journal, Vol. 31, No 6, pp. 525-530, 1988.
- Kontos, J. Syntax-Directed Processing of Texts with Action Semantics. Cybernetica, 23(2), pp. 157-175, 1980.
- P. Stamatopoulos, A. Arvillias, M. Katzouraki, G. Philokyprou. "Querying Databases in Natural Greek". Proceedings of the 1st European Conference on Information Technology for Organisational Systems EURINFO '88, pp. 730-737, Athens, 1988.
- P. Stamatopoulos, M. Katzouraki, G. Philokyprou. "A Natural Language System for Database Queries". Proceedings of the 9th International Symposium on Informatics JAHORINA '85, Jahorina, 1985.
- digital monolingual, multilingual and multimedia corpora and dictionaries, computational lexical databases
- text processing and analysis for information retrieval and knowledge extraction
- multimodal and multilingual information processing and retrieval
- machine translation and translation aid tools
- stand-alone and integrated voice recognition and text-to-speech systems
- assistive technologies for disabled persons
- digital curation and presentation of cultural content
- multimedia e-learning platforms for language and music
- Gavrilidou, M. (2002), The Hellenic National Corpus on-line, Revue belge de Philologie et d'Histoire.
- Hatzigeogiou, N., Gavrilidou, M., Piperidis, S., Carayannis, G., et al (2000) “Design and Implementation of the Online ILSP Greek Corpus” Proceedings of Second International Conference on Language Resources and Evaluation-LREC2000, 31 May - 2 June 2000, Athens, Greece, 1737-1742
- Papageorgiou, P., P. Prokopidis, V. Giouli, and S. Piperidis (2000). A Unified POS Tagging Architecture and its Application to Greek. In Proc. of the 2nd International Conference on Language Resources and Evaluation, pp. 1455-1462, Athens.
- Boutsis, S., Prokopis Prokopidis, Voula Giouli, and Stelios Piperidis (2000). A Robust Parser for Unrestricted Greek Text. In Proceedings of the 2nd Language Resources and Evaluation Conference, pages 467-474, Athens.
- Prokopidis, P., E. Desypri, M. Koutsombogera, H. Papageorgiou, and S. Piperidis (2005). Theoretical and Practical Issues in the Construction of a Greek Dependency Treebank. In Proc. of The Fourth Workshop on Treebanks and Linguistic Theories, pp. 149-160, Barcelona.
- Papageorgiou, H., E. Desipri, M. Koutsombogera, K. Pouli, and P. Prokopidis (2006). Adding multi-layer Semantics to the Greek Dependency Treebank. In Proc. of The 5th International Conference on Language Resources and Evaluation, Genoa.
- P. Bouros, A. Fotopoulou and N. Glaros (2005), “An interactive environment for creating and validating syntactic rules”, Proceedings of RANLP2005, International Conference on Recent Advances in Natural Language Processing, Borovets, Bulgaria, 21-23 September, pp.129-133.
- Vassiliou, Marina, Stella Markantonatou, Yanis Maistros and Vangelis Karkaletsis (2003). “Evaluating Specifications for Controlled Greek”. In Proceedings of EAMT/CLAW 2003, Dublin, Ireland, 15-17 May 2003 http://nts.cc.ece.ntua.gr/nlp/publications/CL.pdf
- Piperidis, S., Papageorgiou, H., Boutsis, S. (2000) From sentences to words and clauses. In Veronis, J. (Ed) Parallel Text Processing, Alignment and use of translation corpora, Kluwer Academic Publishers, Text Speech and Language Technology Series, pp. 117-138
- Piperidis, S., Boutsis, S., Demiros, I., (1997). Automatic Translation Lexicon Generation from Multilingual texts, Workshop on Multilinguality in the Software Industry: the AI Contribution (MULSAIC’97), Fifteenth International Joint Conference on Artificial Intelligence (IJCAI’97), 25 August 1997, Nagoya, Japan, 57-62
- Boutsis, S., Piperidis, S. & Demiros, I. (1999) Generating Translation Lexica from Multilingual Texts. Journal of Applied Artificial Intelligence, Special issue on multilinguality in the Software Industry, 13 (6), 583-606
- Boutsis, S., Piperidis, S., (1998). Aligning Clauses in Parallel Texts, Third Conference on Empirical Methods in Natural Language Processing (EMNLP), 2 June 1998, Granada, Spain, 17-26
- Piperidis, S., Dimitrakis, P., Balta, E. (2007) Lexical transfer selection using annotated parallel corpora. In Recent Advances in Natural Language Processing IV, Nicolov, Nicolas, Kalina Bontcheva, Galia Angelova and Ruslan Mitkov (eds.), 227–236.
- Piperidis, S., (1995). Interactive Corpus-based Translation Drafting Tool, Aslib Proceedings, Vol.47, No 3, March 1995, 83-92
- Piperidis, S., Malavazos, C., Triantafyllou, Y., (1999). A Multi-level Framework for Memory-Based Translation Aid Tools, Aslib, Translating and the Computer 21, 10-11 November 1999, London
- Cranias, L., Papageorgiou, H., Piperidis, S., (1997). Example retrieval from a Translation Memory, Journal of Natural Language Engineering, 3, February 1997, 255-277
- Malavazos, C., Piperidis, S., Carayannis, G. (2000) Towards Memory and Template Based Translation Synthesis, Proceedings of the MT 2000: Machine Translation and Multilingual Applications in the New Millennium, Exeter, United Kingdom, 20-22 November 2000, pp. 1.1-1.8
- Malavazos C., Piperidis, S., (2000) “Application of Analogical Modelling to Example Based Machine Translation” Proceedings of 18th International Conference on Computational Linguistics (COLING’00), 31 July- 4 August 2000, Saarbruecken, Germany, 516-522
- Dologlou, I., Stella Markantonatou, George Tambouratzis, Olga Yannoutsou, Athanassia Fourla and Nickos Ioannou. (2003) “Using Monolingual Corpora for Statistical Machine Translation”. In Proceedings of EAMT/CLAW 2003, Dublin, Ireland, 15-17 May
- Markantonatou, S., Sokratis Sofianopoulos, Vassiliki Spilioti, Yiorgos Tambouratzis, Marina Vassiliou, Olga Yannoutsou, Nikos Ioannou. 2005. ‘Monolingual Corpus-based MT using Chunks’. In Proceedings of Workshop Example-Based Machine Translation, MT Summit X, 12-16 September 2005, Phuket, Thailand, pp. 91-98
- Markantonatou, S., Sokratis Sofianopoulos, Vassiliki Spilioti, George Tambouratzis, Marina Vassiliou, Olga Yannoutsou. (2006). ‘Using patterns for Machine Translation (MT)’. Proceedings of EAMT 11th Annual Conference, June 19-20 2006, Oslo, Norway
- ERMIS – Emotionally Rich Man Machine Interaction Systems, EU IST project, http://www.ist-world.org/ProjectDetails.aspx?ProjectId=78b75a3711d5498f9b2cb7dcc8342ca1
- Agent-DYSL - Accommodative Intelligent Educational Environments for Dyslexic Learners, www.agent-dysl.eu
- SOPRANO - Service-oriented Programmable Smart Environments for Older Europeans, Integrated Project (IP) under FP6, www.soprano-ip.org
- HEARCOM - Hearing in the Communication Society, Integrated Project (IP) under FP6, http://hearcom.eu/
- FAST – Advanced Signal Processing for Ultra Fast Magnetic Resonance, EU network, http://www.fast-mariecurie-rtn-project.eu/partners-involved/partners-involved-in-the-network.html
- Tsiakoulis, P., A. Chalamandaris, S. Karabetsos, and S. Raptis (2008) “A Statistical Method for Database Reduction for Embedded Unit Selection Speech Synthesis,” submitted to IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2008), March 30 - April 4, Las Vegas, Nevada, USA (2008)
- MobiTalk project: “Speech user interface for mobile handsets,” http://www.mobitalk.gr/
- Karabetsos, S., P. Tsiakoulis, A. Chalamandaris, and S. Raptis (2008), “HMM-based Speech Synthesis for the Greek Language,” TSD 2008 11th International Conference on Text, Speech and Dialogue, Brno, Czech Republic, 8-12 September 2008. (to appear)
- Raptis, S., I. Spais and P. Tsiakoulis (2005) “A Tool for Enhancing Web Accessibility: Synthetic Speech and Content Restructuring”, in Proc. HCII 2005: 11th International Conference on Human-Computer Interaction, 22-27 July, Las Vegas, Nevada, USA
- Georgantopoulos B. and Piperidis S. (2000). A Hybrid Technique for Automatic Term Extraction. Proceedings of International Conference on Artificial and Computational Intelligence for Decision, Control and Automation in Engineering and Industrial Applications (ACIDCA'2000), pp. 124-128.
- Georgantopoulos B. and Piperidis S. (1998). Eliciting Terminological Knowledge for Information Extraction Applications. EURISCON/AMIE Workshop, June 1998, Athens.
- Georgantopoulos B., Piperidis S. (2000). Term–based identification of sentences for text summarization. In Proceedings of the 2nd Conference on Language Resources and Evaluation (LREC 2000). Athens, Greece.
- Boutsis, S., Demiros, S., Giouli, V., Liakata, M., Papageorgiou, H., Piperidis, S. (2000). A system for Recognition of Named Entities in Greek. In: Proceedings of Natural Language Processing 2000, Patras, Greece.
- Demiros, I., Boutsis, I., Giouli, V., Liakata, M., Papageorgiou, H., Piperidis, S., 2000. Named Entity Recognition in Greek Texts. In: Proceedings of the 2nd International Conference on Language Resources and Evaluation, Athens, Greece.
- Giouli, V., Konstandinidis, A., Desypri, E., Papageorgiou, H., Piperidis, S. (2006). Multi-domain Multi-lingual Named Entity Recognition: Revisiting & Grounding the resources issue. In: Proceedings of the 4th International Conference on Language Resources and Evaluation, Genoa, Italy.
- Gkiokas, A., I. Demiros, S. Piperidis. (2006). An analysis of linear weight updating algorithms for text classification, 4th Hellenic Conference on Artificial Intelligence, Heraklion, Crete.
- Georgantopoulos B., Piperidis S. (2000). Term–based identification of sentences for text summarization. In Proceedings of the 2nd Conference on Language Resources and Evaluation (LREC 2000). Athens, Greece.
- Georgantopoulos B., Goedeme T., Lounis S., Papageorgiou H., Tuytelaars T., Van Gool L. (2006). Cross-media summarization in a retrieval setting. Conference on Language Resources and Evaluation (LREC 2006), Genoa, Italy.
- Demiros I., Antonopoulos V., Georgantopoulos B., Triantafyllou Y., Piperidis S. (2001). Connectionist Models for sentence-based text extracts. Proceedings of the 2001 IEEE Systems, Man, and Cybernetics Conference. USA.
- Demiros I., Papageorgiou H., Georgantopoulos B., Piperidis S. (2001). Machine Learning Methods for Text Summarization. Proceedings of Recent Advances in NLP (RANLP 2001), Bulgaria.
- Prokopidis, P., V. Karra, A. Papagianopoulou and S. Piperidis (2008). Condensing sentences for subtitle generation. In Proceedings of the Sixth International Conference on Language Resources and Evaluation, Marrakech, Morocco.
- Marzelou, E., M. Zourari, V. Giouli, S. Piperidis (2008). Building a Greek Corpus for Textual Entailment. In Proceedings of the Sixth International Conference on Language Resources and Evaluation, Marrakech, Morocco.
- Vassiliou, M., Stella Markantonatou. (2003). «Developing a multilingual thesaurus respecting international standards». Greek Language and Terminology. Proceedings of the 4th Meeting, T.E.E., Athens, pp. 263-270
- Alexopoulou, M., Stella Markantonatou, Marianna Mini, Ageliki Fotopoulou. (2007). ‘Using lexical fields to build a semantically-organised lexicon for Modern Greek’. To appear in Proceedings of Advances in Language Engineering for Low- and Middle- Density languages, ASI, Batumi, Georgia, October 15-27, 2007
- Carayannis, G., Stella Markantonatou, Constantinos Vavliakis, Sophia Sotiropoulou, Sister Daniilia, Maria Alexopoulou, Olga Yannoutsou (2008). ‘Eikonognosia: An Integrated System for Advanced Retrieval of Scientific Data and Metadata of Byzantine Artworks Using Semantic Web Technologies’. To be presented at CIDOC 2008, Athens, 15-18 September
- Papageorgiou, H. & A. Protopapas. CIMWOS: A multimedia, multimodal and multilingual indexing and retrieval system. In Ebroul Izquierdo (ed.) Digital Media Processing for Multimedia Interactive Services (pp.563-568); World Scientific Publishing Co., 2003
- Papageorgiou, H., A. Protopapas, & T. Netousek. Retrieving video segments based on combined text, speech and image processing. (2003) Broadcast Engineering Conference Proceedings, pp. 177–182.
- Papageorgiou, H., P. Prokopidis, I. Demiros, N. Hatzigeorgiou, G. Carayanis (2004). CIMWOS: A Multimedia Retrieval System based on Combined Text, Speech and Image Processing, RIAO 2004, Coupling Approaches, Coupling Media and Coupling Languages for Information Retrieval, University of Avignon (Vaucluse), France.
- Piperidis, S., Papageorgiou, H., Pastra, K., Netousek, T., Gaussier, E., Tuytelaars, T., Crestani, F., Bodson, F., Mellor, C. (2006) Multimedia content processing and retrieval in the REVEAL THIS setting, 1st International Conference on Semantic and Digital Media Technologies (SAMT2006), 6-8 December 2006, Athens, Greece.
- Pastra, K., Piperidis, S. (2006) Crossing media for Video Search: enabling usability beyond traditional broadcast and TV, Proceedings of the European Conference EuroITV 2006, 25-26 May 2006, Athens, Greece
- Piperidis, S., Papageorgiou, H. (2005) REVEAL THIS: Retrieval of Multimedia Multilingual Content for the Home User in an Information Society, Proceedings of the 2nd European Workshop on the Integration of Knowledge, Semantics and Digital Media Technology (EWIMT 2005), The IEE, London, 30 November-1 December 2005, pp. 461-465
- Piperidis, S., Demiros, I., Prokopidis, P., (2007) Infrastructure for Multilingual Subtitle Generation, Special Volume of the 9ISSC, Cambridge Scholars Press.
- natural language question-answering systems for document collections, databases, and ontologies (area 5.6),
- textual entailment and paraphrasing (area 1.5),
- text summarization, including query-focused summarization (area 5.5),
- generation of natural language documents from ontologies and databases (area 7),
- document classification, including anti-spam e-mail filtering (area 5.4),
- natural language processing tools, such as named-entity recognizers (area 1.3) and part-of-speech taggers (area 1.1).
- Speech Processing: Speech Enhancement, Speaker Localization and Tracking, Robust Automatic Speech Recognition, Speaker Recognition, Spoken Language Recognition, Emotion Recognition, Sound Recognition, Text-to-Speech Synthesis, Voice Conversion.
- Natural Language Processing: Natural Language Understanding, Spoken Dialogue Processing and Management, Spoken Interaction Strategies, Natural Language Generation, Lexicography, Text Engineering, Information Extraction, Development of Natural Language Interfaces.
- Artificial Intelligence: Search Methods, Problem Solving, Rule Based Systems, Knowledge Representation, Logic Programming, Machine Learning, Intelligent Human-Machine Interaction, User Modeling, Automata Theory, Game Playing, Quantum Artificial Intelligence.
- N. Karanikolas and C. Skourlas. Key-Phrase Extraction for Classification. MEDICON and HEALTH TELEMATICS 2004. IFMBE Proceedings, Health in the Information Society, vol. 6, 2004
- Georgios Paliouras, Mouzakidis Alexandros, Christos Ntoutsis, Angelos Alexopoulos and Christos Skourlas, PNS: Personalized Multi-source News Delivery, Lecture Notes in Computer Science 4252, Knowledge-Based Intelligent Information and Engineering Systems, pp. 1152-1161, 2006
- Vassilas N., Skourlas C., Content-Based Coin Retrieval Using Invariant Features and Self-Organizing Maps, 2006 International Conference on Artificial Neural Networks, Athens, Lecture Notes in Computer Science LNCS Volume 4132/2006, Springer, pp. 113-122
- Skourlas C., Alevizos T., Belsis P., Fragos K., Kaburlasos V.G., Papadakis S., “Fuzzy Interval Numbers (FINs) techniques and its applications in Natural Language Queries Processing and documents classification”, Proceedings of BCI 2007, (Boyanov K. et al., eds.), pp. 17-28.
- Karanikolas N.N., Skourlas C., Bratos J., Extraction of training sets for experimentation with CLIR Systems, Intl. Conf. Automatics and Informatics 06, Oct. 3-6, Sofia
- Alevizos T., Kaburlasos V.G., Papadakis S., Skourlas C., Belsis P., Fuzzy Interval Number Techniques for Multilingual and Cross Language Information Retrieval. International Conference on Enterprise Information Systems ICEIS 2007, Madeira, Portugal, pp. 348-355
- Skourlas C., Alevizos T., Belsis P., Doulkeridis C., Malatras A. “Comparison, Selection and Merging of Techniques, Methods and Tools for Operational CLIR Systems – The Case of the Greek Language”, Proceedings of the WSEAS Conference on Computers 2005, Athens
- Lampropoulos, E. Galiotou, I. Manolessou, A. Ralli, “A Finite-State Approach to the Computational Morphology of Early Modern Greek”, Proceedings of the 7th WSEAS International Conference on Applied Computer Science (ACS'07), Venice, 2007, pp. 242-245.
- D. Sotiropoulos, E. Galiotou, C. Skourlas, “Application of a Word-Alignment Algorithm to Bilingual Greek-Latin Documents”, Proceedings of the 7th WSEAS International Conference on Applied Computer Science (ACS'07), Venice, 2007, pp. 238-241.
- A Greek morphological lexicon, which contains over 90,000 lemmas, corresponding to 1,200,000 word forms (available online: http://www.neurolingo.gr/en/online_tools/lexiscope.htm).
- A Greek thesaurus of synonyms and antonyms, which contains 22,000 lemmas (print edition by Patakis Publications, 2005).
- An electronic dictionary of geographic names and toponyms (available on line: http://www.neurolingo.gr/en/online_tools/toponyms.htm).
- An ontology-based electronic dictionary of biomedical terms.
- Greek proofing tools (speller, hyphenator & thesaurus) for word processors and desktop publishing systems: MS Office (Win and Mac), Open/Star/Neo Office (Win, Mac and Linux) and QuarkXpress (Win and Mac).
- Greek Hyphenator and Speller incorporated in Adobe Creative Suite (Acrobat, Illustrator, InDesign, Flash, Photoshop and Dreamweaver) (Win and Mac).
- Proofing Tools for Open Office bundled together with MAGENTA's Office Suite.
- Greek Lemmatizer, implementing the Word Breaking and Stemming functionality for the Encyclopedia of Hellenic World (http://www.ehw.gr), used via Microsoft Indexing Services.
- Tools for text mining, storing and management of Greek texts (Project of Industrial Research Development, No 99ΒΕ19, Ministry of Development, General Secretariat for Research and Technology).
- Lexicographic database and tools for the creation of the “Dictionary of Modern Greek as Second Language for Secondary School students” (Program for the Education of Muslim Children, Phases 1 and 2, EPEAK ΙΙ, in collaboration with the University of Athens, http://www.museduc.gr/index.php?page=2&sub=223).
- Information extraction by using pattern-based syntactic and semantic analysis, named entities extraction, and event recognition for ICAP S.A.
- Text mining by using ontologies and language processing tools (PARMENIDES, European project IST-2001-39023).
- Text data cleansing by using language processing technologies (Greek National Census 2001).
- Language tools and an ontology in the domain of history and culture (Project Meta-On, Ministry of Development, General Secretariat for Research and Technology).
- Language resources and software systems for processing of biomedical texts: a search engine, a web speller, a web concordancer, a morphosyntactic tagger, a semantic annotator, and an ontology browser (available on the IATROLEXI project site: http://www.iatrolexi.gr).