
**Reading notes**
Research: Pensées Profondes et Dialogues Pertinents

Appendix 2


Note 001: Revamping Question Answering with a Semantic Approach over World Knowledge

Conclusion

This article argues that meaning-based QA systems (QAS) have outperformed text-mining systems that rely only on surface forms, their distribution, and lexicons / thesauri.

The structured resources at our disposal (Wikipedia, Wikidata) must be taken into account, but a formalism / metalanguage / ontology (MultiNet?) is needed to reason over the knowledge base. For now, there is no common formalism.

Metadata

Authors: ~~Nuno Cardoso~~, Iustin Dornescu, Sven Hartrumpf and ~~Johannes Leveling~~
Year: 2010
Keywords: QA, knowledge base, semantic, GikiCLEF, entailment
Citations: 4

Outline

GikiCLEF: QA, entity linking with justification
Thanks to Wikipedia, semantic QAS perform better than so-called textual approaches
Description and limitations of the textual (form-based) approach:
1. form, keywords, co-occurrence, bag-of-words, n-grams, statistics → search engine → ranked paragraphs
2. no disambiguation; "lexical" semantics based on a thesaurus / gazetteers
3. Works well for factual or definition questions
4. No reasoning
5. The textual forms must appear in the KB for the system to be able to answer
Semantic (meaning-based) QA:
1. web of data [IMO: web of meaning]
2. search for and derive facts [IMO: meaning]
3. question expansion at the entity level & reasoning
4. concepts & relations (vs. keywords and co-occurrence in the textual approach)
5. disambiguation of concepts [IMO: senses or entities]
6. RDF data
7. formal semantics [not sure what this means, perhaps "described with a metalanguage"]
8. several sources can be merged
Question analysis
1. Grounding expected answer types (EAT) to a workable category / classification
2. Parse constraints from the question (e.g. "born in Paris", "in the 21st century")
  1. [2]
  2. [4]
3. Semantic analysis: question → machine-readable representation
  1. logical expansion via an inference engine with rules for paraphrases, entailments and logical equivalences, see [4]
  2. question → SPARQL based on grounded concepts, see [2]
4. high precision, low recall
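The "question → SPARQL based on grounded concepts" step can be illustrated with a minimal sketch. It assumes question analysis has already produced an expected answer type and a list of constraints grounded to identifiers. The `GroundedQuestion` class, the `build_sparql` function and the choice of Wikidata-style identifiers are illustrative assumptions, not the system described in [2].

```python
from dataclasses import dataclass, field


@dataclass
class GroundedQuestion:
    """Hypothetical output of question analysis: an EAT plus grounded constraints."""
    eat: str  # expected answer type, grounded to a class identifier
    constraints: list[tuple[str, str]] = field(default_factory=list)  # (property, value)


def build_sparql(question: GroundedQuestion, limit: int = 10) -> str:
    """Turn a grounded question into a SPARQL SELECT query string."""
    # The EAT becomes a type restriction on the answer variable.
    patterns = [f"?answer wdt:P31 {question.eat} ."]
    # Each parsed constraint becomes one extra triple pattern.
    patterns += [f"?answer {prop} {value} ." for prop, value in question.constraints]
    body = "\n  ".join(patterns)
    return f"SELECT DISTINCT ?answer WHERE {{\n  {body}\n}} LIMIT {limit}"


# "Which humans were born in Paris?", with identifiers chosen for illustration.
question = GroundedQuestion(eat="wd:Q5", constraints=[("wdt:P19", "wd:Q90")])
query = build_sparql(question)
print(query)
```

Because every constraint must be grounded before a triple pattern can be emitted, a question with an ungrounded concept yields no query at all, which is one way to picture the "high precision, low recall" remark.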

Interesting references

[2] Cardoso, N., Baptista, D., Lopez-Pellicer, F.J., Silva, M.J.: Where in the Wikipedia is that answer? the XLDB at the GikiCLEF 2009 task. In: Working Notes for CLEF 2009, Corfu, Greece (30 Sept.-2 Oct. 2009)
[4] Hartrumpf, S., Leveling, J.: GIRSA-WP at GikiCLEF: Integration of Structured Information and Decomposition of Questions. In: Working Notes for CLEF 2009, Corfu, Greece (30 Sept.-2 Oct. 2009)

Note 002: Semantic Decomposition for Question Answering

Conclusion

Decomposition of complex questions into less complex sub-questions.

Metadata

Author: Sven Hartrumpf
Date: ???
Keywords:

Outline (TODO)

TODO

Note 003: Question Answering using Sentence Parsing and Semantic Network Matching

Author: Sven Hartrumpf
Date: 2004
Keywords: QA, MultiNet, InSicht

Outline

InSicht:

Deep syntactico-semantic analysis with a parser for questions and documents.
Non-web QA, see Neumann and Xu, 2003
The answer is generated from the semantic representation of the documents that support it. Answers are not extracted directly from the documents.

Similar to Harabagiu et al. (2001), except that it uses a theorem prover.

Document preprocessing
1. Elimination of duplicates
2. WOCADI WSD system (Helbig and Hartrumpf, 1997; Hartrumpf, 2003) → MultiNet semantic network of the document
3. Limitation of the size of the semantic network
Query Expansion
1. A semantic network is generated
2. Equivalence rules and implicational rules for lexemes are used for backward chaining.
3. All rules are applied to find answers that are not explicitly contained in a document but implied by it [IMO: vague].
4. Generation of alternative interpretations based on synonyms, hyponyms, etc.
5. Query expansion is done instead of document expansion because the latter takes too much space.
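Step 4 of the query expansion can be pictured with a toy sketch: alternative interpretations are produced by substituting each query concept with its lexical relatives. The `LEXICON` table and the `alternatives` and `expand_query` functions are hypothetical. InSicht works on MultiNet semantic networks and rule-based backward chaining, not on word tuples as here.

```python
from itertools import product

# Toy lexical relations (assumed data, for illustration only).
LEXICON = {
    "car": {"synonyms": ["automobile"], "hyponyms": ["taxi"]},
    "buy": {"synonyms": ["purchase"], "hyponyms": []},
}


def alternatives(concept: str) -> list[str]:
    """Return the concept itself followed by its synonyms and hyponyms."""
    entry = LEXICON.get(concept, {})
    return [concept, *entry.get("synonyms", []), *entry.get("hyponyms", [])]


def expand_query(concepts: list[str]) -> list[tuple[str, ...]]:
    """Generate every combination of alternatives, original query first."""
    return list(product(*(alternatives(c) for c in concepts)))


variants = expand_query(["buy", "car"])
print(variants)
```

The number of variants is the product of the alternatives per concept, so the cost of expansion grows with the query only, which fits the note that expanding queries is cheaper in space than expanding every document.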
Search
1. There are two spaces: query feature space & match network
2. Simplify the query networks [apparently, they perform fact extraction]
3. ??? [Matching of the question against the KB is not explained]
Answer Generation
1. Generate an answer based on the matching feature spaces
Answer Selection
1. A preference for longer answers and a preference for more frequent answers are combined. Answer length is measured by the number of characters and words. In case of several supporting documents, the document whose ID comes alphabetically first is picked. This strategy is simple and open to improvements but works surprisingly well so far.
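The selection heuristic above can be sketched as follows. These notes do not record how the two preferences are combined, so the score used here (frequency first, then length in words, then length in characters) is an assumption, as are the `Candidate` type and the `select_answer` function.

```python
from collections import defaultdict
from typing import NamedTuple


class Candidate(NamedTuple):
    answer: str
    doc_id: str  # ID of the supporting document


def select_answer(candidates: list[Candidate]) -> Candidate:
    """Pick the answer preferred by frequency and length, with its first document."""
    docs_by_answer: dict[str, list[str]] = defaultdict(list)
    for cand in candidates:
        docs_by_answer[cand.answer].append(cand.doc_id)

    def score(answer: str) -> tuple[int, int, int]:
        # Assumed combination: more frequent, then more words, then more characters.
        return (len(docs_by_answer[answer]), len(answer.split()), len(answer))

    best = max(docs_by_answer, key=score)
    # Several supporting documents: keep the alphabetically first ID.
    return Candidate(best, min(docs_by_answer[best]))


candidates = [
    Candidate("Berlin", "doc_b"),
    Candidate("the city of Berlin", "doc_c"),
    Candidate("Berlin", "doc_a"),
]
print(select_answer(candidates))
```

With this ordering, frequency dominates and length only breaks ties; a weighted sum would be an equally plausible reading of "combined".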
Conclusion:
1. Missing inferential knowledge: encode and semi-automatically acquire entailments etc.
2. Limited parser coverage
3. Ignoring partial semantic networks (produced by the parser in chunk mode): devise methods to utilize partial semantic networks for finding answers
4. Answers spread across several sentences are not found: apply the text mode of the parser (involving intersentential coreference resolution)
5. Long processing time for documents: optimize the parser and develop on-demand processing strategies.

Note 004: Issues, Tasks and Program Structures, to Roadmap Research in Question & Answering (Q&A)

Conclusion

Are world knowledge, background knowledge and common knowledge different things?
Very interesting: it lays out the research objectives for the years around 2005
To reread

Metadata

Authors: see the document, which contains a list of email addresses
Date: 2003
Keywords: QA

Outline

Different levels of question sophistication
1. casual → simple facts / definitions found in a single document [full-text search]
2. template →
3. cub reporter →
4. analyst →
  1. Complex, uses judgment terms, knowledge of the user context needed, broad scope, background / world knowledge
  2. Search multiple sources (media / languages), fusion of information, alternative views, adding interpretation [??], drawing conclusions
What makes a good QA system:
1. Real time, multiple users, fresh news / facts, always on
2. Accuracy,
  1. Precision >> incorrect answers, i.e. no answer is better than a wrong one.
  2. contradictions must be discovered and conflicting information must be dealt with
  3. QA must have world knowledge and mimic common-sense inference
3. Usability,
  1. domain-specific procedural knowledge must be incorporated
  2. domain-specific knowledge and its incorporation in the open-domain ontologies is very important
  3. heterogeneous data sources are used: information may be available in texts, in databases, in video clips or other media
  4. It must allow the user to describe the context of the question, and must provide explanatory knowledge and ways of visualizing and navigating it
4. Completeness,
  1. Answer fusion into coherent information is required.
  2. Must incorporate reasoning capabilities
  3. Acquisition of user profiles is a method of enabling collaboration
5. Relevance,
  1. Specifics of the context
  2. Interactive Q/A, in which a sequence of questions helps clarify an information need, may be necessary
  3. The evaluation of Q&A systems must be user-centered
Dimensions of Q/A research (p. 4), how to move the R&D plane...
1. Question side of the QA problem: the origin is a simple factual question, and the axes are labeled
  1. Context
  2. Increasing Knowledge Requirements
  3. Judgement
  4. Scope
2. Answer side of the QA problem: the origin is a simple answer from a single source
  1. Fusion
  2. Increasing Knowledge Requirements
  3. Interpretation
  4. Multiple sources
Issues in Q&A Research
1. Question Classes: need for question taxonomies
  1. Graesser conducted a study of empirical completeness, showing that the taxonomy is able to accommodate virtually all inquiries that occur in a discourse.
  2. QUALM question categories
  3. TREC-8 Lasso
  4. book: The Process of Question Answering
  5. What is the question focus? But in addition, world knowledge interacts with profiling information
  6. Identify criteria along which question taxonomies should be formed.
  7. Correlate question classes with question complexity.
  8. Identify criteria marking the complexity of a question.
  9. Study models of question processing based on ontologies and knowledge bases. [specific domain vs NER]
2. Question Processing: Understanding, Ambiguities, Implicatures and Reformulation
  1. Detect which utterances are questions
  2. Question processing must allow for follow-up questions and, furthermore, for dialogues in which the user and the system interact in a sequence of questions and answers, forming a common ground of beliefs, intentions and understanding.
  3. Some questions require implicatures. One such class is questions involving superlatives (e.g. “Which is the largest city in Europe?”) or comparatives (“What French cities are larger than Bordeaux?”). Resolving such questions requires the ability to generate scalar implicatures.
  4. Sometimes more complicated inferences based on pragmatics are needed. A classic example was introduced in Wendy Lehnert’s paper
  5. Question processing must also incorporate the process of translating a question into a set of equivalent questions
  6. ...
3. Context and Q&A
  1. A formal theory of the logic of contextual objects has been proposed by John McCarthy. Revisiting this theory and creating models for Q&A is necessary.
  2. Data sources for Q&A
    1. Put together WordNet, FrameNet and other ontologies
    2. Put together data from different formats (data.world, jailbreak, okfn datapackages)
4. Data sources for QA
  1. ...
5. Answer Extraction
  1. Extraction of simple and distributed answers; Answer Justification and Evaluation of Answer Correctness
  2. ...
6. Answer formulation
  1. ...
7. Real Time Question Answering
  1. Detect time bottlenecks: retrieval and answer extraction
  2. Study fast models of retrieval
  3. Study answer extraction techniques that have small execution times; enhance reasoning mechanisms with fast execution capabilities. In many applications execution time is essential: either the customer loses patience and interest, or people are on critical missions (pilots in action, military on a mission).
  4. Study scalability issues – Solution: distributed Q&A.
8. Multi-Lingual Question Answering
9. Interactive QA
10. User Profiling for Q&A
11. Collaborative Q&A

Note 005: Automatic Semantic Analysis for NLP Applications

Conclusion

A deep semantic representation is appropriate for correct NLP
Problem of lexical and world knowledge acquisition. Automatic? Core concepts are still needed.
Short presentation of MultiNet

Metadata

Authors: Ingo Glöckner, Sven Hartrumpf, Hermann Helbig, Johannes Leveling & Rainer Osswald
Date: 2007
Keywords: QA, deep semantic analysis, MultiNet

Outline

A “grand unified” formal semantic theory covering all or at least most of these phenomena is far from being available
Requirements for application-oriented semantic analysis
1. universal: not specific to a given model or language
2. complete: sufficiently expressive to cover all aspects of NL
3. homogeneous & interoperable: because the same representation is used for disambiguation, reasoning, the semantic parser, inference and the NL generator
4. communicability: inter-coder agreement
5. allow for an efficient implementation
Question answering as an integrated task for semantic NLP
1. The most comprehensive NLP task integrating all aspects of NL understanding is question answering based on deep semantic analysis
2. approaches seem to have neglected word sense disambiguation
Meaning Representation with MultiNet
1. Concepts (vertices) & relations (edges)
2. Distinction between generic concepts and instantiated concepts, which are related by subordination (SUB or SUBS)
3. Several attribute layers (FACT, REFER, QUANT)
4. Every concept is classified in a predefined conceptual ontology forming a hierarchy of sorts
5. Every edge is labeled by a member of a fixed set of relations and functions
  1. 140 basic semantic relations
  2. three groups
    1. the intensional level
    2. the preextensional level
    3. lexical relations
  3. Relations and functions can be regarded as nodes at a meta level that are connected by axiomatic rules. These axioms, which have the form of logical implications, are of central importance to inference processes working over a MultiNet knowledge base. There are two general types of axioms:
    1. R-axioms: connect relations and functions with each other and do not contain NL concepts.
    2. B-axioms: connect relations and functions with representatives of NL concepts.
6. distinction between an intensional and a preextensional level
7. Categorical and prototypical knowledge together form the immanent knowledge
8. stratification of different conceptual aspects, a.k.a. layers
  1. The layer specifications for edges are expressed by the attribute K-TYPE,
  2. GENER, FACT, REFER, QUANT, ETYPE, CARD, VARIA
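A toy sketch may help picture concepts with layer attributes, labeled edges, and an axiom of the implication form applied by an inference process. The `SemanticNetwork` class and its methods are hypothetical. Only the relation name SUB and the attribute names come from the notes. The attribute values and the use of transitivity of SUB as the example axiom are assumptions made for illustration.

```python
from collections import defaultdict


class SemanticNetwork:
    """Toy concept graph: nodes carry layer attributes, edges carry a relation label."""

    def __init__(self) -> None:
        self.layers: dict[str, dict[str, str]] = {}
        self.edges: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def add_concept(self, name: str, **layers: str) -> None:
        self.layers[name] = dict(layers)

    def relate(self, relation: str, source: str, target: str) -> None:
        self.edges[relation].add((source, target))

    def close_transitively(self, relation: str) -> None:
        """Apply (x R y) and (y R z) implies (x R z) until nothing new is derived."""
        pairs = self.edges[relation]
        changed = True
        while changed:
            derived = {(x, z) for x, y in pairs for y2, z in pairs if y == y2}
            changed = not derived <= pairs
            pairs |= derived  # in-place update of the stored edge set


net = SemanticNetwork()
net.add_concept("animal", GENER="ge")             # generic concept (assumed value)
net.add_concept("dog", GENER="ge")
net.add_concept("rex", GENER="sp", REFER="det")   # instantiated concept (assumed values)
net.relate("SUB", "rex", "dog")
net.relate("SUB", "dog", "animal")
net.close_transitively("SUB")
print(("rex", "animal") in net.edges["SUB"])
```

The rule mentions only the relation and no particular concept, which is the shape the notes attribute to R-axioms; a rule tied to a specific concept such as "dog" would instead resemble a B-axiom.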
Automatic Semantic Analysis
1. Lexicon: HanGaLex
2. Syntactico-Semantic Parser: WOCADI based on WCFA (Helbig 1986)
3. Semantic Integration (assimilation)
4. Resolution of direct references: anaphoric, cataphoric, deictic.
5. coreference resolution
6. Logical recurrence and bridging references: CORUDIS
Applications

Note 006: THE USE OF MULTILAYERED EXTENDED SEMANTIC NETWORKS FOR MEANING REPRESENTATION

Conclusion

Short presentation of MultiNet.

Metadata

Author: Hermann Helbig
Date: 2002
Keywords: MultiNet, deep semantic

Outline

Logic-oriented knowledge representation systems (KRS) cover a restricted fragment of NL
1. purely extensional ⇒ what is the extension of a mountain? or a hill? or charm?
2. model-theoretic
3. claim that propositional sentences can be reduced to truth conditions ⇒ makes them poor at representing conjunctions
KRS from AI (SN or frame based),
1. have broader coverage but lack a deeper understanding of their logical properties
2. basic idea stemming from cognition
MultiNet
1. underlying criteria for building it:
  1. universality
  2. cognitive adequacy: every concept must have a unique concept representative (not double negation??)
  3. homogeneity: usable for describing word senses and sentence meaning
  4. interoperability: must be applicable in all components of an NLP system
  5. completeness
  6. consistency
  7. optimal granularity
  8. local interpretability
2. [More description of multinet...]
3. [Example use for gradation entailment]
4. Different types of negation
5. Tools related to MultiNet

Note 007: Automatic Generation of Large Knowledge Bases using Deep Semantic and Linguistically Founded Methods

Conclusion

According to this article, encapsulating a text ["mise en capsule", a tentative French translation of "frame"] comes down to resolving the different forms of reference.

Metadata

Authors: Sven Hartrumpf, Hermann Helbig and Ingo Phoenix
Date:
Keywords: Semantic Analysis, Knowledge Bases, Text Understanding, NLP, Reference Resolution

Outline

From syntactic-semantic analysis of single sentences to a KB representing the whole text
There are several approaches:
1. Statistical or pattern-based approaches or vector space models → for extracting semantic information (e.g. semantic relations like conceptual subordination, part-whole relationships, etc.) BUT they do not cover the whole spectrum of semantic relationships, nor do they provide a clear logical and semantic representation
2. Linguistically motivated approaches with strong syntactic-semantic analysis BUT limited semantic depth, a.k.a. shallow approaches
There is also MultiNet.
Meaning Representation in MultiNet
1. Each node is classified into one of 45 basic sorts
2. inner structure with attribute-value pairs:
  1. Degree of generality (GENER in (ge, sp))
  2. Determination (REFER in (det, indet))
  3. Extensionality type (ETYPE):
    1. nil: no extension
    2. 0: individual that is not a set
    3. 1: entity with a set of ETYPE=0 elements as extension, e.g. many houses, the family
    4. 2: entity with a set of ETYPE=1 elements, e.g. many families
3. The arcs may only be labeled by members of a fixed set of relations and functions
4. MultiNet provides a predefined set of semantic features
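The extensionality types listed above follow a simple recursion, which the sketch below makes explicit: an individual has ETYPE 0, and a set whose elements all have ETYPE n has ETYPE n + 1. Modelling entities as strings and frozensets, and the function name `etype`, are assumptions made for illustration. The nil case (no extension) is represented by `None`.

```python
from typing import Optional, Union

Entity = Union[str, frozenset, None]


def etype(entity: Entity) -> Optional[int]:
    """Return the extensionality type of a toy entity, or None for 'nil'."""
    if entity is None:
        return None  # nil: no extension
    if isinstance(entity, str):
        return 0  # individual that is not a set
    if not entity:
        raise ValueError("an empty set has no element to derive an ETYPE from")
    levels = {etype(member) for member in entity}
    if len(levels) != 1 or None in levels:
        raise ValueError("all elements must share one ETYPE")
    return levels.pop() + 1


family_a = frozenset({"Ann", "Bob"})        # "the family"
family_b = frozenset({"Cid", "Dee"})
families = frozenset({family_a, family_b})  # "many families"
print(etype("Ann"), etype(family_a), etype(families))
```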
Treating text-constituting phenomena by assimilation
1. Grammatical and semantic references
  1. Coreference
  2. Proforms
  3. Inclusion
  4. Semantic Recurrence
