Marcos Garcia
Postdoctoral Researcher in Natural Language Processing.
LyS (Language and Information Society) Group,
- I am currently working at the CiTIUS research center of the University of Santiago de Compostela.
- I gave an invited talk about open resources for Galician at OpenCor 2020 (co-located with PROPOR 2020, March 2).
- I'm co-organizing the Workshop on Hybrid Intelligence for Natural Language Processing Tasks (HI4NLP), which will be co-located with ECAI-2020.
- I was co-chair of techLING (track 3, Computational Linguistics), held in Corunha (October 9 to 11).
- I was guest editor of the Special Issue "Natural Language Processing and Text Mining" (open-access journal Information).
Education
- Ph.D. in Linguistics (NLP), University of Santiago de Compostela (2014).
- D.E.A. in Galician and Portuguese Philology, University of Santiago de Compostela (2009).
- MA in Linguistics, University of Lisbon (2008).
- Licenciatura in Portuguese Philology, University of Santiago de Compostela (2005).
Awards
- Best PhD Dissertation Award at PROPOR 2016.
- Best PhD Award in Arts and Humanities 2014/2015 at the USC.
Grants
- Leonardo Grant for Researchers and Cultural Creators, BBVA Foundation, 2017.
- Juan de la Cierva incorporación 2016 (postdoctoral grant).
- Juan de la Cierva formación 2014 (postdoctoral grant).
- PhD Grant, University of Santiago de Compostela, 2010.
- Research Grant, Instituto Camões, 2007-2009.
Affiliations
- 2017 (current): CITIC.
- 2016 (current): LyS Group.
- 2011-2015: CiTIUS researcher.
- 2009-2015: GE / ProLNat Group.
- 2007-2008: Galabra Group.
- 2006-2007: Natural Language and Speech Group, NLX.
Teaching
- Introduction to Natural Language Processing for Lexicography (EMLex - European Master in Lexicography, UMinho), 2018/2019.
- Resources and tools for lexicography: use and design II (EMLex - European Master in Lexicography, USC), 2017/2018, 2018/2019.
- Language and technologies (Faculty of Philology, UdC), 2016/2017, 2017/2018.
- General linguistics (Faculty of Philology, UdC), 2016/2017, 2017/2018.
- Phonetics and phonology of Spanish (Faculty of Philology, USC), 2011/2012.
- Computational analysis of Spanish Texts (Faculty of Philology, USC), 2010/2011.
- Acoustic phonetics: theory and software (Voice Science Postgraduate at ISAVE), 2008/2009.
Publications
2019
- Garcia, Marcos, Marcos García-Salido and Margarita Alonso-Ramos, 2019. Weighted compositional vectors for translating collocations using monolingual corpora. In Computational and Corpus-Based Phraseology (EUROPHRAS 2019). Lecture Notes in Artificial Intelligence, 11755. Springer: 113-128.
- García-Salido, Marcos, Marcos Garcia and Margarita Alonso-Ramos, 2019. Identifying lexical bundles for an academic writing assistant in Spanish. In Computational and Corpus-Based Phraseology (EUROPHRAS 2019). Lecture Notes in Artificial Intelligence, 11755. Springer: 144-158.
- Gamallo, Pablo, Marcos Garcia and Patricia Martín-Rodilla, 2019. NER and Open Information Extraction for Portuguese. Notebook for IberLEF 2019 Portuguese Named Entity Recognition and Relation Extraction Tasks. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019), co-located with the 35th Conference of the Spanish Society for Natural Language Processing (SEPLN 2019): 457-467.
- Canosa, Xavier, Pablo Gamallo, Xavier Varela, José Ángel Taboada, Paulo Martínez Lema and Marcos Garcia, 2019. Uma utilidade para o reconhecimento de topónimos em documentos medievais. Linguamática, 11(1), p. 3-15.
- Garcia, Marcos and Marcos García-Salido, 2019. A method to automatically identify diachronic variation in collocations. In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change 2019 at the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019): 71-80, Florence.
- Gamallo, Pablo and Marcos Garcia, 2019. Unsupervised Compositional Translation of Multiword Expressions. In Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019) at the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019): 40-48, Florence.
- Garcia, Marcos, Marcos García-Salido and Margarita Alonso-Ramos, 2019. A comparison of statistical association measures for identifying dependency-based collocations in various languages. In Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019) at the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019): 49-59, Florence.
- Garcia, Marcos, Marcos García-Salido, Susana Sotelo Docío, Estela Mosqueira and Margarita Alonso-Ramos, 2019. Pay attention when you pay the bills. A multilingual corpus with dependency-based and semantic annotation of collocations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019): 4012-4019, Florence.
- Garcia, Marcos, Marcos García-Salido and Margarita Alonso-Ramos, 2019. Towards the automatic construction of a multilingual dictionary of collocations using distributional semantics. In Proceedings of eLex 2019: Smart Lexicography: 747-762, Sintra.
- García-Salido, Marcos, Marcos Garcia and Margarita Alonso-Ramos, 2019. Towards a graded dictionary of Spanish collocations. In Proceedings of eLex 2019: Smart Lexicography: 849-864, Sintra.
- Garcia, Marcos, Marcos García-Salido and Miguel A. Alonso, 2019. Exploring cross-lingual word embeddings for the inference of bilingual dictionaries. In Proceedings of TIAD-2019 Shared Task – Translation Inference Across Dictionaries co-located with the 2nd Language, Data and Knowledge Conference (LDK 2019): 32-41. Leipzig. CEUR-WS, Vol. 2493.
- Garcia, Marcos, Marcos García-Salido and Margarita Alonso-Ramos, 2019. Discovering bilingual collocations in parallel corpora: A first attempt at using distributional semantics. In Irene Doval & María Teresa Sánchez-Nieto (eds.), Parallel corpora for contrastive and translation studies: New resources and applications. Studies in Corpus Linguistics, 90, p. 267-279. John Benjamins Publishing Company.
2018
- Garcia, Marcos, 2018. Comparing bilingual word embeddings to translation dictionaries for extracting multilingual collocation equivalents. In Stella Markantonatou, Carlos Ramisch, Agata Savary and Veronika Vincze (eds.), Multiword expressions at length and in depth: Extended papers from the MWE 2017 workshop. Phraseology and Multiword Expressions 3: 319-342. Language Science Press.
- Gamallo, Pablo, Marcos Garcia, César Piñeiro, Rodrigo Martínez-Castaño and Juan C. Pichel, 2018. LinguaKit: a Big Data-based multilingual tool for linguistic analysis and information extraction. In Proceedings of The Second International Workshop on Advances in Natural Language Processing (ANLP 2018) at The Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS-2018): 239-244. Valencia.
- Gamallo, Pablo and Marcos Garcia, 2018. Task-Oriented Evaluation of Dependency Parsing with Open Information Extraction. In Villavicencio, Aline, Viviane Moreira, Alberto Abad, Helena Caseli, Pablo Gamallo, Carlos Ramisch, Hugo Gonçalo Oliveira and Gustavo Henrique Paetzold (eds.), Computational Processing of the Portuguese Language. 13th International Conference, PROPOR 2018, Canela, Brazil, Proceedings, volume 11122 of Lecture Notes in Artificial Intelligence: 77-82, Springer. (draft)
- Silva, João, Marcos Garcia, João Rodrigues and António Branco, 2018. LX-SemanticSimilarity. In 13th International Conference on the Computational Processing of the Portuguese Language (PROPOR 2018). Demo papers: 4-6. Canela, Brazil.
- Garcia, Marcos, 2018. Extracción automática de equivalentes multilingües de colocaciones. Procesamiento del Lenguaje Natural, 61, p. 131-134.
- García-Salido, Marcos and Marcos Garcia, 2018. Comparing learners’ and native speakers’ use of collocations in written Spanish. International Review of Applied Linguistics in Language Teaching (IRAL) 56(4), p. 401-426 (aop 2017). (draft)
- Gamallo, Pablo and Marcos Garcia, 2018. Dependency parsing with finite state transducers and compression rules. Information Processing & Management, 54(6), p. 1244-1261. (draft)
- García-Salido, Marcos, Marcos Garcia, Milka Villayandre and Margarita Alonso-Ramos, 2018. A Lexical Tool for Academic Writing in Spanish based on Expert and Novice Corpora. In Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis and Takenobu Tokunaga (eds.), Proceedings of the 11th edition of the Language Resources and Evaluation Conference (LREC 2018), Miyazaki: 260-265.
- Gamallo, Pablo, Iván Rodríguez-Torres and Marcos Garcia, 2018. Distributional Semantics for Diachronic Search. Computers and Electrical Engineering (Special section on New Trends in Humanistic Informatics: Implementations and Applications), 65, p. 438-448.
- Garcia, Marcos, Carlos Gómez-Rodríguez and Miguel A. Alonso, 2018. New treebank or repurposed? On the feasibility of cross-lingual parsing of Romance languages with Universal Dependencies. Natural Language Engineering, 24(1), p. 91-122. (draft)
2017
- Querido, Andreia, Rita de Carvalho, João Rodrigues, Marcos Garcia, João Silva, Catarina Correia, Nuno Rendeiro, Rita Pereira, Marisa Campos and António Branco, 2017. LX-LR4DistSemEval: a collection of language resources for the evaluation of distributional semantic models of Portuguese. Revista da Associação Portuguesa de Linguística, 3, p. 265-283.
- Alonso-Ramos, Margarita, Marcos García-Salido and Marcos Garcia, 2017. Exploiting a Corpus to Compile a Lexical Resource for Academic Writing: Spanish Lexical Combinations. In Electronic lexicography in the 21st century. Proceedings of the eLex 2017 conference, Leiden: 571-584.
- Vilares, David, Marcos Garcia, Miguel A. Alonso and Carlos Gómez-Rodríguez, 2017. Towards Syntactic Iberian Polarity Classification. In Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA 2017) at EMNLP 2017: Conference on Empirical Methods in Natural Language Processing, Copenhagen: 67-73.
- Garcia, Marcos and Pablo Gamallo, 2017. A rule-based system for cross-lingual parsing of Romance languages with Universal Dependencies. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, Vancouver: 274-282.
- Gamallo, Pablo and Marcos Garcia, 2017. LinguaKit: uma ferramenta multilingue para a análise linguística e a extração de informação. Linguamática, 9(1), p. 19-28.
- Gamallo, Pablo, Iván Rodríguez-Torres and Marcos Garcia, 2017. A Web Interface for Diachronic Semantic Search in Spanish. In Proceedings of the Software Demonstrations at the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), Valencia: 45-48.
- Garcia, Marcos, Marcos García-Salido and Margarita Alonso-Ramos, 2017. Using bilingual word-embeddings for multilingual collocation extraction. In Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017) at the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), Valencia: 21-30.
2016
- Garcia, Marcos, Carlos Gómez-Rodríguez and Miguel A. Alonso, 2016. Creación de un treebank de dependencias universales mediante recursos existentes para lenguas próximas: el caso del gallego. Procesamiento del Lenguaje Natural, 57, p. 33-40.
- Garcia, Marcos, 2016. Universal Dependencies Guidelines for the Galician-TreeGal Treebank. Technical Report, LyS Group, University of Corunha.
- Garcia, Marcos, 2016. Semantic Relation Extraction. Resources, Tools and Strategies. In João Silva, Ricardo Ribeiro, Paulo Quaresma, André Adami and António Branco (eds.), PROPOR 2016, Computational Processing of the Portuguese Language. Lecture Notes in Artificial Intelligence, 9727. Springer: 141-152. Best PhD Dissertation Award at PROPOR 2016.
- Gamallo, Pablo and Marcos Garcia, 2016. Entity Linking with Distributional Semantics. In João Silva, Ricardo Ribeiro, Paulo Quaresma, André Adami and António Branco (eds.), PROPOR 2016, Computational Processing of the Portuguese Language. Lecture Notes in Artificial Intelligence, 9727. Springer: 177-188.
- Garcia, Marcos, 2016. Incorporating Lexico-semantic Heuristics into Coreference Resolution Sieves for Named Entity Recognition at Document-level. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis (eds.), Proceedings of the 10th edition of the Language Resources and Evaluation Conference (LREC 2016), Portorož: 3357-3361.
2015
- Gamallo, Pablo and Marcos Garcia, 2015. Multilingual Open Information Extraction. In Francisco Pereira, Penousal Machado, Ernesto Costa and Amílcar Cardoso (eds.): EPIA 2015, Progress in Artificial Intelligence. Lecture Notes in Computer Science, 9273. Berlin: Springer-Verlag: 711-722.
- Garcia, Marcos and Pablo Gamallo, 2015. Exploring the Effectiveness of Linguistic Knowledge for Biographical Relation Extraction. Natural Language Engineering, 21(4), p. 519-551 (First Online, 2013).
- Garcia, Marcos and Pablo Gamallo, 2015. Yet Another Suite of Multilingual NLP Tools. In José-Luis Sierra-Rodríguez, José Paulo Leal and Alberto Simões (eds.), Languages, Applications and Technologies. Communications in Computer and Information Science, 563. Switzerland: Springer: 65-75. Revised Selected Papers of the Symposium on Languages, Applications and Technologies (SLATE 2015), Madrid.
- Gamallo, Pablo, Marcos Garcia, Iria del Río and Isaac González López, 2015. Avalingua: Natural language processing for automatic error detection. In Marcus Callies and Sandra Götz (eds.), Learner Corpora in Language Testing and Assessment. Studies in Corpus Linguistics, 70, p. 35-58. John Benjamins Publishing Company. (draft)
2014
- Garcia, Marcos, 2014. Extracção de relações semânticas. Recursos, ferramentas e estratégias. PhD Thesis. University of Santiago de Compostela. Best PhD Award in Arts and Humanities 2014/2015.
- Abuín, José Manuel, Juan Carlos Pichel, Tomás Fernández Pena, Pablo Gamallo and Marcos Garcia, 2014. Perldoop: Efficient Execution of Perl Scripts on Hadoop Clusters. In Proceedings of the 2014 IEEE International Conference on Big Data (IEEE Big Data 2014). Washington DC.
- Gamallo, Pablo, Juan Carlos Pichel, Marcos Garcia, José Manuel Abuín and Tomás Fernández Pena, 2014. Análisis morfosintáctico y clasificación de entidades nombradas en un entorno Big Data. Procesamiento del Lenguaje Natural, 53, p. 17-24.
- Garcia, Marcos and Pablo Gamallo, 2014. Entity-Centric Coreference Resolution of Person Entities for Open Information Extraction. Procesamiento del Lenguaje Natural, 53, p. 25-32.
- Garcia, Marcos, Pablo Gamallo, Iria Gayo and Miguel Anxo Pousada Cruz, 2014. PoS-tagging the Web in Portuguese. National varieties, text typologies and spelling systems. Procesamiento del Lenguaje Natural, 53, p. 95-101.
- Gamallo, Pablo, Marcos Garcia, Susana Sotelo and José Ramom Pichel, 2014. Comparing Ranking-based and Naive Bayes Approaches to Language Detection on Tweets. In Proceedings of TweetLID: Twitter Language Identification Workshop at XXX Congreso de la Sociedad Española de Procesamiento del Lenguaje Natural (SEPLN 2014), Girona: 12-16.
- Garcia, Marcos and Pablo Gamallo, 2014. An Entity-Centric Coreference Resolution System for Person Entities with Rich Linguistic Information. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, Dublin: 741-752.
- Gamallo, Pablo and Marcos Garcia, 2014. Citius: A Naive-Bayes Strategy for Sentiment Analysis on English Tweets. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), Dublin: 171-175.
- Garcia, Marcos and Pablo Gamallo, 2014. Multilingual corpora with coreferential annotation of person entities. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk and Stelios Piperidis (eds.), Proceedings of the 9th edition of the Language Resources and Evaluation Conference (LREC 2014), Reykjavik: 3229-3233.
2013
- Gamallo, Pablo, Marcos Garcia and Santiago Fernández-Lanza, 2013. A Naive-Bayes strategy for sentiment analysis on Spanish tweets. In Alberto Díaz Esteban, Iñaki Alegria and Julio Villena Román (eds.), Proceedings of the Workshop on Sentiment Analysis (TASS 2013) at the XXIX Congreso de la Sociedad Española de Procesamiento del Lenguaje Natural (SEPLN 2013), Madrid: 126-132.
- Gamallo, Pablo, Marcos Garcia and José Ramom Pichel, 2013. A Method to Lexical Normalisation of Tweets. In Alberto Díaz Esteban, Iñaki Alegria and Julio Villena Román (eds.), Proceedings of the Tweet Normalization Workshop at the XXIX Congreso de la Sociedad Española de Procesamiento del Lenguaje Natural (SEPLN 2013), Madrid: 81-85.
- Gamallo, Pablo, Marcos Garcia, Isaac González, Marta Muñoz and Iria del Río, 2013. An evaluation of Avalingua based on learner corpora. In Proceedings of the Workshop on (Learner) Corpora and their application in language testing and assessment at English corpus linguistics on the move: Applications and implications (ICAME 34), Santiago de Compostela: 52-53.
- Gamallo, Pablo and Marcos Garcia, 2013. FreeLing e TreeTagger: um estudo comparativo no âmbito do Português. Technical Report, ProLNat Group, University of Santiago de Compostela.
- Gamallo, Pablo, Marcos Garcia, Isaac González, Marta Muñoz and Iria del Río, 2013. Learning verb inflection using Cilenis conjugators. In Ana Gimeno (ed.), The Eurocall Review, 21(1), p. 12-19.
2012
- Gamallo, Pablo and Marcos Garcia, 2012. Técnicas de procesamiento del lenguaje natural en la Recuperación de Información. Novática, 215, p. 42-47.
- Garcia, Marcos, Iria Gayo and Isaac González López, 2012. Identificação e Classificação de Entidades Mencionadas em Galego. Estudos de Lingüística Galega, 4, p. 13-25.
- Gamallo, Pablo, Marcos Garcia and Santiago Fernández-Lanza, 2012. Dependency-Based Open Information Extraction. In Proceedings of the ROBUS-UNSUP 2012: Joint Workshop on Unsupervised and Semi-Supervised Learning in NLP at the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012). Avignon: 10-18.
- Gamallo, Pablo and Marcos Garcia, 2012. Extraction of Bilingual Cognates from Wikipedia. In Helena Caseli, Aline Villavicencio, António Teixeira and Fernando Perdigão (eds.): PROPOR 2012, Computational Processing of the Portuguese Language. Lecture Notes in Artificial Intelligence, 7243. Berlin: Springer-Verlag: 63-72.
- Garcia, Marcos and Isaac González López, 2012. Automatic Phonetic Transcription by Phonological Derivation. In Helena Caseli, Aline Villavicencio, António Teixeira and Fernando Perdigão (eds.): PROPOR 2012, Computational Processing of the Portuguese Language. Lecture Notes in Artificial Intelligence, 7243. Berlin: Springer-Verlag: 350-361.
2011
- Garcia, Marcos and Pablo Gamallo, 2011. A Weakly-Supervised Rule-Based Approach for Relation Extraction. In Jose A. Lozano, Jose A. Gámez and José A. Moreno Pérez (eds.), Proceedings of the XIV Conference of the Spanish Association for Artificial Intelligence (CAEPIA 2011). Workshop on Knowledge Extraction and Exploitation from Semi-structured Online Sources (KEESOS). La Laguna.
- Gamallo, Pablo and Marcos Garcia, 2011. A Resource-Based Method for Named Entity Extraction and Classification. In L. Antunes and H. S. Pinto (eds.): EPIA 2011, Progress in Artificial Intelligence. Lecture Notes in Computer Science (LNCS/LNAI), 7026/2011. Berlin: Springer-Verlag: 610-623.
- Garcia, Marcos and Pablo Gamallo, 2011. Dependency-Based Text Compression for Semantic Relation Extraction. In Preslav Nakov, Zornitsa Kozareva, Kuzman Ganchev and Jerry Hobbs (eds.), Proceedings of the Workshop on Information Extraction and Knowledge Acquisition (IEKA 2011) at 8th International Conference on Recent Advances in Natural Language Processing (RANLP 2011), Hissar: 21-28.
- Garcia, Marcos and Pablo Gamallo, 2011. Evaluating Various Linguistic Features on Semantic Relation Extraction. In Galia Angelova, Kalina Bontcheva, Ruslan Mitkov and Nikolai Mikolov (eds.), Proceedings of the 8th International Conference on Recent Advances in Natural Language Processing (RANLP 2011), Hissar: 721-726.
- Garcia, Marcos and Isaac González López, 2011. Conversión Fonética Automática con Información Fonológica para el Gallego. Procesamiento del Lenguaje Natural, 47, p. 283-291.
- Garcia, Marcos and Pablo Gamallo, 2011. Resolución de Correferencia de Nombres de Persona para Extracción de Información Biográfica. Procesamiento del Lenguaje Natural, 47, p. 47-55.
- Garcia, Marcos and Pablo Gamallo, 2011. An Exploration of the Linguistic Knowledge for Semantic Relation Extraction in Spanish. In Patrick Saint-Dizier and Rutu Mehta-Melkar (eds.), Proceedings of the Joint Workshop FAM-LbR/KRAQ'11. Learning by Reading and its Applications in Intelligent Question-Answering at 22nd International Joint Conference on Artificial Intelligence (IJCAI'11), Barcelona: 7-12.
2010
- Garcia, Marcos, 2010. O Segmento lateral /l/ em Rima Interna. Sonoridade e Nuclearização em Português Europeu. Linguística. Revista de Estudos Linguísticos da Universidade do Porto, 5, p. 53-70.
- Garcia, Marcos and Pablo Gamallo, 2010. Análise Morfossintáctica para Português Europeu e Galego: Problemas, Soluções e Avaliação. Linguamática, 2(2), p. 59-67.
- Garcia, Marcos and Pablo Gamallo, 2010. Using Morphosyntactic Post-processing to Improve POS-tagging Accuracy. In Proceedings of the 9th International Conference on Computational Processing of Portuguese Language (PROPOR 2010). Extended Activities Proceedings, Porto Alegre.
- Garcia, Marcos and Pablo Gamallo, 2010. Do processamento morfológico à análise sintáctica de corpora multilíngue. In Actas del XXXIX Simposio Internacional de la Sociedad Española de Lingüística, Santiago de Compostela.
2009
- Garcia, Marcos, 2009. Como somos vistos em Portugal? A visão da Galiza através dos visitantes portugueses. In Actas do IX Congreso Internacional de Estudos Galegos. Novas achegas ao estudo da cultura galega II. Enfoques socio-históricos e lingüístico-literarios. Capítulo III, 345-352 (2012).
- Garcia, Marcos, 2009. A imagem da Galiza através dos visitantes portugueses. Literatura, Turismo e Identidade. TIT, University of Santiago de Compostela.
2008
- Garcia, Marcos, 2008. Português Europeu e Galego. Estudo fonético e fonológico das consoantes em rima medial. MA Thesis. University of Lisbon.
- Garcia, Marcos, 2008. Aproximação ao rotacismo de /S/ pós-nasal nos dialectos ocidentais galegos. Estudos Linguísticos/Linguistic Studies, 1, p. 179-192.
- Garcia, Marcos, 2008. Turismo e Identidade. As motivações culturais dos visitantes portugueses à Galiza. Primeiras aproximações. In Helena Rebelo (coord.), Actas do IX Congresso da Associação Internacional de Lusitanistas (vol. 1), p. 265-270 (2011).
Projects
- 2020-2023: Study of lexical combinations in an academic corpus of novices for a tool to assist in the writing of academic texts (MICINN).
- 2018-2021: Advances in new answer extraction systems with semantic analysis and deep learning (MINECO).
- 2017-2019 (PI): Automatic extraction of multilingual collocation equivalents (FBBVA).
- 2017-2019: Corpus-based study of lexical combinations of academic Spanish for a tool to assist in the writing of academic texts (MINECO).
- 2015-2017: Language technologies for opinion analysis in social networks (MINECO).
- 2013-2016: HPCNLP: High Performance Computing for Natural Language Processing (Galician Government).
- 2012: CELTIC: Strategic knowledge with Competitive Intelligence technologies (FEDER-Innterconecta).
- 2011-2013: OntoPedia: Automatic extraction of ontological and encyclopedic information about named entities (MICINN).
- 2011: CORUXA Biomedical Text Mining: automatic extractor and codifier of relevant medical information by open-source language engineering (Industry project).
- 2010: COATI: multilingual opinion mining for industry and public administration (INCITE).
- 2008-2009: Automatic Design of a Proper Noun Ontology for a Question-Answering System (MEC).
- 2005-2007: GramaXing - Computational Grammar for Deep Linguistic Processing of Portuguese (FCT).
- 2004-2006: TagShare - Tagging and Shallow Processing Tools and Resources (FCT).
Committees
- ACL (2020; SRW 2020, 2019; Demos 2019, 2018).
- STIL 2019.
- CILC 2019.
- EMNLP (Demos 2019 and 2018).
- SEMAPRO 2018 and 2019.
- PROPOR (2018, SRW 2020).
- ICIW 2018 and 2019.
- NAACL-HLT 2018.
- CoNLL (2019, and UD Shared Tasks 2017 and 2018).
- SLATE 2017.
- LREC 2020, 2016, 2014.
- LinguaMÁTICA.
Reviewing
Resources and tools
- Diachronic explorer.
- LX Semantic Similarity.
- LinguaKit.
- Galician-TreeGal (Universal Dependencies treebank).
- FreeLing (Portuguese and Galician modules).
Also, you can find some outdated NLP resources and tools developed during my PhD at the University of Santiago de Compostela.