Publication

A Graph Database Representation of Portuguese Criminal-Related Documents

datacite.subject.fos: Social Sciences::Communication Sciences
datacite.subject.fos: Natural Sciences::Computer and Information Sciences
datacite.subject.sdg: 08:Decent Work and Economic Growth
datacite.subject.sdg: 09:Industry, Innovation and Infrastructure
datacite.subject.sdg: 10:Reduced Inequalities
dc.contributor.author: Carnaz, Gonçalo
dc.contributor.author: Nogueira, Vitor Beires
dc.contributor.author: Antunes, Mário
dc.date.accessioned: 2026-02-27T19:31:10Z
dc.date.available: 2026-02-27T19:31:10Z
dc.date.issued: 2021-06-04
dc.description.abstract: Organizations have been challenged by the need to process an increasing amount of data, both structured and unstructured, retrieved from heterogeneous sources. Criminal investigation police are among these organizations, as they have to manually process a vast number of criminal reports, news articles related to crimes, occurrence and evidence reports, and other unstructured documents. Automatic extraction and representation of data and knowledge in such documents is an essential task to reduce the manual analysis burden and to automate the discovery of names and entity relationships that may exist in a case. This paper presents SEMCrime, a framework used to extract and classify named entities and relations in Portuguese criminal reports and documents, and to represent the retrieved data in a graph database. A 5W1H (Who, What, Why, Where, When, and How) information extraction method was applied, and a graph database representation was used to store and visualize the relations extracted from the documents. Promising results were obtained with a prototype developed to evaluate the framework, namely named-entity recognition with an F-measure of 0.73 and 5W1H information extraction with an F-measure of 0.65. (eng)
dc.description.sponsorship: The authors would like to thank the project “MOPREVIS - Modelação e Predição de Acidentes de Viação no Distrito de Setúbal”, with reference FCT DSAIPA/DS/0090/2018, financed by the Foundation for Science and Technology (FCT) within the scope of the National Initiative on Digital Skills e.2030, Portugal INCoDe.2030.
dc.identifier.citation: Carnaz, G.; Nogueira, V.B.; Antunes, M. A Graph Database Representation of Portuguese Criminal-Related Documents. Informatics 2021, 8, 37. https://doi.org/10.3390/informatics8020037.
dc.identifier.doi: 10.3390/informatics8020037
dc.identifier.eissn: 2227-9709
dc.identifier.uri: http://hdl.handle.net/10400.8/15746
dc.language.iso: eng
dc.peerreviewed: yes
dc.publisher: MDPI
dc.relation: Modeling and prediction of road traffic accidents in the district of Setúbal
dc.relation.hasversion: https://www.mdpi.com/2227-9709/8/2/37
dc.relation.ispartof: Informatics
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: knowledge representation
dc.subject: graph databases
dc.subject: natural language processing
dc.subject: criminal-related documents
dc.subject: cybersecurity
dc.subject: criminal domain
dc.subject: police reports
dc.title: A Graph Database Representation of Portuguese Criminal-Related Documents (eng)
dc.type: journal article
dspace.entity.type: Publication
oaire.awardTitle: Modeling and prediction of road traffic accidents in the district of Setúbal
oaire.awardURI: http://hdl.handle.net/10400.8/15745
oaire.citation.endPage: 22
oaire.citation.issue: 2
oaire.citation.startPage: 1
oaire.citation.title: Informatics
oaire.citation.volume: 8
oaire.fundingStream: Call for Scientific Research and Technological Development Projects in Data Science and Artificial Intelligence in Public Administration - 2018
oaire.version: http://purl.org/coar/version/c_970fb48d4fbd8a85
person.familyName: Antunes
person.givenName: Mário
person.identifier: R-000-NX4
person.identifier.ciencia-id: AF10-7EDD-5153
person.identifier.orcid: 0000-0003-3448-6726
person.identifier.scopus-author-id: 25930820200
relation.isAuthorOfPublication: e3e87fb0-d1d6-44c3-985d-920a5560f8c1
relation.isAuthorOfPublication.latestForDiscovery: e3e87fb0-d1d6-44c3-985d-920a5560f8c1
relation.isProjectOfPublication: 834c0624-1459-454b-9725-775541ae6ff9
relation.isProjectOfPublication.latestForDiscovery: 834c0624-1459-454b-9725-775541ae6ff9

Files

Main
Name: A graph database representation of Portuguese criminal-related documents.pdf
Size: 1.34 MB
Format: Adobe Portable Document Format
License
Name: license.txt
Size: 1.32 KB
Format: Item-specific license agreed upon to submission