Publication
A Graph Database Representation of Portuguese Criminal-Related Documents
| datacite.subject.fos | Social Sciences::Communication Sciences | |
| datacite.subject.fos | Natural Sciences::Computer and Information Sciences | |
| datacite.subject.sdg | 08:Decent Work and Economic Growth | |
| datacite.subject.sdg | 09:Industry, Innovation and Infrastructure | |
| datacite.subject.sdg | 10:Reduced Inequalities | |
| dc.contributor.author | Carnaz, Gonçalo | |
| dc.contributor.author | Nogueira, Vitor Beires | |
| dc.contributor.author | Antunes, Mário | |
| dc.date.accessioned | 2026-02-27T19:31:10Z | |
| dc.date.available | 2026-02-27T19:31:10Z | |
| dc.date.issued | 2021-06-04 | |
| dc.description.abstract | Organizations have been challenged by the need to process an increasing amount of data, both structured and unstructured, retrieved from heterogeneous sources. Criminal investigation police are among these organizations, as they have to manually process a vast number of criminal reports, news articles related to crimes, occurrence and evidence reports, and other unstructured documents. Automatic extraction and representation of data and knowledge in such documents is an essential task to reduce the manual analysis burden and to automate the discovery of relationships between names and entities that may exist in a case. This paper presents SEMCrime, a framework used to extract and classify named entities and relations in Portuguese criminal reports and documents, and to represent the retrieved data in a graph database. A 5W1H (Who, What, Why, Where, When, and How) information extraction method was applied, and a graph database representation was used to store and visualize the relations extracted from the documents. Promising results were obtained with a prototype developed to evaluate the framework, namely named-entity recognition with an F-Measure of 0.73 and 5W1H information extraction with an F-Measure of 0.65. | eng |
| dc.description.sponsorship | The authors would like to thank project “MOPREVIS - Modelação e Predição de Acidentes de Viação no Distrito de Setúbal”, with reference FCT DSAIPA/DS/0090/2018, financed by the Foundation for Science and Technology (FCT) within the scope of the National Initiative on Digital Skills e.2030, Portugal INCoDe.2030. | |
| dc.identifier.citation | Carnaz, G.; Nogueira, V.B.; Antunes, M. A Graph Database Representation of Portuguese Criminal-Related Documents. Informatics 2021, 8, 37. https://doi.org/10.3390/informatics8020037. | |
| dc.identifier.doi | 10.3390/informatics8020037 | |
| dc.identifier.eissn | 2227-9709 | |
| dc.identifier.uri | http://hdl.handle.net/10400.8/15746 | |
| dc.language.iso | eng | |
| dc.peerreviewed | yes | |
| dc.publisher | MDPI | |
| dc.relation | Modeling and prediction of road traffic accidents in the district of Setúbal | |
| dc.relation.hasversion | https://www.mdpi.com/2227-9709/8/2/37 | |
| dc.relation.ispartof | Informatics | |
| dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | |
| dc.subject | knowledge representation | |
| dc.subject | graph databases | |
| dc.subject | natural language processing | |
| dc.subject | criminal-related documents | |
| dc.subject | cybersecurity | |
| dc.subject | criminal domain | |
| dc.subject | police reports | |
| dc.title | A Graph Database Representation of Portuguese Criminal-Related Documents | eng |
| dc.type | journal article | |
| dspace.entity.type | Publication | |
| oaire.awardTitle | Modeling and prediction of road traffic accidents in the district of Setúbal | |
| oaire.awardURI | http://hdl.handle.net/10400.8/15745 | |
| oaire.citation.endPage | 22 | |
| oaire.citation.issue | 2 | |
| oaire.citation.startPage | 1 | |
| oaire.citation.title | Informatics | |
| oaire.citation.volume | 8 | |
| oaire.fundingStream | Concurso de Projetos de Investigação Científica e Desenvolvimento Tecnológico em Ciência dos dados e inteligência artificial na Administração Pública - 2018 | |
| oaire.version | http://purl.org/coar/version/c_970fb48d4fbd8a85 | |
| person.familyName | Antunes | |
| person.givenName | Mário | |
| person.identifier | R-000-NX4 | |
| person.identifier.ciencia-id | AF10-7EDD-5153 | |
| person.identifier.orcid | 0000-0003-3448-6726 | |
| person.identifier.scopus-author-id | 25930820200 | |
| relation.isAuthorOfPublication | e3e87fb0-d1d6-44c3-985d-920a5560f8c1 | |
| relation.isAuthorOfPublication.latestForDiscovery | e3e87fb0-d1d6-44c3-985d-920a5560f8c1 | |
| relation.isProjectOfPublication | 834c0624-1459-454b-9725-775541ae6ff9 | |
| relation.isProjectOfPublication.latestForDiscovery | 834c0624-1459-454b-9725-775541ae6ff9 |
Files
Main
- Name:
- A graph database representation of Portuguese criminal-related documents.pdf
- Size:
- 1.34 MB
- Format:
- Adobe Portable Document Format
License
- Name:
- license.txt
- Size:
- 1.32 KB
- Format:
- Item-specific license agreed to upon submission
