Benchmark dataset

Artificial Intelligence, Human-Computer Interaction

Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence

root February 18, 2024 0

The article ‘Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence’ critically assesses 23 state-of-the-art LLM benchmarks. The authors highlight significant limitations, such as biases, difficulties…

Artificial Intelligence, Cryptography and Security

CyberMetric: A Benchmark Dataset for Evaluating Large Language Models Knowledge in Cybersecurity

root February 13, 2024 0

The paper presents ‘CyberMetric’, a benchmark dataset composed of 10,000 questions from various cybersecurity sources. The dataset’s purpose is to assess and compare the knowledge of large language models (LLMs),…

Computer Vision and Pattern Recognition

DAPlankton: Benchmark Dataset for Multi-instrument Plankton Recognition via Fine-grained Domain Adaptation

root February 11, 2024 0

The article introduces DAPlankton, a new dataset for developing and benchmarking domain adaptation methods for image recognition. Different imaging instruments and local differences in plankton species…

Computer Vision and Pattern Recognition

DAPlankton: Benchmark Dataset for Multi-instrument Plankton Recognition via Fine-grained Domain Adaptation

root February 10, 2024 0

The paper presents a new dataset called DAPlankton for developing and benchmarking domain adaptation methods for image recognition. The data consists of phytoplankton images captured using different imaging instruments. This…

Computer Vision and Pattern Recognition

UAV-Rain1k: A Benchmark for Raindrop Removal from UAV Aerial Imagery

root February 10, 2024 0

The article introduces UAV-Rain1k, a new benchmark dataset for removing raindrops from unmanned aerial vehicle (UAV) images. The researchers created the dataset to address the lack of focus on raindrop…

Artificial Intelligence, Computation and Language

CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning

root January 28, 2024 0

The study introduces CMMU, a benchmark for evaluating the understanding and reasoning abilities of multi-modal large language models (MLLMs) in Chinese. It comprises 3,603 questions across 7 subjects, covering…

Machine Learning

LLpowershap: Logistic Loss-based Automated Shapley Values Feature Selection Method

root January 24, 2024 0

The article presents LLpowershap, a novel feature selection method that uses loss-based Shapley values to identify informative features and reduce noise. The method also demonstrates higher or equal predictive performance…

Artificial Intelligence, Computation and Language

AlignBench: Benchmarking Chinese Alignment of Large Language Models

root December 1, 2023 0

The article introduces AlignBench, a comprehensive benchmark for evaluating the alignment of large language models (LLMs) in Chinese. The benchmark uses a human-in-the-loop data curation pipeline and includes…

Artificial Intelligence, Cryptography and Security

Evaluating LLMs for Privilege-Escalation Scenarios

root October 19, 2023 0

The paper discusses the application of Large Language Models (LLMs) in penetration testing, with a focus on Linux privilege escalation. The authors developed a benchmark for Linux privilege escalation and…

Cryptography and Security

MalDICT: Benchmark Datasets on Malware Behaviors, Platforms, Exploitation, and Packers

root October 19, 2023 0

The article presents MalDICT, a collection of four benchmark datasets that support different, under-represented malware classification tasks. Malware can be classified according to various attributes, and the ability to identify…

