Low-temperature thermochronology is a key technique for studying neotectonics and landscape evolution. However, it differs intrinsically from other geochronological methods in how data are expressed, analyzed, and interpreted. In recent years, with the widespread adoption of low-temperature thermochronology techniques, the volume of data has grown continuously, giving rise to many big-data studies of tectonic and geomorphic evolution. However, these data are mostly scattered across publications from different sources, with inconsistent formats and contents and varying quality, which to a certain extent hampers innovative big-data research. Specialized databases are needed to manage the growing body of low-temperature thermochronology data and to meet the demands of such research.
In this paper, four conventional geochronological databases (National Geochronological Data Base, Geochron, Petlab, and DataView) and two recent databases (AusGeochem and Sparrow) are reviewed and compared in terms of data sources, data volume, data storage structure, completeness of data content, data entry methods, data retrieval methods, coverage areas, database update patterns, and data analysis tools. In the conventional case, thermochronological data make up only a small part of the holdings and are generally stored in databases of related or other disciplines, such as radioisotope chronology databases, geochronology databases, and petrological, mineral, and geological analysis databases. These databases emphasize the commonalities between disciplines and therefore focus only on the presentation of sample units. They are poorly suited to “big data” research, because all data are managed in relational databases with strictly structured tables and limited data sources. We found that the design approaches of conventional geochronological databases are generally suited to absolute age data. However, low-temperature thermochronology differs from conventional geological dating methods in that its age values record only cooling time. The more geologically significant cooling history comes from numerical simulations based on elevation profiles, track lengths, and diffusion kinetics models of the (U-Th)/He system. In addition, innovations in experimental techniques impose new requirements on the construction of thermochronology databases.
Compared with the conventional geochronology databases, the recent databases focus more on low-temperature thermochronological data and support both structured and unstructured data from varied sources, which makes them more comprehensive and specialized. These databases are flexible and extensible, particularly with respect to adding new dating and experimental methods, storing big data, and linking laboratories to the database. By using different database platforms and their associated APIs, both relational and non-relational data can be managed for query, analysis, and visualization. However, the construction of these recent databases is still at a preliminary, exploratory stage, and ensuring continuous data growth remains a challenge. Moreover, establishing a flexible numbering system that provides sustainable, extensible unique identifiers for samples and data is an important task for recent databases. Finally, in addition to raw data, published fission-track papers contain a large amount of thermal history information. These interpretations or inversion results constitute interpretive data, which are crucial for reconstructing cooling history or tectonic uplift. How to incorporate such data into the database must therefore also be considered during database design.
The key to sustaining a database lies in the users it serves. Considering the needs of professional users for research management, experimental analysis, and innovative “big data” research, and in view of the problems in current databases, we put forward the following suggestions for the future construction of low-temperature thermochronology databases.
First, a dedicated low-temperature thermochronology database must remain active. From a technical perspective, artificial intelligence technologies such as natural language processing and other machine learning algorithms should be used to extract information from papers semi-automatically or automatically, helping users to quickly extract relevant information and understand the content of the literature. Platforms such as Semantic Scholar, GeoDeepDive, and DeepShovel have implemented interactive data mining features, in which data are normalized and loaded into the database automatically according to user-specified rules, significantly reducing the manpower and time costs of data acquisition. In terms of ethos, the open-sharing academic ecosystem has given rise to platforms such as arXiv for preprints, data repositories such as Pangaea, and the integrated online research platform of Deep-Time Digital Earth, which drastically shorten the cycle from research and experimentation to publication. This makes it easier to incorporate the latest research data into databases and greatly expands the available data sources. In terms of user base, academic social networks have advantages in tracking and disseminating research: they break down academic hierarchies, promote exchange and cooperation, and attract more users.
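The rule-based end of such literature extraction can be illustrated with a minimal sketch, which is far simpler than the natural language processing used by the platforms named above. It assumes that ages are reported in the text as "value ± error Ma"; the pattern, the function name `extract_ages`, and the example sentence are all hypothetical.

```python
import re

# Hypothetical rule: "<age> ± <error> Ma", also accepting "+/-" for the sign.
AGE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(?:±|\+/-)\s*(\d+(?:\.\d+)?)\s*Ma")


def extract_ages(text):
    """Return a list of (age_Ma, error_Ma) tuples found in the text."""
    return [(float(age), float(err)) for age, err in AGE_PATTERN.findall(text)]


# Invented example sentence, not taken from any publication.
sentence = (
    "Sample A yielded an AFT age of 12.3 ± 1.1 Ma "
    "and an AHe age of 5.6 +/- 0.4 Ma."
)
ages = extract_ages(sentence)
print(ages)
```

A rule like this captures only one reporting convention; real papers would need many more rules or a trained model, and human review of the extracted values.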
Second, finer-grained data storage and simpler data operations help improve the extensibility of the database. Most existing geochronological databases are relational, which is a strictly structured way of storing data. Their most typical data structure is the two-dimensional table, which is well suited to regularly structured geological data. Non-relational databases, by contrast, are not organized as tables; they are designed to store both structured and unstructured data and thus fill the gaps left by relational databases. In practice, the advantages of both types can be combined to cover basic geological information and interpretive information comprehensively, achieving an effect similar to NewSQL.
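One way to picture this combination is a small sketch in which structured fields sit in ordinary relational columns and interpretive information is stored as a schema-free JSON document in the same row. This is only an illustration using Python's built-in `sqlite3` and `json` modules; the table layout, the column names, and the example values are assumptions, not the design of any database reviewed here.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample ("
    " sample_id TEXT PRIMARY KEY,"  # structured, strictly typed columns
    " method TEXT NOT NULL,"
    " age_ma REAL,"
    " error_ma REAL,"
    " interpretive TEXT)"           # schema-free JSON document stored as text
)


def insert_sample(sample_id, method, age_ma, error_ma, interpretive):
    """Store the structured fields plus an arbitrary interpretive document."""
    conn.execute(
        "INSERT INTO sample VALUES (?, ?, ?, ?, ?)",
        (sample_id, method, age_ma, error_ma, json.dumps(interpretive)),
    )


def load_interpretive(sample_id):
    """Return the interpretive document for a sample, or None if absent."""
    row = conn.execute(
        "SELECT interpretive FROM sample WHERE sample_id = ?", (sample_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


# Invented values: a time-temperature path as [time_Ma, temperature_C] pairs.
insert_sample(
    "S-001", "AFT", 12.3, 1.1,
    {"thermal_history": [[20.0, 120.0], [0.0, 15.0]], "source": "hypothetical"},
)

# The structured columns stay queryable with plain SQL.
young = conn.execute("SELECT sample_id FROM sample WHERE age_ma < 20").fetchall()
```

The structured columns keep the strict typing and query support of a relational table, while the JSON field can hold thermal history paths or other interpretive results whose shape differs from paper to paper.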
Third, the database should highlight the distinctive features of the discipline. Both the chronological data of a sample and the single data that make up the sample chronology are significant. Standardizing the coding scheme for samples and single data that are not registered with IGSN would highlight the characteristics of the subject's data and would help to distinguish low-temperature thermochronology from similar disciplines.
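A hierarchical coding scheme of this kind might look like the following sketch, in which a sample-level code is extended with a suffix to identify a single datum. The format `LAB-METHOD-SAMPLE[-Gnnn]`, the class name `DataId`, and the example codes are hypothetical and are not part of IGSN or of any existing database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataId:
    """Hypothetical identifier; the text fields must not contain '-'."""
    lab: str
    method: str
    sample: str
    grain: Optional[int] = None  # None means a sample-level record

    def encode(self) -> str:
        base = f"{self.lab}-{self.method}-{self.sample}"
        return base if self.grain is None else f"{base}-G{self.grain:03d}"

    @classmethod
    def decode(cls, code: str) -> "DataId":
        parts = code.split("-")
        if len(parts) == 3:
            return cls(parts[0], parts[1], parts[2])
        if len(parts) == 4 and parts[3].startswith("G") and parts[3][1:].isdigit():
            return cls(parts[0], parts[1], parts[2], int(parts[3][1:]))
        raise ValueError(f"unrecognized identifier: {code!r}")


sample_code = DataId("LABX", "AFT", "S001").encode()
grain_code = DataId("LABX", "AFT", "S001", grain=3).encode()
print(sample_code, grain_code)
```

Because the single-datum code contains the sample code as a prefix, the link between a sample age and its constituent data can be recovered from the identifier alone.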
Finally, a database that combines the strengths of conventional and recent databases, incorporates the concept of open academia, leverages advanced information mining and transmission technologies, and adopts a storage approach that handles both structured and unstructured data can better meet the comprehensive needs of its users, from laboratories to scientists to data consumers.