Mario Capurso
Total Pages: 355
This work follows the ICDL (International Certification of Digital Literacy) Database Basic and Advanced Syllabus, expanded according to the Curriculum Guidelines for Undergraduate Degree Programs in Computer Science published on December 20, 2013 by the Association for Computing Machinery and the IEEE Computer Society. For the practical skills, some texts use Microsoft Access, which is not a professional-grade tool and implements a non-standard dialect of SQL. This text uses MySQL and SQLite instead: both are professional, open source, free, widely used and easy to install, and together they cover the skills required by the ICDL modules.
The question of how to use the data remains, however. Today this usually means mastering Python or R, which takes time to learn and delays the start of practice by weeks. A third possibility is a visual environment that lets you build applications without knowing a programming language. Orange is one of these. It is visual but built on Python, so you can build applications without knowing the language and extend them if and when you learn Python. In addition, MySQL and SQLite work alongside Python and Orange Data Mining. This text uses Orange as the environment for experimentation and exercises in Data Science. Readers interested exclusively in SQL may choose not to install Orange; they can skip the Orange application exercises and return to them later if they feel the need.
Note that this text follows the ICDL Syllabus and provides the skills associated with the modules in question, but it cannot guarantee that the reader will pass the certification exam.
In fact, the exam requires the purchase of a skill card, registration with a test center, and compliance with a series of rules set by the national member organizations of the ICDL consortium and by the test center, all of which is beyond what this text can guarantee.
After describing the installation of the programs used for the exercises, the text considers the types of data and their representations, including images and documents. It introduces the concepts of System, Information System and Database, as well as the most common data security and privacy practices. The relational model and SQL are explained with application examples in MySQL and SQLite. The text then analyzes the various types of joins; sorting, aggregation and grouping queries; integrity constraints; the GRANT and REVOKE security features; views; indexing; and Normal Forms and Normalization. Next it considers multi-user access to databases, interference and deadlock, locking techniques and transactions, followed by distributed databases and the options available with MySQL and SQLite.
The limits of the relational model and the most common non-relational (NoSQL) models are outlined, followed by the conceptual Entity-Relationship and object models according to ISO/UML, and the process for moving from the problem text to the conceptual and logical relational models. The data integration process is also outlined, including the use of data warehouses, data lakes and mediators; data cleaning; the management of missing, repeated, anomalous and incorrect values; and the coding of categorical values. Finally, project objectives are distinguished according to which model, relational or non-relational, suits them best.
The text is accompanied by supporting material, and the examples and test data are available for download.