I have one complaint, however, of a technical kind. I find some of the files really hard to load into a database, such as the actors and actresses flat files. Including some extra data in these files, such as line numbers and an actor/actress ID on every line, would be very helpful in setting them up for import into a relational DBMS. It is possible to insert a line number in front of each line with a VBScript script operating on the file, but inserting actor/actress IDs is a lot more challenging.
On close inspection, there seems to be some ambiguity in these files about how the identity of each actor/actress is established. Even though great care seems to have been taken to ensure that each actor/actress name is unique in the database and belongs to one and the same person, in several cases identical names still appear to belong to different people. And, the other way around, in some cases the same person seems to carry slightly different names.
I also noticed that there are blank lines (vertical spaces) in the files. In the actors and actresses files, for instance, I take these blank lines to be separators between two different actors/actresses, but I doubt that the separator is always correctly placed: in several instances I noticed that it was not. It would be a time-consuming job (and not always even possible) to correct all of these mistakes. There are many such ambiguities, which makes it very hard to create fully normalized tables.
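
To make the numbering idea concrete, here is a minimal Python sketch of how line numbers and per-person IDs could be attached to every line. It rests on assumptions that the files themselves may not honor: that a blank line always separates two people, and that the name is the text before the first tab on the first line of each block. The names `Row`, `number_blocks` and `repeated_names`, and the sample lines, are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Optional


class Row(NamedTuple):
    line_no: int    # 1-based position of the line in the original file
    person_id: int  # sequential ID, one per blank-line-delimited block
    name: str       # ASSUMPTION: text before the first tab on the block's first line
    text: str       # the original line without its trailing newline


def number_blocks(lines: Iterable[str]) -> Iterator[Row]:
    """Yield every non-blank line tagged with its line number and block ID."""
    person_id = 0
    name: Optional[str] = None  # None means no block is currently open
    for line_no, raw in enumerate(lines, start=1):
        text = raw.rstrip("\r\n")
        if not text.strip():
            name = None  # ASSUMPTION: a blank line closes the current block
            continue
        if name is None:
            # First line of a new block: allocate the next ID and read the name.
            person_id += 1
            name = text.split("\t", 1)[0].strip()
        yield Row(line_no, person_id, name, text)


def repeated_names(rows: Iterable[Row]) -> Dict[str, List[int]]:
    """Map each name that heads more than one block to the IDs of those blocks."""
    ids_by_name = defaultdict(set)
    for row in rows:
        ids_by_name[row.name].add(row.person_id)
    return {n: sorted(ids) for n, ids in ids_by_name.items() if len(ids) > 1}


# Made-up sample in the assumed layout; real files may differ.
sample = [
    "Doe, Jane\tFirst Film (1990)\n",
    "\tSecond Film (1992)\n",
    "\n",
    "Roe, Richard\tThird Film (1995)\n",
    "\n",
    "Doe, Jane\tFourth Film (2001)\n",
]
rows = list(number_blocks(sample))
print(repeated_names(rows))  # prints {'Doe, Jane': [1, 3]}
```

A name that heads more than one block, as reported by `repeated_names`, is only a candidate for review: it may be a misplaced separator or two different people who share a name, and the script cannot tell which. The opposite case, one person listed under slightly different names, is not detected at all.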
Also, these two particular files seem to have been laid out for reading rather than for use in a database system. Needless to say, documents like these aren't very practical for reading, as they contain millions of lines.
So in the end, I'm very happy to be allowed to use these data files for my personal use, but if the files had been given a more database-friendly format, that would have made me even happier.