original file vs duplicate file: identifying the right one
When dupeGuru identifies the primary file, it often picks an obvious copy rather than the original.
e.g. "Hells bells 1 (AC/DC)" is picked over "Hells Bells (AC/DC)"
Is there a way to override this?
-
First, I'd like to link to a similar question. It doesn't address this specific issue, but it does address the broader issue of dealing with file priorities.
dupeGuru will put the biggest file in reference position during a scan. If two or more files have the same size, which one gets the reference position is basically random.
In this case, putting the file with the shortest filename in the reference position would do the trick, but that is not always the right call. Sometimes you have files named something like "Track 2", and you would prefer the "Built To Spill - Traces" file over the unnamed one if both have the same size.
I'd like to address this issue in a future version, but I'd like to do it smartly. I'd appreciate some feedback on which rules dupeGuru should follow when doing filename-based prioritization. So far, we would have this rule:
If the filename has the same name as another dupe in the group, but with a number appended to it, reduce its priority.
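A minimal sketch of how that rule could be checked, not dupeGuru's actual code. It assumes "appended" means a number at the very end of the name (before the extension), optionally in parentheses, and it compares names case-insensitively. The names `is_numbered_copy` and `pick_reference` are made up for illustration.

```python
import os
import re

# Assumed copy-suffix patterns: a trailing " 1", "_2", "-3" or " (4)".
_COPY_SUFFIX = re.compile(r"[ _-]*(?:\(\d+\)|\d+)$")


def is_numbered_copy(filename, group):
    """Return True if `filename` is another name in `group` plus an appended number."""
    stem = os.path.splitext(filename)[0]
    base = _COPY_SUFFIX.sub("", stem)
    if base == stem or not base:
        return False  # no number suffix, or the name is nothing but a number
    others = {os.path.splitext(f)[0].casefold() for f in group if f != filename}
    return base.casefold() in others


def pick_reference(group):
    """Prefer files that are not numbered copies; otherwise keep the given order."""
    # sorted() is stable and False sorts before True.
    return sorted(group, key=lambda f: is_numbered_copy(f, group))[0]


group = ["Hells Bells 1.mp3", "Hells Bells.mp3"]
print(pick_reference(group))
```

Because the check only fires when the stripped name matches another dupe in the same group, a lone "Track 2" file is not penalized just for ending in a number.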
Any other rules? -
You mention, "If the filename has the same name as another dupe in the group, but with a number appended to it, reduce its priority." However, I keep multiple versions of songs (live vs. studio, of course, but also sometimes multiple live or studio versions if they have notable differences). The result is that I have something like "Song [live] v3" and so on.
If their quality is the same or nearly so, the more deeply nested items should be given greater precedence. The logic behind it is that files buried deeply in subdirectories have a greater relative weight of "intelligence" attached to them through user organization/categorization.
Some MP3s have a few extra seconds of silence while others have perhaps an artist or album image embedded in them, but when the primary consideration is sound quality, a few bytes more or less have little bearing on the choice of reference. Bit rate and duration are much more important, and when they match then levels of nesting in the directory tree could play a part.
As an example, I keep new unsorted files just a level or two below the root, while audio collections go fairly deeply. This is not something that is easily sorted in a columnar fashion since it's an alphabetic sort method, and sorting on that alone would invalidate more important file attributes.
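The ordering suggested here (bit rate and duration first, directory depth as the tie-breaker) could be sketched as a sort key like the one below. This is only an illustration under assumptions: higher bit rate and longer duration are taken to be better, values must match exactly for the depth to matter ("nearly the same" would need a tolerance), and the `Dupe` class and function names are hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class Dupe:
    path: str
    bitrate: int   # kbps
    duration: int  # seconds


def depth(path):
    """Number of path components above the file name (the root counts as one)."""
    return len(PurePosixPath(path).parts) - 1


def reference_key(dupe):
    # Tuples compare element by element, so depth only matters
    # when bit rate and duration are equal.
    return (dupe.bitrate, dupe.duration, depth(dupe.path))


def pick_reference(group):
    return max(group, key=reference_key)


unsorted_copy = Dupe("/unsorted/Hells Bells.mp3", 192, 312)
sorted_copy = Dupe("/music/Rock/AC-DC/Back in Black/Hells Bells.mp3", 192, 312)
print(pick_reference([unsorted_copy, sorted_copy]).path)
```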
-
Sounds like we're heading down a path where a preference is needed for the user to decide how priority is determined. Mp3 Filter used to have this preference, and I would prefer to avoid it. Almost nobody uses it or understands what it's for, and it just clutters the preference pane. There must be a way to add a weighting to each file attribute so that the right file is generally chosen for the reference position.
Once such a weighting system is implemented, it should be possible to tweak empirically so it ends up doing a good job in most situations.
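To make the weighting idea concrete, here is a rough sketch, not the system that was actually implemented. The attribute names and weight values are invented for illustration, and each attribute is assumed to be normalized to the 0..1 range within its dupe group before scoring.

```python
# Hypothetical weights: positive values favor the reference position,
# negative values push a file away from it.
WEIGHTS = {
    "size": 1.0,
    "depth": 0.5,
    "name_length": 0.1,
    "numbered_copy": -2.0,
}


def score(attrs, weights=WEIGHTS):
    """Weighted sum of a file's attributes; missing attributes count as 0."""
    return sum(w * attrs.get(name, 0.0) for name, w in weights.items())


def pick_reference(group, weights=WEIGHTS):
    """`group` is a list of (filename, attrs) pairs; return the best-scoring filename."""
    return max(group, key=lambda item: score(item[1], weights))[0]


group = [
    ("Hells Bells 1.mp3", {"size": 1.0, "depth": 0.5, "numbered_copy": 1.0}),
    ("Hells Bells.mp3", {"size": 1.0, "depth": 0.5, "numbered_copy": 0.0}),
]
print(pick_reference(group))
```

Tweaking empirically then means adjusting the numbers in `WEIGHTS` against real scan results, without touching the selection logic.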
Yeah, I think this one is coming in the next major version...
-
Thanks for looking into it. :)
-
Quick follow-up to the new priority handling: most of the time, deeper dirs are given greater weight now. However, when two files are in the same dir, prioritizing seems to be random. I would suggest that, as with sorting into directories, longer filenames should perhaps be given greater weight since they have more (hopefully useful) information about the file.
Here's an example of a subtle change that permitted dupes to exist. The slightly longer filename's easier to read. (Sorry I don't have a better example right now.) In this case, the files were found by dupeGuru (not ME):
-
A little off topic. Are these duplicates found by dupeGuru ones that the ME version could not find?
As for the topic itself: sure, I can add this; the new priority system is made to be tweaked. I'll add this to my todo list. -
How could it possibly be off-topic? The topic's about DG and file priorities. Please, feel free to move a comment if you deem it necessary. I did my best to find the most relevant topic, but it's not necessary to criticize the post.
Meanwhile, I'm not sure I understand the question. I wasn't using ME -- I only mentioned it because it came up in the discussion further up the page.
I’m feeling a bit heckled...
-
I meant "what follows is a little off topic". I was referring to my own comment. -
Yow, sorry -- that's one of the drawbacks of text communication, people lose the little details sometimes that help understanding. Thanks for the reply.
Now I understand what you meant about ME, too -- I didn't try ME in the example because I knew there would be variations in content and I just wanted to compare names. -
Has there been any movement on this?
I've combined a couple of iTunes directories, so I'm trying to clear out the duplicates. My reference file priorities would be...
1. Bit rate
2. File type (order of preference here: mp3, m4a, aif, m4p)
3. File size
4. Name or just randomly decide at this point.
The other thing that would be nice is the ability to specify that for files to be considered duplicates, the duration must be within x seconds of each other. -
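For illustration only, the priority list and the duration check above could be written as a sort key plus a small predicate. The post does not say which direction wins for bit rate or file size, so this sketch assumes higher bit rate and larger size are preferred. The dictionary fields and function names are hypothetical, not dupeGuru's data model.

```python
# File type preference taken from the list above; unknown types rank last.
TYPE_ORDER = ["mp3", "m4a", "aif", "m4p"]


def type_rank(ext):
    ext = ext.lower().lstrip(".")
    return TYPE_ORDER.index(ext) if ext in TYPE_ORDER else len(TYPE_ORDER)


def reference_key(f):
    # Smaller key wins: 1. bit rate (higher first), 2. file type,
    # 3. file size (larger first), 4. name as the final tie-breaker.
    return (-f["bitrate"], type_rank(f["ext"]), -f["size"], f["name"])


def pick_reference(group):
    return min(group, key=reference_key)


def within_tolerance(a, b, max_diff_seconds):
    """True if the two durations differ by at most `max_diff_seconds`."""
    return abs(a["duration"] - b["duration"]) <= max_diff_seconds


mp3 = {"name": "Hells Bells.mp3", "ext": "mp3", "bitrate": 256, "size": 9_800_000, "duration": 312}
m4a = {"name": "Hells Bells.m4a", "ext": "m4a", "bitrate": 256, "size": 9_900_000, "duration": 314}
print(pick_reference([m4a, mp3])["name"], within_tolerance(mp3, m4a, 3))
```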
The "within X seconds" thing can already be done post-scan (it's in the FAQ).
As for file types, it's a tricky issue, and letting the user set custom priorities is not necessarily the answer, because a plain ordered list would be too "simple" to make adequate prioritizing rules. For example, AIFF files have a much higher bitrate than typical files, so their position in the file type list is pointless (an AIFF file matched with another file will always get the ref position). If I tackled the issue, I think I'd make a list of perceived quality vs. file size for each file type and have the ref position decided by this ratio.
But then of course, there's always the possibility that this ratio wouldn't fit for someone (someone who wants to keep his lossless files for example). I think the best way to deal with this is for the user to do it manually. Sort the results by "Kind" and take a quick glance at the results.
The nice thing about sorting results by kind is that file types are grouped together. So when you're in the "m4a" part of your results, you can just quickly check for a "mp3" that stands out. Then it's only a matter of selecting it and making it the reference.
So, in short, I don't think that working on complex priority rules is worth the effort, as it is doomed to always be inadequate for some user somewhere. Post-scan re-prioritizing is the answer.