I’m sad

Original file vs. duplicate file: identifying the right one

When dupeGuru identifies the primary file, it often picks an obvious copy rather than the original.
For example, "Hells bells 1 (AC/DC)" is picked over "Hells Bells (AC/DC)".
Is there a way to override this?
  • First, I'd like to link to a similar question. It doesn't address this specific issue, but it does address the broader issue of dealing with file priorities.

    dupeGuru puts the biggest file in the reference position during a scan. If two or more files have the same size, which one gets the reference position is basically random.

    In this case, putting the file with the shortest filename in the reference position would do the trick, but that doesn't always hold. Sometimes you have files like "Track 2", and if both have the same size you would prefer the "Built To Spill - Traces" file over the unnamed one.
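
    The size-first behavior, with the shortest filename added as a tie-breaker, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not dupeGuru's actual code: the `Dupe` class and `pick_reference` function are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class Dupe:
    # Hypothetical stand-in for a scanned file.
    path: str
    size: int  # bytes


def pick_reference(group: list[Dupe]) -> Dupe:
    # Biggest file wins. Among files of equal size, the shortest filename wins
    # (negating the length makes max() favor shorter names) instead of
    # leaving the choice to chance.
    return max(group, key=lambda d: (d.size, -len(PurePosixPath(d.path).name)))


group = [
    Dupe("/music/Hells Bells 1.mp3", 5_000_000),
    Dupe("/music/Hells Bells.mp3", 5_000_000),
]
print(pick_reference(group).path)  # /music/Hells Bells.mp3
```

    As noted above, this breaks down for names like "Track 2", which would beat "Built To Spill - Traces" on length alone.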

    I'd like to address this issue in a future version, but I'd like to do it smartly. I'd appreciate some feedback on what rules dupeGuru should follow when doing filename-based prioritization. So far, we would have this rule:

    If the filename has the same name as another dupe in the group, but with a number appended to it, reduce its priority.

    Any other rules?
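
    A minimal Python sketch of the proposed rule, assuming "a number appended" means one bare all-digit word (like the `1` in "Hells bells 1 (AC/DC)") and that names are compared case-insensitively without their extension. `is_numbered_copy` and `demote_numbered_copies` are hypothetical helpers, and the slash in "AC/DC" is swapped for a hyphen because it can't appear in a filename.

```python
from pathlib import PurePath


def _words(name: str) -> list[str]:
    # Compare stems (no extension), case-insensitively, word by word.
    return PurePath(name).stem.casefold().split()


def is_numbered_copy(name: str, group: list[str]) -> bool:
    # True if dropping one all-digit word from `name` yields another name in the group.
    words = _words(name)
    others = [_words(other) for other in group if other != name]
    for i, word in enumerate(words):
        if word.isdigit() and words[:i] + words[i + 1:] in others:
            return True
    return False


def demote_numbered_copies(group: list[str]) -> list[str]:
    # sorted() is stable: numbered copies (key True) move behind the rest,
    # and everything else keeps its original order.
    return sorted(group, key=lambda name: is_numbered_copy(name, group))


names = ["Hells bells 1 (AC-DC).mp3", "Hells Bells (AC-DC).mp3"]
print(demote_numbered_copies(names))
# ['Hells Bells (AC-DC).mp3', 'Hells bells 1 (AC-DC).mp3']
```

    Under this reading, a name such as "Song [live] v3" is left alone, because "v3" is not a bare number.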
  • I’m here
    You mention, "If the filename has the same name as another dupe in the group, but with a number appended to it, reduce its priority." However, I keep multiple versions of songs (live vs. studio, of course, but also sometimes multiple live or studio versions if they have notable differences). The result is that I have something like "Song [live] v3" and so on.

    If their quality is the same or nearly so, the more deeply nested items should be given greater precedence. The logic behind it is that files buried deeply in subdirectories have a greater relative weight of "intelligence" attached to them through user organization/categorization.

    Some MP3s have a few extra seconds of silence, while others have an artist or album image embedded in them, but when the primary consideration is sound quality, a few bytes more or less have little bearing on the choice of reference. Bit rate and duration are much more important, and when they match, the level of nesting in the directory tree could play a part.

    As an example, I keep new unsorted files just a level or two below the root, while audio collections go fairly deep. Nesting depth isn't easy to sort on in a column, since column sorting is alphabetical, and sorting on depth alone would override more important file attributes.
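
    The ordering described here (quality first, nesting depth only as a tie-breaker, size ignored) fits in one sort key. A hypothetical Python sketch: it assumes duplicates in a group already have matching durations, so only bit rate and depth rank them; `Track`, `depth`, and `pick_reference` are illustrative names, not dupeGuru's.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class Track:
    # Hypothetical record; field names are illustrative only.
    path: str
    bitrate: int  # kbps
    size: int  # bytes; deliberately ignored when ranking


def depth(path: str) -> int:
    # Number of path components above the file itself.
    return len(PurePosixPath(path).parts) - 1


def pick_reference(group: list[Track]) -> Track:
    # Bit rate dominates; directory depth only breaks ties (deeper wins).
    return max(group, key=lambda t: (t.bitrate, depth(t.path)))


group = [
    Track("/Incoming/Song.mp3", 192, 4_100_000),
    Track("/Music/Artist/Album/Song.mp3", 192, 4_000_000),
]
print(pick_reference(group).path)  # /Music/Artist/Album/Song.mp3
```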
  • Sounds like we're heading down a path where a preference is needed for the user to decide how priority is determined. Mp3 Filter used to have this preference, and I would prefer to avoid it. Almost nobody uses it or understands what it's for, and it just clutters the preference pane. There must be a way to add weighting to each file attribute so that the right file is generally chosen for the reference position.

    Once such a weighting system is implemented, it should be possible to tweak empirically so it ends up doing a good job in most situations.

    Yeah, I think this one is coming in the next major version...
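
    One way such a weighting system could look: normalize each attribute within a group, multiply by a weight, and give the reference position to the highest total. This is a hypothetical Python sketch; the attributes, the `WEIGHTS` values, and all function names are assumptions chosen only to be tuned empirically, not what dupeGuru shipped.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical file record.
    path: str
    size: int  # bytes
    bitrate: int  # kbps
    depth: int  # directory nesting level


# Hypothetical starting weights, meant to be tuned empirically.
WEIGHTS = {"size": 1.0, "bitrate": 3.0, "depth": 0.5}


def normalize(values: list[float]) -> list[float]:
    # Rescale to [0, 1] within the group so attributes with different units
    # can be weighted against each other.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def score(group: list[Candidate]) -> list[float]:
    columns = {attr: normalize([getattr(c, attr) for c in group]) for attr in WEIGHTS}
    return [
        sum(weight * columns[attr][i] for attr, weight in WEIGHTS.items())
        for i in range(len(group))
    ]


def pick_reference(group: list[Candidate]) -> Candidate:
    scores = score(group)
    return group[scores.index(max(scores))]


group = [
    Candidate("/Incoming/a.mp3", 5_000_000, 128, 1),
    Candidate("/Music/Artist/a.mp3", 4_900_000, 320, 2),
]
print(pick_reference(group).path)  # /Music/Artist/a.mp3
```

    Normalizing per group keeps a large absolute unit (bytes) from swamping a small one (nesting levels), so the weights alone decide the balance.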
  • Quick follow-up on the new priority handling: most of the time, deeper dirs are given greater weight now. However, when two files are in the same dir, prioritizing seems to be random. I would suggest that, as with directory depth, longer filenames should perhaps be given greater weight, since they carry more (hopefully useful) information about the file.
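
    The suggestion amounts to a two-part key: depth first, then filename length. A minimal, hypothetical Python sketch (`rank_key` and `pick_reference` are illustrative names):

```python
from pathlib import PurePosixPath


def rank_key(path: str) -> tuple[int, int]:
    p = PurePosixPath(path)
    # Deeper directories win first; within the same directory,
    # the longer (presumably more descriptive) filename wins.
    return (len(p.parts), len(p.stem))


def pick_reference(paths: list[str]) -> str:
    return max(paths, key=rank_key)


paths = [
    "/Music/Built To Spill/Track 2.mp3",
    "/Music/Built To Spill/Built To Spill - Traces.mp3",
]
print(pick_reference(paths))  # /Music/Built To Spill/Built To Spill - Traces.mp3
```

    On length alone, "Hells Bells 1" would beat "Hells Bells", so this would likely need the number-appended rule proposed earlier in the thread applied first.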

    Here's an example of a subtle change that allowed dupes to exist. The slightly longer filename is easier to read. (Sorry I don't have a better example right now.) In this case, the files were found by dupeGuru (not ME):

  • I’m feeling a bit heckled...
    How could it possibly be off-topic? The topic's about DG and file priorities. Please, feel free to move a comment if you deem it necessary. I did my best to find the most relevant topic, but it's not necessary to criticize the post.

    Meanwhile, I'm not sure I understand the question. I wasn't using ME -- I only mentioned it because it came up in the discussion further up the page.
  • Yow, sorry -- that's one of the drawbacks of text communication: people sometimes lose the little details that help understanding. Thanks for the reply.

    Now I understand what you meant about ME, too -- I didn't try ME in the example because I knew there would be variations in content and I just wanted to compare names.
  • Has there been any movement on this?

    I've combined a couple of iTunes directories, so I'm trying to clear out the duplicates. My reference file priorities would be...

    1. Bit rate
    2. File type (order of preference here: mp3, m4a, aif, m4p)
    3. File size
    4. Name or just randomly decide at this point.
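
    These four priorities map onto a single sort key. A minimal Python sketch under stated assumptions: the file type is read from the extension, `Song`, `type_rank`, and `pick_reference` are hypothetical names, and "randomly decide" is replaced by path order so results are repeatable.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# File-type preference from the list above; earlier means preferred.
TYPE_ORDER = ["mp3", "m4a", "aif", "m4p"]


@dataclass
class Song:
    # Hypothetical record; the type is taken from the file extension.
    path: str
    bitrate: int  # kbps
    size: int  # bytes


def type_rank(path: str) -> int:
    ext = PurePosixPath(path).suffix.lower().lstrip(".")
    # Types missing from the list rank after every listed type.
    return TYPE_ORDER.index(ext) if ext in TYPE_ORDER else len(TYPE_ORDER)


def sort_key(song: Song) -> tuple[int, int, int, str]:
    # 1. higher bit rate, 2. preferred type, 3. bigger file,
    # 4. path as a deterministic stand-in for "randomly decide".
    return (-song.bitrate, type_rank(song.path), -song.size, song.path)


def pick_reference(group: list[Song]) -> Song:
    return min(group, key=sort_key)


group = [
    Song("/iTunes A/Song.m4a", 256, 5_200_000),
    Song("/iTunes B/Song.mp3", 256, 5_000_000),
]
print(pick_reference(group).path)  # /iTunes B/Song.mp3
```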

    The other thing that would be nice is the ability to specify that for files to be considered duplicates, the duration must be within x seconds of each other.
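
    The duration condition could be applied as a simple filter on each result group. A hypothetical Python sketch: `within_tolerance` and the path-to-duration mapping are assumptions for illustration, not dupeGuru's API or the procedure from its FAQ.

```python
def within_tolerance(durations: dict[str, float], reference: str, seconds: float) -> list[str]:
    # Keep only the dupes whose duration is within `seconds` of the reference.
    ref = durations[reference]
    return [
        path
        for path, duration in durations.items()
        if path != reference and abs(duration - ref) <= seconds
    ]


group = {"ref.mp3": 200.0, "dupe_a.mp3": 201.5, "dupe_b.mp3": 230.0}
print(within_tolerance(group, "ref.mp3", 2.0))  # ['dupe_a.mp3']
```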
  • The "within X seconds" thing can already be done post-scan (it's in the FAQ).

    As for file types, it's a tricky issue, and letting the user set custom priorities is not necessarily the answer, because a simple ordered list is too crude to make adequate prioritizing rules. For example, AIFF files have a much higher bit rate than typical files, so their position in the file type list is pointless (an AIFF file matched with another file will always get the ref position). If I tackled the issue, I think I'd make a list of perceived quality vs. file size for each file type and have the ref position decided by this ratio.

    But then of course, there's always the possibility that this ratio wouldn't fit for someone (someone who wants to keep his lossless files for example). I think the best way to deal with this is for the user to do it manually. Sort the results by "Kind" and take a quick glance at the results.

    The nice thing about sorting results by kind is that file types are grouped together. So when you're in the "m4a" part of your results, you can quickly check for an "mp3" that stands out. Then it's only a matter of selecting it and making it the reference.

    So, in short, I don't think that working on complex priority rules is worth the effort, as it is doomed to always be inadequate for some user somewhere. Post-scan re-prioritizing is the answer.
  • I’m happy
    Dunno if this is off topic, but dgpe also chooses the reference in ways I don't find optimal. The solution I'd have preferred, and would have found most intuitive, is drag-and-drop re-referencing. If you decide to do drag-and-drop reordering, I'd suggest making it work like the Finder: click on the name = drag, click on the empty part of the column = multiple select. Also, so it's easy to target, I'd say that anything dropped on the reference, or higher on the y axis than the reference, becomes the reference, while anything dropped lower on the y axis than the reference does nothing. Also, if I knew how to code, I'd make it highlight the reference in red when it would be demoted by a drop. If any of that didn't make sense, feel free to e-mail me at segers (att) wnrmagic (dawt) com, but put dupeguru in the subject.
    --Jerry Segers, Jr.
  • Just to make sure, Jerry: You are aware that it's possible to manually replace the reference by selecting a dupe and pressing Cmd-Up, right? So you're saying that you'd find drag & drop more convenient than that (in that case, I'll open a ticket)?
  • I’m happy
    Yes, that's just what I tried before reading the help here on Get Satisfaction. To be honest, if both worked, I'd use drag and drop.