I’m looking for better documentation

What lets you define separate records?

I'm wondering if there's something like tagging pieces, but for defining each separate record, meaning the grouping that should become a new row in the final database structure.

In the documentation there's an example of tagging different pieces.

Sometimes you want to tag one chunk of data that is broken up on the page in multiple places. For example, a CSV file may have separate first name, middle name, and last name columns, where your data model has only a single name type. Needle lets you collect these distributed data components into a single value by tagging the component values in pieces.
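
To make the idea in that passage concrete, here is a minimal sketch of what collecting pieces into a single value amounts to. It is only an illustration: Needlebase does this through tagging in its interface, not through code, and the column names, sample rows, and the `combine_pieces` helper are all assumptions made up for this example.

```python
import csv
import io

# Hypothetical sample input; the column names and rows are assumptions.
SAMPLE_CSV = """first_name,middle_name,last_name
Ada,King,Lovelace
Alan,,Turing
"""

def combine_pieces(row, piece_columns):
    """Join the non-empty pieces of one row into a single value."""
    pieces = (row[col].strip() for col in piece_columns)
    return " ".join(piece for piece in pieces if piece)

def read_names(text):
    """Return one combined name per CSV row (one row = one record)."""
    reader = csv.DictReader(io.StringIO(text))
    columns = ["first_name", "middle_name", "last_name"]
    return [combine_pieces(row, columns) for row in reader]

names = read_names(SAMPLE_CSV)
print(names)
```

In a CSV the record boundary comes for free, since each row is one record. On an arbitrary web page nothing marks that boundary, which is the gap this question is about.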


It gives an example of defining something in pieces, but the image also shows how each record is identified. Is this something that can be done for more arbitrary datasets that are not as structured as an uploaded CSV?

I'm trying to define each community board on this page as a separate piece since Needlebase doesn't automatically detect it:

http://www.nyc.gov/html/cau/html/cb/c...
  • It looks like what NB treats as a distinct record is determined not just by the data model, but also by the first thing that is tagged on the page. I think it says "what comes first" or something like that. This can easily end up mismatching what you intend for your data model if you're not careful.
  • It's mostly the data model, actually. Your example had a couple of things that needed to be changed. First, Needle is much happier when the navigation is uniform, so that it's hitting the same kinds of pages at each level. See the "tagging example" I added to your domain, in which I tagged just the five borough links at the top level, and then the five boroughs' board data at the second level.

    Second, you had checked a bunch of extra stuff in your data model. The connections in the data model are just the *direct* relationships between things, which in your case is a bunch of non-multiple relationships from Community Board to everything else, and a link back to Community Board in each of the other types. I fixed that in your model, re-collected, and the data looks pretty good now.

    All those second-order relationships, like connecting precincts to boroughs or connecting Chairs to Cabinet Meetings, can be done in the explorer by building fields, but aren't part of the data model itself. Take a look at what I did, and see if it makes sense.
  • Thanks for the clarifications. I definitely realized my mistake with tagging the navigation when I had to retrain it on tagging each community board set again on the second page.

    Thanks for cleaning up the data model too. I accidentally deleted your tagging example because I didn't realize you had created it; I thought it was just an unused source I had created.

    I think I have things almost the way I want them. Everything looks great when I do "Check Data" on each individual page, but when I go to collect all of them, something gets messed up. I'm only supposed to have 59 records, but it seems to mix them up and produce far more than that. I'm not sure whether something is wrong with the model or something else is causing this. I would expect the "Check Data" view to be a preview of the final result, but clearly that's not the case. If I could just do a CSV export of the "Check Data" view for each page, I would have the data structured the way I want.
  • You have to remove the data tags from the start page, and do Brooklyn in the second level with the other 4. Otherwise Needle thinks the 4 details pages are providing more data for Brooklyn, which is why things looked weird.

    I did this for you. For some reason this site also blocks scrapers from the Queens page, so I switched this source to "tagged data only" in source settings. Now you have all 59.

    I see that you went back into your model and made a bunch of links back to Community Board single, instead of multiple like I fixed them for you. This really isn't necessary, but it's your domain, so I'll leave them alone!
  • I also have a problem with the boundaries between separate records:
    Original: http://www.anusedcar.com/index.php/tu...
    Needlebase: https://my.needlebase.com/actions/vis...
    The Fault Rates are put into the next record.
  • Yours is a different case. You had "vintage" as a property of "car", when it's logically a property of your "car aged" blank-type. This was throwing off Needle's attempt to figure out the structure. I moved this into "car aged", and started a recollection for you. This should be better...
  • It works correctly this way :) So I can delete the "car" type and connect "car aged" directly with "model". Thank you!
  • Yes, I was just coming back to tell you exactly that. In fact, I just did it for you, and retagged a new version of your source and set it to collect. Should be much simpler this way.

    Let me know if you have any questions about views and analyses and such. There are tons of things we could do, but I don't want to steal your fun...
  • We got a really simple structure ...
    I am still exploring the potential of the system and I'll share the difficulties I encounter. It is not clear to me how the "car age" name was produced (e.g. "the Alfa Romeo 145 · 1993 in 2005"). Also, is it possible to produce a grid where the cells contain average failure rates by "report year" and "age of vehicle"? Any ideas for possible analyses would be a help.
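
On the grid asked about in the last post: outside Needlebase, such a grid is a simple group-and-average over exported rows. The sketch below is a hedged illustration in plain Python, not a description of how Needlebase views work. The row layout, the placeholder values, and the assumption that vehicle age equals report year minus vintage (as a name like "1993 in 2005" seems to suggest) are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Placeholder rows, NOT real data: (model, vintage, report_year, fault_rate).
ROWS = [
    ("Model A", 1993, 2005, 0.30),
    ("Model B", 1993, 2005, 0.20),
    ("Model A", 1995, 2005, 0.10),
    ("Model A", 1993, 2006, 0.40),
]

def average_grid(rows):
    """Average the fault rate for each (report_year, age) cell."""
    cells = defaultdict(list)
    for _model, vintage, report_year, rate in rows:
        age = report_year - vintage  # assumed definition of vehicle age
        cells[(report_year, age)].append(rate)
    return {key: mean(values) for key, values in cells.items()}

grid = average_grid(ROWS)

# Print one line per cell, ordered by report year and then by age.
for (report_year, age), avg in sorted(grid.items()):
    print(f"report year {report_year}, age {age}: {avg:.2f}")
```

Each key of `grid` is one cell, so laying the report years out as rows and the ages as columns gives the table described.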