How do you join files on PK/FK?
I am trying to join 2 CSV datasets from FoodMart namely sales_fact_1997 and promotion - I uploaded both files, defined cols etc..Now I want to join on promotion_id. I'll be damned if I can get it to work. It tells me both files are joinable but in the popup dialog the right had side is always grayed out so I can't complete the join. What am I doing wrong?
Adding a screenshot here:

Adding a screenshot here:

1
person has this question
I have this question, too!
Tell me when someone answers.
The more people who ask this question, the more it gets noticed.
The more people who ask this question, the more it gets noticed.
The company marked this question as answered.
Create a customer community for your own organization
Plans starting at $19/month
-
Inappropriate?First, when you upload a data set with a primary key (e.g. promotion) do not mark any column countable. If your data set contains any natural facts that don't make a sense as non-countable, just check "don't import".
When you are prompted to select a 'connection point' for this data set select the attribute you consider to be your primary key.
Now you should be able to join the sales_fact_1997 data set with promotion using the promotion_id attribute.
If you want to add the promotion facts you have not imported just create another data set from the same file (let's name this data set promotion_fact). Just don't import any fields except the fact fields and the field you have already selected as a connection point
Then you can use the same join dialog to join promotion_fact with the promotion data set.
-
Inappropriate?Ok sorry but this time I got the attached error. Here's the DDL for both files:
CREATE TABLE promotion(
promotion_id int ,
promotion_district_id int ,
promotion_name char varying(255) ,
media_type char varying(255) ,
cost decimal(12,2) ,
start_date date ,
end_date date
) from "c:\projects\xsprada\mondrian\promotion.csv"
CREATE TABLE sales_fact_1997(
product_id int ,
time_id int ,
customer_id int ,
promotion_id int ,
store_id int ,
store_sales decimal(12,2) ,
store_cost decimal(12,2) ,
unit_sales decimal(12,2)
) from "c:\projects\xsprada\mondrian\sales_fact_1997.csv"
(don't worry about the FROM part, not relevant) - So my assumption was that I could upload both files and then tell GD ok, now join them on promotion_id. Actually, I thought I could upload multiple dimension files and then tell it join store, promotions and product on corresponding keys. I am still not clear on how to do this and I guess I dont really understand your suggestion then I'm sorry but I am somewhat confused.
-
Inappropriate?Mind you I am watching
http://www.gooddata.com/good-data-tv/...
so hopefully can figure it out from that.
Thanks. -
Inappropriate?Hi Jerome,
From following this thread, I think I see two potential pitfalls which I hope to address here to help you succeed.
Joining data sets in Good Data requires a Source data set and a Target data set. The Target data set, which we call a lookup file or lookup table, should be a data set that contains no facts (no countable numbers). In other words, when uploading a lookup table, all columns should be marked as "Uncountable".
If all columns are marked as uncountable, then at the bottom of the upload screen, the application will ask you to define a "Connection Point." A connection point is the attribute that future data sets will connect with. A connection point should contain all possible values of the connection attribute.
Now in regards to the Source data set, this is what we call a fact table (data set with countable and uncountable columns). The Source data set should contain the connection point attribute contained in the target. The values for the connection point attribute in this Source data set should be a subset of the complete list of values located in the lookup table.
When mapping, it is important to map from the fact table to the lookup table (Source > Target). If the connection point is set during the Source file upload, then this option will be available when mapping. This is the only way to activate options in the right column of the mapping dialog (ie, the only choice available is the active connection point).
A more detailed explanation can be found in the joining data sets tutorial located here:
https://secure.gooddata.com/docs/html...
The key is that all lookup tables can only contain one connection point. However, fact tables can be mapped to multiple lookup tables provided that they contain the appropriate connection attributes.
To explain the second screenshot you sent, it seems as though the attribute used in the Source is part of an already mapped hierarchy. Hierarchies can include Dates or Geographical hierarchies. For example, if a date has already been mapped to the date dimension, then attributes in that dimension (such as Year, Month, Day of Week, etc.), cannot be mapped to something else.
I hope this explains the issue. Please feel free to let me know if you continue having problems with this.
Regards,
Ray
I’m thankful
-
Inappropriate?Thanks Ray,
So you're saying (taking my example again) I have sales_facts, products, and stores. So I should upload say products (no facts), make the connection using product_id, then I should upload stores, make the connection using store_id, and THEN only upload sales_facts. Once all the sets are uploaded THEN I should be able to connect sales_facts to products and stores using those 2 columns? -
Inappropriate?Ok got it. It works just nice now. It would be really nice if I could mark all columns as don't import with one click for the purpose of loading dimension tables (also numbering the cols would be cool as in 1 of N, 2 of N, 3 of N etc..).
Now if I want to upload an incremental OR if I want to upload the same table (but this time I want to include a col i previously did NOT import) do I have to go thru the whole process again?
Thanks man. This is cool. -
Inappropriate?I was just writing a response when I saw that you were able to get it working. I'm glad it worked for you!
Your question can be broken into two parts:
1) If you upload an incremental upload, then you don't have to do the mapping again. The application knows that some attributes are mapped and will automatically perform the mapping to the new data (provided that the data CAN be mapped--ie, that there is a corresponding value in the lookup table).
2) Unfortunately, the present state of the application doesn't let you easily "add columns" to a previous data set. In this case, the data would need to be loaded as a new data set and the mapping would have to be re-done.
This is a limitation we are aware of, and are working towards a more fluid data model that allows adding more columns as needed. At this point, I am not sure where this feature falls on our roadmap.
I'm glad you were able to get something working, and feel free to contact us with any other questions.
I’m happy
-
Inappropriate?Thanks Ray. I take it you're not in Prague where it's almost 3am right? :)
Loading Profile...



EMPLOYEE
EMPLOYEE
EMPLOYEE