Issue uploading BFS data from a path (public cloud link)
Please investigate why the following upload issues occur when using the transfer sdmxFile method with the filepath parameter:
- 1 GB file https://webpub-a.bfs.admin.ch/dotstat/test_dataset_1_go.csv: upload succeeds
- 2 GB file https://webpub-a.bfs.admin.ch/dotstat/test_dataset_2_go.csv: fails with HTTP 502 Network Error
- 5 GB file https://webpub-a.bfs.admin.ch/dotstat/test_dataset_4_7_go.csv: fails with HTTP 400 Bad Request ("Unable to connect to the provided SDMX service at URL: https://webpub-a.bfs.admin.ch/dotstat/test_dataset_4_7_go.csv. Please check that the provided URL is correct and that the service is online and accessible. Then re-submit the request.")
Note: all three links are fully publicly accessible data files. The exact same behaviour can be reproduced when loading the files into the qa:reset dataspace.
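For reference, a minimal sketch of the failing call. The transfer-service host and the API version segment in the path are assumptions (placeholders); only the sdmxFile method name, the filepath parameter and the qa:reset dataspace come from this ticket, so adjust the URL to your environment before reproducing.

```python
# Sketch of the transfer call described in this ticket. TRANSFER_URL and the
# "/2/" version segment are assumptions; replace them with your real
# transfer-service endpoint.
TRANSFER_URL = "https://transfer.example.org"  # hypothetical transfer-service host


def build_sdmx_file_request(dataspace: str, filepath: str) -> tuple[str, dict]:
    """Build (url, form_data) for an import/sdmxFile call that passes a
    public URL in the filepath parameter instead of uploading a file body."""
    url = f"{TRANSFER_URL}/2/import/sdmxFile"
    data = {"dataspace": dataspace, "filepath": filepath}
    return url, data


url, data = build_sdmx_file_request(
    "qa:reset",
    "https://webpub-a.bfs.admin.ch/dotstat/test_dataset_2_go.csv",
)
print(url)
print(data)
# Posting this form (e.g. with requests.post(url, data=data, headers=...))
# is what yields the HTTP 502 for the 2 GB file and HTTP 400 for the 5 GB file.
```

If the real endpoint differs, only TRANSFER_URL and the path segment need adjusting; the filepath field is the part under test.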
Previous description
Use Case STATPOP
As a statistician modeler,
when I upload a large dataset using the SDMX specification,
I want it rendered in the .Stat Suite (Explorer) and visible in the Data Lifecycle Manager.
Specification of a large statistic:
- the dataset spans 20 years
- the underlying metadata contains 7 dimensions
- one dimension is the reference area, which uses a codelist of 2,700 elements
STATPOP Dataset
For your information, our STATPOP dataset contains the statistics on the permanent and non-permanent resident population.
The DSD for STATPOP contains the following dimensions:
- RESIDENCY_1Y: Residency one year prior to the survey
- NATIONALITY: Nationality (Swiss, European, EFTA or other)
- BEV_TYP: Population type (permanent or non-permanent population)
- AGE: Age class
- FREQ: Frequency
- GESCHLECHT: Sex
- REF_AREA: Reference area
- MARITAL_STATUS: Marital status
- TIME_PERIOD: Time period
Purpose:
A data journalist should be able to query the data available in .Stat.
The data journalist should be able to visualize the evolution of the permanent and non-permanent population over the dimensions specified in the DSD (see above) and over the time dimension, TIME_PERIOD.
Size considerations:
In our current system (PxWeb) the cubes comprise 53 files totalling 3.37 GB. We hope to merge them into a single dataflow in DotStat.
In our target system (.Stat) we would have 14,209 files of 4.7 million observations each; with our modelling this represents 4 TB of data.
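As a quick back-of-envelope check of the figures above (assuming decimal terabytes for the stated 4 TB):

```python
# Volume check using the numbers from this ticket.
files = 14_209
obs_per_file = 4_700_000
total_obs = files * obs_per_file
print(f"total observations: {total_obs:,}")  # 66,782,300,000

total_bytes = 4 * 10**12  # "4 TB" as stated, decimal terabytes assumed
print(f"bytes per observation: {total_bytes / total_obs:.1f}")  # ~59.9
```

So the target volume is roughly 66.8 billion observations at about 60 bytes each, which is consistent with the stated 4 TB.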
File access:
Note that you need to go to the folder httpdocs/dotstat/statpop:
- folder sdmx: structure and a sample of the data (10 of the 14,209 files)
- folder pxweb: 36 of the 53 PxWeb cube files
SFTP access:
- Technology: use SFTP (not FTP)
- Hostname: webpub-a.bfs.admin.ch
- User: webpub_a_hosting
- Password: provided by email only
- Path: httpdocs/dotstat/statpop