Commit 7f1b0240 authored by Halley Pacheco de Oliveira

Load Data

parent ff7d3570
@@ -5,94 +5,35 @@ weight: 12
draft: false
---
### Read meta information from image files
### Read meta information in image files
The processing should generate a tabular output suitable to be loaded into a database table.
The meta information read should be written as tabular output suitable for loading into a database table.
1. Script using ExifTool to read meta information from JPEG files stored in the current directory and its subdirectories, with the output directed to *stdout*.
1. [getimgmeta.sh](https://gitlab.com/HalleyOliv/myimgcoll/blob/master/source/shell/getimgmeta.sh)
```
#!/bin/bash
# File: getimgmeta.sh
# Description: Read meta information from JPEG files
# Author: Halley Pacheco de Oliveira
# Date: 2019-02-01
IFS=$'\n'
for f in $(find . -name '*.jpg' -or -name '*.JPG'); do
exiftool \
-FileName \
-CreateDate \
-ModifyDate \
-MIMEType \
-Quality \
-DerivedFrom \
-Title \
-Description \
-Subject \
-Creator \
-Copyright \
-Software \
-ExifImageWidth \
-ExifImageHeight \
-Orientation \
-XResolution \
-YResolution \
-ResolutionUnit \
-ColorSpace \
-ISO \
-MeteringMode \
-ExposureTime \
-FNumber# \
-FocalLength# \
-Flash \
-Make \
-Model \
-SerialNumber \
-LensModel \
-LensType \
-LensSerialNumber \
-GPSLatitude \
-GPSLatitudeRef \
-GPSLongitude \
-GPSLongitudeRef \
-GPSAltitude# \
-GPSAltitudeRef \
-GPSDateStamp \
-GPSTimeStamp \
-GPSSatellites \
-GPSStatus \
-GPSMeasureMode \
-GPSMapDatum \
-Country \
-State \
-City \
-Location \
-Person \
-Event \
-XMP:Category \
-Notes \
-Volume \
-Directory \
-T \
-d "%Y-%m-%d %H:%M:%S" \
-c "%+.6f" \
"$f"
done
```
This script uses the _ExifTool_ program to read the meta information in JPEG files stored in the current directory and its subdirectories. The output is directed to *stdout*.
This script produces an output in tabular format (-T), with date/time values in "%Y-%m-%d %H:%M:%S" format (-d) and GPS coordinates in "%+.6f" format (-c).
It produces output in tabular format (option _-table_), with date/time values in "YYYY-MM-DD HH:MM:SS" format (option _-dateFormat_) and GPS coordinates as signed decimal degrees with six decimal places (option _-coordFormat_).
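The loop in the script relies on splitting the output of _find_ on newlines, which also leaves the file names open to glob expansion. As a hedged alternative, the minimal sketch below lets _find_ pass the names directly to a command with `-exec ... {} +`. The helper name `list_jpegs` and the demo directory are assumptions made for illustration, and `printf` stands in for the `exiftool` call so the example runs without ExifTool installed.

```shell
#!/bin/sh
# Sketch: let find hand the file names straight to a command, so paths with
# spaces or glob characters are never re-split by the shell.
# list_jpegs and the demo directory are illustrative, not from getimgmeta.sh.
list_jpegs() {
    # "$1" is the directory to scan; printf stands in for the exiftool call.
    find "$1" -type f \( -name '*.jpg' -o -name '*.JPG' \) \
        -exec printf 'would process: %s\n' {} +
}

# Demo on a throwaway directory tree.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub dir"
: > "$demo_dir/a b.jpg"
: > "$demo_dir/sub dir/c.JPG"
: > "$demo_dir/d.png"

found=$(list_jpegs "$demo_dir")
printf '%s\n' "$found"

# Count the lines produced for the demo files.
count=$(printf '%s\n' "$found" | grep -c 'would process')
rm -rf "$demo_dir"
```

In a real run, the `printf` part would be replaced by `exiftool` followed by the tag and format options listed in the script.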
2. To later load the output produced by the script into the database, it is necessary to redirect the output from *stdout* to a file. This is done using the redirection symbol '>':
2. To create a file from the output produced by the script, it is necessary to redirect its output. This is done using the redirection symbol '>'. For example:
```
$ getimgmeta.sh > SeagateBackupPlusDrive.tsv
```
3. Before loading the file into the database, the invalid characters (Latin1, or another code page) must be removed. This is done using the *iconv* program:
3. Characters that are not valid UTF-8 (Latin1, or another code page) must be removed from the file. To do this, the _iconv_ utility was used. For example:
```
$ iconv -c --from-code=utf8 --to-code=utf8 --output=SeagateBackupPlusDriveUTF8.tsv SeagateBackupPlusDrive.tsv
$ ls -lt *.tsv
-rw-r--r-- 1 halley halley 60456660 fev 16 05:51 SeagateBackupPlusDriveUTF8.tsv (size two bytes smaller)
-rw-rw-r-- 1 halley halley 60456662 fev 16 04:42 SeagateBackupPlusDrive.tsv
```
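The size difference in the listing comes from bytes that _iconv_ discarded. The sketch below reproduces the effect on a tiny made-up file, so the behavior of the `-c` option can be checked before running it on a large export. The file names `sample.tsv` and `clean.tsv` are assumptions for the demo, and the short options `-f` and `-t` are used in place of `--from-code` and `--to-code`.

```shell
#!/bin/sh
# Sketch: show what `iconv -c` does to a byte that is not valid UTF-8.
# sample.tsv and clean.tsv are throwaway files made up for this demo.
work_dir=$(mktemp -d)

# "caf" followed by byte 0xE9 (Latin-1 e-acute), which is not valid UTF-8
# on its own, then a tab-separated second field.
printf 'caf\351\tx\n' > "$work_dir/sample.tsv"

# -c omits characters that cannot be converted. Some iconv builds exit with
# a non-zero status when they omit something, so the status is ignored here.
iconv -c -f UTF-8 -t UTF-8 "$work_dir/sample.tsv" > "$work_dir/clean.tsv" || true

# Compare the sizes to see how many bytes were discarded.
before=$(wc -c < "$work_dir/sample.tsv")
after=$(wc -c < "$work_dir/clean.tsv")
dropped=$((before - after))
printf 'bytes dropped: %d\n' "$dropped"
rm -rf "$work_dir"
```

Comparing the two sizes this way gives a quick sanity check that only a handful of bytes were removed, rather than whole fields.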
4. Additional maintenance
Some additional maintenance must be done on the data before loading it into the database, especially for cases where the _ExifTool_ program did not write a hyphen in place of a missing value. To do this, the _sed_ utility was used. For example:
```
sed 's/jpg\t\t\t-\timage/jpg\t-\t-\t-\timage/g' <SeagateBackupPlusDriveUTF8.tsv >SeagateBackupPlusDriveUTF8sed.tsv
```
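The _sed_ command above targets one specific pattern of empty fields, and its use of `\t` in the pattern depends on the _sed_ implementation. As a hedged generalization, the sketch below uses _awk_ to fill every empty tab-separated field with a hyphen, whatever its position. The helper name `fill_empty_fields` and the sample row are made up for illustration.

```shell
#!/bin/sh
# Sketch: fill every empty tab-separated field with "-", whatever its position.
# fill_empty_fields is a hypothetical helper name.
fill_empty_fields() {
    awk 'BEGIN { FS = OFS = "\t" }
         {
             # Assigning to a field makes awk rebuild the record with OFS.
             for (i = 1; i <= NF; i++)
                 if ($i == "") $i = "-"
             print
         }'
}

# Demo row modeled on the pattern in the sed command: two empty fields
# between the file name and an existing hyphen. The values are invented.
sample=$(printf 'photo.jpg\t\t\t-\timage/jpeg\n' | fill_empty_fields)
printf '%s\n' "$sample"
```

On a real export the function would be used as a filter, reading the UTF-8 file on standard input and writing the adjusted file on standard output.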
---
title: "Manipulation"
title: "Load"
date: 2019-02-06T18:32:35-02:00
weight: 22
draft: false
---
### Data manipulation statements (load, insert, delete, update, …)
### Load the data into the database and perform later adjustments
#### a) Upload data
#### a) Upload
The images are stored on two external disks, _Seagate Backup Plus Drive_ and _Seagate Expansion Drive_, whose data are loaded using two distinct SQL commands:
@@ -37,7 +37,7 @@ The images are stored on two external disks, _Seagate Backup Plus Drive_ and _Se
Records: 88667 Deleted: 10399 Skipped: 0 Warnings: 1
```
#### b) Uploaded data maintenance
#### b) Adjustments
1. Remove images without creation date or modification date
@@ -108,13 +108,4 @@ The images are stored on two external disks, _Seagate Backup Plus Drive_ and _Se
| 2012-08-01_16-25-13_610a.jpg | 19:11:05 |
+------------------------------+--------------+
1 row in set (0.001 sec)
```
3. Additional maintenance
Other additional maintenance was done on the data, especially for cases where the ExifTool program did not write a hyphen in place of a missing value. For these cases the _sed_ utility was used, as shown in the example below:
```
sed 's/jpg\t\t\t-\timage/jpg\t-\t-\t-\timage/g' <SeagateBackupPlusDriveUTF8.tsv >SeagateBackupPlusDriveUTF8sed.tsv
```
@@ -58,8 +58,8 @@ for f in $(find . -name '*.jpg' -or -name '*.JPG'); do
-GPSMapDatum \
-Volume \
-Directory \
-T \
-d "%Y-%m-%d %H:%M:%S" \
-c "%+.6f" \
-table \
-dateFormat "%Y-%m-%d %H:%M:%S" \
-coordFormat "%+.6f" \
"$f"
done