Discussion:
[bareos-users] how to know more about an error that causes a job to end with warning
Alejandro Hamann
2018-07-03 11:55:31 UTC
Permalink
Hello BareOS community.

I have backup jobs that end with status OK with warning.

The result of messages command of bconsole showme that that I have a couple of SD Errors. But I want to know more specific information about those SD Errors. I've see in /var/log/bareos/ and there's nothing there (apart from bareos-audit.log.* and bareos.log.* ).

When running "dmesg -T" and cross dates and times of BareOS problematic jobs, I can get some information that might be useful.
But in reality, I would like to know if there is anything in BareOS that gives me more information than just a problem counter.

Regards!

--
Pablo Alejandro Hamann
--
You received this message because you are subscribed to the Google Groups "bareos-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bareos-users+***@googlegroups.com.
To post to this group, send email to bareos-***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Alejandro Hamann
2018-07-03 12:09:38 UTC
Permalink
Post by Alejandro Hamann
Hello BareOS community.
I have backup jobs that end with status OK with warning.
The result of messages command of bconsole showme that that I have a couple of SD Errors. But I want to know more specific information about those SD Errors. I've see in /var/log/bareos/ and there's nothing there (apart from bareos-audit.log.* and bareos.log.* ).
When running "dmesg -T" and cross dates and times of BareOS problematic jobs, I can get some information that might be useful.
But in reality, I would like to know if there is anything in BareOS that gives me more information than just a problem counter.
Regards!
--
Pablo Alejandro Hamann
Below I post the maximum collection of error messages I can get at the moment (I would not know how to get more about it either).
But I can assume that I have an SD error for each Tape written. Is this normal? (maybe because the tape cartridge is new or something similar, or because when the destination is an Filestorage Device, that does not happen).


##################################################
# cat /var/log/bareos/bareos.log
##################################################
01-Jul 04:39 bareos-dir JobId 177: Start Backup JobId 177, Job=BackupDICOM-1.2018-07-01_04.39.04_12
01-Jul 04:39 bareos-sd JobId 177: 3307 Issuing autochanger "unload slot 2, drive 0" command.
01-Jul 04:39 bareos-dir JobId 177: Using Volume "HU0003L5" from 'Scratch' pool.
01-Jul 04:39 bareos-dir JobId 177: Using Device "ULT3580-HH5" to write.
01-Jul 04:39 bareos-sd JobId 177: 3304 Issuing autochanger "load slot 3, drive 0" command.
01-Jul 04:39 bareos-sd JobId 177: 3305 Autochanger "load slot 3, drive 0", status is OK.
01-Jul 04:40 bareos-sd JobId 177: Error: generic_tape_device.c:142 Unable to open device "ULT3580-HH5" (/dev/nst0): ERR=Input/output error
01-Jul 04:40 bareos-sd JobId 177: Wrote label to prelabeled Volume "HU0003L5" on device "ULT3580-HH5" (/dev/nst0)
01-Jul 07:38 bareos-dir JobId 177: sql_create.c:872 Insert of attributes batch table done
01-Jul 10:31 bareos-dir JobId 177: sql_create.c:872 Insert of attributes batch table done
01-Jul 17:55 bareos-dir JobId 177: sql_create.c:872 Insert of attributes batch table done
01-Jul 21:34 bareos-sd JobId 177: End of Volume "HU0003L5" at 56:82277 on device "ULT3580-HH5" (/dev/nst0). Write of 524288 bytes got -1.
01-Jul 21:34 bareos-sd JobId 177: Re-read of last block succeeded.
01-Jul 21:34 bareos-sd JobId 177: End of medium on Volume "HU0003L5" Bytes=3,049,584,655,360 Blocks=5,816,621 at 01-Jul-2018 21:34.
01-Jul 21:34 bareos-sd JobId 177: 3307 Issuing autochanger "unload slot 3, drive 0" command.
01-Jul 21:35 bareos-dir JobId 177: Using Volume "HU0004L5" from 'Scratch' pool.
01-Jul 21:35 bareos-sd JobId 177: 3304 Issuing autochanger "load slot 4, drive 0" command.
01-Jul 21:36 bareos-sd JobId 177: 3305 Autochanger "load slot 4, drive 0", status is OK.
01-Jul 21:36 bareos-sd JobId 177: Error: generic_tape_device.c:142 Unable to open device "ULT3580-HH5" (/dev/nst0): ERR=Input/output error
01-Jul 21:36 bareos-sd JobId 177: Wrote label to prelabeled Volume "HU0004L5" on device "ULT3580-HH5" (/dev/nst0)
01-Jul 21:36 bareos-sd JobId 177: New volume "HU0004L5" mounted on device "ULT3580-HH5" (/dev/nst0) at 01-Jul-2018 21:36.
02-Jul 01:58 bareos-dir JobId 177: sql_create.c:872 Insert of attributes batch table done
02-Jul 06:03 bareos-dir JobId 177: sql_create.c:872 Insert of attributes batch table done
02-Jul 08:47 bareos-dir JobId 177: sql_create.c:872 Insert of attributes batch table done
01-Jul 04:39 hu-fs1-fd JobId 177: Created 40 wildcard excludes from FilesNotToBackup Registry key
02-Jul 10:47 bareos-sd JobId 177: Elapsed time=30:06:02, Transfer rate=48.53 M Bytes/second
02-Jul 10:48 bareos-dir JobId 177: sql_create.c:872 Insert of attributes batch table done
02-Jul 10:48 bareos-dir JobId 177: Bareos bareos-dir 17.2.4 (21Sep17):
Build OS: x86_64-pc-linux-gnu debian Debian GNU/Linux 9.3 (stretch)
JobId: 177
Job: BackupDICOM-1.2018-07-01_04.39.04_12
Backup Level: Full
Client: "hu-fs1-fd" 17.2.4 (21Sep17) Microsoft Windows Server 2012 Datacenter Edition (build 9200), 64-bit,Cross-compile,Win64
FileSet: "BackupDICOM-1" 2018-06-22 21:47:31
Pool: "Jefatura-DXI" (From command line)
Catalog: "MyCatalog" (From Client resource)
Storage: "Tape" (From Pool resource)
Scheduled time: 01-Jul-2018 04:39:04
Start time: 01-Jul-2018 04:39:28
End time: 02-Jul-2018 10:48:05
Elapsed time: 1 day 6 hours 8 mins 37 secs
Priority: 5
FD Files Written: 5,335,031
SD Files Written: 5,335,031
FD Bytes Written: 5,258,557,494,329 (5.258 TB)
SD Bytes Written: 5,259,384,649,076 (5.259 TB)
Rate: 48458.4 KB/s
Software Compression: None
VSS: no
Encryption: no
Accurate: no
Volume name(s): HU0003L5|HU0004L5
Volume Session Id: 7
Volume Session Time: 1530213283
Last Volume Bytes: 2,211,225,010,176 (2.211 TB)
Non-fatal FD errors: 0
SD Errors: 2
FD termination status: OK
SD termination status: OK
Termination: Backup OK -- with warnings



##################################################
# dmesg -T
##################################################
[Sun Jul 1 04:40:02 2018] st 0:0:0:0: [st0] Failed to read 64512 byte block with 80 byte transfer.
[Sun Jul 1 21:36:09 2018] st 0:0:0:0: [st0] Failed to read 64512 byte block with 80 byte transfer.
[Tue Jul 3 07:50:02 2018] st 0:0:0:0: [st0] Failed to read 64512 byte block with 80 byte transfer.



##################################################
# cat /etc/bareos/bareos-sd.d/device/ULT3580-HH5.conf
##################################################
Device {
Name = ULT3580-HH5
Description = "IBM TS3100 Tape Library"
Media Type = LTO-5
Archive Device = /dev/nst0
Device Type = tape
Check Labels = yes
Autochanger = yes
Automatic Mount = yes; # when device opened, read it
Always Open = yes
Label Media = no; # con atochangers, esto se hace manualmente una vez con "label barcodes" y luego con "update slots"
Random Access = no;
RemovableMedia = yes;
# Options for Speed Tuning of Tape Drives, setting block size to 1M (1048576) and max file size to bigger value:
#Minimum Block Size = 1048576
Maximum Block Size = 524288
Maximum File Size = 50g
}
--
You received this message because you are subscribed to the Google Groups "bareos-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bareos-users+***@googlegroups.com.
To post to this group, send email to bareos-***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Jörg Steffens
2018-07-04 15:54:33 UTC
Permalink
Well, Bareos treats non-fatal SD errors as Job warnings. So all jobs "OK
-- with warnings" have be checked.

If you use bconsole instead of bareos-webui, you get the joblog with
"list joblog jobid=177"

From what you have already found out, you see that there are problems
with the volume block sizes.
Have theses volumes been unlabeled or labeled with another blocksize
before? It seams, that the volumes are labeled with a 80 bytes block,
instead of a 64512 bytes block used by Bareos as tape label.

See
http://doc.bareos.org/master/html/bareos-manual-main-reference.html#TapespeedAndBlocksizes,
section "How can I find out which block size was used when the tape was
written?"

regards,
Jörg
Post by Alejandro Hamann
Post by Alejandro Hamann
Hello BareOS community.
I have backup jobs that end with status OK with warning.
The result of messages command of bconsole showme that that I have a couple of SD Errors. But I want to know more specific information about those SD Errors. I've see in /var/log/bareos/ and there's nothing there (apart from bareos-audit.log.* and bareos.log.* ).
When running "dmesg -T" and cross dates and times of BareOS problematic jobs, I can get some information that might be useful.
But in reality, I would like to know if there is anything in BareOS that gives me more information than just a problem counter.
Regards!
--
Pablo Alejandro Hamann
Below I post the maximum collection of error messages I can get at the moment (I would not know how to get more about it either).
But I can assume that I have an SD error for each Tape written. Is this normal? (maybe because the tape cartridge is new or something similar, or because when the destination is an Filestorage Device, that does not happen).
##################################################
# cat /var/log/bareos/bareos.log
##################################################
01-Jul 04:39 bareos-dir JobId 177: Start Backup JobId 177, Job=BackupDICOM-1.2018-07-01_04.39.04_12
01-Jul 04:39 bareos-sd JobId 177: 3307 Issuing autochanger "unload slot 2, drive 0" command.
01-Jul 04:39 bareos-dir JobId 177: Using Volume "HU0003L5" from 'Scratch' pool.
01-Jul 04:39 bareos-dir JobId 177: Using Device "ULT3580-HH5" to write.
01-Jul 04:39 bareos-sd JobId 177: 3304 Issuing autochanger "load slot 3, drive 0" command.
01-Jul 04:39 bareos-sd JobId 177: 3305 Autochanger "load slot 3, drive 0", status is OK.
01-Jul 04:40 bareos-sd JobId 177: Error: generic_tape_device.c:142 Unable to open device "ULT3580-HH5" (/dev/nst0): ERR=Input/output error
01-Jul 04:40 bareos-sd JobId 177: Wrote label to prelabeled Volume "HU0003L5" on device "ULT3580-HH5" (/dev/nst0)
01-Jul 07:38 bareos-dir JobId 177: sql_create.c:872 Insert of attributes batch table done
01-Jul 10:31 bareos-dir JobId 177: sql_create.c:872 Insert of attributes batch table done
01-Jul 17:55 bareos-dir JobId 177: sql_create.c:872 Insert of attributes batch table done
01-Jul 21:34 bareos-sd JobId 177: End of Volume "HU0003L5" at 56:82277 on device "ULT3580-HH5" (/dev/nst0). Write of 524288 bytes got -1.
01-Jul 21:34 bareos-sd JobId 177: Re-read of last block succeeded.
01-Jul 21:34 bareos-sd JobId 177: End of medium on Volume "HU0003L5" Bytes=3,049,584,655,360 Blocks=5,816,621 at 01-Jul-2018 21:34.
01-Jul 21:34 bareos-sd JobId 177: 3307 Issuing autochanger "unload slot 3, drive 0" command.
01-Jul 21:35 bareos-dir JobId 177: Using Volume "HU0004L5" from 'Scratch' pool.
01-Jul 21:35 bareos-sd JobId 177: 3304 Issuing autochanger "load slot 4, drive 0" command.
01-Jul 21:36 bareos-sd JobId 177: 3305 Autochanger "load slot 4, drive 0", status is OK.
01-Jul 21:36 bareos-sd JobId 177: Error: generic_tape_device.c:142 Unable to open device "ULT3580-HH5" (/dev/nst0): ERR=Input/output error
01-Jul 21:36 bareos-sd JobId 177: Wrote label to prelabeled Volume "HU0004L5" on device "ULT3580-HH5" (/dev/nst0)
01-Jul 21:36 bareos-sd JobId 177: New volume "HU0004L5" mounted on device "ULT3580-HH5" (/dev/nst0) at 01-Jul-2018 21:36.
02-Jul 01:58 bareos-dir JobId 177: sql_create.c:872 Insert of attributes batch table done
02-Jul 06:03 bareos-dir JobId 177: sql_create.c:872 Insert of attributes batch table done
02-Jul 08:47 bareos-dir JobId 177: sql_create.c:872 Insert of attributes batch table done
01-Jul 04:39 hu-fs1-fd JobId 177: Created 40 wildcard excludes from FilesNotToBackup Registry key
02-Jul 10:47 bareos-sd JobId 177: Elapsed time=30:06:02, Transfer rate=48.53 M Bytes/second
02-Jul 10:48 bareos-dir JobId 177: sql_create.c:872 Insert of attributes batch table done
Build OS: x86_64-pc-linux-gnu debian Debian GNU/Linux 9.3 (stretch)
JobId: 177
Job: BackupDICOM-1.2018-07-01_04.39.04_12
Backup Level: Full
Client: "hu-fs1-fd" 17.2.4 (21Sep17) Microsoft Windows Server 2012 Datacenter Edition (build 9200), 64-bit,Cross-compile,Win64
FileSet: "BackupDICOM-1" 2018-06-22 21:47:31
Pool: "Jefatura-DXI" (From command line)
Catalog: "MyCatalog" (From Client resource)
Storage: "Tape" (From Pool resource)
Scheduled time: 01-Jul-2018 04:39:04
Start time: 01-Jul-2018 04:39:28
End time: 02-Jul-2018 10:48:05
Elapsed time: 1 day 6 hours 8 mins 37 secs
Priority: 5
FD Files Written: 5,335,031
SD Files Written: 5,335,031
FD Bytes Written: 5,258,557,494,329 (5.258 TB)
SD Bytes Written: 5,259,384,649,076 (5.259 TB)
Rate: 48458.4 KB/s
Software Compression: None
VSS: no
Encryption: no
Accurate: no
Volume name(s): HU0003L5|HU0004L5
Volume Session Id: 7
Volume Session Time: 1530213283
Last Volume Bytes: 2,211,225,010,176 (2.211 TB)
Non-fatal FD errors: 0
SD Errors: 2
FD termination status: OK
SD termination status: OK
Termination: Backup OK -- with warnings
##################################################
# dmesg -T
##################################################
[Sun Jul 1 04:40:02 2018] st 0:0:0:0: [st0] Failed to read 64512 byte block with 80 byte transfer.
[Sun Jul 1 21:36:09 2018] st 0:0:0:0: [st0] Failed to read 64512 byte block with 80 byte transfer.
[Tue Jul 3 07:50:02 2018] st 0:0:0:0: [st0] Failed to read 64512 byte block with 80 byte transfer.
##################################################
# cat /etc/bareos/bareos-sd.d/device/ULT3580-HH5.conf
##################################################
Device {
Name = ULT3580-HH5
Description = "IBM TS3100 Tape Library"
Media Type = LTO-5
Archive Device = /dev/nst0
Device Type = tape
Check Labels = yes
Autochanger = yes
Automatic Mount = yes; # when device opened, read it
Always Open = yes
Label Media = no; # con atochangers, esto se hace manualmente una vez con "label barcodes" y luego con "update slots"
Random Access = no;
RemovableMedia = yes;
#Minimum Block Size = 1048576
Maximum Block Size = 524288
Maximum File Size = 50g
}
--
Jörg Steffens ***@bareos.com
Bareos GmbH & Co. KG Phone: +49 221 630693-91
http://www.bareos.com Fax: +49 221 630693-10

Sitz der Gesellschaft: Köln | Amtsgericht Köln: HRA 29646
Komplementär: Bareos Verwaltungs-GmbH
Geschäftsführer:
S. Dühr, M. Außendorf, Jörg Steffens, P. Storz
--
You received this message because you are subscribed to the Google Groups "bareos-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bareos-users+***@googlegroups.com.
To post to this group, send email to bareos-***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Alejandro Hamann
2018-10-10 21:45:26 UTC
Permalink
Post by Alejandro Hamann
Hello BareOS community.
I have backup jobs that end with status OK with warning.
The result of messages command of bconsole showme that that I have a couple of SD Errors. But I want to know more specific information about those SD Errors. I've see in /var/log/bareos/ and there's nothing there (apart from bareos-audit.log.* and bareos.log.* ).
When running "dmesg -T" and cross dates and times of BareOS problematic jobs, I can get some information that might be useful.
But in reality, I would like to know if there is anything in BareOS that gives me more information than just a problem counter.
Regards!
--
Pablo Alejandro Hamann
Hello Jörg Steffens.
Thank you for your response (and sorry for the delay in thanking, I'm just getting a little updated with our implementation of backup solution, after a long pause).
I read all the documentation of Bareos (very long and complete by the way), which always in one way or another, covers almost all the necessary aspects to have the system in operation.
Now, according to the documentation, when I get a system message like:

st 0: 0: 0: 0: [st0] Failed to read 64512 byte block with 80 byte transfer.

that means that, the block was written with 64512 block size but we only read 80 bytes (exactly the opposite of what you described to me).
So, the label would be fine (64K which is the desired size and default), and the problem would resides when the tape unit tries to read the label.

However, when I did the tests to determine the maximum possible speed (using btape, as described in the manual and reproducing the tests that are discussed in Bareos Whitepaper Tape Speed ​​Tuning), I had no problem. dmesg simply did not report any errors.

By the way, in my configuration I use 512k blocksizes because it is the biggest thing that my system supports (if I use bigger blocks, like 1 MB, something fails, I do not know if the tapedrive, the SAS controller, or any other hardware component) . the system is a GNU / Linux Debian Stretch 9.5 (using the Proxmox kernels, although if I use the Debian kernels, that does not change at all). The hardware is one of those small Dell PowerEdge T30 Servers (equipped with an Intel Xeon E3-1225 v5 @ 3.30GHz and 8GB of ECC RAM) and an HP Smart Array P411 SAS controller, and (obviously) the tape library that is an IBM TS3100 LTO-5.

Even more, every time I make a tape backup, I get these errors, which causes the backup jobs to end with warnings (instead of OK); however, nothing fails, I can restore perfectly and without problems (although dmesg keeps reporting that Failed to read 64512 byte block with 80 byte transfer).

What else can I do with these error messages? (It seems that in the end they are just that: error messages, but not errors in themselves).
I have never touched the mt configuration. It is the default one in Debian. I understand that it uses variable block size by default (otherwise, all tests with btape would have failed me).

A little light please :-/


Regards!
--
You received this message because you are subscribed to the Google Groups "bareos-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bareos-users+***@googlegroups.com.
To post to this group, send email to bareos-***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Loading...