Error detection in a spreadsheet file
Introduction
You can use “Error detection in a spreadsheet file” to quickly find errors in a spreadsheet.
You can find this feature in the menu when creating a new template or by clicking “File error detection” in the bottom right corner of each template. In the second case, this template will be automatically selected for further error detection.
This function works in the following way:
- A template will be generated based on the loaded file, just like you can do with “Template generation from a spreadsheet file”. But you can also choose an already pre-configured template, which will allow you to check the data for errors more accurately.
- Then, based on the template, the data will be checked for errors according to the data types and template limits.
- The data will also be searched for possible errors. These are errors that match the data types and limits, but differ significantly from other values in the column.
- It is also possible to correct these errors according to the settings you have selected. When correcting errors, the previously created template plays an important role. The more accurate and extensive it is, the more accurate the recovered values will be.
- As a result, you will receive a file with highlighted errors and their corrected values, if you have selected this option.
You can read more about the template and its limits in the articles: "Column settings" and "Column limits settings"
Template to which the file must conform
By default, automatic generation of a new template will be selected, or if you have navigated to a function from an already created template, this template will be selected.
If automatic template generation is selected, a new template will be created based on the data from the file, just as the template generation function does.
If a certain template has been selected, the error search will be performed according to the template. A quality preconfigured template will increase the accuracy of error search and correction.
Highlight values that are probably incorrect
This function is enabled by default. In addition to checking the data for template compliance, the file will also check for potential errors. Potential errors can be values that meet the template but are very different from other values in the column.
These errors will not be corrected if error correction function is enabled, as they are values that are probably worth paying attention to. If you still want to fix them like the other errors, you can enable this in the advanced settings.
These errors will be highlighted in yellow even if they have been fixed.
Add comments with the reason for the error
This function is enabled by default. This function adds a comment to the file for each error that describes the reasons for the error.
After file processing, save the generated template
This setting can only be activated if template generation is selected above. If it is active, the recognized template will be added to the list of your templates.
This will help you understand how the data has been recognized, and optimize the template for your needs so that you can further select it instead of generating a new template.
Correct errors
This setting is turned off by default. It is responsible for whether incorrect values should be corrected.
It has several sub-settings, which can be found in the advanced settings on the button below. If this setting is not enabled, the sub-settings related to error correction will not work.
If this function is disabled, the errors found will be highlighted in red color.
If the setting is active and the value has been corrected or predicted, it will be highlighted in green.
Error correction works according to the following rule, where the options are executed sequentially until the error has been corrected:
- Attempting to correct errors related to an inappropriate data type. For example, if there are extra spaces at the beginning or end of the value, wrong date format, wrong separator for fractional numbers, etc.
- Predicting the value based on the template limits. Therefore, it is very useful for this feature to fine-tune the template limits, including adding an exclude variant.
- If the option to predict wrong values is enabled, it will attempt to predict a value based on the values in the current and other columns.
- Replacing a value with the most frequent or average value in the column.
- Replacing the value with some default value for the given data type.
If the error could not be resolved after all this, it will be highlighted in red as well.
In addition to the values in the columns, the columns themselves can change their order according to the template, and the extra ones will be deleted along with the values, so be careful with the names and order of the columns. This only works if the template was originally selected, because otherwise the order of the columns will be considered correct.
Advanced settings
Data format is completely correct
This setting is only relevant if it is necessary to generate the template first and affects template generation.
ON - the data type of a column will be recognized only if the values of this column fully correspond to the format of one of the available data types.
I.e. columns will have the data type:
- Date - all values fully match the format of YYYY-MM-DD
- Date and time - YYYY-MM-DD HH:mm:ss
- Time - HH:mm:ss
OFF (default) - it will automatically check if it is possible to convert column data to one of the valid formats. And if it is possible, the column will be assigned the appropriate data type.
So if the setting is turned off, some columns may get a data type that they don't currently match, but they can be converted to it.
If the column does not match any data type, it will be considered as Text.
Return the result as a new file
This setting is only available if you do not need to correct errors in the file.
Then, if it is turned off and the uploaded file has the extension .xlsx or .xlsm, error search and highlighting will be done in the original file, which will save other pages of the file as well as its formatting.
In all other cases, the result will be a new .xlsx file where the errors will be highlighted or corrected. Note that in some cases, even if the errors did not need to be corrected, the formatting of dates and times in the newly created file may change.
Advanced error correction settings
All settings below are only relevant if error correction is enabled.
Converts all date and time columns to text
This setting helps to make the date and time format in all columns look the same.
Our standards store the date and time in the format YYYY-MM-DD hh:mm:ss. But this data in many spreadsheets can be stored in a different format.
If this setting is active, all columns of the corresponding data types will be converted to this format and the values will be saved as text. Even values that were not errors will be converted to this format. Be careful in this case, as the corrected date and time format will not be highlighted unless it was an incorrect value.
If this setting is disabled, the values may remain in spreadsheet format, and the corrected errors may have their own format.
Correct errors caused by incorrect data type
If the setting is active (default), it will attempt to correct errors that occur due to an incorrect data type.
These errors occur when a value has an inappropriate format for the data type. For example, wrong date format, text instead of number, and so on.
If it is disabled, errors of this type will be highlighted in red and will not be corrected.
Correct errors arising from non-compliance with template limits
If the setting is active, it will attempt to correct errors resulting from the value not matching the template limits.
If disabled, these errors will be highlighted in red and will not be corrected.
Correct probably incorrect values in the same way as other errors
By default, probably incorrect values are not corrected and are highlighted in yellow. If this setting is active, they will be corrected like other errors.
Once corrected, they will also remain highlighted yellow, not green like other errors.
Fill in all blank values
If this setting is active, all blank values will be considered an error and an attempt will be made to predict them based on the same rules as other incorrect values. If they cannot be filled, they will be highlighted in red.
Attempt to predict wrong values that cannot be recovered
If this setting is active, it will be triggered to predict the value from other values in the column or other columns if it could not be recovered by template limits or correcting existing data.
This setting significantly lengthens the file processing, so we recommend to use it only if necessary. The more data you have in the file, the longer the processing will take.
By its logic, it makes a prediction of what should be in a given cell. Currently, it does not use LLM technologies to predict values, but builds its own models based on your data. If there is a heavy load on the servers, this setting may be temporarily unavailable.
This function is still in the active phase of improvement. If you have it active and you see a completely inconsistent prediction, we would be very grateful if you could let us know and attach an example of the data.