Help with speeding up a log parser
-
Hello all, I have a program that reads an IIS log, parses every line, and puts that info into a structure. I later use this info to display different items of value. The following code runs in a Win UI thread. It works fine but it's SUPER slow. Can anyone help me speed this up? I thought about parsing the file line by line as I read it in, but I don't see any line-by-line options in CFile. Any suggestions are welcome.
// Select the file to be parsed.
static char BASED_CODE szFilter[] = _T("Log Files (*.log)|*.log||");
CFileDialog m_ldFile(TRUE, _T(".log"), _T(""), OFN_HIDEREADONLY | OFN_OVERWRITEPROMPT, szFilter);
if (m_ldFile.DoModal() == IDOK)
{
    ::PostMessage(m_pMainWnd->GetSafeHwnd(), UWM_PT_START, 0, 0);

    // Get the filename selected
    CString strFilePath = m_ldFile.GetPathName(), strData = _T("");

    // Read the file and dump the data into strData
    CFile file;
    if (file.Open(strFilePath, CFile::modeRead | CFile::shareDenyNone))
    {
        // Declare a buffer for reading the text
        char cBuf[65536];
        UINT uBytesRead;

        // Continue reading until no more data is read
        while (uBytesRead = file.Read(cBuf, sizeof(cBuf)-1))
        {
            // Null terminate after the last character
            cBuf[uBytesRead] = NULL;
            // Add the buffer to the mapped CString
            strData += CString(cBuf);
        }

        // Close the file
        file.Close();
        strData.MakeLower();

        int nHeaderStart = 0, nHeaderEnd = 0, nEndLine = 0;

        // Find the header
        nHeaderStart = strData.Find(_T("#fields: "), 0);
        nHeaderEnd = strData.Find(_T("\r\n"), nHeaderStart);

        // Delete up to and including the header
        if (strData.GetLength() >= (nHeaderEnd + 2))
            strData.Delete(0, nHeaderEnd + 2);
        else
            strData.Empty();

        // Clear out the vector
        v_items.erase(v_items.begin(), v_items.end());

        // Parse the data and populate the vector
        while (!strData.IsEmpty())
        {
            // Split up the data per line
            nEndLine = strData.Find(_T("\r\n"), 0);
            if (nEndLine > 0)
            {
                CString strLine = strData.Mid(0, nEndLine);
                int nToken = 0;
                s_item pItem;

                // Date
                nToken = strLine.Find(_T(" "), 0);
                pItem.sDate = strLine.Mid(0, nToken);
                strLine.Delete(0, nToken + 1);

                // Time
                nToken = strLine.Find(_T(" "), 0);
                pItem.sTime = strLine.Mid(0, nToken);
                strLine.Delete(0, nToken + 1);

                // Server IP
                nToken = strLine.Find(_T(" "), 0);
                pItem.sIPServer = strLine.Mid(0, nToken);
                strLine.Delete(0, nToken + 1);

                // Method
-
RobJones wrote:
I thought about parsing the file line by line as I read it in but I don't see any line by line options in CFile.
Try CStdioFile. It has line-based reading and writing.
Cleek | Image Toolkits | Thumbnail maker
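To show the shape of a line-by-line loop, here is a minimal sketch. It deliberately uses the standard library (std::getline) as a stand-in so it compiles without MFC; with CStdioFile the loop would call ReadString instead. The function name ReadDataLines and the choice to skip every line starting with '#' are assumptions made for illustration, not something from the original post.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this out
#include <string>
#include <vector>

// Reads a log stream one line at a time and returns only the data lines.
// Lines starting with '#' (directives such as "#Fields: ...") and blank
// lines are skipped. ReadDataLines is a hypothetical helper name.
std::vector<std::string> ReadDataLines(std::istream& in)
{
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        // std::getline strips '\n' only; drop a '\r' left by CRLF endings.
        if (!line.empty() && line.back() == '\r')
            line.pop_back();
        if (line.empty() || line[0] == '#')
            continue;
        lines.push_back(line);
    }
    return lines;
}
```

Passing a std::ifstream opened on the log file works the same way as the string stream. Because each line is parsed as it arrives, the whole file never has to sit in one large string.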
-
For text files, use CStdioFile; you can read one line at a time. Also, if the format of each line is fixed, why can't you use scanf (or a related or better function) to parse one line in one pass?
Maximilien Lincourt Your Head A Splode - Strong Bad
-
RobJones wrote:
char cBuf[65536];
UINT uBytesRead;
// Continue reading until no more data is read
while(uBytesRead = file.Read(cBuf, sizeof(cBuf)-1))
{
    // Null terminate after the last character
    cBuf[uBytesRead] = NULL;
    // Add the buffer to the mapped CString
    strData += CString(cBuf);
}
// Close the file
file.Close();
If the file is several MB in size, you could save a few steps by reading the file once rather than in 64KB chunks.
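DWORD dwLength = (DWORD)file.GetLength();
char *cBuf = new char[dwLength + 1];  // one extra byte for the terminating null
DWORD dwBytesRead = file.Read(cBuf, dwLength);
cBuf[dwBytesRead] = '\0';
file.Close();
// ... use cBuf, then delete [] cBuf when done
RobJones wrote: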
// Split up the data per line
nEndLine = strData.Find(_T("\r\n"), 0);
if(nEndLine > 0)
{
    CString strLine = strData.Mid(0, nEndLine);
If you are processing the file line-by-line, why not use CStdioFile instead? In the last while loop, you appear to be doing a lot of operations on strLine. This might account for some of the sluggishness.
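One way to cut those operations down is to stop deleting from the front of the string and walk a start offset forward instead, since every Delete(0, n) has to shift whatever text remains. The sketch below uses std::string so it builds without MFC; CString::Find already takes the same kind of start index, as the original code shows. SplitFields is a hypothetical helper name, and splitting on single spaces is an assumption about the log format.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits one line on single spaces by advancing a start offset,
// so the line itself is never modified or shifted in memory.
std::vector<std::string> SplitFields(const std::string& line)
{
    std::vector<std::string> fields;
    std::size_t start = 0;
    while (start <= line.size()) {
        std::size_t end = line.find(' ', start);
        if (end == std::string::npos)
            end = line.size();          // last field runs to end of line
        fields.push_back(line.substr(start, end - start));
        start = end + 1;                // continue after the space
    }
    return fields;
}
```

The same offset trick applies to the outer loop: search for each "\r\n" from the end of the previous line rather than deleting the consumed line from strData.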
"Take only what you need and leave the land as you found it." - Native American Proverb
-
Maximilien wrote:
Also, if the format of each line is fixed, why can't you use scanf (or a related or better function) to parse one line in one pass?
While that is technically possible, checking for errors is a real pain.
"Take only what you need and leave the land as you found it." - Native American Proverb