Code Project

Generate SHA256 Hash of every file on my computer.

The Weird and The Wonderful
Tags: database, learning, csharp, css, sqlite
3 Posts 3 Posters 15 Views 1 Watching
  • raddevus
    #1
    If You Give A Mouse A Cookie*

    I'm on a path to do a weird thing: generate the SHA256 hash of every file on my computer (and store the filename and hash in a db). To do that, I started thinking about how I might set up multiple instances of the process to each do a portion of the work so it'll be (overall) faster. I figure I could write every directory on my system to a sqlite db, then let numerous processes each grab a folder and get all its files, independently of each other. That made me start wondering how much data it would be to store every directory on my system in a sqlite database.

    Wrote A Quick C# Program

    I wrote a quick little C# program that:
    1. takes a starting path from the user
    2. iterates through every directory
    3. writes each directory to the sqlite db

    Fast Iteration, Slow Insert

    I (of course) discovered:
    1. It's super fast to iterate over the directories. It can iterate over the 239,618 directories in my Linux user directory in a few seconds.
    2. It's super slow to use EntityFramework to insert those dir names into the sqlite db. Super slow means it takes more than 10 minutes.

    Two Weird Parts (but maybe expected)

    So, instead of inserting the records into the db directly, I write the data to a file (yes, it's 239,618 lines long). The data looks like this (pipe delimited):

    239609|/home/fakepath/faker|2024-10-15 17:16:27
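A minimal sketch of that quick program, in its file-dump variant. The class name, output file name, and timestamp format are illustrative; only the pipe-delimited shape matches the sample line above:

```csharp
// Walk every directory under a starting path and dump
// "id|path|timestamp" lines for a later bulk .import into sqlite.
using System;
using System.IO;

class DumpDirs
{
    static void Main(string[] args)
    {
        string root = args.Length > 0 ? args[0] : ".";
        int id = 0;

        // IgnoreInaccessible skips directories we can't read instead of throwing.
        var options = new EnumerationOptions
        {
            RecurseSubdirectories = true,
            IgnoreInaccessible = true
        };

        using var writer = new StreamWriter("allPaths.dat");
        foreach (string dir in Directory.EnumerateDirectories(root, "*", options))
        {
            writer.WriteLine($"{++id}|{dir}|{DateTime.Now:yyyy-MM-dd HH:mm:ss}");
        }
    }
}
```

One caveat with this format: a path that happens to contain `|` would break the import, so a rarer delimiter (or escaping) may be safer in practice.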

    Weird 1

    C# iterates all those directories and writes them to a file in 2-3 seconds.

    Weird 2 (also kind of wonderful)

    sqlite imports that data (over 239 thousand rows) in less than 1 second on the SQLite command line:

    > .import allPaths.dat finfo

    The import command takes the file name containing the data and the target table name (finfo). I'm figuring this will make a lot of people say, "yeah, I figured so". :-D

    *In that book, one thing leads to another. Give a mouse a cookie, he'll want some milk. To get the milk you'll have to...
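The `.import` step can be reproduced end to end like this. A tiny sample file stands in for the real 239,618-line dump, and the `finfo` column layout is a guess reconstructed from the sample line:

```shell
# Stand-in for the real allPaths.dat dump (two rows).
printf '1|/home/fakepath/a|2024-10-15 17:16:27\n2|/home/fakepath/b|2024-10-15 17:16:28\n' > allPaths.dat

# Create the target table, set the column separator, and bulk-load the file.
sqlite3 finfo.db <<'SQL'
CREATE TABLE IF NOT EXISTS finfo (id INTEGER, path TEXT, stamp TEXT);
.separator |
.import allPaths.dat finfo
SELECT COUNT(*) FROM finfo;
SQL
```

The `.import` path bypasses per-statement transaction overhead entirely, which is a big part of why it is so much faster than row-at-a-time ORM inserts.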

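The end goal described at the top, hashing every file and recording filename plus hash, might look roughly like this for a single worker that has claimed one directory. Everything here is a sketch; in the real thing the output line would become a db insert:

```csharp
// Hash every file in one claimed directory and emit "path|sha256" pairs.
using System;
using System.IO;
using System.Security.Cryptography;

class HashDir
{
    static void Main(string[] args)
    {
        string dir = args.Length > 0 ? args[0] : ".";
        using var sha = SHA256.Create();

        foreach (string file in Directory.EnumerateFiles(dir))
        {
            // Stream the file through the hasher rather than reading it
            // all into memory; matters for large files.
            using var stream = File.OpenRead(file);
            string hash = Convert.ToHexString(sha.ComputeHash(stream));
            Console.WriteLine($"{file}|{hash}");
        }
    }
}
```

`Convert.ToHexString` needs .NET 5 or later; on older runtimes `BitConverter.ToString(...).Replace("-", "")` does the same job.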

      MarkTJohnson
      #2

      raddevus wrote:

      *In that book, one thing leads to another. Give a mouse a cookie, he'll want some milk. To get the milk you'll have to...

      Which leads to "If You Give a Pig a Pancake" and "If You Give a Moose a Muffin".

      I’ve given up trying to be calm. However, I am open to feeling slightly less agitated. I’m begging you for the benefit of everyone, don’t be STUPID.


        Mircea Neacsu
        #3

        raddevus wrote:

        2. It's super slow to use EntityFramework to insert those dir names into the sqlite db. Super slow means it takes more than 10 minutes.

        Take a look at my article[^] about SQLite multi-threading. It might give you some ideas on how to speed things up.

        Mircea (see my latest musings at neacsu.net)
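Independent of any article, one common culprit behind the 10-minute EF insert is that SQLite wraps each bare INSERT in its own transaction, with an fsync apiece. A hedged sketch of the usual fix, batching everything into one explicit transaction with a reused parameterized command (this uses the Microsoft.Data.Sqlite package; the table name comes from the post, the data is a stand-in, and the table is assumed to already exist):

```csharp
// Insert many rows inside a single transaction: one commit, one fsync,
// instead of one per row.
using Microsoft.Data.Sqlite;

class BulkInsert
{
    static void Main()
    {
        // Stand-in data; the real program would pass the enumerated directories.
        string[] dirs = { "/home/fakepath/a", "/home/fakepath/b" };

        using var conn = new SqliteConnection("Data Source=finfo.db");
        conn.Open();
        using var tx = conn.BeginTransaction();

        using var cmd = conn.CreateCommand();
        cmd.Transaction = tx;
        cmd.CommandText = "INSERT INTO finfo (path) VALUES ($p)";
        var p = cmd.CreateParameter();
        p.ParameterName = "$p";
        cmd.Parameters.Add(p);

        foreach (string dir in dirs)
        {
            p.Value = dir;
            cmd.ExecuteNonQuery();   // reuses the prepared statement each pass
        }
        tx.Commit();                 // single fsync for the whole batch
    }
}
```

EF can be coaxed into similar behavior (one SaveChanges inside one transaction), but dropping to the ADO layer like this is the simpler lever for a bulk load.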
