Parsing Text Files and Splitting on '>'
-
Hi, can some kind person help with this Java question on parsing text files?

INPUT TEXT (FASTA) FILE:

>AB485992 some text
ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT
ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT
ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT
>AB485993 some text
ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT
>AB485994 some text
ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT
ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT
>AB485922 some text
ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT
>AB485912 some text
ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT
>AB485942 some text
ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT
ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT

I need Java code that parses the input file and generates 6 files. Each file is named after the text between the '>' and the first space, and contains everything from that '>' up to (but not including) the next '>'. For example, file 1 will be named AB485992 and will contain:

>AB485992 some text
ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT
ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT
ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT

File 2 will be named AB485993 and will contain:

>AB485993 some text
ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT

File 3 will be named AB485994 and will contain:

>AB485994 some text
ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT
ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT

and so on for all six files.

So far I have code that parses out the word after '>' (i.e. the file label) and prints those labels to a file called args, shown below. Can some kind person PLEASE show me how to parse the input file and generate the 6 separate files described above? THANKS SO MUCH!

import java.io.*;
import java.util.Scanner;

public final class ReadWithScanner {

  public static void main(String... aArgs) throws FileNotFoundException {
    File f = new File("args");
    f.delete();
    ReadWithScanner parser = new ReadWithScanner("file.text");
    parser.processLineByLine();
    log("Done.");
  }

  public ReadWithScanner(String aFileName){
    fFile = new File(aFileName);
  }

  public final void processLineByLine() throws FileNotFoundException {
    Scanner scanner = new Scanner(fFile);
    try {
      while ( scanner.hasNextLine() ){
        processLine( scanner.nextLine() );
      }
    }
    finally {
      scanner.clo
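The label-extraction step the question describes (take the text after '>' up to the first space) can be isolated in a small helper. This is a minimal sketch, assuming every header line starts with '>' and that a header without a space should use everything after '>' as its label; the names FastaLabel and labelOf are hypothetical and not part of the poster's ReadWithScanner class.

```java
public class FastaLabel {
    // Hypothetical helper: returns the text between the leading '>' and the
    // first space, or everything after '>' if the header contains no space.
    // Returns null for lines that are not headers (sequence or blank lines).
    public static String labelOf(String line) {
        if (line == null || !line.startsWith(">")) {
            return null; // not a header line
        }
        String header = line.substring(1).trim();
        int space = header.indexOf(' ');
        return (space == -1) ? header : header.substring(0, space);
    }

    public static void main(String[] args) {
        System.out.println(labelOf(">AB485992 some text")); // prints AB485992
        System.out.println(labelOf(">AB485993"));           // prints AB485993
        System.out.println(labelOf("ATTGGGAATGG"));         // prints null
    }
}
```

Returning null for non-header lines lets a line-by-line loop such as processLine use the same call to detect where a new record (and therefore a new output file) begins.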
-
Hi can some kind person help with this JAVA Q on parsing text files: INPUT TEXT (FASTA) FILE: >AB485992 some text ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT >AB485993 some text ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT >AB485994 some text ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT >AB485922 some text ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT >AB485912 some text ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT >AB485942 some text ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT I need JAVA Code that parses the input file and generates 6 files each file contains file is labelled after the '>' and before the space ie file 1 will be named AB485992 and that file will contain the text between the '>' and the following '>' ie: >AB485992 some text ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT file 2 will be labelled: AB485993 and it contents >AB485993 some text ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT file 3 will be labelled: AB485993 and it contents >AB485994 some text ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT and so on for all six files So far I have code that parses out the word after '>' ie the file label and prints those to args file like below, can some kind person PLEASE show me how to parse the INPUT file and generate the 6 seperate files as described above each file labeled after the '>' eg file1, labelled AB485992 and so on for the 6 files >AB485992 some text ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT THANKS SO MUCH! import java.io.*; import java.util.Scanner; public final class ReadWithScanner { public static void main(String... 
aArgs) throws FileNotFoundException { File f = new File("args"); f.delete(); ReadWithScanner parser = new ReadWithScanner("file.text"); parser.processLineByLine(); log("Done."); } public ReadWithScanner(String aFileName){ fFile = new File(aFileName); } public final void processLineByLine() throws FileNotFoundException { Scanner scanner = new Scanner(fFile); try { while ( scanner.hasNextLine() ){ processLine( scanner.nextLine() ); } } finally { scanner.clo
I took it one step further.
import java.io.*;
import java.util.*;

public class SplitFiles {
    public static void main(String[] args) throws Exception {
        File mainfile = new File("MainFile.txt");
        List<String> lines = new ArrayList<String>();

        // Read every line of the input file into memory
        Scanner scan = new Scanner(mainfile);
        while (scan.hasNextLine()) {
            lines.add(scan.nextLine());
        }
        scan.close();

        for (int i = 0; i < lines.size(); i++) {
            // Only header lines (starting with '>') start a new output file;
            // startsWith also avoids charAt(0) failing on blank lines
            if (!lines.get(i).startsWith(">")) {
                continue;
            }

            // File name = text after '>' up to the first space,
            // or the whole header if it contains no space
            String getFileName = lines.get(i).substring(1);
            int space = getFileName.indexOf(' ');
            String result = (space == -1) ? getFileName : getFileName.substring(0, space);

            // Append mode creates the file if it does not exist and appends to it
            // if it does (same behavior as the original exists() check)
            PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(result + ".txt", true)));
            out.println(lines.get(i));
            // Copy the following lines until the next header
            for (int k = i + 1; k < lines.size() && !lines.get(k).startsWith(">"); k++) {
                out.println(lines.get(k));
            }
            out.close();
        }
    }
}
Good Luck
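For large FASTA files, a streaming variant avoids holding the whole input in memory: each line is written as soon as it is read, and a new output file is opened whenever a header appears. This is a sketch assuming Java 7 or later (java.nio.file.Files and try-with-resources); FastaSplitter and split are hypothetical names. Unlike SplitFiles above, which appends (so re-running it duplicates content in existing files), this version truncates each output file when it opens it, so a repeated label overwrites the earlier record.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class FastaSplitter {
    // Hypothetical helper: splits a FASTA file into one <label>.txt per record
    // in outDir and returns the labels in the order they were written.
    public static List<String> split(Path input, Path outDir) throws IOException {
        List<String> labels = new ArrayList<>();
        BufferedWriter out = null; // writer for the record currently being copied
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    // New header: finish the previous record's file first
                    if (out != null) {
                        out.close();
                        out = null;
                    }
                    String header = line.substring(1).trim();
                    int space = header.indexOf(' ');
                    String label = (space == -1) ? header : header.substring(0, space);
                    labels.add(label);
                    // Default options create the file or truncate an existing one
                    out = Files.newBufferedWriter(outDir.resolve(label + ".txt"), StandardCharsets.UTF_8);
                }
                if (out != null) {
                    // Copy the header and every following line until the next header;
                    // lines before the first header are skipped
                    out.write(line);
                    out.newLine();
                }
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
        return labels;
    }

    public static void main(String[] args) throws IOException {
        List<String> labels = split(Paths.get("MainFile.txt"), Paths.get("."));
        System.out.println("Wrote " + labels.size() + " files: " + labels);
    }
}
```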
-
I took it one step further.
import java.io.*;
import java.util.*;public class SplitFiles {
public static void main(String[] args) throws Exception {
File mainfile = new File("MainFile.txt");Scanner scan = new Scanner(mainfile); ArrayList<String> List = new ArrayList<String>(); String getFileName = ""; while (scan.hasNext()) { List.add(scan.nextLine()); } for (int i = 0; i < List.size(); i++) { if (List.get(i).charAt(0) == '>') { getFileName = List.get(i).replace(">", ""); String result = ""; for (int j = 0; j < getFileName.length(); j++) { if (getFileName.charAt(j) == ' ') { result = getFileName.substring(0, j); break; } } File output; if (new File(result + ".txt").exists()) { BufferedWriter out = new BufferedWriter(new FileWriter( result + ".txt", true)); out.write(List.get(i) + "\\n"); int k; for (k = i + 1; k < List.size(); k++) { if (List.get(k).charAt(0) == '>') break; else { out.write(List.get(k) + "\\n"); } } out.close(); } else { output = new File(result + ".txt"); PrintWriter out = new PrintWriter(output); out.println(List.get(i)); int k; for (k = i + 1; k < List.size(); k++) { if (List.get(k).charAt(0) == '>') break; else { out.println(List.get(k)); } } out.close(); } } } }
}
Good Luck
-
Thanks so much for your help! Just one small point: the code you wrote here, "for (k = i+1; k", is missing the end part. Any chance you know what the ending should be? Thanks again!
-
Thanks so much! I tried compiling and I am getting the following errors. Any ideas? Sorry, I am new to Java and can't figure out how to fix it. Thanks again!

SplitFiles.java:24: cannot find symbol
symbol  : method charAt(int)
location: class java.lang.Object
if (List.get(i).charAt(0) == '>')
^
SplitFiles.java:27: cannot find symbol
symbol  : method replace(java.lang.String,java.lang.String)
location: class java.lang.Object
getFileName = List.get(i).replace(">", "");
^
SplitFiles.java:52: cannot find symbol
symbol  : method charAt(int)
location: class java.lang.Object
if (List.get(k).charAt(0) == '>')
^
SplitFiles.java:74: cannot find symbol
symbol  : method charAt(int)
location: class java.lang.Object
if (List.get(k).charAt(0) == '>')
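Those messages are what javac reports when the list is declared as a raw ArrayList (without the <String> type argument): get() then returns Object, which has no charAt or replace method. A minimal illustration, assuming the <String> was lost from the posted code; GenericListDemo and isHeader are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

public class GenericListDemo {
    // Returns true if line i of the list is a FASTA header.
    // With List<String>, get(i) is a String, so String methods compile.
    // With a raw list, e.g.
    //     ArrayList raw = new ArrayList();
    //     raw.get(0).charAt(0);
    // get(0) is an Object and javac reports "cannot find symbol ...
    // location: class java.lang.Object", as in the errors above.
    public static boolean isHeader(List<String> lines, int i) {
        return lines.get(i).startsWith(">");
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<String>(); // note the <String> type argument
        lines.add(">AB485993 some text");
        lines.add("ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT");
        System.out.println(isHeader(lines, 0)); // prints true
        System.out.println(isHeader(lines, 1)); // prints false
    }
}
```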
-
Thanks so much! I tried compiling and I am getting complaints as follows, any ideas? Sorry I am new to JAVA and cant figure out how to fix it. thanks again! SplitFiles.java:24: cannot find symbol symbol : method charAt(int) location: class java.lang.Object if (List.get(i).charAt(0) == '>') ^ SplitFiles.java:27: cannot find symbol symbol : method replace(java.lang.String,java.lang.String) location: class java.lang.Object getFileName = List.get(i).replace(">", ""); ^ SplitFiles.java:52: cannot find symbol symbol : method charAt(int) location: class java.lang.Object if (List.get(k).charAt(0) == '>') ^ SplitFiles.java:74: cannot find symbol symbol : method charAt(int) location: class java.lang.Object if (List.get(k).charAt(0) == '>')
-
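Steps:
1. Copy the code into your Java project.
2. Create a text file called MainFile.txt containing the input you showed in the first post.

It should work now. The forum's HTML had swallowed the < and > characters in my answer (so ArrayList<String> became a raw ArrayList, which is why your errors say get() returns java.lang.Object), and I have corrected them in the answer post.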
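For step 2, the sample input from the first post can also be generated rather than typed by hand. This is a sketch assuming Java 7 or later and that SplitFiles is run from the same working directory (MainFile.txt is a relative path); WriteSampleInput and sampleLines are hypothetical names, and the record layout (label plus number of sequence lines) is copied from the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class WriteSampleInput {
    static final String SEQ = "ATTGGGAATGGAGGGAAATAAATGACTGGATGGTCGCTGCT";

    // Rebuilds the six records from the first post: each label is paired
    // with the number of sequence lines that follow its header.
    public static List<String> sampleLines() {
        String[] labels = {"AB485992", "AB485993", "AB485994", "AB485922", "AB485912", "AB485942"};
        int[] seqLines = {3, 1, 2, 1, 1, 2};
        List<String> lines = new ArrayList<String>();
        for (int r = 0; r < labels.length; r++) {
            lines.add(">" + labels[r] + " some text");
            for (int s = 0; s < seqLines[r]; s++) {
                lines.add(SEQ);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("MainFile.txt"); // relative to the working directory
        List<String> lines = sampleLines();
        Files.write(file, lines, StandardCharsets.UTF_8);
        System.out.println("Wrote " + lines.size() + " lines to " + file.toAbsolutePath());
    }
}
```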